From Sidecar's perspective, the only things it cares about are that
profiling should start, which file the result should be saved to (this
one is optional), what format the result should be in, what events it
wants to gather and for how long:

{
  "event": "cpu",
  "format": "jfr",
  "file": "result.jfr",
  "duration": "5m"
}

This will then be translated into a call to the MBean's start() method,
which accepts these four parameters.
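A minimal sketch of that translation, assuming the MBean forwards to async-profiler's command-string interface (its Java API exposes `execute(String command)`). `ProfileCommandBuilder` and its methods are hypothetical, not the patch's code; the `event=` and `file=` options and the bare output keywords (`jfr`, `flamegraph`, `collapsed`) follow async-profiler's agent syntax, but the exact ordering and which options belong at `start` versus `stop` should be checked against its docs. The duration is parsed on the Cassandra side, on the assumption that Cassandra, not the profiler, decides when to call stop.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns the four Sidecar parameters into an
// async-profiler command string plus a Cassandra-side duration.
final class ProfileCommandBuilder
{
    // Assumption: the small set of output formats we are willing to pass through.
    private static final Set<String> FORMATS = Set.of("jfr", "flamegraph", "collapsed");

    private ProfileCommandBuilder() {}

    // Builds e.g. "start,event=cpu,jfr,file=result.jfr"; the file is optional.
    static String startCommand(String event, String format, String file)
    {
        if (event == null || event.isEmpty() || event.contains(","))
            throw new IllegalArgumentException("invalid event: " + event);
        String fmt = format == null ? "" : format.toLowerCase(Locale.ROOT);
        if (!FORMATS.contains(fmt))
            throw new IllegalArgumentException("unsupported format: " + format);

        StringBuilder cmd = new StringBuilder("start,event=").append(event).append(',').append(fmt);
        if (file != null && !file.isEmpty())
        {
            // A comma would be parsed as an option separator by the profiler.
            if (file.contains(","))
                throw new IllegalArgumentException("file name must not contain ','");
            cmd.append(",file=").append(file);
        }
        return cmd.toString();
    }

    // Parses durations such as "30s", "5m" or "1h" into a java.time.Duration.
    static Duration parseDuration(String value)
    {
        if (value == null || value.length() < 2)
            throw new IllegalArgumentException("invalid duration: " + value);
        long amount = Long.parseLong(value.substring(0, value.length() - 1));
        if (amount <= 0)
            throw new IllegalArgumentException("duration must be positive: " + value);
        switch (value.charAt(value.length() - 1))
        {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            default: throw new IllegalArgumentException("invalid duration unit: " + value);
        }
    }
}
```

With the JSON above, `startCommand("cpu", "jfr", "result.jfr")` yields `start,event=cpu,jfr,file=result.jfr` and `parseDuration("5m")` yields five minutes. Keeping the string assembly in one method means a syntax change upstream would touch only this place.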

I mean ... what is ever going to change here?

The fact that it is also being integrated into things like Corretto
likely means that there will be great emphasis on keeping its interface
as it is. I do not think that once they integrate it into a JDK they
will suddenly want to change how the tool works so drastically that it
becomes unrecognizable and all other integrations break. I think the
author(s) of the tool are aware that it is spreading into places where
backward compatibility is a necessity.

On Fri, Dec 12, 2025 at 7:54 PM Jaydeep Chovatia
<[email protected]> wrote:
>
> +1 to what Jon said.
> We should not create any abstraction, for the following reasons: 1) if the
> async-profiler API changes, there must be a valid reason for it; 2) async-profiler
> is not going to be in the hot path for Cassandra users, so even if it breaks,
> it is OK.
>
> Jaydeep
>
> On Fri, Dec 12, 2025 at 10:11 AM Jon Haddad <[email protected]> wrote:
>>
>> We went over this a while back, and personally, not only am I not concerned
>> about exposing their API, I prefer it, which is why we discussed both
>> options.
>>
>> If they change the API of the profiler (they haven't yet), there
>> would be an exceptionally good reason for it. I don't expect this would
>> ever happen. We don't need a premature layer of abstraction here. If it
>> does happen, we can address it then.
>>
>>
>>
>> On Fri, Dec 12, 2025 at 9:23 AM David Capwell <[email protected]> wrote:
>>>
>>> > I am also not completely sure what you meant by "manager". What
>>> > manager? Is that some terminology from your work or something we have
>>> > here? Genuinely asking what you mean by that; I am a bit lost here.
>>>
>>> Sorry, sleep deprived with a 3-month-old atm… manager == Sidecar. Sidecar
>>> is adding async-profiler to its API; there was a thread about it a while
>>> back.
>>>
>>> > When it comes to API, we are not touching anything already there. We
>>> > expose this through brand new
>>> > org.apache.cassandra.profiler.AsyncProfilerMBean.
>>>
>>> Adding a new API isn’t a breaking change, but the point I made in the Sidecar
>>> thread is that the “execute” function takes the same arguments that
>>> async-profiler does, which could change for us over time as it's a 3rd-party
>>> API. Exposing a 3rd-party API puts us at risk because we normally support
>>> things for 10+ years, so if they make a change, Cassandra effectively makes
>>> the same change… will we detect this? To us it's just a string, so how would
>>> we know this happened, and how would we protect our users?
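One possible mitigation, sketched under stated assumptions: instead of forwarding an arbitrary string to `execute`, Cassandra could accept only the actions and options it has committed to and reject everything else with a clear message. `ProfilerCommandGuard` is hypothetical, and the allow-listed names are an illustrative subset of async-profiler's documented actions and options, not a proposal for the real set. On its own this does not detect upstream changes, but paired with tests that run the pinned commands against the bundled library, a behavioural change would surface when the dependency is bumped rather than in users' hands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical guard: only command strings built from a pinned allow-list
// of actions and options are accepted before reaching the profiler.
final class ProfilerCommandGuard
{
    // Assumption: the subset of async-profiler's surface Cassandra supports.
    private static final Set<String> ACTIONS = Set.of("start", "stop", "status");
    private static final Set<String> KEYS = Set.of("event", "file", "interval");
    private static final Set<String> FLAGS = Set.of("jfr", "flamegraph", "collapsed");

    private ProfilerCommandGuard() {}

    // Returns the problems found; an empty list means the command is accepted.
    static List<String> validate(String command)
    {
        List<String> problems = new ArrayList<>();
        String[] parts = command.split(",");
        if (!ACTIONS.contains(parts[0]))
            problems.add("unsupported action: " + parts[0]);
        for (int i = 1; i < parts.length; i++)
        {
            String part = parts[i];
            int eq = part.indexOf('=');
            // "key=value" options are checked against KEYS, bare words against FLAGS.
            String key = eq < 0 ? part : part.substring(0, eq);
            boolean known = eq < 0 ? FLAGS.contains(key) : KEYS.contains(key);
            if (!known)
                problems.add("unsupported option: " + part);
        }
        return problems;
    }
}
```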
>>>
>>>
>>> > On Dec 12, 2025, at 6:45 AM, Štefan Miklošovič <[email protected]> 
>>> > wrote:
>>> >
>>> > Hi Jon, answers below
>>> >
>>> > On Fri, Dec 12, 2025 at 2:19 AM Jon Haddad <[email protected]> 
>>> > wrote:
>>> >>
>>> >> +1 to including it, conceptually.  It's easily the best tool for 
>>> >> diagnosing perf issues that I've used. I've got a few questions / 
>>> >> thoughts about implementation details & user ergonomics.
>>> >>
>>> >> - Capturing call stacks on modern kernels requires some params to be set.
>>> >> Are we going to be able to check that the requirements are met and give the
>>> >> user feedback?
>>> >
>>> > Indeed, we are going to inform the user on two occasions. First, the check
>>> > will be executed within the Startup Checks "framework" we already
>>> > have in place in Cassandra: it reads the respective parameters from /proc
>>> > and logs a message if their values are not "ideal". We are not going to
>>> > fail the startup if they are not, though. It is just a warning, because a
>>> > user can always set them while Cassandra runs; there is no need to _fail_
>>> > the startup.
>>> >
>>> > However, if you later try to profile via "nodetool profile start"
>>> > and these two parameters are not set as they should be, the command will
>>> > fail and tell the user to set them first.
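The two-stage check described above could look roughly like this sketch. It assumes the two parameters are `kernel.perf_event_paranoid` and `kernel.kptr_restrict`, which async-profiler's documentation asks to be set to 1 and 0 respectively; the class name, messages and the exact thresholds coded here are illustrative, not taken from the patch. Evaluation is kept separate from reading `/proc` so it can be unit-tested.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check for the kernel settings async-profiler relies on.
final class KernelParametersCheck
{
    static final Path PERF_EVENT_PARANOID = Paths.get("/proc/sys/kernel/perf_event_paranoid");
    static final Path KPTR_RESTRICT = Paths.get("/proc/sys/kernel/kptr_restrict");

    private KernelParametersCheck() {}

    // Pure evaluation: returns one message per parameter that is not "ideal".
    static List<String> problems(int perfEventParanoid, int kptrRestrict)
    {
        List<String> problems = new ArrayList<>();
        if (perfEventParanoid > 1)
            problems.add("kernel.perf_event_paranoid is " + perfEventParanoid + ", expected <= 1");
        if (kptrRestrict != 0)
            problems.add("kernel.kptr_restrict is " + kptrRestrict + ", expected 0");
        return problems;
    }

    // Reads the live values. A startup check would only log these as warnings,
    // while "nodetool profile start" would refuse to start if the list is non-empty.
    static List<String> currentProblems() throws IOException
    {
        return problems(readInt(PERF_EVENT_PARANOID), readInt(KPTR_RESTRICT));
    }

    private static int readInt(Path path) throws IOException
    {
        return Integer.parseInt(new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim());
    }
}
```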
>>> >
>>> >> - Profiling in containers is a little weird [1].  Same type of issue as 
>>> >> my first point.
>>> >
>>> > I have run this in a container (Docker Compose) and did not need to
>>> > do anything; it just... worked. If anything special is needed, I think
>>> > it will be up to the user to ensure it is in place. We are also not
>>> > dealing with any PIDs here, as profiling runs inside the JVM via the
>>> > AsyncProfiler API (2).
>>> >
>>> >> - Getting allocation profiles requires debug symbols.  More ergonomics.
>>> >
>>> > Isn't that an outdated recommendation in the context of Cassandra 6.0,
>>> > which this lands in and which runs on 11+? The docs say "Prior to JDK 11",
>>> > which does not apply here.
>>> >
>>> > https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md#installing-debug-symbols
>>> >
>>> >> - The profiler moves a lot faster than we do.  Are we going to bump the 
>>> >> async profiler in bug fix C* releases or are we freezing the version?
>>> >
>>> > I would update major versions of async profiler only in major versions
>>> > of Cassandra. Patch versions of AsyncProfiler might be updated within
>>> > patch versions of Cassandra. That makes the most sense to me.
>>> >
>>> > If you want to use something more recent without Cassandra providing
>>> > it first, you can basically do this (1) and it should just work.
>>> >
>>> >> - Can I still attach using the asprof tool?  Will there be an issue if I 
>>> >> attach a newer version of the profiler?
>>> >
>>> > As said, whether we can profile Cassandra via the built-in profiler
>>> > is driven by a system property which defaults to false. When it is
>>> > false, the logic which would check kernel parameters or instantiate
>>> > the AsyncProfiler object (as shown in (2)) is not exercised at all,
>>> > so nothing async-profiler-related is instantiated in Cassandra. You
>>> > can then take async-profiler as you know it and run bin/asprof against
>>> > Cassandra's PID as you are used to. That also answers what happens if
>>> > you use a newer version: it behaves exactly the same way.
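A minimal sketch of that gating, assuming the property is read once at construction (which also matches "it is not possible to enable it in runtime") and the profiler is created lazily through a factory that is never called while the gate is off. `ProfilerGate` is hypothetical; in the patch's terms the factory would presumably wrap something like `() -> AsyncProfiler.getInstance()`, and the property name passed in below is a placeholder, not the real one.

```java
import java.util.function.Supplier;

// Hypothetical gate: the factory is invoked only if the system property
// is "true", so nothing profiler-related is created otherwise.
final class ProfilerGate<T>
{
    private final boolean enabled;
    private final Supplier<T> factory;
    private volatile T instance; // created lazily on first successful get()

    ProfilerGate(String property, Supplier<T> factory)
    {
        // Read once: Boolean.getBoolean returns false when the property is absent.
        this.enabled = Boolean.getBoolean(property);
        this.factory = factory;
    }

    boolean isEnabled()
    {
        return enabled;
    }

    T get()
    {
        if (!enabled)
            throw new IllegalStateException("profiling is not enabled");
        T result = instance;
        if (result == null)
        {
            // Double-checked locking so the factory runs at most once.
            synchronized (this)
            {
                result = instance;
                if (result == null)
                    instance = result = factory.get();
            }
        }
        return result;
    }
}
```

The `IllegalStateException` message is where nodetool's "profiling is not enabled" response would come from in this sketch.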
>>> >
>>> >> - Are we relocating the jars, or does Corretto?
>>> >
>>> > The current patch simply depends on AsyncProfiler, which will
>>> > eventually be included in the release tarball. So when you start
>>> > Cassandra, that library will be on the classpath. However, until the
>>> > system property that enables it is set to true, it cannot be used and
>>> > is not instantiated or initialized in any way; it also cannot be
>>> > enabled at runtime.
>>> >
>>> > (1) 
>>> > https://github.com/apache/cassandra/blob/1b6e538c98db4287795692b7df88aa4940c3a7af/doc/modules/cassandra/pages/managing/operating/async-profiler.adoc#using-a-different-async-profiler-version
>>> > (2) 
>>> > https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#example-usage-with-the-api
>>> >
>>> >>
>>> >> Thanks!
>>> >> Jon
>>> >>
>>> >> [1] 
>>> >> https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md
>>> >>
>>> >> On Thu, Dec 11, 2025 at 1:12 PM Josh McKenzie <[email protected]> 
>>> >> wrote:
>>> >>>
>>> >>> If we expose whatever API the 3rd party has and they drift or break it 
>>> >>> in the future, we could introduce a shim that would keep prior 
>>> >>> ergonomics at that time w/sane defaults or graceful handling of 
>>> >>> removals.
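A shim of that kind might be as small as a rewrite table applied before the command reaches the library. `ProfilerCommandShim` is hypothetical, and so is any rename it would be configured with: async-profiler has not renamed these options, and the `event` to `ev` mapping in the test is purely illustrative. The sketch only covers renamed keys; defaults or removals would need their own handling.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical compatibility shim: rewrites option keys that Cassandra users
// already rely on into whatever a newer async-profiler release expects.
final class ProfilerCommandShim
{
    private final Map<String, String> renamedKeys;

    ProfilerCommandShim(Map<String, String> renamedKeys)
    {
        this.renamedKeys = Map.copyOf(renamedKeys);
    }

    String translate(String command)
    {
        StringJoiner out = new StringJoiner(",");
        for (String part : command.split(","))
        {
            int eq = part.indexOf('=');
            String key = eq < 0 ? part : part.substring(0, eq);
            String newKey = renamedKeys.getOrDefault(key, key);
            // Keep the "=value" suffix untouched; only the key is renamed.
            out.add(eq < 0 ? newKey : newKey + part.substring(eq));
        }
        return out.toString();
    }
}
```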
>>> >>>
>>> >>> Think "manager" is referring to the sidecar here.
>>> >>>
>>> >>> On Thu, Dec 11, 2025, at 2:03 PM, Štefan Miklošovič wrote:
>>> >>>
>>> >>> Can you help me to understand what you mean by that? I have a feeling
>>> >>> I am missing something here or we are not on the same page.
>>> >>>
>>> >>> When it comes to API, we are not touching anything already there. We
>>> >>> expose this through brand new
>>> >>> org.apache.cassandra.profiler.AsyncProfilerMBean.
>>> >>>
>>> >>> So we are not really breaking anything here?
>>> >>>
>>> >>> I am also not completely sure what you meant by "manager". What
>>> >>> manager? Is that some terminology from your work or something we have
>>> >>> here? Genuinely asking what you mean by that; I am a bit lost here.
>>> >>>
>>> >>> If you mean that "we start to call AsyncProfiler and then in later
>>> >>> versions these guys decide that they will change how it is called", I
>>> >>> do not think that is really an issue here, is it? A user does not deal
>>> >>> with it directly at all, only via the MBean, and there will
>>> >>> presumably always be a way to start and stop profiling; that is
>>> >>> at the very core of what that library does, no?
>>> >>>
>>> >>> On Thu, Dec 11, 2025 at 7:03 PM David Capwell <[email protected]> 
>>> >>> wrote:
>>> >>>>
>>> >>>> If disabled, which is default,
>>> >>>>
>>> >>>>
>>> >>>> I def won’t block on this; I just want us to think about these
>>> >>>> possible problems before we touch a public API. I’ll leave it to the
>>> >>>> author(s)/reviewer(s).
>>> >>>>
>>> >>>> One thing that has been brought up in a different context is whether we
>>> >>>> can make breaking changes to public-facing APIs if the thing is disabled
>>> >>>> by default (debug tables being the example); I personally don’t have
>>> >>>> clarity on this for the project, so it's hard to say.
>>> >>>>
>>> >>>> TL;DR I am +0
>>> >>>>
>>> >>>> On Dec 11, 2025, at 3:30 AM, Štefan Miklošovič 
>>> >>>> <[email protected]> wrote:
>>> >>>>
>>> >>>> Oh wow! Thanks Dmitry for all these references. I think the fact that
>>> >>>> Corretto includes it in the JDK is a testament to its quality.
>>> >>>>
>>> >>>> David, I hope this pretty much answers your concerns?
>>> >>>>
>>> >>>> On Thu, Dec 11, 2025 at 12:27 PM Dmitry Konstantinov 
>>> >>>> <[email protected]> wrote:
>>> >>>>
>>> >>>>
>>> >>>> +1 from my side
>>> >>>>
>>> >>>> 1) it is a well-known, mature profiling tool, and there are other cases
>>> >>>> where Apache projects have embedded it, for example:
>>> >>>> - https://issues.apache.org/jira/browse/HADOOP-18055
>>> >>>> - https://issues.apache.org/jira/browse/HBASE-29045
>>> >>>> - https://issues.apache.org/jira/browse/FLINK-33325
>>> >>>> 2) Apache-2.0 license
>>> >>>> 3) the dependency is small (less than 1 MB) and has no transitive
>>> >>>> dependencies on other 3rd parties
>>> >>>> 4) the main contributors are now at Amazon, and it is even included in
>>> >>>> the Corretto JDK now
>>> >>>> (https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-corretto-october-2025-quarterly-updates/)
>>> >>>> 5) the logic is disabled by default, so there is no impact if you do not use it.
>>> >>>>
>>> >>>>
>>> >>>> On Wed, 10 Dec 2025 at 18:08, Štefan Miklošovič 
>>> >>>> <[email protected]> wrote:
>>> >>>>
>>> >>>>
>>> >>>> This capability is disabled by default. It is driven by a system
>>> >>>> property that you have to set to true in order to be able to get an
>>> >>>> instance of AsyncProfiler, which does the actual profiling. If it is
>>> >>>> disabled, which is the default, any nodetool calls that need
>>> >>>> AsyncProfiler (start, stop, status) return a message saying that
>>> >>>> profiling is not enabled.
>>> >>>>
>>> >>>> Not sure if this answers your concerns, but unless you knowingly turn
>>> >>>> it on, nothing happens.
>>> >>>>
>>> >>>> On Wed, Dec 10, 2025 at 6:28 PM David Capwell <[email protected]> 
>>> >>>> wrote:
>>> >>>>
>>> >>>>
>>> >>>> I have no issues adding it. I think my only real comment would be the
>>> >>>> same as with manager: whatever we expose in the public API (in this case
>>> >>>> nodetool) we have to support, so if a 3rd-party lib breaks
>>> >>>> compatibility, that puts us in a bind if we didn’t think about it up
>>> >>>> front.
>>> >>>>
>>> >>>> Having async-profiler exposed so that it is easier to profile is a good
>>> >>>> thing. Manager has (or is in the process of adding) API auth, so we
>>> >>>> can lock down async-profiler to those “allowed”, but do we have anything
>>> >>>> similar in nodetool? We had an issue in the past where async-profiler would
>>> >>>> trigger a JVM crash (a JVM bug), so we had to limit calls to it until it
>>> >>>> was fixed.
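If nodetool needed a similar safety valve, one simple option, sketched here as an assumption rather than anything in the patch, is to allow a single active profiling session per node and reject concurrent start requests. `SingleSessionGuard` is a hypothetical name; it does not replace authentication, it only bounds how often the profiler can be driven at once.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: at most one profiling session per node at a time.
final class SingleSessionGuard
{
    private final AtomicBoolean active = new AtomicBoolean(false);

    // Returns true if the caller may start profiling, false if a session is already running.
    boolean tryStart()
    {
        return active.compareAndSet(false, true);
    }

    // Must be called when the session ends, whether it stopped normally or failed.
    void finish()
    {
        active.set(false);
    }

    boolean isActive()
    {
        return active.get();
    }
}
```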
>>> >>>>
>>> >>>> On Dec 10, 2025, at 9:00 AM, Štefan Miklošovič 
>>> >>>> <[email protected]> wrote:
>>> >>>>
>>> >>>> It is worth mentioning that we also contemplated including
>>> >>>> jfr-convert so a user could also convert raw JFR files to e.g. HTML
>>> >>>> with heatmaps, but we concluded that it is not necessary. Sure, it
>>> >>>> would be convenient, but ultimately it is not needed. Converting such
>>> >>>> a file via nodetool, on the server side, is just not a good idea; it is
>>> >>>> not the server's job to convert anything.
>>> >>>>
>>> >>>> In the majority of cases, people using the profiler just want to get an
>>> >>>> HTML file with a CPU / allocation profile. It can still produce JFR files
>>> >>>> as such, which can be fetched as they are; it is just that the conversion
>>> >>>> can happen on the client's side instead.
>>> >>>>
>>> >>>> I am +1 for introducing the core async profiler library only.
>>> >>>>
>>> >>>> On Wed, Dec 10, 2025 at 5:46 PM Bernardo Botella
>>> >>>> <[email protected]> wrote:
>>> >>>>
>>> >>>>
>>> >>>> Hi everyone!
>>> >>>>
>>> >>>> I’d like to propose adding the async-profiler library to the Cassandra
>>> >>>> project. This will enable us to add a new nodetool command to run
>>> >>>> profiling tasks on the process running Cassandra. This information can
>>> >>>> be useful for debugging a wide range of potential issues and for
>>> >>>> performance optimization. CASSANDRA-20854 captures the effort and the
>>> >>>> details of the proposal, and this PR proposes its implementation.
>>> >>>>
>>> >>>> I want to note that this feature was already discussed in this thread,
>>> >>>> and this one only wants to make sure that no one has any concerns about
>>> >>>> adding the library as a dependency.
>>> >>>>
>>> >>>> What is async-profiler?
>>> >>>> async-profiler is a low-overhead sampling profiler for Java that does
>>> >>>> not suffer from the Safepoint bias problem. It features a
>>> >>>> HotSpot-specific API to collect stack traces and to track memory
>>> >>>> allocations. The profiler works with OpenJDK and other Java runtimes
>>> >>>> based on the HotSpot JVM.
>>> >>>>
>>> >>>> Unlike traditional Java profilers, async-profiler monitors non-Java 
>>> >>>> threads (e.g., GC and JIT compiler threads) and shows native and 
>>> >>>> kernel frames in stack traces.
>>> >>>>
>>> >>>> What can be profiled:
>>> >>>>
>>> >>>> - CPU time
>>> >>>> - Allocations in Java Heap
>>> >>>> - Native memory allocations and leaks
>>> >>>> - Contended locks
>>> >>>> - Hardware and software performance counters like cache misses, page
>>> >>>>   faults, context switches
>>> >>>> and more.
>>> >>>>
>>> >>>>
>>> >>>> We propose to add async-profiler 4.2 as a dependency to Cassandra.
>>> >>>>
>>> >>>> Any concerns?
>>> >>>> Bernardo
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Dmitry Konstantinov
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>>
