> I am also not completely sure what you meant by "manager", what
> manager? Is that some terminology from  your work or something we have
> here? Genuinely asking what you mean by that, I am lost a bit here.

Sorry, I'm sleep deprived with a 3-month-old atm… manager == sidecar… Sidecar is 
adding async-profiler to its API; there was a thread about it a while back.

> When it comes to API, we are not touching anything already there. We
> expose this through brand new
> org.apache.cassandra.profiler.AsyncProfilerMBean.

Adding a new API isn’t a breaking change, but the point I made in the sidecar 
thread is that the “execute” function takes the same arguments that 
async-profiler does, and those could change on us over time since it’s a 3rd 
party API. Exposing a 3rd party API puts us at risk, as we normally support 
things for 10+ years: if they make a change, then Cassandra also makes that 
change… will we detect it? To us it’s just a string, so how would we know this 
happened in time to protect our users?
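
To make the concern concrete, here is a minimal sketch (not from the patch or 
the sidecar) of one way to stop treating the command as an opaque string: parse 
the comma-separated command and reject anything outside a set we have 
explicitly agreed to support. The class name ProfilerCommandGuard and the 
allow-listed actions and options are assumptions for illustration only; the 
real set would have to come from whatever we decide to commit to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical guard; not part of Cassandra, the sidecar, or async-profiler.
final class ProfilerCommandGuard {
    // Illustrative allow-lists (assumption): what we would promise to support.
    private static final Set<String> ACTIONS = Set.of("start", "stop", "status");
    private static final Set<String> OPTIONS = Set.of("event", "interval", "file");

    /** Returns the list of problems found; an empty list means the command is accepted. */
    static List<String> validate(String command) {
        List<String> problems = new ArrayList<>();
        if (command == null || command.isBlank()) {
            problems.add("empty command");
            return problems;
        }
        // Limit -1 keeps empty segments so that "," or "start,," are reported too.
        String[] parts = command.split(",", -1);
        String action = parts[0].trim();
        if (!ACTIONS.contains(action))
            problems.add("unsupported action: " + action);
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            // Options are either "key=value" or a bare flag.
            String key = eq < 0 ? part : part.substring(0, eq);
            if (!OPTIONS.contains(key))
                problems.add("unsupported option: " + key);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("start,event=cpu,interval=10ms"));
        System.out.println(validate("start,event=cpu,newflag=1"));
    }
}
```

With something like this in front of the pass-through, an option we never 
vetted gets rejected by us with a clear message instead of silently reaching 
the library, and bumping the library version becomes a deliberate review of 
the allow-list.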


> On Dec 12, 2025, at 6:45 AM, Štefan Miklošovič <[email protected]> wrote:
> 
> Hi Jon, answers below
> 
> On Fri, Dec 12, 2025 at 2:19 AM Jon Haddad <[email protected]> wrote:
>> 
>> +1 to including it, conceptually.  It's easily the best tool for diagnosing 
>> perf issues that I've used. I've got a few questions / thoughts about 
>> implementation details & user ergonomics.
>> 
>> - Capturing call stacks in modern kernels requires some params to be set.  
>> Are we going to be able to check the requirements are met and give the user 
>> feedback?
> 
> Indeed, we will inform the user on two occasions. First, the check will
> be executed in the context of the Startup Checks "framework" we already
> have in place in Cassandra, reading the respective parameters from /proc,
> and a message will be logged if the values of these parameters are not
> "ideal". We will not fail the startup if they are not, though. Just
> a warning, because a user can always set them while Cassandra runs. No
> need to _fail_ the startup.
> 
> However, later on, if you go to profile via "nodetool profile start"
> and these two are not set as they should be we will fail and inform a
> user that they need to set them first.
> 
>> - Profiling in containers is a little weird [1].  Same type of issue as my 
>> first point.
> 
> I have run this in a container (Docker Compose) and I just did not
> need to do anything. It just ... worked. I think this will be on a
> user to ensure all is in place if anything special is needed.  We are
> also not dealing with any "pids" here as profiling is running in JVM
> via AsyncProfiler API. (2)
> 
>> - Getting allocation profiles requires debug symbols.  More ergonomics.
> 
> That recommendation is outdated in the context of Cassandra 6.0, which
> this lands in and which runs on JDK 11+, no? They say "Prior to JDK 11",
> which does not apply here.
> 
> https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md#installing-debug-symbols
> 
>> - The profiler moves a lot faster than we do.  Are we going to bump the 
>> async profiler in bug fix C* releases or are we freezing the version?
> 
> I would update major versions of async profiler only in major versions
> of Cassandra. Patch versions of AsyncProfiler might be updated within
> patch versions of Cassandra. That makes the most sense to me.
> 
> If you want to use something more recent without Cassandra providing
> it first, you can basically do this (1) and it should just work.
> 
>> - Can I still attach using the asprof tool?  Will there be an issue if I 
>> attach a newer version of the profiler?
> 
> As said, whether we can profile in Cassandra via the built-in
> profiler is driven by a system property, which defaults to false. When set
> to false, that means the logic which would check kernel parameters or
> which would instantiate the AsyncProfiler object (as shown in (2))
> would not be exercised at all. Hence nothing "async-related" would be
> instantiated in Cassandra etc. Then you can just take the async
> profiler as you know it and run bin/asprof for Cassandra's PID as you
> are used to. That also answers what happens if you use a newer version
> - it would act the very same way.
> 
>> - Are we relocating the jars, or does Corretto?
> 
> The current patch does it in such a way that we are depending on
> AsyncProfiler and it will be eventually included in release tarball.
> So if you start Cassandra, that library will be on the class path
> (even though until a system property is set to true which enables it,
> it will not be possible to use it and it is not in any way
> instantiated or initialized, it is also not possible to enable it in
> runtime).
> 
> (1) 
> https://github.com/apache/cassandra/blob/1b6e538c98db4287795692b7df88aa4940c3a7af/doc/modules/cassandra/pages/managing/operating/async-profiler.adoc#using-a-different-async-profiler-version
> (2) 
> https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#example-usage-with-the-api
> 
>> 
>> Thanks!
>> Jon
>> 
>> [1] 
>> https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md
>> 
>> On Thu, Dec 11, 2025 at 1:12 PM Josh McKenzie <[email protected]> wrote:
>>> 
>>> If we expose whatever API the 3rd party has and they drift or break it in 
>>> the future, we could introduce a shim that would keep prior ergonomics at 
>>> that time w/sane defaults or graceful handling of removals.
>>> 
>>> Think "manager" is referring to the sidecar here.
>>> 
>>> On Thu, Dec 11, 2025, at 2:03 PM, Štefan Miklošovič wrote:
>>> 
>>> Can you help me to understand what you mean by that? I have a feeling
>>> I am missing something here or we are not on the same page.
>>> 
>>> When it comes to API, we are not touching anything already there. We
>>> expose this through brand new
>>> org.apache.cassandra.profiler.AsyncProfilerMBean.
>>> 
>>> So we are not really breaking anything here?
>>> 
>>> I am also not completely sure what you meant by "manager", what
>>> manager? Is that some terminology from  your work or something we have
>>> here? Genuinely asking what you mean by that, I am lost a bit here.
>>> 
>>> If you mean that "we start to call AsyncProfiler and then in later
>>> versions these guys decide that they will change how it is called" I
>>> do not think that is really an issue here, is it? A user does not deal
>>> with that directly anyway at all, only via MBean and there will
>>> presumably always be a way to start and stop profiling, that is
>>> basically at the very core of what that library is doing, no?
>>> 
>>> On Thu, Dec 11, 2025 at 7:03 PM David Capwell <[email protected]> wrote:
>>>> 
>>>> If disabled, which is default,
>>>> 
>>>> 
>>>> I def won’t block on this, I just want us to think about these possible 
>>>> problems before we touch a public API; I’ll leave it to the 
>>>> author(s)/reviewer(s).
>>>> 
>>>> One thing that has been brought up in a different context is if we can 
>>>> make breaking changes to public facing APIs if the thing is disabled by 
>>>> default (debug tables is the example); I personally don’t have clarity 
>>>> here for the project so hard to say.
>>>> 
>>>> TL;DR I am +0
>>>> 
>>>> On Dec 11, 2025, at 3:30 AM, Štefan Miklošovič <[email protected]> 
>>>> wrote:
>>>> 
>>>> Oh wow! Thanks Dmitry for all these references. I think the fact that
>>>> Corretto includes it in the JDK is a testament to its quality.
>>>> 
>>>> David, I hope this answers your concerns pretty much?
>>>> 
>>>> On Thu, Dec 11, 2025 at 12:27 PM Dmitry Konstantinov <[email protected]> 
>>>> wrote:
>>>> 
>>>> 
>>>> + 1 from my side
>>>> 
>>>> 1) it is a well-known, mature profiling tool; there are other cases where 
>>>> Apache projects have embedded it, for example:
>>>> - https://issues.apache.org/jira/browse/HADOOP-18055
>>>> - https://issues.apache.org/jira/browse/HBASE-29045
>>>> - https://issues.apache.org/jira/browse/FLINK-33325
>>>> 2) Apache-2.0 license
>>>> 3) the dependency has a small size (less than 1Mb) and does not have 
>>>> transitive dependencies to other 3rd parties
>>>> 4) the main contributors are now at Amazon, and it is even included in 
>>>> the Corretto JDK now 
>>>> (https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-corretto-october-2025-quarterly-updates/)
>>>> 5) the logic is disabled by default, so no impact if you do not use it.
>>>> 
>>>> 
>>>> On Wed, 10 Dec 2025 at 18:08, Štefan Miklošovič <[email protected]> 
>>>> wrote:
>>>> 
>>>> 
>>>> This capability is disabled by default, it is driven by a system
>>>> property you have to set to true in order to be able to get an
>>>> instance of AsyncProfiler which does the actual profiling. If
>>>> disabled, which is default, then any calls via nodetool which need
>>>> AsyncProfiler (start, stop, status) would return a message that
>>>> profiling is not enabled.
>>>> 
>>>> Not sure if this answers your concerns but without knowingly turning
>>>> it on nothing happens.
>>>> 
>>>> On Wed, Dec 10, 2025 at 6:28 PM David Capwell <[email protected]> wrote:
>>>> 
>>>> 
>>>> I have no issues adding it.  I think my only real comment would be the 
>>>> same as with manager; w/e we expose to the public api (in this case 
>>>> Nodetool) we have to support, so if a 3rd party lib breaks compatibility 
>>>> that puts us in a bind if we didn’t think about that up front.
>>>> 
>>>> Having async-profiler exposed makes it easier to profile, which is a good thing.  
>>>> Manager has (or is in the process of adding) API auth so we can lock down 
>>>> async-profiler to those “allowed” but do we have similar in Nodetool?  We 
>>>> had an issue in the past that async-profiler would trigger a JVM crash 
>>>> (JVM bug), so we had to limit calls to it until it was fixed.
>>>> 
>>>> On Dec 10, 2025, at 9:00 AM, Štefan Miklošovič <[email protected]> 
>>>> wrote:
>>>> 
>>>> Worth mentioning that we were also contemplating the inclusion
>>>> of jfr-convert so a user can also convert raw JFR files to e.g. HTML
>>>> with heatmaps but we evaluated that it is not necessary. Sure, it
>>>> would be comfortable, but ultimately not needed. Conversion of such a
>>>> file via nodetool, on server side, is just not a good idea, it is not
>>>> a job of a server to convert anything.
>>>> 
>>>> In the majority of cases, people using the profiler just want to get an
>>>> HTML with a cpu / allocation profile. It can even gather JFR files as
>>>> such and they can be fetched as is; it is just that the conversion can
>>>> happen on the client's side instead.
>>>> 
>>>> I am +1 for introducing the core async profiler library only.
>>>> 
>>>> On Wed, Dec 10, 2025 at 5:46 PM Bernardo Botella
>>>> <[email protected]> wrote:
>>>> 
>>>> 
>>>> Hi everyone!
>>>> 
>>>> I’d like to propose adding the async-profiler library to the Cassandra 
>>>> project. This will enable us to add a new nodetool command to do profiling 
>>>> tasks on the process running Cassandra. This information can be useful to 
>>>> debug a wide range of potential issues and performance optimizations. 
>>>> CASSANDRA-20854 captures the effort and the details of the proposal, and 
>>>> this PR proposes its implementation.
>>>> 
>>>> I want to note that this feature was already discussed in this thread, and 
>>>> this one only wants to make sure that no one has any concerns about adding 
>>>> the library as a dependency.
>>>> 
>>>> What is async-profiler?
>>>> async-profiler is a low overhead sampling profiler for Java that does not 
>>>> suffer from the Safepoint bias problem. It features HotSpot-specific API 
>>>> to collect stack traces and to track memory allocations. The profiler 
>>>> works with OpenJDK and other Java runtimes based on the HotSpot JVM.
>>>> 
>>>> Unlike traditional Java profilers, async-profiler monitors non-Java 
>>>> threads (e.g., GC and JIT compiler threads) and shows native and kernel 
>>>> frames in stack traces.
>>>> 
>>>> What can be profiled:
>>>> 
>>>> - CPU time
>>>> - Allocations in Java Heap
>>>> - Native memory allocations and leaks
>>>> - Contended locks
>>>> - Hardware and software performance counters like cache misses, page 
>>>>   faults, context switches
>>>> - and more.
>>>> 
>>>> 
>>>> We propose to add async-profiler 4.2 as a dependency to Cassandra.
>>>> 
>>>> Any concerns?
>>>> Bernardo
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Dmitry Konstantinov
>>>> 
>>>> 
>>> 
>>> 
