If it makes everybody more comfortable, we can change the signature of
the start() method to accept a map instead of four explicit parameters,
so it is flexible enough to accommodate whatever new parameter we might
pass into it without changing the MBean method signature.
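
To make the idea concrete, here is a rough sketch of what the two
variants could look like. All names (ProfilerSketchMBean,
ProfilerSketch) are hypothetical and not taken from the patch. The keys
come from the JSON example quoted below, and treating everything except
"file" as required is only my assumption. The sketch does not call the
profiler at all; it only reports what it would start.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical interface, not the one from the actual patch.
interface ProfilerSketchMBean {
    // Explicit variant: adding a fifth option later changes this signature.
    String start(String event, String format, String file, String duration);

    // Map variant: a new option becomes a new key, the signature stays as it is.
    String start(Map<String, String> parameters);
}

class ProfilerSketch implements ProfilerSketchMBean {
    // Keys taken from the JSON example in this thread.
    private static final Set<String> KNOWN_KEYS = Set.of("event", "format", "file", "duration");
    // Assumption: only "file" is optional.
    private static final String[] REQUIRED_KEYS = {"event", "format", "duration"};

    @Override
    public String start(String event, String format, String file, String duration) {
        // The explicit variant simply delegates to the map variant.
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("event", event);
        parameters.put("format", format);
        if (file != null) {
            parameters.put("file", file);
        }
        parameters.put("duration", duration);
        return start(parameters);
    }

    @Override
    public String start(Map<String, String> parameters) {
        // Reject keys we do not know, so typos are reported instead of ignored.
        for (String key : parameters.keySet()) {
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unknown profiler parameter: " + key);
            }
        }
        for (String required : REQUIRED_KEYS) {
            if (!parameters.containsKey(required)) {
                throw new IllegalArgumentException("Missing profiler parameter: " + required);
            }
        }
        // A real implementation would hand the values to the profiler here.
        // Sorting the keys keeps the returned message stable.
        return "profiling started with " + new TreeMap<>(parameters);
    }
}
```

With the map variant, supporting a new option would mean adding a key to
the known set rather than adding a new MBean operation. The price is that
the parameters are validated at runtime instead of by the method
signature.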

On Fri, Dec 12, 2025 at 8:56 PM Štefan Miklošovič
<[email protected]> wrote:
>
> From Sidecar's perspective, the only things it cares about are that
> profiling should start, which file the result should be saved to, what
> format the result should be in, and which events it wants to gather:
>
> {
>   "event": "cpu",
>   "format": "jfr",
>   "file": "result.jfr", // optional
>   "duration": "5m"
> }
>
> This will then be translated to MBean's start() method accepting these
> four parameters.
>
> I mean ... what is ever going to change here?
>
> The fact that it is also going to be integrated into things like
> Corretto likely means that there will be great emphasis on keeping its
> interface as it is. I do not think that, once they start to integrate
> it into a JDK, they will suddenly want to change how the tool works to
> such an extent that it would be unrecognizable and all other
> integrations would become impossible. I think the author(s) of the
> tool are aware that it is spreading into places where backward
> compatibility is a necessity.
>
> On Fri, Dec 12, 2025 at 7:54 PM Jaydeep Chovatia
> <[email protected]> wrote:
> >
> > +1 to what Jon said.
> > We should not create any abstraction, for the following reasons: 1) if 
> > the async-profiler API changes, there must be a valid reason for it; 2) 
> > async-profiler is not going to be in the hot path for Cassandra users, so 
> > even if it breaks, it is OK.
> >
> > Jaydeep
> >
> > On Fri, Dec 12, 2025 at 10:11 AM Jon Haddad <[email protected]> 
> > wrote:
> >>
> >> We went over this a while back, and personally, not only am I not 
> >> concerned about exposing their api, but I prefer it.  Which is why we 
> >> discussed both options.
> >>
> >> If they're changing the API of the profiler (they haven't yet), then there 
> >> would be an exceptionally good reason for it.  I don't expect this would 
> >> ever happen.  We don't need a premature layer of abstraction here.  If it 
> >> does, we can address it.
> >>
> >>
> >>
> >> On Fri, Dec 12, 2025 at 9:23 AM David Capwell <[email protected]> wrote:
> >>>
> >>> > I am also not completely sure what you meant by "manager", what
> >>> > manager? Is that some terminology from  your work or something we have
> >>> > here? Genuinely asking what you mean by that, I am lost a bit here.
> >>>
> >>> Sorry, sleep deprived with a 3 month old atm… manager == side car… Side 
> >>> Car is adding async profiler to their API, there was a thread about it 
> >>> a while back.
> >>>
> >>> > When it comes to API, we are not touching anything already there. We
> >>> > expose this through brand new
> >>> > org.apache.cassandra.profiler.AsyncProfilerMBean.
> >>>
> >>> Adding a new API isn’t a breaking change, but the point I made in the 
> >>> side car thread is that the “execute” function uses the same arguments 
> >>> that async profiler does, which could change on us over time as it’s a 
> >>> 3rd party API.  Exposing a 3rd party API puts us at risk: we normally 
> >>> support things for 10+ years, so if they make a change then Cassandra also 
> >>> makes that change… will we detect this? To us it’s just a string, so how 
> >>> would we know that this happened to protect our users?
> >>>
> >>>
> >>> > On Dec 12, 2025, at 6:45 AM, Štefan Miklošovič <[email protected]> 
> >>> > wrote:
> >>> >
> >>> > Hi Jon, answers below
> >>> >
> >>> > On Fri, Dec 12, 2025 at 2:19 AM Jon Haddad <[email protected]> 
> >>> > wrote:
> >>> >>
> >>> >> +1 to including it, conceptually.  It's easily the best tool for 
> >>> >> diagnosing perf issues that I've used. I've got a few questions / 
> >>> >> thoughts about implementation details & user ergonomics.
> >>> >>
> >>> >> - Capturing call stacks in modern kernels require some params to be 
> >>> >> set.  Are we going to be able to check the requirements are met and 
> >>> >> give the user feedback?
> >>> >
> >>> > Indeed, we are going to inform the user on two occasions. First, a check
> >>> > will be executed as part of the Startup Checks "framework" we already
> >>> > have in place in Cassandra, reading the respective parameters from /proc,
> >>> > and a message will be logged if the values of these parameters are not
> >>> > "ideal". We are not going to fail the startup if they are not, though.
> >>> > It is just a warning, because a user can always set them while Cassandra
> >>> > runs. No need to _fail_ the startup.
> >>> >
> >>> > However, later on, if you try to profile via "nodetool profile start"
> >>> > and these two parameters are not set as they should be, we will fail
> >>> > and inform the user that they need to set them first.
> >>> >
> >>> >> - Profiling in containers is a little weird [1].  Same type of issue 
> >>> >> as my first point.
> >>> >
> >>> > I have run this in a container (Docker Compose) and I just did not
> >>> > need to do anything. It just ... worked. I think it will be up to the
> >>> > user to ensure all is in place if anything special is needed.  We are
> >>> > also not dealing with any "pids" here, as profiling runs inside the JVM
> >>> > via the AsyncProfiler API. (2)
> >>> >
> >>> >> - Getting allocation profiles requires debug symbols.  More ergonomics.
> >>> >
> >>> > That is an old recommendation in the context of Cassandra 6.0, which
> >>> > this lands in, no? That runs on JDK 11+. They say "Prior to JDK 11",
> >>> > which does not apply here.
> >>> >
> >>> > https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md#installing-debug-symbols
> >>> >
> >>> >> - The profiler moves a lot faster than we do.  Are we going to bump 
> >>> >> the async profiler in bug fix C* releases or are we freezing the 
> >>> >> version?
> >>> >
> >>> > I would update major versions of async profiler only in major versions
> >>> > of Cassandra. Patch versions of AsyncProfiler might be updated within
> >>> > patch versions of Cassandra. That makes the most sense to me.
> >>> >
> >>> > If you want to use something more recent without Cassandra providing
> >>> > it first, you can basically do this (1) and it should just work.
> >>> >
> >>> >> - Can I still attach using the asprof tool?  Will there be an issue if 
> >>> >> I attach a newer version of the profiler?
> >>> >
> >>> > As said, whether we can profile in Cassandra via the built-in
> >>> > profiler is driven by a system property, which defaults to false. When
> >>> > set to false, the logic which would check kernel parameters or
> >>> > which would instantiate the AsyncProfiler object (as shown in (2))
> >>> > is not exercised at all. Hence nothing "async-related" is
> >>> > instantiated in Cassandra. Then you can just take the async
> >>> > profiler as you know it and run bin/asprof against Cassandra's PID as
> >>> > you are used to. That also answers what happens if you use a newer
> >>> > version: it would act the very same way.
> >>> >
> >>> >> - Are we relocating the jars, or does Corretto?
> >>> >
> >>> > The current patch does it in such a way that we depend on
> >>> > AsyncProfiler and it will eventually be included in the release tarball.
> >>> > So if you start Cassandra, that library will be on the class path.
> >>> > However, until the system property which enables it is set to true, it
> >>> > is not possible to use it and it is not instantiated or initialized in
> >>> > any way. It is also not possible to enable it at runtime.
> >>> >
> >>> > (1) 
> >>> > https://github.com/apache/cassandra/blob/1b6e538c98db4287795692b7df88aa4940c3a7af/doc/modules/cassandra/pages/managing/operating/async-profiler.adoc#using-a-different-async-profiler-version
> >>> > (2) 
> >>> > https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#example-usage-with-the-api
> >>> >
> >>> >>
> >>> >> Thanks!
> >>> >> Jon
> >>> >>
> >>> >> [1] 
> >>> >> https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md
> >>> >>
> >>> >> On Thu, Dec 11, 2025 at 1:12 PM Josh McKenzie <[email protected]> 
> >>> >> wrote:
> >>> >>>
> >>> >>> If we expose whatever API the 3rd party has and they drift or break 
> >>> >>> it in the future, we could introduce a shim that would keep prior 
> >>> >>> ergonomics at that time w/sane defaults or graceful handling of 
> >>> >>> removals.
> >>> >>>
> >>> >>> Think "manager" is referring to the sidecar here.
> >>> >>>
> >>> >>> On Thu, Dec 11, 2025, at 2:03 PM, Štefan Miklošovič wrote:
> >>> >>>
> >>> >>> Can you help me to understand what you mean by that? I have a feeling
> >>> >>> I am missing something here or we are not on the same page.
> >>> >>>
> >>> >>> When it comes to API, we are not touching anything already there. We
> >>> >>> expose this through brand new
> >>> >>> org.apache.cassandra.profiler.AsyncProfilerMBean.
> >>> >>>
> >>> >>> So we are not really breaking anything here?
> >>> >>>
> >>> >>> I am also not completely sure what you meant by "manager", what
> >>> >>> manager? Is that some terminology from  your work or something we have
> >>> >>> here? Genuinely asking what you mean by that, I am lost a bit here.
> >>> >>>
> >>> >>> If you mean that "we start to call AsyncProfiler and then in later
> >>> >>> versions these guys decide that they will change how it is called" I
> >>> >>> do not think that is really an issue here, is it? A user does not deal
> >>> >>> with that directly anyway at all, only via MBean and there will
> >>> >>> presumably always be a way to start and stop profiling, that is
> >>> >>> basically at the very core of what that library is doing, no?
> >>> >>>
> >>> >>> On Thu, Dec 11, 2025 at 7:03 PM David Capwell <[email protected]> 
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> If disabled, which is default,
> >>> >>>>
> >>> >>>>
> >>> >>>> I def won’t block on this, I just want us to think about these 
> >>> >>>> possible problems before we touch a public API; I’ll leave it to 
> >>> >>>> author(s)/reviewer(s).
> >>> >>>>
> >>> >>>> One thing that has been brought up in a different context is if we 
> >>> >>>> can make breaking changes to public facing APIs if the thing is 
> >>> >>>> disabled by default (debug tables is the example); I personally 
> >>> >>>> don’t have clarity here for the project so hard to say.
> >>> >>>>
> >>> >>>> TL;DR I am +0
> >>> >>>>
> >>> >>>> On Dec 11, 2025, at 3:30 AM, Štefan Miklošovič 
> >>> >>>> <[email protected]> wrote:
> >>> >>>>
> >>> >>>> Oh wow! Thanks Dmitry for all these references. I think the fact that
> >>> >>>> Corretto includes it in the JDK is a testament to its quality.
> >>> >>>>
> >>> >>>> David, I hope this answers your concerns pretty much?
> >>> >>>>
> >>> >>>> On Thu, Dec 11, 2025 at 12:27 PM Dmitry Konstantinov 
> >>> >>>> <[email protected]> wrote:
> >>> >>>>
> >>> >>>>
> >>> >>>> + 1 from my side
> >>> >>>>
> >>> >>>> 1) it is a well-known, mature profiling tool; there are other cases 
> >>> >>>> where Apache projects embedded it, for example:
> >>> >>>> - https://issues.apache.org/jira/browse/HADOOP-18055
> >>> >>>> - https://issues.apache.org/jira/browse/HBASE-29045
> >>> >>>> - https://issues.apache.org/jira/browse/FLINK-33325
> >>> >>>> 2) Apache-2.0 license
> >>> >>>> 3) the dependency has a small size (less than 1 MB) and does not have 
> >>> >>>> transitive dependencies on other 3rd parties
> >>> >>>> 4) the main contributors are now at Amazon, and it is even included in 
> >>> >>>> Corretto JDK now 
> >>> >>>> (https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-corretto-october-2025-quarterly-updates/
> >>> >>>>  )
> >>> >>>> 5) the logic is disabled by default, so no impact if you do not use 
> >>> >>>> it.
> >>> >>>>
> >>> >>>>
> >>> >>>> On Wed, 10 Dec 2025 at 18:08, Štefan Miklošovič 
> >>> >>>> <[email protected]> wrote:
> >>> >>>>
> >>> >>>>
> >>> >>>> This capability is disabled by default; it is driven by a system
> >>> >>>> property you have to set to true in order to be able to get an
> >>> >>>> instance of AsyncProfiler, which does the actual profiling. If
> >>> >>>> disabled, which is default, then any calls via nodetool which need
> >>> >>>> AsyncProfiler (start, stop, status) would return a message that
> >>> >>>> profiling is not enabled.
> >>> >>>>
> >>> >>>> Not sure if this answers your concerns but without knowingly turning
> >>> >>>> it on nothing happens.
> >>> >>>>
> >>> >>>> On Wed, Dec 10, 2025 at 6:28 PM David Capwell <[email protected]> 
> >>> >>>> wrote:
> >>> >>>>
> >>> >>>>
> >>> >>>> I have no issues adding it.  I think my only real comment would be 
> >>> >>>> the same as with manager; w/e we expose to the public api (in this 
> >>> >>>> case Nodetool) we have to support, so if a 3rd party lib breaks 
> >>> >>>> compatibility that puts us in a bind if we didn’t think about that 
> >>> >>>> up front.
> >>> >>>>
> >>> >>>> Having async-profiler exposed makes it easier to profile, which is a 
> >>> >>>> good thing.  Manager has (or is in the process of adding) API auth so we 
> >>> >>>> can lock down async-profiler to those “allowed” but do we have 
> >>> >>>> similar in Nodetool?  We had an issue in the past that 
> >>> >>>> async-profiler would trigger a JVM crash (JVM bug), so we had to 
> >>> >>>> limit calls to it until it was fixed.
> >>> >>>>
> >>> >>>> On Dec 10, 2025, at 9:00 AM, Štefan Miklošovič 
> >>> >>>> <[email protected]> wrote:
> >>> >>>>
> >>> >>>> Worth mentioning that we also contemplated including jfr-convert,
> >>> >>>> so a user could also convert raw JFR files to e.g. HTML with
> >>> >>>> heatmaps, but we concluded that it is not necessary. Sure, it
> >>> >>>> would be convenient, but ultimately it is not needed. Converting such
> >>> >>>> a file via nodetool, on the server side, is just not a good idea; it is
> >>> >>>> not the job of a server to convert anything.
> >>> >>>>
> >>> >>>> In the majority of cases, people using the profiler just want to get an
> >>> >>>> HTML with a cpu / allocation profile. It can even gather JFR files as
> >>> >>>> such and they can be fetched as is; it is just that the conversion
> >>> >>>> itself can happen on the client's side instead.
> >>> >>>>
> >>> >>>> I am +1 for introducing the core async profiler library only.
> >>> >>>>
> >>> >>>> On Wed, Dec 10, 2025 at 5:46 PM Bernardo Botella
> >>> >>>> <[email protected]> wrote:
> >>> >>>>
> >>> >>>>
> >>> >>>> Hi everyone!
> >>> >>>>
> >>> >>>> I’d like to propose adding the async-profiler library to the 
> >>> >>>> Cassandra project. This will enable us to add a new nodetool command 
> >>> >>>> to do profiling tasks on the process running Cassandra. This 
> >>> >>>> information can be useful to debug a wide range of potential issues 
> >>> >>>> and performance optimizations. CASSANDRA-20854 captures the effort 
> >>> >>>> and the details of the proposal, and this PR proposes its 
> >>> >>>> implementation.
> >>> >>>>
> >>> >>>> I want to note that this feature was already discussed in this 
> >>> >>>> thread, and this one only wants to make sure that no one has any 
> >>> >>>> concerns about adding the library as a dependency.
> >>> >>>>
> >>> >>>> What is async-profiler?
> >>> >>>> async-profiler is a low overhead sampling profiler for Java that 
> >>> >>>> does not suffer from the Safepoint bias problem. It features 
> >>> >>>> HotSpot-specific API to collect stack traces and to track memory 
> >>> >>>> allocations. The profiler works with OpenJDK and other Java runtimes 
> >>> >>>> based on the HotSpot JVM.
> >>> >>>>
> >>> >>>> Unlike traditional Java profilers, async-profiler monitors non-Java 
> >>> >>>> threads (e.g., GC and JIT compiler threads) and shows native and 
> >>> >>>> kernel frames in stack traces.
> >>> >>>>
> >>> >>>> What can be profiled:
> >>> >>>>
> >>> >>>> CPU time
> >>> >>>> Allocations in Java Heap
> >>> >>>> Native memory allocations and leaks
> >>> >>>> Contended locks
> >>> >>>> Hardware and software performance counters like cache misses, page 
> >>> >>>> faults, context switches
> >>> >>>> and more.
> >>> >>>>
> >>> >>>>
> >>> >>>> We propose to add async-profiler 4.2 as a dependency to Cassandra.
> >>> >>>>
> >>> >>>> Any concerns?
> >>> >>>> Bernardo
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> --
> >>> >>>> Dmitry Konstantinov
> >>> >>>>
> >>> >>>>
> >>> >>>
> >>> >>>
> >>>
