Re: Splitting NMT memory counters into "live" and "flat" classes

2024-08-29 Thread Thomas Stüfe
My current thinking (which may change) is that for VirtualMemory, we may
not need atomic counters at all since these counters are used in
conjunction with modifying the not lock-free virtual memory tracker.

For malloc counters, this makes sense, since all we do for malloc is to
increase these counters (in summary mode) and in detail mode, we add the
call site to the malloc site table, which is lock free.

Cheers, Thomas



On Thu, Aug 29, 2024 at 5:30 PM Robert Toyonaga  wrote:

> Hi Everyone,
>
> What are your thoughts on splitting the NMT memory counter classes (both
> MemoryCounter and VirtualMemory) into "live" and "flat" classes? Currently,
> the counters are used for both recording live data and for storing static
> data when creating reports. However, after snapshotting the counters for
> report generation, we no longer need atomic operations or any updating of
> peak values. The "live" set of classes could keep the atomic operations,
> peak updating, setters, and snapshotting functionality. The "flat" classes
> could get rid of atomic, peak updating, and only need getters.
>
> The JBS issue for this is JDK-8334234
>
> Thanks!
> Robert Toyonaga
>


Re: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag

2024-08-29 Thread Thomas Stüfe
Note that “NMT::MemType” may lead to many uses of just “MemType”  since a
lot of the usage will happen inside a future NMT namespace and people will
just drop the namespace. Will make it a bit more difficult to grep for,
since you need to look for both variants.

Adding an NMT namespace also opens other questions. E.g. writing - all
lower case is the standard.



On Thu 29. Aug 2024 at 17:15, Gerard Ziemski  wrote:

> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski 
> wrote:
>
> > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`.
> >
> > `MEMFLAGS` implies that we can use more than one at the same time, but
> those are exclusive values, so `MemType` is much more suitable name.
> >
> > There is a bunch of other related cleanup that we can do, but I will
> leave for follow up issues such as [NMT: rename NMTUtil::flag to
> NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836)
>
> Taking Dan's feedback into account we have:
>
> -
>
> PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318039929
>


Re: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag

2024-08-14 Thread Thomas Stüfe
I am out sick after JVMLS, typical post conference flu. There is no need to
rush this, right?

I really dislike MemType for its very genericness. MEMFLAGS is a handle
into the NMT subsystem. It has no meaning beyond NMT. Yet, it is spread all
over the code base. Therefore I really would like an NMT prefix, whatever
the name then is. Clear and easy to grep. A very generic name like MemType
or similar will clash with many similar sounding specifiers from other
places. That may not sound so much of an issue if you are only working
within a single hotspot subsystem; but if you work in all corners of the
JDK, you come to like clear succinct names, and an important part of
clearness is scope, and the scope here is NMT.

Just my 5 cent

On Wed 14. Aug 2024 at 17:43, Gerard Ziemski  wrote:

> On Wed, 14 Aug 2024 06:54:22 GMT, Stefan Karlsson 
> wrote:
>
> > > Is everyone OK with MemTypeFlag?
> >
> > It's quite unfortunate to have a three-word type for something this
> prolific in our code base. Why not go with `MemType` and change variable
> names from `flag` to `mt`?
> >
> > ```
> >   static char* map_memory_to_file(size_t size, int fd, MEMFLAGS flag =
> mtNone);
> > ```
> >
> > would then become:
> >
> > ```
> >   static char* map_memory_to_file(size_t size, int fd, MemType mt =
> mtNone);
> > ```
>
> My initial choice was exactly that, but then I backed-off from renaming
> the arguments, because how big and intrusive the change it seemed.
>
> David seems to prefer `MemTypeFlag`, so that we don't have to rename all
> the arguments and I see a point in that, but it wouldn't be my first choice.
>
> Thomas seems to prefer `NMTCat` that I just don't like much, despite that
> it has NMT prefix in it, for some reason.
>
> If we could find a compromise that we all can live with, despite it not
> being exactly what every single person wants, then that would be great. We
> could this in separate steps:
>
> Initial effort (this fix): we rename `MEMFLAGS` to `MemType`
>
> Follow up effort(s): we either rename all arguments in one big push
> (intrusive) or we do it a file, or related files (like NMT) together at a
> time in a followup(s) or whenever we are in the file with some related fix.
> Eventually we would get there, which is better than what we have right now
> IMHO.
>
> Is this a reasonable compromise to everyone?
>
> -
>
> PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2289134648
>


Re: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag

2024-08-08 Thread Thomas Stüfe
Just to state: if the majority wants a rename , that is of course fine. The
existing name is admittedly a bit of an eyesore.

On Thu 8. Aug 2024 at 13:33, Thomas Stüfe  wrote:

> I like it short, succinct and greppable.
>
> NMTCategory? NMTCat?
>
> That said, I can live with the current name and dread the cv backporting
> and support implications of this change.
>
> On Thu 8. Aug 2024 at 10:50, Gerard Ziemski  wrote:
>
>> On Thu, 8 Aug 2024 04:47:18 GMT, David Holmes 
>> wrote:
>>
>> > If you called it `MemTypeFlag` - which to me still suggests
>> mutually-exclusive values - then you would not need to rename all the
>> variables with "flag" in their name later.
>>
>> Hmm, not a bad idea. Are there any other opinions?
>>
>> -
>>
>> PR Comment:
>> https://git.openjdk.org/jdk/pull/20497#issuecomment-2276354395
>>
>


Re: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag

2024-08-08 Thread Thomas Stüfe
I like it short, succinct and greppable.

NMTCategory? NMTCat?

That said, I can live with the current name and dread the cv backporting
and support implications of this change.

On Thu 8. Aug 2024 at 10:50, Gerard Ziemski  wrote:

> On Thu, 8 Aug 2024 04:47:18 GMT, David Holmes  wrote:
>
> > If you called it `MemTypeFlag` - which to me still suggests
> mutually-exclusive values - then you would not need to rename all the
> variables with "flag" in their name later.
>
> Hmm, not a bad idea. Are there any other opinions?
>
> -
>
> PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2276354395
>


Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information

2024-02-02 Thread Thomas Stüfe
Hi Kevin,

Having a clear command spec to read and argue about would be helpful,
especially since this is not a simple commnd but a whole command group.

Exposing such a low level interface (this is supposed to go into product
VMs, right?) may carry some risks that could arguably fall unter CSR.

That said, what use case do you envisage? To me, these commands are usually
only useful in the debugger, when I deal with numerical adresses.

Cheers, Thomas

p.s. please note that many folks are at fosdem right now.

On Fri 2. Feb 2024 at 19:35, Kevin Walls  wrote:

> Introduce the jcmd "VM.debug" to implement access to a useful set of the
> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...".
>
> Not recommended for live production use.  Calling these "debug" utilities,
> and not including them in the jcmd help output, is to remind us they are
> not general customer-facing tools.
>
> -
>
> Commit messages:
>  - Tidy up the safety checks
>  - Whitebox not required, just check option properties.
>  - unnecessary parameter to find
>  - Test update. Recognise ZGC oops differently.  Formatting.
>  - typo
>  - Separate is_good_oop... method to avoid changing existing asserts.
>  - More oop safety
>  - 8318026: jcmd should provide access to low-level JVM debug information
>
> Changes: https://git.openjdk.org/jdk/pull/17655/files
>  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17655&range=00
>   Issue: https://bugs.openjdk.org/browse/JDK-8318026
>   Stats: 371 lines in 5 files changed: 369 ins; 0 del; 2 mod
>   Patch: https://git.openjdk.org/jdk/pull/17655.diff
>   Fetch: git fetch https://git.openjdk.org/jdk.git
> pull/17655/head:pull/17655
>
> PR: https://git.openjdk.org/jdk/pull/17655
>


Re: Clarifying jmethodID Usage and Potential JVM Crashes

2023-06-01 Thread Thomas Stüfe
On Thu, Jun 1, 2023 at 9:17 AM Jaroslav Bachorík <
jaroslav.bacho...@datadoghq.com> wrote:

> Hi David,
>
> On Thu, Jun 1, 2023 at 3:56 AM David Holmes 
> wrote:
>
>> Hi Jaroslav,
>>
>> On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote:
>> > Dear Team,
>> >
>> > I've been investigating the unusual JVM crashes occurring in JVMTI
>> calls
>> > on a J9 JVM. During my investigation, I scrutinized the `jmethodID`
>> > definition closely, available here: [jmethodID
>> > definition](
>> https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID
>> <
>> https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID
>> >).
>> >
>> > To paraphrase, the definition suggests that `jmethodID` identifies a
>> > Java method, initializer, or constructor. These identifiers, once
>> > returned by JVM TI functions and events, can be safely stored. However,
>> > when the class is unloaded, they become invalid, rendering them
>> > unsuitable for use.
>> >
>> > My interpretation is that the JVMTI user should verify the validity of
>> a
>> > `jmethodID` value before using it to prevent potential crashes. Would
>> > you agree with this interpretation?
>>
>> Not quite - as you note you can't verify the jmethodID validity. What
>> the user needs to do, in line with what Dan was saying, is ensure that
>> they keep track of the classes to which the methods belong and keep them
>> alive if necessary. Now that may be easier said than done, but that is
>> the gist of it. This comes from the JNI spec:
>>
>> "A field or method ID does not prevent the VM from unloading the class
>> from which the ID has been derived. After the class is unloaded, the
>> method or field ID becomes invalid and may not be passed to any function
>> taking such an ID. The native code, therefore, must make sure to:
>>
>>  keep a live reference to the underlying class, or
>>  recompute the method or field ID
>>
>> if it intends to use a method or field ID for an extended period of time."
>>
>> > This sounds like a sensible requirement, but its practical application
>> > remains unclear. As far as I know, methods can be unloaded concurrently
>> > to the native code executing JVMTI functions. This introduces a
>> > potential race condition where the JVM unloads the methods during the
>> > check->use flow, making it only a partial solution. To complicate
>> > matters further, no method exists to confirm whether a `jmethodID` is
>> valid.
>> >
>> > Theoretically, we could monitor the `CompiledMethodUnload` event to
>> > track the validity state, creating a constantly expanding set of
>> > unloaded `jmethodID` values or a bloom filter, if one does not care
>> > about few potential false positives. This strategy, however, doesn't
>> > address the potential race condition, and it could even exacerbate it
>> > due to possible event delays. This delay might mistakenly validate a
>> > `jmethodID` value that has already been unloaded, but for which the
>> > event hasn't been delivered yet.
>> >
>> > Honestly, I don't see a way to use `jmethodID` safely unless the code
>> > using it suspends the entire JVM and doesn't resume until it's finished
>> > with that `jmethodID`. Any other approach might lead to JVM crashes, as
>> > we've observed with J9.
>> >
>> > Lastly, it's noteworthy that Hotspot takes meticulous measures to
>> ensure
>> > that using jmethodIDs for unloaded methods doesn't crash the JVM and
>> > even provides useful information. This observation has led me to
>> > question whether the documentation aligns with the Hotspot
>> > implementation, especially given that following closely the
>> > documentation appears to increase the risk associated with the use of
>> > `jmethodID` values.
>>
>> There have been folk who wanted to make this area more user-friendly but
>> that shouldn't be mistaken for moving towards a world where jmethodIDs
>> are always safe to use.
>>
>
> Yes, I see your point. Unfortunately, this confirms my worries that using
> AsyncGetCallTrace (ASGCT) on a system strictly adhering to the JVMTI spec
> of jmethoID is not really possible without risking random and quite
> frequent crashes on systems with concurrent class unloading enabled.
> FTR, ASGCT will record the stack trace as a list of frames, each one
> containing the corresponding jmethodID value. Considering that the most
> common usage of ASGCT is in a signal handler it makes it impossible to use
> JVMTI calls to resolve the holder class and create a strong reference to
> prevent it from being unloaded.
> And even if this would be possible we would need to figure out when to
> release the class reference when it is no more needed - and it is not
> really clear how we could do that reliably, leaving us with the option of
> holding the class references indefinitely or risking crashing JVM.
>
> I want to emphasize that not being able to resolve additional details for
> a jmethodID pointing to a method of an unloaded class is not an issue, as
> long as the JVMTI

Re: [External] : Re: Disallowing the dynamic loading of agents by default

2023-03-17 Thread Thomas Stüfe
Thank you for the clarification.

Oddly enough, -XX:-EnableDynamicAgentLoading seems to be broken. Tried head
(fastdebug, release) and JDK17, even with this switch my sample library
loads just fine:

```
thomas@starfish$ ./images/jdk/bin/java -XX:-EnableDynamicAgentLoading
-XX:+PrintFlagsFinal  -cp $REPROS_JAR de.stuefe.repros.Simple


[Global flags]

...
 bool EnableDynamicAgentLoading= false
{product} {command line}
...

OnAttach! Loading JVMTI sample agent
```

Investigation shows that there seems to be a bug in attachListener.cpp
where we compare AttachOperation::name for "load", but it contains "jcmd":

```
Thread 22 "Attach Listener" hit Breakpoint 1, attach_listener_thread_entry
(thread=0x7fff94000fd0, __the_thread__=0x7fff94000fd0) at
/shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/attachListener.cpp:404
404 } else if (!EnableDynamicAgentLoading && strcmp(op->name(),
"load") == 0) {
(gdb) p op
$1 = (AttachOperation *) 0x7fff7401b640
(gdb) p *op
$2 = {> = {}, _vptr.AttachOperation =
0x77b61210 , _name = "jcmd\000",
'\361' , , _arg = {
"JVMTI.agent_load /shared/projects/jvmti-sample/sample.so\000", '\361'
..., "\000", '\361' ..., "\000",
'\361' ...}}
(gdb) p op->name()
$3 = 0x7fff7401b648 "jcmd"
```

This was on Linux x64.

So if people have been using -XX:-EnableDynamicAgentLoading to check their
code, this may not have worked as intended.

Cheers, Thomas



On Fri, Mar 17, 2023 at 2:42 PM Ron Pressler 
wrote:

>
>
> On 17 Mar 2023, at 13:33, Thomas Stüfe  wrote:
>
> Hi Ron,
>
> Will this affect attaching via jcmd?
>
>
> The Attach mechanism will not be disabled by default, just the ability to
> load agents via the Attach mechanism.
> So the only jcmd command that will be affected is JVMTI.agent_load.
>
> To see the effect of the change today, launch java with
> -XX:-EnableDynamicAgentLoading, which is
> to become the new default.
>
> — Ron
>
>


Re: Disallowing the dynamic loading of agents by default

2023-03-17 Thread Thomas Stüfe
Hi Ron,

Will this affect attaching via jcmd?

Thanks, Thomas

On Thu, Mar 16, 2023 at 7:48 PM Ron Pressler 
wrote:

> Hi.
>
> In JDK 21 we intend to disallow the dynamic loading of agents by default.
> This
> will affect tools that use the Attach API to load an agent into a JVM some
> time
> after the JVM has started [1]. There is no change to any of the mechanisms
> that
> load an agent at JVM startup (-javaagent/-agentlib on the command line or
> the
> Launcher-Agent-Class attribute in the main JAR's manifest).
>
> This change in default behavior was proposed in 2017 as part of JEP 261
> [2][3].
> At that time the consensus was to switch to this default not in JDK 9 but
> in a
> later release to give tool maintainers sufficient time to inform their
> users.
> To allow the dynamic loading of agents, users will need to specify
> -XX:+EnableDynamicAgentLoading on the command line.
>
> I'll post a draft JEP for review shortly.
>
> -- Ron
>
> [1]:
> https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
> [2]: https://openjdk.org/jeps/261
> [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html


Re: Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)

2022-12-05 Thread Thomas Stüfe
Thank you for the positive encouragement, Roman :-)

Cheers, Thomas

On Mon, Dec 5, 2022 at 12:03 PM Kennke, Roman  wrote:

> Hi Thomas,
>
> I very much like the idea and also your proposals how to do it. Insights
> in JDK's native memory usage is sorely lacking and would be very useful!
> I don't have all that much to add about the details beyond what you
> already covered, though :-)
>
> Cheers,
> Roman
>
>
> > Are there any opinions about whether or not to extend NMT across the JDK?
> >
> > This blocks https://bugs.openjdk.org/browse/JDK-8296360
> > <https://bugs.openjdk.org/browse/JDK-8296360>, and I had a PR prepared
> > as https://github.com/openjdk/jdk/pull/10988
> > <https://github.com/openjdk/jdk/pull/10988>. Originally I was hoping to
> > get this into JDK 20, but I don't think that is realistic anymore. I am
> > fine with postponing my work in favor of a baseline discussion, but so
> > far there is very little discussion about this topic.
> >
> > How should I proceed?
> >
> > Thanks, Thomas
> >
> >
> >
> > On Wed, Nov 9, 2022 at 8:12 AM Thomas Stüfe  > <mailto:thomas.stu...@gmail.com>> wrote:
> >
> > Hi Alan,
> >
> > (replaced hotspot-runtime-dev with hotspot-dev, since its more of a
> > general topic)
> >
> > thank you for your time!
> >
> > I am very happy to talk this through. I think native memory
> > observability in the JDK (and customer code!) is sorely lacking.
> > Witness the countless "where did my native memory go" blog articles.
> > At SAP we have been struggling with this topic for a long time and
> > have come up with a mixture of solutions. The aforementioned tracker
> > was one, which extended our version of NMT across the JDK. Our
> > SapMachine MallocTracer, which allows us to trace uninstrumented
> > customer code, another. We even experimented with exchanging the
> > allocator (using jemalloc) to gain insights. But that is a whole
> > different topic with deep logistical implications, I don't want to
> > touch it here. Exchanging the allocator does not help to observe
> > virtual memory or the brk segment, of course.
> >
> > And to make the picture complete, another insight we currently lack
> > is the implicit allocator overhead, which can be very significant
> > and is hidden by the libc. We also have observability for that in
> > the SapMachine, and I miss it in OpenJDK.
> >
> > As you noticed, my original intent was just to instrument Zlib and
> > possibly improve tracking for DBBs. Although, thinking beyond that,
> > another attractive instrumentation target would be mapped NIO
> > buffers at least.
> >
> > So I think native memory observability is important. Arguably we
> > could even extend observability to cover other OS resources, e.g.
> > file handles. If we shift code around, to java/Panama: data that
> > move the java heap does not need to be tracked, but other memory
> > will always come from one of the basic system APIs, regardless of
> > who allocates it and where in the stack allocation happens. Be it
> > native JDK code, Panama, or even customer JNI code.
> >
> > If we agree on the importance of native memory observability, then I
> > believe NMT is the right tool for it. It is a good tool. The
> > machinery is already there. It covers both C-heap and virtual memory
> > APIs, as well as thread stacks, and could easily be extended to
> > cover sbrk if needed. And I assume that whatever shape OpenJDK takes
> > on in the future, there always will be a libjvm.so at its core, so
> > we will always have it. But even if not, NMT could be separated from
> > libjvm.so quite easily, since it has no deep ties with the JVM.
> >
> > About coupling JVM with outside code: We don't have to directly link
> > against libjvm.so. We can keep things loose if the intent is to be
> > runnable without a JVM, or be JVM-version-agnostic. That could take
> > the form of a function-pointer interface like JVMTI. Or outside code
> > could dynamically dlsym the JVM allocation hooks. In any case
> > gracefully falling back to system allocation routines when necessary.
> >
> > And I agree, polluting the NMT tag space with outside meaning is
> > ugly. I only did it because I planned to go no further than
> > instrumenting Zlib and possibly DBBs. But if we take this furt

Re: Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)

2022-12-01 Thread Thomas Stüfe
Are there any opinions about whether or not to extend NMT across the JDK?

This blocks https://bugs.openjdk.org/browse/JDK-8296360, and I had a PR
prepared as https://github.com/openjdk/jdk/pull/10988. Originally I was
hoping to get this into JDK 20, but I don't think that is realistic
anymore. I am fine with postponing my work in favor of a baseline
discussion, but so far there is very little discussion about this topic.

How should I proceed?

Thanks, Thomas



On Wed, Nov 9, 2022 at 8:12 AM Thomas Stüfe  wrote:

> Hi Alan,
>
> (replaced hotspot-runtime-dev with hotspot-dev, since its more of a
> general topic)
>
> thank you for your time!
>
> I am very happy to talk this through. I think native memory observability
> in the JDK (and customer code!) is sorely lacking. Witness the countless
> "where did my native memory go" blog articles. At SAP we have been
> struggling with this topic for a long time and have come up with a mixture
> of solutions. The aforementioned tracker was one, which extended our
> version of NMT across the JDK. Our SapMachine MallocTracer, which allows us
> to trace uninstrumented customer code, another. We even experimented with
> exchanging the allocator (using jemalloc) to gain insights. But that is a
> whole different topic with deep logistical implications, I don't want to
> touch it here. Exchanging the allocator does not help to observe virtual
> memory or the brk segment, of course.
>
> And to make the picture complete, another insight we currently lack is the
> implicit allocator overhead, which can be very significant and is hidden by
> the libc. We also have observability for that in the SapMachine, and I miss
> it in OpenJDK.
>
> As you noticed, my original intent was just to instrument Zlib and
> possibly improve tracking for DBBs. Although, thinking beyond that, another
> attractive instrumentation target would be mapped NIO buffers at least.
>
> So I think native memory observability is important. Arguably we could
> even extend observability to cover other OS resources, e.g. file handles.
> If we shift code around, to java/Panama: data that move the java heap does
> not need to be tracked, but other memory will always come from one of the
> basic system APIs, regardless of who allocates it and where in the stack
> allocation happens. Be it native JDK code, Panama, or even customer JNI
> code.
>
> If we agree on the importance of native memory observability, then I
> believe NMT is the right tool for it. It is a good tool. The machinery is
> already there. It covers both C-heap and virtual memory APIs, as well as
> thread stacks, and could easily be extended to cover sbrk if needed. And I
> assume that whatever shape OpenJDK takes on in the future, there always
> will be a libjvm.so at its core, so we will always have it. But even if
> not, NMT could be separated from libjvm.so quite easily, since it has no
> deep ties with the JVM.
>
> About coupling JVM with outside code: We don't have to directly link
> against libjvm.so. We can keep things loose if the intent is to be runnable
> without a JVM, or be JVM-version-agnostic. That could take the form of a
> function-pointer interface like JVMTI. Or outside code could dynamically
> dlsym the JVM allocation hooks. In any case gracefully falling back to
> system allocation routines when necessary.
>
> And I agree, polluting the NMT tag space with outside meaning is ugly. I
> only did it because I planned to go no further than instrumenting Zlib and
> possibly DBBs. But if we take this further, my preferred solution would be
> a reserved tag range or -ranges for outside use, whose inner meaning would
> be opaque to the JVM. Kind of like SIGRTMIN+SIGRTMAX. Then, outside code
> could register tags and their meta information with the JVM, or we find a
> different way to convey the tag meaning to NMT (config files, or
> callbacks). That could even be opened up for customer use.
>
> This also touches on another question, that of NMT tag space. NMT tags are
> very useful since they allow cheap tracking without capturing call stacks.
> However, tags are underused and show growing pains since they are too
> one-dimensional and restrictive. We had competing interests in the past
> about tag granularity. It is all over the place. We have coarse-grained
> tags like "mtThread", and very fine-grained ones like "mtObjectMonitor".
> There are several ways we could improve, e.g., by making them combinable
> like UL does, or allowing for a hierarchy of them - either a hard-wired
> limited one like "domain"+"tag", or an unlimited tree-like one. Technically
> interesting since whatever the new encoding is, they still must fit into a
> malloc header. I opened htt

Re: Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)

2022-11-30 Thread Thomas Stüfe
Hi Carter, Stefan,

thank you, I think it is good to have this discussion, it is important.

Side note, the discussion steered away from my original question - whether
to instrument the JDK with NMT. I still would love to discuss that, too.

About opening NMT up for user consumption, that is of course possible. But
I think the bigger question is which data we want to open for user
consumption, and at what granularity. And what contracts do we enter when
we do this.

NMT was originally a hotspot-dev-centric tool. It has a lot of
idiosyncrasies. Interpreting the results needs detailed knowledge about
hotspot memory management. Some examples:

- its reports are not consistent across JDK versions, not even across
different patch levels of the same JDK. So you cannot compare results, say,
between JDK11 and 17.
- before a certain version X (I believe JDK 11), the full thread stacks
were accounted for instead of just the in-use portion of the thread stacks.
I remember reading blogs about how thread stack consumption went down when
all that changed was NMT reporting.
- The memory sizes it shows may not have much to do with real RSS. It
systematically underreports some things, since it omits libc overhead and
retention, usage by system- and JNI libraries. But it also overreports
things since it mostly (not always) accounts in terms of "committed"
memory, which usually means mmap()ed or malloc()ed memory. But that is just
committed, not physical memory, it does not translate to RSS usage
directly. That memory may never be touched. OTOH NMT probes thread stacks
with mincore(), so for that section, "committed" really means "physical".

I am fine with opening up NMT via JFR. But does this mean we have to be
more consistent? Do we have to care about downward compatibility of NMT
reports? Are we then still free to redesign the tag system (see my original
mail) or will this tie us down with the current NMT tag system forever? As
a negative example, JFR exposes metaspace allocator details (chunk
statistics) which have been broken ever since JDK 16 when the underlying
implementation changed.

Therefore I am curious about what end users use NMT really for.

@Carter: can you give us examples of which NMT sections had been
particularly useful to you? Maybe we can define a subset to expose instead
of exposing all tags. E.g. I can see thread stack usage being very useful,
but things like ObjectMonitor footprint not so much.

Cheers, Thomas




On Wed, Nov 30, 2022 at 9:45 PM Carter Kozak  wrote:

> This looks fantastic, thank you so much! I can confirm that the proposed
> design would solve my use-case.
>
> I'd enjoy discussing the NMT event  contract somewhere more specific
> to the implementation, but I don't want to muddle this thread with
> implementation details.
>
> Carter Kozak
>
> On Wed, Nov 30, 2022, at 03:37, Stefan Johansson wrote:
>
> Hi Carter,
>
> Your mail made me pick up an old item from my wishlist: to have native
> memory tracking information available in JFR recordings. When we, in GC,
> do improvements to decrease the native memory overhead of our
> algorithms, NMT is a very good tool to track the progress. We have
> scripts that sound very similar to what you describe and more than once
> I've been thinking about adding this information into JFR. But it has
> not been a priority and the greater value has been unclear.
>
> Hearing that others might also benefit from such a change I took a
> discussion with the JFR team on how to best proceed with this. I have
> created a branch for this and will probably create a PR for it shortly,
> but I thought I would drop it here first:
> https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt
>
> The change adds two new JFR events: one for the total usage and one for
> the usage of each memory type. These are sent only if Native Memory
> Tracking is turned on, and they are enabled in the default JFR profile
> with an interval of 1s. This might change during reviewing but it was a
> good starting point.
>
> With this you will be able to use JFR streaming to access the events
> from within your running process. I hope this will help your use cases
> and please let us know if you have any comments or suggestions.
>
> Thanks,
> Stefan
>
>
>