Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v6]

2024-09-10 Thread Thomas Stuefe
On Fri, 6 Sep 2024 16:20:52 GMT, Coleen Phillimore  wrote:

>> This change stores InstanceKlass for interface and abstract classes in the 
>> non-class metaspace, since class metaspace will have limits on number of 
>> classes that can be represented when Lilliput changes go in.  Classes that 
>> have no instances created for them don't require compressed class pointers.  
>> The generated LambdaForm classes are also AllStatic, and changing them to 
>> abstract moves them to non-class metaspace too.  It's not technically great 
>> to make them abstract and not final but you can't have both.  Java classfile 
>> access flags have no way of specifying something like AllStatic.
>> 
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Replace Metaspace::is_compressed_klass_ptr with 
> CompressedKlassPointers::is_in_encoding_range.

This looks good to me.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2292015058


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v5]

2024-09-06 Thread Thomas Stuefe
On Thu, 29 Aug 2024 12:08:37 GMT, Coleen Phillimore  wrote:

>> This change stores InstanceKlass for interface and abstract classes in the 
>> non-class metaspace, since class metaspace will have limits on number of 
>> classes that can be represented when Lilliput changes go in.  Classes that 
>> have no instances created for them don't require compressed class pointers.  
>> The generated LambdaForm classes are also AllStatic, and changing them to 
>> abstract moves them to non-class metaspace too.  It's not technically great 
>> to make them abstract and not final but you can't have both.  Java classfile 
>> access flags have no way of specifying something like AllStatic.
>> 
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add function in Metaspace to tell you if Klass pointer is in compressible 
> space.

I still worry about unforeseen implications of not every Klass having an 
nKlass. It will have some implications on some of my Lilliput work. I wish we 
could have done this earlier. 

But in any case, this seems to be a reasonable way forward. One remark inline, 
otherwise this looks mostly good to me.

src/hotspot/share/memory/metaspace.hpp line 165:

> 163: return using_class_space() && (is_in_class_space(k) || 
> is_in_shared_metaspace(k));
> 164:   }
> 165: 

I propose to drop this, and instead add a utility function to 
`CompressedKlassPointers` like this:


// Returns true if p falls into the narrow Klass encoding range
inline bool CompressedKlassPointers::is_in_encoding_range(const void* p) {
  return _base != nullptr && p >= _base && p < (_base + _range);
}

(Probably  the `_base != nullptr` could even be left out, since `_range==0` and 
`_base==nullptr` for -UseCompressedClassPointers)

And then use that function in `jfrTraceIdKlass.cpp`. That file needs to use 
`CompressedKlassPointers` anyway because it needs to encode the Klass*.

This avoids having to rely on the exact composition of the memory regions 
inside the encoding range. What counts is whether the Klass pointer points into 
the narrow Klass encoding range.

Essentially, with CDS, memory looks like this:


encoding   encoding 
 
base   end
|---|
|CDS---| |class space---|

-

PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2285987065
PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1746943501


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace

2024-09-03 Thread Thomas Stuefe
On Tue, 27 Aug 2024 15:38:24 GMT, Thomas Stuefe  wrote:

>>> I don't think the costs for two address comparisons matter, not with the 
>>> comparatively few deallocations that happen (few hundreds or few thousand). 
>>> If deallocate is hot, we are using metaspace wrong.
>> 
>> MethodData does a lot of deallocations from metaspace because it's allocated 
>> racily.  It might be using Metaspace wrong.
>
>> > I don't think the costs for two address comparisons matter, not with the 
>> > comparatively few deallocations that happen (few hundreds or few 
>> > thousand). If deallocate is hot, we are using metaspace wrong.
>> 
>> MethodData does a lot of deallocations from metaspace because it's allocated 
>> racily. It might be using Metaspace wrong.
> 
> I think that should be okay. This should still be an exception. I have never 
> seen that many deallocations happening in customer cases.

> @tstuefe Do you have more comments on this PR?

Sorry, I was swamped the past days. I'll take a look tomorrow.

-

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2326867932


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4]

2024-08-29 Thread Thomas Stuefe
On Thu, 29 Aug 2024 11:37:19 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdKlassQueue.cpp
>>  line 79:
>> 
>>> 77: 
>>> 78: static bool can_compress_element(const Klass* klass) {
>>> 79:   return Metaspace::is_in_class_space(klass) &&
>> 
>> Suggestion:
>> 
>>   return (Metaspace::is_in_class_space(klass) || 
>> Metaspace::is_in_shared_metaspace(klass)) &&
>
> Is this right?  If UseCompressedClassPointers is off, then the shared 
> metaspace isn't in compressed space?

If UseCompressedClassPointers is off, we don't have a compressed class space. 
If its on, Klass from CDS and from class space are compressable. With your 
patch, interfaces will live in normal metaspace, not int class space, so those 
are excluded now.

TBH, I am not really sure what this code here does, but I assume it tries to 
reduce the size of a JFR recording by using a compressed identifier for X if X 
can be expressed by such. Maybe a JFR person should look at this.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736193248


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4]

2024-08-28 Thread Thomas Stuefe
On Wed, 28 Aug 2024 15:42:57 GMT, Coleen Phillimore  wrote:

>> This change stores InstanceKlass for interface and abstract classes in the 
>> non-class metaspace, since class metaspace will have limits on number of 
>> classes that can be represented when Lilliput changes go in.  Classes that 
>> have no instances created for them don't require compressed class pointers.  
>> The generated LambdaForm classes are also AllStatic, and changing them to 
>> abstract moves them to non-class metaspace too.  It's not technically great 
>> to make them abstract and not final but you can't have both.  Java classfile 
>> access flags have no way of specifying something like AllStatic.
>> 
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - Merge branch 'anon' of github.com:coleenp/jdk into anon
>  - Fix copyright

src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdKlassQueue.cpp
 line 79:

> 77: 
> 78: static bool can_compress_element(const Klass* klass) {
> 79:   return Metaspace::is_in_class_space(klass) &&

Suggestion:

  return (Metaspace::is_in_class_space(klass) || 
Metaspace::is_in_shared_metaspace(klass)) &&

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1735585478


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace

2024-08-27 Thread Thomas Stuefe
On Mon, 26 Aug 2024 13:57:16 GMT, Coleen Phillimore  wrote:

> > I don't think the costs for two address comparisons matter, not with the 
> > comparatively few deallocations that happen (few hundreds or few thousand). 
> > If deallocate is hot, we are using metaspace wrong.
> 
> MethodData does a lot of deallocations from metaspace because it's allocated 
> racily. It might be using Metaspace wrong.

I think that should be okay. This should still be an exception. I have never 
seen that many deallocations happening in customer cases.

-

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2312905514


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v2]

2024-08-24 Thread Thomas Stuefe
On Fri, 23 Aug 2024 20:46:39 GMT, Coleen Phillimore  wrote:

>> This change stores InstanceKlass for interface and abstract classes in the 
>> non-class metaspace, since class metaspace will have limits on number of 
>> classes that can be represented when Lilliput changes go in.  Classes that 
>> have no instances created for them don't require compressed class pointers.  
>> The generated LambdaForm classes are also AllStatic, and changing them to 
>> abstract moves them to non-class metaspace too.  It's not technically great 
>> to make them abstract and not final but you can't have both.  Java classfile 
>> access flags have no way of specifying something like AllStatic.
>> 
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass 
> MetaspaceObj::Type.

> I renamed this is_in_class_space() with the lower case 'c'. It's still 
> directing metaspace or indicating where the object was allocated. Your name 
> is a little better but I think not enough until we want to expand the things 
> we want allocated in the class space. As we talked about, with Tiny Class 
> Pointers, class space will have different things in it (not that these new 
> things need a compressed pointers). But I think we're better off having less 
> things in the space where their pointers can be compressed since this space 
> is constrained.


How about "needs_class_space" then?

-

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2308353198


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace

2024-08-24 Thread Thomas Stuefe
On Fri, 23 Aug 2024 19:26:46 GMT, Coleen Phillimore  wrote:

> Yes, is_in_klass_space was just to direct where to deallocate the metaspace 
> pointer. In your patch isn't the contains metaspace call still very slow? Or 
> I suppose for class space, it's not because it's a fixed space. But it's not 
> an inlined call at all because I had to search in cpp files for the range 
> check.
> 
> * const bool is_class = Metaspace::contains_in_class_space(ptr);
> 
> I sort of think it might be better for the outside runtime code to control 
> this and the metaspace call assert if its wrong.

No, I think my way is better and it will be needed anyway for TinyCP/Lilliput. 
We only need to do two address comparisons, that should be simple and fast. 

I opened a PR to separate the change, and in that PR I also inline the check. 

https://github.com/openjdk/jdk/pull/20701

I don't think the costs for two address comparisons matter, not with the 
comparatively few deallocations that happen (few hundreds or few thousand). If 
deallocate is hot, we are using metaspace wrong.

-

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2308352940


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace

2024-08-23 Thread Thomas Stuefe
On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore  wrote:

> This change stores InstanceKlass for interface and abstract classes in the 
> non-class metaspace, since class metaspace will have limits on number of 
> classes that can be represented when Lilliput changes go in.  Classes that 
> have no instances created for them don't require compressed class pointers.  
> The generated LambdaForm classes are also AllStatic, and changing them to 
> abstract moves them to non-class metaspace too.  It's not technically great 
> to make them abstract and not final but you can't have both.  Java classfile 
> access flags have no way of specifying something like AllStatic.
> 
> Tested with tier1-8.

Hi Coleen,

IIUC, the new "is_in_klass_space" function that now is present in all Metadata 
children only exists because of the template function in MetadataFactory, 
right? Just for the purpose of deallocation?

If so, see this proposed addition to your patch: 
https://gist.github.com/tstuefe/5111c735b12f6d9c3c1d32699d0820f6

This would make Metaspace::deallocate smarter - Metaspace knows whether a given 
pointer is in class space or not, it can do automatically the right thing. 
There should be no need to tell it how to deallocate that storage. (If you are 
worried, in debug builds there are also sanity checks).

If you do this, I think you could remove all variants of "is_in_klass_space" 
apart from the one in Klass.

src/hotspot/share/memory/allocation.hpp line 319:

> 317:   f(SharedClassPathEntry) \
> 318:   f(RecordComponent) \
> 319:   f(AbstractClass)

This is a minor nit: I assume this new constant is just there to steer 
allocation down in metaspace away from Class space? This breaks a bit with 
established pattern, because the type `Klass::type()` returns is still 
"ClassType", so this new constant never really appears anywhere. Its only point 
is "its not classtype". You could probably hand down any other constant to 
Metaspace::allocate, as long as its not `ClassType`.

What we should eventually do, but in a follow up RFE:  modify 
`Metaspace::allocate` to replace the `MetaspaceObj::Type` parameter with a 
`MetadataType mdType` parameter. Or a plain "allocate_in_classspace_please" 
boolean parameter. Because Metaspace::allocate does not really need to know the 
type of the metadata object. It just needs to know if the caller insists on 
having the data in class space.

 But lets do this in a follow-up.

src/hotspot/share/oops/instanceKlass.cpp line 456:

> 454: 
> 455:   InstanceKlass* ik;
> 456:   MetaspaceObj::Type type = (parser.is_interface() || 
> parser.is_abstract()) ? MetaspaceObj::AbstractClassType : 
> MetaspaceObj::ClassType;

small nit, const or constexpr?

src/hotspot/share/oops/klass.hpp line 205:

> 203: 
> 204:   void* operator new(size_t size, ClassLoaderData* loader_data, size_t 
> word_size, TRAPS) throw();
> 205: 

Oh ArrayKlass never used this? Its good to move it to InstanceKlass.

src/hotspot/share/oops/klass.hpp line 214:

> 212:   virtual bool is_klass() const { return true; }
> 213: 
> 214:   bool is_in_klass_space() const { return !is_interface() && 
> !is_abstract(); }

This name is misleading. As a caller, I expect a function with this name to 
make a range check of Klass* to be inside class space range. This is more of a 
"should be".

(We also write class space with `c` throughout hotspot, its weird to have it 
with `k` now)

How about, instead: "needs_narrow_klass_id" or "must_be_narrow_encodable" or 
similar? That clearly says what you want, that this class needs to be encodable 
with a narrow id for whatever is your reason. This leaves room for future 
changes (e.g. a possible future where we need narrow klass ids for other 
reasons than to make heap objects smaller, or where there is no class space 
anymore).

-

PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2256433389
PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728452223
PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728421710
PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728421197
PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728417646


Re: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace

2024-08-22 Thread Thomas Stuefe
On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore  wrote:

> This change stores InstanceKlass for interface and abstract classes in the 
> non-class metaspace, since class metaspace will have limits on number of 
> classes that can be represented when Lilliput changes go in.  Classes that 
> have no instances created for them don't require compressed class pointers.  
> The generated LambdaForm classes are also AllStatic, and changing them to 
> abstract moves them to non-class metaspace too.  It's not technically great 
> to make them abstract and not final but you can't have both.  Java classfile 
> access flags have no way of specifying something like AllStatic.
> 
> Tested with tier1-8.

I am surprised that this patch is so small. I would have assumed a lot of code 
exists that unconditionally assumes we always can encode decode 
Klass*<->narrowKlass.

I looked through the typical cases (eg Klass validation) and all of them seem 
to be okay. I will keep looking.

-

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2304169936


Re: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v2]

2024-07-17 Thread Thomas Stuefe
On Wed, 17 Jul 2024 14:02:01 GMT, Thomas Stuefe  wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Updating copyright headers
>
> src/hotspot/share/code/codeCache.cpp line 1800:
> 
>> 1798: jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map",
>> 1799:  os::current_process_id());
>> 1800:   }
> 
> Arguably one could just do 
> 
> constexpr char[] filename_default = "/tmp/perf-%p.map";
> Arguments::copy_expand_pid(filename == nullptr ? filename_default : filename, 
> .);

This pattern can be followed in all cases where we have default filenames

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681149193


Re: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v2]

2024-07-17 Thread Thomas Stuefe
On Wed, 17 Jul 2024 14:02:31 GMT, Sonia Zaldana Calles  
wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) 
>> enabling jcmd diagnostic commands that issue an output file to accept the 
>> `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality 
>> (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly 
>> to how it’s done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename (Optional) Name of the file to which the flight recording 
>> data is
>>written when the recording is stopped. If no filename is 
>> given, a
>>filename is generated from the PID and the current date 
>> and is
>>placed in the directory where the process was started. The
>>filename may also be a directory in which case, the 
>> filename is
>>generated from the PID and the current date in the 
>> specified
>>directory. (STRING, no default value)
>> 
>>Note: If a filename is given, '%p' in the filename will be
>>replaced by the PID, and '%t' will be replaced by the 
>> time in
>>'_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), 
>> sources for the jcmd manpage remain in Oracle internal repos so this PR 
>> can’t address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I 
>> might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Updating copyright headers

First cursory review. That is a useful feature

- In all cases: please, in case of an error, don't THROW, don't do `warning`. 
Instead, just print to the `output()` of the DCmd. You want an error to appear 
to the user of the dcmd - so, to stdout or stderr of the jcmd process issuing 
the command. Not an exception in the target JVM process, nor a warning in the 
target JVM stderr stream

- Can you give us a variant of `Arguments::copy_expand_pid` that receives a 
zero-terminated const char* as input so that we can avoid having to pass in the 
length of the input each time?

- when passing in output buffers to functions, try to use sizeof(buffer) 
instead of repeating the buffer size. Then, one can change the size of the 
buffer array without having to modify using calls (but aware: pitfall, 
sizeof(char[]) vs sizeof(char*))

src/hotspot/share/code/codeCache.cpp line 1796:

> 1794:   // Perf expects to find the map file at /tmp/perf-.map
> 1795:   // if the file name is not specified.
> 1796:   char fname[JVM_MAXPATHLEN];

Good to see this gone, the old code implicitly relied on: max pid len 
-2147483647 = 11 chars, + length of "/tmp/perf-.map" not overflowing 32, which 
cuts a bit close to the bone.

src/hotspot/share/code/codeCache.cpp line 1800:

> 1798: jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map",
> 1799:  os::current_process_id());
> 1800:   }

Arguably one could just do 

constexpr char[] filename_default = "/tmp/perf-%p.map";
Arguments::copy_expand_pid(filename == nullptr ? filename_default : filename, 
.);

src/hotspot/share/services/diagnosticCommand.cpp line 525:

> 523: stringStream msg;
> 524: msg.print("Invalid file path name specified: %s", _filename.value());
> 525: THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), 
> msg.base());

Why throw? Why not just print an error message to the output() stream and 
return?

src/hotspot/share/services/diagnosticCommand.cpp line 1059:

> 1057:   fileh = java_lang_String::create_from_str(fname, CHECK);
> 1058: } else {
> 1059:   warning("Invalid file path name specified, fall back to default 
> name");

`warning` prints a warning to the stdout of the JVM process. You don't want 
that; you want a warning to the issuer of the dcmd, which is another - possibly 
even remote - process. Write errors to `output()`, instead.

src/hotspot/share/services/diagnosticCommand.cpp line 1138:

> 1136: stringStream msg;
> 1137: msg.print("Invalid file path name specified: %s", 
> _filepath.value());
> 1138: THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), 
> msg.base());

write to output() and return instead of throwing

-

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2183023385
PR Revi

Re: RFR: 8335882: platform/cgroup/TestSystemSettings.java fails on Alpine Linux

2024-07-08 Thread Thomas Stuefe
On Mon, 8 Jul 2024 14:19:21 GMT, Severin Gehwolf  wrote:

> Please review this simple test fix to exclude the test from being run on an 
> Alpine Linux system. Apparently default Alpine Linux systems don't have 
> cgroups set up by default the way other distros do. More info on the bug. I 
> propose to not run the test on musl systems.

Seems fine.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20076#pullrequestreview-2163517084


Re: RFR: 8333308: javap --system handling doesn't work on internal class names

2024-06-26 Thread Thomas Stuefe
On Tue, 25 Jun 2024 13:59:35 GMT, Sonia Zaldana Calles  
wrote:

> Hi all, 
> 
> This PR addresses [JDK-808](https://bugs.openjdk.org/browse/JDK-808) 
> enabling `javap -system` to handle internal class names. 
> 
> Thanks, 
> Sonia

+1

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19883#pullrequestreview-2144277865


Re: RFR: 8333308: javap --system handling doesn't work on internal class names

2024-06-26 Thread Thomas Stuefe
On Wed, 26 Jun 2024 07:54:16 GMT, Alan Bateman  wrote:

> > Question, what is the noreg-hard label used for?
> 
> It's the label to mean that it's too hard or impossible write a regression 
> test. It is documented in the [JBS label 
> dictionary](https://openjdk.org/guide/#jbs-label-dictionary) but may not be 
> widely known.

Ah, thank you, and I never knew this documentation either.

-

PR Comment: https://git.openjdk.org/jdk/pull/19883#issuecomment-2191089714


Re: RFR: 8333308: javap --system handling doesn't work on internal class names

2024-06-25 Thread Thomas Stuefe
On Tue, 25 Jun 2024 19:49:07 GMT, Chen Liang  wrote:

> Technically javap accepts both notations of `a.b.C` and `a/b/C.class` and 
> accepts both `.` and `$` as inner class separators. So this is fine. However 
> it's hard to verify that the jdk in `--system` is really used, so I put a 
> noreg-hard label on the original JBS issue; it's hard to create a suitable 
> argument for the `--system` flag as you need a whole JDK.

Question, what is the noreg-hard label used for?

-

PR Comment: https://git.openjdk.org/jdk/pull/19883#issuecomment-2190783227


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v6]

2024-06-20 Thread Thomas Stuefe
On Thu, 20 Jun 2024 12:06:43 GMT, Severin Gehwolf  wrote:

>> Please review this enhancement to the container detection code which allows 
>> it to figure out whether the JVM is actually running inside a container 
>> (`podman`, `docker`, `crio`), or with some other means that enforces 
>> memory/cpu limits by means of the cgroup filesystem. If neither of those 
>> conditions hold, the JVM runs in not containerized mode, addressing the 
>> issue described in the JBS tracker. For example, on my Linux system 
>> `is_containerized() == false" is being indicated with the following trace 
>> log line:
>> 
>> 
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false 
>> because no cpu or memory limit is present
>> 
>> 
>> This state is being exposed by the Java `Metrics` API class using the new 
>> (still JDK internal) `isContainerized()` method. Example:
>> 
>> 
>> java -XshowSettings:system --version
>> Operating System Metrics:
>> Provider: cgroupv1
>> System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>> 
>> 
>> The basic property this is being built on is the observation that the cgroup 
>> controllers typically get mounted read only into containers. Note that the 
>> current container tests assert that `OSContainer::is_containerized() == 
>> true` in various tests. Therefore, using the heuristic of "is any memory or 
>> cpu limit present" isn't sufficient. I had considered that in an earlier 
>> iteration, but many container tests failed.
>> 
>> Overall, I think, with this patch we improve the current situation of 
>> claiming a containerized system being present when it's actually just a 
>> regular Linux system.
>> 
>> Testing:
>> 
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 
>> (including gtests)
>> - [x] Some manual testing using cri-o
>> 
>> Thoughts?
>
> Severin Gehwolf has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove problem listing of PlainRead which is reworked here

Seems okay. I don't understand the depths of V1 vs V2 controller files as well 
as you do, @jerboaa , but I trust you there. The mechanics seem fine.

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 373:

> 371:  * (11)   mount source:matched with '%*s' and discarded
> 372:  * (12)   super options:   matched with '%*s' and discarded
> 373:  */

Thanks for the good comment. That scanf line is a brain teaser.

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 422:

> 420:  * (12)   super options:   matched with '%s' and captured in 
> 'tmpcgroups'
> 421:  */
> 422: if (sscanf(p, "%*d %*d %*d:%*d %s %s %s%*[^-]- %s %*s %s", tmproot, 
> tmpmount, mount_opts, tmp_fs_type, tmpcgroups) == 5) {

The only difference to v1 is that we parse the super options (12), right? Could 
we factor out the parsing into a helper function? Or, alternatively, at least 
`#define` the scanf format somewhere up top, including the nice comment, and 
reuse that format string?

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-2130943896
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1647881202
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1647925120


Re: RFR: 8211847: [aix] java/lang/ProcessHandle/InfoTest.java fails: "reported cputime less than expected"

2024-06-13 Thread Thomas Stuefe
On Thu, 13 Jun 2024 09:47:25 GMT, Christoph Langer  wrote:

> It seems the error is gone meanwhile. So we can reenable the test.

Okay. Any idea what fixed the test?

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19691#pullrequestreview-2115358316


Re: RFR: 8332400: isspace argument should be a valid unsigned char [v2]

2024-06-12 Thread Thomas Stuefe
On Tue, 11 Jun 2024 18:07:10 GMT, Robert Toyonaga  wrote:

>> ### Summary
>> This change ensures we don't get undefined behavior when 
>> calling[`isspace`](https://pubs.opengroup.org/onlinepubs/007904975/functions/isspace.html).
>>   `isspace` accepts an `int` argument that "the application shall ensure is 
>> a character representable as an unsigned char or equal to the value of the 
>> macro EOF.". 
>> 
>> Previously, there was no checking of the values passed to `isspace`. I've 
>> replaced direct calls with a new wrapper `os::is_space` that performs a 
>> range check and prevents the possibility of undefined behavior from 
>> happening. For instances outside of Hotspot, I've added casts to `unsigned 
>> char`. 
>> 
>> **Testing**
>> - Added a new test in `test/hotspot/gtest/runtime/test_os.cpp` to check 
>> `os::is_space` is working correctly.
>> - tier1
>
> Robert Toyonaga has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Replace wrapper with casts.

Yes, unfortunate, but ok. Thank you for doing this.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19567#pullrequestreview-2112563225


Re: RFR: 8332400: isspace argument should be a valid unsigned char

2024-06-10 Thread Thomas Stuefe
On Mon, 10 Jun 2024 08:20:38 GMT, David Holmes  wrote:

> > "To use these functions safely with plain chars (or signed chars), the 
> > argument should first be converted to unsigned char"
> 
> @tstuefe wow! Okay. That is a surprise to me. A cast to unsigned char doesn't 
> actually do anything. Every char is "representable" as an unsigned char 
> because it holds a bit pattern between 0x00 and 0xff i.e. the function is 
> well defined if the incoming int is either EOF (int -1) or else in the range 
> 0x00 to 0xff. But I did a bit of searching and it seems it comes down to 
> potential arithmetic operations on the "char" the might behave differently 
> depending on the signed-ness. :(

I was surprised as well. Turns out you can use something for 20+ years and not 
notice :)

-

PR Comment: https://git.openjdk.org/jdk/pull/19567#issuecomment-2157711653


Re: RFR: 8332400: isspace argument should be a valid unsigned char

2024-06-06 Thread Thomas Stuefe
On Thu, 6 Jun 2024 21:21:23 GMT, David Holmes  wrote:

>> ### Summary
>> This change ensures we don't get undefined behavior when 
>> calling[`isspace`](https://pubs.opengroup.org/onlinepubs/007904975/functions/isspace.html).
>>   `isspace` accepts an `int` argument that "the application shall ensure is 
>> a character representable as an unsigned char or equal to the value of the 
>> macro EOF.". 
>> 
>> Previously, there was no checking of the values passed to `isspace`. I've 
>> replaced direct calls with a new wrapper `os::is_space` that performs a 
>> range check and prevents the possibility of undefined behavior from 
>> happening. For instances outside of Hotspot, I've added casts to `unsigned 
>> char`. 
>> 
>> **Testing**
>> - Added a new test in `test/hotspot/gtest/runtime/test_os.cpp` to check 
>> `os::is_space` is working correctly.
>> - tier1
>
> Sorry I think this is well-intentioned but unnecessary in nearly all cases. 
> If you pass a char there is no potential problem. Only passing an actual int 
> could be a problem.

@dholmes-ora 

> Sorry I think this is well-intentioned but unnecessary in nearly all cases. 
> If you pass a char there is no potential problem. Only passing an actual int 
> could be a problem.

Note that the issue was motivated by an Oracle engineer complaining about me 
using isspace on a char. That caused me to look up its behavior. Recently, we 
seem intent on eliminating UB, so why not.

That said, I agree that we probably don't need the wrapper. And casting to int 
feels awkward. I propose
- input from trusted sources, e.g. proc fs, don't need casting
- input from other sources should be casted to unsigned char (see 
recommendation here: https://en.cppreference.com/w/cpp/string/byte/isspace "To 
use these functions safely with plain chars (or signed chars), the argument 
should first be converted to unsigned char")

-

PR Comment: https://git.openjdk.org/jdk/pull/19567#issuecomment-2154146793


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v10]

2024-06-04 Thread Thomas Stuefe
On Fri, 31 May 2024 14:34:16 GMT, Sonia Zaldana Calles  
wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Decreasing diff size addressing unnecessary changes
>
> Hi all,  
> 
> I think there's some consensus that we need some follow up cleanup issues for 
> the JNI spec, renaming constants, fixing return codes, etc. 
> 
> Seeing how that grows the scope of the issue quite a bit, I'd like to push 
> this patch and track the other issues brought up separately. 
> 
> If there are no objections about the current state, I'd like to integrate 
> some time next week. Let me know your thoughts.
> 
> cc: @jaikiran, @AlanBateman

> Hello Sonia @SoniaZaldana, so based on Alan's latest inputs, the next thing 
> to do on this PR is to refresh the PR to merge it with the latest master 
> branch changes and then run tier1, tier2 and tier3 tests to make sure they 
> continue to pass and have this integrated soon.

Curious, why tier 1 to 3 specifically? Is there anything specific in tier 3 you 
want to have tested?

-

PR Comment: https://git.openjdk.org/jdk/pull/18786#issuecomment-2147514185


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v11]

2024-06-04 Thread Thomas Stuefe
On Tue, 4 Jun 2024 13:34:30 GMT, Sonia Zaldana Calles  
wrote:

>> Hi folks, 
>> 
>> This PR aims to fix 
>> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
>> 
>> I think the regression got introduced in 
>> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
>> 
>> In the issue linked above, 
>> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>>  got removed to simplify launcher code.
>> 
>> Previously, we used ```getMainType``` to do the appropriate main method 
>> invocation in ```JavaMain```. However, we currently attempt to do all types 
>> of main method invocations at the same time 
>> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>>  
>> 
>> Note how all of these invocations clear the exception reported with 
>> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>>  
>> 
>> Therefore, if a legitimate exception comes up during one of these 
>> invocations, it does not get reported. 
>> 
>> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
>> forward to your suggestions. 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 14 additional 
> commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into JDK-8329581
>  - Decreasing diff size addressing unnecessary changes
>  - Fixing indentation
>
>Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com>
>  - Fixing broken uncaught exception mechanism
>  - Make deleting file OS agnostic
>  - Adding test case
>  - Removing long lines
>  - Using new macro to avoid reporting JNI error
>  - Changes based on feedback
>  - Fixing space formatting
>  - ... and 4 more: https://git.openjdk.org/jdk/compare/d654de0c...12fe416c

Latest version is still good.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18786#pullrequestreview-2096358502


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v10]

2024-05-14 Thread Thomas Stuefe
On Mon, 13 May 2024 18:01:25 GMT, Sonia Zaldana Calles  
wrote:

> > This mostly looks good. I'm just puzzled CHECK_EXCEPTION_NULL_FAIL. The JNI 
> > functions GetStaticMethodID, GetMethodID and NewObject return NULL with a 
> > pending exception when they fail. So I would expect 
> > CHECK_EXCEPTION_NULL_FAIL to just check if obj is NULL rather check for an 
> > exception first. It's not wrong to check for an exception, just curious 
> > when looking at this macro.
> 
> Hi @AlanBateman, thanks for taking a look. That's a good point - would it be 
> worthwhile to delete the exception check in this case?

Well, its not wrong, and arguably more defensive.

Doc for GetStaticMethodID states:


RETURNS:
Returns a method ID, or NULL if the operation fails.

THROWS:
NoSuchMethodError: if the specified static method cannot be found.
ExceptionInInitializerError: if the class initializer fails due to an exception.
OutOfMemoryError: if the system runs out of memory.


but it does not state explicitly that an exception is thrown on every error, or 
whether there are cases where the API can return NULL but not throw an 
exception, or vice versa.

So, I'd check for both. Or, if we think that both should not happen or happen 
together, assert that. (we can use the standard C assert in the JDK libraries, 
no?)

-

PR Comment: https://git.openjdk.org/jdk/pull/18786#issuecomment-2109450146


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v10]

2024-05-10 Thread Thomas Stuefe
On Thu, 9 May 2024 19:52:12 GMT, Sonia Zaldana Calles  
wrote:

>> Hi folks, 
>> 
>> This PR aims to fix 
>> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
>> 
>> I think the regression got introduced in 
>> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
>> 
>> In the issue linked above, 
>> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>>  got removed to simplify launcher code.
>> 
>> Previously, we used ```getMainType``` to do the appropriate main method 
>> invocation in ```JavaMain```. However, we currently attempt to do all types 
>> of main method invocations at the same time 
>> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>>  
>> 
>> Note how all of these invocations clear the exception reported with 
>> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>>  
>> 
>> Therefore, if a legitimate exception comes up during one of these 
>> invocations, it does not get reported. 
>> 
>> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
>> forward to your suggestions. 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Decreasing diff size addressing unnecessary changes

My remarks have all been addressed. Thank you, @SoniaZaldana. From my side this 
is good to go.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18786#pullrequestreview-2049599835


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v9]

2024-05-10 Thread Thomas Stuefe
On Thu, 9 May 2024 19:48:53 GMT, Sonia Zaldana Calles  
wrote:

> > This may be food for another RFE, to keep this patch minimal. But a good 
> > solution, to me, would be like this:
> > 
> > * have the same logic for return codes (1 = error, 0 = success) to ease 
> > understanding
> > * have clearly named constants (e.g. "LAUNCHER_OK" 0, "LAUNCHER_ERR" = 1)
> > * have the LEAVE macro take the launcher return code as argument
> > * have all xxx_LEAVE macros pass in LAUNCHER_ERR to LEAVE
> > * call the final LEAVE with LAUNCHER_OK
> > * optionally, define something like "LEAVE_ERR" and "LEAVE_OK" that call 
> > LEAVE with either LAUNCHER_ERR or LAUNCHER_OK, for more concise coding.
> > 
> > For this patch, I think the return code logic is okay, but I would feel 
> > better if others double-checked.
> 
> @tstuefe Agreed, I can look into opening another issue to track this after we 
> fix the regression.

Thank you. Please do. You can just copy-paste my rant into the bug description.

-

PR Comment: https://git.openjdk.org/jdk/pull/18786#issuecomment-2104186742


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v9]

2024-05-08 Thread Thomas Stuefe
On Mon, 6 May 2024 19:06:10 GMT, Sonia Zaldana Calles  
wrote:

>> Hi folks, 
>> 
>> This PR aims to fix 
>> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
>> 
>> I think the regression got introduced in 
>> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
>> 
>> In the issue linked above, 
>> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>>  got removed to simplify launcher code.
>> 
>> Previously, we used ```getMainType``` to do the appropriate main method 
>> invocation in ```JavaMain```. However, we currently attempt to do all types 
>> of main method invocations at the same time 
>> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>>  
>> 
>> Note how all of these invocations clear the exception reported with 
>> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>>  
>> 
>> Therefore, if a legitimate exception comes up during one of these 
>> invocations, it does not get reported. 
>> 
>> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
>> forward to your suggestions. 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fixing indentation
>   
>   Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com>

Pre-existing: Man, I cannot grok the complex return code handling, tbh.

We have the local `ret` variable holding a return code. We also hand codes to 
CHECK_EXCEPTION_LEAVE as macro argument. But we don't hand codes to 
CHECK_EXCEPTION_NULL_LEAVE. LEAVE uses the locally defined `ret` instead of 
getting the return code as argument. CHECK_EXCEPTION_LEAVE modifies the local 
`ret`, but CHECK_EXCEPTION_NULL_LEAVE does not.

CHECK_EXCEPTION_NULL_LEAVE does not set `ret`. So, in case of an error, it 
would cause the launcher to return OK, but this does not happen because the 
local `ret` gets initialized to 1 before the first call to 
CHECK_EXCEPTION_NULL_LEAVE (line 566 resp. 560). Not sure if this was 
intentional, but it surely is very brittle. We rely on the content of `ret`, 
and that changes several times throughout JavaMain.

CHECK_EXCEPTION_NULL_LEAVE argument is named CENL_exception, which I don't 
understand.

To confuse matters more, the logic for internal error codes and the launcher 
return code is reversed: internally, 0 means error, and externally, 0 means 
success. And we only use numerical literals (`1`, `0`) instead of clearly named 
constants.

This may be food for another RFE, to keep this patch minimal. But a good 
solution, to me, would be like this:

- have the same logic for return codes (1 = error, 0 = success) to ease 
understanding
- have clearly named constants (e.g. "LAUNCHER_OK" 0, "LAUNCHER_ERR" = 1)
- have the LEAVE macro take the launcher return code as argument
- have all xxx_LEAVE macros pass in LAUNCHER_ERR to LEAVE
- call the final LEAVE with LAUNCHER_OK
- optionally, define something like "LEAVE_ERR" and "LEAVE_OK" that call LEAVE 
with either LAUNCHER_ERR or LAUNCHER_OK, for more concise coding.

For this patch, I think the return code logic is okay, but I would feel better 
if others double-checked.

src/java.base/share/native/libjli/java.c line 394:

> 392: if ((*env)->ExceptionOccurred(env)) { \
> 393: return 0; \
> 394: } else if (obj == NULL) { \

Side note, I first wondered if this comparison is strictly correct, since we 
now pass in `jmethodID` as well as `jobject`, which are opaque types and not 
necessarily of the same size.

But seems that jmethodID==NULL is defined to mean "invalid" [1] by the spec. 
Requiring NULL instead of providing an opaque invalid constant feels like an 
odd choice in the original JNI spec, since it requires implementors to use a 
pointer type to implement jmethodID? Which we do, in OpenJDK [2].

[1] 
https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#getstaticmethodid
[2] 
https://github.com/openjdk/jdk/blob/2baacfc16916220846743c6e49a99a6c41cac510/src/java.base/share/native/include/jni.h#L135-L136

src/java.base/share/native/libjli/java.c line 420:

> 418: jmethodID mainID =
> 419: (*env)->GetMethodID(env, mainClass, "main", 
> "([Ljava/lang/String;)V");
> 420: CHECK_EXCEPTION_NULL_FAIL(mainID);

Is there a particular reason why you moved this section up here, from line 432 
before? If not, I'd restore it to its original position to keep the diff small.

src/java.base/share/native/libjli/java.c line 452:

> 450: jobject mainObject = (*env)->NewObject(env, mainClass, constructor);
> 451: CHECK_EXCEPTION_NULL_FAIL(mainObject);
> 452: jmethodID mainID = (*env)->GetMethodID(env, mainClass, "main", 
> "()V");

Unnecessary change. Please restore original

Re: RFR: 8329862: GetApplicationHome and GetApplicationHomeFromDll enhance jli tracing [v6]

2024-04-26 Thread Thomas Stuefe
On Thu, 25 Apr 2024 13:22:01 GMT, Matthias Baesken  wrote:

>> We have already good JLI tracing capabilities. But GetApplicationHome and 
>> GetApplicationHomeFromDll lack some tracing and should be enhanced.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   remove /jre path check

Change still good

> Still looks good. We should maybe change the synopsis (title) of the bug to 
> something like "libjli GetApplicationHome cleanups"?

Well, it does enhance tracing, so the title is not lying :)

-

PR Comment: https://git.openjdk.org/jdk/pull/18699#issuecomment-2078751493


Re: RFR: 8329862: GetApplicationHome and GetApplicationHomeFromDll enhance jli tracing [v5]

2024-04-25 Thread Thomas Stuefe
On Tue, 23 Apr 2024 14:31:44 GMT, Matthias Baesken  wrote:

>> We have already good JLI tracing capabilities. But GetApplicationHome and 
>> GetApplicationHomeFromDll lack some tracing and should be enhanced.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   adjust output

LGTM

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18699#pullrequestreview-2022399268


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]

2024-04-16 Thread Thomas Stuefe
On Sat, 13 Apr 2024 18:29:59 GMT, Thomas Stuefe  wrote:

>> Severin Gehwolf has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains ten additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - jcheck fixes
>>  - Fix tests
>>  - Implement Metrics.isContainerized()
>>  - Some clean-up
>>  - Drop cgroups testing on plain Linux
>>  - Implement fall-back logic for non-ro controller mounts
>>  - Make find_ro static and local to compilation unit
>>  - 8261242: [Linux] OSContainer::is_containerized() returns true
>
> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170:
> 
>> 168: }
>> 169:   }
>> 170:   return false;
> 
> An alternative, simpler, no need for modifying source string:
> 
> static bool find_ro_opt(const char* o) {
>   return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,");
> }

Please disregard my comment. Albeit longer, your version is clearer to read and 
more fault tolerant.

> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351:
> 
>> 349: //
>> 350: // We collect the read only mount option in the cgroup infos so as 
>> to have that
>> 351: // info ready when determining is_containerized().
> 
> Here, and in other places: a comment indicating the line format we scan would 
> be appreciated, possibly with argument numbers. Saves the casual code reader 
> from looking into proc man page. Even just pasting the example line for proc 
> manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) 
> (but with order adapted to your scanf call, they count major:minor as one)

Trying to parse the `%s%*[^-]-`

So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- 
parses everything that is not a dash, until we encounter the dash? Then we eat 
the dash? This is to skip the optionals?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567754861
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567767209


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]

2024-04-16 Thread Thomas Stuefe
On Thu, 11 Apr 2024 12:08:02 GMT, Severin Gehwolf  wrote:

>> Please review this enhancement to the container detection code which allows 
>> it to figure out whether the JVM is actually running inside a container 
>> (`podman`, `docker`, `crio`), or with some other means that enforces 
>> memory/cpu limits by means of the cgroup filesystem. If neither of those 
>> conditions hold, the JVM runs in not containerized mode, addressing the 
>> issue described in the JBS tracker. For example, on my Linux system 
>> `is_containerized() == false" is being indicated with the following trace 
>> log line:
>> 
>> 
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false 
>> because no cpu or memory limit is present
>> 
>> 
>> This state is being exposed by the Java `Metrics` API class using the new 
>> (still JDK internal) `isContainerized()` method. Example:
>> 
>> 
>> java -XshowSettings:system --version
>> Operating System Metrics:
>> Provider: cgroupv1
>> System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>> 
>> 
>> The basic property this is being built on is the observation that the cgroup 
>> controllers typically get mounted read only into containers. Note that the 
>> current container tests assert that `OSContainer::is_containerized() == 
>> true` in various tests. Therefore, using the heuristic of "is any memory or 
>> cpu limit present" isn't sufficient. I had considered that in an earlier 
>> iteration, but many container tests failed.
>> 
>> Overall, I think, with this patch we improve the current situation of 
>> claiming a containerized system being present when it's actually just a 
>> regular Linux system.
>> 
>> Testing:
>> 
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 
>> (including gtests)
>> - [x] Some manual testing using cri-o
>> 
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains ten additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - jcheck fixes
>  - Fix tests
>  - Implement Metrics.isContainerized()
>  - Some clean-up
>  - Drop cgroups testing on plain Linux
>  - Implement fall-back logic for non-ro controller mounts
>  - Make find_ro static and local to compilation unit
>  - 8261242: [Linux] OSContainer::is_containerized() returns true

I am not enough of a container expert to judge if the basic approach is right - 
I trust you on this. This is just a technical code review.

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170:

> 168: }
> 169:   }
> 170:   return false;

An alternative, simpler, no need for modifying source string:

static bool find_ro_opt(const char* o) {
  return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,");
}

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351:

> 349: //
> 350: // We collect the read only mount option in the cgroup infos so as 
> to have that
> 351: // info ready when determining is_containerized().

Here, and in other places: a comment indicating the line format we scan would 
be appreciated, possibly with argument numbers. Saves the casual code reader 
from looking into proc man page. Even just pasting the example line for proc 
manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but 
with order adapted to your scanf call, they count major:minor as one)

src/hotspot/os/linux/osContainer_linux.cpp line 78:

> 76:   const char *reason;
> 77:   bool any_mem_cpu_limit_present = false;
> 78:   bool ctrl_ro = cgroup_subsystem->is_containerized();

nit: naming? what does ctrl mean in this case? Maybe use 
"cgroup_is_containerized"?

src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375:

> 373: if (!c.isContainerized()) {
> 374: ostream.println(INDENT + "System not containerized.");
> 375: return;

Why return here? Would this not cut the output short in the non-containerized 
case?

And if this not intended, the not-containerized-`-XshowSettings:system` test 
below should test and catch this (e.g. scan for CPU set)

-

PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-1999328503
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1564182879
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567756663
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567774124
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567779248


Re: RFR: 8329581: Java launcher no longer prints a stack trace

2024-04-16 Thread Thomas Stuefe
On Tue, 16 Apr 2024 07:55:26 GMT, Thomas Stuefe  wrote:

>> Hi folks, 
>> 
>> This PR aims to fix 
>> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
>> 
>> I think the regression got introduced in 
>> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
>> 
>> In the issue linked above, 
>> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>>  got removed to simplify launcher code.
>> 
>> Previously, we used ```getMainType``` to do the appropriate main method 
>> invocation in ```JavaMain```. However, we currently attempt to do all types 
>> of main method invocations at the same time 
>> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>>  
>> 
>> Note how all of these invocations clear the exception reported with 
>> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>>  
>> 
>> Therefore, if a legitimate exception comes up during one of these 
>> invocations, it does not get reported. 
>> 
>> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
>> forward to your suggestions. 
>> 
>> Cheers, 
>> Sonia
>
> src/java.base/share/classes/sun/launcher/LauncherHelper.java line 912:
> 
>> 910: private static final int MAIN_WITHOUT_ARGS = 1;
>> 911: private static final int MAIN_NONSTATIC = 2;
>> 912: private static int mainType = 0;
> 
> Nit: transferring information from java to C in this way is usually done, 
> AFAICS, by accessing static fields directly 
> (GetStaticFieldID+GetStaticXXXField). There is also a helper function that 
> wraps this, see JNU_GetStaticFieldByName. Not sure if you have that available 
> in libjli though. There are many examples for both patterns.
> 
> I also would consider just declaring two booleans here, isStatic and hasArgs, 
> which that would be a bit clearer to read instead of combining both into a 
> single flag variable, and no need to keep flag values synced across java/C.

Thinking about this some more, would it not be possible to just use the 
mainMethod directly down in C?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18786#discussion_r1566948449


Re: RFR: 8329581: Java launcher no longer prints a stack trace

2024-04-16 Thread Thomas Stuefe
On Mon, 15 Apr 2024 18:25:02 GMT, Sonia Zaldana Calles  
wrote:

> Hi folks, 
> 
> This PR aims to fix 
> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
> 
> I think the regression got introduced in 
> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
> 
> In the issue linked above, 
> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>  got removed to simplify launcher code.
> 
> Previously, we used ```getMainType``` to do the appropriate main method 
> invocation in ```JavaMain```. However, we currently attempt to do all types 
> of main method invocations at the same time 
> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>  
> 
> Note how all of these invocations clear the exception reported with 
> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>  
> 
> Therefore, if a legitimate exception comes up during one of these 
> invocations, it does not get reported. 
> 
> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
> forward to your suggestions. 
> 
> Cheers, 
> Sonia

> Isn't this already fixed by #18753?

> Yes, it would be good to try the changes in pull/18753 first.

I still like @SoniaZaldana 's variant better.

AFAIU, in the original variant, we choose the main method via 
MethodFinder.findMainMethod, but then down in JavaMain in C we just try to 
invoke all variants in succession 
(static,args->instance,args->static,noargs->instance,noargs). Does that no mean 
we ignore the decision of findMainMethod?

Apart from that, it means fewer exceptions are thrown when starting the JVM if 
we avoid calling methods we know don't work.

-

PR Comment: https://git.openjdk.org/jdk/pull/18786#issuecomment-2058521084


Re: RFR: 8329581: Java launcher no longer prints a stack trace

2024-04-16 Thread Thomas Stuefe
On Mon, 15 Apr 2024 18:25:02 GMT, Sonia Zaldana Calles  
wrote:

> Hi folks, 
> 
> This PR aims to fix 
> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
> 
> I think the regression got introduced in 
> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
> 
> In the issue linked above, 
> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>  got removed to simplify launcher code.
> 
> Previously, we used ```getMainType``` to do the appropriate main method 
> invocation in ```JavaMain```. However, we currently attempt to do all types 
> of main method invocations at the same time 
> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>  
> 
> Note how all of these invocations clear the exception reported with 
> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>  
> 
> Therefore, if a legitimate exception comes up during one of these 
> invocations, it does not get reported. 
> 
> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
> forward to your suggestions. 
> 
> Cheers, 
> Sonia

Good find.

src/java.base/share/classes/sun/launcher/LauncherHelper.java line 912:

> 910: private static final int MAIN_WITHOUT_ARGS = 1;
> 911: private static final int MAIN_NONSTATIC = 2;
> 912: private static int mainType = 0;

Nit: transferring information from java to C in this way is usually done, 
AFAICS, by accessing static fields directly 
(GetStaticFieldID+GetStaticXXXField). There is also a helper function that 
wraps this, see JNU_GetStaticFieldByName. Not sure if you have that available 
in libjli though. There are many examples for both patterns.

I also would consider just declaring two booleans here, isStatic and hasArgs, 
which that would be a bit clearer to read instead of combining both into a 
single flag variable, and no need to keep flag values synced across java/C.

-

PR Review: https://git.openjdk.org/jdk/pull/18786#pullrequestreview-2002891641
PR Review Comment: https://git.openjdk.org/jdk/pull/18786#discussion_r1566896963


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

2024-04-12 Thread Thomas Stuefe
On Fri, 12 Apr 2024 10:07:48 GMT, Claes Redestad  wrote:

> I guess Lilliput adds some hard or soft limit on the number of classes loaded?

Yes, we are concerned with that, especially for a possible future where 
Lilliput is the sole default. Atm we can address about 4 million classes. There 
are thoughts about making this number of classes infinite, but if possible we 
would like to avoid that complexity.

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051597166


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

2024-04-12 Thread Thomas Stuefe
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad  wrote:

> This patch suggests a workaround to an issue with huge SCF MH expression 
> trees taking excessive JIT compilation resources by reviving (part of) the 
> simple bytecode-generating strategy that was originally available as an 
> all-or-nothing strategy choice. 
> 
> Instead of reintroducing a binary strategy choice I propose a threshold 
> parameter, controlled by 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions 
> below or at this threshold there's no change, for expressions with an arity 
> above it we use the `StringBuilder`-chain bytecode generator. 
> 
> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low. It may also produce 
> worse performance on average. On the other hand there is the observed 
> increase in C2 resource usage as expressions grow unwieldy. On the other 
> other hand high arity expressions are likely rare to begin with, with less 
> opportunities for sharing than the more common low-arity expressions. 
> 
> I turned the submitted test case into a few JMH benchmarks and did some 
> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
> 
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
> 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
> 
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23 
> args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
> at the vast majority of call sites.
> 
> I was asked to use the new class file API for mainline. There's a version of 
> this patch implemented using ASM in 7c52a9f which might be a reasonable basis 
> for a backport.

Fyi: 
https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051184229


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

2024-04-11 Thread Thomas Stuefe
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad  wrote:

> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low

Stupid question maybe, this would be one LambdaForm per argument set? E.g. 
would two invocations with the same arguments re-use the same LambdaForm?

I would like to get an understanding of the numbers of classes involved for 
these solutions, and whether they are bounded. Mostly for Lilliput reason.

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051082454


Re: RFR: 8327675: jspawnhelper should be built on all unix platforms

2024-03-08 Thread Thomas Stuefe
On Fri, 8 Mar 2024 10:17:08 GMT, Magnus Ihse Bursie  wrote:

> We should match the building of jspawnhelper in 
> make/modules/java.base/Launcher.gmk with the usage for all unix platforms in 
> src/java.base/unix/classes/java/lang/ProcessImpl.java.
> 
> Granted, the list of OSes in the makefile amounts to the current list of all 
> Unix OSes in the JDK, but there is no need to have this logical disparity. 
> 
> This was inspired by the discovery in 
> https://github.com/openjdk/jdk/pull/18112#discussion_r1517455696.

+1

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18165#pullrequestreview-1924593320


Re: RFR: JDK-8252802: java launcher should set MALLOCOPTIONS and LDR_CNTRL in order to use 64KB pages on AIX

2024-02-19 Thread Thomas Stuefe
On Mon, 19 Feb 2024 05:52:22 GMT, Varada M  wrote:

> DeCapo Benchmark Results (3 runs) :  
> 
> Before : 
> = DaCapo 9.12 h2 PASSED in 281402 msec =
> = DaCapo 9.12 h2 PASSED in 269818 msec =
> = DaCapo 9.12 h2 PASSED in 279279 msec =
>  
> After:
> = DaCapo 9.12 h2 PASSED in 279192 msec =
> = DaCapo 9.12 h2 PASSED in 269769 msec =
> = DaCapo 9.12 h2 PASSED in 271577 msec = 
> 
> Environmental variables LDR_CNTRL and MALLOCOPTIONS has caused 
> test/jdk/java/lang/ProcessBuilder/Basic.java failure.
> Additional environmental variables has to removed from 
> removeAixExpectedVars().
> 
> JBS Issue : [JDK-8252802](https://bugs.openjdk.org/browse/JDK-8252802)

I don't think this works as intended. 

IIRC, both variables must be set *before process invocation*. Setting them 
inside the launcher will only affect child processes. The dacapo results posted 
seem to support this, they seem random-ish to me.

Note that we already use 64KB pages for all large memory regions (everything 
that goes through  `os::reserve_memory`). So, while the value of LDRCNTRL is 
not nil, it is very diminished. Mostly just the Heap is affected.

But I would not change defaults for options like these anyway, especially not 
hard-coded. These knobs have far-ranging implications. If we want that, we need 
to investigate carefully.

(for example: would using 64KB pages increase average heap size? Possibly, 
depending on the implementation. But AIX still has this problem where the heap 
can only live in the break and can bump against low-hanging regions. So, AIX is 
especially vulnerable against any changes that increase average heap usage)

-

PR Comment: https://git.openjdk.org/jdk/pull/17906#issuecomment-1953643518


Re: RFR: 8322417: Console read line with zero out should zero out when throwing exception

2023-12-19 Thread Thomas Stuefe
On Tue, 19 Dec 2023 12:47:53 GMT, Goetz Lindenmaier  wrote:

> …g exception
> 
> After leaving the method by throwing an exception the data can not be cleaned 
> any more.

Seems reasonable.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17156#pullrequestreview-1789380436


Re: RFR: 8320699: Add parameter to skip progress logging of OutputAnalyzer

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 10:34:02 GMT, Stefan Karlsson  wrote:

> Tests using ProcessTools.executeProcess gets the following output written to 
> stdout:
> [2023-11-24T09:58:16.797540608Z] Gathering output for process 2517117
> [2023-11-24T09:58:23.070781345Z] Waiting for completion for process 2517117
> [2023-11-24T09:58:23.071045055Z] Waiting for completion finished for process 
> 2517117
> 
> This might be good for some use cases and debugging, but some tests spawns a 
> large number of processes and for those this output fills up the log files.
> 
> I propose that we add a way to turn of this output for tests where we find 
> this output to be too noisy.

I'm fine with this, but would prefer a global switch instead of hard-coding the 
value per test and handing it down.

-

PR Comment: https://git.openjdk.org/jdk/pull/16807#issuecomment-1825910975


Re: RFR: 8320691: Timeout handler on Windows takes 2 hours to complete [v2]

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 09:51:07 GMT, Daniel Jeliński  wrote:

>> test/failure_handler/src/share/conf/windows.properties line 60:
>> 
>>> 58: 
>>> 59: native.core.app=cdb
>>> 60: native.core.args=-c ".dump /ma core.%p;qd" -p %p
>> 
>> Just to double check that this is the right option. `/ma` terminates if 
>> there is "inaccessable memory" where `/mA` seems to be continue.  The `/f` 
>> option says all "accessible" memory which I assume it means skips/ignored 
>> inaccessible, is that right?
>
> `/ma` was the option suggested by `.dump`; looking at the documentation, 
> `/mA` indeed seems to be a better choice. I'll update.
> 
> `/f` used to create a [full user-mode 
> dump](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/user-mode-dump-files#full-user-mode-dumps).
>  I couldn't find any information about the behavior on inaccessible memory.
> None of the options specify dumping inaccessible memory; I assume the 
> failures only happen if some memory becomes inaccessible while producing the 
> dump.

Oh, I overlooked this. I think that is right, the old /f minidump generation 
worked around inaccessible memory. They were just "holes".

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16806#discussion_r1404160596


Re: RFR: 8320691: Timeout handler on Windows takes 2 hours to complete

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 07:58:18 GMT, Daniel Jeliński  wrote:

> The recent cdb versions do not support `.dump /f`:
> 
> *
> * .dump /f is not supported on a user mode process. *
> *   *
> * .dump /ma creates a complete memory dump of a user mode process.  *
> *
> 
> and after printing that message, cdb ignores the rest of the command line and 
> never quits.
> 
> This PR updates the dump command to use the recommended `/ma` parameter. This 
> allows the command to produce a dump and complete in a timely manner.

Good.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16806#pullrequestreview-1747548753


Re: RFR: JDK-8320309: AIX: pthreads created by foreign test library don't work as expected

2023-11-22 Thread Thomas Stuefe
On Tue, 21 Nov 2023 10:09:04 GMT, Varada M  wrote:

> Following test fails due to missing pthread attributes on AIX :
> java/foreign/TestUpcallAsync.java
> java/foreign/stackwalk/TestAsyncStackWalk.java
> java/foreign/loaderLookup/TestLoaderLookupJNI.java
> java/foreign/enablenativeaccess/TestEnableNativeAccessJarManifest.java
> java/foreign/enablenativeaccess/TestEnableNativeAccess.java
> 
> Note : Test needs the changes from 
> [pull/16579](https://github.com/openjdk/jdk/pull/16579) and 
> [pull/16414](https://github.com/openjdk/jdk/pull/16414 )
> 
> JBS Issue : [JDK-8320309](https://bugs.openjdk.org/browse/JDK-8320309)

+1

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16757#pullrequestreview-1744345772


Re: RFR: 8318058: Notify the jvm when the direct memory is oom

2023-10-12 Thread Thomas Stuefe
On Fri, 13 Oct 2023 03:23:04 GMT, xpbob  wrote:

> Big data processes often experience situations where the direct memory oom 
> process is alive but not serving properly. If the direct memory is oom, code 
> can notify the jvm. Can bring the following benefits:
> 1. Analysis of direct memory Java. Nio. DirectByteBuffer need heapdumps 
> reference relations. Can be used directly HeapDumpOnOutOfMemoryError.
> 2. In container environment, ExitOnOutOfMemoryError can be used to let the 
> process that cannot provide services exit, so that the container can quickly 
> pull up a new pod

Undoubtedly useful, but there have been many discussions in the past about what 
does and does not constitute an OOM error, and IIRC, the stance of Oracle devs 
was "only if it is in java heap". Hence the missing OOM error when we cannot 
create threads, for instance.

-

PR Comment: https://git.openjdk.org/jdk/pull/16176#issuecomment-1760881555


Re: RFR: JDK-8315026: ProcessHandle implementation listing processes on AIX should use getprocs64 [v5]

2023-10-12 Thread Thomas Stuefe
On Thu, 12 Oct 2023 09:30:09 GMT, Joachim Kern  wrote:

>> We see rather often failures in java/lang/ProcessHandle/TreeTest.java on AIX 
>> in TreeTest.test5.
>> The reason is: Previously the implementation based on the /proc file system 
>> lead to double pids in the child list; at least intermittent. Using the API 
>> getprocs64() instead I was able to eliminate those double pids (and increase 
>> the performance by a factor of 4). Otherwise we would have to add an 
>> algorithm to filter out the doubles after creating the raw list.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revise of the function comment and some other proposals

ok

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16051#pullrequestreview-1674558899


Re: RFR: JDK-8315026: java/lang/ProcessHandle/TreeTest.java fails intermittent on AIX in TreeTest.test5 [v4]

2023-10-12 Thread Thomas Stuefe
On Thu, 12 Oct 2023 09:33:24 GMT, Joachim Kern  wrote:

>> src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 89:
>> 
>>> 87: 
>>> 88: do { // Block to break out of on Exception
>>> 89: pids = (*env)->GetLongArrayElements(env, jarray, NULL);
>> 
>> Nit, I'd move these invocations of GLAE into the initialization block and 
>> remove the outer loop. I see what you do here, the outer block gives you a 
>> reliable jumping point to get cleanup in case of an error. But I would just 
>> use a goto cleanup label. That's clearer and allows you to handle 
>> initialization in one place.
>> 
>> Proposal for initialization:
>> 
>> 
>> if (jparentArray != null) {
>>   - check size is same as primary array, if wrong throw + goto cleanup
>>   - ppids = GLAE
>> } else {
>>   - if input pid is 0, throw an error too, since for "get all processes mode"
>>   we need the parent array. Throw + goto cleanup.
>> }
>> // same for times array
>
> hmm? As I already mentioned I do not want to change the structure of the 
> function with this pr. This can be done by someone else for macos, linux and 
> aix in parallel.

If someone reworks the code, you would have to counter-check the code anyway 
since nobody else has AIX machines around to build and test. So I'd just do it 
to save some cycles. But I leave it up to you.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356614666


Re: RFR: JDK-8315026: java/lang/ProcessHandle/TreeTest.java fails intermittent on AIX in TreeTest.test5 [v4]

2023-10-11 Thread Thomas Stuefe
On Wed, 11 Oct 2023 10:57:24 GMT, Joachim Kern  wrote:

>> We see rather often failures in java/lang/ProcessHandle/TreeTest.java on AIX 
>> in TreeTest.test5.
>> The reason is: Previously the implementation based on the /proc file system 
>> lead to double pids in the child list; at least intermittent. Using the API 
>> getprocs64() instead I was able to eliminate those double pids (and increase 
>> the performance by a factor of 4). Otherwise we would have to add an 
>> algorithm to filter out the doubles after creating the raw list.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   cosmetic changes 2

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 69:

> 67: 
> 68: if (arraySize != parentArraySize) {
> 69: JNU_ThrowIllegalArgumentException(env, "array sizes not 
> equal");

proposal: "Parent pid array must have the same length as result pid array"

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 89:

> 87: 
> 88: do { // Block to break out of on Exception
> 89: pids = (*env)->GetLongArrayElements(env, jarray, NULL);

Nit, I'd move these invocations of GLAE into the initialization block and 
remove the outer loop. I see what you do here, the outer block gives you a 
reliable jumping point to get cleanup in case of an error. But I would just use 
a goto cleanup label. That's clearer and allows you to handle initialization in 
one place.

Proposal for initialization:


if (jparentArray != null) {
  - check size is same as primary array, if wrong throw + goto cleanup
  - ppids = GLAE
} else {
  - if input pid is 0, throw an error too, since for "get all processes mode"
  we need the parent array. Throw + goto cleanup.
}
// same for times array

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 112:

> 110: jlong startTime = 0L;
> 111: 
> 112: /* skip files that aren't numbers */

As @MBaesken pointed out, this comment is obsolete.

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 114:

> 112: /* skip files that aren't numbers */
> 113: pid_t childpid = (pid_t) ProcessBuffer[i].pi_pid;
> 114: if ((int) childpid <= 0) {

Can this even happen? Negative pids are group pids, I don't think getprocs 
returns group pids.

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 120:

> 118: // Get the parent pid, and start time
> 119: ppid = (pid_t) ProcessBuffer[i].pi_ppid;
> 120: startTime = ((jlong) ProcessBuffer[i].pi_start) *1000;

nit: space before 1000

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 121:

> 119: ppid = (pid_t) ProcessBuffer[i].pi_ppid;
> 120: startTime = ((jlong) ProcessBuffer[i].pi_start) *1000;
> 121: if (ppid >= 0 && (pid == 0 || ppid == pid)) {

I think the first condition is always true. You can remove it.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356131767
PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356134470
PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356137028
PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356137647
PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356137949
PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1356141163


Re: RFR: JDK-8315026: java/lang/ProcessHandle/TreeTest.java fails intermittent on AIX in TreeTest.test5 [v4]

2023-10-11 Thread Thomas Stuefe
On Wed, 11 Oct 2023 10:57:24 GMT, Joachim Kern  wrote:

>> We see rather often failures in java/lang/ProcessHandle/TreeTest.java on AIX 
>> in TreeTest.test5.
>> The reason is: Previously the implementation based on the /proc file system 
>> lead to double pids in the child list; at least intermittent. Using the API 
>> getprocs64() instead I was able to eliminate those double pids (and increase 
>> the performance by a factor of 4). Otherwise we would have to add an 
>> algorithm to filter out the doubles after creating the raw list.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   cosmetic changes 2

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 43:

> 41: /*
> 42:  * Returns the children of the requested pid and optionally each parent 
> and
> 43:  * start time. If requested pid is zero return all processes.

This deserves at least a better explanation, because I was first wondering why 
you return the parent pid since that would be just, well, the pid. Then I 
thought "oh, maybe he traverses indirect children too, then returning the 
respective parent pid would make sense". And then I saw that no, you only 
return direct children *unless* pid is 0. So the parent pid business makes only 
sense for the pid==0 case.

So, from looking at the implementation, the only point of the parent pid return 
array is if you pass in 0 as pid. This should be made clearer. Or even better, 
split this into two functions, one "get children(pid, returnpidarray)" and one 
"getAllProcesses(returnarray, ppid returnarray)".

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16051#discussion_r1354791604


Re: RFR: JDK-8315026: java/lang/ProcessHandle/TreeTest.java fails intermittent on AIX in TreeTest.test5 [v2]

2023-10-09 Thread Thomas Stuefe
On Mon, 9 Oct 2023 15:15:52 GMT, Joachim Kern  wrote:

> Previously the implementation based on the /proc file system lead to double 
> pids in the child list; at least intermittent. Using the API getprocs64() 
> instead I was able to eliminate those double pids (and increase the 
> performance by a factor of 4). Otherwise we would have to add an algorithm to 
> filter out the doubles after creating the raw list.

Okay, thanks. Could you add this to the PR description so that when we look at 
this PR, we don't have to scroll through comment history? Ideally, this would 
be in the JBS bug description, too.

Cheers, Thomas

-

PR Comment: https://git.openjdk.org/jdk/pull/16051#issuecomment-1753229086


Re: RFR: JDK-8315026: java/lang/ProcessHandle/TreeTest.java fails intermittent on AIX in TreeTest.test5 [v2]

2023-10-09 Thread Thomas Stuefe
On Mon, 9 Oct 2023 15:00:18 GMT, Joachim Kern  wrote:

>> We see rather often failures in java/lang/ProcessHandle/TreeTest.java on AIX 
>> in TreeTest.test5.
>> 
>> test TreeTest.test5(): failure
>> java.lang.AssertionError: expected direct children expected [2] but found [3]
>> at org.testng.Assert.fail(Assert.java:99)
>> at org.testng.Assert.failNotEquals(Assert.java:1037)
>> at org.testng.Assert.assertEqualsImpl(Assert.java:140)
>> at org.testng.Assert.assertEquals(Assert.java:122)
>> at org.testng.Assert.assertEquals(Assert.java:907)
>> at TreeTest.test5(TreeTest.java:447)
>> at 
>> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>> at 
>> org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
>> at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
>> at 
>> org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
>> at 
>> org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
>> at 
>> org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
>> at 
>> org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
>> at 
>> org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
>> at 
>> org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
>> at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)
>> at org.testng.TestRunner.privateRun(TestRunner.java:764)
>> at org.testng.TestRunner.run(TestRunner.java:585)
>> at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
>> at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
>> at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
>> at org.testng.SuiteRunner.run(SuiteRunner.java:286)
>> at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
>> at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
>> at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
>> at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
>> at org.testng.TestNG.runSuites(TestNG.java:1069)
>> at org.testng.TestNG.run(TestNG.java:1037)
>> at 
>> com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:102)
>> at 
>> com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:58)
>> ...
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   add comment

@JoKern65 could you explain a bit what the patch wants to do?

-

PR Comment: https://git.openjdk.org/jdk/pull/16051#issuecomment-1753181485


Re: RFR: JDK-8315213: java/lang/ProcessHandle/TreeTest.java test enhance output of children [v2]

2023-08-29 Thread Thomas Stuefe
On Tue, 29 Aug 2023 12:10:34 GMT, Matthias Baesken  wrote:

>> We have some failures in TreeTest.java where the expected number of child 
>> processes is differing from what we really get. It would be good to have 
>> more output to analyze these cases.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   use 2 containers for children and descendants

Good.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15466#pullrequestreview-1600456967


Re: RFR: JDK-8315213: java/lang/ProcessHandle/TreeTest.java test enhance output of children

2023-08-29 Thread Thomas Stuefe
On Tue, 29 Aug 2023 07:51:59 GMT, Matthias Baesken  wrote:

> We have some failures in TreeTest.java where the expected number of child 
> processes is differing from what we really get. It would be good to have more 
> output to analyze these cases.

Looks good. Small nit inline.

test/jdk/java/lang/ProcessHandle/TreeTest.java line 451:

> 449: subprocesses.stream().map(p -> p.pid())
> 450: .collect(Collectors.toList()));
> 451: 

Small nit: I would not reuse the variable, but use an own one.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15466#pullrequestreview-1599761848
PR Review Comment: https://git.openjdk.org/jdk/pull/15466#discussion_r1308381401


Re: RFR: 8314940: Use of NIO in JDKs Metrics implementation causes issues in GraalVM

2023-08-28 Thread Thomas Stuefe
On Thu, 24 Aug 2023 13:16:16 GMT, Severin Gehwolf  wrote:

> Please review this rather trivial fix to not use `nio` in `CgroupUtil`, part 
> of the
> JDK's Metrics API. The primary motivating factor is that it allows one to use 
> the
> JDK's version of `Metrics` in GraalVM. See the bug for details as to why this 
> is
> needed.
> 
> Testing:
> - [x] GraalVM builds with/without the fix and the reproducer (fails 
> before/works after)
> - [x] `jdk/internal/platform` jtreg tests on Linux x86_64 (cgv1).
> - [x] GHA - passed (Failed GHA cross compile on RISCV is unrelated)

Looks good to me. A comment would be nice, possibly above the imports, warning 
about using nio.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15416#pullrequestreview-1598518803


Re: [jdk21] RFR: 8310265: (process) jspawnhelper should not use argv[0]

2023-07-07 Thread Thomas Stuefe
On Fri, 7 Jul 2023 13:27:47 GMT, Roger Riggs  wrote:

>> Hi all,
>> 
>> Clean backport for JDK-83210265.
>> 
>> Thanks!
>
> Marked as reviewed by rriggs (Reviewer).

Thank you @RogerRiggs !

-

PR Comment: https://git.openjdk.org/jdk21/pull/103#issuecomment-1625435021


[jdk21] Integrated: 8310265: (process) jspawnhelper should not use argv[0]

2023-07-07 Thread Thomas Stuefe
On Thu, 6 Jul 2023 19:42:08 GMT, Thomas Stuefe  wrote:

> Hi all,
> 
> Clean backport for JDK-83210265.
> 
> Thanks!

This pull request has now been integrated.

Changeset: 6ef80168
Author:    Thomas Stuefe 
URL:   
https://git.openjdk.org/jdk21/commit/6ef801683844e5cc06804d51060ed81b1e8f3cc5
Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod

8310265: (process) jspawnhelper should not use argv[0]

Reviewed-by: rriggs
Backport-of: 47d00a4cbeff5d757dda9c660dfd2385c02a57d7

-

PR: https://git.openjdk.org/jdk21/pull/103


[jdk21] RFR: 8310265: (process) jspawnhelper should not use argv[0]

2023-07-07 Thread Thomas Stuefe
Hi all,

Clean backport for JDK-83210265.

Thanks!

-

Commit messages:
 - Backport 47d00a4cbeff5d757dda9c660dfd2385c02a57d7

Changes: https://git.openjdk.org/jdk21/pull/103/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=103&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8310265
  Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk21/pull/103.diff
  Fetch: git fetch https://git.openjdk.org/jdk21.git pull/103/head:pull/103

PR: https://git.openjdk.org/jdk21/pull/103


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v2]

2023-06-21 Thread Thomas Stuefe
On Wed, 21 Jun 2023 07:55:31 GMT, David Holmes  wrote:

>> I'm not aware of any other uses either; its not intended to be used outside 
>> of ProcessImpl especially since the addition of the new protocol to 
>> communicate parameters and confirm launching of the child.
>
> This curiosity has been present since Rob's first version:
> https://mail.openjdk.org/pipermail/core-libs-dev/2012-November/012417.html

I assumed that at some point in its life, maybe just in the first unpublished 
versions, the jspawnhelper had a variable number of arguments, possibly from 
different callers. But the fd-string was always supposed to be the last one. 
Then this would have made perfect sense.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14531#discussion_r1236569635


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v3]

2023-06-20 Thread Thomas Stuefe
On Tue, 20 Jun 2023 18:36:40 GMT, Volker Simonis  wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   feedback Volker and Roger
>
> Still good :)

Thanks @simonis and @RogerRiggs !

-

PR Comment: https://git.openjdk.org/jdk/pull/14531#issuecomment-1600070827


Integrated: JDK-8310265: (process) jspawnhelper should not use argv[0]

2023-06-20 Thread Thomas Stuefe
On Sat, 17 Jun 2023 18:24:54 GMT, Thomas Stuefe  wrote:

> Reported by [jarabe...@gmail.com](mailto:jarabe...@gmail.com) [1] 
> 
> jspawnhelper uses argv[0] to receive the fd string from the parent. That 
> breaks with conventions and trips over certain tools like binfmt_misc. 
> 
> For details, see linked ML discussion. 
> 
> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-June/107738.html

This pull request has now been integrated.

Changeset: 47d00a4c
Author:Thomas Stuefe 
URL:   
https://git.openjdk.org/jdk/commit/47d00a4cbeff5d757dda9c660dfd2385c02a57d7
Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod

8310265: (process) jspawnhelper should not use argv[0]

Reviewed-by: simonis, rriggs

-

PR: https://git.openjdk.org/jdk/pull/14531


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v3]

2023-06-20 Thread Thomas Stuefe
> Reported by [jarabe...@gmail.com](mailto:jarabe...@gmail.com) [1] 
> 
> jspawnhelper uses argv[0] to receive the fd string from the parent. That 
> breaks with conventions and trips over certain tools like binfmt_misc. 
> 
> For details, see linked ML discussion. 
> 
> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-June/107738.html

Thomas Stuefe has updated the pull request incrementally with one additional 
commit since the last revision:

  feedback Volker and Roger

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/14531/files
  - new: https://git.openjdk.org/jdk/pull/14531/files/1ee05c05..8f83fa27

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14531&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14531&range=01-02

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/14531.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14531/head:pull/14531

PR: https://git.openjdk.org/jdk/pull/14531


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v2]

2023-06-20 Thread Thomas Stuefe
On Tue, 20 Jun 2023 15:00:29 GMT, Roger Riggs  wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   correct comment
>
> src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 139:
> 
>> 137: ChildStuff c;
>> 138: struct stat buf;
>> 139: /* argv[1] contains the fd number to read all the child info */
> 
> I would prefer to also fix the use of `argc-1` below to match.  Its a pretty 
> odd form and might trip someone later.

I wondered about this, it looked so deliberate. So I wondered whether 
jspawnhelper is used outside of the posix_spawn context, but cannot find 
anything. Maybe historic?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14531#discussion_r1235414109


Re: RFR: 8310187: Improve Generational ZGC jtreg testing [v3]

2023-06-19 Thread Thomas Stuefe
On Mon, 19 Jun 2023 06:55:52 GMT, Axel Boldt-Christmas  
wrote:

>> The current implementation for testing generational ZGC with jtreg is 
>> implemented with a filter on the mode flag `ZGenerational`. Because of this 
>> only environments which set this flag explicitly will run most of the tests. 
>> So they get missed in Github Actions and for developers running jtreg 
>> locally without supplying the `ZGenerational` flag.
>> 
>> The proposed change here is to introduce two new jtreg requirement 
>> properties, `vm.gc.ZGenerational` and `vm.gc.ZSingelgen`. These flags will 
>> effectively behave the same as the existing `vm.gc.` flags but also take 
>> the specific ZGC mode in account.
>> 
>> If no gc flags are supplied to jtreg and the `vm.gc.Z` is true (the build 
>> includes ZGC) both `vm.gc.ZGenerational` and `vm.gc.ZSinglegen` will be true.
>> 
>> If `-XX:+UseZGC` is supplied then both `vm.gc.ZGenerational` and 
>> `vm.gc.ZSinglegen` will also be true.
>> 
>> If `-XX:{+,-}ZGenerational` is supplied then either  `vm.gc.ZGenerational` 
>> or `vm.gc.ZSinglegen` be true depending on the flags value.
>> 
>> And if `vm.gc.Z` is false both `vm.gc.ZGenerational` and `vm.gc.ZSinglegen` 
>> will be false.
>> 
>> This change also splits the relevant tests into two distinct runs for the 
>> two modes. And the respective test ids are set to `ZGenerational` or 
>> `ZSinglegen` to make it easier to distinguish the runs.
>> 
>> This also solves the issue that some compiler tests will never run with 
>> generational ZGC unless the `TEST_VM_FLAGLESS` is set. This is because the 
>> current filter `vm.opt.final.ZGenerational` requires the flag to be 
>> explicit, but these compiler tests uses `vm.flagless`. 
>> 
>> The introduction of  `vm.gc.ZGenerational` and `vm.gc.ZSinglegen` harmonizes 
>> the way you specify generational / single gen ZGC test with the way it is 
>> done for other gcs with `vm.gc.`
>> 
>> To support this feature the Whitebox API is extended with `isDefaultVMFlag` 
>> to enable checking if `ZGenerational` is default or not.
>> 
>> `vm.opt.final.ZGenerational` is still kept because we still have some 
>> reasons to filter based on the supplied flags. 
>> - `test/hotspot/jtreg/gc/cslocker/TestCSLocker.java` is disabled when 
>> running with ZGenerational
>> - `test/jdk/java/lang/ProcessBuilder/CloseRace.java` is ran with a different 
>> max heap size for ZGenerational, but it is not the intent to dispatch the 
>> test to both G1 and generational ZGC if Generational ZGC is not specified.  
>> 
>> `test/jdk/java/lang/management/MemoryMXBean/MemoryTest.java` was also 
>> changed to not filter but instead dispatch. However u...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add explicit id to all Skynet.java @test

I'm not the most qualified for ZGC, but I looked this over with a mind on the 
planned 21 backport. I wanted to check if any of the existing legacy ZGC tests 
had been changed, either accidentally or deliberately. This seems not be the 
case.

The remarks are small nits, feel free to ignore them.

How much time do we spend now in GHAs on the additional Zgenerational tests? In 
any case, this makes sense.

Cheers, Thomas

test/hotspot/jtreg/gc/stringdedup/TestStringDeduplicationTools.java line 298:

> 296: if (selectedGCMode != null) {
> 297: args.add(selectedGCMode);
> 298: }

just to be sure, maybe add a "must not be Z" assert in the else path?

test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java line 58:

> 56: public final static String ERR_MSG = "The saved state of 
> UseCompressedOops and UseCompressedClassPointers is different from runtime, 
> CDS will be disabled.";
> 57: public static void main(String... args) throws Exception {
> 58:  String zGenerational = args[0];

assert not null?

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14509#pullrequestreview-1485730970
PR Review Comment: https://git.openjdk.org/jdk/pull/14509#discussion_r1233738199
PR Review Comment: https://git.openjdk.org/jdk/pull/14509#discussion_r1233740188


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v2]

2023-06-19 Thread Thomas Stuefe
On Mon, 19 Jun 2023 06:07:26 GMT, Thomas Stuefe  wrote:

>> Reported by [jarabe...@gmail.com](mailto:jarabe...@gmail.com) [1] 
>> 
>> jspawnhelper uses argv[0] to receive the fd string from the parent. That 
>> breaks with conventions and trips over certain tools like binfmt_misc. 
>> 
>> For details, see linked ML discussion. 
>> 
>> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-June/107738.html
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   correct comment

(also ran test/jdk/java/lang/Process* locally on linux x64)

-

PR Comment: https://git.openjdk.org/jdk/pull/14531#issuecomment-1596756036


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v2]

2023-06-19 Thread Thomas Stuefe
On Mon, 19 Jun 2023 06:07:26 GMT, Thomas Stuefe  wrote:

>> Reported by [jarabe...@gmail.com](mailto:jarabe...@gmail.com) [1] 
>> 
>> jspawnhelper uses argv[0] to receive the fd string from the parent. That 
>> breaks with conventions and trips over certain tools like binfmt_misc. 
>> 
>> For details, see linked ML discussion. 
>> 
>> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-June/107738.html
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   correct comment

and @RogerRiggs of course. I think this is simple though, and the process tests 
succeeded in GHAs.

-

PR Comment: https://git.openjdk.org/jdk/pull/14531#issuecomment-1596732197


Re: RFR: JDK-8310265: (process) jspawnhelper should not use argv[0] [v2]

2023-06-18 Thread Thomas Stuefe
> Reported by [jarabe...@gmail.com](mailto:jarabe...@gmail.com) [1] 
> 
> jspawnhelper uses argv[0] to receive the fd string from the parent. That 
> breaks with conventions and trips over certain tools like binfmt_misc. 
> 
> For details, see linked ML discussion. 
> 
> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-June/107738.html

Thomas Stuefe has updated the pull request incrementally with one additional 
commit since the last revision:

  correct comment

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/14531/files
  - new: https://git.openjdk.org/jdk/pull/14531/files/2ec4f3b6..1ee05c05

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14531&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14531&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14531.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14531/head:pull/14531

PR: https://git.openjdk.org/jdk/pull/14531


RFR: JDK-8310265: (process) jspawnhelper should not use argv[0]

2023-06-17 Thread Thomas Stuefe
Reported by [jarabe...@gmail.com](mailto:jarabe...@gmail.com) [1] 

jspawnhelper uses argv[0] to receive the fd string from the parent. That breaks 
with conventions and trips over certain tools like binfmt_misc. 

For details, see linked ML discussion. 

[1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-June/107738.html

-

Commit messages:
 - JDK-8310265-jspawnhelper-argv0

Changes: https://git.openjdk.org/jdk/pull/14531/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14531&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8310265
  Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/14531.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14531/head:pull/14531

PR: https://git.openjdk.org/jdk/pull/14531


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v8]

2023-06-01 Thread Thomas Stuefe
On Tue, 30 May 2023 13:19:37 GMT, Volker Simonis  wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called 
>> `POSIX_SPAWN` launching mechanism (i.e. 
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
>> using `posix_spawn(3)` to firstly start a tiny helper program called 
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
>> command data from the parent Java process over a Unix pipe and finally 
>> executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before 
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` 
>> will block indefinitely on the reading end of the pipe. This is especially 
>> harmful if the parent process had open sockets, because in that case, the 
>> forked `jspawnhelper` process will inherit them and keep all the 
>> corresponding ports open effectively preventing other processes to bind to 
>> them. Notice that this is not an abstract scenario. We've observed this 
>> regularly in production with services which couldn't be restarted after a 
>> crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
>> writing end of the pipe which connects it with the parent Java process 
>> *before* starting to read from that pipe such that reads from the pipe can 
>> immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
>> write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
>> that's long gone.
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Enable JspawnhelperProtocol test on AIX as well because AIX also uses 
> posix_spawn

I gave this a cursory eye, unfortunately i'm too snowed in atm for more. It 
looks good to me, your reasoning makes sense. Very good job finding and fixing 
this, and describing the issue. 

About FD_Cloexec , thus can be done in a later change.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13956#pullrequestreview-1454968540


Re: RFR: JDK-8308288: Fix xlc17 clang warnings in shared code [v2]

2023-05-27 Thread Thomas Stuefe
On Sat, 27 May 2023 11:25:41 GMT, Thomas Stuefe  wrote:

>> This one is not just to get rid of a warning. We get real test errors 
>> because malloc gets replaced by vec_malloc in log tags.
>
>> This one is not just to get rid of a warning. We get real test errors 
>> because malloc gets replaced by vec_malloc in log tags.
> 
> That does not invalidate my argument, nor does it answer my question. Those 
> test errors could be also fixed by renaming the log tag. Maybe that would be 
> the better way. 
> 
> Also, I'm curious, why does it not complain about "free", which is a logtag 
> as well?

I am basically worried that undefining malloc, even if it seems harmless now, 
exposes us to difficult-to-investigate problems down the road, since it depends 
on how the libc devs will reform those macros in the future. I would prefer a 
simple solution that does not depend on how the libc includes evolve.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14146#discussion_r1207907447


Re: RFR: JDK-8308288: Fix xlc17 clang warnings in shared code [v2]

2023-05-27 Thread Thomas Stuefe
On Fri, 26 May 2023 20:27:12 GMT, Martin Doerr  wrote:

> This one is not just to get rid of a warning. We get real test errors because 
> malloc gets replaced by vec_malloc in log tags.

That does not invalidate my argument, nor does it answer my question. Those 
test errors could be also fixed by renaming the log tag. Maybe that would be 
the better way. 

Also, I'm curious, why does it not complain about "free", which is a logtag as 
well?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14146#discussion_r1207892378


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v7]

2023-05-26 Thread Thomas Stuefe
On Thu, 25 May 2023 15:25:40 GMT, Volker Simonis  wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called 
>> `POSIX_SPAWN` launching mechanism (i.e. 
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
>> using `posix_spawn(3)` to firstly start a tiny helper program called 
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
>> command data from the parent Java process over a Unix pipe and finally 
>> executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before 
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` 
>> will block indefinitely on the reading end of the pipe. This is especially 
>> harmful if the parent process had open sockets, because in that case, the 
>> forked `jspawnhelper` process will inherit them and keep all the 
>> corresponding ports open effectively preventing other processes to bind to 
>> them. Notice that this is not an abstract scenario. We've observed this 
>> regularly in production with services which couldn't be restarted after a 
>> crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
>> writing end of the pipe which connects it with the parent Java process 
>> *before* starting to read from that pipe such that reads from the pipe can 
>> immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
>> write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
>> that's long gone.
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix for failing CloseRace test

> Fixing the last bug :)
> 
> If we close the writing end of the pipe to jspawnhelper early in 
> `spawnChild()` right after the last write (which I still think is a good 
> idea), we have to use `c->childenv` rather than `childenv` when closing the 
> pipe descriptors at the end of `Java_java_lang_ProcessImpl_forkAndExec()` in 
> order to avoid double closing:
> 
> ```c++
> -closeSafely(childenv[0]);
> -closeSafely(childenv[1]);
> +/* We use 'c->childenv' here rather than 'childenv' because 
> 'spawnChild()' might have
> + * already closed 'c->childenv[1]' and signaled this by setting 
> 'c->childenv[1]' to '-1'.
> + * Otherwise 'c->childenv' and 'childenv' are the same because we just 
> copied 'childenv'
> + * to 'c->childenv' (with 'copyPipe()') before calling 'startChild()'. */
> +closeSafely(c->childenv[0]);
> +closeSafely(c->childenv[1]);
> ```
> 
> This also fixes `test/jdk/java/lang/ProcessBuilder/CloseRace.java` which 
> could failed sporadically the previous version of the change.

I missed that. Christ this stuff is complex :( 

Still think it would be cleaner and simpler to set the FD in the parent to 
CLOEXEC, before doing posix_spawn, and at the same time set the childStuff 
variable to -1 to prevent the child from attempting to re-close it. Reconsider?

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1564700036


Re: RFR: JDK-8308288: Fix xlc17 clang warnings in shared code [v2]

2023-05-26 Thread Thomas Stuefe
On Thu, 25 May 2023 18:18:43 GMT, Martin Doerr  wrote:

>> src/hotspot/share/utilities/globalDefinitions_xlc.hpp line 47:
>> 
>>> 45:   #undef malloc
>>> 46:   extern void *malloc(size_t) asm("vec_malloc");
>>> 47: #endif
>> 
>> Wow!  And I don't mean that in a good way.  I'm not questioning whether this 
>> is correct, just commenting
>> on how crazy it seems that such a thing might be needed.
>
> The crazy thing is that `malloc` is defined! That means all places where we 
> use the term malloc are getting replaced without such a workaround. (E.g. for 
> log tags.)

So, we do this only for malloc? Not for calloc, posix_memalign, realloc etc? 
What about free? 

As ugly as defining malloc is (and I remember QADRT), I hesitate about removing 
that define.

Removing that define and hard-coding it here assumes 1) our replacement is 
equivalent (ok, easy to check) 2) it will always be equivalent in future AIX 
versions 3) pointers it returns work with the unchanged free() and realloc() 
the system provides, and will always do so.

I don't know... I would not do this just to get rid of a warning.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14146#discussion_r1207078176


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-24 Thread Thomas Stuefe
On Wed, 24 May 2023 14:30:10 GMT, Volker Simonis  wrote:

>>> > Sorry, but I don't understand this argument. If we do a short read we 
>>> > will work with corrupted ChildStuff and SpawnInfo
>>> > structures. This can in the extreme case execute arbitrary code (e.g. if 
>>> > ChildStuff.argv is not fully read from the parent). You are
>>> > basically saying it is better to work on corrupted data rather than 
>>> > reporting an error.
>>> 
>>> No I am simply pointing out that this has changed more than just the issue 
>>> with close. And maybe a short-read does indicate data "corruption" and 
>>> maybe it should be a fatal error. But I don't know exactly how this might 
>>> manifest so perhaps there are benign short-reads that actually do happen. 
>>> Regardless it might be better to split this part out and focus on the close 
>>> issue here.
>> 
>> Given the purpose and implementation of the `readFully` function, I don't 
>> see how it can return anything other than an error or the full requested 
>> read length.
>
> @RogerRiggs , @tstuefe sorry for bothering you one more time, but I think 
> @mlichtblau brought up an interesting case yesterday which isn't fully 
> resolved by the current fix. In @mlichtblau example, 
> `Java_java_lang_ProcessImpl_forkAndExec()` got stuck waiting on the ping from 
> `jspawnhelper` which itself blocks in `readFully()` because of a truncated 
> `write()` from the parent.
> 
> The current fix already replaces `write()` with `restartableWrite()` which 
> handles the case where a `write()` call was interrupted **before** writing a 
> single byte. But there's a second case, namely when `write()` was interrupted 
> after it already wrote some (but not all) of the requested bytes. The 
> `write()` man page states:
> 
>> Note  that a successful `write()` may transfer fewer than `count` bytes. 
>> Such partial writes can occur for various reasons; for example, because .., 
>> or because a blocked `write()` to  a  socket, pipe, or similar was 
>> interrupted by a signal handler after it had transferred some, but before it 
>> had transferred all of the requested bytes.  In the event of a partial 
>> write, the caller can make another `write()` call to transfer the remaining 
>> bytes.
> 
> So in order to safely handle the second case as well, we have to replace 
> `restartableWrite()` with something like `writeFully()` and that's exactly 
> what this new commit does. I've also added a new test case which reproduces 
> the issue by simulating a truncated `write()`.
> 
> For your convenience I've tried to explain all the additional changes (except 
> for trivial cleanups and renamings in the test) below:
> 
> Thanks in advance,
> Volker
> 
> # `jspawnhelper.c`:
> 
> 
> +// The file descriptor for reporting errors back to our parent we got on 
> the command
> +// line should be the same like the one in the ChildStuff struct we've 
> just read.
> +assert(c.fail[1] == fdout);
> 
> 
> Just trying to be overly cautious.
> 
> # childproc.{c,h}
> 
> 
> -ssize_t restartableWrite(int fd, const void *buf, size_t count);
> +ssize_t writeFully(int fd, const void *buf, size_t count);
> 
> 
> 
> -ssize_t
> -restartableWrite(int fd, const void *buf, size_t count)
> ...
> -}
> 
> +ssize_t
> +writeFully(int fd, const void *buf, size_t nbyte)
> +#ifdef DEBUG
> ...
> +#endif
> ...
> +}
> 
> 
> Replace `restartableWrite()` with `writeFully()` which is basically a copy of 
> `readFully()` (with additional testing code for jtreg). This handles both 
> cases, when `write()` is interrupted before having written a single byte and 
> returns EINTR, as well as the case where...

@simonis I try to look at this before the weekend, but no guarantees. Your 
analysis looks sound, but the code is sensitive and I want to give it my full 
attention.

One remark though, you should change JBS title and issue text to reflect the 
growing number of issues this patch is fixing.

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1561750301


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v4]

2023-05-23 Thread Thomas Stuefe
On Mon, 22 May 2023 10:22:16 GMT, Volker Simonis  wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called 
>> `POSIX_SPAWN` launching mechanism (i.e. 
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
>> using `posix_spawn(3)` to firstly start a tiny helper program called 
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
>> command data from the parent Java process over a Unix pipe and finally 
>> executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before 
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` 
>> will block indefinitely on the reading end of the pipe. This is especially 
>> harmful if the parent process had open sockets, because in that case, the 
>> forked `jspawnhelper` process will inherit them and keep all the 
>> corresponding ports open effectively preventing other processes to bind to 
>> them. Notice that this is not an abstract scenario. We've observed this 
>> regularly in production with services which couldn't be restarted after a 
>> crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
>> writing end of the pipe which connects it with the parent Java process 
>> *before* starting to read from that pipe such that reads from the pipe can 
>> immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
>> write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
>> that's long gone.
>
> Volker Simonis has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains five commits:
> 
>  - Merge branch 'master' into JDK-8307990
>  - Address comments from tstuefe and RogerRiggs
>  - REALLY adding the test :)
>  - Added test case
>  - 8307990: jspawnhelper must close its writing side of a pipe before reading 
> from it

Looks good. 

Please enable for muslc (see remark). Otherwise, all other remarks are nits and 
I leave it up to you whether you accept them. I don't need another review.

src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 148:

> 146: r = sscanf (argv[argc-1], "%d:%d:%d", &fdinr, &fdinw, &fdout);
> 147: if (r == 3 && fcntl(fdinr, F_GETFD) != -1 && fcntl(fdinw, F_GETFD) 
> != -1) {
> 148: fstat(fdinr, &buf);

Preexisting, and possibly for another RFE:
- why don't we test fdout as well?
- we probably could make sure the output fds are valid for us (S_ISREG | 
S_ISFIFO | (possibly?) S_ISSOCK)

test/jdk/java/lang/ProcessBuilder/JspawnhelperProtocol.java line 28:

> 26:  * @test
> 27:  * @bug 8307990
> 28:  * @requires (os.family == "linux" & !vm.musl)

Muslc supports posix_spawn.

I tested your patch on Alpine, it works and the test works too. Please enable 
for musl.

test/jdk/java/lang/ProcessBuilder/JspawnhelperProtocol.java line 92:

> 90: }
> 91: }
> 92: 

Small nits, mainly to make this test easier to understand to the casual reader:
- consistent naming: we have "simulateCrashInChild" and 
"simulateCrashInParent", good, but then we have "childCode", which is actually 
the code executed in the parent, right?
- simulateCrashInChild and simulateCrashInParent could be merged into one 
function that does:
  - spawn parent with env var
  - read output, scan for `"posix_spawn:XXX"`
  - parent must have exit code != 0 and should not have timed out
  - if XXX is not 0, check grandchild pid (must not be a live process)

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13956#pullrequestreview-1438535260
PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1201539615
PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1201548057
PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1201770607


Integrated: JDK-8308350: Increase buffer size for jspawnhelper arguments

2023-05-18 Thread Thomas Stuefe
On Thu, 18 May 2023 07:08:57 GMT, Thomas Stuefe  wrote:

> Trivial fix for a small problem.
> 
> jspawnhelper gets handed several file descriptors as arguments. The buffer 
> size for this string is too small (7 chars per fd) to print out every 
> conceivable int. This will overun the buffer if we happen to have fds larger 
> than (printed size) 7 characters. This could lead to crashes or malfunctions 
> if the parent VM has opened a large amount of file descriptors.
> 
> Note that on Linux, this can normally not happen since the kernel limits the 
> number of open file descriptors per process to 1M, and these fds are still 
> printable within the limits of this buffer. It is possible to get more fds 
> per process, but only via kernel patch. But we still should not rely on that. 
> And there is also still MacOS using the same mechanism.

This pull request has now been integrated.

Changeset: 808dc1b0
Author:Thomas Stuefe 
URL:   
https://git.openjdk.org/jdk/commit/808dc1b047c5a67b7397d47e38495efde022317d
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8308350: Increase buffer size for jspawnhelper arguments

Reviewed-by: rriggs

-

PR: https://git.openjdk.org/jdk/pull/14045


Re: RFR: JDK-8308350: Increase buffer size for jspawnhelper arguments [v2]

2023-05-18 Thread Thomas Stuefe
On Thu, 18 May 2023 13:46:54 GMT, Roger Riggs  wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains two additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'openjdk:master' into 
>> JDK-8308350-Increase-buffer-size-for-jspawnhelper-arguments
>>  - JDK-8308350-Increase-buffer-size-for-jspawnhelper-arguments
>
> Trivial

Thanks @RogerRiggs !

-

PR Comment: https://git.openjdk.org/jdk/pull/14045#issuecomment-1553102627


Re: RFR: JDK-8308350: Increase buffer size for jspawnhelper arguments [v2]

2023-05-18 Thread Thomas Stuefe
> Trivial fix for a small problem.
> 
> jspawnhelper gets handed several file descriptors as arguments. The buffer 
> size for this string is too small (7 chars per fd) to print out every 
> conceivable int. This will overun the buffer if we happen to have fds larger 
> than (printed size) 7 characters. This could lead to crashes or malfunctions 
> if the parent VM has opened a large amount of file descriptors.
> 
> Note that on Linux, this can normally not happen since the kernel limits the 
> number of open file descriptors per process to 1M, and these fds are still 
> printable within the limits of this buffer. It is possible to get more fds 
> per process, but only via kernel patch. But we still should not rely on that. 
> And there is also still MacOS using the same mechanism.

Thomas Stuefe has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains two additional 
commits since the last revision:

 - Merge branch 'openjdk:master' into 
JDK-8308350-Increase-buffer-size-for-jspawnhelper-arguments
 - JDK-8308350-Increase-buffer-size-for-jspawnhelper-arguments

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/14045/files
  - new: https://git.openjdk.org/jdk/pull/14045/files/6fda35fb..dbce7ce1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14045&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14045&range=00-01

  Stats: 23 lines in 2 files changed: 23 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/14045.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14045/head:pull/14045

PR: https://git.openjdk.org/jdk/pull/14045


Re: RFR: JDK-8308350: Increase buffer size for jspawnhelper arguments

2023-05-18 Thread Thomas Stuefe
On Thu, 18 May 2023 07:08:57 GMT, Thomas Stuefe  wrote:

> Trivial fix for a small problem.
> 
> jspawnhelper gets handed several file descriptors as arguments. The buffer 
> size for this string is too small (7 chars per fd) to print out every 
> conceivable int. This will overun the buffer if we happen to have fds larger 
> than (printed size) 7 characters. This could lead to crashes or malfunctions 
> if the parent VM has opened a large amount of file descriptors.
> 
> Note that on Linux, this can normally not happen since the kernel limits the 
> number of open file descriptors per process to 1M, and these fds are still 
> printable within the limits of this buffer. It is possible to get more fds 
> per process, but only via kernel patch. But we still should not rely on that. 
> And there is also still MacOS using the same mechanism.

GHAs are all suffering from infrastructure problems.

-

PR Comment: https://git.openjdk.org/jdk/pull/14045#issuecomment-1552625592


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v2]

2023-05-18 Thread Thomas Stuefe
On Wed, 17 May 2023 16:12:15 GMT, Volker Simonis  wrote:

>> src/java.base/unix/native/libjava/ProcessImpl_md.c line 490:
>> 
>>> 488: pid_t resultPid;
>>> 489: int i, offset, rval, bufsize, magic;
>>> 490: char *buf, buf1[24];
>> 
>> Since you are here could you please increase buffer size to something safe? 
>> Max len of INT_MIN is 11 bytes, so lets have at least 3*11 + 3 bytes.
>
> OK, so let it be (until we get 64-bit integers and file descriptors :)

Ok, if you let it be, I fix it in a separate RFE. Cleaner that way anyway. 

I filed https://bugs.openjdk.org/browse/JDK-8308350 and 
https://github.com/openjdk/jdk/pull/14045. 

Should I happen to push before you, this should be easily adaptable for you.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1197488813


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v3]

2023-05-18 Thread Thomas Stuefe
On Wed, 17 May 2023 16:00:23 GMT, Volker Simonis  wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called 
>> `POSIX_SPAWN` launching mechanism (i.e. 
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
>> using `posix_spawn(3)` to firstly start a tiny helper program called 
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
>> command data from the parent Java process over a Unix pipe and finally 
>> executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before 
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` 
>> will block indefinitely on the reading end of the pipe. This is especially 
>> harmful if the parent process had open sockets, because in that case, the 
>> forked `jspawnhelper` process will inherit them and keep all the 
>> corresponding ports open effectively preventing other processes to bind to 
>> them. Notice that this is not an abstract scenario. We've observed this 
>> regularly in production with services which couldn't be restarted after a 
>> crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
>> writing end of the pipe which connects it with the parent Java process 
>> *before* starting to read from that pipe such that reads from the pipe can 
>> immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
>> write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
>> that's long gone.
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   REALLY adding the test :)

src/java.base/unix/native/libjava/childproc.c line 413:

> 411: const char* env = getenv("JTREG_JSPAWNHELPER_PROTOCOL_TEST");
> 412: if (env != NULL && atoi(env) == stage) {
> 413: printf("posix_spawn:%d\n", child);

I would do an `_exit()` here, not `exit()`, to prevent exit handlers from 
running. You want to simulate a sudden death. Or a `kill(getpid(), 9);`

Combine this with an explicit `fflush(stdout)` before to get your test output 
out.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1197409076


RFR: JDK-8308350: Increase buffer size for jspawnhelper arguments

2023-05-18 Thread Thomas Stuefe
Trivial fix for a small problem.

jspawnhelper gets handed several file descriptors as arguments. The buffer size 
for this string is too small (7 chars per fd) to print out every conceivable 
int. This will overun the buffer if we happen to have fds larger than (printed 
size) 7 characters. This could lead to crashes or malfunctions if the parent VM 
has opened a large amount of file descriptors.

Note that on Linux, this can normally not happen since the kernel limits the 
number of open file descriptors per process to 1M, and these fds are still 
printable within the limits of this buffer. It is possible to get more fds per 
process, but only via kernel patch. But we still should not rely on that. And 
there is also still MacOS using the same mechanism.

-

Commit messages:
 - JDK-8308350-Increase-buffer-size-for-jspawnhelper-arguments

Changes: https://git.openjdk.org/jdk/pull/14045/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14045&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8308350
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14045.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14045/head:pull/14045

PR: https://git.openjdk.org/jdk/pull/14045


Re: RFR: 8308347: [s390x] build broken after JDK-8304913

2023-05-17 Thread Thomas Stuefe
On Thu, 18 May 2023 06:12:19 GMT, Amit Kumar  wrote:

> s390x needs to be recognised as `S390`. After 
> [JDK-8304913](https://bugs.openjdk.org/browse/JDK-8304913) during build I was 
> getting this PluginException: 
> 
> jdk.tools.jlink.plugin.PluginException: ModuleTarget is malformed: No enum 
> constant jdk.internal.util.Architecture.S390X
> at 
> jdk.jlink/jdk.tools.jlink.builder.DefaultImageBuilder.storeFiles(DefaultImageBuilder.java:181)
> at 
> jdk.jlink/jdk.tools.jlink.internal.ImagePluginStack.storeFiles(ImagePluginStack.java:486)
> at 
> jdk.jlink/jdk.tools.jlink.internal.ImageFileCreator.writeImage(ImageFileCreator.java:168)
> at 
> jdk.jlink/jdk.tools.jlink.internal.ImageFileCreator.create(ImageFileCreator.java:100)
> at 
> jdk.jlink/jdk.tools.jlink.internal.JlinkTask$ImageHelper.retrieve(JlinkTask.java:860)
> at 
> jdk.jlink/jdk.tools.jlink.internal.ImagePluginStack.operate(ImagePluginStack.java:194)
> at 
> jdk.jlink/jdk.tools.jlink.internal.JlinkTask.createImage(JlinkTask.java:423)
> at 
> jdk.jlink/jdk.tools.jlink.internal.JlinkTask.run(JlinkTask.java:286)
> at jdk.jlink/jdk.tools.jlink.internal.Main.run(Main.java:56)
> at jdk.jlink/jdk.tools.jlink.internal.Main.main(Main.java:34)
> 
> 
> Builds tested :`fastdebug, release`.

Looks good and trivial. Thanks for fixing.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14043#pullrequestreview-1432126181


Re: RFR: 8308270: ARM32 build broken after JDK-8304913

2023-05-17 Thread Thomas Stuefe
On Wed, 17 May 2023 08:45:23 GMT, Boris Ulasevich  
wrote:

> Build issue happens after https://git.openjdk.org/jdk/pull/13585:
> 
> $ make bundles
> ...
> jdk.tools.jlink.plugin.PluginException: ModuleTarget is malformed: No enum 
> constant jdk.internal.util.Architecture.ARM
>   at 
> jdk.jlink/jdk.tools.jlink.builder.DefaultImageBuilder.storeFiles(DefaultImageBuilder.java:181)
>   at 
> jdk.jlink/jdk.tools.jlink.internal.ImagePluginStack.storeFiles(ImagePluginStack.java:486)
>   at 
> jdk.jlink/jdk.tools.jlink.internal.ImageFileCreator.writeImage(ImageFileCreator.java:168)
>   at 
> jdk.jlink/jdk.tools.jlink.internal.ImageFileCreator.create(ImageFileCreator.java:100)
>   at 
> jdk.jlink/jdk.tools.jlink.internal.JlinkTask$ImageHelper.retrieve(JlinkTask.java:860)
>   at 
> jdk.jlink/jdk.tools.jlink.internal.ImagePluginStack.operate(ImagePluginStack.java:194)
>   at 
> jdk.jlink/jdk.tools.jlink.internal.JlinkTask.createImage(JlinkTask.java:423)
>   at jdk.jlink/jdk.tools.jlink.internal.JlinkTask.run(JlinkTask.java:286)
>   at jdk.jlink/jdk.tools.jlink.internal.Main.run(Main.java:56)
>   at jdk.jlink/jdk.tools.jlink.internal.Main.main(Main.java:34)
> 
> 
> With this PR I follow the https://github.com/openjdk/jdk/pull/13357 change to 
> add a missing ARM32 architecture to the list of supported architectures: 
> `X64, X86, AARCH64, RISCV64, S390, PPC64`.

Looks good. Thanks for fixing this.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14028#pullrequestreview-1431433623


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v2]

2023-05-17 Thread Thomas Stuefe
On Wed, 17 May 2023 16:05:58 GMT, Volker Simonis  wrote:

>> src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 140:
>> 
>>> 138: struct stat buf;
>>> 139: /* argv[0] contains the fd number to read all the child info */
>>> 140: int r, fdinr, fdinw, fdout;
>> 
>> Since you are here, can you init these?
>
> What would be the right value to initialize them to and what would it help? 
> We use `sscanf()` right in the following line and exit if `sscanf()` is not 
> assigning all three values. I think that should be good enough.

I usually init fds with -1. It's mostly a protection against bitrot and general 
cleanliness.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1196806451


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v2]

2023-05-17 Thread Thomas Stuefe
On Wed, 17 May 2023 12:33:48 GMT, Volker Simonis  wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called 
>> `POSIX_SPAWN` launching mechanism (i.e. 
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
>> using `posix_spawn(3)` to firstly start a tiny helper program called 
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
>> command data from the parent Java process over a Unix pipe and finally 
>> executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before 
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` 
>> will block indefinitely on the reading end of the pipe. This is especially 
>> harmful if the parent process had open sockets, because in that case, the 
>> forked `jspawnhelper` process will inherit them and keep all the 
>> corresponding ports open effectively preventing other processes to bind to 
>> them. Notice that this is not an abstract scenario. We've observed this 
>> regularly in production with services which couldn't be restarted after a 
>> crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
>> writing end of the pipe which connects it with the parent Java process 
>> *before* starting to read from that pipe such that reads from the pipe can 
>> immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
>> write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
>> that's long gone.
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Added test case

The fixes look reasonable, though I would prefer them to be split over 
individual RFEs. Would make backporting easier too.

I would still like an automated test as part of the jtreg suite. Please see my 
test proposal, that thing should be adaptable to work with your fix.

src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 140:

> 138: struct stat buf;
> 139: /* argv[0] contains the fd number to read all the child info */
> 140: int r, fdinr, fdinw, fdout;

Since you are here, can you init these?

src/java.base/unix/native/libjava/ProcessImpl_md.c line 490:

> 488: pid_t resultPid;
> 489: int i, offset, rval, bufsize, magic;
> 490: char *buf, buf1[24];

Since you are here could you please increase buffer size to something safe? Max 
len of INT_MIN is 11 bytes, so lets have at least 3*11 + 3 bytes.

-

PR Review: https://git.openjdk.org/jdk/pull/13956#pullrequestreview-1430748541
PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1196551493
PR Review Comment: https://git.openjdk.org/jdk/pull/13956#discussion_r1196586907


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-16 Thread Thomas Stuefe
On Tue, 16 May 2023 12:07:10 GMT, David Holmes  wrote:

> > > I wonder if @Martin-Buchholz is able to look at this one?
> > > My concern with changes like this is that they fix an issue but then have 
> > > unexpected side-effects themselves.
> > 
> > 
> > Are there any specific concerns you have?
> 
> There seems to be more going on here than just closing the write side of the 
> pipe. We will now error out if any of the readFully's do a short read, rather 
> than just when reporting an error - which on the surface seems like a good 
> thing, but what if harmless short-reads can actually happen in some contexts?

I agree, it would be clearer to just to the close.

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1549575144


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-16 Thread Thomas Stuefe
On Mon, 15 May 2023 16:11:46 GMT, Volker Simonis  wrote:

> > 2. I think you don't actually have to hand in the in-pipe-read-end fd 
> > number via command line arg, just to have the child to close it. You could 
> > just, in the parent, set the fd to FD_CLOEXEC. Since posix_spawn() exec's 
> > the spawn helper, this would close the file descriptor for you. Extending 
> > this thought, you could do this with all pipe ends you want to cauterize in 
> > the child process.
> 
> I've deliberately not used `FD_CLOEXEC` because the file descriptor closing 
> code in `childProcess()` is shared by all launch mechanisms. So to make it 
> work, I'd had to reset the corresponding file descriptors to `-1` in the 
> child anyway.
> 
> I therefor felt the current fix is smaller, simpler and easier to understand.

I don't understand what you mean.

A) vfork, fork:

`(parent)->fork()->(child)->childProcess()`

B) posix_spawn:

`(parent)->posix_spawn()(does clone() then exec())->jspawnhelper 
main()->childProcess()`

With your patch, you close the offending fd in `jspawnhelper main()`. With my 
suggestion, it would be closed on the `exec` call `posix_spawn` does. Neither 
variant affects the working of vfork/fork mode. Am I missing something?

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1549274987


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-16 Thread Thomas Stuefe
On Tue, 16 May 2023 07:24:31 GMT, David Holmes  wrote:

> I wonder if @Martin-Buchholz is able to look at this one?
> 
> My concern with changes like this is that they fix an issue but then have 
> unexpected side-effects themselves.

Are there any specific concerns you have?

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1549179059


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-16 Thread Thomas Stuefe
On Mon, 15 May 2023 18:45:24 GMT, Roger Riggs  wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called 
>> `POSIX_SPAWN` launching mechanism (i.e. 
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
>> using `posix_spawn(3)` to firstly start a tiny helper program called 
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
>> command data from the parent Java process over a Unix pipe and finally 
>> executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before 
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` 
>> will block indefinitely on the reading end of the pipe. This is especially 
>> harmful if the parent process had open sockets, because in that case, the 
>> forked `jspawnhelper` process will inherit them and keep all the 
>> corresponding ports open effectively preventing other processes to bind to 
>> them. Notice that this is not an abstract scenario. We've observed this 
>> regularly in production with services which couldn't be restarted after a 
>> crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
>> writing end of the pipe which connects it with the parent Java process 
>> *before* starting to read from that pipe such that reads from the pipe can 
>> immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
>> write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
>> that's long gone.
>
> I think writing a test is worth a little bit of time but getting a clean test 
> run using the KILLTEST might be a bit tricky. The exit(-1) of the parent 
> would be detected by jtreg as a failure. And verifying that the spawned child 
> runs to completion might be difficult without writing a driver app to monitor 
> the child; though it may be sufficient to know just that it does not hang and 
> timeout. When the parent dies, the child will be re-parented.
> And it would only be run in debug builds.

@RogerRiggs @simonis 

Okay, I wrote a jtreg test case for the issue, just to see if I could. Rebased 
atop of Volkers change, you can just take the commit if you want:

https://github.com/tstuefe/jdk/tree/Added-TestCase-For-Volker
https://github.com/openjdk/jdk/commit/9119b442dac8dd95228d47d70871ee59862649ce

Test case fails immediately (so, no relying on timeouts) if Volkers patch is 
missing. Succeeds immediately with Volkers patch. I also tried commenting out 
the fixing close() call, test fails immediately.

Patch relies on jspawnhelper to write a small marker file on communication 
error with parent if KILLTEST is set. Only in debug code. I think this is very 
reliable. No second-guessing aliveness of the child and worrying about PID 
reuse. Re-parenting is no issue either. And I think such a file make be a good 
idea anyway to analyze handshake errors with jspawnhelper.

Tested on Linux x64.

@simonis, feel free to add me as contributor if you take this test. I spent way 
to much time on this :-)

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1549172824


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-15 Thread Thomas Stuefe
On Fri, 12 May 2023 15:24:19 GMT, Volker Simonis  wrote:

> Since JDK13, executing commands in a sub-process defaults to the so called 
> `POSIX_SPAWN` launching mechanism (i.e. 
> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
> using `posix_spawn(3)` to firstly start a tiny helper program called 
> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
> command data from the parent Java process over a Unix pipe and finally 
> executes (i.e. `execvp(3)`) the requested command.
> 
> In cases where the parent process terminates abnormally before `jspawnhelper` 
> has read all the expected data from the pipe, `jspawnhelper` will block 
> indefinitely on the reading end of the pipe. This is especially harmful if 
> the parent process had open sockets, because in that case, the forked 
> `jspawnhelper` process will inherit them and keep all the corresponding ports 
> open effectively preventing other processes to bind to them. Notice that this 
> is not an abstract scenario. We've observed this regularly in production with 
> services which couldn't be restarted after a crash after migrating to JDK 17.
> 
> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
> writing end of the pipe which connects it with the parent Java process 
> *before* starting to read from that pipe such that reads from the pipe can 
> immediately return with EOF if the parent process terminates abnormally.
> 
> Also did some cleanup:
>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
> write the command data from the parent process to `jspawnhelper`
>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
> that's long gone.

> > > > Looks ok.
> > > > Is it practical to write a test for this situation? Can I assume you've 
> > > > validated the improvement as a remedy for the observed hangs?
> > > 
> > > 
> > > I thought about a test but couldn't come up with a practical way to write 
> > > it. The JVM has to exit in the time frame after the `posix_spawn()` and 
> > > before `jspawnhelper` has read its data from the parent. In production 
> > > this usually happens due to memory constraints on the particular host 
> > > which let the OOM-killer kill the JVM process because it is the biggest 
> > > memory consumer.
> > 
> > 
> > A regression test would be good.
> > This can be very simply a runtime switch that kills the parent process at 
> > vulnerable times. Sely be my example here:
> > [master...tstuefe:jdk:test-for-parent-premature-death](https://github.com/openjdk/jdk/compare/master...tstuefe:jdk:test-for-parent-premature-death)
> > Using that (KILLTEST=1), I was able to reproduce your problem with the 
> > hanging child. Using this to build a jtreg test is not hard.
> > Cheers, Thomas
> 
> Do you really think it makes sense to add such test code to the native 
> runtime libraries?

Debug only? Sure. The fact that this error is in there since JDK 13 is scary.

> 
> I'm not sure. What do others think?

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1548181950


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-15 Thread Thomas Stuefe
On Mon, 15 May 2023 12:31:49 GMT, Volker Simonis  wrote:

> > Looks ok.
> > Is it practical to write a test for this situation? Can I assume you've 
> > validated the improvement as a remedy for the observed hangs?
> 
> I thought about a test but couldn't come up with a practical way to write it. 
> The JVM has to exit in the time frame after the `posix_spawn()` and before 
> `jspawnhelper` has read its data from the parent. In production this usually 
> happens due to memory constraints on the particular host which let the 
> OOM-killer kill the JVM process because it is the biggest memory consumer.
> 

A regression test would be good.

This can be very simply a runtime switch that kills the parent process at 
vulnerable times. See my example here:

https://github.com/openjdk/jdk/compare/master...tstuefe:jdk:test-for-parent-premature-death

Using that (KILLTEST=1), I was able to reproduce your problem with the hanging 
child. Using this to build a jtreg test is not hard.

Cheers, Thomas

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1548113927


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-15 Thread Thomas Stuefe
On Fri, 12 May 2023 15:24:19 GMT, Volker Simonis  wrote:

> Since JDK13, executing commands in a sub-process defaults to the so called 
> `POSIX_SPAWN` launching mechanism (i.e. 
> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
> using `posix_spawn(3)` to firstly start a tiny helper program called 
> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
> command data from the parent Java process over a Unix pipe and finally 
> executes (i.e. `execvp(3)`) the requested command.
> 
> In cases where the parent process terminates abnormally before `jspawnhelper` 
> has read all the expected data from the pipe, `jspawnhelper` will block 
> indefinitely on the reading end of the pipe. This is especially harmful if 
> the parent process had open sockets, because in that case, the forked 
> `jspawnhelper` process will inherit them and keep all the corresponding ports 
> open effectively preventing other processes to bind to them. Notice that this 
> is not an abstract scenario. We've observed this regularly in production with 
> services which couldn't be restarted after a crash after migrating to JDK 17.
> 
> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
> writing end of the pipe which connects it with the parent Java process 
> *before* starting to read from that pipe such that reads from the pipe can 
> immediately return with EOF if the parent process terminates abnormally.
> 
> Also did some cleanup:
>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
> write the command data from the parent process to `jspawnhelper`
>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
> that's long gone.

Thank you Volker for the explanations.

Some more notes:

1) I think the child should probably close (but without error handling) the 
*read* end of the fail pipe too *before* sending out the liveness ping.

https://github.com/openjdk/jdk/blob/020a60e6449749ca979c1ace3af7db57f4c53a5f/src/java.base/unix/native/libjava/childproc.c#L335

Because the same problem could arise if the parent were to terminate. Child 
writes to the pipe, and theoretically could block on a full pipe since it holds 
open an fd to the read end. I admit this is highly theoretical since the pipe 
would have to be full for this to happen, and nobody wrote to the pipe yet, and 
the aliveness ping is just an integer (4 bytes). But to be more correct, and 
since the fix is so trivial, the fail pipe read end should be closed in the 
child process before writing to fail pipe.

2) I think you don't actually have to hand in the in-pipe-read-end fd number 
via command line arg, just to have the child to close it. You could just, in 
the parent, set the fd to FD_CLOEXEC. Since posix_spawn() exec's the spawn 
helper, this would close the file descriptor for you. Extending this thought, 
you could do this with all pipe ends you want to cauterize in the child process.

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1547994614


Re: RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

2023-05-14 Thread Thomas Stuefe
On Fri, 12 May 2023 15:24:19 GMT, Volker Simonis  wrote:

> Since JDK13, executing commands in a sub-process defaults to the so called 
> `POSIX_SPAWN` launching mechanism (i.e. 
> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
> using `posix_spawn(3)` to firstly start a tiny helper program called 
> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
> command data from the parent Java process over a Unix pipe and finally 
> executes (i.e. `execvp(3)`) the requested command.
> 
> In cases where the parent process terminates abnormally before `jspawnhelper` 
> has read all the expected data from the pipe, `jspawnhelper` will block 
> indefinitely on the reading end of the pipe. This is especially harmful if 
> the parent process had open sockets, because in that case, the forked 
> `jspawnhelper` process will inherit them and keep all the corresponding ports 
> open effectively preventing other processes to bind to them. Notice that this 
> is not an abstract scenario. We've observed this regularly in production with 
> services which couldn't be restarted after a crash after migrating to JDK 17.
> 
> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
> writing end of the pipe which connects it with the parent Java process 
> *before* starting to read from that pipe such that reads from the pipe can 
> immediately return with EOF if the parent process terminates abnormally.
> 
> Also did some cleanup:
>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
> write the command data from the parent process to `jspawnhelper`
>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
> that's long gone.

Trying to understand this, and the scope of the problem:

- Parent process opens "in" pipe. 
- Now owns read and write end of pipe. 
- Parent forks jspawnhelper. 
- jspawnhelper inherits handles. Now there are four open file descriptors to 
this pipe.
- Parent dies and releases its handles
- jspawnhelper gets thrown out of its first "readFully", but enters the next 
"readFully". It still owns both read and write end of the pipe. So it will wait 
endlessly. We only get an EOF from a pipe read when all write ends are closed.

Is this the problem?

I wonder whether the parent may not have the same problem at the other end. It 
may be prudent to close the child's ends of the three pipes right after calling 
startChild() 
(https://github.com/openjdk/jdk/blob/020a60e6449749ca979c1ace3af7db57f4c53a5f/src/java.base/unix/native/libjava/ProcessImpl_md.c#LL681C17-L681C27).

-

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1546954075


Re: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows [v2]

2023-05-14 Thread Thomas Stuefe
On Tue, 2 May 2023 12:23:23 GMT, Julian Waters  wrote:

>> Windows no longer uses I64d anywhere in their newer compilers, instead using 
>> the conforming lld specifiers. Minor cleanup here in JLI code to reflect that
>
> Julian Waters has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   HotSpot should also use lld instead of I64d

Good

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13740#pullrequestreview-1425543162


Re: RFR: 8304915: Create jdk.internal.util.Architecture enum and apply [v7]

2023-04-08 Thread Thomas Stuefe
On Fri, 7 Apr 2023 21:15:11 GMT, Roger Riggs  wrote:

> Refactored the way that build time architecture values are mapped to the 
> Architecture enum. 

Thank you for this, Roger.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13357#discussion_r1161075537


Re: RFR: 8304915: Create jdk.internal.util.Architecture enum and apply [v7]

2023-04-07 Thread Thomas Stuefe
On Fri, 7 Apr 2023 10:52:47 GMT, Jaikiran Pai  wrote:

>> src/java.base/share/classes/jdk/internal/util/Architecture.java line 41:
>> 
>>> 39: AARCH64(),
>>> 40: RISCV64(),
>>> 41: S390(),
>> 
>> Why getting rid of the X in s390x? There has not been an s390 linux kernel 
>> in almost ten years.
>
> Hello Thomas, that change was based on the review comment here 
> https://github.com/openjdk/jdk/pull/13357#discussion_r1159810942

Okay, Lutz is the expert here. Sorry for the noise.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13357#discussion_r1160637280


Re: RFR: 8304915: Create jdk.internal.util.Architecture enum and apply [v7]

2023-04-07 Thread Thomas Stuefe
On Thu, 6 Apr 2023 19:25:19 GMT, Roger Riggs  wrote:

>> Define an internal jdk.internal.util.Architecture enumeration and static 
>> methods to replace uses of the system property `os.arch`.
>> The enumeration values are defined to match those used in the build.
>> The initial values are: `X64, X86, IA64, ARM, AARCH64, RISCV64, S390X, 
>> PPC64LE`
>> Note that `amd64` and `x86_64` in the build are represented by `X64`.
>> The values of the system property `os.arch` is unchanged.
>> 
>> The API is similar to the jdk.internal.util.OperatingSystem enum created by 
>> #[12931](https://git.openjdk.org/jdk/pull/12931).
>> Uses in `java.base` and a few others are included but other modules will be 
>> done in separate PRs.
>
> Roger Riggs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove unneeded qualified export from java.base to jdk.attach

src/java.base/share/classes/jdk/internal/util/Architecture.java line 41:

> 39: AARCH64(),
> 40: RISCV64(),
> 41: S390(),

Why getting rid of the X in s390x? There has not been an s390 linux kernel in 
almost ten years.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13357#discussion_r1160610072


Re: RFR: 8304915: Create jdk.internal.util.Architecture enum and apply [v7]

2023-04-07 Thread Thomas Stuefe
On Fri, 7 Apr 2023 07:42:51 GMT, Alan Bateman  wrote:

> > What I meant was: You define PPCLE. PPCLE specifies ppc, little endian. We 
> > also have PPC big-endian, it is used by AIX (and you can also run Linux 
> > with PPC big-endian). Why omit that?
> > os.arch for AIX is "ppc64".
> 
> So I think you are saying that there are aix-ppc64 and linux-ppc64le builds 
> so this enum needs to have values for both ppc64 and ppc64le.

Yes, precisely.

-

PR Comment: https://git.openjdk.org/jdk/pull/13357#issuecomment-1500115177


Re: RFR: 8304915: Create jdk.internal.util.Architecture enum and apply [v7]

2023-04-06 Thread Thomas Stuefe
On Thu, 6 Apr 2023 20:39:41 GMT, Roger Riggs  wrote:

> > What about PPC (big endian)? Used on AIX?
> 
> I'm not aware of any code (in OpenJDK) related to big/little endian that is 
> derived from `os.arch`.
> 
> > On Arm, it may be useful to know whether we built for thumb mode (We 
> > recently had this problem in tests).
> 
> For tests, there is the test library `jdk.test.lib.Platform` class with its 
> own set of predicates.
> 
> For the moment, the idea is to do the minimum to replace OpenJDK internal 
> uses of `os.arch`.

What I meant was: You define PPCLE. PPCLE specifies ppc, little endian. We also 
have PPC big-endian, it is used by AIX (and you can also run Linux with PPC 
big-endian). Why omit that?

os.arch for AIX is "ppc64". It is even used in our codebase, see 
https://github.com/openjdk/jdk/blob/c67bbcea92919fea9b6f7bbcde8ba4488289d174/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#L51

> > Another question, how does this work with Zero?
> 
> Zero is orthogonal to architecture; the interpreter is compiled for a 
> specific target architecture.

Sorry, I was not clear. Zero is the vehicle to get the OpenJDK to build and run 
on hardware we don't directly support. It is not only used in bootstrapping but 
finds surprisingly broad use, e.g., for Debian wich itself supports a wide 
range of platforms and tries to provide java on all of them. 

For these JDKs, IIUC, os.arch would have the value of the target architecture 
("PA-RISC", "mips", "alpha" etc). These platforms would be at a disadvantage 
now, as @Glavo pointed out: since OPENJDK_TARGET_CPU is fed from uname: 
https://github.com/openjdk/jdk/blob/c67bbcea92919fea9b6f7bbcde8ba4488289d174/make/RunTestsPrebuilt.gmk#L182,
 they will have a name that expands to an enum that is not compilable. So all 
of these platforms will get build errors.

I understand your intention to keep this patch simple, and that you want to 
limit the enum value. I also like enums. I wonder whether it would be possible 
to have an enum value "other" and map unknown architectures to that.

-

PR Comment: https://git.openjdk.org/jdk/pull/13357#issuecomment-1499986742


Re: RFR: 8304915: Create jdk.internal.util.Architecture enum and apply [v7]

2023-04-06 Thread Thomas Stuefe
On Thu, 6 Apr 2023 21:17:25 GMT, Glavo  wrote:

>> The point of Architecture is reflect the architecture as supported by the 
>> build and the runtime mutually and to reflect dependencies on the target 
>> hardware. 
>> 
>> What did you use as the example that would not compile on the other 
>> architecture?
>> Did you make the other changes to the build that would be needed for a new 
>> architecture?
>
>> What did you use as the example that would not compile on the other 
>> architecture? 
> 
> https://github.com/openjdk/jdk/blob/52ca4a70fc3de9e285964f9545ea8cd54e2d9924/src/java.base/share/classes/jdk/internal/util/OperatingSystemProps.java.template#L40
> 
> I don't understand how the above code can be compiled on architectures such 
> as LongArch64, MIPS64el, or ARM32 without modifying the source file.
> 
>> Did you make the other changes to the build that would be needed for a new 
>> architecture?
> 
> I'm not talking about the new architecture, I'm talking about the 
> architecture that OpenJDK already supports. 
> 
> According to my understanding, the above code breaks the support of all 
> architectures not listed in `Architecture`. But has `Architecture` really 
> listed all the supported architectures in the OpenJDK mainline?
> 
> For example, I want to compile OpenJDK for 32-bit ARM.  I think in this PR, I 
> cannot compile without modifying the source code because `TARGET_ARCH_arm` 
> does not exist. But before this PR, I was able to compile.

@Glavo Arm32 builds, though, see GHAs (crossbuilding test). As for the others, 
as long as the ports have not been integrated into mainline, I expect port 
maintainers will adapt the enum downstream. They do this anyway.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/13357#discussion_r1160459041


  1   2   >