Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi Joe,

> As a general comment, it is _not_ the goal of the API specification to
> (over) specify exact behavior in cases like this.
>
> See as an example the discussion concerning behavioral compatibility
> starting around slide 46 of "Contributing to OpenJDK: Participating in
> stewardship for the long-term,"
> https://jcp.org/aboutJava/communityprocess/ec-public/materials/2023-06-13/Contributing_to_OpenJDK_2023_04_12.pdf
>
> This approach has evolved over the years and releases.
>
> In this case semantically, the array returned by getMethods is a set and
> no particular meaning should be read into the order of the elements.
>
> HTH,
>
> -Joe

I missed this email of yours. Thanks for making it clear.

Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
https://www.kth.se/profile/amansha
https://algomaster99.github.io/

________________________________
From: Aman Sharma
Sent: Wednesday, May 22, 2024 8:19:41 PM
To: Chen Liang
Cc: David Holmes; core-libs-dev@openjdk.org; leyden-...@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi,

Another thing I wanted to look into in this thread was the order of fields in the generated Proxy classes. Their names are also based on a number. Across executions, the same proxy class can have its `Method` fields in a different order, and its methods can be mapped to different field names. For example, consider the proxy class based on `picocli.CommandLine` <https://github.com/remkop/picocli/blob/da98db63d1b516141b7485881b0dcddfd082dbc8/src/main/java/picocli/CommandLine.java#L4541> in two different executions.

    // fields and methods are truncated for brevity
    public final class $Proxy9 extends Proxy implements CommandLine.Command {
        private static Method m1;
        private static Method m32;
        private static Method m21;
        private static Method m43;
        private static Method m36;
        private static Method m27;

        public final boolean helpCommand() {
            try {
                return (Boolean) super.h.invoke(this, m32, (Object[]) null);
            } catch (RuntimeException | Error var2) {
                throw var2;
            } catch (Throwable var3) {
                throw new UndeclaredThrowableException(var3);
            }
        }
    }

    // fields and methods are truncated for brevity
    public final class $Proxy13 extends Proxy implements CommandLine.Command {
        private static Method m1;
        private static Method m29;
        private static Method m16;
        private static Method m40;
        private static Method m38;
        private static Method m12;

        public final boolean helpCommand() {
            try {
                return (Boolean) super.h.invoke(this, m29, (Object[]) null);
            } catch (RuntimeException | Error var2) {
                throw var2;
            } catch (Throwable var3) {
                throw new UndeclaredThrowableException(var3);
            }
        }
    }

Notice that the order of the fields differs and that the `helpCommand` method is mapped to a different field name in each class (`m32` versus `m29`). This happens because the method array returned by `getMethods` is not sorted in any particular order <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178> when generating a proxy class. What dictates this order? And why is it not deterministic?

Regards,
Aman Sharma
Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi,

Another thing I wanted to look into in this thread was the order of fields in the generated Proxy classes. Their names are also based on a number. Across executions, the same proxy class can have its `Method` fields in a different order, and its methods can be mapped to different field names. For example, consider the proxy class based on `picocli.CommandLine` <https://github.com/remkop/picocli/blob/da98db63d1b516141b7485881b0dcddfd082dbc8/src/main/java/picocli/CommandLine.java#L4541> in two different executions.

    // fields and methods are truncated for brevity
    public final class $Proxy9 extends Proxy implements CommandLine.Command {
        private static Method m1;
        private static Method m32;
        private static Method m21;
        private static Method m43;
        private static Method m36;
        private static Method m27;

        public final boolean helpCommand() {
            try {
                return (Boolean) super.h.invoke(this, m32, (Object[]) null);
            } catch (RuntimeException | Error var2) {
                throw var2;
            } catch (Throwable var3) {
                throw new UndeclaredThrowableException(var3);
            }
        }
    }

    // fields and methods are truncated for brevity
    public final class $Proxy13 extends Proxy implements CommandLine.Command {
        private static Method m1;
        private static Method m29;
        private static Method m16;
        private static Method m40;
        private static Method m38;
        private static Method m12;

        public final boolean helpCommand() {
            try {
                return (Boolean) super.h.invoke(this, m29, (Object[]) null);
            } catch (RuntimeException | Error var2) {
                throw var2;
            } catch (Throwable var3) {
                throw new UndeclaredThrowableException(var3);
            }
        }
    }

Notice that the order of the fields differs and that the `helpCommand` method is mapped to a different field name in each class (`m32` versus `m29`). This happens because the method array returned by `getMethods` is not sorted in any particular order <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178> when generating a proxy class. What dictates this order? And why is it not deterministic?
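To make the ordering point concrete, here is a minimal sketch of how a tool could impose its own stable order on the result of `getMethods` before deriving numbered names from it. This is not what the JDK's proxy generator does; `StableMethodOrder`, `signature`, and `sortedMethods` are illustrative names, and the sort key (name, parameter types, return type) is an assumption chosen only because it is unique per method.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;

public class StableMethodOrder {

    /** Illustrative sort key: method name, parameter type names, then return type name. */
    public static String signature(Method m) {
        String params = Arrays.stream(m.getParameterTypes())
                .map(Class::getName)
                .collect(Collectors.joining(","));
        return m.getName() + "(" + params + ")" + m.getReturnType().getName();
    }

    /** Returns getMethods() sorted by signature, so the order no longer depends on the VM. */
    public static Method[] sortedMethods(Class<?> type) {
        Method[] methods = type.getMethods();
        Arrays.sort(methods, Comparator.comparing(StableMethodOrder::signature));
        return methods;
    }

    public static void main(String[] args) {
        Method[] methods = sortedMethods(CharSequence.class);
        for (int i = 0; i < methods.length; i++) {
            // "m" + i mimics the numbered field names seen in the proxy classes above
            System.out.println("m" + i + " -> " + signature(methods[i]));
        }
    }
}
```

With such an ordering, a given method would always land on the same index for a given interface, which is the property the allowlist comparison needs. Whether the real generator could adopt something like this is a separate question for the list.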
Regards,
Aman Sharma

________________________________
From: Aman Sharma
Sent: Wednesday, May 22, 2024 4:12:19 PM
To: Chen Liang
Cc: David Holmes; core-libs-dev@openjdk.org; leyden-...@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Chen,

That's clear. Thanks for letting me know. I guess then Project Leyden is working on naming the hidden classes deterministically to achieve their goals <https://openjdk.org/projects/leyden/notes/01-beginnings>.

Regards,
Aman Sharma
Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi Chen,

That's clear. Thanks for letting me know. I guess then Project Leyden is working on naming the hidden classes deterministically to achieve their goals <https://openjdk.org/projects/leyden/notes/01-beginnings>.

Regards,
Aman Sharma

________________________________
From: Chen Liang
Sent: Wednesday, May 22, 2024 1:35:46 PM
To: Aman Sharma
Cc: David Holmes; core-libs-dev@openjdk.org; leyden-...@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Aman,

We have tried defining Proxy as hidden classes; a previous attempt was on hold because of issues with serialization. Otherwise, Proxies work great as hidden classes.

Chen

On Mon, May 20, 2024 at 7:56 AM Aman Sharma <aman...@kth.se> wrote:

> Hi David,
>
> > I would not expect any class load events.
>
> I understand. I also haven't tried to intercept them, but I see only one
> approach right now to include them in an allowlist:
>
> 1) Statically look for invocations of `Lookup::defineHiddenClass`.
> 2) Instrument them so that the first argument, `bytes`, can be inspected.
>
> I haven't looked into it much because I did not know much about it, and
> the fact that they are hidden made it worse. 😅
>
> Thanks for sharing the JEP!
>
> > java.lang.reflect.Proxy could define hidden classes to act as the proxy
> > classes which implement proxy interfaces;
>
> from JEP 371
>
> It says that Proxy classes will also become hidden classes. Is that
> underway? Right now one can intercept proxy classes, transform them, and
> include them in an allowlist. What do you think of naming them
> independently of the AtomicLong counter so that a proxy class generated at
> runtime is easy to look up in the allowlist?
>
> Regards,
> Aman Sharma
Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi David,

> I would not expect any class load events.

I understand. I also haven't tried to intercept them, but I see only one approach right now to include them in an allowlist:

1) Statically look for invocations of `Lookup::defineHiddenClass`.
2) Instrument them so that the first argument, `bytes`, can be inspected.

I haven't looked into it much because I did not know much about it, and the fact that they are hidden made it worse. 😅

Thanks for sharing the JEP!

> java.lang.reflect.Proxy could define hidden classes to act as the proxy
> classes which implement proxy interfaces;

from JEP 371

It says that Proxy classes will also become hidden classes. Is that underway? Right now one can intercept proxy classes, transform them, and include them in an allowlist. What do you think of naming them independently of the AtomicLong counter so that a proxy class generated at runtime is easy to look up in the allowlist?

Regards,
Aman Sharma

________________________________
From: David Holmes
Sent: Monday, May 20, 2024 2:30:37 PM
To: Aman Sharma; liangchenb...@gmail.com
Cc: core-libs-dev@openjdk.org; leyden-...@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

On 20/05/2024 10:12 pm, Aman Sharma wrote:
> Hi David,
>
> > How did you try to intercept them? Hidden classes are not "loaded" in
> > the normal sense so won't trigger class load events.
>
> I could not intercept them. I only see them when I pass `-verbose:class`
> in the Java CLI.

Yes that is why I asked how you tried to intercept them.

> I also couldn't intercept them using JVMTI Class File Load Hook
> <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassFileLoadHook>
> event. However JEP 371 suggests that it should be possible to intercept them
> using JVMTI Class Load
> <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassLoad>
> event, but I won't have the bytecode at this stage. So is there no way to get
> its bytecode before it is linked and initialized in the JVM?

Hidden classes are not loaded so I would not expect any class load events. However the exact nature of the JVMTI class load event is unclear as it talks about "class or interface creation" which is neither loading nor defining per se. But a class prepare event sounds like it should be issued. However neither gives you access to the bytecode of the class AFAICS.

David
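As an illustration of the `Lookup::defineHiddenClass` idea discussed in this message, the sketch below shows the point where the class bytes are visible: at the call site, before the hidden class exists. It is only a hand-written wrapper, not bytecode instrumentation; `HiddenClassProbe`, `ownBytes`, `defineLogged`, and `defineSelf` are hypothetical names, and the demo simply re-defines its own class file as a hidden class (Java 15 or later, and assuming the class file is reachable as a resource).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;

public class HiddenClassProbe {

    /** Reads this class's own bytecode so the demo has valid class bytes to work with. */
    public static byte[] ownBytes() {
        try (InputStream in = HiddenClassProbe.class.getResourceAsStream("HiddenClassProbe.class")) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical hook: inspects the bytes, then delegates to Lookup::defineHiddenClass. */
    public static Class<?> defineLogged(MethodHandles.Lookup lookup, byte[] bytes) {
        // An allowlist check or checksum of `bytes` could happen here.
        System.out.println("defineHiddenClass called with " + bytes.length + " bytes");
        try {
            return lookup.defineHiddenClass(bytes, false).lookupClass();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Defines a hidden copy of this class using this class's own lookup. */
    public static Class<?> defineSelf() {
        return defineLogged(MethodHandles.lookup(), ownBytes());
    }

    public static void main(String[] args) {
        Class<?> hidden = defineSelf();
        // The name has the form <binary name>/<VM-specific suffix>, as described in JEP 371.
        System.out.println(hidden.getName() + " hidden=" + hidden.isHidden());
    }
}
```

An agent-based version would have to rewrite the callers of `defineHiddenClass` to route through such a hook, which is the part this sketch leaves out.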
Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi David,

> How did you try to intercept them? Hidden classes are not "loaded" in
> the normal sense so won't trigger class load events.

I could not intercept them. I only see them when I pass `-verbose:class` in the Java CLI.

I also couldn't intercept them using the JVMTI Class File Load Hook <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassFileLoadHook> event. However, JEP 371 suggests that it should be possible to intercept them using the JVMTI Class Load <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassLoad> event, but I won't have the bytecode at this stage. So is there no way to get its bytecode before it is linked and initialized in the JVM?

Regards,
Aman Sharma

________________________________
From: David Holmes
Sent: Monday, May 20, 2024 2:59:17 AM
To: Aman Sharma; liangchenb...@gmail.com
Cc: core-libs-dev@openjdk.org; leyden-...@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

On 17/05/2024 9:43 pm, Aman Sharma wrote:
> Hi Chen,
>
> > java.lang.invoke.LambdaForm$MH/0x0200cc000400
>
> I do see this as output when I pass -verbose:class. However, based on my
> experiments, I have seen that neither an agent passed via 'javaagent'
> nor an agent passed via 'agentpath' is able to intercept this hidden class.

How did you try to intercept them? Hidden classes are not "loaded" in the normal sense so won't trigger class load events.

> Also, I was a bit confused since I saw somewhere that the names of
> hidden classes are null. But thanks for clarifying here.

The JEP clearly defines the name format for hidden classes - though the final component is VM specific (and typically a hashcode).

https://openjdk.org/jeps/371

Cheers,
David
Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi Chen,

> java.lang.invoke.LambdaForm$MH/0x0200cc000400

I do see this as output when I pass -verbose:class. However, based on my experiments, I have seen that neither an agent passed via 'javaagent' nor an agent passed via 'agentpath' is able to intercept this hidden class.

Also, I was a bit confused since I saw somewhere that the names of hidden classes are null. But thanks for clarifying here.

> avoid dynamic class loading

I don't see dynamic class loading as a problem. I only mind some unstable aspects of how these classes are generated, which make it hard to verify them against an allowlist.

For example, if this hidden class were generated with the exact same name and the exact same bytecode at runtime as well, it would be easy to verify it. However, I do see that the names are based on some sort of memory address, and I don't know what bytecode it has, so I don't have suggestions to make them stable as of now. For Proxy classes, I feel it can be addressed, unless you or someone involved in Project Leyden disagrees. :) Thank you for forwarding my mail there.

Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
https://algomaster99.github.io/

________________________________
From: liangchenb...@gmail.com
Sent: Friday, May 17, 2024 1:23:58 pm
To: Aman Sharma
Cc: core-libs-dev@openjdk.org; leyden-...@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Aman,

For `-verbose:class`, it's a JVM argument instead of a program argument; so when you run a java program like `java Main`, you should call it as `java -verbose:class Main`.

When done correctly, you should see hidden class outputs like:

    [0.032s][info][class,load] java.lang.invoke.LambdaForm$MH/0x0200cc000400 source: __JVM_LookupDefineClass__

The loading of java.lang.invoke hidden classes requires your program to use MethodHandle features, like a lambda.

I think the problem you are exploring, that to avoid dynamic class loading and effectively turn Java Platform closed for security, is also being accomplished by project Leyden (as I've shared initially); Thus, I am forwarding this to leyden-dev instead, so you can see what approach Leyden uses to accomplish the same goal as yours.

Regards,
Chen Liang

On Fri, May 17, 2024 at 4:40 AM Aman Sharma <aman...@kth.se> wrote:

> Hi Roger,
>
> Do you have ideas on how to intercept them? Neither my javaagent nor a
> JVMTI agent passed using the `agentpath` option is able to. They also do
> not seem to show up in the logs when I pass `-verbose:class`.
>
> Also, what do you think of renaming the proxy classes as suggested below?
>
> Regards,
> Aman Sharma
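Chen's remark that hidden classes only show up once the program uses a lambda can be checked with a few lines. The sketch below is only an illustration (the class name `HiddenLambdaDemo` is made up) and assumes Java 15 or later, where `Class::isHidden` exists and lambda classes are defined as hidden classes.

```java
public class HiddenLambdaDemo {

    /** Returns the runtime class that the JVM spins for a trivial lambda. */
    public static Class<?> lambdaClass() {
        Runnable r = () -> { };
        return r.getClass();
    }

    public static void main(String[] args) {
        Class<?> c = lambdaClass();
        // The part after the '/' in the name is VM specific and can change between runs.
        System.out.println(c.getName() + " hidden=" + c.isHidden());
    }
}
```

Running it as `java -verbose:class HiddenLambdaDemo` should additionally list the hidden classes in the class-load log, in the format Chen quotes above.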
Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi Roger,

Do you have ideas on how to intercept them? Neither my javaagent nor a JVMTI agent passed using the `agentpath` option is able to. They also do not seem to show up in the logs when I pass `-verbose:class`.

Also, what do you think of renaming the proxy classes as suggested below?

Regards,
Aman Sharma

________________________________
From: core-libs-dev on behalf of Roger Riggs
Sent: Friday, May 17, 2024 4:57:46 AM
To: core-libs-dev@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Aman,

You may also run into hidden classes (JEP 371: Hidden Classes) that allow classes to be defined, at runtime, without names. It has been proposed to use them for generated proxies but that hasn't been implemented yet. There are benefits to having nameless classes, because they can't be referenced by name, only as a capability, they can be better encapsulated.

fyi, Roger Riggs

On 5/16/24 8:11 AM, Aman Sharma wrote:

Hi,

Thanks for your response, Liang!

> I think you meant CVE-2021-42392 instead of 2022.

Sorry for the error. I indeed meant CVE-2021-42392 <https://nvd.nist.gov/vuln/detail/cve-2021-42392>.

> Leyden mainly avoids this unstable generation by performing a training run to
> collect classes loaded

I would love to know the details of Project Leyden and how it has worked towards this goal so far. In our case, the training run is the test suite.

> GeneratedConstructorAccessor is already retired by JEP 416 [2] in Java 18

I did notice that they do not appear in my allowlist when I run my study subject (Apache PDFBox) with Java 21. Thanks for letting me know about this JEP. I see they are re-implemented with method handles.

> How are you checking the classes?

To detect runtime-generated code, we have a javaagent that is attached statically to the test suite execution. It gives us all classes that are loaded after the JVM and the javaagent are loaded, so we only check the classes loaded for the purpose of running the application. This is also why we did not choose -agentlib: it would also give us the classes loaded while setting up the JVM and the javaagent, whereas the user of our tool only needs to check the classes they load.

Next, we have a `ClassFileTransformer` hook in the agent where we produce the checksum from the bytecode, and we compare it with the checksum stored in the allowlist. The checksum computation algorithm is the same for both steps.

Let me describe how I compute the checksum.

1. I get the CONSTANT_Class_info <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1> entry corresponding to `this_class` and rewrite the CONSTANT_Utf8_info <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7> it refers to with a fixed String constant, say "foo".
2. Since the name of the class is used to refer to its members (fields/methods), I get all CONSTANT_Fieldref_info <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2> entries and, if the `class_index` corresponds to the old `this_class`, rewrite the UTF8 value of `class_index` to the same constant "foo".
3. Next, since the names of the fields in Proxy classes are also suffixed by numbers, for example `private static Method m4`, we rewrite the UTF8 value of the name in the CONSTANT_NameAndType_info <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
4. These fields can also have a random order, so we simply sort the entire bytecode using `Arrays.sort(byte[])` to eliminate any differences due to the ordering of fields/methods.
5. Simply sorting the byte array still left minute differences. I could not understand why they existed even though the values in the constant pool of the bytecode in the allowlist and at runtime were exactly the same after rewriting. The differences existed in the bytes of the Code attribute of methods. I concluded that these bytes stored some position information. To avoid this, I created a subarray where I considered only the bytes corresponding to `CONSTANT_Utf8_info.bytes`. Computing a checksum over it resulted in the same checksum for both classfiles.

Let's understand the whole approach with an example of a Proxy class.

    public final class $Proxy42 extends Proxy implements org.apache.logging.log4j.core.config.plugins.Plugin {

This will go in the allowlist as "Proxy_Plugin: ". When the same class is intercepted at runtime, say "$Proxy10", we look for "Proxy_Plugin" in the allowlist and since the checksum algorithm is same in both cases, we g
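The checksum steps described in this message can be sketched roughly as follows. This is a simplified illustration, not the tool's actual code: it reads only the constant pool and `this_class`, replaces the class's own name with "foo", collapses field names of the form `m<number>` to `m`, and hashes the sorted `CONSTANT_Utf8_info` strings with SHA-256. Sorting whole entries instead of raw bytes, the choice of SHA-256, and the names `Utf8Checksum` and `tinyClassFile` (a helper that fabricates a truncated class file for the demo) are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Utf8Checksum {

    /** SHA-256 over the sorted, normalized CONSTANT_Utf8 entries of a class file. */
    public static String checksum(byte[] classFile) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            String[] utf8 = new String[count];
            int[] nameIndex = new int[count];   // name_index of each CONSTANT_Class entry
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                 // Utf8
                    case 7: nameIndex[i] = in.readUnsignedShort(); break;  // Class
                    case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;                       // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break;           // Long/Double use two slots
                    default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            in.readUnsignedShort();             // access_flags
            String thisClass = utf8[nameIndex[in.readUnsignedShort()]];
            List<String> normalized = new ArrayList<>();
            for (String s : utf8) {
                if (s == null) continue;
                s = s.replace(thisClass, "foo");   // steps 1 and 2: hide the generated class name
                if (s.matches("m\\d+")) s = "m";   // step 3: hide the numbered field names
                normalized.add(s);
            }
            Collections.sort(normalized);          // step 4, applied per entry instead of per byte
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String s : normalized) {
                sha.update(s.getBytes(StandardCharsets.UTF_8));
                sha.update((byte) 0);              // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical helper: header, a three-entry constant pool, and this_class only. */
    public static byte[] tinyClassFile(String className, String fieldName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                          // minor_version
            out.writeShort(52);                         // major_version (Java 8)
            out.writeShort(4);                          // constant_pool_count = entries + 1
            out.writeByte(1); out.writeUTF(className);  // #1 Utf8
            out.writeByte(7); out.writeShort(1);        // #2 Class -> #1
            out.writeByte(1); out.writeUTF(fieldName);  // #3 Utf8
            out.writeShort(0x0011);                     // access_flags: public final
            out.writeShort(2);                          // this_class -> #2
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String a = checksum(tinyClassFile("$Proxy9", "m32"));
        String b = checksum(tinyClassFile("$Proxy13", "m29"));
        System.out.println(a.equals(b)); // true: only the generated names differ
    }
}
```

Because only the Utf8 entries are hashed, two classes that differ in their code but share all their strings would get the same checksum; this inherits the limitation of step 5 rather than fixing it.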
Re: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi,

> have not looked into LambdaMetafactory because I did not encounter it as a
> problem so far

It is possible that Java agents are unable to intercept it. `-verbose:class` logs classes such as "org.apache.pdfbox.cos.COSDocument$$Lambda/0x7a80631a0d08".

Regards,
Aman Sharma
PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<http://www.kth.se> <https://www.kth.se/profile/amansha> https://algomaster99.github.io/

________
From: Aman Sharma
Sent: Thursday, May 16, 2024 2:11:59 PM
To: liangchenb...@gmail.com; core-libs-dev
Cc: Martin Monperrus
Subject: Re: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi,

Thanks for your response, Liang!

> I think you meant CVE-2021-42392 instead of 2022.

Sorry for the error. I indeed meant CVE-2021-42392<https://nvd.nist.gov/vuln/detail/cve-2021-42392>.

> Leyden mainly avoids this unstable generation by performing a training run to
> collect classes loaded

I would love to know the details of Project Leyden and how it has worked toward this goal so far. In our case, the training run is the test suite.

> GeneratedConstructorAccessor is already retired by JEP 416 [2] in Java 18

I did notice that they do not appear in my allowlist when I ran my study subject (Apache PDFBox) with Java 21. Thanks for letting me know about this JEP. I see they are re-implemented with method handles.

> How are you checking the classes?

To detect runtime-generated code, we have a javaagent that is attached statically to the test suite execution. It gives us all classes that are loaded after the JVM and the javaagent themselves are loaded, so we only check the classes loaded for the purpose of running the application. This is also why we did not choose -agentlib: it would also report the classes loaded while setting up the JVM and the javaagent, whereas the user of our tool only needs to check the classes they load.

Next, we have a `ClassFileTransformer` hook in the agent where we compute a checksum from the bytecode and compare it with the one in the allowlist. The checksum algorithm is the same for both steps. Let me describe how I compute the checksum.

1. I get the CONSTANT_Class_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1> entry corresponding to `this_class` and rewrite the corresponding CONSTANT_Utf8_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7> to a fixed String constant, say "foo".
2. Since the name of the class is used to refer to its members (fields/methods), I get all CONSTANT_Fieldref_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2> entries and, if their `class_index` corresponds to the old `this_class`, we rewrite the UTF8 value behind `class_index` to the same constant "foo".
3. Next, since the field names in Proxy classes are also suffixed with numbers, for example `private static Method m4`, we rewrite the UTF8 value of the name in the CONSTANT_NameAndType_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
4. These fields can also appear in a random order, so we simply sort the entire bytecode using `Arrays.sort(byte[])` to eliminate any differences due to the ordering of fields/methods.
5. Simply sorting the byte array still left minute differences. I could not understand why they existed even though the constant pool values of the bytecode in the allowlist and at runtime were exactly the same after rewriting. The differences were in the bytes of the Code attribute of methods. I concluded that these bytes store some position information. To avoid this, I created a subarray containing only the bytes corresponding to `CONSTANT_Utf8_info.bytes`. Computing a checksum over it yielded the same checksum for both classfiles.

Let's understand the whole approach with an example of a Proxy class.

`public final class $Proxy42 extends Proxy implements org.apache.logging.log4j.core.config.plugins.Plugin {`

This will go in the allowlist as "Proxy_Plugin: ". When the same class is intercepted at runtime, say as "$Proxy10", we look for "Proxy_Plugin" in the allowlist and, since the checksum algorithm is the same in both cases, we get a match and let the class load.

This approach seems to work well for Proxy classes and for GeneratedConstructorAccessor (which is removed, as you said). I also looked at the species generated by method handles. I did not notice any modification in them. Their name generation seemed okay to me. If some new Species are generated, they are of course detected since they are not in the allowlist. I have not looked into LambdaMetafactory
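
The agent-side check described in the quoted message (a `ClassFileTransformer` that looks up a stable key such as "Proxy_Plugin" in the allowlist and compares checksums) might look roughly like the sketch below. The class name `AllowlistTransformer`, the `keyOf` and `checksumOf` functions, and the choice to merely record mismatches in a `rejected` list are assumptions made for illustration; the message does not say how the key is derived or what happens when a class is not in the allowlist.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

final class AllowlistTransformer implements ClassFileTransformer {
    private final Map<String, String> allowlist;         // stable key -> expected checksum
    private final Function<byte[], String> keyOf;        // hypothetical: e.g. yields "Proxy_Plugin"
    private final Function<byte[], String> checksumOf;   // hypothetical: normalized checksum
    final List<String> rejected = new CopyOnWriteArrayList<>();

    AllowlistTransformer(Map<String, String> allowlist,
                         Function<byte[], String> keyOf,
                         Function<byte[], String> checksumOf) {
        this.allowlist = allowlist;
        this.keyOf = keyOf;
        this.checksumOf = checksumOf;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        String expected = allowlist.get(keyOf.apply(classfileBuffer));
        if (expected == null || !expected.equals(checksumOf.apply(classfileBuffer))) {
            // Assumed policy: only record the class; a real tool may react differently.
            rejected.add(String.valueOf(className));
        }
        return null; // null tells the JVM that the class bytes are left unchanged
    }
}
```

In a real agent, an instance would be registered from the `premain(String, Instrumentation)` method via `Instrumentation.addTransformer`. Keeping the key and checksum functions as parameters lets the same code build the allowlist during the test-suite run and check it at runtime.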
Re: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
Hi,

Thanks for your response, Liang!

> I think you meant CVE-2021-42392 instead of 2022.

Sorry for the error. I indeed meant CVE-2021-42392<https://nvd.nist.gov/vuln/detail/cve-2021-42392>.

> Leyden mainly avoids this unstable generation by performing a training run to
> collect classes loaded

I would love to know the details of Project Leyden and how it has worked toward this goal so far. In our case, the training run is the test suite.

> GeneratedConstructorAccessor is already retired by JEP 416 [2] in Java 18

I did notice that they do not appear in my allowlist when I ran my study subject (Apache PDFBox) with Java 21. Thanks for letting me know about this JEP. I see they are re-implemented with method handles.

> How are you checking the classes?

To detect runtime-generated code, we have a javaagent that is attached statically to the test suite execution. It gives us all classes that are loaded after the JVM and the javaagent themselves are loaded, so we only check the classes loaded for the purpose of running the application. This is also why we did not choose -agentlib: it would also report the classes loaded while setting up the JVM and the javaagent, whereas the user of our tool only needs to check the classes they load.

Next, we have a `ClassFileTransformer` hook in the agent where we compute a checksum from the bytecode and compare it with the one in the allowlist. The checksum algorithm is the same for both steps. Let me describe how I compute the checksum.

1. I get the CONSTANT_Class_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1> entry corresponding to `this_class` and rewrite the corresponding CONSTANT_Utf8_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7> to a fixed String constant, say "foo".
2. Since the name of the class is used to refer to its members (fields/methods), I get all CONSTANT_Fieldref_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2> entries and, if their `class_index` corresponds to the old `this_class`, we rewrite the UTF8 value behind `class_index` to the same constant "foo".
3. Next, since the field names in Proxy classes are also suffixed with numbers, for example `private static Method m4`, we rewrite the UTF8 value of the name in the CONSTANT_NameAndType_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
4. These fields can also appear in a random order, so we simply sort the entire bytecode using `Arrays.sort(byte[])` to eliminate any differences due to the ordering of fields/methods.
5. Simply sorting the byte array still left minute differences. I could not understand why they existed even though the constant pool values of the bytecode in the allowlist and at runtime were exactly the same after rewriting. The differences were in the bytes of the Code attribute of methods. I concluded that these bytes store some position information. To avoid this, I created a subarray containing only the bytes corresponding to `CONSTANT_Utf8_info.bytes`. Computing a checksum over it yielded the same checksum for both classfiles.

Let's understand the whole approach with an example of a Proxy class.

`public final class $Proxy42 extends Proxy implements org.apache.logging.log4j.core.config.plugins.Plugin {`

This will go in the allowlist as "Proxy_Plugin: ". When the same class is intercepted at runtime, say as "$Proxy10", we look for "Proxy_Plugin" in the allowlist and, since the checksum algorithm is the same in both cases, we get a match and let the class load.

This approach seems to work well for Proxy classes and for GeneratedConstructorAccessor (which is removed, as you said). I also looked at the species generated by method handles. I did not notice any modification in them. Their name generation seemed okay to me. If some new Species are generated, they are of course detected since they are not in the allowlist. I have not looked into LambdaMetafactory because I did not encounter it as a problem so far, but I am aware its name generation is also unstable. I have run my approach on only a few projects. And for hidden classes, I assume the agent won't be able to intercept them, so detecting them would be really hard.

Regards,
Aman Sharma
PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<http://www.kth.se> <https://www.kth.se/profile/amansha> https://algomaster99.github.io/

From: liangchenb...@gmail.com
Sent: Thursday, May 16, 2024 5:52:03 AM
To: Aman Sharma; core-libs-dev
Cc: Martin Monperrus
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi