Hi Roger,

Do you have ideas on how to intercept them? My javaagent is not able to nor a 
JVMTI agent passed using `agentpath` option. It also does not seem to show up 
in logs when I pass `-verbose:class`.


Also, what do you think of renaming the proxy classes as suggested below?


Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<http://www.kth.se><https://www.kth.se/profile/amansha><https://www.kth.se/profile/amansha>
<https://www.kth.se/profile/amansha>https://algomaster99.github.io/
________________________________
From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Roger Riggs 
<roger.ri...@oracle.com>
Sent: Friday, May 17, 2024 4:57:46 AM
To: core-libs-dev@openjdk.org
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Aman,

You may also run into hidden classes (JEP 371: Hidden Classes) that allow 
classes to be defined, at runtime, without names.
It has been proposed to use them for generated proxies but that hasn't been 
implemented yet.
There are benefits to having nameless classes, because they can't be referenced 
by name, only as a capability, they can be better encapsulated.

fyi, Roger Riggs


On 5/16/24 8:11 AM, Aman Sharma wrote:

Hi,


Thanks for your response, Liang!


> I think you meant CVE-2021-42392 instead of 2022.


Sorry of the error. I indeed meant 
CVE-2021-42392<https://nvd.nist.gov/vuln/detail/cve-2021-42392>.


> Leyden mainly avoids this unstable generation by performing a training run to 
> collect classes loaded


Would love to know the details of Project Leyden and how they worked so far to 
focus on this goal. In our case, the training run is the test suite.


> GeneratedConstructorAccessor is already retired by JEP 416 [2] in Java 18


I did see them not appearing in my allowlist when I ran my study subject 
(Apache PDFBox) with Java 21. Thanks for letting me know about this JEP. I see 
they are re-implemented with method handles.


> How are you checking the classes?


To detect runtime generated code, we have javaagent that is hooked statically 
to the test suite execution. It gives us all classes that that is loaded post 
the JVM and the javaagent are loaded. So we only check the classes loaded for 
the purpose of running the application. This is also why we did not choose 
-agentlib as it would give classes for the setting up JVM and javaagent and we 
the user of our tool must the classes they load.


Next, we have a `ClassFileTransformer` hook in the agent where we produce the 
checksum using the bytecode. And we compare the checksum with the one existing 
in the allowlist. The checksum computation algorithm is same for both steps. 
Let me describe how I compute the checksum.


  1.  I get the 
CONSTANT_Class_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1>
 entry corresponding to `this_class` and rewrite the 
CONSTANT_Utf8_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7>
 corresponding to a fix String constant, say "foo".
  2.  Since, the name of the class is used to refer to its types members 
(fields/method), I get all 
CONSTANT_Fieldref_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2>
 and if its `class_index` corresponds to the old `this_class`, we rewrite the 
UTF8 value of class_index to the same constant "foo".
  3.  Next, since the naming of the fields, in Proxy classes, are also suffixed 
by numbers, for example, `private static Method m4`, we rewrite the UTF8 value 
of name in the 
CONSTANT_NameAndType_info<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
  4.  These fields can also have a random order so we simply sort the entire 
byte code using `Arrays.sort(byte[])` to eliminate any differences due to 
ordering of fields/methods.
  5.  Simply sorting the byte array still had minute differences. I could not 
understand why they existed even though values in constant pool of the bytecode 
in allowlist and at runtime were exactly the same after rewriting. The 
differences existed in the bytes of the Code attribute of methods. I concluded 
that the bytes stored some position information. To avoid this, I created a 
subarray where I considered the bytes corresponding to 
`CONSTANT_Utf8_info.bytes` only. Computing a checksum for it resulted in the 
same checksums for both classfiles.

Let's understand the whole approach with an example of Proxy class.

`

public final class $Proxy42 extends Proxy implements 
org.apache.logging.log4j.core.config.plugins.Plugin {

`

The will go in the allowlist as "Proxy_Plugin: <SHA256 checksum>".

When the same class is intercepted at runtime, say "$Proxy10", we look for 
"Proxy_Plugin" in the allowlist and since the checksum algorithm is same in 
both cases, we get a match and let the class load.

This approach has seemed to work well for Proxy classes, Generated Constructor 
Accessor (which is removed as you said). I also looked at the species generated 
by method handles. I did not notice any modification in them. Their name 
generation seemed okay to me. If some new Species are generated, it is of 
course detected since it is not in the allowlist.

I have not looked into LambdaMetafactory because I did not encounter it as a 
problem so far, but I am aware its name generation is also unstable. I have run 
my approach only a few projects only. And for hidden classes, I assume the the 
agent won't be able to intercept them so detecting them would be really hard.


Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<https://www.kth.se/profile/amansha>https://algomaster99.github.io/
________________________________
From: liangchenb...@gmail.com<mailto:liangchenb...@gmail.com> 
<liangchenb...@gmail.com><mailto:liangchenb...@gmail.com>
Sent: Thursday, May 16, 2024 5:52:03 AM
To: Aman Sharma; core-libs-dev
Cc: Martin Monperrus
Subject: Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Aman,
I think you meant CVE-2021-42392 instead of 2022.

For your approach of an "allowlist" for Java runtime, project Leyden is looking 
to generate a static image [1], that
> At run time it cannot load classes from outside the image, nor can it create 
> classes dynamically.
Leyden mainly avoids this unstable generation by performing a training run to 
collect classes loaded and even object graphs; I am not familiar with the 
details unfortunately.

Otherwise, the Proxy discussion belongs better to core-libs-dev, as 
java.lang.reflect.Proxy is part of Java's core libraries. I am replying this 
thread to core-libs-dev.

For your perceived problem that classes don't have unique names, your 
description sounds dubious: GeneratedConstructorAccessor is already retired by 
JEP 416 [2] in Java 18, and there are many other cases in which JDK generates 
classes without stable names, notoriously LambdaMetafactory (Gradle wished for 
cacheable Lambdas); the same applies for the generated classes for 
MethodHandle's LambdaForms (which carries implementation code for LambdaForm). 
How are you checking the classes? It seems you are not checking hidden classes. 
Proxy and Lambda classes are defined by the caller's class loader, while 
LambdaForms are under JDK's system class loader I think. We need to ensure you 
are correctly finding all unstable classes before we can proceed.

[1]: https://openjdk.org/projects/leyden/notes/01-beginnings
[2]: https://openjdk.org/jeps/416

On Wed, May 15, 2024 at 7:00 PM Aman Sharma 
<aman...@kth.se<mailto:aman...@kth.se>> wrote:

Hi,


My name is Aman and I am a PhD student at KTH Royal Institute of Technology, 
Stockholm, Sweden. I research as part of CHAINS<https://chains.proj.kth.se/> 
project to strengthen the software supply chain of multiple ecosystem. I 
particularly focus on runtime integrity in Java. In this email, I want to write 
about an issue I have discovered with dynamic generation of 
`java.lang.reflect.Proxy`classes. I will propose a solution and would love to 
hear the feedback from the community. Let me know if this is the correct 
mailing-list for such discussions. It seemed the most relevant from this 
list<https://mail.openjdk.org/mailman/listinfo>.


My research


Java has features to load class on the fly - it can either download or generate 
a class at runtime. These features are useful for inner workings of JDK. For 
example, implementing annotations, reflective access, etc. However, these 
features have also contributed to critical vulnerabilities in the past - 
CVE-2021-44228  (log4shell), CVE-2022-33980, CVE-2022-42392. All of these 
vulnerabilities have one thing in common - a class that was not known during 
build time was downloaded/generated at runtime and loaded into JVM.


To defend against such vulnerabilities, we propose a solution to allowlist 
classes for runtime. This allowlist will contain an exhaustive list of classes 
that can be loaded by the JVM and it will be enforced at runtime. We build this 
allowlist from three sources:

  1.  All classes of all modules provided by the Java Standard Library. We use 
ClassGraph<https://github.com/classgraph/classgraph> to scan the JDK.
  2.  We can take the source code and all dependencies of an application. We 
use a software bill of materials to get all the data.
  3.  Finally, we use run the test suite to include any runtime 
downloaded/generated classes.

Such a list is able to prevent the above 3 CVEs because it does not let the 
"unknown" bytecode to be loaded.

Problem with generating such an allowlist

The first two parts of the allowlist are easy to get. The problem is with the 
third step where we want to allowlist all the classes that could be downloaded 
or generated. Upon running the test suite and hooking to the classes it loads, 
we observer that the list consists of classes that are called 
"com/sun/proxy/$Proxy2", "jdk/internal/reflect/GeneratedConstructorAccessor3" 
among many more. The purpose of these classes can be identifed. The proxy class 
is created for to implement an annotation. The accessor gives access to 
constructor of a class to the JVM.

When enforcing this allowlist at runtime, we see that the bytecode content for 
"com/sun/proxy/$Proxy2" differs in the allowlist and at runtime. In our case, 
we we are experimenting with pdfbox<https://github.com/apache/pdfbox> so we 
created the allowlist using its test suite. Then we enforced this allowlist 
while running some of its subcommands. However, there was some other proxy 
class say "com/sun/proxy/$Proxy5" at runtime that implemented the same 
interfaces and had the same methods as "com/sun/proxy/$Proxy2" in the 
allowlist. They only differed in the name of the class, order of fields, and 
types for fields references. This could happen because the order of the loading 
of class is workload dependent, but it causes problem to generate such an 
allowlist.

Solution

We propose that naming of subclasses of "java/lang/reflect/Proxy" should not be 
dependent upon the order of loading. In order to do so, two issues can be fixed:

  1.  The naming of the class should not be based on 
AtomicLong<https://github.com/openjdk/jdk/blob/b687aa550837830b38f0f0faa69c353b1e85219c/src/java.base/share/classes/java/lang/reflect/Proxy.java#L531>.
 Rather it could be named based on the interfaces it implements. I also wonder 
why AtomicLong is chosen in the first place.
  2.  Methods of the interfaces must be in a particular order. Right now, they 
are not sorted in any particular 
order<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178>.


These fixes will make proxy class generation deterministic with respect to 
order of loading and won't be flagged at runtime since the test suite would 
already detect them.

I would love to hear from the community about these ideas. If in agreement, I 
would be happy to produce a patch. I have discovered this issue with subclasses 
of 
GeneratedConstructorAccessor<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/reflect/ConstructorAccessor.java>
 as well and I imagine it will also apply to some other runtime generated 
classes. If you disagree, please let me know also. It helps with my research.

I also have PoCs for the above 
CVEs<https://github.com/chains-project/exploits-for-sbom.exe> and a proof 
concept tool is being developed under the name 
sbom.exe<https://github.com/chains-project/sbom.exe> in case any one wonders 
about the implementation. I would also be happy to explain more.

Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<https://www.kth.se/profile/amansha>https://algomaster99.github.io/

Reply via email to