Hi,
Thanks for your response, Liang!
> I think you meant CVE-2021-42392 instead of 2022.
Sorry of the error. I indeed meant CVE-2021-42392
<https://nvd.nist.gov/vuln/detail/cve-2021-42392>.
> Leyden mainly avoids this unstable generation by performing a
training run to collect classes loaded
Would love to know the details of Project Leyden and how they worked
so far to focus on this goal. In our case, the training run is the
test suite.
> GeneratedConstructorAccessor is already retired by JEP 416 [2] in
Java 18
I did see them not appearing in my allowlist when I ran my study
subject (Apache PDFBox) with Java 21. Thanks for letting me know about
this JEP. I see they are re-implemented with method handles.
> How are you checking the classes?
To detect runtime generated code, we have javaagent that is hooked
statically to the test suite execution. It gives us all classes that
that is loaded post the JVM and the javaagent are loaded. So we only
check the classes loaded for the purpose of running the application.
This is also why we did not choose -agentlib as it would give classes
for the setting up JVM and javaagent and we the user of our tool must
the classes they load.
Next, we have a `ClassFileTransformer` hook in the agent where we
produce the checksum using the bytecode. And we compare the checksum
with the one existing in the allowlist. The checksum computation
algorithm is same for both steps. Let me describe how I compute the
checksum.
1. I get the CONSTANT_Class_info
<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1>
entry corresponding to `this_class` and rewrite the
CONSTANT_Utf8_info
<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7>
corresponding to a fix String constant, say "foo".
2. Since, the name of the class is used to refer to its types members
(fields/method), I get all CONSTANT_Fieldref_info
<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2>
and if its `class_index` corresponds to the old `this_class`, we
rewrite the UTF8 value of class_index to the same constant "foo".
3. Next, since the naming of the fields, in Proxy classes, are also
suffixed by numbers, for example, `private static Method m4`, we
rewrite the UTF8 value of name in the CONSTANT_NameAndType_info
<https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
4. These fields can also have a random order so we simply sort the
entire byte code using `Arrays.sort(byte[])` to eliminate any
differences due to ordering of fields/methods.
5. Simply sorting the byte array still had minute differences. I
could not understand why they existed even though values in
constant pool of the bytecode in allowlist and at runtime were
exactly the same after rewriting. The differences existed in the
bytes of the Code attribute of methods. I concluded that the bytes
stored some position information. To avoid this, I created a
subarray where I considered the bytes corresponding to
`CONSTANT_Utf8_info.bytes` only. Computing a checksum for it
resulted in the same checksums for both classfiles.
Let's understand the whole approach with an example of Proxy class.
`
public final class $Proxy42 extends Proxy implements
org.apache.logging.log4j.core.config.plugins.Plugin {
`
The will go in the allowlist as "Proxy_Plugin: <SHA256 checksum>".
When the same class is intercepted at runtime, say "$Proxy10", we look
for "Proxy_Plugin" in the allowlist and since the checksum algorithm
is same in both cases, we get a match and let the class load.
This approach has seemed to work well for Proxy classes, Generated
Constructor Accessor (which is removed as you said). I also looked at
the species generated by method handles. I did not notice any
modification in them. Their name generation seemed okay to me. If some
new Species are generated, it is of course detected since it is not in
the allowlist.
I have not looked into LambdaMetafactory because I did not encounter
it as a problem so far, but I am aware its name generation is also
unstable. I have run my approach only a few projects only. And for
hidden classes, I assume the the agent won't be able to intercept them
so detecting them would be really hard.
Regards,
Aman Sharma
PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<https://www.kth.se/profile/amansha>https://algomaster99.github.io/
<https://algomaster99.github.io/>
------------------------------------------------------------------------
*From:* liangchenb...@gmail.com <liangchenb...@gmail.com>
*Sent:* Thursday, May 16, 2024 5:52:03 AM
*To:* Aman Sharma; core-libs-dev
*Cc:* Martin Monperrus
*Subject:* Re: Deterministic naming of subclasses of
`java/lang/reflect/Proxy`
Hi Aman,
I think you meant CVE-2021-42392 instead of 2022.
For your approach of an "allowlist" for Java runtime, project Leyden
is looking to generate a static image [1], that
> At run time it cannot load classes from outside the image, nor can
it create classes dynamically.
Leyden mainly avoids this unstable generation by performing a training
run to collect classes loaded and even object graphs; I am not
familiar with the details unfortunately.
Otherwise, the Proxy discussion belongs better to core-libs-dev, as
java.lang.reflect.Proxy is part of Java's core libraries. I am
replying this thread to core-libs-dev.
For your perceived problem that classes don't have unique names, your
description sounds dubious: GeneratedConstructorAccessor is already
retired by JEP 416 [2] in Java 18, and there are many other cases in
which JDK generates classes without stable names, notoriously
LambdaMetafactory (Gradle wished for cacheable Lambdas); the same
applies for the generated classes for MethodHandle's LambdaForms
(which carries implementation code for LambdaForm). How are you
checking the classes? It seems you are not checking hidden classes.
Proxy and Lambda classes are defined by the caller's class loader,
while LambdaForms are under JDK's system class loader I think. We need
to ensure you are correctly finding all unstable classes before we can
proceed.
[1]: https://openjdk.org/projects/leyden/notes/01-beginnings
[2]: https://openjdk.org/jeps/416
On Wed, May 15, 2024 at 7:00 PM Aman Sharma <aman...@kth.se> wrote:
Hi,
My name is Aman and I am a PhD student at KTH Royal Institute of
Technology, Stockholm, Sweden. I research as part of CHAINS
<https://chains.proj.kth.se/> project to strengthen the software
supply chain of multiple ecosystem. I particularly focus on
runtime integrity in Java. In this email, I want to write about an
issue I have discovered with /dynamic generation of
`java.lang.reflect.Proxy`classes/. I will propose a solution and
would love to hear the feedback from the community. Let me know if
this is the correct mailing-list for such discussions. It seemed
the most relevant from this list
<https://mail.openjdk.org/mailman/listinfo>.
*My research*
*
*
Java has features to load class on the fly - it can either
download or generate a class at runtime. These features are useful
for inner workings of JDK. For example, implementing annotations,
reflective access, etc. However, these features have also
contributed to critical vulnerabilities in the past
- CVE-2021-44228 (log4shell), CVE-2022-33980, CVE-2022-42392. All
of these vulnerabilities have one thing in common - /a class that
was not known during build time was downloaded/generated at
runtime and loaded into JVM./
To defend against such vulnerabilities, we propose a solution to
/allowlist classes for runtime/. This allowlist will contain an
exhaustive list of classes that can be loaded by the JVM and it
will be enforced at runtime. We build this allowlist from three
sources:
1. All classes of all modules provided by the Java Standard
Library. We use ClassGraph
<https://github.com/classgraph/classgraph> to scan the JDK.
2. We can take the source code and all dependencies of an
application. We use a software bill of materials to get all
the data.
3. Finally, we use run the test suite to include any runtime
downloaded/generated classes.
Such a list is able to prevent the above 3 CVEs because it does
not let the "unknown" bytecode to be loaded.
*Problem with generating such an allowlist*
*
*
The first two parts of the allowlist are easy to get. The problem
is with the third step where we want to allowlist all the classes
that could be downloaded or generated. Upon running the test suite
and hooking to the classes it loads, we observer that the list
consists of classes that are called "com/sun/proxy/$Proxy2",
"jdk/internal/reflect/GeneratedConstructorAccessor3" among many
more. The purpose of these classes can be identifed. The proxy
class is created for to implement an annotation. The accessor
gives access to constructor of a class to the JVM.
When enforcing this allowlist at runtime, we see that the bytecode
content for "com/sun/proxy/$Proxy2" differs in the allowlist and
at runtime. In our case, we we are experimenting with pdfbox
<https://github.com/apache/pdfbox> so we created the allowlist
using its test suite. Then we enforced this allowlist while
running some of its subcommands. However, there was some other
proxy class say "com/sun/proxy/$Proxy5" at runtime that
implemented the same interfaces and had the same methods as
"com/sun/proxy/$Proxy2" in the allowlist. They only differed in
the name of the class, order of fields, and types for fields
references. This could happen because the order of the loading of
class is workload dependent, but it causes problem to generate
such an allowlist.
*Solution
*
We propose that naming of subclasses of "java/lang/reflect/Proxy"
should not be dependent upon the order of loading. In order to do
so, two issues can be fixed:
1. The naming of the class should not be based on AtomicLong
<https://github.com/openjdk/jdk/blob/b687aa550837830b38f0f0faa69c353b1e85219c/src/java.base/share/classes/java/lang/reflect/Proxy.java#L531>.
Rather it could be named based on the interfaces it
implements. I also wonder why AtomicLong is chosen in the
first place.
2. Methods of the interfaces must be in a particular order. Right
now, they are not sorted in any particular order
<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178>.
These fixes will make proxy class generation deterministic with
respect to order of loading and won't be flagged at runtime since
the test suite would already detect them.
I would love to hear from the community about these ideas. If in
agreement, I would be happy to produce a patch. I have discovered
this issue with subclasses of GeneratedConstructorAccessor
<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/reflect/ConstructorAccessor.java>
as well and I imagine it will also apply to some other runtime
generated classes. If you disagree, please let me know also. It
helps with my research.
I also have PoCs for the above CVEs
<https://github.com/chains-project/exploits-for-sbom.exe> and a
proof concept tool is being developed under the name sbom.exe
<https://github.com/chains-project/sbom.exe> in case any one
wonders about the implementation. I would also be happy to explain
more.
Regards,
Aman Sharma
PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<https://www.kth.se/profile/amansha>https://algomaster99.github.io/