Re: Source code analysis: calls to wrapper class constructors
> On Oct 19, 2020, at 6:01 PM, Dan Smith wrote: > > In the context of the Warnings for Value-Based Classes JEP, we're looking for > usages of the deprecated wrapper class constructors ('new Integer(...)', 'new > Double(...)', etc.). When do these get used? How often is this motivated by > wanting a unique object vs. legacy code that has no particular reason not to > use 'valueOf'? > > We've got some investigations going on at Oracle, including looking at some > open-source projects, but I'm interested in examples from other > companies/projects as well. Please investigate and share what you find! > > I'll reply in a few days with our analysis for the code we look at. Some initial areas of particular concern: - How disruptive will it be if legacy jar files stop running on the JVM? How often will projects in, say, 2023 depend on binaries that haven't been recompiled since 9, when the constructors were deprecated? (We've seen evidence that many projects—a lot of Apache APIs for example—fixed up their code in response to the deprecation warnings.) - How disruptive will it be if programs need to update their older shaded sources (repackaged sources copied from another project) before they can compile? These sources are more likely to go unmaintained, preserving 'new Integer' calls from years ago.
Re: Source code analysis: calls to wrapper class constructors
Here are some numbers looking at Maven jars published since 2019: 116,532 jars 4,726 jars that invoke a wrapper constructor (4.1%) 1,031 jars that have more than 10 classes invoking a wrapper constructor (0.9%) If we focus just on the largest projects (jars with >1000 classes): 3,315 jars 1,620 jars that invoke a wrapper constructor (48.9%) 805 jars that have more than 10 classes invoking a wrapper constructor (24.3%) 133 jars that have more than 100 classes invoking a wrapper constructor (4.0%) Are these intended to be run on bleeding-edge JVMs? Class files show the following version breakdown: Pre-5: 2% 5-6: 34% 7: 15% 8: 46% 11: 2% Others (non-LTS): 0% What this doesn't tell us is the *latest* JVM a jar file is intended to support. We can at least say that newly-released jars targeting JDK 8 or earlier will be with us for quite some time. The wrapper constructors were deprecated in 9, and the post-8 sample size is fairly small, but there's not an obvious trend towards reduced use of wrapper constructors in those classes. On the other hand, when you dig into specific projects, I *do* see plenty of examples of projects that have updated their code, apparently in response to the deprecation warnings. Some specific examples, chosen pretty much at random among the worst offenders: - Apache Commons: The legacy commons.lang and commons.collections packages have many constructor calls, but these have been fixed in the newer APIs commons.lang3 and commons.collections4 - Apache FOP: Lots of constructor calls in generated code (representing fonts); these have all been fixed as of FOP 2.2, released in 2017. - Clover repackages old versions of utility libraries like fastutil and Apache Commons, which make heavy use of the wrapper constructors. Both of these projects have been updated to use 'valueOf', but the repackaged sources preserve the old code. - The GemFire project has a lot of constructor calls, but almost all appear in tests. - org.omg.CORBA makes heavy use of the constructors, but is also unmaintained code as of JEP 320 (I think?). Would there be an expectation that these libraries continue to run on a new JVM in a few years? - OpenCMS is an example of an active project that calls the constructor in what appears to be old code, e.g., manually boxing to Object when storing primitives in collections.
Re: Source code analysis: calls to wrapper class constructors
One more data-gathering exercise: I took a closer look at some popular Maven projects to see how they've evolved in their use of wrapper constructors. - junit:junit < 4.12 (<2014): many problems in junit.framework.Assert & org.junit.Assert 4.12-4.13 (2014-2020): a few problems in junit.framework.Assert 4.13.1 (2020): fixed org.ow2.asm:asm < 6.0 (<2017): many problems in org.objectweb.asm.ClassReader & Opcodes 6.0 (2017): problem in org.objectweb.asm.Opcodes 6.1+ (2018+): no problems commons-lang:commons-lang & org.apache.commons:commons-lang3 < 3.0 (<2011): many problems 3.0 (2011): many problems 3.1+ (2011+): fixed com.fasterxml.jackson.core:jackson-databind < 2.9.0 (<2017): one call in com.fasterxml.jackson.databind.deser.std.NumberDeserializers 2.9.0+ (2017+): fixed javax.xml.bind:jaxb-api 1.0 (2006): two problems 2.0+ (2006+): fixed Log4j (log4j:log4j & org.apache.logging.log4j:log4j-core) < 2.0 (<2014): multiple problems 2.0+ (2014+): no issues org.mockito:mockito-core < 2.1.0 (<2016): many problems, including in repackaged ASM 2.1.0+ (2016+): fixed org.scala-lang:scala-library < 2.8.0 (<2010): problems in scala.Console$, scala.Predef$, scala.mobile.Code, scala.runtime.BoxesUtility 2.8.0-2.9.3 (2010-2013): one issue, scala.actors.threadpool.locks.ReentrantReadWriteLock 2.10.0+ (2012+): fixed org.hibernate:hibernate-core < 4.0 (<2011): many, many, problems 4.0.0-5.4.22 (2011-2020): a few problems 6.0.0+ (not yet released): fixed ch.qos.logback:logback-classic < 1.0.7 (<2012): lots of problems 1.0.7+ (2012): fixed org.clojure < 1.10.0 (<2018): repackages ASM, a few other problems 1.1.0.0+ (2018+): fixed com.google.code.gson < 1.4 (<2010): problems in com.google.gson.DefaultInstanceCreators & JsonParser 1.4+ (2010+): fixed org.jetbrains.kotlin:kotlin-stdlib < 1.3.0 (<2018): no problems 1.3.0+ (2018+): new problems in kotlin.coroutines.jvm.internal.Boxing No problems: org.junit.jupiter:junit-jupiter-api (earliest release 2017) com.google.guava:guava (earliest release 2011) org.apache.commons.commons-collections4 (earliest release 2013) com.fasterxml.jackson.core:jackson-core (earliest release 2012) org.slf4j:slf4j-api (earliest release 2006) org.slf4j:slf4j-log4j12 (earliest release 2006) org.apache.logging.log4j:log4j-core (earliest release 2014) commons-io:commons-io (earliest release 2005) javax.servlet:javax.servlet-api (earliest release 2011) org.apache.httpcomponents:httpclient (earliest release 2009) - It does seem like the deprecation warnings introduced in 9 are working—lots of projects have responded by changing their code. A few cases worth special attention: - JUnit 4 was only very recently fixed (and it's in maintenance mode, must clients use 4.11 or 4.12, not 4.13) - Hibernate still has problems until its coming 6.0 release - Kotlin added (!) some constructor calls in 2018, and they're still there - Clojure and ASM were fixed pretty recently Still, it's a positive sign that only one of these projects is making wrapper constructor calls in its latest sources.
Re: Source code analysis: calls to wrapper class constructors
Overall, I find this data quite encouraging. It says that when we finally deprecate something, the warnings get noticed, and a lot of uses are fixed within a few years. Surely raising DepCon to FOR_REMOVAL will have additional effect. On 10/23/2020 9:20 PM, Dan Smith wrote: One more data-gathering exercise: I took a closer look at some popular Maven projects to see how they've evolved in their use of wrapper constructors. - junit:junit < 4.12 (<2014): many problems in junit.framework.Assert & org.junit.Assert 4.12-4.13 (2014-2020): a few problems in junit.framework.Assert 4.13.1 (2020): fixed org.ow2.asm:asm < 6.0 (<2017): many problems in org.objectweb.asm.ClassReader & Opcodes 6.0 (2017): problem in org.objectweb.asm.Opcodes 6.1+ (2018+): no problems commons-lang:commons-lang & org.apache.commons:commons-lang3 < 3.0 (<2011): many problems 3.0 (2011): many problems 3.1+ (2011+): fixed com.fasterxml.jackson.core:jackson-databind < 2.9.0 (<2017): one call in com.fasterxml.jackson.databind.deser.std.NumberDeserializers 2.9.0+ (2017+): fixed javax.xml.bind:jaxb-api 1.0 (2006): two problems 2.0+ (2006+): fixed Log4j (log4j:log4j & org.apache.logging.log4j:log4j-core) < 2.0 (<2014): multiple problems 2.0+ (2014+): no issues org.mockito:mockito-core < 2.1.0 (<2016): many problems, including in repackaged ASM 2.1.0+ (2016+): fixed org.scala-lang:scala-library < 2.8.0 (<2010): problems in scala.Console$, scala.Predef$, scala.mobile.Code, scala.runtime.BoxesUtility 2.8.0-2.9.3 (2010-2013): one issue, scala.actors.threadpool.locks.ReentrantReadWriteLock 2.10.0+ (2012+): fixed org.hibernate:hibernate-core < 4.0 (<2011): many, many, problems 4.0.0-5.4.22 (2011-2020): a few problems 6.0.0+ (not yet released): fixed ch.qos.logback:logback-classic < 1.0.7 (<2012): lots of problems 1.0.7+ (2012): fixed org.clojure < 1.10.0 (<2018): repackages ASM, a few other problems 1.1.0.0+ (2018+): fixed com.google.code.gson < 1.4 (<2010): problems in com.google.gson.DefaultInstanceCreators & JsonParser 1.4+ (2010+): fixed org.jetbrains.kotlin:kotlin-stdlib < 1.3.0 (<2018): no problems 1.3.0+ (2018+): new problems in kotlin.coroutines.jvm.internal.Boxing No problems: org.junit.jupiter:junit-jupiter-api (earliest release 2017) com.google.guava:guava (earliest release 2011) org.apache.commons.commons-collections4 (earliest release 2013) com.fasterxml.jackson.core:jackson-core (earliest release 2012) org.slf4j:slf4j-api (earliest release 2006) org.slf4j:slf4j-log4j12 (earliest release 2006) org.apache.logging.log4j:log4j-core (earliest release 2014) commons-io:commons-io (earliest release 2005) javax.servlet:javax.servlet-api (earliest release 2011) org.apache.httpcomponents:httpclient (earliest release 2009) - It does seem like the deprecation warnings introduced in 9 are working—lots of projects have responded by changing their code. A few cases worth special attention: - JUnit 4 was only very recently fixed (and it's in maintenance mode, must clients use 4.11 or 4.12, not 4.13) - Hibernate still has problems until its coming 6.0 release - Kotlin added (!) some constructor calls in 2018, and they're still there - Clojure and ASM were fixed pretty recently Still, it's a positive sign that only one of these projects is making wrapper constructor calls in its latest sources.
Re: Source code analysis: calls to wrapper class constructors
> On Oct 19, 2020, at 6:01 PM, Dan Smith wrote: > > In the context of the Warnings for Value-Based Classes JEP, we're looking for > usages of the deprecated wrapper class constructors ('new Integer(...)', 'new > Double(...)', etc.). When do these get used? How often is this motivated by > wanting a unique object vs. legacy code that has no particular reason not to > use 'valueOf'? > > We've got some investigations going on at Oracle, including looking at some > open-source projects, but I'm interested in examples from other > companies/projects as well. Please investigate and share what you find! > > I'll reply in a few days with our analysis for the code we look at. So, some conclusions that we've drawn: - In 2020, the constructor calls are fairly pervasive, even in recently released binaries. Removing these constructors may be the most disruptive single API change we've ever made. - The trend is good—serious projects have mostly responded to the deprecation warnings introduced in 9. In 2024 (for example), the picture may be much better. - It is impossible, given the current JVM model for primitive classes, for Integer to both be a primitive class and support 'new java/lang/Integer'. Binaries calling the deprecated constructors simply won't work. - There is a meaningful semantic difference, in terms of promising unique identity, between code that does 'new Integer' and code that does 'Integer.valueOf' or implicitly boxes. The latter is better aligned with the class's future behavior as a primitive class. - Almost no usages care about the identity distinction—they just want boxing. But it's difficult to automatically detect the usages that do. These points lead to the following tentative plan: We'll proceed as planned with deprecation for removal. The message for all Java programs is to stop using these constructors if you want to run in future JDKs. Typically, the simplest refactoring is to replace 'new Integer(x)' with just 'x', but in some cases you might want 'Integer.valueOf(x)'. When you refactor, you should confirm that the change in identity semantics is acceptable. We'll encourage IDEs to provide tooling for updating sources, including batch processing. (As an example, IntelliJ already highlights deprecated APIs and suggests conversions, although I'm not familiar with its batch processing features.) For legacy binaries that have not been updated but that need to run on a future JDK, we'll provide tooling to rewrite their bytecode containing constructor calls, either as a preprocessing step on jar files, or as a runtime command-line argument. This tooling will support common bytecode patterns like 'new Foo; dup; ...; invokespecial Foo.;', but will not be a comprehensive solution. (Mimicking the behavior of instance initialization method invocation in full generality would be a very difficult task.) It will also carry behavioral compatibility risks. For these reasons, it will be available as a workaround, not as default JVM behavior.
Re: Source code analysis: calls to wrapper class constructors
- Mail original - > De: "daniel smith" > À: "valhalla-spec-experts" > Envoyé: Mardi 27 Octobre 2020 20:27:39 > Objet: Re: Source code analysis: calls to wrapper class constructors >> On Oct 19, 2020, at 6:01 PM, Dan Smith wrote: >> >> In the context of the Warnings for Value-Based Classes JEP, we're looking for >> usages of the deprecated wrapper class constructors ('new Integer(...)', 'new >> Double(...)', etc.). When do these get used? How often is this motivated by >> wanting a unique object vs. legacy code that has no particular reason not to >> use 'valueOf'? >> >> We've got some investigations going on at Oracle, including looking at some >> open-source projects, but I'm interested in examples from other >> companies/projects as well. Please investigate and share what you find! >> >> I'll reply in a few days with our analysis for the code we look at. > > So, some conclusions that we've drawn: > > - In 2020, the constructor calls are fairly pervasive, even in recently > released > binaries. Removing these constructors may be the most disruptive single API > change we've ever made. > > - The trend is good—serious projects have mostly responded to the deprecation > warnings introduced in 9. In 2024 (for example), the picture may be much > better. > > - It is impossible, given the current JVM model for primitive classes, for > Integer to both be a primitive class and support 'new java/lang/Integer'. > Binaries calling the deprecated constructors simply won't work. > > - There is a meaningful semantic difference, in terms of promising unique > identity, between code that does 'new Integer' and code that does > 'Integer.valueOf' or implicitly boxes. The latter is better aligned with the > class's future behavior as a primitive class. > > - Almost no usages care about the identity distinction—they just want boxing. > But it's difficult to automatically detect the usages that do. > > These points lead to the following tentative plan: > > We'll proceed as planned with deprecation for removal. The message for all > Java > programs is to stop using these constructors if you want to run in future > JDKs. > Typically, the simplest refactoring is to replace 'new Integer(x)' with just > 'x', but in some cases you might want 'Integer.valueOf(x)'. When you refactor, > you should confirm that the change in identity semantics is acceptable. > > We'll encourage IDEs to provide tooling for updating sources, including batch > processing. (As an example, IntelliJ already highlights deprecated APIs and > suggests conversions, although I'm not familiar with its batch processing > features.) > > For legacy binaries that have not been updated but that need to run on a > future > JDK, we'll provide tooling to rewrite their bytecode containing constructor > calls, either as a preprocessing step on jar files, or as a runtime > command-line argument. > > This tooling will support common bytecode patterns like 'new Foo; dup; ...; > invokespecial Foo.;', but will not be a comprehensive solution. > (Mimicking the behavior of instance initialization method invocation in full > generality would be a very difficult task.) It will also carry behavioral > compatibility risks. For these reasons, it will be available as a workaround, > not as default JVM behavior. Three remarks: - the compiler can warn because a code is using new Integer(value) or warn because the compiler will automatically transform all new Integer(value) to Integer.valueOf() once Valhalla is integrated, i prefer the second solution, having a warning message saying that in the future new Integer() will not be supported and that all new Integer(...) will be transformed by the compiler automatically to Integer.valueOf(). - the introduction of the strong encapsulation in 9 was a very similar challenge, the first thing to do is to raise awareness, having a warning at runtime (so emitted by the VM) per callsite using new Wrapper(...) will help a lot. People can detect easily if they are using a dependency that use something that will not be backward compatible (this warning should be emitted by Java 16+, because there is no point to wait and because of JEP 396, there will be a second wave of people wanting to update the library they maintain, so they can fixed both issues in one pass. - IDE should inspect the jars downloaded by Maven and Gradle, and report the use of deprecated for removal APIs, again to raise awareness, a warning directly on the tag dependency in the POM saying that this dependency (or one of it's sub-dependencies) is using deprecated for removal APIs will help a lot. Rémi
Re: Source code analysis: calls to wrapper class constructors
> On Oct 27, 2020, at 2:00 PM, Remi Forax wrote: > > Three remarks: > - the compiler can warn because a code is using new Integer(value) or warn > because the compiler will automatically transform all new Integer(value) to > Integer.valueOf() once Valhalla is integrated, > i prefer the second solution, having a warning message saying that in the > future new Integer() will not be supported and that all new Integer(...) will > be transformed by the compiler automatically to Integer.valueOf(). I think the idea here is that if the compiler were doing this today, it would reduce the amount of binaries that will fail in the future. But... it won't eliminate the problem. I think we'd still need to do something about binaries that are *already* published (e.g., JUnit 4.12), or that target a pre-16 JDK. So given that a workaround for binaries will exist, there's not a lot to be gained by complicating the source-level story. And in the minus column, some areas of concern: - We'd have to change the language in 16 to special-case what class instance creation expressions mean. - We'd be breaking behavioral compatibility, with no way to opt out (if, say, you need unique identities, and don't plan to deploy on future JDKs). - There's no change in the compile-time experience—using either deprecation or these special warnings, you'd be stuck with warnings until you rewrote your code. Treating this as a standard application of deprecation, with behavioral changes triggered by source code changes, seems like a more attractive solution. > - the introduction of the strong encapsulation in 9 was a very similar > challenge, the first thing to do is to raise awareness, having a warning at > runtime (so emitted by the VM) per callsite using new Wrapper(...) > will help a lot. People can detect easily if they are using a dependency > that use something that will not be backward compatible (this warning should > be emitted by Java 16+, because there is no point to wait > and because of JEP 396, there will be a second wave of people wanting to > update the library they maintain, so they can fixed both issues in one pass. Besides javac, one out-of-the-box tool we already have is jdeprscan, which since 9 reports any wrapper constructor calls in a given jar file. I'm not sure whether there's a mechanism in HotSpot to generate warnings about deprecated APIs at link/run time. It does seem like it would be a reasonable feature... > - IDE should inspect the jars downloaded by Maven and Gradle, and report the > use of deprecated for removal APIs, again to raise awareness, > a warning directly on the tag dependency in the POM saying that this > dependency (or one of it's sub-dependencies) is using deprecated for removal > APIs will help a lot. Yes, IDEs providing their own jdeprscan-style diagnostics would be quite helpful.
Re: Source code analysis: calls to wrapper class constructors
On Oct 27, 2020, at 1:36 PM, Dan Smith wrote: > > I'm not sure whether there's a mechanism in HotSpot to generate warnings > about deprecated APIs at link/run time. It does seem like it would be a > reasonable feature... +1 There is no such feature at present; maybe something could be built on top of the debugger interface. A quick look at the event index [1] does not turn up dynamic linkage (resolution) events, which would be the obvious place to start. [1]: https://docs.oracle.com/javase/1.5.0/docs/guide/jvmti/jvmti.html#EventIndex
Re: Source code analysis: calls to wrapper class constructors
On Oct 27, 2020, at 12:27 PM, Dan Smith wrote: > > This tooling will support common bytecode patterns like 'new Foo; dup; ...; > invokespecial Foo.;', but will not be a comprehensive solution. > (Mimicking the behavior of instance initialization method invocation in full > generality would be a very difficult task.) One of the reasons it’s not going to be comprehensive is code like new Integer(complicatedExpr()), in which the `new` and `invokespecial ` are separated by (almost) arbitrarily complex bytecode. The two instructions don’t even have to be in the same basic block (at the bytecode level): new Integer(foo() ? bar() : baz()) // compiles to 4 BB’s in a diamond If we add switch expressions with large sub-blocks, I think we get peak separation of the start and end parts of the new/init dance: new Integer(switch (x) { case 1 -> { complicatedBlock: try { … } catch ... ; return 0; default -> { for (;;) … }} ) All of this gives me yet one more reason we would have been better off with factory methods instead of open-coding the new/init dance. It was, in hindsight, a false economy to open code the object creation “guts” instead of putting them in factory API points. And with an eye toward future evolutions of legacy code (legacy code not yet in existence!), and uniformity with the factory methods of inline classes, let’s try harder to get rid of the new/init dance for identity objects. — John
Re: Source code analysis: calls to wrapper class constructors
- Mail original - > De: "John Rose" > À: "daniel smith" > Cc: "valhalla-spec-experts" > Envoyé: Mercredi 28 Octobre 2020 05:56:29 > Objet: Re: Source code analysis: calls to wrapper class constructors > On Oct 27, 2020, at 12:27 PM, Dan Smith wrote: >> >> This tooling will support common bytecode patterns like 'new Foo; dup; ...; >> invokespecial Foo.;', but will not be a comprehensive solution. >> (Mimicking the behavior of instance initialization method invocation in full >> generality would be a very difficult task.) > > One of the reasons it’s not going to be comprehensive > is code like new Integer(complicatedExpr()), in which > the `new` and `invokespecial ` are separated > by (almost) arbitrarily complex bytecode. The two > instructions don’t even have to be in the same basic > block (at the bytecode level): > > new Integer(foo() ? bar() : baz()) > // compiles to 4 BB’s in a diamond > > If we add switch expressions with large sub-blocks, > I think we get peak separation of the start and > end parts of the new/init dance: > > new Integer(switch (x) { > case 1 -> { complicatedBlock: try { … } catch ... ; return 0; > default -> { for (;;) … }} ) > > All of this gives me yet one more reason we would have > been better off with factory methods instead of > open-coding the new/init dance. It was, in hindsight, > a false economy to open code the object creation “guts” > instead of putting them in factory API points. > > And with an eye toward future evolutions of legacy code > (legacy code not yet in existence!), and uniformity with > the factory methods of inline classes, let’s try harder > to get rid of the new/init dance for identity objects. I believe there is a quick and dirty trick, replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL java/lang/Integer (I)V by INVOKESTATIC java/lang/Integer valueOf (I)Ljava/lang/Integer; It has to be done after the code is verified because the new execution doesn't push java/lang/Integer on the stack anymore before calling the arbitrary init expression thus any StackMapTables in between the NOPs and INVOKESTATIC are invalid. > > — John Rémi
Re: Source code analysis: calls to wrapper class constructors
- Mail original - > De: "daniel smith" > À: "Remi Forax" > Cc: "valhalla-spec-experts" > Envoyé: Mardi 27 Octobre 2020 21:36:21 > Objet: Re: Source code analysis: calls to wrapper class constructors >> On Oct 27, 2020, at 2:00 PM, Remi Forax wrote: >> >> Three remarks: >> - the compiler can warn because a code is using new Integer(value) or warn >> because the compiler will automatically transform all new Integer(value) to >> Integer.valueOf() once Valhalla is integrated, >> i prefer the second solution, having a warning message saying that in the >> future >> new Integer() will not be supported and that all new Integer(...) will be >> transformed by the compiler automatically to Integer.valueOf(). > > I think the idea here is that if the compiler were doing this today, it would > reduce the amount of binaries that will fail in the future. But... it won't > eliminate the problem. I think we'd still need to do something about binaries > that are *already* published (e.g., JUnit 4.12), or that target a pre-16 JDK. > > So given that a workaround for binaries will exist, there's not a lot to be > gained by complicating the source-level story. And in the minus column, some > areas of concern: > > - We'd have to change the language in 16 to special-case what class instance > creation expressions mean. > > - We'd be breaking behavioral compatibility, with no way to opt out (if, say, > you need unique identities, and don't plan to deploy on future JDKs). > > - There's no change in the compile-time experience—using either deprecation or > these special warnings, you'd be stuck with warnings until you rewrote your > code. > > Treating this as a standard application of deprecation, with behavioral > changes > triggered by source code changes, seems like a more attractive solution. [...] If the VM change all new Integer(x) to Integer.valueOf(x), see my other mail on how to do it, then i think the language should mirror the VM behavior. The warning doesn't have to be that the compiler actually rewrite new Integer(x) to Integer.valueOf(x) but that in the future the compiler will rewrite new Integer(x) to Integer.valueOf(x). The rewrite by the compiler has to be done at the same time ==/acmp transition from meaning pointer comparison to meaning primitive object comparison is both sides are primitive objects. The idea is that at a point in time, doing a new on a primitive object class is illegal, so the verifier will reject any use of "new java/lang/Integer" At that time, - the compîler will rewrite new Integer(x) to Integer.valueOf(x), so the generated class is valid and the source code is still valid but the semantics has slightly change hence the warning. - the VM will rewrite new Integer(x) to Integer.valueOf(x) for all classfile with a version lower than the current version. It's a better plan because it means that Java 1.0+ code still compile and that the compiler behavior and the VM behavior are aligned. It works because the semantics of the primive object subtype of java.lang.Integer and the semantics of java.lang.Integer has an identity object are mostly similar. It also means that the warning on synchronized(Integer) will now be an error. regards, Rémi
Re: Source code analysis: calls to wrapper class constructors
> On Oct 27, 2020, at 10:56 PM, John Rose wrote: > > One of the reasons it’s not going to be comprehensive > is code like new Integer(complicatedExpr()), in which > the `new` and `invokespecial ` are separated > by (almost) arbitrarily complex bytecode. > On Oct 28, 2020, at 3:25 AM, Remi Forax wrote: > > I believe there is a quick and dirty trick, > replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL > java/lang/Integer (I)V by INVOKESTATIC java/lang/Integer valueOf > (I)Ljava/lang/Integer; > > It has to be done after the code is verified because the new execution > doesn't push java/lang/Integer on the stack anymore before calling the > arbitrary init expression thus any StackMapTables in between the NOPs and > INVOKESTATIC are invalid. Don't forget the 'dup'. We're assuming a 'new' immediately followed by 'dup' (4 nops), and code that will eventually consume the second one and leave the first one fully-initialized. You're right that this disrupts verification; I think we can address this pre-verification by rewriting the StackMapTable, eliminating all references to 'uninitialized(Offset)' and shrinking the stack by two. The bigger limitation, which I don't think you run into in any javac-generated code, is that you can put a copy of the uninitialized object reference anywhere you want—in locals, duplicated 15 times on the stack, etc. That's the point where I'm guessing we give up. So, there's a tractable rewrite for any code with the shape: new java/lang/Integer; dup; ... [ad hoc computation, as long as it doesn't touch the two uninitialized Integer refs] invokespecial java/lang/Integer.(...)V;
Re: Source code analysis: calls to wrapper class constructors
- Mail original - > De: "daniel smith" > À: "Remi Forax" , "John Rose" > Cc: "valhalla-spec-experts" > Envoyé: Mercredi 28 Octobre 2020 16:49:49 > Objet: Re: Source code analysis: calls to wrapper class constructors >> On Oct 27, 2020, at 10:56 PM, John Rose wrote: >> >> One of the reasons it’s not going to be comprehensive >> is code like new Integer(complicatedExpr()), in which >> the `new` and `invokespecial ` are separated >> by (almost) arbitrarily complex bytecode. > > >> On Oct 28, 2020, at 3:25 AM, Remi Forax wrote: >> >> I believe there is a quick and dirty trick, >> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL >> java/lang/Integer (I)V by INVOKESTATIC java/lang/Integer valueOf >> (I)Ljava/lang/Integer; >> >> It has to be done after the code is verified because the new execution >> doesn't >> push java/lang/Integer on the stack anymore before calling the arbitrary init >> expression thus any StackMapTables in between the NOPs and INVOKESTATIC are >> invalid. > > Don't forget the 'dup'. We're assuming a 'new' immediately followed by 'dup' > (4 > nops), and code that will eventually consume the second one and leave the > first > one fully-initialized. yes, i forget the DUP :) > > You're right that this disrupts verification; I think we can address this > pre-verification by rewriting the StackMapTable, eliminating all references to > 'uninitialized(Offset)' and shrinking the stack by two. Another solution, replace NEW [ref] DUP by NOP INVOKESTATIC java/lang/Integer giveMeAFakeInteger ()Ljava/lang/Integer; and replace the INVOKESPECIAL by an INVOKE_STATIC java/lang/Integer trampoline (Ljava/lang/Integer;I) with the method giveMeAFakeInteger returning a special Integer (can be null, maybe?) the method trampoline calling Integer.valueOf(). > > The bigger limitation, which I don't think you run into in any javac-generated > code, is that you can put a copy of the uninitialized object reference > anywhere > you want—in locals, duplicated 15 times on the stack, etc. That's the point > where I'm guessing we give up. > > So, there's a tractable rewrite for any code with the shape: > > new java/lang/Integer; > dup; > ... [ad hoc computation, as long as it doesn't touch the two uninitialized > Integer refs] > invokespecial java/lang/Integer.(...)V; Yep, some codes not generated by javac or ecj will fail. Rémi
Re: Source code analysis: calls to wrapper class constructors
Please accept the Tiger Woods Code Golf award for that one! It only works if the “dup” output (after “new”) is still contiguous on the stack. That won’t be true if javac for some reason spilled the result of “new” to a local instead of holding it on stack. IIRC one reason to spill from stack to locals during expression evaluation is if there is some kind of complicated control flow inside the expression. Different javac’s historically have different policies about stuff like that. > On Oct 28, 2020, at 4:25 AM, Remi Forax wrote: > > - Mail original - >> De: "John Rose" >> À: "daniel smith" >> Cc: "valhalla-spec-experts" >> Envoyé: Mercredi 28 Octobre 2020 05:56:29 >> Objet: Re: Source code analysis: calls to wrapper class constructors > >> On Oct 27, 2020, at 12:27 PM, Dan Smith wrote: >>> >>> This tooling will support common bytecode patterns like 'new Foo; dup; ...; >>> invokespecial Foo.;', but will not be a comprehensive solution. >>> (Mimicking the behavior of instance initialization method invocation in full >>> generality would be a very difficult task.) >> >> One of the reasons it’s not going to be comprehensive >> is code like new Integer(complicatedExpr()), in which >> the `new` and `invokespecial ` are separated >> by (almost) arbitrarily complex bytecode. The two >> instructions don’t even have to be in the same basic >> block (at the bytecode level): >> >> new Integer(foo() ? bar() : baz()) >> // compiles to 4 BB’s in a diamond >> >> If we add switch expressions with large sub-blocks, >> I think we get peak separation of the start and >> end parts of the new/init dance: >> >> new Integer(switch (x) { >> case 1 -> { complicatedBlock: try { … } catch ... ; return 0; >> default -> { for (;;) … }} ) >> >> All of this gives me yet one more reason we would have >> been better off with factory methods instead of >> open-coding the new/init dance. It was, in hindsight, >> a false economy to open code the object creation “guts” >> instead of putting them in factory API points. >> >> And with an eye toward future evolutions of legacy code >> (legacy code not yet in existence!), and uniformity with >> the factory methods of inline classes, let’s try harder >> to get rid of the new/init dance for identity objects. > > I believe there is a quick and dirty trick, > replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL > java/lang/Integer (I)V by INVOKESTATIC java/lang/Integer valueOf > (I)Ljava/lang/Integer; > > It has to be done after the code is verified because the new execution > doesn't push java/lang/Integer on the stack anymore before calling the > arbitrary init expression thus any StackMapTables in between the NOPs and > INVOKESTATIC are invalid. > >> >> — John > > Rémi
Re: Source code analysis: calls to wrapper class constructors
On Oct 28, 2020, at 10:49 AM, Dan Smith wrote: > > You're right that this disrupts verification; I think we can address this > pre-verification by rewriting the StackMapTable, eliminating all references > to 'uninitialized(Offset)' and shrinking the stack by two. Or we can try to keep the verification as-is by emulating the stack effects. This requires inserting instructions, I think, but avoids reshaping the stack. Maybe: new Integer; dup; …(stuff that pushes int)…; invokespecial Integer.(int)V ⇒ ldc_w (String)dummy; dup; …(stuff that pushes int)…; invokestatic Integer.valueOf(int)Integer; swap; pop; swap; pop Maybe use a helper which can “gobble up” the stack junk in one go: invokestatic Integer.valueOf(int)Integer; swap; pop; swap; pop ⇒ invokestatic Integer.$pop2$valueOf(String,String,int)Integer If the dummy value has migrated somewhere random, it could be picked up and popped: new Integer; astore L42; …(stuff that pushes int)…; aload L42; invokespecial Integer.(int)V ⇒ ldc_w (String)dummy; astore L42; …(stuff that pushes int)…; invokestatic Integer.valueOf(int)Integer; swap; pop; aload L42; pop As a further improvement on this theme, note that the dummy always has two copies, one to feed to invokespecial and one to return to the user. The one to return to the user might be at TOS, or it might be elsewhere (in L42 or deeper on stack). We could do a peephole transform which finds the bytecodes that pull up the dummy value, move them *before* the $pop2$valueOf helper, and the net size change of bytecodes is zero. The location of the invokespecial might move a byte or two later. So: new Integer; …(stuff like dup that stores a duplicate ref)…; …(stuff that leaves the new ref on stack, then pushes the int)…; invokespecial Integer.(int)V; …(unrelated stuff)… …(stuff that ensures the replicate ref is now at TOS)… ⇒ ldc_w (String)dummy; …(same stuff like dup that stores a duplicate ref)…; …(same stuff that leaves the new ref on stack, then pushes the int)…; …(same stuff that ensures the replicate ref is now at TOS, but moved before the invoke)…; invokestatic Integer.$pop2$valueOf(Object,int)V; …(same unrelated stuff)… This more elaborate scheme works for both the simple “dup” case and for the more complicated “astore L42” case. I don’t think it requires changing stack maps. Hours of educational play for nerds 14 and up! > The bigger limitation, which I don't think you run into in any > javac-generated code, is that you can put a copy of the uninitialized object > reference anywhere you want—in locals, duplicated 15 times on the stack, etc. > That's the point where I'm guessing we give up.
Re: Source code analysis: calls to wrapper class constructors
On Oct 28, 2020, at 2:32 PM, John Rose wrote: > > invokestatic Integer.$pop2$valueOf(Object,int)V That would be invokestatic Integer.$pop2$valueOf(String,int,String)V And the dummy object could be an Integer (using a condy) if we don’t want to edit the stack maps that might mention the Integer. They might be present if the integer expression contains control flow. So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V
Re: Source code analysis: calls to wrapper class constructors
To summarize, it's possible to rewrite the NEW + DUP + INVOKESPECIAL sequence and it may also be possible to rewrite other sequence like NEW + STORE + LOAD + INVOKESPECIAL + LOAD or other combination by loading with LDC condy using a fake Integer and using an INVOKESTATIC with more parameter so there is no need to change the StackMapFrames. We may still have some bytecode shapes we don't support but it worth a try. Rémi > De: "John Rose" > À: "daniel smith" > Cc: "Remi Forax" , "valhalla-spec-experts" > > Envoyé: Mercredi 28 Octobre 2020 20:36:59 > Objet: Re: Source code analysis: calls to wrapper class constructors > On Oct 28, 2020, at 2:32 PM, John Rose < [ mailto:john.r.r...@oracle.com | > john.r.r...@oracle.com ] > wrote: >> invokestatic Integer.$pop2$valueOf(Object,int)V > That would be invokestatic Integer.$pop2$valueOf(String,int,String)V > And the dummy object could be an Integer (using a condy) if we don’t > want to edit the stack maps that might mention the Integer. They > might be present if the integer expression contains control flow. > So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V
Re: Source code analysis: calls to wrapper class constructors
> De: "John Rose" > À: "Remi Forax" > Cc: "Valhalla Expert Group Observers" > , "daniel smith" > , "valhalla-spec-experts" > > Envoyé: Mercredi 28 Octobre 2020 20:08:30 > Objet: Re: Source code analysis: calls to wrapper class constructors > Please accept the Tiger Woods Code Golf award for that one! Did i win something, a free t-shirt :) > It only works if the “dup” output (after “new”) is still contiguous > on the stack. That won’t be true if javac for some reason spilled > the result of “new” to a local instead of holding it on stack. > IIRC one reason to spill from stack to locals during expression > evaluation is if there is some kind of complicated control flow > inside the expression. Different javac’s historically have > different policies about stuff like that. I've never seen such bytecode shapes but I don't think i've ever seen a classfile compiled with a version which was less that Java 1.2. Rémi >> On Oct 28, 2020, at 4:25 AM, Remi Forax < [ mailto:fo...@univ-mlv.fr | >> fo...@univ-mlv.fr ] > wrote: >> - Mail original - >>> De: "John Rose" < [ mailto:john.r.r...@oracle.com | john.r.r...@oracle.com >>> ] > >>> À: "daniel smith" < [ mailto:daniel.sm...@oracle.com | >>> daniel.sm...@oracle.com ] >>> > >>> Cc: "valhalla-spec-experts" < [ >>> mailto:valhalla-spec-experts@openjdk.java.net | >>> valhalla-spec-experts@openjdk.java.net ] > >>> Envoyé: Mercredi 28 Octobre 2020 05:56:29 >>> Objet: Re: Source code analysis: calls to wrapper class constructors >>> On Oct 27, 2020, at 12:27 PM, Dan Smith < [ mailto:daniel.sm...@oracle.com | >>> daniel.sm...@oracle.com ] > wrote: >>>> This tooling will support common bytecode patterns like 'new Foo; dup; ...; >>>> invokespecial Foo.;', but will not be a comprehensive solution. >>>> (Mimicking the behavior of instance initialization method invocation in >>>> full >>>> generality would be a very difficult task.) >>> One of the reasons it’s not going to be comprehensive >>> is code like new Integer(complicatedExpr()), in which >>> the `new` and `invokespecial ` are separated >>> by (almost) arbitrarily complex bytecode. The two >>> instructions don’t even have to be in the same basic >>> block (at the bytecode level): >>> new Integer(foo() ? bar() : baz()) >>> // compiles to 4 BB’s in a diamond >>> If we add switch expressions with large sub-blocks, >>> I think we get peak separation of the start and >>> end parts of the new/init dance: >>> new Integer(switch (x) { >>> case 1 -> { complicatedBlock: try { … } catch ... ; return 0; >>> default -> { for (;;) … }} ) >>> All of this gives me yet one more reason we would have >>> been better off with factory methods instead of >>> open-coding the new/init dance. It was, in hindsight, >>> a false economy to open code the object creation “guts” >>> instead of putting them in factory API points. >>> And with an eye toward future evolutions of legacy code >>> (legacy code not yet in existence!), and uniformity with >>> the factory methods of inline classes, let’s try harder >>> to get rid of the new/init dance for identity objects. >> I believe there is a quick and dirty trick, >> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL >> java/lang/Integer (I)V by INVOKESTATIC java/lang/Integer valueOf >> (I)Ljava/lang/Integer; >> It has to be done after the code is verified because the new execution >> doesn't >> push java/lang/Integer on the stack anymore before calling the arbitrary init >> expression thus any StackMapTables in between the NOPs and INVOKESTATIC are >> invalid. >>> — John >> Rémi
Re: Source code analysis: calls to wrapper class constructors
On Oct 28, 2020, at 1:05 PM, fo...@univ-mlv.fr wrote: > > I've never seen such bytecode shapes but I don't think i've ever seen a > classfile compiled with a version which was less that Java 1.2. …I’ve seen bytecode shapes, such bytecode shapes as would freeze the marrow. It was dark and rainy, that night I first confronted the Bytecode Obfuscator. They say he no longer stalks this world, but tell me: Why do I clutch my instruction patterns so tightly? https://owasp.org/www-community/controls/Bytecode_obfuscation