Re: Source code analysis: calls to wrapper class constructors

2020-10-21 Thread Dan Smith
> On Oct 19, 2020, at 6:01 PM, Dan Smith  wrote:
> 
> In the context of the Warnings for Value-Based Classes JEP, we're looking for 
> usages of the deprecated wrapper class constructors ('new Integer(...)', 'new 
> Double(...)', etc.). When do these get used? How often is this motivated by 
> wanting a unique object vs. legacy code that has no particular reason not to 
> use 'valueOf'?
> 
> We've got some investigations going on at Oracle, including looking at some 
> open-source projects, but I'm interested in examples from other 
> companies/projects as well. Please investigate and share what you find!
> 
> I'll reply in a few days with our analysis for the code we look at.

Some initial areas of particular concern:

- How disruptive will it be if legacy jar files stop running on the JVM? How 
often will projects in, say, 2023 depend on binaries that haven't been 
recompiled since 9, when the constructors were deprecated? (We've seen evidence 
that many projects—a lot of Apache APIs for example—fixed up their code in 
response to the deprecation warnings.)

- How disruptive will it be if programs need to update their older shaded 
sources (repackaged sources copied from another project) before they can 
compile? These sources are more likely to go unmaintained, preserving 'new 
Integer' calls from years ago.

Re: Source code analysis: calls to wrapper class constructors

2020-10-22 Thread Dan Smith
Here are some numbers looking at Maven jars published since 2019:

116,532 jars
4,726 jars that invoke a wrapper constructor (4.1%)
1,031 jars that have more than 10 classes invoking a wrapper constructor (0.9%)

If we focus just on the largest projects (jars with >1000 classes):
3,315 jars
1,620 jars that invoke a wrapper constructor (48.9%)
805 jars that have more than 10 classes invoking a wrapper constructor (24.3%)
133 jars that have more than 100 classes invoking a wrapper constructor (4.0%)

Are these intended to be run on bleeding-edge JVMs? Class files show the 
following version breakdown:

Pre-5: 2%
5-6: 34%
7: 15%
8: 46%
11: 2%
Others (non-LTS): 0%

What this doesn't tell us is the *latest* JVM a jar file is intended to 
support. We can at least say that newly-released jars targeting JDK 8 or 
earlier will be with us for quite some time.

The wrapper constructors were deprecated in 9, and the post-8 sample size is 
fairly small, but there's not an obvious trend towards reduced use of wrapper 
constructors in those classes. On the other hand, when you dig into specific 
projects, I *do* see plenty of examples of projects that have updated their 
code, apparently in response to the deprecation warnings.

Some specific examples, chosen pretty much at random among the worst offenders:

- Apache Commons: The legacy commons.lang and commons.collections packages have 
many constructor calls, but these have been fixed in the newer APIs 
commons.lang3 and commons.collections4

- Apache FOP: Lots of constructor calls in generated code (representing fonts); 
these have all been fixed as of FOP 2.2, released in 2017.

- Clover repackages old versions of utility libraries like fastutil and Apache 
Commons, which make heavy use of the wrapper constructors. Both of these 
projects have been updated to use 'valueOf', but the repackaged sources 
preserve the old code.

- The GemFire project has a lot of constructor calls, but almost all appear in 
tests.

- org.omg.CORBA makes heavy use of the constructors, but is also unmaintained 
code as of JEP 320 (I think?). Would there be an expectation that these 
libraries continue to run on a new JVM in a few years?

- OpenCMS is an example of an active project that calls the constructor in what 
appears to be old code, e.g., manually boxing to Object when storing primitives 
in collections.



Re: Source code analysis: calls to wrapper class constructors

2020-10-23 Thread Dan Smith
One more data-gathering exercise: I took a closer look at some popular Maven 
projects to see how they've evolved in their use of wrapper constructors.

-

junit:junit
< 4.12 (<2014): many problems in junit.framework.Assert & org.junit.Assert
4.12-4.13 (2014-2020): a few problems in junit.framework.Assert
4.13.1 (2020): fixed

org.ow2.asm:asm
< 6.0 (<2017): many problems in org.objectweb.asm.ClassReader & Opcodes
6.0 (2017): problem in org.objectweb.asm.Opcodes
6.1+ (2018+): no problems

commons-lang:commons-lang & org.apache.commons:commons-lang3
< 3.0 (<2011): many problems
3.0 (2011): many problems
3.1+ (2011+): fixed

com.fasterxml.jackson.core:jackson-databind
< 2.9.0 (<2017): one call in 
com.fasterxml.jackson.databind.deser.std.NumberDeserializers
2.9.0+ (2017+): fixed

javax.xml.bind:jaxb-api
1.0 (2006): two problems
2.0+ (2006+): fixed

Log4j (log4j:log4j & org.apache.logging.log4j:log4j-core)
< 2.0 (<2014): multiple problems
2.0+ (2014+): no issues

org.mockito:mockito-core
< 2.1.0 (<2016): many problems, including in repackaged ASM
2.1.0+ (2016+): fixed

org.scala-lang:scala-library
< 2.8.0 (<2010): problems in scala.Console$, scala.Predef$, scala.mobile.Code, 
scala.runtime.BoxesUtility
2.8.0-2.9.3 (2010-2013): one issue, 
scala.actors.threadpool.locks.ReentrantReadWriteLock
2.10.0+ (2012+): fixed

org.hibernate:hibernate-core
< 4.0 (<2011): many, many, problems
4.0.0-5.4.22 (2011-2020): a few problems
6.0.0+ (not yet released): fixed

ch.qos.logback:logback-classic
< 1.0.7 (<2012): lots of problems
1.0.7+ (2012): fixed

org.clojure
< 1.10.0 (<2018): repackages ASM, a few other problems
1.1.0.0+ (2018+): fixed

com.google.code.gson
< 1.4 (<2010): problems in com.google.gson.DefaultInstanceCreators & JsonParser
1.4+ (2010+): fixed

org.jetbrains.kotlin:kotlin-stdlib
< 1.3.0 (<2018): no problems
1.3.0+ (2018+): new problems in kotlin.coroutines.jvm.internal.Boxing

No problems:
org.junit.jupiter:junit-jupiter-api (earliest release 2017)
com.google.guava:guava (earliest release 2011)
org.apache.commons.commons-collections4 (earliest release 2013)
com.fasterxml.jackson.core:jackson-core (earliest release 2012)
org.slf4j:slf4j-api (earliest release 2006)
org.slf4j:slf4j-log4j12 (earliest release 2006)
org.apache.logging.log4j:log4j-core (earliest release 2014)
commons-io:commons-io (earliest release 2005)
javax.servlet:javax.servlet-api (earliest release 2011)
org.apache.httpcomponents:httpclient (earliest release 2009)

-

It does seem like the deprecation warnings introduced in 9 are working—lots of 
projects have responded by changing their code.

A few cases worth special attention:

- JUnit 4 was only very recently fixed (and it's in maintenance mode, must 
clients use 4.11 or 4.12, not 4.13)

- Hibernate still has problems until its coming 6.0 release

- Kotlin added (!) some constructor calls in 2018, and they're still there

- Clojure and ASM were fixed pretty recently

Still, it's a positive sign that only one of these projects is making wrapper 
constructor calls in its latest sources.



Re: Source code analysis: calls to wrapper class constructors

2020-10-26 Thread Brian Goetz
Overall, I find this data quite encouraging.  It says that when we 
finally deprecate something, the warnings get noticed, and a lot of uses 
are fixed within a few years.  Surely raising DepCon to FOR_REMOVAL will 
have additional effect.


On 10/23/2020 9:20 PM, Dan Smith wrote:

One more data-gathering exercise: I took a closer look at some popular Maven 
projects to see how they've evolved in their use of wrapper constructors.

-

junit:junit
< 4.12 (<2014): many problems in junit.framework.Assert & org.junit.Assert
4.12-4.13 (2014-2020): a few problems in junit.framework.Assert
4.13.1 (2020): fixed

org.ow2.asm:asm
< 6.0 (<2017): many problems in org.objectweb.asm.ClassReader & Opcodes
6.0 (2017): problem in org.objectweb.asm.Opcodes
6.1+ (2018+): no problems

commons-lang:commons-lang & org.apache.commons:commons-lang3
< 3.0 (<2011): many problems
3.0 (2011): many problems
3.1+ (2011+): fixed

com.fasterxml.jackson.core:jackson-databind
< 2.9.0 (<2017): one call in 
com.fasterxml.jackson.databind.deser.std.NumberDeserializers
2.9.0+ (2017+): fixed

javax.xml.bind:jaxb-api
1.0 (2006): two problems
2.0+ (2006+): fixed

Log4j (log4j:log4j & org.apache.logging.log4j:log4j-core)
< 2.0 (<2014): multiple problems
2.0+ (2014+): no issues

org.mockito:mockito-core
< 2.1.0 (<2016): many problems, including in repackaged ASM
2.1.0+ (2016+): fixed

org.scala-lang:scala-library
< 2.8.0 (<2010): problems in scala.Console$, scala.Predef$, scala.mobile.Code, 
scala.runtime.BoxesUtility
2.8.0-2.9.3 (2010-2013): one issue, 
scala.actors.threadpool.locks.ReentrantReadWriteLock
2.10.0+ (2012+): fixed

org.hibernate:hibernate-core
< 4.0 (<2011): many, many, problems
4.0.0-5.4.22 (2011-2020): a few problems
6.0.0+ (not yet released): fixed

ch.qos.logback:logback-classic
< 1.0.7 (<2012): lots of problems
1.0.7+ (2012): fixed

org.clojure
< 1.10.0 (<2018): repackages ASM, a few other problems
1.1.0.0+ (2018+): fixed

com.google.code.gson
< 1.4 (<2010): problems in com.google.gson.DefaultInstanceCreators & JsonParser
1.4+ (2010+): fixed

org.jetbrains.kotlin:kotlin-stdlib
< 1.3.0 (<2018): no problems
1.3.0+ (2018+): new problems in kotlin.coroutines.jvm.internal.Boxing

No problems:
org.junit.jupiter:junit-jupiter-api (earliest release 2017)
com.google.guava:guava (earliest release 2011)
org.apache.commons.commons-collections4 (earliest release 2013)
com.fasterxml.jackson.core:jackson-core (earliest release 2012)
org.slf4j:slf4j-api (earliest release 2006)
org.slf4j:slf4j-log4j12 (earliest release 2006)
org.apache.logging.log4j:log4j-core (earliest release 2014)
commons-io:commons-io (earliest release 2005)
javax.servlet:javax.servlet-api (earliest release 2011)
org.apache.httpcomponents:httpclient (earliest release 2009)

-

It does seem like the deprecation warnings introduced in 9 are working—lots of 
projects have responded by changing their code.

A few cases worth special attention:

- JUnit 4 was only very recently fixed (and it's in maintenance mode, must 
clients use 4.11 or 4.12, not 4.13)

- Hibernate still has problems until its coming 6.0 release

- Kotlin added (!) some constructor calls in 2018, and they're still there

- Clojure and ASM were fixed pretty recently

Still, it's a positive sign that only one of these projects is making wrapper 
constructor calls in its latest sources.





Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread Dan Smith
> On Oct 19, 2020, at 6:01 PM, Dan Smith  wrote:
> 
> In the context of the Warnings for Value-Based Classes JEP, we're looking for 
> usages of the deprecated wrapper class constructors ('new Integer(...)', 'new 
> Double(...)', etc.). When do these get used? How often is this motivated by 
> wanting a unique object vs. legacy code that has no particular reason not to 
> use 'valueOf'?
> 
> We've got some investigations going on at Oracle, including looking at some 
> open-source projects, but I'm interested in examples from other 
> companies/projects as well. Please investigate and share what you find!
> 
> I'll reply in a few days with our analysis for the code we look at.

So, some conclusions that we've drawn:

- In 2020, the constructor calls are fairly pervasive, even in recently 
released binaries. Removing these constructors may be the most disruptive 
single API change we've ever made.

- The trend is good—serious projects have mostly responded to the deprecation 
warnings introduced in 9. In 2024 (for example), the picture may be much better.

- It is impossible, given the current JVM model for primitive classes, for 
Integer to both be a primitive class and support 'new java/lang/Integer'. 
Binaries calling the deprecated constructors simply won't work.

- There is a meaningful semantic difference, in terms of promising unique 
identity, between code that does 'new Integer' and code that does 
'Integer.valueOf' or implicitly boxes. The latter is better aligned with the 
class's future behavior as a primitive class.

- Almost no usages care about the identity distinction—they just want boxing. 
But it's difficult to automatically detect the usages that do.

These points lead to the following tentative plan:

We'll proceed as planned with deprecation for removal. The message for all Java 
programs is to stop using these constructors if you want to run in future JDKs. 
Typically, the simplest refactoring is to replace 'new Integer(x)' with just 
'x', but in some cases you might want 'Integer.valueOf(x)'. When you refactor, 
you should confirm that the change in identity semantics is acceptable.

We'll encourage IDEs to provide tooling for updating sources, including batch 
processing. (As an example, IntelliJ already highlights deprecated APIs and 
suggests conversions, although I'm not familiar with its batch processing 
features.)

For legacy binaries that have not been updated but that need to run on a future 
JDK, we'll provide tooling to rewrite their bytecode containing constructor 
calls, either as a preprocessing step on jar files, or as a runtime 
command-line argument.

This tooling will support common bytecode patterns like 'new Foo; dup; ...; 
invokespecial Foo.;', but will not be a comprehensive solution. 
(Mimicking the behavior of instance initialization method invocation in full 
generality would be a very difficult task.) It will also carry behavioral 
compatibility risks. For these reasons, it will be available as a workaround, 
not as default JVM behavior.



Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread Remi Forax
- Mail original -
> De: "daniel smith" 
> À: "valhalla-spec-experts" 
> Envoyé: Mardi 27 Octobre 2020 20:27:39
> Objet: Re: Source code analysis: calls to wrapper class constructors

>> On Oct 19, 2020, at 6:01 PM, Dan Smith  wrote:
>> 
>> In the context of the Warnings for Value-Based Classes JEP, we're looking for
>> usages of the deprecated wrapper class constructors ('new Integer(...)', 'new
>> Double(...)', etc.). When do these get used? How often is this motivated by
>> wanting a unique object vs. legacy code that has no particular reason not to
>> use 'valueOf'?
>> 
>> We've got some investigations going on at Oracle, including looking at some
>> open-source projects, but I'm interested in examples from other
>> companies/projects as well. Please investigate and share what you find!
>> 
>> I'll reply in a few days with our analysis for the code we look at.
> 
> So, some conclusions that we've drawn:
> 
> - In 2020, the constructor calls are fairly pervasive, even in recently 
> released
> binaries. Removing these constructors may be the most disruptive single API
> change we've ever made.
> 
> - The trend is good—serious projects have mostly responded to the deprecation
> warnings introduced in 9. In 2024 (for example), the picture may be much
> better.
> 
> - It is impossible, given the current JVM model for primitive classes, for
> Integer to both be a primitive class and support 'new java/lang/Integer'.
> Binaries calling the deprecated constructors simply won't work.
> 
> - There is a meaningful semantic difference, in terms of promising unique
> identity, between code that does 'new Integer' and code that does
> 'Integer.valueOf' or implicitly boxes. The latter is better aligned with the
> class's future behavior as a primitive class.
> 
> - Almost no usages care about the identity distinction—they just want boxing.
> But it's difficult to automatically detect the usages that do.
> 
> These points lead to the following tentative plan:
> 
> We'll proceed as planned with deprecation for removal. The message for all 
> Java
> programs is to stop using these constructors if you want to run in future 
> JDKs.
> Typically, the simplest refactoring is to replace 'new Integer(x)' with just
> 'x', but in some cases you might want 'Integer.valueOf(x)'. When you refactor,
> you should confirm that the change in identity semantics is acceptable.
> 
> We'll encourage IDEs to provide tooling for updating sources, including batch
> processing. (As an example, IntelliJ already highlights deprecated APIs and
> suggests conversions, although I'm not familiar with its batch processing
> features.)
> 
> For legacy binaries that have not been updated but that need to run on a 
> future
> JDK, we'll provide tooling to rewrite their bytecode containing constructor
> calls, either as a preprocessing step on jar files, or as a runtime
> command-line argument.
> 
> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
> invokespecial Foo.;', but will not be a comprehensive solution.
> (Mimicking the behavior of instance initialization method invocation in full
> generality would be a very difficult task.) It will also carry behavioral
> compatibility risks. For these reasons, it will be available as a workaround,
> not as default JVM behavior.

Three remarks:
- the compiler can warn because a code is using new Integer(value) or warn 
because the compiler will automatically transform all new Integer(value) to 
Integer.valueOf() once Valhalla is integrated,
  i prefer the second solution, having a warning message saying that in the 
future new Integer() will not be supported and that all new Integer(...) will 
be transformed by the compiler automatically to Integer.valueOf(). 
- the introduction of the strong encapsulation in 9 was a very similar 
challenge, the first thing to do is to raise awareness, having a warning at 
runtime (so emitted by the VM) per callsite using new Wrapper(...)
  will help a lot. People can detect easily if they are using a dependency that 
use something that will not be backward compatible (this warning should be 
emitted by Java 16+, because there is no point to wait
  and because of JEP 396, there will be a second wave of people wanting to 
update the library they maintain, so they can fixed both issues in one pass.
- IDE should inspect the jars downloaded by Maven and Gradle, and report the 
use of deprecated for removal APIs, again to raise awareness,
  a warning directly on the tag dependency in the POM saying that this 
dependency (or one of it's sub-dependencies) is using deprecated for removal 
APIs will help a lot. 

Rémi


Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread Dan Smith
> On Oct 27, 2020, at 2:00 PM, Remi Forax  wrote:
> 
> Three remarks:
> - the compiler can warn because a code is using new Integer(value) or warn 
> because the compiler will automatically transform all new Integer(value) to 
> Integer.valueOf() once Valhalla is integrated,
>  i prefer the second solution, having a warning message saying that in the 
> future new Integer() will not be supported and that all new Integer(...) will 
> be transformed by the compiler automatically to Integer.valueOf(). 

I think the idea here is that if the compiler were doing this today, it would 
reduce the amount of binaries that will fail in the future. But... it won't 
eliminate the problem. I think we'd still need to do something about binaries 
that are *already* published (e.g., JUnit 4.12), or that target a pre-16 JDK.

So given that a workaround for binaries will exist, there's not a lot to be 
gained by complicating the source-level story. And in the minus column, some 
areas of concern:

- We'd have to change the language in 16 to special-case what class instance 
creation expressions mean.

- We'd be breaking behavioral compatibility, with no way to opt out (if, say, 
you need unique identities, and don't plan to deploy on future JDKs).

- There's no change in the compile-time experience—using either deprecation or 
these special warnings, you'd be stuck with warnings until you rewrote your 
code.

Treating this as a standard application of deprecation, with behavioral changes 
triggered by source code changes, seems like a more attractive solution.

> - the introduction of the strong encapsulation in 9 was a very similar 
> challenge, the first thing to do is to raise awareness, having a warning at 
> runtime (so emitted by the VM) per callsite using new Wrapper(...)
>  will help a lot. People can detect easily if they are using a dependency 
> that use something that will not be backward compatible (this warning should 
> be emitted by Java 16+, because there is no point to wait
>  and because of JEP 396, there will be a second wave of people wanting to 
> update the library they maintain, so they can fixed both issues in one pass.

Besides javac, one out-of-the-box tool we already have is jdeprscan, which 
since 9 reports any wrapper constructor calls in a given jar file.

I'm not sure whether there's a mechanism in HotSpot to generate warnings about 
deprecated APIs at link/run time. It does seem like it would be a reasonable 
feature...

> - IDE should inspect the jars downloaded by Maven and Gradle, and report the 
> use of deprecated for removal APIs, again to raise awareness,
>  a warning directly on the tag dependency in the POM saying that this 
> dependency (or one of it's sub-dependencies) is using deprecated for removal 
> APIs will help a lot. 

Yes, IDEs providing their own jdeprscan-style diagnostics would be quite 
helpful.



Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread John Rose
On Oct 27, 2020, at 1:36 PM, Dan Smith  wrote:
> 
> I'm not sure whether there's a mechanism in HotSpot to generate warnings 
> about deprecated APIs at link/run time. It does seem like it would be a 
> reasonable feature...

+1

There is no such feature at present; maybe something
could be built on top of the debugger interface.  A quick
look at the event index [1] does not turn up dynamic
linkage (resolution) events, which would be the obvious
place to start.

[1]: https://docs.oracle.com/javase/1.5.0/docs/guide/jvmti/jvmti.html#EventIndex



Re: Source code analysis: calls to wrapper class constructors

2020-10-27 Thread John Rose
On Oct 27, 2020, at 12:27 PM, Dan Smith  wrote:
> 
> This tooling will support common bytecode patterns like 'new Foo; dup; ...; 
> invokespecial Foo.;', but will not be a comprehensive solution. 
> (Mimicking the behavior of instance initialization method invocation in full 
> generality would be a very difficult task.)

One of the reasons it’s not going to be comprehensive
is code like new Integer(complicatedExpr()), in which
the `new` and `invokespecial ` are separated
by (almost) arbitrarily complex bytecode.  The two
instructions don’t even have to be in the same basic
block (at the bytecode level):

new Integer(foo() ? bar() : baz())
// compiles to 4 BB’s in a diamond

If we add switch expressions with large sub-blocks,
I think we get peak separation of the start and
end parts of the new/init dance:

new Integer(switch (x) {
  case 1 -> { complicatedBlock: try { … } catch ... ; return 0;
  default -> { for (;;) … }} )

All of this gives me yet one more reason we would have
been better off with factory methods instead of
open-coding the new/init dance.  It was, in hindsight,
a false economy to open code the object creation “guts”
instead of putting them in factory API points.

And with an eye toward future evolutions of legacy code
(legacy code not yet in existence!), and uniformity with
the factory methods of inline classes, let’s try harder
to get rid of the new/init dance for identity objects.

— John

Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread Remi Forax
- Mail original -
> De: "John Rose" 
> À: "daniel smith" 
> Cc: "valhalla-spec-experts" 
> Envoyé: Mercredi 28 Octobre 2020 05:56:29
> Objet: Re: Source code analysis: calls to wrapper class constructors

> On Oct 27, 2020, at 12:27 PM, Dan Smith  wrote:
>> 
>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>> invokespecial Foo.;', but will not be a comprehensive solution.
>> (Mimicking the behavior of instance initialization method invocation in full
>> generality would be a very difficult task.)
> 
> One of the reasons it’s not going to be comprehensive
> is code like new Integer(complicatedExpr()), in which
> the `new` and `invokespecial ` are separated
> by (almost) arbitrarily complex bytecode.  The two
> instructions don’t even have to be in the same basic
> block (at the bytecode level):
> 
> new Integer(foo() ? bar() : baz())
> // compiles to 4 BB’s in a diamond
> 
> If we add switch expressions with large sub-blocks,
> I think we get peak separation of the start and
> end parts of the new/init dance:
> 
> new Integer(switch (x) {
>  case 1 -> { complicatedBlock: try { … } catch ... ; return 0;
>  default -> { for (;;) … }} )
> 
> All of this gives me yet one more reason we would have
> been better off with factory methods instead of
> open-coding the new/init dance.  It was, in hindsight,
> a false economy to open code the object creation “guts”
> instead of putting them in factory API points.
> 
> And with an eye toward future evolutions of legacy code
> (legacy code not yet in existence!), and uniformity with
> the factory methods of inline classes, let’s try harder
> to get rid of the new/init dance for identity objects.

I believe there is a quick and dirty trick,
replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL 
java/lang/Integer  (I)V by INVOKESTATIC java/lang/Integer valueOf 
(I)Ljava/lang/Integer;

It has to be done after the code is verified because the new execution doesn't 
push java/lang/Integer on the stack anymore before calling the arbitrary init 
expression thus any StackMapTables in between the NOPs and INVOKESTATIC are 
invalid.

> 
> — John

Rémi


Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread forax
- Mail original -
> De: "daniel smith" 
> À: "Remi Forax" 
> Cc: "valhalla-spec-experts" 
> Envoyé: Mardi 27 Octobre 2020 21:36:21
> Objet: Re: Source code analysis: calls to wrapper class constructors

>> On Oct 27, 2020, at 2:00 PM, Remi Forax  wrote:
>> 
>> Three remarks:
>> - the compiler can warn because a code is using new Integer(value) or warn
>> because the compiler will automatically transform all new Integer(value) to
>> Integer.valueOf() once Valhalla is integrated,
>>  i prefer the second solution, having a warning message saying that in the 
>> future
>>  new Integer() will not be supported and that all new Integer(...) will be
>>  transformed by the compiler automatically to Integer.valueOf().
> 
> I think the idea here is that if the compiler were doing this today, it would
> reduce the amount of binaries that will fail in the future. But... it won't
> eliminate the problem. I think we'd still need to do something about binaries
> that are *already* published (e.g., JUnit 4.12), or that target a pre-16 JDK.
> 
> So given that a workaround for binaries will exist, there's not a lot to be
> gained by complicating the source-level story. And in the minus column, some
> areas of concern:
> 
> - We'd have to change the language in 16 to special-case what class instance
> creation expressions mean.
> 
> - We'd be breaking behavioral compatibility, with no way to opt out (if, say,
> you need unique identities, and don't plan to deploy on future JDKs).
> 
> - There's no change in the compile-time experience—using either deprecation or
> these special warnings, you'd be stuck with warnings until you rewrote your
> code.
> 
> Treating this as a standard application of deprecation, with behavioral 
> changes
> triggered by source code changes, seems like a more attractive solution.

[...]

If the VM change all new Integer(x) to Integer.valueOf(x), see my other mail on 
how to do it, then i think the language should mirror the VM behavior.

The warning doesn't have to be that the compiler actually rewrite new 
Integer(x) to Integer.valueOf(x) but that in the future the compiler will 
rewrite new Integer(x) to Integer.valueOf(x).
The rewrite by the compiler has to be done at the same time ==/acmp transition 
from meaning pointer comparison to meaning primitive object comparison is both 
sides are primitive objects.

The idea is that at a point in time, doing a new on a primitive object class is 
illegal, so the verifier will reject any use of "new java/lang/Integer"
At that time,
 - the compîler will rewrite new Integer(x) to Integer.valueOf(x), so the 
generated class is valid and the source code is still valid but the semantics 
has slightly change hence the warning.
 - the VM will rewrite new Integer(x) to Integer.valueOf(x) for all classfile 
with a version lower than the current version.

It's a better plan because it means that Java 1.0+ code still compile and that 
the compiler behavior and the VM behavior are aligned.

It works because the semantics of the primive object subtype of 
java.lang.Integer and the semantics of java.lang.Integer has an identity object 
are mostly similar.
It also means that the warning on synchronized(Integer) will now be an error.

regards,
Rémi


Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread Dan Smith
> On Oct 27, 2020, at 10:56 PM, John Rose  wrote:
> 
> One of the reasons it’s not going to be comprehensive
> is code like new Integer(complicatedExpr()), in which
> the `new` and `invokespecial ` are separated
> by (almost) arbitrarily complex bytecode.


> On Oct 28, 2020, at 3:25 AM, Remi Forax  wrote:
> 
> I believe there is a quick and dirty trick,
> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL 
> java/lang/Integer  (I)V by INVOKESTATIC java/lang/Integer valueOf 
> (I)Ljava/lang/Integer;
> 
> It has to be done after the code is verified because the new execution 
> doesn't push java/lang/Integer on the stack anymore before calling the 
> arbitrary init expression thus any StackMapTables in between the NOPs and 
> INVOKESTATIC are invalid.

Don't forget the 'dup'. We're assuming a 'new' immediately followed by 'dup' (4 
nops), and code that will eventually consume the second one and leave the first 
one fully-initialized.

You're right that this disrupts verification; I think we can address this 
pre-verification by rewriting the StackMapTable, eliminating all references to 
'uninitialized(Offset)' and shrinking the stack by two.

The bigger limitation, which I don't think you run into in any javac-generated 
code, is that you can put a copy of the uninitialized object reference anywhere 
you want—in locals, duplicated 15 times on the stack, etc. That's the point 
where I'm guessing we give up.

So, there's a tractable rewrite for any code with the shape:

new java/lang/Integer;
dup;
... [ad hoc computation, as long as it doesn't touch the two uninitialized 
Integer refs]
invokespecial java/lang/Integer.(...)V;



Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread forax
- Mail original -
> De: "daniel smith" 
> À: "Remi Forax" , "John Rose" 
> Cc: "valhalla-spec-experts" 
> Envoyé: Mercredi 28 Octobre 2020 16:49:49
> Objet: Re: Source code analysis: calls to wrapper class constructors

>> On Oct 27, 2020, at 10:56 PM, John Rose  wrote:
>> 
>> One of the reasons it’s not going to be comprehensive
>> is code like new Integer(complicatedExpr()), in which
>> the `new` and `invokespecial ` are separated
>> by (almost) arbitrarily complex bytecode.
> 
> 
>> On Oct 28, 2020, at 3:25 AM, Remi Forax  wrote:
>> 
>> I believe there is a quick and dirty trick,
>> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL
>> java/lang/Integer  (I)V by INVOKESTATIC java/lang/Integer valueOf
>> (I)Ljava/lang/Integer;
>> 
>> It has to be done after the code is verified because the new execution 
>> doesn't
>> push java/lang/Integer on the stack anymore before calling the arbitrary init
>> expression thus any StackMapTables in between the NOPs and INVOKESTATIC are
>> invalid.
> 
> Don't forget the 'dup'. We're assuming a 'new' immediately followed by 'dup' 
> (4
> nops), and code that will eventually consume the second one and leave the 
> first
> one fully-initialized.

yes, i forget the DUP :)

> 
> You're right that this disrupts verification; I think we can address this
> pre-verification by rewriting the StackMapTable, eliminating all references to
> 'uninitialized(Offset)' and shrinking the stack by two.

Another solution, replace NEW [ref] DUP by NOP INVOKESTATIC java/lang/Integer 
giveMeAFakeInteger ()Ljava/lang/Integer;
and replace the INVOKESPECIAL by an INVOKE_STATIC java/lang/Integer trampoline 
(Ljava/lang/Integer;I)
with
  the method giveMeAFakeInteger returning a special Integer (can be null, 
maybe?)
  the method trampoline calling Integer.valueOf().

> 
> The bigger limitation, which I don't think you run into in any javac-generated
> code, is that you can put a copy of the uninitialized object reference 
> anywhere
> you want—in locals, duplicated 15 times on the stack, etc. That's the point
> where I'm guessing we give up.
> 
> So, there's a tractable rewrite for any code with the shape:
> 
> new java/lang/Integer;
> dup;
> ... [ad hoc computation, as long as it doesn't touch the two uninitialized
> Integer refs]
> invokespecial java/lang/Integer.(...)V;

Yep, some codes not generated by javac or ecj will fail.

Rémi



Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread John Rose
Please accept the Tiger Woods Code Golf award for that one!

It only works if the “dup” output (after “new”) is still contiguous
on the stack.  That won’t be true if javac for some reason spilled
the result of “new” to a local instead of holding it on stack.

IIRC one reason to spill from stack to locals during expression
evaluation is if there is some kind of complicated control flow
inside the expression.  Different javac’s historically have
different policies about stuff like that.

> On Oct 28, 2020, at 4:25 AM, Remi Forax  wrote:
> 
> - Mail original -
>> De: "John Rose" 
>> À: "daniel smith" 
>> Cc: "valhalla-spec-experts" 
>> Envoyé: Mercredi 28 Octobre 2020 05:56:29
>> Objet: Re: Source code analysis: calls to wrapper class constructors
> 
>> On Oct 27, 2020, at 12:27 PM, Dan Smith  wrote:
>>> 
>>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>>> invokespecial Foo.;', but will not be a comprehensive solution.
>>> (Mimicking the behavior of instance initialization method invocation in full
>>> generality would be a very difficult task.)
>> 
>> One of the reasons it’s not going to be comprehensive
>> is code like new Integer(complicatedExpr()), in which
>> the `new` and `invokespecial ` are separated
>> by (almost) arbitrarily complex bytecode.  The two
>> instructions don’t even have to be in the same basic
>> block (at the bytecode level):
>> 
>> new Integer(foo() ? bar() : baz())
>> // compiles to 4 BB’s in a diamond
>> 
>> If we add switch expressions with large sub-blocks,
>> I think we get peak separation of the start and
>> end parts of the new/init dance:
>> 
>> new Integer(switch (x) {
>> case 1 -> { complicatedBlock: try { … } catch ... ; return 0;
>> default -> { for (;;) … }} )
>> 
>> All of this gives me yet one more reason we would have
>> been better off with factory methods instead of
>> open-coding the new/init dance.  It was, in hindsight,
>> a false economy to open code the object creation “guts”
>> instead of putting them in factory API points.
>> 
>> And with an eye toward future evolutions of legacy code
>> (legacy code not yet in existence!), and uniformity with
>> the factory methods of inline classes, let’s try harder
>> to get rid of the new/init dance for identity objects.
> 
> I believe there is a quick and dirty trick,
> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL 
> java/lang/Integer  (I)V by INVOKESTATIC java/lang/Integer valueOf 
> (I)Ljava/lang/Integer;
> 
> It has to be done after the code is verified because the new execution 
> doesn't push java/lang/Integer on the stack anymore before calling the 
> arbitrary init expression thus any StackMapTables in between the NOPs and 
> INVOKESTATIC are invalid.
> 
>> 
>> — John
> 
> Rémi



Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread John Rose
On Oct 28, 2020, at 10:49 AM, Dan Smith  wrote:
> 
> You're right that this disrupts verification; I think we can address this 
> pre-verification by rewriting the StackMapTable, eliminating all references 
> to 'uninitialized(Offset)' and shrinking the stack by two.

Or we can try to keep the verification as-is by emulating the stack effects.
This requires inserting instructions, I think, but avoids reshaping the stack.

Maybe:

new Integer; dup; …(stuff that pushes int)…; invokespecial Integer.(int)V
⇒
ldc_w (String)dummy; dup; …(stuff that pushes int)…; invokestatic 
Integer.valueOf(int)Integer; swap; pop; swap; pop

Maybe use a helper which can “gobble up” the stack junk in one go:

invokestatic Integer.valueOf(int)Integer; swap; pop; swap; pop
⇒ 
invokestatic Integer.$pop2$valueOf(String,String,int)Integer

If the dummy value has migrated somewhere random, it could be picked up and 
popped:

new Integer; astore L42; …(stuff that pushes int)…; aload L42; invokespecial 
Integer.(int)V
⇒
ldc_w (String)dummy; astore L42; …(stuff that pushes int)…; invokestatic 
Integer.valueOf(int)Integer; swap; pop; aload L42; pop

As a further improvement on this theme, note that the dummy always has two 
copies, one to feed to invokespecial  and one to return to the user.  The 
one to return to the user might be at TOS, or it might be elsewhere (in L42 or 
deeper on stack).  We could do a peephole transform which finds the bytecodes 
that pull up the dummy value, move them *before* the $pop2$valueOf helper, and 
the net size change of bytecodes is zero.  The location of the invokespecial 
 might move a byte or two later.

So:

new Integer; …(stuff like dup that stores a duplicate ref)…; …(stuff that 
leaves the new ref on stack, then pushes the int)…; invokespecial 
Integer.(int)V; …(unrelated stuff)… …(stuff that ensures the replicate 
ref is now at TOS)…
⇒ 
ldc_w (String)dummy; …(same stuff like dup that stores a duplicate ref)…; 
…(same stuff that leaves the new ref on stack, then pushes the int)…;  …(same 
stuff that ensures the replicate ref is now at TOS, but moved before the 
invoke)…; invokestatic Integer.$pop2$valueOf(Object,int)V; …(same unrelated 
stuff)…

This more elaborate scheme works for both the simple “dup” case and for the 
more complicated “astore L42” case.  I don’t think it requires changing stack 
maps.

Hours of educational play for nerds 14 and up!

> The bigger limitation, which I don't think you run into in any 
> javac-generated code, is that you can put a copy of the uninitialized object 
> reference anywhere you want—in locals, duplicated 15 times on the stack, etc. 
> That's the point where I'm guessing we give up.




Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread John Rose
On Oct 28, 2020, at 2:32 PM, John Rose  wrote:
> 
> invokestatic Integer.$pop2$valueOf(Object,int)V

That would be invokestatic Integer.$pop2$valueOf(String,int,String)V

And the dummy object could be an Integer (using a condy) if we don’t
want to edit the stack maps that might mention the Integer.  They
might be present if the integer expression contains control flow.

So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V



Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread forax
To summarize, 
it's possible to rewrite the NEW + DUP + INVOKESPECIAL sequence and it may also 
be possible to rewrite other sequence like NEW + STORE + LOAD + INVOKESPECIAL + 
LOAD or other combination by loading with LDC condy using a fake Integer and 
using an INVOKESTATIC with more parameter so there is no need to change the 
StackMapFrames. 

We may still have some bytecode shapes we don't support but it worth a try. 

Rémi 

> De: "John Rose" 
> À: "daniel smith" 
> Cc: "Remi Forax" , "valhalla-spec-experts"
> 
> Envoyé: Mercredi 28 Octobre 2020 20:36:59
> Objet: Re: Source code analysis: calls to wrapper class constructors

> On Oct 28, 2020, at 2:32 PM, John Rose < [ mailto:john.r.r...@oracle.com |
> john.r.r...@oracle.com ] > wrote:

>> invokestatic Integer.$pop2$valueOf(Object,int)V

> That would be invokestatic Integer.$pop2$valueOf(String,int,String)V

> And the dummy object could be an Integer (using a condy) if we don’t
> want to edit the stack maps that might mention the Integer. They
> might be present if the integer expression contains control flow.

> So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V


Re: Source code analysis: calls to wrapper class constructors

2020-10-28 Thread forax
> De: "John Rose" 
> À: "Remi Forax" 
> Cc: "Valhalla Expert Group Observers"
> , "daniel smith"
> , "valhalla-spec-experts"
> 
> Envoyé: Mercredi 28 Octobre 2020 20:08:30
> Objet: Re: Source code analysis: calls to wrapper class constructors

> Please accept the Tiger Woods Code Golf award for that one!
Did i win something, a free t-shirt :) 

> It only works if the “dup” output (after “new”) is still contiguous
> on the stack. That won’t be true if javac for some reason spilled
> the result of “new” to a local instead of holding it on stack.

> IIRC one reason to spill from stack to locals during expression
> evaluation is if there is some kind of complicated control flow
> inside the expression. Different javac’s historically have
> different policies about stuff like that.

I've never seen such bytecode shapes but I don't think i've ever seen a 
classfile compiled with a version which was less that Java 1.2. 

Rémi 

>> On Oct 28, 2020, at 4:25 AM, Remi Forax < [ mailto:fo...@univ-mlv.fr |
>> fo...@univ-mlv.fr ] > wrote:

>> - Mail original -

>>> De: "John Rose" < [ mailto:john.r.r...@oracle.com | john.r.r...@oracle.com 
>>> ] >
>>> À: "daniel smith" < [ mailto:daniel.sm...@oracle.com | 
>>> daniel.sm...@oracle.com ]
>>> >
>>> Cc: "valhalla-spec-experts" < [ 
>>> mailto:valhalla-spec-experts@openjdk.java.net |
>>> valhalla-spec-experts@openjdk.java.net ] >
>>> Envoyé: Mercredi 28 Octobre 2020 05:56:29
>>> Objet: Re: Source code analysis: calls to wrapper class constructors

>>> On Oct 27, 2020, at 12:27 PM, Dan Smith < [ mailto:daniel.sm...@oracle.com |
>>> daniel.sm...@oracle.com ] > wrote:

>>>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>>>> invokespecial Foo.;', but will not be a comprehensive solution.
>>>> (Mimicking the behavior of instance initialization method invocation in 
>>>> full
>>>> generality would be a very difficult task.)

>>> One of the reasons it’s not going to be comprehensive
>>> is code like new Integer(complicatedExpr()), in which
>>> the `new` and `invokespecial ` are separated
>>> by (almost) arbitrarily complex bytecode. The two
>>> instructions don’t even have to be in the same basic
>>> block (at the bytecode level):

>>> new Integer(foo() ? bar() : baz())
>>> // compiles to 4 BB’s in a diamond

>>> If we add switch expressions with large sub-blocks,
>>> I think we get peak separation of the start and
>>> end parts of the new/init dance:

>>> new Integer(switch (x) {
>>> case 1 -> { complicatedBlock: try { … } catch ... ; return 0;
>>> default -> { for (;;) … }} )

>>> All of this gives me yet one more reason we would have
>>> been better off with factory methods instead of
>>> open-coding the new/init dance. It was, in hindsight,
>>> a false economy to open code the object creation “guts”
>>> instead of putting them in factory API points.

>>> And with an eye toward future evolutions of legacy code
>>> (legacy code not yet in existence!), and uniformity with
>>> the factory methods of inline classes, let’s try harder
>>> to get rid of the new/init dance for identity objects.

>> I believe there is a quick and dirty trick,
>> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL
>> java/lang/Integer  (I)V by INVOKESTATIC java/lang/Integer valueOf
>> (I)Ljava/lang/Integer;

>> It has to be done after the code is verified because the new execution 
>> doesn't
>> push java/lang/Integer on the stack anymore before calling the arbitrary init
>> expression thus any StackMapTables in between the NOPs and INVOKESTATIC are
>> invalid.

>>> — John

>> Rémi


Re: Source code analysis: calls to wrapper class constructors

2020-11-03 Thread John Rose
On Oct 28, 2020, at 1:05 PM, fo...@univ-mlv.fr wrote:
> 
> I've never seen such bytecode shapes but I don't think i've ever seen a 
> classfile compiled with a version which was less that Java 1.2.

…I’ve seen bytecode shapes, such bytecode shapes as
would freeze the marrow.

It was dark and rainy, that night I first confronted
the Bytecode Obfuscator.

They say he no longer stalks this world, but tell me:
Why do I clutch my instruction patterns so tightly?

https://owasp.org/www-community/controls/Bytecode_obfuscation