What you are suggesting is what we call "specifying the implementation",
which is generally not a very good idea.
The specification for these methods says "returns a String <with the
following properties>". That's a good specification! It says what you
can expect from the result. It says nothing about the identity of the
returned object, whether it is interned, whether it is aliased, etc.
This is by design. Specifying the implementation not only robs future
implementers of the ability to improve the implementation (which in turn
improves all the code that uses it, without having to even recompile),
and encourages people to do questionable things like make assumptions
about the identity of the returned String.
To describe this as "non-deterministic" is an abuse of the term. What
is going on here is that the specification makes certain promises, and
multiple implementations choose different ways to get there (and any
given implementation could choose different ways in different points in
time.) Similarly, describing it as "inconsistent" is also an abuse; no
one promised a non-aliased object.
If your code wants to make assumptions about interning or lack of
aliasing, you should call "new String(s)".
On 6/19/2022 11:29 PM, Sasi Peri wrote:
Hello,
*Issue details*
toLower() and toUpper() return a new String object sometimes and
sometimes a string literal, based on the input string type (and also
sometimes based on the VM/jdk type)
For example
-HotSpot VM, If the input is a string literal, which *already* has all
“lower case letters”, toLower would return the same string literal, if
not it will convert all letters to lower and returns a new String()
object.
-However, openJ9 (e.g. IBM jdk8 ditsro, ) always returns a new String
object not a literal.
This behavior is non deterministic, inconsistent, you cannot always
predict if the outcome is a new string object OR an interned string
from pool (particularly from unit testing stand point).
*Sample code to show case above behavior*
*package*com.bugs;
*import*java.util.Locale;
*public**class*TestStringFunction
{
* public**static**void*main(String args[])
{
String s1= "abc";
String s2= "ABC";
System.*/out/*.println("----- case: when string already lower
----------");
/testIfEqualsLower/(s1);
System.*/out/*.println("----- case: when string with upper case
----------");
/testIfEqualsLower/(s2);
}
* private**static**void*testIfEqualsLower(String s)
{
* if*(s.toLowerCase() == "abc")
{
System.*/out/*.println("YES - literal");
}
* if*(s.toLowerCase().equals("abc"))
{
System.*/out/*.println("YES – equals func");
}
}
}
*_Out put_*
*__*
----- case: when string already lower ----------
YES - literal
YES – equals func
----- case: when string with upper case ----------
YES - equals func**
**
*Why this could be an issue or bug prone?*
-Suppose an unit test is written, for a method doAThing(), that has
toLower/Upper conversions in the middle of the code somewhere, and
apply logic based on that.
-Though general guidance to compare unknown string (types) is always
using equals, sometimes developers can make a mistake (i.e. suppose
they used == in a unit test)
-If the code review did not catch it, this behavior can cause all unit
tests passed, as long unit test is written with “small case string” as
input.
-It could potentially make it to prod, and can be realized only when
it hits a case when inputstring has all uppercase OR mixed case
letters, which could be after multiple sprints, at the point not
easily detectable.
*Suggestion*
It would be great if we always can make a new String() and return
always a String object not an interned string sometimes, as openJ9/ibm
does (with some jdk versions).
It may not be good idea, to always return “.interned” value and fill
up the intern pool, for these short-lived objects (as they are most
times).
If this is agreed/approved, I can make a change and commit.
Regards
- SP