Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
I just realized that I also use String.trim() a lot, and have always assumed it's very cheap; that's no longer the case. JDK itself contains many usages of trim(); even a lot of str.substring().trim() Just saying. On Wed, Nov 14, 2012 at 9:14 AM, Zhong Yu zhong.j...@gmail.com wrote: Changing String.substring() from O(1) to O(n) is a big deal; we may say it breaks compatibility. Any code that intends to work across JDK versions before and after 7u6 cannot use the method, since its behavior is so different in different versions. Any deployment that upgrades JDK to 7u6 and later needs to review all its usages of substring(). That's a ton of work. A quick workaround might be to refactor all substring() usages to some `old_substring()` method that preserves O(1). Unfortunately `old_substring()` cannot exist, so there's no quick workaround possible. The memory leak problem of the old substring() method is well-known among Java programmers, it's not really a big problem today. For the uninitiated, they might expect that substring() is leak-free; but they might also expect that substring() is O(1). There's no apparent reason why favoring one is better than favoring the other. In the old implementation, there's a workaround to achieve leak-free, by `new String(String)`. In the new implementation, there is no workaround to achieve O(1), unless the developer uses something other than String. It is basically impossible to change String to another type if it's part of a method signature. With all these problems, please reconsider this change and see if you should roll it back. Thanks. Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
Hi, This change is 6 months old now. I wonder if Oracle received any complaints from the users since then. I mean complaints that are based on real observations of performance degradation in real code - not only speculation. Regards, Peter 2012/11/15 Zhong Yu zhong.j...@gmail.com Since this change is to achieve minor performance boost, it's not fair to defend it by saying that it only incurs minor performance penalties. Java programs are infested with strings, most of which could have used a more appropriate type, but it is the insane reality. Any change to the behavior of strings should have been backed up by a much more thorough analysis. Every usage of substring() was (hopefully) the result of some conscious reasoning about space-time. Even if this change does not significantly alter an application's performance, it invalidates all the reasoning, that's the worst blow in my book. There's no problem if substring() does copying from day one, but 17 years have passed. Zhong Yu On Wed, Nov 14, 2012 at 6:58 PM, Vitaly Davidovich vita...@gmail.com wrote: Personally, I feel like the concern is a bit overstated: 1) the n in O(n) is likely actually fairly small in practice (at least in what I'd consider sane code) 2) I think a lot of people that worry about perf probably aren't using substring() anyway 3) copying char[] is optimized by jit - this is basically a memcpy()-like call, which modern machines handle well 4) the upside is strings are 8 bytes smaller 5) .NET substring() has always allocated new storage (via an optimized internal VM call) and never shared the char[] and I haven't come across any complaints or seen serious perf problems myself (granted I seldom use substring) So I don't know if this is anything to worry about in practice. Sent from my phone On Nov 14, 2012 5:26 PM, Zhong Yu zhong.j...@gmail.com wrote: On 06/03/2012 11:35 PM, Mike Duigou wrote: [I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.duigou at oracle.comwrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. Wow! If the previous behavior of substring() was once a bug, by now it has become a well known feature. People know about it, and people depend on it. This change is a big surprise. Changing O(1) to O(n) is a breach of contract. It'll break lots of old code; and meanwhile lots of new code are still being written based on the old assumption. After people learned about the new behavior, they need to comb through and rewrite their code. The worst part is the same code performs very differently on different versions of JDK. What's a programmer supposed to do if his code targets JDK6 and above? If the cost of strings are no longer certain, what else can we believe in? Is there any chance in hell to roll it back? Maybe add a new method for the new behavior? Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
Changing String.substring() from O(1) to O(n) is a big deal; we may say it breaks compatibility. Any code that intends to work across JDK versions before and after 7u6 cannot use the method, since its behavior is so different in different versions. Any deployment that upgrades JDK to 7u6 and later needs to review all its usages of substring(). That's a ton of work. A quick workaround might be to refactor all substring() usages to some `old_substring()` method that preserves O(1). Unfortunately `old_substring()` cannot exist, so there's no quick workaround possible. The memory leak problem of the old substring() method is well-known among Java programmers, it's not really a big problem today. For the uninitiated, they might expect that substring() is leak-free; but they might also expect that substring() is O(1). There's no apparent reason why favoring one is better than favoring the other. In the old implementation, there's a workaround to achieve leak-free, by `new String(String)`. In the new implementation, there is no workaround to achieve O(1), unless the developer uses something other than String. It is basically impossible to change String to another type if it's part of a method signature. With all these problems, please reconsider this change and see if you should roll it back. Thanks. Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
The new implementation also introduces a new form of memory leak. Previously N substrings take O(N) space. Now it takes O(N*m) space where m is the average length of substrings. Some applications may be double penalized by the new implementation - both CPU and memory go up. On Wed, Nov 14, 2012 at 9:14 AM, Zhong Yu zhong.j...@gmail.com wrote: Changing String.substring() from O(1) to O(n) is a big deal; we may say it breaks compatibility. Any code that intends to work across JDK versions before and after 7u6 cannot use the method, since its behavior is so different in different versions. Any deployment that upgrades JDK to 7u6 and later needs to review all its usages of substring(). That's a ton of work. A quick workaround might be to refactor all substring() usages to some `old_substring()` method that preserves O(1). Unfortunately `old_substring()` cannot exist, so there's no quick workaround possible. The memory leak problem of the old substring() method is well-known among Java programmers, it's not really a big problem today. For the uninitiated, they might expect that substring() is leak-free; but they might also expect that substring() is O(1). There's no apparent reason why favoring one is better than favoring the other. In the old implementation, there's a workaround to achieve leak-free, by `new String(String)`. In the new implementation, there is no workaround to achieve O(1), unless the developer uses something other than String. It is basically impossible to change String to another type if it's part of a method signature. With all these problems, please reconsider this change and see if you should roll it back. Thanks. Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
Hi Zhong Yu, I agree with you that changing the implementation of something like String.substring which is widely used is something that is always a little hairy. The memory leak you mention is one side of the problem, the other is that we want the VM to do memory collocation of String (i.e. allocate the array of char and the String as one object to avoid the double indirection). For that the creation of the char array and the creation of the String must be done in the constructor of String. So this change is a way to look to the bright future instead of looking to the dark past :) Now, I don't know why this change was backported to a jdk update, but it's more a question to the jdk7 update mailing list. Rémi On 11/14/2012 04:14 PM, Zhong Yu wrote: Changing String.substring() from O(1) to O(n) is a big deal; we may say it breaks compatibility. Any code that intends to work across JDK versions before and after 7u6 cannot use the method, since its behavior is so different in different versions. Any deployment that upgrades JDK to 7u6 and later needs to review all its usages of substring(). That's a ton of work. A quick workaround might be to refactor all substring() usages to some `old_substring()` method that preserves O(1). Unfortunately `old_substring()` cannot exist, so there's no quick workaround possible. The memory leak problem of the old substring() method is well-known among Java programmers, it's not really a big problem today. For the uninitiated, they might expect that substring() is leak-free; but they might also expect that substring() is O(1). There's no apparent reason why favoring one is better than favoring the other. In the old implementation, there's a workaround to achieve leak-free, by `new String(String)`. In the new implementation, there is no workaround to achieve O(1), unless the developer uses something other than String. It is basically impossible to change String to another type if it's part of a method signature. With all these problems, please reconsider this change and see if you should roll it back. Thanks. Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
On 14/11/2012 16:06, Remi Forax wrote: Now, I don't know why this change was backported to a jdk update, but it's more a question to the jdk7 update mailing list. It was to offset the addition of the hash32 field. -Alan.
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
On 06/03/2012 11:35 PM, Mike Duigou wrote: [I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.duigou at oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. Wow! If the previous behavior of substring() was once a bug, by now it has become a well known feature. People know about it, and people depend on it. This change is a big surprise. Changing O(1) to O(n) is a breach of contract. It'll break lots of old code; and meanwhile lots of new code are still being written based on the old assumption. After people learned about the new behavior, they need to comb through and rewrite their code. The worst part is the same code performs very differently on different versions of JDK. What's a programmer supposed to do if his code targets JDK6 and above? If the cost of strings are no longer certain, what else can we believe in? Is there any chance in hell to roll it back? Maybe add a new method for the new behavior? Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
Personally, I feel like the concern is a bit overstated: 1) the n in O(n) is likely actually fairly small in practice (at least in what I'd consider sane code) 2) I think a lot of people that worry about perf probably aren't using substring() anyway 3) copying char[] is optimized by jit - this is basically a memcpy()-like call, which modern machines handle well 4) the upside is strings are 8 bytes smaller 5) .NET substring() has always allocated new storage (via an optimized internal VM call) and never shared the char[] and I haven't come across any complaints or seen serious perf problems myself (granted I seldom use substring) So I don't know if this is anything to worry about in practice. Sent from my phone On Nov 14, 2012 5:26 PM, Zhong Yu zhong.j...@gmail.com wrote: On 06/03/2012 11:35 PM, Mike Duigou wrote: [I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.duigou at oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. Wow! If the previous behavior of substring() was once a bug, by now it has become a well known feature. People know about it, and people depend on it. This change is a big surprise. Changing O(1) to O(n) is a breach of contract. It'll break lots of old code; and meanwhile lots of new code are still being written based on the old assumption. After people learned about the new behavior, they need to comb through and rewrite their code. The worst part is the same code performs very differently on different versions of JDK. What's a programmer supposed to do if his code targets JDK6 and above? If the cost of strings are no longer certain, what else can we believe in? Is there any chance in hell to roll it back? Maybe add a new method for the new behavior? Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
Since this change is to achieve minor performance boost, it's not fair to defend it by saying that it only incurs minor performance penalties. Java programs are infested with strings, most of which could have used a more appropriate type, but it is the insane reality. Any change to the behavior of strings should have been backed up by a much more thorough analysis. Every usage of substring() was (hopefully) the result of some conscious reasoning about space-time. Even if this change does not significantly alter an application's performance, it invalidates all the reasoning, that's the worst blow in my book. There's no problem if substring() does copying from day one, but 17 years have passed. Zhong Yu On Wed, Nov 14, 2012 at 6:58 PM, Vitaly Davidovich vita...@gmail.com wrote: Personally, I feel like the concern is a bit overstated: 1) the n in O(n) is likely actually fairly small in practice (at least in what I'd consider sane code) 2) I think a lot of people that worry about perf probably aren't using substring() anyway 3) copying char[] is optimized by jit - this is basically a memcpy()-like call, which modern machines handle well 4) the upside is strings are 8 bytes smaller 5) .NET substring() has always allocated new storage (via an optimized internal VM call) and never shared the char[] and I haven't come across any complaints or seen serious perf problems myself (granted I seldom use substring) So I don't know if this is anything to worry about in practice. Sent from my phone On Nov 14, 2012 5:26 PM, Zhong Yu zhong.j...@gmail.com wrote: On 06/03/2012 11:35 PM, Mike Duigou wrote: [I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.duigou at oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. Wow! If the previous behavior of substring() was once a bug, by now it has become a well known feature. People know about it, and people depend on it. This change is a big surprise. Changing O(1) to O(n) is a breach of contract. It'll break lots of old code; and meanwhile lots of new code are still being written based on the old assumption. After people learned about the new behavior, they need to comb through and rewrite their code. The worst part is the same code performs very differently on different versions of JDK. What's a programmer supposed to do if his code targets JDK6 and above? If the cost of strings are no longer certain, what else can we believe in? Is there any chance in hell to roll it back? Maybe add a new method for the new behavior? Zhong Yu
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
For the record, javadoc uses substring very heavily, and might be impacted by this change. -- Jon On 06/03/2012 02:35 PM, Mike Duigou wrote: [I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.dui...@oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. So .substring() is not O(1) any more? No. Though with object allocation it probably was only ever roughly O(1) anyway. Doesn't this have impact on the performance of parsers and such that rely on the performance caracteristics of the .substring() ? It does have an impact. We've seen as much as a couple of percent on some benchmarks. Parsers which use substring for extraction are definitely impacted by this change. Have you considered then implementing .subSequence() not in terms of just delegating to .substring() but returning a special CharSequence view over the chars of the sub-sequence? It does look that String.subSequence() returning a special view rather than a substring would be a good optimization and probably a very good compromise for parser developers. Please create an issue and if you have the time and expertise a patch would speed things along (though unfortunately almost certainly too late for inclusion in 7u6). Mike Regards, Peter Reviewed-by: alanb, mduigou, forax, briangoetz Contributed-by: brian.dohe...@oracle.com ! src/share/classes/java/lang/Integer.java ! src/share/classes/java/lang/Long.java ! src/share/classes/java/lang/String.java ! src/share/classes/java/lang/StringCoding.java
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
On Thursday, May 31, 2012 03:22:35 AM mike.dui...@oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. So .substring() is not O(1) any more? Doesn't this have impact on the performance of parsers and such that rely on the performance caracteristics of the .substring() ? Have you considered then implementing .subSequence() not in terms of just delegating to .substring() but returning a special CharSequence view over the chars of the sub-sequence? Regards, Peter Reviewed-by: alanb, mduigou, forax, briangoetz Contributed-by: brian.dohe...@oracle.com ! src/share/classes/java/lang/Integer.java ! src/share/classes/java/lang/Long.java ! src/share/classes/java/lang/String.java ! src/share/classes/java/lang/StringCoding.java
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
On 06/03/2012 11:35 PM, Mike Duigou wrote: [I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.dui...@oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. So .substring() is not O(1) any more? No. Though with object allocation it probably was only ever roughly O(1) anyway. Allocation fast path just bump a pointer, so it's O(1). There are two advantages of the new code. The String object and the array of chars are now co-located in memory (at least for small/medium strings) so cpu caches are happy. This fix a longstanding memory leak issue see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513622 that why some parsers already doesn't use the substring() trick. BTW Mike, you can now close this bug. Doesn't this have impact on the performance of parsers and such that rely on the performance caracteristics of the .substring() ? It does have an impact. We've seen as much as a couple of percent on some benchmarks. Parsers which use substring for extraction are definitely impacted by this change. Have you considered then implementing .subSequence() not in terms of just delegating to .substring() but returning a special CharSequence view over the chars of the sub-sequence? It does look that String.subSequence() returning a special view rather than a substring would be a good optimization and probably a very good compromise for parser developers. Please create an issue and if you have the time and expertise a patch would speed things along (though unfortunately almost certainly too late for inclusion in 7u6). Given that Integer.parseInt() or Double.parseDouble() takes a String and not a CharSequence, yes you can create a CharSequence view but the only way to use it is to call toString() on it. Mike cheers, Rémi
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
In getBytes() you do not need: int n = srcEnd; int i = srcBegin; You could already use srcEnd srcBegin @@ -1015,16 +968,16 @@ public final class String return true; } if (anObject instanceof String) { -String anotherString = (String)anObject; -int n = count; -if (n == anotherString.count) { +String anotherString = (String) anObject; +int n = value.length; +if (n == anotherString.value.length) { char v1[] = value; char v2[] = anotherString.value; -int i = offset; -int j = anotherString.offset; +int i = 0; while (n-- != 0) { -if (v1[i++] != v2[j++]) -return false; +if (v1[i] != v2[i]) +return false; +i++; } return true; } Why not simply? : if (anObject instanceof String) { String anotherString = (String) anObject; int n = value.length; if (n == anotherString.value.length) { char v1[] = value; char v2[] = anotherString.value; while (n-- != 0) { if (v1[n] != v2[n]) return false; } return true; } This additionally has the advantage, that mostly the difference would be found quicker, as strings of same length often would differ at the end e.g.: VeryLongText_1 VeryLongText_2 Same for other equals and compare methods. BTW: You again have inserted a space after casts. ;-) -Ulf Am 31.05.2012 05:22, schrieb mike.dui...@oracle.com: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Reviewed-by: alanb, mduigou, forax, briangoetz Contributed-by: brian.dohe...@oracle.com ! src/share/classes/java/lang/Integer.java ! src/share/classes/java/lang/Long.java ! src/share/classes/java/lang/String.java ! src/share/classes/java/lang/StringCoding.java
Re: hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
[I trimmed the distribution list] On Jun 3 2012, at 13:44 , Peter Levart wrote: On Thursday, May 31, 2012 03:22:35 AM mike.dui...@oracle.com wrote: Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Wow, that's quite a change. Indeed. It was a long time in development. It is a change which is expected to be overall beneficial though and in the general case a positive win. So .substring() is not O(1) any more? No. Though with object allocation it probably was only ever roughly O(1) anyway. Doesn't this have impact on the performance of parsers and such that rely on the performance caracteristics of the .substring() ? It does have an impact. We've seen as much as a couple of percent on some benchmarks. Parsers which use substring for extraction are definitely impacted by this change. Have you considered then implementing .subSequence() not in terms of just delegating to .substring() but returning a special CharSequence view over the chars of the sub-sequence? It does look that String.subSequence() returning a special view rather than a substring would be a good optimization and probably a very good compromise for parser developers. Please create an issue and if you have the time and expertise a patch would speed things along (though unfortunately almost certainly too late for inclusion in 7u6). Mike Regards, Peter Reviewed-by: alanb, mduigou, forax, briangoetz Contributed-by: brian.dohe...@oracle.com ! src/share/classes/java/lang/Integer.java ! src/share/classes/java/lang/Long.java ! src/share/classes/java/lang/String.java ! src/share/classes/java/lang/StringCoding.java
hg: jdk8/tl/jdk: 6924259: Remove offset and count fields from java.lang.String
Changeset: 2c773daa825d Author:mduigou Date: 2012-05-17 10:06 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d 6924259: Remove offset and count fields from java.lang.String Summary: Removes the use of shared character array buffers by String along with the two fields needed to support the use of shared buffers. Reviewed-by: alanb, mduigou, forax, briangoetz Contributed-by: brian.dohe...@oracle.com ! src/share/classes/java/lang/Integer.java ! src/share/classes/java/lang/Long.java ! src/share/classes/java/lang/String.java ! src/share/classes/java/lang/StringCoding.java