Re: java.io.Writer uses CharSequence.toString()
Could you please prototype what you suggest in code so we could discuss it more constructively? Otherwise I feel this discussion is getting too broad and as such may not achieve anything in particular. Thanks, -Pavel > On 30 Jul 2016, at 10:03, Fabian Lange wrote: > > Hi, > so why did you guys invent CharSequence as an API if it cannot be used. > I kind of understand why people use unsafe and come up with their own > character data implementations. > > So you prefer multi megabyte string allocations including their arraycopy, > (java.lang.StringBuilder.toString()) > >public String toString() { >// Create a copy, don't share the array >return new String(value, 0, count); >} > > which then will arraycopy again this multimegabyte char array > java.io.Writer.write(String, int, int) > >} else {// Don't permanently allocate very large buffers. >cbuf = new char[len]; >} >str.getChars(off, (off + len), cbuf, 0); >write(cbuf, 0, len); > > over an Implementation which doesn't do that? > > Fabian > > On Sat, Jul 30, 2016 at 1:23 AM, Brent Christian > wrote: >> Hi, >> >> This idea has been brought up before [1]. >> >> I concur with Pavel's assessment. I would add that now that latin-1 Strings >> are stored in a more compact form in JDK 9 ("Compact Strings" [2]), the >> performance profile of string data is further complicated. >> >> Thanks, >> -Brent >> >> 1. https://bugs.openjdk.java.net/browse/JDK-6206838 >> 2. https://bugs.openjdk.java.net/browse/JDK-8054307 >> >> On 07/29/2016 10:21 AM, Pavel Rappo wrote: >>> >>> Once again, while I agree in some places it could have been done a bit >>> better >>> probably, I would say it's good to a have a look at benchmarks first. >>> >>> If they show there's indeed a big difference between >>> >>>char[] copy = new chars[charSequence.length()]; >>>String s = charSequence.toString(); >>>s.getChars(0, s.length, copy, 0); >>> >>> and >>> >>>char[] copy = new chars[charSequence.length()]; >>>charSequence.getChars(0, charSequence.length(), copy, 0); >>> >>> it could justify an increase in complexity of CharBuffer.append or >>> introducing a >>> new default method (getChars/fillInto) into CharSequence. Possibly. Or >>> maybe >>> not. Because there might be some nontrivial effects we are completely >>> unaware of. >>> >>> Btw, what do you mean by "extract char[]" from StringBuilder? Do you want >>> StringBuilder to give away a reference to its char[] outside? If not, than >>> what's the difference between "extract char[]" from StringBuilder and "use >>> String" in your algorithm? >>> >>> The bottom line is whatever you suggest would likely need a good >>> justification. >>> To me it's not immediately obvious that something like this >>> >>> public CharBuffer append(CharSequence csq) { >>> if (csq == null) { >>> put("null"); >>> } else if (csq instanceof StringBuilder) { >>> char[] chars = new char[csq.length()]; >>> ((StringBuilder) csq).getChars(0, csq.length(), chars, 0); >>> put(chars); >>> } else if (csq instanceof StringBuffer) { >>> char[] chars = new char[csq.length()]; >>> ((StringBuffer) csq).getChars(0, csq.length(), chars, 0); >>> put(chars); >>> } else if (csq instanceof CharBuffer) { >>> CharBuffer buffer = (CharBuffer) csq; >>> int p = buffer.position(); >>> put(buffer); >>> buffer.position(p); >>> } else { >>> for (int i = 0; i < csq.length(); i++) { >>> put(csq.charAt(i)); >>> } >>> } >>> return this; >>> } >>> >>> is better than this (what's there today) >>> >>> public CharBuffer append(CharSequence csq) { >>> if (csq == null) >>> return put("null"); >>> else >>> return put(csq.toString()); >>> } >>> On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote: Hello, Have to agree with Fabian handling CharSequences (and special case StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the same toString. I would expect it to do: - Instamceof String -> use it - Instance of StringBuilder -> extract char[] and iterate - Instance of CharBuffer -> handle - Otherwise: Loop over charAt (the otherwise might be a tradeof between allocation and (not)inlined bounds checks) Alternative would be a CharSequence.fillInto(char[]) BTW wouldn't it be create if char[] implements CharSequence? Gruss Bernd -- http://bernd.eckenfels.net From Win 10 Mobile Von: Fabian Lange >>> >>> >>
Re: java.io.Writer uses CharSequence.toString()
Hi, so why did you guys invent CharSequence as an API if it cannot be used. I kind of understand why people use unsafe and come up with their own character data implementations. So you prefer multi megabyte string allocations including their arraycopy, (java.lang.StringBuilder.toString()) public String toString() { // Create a copy, don't share the array return new String(value, 0, count); } which then will arraycopy again this multimegabyte char array java.io.Writer.write(String, int, int) } else {// Don't permanently allocate very large buffers. cbuf = new char[len]; } str.getChars(off, (off + len), cbuf, 0); write(cbuf, 0, len); over an Implementation which doesn't do that? Fabian On Sat, Jul 30, 2016 at 1:23 AM, Brent Christian wrote: > Hi, > > This idea has been brought up before [1]. > > I concur with Pavel's assessment. I would add that now that latin-1 Strings > are stored in a more compact form in JDK 9 ("Compact Strings" [2]), the > performance profile of string data is further complicated. > > Thanks, > -Brent > > 1. https://bugs.openjdk.java.net/browse/JDK-6206838 > 2. https://bugs.openjdk.java.net/browse/JDK-8054307 > > On 07/29/2016 10:21 AM, Pavel Rappo wrote: >> >> Once again, while I agree in some places it could have been done a bit >> better >> probably, I would say it's good to a have a look at benchmarks first. >> >> If they show there's indeed a big difference between >> >> char[] copy = new chars[charSequence.length()]; >> String s = charSequence.toString(); >> s.getChars(0, s.length, copy, 0); >> >> and >> >> char[] copy = new chars[charSequence.length()]; >> charSequence.getChars(0, charSequence.length(), copy, 0); >> >> it could justify an increase in complexity of CharBuffer.append or >> introducing a >> new default method (getChars/fillInto) into CharSequence. Possibly. Or >> maybe >> not. Because there might be some nontrivial effects we are completely >> unaware of. >> >> Btw, what do you mean by "extract char[]" from StringBuilder? Do you want >> StringBuilder to give away a reference to its char[] outside? If not, than >> what's the difference between "extract char[]" from StringBuilder and "use >> String" in your algorithm? >> >> The bottom line is whatever you suggest would likely need a good >> justification. >> To me it's not immediately obvious that something like this >> >> public CharBuffer append(CharSequence csq) { >> if (csq == null) { >> put("null"); >> } else if (csq instanceof StringBuilder) { >> char[] chars = new char[csq.length()]; >> ((StringBuilder) csq).getChars(0, csq.length(), chars, 0); >> put(chars); >> } else if (csq instanceof StringBuffer) { >> char[] chars = new char[csq.length()]; >> ((StringBuffer) csq).getChars(0, csq.length(), chars, 0); >> put(chars); >> } else if (csq instanceof CharBuffer) { >> CharBuffer buffer = (CharBuffer) csq; >> int p = buffer.position(); >> put(buffer); >> buffer.position(p); >> } else { >> for (int i = 0; i < csq.length(); i++) { >> put(csq.charAt(i)); >> } >> } >> return this; >> } >> >> is better than this (what's there today) >> >> public CharBuffer append(CharSequence csq) { >> if (csq == null) >> return put("null"); >> else >> return put(csq.toString()); >> } >> >>> On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote: >>> >>> Hello, >>> >>> Have to agree with Fabian handling CharSequences (and special case >>> StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see >>> the same toString. I would expect it to do: >>> - Instamceof String -> use it >>> - Instance of StringBuilder -> extract char[] and iterate >>> - Instance of CharBuffer -> handle >>> - Otherwise: Loop over charAt >>> >>> (the otherwise might be a tradeof between allocation and (not)inlined >>> bounds checks) >>> >>> Alternative would be a CharSequence.fillInto(char[]) >>> >>> BTW wouldn't it be create if char[] implements CharSequence? >>> >>> Gruss >>> Bernd >>> -- >>> http://bernd.eckenfels.net >>> From Win 10 Mobile >>> >>> Von: Fabian Lange >> >> >
Re: java.io.Writer uses CharSequence.toString()
Hi, This idea has been brought up before [1]. I concur with Pavel's assessment. I would add that now that latin-1 Strings are stored in a more compact form in JDK 9 ("Compact Strings" [2]), the performance profile of string data is further complicated. Thanks, -Brent 1. https://bugs.openjdk.java.net/browse/JDK-6206838 2. https://bugs.openjdk.java.net/browse/JDK-8054307 On 07/29/2016 10:21 AM, Pavel Rappo wrote: Once again, while I agree in some places it could have been done a bit better probably, I would say it's good to a have a look at benchmarks first. If they show there's indeed a big difference between char[] copy = new chars[charSequence.length()]; String s = charSequence.toString(); s.getChars(0, s.length, copy, 0); and char[] copy = new chars[charSequence.length()]; charSequence.getChars(0, charSequence.length(), copy, 0); it could justify an increase in complexity of CharBuffer.append or introducing a new default method (getChars/fillInto) into CharSequence. Possibly. Or maybe not. Because there might be some nontrivial effects we are completely unaware of. Btw, what do you mean by "extract char[]" from StringBuilder? Do you want StringBuilder to give away a reference to its char[] outside? If not, than what's the difference between "extract char[]" from StringBuilder and "use String" in your algorithm? The bottom line is whatever you suggest would likely need a good justification. To me it's not immediately obvious that something like this public CharBuffer append(CharSequence csq) { if (csq == null) { put("null"); } else if (csq instanceof StringBuilder) { char[] chars = new char[csq.length()]; ((StringBuilder) csq).getChars(0, csq.length(), chars, 0); put(chars); } else if (csq instanceof StringBuffer) { char[] chars = new char[csq.length()]; ((StringBuffer) csq).getChars(0, csq.length(), chars, 0); put(chars); } else if (csq instanceof CharBuffer) { CharBuffer buffer = (CharBuffer) csq; int p = buffer.position(); put(buffer); buffer.position(p); } else { for (int i = 0; i < csq.length(); i++) { put(csq.charAt(i)); } } return this; } is better than this (what's there today) public CharBuffer append(CharSequence csq) { if (csq == null) return put("null"); else return put(csq.toString()); } On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote: Hello, Have to agree with Fabian handling CharSequences (and special case StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the same toString. I would expect it to do: - Instamceof String -> use it - Instance of StringBuilder -> extract char[] and iterate - Instance of CharBuffer -> handle - Otherwise: Loop over charAt (the otherwise might be a tradeof between allocation and (not)inlined bounds checks) Alternative would be a CharSequence.fillInto(char[]) BTW wouldn't it be create if char[] implements CharSequence? Gruss Bernd -- http://bernd.eckenfels.net From Win 10 Mobile Von: Fabian Lange
Re: java.io.Writer uses CharSequence.toString()
Hello, yes I agree that has to be benchmarked. (And probably greatly varries with the input length as well). But besides the performance aspect I wanted to mention something else. I had a password hashing API and wanted to switch from the typical overwriteable char[] signature to a CharSequence. Because if the password is only existing as a String there is no help in copying it to a char[]. And depending on the source CharSequence can deal with all of them. But then I noticed its quite hard to deal with CharSequences where no intermediate Strings are constructed. You asked about the "extract", I was inspired by Peters hint to extractChar in StringUtil of OpenHFT. It is a scary magic in an external API but it could be a internal shortcut within the JDK. I would actally expect encoders to have the same shortcut for strings... https://github.com/OpenHFT/Chronicle-Core/blob/master/src/main/java/net/openhft/chronicle/core/util/StringUtils.java Gruss Bernd Am Fri, 29 Jul 2016 18:21:37 +0100 schrieb Pavel Rappo : > Once again, while I agree in some places it could have been done a > bit better probably, I would say it's good to a have a look at > benchmarks first. > > If they show there's indeed a big difference between > >char[] copy = new chars[charSequence.length()]; >String s = charSequence.toString(); >s.getChars(0, s.length, copy, 0); > > and > >char[] copy = new chars[charSequence.length()]; >charSequence.getChars(0, charSequence.length(), copy, 0); > > it could justify an increase in complexity of CharBuffer.append or > introducing a new default method (getChars/fillInto) into > CharSequence. Possibly. Or maybe not. Because there might be some > nontrivial effects we are completely unaware of. > > Btw, what do you mean by "extract char[]" from StringBuilder? Do you > want StringBuilder to give away a reference to its char[] outside? If > not, than what's the difference between "extract char[]" from > StringBuilder and "use String" in your algorithm? > > The bottom line is whatever you suggest would likely need a good > justification. To me it's not immediately obvious that something like > this > > public CharBuffer append(CharSequence csq) { > if (csq == null) { > put("null"); > } else if (csq instanceof StringBuilder) { > char[] chars = new char[csq.length()]; > ((StringBuilder) csq).getChars(0, csq.length(), chars, 0); > put(chars); > } else if (csq instanceof StringBuffer) { > char[] chars = new char[csq.length()]; > ((StringBuffer) csq).getChars(0, csq.length(), chars, 0); > put(chars); > } else if (csq instanceof CharBuffer) { > CharBuffer buffer = (CharBuffer) csq; > int p = buffer.position(); > put(buffer); > buffer.position(p); > } else { > for (int i = 0; i < csq.length(); i++) { > put(csq.charAt(i)); > } > } > return this; > } > > is better than this (what's there today) > > public CharBuffer append(CharSequence csq) { > if (csq == null) > return put("null"); > else > return put(csq.toString()); > } > > > On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote: > > > > Hello, > > > > Have to agree with Fabian handling CharSequences (and special case > > StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) > > you see the same toString. I would expect it to do: > > - Instamceof String -> use it > > - Instance of StringBuilder -> extract char[] and iterate > > - Instance of CharBuffer -> handle > > - Otherwise: Loop over charAt > > > > (the otherwise might be a tradeof between allocation and > > (not)inlined bounds checks) > > > > Alternative would be a CharSequence.fillInto(char[]) > > > > BTW wouldn't it be create if char[] implements CharSequence? > > > > Gruss > > Bernd > > -- > > http://bernd.eckenfels.net > > From Win 10 Mobile > > > > Von: Fabian Lange >
Re: java.io.Writer uses CharSequence.toString()
Once again, while I agree in some places it could have been done a bit better probably, I would say it's good to a have a look at benchmarks first. If they show there's indeed a big difference between char[] copy = new chars[charSequence.length()]; String s = charSequence.toString(); s.getChars(0, s.length, copy, 0); and char[] copy = new chars[charSequence.length()]; charSequence.getChars(0, charSequence.length(), copy, 0); it could justify an increase in complexity of CharBuffer.append or introducing a new default method (getChars/fillInto) into CharSequence. Possibly. Or maybe not. Because there might be some nontrivial effects we are completely unaware of. Btw, what do you mean by "extract char[]" from StringBuilder? Do you want StringBuilder to give away a reference to its char[] outside? If not, than what's the difference between "extract char[]" from StringBuilder and "use String" in your algorithm? The bottom line is whatever you suggest would likely need a good justification. To me it's not immediately obvious that something like this public CharBuffer append(CharSequence csq) { if (csq == null) { put("null"); } else if (csq instanceof StringBuilder) { char[] chars = new char[csq.length()]; ((StringBuilder) csq).getChars(0, csq.length(), chars, 0); put(chars); } else if (csq instanceof StringBuffer) { char[] chars = new char[csq.length()]; ((StringBuffer) csq).getChars(0, csq.length(), chars, 0); put(chars); } else if (csq instanceof CharBuffer) { CharBuffer buffer = (CharBuffer) csq; int p = buffer.position(); put(buffer); buffer.position(p); } else { for (int i = 0; i < csq.length(); i++) { put(csq.charAt(i)); } } return this; } is better than this (what's there today) public CharBuffer append(CharSequence csq) { if (csq == null) return put("null"); else return put(csq.toString()); } > On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote: > > Hello, > > Have to agree with Fabian handling CharSequences (and special case > StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the > same toString. I would expect it to do: > - Instamceof String -> use it > - Instance of StringBuilder -> extract char[] and iterate > - Instance of CharBuffer -> handle > - Otherwise: Loop over charAt > > (the otherwise might be a tradeof between allocation and (not)inlined bounds > checks) > > Alternative would be a CharSequence.fillInto(char[]) > > BTW wouldn't it be create if char[] implements CharSequence? > > Gruss > Bernd > -- > http://bernd.eckenfels.net > From Win 10 Mobile > > Von: Fabian Lange
AW: java.io.Writer uses CharSequence.toString()
Hello, Have to agree with Fabian handling CharSequences (and special case StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the same toString. I would expect it to do: - Instamceof String -> use it - Instance of StringBuilder -> extract char[] and iterate - Instance of CharBuffer -> handle - Otherwise: Loop over charAt (the otherwise might be a tradeof between allocation and (not)inlined bounds checks) Alternative would be a CharSequence.fillInto(char[]) BTW wouldn't it be create if char[] implements CharSequence? Gruss Bernd -- http://bernd.eckenfels.net >From Win 10 Mobile Von: Fabian Lange
Re: java.io.Writer uses CharSequence.toString()
> On 29 Jul 2016, at 09:23, Fabian Lange wrote: > > This is especially sad because after having mad the String, write() > will then copy out the chars of the string and then iterate over the > chars individually. As far as I can see no one iterates over chars individually. Instead, bulk write is invoked with the char array from the String [1]. In contrast with java.io.OutputStream, java.io.Writer encourages bulk writes, not single char writes. So the premise is that a concrete subclass provides efficient bulk write. If this doesn't hold, then one could override Writer append(CharSequence csq) in this class to behave exactly like you said. > Something an iteration over chars of the CharSequence could have > achieved much better. I'm not sure about this. But you could prove it with a set of benchmarks for concrete subclasses in the JDK. [1] http://hg.openjdk.java.net/jdk9/dev/jdk/file/9410dfad9f32/src/java.base/share/classes/java/io/Writer.java#l203
java.io.Writer uses CharSequence.toString()
Hi, this week I learned that java.io.Writer while having append(CharSequence) methods, will invoke toString() on them before actually appending. This is especially sad because after having mad the String, write() will then copy out the chars of the string and then iterate over the chars individually. Something an iteration over chars of the CharSequence could have achieved much better. Fabian