Re: java.io.Writer uses CharSequence.toString()

2016-07-30 Thread Pavel Rappo
Could you please prototype what you suggest in code so we could discuss it more
constructively? Otherwise I feel this discussion is getting too broad and as
such may not achieve anything in particular.

Thanks,
-Pavel

> On 30 Jul 2016, at 10:03, Fabian Lange  wrote:
> 
> Hi,
> so why did you guys invent CharSequence as an API if it cannot be used.
> I kind of understand why people use unsafe and come up with their own
> character data implementations.
> 
> So you prefer multi megabyte string allocations including their arraycopy,
> (java.lang.StringBuilder.toString())
> 
>public String toString() {
>// Create a copy, don't share the array
>return new String(value, 0, count);
>}
> 
> which then will arraycopy again this multimegabyte char array
> java.io.Writer.write(String, int, int)
> 
>} else {// Don't permanently allocate very large buffers.
>cbuf = new char[len];
>}
>str.getChars(off, (off + len), cbuf, 0);
>write(cbuf, 0, len);
> 
> over an Implementation which doesn't do that?
> 
> Fabian
> 
> On Sat, Jul 30, 2016 at 1:23 AM, Brent Christian
>  wrote:
>> Hi,
>> 
>> This idea has been brought up before [1].
>> 
>> I concur with Pavel's assessment.  I would add that now that latin-1 Strings
>> are stored in a more compact form in JDK 9 ("Compact Strings" [2]), the
>> performance profile of string data is further complicated.
>> 
>> Thanks,
>> -Brent
>> 
>> 1. https://bugs.openjdk.java.net/browse/JDK-6206838
>> 2. https://bugs.openjdk.java.net/browse/JDK-8054307
>> 
>> On 07/29/2016 10:21 AM, Pavel Rappo wrote:
>>> 
>>> Once again, while I agree in some places it could have been done a bit
>>> better
>>> probably, I would say it's good to a have a look at benchmarks first.
>>> 
>>> If they show there's indeed a big difference between
>>> 
>>>char[] copy = new chars[charSequence.length()];
>>>String s = charSequence.toString();
>>>s.getChars(0, s.length, copy, 0);
>>> 
>>> and
>>> 
>>>char[] copy = new chars[charSequence.length()];
>>>charSequence.getChars(0, charSequence.length(), copy, 0);
>>> 
>>> it could justify an increase in complexity of CharBuffer.append or
>>> introducing a
>>> new default method (getChars/fillInto) into CharSequence. Possibly. Or
>>> maybe
>>> not. Because there might be some nontrivial effects we are completely
>>> unaware of.
>>> 
>>> Btw, what do you mean by "extract char[]" from StringBuilder? Do you want
>>> StringBuilder to give away a reference to its char[] outside? If not, than
>>> what's the difference between "extract char[]" from StringBuilder and "use
>>> String" in your algorithm?
>>> 
>>> The bottom line is whatever you suggest would likely need a good
>>> justification.
>>> To me it's not immediately obvious that something like this
>>> 
>>> public CharBuffer append(CharSequence csq) {
>>> if (csq == null) {
>>> put("null");
>>> } else if (csq instanceof StringBuilder) {
>>> char[] chars = new char[csq.length()];
>>> ((StringBuilder) csq).getChars(0, csq.length(), chars, 0);
>>> put(chars);
>>> } else if (csq instanceof StringBuffer) {
>>> char[] chars = new char[csq.length()];
>>> ((StringBuffer) csq).getChars(0, csq.length(), chars, 0);
>>> put(chars);
>>> } else if (csq instanceof CharBuffer) {
>>> CharBuffer buffer = (CharBuffer) csq;
>>> int p = buffer.position();
>>> put(buffer);
>>> buffer.position(p);
>>> } else {
>>> for (int i = 0; i < csq.length(); i++) {
>>> put(csq.charAt(i));
>>> }
>>> }
>>> return this;
>>> }
>>> 
>>> is better than this (what's there today)
>>> 
>>> public CharBuffer append(CharSequence csq) {
>>> if (csq == null)
>>> return put("null");
>>> else
>>> return put(csq.toString());
>>> }
>>> 
 On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote:
 
 Hello,
 
 Have to agree with Fabian handling CharSequences (and special case
 StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see
 the same toString. I would expect it to do:
 - Instamceof String -> use it
 - Instance of StringBuilder -> extract char[] and iterate
 - Instance of CharBuffer -> handle
 - Otherwise: Loop over charAt
 
 (the otherwise might be a tradeof between allocation and (not)inlined
 bounds checks)
 
 Alternative would be a CharSequence.fillInto(char[])
 
 BTW wouldn't it be create if char[] implements CharSequence?
 
 Gruss
 Bernd
 --
 http://bernd.eckenfels.net
 From Win 10 Mobile
 
 Von: Fabian Lange
>>> 
>>> 
>> 



Re: java.io.Writer uses CharSequence.toString()

2016-07-30 Thread Fabian Lange
Hi,
so why did you guys invent CharSequence as an API if it cannot be used.
I kind of understand why people use unsafe and come up with their own
character data implementations.

So you prefer multi megabyte string allocations including their arraycopy,
(java.lang.StringBuilder.toString())

public String toString() {
// Create a copy, don't share the array
return new String(value, 0, count);
}

which then will arraycopy again this multimegabyte char array
java.io.Writer.write(String, int, int)

} else {// Don't permanently allocate very large buffers.
cbuf = new char[len];
}
str.getChars(off, (off + len), cbuf, 0);
write(cbuf, 0, len);

over an Implementation which doesn't do that?

Fabian

On Sat, Jul 30, 2016 at 1:23 AM, Brent Christian
 wrote:
> Hi,
>
> This idea has been brought up before [1].
>
> I concur with Pavel's assessment.  I would add that now that latin-1 Strings
> are stored in a more compact form in JDK 9 ("Compact Strings" [2]), the
> performance profile of string data is further complicated.
>
> Thanks,
> -Brent
>
> 1. https://bugs.openjdk.java.net/browse/JDK-6206838
> 2. https://bugs.openjdk.java.net/browse/JDK-8054307
>
> On 07/29/2016 10:21 AM, Pavel Rappo wrote:
>>
>> Once again, while I agree in some places it could have been done a bit
>> better
>> probably, I would say it's good to a have a look at benchmarks first.
>>
>> If they show there's indeed a big difference between
>>
>> char[] copy = new chars[charSequence.length()];
>> String s = charSequence.toString();
>> s.getChars(0, s.length, copy, 0);
>>
>> and
>>
>> char[] copy = new chars[charSequence.length()];
>> charSequence.getChars(0, charSequence.length(), copy, 0);
>>
>> it could justify an increase in complexity of CharBuffer.append or
>> introducing a
>> new default method (getChars/fillInto) into CharSequence. Possibly. Or
>> maybe
>> not. Because there might be some nontrivial effects we are completely
>> unaware of.
>>
>> Btw, what do you mean by "extract char[]" from StringBuilder? Do you want
>> StringBuilder to give away a reference to its char[] outside? If not, than
>> what's the difference between "extract char[]" from StringBuilder and "use
>> String" in your algorithm?
>>
>> The bottom line is whatever you suggest would likely need a good
>> justification.
>> To me it's not immediately obvious that something like this
>>
>>  public CharBuffer append(CharSequence csq) {
>>  if (csq == null) {
>>  put("null");
>>  } else if (csq instanceof StringBuilder) {
>>  char[] chars = new char[csq.length()];
>>  ((StringBuilder) csq).getChars(0, csq.length(), chars, 0);
>>  put(chars);
>>  } else if (csq instanceof StringBuffer) {
>>  char[] chars = new char[csq.length()];
>>  ((StringBuffer) csq).getChars(0, csq.length(), chars, 0);
>>  put(chars);
>>  } else if (csq instanceof CharBuffer) {
>>  CharBuffer buffer = (CharBuffer) csq;
>>  int p = buffer.position();
>>  put(buffer);
>>  buffer.position(p);
>>  } else {
>>  for (int i = 0; i < csq.length(); i++) {
>>  put(csq.charAt(i));
>>  }
>>  }
>>  return this;
>>  }
>>
>> is better than this (what's there today)
>>
>>  public CharBuffer append(CharSequence csq) {
>>  if (csq == null)
>>  return put("null");
>>  else
>>  return put(csq.toString());
>>  }
>>
>>> On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote:
>>>
>>> Hello,
>>>
>>> Have to agree with Fabian handling CharSequences (and special case
>>> StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see
>>> the same toString. I would expect it to do:
>>> - Instamceof String -> use it
>>> - Instance of StringBuilder -> extract char[] and iterate
>>> - Instance of CharBuffer -> handle
>>> - Otherwise: Loop over charAt
>>>
>>> (the otherwise might be a tradeof between allocation and (not)inlined
>>> bounds checks)
>>>
>>> Alternative would be a CharSequence.fillInto(char[])
>>>
>>> BTW wouldn't it be create if char[] implements CharSequence?
>>>
>>> Gruss
>>> Bernd
>>> --
>>> http://bernd.eckenfels.net
>>>  From Win 10 Mobile
>>>
>>> Von: Fabian Lange
>>
>>
>


Re: java.io.Writer uses CharSequence.toString()

2016-07-29 Thread Brent Christian

Hi,

This idea has been brought up before [1].

I concur with Pavel's assessment.  I would add that now that latin-1 
Strings are stored in a more compact form in JDK 9 ("Compact Strings" 
[2]), the performance profile of string data is further complicated.


Thanks,
-Brent

1. https://bugs.openjdk.java.net/browse/JDK-6206838
2. https://bugs.openjdk.java.net/browse/JDK-8054307
On 07/29/2016 10:21 AM, Pavel Rappo wrote:

Once again, while I agree in some places it could have been done a bit better
probably, I would say it's good to a have a look at benchmarks first.

If they show there's indeed a big difference between

char[] copy = new chars[charSequence.length()];
String s = charSequence.toString();
s.getChars(0, s.length, copy, 0);

and

char[] copy = new chars[charSequence.length()];
charSequence.getChars(0, charSequence.length(), copy, 0);

it could justify an increase in complexity of CharBuffer.append or introducing a
new default method (getChars/fillInto) into CharSequence. Possibly. Or maybe
not. Because there might be some nontrivial effects we are completely unaware 
of.

Btw, what do you mean by "extract char[]" from StringBuilder? Do you want
StringBuilder to give away a reference to its char[] outside? If not, than
what's the difference between "extract char[]" from StringBuilder and "use
String" in your algorithm?

The bottom line is whatever you suggest would likely need a good justification.
To me it's not immediately obvious that something like this

 public CharBuffer append(CharSequence csq) {
 if (csq == null) {
 put("null");
 } else if (csq instanceof StringBuilder) {
 char[] chars = new char[csq.length()];
 ((StringBuilder) csq).getChars(0, csq.length(), chars, 0);
 put(chars);
 } else if (csq instanceof StringBuffer) {
 char[] chars = new char[csq.length()];
 ((StringBuffer) csq).getChars(0, csq.length(), chars, 0);
 put(chars);
 } else if (csq instanceof CharBuffer) {
 CharBuffer buffer = (CharBuffer) csq;
 int p = buffer.position();
 put(buffer);
 buffer.position(p);
 } else {
 for (int i = 0; i < csq.length(); i++) {
 put(csq.charAt(i));
 }
 }
 return this;
 }

is better than this (what's there today)

 public CharBuffer append(CharSequence csq) {
 if (csq == null)
 return put("null");
 else
 return put(csq.toString());
 }


On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote:

Hello,

Have to agree with Fabian handling CharSequences (and special case 
StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the 
same toString. I would expect it to do:
- Instamceof String -> use it
- Instance of StringBuilder -> extract char[] and iterate
- Instance of CharBuffer -> handle
- Otherwise: Loop over charAt

(the otherwise might be a tradeof between allocation and (not)inlined bounds 
checks)

Alternative would be a CharSequence.fillInto(char[])

BTW wouldn't it be create if char[] implements CharSequence?

Gruss
Bernd
--
http://bernd.eckenfels.net
 From Win 10 Mobile

Von: Fabian Lange






Re: java.io.Writer uses CharSequence.toString()

2016-07-29 Thread Bernd Eckenfels
Hello,

yes I agree that has to be benchmarked. (And probably greatly varries
with the input length as well).


But besides the performance aspect I wanted to mention something else.
I had a password hashing API and wanted to switch from the typical
overwriteable char[] signature to a CharSequence. Because if the
password is only existing as a String there is no help in copying it to
a char[]. And depending on the source CharSequence can deal with all of
them. But then I noticed its quite hard to deal with CharSequences
where no intermediate Strings are constructed.

You asked about the "extract", I was inspired by Peters hint to
extractChar in StringUtil of OpenHFT. It is a scary magic in an
external API but it could be a internal shortcut within the JDK. I
would actally expect encoders to have the same shortcut for strings...

https://github.com/OpenHFT/Chronicle-Core/blob/master/src/main/java/net/openhft/chronicle/core/util/StringUtils.java

Gruss
Bernd


 Am Fri, 29 Jul 2016 18:21:37 +0100
schrieb Pavel Rappo :

> Once again, while I agree in some places it could have been done a
> bit better probably, I would say it's good to a have a look at
> benchmarks first.
> 
> If they show there's indeed a big difference between
> 
>char[] copy = new chars[charSequence.length()];
>String s = charSequence.toString();
>s.getChars(0, s.length, copy, 0);
> 
> and
> 
>char[] copy = new chars[charSequence.length()];
>charSequence.getChars(0, charSequence.length(), copy, 0);
> 
> it could justify an increase in complexity of CharBuffer.append or
> introducing a new default method (getChars/fillInto) into
> CharSequence. Possibly. Or maybe not. Because there might be some
> nontrivial effects we are completely unaware of.
> 
> Btw, what do you mean by "extract char[]" from StringBuilder? Do you
> want StringBuilder to give away a reference to its char[] outside? If
> not, than what's the difference between "extract char[]" from
> StringBuilder and "use String" in your algorithm?
> 
> The bottom line is whatever you suggest would likely need a good
> justification. To me it's not immediately obvious that something like
> this
> 
> public CharBuffer append(CharSequence csq) {
> if (csq == null) {
> put("null");
> } else if (csq instanceof StringBuilder) {
> char[] chars = new char[csq.length()];
> ((StringBuilder) csq).getChars(0, csq.length(), chars, 0);
> put(chars);
> } else if (csq instanceof StringBuffer) {
> char[] chars = new char[csq.length()];
> ((StringBuffer) csq).getChars(0, csq.length(), chars, 0);
> put(chars);
> } else if (csq instanceof CharBuffer) {
> CharBuffer buffer = (CharBuffer) csq;
> int p = buffer.position();
> put(buffer);
> buffer.position(p);
> } else {
> for (int i = 0; i < csq.length(); i++) {
> put(csq.charAt(i));
> }
> }
> return this;
> }
> 
> is better than this (what's there today)
> 
> public CharBuffer append(CharSequence csq) {
> if (csq == null)
> return put("null");
> else
> return put(csq.toString());
> }
> 
> > On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote:
> > 
> > Hello,
> > 
> > Have to agree with Fabian handling CharSequences (and special case
> > StringBuilder) is pretty weak, in CharBuffer.append(CharSequence)
> > you see the same toString. I would expect it to do:
> > - Instamceof String -> use it
> > - Instance of StringBuilder -> extract char[] and iterate
> > - Instance of CharBuffer -> handle
> > - Otherwise: Loop over charAt
> > 
> > (the otherwise might be a tradeof between allocation and
> > (not)inlined bounds checks) 
> > 
> > Alternative would be a CharSequence.fillInto(char[])
> > 
> > BTW wouldn't it be create if char[] implements CharSequence?
> > 
> > Gruss
> > Bernd
> > -- 
> > http://bernd.eckenfels.net
> > From Win 10 Mobile
> > 
> > Von: Fabian Lange
> 



Re: java.io.Writer uses CharSequence.toString()

2016-07-29 Thread Pavel Rappo
Once again, while I agree in some places it could have been done a bit better
probably, I would say it's good to a have a look at benchmarks first.

If they show there's indeed a big difference between

   char[] copy = new chars[charSequence.length()];
   String s = charSequence.toString();
   s.getChars(0, s.length, copy, 0);

and

   char[] copy = new chars[charSequence.length()];
   charSequence.getChars(0, charSequence.length(), copy, 0);

it could justify an increase in complexity of CharBuffer.append or introducing a
new default method (getChars/fillInto) into CharSequence. Possibly. Or maybe
not. Because there might be some nontrivial effects we are completely unaware 
of.

Btw, what do you mean by "extract char[]" from StringBuilder? Do you want
StringBuilder to give away a reference to its char[] outside? If not, than
what's the difference between "extract char[]" from StringBuilder and "use
String" in your algorithm?

The bottom line is whatever you suggest would likely need a good justification.
To me it's not immediately obvious that something like this

public CharBuffer append(CharSequence csq) {
if (csq == null) {
put("null");
} else if (csq instanceof StringBuilder) {
char[] chars = new char[csq.length()];
((StringBuilder) csq).getChars(0, csq.length(), chars, 0);
put(chars);
} else if (csq instanceof StringBuffer) {
char[] chars = new char[csq.length()];
((StringBuffer) csq).getChars(0, csq.length(), chars, 0);
put(chars);
} else if (csq instanceof CharBuffer) {
CharBuffer buffer = (CharBuffer) csq;
int p = buffer.position();
put(buffer);
buffer.position(p);
} else {
for (int i = 0; i < csq.length(); i++) {
put(csq.charAt(i));
}
}
return this;
}

is better than this (what's there today)

public CharBuffer append(CharSequence csq) {
if (csq == null)
return put("null");
else
return put(csq.toString());
}

> On 29 Jul 2016, at 15:12, e...@zusammenkunft.net wrote:
> 
> Hello,
> 
> Have to agree with Fabian handling CharSequences (and special case 
> StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the 
> same toString. I would expect it to do:
> - Instamceof String -> use it
> - Instance of StringBuilder -> extract char[] and iterate
> - Instance of CharBuffer -> handle
> - Otherwise: Loop over charAt
> 
> (the otherwise might be a tradeof between allocation and (not)inlined bounds 
> checks) 
> 
> Alternative would be a CharSequence.fillInto(char[])
> 
> BTW wouldn't it be create if char[] implements CharSequence?
> 
> Gruss
> Bernd
> -- 
> http://bernd.eckenfels.net
> From Win 10 Mobile
> 
> Von: Fabian Lange



AW: java.io.Writer uses CharSequence.toString()

2016-07-29 Thread ecki
Hello,

Have to agree with Fabian handling CharSequences (and special case 
StringBuilder) is pretty weak, in CharBuffer.append(CharSequence) you see the 
same toString. I would expect it to do:
- Instamceof String -> use it
- Instance of StringBuilder -> extract char[] and iterate
- Instance of CharBuffer -> handle
- Otherwise: Loop over charAt

(the otherwise might be a tradeof between allocation and (not)inlined bounds 
checks) 

Alternative would be a CharSequence.fillInto(char[])

BTW wouldn't it be create if char[] implements CharSequence?

Gruss
Bernd
-- 
http://bernd.eckenfels.net
>From Win 10 Mobile

Von: Fabian Lange

Re: java.io.Writer uses CharSequence.toString()

2016-07-29 Thread Pavel Rappo

> On 29 Jul 2016, at 09:23, Fabian Lange  wrote:
> 
> This is especially sad because after having mad the String, write()
> will then copy out the chars of the string and then iterate over the
> chars individually.

As far as I can see no one iterates over chars individually. Instead, bulk write
is invoked with the char array from the String [1].

In contrast with java.io.OutputStream, java.io.Writer encourages bulk writes,
not single char writes. So the premise is that a concrete subclass provides
efficient bulk write. If this doesn't hold, then one could override 

Writer append(CharSequence csq)

in this class to behave exactly like you said.

> Something an iteration over chars of the CharSequence could have
> achieved much better.

I'm not sure about this. But you could prove it with a set of benchmarks for
concrete subclasses in the JDK.


[1] 
http://hg.openjdk.java.net/jdk9/dev/jdk/file/9410dfad9f32/src/java.base/share/classes/java/io/Writer.java#l203



java.io.Writer uses CharSequence.toString()

2016-07-29 Thread Fabian Lange
Hi,
this week I learned that java.io.Writer while having
append(CharSequence) methods, will invoke toString() on them before
actually appending.
This is especially sad because after having mad the String, write()
will then copy out the chars of the string and then iterate over the
chars individually.

Something an iteration over chars of the CharSequence could have
achieved much better.

Fabian