Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls

2015-05-01 Thread Bernd Eckenfels
Hello,

btw just a small - maybe unrelated - observation on stock Java 8u40. When
benchmarking String.valueOf/Integer.toString/""+n with JMH, I noticed
that the compiler-aided concatenation is the fastest, but not for all
integer values. I assume this is related to the initial size of the
buffer?

https://gist.github.com/ecki/399136f4fd59c1d110c1
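For reference, the three variants being compared look roughly like this (a minimal sketch of the shapes under test; the actual JMH benchmark is in the gist above):

```java
// Sketch of the three integer-to-string forms from the benchmark.
// The "" + n form is compiled to a StringBuilder append chain, which is
// why buffer sizing can affect its performance.
public class ConcatForms {
    static String viaValueOf(int n)  { return String.valueOf(n); }
    static String viaToString(int n) { return Integer.toString(n); }
    static String viaConcat(int n)   { return "" + n; } // javac emits StringBuilder code

    public static void main(String[] args) {
        int n = 123456;
        System.out.println(viaValueOf(n));
        System.out.println(viaToString(n));
        System.out.println(viaConcat(n));
    }
}
```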

Regards,
Bernd

 On Fri, 01 May
2015 13:19:11 -0400, Roger Riggs wrote:

> Hi Aleksey,
> 
> Is there any additional benefit to rounding up to the next multiple of 4
> or 8? That would avoid a few wasted bytes at the end of the buffer,
> modulo the allocation size.
> 
> Otherwise, looks fine to me also.


Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls

2015-05-01 Thread Roger Riggs

Hi Aleksey,

Is there any additional benefit to rounding up to the next multiple of 4 or 8?
That would avoid a few wasted bytes at the end of the buffer, modulo the
allocation size.


Otherwise, looks fine to me also.

Roger


On 5/1/2015 10:09 AM, Aleksey Shipilev wrote:

Anyone?

-Aleksey

On 04/24/2015 09:05 PM, Aleksey Shipilev wrote:

Hi,

This seems to be a simple one-liner fix, but the background is more
complicated. See the bug:
   https://bugs.openjdk.java.net/browse/JDK-8076759

The bottom line is that our current resizing policy in ASB is hostile
to long appends. There is a heuristic that extends the capacity to
match the *exact* length of the append if doubling the array would not help.

This heuristic has a nasty corner case: if there is an upcoming append
after a large one, then we are guaranteed to resize again. If the
upcoming append is large in itself, the resizing is inevitable even
under the doubling-the-storage strategy; but if we only do a small
append, then we can be smarter.
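The corner case can be modeled outside the JDK. The following is an illustrative model, not the actual AbstractStringBuilder code: the growExact/growPadded helpers are hypothetical names standing in for the old exact-fit policy and the proposed +32 pad:

```java
// Model of the two growth policies under discussion (hypothetical helpers,
// not the real ASB internals).
public class GrowthModel {
    // Old policy: double, or fall back to the exact requested minimum.
    // Exact fit means the very next append must resize again.
    static int growExact(int oldCap, int minCap) {
        int doubled = oldCap * 2 + 2;
        return (doubled >= minCap) ? doubled : minCap;
    }

    // Proposed policy: pad the requested minimum so a few trivial appends
    // after a large one are absorbed without another resize.
    static int growPadded(int oldCap, int minCap) {
        int doubled = oldCap * 2 + 2;
        return (doubled >= minCap) ? doubled : minCap + 32;
    }

    public static void main(String[] args) {
        // A 1000-char append into a 16-char buffer:
        System.out.println(growExact(16, 1000));  // buffer is exactly full afterwards
        System.out.println(growPadded(16, 1000)); // 32 spare chars for follow-up appends
    }
}
```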

After trying a few options to fix this (see below), I have settled on
just adding a simple static "pad", to absorb the trivial appends after a
large append:
   http://cr.openjdk.java.net/~shade/8076759/webrev.00/

The choice of "32" as the magic number is deliberate: arraycopy likes large
power-of-two strides (and it does not like to run catch-up loops for
small residuals). "16" is too small to fit the decimal representation of
Long.MIN_VALUE; therefore, we pick "32".
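The Long.MIN_VALUE point is easy to check: its decimal representation is 19 digits plus a sign, so a single append(long) can need up to 20 chars, which does not fit in a 16-char pad:

```java
// Why 16 is too small for the pad: the longest decimal string a single
// append(long) can produce is Long.MIN_VALUE.
public class PadSize {
    public static void main(String[] args) {
        String s = Long.toString(Long.MIN_VALUE);
        System.out.println(s);          // -9223372036854775808
        System.out.println(s.length()); // 20: more than 16, at most 32
    }
}
```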

There are other approaches, briefly mentioned here:
   http://cr.openjdk.java.net/~shade/8076759/patches.txt

There is a direct correlation between the allocation pressure, and test
performance:
   http://cr.openjdk.java.net/~shade/8076759/data-perf.png
   http://cr.openjdk.java.net/~shade/8076759/data-foot.png

Naively, one could expect that doubling the storage ("mult2") until we reach
$minimalCapacity would solve the problem, but it wastes too much memory, and
only matches "plus32" at power-of-two sizes. That is also the
Achilles' heel of the heuristic, because appending a
power-of-two-plus-one-sized string will set us up for the original
problem. This effect can be alleviated by doing the padding as well
("mult2-plus32"). Exactly the same trouble manifests with smaller strings
that go through the usual double-the-storage route, and this is why the
proposed patch applies the pad on the common path.

I do believe the current heuristic is smart about large appends, and the
mult2* strategies undo it. Therefore, I think keeping the
minimumCapacity cap is a good thing, and just adding the pad is a good
solution. That is what is in the webrev.

Thanks,
-Aleksey.







Unicode command-line parameters on Windows

2015-05-01 Thread Anthony Vanelverdinghe

Hi

I would like to use a Java program for Windows file associations.
However, this doesn't work when the path of the file to be opened
contains non-ASCII Unicode characters.


There are several related issues about Windows Unicode support (e.g. 
JDK-4488646, JDK-4519026, JDK-4900150, JDK-6475368, JDK-6937897, 
JDK-8029584), some of which are resolved with "Future Project" and the 
last one having Fix Version 10 [1].
A while ago there was also a draft JEP about this with ID 8047097 [2].
However, the JEP is no longer available and the associated JDK issue is
private.
In January a code submission was proposed by Microsoft developers [3], 
but it's unclear from the thread what happened with the submission.
From these observations, I'd guess there will be a "Windows Unicode 
support" project targeted for Java SE 10?


Who can shed some light on the current plans for this? Will there be 
improvements in this area in Java SE 9?


Thanks in advance,
Anthony

[1] https://bugs.openjdk.java.net/browse/JDK-8029584
[2] http://openjdk.java.net/jeps/8047097
[3] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-January/031068.html
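A hypothetical diagnostic (not from the thread) for anyone reproducing this: print each command-line argument's code points in hex, to see whether non-ASCII characters survived the Windows command line or were mangled (classically replaced by '?', U+003F, when the ANSI code page is used):

```java
// Dump each argument's UTF-16 code units so mojibake or '?' substitution
// is visible even when the console cannot render the characters.
public class ArgDump {
    static String hexDump(String arg) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < arg.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) arg.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " -> " + hexDump(arg));
        }
    }
}
```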




Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls

2015-05-01 Thread Ulf Zibis

Hi Aleksey,

I like this approach and to me the webrev looks good.

-Ulf


Am 24.04.2015 um 20:05 schrieb Aleksey Shipilev:

Hi,

This seems to be a simple one-liner fix, but the background is more
complicated. See the bug:
   https://bugs.openjdk.java.net/browse/JDK-8076759

   http://cr.openjdk.java.net/~shade/8076759/webrev.00/





Re: RFR [9] Add blocking bulk read to java.io.InputStream

2015-05-01 Thread Roger Riggs

Hi Chris,

There is some duplication in the descriptions of the buffer contents; 
see below.


On 5/1/2015 5:54 AM, Chris Hegarty wrote:

This latest version addresses all comments so far:

/**
 * Reads some bytes from the input stream into the given byte array. This
 * method blocks until {@code len} bytes of input data have been read, end
 * of stream is detected, or an exception is thrown. The number of bytes
 * actually read, possibly zero, is returned. This method does not close
 * the input stream.
 *
 *  In the case where end of stream is reached before {@code len} bytes
 * have been read, then the actual number of bytes read will be returned.
 * When this stream reaches end of stream, further invocations of this
 * method will return zero.
 *
 *  If {@code len} is zero, then no bytes are read and {@code 0} is
 * returned; otherwise, there is an attempt to read up to {@code len} bytes.
 *
 *  The first byte read is stored into element {@code b[off]}, the next
 * one in to {@code b[off+1]}, and so on. The number of bytes read is, at
 * most, equal to {@code len}. Let k be the number of bytes actually
 * read; these bytes will be stored in elements {@code b[off]} through
 * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k
 * {@code ]} through {@code b[off+len-1]} unaffected.

This section duplicates the previous sentence and the following sentence.

 *
 *  In the case where {@code off > 0}, elements {@code b[0]} through
 * {@code b[off-1]} are unaffected. In every case, elements
 * {@code b[off+len]} through {@code b[b.length-1]} are unaffected.
 *
 *  In every case, elements {@code b[0]} through {@code b[off-1]} and
 * elements {@code b[off+len]} through {@code b[b.length-1]} are unaffected.

Duplicates previous paragraph.

Each section of the buffer should be described only once.

Regards, Roger


 *
 *  The behavior for the case where the input stream is asynchronously
 * closed, or the thread interrupted during the read, is highly input
 * stream specific, and therefore not specified.
 *
 *  If an I/O error occurs reading from the input stream, then it may do
 * so after some, but not all, bytes of {@code b} have been updated with
 * data from the input stream. Consequently the input stream and {@code b}
 * may be in an inconsistent state. It is strongly recommended that the
 * stream be promptly closed if an I/O error occurs.
 *
 * @param  b the buffer into which the data is read
 * @param  off the start offset in {@code b} at which the data is written
 * @param  len the maximum number of bytes to read
 * @return the actual number of bytes read into the buffer
 * @throws IOException if an I/O error occurs
 * @throws NullPointerException if {@code b} is {@code null}
 * @throws IndexOutOfBoundsException If {@code off} is negative, {@code len}
 *         is negative, or {@code len} is greater than {@code b.length - off}

 *
 * @since 1.9
 */
public int readNBytes(byte[] b, int off, int len) throws IOException {
    Objects.requireNonNull(b);
    if (off < 0 || len < 0 || len > b.length - off)
        throw new IndexOutOfBoundsException();
    int n = 0;
    while (n < len) {
        int count = read(b, off + n, len - n);
        if (count < 0)
            break;
        n += count;
    }
    return n;
}
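The proposed semantics can be demonstrated with a small runnable sketch. Since the method above is an instance method of InputStream, the demo below restates the same loop as a static helper taking the stream as a parameter, so it runs on a stock JDK; ReadNBytesDemo is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

// Standalone restatement of the proposed readNBytes loop, for illustration.
public class ReadNBytesDemo {
    static int readNBytes(InputStream in, byte[] b, int off, int len)
            throws IOException {
        Objects.requireNonNull(b);
        if (off < 0 || len < 0 || len > b.length - off)
            throw new IndexOutOfBoundsException();
        int n = 0;
        while (n < len) {
            int count = in.read(b, off + n, len - n);
            if (count < 0)  // end of stream: return what we have
                break;
            n += count;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] buf = new byte[8];
        System.out.println(readNBytes(in, buf, 1, 3)); // full request satisfied
        System.out.println(readNBytes(in, buf, 4, 4)); // EOF after 2 of 4 bytes
        System.out.println(readNBytes(in, buf, 6, 0)); // zero-length request
    }
}
```

Note the three return-value cases the javadoc describes: len bytes read, fewer than len at end of stream, and zero for a zero-length request.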

-Chris.

On 24/04/15 09:44, Chris Hegarty wrote:

On 23 Apr 2015, at 22:24, Roger Riggs  wrote:


Hi Pavel,

On 4/23/2015 5:12 PM, Pavel Rappo wrote:

Hey Roger,

1. Good catch! This thing also applies to java.io.InputStream.read(byte[], int, int):


Yes, good catch indeed.


  *  In every case, elements b[0] through
  * b[off] and elements b[off+len] through
  * b[b.length-1] are unaffected.

I suppose the javadoc for the method proposed by Chris has started its life as a
copy of the javadoc of read(byte[], int, int) which was assumed to be perfectly
polished. Unfortunately it was a false assumption.

it happens...  many many people have read those descriptions (or didn't because
it was too obvious or thought to be redundant).


I propose this small amendment.

*  In the case where {@code off > 0}, elements {@code b[0]} through
* {@code b[off-1]} are unaffected. In every case, elements
* {@code b[off+len]} through {@code b[b.length-1]} are unaffected.



2. About awkward sentences. This paragraph also has to be rephrased 
for the same reason:


  *  The first byte read is stored into element {@code 
b[off]}, the next
  * one in to {@code b[off+1]}, and so on. The number of bytes 
read is, at
  * most, equal to {@code len}. Let k be the number of 
bytes actually
  * read; these bytes will be stored in elements {@code b[off]} 
through
  * {@code b[off+}k{@code -1]}, leaving elements {@code 
b[off+}k

  * {@code ]} through {@code b[off+len-1]} unaffected.

If k == 0 then spec claims to store values in b[off]... b[off - 1].


Reading the whole method description leads me to believe that 'k' cannot
equal 0 at this point.

Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls

2015-05-01 Thread Aleksey Shipilev
Anyone?

-Aleksey

On 04/24/2015 09:05 PM, Aleksey Shipilev wrote:
> Hi,
> 
> This seems to be a simple one-liner fix, but the background is more
> complicated. See the bug:
>   https://bugs.openjdk.java.net/browse/JDK-8076759
> 
> The bottom line is that our current resizing policy in ASB is hostile
> to long appends. There is a heuristic that extends the capacity to
> match the *exact* length of the append if doubling the array would not help.
> 
> This heuristic has a nasty corner case: if there is an upcoming append
> after a large one, then we are guaranteed to resize again. If the
> upcoming append is large in itself, the resizing is inevitable even
> under the doubling-the-storage strategy; but if we only do a small
> append, then we can be smarter.
> 
> After trying a few options to fix this (see below), I have settled on
> just adding a simple static "pad", to absorb the trivial appends after a
> large append:
>   http://cr.openjdk.java.net/~shade/8076759/webrev.00/
> 
> The choice of "32" as the magic number is deliberate: arraycopy likes large
> power-of-two strides (and it does not like to run catch-up loops for
> small residuals). "16" is too small to fit the decimal representation of
> Long.MIN_VALUE; therefore, we pick "32".
> 
> There are other approaches, briefly mentioned here:
>   http://cr.openjdk.java.net/~shade/8076759/patches.txt
> 
> There is a direct correlation between the allocation pressure, and test
> performance:
>   http://cr.openjdk.java.net/~shade/8076759/data-perf.png
>   http://cr.openjdk.java.net/~shade/8076759/data-foot.png
> 
> Naively, one could expect that doubling the storage ("mult2") until we reach
> $minimalCapacity would solve the problem, but it wastes too much memory, and
> only matches "plus32" at power-of-two sizes. That is also the
> Achilles' heel of the heuristic, because appending a
> power-of-two-plus-one-sized string will set us up for the original
> problem. This effect can be alleviated by doing the padding as well
> ("mult2-plus32"). Exactly the same trouble manifests with smaller strings
> that go through the usual double-the-storage route, and this is why the
> proposed patch applies the pad on the common path.
> 
> I do believe the current heuristic is smart about large appends, and the
> mult2* strategies undo it. Therefore, I think keeping the
> minimumCapacity cap is a good thing, and just adding the pad is a good
> solution. That is what is in the webrev.
> 
> Thanks,
> -Aleksey.
> 




Re: RFR [9] Add blocking bulk read to java.io.InputStream

2015-05-01 Thread Chris Hegarty

This latest version addresses all comments so far:

/**
 * Reads some bytes from the input stream into the given byte array. This
 * method blocks until {@code len} bytes of input data have been read, end
 * of stream is detected, or an exception is thrown. The number of bytes
 * actually read, possibly zero, is returned. This method does not close
 * the input stream.
 *
 *  In the case where end of stream is reached before {@code len} bytes
 * have been read, then the actual number of bytes read will be returned.
 * When this stream reaches end of stream, further invocations of this
 * method will return zero.
 *
 *  If {@code len} is zero, then no bytes are read and {@code 0} is
 * returned; otherwise, there is an attempt to read up to {@code len} bytes.
 *
 *  The first byte read is stored into element {@code b[off]}, the next
 * one in to {@code b[off+1]}, and so on. The number of bytes read is, at
 * most, equal to {@code len}. Let k be the number of bytes actually
 * read; these bytes will be stored in elements {@code b[off]} through
 * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k
 * {@code ]} through {@code b[off+len-1]} unaffected.
 *
 *  In the case where {@code off > 0}, elements {@code b[0]} through
 * {@code b[off-1]} are unaffected. In every case, elements
 * {@code b[off+len]} through {@code b[b.length-1]} are unaffected.
 *
 *  In every case, elements {@code b[0]} through {@code b[off-1]} and
 * elements {@code b[off+len]} through {@code b[b.length-1]} are unaffected.
 *
 *  The behavior for the case where the input stream is asynchronously
 * closed, or the thread interrupted during the read, is highly input
 * stream specific, and therefore not specified.
 *
 *  If an I/O error occurs reading from the input stream, then it may do
 * so after some, but not all, bytes of {@code b} have been updated with
 * data from the input stream. Consequently the input stream and {@code b}
 * may be in an inconsistent state. It is strongly recommended that the
 * stream be promptly closed if an I/O error occurs.
 *
 * @param  b the buffer into which the data is read
 * @param  off the start offset in {@code b} at which the data is written
 * @param  len the maximum number of bytes to read
 * @return the actual number of bytes read into the buffer
 * @throws IOException if an I/O error occurs
 * @throws NullPointerException if {@code b} is {@code null}
 * @throws IndexOutOfBoundsException If {@code off} is negative, {@code len}
 *         is negative, or {@code len} is greater than {@code b.length - off}

 *
 * @since 1.9
 */
public int readNBytes(byte[] b, int off, int len) throws IOException {
    Objects.requireNonNull(b);
    if (off < 0 || len < 0 || len > b.length - off)
        throw new IndexOutOfBoundsException();
    int n = 0;
    while (n < len) {
        int count = read(b, off + n, len - n);
        if (count < 0)
            break;
        n += count;
    }
    return n;
}

-Chris.

On 24/04/15 09:44, Chris Hegarty wrote:

On 23 Apr 2015, at 22:24, Roger Riggs  wrote:


Hi Pavel,

On 4/23/2015 5:12 PM, Pavel Rappo wrote:

Hey Roger,

1. Good catch! This thing also applies to java.io.InputStream.read(byte[], int, int):


Yes, good catch indeed.


  *  In every case, elements b[0] through
  * b[off] and elements b[off+len] through
  * b[b.length-1] are unaffected.

I suppose the javadoc for the method proposed by Chris has started its life as a
copy of the javadoc read(byte[], int, int) which was assumed to be perfectly
polished. Unfortunately it was a false assumption.

it happens...  many many people have read those descriptions  (or didn't because
it was too obvious or thought to be redundant).


I propose this small amendment.

*  In the case where {@code off > 0}, elements {@code b[0]} through
* {@code b[off-1]} are unaffected. In every case, elements
* {@code b[off+len]} through {@code b[b.length-1]} are unaffected.
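The amendment above can be checked empirically against an existing InputStream: a bulk read into the middle of a prefilled buffer must leave everything outside [off, off+len) untouched. A minimal sketch (UnaffectedDemo is a hypothetical name):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;

// Verifies that read(b, off, len) only writes to b[off] .. b[off+k-1].
public class UnaffectedDemo {
    static byte[] readIntoMiddle() throws IOException {
        byte[] b = new byte[8];
        Arrays.fill(b, (byte) 9);  // sentinel value outside the read window
        ByteArrayInputStream in =
            new ByteArrayInputStream(new byte[] {1, 2, 3});
        int k = in.read(b, 2, 3);  // off=2, len=3
        System.out.println(k + " bytes read: " + Arrays.toString(b));
        return b;
    }

    public static void main(String[] args) throws IOException {
        readIntoMiddle();  // sentinels survive outside b[2..4]
    }
}
```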



2. About awkward sentences. This paragraph also has to be rephrased for the same reason:

  *  The first byte read is stored into element {@code b[off]}, the next
  * one in to {@code b[off+1]}, and so on. The number of bytes read is, at
  * most, equal to {@code len}. Let k be the number of bytes actually
  * read; these bytes will be stored in elements {@code b[off]} through
  * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k
  * {@code ]} through {@code b[off+len-1]} unaffected.

If k == 0 then spec claims to store values in b[off]... b[off - 1].


Reading the whole method description leads me to believe that 'k' cannot equal
0 at this point. The previous paragraph handles the case where len is 0. The
paragraph before that handles the EOF case. This paragraph implicitly assumes
that k is greater than 0: "The first byte read" and "the number of bytes
actually read", neither of which can be 0 at this point.

I included below [*] the latest version of this method, including all comment