Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls
Hello, btw just a small - maybe unrelated - observation on stock Java 8u40. When benchmarking String.valueOf/Integer.toString/""+n with JMH, I noticed that the compiler-aided concatenation is the fastest, but not for all integer values. I assume this is related to the initial size of the buffer? https://gist.github.com/ecki/399136f4fd59c1d110c1 Regards, Bernd On Fri, 01 May 2015 13:19:11 -0400, Roger Riggs wrote: > Hi Aleksey, > > Is there any additional benefit by rounding up to the next multiple of 4 > or 8? That would avoid a few wasted bytes at the end of the buffer > modulo the allocation size. > > Otherwise, looks fine to me also.
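Not the JMH benchmark from the gist above, but a minimal sanity check (class name hypothetical) that the three conversion forms under comparison produce identical strings, so any timing difference is purely about the conversion path, not the result:

```java
// Hypothetical sketch: verifies String.valueOf / Integer.toString / "" + n
// all agree. The "" + n form is compiled to a StringBuilder append chain,
// which is why its performance can depend on the builder's initial capacity.
public class IntToStringForms {
    public static void main(String[] args) {
        for (int n : new int[] { 0, 7, -42, 1_000_000, Integer.MIN_VALUE }) {
            String a = String.valueOf(n);
            String b = Integer.toString(n);
            String c = "" + n; // StringBuilder-based concatenation
            if (!a.equals(b) || !b.equals(c)) {
                throw new AssertionError("mismatch for " + n);
            }
        }
        System.out.println("all forms agree"); // prints "all forms agree"
    }
}
```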
Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls
Hi Aleksey, Is there any additional benefit by rounding up to the next multiple of 4 or 8? That would avoid a few wasted bytes at the end of the buffer modulo the allocation size. Otherwise, looks fine to me also. Roger On 5/1/2015 10:09 AM, Aleksey Shipilev wrote: Anyone? -Aleksey On 04/24/2015 09:05 PM, Aleksey Shipilev wrote: Hi, This seems to be a simple one-liner fix, but the background is more complicated. See the bug: https://bugs.openjdk.java.net/browse/JDK-8076759 The bottom line is that our current resizing policy in ASB is hostile for long appends. There is a heuristic that extends the capacity to match the *exact* length of the append if doubling the array would not help. This heuristic has a nasty corner case: if there is an upcoming append after a large one, then we are guaranteed to resize again. If an upcoming append is large in itself, the resizing is inevitable even under the doubling-the-storage strategy; but if we only do a small append, then we can be smarter. After trying a few options to fix this (see below), I have settled on just adding a simple static "pad" to absorb the trivial appends after a large append: http://cr.openjdk.java.net/~shade/8076759/webrev.00/ The choice of "32" as the magic number is deliberate: arraycopy likes large power-of-two strides (and it does not like to make catch-up loops for small residuals). "16" is too small to fit the decimal representation of Long.MIN_VALUE, therefore we pick "32". There are other approaches, briefly mentioned here: http://cr.openjdk.java.net/~shade/8076759/patches.txt There is a direct correlation between the allocation pressure and test performance: http://cr.openjdk.java.net/~shade/8076759/data-perf.png http://cr.openjdk.java.net/~shade/8076759/data-foot.png Naively, one could expect that doubling the storage ("mult2") until we reach $minimalCapacity solves the problem, but it wastes too much memory, and only reaches "plus32" on power-of-two sizes.
That is also the Achilles' heel of the heuristic, because appending a power-of-two-plus-one-sized string will set us up for the original problem. This effect can be alleviated by doing the padding as well ("mult2-plus32"). Exactly the same trouble manifests on smaller strings that go through the usual double-the-storage route, and this is why the proposed patch applies the pad on the common path. I do believe the current heuristic is smart about large appends, and the mult2* strategies undo it. Therefore, I would think keeping the minimumCapacity cap is a good thing, and just adding the pad is a good solution. Thus, it is in the webrev. Thanks, -Aleksey.
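The growth policy described in the message above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the actual webrev code; the method and constant names are made up:

```java
// Rough sketch of the padded growth policy discussed in this thread.
// Doubling is tried first; the requested minimum acts as a floor; a small
// pad is added so that the trivial appends that typically follow a large
// one (e.g. appending a number or a short separator) do not resize again.
public class GrowthPolicy {
    // 32, not 16: Long.MIN_VALUE needs 20 decimal chars, and arraycopy
    // prefers large power-of-two strides.
    static final int TRIVIAL_APPEND_PAD = 32;

    static int newCapacity(int current, int minimumCapacity) {
        int doubled = (current << 1) + 2;                // usual double-the-storage step
        int candidate = Math.max(doubled, minimumCapacity);
        return candidate + TRIVIAL_APPEND_PAD;           // pad applied on the common path too
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16, 10));   // doubling wins: 34 + 32 = 66
        System.out.println(newCapacity(16, 1000)); // large append: 1000 + 32 = 1032
    }
}
```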
Unicode command-line parameters on Windows
Hi, I would like to use a Java program for Windows file associations. However, this doesn't work when the file to be opened contains non-ASCII Unicode characters in its path. There are several related issues about Windows Unicode support (e.g. JDK-4488646, JDK-4519026, JDK-4900150, JDK-6475368, JDK-6937897, JDK-8029584), some of which are resolved as "Future Project", with the last one having Fix Version 10 [1]. A while ago there was also a draft JEP about this with ID 8047097 [2]. However, the JEP is no longer available and the associated JDK issue is private. In January a code submission was proposed by Microsoft developers [3], but it's unclear from the thread what happened with the submission. From these observations, I'd guess there will be a "Windows Unicode support" project targeted for Java SE 10? Who can shed some light on the current plans for this? Will there be improvements in this area in Java SE 9? Thanks in advance, Anthony [1] https://bugs.openjdk.java.net/browse/JDK-8029584 [2] http://openjdk.java.net/jeps/8047097 [3] http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-January/031068.html
Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls
Hi Aleksey, I like this approach and to me the webrev looks good. -Ulf On 24.04.2015 at 20:05, Aleksey Shipilev wrote: Hi, This seems to be a simple one-liner fix, but the background is more complicated. See the bug: https://bugs.openjdk.java.net/browse/JDK-8076759 http://cr.openjdk.java.net/~shade/8076759/webrev.00/
Re: RFR [9] Add blocking bulk read to java.io.InputStream
Hi Chris, There is some duplication in the descriptions of the buffer contents; see below. On 5/1/2015 5:54 AM, Chris Hegarty wrote: This latest version addresses all comments so far: /** * Reads some bytes from the input stream into the given byte array. This * method blocks until {@code len} bytes of input data have been read, end * of stream is detected, or an exception is thrown. The number of bytes * actually read, possibly zero, is returned. This method does not close * the input stream. * * In the case where end of stream is reached before {@code len} bytes * have been read, then the actual number of bytes read will be returned. * When this stream reaches end of stream, further invocations of this * method will return zero. * * If {@code len} is zero, then no bytes are read and {@code 0} is * returned; otherwise, there is an attempt to read up to {@code len} bytes. * * The first byte read is stored into element {@code b[off]}, the next * one in to {@code b[off+1]}, and so on. The number of bytes read is, at * most, equal to {@code len}. Let k be the number of bytes actually * read; these bytes will be stored in elements {@code b[off]} through * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k * {@code ]} through {@code b[off+len-1]} unaffected. This section duplicates the previous sentence and the following sentence. * * In the case where {@code off > 0}, elements {@code b[0]} through * {@code b[off-1]} are unaffected. In every case, elements * {@code b[off+len]} through {@code b[b.length-1]} are unaffected. * * In every case, elements {@code b[0]} through {@code b[off-1]} and * elements {@code b[off+len]} through {@code b[b.length-1]} are unaffected. Duplicates the previous paragraph. Each section of the buffer should be described only once.
Regards, Roger * * The behavior for the case where the input stream is asynchronously * closed, or the thread interrupted during the read, is highly input * stream specific, and therefore not specified. * * If an I/O error occurs reading from the input stream, then it may do * so after some, but not all, bytes of {@code b} have been updated with * data from the input stream. Consequently the input stream and {@code b} * may be in an inconsistent state. It is strongly recommended that the * stream be promptly closed if an I/O error occurs. * * @param b the buffer into which the data is read * @param off the start offset in {@code b} at which the data is written * @param len the maximum number of bytes to read * @return the actual number of bytes read into the buffer * @throws IOException if an I/O error occurs * @throws NullPointerException if {@code b} is {@code null} * @throws IndexOutOfBoundsException If {@code off} is negative, {@code len} * is negative, or {@code len} is greater than {@code b.length - off} * * @since 1.9 */ public int readNBytes(byte[] b, int off, int len) throws IOException { Objects.requireNonNull(b); if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException(); int n = 0; while (n < len) { int count = read(b, off + n, len - n); if (count < 0) break; n += count; } return n; } -Chris. On 24/04/15 09:44, Chris Hegarty wrote: On 23 Apr 2015, at 22:24, Roger Riggs wrote: Hi Pavel, On 4/23/2015 5:12 PM, Pavel Rappo wrote: Hey Roger, 1. Good catch! This thing also applies to java.io.InputStream.read(byte[], int, int): Yes, good catch indeed. * In every case, elements b[0] through * b[off] and elements b[off+len] through * b[b.length-1] are unaffected. I suppose the javadoc for the method proposed by Chris has started its life as a copy of the javadoc of read(byte[], int, int), which was assumed to be perfectly polished. Unfortunately it was a false assumption. it happens...
many, many people have read those descriptions (or didn't because it was too obvious or thought to be redundant). I propose this small amendment. * In the case where {@code off > 0}, elements {@code b[0]} through * {@code b[off-1]} are unaffected. In every case, elements * {@code b[off+len]} through {@code b[b.length-1]} are unaffected. 2. About awkward sentences. This paragraph also has to be rephrased for the same reason: * The first byte read is stored into element {@code b[off]}, the next * one in to {@code b[off+1]}, and so on. The number of bytes read is, at * most, equal to {@code len}. Let k be the number of bytes actually * read; these bytes will be stored in elements {@code b[off]} through * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k * {@code ]} through {@code b[off+len-1]} unaffected. If k == 0 then spec claims to store values in b[off]... b[off - 1]. Reading the whole method description leads one to believe that 'k' cannot equal 0 at this point.
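The EOF semantics under discussion can be checked concretely. Below is a small standalone sketch (class name hypothetical) that wraps the loop implementation posted in this thread as a static helper and drives it with a ByteArrayInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

public class ReadNBytesDemo {
    // The loop posted earlier in the thread, lifted into a static helper
    // taking the stream as an explicit parameter.
    static int readNBytes(InputStream in, byte[] b, int off, int len)
            throws IOException {
        Objects.requireNonNull(b);
        if (off < 0 || len < 0 || len > b.length - off)
            throw new IndexOutOfBoundsException();
        int n = 0;
        while (n < len) {
            int count = in.read(b, off + n, len - n);
            if (count < 0)
                break;
            n += count;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] { 1, 2, 3 });
        byte[] buf = new byte[8];
        int first = readNBytes(in, buf, 0, 8);  // EOF before len bytes: actual count
        int second = readNBytes(in, buf, 0, 8); // already at EOF: zero
        System.out.println(first + " " + second); // prints "3 0"
    }
}
```

This matches the specified behavior: the first call returns the actual number of bytes read (3), and once end of stream is reached, further invocations return zero.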
Re: RFR (XS) 8076759: AbstractStringBuilder.append(...) should ensure enough capacity for the upcoming "trivial" append calls
Anyone? -Aleksey On 04/24/2015 09:05 PM, Aleksey Shipilev wrote: > Hi, > > This seems to be a simple one-liner fix, but the background is more > complicated. See the bug: > https://bugs.openjdk.java.net/browse/JDK-8076759 > > The bottom line is that our current resizing policy in ASB is hostile > for long appends. There is a heuristic that extends the capacity to > match the *exact* length of the append if doubling the array would not help. > > This heuristic has a nasty corner case: if there is an upcoming append > after a large one, then we are guaranteed to resize again. If an > upcoming append is large in itself, the resizing is inevitable even > under the doubling-the-storage strategy; but if we only do a small > append, then we can be smarter. > > After trying a few options to fix this (see below), I have settled on > just adding a simple static "pad" to absorb the trivial appends after a > large append: > http://cr.openjdk.java.net/~shade/8076759/webrev.00/ > > The choice of "32" as the magic number is deliberate: arraycopy likes large > power-of-two strides (and it does not like to make catch-up loops for > small residuals). "16" is too small to fit the decimal representation of > Long.MIN_VALUE, therefore we pick "32". > > There are other approaches, briefly mentioned here: > http://cr.openjdk.java.net/~shade/8076759/patches.txt > > There is a direct correlation between the allocation pressure and test > performance: > http://cr.openjdk.java.net/~shade/8076759/data-perf.png > http://cr.openjdk.java.net/~shade/8076759/data-foot.png > > Naively, one could expect that doubling the storage ("mult2") until we reach > $minimalCapacity solves the problem, but it wastes too much memory, and > only reaches "plus32" on power-of-two sizes. That is also the > Achilles' heel of the heuristic, because appending a > power-of-two-plus-one-sized string will set us up for the original > problem. This effect can be alleviated by doing the padding as well > ("mult2-plus32").
Exactly the same trouble manifests on smaller strings > that go through the usual double-the-storage route, and this is why the > proposed patch applies the pad on the common path. > > I do believe the current heuristic is smart about large appends, and > the mult2* strategies undo it. Therefore, I would think keeping the > minimumCapacity cap is a good thing, and just adding the pad is a good > solution. Thus, it is in the webrev. > > Thanks, > -Aleksey. >
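The "16 is too small" claim in the message above is easy to verify: the decimal form of Long.MIN_VALUE is 19 digits plus a sign, so a 16-char pad cannot absorb a trivial Long append, while 32 can. A quick check (class name hypothetical):

```java
// Verifies why the pad is 32 rather than 16: the longest decimal
// representation a trivial numeric append can produce is Long.MIN_VALUE,
// which needs 20 chars (19 digits plus the '-' sign).
public class PadSizeCheck {
    public static void main(String[] args) {
        String s = Long.toString(Long.MIN_VALUE); // "-9223372036854775808"
        System.out.println(s.length());           // prints 20
    }
}
```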
Re: RFR [9] Add blocking bulk read to java.io.InputStream
This latest version addresses all comments so far: /** * Reads some bytes from the input stream into the given byte array. This * method blocks until {@code len} bytes of input data have been read, end * of stream is detected, or an exception is thrown. The number of bytes * actually read, possibly zero, is returned. This method does not close * the input stream. * * In the case where end of stream is reached before {@code len} bytes * have been read, then the actual number of bytes read will be returned. * When this stream reaches end of stream, further invocations of this * method will return zero. * * If {@code len} is zero, then no bytes are read and {@code 0} is * returned; otherwise, there is an attempt to read up to {@code len} bytes. * * The first byte read is stored into element {@code b[off]}, the next * one in to {@code b[off+1]}, and so on. The number of bytes read is, at * most, equal to {@code len}. Let k be the number of bytes actually * read; these bytes will be stored in elements {@code b[off]} through * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k * {@code ]} through {@code b[off+len-1]} unaffected. * * In the case where {@code off > 0}, elements {@code b[0]} through * {@code b[off-1]} are unaffected. In every case, elements * {@code b[off+len]} through {@code b[b.length-1]} are unaffected. * * In every case, elements {@code b[0]} through {@code b[off-1]} and * elements {@code b[off+len]} through {@code b[b.length-1]} are unaffected. * * The behavior for the case where the input stream is asynchronously * closed, or the thread interrupted during the read, is highly input * stream specific, and therefore not specified. * * If an I/O error occurs reading from the input stream, then it may do * so after some, but not all, bytes of {@code b} have been updated with * data from the input stream. Consequently the input stream and {@code b} * may be in an inconsistent state. 
It is strongly recommended that the * stream be promptly closed if an I/O error occurs. * * @param b the buffer into which the data is read * @param off the start offset in {@code b} at which the data is written * @param len the maximum number of bytes to read * @return the actual number of bytes read into the buffer * @throws IOException if an I/O error occurs * @throws NullPointerException if {@code b} is {@code null} * @throws IndexOutOfBoundsException If {@code off} is negative, {@code len} * is negative, or {@code len} is greater than {@code b.length - off} * * @since 1.9 */ public int readNBytes(byte[] b, int off, int len) throws IOException { Objects.requireNonNull(b); if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException(); int n = 0; while (n < len) { int count = read(b, off + n, len - n); if (count < 0) break; n += count; } return n; } -Chris. On 24/04/15 09:44, Chris Hegarty wrote: On 23 Apr 2015, at 22:24, Roger Riggs wrote: Hi Pavel, On 4/23/2015 5:12 PM, Pavel Rappo wrote: Hey Roger, 1. Good catch! This thing also applies to java.io.InputStream.read(byte[], int, int): Yes, good catch indeed. * In every case, elements b[0] through * b[off] and elements b[off+len] through * b[b.length-1] are unaffected. I suppose the javadoc for the method proposed by Chris has started its life as a copy of the javadoc read(byte[], int, int) which was assumed to be perfectly polished. Unfortunately it was a false assumption. it happens... many many people have read those descriptions (or didn't because it was too obvious or thought to be redundant). I propose this small amendment. * In the case where {@code off > 0}, elements {@code b[0]} through * {@code b[off-1]} are unaffected. In every case, elements * {@code b[off+len]} through {@code b[b.length-1]} are unaffected. 2. About awkward sentences. 
This paragraph also has to be rephrased for the same reason: * The first byte read is stored into element {@code b[off]}, the next * one in to {@code b[off+1]}, and so on. The number of bytes read is, at * most, equal to {@code len}. Let k be the number of bytes actually * read; these bytes will be stored in elements {@code b[off]} through * {@code b[off+}k{@code -1]}, leaving elements {@code b[off+}k * {@code ]} through {@code b[off+len-1]} unaffected. If k == 0 then spec claims to store values in b[off]... b[off - 1]. Reading the whole method description leads one to believe that 'k' cannot equal 0 at this point. The previous paragraph handles the case where len is 0. The paragraph before that handles the EOF case. This paragraph implicitly assumes that k is greater than 0, “The first byte read”, and “the number of actual bytes read”, neither of which can be 0 at this point. I included below [*] the latest version of this method, including all comments
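The len == 0 case that the paragraph above defers to the preceding spec text can be checked directly against the existing read(byte[], int, int) contract, which the readNBytes loop in this thread inherits: a zero-length request returns 0 without reading any bytes, so the "Let k ..." sentence is only ever reached with k > 0. A minimal check:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ZeroLenRead {
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] { 42 });
        byte[] b = new byte[4];
        // Per the InputStream.read(byte[], int, int) contract: if len is
        // zero, no bytes are read and 0 is returned, regardless of offset.
        int n = in.read(b, 1, 0);
        System.out.println(n); // prints 0
    }
}
```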