Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2023-01-04 Thread Markus KARG
On Fri, 23 Dec 2022 22:28:34 GMT, Markus KARG  wrote:

> I/O had always been much slower than CPU and memory access, and thanks to 
> physical constraints, always will be.
> While CPUs can get shrinked more and more, and can hold more and more memory 
> cache on or nearby a CPU core, the distance between CPU core and I/O device 
> cannot get reduced much: It will stay "far" away.
> Due to this simple logic (and other factors), the spread between performance 
> of CPU and memory access on one hand, and performance of I/O on the other 
> hand, increases with every new CPU generation.
> As a consequence, internal adjustment factors of the JDK need to get revised 
> from time to time to ensure optimum performance and each hardware generation.
> 
> One such factor is the size of the temporary transfer buffer used internally 
> by `InputStream::transferTo`.
> Since its introduction with JDK 9 many years (hence hardware generations) 
> have passed, so it's time to check the appropriateness of that buffer's size.
> 
> Using JMH on a typical, modern cloud platform, it was proven that the current 
> 8K buffer is (much) too small on modern hardware:
> The small buffer clearly stands in the way of faster transfers.
> The ops/s of a simple `FileInputStream.transferTo(ByteArrayOutputStream)` 
> operation on JDK 21 could be doubled (!) by only doubling the buffer size 
> from 8K to 16K, which seems to be a considerable and cheap deal.
> Doubling the buffer even more shows only marginal improvements of approx. 1% 
> to 3% per duplication of size, which does not justify additional memory 
> consumption.
> 
> 
> TransferToPerformance.transferTo 8192 1048576 thrpt 25 1349.929 ± 47.057 ops/s
> TransferToPerformance.transferTo 16384 1048576 thrpt 25 2633.560 ± 93.337 
> ops/s
> TransferToPerformance.transferTo 32768 1048576 thrpt 25 2721.025 ± 89.555 
> ops/s
> TransferToPerformance.transferTo 65536 1048576 thrpt 25 2855.949 ± 96.623 
> ops/s
> TransferToPerformance.transferTo 131072 1048576 thrpt 25 2903.062 ± 40.798 
> ops/s
> 
> 
> Even on small or limited platforms, an investment of 8K additonal temporary 
> buffer is very cheap and very useful, as it doubles the performance of 
> `InputStream::transferTo`, in particular for legacy (non-NIO) applications 
> still using `FileInputStream` and `ByteArrayOutputStream`.
> I dare to say, even if not proven, that is a very considerable (possibly the 
> major) number of existing applications, as NIO was only adopted gradually by 
> programmers.
> 
> Due to the given reasons, it should be approporiate to change 
> `DEFAULT_BUFFER_SIZE` from 8192 to 16384.

Thank you for not rejecting my proposal. Indeed the starting point for this PR 
was the surprising experience with several existing applications that were 
using custom transfer loops with larger buffers being to be considerably faster 
than `transferTo` (Maven being one of them just to name some). Can I convince 
you to mark this PR as reviewed? Thanks! :-)

-

PR: https://git.openjdk.org/jdk/pull/11783


Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2023-01-03 Thread Alan Bateman
On Fri, 23 Dec 2022 22:28:34 GMT, Markus KARG  wrote:

> I/O had always been much slower than CPU and memory access, and thanks to 
> physical constraints, always will be.
> While CPUs can get shrinked more and more, and can hold more and more memory 
> cache on or nearby a CPU core, the distance between CPU core and I/O device 
> cannot get reduced much: It will stay "far" away.
> Due to this simple logic (and other factors), the spread between performance 
> of CPU and memory access on one hand, and performance of I/O on the other 
> hand, increases with every new CPU generation.
> As a consequence, internal adjustment factors of the JDK need to get revised 
> from time to time to ensure optimum performance and each hardware generation.
> 
> One such factor is the size of the temporary transfer buffer used internally 
> by `InputStream::transferTo`.
> Since its introduction with JDK 9 many years (hence hardware generations) 
> have passed, so it's time to check the appropriateness of that buffer's size.
> 
> Using JMH on a typical, modern cloud platform, it was proven that the current 
> 8K buffer is (much) too small on modern hardware:
> The small buffer clearly stands in the way of faster transfers.
> The ops/s of a simple `FileInputStream.transferTo(ByteArrayOutputStream)` 
> operation on JDK 21 could be doubled (!) by only doubling the buffer size 
> from 8K to 16K, which seems to be a considerable and cheap deal.
> Doubling the buffer even more shows only marginal improvements of approx. 1% 
> to 3% per duplication of size, which does not justify additional memory 
> consumption.
> 
> 
> TransferToPerformance.transferTo 8192 1048576 thrpt 25 1349.929 ± 47.057 ops/s
> TransferToPerformance.transferTo 16384 1048576 thrpt 25 2633.560 ± 93.337 
> ops/s
> TransferToPerformance.transferTo 32768 1048576 thrpt 25 2721.025 ± 89.555 
> ops/s
> TransferToPerformance.transferTo 65536 1048576 thrpt 25 2855.949 ± 96.623 
> ops/s
> TransferToPerformance.transferTo 131072 1048576 thrpt 25 2903.062 ± 40.798 
> ops/s
> 
> 
> Even on small or limited platforms, an investment of 8K additonal temporary 
> buffer is very cheap and very useful, as it doubles the performance of 
> `InputStream::transferTo`, in particular for legacy (non-NIO) applications 
> still using `FileInputStream` and `ByteArrayOutputStream`.
> I dare to say, even if not proven, that is a very considerable (possibly the 
> major) number of existing applications, as NIO was only adopted gradually by 
> programmers.
> 
> Due to the given reasons, it should be approporiate to change 
> `DEFAULT_BUFFER_SIZE` from 8192 to 16384.

The source input stream and target output stream can be connected to anything 
and reading/writing may involve compressed or encrypted streams. 8k has always 
been a default that is expected to be a good compromise for a wide range of 
sources and targets. It's good to have data for the file -> BAOS case for both 
cold and cached cases. No objection to bumping it to 16k as it's a temporary 
buffer so can be GC'ed when there isn't a thread blocked in transferTo (or 
readNBytes).

-

PR: https://git.openjdk.org/jdk/pull/11783


Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2023-01-02 Thread Markus KARG
On Mon, 2 Jan 2023 10:03:02 GMT, Peter Levart  wrote:

> Here, the benefit of increasing buffer from 8k to 16k gets from about 10% 
> (doing IO) up to 20% (reading from cache) increase in performance.

I think 10% to 20% is good enough as an argument to go with 16k instead of 8k.

-

PR: https://git.openjdk.org/jdk/pull/11783


Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2023-01-02 Thread Peter Levart
On Fri, 23 Dec 2022 22:28:34 GMT, Markus KARG  wrote:

> I/O had always been much slower than CPU and memory access, and thanks to 
> physical constraints, always will be.
> While CPUs can get shrinked more and more, and can hold more and more memory 
> cache on or nearby a CPU core, the distance between CPU core and I/O device 
> cannot get reduced much: It will stay "far" away.
> Due to this simple logic (and other factors), the spread between performance 
> of CPU and memory access on one hand, and performance of I/O on the other 
> hand, increases with every new CPU generation.
> As a consequence, internal adjustment factors of the JDK need to get revised 
> from time to time to ensure optimum performance and each hardware generation.
> 
> One such factor is the size of the temporary transfer buffer used internally 
> by `InputStream::transferTo`.
> Since its introduction with JDK 9 many years (hence hardware generations) 
> have passed, so it's time to check the appropriateness of that buffer's size.
> 
> Using JMH on a typical, modern cloud platform, it was proven that the current 
> 8K buffer is (much) too small on modern hardware:
> The small buffer clearly stands in the way of faster transfers.
> The ops/s of a simple `FileInputStream.transferTo(ByteArrayOutputStream)` 
> operation on JDK 21 could be doubled (!) by only doubling the buffer size 
> from 8K to 16K, which seems to be a considerable and cheap deal.
> Doubling the buffer even more shows only marginal improvements of approx. 1% 
> to 3% per duplication of size, which does not justify additional memory 
> consumption.
> 
> 
> TransferToPerformance.transferTo 8192 1048576 thrpt 25 1349.929 ± 47.057 ops/s
> TransferToPerformance.transferTo 16384 1048576 thrpt 25 2633.560 ± 93.337 
> ops/s
> TransferToPerformance.transferTo 32768 1048576 thrpt 25 2721.025 ± 89.555 
> ops/s
> TransferToPerformance.transferTo 65536 1048576 thrpt 25 2855.949 ± 96.623 
> ops/s
> TransferToPerformance.transferTo 131072 1048576 thrpt 25 2903.062 ± 40.798 
> ops/s
> 
> 
> Even on small or limited platforms, an investment of 8K additonal temporary 
> buffer is very cheap and very useful, as it doubles the performance of 
> `InputStream::transferTo`, in particular for legacy (non-NIO) applications 
> still using `FileInputStream` and `ByteArrayOutputStream`.
> I dare to say, even if not proven, that is a very considerable (possibly the 
> major) number of existing applications, as NIO was only adopted gradually by 
> programmers.
> 
> Due to the given reasons, it should be approporiate to change 
> `DEFAULT_BUFFER_SIZE` from 8192 to 16384.

Here's also results for "modern" architecture - I executed the benchmark in a 
k8s container on an Oracle cloud ARM64 virtual machine.

With dropCaches=true:
https://jmh.morethan.io/?gist=2db4c3b51073e90c1a84d7eed8e1a988

With dropCaches=false:
https://jmh.morethan.io/?gist=d7dfb5def5899af41722b2768a827006

Here, the benefit of increasing buffer from 8k to 16k gets from about 10% 
(doing IO) up to 20% (reading from cache) increase in performance.

-

PR: https://git.openjdk.org/jdk/pull/11783


Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2023-01-01 Thread Peter Levart
On Fri, 23 Dec 2022 22:28:34 GMT, Markus KARG  wrote:

> I/O had always been much slower than CPU and memory access, and thanks to 
> physical constraints, always will be.
> While CPUs can get shrinked more and more, and can hold more and more memory 
> cache on or nearby a CPU core, the distance between CPU core and I/O device 
> cannot get reduced much: It will stay "far" away.
> Due to this simple logic (and other factors), the spread between performance 
> of CPU and memory access on one hand, and performance of I/O on the other 
> hand, increases with every new CPU generation.
> As a consequence, internal adjustment factors of the JDK need to get revised 
> from time to time to ensure optimum performance and each hardware generation.
> 
> One such factor is the size of the temporary transfer buffer used internally 
> by `InputStream::transferTo`.
> Since its introduction with JDK 9 many years (hence hardware generations) 
> have passed, so it's time to check the appropriateness of that buffer's size.
> 
> Using JMH on a typical, modern cloud platform, it was proven that the current 
> 8K buffer is (much) too small on modern hardware:
> The small buffer clearly stands in the way of faster transfers.
> The ops/s of a simple `FileInputStream.transferTo(ByteArrayOutputStream)` 
> operation on JDK 21 could be doubled (!) by only doubling the buffer size 
> from 8K to 16K, which seems to be a considerable and cheap deal.
> Doubling the buffer even more shows only marginal improvements of approx. 1% 
> to 3% per duplication of size, which does not justify additional memory 
> consumption.
> 
> 
> TransferToPerformance.transferTo 8192 1048576 thrpt 25 1349.929 ± 47.057 ops/s
> TransferToPerformance.transferTo 16384 1048576 thrpt 25 2633.560 ± 93.337 
> ops/s
> TransferToPerformance.transferTo 32768 1048576 thrpt 25 2721.025 ± 89.555 
> ops/s
> TransferToPerformance.transferTo 65536 1048576 thrpt 25 2855.949 ± 96.623 
> ops/s
> TransferToPerformance.transferTo 131072 1048576 thrpt 25 2903.062 ± 40.798 
> ops/s
> 
> 
> Even on small or limited platforms, an investment of 8K additonal temporary 
> buffer is very cheap and very useful, as it doubles the performance of 
> `InputStream::transferTo`, in particular for legacy (non-NIO) applications 
> still using `FileInputStream` and `ByteArrayOutputStream`.
> I dare to say, even if not proven, that is a very considerable (possibly the 
> major) number of existing applications, as NIO was only adopted gradually by 
> programmers.
> 
> Due to the given reasons, it should be approporiate to change 
> `DEFAULT_BUFFER_SIZE` from 8192 to 16384.

Hello Markus,

Happy 2023!

I checked out your JMH code and I'm a little uncomfortable with two things:
- at least on Linux, the benchmark does not actually involve any input from the 
device (HD, SSD) at all.
- at least on Linux, the benchmark involves output to the device (HD, SSD) in a 
way that might interfere with the banchmark.

Let me explain. In the per-invocation @Setup method you create new file with 
random bytes and close it. At least on Linux this actually just writes the 
content to the write-back buffer cache. The disk is not touched yet. Writing 
happens asynchronously in the background while the benchmark is running. The 
benchmark reads from that file but this does not involve reading from the 
device at all, since the whole content is already present in buffer cache. So 
what this benchmark does is it measures overhead of JNI invocations from Java 
to native code and the overhead of system calls from native code to kernel 
which are served from buffer cache. All this happens while asynchronous writes 
to disk are taking place which might interfere with benchmark as they compete 
with benchmark for system resources.
While running your unchanged benchmark, I checked the output of `iostat 2` 
which proves above claims:


Device tpskB_read/skB_wrtn/skB_dscd/skB_read
kB_wrtnkB_dscd
nvme0n1  22.50 0.00   438.75 0.00  0
877  0


So I modified your benchmark a bit:
- I copied the InputStream.transferTo method to the benchmark itself so that 
buffer size can be passed as argument. I think this should not present any 
noticable difference since we are not measuring the quality of JIT-ed code but 
overheads of JNI and system calls which are of entirely different magnitude.
- I moved the creation of a temporary file from per-invocation to per-trial 
@Setup method and added a "sync" command at the end which flushes the data to 
the actual device and waits for flushing to complete.
- I added a "drop_caches" command just before opening the file for reading in 
the per-invocation @Setup method. This drops cached data of the file so that 
reads in the benchmark method are made from the actual device. Since the use of 
buffer cache is an important use case I added a boolean `dropCaches` parameter 
to measure both cases.

Both commands work 

Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2022-12-27 Thread Markus KARG
On Tue, 27 Dec 2022 14:55:31 GMT, Peter Levart  wrote:

> Hello Markus! Could you show the JMH code that produced the benchmark results?

The following lines make use of a custom method I have added to `InputStream` 
in a custom build of JDK 21, so JMH can control the size of the buffer. The 
test runs approx. 42 Minutes, and the result depends on the actual hardware you 
are using. The common sense of all tested hardware so far is that going from 8K 
to 16K you have a *strong* gain, but the *additional* gain is lower with 
duplicating buffer size further. You have *double* performance with the 
specific cloud VM I was using, while you only have approx. only *20%* gain on 
my personal laptop with local SSD. Anyways, you always have a *considerable* 
gain.


public class TransferToPerformance {

public static void main(final String[] args) throws IOException {
Main.main(args);
}

@State(Scope.Benchmark)
public static class Config {

@Param({ "1048576" })
public int streamSize;

@Param({ "8192", "16384", "32768", "65536", "131072" })
public int bufferSize;

public InputStream source;
public OutputStream target;
private Path path;

@Setup(Level.Invocation)
public void setUp() throws IOException {
InputStream.setBufferSize(bufferSize); // static utility method 
using custom build of JDK 21
path = Files.createTempFile("a-", ".bin");
var bytes = createRandomBytes(streamSize);
Files.deleteIfExists(this.path);
Files.write(this.path, bytes, CREATE, TRUNCATE_EXISTING, WRITE);
source = new FileInputStream(this.path.toFile());
target = new ByteArrayOutputStream();
}

@TearDown(Level.Invocation)
public void tearDown() throws IOException {
source.close();
target.close();
source = null;
target = null;
Files.deleteIfExists(this.path);
}
}

private static final Random RND = new Random(); 

private static byte[] createRandomBytes(int size) {
var bytes = new byte[size];
RND.nextBytes(bytes);
return bytes;
}

@Benchmark
public final void transferTo(Config config, Blackhole blackhole) throws 
IOException {
blackhole.consume(config.source.transferTo(config.target));
config.target.flush();
config.target.close();
}

}

-

PR: https://git.openjdk.org/jdk/pull/11783


Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2022-12-27 Thread Peter Levart
On Fri, 23 Dec 2022 22:28:34 GMT, Markus KARG  wrote:

> I/O had always been much slower than CPU and memory access, and thanks to 
> physical constraints, always will be.
> While CPUs can get shrinked more and more, and can hold more and more memory 
> cache on or nearby a CPU core, the distance between CPU core and I/O device 
> cannot get reduced much: It will stay "far" away.
> Due to this simple logic (and other factors), the spread between performance 
> of CPU and memory access on one hand, and performance of I/O on the other 
> hand, increases with every new CPU generation.
> As a consequence, internal adjustment factors of the JDK need to get revised 
> from time to time to ensure optimum performance and each hardware generation.
> 
> One such factor is the size of the temporary transfer buffer used internally 
> by `InputStream::transferTo`.
> Since its introduction with JDK 9 many years (hence hardware generations) 
> have passed, so it's time to check the appropriateness of that buffer's size.
> 
> Using JMH on a typical, modern cloud platform, it was proven that the current 
> 8K buffer is (much) too small on modern hardware:
> The small buffer clearly stands in the way of faster transfers.
> The ops/s of a simple `FileInputStream.transferTo(ByteArrayOutputStream)` 
> operation on JDK 21 could be doubled (!) by only doubling the buffer size 
> from 8K to 16K, which seems to be a considerable and cheap deal.
> Doubling the buffer even more shows only marginal improvements of approx. 1% 
> to 3% per duplication of size, which does not justify additional memory 
> consumption.
> 
> 
> TransferToPerformance.transferTo 8192 1048576 thrpt 25 1349.929 ± 47.057 ops/s
> TransferToPerformance.transferTo 16384 1048576 thrpt 25 2633.560 ± 93.337 
> ops/s
> TransferToPerformance.transferTo 32768 1048576 thrpt 25 2721.025 ± 89.555 
> ops/s
> TransferToPerformance.transferTo 65536 1048576 thrpt 25 2855.949 ± 96.623 
> ops/s
> TransferToPerformance.transferTo 131072 1048576 thrpt 25 2903.062 ± 40.798 
> ops/s
> 
> 
> Even on small or limited platforms, an investment of 8K additonal temporary 
> buffer is very cheap and very useful, as it doubles the performance of 
> `InputStream::transferTo`, in particular for legacy (non-NIO) applications 
> still using `FileInputStream` and `ByteArrayOutputStream`.
> I dare to say, even if not proven, that is a very considerable (possibly the 
> major) number of existing applications, as NIO was only adopted gradually by 
> programmers.
> 
> Due to the given reasons, it should be approporiate to change 
> `DEFAULT_BUFFER_SIZE` from 8192 to 16384.

Hello Markus!
Could you show the JMH code that produced the benchmark results?

-

PR: https://git.openjdk.org/jdk/pull/11783


Re: RFR: JDK-8299336 - InputStream::DEFAULT_BUFFER_SIZE should be 16384

2022-12-24 Thread Johannes Lichtenberger
I'm also looking forward to io_uring support on Linux kernels which support
it and the Windows equivalent :-) The first one should come with one of the
project Loom updates sometime.

Kind regards
Johannes

Markus KARG  schrieb am Sa., 24. Dez. 2022, 10:11:

> I/O had always been much slower than CPU and memory access, and thanks to
> physical constraints, always will be.
> While CPUs can get shrinked more and more, and can hold more and more
> memory cache on or nearby a CPU core, the distance between CPU core and I/O
> device cannot get reduced much: It will stay "far" away.
> Due to this simple logic (and other factors), the spread between
> performance of CPU and memory access on one hand, and performance of I/O on
> the other hand, increases with every new CPU generation.
> As a consequence, internal adjustment factors of the JDK need to get
> revised from time to time to ensure optimum performance and each hardware
> generation.
>
> One such factor is the size of the temporary transfer buffer used
> internally by `InputStream::transferTo`.
> Since its introduction with JDK 9 many years (hence hardware generations)
> have passed, so it's time to check the appropriateness of that buffer's
> size.
>
> Using JMH on a typical, modern cloud platform, it was proven that the
> current 8K buffer is (much) too small on modern hardware:
> The small buffer clearly stands in the way of faster transfers.
> The ops/s of a simple `FileInputStream.transferTo(ByteArrayOutputStream)`
> operation on JDK 21 could be doubled (!) by only doubling the buffer size
> from 8K to 16K, which seems to be a considerable and cheap deal.
> Doubling the buffer even more shows only marginal improvements of approx.
> 1% to 3% per duplication of size, which does not justify additional memory
> consumption.
>
>
> TransferToPerformance.transferTo 8192 1048576 thrpt 25 1349.929 ± 47.057
> ops/s
> TransferToPerformance.transferTo 16384 1048576 thrpt 25 2633.560 ± 93.337
> ops/s
> TransferToPerformance.transferTo 32768 1048576 thrpt 25 2721.025 ± 89.555
> ops/s
> TransferToPerformance.transferTo 65536 1048576 thrpt 25 2855.949 ± 96.623
> ops/s
> TransferToPerformance.transferTo 131072 1048576 thrpt 25 2903.062 ± 40.798
> ops/s
>
>
> Even on small or limited platforms, an investment of 8K additonal
> temporary buffer is very cheap and very useful, as it doubles the
> performance of `InputStream::transferTo`, in particular for legacy
> (non-NIO) applications still using `FileInputStream` and
> `ByteArrayOutputStream`.
> I dare to say, even if not proven, that is a very considerable (possibly
> the major) number of existing applications, as NIO was only adopted
> gradually by programmers.
>
> Due to the given reasons, it should be approporiate to change
> `DEFAULT_BUFFER_SIZE` from 8192 to 16384.
>
> -
>
> Commit messages:
>  - JDK-8299336
>
> Changes: https://git.openjdk.org/jdk/pull/11783/files
>  Webrev: https://webrevs.openjdk.org/?repo=jdk=11783=00
>   Issue: https://bugs.openjdk.org/browse/JDK-8299336
>   Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
>   Patch: https://git.openjdk.org/jdk/pull/11783.diff
>   Fetch: git fetch https://git.openjdk.org/jdk pull/11783/head:pull/11783
>
> PR: https://git.openjdk.org/jdk/pull/11783
>