On Fri, 27 Jan 2023 16:04:41 GMT, Roger Riggs <[email protected]> wrote:
>> This is the javadoc of `JavaLangAccess::newStringNoRepl`:
>>
>>
>> /**
>> * Constructs a new {@code String} by decoding the specified subarray of
>> * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
>> *
>> * The caller of this method shall relinquish and transfer the ownership of
>> * the byte array to the callee since the later will not make a copy.
>> *
>> * @param bytes the byte array source
>> * @param cs the Charset
>> * @return the newly created string
>> * @throws CharacterCodingException for malformed or unmappable bytes
>> */
>>
>>
>> The documentation states that the method should construct the string
>> directly from the given byte array, avoiding an extra array allocation.
>>
>> However, at present, `newStringNoRepl` always copies arrays for UTF-8 or
>> other ASCII compatible charsets.
>>
>> This PR fixes this problem.
>
> It seems odd that the benchmark seems slower for smaller files; can you
> suggest why that might be?
> I'd expect the size distribution for Files.readString to be biased toward the
> smaller files.
> Can you repeat the benchmark using the default file system. OS file caching
> should eliminate the disk speed effects.
@RogerRiggs
I reran the benchmark on the default file system, with test file sizes
between 0 and 32 KiB.
The throughput of reading ASCII files as UTF-8:

The throughput of reading ASCII files as GBK:

Performance improves slightly, and there is no performance degradation.
For UTF-8 and GBK files with non-ASCII characters, the throughput fluctuates by
no more than 4%.
Test code and raw results:
https://gist.github.com/Glavo/f3d2060d0bd13cd0ce2add70e6060ea0?permalink_comment_id=4451350#gistcomment-4451350
> It seems odd that the benchmark seems slower for smaller files; can you
> suggest why that might be?
The most likely reason is the cost of the newly added `if` check in
`newStringUTF8NoRepl`.
I don't think this is an important issue, because once actual I/O
operations are involved, its impact is negligible.
The main purpose of this PR is to eliminate unnecessary temporary memory
allocation, thus reducing GC pressure. The change in throughput is only a
by-product.
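
To make the idea concrete outside the JDK, here is a minimal sketch of the fast path. It is not the code in this PR: `NoCopyDecodeSketch`, `Text`, `isAscii` and `decodeUtf8NoRepl` are hypothetical names, and `Text` is only a stand-in for a string that can adopt a byte array, since the public `String` constructors always copy. The assumption is that pure ASCII input can be used as the backing storage as-is, while anything else goes through a strict decode that reports malformed bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class NoCopyDecodeSketch {

    /** Hypothetical holder: either adopts the caller's array or keeps a decoded String. */
    static final class Text {
        final byte[] storage;  // non-null only when the input array was adopted without a copy
        final String decoded;  // non-null only when the slow path ran

        Text(byte[] storage, String decoded) {
            this.storage = storage;
            this.decoded = decoded;
        }
    }

    /** Returns true if every byte is in the range 0x00..0x7F. */
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Decodes UTF-8 without replacement characters.
     * The caller gives up ownership of {@code bytes}: on the fast path
     * the array itself becomes the backing storage.
     */
    static Text decodeUtf8NoRepl(byte[] bytes) throws CharacterCodingException {
        // The extra check: one scan of the input, no allocation.
        if (isAscii(bytes)) {
            return new Text(bytes, null);
        }
        // Slow path: strict decoding that throws on malformed or unmappable input.
        String s = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        return new Text(null, s);
    }
}
```

In this sketch the check costs one pass over the input, which is the kind of fixed overhead that can show up for very small files, while larger ASCII inputs skip a temporary array of the same size.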
-------------
PR: https://git.openjdk.org/jdk/pull/12119