Re: RFR: 8231717: Improve performance of EBCDIC charset decoding for COMPACT_STRINGS

Roger Riggs Fri, 04 Oct 2019 12:57:40 -0700

Hi Andrew,

That does seem more straightforward.


The name ALWAYSCOMPACTABLE doesn't really match the rest of the
nomenclature used in the compact string code.

Can you change the name to LATIN1COMPATIBLE?

Its similar to the ASCIICOMPATIBLE case and tied in to the Latin1 codingfor used

in StringCoding.

I hope another Reviewer will take a look also.

Thanks, Roger



On 10/4/19 9:05 AM, Andrew Leonard wrote:

Hi Roger,
So based on your suggestion i've removed the logic from theconstructor and it's now pre-determined by the charset generatortooling, just like isASCIICompatible is... It works really nicely, sothere is now no impact on the constructor. I have verified this usingthe StrCodingBenchmark test. I have also integrated the new methodinto ArrayDecoder as you said, which makes it neater.
Please review the new webrev here:http://cr.openjdk.java.net/~aleonard/8231717/webrev.01/
Many thanks
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
internet email: andrew_m_leon...@uk.ibm.com




From: Roger Riggs <roger.ri...@oracle.com>
To: Andrew Leonard <andrew_m_leon...@uk.ibm.com>
Cc: core-libs-dev@openjdk.java.net
Date: 02/10/2019 19:27
Subject: Re: RFR: 8231717: Improve performance of EBCDIC charsetdecoding for COMPACT_STRINGS
------------------------------------------------------------------------



Hi Andrew,


On 10/2/19 12:54 PM, Andrew Leonard wrote:
Hi Roger,
Thank you for the feedback.
So your point about constructor performance is a good one, in myperformance tests in which I have run both the StrCodingBenchmark.javatest and also some simple standalone tests, I did not see anynoticeable degradation for charsets that are not alwaysCompactable(),but I will look closer at that one. It has made me think though that Ican do the isAlwaysCompactable determination at static init time,since the table is obviously static...I will investigate thatoptimization.
The change is actually not just EBCDIC, as I point out in the bug it'sany SingleByte charset that is determined as "alwaysCompactable", Ispecified EBCDIC in the bug title as it is actually just about onlythe EBCDIC charsets that fall into this category. I can rename the"title" maybe?
So I think the name should be closer to everything that can bedecoded/mapped to Latin1 or 8-bit clean.
The "negatives" check for isASCIICompatible is done only on the inputbytes when the given charset "isASCIICompatible" as you said. Thischange is not related to hasNegatives() so not sure I quite understandthe point about moving it? hasNegatives() is relavent to the inputbyte[] being decoded as if it contains negatives it will need completemapping even for an isASCIICompatible charset.
ok
isAlwaysCompactable is about the charset in general, and can (and Ithink should) be computed at static init time, as it's simplydetermined by the fact that the static b2c[] table maps only to values0->0xff.
EBCDIC codepages are not ASCIICompatible as they don't map directly,so you cannot combine the two, they are independent scenario fastpaths.
rigth, there is a mapping needed.
If I add a default impl for decodeToLatin1() I could add the methodsto ArrayDecoder, I will look at that. I think I did it that way as Ididn't want to affect anyone that may have implemented ArrayDecoderseparately...
I was also thinking of ISO_8859_1, that doesn't need a mapping,
and trying to avoid a multiple separate decisions and data paths.
(while keeping the code simple)

Thanks, Roger
Thanks for the review, I will look at the static init ofalwaysCompactable, and move the methods to ArrayDecoder, and do acloser check on constructor performance at the end of that.
Cheers
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
internet email: _andrew_m_leon...@uk.ibm.com_<mailto:andrew_m_leon...@uk.ibm.com>
From: Roger Riggs _<roger.ri...@oracle.com>_<mailto:roger.ri...@oracle.com>To: _core-libs-...@openjdk.java.net_<mailto:core-libs-dev@openjdk.java.net>
Date: 02/10/2019 14:52
Subject: Re: RFR: 8231717: Improve performance of EBCDIC charsetdecoding for COMPACT_STRINGSSent by: "core-libs-dev" _<core-libs-dev-boun...@openjdk.java.net>_<mailto:core-libs-dev-boun...@openjdk.java.net>
------------------------------------------------------------------------



Hi Andrew,

This would seem to have an impact on the performance of every Decoder
constructor
because it determines dynamically the isAlwaysCompactable attribute.
Are there regressions in the non-EBCDIC cases?

Did your performance tests include the constructor overhead?

The isASCIICompatible case is handled by checking for negatives only
when needed.

Unless the information is used more than once it looks like you've moved
the computation
from StringCoding hasNegatives to the Decoder constructor.

Is there a change that would benefit both the isASCIICompatible case and
EBCDIC?

Is there a reason you didn't add the isAlwaysCompactable method to the
existing ArrayDecoder interface.
I don't think there there is a need for a new interface.

Thanks, Roger

On 10/2/19 4:10 AM, Andrew Leonard wrote:
> Hi,
> Please can I request a review of this performance enhancement for EBCDIC
> (and any SingleByte, always compactable) charsets? I've explained the
> theory in the bug(_https://bugs.openjdk.java.net/browse/JDK-8231717_), but> essentially it optimizes any SingleByte charset that is alwayscompactable
> due to all mappings being to <=0xff and avoids unnecessary char[] to
> internal Latin1 byte[] arraycopy as a result. This leads to up to a 100%
> performance gain for decoding these charsets.
> I have run the complete tier1 and also the complete sun/nio/cs testcases
> successfully.
>
> Webrev: _http://cr.openjdk.java.net/~aleonard/8231717/webrev.00/_
>
> Thoughts and comments welcome please?
> Thanks
> Andrew
>
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> internet email: _andrew_m_leon...@uk.ibm.com_<mailto:andrew_m_leon...@uk.ibm.com>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, HampshirePO6 3AU
>





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales withnumber 741598.Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO63AU
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales withnumber 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: RFR: 8231717: Improve performance of EBCDIC charset decoding for COMPACT_STRINGS

Reply via email to