Hi Claes, The latest version looks good.
Thank you for the patch. Best Lance > On Apr 22, 2020, at 6:26 AM, Claes Redestad <[email protected]> wrote: > > Hi, > > new webrev based on discussions here and offline: > > http://cr.openjdk.java.net/~redestad/8243254/open.01/ > > - creates a new testng test TestZipFileEncodings, which derives from > TestZipFile rather than repurposing that existing test. There is > definitely some overlap, but since TestZipFile is only run manually > and this new test runs fine (and quickly) in our testing I think > that's fine > > - only implements the optimization for UTF-8, removing the concept > of ASCII-compatible ZipCoders. > > - a few other cleanups > > Testing: tier1+2 > > Thanks! > > /Claes > > On 2020-04-21 15:52, Claes Redestad wrote: >> Hi, >> ZipFile.getEntry does optimizations to check for directory entries by >> adding a '/' to the encoded byte array. JDK-8242959 improved on this >> optimization, but a question was raised Jason Zaugg on whether the >> optimization is always valid[1]. Turns out it isn't, but only for >> non-ASCII compatible charsets. >> While JarFiles are always assumed to be UTF-8-encoded or compatible, >> ZipFiles allow arbitrary encoding. E.g., UTF-16 would encode '/' (2F) as >> either 2F 00 or 00 2F, which means the hash code would differ and a >> directory "foo/" potentially not be found when looking up "foo". Further >> complications arise when/if the directory name ends with a code point >> that might be encoded so that the final byte is 2F, e.g. \u012F. >> This patch only enables this low-level optimization when the charset >> encoding used is known to be ASCII compatible in the sense that 2F will >> be encoded as single-byte 2F. UTF-8 is compatible in this sense - and >> since this is the charset exclusively used by JarFile these changes have >> little to no effect on startup performance in the cases we've been >> looking at. >> Webrev: http://cr.openjdk.java.net/~redestad/8243254/open.00/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8243254 >> I've partially repurposed ZipFile/TestZipFile to test some such corner >> cases. Mainly expanded it to test ZipFiles created with various non- >> standard encodings, but also slimmed it down so that it can actually run >> quickly and reliably - as well as enabled it in regular testing. The >> updated test fails both before and after JDK-8242959, but passes with >> the proposed changes. >> Testing: tier1+2 >> Thanks! >> /Claes >> [1] https://twitter.com/retronym/status/1252134723431747584?s=20 <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 [email protected] <mailto:[email protected]>
