Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-24 Thread Lance Andersen
> Please review this PR which addresses the handling of invalid UTF-8 byte 
> sequences in the entry name of a LOC file header and a Zip file comment which 
> is returned via ZipFile::getComment.
> 
> As part of the change, `ZipFile::getComment` will now return `null` if an 
> invalid UTF-8 byte sequence is encountered while converting the byte array to 
> a String.  The CSR for this change has also been approved.
> 
> Mach5 tiers 1-3 are clean with this change.

Lance Andersen has updated the pull request incrementally with one additional 
commit since the last revision:

  Updates based on 1st round of feedback

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17995/files
  - new: https://git.openjdk.org/jdk/pull/17995/files/e76ff749..dfd49101

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17995&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17995&range=00-01

  Stats: 24 lines in 3 files changed: 1 ins; 2 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/17995.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17995/head:pull/17995

PR: https://git.openjdk.org/jdk/pull/17995


Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-24 Thread Lance Andersen
On Sat, 24 Feb 2024 18:51:22 GMT, Lance Andersen  wrote:

>> Please review this PR which addresses the handling of invalid UTF-8 byte 
>> sequences in the entry name of a LOC file header and a Zip file comment 
>> which is returned via ZipFile::getComment.
>> 
>> As part of the change, `ZipFile::getComment` will now return `null` if an 
>> invalid UTF-8 byte sequence is encountered while converting the byte array 
>> to a String.  The CSR for this change has also been approved.
>> 
>> Mach5 tiers 1-3 are clean with this change.
>
> Lance Andersen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Updates based on 1st round of feedback

Thank you for your input Eirik,  I have pushed updates to address the 
suggestions you raised

-

PR Review: https://git.openjdk.org/jdk/pull/17995#pullrequestreview-1899544786


Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-24 Thread Lance Andersen
On Sat, 24 Feb 2024 16:57:50 GMT, Eirik Bjørsnøs  wrote:

>> Lance Andersen has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Updates based on 1st round of feedback
>
> src/java.base/share/classes/java/util/zip/ZipFile.java line 331:
> 
>> 329: try {
>> 330: return res.zsrc.zc.toString(res.zsrc.comment);
>> 331: } catch(IllegalArgumentException iae) {
> 
> Suggestion:
> 
> } catch (IllegalArgumentException iae) {

fixed

> src/java.base/share/classes/java/util/zip/ZipInputStream.java line 526:
> 
>> 524: throw  (ZipException) new ZipException(
>> 525: "invalid LOC header (bad entry 
>> name)").initCause(ex);
>> 526: }
> 
> There is a lot going on here:
> 
> * The creation of a ZipEntry
> * The interpretation of the general purpose bit flag
> * The split to call either a static or non-static toString depending on the 
> flag interpretation
> * The exception handling, including the somewhat unusual initCause and cast
> 
> The comment at 517 seems even more detached from the logic it tries to 
> describe after this change.
> 
> Perhaps we could benefit from introducing a `zipCoderFromFlag(int flag)` 
> method, similar to that in ZipFile?
> 
> This would allow the section to be rewritten to something that hide the flag 
> parsing details, leaves us with a single (polymorphic) toString invocation 
> and extracts `createZipEntry` from the try/catch scope. 
> 
> Something like this: (I also changed the exception handling here, that's a 
> bit orthogonal to the main points above)
> 
> Suggestion:
> 
> String name;
> try {
> name = zipCoderForFlag(flag).toString(b, len);
> } catch (IllegalArgumentException ex) {
> ZipException e = new ZipException("invalid LOC header (bad entry 
> name)");
> e.initCause(ex);
> throw e;
> }
> ZipEntry e = createZipEntry(name);

I believe you meant `ZipCoderFromPos(int flag)`.  I don't think it is needed in 
ZipInputStream but I did move the call to createZipEntry out of the try block 
per your suggestion

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 49:
> 
>> 47:  * @summary Validate that a ZipException is thrown when opening a ZIP 
>> file via
>> 48:  * ZipFile, traversing a Zip File via ZipInputStream, with invalid UTF-8
>> 49:  * byte sequences in the name or comment fields fails with ZipException.
> 
> The leading summary sentence needs a cleanup. Here's one suggestion:
> 
> Suggestion:
> 
>  * @summary Validate that a ZipException is thrown when a ZIP file with 
>  * invalid UTF-8 byte sequences in the name or comment fields is opened via
>  * ZipFile or traversed via ZipInputStream.

Updated

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 60:
> 
>> 58: private static final byte[] INVALID_UTF8_BYTE_SEQUENCE = {(byte) 
>> 0xF0, (byte) 0xA4, (byte) 0xAD};
>> 59: // Expected error message when an invalid entry name or entry 
>> comment is
>> 60: // encountered when accessing a CEN Header
> 
> This two-line comment could benefit from an earlier line break.

The break makes sense as is in IntelliJ, so please clarify

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 64:
> 
>> 62: 
>> 63: // Expected error message when an invalid entry name is encountered 
>> when
>> 64: // accessing a LOC Header
> 
> This two-line comment could benefit from an earlier line break.

Same comment as above

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 65:
> 
>> 63: // Expected error message when an invalid entry name is encountered 
>> when
>> 64: // accessing a LOC Header
>> 65: private static final String LOC_BAD_ENTRY_NAME_OR_COMMENT = "invalid 
>> LOC header (bad entry name)";
> 
> Should this be renamed `LOC_BAD_ENTRY_NAME`?

Yep, I thought I made that change but must have just thought about it and never 
went back to it

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 128:
> 
>> 126:  * 0x93: Comment :   [ZipFileComment]
>> 127:  */
>> 128: public static byte[] VALID_ZIP = {
> 
> I see this byte array encoding of ZIP files is a pattern used across tests. 
> My preference would be to encode this in Base64 or hex, since that would make 
> the consumption by external command line tools easier and the encoded 
> representation somewhat prettier. But no big deal, this might just come down 
> to personal preference. (This isn't human readable anyhow for the normal 
> definition of "human" :)

Yes, this is a  style preference and plan to leave as is.

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 182:
> 
>> 180: /**
>> 181:  * Validate that the original Zip file can be opened via ZipFile.
>> 182:  * @throws IOException if an entry occurs
> 

Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-25 Thread Jaikiran Pai
On Sat, 24 Feb 2024 18:54:07 GMT, Lance Andersen  wrote:

>> Please review this PR which addresses the handling of invalid UTF-8 byte 
>> sequences in the entry name of a LOC file header and a Zip file comment 
>> which is returned via ZipFile::getComment.
>> 
>> As part of the change, `ZipFile::getComment` will now return `null` if an 
>> invalid UTF-8 byte sequence is encountered while converting the byte array 
>> to a String.  The CSR for this change has also been approved.
>> 
>> Mach5 tiers 1-3 are clean with this change.
>
> Lance Andersen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Updates based on 1st round of feedback

Hello Lance, the changes look fine to me. `ZipFile.java` file will need a 
copyright year update.

test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 189:

> 187: try (ZipFile zf = new ZipFile(ZIP_FILE.toFile())) {
> 188: var comment = zf.getComment();
> 189: System.out.printf("Comment= %s%n", comment);

Should we assert for the valid expected `comment` here?

-

Marked as reviewed by jpai (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17995#pullrequestreview-1899665517
PR Review Comment: https://git.openjdk.org/jdk/pull/17995#discussion_r1501808997


Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-25 Thread Eirik Bjørsnøs
On Sat, 24 Feb 2024 18:54:07 GMT, Lance Andersen  wrote:

>> Please review this PR which addresses the handling of invalid UTF-8 byte 
>> sequences in the entry name of a LOC file header and a Zip file comment 
>> which is returned via ZipFile::getComment.
>> 
>> As part of the change, `ZipFile::getComment` will now return `null` if an 
>> invalid UTF-8 byte sequence is encountered while converting the byte array 
>> to a String.  The CSR for this change has also been approved.
>> 
>> Mach5 tiers 1-3 are clean with this change.
>
> Lance Andersen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Updates based on 1st round of feedback

src/java.base/share/classes/java/util/zip/ZipInputStream.java line 524:

> 522: : zc.toString(b, len);
> 523: } catch(Exception ex) {
> 524: throw  (ZipException) new ZipException(

Whitespace nit:

Suggestion:

} catch (Exception ex) {
throw (ZipException) new ZipException(

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17995#discussion_r1501809393


Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-25 Thread Lance Andersen
On Sun, 25 Feb 2024 12:16:14 GMT, Jaikiran Pai  wrote:

> Hello Lance, the changes look fine to me. `ZipFile.java` file will need a 
> copyright year update.

Thank you Jai, updated

> test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 189:
> 
>> 187: try (ZipFile zf = new ZipFile(ZIP_FILE.toFile())) {
>> 188: var comment = zf.getComment();
>> 189: System.out.printf("Comment= %s%n", comment);
> 
> Should we assert for the valid expected `comment` here?

Sure I can add the validation

-

PR Comment: https://git.openjdk.org/jdk/pull/17995#issuecomment-1962954345
PR Review Comment: https://git.openjdk.org/jdk/pull/17995#discussion_r1501822988


Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-25 Thread Lance Andersen
On Sat, 24 Feb 2024 18:54:07 GMT, Lance Andersen  wrote:

>> Please review this PR which addresses the handling of invalid UTF-8 byte 
>> sequences in the entry name of a LOC file header and a Zip file comment 
>> which is returned via ZipFile::getComment.
>> 
>> As part of the change, `ZipFile::getComment` will now return `null` if an 
>> invalid UTF-8 byte sequence is encountered while converting the byte array 
>> to a String.  The CSR for this change has also been approved.
>> 
>> Mach5 tiers 1-3 are clean with this change.
>
> Lance Andersen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Updates based on 1st round of feedback

Thanks for the feedback.  Updated accordingly

-

PR Review: https://git.openjdk.org/jdk/pull/17995#pullrequestreview-1899679545


Re: RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment [v2]

2024-02-25 Thread Lance Andersen
On Sun, 25 Feb 2024 12:20:36 GMT, Eirik Bjørsnøs  wrote:

>> Lance Andersen has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Updates based on 1st round of feedback
>
> src/java.base/share/classes/java/util/zip/ZipInputStream.java line 524:
> 
>> 522: : zc.toString(b, len);
>> 523: } catch(Exception ex) {
>> 524: throw  (ZipException) new ZipException(
> 
> Whitespace nit:
> 
> Suggestion:
> 
> } catch (Exception ex) {
> throw (ZipException) new ZipException(

fixed

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17995#discussion_r1501822894