[PR] ARJ: correct byte accounting and truncation errors [commons-compress]

via GitHub Mon, 06 Oct 2025 06:41:45 -0700


ppkarwasz opened a new pull request, #723:
URL: https://github.com/apache/commons-compress/pull/723


   In the current implementation:
   
   * `getBytesRead()` could drift from the actual archive size after a full read.
   * Exceptions on truncation errors were inconsistent or missing.
   * `DataInputStream` (big-endian) forced ad-hoc helpers for ARJ’s little-endian fields.
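Here is a concrete example of the mismatch. ARJ stores its header ID as the two bytes `0x60 0xEA`, which is the little-endian encoding of `0xEA60`. `DataInputStream.readUnsignedShort()` always reads big-endian, so it returns `0x60EA`, and every such field has to be swapped by hand. The sketch below contrasts the two reads using only the JDK. `EndianDemo` and its methods are hypothetical names, and `readLittleEndianUnsignedShort` is a hand-written stand-in, not the `EndianUtils` API that the PR switches to.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo: big-endian vs little-endian reads of an ARJ header ID. */
final class EndianDemo {

    /** Reads a 16-bit unsigned little-endian value; throws EOFException on truncation. */
    static int readLittleEndianUnsignedShort(InputStream in) throws IOException {
        int b0 = in.read(); // low byte comes first in little-endian order
        int b1 = in.read(); // high byte
        if ((b0 | b1) < 0) { // either read returned -1
            throw new EOFException("Truncated 16-bit field");
        }
        return (b1 << 8) | b0;
    }

    static int readBigEndian(byte[] data) throws IOException {
        // DataInputStream always interprets multi-byte values as big-endian
        return new DataInputStream(new ByteArrayInputStream(data)).readUnsignedShort();
    }

    static int readLittleEndian(byte[] data) throws IOException {
        return readLittleEndianUnsignedShort(new ByteArrayInputStream(data));
    }

    public static void main(String[] args) throws IOException {
        byte[] headerId = {(byte) 0x60, (byte) 0xEA}; // ARJ header ID as stored on disk
        System.out.printf("big-endian:    0x%04X%n", readBigEndian(headerId));    // 0x60EA
        System.out.printf("little-endian: 0x%04X%n", readLittleEndian(headerId)); // 0xEA60
    }
}
```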
   
   This PR introduces:
   
   * **Accurate byte accounting:** count all consumed bytes across main/file headers, variable strings, CRCs, extended headers, and file data. `getBytesRead()` now matches the archive length at end-of-stream.
   * **Consistent truncation handling:**
   
     * Truncation in the **main (archive) header**, read during construction, now throws an `ArchiveException` **wrapping** an `EOFException` (cause preserved).
     * Truncation in **file headers or file data** is propagated as a plain `EOFException` from `getNextEntry()`/`read()`.
   * **Endianness refactor:** replace `DataInputStream` with `EndianUtils`, removing several bespoke helpers and making intent explicit.
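Below is a minimal sketch of the byte accounting and the truncation contract. It assumes a `FilterInputStream` that counts every byte it hands out, including skipped bytes, and a `readFully`-style helper that turns a short read into an `EOFException`. All names are hypothetical. `ArchiveHeaderException` is only a stand-in for commons-compress's `ArchiveException`, and the class hierarchy it uses here is an assumption made for the sketch. Only the main-header path wraps the `EOFException`, keeping it as the cause. File headers and data let it propagate unchanged.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of byte accounting plus the truncation contract described above. */
final class CountingSketch {

    /** Stand-in for the archive-level exception; NOT the commons-compress ArchiveException. */
    static final class ArchiveHeaderException extends IOException {
        ArchiveHeaderException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Counts every byte returned or skipped (mark/reset is not handled in this sketch). */
    static final class CountingStream extends FilterInputStream {
        private long count;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len); // also covers read(byte[]) via FilterInputStream
            if (n > 0) {
                count += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }

        long getBytesRead() {
            return count;
        }
    }

    /** Reads exactly len bytes or fails with EOFException (file headers and data path). */
    static byte[] readFully(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                throw new EOFException("Truncated: expected " + len + " bytes, got " + off);
            }
            off += n;
        }
        return buf;
    }

    /** Main-header path: a truncation is wrapped, with the EOFException kept as the cause. */
    static byte[] readMainHeader(InputStream in, int len) throws IOException {
        try {
            return readFully(in, len);
        } catch (EOFException e) {
            throw new ArchiveHeaderException("Truncated main header", e);
        }
    }
}
```

Counting at the stream level, rather than summing field sizes by hand, is one way to keep `getBytesRead()` aligned with the archive length, because every header field, string, CRC and data byte passes through the same counter.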
   
   Tests:
   
   * An assertion that `getBytesRead()` equals the archive size after full consumption.
   * Parameterized truncation tests at key boundaries (signature, basic/fixed header sizes, end of fixed/basic header, CRC, extended-header length, file data), verifying the exception contract above.
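The truncation tests can be approximated without JUnit by looping over every cut point. The sketch below uses a made-up toy layout loosely modelled on ARJ header framing, not the real ARJ header. The layout is: signature `0x60 0xEA`, a 2-byte little-endian header size, the header bytes, a 4-byte CRC placeholder, and a 2-byte extended-header length. All class and method names are hypothetical. The sketch checks both contracts. A full read consumes exactly the input length, and every strict prefix fails with `EOFException`.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical truncation test sketch over a toy, ARJ-like header layout (not real ARJ). */
final class TruncationSketch {

    /** Toy reader: tracks bytes consumed and throws EOFException on any short read. */
    static final class ToyReader {
        private final InputStream in;
        private long bytesRead;

        ToyReader(InputStream in) {
            this.in = in;
        }

        int readByte() throws IOException {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("Truncated after " + bytesRead + " bytes");
            }
            bytesRead++;
            return b;
        }

        int readUInt16LE() throws IOException {
            int lo = readByte();
            return (readByte() << 8) | lo;
        }

        /** Signature, LE size, header bytes, 4-byte CRC (not checked), LE ext-header length. */
        void readHeader() throws IOException {
            if (readByte() != 0x60 || readByte() != 0xEA) {
                throw new IOException("Bad signature");
            }
            int size = readUInt16LE();
            for (int i = 0; i < size + 4; i++) { // header bytes plus CRC
                readByte();
            }
            if (readUInt16LE() != 0) {
                throw new IOException("Extended headers not supported in this toy");
            }
        }

        long getBytesRead() {
            return bytesRead;
        }
    }

    static byte[] sampleHeader() {
        return new byte[] {
            0x60, (byte) 0xEA, // signature
            3, 0,              // header size = 3 (little-endian)
            'a', 'b', 'c',     // header bytes
            1, 2, 3, 4,        // CRC placeholder
            0, 0               // extended-header length = 0
        };
    }

    /** Returns true if every strict prefix of the input fails with EOFException. */
    static boolean everyTruncationThrowsEof(byte[] full) throws IOException {
        for (int cut = 0; cut < full.length; cut++) {
            ToyReader r = new ToyReader(new ByteArrayInputStream(Arrays.copyOf(full, cut)));
            try {
                r.readHeader();
                return false; // a truncated input must not parse successfully
            } catch (EOFException expected) {
                // expected at every boundary: signature, size, header, CRC, ext length
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] full = sampleHeader();
        ToyReader r = new ToyReader(new ByteArrayInputStream(full));
        r.readHeader();
        System.out.println(r.getBytesRead() == full.length); // true
        System.out.println(everyTruncationThrowsEof(full));   // true
    }
}
```

In a JUnit 5 suite, a `@ParameterizedTest` over selected boundary offsets would be the more idiomatic shape. Looping over every cut is just the exhaustive version of the same check.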
   


-- 
This is an automated message from the Apache Git Service.