ppkarwasz opened a new pull request, #734:
URL: https://github.com/apache/commons-compress/pull/734
The 7z file format specification defines only **unsigned numbers**
(`UINT64`, `REAL_UINT64`, `UINT32`). However, the current implementation allows
parsing methods like `readUint64`, `getLong`, and `getInt` to return negative
values and then handles those inconsistently in downstream logic.
This PR introduces a safer and more specification-compliant number parsing
model.
### Key changes
* **Strict unsigned number parsing**
  * Parsing methods now *never* return negative numbers.
  * `readUint64`, `readUint64ToIntExact`, `readRealUint64`, and `readUint32`
    follow the terminology from `7zFormat.txt`.
  * Eliminates scattered negative-value checks that previously compensated
    for parsing issues.
* **Improved header integrity validation**
  * Before large allocations, the size is now validated against the **actual
    available data in the header** as well as the memory limit.
  * Prevents unnecessary or unsafe allocations when the archive is corrupted
    or truncated.
* **Correct numeric type usage**
  * Some fields represent 7z numbers as 64-bit values but are constrained
    internally to Java `int` limits.
  * These are now declared as `int` to signal real constraints in our
    implementation.
* **Consistent error handling** Parsing now throws only three well-defined
  exception types:

| Condition | Exception |
| --- | --- |
| Declared structure exceeds `maxMemoryLimitKiB` | `MemoryLimitException` |
| Missing data inside header (truncated or corrupt) | `ArchiveException("Corrupted 7z archive")` |
| Unsupported numeric values (too large for implementation) | `ArchiveException("Unsupported 7z archive")` |

Note: `EOFException` is no longer used: a header with missing fields is
not “EOF,” it is **corrupted**.
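
To illustrate the strict parsing model, here is a minimal sketch of decoding a 7z variable-length `UINT64` so that it never returns a negative `long`. It is not the code in this PR: the class `SevenZNumberSketch` and the stand-in `SketchArchiveException` are hypothetical names used only to keep the example dependency-free, and the method bodies are an assumption about how `readUint64` and `readUint64ToIntExact` could behave.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

final class SevenZNumberSketch {

    /** Hypothetical stand-in for ArchiveException, so the sketch compiles on its own. */
    static class SketchArchiveException extends IOException {
        SketchArchiveException(String message) {
            super(message);
        }
    }

    /** Reads one byte as 0..255; missing data means the header is corrupted, not "EOF". */
    static int readUnsignedByte(ByteBuffer in) throws SketchArchiveException {
        if (!in.hasRemaining()) {
            throw new SketchArchiveException("Corrupted 7z archive");
        }
        return in.get() & 0xFF;
    }

    /**
     * Decodes a 7z variable-length number. The count of leading 1 bits in the
     * first byte gives the number of extra little-endian bytes that follow.
     */
    static long readUint64(ByteBuffer in) throws SketchArchiveException {
        long firstByte = readUnsignedByte(in);
        int mask = 0x80;
        long value = 0;
        for (int i = 0; i < 8; i++) {
            if ((firstByte & mask) == 0) {
                // Remaining low bits of the first byte are the high bits of the value.
                return value | ((firstByte & (mask - 1)) << (8 * i));
            }
            value |= ((long) readUnsignedByte(in)) << (8 * i);
            mask >>>= 1;
        }
        // Only the full 8-byte form can set bit 63, which a Java long cannot hold unsigned.
        if (value < 0) {
            throw new SketchArchiveException("Unsupported 7z archive");
        }
        return value;
    }

    /** Narrows to int for fields that the implementation limits to Java int range. */
    static int readUint64ToIntExact(ByteBuffer in) throws SketchArchiveException {
        long value = readUint64(in);
        if (value > Integer.MAX_VALUE) {
            throw new SketchArchiveException("Unsupported 7z archive");
        }
        return (int) value;
    }
}
```

Because the range checks live in the reader, callers of such methods would not need their own negative-value guards.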
This PR lays groundwork for safer parsing and easier future maintenance by
aligning number handling with the actual 7z specification and making header
parsing behavior *predictable and robust*.
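
For reviewers, the header integrity validation described under the key changes can be pictured with the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `HeaderAllocationSketch`, `checkedCount`, and both exception classes are hypothetical stand-ins, each entry is assumed to occupy `entrySize` bytes both in the header and in memory, and the order of the three checks is a guess.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

final class HeaderAllocationSketch {

    /** Hypothetical stand-in for ArchiveException. */
    static class SketchArchiveException extends IOException {
        SketchArchiveException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for MemoryLimitException. */
    static class SketchMemoryLimitException extends IOException {
        SketchMemoryLimitException(long neededKiB, int limitKiB) {
            super(neededKiB + " KiB needed, limit is " + limitKiB + " KiB");
        }
    }

    /**
     * Validates a declared entry count before an array is allocated for it.
     * Assumes declaredCount is non-negative (guaranteed by strict parsing)
     * and entrySize is positive.
     */
    static int checkedCount(ByteBuffer header, long declaredCount, int entrySize,
            int maxMemoryLimitKiB) throws IOException {
        // Java arrays are indexed by int, so larger counts are unsupported.
        if (declaredCount > Integer.MAX_VALUE) {
            throw new SketchArchiveException("Unsupported 7z archive");
        }
        // Cannot overflow: both factors fit in 31 bits.
        long neededBytes = declaredCount * entrySize;
        long neededKiB = (neededBytes + 1023) / 1024;
        if (neededKiB > maxMemoryLimitKiB) {
            throw new SketchMemoryLimitException(neededKiB, maxMemoryLimitKiB);
        }
        // The header must actually contain the data the count promises.
        if (neededBytes > header.remaining()) {
            throw new SketchArchiveException("Corrupted 7z archive");
        }
        return (int) declaredCount;
    }
}
```

With a check like this, a caller would write something like `new long[checkedCount(header, count, 8, limit)]`, so a truncated header fails before any allocation is attempted.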