ppkarwasz opened a new pull request, #734:
URL: https://github.com/apache/commons-compress/pull/734

   The 7z file format specification defines only **unsigned numbers**
(`UINT64`, `REAL_UINT64`, `UINT32`). However, the current implementation lets
parsing methods such as `readUint64`, `getLong`, and `getInt` return negative
values, which downstream logic then handles inconsistently.
   
   This PR introduces a safer and more specification-compliant number parsing 
model.
   
   ### Key changes
   
   * **Strict unsigned number parsing**
   
     * Parsing methods now *never* return negative numbers.
     * `readUint64`, `readUint64ToIntExact`, `readRealUint64`, and `readUint32` 
follow the terminology from `7zFormat.txt`.
     * Eliminates scattered negative-value checks that previously compensated 
for parsing issues.
   
   * **Improved header integrity validation**
   
     * Before large allocations, the size is now validated against the **actual 
available data in the header** as well as the memory limit.
     * Prevents unnecessary or unsafe allocations when the archive is corrupted 
or truncated.
   
   * **Correct numeric type usage**
   
     * Some fields are encoded as 64-bit 7z numbers, but our implementation
can only handle values within the Java `int` range.
     * These fields are now declared as `int` so that the types reflect the
real limits of our implementation.
   
   * **Consistent error handling**: parsing now throws exceptions for only
three well-defined conditions:
   
     | Condition                                                 | Exception                                    |
     | --------------------------------------------------------- | -------------------------------------------- |
     | Declared structure exceeds `maxMemoryLimitKiB`            | `MemoryLimitException`                       |
     | Missing data inside header (truncated or corrupt)         | `ArchiveException("Corrupted 7z archive")`   |
     | Unsupported numeric values (too large for implementation) | `ArchiveException("Unsupported 7z archive")` |
   
     Note: `EOFException` is no longer used: a header with missing fields is 
not “EOF,” it is **corrupted**.
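
To illustrate the strict number model described above, here is a minimal sketch and not the code from this PR. The class `SevenZNumbers` is hypothetical, the method bodies and signatures are assumptions that only borrow the method names listed above, and plain `IOException` stands in for `ArchiveException` so the example has no dependencies.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: the names mirror the PR text, the bodies are assumptions. */
final class SevenZNumbers {

    private SevenZNumbers() {
    }

    /** Reads one byte as 0..255; a missing byte means the header is corrupted. */
    private static long readUnsignedByte(final ByteBuffer in) throws IOException {
        if (!in.hasRemaining()) {
            throw new IOException("Corrupted 7z archive");
        }
        return in.get() & 0xFFL;
    }

    /** Values with the top bit set do not fit a non-negative Java long. */
    private static long requireNonNegative(final long value) throws IOException {
        if (value < 0) {
            throw new IOException("Unsupported 7z archive");
        }
        return value;
    }

    /** Reads a variable-length 7z UINT64; never returns a negative number. */
    static long readUint64(final ByteBuffer in) throws IOException {
        final long firstByte = readUnsignedByte(in);
        int mask = 0x80;
        long value = 0;
        for (int i = 0; i < 8; i++) {
            if ((firstByte & mask) == 0) {
                // The unused low bits of the first byte are the most significant bits.
                return value | (firstByte & (mask - 1)) << (8 * i);
            }
            value |= readUnsignedByte(in) << (8 * i);
            mask >>>= 1;
        }
        // Only the full 8-byte form can set bit 63.
        return requireNonNegative(value);
    }

    /** Reads a fixed 8-byte little-endian REAL_UINT64 (sets the buffer to little-endian). */
    static long readRealUint64(final ByteBuffer in) throws IOException {
        if (in.remaining() < Long.BYTES) {
            throw new IOException("Corrupted 7z archive");
        }
        return requireNonNegative(in.order(ByteOrder.LITTLE_ENDIAN).getLong());
    }

    /** Reads a UINT64 that this implementation can only handle as a Java int. */
    static int readUint64ToIntExact(final ByteBuffer in) throws IOException {
        final long value = readUint64(in);
        if (value > Integer.MAX_VALUE) {
            throw new IOException("Unsupported 7z archive");
        }
        return (int) value;
    }
}
```

With helpers of this shape, callers no longer need their own `< 0` checks: a value is either returned as a non-negative number or rejected as corrupted or unsupported at the point where it is parsed.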
   
   This PR lays groundwork for safer parsing and easier future maintenance by 
aligning number handling with the actual 7z specification and making header 
parsing behavior *predictable and robust*.
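
As a rough illustration of validating a declared size before allocating, the sketch below checks a size against the implementation limit, the memory limit, and the bytes actually left in the header. It is not the PR's code: `HeaderAllocationCheck`, `readBlock`, and `MemoryLimitExceeded` are hypothetical names, the order of the checks is an assumption, and `IOException` stands in for `ArchiveException`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical sketch of "validate before allocating"; not the PR's implementation. */
final class HeaderAllocationCheck {

    /** Stand-in for the MemoryLimitException named in the PR description. */
    static final class MemoryLimitExceeded extends IOException {
        private static final long serialVersionUID = 1L;

        MemoryLimitExceeded(final long requestedBytes, final int limitKiB) {
            super(requestedBytes + " bytes requested, limit is " + limitKiB + " KiB");
        }
    }

    private HeaderAllocationCheck() {
    }

    /**
     * Copies declaredSize bytes out of the header, allocating only after all checks pass.
     * declaredSize is expected to come from a parser that never returns negative numbers.
     */
    static byte[] readBlock(final ByteBuffer header, final long declaredSize, final int maxMemoryLimitKiB)
            throws IOException {
        if (declaredSize < 0) {
            throw new IllegalArgumentException("declaredSize must not be negative");
        }
        // Too large for a Java array: the implementation cannot support it.
        if (declaredSize > Integer.MAX_VALUE) {
            throw new IOException("Unsupported 7z archive");
        }
        // Larger than the configured memory limit.
        if (declaredSize > maxMemoryLimitKiB * 1024L) {
            throw new MemoryLimitExceeded(declaredSize, maxMemoryLimitKiB);
        }
        // More than the header actually contains: truncated or corrupted.
        if (declaredSize > header.remaining()) {
            throw new IOException("Corrupted 7z archive");
        }
        final byte[] block = new byte[(int) declaredSize];
        header.get(block);
        return block;
    }
}
```

Because every check runs before `new byte[...]`, a truncated archive that declares a huge structure is rejected without allocating anything.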
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
