lurongjiang commented on PR #1044:
URL: https://github.com/apache/poi/pull/1044#issuecomment-4207636944
### **Changes Made**
1. **Removed zero-padding logic**
   - Zero-padding caused unexpected exceptions when parsing standard `.doc`
     files, because the padded buffer content was invalid.
   - Added test cases for both **standard (Microsoft Office 97-2003
     `.doc`)** and **non-standard (WPS-compatible)** formats to ensure
     robustness.
2. **Enhanced out-of-bounds handling in `FileBackedDataSource`**
   - File-based operations rely on `FileBackedDataSource`, which previously
     threw an exception when `position >= size()`.
   - It now returns an empty buffer (`ByteBuffer.allocate(length)`, which is
     zero-filled) instead of throwing, consistent with
     `ByteArrayBackedDataSource`.
   - Added corresponding test cases to validate boundary conditions for
     file-backed reads.
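
For reviewers who want to see the boundary rule in isolation, here is a
minimal sketch of the out-of-bounds behavior described in item 2. It is not
the code from this PR: the class `OutOfBoundsReadSketch` and the static
`read(FileChannel, int, long)` helper are hypothetical names chosen for
illustration, and the real `FileBackedDataSource` may structure the read
differently.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the actual POI implementation.
final class OutOfBoundsReadSketch {

    /**
     * Reads up to {@code length} bytes starting at {@code position}.
     * If {@code position} is at or past the end of the file, returns a
     * freshly allocated (zero-filled) buffer instead of throwing.
     */
    static ByteBuffer read(FileChannel channel, int length, long position)
            throws IOException {
        if (position >= channel.size()) {
            // Out-of-bounds: same shape as ByteBuffer.allocate(length),
            // i.e. position 0, limit == length, every byte zero.
            return ByteBuffer.allocate(length);
        }
        ByteBuffer dst = ByteBuffer.allocate(length);
        long pos = position;
        // Positional reads may return fewer bytes than requested, so loop
        // until the buffer is full or end-of-file is reached.
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos);
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        dst.flip(); // expose only the bytes actually read
        return dst;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("oob-sketch", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                System.out.println(read(ch, 2, 1).remaining());  // in-bounds read
                System.out.println(read(ch, 3, 10).remaining()); // past end of file
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

One consequence worth noting for callers: the buffer returned for an
out-of-bounds position is indistinguishable from `length` genuine zero bytes,
so code that needs to tell the two apart has to compare `position` with the
source size itself.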
### **Impact Assessment**
- Valid (in-bounds) reads are unaffected, so backward compatibility is
maintained for them. Only out-of-bounds access changes: it previously threw
an exception and now returns an empty buffer.
- For invalid cases (e.g., corrupted files or incomplete reads), the change
prevents abrupt exceptions, allowing callers to handle incomplete data
gracefully (e.g., failing silently or logging errors).
- **No functional regression expected**, as the new logic aligns with the
previous exception-throwing behavior in terms of outcome (failure to parse
invalid data).
### **Testing Validation**
- Verified that:
  - Standard `.doc` files parse correctly (no zero-padding artifacts).
  - Non-standard WPS files remain supported.
  - Out-of-bounds reads (both byte-array and file-backed) now return empty
    buffers instead of throwing exceptions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]