steveloughran opened a new pull request, #3562:
URL: https://github.com/apache/parquet-java/pull/3562
GH-3561 Harden variant decoding
### Rationale for this change
Malformed Parquet files can be disruptive enough to affect not only the
single worker thread that reads them (which will ultimately reject the file),
but also other threads in the same process.
### What changes are included in this PR?
- reject oversized metadata/value declarations
- reject oversized `dictSize` in objects
- range checking
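
As an illustration of the kind of low-cost check listed above, here is a minimal sketch. It is not the code in this PR: the class `VariantBoundsSketch` and both method names are hypothetical, and it assumes a declared size or range can be validated against the number of bytes actually available before anything is allocated or read.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; names are illustrative and not taken from the PR. */
public final class VariantBoundsSketch {

  private VariantBoundsSketch() {}

  /** Rejects a slice [offset, offset + length) that does not fit within limit bytes. */
  public static void checkRange(long offset, long length, long limit) {
    // Compare against (limit - offset) instead of computing offset + length,
    // which could overflow for hostile inputs.
    if (offset < 0 || length < 0 || offset > limit || length > limit - offset) {
      throw new IllegalArgumentException(
          "Range out of bounds: offset=" + offset + ", length=" + length + ", limit=" + limit);
    }
  }

  /** Rejects a declared element count that cannot fit in the bytes remaining in the buffer. */
  public static void checkDeclaredCount(long count, int minBytesPerElement, ByteBuffer buffer) {
    if (minBytesPerElement <= 0) {
      throw new IllegalArgumentException("minBytesPerElement must be positive");
    }
    // Each element needs at least minBytesPerElement bytes, so this bounds the plausible count.
    long maxPossible = buffer.remaining() / minBytesPerElement;
    if (count < 0 || count > maxPossible) {
      throw new IllegalArgumentException(
          "Declared count " + count + " exceeds maximum possible " + maxPossible);
    }
  }
}
```

Checking the declared count before sizing any array means a corrupt header fails with an exception local to the reading thread instead of triggering a large allocation.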
Only low-cost checks are made, equivalent to the Arrow variant's
`try_new_with_metadata_and_shallow_validation()`.
Logic equivalent to `with_full_validation()` is omitted. The caching
logic of #3481 may be able to do this when it builds a dictionary, as range
checking the increasing dictionary offsets is the key work there.
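
To make "range checking the increasing dictionary offsets" concrete, a sketch of what such a pass could look like follows. It is an assumption-laden illustration, not code from this PR or from #3481: the class `OffsetCheckSketch` is hypothetical, and it assumes the offsets are already decoded into an `int[]` and must be non-decreasing and no larger than the length of the string data.

```java
/** Hypothetical full-validation pass over dictionary offsets; not taken from the PR. */
public final class OffsetCheckSketch {

  private OffsetCheckSketch() {}

  /**
   * Verifies that offsets never decrease and never point past dataLength.
   * Assumes offsets have already been decoded into an int array.
   */
  public static void checkOffsets(int[] offsets, int dataLength) {
    int previous = 0; // starting at 0 also rejects negative offsets
    for (int i = 0; i < offsets.length; i++) {
      int current = offsets[i];
      if (current < previous || current > dataLength) {
        throw new IllegalArgumentException(
            "Invalid offset " + current + " at index " + i + " (data length " + dataLength + ")");
      }
      previous = current;
    }
  }
}
```

This is linear in the dictionary size, which is why it fits more naturally where the dictionary is being built anyway than in a shallow validation step.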
There's also a depth check consistent with the JSON parser; it's arguable
whether that is needed. It defends against a `StackOverflowError` in
anything trying to tree-walk, but shouldn't that code be the place to do the
checks?
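
For reference, a depth limit in a recursive tree walk might look like the sketch below. It is purely illustrative: `DepthLimitSketch` is a hypothetical class, nested `List` instances stand in for variant objects and arrays, and the limit is a caller-supplied parameter, not a value taken from this PR.

```java
import java.util.List;

/** Hypothetical depth-limited walker; nested Lists stand in for variant containers. */
public final class DepthLimitSketch {

  private DepthLimitSketch() {}

  /** Counts leaf values, failing fast once nesting exceeds maxDepth. */
  public static int countLeaves(Object node, int depth, int maxDepth) {
    if (depth > maxDepth) {
      // Fail with an ordinary exception well before the JVM stack is exhausted.
      throw new IllegalStateException("Nesting depth exceeds limit of " + maxDepth);
    }
    if (!(node instanceof List)) {
      return 1; // anything that is not a container counts as a leaf
    }
    int leaves = 0;
    for (Object child : (List<?>) node) {
      leaves += countLeaves(child, depth + 1, maxDepth);
    }
    return leaves;
  }
}
```

Whether this check belongs in the decoder or in each tree-walking caller is exactly the question raised above; the sketch only shows that the check itself is cheap.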
### Are these changes tested?
The new test suite TestHardenedReader can be configured to actually emit the
malformed files, to see how applications deal with them.
### Are there any user-facing changes?
No
Closes #3561
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]