steveloughran opened a new pull request, #3562:
URL: https://github.com/apache/parquet-java/pull/3562

   GH-3561 Harden variant decoding
   
   
   ### Rationale for this change
   
   A malformed Parquet file can be disruptive enough to affect not only the 
worker thread that reads it (which will ultimately reject the file), but also 
other threads in the same process.
   
   
   ### What changes are included in this PR?
   
   - reject oversized metadata/value declarations
   - reject oversized dictSize in objects
   - range checking
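
To illustrate the kind of low-cost checks listed above, here is a minimal sketch. It is not the code in this PR: the class `VariantBoundsSketch`, its method names, and the `MAX_DECLARED_SIZE` cap are all hypothetical and chosen for illustration only. It shows an overflow-safe range check and a declared-size check that rejects values which could not fit in the bytes that remain.

```java
/** Hypothetical helper for illustration; not the PR's actual implementation. */
final class VariantBoundsSketch {
  private VariantBoundsSketch() {}

  /** Assumed upper bound on a declared element count; the value is illustrative. */
  static final int MAX_DECLARED_SIZE = 1 << 24;

  /**
   * Verifies that [offset, offset + length) lies inside a buffer of the given capacity.
   * The sum is computed as a long so that offset + length cannot wrap around.
   */
  static void checkRange(int offset, int length, int capacity) {
    if (offset < 0 || length < 0 || (long) offset + (long) length > (long) capacity) {
      throw new IllegalArgumentException(
          "range [" + offset + ", +" + length + ") outside buffer of " + capacity + " bytes");
    }
  }

  /**
   * Rejects a declared size that is negative, above the assumed cap, or larger than
   * the remaining bytes could hold at minBytesPerEntry bytes per entry.
   * minBytesPerEntry is assumed to be positive.
   */
  static int checkDeclaredSize(long declared, int minBytesPerEntry, int remaining) {
    if (declared < 0 || declared > MAX_DECLARED_SIZE) {
      throw new IllegalArgumentException("declared size out of bounds: " + declared);
    }
    if (declared * minBytesPerEntry > remaining) {
      throw new IllegalArgumentException(
          "declared size " + declared + " cannot fit in " + remaining + " remaining bytes");
    }
    return (int) declared;
  }
}
```

Both checks cost a handful of comparisons per header, so they stay within the "shallow validation" budget: nothing is allocated or walked before the declared sizes have been checked against the bytes actually present.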
   
   Only low-cost checks are made, equivalent to the Arrow variant 
`try_new_with_metadata_and_shallow_validation()`.
   
   There's no equivalent of `with_full_validation()`; that logic is omitted. The 
caching logic of #3481 may be able to do this when it builds a dictionary, as 
range checking the increasing dictionary offsets is the key work there.
   
   There's also a depth check consistent with the JSON parser; it's arguable 
whether that is needed. It will defend against `StackOverflowError`s in 
anything trying to treewalk, but shouldn't that code be the place to do the 
checks?
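
For reference, a depth guard of the kind discussed above could look like the sketch below. It is an assumption-laden illustration rather than the PR's code: the class `DepthLimitSketch` and the method `checkedDepth` are hypothetical, nested values are modelled as plain `java.util.List` instances instead of real variant objects, and the caller supplies the limit because the JSON parser's actual limit is not stated here.

```java
import java.util.List;

/** Hypothetical depth guard for illustration; nested values are modelled as Lists. */
final class DepthLimitSketch {
  private DepthLimitSketch() {}

  /**
   * Walks a nested value and returns the deepest level reached, counting the
   * root as depth 0. Fails fast with an exception once maxDepth is exceeded,
   * so the JVM stack is never exhausted by hostile nesting.
   */
  static int checkedDepth(Object value, int depth, int maxDepth) {
    if (depth > maxDepth) {
      throw new IllegalStateException("nesting depth exceeds limit of " + maxDepth);
    }
    int deepest = depth;
    if (value instanceof List<?>) {
      for (Object child : (List<?>) value) {
        deepest = Math.max(deepest, checkedDepth(child, depth + 1, maxDepth));
      }
    }
    return deepest;
  }
}
```

Placing the guard in the reader means every consumer gets a recoverable exception instead of a `StackOverflowError`; placing it in the treewalking code instead would leave the limit up to each caller, which is the trade-off raised above.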
   
   ### Are these changes tested?
   
   The new test suite TestHardenedReader can be configured to actually emit the 
malformed files, to see how applications deal with them.
   
   ### Are there any user-facing changes?
   
   No
   
   Closes #3561 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
