yadavay-amzn opened a new pull request, #3486:
URL: https://github.com/apache/parquet-java/pull/3486

   ### What changes were proposed?
   
   Fix LZ4_RAW heap decompression that fails when the decompressed page exceeds 
the ~8KB chunk size used by stream materialization.
   
   **Root cause:** When `HeapBytesDecompressor.decompress()` returns a lazy 
`StreamBytesInput`, its `writeInto()` method reads via `Channels.newChannel()` 
in ~8KB chunks. Each chunk triggers `NonBlockedDecompressorStream.read(b, off, 
len)` with a small `len`, which `Lz4RawDecompressor.maxUncompressedLength()` 
uses to size the output buffer. Since LZ4_RAW requires one-shot decompression, 
the undersized output buffer causes decompression failure for pages larger than 
~8KB.
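
   The ~8KB cap comes from the internal transfer buffer of the channel returned by `Channels.newChannel()`, and can be observed with plain JDK streams. Below is a minimal sketch (the `LenRecordingStream` wrapper is hypothetical, standing in for `NonBlockedDecompressorStream`) that records the largest `len` the channel ever passes down to `read(b, off, len)`:
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.FilterInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.ByteBuffer;
   import java.nio.channels.Channels;
   import java.nio.channels.ReadableByteChannel;
   
   public class ChunkedReadDemo {
       // Hypothetical wrapper standing in for NonBlockedDecompressorStream:
       // it records the largest len any caller requests in one read.
       static class LenRecordingStream extends FilterInputStream {
           int maxLen = 0;
           LenRecordingStream(InputStream in) { super(in); }
           @Override
           public int read(byte[] b, int off, int len) throws IOException {
               maxLen = Math.max(maxLen, len);
               return super.read(b, off, len);
           }
       }
   
       public static void main(String[] args) throws IOException {
           byte[] page = new byte[64 * 1024]; // a "decompressed page" larger than 8KB
           LenRecordingStream in =
               new LenRecordingStream(new ByteArrayInputStream(page));
           ReadableByteChannel ch = Channels.newChannel(in);
           ByteBuffer dst = ByteBuffer.allocate(page.length);
           while (dst.hasRemaining() && ch.read(dst) >= 0) {
               // the channel copies through an internal 8KB transfer buffer
           }
           // Even though dst had room for the full 64KB, no single read
           // ever asked the underlying stream for more than 8192 bytes.
           System.out.println("total=" + dst.position() + " maxChunk=" + in.maxLen);
       }
   }
   ```
   
   A decompressor that sizes its output buffer from that `len`, as `maxUncompressedLength()` does, can therefore never see the page's true uncompressed length.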
   
   **Fix:** Eagerly materialize the decompressed stream using 
`BytesInput.copy()` for `Lz4RawCodec` (same approach already used for 
`ZstandardCodec`). This forces the full `decompressedSize` to be read at once 
via `DataInputStream.readFully()`, ensuring the decompressor is called with the 
correct buffer size.
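
   For contrast, an eager read via `readFully()` requests the full length in a single call. This sketch reuses the same hypothetical `LenRecordingStream` wrapper to show that `DataInputStream.readFully()` asks the underlying stream for the entire `decompressedSize` up front, which is the behavior the fix relies on:
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.DataInputStream;
   import java.io.FilterInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   
   public class EagerReadDemo {
       // Same hypothetical wrapper as before: records the largest requested len.
       static class LenRecordingStream extends FilterInputStream {
           int maxLen = 0;
           LenRecordingStream(InputStream in) { super(in); }
           @Override
           public int read(byte[] b, int off, int len) throws IOException {
               maxLen = Math.max(maxLen, len);
               return super.read(b, off, len);
           }
       }
   
       public static void main(String[] args) throws IOException {
           int decompressedSize = 64 * 1024;
           byte[] page = new byte[decompressedSize];
           LenRecordingStream in =
               new LenRecordingStream(new ByteArrayInputStream(page));
           byte[] out = new byte[decompressedSize];
           // readFully's first read(b, 0, len) passes the full remaining length,
           // so a one-shot decompressor underneath sees the correct buffer size.
           new DataInputStream(in).readFully(out);
           System.out.println("maxChunk=" + in.maxLen);
       }
   }
   ```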
   
   Closes #3478
   
   ### How was this tested?
   
   - The changed file compiles without errors
   - Existing test 
`TestInteropReadLz4RawCodec.testInteropReadLz4RawLargerParquetFiles` covers the 
large-page LZ4_RAW read path


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

