yadavay-amzn opened a new pull request, #3486:
URL: https://github.com/apache/parquet-java/pull/3486
### What changes were proposed?

Fix LZ4_RAW heap decompression, which fails when the decompressed page exceeds the ~8KB chunk size used by stream materialization.

**Root cause:** When `HeapBytesDecompressor.decompress()` returns a lazy `StreamBytesInput`, its `writeInto()` method reads via `Channels.newChannel()` in ~8KB chunks. Each chunk triggers `NonBlockedDecompressorStream.read(b, off, len)` with a small `len`, which `Lz4RawDecompressor.maxUncompressedLength()` uses to size the output buffer. Because LZ4_RAW requires one-shot decompression, the undersized output buffer causes decompression to fail for any page larger than ~8KB.

**Fix:** Eagerly materialize the decompressed stream using `BytesInput.copy()` for `Lz4RawCodec` (the same approach already used for `ZstandardCodec`). This forces the full `decompressedSize` to be read at once via `DataInputStream.readFully()`, ensuring the decompressor is called with the correct output buffer size.

Closes #3478

### How was this tested?

- Compilation verified (zero errors in the changed file)
- The existing test `TestInteropReadLz4RawCodec.testInteropReadLz4RawLargerParquetFiles` covers the large-page LZ4_RAW read path
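The chunking behavior behind the root cause can be observed with plain JDK classes, independent of Parquet. The sketch below (class names `ChunkedReadDemo` and `RecordingStream` are ours, for illustration only) drains a 64KB stream two ways: through `Channels.newChannel()`, which on current JDKs transfers in 8192-byte chunks, and through `DataInputStream.readFully()`, which requests the full size in its first `read(b, off, len)` call. It records the `len` each path passes to the underlying stream, which is the value Parquet's decompressor path uses to size its output buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

public class ChunkedReadDemo {

    // InputStream wrapper that records each `len` passed to read(b, off, len)
    static class RecordingStream extends ByteArrayInputStream {
        final List<Integer> requestedLens = new ArrayList<>();

        RecordingStream(byte[] buf) {
            super(buf);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            requestedLens.add(len);
            return super.read(b, off, len);
        }
    }

    static int maxLazyLen;    // largest `len` seen on the channel path
    static int firstEagerLen; // first `len` seen on the readFully path

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[64 * 1024]; // a "page" well above 8KB

        // Lazy path: Channels.newChannel drains in ~8KB transfers, so the
        // underlying stream never sees a `len` equal to the full page size.
        RecordingStream lazy = new RecordingStream(page);
        ReadableByteChannel ch = Channels.newChannel(lazy);
        ByteBuffer dst = ByteBuffer.allocate(page.length);
        while (ch.read(dst) > 0) {
            // keep draining until the buffer is full
        }
        maxLazyLen =
            lazy.requestedLens.stream().mapToInt(Integer::intValue).max().orElse(0);

        // Eager path: readFully asks for the full remaining size up front.
        RecordingStream eager = new RecordingStream(page);
        try (DataInputStream din = new DataInputStream(eager)) {
            din.readFully(new byte[page.length]);
        }
        firstEagerLen = eager.requestedLens.get(0);

        System.out.println("max len on channel path   = " + maxLazyLen);
        System.out.println("first len on eager path   = " + firstEagerLen);
    }
}
```

On a typical JDK the channel path tops out at 8192 per call while `readFully` requests all 65536 bytes at once, which is why eagerly materializing via `BytesInput.copy()` hands the one-shot LZ4_RAW decompressor a correctly sized buffer.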
