arouel opened a new issue, #3542:
URL: https://github.com/apache/parquet-java/issues/3542

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `LocalInputFile.readFully(ByteBuffer)` and `LocalInputFile.read(ByteBuffer)` 
in `parquet-common` are broken for any `ByteBuffer` that either (a) does not 
expose an accessible backing array or (b) has a non-zero `position()` when 
passed in. In practice this means any call to `ParquetFileReader.readFooter` 
against an `InputFile` obtained from `new LocalInputFile(path)` can fail, 
because Parquet itself passes buffer shapes that trigger the bug.
   
   ### Root cause
   
   Both methods end with:
   `buf.put(buffer, buf.position() + buf.arrayOffset(), buf.remaining());`
   
   Two independent defects:
   1. Wrong argument semantics. `ByteBuffer.put(byte[] src, int offset, int 
length)` treats offset as an offset into the source array. The source here is 
the freshly-allocated local buffer, whose indices have nothing to do with 
`buf.position()` or `buf.arrayOffset()`. It happens to work when both are zero; 
any other state either reads from the wrong offset or throws 
`IndexOutOfBoundsException`.
   2. `arrayOffset()` is not universally defined. Direct buffers, memory-mapped 
buffers, and read-only views all throw `UnsupportedOperationException` from 
`arrayOffset()`, so the call explodes before the put is even attempted.
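
Both defects can be reproduced without Parquet on the classpath. The sketch below is an illustration only: `PutOffsetDemo`, `buggyCopy`, `fixedCopy`, and `outcome` are hypothetical names, and it assumes the local array is sized to `buf.remaining()`, which the text above does not state explicitly.

```java
import java.nio.ByteBuffer;

final class PutOffsetDemo {
    // Mirrors the statement quoted above: the destination's position and
    // arrayOffset are passed as an offset into the SOURCE array.
    static void buggyCopy(byte[] src, ByteBuffer dst) {
        dst.put(src, dst.position() + dst.arrayOffset(), dst.remaining());
    }

    // Reads the source array from index 0; the relative put already writes
    // at the destination's current position, and no backing array is needed.
    static void fixedCopy(byte[] src, ByteBuffer dst) {
        dst.put(src, 0, dst.remaining());
    }

    // Returns "ok" or the simple name of the runtime exception that was thrown.
    static String outcome(byte[] src, ByteBuffer dst, boolean buggy) {
        try {
            if (buggy) {
                buggyCopy(src, dst);
            } else {
                fixedCopy(src, dst);
            }
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5, 6};

        ByteBuffer heap = ByteBuffer.allocate(8);
        heap.position(2); // remaining() == 6, the same as src.length

        System.out.println(outcome(src, heap.duplicate(), true));              // IndexOutOfBoundsException
        System.out.println(outcome(src, ByteBuffer.allocateDirect(6), true));  // UnsupportedOperationException
        System.out.println(outcome(src, heap.duplicate(), false));             // ok
        System.out.println(outcome(src, ByteBuffer.allocateDirect(6), false)); // ok
    }
}
```

Under that sizing assumption, any non-zero source offset pushes `offset + length` past the end of the array, so the heap case fails with `IndexOutOfBoundsException` rather than silently copying the wrong bytes.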
   
   `read(ByteBuffer)` has an additional bug: it copies `buf.remaining()` bytes 
into the destination regardless of how many bytes `read(byte[])` actually 
returned, corrupting the buffer on short reads and advancing `position` past 
the EOF boundary.
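
One possible shape for a fix is sketched below. It is not the project's actual patch: `ByteBufferReads` and its two static methods are hypothetical helpers, and a plain `InputStream` stands in for the stream wrapped by `LocalInputFile`. The point is that only the bytes actually read are copied, the source offset is always 0, and `arrayOffset()` is never called.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class ByteBufferReads {
    // Reads at most dst.remaining() bytes and copies only what was read.
    // Returns the byte count, or -1 at end of stream (dst is then untouched).
    static int read(InputStream in, ByteBuffer dst) throws IOException {
        byte[] tmp = new byte[dst.remaining()];
        int n = in.read(tmp);
        if (n > 0) {
            dst.put(tmp, 0, n); // advances dst.position() by n, not by remaining()
        }
        return n;
    }

    // Fills dst completely, or throws EOFException and leaves dst unchanged.
    static void readFully(InputStream in, ByteBuffer dst) throws IOException {
        byte[] tmp = new byte[dst.remaining()];
        int off = 0;
        while (off < tmp.length) {
            int n = in.read(tmp, off, tmp.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " of " + tmp.length + " bytes");
            }
            off += n;
        }
        dst.put(tmp, 0, tmp.length);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer dst = ByteBuffer.allocateDirect(8);
        dst.position(2); // direct buffer with a non-zero position
        int n = read(new ByteArrayInputStream(new byte[] {7, 8, 9}), dst);
        System.out.println(n + " bytes read, position=" + dst.position()); // 3 bytes read, position=5
    }
}
```

Going through a temporary array keeps the sketch close to the code described above; a channel-based read directly into the `ByteBuffer` would be another option.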
   
   ### Stack trace
   ```
   java.lang.UnsupportedOperationException
       at java.base/java.nio.ByteBuffer.arrayOffset(ByteBuffer.java:1558)
       at org.apache.parquet.io.LocalInputFile$1.readFully(LocalInputFile.java:93)
       at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:642)
       at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:578)
       at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:971)
       at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:961)
   ```
   
   ### Minimal reproducer
   ```java
   Path path = /* any existing Parquet file */;
   try (SeekableInputStream s = new LocalInputFile(path).newStream()) {
       s.readFully(ByteBuffer.allocateDirect(8));   // throws UnsupportedOperationException
   }
   ```
   ```java
   try (SeekableInputStream s = new LocalInputFile(path).newStream()) {
       ByteBuffer heap = ByteBuffer.allocate(8);
       heap.put(new byte[] {0, 0});                 // position=2
       s.readFully(heap);                           // reads from wrong offset in source array
   }
   ```
   
   ### Version
   - parquet-common 1.17.0
   - Introduced by PARQUET-1822 (commit 7c4cb42a, "Avoid requiring Hadoop 
installation for reading/writing", #1111), which added `LocalInputFile`.
   
   ### Component(s)
   
   Core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

