arouel opened a new issue, #3489:
URL: https://github.com/apache/parquet-java/issues/3489

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `Binary.copy()` is intended to produce an owned copy of the value that is 
safe to store beyond the lifetime of the source buffer. However, the base 
implementation (lines 120-126 of `Binary.java`) only copies when `isBackingBytesReused` is `true`:
   ```java
   public Binary copy() {
       if (isBackingBytesReused) {
           return Binary.fromConstantByteArray(getBytes());
       } else {
           return this;
       }
   }
   ```
   For `ByteBufferBackedBinary` instances produced by column readers, 
`isBackingBytesReused` is `false`, so `copy()` returns `this`: the original 
object, which still points into the decompressed page buffer.
   When the decompressed page buffer is backed by a direct `ByteBuffer` whose 
memory can be freed independently (e.g., via `Arena.close()` when using a 
direct `ByteBufferAllocator` with `useOffHeapDecryptBuffer`), any code that 
called `copy()` expecting to own the data will later hit a use-after-free when 
the source buffer is released.
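The aliasing can be reproduced outside parquet-java with a small model. The sketch below is a hypothetical, simplified stand-in (`FlagBasedCopyModel` and `BufferBinary` are illustrative names, not parquet-java classes) that reuses the flag-based decision from the quoted `copy()`. Because freeing direct memory portably would require `Arena` (Java 22+), it overwrites the page buffer instead; with real freed memory the same read would be expected to fail with `IllegalStateException: Already closed` rather than return new data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a ByteBuffer-backed Binary. It mirrors the
// flag-based copy() logic quoted in the issue; it is NOT parquet-java's class.
public class FlagBasedCopyModel {

    static final class BufferBinary {
        final ByteBuffer buffer;            // view into the (page) buffer
        final boolean isBackingBytesReused; // false for values from column readers

        BufferBinary(ByteBuffer buffer, boolean isBackingBytesReused) {
            this.buffer = buffer;
            this.isBackingBytesReused = isBackingBytesReused;
        }

        byte[] getBytes() {
            byte[] out = new byte[buffer.remaining()];
            buffer.duplicate().get(out); // read without moving the view's position
            return out;
        }

        // Same decision as the quoted Binary.copy(): copy only when the flag is set.
        BufferBinary copy() {
            if (isBackingBytesReused) {
                return new BufferBinary(ByteBuffer.wrap(getBytes()), false);
            } else {
                return this; // still points into the source buffer
            }
        }

        @Override
        public String toString() {
            return new String(getBytes(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocateDirect(5); // stand-in for a decompressed page
        page.put("hello".getBytes(StandardCharsets.US_ASCII)).flip();

        BufferBinary value = new BufferBinary(page.duplicate(), false);
        BufferBinary copy = value.copy();

        // Overwrite the page, standing in for the buffer being released and reused.
        page.clear();
        page.put("WORLD".getBytes(StandardCharsets.US_ASCII));

        System.out.println(copy == value); // true: no copy was made
        System.out.println(copy);          // WORLD: the "copy" follows the source memory
    }
}
```

With `isBackingBytesReused` set to `true` the same model returns an independent heap copy, which is the case the flag was designed to protect.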
   
   In practice this surfaces through `DictionaryValuesWriter.writeBytes(Binary 
v)`, which calls `v.copy()` before inserting into its dictionary map. The 
returned object is the same `ByteBufferBackedBinary` still referencing the 
direct buffer. When the `PageReadStore` is closed (releasing the decompressed 
page buffers), the dictionary entries become dangling references. A subsequent 
dictionary fallback triggers access to the freed memory:
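The "copy, then insert into the dictionary, then fall back" sequence can be sketched the same way. This is a hypothetical model (`DictionaryFallbackDemo` and `writeThenFallBack` are illustrative names, not `DictionaryValuesWriter` code): it stores whatever the "copy" hands back as a dictionary key, overwrites the page as a stand-in for releasing it, then re-reads every entry as a fallback would. Under real deallocation the re-read is where the `Already closed` exception in the trace below would be expected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "v.copy(), then insert into the dictionary map".
// NOT parquet-java's DictionaryValuesWriter; names are illustrative only.
public class DictionaryFallbackDemo {

    // Owned copy: the bytes are materialized into a fresh heap array.
    static ByteBuffer copyToHeap(ByteBuffer src) {
        byte[] bytes = new byte[src.remaining()];
        src.duplicate().get(bytes); // duplicate() leaves src's position untouched
        return ByteBuffer.wrap(bytes);
    }

    // Writes one value from a page buffer, "releases" (overwrites) the page,
    // then falls back by re-reading every dictionary entry in insertion order.
    static List<String> writeThenFallBack(boolean ownedCopy) {
        ByteBuffer page = ByteBuffer.allocateDirect(3); // stand-in for a decompressed page
        page.put("abc".getBytes(StandardCharsets.US_ASCII)).flip();

        Map<ByteBuffer, Integer> dictionary = new LinkedHashMap<>();
        ByteBuffer key = ownedCopy ? copyToHeap(page) : page; // what copy() hands back
        dictionary.putIfAbsent(key, dictionary.size());

        // Page store closed: the memory now holds something else.
        page.clear();
        page.put("xyz".getBytes(StandardCharsets.US_ASCII)).flip();

        // Fallback: re-encode every dictionary entry from the stored keys.
        List<String> replayed = new ArrayList<>();
        for (ByteBuffer entry : dictionary.keySet()) {
            replayed.add(StandardCharsets.US_ASCII.decode(entry.duplicate()).toString());
        }
        return replayed;
    }

    public static void main(String[] args) {
        System.out.println(writeThenFallBack(false)); // [xyz]: aliased entry, wrong data
        System.out.println(writeThenFallBack(true));  // [abc]: owned copy survives
    }
}
```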
   ```
   java.lang.IllegalStateException: Already closed
       at java.base/jdk.internal.foreign.MemorySessionImpl.alreadyClosed(MemorySessionImpl.java:326)
       at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:114)
       at java.base/jdk.internal.misc.ScopedMemoryAccess.getByte(ScopedMemoryAccess.java:1090)
       at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:341)
       at java.base/java.nio.ByteBuffer.getArray(ByteBuffer.java:984)
       at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:838)
       at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:865)
       at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:504)
       at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytesUnsafe(Binary.java:515)
       at org.apache.parquet.column.values.deltastrings.DeltaByteArrayWriter.writeBytes(DeltaByteArrayWriter.java:92)
       at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.fallBackDictionaryEncodedData(DictionaryValuesWriter.java:315)
       at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.fallBackAllValuesTo(DictionaryValuesWriter.java:151)
       at org.apache.parquet.column.values.fallback.FallbackValuesWriter.fallBack(FallbackValuesWriter.java:155)
       at org.apache.parquet.column.values.fallback.FallbackValuesWriter.getBytes(FallbackValuesWriter.java:86)
       at org.apache.parquet.column.impl.ColumnWriterV2.writePage(ColumnWriterV2.java:99)
       at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:379)
       at org.apache.parquet.column.impl.ColumnWriteStoreBase.flush(ColumnWriteStoreBase.java:191)
   ```
   The `isBackingBytesReused` flag was designed for the `RecordReader` pattern 
where a shared `byte[]` backing array might be mutated between calls. It does 
not account for direct `ByteBuffer`s whose underlying memory can be deallocated 
externally.
   
   ### Expected behavior
   
   `Binary.copy()` on a `ByteBufferBackedBinary` with a direct `ByteBuffer` 
should always materialize to a heap-backed `Binary`, ensuring the returned 
value is independent of the source buffer's lifecycle.
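One way to express the expected behavior is to treat a direct backing buffer like a reused backing array. The sketch below is hypothetical (`MaterializingCopySketch` and `BufferBinary` are illustrative names, and the `isDirect()` check is an assumed design, not the project's fix): `copy()` takes an owned heap copy when the flag is set or the buffer is direct, and otherwise keeps the existing zero-copy behavior.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the expected behavior; NOT parquet-java code.
// copy() materializes to the heap whenever the backing memory could go away.
public class MaterializingCopySketch {

    static final class BufferBinary {
        final ByteBuffer buffer;
        final boolean isBackingBytesReused;

        BufferBinary(ByteBuffer buffer, boolean isBackingBytesReused) {
            this.buffer = buffer;
            this.isBackingBytesReused = isBackingBytesReused;
        }

        byte[] getBytes() {
            byte[] out = new byte[buffer.remaining()];
            buffer.duplicate().get(out); // read without moving the view's position
            return out;
        }

        BufferBinary copy() {
            // Direct memory may be freed independently (e.g. via Arena.close()),
            // so take an owned heap copy just as for a reused backing array.
            if (isBackingBytesReused || buffer.isDirect()) {
                return new BufferBinary(ByteBuffer.wrap(getBytes()), false);
            }
            // Heap-backed and not flagged as reused: keep the zero-copy path.
            return this;
        }

        @Override
        public String toString() {
            return new String(getBytes(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocateDirect(5); // stand-in for a decompressed page
        page.put("hello".getBytes(StandardCharsets.US_ASCII)).flip();

        BufferBinary value = new BufferBinary(page.duplicate(), false);
        BufferBinary copy = value.copy();

        // Overwrite the page, standing in for the buffer being released.
        page.clear();
        page.put("WORLD".getBytes(StandardCharsets.US_ASCII));

        System.out.println(copy == value);          // false: a new object was made
        System.out.println(copy.buffer.isDirect()); // false: the copy lives on the heap
        System.out.println(copy);                   // hello
    }
}
```

Always copying would be simpler, but it would also give up the zero-copy path for heap-backed values that are not reused; the `isDirect()` check is one possible trade-off between the two.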
   
   ### Version
   
   1.17.0 (older versions are also affected)
   
   ### Component(s)
   
   Core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


