arouel opened a new issue, #3489:
URL: https://github.com/apache/parquet-java/issues/3489
### Describe the bug, including details regarding any error messages, version, and platform.
`Binary.copy()` is intended to produce an owned copy of the value that is
safe to store beyond the lifetime of the source buffer. However, the base
implementation (lines 120-126) only copies when `isBackingBytesReused` is `true`:
```java
public Binary copy() {
  if (isBackingBytesReused) {
    return Binary.fromConstantByteArray(getBytes());
  } else {
    return this;
  }
}
```
For `ByteBufferBackedBinary` instances produced by column readers,
`isBackingBytesReused` is `false`, so `copy()` returns `this`: the original
object, still pointing into the decompressed page buffer.
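The aliasing can be reproduced without parquet-java. The following is a minimal sketch, not the library's code: `BufferView` is a hypothetical stand-in for a buffer-backed `Binary`, and its `copy()` only mirrors the flag check quoted above. Overwriting the direct buffer stands in for the page memory being released or recycled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a buffer-backed Binary; NOT the parquet-java class.
final class BufferView {
    private final ByteBuffer buffer;
    private final int offset;
    private final int length;
    private final boolean backingBytesReused;

    BufferView(ByteBuffer buffer, int offset, int length, boolean backingBytesReused) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
        this.backingBytesReused = backingBytesReused;
    }

    // Reads the viewed range into a fresh heap array without moving the source position.
    byte[] getBytes() {
        byte[] out = new byte[length];
        ByteBuffer dup = buffer.duplicate();
        dup.position(offset);
        dup.get(out);
        return out;
    }

    // Mirrors the quoted logic: a real copy is made only when the flag is set.
    BufferView copy() {
        if (backingBytesReused) {
            return new BufferView(ByteBuffer.wrap(getBytes()), 0, length, false);
        }
        return this;
    }

    String asString() {
        return new String(getBytes(), StandardCharsets.UTF_8);
    }
}

final class CopyAliasingSketch {
    // Returns { "same object?", value read before overwrite, value read after overwrite }.
    static String[] run() {
        ByteBuffer page = ByteBuffer.allocateDirect(5);
        page.put("hello".getBytes(StandardCharsets.UTF_8));

        BufferView value = new BufferView(page, 0, 5, false);
        BufferView copied = value.copy();
        String before = copied.asString();

        // Simulate the page memory being recycled by overwriting it in place.
        for (int i = 0; i < 5; i++) {
            page.put(i, (byte) 'X');
        }
        String after = copied.asString();
        return new String[] {String.valueOf(copied == value), before, after};
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("copy() returned the same object: " + r[0]);
        System.out.println("read before overwrite: " + r[1]);
        System.out.println("read after overwrite:  " + r[2]);
    }
}
```

In this sketch the "copy" observes the overwritten bytes because it never stopped sharing the buffer. With memory that is actually freed (as in the `Arena.close()` case), the same read fails instead of returning stale data.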
When the decompressed page buffer is backed by a direct `ByteBuffer` whose
memory can be freed independently (e.g., via `Arena.close()` when using a
direct `ByteBufferAllocator` with `useOffHeapDecryptBuffer`), any code that
called `copy()` expecting to own the data will later hit a use-after-free when
the source buffer is released.
In practice this surfaces through
`DictionaryValuesWriter.writeBytes(Binary v)`, which calls `v.copy()` before
inserting into its dictionary map. The
returned object is the same `ByteBufferBackedBinary` still referencing the
direct buffer. When the `PageReadStore` is closed (releasing the decompressed
page buffers), the dictionary entries become dangling references. A subsequent
dictionary fallback triggers access to the freed memory:
```
java.lang.IllegalStateException: Already closed
    at java.base/jdk.internal.foreign.MemorySessionImpl.alreadyClosed(MemorySessionImpl.java:326)
    at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:114)
    at java.base/jdk.internal.misc.ScopedMemoryAccess.getByte(ScopedMemoryAccess.java:1090)
    at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:341)
    at java.base/java.nio.ByteBuffer.getArray(ByteBuffer.java:984)
    at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:838)
    at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:865)
    at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:504)
    at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytesUnsafe(Binary.java:515)
    at org.apache.parquet.column.values.deltastrings.DeltaByteArrayWriter.writeBytes(DeltaByteArrayWriter.java:92)
    at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.fallBackDictionaryEncodedData(DictionaryValuesWriter.java:315)
    at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.fallBackAllValuesTo(DictionaryValuesWriter.java:151)
    at org.apache.parquet.column.values.fallback.FallbackValuesWriter.fallBack(FallbackValuesWriter.java:155)
    at org.apache.parquet.column.values.fallback.FallbackValuesWriter.getBytes(FallbackValuesWriter.java:86)
    at org.apache.parquet.column.impl.ColumnWriterV2.writePage(ColumnWriterV2.java:99)
    at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:379)
    at org.apache.parquet.column.impl.ColumnWriteStoreBase.flush(ColumnWriteStoreBase.java:191)
```
The `isBackingBytesReused` flag was designed for the `RecordReader` pattern
where a shared `byte[]` backing array might be mutated between calls. It does
not account for direct `ByteBuffer`s whose underlying memory can be deallocated
externally.
### Expected behavior
`Binary.copy()` on a `ByteBufferBackedBinary` with a direct `ByteBuffer`
should always materialize to a heap-backed `Binary`, ensuring the returned
value is independent of the source buffer's lifecycle.
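
One possible shape for that behavior, sketched with plain `java.nio` only: `toHeap` and `mustMaterialize` are hypothetical helper names, not parquet-java methods, and the rule "copy when the flag is set or the buffer is direct" is an assumption about how the fix could decide, not a confirmed design.

```java
import java.nio.ByteBuffer;

final class HeapMaterializeSketch {
    // Hypothetical helper: copies `length` bytes starting at `offset` into a new
    // heap array. duplicate() keeps the source's position and limit untouched.
    static byte[] toHeap(ByteBuffer source, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = source.duplicate();
        dup.position(offset);
        dup.get(out, 0, length);
        return out;
    }

    // Hypothetical decision rule (an assumption): materialize when the backing
    // bytes may be reused OR when the buffer is direct, since direct memory can
    // be released independently of any object that still references it.
    static boolean mustMaterialize(ByteBuffer source, boolean backingBytesReused) {
        return backingBytesReused || source.isDirect();
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        for (int i = 0; i < 8; i++) {
            direct.put(i, (byte) i);
        }

        byte[] owned = mustMaterialize(direct, false) ? toHeap(direct, 2, 3) : null;

        // Overwrite the source; the heap copy is unaffected.
        for (int i = 0; i < 8; i++) {
            direct.put(i, (byte) 99);
        }
        System.out.println(java.util.Arrays.toString(owned));
    }
}
```

The trade-off is an extra allocation and copy per `copy()` call on direct buffers, in exchange for a result whose lifetime no longer depends on the page buffer.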
### Version
1.17.0 (older versions are also affected)
### Component(s)
Core