iemejia opened a new issue, #3495:
URL: https://github.com/apache/parquet-java/issues/3495
### Describe the enhancement requested
`PlainValuesWriter` (used for PLAIN-encoded INT32, INT64, FLOAT, DOUBLE, and
BINARY columns) currently writes each value through two layers of abstraction:
```
PlainValuesWriter -> LittleEndianDataOutputStream -> CapacityByteArrayOutputStream
```
For each `writeInt()` call, `LittleEndianDataOutputStream` decomposes the int
into 4 bytes in a temporary `writeBuffer[8]` array and calls
`out.write(writeBuffer, 0, 4)`, which dispatches through the `OutputStream`
chain into `CapacityByteArrayOutputStream`.
That path performs:
- 4 byte-shift operations for little-endian decomposition
- 1 intermediate `writeBuffer[8]` array write
- 2 levels of virtual dispatch
- 1 bounds check in `write(byte[], off, len)`
- 1 `System.arraycopy` for 4 bytes
Since `CapacityByteArrayOutputStream` already buffers into `ByteBuffer` slabs
internally, the entire chain can be collapsed into a single
`ByteBuffer.putInt()` call, which is a HotSpot intrinsic that compiles to a
single unaligned store on x86/ARM when the buffer is in `LITTLE_ENDIAN` order.
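
To make the equivalence concrete, here is a minimal sketch comparing a manual shift-based decomposition (similar in spirit to what the stream wrapper does) with a single `ByteBuffer.putInt()` on a little-endian buffer. The class and method names are hypothetical and are not taken from parquet-java; only `java.nio` calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class; not part of parquet-java.
final class LittleEndianIntDemo {

  // Manual little-endian decomposition through a scratch array,
  // followed by a copy of the 4 bytes that were filled in.
  static byte[] encodeWithShifts(int v) {
    byte[] scratch = new byte[8];
    scratch[0] = (byte) v;
    scratch[1] = (byte) (v >>> 8);
    scratch[2] = (byte) (v >>> 16);
    scratch[3] = (byte) (v >>> 24);
    return Arrays.copyOf(scratch, 4);
  }

  // Single putInt() on a buffer whose byte order is LITTLE_ENDIAN.
  static byte[] encodeWithByteBuffer(int v) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(v);
    return buf.array();
  }

  public static void main(String[] args) {
    int v = 0x12345678;
    // Both paths should yield the same byte sequence.
    System.out.println(Arrays.equals(encodeWithShifts(v), encodeWithByteBuffer(v)));
  }
}
```

Both methods produce the same bytes; the difference the issue targets is only in how many steps it takes to get there.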
### Proposal
1. In `CapacityByteArrayOutputStream`:
   - Set `ByteOrder.LITTLE_ENDIAN` on newly allocated slabs in `addSlab()`.
   - Add `writeInt(int)` and `writeLong(long)` methods that call
     `currentSlab.putInt(v)` / `currentSlab.putLong(v)` directly, with a
     single remaining-check that grows the slab if needed.
2. In `PlainValuesWriter`:
   - Remove the `LittleEndianDataOutputStream` field entirely.
   - `writeInteger(v)` -> `arrayOut.writeInt(v)`
   - `writeLong(v)` -> `arrayOut.writeLong(v)`
   - `writeFloat(v)` -> `arrayOut.writeInt(Float.floatToIntBits(v))`
   - `writeDouble(v)` -> `arrayOut.writeLong(Double.doubleToLongBits(v))`
   - `writeBytes(Binary v)` -> `arrayOut.writeInt(v.length()); v.writeTo(arrayOut);`
   - `getBytes()` no longer needs to flush a buffering layer.
   - `close()` no longer closes the defunct stream.
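
The slab-side part of the proposal could look roughly like the sketch below. `SlabWriterSketch` is a hypothetical stand-in, not the real `CapacityByteArrayOutputStream`: it uses a fixed slab size and simply starts a new slab when a value does not fit, which is an assumption made for brevity and may differ from how the actual class grows or fills its slabs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a slab-backed output stream; not the parquet-java class.
final class SlabWriterSketch {
  private final int slabSize;
  private final List<ByteBuffer> slabs = new ArrayList<>();
  private ByteBuffer currentSlab;

  SlabWriterSketch(int slabSize) {
    if (slabSize < Long.BYTES) {
      throw new IllegalArgumentException("slabSize must hold at least one long");
    }
    this.slabSize = slabSize;
    addSlab();
  }

  // Every new slab is put into little-endian order once, at allocation time.
  private void addSlab() {
    currentSlab = ByteBuffer.allocate(slabSize).order(ByteOrder.LITTLE_ENDIAN);
    slabs.add(currentSlab);
  }

  // One remaining-check, then a direct putInt on the current slab.
  void writeInt(int v) {
    if (currentSlab.remaining() < Integer.BYTES) {
      addSlab();
    }
    currentSlab.putInt(v);
  }

  // Same pattern for 8-byte values.
  void writeLong(long v) {
    if (currentSlab.remaining() < Long.BYTES) {
      addSlab();
    }
    currentSlab.putLong(v);
  }

  // Concatenates the written portion of each slab (bytes 0..position).
  byte[] toByteArray() {
    int total = 0;
    for (ByteBuffer slab : slabs) {
      total += slab.position();
    }
    byte[] out = new byte[total];
    int offset = 0;
    for (ByteBuffer slab : slabs) {
      System.arraycopy(slab.array(), 0, out, offset, slab.position());
      offset += slab.position();
    }
    return out;
  }
}
```

With a writer like this, the `PlainValuesWriter` methods reduce to one-line delegations such as `arrayOut.writeInt(v)`, with no intermediate stream to flush or close.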
What was eliminated per `writeInt` call:
- 4 byte-shift operations for little-endian decomposition
- 1 intermediate `writeBuffer[8]` array write
- 2 levels of virtual dispatch
- 1 bounds check in `write(byte[], off, len)`
- 1 `System.arraycopy` for 4 bytes
Replaced with:
- 1 remaining-check on the slab `ByteBuffer`
- 1 `ByteBuffer.putInt()` call (single JVM intrinsic, ~1 store instruction on
  little-endian architectures)
### Benchmark results
`IntEncodingBenchmark.encodePlain` (100,000 INT32 values per invocation,
JMH `-wi 3 -i 5 -f 1`):
| Pattern | Before (ops/s) | After (ops/s) | Improvement |
|------------------|---------------:|--------------:|------------:|
| SEQUENTIAL | 26,817,451 | 52,953,193 | **+97.5% (2.0x)** |
| RANDOM | 28,517,312 | 37,774,036 | **+32.5%** |
| LOW_CARDINALITY | 28,705,158 | 52,819,678 | **+84.0%** |
| HIGH_CARDINALITY | 28,595,519 | 37,862,571 | **+32.4%** |
The improvement varies by pattern: SEQUENTIAL and LOW_CARDINALITY see roughly
1.8x to 2x because the slab `putInt()` path has highly predictable branching
(the slab rarely runs out for sequential writes). RANDOM and HIGH_CARDINALITY
still see a solid +32% improvement.
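
For readers who want to check the Improvement column, it follows from `(after / before - 1) * 100`. The small helper below is a hypothetical illustration of that arithmetic using the throughput figures from the table; it is not part of the benchmark code.

```java
// Hypothetical helper for reproducing the "Improvement" column; not benchmark code.
final class ImprovementCalc {

  // Relative throughput change in percent.
  static double improvementPercent(double before, double after) {
    return (after / before - 1.0) * 100.0;
  }

  public static void main(String[] args) {
    // Figures copied from the benchmark table (ops/s before, ops/s after).
    System.out.println("SEQUENTIAL:       " + improvementPercent(26_817_451, 52_953_193));
    System.out.println("RANDOM:           " + improvementPercent(28_517_312, 37_774_036));
    System.out.println("LOW_CARDINALITY:  " + improvementPercent(28_705_158, 52_819_678));
    System.out.println("HIGH_CARDINALITY: " + improvementPercent(28_595_519, 37_862_571));
  }
}
```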
The same code path also benefits `writeLong()`, `writeFloat()`,
`writeDouble()`, and the length prefix written by `writeBytes(Binary)`.
Decode round-trip verified: re-reading the encoded data with
`PlainValuesReader` produces identical values at ~1.15B ops/s.
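
As an illustration of how the float, double, and length-prefix paths reuse the int/long writes, and of what a round trip looks like, here is a minimal sketch built only on `java.nio.ByteBuffer`. `PlainEncodingSketch` and its methods are hypothetical names; a plain `byte[]` stands in for parquet-java's `Binary`, and the real `PlainValuesReader` is not used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; byte[] stands in for parquet-java's Binary type.
final class PlainEncodingSketch {

  // A float is written as its 32-bit pattern via the int path.
  static byte[] encodeFloat(float v) {
    return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN)
        .putInt(Float.floatToIntBits(v)).array();
  }

  // A double is written as its 64-bit pattern via the long path.
  static byte[] encodeDouble(double v) {
    return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(Double.doubleToLongBits(v)).array();
  }

  // Binary values get a 4-byte little-endian length prefix, then the raw bytes.
  static byte[] encodeBinary(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length)
        .order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(payload.length);
    buf.put(payload);
    return buf.array();
  }

  // Reads the 32-bit pattern back and reinterprets it as a float.
  static float decodeFloat(byte[] bytes) {
    return Float.intBitsToFloat(
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt());
  }

  // Reads the length prefix, then that many payload bytes.
  static byte[] decodeBinary(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    byte[] payload = new byte[buf.getInt()];
    buf.get(payload);
    return payload;
  }
}
```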
### Validation
All 573 `parquet-column` tests and 308 `parquet-common` tests pass with the
change applied.
### Component(s)
Core