Anton Vinogradov created IGNITE-28853:
-----------------------------------------
Summary: CompressedMessage: excessive copying and per-message
direct buffer allocations on both send and receive paths
Key: IGNITE-28853
URL: https://issues.apache.org/jira/browse/IGNITE-28853
Project: Ignite
Issue Type: Task
Reporter: Anton Vinogradov
Assignee: Dmitry Werner
CompressedMessage moves the same bytes through memory 3-5 times and allocates
direct ByteBuffers per message on both sides of the wire. Direct allocation is
expensive (memory zeroing, Cleaner-based release, potential System.gc() inside
Bits.reserveMemory), yet no step of either path actually needs a direct buffer:
data arrives in heap arrays and leaves through heap arrays.
Send path: compress() copies the whole source buffer into a byte[] and deflates
it via DeflaterOutputStream (512-byte internal buffer -> many small JNI calls)
into a ByteArrayOutputStream pre-sized to the *uncompressed* length. The result
is copied again by toByteArray(), and ChunkedByteReader then copies every 10K
chunk into a fresh array one more time.
Receive path: CompressedMessageSerializer.readFrom() accumulates incoming
chunks into a 100KB direct ByteBuffer allocated per message (grown by doubling,
which is another copy), although each chunk is already a fresh heap array
returned by readByteArray(). uncompress() then copies everything back into a
heap array and inflates via InflaterInputStream.readAllBytes() (internal 8K
buffers + final consolidation copy), even though the exact result size is known
upfront. Finally, DirectMessageReader.readCompressedMessageAndDeserialize()
copies the whole uncompressed payload into yet another per-message direct
buffer, although DirectByteBufferStream fully supports heap buffers.
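The last copy on the receive path can be illustrated in isolation. The sketch below uses only java.nio.ByteBuffer; the class and method names are hypothetical and are not taken from the Ignite code base. It contrasts filling a freshly allocated direct buffer with wrapping the existing heap array, and both buffers expose the same readable bytes.

```java
import java.nio.ByteBuffer;

/** Illustrative only: names are hypothetical, not Ignite's. */
class HeapVsDirectSketch {
    /** Per-message direct buffer: an off-heap allocation plus a full copy of the payload. */
    static ByteBuffer viaDirectCopy(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocateDirect(payload.length);

        buf.put(payload); // Copies every byte off-heap.
        buf.flip();       // Switches the buffer from writing to reading.

        return buf;
    }

    /** Heap view over the existing array: no payload-sized allocation and no copy. */
    static ByteBuffer viaWrap(byte[] payload) {
        return ByteBuffer.wrap(payload); // position = 0, limit = payload.length
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4, 5};

        ByteBuffer direct = viaDirectCopy(payload);
        ByteBuffer heap = viaWrap(payload);

        // ByteBuffer.equals() compares the remaining bytes, regardless of where they live.
        System.out.println("same content: " + direct.equals(heap));
        System.out.println("direct=" + direct.isDirect() + ", heap=" + heap.isDirect());
    }
}
```

A reader that accepts any ByteBuffer sees identical position, limit and content in both cases, so the wrapped variant only pays off when the consumer does not require a direct buffer.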
Fix (wire format unchanged):
* Internal representation switched to List<byte[]> chunks for both directions,
ChunkedByteReader removed.
* compress(): raw Deflater with setInput(ByteBuffer) (no input copy), deflating
straight into wire-ready chunks - compressed bytes are written exactly once.
* readFrom(): a received chunk is simply added to the list - zero copies, zero
direct allocations.
* uncompress(): raw Inflater fed chunk by chunk into an exact-size
byte[dataSize].
* readCompressedMessageAndDeserialize(): ByteBuffer.wrap(uncompressed) instead
of allocateDirect+put+flip.
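The compress()/uncompress() items above can be sketched with the plain JDK classes as follows. This is a minimal illustration under stated assumptions, not the actual patch: the class name, method signatures, error handling and the use of the default zlib-wrapped Deflater are assumptions, and the 10K chunk size is taken from the chunk size mentioned in the send-path description. Deflater.setInput(ByteBuffer) requires Java 11 or later.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative only: names and signatures are hypothetical, not Ignite's. */
class ChunkedCompressionSketch {
    /** Assumed chunk size (the issue mentions 10K chunks). */
    static final int CHUNK_SIZE = 10 * 1024;

    /** Deflates {@code src} straight into chunks, without an intermediate stream. */
    static List<byte[]> compress(ByteBuffer src) {
        Deflater deflater = new Deflater();

        try {
            deflater.setInput(src); // Reads from the buffer, no byte[] copy of the input.
            deflater.finish();

            List<byte[]> chunks = new ArrayList<>();

            while (!deflater.finished()) {
                byte[] chunk = new byte[CHUNK_SIZE];
                int len = 0;

                // Fill the chunk completely unless the stream ends first.
                while (len < chunk.length && !deflater.finished())
                    len += deflater.deflate(chunk, len, chunk.length - len);

                // Only the last, partially filled chunk is trimmed (one small copy).
                if (len > 0)
                    chunks.add(len == chunk.length ? chunk : Arrays.copyOf(chunk, len));
            }

            return chunks;
        }
        finally {
            deflater.end(); // Releases native zlib memory.
        }
    }

    /** Inflates the chunks one by one into an array of the known uncompressed size. */
    static byte[] uncompress(List<byte[]> chunks, int dataSize) {
        byte[] out = new byte[dataSize];
        Inflater inflater = new Inflater();

        try {
            int off = 0;

            for (byte[] chunk : chunks) {
                inflater.setInput(chunk);

                while (!inflater.needsInput() && !inflater.finished()) {
                    int n = inflater.inflate(out, off, out.length - off);

                    // No progress with input still pending: dataSize is too small or data is corrupt.
                    if (n == 0 && !inflater.needsInput() && !inflater.finished())
                        throw new IllegalStateException("Unexpected data after " + off + " bytes");

                    off += n;
                }
            }

            if (!inflater.finished() || off != dataSize)
                throw new IllegalStateException("Truncated stream: got " + off + " of " + dataSize);

            return out;
        }
        catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed data", e);
        }
        finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[50_000];
        new Random(42).nextBytes(data); // Poorly compressible, so several chunks are produced.

        List<byte[]> chunks = compress(ByteBuffer.wrap(data));
        byte[] restored = uncompress(chunks, data.length);

        System.out.println("chunks: " + chunks.size() + ", round trip ok: " + Arrays.equals(data, restored));
    }
}
```

In this sketch every full chunk is written once by the Deflater and can be handed to the wire as is, and the receiving side needs no growable accumulation buffer because the target array is sized from the known dataSize.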
JMH (GridDhtPartitionsFullMessage receive round-trip with two @Compress map
fields, JDK 17, M-series):
* 30 entries: 15.2K +/- 34.8K -> 100.9K +/- 6.2K ops/s (~6.6x; master's huge
variance is caused by per-message direct allocations triggering GC storms),
heap 66.7K -> 25.2K B/op (-62%).
* 500 entries: 4.38K -> 5.76K ops/s (+31%), heap 522K -> 431K B/op (-18%).
* On top of the heap savings, all per-message direct buffer allocations
(~365KB/op at 500 entries, invisible to gc.alloc.rate) are eliminated.