Anton Vinogradov created IGNITE-28853:
-----------------------------------------

             Summary: CompressedMessage: excessive copying and per-message 
direct buffer allocations on both send and receive paths
                 Key: IGNITE-28853
                 URL: https://issues.apache.org/jira/browse/IGNITE-28853
             Project: Ignite
          Issue Type: Task
            Reporter: Anton Vinogradov
            Assignee: Dmitry Werner


CompressedMessage moves the same bytes through memory 3-5 times and allocates 
direct ByteBuffers per message on both sides of the wire. Direct allocation is 
expensive (memory zeroing, Cleaner-based release, potential System.gc() inside 
Bits.reserveMemory), yet no step on the path actually needs a direct buffer: 
data arrives in and leaves through heap arrays.

Send path: compress() copies the whole source buffer into a byte[], deflates 
via DeflaterOutputStream (512-byte internal buffer -> many small JNI calls) 
into a ByteArrayOutputStream pre-sized to the *uncompressed* length, then 
copies again via toByteArray(); ChunkedByteReader then copies every 10K chunk 
into a fresh array one more time.

Receive path: CompressedMessageSerializer.readFrom() accumulates incoming 
chunks into a 100KB direct ByteBuffer allocated per message (grown by doubling 
through another copy), although each chunk is already a fresh heap array 
returned by readByteArray(); uncompress() copies it all back into a heap array 
and inflates via InflaterInputStream.readAllBytes() (internal 8K buffers + 
final consolidation copy) despite the exact result size being known upfront; 
DirectMessageReader.readCompressedMessageAndDeserialize() then copies the whole 
uncompressed payload into yet another per-message direct buffer, although 
DirectByteBufferStream fully supports heap buffers.

Fix (wire format unchanged):
* Internal representation switched to List<byte[]> chunks for both directions, 
ChunkedByteReader removed.
* compress(): raw Deflater with setInput(ByteBuffer) (no input copy), deflating 
straight into wire-ready chunks - compressed bytes are written exactly once.
* readFrom(): a received chunk is simply added to the list - zero copies, zero 
direct allocations.
* uncompress(): raw Inflater fed chunk by chunk into an exact-size 
byte[dataSize].
* readCompressedMessageAndDeserialize(): ByteBuffer.wrap(uncompressed) instead 
of allocateDirect+put+flip.
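
A minimal sketch of the chunked compress()/uncompress() idea from the list 
above, not the actual patch. The class name, method signatures, the 10K chunk 
size and the default Deflater settings are assumptions made for illustration; 
only Deflater.setInput(ByteBuffer) (Java 11+) and the exact-size byte[dataSize] 
target come from the description.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration only; not the actual Ignite classes. */
final class ChunkedCompressionSketch {
    /** Assumed chunk size, based on the "10K chunk" mentioned in the description. */
    static final int CHUNK_SIZE = 10 * 1024;

    /** Deflates the remaining bytes of src directly into chunk-sized heap arrays. */
    static List<byte[]> compress(ByteBuffer src) {
        Deflater deflater = new Deflater(); // Assumption: default level, zlib wrapper.

        try {
            deflater.setInput(src); // Java 11+: no intermediate byte[] copy of the input.
            deflater.finish();

            List<byte[]> chunks = new ArrayList<>();

            while (!deflater.finished()) {
                byte[] chunk = new byte[CHUNK_SIZE];
                int len = 0;

                // Fill the chunk completely unless the stream ends first.
                while (len < CHUNK_SIZE && !deflater.finished())
                    len += deflater.deflate(chunk, len, CHUNK_SIZE - len);

                // Only the last, partially filled chunk is trimmed (one small tail copy).
                if (len > 0)
                    chunks.add(len == CHUNK_SIZE ? chunk : Arrays.copyOf(chunk, len));
            }

            return chunks;
        }
        finally {
            deflater.end(); // Release native zlib state promptly.
        }
    }

    /** Inflates the chunks one by one into an array of the known uncompressed size. */
    static byte[] uncompress(List<byte[]> chunks, int dataSize) {
        Inflater inflater = new Inflater();

        try {
            byte[] res = new byte[dataSize];
            int off = 0;

            for (byte[] chunk : chunks) {
                inflater.setInput(chunk);

                while (off < dataSize && !inflater.needsInput() && !inflater.finished()) {
                    int n = inflater.inflate(res, off, dataSize - off);

                    if (n == 0 && inflater.needsDictionary())
                        throw new IllegalStateException("Preset dictionary is not supported");

                    off += n;
                }
            }

            if (off != dataSize)
                throw new IllegalStateException("Expected " + dataSize + " bytes, got " + off);

            return res;
        }
        catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed data", e);
        }
        finally {
            inflater.end();
        }
    }
}
```

The result can then be passed on as ByteBuffer.wrap(uncompress(chunks, 
dataSize)), which shares the heap array instead of copying it into a direct 
buffer.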

JMH (GridDhtPartitionsFullMessage receive round-trip with two @Compress map 
fields, JDK 17, M-series):
* 30 entries: 15.2K +/- 34.8K -> 100.9K +/- 6.2K ops/s (~6.6x; master's huge 
variance is caused by per-message direct allocations triggering GC storms), 
heap 66.7K -> 25.2K B/op (-62%).
* 500 entries: 4.38K -> 5.76K ops/s (+31%), heap 522K -> 431K B/op (-18%).
* On top of the heap savings, all per-message direct buffer allocations 
(~365KB/op at 500 entries, invisible to gc.alloc.rate) are eliminated.
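
Since direct buffer memory lives outside the Java heap, one way to observe it 
independently of heap allocation metrics is the JVM's "direct" buffer pool 
MXBean. The sketch below is only an illustration of that standard API; the 
class and method names are hypothetical and it is not part of the benchmark 
described here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper; not part of Ignite or of the JMH benchmark. */
final class DirectPoolProbe {
    /** Returns bytes currently used by the "direct" buffer pool, or -1 if it is not exposed. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName()))
                return pool.getMemoryUsed();
        }

        return -1;
    }
}
```

Sampling directMemoryUsed() before and after a batch of messages gives a rough 
view of direct buffer pressure that heap allocation counters do not show.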



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
