Siyao Meng created HDDS-15356:
---------------------------------
Summary: Make multi-buffer chunk checksum allocation-free
Key: HDDS-15356
URL: https://issues.apache.org/jira/browse/HDDS-15356
Project: Apache Ozone
Issue Type: Improvement
Components: common
Reporter: Siyao Meng
Assignee: Siyao Meng

A **multi-buffer** `ChunkBuffer` is one whose data is stored as more than one
underlying `ByteBuffer` instead of a single contiguous block. In production
this shape occurs in three cases:
- the data was assembled from a list of Netty `ByteBuf`s (Ratis
  state-machine-data read via `ChunkedNioFile`);
- the verifier received a `List<ByteString>` spanning more than one checksum
  window (client read-verify);
- an operator opted into `IncrementalChunkBuffer` via
  `ozone.client.bytebuffer.increment > 0`.

`Checksum.computeChecksum(ChunkBuffer)` previously delegated to
`ChunkBuffer.iterate(bytesPerChecksum)`, which allocates a fresh
`byte[bytesPerChecksum]` (1 MB by default) and copies the window's bytes into
it whenever a checksum window straddles two of the underlying `ByteBuffer`s.
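
To make the cost concrete, the following is a minimal sketch that counts how many checksum windows straddle a buffer boundary for a given layout; per the description above, each such window would have triggered one temporary allocation and copy in the old path. The class and method names are hypothetical and are not part of Ozone.

```java
final class StraddleCountSketch {

  // Counts the checksum windows that touch more than one underlying buffer.
  // bufferSizes: sizes of the underlying buffers, in order (hypothetical input).
  // window: bytes per checksum; must be greater than zero.
  static int countStraddlingWindows(int[] bufferSizes, int window) {
    int straddling = 0;
    int filled = 0;        // bytes of the current window consumed so far
    boolean spans = false; // true once the current window crosses into a new buffer
    for (int size : bufferSizes) {
      int remaining = size;
      while (remaining > 0) {
        if (filled > 0 && remaining == size) {
          spans = true;    // a partially filled window continues into this buffer
        }
        int n = Math.min(remaining, window - filled);
        remaining -= n;
        filled += n;
        if (filled == window) {
          if (spans) {
            straddling++;
          }
          filled = 0;
          spans = false;
        }
      }
    }
    if (filled > 0 && spans) {
      straddling++;        // trailing partial window that crossed a boundary
    }
    return straddling;
  }

  public static void main(String[] args) {
    // Buffers of 3, 4 and 3 bytes with 4-byte windows: windows [0,4) and [4,8)
    // each cross a boundary, while [8,10) lies inside the last buffer.
    System.out.println(countStraddlingWindows(new int[] {3, 4, 3}, 4));
  }
}
```

When buffer sizes are multiples of the window size, no window straddles a boundary and the count is zero, which is why only some multi-buffer layouts paid the copy cost.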

This change rewrites both the no-cache and cache compute paths to walk
`data.asByteBufferList()` directly. Each window is sliced via a non-copying
helper based on `ByteBuffer.duplicate()` (`BufferUtils.slice`), and the slices
are fed to a new `Checksum.StreamingChecksum` strategy that wraps
`ChecksumByteBuffer` (CRC32, CRC32C) or `MessageDigest` (SHA-256, MD5). The
`update(ByteBuffer)` contracts of both define a sequence of incremental
updates as byte-equivalent to a single update over the concatenation, so the
output bytes are bit-identical to those of the previous implementation.
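
The approach described above can be illustrated with a small, self-contained sketch. It is not the Ozone implementation: `java.util.zip.CRC32` stands in for `ChecksumByteBuffer`, the `slice` helper is a hypothetical stand-in for `BufferUtils.slice`, and the class and method names are invented for illustration. It walks a list of buffers, feeds non-copying views to the checksum window by window, and compares the result with checksums computed over one contiguous array.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

final class WindowedChecksumSketch {

  // Hypothetical stand-in for a non-copying slice helper: returns a view of the
  // next `length` bytes of `src` and advances `src` past them.
  static ByteBuffer slice(ByteBuffer src, int length) {
    ByteBuffer view = src.duplicate(); // shares content, own position and limit
    view.limit(view.position() + length);
    src.position(src.position() + length);
    return view;
  }

  // One CRC32 per window of `bytesPerChecksum` bytes (must be > 0), computed by
  // feeding slices of the underlying buffers to the checksum incrementally.
  static List<Long> crc32PerWindow(List<ByteBuffer> buffers, int bytesPerChecksum) {
    List<Long> checksums = new ArrayList<>();
    CRC32 crc = new CRC32();
    int filled = 0; // bytes of the current window already fed to `crc`
    for (ByteBuffer original : buffers) {
      ByteBuffer buf = original.duplicate(); // leave the caller's position untouched
      while (buf.hasRemaining()) {
        int n = Math.min(buf.remaining(), bytesPerChecksum - filled);
        crc.update(slice(buf, n)); // incremental update, no temporary byte[]
        filled += n;
        if (filled == bytesPerChecksum) {
          checksums.add(crc.getValue());
          crc.reset();
          filled = 0;
        }
      }
    }
    if (filled > 0) {
      checksums.add(crc.getValue()); // trailing partial window
    }
    return checksums;
  }

  // Reference computation over one contiguous array, for comparison only.
  static List<Long> crc32PerWindowContiguous(byte[] all, int bytesPerChecksum) {
    List<Long> checksums = new ArrayList<>();
    for (int off = 0; off < all.length; off += bytesPerChecksum) {
      CRC32 crc = new CRC32();
      crc.update(all, off, Math.min(bytesPerChecksum, all.length - off));
      checksums.add(crc.getValue());
    }
    return checksums;
  }

  public static void main(String[] args) {
    byte[] all = new byte[10];
    for (int i = 0; i < all.length; i++) {
      all[i] = (byte) (31 * i + 7);
    }
    // The same 10 bytes exposed as three buffers of 3, 4 and 3 bytes.
    List<ByteBuffer> parts = Arrays.asList(
        ByteBuffer.wrap(all, 0, 3).slice(),
        ByteBuffer.wrap(all, 3, 4).slice(),
        ByteBuffer.wrap(all, 7, 3).slice());
    // Compares the windowed walk against the contiguous reference.
    System.out.println(crc32PerWindow(parts, 4).equals(crc32PerWindowContiguous(all, 4)));
  }
}
```

The sketch relies on the same property the description cites: updating a checksum with consecutive pieces yields the same value as one update over their concatenation, so a window that crosses a buffer boundary needs no intermediate copy.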