Hi Pulsar Community,
I have created a proposal for ManagedCursorInfo compression. The proposal can be
found at: https://github.com/apache/pulsar/issues/14529
Thanks,
Zixuan
------------------
Motivation
Cursor data is managed by the ZooKeeper/etcd metadata store. As the amount of
cursor data grows, its size increases and pulling it from the metadata store
takes a long time. Therefore, we need to add compression for the cursor data,
which reduces both the data size and the time it takes to pull the data.
Goal
Support using LZ4/ZLIB/ZSTD/SNAPPY to compress ManagedCursorInfo.
Implementation
CursorInfo compression format
[MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] +
[MANAGED_CURSOR_INFO_PAYLOAD]
MAGIC_NUMBER: 0x4779
METADATA
Add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:
message ManagedCursorInfoMetadata {
required CompressionType compressionType = 1;
required int32 uncompressedSize = 2;
}
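A minimal Java sketch of assembling a stored value in this layout. The proposal does not fix the field widths, so this assumes a 2-byte big-endian MAGIC_NUMBER and a 4-byte big-endian METADATA_SIZE. The class CursorInfoLayout and its encode method are hypothetical, and the metadata payload is treated as already-serialized ManagedCursorInfoMetadata bytes.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of the layout:
 * [MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]
 * Assumption: 2-byte magic number and 4-byte metadata size, both big-endian.
 */
final class CursorInfoLayout {
    // 0x4779 from the proposal; stored as a 2-byte short (assumption).
    static final short MAGIC_NUMBER = 0x4779;

    private CursorInfoLayout() {}

    /** Concatenates the header, the serialized metadata, and the compressed cursor payload. */
    static byte[] encode(byte[] metadataPayload, byte[] cursorInfoPayload) {
        ByteBuffer buf = ByteBuffer.allocate(
                Short.BYTES + Integer.BYTES + metadataPayload.length + cursorInfoPayload.length);
        buf.putShort(MAGIC_NUMBER);             // [MAGIC_NUMBER]
        buf.putInt(metadataPayload.length);     // [METADATA_SIZE]
        buf.put(metadataPayload);               // [METADATA_PAYLOAD]
        buf.put(cursorInfoPayload);             // [MANAGED_CURSOR_INFO_PAYLOAD]
        return buf.array();
    }
}
```

For example, encode(new byte[] {1, 2, 3}, new byte[] {9, 9}) yields 11 bytes: 2 for the magic number, 4 for the size, 3 of metadata and 2 of cursor payload.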
CursorInfo compression and decompression design
These compression types are already defined and implemented in Pulsar, so we
only need to handle the compression and decompression of the ManagedCursorInfo
data:
Get CursorInfo from the metadata store
We check the header of the cursor data: if it is compressed (it starts with
MAGIC_NUMBER), we parse the bytes using the compressed format; otherwise, we
parse them the original way.
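The header check can be sketched as follows. This is a hypothetical illustration, not Pulsar code: it assumes the same 2-byte big-endian MAGIC_NUMBER and 4-byte big-endian METADATA_SIZE as above, and CursorInfoReader and Parsed are invented names. Values without the magic number are returned untouched so they can be parsed the original way.

```java
import java.nio.ByteBuffer;

final class CursorInfoReader {
    // 0x4779 from the proposal, assumed to be stored as a 2-byte big-endian short.
    static final short MAGIC_NUMBER = 0x4779;
    static final int HEADER_SIZE = Short.BYTES + Integer.BYTES;

    /** Hypothetical result type holding the two parts of a stored value. */
    static final class Parsed {
        final boolean compressed;
        final byte[] metadata; // serialized ManagedCursorInfoMetadata (empty if uncompressed)
        final byte[] payload;  // compressed cursor bytes, or the original bytes

        Parsed(boolean compressed, byte[] metadata, byte[] payload) {
            this.compressed = compressed;
            this.metadata = metadata;
            this.payload = payload;
        }
    }

    static Parsed parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        // Too short for a header, or no magic number: fall back to the original format.
        if (data.length < HEADER_SIZE || buf.getShort() != MAGIC_NUMBER) {
            return new Parsed(false, new byte[0], data);
        }
        int metadataSize = buf.getInt();
        if (metadataSize < 0 || metadataSize > buf.remaining()) {
            throw new IllegalArgumentException("corrupt METADATA_SIZE: " + metadataSize);
        }
        byte[] metadata = new byte[metadataSize];
        buf.get(metadata);
        byte[] payload = new byte[buf.remaining()]; // everything after the metadata
        buf.get(payload);
        return new Parsed(true, metadata, payload);
    }
}
```

Assuming the uncompressed value is a binary-encoded protobuf message, it should not be mistaken for the header: a leading byte 0x47 would decode as a tag with wire type 7, which is not a valid protobuf wire type.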
Add/Update CursorInfo to the metadata store
If a compression type is specified, the cursor data is compressed by default
before it is written.
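The write side could look like the sketch below: when no compression type is configured the ManagedCursorInfo bytes are stored unchanged, otherwise they are compressed and the original length is kept for the uncompressedSize field. Pulsar's existing compression implementations would be used in practice; here the JDK's java.util.zip Deflater/Inflater (ZLIB format) stand in so the example runs without dependencies. CursorInfoCompression and its two-value Type enum are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CursorInfoCompression {
    // Hypothetical stand-in covering only the NONE and ZLIB cases.
    enum Type { NONE, ZLIB }

    /** Compresses only when a compression type is configured; otherwise keeps the bytes. */
    static byte[] maybeCompress(byte[] cursorInfo, Type type) {
        if (type == Type.NONE) {
            return cursorInfo;
        }
        Deflater deflater = new Deflater(); // default constructor emits the zlib format
        try {
            deflater.setInput(cursorInfo);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Decompresses and verifies the length recorded as uncompressedSize in the metadata. */
    static byte[] decompress(byte[] compressed, int uncompressedSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(uncompressedSize);
            byte[] chunk = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported ZLIB data");
                }
                out.write(chunk, 0, n);
            }
            byte[] result = out.toByteArray();
            if (result.length != uncompressedSize) {
                throw new DataFormatException("unexpected uncompressed size: " + result.length);
            }
            return result;
        } finally {
            inflater.end();
        }
    }
}
```

Storing uncompressedSize lets the reader pre-size the output buffer and detect a mismatched payload; LZ4, ZSTD and Snappy would follow the same pattern with their own codecs.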
CursorInfo compression type configuration
Add a managedCursorInfoCompressionType setting to
org.apache.pulsar.broker.ServiceConfiguration and
org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig.
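A hypothetical sketch of how the new setting might be carried from the broker configuration into the managed ledger factory configuration. Only the field name managedCursorInfoCompressionType comes from the proposal; BrokerConfig, FactoryConfig, the string representation and the "NONE" default are assumptions standing in for the real ServiceConfiguration and ManagedLedgerFactoryConfig.

```java
import java.util.Locale;

final class CompressionConfigSketch {
    // Stand-in for Pulsar's compression types listed in the proposal.
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    /** Stand-in for ServiceConfiguration (assumption: value kept as a string). */
    static final class BrokerConfig {
        String managedCursorInfoCompressionType = "NONE";
    }

    /** Stand-in for ManagedLedgerFactoryConfig. */
    static final class FactoryConfig {
        CompressionType managedCursorInfoCompressionType = CompressionType.NONE;
    }

    /** Copies the broker setting; unknown names throw IllegalArgumentException via valueOf. */
    static FactoryConfig toFactoryConfig(BrokerConfig broker) {
        FactoryConfig factory = new FactoryConfig();
        String value = broker.managedCursorInfoCompressionType;
        if (value != null && !value.isBlank()) {
            factory.managedCursorInfoCompressionType =
                    CompressionType.valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
        return factory;
    }
}
```

Leaving the setting at NONE keeps cursor data uncompressed, so compression only takes effect once a type is specified.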