Christian Kosmowski created KAFKA-9716:
------------------------------------------
Summary: Values of compression-rate and compression-rate-avg are
misleading
Key: KAFKA-9716
URL: https://issues.apache.org/jira/browse/KAFKA-9716
Project: Kafka
Issue Type: Bug
Components: clients, compression
Affects Versions: 2.4.1
Reporter: Christian Kosmowski
The values of the compression-rate and compression-rate-avg metrics, and
basically every other compression-rate metric (e.g. the per-topic compression
rate), are confusing.
They are calculated as follows:
{code:java}
if (numRecords == 0L) {
    buffer().position(initialPosition);
    builtRecords = MemoryRecords.EMPTY;
} else {
    if (magic > RecordBatch.MAGIC_VALUE_V1)
        this.actualCompressionRatio = (float) writeDefaultBatchHeader() / this.uncompressedRecordsSizeInBytes;
    else if (compressionType != CompressionType.NONE)
        this.actualCompressionRatio = (float) writeLegacyCompressedWrapperHeader() / this.uncompressedRecordsSizeInBytes;

    ByteBuffer buffer = buffer().duplicate();
    buffer.flip();
    buffer.position(initialPosition);
    builtRecords = MemoryRecords.readableRecords(buffer.slice());
}
{code}
Basically, the compressed size is divided by the uncompressed size, which leads
to a value well below 1 for high compression (good if you want compression) and
a value near or above 1 for poor compression (bad if you want compression).
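As a minimal sketch of that division (this is not Kafka's code; the class name, method name and byte counts are made up for illustration), the reported value behaves like this:

```java
public class ReportedCompressionRate {
    // Hypothetical helper mirroring the division in the snippet above:
    // compressed size divided by uncompressed size.
    public static float reportedRate(int compressedBytes, int uncompressedBytes) {
        return (float) compressedBytes / uncompressedBytes;
    }

    public static void main(String[] args) {
        // Illustrative sizes only, not measurements.
        System.out.println(reportedRate(250, 1000));   // compresses well: value below 1
        System.out.println(reportedRate(1100, 1000));  // compression added overhead: value above 1
    }
}
```

A smaller number therefore means better compression, which is the opposite of what the metric name suggests.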
From the name "compression rate" I would expect the exact opposite. Apart from
the fact that the word "rate" usually refers to comparisons between values of
different units (e.g. miles per hour), the correct word, "ratio", would refer to
the uncompressed size divided by the compressed size.
So if the compressed data takes half the space of the uncompressed data, the
correct value for the compression ratio (or rate) would be 2, not 0.5 as Kafka
reports it. That is really confusing, and I would AT LEAST expect this
behaviour to be documented somewhere, but it is not: all documentation sources
just say "the compression rate".
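To make the difference concrete, here is a small sketch that converts a value in the convention Kafka currently reports into the ratio I would expect. The class and method names are hypothetical and not part of any Kafka API.

```java
public class CompressionRatioConversion {
    // Input is the value as reported today: compressed size / uncompressed size.
    // Output is the conventional ratio: uncompressed size / compressed size.
    public static double conventionalRatio(double reportedRate) {
        if (reportedRate <= 0.0) {
            throw new IllegalArgumentException("reportedRate must be positive");
        }
        return 1.0 / reportedRate;
    }

    // Fraction of space saved, another common way to express the same thing.
    public static double spaceSaved(double reportedRate) {
        return 1.0 - reportedRate;
    }

    public static void main(String[] args) {
        double reported = 0.5; // compressed data takes half the space
        System.out.println(conventionalRatio(reported)); // prints 2.0
        System.out.println(spaceSaved(reported));        // prints 0.5
    }
}
```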
--
This message was sent by Atlassian Jira
(v8.3.4#803005)