Albert Strasheim created KAFKA-1449:
---------------------------------------

             Summary: Extend wire protocol to allow CRC32C
                 Key: KAFKA-1449
                 URL: https://issues.apache.org/jira/browse/KAFKA-1449
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
            Reporter: Albert Strasheim
            Assignee: Neha Narkhede
             Fix For: 0.9.0


Howdy

We are currently building out a number of Kafka consumers in Go, based on a 
patched version of the Sarama library that Shopify released a while back.

We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network 
and lots of cores. We have various consumers computing all kinds of aggregates 
on a reasonably high volume access log stream (1e6 messages/sec peak, about 
500-600 bytes per message uncompressed).

When we profiled our consumer, the single hottest function (until we disabled 
it) was the CRC32 checksum validation, because the deserialization and 
aggregation in these consumers are pretty cheap.

We believe things could be improved by extending the wire protocol to support 
CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its 
calculation.

https://en.wikipedia.org/wiki/SSE4#SSE4.2

It might be hard to use from Java, but consumers written in most other 
languages will benefit a lot.
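
As a rough illustration of what a Go consumer would do per message, here is a 
minimal sketch that computes both checksums with the standard hash/crc32 
package. The helper names (checksumIEEE, checksumCastagnoli, validate) and the 
useCastagnoli flag are made up for this example; they are not part of Sarama 
or of the Kafka wire protocol.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// castagnoliTable is built once and reused. hash/crc32 can use hardware
// support for this polynomial where the platform offers it.
var castagnoliTable = crc32.MakeTable(crc32.Castagnoli)

// checksumIEEE computes the CRC-32 with the IEEE polynomial.
func checksumIEEE(payload []byte) uint32 {
	return crc32.ChecksumIEEE(payload)
}

// checksumCastagnoli computes the CRC-32C (Castagnoli polynomial).
func checksumCastagnoli(payload []byte) uint32 {
	return crc32.Checksum(payload, castagnoliTable)
}

// validate reports whether payload matches the expected checksum.
// useCastagnoli is a hypothetical switch standing in for whatever the
// protocol extension would use to signal the checksum type.
func validate(payload []byte, want uint32, useCastagnoli bool) bool {
	if useCastagnoli {
		return checksumCastagnoli(payload) == want
	}
	return checksumIEEE(payload) == want
}

func main() {
	payload := []byte("123456789")
	fmt.Printf("CRC-32  (IEEE):       0x%08x\n", checksumIEEE(payload))
	fmt.Printf("CRC-32C (Castagnoli): 0x%08x\n", checksumCastagnoli(payload))
}
```

The two polynomials give different values for the same bytes, so a consumer 
has to know which one the producer used before it can validate a message.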

To give you an idea, here are some benchmarks for the Go CRC32 functions 
running on one core of an Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz:

BenchmarkCrc32KB             90196 ns/op    363.30 MB/s
BenchmarkCrcCastagnoli32KB    3404 ns/op   9624.42 MB/s

I believe the equivalent of BenchmarkCrc32KB written in C would do about 
600-700 MB/s, and the CRC-32C speed in C should be close to what Go achieves.

(Met Todd and Clark at the meetup last night. Thanks for the great 
presentation!)



