Albert Strasheim created KAFKA-1449:
---------------------------------------

             Summary: Extend wire protocol to allow CRC32C
                 Key: KAFKA-1449
                 URL: https://issues.apache.org/jira/browse/KAFKA-1449
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
            Reporter: Albert Strasheim
            Assignee: Neha Narkhede
             Fix For: 0.9.0

Howdy

We are currently building out a number of Kafka consumers in Go, based on a patched version of the Sarama library that Shopify released a while back.

We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network and lots of cores. We have various consumers computing all kinds of aggregates on a reasonably high volume access log stream (1e6 messages/sec peak, about 500-600 bytes per message uncompressed).

When profiling our consumer, our single hottest function (until we disabled it) was the CRC32 checksum validation, since the deserialization and aggregation in these consumers are pretty cheap.

We believe things could be improved by extending the wire protocol to support CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its calculation.

https://en.wikipedia.org/wiki/SSE4#SSE4.2

It might be hard to use from Java, but consumers written in most other languages will benefit a lot.

To give you an idea, here are some benchmarks for the Go CRC32 functions running on an Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz core:

BenchmarkCrc32KB               90196 ns/op     363.30 MB/s
BenchmarkCrcCastagnoli32KB      3404 ns/op    9624.42 MB/s

I believe BenchmarkCrc32 written in C would do about 600-700 MB/sec, and the CRC32-C speed should be close to what one achieves in Go.

(Met Todd and Clark at the meetup last night. Thanks for the great presentation!)