Perf producer/consumers for compacted topics
Hi there,

As noted in the 0.10.0.0-RC4 release thread, we (Heroku Kafka) have been doing extensive benchmarking of Kafka. In our case this is to help give customers a good idea of the performance of our various configurations. For this we orchestrate the Kafka `producer-perf.sh` and `consumer-perf.sh` across multiple machines, which was relatively easy to do and very successful (recently leading to a doc change and a good lesson about 0.10).

However, we're finding one thing missing from the current producer/consumer perf tests: there's no good perf testing on compacted topics. Some folk will undoubtedly use compacted topics, so it would be extremely helpful (I think) for the community to have benchmarks that test performance on compacted topics. We're interested in working on this and contributing it upstream, but are pretty unsure what such a test should look like. One straw proposal is to adapt the existing producer/consumer perf tests to work on a compacted topic, likely with additional flags on the producer that let you choose how wide a key range to emit, whether it should emit deletes (and how often to do so) and so on. Is there anything more we could or should do there?

We're happy writing the code here, and want to continue contributing back. I'd just love a hand thinking about what perf tests for compacted topics should look like.

Thanks

Tom Crayford
Heroku Kafka
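To make the straw proposal above concrete, here is a minimal sketch of the record stream such a producer flag set might drive. It only generates keys and values and does not talk to Kafka. The function name `compacted_workload` and its parameters (`key_range`, `delete_ratio`, `value_size`, `seed`) are hypothetical names chosen for illustration, not options of any existing perf tool.

```python
import random
from typing import Iterator, Optional, Tuple


def compacted_workload(
    num_records: int,
    key_range: int,
    delete_ratio: float,
    value_size: int = 100,
    seed: int = 0,
) -> Iterator[Tuple[bytes, Optional[bytes]]]:
    """Yield (key, value) pairs for a compacted-topic benchmark.

    A value of None stands for a tombstone (a delete for that key).
    All parameter names are illustrative assumptions.
    """
    if key_range <= 0:
        raise ValueError("key_range must be positive")
    if not 0.0 <= delete_ratio <= 1.0:
        raise ValueError("delete_ratio must be between 0 and 1")

    rng = random.Random(seed)      # seeded so runs are repeatable
    payload = b"x" * value_size    # fixed-size dummy payload

    for _ in range(num_records):
        # Keys are drawn uniformly from [0, key_range).
        key = b"key-%d" % rng.randrange(key_range)
        if rng.random() < delete_ratio:
            yield key, None        # tombstone
        else:
            yield key, payload
```

A narrow `key_range` gives the cleaner many duplicate keys to discard, while a wide one leaves most records live, so sweeping that parameter together with `delete_ratio` would be one way to cover the cases mentioned above.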
Re: Perf producer/consumers for compacted topics
Hi,

There is a kafka.tools.TestLogCleaning tool, which is used to stress test the compaction feature. It validates the correctness of the compaction process, and it could be improved for perf testing.

I think you want to benchmark the server-side compaction process. Currently we have only a few compaction-related metrics. We may need to add a few more topic-specific metrics for better analysis.

Log compaction related JMX metrics:

kafka.log:type=LogCleaner,name=cleaner-recopy-percent
kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
kafka.log:type=LogCleaner,name=max-clean-time-secs
kafka.log:type=LogCleanerManager,name=max-dirty-percent

Manikumar

On Tue, May 17, 2016 at 8:45 PM, Tom Crayford wrote:
> However, we're finding one thing missing from the current producer/consumer
> perf tests, which is that there's no good perf testing on compacted topics.
Re: Perf producer/consumers for compacted topics
Hi,

I'm interested in benchmarking the impact of compaction on producers and consumers and on long-term cluster stability. That's not *quite* the impact of it on the server side, but it certainly plays into it.

For example, I'd like to be able to answer: "In configuration X, if we write N messages into a compacted topic with a certain key range, a certain number of deletes etc., *then* replay that into a consumer that does nothing, how long does that consumer take?" What happens if we're continually running that compaction process, along with restarting consumers every hour or two, *and* producing a lot of messages? What happens to perf on compacted topics with different disk configurations (e.g. magnetic vs SSD, RAID vs JBOD)?

I certainly welcome some topic/partition-specific compaction metrics, and would be willing to contribute there.

Thanks

Tom

On Wed, May 18, 2016 at 1:32 PM, Manikumar Reddy wrote:
> I think you want to benchmark server side compaction process. Currently we
> have few compaction related metrics. We may need to add few more topic
> specific metrics for better analysis.
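When reasoning about the replay question (how much a do-nothing consumer has to read after N writes over a given key range), it can help to have an idealised model of what a fully compacted partition contains. The sketch below is only that: it keeps the latest record per key and optionally drops tombstones. The name `fully_compacted` is hypothetical, and the model ignores what a real broker does, where the cleaner works segment by segment, leaves the active segment uncleaned, is governed by settings such as `min.cleanable.dirty.ratio`, and retains tombstones for `delete.retention.ms`.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value) pair; a value of None stands for a tombstone.
Record = Tuple[bytes, Optional[bytes]]


def fully_compacted(
    log: Iterable[Record], drop_tombstones: bool = True
) -> List[Record]:
    """Return the records an idealised, fully compacted log would hold.

    Only the latest record per key survives, in original offset order.
    With drop_tombstones=True, keys whose latest record is a delete
    vanish entirely, as they eventually would on a real broker.
    """
    latest: Dict[bytes, Tuple[int, Optional[bytes]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones

    # Preserve log order by sorting survivors on their offset.
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [
        (key, value)
        for key, (_, value) in survivors
        if not (drop_tombstones and value is None)
    ]


log = [
    (b"a", b"1"),
    (b"b", b"1"),
    (b"a", b"2"),
    (b"b", None),   # delete for key b
    (b"c", b"3"),
]
print(len(fully_compacted(log)))  # 2 records left for a replaying consumer
```

Comparing the length of the raw log with the length of this idealised result gives a rough lower bound on what a replaying consumer would read; the gap between that bound and what a real cluster serves is one of the things such a benchmark would measure.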