[ https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Chen updated KAFKA-16283:
------------------------------
Description:

When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect data to be sent to all partitions in a round-robin manner. Instead, only half of the partitions receive any data, which wastes half of the resources (storage, consumers, ...).

{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server localhost:9092 --partitions 2
Created topic quickstart-events4.

# send 1000 records to the topic, expecting 500 records in partition 0 and 500 records in partition 1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 --record-size 1024 --throughput -1 --producer-props bootstrap.servers=localhost:9092 partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

# All 1000 records landed in partition 1
> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel       224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel      2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 00000000000000000000.index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 00000000000000000000.log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 00000000000000000000.timeindex
-rw-r--r--   1 lukchen  wheel         8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel        43  2 20 19:53 partition.metadata

# No records in partition 0 (the .log file is empty)
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel       224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel      2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 00000000000000000000.index
-rw-r--r--   1 lukchen  wheel         0  2 20 19:53 00000000000000000000.log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 00000000000000000000.timeindex
-rw-r--r--   1 lukchen  wheel         0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel        43  2 20 19:53 partition.metadata
{code}

Had a quick look: it happens because the producer aborts the first partition assignment whenever a new batch has to be created (`abortOnNewBatch`) and then calls the partitioner again for the same record. The round-robin counter therefore advances twice per record, so every other partition is skipped.


> RoundRobinPartitioner will only send to half of the partitions in a topic
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-16283
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16283
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Luke Chen
>            Priority: Major


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
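The double advance of the counter can be demonstrated outside the producer. Below is a minimal standalone sketch (a hypothetical demo class, not the actual Kafka producer code): it models the partitioner's per-topic counter and simulates the accumulator aborting the first assignment and asking for a partition a second time whenever a record opens a new batch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSkipDemo {
    public static void main(String[] args) {
        // Per-topic counter, as in RoundRobinPartitioner.
        AtomicInteger counter = new AtomicInteger(0);
        int numPartitions = 2;

        // Simulate 6 records where each record triggers a new batch, so the
        // producer aborts the first assignment (abortOnNewBatch) and invokes
        // the partitioner a second time for the same record.
        for (int record = 0; record < 6; record++) {
            int first  = counter.getAndIncrement() % numPartitions; // initial partition() call
            int actual = counter.getAndIncrement() % numPartitions; // re-assignment after abort
            System.out.println("record " + record + ": assigned=" + first + " re-assigned=" + actual);
        }
        // Every record ends up on partition 1; partition 0 stays empty.
    }
}
```

With 2 partitions the counter is always even on the first call and odd on the second, so the "real" assignment only ever hits partition 1, matching the empty quickstart-events4-0 log above.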