[ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
------------------------------
    Description: 
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions receive any data. This wastes half of the 
resources (storage, consumers, ...).
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 1000 records to the topic, expecting 500 records in partition 0 and 
# 500 records in partition 1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel       224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel      2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 00000000000000000000.index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 00000000000000000000.log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
00000000000000000000.timeindex
-rw-r--r--   1 lukchen  wheel         8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel        43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel       224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel      2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 00000000000000000000.index
-rw-r--r--   1 lukchen  wheel         0  2 20 19:53 00000000000000000000.log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
00000000000000000000.timeindex
-rw-r--r--   1 lukchen  wheel         0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel        43  2 20 19:53 partition.metadata
{code}
Tested in Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
issue, so it has likely existed for a long time.

 

Had a quick look: the cause is that we call abortOnNewBatch each time a new 
batch is created, which invokes the partitioner again for the same record.
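A hypothetical minimal model of that interaction (the class and method names below are illustrative, not Kafka's actual internals): the round-robin partitioner keeps a counter that advances on every partition() call, so if abortOnNewBatch causes partition() to be invoked a second time for the same record, the counter advances twice per record and only every other partition is ever used.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for RoundRobinPartitioner's counter logic:
// each partition() call advances a shared counter and maps it onto
// the number of partitions.
class RoundRobinCounter {
    private final AtomicInteger counter = new AtomicInteger(0);

    int partition(int numPartitions) {
        // getAndIncrement: counter advances on EVERY call
        return counter.getAndIncrement() % numPartitions;
    }
}

public class Demo {
    public static void main(String[] args) {
        RoundRobinCounter partitioner = new RoundRobinCounter();
        int numPartitions = 2;
        StringBuilder chosen = new StringBuilder();
        // Model of the send path when every record opens a new batch:
        // abortOnNewBatch discards the first result and the producer
        // calls partition() a second time for the same record.
        for (int record = 0; record < 4; record++) {
            int aborted = partitioner.partition(numPartitions); // thrown away
            int actual = partitioner.partition(numPartitions);  // actually used
            chosen.append(actual);
        }
        // Every record lands on the same (odd-indexed) partition,
        // matching the empty quickstart-events4-0 directory above.
        System.out.println(chosen);
    }
}
```

With two partitions the discarded call always consumes the even counter value, so the records only ever reach partition 1, which matches the empty log file observed for `quickstart-events4-0`.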



> RoundRobinPartitioner will only send to half of the partitions in a topic
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-16283
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16283
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.0, 3.6.1
>            Reporter: Luke Chen
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
