[
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174373#comment-14174373
]
Ewen Cheslack-Postava commented on KAFKA-1710:
----------------------------------------------
[~Bmis13] That approach just pushes the problem into KafkaAsyncProducer's
thread that processes messages. There won't be lock contention in
KafkaProducer since KafkaAsyncProducer will be its only user, but you may not
see a throughput improvement because you're ultimately limited by the CPU time
a single thread can get. It may even get *slower*: you'll have more runnable
threads at any given time, which means the KafkaAsyncProducer worker thread
gets less CPU time.

Even disregarding that, since you used a LinkedBlockingQueue, that queue
becomes your new source of contention (it must be synchronized internally).
With a very large capacity the producing threads can keep making progress and
contention stays low, since the time spent adding an item is very small, but
that costs a lot of memory because you're just adding a layer of buffering.
The extra buffering might be useful if you have bursty traffic (it temporarily
absorbs more data while the KafkaProducer works on getting it sent), but with
sustained traffic you'll just see constantly growing memory usage. If the
capacity is small, the producing threads will eventually block waiting for
space in the queue.
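To make the trade-off concrete, here is a minimal sketch of the wrapper pattern being discussed. AsyncWrapper and its details are hypothetical stand-ins for the proposed KafkaAsyncProducer: a counter increment stands in for KafkaProducer.send(), and the bounded LinkedBlockingQueue is exactly where the single-consumer ceiling and the new lock now live.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed async wrapper. A single worker thread
// drains a bounded LinkedBlockingQueue; the queue's internal lock becomes the
// new contention point, and the worker thread is the throughput ceiling.
public class AsyncWrapper {
    private static final String POISON = "__POISON__"; // shutdown sentinel

    private final BlockingQueue<String> queue;
    private final Thread worker;
    private final AtomicInteger delivered = new AtomicInteger();

    public AsyncWrapper(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    String record = queue.take(); // the only consumer
                    if (POISON.equals(record)) break;
                    delivered.incrementAndGet();  // stand-in for KafkaProducer.send()
                }
            } catch (InterruptedException ignored) { }
        });
        this.worker.start();
    }

    // With a small capacity, callers block here waiting for space: the
    // back-pressure scenario described above.
    public void send(String record) {
        try {
            queue.put(record);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Drains remaining records, stops the worker, returns the delivery count.
    public int close() {
        try {
            queue.put(POISON);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return delivered.get();
    }

    public static void main(String[] args) throws Exception {
        AsyncWrapper wrapper = new AsyncWrapper(16); // tiny buffer => callers block
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> { for (int i = 0; i < 1000; i++) wrapper.send("msg"); });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("delivered=" + wrapper.close());
    }
}
```

All 8000 sends still funnel through one queue and one worker, which is the point: the buffering layer moves the contention, it doesn't remove it.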
Probably the biggest issue here is that this test only writes to a single
partition in a single topic. You could improve performance by using more
partitions in that topic. You're already writing to all producers from all
threads, so you must not need the ordering guarantees of a single partition. If
you still want a single partition, you can improve performance by using more
Producers, which will spread the contention across more queues. Since you
already have 4 that you're running round-robin on, I'd guess adding more
shouldn't be a problem.
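The multiple-producers suggestion can be sketched as a simple round-robin dispatcher. RoundRobinSender is a hypothetical illustration, not existing Kafka code; a Consumer<String> callback stands in for each KafkaProducer instance, since each real producer would own its own internal queue and lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: spread sends round-robin across N independent
// producers, so each producer's internal lock sees roughly 1/N of the load.
public class RoundRobinSender {
    private final List<Consumer<String>> producers;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinSender(List<Consumer<String>> producers) {
        this.producers = producers;
    }

    // The only shared state between callers is the atomic counter, so adding
    // producers spreads the contention rather than concentrating it.
    public void send(String record) {
        int idx = (int) (counter.getAndIncrement() % producers.size());
        producers.get(idx).accept(record);
    }

    public static void main(String[] args) {
        int[] counts = new int[4];
        List<Consumer<String>> producers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            producers.add(record -> counts[id]++); // stand-in for one KafkaProducer
        }
        RoundRobinSender sender = new RoundRobinSender(producers);
        for (int i = 0; i < 100; i++) sender.send("msg-" + i);
        System.out.println(Arrays.toString(counts)); // even spread across producers
    }
}
```

Note this only helps when the records don't need the ordering guarantee of a single producer-partition pair, which appears to be the case in the test described.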
In any case, this use case seems a bit odd. Are you really going to have 200
threads generating messages *as fast as they can* with only 4 producers?
As far as this issue is concerned, the original report said the problem was
deadlock, but that doesn't seem to be the case. If you're just worried about
performance, it probably makes more sense to move the discussion over to the
mailing list. It will be seen by more people there, and you'll likely get
multiple suggestions for improving your approach before any changes to the
Kafka code are needed.
> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is
> being sent to single partition
> ------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Environment: Development
> Reporter: Bhavesh Mistry
> Priority: Critical
> Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png,
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump,
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump,
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test sending messages to a single partition for about 3
> minutes, I encounter an apparent deadlock (please see the attached
> screenshots) and thread contention in YourKit profiling.
> Use Case:
> 1) Aggregating messages into the same partition for metric counting.
> 2) Replicating the old producer's behavior of sticking to a partition for 3
> minutes.
> Here is the output:
> Frozen threads found (potential deadlock)
>
> It seems that the following threads have not changed their stack for more
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>
> pool-1-thread-128 <--- Frozen for at least 2m
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
>   org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
>   org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
>   java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
>   java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
>   org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
>   org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
>   java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
>   java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
>   org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
>   org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
>   java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
>   java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)