[ https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798916#comment-16798916 ]
Michal Turek commented on KAFKA-3539:
-------------------------------------

Hi [~spyridon.ninos], I have never said the fix would be simple, and I fully understand you. I saw this code while debugging this issue and other related ones, and I agree it will be challenging to fix the producer in a way that is both correct and backward compatible. Also, the original message I was commenting on spoke not about the "in progress" status but about the ticket being closed with no action. If you changed the "in progress" status to "open", I wouldn't say anything.

To describe our use case and the workarounds...

We have REST-like services with high request rates (thousands or tens of thousands of requests per second), implemented in an asynchronous way to be as efficient as possible. Blocking at any level would block the whole application, including the HTTP handler threads. If the threads are blocked, it is even impossible to return 500 Internal Server Error to the clients; there are only TCP/IP timeouts on the connections.

- We use a home-made {{class NonBlockingKafkaProducer<K, V> implements Producer<K, V>}} wrapper that writes to the standard producer through an ExecutorService which immediately rejects tasks when the producer blocks all the provided threads and the executor's queue is full (an async, non-blocking API). The configured buffer can absorb blocking that lasts at most a few seconds.

- Unrelated to this ticket: this wrapper is wrapped by a second one, {{PersistingKafkaProducer<K, V> implements Producer<K, V>}} (also home-made), which can persist failed events to local disk and re-produce them once Kafka is available again - for unexpected longer outages.

- There are also issues during *every* application startup. The Kafka producer reads topic metadata lazily on the first incoming message... and blocks the calling threads. Application startup is not a rare situation, and one can consider it critical for an application to behave correctly during its whole life cycle.
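The rejecting-executor pattern behind that wrapper can be sketched with plain {{java.util.concurrent}} primitives (the real {{NonBlockingKafkaProducer}} is not public, so all names here are illustrative, and a blocked {{producer.send()}} is simulated by a task that never finishes):

```java
import java.util.concurrent.*;

public class RejectingExecutorSketch {
    public static void main(String[] args) throws Exception {
        // One worker thread and a small bounded queue: if the (possibly
        // blocking) producer occupies the thread and the queue fills up,
        // further submissions fail immediately instead of blocking the caller.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy()); // reject, don't block

        CountDownLatch blockForever = new CountDownLatch(1);
        executor.execute(() -> {            // occupies the only worker thread,
            try { blockForever.await(); }   // simulating a blocked producer.send()
            catch (InterruptedException ignored) { }
        });

        // Fill the queue; these are accepted but cannot run while the worker is stuck.
        executor.execute(() -> { });
        executor.execute(() -> { });

        boolean rejected = false;
        try {
            executor.execute(() -> { });    // queue full -> immediate rejection
        } catch (RejectedExecutionException e) {
            rejected = true;                // caller can now fail fast (e.g. HTTP 503)
        }
        System.out.println("rejected=" + rejected);

        blockForever.countDown();
        executor.shutdownNow();
    }
}
```

The key design choice is {{AbortPolicy}}: the default would also reject, but the point is to avoid {{CallerRunsPolicy}} or an unbounded queue, either of which would reintroduce blocking or unbounded memory growth in the calling thread.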
The I/O operation to fetch the metadata takes some time, so the first N requests immediately block all N threads. We use the *Kafka consumer API* to get the list of topics and warm up the *producer* before binding the listening socket and making the instance active.

- In special cases we also manage and periodically update a white list of topics that exist in Kafka, based on the aforementioned consumer API, to prevent writes to non-existing topics. Each such write would again trigger a metadata update and block that thread. The white list doesn't cover deletion of a topic, but we typically don't delete.

- Are we really so special, or are these issues common across the whole industry? We have built this know-how about Kafka over the last five years - mainly during production issues. I remember we started with Kafka 0.8.1.something. My 10 cents...

> KafkaProducer.send() may block even though it returns the Future
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3539
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3539
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>            Reporter: Oleg Zhurakousky
>            Priority: Critical
>
> You can get more details from us...@kafka.apache.org by searching on the
> thread with the subject "KafkaProducer block on send".
> The bottom line is that a method that returns a Future must never block, since that
> essentially violates the Future contract: it was specifically designed to
> return immediately, passing control back to the user to check for completion,
> cancel, etc.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
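The Future contract the ticket describes can be illustrated with a minimal, hypothetical sketch (plain {{java.util.concurrent}}, no Kafka dependency): a {{send()}}-like method hands the potentially blocking work to an executor and returns the Future immediately, so the caller only blocks if and when it chooses to call {{get()}}.

```java
import java.util.concurrent.*;

public class NonBlockingSendSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // A send()-like method: the slow part (here a 200 ms sleep standing in
    // for a blocking metadata fetch) runs on the executor; the method itself
    // returns the Future without waiting.
    public Future<String> send(String record) {
        return executor.submit(() -> {
            Thread.sleep(200);          // simulated blocking I/O
            return "acked:" + record;
        });
    }

    public static void main(String[] args) throws Exception {
        NonBlockingSendSketch producer = new NonBlockingSendSketch();

        long start = System.nanoTime();
        Future<String> future = producer.send("hello");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // The call returned well before the 200 ms of simulated work finished.
        System.out.println("returnedImmediately=" + (elapsedMs < 100));
        System.out.println(future.get());   // blocks only because the caller asked to
        producer.executor.shutdown();
    }
}
```

This is the behaviour the reporter expects from {{KafkaProducer.send()}}; the bug is that the real producer may block inside the method itself (e.g. on the lazy metadata fetch) before the Future is ever returned.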