[ 
https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798916#comment-16798916
 ] 

Michal Turek commented on KAFKA-3539:
-------------------------------------

Hi [~spyridon.ninos],

I never said the fix would be simple, and I fully understand you. I saw this 
code while debugging this issue and other related ones, and I agree it will be 
challenging to fix the producer correctly and in a backward-compatible way. 
Also, the original message I was commenting on spoke not about the "In 
Progress" status but about closing the ticket with no action. If you changed 
the "In Progress" status to "Open", I wouldn't say anything.

To describe our use case and the workarounds... We have REST-like services 
with high request rates (thousands or tens of thousands of req/s), implemented 
asynchronously to be as efficient as possible. Blocking at any level would 
fully block the whole application, including the HTTP handler threads. If the 
threads are blocked, it's impossible even to return 500 Internal Server Error 
to the clients; all that remains are the TCP/IP timeouts of the connections.

- We use a home-made {{class NonBlockingKafkaProducer<K, V> implements 
Producer<K, V>}} wrapper that writes to the standard producer through an 
ExecutorService, which immediately rejects tasks if the producer blocks all 
the provided threads and the executor's queue is full (async, non-blocking 
API). The configured buffer can absorb blocking that lasts at most a few 
seconds.
- Unrelated to this ticket: this wrapper is wrapped by a second one, 
{{PersistingKafkaProducer<K, V> implements Producer<K, V>}} (also home-made), 
that can persist failed events to a local disk and re-produce them after 
Kafka becomes available again - for unexpected longer outages.
- There are also issues during *every* application startup. The Kafka producer 
reads topic metadata lazily on the first incoming message... and blocks the 
calling threads. An application startup is not a rare situation, and one can 
consider it critical to behave correctly during the whole process life cycle. 
The I/O operation to fetch the metadata takes some time, so the first N 
requests immediately block all N threads. We use the *Kafka consumer API* to 
get the list of topics and warm up the *producer* before binding the listening 
socket and making the instance active.
- In special cases we also manage and periodically update a white list of 
topics that exist in Kafka, based on the aforementioned consumer API, to 
prevent writes to non-existing ones. Each such write would again trigger a 
metadata update and block the thread. The white list doesn't handle deletion 
of a topic, but we typically don't delete topics.
- Are we really so special, or are these issues common across the whole 
industry? We built this know-how about Kafka over the last five years - mainly 
during production issues. I remember we started with Kafka 0.8.1.something.
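
The "reject instead of block" wrapper idea above can be sketched roughly as 
follows. This is a minimal illustration, not the actual 
NonBlockingKafkaProducer; the {{Callable}} parameter stands in for the real 
{{Producer.send()}} call, which may block on metadata or a full buffer:

```java
import java.util.concurrent.*;

// Minimal sketch of a send path that never blocks the caller: work goes
// through a bounded executor that rejects tasks when the buffer is full.
// NonBlockingSender and its Callable are illustrative stand-ins.
public class NonBlockingSender {
    private final ExecutorService executor = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1_000),        // bounded in-memory buffer
            new ThreadPoolExecutor.AbortPolicy());  // reject when full, never block

    // Returns a Future immediately. When both worker threads are busy and
    // the queue is full, submit() throws RejectedExecutionException, so an
    // HTTP handler thread can answer 500 instead of hanging until a TCP
    // timeout.
    public <T> Future<T> send(Callable<T> blockingSend) {
        return executor.submit(blockingSend);
    }

    public static void main(String[] args) throws Exception {
        NonBlockingSender sender = new NonBlockingSender();
        Future<String> f = sender.send(() -> "ack");  // returns immediately
        System.out.println(f.get());
        sender.executor.shutdown();
    }
}
```

The important choice is {{AbortPolicy}}: the default caller-runs style of 
degradation would move the blocking back onto the request thread, which is 
exactly what the wrapper exists to avoid.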
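
The warm-up and white-list bullets above can be sketched like this. In the 
real services the metadata source would be backed by the Kafka consumer API 
({{KafkaConsumer.listTopics()}}); here a plain {{Supplier}} keeps the sketch 
self-contained, and {{TopicWhitelist}} is an illustrative name:

```java
import java.util.*;
import java.util.function.Supplier;

// Sketch of the startup warm-up / periodically refreshed topic white list.
// The Supplier stands in for a Kafka metadata fetch.
public class TopicWhitelist {
    private volatile Set<String> known = Collections.emptySet();
    private final Supplier<Set<String>> metadataSource;

    public TopicWhitelist(Supplier<Set<String>> metadataSource) {
        this.metadataSource = metadataSource;
    }

    // Called once before binding the listening socket (warm-up), then
    // periodically. The blocking metadata I/O happens here, off the
    // request path.
    public void refresh() {
        known = Set.copyOf(metadataSource.get());
    }

    // Cheap in-memory check on the hot path; a write to an unknown topic
    // can be failed fast instead of triggering a blocking metadata update.
    public boolean canSendTo(String topic) {
        return known.contains(topic);
    }

    public static void main(String[] args) {
        TopicWhitelist wl = new TopicWhitelist(() -> Set.of("events", "audit"));
        wl.refresh();  // warm-up before accepting traffic
        System.out.println(wl.canSendTo("events"));   // true
        System.out.println(wl.canSendTo("deleted"));  // false
    }
}
```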

My 10 cents...

> KafkaProducer.send() may block even though it returns the Future
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3539
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3539
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>            Reporter: Oleg Zhurakousky
>            Priority: Critical
>
> You can get more details from the us...@kafka.apache.org by searching on the 
> thread with the subject "KafkaProducer block on send".
> The bottom line is that a method that returns a Future must never block, 
> since that essentially violates the Future contract: it was specifically 
> designed to return immediately, passing control back to the user to check 
> for completion, cancel, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
