[
https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798916#comment-16798916
]
Michal Turek commented on KAFKA-3539:
-------------------------------------
Hi [~spyridon.ninos],
I never said the fix would be simple, and I fully understand you. I saw this
code while debugging this issue and other related ones, and I agree it will be
challenging to fix the producer correctly while staying backward compatible.
Also, the original message I was commenting on spoke not about the "in
progress" status but about closing the ticket with no action. If you changed
the status from "in progress" to "open", I wouldn't object.
To describe our use case and the workarounds... We have REST-like services with
high request rates (thousands to tens of thousands of req/s), implemented in an
async way to be as efficient as possible. Blocking at any level would
eventually block the whole application, including the HTTP handler threads.
Once those threads are blocked, it's impossible even to return 500 Internal
Server Error to the clients; all that remains are TCP/IP connection timeouts.
- We use a home-made {{class NonBlockingKafkaProducer<K, V> implements
Producer<K, V>}} wrapper that writes to the standard producer through an
ExecutorService, which immediately rejects tasks if the producer blocks all the
provided threads and the executor's queue is full (async, non-blocking API).
The configured buffer can absorb blocking that lasts at most a few seconds.
- Unrelated to this ticket: this wrapper is wrapped by a second home-made one,
{{PersistingKafkaProducer<K, V> implements Producer<K, V>}}, which can persist
failed events to local disk and re-produce them once Kafka is available again -
for unexpectedly long outages.
- There are also issues during *every* application startup. The Kafka producer
reads topic metadata lazily on the first incoming message... and blocks the
calling threads. An application startup is not a rare situation, and one can
consider it critical to behave correctly during the whole process life cycle.
The I/O operation to fetch the metadata takes some time, so the first N
requests immediately block all N threads. We use the *Kafka consumer API* to
get the list of topics and warm up the *producer* before binding the listening
socket and making the instance active.
- In special cases we also maintain and periodically update a white list of
topics that exist in Kafka, based on the aforementioned consumer API, to
prevent writes to non-existing ones. Each such write would again trigger a
metadata update and block the calling thread. The white list doesn't cover
deletion of a topic, but we typically don't delete.
- Are we really so special, or are these issues common across the whole
industry? We built this know-how about Kafka over the last five years - mainly
during production issues. I remember we started with Kafka 0.8.1.something.
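The executor-based wrapper from the first bullet can be sketched roughly as
below, using only the JDK. This is not the real NonBlockingKafkaProducer - the
class name, the Consumer stand-in for KafkaProducer.send(), and the pool sizes
are all illustrative assumptions; only the bounded-queue-with-rejection idea is
from the comment.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch: the potentially blocking send runs on a bounded executor; when
// all workers are busy and the queue is full, the task is rejected
// immediately instead of blocking the caller.
public class NonBlockingSender<T> {
    private final ExecutorService executor;

    public NonBlockingSender(int threads, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded buffer
                new ThreadPoolExecutor.AbortPolicy());    // reject, never block
    }

    // blockingSend stands in for the real (possibly blocking) producer call.
    public CompletableFuture<Void> send(T record, Consumer<T> blockingSend) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        try {
            executor.execute(() -> {
                try {
                    blockingSend.accept(record);
                    result.complete(null);
                } catch (RuntimeException e) {
                    result.completeExceptionally(e);
                }
            });
        } catch (RejectedExecutionException e) {
            // Saturated: fail fast so HTTP handler threads stay responsive
            // and can still return 500 to the client.
            result.completeExceptionally(e);
        }
        return result;
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

The AbortPolicy is the key choice: the default CallerRunsPolicy would move the
blocking back onto the caller, which is exactly what the wrapper exists to
avoid.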
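The persist-and-replay idea from the second bullet could look roughly like
this. Again a sketch, not the real PersistingKafkaProducer: the class name, the
one-event-per-line file format, and the Consumer stand-in for re-producing are
assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

// Sketch: events that fail to reach Kafka are appended to a local spill
// file and re-produced once the cluster is reachable again.
public class DiskSpill {
    private final Path spillFile;

    public DiskSpill(Path spillFile) {
        this.spillFile = spillFile;
    }

    // Called from the send callback when a produce has failed.
    public synchronized void persist(String event) throws IOException {
        Files.write(spillFile, List.of(event), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Called periodically once Kafka is back: re-produces spilled events via
    // the given sender (a stand-in for the real producer) and removes the
    // file on success. Returns the number of replayed events.
    public synchronized int replay(Consumer<String> sender) throws IOException {
        if (!Files.exists(spillFile)) {
            return 0;
        }
        List<String> events = Files.readAllLines(spillFile, StandardCharsets.UTF_8);
        events.forEach(sender);
        Files.delete(spillFile);
        return events.size();
    }
}
```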
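The startup warm-up from the third bullet boils down to "do the blocking
metadata fetch before accepting traffic". A minimal JDK-only sketch, where the
injected Runnable stands in for the real warm-up (in our setup, listing topics
via the consumer API and touching the producer's metadata for each one):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: run the blocking metadata warm-up on a background thread and let
// startup wait for it before binding the listening socket, so the first N
// requests never pile up behind a lazy metadata fetch.
public class WarmupGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    public void warmUpAsync(Runnable metadataWarmup) {
        Thread t = new Thread(() -> {
            metadataWarmup.run(); // e.g. fetch metadata for every topic
            ready.countDown();    // only now may the instance go active
        });
        t.setDaemon(true);
        t.start();
    }

    // Blocks the startup sequence (not request threads) until warm-up is done.
    public boolean awaitReady(long timeout, TimeUnit unit)
            throws InterruptedException {
        return ready.await(timeout, unit);
    }
}
```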
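And the white list from the fourth bullet could be sketched as a periodically
refreshed snapshot consulted before each send. The topic lister is injected
here for illustration; in our real setup it is backed by the consumer API.
Names and the refresh mechanism are assumptions.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: a periodically refreshed set of topics known to exist, checked
// before each send so writes to unknown topics fail fast instead of
// triggering a blocking metadata update in the producer.
public class TopicWhitelist {
    private volatile Set<String> topics = Set.of();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public TopicWhitelist(Supplier<Set<String>> topicLister, long periodSeconds) {
        Runnable refresh = () -> topics = Set.copyOf(topicLister.get());
        refresh.run(); // initial synchronous load at startup
        scheduler.scheduleAtFixedRate(refresh, periodSeconds, periodSeconds,
                TimeUnit.SECONDS);
    }

    public boolean mayWriteTo(String topic) {
        return topics.contains(topic);
    }

    public void shutdown() {
        scheduler.shutdown();
    }
}
```

As noted above, this doesn't cover topic deletion between refreshes; that gap
is acceptable when topics are effectively never deleted.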
My 10 cents...
> KafkaProducer.send() may block even though it returns the Future
> ----------------------------------------------------------------
>
> Key: KAFKA-3539
> URL: https://issues.apache.org/jira/browse/KAFKA-3539
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Reporter: Oleg Zhurakousky
> Priority: Critical
>
> You can get more details from the [email protected] by searching on the
> thread with the subject "KafkaProducer block on send".
> The bottom line is that a method that returns a Future must never block,
> since blocking essentially violates the Future contract: it was specifically
> designed to return immediately, passing control back to the user to check for
> completion, cancellation, etc.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)