Re: Few observations related to KafkaSpout implementation (1.1.0)

Bobby Evans Mon, 10 Jul 2017 11:36:57 -0700

I'm not sure what assumptions you want to make that this is preventing, or why 
they would be helpful.

- Bobby

On Monday, July 10, 2017, 12:14:53 PM CDT, chandan singh <cks07...@gmail.com>
wrote:

Hi Stig & Bobby

Thanks for confirming my understanding.

1) Ensuring that calls to nextTuple(), ack() and fail() are non-blocking
has long been a guideline on http://storm.apache.org/releases/1.1.0/Concepts.html.
Copying verbatim here: "The main method on spouts is nextTuple.
nextTuple either emits a new tuple into the topology or simply returns if
there are no new tuples to emit. It is imperative that nextTuple does not
block for any spout implementation, because Storm calls all the spout
methods on the same thread." I admit that my interpretation may be
partially incorrect, but I have been following it in a custom spout so far.
Even though the objective is different, there is a similar hint in the
official Kafka documentation. Please see the heading "2. Decouple
Consumption and Processing" at
https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html.
Essentially, a separate thread polls Kafka and the spout thread gets the
messages through a shared queue. If Kafka already pre-fetches (I will read
about it further), I assume we do not have to fetch in another thread, but
I am not sure how pre-fetching behaves when we re-seek before every poll.
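
To make the shared-queue idea concrete, here is a rough sketch of what I
mean. It is not how KafkaSpout works, and all names in it (DecoupledFetcher,
nextOrNull, the Supplier standing in for a call to consumer.poll()) are
hypothetical. The fetcher thread is the only one that touches the source;
the spout thread only does a non-blocking poll on the buffer.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical sketch: a fetcher thread fills a bounded buffer for the spout thread. */
final class DecoupledFetcher<T> implements AutoCloseable {
    private final BlockingQueue<T> buffer;
    private final Thread fetcher;
    private volatile boolean running = true;

    /** source stands in for a blocking fetch such as consumer.poll(timeout). */
    DecoupledFetcher(Supplier<List<T>> source, int capacity) {
        this.buffer = new LinkedBlockingQueue<>(capacity);
        this.fetcher = new Thread(() -> {
            try {
                while (running) {
                    for (T record : source.get()) {
                        // put() blocks while the buffer is full, so only the
                        // fetcher thread is slowed down, never the spout thread.
                        buffer.put(record);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutting down
            }
        }, "fetcher");
        this.fetcher.setDaemon(true);
        this.fetcher.start();
    }

    /** What nextTuple() would call: returns a record or null, never blocks. */
    T nextOrNull() {
        return buffer.poll();
    }

    @Override
    public void close() {
        running = false;
        fetcher.interrupt();
        try {
            fetcher.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a real KafkaConsumer, which is not safe for multi-threaded access,
every consumer call (poll, seek, commit) would have to stay on the fetcher
thread, so acks and fails from the spout thread would need to be handed over
through a second queue.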

2) @Bobby, you are correct in pointing out what needs to be optimized, but
the facts sometimes prevent us from making assumptions. We do optimize our
retry loop so that we don't poll the messages again. I especially see
problems when re-polling is combined with exponential backoff. I am not
sure how difficult or clean it would be to expose some sort of
configuration to allow such an optimization. Do you think it would be worth
trying something out?
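
For illustration, the retry optimization I have in mind looks roughly like
the sketch below. It is only an assumption about how a custom spout could do
it, not existing KafkaSpout code, and RetryBuffer, scheduleRetry and
pollReady are hypothetical names. Failed records stay in memory, ordered by
the time their backoff expires, so a retry does not need another fetch from
Kafka.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: in-memory retry buffer with capped exponential backoff. */
final class RetryBuffer<T> {
    private static final class Entry<T> {
        final T record;
        final long readyAtMs;

        Entry(T record, long readyAtMs) {
            this.record = record;
            this.readyAtMs = readyAtMs;
        }
    }

    // Earliest-ready entry first. No locking: Storm calls the spout methods
    // on a single thread.
    private final PriorityQueue<Entry<T>> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.readyAtMs, b.readyAtMs));
    private final long initialDelayMs;
    private final long maxDelayMs;

    RetryBuffer(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Delay is initialDelayMs * 2^(failures - 1), capped at maxDelayMs. */
    long backoffMs(int failures) {
        int shift = Math.min(Math.max(failures - 1, 0), 62);
        // Compare before shifting so the multiplication cannot overflow.
        if (initialDelayMs > (maxDelayMs >> shift)) {
            return maxDelayMs;
        }
        return initialDelayMs << shift;
    }

    /** Called from fail(): keep the record and schedule its next attempt. */
    void scheduleRetry(T record, int failures, long nowMs) {
        queue.add(new Entry<>(record, nowMs + backoffMs(failures)));
    }

    /** Called from nextTuple(): a record whose backoff has elapsed, or null. */
    T pollReady(long nowMs) {
        Entry<T> head = queue.peek();
        if (head == null || head.readyAtMs > nowMs) {
            return null;
        }
        return queue.poll().record;
    }
}
```

The cost is memory: every failed record is held until it is acked or given
up on, so the buffer would need some upper bound in practice.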

Thanks
Chandan
