Esko Suomi created KAFKA-519:
--------------------------------

             Summary: Allow committing the state of a single KafkaStream
                 Key: KAFKA-519
                 URL: https://issues.apache.org/jira/browse/KAFKA-519
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 0.7.1, 0.7
            Reporter: Esko Suomi


We currently consume multiple topics through ZooKeeper (ZK) by first acquiring a
ConsumerConnector and then fetching message streams for the wanted topics. Once
the messages have been consumed, the current consumption state is committed
with the method ConsumerConnector#commitOffsets().

This scheme has a flaw when the consuming application acts as a kind of
data-piping proxy rather than the final consuming sink. In our case we read data
from Kafka, repackage it, and only then move it to persistent storage. The
repackaging step is relatively long-running and may span several hours (it
usually takes a few minutes). On top of that, topic throughputs are highly
asymmetric: one of our topics receives about 80% of the total throughput, and we
have about 20 topics in total. As an unwanted side effect, committing offsets
whenever the persistence step for one topic completes also commits offsets for
all the other topics, whose messages may not have been persisted yet. If the
consuming application, or the machine it runs on, then crashes, those messages
are lost.
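To make the failure mode concrete, here is a minimal in-memory model. It is a
hypothetical sketch, not Kafka's API: the class GlobalCommitModel and its
methods are invented for illustration, and positions are plain message counts
rather than real Kafka offsets. The private commitAll() stands in for the
connector-wide ConsumerConnector#commitOffsets(): persisting one topic commits
the consumed position of every topic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Kafka's API): a single connector-wide commit records
 * the consumed position of EVERY topic, including topics whose repackaged data
 * has not reached persistent storage yet. Positions are message counts.
 */
public class GlobalCommitModel {
    // Messages handed to the application so far, per topic.
    private final Map<String, Long> consumed = new HashMap<>();
    // Messages actually written to persistent storage, per topic.
    private final Map<String, Long> persisted = new HashMap<>();
    // Committed positions: where a restarted consumer would resume, per topic.
    private final Map<String, Long> committed = new HashMap<>();

    public void consume(String topic, long count) {
        consumed.put(topic, count);
    }

    /** One topic's repackaging/persistence step finished; then we commit. */
    public void persist(String topic, long count) {
        persisted.put(topic, count);
        commitAll(); // stands in for ConsumerConnector#commitOffsets()
    }

    // Copies the consumed position of all topics, not just the persisted one.
    private void commitAll() {
        committed.putAll(consumed);
    }

    /** Messages per topic that are committed but not persisted (lost on crash). */
    public Map<String, Long> lostOnCrash() {
        Map<String, Long> lost = new HashMap<>();
        for (Map.Entry<String, Long> e : committed.entrySet()) {
            long safe = persisted.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > safe) {
                lost.put(e.getKey(), e.getValue() - safe);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        GlobalCommitModel m = new GlobalCommitModel();
        m.consume("busy-topic", 100); // high-throughput topic
        m.consume("slow-topic", 10);  // still being repackaged
        m.persist("busy-topic", 100); // also commits slow-topic's 10 messages
        System.out.println(m.lostOnCrash()); // prints {slow-topic=10}
    }
}
```

The busy topic is safe because its commit matches what was persisted; the slow
topic's 10 messages were committed without being persisted, so a restart would
skip past them.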

While this data loss can be mitigated to some extent, for example with local
temporary storage, it would be cleaner if KafkaStream itself allowed offsets to
be committed at the partition level.
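A rough sketch of the requested behaviour follows. Again this is hypothetical:
PartitionCommitModel, commit(topic, partition, position) and resumeFrom(...)
are illustrative names, not an existing Kafka interface, and positions are
message counts. The point is only that a commit is scoped to one topic
partition, so persisting one topic never advances another topic's committed
position.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not an existing Kafka API): offsets are committed per
 * topic partition, so committing one partition leaves all others untouched.
 */
public class PartitionCommitModel {
    // Committed position (messages safely persisted) per "topic/partition" key.
    private final Map<String, Long> committed = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "/" + partition;
    }

    /** Commit only this partition, e.g. right after its data is persisted. */
    public void commit(String topic, int partition, long position) {
        committed.put(key(topic, partition), position);
    }

    /** Where a restarted consumer would resume reading this partition. */
    public long resumeFrom(String topic, int partition) {
        return committed.getOrDefault(key(topic, partition), 0L);
    }

    public static void main(String[] args) {
        PartitionCommitModel m = new PartitionCommitModel();
        // busy-topic partition 0: 100 messages persisted, so commit them.
        m.commit("busy-topic", 0, 100);
        // slow-topic partition 0 is still being repackaged: nothing committed.
        System.out.println(m.resumeFrom("busy-topic", 0)); // prints 100
        System.out.println(m.resumeFrom("slow-topic", 0)); // prints 0
    }
}
```

After a crash, the slow topic is simply re-read from its last committed
position, so its in-flight messages are delivered again instead of lost.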

