Hello Guys,

Any insights on this??
If I'm not clear enough, my question is: how can I use the Kafka consumer and
not lose any data in case of failures with Spark Streaming?
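
For context, here is a minimal sketch of the kind of setup in question, assuming
Spark 1.2+ and the receiver-based Kafka stream; the ZooKeeper quorum, topic,
consumer group, and checkpoint path below are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHBaseJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-hbase")
      // Spark 1.2 option: log received blocks to the checkpoint directory
      // before they are acknowledged, so a crashed receiver can be replayed.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/kafka-to-hbase-checkpoints") // placeholder path

    // Receiver-based Kafka stream (placeholder ZK quorum, group id and topic).
    val stream = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",
      "hbase-writer",
      Map("events" -> 1),
      StorageLevel.MEMORY_AND_DISK_SER) // single copy; durability comes from the WAL

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // the HBase write (one connection per partition) goes here
        records.foreach(record => ())
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

From what I can tell, with that writeAheadLog option the received blocks are
persisted under the checkpoint directory before they are processed, so a
receiver crash should not drop data that was already received, but I'd
appreciate confirmation from anyone who has run this in production.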

On Tue, Dec 9, 2014 at 2:53 PM, Mukesh Jha <me.mukesh....@gmail.com> wrote:

> Hello Experts,
>
> I'm working on a Spark app which reads data from Kafka & persists it in
> HBase.
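>
> Roughly, the write path looks like the sketch below (the HBase table name,
> column family and row-key choice are placeholders):
>
> import org.apache.hadoop.hbase.HBaseConfiguration
> import org.apache.hadoop.hbase.client.{HTable, Put}
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.spark.streaming.dstream.DStream
>
> object HBaseSink {
>   // `stream` is the (key, value) DStream returned by KafkaUtils.createStream.
>   def persist(stream: DStream[(String, String)]): Unit = {
>     stream.foreachRDD { rdd =>
>       rdd.foreachPartition { records =>
>         // one HBase connection per partition, not per record
>         val table = new HTable(HBaseConfiguration.create(), "events")
>         records.foreach { case (key, value) =>
>           val put = new Put(Bytes.toBytes(key)) // assumes messages carry a non-null key
>           put.add(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(value))
>           table.put(put)
>         }
>         table.close()
>       }
>     }
>   }
> }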
>
> The Spark documentation states below *[1]* that in case of a worker failure
> we can lose some data. If that is the case, how can I make my Kafka stream
> more reliable? I have seen that there is a simple consumer *[2]*, but I'm not
> sure if it has been used/tested extensively.
>
> I was wondering if there is a way to explicitly acknowledge the Kafka
> offsets once they are replicated in the memory of the other worker nodes (if
> it's not already done) to tackle this issue.
>
> Any help is appreciated in advance.
>
>
>    1. *Using any input source that receives data through a network* - For
>    network-based data sources like *Kafka* and Flume, the received input
>    data is replicated in memory between nodes of the cluster (default
>    replication factor is 2). So if a worker node fails, then the system can
>    recompute the lost data from the leftover copy of the input data. However,
>    if the *worker node where a network receiver was running fails, then a
>    tiny bit of data may be lost*, that is, the data received by the
>    system but not yet replicated to other node(s). The receiver will be
>    started on a different node and it will continue to receive data.
>    2. https://github.com/dibbhatt/kafka-spark-consumer
>
> Txz,
>
> *Mukesh Jha <me.mukesh....@gmail.com>*
>



-- 


Thanks & Regards,

*Mukesh Jha <me.mukesh....@gmail.com>*
