Re: question on Write Ahead Log (Spark Streaming)

2017-03-10 Thread Dibyendu Bhattacharya
Hi,

You could also use this Receiver :
https://github.com/dibbhatt/kafka-spark-consumer

This is part of spark-packages also :
https://spark-packages.org/package/dibbhatt/kafka-spark-consumer

With this receiver you do not need to enable the WAL, and you can still
recover from driver failure with no data loss. You can refer to
https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md for
more details, or reach out to me.

Regards,
Dibyendu


On Wed, Mar 8, 2017 at 8:58 AM, kant kodali  wrote:



Re: question on Write Ahead Log (Spark Streaming)

2017-03-08 Thread Saisai Shao
IIUC, your scenario is quite similar to what ReliableKafkaReceiver
currently does: you can only send the ack to the upstream source after the
WAL write has been persisted. Otherwise, because data receiving and data
processing are asynchronous, there is still a chance of data loss if you
send the ack before the WAL write completes.

You could refer to ReliableKafkaReceiver.
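
The ordering described above (persist to the WAL first, ack second) can be
sketched in plain Scala, independent of the actual Spark receiver API.
`WalStore` and `Upstream` here are hypothetical stand-ins for the durable
log and the replayable source, not Spark classes:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a durable write-ahead log. persist() returns
// only once the batch is recorded, mirroring the blocking, multi-record
// store() call a reliable receiver relies on.
class WalStore {
  private val log = ArrayBuffer.empty[Seq[String]]
  def persist(batch: Seq[String]): Unit = synchronized { log += batch }
  def contents: Seq[Seq[String]] = synchronized { log.toSeq }
}

// Hypothetical upstream source that resends any batch not yet acked.
class Upstream(batches: Seq[Seq[String]]) {
  private var acked = 0
  def next(): Option[Seq[String]] =
    if (acked < batches.length) Some(batches(acked)) else None
  def ack(): Unit = acked += 1
  def pending: Int = batches.length - acked
}

object ReliableReceiveLoop {
  // Receive loop: ack ONLY after the WAL write returns. If the process
  // dies between persist() and ack(), the source resends that batch, so
  // the worst case is a duplicate, never a loss. Acking before persist()
  // would reopen the loss window the reply describes.
  def run(source: Upstream, wal: WalStore): Unit = {
    var batch = source.next()
    while (batch.isDefined) {
      wal.persist(batch.get) // 1. durable first
      source.ack()           // 2. ack second
      batch = source.next()
    }
  }
}
```

In the real ReliableKafkaReceiver the same effect comes from calling the
synchronous store(ArrayBuffer) variant and committing offsets only after it
returns.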

On Thu, Mar 9, 2017 at 12:58 AM, kant kodali  wrote:



question on Write Ahead Log (Spark Streaming)

2017-03-08 Thread kant kodali
Hi All,

I am using a receiver-based approach. I understand that the Spark Streaming
APIs convert the data received from the receiver into blocks, and that
these in-memory blocks are also stored in the WAL if it is enabled. My
upstream source, which is not Kafka, can also replay data: if I don't send
an ack, it resends. So I don't have to write the received data to the WAL
myself; however, I still need to enable the WAL, correct? Because there are
blocks in memory that need to be written to the WAL so they can be
recovered later.

Thanks,
kant
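

Enabling the receiver WAL is a configuration change rather than code in the
receiver itself. A minimal sketch follows; the app name and checkpoint path
are placeholders, and the config key and checkpoint requirement are as
documented in the Spark Streaming guide:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-example")
  // Write received blocks to the WAL before acknowledging them to the
  // block manager; required for zero data loss on driver failure with
  // receiver-based streams.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// The WAL lives under the checkpoint directory, so a fault-tolerant
// checkpoint location (e.g. on HDFS) must be set.
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")
```

Since the WAL already makes the data durable, the input stream's storage
level can usually drop replication (e.g. StorageLevel.MEMORY_AND_DISK_SER)
to avoid storing the data twice.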