Re: question on Write Ahead Log (Spark Streaming)
Hi,

You could also use this receiver: https://github.com/dibbhatt/kafka-spark-consumer

It is also available on spark-packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer

With this receiver you do not need to enable the WAL, and you can still recover from driver failure with no data loss. You can refer to https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md for more details, or reach out to me.

Regards,
Dibyendu

On Wed, Mar 8, 2017 at 8:58 AM, kant kodali wrote:
> Hi All,
>
> I am using a Receiver-based approach. I understand that the Spark
> Streaming APIs convert the data received from a receiver into blocks,
> and that these in-memory blocks are also stored in the WAL if it is
> enabled. My upstream source, which is not Kafka, can also replay: if I
> don't send an ack, it will resend the data, so I don't have to write
> the received data to the WAL myself. However, I still need to enable
> the WAL, correct? Because there are blocks in memory that need to be
> written to the WAL so they can be recovered later.
>
> Thanks,
> kant
Re: question on Write Ahead Log (Spark Streaming)
IIUC, your scenario is quite like what ReliableKafkaReceiver currently does. You can only send the ack to the upstream source after the WAL write has been persisted; otherwise, because data receiving and data processing are asynchronous, there is still a chance data could be lost if you send the ack before the WAL write completes. You could refer to ReliableKafkaReceiver.

On Thu, Mar 9, 2017 at 12:58 AM, kant kodali wrote:
> Hi All,
>
> I am using a Receiver-based approach. I understand that the Spark
> Streaming APIs convert the data received from a receiver into blocks,
> and that these in-memory blocks are also stored in the WAL if it is
> enabled. My upstream source, which is not Kafka, can also replay: if I
> don't send an ack, it will resend the data, so I don't have to write
> the received data to the WAL myself. However, I still need to enable
> the WAL, correct? Because there are blocks in memory that need to be
> written to the WAL so they can be recovered later.
>
> Thanks,
> kant
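The ack ordering described above can be sketched with a toy simulation (plain Python, no Spark; `ReplayableSource`, `wal`, and the function names are made up for illustration). A replayable source re-delivers a record until it is acked, so acking only after the record is durable in the WAL means a crash can at worst cause a replay, never a loss:

```python
# Toy model of a replayable upstream source and a receiver-side WAL.
# Hypothetical names; this illustrates ack ordering, not Spark's actual API.

class ReplayableSource:
    """Re-delivers a record until it has been acked."""
    def __init__(self, records):
        self.pending = list(records)

    def fetch(self):
        return self.pending[0] if self.pending else None

    def ack(self, record):
        self.pending.remove(record)

def receive_ack_before_wal(source, wal, crash_before_wal_write):
    """Unsafe ordering: ack first, then write the WAL."""
    record = source.fetch()
    source.ack(record)        # upstream forgets the record now
    if crash_before_wal_write:
        return                # simulated receiver crash: record is gone
    wal.append(record)

def receive_ack_after_wal(source, wal, crash_before_ack):
    """Safe ordering: persist to the WAL first, then ack."""
    record = source.fetch()
    wal.append(record)        # record is durable before we ack
    if crash_before_ack:
        return                # simulated crash: upstream will re-deliver
    source.ack(record)

# Unsafe: after the crash the record is neither in the WAL nor replayable.
src, wal = ReplayableSource(["block-1"]), []
receive_ack_before_wal(src, wal, crash_before_wal_write=True)
print(wal, src.pending)       # [] [] -> data lost

# Safe: after the crash the record is still replayable from upstream.
src, wal = ReplayableSource(["block-1"]), []
receive_ack_after_wal(src, wal, crash_before_ack=True)
print(wal, src.pending)       # ['block-1'] ['block-1'] -> recoverable
```

The window between receiving a block and persisting it is exactly where asynchronous receiving and processing can lose data, which is why the ack has to wait for the WAL write.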
question on Write Ahead Log (Spark Streaming)
Hi All,

I am using a Receiver-based approach. I understand that the Spark Streaming APIs convert the data received from a receiver into blocks, and that these in-memory blocks are also stored in the WAL if it is enabled.

My upstream source, which is not Kafka, can also replay: if I don't send an ack, it will resend the data, so I don't have to write the received data to the WAL myself. However, I still need to enable the WAL, correct? Because there are blocks in memory that need to be written to the WAL so they can be recovered later.

Thanks,
kant
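For reference, enabling the receiver WAL is a configuration flag, and it requires a fault-tolerant checkpoint directory to be set on the streaming context (the HDFS path below is a placeholder):

```
# spark-defaults.conf, or pass via --conf to spark-submit
spark.streaming.receiver.writeAheadLog.enable  true
```

In the application itself, something like `ssc.checkpoint("hdfs:///path/to/checkpoints")` (placeholder path) is also needed, since the WAL is written under the checkpoint directory. With the WAL on, it is also common to drop in-memory replication (e.g. `StorageLevel.MEMORY_AND_DISK_SER` instead of a `_2` level), since the log already provides durability.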