I'm considering using Flume (file channel) with Spark Streaming.

I have some doubts about it:

1. The RDD size is all the data that arrives during the microbatch interval you have
defined. Right?

2. If there are 2 GB of data, how many RDDs are generated? Just one, so I
have to repartition it myself?
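
For concreteness, this is roughly the setup I have in mind. It is just a sketch assuming the polling receiver from spark-streaming-flume; the host, port, batch interval and partition count are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumeToSpark")
// One microbatch every 30 seconds -> one RDD per batch (question 1)?
val ssc = new StreamingContext(conf, Seconds(30))

// Pull events from the Flume sink (placeholder host/port)
val flumeStream = FlumeUtils.createPollingStream(ssc, "flume-host", 9988)

// Question 2: is this the right place to repartition a large (e.g. 2 GB) batch?
val events = flumeStream.repartition(16)

events.foreachRDD { rdd =>
  // process the batch here
}

ssc.start()
ssc.awaitTermination()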

3. When is the ACK sent back from Spark to Flume?
  I guess that if Flume dies, it is going to send the same data to Spark
again.
  If Spark dies, I have no idea whether it will reprocess the same data
when it is sent again.
  Could this behave differently if I use a Kafka channel?
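
For reference, these are the reliability settings I'd be assuming on the Spark side; the checkpoint path is a placeholder, and I may be misreading how they interact with the Flume ACK:

val conf = new SparkConf()
  .setAppName("FlumeToSpark")
  // persist received events to a write-ahead log before they are acknowledged
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(30))
// so the driver can recover batch metadata after a failure (placeholder path)
ssc.checkpoint("hdfs:///checkpoints/flume-to-spark")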
