I'm considering using Flume (with a file channel) together with Spark Streaming, and I have some doubts about it:
1. The RDD for each batch contains all the data that arrives during the microbatch interval you have defined. Right?
2. If 2 GB of data arrive in one batch, how many RDDs are generated? Just one, so that I have to repartition it myself? (See the sketch below for what I mean.)
3. When is the ACK sent back from Spark to Flume? I assume that if Flume dies, it will send the same data to Spark again. But if Spark dies, I have no idea whether it will reprocess the same data when it is sent again. Would this be different if I used a Kafka channel?
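For context, this is roughly the setup I have in mind. It is only a minimal sketch: the host, port, batch interval, and partition count are placeholders, and I'm assuming the pull-based `FlumeUtils.createPollingStream` receiver from `spark-streaming-flume`:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumePollingSketch")
    // 10-second microbatch interval (placeholder value) -- doubt 1 is
    // whether each batch's RDD holds everything received in this window
    val ssc = new StreamingContext(conf, Seconds(10))

    // Pull-based receiver polling a Flume agent's Spark sink
    // ("flume-host" and 9988 are placeholders)
    val events = FlumeUtils.createPollingStream(ssc, "flume-host", 9988)

    events.foreachRDD { rdd =>
      // Doubt 2: if the batch holds 2 GB, is this a single RDD that I
      // should repartition myself before doing any heavy work?
      val repartitioned = rdd.repartition(64) // partition count is a guess
      println(s"events in this batch: ${repartitioned.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```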
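And for the last doubt, this is the Kafka-based variant I would compare it against. Again just a sketch: the brokers, topic, and group id are placeholders, and I'm assuming the direct stream from `spark-streaming-kafka-0-10`, which tracks offsets instead of relying on receiver ACKs:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaDirectSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaDirectSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-host:9092", // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-test", // placeholder
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream: one RDD partition per Kafka topic partition, and
    // replay after a failure is driven by stored offsets rather than ACKs
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD(rdd => println(s"records in this batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```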