I'm using Spark Streaming and Kafka with the direct approach. I have created a
topic with 6 partitions, so when I run Spark there are six RDDs. I
understand that ideally it should have six executors, one to process
each RDD. To do this, when I execute spark-submit (I use YARN) I specify the

The 6 Kafka partitions will result in 6 Spark partitions, not 6 Spark RDDs.
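For context, requesting one executor per Kafka partition on YARN would look something like the following; the resource values are illustrative, not a recommendation:

```shell
# Illustrative spark-submit invocation: one 1-core executor per Kafka
# partition. Memory and class/jar names are placeholders.
spark-submit \
  --master yarn \
  --num-executors 6 \
  --executor-cores 1 \
  --executor-memory 2g \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```

As explained below, this 1:1 layout is not required for keeping up with the stream.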
Whether you will have a backlog isn't just a matter of
having 1 executor per partition. If a single executor can process all of
the partitions fast enough to complete each batch in under the batch interval,
you won't have a backlog.
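The condition above can be sketched with some simple arithmetic. The numbers here are hypothetical, not measurements; the point is that total cores and per-core throughput determine whether a batch finishes within the interval, not how those cores are split across executors:

```python
# A batch is sustainable when the time to process it is shorter than the
# batch interval, regardless of how the 6 partitions are spread across
# executors. All rates below are made-up illustrative figures.

def processing_time_s(records_per_batch, records_per_s_per_core, cores):
    """Idealized wall-clock time for `cores` parallel cores to drain one batch."""
    return records_per_batch / (records_per_s_per_core * cores)

def has_backlog(batch_interval_s, records_per_batch, records_per_s_per_core, cores):
    """True if batches take longer to process than they take to arrive."""
    return processing_time_s(records_per_batch, records_per_s_per_core, cores) > batch_interval_s

# One 6-core executor and six 1-core executors give the same total throughput:
# 60,000 records / (2,000 rec/s/core * 6 cores) = 5 s, under a 10 s interval.
print(has_backlog(10, 60_000, 2_000, 6))   # → False (keeps up)

# A single core would need 30 s per 10 s batch, so a backlog accumulates.
print(has_backlog(10, 60_000, 2_000, 1))   # → True (falls behind)
```

In other words, one executor with 6 cores can consume all 6 partitions just as well as six 1-core executors, provided the aggregate processing rate exceeds the arrival rate.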