I'm using Spark Streaming with Kafka via the direct approach. I created a topic with 6 partitions, so when I run the Spark job each batch RDD has six partitions. I understand that ideally there should be six executors, each processing one partition. To get that, when I run spark-submit (I use YARN) I set the number of executors to six. If I don't specify anything, it creates just one executor. While looking for information I read:
"The --num-executors command-line flag or spark.executor.instances configuration property control the number of executors requested. Starting in CDH 5.4/Spark 1.3, you will be able to avoid setting this property by turning on dynamic allocation <https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation> with the spark.dynamicAllocation.enabled property. Dynamic allocation enables a Spark application to request executors when there is a backlog of pending tasks and free up executors when idle."

I have this property enabled. My understanding is that if I don't pass --num-executors, Spark should create six executors on its own. Or am I wrong?
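
To make the comparison concrete, here is a minimal sketch of the two launch configurations in question: a fixed executor count versus dynamic allocation. It only builds and prints the spark-submit command lines. The helper `spark_submit_args` and the application name `my_streaming_app.py` are hypothetical, and the extra `spark.shuffle.service.enabled` and `spark.dynamicAllocation.maxExecutors` settings are assumptions about a typical YARN setup, not something taken from the quote.

```python
import shlex


def spark_submit_args(app, num_executors=None, dynamic=False, max_executors=None):
    """Build a spark-submit argument list for YARN (illustrative helper, not a Spark API)."""
    args = ["spark-submit", "--master", "yarn"]
    if num_executors is not None:
        # Fixed allocation: request exactly this many executors up front.
        args += ["--num-executors", str(num_executors)]
    if dynamic:
        # Dynamic allocation: executors are requested when tasks are pending.
        # Assumption: the external shuffle service is available on the cluster.
        args += ["--conf", "spark.dynamicAllocation.enabled=true",
                 "--conf", "spark.shuffle.service.enabled=true"]
        if max_executors is not None:
            # Optional upper bound on how many executors may be requested.
            args += ["--conf", f"spark.dynamicAllocation.maxExecutors={max_executors}"]
    args.append(app)  # hypothetical application script
    return args


fixed = spark_submit_args("my_streaming_app.py", num_executors=6)
dynamic = spark_submit_args("my_streaming_app.py", dynamic=True, max_executors=6)

print(shlex.join(fixed))
print(shlex.join(dynamic))
```

The first command pins the job to six executors. The second leaves the count to Spark, which is the case the question is about.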