Hello,

The Spark programming guide says "we should have 2-4 partitions for each CPU in your cluster." In that case, how does one CPU core process 2-4 partitions at the same time?

Link - http://spark.apache.org/docs/latest/programming-guide.html (under the RDDs section)
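For concreteness, here is a minimal sketch of how I read that guideline (the 10-core cluster size and the app name are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("PartitionRatio").setMaster("local[10]")
    val sc = new SparkContext(conf)

    // 10 cores, 3 partitions per core (middle of the 2-4 range) = 30 partitions.
    val totalCores = 10
    val rdd = sc.parallelize(1 to 1000000, totalCores * 3)
    println(rdd.partitions.length) // 30, i.e. 3 partitions' worth of tasks per core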
Does Spark do context switching between those tasks, or does it run them in parallel? If it does context switching, how is that more efficient than a 1:1 partition-to-core ratio?

PS: If we are using the Kafka direct API, in which Kafka partitions = RDD partitions, does that mean we should create 40 Kafka partitions for 10 CPU cores? (A sketch of the setup I mean follows below.)
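To make the PS concrete, here is a sketch of the kind of job I am asking about, assuming the spark-streaming-kafka 0.8 direct API (the broker address, topic name, and batch interval are made up):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaDirect")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    // With the direct API, each batch's RDD has exactly one partition per
    // Kafka partition of the topic, so a 40-partition topic on a 10-core
    // cluster would mean 4 tasks per core in every batch.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("myTopic"))

    ssc.start()
    ssc.awaitTermination()

--
Regards,
Hemalatha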