In our application we load our historical data into RDDs with 40 partitions (number of available cores × 2), and we have not implemented any custom partitioner.
After applying transformations to these RDDs, intermediate RDDs are created whose partition counts are greater than 40, sometimes going up to 300.

1. Is Spark intelligent enough to manage the partitions of an RDD on its own? Why is there an increase in the number of partitions?
2. We suspect that the increase in the number of partitions is causing a decrease in performance.
3. If we create a custom Partitioner, will it improve our performance?

Thanks,
Sayantini
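To make the question concrete, here is a minimal pure-Python sketch (not Spark code) of the default rules by which partition counts change, as I understand them from Spark's documentation: narrow transformations such as `map` and `filter` preserve the parent's partition count, `union` of RDDs without a shared partitioner concatenates the parents' partitions, and shuffle transformations such as `reduceByKey` or `join` without an explicit `numPartitions` use the largest parent's partition count (or `spark.default.parallelism` when it is set). The function names below are illustrative, not Spark APIs.

```python
# Illustrative model of Spark's default partition-count rules (assumption:
# no custom partitioner and spark.default.parallelism not set).

def map_partitions(parent):
    # Narrow transformation (map/filter): partition count is preserved.
    return parent

def union_partitions(a, b):
    # union of RDDs without a common partitioner: counts add up.
    return a + b

def shuffle_partitions(*parents):
    # Shuffle (reduceByKey/join) without explicit numPartitions:
    # defaults to the largest parent partition count.
    return max(parents)

# Starting from 40 input partitions, as in the question:
n = 40
n = map_partitions(n)              # still 40
u = union_partitions(n, n)         # 80: union of two 40-partition RDDs
j = shuffle_partitions(u, 40)      # 80: join takes the larger parent
print(n, u, j)
```

A chain of unions and joins like this can plausibly push the count from 40 toward 300; in real Spark code the count can be inspected with `rdd.getNumPartitions()` and reduced with `coalesce(n)` or `repartition(n)`.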