In our application we load our historical data into RDDs with 40 partitions
(number of available cores x 2), and we have not implemented any custom
partitioner.
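
Roughly, the load looks like the sketch below (the path and the use of
textFile are illustrative, not our exact code; note that textFile's second
argument is only a minimum partition count, so the actual count can come
out higher):

    // illustrative only: load with a requested minimum of 40 partitions
    val historical = sc.textFile("hdfs:///data/historical", 40)
    // actual count may exceed 40 if the input has more splits
    println(historical.partitions.length)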

After applying transformations to these RDDs, the intermediate RDDs that are
created have more than 40 partitions, sometimes as many as 300.
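
For illustration, here is a minimal sketch (with made-up RDDs and numbers,
not our actual code) of two common ways the partition count can grow:

    // two illustrative pair RDDs with 40 partitions each
    val a = sc.parallelize(1 to 1000, 40).map(x => (x % 10, x))
    val b = sc.parallelize(1 to 1000, 40).map(x => (x % 10, x))

    // union concatenates the parents' partitions: 40 + 40 = 80
    println(a.union(b).partitions.length)

    // a shuffle honours an explicit partition count when one is given
    println(a.reduceByKey(_ + _, 300).partitions.length)  // 300

    // without one, it falls back to spark.default.parallelism if set,
    // otherwise the largest parent partition count (40 here)
    println(a.reduceByKey(_ + _).partitions.length)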

1. Is Spark intelligent enough to manage the partitioning of RDDs on its
own? Can you suggest why the number of partitions increases?

2. We suspect that the increase in the number of partitions is causing a
decrease in performance.

3. If we create a custom Partitioner, will it improve performance? (A rough
sketch of what we mean is below.)
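
By a custom Partitioner we mean something like the following minimal sketch,
assuming Int keys (the class name and the mod-based scheme are just an
example, not our actual implementation):

    import org.apache.spark.Partitioner

    // illustrative only: mod-based partitioner for Int keys
    class ModPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = {
        val k = key.asInstanceOf[Int]
        ((k % parts) + parts) % parts  // keep the result non-negative
      }
      override def equals(other: Any): Boolean = other match {
        case m: ModPartitioner => m.numPartitions == parts
        case _                 => false
      }
      override def hashCode: Int = parts
    }

    // usage on a pair RDD:
    // val repartitioned = pairRdd.partitionBy(new ModPartitioner(40))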



Thanks,

Sayantini
