Thanks for replying, Cheng.
I meant to ask how to change the number of partitions of a cached table. It
doesn't need to be re-adjusted after caching.
To provide more context:
What I am seeing on my dataset is that we have a large number of tasks. Since
it appears each task is mapped to a partition, I
Hi Judy,
In the case of |HadoopRDD| and |NewHadoopRDD|, the number of partitions is
actually determined by the |InputFormat| used.
|spark.sql.inMemoryColumnarStorage.batchSize| is not related to the
number of partitions; it controls the in-memory columnar batch size
within a single partition.
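
To make the distinction concrete, here is a toy model in plain Python. It is
not Spark code: the function name |cached_layout| and the row counts are
made up for illustration, and it only assumes that each partition is cut
into batches of at most |batchSize| rows and that each partition maps to
one task.

```python
import math


def cached_layout(rows_per_partition, batch_size):
    """Toy model: return (task_count, batches_per_partition)."""
    # One task per partition; the batch size plays no part in this count.
    task_count = len(rows_per_partition)
    # Within each partition, rows are grouped into batches of at most
    # batch_size rows.
    batches = [math.ceil(rows / batch_size) for rows in rows_per_partition]
    return task_count, batches


# Hypothetical row counts for a table with three partitions.
partitions = [2500, 1000, 300]
for batch_size in (1000, 100):
    tasks, batches = cached_layout(partitions, batch_size)
    print(f"batchSize={batch_size}: tasks={tasks}, batches={batches}")
```

In this model, changing the batch size changes only the number of batches
inside each partition, while the task count stays at three.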
Also, what