Thanks, Cheng, for replying.
I meant changing the number of partitions of a cached table. It doesn't need
to be re-adjusted after caching.
To provide more context:
What I am seeing on my dataset is a large number of tasks. Since each task
appears to be mapped to a partition, I
Hi Judy,
In the case of HadoopRDD and NewHadoopRDD, the partition number is
actually decided by the InputFormat used. And
spark.sql.inMemoryColumnarStorage.batchSize is not related to the
partition number; it controls the in-memory columnar batch size within a
single partition.
Also, what
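If the goal is to cache the table with a different partition count, one option over the thrift server is to force a shuffle before caching, since the post-shuffle partition count follows spark.sql.shuffle.partitions. This is only a sketch under that assumption; the table and column names below (my_table, some_key) are hypothetical, and CACHE TABLE ... AS SELECT requires a reasonably recent Spark version:

```sql
-- Post-shuffle partition count is governed by this setting
SET spark.sql.shuffle.partitions=64;

-- DISTRIBUTE BY forces a shuffle, so the cached relation ends up
-- with 64 partitions instead of the scan's input-split count
CACHE TABLE my_table_cached AS
SELECT * FROM my_table DISTRIBUTE BY some_key;
```

Queries would then read from my_table_cached rather than the original table.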
Hi,
I am tuning a Hive dataset on Spark SQL deployed via the thrift server.
How can I change the number of partitions after caching the table on thrift
server?
I have tried the following but am still getting the same number of partitions
after caching:
spark.default.parallelism
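For reference, a session-level attempt over Beeline might look like the sketch below. Note that spark.default.parallelism only sets the default parallelism for RDD shuffles; it does not change how many input splits a table scan produces, which is a plausible reason the partition count stayed the same. The split-size property shown is a standard Hadoop/Hive setting, but whether it applies depends on the InputFormat in use, so treat it as an assumption:

```sql
-- RDD-level default only; not used when scanning a Hive table
SET spark.default.parallelism=64;

-- Scan partitions come from input splits; a smaller max split size
-- yields more splits, hence more partitions (value is illustrative)
SET mapred.max.split.size=134217728;

CACHE TABLE my_table;
```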