Hi Judy,
In the case of |HadoopRDD| and |NewHadoopRDD|, the partition number is
actually decided by the |InputFormat| used. And
|spark.sql.inMemoryColumnarStorage.batchSize| is not related to the
partition number; it controls the in-memory columnar batch size within a
single partition.
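For example (just an illustration; 10000 is the default value), this setting only changes how many rows are packed into each columnar batch, and leaves the partition count untouched:

```sql
-- Controls rows per in-memory columnar batch within a partition,
-- NOT the number of partitions (default is 10000):
SET spark.sql.inMemoryColumnarStorage.batchSize = 20000;
```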
Also, what do you mean by “change the number of partitions /after/
caching the table”? Are you trying to re-cache an already cached table
with a different partition number?
Currently, I don’t see a super intuitive pure SQL way to set the
partition number in this case. Maybe you can try this (assuming table
|t| has a column |s| which is expected to be sorted):
|SET spark.sql.shuffle.partitions = 10;
CACHE TABLE cached_t AS SELECT * FROM t ORDER BY s;
|
In this way, we introduce a shuffle by sorting on a column, and scale
the partition number up or down at the same time. This might not be the
best way out there, but it’s the first one that jumped into my head.
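As a sketch of an alternative that also introduces a shuffle but doesn’t require a sortable column, you could try |DISTRIBUTE BY| instead of |ORDER BY| (here |k| is a placeholder for whatever column you want to hash-partition on; whether this suits your schema is an assumption on my part):

```sql
SET spark.sql.shuffle.partitions = 10;
CACHE TABLE cached_t AS SELECT * FROM t DISTRIBUTE BY k;
```

The hash repartition avoids the cost of a full sort while still letting |spark.sql.shuffle.partitions| take effect.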
Cheng
On 3/5/15 3:51 AM, Judy Nash wrote:
Hi,
I am tuning a hive dataset on Spark SQL deployed via thrift server.
How can I change the number of partitions after caching the table on
thrift server?
I have tried the following but still getting the same number of
partitions after caching:
spark.default.parallelism
spark.sql.inMemoryColumnarStorage.batchSize
Thanks,
Judy