RE: configure number of cached partition in memory on SparkSQL

2015-03-19 Thread Judy Nash

Thanks, Cheng, for replying. What I meant was changing the number of partitions of a cached table. It doesn’t need to be re-adjusted after caching. To provide more context: what I am seeing on my dataset is that we have a large number of tasks. Since it appears each task is mapped to a partition, I

Re: configure number of cached partition in memory on SparkSQL

2015-03-16 Thread Cheng Lian

Hi Judy, in the case of |HadoopRDD| and |NewHadoopRDD|, the number of partitions is actually decided by the |InputFormat| used. Note that |spark.sql.inMemoryColumnarStorage.batchSize| is not related to the number of partitions; it controls the in-memory columnar batch size within a single partition. Also, what
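
To illustrate the distinction between partition count and batch size, here is a toy model in plain Python. It is only a sketch and not how Spark implements caching: lists stand in for partitions and columnar batches, and the helper names (split_into_partitions, batch_partition, cache_table) are hypothetical, not Spark APIs. It shows that changing the batch size changes how many batches sit inside each partition, while the number of partitions stays the same.

```python
# Toy model only: plain Python lists stand in for Spark partitions and
# columnar batches. None of these helper names are Spark APIs.

def split_into_partitions(rows, num_partitions):
    """Distribute rows round-robin across a fixed number of partitions."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def batch_partition(partition, batch_size):
    """Chunk one partition into batches of at most batch_size rows."""
    return [
        partition[start:start + batch_size]
        for start in range(0, len(partition), batch_size)
    ]


def cache_table(partitions, batch_size):
    """Batch every partition independently; the partition count is untouched."""
    return [batch_partition(p, batch_size) for p in partitions]


rows = list(range(100))
partitions = split_into_partitions(rows, 4)

# Same partitions, two different batch sizes.
small_batches = cache_table(partitions, batch_size=10)
large_batches = cache_table(partitions, batch_size=1000)

print("partitions (batch size 10):  ", len(small_batches))
print("partitions (batch size 1000):", len(large_batches))
print("batches per partition (10):  ", [len(p) for p in small_batches])
print("batches per partition (1000):", [len(p) for p in large_batches])
```

In this model, the only way to get fewer partitions (and so fewer tasks) is to change the value passed to split_into_partitions, that is, to change how the data is partitioned before it is cached.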