Re: Re: About cache table performance in spark sql

2016-02-04 Thread Takeshi Yamamuro
(quoting Prabhu Joseph's message) Date: 2016-02-04 14:35 To: fightf...@163.com CC: user <user@spark.apache.org> Subject: Re: About cache table performance in spark sql > Sun, when the executor does not have enough memory and it tries to cache the data, it spends a lot of time on GC and hence the job will be slow.

Re: Re: About cache table performance in spark sql

2016-02-04 Thread fightf...@163.com
Oh, thanks. Makes sense to me. Best, Sun. fightf...@163.com From: Takeshi Yamamuro Date: 2016-02-04 16:01 To: fightf...@163.com CC: user Subject: Re: Re: About cache table performance in spark sql Hi, Parquet data is column-wise and highly compressed, so the size of the deserialized rows in memory can be much larger than the on-disk Parquet size.
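To make that size gap concrete, here is a minimal sketch that caches the table and prints the in-memory size Spark reports, which can then be compared with the on-disk Parquet size. It assumes sc and sqlContext are already set up as in the original post, and the table name "impala_table" is a hypothetical placeholder.

// Sketch: cache the table, then look at how much memory the columnar cache takes.
// Assumes sc (SparkContext) and sqlContext (SQLContext) are already in scope.
sqlContext.sql("CACHE TABLE impala_table")   // eager: builds the in-memory columnar cache now

// The Storage tab of the Spark UI shows the same numbers; programmatically:
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: in memory = ${info.memSize / 1e9} GB, spilled to disk = ${info.diskSize / 1e9} GB")
}
// For a ~25 GB compressed Parquet table, the deserialized columnar cache can be
// considerably larger, which is why executors may run short of memory.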

Re: Re: About cache table performance in spark sql

2016-02-03 Thread fightf...@163.com
From Impala I get that the overall Parquet file size is about 24.59 GB. It would be good to have some clarification on this. Best, Sun. fightf...@163.com From: Prabhu Joseph Date: 2016-02-04 14:35 To: fightf...@163.com CC: user Subject: Re: About cache table performance in spark sql Sun, when ...
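For cross-checking the on-disk number outside Impala, a small sketch like the following could read the total Parquet size straight from HDFS. The warehouse path is a hypothetical placeholder.

// Sketch: read the total Parquet directory size directly from HDFS for comparison.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val bytes = fs.getContentSummary(new Path("/user/hive/warehouse/impala_table")).getLength
println(f"on-disk Parquet size: ${bytes / 1e9}%.2f GB")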

Re: About cache table performance in spark sql

2016-02-03 Thread Prabhu Joseph
Sun, when the executor doesn't have enough memory and it tries to cache the data, it spends a lot of time on GC, and hence the job will be slow. Either 1. allocate enough memory to cache all the RDDs, so the job completes fast, or 2. don't use cache when there is not enough memory.
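A rough sketch of both options follows, with purely illustrative settings (the 16g figure and the Parquet path are assumptions, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

// Option 1 (sketch): give the executors enough memory for the cached table to fit.
val conf = new SparkConf()
  .setAppName("cache-table-memory-sketch")
  .set("spark.executor.memory", "16g")                            // sized so the deserialized cache fits
  .set("spark.sql.inMemoryColumnarStorage.compressed", "true")    // keep the columnar cache compressed
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Option 2 (sketch): when memory is short, either skip caching entirely or
// persist serialized with disk spill so the executors don't thrash on GC.
val df = sqlContext.read.parquet("/path/to/table")   // hypothetical path
df.persist(StorageLevel.MEMORY_AND_DISK_SER)
df.count()                                            // materializes the persisted data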

About cache table performance in spark sql

2016-02-03 Thread fightf...@163.com
Hi, I want to make sure that caching the table would indeed accelerate SQL queries. Here is one of my use cases: Impala table size: 24.59 GB, no partitions, about 1 billion+ rows. I use sqlContext.sql to run queries over this table and try the cache and uncache commands to see if there is any performance difference.
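For reference, a minimal sketch of this kind of before/after comparison with the Spark 1.6-era SQLContext API might look like the following; the table name and the query are hypothetical placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("cache-table-test"))
val sqlContext = new SQLContext(sc)

// Tiny timing helper for the comparison.
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(s"$label took ${(System.nanoTime() - start) / 1e9} s")
  result
}

// Uncached run: reads Parquet from disk each time.
time("uncached") { sqlContext.sql("SELECT COUNT(*) FROM impala_table").collect() }

// CACHE TABLE is eager in Spark SQL, so the in-memory columnar cache is built here.
time("cache")  { sqlContext.sql("CACHE TABLE impala_table") }
time("cached") { sqlContext.sql("SELECT COUNT(*) FROM impala_table").collect() }

sqlContext.sql("UNCACHE TABLE impala_table")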