Check the Storage tab in the web UI. Does the table actually fit in memory?
If it doesn't, you are rebuilding column buffers in addition to reading the
data off disk.
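
As a reference, a minimal sketch of checking this from a Spark 1.2 shell. It assumes `sqlContext` is already in scope (as in the shell) and that the Parquet data is registered under the name "table", as in the quoted question; the GROUP BY is added so the aggregate query is valid:

```scala
// Mark the table for in-memory columnar caching (lazy in 1.2; nothing is
// read yet).
sqlContext.cacheTable("table")

// First pass over the data materializes the column buffers ("seeds" the
// cache), so this run pays the full scan cost.
sqlContext.sql("SELECT a, SUM(b), SUM(c) FROM table GROUP BY a").collect()

// isCached only confirms the table is marked for caching. The Storage tab
// of the web UI shows what fraction of it actually fits in memory; any
// evicted partitions are rebuilt from disk on every subsequent query.
sqlContext.isCached("table")
```

If the Storage tab shows less than 100% cached, later queries can end up slower than reading Parquet directly, since they pay for both the disk scan and the repeated buffer rebuilds.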

On Fri, Feb 6, 2015 at 4:39 PM, Manoj Samel <manojsamelt...@gmail.com>
wrote:

> Spark 1.2
>
> Data stored in parquet table (large number of rows)
>
> Test 1
>
> select a, sum(b), sum(c) from table
>
> Test 2
>
> sqlContext.cacheTable("table")
> select a, sum(b), sum(c) from table  -- "seed cache": first run is slow
> since it is loading the cache?
> select a, sum(b), sum(c) from table  -- second run should be faster, as it
> should read from the cache rather than HDFS. But it is slower than Test 1.
>
> Any thoughts? Should a different query be used to seed cache ?
>
> Thanks,
>
>
