Do you think you are seeing a regression from 1.2?  Also, are you caching
nested data or flat rows?  The in-memory caching is not really designed for
nested data and so performs pretty slowly here (it's just falling back to
Kryo, and even then there are some locking issues).

If you are caching nested data, would it be possible to try caching a
flattened version?

CACHE TABLE flattenedTable AS SELECT ... FROM parquetTable
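To make "flattened" concrete: the idea is that each nested field becomes its own
top-level column, so the cache holds flat rows. In the SELECT above that would
mean listing nested fields individually (for example `user.name AS user_name`,
where the column names are hypothetical). The plain-Python sketch below only
illustrates that row shape; it is not Spark code, and `flatten_row` and the
sample fields are made up for illustration.

```python
from typing import Any, Dict


def flatten_row(row: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts (struct-like fields) into a single level.

    Keys are joined with `sep`; non-dict values such as lists are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in row.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested "struct" and hoist its fields up.
            flat.update(flatten_row(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record, similar to a row with struct columns.
nested = {"id": 1, "user": {"name": "a", "geo": {"lat": 37.4, "lon": -122.1}}}
print(flatten_row(nested))
# prints {'id': 1, 'user_name': 'a', 'user_geo_lat': 37.4, 'user_geo_lon': -122.1}
```

Every value in the flattened row is a primitive column, which is the layout the
in-memory columnar cache is built around.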

On Mon, Apr 6, 2015 at 5:00 PM, Christian Perez <christ...@svds.com> wrote:

> Hi all,
>
> Has anyone else noticed very slow times when caching a Parquet file? It
> takes 14 s per 235 MB (1 block) uncompressed, node-local Parquet file
> on M2 EC2 instances. Or are my expectations way off...
>
> Cheers,
>
> Christian
>
> --
> Christian Perez
> Silicon Valley Data Science
> Data Analyst
> christ...@svds.com
> @cp_phd
>

Reply via email to