I am facing a similar issue in Spark 1.2. Caching the SchemaRDD takes about 50 s
for 400 MB of data. The schema is similar to the TPC-H LineItem table.

Here is the code I use to cache the table. I am wondering if there is any setting
I am missing?

Thank you so much!

lineitemSchemaRDD.registerTempTable("lineitem");   // register the SchemaRDD as a temp table
sqlContext.sqlContext().cacheTable("lineitem");    // mark the table for in-memory columnar caching
System.out.println(lineitemSchemaRDD.count());     // count() materializes the cache (this is the ~50 s step)
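
In case the in-memory columnar storage settings are relevant, here is a minimal
sketch of the same caching path with those settings made explicit. The key names
come from the Spark SQL configuration docs; the values are only examples I have
not verified, and sqlContext / lineitemSchemaRDD are the same objects as above.

// Assumes: JavaSparkContext sc; JavaSQLContext sqlContext = new JavaSQLContext(sc);
// and a JavaSchemaRDD lineitemSchemaRDD already built from the source data.
sqlContext.sqlContext().setConf("spark.sql.inMemoryColumnarStorage.compressed", "true");
sqlContext.sqlContext().setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000");
lineitemSchemaRDD.registerTempTable("lineitem");
sqlContext.sqlContext().cacheTable("lineitem");
System.out.println(lineitemSchemaRDD.count());  // force the columnar cache to build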


On Mon, Apr 6, 2015 at 8:00 PM, Christian Perez <christ...@svds.com> wrote:

> Hi all,
>
> Has anyone else noticed very slow times to cache a Parquet file? It
> takes 14 s per 235 MB (one-block), uncompressed, node-local Parquet file
> on M2 EC2 instances. Or are my expectations way off...
>
> Cheers,
>
> Christian
>
> --
> Christian Perez
> Silicon Valley Data Science
> Data Analyst
> christ...@svds.com
> @cp_phd
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Wenlei Xie (谢文磊)

Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: wenlei....@gmail.com
