To: Evo Eftimov
Cc: Christian Perez; user
Subject: Re: Super slow caching in 1.3?
Here are the types that we specialize; other types will be much slower.
This applies only to Spark SQL; normal RDDs do not serialize data that is
cached. I'll also note that until yesterday we were missing FloatType
I face a similar issue in Spark 1.2. Caching the SchemaRDD takes about 50s
for 400MB of data. The schema is similar to the TPC-H LineItem.
Here is the code I used to cache it. I am wondering whether any setting is
missing?
Thank you so much!
lineitemSchemaRDD.registerTempTable("lineitem");
a
detailed description / spec of both
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Thursday, April 16, 2015 7:23 PM
To: Evo Eftimov
Cc: Christian Perez; user
Subject: Re: Super slow caching in 1.3?
Here are the types that we specialize; other types will be much slower
Hi Michael,
Good question! We checked 1.2 and found that it is also slow caching
the same flat parquet file. Caching other file formats of the same
data was faster by up to a factor of ~2. Note that the parquet file
was created in Impala but the other formats were written by Spark SQL.
Cheers,
: user
Subject: Re: Super slow caching in 1.3?
Hi Michael,
Good question! We checked 1.2 and found that it is also slow caching the same
flat parquet file. Caching other file formats of the same data was faster by
up to a factor of ~2. Note that the parquet file was created in Impala
the performance of each of the
above options is
-----Original Message-----
From: Christian Perez [mailto:christ...@svds.com]
Sent: Thursday, April 16, 2015 6:09 PM
To: Michael Armbrust
Cc: user
Subject: Re: Super slow caching in 1.3?
Hi Michael,
Good question! We checked 1.2 and found
Subject: Re: Super slow caching in 1.3?
Here are the types that we specialize; other types will be much slower. This
applies only to Spark SQL; normal RDDs do not serialize data that is cached.
I'll also note that until yesterday we were missing FloatType
https://github.com/apache/spark/blob
Do you think you are seeing a regression from 1.2? Also, are you caching
nested data or flat rows? The in-memory caching is not really designed for
nested data and so performs pretty slowly here (it's just falling back to
kryo and even then there are some locking issues).
If so, would it be
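
To make the distinction discussed in this thread concrete (a specialized path for flat, primitive columns versus a generic serialization fallback for nested values), here is a minimal illustrative sketch. It is not Spark's implementation: the class ColumnCacheSketch and both method names are hypothetical, and plain Java serialization stands in for the Kryo fallback mentioned above. It only shows why a type-specialized encoder can pack values directly into a buffer while a generic encoder has to serialize each object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Spark.
public class ColumnCacheSketch {

    // Specialized path: a flat int column is packed at a fixed 4 bytes per value.
    static byte[] encodeIntColumn(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Fallback path: each value goes through a generic object serializer.
    // Java serialization is used here as a stand-in (assumption) for Kryo.
    static byte[] encodeGenericColumn(List<?> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object v : values) {
                out.writeObject(v);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int n = 1000;
        int[] flat = new int[n];
        List<List<Integer>> nested = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            flat[i] = i;
            nested.add(Arrays.asList(i, i + 1)); // a small nested value per row
        }
        System.out.println("specialized bytes: " + encodeIntColumn(flat).length);
        System.out.println("fallback bytes:    " + encodeGenericColumn(nested).length);
    }
}
```

Timing the two methods on your own data would give a rough feel for the gap, but any numbers from this toy would say nothing about Spark SQL's actual cache performance.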