Re: Super slow caching in 1.3?

2015-04-27 Thread Christian Perez
To: Evo Eftimov Cc: Christian Perez; user Subject: Re: Super slow caching in 1.3? Here are the types that we specialize, other types will be much slower. This is only for Spark SQL, normal RDDs do not serialize data that is cached. I'll also not that until yesterday we were missing FloatType

Re: Pyspark where do third parties libraries need to be installed under Yarn-client mode

2015-04-24 Thread Christian Perez
. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Christian Perez Silicon Valley Data Science Data Analyst christ...@svds.com @cp_phd

Re: Super slow caching in 1.3?

2015-04-16 Thread Christian Perez
just falling back to kryo and even then there are some locking issues). If so, would it be possible to try caching a flattened version? CACHE TABLE flattenedTable AS SELECT ... FROM parquetTable On Mon, Apr 6, 2015 at 5:00 PM, Christian Perez christ...@svds.com wrote: Hi all, Has anyone else

Super slow caching in 1.3?

2015-04-06 Thread Christian Perez
Hi all, Has anyone else noticed very slow time to cache a Parquet file? It takes 14 s per 235 MB (1 block) uncompressed node local Parquet file on M2 EC2 instances. Or are my expectations way off... Cheers, Christian -- Christian Perez Silicon Valley Data Science Data Analyst christ

Re: persist(MEMORY_ONLY) takes lot of time

2015-04-02 Thread Christian Perez
at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Christian Perez Silicon Valley Data Science Data Analyst christ...@svds.com @cp_phd

Re: input size too large | Performance issues with Spark

2015-04-02 Thread Christian Perez
. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Christian Perez Silicon Valley Data Science Data Analyst christ...@svds.com @cp_phd

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-20 Thread Christian Perez
Any other users interested in a feature DataFrame.saveAsExternalTable() for making _useful_ external tables in Hive, or am I the only one? Bueller? If I start a PR for this, will it be taken seriously? On Thu, Mar 19, 2015 at 9:34 AM, Christian Perez christ...@svds.com wrote: Hi Yin, Thanks

saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Christian Perez
... but before I get any deeper, can anyone reproduce this behavior? Cheers, Christian -- Christian Perez Silicon Valley Data Science Data Analyst christ...@svds.com @cp_phd - To unsubscribe, e-mail: user-unsubscr

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Christian Perez
will be org.apache.spark.sql.parquet.DefaultSource. You can also look at your files in the file system. They are stored by Parquet. Thanks, Yin On Thu, Mar 19, 2015 at 12:00 PM, Christian Perez christ...@svds.com wrote: Hi all, DataFrame.saveAsTable creates a managed table in Hive (v0.13