[jira] [Created] (SPARK-6072) Enable hash joins for nullable columns

2015-02-27 Thread Dima Zhiyanov (JIRA)
Dima Zhiyanov created SPARK-6072:
Summary: Enable hash joins for nullable columns
Key: SPARK-6072
URL: https://issues.apache.org/jira/browse/SPARK-6072
Project: Spark
Issue Type: Improvement
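The issue concerns letting Spark's hash-join path handle join keys that may be NULL. The semantics it has to preserve can be modeled in a few lines of plain Python (this is a sketch of the behavior, not Spark's implementation): under SQL equality, NULL never equals NULL, so rows with a null key simply cannot match and can be skipped in both the build and probe phases.

```python
def hash_join(left, right, key):
    """Inner hash join on a possibly-null key.

    Rows whose key is None are dropped on both sides, mirroring SQL
    equality semantics where NULL never equals NULL.
    """
    # Build phase: hash the (smaller) right side, skipping null keys.
    table = {}
    for row in right:
        k = row[key]
        if k is None:
            continue
        table.setdefault(k, []).append(row)
    # Probe phase: stream the left side, skipping null keys.
    out = []
    for row in left:
        k = row[key]
        if k is None:
            continue
        for match in table.get(k, []):
            out.append({**row, **match})
    return out

left = [{"id": 1, "a": "x"}, {"id": None, "a": "y"}]
right = [{"id": 1, "b": "z"}, {"id": None, "b": "w"}]
print(hash_join(left, right, "id"))  # only the id=1 rows match
```

Note that the two None-keyed rows are never paired with each other; a hash join that blindly hashed None would get this wrong, which is why nullable-column support needs explicit handling.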

DataFrame: Enable zipWithUniqueId

2015-02-20 Thread Dima Zhiyanov
Hello, a question regarding the new DataFrame API introduced here: https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html. I often use the zipWithUniqueId method of the SchemaRDD (as an RDD) to replace string keys with more efficient long keys.
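For context, RDD.zipWithUniqueId assigns ids without a global pass: the k-th item of partition n receives id k * numPartitions + n, so ids are unique but not consecutive. A minimal plain-Python model of that scheme (the partition layout here is illustrative, not Spark code):

```python
def zip_with_unique_id(partitions):
    """Model of RDD.zipWithUniqueId: the k-th item of partition n
    gets id k * numPartitions + n. Unique, but not consecutive."""
    num = len(partitions)
    return [
        [(item, k * num + n) for k, item in enumerate(part)]
        for n, part in enumerate(partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e"]]
print(zip_with_unique_id(parts))
```

Because each partition can compute its ids independently, this is cheap, which is what makes it attractive for replacing string keys with long surrogate keys before a join.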

Re: How to do broadcast join in SparkSQL

2015-02-12 Thread Dima Zhiyanov
Hello, has Spark implemented computing statistics for Parquet files? Or is there any other way I can enable broadcast joins between Parquet file RDDs in Spark SQL? Thanks, Dima
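Spark SQL decides to broadcast the small side of a join when its estimated size falls under spark.sql.autoBroadcastJoinThreshold, which is why missing statistics for a Parquet source can prevent the plan. Mechanically, a broadcast (map-side) hash join hashes the small table once and joins each partition of the big table locally, with no shuffle of the big side; a plain-Python sketch of that mechanic (not Spark's code):

```python
def broadcast_hash_join(big_partitions, small_table, key):
    # Build the hash table for the small side once; conceptually this
    # is the table Spark would broadcast to every executor.
    lookup = {}
    for row in small_table:
        lookup.setdefault(row[key], []).append(row)
    # Probe locally inside each partition of the big side: no shuffle
    # of the big table is required.
    return [
        [{**row, **m} for row in part for m in lookup.get(row[key], [])]
        for part in big_partitions
    ]

big = [[{"k": 1, "v": "a"}, {"k": 2, "v": "b"}], [{"k": 1, "v": "c"}]]
small = [{"k": 1, "w": "x"}]
print(broadcast_hash_join(big, small, "k"))
```

The trade-off is that the broadcast copy must fit in each executor's memory, which is what the size threshold guards.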

Test

2015-02-12 Thread Dima Zhiyanov
Sent from my iPhone

Re: How to do broadcast join in SparkSQL

2015-02-11 Thread Dima Zhiyanov
Hello, has Spark implemented computing statistics for Parquet files? Or is there any other way I can enable broadcast joins between Parquet file RDDs in Spark SQL? Thanks, Dima


Re: How to do broadcast join in SparkSQL

2015-02-11 Thread Dima Zhiyanov
://search-hadoop.com/m/JW1q5BZhf92

On Wed, Feb 11, 2015 at 3:04 PM, Dima Zhiyanov dimazhiya...@gmail.com wrote:
> Hello, has Spark implemented computing statistics for Parquet files? Or is there any other way I can enable broadcast joins between Parquet file RDDs in Spark SQL? Thanks, Dima

Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
I am also experiencing this Kryo buffer problem. My join is a left outer join with under 40 MB on the right side. I would expect the broadcast join to succeed in this case (Hive did). Another problem is that the optimizer chose a nested loop join for some reason; I would expect a broadcast (map-side) hash
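The usual remedy for this KryoException is to raise the serializer's maximum buffer size. A sketch of the relevant spark-defaults.conf entries, assuming a Spark 1.x deployment (the exact property name is version-dependent, and 128 MB is an arbitrary example value, not a recommendation):

```properties
# Use Kryo and enlarge its maximum buffer (Spark 1.x names the
# setting in megabytes; later versions use
# spark.kryoserializer.buffer.max with a size suffix such as 128m)
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max.mb  128
```

This only addresses the overflow itself; whether the planner picks a broadcast hash join rather than a nested loop join is a separate question governed by the optimizer's size estimates.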

Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
Yes

Sent from my iPhone

On Aug 5, 2014, at 7:38 AM, Dima Zhiyanov [via Apache Spark User List] ml-node+s1001560n11432...@n3.nabble.com wrote:
> I am also experiencing this Kryo buffer problem. My join is left outer with under 40 MB on the right side. I would expect the broadcast join