Re: Join implementation in SparkSQL

2015-01-16 Thread Alessandro Baretta
Reynold, The source file you are directing me to is a little too terse for me to understand what exactly is going on. Let me tell you what I'm trying to do and what problems I'm encountering, so that you might be able to better direct my investigation of the SparkSQL codebase. I am computing the …

Re: Join implementation in SparkSQL

2015-01-16 Thread Yin Huai
Hi Alex, Can you attach the output of sql("explain extended <your query>").collect.foreach(println)? Thanks, Yin On Fri, Jan 16, 2015 at 1:54 PM, Alessandro Baretta alexbare...@gmail.com wrote: Reynold, The source file you are directing me to is a little too terse for me to understand what …
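For reference, the diagnostic Yin is asking for would be run from the Spark shell roughly like this (a sketch against the Spark 1.2-era SQLContext API; the table names and query are hypothetical placeholders, not from the thread):

```scala
// In the Spark shell, with sql in scope via a SQLContext.
// "explain extended" prints the parsed, analyzed, optimized logical plans
// and the chosen physical plan, so you can see which join strategy was picked.
val query = "SELECT a.id, b.name FROM a JOIN b ON a.id = b.id" // hypothetical
sql(s"explain extended $query").collect().foreach(println)
```

The physical plan at the bottom of that output is what shows whether the planner chose, for example, a hash join or a cartesian product.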

Re: Join implementation in SparkSQL

2015-01-15 Thread Reynold Xin
It's a bunch of strategies defined here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala In most common use cases (e.g. inner equi join), filters are pushed below the join or into the join. Doing a cartesian product followed
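The filter-pushdown idea Reynold mentions can be illustrated outside Spark. This is a minimal Python sketch (not Spark's actual code; the tables, column names, and predicate are invented): for an inner equi-join, a filter that touches only one side can be evaluated below the join, so fewer rows ever reach the join operator, while the result is unchanged.

```python
# Hypothetical tables, standing in for SQL relations.
orders = [{"cust": 1, "amt": 50}, {"cust": 2, "amt": 75}, {"cust": 1, "amt": 20}]
custs = [{"cust": 1, "name": "a"}, {"cust": 2, "name": "b"}]

def equi_join(left, right, key):
    # Hash join: build a hash table on the right side, probe with the left.
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]

# Naive plan: join everything, then apply the filter on top.
naive = [row for row in equi_join(orders, custs, "cust") if row["amt"] > 30]

# Optimized plan: push the filter below the join; only survivors are joined.
pushed = equi_join([o for o in orders if o["amt"] > 30], custs, "cust")

assert naive == pushed  # same rows, but the join saw fewer inputs
```

The two plans are semantically interchangeable; the optimizer's job is just to pick the cheaper one.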

Re: Join implementation in SparkSQL

2015-01-15 Thread Andrew Ash
What Reynold is describing is a performance optimization in implementation, but the semantics of the join (cartesian product plus relational algebra filter) should be the same and produce the same results. On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin r...@databricks.com wrote: It's a bunch of
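Andrew's point, that the join is *defined* as a cartesian product followed by a relational-algebra filter and that a hash join is merely a faster way to compute the same rows, can be checked directly. A small Python sketch with invented data (not Spark code):

```python
from itertools import product

# Hypothetical relations, keyed on the first tuple element.
left = [(1, "x"), (2, "y"), (2, "z")]
right = [(2, "p"), (3, "q"), (2, "r")]

# Definition: cartesian product, then filter on the join predicate.
cartesian_then_filter = {(l, r) for l, r in product(left, right) if l[0] == r[0]}

# Optimized implementation: hash join on the key.
buckets = {}
for r in right:
    buckets.setdefault(r[0], []).append(r)
hash_join = {(l, r) for l in left for r in buckets.get(l[0], [])}

# Different algorithms, identical result set.
assert cartesian_then_filter == hash_join
```

The cartesian version does len(left) * len(right) predicate evaluations, while the hash join only touches matching buckets; the output is the same either way, which is exactly the semantics-versus-implementation distinction being made in the thread.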

RE: Join implementation in SparkSQL

2015-01-15 Thread Cheng, Hao
Xin Cc: Alessandro Baretta; dev@spark.apache.org Subject: Re: Join implementation in SparkSQL What Reynold is describing is a performance optimization in implementation, but the semantics of the join (cartesian product plus relational algebra filter) should be the same and produce the same