Reynold,
The source file you are directing me to is a little too terse for me to
understand what exactly is going on. Let me tell you what I'm trying to do
and what problems I'm encountering, so that you might be able to better
direct my investigation of the SparkSQL codebase.
I am computing the
Hi Alex,
Can you attach the output of
sql("explain extended <your query>").collect.foreach(println)?
Thanks,
Yin
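In a spark-shell of that era, Yin's command would look something like the sketch below. The table names, join condition, and the `sqlContext` binding are assumptions about the session, not part of the thread:

```scala
// Sketch: run inside spark-shell, where a SQLContext (here `sqlContext`)
// and the queried tables are assumed to already be registered.
val plan = sqlContext.sql(
  "EXPLAIN EXTENDED SELECT * FROM a JOIN b ON a.id = b.id")

// Each collected row is one line of the extended plan output
// (parsed, analyzed, optimized, and physical plans).
plan.collect().foreach(println)
```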
On Fri, Jan 16, 2015 at 1:54 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Reynold,
The source file you are directing me to is a little too terse for me to
understand what
It's a bunch of strategies defined here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
In most common use cases (e.g. an inner equi-join), filters are pushed below
the join or into the join. Doing a cartesian product followed
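The contrast Reynold draws can be seen by comparing plans from the shell. This is a sketch, assuming tables `a` and `b` are registered with a `sqlContext`; the table names and predicates are made up:

```scala
// Equality join predicate: the strategies in SparkStrategies can pick a
// hash-based equi-join, with the filter pushed below or into the join.
sqlContext.sql("EXPLAIN EXTENDED SELECT * FROM a JOIN b WHERE a.id = b.id")
  .collect().foreach(println)

// Non-equality predicate: with no equi-join keys to hash on, the plan
// typically falls back to a cartesian product followed by a filter.
sqlContext.sql("EXPLAIN EXTENDED SELECT * FROM a JOIN b WHERE a.id < b.id")
  .collect().foreach(println)
```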
What Reynold is describing is a performance optimization in the
implementation, but the semantics of the join (a cartesian product plus a
relational-algebra filter) should be the same and produce the same results.
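The equivalence being claimed can be illustrated in plain Scala, with no Spark involved. The rows and join key below are invented for the example; the hash-map join stands in, conceptually, for what an optimized equi-join strategy does:

```scala
// Two small "tables": (key, value) pairs.
val left  = Seq((1, "a"), (2, "b"), (3, "c"))
val right = Seq((2, "x"), (3, "y"), (4, "z"))

// Inner equi-join computed as a cartesian product followed by a filter
// on the join predicate.
val viaCartesian =
  for ((lk, lv) <- left; (rk, rv) <- right if lk == rk)
    yield (lk, lv, rv)

// The same join computed via a hash map on the join key.
val rightByKey = right.groupBy(_._1)
val viaHashJoin =
  for ((lk, lv) <- left; (_, rv) <- rightByKey.getOrElse(lk, Nil))
    yield (lk, lv, rv)

// Same semantics, very different cost on large inputs.
assert(viaCartesian == viaHashJoin)
```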
On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin r...@databricks.com wrote:
It's a bunch of
Cc: Alessandro Baretta; dev@spark.apache.org
Subject: Re: Join implementation in SparkSQL