[ https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139845#comment-15139845 ]

Abhinav Chawade commented on SPARK-13219:
-----------------------------------------

[~smilegator] I changed the patch based on https://github.com/gatorsmile/spark/commit/20d46c9bee2d99966406e6450b159ca404578aa6 and predicate pushdown is now working for joins of two or more tables. I created another table called parkings (assetid int primary key, parkingid int) and joined it with assets and tenants to get the result. Looking at the explain output, the filter gets pushed down to every table scan, which can speed things up.

{code}
spark-sql> explain select p.parkingid from parkings p,assets a,tenants t where t.assetid=a.assetid and a.assetid=p.assetid and t.assetid=1201;
== Physical Plan ==
Project [parkingid#22]
 ShuffledHashJoin [assetid#23], [assetid#26], BuildRight
  Exchange (HashPartitioning 200)
   Project [parkingid#22,assetid#23]
    ShuffledHashJoin [assetid#21], [assetid#23], BuildRight
     Exchange (HashPartitioning 200)
      Filter (assetid#21 = 1201)
       HiveTableScan [assetid#21,parkingid#22], (MetastoreRelation element22082, parkings, Some(p)), None
     Exchange (HashPartitioning 200)
      Filter (assetid#23 = 1201)
       HiveTableScan [assetid#23], (MetastoreRelation element22082, assets, Some(a)), None
  Exchange (HashPartitioning 200)
   Filter (assetid#26 = 1201)
    HiveTableScan [assetid#26], (MetastoreRelation element22082, tenants, Some(t)), None
Time taken: 0.43 seconds, Fetched 15 row(s)
{code}

I will run some more tests with inner, left outer, and right outer joins, but with a simple select across three tables the result looks promising. I have updated the code in my branch based on your fix: https://github.com/drnushooz/spark/tree/v1.4.1-SPARK-13219

> Pushdown predicate propagation in SparkSQL with join
> ----------------------------------------------------
>
>                 Key: SPARK-13219
>                 URL: https://issues.apache.org/jira/browse/SPARK-13219
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.1, 1.6.0
>         Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra 2.1.12
> CentOS 6.6
>            Reporter: Abhinav Chawade
>
> When two or more tables are joined in SparkSQL and the query has an equality clause on the attributes used to perform the join, it is useful to apply that clause to the scans of both tables. If this is not done, one of the tables ends up being fully scanned, which can slow the query down dramatically. Consider the following example with two tables being joined.
> {code}
> CREATE TABLE assets (
>     assetid int PRIMARY KEY,
>     address text,
>     propertyname text
> )
>
> CREATE TABLE tenants (
>     assetid int PRIMARY KEY,
>     name text
> )
>
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = t.assetid and t.assetid='1201';
> WARN 2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>    Filter (CAST(assetid#13, DoubleType) = 1201.0)
>     HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, Some(t)), None
>   Exchange (HashPartitioning 200)
>    HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table, but it becomes cumbersome. It would be helpful if the query planner could improve filter propagation.
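
As a concrete illustration of the manual workaround mentioned in the description, the three-table query from the comment could be rewritten with the equality condition repeated for every table, so each scan receives its own predicate even without constraint propagation in the optimizer. This is only a sketch against the example schema (parkings, assets, tenants) and the sample assetid 1201 used in this issue; it is not part of the proposed fix.

{code}
-- Workaround sketch: repeat the equality condition for each joined table
-- so the filter reaches every table scan without relying on the optimizer
-- to propagate it through the join conditions.
SELECT p.parkingid
FROM parkings p, assets a, tenants t
WHERE t.assetid = a.assetid
  AND a.assetid = p.assetid
  AND t.assetid = 1201
  AND a.assetid = 1201
  AND p.assetid = 1201;
{code}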