[ 
https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075690#comment-15075690
 ] 

Russell Alexander Spitzer edited comment on SPARK-11661 at 12/31/15 3:44 AM:
-----------------------------------------------------------------------------

This seems to have a slightly unintended consequence in the explain output.

It makes it look as if a source always pushes down all of the filters, even 
those it cannot handle.

This can be confusing (I kept checking my code to see where I had broken 
something :D ).

{code: title="Query plan for source where nothing is handled by C* Source"}
Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
+- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78]
 PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
{code}
Although the telltale "Filter" step is present, my first instinct is that the 
underlying source relation is using all of those filters.

{code: title="Query plan for source where *everything* is handled by C* Source"}
Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@55d4456c[a#79,b#80,c#81,d#82,e#83,f#84,g#85,h#86]
 PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
{code}


I think this would be much clearer if we changed the metadata key to 
"HandledFilters" and listed only the filters that the underlying source 
handles fully.

wdyt?
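
To make the suggestion concrete, here is a rough Python sketch of how the two 
metadata keys would differ. It is a toy model, not Spark code: the function 
name explain_metadata and the use of plain strings for filters are assumptions 
made only for illustration.

```python
def explain_metadata(pushed, unhandled):
    """Build the current and the proposed metadata entries for one scan.

    pushed:    filters handed to the source (hypothetical string form)
    unhandled: filters the source says it may not apply to every row
    """
    unhandled_set = set(unhandled)
    # A filter counts as handled only if the source did not report it back.
    handled = [f for f in pushed if f not in unhandled_set]
    return {
        "PushedFilters": "[" + ", ".join(pushed) + "]",
        "HandledFilters": "[" + ", ".join(handled) + "]",
    }


filters = ["EqualTo(a,1)", "EqualTo(b,2)", "EqualTo(c,1)", "EqualTo(e,1)"]

# Source that handles nothing: it reports every filter as unhandled.
nothing_handled = explain_metadata(filters, unhandled=filters)

# Source that handles everything: it reports no unhandled filters.
everything_handled = explain_metadata(filters, unhandled=[])
```

In this model "PushedFilters" is identical for both sources, which is the 
confusing part, while "HandledFilters" is empty for the first source and lists 
all four filters for the second.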


> We should still pushdown filters returned by a data source's unhandledFilters
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-11661
>                 URL: https://issues.apache.org/jira/browse/SPARK-11661
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> We added the unhandledFilters interface in SPARK-10978, so a data source can 
> tell Spark SQL that it may not apply the returned filters to every row. Spark 
> SQL should therefore use a Filter operator to evaluate those filters. However, 
> even if a filter is part of the returned unhandledFilters, we should still 
> push it down. For example, our internal data sources do not override this 
> method; if we do not push down those filters, we are effectively turning off 
> the filter pushdown feature.
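
The behaviour described in the issue can be sketched roughly as follows. This 
is a simplified Python model, not the actual planner: the class names, the 
plan function, and the tuple-based plan representation are all hypothetical, 
and the default of returning every filter as unhandled is an assumption based 
on the description above.

```python
class Source:
    """Toy data source that does not override unhandled_filters."""

    def unhandled_filters(self, filters):
        # Assumed default: report every filter as possibly not applied.
        return list(filters)


class FullyHandlingSource(Source):
    """Toy data source that guarantees it applies every filter."""

    def unhandled_filters(self, filters):
        return []


def plan(source, filters):
    """Return the plan as a list of (operator, filters) steps, top first."""
    pushed = list(filters)  # always push every filter down to the scan
    residual = source.unhandled_filters(filters)
    steps = []
    if residual:
        # Re-evaluate anything the source may not have applied to every row.
        steps.append(("Filter", residual))
    steps.append(("Scan", pushed))
    return steps


default_plan = plan(Source(), ["a = 1", "b = 2"])
handled_plan = plan(FullyHandlingSource(), ["a = 1", "b = 2"])
```

In both plans the scan receives every filter; only the presence of the extra 
Filter step tells the two sources apart.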


