[ https://issues.apache.org/jira/browse/SPARK-35245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337807#comment-17337807 ]

Yuming Wang commented on SPARK-35245:
-------------------------------------

This is because the filtering side does not have a selective predicate:
https://github.com/apache/spark/blob/19c7d2f3d8cda8d9bc5dfc1a0bf5d46845b1bc2f/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L193-L201
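The check linked above can be sketched very roughly in plain Python (a simplified model with hypothetical names, not Spark's actual classes): DPP only inserts a pruning subquery when the filtering side of the join contains a Filter whose condition looks selective, and a bare VALUES list (a LocalRelation with no Filter on top) fails that check. The fallbackFilterRatio config only tunes the later size heuristic; it does not bypass the selectivity requirement, which is why raising it to 100 changes nothing here.

```python
# Simplified, hypothetical model of the decision in PartitionPruning.scala
# (the real logic is in the linked Scala file; names here are made up).
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    node: str                          # e.g. "LocalRelation", "Filter", "Relation"
    selective_condition: bool = False  # for Filter nodes: is the predicate likely selective?
    children: List["Plan"] = field(default_factory=list)


def has_selective_predicate(plan: Plan) -> bool:
    """DPP requires the filtering side to contain at least one Filter whose
    condition is 'likely selective' (e.g. an equality against a literal)."""
    if plan.node == "Filter" and plan.selective_condition:
        return True
    return any(has_selective_predicate(c) for c in plan.children)


def pruning_has_benefit(scan_size_bytes: float,
                        filtering_side_size_bytes: float,
                        filter_ratio: float = 0.5) -> bool:
    """Size heuristic, applied only AFTER the selectivity check passes:
    the estimated bytes saved on the pruned scan must exceed the cost of
    building and broadcasting the filtering side. fallbackFilterRatio plays
    the role of filter_ratio when column statistics are unavailable."""
    return filter_ratio * scan_size_bytes > filtering_side_size_bytes


# The reporter's filtering side: a bare VALUES list -> LocalRelation, no Filter.
values_side = Plan("LocalRelation")
print(has_selective_predicate(values_side))    # False -> no DPP filter inserted

# A filtering side that would pass the check: a selective Filter over a scan.
filtered_side = Plan("Filter", selective_condition=True,
                     children=[Plan("Relation")])
print(has_selective_predicate(filtered_side))  # True

# Even a huge fallbackFilterRatio is irrelevant for the VALUES query: the
# size heuristic is never reached because the selectivity check fails first.
print(pruning_has_benefit(198.5 * 2**20, 36.0, filter_ratio=100))  # True
```

Under this model, rewriting the subquery so the filtering side is a filtered table scan rather than a literal VALUES list would be the way to make the check pass.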

> DynamicFilter pushdown not working
> ----------------------------------
>
>                 Key: SPARK-35245
>                 URL: https://issues.apache.org/jira/browse/SPARK-35245
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.1
>            Reporter: jean-claude
>            Priority: Minor
>
>  
> The pushed filters are always empty: `PushedFilters: []`.
> I was expecting the filters to be pushed down on the probe side of the join.
> Not sure how to configure this properly. For example, how do I set
> fallbackFilterRatio?
> spark = (SparkSession.builder
>     .master('local')
>     .config("spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio", 100)
>     .getOrCreate())
>  
> df = spark.read.parquet('abfss://warehouse@<domain>/iceberg/opensource/<table>/data/timeperiod=2021-04-25/00000-0-929b48ef-7ec3-47bd-b0a1-e9172c2dca6a-00001.parquet')
> df.createOrReplaceTempView('TMP')
>  
> spark.sql('''
> explain cost
> select 
>    timeperiod, rrname
> from
>    TMP
> where
>    timeperiod in (
>      select
>        TO_DATE(d, 'MM-dd-yyyy') AS timeperiod
>      from
>      values
>        ('01-01-2021'),
>        ('01-01-2021'),
>        ('01-01-2021') tbl(d)
>  )
> group by
>    timeperiod, rrname
> ''').show(truncate=False)
> == Optimized Logical Plan ==
> Aggregate [timeperiod#597, rrname#594], [timeperiod#597, rrname#594], Statistics(sizeInBytes=69.0 MiB)
> +- Join LeftSemi, (timeperiod#597 = timeperiod#669), Statistics(sizeInBytes=69.0 MiB)
>    :- Project [rrname#594, timeperiod#597], Statistics(sizeInBytes=69.0 MiB)
>    :  +- Relation[count#591,time_first#592L,time_last#593L,rrname#594,rrtype#595,rdata#596,timeperiod#597] parquet, Statistics(sizeInBytes=198.5 MiB)
>    +- LocalRelation [timeperiod#669], Statistics(sizeInBytes=36.0 B)
> == Physical Plan ==
> *(2) HashAggregate(keys=[timeperiod#597, rrname#594], functions=[], output=[timeperiod#597, rrname#594])
> +- Exchange hashpartitioning(timeperiod#597, rrname#594, 200), ENSURE_REQUIREMENTS, [id=#839]
>    +- *(1) HashAggregate(keys=[timeperiod#597, rrname#594], functions=[], output=[timeperiod#597, rrname#594])
>       +- *(1) BroadcastHashJoin [timeperiod#597], [timeperiod#669], LeftSemi, BuildRight, false
>          :- *(1) ColumnarToRow
>          :  +- FileScan parquet [rrname#594,timeperiod#597] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[abfss://warehouse@<domain>/iceberg/opensource/..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<rrname:string,timeperiod:date>
>          +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, date, true]),false), [id=#822]
>             +- LocalTableScan [timeperiod#669]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
