[jira] [Commented] (SPARK-36776) Partition filter of DataSourceV2ScanRelation can not push down when select none dataSchema from FileScan

2021-09-18 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417234#comment-17417234
 ] 

L. C. Hsieh commented on SPARK-36776:
-

This is a duplicate of SPARK-35985. We just need to backport SPARK-35985 to 3.1, so 
I resolved this as a duplicate. Thanks for reporting this.

> Partition filter of DataSourceV2ScanRelation can not push down when select 
> none dataSchema from FileScan
> 
>
> Key: SPARK-36776
> URL: https://issues.apache.org/jira/browse/SPARK-36776
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: suheng.cloud
>Priority: Major
>
> In the PruneFileSourcePartitions rule, FileScan::withFilters is called to 
> push down the partition pruning filters (this is the only place this function 
> is called), but the call is guarded by the condition "scan.readDataSchema.nonEmpty" 
>  [source code 
> here|https://github.com/apache/spark/blob/de351e30a90dd988b133b3d00fa6218bfcaba8b8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala#L114]
>  We use Spark SQL with a custom catalog and execute a count query such as: 
> select count(*) from catalog.db.tbl where dt='0812' (the same happens in other 
> queries that do not select any column of tbl), where dt is a partition key.
> In this case scan.readDataSchema is indeed empty, so no partition pruning is 
> performed and all partitions end up being scanned.
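To make the reported behaviour concrete, here is a minimal Python sketch. It is a toy model, not Spark code: `ToyFileScan`, `with_filters` and `prune` are hypothetical stand-ins for FileScan, FileScan::withFilters and the `scan.readDataSchema.nonEmpty` guard in PruneFileSourcePartitions as the report describes them, and partitions are modelled as plain dicts.

```python
# Toy model of the guard reported in SPARK-36776. None of these names are Spark
# APIs; they are hypothetical stand-ins used only to illustrate the logic.
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

Partition = Dict[str, str]                  # e.g. {"dt": "0812"}
PartitionFilter = Callable[[Partition], bool]


@dataclass(frozen=True)
class ToyFileScan:
    read_data_schema: Tuple[str, ...]       # non-partition columns the query reads
    partitions: Tuple[Partition, ...]       # every partition of the table
    partition_filters: Tuple[PartitionFilter, ...] = ()

    def with_filters(self, filters: List[PartitionFilter]) -> "ToyFileScan":
        # Stand-in for FileScan::withFilters: returns a copy carrying the predicates.
        return replace(self, partition_filters=tuple(filters))

    def partitions_to_scan(self) -> List[Partition]:
        # A partition is scanned only if it satisfies every pushed-down filter.
        return [p for p in self.partitions
                if all(f(p) for f in self.partition_filters)]


def prune(scan: ToyFileScan, filters: List[PartitionFilter],
          require_non_empty_schema: bool) -> ToyFileScan:
    # require_non_empty_schema=True mimics the `scan.readDataSchema.nonEmpty`
    # condition quoted in the report; False shows what happens without it.
    schema_ok = bool(scan.read_data_schema) or not require_non_empty_schema
    if filters and schema_ok:
        return scan.with_filters(filters)
    return scan  # filters dropped: every partition will be read


def is_0812(p: Partition) -> bool:
    return p["dt"] == "0812"


parts = tuple({"dt": d} for d in ("0811", "0812", "0813"))

# select count(*) from tbl where dt='0812': no data columns are read.
count_scan = ToyFileScan(read_data_schema=(), partitions=parts)
print(len(prune(count_scan, [is_0812], True).partitions_to_scan()))   # 3
print(len(prune(count_scan, [is_0812], False).partitions_to_scan()))  # 1

# select value from tbl where dt='0812': the guard passes and pruning works.
value_scan = ToyFileScan(read_data_schema=("value",), partitions=parts)
print(len(prune(value_scan, [is_0812], True).partitions_to_scan()))   # 1
```

The call with `require_non_empty_schema=False` only shows what pruning would yield if the guard did not block it; it is not a description of how SPARK-35985 actually changed the rule.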



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36776) Partition filter of DataSourceV2ScanRelation can not push down when select none dataSchema from FileScan

2021-09-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417015#comment-17417015
 ] 

Apache Spark commented on SPARK-36776:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34037




[jira] [Commented] (SPARK-36776) Partition filter of DataSourceV2ScanRelation can not push down when select none dataSchema from FileScan

2021-09-17 Thread suheng.cloud (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416987#comment-17416987
 ] 

suheng.cloud commented on SPARK-36776:
--

Thank you, Hyukjin and Huaxin!




[jira] [Commented] (SPARK-36776) Partition filter of DataSourceV2ScanRelation can not push down when select none dataSchema from FileScan

2021-09-17 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416944#comment-17416944
 ] 

Huaxin Gao commented on SPARK-36776:


This is fixed in Spark master/3.2 by this PR: 
https://github.com/apache/spark/pull/33191. I will open a PR to backport the 
fix to 3.1.




[jira] [Commented] (SPARK-36776) Partition filter of DataSourceV2ScanRelation can not push down when select none dataSchema from FileScan

2021-09-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416451#comment-17416451
 ] 

Hyukjin Kwon commented on SPARK-36776:
--

I think this is fixed by SPARK-36351. cc [~huaxingao] FYI
