[ https://issues.apache.org/jira/browse/SPARK-43170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713373#comment-17713373 ]
Hyukjin Kwon commented on SPARK-43170:
--------------------------------------

Spark 3.2.x is EOL. Mind trying whether the same issue persists in a newer version?

> The Spark SQL LIKE statement is pushed down to Parquet for execution, but the
> data cannot be queried
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43170
>                 URL: https://issues.apache.org/jira/browse/SPARK-43170
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.2
>            Reporter: todd
>            Priority: Major
>         Attachments: image-2023-04-18-10-59-30-199.png
>
> -- DDL
> {code:sql}
> CREATE TABLE `ecom_dwm`.`dwm_user_app_action_sum_all` (
>   `gaid` STRING COMMENT '',
>   `beyla_id` STRING COMMENT '',
>   `dt` STRING,
>   `hour` STRING,
>   `appid` STRING COMMENT 'package name')
> USING parquet
> PARTITIONED BY (dt, hour, appid)
> LOCATION 's3://xxxxx/dwm_user_app_action_sum_all'
> {code}
>
> -- Partition info
> {code:sql}
> SHOW PARTITIONS ecom_dwm.dwm_user_app_action_sum_all PARTITION (dt='20230412');
> {code}
>
> dt=20230412/hour=23/appid=blibli.mobile.commerce
> dt=20230412/hour=23/appid=cn.shopee.app
> dt=20230412/hour=23/appid=cn.shopee.br
> dt=20230412/hour=23/appid=cn.shopee.id
> dt=20230412/hour=23/appid=cn.shopee.my
> dt=20230412/hour=23/appid=cn.shopee.ph
>
> -- Query
> {code:sql}
> SELECT DISTINCT(appid) FROM ecom_dwm.dwm_user_app_action_sum_all
> WHERE dt = '20230412' AND appid LIKE '%shopee%';
> {code}
>
> -- Result
> No data is returned.
>
> -- Other
> With Spark 3.0.1, and with the Trino query engine, the same data can be queried.
>
> The physical scan node produced by Spark 3.2:
> {code:java}
> (3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all
> Output [3]: [dt#63, hour#64, appid#65]
> Batched: true
> Location: InMemoryFileIndex []
> PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412), Contains(appid#65, shopee)]
> ReadSchema: struct<>
> {code}
>
> !image-2023-04-18-10-59-30-199.png!
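To make the expected behavior concrete, the following plain-Python sketch (an illustration, not Spark internals) applies the substring semantics of the `Contains(appid#65, shopee)` partition filter from the plan to the partition values reported by SHOW PARTITIONS above. All five `cn.shopee.*` partitions should survive the filter, so the query ought to return rows; returning no data is what makes this a bug.

```python
# Plain-Python illustration of the pushed-down partition filter.
# `contains` mimics the substring semantics of Spark's Contains predicate;
# the partition values are the appid components listed in the issue.
partitions = [
    "blibli.mobile.commerce",
    "cn.shopee.app",
    "cn.shopee.br",
    "cn.shopee.id",
    "cn.shopee.my",
    "cn.shopee.ph",
]

def contains(value: str, needle: str) -> bool:
    """Substring-match semantics, as in appid LIKE '%shopee%'."""
    return needle in value

matching = [p for p in partitions if contains(p, "shopee")]
print(matching)  # every partition except blibli.mobile.commerce
```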
>
> -- SQL plan detail
> {code:java}
> == Physical Plan ==
> CollectLimit (9)
> +- InMemoryTableScan (1)
>    +- InMemoryRelation (2)
>       +- * HashAggregate (8)
>          +- Exchange (7)
>             +- * HashAggregate (6)
>                +- * Project (5)
>                   +- * ColumnarToRow (4)
>                      +- Scan parquet ecom_dwm.dwm_user_app_action_sum_all (3)
>
> (1) InMemoryTableScan
> Output [1]: [appid#65]
> Arguments: [appid#65]
>
> (2) InMemoryRelation
> Arguments: [appid#65], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@ab5af13,StorageLevel(disk, memory, deserialized, 1 replicas),*(2) HashAggregate(keys=[appid#65], functions=[], output=[appid#65])
> +- Exchange hashpartitioning(appid#65, 200), ENSURE_REQUIREMENTS, [plan_id=24]
>    +- *(1) HashAggregate(keys=[appid#65], functions=[], output=[appid#65])
>       +- *(1) Project [appid#65]
>          +- *(1) ColumnarToRow
>             +- FileScan parquet ecom_dwm.dwm_user_app_action_sum_all[dt#63,hour#64,appid#65] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412), StartsWith(appid#65, com)], PushedFilters: [], ReadSchema: struct<>
> ,None)
>
> (3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all
> Output [3]: [dt#63, hour#64, appid#65]
> Batched: true
> Location: InMemoryFileIndex []
> PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412), StartsWith(appid#65, com)]
> ReadSchema: struct<>
>
> (4) ColumnarToRow [codegen id : 1]
> Input [3]: [dt#63, hour#64, appid#65]
>
> (5) Project [codegen id : 1]
> Output [1]: [appid#65]
> Input [3]: [dt#63, hour#64, appid#65]
>
> (6) HashAggregate [codegen id : 1]
> Input [1]: [appid#65]
> Keys [1]: [appid#65]
> Functions: []
> Aggregate Attributes: []
> Results [1]: [appid#65]
>
> (7) Exchange
> Input [1]: [appid#65]
> Arguments: hashpartitioning(appid#65, 200), ENSURE_REQUIREMENTS, [plan_id=24]
>
> (8) HashAggregate [codegen id : 2]
> Input [1]: [appid#65]
> Keys [1]: [appid#65]
> Functions: []
> Aggregate Attributes: []
> Results [1]: [appid#65]
>
> (9) CollectLimit
> Input [1]: [appid#65]
> Arguments: 1
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
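The plans above show two different rewritten predicates: `Contains(appid#65, shopee)` for the `'%shopee%'` pattern in the query, and `StartsWith(appid#65, com)` (apparently from a separate run with a `'com%'`-style pattern). This mapping comes from Spark's LikeSimplification optimizer rule, which rewrites LIKE patterns whose only wildcards are a leading and/or trailing `%` into plain string predicates. A hedged plain-Python approximation of that rule (not Spark source; the function name `simplify_like` is invented for illustration):

```python
# Approximation of Spark's LikeSimplification rewrite: LIKE patterns with
# only leading/trailing '%' wildcards become simple string predicates.
# This is an illustrative sketch, not the actual Catalyst rule.
def simplify_like(pattern: str) -> str:
    """Return the predicate name a simple LIKE pattern is rewritten to."""
    if "_" in pattern:
        return "Like"  # '_' wildcards are not simplified in this sketch
    has_trailing = pattern.endswith("%")
    has_leading = pattern.startswith("%")
    core = pattern.strip("%")
    if "%" in core:
        return "Like"  # interior '%' wildcards stay as a generic LIKE
    if has_leading and has_trailing:
        return f"Contains({core})"
    if has_trailing:
        return f"StartsWith({core})"
    if has_leading:
        return f"EndsWith({core})"
    return f"EqualTo({core})"

print(simplify_like("%shopee%"))  # Contains(shopee), as in the query's plan
print(simplify_like("com%"))      # StartsWith(com), as in the cached plan
```

Because the rewritten predicate lands in PartitionFilters rather than PushedFilters here, the bug is in partition pruning on the string-typed `appid` partition column, not in Parquet row-group filtering.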