[ https://issues.apache.org/jira/browse/SPARK-43170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
todd updated SPARK-43170:
-------------------------
    Description:

--DDL
CREATE TABLE `ecom_dwm`.`dwm_user_app_action_sum_all` (
  `gaid` STRING COMMENT '',
  `beyla_id` STRING COMMENT '',
  `dt` STRING,
  `hour` STRING,
  `appid` STRING COMMENT '包名')
USING parquet
PARTITIONED BY (dt, hour, appid)
LOCATION 's3://xxxxx/dwm_user_app_action_sum_all'

-- partitions info
show partitions ecom_dwm.dwm_user_app_action_sum_all PARTITION (dt='20230412');

dt=20230412/hour=23/appid=blibli.mobile.commerce
dt=20230412/hour=23/appid=cn.shopee.app
dt=20230412/hour=23/appid=cn.shopee.br
dt=20230412/hour=23/appid=cn.shopee.id
dt=20230412/hour=23/appid=cn.shopee.my
dt=20230412/hour=23/appid=cn.shopee.ph

--- query
select DISTINCT(appid) from ecom_dwm.dwm_user_app_action_sum_all
where dt='20230412' and appid like '%shopee%'

--result
no data

--- other
I also queried the data with Spark 3.0.1 and with the Trino query engine.

The physical plan node produced by Spark 3.2:

(3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all
Output [3]: [dt#63, hour#64, appid#65]
Batched: true
Location: InMemoryFileIndex []
PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412), Contains(appid#65, shopee)]
ReadSchema: struct<>

!image-2023-04-18-10-59-30-199.png!
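For reference, the result the query would be expected to return can be worked out by hand from the partition listing above. The following is only a minimal Python sketch, not Spark code: it applies the same predicates that appear in PartitionFilters (dt equals 20230412, appid contains "shopee") to the listed partition paths. The helper names `parse_partition` and `matches` are hypothetical, and the sketch assumes every listed partition holds at least one row.

```python
# Partition paths copied from the SHOW PARTITIONS output in the description.
partitions = [
    "dt=20230412/hour=23/appid=blibli.mobile.commerce",
    "dt=20230412/hour=23/appid=cn.shopee.app",
    "dt=20230412/hour=23/appid=cn.shopee.br",
    "dt=20230412/hour=23/appid=cn.shopee.id",
    "dt=20230412/hour=23/appid=cn.shopee.my",
    "dt=20230412/hour=23/appid=cn.shopee.ph",
]


def parse_partition(path):
    """Split 'k1=v1/k2=v2/...' into a dict of partition column values."""
    return dict(part.split("=", 1) for part in path.split("/"))


def matches(spec):
    """Mirror the plan's PartitionFilters: non-null dt and appid,
    dt = '20230412', and appid containing 'shopee'."""
    dt = spec.get("dt")
    appid = spec.get("appid")
    return (
        dt is not None
        and appid is not None
        and dt == "20230412"
        and "shopee" in appid
    )


# DISTINCT(appid) over the partitions that survive the filter.
expected_appids = sorted(
    {parse_partition(p)["appid"] for p in partitions if matches(parse_partition(p))}
)
print(expected_appids)
```

Under those assumptions, five of the six listed partitions satisfy the filter, so an empty result is not what the listing suggests. Note also that the plan shows `Location: InMemoryFileIndex []`, i.e. no paths were selected for the scan.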
> The spark sql like statement is pushed down to parquet for execution, but the data cannot be queried
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43170
>                 URL: https://issues.apache.org/jira/browse/SPARK-43170
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.2
>            Reporter: todd
>            Priority: Blocker
>         Attachments: image-2023-04-18-10-59-30-199.png