[ 
https://issues.apache.org/jira/browse/SPARK-43170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

todd updated SPARK-43170:
-------------------------
    Description: 
--DDL

CREATE TABLE `ecom_dwm`.`dwm_user_app_action_sum_all` (
  `gaid` STRING COMMENT '',
  `beyla_id` STRING COMMENT '',
  `dt` STRING,
  `hour` STRING,
  `appid` STRING COMMENT '包名')
USING parquet
PARTITIONED BY (dt, hour, appid)
LOCATION 's3://xxxxx/dwm_user_app_action_sum_all'

-- partitions  info
show partitions ecom_dwm.dwm_user_app_action_sum_all PARTITION (dt='20230412');
 
dt=20230412/hour=23/appid=blibli.mobile.commerce
dt=20230412/hour=23/appid=cn.shopee.app
dt=20230412/hour=23/appid=cn.shopee.br
dt=20230412/hour=23/appid=cn.shopee.id
dt=20230412/hour=23/appid=cn.shopee.my
dt=20230412/hour=23/appid=cn.shopee.ph
 
--- query
select DISTINCT(appid) from ecom_dwm.dwm_user_app_action_sum_all
where dt='20230412' and appid like '%shopee%'
 
--result
 nodata 
 
--- other
I use spark3.0.1 version and trino query engine to query the data。
 
 
The physical execution node formed by spark 3.2
(3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all Output [3]: [dt#63, 
hour#64, appid#65] Batched: true Location: InMemoryFileIndex []
PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412), 
Contains(appid#65, shopee)] ReadSchema: struct<>
 
 
!image-2023-04-18-10-59-30-199.png!
 
 

> The spark sql like statement is pushed down to parquet for execution, but the 
> data cannot be queried
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43170
>                 URL: https://issues.apache.org/jira/browse/SPARK-43170
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.2
>            Reporter: todd
>            Priority: Blocker
>         Attachments: image-2023-04-18-10-59-30-199.png
>
>
> --DDL
> CREATE TABLE `ecom_dwm`.`dwm_user_app_action_sum_all` (
>   `gaid` STRING COMMENT '',
>   `beyla_id` STRING COMMENT '',
>   `dt` STRING,
>   `hour` STRING,
>   `appid` STRING COMMENT '包名')
> USING parquet
> PARTITIONED BY (dt, hour, appid)
> LOCATION 's3://xxxxx/dwm_user_app_action_sum_all'
> -- partitions  info
> show partitions ecom_dwm.dwm_user_app_action_sum_all PARTITION 
> (dt='20230412');
>  
> dt=20230412/hour=23/appid=blibli.mobile.commerce
> dt=20230412/hour=23/appid=cn.shopee.app
> dt=20230412/hour=23/appid=cn.shopee.br
> dt=20230412/hour=23/appid=cn.shopee.id
> dt=20230412/hour=23/appid=cn.shopee.my
> dt=20230412/hour=23/appid=cn.shopee.ph
>  
> --- query
> select DISTINCT(appid) from ecom_dwm.dwm_user_app_action_sum_all
> where dt='20230412' and appid like '%shopee%'
>  
> --result
>  nodata 
>  
> --- other
> I use spark3.0.1 version and trino query engine to query the data。
>  
>  
> The physical execution node formed by spark 3.2
> (3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all Output [3]: [dt#63, 
> hour#64, appid#65] Batched: true Location: InMemoryFileIndex []
> PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412), 
> Contains(appid#65, shopee)] ReadSchema: struct<>
>  
>  
> !image-2023-04-18-10-59-30-199.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to