icyjhl created SPARK-40610: ------------------------------ Summary: Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause Key: SPARK-40610 URL: https://issues.apache.org/jira/browse/SPARK-40610 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Environment: edw.tmp_test_metastore_usage_source is a big table with 1000 partitions and hundreds of columns Reporter: icyjhl
When I run a insert overwrite statement, I got error saying: {code:java} MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. listPartitions {code} It's weird as I only selected for about 3 partitions, so I rerun the sql and checked the metastore, then I found it's fetching all columns in all partitions: {code:java} select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code} After testing, I found the problem is with the date_add function in where clause, if remove it ,sql works fine, else metastore would fetch all columns in all partitions. {code:java} insert overwrite table test.tmp_test_metastore_usage SELECT userid ,SUBSTR(sendtime,1,10) AS creation_date ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS bh_esdate_deltadays_max ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS bh_esdate_deltadays_min ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min ,json_bh_industryphyname AS bh_industryphyname ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS bh_industryphyname_nunique ,cast(current_timestamp() as string) as dw_cre_date ,cast(current_timestamp() as string) as dw_upd_date FROM ( SELECT userid ,sendtime ,json_bh_esdate_deltadays_max ,json_bh_qiye_industryphyname ,json_bh_esdate_deltadays_min ,json_bh_subconam_min ,json_bh_qiye_regcap_min ,json_bh_industryphyname ,json_bh_subconam_mean ,json_bh_industryphyname_nunique ,row_number() OVER ( PARTITION BY userid,dt ORDER BY sendtime DESC ) rn FROM edw.tmp_test_metastore_usage_source WHERE dt >= date_add('2022-09-22',-3 ) AND json_bizid IN ('6101') AND json_dingid IN ('611') ) t WHERE rn = 1 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org