[ https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614323#comment-17614323 ]
icyjhl commented on SPARK-40610:
--------------------------------

Hi [~Zing], this works fine in Spark 2 and Spark 3.0.1; did anything change in 3.2.1?

> Spark falls back to getPartitions instead of getPartitionsByFilter when the
> date_add function is used in a where clause
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40610
>                 URL: https://issues.apache.org/jira/browse/SPARK-40610
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>        Environment: edw.tmp_test_metastore_usage_source is a big table with 1000 partitions and hundreds of columns
>            Reporter: icyjhl
>            Priority: Major
>        Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
> When I ran an insert overwrite statement, I got an error saying:
>
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s.
> listPartitions {code}
>
> This is weird, as I only selected about 3 partitions, so I reran the SQL and
> checked the metastore, where I found it was fetching all columns of all
> partitions:
>
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where
> "CD_ID" in
> (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
>
> After testing, I found the problem is the date_add function in the where
> clause: if I remove it, the SQL works fine; otherwise the metastore fetches
> all columns of all partitions.
>
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
>       ,SUBSTR(sendtime,1,10) AS creation_date
>       ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS bh_esdate_deltadays_max
>       ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
>       ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS bh_esdate_deltadays_min
>       ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
>       ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
>       ,json_bh_industryphyname AS bh_industryphyname
>       ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
>       ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS bh_industryphyname_nunique
>       ,cast(current_timestamp() as string) as dw_cre_date
>       ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
>     SELECT userid
>           ,sendtime
>           ,json_bh_esdate_deltadays_max
>           ,json_bh_qiye_industryphyname
>           ,json_bh_esdate_deltadays_min
>           ,json_bh_subconam_min
>           ,json_bh_qiye_regcap_min
>           ,json_bh_industryphyname
>           ,json_bh_subconam_mean
>           ,json_bh_industryphyname_nunique
>           ,row_number() OVER (
>               PARTITION BY userid,dt ORDER BY sendtime DESC
>           ) rn
>     FROM edw.tmp_test_metastore_usage_source
>     WHERE dt >= date_add('2022-09-22',-3)
>     AND json_bizid IN ('6101')
>     AND json_dingid IN ('611')
> ) t
> WHERE rn = 1 {code}
>
> By the way, 2.4.7 works fine.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
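A workaround (my own suggestion, not from the ticket) is to fold date_add('2022-09-22', -3) to a literal on the client before submitting the query, so the partition predicate reaching the metastore is a plain dt >= '2022-09-19' comparison that getPartitionsByFilter can push down. A minimal sketch, assuming the dt partition column holds yyyy-MM-dd strings as in the ticket's query:

```python
from datetime import date, timedelta

def date_add(start: str, days: int) -> str:
    """Mimic Spark SQL's date_add(start_date, num_days):
    shift a yyyy-MM-dd date string by `days` days."""
    y, m, d = map(int, start.split("-"))
    return (date(y, m, d) + timedelta(days=days)).isoformat()

# Fold the expression to a literal client-side so the WHERE clause
# contains only a literal comparison on the partition column.
cutoff = date_add("2022-09-22", -3)

# Hypothetical query template; the table and column names are taken
# from the ticket, the SELECT list is elided for brevity.
query = f"""
SELECT userid, sendtime
FROM edw.tmp_test_metastore_usage_source
WHERE dt >= '{cutoff}'
  AND json_bizid IN ('6101')
"""
print(cutoff)  # 2022-09-19
```

With the literal in place, the Hive metastore only needs to list partitions matching the filter instead of all 1000, which avoids the listPartitions timeout described above.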