[ https://issues.apache.org/jira/browse/SPARK-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-25740:
--------------------------------
    Description:
How to reproduce:
{code:sql}
# spark-sql
create table t1 (a int) stored as parquet;
create table t2 (a int) stored as parquet;
insert into table t1 values (1);
insert into table t2 values (1);
-- clear cache
REFRESH TABLE t1;
REFRESH TABLE t2;
explain select * from t1, t2 where t1.a = t2.a;
-- SortMergeJoin
set spark.sql.statistics.fallBackToHdfs=true;
explain select * from t1, t2 where t1.a = t2.a;
-- SortMergeJoin, it should be BroadcastHashJoin
-- clear cache
REFRESH TABLE t1;
REFRESH TABLE t2;
explain select * from t1, t2 where t1.a = t2.a;
-- BroadcastHashJoin
{code}

  was:
How to reproduce:
{code:sql}
# spark-sql
create table t1 (a int) stored as parquet;
create table t2 (a int) stored as parquet;
insert into table t1 values (1);
insert into table t2 values (1);
exit;

spark-sql
set spark.sql.statistics.fallBackToHdfs=true;
explain select * from t1, t2 where t1.a = t2.a;
-- BroadcastHashJoin
exit;

spark-sql
explain select * from t1, t2 where t1.a = t2.a;
-- SortMergeJoin
set spark.sql.statistics.fallBackToHdfs=true;
explain select * from t1, t2 where t1.a = t2.a;
-- SortMergeJoin, it should be BroadcastHashJoin
exit;
{code}

We need {{LogicalPlanStats.invalidateStatsCache}}, but it seems the only thing we can do is call {{invalidateAllCachedTables}} when executing the SET command:
{code:java}
// Config keys whose change can make cached table statistics stale
val isInvalidateAllCachedTablesKeys = Set(
  SQLConf.ENABLE_FALL_BACK_TO_HDFS_FOR_STATS.key,
  SQLConf.DEFAULT_SIZE_IN_BYTES.key
)

sparkSession.conf.set(key, value)
// Drop all cached table relations (and with them their statistics) when such a key changes
if (isInvalidateAllCachedTablesKeys.contains(key)) {
  sparkSession.sessionState.catalog.invalidateAllCachedTables()
}
{code}


> Refactor DetermineTableStats to invalidate cache when some configuration changed
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-25740
>                 URL: https://issues.apache.org/jira/browse/SPARK-25740
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Priority: Major
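To make the failure mode concrete, here is a hypothetical, heavily simplified Scala model (not Spark's actual code). Assumptions: each table's size estimate is computed once and cached; the estimate is the on-disk file size only when spark.sql.statistics.fallBackToHdfs is true, otherwise spark.sql.defaultSizeInBytes (default Long.MaxValue); and a join is broadcast when the smaller side is at most 10 MB, the default spark.sql.autoBroadcastJoinThreshold (real Spark checks each side separately). The {{Session}} class, its methods and the {{invalidateOnSet}} switch are illustrative inventions; {{invalidateOnSet = true}} mimics the invalidate-on-SET approach discussed in this issue.

{code:scala}
import scala.collection.mutable

// Hypothetical, simplified model: cached size statistics ignore later config changes
// unless the cache is explicitly invalidated. Not Spark's real implementation.
object StatsCacheSketch {
  // Real Spark config keys; defaults below mirror Spark's documented defaults.
  val FallBackToHdfsKey = "spark.sql.statistics.fallBackToHdfs"
  val DefaultSizeInBytesKey = "spark.sql.defaultSizeInBytes"
  val BroadcastThreshold: Long = 10L * 1024 * 1024 // autoBroadcastJoinThreshold default (10 MB)

  // Keys whose change makes cached statistics stale (mirrors the proposal in this issue).
  val invalidatingKeys: Set[String] = Set(FallBackToHdfsKey, DefaultSizeInBytesKey)

  // Hypothetical session: a config map plus a per-table size cache.
  final class Session(fileSizes: Map[String, Long], invalidateOnSet: Boolean) {
    private val conf = mutable.Map[String, String](
      FallBackToHdfsKey -> "false",
      DefaultSizeInBytesKey -> Long.MaxValue.toString)
    private val statsCache = mutable.Map.empty[String, Long]

    // Equivalent of SET key=value; optionally clears every cached size.
    def set(key: String, value: String): Unit = {
      conf(key) = value
      if (invalidateOnSet && invalidatingKeys.contains(key)) statsCache.clear()
    }

    // Equivalent of REFRESH TABLE: forget the cached size of one table.
    def refresh(table: String): Unit = {
      statsCache.remove(table)
      ()
    }

    // Size estimate, computed on first use and then served from the cache.
    def sizeInBytes(table: String): Long =
      statsCache.getOrElseUpdate(table,
        if (conf(FallBackToHdfsKey).toBoolean) fileSizes(table)
        else conf(DefaultSizeInBytesKey).toLong)

    // Simplified strategy choice: broadcast if the smaller side fits under the threshold.
    def joinStrategy(left: String, right: String): String =
      if (math.min(sizeInBytes(left), sizeInBytes(right)) <= BroadcastThreshold) "BroadcastHashJoin"
      else "SortMergeJoin"
  }
}
{code}

With {{invalidateOnSet = false}}, calling {{set(FallBackToHdfsKey, "true")}} still leaves the join as SortMergeJoin until {{refresh}} is called for both tables, matching the reproduction above; with {{invalidateOnSet = true}}, the SET alone switches it to BroadcastHashJoin.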