angerszhu created SPARK-39043:
---------------------------------

             Summary: Hive client should not gather statistic by default.
                 Key: SPARK-39043
                 URL: https://issues.apache.org/jira/browse/SPARK-39043
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 3.2.0, 3.1.2, 3.3.0
            Reporter: angerszhu

When `InsertIntoHiveTable` performs an INSERT OVERWRITE on a partition, it calls Hive.loadPartition(). In that method, when `hive.stats.autogather` is true (the default), the basic stats of a newly created partition are initialized to 0:

{code:java}
// In Hive.loadPartition()
if (oldPart == null) {
  newTPart.getTPartition().setParameters(new HashMap<String,String>());
  if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
    StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(), StatsSetupConst.TRUE);
  }
  ...

// In StatsSetupConst
public static void setBasicStatsStateForCreateTable(Map<String, String> params, String setting) {
  if (TRUE.equals(setting)) {
    for (String stat : StatsSetupConst.supportedStats) {
      params.put(stat, "0");
    }
  }
  setBasicStatsState(params, setting);
}

public static final String[] supportedStats = {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE};
{code}

This sets the default rowNum to 0. Spark then updates only numFiles and rawSize, so rowNum remains 0. This affects other systems that read these statistics, such as Presto's CBO.
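
The effect reported in this issue can be reproduced in isolation with a small, self-contained sketch. It does not call Hive or Spark. The class `StatsAutogatherSketch` and the method `simulateNewPartition` are hypothetical names made up for illustration, the key strings are local copies of the names used by Hive's `StatsSetupConst`, and mapping the "rawSize" mentioned in this report to the `rawDataSize` key is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class StatsAutogatherSketch {
    // Local copies of the stat key names (assumed to match Hive's StatsSetupConst).
    public static final String NUM_FILES = "numFiles";
    public static final String ROW_COUNT = "numRows";
    public static final String TOTAL_SIZE = "totalSize";
    public static final String RAW_DATA_SIZE = "rawDataSize";
    public static final String[] SUPPORTED_STATS =
        {NUM_FILES, ROW_COUNT, TOTAL_SIZE, RAW_DATA_SIZE};

    // Mirrors the zeroing loop of setBasicStatsStateForCreateTable quoted in the issue.
    public static void zeroBasicStats(Map<String, String> params) {
        for (String stat : SUPPORTED_STATS) {
            params.put(stat, "0");
        }
    }

    // Hypothetical stand-in for loading a brand-new partition:
    // 1. the client optionally zeroes all basic stats (autogather on),
    // 2. the writer then updates only the file count and raw size.
    public static Map<String, String> simulateNewPartition(
            boolean autogather, long files, long rawSize) {
        Map<String, String> params = new HashMap<>();
        if (autogather) {
            zeroBasicStats(params);
        }
        params.put(NUM_FILES, Long.toString(files));
        params.put(RAW_DATA_SIZE, Long.toString(rawSize));
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> on = simulateNewPartition(true, 3, 1024);
        Map<String, String> off = simulateNewPartition(false, 3, 1024);
        // With autogather on, the row count is present but stuck at "0".
        System.out.println("autogather=true:  numRows=" + on.get(ROW_COUNT));
        // With autogather off, the row count is absent (null), i.e. unknown.
        System.out.println("autogather=false: numRows=" + off.get(ROW_COUNT));
    }
}
```

The difference that matters to a reader of the stats is that a row count of "0" looks like a real measurement, while a missing key reads as unknown.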