[ https://issues.apache.org/jira/browse/SPARK-39043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39043:
------------------------------------

    Assignee: angerszhu  (was: Apache Spark)

> Hive client should not gather statistic by default.
> ---------------------------------------------------
>
>                 Key: SPARK-39043
>                 URL: https://issues.apache.org/jira/browse/SPARK-39043
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.0, 3.3.0
>            Reporter: angerszhu
>            Assignee: angerszhu
>            Priority: Major
>             Fix For: 3.4.0
>
>
> When `InsertIntoHiveTable` performs an insert overwrite on a partition, it
> calls Hive.loadPartition(). In that method, when `hive.stats.autogather` is
> true (the default), the following code runs:
>
> {code:java}
> // In Hive.loadPartition():
> if (oldPart == null) {
>   newTPart.getTPartition().setParameters(new HashMap<String,String>());
>   if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
>     StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(),
>         StatsSetupConst.TRUE);
>   }
>
> // In StatsSetupConst:
> public static void setBasicStatsStateForCreateTable(Map<String, String> params,
>     String setting) {
>   if (TRUE.equals(setting)) {
>     for (String stat : StatsSetupConst.supportedStats) {
>       params.put(stat, "0");
>     }
>   }
>   setBasicStatsState(params, setting);
> }
>
> public static final String[] supportedStats =
>     {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE};
> {code}
> This sets the default rowNum to 0. Because Spark afterwards updates only
> numFiles and rawSize, rowNum remains 0.
> This affects other systems that rely on these statistics, such as Presto's CBO.
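
The effect described in the issue can be reproduced without Hive or Spark on the classpath. The following is a minimal stand-alone sketch, not code from either project: the class `StatsAutogatherSketch` and its methods are hypothetical, the key strings are assumed to match the values of Hive's `StatsSetupConst` constants, and the choice of `totalSize` as the size statistic that gets refreshed later is likewise an assumption (the report calls it rawSize).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the reported behavior; not Hive or Spark code. */
final class StatsAutogatherSketch {
    // Key strings assumed to mirror Hive's StatsSetupConst constants.
    static final String NUM_FILES = "numFiles";
    static final String ROW_COUNT = "numRows";
    static final String TOTAL_SIZE = "totalSize";
    static final String RAW_DATA_SIZE = "rawDataSize";
    static final String[] SUPPORTED_STATS =
        {NUM_FILES, ROW_COUNT, TOTAL_SIZE, RAW_DATA_SIZE};

    /** Models the new-partition path: with autogather on, every basic stat starts at "0". */
    static Map<String, String> newPartitionParams(boolean autogather) {
        Map<String, String> params = new HashMap<>();
        if (autogather) {
            for (String stat : SUPPORTED_STATS) {
                params.put(stat, "0");
            }
        }
        return params;
    }

    /** Assumed later step: only file-level stats are refreshed, the row count is not. */
    static void refreshFileStats(Map<String, String> params, int numFiles, long totalSize) {
        params.put(NUM_FILES, Integer.toString(numFiles));
        params.put(TOTAL_SIZE, Long.toString(totalSize));
    }

    /** Runs both steps and returns the resulting partition parameters. */
    static Map<String, String> simulateLoadPartition(boolean autogather, int numFiles, long totalSize) {
        Map<String, String> params = newPartitionParams(autogather);
        refreshFileStats(params, numFiles, totalSize);
        return params;
    }

    public static void main(String[] args) {
        // TreeMap is used only to print the keys in a stable order.
        // Autogather on: numRows is present and stuck at 0 although data was written.
        System.out.println(new TreeMap<>(simulateLoadPartition(true, 3, 1024L)));
        // Autogather off: numRows is absent, so a reader can tell it is unknown.
        System.out.println(new TreeMap<>(simulateLoadPartition(false, 3, 1024L)));
    }
}
```

In this model, a consumer that trusts the parameters sees a non-empty partition reporting `numRows=0` when autogather is on, whereas with autogather off the key is missing and can be treated as unknown. That difference is what the issue title argues for.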