[ https://issues.apache.org/jira/browse/HIVE-24313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa resolved HIVE-24313.
-----------------------------------
    Resolution: Fixed

[#3639|https://github.com/apache/hive/pull/3639] was merged to master. Thanks [~difin] for the patch.

> Optimise stats collection for file sizes on cloud storage
> ---------------------------------------------------------
>
>                 Key: HIVE-24313
>                 URL: https://issues.apache.org/jira/browse/HIVE-24313
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When stats information is not present (e.g. for an external table), RelOptHiveTable
> computes basic stats at runtime.
> The code path is as follows.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
>     hiveTblMetadata, hiveNonPartitionCols,
>     nonPartColNamesThatRqrStats, colStatsCached,
>     nonPartColNamesThatRqrStats, true);
> {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
>   BasicStats basicStats =
>       basicStatsFactory.build(Partish.buildFor(table, p));
>   partStats.add(basicStats);
> }
> {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>
> {code:java}
> try {
>   ds = getFileSizeForPath(path);
> } catch (IOException e) {
>   ds = 0L;
> }
> {code}
>
> For a table and query with a large number of partitions, the file sizes are
> looked up one partition at a time, so computing statistics takes a long time
> and increases compilation time. It would be good to fix
> it with "ForkJoinPool" (
> partList.getNotDeniedPartns().parallelStream().forEach((p) )
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
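
For illustration, the "ForkJoinPool" idea from the description could look roughly like the sketch below. This is not the code merged in #3639 and it does not use Hive classes: ParallelSizeSketch, SizeLookup and collectSizes are hypothetical names, and SizeLookup only stands in for a per-partition lookup such as getFileSizeForPath. Note that running a parallel stream inside a dedicated ForkJoinPool relies on common JDK behaviour rather than a documented guarantee.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names exist in Hive.
final class ParallelSizeSketch {

    /** Stand-in for a per-partition size lookup that may call remote storage. */
    @FunctionalInterface
    interface SizeLookup {
        long sizeOf(String path) throws IOException;
    }

    /**
     * Looks up the size of every path on a dedicated pool instead of one by one.
     * A failed lookup is recorded as 0, mirroring the catch block quoted above.
     * Results keep the order of the input list.
     */
    static List<Long> collectSizes(List<String> paths, SizeLookup lookup, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // Submitting the stream pipeline as a task makes its parallel work
            // run on this pool's threads rather than on the common pool.
            return pool.submit(() ->
                    paths.parallelStream()
                         .map(p -> {
                             try {
                                 return lookup.sizeOf(p);
                             } catch (IOException e) {
                                 return 0L;
                             }
                         })
                         .collect(Collectors.toList())
            ).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

A bounded, dedicated pool keeps slow cloud-storage calls from occupying the JVM-wide common pool; the parallelism value would need tuning against the storage backend's request limits.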