[ https://issues.apache.org/jira/browse/HIVE-24313?focusedWorklogId=814273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-814273 ]
ASF GitHub Bot logged work on HIVE-24313:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/22 07:38
            Start Date: 06/Oct/22 07:38
    Worklog Time Spent: 10m
      Work Description: sonarcloud[bot] commented on PR #3639:
URL: https://github.com/apache/hive/pull/3639#issuecomment-1269499492

   Kudos, SonarCloud Quality Gate passed!
   [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3639)

   [6 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3639&resolved=false&types=BUG) (rating C)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3639&resolved=false&types=VULNERABILITY) (rating A)
   [1 Security Hotspot](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3639&resolved=false&types=SECURITY_HOTSPOT) (rating E)
   [63 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3639&resolved=false&types=CODE_SMELL) (rating A)

   No Coverage information
   No Duplication information


Issue Time Tracking
-------------------

    Worklog Id:     (was: 814273)
    Time Spent: 2h 10m  (was: 2h)

> Optimise stats collection for file sizes on cloud storage
> ---------------------------------------------------------
>
>                 Key: HIVE-24313
>                 URL: https://issues.apache.org/jira/browse/HIVE-24313
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When stats information is not present (e.g. for an external table),
> RelOptHiveTable computes basic stats at runtime.
> The code path is as follows.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
>     hiveTblMetadata, hiveNonPartitionCols,
>     nonPartColNamesThatRqrStats, colStatsCached,
>     nonPartColNamesThatRqrStats, true);
> {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
>   BasicStats basicStats =
>       basicStatsFactory.build(Partish.buildFor(table, p));
>   partStats.add(basicStats);
> }
> {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>
> {code:java}
> try {
>   ds = getFileSizeForPath(path);
> } catch (IOException e) {
>   ds = 0L;
> }
> {code}
>
> For a table and query with a large number of partitions, the file size
> lookups run one partition at a time, so computing statistics takes a long
> time and increases compilation time. It would be good to fix this with a
> "ForkJoinPool", e.g.
> partList.getNotDeniedPartns().parallelStream().forEach(p -> ...)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
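
For illustration only, the ForkJoinPool idea from the description could be sketched roughly as below. This is not the code from PR #3639 and does not use Hive classes: ParallelFileSizeSketch, SizeLookup and collectSizes are hypothetical names, and SizeLookup stands in for a per-path call such as getFileSizeForPath. The sketch also assumes that a parallel stream started from inside a ForkJoinPool task runs on that pool, which is a commonly relied-upon behaviour rather than a documented guarantee.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelFileSizeSketch {

  /** Hypothetical stand-in for a per-path lookup such as getFileSizeForPath. */
  @FunctionalInterface
  public interface SizeLookup {
    long sizeOf(String path) throws IOException;
  }

  /**
   * Looks up the size of every path in parallel and returns the sizes in the
   * same order as the input list. A failed lookup contributes 0, mirroring
   * the catch block quoted in the issue description.
   */
  public static List<Long> collectSizes(List<String> paths, SizeLookup lookup, int parallelism)
      throws InterruptedException, ExecutionException {
    // A dedicated pool bounds the number of concurrent storage calls.
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      Callable<List<Long>> task = () -> paths.parallelStream()
          .map(path -> {
            try {
              return lookup.sizeOf(path);
            } catch (IOException e) {
              return 0L; // same fallback as the sequential code
            }
          })
          .collect(Collectors.toList());
      return pool.submit(task).get();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Fake lookup: the "size" is the path length; one path fails on purpose.
    SizeLookup fake = path -> {
      if (path.equals("missing")) {
        throw new IOException("no such path");
      }
      return path.length();
    };
    List<String> paths = Arrays.asList("part=1", "part=22", "missing");
    System.out.println(collectSizes(paths, fake, 4));
  }
}
```

Collecting into a list instead of calling forEach with a shared partStats list avoids concurrent writes to a non-thread-safe collection, which a direct parallelStream().forEach(...) rewrite of the loop would otherwise need to handle.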