[ 
https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831063
 ]

ASF GitHub Bot logged work on HIVE-26221:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Dec/22 15:10
            Start Date: 05/Dec/22 15:10
    Worklog Time Spent: 10m 
      Work Description: sonarcloud[bot] commented on PR #3137:
URL: https://github.com/apache/hive/pull/3137#issuecomment-1337544903

   Kudos, SonarCloud Quality Gate passed!    [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3137)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=BUG)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=BUG)
 [0 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3137&resolved=false&types=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3137&resolved=false&types=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3137&resolved=false&types=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=CODE_SMELL)
 [38 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3137&resolved=false&types=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3137&metric=coverage&view=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3137&metric=duplicated_lines_density&view=list)
 No Duplication information
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 831063)
    Time Spent: 3h 50m  (was: 3h 40m)

> Add histogram-based column statistics
> -------------------------------------
>
>                 Key: HIVE-26221
>                 URL: https://issues.apache.org/jira/browse/HIVE-26221
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO, Metastore, Statistics
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Hive does not support histogram statistics, which are particularly useful for 
> skewed data (which is very common in practice) and range predicates.
> Hive's current selectivity estimation for range predicates is based on a 
> hard-coded value of 1/3 (see 
> [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]).])
> The current proposal aims at integrating histogram as an additional column 
> statistics, stored into the Hive metastore at the table (or partition) level.
> The main requirements for histogram integration are the following:
>  * efficiency: the approach must scale and support billions of rows
>  * merge-ability: partition-level histograms have to be merged to form 
> table-level histograms
>  * explicit and configurable trade-off between memory footprint and accuracy
> Hive already integrates [KLL data 
> sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. 
> Datasketches are small, stateful programs that process massive data-streams 
> and can provide approximate answers, with mathematical guarantees, to 
> computationally difficult queries orders-of-magnitude faster than 
> traditional, exact methods.
> We propose to use KLL, and more specifically the cumulative distribution 
> function (CDF), as the underlying data structure for our histogram statistics.
> The current proposal targets numeric data types (float, integer and numeric 
> families) and temporal data types (date and timestamp).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to