Zhenhua Wang created SPARK-18000:
------------------------------------

Summary: Aggregation function for computing endpoints for numeric histograms
Key: SPARK-18000
URL: https://issues.apache.org/jira/browse/SPARK-18000
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.1.0
Reporter: Zhenhua Wang
For a column of numeric type (including date and timestamp), we will generate an equi-width or an equi-height histogram, depending on whether its ndv (number of distinct values) is larger than the maximum number of bins allowed in one histogram (denoted as numBins). This aggregate function computes values and their frequencies using a small hashmap holding at most numBins entries, and returns an equi-width histogram. When the hashmap would need more than numBins entries, it clears the hashmap and uses ApproximatePercentile to return the endpoints of an equi-height histogram.
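The two-mode behaviour described above can be sketched as follows. This is a minimal, hypothetical illustration, not Spark code: the class `HistogramEndpointsAgg` and its methods are invented names. Spark's ApproximatePercentile is replaced here by an exact sort over a buffer of every value seen, so unlike the real quantile summary this stand-in is not memory-bounded. Endpoints are taken at the percentiles i / numBins, which is an assumed choice.

```python
from typing import Dict, List, Tuple


class HistogramEndpointsAgg:
    """Hypothetical sketch of the SPARK-18000 aggregation (names are illustrative)."""

    def __init__(self, num_bins: int) -> None:
        self.num_bins = num_bins
        # value -> frequency; never holds more than num_bins distinct keys
        self.counts: Dict[float, int] = {}
        # True once ndv has exceeded num_bins
        self.overflowed = False
        # Stand-in for the ApproximatePercentile buffer. It keeps every value,
        # so unlike a real quantile summary it is NOT memory-bounded.
        self.buffer: List[float] = []

    def update(self, value: float) -> None:
        if self.overflowed:
            self.buffer.append(value)
        elif value in self.counts or len(self.counts) < self.num_bins:
            self.counts[value] = self.counts.get(value, 0) + 1
        else:
            # A (num_bins + 1)-th distinct value: move the map's contents into the
            # percentile buffer, clear the map, and switch to equi-height mode.
            for v, c in self.counts.items():
                self.buffer.extend([v] * c)
            self.counts.clear()
            self.overflowed = True
            self.buffer.append(value)

    def result(self) -> Tuple[str, list]:
        if not self.overflowed:
            # ndv <= num_bins: every distinct value with its exact frequency
            return ("equi-width", sorted(self.counts.items()))
        # ndv > num_bins: num_bins + 1 endpoints at the i / num_bins percentiles
        data = sorted(self.buffer)
        n = len(data)
        endpoints = [data[min(n - 1, i * n // self.num_bins)]
                     for i in range(self.num_bins + 1)]
        return ("equi-height", endpoints)
```

For example, feeding 1, 2, 2, 3 with num_bins = 3 stays in hashmap mode and yields the pairs (1, 1), (2, 2), (3, 1). Feeding 1 through 10 with num_bins = 4 overflows on the fifth distinct value and yields the five endpoints 1, 3, 6, 8, 10. The first endpoint is the minimum and the last is the maximum, so the bins span the whole column.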