[ https://issues.apache.org/jira/browse/SPARK-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
gagan taneja updated SPARK-18940:
---------------------------------
    Shepherd: Herman van Hovell

> Percentile and approximate percentile support for frequency distribution table
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-18940
>                 URL: https://issues.apache.org/jira/browse/SPARK-18940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: gagan taneja
>
> I have a frequency distribution table with the following entries:
> {noformat}
> Age, No of person
> 21, 10
> 22, 15
> 23, 18
> ..
> ..
> 30, 14
> {noformat}
> It is common to have data in frequency distribution format and to need the
> percentile or median computed from it. With the current implementation,
> finding the percentile of such data is difficult and complex.
> Therefore I am proposing an enhancement to the current Percentile and
> ApproxPercentile implementations so that they take a frequency distribution
> column into consideration.
> Current Percentile definition:
> {noformat}
> percentile(col, array(percentage1 [, percentage2]...))
> case class Percentile(
>     child: Expression,
>     percentageExpression: Expression,
>     mutableAggBufferOffset: Int = 0,
>     inputAggBufferOffset: Int = 0) {
>   def this(child: Expression, percentageExpression: Expression) = {
>     this(child, percentageExpression, 0, 0)
>   }
> }
> {noformat}
> Proposed changes:
> {noformat}
> percentile(col, [frequency], array(percentage1 [, percentage2]...))
> case class Percentile(
>     child: Expression,
>     frequency: Expression,
>     percentageExpression: Expression,
>     mutableAggBufferOffset: Int = 0,
>     inputAggBufferOffset: Int = 0) {
>   def this(child: Expression, percentageExpression: Expression) = {
>     this(child, Literal(1L), percentageExpression, 0, 0)
>   }
>   def this(child: Expression, frequency: Expression, percentageExpression: Expression) = {
>     this(child, frequency, percentageExpression, 0, 0)
>   }
> }
> {noformat}
> Although this definition will differ from the Hive implementation, it will be
> useful functionality for many Spark users.
> Moreover, the changes are local to the Percentile and ApproxPercentile
> implementations.
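
To make the intended semantics concrete, here is a minimal plain-Scala sketch of an exact percentile over (value, frequency) pairs, computed as if each value were repeated `frequency` times. It is not Spark code and not the patch itself: the object name `FrequencyPercentile`, the `percentile` signature, and the choice of linear interpolation between the two nearest ranks are all assumptions made for illustration.

```scala
// Hypothetical helper, for illustration only; not part of Spark.
object FrequencyPercentile {

  /** Exact percentile of a frequency table, treating each (value, frequency)
    * row as `frequency` copies of `value`. Interpolates linearly between the
    * two nearest ranks (an assumption of this sketch).
    */
  def percentile(table: Seq[(Double, Long)], p: Double): Double = {
    require(p >= 0.0 && p <= 1.0, "percentage must be in [0, 1]")
    require(table.forall(_._2 >= 0L), "frequencies must be non-negative")

    // Rows with a zero frequency contribute nothing, so drop them.
    val sorted = table.filter(_._2 > 0L).sortBy(_._1)
    require(sorted.nonEmpty, "need at least one row with a positive frequency")

    // cumulative(i) = number of expanded rows whose value is <= sorted(i)._1
    val cumulative = sorted.scanLeft(0L)(_ + _._2).tail
    val total = cumulative.last

    // Value found at a zero-based rank of the virtual expanded, sorted data.
    def valueAt(rank: Long): Double =
      sorted(cumulative.indexWhere(_ > rank))._1

    val position = (total - 1) * p
    val lower = math.floor(position).toLong
    val upper = math.ceil(position).toLong
    val lowerValue = valueAt(lower)

    if (lower == upper) lowerValue
    else lowerValue + (position - lower) * (valueAt(upper) - lowerValue)
  }
}
```

Using only the three rows spelled out in the issue (21 -> 10, 22 -> 15, 23 -> 18), `FrequencyPercentile.percentile(Seq((21.0, 10L), (22.0, 15L), (23.0, 18L)), 0.5)` looks up rank 21 of 43 expanded rows and returns 22.0. Passing a frequency of 1 for every row reduces this to an ordinary percentile over the values, which mirrors the `Literal(1L)` default in the proposed two-argument constructor.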