[ https://issues.apache.org/jira/browse/SPARK-17439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471801#comment-15471801 ]
Tim Hunter commented on SPARK-17439:
------------------------------------

I have a patch for that. It should be merged after SPARK-17306.

> QuantilesSummaries returns the wrong result after compression
> -------------------------------------------------------------
>
>                 Key: SPARK-17439
>                 URL: https://issues.apache.org/jira/browse/SPARK-17439
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tim Hunter
>
> [~clockfly] found the following corner case that returns the wrong quantile
> (off by 1):
> {code}
> test("test QuantileSummaries compression") {
>   var left = new QuantileSummaries(10000, 0.0001)
>   System.out.println("LEFT RIGHT")
>   System.out.println("====================")
>   (0 to 10).foreach { index =>
>     left = left.insert(index)
>     left = left.compress()
>     var right = new QuantileSummaries(10000, 0.0001)
>     (0 to index).foreach(right.insert(_))
>     right = right.compress()
>     System.out.println(s"${left.query(0.5)} ${right.query(0.5)}")
>   }
> }
> {code}
> The result is:
> {code}
> LEFT RIGHT
> ====================
> 0.0 0.0
> 0.0 1.0
> 0.0 1.0
> 0.0 1.0
> 1.0 2.0
> 1.0 2.0
> 2.0 3.0
> 2.0 3.0
> 3.0 4.0
> 3.0 4.0
> 4.0 5.0
> {code}
> The "LEFT" column shows the output when QuantileSummaries is used in a
> window function; the "RIGHT" column shows the expected result. The
> difference between the two columns is that the "LEFT" column compresses
> the QuantileSummaries storage after every insert, while the "RIGHT"
> column compresses only once, after all values have been inserted.