[ 
https://issues.apache.org/jira/browse/SPARK-17439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471801#comment-15471801
 ] 

Tim Hunter commented on SPARK-17439:
------------------------------------

I have a patch for this. It should be merged after SPARK-17306.

> QuantilesSummaries returns the wrong result after compression
> -------------------------------------------------------------
>
>                 Key: SPARK-17439
>                 URL: https://issues.apache.org/jira/browse/SPARK-17439
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tim Hunter
>
> [~clockfly] found the following corner case that returns the wrong quantile 
> (off by 1):
> {code}
> test("test QuantileSummaries compression") {
>   var left = new QuantileSummaries(10000, 0.0001)
>   System.out.println("LEFT      RIGHT")
>   System.out.println("====================")
>   (0 to 10).foreach { index =>
>     left = left.insert(index)
>     left = left.compress()
>     var right = new QuantileSummaries(10000, 0.0001)
>     (0 to index).foreach(right.insert(_))
>     right = right.compress()
>     System.out.println(s"${left.query(0.5)}   ${right.query(0.5)}")
>   }
> }
> {code}
> The result is:
> {code}
> LEFT      RIGHT
> ====================
> 0.0   0.0
> 0.0   1.0
> 0.0   1.0
> 0.0   1.0
> 1.0   2.0
> 1.0   2.0
> 2.0   3.0
> 2.0   3.0
> 3.0   4.0
> 3.0   4.0
> 4.0   5.0
> {code}
> The "LEFT" column shows the output when QuantileSummaries is used in a 
> Window function; the "RIGHT" column shows the expected result. The 
> difference between the two is that the "LEFT" column compresses the 
> QuantileSummaries storage after every insert, while the "RIGHT" column 
> compresses only once, after all values have been inserted.
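
To make the "off by 1" claim concrete, here is a minimal sketch that compares the two columns of the output quoted above. It does not use Spark or QuantileSummaries at all: the values are copied by hand from the reported output, and the object and field names (CompressionOffByOne, gaps, mismatches) are made up for illustration.

```scala
// Hypothetical helper, not part of Spark: it only re-checks the numbers
// printed in the report, copied from the LEFT and RIGHT columns.
object CompressionOffByOne {
  // Median reported after compressing on every insert (LEFT column).
  val left: Seq[Double] = Seq(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0)
  // Median reported after a single final compression (RIGHT column).
  val right: Seq[Double] = Seq(0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0)

  // Per-row gap between the expected value and the value actually returned.
  val gaps: Seq[Double] = left.zip(right).map { case (l, r) => r - l }

  // Loop indices (0 to 10 in the test) at which the two columns disagree.
  val mismatches: Seq[Int] =
    gaps.zipWithIndex.collect { case (gap, index) if gap != 0.0 => index }

  def main(args: Array[String]): Unit = {
    println(s"largest gap: ${gaps.max}")
    println(s"mismatching indices: ${mismatches.mkString(", ")}")
  }
}
```

On the reported numbers the gap is never larger than 1, and the two columns agree only on the first row (index 0).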



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
