GitHub user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19394#discussion_r143285737
  
    --- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
    @@ -58,7 +58,7 @@ class ConfigBehaviorSuite extends QueryTest with 
SharedSQLContext {
           withSQLConf(SQLConf.RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION.key -> 
"1") {
             // If we only sample one point, the range boundaries will be 
pretty bad and the
             // chi-sq value would be very high.
    -        assert(computeChiSquareTest() > 1000)
    +        assert(computeChiSquareTest() > 300)
    --- End diff --
    
    @rxin, the difference causing this to fail is that `rdd.id` is now 15 
instead of 14 because of the change to `getByteArrayRdd`. That id is used to 
seed the random generator, and with the new seed the chi-sq value no longer 
clears the old bound; the previous value must have been unusually high. With 
so few sample points the statistic can vary quite a bit, so I think lowering 
the bound to 300 is a good fix.
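    To illustrate the point, here is a minimal sketch (not using Spark itself) 
of why the chi-sq statistic is so seed-sensitive when only one point is sampled 
per partition. The `skewForSeed` helper and its parameters are hypothetical, 
standing in for what the range exchange does: the seed plays the role of 
`rdd.id`, and each sampled boundary comes from a single random draw, so the 
resulting partition skew swings widely from one seed to the next.

    ```scala
    import scala.util.Random

    // Pearson chi-square statistic for observed partition counts against a
    // uniform expectation (equal-sized partitions).
    def chiSq(counts: Seq[Int]): Double = {
      val expected = counts.sum.toDouble / counts.length
      counts.map(c => math.pow(c - expected, 2) / expected).sum
    }

    // Toy model of range partitioning over the values 0 until n: draw one
    // boundary sample per partition (as with
    // RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION = 1) and count how many values
    // land in each resulting range.
    def skewForSeed(seed: Long, numPartitions: Int = 4, n: Int = 1000): Double = {
      val rng = new Random(seed) // seed stands in for rdd.id
      val boundaries = Seq.fill(numPartitions - 1)(rng.nextInt(n)).sorted
      val counts = (boundaries :+ n).zip(0 +: boundaries).map {
        case (hi, lo) => hi - lo
      }
      chiSq(counts)
    }

    // Adjacent seeds (like rdd ids 14 vs 15) can give very different skew.
    println(skewForSeed(14L))
    println(skewForSeed(15L))
    ```

    A perfectly uniform split gives a statistic of 0, and a single bad boundary 
draw inflates it quickly, which is why a fixed threshold on one seed is fragile.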


---
