Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23065#discussion_r234393799

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala ---

```
@@ -276,10 +276,10 @@ class QuantileDiscretizerSuite extends MLTest with DefaultReadWriteTest {
       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
     val data2 = Array.range(1, 40, 2).map(_.toDouble)
     val expected2 = Array (0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0,
-      2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0)
+      2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0)
```

--- End diff --

Interestingly, avoiding double Ranges actually fixed the code here. You can see the bucketing before didn't quite make sense; now it's even. It's because of:

```scala
scala> (0.0 to 1.0 by 1.0 / 10).toList
<console>:12: warning: method to in trait FractionalProxy is deprecated (since 2.12.6): use BigDecimal range instead
       (0.0 to 1.0 by 1.0 / 10).toList
        ^
res5: List[Double] = List(0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999)

scala> (0 to 10).map(_.toDouble / 10).toList
res6: List[Double] = List(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
```
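A minimal standalone sketch of the effect being discussed (illustrative only, not the PR's actual code — `RangeSketch` and its variable names are made up here): a fractional Range effectively accumulates `+ 0.1` steps, compounding rounding error, while an integer Range divided at the end rounds each element only once.

```scala
object RangeSketch {
  def main(args: Array[String]): Unit = {
    // Integer Range, then divide: each probability is rounded exactly once,
    // so 0.3 and 1.0 come out as the expected doubles.
    val probs = (0 to 10).map(_.toDouble / 10)
    println(probs.toList)

    // Repeated addition of 0.1 (what stepping a double Range amounts to)
    // accumulates error at every step.
    val drifted = Iterator.iterate(0.0)(_ + 0.1).take(11).toList
    println(drifted(3))   // 0.30000000000000004, not 0.3
    println(drifted.last) // slightly below 1.0
  }
}
```

With quantile probabilities like these feeding bucket boundaries, the drifted values shift where elements land, which is consistent with the uneven `expected2` counts before this change.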