[ https://issues.apache.org/jira/browse/SPARK-31306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben updated SPARK-31306:
------------------------
    Description:
The rand() function in PySpark, Spark, and R is documented as drawing from U[0.0, 1.0]. This suggests an inclusive upper bound, so it can be confusing (i.e. for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` suggests the returned value could include 1.0). The function uses Rand(), which is [documented | https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71] as having a result in the range [0, 1).

  was:
The rand() function in PySpark, Spark, and R is documented as drawing from U[0.0, 1.0]. This suggests an inclusive upper bound, so can be confusing (i.e for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` suggests the value returned could include 1.0). The function uses Rand(), which is documented as having a result in the range [0, 1). [documented here | [https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71]]

> rand() function documentation suggests an inclusive upper bound of 1.0
> ----------------------------------------------------------------------
>
>                 Key: SPARK-31306
>                 URL: https://issues.apache.org/jira/browse/SPARK-31306
>             Project: Spark
>          Issue Type: Documentation
>          Components: PySpark, R, Spark Core
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Ben
>            Priority: Major
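Why the upper bound is exclusive can be sketched in a few lines. This assumes, without checking the linked source, that Rand() ultimately draws from java.util.Random.nextDouble() (Spark's XORShiftRandom extends java.util.Random). nextDouble() is specified to combine a 26-bit and a 27-bit random integer into a 53-bit value and scale it by 2^-53. The Python below only mimics that arithmetic, and the helper name `bits_to_unit_double` is hypothetical. Even with every bit set, the result is 1 - 2^-53, so 1.0 itself is never produced and U[0.0, 1.0) is the accurate description.

```python
# Hypothetical helper mimicking the arithmetic of java.util.Random.nextDouble():
# ((next(26) << 27) + next(27)) * 2^-53, which always lands in [0.0, 1.0).
DOUBLE_UNIT = 2.0 ** -53  # same scale factor as Java's 0x1.0p-53


def bits_to_unit_double(high26: int, low27: int) -> float:
    """Combine a 26-bit and a 27-bit integer the way nextDouble() does."""
    if not (0 <= high26 < 2**26 and 0 <= low27 < 2**27):
        raise ValueError("inputs must fit in 26 and 27 bits respectively")
    return ((high26 << 27) + low27) * DOUBLE_UNIT


smallest = bits_to_unit_double(0, 0)                   # all bits clear
largest = bits_to_unit_double(2**26 - 1, 2**27 - 1)   # all 53 bits set

print(smallest)                         # 0.0
print(largest)                          # 0.9999999999999999
print(largest < 1.0)                    # True
print(largest == 1.0 - 2.0 ** -53)      # True
```

Because (2^53 - 1) * 2^-53 is exactly representable as a double, no rounding can push the maximum up to 1.0. The lower bound 0.0 is reachable, which matches the half-open range [0, 1) stated for Rand().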