[ https://issues.apache.org/jira/browse/SPARK-31306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben updated SPARK-31306:
------------------------
    Description: 
 

The rand() function in PySpark, Spark, and R is documented as drawing from 
U[0.0, 1.0]. This suggests an inclusive upper bound, which can be confusing (i.e. 
for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` 
suggests the value returned could include 1.0). The function uses Rand(), which 
is documented as having a result in the range [0, 1) ([documented 
here|https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71]).
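For illustration, the exclusive upper bound follows from how `java.util.Random.nextDouble()` builds a double: it combines 26 + 27 = 53 random bits into an integer k in [0, 2^53) and returns k * 2^-53, so the largest possible result is 1 - 2^-53, never 1.0. The Python sketch below mimics that construction. It assumes (this is not stated in the issue) that Spark's `Rand()` draws from `XORShiftRandom.nextDouble()`, inherited from `java.util.Random`; the helper `next_double_from_bits` is hypothetical and exists only for this example.

```python
import random

# Hypothetical helper mimicking java.util.Random.nextDouble():
# ((next(26) << 27) + next(27)) * 2^-53.
# Assumption: Spark's Rand() ultimately uses this inherited construction.
def next_double_from_bits(high26: int, low27: int) -> float:
    """Combine a 26-bit and a 27-bit integer into a double in [0, 1)."""
    if not (0 <= high26 < 2**26 and 0 <= low27 < 2**27):
        raise ValueError("bits out of range")
    # The combined integer lies in [0, 2^53), so the result lies in [0, 1).
    return ((high26 << 27) + low27) * 2.0**-53

# Extreme inputs: all bits zero and all bits one.
smallest = next_double_from_bits(0, 0)
largest = next_double_from_bits(2**26 - 1, 2**27 - 1)

print(smallest)        # 0.0
print(largest < 1.0)   # True
print(1.0 - largest)   # 1.1102230246251565e-16 (that is, 2^-53)

# Random draws built the same way never reach 1.0.
rng = random.Random(42)
samples = [
    next_double_from_bits(rng.getrandbits(26), rng.getrandbits(27))
    for _ in range(10_000)
]
print(all(0.0 <= x < 1.0 for x in samples))  # True
```

Because the integer numerator tops out at 2^53 - 1, the interval is half-open by construction, which matches the [0, 1) range documented for Rand() rather than U[0.0, 1.0].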

  was:`rand()` uses `Rand()`, which generates values in [0, 1) ([documented 
here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)).
 The existing documentation suggests that 1.0 is a possible value returned by 
rand (i.e. for a distribution written as `X ~ U(a, b)`, x can be a or b, so 
`U[0.0, 1.0]` suggests the value returned could include 1.0).


> rand() function documentation suggests an inclusive upper bound of 1.0
> ----------------------------------------------------------------------
>
>                 Key: SPARK-31306
>                 URL: https://issues.apache.org/jira/browse/SPARK-31306
>             Project: Spark
>          Issue Type: Documentation
>          Components: PySpark, R, Spark Core
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Ben
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
