Xiao Li created SPARK-13380:
-------------------------------

             Summary: Document Rand(seed) and Randn(seed) Return 
Indeterministic Results When Data Partitions are not fixed
                 Key: SPARK-13380
                 URL: https://issues.apache.org/jira/browse/SPARK-13380
             Project: Spark
          Issue Type: Documentation
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Xiao Li
            Priority: Minor


The rand and randn functions are commonly called with a seed argument. Users 
reasonably expect the results of rand and randn to be deterministic when a seed 
value is provided. For example, MS SQL Server also has a rand function, and its 
documentation describes the seed parameter as follows: "Seed is an integer 
expression (tinyint, smallint, or int) that gives the seed value. If seed is 
not specified, the SQL Server Database Engine assigns a seed value at random. 
For a specified seed value, the result returned is always the same."

Update: the current implementation cannot generate deterministic results when 
the data partitioning is not fixed. This PR documents the limitation in the 
function descriptions.
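
To illustrate why partitioning matters, here is a minimal pure-Python 
simulation. It is not Spark code. It assumes each partition draws from its own 
generator seeded with the user seed plus the partition index, and it uses 
Python's random.Random as a stand-in generator. The helper name simulate_rand 
and the contiguous row-to-partition assignment are hypothetical, chosen only 
for the illustration.

```python
import random


def simulate_rand(rows, num_partitions, seed):
    """Pair each row with a pseudo-random value, one generator per partition.

    Assumption: each partition is seeded with seed + partition index.
    Rows are split into contiguous, equally sized partitions.
    """
    size = -(-len(rows) // num_partitions)  # ceiling division
    out = []
    for p in range(num_partitions):
        rng = random.Random(seed + p)  # per-partition generator
        for row in rows[p * size:(p + 1) * size]:
            out.append((row, rng.random()))
    return out


rows = list(range(8))
same_a = simulate_rand(rows, 2, seed=42)
same_b = simulate_rand(rows, 2, seed=42)
different = simulate_rand(rows, 4, seed=42)

print(same_a == same_b)     # True: same seed, same partitioning
print(same_a == different)  # False: same seed, different partitioning
```

With the same seed and the same partitioning, every row gets the same value on 
each run. Once the rows are split differently, a row can land at a different 
position in a differently seeded stream, so its value changes even though the 
seed argument did not.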

@jkbradley hit an issue and provided an example in the following JIRA: 
https://issues.apache.org/jira/browse/SPARK-13333



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
