[
https://issues.apache.org/jira/browse/SPARK-11476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987508#comment-14987508
]
Sean Owen commented on SPARK-11476:
-----------------------------------
The other way around, right? normalRDD generates N(0,1) samples. That is not
the same thing as the uniform distribution on [0,1]. The Python example is the
one that needs the fix. Feel free to open a PR.
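
For anyone unsure why the two calls are not interchangeable, here is a minimal sketch of the difference. It uses Python's standard `random` module as a stand-in for Spark: `rng.gauss(0.0, 1.0)` plays the role of normalRDD and `rng.random()` the role of uniformRDD. The sample size, seed, and the `summarize` helper are illustrative assumptions, not anything taken from the Spark docs.

```python
import random
import statistics


def summarize(samples):
    """Return (mean, population stdev, min, max) for a list of floats."""
    return (
        statistics.fmean(samples),
        statistics.pstdev(samples),
        min(samples),
        max(samples),
    )


rng = random.Random(42)  # fixed seed (arbitrary choice) so the run is repeatable
n = 100_000              # illustrative sample size, smaller than the 1 million in the docs

# Stand-in for normalRDD: i.i.d. draws from the standard normal N(0, 1).
normal_samples = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Stand-in for uniformRDD: i.i.d. draws from the uniform distribution on [0, 1).
uniform_samples = [rng.random() for _ in range(n)]

normal_stats = summarize(normal_samples)
uniform_stats = summarize(uniform_samples)

print("normal  (mean, stdev, min, max):", normal_stats)
print("uniform (mean, stdev, min, max):", uniform_stats)
```

The normal samples center on 0 with standard deviation near 1 and regularly fall outside [0, 1], while the uniform samples center on 0.5 and never leave that interval. A comment describing one distribution should therefore not sit above a call to the other.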
> Incorrect function referred to in MLlib Random data generation documentation
> ----------------------------------------------------------------------------
>
> Key: SPARK-11476
> URL: https://issues.apache.org/jira/browse/SPARK-11476
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 1.5.1
> Reporter: Jason Blochowiak
> Priority: Minor
> Labels: documentation, easyfix
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> In the "Random data generation" section of
> http://spark.apache.org/docs/latest/mllib-statistics.html, a comment in the
> example code says:
> Generate a random double RDD that contains 1 million i.i.d. values drawn from
> the standard normal distribution `N(0, 1)`, evenly distributed in 10
> partitions.
> But it then calls normalRDD(), which does not do that - a call to
> uniformRDD() with the same parameters would do what the comment claims.