[ 
https://issues.apache.org/jira/browse/SPARK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600400#comment-14600400
 ] 

Michael Armbrust commented on SPARK-8599:
-----------------------------------------

What about this case?

A DataFrame with a random column:
{code}
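// Imports needed when running outside spark-shell:
import org.apache.spark.sql.functions.rand
import sqlContext.implicits._
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))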
df.show()
+--+-------------------+
|id|                  r|
+--+-------------------+
| 1|0.47027138530546275|
| 2|0.11616379100300933|
| 3|0.45008521832568693|
| 4| 0.9959647025839259|
| 5| 0.6038577325006693|
| 6| 0.6319470735268434|
| 7|0.22327628846133507|
| 8|0.24223739932588373|
| 9| 0.8395518879513995|
+--+-------------------+
{code}

Joins work as expected...
{code}
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.as("a").join(df.as("b"), $"a.id" === $"b.id").show()
+--+-------------------+--+-------------------+
|id|                  r|id|                  r|
+--+-------------------+--+-------------------+
| 1|0.47027138530546275| 1|0.47027138530546275|
| 2|0.11616379100300933| 2|0.11616379100300933|
| 3|0.45008521832568693| 3|0.45008521832568693|
| 4| 0.9959647025839259| 4| 0.9959647025839259|
| 5| 0.6038577325006693| 5| 0.6038577325006693|
| 6| 0.6319470735268434| 6| 0.6319470735268434|
| 7|0.22327628846133507| 7|0.22327628846133507|
| 8|0.24223739932588373| 8|0.24223739932588373|
| 9| 0.8395518879513995| 9| 0.8395518879513995|
+--+-------------------+--+-------------------+
{code}

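But this is kind of confusing: after filtering side b on r < 0.5, its r values differ from side a for the same id, and several of them are above 0.5.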
{code}
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").show()
+--+-------------------+--+-------------------+
|id|                  r|id|                  r|
+--+-------------------+--+-------------------+
| 1|0.47027138530546275| 1|0.11616379100300933|
| 2|0.11616379100300933| 2| 0.8588851155739579|
| 3|0.45008521832568693| 3| 0.9959647025839259|
| 4| 0.9959647025839259| 4| 0.5910417491366206|
| 7|0.22327628846133507| 7|0.24223739932588373|
| 9| 0.8395518879513995| 9| 0.8994457593465164|
+--+-------------------+--+-------------------+
{code}
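
A minimal sketch of one way this kind of mismatch can arise, in plain Scala rather than Spark. It assumes (the output above does not confirm this) that the random expression ends up being evaluated once for the filter predicate and again for the projection, as if it were stateless. The names StatefulRandSketch, Row, materializeThenFilter and inlineExpression are hypothetical and exist only for illustration.

```scala
import scala.util.Random

object StatefulRandSketch {
  // Hypothetical stand-in for one output row: (id, r).
  final case class Row(id: Long, r: Double)

  // Same ids as sqlContext.range(1, 10): 1 through 9.
  val ids: Vector[Long] = (1L to 9L).toVector

  // Plan A: draw r once per row, store it, then filter on the stored value.
  def materializeThenFilter(seed: Long): Vector[Row] = {
    val rng = new Random(seed)
    ids.map(id => Row(id, rng.nextDouble())).filter(_.r < 0.5)
  }

  // Plan B (assumed failure mode): the random expression is copied into both
  // the predicate and the projection, so each copy advances the generator.
  def inlineExpression(seed: Long): Vector[Row] = {
    val rng = new Random(seed)
    val kept = ids.filter(_ => rng.nextDouble() < 0.5) // draws 1 to 9
    kept.map(id => Row(id, rng.nextDouble()))          // fresh draws
  }

  def main(args: Array[String]): Unit = {
    println(materializeThenFilter(0L).mkString("\n"))
    println("---")
    println(inlineExpression(0L).mkString("\n"))
  }
}
```

In plan A every printed r is below 0.5 by construction. In plan B the same ids survive, but the printed r values come from later draws, so nothing guarantees that they are below 0.5 or that they match what the filter saw.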

> Use a Random operator to handle Random distribution generating expressions
> --------------------------------------------------------------------------
>
>                 Key: SPARK-8599
>                 URL: https://issues.apache.org/jira/browse/SPARK-8599
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.4.0
>            Reporter: Yin Huai
>            Priority: Critical
>
> Right now, random distribution generators are implemented as ordinary
> expressions. As a result, we have to track them in many places in the
> optimizer and handle them carefully. Otherwise, they are treated as
> stateless expressions and behave unexpectedly (e.g. SPARK-8023).



