[ https://issues.apache.org/jira/browse/SPARK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600400#comment-14600400 ]
Michael Armbrust commented on SPARK-8599:
-----------------------------------------

What about this case?

Random DF
{code}
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.show()
+--+-------------------+
|id|                  r|
+--+-------------------+
| 1|0.47027138530546275|
| 2|0.11616379100300933|
| 3|0.45008521832568693|
| 4| 0.9959647025839259|
| 5| 0.6038577325006693|
| 6| 0.6319470735268434|
| 7|0.22327628846133507|
| 8|0.24223739932588373|
| 9| 0.8395518879513995|
+--+-------------------+
{code}

Joins work as expected...
{code}
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.as("a").join(df.as("b"), $"a.id" === $"b.id").show()
+--+-------------------+--+-------------------+
|id|                  r|id|                  r|
+--+-------------------+--+-------------------+
| 1|0.47027138530546275| 1|0.47027138530546275|
| 2|0.11616379100300933| 2|0.11616379100300933|
| 3|0.45008521832568693| 3|0.45008521832568693|
| 4| 0.9959647025839259| 4| 0.9959647025839259|
| 5| 0.6038577325006693| 5| 0.6038577325006693|
| 6| 0.6319470735268434| 6| 0.6319470735268434|
| 7|0.22327628846133507| 7|0.22327628846133507|
| 8|0.24223739932588373| 8|0.24223739932588373|
| 9| 0.8395518879513995| 9| 0.8395518879513995|
+--+-------------------+--+-------------------+
{code}

But this is kind of confusing...
{code}
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").show()
+--+-------------------+--+-------------------+
|id|                  r|id|                  r|
+--+-------------------+--+-------------------+
| 1|0.47027138530546275| 1|0.11616379100300933|
| 2|0.11616379100300933| 2| 0.8588851155739579|
| 3|0.45008521832568693| 3| 0.9959647025839259|
| 4| 0.9959647025839259| 4| 0.5910417491366206|
| 7|0.22327628846133507| 7|0.24223739932588373|
| 9| 0.8395518879513995| 9| 0.8994457593465164|
+--+-------------------+--+-------------------+
{code}

> Use a Random operator to handle Random distribution generating expressions
> --------------------------------------------------------------------------
>
>                 Key: SPARK-8599
>                 URL: https://issues.apache.org/jira/browse/SPARK-8599
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.4.0
>            Reporter: Yin Huai
>            Priority: Critical
>
> Right now, we are using expressions for Random distribution generating
> expressions. But, we have to track them in lots of places in the optimizer to
> handle them carefully. Otherwise, these expressions will be treated as
> stateless expressions and have unexpected behaviors (e.g. SPARK-8023).
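
One possible reading of the filtered join output, offered as an assumption rather than a confirmed diagnosis: in that table, b.r for ids 1, 3 and 7 equals a.r for ids 2, 4 and 8 respectively, which would be consistent with the seeded generator being advanced twice for each surviving row (once for the filter, once for the projection). The sketch below models that in plain Scala with no Spark dependency. The names StatefulRand, evaluateOnce and evaluateTwice are hypothetical, scala.util.Random stands in for Spark's generator (so the numbers will not match the tables), and partitioning is ignored.

```scala
import scala.util.Random

object RandTwiceSketch {
  // Hypothetical stand-in for a seeded, stateful rand(seed) expression:
  // every call to next() advances the underlying generator.
  final class StatefulRand(seed: Long) {
    private val rng = new Random(seed)
    def next(): Double = rng.nextDouble()
  }

  // Evaluate the expression once per row, keep the value, then filter on it.
  // Every surviving row satisfies r < 0.5 by construction.
  def evaluateOnce(ids: Seq[Int], seed: Long): Seq[(Int, Double)] = {
    val rand = new StatefulRand(seed)
    ids.map(id => (id, rand.next())).filter { case (_, r) => r < 0.5 }
  }

  // Treat the expression as if it were stateless: the filter draws one value
  // and the projection draws a fresh one, so the value shown for a row is
  // not the value that passed the filter.
  def evaluateTwice(ids: Seq[Int], seed: Long): Seq[(Int, Double)] = {
    val rand = new StatefulRand(seed)
    ids.flatMap { id =>
      if (rand.next() < 0.5) Some((id, rand.next())) else None
    }
  }

  def main(args: Array[String]): Unit = {
    val ids = 1 to 9
    println(s"evaluated once:  ${evaluateOnce(ids, 0L)}")
    println(s"evaluated twice: ${evaluateTwice(ids, 0L)}")
  }
}
```

In the evaluateOnce path every reported r is below 0.5. In the evaluateTwice path a reported r can be 0.5 or greater, which is the kind of mismatch visible in the b.r column above.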