[ https://issues.apache.org/jira/browse/SPARK-45216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Toth updated SPARK-45216:
-------------------------------
    Description:
If we run the following example, the result is two equal columns, as expected:
{noformat}
val c = rand()
df.select(c, c)

+--------------------------+--------------------------+
|rand(-4522010140232537566)|rand(-4522010140232537566)|
+--------------------------+--------------------------+
|        0.4520819282997137|        0.4520819282997137|
+--------------------------+--------------------------+
{noformat}

But if we use other, similar APIs, the result is incorrect: the two columns built from the same value differ.
{noformat}
val r1 = random()
val r2 = uuid()
val r3 = shuffle(col("x"))
val x = df.select(r1, r1, r2, r2, r3, r3)

+------------------+------------------+--------------------+--------------------+----------+----------+
|            rand()|            rand()|              uuid()|              uuid()|shuffle(x)|shuffle(x)|
+------------------+------------------+--------------------+--------------------+----------+----------+
|0.7407604956381952|0.7957319451135009|e55bc4b0-74e6-4b0...|a587163b-d06b-4bb...| [1, 2, 3]| [2, 1, 3]|
+------------------+------------------+--------------------+--------------------+----------+----------+
{noformat}

> Fix non-deterministic seeded Dataset APIs
> -----------------------------------------
>
>                 Key: SPARK-45216
>                 URL: https://issues.apache.org/jira/browse/SPARK-45216
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, SQL
>    Affects Versions: 4.0.0
>            Reporter: Peter Toth
>            Priority: Major
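
The difference between the two results in the report can be illustrated with a toy model in plain Scala. This is only a sketch under an assumption the report does not spell out: that `rand()` fixes its seed when the column value is created (its header shows a concrete seed), whereas the other APIs effectively get a fresh seed each time the column is used (their headers show none). `ToyColumn`, `SeededAtCreation`, `SeededPerUse` and `selectTwice` are hypothetical names for illustration and are not part of Spark.

```scala
import scala.util.Random

// Toy stand-in for a column expression. NOT Spark's Column class.
sealed trait ToyColumn { def eval(): Double }

// Assumption: the seed is chosen once, when the column object is created,
// so every use of the same object replays the same random sequence.
final case class SeededAtCreation(seed: Long) extends ToyColumn {
  def eval(): Double = new Random(seed).nextDouble()
}

// Assumption: no seed is stored, so each use draws a fresh seed.
final class SeededPerUse extends ToyColumn {
  def eval(): Double = new Random(Random.nextLong()).nextDouble()
}

object ToyColumns {
  // Mimics df.select(c, c) on a one-row frame: the same column object is used twice.
  def selectTwice(c: ToyColumn): (Double, Double) = (c.eval(), c.eval())

  def main(args: Array[String]): Unit = {
    val fixed = SeededAtCreation(Random.nextLong())
    println(selectTwice(fixed)) // both components are equal

    val fresh = new SeededPerUse
    println(selectTwice(fresh)) // components differ with overwhelming probability
  }
}
```

Under this model, making the second kind of column behave like the first amounts to picking the seed once, at creation time, and storing it with the column.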