[ https://issues.apache.org/jira/browse/SPARK-25870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel resolved SPARK-25870.
----------------------------
    Resolution: Not A Problem

> RandomSplit with seed gives different results depending on column order
> -----------------------------------------------------------------------
>
>                 Key: SPARK-25870
>                 URL: https://issues.apache.org/jira/browse/SPARK-25870
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Daniel
>            Priority: Minor
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Co-discovered by Zhihui Hong (zhon...@syr.edu):
> {{If you run the following example, the resulting dataframes will have different rows even though they use the same seed:}}
> {{from pyspark.sql import SparkSession, functions as fn}}
> {{spark = SparkSession.builder.getOrCreate()}}
> {{df = spark.range(0, 10).withColumn('r', (fn.rand()*10).cast('int'))}}
> {{# sample 1}}
> {{df.randomSplit([0.8, 0.2], seed=0)[0].show(5)}}
> {{# sample 2}}
> {{df.select('r', 'id').randomSplit([0.8, 0.2], seed=0)[0].show(5)}}
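
The message does not say why the issue was resolved as Not A Problem. One plausible reading, offered here as an assumption rather than as the resolver's stated reasoning, is that randomSplit sorts the rows within each partition before drawing seeded random numbers, so reordering the columns changes the sort order and therefore which row receives which draw. The pure-Python sketch below is a toy model of that idea, not Spark's implementation; seeded_split and the sample rows are hypothetical.

```python
import random


def seeded_split(rows, weight, seed):
    """Toy model: sort the rows, then keep a row when its seeded draw < weight."""
    rng = random.Random(seed)
    kept = []
    # Tuples sort field by field, so the first column dominates the ordering.
    for row in sorted(rows):
        if rng.random() < weight:
            kept.append(row)
    return kept


# Hypothetical data standing in for the (id, r) dataframe in the report.
rows_id_r = [(0, 7), (1, 3), (2, 9), (3, 1), (4, 5),
             (5, 4), (6, 8), (7, 0), (8, 6), (9, 2)]
# The same rows with the columns swapped, as in df.select('r', 'id').
rows_r_id = [(r, i) for (i, r) in rows_id_r]

# Same seed and same weight, but the rows are visited in a different order.
ids_a = sorted(i for (i, r) in seeded_split(rows_id_r, 0.8, seed=0))
ids_b = sorted(i for (r, i) in seeded_split(rows_r_id, 0.8, seed=0))
print("ids kept with columns (id, r):", ids_a)
print("ids kept with columns (r, id):", ids_b)
```

In this model the sequence of random draws is identical for both calls; only the row that each draw lands on changes. If a split must be reproducible across code paths, fixing the column order before calling randomSplit is one way to keep the row ordering stable.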