kondziolka9ld created SPARK-34792:
-------------------------------------

             Summary: Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3
                 Key: SPARK-34792
                 URL: https://issues.apache.org/jira/browse/SPARK-34792
             Project: Spark
          Issue Type: Question
          Components: Spark Core, SQL
    Affects Versions: 3.0.1
            Reporter: kondziolka9ld

Hi,

Please consider the following difference in results, even though the seed is the same. Is it possible to restore the same behaviour?
{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]

scala> f.show
+-----+
|value|
+-----+
|    4|
+-----+

scala> s.show
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    5|
|    6|
|    7|
|    8|
|    9|
|   10|
+-----+
{code}
whereas on spark-3:
{code:java}
scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]

scala> f.show
+-----+
|value|
+-----+
|    5|
|   10|
+-----+

scala> s.show
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    6|
|    7|
|    8|
|    9|
+-----+
{code}
I guess that the implementation of the `sample` method changed. Is it possible to restore the previous behaviour?

Thanks in advance!
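
For convenience, the same check can be packaged as a small standalone program and run against both Spark versions. This is only a sketch: the object name `RandomSplitRepro` and the `local[*]` master are assumptions (the shell sessions above do not show which master was used), and it presumes `spark-sql` is on the classpath. It prints the sorted contents of both splits so the outputs are easy to compare.
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object RandomSplitRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-34792 repro")
      .master("local[*]") // assumption: local mode; not stated in the report
      .getOrCreate()
    import spark.implicits._

    // Same input, weights and seed as in the shell sessions above.
    val Array(f, s) = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).toDF
      .randomSplit(Array(0.3, 0.7), 42)

    // Print the Spark version next to the contents of each split.
    println(s"Spark ${spark.version}")
    println("first split:  " + f.as[Int].collect().sorted.mkString(", "))
    println("second split: " + s.as[Int].collect().sorted.mkString(", "))

    spark.stop()
  }
}
{code}
Running it once per Spark version, with the same master setting, should make it possible to compare the two splits directly.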