[ https://issues.apache.org/jira/browse/SPARK-34792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304654#comment-17304654 ]
kondziolka9ld commented on SPARK-34792:
---------------------------------------

[~dongjoon]
> This doesn't look like a bug.
That is why I submitted it as a question, not a bug.
> Not only Spark's code, but also there is difference between Scala 2.11 and Scala 2.12.
Not exactly: there was no such difference between spark-2-3-2-scala-2-11 and spark-2-4-7-scala-2-12.
> BTW, please send your question to d...@spark.apache.org next time.
Done.

> Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3
> ---------------------------------------------------------------------
>
>                 Key: SPARK-34792
>                 URL: https://issues.apache.org/jira/browse/SPARK-34792
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1
>            Reporter: kondziolka9ld
>            Priority: Major
>
> Hi,
> Please consider the following difference in the output of the `randomSplit` method, despite the same seed.
>
> {code:java}
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
>       /_/
>
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
> f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> scala> f.show
> +-----+
> |value|
> +-----+
> |    4|
> +-----+
> scala> s.show
> +-----+
> |value|
> +-----+
> |    1|
> |    2|
> |    3|
> |    5|
> |    6|
> |    7|
> |    8|
> |    9|
> |   10|
> +-----+
> {code}
> whereas on spark-3:
> {code:java}
> scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
> f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> scala> f.show
> +-----+
> |value|
> +-----+
> |    5|
> |   10|
> +-----+
> scala> s.show
> +-----+
> |value|
> +-----+
> |    1|
> |    2|
> |    3|
> |    4|
> |    6|
> |    7|
> |    8|
> |    9|
> +-----+
> {code}
> I guess that the implementation of the `sample` method changed.
> Is it possible to restore the previous behaviour?
> Thanks in advance!
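
If the goal is a split that stays the same across Spark versions, one possible workaround is to avoid the sampler and assign each value by a hash of (seed, value). The sketch below is plain Scala, not Spark's `randomSplit` implementation, and it will not reproduce the 2.4.7 output shown above. The names `StableSplit`, `unitInterval` and `split` are made up for illustration, and CRC32 is chosen only because its output is fixed by its specification.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

// Hypothetical helper, not part of Spark: a seed-dependent split that
// relies only on CRC32, so it does not change with the Spark version.
object StableSplit {

  // Map (seed, value) to a number in [0, 1).
  def unitInterval(seed: Long, value: Int): Double = {
    val crc = new CRC32()
    crc.update(s"$seed:$value".getBytes(StandardCharsets.UTF_8))
    // getValue is an unsigned 32-bit result stored in a Long
    crc.getValue.toDouble / (1L << 32).toDouble
  }

  // Values whose hash falls below firstWeight go to the first part,
  // all remaining values go to the second part.
  def split(values: Seq[Int], firstWeight: Double, seed: Long): (Seq[Int], Seq[Int]) =
    values.partition(v => unitInterval(seed, v) < firstWeight)

  def main(args: Array[String]): Unit = {
    val (f, s) = split(1 to 10, 0.3, 42L)
    println(s"first:  ${f.mkString(", ")}")
    println(s"second: ${s.mkString(", ")}")
  }
}
```

Note the trade-offs of this approach: equal values always land in the same part, and the part sizes only approximate the weights, which is also true of `randomSplit`. Applying the same idea to a DataFrame would need a UDF or an equivalent column expression, which is not shown here.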