[jira] [Created] (SPARK-34792) Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3

kondziolka9ld (Jira) Thu, 18 Mar 2021 12:49:04 -0700

kondziolka9ld created SPARK-34792:
-------------------------------------

             Summary: Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3
                 Key: SPARK-34792
                 URL: https://issues.apache.org/jira/browse/SPARK-34792
             Project: Spark
          Issue Type: Question
          Components: Spark Core, SQL
    Affects Versions: 3.0.1
            Reporter: kondziolka9ld



Hi, 

Please consider the following difference in the results, even though the same seed is used.

Is it possible to restore the same split as in spark-2.4.7, shown here:
{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/
         
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
scala> f.show
+-----+
|value|
+-----+
|    4|
+-----+
scala> s.show
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    5|
|    6|
|    7|
|    8|
|    9|
|   10|
+-----+
{code}
whereas on spark-3 the same code gives:
{code:java}
scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
scala> f.show
+-----+
|value|
+-----+
|    5|
|   10|
+-----+
scala> s.show
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    6|
|    7|
|    8|
|    9|
+-----+
{code}
I guess that the implementation of the `sample` method changed.
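A simplified model may help frame this guess. The following is a minimal sketch, not Spark's code, of the general technique behind a seeded weighted split. It assumes that, as we understand Spark's `randomSplit`, the weights are normalised into cumulative [lower, upper) bounds and each row lands in the split whose interval contains one uniform draw from a generator seeded with `seed + partitionIndex`. The names `RandomSplitSketch`, `bounds` and `split` are hypothetical, and `scala.util.Random` stands in for Spark's internal generator, so the exact rows chosen will not match either Spark version.

```scala
import scala.util.Random

// Hypothetical, simplified model of a seeded weighted split (not Spark's code).
// Assumption: scala.util.Random stands in for Spark's internal RNG, and rows are
// used in the given order (Spark additionally sorts rows within each partition).
object RandomSplitSketch {

  /** Normalise weights, e.g. (0.3, 0.7), into cumulative [lower, upper) bounds. */
  def bounds(weights: Seq[Double]): Seq[(Double, Double)] = {
    val total = weights.sum
    val cumulative = weights.scanLeft(0.0)(_ + _).map(_ / total)
    cumulative.zip(cumulative.tail)
  }

  /**
   * Assign each row to exactly one split. Every partition gets its own RNG
   * seeded with `seed + partitionIndex`; each row consumes one uniform draw
   * and lands in the split whose [lower, upper) interval contains that draw.
   */
  def split[A](partitions: Seq[Seq[A]], weights: Seq[Double], seed: Long): Seq[Seq[A]] = {
    val bs = bounds(weights)
    val tagged: Seq[(Int, A)] = partitions.zipWithIndex.flatMap { case (rows, partitionIndex) =>
      val rng = new Random(seed + partitionIndex)
      rows.map { row =>
        val x = rng.nextDouble() // uniform in [0.0, 1.0)
        val i = bs.indexWhere { case (lo, hi) => x >= lo && x < hi }
        // Fall back to the last split in case rounding leaves a tiny gap below 1.0.
        (if (i >= 0) i else bs.size - 1, row)
      }
    }
    // Collect the rows of each split, preserving their original order.
    bs.indices.map(i => tagged.filter(_._1 == i).map(_._2))
  }
}
```

Because every row's draw depends on the generator, the per-partition seed and the row order within a partition, a change to any of these (for example, grouping the ten rows into one partition instead of two) can move rows between splits even with seed 42. This only illustrates why outputs can differ across versions; it does not establish which of these factors, if any, changed between 2.4.7 and 3.x.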

Is it possible to restore the previous behaviour?

Thanks in advance!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
