[ https://issues.apache.org/jira/browse/SPARK-34792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304654#comment-17304654 ]

kondziolka9ld commented on SPARK-34792:
---------------------------------------

[~dongjoon]

> This doesn't look like a bug.

That is why I submitted this as a question, not a bug.

 

> Not only Spark's code, but also there is difference between Scala 2.11 and 
> Scala 2.12.

Not exactly: there was no such difference between Spark 2.3.2 (Scala 2.11) and 
Spark 2.4.7 (Scala 2.12).

 

> BTW, please send your question to d...@spark.apache.org next time.

Done
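
For what it's worth, the sensitivity to the sampler implementation can be illustrated without Spark at all. The sketch below is not Spark's actual code (Spark samples per partition with an XORShift-style RNG); `bernoulliSplit` and `shiftedSplit` are hypothetical helpers showing that with seeded Bernoulli sampling, which rows land in the smaller split is determined by the RNG draw sequence, so any change in how the implementation consumes the seed changes the split even though the seed is identical:

```scala
import scala.util.Random

// Hypothetical sketch, not Spark's sampler: each row is kept in the
// first split with probability `fraction`, driven by a seeded RNG.
def bernoulliSplit(rows: Seq[Int], fraction: Double, seed: Long): (Seq[Int], Seq[Int]) = {
  val rng = new Random(seed)
  rows.partition(_ => rng.nextDouble() < fraction)
}

// Same seed, but the sampler consumes one extra draw per row, the kind
// of internal change a new implementation might introduce.
def shiftedSplit(rows: Seq[Int], fraction: Double, seed: Long): (Seq[Int], Seq[Int]) = {
  val rng = new Random(seed)
  rows.partition { _ => rng.nextDouble(); rng.nextDouble() < fraction }
}

val rows = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val (small, large)   = bernoulliSplit(rows, 0.3, 42L)
val (small2, large2) = shiftedSplit(rows, 0.3, 42L)
// Both splits still cover all rows exactly once, but the membership of
// the 30% split generally differs between the two implementations.
```

Either way each row ends up in exactly one of the two splits; only the assignment changes, which matches what the shell sessions below show.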

> Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3
> ---------------------------------------------------------------------
>
>                 Key: SPARK-34792
>                 URL: https://issues.apache.org/jira/browse/SPARK-34792
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1
>            Reporter: kondziolka9ld
>            Priority: Major
>
> Hi, 
> Please consider the following difference in the results of the `randomSplit` 
> method, despite using the same seed.
>  
> {code:java}
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
>       /_/
>          
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val Array(f, s) =  
> Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
> f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> scala> f.show
> +-----+
> |value|
> +-----+
> |    4|
> +-----+
> scala> s.show
> +-----+
> |value|
> +-----+
> |    1|
> |    2|
> |    3|
> |    5|
> |    6|
> |    7|
> |    8|
> |    9|
> |   10|
> +-----+
> {code}
> while as on spark-3
> {code:java}
> scala> val Array(f, s) =  
> Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
> f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
> scala> f.show
> +-----+
> |value|
> +-----+
> |    5|
> |   10|
> +-----+
> scala> s.show
> +-----+
> |value|
> +-----+
> |    1|
> |    2|
> |    3|
> |    4|
> |    6|
> |    7|
> |    8|
> |    9|
> +-----+
> {code}
> I guess that the implementation of the `sample` method changed.
> Is it possible to restore the previous behaviour?
> Thanks in advance!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
