[
https://issues.apache.org/jira/browse/SPARK-47061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035390#comment-18035390
]
Abdoulaye commented on SPARK-47061:
-----------------------------------
Hello,
We have hit this issue during our migration from Scala and spark. Any status
update regarding this issue?
> Wrong result from flatMapGroups using Scala 2.13.x
> --------------------------------------------------
>
> Key: SPARK-47061
> URL: https://issues.apache.org/jira/browse/SPARK-47061
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0
> Environment: Tested with Windows using OpenJDK 17, as well as Ubuntu
> using OpenJDK 19
> Reporter: Magnus Kühn
> Priority: Major
>
> Using Scala 2.13 and `KeyValueGroupedDataset::flatMapGroups` produces wrong
> results. All rows produced by flatMapGroups have the values from the last
> entry in the returned iterator.
> * Downgrading to Scala 2.12 fixes the issue.
> * Using `mapGroups` followed by `flatMap` also fixes the issue.
>
> Test-Setup:
> {code:scala}
> import org.apache.spark.sql.SparkSession
> object Main {
> def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder().master("local[*]").getOrCreate()
> import spark.implicits._
> // using flatMapGroups
> spark.createDataset(Seq(1, 2))
> .groupByKey(x => x)
> .flatMapGroups((x, _) => Seq(10 + x, 20 + x, 30 + x)).show()
> // second code using map, then flatMap ~> should yield the same result
> spark.createDataset(Seq(1, 2))
> .groupByKey(x => x)
> .mapGroups((x, _) => Seq(10 + x, 20 + x, 30 + x))
> .flatMap(x => x).show()
> }
> } {code}
> We map the key 1 to the Sequence (11, 21, 31). Analogously the key 2 is
> mapped to (12, 22, 32). Both computations should produce the following
> (identical) result:
> {code:java}
> +-----+
> |value|
> +-----+
> | 11|
> | 21|
> | 31|
> | 12|
> | 22|
> | 32|
> +-----+ {code}
> This was run using Scala 2.12 with Spark 3.5 - using the following
> `build.sbt`:
> {code:java}
> ThisBuild / scalaVersion := "2.12.18"
> libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "3.5.0"
> {code}
>
> Problem: By upgrading to Scala 2.13 we instead get the following result:
> {code:java}
> +-----+
> |value|
> +-----+
> | 31|
> | 31|
> | 31|
> | 32|
> | 32|
> | 32|
> +-----+ {code}
> Using this new `build.sbt`. The Code was not modified.
> {code:java}
> ThisBuild / scalaVersion := "2.13.10"
> libraryDependencies += "org.apache.spark" % "spark-sql_2.13" % "3.5.0" {code}
> [The test-case is inspired by this StackOverflow
> question.|https://stackoverflow.com/questions/74633091/why-does-keyvaluegroupeddatasets-flatmapgroups-give-incorrect-result-when-runni]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]