tkaymak commented on code in PR #38212:
URL: https://github.com/apache/beam/pull/38212#discussion_r3094258687
##########
runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java:
##########
@@ -522,7 +522,9 @@ JavaDStream<WindowedValue<KV<K, Iterable<InputT>>>>
groupByKeyAndWindow(
Tuple2</*K*/ ByteArray, Tuple2<StateAndTimers, /*WV<KV<K,
Itr<I>>>*/ List<byte[]>>>>
firedStream =
pairDStream.updateStateByKey(
- updateFunc,
+ // Raw cast to AbstractFunction1 suppresses Scala 2.12
(collection.Seq) vs
+ // Scala 2.13 (immutable.Seq) type difference — safe at
runtime due to erasure.
+ (scala.runtime.AbstractFunction1) updateFunc,
Review Comment:
In Scala 2.12 (used by Spark 2 and some Spark 3 versions), `Seq` in the
signature of updateStateByKey refers to `scala.collection.Seq`.
In Scala 2.13 (used by Spark 4 and newer Spark 3 versions), `Seq` refers to
`scala.collection.immutable.Seq`.
Since `SparkGroupAlsoByWindowViaWindowSet.java` is located in the shared
source directory (runners/spark/src/main/java), it is compiled against
different Spark and Scala versions depending on the build profile.
Java does not recognize Scala's declaration-site variance (`Function1` is
contravariant in its argument type), so Java generics treat the parameters as
invariant. The Java compiler therefore sees `AbstractFunction1<..., ...>` over
`scala.collection.Seq` as unrelated to the type Spark expects when compiled
against Scala 2.13 (`scala.collection.immutable.Seq`), even though Scala
itself would accept the wider function.
The raw cast to `scala.runtime.AbstractFunction1` bypasses the Java compiler's
generic type check. At runtime the generic parameters are erased, so the call
succeeds as long as the values actually passed are compatible with what the
function does with them. In Scala 2.13, `immutable.Seq` is a subtype of
`collection.Seq`, both implement the methods needed for iteration, and the
code uses `JavaConverters.asJavaIterable` (which accepts
`scala.collection.Iterable`), so it should be safe at runtime.
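
To make the reasoning above concrete, here is a minimal, Scala-free sketch of
the same pattern. It is only an analogy, not the Beam or Spark code: `Fn`,
`callWithList`, `COUNT` and `RawCastSketch` are hypothetical names, and
`java.util.Collection` / `java.util.List` stand in for `collection.Seq` /
`immutable.Seq`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class RawCastSketch {

  /** Hypothetical stand-in for scala.Function1; Java treats A and R as invariant. */
  interface Fn<A, R> {
    R apply(A arg);
  }

  /** Hypothetical stand-in for an API that demands the narrower argument type. */
  static int callWithList(Fn<List<String>, Integer> fn, List<String> values) {
    return fn.apply(values);
  }

  /** A function written against the wider type (the analogue of collection.Seq). */
  static final Fn<Collection<String>, Integer> COUNT =
      new Fn<Collection<String>, Integer>() {
        @Override
        public Integer apply(Collection<String> values) {
          return values.size();
        }
      };

  @SuppressWarnings({"unchecked", "rawtypes"})
  public static int countViaRawCast(List<String> values) {
    // callWithList(COUNT, values) does not compile: under Java's invariant
    // generics, Fn<Collection<String>, Integer> is not a Fn<List<String>, Integer>.
    // The raw cast drops the type arguments, so javac accepts the call with an
    // unchecked warning. At runtime the List is a Collection, so apply() works.
    return callWithList((Fn) COUNT, values);
  }

  public static void main(String[] args) {
    System.out.println(countViaRawCast(Arrays.asList("a", "b", "c"))); // prints 3
  }
}
```

The cast is only as safe as the subtype relationship behind it: if the wider
function were handed a value that is not actually a `Collection`, the failure
would surface as a `ClassCastException` at the call, not at compile time.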