[ https://issues.apache.org/jira/browse/SPARK-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-3369:
-----------------------------
    Labels: breaking_change releasenotes  (was: breaking_change)

There's another related issue: {{DStream.flatMap}} takes a {{T => Traversable[U]}}, which is inconsistent in the same way with other APIs, which take functions producing {{TraversableOnce}}. {{Traversable}}, like {{Iterable}}, means "can be iterated many times". This API change seems to belong logically with this issue. In practice it only changes binary compatibility, since a {{Traversable}} is a special case of {{TraversableOnce}}, so all existing code continues to compile.

> Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-3369
>                 URL: https://issues.apache.org/jira/browse/SPARK-3369
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API
>    Affects Versions: 1.0.2, 1.2.1
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>              Labels: breaking_change, releasenotes
>         Attachments: FlatMapIterator.patch
>
>
> {{mapPartitions}} in the Scala RDD API takes a function that transforms an
> {{Iterator}} to an {{Iterator}}:
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
> In the Java RDD API, the equivalent is a FlatMapFunction, which operates on
> an {{Iterator}} but is required to return an {{Iterable}}, which is a
> stronger condition and appears inconsistent. It's a problematic inconsistency
> though, because this seems to require copying all of the input into memory in
> order to create an object that can be iterated many times, since the input
> does not afford this itself.
> Similarly for other {{mapPartitions*}} methods and other
> {{*FlatMapFunction}}s in Java.
> (Is there a reason for this difference that I'm overlooking?)
> If I'm right that this was an inadvertent inconsistency, then the big issue here
> is that of course this is part of a public API.
> Workarounds I can think of:
> Promise that Spark will only call {{iterator()}} once, so implementors can
> use a hacky {{IteratorIterable}} that returns the same {{Iterator}}.
> Or, make a series of methods accepting a {{FlatMapFunction2}}, etc. with the
> desired signature, and deprecate existing ones.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
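
For illustration, here is a minimal Java sketch of the first workaround quoted above, next to the copying approach that the issue describes as the problem. It is an assumption-laden sketch, not code from the attached patch: {{PartitionFunction}} is a hypothetical stand-in for a Java function that receives an {{Iterator}} and must return an {{Iterable}} (it is not Spark's actual interface), and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorIterableSketch {

    // Hypothetical stand-in for an Iterator -> Iterable function;
    // NOT Spark's real FlatMapFunction, just the same shape.
    interface PartitionFunction<T, R> {
        Iterable<R> call(Iterator<T> input);
    }

    // The "hacky" adapter: iterator() always returns the same underlying
    // Iterator, so it is only safe if the caller calls iterator() once.
    static final class IteratorIterable<T> implements Iterable<T> {
        private final Iterator<T> iterator;

        IteratorIterable(Iterator<T> iterator) {
            this.iterator = iterator;
        }

        @Override
        public Iterator<T> iterator() {
            return iterator;
        }
    }

    // Lazily upper-cases elements one at a time; nothing is buffered.
    static Iterator<String> upperCase(final Iterator<String> input) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return input.next().toUpperCase(); }
            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Workaround style: wrap the lazy Iterator without copying.
    static Iterable<String> lazyTransform(Iterator<String> input) {
        return new IteratorIterable<>(upperCase(input));
    }

    // Strictly correct Iterable: copies the whole partition into memory
    // so that it can be iterated many times.
    static Iterable<String> copyingTransform(Iterator<String> input) {
        List<String> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(input.next().toUpperCase());
        }
        return buffer;
    }

    // Helper for the demo: drains one pass over an Iterable into a List.
    static List<String> drain(Iterable<String> iterable) {
        List<String> out = new ArrayList<>();
        for (String s : iterable) {
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        PartitionFunction<String, String> f = IteratorIterableSketch::lazyTransform;
        Iterable<String> result = f.call(Arrays.asList("a", "b", "c").iterator());

        System.out.println(drain(result)); // first pass: [A, B, C]
        System.out.println(drain(result)); // second pass: [] (already consumed)
    }
}
```

The second pass coming back empty is why this adapter only works under the promise that {{iterator()}} is called once; {{copyingTransform}} honors the full {{Iterable}} contract but holds the entire partition in memory.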