[ https://issues.apache.org/jira/browse/SPARK-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19287:
---------------------------------
    Docs Text: JavaPairRDD/JavaPairDStream.flatMapValues() now requires a 
FlatMapFunction as an argument. This means this function now must return an 
Iterator, not Iterable. This corrects a long-standing inconsistency between the 
Scala and Java API, and allows the caller to supply merely an Iterator, not a 
full Iterable. Existing functions passed to this method can simply invoke 
".iterator()" on their existing return value to comply with the new signature.  
(was: JavaPairRDD.flatMapValues() now requires a FlatMapFunction as an 
argument. This means this function now must return an Iterator, not Iterable. 
This corrects a long-standing inconsistency between the Scala and Java API, and 
allows the caller to supply merely an Iterator, not a full Iterable. Existing 
functions passed to this method can simply invoke ".iterator()" on their 
existing return value to comply with the new signature.)
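
The migration described in the Docs Text can be sketched without a Spark dependency. This is only an illustration under stated assumptions: FlatMapFunction is re-declared locally as a stand-in with the Iterator-returning shape, and splitWords, SPLIT_WORDS and drain are hypothetical names that do not come from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, Spark-free sketch; FlatMapFunction is a local stand-in,
// not the real org.apache.spark.api.java.function.FlatMapFunction.
final class FlatMapValuesMigration {

    // Stand-in with the Iterator-returning shape the new signature expects.
    interface FlatMapFunction<T, R> {
        Iterator<R> call(T t) throws Exception;
    }

    // Existing user logic that was written to return an Iterable.
    static Iterable<String> splitWords(String value) {
        return Arrays.asList(value.split(" "));
    }

    // Migrated function: same logic, with .iterator() appended to the old return value.
    static final FlatMapFunction<String, String> SPLIT_WORDS =
        value -> splitWords(value).iterator();

    // Helper that drains the returned Iterator into a List so the result can be inspected.
    static <T, R> List<R> drain(FlatMapFunction<T, R> f, T value) {
        List<R> out = new ArrayList<>();
        try {
            Iterator<R> it = f.call(value);
            while (it.hasNext()) {
                out.add(it.next());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

With real Spark code the change should be the same one-call edit: the function passed to flatMapValues() keeps its body and calls .iterator() on what it used to return.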

> JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-19287
>                 URL: https://issues.apache.org/jira/browse/SPARK-19287
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Sean R. Owen
>            Assignee: Sean R. Owen
>            Priority: Minor
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> SPARK-3369 corrected an old oversight in the Java API, wherein 
> {{FlatMapFunction}} required an {{Iterable}} rather than an {{Iterator}}. As 
> reported by [~akrim], the same problem was also overlooked in 
> {{JavaPairRDD}} 
> (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677
>  ):
> {code}
> def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, U] =
> {code}
> As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on 
> {{TraversableOnce}}, this should take a function that returns an 
> {{Iterator}}, that is, a {{FlatMapFunction}}.
> We can easily add an overload and deprecate the existing method:
> {code}
> def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U]
> {code}
> This is source- and binary-backwards-compatible in Java 7. In Java 8 it is 
> binary-backwards-compatible but not source-compatible. The following 
> natural usage with Java 8 lambdas becomes ambiguous and won't compile; 
> unfortunately, Java won't work out which overload is meant, even from the 
> lambda's return type:
> {code}
> JavaPairRDD<Integer, String> pairRDD = ...
> JavaPairRDD<Integer, Integer> mappedRDD = 
>   pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator());
> {code}
> The ambiguity can be resolved by explicitly casting the lambda to 
> {{FlatMapFunction}}.
> We can at least document this. One day in Spark 3.x the signature can just 
> be changed outright.
> Alternatively, the ambiguity could be avoided by giving the new method a 
> different name, such as "flatMapValues2" or something similarly ugly.
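
The cast mentioned in the description can be illustrated with a Spark-free sketch. Everything here is an assumption made for illustration: Function and FlatMapFunction are local stand-ins for the Spark interfaces, the two static flatMapValues overloads only imitate the old and new signatures, and LambdaOverloadSketch and lengths are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; this is not Spark's JavaPairRDD.
final class LambdaOverloadSketch {

    // Stand-in for the old-style function type (returns whatever R is).
    interface Function<T, R> {
        R call(T t) throws Exception;
    }

    // Stand-in for the Iterator-returning function type.
    interface FlatMapFunction<T, R> {
        Iterator<R> call(T t) throws Exception;
    }

    // Imitates the existing overload: the function returns an Iterable.
    static <V, U> List<U> flatMapValues(List<V> values, Function<V, Iterable<U>> f) {
        List<U> out = new ArrayList<>();
        try {
            for (V v : values) {
                for (U u : f.call(v)) {
                    out.add(u);
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    // Imitates the proposed overload: the function returns an Iterator.
    static <V, U> List<U> flatMapValues(List<V> values, FlatMapFunction<V, U> f) {
        List<U> out = new ArrayList<>();
        try {
            for (V v : values) {
                Iterator<U> it = f.call(v);
                while (it.hasNext()) {
                    out.add(it.next());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    static List<Integer> lengths(List<String> values) {
        // An uncast lambda here is the ambiguous case the issue describes:
        //   flatMapValues(values, s -> Arrays.asList(s.length()).iterator());
        // The explicit cast names the target interface and selects one overload.
        return flatMapValues(values,
            (FlatMapFunction<String, Integer>) s -> Arrays.asList(s.length()).iterator());
    }
}
```

The cast costs one line of noise at each call site, which is presumably why the description treats it as a workaround to document rather than a fix.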



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
