[ https://issues.apache.org/jira/browse/SPARK-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19287:
------------------------------------

    Assignee:     (was: Apache Spark)

> JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-19287
>                 URL: https://issues.apache.org/jira/browse/SPARK-19287
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Sean Owen
>            Priority: Minor
>              Labels: release-notes
>
> SPARK-3369 corrected an old oversight in the Java API, wherein 
> {{FlatMapFunction}} required an {{Iterable}} rather than an {{Iterator}}. As 
> reported by [~akrim], the same oversight also affects {{JavaPairRDD}} 
> (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677):
> {code}
> def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, U] =
> {code}
> As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on a 
> {{TraversableOnce}}, this should take a function that returns an 
> {{Iterator}}; that is, a {{FlatMapFunction}}.
> We can easily add an overload and deprecate the existing method:
> {code}
> def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U]
> {code}
> This is source- and binary-compatible in Java 7. In Java 8 it remains 
> binary-compatible but is not source-compatible: the following natural usage 
> with Java 8 lambdas becomes ambiguous and won't compile, since Java 
> unfortunately can't choose between the overloads even based on the lambda's 
> return type:
> {code}
> JavaPairRDD<Integer, String> pairRDD = ...
> JavaPairRDD<Integer, Integer> mappedRDD = 
>   pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator());
> {code}
> It can be resolved by explicitly casting the lambda.
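A self-contained sketch of the ambiguity and the cast that resolves it, using stand-in interfaces in place of Spark's {{Function}} and {{FlatMapFunction}} so it compiles without a Spark dependency (the names {{apply}} and {{resolve}} here are illustrative only, not Spark API):

```java
import java.util.Arrays;
import java.util.Iterator;

public class FlatMapOverload {
    // Stand-ins for Spark's Function and FlatMapFunction functional interfaces
    interface JFunction<T, R> { R call(T t); }
    interface FlatMapFunction<T, R> { Iterator<R> call(T t); }

    // Overloads mirroring the existing and proposed flatMapValues signatures
    static <U> String apply(JFunction<String, Iterable<U>> f) { return "iterable"; }
    static <U> String apply(FlatMapFunction<String, U> f) { return "iterator"; }

    static String resolve() {
        // Ambiguous, won't compile: an implicitly typed lambda is not pertinent
        // to applicability, so it matches both one-argument functional
        // interfaces regardless of what its body returns.
        // String which = apply(s -> Arrays.asList(s.length()).iterator());

        // Explicitly casting the lambda selects the FlatMapFunction overload.
        return apply(
            (FlatMapFunction<String, Integer>) s -> Arrays.asList(s.length()).iterator());
    }

    public static void main(String[] args) {
        System.out.println(resolve());  // prints "iterator"
    }
}
```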
> We can at least document this. One day in Spark 3.x this can just be changed 
> outright.
> Alternatively, the new method could be given a distinct name such as 
> "flatMapValues2", though that is ugly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
