Hm. Unless I am also totally missing or forgetting something, I think you're right. The equivalent in PairRDDFunctions.scala operates on a function from T to TraversableOnce[U], and a TraversableOnce is most like java.util.Iterator.
You can work around it by wrapping the Iterator in a fake Iterable. I think this is fixable in the API by deprecating this method and adding a new one that takes a FlatMapFunction. We'd have to triple-check in a test that this doesn't cause an API compatibility problem with respect to Java 8 lambdas, but if that's settled, I think this could be fixed without breaking the API.

On Wed, Jan 18, 2017 at 8:50 PM Asher Krim <ak...@hubspot.com> wrote:

> In Spark 2 + Java + RDD API, the use of Iterables was replaced with
> Iterators. I just encountered an inconsistency in `flatMapValues` that may
> be a bug:
>
> `flatMapValues` (
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677)
> takes a `FlatMapFunction` (
> https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/java/org/apache/spark/api/java/function/FlatMapFunction.java
> )
>
> The problem is that `FlatMapFunction` was changed to return an Iterator,
> but `rdd.flatMapValues` still expects an Iterable. Am I using these
> constructs correctly? Is there a workaround other than converting the
> Iterator to an Iterable outside of the function?
>
> Thanks,
> --
> Asher Krim
> Senior Software Engineer
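The Iterator-to-Iterable workaround mentioned above can be sketched as a small adapter. This is a minimal, Spark-free illustration of the idea; the names `asIterable` and `IteratorIterableDemo` are hypothetical. Note the resulting Iterable is "fake" in the sense that it can only be traversed once, which is why it technically violates the Iterable contract but is still usable as a one-shot argument:

```java
import java.util.Arrays;
import java.util.Iterator;

public class IteratorIterableDemo {

    // Wrap a one-shot Iterator in an Iterable facade. Iterable is a
    // functional interface (single abstract method iterator()), so a
    // lambda that returns the captured iterator is enough. Traversing
    // the result a second time yields nothing, since the underlying
    // Iterator is exhausted.
    static <T> Iterable<T> asIterable(Iterator<T> it) {
        return () -> it;
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        StringBuilder sb = new StringBuilder();
        for (String s : asIterable(it)) {
            sb.append(s);
        }
        System.out.println(sb); // prints: abc
    }
}
```

In a Spark program the same adapter would let a FlatMapFunction-style, Iterator-returning lambda be passed where an Iterable is expected, e.g. something like `rdd.flatMapValues(v -> asIterable(fn.call(v)))` (a sketch; `fn` here is a hypothetical FlatMapFunction instance).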