Hm. Unless I am also totally missing or forgetting something, I think
you're right. The equivalent in PairRDDFunctions.scala operates on a
function from T to TraversableOnce[U], and a TraversableOnce is most like a
java.util.Iterator.

You can work around it by wrapping the Iterator in a fake IteratorIterable
adapter.
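A minimal sketch of such an adapter, standalone and without Spark on the
classpath (the class name is just illustrative, not an existing Spark class):

```java
import java.util.Arrays;
import java.util.Iterator;

public class Demo {
    // Hypothetical adapter: presents an Iterator as an Iterable. It can
    // only be traversed once, which violates the general Iterable
    // contract, but that is enough for an API that consumes the result
    // in a single pass, as flatMapValues does.
    static class IteratorIterable<T> implements Iterable<T> {
        private final Iterator<T> it;
        IteratorIterable(Iterator<T> it) { this.it = it; }
        @Override public Iterator<T> iterator() { return it; }
    }

    public static void main(String[] args) {
        Iterator<Integer> it = Arrays.asList(1, 2, 3).iterator();
        StringBuilder sb = new StringBuilder();
        // The for-each loop needs an Iterable; the adapter hands it
        // the underlying Iterator exactly once.
        for (int x : new IteratorIterable<>(it)) {
            sb.append(x);
        }
        System.out.println(sb); // prints 123
    }
}
```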

I think this is fixable in the API by deprecating this method and adding a
new one that takes a FlatMapFunction. We'd have to triple-check in a test
that this doesn't cause an API compatibility problem with respect to Java 8
lambdas, but if that's settled, I think this could be fixed without
breaking the API.
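To illustrate the shape of that change, here is a self-contained sketch with
stub interfaces standing in for Spark's real ones (all names and signatures
below are illustrative assumptions, not the actual Spark API change):

```java
import java.util.Collections;
import java.util.Iterator;

public class OverloadSketch {
    // Stubs standing in for Spark's Java function interfaces.
    interface IterableFunction<T, R> { Iterable<R> call(T t); }
    interface FlatMapFunction<T, R> { Iterator<R> call(T t); }

    // The existing Iterable-taking method, kept but deprecated.
    @Deprecated
    static <T, R> String flatMapValues(IterableFunction<T, R> f) {
        return "iterable";
    }

    // A new overload taking the Iterator-returning FlatMapFunction.
    static <T, R> String flatMapValues(FlatMapFunction<T, R> f) {
        return "iterator";
    }

    public static void main(String[] args) {
        // A lambda whose body clearly returns an Iterator resolves to
        // the new overload. A lambda whose body would fit both
        // interfaces (e.g. one returning null) is an ambiguous call and
        // fails to compile -- the Java 8 lambda compatibility concern
        // that a test would need to pin down.
        System.out.println(flatMapValues((Integer v) ->
                Collections.singletonList(v).iterator()));
    }
}
```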

On Wed, Jan 18, 2017 at 8:50 PM Asher Krim <ak...@hubspot.com> wrote:

> In the Spark 2 Java RDD API, the use of Iterables was replaced with
> Iterators. I just encountered an inconsistency in `flatMapValues` that may
> be a bug:
>
> `flatMapValues` (
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677)
> takes a `FlatMapFunction` (
> https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/java/org/apache/spark/api/java/function/FlatMapFunction.java)
>
> The problem is that `FlatMapFunction` was changed to return an iterator,
> but `rdd.flatMapValues` still expects an iterable. Am I using these
> constructs correctly? Is there a workaround other than converting the
> iterator to an iterable outside of the function?
>
> Thanks,
> --
> Asher Krim
> Senior Software Engineer
>