Thanks Sean!

On Thu, Jan 19, 2017 at 6:09 AM, Sean Owen <so...@cloudera.com> wrote:
> Yes, confirmed that fixing it unfortunately causes trouble in Java 8. See
> https://issues.apache.org/jira/browse/SPARK-19287 for further discussion.
>
> On Wed, Jan 18, 2017 at 9:00 PM Sean Owen <so...@cloudera.com> wrote:
>
>> Hm. Unless I am also totally missing or forgetting something, I think
>> you're right. The equivalent in PairRDDFunctions.scala operates on a
>> function from T to TraversableOnce[U], and a TraversableOnce is most like
>> java.util.Iterator.
>>
>> You can work around it by wrapping the iterator in a faked IteratorIterable.
>>
>> I think this is fixable in the API by deprecating this method and adding
>> a new one that takes a FlatMapFunction. We'd have to triple-check in a test
>> that this doesn't cause an API compatibility problem with respect to Java 8
>> lambdas, but if that's settled, I think this could be fixed without
>> breaking the API.
>>
>> On Wed, Jan 18, 2017 at 8:50 PM Asher Krim <ak...@hubspot.com> wrote:
>>
>> In the Spark 2 Java RDD API, the use of iterables was replaced with
>> iterators. I just encountered an inconsistency in `flatMapValues` that may
>> be a bug:
>>
>> `flatMapValues`
>> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677)
>> takes a `FlatMapFunction`
>> (https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/java/org/apache/spark/api/java/function/FlatMapFunction.java)
>>
>> The problem is that `FlatMapFunction` was changed to return an iterator,
>> but `rdd.flatMapValues` still expects an iterable. Am I using these
>> constructs correctly? Is there a workaround other than converting the
>> iterator to an iterable outside of the function?
>>
>> Thanks,
>> --
>> Asher Krim
>> Senior Software Engineer

--
Asher Krim
Senior Software Engineer
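
For reference, the workaround mentioned in the thread (wrapping the iterator in a faked iterable) might look roughly like the sketch below. The class name `IteratorIterable`, the method `once`, and the stand-in function `splitWords` are hypothetical names chosen for illustration; none of them are part of Spark. The example uses only plain Java so it can be run without a Spark dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (not part of Spark): adapts an Iterator to an Iterable. */
final class IteratorIterable {
    private IteratorIterable() {}

    /**
     * Wraps an iterator so it can be returned where an Iterable is expected.
     * The result is single-use: every call to iterator() hands back the same
     * underlying iterator, so a second traversal sees no remaining elements.
     */
    static <T> Iterable<T> once(Iterator<T> iterator) {
        return () -> iterator;
    }

    /** Stand-in for a user function that returns an Iterator. */
    static Iterator<String> splitWords(String value) {
        return Arrays.asList(value.split(" ")).iterator();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        // The for-each loop only needs an Iterable, which the wrapper provides.
        for (String w : once(splitWords("a b c"))) {
            words.add(w);
        }
        System.out.println(words); // prints [a, b, c]
    }
}
```

Assuming `flatMapValues` still expects a function returning an `Iterable`, as described in the thread, the call site would presumably become something like `rdd.flatMapValues(v -> IteratorIterable.once(splitWords(v)))`. That line is untested here, and the single-use caveat above applies if the result is ever traversed more than once.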