[ 
https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837967#comment-16837967
 ] 

Sean Owen commented on SPARK-27683:
-----------------------------------

There are many usages of TraversableOnce, though many are in internal packages 
and classes, which aren't so urgent to address. The concern is the public APIs, 
and the main ones are flatMap / flatMapValues / flatMapGroups. These accept a 
function that returns a TraversableOnce.

That's nice as TraversableOnce is a supertype of Iterable and Iterator, so one 
can return either in a flatMap. In Scala 2.13 IterableOnce will play that role 
but it isn't available in 2.12. This makes it hard to create an API method that 
works in both, and cuts off the possibility, I think, of deprecating the 
current method while adding the new one. 
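
As a rough illustration of why the TraversableOnce signature is convenient, here is a minimal sketch written against Scala 2.12. FlatMapSketch and flatMapAll are hypothetical stand-ins over a plain Seq, not Spark's RDD API or its implementation.

```scala
// Hypothetical toy API; it only mirrors the shape of the flatMap signature
// discussed above and is not Spark code.
object FlatMapSketch {
  // Accepts any function returning a TraversableOnce, as the current API does.
  def flatMapAll[T, U](data: Seq[T])(f: T => TraversableOnce[U]): Seq[U] = {
    val out = Seq.newBuilder[U]
    data.foreach(t => out ++= f(t))
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("a b", "c")
    // The caller may return a collection ...
    println(flatMapAll(lines)(s => s.split(" ").toSeq))
    // ... or an Iterator, with the same method.
    println(flatMapAll(lines)(s => s.split(" ").iterator))
  }
}
```

Both calls go through the one method because Seq and Iterator share TraversableOnce as a supertype in 2.12.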

IterableOnce will have basically two subclasses, Iterable and Iterator. These 
exist now. We could change to support both of those in flatMap now and 
deprecate the existing method. However this won't compile as there would be two 
methods with the same name and signature after erasure. Even if we drop the 
TraversableOnce version it won't work for the same reason.
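
The erasure problem can be seen with a small sketch. The class Api and its method names are hypothetical; the two methods are given different names here only so that the example compiles. Reflection then shows that both take the same erased parameter type.

```scala
// Hypothetical sketch, not Spark code.
object ErasureSketch {
  class Api {
    def flatMapIterable[U](f: String => Iterable[U]): Unit = ()
    def flatMapIterator[U](f: String => Iterator[U]): Unit = ()
    // If both were named flatMap, scalac would reject the class: the function
    // parameter erases to scala.Function1 in both, so the two overloads would
    // have the same signature after erasure.
  }

  // Names of the erased JVM parameter types of the method called `name`.
  def erasedParamTypes(name: String): List[String] =
    classOf[Api].getDeclaredMethods.toList
      .filter(_.getName == name)
      .flatMap(_.getParameterTypes.toList.map(_.getName))

  def main(args: Array[String]): Unit = {
    println(erasedParamTypes("flatMapIterable"))
    println(erasedParamTypes("flatMapIterator"))
  }
}
```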

There's a scala-collection-compat library that attempts to bridge some of the 
differences between 2.12 and 2.13. It does provide some help with IterableOnce, 
but the compat class is in a different package (scala.collection.compat) than 
the final one, and is in any event just a type alias for TraversableOnce. It 
doesn't seem to help.

I considered adding a dummy implementation of IterableOnce to our source, 
extending TraversableOnce. However this too won't help without defining 
implicit conversion from Iterable and Iterator to IterableOnce that users would 
have to import.

We could instead change the one flatMap method to accept an Iterator, or an 
Iterable. Either one makes some usages of flatMap stop working. Of the two, 
Iterator is probably the better choice. It's less restrictive on the caller, 
it's how the Java equivalent works now, and is more consistent with what 
TraversableOnce means now. That would mean you can't flatMap to a collection, 
which is unfortunate; you'd have to add ".iterator".
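
To make the caller-side impact concrete, here is a minimal sketch of an Iterator-only signature. IteratorOnlySketch and flatMapAll are hypothetical names over a plain Seq and only illustrate the shape of the change, not how Spark would implement it.

```scala
// Hypothetical toy API, not Spark code.
object IteratorOnlySketch {
  // The function must now return an Iterator rather than any TraversableOnce.
  def flatMapAll[T, U](data: Seq[T])(f: T => Iterator[U]): Seq[U] =
    data.iterator.flatMap(f).toList

  def main(args: Array[String]): Unit = {
    val lines = Seq("a b", "c")
    // Returning an Iterator works as before.
    println(flatMapAll(lines)(_.split(" ").iterator))
    // Returning a collection no longer type-checks on its own;
    // the caller has to append .iterator.
    println(flatMapAll(Seq(1, 2))(n => List(n, n * 10).iterator))
  }
}
```

Existing callers that return a List or Seq would need that one-token change, which is the breakage being weighed here.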

Another option, of course, is to maintain separate source trees for 2.12 and 
2.13 in the future. That's somewhat painful if it means maintaining two versions of 
PairRDDFunctions, RDD, DStream, etc. We may be able to break out just the part 
that varies into a separate class though.


I'm interested in thoughts on whether it's better to go for separate source 
trees to minimize change needed from callers, or, whether requiring an Iterator 
is acceptable enough as a breaking change in 3.0. But if we're going to do that 
it has to be for 3.0, and unfortunately I don't see a way to keep the existing 
method as deprecated while adding the new one.


> Remove usage of TraversableOnce
> -------------------------------
>
>                 Key: SPARK-27683
>                 URL: https://issues.apache.org/jira/browse/SPARK-27683
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, Spark Core, SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Major
>
> As with {{Traversable}}, {{TraversableOnce}} is going away in Scala 2.13. We 
> should use {{IterableOnce}} instead. This one is a bigger change as there are 
> more API methods with the existing signature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
