I will wait a couple more days, and if I hear no objections, I will document this at https://github.com/databricks/scala-style-guide#java-interoperability.
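
To make the options concrete, here is a minimal sketch of approaches 3 and 4 as described in the thread below. `ResourceRequest`, `ProfileWithTwoMethods` and `ProfileWithOneMethod` are hypothetical stand-ins for illustration (they are not Spark classes), and I am assuming Scala 2.12's `scala.collection.JavaConverters`; on Scala 2.13 the equivalent import would be `scala.jdk.CollectionConverters._`.

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

// Hypothetical stand-in for a class such as ExecutorResourceRequest.
final case class ResourceRequest(name: String, amount: Long)

// Approach 3: a Scala-typed method plus a separately named Java variant.
class ProfileWithTwoMethods(underlying: Map[String, ResourceRequest]) {
  def requests: Map[String, ResourceRequest] = underlying
  def requestsJMap: JMap[String, ResourceRequest] = underlying.asJava
}

// Approach 4: a single method that returns a Java type for both languages.
class ProfileWithOneMethod(underlying: Map[String, ResourceRequest]) {
  def requests: JMap[String, ResourceRequest] = underlying.asJava
}

object Approach4Demo {
  def main(args: Array[String]): Unit = {
    val profile = new ProfileWithOneMethod(Map("gpu" -> ResourceRequest("gpu", 2L)))
    // A Java caller would use profile.requests() directly as a java.util.Map.
    // A Scala caller converts explicitly when it wants Scala collections.
    val scalaView: scala.collection.Map[String, ResourceRequest] = profile.requests.asScala
    println(scalaView("gpu").amount)
  }
}
```

With approach 4 there is only one method to find and maintain, at the cost of an explicit `asScala` on the Scala side; with approach 3 each side gets its native type, but every such method needs a second, separately named variant.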

On Thu, May 7, 2020 at 9:18 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi all, I would like to proceed with this. Are there more thoughts on this? If
> not, I would like to go ahead with the proposal here.
>
> On Thu, Apr 30, 2020 at 10:54 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Nothing is urgent. I just don't want to leave it undecided and keep
>> adding Java APIs inconsistently, as is currently happening.
>>
>> We should have a set of coherent APIs. It's very difficult to change APIs
>> once they are out in releases. I guess I have seen people here agree with
>> having general guidance for the same reason at least - please let me know
>> if I'm taking it wrong.
>>
>> I don't think we should assume Java programmers know how Scala works with
>> Java types. Fewer assumptions might be better.
>>
>> I feel like we have things on the table to consider at this moment and
>> not much point in waiting indefinitely.
>>
>> But sure, maybe I am wrong. We can wait for more feedback for a couple of
>> days.
>>
>>
>> On Thu, 30 Apr 2020, 18:59 ZHANG Wei, <wezh...@outlook.com> wrote:
>>
>>> I feel a little pushed... :-) I still don't get the point of why it's
>>> urgent to make the decision now. AFAIK, it's common practice for Java
>>> programmers to handle Scala type conversions themselves when they
>>> invoke Scala libraries. I'm not sure which one is the Java programmers'
>>> root complaint, Scala type instance or Scala Jar file.
>>>
>>> My 2 cents.
>>>
>>> --
>>> Cheers,
>>> -z
>>>
>>> On Thu, 30 Apr 2020 09:17:37 +0900
>>> Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>> > There was a typo in the previous email. I am re-sending:
>>> >
>>> > Hm, I thought you meant you prefer 3. over 4 but don't mind particularly.
>>> > I don't mean to wait for more feedback. It looks likely just a deadlock
>>> > which will be the worst case.
>>> > I was suggesting to pick one way first, and stick to it.
>>> > If we find out something later, we can discuss
>>> > more about changing it later.
>>> >
>>> > Having separate Java specific API (3. way)
>>> > - causes maintenance cost
>>> > - makes users search for which API is for Java every time
>>> > - this looks like the opposite of the unified API set Spark has targeted so far.
>>> >
>>> > I don't completely buy the argument about Scala/Java friendly because using
>>> > Java instance is already documented in the official Scala documentation.
>>> > Users still need to search if we have Java specific methods for *some* APIs.
>>> >
>>> > On Thu, Apr 30, 2020 at 8:58 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> >
>>> > > Hm, I thought you meant you prefer 3. over 4 but don't mind particularly.
>>> > > I don't mean to wait for more feedback. It looks likely just a deadlock
>>> > > which will be the worst case.
>>> > > I was suggesting to pick one way first, and stick to it. If we find out
>>> > > something later, we can discuss
>>> > > more about changing it later.
>>> > >
>>> > > Having separate Java specific API (4. way)
>>> > > - causes maintenance cost
>>> > > - makes users search for which API is for Java every time
>>> > > - this looks like the opposite of the unified API set Spark has targeted so far.
>>> > >
>>> > > I don't completely buy the argument about Scala/Java friendly because
>>> > > using Java instance is already documented in the official Scala
>>> > > documentation.
>>> > > Users still need to search if we have Java specific methods for *some* APIs.
>>> > >
>>> > >
>>> > >
>>> > > On Thu, 30 Apr 2020, 00:06 Tom Graves, <tgraves...@yahoo.com> wrote:
>>> > >
>>> > >> Sorry I'm not sure what your last email means. Does it mean you are
>>> > >> putting it up for a vote or just waiting to get more feedback? I disagree
>>> > >> with saying option 4 is the rule but agree having a general rule makes
>>> > >> sense.
>>> > >> I think we need a lot more input to make the rule as it affects the
>>> > >> APIs.
>>> > >>
>>> > >> Tom
>>> > >>
>>> > >> On Wednesday, April 29, 2020, 09:53:22 AM CDT, Hyukjin Kwon <
>>> > >> gurwls...@gmail.com> wrote:
>>> > >>
>>> > >>
>>> > >> I think I am not seeing explicit objection here but rather see people
>>> > >> tend to agree with the proposal in general.
>>> > >> I would like to step forward rather than leaving it as a deadlock - the
>>> > >> worst choice here is to postpone and abandon this discussion with this
>>> > >> inconsistency.
>>> > >>
>>> > >> I don't currently aim to document this as the cases are rather
>>> > >> rare, and we haven't really documented the JavaRDD <> RDD vs DataFrame case
>>> > >> either.
>>> > >> Let's keep monitoring and see if this discussion thread clarifies things
>>> > >> enough in the cases I mentioned.
>>> > >>
>>> > >> Let me know if you guys think differently.
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 28, 2020 at 5:03 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >>
>>> > >> Spark has aimed to have a unified API set rather than having separate
>>> > >> Java classes, to reduce the maintenance cost,
>>> > >> e.g. JavaRDD <> RDD vs DataFrame. These JavaXXX are more about the
>>> > >> legacy.
>>> > >>
>>> > >> I think it's best to stick to approach 4. in general cases.
>>> > >> Other options might have to be considered based upon a specific context.
>>> > >> For example, if we *must* add a bunch of Java-specifics
>>> > >> into a specific class for an inevitable reason somewhere, I would
>>> > >> consider having a Java-specific class.
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 28, 2020 at 4:38 PM, ZHANG Wei <wezh...@outlook.com> wrote:
>>> > >>
>>> > >> To be frank, I also love the pure Java type in Java API and Scala type in
>>> > >> Scala API.
>>> > >> :-)
>>> > >>
>>> > >> If we don't treat Java as a "FRIEND" of Scala, just as Python, maybe we
>>> > >> can adopt the status of option 1, the specific Java classes. (But I don't
>>> > >> like the `Java` prefix, which is redundant when I'm coding a Java app,
>>> > >> such as JavaRDD, why not distinguish it by package namespace...) The specific
>>> > >> Java API can also leverage some native Java language features with new
>>> > >> versions.
>>> > >>
>>> > >> And just because of the friendly relationship between Scala and Java, the Java
>>> > >> user can call the Scala API with `.asScala` or `.asJava`'s help if the Java API
>>> > >> is not ready. Then switch to the Java API when it's well cooked.
>>> > >>
>>> > >> The con is more effort to maintain.
>>> > >>
>>> > >> My 2 cents.
>>> > >>
>>> > >> --
>>> > >> Cheers,
>>> > >> -z
>>> > >>
>>> > >> On Tue, 28 Apr 2020 12:07:36 +0900
>>> > >> Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >>
>>> > >> > The problem is that calling Scala instances on the Java side is discouraged
>>> > >> > in general, to the best of my knowledge.
>>> > >> > A Java user won't likely know asJava in Scala but a Scala user will likely
>>> > >> > know both asScala and asJava.
>>> > >> >
>>> > >> >
>>> > >> > On Tue, Apr 28, 2020 at 11:35 AM, ZHANG Wei <wezh...@outlook.com> wrote:
>>> > >> >
>>> > >> > > How about making a small change on option 4:
>>> > >> > > Keep Scala API returning Scala type instance with providing a
>>> > >> > > `asJava` method to return a Java type instance.
>>> > >> > >
>>> > >> > > Scala 2.13 has provided CollectionConverter [1][2][3], in the following
>>> > >> > > Spark dependencies upgrade, which can be supported by nature. For
>>> > >> > > current Scala 2.12 version, we can wrap `ImplicitConversionsToJava`[4]
>>> > >> > > as what Scala 2.13 does and add implicit conversions.
>>> > >> > >
>>> > >> > > Just my 2 cents.
>>> > >> > >
>>> > >> > > --
>>> > >> > > Cheers,
>>> > >> > > -z
>>> > >> > >
>>> > >> > > [1] https://docs.scala-lang.org/overviews/collections-2.13/conversions-between-java-and-scala-collections.html
>>> > >> > > [2] https://www.scala-lang.org/api/2.13.0/scala/jdk/javaapi/CollectionConverters$.html
>>> > >> > > [3] https://www.scala-lang.org/api/2.13.0/scala/jdk/CollectionConverters$.html
>>> > >> > > [4] https://www.scala-lang.org/api/2.12.11/scala/collection/convert/ImplicitConversionsToJava$.html
>>> > >> > >
>>> > >> > >
>>> > >> > > On Tue, 28 Apr 2020 08:52:57 +0900
>>> > >> > > Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > >
>>> > >> > > > I would like to make sure I am open for other options that can be
>>> > >> > > > considered situationally and based on the context.
>>> > >> > > > It's okay, and I don't target to restrict this here.
>>> > >> > > > For example, DSv2, I
>>> > >> > > > understand it's written in Java because Java
>>> > >> > > > interfaces arguably bring better performance. That's why vectorized
>>> > >> > > > readers are written in Java too.
>>> > >> > > >
>>> > >> > > > Maybe the "general" wasn't explicit in my previous email. Adding APIs to
>>> > >> > > > return a Java instance is still
>>> > >> > > > rather rare in general, given my few years of monitoring.
>>> > >> > > > The problem I would more like to deal with is when we need to
>>> > >> > > > add one or a couple of user-facing
>>> > >> > > > Java-specific APIs to return Java instances, which is relatively more
>>> > >> > > > frequent compared to when we need a bunch
>>> > >> > > > of Java specific APIs.
>>> > >> > > >
>>> > >> > > > In this case, I think the guidance should be to use approach 4. There are
>>> > >> > > > pros and cons between 3. and 4., of course.
>>> > >> > > > But it looks to me that approach 4. is closer to what Spark has targeted so
>>> > >> > > > far.
>>> > >> > > >
>>> > >> > > >
>>> > >> > > >
>>> > >> > > > On Tue, Apr 28, 2020 at 8:34 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > > >
>>> > >> > > > > > One thing we could do here is use Java collections internally and make
>>> > >> > > > > the Scala API a thin wrapper around Java -- like how Python works.
>>> > >> > > > > > Then adding a method to the Scala API would require adding it to the
>>> > >> > > > > Java API and we would keep the two more in sync.
>>> > >> > > > >
>>> > >> > > > > I think it can be an appropriate idea for when we have to deal with this
>>> > >> > > > > case a lot, but I don't think there are so many
>>> > >> > > > > user-facing APIs that return Java collections; it's rather rare. Also, there
>>> > >> > > > > are relatively fewer Java users than Scala users.
>>> > >> > > > > This case is slightly different from Python in that there are so
>>> > >> > > > > many differences to deal with in the PySpark case.
>>> > >> > > > >
>>> > >> > > > > Also, in the case of `Seq`, we can actually just use `Array` instead for both
>>> > >> > > > > the Scala and Java sides. I don't find such cases notably awkward.
>>> > >> > > > > These problematic cases might be specific to a few Java collections or
>>> > >> > > > > instances, and I would like to avoid overkill here.
>>> > >> > > > >
>>> > >> > > > > Of course, if there is a place to consider other options, let's do so. I
>>> > >> > > > > don't like to say this is the only required option.
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > > On Tue, Apr 28, 2020 at 1:18 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>> > >> > > > >
>>> > >> > > > >> I think the right choice here depends on how the object is used. For
>>> > >> > > > >> developer and internal APIs, I think standardizing on Java collections
>>> > >> > > > >> makes the most sense.
>>> > >> > > > >>
>>> > >> > > > >> For user-facing APIs, it is awkward to return Java collections to Scala
>>> > >> > > > >> code -- I think that's the motivation for Tom's comment. For user APIs, I
>>> > >> > > > >> think most methods should return Scala collections, and I don't have a
>>> > >> > > > >> strong opinion about whether the conversion (or lack thereof) is done in a
>>> > >> > > > >> separate object (#1) or in parallel methods (#3).
>>> > >> > > > >>
>>> > >> > > > >> Both #1 and #3 seem like about the same amount of work and have the same
>>> > >> > > > >> likelihood that a developer will leave out a Java method version.
>>> > >> > > > >> One thing
>>> > >> > > > >> we could do here is use Java collections internally and make the Scala API
>>> > >> > > > >> a thin wrapper around Java -- like how Python works. Then adding a method
>>> > >> > > > >> to the Scala API would require adding it to the Java API and we would keep
>>> > >> > > > >> the two more in sync. It would also help avoid Scala collections leaking
>>> > >> > > > >> into internals.
>>> > >> > > > >>
>>> > >> > > > >> On Mon, Apr 27, 2020 at 8:49 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > > > >>
>>> > >> > > > >>> Let's stick to the option with less maintenance effort then, rather than
>>> > >> > > > >>> leaving it undecided and delaying while this inconsistency remains.
>>> > >> > > > >>>
>>> > >> > > > >>> I don't think we can have very meaningful data about this soon, given
>>> > >> > > > >>> that we haven't heard many complaints about this in general so far.
>>> > >> > > > >>>
>>> > >> > > > >>> The point of this thread is to make a call rather than defer to the
>>> > >> > > > >>> future.
>>> > >> > > > >>>
>>> > >> > > > >>> On Mon, 27 Apr 2020, 23:15 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>> > >> > > > >>>
>>> > >> > > > >>>> IIUC We are moving away from having 2 classes for Java and Scala, like
>>> > >> > > > >>>> JavaRDD and RDD. It's much simpler to maintain and use with a single class.
>>> > >> > > > >>>>
>>> > >> > > > >>>> I don't have a strong preference over option 3 or 4. We may need to
>>> > >> > > > >>>> collect more data points from actual users.
>>> > >> > > > >>>>
>>> > >> > > > >>>> On Mon, Apr 27, 2020 at 9:50 PM Hyukjin Kwon <gurwls...@gmail.com>
>>> > >> > > > >>>> wrote:
>>> > >> > > > >>>>
>>> > >> > > > >>>>> Scala users are arguably more prevalent compared to Java users, yes.
>>> > >> > > > >>>>> Using the Java instances on the Scala side is legitimate, and they are
>>> > >> > > > >>>>> already being used in multiple places. I don't believe Scala
>>> > >> > > > >>>>> users find this not Scala friendly as it's legitimate and already
>>> > >> > > > >>>>> being used. I personally find it more troublesome to make Java
>>> > >> > > > >>>>> users search for which APIs to call. Yes, I understand the pros and
>>> > >> > > > >>>>> cons - we should also find the balance considering the actual usage.
>>> > >> > > > >>>>>
>>> > >> > > > >>>>> One more argument from me is, though, I think one of the goals in
>>> > >> > > > >>>>> Spark APIs is the unified API set, to my knowledge,
>>> > >> > > > >>>>> e.g., JavaRDD <> RDD vs DataFrame.
>>> > >> > > > >>>>> If either way is not particularly preferred over the other, I would
>>> > >> > > > >>>>> just choose the one that gives the unified API set.
>>> > >> > > > >>>>>
>>> > >> > > > >>>>>
>>> > >> > > > >>>>>
>>> > >> > > > >>>>> On Mon, Apr 27, 2020 at 10:37 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>> > >> > > > >>>>>
>>> > >> > > > >>>>>> I agree general guidance is good so we keep the APIs consistent.
>>> > >> > > > >>>>>> I don't necessarily agree that 4 is the best solution though. I agree it's
>>> > >> > > > >>>>>> nice to have one API, but it is less friendly for the Scala side.
>>> > >> > > > >>>>>> Searching for the equivalent Java API shouldn't be hard, as it should be
>>> > >> > > > >>>>>> very close in name, and if we make it a general rule users should
>>> > >> > > > >>>>>> understand it. I guess one good question is which API most of our users
>>> > >> > > > >>>>>> use between Java and Scala, and what is the ratio? I don't know the answer
>>> > >> > > > >>>>>> to that. I've seen more using Scala over Java. If the majority use Scala
>>> > >> > > > >>>>>> then I think the API should be more friendly to that.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Tom
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> On Monday, April 27, 2020, 04:04:28 AM CDT, Hyukjin Kwon <
>>> > >> > > > >>>>>> gurwls...@gmail.com> wrote:
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Hi all,
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> I would like to discuss Java specific APIs and which design we will
>>> > >> > > > >>>>>> choose.
>>> > >> > > > >>>>>> This has been discussed in multiple places so far, for example, at
>>> > >> > > > >>>>>> https://github.com/apache/spark/pull/28085#discussion_r407334754
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *The problem:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> In short, I would like us to have clear guidance on how we support
>>> > >> > > > >>>>>> Java specific APIs when
>>> > >> > > > >>>>>> a method needs to return a Java instance. The problem is simple:
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> def requests: Map[String, ExecutorResourceRequest] = ...
>>> > >> > > > >>>>>> def requestsJMap: java.util.Map[String, ExecutorResourceRequest] = ...
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> vs
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> def requests: java.util.Map[String, ExecutorResourceRequest] = ...
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *Current codebase:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> My understanding so far was that the latter is preferred and more
>>> > >> > > > >>>>>> consistent and prevailing in the
>>> > >> > > > >>>>>> existing codebase, for example, see StateOperatorProgress and
>>> > >> > > > >>>>>> StreamingQueryProgress in Structured Streaming.
>>> > >> > > > >>>>>> However, I realised that we also have other approaches in the current
>>> > >> > > > >>>>>> codebase. There look to be
>>> > >> > > > >>>>>> four approaches to deal with Java specifics in general:
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> 1. Java specific classes such as JavaRDD and JavaSparkContext.
>>> > >> > > > >>>>>> 2. Java specific methods with the same name that overload their
>>> > >> > > > >>>>>> parameters, see functions.scala.
>>> > >> > > > >>>>>> 3. Java specific methods with a different name that need to
>>> > >> > > > >>>>>> return a different type such as TaskContext.resourcesJMap vs
>>> > >> > > > >>>>>> TaskContext.resources.
>>> > >> > > > >>>>>> 4. One method that returns a Java instance for both Scala and
>>> > >> > > > >>>>>> Java sides, see StateOperatorProgress and StreamingQueryProgress.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *Analysis on the current codebase:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> I agree with approach 2. because the corresponding cases give you
>>> > >> > > > >>>>>> consistent API usage across
>>> > >> > > > >>>>>> other language APIs in general. Approach 1.
>>> > >> > > > >>>>>> is from the old world
>>> > >> > > > >>>>>> when we didn't have unified APIs.
>>> > >> > > > >>>>>> This might be the worst approach.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> 3. and 4. are controversial.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> For 3., if you have to use Java APIs, then you have to search every
>>> > >> > > > >>>>>> time for whether there is a variant of that API
>>> > >> > > > >>>>>> specifically for Java. But yes, it gives you
>>> > >> > > > >>>>>> Java/Scala friendly instances.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> For 4., having one API that returns a Java instance lets you
>>> > >> > > > >>>>>> use it on both the Scala and Java
>>> > >> > > > >>>>>> sides, although it makes you call asScala on the Scala side specifically.
>>> > >> > > > >>>>>> But you don’t
>>> > >> > > > >>>>>> have to search if there’s a variant of this API and it gives you
>>> > >> > > > >>>>>> consistent API usage across languages.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Also, note that calling Java in Scala is legitimate but the opposite
>>> > >> > > > >>>>>> case is not, to the best of my knowledge.
>>> > >> > > > >>>>>> In addition, you should have a method that returns a Java instance
>>> > >> > > > >>>>>> for PySpark or SparkR to support.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *Proposal:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> I would like to have general guidance on this that the Spark dev
>>> > >> > > > >>>>>> community agrees upon: Do approach 4. If not possible, do 3. Avoid 1
>>> > >> > > > >>>>>> at almost all costs.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Note that this isn't a hard requirement but *general guidance*;
>>> > >> > > > >>>>>> therefore, the decision might be up to
>>> > >> > > > >>>>>> the specific context.
>>> > >> > > > >>>>>> For example, when there are some strong
>>> > >> > > > >>>>>> arguments to have a separate Java specific API, that’s fine.
>>> > >> > > > >>>>>> Of course, we won’t change the existing methods given Michael’s
>>> > >> > > > >>>>>> rubric added before. I am talking about new
>>> > >> > > > >>>>>> methods in unreleased branches.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Any concern or opinion on this?
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>
>>> > >> > > > >>
>>> > >> > > > >> --
>>> > >> > > > >> Ryan Blue
>>> > >> > > > >> Software Engineer
>>> > >> > > > >> Netflix
>>> > >> > > > >>
>>> > >> > > > >
>>> > >> > >
>>> > >>
>>> > >>
>>>
>>
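
For reference, approach 2 from the list in the original proposal (the same method name, overloaded on its parameters) might look like the following sketch. `ColumnNames.join` is a hypothetical helper made up for illustration, not something in functions.scala, and it again assumes Scala 2.12's `scala.collection.JavaConverters`.

```scala
import java.util.{List => JList}
import scala.collection.JavaConverters._

object ColumnNames {
  // Scala-friendly overload: takes a Scala Seq.
  def join(names: Seq[String]): String = names.mkString(",")

  // Java-friendly overload with the same name: takes a java.util.List
  // and delegates to the Scala version after converting.
  def join(names: JList[String]): String = join(names.asScala.toSeq)
}

object Approach2Demo {
  def main(args: Array[String]): Unit = {
    println(ColumnNames.join(Seq("a", "b")))
    println(ColumnNames.join(java.util.Arrays.asList("a", "b")))
  }
}
```

This only helps when the Java-specific part is a parameter. Scala and Java cannot overload on the return type alone, which is why a method that must return a Java instance ends up as either approach 3 or approach 4.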