Re: [DISCUSS] Java specific APIs design concern and choice

Hyukjin Kwon Mon, 11 May 2020 04:45:09 -0700

I will wait a couple more days, and if I hear no objection, I will
document this at
https://github.com/databricks/scala-style-guide#java-interoperability.


On Thu, May 7, 2020 at 9:18 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi all, I would like to proceed with this. Are there more thoughts on this? If
> not, I would like to go ahead with the proposal here.
>
> On Thu, Apr 30, 2020 at 10:54 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Nothing is urgent. I just don't want to leave it undecided and keep
>> adding Java APIs inconsistently, as is currently happening.
>>
>> We should have a coherent set of APIs. It's very difficult to change APIs
>> once they are out in releases. I think people here at least agree on
>> having general guidance for that reason; please let me know if I'm
>> reading it wrong.
>>
>> I don't think we should assume Java programmers know how Scala works with
>> Java types. Fewer assumptions are better.
>>
>> I feel we already have enough on the table to consider at this point,
>> and there is not much point in waiting indefinitely.
>>
>> But sure, maybe I am wrong. We can wait a couple of days for more
>> feedback.
>>
>>
>> On Thu, 30 Apr 2020, 18:59 ZHANG Wei, <wezh...@outlook.com> wrote:
>>
>>> I feel a little pushed... :-) I still don't see why it's urgent to make
>>> the decision now. AFAIK, it's common practice for Java programmers to
>>> handle Scala type conversions themselves when they invoke Scala
>>> libraries. I'm not sure what the Java programmers' root complaint is:
>>> Scala type instances or the Scala JAR file.
>>>
>>> My 2 cents.
>>>
>>> --
>>> Cheers,
>>> -z
>>>
>>> On Thu, 30 Apr 2020 09:17:37 +0900
>>> Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>> > There was a typo in the previous email. I am re-sending:
>>> >
>>> > Hm, I thought you meant you prefer 3. to 4., but don't particularly mind.
>>> > I don't mean we should just wait for more feedback. This looks likely to
>>> > end in a deadlock, which would be the worst case.
>>> > I was suggesting we pick one way first and stick to it. If we find out
>>> > something later, we can discuss changing it then.
>>> >
>>> > Having a separate Java-specific API (approach 3.)
>>> >   - adds maintenance cost,
>>> >   - makes users search for the Java variant of an API every time, and
>>> >   - goes against the unified API set that Spark has targeted so far.
>>> >
>>> > I don't completely buy the argument about Scala/Java friendliness, because
>>> > using Java instances from Scala is already documented in the official
>>> > Scala documentation. Users would still need to check whether we have
>>> > Java-specific methods for *some* APIs.
>>> >
>>> > On Thu, Apr 30, 2020 at 8:58 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> >
>>> > > Hm, I thought you meant you prefer 3. to 4., but don't particularly mind.
>>> > > I don't mean we should just wait for more feedback. This looks likely to
>>> > > end in a deadlock, which would be the worst case.
>>> > > I was suggesting we pick one way first and stick to it. If we find out
>>> > > something later, we can discuss changing it then.
>>> > >
>>> > > Having a separate Java-specific API (approach 4.)
>>> > >   - adds maintenance cost,
>>> > >   - makes users search for the Java variant of an API every time, and
>>> > >   - goes against the unified API set that Spark has targeted so far.
>>> > >
>>> > > I don't completely buy the argument about Scala/Java friendliness,
>>> > > because using Java instances from Scala is already documented in the
>>> > > official Scala documentation. Users would still need to check whether
>>> > > we have Java-specific methods for *some* APIs.
>>> > >
>>> > >
>>> > >
>>> > > On Thu, 30 Apr 2020, 00:06 Tom Graves, <tgraves...@yahoo.com> wrote:
>>> > >
>>> > >> Sorry, I'm not sure what your last email means. Does it mean you are
>>> > >> putting it up for a vote or just waiting to get more feedback? I
>>> > >> disagree with making option 4 the rule, but I agree that having a
>>> > >> general rule makes sense. I think we need a lot more input before
>>> > >> making the rule, as it affects the APIs.
>>> > >>
>>> > >> Tom
>>> > >>
>>> > >> On Wednesday, April 29, 2020, 09:53:22 AM CDT, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >>
>>> > >>
>>> > >> I am not seeing any explicit objection here; rather, people seem to
>>> > >> agree with the proposal in general.
>>> > >> I would like to move forward rather than leave it in a deadlock: the
>>> > >> worst choice here is to postpone and abandon this discussion while
>>> > >> keeping this inconsistency.
>>> > >>
>>> > >> I don't currently plan to document this, as the cases are rather
>>> > >> rare, and we haven't really documented the JavaRDD <> RDD vs DataFrame
>>> > >> case either.
>>> > >> Let's keep monitoring and see whether this discussion thread clarifies
>>> > >> things enough for the cases I mentioned.
>>> > >>
>>> > >> Let me know if you guys think differently.
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 28, 2020 at 5:03 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >>
>>> > >> Spark has aimed for a unified API set rather than separate Java
>>> > >> classes, to reduce maintenance cost,
>>> > >> e.g., JavaRDD <> RDD vs DataFrame. These JavaXXX classes are mostly
>>> > >> legacy.
>>> > >>
>>> > >> I think it's best to stick to approach 4. in general cases.
>>> > >> Other options might have to be considered in a specific context.
>>> > >> For example, if we *must* add a bunch of Java-specific methods
>>> > >> to a particular class for an unavoidable reason, I would
>>> > >> consider having a Java-specific class.
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 28, 2020 at 4:38 PM, ZHANG Wei <wezh...@outlook.com> wrote:
>>> > >>
>>> > >> To be frank, I also love pure Java types in the Java API and Scala
>>> > >> types in the Scala API. :-)
>>> > >>
>>> > >> If we don't treat Java as a "FRIEND" of Scala, just like Python, maybe
>>> > >> we can adopt option 1, the specific Java classes. (But I don't like the
>>> > >> `Java` prefix, as in JavaRDD, which is redundant when I'm coding a Java
>>> > >> app; why not distinguish them by package namespace instead...) A
>>> > >> specific Java API could also leverage native Java language features in
>>> > >> newer versions.
>>> > >>
>>> > >> And thanks to the friendly relationship between Scala and Java, Java
>>> > >> users can call the Scala API with the help of `.asScala` or `.asJava`
>>> > >> if the Java API is not ready yet, and switch to the Java API once it's
>>> > >> fully baked.
>>> > >>
>>> > >> The downside is more maintenance effort.
>>> > >>
>>> > >> My 2 cents.
>>> > >>
>>> > >> --
>>> > >> Cheers,
>>> > >> -z
>>> > >>
>>> > >> On Tue, 28 Apr 2020 12:07:36 +0900
>>> > >> Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >>
>>> > >> > The problem is that, to the best of my knowledge, using Scala
>>> > >> > instances on the Java side is discouraged in general.
>>> > >> > A Java user likely won't know about asJava in Scala, but a Scala user
>>> > >> > will likely know both asScala and asJava.
>>> > >> >
>>> > >> >
>>> > >> > On Tue, Apr 28, 2020 at 11:35 AM, ZHANG Wei <wezh...@outlook.com> wrote:
>>> > >> >
>>> > >> > > How about making a small change to option 4:
>>> > >> > >   Keep the Scala API returning Scala type instances, and provide an
>>> > >> > >   `asJava` method that returns a Java type instance.
>>> > >> > >
>>> > >> > > Scala 2.13 provides CollectionConverters [1][2][3], so this would be
>>> > >> > > supported natively after a later Spark dependency upgrade. For the
>>> > >> > > current Scala 2.12 version, we can wrap `ImplicitConversionsToJava` [4]
>>> > >> > > as Scala 2.13 does and add implicit conversions.
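
The suggestion above can be sketched in Scala as follows. This is a minimal illustration assuming Scala 2.13, where `scala.jdk.CollectionConverters` provides the `asJava` extension method; `ResourceRequestSketch` and `ResourceProfileSketch` are hypothetical names, not Spark classes. From Java source, the analogous call would be the static `scala.jdk.javaapi.CollectionConverters.asJava` documented in [2].

```scala
// Minimal sketch, assuming Scala 2.13+ (scala.jdk.CollectionConverters).
// ResourceRequestSketch and ResourceProfileSketch are hypothetical stand-ins,
// not Spark classes.
import scala.jdk.CollectionConverters._

final case class ResourceRequestSketch(name: String, amount: Long)

class ResourceProfileSketch(reqs: Map[String, ResourceRequestSketch]) {
  // The Scala-facing API keeps returning an immutable Scala Map.
  def requests: Map[String, ResourceRequestSketch] = reqs
}

val profile = new ResourceProfileSketch(
  Map("gpu" -> ResourceRequestSketch("gpu", 2L)))

// Scala callers use the result directly.
val scalaView: Map[String, ResourceRequestSketch] = profile.requests

// Callers that need a java.util.Map convert explicitly with `asJava`,
// which wraps the Scala map in a view instead of copying it.
val javaView: java.util.Map[String, ResourceRequestSketch] =
  profile.requests.asJava
```

The trade-off raised in the reply is visible here: the conversion step lives on the caller's side, so a Java caller has to know the converter exists.
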
>>> > >> > >
>>> > >> > > Just my 2 cents.
>>> > >> > >
>>> > >> > > --
>>> > >> > > Cheers,
>>> > >> > > -z
>>> > >> > >
>>> > >> > > [1] https://docs.scala-lang.org/overviews/collections-2.13/conversions-between-java-and-scala-collections.html
>>> > >> > > [2] https://www.scala-lang.org/api/2.13.0/scala/jdk/javaapi/CollectionConverters$.html
>>> > >> > > [3] https://www.scala-lang.org/api/2.13.0/scala/jdk/CollectionConverters$.html
>>> > >> > > [4] https://www.scala-lang.org/api/2.12.11/scala/collection/convert/ImplicitConversionsToJava$.html
>>> > >> > >
>>> > >> > >
>>> > >> > > On Tue, 28 Apr 2020 08:52:57 +0900
>>> > >> > > Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > >
>>> > >> > > > I want to make clear that I am open to other options that can be
>>> > >> > > > considered situationally, based on the context. That's fine, and I
>>> > >> > > > don't intend to restrict it here. For example, I understand DSv2 is
>>> > >> > > > written in Java because Java interfaces arguably bring better
>>> > >> > > > performance. That's also why the vectorized readers are written in
>>> > >> > > > Java.
>>> > >> > > >
>>> > >> > > > Maybe "general" wasn't explicit enough in my previous email. Adding
>>> > >> > > > APIs that return a Java instance is still rather rare in general,
>>> > >> > > > based on my few years of monitoring.
>>> > >> > > > The problem I would like to deal with is more about when we need to
>>> > >> > > > add one or a couple of user-facing Java-specific APIs that return
>>> > >> > > > Java instances, which is relatively more frequent than needing a
>>> > >> > > > whole bunch of Java-specific APIs.
>>> > >> > > >
>>> > >> > > > In this case, I think the guidance should be to use approach 4.
>>> > >> > > > There are pros and cons to 3. and 4., of course.
>>> > >> > > > But it looks to me like approach 4. is closer to what Spark has
>>> > >> > > > targeted so far.
>>> > >> > > >
>>> > >> > > >
>>> > >> > > >
>>> > >> > > > On Tue, Apr 28, 2020 at 8:34 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > > >
>>> > >> > > > > > One thing we could do here is use Java collections internally and
>>> > >> > > > > > make the Scala API a thin wrapper around Java -- like how Python
>>> > >> > > > > > works. Then adding a method to the Scala API would require adding
>>> > >> > > > > > it to the Java API and we would keep the two more in sync.
>>> > >> > > > >
>>> > >> > > > > I think it could be an appropriate idea if we had to deal with this
>>> > >> > > > > case a lot, but I don't think there are many user-facing APIs that
>>> > >> > > > > return Java collections; it's rather rare. Also, there are
>>> > >> > > > > relatively fewer Java users than Scala users.
>>> > >> > > > > This case is also slightly different from Python, in that there are
>>> > >> > > > > many more differences to deal with in the PySpark case.
>>> > >> > > > >
>>> > >> > > > > Also, in the case of `Seq`, we can simply use `Array` instead on both
>>> > >> > > > > the Scala and Java sides. I don't find such cases notably awkward.
>>> > >> > > > > The problematic cases might be specific to a few Java collections or
>>> > >> > > > > instances, and I would like to avoid overkill here.
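
The `Array` alternative can be illustrated with a small Scala sketch. `ExecutorInfoSketch` and `hostNames` are hypothetical names; the point is that a Scala `Array[String]` compiles to a JVM `String[]`, so Java callers receive a plain array while Scala callers still get collection methods through `ArrayOps`.

```scala
// Minimal sketch; ExecutorInfoSketch is a hypothetical class, not a Spark API.
class ExecutorInfoSketch(hosts: Seq[String]) {
  // Array[String] is a JVM String[] at runtime, so Java callers see a plain
  // array. A fresh copy is returned so callers cannot mutate internal state.
  def hostNames: Array[String] = hosts.toArray
}

val info = new ExecutorInfoSketch(Seq("host-a", "host-b"))
val names: Array[String] = info.hostNames

// Scala callers still get collection operations on Array via ArrayOps.
val upper: Array[String] = names.map(_.toUpperCase)
```

One cost of this choice is that arrays are mutable, which is why the sketch copies on every call.
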
>>> > >> > > > >
>>> > >> > > > > Of course, if there is a case where other options should be
>>> > >> > > > > considered, let's consider them. I don't want to say this is the
>>> > >> > > > > only allowed option.
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > >
>>> > >> > > > > On Tue, Apr 28, 2020 at 1:18 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>> > >> > > > >
>>> > >> > > > >> I think the right choice here depends on how the object is used.
>>> > >> > > > >> For developer and internal APIs, I think standardizing on Java
>>> > >> > > > >> collections makes the most sense.
>>> > >> > > > >>
>>> > >> > > > >> For user-facing APIs, it is awkward to return Java collections to
>>> > >> > > > >> Scala code -- I think that's the motivation for Tom's comment. For
>>> > >> > > > >> user APIs, I think most methods should return Scala collections,
>>> > >> > > > >> and I don't have a strong opinion about whether the conversion (or
>>> > >> > > > >> lack thereof) is done in a separate object (#1) or in parallel
>>> > >> > > > >> methods (#3).
>>> > >> > > > >>
>>> > >> > > > >> Both #1 and #3 seem like about the same amount of work and have the
>>> > >> > > > >> same likelihood that a developer will leave out a Java method
>>> > >> > > > >> version. One thing we could do here is use Java collections
>>> > >> > > > >> internally and make the Scala API a thin wrapper around Java --
>>> > >> > > > >> like how Python works. Then adding a method to the Scala API would
>>> > >> > > > >> require adding it to the Java API and we would keep the two more in
>>> > >> > > > >> sync. It would also help avoid Scala collections leaking into
>>> > >> > > > >> internals.
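
The "Java collections internally, thin Scala wrapper" idea might look roughly like the following minimal Scala sketch. It assumes Scala 2.13's `scala.jdk.CollectionConverters`; `MetricsSketch`, `record`, `metricsJava`, and `metrics` are hypothetical names chosen for illustration.

```scala
// Minimal sketch of "Java collections internally, thin Scala wrapper";
// all class and method names here are hypothetical, not Spark APIs.
import java.util.{Collections, LinkedHashMap => JLinkedHashMap, Map => JMap}
import scala.jdk.CollectionConverters._

class MetricsSketch {
  // Internal state is held in a Java collection.
  private val values: JMap[String, java.lang.Long] =
    new JLinkedHashMap[String, java.lang.Long]()

  def record(name: String, value: Long): Unit =
    values.put(name, java.lang.Long.valueOf(value))

  // Java-facing method: an unmodifiable view over the internal map.
  def metricsJava: JMap[String, java.lang.Long] =
    Collections.unmodifiableMap(values)

  // Scala-facing method is derived from the Java one, so a Scala method
  // cannot be added without its Java counterpart. Note: this copies.
  def metrics: Map[String, Long] =
    metricsJava.asScala.iterator.map { case (k, v) => k -> v.longValue }.toMap
}

val metricsSketch = new MetricsSketch
metricsSketch.record("a", 1L)
metricsSketch.record("b", 2L)
```

The syncing property comes from deriving the Scala method from the Java one; the price is a conversion (here a copy) on every Scala call.
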
>>> > >> > > > >>
>>> > >> > > > >> On Mon, Apr 27, 2020 at 8:49 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > > > >>
>>> > >> > > > >>> Then let's go with whichever needs less maintenance effort, rather
>>> > >> > > > >>> than leaving it undecided and delaying while this inconsistency
>>> > >> > > > >>> remains.
>>> > >> > > > >>>
>>> > >> > > > >>> I don't think we can get very meaningful data about this soon,
>>> > >> > > > >>> given that we haven't heard many complaints about it so far.
>>> > >> > > > >>>
>>> > >> > > > >>> The point of this thread is to make a call rather than defer it to
>>> > >> > > > >>> the future.
>>> > >> > > > >>>
>>> > >> > > > >>> On Mon, 27 Apr 2020, 23:15 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>> > >> > > > >>>
>>> > >> > > > >>>> IIUC, we are moving away from having 2 classes for Java and Scala,
>>> > >> > > > >>>> like JavaRDD and RDD. A single class is much simpler to maintain
>>> > >> > > > >>>> and use.
>>> > >> > > > >>>>
>>> > >> > > > >>>> I don't have a strong preference between option 3 and 4. We may
>>> > >> > > > >>>> need to collect more data points from actual users.
>>> > >> > > > >>>>
>>> > >> > > > >>>> On Mon, Apr 27, 2020 at 9:50 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > > > >>>>
>>> > >> > > > >>>>> Scala users are arguably more prevalent than Java users, yes.
>>> > >> > > > >>>>> Using Java instances on the Scala side is legitimate, and they are
>>> > >> > > > >>>>> already used in multiple places. I don't believe Scala users find
>>> > >> > > > >>>>> this unfriendly to Scala, as it's legitimate and already in use.
>>> > >> > > > >>>>> I personally find it more troublesome to make Java users search
>>> > >> > > > >>>>> for which APIs to call. Yes, I understand the pros and cons; we
>>> > >> > > > >>>>> should also find a balance considering actual usage.
>>> > >> > > > >>>>>
>>> > >> > > > >>>>> One more argument from me, though: to my knowledge, one of the
>>> > >> > > > >>>>> goals of Spark's APIs is a unified API set,
>>> > >> > > > >>>>>  e.g., JavaRDD <> RDD vs DataFrame.
>>> > >> > > > >>>>> If neither way is particularly preferred over the other, I would
>>> > >> > > > >>>>> just choose the one that keeps the API set unified.
>>> > >> > > > >>>>>
>>> > >> > > > >>>>>
>>> > >> > > > >>>>>
>>> > >> > > > >>>>> On Mon, Apr 27, 2020 at 10:37 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>> > >> > > > >>>>>
>>> > >> > > > >>>>>> I agree that general guidance is good so we stay consistent in
>>> > >> > > > >>>>>> the APIs. I don't necessarily agree that 4 is the best solution,
>>> > >> > > > >>>>>> though. I agree it's nice to have one API, but it is less
>>> > >> > > > >>>>>> friendly for the Scala side. Searching for the equivalent Java
>>> > >> > > > >>>>>> API shouldn't be hard, as its name should be very close, and if
>>> > >> > > > >>>>>> we make it a general rule, users should understand it. I guess
>>> > >> > > > >>>>>> one good question is which API most of our users use, Java or
>>> > >> > > > >>>>>> Scala, and what the ratio is. I don't know the answer to that.
>>> > >> > > > >>>>>> I've seen more people using Scala than Java. If the majority use
>>> > >> > > > >>>>>> Scala, then I think the API should be friendlier to that.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Tom
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> On Monday, April 27, 2020, 04:04:28 AM CDT, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Hi all,
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> I would like to discuss Java-specific APIs and which design we
>>> > >> > > > >>>>>> will choose.
>>> > >> > > > >>>>>> This has been discussed in multiple places so far, for example, at
>>> > >> > > > >>>>>> https://github.com/apache/spark/pull/28085#discussion_r407334754
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *The problem:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> In short, I would like us to have clear guidance on how we
>>> > >> > > > >>>>>> support Java-specific APIs when they need to return a Java
>>> > >> > > > >>>>>> instance. The problem is simple:
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> def requests: Map[String, ExecutorResourceRequest] = ...
>>> > >> > > > >>>>>> def requestsJMap: java.util.Map[String, ExecutorResourceRequest] = ...
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> vs
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> def requests: java.util.Map[String, ExecutorResourceRequest] = ...
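
To make the two designs concrete, here is a runnable Scala sketch of the first (a Scala-typed method plus a differently named Java variant) and the second (one Java-typed method). `ExecutorResourceRequestStub` is a hypothetical stand-in for Spark's `ExecutorResourceRequest`, the two profile classes are illustrative only, and Scala 2.13's `scala.jdk.CollectionConverters` is assumed.

```scala
// Minimal sketch; ExecutorResourceRequestStub and the profile classes are
// hypothetical, illustrating only the shape of the two designs.
import scala.jdk.CollectionConverters._

final case class ExecutorResourceRequestStub(resourceName: String, amount: Long)

// First design: a Scala-typed method plus a differently named Java variant.
class ProfileTwoMethods(reqs: Map[String, ExecutorResourceRequestStub]) {
  def requests: Map[String, ExecutorResourceRequestStub] = reqs
  def requestsJMap: java.util.Map[String, ExecutorResourceRequestStub] =
    reqs.asJava
}

// Second design: a single method returning a Java type for both languages.
class ProfileOneMethod(reqs: Map[String, ExecutorResourceRequestStub]) {
  def requests: java.util.Map[String, ExecutorResourceRequestStub] =
    reqs.asJava
}

val reqs = Map("gpu" -> ExecutorResourceRequestStub("gpu", 1L))

// Scala code uses `requests` directly; Java code must know `requestsJMap`.
val p3 = new ProfileTwoMethods(reqs)
val gpu3 = p3.requests("gpu").amount

// Scala code calls `asScala` on the single Java-typed result.
val p4 = new ProfileOneMethod(reqs)
val gpu4 = p4.requests.asScala("gpu").amount
```

In the first design the burden falls on Java callers (finding `requestsJMap`); in the second it falls on Scala callers (calling `asScala`).
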
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *Current codebase:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> My understanding so far was that the latter is preferred, more
>>> > >> > > > >>>>>> consistent, and prevalent in the existing codebase; for example,
>>> > >> > > > >>>>>> see StateOperatorProgress and StreamingQueryProgress in Structured
>>> > >> > > > >>>>>> Streaming.
>>> > >> > > > >>>>>> However, I realised that the current codebase also has other
>>> > >> > > > >>>>>> approaches. There appear to be four approaches to dealing with
>>> > >> > > > >>>>>> Java specifics in general:
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>    1. Java-specific classes such as JavaRDD and JavaSparkContext.
>>> > >> > > > >>>>>>    2. Java-specific methods with the same name that overload the
>>> > >> > > > >>>>>>       parameters; see functions.scala.
>>> > >> > > > >>>>>>    3. Java-specific methods with a different name that return a
>>> > >> > > > >>>>>>       different type, such as TaskContext.resourcesJMap vs
>>> > >> > > > >>>>>>       TaskContext.resources.
>>> > >> > > > >>>>>>    4. One method that returns a Java instance for both the Scala
>>> > >> > > > >>>>>>       and Java sides; see StateOperatorProgress and
>>> > >> > > > >>>>>>       StreamingQueryProgress.
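
Approaches 1. and 2. can be sketched in Scala as follows. The classes are hypothetical and only mimic the shape of `JavaRDD` (a separate Java-specific wrapper class) and of the same-name overloads mentioned for functions.scala; Scala 2.13's `scala.jdk.CollectionConverters` is assumed.

```scala
// Minimal sketch; RecordsSketch and JavaRecordsSketch are hypothetical classes
// that only mimic the shape of approaches 1 and 2, not real Spark APIs.
import scala.jdk.CollectionConverters._

class RecordsSketch(val items: Seq[String]) {
  // Approach 2: the same method name, overloaded on a Java parameter type.
  def withTags(tags: Seq[String]): RecordsSketch =
    new RecordsSketch(items ++ tags)

  def withTags(tags: java.util.List[String]): RecordsSketch =
    withTags(tags.asScala.toSeq)
}

// Approach 1: a separate Java-specific class wrapping the Scala one,
// in the spirit of JavaRDD wrapping RDD.
class JavaRecordsSketch(val underlying: RecordsSketch) {
  def items: java.util.List[String] = underlying.items.asJava

  def withTags(tags: java.util.List[String]): JavaRecordsSketch =
    new JavaRecordsSketch(underlying.withTags(tags))
}

val records = new RecordsSketch(Seq("a"))
val fromScala = records.withTags(Seq("b"))
val fromJava = records.withTags(java.util.Arrays.asList("c"))
val wrapped =
  new JavaRecordsSketch(records).withTags(java.util.Arrays.asList("d"))
```

The overload keeps one entry point per operation, while the wrapper class has to mirror every method, which is the maintenance cost discussed in this thread.
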
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *Analysis of the current codebase:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> I agree with approach 2., because those cases give you consistent
>>> > >> > > > >>>>>> API usage across the other language APIs in general. Approach 1.
>>> > >> > > > >>>>>> is from the old world, when we didn't have unified APIs.
>>> > >> > > > >>>>>> This might be the worst approach.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> 3. and 4. are controversial.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> For 3., if you have to use the Java APIs, you have to check every
>>> > >> > > > >>>>>> time whether there is a Java-specific variant of that API. But
>>> > >> > > > >>>>>> yes, it gives you Java- and Scala-friendly instances.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> For 4., having one API that returns a Java instance lets you use
>>> > >> > > > >>>>>> it on both the Scala and Java sides, although you have to call
>>> > >> > > > >>>>>> asScala on the Scala side. But you don't have to check whether
>>> > >> > > > >>>>>> there's a variant of the API, and it gives you consistent API
>>> > >> > > > >>>>>> usage across languages.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Also, note that using Java from Scala is legitimate but the
>>> > >> > > > >>>>>> opposite is not, to the best of my knowledge.
>>> > >> > > > >>>>>> In addition, you need a method that returns a Java instance to
>>> > >> > > > >>>>>> support PySpark or SparkR.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> *Proposal:*
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> I would like to have general guidance on this that Spark devs
>>> > >> > > > >>>>>> agree upon: use approach 4. If that is not possible, use 3.
>>> > >> > > > >>>>>> Avoid 1. at almost any cost.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Note that this isn't a hard requirement but *general guidance*;
>>> > >> > > > >>>>>> therefore, the decision may depend on the specific context. For
>>> > >> > > > >>>>>> example, when there are strong arguments for a separate
>>> > >> > > > >>>>>> Java-specific API, that's fine.
>>> > >> > > > >>>>>> Of course, we won't change existing methods, given Michael's
>>> > >> > > > >>>>>> rubric added before. I am talking about new methods in unreleased
>>> > >> > > > >>>>>> branches.
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>> Any concern or opinion on this?
>>> > >> > > > >>>>>>
>>> > >> > > > >>>>>
>>> > >> > > > >>
>>> > >> > > > >> --
>>> > >> > > > >> Ryan Blue
>>> > >> > > > >> Software Engineer
>>> > >> > > > >> Netflix
>>> > >> > > > >>
>>> > >> > > > >
>>> > >> > >
>>> > >>
>>> > >>
>>>
>>
