Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Hyukjin Kwon
Had a short sync with Tom. I am going to postpone this for now since this case is very rare: I have seen it only twice in the last 5 years. We'll call a vote if we see it more often, and make a decision based on the feedback in the vote thread. On Mon, May 11, 2020 at 11:08 PM, Hyukjin

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Hyukjin Kwon
The guide is our official guide; see "Code Style Guide" in http://spark.apache.org/contributing.html. As I said, this is general guidance rather than a strict policy. I don't intend to change existing APIs either. I would rather not start the vote while there is a clear objection to address,

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Tom Graves
As I've already stated, and it looks like two others have issues with number 4 as written as well, I'm against you posting this as is. I do not think we should recommend 4 for public, user-facing Scala APIs. Also note that the page you linked is a Databricks page; while I know we reference it as a

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Hyukjin Kwon
I will wait a couple more days, and if I hear no objection, I will document this at https://github.com/databricks/scala-style-guide#java-interoperability. On Thu, May 7, 2020 at 9:18 PM, Hyukjin Kwon wrote: > Hi all, I would like to proceed with this. Are there more thoughts on this? If > not, I

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-07 Thread Hyukjin Kwon
Hi all, I would like to proceed with this. Are there more thoughts on this? If not, I would like to go ahead with the proposal here. On Thu, Apr 30, 2020 at 10:54 PM, Hyukjin Kwon wrote: > Nothing is urgent. I just don't want to leave it undecided and just keep > adding Java APIs inconsistently as it's

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-30 Thread Hyukjin Kwon
Nothing is urgent. I just don't want to leave it undecided and just keep adding Java APIs inconsistently as it's currently happening. We should have a set of coherent APIs. It's very difficult to change APIs once they are out in releases. I guess I have seen people here agree with having a

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-30 Thread ZHANG Wei
I feel a little pushed... :-) I still don't see why it's urgent to make the decision now. AFAIK, it's common practice for Java programmers to handle Scala type conversions themselves when they invoke Scala libraries. I'm not sure which one is the Java programmers' root complaint,

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Hyukjin Kwon
There was a typo in the previous email. I am re-sending: Hm, I thought you meant you prefer 3 over 4 but don't particularly mind. I don't mean to wait for more feedback. It looks likely to end in a deadlock, which would be the worst case. I was suggesting that we pick one way first and stick to it. If we

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Hyukjin Kwon
Hm, I thought you meant you prefer 3 over 4 but don't particularly mind. I don't mean to wait for more feedback. It looks likely to end in a deadlock, which would be the worst case. I was suggesting that we pick one way first and stick to it. If we find out something later, we can discuss more about

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Tom Graves
Sorry, I'm not sure what your last email means. Does it mean you are putting it up for a vote, or just waiting to get more feedback? I disagree with making option 4 the rule, but I agree that having a general rule makes sense. I think we need a lot more input to make the rule, as it affects the

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Hyukjin Kwon
I am not seeing explicit objections here; rather, I see people tending to agree with the proposal in general. I would like to move forward rather than leave it in a deadlock: the worst choice here is to postpone and abandon this discussion with this inconsistency in place. I don't currently target

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-28 Thread Hyukjin Kwon
Spark has aimed to have a unified API set rather than separate Java classes, to reduce the maintenance cost, e.g., JavaRDD and RDD versus the single DataFrame. These JavaXXX classes are mostly legacy. I think it's best to stick to approach 4 in general cases. Other options might have to be
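As used in this thread, approach 4 appears to mean a single class whose methods expose Java types to both Scala and Java callers. A minimal sketch of that shape, assuming Scala 2.13 and its `scala.jdk.CollectionConverters` (Spark builds of this period used Scala 2.12, where the equivalent import is `scala.collection.JavaConverters._`). `JobSummary`, `stageIds`, and `JobSummaryScalaCaller` are hypothetical names, not Spark APIs:

```scala
import java.util.{List => JList}
import scala.jdk.CollectionConverters._

// Hypothetical class illustrating "approach 4": one API for both languages,
// exposing only Java collection types so no Java-specific variant is needed.
class JobSummary(ids: Seq[Int]) {
  // Box each Int into java.lang.Integer and expose the result as a java.util.List.
  def stageIds: JList[Integer] = ids.map(i => Integer.valueOf(i)).asJava
}

// A Scala caller converts back explicitly with asScala.
object JobSummaryScalaCaller {
  def maxStageId(summary: JobSummary): Int =
    summary.stageIds.asScala.map(_.intValue).max
}
```

The Java caller gets a `java.util.List` directly, while the Scala caller pays one `asScala` call; that extra step is the Scala-side friction debated in this thread.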

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-28 Thread Reynold Xin
The con is much more than just the effort of maintaining a parallel API. It also puts the burden of maintaining a parallel API on all libraries and library developers. That's one of the primary reasons we moved away from this RDD vs JavaRDD approach in the old RDD API. On Tue, Apr 28, 2020 at
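To make that cost concrete, here is a hypothetical sketch of the parallel-class style (option 1, as with RDD and JavaRDD): a Scala-facing class plus a Java-facing wrapper that mirrors each method. `Table` and `JavaTable` are illustrative names, not Spark classes; the sketch assumes Scala 2.13:

```scala
import java.util.{List => JList}
import java.util.function.Predicate
import scala.jdk.CollectionConverters._

// Hypothetical Scala-facing class that uses Scala types throughout.
class Table(val rows: Seq[String]) {
  def filter(p: String => Boolean): Table = new Table(rows.filter(p))
  def collect(): Seq[String] = rows
}

// Hypothetical Java-facing wrapper in the spirit of JavaRDD vs RDD.
// Every method on Table must be mirrored here by hand, and any library
// that extends Table would need its own Java twin as well.
class JavaTable(val table: Table) {
  def filter(p: Predicate[String]): JavaTable =
    new JavaTable(table.filter(row => p.test(row)))
  def collect(): JList[String] = table.collect().asJava
}
```

Each new method on `Table` means a second method on `JavaTable`, which is the duplication the message above describes.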

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-28 Thread ZHANG Wei
Frankly, I also love pure Java types in the Java API and Scala types in the Scala API. :-) If we don't treat Java as a "FRIEND" of Scala, just as with Python, maybe we can adopt option 1, the specific Java classes. (But I don't like the `Java` prefix, which is redundant when I'm coding Java

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
The problem is that, to the best of my knowledge, using Scala instances on the Java side is generally discouraged. A Java user likely won't know asJava in Scala, but a Scala user will likely know both asScala and asJava. On Tue, Apr 28, 2020 at 11:35 AM, ZHANG Wei wrote: > How about making a small change on

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread ZHANG Wei
How about making a small change to option 4: keep the Scala API returning Scala type instances, and provide an `asJava` method that returns a Java type instance. Scala 2.13 provides CollectionConverters [1][2][3], so this can be supported naturally once Spark upgrades its dependencies. For
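A sketch of this variant, assuming Scala 2.13: the class keeps returning Scala types, and a Java caller converts explicitly through `scala.jdk.javaapi.CollectionConverters`, the Java-facing converter object in the 2.13 standard library. `ClusterInfo`, `executorIds`, and `JavaStyleCaller` are hypothetical; the helper is written in Scala here but mirrors the call a Java program would make:

```scala
import scala.jdk.javaapi.CollectionConverters

// Hypothetical class: the API keeps returning Scala types.
class ClusterInfo(ids: Seq[String]) {
  def executorIds: Seq[String] = ids
}

// Mirrors what a Java caller would write: an explicit conversion through
// scala.jdk.javaapi.CollectionConverters, the Java-facing converter in Scala 2.13.
object JavaStyleCaller {
  def executorIdsForJava(info: ClusterInfo): java.util.List[String] =
    CollectionConverters.asJava(info.executorIds)
}
```

This keeps a single Scala-typed API, but it assumes the Java user knows about the converter, which is the concern raised in the reply to this message.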

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
I would like to make clear that I am open to other options that can be considered situationally, based on the context. That's fine, and I don't intend to restrict this here. For example, I understand DSv2 is written in Java because Java interfaces arguably bring better performance. That's why

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
> One thing we could do here is use Java collections internally and make the Scala API a thin wrapper around Java -- like how Python works. > Then adding a method to the Scala API would require adding it to the Java API and we would keep the two more in sync. I think it can be an appropriate idea
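The thin-wrapper idea quoted above could look roughly like this, assuming Scala 2.13: the core class uses Java collections and is what Java callers use, and the Scala facade only delegates and converts. `CoreCatalog` and `ScalaCatalog` are hypothetical names chosen for illustration:

```scala
import java.util.{ArrayList => JArrayList, Collections, List => JList}
import scala.jdk.CollectionConverters._

// Hypothetical core implementation: Java collections internally,
// and the type Java callers would use directly.
class CoreCatalog {
  private val names = new JArrayList[String]()
  def addName(name: String): Unit = { names.add(name); () }
  // Read-only view so callers cannot mutate internal state.
  def listNames(): JList[String] = Collections.unmodifiableList(names)
}

// Hypothetical thin Scala facade: it only delegates and converts, so any
// new Scala method must first exist on the Java-typed core.
class ScalaCatalog(core: CoreCatalog) {
  def addName(name: String): Unit = core.addName(name)
  def listNames: Seq[String] = core.listNames().asScala.toSeq
}
```

Because the facade cannot offer anything the core lacks, the two surfaces stay in sync by construction, at the cost of a conversion on each Scala call.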

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Ryan Blue
I think the right choice here depends on how the object is used. For developer and internal APIs, I think standardizing on Java collections makes the most sense. For user-facing APIs, it is awkward to return Java collections to Scala code -- I think that's the motivation for Tom's comment. For

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Let's stick to the option with less maintenance effort, then, rather than leaving it undecided and delaying with this inconsistency in place. I don't think we can get very meaningful data about this soon, given that we haven't heard many complaints about this so far. The point of this thread is to

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Wenchen Fan
IIUC, we are moving away from having two classes for Java and Scala, like JavaRDD and RDD. It's much simpler to maintain and use a single class. I don't have a strong preference between option 3 and 4. We may need to collect more data points from actual users. On Mon, Apr 27, 2020 at 9:50 PM

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Scala users are arguably more prevalent than Java users, yes. Using Java instances on the Scala side is legitimate, and they are already used in multiple places. I don't believe Scala users find this unfriendly, since it's legitimate and already in use. I personally find it's

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Tom Graves
I agree general guidance is good so we keep the APIs consistent. I don't necessarily agree that 4 is the best solution, though. I agree it's nice to have one API, but it is less friendly for the Scala side. Searching for the equivalent Java API shouldn't be hard, as it should be very close

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Sean Owen
The guidance sounds fine, if the general message is 'keep it simple'. The right approach might be pretty situational. For example RDD has a lot of methods that need a Java variant. Putting all the overloads in one class might be harder to figure out than making a separate return type with those
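For comparison with separate classes, a hypothetical sketch of the single-class, overload-based style, assuming Scala 2.13. `JobConf`, `setTags`, and the `AsJava`-suffixed getter are illustrative names only:

```scala
import java.util.{List => JList}
import scala.jdk.CollectionConverters._

// Hypothetical single class carrying both language variants as overloads.
class JobConf {
  private var tags: Vector[String] = Vector.empty

  // Scala-facing overload.
  def setTags(newTags: Seq[String]): JobConf = { tags = newTags.toVector; this }

  // Java-facing overload; converts and delegates to the Scala one.
  def setTags(newTags: JList[String]): JobConf = setTags(newTags.asScala.toSeq)

  def getTags: Seq[String] = tags
  // Java-facing accessor with an illustrative AsJava suffix.
  def getTagsAsJava: JList[String] = tags.asJava
}
```

Overloading on collection types like this resolves cleanly. Overloading a Scala function type against a Java functional interface can be trickier, since under Scala 2.12+ SAM conversion a Scala lambda may match both overloads; with many such methods, a separate Java-facing type can be easier to navigate.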

[DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Hi all, I would like to discuss Java specific APIs and which design we will choose. This has been discussed in multiple places so far, for example, at https://github.com/apache/spark/pull/28085#discussion_r407334754 *The problem:* In short, I would like us to have clear guidance on how we