Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-27 Thread Hyukjin Kwon
Maybe it's time to switch. Do you know if we can still link the JIRA against GitHub? The script used to change the status of the JIRA too, but it has been broken for a long time - I suspect this isn't a big deal. On Sat, Apr 25, 2020 at 10:31 AM, Nicholas Chammas wrote: > Have we asked Infra recently about

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
The problem is that calling Scala instances from the Java side is generally discouraged, to the best of my knowledge. A Java user likely won't know Scala's asJava, but a Scala user will likely know both asScala and asJava. On Tue, Apr 28, 2020 at 11:35 AM, ZHANG Wei wrote: > How about making a small change on

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread ZHANG Wei
How about making a small change to option 4: keep the Scala API returning a Scala type instance, while providing an `asJava` method to return a Java type instance. Scala 2.13 provides CollectionConverters [1][2][3], which can be supported natively once Spark upgrades its dependencies. For
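A minimal sketch of the round trip that Scala 2.13's CollectionConverters enables, which this variant of option 4 would lean on. The `partitions` method here is a hypothetical API invented for illustration, not Spark code:

```scala
// Sketch only: demonstrates scala.jdk.CollectionConverters (Scala 2.13+).
// `partitions` is a hypothetical API, not part of Spark.
import scala.jdk.CollectionConverters._

object ConvertersSketch {
  // Hypothetical Scala-typed API
  def partitions(): Seq[Int] = Seq(0, 1, 2)

  def main(args: Array[String]): Unit = {
    val scalaSeq = partitions()
    // asJava returns a wrapper view over the Scala collection, not a copy
    val javaList: java.util.List[Int] = scalaSeq.asJava
    assert(javaList.size() == 3)
    // asScala round-trips back to a Scala collection with equal elements
    assert(javaList.asScala == scalaSeq)
  }
}
```

Because `asJava`/`asScala` produce wrappers rather than copies, the conversion cost at the API boundary is constant, which is part of the appeal of this option.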

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
I would like to make sure I am open to other options that can be considered situationally, based on the context. That's okay, and I don't aim to restrict that here. For example, I understand DSv2 is written in Java because Java interfaces arguably bring better performance. That's why

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
> One thing we could do here is use Java collections internally and make the Scala API a thin wrapper around Java -- like how Python works. > Then adding a method to the Scala API would require adding it to the Java API and we would keep the two more in sync. I think it can be an appropriate idea
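The "Java collections internally, thin Scala wrapper" idea quoted above can be sketched roughly as follows. All names here (`InternalApi`, `listFiles`) are hypothetical illustrations, not actual Spark code:

```scala
// Hypothetical sketch: keep Java types internally and convert at the
// Scala API boundary. Not actual Spark code.
import java.util.{ArrayList, List => JList}
import scala.jdk.CollectionConverters._

object InternalApi {
  // Internal / Java-facing method keeps the Java type
  def listFilesJava(): JList[String] = {
    val files = new ArrayList[String]()
    files.add("part-00000")
    files.add("part-00001")
    files
  }

  // The Scala-facing API is a thin wrapper converting at the boundary,
  // so the two APIs stay in sync by construction
  def listFiles(): Seq[String] = listFilesJava().asScala.toSeq
}
```

Under this arrangement a new method only needs to be written once on the Java side; the Scala variant is a one-line delegation.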

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Ryan Blue
I think the right choice here depends on how the object is used. For developer and internal APIs, I think standardizing on Java collections makes the most sense. For user-facing APIs, it is awkward to return Java collections to Scala code -- I think that's the motivation for Tom's comment. For

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Let's go with the option that takes less maintenance effort, then, rather than leaving it undecided and prolonging this inconsistency. I don't think we will have very meaningful data about this soon, given that we haven't heard many complaints about it in general so far. The point of this thread is to

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Wenchen Fan
IIUC, we are moving away from having two classes for Java and Scala, like JavaRDD and RDD. It's much simpler to maintain and use a single class. I don't have a strong preference between options 3 and 4. We may need to collect more data points from actual users. On Mon, Apr 27, 2020 at 9:50 PM

Re: contributor permission

2020-04-27 Thread Sean Owen
You don't need any status or permission to contribute. We only have that role in order to assign completed JIRAs. Don't worry about it. Do, however, read https://spark.apache.org/contributing.html first. On Mon, Apr 27, 2020 at 9:03 AM qq邮箱 <1044913...@qq.com> wrote: > > > Hi Guys, > > I want to

contributor permission

2020-04-27 Thread qq邮箱
Hi Guys, I want to contribute to Apache Spark. Would you please give me the permission as a contributor? My JIRA ID is Fay.

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Scala users arguably outnumber Java users, yes. Using Java instances on the Scala side is legitimate, and they are already used in multiple places. I don't believe Scala users find this un-Scala-friendly, as it's legitimate and already in use. I personally find it's

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Tom Graves
I agree general guidance is good so we keep the APIs consistent. I don't necessarily agree that 4 is the best solution, though. I agree it's nice to have one API, but it is less friendly for the Scala side. Searching for the equivalent Java API shouldn't be hard, as it should be very close

unsubscribe

2020-04-27 Thread Hongbin Liu
unsubscribe

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Sean Owen
The guidance sounds fine, if the general message is 'keep it simple'. The right approach might be pretty situational. For example, RDD has a lot of methods that need a Java variant. Putting all the overloads in one class might be harder to figure out than making a separate return type with those
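The "separate return type" pattern Sean refers to is the JavaRDD/RDD split the thread keeps coming back to. A much-simplified sketch, using invented classes (`MiniRDD`, `JavaMiniRDD`) rather than Spark's real ones:

```scala
// Simplified illustration of the JavaRDD-style split; MiniRDD and
// JavaMiniRDD are invented names, not Spark classes.
import java.util.{List => JList}
import scala.jdk.CollectionConverters._

class MiniRDD[T](private val data: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(data.map(f))
  def collect(): Seq[T] = data
}

// The Java-facing variant wraps the Scala one and exposes only Java
// types, at the cost of maintaining a second, parallel class.
class JavaMiniRDD[T](val rdd: MiniRDD[T]) {
  def collect(): JList[T] = rdd.collect().asJava
}
```

Each language gets an idiomatic surface, but every method added to the Scala class must be mirrored in the Java wrapper, which is the maintenance burden the single-class options try to avoid.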

[DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Hi all, I would like to discuss Java specific APIs and which design we will choose. This has been discussed in multiple places so far, for example, at https://github.com/apache/spark/pull/28085#discussion_r407334754 *The problem:* In short, I would like us to have clear guidance on how we

Re: ShuffleMapStage and pendingPartitions vs isAvailable or findMissingPartitions?

2020-04-27 Thread ZHANG Wei
AFAICT, `pendingPartitions` is not a must-have; `mapOutputTrackerMaster` was added by a later change, and `pendingPartitions` can be cleaned up. -- Cheers, -z On Sun, 26 Apr 2020 11:53:09 +0200, Jacek Laskowski wrote: > Hi, > > I found that ShuffleMapStage has this (apparently superfluous) >
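The redundancy being discussed can be illustrated with a much-simplified model, not Spark's actual ShuffleMapStage: once completed map outputs are recorded (the role MapOutputTrackerMaster plays in Spark), the missing partitions can be derived, making a separate pending set redundant bookkeeping.

```scala
// Much-simplified illustration, not Spark code: an explicit
// pendingPartitions set vs. deriving missing partitions from
// recorded outputs.
import scala.collection.mutable

class StageSketch(numPartitions: Int) {
  private val outputs = mutable.Map.empty[Int, String] // partition -> status

  // The explicit bookkeeping the thread suggests could be cleaned up
  val pendingPartitions: mutable.Set[Int] =
    mutable.HashSet(0 until numPartitions: _*)

  def recordOutput(partition: Int, status: String): Unit = {
    outputs(partition) = status
    pendingPartitions -= partition
  }

  // The same information derived from recorded outputs alone
  def findMissingPartitions(): Seq[Int] =
    (0 until numPartitions).filterNot(outputs.contains)

  def isAvailable: Boolean = findMissingPartitions().isEmpty
}
```

In this toy model `pendingPartitions` and `findMissingPartitions()` always agree, which is the sense in which the explicit set is superfluous.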