Re: Spark JIRA tags clarification and management

2018-09-06 Thread Reynold Xin
Yup I sometimes use it. I think quite a few others do. It might've been called out in the contributor guide too. On Thu, Sep 6, 2018 at 8:54 PM Sean Owen wrote: > I believe 'starter' is still the standard tag for simple issues for > newcomers. > > On Thu, Sep 6, 2018 at 8:46 PM Hyukjin Kwon

Re: Spark JIRA tags clarification and management

2018-09-06 Thread Sean Owen
I believe 'starter' is still the standard tag for simple issues for newcomers. On Thu, Sep 6, 2018 at 8:46 PM Hyukjin Kwon wrote: > Does anyone know if we still use starter or newbie tags as well? >

Re: Spark JIRA tags clarification and management

2018-09-06 Thread Hyukjin Kwon
Does anyone know if we still use starter or newbie tags as well? On Tue, Sep 4, 2018 at 10:00 PM, Kazuaki Ishizaki wrote: > Of course, we would like to eliminate all of the following tags > > "flanky" or "flankytest" > > Kazuaki Ishizaki > > > > From:Hyukjin Kwon > To:dev > Cc:

RE: [VOTE] SPARK 2.3.2 (RC5)

2018-09-06 Thread Sharanabasappa G Keriwaddi
Hi – Are there any blocking issues open for 2.3.2? 2.3.1 had a few critical issues; I feel it would be better to publish 2.3.2 with all those critical bug fixes. Thanks and Regards Sharan From: Saisai Shao [mailto:sai.sai.s...@gmail.com] Sent: 07 September 2018 08:30 To: 441586683

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-09-06 Thread Saisai Shao
Hi, PMC members asked me to hold on a bit while they're dealing with some other things. Please wait a bit. Thanks Saisai zzc <441586...@qq.com> wrote on Thu, Sep 6, 2018 at 4:27 PM: > Hi Saisai: > Spark 2.4 was cut; is there any new progress on 2.3.2? > > > > -- > Sent from:

Re: time for Apache Spark 3.0?

2018-09-06 Thread Matei Zaharia
Yes, you can start with Unstable and move to Evolving and Stable when needed. We’ve definitely had experimental features that changed across maintenance releases when they were well-isolated. If your change risks breaking stuff in stable components of Spark though, then it probably won’t be

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
I meant flexibility beyond the point releases. I think what Reynold was suggesting was getting v2 code out more often than the point releases every 6 months. An Evolving API can change in point releases, but maybe we should move v2 to Unstable so it can change more often? I don't really see

Re: time for Apache Spark 3.0?

2018-09-06 Thread Mark Hamstra
Yes, that is why we have these annotations in the code and the corresponding labels appearing in the API documentation: https://github.com/apache/spark/blob/master/common/tags/src/main/java/org/apache/spark/annotation/InterfaceStability.java As long as it is properly annotated, we can change or
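As a rough illustration of how those annotations gate API stability, here is a self-contained sketch. The three annotation names mirror the linked InterfaceStability.java; the @Retention lines and the ExperimentalScan interface are additions for the demo, not part of Spark's file.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified mirror of Spark's InterfaceStability annotations (see link above).
class InterfaceStability {
    // Stable: breaking changes only at major releases (e.g. 2.x -> 3.0).
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Stable {}

    // Evolving: may change at minor releases (e.g. 2.3 -> 2.4).
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Evolving {}

    // Unstable: may change at any release, including maintenance releases.
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Unstable {}
}

// Hypothetical API surface marked Unstable, so it may change between point releases.
@InterfaceStability.Unstable
interface ExperimentalScan {
    long estimateRowCount();
}
```

Because the demo retains the annotation at runtime, tooling can surface the stability label, e.g. `ExperimentalScan.class.isAnnotationPresent(InterfaceStability.Unstable.class)` returns true.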

Re: Branch 2.4 is cut

2018-09-06 Thread Dongjoon Hyun
Great news on the branch cut and the Scala 2.12 build. We also need to add `branch-2.4` to our Jenkins dashboard to prevent any regression. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ Bests, Dongjoon. On Thu, Sep 6, 2018 at 6:56 AM Wenchen Fan wrote: > Good news! I'll

Re: data source api v2 refactoring

2018-09-06 Thread Ryan Blue
Wenchen, I'm not really sure what you're proposing here. What is a `LogicalWrite`? Is it something that mirrors the read side in your PR? I think that I agree that if we have a Write independent of the Table that carries the commit and abort methods, then we can create it directly without a

Re: time for Apache Spark 3.0?

2018-09-06 Thread sadhen
I’d like to see an independent Spark Catalyst, without Spark Core and Hadoop dependencies, in Spark 3.0. I created Enzyme (a Spark SQL-compatible SQL engine that depends on Spark Catalyst) at Wacai for performance reasons in a non-distributed scenario. Enzyme is a simplified version of Spark

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
It would be great to get more features out incrementally. For experimental features, do we have more relaxed constraints? On Thu, Sep 6, 2018 at 9:47 AM Reynold Xin wrote: > +1 on 3.0 > > Dsv2 stable can still evolve across major releases. DataFrame, Dataset, > dsv1 and a lot of other major

Re: time for Apache Spark 3.0?

2018-09-06 Thread Reynold Xin
I definitely agree we shouldn't make dsv2 stable in the next release. On Thu, Sep 6, 2018 at 9:48 AM Ryan Blue wrote: > I definitely support moving to 3.0 to remove deprecations and update > dependencies. > > For the v2 work, we know that there will be a major API changes and > standardization

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
I definitely support moving to 3.0 to remove deprecations and update dependencies. For the v2 work, we know that there will be major API changes and standardization of behavior from the new logical plans going into the next release. I think it is a safe bet that this isn’t going to be

Re: time for Apache Spark 3.0?

2018-09-06 Thread Reynold Xin
+1 on 3.0 Dsv2 stable can still evolve across major releases. DataFrame, Dataset, dsv1 and a lot of other major features were all developed throughout the 1.x and 2.x lines. I do want to explore ways for us to get dsv2 incremental changes out there more frequently, to get feedback. Maybe that

Re: time for Apache Spark 3.0?

2018-09-06 Thread Sean Owen
I think this doesn't necessarily mean 3.0 is coming soon (thoughts on timing? 6 months?), but simply next. Do you mean you'd prefer that change to happen before 3.x? If it's a significant change, it seems reasonable for a major version bump rather than a minor one. Is the concern that tying it to 3.0 means

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
My concern is that the v2 data source API is still evolving and not very close to stable. I had hoped to have stabilized the API and behaviors for a 3.0 release. But we could also wait on that for a 4.0 release, depending on when we think that will be. Unless there is a pressing need to move to

Re: time for Apache Spark 3.0?

2018-09-06 Thread Xiao Li
Yesterday, the 2.4 branch was created. Based on the above discussion, I think we can bump the master branch to 3.0.0-SNAPSHOT. Any concerns? Thanks, Xiao vaquar khan wrote on Sat, Jun 16, 2018 at 10:21 AM: > +1 for 2.4 next, followed by 3.0. > > Where can we get the Apache Spark road map for 2.4 and 2.5

Re: python test infrastructure

2018-09-06 Thread Imran Rashid
> On Wed, Sep 5, 2018 at 11:59 PM Hyukjin Kwon wrote: > > > > > 1. all of the output in target/test-reports & python/unit-tests.log should be included in the jenkins archived artifacts. > > > > Hmm, I thought they were already archived (

Re: Branch 2.4 is cut

2018-09-06 Thread Wenchen Fan
Good news! I'll try and update you later. Thanks! On Thu, Sep 6, 2018 at 9:44 PM Sean Owen wrote: > BTW it does appear the Scala 2.12 build works now: > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/229/ > > Let's try

Re: Datasource v2 Select Into support

2018-09-06 Thread Wenchen Fan
Data source v2 catalog support (table/view) is still in progress. There are several threads on the dev list discussing it; please join the discussion if you are interested. Thanks for trying! On Thu, Sep 6, 2018 at 7:23 PM Ross Lawley wrote: > Hi, > > I hope this is the correct mailing list. I've

Re: Branch 2.4 is cut

2018-09-06 Thread Sean Owen
BTW it does appear the Scala 2.12 build works now: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/229/ Let's try also producing a 2.12 build with this release. The machinery should be there in the release scripts, but let me

Pool Information Details cannot be accessed from HistoryServer UI

2018-09-06 Thread Sandeep Katta
But from the HistoryServer for the same application it throws an exception: "Unknown pool". The code throwing the exception: // For now, pool information is only accessible in live UIs val pool = parent.sc.flatMap(_.getPoolForName(poolName)).getOrElse { throw new

Datasource v2 Select Into support

2018-09-06 Thread Ross Lawley
Hi, I hope this is the correct mailing list. I've been adding v2 support to the MongoDB Spark connector using Spark 2.3.1. I've noticed one of my tests passes when using the original DefaultSource but errors with my v2 implementation. The code I'm running is: val df = spark.loadDS[Character]()

How to parallelize JDBC Read in Spark

2018-09-06 Thread Chetan Khatri
Hello Dev Users, I am struggling to parallelize JDBC reads in Spark; it is using only 1-2 tasks to read the data and taking a long time. Ex. val invoiceLineItemDF = ((spark.read.jdbc(url = t360jdbcURL, table = invoiceLineItemQuery, columnName = "INVOICE_LINE_ITEM_ID", lowerBound =
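For context on the partitioned read Chetan is attempting: when all four partitioning parameters (columnName, lowerBound, upperBound, numPartitions) are supplied, Spark issues one range query per partition instead of a single query. Below is a simplified model of that stride computation (the real logic is in Spark's JDBCRelation.columnPartition, which additionally leaves the first and last partitions unbounded); the table name and bound values are illustrative, not taken from the truncated snippet above:

```scala
// Simplified model of how Spark splits [lower, upper) into numPartitions strides,
// one JDBC WHERE-clause range (and thus one read task) per partition.
def partitionRanges(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val start = lower + i * stride
    // The last partition absorbs any remainder so the whole range is covered.
    val end = if (i == numPartitions - 1) upper else start + stride
    (start, end)
  }
}

// The corresponding read call (identifiers here are hypothetical):
// val df = spark.read.jdbc(
//   url = jdbcUrl,
//   table = "invoice_line_item",
//   columnName = "INVOICE_LINE_ITEM_ID", // numeric, ideally indexed
//   lowerBound = 1L,
//   upperBound = 10000000L,
//   numPartitions = 8,                   // 8 concurrent JDBC queries
//   connectionProperties = props)
```

Note that the bounds do not filter rows; they only control how the range is sliced, so bounds far from the column's actual min/max merely skew partition sizes.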

Re: Branch 2.4 is cut

2018-09-06 Thread Hyukjin Kwon
Thanks, Wenchen. On Thu, Sep 6, 2018 at 3:32 PM, Wenchen Fan wrote: > Hi all, > > I've cut the branch-2.4 since all the major blockers are resolved. If no > objections I'll shortly follow up with an RC to get the QA started in > parallel. > > Committers, please only merge PRs to branch-2.4 that are bug

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-09-06 Thread zzc
Hi Saisai: Spark 2.4 was cut; is there any new progress on 2.3.2? -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Branch 2.4 is cut

2018-09-06 Thread Wenchen Fan
Hi all, I've cut the branch-2.4 since all the major blockers are resolved. If no objections I'll shortly follow up with an RC to get the QA started in parallel. Committers, please only merge PRs to branch-2.4 that are bug fixes, performance regression fixes, document changes, or test suites