[VOTE][RESULT] Spark 2.3.3 (RC2)

2019-02-13 Thread Takeshi Yamamuro
Hi, All.

The vote passes. Thanks to all who helped with this release 2.3.3!
I'll follow up later with a release announcement once everything is
published.

+1 (* = binding):

Sean Owen*
Dongjoon Hyun
John Zhuge
Mark Hamstra*
Hyukjin Kwon
Felix Cheung*
Marcelo Vanzin*

+0: None

-1: None
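
For anyone who wants to double-check the tally, here is a minimal sketch. It assumes the rule from the vote call (a majority of +1 PMC votes, with a minimum of 3 +1 votes) means "at least three binding +1 votes and more binding +1 than binding -1". The `tally` helper and the `Tally` type are hypothetical names, not part of any Spark tooling.

```python
from typing import List, NamedTuple


class Tally(NamedTuple):
    binding_plus: int
    binding_minus: int
    passed: bool


def tally(plus_one: List[str], minus_one: List[str], minimum: int = 3) -> Tally:
    """Count binding votes (names marked with a trailing '*') and apply the rule."""
    binding_plus = sum(1 for name in plus_one if name.strip().endswith("*"))
    binding_minus = sum(1 for name in minus_one if name.strip().endswith("*"))
    # Assumed interpretation: at least `minimum` binding +1 votes,
    # and more binding +1 votes than binding -1 votes.
    passed = binding_plus >= minimum and binding_plus > binding_minus
    return Tally(binding_plus, binding_minus, passed)


# Voter list copied from this result mail; '*' marks a binding vote.
PLUS_ONE = [
    "Sean Owen*", "Dongjoon Hyun", "John Zhuge", "Mark Hamstra*",
    "Hyukjin Kwon", "Felix Cheung*", "Marcelo Vanzin*",
]

result = tally(PLUS_ONE, [])
print(result)  # Tally(binding_plus=4, binding_minus=0, passed=True)
```

With four binding +1 votes and no -1 votes, the vote passes under that reading.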

Thanks,
Takeshi


-- 
---
Takeshi Yamamuro


Re: I want to contribute to Apache Spark.

2019-02-13 Thread Marco Gaido
Hi,

You need no permissions to start contributing to Spark. Just start working
on the JIRAs you want and submit a PR for them. You will be added to the
contributors in JIRA once your PR gets merged and you are assigned the
related JIRA. For more information, please refer to the contributing page
on the website.

Thanks,
Looking forward to seeing your PRs.
Marco

On Thu, 14 Feb 2019, 06:32 wangfei wrote:
> Hi Guys,
>
> I want to contribute to Apache Spark.
> Would you please give me the permission as a contributor?
> My JIRA ID is feiwang.
> hzfeiwang
> hzfeiw...@163.com


Re: Time to cut an Apache 2.4.1 release?

2019-02-13 Thread Darcy Shen
We found that an ORC table created by Spark 2.4 cannot be read by Hive 2.1.1.

spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'

hive -e 'select * from tmp.orcTable2'

The error message from Hive:

Failed with exception java.io.IOException:java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 6

Spark 2.3.2 (or below) works fine.

I think we should git revert [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2 by Dongjoon Hyun.

On Tue, 12 Feb 2019 16:56:09 +0800 Dongjin Lee wrote:

> SPARK-23539 is a non-trivial improvement, so probably would not be
> back-ported to 2.4.x.

Got it. It seems reasonable.

Committers:

Please don't omit SPARK-23539 from 2.5.0. The Kafka community needs this feature.

Thanks,
Dongjin

-- 
Dongjin Lee

A hitchhiker in the mathematical world.

github: https://github.com/dongjinleekr
linkedin: https://kr.linkedin.com/in/dongjinleekr
speakerdeck: https://speakerdeck.com/dongjin

On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro wrote:

+1, too.
branch-2.4 has accumulated too many commits:
https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092

On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun wrote:

Thank you, DB.

+1, yes. It's time to prepare the 2.4.1 release.

Bests,
Dongjoon.

On 2019/02/12 03:16:05, Sean Owen wrote:
> I support a 2.4.1 release now, yes.
>
> SPARK-23539 is a non-trivial improvement, so probably would not be
> back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
> be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
> it, but it could go in if otherwise ready.
>
> On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee wrote:
> >
> > Hi DB,
> >
> > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a little bit
> > ago, but it was not included in 2.3.0, nor has it received enough review.
> >
> > Thanks,
> > Dongjin
> >
> > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
> > [^2]: https://github.com/apache/spark/pull/22282
> >
> > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim wrote:
> >>
> >> Given SPARK-26154 [1] is a correctness issue and PR [2] is submitted, I
> >> hope it can be reviewed and included within Spark 2.4.1 - otherwise it
> >> will be a long-lived correctness issue.
> >>
> >> Thanks,
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >> 1. https://issues.apache.org/jira/browse/SPARK-26154
> >> 2. https://github.com/apache/spark/pull/23634
> >>
> >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I am preparing to cut a new Apache Spark 2.4.1 release as there are many bugs
> >>> and correctness issues fixed in branch-2.4.
> >>>
> >>> The list of addressed issues is
> >>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
> >>>
> >>> Let me know if you have any concern or any PR you would like to get in.
> >>>
> >>> Thanks!
> >>>
> >>> -
> >>> To unsubscribe e-mail: mailto:dev-unsubscr...@spark.apache.org
> >
> > --
> > Dongjin Lee
> >
> > A hitchhiker in the mathematical world.
> >
> > github: http://github.com/dongjinleekr
> > linkedin: http://kr.linkedin.com/in/dongjinleekr
> > speakerdeck: http://speakerdeck.com/dongjin

-- 
---
Takeshi Yamamuro

I want to contribute to Apache Spark.

2019-02-13 Thread wangfei


Hi Guys,

I want to contribute to Apache Spark.
Would you please give me the permission as a contributor?
My JIRA ID is feiwang.
hzfeiwang
hzfeiw...@163.com



Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-13 Thread Takeshi Yamamuro
Hi, all

We already have enough binding +1 votes, so I'll close the vote (passed) in a
few hours.
If there are any problems, please let me know.

Thanks,
Takeshi

On Tue, Feb 12, 2019 at 4:59 AM Marcelo Vanzin  wrote:

> +1. Ran our regression tests for YARN and Hive, all look good.
>
> On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro 
> wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version 2.3.3.
> >
> > The vote is open until February 8 6:00PM (PST) and passes if a majority
> > +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.3.3
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.3.3-rc2 (commit 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> > https://github.com/apache/spark/tree/v2.3.3-rc2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1298/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
> >
> > The list of bug fixes going into 2.3.3 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12343759
> >
> > FAQ
> >
> > =================================
> > How can I help test this release?
> > =================================
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > =========================================================
> > What should happen to JIRA tickets still targeting 2.3.3?
> > =========================================================
> >
> > The current list of open tickets targeted at 2.3.3 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.3.3
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > =======================
> > But my bug isn't fixed?
> > =======================
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> > P.S.
> > I checked that all the tests passed on the Amazon Linux 2 AMI:
> > $ java -version
> > openjdk version "1.8.0_191"
> > OpenJDK Runtime Environment (build 1.8.0_191-b12)
> > OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> > $ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Psparkr test
> >
> > --
> > ---
> > Takeshi Yamamuro
>
>
>
> --
> Marcelo
>


-- 
---
Takeshi Yamamuro


Re: Apache Spark git repo moved to gitbox.apache.org

2019-02-13 Thread Sean Owen
Yes, we all need to be using one remote, and that should be GitHub.
The website is authoritative. This became clear after the initial
email. I apologize, as I don't think this was made clear enough to all
committers, and it is important. I see that there are some checks in
the sync to even deal with this case (and rebase? hm, what if they
conflict?) but certainly we should all push to one upstream repo.

Do we have a sense of who might have pushed to gitbox, to make sure
they get the message?
Right now they seem consistent again, so I can't ID the commit that
went just to gitbox. That will have to be replayed.

https://gitbox.apache.org/repos/asf?p=spark.git
https://github.com/apache/spark/commits/master
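
One way to check whether the two repositories agree is to compare the branch heads that `git ls-remote --heads <url>` reports for each. Below is a minimal sketch, assuming the command output has already been captured as text. The helper names are hypothetical and the sample SHAs are made up, not real Spark commits.

```python
from typing import Dict, Tuple


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map ref name -> commit SHA from `git ls-remote` style text."""
    refs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            sha, ref = parts
            refs[ref] = sha
    return refs


def diverging_refs(a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Return refs present in both outputs whose SHAs differ."""
    refs_a, refs_b = parse_ls_remote(a), parse_ls_remote(b)
    return {
        ref: (refs_a[ref], refs_b[ref])
        for ref in sorted(refs_a.keys() & refs_b.keys())
        if refs_a[ref] != refs_b[ref]
    }


# Made-up sample output standing in for the two remotes.
gitbox = "aaa111\trefs/heads/master\nbbb222\trefs/heads/branch-2.4\n"
github = "aaa111\trefs/heads/master\nccc333\trefs/heads/branch-2.4\n"

print(diverging_refs(gitbox, github))
# {'refs/heads/branch-2.4': ('bbb222', 'ccc333')}
```

An empty result only says the heads match at that moment; it does not identify a commit that was pushed to one side and later reconciled.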


On Wed, Feb 13, 2019 at 12:34 AM Xiao Li  wrote:
>
> The above instruction differs from what the website documents:
> https://github.com/apache/spark-website/commit/92606b2e7849b9d743ef2a8176438142420a83e5#diff-17faa4bab13b7530a3e1b627bb798ad0
>
> Some committers are using gitbox, while others are following the website
> instructions and using GitHub.
>
> Due to the mismatch, gitbox and GitHub have become inconsistent. I opened an
> infra ticket: https://issues.apache.org/jira/browse/INFRA-17842
> Hopefully, it can be fixed soon. All committers should follow the same
> procedure; otherwise, the commit history could easily break.
>
> Xiao
>
>
>
>
> On Mon, Dec 10, 2018 at 8:30 AM, Sean Owen wrote:
>>
>> Per the thread last week, the Apache Spark repos have migrated from
>> https://git-wip-us.apache.org/repos/asf to
>> https://gitbox.apache.org/repos/asf
>>
>>
>> Non-committers:
>>
>> This just means repointing any references to the old repository to the
>> new one. It won't affect you if you were already referencing
>> https://github.com/apache/spark .
>>
>>
>> Committers:
>>
>> Follow the steps at https://reference.apache.org/committer/github to
>> fully sync your ASF and Github accounts, and then wait up to an hour
>> for it to finish.
>>
>> Then repoint your git-wip-us remotes to gitbox in your git checkouts.
>> For our standard setup that works with the merge script, that should
>> be your 'apache' remote. For example here are my current remotes:
>>
>> $ git remote -v
>> apache https://gitbox.apache.org/repos/asf/spark.git (fetch)
>> apache https://gitbox.apache.org/repos/asf/spark.git (push)
>> apache-github git://github.com/apache/spark (fetch)
>> apache-github git://github.com/apache/spark (push)
>> origin https://github.com/srowen/spark (fetch)
>> origin https://github.com/srowen/spark (push)
>> upstream https://github.com/apache/spark (fetch)
>> upstream https://github.com/apache/spark (push)
>>
>> In theory we also have read/write access to github.com now too, but
>> it hasn't worked for me yet. It may need to sync. This note
>> just makes sure everyone knows how to keep pushing commits right now to
>> the new ASF repo.
>>
>> Report any problems here!
>>
>> Sean



Subscribe

2019-02-13 Thread Rafael Mendes