What does everyone think about getting some of the newer DataSourceV2 improvements in? It should be low risk because it is a new code path, and v2 isn't very usable without things like support for using the output commit coordinator to deconflict writes.
The ones I'd like to get in are:

* Use the output commit coordinator: https://issues.apache.org/jira/browse/SPARK-23323
* Use immutable trees and the same push-down logic as other read paths: https://issues.apache.org/jira/browse/SPARK-23203
* Don't allow users to supply schemas when they aren't supported: https://issues.apache.org/jira/browse/SPARK-23418

I think it would make the 2.3.0 release more usable for anyone interested in the v2 read and write paths.

Thanks!

On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu <weichen...@databricks.com> wrote:

> +1
>
> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> Done, thanks!
>>
>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <samee...@apache.org> wrote:
>> > Sure, please feel free to backport.
>> >
>> > On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>
>> >> Hey Sameer,
>> >>
>> >> Mind including https://github.com/apache/spark/pull/20643
>> >> (SPARK-23468) in the new RC? It's a minor bug since I've only hit it
>> >> with older shuffle services, but it's pretty safe.
>> >>
>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <samee...@apache.org> wrote:
>> >> > This RC has failed due to
>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
>> >> > with an RC5 soon.
>> >> >
>> >> > On 20 February 2018 at 16:49, Ryan Blue <rb...@netflix.com> wrote:
>> >> >>
>> >> >> +1
>> >> >>
>> >> >> Build & tests look fine, checked signature and checksums for src
>> >> >> tarball.
>> >> >>
>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>> >> >>>
>> >> >>> I'm -1 because of the UI regression
>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page may be
>> >> >>> too slow and cause "read timeout" when there are lots of jobs and
>> >> >>> stages.
>> >> >>> This is one of the most important pages because when it's broken, it's
>> >> >>> pretty hard to use Spark Web UI.
>> >> >>>
>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com> wrote:
>> >> >>>>
>> >> >>>> +1
>> >> >>>>
>> >> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls...@gmail.com>:
>> >> >>>>>
>> >> >>>>> +1 too
>> >> >>>>>
>> >> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
>> >> >>>>>>
>> >> >>>>>> +1
>> >> >>>>>>
>> >> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>> >> >>>>>>>
>> >> >>>>>>> +1
>> >> >>>>>>>
>> >> >>>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>> +1
>> >> >>>>>>>>
>> >> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin <r...@databricks.com> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>> +1
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal <sameer.a...@gmail.com>, wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> this file shouldn't be included?
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>> >> >>>>>>>>>
>> >> >>>>>>>>> I've now deleted this file
>> >> >>>>>>>>>
>> >> >>>>>>>>>> From: Sameer Agarwal <sameer.a...@gmail.com>
>> >> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
>> >> >>>>>>>>>> To: Sameer Agarwal
>> >> >>>>>>>>>> Cc: dev
>> >> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> I'll start with a +1 once again.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> All blockers reported against RC3 have been resolved and the
>> >> >>>>>>>>>> builds are healthy.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal <samee...@apache.org> wrote:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>> >> >>>>>>>>>>> version 2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00
>> >> >>>>>>>>>>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> [ ] -1 Do not release this package because ...
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> To learn more about Apache Spark, please see https://spark.apache.org/
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> The tag to be voted on is v2.3.0-rc4:
>> >> >>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>> >> >>>>>>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> List of JIRA tickets resolved in this release can be found here:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> The release files, including signatures, digests, etc.
>> >> >>>>>>>>>>> can be found at:
>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Release artifacts are signed with the following key:
>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> The staging repository for this release can be found at:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> The documentation corresponding to this release can be found at:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> FAQ
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> =======================================
>> >> >>>>>>>>>>> What are the unresolved issues targeted for 2.3.0?
>> >> >>>>>>>>>>> =======================================
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Please see https://s.apache.org/oXKi. At the time of writing,
>> >> >>>>>>>>>>> there are currently no known release blockers.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> =========================
>> >> >>>>>>>>>>> How can I help test this release?
>> >> >>>>>>>>>>> =========================
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> If you are a Spark user, you can help us test this release by
>> >> >>>>>>>>>>> taking an existing Spark workload and running on this release candidate,
>> >> >>>>>>>>>>> then reporting any regressions.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> If you're working in PySpark you can set up a virtual env and
>> >> >>>>>>>>>>> install the current RC and see if anything important breaks,
>> >> >>>>>>>>>>> in the Java/Scala you can add the staging repository to your projects
>> >> >>>>>>>>>>> resolvers and test with the RC (make sure to clean up the artifact cache
>> >> >>>>>>>>>>> before/after so you don't end up building with a out of date RC going
>> >> >>>>>>>>>>> forward).
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> ===========================================
>> >> >>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
>> >> >>>>>>>>>>> ===========================================
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Committers should look at those and triage. Extremely important
>> >> >>>>>>>>>>> bug fixes, documentation, and API tweaks that impact compatibility should be
>> >> >>>>>>>>>>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>> >> >>>>>>>>>>> appropriate.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> ===================
>> >> >>>>>>>>>>> Why is my bug not fixed?
>> >> >>>>>>>>>>> ===================
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> In order to make timely releases, we will typically not hold the
>> >> >>>>>>>>>>> release unless the bug in question is a regression from 2.2.0.
>> >> >>>>>>>>>>> That being said, if there is something which is a regression from 2.2.0
>> >> >>>>>>>>>>> and has not been correctly targeted please ping me or a committer to help
>> >> >>>>>>>>>>> target the issue (you can see the open issues listed as impacting Spark
>> >> >>>>>>>>>>> 2.3.0 at https://s.apache.org/WmoI).
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> --
>> >> >>>>>>>>>> Sameer Agarwal
>> >> >>>>>>>>>> Computer Science | UC Berkeley
>> >> >>>>>>>>>> http://cs.berkeley.edu/~sameerag
>> >> >>>>>>>>>
>> >> >>>>>>>>> --
>> >> >>>>>>>>> Sameer Agarwal
>> >> >>>>>>>>> Computer Science | UC Berkeley
>> >> >>>>>>>>> http://cs.berkeley.edu/~sameerag
>> >> >>>>>>>>
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> Takuya UESHIN
>> >> >>>>>> Tokyo, Japan
>> >> >>>>>>
>> >> >>>>>> http://twitter.com/ueshin
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >> --
>> >> >> Ryan Blue
>> >> >> Software Engineer
>> >> >> Netflix
>> >> >
>> >>
>> >> --
>> >> Marcelo
>> >
>>
>> --
>> Marcelo
>
--
Ryan Blue
Software Engineer
Netflix