Re: [VOTE] Release Apache Spark 2.4.2

2019-05-01 Thread Felix Cheung
Sent: Tuesday, April 30, 2019 1:52:53 PM To: Reynold Xin Cc: Jungtaek Lim; Dongjoon Hyun; Wenchen Fan; Michael Heuer; Terry Kim; dev; Xiao Li Subject: Re: [VOTE] Release Apache Spark 2.4.2 FWIW I'm OK with this even though I proposed the backport PR for discussion. It really is a tough call

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Dongjoon Hyun
Hi, All and Xiao (as the next release manager). In any case, can the release manager officially include information about the release script used as part of the VOTE email? That information will be very helpful for reproducing the Spark build (in the downstream environment). Currently, it's not clearly

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Wenchen Fan
> it could just be fixed in master rather than back-port and re-roll the RC I don't think the release script is part of the released product. That said, we can just fix the release script in branch 2.4 without creating a new RC. We can even create a new repo for the release script, like

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Sean Owen
I think this is a reasonable idea; I know @vanzin had suggested it was simpler to use the latest in case a bug was found in the release script and then it could just be fixed in master rather than back-port and re-roll the RC. That said I think we did / had to already drop the ability to build <=

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Michael Heuer
> Scala versioned artifacts in our release. Our python library on PyPI depends > on pyspark, our Bioconda recipe depends on the pyspark Conda recipe, and our > Homebrew formula depends on the apache-spark Homebrew formula. > > Using Scala 2.12 in the binary distribu

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
brew formula depends on the apache-spark Homebrew > formula. > > Using Scala 2.12 in the binary distribution for Spark 2.4.2 was > unintentional and never voted on. There was a successful vote to default > to Scala 2.12 in Spark version 3.0. > >michael > > > On

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Michael Heuer
. Using Scala 2.12 in the binary distribution for Spark 2.4.2 was unintentional and never voted on. There was a successful vote to default to Scala 2.12 in Spark version 3.0. michael > On Apr 26, 2019, at 9:52 AM, Sean Owen wrote: > > To be clear, what's the nature of the problem ther

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
To be clear, what's the nature of the problem there... just Pyspark apps that are using a Scala-based library? Trying to make sure we understand what is and isn't a problem here. On Fri, Apr 26, 2019 at 9:44 AM Michael Heuer wrote: > This will also cause problems in Conda builds that depend on

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Michael Heuer
This will also cause problems in Conda builds that depend on pyspark https://anaconda.org/conda-forge/pyspark and Homebrew builds that depend on apache-spark, as that also uses the binary distribution.

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
Re: .NET, what's the particular issue in there that it's causing? 2.4.2 still builds for 2.11. I'd imagine you'd be pulling dependencies from Maven central (?) or if needed can build for 2.11 from source. I'm more concerned about pyspark because it builds in 2.12 jars. On Fri, Apr 26, 2019 at

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Reynold Xin
I do feel it'd be better to not switch default Scala versions in a minor release. I don't know how much downstream this impacts. Dotnet is a good data point. Anybody else hit this issue? On Thu, Apr 25, 2019 at 11:36 PM, Terry Kim < yumin...@gmail.com > wrote: > > > > Very much interested

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Terry Kim
Very much interested in hearing what you folks decide. We currently have a couple asking us questions at https://github.com/dotnet/spark/issues. Thanks, Terry

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-22 Thread Shixiong(Ryan) Zhu
>> On Thu, Apr 18, 2019 at 9:51 PM Wenchen Fan wrote: >> > >> > Please vote on releasing the following candidate as Apache Spark >> version 2.4.2. >> > >> > The vote is open until April 23 PST and passes if a majority +1 PMC >> votes are cast, with >

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Wenchen Fan
April 23 PST and passes if a majority +1 PMC > votes are cast, with > > a minimum of 3 +1 votes. > > > > [ ] +1 Release this package as Apache Spark 2.4.2 > > [ ] -1 Do not release this package because ... > > > > To learn more about Apache Spark, ple

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Sean Owen
nimum of 3 +1 votes. > > [ ] +1 Release this package as Apache Spark 2.4.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see http://spark.apache.org/ > > The tag to be voted on is v2.4.2-rc1 (commit > a44880ba74caab7a987128cb09

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread vaquar khan
* Wenchen Fan > *Cc:* Spark dev list > *Subject:* Re: [VOTE] Release Apache Spark 2.4.2 > > +1 from me too. > > It seems like there is support for merging the Jackson change into > 2.4.x (and, I think, a few more minor dependency updates) but this > doesn't have to go into

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Felix Cheung
+1 R tests, package tests on r-hub. Manually check commits under R, doc etc From: Sean Owen Sent: Saturday, April 20, 2019 11:27 AM To: Wenchen Fan Cc: Spark dev list Subject: Re: [VOTE] Release Apache Spark 2.4.2 +1 from me too. It seems like

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-20 Thread Sean Owen
ease this package as Apache Spark 2.4.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see http://spark.apache.org/ > > The tag to be voted on is v2.4.2-rc1 (commit > a44880ba74caab7a987128cb09c4bee41617770a): > https://github.com/ap

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread shane knapp
an wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.4.2. >> >> The vote is open until April 23 PST and passes if a majority +1 PMC votes >> are cast, with >> a minimum of 3 +1 votes. >> >> [ ] +1 Release this package

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread Michael Armbrust
a minimum of 3 +1 votes. > > [ ] +1 Release this package as Apache Spark 2.4.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see http://spark.apache.org/ > > The tag to be voted on is v2.4.2-rc1 (commit > a44880ba74caab7a98712

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
> > >>> Re shading - same argument I’ve made earlier today in a PR... > >>> > >>> (Context- in many cases Spark has light or indirect dependencies but > >>> bringing them into the process breaks users code easily) > >>> > >>>

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
made earlier today in a PR... >>> >>> (Context- in many cases Spark has light or indirect dependencies but >>> bringing them into the process breaks users code easily) >>> >>> >>> ____ >>> From: Michael Heuer

Re: Spark 2.4.2

2019-04-19 Thread Driesprong, Fokko
> >>> (Context- in many cases Spark has light or indirect dependencies but >>> bringing them into the process breaks users code easily) >>> >>> >>> -- >>> *From:* Michael Heuer >>> *Sent:* Thursday, April 1

Re: Spark 2.4.2

2019-04-19 Thread Arun Mahadevan
ging them into the process breaks users code easily) >> >> >> -- >> *From:* Michael Heuer >> *Sent:* Thursday, April 18, 2019 6:41 AM >> *To:* Reynold Xin >> *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen >>

Re: Spark 2.4.2

2019-04-18 Thread Wenchen Fan
endencies but > bringing them into the process breaks users code easily) > > > -- > *From:* Michael Heuer > *Sent:* Thursday, April 18, 2019 6:41 AM > *To:* Reynold Xin > *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen > Fan;

[VOTE] Release Apache Spark 2.4.2

2019-04-18 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 2.4.2. The vote is open until April 23 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.2 [ ] -1 Do not release this package because

Re: Spark 2.4.2

2019-04-18 Thread Felix Cheung
Xin Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; Xiao Li Subject: Re: Spark 2.4.2 +100 On Apr 18, 2019, at 1:48 AM, Reynold Xin wrote: We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sea

Re: Spark 2.4.2

2019-04-18 Thread Michael Heuer
+100 > On Apr 18, 2019, at 1:48 AM, Reynold Xin wrote: > > We should have shaded all Spark’s dependencies :( > > On Wed, Apr 17, 2019 at 11:47 PM Sean Owen > wrote: > For users that would inherit Jackson and use it directly, or whose > dependencies do. Spark itself

Re: Spark 2.4.2

2019-04-18 Thread Reynold Xin
We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sean Owen wrote: > For users that would inherit Jackson and use it directly, or whose > dependencies do. Spark itself (with modifications) should be OK with > the change. > It's risky and normally wouldn't

Re: Spark 2.4.2

2019-04-18 Thread Sean Owen
For users that would inherit Jackson and use it directly, or whose dependencies do. Spark itself (with modifications) should be OK with the change. It's risky and normally wouldn't backport, except that I've heard a few times about concerns about CVEs affecting Databind, so wondering who else out

Re: Spark 2.4.2

2019-04-17 Thread Reynold Xin
For Jackson - are you worrying about JSON parsing for users or internal Spark functionality breaking? On Wed, Apr 17, 2019 at 6:02 PM Sean Owen wrote: > There's only one other item on my radar, which is considering updating > Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
There's only one other item on my radar, which is considering updating Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up a few times now that there are a number of CVEs open for 2.6.7. Cons: not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson behavior

Re: Spark 2.4.2

2019-04-17 Thread Wenchen Fan
I volunteer to be the release manager for 2.4.2, as I was also going to propose 2.4.2 because of the reverting of SPARK-25250. Are there any other ongoing bug fixes we want to include in 2.4.2? If not, I'd like to start the release process today (CST). Thanks, Wenchen On Thu, Apr 18, 2019 at 3:44

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
I think the 'only backport bug fixes to branches' principle remains sound. But what's a bug fix? Something that changes behavior to match what is explicitly supposed to happen, or implicitly supposed to happen -- implied by what other similar things do, by reasonable user expectations, or simply

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Thanks Ryan. To me the "test" for putting things in a maintenance release is really a trade-off between benefit and risk (along with some caveats, like user facing surface should not grow). The benefits here are fairly large (now it is possible to plug in partition aware data sources) and the risk

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Spark has a lot of strange behaviors already that we don't fix in patch releases. And bugs aren't usually fixed with a configuration flag to turn on the fix. That said, I don't have a problem with this commit making it into a patch release. This is a small change and looks safe enough to me. I

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
I would argue that it's confusing enough to a user for options from DataFrameWriter to be silently dropped when instantiating the data source to consider this a bug. They asked for partitioning to occur, and we are doing nothing (not even telling them we can't). I was certainly surprised by this

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Is this a bug fix? It looks like a new feature to me. On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust wrote: > Hello All, > > I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 > I was wondering if it > might make sense

Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Hello All, I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 I was wondering if it might make sense to follow up quickly with 2.4.2. Without this fix it's very hard to build a datasource that correctly handles partitioning