Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
Sounds good - thanks Holden!

On Mon, Sep 18, 2017 at 8:21 PM, Holden Karau  wrote:

> That sounds like a pretty good temporary workaround. If folks agree, I'll
> cancel the release vote for 2.1.2 and work on getting an RC2 out later this
> week, manually signed. I've filed JIRA SPARK-22055 & SPARK-22054 to port the
> release scripts and allow injecting of the RM's key.
>
> On Mon, Sep 18, 2017 at 8:11 PM, Patrick Wendell 
> wrote:
>
>> For the current release - maybe Holden could just sign the artifacts with
>> her own key manually, if this is a concern. I don't think that would
>> require modifying the release pipeline, except to just remove/ignore the
>> existing signatures.
>>
>> - Patrick
>>
>> On Mon, Sep 18, 2017 at 7:56 PM, Reynold Xin  wrote:
>>
>>> Does anybody know whether this is a hard blocker? If it is not, we
>>> should probably push 2.1.2 forward quickly and do the infrastructure
>>> improvement in parallel.
>>>
>>> On Mon, Sep 18, 2017 at 7:49 PM, Holden Karau 
>>> wrote:
>>>
>>>> I'm more than willing to help migrate the scripts as part of either
>>>> this release or the next.
>>>>
>>>> It sounds like there is a consensus developing around changing the
>>>> process -- should we hold off on the 2.1.2 release or roll this into the
>>>> next one?
>>>>
>>>> On Mon, Sep 18, 2017 at 7:37 PM, Marcelo Vanzin
>>>> wrote:

> +1 to this. There should be a script in the Spark repo that has all
> the logic needed for a release. That script should take the RM's key
> as a parameter.
>
> if there's a desire to keep the current Jenkins job to create the
> release, it should be based on that script. But from what I'm seeing
> there are currently too many unknowns in the release process.
>
> On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue 
> wrote:
> > I don't understand why it is necessary to share a release key. If
> this is
> > something that can be automated in a Jenkins job, then can it be a
> script
> > with a reasonable set of build requirements for Mac and Ubuntu?
> That's the
> > approach I've seen the most in other projects.
> >
> > I'm also not just concerned about release managers. Having a key
> stored
> > persistently on outside infrastructure adds the most risk, as
> Luciano noted
> > as well. We should also start publishing checksums in the Spark VOTE
> thread,
> > which are currently missing. The risk I'm concerned about is that if
> the key
> > were compromised, it would be possible to replace binaries with
> perfectly
> > valid ones, at least on some mirrors. If the Apache copy were
> replaced, then
> > we wouldn't even be able to catch that it had happened. Given the
> high
> > profile of Spark and the number of companies that run it, I think we
> need to
> > take extra care to make sure that can't happen, even if it is an
> annoyance
> > for the release managers.
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


 --
 Twitter: https://twitter.com/holdenkarau

>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-18 Thread Holden Karau
As per the conversation happening around the signing of releases, I'm
cancelling this vote. If folks agree with the temporary solution there, I'll
try to get a new RC out shortly, but if we end up blocking on migrating the
Jenkins jobs it could take a bit longer.

On Sun, Sep 17, 2017 at 1:30 AM, yuming wang  wrote:

> Yes, it doesn't work in 2.1.0 or 2.1.1. I created a PR for this:
> https://github.com/apache/spark/pull/19259.
>
>
> On Sep 17, 2017, at 16:14, Sean Owen wrote:
>
> So, didn't work in 2.1.0 or 2.1.1? If it's not a regression and not
> critical, it shouldn't block a release. It seems like this can only affect
> Docker and/or Oracle JDBC? Well, if we need to roll another release anyway,
> seems OK.
>
> On Sun, Sep 17, 2017 at 6:06 AM Xiao Li  wrote:
>
>> This is a bug introduced in 2.1. It works fine in 2.0
>>
>> 2017-09-16 16:15 GMT-07:00 Holden Karau :
>>
>>> Ok :) Was this working in 2.1.1?
>>>
>>> On Sat, Sep 16, 2017 at 3:59 PM Xiao Li  wrote:
>>>
>>>> Still -1
>>>>
>>>> Unable to pass the tests in my local environment. Opened a JIRA:
>>>> https://issues.apache.org/jira/browse/SPARK-22041
>>>>
>>>> - SPARK-16625: General data types to be mapped to Oracle *** FAILED ***
>>>>
>>>>   types.apply(9).equals(org.apache.spark.sql.types.DateType) was false
>>>> (OracleIntegrationSuite.scala:158)
>>>>
>>>> Xiao
>>>>
>>>> 2017-09-15 17:35 GMT-07:00 Ryan Blue:

> -1 (with my Apache member hat on, non-binding)
>
> I'll continue discussion in the other thread, but I don't think we
> should share signing keys.
>
> On Fri, Sep 15, 2017 at 5:14 PM, Holden Karau 
> wrote:
>
>> Indeed it's limited to people with login permissions on the Jenkins
>> host (and perhaps further limited, I'm not certain). Shane probably knows
>> more about the ACLs, so I'll ask him in the other thread for specifics.
>>
>> This is maybe branching a bit from the question of the current RC
>> though, so I'd suggest we continue this discussion on the thread Sean
>> Owen made.
>>
>> On Fri, Sep 15, 2017 at 4:04 PM Ryan Blue  wrote:
>>
>>> I'm not familiar with the release procedure. Can you send a link to
>>> this Jenkins job? Can anyone run this job, or is it limited to
>>> committers?
>>>
>>> rb
>>>
>>> On Fri, Sep 15, 2017 at 12:28 PM, Holden Karau wrote:
>>>
>>>> That's a good question. I built the release candidate; however, the
>>>> Jenkins scripts don't take a parameter for configuring who signs them.
>>>> Rather, it always signs them with Patrick's key. You can see this from
>>>> previous releases which were managed by other folks but still signed by
>>>> Patrick.

 On Fri, Sep 15, 2017 at 12:16 PM, Ryan Blue 
 wrote:

> The signature is valid, but why was the release signed with
> Patrick Wendell's private key? Did Patrick build the release
> candidate?
>
> rb
>
> On Fri, Sep 15, 2017 at 6:36 AM, Denny Lee 
> wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Sep 14, 2017 at 10:57 PM Felix Cheung <
>> felixcheun...@hotmail.com> wrote:
>>
>>> +1 tested SparkR package on Windows, r-hub, Ubuntu.
>>>
>>> _
>>> From: Sean Owen 
>>> Sent: Thursday, September 14, 2017 3:12 PM
>>> Subject: Re: [VOTE] Spark 2.1.2 (RC1)
>>> To: Holden Karau , 
>>>
>>>
>>>
>>> +1
>>> Very nice. The sigs and hashes look fine, it builds fine for me
>>> on Debian Stretch with Java 8, yarn/hive/hadoop-2.7 profiles, and
>>> passes tests.
>>>
>>> Yes as you say, no outstanding issues except for this which
>>> doesn't look critical, as it's not a regression.
>>>
>>> SPARK-21985 PySpark PairDeserializer is broken for double-zipped
>>> RDDs
>>>
>>>
>>> On Thu, Sep 14, 2017 at 7:47 PM Holden Karau <
>>> hol...@pigscanfly.ca> wrote:
>>>
 Please vote on releasing the following candidate as Apache
 Spark version 2.1.2. The vote is open until Friday September
 22nd at 18:00 PST and passes if a majority of at least 3 +1 PMC
 votes are cast.

 [ ] +1 Release this package as Apache Spark 2.1.2
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see
 https://spark.apache.org/

 The 

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Holden Karau
That sounds like a pretty good temporary workaround. If folks agree, I'll
cancel the release vote for 2.1.2 and work on getting an RC2 out later this
week, manually signed. I've filed JIRA SPARK-22055 & SPARK-22054 to port the
release scripts and allow injecting of the RM's key.

On Mon, Sep 18, 2017 at 8:11 PM, Patrick Wendell 
wrote:

> For the current release - maybe Holden could just sign the artifacts with
> her own key manually, if this is a concern. I don't think that would
> require modifying the release pipeline, except to just remove/ignore the
> existing signatures.
>
> - Patrick
>
> On Mon, Sep 18, 2017 at 7:56 PM, Reynold Xin  wrote:
>
>> Does anybody know whether this is a hard blocker? If it is not, we should
>> probably push 2.1.2 forward quickly and do the infrastructure improvement
>> in parallel.
>>
>> On Mon, Sep 18, 2017 at 7:49 PM, Holden Karau 
>> wrote:
>>
>>> I'm more than willing to help migrate the scripts as part of either this
>>> release or the next.
>>>
>>> It sounds like there is a consensus developing around changing the
>>> process -- should we hold off on the 2.1.2 release or roll this into the
>>> next one?
>>>
>>> On Mon, Sep 18, 2017 at 7:37 PM, Marcelo Vanzin 
>>> wrote:
>>>
 +1 to this. There should be a script in the Spark repo that has all
 the logic needed for a release. That script should take the RM's key
 as a parameter.

 if there's a desire to keep the current Jenkins job to create the
 release, it should be based on that script. But from what I'm seeing
 there are currently too many unknowns in the release process.

 On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue 
 wrote:
 > I don't understand why it is necessary to share a release key. If
 this is
 > something that can be automated in a Jenkins job, then can it be a
 script
 > with a reasonable set of build requirements for Mac and Ubuntu?
 That's the
 > approach I've seen the most in other projects.
 >
 > I'm also not just concerned about release managers. Having a key
 stored
 > persistently on outside infrastructure adds the most risk, as Luciano
 noted
 > as well. We should also start publishing checksums in the Spark VOTE
 thread,
 > which are currently missing. The risk I'm concerned about is that if
 the key
 > were compromised, it would be possible to replace binaries with
 perfectly
 > valid ones, at least on some mirrors. If the Apache copy were
 replaced, then
 > we wouldn't even be able to catch that it had happened. Given the high
 > profile of Spark and the number of companies that run it, I think we
 need to
 > take extra care to make sure that can't happen, even if it is an
 annoyance
 > for the release managers.

 --
 Marcelo



>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
For the current release - maybe Holden could just sign the artifacts with
her own key manually, if this is a concern. I don't think that would
require modifying the release pipeline, except to just remove/ignore the
existing signatures.

- Patrick
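
As a rough illustration of the manual signing step suggested above, here is a minimal Python sketch. It assumes GnuPG is installed and the release manager's key is already in the local keyring. The helper names, key ID, and file names are hypothetical and are not taken from the Spark release scripts; the gpg options used (--local-user, --armor, --detach-sign, --output) are standard GnuPG flags.

```python
from pathlib import Path
import subprocess


def signature_path(artifact: Path) -> Path:
    """Return the conventional detached-signature path, e.g. foo.tgz -> foo.tgz.asc."""
    return artifact.with_name(artifact.name + ".asc")


def build_sign_command(artifact: Path, key_id: str) -> list[str]:
    """Build (but do not run) a gpg command that signs `artifact` with `key_id`."""
    return [
        "gpg",
        "--local-user", key_id,  # sign with the RM's own key, not a shared one
        "--armor",               # ASCII-armored output (.asc)
        "--detach-sign",         # signature is written to a separate file
        "--output", str(signature_path(artifact)),
        str(artifact),
    ]


def resign(artifact: Path, key_id: str) -> None:
    """Discard any existing signature, then sign locally with the given key."""
    signature_path(artifact).unlink(missing_ok=True)  # drop the pipeline's signature
    subprocess.run(build_sign_command(artifact, key_id), check=True)
```

Calling resign(Path("spark-2.1.2-bin-hadoop2.7.tgz"), "ABCD1234") would replace the existing .asc file with one produced by the local key; the private key never leaves the RM's machine.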

On Mon, Sep 18, 2017 at 7:56 PM, Reynold Xin  wrote:

> Does anybody know whether this is a hard blocker? If it is not, we should
> probably push 2.1.2 forward quickly and do the infrastructure improvement
> in parallel.
>
> On Mon, Sep 18, 2017 at 7:49 PM, Holden Karau 
> wrote:
>
>> I'm more than willing to help migrate the scripts as part of either this
>> release or the next.
>>
>> It sounds like there is a consensus developing around changing the
>> process -- should we hold off on the 2.1.2 release or roll this into the
>> next one?
>>
>> On Mon, Sep 18, 2017 at 7:37 PM, Marcelo Vanzin 
>> wrote:
>>
>>> +1 to this. There should be a script in the Spark repo that has all
>>> the logic needed for a release. That script should take the RM's key
>>> as a parameter.
>>>
>>> if there's a desire to keep the current Jenkins job to create the
>>> release, it should be based on that script. But from what I'm seeing
>>> there are currently too many unknowns in the release process.
>>>
>>> On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue 
>>> wrote:
>>> > I don't understand why it is necessary to share a release key. If this
>>> is
>>> > something that can be automated in a Jenkins job, then can it be a
>>> script
>>> > with a reasonable set of build requirements for Mac and Ubuntu? That's
>>> the
>>> > approach I've seen the most in other projects.
>>> >
>>> > I'm also not just concerned about release managers. Having a key stored
>>> > persistently on outside infrastructure adds the most risk, as Luciano
>>> noted
>>> > as well. We should also start publishing checksums in the Spark VOTE
>>> thread,
>>> > which are currently missing. The risk I'm concerned about is that if
>>> the key
>>> > were compromised, it would be possible to replace binaries with
>>> perfectly
>>> > valid ones, at least on some mirrors. If the Apache copy were
>>> replaced, then
>>> > we wouldn't even be able to catch that it had happened. Given the high
>>> > profile of Spark and the number of companies that run it, I think we
>>> need to
>>> > take extra care to make sure that can't happen, even if it is an
>>> annoyance
>>> > for the release managers.
>>>
>>> --
>>> Marcelo
>>>
>>>
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>


Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Reynold Xin
Does anybody know whether this is a hard blocker? If it is not, we should
probably push 2.1.2 forward quickly and do the infrastructure improvement
in parallel.

On Mon, Sep 18, 2017 at 7:49 PM, Holden Karau  wrote:

> I'm more than willing to help migrate the scripts as part of either this
> release or the next.
>
> It sounds like there is a consensus developing around changing the process
> -- should we hold off on the 2.1.2 release or roll this into the next one?
>
> On Mon, Sep 18, 2017 at 7:37 PM, Marcelo Vanzin 
> wrote:
>
>> +1 to this. There should be a script in the Spark repo that has all
>> the logic needed for a release. That script should take the RM's key
>> as a parameter.
>>
>> if there's a desire to keep the current Jenkins job to create the
>> release, it should be based on that script. But from what I'm seeing
>> there are currently too many unknowns in the release process.
>>
>> On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue 
>> wrote:
>> > I don't understand why it is necessary to share a release key. If this
>> is
>> > something that can be automated in a Jenkins job, then can it be a
>> script
>> > with a reasonable set of build requirements for Mac and Ubuntu? That's
>> the
>> > approach I've seen the most in other projects.
>> >
>> > I'm also not just concerned about release managers. Having a key stored
>> > persistently on outside infrastructure adds the most risk, as Luciano
>> noted
>> > as well. We should also start publishing checksums in the Spark VOTE
>> thread,
>> > which are currently missing. The risk I'm concerned about is that if
>> the key
>> > were compromised, it would be possible to replace binaries with
>> perfectly
>> > valid ones, at least on some mirrors. If the Apache copy were replaced,
>> then
>> > we wouldn't even be able to catch that it had happened. Given the high
>> > profile of Spark and the number of companies that run it, I think we
>> need to
>> > take extra care to make sure that can't happen, even if it is an
>> annoyance
>> > for the release managers.
>>
>> --
>> Marcelo
>>
>>
>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>


Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Holden Karau
I'm more than willing to help migrate the scripts as part of either this
release or the next.

It sounds like there is a consensus developing around changing the process
-- should we hold off on the 2.1.2 release or roll this into the next one?

On Mon, Sep 18, 2017 at 7:37 PM, Marcelo Vanzin  wrote:

> +1 to this. There should be a script in the Spark repo that has all
> the logic needed for a release. That script should take the RM's key
> as a parameter.
>
> if there's a desire to keep the current Jenkins job to create the
> release, it should be based on that script. But from what I'm seeing
> there are currently too many unknowns in the release process.
>
> On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue 
> wrote:
> > I don't understand why it is necessary to share a release key. If this is
> > something that can be automated in a Jenkins job, then can it be a script
> > with a reasonable set of build requirements for Mac and Ubuntu? That's
> the
> > approach I've seen the most in other projects.
> >
> > I'm also not just concerned about release managers. Having a key stored
> > persistently on outside infrastructure adds the most risk, as Luciano
> noted
> > as well. We should also start publishing checksums in the Spark VOTE
> thread,
> > which are currently missing. The risk I'm concerned about is that if the
> key
> > were compromised, it would be possible to replace binaries with perfectly
> > valid ones, at least on some mirrors. If the Apache copy were replaced,
> then
> > we wouldn't even be able to catch that it had happened. Given the high
> > profile of Spark and the number of companies that run it, I think we
> need to
> > take extra care to make sure that can't happen, even if it is an
> annoyance
> > for the release managers.
>
> --
> Marcelo
>
>
>


-- 
Twitter: https://twitter.com/holdenkarau


Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Marcelo Vanzin
+1 to this. There should be a script in the Spark repo that has all
the logic needed for a release. That script should take the RM's key
as a parameter.
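
A minimal sketch of what "take the RM's key as a parameter" could look like, written in Python for illustration only. The function name and the --version and --gpg-key flags are hypothetical; the actual release tooling under dev/create-release may be structured quite differently.

```python
import argparse


def parse_release_args(argv: list[str]) -> argparse.Namespace:
    """Parse arguments for a hypothetical release driver script."""
    parser = argparse.ArgumentParser(description="Hypothetical release driver")
    parser.add_argument("--version", required=True,
                        help="release version, e.g. 2.1.2")
    # The key is required, so there is no silent fallback to a shared key.
    parser.add_argument("--gpg-key", required=True,
                        help="ID of the release manager's signing key")
    return parser.parse_args(argv)
```

Making the key a required argument means a Jenkins job built on the same script would also have to say explicitly whose key it signs with.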

If there's a desire to keep the current Jenkins job to create the
release, it should be based on that script. But from what I'm seeing
there are currently too many unknowns in the release process.

On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue  wrote:
> I don't understand why it is necessary to share a release key. If this is
> something that can be automated in a Jenkins job, then can it be a script
> with a reasonable set of build requirements for Mac and Ubuntu? That's the
> approach I've seen the most in other projects.
>
> I'm also not just concerned about release managers. Having a key stored
> persistently on outside infrastructure adds the most risk, as Luciano noted
> as well. We should also start publishing checksums in the Spark VOTE thread,
> which are currently missing. The risk I'm concerned about is that if the key
> were compromised, it would be possible to replace binaries with perfectly
> valid ones, at least on some mirrors. If the Apache copy were replaced, then
> we wouldn't even be able to catch that it had happened. Given the high
> profile of Spark and the number of companies that run it, I think we need to
> take extra care to make sure that can't happen, even if it is an annoyance
> for the release managers.

-- 
Marcelo




Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
Hey, I talked more about this with Josh Rosen, who has helped with automation
since I became less involved in release management.

I can think of a few different things that would improve our RM based on
these suggestions:

(1) We could remove the signing step from the rest of the automation and ask
the RM to sign the artifacts locally as a last step. This does mean we'd trust
the RM's environment not to be owned, but it could be better if there is
concern about centralization of risk. I'm curious how other projects do
this.

(2) We could rotate the RM position. BTW Holden Karau is doing this and
that's how this whole discussion started.

(3) We should make sure all build tooling automation is in the repo itself
so that the build is 100% reproducible by anyone. I think most of it is
already in dev/ [1] but there might be jenkins configs, etc that could be
put into the spark repo.

[1] https://github.com/apache/spark/tree/master/dev/create-release

- Patrick

On Mon, Sep 18, 2017 at 6:23 PM, Patrick Wendell 
wrote:

> One thing we could do is modify the release tooling to allow the key to be
> injected each time, thus allowing any RM to insert their own key at build
> time.
>
> Patrick
>
> On Mon, Sep 18, 2017 at 4:56 PM Ryan Blue  wrote:
>
>> I don't understand why it is necessary to share a release key. If this is
>> something that can be automated in a Jenkins job, then can it be a script
>> with a reasonable set of build requirements for Mac and Ubuntu? That's the
>> approach I've seen the most in other projects.
>>
>> I'm also not just concerned about release managers. Having a key stored
>> persistently on outside infrastructure adds the most risk, as Luciano noted
>> as well. We should also start publishing checksums in the Spark VOTE
>> thread, which are currently missing. The risk I'm concerned about is that
>> if the key were compromised, it would be possible to replace binaries with
>> perfectly valid ones, at least on some mirrors. If the Apache copy were
>> replaced, then we wouldn't even be able to catch that it had happened.
>> Given the high profile of Spark and the number of companies that run it, I
>> think we need to take extra care to make sure that can't happen, even if it
>> is an annoyance for the release managers.
>>
>> On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell wrote:
>>
>>> Spark's release pipeline is automated and part of that automation
>>> includes securely injecting this key for the purpose of signing. I asked
>>> the ASF to provide a service account key several years ago but they
>>> suggested that we use a key attributed to an individual even if the process
>>> is automated.
>>>
>>> I believe other projects that release with high frequency also have
>>> automated the signing process.
>>>
>>> This key is injected during the build process. A really ambitious
>>> release manager could reverse engineer this in a way that reveals the
>>> private key, however if someone is a release manager then they themselves
>>> can do quite a bit of nefarious things anyways.
>>>
>>> It is true that we trust all previous release managers instead of only
>>> one. We could probably rotate the jenkins credentials periodically in order
>>> to compensate for this, if we think this is a nontrivial risk.
>>>
>>> - Patrick
>>>
>>> On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau 
>>> wrote:
>>>
 Would any of Patrick/Josh/Shane (or other PMC folks with
 understanding/opinions on this setup) care to comment? If this is a
 blocking issue I can cancel the current release vote thread while we
 discuss this some more.

 On Fri, Sep 15, 2017 at 5:18 PM Holden Karau 
 wrote:

> Oh yes and to keep people more informed I've been updating a PR for
> the release documentation as I go to write down some of this unwritten
> knowledge -- https://github.com/apache/spark-website/pull/66
>
>
> On Fri, Sep 15, 2017 at 5:12 PM Holden Karau 
> wrote:
>
>> Also continuing the discussion from the vote threads, Shane probably
>> has the best idea on the ACLs for Jenkins so I've CC'd him as well.
>>
>>
>> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau 
>> wrote:
>>
>>> Changing the release jobs, beyond the available parameters, right
>>> now depends on Josh Rosen as there are some scripts which generate the
>>> jobs which aren't public. I've done temporary fixes in the past with the
>>> Python packaging but my understanding is that in the medium term it
>>> requires access to the scripts.
>>>
>>> So +CC Josh.
>>>
>>> On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue  wrote:
>>>
 I think this needs to be fixed. It's true that there are barriers
 to publication, but the signature is what we use to authenticate Apache

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread shane knapp
i will detail how we control access to the jenkins infra tomorrow.
we're pretty well locked down, but there is absolutely room for
improvement.

this thread is also a good reminder that we (RMs + pwendell + ?)
should audit who still has, but does not need, direct (or special)
access to jenkins.

regarding the release pipeline and associated tooling, i wasn't
involved much during the rollout of the current system.  i'm all for a
bit more involvement on my end, as well as better/moar documentation.
also, tweaking the build/release process to allow individual RMs to
inject their own keys is fine by me.

shane

On Fri, Sep 15, 2017 at 5:12 PM, Holden Karau  wrote:
> Also continuing the discussion from the vote threads, Shane probably has the
> best idea on the ACLs for Jenkins so I've CC'd him as well.
>
>
> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau  wrote:
>>
>> Changing the release jobs, beyond the available parameters, right now
>> depends on Josh Rosen as there are some scripts which generate the jobs
>> which aren't public. I've done temporary fixes in the past with the Python
>> packaging but my understanding is that in the medium term it requires access
>> to the scripts.
>>
>> So +CC Josh.
>>
>> On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue  wrote:
>>>
>>> I think this needs to be fixed. It's true that there are barriers to
>>> publication, but the signature is what we use to authenticate Apache
>>> releases.
>>>
>>> If Patrick's key is available on Jenkins for any Spark committer to use,
>>> then the chances of a compromise are much higher than for a normal RM key.
>>>
>>> rb
>>>
>>> On Fri, Sep 15, 2017 at 12:34 PM, Sean Owen  wrote:

 Yeah I had meant to ask about that in the past. While I presume Patrick
 consents to this and all that, it does mean that anyone with access to said
 Jenkins scripts can create a signed Spark release, regardless of who they
 are.

 I haven't thought through whether that's a theoretical issue we can
 ignore or something we need to fix up. For example you can't get a release
 on the ASF mirrors without more authentication.

 How hard would it be to make the script take in a key? it sort of looks
 like the script already takes GPG_KEY, but don't know how to modify the
 jobs. I suppose it would be ideal, in any event, for the actual release
 manager to sign.

 On Fri, Sep 15, 2017 at 8:28 PM Holden Karau 
 wrote:
>
> That's a good question. I built the release candidate; however, the
> Jenkins scripts don't take a parameter for configuring who signs them.
> Rather, it always signs them with Patrick's key. You can see this from previous
> releases which were managed by other folks but still signed by Patrick.
>
> On Fri, Sep 15, 2017 at 12:16 PM, Ryan Blue  wrote:
>>
>> The signature is valid, but why was the release signed with Patrick
>> Wendell's private key? Did Patrick build the release candidate?
>>>
>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>
> --
> Twitter: https://twitter.com/holdenkarau




Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
One thing we could do is modify the release tooling to allow the key to be
injected each time, thus allowing any RM to insert their own key at build
time.
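
One way the "inject the key each time" idea could be sketched, in Python for illustration. GPG_KEY is the variable name mentioned elsewhere in this thread; how the real scripts interpret it is an assumption here, and resolve_signing_key is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_signing_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key ID injected for this build via GPG_KEY.

    Fails loudly when no key is supplied instead of falling back to a
    shared, persistently stored key.
    """
    env = os.environ if env is None else env
    key_id = env.get("GPG_KEY", "").strip()
    if not key_id:
        raise ValueError("GPG_KEY is not set; refusing to fall back to a shared key")
    return key_id
```

With this shape, each RM would supply their own key ID at build time and a build with no key configured would stop rather than sign with someone else's key.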

Patrick

On Mon, Sep 18, 2017 at 4:56 PM Ryan Blue  wrote:

> I don't understand why it is necessary to share a release key. If this is
> something that can be automated in a Jenkins job, then can it be a script
> with a reasonable set of build requirements for Mac and Ubuntu? That's the
> approach I've seen the most in other projects.
>
> I'm also not just concerned about release managers. Having a key stored
> persistently on outside infrastructure adds the most risk, as Luciano noted
> as well. We should also start publishing checksums in the Spark VOTE
> thread, which are currently missing. The risk I'm concerned about is that
> if the key were compromised, it would be possible to replace binaries with
> perfectly valid ones, at least on some mirrors. If the Apache copy were
> replaced, then we wouldn't even be able to catch that it had happened.
> Given the high profile of Spark and the number of companies that run it, I
> think we need to take extra care to make sure that can't happen, even if it
> is an annoyance for the release managers.
>
> On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell 
> wrote:
>
>> Spark's release pipeline is automated and part of that automation includes
>> securely injecting this key for the purpose of signing. I asked the ASF to
>> provide a service account key several years ago but they suggested that we
>> use a key attributed to an individual even if the process is automated.
>>
>> I believe other projects that release with high frequency also have
>> automated the signing process.
>>
>> This key is injected during the build process. A really ambitious release
>> manager could reverse engineer this in a way that reveals the private key,
>> however if someone is a release manager then they themselves can do quite a
>> bit of nefarious things anyways.
>>
>> It is true that we trust all previous release managers instead of only
>> one. We could probably rotate the jenkins credentials periodically in order
>> to compensate for this, if we think this is a nontrivial risk.
>>
>> - Patrick
>>
>> On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau 
>> wrote:
>>
>>> Would any of Patrick/Josh/Shane (or other PMC folks with
>>> understanding/opinions on this setup) care to comment? If this is a
>>> blocking issue I can cancel the current release vote thread while we
>>> discuss this some more.
>>>
>>> On Fri, Sep 15, 2017 at 5:18 PM Holden Karau 
>>> wrote:
>>>
 Oh yes and to keep people more informed I've been updating a PR for the
 release documentation as I go to write down some of this unwritten
 knowledge -- https://github.com/apache/spark-website/pull/66


 On Fri, Sep 15, 2017 at 5:12 PM Holden Karau 
 wrote:

> Also continuing the discussion from the vote threads, Shane probably
> has the best idea on the ACLs for Jenkins so I've CC'd him as well.
>
>
> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau 
> wrote:
>
>> Changing the release jobs, beyond the available parameters, right now
>> depends on Josh Rosen as there are some scripts which generate the jobs
>> which aren't public. I've done temporary fixes in the past with the Python
>> packaging but my understanding is that in the medium term it requires
>> access to the scripts.
>>
>> So +CC Josh.
>>
>> On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue  wrote:
>>
>>> I think this needs to be fixed. It's true that there are barriers to
>>> publication, but the signature is what we use to authenticate Apache
>>> releases.
>>>
>>> If Patrick's key is available on Jenkins for any Spark committer to
>>> use, then the chances of a compromise are much higher than for a normal RM
>>> key.
>>>
>>> rb
>>>
>>> On Fri, Sep 15, 2017 at 12:34 PM, Sean Owen 
>>> wrote:
>>>
>>>> Yeah I had meant to ask about that in the past. While I presume
>>>> Patrick consents to this and all that, it does mean that anyone with access
>>>> to said Jenkins scripts can create a signed Spark release, regardless of
>>>> who they are.
>>>>
>>>> I haven't thought through whether that's a theoretical issue we can
>>>> ignore or something we need to fix up. For example you can't get a release
>>>> on the ASF mirrors without more authentication.
>>>>
>>>> How hard would it be to make the script take in a key? it sort of
>>>> looks like the script already takes GPG_KEY, but don't know how to modify
>>>> the jobs. I suppose it would be ideal, in any event, for the actual release
>>>> manager

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Ryan Blue
I don't understand why it is necessary to share a release key. If this is
something that can be automated in a Jenkins job, then can it be a script
with a reasonable set of build requirements for Mac and Ubuntu? That's the
approach I've seen the most in other projects.

I'm also not just concerned about release managers. Having a key stored
persistently on outside infrastructure adds the most risk, as Luciano noted
as well. We should also start publishing checksums in the Spark VOTE
thread, which are currently missing. The risk I'm concerned about is that
if the key were compromised, it would be possible to replace binaries with
perfectly valid ones, at least on some mirrors. If the Apache copy were
replaced, then we wouldn't even be able to catch that it had happened.
Given the high profile of Spark and the number of companies that run it, I
think we need to take extra care to make sure that can't happen, even if it
is an annoyance for the release managers.
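
A minimal sketch of how the checksums mentioned above could be generated for pasting into a VOTE email, assuming the release artifacts sit in one local directory. The function names, the choice of SHA-512, and the skipped `.asc`/`.sha512` suffixes are illustrative assumptions, not part of Spark's actual release scripts.

```python
import hashlib
from pathlib import Path
from typing import List


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(artifact_dir: Path) -> List[str]:
    """Build one '<digest>  <filename>' line per artifact, sorted by name."""
    lines = []
    for artifact in sorted(artifact_dir.iterdir()):
        # Skip signature and checksum files (assumed suffixes); hash the rest.
        if artifact.is_file() and artifact.suffix not in {".asc", ".sha512"}:
            lines.append(f"{sha512_of(artifact)}  {artifact.name}")
    return lines
```

Printing `"\n".join(checksum_lines(Path("release-artifacts")))` (a hypothetical directory name) gives text that voters can compare against digests they compute on their own downloads, so a swapped binary is noticed even if its signature still verifies.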

On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell 
wrote:

> Spark's release pipeline is automated and part of that automation includes
> securely injecting this key for the purpose of signing. I asked the ASF to
> provide a service account key several years ago but they suggested that we
> use a key attributed to an individual even if the process is automated.
>
> I believe other projects that release with high frequency also have
> automated the signing process.
>
> This key is injected during the build process. A really ambitious release
> manager could reverse engineer this in a way that reveals the private key,
> however if someone is a release manager then they themselves can do quite a
> bit of nefarious things anyways.
>
> It is true that we trust all previous release managers instead of only
> one. We could probably rotate the jenkins credentials periodically in order
> to compensate for this, if we think this is a nontrivial risk.
>
> - Patrick
>
> On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau 
> wrote:
>
>> Would any of Patrick/Josh/Shane (or other PMC folks with
>> understanding/opinions on this setup) care to comment? If this is a
>> blocking issue I can cancel the current release vote thread while we
>> discuss this some more.
>>
>> On Fri, Sep 15, 2017 at 5:18 PM Holden Karau 
>> wrote:
>>
>>> Oh yes and to keep people more informed I've been updating a PR for the
>>> release documentation as I go to write down some of this unwritten
>>> knowledge -- https://github.com/apache/spark-website/pull/66
>>>
>>>
>>> On Fri, Sep 15, 2017 at 5:12 PM Holden Karau 
>>> wrote:
>>>
>>>> Also continuing the discussion from the vote threads, Shane probably
>>>> has the best idea on the ACLs for Jenkins so I've CC'd him as well.
>>>>
>>>>
>>>> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau 
>>>> wrote:
>>>>
>>>>> Changing the release jobs, beyond the available parameters, right now
>>>>> depends on Josh Rosen as there are some scripts which generate the jobs
>>>>> which aren't public. I've done temporary fixes in the past with the Python
>>>>> packaging but my understanding is that in the medium term it requires
>>>>> access to the scripts.
>>>>>
>>>>> So +CC Josh.
>>>>>
>>>>> On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue  wrote:
>>>>>
>>>>>> I think this needs to be fixed. It's true that there are barriers to
>>>>>> publication, but the signature is what we use to authenticate Apache
>>>>>> releases.
>>>>>>
>>>>>> If Patrick's key is available on Jenkins for any Spark committer to
>>>>>> use, then the chance of a compromise is much higher than for a normal RM
>>>>>> key.
>>>>>>
>>>>>> rb
>>>>>>
>>>>>> On Fri, Sep 15, 2017 at 12:34 PM, Sean Owen 
>>>>>> wrote:
>>>>>>
>>>>>>> Yeah I had meant to ask about that in the past. While I presume
>>>>>>> Patrick consents to this and all that, it does mean that anyone with
>>>>>>> access to said Jenkins scripts can create a signed Spark release,
>>>>>>> regardless of who they are.
>>>>>>>
>>>>>>> I haven't thought through whether that's a theoretical issue we can
>>>>>>> ignore or something we need to fix up. For example you can't get a
>>>>>>> release on the ASF mirrors without more authentication.
>>>>>>>
>>>>>>> How hard would it be to make the script take in a key? It sort of
>>>>>>> looks like the script already takes GPG_KEY, but I don't know how to
>>>>>>> modify the jobs. I suppose it would be ideal, in any event, for the
>>>>>>> actual release manager to sign.
>>>>>>>
>>>>>>> On Fri, Sep 15, 2017 at 8:28 PM Holden Karau 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That's a good question, I built the release candidate however the
>>>>>>>> Jenkins scripts don't take a parameter for configuring who signs them
>>>>>>>> rather it always signs them with Patrick's key. You can see this from
>>>>>>>> previous releases which 

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Luciano Resende
Looks like this thread is touching a few different issues:

- Process documentation: I was trying to learn the details behind the
automation, release signatures, etc. in the official Spark release
management documentation (http://spark.apache.org/release-process.html), and
it looks like not much is described there.

- Sharing release keys: As described in the Apache Release Creation Process
(http://www.apache.org/dev/release-publishing.html#signed), it is
recommended that, "If you plan to serve as a release manager, you should
generate a key and publish it well in advance of creating a release." That
clearly recommends an individual key per RM. If the keys are going to be
shared (which I don't recommend), it should at least be an "Apache Spark"
key instead of an individual's key. IMHO sharing a key, particularly
when that key is available on non-Apache-managed infrastructure, makes it
much more susceptible to compromise, particularly because the PMC
does not control who has access to the environment.

- Inability to customize the release automation by the RM: It looks like the
Jenkins jobs that automate the Spark releases are created/updated by
private scripts that are not available to all Spark PMC members and/or RMs,
which also makes improving the release process a lot more complicated.
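
To make the per-RM key idea concrete, here is a minimal sketch of a signing step that takes the release manager's key as a required parameter and never falls back to a shared one. The function names and the key ID used in the usage note are hypothetical, and this is not how the current Jenkins jobs or Spark release scripts work. It only assumes the standard `gpg` options `--local-user`, `--armor`, `--output` and `--detach-sign`.

```python
import subprocess
from pathlib import Path
from typing import List


def gpg_sign_command(artifact: Path, key_id: str) -> List[str]:
    """Build a gpg command that writes a detached, ASCII-armored signature."""
    if not key_id:
        # Refuse to sign without an explicit key: no shared default exists.
        raise ValueError("a signing key ID is required")
    signature = artifact.with_name(artifact.name + ".asc")
    return [
        "gpg",
        "--local-user", key_id,      # sign with the RM's own key
        "--armor",                   # ASCII output (.asc)
        "--output", str(signature),
        "--detach-sign", str(artifact),
    ]


def sign_artifact(artifact: Path, key_id: str) -> None:
    """Run gpg; raises CalledProcessError if signing fails."""
    subprocess.run(gpg_sign_command(artifact, key_id), check=True)
```

For example, `sign_artifact(Path("spark-2.1.2-bin.tgz"), "ABCD1234")` (both values are placeholders) would run on the RM's own machine, so the private key never needs to be stored on shared infrastructure.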

Would the Spark PMC please look into addressing these issues ASAP?

Thanks


On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell 
wrote:

> Spark's release pipeline is automated and part of that automation includes
> securely injecting this key for the purpose of signing. I asked the ASF to
> provide a service account key several years ago but they suggested that we
> use a key attributed to an individual even if the process is automated.
>
> I believe other projects that release with high frequency also have
> automated the signing process.
>
> This key is injected during the build process. A really ambitious release
> manager could reverse engineer this in a way that reveals the private key,
> however if someone is a release manager then they themselves can do quite a
> bit of nefarious things anyways.
>
> It is true that we trust all previous release managers instead of only
> one. We could probably rotate the jenkins credentials periodically in order
> to compensate for this, if we think this is a nontrivial risk.
>
> - Patrick
>
> On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau 
> wrote:
>
>> Would any of Patrick/Josh/Shane (or other PMC folks with
>> understanding/opinions on this setup) care to comment? If this is a
>> blocking issue I can cancel the current release vote thread while we
>> discuss this some more.
>>
>> On Fri, Sep 15, 2017 at 5:18 PM Holden Karau 
>> wrote:
>>
>>> Oh yes and to keep people more informed I've been updating a PR for the
>>> release documentation as I go to write down some of this unwritten
>>> knowledge -- https://github.com/apache/spark-website/pull/66
>>>
>>>
>>> On Fri, Sep 15, 2017 at 5:12 PM Holden Karau 
>>> wrote:
>>>
>>>> Also continuing the discussion from the vote threads, Shane probably
>>>> has the best idea on the ACLs for Jenkins so I've CC'd him as well.
>>>>
>>>>
>>>> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau 
>>>> wrote:
>>>>
>>>>> Changing the release jobs, beyond the available parameters, right now
>>>>> depends on Josh Rosen as there are some scripts which generate the jobs
>>>>> which aren't public. I've done temporary fixes in the past with the Python
>>>>> packaging but my understanding is that in the medium term it requires
>>>>> access to the scripts.
>>>>>
>>>>> So +CC Josh.
>>>>>
>>>>> On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue  wrote:
>>>>>
>>>>>> I think this needs to be fixed. It's true that there are barriers to
>>>>>> publication, but the signature is what we use to authenticate Apache
>>>>>> releases.
>>>>>>
>>>>>> If Patrick's key is available on Jenkins for any Spark committer to
>>>>>> use, then the chance of a compromise is much higher than for a normal RM
>>>>>> key.
>>>>>>
>>>>>> rb
>>>>>>
>>>>>> On Fri, Sep 15, 2017 at 12:34 PM, Sean Owen 
>>>>>> wrote:
>>>>>>
>>>>>>> Yeah I had meant to ask about that in the past. While I presume
>>>>>>> Patrick consents to this and all that, it does mean that anyone with
>>>>>>> access to said Jenkins scripts can create a signed Spark release,
>>>>>>> regardless of who they are.
>>>>>>>
>>>>>>> I haven't thought through whether that's a theoretical issue we can
>>>>>>> ignore or something we need to fix up. For example you can't get a
>>>>>>> release on the ASF mirrors without more authentication.
>>>>>>>
>>>>>>> How hard would it be to make the script take in a key? It sort of
>>>>>>> looks like the script already takes GPG_KEY, but I don't know how to
>>>>>>> modify the jobs. I suppose it would