Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
Sounds good - thanks Holden!

On Mon, Sep 18, 2017 at 8:21 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> That sounds like a pretty good temporary workaround. If folks agree, I'll
> cancel the release vote for 2.1.2 and work on getting an RC2 out later this
> week, manually signed. I've filed JIRA SPARK-22055 & SPARK-22054 to port the
> release scripts and allow injecting the RM's key.
>
> On Mon, Sep 18, 2017 at 8:11 PM, Patrick Wendell <patr...@databricks.com>
> wrote:
>
>> For the current release - maybe Holden could just sign the artifacts with
>> her own key manually, if this is a concern. I don't think that would
>> require modifying the release pipeline, except to just remove/ignore the
>> existing signatures.
>>
>> - Patrick
>>
>> On Mon, Sep 18, 2017 at 7:56 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Does anybody know whether this is a hard blocker? If it is not, we
>>> should probably push 2.1.2 forward quickly and do the infrastructure
>>> improvement in parallel.
>>>
>>> On Mon, Sep 18, 2017 at 7:49 PM, Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> I'm more than willing to help migrate the scripts as part of either
>>>> this release or the next.
>>>>
>>>> It sounds like there is a consensus developing around changing the
>>>> process -- should we hold off on the 2.1.2 release or roll this into the
>>>> next one?
>>>>
>>>> On Mon, Sep 18, 2017 at 7:37 PM, Marcelo Vanzin <van...@cloudera.com>
>>>> wrote:
>>>>
>>>>> +1 to this. There should be a script in the Spark repo that has all
>>>>> the logic needed for a release. That script should take the RM's key
>>>>> as a parameter.
>>>>>
>>>>> If there's a desire to keep the current Jenkins job to create the
>>>>> release, it should be based on that script. But from what I'm seeing
>>>>> there are currently too many unknowns in the release process.
>>>>>
>>>>> On Mon, Sep 18, 2017 at 4:55 PM, Ryan Blue <rb...@netflix.com.invalid>
>>>>> wrote:
>>>>> > I don't understand why it is necessary to share a release key. If
>>>>> this is
>>>>> > something that can be automated in a Jenkins job, then can it be a
>>>>> script
>>>>> > with a reasonable set of build requirements for Mac and Ubuntu?
>>>>> That's the
>>>>> > approach I've seen the most in other projects.
>>>>> >
>>>>> > I'm also not just concerned about release managers. Having a key
>>>>> stored
>>>>> > persistently on outside infrastructure adds the most risk, as
>>>>> Luciano noted
>>>>> > as well. We should also start publishing checksums in the Spark VOTE
>>>>> thread,
>>>>> > which are currently missing. The risk I'm concerned about is that if
>>>>> the key
>>>>> > were compromised, it would be possible to replace binaries with
>>>>> perfectly
>>>>> > valid ones, at least on some mirrors. If the Apache copy were
>>>>> replaced, then
>>>>> > we wouldn't even be able to catch that it had happened. Given the
>>>>> high
>>>>> > profile of Spark and the number of companies that run it, I think we
>>>>> need to
>>>>> > take extra care to make sure that can't happen, even if it is an
>>>>> annoyance
>>>>> > for the release managers.
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>> -
>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>>
>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
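
As a concrete reference for the verification Ryan describes, checking a
release against its signature and a published checksum is a couple of
standard commands. This is a generic sketch (artifact and KEYS file names
are illustrative, not the project's actual tooling):

    # Import the release manager's public key from the project's KEYS file
    gpg --import KEYS

    # Verify the detached signature against the downloaded artifact
    gpg --verify spark-2.1.2-bin-hadoop2.7.tgz.asc spark-2.1.2-bin-hadoop2.7.tgz

    # Recompute the checksum and compare it to the value published in the
    # VOTE thread
    sha512sum spark-2.1.2-bin-hadoop2.7.tgz

Publishing the checksum in the VOTE thread itself gives an independent record
to compare against, which is what makes a silently replaced artifact on a
mirror detectable.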


Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
Hey, I talked more about this with Josh Rosen, who has helped with automation
since I became less involved in release management.

I can think of a few different things that would improve our release
management based on these suggestions:

(1) We could remove the signing step from the rest of the automation and ask
the RM to sign the artifacts locally as a last step. This does mean we'd trust
the RM's environment not to be compromised, but it could be better if there is
concern about centralization of risk. I'm curious how other projects do this.

(2) We could rotate the RM position. BTW Holden Karau is doing this and
that's how this whole discussion started.

(3) We should make sure all build tooling automation is in the repo itself
so that the build is 100% reproducible by anyone. I think most of it is
already in dev/ [1], but there might be Jenkins configs, etc., that could be
put into the Spark repo.

[1] https://github.com/apache/spark/tree/master/dev/create-release

- Patrick
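
For suggestion (1), the manual signing step could be as small as a loop the
RM runs over the built artifacts on their own machine. A minimal sketch,
assuming the RM's key id is in $RM_KEY_ID (names are placeholders):

    # Sign each artifact with the RM's own key and emit a checksum
    for f in spark-*.tgz; do
      gpg --armor --detach-sign --local-user "$RM_KEY_ID" "$f"
      sha512sum "$f" > "$f.sha512"
    done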

On Mon, Sep 18, 2017 at 6:23 PM, Patrick Wendell <patr...@databricks.com>
wrote:

> One thing we could do is modify the release tooling to allow the key to be
> injected each time, thus allowing any RM to insert their own key at build
> time.
>
> Patrick
>
> On Mon, Sep 18, 2017 at 4:56 PM Ryan Blue <rb...@netflix.com> wrote:
>
>> I don't understand why it is necessary to share a release key. If this is
>> something that can be automated in a Jenkins job, then can it be a script
>> with a reasonable set of build requirements for Mac and Ubuntu? That's the
>> approach I've seen the most in other projects.
>>
>> I'm also not just concerned about release managers. Having a key stored
>> persistently on outside infrastructure adds the most risk, as Luciano noted
>> as well. We should also start publishing checksums in the Spark VOTE
>> thread, which are currently missing. The risk I'm concerned about is that
>> if the key were compromised, it would be possible to replace binaries with
>> perfectly valid ones, at least on some mirrors. If the Apache copy were
>> replaced, then we wouldn't even be able to catch that it had happened.
>> Given the high profile of Spark and the number of companies that run it, I
>> think we need to take extra care to make sure that can't happen, even if it
>> is an annoyance for the release managers.
>>
>> On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell <patr...@databricks.com
>> > wrote:
>>
>>> Spark's release pipeline is automated, and part of that automation
>>> includes securely injecting this key for the purpose of signing. I asked
>>> the ASF to provide a service account key several years ago but they
>>> suggested that we use a key attributed to an individual even if the process
>>> is automated.
>>>
>>> I believe other projects that release with high frequency also have
>>> automated the signing process.
>>>
>>> This key is injected during the build process. A really ambitious
>>> release manager could reverse engineer this in a way that reveals the
>>> private key; however, someone who is a release manager can already do
>>> quite a few nefarious things anyway.
>>>
>>> It is true that we trust all previous release managers instead of only
>>> one. We could probably rotate the jenkins credentials periodically in order
>>> to compensate for this, if we think this is a nontrivial risk.
>>>
>>> - Patrick
>>>
>>> On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Would any of Patrick/Josh/Shane (or other PMC folks with
>>>> understanding/opinions on this setup) care to comment? If this is a
>>>> blocking issue I can cancel the current release vote thread while we
>>>> discuss this some more.
>>>>
>>>> On Fri, Sep 15, 2017 at 5:18 PM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> Oh yes and to keep people more informed I've been updating a PR for
>>>>> the release documentation as I go to write down some of this unwritten
>>>>> knowledge -- https://github.com/apache/spark-website/pull/66
>>>>>
>>>>>
>>>>> On Fri, Sep 15, 2017 at 5:12 PM Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> Also continuing the discussion from the vote threads, Shane probably
>>>>>> has the best idea on the ACLs for Jenkins so I've CC'd him as well.
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 15, 2017 at 5:09 P

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
One thing we could do is modify the release tooling to allow the key to be
injected each time, thus allowing any RM to insert their own key at build
time.

Patrick

On Mon, Sep 18, 2017 at 4:56 PM Ryan Blue <rb...@netflix.com> wrote:

> I don't understand why it is necessary to share a release key. If this is
> something that can be automated in a Jenkins job, then can it be a script
> with a reasonable set of build requirements for Mac and Ubuntu? That's the
> approach I've seen the most in other projects.
>
> I'm also not just concerned about release managers. Having a key stored
> persistently on outside infrastructure adds the most risk, as Luciano noted
> as well. We should also start publishing checksums in the Spark VOTE
> thread, which are currently missing. The risk I'm concerned about is that
> if the key were compromised, it would be possible to replace binaries with
> perfectly valid ones, at least on some mirrors. If the Apache copy were
> replaced, then we wouldn't even be able to catch that it had happened.
> Given the high profile of Spark and the number of companies that run it, I
> think we need to take extra care to make sure that can't happen, even if it
> is an annoyance for the release managers.
>
> On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell <patr...@databricks.com>
> wrote:
>
>> Spark's release pipeline is automated, and part of that automation includes
>> securely injecting this key for the purpose of signing. I asked the ASF to
>> provide a service account key several years ago but they suggested that we
>> use a key attributed to an individual even if the process is automated.
>>
>> I believe other projects that release with high frequency also have
>> automated the signing process.
>>
>> This key is injected during the build process. A really ambitious release
>> manager could reverse engineer this in a way that reveals the private key;
>> however, someone who is a release manager can already do quite a few
>> nefarious things anyway.
>>
>> It is true that we trust all previous release managers instead of only
>> one. We could probably rotate the jenkins credentials periodically in order
>> to compensate for this, if we think this is a nontrivial risk.
>>
>> - Patrick
>>
>> On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> Would any of Patrick/Josh/Shane (or other PMC folks with
>>> understanding/opinions on this setup) care to comment? If this is a
>>> blocking issue I can cancel the current release vote thread while we
>>> discuss this some more.
>>>
>>> On Fri, Sep 15, 2017 at 5:18 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Oh yes and to keep people more informed I've been updating a PR for the
>>>> release documentation as I go to write down some of this unwritten
>>>> knowledge -- https://github.com/apache/spark-website/pull/66
>>>>
>>>>
>>>> On Fri, Sep 15, 2017 at 5:12 PM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> Also continuing the discussion from the vote threads, Shane probably
>>>>> has the best idea on the ACLs for Jenkins so I've CC'd him as well.
>>>>>
>>>>>
>>>>> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> Changing the release jobs, beyond the available parameters, right now
>>>>>> depends on Josh Rosen, as there are some scripts which generate the jobs
>>>>>> which aren't public. I've done temporary fixes in the past with the 
>>>>>> Python
>>>>>> packaging but my understanding is that in the medium term it requires
>>>>>> access to the scripts.
>>>>>>
>>>>>> So +CC Josh.
>>>>>>
>>>>>> On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>>
>>>>>>> I think this needs to be fixed. It's true that there are barriers to
>>>>>>> publication, but the signature is what we use to authenticate Apache
>>>>>>> releases.
>>>>>>>
>>>>>>> If Patrick's key is available on Jenkins for any Spark committer to
>>>>>>> use, then the chances of a compromise are much higher than for a normal
>>>>>>> RM
>>>>>>> key.
>>>>>>>
>>>>>>> rb

Re: Signing releases with pwendell or release manager's key?

2017-09-17 Thread Patrick Wendell
Spark's release pipeline is automated, and part of that automation includes
securely injecting this key for the purpose of signing. I asked the ASF to
provide a service account key several years ago but they suggested that we
use a key attributed to an individual even if the process is automated.

I believe other projects that release with high frequency also have
automated the signing process.

This key is injected during the build process. A really ambitious release
manager could reverse engineer this in a way that reveals the private key;
however, someone who is a release manager can already do quite a few
nefarious things anyway.

It is true that we trust all previous release managers instead of only one.
We could probably rotate the jenkins credentials periodically in order to
compensate for this, if we think this is a nontrivial risk.

- Patrick
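
Sean's reply below notes the script appears to already take a GPG_KEY
parameter. A parameterized signing step might look roughly like this sketch
(variable names are assumptions; the key material would come from the job's
credential store):

    # Import the key material provided to the job, sign, then clean up
    echo "$GPG_KEY_MATERIAL" | gpg --batch --import
    gpg --batch --armor --detach-sign --local-user "$GPG_KEY" spark-bin.tgz
    # (deleting in batch mode may require the full key fingerprint)
    gpg --batch --yes --delete-secret-and-public-key "$GPG_KEY"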

On Sun, Sep 17, 2017 at 7:04 PM, Holden Karau  wrote:

> Would any of Patrick/Josh/Shane (or other PMC folks with
> understanding/opinions on this setup) care to comment? If this is a
> blocking issue I can cancel the current release vote thread while we
> discuss this some more.
>
> On Fri, Sep 15, 2017 at 5:18 PM Holden Karau  wrote:
>
>> Oh yes and to keep people more informed I've been updating a PR for the
>> release documentation as I go to write down some of this unwritten
>> knowledge -- https://github.com/apache/spark-website/pull/66
>>
>>
>> On Fri, Sep 15, 2017 at 5:12 PM Holden Karau 
>> wrote:
>>
>>> Also continuing the discussion from the vote threads, Shane probably has
>>> the best idea on the ACLs for Jenkins so I've CC'd him as well.
>>>
>>>
>>> On Fri, Sep 15, 2017 at 5:09 PM Holden Karau 
>>> wrote:
>>>
 Changing the release jobs, beyond the available parameters, right now
 depends on Josh Rosen, as there are some scripts which generate the jobs
 which aren't public. I've done temporary fixes in the past with the Python
 packaging but my understanding is that in the medium term it requires
 access to the scripts.

 So +CC Josh.

 On Fri, Sep 15, 2017 at 4:38 PM Ryan Blue  wrote:

> I think this needs to be fixed. It's true that there are barriers to
> publication, but the signature is what we use to authenticate Apache
> releases.
>
> If Patrick's key is available on Jenkins for any Spark committer to
> use, then the chances of a compromise are much higher than for a normal RM
> key.
>
> rb
>
> On Fri, Sep 15, 2017 at 12:34 PM, Sean Owen 
> wrote:
>
>> Yeah I had meant to ask about that in the past. While I presume
>> Patrick consents to this and all that, it does mean that anyone with 
>> access
>> to said Jenkins scripts can create a signed Spark release, regardless of
>> who they are.
>>
>> I haven't thought through whether that's a theoretical issue we can
>> ignore or something we need to fix up. For example you can't get a 
>> release
>> on the ASF mirrors without more authentication.
>>
>> How hard would it be to make the script take in a key? It sort of
>> looks like the script already takes GPG_KEY, but I don't know how to modify
>> the jobs. I suppose it would be ideal, in any event, for the actual 
>> release
>> manager to sign.
>>
>> On Fri, Sep 15, 2017 at 8:28 PM Holden Karau 
>> wrote:
>>
>>> That's a good question. I built the release candidate, but the
>>> Jenkins scripts don't take a parameter for configuring who signs them;
>>> they always sign with Patrick's key. You can see this from
>>> previous releases, which were managed by other folks but still signed by
>>> Patrick.
>>>
>>> On Fri, Sep 15, 2017 at 12:16 PM, Ryan Blue 
>>> wrote:
>>>
 The signature is valid, but why was the release signed with Patrick
 Wendell's private key? Did Patrick build the release candidate?

>>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
 --
 Twitter: https://twitter.com/holdenkarau

>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
> --
> Twitter: https://twitter.com/holdenkarau
>


Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Patrick Wendell
+1

On Wed, Dec 16, 2015 at 6:15 PM, Ted Yu  wrote:

> Ran test suite (minus docker-integration-tests)
> All passed
>
> +1
>
> [INFO] Spark Project External ZeroMQ .................. SUCCESS [ 13.647 s]
> [INFO] Spark Project External Kafka ................... SUCCESS [ 45.424 s]
> [INFO] Spark Project Examples ......................... SUCCESS [02:06 min]
> [INFO] Spark Project External Kafka Assembly .......... SUCCESS [ 11.280 s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:49 h
> [INFO] Finished at: 2015-12-16T17:06:58-08:00
>
> On Wed, Dec 16, 2015 at 4:37 PM, Andrew Or  wrote:
>
>> +1
>>
>> Mesos cluster mode regression in RC2 is now fixed (SPARK-12345
>>  / PR10332
>> ).
>>
>> Also tested on standalone client and cluster mode. No problems.
>>
>> 2015-12-16 15:16 GMT-08:00 Rad Gruchalski :
>>
>>> I also noticed that spark.replClassServer.host and
>>> spark.replClassServer.port aren’t used anymore. The transport now happens
>>> over the main RpcEnv.
>>>
>>> Kind regards,
>>> Radek Gruchalski
>>> ra...@gruchalski.com 
>>> de.linkedin.com/in/radgruchalski/
>>>
>>>
>>> *Confidentiality:*This communication is intended for the above-named
>>> person and may be confidential and/or legally privileged.
>>> If it has come to you in error you must take no action based on it, nor
>>> must you copy or show it to anyone; please delete/destroy and inform the
>>> sender immediately.
>>>
>>> On Wednesday, 16 December 2015 at 23:43, Marcelo Vanzin wrote:
>>>
>>> I was going to say that spark.executor.port is not used anymore in
>>> 1.6, but damn, there's still that akka backend hanging around there
>>> even when netty is being used... we should fix this, should be a
>>> simple one-liner.
>>>
>>> On Wed, Dec 16, 2015 at 2:35 PM, singinpirate 
>>> wrote:
>>>
>>> -0 (non-binding)
>>>
> >>> I have observed that when we set spark.executor.port in 1.6, we get thrown
> >>> an NPE in SparkEnv$.create(SparkEnv.scala:259). It used to work in 1.5.2. Is
>>> anyone else seeing this?
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>>
>>
>


Re: When to cut RCs

2015-12-02 Thread Patrick Wendell
In terms of advertising to people the status of the release and whether an
RC is likely to go out, the best mechanism I can think of is our current
mechanism of using JIRA and respecting the semantics of a blocker JIRA. We
could do a better job, though, of creating a JIRA dashboard for each release and
linking to it publicly so it's very clear to people what is going on. I
have always used one privately when managing previous releases, but no
reason we can't put one up on the website or wiki.

IMO a mailing list is not a great mechanism for the fine-grained work of
release management because of the sheer complexity and volume of finalizing
a Spark release. Being a release manager means tracking, over the course of
several weeks, typically dozens of distinct issues: trying to prioritize
them, getting more clarity from the reporters of those issues, possibly
reaching out to people on the phone or in person for more details, etc. You
want a mutable dashboard where you can convey the current status clearly.

What might be good in the early stages is a weekly e-mail to the dev@ list
just refreshing what is on the JIRA and letting people know how things are
looking. So someone just passing by has some idea of how things are going
and can chime in, etc.

Once an RC is cut then we do mostly rely on the mailing list for
discussion. At that point the number of known issues is small enough I
think to discuss in an all-to-all fashion.

- Patrick
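
The "JIRA dashboard" here is essentially a saved query over blockers targeted
at the release. As a sketch of what that boils down to, one could pull the
same list from JIRA's REST search API (the JQL string is an assumption about
how the blockers are tagged):

    # List unresolved blockers targeted at the 1.6.0 release
    curl -s -G "https://issues.apache.org/jira/rest/api/2/search" \
      --data-urlencode 'jql=project = SPARK AND priority = Blocker AND resolution = Unresolved AND "Target Version/s" = 1.6.0' \
      --data-urlencode 'fields=key,summary'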

On Wed, Dec 2, 2015 at 1:25 PM, Sean Owen  wrote:

> On Wed, Dec 2, 2015 at 9:06 PM, Michael Armbrust 
> wrote:
> > This can be debated, but I explicitly ignored test and documentation
> issues.
> > Since the docs are published separately and easy to update, I don't think
> > its worth further disturbing the release cadence for these JIRAs.
>
> It makes sense to not hold up an RC since they don't affect testing of
> functionality. Prior releases have ultimately gone out with doc issues
> still outstanding (and bugs) though. This doesn't seem to be on
> anyone's release checklist, and maybe part of it is because they're
> let slide for RCs.  Your suggestion to check-point release status
> below sounds spot on; I sort of tried to do that earlier.
>
>
> > Up until today various committers have told me that there were known
> issues
> > with branch-1.6 that would cause them to -1 the release.  Whenever this
> > happened, I asked them to ensure there was a properly targeted blocker
> JIRA
> > open so people could publicly track the status of the release.  As long
> as
> > such issues were open, I only published a preview since making an RC is
> > pretty high cost.
>
> Makes sense if these are all getting translated into Blockers and
> resolved before an RC. It's the simplest mechanism to communicate and
> track this in a distributed way.
>
> "No blockers" is a minimal criterion for release. It still seems funny
> to release with so many issues targeted for 1.6.0, including issues
> that aren't critical or bugs. Sure, that's just hygiene. But without
> it, do people take "Target Version" seriously? if they don't, is there
> any force guiding people to prioritize or decide what to (not) work
> on? I'm sure the communication happens, just doesn't seem like it's
> fully on JIRA, which is ultimately suboptimal.
>
>
> > I actually did spend quite a bit of time asking people to close various
> > umbrella issues, and I was pretty strict about watching JIRA throughout
> the
> > process.  Perhaps as an additional step, future preview releases or
> branch
> > cuts can include a link to an authoritative dashboard that we will use to
> > decide when we are ready to make an RC.  I'm also open to other
> suggestions.
>
> Yes, that's great. It takes the same effort from everyone. Having a
> green light on a dashboard at release time is only the symptom of
> decent planning. The effect I think it really needs to have occurs
> now: what's really probably on the menu for 1.7? and periodically
> track against that goal. Then the release process is with any luck
> just a formality with no surprises.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: A proposal for Spark 2.0

2015-11-10 Thread Patrick Wendell
I also feel the same as Reynold. I agree we should minimize API breaks and
focus on fixing things around the edges that were mistakes (e.g. exposing
Guava and Akka) rather than any overhaul that could fragment the community.
Ideally a major release is a lightweight process we can do every couple of
years, with minimal impact for users.

- Patrick

On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> > For this reason, I would *not* propose doing major releases to break
> substantial API's or perform large re-architecting that prevent users from
> upgrading. Spark has always had a culture of evolving architecture
> incrementally and making changes - and I don't think we want to change this
> model.
>
> +1 for this. The Python community went through a lot of turmoil over the
> Python 2 -> Python 3 transition because the upgrade process was too painful
> for too long. The Spark community will benefit greatly from our explicitly
> looking to avoid a similar situation.
>
> > 3. Assembly-free distribution of Spark: don’t require building an
> enormous assembly jar in order to run Spark.
>
> Could you elaborate a bit on this? I'm not sure what an assembly-free
> distribution means.
>
> Nick
>
> On Tue, Nov 10, 2015 at 6:11 PM Reynold Xin  wrote:
>
>> I’m starting a new thread since the other one got intermixed with feature
>> requests. Please refrain from making feature request in this thread. Not
>> that we shouldn’t be adding features, but we can always add features in
>> 1.7, 2.1, 2.2, ...
>>
>> First - I want to propose a premise for how to think about Spark 2.0 and
>> major releases in Spark, based on discussion with several members of the
>> community: a major release should be low overhead and minimally disruptive
>> to the Spark community. A major release should not be very different from a
>> minor release and should not be gated based on new features. The main
>> purpose of a major release is an opportunity to fix things that are broken
>> in the current API and remove certain deprecated APIs (examples follow).
>>
>> For this reason, I would *not* propose doing major releases to break
>> substantial API's or perform large re-architecting that prevent users from
>> upgrading. Spark has always had a culture of evolving architecture
>> incrementally and making changes - and I don't think we want to change this
>> model. In fact, we’ve released many architectural changes on the 1.X line.
>>
>> If the community likes the above model, then to me it seems reasonable to
>> do Spark 2.0 either after Spark 1.6 (in lieu of Spark 1.7) or immediately
>> after Spark 1.7. It will be 18 or 21 months since Spark 1.0. A cadence of
>> major releases every 2 years seems doable within the above model.
>>
>> Under this model, here is a list of example things I would propose doing
>> in Spark 2.0, separated into APIs and Operation/Deployment:
>>
>>
>> APIs
>>
>> 1. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in
>> Spark 1.x.
>>
>> 2. Remove Akka from Spark’s API dependency (in streaming), so user
>> applications can use Akka (SPARK-5293). We have gotten a lot of complaints
>> about user applications being unable to use Akka due to Spark’s dependency
>> on Akka.
>>
>> 3. Remove Guava from Spark’s public API (JavaRDD Optional).
>>
>> 4. Better class package structure for low level developer API’s. In
>> particular, we have some DeveloperApi (mostly various listener-related
>> classes) added over the years. Some packages include only one or two public
>> classes but a lot of private classes. A better structure is to have public
>> classes isolated to a few public packages, and these public packages should
>> have minimal private classes for low level developer APIs.
>>
>> 5. Consolidate task metric and accumulator API. Although having some
>> subtle differences, these two are very similar but have completely
>> different code path.
>>
>> 6. Possibly making Catalyst, Dataset, and DataFrame more general by
>> moving them to other package(s). They are already used beyond SQL, e.g. in
>> ML pipelines, and will be used by streaming also.
>>
>>
>> Operation/Deployment
>>
>> 1. Scala 2.11 as the default build. We should still support Scala 2.10,
>> but it has been end-of-life.
>>
>> 2. Remove Hadoop 1 support.
>>
>> 3. Assembly-free distribution of Spark: don’t require building an
>> enormous assembly jar in order to run Spark.
>>
>>


Re: State of the Build

2015-11-05 Thread Patrick Wendell
Hey Jakob,

The builds in Spark are largely maintained by me, Sean, and Michael
Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
SBT build. Maven is the build of reference for packaging Spark and is used
by many downstream packagers and to build all Spark releases. SBT is more
often used by developers. Both builds inherit from the same pom files (and
rely on the same profiles) to minimize maintenance complexity of Spark's
very complex dependency graph.

If you are looking to make contributions that help with the build, I am
happy to point you towards some things that are consistent maintenance
headaches. There are two major pain points right now that I'd be thrilled
to see fixes for:

1. SBT relies on a different dependency conflict resolution strategy than
maven - causing all kinds of headaches for us. I have heard that newer
versions of SBT can (maybe?) use Maven as a dependency resolver instead of
Ivy. This would make our life so much better if it were possible, either by
virtue of upgrading SBT or somehow doing this ourselves.

2. We don't have a great way of auditing the net effect of dependency
changes when people make them in the build. I am working on a fairly clunky
patch to do this here:

https://github.com/apache/spark/pull/8531

It could be done much more nicely using SBT, but only provided (1) is
solved.

Doing a major overhaul of the sbt build to decouple it from pom files, I'm
not sure that's the best place to start, given that we need to continue to
support maven - the coupling is intentional. But getting involved in the
build in general would be completely welcome.

- Patrick
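
For pain point (2), the crude version of such an audit is diffing the
resolved dependency list before and after a change. A minimal sketch using
the Maven dependency plugin (the branch name is a placeholder; this is not
the approach in the PR above):

    # Snapshot resolved dependencies on master, then on the change
    mvn -q dependency:list -DincludeScope=runtime | sort > deps-before.txt
    git checkout my-dependency-change
    mvn -q dependency:list -DincludeScope=runtime | sort > deps-after.txt

    # Added, removed, or version-bumped artifacts show up in the diff
    diff deps-before.txt deps-after.txt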

On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:

> Maven isn't 'legacy', or supported for the benefit of third parties.
> SBT had some behaviors / problems that Maven didn't relative to what
> Spark needs. SBT is a development-time alternative only, and partly
> generated from the Maven build.
>
> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
> > People who do upstream builds of spark (think bigtop and hadoop distros)
> are
> > used to legacy systems like maven, so maven is the default build. I don't
> > think it will change.
> >
> > Any improvements for the sbt build are of course welcome (it is still
> used
> > by many developers), but i would not do anything that increases the
> burden
> > of maintaining two build systems.
> >
> > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
> >>
> >> Hi everyone,
> >> in the process of learning Spark, I wanted to get an overview of the
> >> interaction between all of its sub-projects. I therefore decided to
> have a
> >> look at the build setup and its dependency management.
> >> Since I am a lot more comfortable using sbt than maven, I decided to try
> to
> >> port the maven configuration to sbt (with the help of automated tools).
> >> This led me to a couple of observations and questions on the build
> system
> >> design:
> >>
> >> First, currently, there are two build systems, maven and sbt. Is there a
> >> preferred tool (or future direction to one)?
> >>
> >> Second, the sbt build also uses maven "profiles" requiring the use of
> >> specific commandline parameters when starting sbt. Furthermore, since it
> >> relies on maven poms, dependencies to the scala binary version (_2.xx)
> are
> >> hardcoded and require running an external script when switching
> versions.
> >> Sbt could leverage built-in constructs to support cross-compilation and
> >> emulate profiles with configurations and new build targets. This would
> >> remove external state from the build (in that no extra steps need to be
> >> performed in a particular order to generate artifacts for a new
> >> configuration) and therefore improve stability and build reproducibility
> >> (maybe even build performance). I was wondering if implementing such
> >> functionality for the sbt build would be welcome?
> >>
> >> thanks,
> >> --Jakob
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: test failed due to OOME

2015-11-02 Thread Patrick Wendell
I believe this is a bug in our tests. For some reason we are using way
more memory than necessary. We'll probably need to log into Jenkins, take
heap dumps of some running tests, and figure out what is going on.
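
For reference, getting a heap dump from a running test JVM on a worker is
straightforward with the standard JDK tools (the surefire grep and the pid
are illustrative):

    # Find the test JVM, then dump its live heap for offline analysis
    jps -lv | grep -i surefire
    jmap -dump:live,format=b,file=/tmp/test-heap.hprof <pid>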

On Mon, Nov 2, 2015 at 7:42 AM, Ted Yu  wrote:

> Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins
> builds.
>
> I wonder if this is due to difference between machines running QA tests vs
> machines running Jenkins builds.
>
> On Fri, Oct 30, 2015 at 1:19 PM, Ted Yu  wrote:
>
>> I noticed that the SparkContext created in each sub-test is not stopped
>> upon finishing sub-test.
>>
>> Would stopping each SparkContext make a difference in terms of heap
>> memory consumption ?
>>
>> Cheers
>>
>> On Fri, Oct 30, 2015 at 12:04 PM, Mridul Muralidharan 
>> wrote:
>>
>>> It is giving OOM at 32GB ? Something looks wrong with that ... that is
>>> already on the higher side.
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Fri, Oct 30, 2015 at 11:28 AM, shane knapp 
>>> wrote:
>>> > here's the current heap settings on our workers:
>>> > InitialHeapSize == 2.1G
>>> > MaxHeapSize == 32G
>>> >
>>> > system ram:  128G
>>> >
>>> > we can bump it pretty easily...  it's just a matter of deciding if we
>>> > want to do this globally (super easy, but will affect ALL maven builds
>>> > on our system -- not just spark) or on a per-job basis (this doesn't
>>> > scale that well).
>>> >
>>> > thoughts?
>>> >
>>> > On Fri, Oct 30, 2015 at 9:47 AM, Ted Yu  wrote:
>>> >> This happened recently on Jenkins:
>>> >>
>>> >>
>>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=spark-test/3964/console
>>> >>
>>> >> On Sun, Oct 18, 2015 at 7:54 AM, Ted Yu  wrote:
>>> >>>
>>> >>> From
>>> >>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console
>>> >>> :
>>> >>>
>>> >>> SparkListenerSuite:
>>> >>> - basic creation and shutdown of LiveListenerBus
>>> >>> - bus.stop() waits for the event queue to completely drain
>>> >>> - basic creation of StageInfo
>>> >>> - basic creation of StageInfo with shuffle
>>> >>> - StageInfo with fewer tasks than partitions
>>> >>> - local metrics
>>> >>> - onTaskGettingResult() called when result fetched remotely ***
>>> FAILED ***
>>> >>>   org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task
>>> >>> 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in
>>> stage
>>> >>> 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: Java heap space
>>> >>>  at java.util.Arrays.copyOf(Arrays.java:2271)
>>> >>>  at
>>> java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>> >>>  at
>>> >>>
>>> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>> >>>  at
>>> java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>> >>>  at
>>> >>>
>>> java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1852)
>>> >>>  at java.io.ObjectOutputStream.write(ObjectOutputStream.java:708)
>>> >>>  at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:182)
>>> >>>  at
>>> >>>
>>> org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:52)
>>> >>>  at
>>> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
>>> >>>  at
>>> >>>
>>> org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:49)
>>> >>>  at
>>> >>>
>>> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
>>> >>>  at
>>> >>>
>>> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>>> >>>  at
>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>> >>>  at
>>> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>>> >>>  at
>>> >>>
>>> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>>> >>>  at
>>> >>>
>>> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>>> >>>  at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
>>> >>>  at
>>> >>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>  at
>>> >>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>  at java.lang.Thread.run(Thread.java:745)
>>> >>>
>>> >>>
>>> >>> Should more heap be given to test suite ?
>>> >>>
>>> >>>
>>> >>> Cheers
>>> >>
>>> >>
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>> >
>>>
>>
>>
>


Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-26 Thread Patrick Wendell
I verified that the issue with build binaries being present in the source
release is fixed. Haven't done enough vetting for a full vote, but did
verify that.
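
The check itself is mechanical; a minimal sketch of the kind of scan that
catches stray binaries in a source tarball (file names are illustrative):

    # Unpack the source release and look for stray jars or class files
    tar -xzf spark-1.5.2.tgz
    find spark-1.5.2 -type f \( -name '*.jar' -o -name '*.class' \)

    # More generally, flag anything the 'file' tool considers binary
    find spark-1.5.2 -type f -exec file --mime {} + | grep 'charset=binary'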

On Sun, Oct 25, 2015 at 12:07 AM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 1.5.2. The vote is open until Wed Oct 28, 2015 at 08:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.5.2
> [ ] -1 Do not release this package because ...
>
>
> The release fixes 51 known issues in Spark 1.5.1, listed here:
> http://s.apache.org/spark-1.5.2
>
> The tag to be voted on is v1.5.2-rc1:
> https://github.com/apache/spark/releases/tag/v1.5.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.2-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> - as version 1.5.2-rc1:
> https://repository.apache.org/content/repositories/orgapachespark-1151
> - as version 1.5.2:
> https://repository.apache.org/content/repositories/orgapachespark-1150
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.5.2-rc1-docs/
>
>
> ===========================================================
> How can I help test this release?
> ===========================================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ===========================================================
> What justifies a -1 vote for this release?
> ===========================================================
> -1 vote should occur for regressions from Spark 1.5.1. Bugs already
> present in 1.5.1 will not block this release.
>
> ===========================================================
> What should happen to JIRA tickets still targeting 1.5.2?
> ===========================================================
> Please target 1.5.3 or 1.6.0.
>
>
>


Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
This is what I'm looking at:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/



On Mon, Oct 19, 2015 at 12:58 PM, shane knapp <skn...@berkeley.edu> wrote:

> all we did was reboot -05 and -03...  i'm seeing a bunch of green
> builds.  could you provide me w/some specific failures so i can look
> in to them more closely?
>
> On Mon, Oct 19, 2015 at 12:27 PM, Patrick Wendell <pwend...@gmail.com>
> wrote:
> > Hey Shane,
> >
> > It also appears that every Spark build is failing right now. Could it be
> > related to your changes?
> >
> > - Patrick
> >
> > On Mon, Oct 19, 2015 at 11:13 AM, shane knapp <skn...@berkeley.edu>
> wrote:
> >>
> >> worker 05 is back up now...  looks like the machine OOMed and needed
> >> to be kicked.
> >>
> >> On Mon, Oct 19, 2015 at 9:39 AM, shane knapp <skn...@berkeley.edu>
> wrote:
> >> > i'll have to head down to the colo and see what's up with it...  it
> >> > seems to be wedged (pings ok, can't ssh in) and i'll update the list
> >> > when i figure out what's wrong.
> >> >
> >> > i don't think it caught fire (#toosoon?), because everything else is
> >> > up and running.  :)
> >> >
> >> > shane
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups
> >> "amp-infra" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
> an
> >> email to amp-infra+unsubscr...@googlegroups.com.
> >> For more options, visit https://groups.google.com/d/optout.
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "amp-infra" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to amp-infra+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "amp-infra" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to amp-infra+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
I think many of them are coming from the Spark 1.4 builds:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.4-Maven-pre-YARN/3900/console

On Mon, Oct 19, 2015 at 1:44 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> This is what I'm looking at:
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>
>
>
> On Mon, Oct 19, 2015 at 12:58 PM, shane knapp <skn...@berkeley.edu> wrote:
>
>> all we did was reboot -05 and -03...  i'm seeing a bunch of green
>> builds.  could you provide me w/some specific failures so i can look
>> in to them more closely?
>>
>> On Mon, Oct 19, 2015 at 12:27 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>> > Hey Shane,
>> >
>> > It also appears that every Spark build is failing right now. Could it be
>> > related to your changes?
>> >
>> > - Patrick
>> >
>> > On Mon, Oct 19, 2015 at 11:13 AM, shane knapp <skn...@berkeley.edu>
>> wrote:
>> >>
>> >> worker 05 is back up now...  looks like the machine OOMed and needed
>> >> to be kicked.
>> >>
>> >> On Mon, Oct 19, 2015 at 9:39 AM, shane knapp <skn...@berkeley.edu>
>> wrote:
>> >> > i'll have to head down to the colo and see what's up with it...  it
>> >> > seems to be wedged (pings ok, can't ssh in) and i'll update the list
>> >> > when i figure out what's wrong.
>> >> >
>> >> > i don't think it caught fire (#toosoon?), because everything else is
>> >> > up and running.  :)
>> >> >
>> >> > shane
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> Groups
>> >> "amp-infra" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> an
>> >> email to amp-infra+unsubscr...@googlegroups.com.
>> >> For more options, visit https://groups.google.com/d/optout.
>> >
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "amp-infra" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to amp-infra+unsubscr...@googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "amp-infra" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to amp-infra+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>


Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
Hey Shane,

It also appears that every Spark build is failing right now. Could it be
related to your changes?

- Patrick

On Mon, Oct 19, 2015 at 11:13 AM, shane knapp  wrote:

> worker 05 is back up now...  looks like the machine OOMed and needed
> to be kicked.
>
> On Mon, Oct 19, 2015 at 9:39 AM, shane knapp  wrote:
> > i'll have to head down to the colo and see what's up with it...  it
> > seems to be wedged (pings ok, can't ssh in) and i'll update the list
> > when i figure out what's wrong.
> >
> > i don't think it caught fire (#toosoon?), because everything else is
> > up and running.  :)
> >
> > shane
>
> --
> You received this message because you are subscribed to the Google Groups
> "amp-infra" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to amp-infra+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


Re: Is "mllib" no longer Experimental?

2015-10-14 Thread Patrick Wendell
I would tend to agree with this approach. We should audit all
@Experimental labels before the 1.6 release and clear them out when
appropriate.

- Patrick
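
An audit like this is easy to script; a quick sketch of how one might list
what remains (the paths assume the source tree layout of the time):

    # Locate and count remaining @Experimental annotations in MLlib
    git grep -n '@Experimental' -- mllib/src/main | head
    git grep -l '@Experimental' -- mllib/src/main | wc -l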

On Wed, Oct 14, 2015 at 2:13 AM, Sean Owen  wrote:

> Someone asked, is "ML pipelines" stable? I said, no, most of the key
> classes are still marked @Experimental, which matches my expectation that
> things may still be subject to change.
>
> But then, I see that MLlib classes, which are de facto not seeing much
> further work and no API change, are also mostly marked @Experimental. If,
> generally, no more significant work is going into MLlib classes, is it time
> to remove most or all of those labels, to keep it meaningful?
>
> Sean
>


Re: Status of SBT Build

2015-10-14 Thread Patrick Wendell
Hi Jakob,

There is a temporary issue with the Scala 2.11 build in SBT. The problem is
this wasn't previously covered by our automated tests so it broke without
us knowing - this has been actively discussed on the dev list in the last
24 hours. I am trying to get it working in our test harness today.

In terms of fixing the underlying issues, I am not sure whether there is a
JIRA for it yet, but we should make one if not. Does anyone know?

- Patrick

On Wed, Oct 14, 2015 at 12:13 PM, Jakob Odersky  wrote:

> Hi everyone,
>
> I've been having trouble building Spark with SBT recently. Scala 2.11
> doesn't work and in all cases I get large amounts of warnings and even
> errors on tests.
>
> I was therefore wondering what the official status of spark with sbt is?
> Is it very new and still buggy or unmaintained and "falling to pieces"?
>
> In any case, I would be glad to help with any issues on setting up a clean
> and working build with sbt.
>
> thanks,
> --Jakob
>


Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-12 Thread Patrick Wendell
It's really easy to create and modify those builds. If the issue is that we
need to add SBT or Maven to the existing one, it's a short change. We can
just have it build both of them. I wasn't aware of things breaking before
in one build but not another.

- Patrick
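
For reference, a compile-only check of both build systems under Scala 2.11
is short; a sketch assuming the dev/ version-switch script and the scala-2.11
profile of that era (names from memory, so treat them as assumptions):

    # Switch the poms to Scala 2.11, then compile with both builds
    ./dev/change-scala-version.sh 2.11
    build/mvn -Pscala-2.11 -DskipTests clean compile
    build/sbt -Pscala-2.11 compile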

On Mon, Oct 12, 2015 at 9:21 AM, Sean Owen <so...@cloudera.com> wrote:

> Yeah, was the issue that it had to be built vs Maven to show the error
> and this uses SBT -- or vice versa? That's why the existing test
> didn't detect it. Was just thinking of adding one more of these non-PR
> builds, but I forget if there was a reason this is hard. Certainly not
> worth building for each PR.
>
> On Mon, Oct 12, 2015 at 5:16 PM, Patrick Wendell <pwend...@gmail.com>
> wrote:
> > We already do automated compile testing for Scala 2.11 similar to Hadoop
> > versions:
> >
> > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/
> >
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-master-Scala211-Compile/buildTimeTrend
> >
> >
> > If you look, this build takes 7-10 minutes, so it's a nontrivial
> increase to
> > add it to all new PR's. Also, it's only broken once in the last few
> months
> > (despite many patches going in) - a pretty low failure rate. For
> scenarios
> > like this it's better to test it asynchronously. We can even just revert
> a
> > patch immediately if it's found to break 2.11.
> >
> > Put another way - we typically have 1000 patches or more per release. Even
> > at one Jenkins run per patch, 7-10 minutes * 1000 is roughly 5-7 days of
> > machine time. Compare that to having a few times where we have to
> > revert a patch and ask someone to resubmit (which maybe takes at most one
> > hour)... it's not worth it.
> >
> > - Patrick
> >
> > On Mon, Oct 12, 2015 at 8:24 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> There are many Jenkins jobs besides the pull request builder that
> >> build against various Hadoop combinations, for example, in the
> >> background. Is there an obstacle to building vs 2.11 on both Maven and
> >> SBT this way?
> >>
> >> On Mon, Oct 12, 2015 at 2:55 PM, Iulian Dragoș
> >> <iulian.dra...@typesafe.com> wrote:
> >> > Anything that can be done by a machine should be done by a machine. I
> am
> >> > not
> >> > sure we have enough data to say it's only once or twice per release,
> and
> >> > even if we were to issue a PR for each breakage, it's additional load
> on
> >> > committers and reviewers, not to mention our own work. I personally
> >> > don't
> >> > see how 2-3 minutes of compute time per PR can justify hours of work
> >> > plus
> >> > reviews.
> >
> >
>


Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
I think Daniel is correct here. The source artifact incorrectly includes
jars. It is inadvertent and not part of our intended release process. This
was something I noticed in Spark 1.5.0; I filed a JIRA, and it was fixed by
updating our build scripts. However, our build environment was not using the
most current version of the build scripts. See related links:

https://issues.apache.org/jira/browse/SPARK-10511
https://github.com/apache/spark/pull/8774/files

I can update our build environment and we can repackage the Spark 1.5.1
source tarball to not include binaries.

- Patrick

On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:

> Daniel: we did not vote on a tag. Please again read the VOTE email I
> linked to you:
>
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>
> among other things, it contains a link to the concrete source (and
> binary) distribution under vote:
>
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> You can still examine it, sure.
>
> Dependencies are *not* bundled in the source release. You're again
> misunderstanding what you are seeing. Read my email again.
>
> I am still pretty confused about what the problem is. This is entirely
> business as usual for ASF projects. I'll follow up with you offline if
> you have any more doubts.
>
> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
> wrote:
> > Here's my issue:
> >
> > How am I to audit that the dependencies you bundle are in fact what you
> > claim they are?  How do I know they don't contain malware or - in light
> > of recent events - emissions test rigging? ;)
> >
> > I am not interested in a git tag - that means nothing in the ASF voting
> > process, you cannot vote on a tag, only on a release candidate. The VCS
> > in use is irrelevant in this issue. If you can point me to a release
> > candidate archive that was voted upon and does not contain binary
> > applications, all is well.
> >
> > If there is no such thing, and we cannot come to an understanding, I
> > will exercise my ASF Members' rights and bring this to the attention of
> > the board of directors and ask for a clarification of the legality of
> this.
> >
> > I find it highly irregular. Perhaps it is something some projects do in
> > the Java community, but that doesn't make it permissible in my view.
> >
> > With regards,
> > Daniel.
> >
> >
> > On 10/11/2015 05:42 PM, Sean Owen wrote:
> >> Still confused. Why are you saying we didn't vote on an archive? refer
> >> to the email I linked, which includes both the git tag and a link to
> >> all generated artifacts (also in my email).
> >>
> >> So, there are two things at play here:
> >>
> >> First, I am not sure what you mean that a source distro can't have
> >> binary files. It's supposed to have the source code of Spark, and
> >> shouldn't contain binary Spark. Nothing you listed are Spark binaries.
> >> However, a distribution might have a lot of things in it that support
> >> the source build, like copies of tools, test files, etc.  That
> >> explains I think the first couple lines that you identified.
> >>
> >> Still, I am curious why you are saying that would invalidate a source
> >> release? I have never heard anything like that.
> >>
> >> Second, I do think there are some binaries in here that aren't
> >> supposed to be there, like the build/ directory stuff. IIRC these were
> >> included accidentally and won't be in the next release. At least, I
> >> don't see why they need to be bundled. These are just local copies of
> >> third party tools though, and don't really matter. As it happens, the
> >> licenses that get distributed with the source distro even cover all of
> >> this stuff. I think that's not supposed to be there, but, also don't
> >> see it's 'invalid' as a result.
> >>
> >>
> >> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno 
> wrote:
> >>> On 10/11/2015 05:29 PM, Sean Owen wrote:
>  Of course, but what's making you think this was a binary-only
>  distribution?
> >>>
> >>> I'm not saying binary-only, I am saying your source release contains
> >>> binary programs, which would invalidate a release vote. Is there a
> >>> release candidate package, that is voted on (saying you have a git tag
> >>> does not satisfy this criteria, you need to vote on an actual archive
> of
> >>> files, otherwise there is no cogent proof of the release being from
> that
> >>> specific git tag).
> >>>
> >>> Here's what I found in your source release:
> >>>
> >>> Binary application (application/jar; charset=binary) found in
> >>> spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> >>>
> >>> Binary application (application/jar; charset=binary) found in
> >>>
> spark-1.5.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
> >>>
> >>> Binary application (application/jar; charset=binary) found in
> >>> 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
*to not include binaries.

On Sun, Oct 11, 2015 at 9:35 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> I think Daniel is correct here. The source artifact incorrectly includes
> jars. It is inadvertent and not part of our intended release process. This
> was something I noticed in Spark 1.5.0 and filed a JIRA and was fixed by
> updating our build scripts to fix it. However, our build environment was
> not using the most current version of the build scripts. See related links:
>
> https://issues.apache.org/jira/browse/SPARK-10511
> https://github.com/apache/spark/pull/8774/files
>
> I can update our build environment and we can repackage the Spark 1.5.1
> source tarball. To not include sources.
>
> - Patrick
>
> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Daniel: we did not vote on a tag. Please again read the VOTE email I
>> linked to you:
>>
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>>
>> among other things, it contains a link to the concrete source (and
>> binary) distribution under vote:
>>
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>
>> You can still examine it, sure.
>>
>> Dependencies are *not* bundled in the source release. You're again
>> misunderstanding what you are seeing. Read my email again.
>>
>> I am still pretty confused about what the problem is. This is entirely
>> business as usual for ASF projects. I'll follow up with you offline if
>> you have any more doubts.
>>
>> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno <humbed...@apache.org>
>> wrote:
>> > Here's my issue:
>> >
>> > How am I to audit that the dependencies you bundle are in fact what you
>> > claim they are?  How do I know they don't contain malware or - in light
>> > of recent events - emissions test rigging? ;)
>> >
>> > I am not interested in a git tag - that means nothing in the ASF voting
>> > process, you cannot vote on a tag, only on a release candidate. The VCS
>> > in use is irrelevant in this issue. If you can point me to a release
>> > candidate archive that was voted upon and does not contain binary
>> > applications, all is well.
>> >
>> > If there is no such thing, and we cannot come to an understanding, I
>> > will exercise my ASF Members' rights and bring this to the attention of
>> > the board of directors and ask for a clarification of the legality of
>> this.
>> >
>> > I find it highly irregular. Perhaps it is something some projects do in
>> > the Java community, but that doesn't make it permissible in my view.
>> >
>> > With regards,
>> > Daniel.
>> >
>> >
>> > On 10/11/2015 05:42 PM, Sean Owen wrote:
>> >> Still confused. Why are you saying we didn't vote on an archive? Refer
>> >> to the email I linked, which includes both the git tag and a link to
>> >> all generated artifacts (also in my email).
>> >>
>> >> So, there are two things at play here:
>> >>
>> >> First, I am not sure what you mean that a source distro can't have
>> >> binary files. It's supposed to have the source code of Spark, and
>> >> shouldn't contain binary Spark. Nothing you listed is a Spark binary.
>> >> However, a distribution might have a lot of things in it that support
>> >> the source build, like copies of tools, test files, etc.  That
>> >> explains I think the first couple lines that you identified.
>> >>
>> >> Still, I am curious why you are saying that would invalidate a source
>> >> release? I have never heard anything like that.
>> >>
>> >> Second, I do think there are some binaries in here that aren't
>> >> supposed to be there, like the build/ directory stuff. IIRC these were
>> >> included accidentally and won't be in the next release. At least, I
>> >> don't see why they need to be bundled. These are just local copies of
>> >> third party tools though, and don't really matter. As it happens, the
>> >> licenses that get distributed with the source distro even cover all of
>> >> this stuff. I think that's not supposed to be there, but, also don't
>> >> see it's 'invalid' as a result.
>> >>
>> >>
>> >> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno <humbed...@apache.org> wrote:
>> >>> On 10/11/2015 05:29 PM, Sean Owen wrote:

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
Oh I see - yes it's the build/. I always thought release votes related to a
source tag rather than specific binaries. But maybe we can just fix it in
1.5.2 if there is concern about mutating binaries. It seems reasonable to
me.

For tests... in the past we've tried to avoid having jars inside of the
source tree, including some effort to generate jars on the fly which a lot
of our tests use. I am not sure whether it's a firm policy that you can't
have jars in test folders, though. If it is, we could probably do some
magic to get rid of these few ones that have crept in.

- Patrick
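The "generate jars on the fly" approach mentioned above can be quite small.
Below is an illustrative Scala sketch, not Spark's actual test utilities:
the jar is written to a temp location at test time, so no .jar file ever
needs to live in the source tree.

    import java.io.{File, FileOutputStream}
    import java.util.jar.{JarEntry, JarOutputStream, Manifest}

    // Write the given entries (name -> bytes) into a fresh jar file.
    def createTestJar(out: File, entries: Map[String, Array[Byte]]): File = {
      val jar = new JarOutputStream(new FileOutputStream(out), new Manifest())
      try {
        for ((name, bytes) <- entries) {
          jar.putNextEntry(new JarEntry(name))
          jar.write(bytes)
          jar.closeEntry()
        }
      } finally jar.close()
      out
    }

    // Usage in a test: nothing is checked in; the jar exists only at runtime.
    val testJar = createTestJar(
      File.createTempFile("test", ".jar"),
      Map("data/hello.txt" -> "hello".getBytes("UTF-8")))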

On Sun, Oct 11, 2015 at 9:57 PM, Sean Owen <so...@cloudera.com> wrote:

> Agree, but we are talking about the build/ bit right?
>
> I don't agree that it invalidates the release, which is probably the more
> important idea. As a point of process, you would not want to modify and
> republish the artifact that was already released after being voted on -
> unless it was invalid in which case we spin up 1.5.1.1 or something.
>
> But that build/ directory should go in future releases.
>
> I think he is talking about more than this though and the other jars look
> like they are part of tests, and still nothing to do with Spark binaries.
> Those can and should stay.
>
> On Mon, Oct 12, 2015, 5:35 AM Patrick Wendell <pwend...@gmail.com> wrote:
>
>> I think Daniel is correct here. The source artifact incorrectly includes
>> jars. It is inadvertent and not part of our intended release process. This
>> was something I noticed in Spark 1.5.0 and filed a JIRA and was fixed by
>> updating our build scripts to fix it. However, our build environment was
>> not using the most current version of the build scripts. See related links:
>>
>> https://issues.apache.org/jira/browse/SPARK-10511
>> https://github.com/apache/spark/pull/8774/files
>>
>> I can update our build environment and we can repackage the Spark 1.5.1
>> source tarball to not include binaries.
>>
>>
>> - Patrick
>>
>> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Daniel: we did not vote on a tag. Please again read the VOTE email I
>>> linked to you:
>>>
>>>
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>>>
>>> among other things, it contains a link to the concrete source (and
>>> binary) distribution under vote:
>>>
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>>
>>> You can still examine it, sure.
>>>
>>> Dependencies are *not* bundled in the source release. You're again
>>> misunderstanding what you are seeing. Read my email again.
>>>
>>> I am still pretty confused about what the problem is. This is entirely
>>> business as usual for ASF projects. I'll follow up with you offline if
>>> you have any more doubts.
>>>
>>> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno <humbed...@apache.org>
>>> wrote:
>>> > Here's my issue:
>>> >
>>> > How am I to audit that the dependencies you bundle are in fact what you
>>> > claim they are?  How do I know they don't contain malware or - in light
>>> > of recent events - emissions test rigging? ;)
>>> >
>>> > I am not interested in a git tag - that means nothing in the ASF voting
>>> > process, you cannot vote on a tag, only on a release candidate. The VCS
>>> > in use is irrelevant in this issue. If you can point me to a release
>>> > candidate archive that was voted upon and does not contain binary
>>> > applications, all is well.
>>> >
>>> > If there is no such thing, and we cannot come to an understanding, I
>>> > will exercise my ASF Members' rights and bring this to the attention of
>>> > the board of directors and ask for a clarification of the legality of
>>> this.
>>> >
>>> > I find it highly irregular. Perhaps it is something some projects do in
>>> > the Java community, but that doesn't make it permissible in my view.
>>> >
>>> > With regards,
>>> > Daniel.
>>> >
>>> >
>>> > On 10/11/2015 05:42 PM, Sean Owen wrote:
>>> >> Still confused. Why are you saying we didn't vote on an archive? Refer
>>> >> to the email I linked, which includes both the git tag and a link to
>>> >> all generated artifacts (also in my email).
>>> >>
>>> >> So, there are two things at play here:

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
Yeah I mean I definitely think we're not violating the *spirit* of the "no
binaries" policy, in that we do not include any binary code that is used at
runtime. This is because the binaries we distribute relate only to build
and testing.

Whether we are violating the *letter* of the policy, I'm not so sure. In
the very strictest interpretation of "there cannot be any binary files in
your downloaded tarball" - we aren't honoring that. We got a lot of people
complaining about the sbt jar for instance when we were in the incubator. I
found those complaints a little pedantic, but we ended up removing it from
our source tree and adding things to download it for the user.

- Patrick
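The "download it for the user" pattern mentioned above amounts to fetching
the tool at build time instead of committing its jar. A minimal sketch,
assuming a Maven Central URL and an illustrative sbt-launch version (the
real logic lives in Spark's build/ wrapper scripts):

    import java.net.URI
    import java.nio.file.{Files, Paths, StandardCopyOption}

    object FetchSbtLaunch {
      def main(args: Array[String]): Unit = {
        val version = "0.13.7" // illustrative, not tied to any Spark release
        val target  = Paths.get("build", s"sbt-launch-$version.jar")
        if (!Files.exists(target)) { // fetch once, then reuse the cached copy
          Files.createDirectories(target.getParent)
          val url = new URI("https://repo1.maven.org/maven2/org/scala-sbt/" +
            s"sbt-launch/$version/sbt-launch-$version.jar").toURL
          val in = url.openStream()
          try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
          finally in.close()
        }
      }
    }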

On Sun, Oct 11, 2015 at 10:12 PM, Sean Owen <so...@cloudera.com> wrote:

> No we are voting on the artifacts being released (too) in principle.
> Although of course the artifacts should be a deterministic function of the
> source at a certain point in time.
>
> I think the concern is about putting Spark binaries or its dependencies
> into a source release. That should not happen, but it is not what has
> happened here.
>
> On Mon, Oct 12, 2015, 6:03 AM Patrick Wendell <pwend...@gmail.com> wrote:
>
>> Oh I see - yes it's the build/. I always thought release votes related to
>> a source tag rather than specific binaries. But maybe we can just fix it in
>> 1.5.2 if there is concern about mutating binaries. It seems reasonable to
>> me.
>>
>> For tests... in the past we've tried to avoid having jars inside of the
>> source tree, including some effort to generate jars on the fly which a lot
>> of our tests use. I am not sure whether it's a firm policy that you can't
>> have jars in test folders, though. If it is, we could probably do some
>> magic to get rid of these few ones that have crept in.
>>
>> - Patrick
>>
>> On Sun, Oct 11, 2015 at 9:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Agree, but we are talking about the build/ bit right?
>>>
>>> I don't agree that it invalidates the release, which is probably the
>>> more important idea. As a point of process, you would not want to modify
>>> and republish the artifact that was already released after being voted on -
>>> unless it was invalid in which case we spin up 1.5.1.1 or something.
>>>
>>> But that build/ directory should go in future releases.
>>>
>>> I think he is talking about more than this though and the other jars
>>> look like they are part of tests, and still nothing to do with Spark
>>> binaries. Those can and should stay.
>>>
>>> On Mon, Oct 12, 2015, 5:35 AM Patrick Wendell <pwend...@gmail.com>
>>> wrote:
>>>
>>>> I think Daniel is correct here. The source artifact incorrectly
>>>> includes jars. It is inadvertent and not part of our intended release
>>>> process. This was something I noticed in Spark 1.5.0 and filed a JIRA and
>>>> was fixed by updating our build scripts to fix it. However, our build
>>>> environment was not using the most current version of the build scripts.
>>>> See related links:
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-10511
>>>> https://github.com/apache/spark/pull/8774/files
>>>>
>>>> I can update our build environment and we can repackage the Spark 1.5.1
>>>> source tarball to not include binaries.
>>>>
>>>>
>>>> - Patrick
>>>>
>>>> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> Daniel: we did not vote on a tag. Please again read the VOTE email I
>>>>> linked to you:
>>>>>
>>>>>
>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>>>>>
>>>>> among other things, it contains a link to the concrete source (and
>>>>> binary) distribution under vote:
>>>>>
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>>>>
>>>>> You can still examine it, sure.
>>>>>
>>>>> Dependencies are *not* bundled in the source release. You're again
>>>>> misunderstanding what you are seeing. Read my email again.
>>>>>
>>>>> I am still pretty confused about what the problem is. This is entirely
>>>>> business as usual for ASF projects. I'll follow up with you offline if
>>>>> you have any more doubts.
>>>>>
>

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-09 Thread Patrick Wendell
I would push back slightly. The reason we have the PR builds taking so long
is death by a million small things that we add. Doing a full 2.11 compile
is on the order of minutes... it's a nontrivial increase to the build times.

It doesn't seem that bad to me to go back post-hoc once in a while and fix
2.11 bugs when they come up. It's on the order of once or twice per release
and the Typesafe guys keep a close eye on it (thanks!). Compare that to
literally thousands of PR runs and a few minutes every time, IMO it's not
worth it.

On Fri, Oct 9, 2015 at 3:31 PM, Hari Shreedharan wrote:

> +1, much better than having a new PR each time to fix something for
> scala-2.11 every time a patch breaks it.
>
> Thanks,
> Hari Shreedharan
>
>
>
>
> On Oct 9, 2015, at 11:47 AM, Michael Armbrust wrote:
>
> How about just fixing the warning? I get it; it doesn't stop this from
>> happening again, but still seems less drastic than tossing out the
>> whole mechanism.
>>
>
> +1
>
> It also does not seem that expensive to test only compilation for Scala
> 2.11 on PR builds.
>
>
>


Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-07 Thread Patrick Wendell
I don't think we have a firm contract around that. So far we've never
removed old artifacts, but the ASF has asked us at times to decrease the
size of binaries we post. In the future at some point we may drop older
ones since we keep adding new ones.

If downstream projects are depending on our artifacts, I'd say just hold
tight for now until something changes. If it changes, then those projects
might need to build Spark on their own and host older hadoop versions, etc.

On Wed, Oct 7, 2015 at 9:59 AM, Nicholas Chammas <nicholas.cham...@gmail.com
> wrote:

> Thanks guys.
>
> Regarding this earlier question:
>
> More importantly, is there some rough specification for what packages we
> should be able to expect in this S3 bucket with every release?
>
> Is the implied answer that we should continue to expect the same set of
> artifacts for every release for the foreseeable future?
>
> Nick
>
> On Tue, Oct 6, 2015 at 1:13 AM Patrick Wendell <pwend...@gmail.com> wrote:
>
>> The missing artifacts are uploaded now. Things should propagate in the
>> next 24 hours. If there are still issues past then ping this thread. Thanks!
>>
>> - Patrick
>>
>> On Mon, Oct 5, 2015 at 2:41 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Thanks for looking into this Josh.
>>>
>>> On Mon, Oct 5, 2015 at 5:39 PM Josh Rosen <joshro...@databricks.com>
>>> wrote:
>>>
>>>> I'm working on a fix for this right now. I'm planning to re-run a
>>>> modified copy of the release packaging scripts which will emit only the
>>>> missing artifacts (so we won't upload new artifacts with different SHAs for
>>>> the builds which *did* succeed).
>>>>
>>>> I expect to have this finished in the next day or so; I'm currently
>>>> blocked by some infra downtime but expect that to be resolved soon.
>>>>
>>>> - Josh
>>>>
>>>> On Mon, Oct 5, 2015 at 8:46 AM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> Blaž said:
>>>>>
>>>>> Also missing is
>>>>> http://s3.amazonaws.com/spark-related-packages/spark-1.5.1-bin-hadoop1.tgz
>>>>> which breaks spark-ec2 script.
>>>>>
>>>>> This is the package I am referring to in my original email.
>>>>>
>>>>> Nick said:
>>>>>
>>>>> It appears that almost every version of Spark up to and including
>>>>> 1.5.0 has included a -bin-hadoop1.tgz release (e.g.
>>>>> spark-1.5.0-bin-hadoop1.tgz). However, 1.5.1 has no such package.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Mon, Oct 5, 2015 at 3:27 AM Blaž Šnuderl <snud...@gmail.com> wrote:
>>>>>
>>>>>> Also missing is http://s3.amazonaws.com/spark-related-packages/spark-
>>>>>> 1.5.1-bin-hadoop1.tgz which breaks spark-ec2 script.
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 5:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> hadoop1 package for Scala 2.10 wasn't in RC1 either:
>>>>>>>
>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>>>>>>
>>>>>>> On Sun, Oct 4, 2015 at 5:17 PM, Nicholas Chammas <
>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I’m looking here:
>>>>>>>>
>>>>>>>> https://s3.amazonaws.com/spark-related-packages/
>>>>>>>>
>>>>>>>> I believe this is where one set of official packages is published.
>>>>>>>> Please correct me if this is not the case.
>>>>>>>>
>>>>>>>> It appears that almost every version of Spark up to and including
>>>>>>>> 1.5.0 has included a --bin-hadoop1.tgz release (e.g.
>>>>>>>> spark-1.5.0-bin-hadoop1.tgz).
>>>>>>>>
>>>>>>>> However, 1.5.1 has no such package. There is a
>>>>>>>> spark-1.5.1-bin-hadoop1-scala2.11.tgz package, but this is a
>>>>>>>> separate thing. (1.5.0 also has a hadoop1-scala2.11 package.)
>>>>>>>>
>>>>>>>> Was this intentional?
>>>>>>>>
>>>>>>>> More importantly, is there some rough specification for what
>>>>>>>> packages we should be able to expect in this S3 bucket with every 
>>>>>>>> release?
>>>>>>>>
>>>>>>>> This is important for those of us who depend on this publishing
>>>>>>>> venue (e.g. spark-ec2 and related tools).
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>


Re: Adding Spark Testing functionality

2015-10-06 Thread Patrick Wendell
Hey Holden,

It would be helpful if you could outline the set of features you'd imagine
being part of Spark in a short doc. I didn't see a README on the existing
repo, so it's hard to know exactly what is being proposed.

As a general point of process, we've typically avoided merging modules into
Spark that can exist outside of the project. A testing utility package that
is based on Spark's public API's seems like a really useful thing for the
community, but it does seem like a good fit for a package library. At
least, this is my first question after taking a look at the project.

In any case, getting some high level view of the functionality you imagine
would be helpful to give more detailed feedback.

- Patrick
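For a sense of the functionality in question, the heart of such a testing
package is usually a shared-fixture trait along the following lines. This
is a minimal sketch against Spark's public API and ScalaTest, not
spark-testing-base's actual interface.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, Suite}

    // Gives each suite a local SparkContext and tears it down afterwards.
    trait SharedLocalSparkContext extends BeforeAndAfterAll { self: Suite =>
      @transient private var _sc: SparkContext = _
      def sc: SparkContext = _sc

      override def beforeAll(): Unit = {
        super.beforeAll()
        _sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName(suiteName))
      }

      override def afterAll(): Unit = {
        try if (_sc != null) _sc.stop()
        finally super.afterAll()
      }
    }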

On Tue, Oct 6, 2015 at 3:12 PM, Holden Karau  wrote:

> Hi Spark Devs,
>
> So this has been brought up a few times before, and generally on the user
> list people get directed to use spark-testing-base. I'd like to start
> moving some of spark-testing-base's functionality into Spark so that people
> don't need a library to do what is (hopefully :p) a very common requirement
> across all Spark projects.
>
> To that end I was wondering what people's thoughts are on where this should
> live inside of Spark. I was thinking it could either be a separate testing
> project (like sql or similar), or just put the bits to enable testing
> inside of each relevant project.
>
> I was also thinking it probably makes sense to only move the unit testing
> parts at the start and leave things like integration testing in a testing
> project since that could vary depending on the user's environment.
>
> What are people's thoughts?
>
> Cheers,
>
> Holden :)
>


Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-05 Thread Patrick Wendell
The missing artifacts are uploaded now. Things should propagate in the next
24 hours. If there are still issues past then ping this thread. Thanks!

- Patrick

On Mon, Oct 5, 2015 at 2:41 PM, Nicholas Chammas  wrote:

> Thanks for looking into this Josh.
>
> On Mon, Oct 5, 2015 at 5:39 PM Josh Rosen 
> wrote:
>
>> I'm working on a fix for this right now. I'm planning to re-run a
>> modified copy of the release packaging scripts which will emit only the
>> missing artifacts (so we won't upload new artifacts with different SHAs for
>> the builds which *did* succeed).
>>
>> I expect to have this finished in the next day or so; I'm currently
>> blocked by some infra downtime but expect that to be resolved soon.
>>
>> - Josh
>>
>> On Mon, Oct 5, 2015 at 8:46 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Blaž said:
>>>
>>> Also missing is
>>> http://s3.amazonaws.com/spark-related-packages/spark-1.5.1-bin-hadoop1.tgz
>>> which breaks spark-ec2 script.
>>>
>>> This is the package I am referring to in my original email.
>>>
>>> Nick said:
>>>
>>> It appears that almost every version of Spark up to and including 1.5.0
>>> has included a -bin-hadoop1.tgz release (e.g. spark-1.5.0-bin-hadoop1.tgz).
>>> However, 1.5.1 has no such package.
>>>
>>> Nick
>>>
>>> On Mon, Oct 5, 2015 at 3:27 AM Blaž Šnuderl  wrote:
>>>
 Also missing is http://s3.amazonaws.com/spark-related-packages/spark-
 1.5.1-bin-hadoop1.tgz which breaks spark-ec2 script.

 On Mon, Oct 5, 2015 at 5:20 AM, Ted Yu  wrote:

> hadoop1 package for Scala 2.10 wasn't in RC1 either:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> On Sun, Oct 4, 2015 at 5:17 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> I’m looking here:
>>
>> https://s3.amazonaws.com/spark-related-packages/
>>
>> I believe this is where one set of official packages is published.
>> Please correct me if this is not the case.
>>
>> It appears that almost every version of Spark up to and including
>> 1.5.0 has included a --bin-hadoop1.tgz release (e.g.
>> spark-1.5.0-bin-hadoop1.tgz).
>>
>> However, 1.5.1 has no such package. There is a
>> spark-1.5.1-bin-hadoop1-scala2.11.tgz package, but this is a
>> separate thing. (1.5.0 also has a hadoop1-scala2.11 package.)
>>
>> Was this intentional?
>>
>> More importantly, is there some rough specification for what packages
>> we should be able to expect in this S3 bucket with every release?
>>
>> This is important for those of us who depend on this publishing venue
>> (e.g. spark-ec2 and related tools).
>>
>> Nick
>>
>
>

>>


Re: Spark 1.6 Release window is not updated in Spark-wiki

2015-10-01 Thread Patrick Wendell
BTW - the merge window for 1.6 is September+October. The QA window is
November and we'll expect to ship probably early December. We are on a
3 month release cadence, with the caveat that there is some
pipelining... as we finish release X we are already starting on
release X+1.

- Patrick

On Thu, Oct 1, 2015 at 11:30 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> Ah - I can update it. Usually I do it after the release is cut. It's
> just a standard 3 month cadence.
>
> On Thu, Oct 1, 2015 at 3:55 AM, Sean Owen <so...@cloudera.com> wrote:
>> My guess is that the 1.6 merge window should close at the end of
>> November (2 months from now)? I can update it but wanted to check if
>> anyone else has a preferred tentative plan.
>>
>> On Thu, Oct 1, 2015 at 2:20 AM, Meethu Mathew <meethu.mat...@flytxt.com> wrote:
>>> Hi,
>>> In the https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage the
>>> current release window has not been changed from 1.5. Can anybody give an
>>> idea of the expected dates for 1.6 version?
>>>
>>> Regards,
>>>
>>> Meethu Mathew
>>> Senior Engineer
>>> Flytxt
>>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>




Re: Spark 1.6 Release window is not updated in Spark-wiki

2015-10-01 Thread Patrick Wendell
Ah - I can update it. Usually I do it after the release is cut. It's
just a standard 3 month cadence.

On Thu, Oct 1, 2015 at 3:55 AM, Sean Owen  wrote:
> My guess is that the 1.6 merge window should close at the end of
> November (2 months from now)? I can update it but wanted to check if
> anyone else has a preferred tentative plan.
>
> On Thu, Oct 1, 2015 at 2:20 AM, Meethu Mathew wrote:
>> Hi,
>> In the https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage the
>> current release window has not been changed from 1.5. Can anybody give an
>> idea of the expected dates for 1.6 version?
>>
>> Regards,
>>
>> Meethu Mathew
>> Senior Engineer
>> Flytxt
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>




Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-24 Thread Patrick Wendell
Hey Richard,

My assessment (just looked before I saw Sean's email) is the same as
his. The NOTICE file embeds other projects' licenses. If those
licenses themselves have pointers to other files or dependencies, we
don't embed them. I think this is standard practice.

- Patrick

On Thu, Sep 24, 2015 at 10:00 AM, Sean Owen  wrote:
> Hi Richard, those are messages reproduced from other projects' NOTICE
> files, not created by Spark. They need to be reproduced in Spark's
> NOTICE file to comply with the license, but their text may or may not
> apply to Spark's distribution. The intent is that users would track
> this back to the source project if interested to investigate what the
> upstream notice is about.
>
> Requirements vary by license, but I do not believe there is additional
> requirement to reproduce these other files. Their license information
> is already indicated in accordance with the license terms.
>
> What licenses are you looking for in LICENSE that you believe should be there?
>
> Getting all this right is both difficult and important. I've made some
> efforts over time to strictly comply with the Apache take on
> licensing, which is at http://www.apache.org/legal/resolved.html  It's
> entirely possible there's still a mistake somewhere in here (possibly
> a new dependency, etc). Please point it out if you see such a thing.
>
> But so far what you describe is "working as intended", as far as I
> know, according to Apache.
>
>
> On Thu, Sep 24, 2015 at 5:52 PM, Richard Hillegas  wrote:
>> -1 (non-binding)
>>
>> I was able to build Spark cleanly from the source distribution using the
>> command in README.md:
>>
>> build/mvn -DskipTests clean package
>>
>> However, while I was waiting for the build to complete, I started going
>> through the NOTICE file. I was confused about where to find licenses for 3rd
>> party software bundled with Spark. About halfway through the NOTICE file,
>> starting with Java Collections Framework, there is a list of licenses of the
>> form
>>
>>    license/*.txt
>>
>> But there is no license subdirectory in the source distro. I couldn't find
>> the  *.txt license files for Java Collections Framework, Base64 Encoder, or
>> JZlib anywhere in the source distro. I couldn't find those files in license
>> subdirectories at the indicated home pages for those projects. (I did find
>> the license for JZLIB somewhere else, however:
>> http://www.jcraft.com/jzlib/LICENSE.txt.)
>>
>> In addition, I couldn't find licenses for those projects in the master
>> LICENSE file.
>>
>> Are users supposed to get licenses from the indicated 3rd party web sites?
>> Those online licenses could change. I would feel more comfortable if the ASF
>> were protected by our bundling the licenses inside our source distros.
>>
>> After looking for those three licenses, I stopped reading the NOTICE file.
>> Maybe I'm confused about how to read the NOTICE file. Where should users
>> expect to find the 3rd party licenses?
>>
>> Thanks,
>> -Rick
>>
>> Reynold Xin  wrote on 09/24/2015 12:27:25 AM:
>>
>>> From: Reynold Xin 
>>> To: "dev@spark.apache.org" 
>>> Date: 09/24/2015 12:28 AM
>>> Subject: [VOTE] Release Apache Spark 1.5.1 (RC1)
>>
>>
>>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 1.5.1. The vote is open until Sun, Sep 27, 2015 at 10:00 UTC
>>> and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.5.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> The release fixes 81 known issues in Spark 1.5.0, listed here:
>>> http://s.apache.org/spark-1.5.1
>>>
>>> The tag to be voted on is v1.5.1-rc1:
>>> https://github.com/apache/spark/commit/4df97937dbf68a9868de58408b9be0bf87dbbb94
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release (1.5.1) can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1148/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-docs/
>>>
>>> ===
>>> How can I help test this release?
>>> ===
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate,
>>> then reporting any regressions.
>>>
>>> ===
>>> What justifies a -1 vote for this release?
>>> ===
>>> -1 vote should occur for regressions from Spark 1.5.0. Bugs already
>>> present in 

Re: RFC: packaging Spark without assemblies

2015-09-23 Thread Patrick Wendell
I think it would be a big improvement to get rid of it. It's not how
jars are supposed to be packaged and it has caused problems in many
different context over the years.

For me a key step in moving away would be to fully audit/understand
all compatibility implications of removing it. If other people are
supportive of this plan I can offer to help spend some time thinking
about any potential corner cases, etc.

- Patrick

On Wed, Sep 23, 2015 at 3:13 PM, Marcelo Vanzin  wrote:
> Hey all,
>
> This is something that we've discussed several times internally, but
> never really had much time to look into; but as time passes by, it's
> increasingly becoming an issue for us and I'd like to throw some ideas
> around about how to fix it.
>
> So, without further ado:
> https://github.com/vanzin/spark/pull/2/files
>
> (You can comment there or click "View" to read the formatted document.
> I thought that would be easier than sharing on Google Drive or Box or
> something.)
>
> It would be great to get people's feedback, especially if there are
> strong reasons for the assemblies that I'm not aware of.
>
>
> --
> Marcelo
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>




Re: Why there is no snapshots for 1.5 branch?

2015-09-22 Thread Patrick Wendell
I just added snapshot builds for 1.5. They will take a few hours to
build, but once we get them working should publish every few hours.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging

- Patrick
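For anyone consuming these snapshots, pulling one into an sbt build looks
roughly like this; the repository URL is the ASF snapshot repo from this
thread, and the version string is illustrative:

    // build.sbt -- resolve nightly snapshots from the ASF snapshot repo
    resolvers += "Apache Snapshots" at
      "https://repository.apache.org/content/repositories/snapshots/"

    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "1.5.1-SNAPSHOT"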

On Mon, Sep 21, 2015 at 10:36 PM, Bin Wang  wrote:
> However, I found some scripts in dev/audit-release; can I use them?
>
> Bin Wang wrote on Tue, Sep 22, 2015 at 1:34 PM:
>>
>> No, I mean push Spark to my private repository. Spark doesn't have a
>> build.sbt as far as I can see.
>>
>>> Fengdong Yu wrote on Tue, Sep 22, 2015 at 1:29 PM:
>>>
>>> Do you mean you want to publish the artifact to your private repository?
>>>
>>> If so, please use 'sbt publish'
>>>
>>> and add the following to your build.sbt:
>>>
>>> publishTo := {
>>>   val nexus = "https://YOUR_PRIVATE_REPO_HOSTS/"
>>>   if (version.value.endsWith("SNAPSHOT"))
>>>     Some("snapshots" at nexus + "content/repositories/snapshots")
>>>   else
>>>     Some("releases" at nexus + "content/repositories/releases")
>>> }
>>>
>>>
>>>
>>> On Sep 22, 2015, at 13:26, Bin Wang  wrote:
>>>
>>> My project is using sbt (or Maven), which needs to download dependencies
>>> from a Maven repo. I have my own private Maven repo with Nexus but I don't
>>> know how to push my own build to it, can you give me a hint?
>>>
>>> Mark Hamstra wrote on Tue, Sep 22, 2015 at 1:25 PM:

 Yeah, whoever is maintaining the scripts and snapshot builds has fallen
 down on the job -- but there is nothing preventing you from checking out
 branch-1.5 and creating your own build, which is arguably a smarter thing
 to do anyway. If I'm going to use a non-release build, then I want the full
 git commit history of exactly what is in that build readily available, not
 just somewhat arbitrary JARs.

 On Mon, Sep 21, 2015 at 9:57 PM, Bin Wang  wrote:
>
> But I cannot find 1.5.1-SNAPSHOT either at
> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.10/
>
> Mark Hamstra wrote on Tue, Sep 22, 2015 at 12:55 PM:
>>
>> There is no 1.5.0-SNAPSHOT because 1.5.0 has already been released.
>> The current head of branch-1.5 is 1.5.1-SNAPSHOT -- soon to be 1.5.1
>> release candidates and then the 1.5.1 release.
>>
>> On Mon, Sep 21, 2015 at 9:51 PM, Bin Wang  wrote:
>>>
>>> I'd like to use some important bug fixes in 1.5 branch and I look for
>>> the apache maven host, but don't find any snapshot for 1.5 branch.
>>> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.10/1.5.0-SNAPSHOT/
>>>
>>> I can find 1.4.X and 1.6.0 versions, why there is no snapshot for
>>> 1.5.X?
>>
>>

>>>
>




[ANNOUNCE] New testing capabilities for pull requests

2015-08-30 Thread Patrick Wendell
Hi All,

For pull requests that modify the build, you can now test different
build permutations as part of the pull request builder. To trigger
these, you add a special phrase to the title of the pull request.
Current options are:

[test-maven] - run tests using maven and not sbt
[test-hadoop1.0] - test using older hadoop versions (can use 1.0, 2.0,
2.2, and 2.3).

The relevant source code is here:
https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L193
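The dispatch is driven entirely by the pull request title, roughly along
the lines of the sketch below. This is hypothetical Scala (the flag values
are made up); the actual parsing lives in the script linked above.

    // Map a magic phrase in the PR title to a build configuration flag.
    val knownPhrases = Map(
      "[test-maven]"     -> "build-tool=maven",
      "[test-hadoop1.0]" -> "hadoop-profile=hadoop1.0")

    def flagsFor(prTitle: String): Seq[String] =
      knownPhrases.collect {
        case (phrase, flag) if prTitle.contains(phrase) => flag
      }.toSeq

    // flagsFor("[SPARK-1234] Fix build [test-maven]")
    //   == Seq("build-tool=maven")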

This is useful because it allows up-front testing of build changes to
avoid breaks once a patch has already been merged.

I've documented this on the wiki:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools

- Patrick




Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Patrick Wendell
There is already code in place that restricts which tests run
depending on which code is modified. However, changes inside of
Spark's core currently require running all dependent tests. If you
have some ideas about how to improve that heuristic, it would be
great.
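The heuristic is essentially a map from path prefixes to test modules, plus
a fan-out rule for core. A sketch of the idea (module names and prefixes
here are illustrative, not the real mapping in Spark's dev/ scripts):

    // Choose test modules from the paths a patch touches.
    val moduleForPrefix: Seq[(String, String)] = Seq(
      "sql/"       -> "sql-tests",
      "mllib/"     -> "mllib-tests",
      "streaming/" -> "streaming-tests",
      "core/"      -> "all-tests") // core changes still fan out to everything

    def testsFor(changedFiles: Seq[String]): Set[String] =
      changedFiles.flatMap { f =>
        moduleForPrefix.collectFirst {
          case (prefix, tests) if f.startsWith(prefix) => tests
        }
      }.toSet

    // testsFor(Seq("sql/core/src/main/X.scala")) == Set("sql-tests")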

- Patrick

On Tue, Aug 25, 2015 at 1:33 PM, Marcelo Vanzin van...@cloudera.com wrote:
 Hello y'all,

 So I've been getting kinda annoyed with how many PR tests have been
 timing out. I took one of the logs from one of my PRs and started to
 do some crunching on the data from the output, and here's a list of
 the 5 slowest suites:

 307.14s HiveSparkSubmitSuite
 382.641s VersionsSuite
 398s CliSuite
 410.52s HashJoinCompatibilitySuite
 2508.61s HiveCompatibilitySuite

 Looking at those, I'm not surprised at all that we see so many
 timeouts. Is there any ongoing effort to trim down those tests
 (especially HiveCompatibilitySuite) or somehow restrict when they're
 run?

 Almost 1 hour to run a single test suite that affects a rather
 isolated part of the code base looks a little excessive to me.

 --
 Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Patrick Wendell
Hey All,

Was wondering if people would be willing to avoid merging build
changes until we have put the tests in better shape. The reason is
that build changes are the most likely to cause downstream issues with
the test matrix and it's very difficult to reverse engineer which
patches caused which problems when the tests are not in a stable
state. For instance, the updates to Hive 1.2.1 caused cascading
failures that have lasted several days now, and in the meantime a few
other build-related patches were also merged - as these pile up it
gets harder for us to have confidence those other patches didn't
introduce problems.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/

- Patrick




Re: How to help for 1.5 release?

2015-08-04 Thread Patrick Wendell
Hey Meihua,

If you are a user of Spark, one thing that is really helpful is to run
Spark 1.5 on your workload and report any issues, performance
regressions, etc.

- Patrick

On Mon, Aug 3, 2015 at 11:49 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
 I think you can start from here
 https://issues.apache.org/jira/browse/SPARK/fixforversion/12332078/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel

 Thanks
 Best Regards

 On Tue, Aug 4, 2015 at 12:02 PM, Meihua Wu rotationsymmetr...@gmail.com
 wrote:

 I think the team is preparing for the 1.5 release. Anything to help with
 the QA, testing etc?

 Thanks,

 MW






Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Patrick Wendell
Yeah the best bet is to use ./build/mvn --force (otherwise we'll still
use your system maven).

- Patrick

On Mon, Aug 3, 2015 at 1:26 PM, Sean Owen so...@cloudera.com wrote:
 That statement is true for Spark 1.4.x. But you've reminded me that I
 failed to update this doc for 1.5, to say Maven 3.3.3 is required.
 Patch coming up.

 On Mon, Aug 3, 2015 at 9:12 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. Reason I asked this is, in Building Spark documentation of
 1.4.1, I still see this.

 https://spark.apache.org/docs/latest/building-spark.html

 Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

 But I noticed the following warnings from the build of Spark version
 1.5.0-snapshot. So I was wondering if the changes you mentioned relate to
 newer versions of Spark or for 1.4.1 version as well.

 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion
 failed with message:
 Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.

 Guru Medasani
 gdm...@gmail.com

 On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:

 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.

 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:

 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.

 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?

 Guru Medasani
 gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.

 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.

 Sean

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-01 Thread Patrick Wendell
Hey All,

I got it up and running - it was a newly surfaced bug in the build scripts.

- Patrick

On Wed, Jul 29, 2015 at 6:05 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
 Hey Patrick,

 Any update on this front please?

 Thanks,
 Bharath

 On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Bharath,

 There was actually an incompatible change to the build process that
 broke several of the Jenkins builds. This should be patched up in the
 next day or two and nightly builds will resume.

 - Patrick

 On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar
 reachb...@gmail.com wrote:
  I noticed the last (1.5) build has a timestamp of 16th July. Have
  nightly
  builds been discontinued since then?
 
  Thanks,
  Bharath
 
  On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Hi All,
 
  This week I got around to setting up nightly builds for Spark on
  Jenkins. I'd like feedback on these and if it's going well I can merge
  the relevant automation scripts into Spark mainline and document it on
  the website. Right now I'm doing:
 
  1. SNAPSHOT's of Spark master and release branches published to ASF
  Maven snapshot repo:
 
 
 
  https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
 
  These are usable by adding this repository in your build and using a
  snapshot version (e.g. 1.3.2-SNAPSHOT).
 
  2. Nightly binary package builds and doc builds of master and release
  versions.
 
  http://people.apache.org/~pwendell/spark-nightly/
 
  These build 4 times per day and are tagged based on commits.
 
  If anyone has feedback on these please let me know.
 
  Thanks!
  - Patrick
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 






Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Patrick Wendell
Hey All,

I've mostly kept quiet since I am not very active in maintaining this
code anymore. However, it is a bit odd that the project is
split-brained with a lot of the code being on github and some in the
Spark repo.

If the consensus is to migrate everything to github, that seems okay
with me. I would vouch for having user continuity, for instance still
have a shim ec2/spark-ec2 script that could perhaps just download
and unpack the real script from github.

- Patrick

On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Yes - It is still in progress, but I have just not gotten time to get to
 this. I think getting the repo moved from mesos to amplab in the codebase by
 1.5 should be possible.

 Thanks
 Shivaram

 On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:

 PS is this still in progress? it feels like something that would be
 good to do before 1.5.0, if it's going to happen soon.

 On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Yeah I'll send a note to the mesos dev list just to make sure they are
  informed.
 
  Shivaram
 
  On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
 
  I agree it's worth informing Mesos devs and checking that there are no
  big objections. I presume Shivaram is plugged in enough to Mesos that
  there won't be any surprises there, and that the project would also
  agree with moving this Spark-specific bit out. they may also want to
  leave a pointer to the new location in the mesos repo of course.
 
  I don't think it is something that requires a formal vote. It's not a
  question of ownership -- neither Apache nor the project PMC owns the
  code. I don't think it's different from retiring or removing any other
  code.
 
 
 
 
 
  On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
   If I am not wrong, since the code was hosted within mesos project
   repo, I assume (atleast part of it) is owned by mesos project and so
   its PMC ?
  
   - Mridul
  
   On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
   shiva...@eecs.berkeley.edu wrote:
   There is technically no PMC for the spark-ec2 project (I guess we
   are
   kind
   of establishing one right now). I haven't heard anything from the
   Spark
   PMC
   on the dev list that might suggest a need for a vote so far. I will
   send
   another round of email notification to the dev list when we have a
   JIRA
   / PR
   that actually moves the scripts (right now the only thing that
   changed
   is
   the location of some scripts in mesos/ to amplab/).
  
   Thanks
   Shivaram
  
 
 






Re: Data source aliasing

2015-07-30 Thread Patrick Wendell
Yeah this could make sense - allowing data sources to register a short
name. What mechanism did you have in mind? To use the jar service loader?

The only issue is that there could be conflicts since many of these are
third party packages. If the same name were registered twice I'm not sure
what the best behavior would be. Ideally in my mind if the same shortname
were registered twice we'd force the user to use a fully qualified name and
say the short name is ambiguous.

Patrick
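The service-loader idea with the ambiguity rule described above could look
like the sketch below. The trait and lookup are hypothetical, written only
to illustrate the mechanism, and are not an actual Spark API.

    import java.util.ServiceLoader
    import scala.jdk.CollectionConverters._

    // Hypothetical registration point: implementing jars list their class
    // name in META-INF/services/DataSourceAlias.
    trait DataSourceAlias {
      def shortName: String      // e.g. "avro"
      def providerClass: String  // e.g. "com.databricks.spark.avro"
    }

    def resolve(name: String): String = {
      val matches = ServiceLoader.load(classOf[DataSourceAlias])
        .iterator().asScala.filter(_.shortName == name).toList
      matches match {
        case single :: Nil => single.providerClass
        case Nil           => name // fall back: treat as fully qualified name
        case _ => sys.error(
          s"Short name '$name' is ambiguous; use the fully qualified class name")
      }
    }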
On Jul 30, 2015 9:44 AM, Joseph Batchik josephbatc...@gmail.com wrote:

 Hi all,

 There are now starting to be a lot of data source packages for Spark. An
 annoyance I see is that I have to type in the full class name like:

 sqlContext.read.format("com.databricks.spark.avro").load(path).

 Spark internally has formats such as parquet and jdbc registered and
 it would be nice to be able just to type in avro, redshift, etc. as
 well. Would it be a good idea to use something like a service loader to
 allow data sources defined in other packages to register themselves with
 Spark? I think that this would make it easier for end users. I would be
 interested in adding this, please let me know what you guys think.

 - Joe





Re: ReceiverTrackerSuite failing in master build

2015-07-28 Thread Patrick Wendell
Thanks Ted for pointing this out. CC to Ryan and TD

On Tue, Jul 28, 2015 at 8:25 AM, Ted Yu yuzhih...@gmail.com wrote:
 Hi,
 I noticed that ReceiverTrackerSuite is failing in master Jenkins build for
 both hadoop profiles.

 The failure seems to start with:
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3104/

 FYI




Protocol for build breaks

2015-07-25 Thread Patrick Wendell
Hi All,

If there is a build break (i.e. a compile issue or consistently
failing test) that somehow makes it into master, the best protocol is:

1. Revert the offending patch.
2. File a JIRA and assign it to the committer of the offending patch.
The JIRA should contain links to broken builds.

It's not worth waiting any time to try and figure out how to fix it,
or blocking on tracking down the commit author. This is because every
hour that we have the PRB broken is a major cost in terms of developer
productivity.

- Patrick




Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-24 Thread Patrick Wendell
Hey Bharath,

There was actually an incompatible change to the build process that
broke several of the Jenkins builds. This should be patched up in the
next day or two and nightly builds will resume.

- Patrick

On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar
reachb...@gmail.com wrote:
 I noticed the last (1.5) build has a timestamp of 16th July. Have nightly
 builds been discontinued since then?

 Thanks,
 Bharath

 On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hi All,

 This week I got around to setting up nightly builds for Spark on
 Jenkins. I'd like feedback on these and if it's going well I can merge
 the relevant automation scripts into Spark mainline and document it on
 the website. Right now I'm doing:

 1. SNAPSHOT's of Spark master and release branches published to ASF
 Maven snapshot repo:


 https://repository.apache.org/content/repositories/snapshots/org/apache/spark/

 These are usable by adding this repository in your build and using a
 snapshot version (e.g. 1.3.2-SNAPSHOT).

 2. Nightly binary package builds and doc builds of master and release
 versions.

 http://people.apache.org/~pwendell/spark-nightly/

 These build 4 times per day and are tagged based on commits.

 If anyone has feedback on these please let me know.

 Thanks!
 - Patrick

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org






Policy around backporting bug fixes

2015-07-24 Thread Patrick Wendell
Hi All,

A few times I've been asked about backporting and when to backport and
not backport fix patches. Since I have managed this for many of the
past releases, I wanted to point out the way I have been thinking
about it. If we have some consensus I can put it on the wiki.

The trade off when backporting is you get to deliver the fix to people
running older versions (great!), but you risk introducing new or even
worse bugs in maintenance releases (bad!). The decision point is when
you have a bug fix and it's not clear whether it is worth backporting.

I think the following facets are important to consider:
(a) Backports are an extremely valuable service to the community and
should be considered for any bug fix.
(b) Introducing a new bug in a maintenance release must be avoided at
all costs. Over time it would erode confidence in our release process.
(c) Distributions or advanced users can always backport risky patches
on their own, if they see fit.

For me, the consequence of these is that we should backport in the
following situations:
- Both the bug and the fix are well understood and isolated. Code
being modified is well tested.
- The bug being addressed is high priority to the community.
- The backported fix does not vary widely from the master branch fix.

We tend to avoid backports in the converse situations:
- The bug or fix are not well understood. For instance, it relates to
interactions between complex components or third party libraries (e.g.
Hadoop libraries). The code is not well tested outside of the
immediate bug being fixed.
- The bug is not clearly a high priority for the community.
- The backported fix is widely different from the master branch fix.

These are clearly subjective criteria, but ones worth considering. I
am always happy to help advise people on specific patches if they want
a soundingboard to understand whether it makes sense to backport.

- Patrick




Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Patrick Wendell
I think we should just revert this patch on all affected branches. No
reason to leave the builds broken until a fix is in place.

- Patrick

On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen rosenvi...@gmail.com wrote:
 Yep, I emailed TD about it; I think that we may need to make a change to the
 pull request builder to fix this.  Pending that, we could just revert the
 commit that added this.

 On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu yuzhih...@gmail.com wrote:

 Hi,
 I noticed that KinesisStreamSuite fails for both hadoop profiles in master
 Jenkins builds.

 From
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console
 :

 KinesisStreamSuite:
 *** RUN ABORTED ***
   java.lang.AssertionError: assertion failed: Kinesis test not enabled,
 should not attempt to get AWS credentials
   at scala.Predef$.assert(Predef.scala:179)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:189)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient$lzycompute(KinesisTestUtils.scala:59)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient(KinesisTestUtils.scala:58)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:121)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:157)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:78)
   at
 org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:45)
   at
 org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
   at
 org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:33)


 FYI
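The aborted run reflects an environment-gated test pattern: suites must
check a flag before touching AWS, and the assertion fires when that
contract is broken. Roughly, as a sketch only (the env var name and helper
here are illustrative, not the actual KinesisTestUtils code):

    object KinesisTestGate {
      // Kinesis tests run only when explicitly enabled in the environment.
      val enabled: Boolean = sys.env.get("ENABLE_KINESIS_TESTS").contains("1")

      def awsCredentialsOrFail(): String = {
        // Callers must consult `enabled` first; reaching this while disabled
        // is a bug in the suite, hence the hard assertion seen in the log.
        assert(enabled,
          "Kinesis test not enabled, should not attempt to get AWS credentials")
        sys.env.getOrElse("AWS_ACCESS_KEY_ID",
          sys.error("AWS credentials not set"))
      }
    }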






Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Patrick Wendell
Sean B.,

Thank you for giving a thorough reply. I will work with Sean O. and
see what we can change to make us more in line with the stated policy.

I did some research and it appears that some time between October [1]
and December [2] 2006, this page was modified to include stricter
policy surrounding nightly builds. Actually, the original version of
the policy page encouraged projects to post nightly builds for the
benefit of all developers, just as we have been doing.

If you detect frustration from the Spark community, it's because this
type of situation occurs with some regularity. In this case:

(a) A policy exists from ~10 years ago, presumably because some
project back then had problematic release management practices and so
a policy needed to be created to solve a problem.
(b) The policy is outdated now, and no one is 100% sure why it was
created (likely many of the people are no longer involved in the ASF
who helped craft it).
(c) The steps for how to change it are unclear and there isn't clear
ownership of the policy document.

I think it's unavoidable given the decentralized organization
structure of the ASF, but I just want to be up front about our
perspective and why you might sense some frustration.

[1] 
https://web.archive.org/web/20061020220358/http://www.apache.org/dev/release.html
[2] 
https://web.archive.org/web/20061231050046/http://www.apache.org/dev/release.html

- Patrick

On Tue, Jul 14, 2015 at 10:09 AM, Sean Busbey bus...@cloudera.com wrote:
 Responses inline, with some liberties on ordering.

 On Sun, Jul 12, 2015 at 10:32 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey Sean B,

 Would you mind outlining for me how we go about changing this policy -
 I think it's outdated and doesn't make much sense. Ideally I'd like to
 propose a vote to modify the text slightly such that our current
 behavior is seen as compliant. Specifically:




 - Who has the authority to change this document?


 It's foundation level policy, so I'd presume the board needs to. Since it's
 part of our legal position, it might be owned by the legal affairs
 committee[1]. That would mean they could update it without a board
 resolution. (legal-discuss@ could tell you for sure).


 - What concrete steps can I take to change the policy?


 The Legal Affairs Committee is reachable either through their mailing
 list[2] or their issue tracker[3].

 Please be sure to read the entire original document, it explains the
 rationale that has gone into it. You'll need to address the matters raised
 there.



 - You keep mentioning the incubator@ list, why is this the place for
 such policy to be discussed or decided on?



 It can't be decided on the general@incubator list, but there are already
 several relevant parties discussing the matter there. You certainly don't
 *need* to join that conversation, but the participants there have overlap
 with the folks who can ultimately decide the issue. Thus, it may help avoid
 having to repeat things.



 - What is the reasonable amount of time frame in which the policy
 change is likely to be decided?


 I am neither a participant on legal affairs nor the board, so I have no
 idea.


 We've had a few times people from the various parts of the ASF come
 and say we are in violation of a policy. And sometimes other ASF
 people come and then get in a fight on our mailing list, and there is



 Please keep in mind that you are also ASF people, as is the entire Spark
 community (users and all)[4]. Phrasing things in terms of "us" and "them" by
 drawing a distinction on "[they] get in a fight on our mailing list" is not
 helpful.



 back and forth, and it turns out there isn't so much a widely
 followed policy as a doc somewhere that is really old and not actually
 universally followed. It's difficult for us in such situations to know
 how to proceed and how much autonomy we as a PMC have to make
 decisions about our own project.


 Understanding and abiding by ASF legal obligations and policies is the job
 of each project PMC as a part of their formation by the board[5]. If anyone
 in your community has questions about what the project can or cannot do,
 then it's the job of the PMC to find out proactively (rather than take an
 "ask for forgiveness" approach). Where the existing documentation is unclear or
 where you think it might be out of date, you can often get guidance from
 general@incubator (since it contains a large number of members and folks
 from across foundation projects) or comdev[6] (since their charter includes
 explaining ASF policy). If those resources prove insufficient matters can be
 brought up with either legal-discuss@ or board@.

 If you find out-of-date documentation that is not ASF policy, you can have
 it removed by notifying the appropriate group (i.e. legal-discuss, comdev,
 or whoever is hosting it).


 [1]: http://apache.org/legal/
 [2]: http://www.apache.org/foundation/mailinglists.html#foundation-legal
 [3]: https://issues.apache.org/jira/browse/LEGAL

Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Patrick Wendell
Hey Sean,

One other thing I'd be okay doing is moving the main text about
nightly builds to the wiki and just having a header called "Nightly
builds" at the end of the downloads page that says "For developers,
Spark maintains nightly builds. More information is available on the
[Spark developer Wiki](link)." I think this would preserve
discoverability while also placing the information on the wiki, which
seems to be the main ask of the policy.

- Patrick

On Sun, Jul 19, 2015 at 2:32 AM, Sean Owen so...@cloudera.com wrote:
 I am going to make an edit to the download page on the web site to
 start, as that much seems uncontroversial. Proposed change:

 Reorder sections to put developer-oriented sections at the bottom,
 including the info on nightly builds:
   Download Spark
   Link with Spark
   All Releases
   Spark Source Code Management
   Nightly Builds

 Change text to emphasize the audience:

 Packages are built regularly off of Spark’s master branch and release
 branches. These provide *Spark developers* access to the bleeding-edge
 of Spark master or the most recent fixes not yet incorporated into a
 maintenance release. *They should not be used by anyone except Spark
 developers, and may be unstable or have serious bugs. End users should
 only use official releases above. Please subscribe to
 dev@spark.apache.org if you are a Spark developer to be aware of
 issues in nightly builds.* Spark nightly packages are available at:

 On Thu, Jul 16, 2015 at 8:21 AM, Sean Owen so...@cloudera.com wrote:
 To move this forward, I think one of two things needs to happen:

 1. Move this guidance to the wiki. Seems that people gathered here
 believe that resolves the issue. Done.

 2. Put disclaimers on the current downloads page. This may resolve the
 issue, but then we bring it up on the right mailing list for
 discussion. It may end up at #1, or may end in a tweak to the policy.

 I can drive either one. Votes on how to proceed?


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Patrick Wendell
+1 from me too

On Sat, Jul 18, 2015 at 3:32 AM, Ted Yu yuzhih...@gmail.com wrote:
 +1 to removing commit messages.



 On Jul 18, 2015, at 1:35 AM, Sean Owen so...@cloudera.com wrote:

 +1 to removing them. Sometimes there are 50+ commits because people
 have been merging from master into their branch rather than rebasing.

 On Sat, Jul 18, 2015 at 8:48 AM, Reynold Xin r...@databricks.com wrote:
 I took a look at the commit messages in git log -- it looks like the
 individual commit messages are not that useful to include, but do make the
 commit messages more verbose. They are usually just a bunch of extremely
 concise descriptions of bug fixes, merges, etc:

cb3f12d [xxx] add whitespace
6d874a6 [xxx] support pyspark for yarn-client

89b01f5 [yyy] Update the unit test to add more cases
275d252 [yyy] Address the comments
7cc146d [yyy] Address the comments
2624723 [yyy] Fix rebase conflict
45befaa [yyy] Update the unit test
bbc1c9c [yyy] Fix checkpointing doesn't retain driver port issue


 Anybody against removing those from the merge script so the log looks
 cleaner? If nobody feels strongly about this, we can just create a JIRA to
 remove them, and only keep the author names.
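
For illustration only, the kind of filtering being proposed might look
like the sketch below (hypothetical object and trailer format, shown
here in Scala; the real merge script would carry the equivalent logic):

  object SquashMessageFilter {
    // Matches per-commit lines like "cb3f12d [xxx] add whitespace",
    // capturing the author tag between the brackets.
    private val CommitLine = """^\s*[0-9a-f]{7,}\s+\[([^\]]+)\].*""".r

    def clean(message: String): String = {
      val lines   = message.split("\n").toSeq
      // Collect each distinct author tag so attribution is preserved.
      val authors = lines.collect { case CommitLine(a) => a }.distinct
      // Drop the per-commit lines themselves.
      val kept    = lines.filter(CommitLine.unapplySeq(_).isEmpty)
      val trailer =
        if (authors.isEmpty) Nil
        else Seq("Authored-by: " + authors.mkString(", "))
      (kept ++ trailer).mkString("\n")
    }
  }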

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Slight API incompatibility caused by SPARK-4072

2015-07-15 Thread Patrick Wendell
One related note here is that we have a Java version of this that is
an abstract class - in the doc it says that it exists more or less to
allow for binary compatibility (it says it's for Java users, but
really Scala could use this also):

https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23

I think it might be reasonable that the Scala trait provides only
source compatibility and the Java class provides binary compatibility.
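
For illustration, a minimal sketch of the trade-off (hypothetical
listener names; assumes the pre-2.12 Scala trait encoding, in which a
trait method body lives in a static Listener$class helper and each
implementing class needs a forwarder generated at compile time):

  // Trait with default implementations. A class compiled before
  // onStop() existed has no forwarder for it, so invoking onStop() on
  // that old binary throws java.lang.AbstractMethodError at runtime.
  trait Listener {
    def onStart(): Unit = {}
    def onStop(): Unit = {}  // added in a later release
  }

  // Concrete no-op base class. Subclasses inherit real method bodies
  // through ordinary virtual dispatch, so new no-op methods can be
  // added here without recompiling existing subclasses.
  class ListenerAdapter extends Listener {
    override def onStart(): Unit = {}
    override def onStop(): Unit = {}
  }

  // Application code extends the adapter rather than the trait:
  class MyListener extends ListenerAdapter {
    override def onStart(): Unit = println("listener started")
  }

  object Demo extends App {
    val l: Listener = new MyListener
    l.onStart()
    l.onStop()  // safe: resolved via ListenerAdapter's no-op body
  }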

- Patrick

On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Hey all,

 Just noticed this when some of our tests started to fail. SPARK-4072 added a
 new method to the SparkListener trait, and even though it has a default
 implementation, it doesn't seem like that applies retroactively.

 Namely, if you have an existing, compiled app that has an implementation of
 SparkListener, that app won't work on 1.5 without a recompile. You'll get
 something like this:

 java.lang.AbstractMethodError
   at
 org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at 
 org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
   at
 org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)


 Now I know that SparkListener is marked as @DeveloperApi, but is this
 something we should care about? Seems like adding methods to traits is just
 as backwards-incompatible as adding new methods to Java interfaces.


 --
 Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Slight API incompatibility caused by SPARK-4072

2015-07-15 Thread Patrick Wendell
Actually the Java one is a concrete class.

On Wed, Jul 15, 2015 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote:
 One related note here is that we have a Java version of this that is
 an abstract class - in the doc it says that it exists more or less to
 allow for binary compatibility (it says it's for Java users, but
 really Scala could use this also):

 https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23

 I think it might be reasonable that the Scala trait provides only
 source compatibility and the Java class provides binary compatibility.

 - Patrick

 On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Hey all,

 Just noticed this when some of our tests started to fail. SPARK-4072 added a
 new method to the SparkListener trait, and even though it has a default
 implementation, it doesn't seem like that applies retroactively.

 Namely, if you have an existing, compiled app that has an implementation of
 SparkListener, that app won't work on 1.5 without a recompile. You'll get
 something like this:

 java.lang.AbstractMethodError
   at
 org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at 
 org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
   at
 org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)


 Now I know that SparkListener is marked as @DeveloperApi, but is this
 something we should care about? Seems like adding methods to traits is just
 as backwards-incompatible as adding new methods to Java interfaces.


 --
 Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Announcing Spark 1.4.1!

2015-07-15 Thread Patrick Wendell
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release.
We recommend all users on the 1.4 branch upgrade to
this release, which contains several important bug fixes.

Download Spark 1.4.1 - http://spark.apache.org/downloads.html
Release notes - http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes - http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release!

Please contact me directly for errata in the release notes.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-13 Thread Patrick Wendell
This vote passes with 14 +1 (7 binding) votes and no 0 or -1 votes.

+1 (14):
Patrick Wendell
Reynold Xin
Sean Owen
Burak Yavuz
Mark Hamstra
Michael Armbrust
Andrew Or
York, Brennon
Krishna Sankar
Luciano Resende
Holden Karau
Tom Graves
Denny Lee
Sean McNamara

- Patrick

On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1125/
 [published as version: 1.4.1-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1126/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Sunday, July 12, at 06:55 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Foundation policy on releases and Spark nightly builds

2015-07-12 Thread Patrick Wendell
Thanks Sean O. I was thinking something like "NOTE: Nightly builds are
meant for development and testing purposes. They do not go through
Apache's release auditing process and are not official releases."

- Patrick

On Sun, Jul 12, 2015 at 3:39 PM, Sean Owen so...@cloudera.com wrote:
 (This sounds pretty good to me. Mark it developers-only, not formally
 tested by the community, etc.)

 On Sun, Jul 12, 2015 at 7:50 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Sean B.,

 Thanks for bringing this to our attention. I think putting them on the
 developer wiki would substantially decrease visibility in a way that
 is not beneficial to the project - this feature was specifically
 requested by developers from other projects that integrate with Spark.

 If the concern underlying that policy is that snapshot builds could be
 misconstrued as formal releases, I think it would work to put a very
 clear disclaimer explaining the difference directly adjacent to the
 link. That's arguably more explicit than just moving the same text to
 a different page.

 The formal policy asks us not to include links that encourage
 non-developers to download the builds. Stating clearly that the
 audience for those links is developers would, in my interpretation,
 satisfy the letter and spirit of this policy.

 - Patrick


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-12 Thread Patrick Wendell
I think we can close this vote soon. Any additional votes/testing would
be much appreciated!

On Fri, Jul 10, 2015 at 11:30 AM, Sean McNamara
sean.mcnam...@webtrends.com wrote:
 +1

 Sean

 On Jul 8, 2015, at 11:55 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1125/
 [published as version: 1.4.1-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1126/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Sunday, July 12, at 06:55 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Foundation policy on releases and Spark nightly builds

2015-07-12 Thread Patrick Wendell
Hey Sean B.,

Thanks for bringing this to our attention. I think putting them on the
developer wiki would substantially decrease visibility in a way that
is not beneficial to the project - this feature was specifically
requested by developers from other projects that integrate with Spark.

If the concern underlying that policy is that snapshot builds could be
misconstrued as formal releases, I think it would work to put a very
clear disclaimer explaining the difference directly adjacent to the
link. That's arguably more explicit than just moving the same text to
a different page.

The formal policy asks us not to include links that encourage
non-developers to download the builds. Stating clearly that the
audience for those links is developers would, in my interpretation,
satisfy the letter and spirit of this policy.

- Patrick

On Sat, Jul 11, 2015 at 11:53 AM, Sean Owen so...@cloudera.com wrote:
 From a developer perspective, I also find it surprising to hear that
 nightly builds should be hidden from non-developer end users. In an
 age of Github, what on earth is the problem with distributing the
 content of master? However I do understand why this exists.

 To the extent the ASF provides any value, it is at least a legal
 framework for defining what it means for you and me to give software to
 a bunch of other people. Software artifacts released according to an
 ASF process become something the ASF can take responsibility for as
 an entity. Nightly builds are not. It might matter to the committers
 if, say, somebody commits a serious data loss bug. You don't want to
 be on the hook individually for putting that into end-user hands.

 More practically, I think this exists to prevent some projects from
 lazily depending on unofficial nightly builds as pseudo-releases for
 long periods of time. End users may come to perceive them as officially
 sanctioned releases when they aren't. That's not the case here of
 course.

 I think nightlies aren't for end-users anyway, and I think developers
 who care would know how to get them. There's little cost
 to moving this info to the wiki, so I'd do it.

 On Sat, Jul 11, 2015 at 4:29 PM, Reynold Xin r...@databricks.com wrote:
 I don't get this rule. It is arbitrary, and does not seem like something
 that should be enforced at the foundation level. By this reasoning, are we
 not allowed to list source code management on the project's public page as
 well?

 The download page clearly states the nightly builds are bleeding-edge.

 Note that technically we did not violate any rules, since the ones we showed
 were not nightly builds by the foundation's definition: "Nightly Builds
 are simply built from the Subversion trunk, usually once a day." Spark
 nightly artifacts were built from git, not svn trunk. :)  (joking).



 On Sat, Jul 11, 2015 at 7:44 AM, Sean Busbey bus...@cloudera.com wrote:

 That would be great.

 A note on that page that it's meant for the use of folks working on the
 project, with a link to your "get involved" howto, would be nice additional
 context.

 --
 Sean

 On Jul 11, 2015 6:18 AM, Sean Owen so...@cloudera.com wrote:

 I suggest we move this info to the developer wiki, to keep it out of
 the place all users look for downloads. What do you think about
 that Sean B?

 On Sat, Jul 11, 2015 at 5:34 AM, Sean Busbey bus...@cloudera.com wrote:
  Hi Folks!
 
  I noticed that the Spark website's download page lists nightly builds and
  instructions for accessing SNAPSHOT maven artifacts[1]. The ASF policy
  on
  releases expressly forbids this kind of publishing outside of the
  dev@spark
  community[2].
 
  If you'd like to discuss having the policy updated (including expanding
  the
  definition of "in the development community"), please contribute to the
  discussion on general@incubator[3] after removing the offending items.
 
  [1]:
  http://spark.apache.org/downloads.html#nightly-packages-and-artifacts
  [2]: http://www.apache.org/dev/release.html#what
  [3]: http://s.apache.org/XFP
 
  --
  Sean



 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-09 Thread Patrick Wendell
+1

On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1125/
 [published as version: 1.4.1-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1126/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Sunday, July 12, at 06:55 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Patrick Wendell
Yeah - we can fix the docs separately from the release.

- Patrick

On Wed, Jul 8, 2015 at 10:03 AM, Mark Hamstra m...@clearstorydata.com wrote:
 HiveSparkSubmitSuite is fine for me, but I do see the same issue with
 DataFrameStatSuite -- OS X 10.10.4, Java 1.7.0_75, -Phive
 -Phive-thriftserver -Phadoop-2.4 -Pyarn


 On Wed, Jul 8, 2015 at 4:18 AM, Sean Owen so...@cloudera.com wrote:

 The POM issue is resolved and the build succeeds. The license and sigs
 still work. The tests pass for me with -Pyarn -Phadoop-2.6, with the
 following two exceptions. Is anyone else seeing these? this is
 consistent on Ubuntu 14 with Java 7/8:

 DataFrameStatSuite:
 ...
 - special crosstab elements (., '', null, ``) *** FAILED ***
   java.lang.NullPointerException:
   at
 org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:131)
   at
 org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:121)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at
 org.apache.spark.sql.execution.stat.StatFunctions$.crossTabulate(StatFunctions.scala:121)
   at
 org.apache.spark.sql.DataFrameStatFunctions.crosstab(DataFrameStatFunctions.scala:94)
   at
 org.apache.spark.sql.DataFrameStatSuite$$anonfun$5.apply$mcV$sp(DataFrameStatSuite.scala:97)
   ...

 HiveSparkSubmitSuite:
 - SPARK-8368: includes jars passed in through --jars *** FAILED ***
   Process returned with exit code 1. See the log4j logs for more
 detail. (HiveSparkSubmitSuite.scala:92)
 - SPARK-8020: set sql conf in spark conf *** FAILED ***
   Process returned with exit code 1. See the log4j logs for more
 detail. (HiveSparkSubmitSuite.scala:92)
 - SPARK-8489: MissingRequirementError during reflection *** FAILED ***
   Process returned with exit code 1. See the log4j logs for more
 detail. (HiveSparkSubmitSuite.scala:92)

 On Tue, Jul 7, 2015 at 8:06 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
  1.4.1!
 
  This release fixes a handful of known issues in Spark 1.4.0, listed
  here:
  http://s.apache.org/spark-1.4.1
 
  The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  3e8ae38944f13895daf328555c1ad22cd590b089
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.1]
  https://repository.apache.org/content/repositories/orgapachespark-1123/
  [published as version: 1.4.1-rc3]
  https://repository.apache.org/content/repositories/orgapachespark-1124/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.1!
 
  The vote is open until Friday, July 10, at 20:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Patrick Wendell
Hey All,

The issue that Josh pointed out is not just a test failure, it's an
issue with an important bug fix that was not correctly back-ported
into the 1.4 branch. Unfortunately the overall state of the 1.4 branch
tests on Jenkins was not in great shape so this was missed earlier on.

Given that this is fixed now, I have prepared another RC and am
leaning towards restarting the vote. If anyone feels strongly one way
or the other, let me know; otherwise I'll restart it in a few hours. I
figured since this will likely finalize over the weekend anyway, it's
not so bad to wait one additional day in order to get that fix.

- Patrick

On Wed, Jul 8, 2015 at 12:00 PM, Josh Rosen rosenvi...@gmail.com wrote:
 I've filed https://issues.apache.org/jira/browse/SPARK-8903 to fix the
 DataFrameStatSuite test failure. The problem turned out to be caused by a
 mistake made while resolving a merge-conflict when backporting that patch to
 branch-1.4.

 I've submitted https://github.com/apache/spark/pull/7295 to fix this issue.

 On Wed, Jul 8, 2015 at 11:30 AM, Sean Owen so...@cloudera.com wrote:

 I see, but shouldn't this test not be run when Hive isn't in the build?

 On Wed, Jul 8, 2015 at 7:13 PM, Andrew Or and...@databricks.com wrote:
  @Sean You actually need to run HiveSparkSubmitSuite with `-Phive` and
  `-Phive-thriftserver`. The MissingRequirementsError is just complaining
  that
  it can't find the right classes. The other one (DataFrameStatSuite) is a
  little more concerning.
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-08 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.1!

This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1

The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.1]
https://repository.apache.org/content/repositories/orgapachespark-1125/
[published as version: 1.4.1-rc4]
https://repository.apache.org/content/repositories/orgapachespark-1126/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

Please vote on releasing this package as Apache Spark 1.4.1!

The vote is open until Sunday, July 12, at 06:55 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Patrick Wendell
This vote is cancelled in favor of RC4.

- Patrick

On Tue, Jul 7, 2015 at 12:06 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 3e8ae38944f13895daf328555c1ad22cd590b089

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1123/
 [published as version: 1.4.1-rc3]
 https://repository.apache.org/content/repositories/orgapachespark-1124/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Friday, July 10, at 20:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-07 Thread Patrick Wendell
Hey All,

This vote is cancelled in favor of RC3.

- Patrick

On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 07b95c7adf88f0662b7ab1c47e302ff5e6859606

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1120/
 [published as version: 1.4.1-rc2]
 https://repository.apache.org/content/repositories/orgapachespark-1121/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Monday, July 06, at 22:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-04 Thread Patrick Wendell
Hi Tomo,

For now you can do that as a workaround. We are working on a fix for
this in the master branch, but it may take a couple of days since the
issue is fairly complicated.

- Patrick

On Sat, Jul 4, 2015 at 7:00 AM, tomo cocoa cocoatom...@gmail.com wrote:
 Hi all,

 I have the same error, and it seems to depend on the Maven version.

 I tried building Spark using Maven with several versions on Jenkins.

 + Output of
 /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3/bin/mvn
 -version:

 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T20:57:37+09:00)
 Maven home:
 /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3
 Java version: 1.8.0, vendor: Oracle Corporation
 Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac

 + Jenkins Configuration:
 Jenkins project type: Maven Project
 Goals and options: -Phadoop-2.6 -DskipTests clean package

 + Maven versions and results:
 3.3.3 - infinite loop
 3.3.1 - infinite loop
 3.2.5 - SUCCESS


 So should we prefer to build Spark with Maven 3.2.5?


 On 4 July 2015 at 12:28, Andrew Or and...@databricks.com wrote:

 Thanks, I just tried it with 3.3.3 and I was able to reproduce it as well.

 2015-07-03 18:51 GMT-07:00 Tarek Auel tarek.a...@gmail.com:

 That's mine

 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T04:57:37-07:00)

 Maven home: /usr/local/Cellar/maven/3.3.3/libexec

 Java version: 1.8.0_45, vendor: Oracle Corporation

 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/jre

 Default locale: en_US, platform encoding: UTF-8

 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac


 On Fri, Jul 3, 2015 at 6:32 PM Ted Yu yuzhih...@gmail.com wrote:

 Here is mine:

 Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c;
 2015-03-13T13:10:27-07:00)
 Maven home: /home/hbase/apache-maven-3.3.1
 Java version: 1.8.0_45, vendor: Oracle Corporation
 Java home: /home/hbase/jdk1.8.0_45/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: linux, version: 2.6.32-504.el6.x86_64, arch: amd64,
 family: unix

 On Fri, Jul 3, 2015 at 6:05 PM, Andrew Or and...@databricks.com wrote:

 @Tarek and Ted, what maven versions are you using?

 2015-07-03 17:35 GMT-07:00 Krishna Sankar ksanka...@gmail.com:

 Patrick,
I assume an RC3 will be out for folks like me to test the
 distribution. As usual, I will run the tests when you have a new
 distribution.
 Cheers
 k/

 On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Patch that added test-jar dependencies:
 https://github.com/apache/spark/commit/bfe74b34

 Patch that originally disabled dependency reduced poms:

 https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724

 Patch that reverted the disabling of dependency reduced poms:

 https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e

 On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Okay I did some forensics with Sean Owen. Some things about this
  bug:
 
  1. The underlying cause is that we added some code to make the
  tests
  of sub modules depend on the core tests. For unknown reasons this
  causes Spark to hit MSHADE-148 for *some* combinations of build
  profiles.
 
  2. MSHADE-148 can be worked around by disabling building of
  dependency reduced poms because then the buggy code path is
  circumvented. Andrew Or did this in a patch on the 1.4 branch.
  However, that is not a tenable option for us because our
  *published*
  pom files require dependency reduction to substitute in the scala
  version correctly for the poms published to maven central.
 
  3. As a result, Andrew Or reverted his patch recently, causing some
  package builds to start failing again (but publishing works now).
 
  4. The reason this is not detected in our test harness or release
  build is that it is sensitive to the profiles enabled. The
  combination
  of profiles we enable in the test harness and release builds do not
  trigger this bug.
 
  The best path I see forward right now is to do the following:
 
  1. Disable creation of dependency reduced poms by default (this
  doesn't matter for people doing a package build) so typical users
  won't have this bug.
 
  2. Add a profile that re-enables that setting.
 
  3. Use the above profile when publishing release artifacts to maven
  central.
 
  4. Hope that we don't hit this bug for publishing.
 
  - Patrick
 
  On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com
  wrote:
  Doesn't change anything for me.
 
  On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell
  pwend...@gmail.com wrote:
 
  Can you try using the built-in maven (build/mvn)? All of our
  builds
  are passing on Jenkins so I wonder if it's a maven version issue:
 
  https

Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Hm - what if you do a fresh git checkout (just to make sure you don't
have an older maven version downloaded). It also might be that this
really is an issue even with Maven 3.3.3. I just am not sure why it's
not reflected in our continuous integration or the build of the
release packages themselves:

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

It could be that it's dependent on which modules are enabled.

On Fri, Jul 3, 2015 at 3:46 PM, Robin East robin.e...@xense.co.uk wrote:
 which got me thinking:

 build/mvn -version
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M;
 support was removed in 8.0
 Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c;
 2015-03-13T20:10:27+00:00)
 Maven home: /usr/local/Cellar/maven/3.3.1/libexec
 Java version: 1.8.0_40, vendor: Oracle Corporation
 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.2, arch: x86_64, family: mac

 Seems to be using 3.3.1

 On 3 Jul 2015, at 23:44, Robin East robin.e...@xense.co.uk wrote:

 I used the following build command:

 build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
 package

 this also gave the ‘Dependency-reduced POM’ loop

 Robin

 On 3 Jul 2015, at 23:41, Patrick Wendell pwend...@gmail.com wrote:

 What if you use the built-in maven (i.e. build/mvn). It might be that
 we require a newer version of maven than you have. The release itself
 is built with maven 3.3.3:

 https://github.com/apache/spark/blob/master/build/mvn#L72

 - Patrick

 On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote:

 Yep, happens to me as well. Build loops.
 Cheers
 k/

 On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote:


 Patrick:
 I used the following command:
 ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean
 package

 The build doesn't seem to stop.
 Here is tail of build output:

 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml

 Here is part of the stack trace for the build process:

 http://pastebin.com/xL2Y0QMU

 FYI

 On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com
 wrote:


 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 07b95c7adf88f0662b7ab1c47e302ff5e6859606

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1120/
 [published as version: 1.4.1-rc2]
 https://repository.apache.org/content/repositories/orgapachespark-1121/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Monday, July 06, at 22:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Let's continue the discussion on the other thread relating to the master build.

On Fri, Jul 3, 2015 at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote:
 Thanks - it appears this is just a legitimate issue with the build,
 affecting all versions of Maven.

 On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote:
 I have 3.3.3
 USS-Defiant:NW ksankar$ mvn -version
 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T04:57:37-07:00)
 Maven home: /usr/local/apache-maven-3.3.3
 Java version: 1.7.0_60, vendor: Oracle Corporation
 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac

 Let me nuke it and reinstall maven.

 Cheers
 k/

 On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote:

 What if you use the built-in maven (i.e. build/mvn). It might be that
 we require a newer version of maven than you have. The release itself
 is built with maven 3.3.3:

 https://github.com/apache/spark/blob/master/build/mvn#L72

 - Patrick

 On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com
 wrote:
  Yep, happens to me as well. Build loops.
  Cheers
  k/
 
  On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  Patrick:
  I used the following command:
  ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive
  clean
  package
 
  The build doesn't seem to stop.
  Here is tail of build output:
 
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
 
  Here is part of the stack trace for the build process:
 
  http://pastebin.com/xL2Y0QMU
 
  FYI
 
  On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  version
  1.4.1!
 
  This release fixes a handful of known issues in Spark 1.4.0, listed
  here:
  http://s.apache.org/spark-1.4.1
 
  The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  07b95c7adf88f0662b7ab1c47e302ff5e6859606
 
  The release files, including signatures, digests, etc. can be found
  at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.1]
 
  https://repository.apache.org/content/repositories/orgapachespark-1120/
  [published as version: 1.4.1-rc2]
 
  https://repository.apache.org/content/repositories/orgapachespark-1121/
 
  The documentation corresponding to this release can be found at:
 
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.1!
 
  The vote is open until Monday, July 06, at 22:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-03 Thread Patrick Wendell
Okay I did some forensics with Sean Owen. Some things about this bug:

1. The underlying cause is that we added some code to make the tests
of sub modules depend on the core tests. For unknown reasons this
causes Spark to hit MSHADE-148 for *some* combinations of build
profiles.

2. MSHADE-148 can be worked around by disabling building of
dependency reduced poms because then the buggy code path is
circumvented. Andrew Or did this in a patch on the 1.4 branch.
However, that is not a tenable option for us because our *published*
pom files require dependency reduction to substitute in the scala
version correctly for the poms published to maven central.

3. As a result, Andrew Or reverted his patch recently, causing some
package builds to start failing again (but publishing works now).

4. The reason this is not detected in our test harness or release
build is that it is sensitive to the profiles enabled. The combination
of profiles we enable in the test harness and release builds do not
trigger this bug.

The best path I see forward right now is to do the following:

1. Disable creation of dependency reduced poms by default (this
doesn't matter for people doing a package build) so typical users
won't have this bug.

2. Add a profile that re-enables that setting.

3. Use the above profile when publishing release artifacts to maven central.

4. Hope that we don't hit this bug for publishing.
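
As a rough sketch of steps 1-3, the relevant pieces of the root pom
might look something like the following (the property and profile names
here are hypothetical; createDependencyReducedPom is the
maven-shade-plugin option in question):

  <!-- Default: skip dependency-reduced POMs for local package builds. -->
  <properties>
    <create.dependency.reduced.poms>false</create.dependency.reduced.poms>
  </properties>

  <!-- In the shade plugin configuration: -->
  <configuration>
    <createDependencyReducedPom>${create.dependency.reduced.poms}</createDependencyReducedPom>
  </configuration>

  <!-- Profile enabled only when publishing to Maven Central: -->
  <profile>
    <id>release-dependency-reduced-poms</id>
    <properties>
      <create.dependency.reduced.poms>true</create.dependency.reduced.poms>
    </properties>
  </profile>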

- Patrick

On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote:
 Doesn't change anything for me.

 On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote:

 Can you try using the built-in maven (build/mvn)? All of our builds
 are passing on Jenkins so I wonder if it's a maven version issue:

 https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

 - Patrick

 On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote:
  Please take a look at SPARK-8781
  (https://github.com/apache/spark/pull/7193)
 
  Cheers
 
  On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote:
 
  I found a solution, there might be a better one.
 
  https://github.com/apache/spark/pull/7217
 
  On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk
  wrote:
 
  Yes me too
 
  On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:
 
  This is what I got (the last line was repeated non-stop):
 
  [INFO] Replacing original artifact with shaded artifact.
  [INFO] Replacing
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar
  with
 
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
 
  On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com
  wrote:
 
  Hi all,
 
  I am trying to build the master branch, but it gets stuck and prints
 
  [INFO] Dependency-reduced POM written at:
  /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml
 
  build command:  mvn -DskipTests clean package
 
  Do others have the same issue?
 
  Regards,
  Tarek
 
 
 
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-03 Thread Patrick Wendell
Patch that added test-jar dependencies:
https://github.com/apache/spark/commit/bfe74b34

Patch that originally disabled dependency reduced poms:
https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724

Patch that reverted the disabling of dependency reduced poms:
https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e

On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote:
 Okay I did some forensics with Sean Owen. Some things about this bug:

 1. The underlying cause is that we added some code to make the tests
 of sub modules depend on the core tests. For unknown reasons this
 causes Spark to hit MSHADE-148 for *some* combinations of build
 profiles.

 2. MSHADE-148 can be worked around by disabling building of
 dependency reduced poms because then the buggy code path is
 circumvented. Andrew Or did this in a patch on the 1.4 branch.
 However, that is not a tenable option for us because our *published*
 pom files require dependency reduction to substitute in the scala
 version correctly for the poms published to maven central.

 3. As a result, Andrew Or reverted his patch recently, causing some
 package builds to start failing again (but publishing works now).

 4. The reason this is not detected in our test harness or release
 build is that it is sensitive to the profiles enabled. The combination
 of profiles we enable in the test harness and release builds do not
 trigger this bug.

 The best path I see forward right now is to do the following:

 1. Disable creation of dependency reduced poms by default (this
 doesn't matter for people doing a package build) so typical users
 won't have this bug.

 2. Add a profile that re-enables that setting.

 3. Use the above profile when publishing release artifacts to maven central.

 4. Hope that we don't hit this bug for publishing.

 - Patrick

 On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote:
 Doesn't change anything for me.

 On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote:

 Can you try using the built-in maven (build/mvn)? All of our builds
 are passing on Jenkins so I wonder if it's a maven version issue:

 https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

 - Patrick

 On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote:
  Please take a look at SPARK-8781
  (https://github.com/apache/spark/pull/7193)
 
  Cheers
 
  On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote:
 
  I found a solution, there might be a better one.
 
  https://github.com/apache/spark/pull/7217
 
  On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk
  wrote:
 
  Yes me too
 
  On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:
 
  This is what I got (the last line was repeated non-stop):
 
  [INFO] Replacing original artifact with shaded artifact.
  [INFO] Replacing
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar
  with
 
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
 
  On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com
  wrote:
 
  Hi all,
 
  I am trying to build the master branch, but it gets stuck and prints
 
  [INFO] Dependency-reduced POM written at:
  /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml
 
  build command:  mvn -DskipTests clean package
 
  Do others have the same issue?
 
  Regards,
  Tarek
 
 
 
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-03 Thread Patrick Wendell
Can you try using the built-in maven (build/mvn)? All of our builds
are passing on Jenkins so I wonder if it's a maven version issue:

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

- Patrick

On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote:
 Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193)

 Cheers

 On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote:

 I found a solution, there might be a better one.

 https://github.com/apache/spark/pull/7217

 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote:

 Yes me too

 On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:

 This is what I got (the last line was repeated non-stop):

 [INFO] Replacing original artifact with shaded artifact.
 [INFO] Replacing
 /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with
 /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark/bagel/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark/bagel/dependency-reduced-pom.xml

 On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote:

 Hi all,

 I am trying to build the master branch, but it gets stuck and prints

 [INFO] Dependency-reduced POM written at:
 /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml

 build command:  mvn -DskipTests clean package

 Do others have the same issue?

 Regards,
 Tarek





-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Thanks - it appears this is just a legitimate issue with the build,
affecting all versions of Maven.

On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote:
 I have 3.3.3
 USS-Defiant:NW ksankar$ mvn -version
 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T04:57:37-07:00)
 Maven home: /usr/local/apache-maven-3.3.3
 Java version: 1.7.0_60, vendor: Oracle Corporation
 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac

 Let me nuke it and reinstall maven.

 Cheers
 k/

 On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote:

 What if you use the built-in maven (i.e. build/mvn). It might be that
 we require a newer version of maven than you have. The release itself
 is built with maven 3.3.3:

 https://github.com/apache/spark/blob/master/build/mvn#L72

 - Patrick

 On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com
 wrote:
  Yep, happens to me as well. Build loops.
  Cheers
  k/
 
  On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  Patrick:
  I used the following command:
  ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive
  clean
  package
 
  The build doesn't seem to stop.
  Here is tail of build output:
 
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
 
  Here is part of the stack trace for the build process:
 
  http://pastebin.com/xL2Y0QMU
 
  FYI
 
  On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  version
  1.4.1!
 
  This release fixes a handful of known issues in Spark 1.4.0, listed
  here:
  http://s.apache.org/spark-1.4.1
 
  The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  07b95c7adf88f0662b7ab1c47e302ff5e6859606
 
  The release files, including signatures, digests, etc. can be found
  at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.1]
 
  https://repository.apache.org/content/repositories/orgapachespark-1120/
  [published as version: 1.4.1-rc2]
 
  https://repository.apache.org/content/repositories/orgapachespark-1121/
 
  The documentation corresponding to this release can be found at:
 
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.1!
 
  The vote is open until Monday, July 06, at 22:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
 
 
 






[RESULT] [VOTE] Release Apache Spark 1.4.1

2015-07-03 Thread Patrick Wendell
This vote is cancelled in favor of RC2. Thanks very much to Sean Owen
for triaging an important bug associated with RC1.

I took a look at the branch-1.4 contents and I think it's safe to cut
RC2 from the head of that branch (i.e. no very high-risk patches that I
could see). JIRA management around the time of the RC voting is an
interesting topic; Sean, I like your most recent proposal. Maybe we can
put that on the wiki or start a DISCUSS thread to cover that topic.

On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/




[VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.1!

This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1

The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
07b95c7adf88f0662b7ab1c47e302ff5e6859606

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.1]
https://repository.apache.org/content/repositories/orgapachespark-1120/
[published as version: 1.4.1-rc2]
https://repository.apache.org/content/repositories/orgapachespark-1121/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/

Please vote on releasing this package as Apache Spark 1.4.1!

The vote is open until Monday, July 06, at 22:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/
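
[One hedged way for a tester to check the staged artifacts above, assuming
gpg is installed and a binary package has been downloaded; the exact
artifact file name below is illustrative:

  # fetch and import the signing key linked above
  wget https://people.apache.org/keys/committer/pwendell.asc
  gpg --import pwendell.asc
  # verify the detached signature that ships next to each artifact
  gpg --verify spark-1.4.1-bin-hadoop2.4.tgz.asc spark-1.4.1-bin-hadoop2.4.tgz
]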




Re: [VOTE] Release Apache Spark 1.4.1

2015-06-28 Thread Patrick Wendell
Hey Krishna - this is still the current release candidate.

- Patrick

On Sun, Jun 28, 2015 at 12:14 PM, Krishna Sankar ksanka...@gmail.com wrote:
 Patrick,
Haven't seen any replies on test results. I will byte ;o) - Should I test
 this version, or is another one in the wings?
 Cheers
 k/

 On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/







Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Patrick Wendell
Hey Tom - no one voted on this yet, so I need to keep it open until
people vote. But I'm not aware of specific things we are waiting for.
Anyone else?

- Patrick

On Fri, Jun 26, 2015 at 7:10 AM, Tom Graves tgraves...@yahoo.com wrote:
 So is this open for vote then or are we waiting on other things?

 Tom



 On Thursday, June 25, 2015 10:32 AM, Andrew Ash and...@andrewash.com
 wrote:


 I would guess that many tickets targeted at 1.4.1 were set that way during
 the tail end of the 1.4.0 voting process as people realized they wouldn't
 make the .0 release in time.  In that case, they were likely aiming for a
 1.4.x release, not necessarily 1.4.1 specifically.  Maybe creating a 1.4.x
 target in Jira in addition to 1.4.0, 1.4.1, 1.4.2, etc would make it more
 clear that these tickets are targeted at some 1.4 update release rather
 than specifically the 1.4.1 update.

 On Thu, Jun 25, 2015 at 5:38 AM, Sean Owen so...@cloudera.com wrote:

 That makes sense to me -- there's an urgent fix to get out. I missed
 that part. Not that it really matters but was that expressed
 elsewhere?

 I know we tend to start the RC process even when a few more changes
 are still in progress, to get a first wave or two of testing done
 early, knowing that the RC won't be the final one. It makes sense for
 some issues for X to be open when an RC is cut, if they are actually
 truly intended for X.

 44 seems like a lot, and I don't think it's good practice just because
 that's how it's happened before. It looks like half of them weren't
 actually important for 1.4.x as we're now down to 21. I don't disagree
 with the idea that only most of the issues targeted for version X
 will be in version X; the target expresses a stretch goal. Given the
 fast pace of change that's probably the only practical view.

 I think we're just missing a step then: before RC of X, ask people to
 review and update the target of JIRAs for X? In this case, it was a
 good point to untarget stuff from 1.4.x entirely; I suspect everything
 else should then be targeted at 1.4.2 by default with the exception of
 a handful that people really do intend to work in for 1.4.1 before its
 final release.

 I know it sounds like pencil-pushing, but it's a cheap way to bring
 some additional focus to release planning. RC time has felt like a
 last-call to *begin* changes ad-hoc when it would go faster if it were
 more intentional and constrained. Meaning faster RCs, meaning getting
 back to a 3-month release cycle or less, and meaning less rush to push
 stuff into a .0 release and less frequent need for a maintenance .1
 version.

 So what happens if all 1.4.1-targeted JIRAs are targeted to 1.4.2?
 would that miss something that is definitely being worked on for
 1.4.1?

 On Wed, Jun 24, 2015 at 6:56 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Sean,

 This is being shipped now because there is a severe bug in 1.4.0 that
 can cause data corruption for Parquet users.

 There are no blockers targeted for 1.4.1 - so I don't see that JIRA is
 inconsistent with shipping a release now. The goal of having every
 single targeted JIRA cleared by the time we start voting, I don't
 think there is broad consensus and cultural adoption of that principle
 yet. So I do not take it as a signal this release is premature (the
 story has been the same for every previous release we've ever done).

 The fact that we hit 90/124 of issues targeted at this release means
 we are targeting such that we get around 70% of issues merged. That
 actually doesn't seem so bad to me since there is some uncertainty in
 the process.

 - Patrick

 On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote:
 There are 44 issues still targeted for 1.4.1. None are Blockers; 12
 are Critical. ~80% were opened and/or set by committers. Compare with
 90 issues resolved for 1.4.1.

 I'm concerned that committers are targeting lots more for a release
 even in the short term than realistically can go in. On its face, it
 suggests that an RC is premature. Why is 1.4.1 being put forth for
 release now? It seems like people are saying they want a fair bit more
 time to work on 1.4.1.

 I suspect that in fact people would rather untarget / slip (again)
 these JIRAs, but it calls into question again how the targeting is
 consistently off by this much.

 What unresolved JIRAs targeted for 1.4.1 are *really* still open for
 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were
 untargeted now? is the reality that there are a handful of items to
 get in before the final release, and those are hopefully the ~12
 critical ones? How about some review of that before we ask people to
 seriously test these bits?

 On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com
 wrote:
 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed
 here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-24 Thread Patrick Wendell
Hey Sean,

This is being shipped now because there is a severe bug in 1.4.0 that
can cause data corruption for Parquet users.

There are no blockers targeted for 1.4.1 - so I don't see that JIRA is
inconsistent with shipping a release now. The goal of having every
single targeted JIRA cleared by the time we start voting, I don't
think there is broad consensus and cultural adoption of that principle
yet. So I do not take it as a signal this release is premature (the
story has been the same for every previous release we've ever done).

The fact that we hit 90/124 of issues targeted at this release means
we are targeting such that we get around 70% of issues merged. That
actually doesn't seem so bad to me since there is some uncertainty in
the process.

- Patrick

On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote:
 There are 44 issues still targeted for 1.4.1. None are Blockers; 12
 are Critical. ~80% were opened and/or set by committers. Compare with
 90 issues resolved for 1.4.1.

 I'm concerned that committers are targeting lots more for a release
 even in the short term than realistically can go in. On its face, it
 suggests that an RC is premature. Why is 1.4.1 being put forth for
 release now? It seems like people are saying they want a fair bit more
 time to work on 1.4.1.

 I suspect that in fact people would rather untarget / slip (again)
 these JIRAs, but it calls into question again how the targeting is
 consistently off by this much.

 What unresolved JIRAs targeted for 1.4.1 are *really* still open for
 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were
 untargeted now? is the reality that there are a handful of items to
 get in before the final release, and those are hopefully the ~12
 critical ones? How about some review of that before we ask people to
 seriously test these bits?

 On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/






[VOTE] Release Apache Spark 1.4.1

2015-06-23 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.1!

This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1

The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
60e08e50751fe3929156de956d62faea79f5b801

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.1]
https://repository.apache.org/content/repositories/orgapachespark-1118/
[published as version: 1.4.1-rc1]
https://repository.apache.org/content/repositories/orgapachespark-1119/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

Please vote on releasing this package as Apache Spark 1.4.1!

The vote is open until Saturday, June 27, at 06:32 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/




Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-13 Thread Patrick Wendell
Yeah so Steve, hopefully it's self-evident, but that is a perfect
example of the kind of annoying stuff we don't want to force users to
deal with by forcing an upgrade to 2.X. Consider the pain for Spark
users trying to reason about what to do (and btw it seems like the
answer is simply that there isn't a good answer). And that will be
experienced by every Spark user who uses AWS and the Spark ec2
scripts, which are extremely popular.

Is this pain, in aggregate, more than our cost of having a few patches
to deal with runtime reflection stuff to make things work with Hadoop
1? My feeling is that it's much more efficient for us as the Spark
maintainers to pay this cost rather than to force a lot of our users
to deal with painful upgrades.

On Sat, Jun 13, 2015 at 1:39 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 12 Jun 2015, at 17:12, Patrick Wendell pwend...@gmail.com wrote:

  For instance at Databricks we use
 the FileSystem library for talking to S3... every time we've tried to
 upgrade to Hadoop 2.X there have been significant regressions in
 performance and we've had to downgrade. That's purely anecdotal, but I
 think you have people out there using the Hadoop 1 bindings for whom
 upgrade would be a pain.

 ah s3n. The unloved orphan FS, which has been fairly neglected as being 
 non-strategic to anyone but Amazon, who have a private fork.

 s3n broke in Hadoop 2.4, where the upgraded Jets3t went in with some patch
 which swallowed exceptions (nobody should ever do that) and as a result would
 NPE on a seek(0) of a file of length 0 (HADOOP-10457). Fixed in Hadoop 2.5.

 Hadoop 2.6 has left s3n on maintenance out of fear of breaking more things;
 future work is in s3a://, which switched to the Amazon AWS toolkit JAR and
 moved the implementation to the hadoop-aws JAR. S3a promises speed,
 partitioned upload, and better auth.
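
 [To make the split concrete: the URL scheme selects the connector, so the
 same job can be pointed at either implementation without code changes. A
 minimal sketch, with an invented bucket name:

   // s3n: the older Jets3t-based connector, maintenance-only from Hadoop 2.6
   val viaS3n = sc.textFile("s3n://some-bucket/logs/")
   // s3a: the aws-sdk-based connector in hadoop-aws, usable from Hadoop 2.7+
   val viaS3a = sc.textFile("s3a://some-bucket/logs/")
 ]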

 But: it's not ready for serious use in Hadoop 2.6, so don't try. You need the 
 Hadoop 2.7 patches, which are in ASF Hadoop 2.7, will be in HDP2.3, and have 
 been picked up in CDH5.3. (HADOOP-11571). For Spark, the fact that the block 
 size is being returned as 0 in getFileStatus() could be the killer.

 Future work is going to improve performance and scale ( HADOOP-11694 )

 Now, if spark is finding problems with s3a performance, tests for this would 
 be great -complaints on JIRAs too. There's not enough functional testing of 
 analytics workloads against the object stores, especially s3 and swift. If 
 someone volunteers to add some optional test module for object store testing, 
 I'll help review it and suggest some tests to generate stress

 That can be done without the leap to Hadoop 2 - though the proposed
 HADOOP-9565 work, allowing object stores to declare what they are and publish
 some of their consistency and atomicity semantics, will be Hadoop 2.8+. If you
 want your output committers to recognise when the destination is an
 eventually consistent object store with O(n) directory rename and delete,
 that's where the code will be.




Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Patrick Wendell
I feel this is quite different from the Java 6 decision and personally
I don't see sufficient cause to do it.

I would like to understand though Sean - what is the proposal exactly?
Hadoop 2 itself supports all of the Hadoop 1 API's, so things like
removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think
that makes much sense since so many libraries still use those API's.
For YARN support, we already don't support Hadoop 1. So I'll assume
what you mean is to prevent or stop supporting from linking against
the Hadoop 1 filesystem binaries at runtime (is that right?).
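
[For concreteness, the two co-existing variants look roughly like this (the
paths are invented; Hadoop 2 ships both packages, which is why removing the
Hadoop 1-style calls would break so many libraries):

  import org.apache.hadoop.io.{LongWritable, Text}
  // Hadoop 1-era org.apache.hadoop.mapred API
  val oldApi = sc.hadoopFile[LongWritable, Text,
    org.apache.hadoop.mapred.TextInputFormat]("hdfs:///data/input")
  // Hadoop 2-era org.apache.hadoop.mapreduce API
  val newApi = sc.newAPIHadoopFile[LongWritable, Text,
    org.apache.hadoop.mapreduce.lib.input.TextInputFormat]("hdfs:///data/input")
]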

The main reason I'd push back is that I do think there are still
people running the older versions. For instance at Databricks we use
the FileSystem library for talking to S3... every time we've tried to
upgrade to Hadoop 2.X there have been significant regressions in
performance and we've had to downgrade. That's purely anecdotal, but I
think you have people out there using the Hadoop 1 bindings for whom
upgrade would be a pain.

In terms of our maintenance cost, to me the much bigger cost for us
IMO is dealing with differences between e.g. 2.2, 2.4, and 2.6 where
major new API's were added. In comparison the Hadoop 1 vs 2 seems
fairly low with just a few bugs cropping up here and there. So unlike
Java 6 where you have a critical mass of maintenance issues, security
issues, etc, I just don't see as compelling a cost here.

To me the framework for deciding about these upgrades is the
maintenance cost vs the inconvenience for users.

- Patrick

On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 I'm personally in favor, but I don't have a sense of how many people still
 rely on Hadoop 1.

 Nick

 On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran ste...@hortonworks.com wrote:

 +1 for 2.2+

 Not only are the APIs in Hadoop 2 better, there are more people testing
 Hadoop 2.x & Spark, and bugs in Hadoop itself being fixed.

 (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)

  On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
 
  How does the idea of removing support for Hadoop 1.x for Spark 1.5
  strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
  consistent with the modern 2.x line than 2.1 or 2.0.
 
  The arguments against are simply, well, someone out there might be
  using these versions.
 
  The arguments for are just simplification -- fewer gotchas in trying
  to keep supporting older Hadoop, of which we've seen several lately.
  We get to chop out a little bit of shim code and update to use some
  non-deprecated APIs. Along with removing support for Java 6, it might
  be a reasonable time to also draw a line under older Hadoop too.
 
  I'm just gauging feeling now: for, against, indifferent?
  I favor it, but would not push hard on it if there are objections.
 
 








[ANNOUNCE] Announcing Spark 1.4

2015-06-11 Thread Patrick Wendell
Hi All,

I'm happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is
the fifth release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 210 developers and more
than 1,000 commits!

A huge thanks go to all of the individuals and organizations involved
in development and testing of this release.

Visit the release notes [1] to read about the new features, or
download [2] the release today.

For errata in the contributions or release notes, please e-mail me
*directly* (not on-list).

Thanks to everyone who helped work on this release!

[1] http://spark.apache.org/releases/spark-release-1-4-0.html
[2] http://spark.apache.org/downloads.html




[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-10 Thread Patrick Wendell
This vote passes! Thanks to everyone who voted. I will get the release
artifacts and notes up within a day or two.

+1 (23 votes):
Reynold Xin*
Patrick Wendell*
Matei Zaharia*
Andrew Or*
Timothy Chen
Calvin Jia
Burak Yavuz
Krishna Sankar
Hari Shreedharan
Ram Sriharsha*
Kousuke Saruta
Sandy Ryza
Marcelo Vanzin
Bobby Chowdary
Mark Hamstra
Guoqiang Li
Joseph Bradley
Sean McNamara
Tathagata Das*
Ajay Singal
Wang, Daoyuan
Denny Lee
Forest Fang

0:

-1:

* Binding

On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 22596c534a38cfdda91aef18aa9037ab101e4251

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-/
 [published as version: 1.4.0-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1112/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Saturday, June 06, at 05:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What has changed since RC3 ==
 In addition to many smaller fixes, three blocker issues were fixed:
 4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
 metadataHive get constructed too early
 6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
 78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.




Re: Jcenter / bintray support for spark packages?

2015-06-10 Thread Patrick Wendell
Hey Hector,

It's not a bad idea. I think we'd want to do this by virtue of
allowing custom repositories, so users can add bintray or others.

- Patrick
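
[If that lands, usage would presumably mirror the existing --packages flow
with an extra resolver; the coordinates and repository URL below are invented
for illustration:

  spark-shell --packages com.airbnb:aerosolve:0.1.0 \
    --repositories https://dl.bintray.com/airbnb/aerosolve
]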

On Wed, Jun 10, 2015 at 6:23 PM, Hector Yee hector@gmail.com wrote:
 Hi Spark devs,

 Is it possible to add jcenter or bintray support for Spark packages?

 I'm trying to add our artifact which is on jcenter

 https://bintray.com/airbnb/aerosolve

 but I noticed in Spark packages it only accepts Maven coordinates.

 --
 Yee Yang Li Hector
 google.com/+HectorYee






Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-08 Thread Patrick Wendell
Hi All,

Thanks for the continued voting! I'm going to leave this thread open
for another few days to continue to collect feedback.

- Patrick

On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 22596c534a38cfdda91aef18aa9037ab101e4251

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-/
 [published as version: 1.4.0-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1112/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Saturday, June 06, at 05:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What has changed since RC3 ==
 In addition to many smaller fixes, three blocker issues were fixed:
 4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
 metadataHive get constructed too early
 6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
 78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.




Re: Scheduler question: stages with non-arithmetic numbering

2015-06-07 Thread Patrick Wendell
Hey Mike,

Stage ID's are not guaranteed to be sequential because of the way the
DAG scheduler works (only increasing). In some cases stage ID numbers
are skipped when stages are generated.

Any stage/ID that appears in the Spark UI is an actual stage, so if
you see ID's in there, but they are not in the logs, then let us know
(that would be a bug).

- Patrick
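
[A small sketch that reproduces the effect, assuming a live SparkContext sc;
the exact IDs are illustrative:

  val grouped = sc.parallelize(1 to 100).map(x => (x % 10, x)).groupByKey()
  grouped.count()  // first job: e.g. stage 0 (shuffle map) and stage 1 (result)
  grouped.count()  // second job: a new map stage (e.g. stage 2) is generated,
                   // but its shuffle output already exists, so it is skipped;
                   // only e.g. stage 3 runs, and stage 2 never logs "finished"
]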

On Sun, Jun 7, 2015 at 9:06 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
 Are you seeing the same behavior on the driver UI? (that running on port
 4040), If you click on the stage id header you can sort the stages based on
 IDs.

 Thanks
 Best Regards

 On Fri, Jun 5, 2015 at 10:21 PM, Mike Hynes 91m...@gmail.com wrote:

 Hi folks,

 When I look at the output logs for an iterative Spark program, I see
 that the stage IDs are not arithmetically numbered---that is, there
 are gaps between stages and I might find log information about Stage
 0, 1, 2, 5, but not 3 or 4.

 As an example, the output from the Spark logs below shows what I mean:

 # grep -rE "Stage [[:digit:]]+" spark_stderr | grep finished
 12048:INFO:DAGScheduler:Stage 0 (mapPartitions at blockMap.scala:1444)
 finished in 7.820 s:
 15994:INFO:DAGScheduler:Stage 1 (map at blockMap.scala:1810) finished
 in 3.874 s:
 18291:INFO:DAGScheduler:Stage 2 (count at blockMap.scala:1179)
 finished in 2.237 s:
 20121:INFO:DAGScheduler:Stage 4 (map at blockMap.scala:1817) finished
 in 1.749 s:
 21254:INFO:DAGScheduler:Stage 5 (count at blockMap.scala:1180)
 finished in 1.082 s:
 23422:INFO:DAGScheduler:Stage 7 (map at blockMap.scala:1810) finished
 in 2.078 s:
 24773:INFO:DAGScheduler:Stage 8 (count at blockMap.scala:1188)
 finished in 1.317 s:
 26455:INFO:DAGScheduler:Stage 10 (map at blockMap.scala:1817) finished
 in 1.638 s:
 27228:INFO:DAGScheduler:Stage 11 (count at blockMap.scala:1189)
 finished in 0.732 s:
 27494:INFO:DAGScheduler:Stage 14 (foreach at blockMap.scala:1302)
 finished in 0.192 s:
 27709:INFO:DAGScheduler:Stage 17 (foreach at blockMap.scala:1302)
 finished in 0.170 s:
 28018:INFO:DAGScheduler:Stage 20 (count at blockMap.scala:1201)
 finished in 0.270 s:
 28611:INFO:DAGScheduler:Stage 23 (map at blockMap.scala:1355) finished
 in 0.455 s:
 29598:INFO:DAGScheduler:Stage 24 (count at blockMap.scala:274)
 finished in 0.928 s:
 29954:INFO:DAGScheduler:Stage 27 (map at blockMap.scala:1355) finished
 in 0.305 s:
 30390:INFO:DAGScheduler:Stage 28 (count at blockMap.scala:275)
 finished in 0.391 s:
 30452:INFO:DAGScheduler:Stage 32 (first at
 MatrixFactorizationModel.scala:60) finished in 0.028 s:
 30506:INFO:DAGScheduler:Stage 36 (first at
 MatrixFactorizationModel.scala:60) finished in 0.023 s:

 Can anyone comment on this being normal behavior? Is it indicative of
 faults causing stages to be resubmitted? I also cannot find the
 missing stages in any stage's parent List(Stage x, Stage y, ...)

 Thanks,
 Mike


 On 6/1/15, Reynold Xin r...@databricks.com wrote:
  Thanks, René. I actually added a warning to the new JDBC reader/writer
  interface for 1.4.0.
 
  Even with that, I think we should support throttling JDBC; otherwise
  it's
  too convenient for our users to DOS their production database servers!
 
 
  /**
   * Construct a [[DataFrame]] representing the database table accessible
   * via JDBC URL `url` named `table`. Partitions of the table will be
   * retrieved in parallel based on the parameters passed to this function.
   *
   * Don't create too many partitions in parallel on a large cluster;
   * otherwise Spark might crash your external database systems.
   *
   * @param url JDBC database url of the form `jdbc:subprotocol:subname`
   * @param table Name of the table in the external database.
   * @param columnName the name of a column of integral type that will be
   *                   used for partitioning.
   * @param lowerBound the minimum value of `columnName` used to decide
   *                   partition stride
   * @param upperBound the maximum value of `columnName` used to decide
   *                   partition stride
   * @param numPartitions the number of partitions. The range
   *                      `minValue`-`maxValue` will be split evenly into
   *                      this many partitions
   * @param connectionProperties JDBC database connection arguments, a list
   *                             of arbitrary string tag/value. Normally at
   *                             least a "user" and "password" property
   *                             should be included.
   *
   * @since 1.4.0
   */
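
  [A hedged usage sketch of the call documented above, against the 1.4-era
  DataFrameReader; the URL, table name, and bounds are invented:

    import java.util.Properties
    val props = new Properties()
    props.setProperty("user", "spark")
    props.setProperty("password", "secret")
    // 8 partitions => at most 8 concurrent JDBC connections; keep this small
    // relative to what the database can absorb, per the warning above
    val df = sqlContext.read.jdbc(
      "jdbc:postgresql://dbhost:5432/warehouse", "events",
      "id", 0L, 1000000L, 8, props)
  ]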
 
 
  On Mon, Jun 1, 2015 at 1:54 AM, René Treffer rtref...@gmail.com wrote:
 
  Hi,
 
  I'm using sqlContext.jdbc(uri, table, where).map(_ => 1).aggregate(0)(_+_, _+_)
  on an interactive shell (where `where` is an Array[String] of 32 to 48
  elements). (The code is tailored to your db, specifically through the where
  conditions; I'd otherwise have posted it.)
  That should be the DataFrame API, but I'm just trying to load
  everything
  and discard it as soon as possible :-)
 
  (1) Never do a silent drop of the values by default: 

[DISCUSS] Minimize use of MINOR, BUILD, and HOTFIX w/ no JIRA

2015-06-06 Thread Patrick Wendell
Hey All,

Just a request here - it would be great if people could create JIRA's
for any and all merged pull requests. The reason is that when patches
get reverted due to build breaks or other issues, it is very difficult
to keep track of what is going on if there is no JIRA. Here is a list
of 5 patches we had to revert recently that didn't include a JIRA:

Revert "[MINOR] [BUILD] Use custom temp directory during build."
Revert "[SQL] [TEST] [MINOR] Uses a temporary log4j.properties in
HiveThriftServer2Test to ensure expected logging behavior"
Revert "[BUILD] Always run SQL tests in master build."
Revert "[MINOR] [CORE] Warn users who try to cache RDDs with
dynamic allocation on."
Revert "[HOT FIX] [YARN] Check whether `/lib` exists before
listing its files"

The cost overhead of creating a JIRA relative to other aspects of
development is very small. If it's *really* a documentation change or
something small, that's okay.

But anything affecting the build, packaging, etc. These all need to
have a JIRA to ensure that follow-up can be well communicated to all
Spark developers.
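
For example, rather than merging "[HOT FIX] [YARN] Check whether `/lib`
exists before listing its files" with no ticket, file a JIRA first and merge
it as "[SPARK-XXXX] [YARN] Check whether `/lib` exists before listing its
files", with SPARK-XXXX standing in for the real issue number, so a later
revert still points at something trackable.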

Hopefully this is something everyone can get behind, but opened a
discussion here in case others feel differently.

- Patrick




Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Patrick Wendell
I will give +1 as well.

On Wed, Jun 3, 2015 at 11:59 PM, Reynold Xin r...@databricks.com wrote:
 Let me give you the 1st

 +1



 On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
 exact commit and all other information is correct. (thanks Shivaram
 who pointed this out).

 On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
  1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  22596c534a38cfdda91aef18aa9037ab101e4251
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
  https://repository.apache.org/content/repositories/orgapachespark-/
  [published as version: 1.4.0-rc4]
  https://repository.apache.org/content/repositories/orgapachespark-1112/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Saturday, June 06, at 05:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
  4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
  metadataHive get constructed too early
  6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
  78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by
  taking a Spark 1.3 workload and running on this release candidate,
  then reporting any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period,
  so -1 votes should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related
  to new features will not block this release.







[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-02 Thread Patrick Wendell
This vote is cancelled in favor of RC4.

Thanks everyone for the thorough testing of this RC. We are really
close, but there were a few blockers found. I've cut a new RC to
incorporate those issues.

The following patches were merged during the RC3 testing period:

(blockers)
4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
metadataHive get constructed too early
6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

(other fixes)
9d6475b [SPARK-6917] [SQL] DecimalType is not read back when
non-native type exists
97d4cd0 [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output
cbaf595 [SPARK-8014] [SQL] Avoid premature metadata discovery when
writing a HadoopFsRelation with a save mode other than Append
fa292dc [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink.
f71a09d [SPARK-8037] [SQL] Ignores files whose name starts with dot in
HadoopFsRelation
292ee1a [SPARK-8021] [SQL] [PYSPARK] make Python read/write API
consistent with Scala
87941ff [SPARK-8023][SQL] Add deterministic attribute to Expression
to avoid collapsing nondeterministic projects.
e6d5895 [SPARK-7965] [SPARK-7972] [SQL] Handle expressions containing
multiple window expressions and make parser match window frames in
case insensitive way
8ac2376 [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API
efc0e05 [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead
of null for pairs that don't appear
cbfb682a [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR
a7c8b00 [SPARK-7958] [STREAMING] Handled exception in
StreamingContext.start() to prevent leaking of actors
a76c2e1 [SPARK-7899] [PYSPARK] Fix Python 3 pyspark/sql/types module conflict
f1d4e7e [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.
01f38f7 [SPARK-7979] Enforce structural type checker.
2c45009 [SPARK-7459] [MLLIB] ElementwiseProduct Java example
8938a74 [SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
1513cff [SPARK-7957] Preserve partitioning when using randomSplit
9a88be1 [SPARK-6013] [ML] Add more Python ML examples for spark.ml
2bd4460 [SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init

On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-1109/
 [published as version: 1.4.0-rc3]
 https://repository.apache.org/content/repositories/orgapachespark-1110/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Tuesday, June 02, at 00:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What has changed since RC1 ==
 Below is a list of bug fixes that went into this RC:
 http://s.apache.org/vN

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.




[VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-02 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
22596c534a38cfdda91aef18aa9037ab101e4251

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.0]
https://repository.apache.org/content/repositories/orgapachespark-/
[published as version: 1.4.0-rc4]
https://repository.apache.org/content/repositories/orgapachespark-1112/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Saturday, June 06, at 05:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What has changed since RC3 ==
In addition to many smaller fixes, three blocker issues were fixed:
4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
metadataHive get constructed too early
6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from 1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block this release.




Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-02 Thread Patrick Wendell
Hey all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
exact commit and all other information is correct. (thanks Shivaram
who pointed this out).

On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 22596c534a38cfdda91aef18aa9037ab101e4251

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-/
 [published as version: 1.4.0-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1112/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Saturday, June 06, at 05:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What has changed since RC3 ==
 In addition to many smaller fixes, three blocker issues were fixed:
 4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
 metadataHive get constructed too early
 6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
 78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.




Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Patrick Wendell
Hey Bobby,

Those are generic warnings that the hadoop libraries throw. If you are
using MapRFS they shouldn't matter since you are using the MapR client
and not the default hadoop client.

Do you have any issues with functionality... or was it just seeing the
warnings that was the concern?

Thanks for helping test!

- Patrick

On Mon, Jun 1, 2015 at 5:18 PM, Bobby Chowdary
bobby.chowdar...@gmail.com wrote:
 Hive Context works on RC3 for MapR after adding
 spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819. However,
 there still seem to be some other issues with native libraries; I get the
 warning below:
 WARN NativeCodeLoader: Unable to load native-hadoop library for your
 platform... using builtin-java classes where applicable
 It persists even after adding SPARK_LIBRARYPATH and --driver-library-path,
 with no luck.

 Built on MacOSX and running CentOS 7 JDK1.6 and JDK 1.8 (tried both)

  make-distribution.sh --tgz --skip-java-test -Phive -Phive-0.13.1 -Pmapr4
 -Pnetlib-lgpl -Phive-thriftserver.


 On Mon, Jun 1, 2015 at 3:05 PM, Sean Owen so...@cloudera.com wrote:

 I get a bunch of failures in VersionSuite with build/test params
 -Pyarn -Phive -Phadoop-2.6:

 - success sanity check *** FAILED ***
   java.lang.RuntimeException: [download failed:
 org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed:
 commons-net#commons-net;3.1!commons-net.jar]
   at
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:978)

 ... but maybe I missed the memo about how to build for Hive? Do I
 still need another Hive profile?

 Other tests, signatures, etc look good.

 On Sat, May 30, 2015 at 12:40 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
  1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
 
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
  https://repository.apache.org/content/repositories/orgapachespark-1109/
  [published as version: 1.4.0-rc3]
  https://repository.apache.org/content/repositories/orgapachespark-1110/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Tuesday, June 02, at 00:32 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What has changed since RC1 ==
  Below is a list of bug fixes that went into this RC:
  http://s.apache.org/vN
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by
  taking a Spark 1.3 workload and running on this release candidate,
  then reporting any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period,
  so -1 votes should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related
  to new features will not block this release.
 
 







[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-29 Thread Patrick Wendell
Thanks for all the discussion on the vote thread. I am canceling this
vote in favor of RC3.

On Sun, May 24, 2015 at 12:22 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=03fb26a3e50e00739cc815ba4e2e82d71d003168

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-1103/
 [published as version: 1.4.0-rc2]
 https://repository.apache.org/content/repositories/orgapachespark-1104/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Wednesday, May 27, at 08:12 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What has changed since RC1 ==
 Below is a list of bug fixes that went into this RC:
 http://s.apache.org/U1M

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.




[VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.0]
https://repository.apache.org/content/repositories/orgapachespark-1109/
[published as version: 1.4.0-rc3]
https://repository.apache.org/content/repositories/orgapachespark-1110/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Tuesday, June 02, at 00:32 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What has changed since RC1 ==
Below is a list of bug fixes that went into this RC:
http://s.apache.org/vN

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running it on this release candidate,
then reporting any regressions.
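
A minimal sketch of such a smoke-test workload, in case it helps (an
illustration only, not part of the release process; it assumes the Spark 1.x
RDD API and an input path passed as the first argument via spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    object RcSmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rc-smoke-test"))
        // Classic word count: any divergence from the results your 1.3
        // deployment produces on the same input is worth reporting.
        val counts = sc.textFile(args(0))
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.take(10).foreach(println)
        sc.stop()
      }
    }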

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from 1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-27 Thread Patrick Wendell
Hi James,

As I said before, that is not a blocker issue for this release, thanks.
Separately, there are some comments on this code review that indicate
you may be facing a bug in your own code rather than in Spark:

https://github.com/apache/spark/pull/5688#issuecomment-104491410

Please follow up on that issue outside of the vote thread.

Thanks!

On Wed, May 27, 2015 at 5:22 PM, jameszhouyi yiaz...@gmail.com wrote:
 -1, SPARK-7119 is a blocker issue



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-4-0-RC2-tp12420p12472.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-24 Thread Patrick Wendell
Hey jameszhouyi,

Since SPARK-7119 is not a regression from earlier versions, we won't
hold the release for it. However, please comment on the JIRA if it is
affecting you... it will help us prioritize the bug.

- Patrick

On Fri, May 22, 2015 at 8:41 PM, jameszhouyi yiaz...@gmail.com wrote:
 We came across a Spark SQL issue
 (https://issues.apache.org/jira/browse/SPARK-7119) that causes queries to fail.
 I am not sure whether I should vote -1 on this RC1.



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-4-0-RC1-tp12321p12403.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-24 Thread Patrick Wendell
This vote is cancelled in favor of RC2.

On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running it on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-24 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=03fb26a3e50e00739cc815ba4e2e82d71d003168

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.0]
https://repository.apache.org/content/repositories/orgapachespark-1103/
[published as version: 1.4.0-rc2]
https://repository.apache.org/content/repositories/orgapachespark-1104/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Wednesday, May 27, at 08:12 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What has changed since RC1 ==
Below is a list of bug fixes that went into this RC:
http://s.apache.org/U1M

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running it on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from 1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[ANNOUNCE] Nightly maven and package builds for Spark

2015-05-24 Thread Patrick Wendell
Hi All,

This week I got around to setting up nightly builds for Spark on
Jenkins. I'd like feedback on these, and if it's going well I can merge
the relevant automation scripts into the Spark mainline and document them
on the website. Right now I'm doing:

1. SNAPSHOT's of Spark master and release branches published to ASF
Maven snapshot repo:

https://repository.apache.org/content/repositories/snapshots/org/apache/spark/

These are usable by adding this repository to your build and using a
snapshot version (e.g. 1.3.2-SNAPSHOT).
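
A minimal sketch of consuming these snapshots from an sbt build (an
illustration, not official guidance; assumes sbt with Scala 2.10, and the
version string is just an example of the SNAPSHOT naming scheme):

    // build.sbt
    resolvers += "ASF snapshots" at
      "https://repository.apache.org/content/repositories/snapshots/"

    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "1.3.2-SNAPSHOT" % "provided"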

2. Nightly binary package builds and doc builds of master and release versions.

http://people.apache.org/~pwendell/spark-nightly/

These build 4 times per day and are tagged based on commits.

If anyone has feedback on these please let me know.

Thanks!
- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [IMPORTANT] Committers please update merge script

2015-05-23 Thread Patrick Wendell
Thanks Ted - there is no need for people to upgrade at this point,
since the changes to the script just modify it so that it no longer
relies on the default behavior.

On Sat, May 23, 2015 at 7:06 AM, Ted Yu yuzhih...@gmail.com wrote:
 INFRA-9646 has been resolved.

 FYI

 On Wed, May 13, 2015 at 6:00 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hi All - unfortunately the fix introduced another bug, which is that
 fixVersion was not updated properly. I've updated the script and had
 one other person test it.

 So committers please pull from master again thanks!

 - Patrick

 On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Due to an ASF infrastructure change (bug?) [1], the default JIRA
  resolution status has switched to 'Pending Closed'. I've made a change
  to our merge script to coerce the correct status of 'Fixed' when
  resolving [2]. Please update the merge script from master.
 
  I've manually corrected JIRAs that were closed with the incorrect
  status. Let me know if you have any issues.
 
  [1] https://issues.apache.org/jira/browse/INFRA-9646
 
  [2]
  https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark packages

2015-05-23 Thread Patrick Wendell
Yes - Spark packages can include non-ASF licenses.

On Sat, May 23, 2015 at 6:16 PM, Debasish Das debasish.da...@gmail.com wrote:
 Hi,

 Is it possible to add GPL/LGPL code to Spark packages, or must it be licensed
 under Apache as well?

 I want to expose Professor Tim Davis's LGPL library for sparse algebra and
 the ECOS GPL library through the package.

 Thanks.
 Deb

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Patrick Wendell
Thanks Andrew, the doc issue should be fixed in RC2 (if not, please
chime in!). R was missing from the build environment.

- Patrick

On Fri, May 22, 2015 at 3:33 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Thanks for catching this. I'll check with Patrick to see why the R API docs
 are not getting included.

 On Fri, May 22, 2015 at 2:44 PM, Andrew Psaltis psaltis.and...@gmail.com
 wrote:

 All,
 Should all the docs work from
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so, the R API
 docs 404.


 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found
 at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running it on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org






-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Hi all,

I've created another release repository where the release is
identified with the version 1.4.0-rc1:

https://repository.apache.org/content/repositories/orgapachespark-1093/
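
For anyone who wants to build against it without polluting their local cache
with artifacts that claim to be the final 1.4.0, a hedged sbt sketch (the
coordinates are inferred from the message above; adjust as needed):

    // build.sbt
    resolvers += "Spark 1.4.0-rc1 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1093/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0-rc1"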

On Tue, May 19, 2015 at 5:36 PM, Krishna Sankar ksanka...@gmail.com wrote:
 Quick tests from my side - looks OK. The results are the same or very similar
 to 1.3.1. Will add DataFrames et al. in future tests.

 +1 (non-binding, of course)

 1. Compiled OSX 10.10 (Yosemite) OK. Total time: 17:42 min
  mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
  -Dhadoop.version=2.6.0 -Phive -DskipTests
 2. Tested pyspark, MLlib - ran them and compared results with 1.3.1
 2.1. statistics (min, max, mean, Pearson, Spearman) OK
 2.2. Linear/Ridge/Lasso Regression OK
 2.3. Decision Tree, Naive Bayes OK
 2.4. KMeans OK
    Center and Scale OK
 2.5. RDD operations OK
    State of the Union Texts - MapReduce, Filter, sortByKey (word count)
 2.6. Recommendation (MovieLens medium dataset, ~1M ratings) OK
    Model evaluation/optimization (rank, numIter, lambda) with itertools OK

 Cheers
 k/

 On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running it on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Punya,

Let me see if I can publish these under rc1 as well. In the future
this will all be automated, but currently it's a somewhat manual task.

- Patrick

On Tue, May 19, 2015 at 9:32 AM, Punyashloka Biswal
punya.bis...@gmail.com wrote:
 When publishing future RCs to the staging repository, would it be possible
 to use a version number that includes the rc1 designation? In the current
 setup, when I run a build against the artifacts at
 https://repository.apache.org/content/repositories/orgapachespark-1092/org/apache/spark/spark-core_2.10/1.4.0/,
 my local Maven cache will get polluted with things that claim to be 1.4.0
 but aren't. It would be preferable for the version number to be 1.4.0-rc1
 instead.

 Thanks!
 Punya


 On Tue, May 19, 2015 at 12:20 PM Sean Owen so...@cloudera.com wrote:

 Before I vote, I wanted to point out there are still 9 Blockers for 1.4.0.
 I'd like to use this status to really mean 'must happen before the release'.
 Many of these may already be fixed, or aren't really blockers -- they can
 just be updated accordingly.

 I bet at least one will require further work if it's really meant for 1.4,
 so all this means is there is likely to be another RC. We should still kick
 the tires on RC1.

 (I also assume we should be extra conservative about what is merged into
 1.4 at this point.)


 SPARK-6784 SQL Clean up all the inbound/outbound conversions for DateType
 Adrian Wang

 SPARK-6811 SparkR Building binary R packages for SparkR Shivaram
 Venkataraman

 SPARK-6941 SQL Provide a better error message to explain that tables
 created from RDDs are immutable
 SPARK-7158 SQL collect and take return different results
 SPARK-7478 SQL Add a SQLContext.getOrCreate to maintain a singleton
 instance of SQLContext Tathagata Das

 SPARK-7616 SQL Overwriting a partitioned parquet table corrupt data Cheng
 Lian

 SPARK-7654 SQL DataFrameReader and DataFrameWriter for input/output API
 Reynold Xin

 SPARK-7662 SQL Exception of multi-attribute generator analysis in
 projection

 SPARK-7713 SQL Use shared broadcast hadoop conf for partitioned table
 scan. Yin Huai


 On Tue, May 19, 2015 at 5:10 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running it on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



branch-1.4 merge etiquette

2015-05-19 Thread Patrick Wendell
Hey All,

Since we are now voting, please tread very carefully with branch-1.4 merges.

For instance, bug fixes that don't represent regressions from 1.3.X
probably shouldn't be merged unless they are extremely simple
and well reviewed.

As usual mature/core components (e.g. Spark core) are more sensitive
than newer/edge ones (e.g. Dataframes).

I'm happy to provide guidance to people if they are on the fence about
patches. Ultimately this ends up being a matter of judgement and
assessing the risk of specific patches. Just ping me on GitHub.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc1 (commit 777a081):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1092/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Friday, May 22, at 17:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running it on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from 1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
A couple of other process things:

1. Please *keep voting* (+1/-1) on this thread even if we find some
issues, until we cut RC2. This lets us pipeline the QA.
2. The SQL team owes a JIRA clean-up (forthcoming shortly)... there
are still a few Blockers that aren't.


On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running it on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org


