[RESULT] [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-10 Thread Patrick Wendell
This vote passes with 13 +1 votes (6 binding) and no 0 or -1 votes:

+1 (13):
Patrick Wendell*
Marcelo Vanzin
Krishna Sankar
Sean Owen*
Matei Zaharia*
Sandy Ryza
Tom Graves*
Sean McNamara*
Denny Lee
Kostas Sakellis
Joseph Bradley*
Corey Nolet
GuoQiang Li

0:
-1:

I will finalize the release notes and packaging and will post the
release in the next two days.

- Patrick

On Mon, Mar 9, 2015 at 11:51 PM, GuoQiang Li  wrote:
> I'm sorry, this is my mistake. :)
>
>
> -- Original Message --
> From: "Patrick Wendell";
> Date: Tuesday, March 10, 2015, 2:20 PM
> To: "GuoQiang Li";
> 主题: Re: [VOTE] Release Apache Spark 1.3.0 (RC3)
>
> Thanks! But please e-mail the dev list and not just me personally :)
>
> On Mon, Mar 9, 2015 at 11:08 PM, GuoQiang Li  wrote:
>> +1 (non-binding)
>>
>> Test on Mac OS X 10.10.2 and CentOS 6.5
>>
>>
>> -- Original --
>> From: "Patrick Wendell";
>> Date:  Fri, Mar 6, 2015 10:52 AM
>> To:  "dev@spark.apache.org";
>> Subject:  [VOTE] Release Apache Spark 1.3.0 (RC3)
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.3.0!
>>
>> The tag to be voted on is v1.3.0-rc3 (commit 4aaf48d4):
>>
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-1.3.0-rc3/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> Staging repositories for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1078
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.3.0!
>>
>> The vote is open until Monday, March 09, at 02:52 UTC and passes if
>> a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> == How does this compare to RC2 ==
>> This release includes the following bug fixes:
>>
>> https://issues.apache.org/jira/browse/SPARK-6144
>> https://issues.apache.org/jira/browse/SPARK-6171
>> https://issues.apache.org/jira/browse/SPARK-5143
>> https://issues.apache.org/jira/browse/SPARK-6182
>> https://issues.apache.org/jira/browse/SPARK-6175
>>
>> == How can I help test this release? ==
>> If you are a Spark user, you can help us test this release by
>> taking a Spark 1.2 workload and running it on this release candidate,
>> then reporting any regressions.
>>
>> If you are happy with this release based on your own testing, give a +1
>> vote.
>>
>> == What justifies a -1 vote for this release? ==
>> This vote is happening towards the end of the 1.3 QA period,
>> so -1 votes should only occur for significant regressions from 1.2.1.
>> Bugs already present in 1.2.X, minor regressions, or bugs related
>> to new features will not block this release.
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Corey Nolet
+1 (non-binding)

- Verified signatures
- Built on Mac OS X and Fedora 21.
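
For anyone scripting the digest half of that check, a minimal Scala sketch
(the tarball name is illustrative; compare the output against the digest file
published alongside the artifact):

    import java.nio.file.{Files, Paths}
    import java.security.MessageDigest

    object VerifyDigest {
      def main(args: Array[String]): Unit = {
        // Read the downloaded release tarball and compute its SHA-512 digest.
        val bytes  = Files.readAllBytes(Paths.get("spark-1.3.0.tgz"))
        val digest = MessageDigest.getInstance("SHA-512").digest(bytes)
        val hex    = digest.map("%02x".format(_)).mkString
        println(s"SHA-512: $hex")
      }
    }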

On Mon, Mar 9, 2015 at 11:01 PM, Krishna Sankar  wrote:

> Excellent, thanks Xiangrui. The mystery is solved.
> Cheers
> 
>
>
> On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng  wrote:
>
> > Krishna, I tested your linear regression example. For linear
> > regression, we changed its objective function from 1/n * \|Ax - b\|_2^2
> > to 1/(2n) * \|Ax - b\|_2^2 to be consistent with common least
> > squares formulations. It means you could reproduce the same result by
> > multiplying the step size by 2. This is not a problem if both run
> > until convergence (provided neither blows up). However, in your example, a very
> > small step size was chosen and it didn't converge in 100 iterations. In
> > this case, the step size matters. I will put a note in the migration
> > guide. Thanks! -Xiangrui
> >
> > On Mon, Mar 9, 2015 at 1:38 PM, Sean Owen  wrote:
> > > I'm +1 as I have not heard of anyone else seeing the Hive test
> > > failure, which is likely a test issue rather than a code issue anyway,
> > > and not a blocker.
> > >
> > > On Fri, Mar 6, 2015 at 9:36 PM, Sean Owen  wrote:
> > >> Although the problem is small, especially if indeed the essential docs
> > >> changes are following just a couple days behind the final release, I
> > >> mean, why the rush if they're essential? Wait a couple days, finish
> > >> them, make the release.
> > >>
> > >> Answer is, I think these changes aren't actually essential given the
> > >> comment from tdas, so: just mark these Critical? (although ... they do
> > >> say they're changes for the 1.3 release, so kind of funny to get to
> > >> them for 1.3.x or 1.4, but that's not important now.)
> > >>
> > >> I thought that Blocker really meant Blocker in this project, as I've
> > >> been encouraged to use it to mean "don't release without this." I
> > >> think we should use it that way. Just thinking of it as "extra
> > >> Critical" doesn't add anything. I don't think Documentation should be
> > >> special-cased as less important, and I don't think there's confusion
> > >> if Blocker means what it says, so I'd 'fix' that way.
> > >>
> > >> If nobody sees the Hive failure I observed, and if we can just zap
> > >> those "Blockers" one way or the other, +1
> > >>
> > >>
> > >> On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell 
> > wrote:
> > >>> Sean,
> > >>>
> > >>> The docs are distributed and consumed in a fundamentally different
> way
> > >>> than Spark code itself. So we've always considered the "deadline" for
> > >>> doc changes to be when the release is finally posted.
> > >>>
> > >>> If there are small inconsistencies with the docs present in the
> source
> > >>> code for that release tag, IMO that doesn't matter much since we
> don't
> > >>> even distribute the docs with Spark's binary releases and virtually
> no
> > >>> one builds and hosts the docs on their own (that I am aware of, at
> > >>> least). Perhaps we can recommend if people want to build the doc
> > >>> sources that they should always grab the head of the most recent
> > >>> release branch, to set expectations accordingly.
> > >>>
> > >>> In the past we haven't considered it worth holding up the release
> > >>> process for the purpose of the docs. It just doesn't make sense since
> > >>> they are consumed "as a service". If we decide to change this
> > >>> convention, it would mean shipping our releases later, since we
> > >>> couldn't pipeline the doc finalization with voting.
> > >>>
> > >>> - Patrick
> > >>>
> > >>> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen 
> wrote:
> >  Given the title and tagging, it sounds like there could be some
> >  must-have doc changes to go with what is being released as 1.3. It
> can
> >  be finished later, and published later, but then the docs source
> >  shipped with the release doesn't match the site, and until then, 1.3
> >  is released without some "must-have" docs for 1.3 on the site.
> > 
> >  The real question to me is: are there any further, absolutely
> >  essential doc changes that need to accompany 1.3 or not?
> > 
> >  If not, just resolve these. If there are, then it seems like the
> >  release has to block on them. If there are some docs that should
> have
> >  gone in for 1.3, but didn't, but aren't essential, well I suppose it
> >  bears thinking about how to not slip as much work, but it doesn't
> >  block.
> > 
> >  I think Documentation issues certainly can be a blocker and
> shouldn't
> >  be specially ignored.
> > 
> > 
> >  BTW the UISeleniumSuite issue is a real failure, but I do not think
> it
> >  is serious: http://issues.apache.org/jira/browse/SPARK-6205  It
> isn't
> >  a regression from 1.2.x, but only affects tests, and only affects a
> >  subset of build profiles.
> > 
> > 
> > 
> > 
> >  On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  >
> > wrote:
> > > Hey Sean,
> > >
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Krishna Sankar
Excellent, thanks Xiangrui. The mystery is solved.
Cheers



On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng  wrote:

> Krishna, I tested your linear regression example. For linear
> regression, we changed its objective function from 1/n * \|Ax - b\|_2^2
> to 1/(2n) * \|Ax - b\|_2^2 to be consistent with common least
> squares formulations. It means you could reproduce the same result by
> multiplying the step size by 2. This is not a problem if both run
> until convergence (provided neither blows up). However, in your example, a very
> small step size was chosen and it didn't converge in 100 iterations. In
> this case, the step size matters. I will put a note in the migration
> guide. Thanks! -Xiangrui
>
> On Mon, Mar 9, 2015 at 1:38 PM, Sean Owen  wrote:
> > I'm +1 as I have not heard of anyone else seeing the Hive test
> > failure, which is likely a test issue rather than a code issue anyway,
> > and not a blocker.
> >
> > On Fri, Mar 6, 2015 at 9:36 PM, Sean Owen  wrote:
> >> Although the problem is small, especially if indeed the essential docs
> >> changes are following just a couple days behind the final release, I
> >> mean, why the rush if they're essential? Wait a couple days, finish
> >> them, make the release.
> >>
> >> Answer is, I think these changes aren't actually essential given the
> >> comment from tdas, so: just mark these Critical? (although ... they do
> >> say they're changes for the 1.3 release, so kind of funny to get to
> >> them for 1.3.x or 1.4, but that's not important now.)
> >>
> >> I thought that Blocker really meant Blocker in this project, as I've
> >> been encouraged to use it to mean "don't release without this." I
> >> think we should use it that way. Just thinking of it as "extra
> >> Critical" doesn't add anything. I don't think Documentation should be
> >> special-cased as less important, and I don't think there's confusion
> >> if Blocker means what it says, so I'd 'fix' that way.
> >>
> >> If nobody sees the Hive failure I observed, and if we can just zap
> >> those "Blockers" one way or the other, +1
> >>
> >>
> >> On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell 
> wrote:
> >>> Sean,
> >>>
> >>> The docs are distributed and consumed in a fundamentally different way
> >>> than Spark code itself. So we've always considered the "deadline" for
> >>> doc changes to be when the release is finally posted.
> >>>
> >>> If there are small inconsistencies with the docs present in the source
> >>> code for that release tag, IMO that doesn't matter much since we don't
> >>> even distribute the docs with Spark's binary releases and virtually no
> >>> one builds and hosts the docs on their own (that I am aware of, at
> >>> least). Perhaps we can recommend if people want to build the doc
> >>> sources that they should always grab the head of the most recent
> >>> release branch, to set expectations accordingly.
> >>>
> >>> In the past we haven't considered it worth holding up the release
> >>> process for the purpose of the docs. It just doesn't make sense since
> >>> they are consumed "as a service". If we decide to change this
> >>> convention, it would mean shipping our releases later, since we
> >>> couldn't pipeline the doc finalization with voting.
> >>>
> >>> - Patrick
> >>>
> >>> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
>  Given the title and tagging, it sounds like there could be some
>  must-have doc changes to go with what is being released as 1.3. It can
>  be finished later, and published later, but then the docs source
>  shipped with the release doesn't match the site, and until then, 1.3
>  is released without some "must-have" docs for 1.3 on the site.
> 
>  The real question to me is: are there any further, absolutely
>  essential doc changes that need to accompany 1.3 or not?
> 
>  If not, just resolve these. If there are, then it seems like the
>  release has to block on them. If there are some docs that should have
>  gone in for 1.3, but didn't, but aren't essential, well I suppose it
>  bears thinking about how to not slip as much work, but it doesn't
>  block.
> 
>  I think Documentation issues certainly can be a blocker and shouldn't
>  be specially ignored.
> 
> 
>  BTW the UISeleniumSuite issue is a real failure, but I do not think it
>  is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
>  a regression from 1.2.x, but only affects tests, and only affects a
>  subset of build profiles.
> 
> 
> 
> 
>  On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell 
> wrote:
> > Hey Sean,
> >
> >> SPARK-5310 Update SQL programming guide for 1.3
> >> SPARK-5183 Document data source API
> >> SPARK-6128 Update Spark Streaming Guide for Spark 1.3
> >
> > For these, the issue is that they are documentation JIRA's, which
> > don't need to be timed exactly with the release vote, since we can
> > update the documentatio

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Joseph Bradley
+1
Tested on Mac OS X

On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng  wrote:

> Krishna, I tested your linear regression example. For linear
> regression, we changed its objective function from 1/n * \|Ax - b\|_2^2
> to 1/(2n) * \|Ax - b\|_2^2 to be consistent with common least
> squares formulations. It means you could reproduce the same result by
> multiplying the step size by 2. This is not a problem if both run
> until convergence (provided neither blows up). However, in your example, a very
> small step size was chosen and it didn't converge in 100 iterations. In
> this case, the step size matters. I will put a note in the migration
> guide. Thanks! -Xiangrui
>
> On Mon, Mar 9, 2015 at 1:38 PM, Sean Owen  wrote:
> > I'm +1 as I have not heard of anyone else seeing the Hive test
> > failure, which is likely a test issue rather than a code issue anyway,
> > and not a blocker.
> >
> > On Fri, Mar 6, 2015 at 9:36 PM, Sean Owen  wrote:
> >> Although the problem is small, especially if indeed the essential docs
> >> changes are following just a couple days behind the final release, I
> >> mean, why the rush if they're essential? Wait a couple days, finish
> >> them, make the release.
> >>
> >> Answer is, I think these changes aren't actually essential given the
> >> comment from tdas, so: just mark these Critical? (although ... they do
> >> say they're changes for the 1.3 release, so kind of funny to get to
> >> them for 1.3.x or 1.4, but that's not important now.)
> >>
> >> I thought that Blocker really meant Blocker in this project, as I've
> >> been encouraged to use it to mean "don't release without this." I
> >> think we should use it that way. Just thinking of it as "extra
> >> Critical" doesn't add anything. I don't think Documentation should be
> >> special-cased as less important, and I don't think there's confusion
> >> if Blocker means what it says, so I'd 'fix' that way.
> >>
> >> If nobody sees the Hive failure I observed, and if we can just zap
> >> those "Blockers" one way or the other, +1
> >>
> >>
> >> On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell 
> wrote:
> >>> Sean,
> >>>
> >>> The docs are distributed and consumed in a fundamentally different way
> >>> than Spark code itself. So we've always considered the "deadline" for
> >>> doc changes to be when the release is finally posted.
> >>>
> >>> If there are small inconsistencies with the docs present in the source
> >>> code for that release tag, IMO that doesn't matter much since we don't
> >>> even distribute the docs with Spark's binary releases and virtually no
> >>> one builds and hosts the docs on their own (that I am aware of, at
> >>> least). Perhaps we can recommend if people want to build the doc
> >>> sources that they should always grab the head of the most recent
> >>> release branch, to set expectations accordingly.
> >>>
> >>> In the past we haven't considered it worth holding up the release
> >>> process for the purpose of the docs. It just doesn't make sense since
> >>> they are consumed "as a service". If we decide to change this
> >>> convention, it would mean shipping our releases later, since we
> >>> couldn't pipeline the doc finalization with voting.
> >>>
> >>> - Patrick
> >>>
> >>> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
>  Given the title and tagging, it sounds like there could be some
>  must-have doc changes to go with what is being released as 1.3. It can
>  be finished later, and published later, but then the docs source
>  shipped with the release doesn't match the site, and until then, 1.3
>  is released without some "must-have" docs for 1.3 on the site.
> 
>  The real question to me is: are there any further, absolutely
>  essential doc changes that need to accompany 1.3 or not?
> 
>  If not, just resolve these. If there are, then it seems like the
>  release has to block on them. If there are some docs that should have
>  gone in for 1.3, but didn't, but aren't essential, well I suppose it
>  bears thinking about how to not slip as much work, but it doesn't
>  block.
> 
>  I think Documentation issues certainly can be a blocker and shouldn't
>  be specially ignored.
> 
> 
>  BTW the UISeleniumSuite issue is a real failure, but I do not think it
>  is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
>  a regression from 1.2.x, but only affects tests, and only affects a
>  subset of build profiles.
> 
> 
> 
> 
>  On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell 
> wrote:
> > Hey Sean,
> >
> >> SPARK-5310 Update SQL programming guide for 1.3
> >> SPARK-5183 Document data source API
> >> SPARK-6128 Update Spark Streaming Guide for Spark 1.3
> >
> > For these, the issue is that they are documentation JIRA's, which
> > don't need to be timed exactly with the release vote, since we can
> > update the documentation on the website whenever we want. In 

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Xiangrui Meng
Krishna, I tested your linear regression example. For linear
regression, we changed its objective function from 1/n * \|Ax - b\|_2^2
to 1/(2n) * \|Ax - b\|_2^2 to be consistent with common least
squares formulations. It means you could reproduce the same result by
multiplying the step size by 2. This is not a problem if both run
until convergence (provided neither blows up). However, in your example, a very
small step size was chosen and it didn't converge in 100 iterations. In
this case, the step size matters. I will put a note in the migration
guide. Thanks! -Xiangrui
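
To make the step-size note concrete: halving the objective halves its
gradient, so an SGD step size tuned against 1.2 must be doubled to retrace
the same iterates under 1.3. A minimal sketch using MLlib's SGD-based linear
regression (the data and step-size values are illustrative, not Krishna's
actual example):

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    def refit(sc: SparkContext): Unit = {
      val data = sc.parallelize(Seq(
        LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
        LabeledPoint(2.0, Vectors.dense(2.0, 1.0))))
      val stepSize12 = 0.001 // hypothetical value tuned against Spark 1.2
      // Doubling the step size compensates for the gradient being halved in 1.3.
      val model = LinearRegressionWithSGD.train(data, 100, 2 * stepSize12)
      println(model.weights)
    }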

On Mon, Mar 9, 2015 at 1:38 PM, Sean Owen  wrote:
> I'm +1 as I have not heard of anyone else seeing the Hive test
> failure, which is likely a test issue rather than a code issue anyway,
> and not a blocker.
>
> On Fri, Mar 6, 2015 at 9:36 PM, Sean Owen  wrote:
>> Although the problem is small, especially if indeed the essential docs
>> changes are following just a couple days behind the final release, I
>> mean, why the rush if they're essential? Wait a couple days, finish
>> them, make the release.
>>
>> Answer is, I think these changes aren't actually essential given the
>> comment from tdas, so: just mark these Critical? (although ... they do
>> say they're changes for the 1.3 release, so kind of funny to get to
>> them for 1.3.x or 1.4, but that's not important now.)
>>
>> I thought that Blocker really meant Blocker in this project, as I've
>> been encouraged to use it to mean "don't release without this." I
>> think we should use it that way. Just thinking of it as "extra
>> Critical" doesn't add anything. I don't think Documentation should be
>> special-cased as less important, and I don't think there's confusion
>> if Blocker means what it says, so I'd 'fix' that way.
>>
>> If nobody sees the Hive failure I observed, and if we can just zap
>> those "Blockers" one way or the other, +1
>>
>>
>> On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell  wrote:
>>> Sean,
>>>
>>> The docs are distributed and consumed in a fundamentally different way
>>> than Spark code itself. So we've always considered the "deadline" for
>>> doc changes to be when the release is finally posted.
>>>
>>> If there are small inconsistencies with the docs present in the source
>>> code for that release tag, IMO that doesn't matter much since we don't
>>> even distribute the docs with Spark's binary releases and virtually no
>>> one builds and hosts the docs on their own (that I am aware of, at
>>> least). Perhaps we can recommend if people want to build the doc
>>> sources that they should always grab the head of the most recent
>>> release branch, to set expectations accordingly.
>>>
>>> In the past we haven't considered it worth holding up the release
>>> process for the purpose of the docs. It just doesn't make sense since
>>> they are consumed "as a service". If we decide to change this
>>> convention, it would mean shipping our releases later, since we
> >>> couldn't pipeline the doc finalization with voting.
>>>
>>> - Patrick
>>>
>>> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
 Given the title and tagging, it sounds like there could be some
 must-have doc changes to go with what is being released as 1.3. It can
 be finished later, and published later, but then the docs source
 shipped with the release doesn't match the site, and until then, 1.3
 is released without some "must-have" docs for 1.3 on the site.

 The real question to me is: are there any further, absolutely
 essential doc changes that need to accompany 1.3 or not?

 If not, just resolve these. If there are, then it seems like the
 release has to block on them. If there are some docs that should have
 gone in for 1.3, but didn't, but aren't essential, well I suppose it
 bears thinking about how to not slip as much work, but it doesn't
 block.

 I think Documentation issues certainly can be a blocker and shouldn't
 be specially ignored.


 BTW the UISeleniumSuite issue is a real failure, but I do not think it
 is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
 a regression from 1.2.x, but only affects tests, and only affects a
 subset of build profiles.




 On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  wrote:
> Hey Sean,
>
>> SPARK-5310 Update SQL programming guide for 1.3
>> SPARK-5183 Document data source API
>> SPARK-6128 Update Spark Streaming Guide for Spark 1.3
>
> For these, the issue is that they are documentation JIRA's, which
> don't need to be timed exactly with the release vote, since we can
> update the documentation on the website whenever we want. In the past
> I've just mentally filtered these out when considering RC's. I see a
> few options here:
>
> 1. We downgrade such issues away from Blocker (more clear, but we risk
> > losing them in the fray if they really are things we want to have
> before 

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Thread Andrew Ash
Does the Apache project team have any ability to measure download counts of
the various releases?  That data could be useful when it comes time to
sunset vendor-specific releases, like CDH4 for example.

On Mon, Mar 9, 2015 at 5:34 AM, Mridul Muralidharan 
wrote:

> In an ideal situation, +1 on removing all vendor-specific builds and
> making just hadoop version specific - that is what we should depend on
> anyway.
> Though I hope Sean is correct in assuming that vendor-specific builds
> for Hadoop 2.4 are just that, and not 2.4- or 2.4+, which would cause
> incompatibilities for us or our users!
>
> Regards,
> Mridul
>
>
> On Mon, Mar 9, 2015 at 2:50 AM, Sean Owen  wrote:
> > Yes, you should always find working bits at Apache no matter what --
> > though 'no matter what' really means 'as long as you use a Hadoop distro
> > compatible with upstream Hadoop'. Even distros have a strong interest
> > in that, since the market, the 'pie', is made large by this kind of
> > freedom at the core.
> >
> > If so, then no vendor-specific builds are needed, only some
> > Hadoop-release-specific ones. So a Hadoop 2.6-specific build could be
> > good (although I'm not yet clear if there's something about 2.5 or 2.6
> > that needs a different build.)
> >
> > I take it that we already believe that, say, the "Hadoop 2.4" build
> > works with CDH5, so no CDH5-specific build is provided by Spark.
> >
> > If a distro doesn't work with stock Spark, then it's either something
> > Spark should fix (e.g. use of a private YARN API or something), or
> > it's something the distro should really fix because it's incompatible.
> >
> > Could we maybe rename the "CDH4" build then, as it doesn't really work
> > with all CDH4, to be a "Hadoop 2.0.x build"? That's been floated
> > before. And can we remove the MapR builds -- or else can someone
> > explain why these exist separately from a Hadoop 2.3 build? I hope it
> > is not *because* they are somehow non-standard. And shall we first run
> > down why Spark doesn't fully work on HDP and see if it's something
> > that Spark or HDP needs to tweak, rather than contemplate another
> > binary? or, if so, can it simply be called a "Hadoop 2.7 + YARN
> > whatever" build and not made specific to a vendor, even if the project
> > has to field another tarball combo for a vendor?
> >
> > Maybe we are saying almost the same thing.
> >
> >
> > On Mon, Mar 9, 2015 at 1:33 AM, Matei Zaharia 
> wrote:
> >> Yeah, my concern is that people should get Apache Spark from *Apache*,
> not from a vendor. It helps everyone use the latest features no matter
> where they are. In the Hadoop distro case, Hadoop made all this effort to
> have standard APIs (e.g. YARN), so it should be easy. But it is a problem
> if we're not packaging for the newest versions of some distros; I think we
> just fell behind at Hadoop 2.4.
> >>
> >> Matei
> >>
> >>> On Mar 8, 2015, at 8:02 PM, Sean Owen  wrote:
> >>>
> >>> Yeah it's not much overhead, but here's an example of where it causes
> >>> a little issue.
> >>>
> >>> I like that reasoning. However, the released builds don't track the
> >>> later versions of Hadoop that vendors would be distributing -- there's
> >>> no Hadoop 2.6 build for example. CDH4 is here, but not the
> >>> far-more-used CDH5. HDP isn't present at all. The CDH4 build doesn't
> >>> actually work with many CDH4 versions.
> >>>
> >>> I agree with the goal of maximizing the reach of Spark, but I don't
> >>> know how much these builds advance that goal.
> >>>
> >>> Anyone can roll their own exactly-right build, and the docs and build
> >>> have been set up to make that as simple as can be expected. So these
> >>> aren't *required* to let me use latest Spark on distribution X.
> >>>
> >>> I had thought these existed to sorta support 'legacy' distributions,
> >>> like CDH4, and that build was justified as a
> >>> quasi-Hadoop-2.0.x-flavored build. But then I don't understand what
> >>> the MapR profiles are for.
> >>>
> >>> I think it's too much work to correctly, in parallel, maintain any
> >>> customizations necessary for any major distro, and it might be better
> >>> not to do it at all than to do it incompletely. You could say it's also an
> >>> enabler for distros to vary in ways that require special
> >>> customization.
> >>>
> >>> Maybe there's a concern that, if lots of people consume Spark on
> >>> Hadoop, and most people consume Hadoop through distros, and distros
> >>> alone manage Spark distributions, then you de facto 'have to' go
> >>> through a distro instead of getting bits from Spark? Different
> >>> conversation but I think this sort of effect does not end up being a
> >>> negative.
> >>>
> >>> Well anyway, I like the idea of seeing how far Hadoop-provided
> >>> releases can help. It might kill several birds with one stone.
> >>>
> >>> On Sun, Mar 8, 2015 at 11:07 PM, Matei Zaharia <
> matei.zaha...@gmail.com> wrote:
>  Our goal is to let people use the latest Apache release even if
> vendors fall behind or don't 

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Sean Owen
I'm +1 as I have not heard of anyone else seeing the Hive test
failure, which is likely a test issue rather than a code issue anyway,
and not a blocker.

On Fri, Mar 6, 2015 at 9:36 PM, Sean Owen  wrote:
> Although the problem is small, especially if indeed the essential docs
> changes are following just a couple days behind the final release, I
> mean, why the rush if they're essential? Wait a couple days, finish
> them, make the release.
>
> Answer is, I think these changes aren't actually essential given the
> comment from tdas, so: just mark these Critical? (although ... they do
> say they're changes for the 1.3 release, so kind of funny to get to
> them for 1.3.x or 1.4, but that's not important now.)
>
> I thought that Blocker really meant Blocker in this project, as I've
> been encouraged to use it to mean "don't release without this." I
> think we should use it that way. Just thinking of it as "extra
> Critical" doesn't add anything. I don't think Documentation should be
> special-cased as less important, and I don't think there's confusion
> if Blocker means what it says, so I'd 'fix' that way.
>
> If nobody sees the Hive failure I observed, and if we can just zap
> those "Blockers" one way or the other, +1
>
>
> On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell  wrote:
>> Sean,
>>
>> The docs are distributed and consumed in a fundamentally different way
>> than Spark code itself. So we've always considered the "deadline" for
>> doc changes to be when the release is finally posted.
>>
>> If there are small inconsistencies with the docs present in the source
>> code for that release tag, IMO that doesn't matter much since we don't
>> even distribute the docs with Spark's binary releases and virtually no
>> one builds and hosts the docs on their own (that I am aware of, at
>> least). Perhaps we can recommend if people want to build the doc
>> sources that they should always grab the head of the most recent
>> release branch, to set expectations accordingly.
>>
>> In the past we haven't considered it worth holding up the release
>> process for the purpose of the docs. It just doesn't make sense since
>> they are consumed "as a service". If we decide to change this
>> convention, it would mean shipping our releases later, since we
>> couldn't pipeline the doc finalization with voting.
>>
>> - Patrick
>>
>> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
>>> Given the title and tagging, it sounds like there could be some
>>> must-have doc changes to go with what is being released as 1.3. It can
>>> be finished later, and published later, but then the docs source
>>> shipped with the release doesn't match the site, and until then, 1.3
>>> is released without some "must-have" docs for 1.3 on the site.
>>>
>>> The real question to me is: are there any further, absolutely
>>> essential doc changes that need to accompany 1.3 or not?
>>>
>>> If not, just resolve these. If there are, then it seems like the
>>> release has to block on them. If there are some docs that should have
>>> gone in for 1.3, but didn't, but aren't essential, well I suppose it
>>> bears thinking about how to not slip as much work, but it doesn't
>>> block.
>>>
>>> I think Documentation issues certainly can be a blocker and shouldn't
>>> be specially ignored.
>>>
>>>
>>> BTW the UISeleniumSuite issue is a real failure, but I do not think it
>>> is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
>>> a regression from 1.2.x, but only affects tests, and only affects a
>>> subset of build profiles.
>>>
>>>
>>>
>>>
>>> On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  wrote:
 Hey Sean,

> SPARK-5310 Update SQL programming guide for 1.3
> SPARK-5183 Document data source API
> SPARK-6128 Update Spark Streaming Guide for Spark 1.3

 For these, the issue is that they are documentation JIRA's, which
 don't need to be timed exactly with the release vote, since we can
 update the documentation on the website whenever we want. In the past
 I've just mentally filtered these out when considering RC's. I see a
 few options here:

 1. We downgrade such issues away from Blocker (more clear, but we risk
 losing them in the fray if they really are things we want to have
 before the release is posted).
 2. We provide a filter to the community that excludes 'Documentation'
 issues and shows all other blockers for 1.3. We can put this on the
 wiki, for instance.

 Which do you prefer?

 - Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Kostas Sakellis
+1 on RC3

I agree that this should not block the release. Once we have a fix for it,
putting it in a double dot release sounds like a good plan.

Kostas



On Mon, Mar 9, 2015 at 11:27 AM, Patrick Wendell  wrote:

> Hey All,
>
> Today there was a JIRA posted with an observed regression around Spark
> Streaming during certain recovery scenarios:
>
> https://issues.apache.org/jira/browse/SPARK-6222
>
> My preference is to go ahead and ship this release (RC3) as-is and if
> this issue is isolated and resolved soon, we can make a patch release in
> the next week or two.
>
> At some point, the cost of continuing to hold the release re/vote is
> so high that it's better to just ship the release. We can document
> known issues and point users to a fix once it's available. We did this
> in 1.2.0 as well (there were two small known issues) and I think as a
> point of process, this approach is necessary given the size of the
> project.
>
> I wanted to notify this thread though, in case this changes anyone's
> opinion on their release vote. I will leave the thread open at least
> until the end of today.
>
> Still +1 on RC3, for me.
>
> - Patrick
>
> On Mon, Mar 9, 2015 at 9:36 AM, Denny Lee  wrote:
> > +1 (non-binding)
> >
> > Spark Standalone and YARN on Hadoop 2.6 on OS X plus various tests (MLlib,
> > Spark SQL, etc.)
> >
> > On Mon, Mar 9, 2015 at 9:18 AM Tom Graves 
> > wrote:
> >>
> >> +1. Built from source and ran Spark on YARN on Hadoop 2.6 in cluster and
> >> client mode.
> >> Tom
> >>
> >>  On Thursday, March 5, 2015 8:53 PM, Patrick Wendell
> >>  wrote:
> >>
> >>
> >>  Please vote on releasing the following candidate as Apache Spark
> version
> >> 1.3.0!
> >>
> >> The tag to be voted on is v1.3.0-rc3 (commit 4aaf48d4):
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> http://people.apache.org/~pwendell/spark-1.3.0-rc3/
> >>
> >> Release artifacts are signed with the following key:
> >> https://people.apache.org/keys/committer/pwendell.asc
> >>
> >> Staging repositories for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1078
> >>
> >> The documentation corresponding to this release can be found at:
> >> http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/
> >>
> >> Please vote on releasing this package as Apache Spark 1.3.0!
> >>
> >> The vote is open until Monday, March 09, at 02:52 UTC and passes if
> >> a majority of at least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Spark 1.3.0
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see
> >> http://spark.apache.org/
> >>
> >> == How does this compare to RC2 ==
> >> This release includes the following bug fixes:
> >>
> >> https://issues.apache.org/jira/browse/SPARK-6144
> >> https://issues.apache.org/jira/browse/SPARK-6171
> >> https://issues.apache.org/jira/browse/SPARK-5143
> >> https://issues.apache.org/jira/browse/SPARK-6182
> >> https://issues.apache.org/jira/browse/SPARK-6175
> >>
> >> == How can I help test this release? ==
> >> If you are a Spark user, you can help us test this release by
> >> taking a Spark 1.2 workload and running it on this release candidate,
> >> then reporting any regressions.
> >>
> >> If you are happy with this release based on your own testing, give a +1
> >> vote.
> >>
> >> == What justifies a -1 vote for this release? ==
> >> This vote is happening towards the end of the 1.3 QA period,
> >> so -1 votes should only occur for significant regressions from 1.2.1.
> >> Bugs already present in 1.2.X, minor regressions, or bugs related
> >> to new features will not block this release.
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >>
> >>
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Patrick Wendell
Hey All,

Today there was a JIRA posted with an observed regression around Spark
Streaming during certain recovery scenarios:

https://issues.apache.org/jira/browse/SPARK-6222

My preference is to go ahead and ship this release (RC3) as-is and if
this issue is isolated and resolved soon, we can make a patch release in
the next week or two.

At some point, the cost of continuing to hold the release re/vote is
so high that it's better to just ship the release. We can document
known issues and point users to a fix once it's available. We did this
in 1.2.0 as well (there were two small known issues) and I think as a
point of process, this approach is necessary given the size of the
project.

I wanted to notify this thread though, in case this changes anyone's
opinion on their release vote. I will leave the thread open at least
until the end of today.

Still +1 on RC3, for me.

- Patrick

On Mon, Mar 9, 2015 at 9:36 AM, Denny Lee  wrote:
> +1 (non-binding)
>
> Spark Standalone and YARN on Hadoop 2.6 on OS X plus various tests (MLlib,
> Spark SQL, etc.)
>
> On Mon, Mar 9, 2015 at 9:18 AM Tom Graves 
> wrote:
>>
>> +1. Built from source and ran Spark on YARN on Hadoop 2.6 in cluster and
>> client mode.
>> Tom
>>
>>  On Thursday, March 5, 2015 8:53 PM, Patrick Wendell
>>  wrote:
>>
>>
>>  Please vote on releasing the following candidate as Apache Spark version
>> 1.3.0!
>>
>> The tag to be voted on is v1.3.0-rc3 (commit 4aaf48d4):
>>
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-1.3.0-rc3/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> Staging repositories for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1078
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.3.0!
>>
>> The vote is open until Monday, March 09, at 02:52 UTC and passes if
>> a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> == How does this compare to RC2 ==
>> This release includes the following bug fixes:
>>
>> https://issues.apache.org/jira/browse/SPARK-6144
>> https://issues.apache.org/jira/browse/SPARK-6171
>> https://issues.apache.org/jira/browse/SPARK-5143
>> https://issues.apache.org/jira/browse/SPARK-6182
>> https://issues.apache.org/jira/browse/SPARK-6175
>>
>> == How can I help test this release? ==
>> If you are a Spark user, you can help us test this release by
>> taking a Spark 1.2 workload and running it on this release candidate,
>> then reporting any regressions.
>>
>> If you are happy with this release based on your own testing, give a +1
>> vote.
>>
>> == What justifies a -1 vote for this release? ==
>> This vote is happening towards the end of the 1.3 QA period,
>> so -1 votes should only occur for significant regressions from 1.2.1.
>> Bugs already present in 1.2.X, minor regressions, or bugs related
>> to new features will not block this release.
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>>
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Denny Lee
+1 (non-binding)

Spark Standalone and YARN on Hadoop 2.6 on OS X plus various tests (MLlib,
Spark SQL, etc.)
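
For reference, a self-contained smoke test in that spirit might look like the
sketch below, exercising the DataFrame API introduced in 1.3 (the app name,
data, and local master are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object Rc3SmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("rc3-smoke").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._
        // Build a tiny DataFrame and run a trivial filter as a sanity check.
        val df = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("key", "value")
        assert(df.filter(df("value") > 1).count() == 1)
        sc.stop()
      }
    }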

On Mon, Mar 9, 2015 at 9:18 AM Tom Graves 
wrote:

> +1. Built from source and ran Spark on YARN on Hadoop 2.6 in cluster and
> client mode.
> Tom
>
>  On Thursday, March 5, 2015 8:53 PM, Patrick Wendell <
> pwend...@gmail.com> wrote:
>
>
>  Please vote on releasing the following candidate as Apache Spark version
> 1.3.0!
>
> The tag to be voted on is v1.3.0-rc3 (commit 4aaf48d4):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc3/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> Staging repositories for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1078
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/
>
> Please vote on releasing this package as Apache Spark 1.3.0!
>
> The vote is open until Monday, March 09, at 02:52 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == How does this compare to RC2 ==
> This release includes the following bug fixes:
>
> https://issues.apache.org/jira/browse/SPARK-6144
> https://issues.apache.org/jira/browse/SPARK-6171
> https://issues.apache.org/jira/browse/SPARK-5143
> https://issues.apache.org/jira/browse/SPARK-6182
> https://issues.apache.org/jira/browse/SPARK-6175
>
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.2 workload and running it on this release candidate,
> then reporting any regressions.
>
> If you are happy with this release based on your own testing, give a +1
> vote.
>
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.3 QA period,
> so -1 votes should only occur for significant regressions from 1.2.1.
> Bugs already present in 1.2.X, minor regressions, or bugs related
> to new features will not block this release.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
>
>


Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Tom Graves
+1. Built from source and ran Spark on YARN on Hadoop 2.6 in cluster and client
mode.
Tom 

 On Thursday, March 5, 2015 8:53 PM, Patrick Wendell  
wrote:
   

 Please vote on releasing the following candidate as Apache Spark version 1.3.0!

The tag to be voted on is v1.3.0-rc3 (commit 4aaf48d4):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc3/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

Staging repositories for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1078

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/

Please vote on releasing this package as Apache Spark 1.3.0!

The vote is open until Monday, March 09, at 02:52 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How does this compare to RC2 ==
This release includes the following bug fixes:

https://issues.apache.org/jira/browse/SPARK-6144
https://issues.apache.org/jira/browse/SPARK-6171
https://issues.apache.org/jira/browse/SPARK-5143
https://issues.apache.org/jira/browse/SPARK-6182
https://issues.apache.org/jira/browse/SPARK-6175

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.2 workload and running it on this release candidate,
then reporting any regressions.

If you are happy with this release based on your own testing, give a +1 vote.
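
For example, an sbt-based application can be pointed at this candidate by
adding the staging repository from this thread as a resolver (a sketch; adapt
to your own build):

    // build.sbt
    resolvers += "Spark 1.3.0 RC3 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1078/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"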

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.3 QA period,
so -1 votes should only occur for significant regressions from 1.2.1.
Bugs already present in 1.2.X, minor regressions, or bugs related
to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org





Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Sean McNamara
+1

Ran local tests and tested our spark apps on a spark+yarn cluster.

Cheers,

Sean


> On Mar 8, 2015, at 11:51 PM, Sandy Ryza  wrote:
> 
> +1 (non-binding, doc and packaging issues aside)
> 
> Built from source, ran jobs and spark-shell against a pseudo-distributed
> YARN cluster.
> 
> On Sun, Mar 8, 2015 at 2:42 PM, Krishna Sankar  wrote:
> 
>> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
>> Distributions X ...
>> 
> > Maybe one option is to have a minimum basic set (which I know is what we
>> are discussing) and move the rest to spark-packages.org. There the vendors
>> can add the latest downloads - for example when 1.4 is released, HDP can
>> build a release of HDP Spark 1.4 bundle.
>> 
>> Cheers
>> 
>> 
>> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell 
>> wrote:
>> 
>>> We probably want to revisit the way we do binaries in general for
>>> 1.4+. IMO, something worth forking a separate thread for.
>>> 
>>> I've been hesitating to add new binaries because people
>>> (understandably) complain if you ever stop packaging older ones, but
>>> on the other hand the ASF has complained that we have too many
>>> binaries already and that we need to pare it down because of the large
>>> volume of files. Doubling the number of binaries we produce for Scala
>>> 2.11 seemed like it would be too much.
>>> 
>>> One solution potentially is to actually package "Hadoop provided"
>>> binaries and encourage users to use these by simply setting
>>> HADOOP_HOME, or have instructions for specific distros. I've heard
>>> that our existing packages don't work well on HDP for instance, since
>>> there are some configuration quirks that differ from the upstream
>>> Hadoop.
>>> 
>>> If we cut down on the cross building for Hadoop versions, then it is
>>> more tenable to cross build for Scala versions without exploding the
>>> number of binaries.
>>> 
>>> - Patrick
>>> 
>>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
 Yeah, interesting question of what is the better default for the
 single set of artifacts published to Maven. I think there's an
 argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
 and cons discussed more at
 
 https://issues.apache.org/jira/browse/SPARK-5134
 https://github.com/apache/spark/pull/3917
 
 On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia >> 
>>> wrote:
> +1
> 
> Tested it on Mac OS X.
> 
> One small issue I noticed is that the Scala 2.11 build is using Hadoop
>>> 1 without Hive, which is kind of weird because people will more likely
>> want
>>> Hadoop 2 with Hive. So it would be good to publish a build for that
>>> configuration instead. We can do it if we do a new RC, or it might be
>> that
>>> binary builds may not need to be voted on (I forgot the details there).
> 
> Matei
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>> 
>>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Thread Mridul Muralidharan
In an ideal situation, +1 on removing all vendor-specific builds and
making just hadoop version specific - that is what we should depend on
anyway.
Though I hope Sean is correct in assuming that vendor-specific builds
for Hadoop 2.4 are just that, and not 2.4- or 2.4+, which would cause
incompatibilities for us or our users!

Regards,
Mridul


On Mon, Mar 9, 2015 at 2:50 AM, Sean Owen  wrote:
> Yes, you should always find working bits at Apache no matter what --
> though 'no matter what' really means 'as long as you use a Hadoop distro
> compatible with upstream Hadoop'. Even distros have a strong interest
> in that, since the market, the 'pie', is made large by this kind of
> freedom at the core.
>
> If so, then no vendor-specific builds are needed, only some
> Hadoop-release-specific ones. So a Hadoop 2.6-specific build could be
> good (although I'm not yet clear if there's something about 2.5 or 2.6
> that needs a different build.)
>
> I take it that we already believe that, say, the "Hadoop 2.4" build
> works with CDH5, so no CDH5-specific build is provided by Spark.
>
> If a distro doesn't work with stock Spark, then it's either something
> Spark should fix (e.g. use of a private YARN API or something), or
> it's something the distro should really fix because it's incompatible.
>
> Could we maybe rename the "CDH4" build then, as it doesn't really work
> with all CDH4, to be a "Hadoop 2.0.x build"? That's been floated
> before. And can we remove the MapR builds -- or else can someone
> explain why these exist separately from a Hadoop 2.3 build? I hope it
> is not *because* they are somehow non-standard. And shall we first run
> down why Spark doesn't fully work on HDP and see if it's something
> that Spark or HDP needs to tweak, rather than contemplate another
> binary? or, if so, can it simply be called a "Hadoop 2.7 + YARN
> whatever" build and not made specific to a vendor, even if the project
> has to field another tarball combo for a vendor?
>
> Maybe we are saying almost the same thing.
>
>
> On Mon, Mar 9, 2015 at 1:33 AM, Matei Zaharia  wrote:
>> Yeah, my concern is that people should get Apache Spark from *Apache*, not 
>> from a vendor. It helps everyone use the latest features no matter where 
>> they are. In the Hadoop distro case, Hadoop made all this effort to have 
>> standard APIs (e.g. YARN), so it should be easy. But it is a problem if 
>> we're not packaging for the newest versions of some distros; I think we just 
>> fell behind at Hadoop 2.4.
>>
>> Matei
>>
>>> On Mar 8, 2015, at 8:02 PM, Sean Owen  wrote:
>>>
>>> Yeah it's not much overhead, but here's an example of where it causes
>>> a little issue.
>>>
>>> I like that reasoning. However, the released builds don't track the
>>> later versions of Hadoop that vendors would be distributing -- there's
>>> no Hadoop 2.6 build for example. CDH4 is here, but not the
>>> far-more-used CDH5. HDP isn't present at all. The CDH4 build doesn't
>>> actually work with many CDH4 versions.
>>>
>>> I agree with the goal of maximizing the reach of Spark, but I don't
>>> know how much these builds advance that goal.
>>>
>>> Anyone can roll their own exactly-right build, and the docs and build
>>> have been set up to make that as simple as can be expected. So these
>>> aren't *required* to let me use latest Spark on distribution X.
>>>
>>> I had thought these existed to sorta support 'legacy' distributions,
>>> like CDH4, and that build was justified as a
>>> quasi-Hadoop-2.0.x-flavored build. But then I don't understand what
>>> the MapR profiles are for.
>>>
>>> I think it's too much work to correctly, in parallel, maintain any
>>> customizations necessary for any major distro, and it might be better
>>> not to do it at all than to do it incompletely. You could say it's also an
>>> enabler for distros to vary in ways that require special
>>> customization.
>>>
>>> Maybe there's a concern that, if lots of people consume Spark on
>>> Hadoop, and most people consume Hadoop through distros, and distros
>>> alone manage Spark distributions, then you de facto 'have to' go
>> through a distro instead of getting bits from Spark? Different
>>> conversation but I think this sort of effect does not end up being a
>>> negative.
>>>
>>> Well anyway, I like the idea of seeing how far Hadoop-provided
>>> releases can help. It might kill several birds with one stone.
>>>
>>> On Sun, Mar 8, 2015 at 11:07 PM, Matei Zaharia  
>>> wrote:
 Our goal is to let people use the latest Apache release even if vendors 
 fall behind or don't want to package everything, so that's why we put out 
 releases for vendors' versions. It's fairly low overhead.

 Matei

> On Mar 8, 2015, at 5:56 PM, Sean Owen  wrote:
>
> Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
> at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
> Maven artifacts.
>
> Patrick I see you just commented on SPARK-5134 and will follow up
> 

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Thread Sean Owen
Yes, you should always find working bits at Apache no matter what --
though 'no matter what' really means 'as long as you use a Hadoop distro
compatible with upstream Hadoop'. Even distros have a strong interest
in that, since the market, the 'pie', is made large by this kind of
freedom at the core.

If so, then no vendor-specific builds are needed, only some
Hadoop-release-specific ones. So a Hadoop 2.6-specific build could be
good (although I'm not yet clear if there's something about 2.5 or 2.6
that needs a different build.)

I take it that we already believe that, say, the "Hadoop 2.4" build
works with CDH5, so no CDH5-specific build is provided by Spark.

If a distro doesn't work with stock Spark, then it's either something
Spark should fix (e.g. use of a private YARN API or something), or
it's something the distro should really fix because it's incompatible.

Could we maybe rename the "CDH4" build then, as it doesn't really work
with all CDH4, to be a "Hadoop 2.0.x build"? That's been floated
before. And can we remove the MapR builds -- or else can someone
explain why these exist separately from a Hadoop 2.3 build? I hope it
is not *because* they are somehow non-standard. And shall we first run
down why Spark doesn't fully work on HDP and see if it's something
that Spark or HDP needs to tweak, rather than contemplate another
binary? or, if so, can it simply be called a "Hadoop 2.7 + YARN
whatever" build and not made specific to a vendor, even if the project
has to field another tarball combo for a vendor?

Maybe we are saying almost the same thing.


On Mon, Mar 9, 2015 at 1:33 AM, Matei Zaharia  wrote:
> Yeah, my concern is that people should get Apache Spark from *Apache*, not 
> from a vendor. It helps everyone use the latest features no matter where they 
> are. In the Hadoop distro case, Hadoop made all this effort to have standard 
> APIs (e.g. YARN), so it should be easy. But it is a problem if we're not 
> packaging for the newest versions of some distros; I think we just fell 
> behind at Hadoop 2.4.
>
> Matei
>
>> On Mar 8, 2015, at 8:02 PM, Sean Owen  wrote:
>>
>> Yeah it's not much overhead, but here's an example of where it causes
>> a little issue.
>>
>> I like that reasoning. However, the released builds don't track the
>> later versions of Hadoop that vendors would be distributing -- there's
>> no Hadoop 2.6 build for example. CDH4 is here, but not the
>> far-more-used CDH5. HDP isn't present at all. The CDH4 build doesn't
>> actually work with many CDH4 versions.
>>
>> I agree with the goal of maximizing the reach of Spark, but I don't
>> know how much these builds advance that goal.
>>
>> Anyone can roll their own exactly-right build, and the docs and build
>> have been set up to make that as simple as can be expected. So these
>> aren't *required* to let me use latest Spark on distribution X.
>>
>> I had thought these existed to sorta support 'legacy' distributions,
>> like CDH4, and that build was justified as a
>> quasi-Hadoop-2.0.x-flavored build. But then I don't understand what
>> the MapR profiles are for.
>>
>> I think it's too much work to correctly, in parallel, maintain any
>> customizations necessary for any major distro, and it might be better
>> not to do it at all than to do it incompletely. You could say it's also an
>> enabler for distros to vary in ways that require special
>> customization.
>>
>> Maybe there's a concern that, if lots of people consume Spark on
>> Hadoop, and most people consume Hadoop through distros, and distros
>> alone manage Spark distributions, then you de facto 'have to' go
>> through a distro instead of get bits from Spark? Different
>> conversation but I think this sort of effect does not end up being a
>> negative.
>>
>> Well anyway, I like the idea of seeing how far Hadoop-provided
>> releases can help. It might kill several birds with one stone.
>>
>> On Sun, Mar 8, 2015 at 11:07 PM, Matei Zaharia  
>> wrote:
>>> Our goal is to let people use the latest Apache release even if vendors 
>>> fall behind or don't want to package everything, so that's why we put out 
>>> releases for vendors' versions. It's fairly low overhead.
>>>
>>> Matei
>>>
 On Mar 8, 2015, at 5:56 PM, Sean Owen  wrote:

 Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
 at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
 Maven artifacts.

 Patrick I see you just commented on SPARK-5134 and will follow up
 there. Sounds like this may accidentally not be a problem.

 On binary tarball releases, I wonder if anyone has an opinion on my
 opinion that these shouldn't be distributed for specific Hadoop
 *distributions* to begin with. (Won't repeat the argument here yet.)
 That resolves this n x m explosion too.

 Vendors already provide their own distribution, yes, that's their job.


 On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar  wrote:
> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
> Distributions X ...

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Sandy Ryza
+1 (non-binding, doc and packaging issues aside)

Built from source, ran jobs and spark-shell against a pseudo-distributed
YARN cluster.

On Sun, Mar 8, 2015 at 2:42 PM, Krishna Sankar  wrote:

> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
> Distributions X ...
>
> Maybe one option is to have a minimum basic set (which I know is what we
> are discussing) and move the rest to spark-packages.org. There the vendors
> can add the latest downloads - for example when 1.4 is released, HDP can
> build a release of HDP Spark 1.4 bundle.
>
> Cheers
> 
>
> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell 
> wrote:
>
> > We probably want to revisit the way we do binaries in general for
> > 1.4+. IMO, something worth forking a separate thread for.
> >
> > I've been hesitating to add new binaries because people
> > (understandably) complain if you ever stop packaging older ones, but
> > on the other hand the ASF has complained that we have too many
> > binaries already and that we need to pare it down because of the large
> > volume of files. Doubling the number of binaries we produce for Scala
> > 2.11 seemed like it would be too much.
> >
> > One solution potentially is to actually package "Hadoop provided"
> > binaries and encourage users to use these by simply setting
> > HADOOP_HOME, or have instructions for specific distros. I've heard
> > that our existing packages don't work well on HDP for instance, since
> > there are some configuration quirks that differ from the upstream
> > Hadoop.
> >
> > If we cut down on the cross building for Hadoop versions, then it is
> > more tenable to cross build for Scala versions without exploding the
> > number of binaries.
> >
> > - Patrick
> >
> > On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
> > > Yeah, interesting question of what is the better default for the
> > > single set of artifacts published to Maven. I think there's an
> > > argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
> > > and cons discussed more at
> > >
> > > https://issues.apache.org/jira/browse/SPARK-5134
> > > https://github.com/apache/spark/pull/3917
> > >
> > > On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia  >
> > wrote:
> > >> +1
> > >>
> > >> Tested it on Mac OS X.
> > >>
> > >> One small issue I noticed is that the Scala 2.11 build is using Hadoop
> > 1 without Hive, which is kind of weird because people will more likely
> want
> > Hadoop 2 with Hive. So it would be good to publish a build for that
> > configuration instead. We can do it if we do a new RC, or it might be
> that
> > binary builds may not need to be voted on (I forgot the details there).
> > >>
> > >> Matei
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
> >
>


Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Matei Zaharia
Yeah, my concern is that people should get Apache Spark from *Apache*, not from 
a vendor. It helps everyone use the latest features no matter where they are. 
In the Hadoop distro case, Hadoop made all this effort to have standard APIs 
(e.g. YARN), so it should be easy. But it is a problem if we're not packaging 
for the newest versions of some distros; I think we just fell behind at Hadoop 
2.4.

Matei

> On Mar 8, 2015, at 8:02 PM, Sean Owen  wrote:
> 
> Yeah it's not much overhead, but here's an example of where it causes
> a little issue.
> 
> I like that reasoning. However, the released builds don't track the
> later versions of Hadoop that vendors would be distributing -- there's
> no Hadoop 2.6 build for example. CDH4 is here, but not the
> far-more-used CDH5. HDP isn't present at all. The CDH4 build doesn't
> actually work with many CDH4 versions.
> 
> I agree with the goal of maximizing the reach of Spark, but I don't
> know how much these builds advance that goal.
> 
> Anyone can roll-their-own exactly-right build, and the docs and build
> have been set up to make that as simple as can be expected. So these
> aren't *required* to let me use latest Spark on distribution X.
> 
> I had thought these existed to sorta support 'legacy' distributions,
> like CDH4, and that build was justified as a
> quasi-Hadoop-2.0.x-flavored build. But then I don't understand what
> the MapR profiles are for.
> 
> I think it's too much work to correctly, in parallel, maintain any
> customizations necessary for any major distro, and it might be better
> not to do it at all than to do it incompletely. You could say it's also an
> enabler for distros to vary in ways that require special
> customization.
> 
> Maybe there's a concern that, if lots of people consume Spark on
> Hadoop, and most people consume Hadoop through distros, and distros
> alone manage Spark distributions, then you de facto 'have to' go
> through a distro instead of get bits from Spark? Different
> conversation but I think this sort of effect does not end up being a
> negative.
> 
> Well anyway, I like the idea of seeing how far Hadoop-provided
> releases can help. It might kill several birds with one stone.
> 
> On Sun, Mar 8, 2015 at 11:07 PM, Matei Zaharia  
> wrote:
>> Our goal is to let people use the latest Apache release even if vendors fall 
>> behind or don't want to package everything, so that's why we put out 
>> releases for vendors' versions. It's fairly low overhead.
>> 
>> Matei
>> 
>>> On Mar 8, 2015, at 5:56 PM, Sean Owen  wrote:
>>> 
>>> Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
>>> at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
>>> Maven artifacts.
>>> 
>>> Patrick I see you just commented on SPARK-5134 and will follow up
>>> there. Sounds like this may accidentally not be a problem.
>>> 
>>> On binary tarball releases, I wonder if anyone has an opinion on my
>>> opinion that these shouldn't be distributed for specific Hadoop
>>> *distributions* to begin with. (Won't repeat the argument here yet.)
>>> That resolves this n x m explosion too.
>>> 
>>> Vendors already provide their own distribution, yes, that's their job.
>>> 
>>> 
>>> On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar  wrote:
 Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
 Distributions X ...
 
 Maybe one option is to have a minimum basic set (which I know is what we
 are discussing) and move the rest to spark-packages.org. There the vendors
 can add the latest downloads - for example when 1.4 is released, HDP can
 build a release of HDP Spark 1.4 bundle.
 
 Cheers
 
 
 On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell  wrote:
> 
> We probably want to revisit the way we do binaries in general for
> 1.4+. IMO, something worth forking a separate thread for.
> 
> I've been hesitating to add new binaries because people
> (understandably) complain if you ever stop packaging older ones, but
> on the other hand the ASF has complained that we have too many
> binaries already and that we need to pare it down because of the large
> volume of files. Doubling the number of binaries we produce for Scala
> 2.11 seemed like it would be too much.
> 
> One solution potentially is to actually package "Hadoop provided"
> binaries and encourage users to use these by simply setting
> HADOOP_HOME, or have instructions for specific distros. I've heard
> that our existing packages don't work well on HDP for instance, since
> there are some configuration quirks that differ from the upstream
> Hadoop.
> 
> If we cut down on the cross building for Hadoop versions, then it is
> more tenable to cross build for Scala versions without exploding the
> number of binaries.
> 
> - Patrick
> 
> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
>> Yeah, interesting question of what is the better default for the
>> single set of artifacts published to Maven.

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Sean Owen
Yeah it's not much overhead, but here's an example of where it causes
a little issue.

I like that reasoning. However, the released builds don't track the
later versions of Hadoop that vendors would be distributing -- there's
no Hadoop 2.6 build for example. CDH4 is here, but not the
far-more-used CDH5. HDP isn't present at all. The CDH4 build doesn't
actually work with many CDH4 versions.

I agree with the goal of maximizing the reach of Spark, but I don't
know how much these builds advance that goal.

Anyone can roll-their-own exactly-right build, and the docs and build
have been set up to make that as simple as can be expected. So these
aren't *required* to let me use latest Spark on distribution X.

I had thought these existed to sorta support 'legacy' distributions,
like CDH4, and that build was justified as a
quasi-Hadoop-2.0.x-flavored build. But then I don't understand what
the MapR profiles are for.

I think it's too much work to correctly, in parallel, maintain any
customizations necessary for any major distro, and it might be better
not to do it at all than to do it incompletely. You could say it's also an
enabler for distros to vary in ways that require special
customization.

Maybe there's a concern that, if lots of people consume Spark on
Hadoop, and most people consume Hadoop through distros, and distros
alone manage Spark distributions, then you de facto 'have to' go
through a distro instead of get bits from Spark? Different
conversation but I think this sort of effect does not end up being a
negative.

Well anyway, I like the idea of seeing how far Hadoop-provided
releases can help. It might kill several birds with one stone.
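
For reference, rolling your own is just the documented build profile plus a
-Dhadoop.version override; a minimal sketch, where the CDH version string
below is illustrative only and the vendor's Maven repo may need to be added
to the build:

  # build against a specific distro's Hadoop artifacts
  mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.0 -DskipTests clean package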

On Sun, Mar 8, 2015 at 11:07 PM, Matei Zaharia  wrote:
> Our goal is to let people use the latest Apache release even if vendors fall 
> behind or don't want to package everything, so that's why we put out releases 
> for vendors' versions. It's fairly low overhead.
>
> Matei
>
>> On Mar 8, 2015, at 5:56 PM, Sean Owen  wrote:
>>
>> Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
>> at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
>> Maven artifacts.
>>
>> Patrick I see you just commented on SPARK-5134 and will follow up
>> there. Sounds like this may accidentally not be a problem.
>>
>> On binary tarball releases, I wonder if anyone has an opinion on my
>> opinion that these shouldn't be distributed for specific Hadoop
>> *distributions* to begin with. (Won't repeat the argument here yet.)
>> That resolves this n x m explosion too.
>>
>> Vendors already provide their own distribution, yes, that's their job.
>>
>>
>> On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar  wrote:
>>> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
>>> Distributions X ...
>>>
>>> Maybe one option is to have a minimum basic set (which I know is what we
>>> are discussing) and move the rest to spark-packages.org. There the vendors
>>> can add the latest downloads - for example when 1.4 is released, HDP can
>>> build a release of HDP Spark 1.4 bundle.
>>>
>>> Cheers
>>> 
>>>
>>> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell  wrote:

 We probably want to revisit the way we do binaries in general for
 1.4+. IMO, something worth forking a separate thread for.

 I've been hesitating to add new binaries because people
 (understandably) complain if you ever stop packaging older ones, but
 on the other hand the ASF has complained that we have too many
 binaries already and that we need to pare it down because of the large
 volume of files. Doubling the number of binaries we produce for Scala
 2.11 seemed like it would be too much.

 One solution potentially is to actually package "Hadoop provided"
 binaries and encourage users to use these by simply setting
 HADOOP_HOME, or have instructions for specific distros. I've heard
 that our existing packages don't work well on HDP for instance, since
 there are some configuration quirks that differ from the upstream
 Hadoop.

 If we cut down on the cross building for Hadoop versions, then it is
 more tenable to cross build for Scala versions without exploding the
 number of binaries.

 - Patrick

 On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
> Yeah, interesting question of what is the better default for the
> single set of artifacts published to Maven. I think there's an
> argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
> and cons discussed more at
>
> https://issues.apache.org/jira/browse/SPARK-5134
> https://github.com/apache/spark/pull/3917
>
> On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia 
> wrote:
>> +1
>>
>> Tested it on Mac OS X.
>>
>> One small issue I noticed is that the Scala 2.11 build is using Hadoop
>> 1 without Hive, which is kind of weird because people will more likely 
>> want
>> Hadoop 2 with Hive. So it would be good to publish a build for that
>> configuration instead.

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Patrick Wendell
I think it's important to separate the goals from the implementation.
I agree with Matei on the goal - I think the goal needs to be to allow
people to download Apache Spark and use it with CDH, HDP, MapR,
whatever... This is the whole reason why HDFS and YARN have stable
API's, so that other projects can build on them in a way that works
across multiple versions. I wouldn't want to force users to upgrade
according only to some vendor timetable, that doesn't seem from the
ASF perspective like a good thing for the project. If users want to
get packages from Bigtop, or the vendors, that's totally fine too.

My point earlier was - I am not sure we are actually accomplishing
that goal now, because I've heard in some cases our "Hadoop 2.X"
packages actually don't work on certain distributions, even those that
are based on that Hadoop version. So one solution is to move towards
"bring your own Hadoop" binaries and have users just set HADOOP_HOME
and maybe document any vendor-specific configs that need to be set.
That also happens to solve the "too many binaries" problem, but only
incidentally.

- Patrick
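
A sketch of what the "bring your own Hadoop" option might look like,
assuming the hadoop-provided profile in the build plus an environment hook
for picking up the distro's jars (the SPARK_DIST_CLASSPATH name is an
assumption here, not a settled interface):

  # package Spark without bundling the Hadoop jars
  mvn -Phadoop-provided -Pyarn -DskipTests clean package

  # at runtime, point Spark at the cluster's own Hadoop install
  export HADOOP_HOME=/opt/hadoop                    # hypothetical install path
  export SPARK_DIST_CLASSPATH=$(hadoop classpath)   # assumption: hook for distro jars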

On Sun, Mar 8, 2015 at 4:07 PM, Matei Zaharia  wrote:
> Our goal is to let people use the latest Apache release even if vendors fall 
> behind or don't want to package everything, so that's why we put out releases 
> for vendors' versions. It's fairly low overhead.
>
> Matei
>
>> On Mar 8, 2015, at 5:56 PM, Sean Owen  wrote:
>>
>> Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
>> at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
>> Maven artifacts.
>>
>> Patrick I see you just commented on SPARK-5134 and will follow up
>> there. Sounds like this may accidentally not be a problem.
>>
>> On binary tarball releases, I wonder if anyone has an opinion on my
>> opinion that these shouldn't be distributed for specific Hadoop
>> *distributions* to begin with. (Won't repeat the argument here yet.)
>> That resolves this n x m explosion too.
>>
>> Vendors already provide their own distribution, yes, that's their job.
>>
>>
>> On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar  wrote:
>>> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
>>> Distributions X ...
>>>
>>> Maybe one option is to have a minimum basic set (which I know is what we
>>> are discussing) and move the rest to spark-packages.org. There the vendors
>>> can add the latest downloads - for example when 1.4 is released, HDP can
>>> build a release of HDP Spark 1.4 bundle.
>>>
>>> Cheers
>>> 
>>>
>>> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell  wrote:

 We probably want to revisit the way we do binaries in general for
 1.4+. IMO, something worth forking a separate thread for.

 I've been hesitating to add new binaries because people
 (understandably) complain if you ever stop packaging older ones, but
 on the other hand the ASF has complained that we have too many
 binaries already and that we need to pare it down because of the large
 volume of files. Doubling the number of binaries we produce for Scala
 2.11 seemed like it would be too much.

 One solution potentially is to actually package "Hadoop provided"
 binaries and encourage users to use these by simply setting
 HADOOP_HOME, or have instructions for specific distros. I've heard
 that our existing packages don't work well on HDP for instance, since
 there are some configuration quirks that differ from the upstream
 Hadoop.

 If we cut down on the cross building for Hadoop versions, then it is
 more tenable to cross build for Scala versions without exploding the
 number of binaries.

 - Patrick

 On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
> Yeah, interesting question of what is the better default for the
> single set of artifacts published to Maven. I think there's an
> argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
> and cons discussed more at
>
> https://issues.apache.org/jira/browse/SPARK-5134
> https://github.com/apache/spark/pull/3917
>
> On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia 
> wrote:
>> +1
>>
>> Tested it on Mac OS X.
>>
>> One small issue I noticed is that the Scala 2.11 build is using Hadoop
>> 1 without Hive, which is kind of weird because people will more likely 
>> want
>> Hadoop 2 with Hive. So it would be good to publish a build for that
>> configuration instead. We can do it if we do a new RC, or it might be 
>> that
>> binary builds may not need to be voted on (I forgot the details there).
>>
>> Matei

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org

>>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Matei Zaharia
Our goal is to let people use the latest Apache release even if vendors fall 
behind or don't want to package everything, so that's why we put out releases 
for vendors' versions. It's fairly low overhead.

Matei

> On Mar 8, 2015, at 5:56 PM, Sean Owen  wrote:
> 
> Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
> at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
> Maven artifacts.
> 
> Patrick I see you just commented on SPARK-5134 and will follow up
> there. Sounds like this may accidentally not be a problem.
> 
> On binary tarball releases, I wonder if anyone has an opinion on my
> opinion that these shouldn't be distributed for specific Hadoop
> *distributions* to begin with. (Won't repeat the argument here yet.)
> That resolves this n x m explosion too.
> 
> Vendors already provide their own distribution, yes, that's their job.
> 
> 
> On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar  wrote:
>> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
>> Distributions X ...
>> 
>> Maybe one option is to have a minimum basic set (which I know is what we
>> are discussing) and move the rest to spark-packages.org. There the vendors
>> can add the latest downloads - for example when 1.4 is released, HDP can
>> build a release of HDP Spark 1.4 bundle.
>> 
>> Cheers
>> 
>> 
>> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell  wrote:
>>> 
>>> We probably want to revisit the way we do binaries in general for
>>> 1.4+. IMO, something worth forking a separate thread for.
>>> 
>>> I've been hesitating to add new binaries because people
>>> (understandably) complain if you ever stop packaging older ones, but
>>> on the other hand the ASF has complained that we have too many
>>> binaries already and that we need to pare it down because of the large
>>> volume of files. Doubling the number of binaries we produce for Scala
>>> 2.11 seemed like it would be too much.
>>> 
>>> One solution potentially is to actually package "Hadoop provided"
>>> binaries and encourage users to use these by simply setting
>>> HADOOP_HOME, or have instructions for specific distros. I've heard
>>> that our existing packages don't work well on HDP for instance, since
>>> there are some configuration quirks that differ from the upstream
>>> Hadoop.
>>> 
>>> If we cut down on the cross building for Hadoop versions, then it is
>>> more tenable to cross build for Scala versions without exploding the
>>> number of binaries.
>>> 
>>> - Patrick
>>> 
>>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
 Yeah, interesting question of what is the better default for the
 single set of artifacts published to Maven. I think there's an
 argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
 and cons discussed more at
 
 https://issues.apache.org/jira/browse/SPARK-5134
 https://github.com/apache/spark/pull/3917
 
 On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia 
 wrote:
> +1
> 
> Tested it on Mac OS X.
> 
> One small issue I noticed is that the Scala 2.11 build is using Hadoop
> 1 without Hive, which is kind of weird because people will more likely 
> want
> Hadoop 2 with Hive. So it would be good to publish a build for that
> configuration instead. We can do it if we do a new RC, or it might be that
> binary builds may not need to be voted on (I forgot the details there).
> 
> Matei
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Sean Owen
Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
Maven artifacts.

Patrick I see you just commented on SPARK-5134 and will follow up
there. Sounds like this may accidentally not be a problem.

On binary tarball releases, I wonder if anyone has an opinion on my
opinion that these shouldn't be distributed for specific Hadoop
*distributions* to begin with. (Won't repeat the argument here yet.)
That resolves this n x m explosion too.

Vendors already provide their own distribution, yes, that's their job.


On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar  wrote:
> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
> Distributions X ...
>
> Maybe one option is to have a minimum basic set (which I know is what we
> are discussing) and move the rest to spark-packages.org. There the vendors
> can add the latest downloads - for example when 1.4 is released, HDP can
> build a release of HDP Spark 1.4 bundle.
>
> Cheers
> 
>
> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell  wrote:
>>
>> We probably want to revisit the way we do binaries in general for
>> 1.4+. IMO, something worth forking a separate thread for.
>>
>> I've been hesitating to add new binaries because people
>> (understandably) complain if you ever stop packaging older ones, but
>> on the other hand the ASF has complained that we have too many
>> binaries already and that we need to pare it down because of the large
>> volume of files. Doubling the number of binaries we produce for Scala
>> 2.11 seemed like it would be too much.
>>
>> One solution potentially is to actually package "Hadoop provided"
>> binaries and encourage users to use these by simply setting
>> HADOOP_HOME, or have instructions for specific distros. I've heard
>> that our existing packages don't work well on HDP for instance, since
>> there are some configuration quirks that differ from the upstream
>> Hadoop.
>>
>> If we cut down on the cross building for Hadoop versions, then it is
>> more tenable to cross build for Scala versions without exploding the
>> number of binaries.
>>
>> - Patrick
>>
>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
>> > Yeah, interesting question of what is the better default for the
>> > single set of artifacts published to Maven. I think there's an
>> > argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
>> > and cons discussed more at
>> >
>> > https://issues.apache.org/jira/browse/SPARK-5134
>> > https://github.com/apache/spark/pull/3917
>> >
>> > On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia 
>> > wrote:
>> >> +1
>> >>
>> >> Tested it on Mac OS X.
>> >>
>> >> One small issue I noticed is that the Scala 2.11 build is using Hadoop
>> >> 1 without Hive, which is kind of weird because people will more likely 
>> >> want
>> >> Hadoop 2 with Hive. So it would be good to publish a build for that
>> >> configuration instead. We can do it if we do a new RC, or it might be that
>> >> binary builds may not need to be voted on (I forgot the details there).
>> >>
>> >> Matei
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Krishna Sankar
Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
Distributions X ...

Maybe one option is to have a minimum basic set (which I know is what we
are discussing) and move the rest to spark-packages.org. There the vendors
can add the latest downloads - for example when 1.4 is released, HDP can
build a release of HDP Spark 1.4 bundle.

Cheers
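
(For what it's worth, 1.3 can already resolve such packages at launch time
by Maven coordinate, assuming the --packages support that landed for this
release; a sketch -- the coordinate below is made up:)

  # pull a package from spark-packages.org / Maven Central at startup
  bin/spark-shell --packages com.example:spark-foo_2.10:0.1.0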


On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell  wrote:

> We probably want to revisit the way we do binaries in general for
> 1.4+. IMO, something worth forking a separate thread for.
>
> I've been hesitating to add new binaries because people
> (understandably) complain if you ever stop packaging older ones, but
> on the other hand the ASF has complained that we have too many
> binaries already and that we need to pare it down because of the large
> volume of files. Doubling the number of binaries we produce for Scala
> 2.11 seemed like it would be too much.
>
> One solution potentially is to actually package "Hadoop provided"
> binaries and encourage users to use these by simply setting
> HADOOP_HOME, or have instructions for specific distros. I've heard
> that our existing packages don't work well on HDP for instance, since
> there are some configuration quirks that differ from the upstream
> Hadoop.
>
> If we cut down on the cross building for Hadoop versions, then it is
> more tenable to cross build for Scala versions without exploding the
> number of binaries.
>
> - Patrick
>
> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
> > Yeah, interesting question of what is the better default for the
> > single set of artifacts published to Maven. I think there's an
> > argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
> > and cons discussed more at
> >
> > https://issues.apache.org/jira/browse/SPARK-5134
> > https://github.com/apache/spark/pull/3917
> >
> > On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia 
> wrote:
> >> +1
> >>
> >> Tested it on Mac OS X.
> >>
> >> One small issue I noticed is that the Scala 2.11 build is using Hadoop
> 1 without Hive, which is kind of weird because people will more likely want
> Hadoop 2 with Hive. So it would be good to publish a build for that
> configuration instead. We can do it if we do a new RC, or it might be that
> binary builds may not need to be voted on (I forgot the details there).
> >>
> >> Matei
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Patrick Wendell
We probably want to revisit the way we do binaries in general for
1.4+. IMO, something worth forking a separate thread for.

I've been hesitating to add new binaries because people
(understandably) complain if you ever stop packaging older ones, but
on the other hand the ASF has complained that we have too many
binaries already and that we need to pare it down because of the large
volume of files. Doubling the number of binaries we produce for Scala
2.11 seemed like it would be too much.

One solution potentially is to actually package "Hadoop provided"
binaries and encourage users to use these by simply setting
HADOOP_HOME, or have instructions for specific distros. I've heard
that our existing packages don't work well on HDP for instance, since
there are some configuration quirks that differ from the upstream
Hadoop.

If we cut down on the cross building for Hadoop versions, then it is
more tenable to cross build for Scala versions without exploding the
number of binaries.

- Patrick
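
To make the cross-build concrete, the Scala 2.11 artifacts today are a
version-switch script plus one flag; a sketch, assuming the script shipped
under dev/ at this point in time:

  # switch the POMs to Scala 2.11, then build as usual
  dev/change-version-to-2.11.sh
  mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package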

On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen  wrote:
> Yeah, interesting question of what is the better default for the
> single set of artifacts published to Maven. I think there's an
> argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
> and cons discussed more at
>
> https://issues.apache.org/jira/browse/SPARK-5134
> https://github.com/apache/spark/pull/3917
>
> On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia  wrote:
>> +1
>>
>> Tested it on Mac OS X.
>>
>> One small issue I noticed is that the Scala 2.11 build is using Hadoop 1 
>> without Hive, which is kind of weird because people will more likely want 
>> Hadoop 2 with Hive. So it would be good to publish a build for that 
>> configuration instead. We can do it if we do a new RC, or it might be that 
>> binary builds may not need to be voted on (I forgot the details there).
>>
>> Matei

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Sean Owen
Yeah, interesting question of what is the better default for the
single set of artifacts published to Maven. I think there's an
argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
and cons discussed more at

https://issues.apache.org/jira/browse/SPARK-5134
https://github.com/apache/spark/pull/3917

On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia  wrote:
> +1
>
> Tested it on Mac OS X.
>
> One small issue I noticed is that the Scala 2.11 build is using Hadoop 1 
> without Hive, which is kind of weird because people will more likely want 
> Hadoop 2 with Hive. So it would be good to publish a build for that 
> configuration instead. We can do it if we do a new RC, or it might be that 
> binary builds may not need to be voted on (I forgot the details there).
>
> Matei

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Matei Zaharia
+1

Tested it on Mac OS X.

One small issue I noticed is that the Scala 2.11 build is using Hadoop 1 
without Hive, which is kind of weird because people will more likely want 
Hadoop 2 with Hive. So it would be good to publish a build for that 
configuration instead. We can do it if we do a new RC, or it might be that 
binary builds may not need to be voted on (I forgot the details there).

Matei
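
A sketch of what publishing that combination might look like, assuming the
current make-distribution.sh flags (and noting that the Scala 2.11 build of
this era may not support every Hive component):

  dev/change-version-to-2.11.sh
  ./make-distribution.sh --tgz --name hadoop2.4 -Pyarn -Phadoop-2.4 -Phive -Dscala-2.11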

> On Mar 5, 2015, at 9:52 PM, Patrick Wendell  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 1.3.0!
> 
> The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc3/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
> 
> Staging repositories for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1078
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/
> 
> Please vote on releasing this package as Apache Spark 1.3.0!
> 
> The vote is open until Monday, March 09, at 02:52 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 1.3.0
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see
> http://spark.apache.org/
> 
> == How does this compare to RC2 ==
> This release includes the following bug fixes:
> 
> https://issues.apache.org/jira/browse/SPARK-6144
> https://issues.apache.org/jira/browse/SPARK-6171
> https://issues.apache.org/jira/browse/SPARK-5143
> https://issues.apache.org/jira/browse/SPARK-6182
> https://issues.apache.org/jira/browse/SPARK-6175
> 
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.2 workload and running on this release candidate,
> then reporting any regressions.
> 
> If you are happy with this release based on your own testing, give a +1 vote.
> 
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.3 QA period,
> so -1 votes should only occur for significant regressions from 1.2.1.
> Bugs already present in 1.2.X, minor regressions, or bugs related
> to new features will not block this release.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 
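
For anyone double-checking the staged artifacts above, signature
verification is the usual GPG dance; a sketch, with file names abbreviated:

  # import the release manager's key, then verify a downloaded tarball
  gpg --import pwendell.asc
  gpg --verify spark-1.3.0.tgz.asc spark-1.3.0.tgz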


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
For now, I'll just put this as critical. We can discuss the
documentation stuff offline or in another thread.

On Fri, Mar 6, 2015 at 1:36 PM, Sean Owen  wrote:
> Although the problem is small, especially if indeed the essential docs
> changes are following just a couple days behind the final release, I
> mean, why the rush if they're essential? Wait a couple days, finish
> them, make the release.
>
> Answer is, I think these changes aren't actually essential given the
> comment from tdas, so: just mark these Critical? (although ... they do
> say they're changes for the 1.3 release, so kind of funny to get to
> them for 1.3.x or 1.4, but that's not important now.)
>
> I thought that Blocker really meant Blocker in this project, as I've
> been encouraged to use it to mean "don't release without this." I
> think we should use it that way. Just thinking of it as "extra
> Critical" doesn't add anything. I don't think Documentation should be
> special-cased as less important, and I don't think there's confusion
> if Blocker means what it says, so I'd 'fix' that way.
>
> If nobody sees the Hive failure I observed, and if we can just zap
> those "Blockers" one way or the other, +1
>
>
> On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell  wrote:
>> Sean,
>>
>> The docs are distributed and consumed in a fundamentally different way
>> than Spark code itself. So we've always considered the "deadline" for
>> doc changes to be when the release is finally posted.
>>
>> If there are small inconsistencies with the docs present in the source
>> code for that release tag, IMO that doesn't matter much since we don't
>> even distribute the docs with Spark's binary releases and virtually no
>> one builds and hosts the docs on their own (that I am aware of, at
>> least). Perhaps we can recommend if people want to build the doc
>> sources that they should always grab the head of the most recent
>> release branch, to set expectations accordingly.
>>
>> In the past we haven't considered it worth holding up the release
>> process for the purpose of the docs. It just doesn't make sense since
>> they are consumed "as a service". If we decide to change this
>> convention, it would mean shipping our releases later, since we
>> couldn't pipeline the doc finalization with voting.
>>
>> - Patrick
>>
>> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
>>> Given the title and tagging, it sounds like there could be some
>>> must-have doc changes to go with what is being released as 1.3. It can
>>> be finished later, and published later, but then the docs source
>>> shipped with the release doesn't match the site, and until then, 1.3
>>> is released without some "must-have" docs for 1.3 on the site.
>>>
>>> The real question to me is: are there any further, absolutely
>>> essential doc changes that need to accompany 1.3 or not?
>>>
>>> If not, just resolve these. If there are, then it seems like the
>>> release has to block on them. If there are some docs that should have
>>> gone in for 1.3, but didn't, but aren't essential, well I suppose it
>>> bears thinking about how to not slip as much work, but it doesn't
>>> block.
>>>
>>> I think Documentation issues certainly can be a blocker and shouldn't
>>> be specially ignored.
>>>
>>>
>>> BTW the UISeleniumSuite issue is a real failure, but I do not think it
>>> is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
>>> a regression from 1.2.x, but only affects tests, and only affects a
>>> subset of build profiles.
>>>
>>>
>>>
>>>
>>> On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  wrote:
 Hey Sean,

> SPARK-5310 Update SQL programming guide for 1.3
> SPARK-5183 Document data source API
> SPARK-6128 Update Spark Streaming Guide for Spark 1.3

 For these, the issue is that they are documentation JIRA's, which
 don't need to be timed exactly with the release vote, since we can
 update the documentation on the website whenever we want. In the past
 I've just mentally filtered these out when considering RC's. I see a
 few options here:

 1. We downgrade such issues away from Blocker (more clear, but we risk
 losing them in the fray if they really are things we want to have
 before the release is posted).
 2. We provide a filter to the community that excludes 'Documentation'
 issues and shows all other blockers for 1.3. We can put this on the
 wiki, for instance.

 Which do you prefer?

 - Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Sean Owen
Although the problem is small, especially if indeed the essential docs
changes are following just a couple days behind the final release, I
mean, why the rush if they're essential? Wait a couple days, finish
them, make the release.

Answer is, I think these changes aren't actually essential given the
comment from tdas, so: just mark these Critical? (although ... they do
say they're changes for the 1.3 release, so kind of funny to get to
them for 1.3.x or 1.4, but that's not important now.)

I thought that Blocker really meant Blocker in this project, as I've
been encouraged to use it to mean "don't release without this." I
think we should use it that way. Just thinking of it as "extra
Critical" doesn't add anything. I don't think Documentation should be
special-cased as less important, and I don't think there's confusion
if Blocker means what it says, so I'd 'fix' that way.

If nobody sees the Hive failure I observed, and if we can just zap
those "Blockers" one way or the other, +1


On Fri, Mar 6, 2015 at 9:17 PM, Patrick Wendell  wrote:
> Sean,
>
> The docs are distributed and consumed in a fundamentally different way
> than Spark code itself. So we've always considered the "deadline" for
> doc changes to be when the release is finally posted.
>
> If there are small inconsistencies with the docs present in the source
> code for that release tag, IMO that doesn't matter much since we don't
> even distribute the docs with Spark's binary releases and virtually no
> one builds and hosts the docs on their own (that I am aware of, at
> least). Perhaps we can recommend if people want to build the doc
> sources that they should always grab the head of the most recent
> release branch, to set expectations accordingly.
>
> In the past we haven't considered it worth holding up the release
> process for the purpose of the docs. It just doesn't make sense since
> they are consumed "as a service". If we decide to change this
> convention, it would mean shipping our releases later, since we
> couldn't pipeline the doc finalization with voting.
>
> - Patrick
>
> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
>> Given the title and tagging, it sounds like there could be some
>> must-have doc changes to go with what is being released as 1.3. It can
>> be finished later, and published later, but then the docs source
>> shipped with the release doesn't match the site, and until then, 1.3
>> is released without some "must-have" docs for 1.3 on the site.
>>
>> The real question to me is: are there any further, absolutely
>> essential doc changes that need to accompany 1.3 or not?
>>
>> If not, just resolve these. If there are, then it seems like the
>> release has to block on them. If there are some docs that should have
>> gone in for 1.3, but didn't, but aren't essential, well I suppose it
>> bears thinking about how to not slip as much work, but it doesn't
>> block.
>>
>> I think Documentation issues certainly can be a blocker and shouldn't
>> be specially ignored.
>>
>>
>> BTW the UISeleniumSuite issue is a real failure, but I do not think it
>> is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
>> a regression from 1.2.x, but only affects tests, and only affects a
>> subset of build profiles.
>>
>>
>>
>>
>> On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  wrote:
>>> Hey Sean,
>>>
 SPARK-5310 Update SQL programming guide for 1.3
 SPARK-5183 Document data source API
 SPARK-6128 Update Spark Streaming Guide for Spark 1.3
>>>
>>> For these, the issue is that they are documentation JIRA's, which
>>> don't need to be timed exactly with the release vote, since we can
>>> update the documentation on the website whenever we want. In the past
>>> I've just mentally filtered these out when considering RC's. I see a
>>> few options here:
>>>
>>> 1. We downgrade such issues away from Blocker (more clear, but we risk
>>> losing them in the fray if they really are things we want to have
>>> before the release is posted).
>>> 2. We provide a filter to the community that excludes 'Documentation'
>>> issues and shows all other blockers for 1.3. We can put this on the
>>> wiki, for instance.
>>>
>>> Which do you prefer?
>>>
>>> - Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Tathagata Das
To add to what Patrick said, the only reason that those JIRAs are marked as
Blockers (at least I can say for myself) is so that they are at the top of
the JIRA list signifying that these are more *immediate* issues than all
the Critical issues. To make it less confusing for the community voting, we
can definitely add a filter that ignores Documentation issues from the JIRA
list.


On Fri, Mar 6, 2015 at 1:17 PM, Patrick Wendell  wrote:

> Sean,
>
> The docs are distributed and consumed in a fundamentally different way
> than Spark code itself. So we've always considered the "deadline" for
> doc changes to be when the release is finally posted.
>
> If there are small inconsistencies with the docs present in the source
> code for that release tag, IMO that doesn't matter much since we don't
> even distribute the docs with Spark's binary releases and virtually no
> one builds and hosts the docs on their own (that I am aware of, at
> least). Perhaps we can recommend if people want to build the doc
> sources that they should always grab the head of the most recent
> release branch, to set expectations accordingly.
>
> In the past we haven't considered it worth holding up the release
> process for the purpose of the docs. It just doesn't make sense since
> they are consumed "as a service". If we decide to change this
> convention, it would mean shipping our releases later, since we
> couldn't pipeline the doc finalization with voting.
>
> - Patrick
>
> On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
> > Given the title and tagging, it sounds like there could be some
> > must-have doc changes to go with what is being released as 1.3. It can
> > be finished later, and published later, but then the docs source
> > shipped with the release doesn't match the site, and until then, 1.3
> > is released without some "must-have" docs for 1.3 on the site.
> >
> > The real question to me is: are there any further, absolutely
> > essential doc changes that need to accompany 1.3 or not?
> >
> > If not, just resolve these. If there are, then it seems like the
> > release has to block on them. If there are some docs that should have
> > gone in for 1.3, but didn't, but aren't essential, well I suppose it
> > bears thinking about how to not slip as much work, but it doesn't
> > block.
> >
> > I think Documentation issues certainly can be a blocker and shouldn't
> > be specially ignored.
> >
> >
> > BTW the UISeleniumSuite issue is a real failure, but I do not think it
> > is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
> > a regression from 1.2.x, but only affects tests, and only affects a
> > subset of build profiles.
> >
> >
> >
> >
> > On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell 
> wrote:
> >> Hey Sean,
> >>
> >>> SPARK-5310 Update SQL programming guide for 1.3
> >>> SPARK-5183 Document data source API
> >>> SPARK-6128 Update Spark Streaming Guide for Spark 1.3
> >>
> >> For these, the issue is that they are documentation JIRA's, which
> >> don't need to be timed exactly with the release vote, since we can
> >> update the documentation on the website whenever we want. In the past
> >> I've just mentally filtered these out when considering RC's. I see a
> >> few options here:
> >>
> >> 1. We downgrade such issues away from Blocker (more clear, but we risk
> >> losing them in the fray if they really are things we want to have
> >> before the release is posted).
> >> 2. We provide a filter to the community that excludes 'Documentation'
> >> issues and shows all other blockers for 1.3. We can put this on the
> >> wiki, for instance.
> >>
> >> Which do you prefer?
> >>
> >> - Patrick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
Sean,

The docs are distributed and consumed in a fundamentally different way
than Spark code itself. So we've always considered the "deadline" for
doc changes to be when the release is finally posted.

If there are small inconsistencies with the docs present in the source
code for that release tag, IMO that doesn't matter much since we don't
even distribute the docs with Spark's binary releases and virtually no
one builds and hosts the docs on their own (that I am aware of, at
least). Perhaps we can recommend if people want to build the doc
sources that they should always grab the head of the most recent
release branch, to set expectations accordingly.

In the past we haven't considered it worth holding up the release
process for the purpose of the docs. It just doesn't make sense since
they are consumed "as a service". If we decide to change this
convention, it would mean shipping our releases later, since we
couldn't pipeline the doc finalization with voting.

- Patrick
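
(For those who do build the doc sources themselves, docs/ is a Jekyll site;
a minimal sketch, assuming the SKIP_API knob described in the docs README:)

  cd docs
  SKIP_API=1 jekyll build    # SKIP_API skips the slow API doc generation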

On Fri, Mar 6, 2015 at 11:02 AM, Sean Owen  wrote:
> Given the title and tagging, it sounds like there could be some
> must-have doc changes to go with what is being released as 1.3. It can
> be finished later, and published later, but then the docs source
> shipped with the release doesn't match the site, and until then, 1.3
> is released without some "must-have" docs for 1.3 on the site.
>
> The real question to me is: are there any further, absolutely
> essential doc changes that need to accompany 1.3 or not?
>
> If not, just resolve these. If there are, then it seems like the
> release has to block on them. If there are some docs that should have
> gone in for 1.3, but didn't, but aren't essential, well I suppose it
> bears thinking about how to not slip as much work, but it doesn't
> block.
>
> I think Documentation issues certainly can be a blocker and shouldn't
> be specially ignored.
>
>
> BTW the UISeleniumSuite issue is a real failure, but I do not think it
> is serious: http://issues.apache.org/jira/browse/SPARK-6205  It isn't
> a regression from 1.2.x, but only affects tests, and only affects a
> subset of build profiles.
>
>
>
>
> On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  wrote:
>> Hey Sean,
>>
>>> SPARK-5310 Update SQL programming guide for 1.3
>>> SPARK-5183 Document data source API
>>> SPARK-6128 Update Spark Streaming Guide for Spark 1.3
>>
>> For these, the issue is that they are documentation JIRA's, which
>> don't need to be timed exactly with the release vote, since we can
>> update the documentation on the website whenever we want. In the past
>> I've just mentally filtered these out when considering RC's. I see a
>> few options here:
>>
>> 1. We downgrade such issues away from Blocker (more clear, but we risk
>> losing them in the fray if they really are things we want to have
>> before the release is posted).
>> 2. We provide a filter to the community that excludes 'Documentation'
>> issues and shows all other blockers for 1.3. We can put this on the
>> wiki, for instance.
>>
>> Which do you prefer?
>>
>> - Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Krishna Sankar
+1 (non-binding, of course)

1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:55 min
 mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
2. Tested pyspark, mlib - running as well as compare results with 1.1.x &
1.2.x
   pyspark works well with the new IPython 3.0.0 release
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
 Note: But MSE has increased from 40.81 (1.2.x) to 105.86 (1.3.0).
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
   Center And Scale OK
   Note : WSSSE has come down slightly
2.5. RDD operations OK
  State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lambda) with itertools
OK
3. Scala - MLlib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWithSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
4.0. Spark SQL from Python OK
4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
5.0  Good work on introducing DataFrames. Didn’t test DataFrames. Will add
test cases for next release.

Cheers
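
(For anyone wanting a quicker smoke test than the matrix above, the bundled
examples are a handy sanity check; a sketch:)

  # quick sanity checks against a fresh build
  ./bin/run-example SparkPi 10
  ./bin/pyspark    # then e.g. sc.parallelize(range(100)).sum()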


On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.3.0!
>
> The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4):
>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc3/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> Staging repositories for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1078
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/
>
> Please vote on releasing this package as Apache Spark 1.3.0!
>
> The vote is open until Monday, March 09, at 02:52 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == How does this compare to RC2 ==
> This release includes the following bug fixes:
>
> https://issues.apache.org/jira/browse/SPARK-6144
> https://issues.apache.org/jira/browse/SPARK-6171
> https://issues.apache.org/jira/browse/SPARK-5143
> https://issues.apache.org/jira/browse/SPARK-6182
> https://issues.apache.org/jira/browse/SPARK-6175
>
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.2 workload and running on this release candidate,
> then reporting any regressions.
>
> If you are happy with this release based on your own testing, give a +1
> vote.
>
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.3 QA period,
> so -1 votes should only occur for significant regressions from 1.2.1.
> Bugs already present in 1.2.X, minor regressions, or bugs related
> to new features will not block this release.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Marcelo Vanzin
+1 (non-binding, doc issues aside)

Ran batch of tests against yarn and standalone, including tests for
rc2 blockers, all looks fine.

On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.3.0!
> ...



-- 
Marcelo




Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Sean Owen
Given the title and tagging, it sounds like there could be some
must-have doc changes to go with what is being released as 1.3. They
can be finished and published later, but then the docs source shipped
with the release won't match the site, and in the meantime 1.3 is
released without some "must-have" docs for 1.3 on the site.

The real question to me is: are there any further, absolutely
essential doc changes that need to accompany 1.3, or not?

If not, let's just resolve these JIRAs. If there are, then it seems
like the release has to block on them. If some docs should have gone
in for 1.3 but didn't, and aren't essential, then it bears thinking
about how to avoid slipping that much work in future releases, but it
doesn't block this one.

I think documentation issues certainly can be blockers and shouldn't
be specially ignored.


BTW, the UISeleniumSuite issue is a real failure, but I do not think it
is serious: http://issues.apache.org/jira/browse/SPARK-6205
It isn't a regression from 1.2.x; it only affects tests, and only under
a subset of build profiles.




On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell  wrote:
> Hey Sean,
> ...




Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
Hey Sean,

> SPARK-5310 Update SQL programming guide for 1.3
> SPARK-5183 Document data source API
> SPARK-6128 Update Spark Streaming Guide for Spark 1.3

For these, the issue is that they are documentation JIRAs, which
don't need to be timed exactly with the release vote, since we can
update the documentation on the website whenever we want. In the past
I've just mentally filtered these out when considering RCs. I see a
few options here:

1. We downgrade such issues away from Blocker (clearer, but we risk
losing them in the fray if they really are things we want to have in
place before the release is posted).
2. We provide a filter to the community that excludes 'Documentation'
issues and shows all other blockers for 1.3 (see the sketch below). We
can put this on the wiki, for instance.
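
For option 2, a JIRA filter along these lines would do it (hypothetical
JQL, written from memory, so adjust the field values as needed):

    project = SPARK AND fixVersion = "1.3.0" AND priority = Blocker
      AND resolution = Unresolved AND component != Documentation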

Which do you prefer?

- Patrick




Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Sean Owen
There are still three JIRAs marked as blockers for 1.3.0:

SPARK-5310 Update SQL programming guide for 1.3
SPARK-5183 Document data source API
SPARK-6128 Update Spark Streaming Guide for Spark 1.3

As a matter of hygiene, let's either mark them resolved if they're
resolved, or push them / deprioritize them.


Signatures look good, and the source compiles for me with a
Hadoop 2.6 + YARN + Hive-flavored build.


On OS X and Ubuntu, I still observe the same test failure as in the
first RC, but agree this isn't a blocker:

UISeleniumSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
  ...
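
For anyone who wants to dig into this: the error usually means an old
xml-apis jar, one that predates the DOM Level 3 ElementTraversal
interface, is shadowing the one HtmlUnit's Xerces needs on the test
classpath. A possible workaround, an untested sketch rather than the
actual SPARK-6205 fix, would be to pin a newer xml-apis in sbt:

    // Force a DOM Level 3 xml-apis onto the test classpath so that
    // org.w3c.dom.ElementTraversal resolves for HtmlUnit's Xerces.
    libraryDependencies += "xml-apis" % "xml-apis" % "1.4.01" % "test"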


On both, I also see a few Hive tests fail, like the following:

- udf_std *** FAILED ***
  Results do not match for udf_std:
  DESCRIBE FUNCTION EXTENDED std
  == Parsed Logical Plan ==
  HiveNativeCommand DESCRIBE FUNCTION EXTENDED std

  == Analyzed Logical Plan ==
  HiveNativeCommand DESCRIBE FUNCTION EXTENDED std

  == Optimized Logical Plan ==
  HiveNativeCommand DESCRIBE FUNCTION EXTENDED std

  == Physical Plan ==
  ExecutedCommand (HiveNativeCommand DESCRIBE FUNCTION EXTENDED std)

  Code Generation: false
  == RDD ==
  result
  !== HIVE - 2 row(s) ==                                          == CATALYST - 2 row(s) ==
   std(x) - Returns the standard deviation of a set of numbers    std(x) - Returns the standard deviation of a set of numbers
  !Synonyms: stddev_pop, stddev                                   Synonyms: stddev, stddev_pop
  (HiveComparisonTest.scala:384)


Before I give a +1, I wanted to see whether anyone else sees these test
failures, and/or believes they're ignorable for some reason. I also
want to resolve the open blocker JIRAs.


On Fri, Mar 6, 2015 at 2:52 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.3.0!
> ...




Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
I'll kick it off with a +1.

On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.3.0!
> ...




[VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-05 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0!

The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc3/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

Staging repositories for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1078
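
If you want to point a downstream sbt build at the staging repository
to test your own application against this RC, something like the
following should work (an untested sketch; adjust the module and
version to your project):

    resolvers += "Apache Spark 1.3.0 RC3 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1078/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"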

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc3-docs/

Please vote on releasing this package as Apache Spark 1.3.0!

The vote is open until Monday, March 09, at 02:52 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How does this compare to RC2 ==
This release includes the following bug fixes:

https://issues.apache.org/jira/browse/SPARK-6144
https://issues.apache.org/jira/browse/SPARK-6171
https://issues.apache.org/jira/browse/SPARK-5143
https://issues.apache.org/jira/browse/SPARK-6182
https://issues.apache.org/jira/browse/SPARK-6175

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.2 workload and running it on this release candidate,
then reporting any regressions.
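
As an example, a minimal smoke-test job (a sketch only; any real 1.2
workload you already run is a better test) might look like this:

    import org.apache.spark.{SparkConf, SparkContext}
    // Needed on 1.2.x for the pair-RDD implicits; harmless on 1.3.
    import org.apache.spark.SparkContext._

    object Rc3SmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("spark-1.3.0-rc3-smoke-test"))
        // Trivial word count; compare the output of the same job on
        // Spark 1.2 to spot regressions.
        val counts = sc.textFile(args(0))      // args(0): any text file
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)
        println("distinct words: " + counts.count())
        sc.stop()
      }
    }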

If you are happy with this release based on your own testing, give a +1 vote.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.3 QA period,
so -1 votes should only occur for significant regressions from 1.2.1.
Bugs already present in 1.2.X, minor regressions, or bugs related
to new features will not block this release.
