Re: Signal/Noise Ratio

2014-02-22 Thread Patrick Wendell
Hey Chris,

Would the following be consistent with the Apache guidelines?

(a) We establish a culture of not having overall design discussions on
github. Design discussions should occur on JIRA or on the dev list.
IMO this is pretty much already true, but there are a few exceptions.
(b) We add a mailing list called github@s.a.o which receives the
github traffic. This way everything is available in Apache infra.
(c) Because of our use of JIRA it might make sense to have an
issues@s.a.o list as well similar to what YARN and other projects use.

The github chatter is so noisy that I think, overall, it decreases
engagement with the official developer list. This is the opposite of
what we want.

- Patrick

On Sat, Feb 22, 2014 at 11:34 AM, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Everyone,

 The biggest thing is simply making sure that the dev@projecta.o list is
 meaningful
 and that meaningful development isn't going on elsewhere that constitutes
 decisions for the Apache project as reified in code contributions and
 overall
 stewardship of the effort.

 I noticed in a few emails from Github relating to comments on Github Pull
 Requests
 some conversation which I deemed to be relevant to the project, so I
 brought this
 up and it came up during graduation.

 Here's a general rule of thumb: it's fine if devs converse e.g., on
 Github, etc.,
 and even if it's project discussion *so long as* that relevant project
 discussion
 makes its way in some form to the actual, bona fide project's
 dev@projecta.o list,
 giving others in the community who are not necessarily on Github, watching
 Github, or part of that non-Apache conversation a chance to comment and be
 part of the community-led decisions for the project there.

 Making its way to that bona fide Apache project dev list can happen in
 several ways.

 1. By simply mapping, 1:1, the Github comments on which I see
 Apache-project-related dev discussion from time to time (and which I
 believe fit the criteria I'm describing above) to the project's
 dev@project.a.o list.

 2. by not 1:1 mapping all Github conversation to the dev@project.a.o
 list, but to
 some other list, e.g., github@projecta.o, for example (or any of the
 others being
 discussed) *so long as*, and this is key, that those discussions on Github
 get summarized
 on the dev@project.a.o list giving everyone an opportunity to
 participate in the development
 by being *here at Apache*.

 3. By not worrying about Github at all and simply doing all the
 development here at
 the ASF.

 4. Others..

 My feeling is that some combination of #1 and #2 can pass muster, and the
 Apache Spark
 community can decide. That said, noise reduction can also lead to loss of
 precision and
 accuracy, so don't be surprised if, in reducing that noise, some key thing
 makes it onto a Github PR but doesn't make it onto the dev list because we
 are all human and forget to summarize
 it there. Even if that happens, we assume everyone has good intentions and
 we simply
 address those issues when/if they come up.

 Cheers,
 Chris




 -Original Message-
 From: Sandy Ryza sandy.r...@cloudera.com
 Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Date: Saturday, February 22, 2014 11:19 AM
 To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Subject: Re: Signal/Noise Ratio

Hadoop subprojects (MR, YARN, HDFS) each have a dev list that contains
discussion as well as a single email whenever a JIRA is filed, and an
issues list with all the JIRA activity.  I think this works out pretty
well.  Subscribing just to the dev list, I can keep up with changes that
are going to be made and follow the ones I care about.  And the issues
list
is there if I want the firehose.

Is Apache actually prescriptive that a list with dev in its name needs
to
contain all discussion?  If so, most projects I've followed are violating
this.


On Fri, Feb 21, 2014 at 7:54 PM, Kay Ousterhout
k...@eecs.berkeley.eduwrote:

 It looks like there's at least one other apache project, jclouds, that
 sends the github notifications to a separate notifications@ list (see


http://mail-archives.apache.org/mod_mbox/incubator-general/201402.mbox/%3C1391721862.67613.YahooMailNeo%40web172602.mail.ir2.yahoo.com%3E
 ).
  Given that many people are annoyed by getting the messages on this
list,
 and that there is some precedent for sending them to a different list,
I'd
 be in favor of doing that.


 On Fri, Feb 21, 2014 at 6:18 PM, Mattmann, Chris A (3980) 
 chris.a.mattm...@jpl.nasa.gov wrote:

  Sweet great job Reynold.
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 171-283, Mailstop: 171-246
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  

Re: Signal/Noise Ratio

2014-02-22 Thread Patrick Wendell
btw - I'd prefer reviews@s.a.o instead of github@ to remain more
neutral and flexible.

On Sat, Feb 22, 2014 at 12:35 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Chris,

 Would the following be consistent with the Apache guidelines?

 (a) We establish a culture of not having overall design discussions on
 github. Design discussions should occur on JIRA or on the dev list.
 IMO this is pretty much already true, but there are a few exceptions.
 (b) We add a mailing list called github@s.a.o which receives the
 github traffic. This way everything is available in Apache infra.
 (c) Because of our use of JIRA it might make sense to have an
 issues@s.a.o list as well similar to what YARN and other projects use.

 The github chatter is so noisy that I think, overall, it decreases
 engagement with the official developer list. This is the opposite of
 what we want.

 - Patrick

 On Sat, Feb 22, 2014 at 11:34 AM, Mattmann, Chris A (3980)
 chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Everyone,

 The biggest thing is simply making sure that the dev@projecta.o list is
 meaningful
 and that meaningful development isn't going on elsewhere that constitutes
 decisions for the Apache project as reified in code contributions and
 overall
 stewardship of the effort.

 I noticed in a few emails from Github relating to comments on Github Pull
 Requests
 some conversation which I deemed to be relevant to the project, so I
 brought this
 up and it came up during graduation.

 Here's a general rule of thumb: it's fine if devs converse e.g., on
 Github, etc.,
 and even if it's project discussion *so long as* that relevant project
 discussion
 makes its way in some form to the actual, bona fide project's
 dev@projecta.o list,
 giving others in the community who are not necessarily on Github, watching
 Github, or part of that non-Apache conversation a chance to comment and be
 part of the community-led decisions for the project there.

 Making its way to that bona fide Apache project dev list can happen in
 several ways.

 1. By simply mapping, 1:1, the Github comments on which I see
 Apache-project-related dev discussion from time to time (and which I
 believe fit the criteria I'm describing above) to the project's
 dev@project.a.o list.

 2. by not 1:1 mapping all Github conversation to the dev@project.a.o
 list, but to
 some other list, e.g., github@projecta.o, for example (or any of the
 others being
 discussed) *so long as*, and this is key, that those discussions on Github
 get summarized
 on the dev@project.a.o list giving everyone an opportunity to
 participate in the development
 by being *here at Apache*.

 3. By not worrying about Github at all and simply doing all the
 development here at
 the ASF.

 4. Others..

 My feeling is that some combination of #1 and #2 can pass muster, and the
 Apache Spark
 community can decide. That said, noise reduction can also lead to loss of
 precision and
 accuracy, so don't be surprised if, in reducing that noise, some key thing
 makes it onto a Github PR but doesn't make it onto the dev list because we
 are all human and forget to summarize
 it there. Even if that happens, we assume everyone has good intentions and
 we simply
 address those issues when/if they come up.

 Cheers,
 Chris




 -Original Message-
 From: Sandy Ryza sandy.r...@cloudera.com
 Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Date: Saturday, February 22, 2014 11:19 AM
 To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Subject: Re: Signal/Noise Ratio

Hadoop subprojects (MR, YARN, HDFS) each have a dev list that contains
discussion as well as a single email whenever a JIRA is filed, and an
issues list with all the JIRA activity.  I think this works out pretty
well.  Subscribing just to the dev list, I can keep up with changes that
are going to be made and follow the ones I care about.  And the issues
list
is there if I want the firehose.

Is Apache actually prescriptive that a list with dev in its name needs
to
contain all discussion?  If so, most projects I've followed are violating
this.


On Fri, Feb 21, 2014 at 7:54 PM, Kay Ousterhout
k...@eecs.berkeley.eduwrote:

 It looks like there's at least one other apache project, jclouds, that
 sends the github notifications to a separate notifications@ list (see


http://mail-archives.apache.org/mod_mbox/incubator-general/201402.mbox/%3C1391721862.67613.YahooMailNeo%40web172602.mail.ir2.yahoo.com%3E
 ).
  Given that many people are annoyed by getting the messages on this
list,
 and that there is some precedent for sending them to a different list,
I'd
 be in favor of doing that.


 On Fri, Feb 21, 2014 at 6:18 PM, Mattmann, Chris A (3980) 
 chris.a.mattm...@jpl.nasa.gov wrote:

  Sweet great job Reynold.
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA

Re: Signal/Noise Ratio

2014-02-22 Thread Patrick Wendell
Hey All,

I created a JIRA to ask infra to create a dedicated reviews@ mailing
list for this purpose.

https://issues.apache.org/jira/browse/INFRA-7368

Hopefully they can migrate the github stream to this list so that
people can distinguish it from developer discussions. In parallel, we
are also trying to see if we can use the github status notifier rather
than the constant comments from jenkins.

- Patrick

On Sat, Feb 22, 2014 at 1:04 PM, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
 Patrick, +1 to the below. Great summary and yes I think that would
 work great.

 Cheers,
 Chris

 ++
 Chris Mattmann, Ph.D.
 Chief Architect
 Instrument Software and Science Data Systems Section (398)
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-283, Mailstop: 171-246
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Associate Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++






 -Original Message-
 From: Patrick Wendell pwend...@gmail.com
 Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Date: Saturday, February 22, 2014 12:35 PM
 To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Subject: Re: Signal/Noise Ratio

Hey Chris,

Would the following be consistent with the Apache guidelines?

(a) We establish a culture of not having overall design discussions on
github. Design discussions should occur on JIRA or on the dev list.
IMO this is pretty much already true, but there are a few exceptions.
(b) We add a mailing list called github@s.a.o which receives the
github traffic. This way everything is available in Apache infra.
(c) Because of our use of JIRA it might make sense to have an
issues@s.a.o list as well similar to what YARN and other projects use.

The github chatter is so noisy that I think, overall, it decreases
engagement with the official developer list. This is the opposite of
what we want.

- Patrick

On Sat, Feb 22, 2014 at 11:34 AM, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Everyone,

 The biggest thing is simply making sure that the dev@projecta.o list
is
 meaningful
 and that meaningful development isn't going on elsewhere that constitutes
 decisions for the Apache project as reified in code contributions and
 overall
 stewardship of the effort.

 I noticed in a few emails from Github relating to comments on Github
Pull
 Requests
 some conversation which I deemed to be relevant to the project, so I
 brought this
 up and it came up during graduation.

 Here's a general rule of thumb: it's fine if devs converse e.g., on
 Github, etc.,
 and even if it's project discussion *so long as* that relevant project
 discussion
 makes its way in some form to the actual, bona fide project's
 dev@projecta.o list,
 giving others in the community who are not necessarily on Github, watching
 Github, or part of that non-Apache conversation a chance to comment and be
 part of the community-led decisions for the project there.

 Making its way to that bona fide Apache project dev list can happen in
 several ways.

 1. By simply mapping, 1:1, the Github comments on which I see
 Apache-project-related dev discussion from time to time (and which I
 believe fit the criteria I'm describing above) to the project's
 dev@project.a.o list.

 2. by not 1:1 mapping all Github conversation to the dev@project.a.o
 list, but to
 some other list, e.g., github@projecta.o, for example (or any of the
 others being
 discussed) *so long as*, and this is key, that those discussions on
Github
 get summarized
 on the dev@project.a.o list giving everyone an opportunity to
 participate in the development
 by being *here at Apache*.

 3. By not worrying about Github at all and simply doing all the
 development here at
 the ASF.

 4. Others..

 My feeling is that some combination of #1 and #2 can pass muster, and
the
 Apache Spark
 community can decide. That said, noise reduction can also lead to loss
of
 precision and
 accuracy, so don't be surprised if, in reducing that noise, some key thing
 makes it onto a Github PR but doesn't make it onto the dev list because we
 are all human and forget to summarize
 it there. Even if that happens, we assume everyone has good intentions
and
 we simply
 address those issues when/if they come up.

 Cheers,
 Chris




 -Original Message-
 From: Sandy Ryza sandy.r...@cloudera.com
 Reply-To: dev@spark.incubator.apache.org
dev@spark.incubator.apache.org
 Date: Saturday, February 22, 2014 11:19 AM
 To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Subject: Re: Signal/Noise Ratio

Hadoop subprojects (MR, YARN, HDFS) each have a dev list that contains
discussion as well as a single email whenever a JIRA is filed, and an
issues

Re: Request to review PR #605

2014-02-22 Thread Patrick Wendell
Hey Punya,

It's sufficient to just ping the request on github rather than e-mail
the dev list. Sometimes it can take a few days for people to get to
looking at patches...

- Patrick

On Sat, Feb 22, 2014 at 5:17 PM, Punya Biswal pbis...@palantir.com wrote:
 Hi all,

 Can someone review and/or merge PR #605 (convert or move Java code)? It's
 been sitting for four days.

 Thanks!
 Punya




Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-21 Thread Patrick Wendell
Hey Everyone,

We are going to publish artifacts to maven central in the exact same
format no matter which build system we use.

For normal consumers of Spark {maven vs sbt} won't make a difference.
It will make a difference for people who are extending the Spark build
to do their own packaging. This is what I'm trying to gauge - does
anyone do this in a way where they feel only maven or only sbt
supports their particular use case?

- Patrick

On Fri, Feb 21, 2014 at 12:40 AM, Pascal Voitot Dev
pascal.voitot@gmail.com wrote:
 Hi,

 My small contrib to the discussion.
 SBT is able to publish Maven artifacts, generating the POM and all JAR and
 signed files.
 So even if a POM is not in the project, one can be found somewhere.

 Pascal
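
To illustrate Pascal's point, here is a minimal sketch of Maven-style publishing
from an sbt build (sbt 0.13-style setting keys; the repository URL below is a
placeholder, not the actual Apache staging repository):

    publishMavenStyle := true
    publishArtifact in Test := false
    pomIncludeRepository := { _ => false }
    publishTo := Some("staging" at "https://repository.example.org/content/repositories/staging")
    pomExtra :=
      <licenses>
        <license>
          <name>Apache License, Version 2.0</name>
          <url>http://www.apache.org/licenses/LICENSE-2.0.html</url>
        </license>
      </licenses>

With settings along these lines, running sbt publish produces a POM alongside
the binary, source, and javadoc jars, which is the layout Maven-based consumers
expect.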



 On Fri, Feb 21, 2014 at 9:28 AM, Paul Brown p...@mult.ifario.us wrote:

 As a customer of the code, I don't care *how* the code gets built, but it
 is important to me that the Maven artifacts (POM files, binaries, sources,
 javadocs) are clean, accurate, up to date, and published on Maven Central.

 Some examples where structure/publishing failures have been bad for users:

 - For a long time (and perhaps still), Solr and Lucene were built by an Ant
 build that produced incorrect POMs and required potential developers to
 manually configure their IDEs.

 - For a long time (and perhaps still), Pig was built by Ant, published
 incorrect POMs, and failed to publish useful auxiliary artifacts like
 PigUnit and the PiggyBank as Maven-addressable artifacts.  (That said,
 thanks to Spark, we no longer use Pig...)

 - For a long time (and perhaps still), Cassandra depended on
 non-generally-available libraries (high-scale, etc.) that made it
 inconvenient to embed Cassandra in a larger system.  Cassandra gets a
 little slack because the build/structure was almost too terrible to look at
 prior to incubation and it's gotten better...

 And those are just a few projects at Apache that come to mind; I could make
 a longish list of offenders.

 btw, among other things that the Spark project probably *should* do would
 be to publish artifacts with a classifier to distinguish the Hadoop version
 linked against.

 I'll be a happy user of sbt-built artifacts, or if the project goes/sticks
 with Maven I'm more than willing to help answer questions or provide PRs
 for stickier items around assemblies, multiple artifacts, etc.


 --
 p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


 On Thu, Feb 20, 2014 at 11:56 PM, Sean Owen so...@cloudera.com wrote:

  Two builds is indeed a pain, since it's an ongoing chore to keep them
  in sync. For example, I am already seeing that the two do not quite
  declare the same dependencies (see recent patch).
 
  I think publishing artifacts to Maven central should be considered a
  hard requirement if it isn't already one from the ASF, and it may be?
  Certainly most people out there would be shocked if you told them
  Spark is not in the repo at all. And that requires at least
  maintaining a pom that declares the structure of the project.
 
  This does not necessarily mean using Maven to build, but is a reason
  that removing the pom is going to make this a lot harder for people to
  consume as a project.
 
  Maven has its pros and cons but there are plenty of people lurking
  around who know it quite well. Certainly it's easier for the Hadoop
  people to understand and work with. On the other hand, it supports
  Scala although only via a plugin, which is weaker support. sbt seems
  like a fairly new, basic, ad-hoc tool. Is there an advantage to it,
  other than being Scala (which is an advantage)?
 
  --
  Sean Owen | Director, Data Science | London
 
 
  On Fri, Feb 21, 2014 at 4:03 AM, Patrick Wendell pwend...@gmail.com
  wrote:
   Hey All,
  
   It's very high overhead having two build systems in Spark. Before
   getting into a long discussion about the merits of sbt vs maven, I
   wanted to pose a simple question to the dev list:
  
   Is there anyone who feels that dropping either sbt or maven would have
   a major consequence for them?
  
   And I say major consequence meaning something becomes completely
   impossible now and can't be worked around. This is different from an
   inconvenience, i.e., something which can be worked around but will
   require some investment.
  
   I'm posing the question in this way because, if there are features in
   either build system that are absolutely-un-available in the other,
   then we'll have to maintain both for the time being. I'm merely trying
   to see whether this is the case...
  
   - Patrick
 



Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-21 Thread Patrick Wendell
Kos - thanks for chiming in. Could you be more specific about what is
available in maven and not in sbt for these issues? I took a look at
the bigtop code relating to Spark. As far as I could tell [1] was the
main point of integration with the build system (maybe there are other
integration points)?

   - in order to integrate Spark well into existing Hadoop stack it was
 necessary to have a way to avoid transitive dependency duplications and
 possible conflicts.

 E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs and later
 merely declare Spark package dependency on standard Bigtop Hadoop
 packages. And yes - Bigtop packaging means the naming and layout would be
 standard across all commercial Hadoop distributions that are worth
 mentioning: ASF Bigtop convenience binary packages, and Cloudera or
 Hortonworks packages. Hence, the downstream user doesn't need to spend any
 effort to make sure that Spark clicks-in properly.

The sbt build also allows you to plug in a Hadoop version similar to
the maven build.
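
For instance, a build definition can read the Hadoop version from the
environment and feed it into the dependency declaration; this is only a sketch
of the pattern (the variable name and default version are illustrative, not
necessarily what SparkBuild.scala actually uses):

    // somewhere in the sbt build definition (illustrative)
    val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion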


   - Maven provides a relatively easy way to deal with the jar-hell problem,
 although the original maven build was just Shader'ing everything into a
 huge lump of class files. Oftentimes ending up with classes slamming on
 top of each other from different transitive dependencies.

AFAIK we are only using the shade plug-in to deal with conflict
resolution in the assembly jar. These are dealt with in sbt via the
sbt assembly plug-in in an identical way. Is there a difference?

[1] 
https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
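
For comparison with the shade plug-in approach described above, a sketch of how
conflict resolution looks with the sbt assembly plug-in (setting and strategy
names vary across sbt-assembly versions, so treat this as illustrative):

    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard  // drop manifests/signatures
      case "reference.conf"              => MergeStrategy.concat   // merge library configs
      case _                             => MergeStrategy.first    // otherwise keep the first copy
    }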


Re: Planned 0.9.1 release

2014-02-21 Thread Patrick Wendell
We back port bug fixes into the 0.9 branch as they come in, so if
there is a particular fix you want to get you can always build from
the head of branch-0.9 and expect only stability improvements compared
with Spark 0.9.0.

The timing of the maintenance releases depends a bit on what bug fixes
come in and their importance. I'm thinking we should propose a release
pretty soon (order weeks) since there are some valuable bug fixes that
came in this week.

- Patrick

On Fri, Feb 21, 2014 at 2:22 PM, Gary Malouf malouf.g...@gmail.com wrote:
 My team has avoided upgrading to 0.9 to this point because of the Mesos bug
 that has since been fixed in master.  For ease of tracking, we are trying
 to only use tagged releases going forward as long as they will continue to
 be frequent or become more stable over time.

 Is there any timeline on cutting a tag for the 0.9.1 bug fix release?


[DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-20 Thread Patrick Wendell
Hey All,

It's very high overhead having two build systems in Spark. Before
getting into a long discussion about the merits of sbt vs maven, I
wanted to pose a simple question to the dev list:

Is there anyone who feels that dropping either sbt or maven would have
a major consequence for them?

And I say major consequence meaning something becomes completely
impossible now and can't be worked around. This is different from an
inconvenience, i.e., something which can be worked around but will
require some investment.

I'm posing the question in this way because, if there are features in
either build system that are absolutely-un-available in the other,
then we'll have to maintain both for the time being. I'm merely trying
to see whether this is the case...

- Patrick


Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-20 Thread Patrick Wendell
Hey Henry,

Yep, I wanted to reboot this since some time has passed and people may
have new or changed ways of using the build.

Maven makes the Apache publishing fairly seamless, but after the last
two releases I believe we could make it work with sbt as well. sbt
also supports publishing and other Apache projects such as Kafka
publish with sbt.

On Thu, Feb 20, 2014 at 8:50 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Thanks for bringing back the build systems discussions, Patrick.
 There was a long discussion way back before Spark joined the ASF, and as I
 remember there was no clear winner between using sbt or maven.

 Maven makes it easier to publish the artifacts to Nexus repository,
 not sure if sbt can do  the same, and as I remember one of the
 limitations or drawbacks about maven is the use of profiles.
 Matei had suggested using some kind of Hadoop client detection as in
 Parquet project to manage the Hadoop versions to avoid profiles.


 - Henry

 On Thu, Feb 20, 2014 at 8:03 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey All,

 It's very high overhead having two build systems in Spark. Before
 getting into a long discussion about the merits of sbt vs maven, I
 wanted to pose a simple question to the dev list:

 Is there anyone who feels that dropping either sbt or maven would have
 a major consequence for them?

 And I say major consequence meaning something becomes completely
 impossible now and can't be worked around. This is different from an
 inconvenience, i.e., something which can be worked around but will
 require some investment.

 I'm posing the question in this way because, if there are features in
 either build system that are absolutely-un-available in the other,
 then we'll have to maintain both for the time being. I'm merely trying
 to see whether this is the case...

 - Patrick


Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Patrick Wendell
+1 overall.

Christopher - I agree that once the number of rules becomes large it's
more efficient to pursue a "use your judgement" approach. However,
since this is only 3 cases I'd prefer to wait to see if it grows.

The concern with this approach is that for newer people, contributors,
etc. it's hard for them to understand what good judgement is. Many are
new to Scala, so explicit rules are generally better.

- Patrick

On Wed, Feb 19, 2014 at 12:19 AM, Reynold Xin r...@databricks.com wrote:
 Yes, the case you brought up is not a matter of readability or style. If it
 returns a different type, it should be declared (otherwise it is just
 wrong).


 On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.comwrote:

 You are right.
 A degenerate case would be :

 def createFoo = new FooImpl()

 vs

 def createFoo: Foo = new FooImpl()

 The former will cause API instability. Reynold, maybe this is already
 avoided - and I understood it wrong ?

 Thanks,
 Mridul



 On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com
 wrote:
  Mridul, IIUC, what you've mentioned did come to mind, but I deemed it
  orthogonal to the stylistic issue Reynold is talking about.
 
  I believe you're referring to the case where there is a specific desired
  return type by API design, but the implementation's inferred type does not
  match it, in which case, of course, one must declare the return type.
  That's an API requirement and
  not just a matter of readability.
 
  We could add this as an NB in the proposed guideline.
 
  --
  Christopher T. Nguyen
  Co-founder & CEO, Adatao http://adatao.com
  linkedin.com/in/ctnguyen
 
 
 
  On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com
 wrote:
 
  +1 Christopher's suggestion.
 
  Mridul,
 
  How would that happen? Case 3 requires the method to be invoking the
  constructor directly. It was implicit in my email, but the return type
  should be the same as the class itself.
 
 
 
 
  On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
 
   Case 3 can be a potential issue.
   Current implementation might be returning a concrete class which we
   might want to change later - making it a type change.
   The intention might be to return an RDD (for example), but the
   inferred type might be a subclass of RDD - and future changes will
   cause signature change.
  
  
   Regards,
   Mridul
  
  
   On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com
  wrote:
Hi guys,
   
Want to bring to the table this issue to see what other members of
 the
community think and then we can codify it in the Spark coding style
   guide.
The topic is about declaring return types explicitly in public APIs.
   
In general I think we should favor explicit type declaration in
 public
 APIs. However, I do think there are 3 cases where we can omit the explicit
 declaration, because in these 3 cases the types are self-evident and
 repetitive.
   
Case 1. toString
   
Case 2. A method returning a string or a val defining a string
   
 def name = "abcd" // this is so obvious that it is a string
 val name = "edfg" // this too
   
Case 3. The method or variable is invoking the constructor of a
 class
  and
return that immediately. For example:
   
val a = new SparkContext(...)
implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new
AsyncRDDActions(rdd)
   
   
Thoughts?
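
To make the inference pitfall in case 3 concrete, here is a small Scala sketch
(the types and names are made up for illustration): when the return type is
inferred, the public signature is the concrete implementation class, so later
swapping the implementation silently changes the API.

    trait Foo
    class FooImpl extends Foo
    class FancyFooImpl extends Foo

    object Factory {
      // Inferred return type is FooImpl; switching to FancyFooImpl later
      // changes the method's signature and breaks source/binary compatibility.
      def createFoo = new FooImpl

      // Declared return type is Foo; the implementation can change freely.
      def createFooStable: Foo = new FooImpl
    }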
  
 



Re: Adding my wiki user id (hsaputra) as contributors in Apache Spark confluence wiki space

2014-02-13 Thread Patrick Wendell
Hey Henry,

Ya unfortunately I have no idea how to do this!

On Thu, Feb 13, 2014 at 9:54 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
 I can help out here as well. I am trying to develop docs around setting up
 Spark, Streaming and Shark, currently doing it on my wiki (
 docs.sigmoidanalytics.com). Would love to contribute.
 Regards
 Mayur

 Mayur Rustagi
 Ph: +919632149971
 http://www.sigmoidanalytics.com
 https://twitter.com/mayur_rustagi



 On Thu, Feb 13, 2014 at 8:28 AM, Henry Saputra henry.sapu...@gmail.comwrote:

 HI Andy,

 Could you or someone with space admin role in the Spark wiki [1]
 kindly help to add my userid hsaputra as collaborators to edit/ add
 new content in the Spark wiki space?

 I believe Andy's userid was granted the space admin role for the wiki.

 Thank you,

 - Henry

 [1]  https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage



Re: [GitHub] incubator-spark pull request: SPARK-1078: Replace lift-json with j...

2014-02-11 Thread Patrick Wendell
I think Aaron just meant 1.0.0 by "the next minor release".

On Tue, Feb 11, 2014 at 7:56 PM, Mark Hamstra m...@clearstorydata.com wrote:

 The situation sounds fine for the next minor release...


 I don't understand what you mean by this.  According to my current
 understanding, the next release of Spark other than maintenance releases on
 0.9.x is intended to be a major release, 1.0.0, and there are no plans for
 an intervening minor release, which would be 0.10.0.  Thus the next minor
 release would be 1.1.0, and I fail to see why we would wait for that
 instead of putting the dependency change (assuming that it is something
 that we do, indeed, want) in 1.0.0.



 On Tue, Feb 11, 2014 at 7:51 PM, aarondav g...@git.apache.org wrote:

 Github user aarondav commented on the pull request:


 https://github.com/apache/incubator-spark/pull/582#issuecomment-34836430

 Thanks for looking into it! The situation sounds fine for the next
 minor release, and I don't think this patch needs to be included in the
 next maintenance release anyway (following your very own [suggestion](
 http://mail-archives.apache.org/mod_mbox/spark-dev/201402.mbox/browser)
 on the dev list).

 While this patch looks good to me, I am not sure I fully understand
 the need for it. I posted my question on the [dev list thread](
 http://mail-archives.apache.org/mod_mbox/spark-dev/201402.mbox/%3C945190638.685798.1391974088596.JavaMail.zimbra%40redhat.com%3E).
 Besides the dependency change, you also mention performance improvements.
 [This benchmark](
 http://engineering.ooyala.com/blog/comparing-scala-json-libraries) does
 show Jackson outperforming lift on a particular workload, but do you have
 another source showing how the relative performance changes with input size?




Re: Github merge script

2014-02-10 Thread Patrick Wendell
Hey Andrew,

The intent was to be consistent with the way the merge messages looked
before. But I agree it obfuscates the commit messages from the user
and hides them further down.

I think your proposal is good, but it might be better to use the title
of their pull request message rather than the first line of the most
recent commit in their branch (not sure what you meant by commit
message).

Maybe you could submit a pull request for this? The script we use to
merge things is in dev/merge_spark_pr.py.

Another nice thing is that if people are formatting their titles with
JIRA numbers then it will all look nice and pretty... which is kind of the
goal.

- Patrick
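
For illustration only, here is a rough Scala sketch of pulling a pull request's
title from the GitHub API (the repo path is the incubator repo mentioned above,
the regex-based JSON handling is deliberately naive, and this is not how the
actual Python merge script does it):

    // Fetch PR metadata and extract the "title" field with a naive regex.
    val prNumber = 567
    val json = scala.io.Source
      .fromURL("https://api.github.com/repos/apache/incubator-spark/pulls/" + prNumber)
      .mkString
    val TitlePattern = "\"title\"\\s*:\\s*\"([^\"]*)\"".r
    val title = TitlePattern.findFirstMatchIn(json)
      .map(_.group(1))
      .getOrElse("Merge pull request #" + prNumber)
    println(title)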

On Sun, Feb 9, 2014 at 11:55 PM, Andrew Ash and...@andrewash.com wrote:
 The current script for merging a GitHub PR squashes the commits and sticks a
 "Merge pull request #123 from abc/def" at the top of the commit message.
 However this obscures the original commit message when doing a short git log
 (first line only), so the recent history is much less meaningful than before.

 Compare recent history A:

 * 919bd7f Prashant Sharma 86 minutes ago  (origin/master, origin/HEAD)Merge
 pull request #567 from ScrapCodes/style2.
 * 2182aa3 Martin Jaggi 8 hours ago Merge pull request #566 from
 martinjaggi/copy-MLlib-d.
 * afc8f3c qqsun8819 10 hours ago Merge pull request #551 from
 qqsun8819/json-protocol.
 * 94ccf86 Patrick Wendell 10 hours ago Merge pull request #569 from
 pwendell/merge-fixes.
 * b69f8b2 Patrick Wendell 14 hours ago Merge pull request #557 from
 ScrapCodes/style. Closes #557.
 * b6dba10 CodingCat 24 hours ago Merge pull request #556 from
 CodingCat/JettyUtil. Closes #556.
 | * de22abc jyotiska 24 hours ago  (origin/branch-0.9)Merge pull request
 #562 from jyotiska/master. Closes #562.
 * | 2ef37c9 jyotiska 24 hours ago Merge pull request #562 from
 jyotiska/master. Closes #562.
 | * 2e3d1c3 Patrick Wendell 24 hours ago Merge pull request #560 from
 pwendell/logging. Closes #560.
 * | b6d40b7 Patrick Wendell 24 hours ago Merge pull request #560 from
 pwendell/logging. Closes #560.
 * | f892da8 Patrick Wendell 25 hours ago Merge pull request #565 from
 pwendell/dev-scripts. Closes #565.
 * | c2341c9 Mark Hamstra 32 hours ago Merge pull request #542 from
 markhamstra/versionBump. Closes #542.
 | * 22e0a3b Qiuzhuang Lian 35 hours ago Merge pull request #561 from
 Qiuzhuang/master. Closes #561.
 * | f0ce736 Qiuzhuang Lian 35 hours ago Merge pull request #561 from
 Qiuzhuang/master. Closes #561.
 * | 7805080 Jey Kottalam 35 hours ago Merge pull request #454 from
 jey/atomic-sbt-download. Closes #454.
 * | fabf174 Martin Jaggi 2 days ago Merge pull request #552 from
 martinjaggi/master. Closes #552.
 * | 3a9d82c Andrew Ash 3 days ago Merge pull request #506 from
 ash211/intersection. Closes #506.
 | * ce179f6 Andrew Or 3 days ago Merge pull request #533 from
 andrewor14/master. Closes #533.


 To B:

 If you go back some time in history, you get a much more branched history,
 like this:

 | * | | | | | | | | 0984647 Patrick Wendell 4 weeks ago Enable compression
 by default for spills
 |/ / / / / / / / /
 | * | | | | | | | 4e497db Tathagata Das 4 weeks ago Removed
 StreamingContext.registerInputStream and registerOutputStream - they were
 useless as InputDStream has been made to register itself. Also made DS
 * | | | | | | | |   fdaabdc Patrick Wendell 4 weeks ago Merge pull request
 #380 from mateiz/py-bayes
 |\ \ \ \ \ \ \ \ \
 | | | | | * | | | | c2852cf Frank Dai 4 weeks ago Indent two spaces
 * | | | | | | | | |   4a805af Patrick Wendell 4 weeks ago Merge pull request
 #367 from ankurdave/graphx
 |\ \ \ \ \ \ \ \ \ \
 | * | | | | | | | | | 80e73ed Joseph E. Gonzalez 4 weeks ago Adding minimal
 additional functionality to EdgeRDD
 * | | | | | | | | | |   945fe7a Patrick Wendell 4 weeks ago Merge pull
 request #408 from pwendell/external-serializers
 |\ \ \ \ \ \ \ \ \ \ \
 | | * | | | | | | | | | 4bafc4f Joseph E. Gonzalez 4 weeks ago adding
 documentation about EdgeRDD
 * | | | | | | | | | | |   68641bc Patrick Wendell 4 weeks ago Merge pull
 request #413 from rxin/scaladoc
 |\ \ \ \ \ \ \ \ \ \ \ \
 | | | | | | | | * | | | | 12386b3 Frank Dai 4 weeks ago Since getLong() and
 getInt() have side effect, get back parentheses, and remove an empty line
 | | | | | | | | * | | | | 0d94d74 Frank Dai 4 weeks ago Code clean up for
 mllib
 * | | | | | | | | | | | |   0ca0d4d Patrick Wendell 4 weeks ago Merge pull
 request #401 from andrewor14/master
 |\ \ \ \ \ \ \ \ \ \ \ \ \
 | | | | * | | | | | | | | | af645be Ankur Dave 4 weeks ago Fix all code
 examples in guide
 | | | | * | | | | | | | | | 2cd9358 Ankur Dave 4 weeks ago Finish
 6f6f8c928ce493357d4d32e46971c5e401682ea8
 * | | | | | | | | | | | | |   08b9fec Patrick Wendell 4 weeks ago Merge pull
 request #409 from tdas/unpersist

 Ignoring the merge commits here, the commit messages are much better here
 than in the current setup because they're what the original author wrote.
 Not a pretty generic

Re: [VOTE] Graduation of Apache Spark from the Incubator

2014-02-10 Thread Patrick Wendell
+1

To clarify to others, this is an IPMC vote so only the IPMC votes are binding :)

On Mon, Feb 10, 2014 at 10:02 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
 +1


 On Mon, Feb 10, 2014 at 9:57 PM, Mark Hamstra m...@clearstorydata.comwrote:

 +1


 On Mon, Feb 10, 2014 at 8:27 PM, Chris Mattmann mattm...@apache.org
 wrote:

  Hi Everyone,
 
  This is a new VOTE to decide if Apache Spark should graduate
  from the Incubator. Please VOTE on the resolution pasted below
  the ballot. I'll leave this VOTE open for at least 72 hours.
 
  Thanks!
 
  [ ] +1 Graduate Apache Spark from the Incubator.
  [ ] +0 Don't care.
  [ ] -1 Don't graduate Apache Spark from the Incubator because..
 
  Here is my +1 binding for graduation.
 
  Cheers,
  Chris
 
   snip
 
  WHEREAS, the Board of Directors deems it to be in the best
  interests of the Foundation and consistent with the
  Foundation's purpose to establish a Project Management
  Committee charged with the creation and maintenance of
  open-source software, for distribution at no charge to the
  public, related to fast and flexible large-scale data analysis
  on clusters.
 
  NOW, THEREFORE, BE IT RESOLVED, that a Project Management
  Committee (PMC), to be known as the Apache Spark Project, be
  and hereby is established pursuant to Bylaws of the Foundation;
  and be it further
 
  RESOLVED, that the Apache Spark Project be and hereby is
  responsible for the creation and maintenance of software
  related to fast and flexible large-scale data analysis
  on clusters; and be it further RESOLVED, that the office
  of Vice President, Apache Spark be and hereby is created,
  the person holding such office to serve at the direction of
  the Board of Directors as the chair of the Apache Spark
  Project, and to have primary responsibility for management
  of the projects within the scope of responsibility
  of the Apache Spark Project; and be it further
  RESOLVED, that the persons listed immediately below be and
  hereby are appointed to serve as the initial members of the
  Apache Spark Project:
 
  * Mosharaf Chowdhury mosha...@apache.org
  * Jason Dai jason...@apache.org
  * Tathagata Das t...@apache.org
  * Ankur Dave ankurd...@apache.org
  * Aaron Davidson a...@apache.org
  * Thomas Dudziak to...@apache.org
  * Robert Evans bo...@apache.org
  * Thomas Graves tgra...@apache.org
  * Andy Konwinski and...@apache.org
  * Stephen Haberman steph...@apache.org
  * Mark Hamstra markhams...@apache.org
  * Shane Huang shane_hu...@apache.org
  * Ryan LeCompte ryanlecom...@apache.org
  * Haoyuan Li haoy...@apache.org
  * Sean McNamara mcnam...@apache.org
  * Mridul Muralidharam mridul...@apache.org
  * Kay Ousterhout kayousterh...@apache.org
  * Nick Pentreath mln...@apache.org
  * Imran Rashid iras...@apache.org
  * Charles Reiss wog...@apache.org
  * Josh Rosen joshro...@apache.org
  * Prashant Sharma prash...@apache.org
  * Ram Sriharsha har...@apache.org
  * Shivaram Venkataraman shiva...@apache.org
  * Patrick Wendell pwend...@apache.org
  * Andrew Xia xiajunl...@apache.org
  * Reynold Xin r...@apache.org
  * Matei Zaharia ma...@apache.org
 
  NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matei Zaharia be
  appointed to the office of Vice President, Apache Spark, to
  serve in accordance with and subject to the direction of the
  Board of Directors and the Bylaws of the Foundation until
  death, resignation, retirement, removal or disqualification, or
  until a successor is appointed; and be it further
 
  RESOLVED, that the Apache Spark Project be and hereby is
  tasked with the migration and rationalization of the Apache
  Incubator Spark podling; and be it further
 
  RESOLVED, that all responsibilities pertaining to the Apache
  Incubator Spark podling encumbered upon the Apache Incubator
  Project are hereafter discharged.
 
  
 
 
 
 



Re: [TODO] Document the release process for Apache Spark

2014-02-09 Thread Patrick Wendell
Done, thanks. Feel free to edit it directly as well :)

On Sat, Feb 8, 2014 at 11:28 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Cool! Thanks Patrick.

 Looks good to me. Just one small recommendation about "Get Access to
 Apache Nexus for Publishing Artifacts": as I remember you need to file an
 INFRA ticket for your Apache id [1] to get it?

 If it is then probably good idea to add it to the wiki.

 - Henry


 [1] https://issues.apache.org/jira


 On Sat, Feb 8, 2014 at 9:42 PM, Patrick Wendell pwend...@gmail.com wrote:
 I ported the release docs to the wiki today. Thanks for reminding me
 about this Henry:

 https://cwiki.apache.org/confluence/display/SPARK/Preparing+Spark+Releases

 - Patrick

 On Fri, Feb 7, 2014 at 11:51 AM, Henry Saputra henry.sapu...@gmail.com 
 wrote:
 Cool, Thanks Patrick! Really appreciate it =)

 - Henry

 On Fri, Feb 7, 2014 at 11:46 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Henry,

 Let me document this on the wiki. I already keep pretty thorough
 docs on this; I just need to migrate them to the wiki. I've created a
 JIRA here:

 https://spark-project.atlassian.net/browse/SPARK-1066

 - Patrick

 On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com 
 wrote:
 Hi Patrick,

 As part of the unofficial checklist for graduation, we need to have
 documented steps to make a release.

 As the first and so far the only RE for Apache Spark, I would like to
 ask for your help to document the steps to release. This will help
 other members to do the release and take turns, to make sure all future
 PMCs and committers know how to do an Apache Spark release.

 Most of the steps are probably similar to other projects but it is
 always useful for each podling to have its own documentation to
 release artifacts.

 Really appreciate your help.


 Thanks,

 - Henry


Re: How to write test cases for the functionalities which involves actor communication

2014-02-09 Thread Patrick Wendell
It's possible to mock out actors... we have a few examples in the code
base. One is here:

https://github.com/apache/incubator-spark/blob/master/core/src/test/scala/org/apache/spark/deploy/worker/WorkerWatcherSuite.scala
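
As a generic illustration of the approach (using toy classes, not Spark's
actual Master/Worker), Akka's TestKit lets a test play one side of the
conversation with a TestProbe and assert on the messages exchanged; a minimal
sketch:

    import akka.actor.{Actor, ActorSystem, Props}
    import akka.testkit.TestProbe

    // Toy messages and a toy "master", for illustration only.
    case class WorkerCount(n: Int)
    case class Result(value: Int)

    class ToyMaster extends Actor {
      var workers = 0
      def receive = {
        case WorkerCount(n) => workers = n
        case Result(v)      => sender ! s"recorded $v from $workers workers"
      }
    }

    object ToyMasterCheck extends App {
      implicit val system = ActorSystem("test")
      val master = system.actorOf(Props[ToyMaster], "master")
      val probe  = TestProbe()             // stands in for a worker

      probe.send(master, WorkerCount(2))   // simulate cluster state
      probe.send(master, Result(42))       // simulate the worker's reply
      assert(probe.expectMsgType[String].contains("42"))

      system.shutdown()
    }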

On Sun, Feb 9, 2014 at 6:21 AM, Nan Zhu zhunanmcg...@gmail.com wrote:
 Hi, all

 I have a question when trying to write some test cases for the PR

 The key functionality in my PR involves actor communication between master 
 and worker, like the worker does something and returns the result to the 
 master via a message. I want to test if the master can do the right thing
 according to the number of workers existing in the cluster and the returned
 result from the worker.

 Is there any way to test this via some test cases?

 Thank you

 Best,

 --
 Nan Zhu



[SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Patrick Wendell
Hey All,

Thanks for everyone who participated in this thread. I've distilled
feedback based on the discussion and wanted to summarize the
conclusions:

- People seem universally +1 on semantic versioning in general.

- People seem universally +1 on having a public merge window for releases.

- People seem universally +1 on a policy of having associated JIRAs
with features.

- Everyone believes link-level compatibility should be the goal. Some
people think we should outright promise it now. Others think we should
either not promise it or promise it later.
-- Compromise: let's do one minor release, 1.0 -> 1.1, to convince
ourselves this is possible (some issues with Scala traits will make
this tricky). Then we can codify it in writing. I've created
SPARK-1069 [1] to clearly establish that this is the goal for 1.X
family of releases.

- Some people think we should add particular features before having 1.0.
-- Version 1.X indicates API stability rather than a feature set;
this was clarified.
-- That said, people still have several months to work on features if
they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of
the release guidelines to the wiki.

With all this said, I would like to move the master version to
1.0.0-SNAPSHOT as the main concerns with this have been addressed and
clarified. This merely represents a tentative consensus and the
release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
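
As a side note on why the Scala traits mentioned above make link-level
compatibility tricky, here is a hypothetical illustration (not an actual Spark
API): adding a method to a trait, even one with a default body, changes the
compiled interface that user classes were linked against, so code built against
the old trait can fail at runtime until it is recompiled.

    // Version 1.0 of a hypothetical public trait:
    trait Listener {
      def onStart(): Unit
    }

    // A user class compiled against 1.0:
    class MyListener extends Listener {
      def onStart(): Unit = println("started")
    }

    // Version 1.1 adds a defaulted method:
    //   trait Listener {
    //     def onStart(): Unit
    //     def onStop(): Unit = ()   // new method, default body
    //   }
    // Calling onStop() on the old, unrecompiled MyListener against the 1.1 jar
    // throws AbstractMethodError, because trait method bodies are mixed into
    // subclasses at the subclass's compile time (in the Scala versions in use here).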


Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Patrick Wendell
:P - I'm pretty sure this can be done but it will require some work -
we already use the github API in our merge script and we could hook
something like that up with the jenkins tests. Henry maybe you could
create a JIRA for this for Spark 1.0?

- Patrick
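
The title check itself is simple; here is a sketch of the kind of validation
such a hook could run against a pull request title (the pattern and messages
are illustrative):

    val JiraPattern = """SPARK-\d+""".r

    def validateTitle(title: String): Either[String, String] =
      JiraPattern.findFirstIn(title) match {
        case Some(id) => Right("found JIRA reference " + id)
        case None     => Left("warning: pull request title has no SPARK-XXXX reference")
      }

    // validateTitle("SPARK-1069 Establish binary compatibility goal")
    //   => Right("found JIRA reference SPARK-1069")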

On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra m...@clearstorydata.com wrote:
 I know that it can be done -- which is different from saying that I know how 
 to set it up.


 On Feb 8, 2014, at 2:57 PM, Henry Saputra henry.sapu...@gmail.com wrote:

 Patrick, do you know if there is a way to check if a Github PR's
 subject/title contains a JIRA number and will raise a warning via
 Jenkins?

 - Henry

 On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey All,

 Thanks for everyone who participated in this thread. I've distilled
 feedback based on the discussion and wanted to summarize the
 conclusions:

 - People seem universally +1 on semantic versioning in general.

 - People seem universally +1 on having a public merge window for releases.

 - People seem universally +1 on a policy of having associated JIRAs
 with features.

 - Everyone believes link-level compatibility should be the goal. Some
 people think we should outright promise it now. Others think we should
 either not promise it or promise it later.
 -- Compromise: let's do one minor release, 1.0 -> 1.1, to convince
 ourselves this is possible (some issues with Scala traits will make
 this tricky). Then we can codify it in writing. I've created
 SPARK-1069 [1] to clearly establish that this is the goal for 1.X
 family of releases.

 - Some people think we should add particular features before having 1.0.
 -- Version 1.X indicates API stability rather than a feature set;
 this was clarified.
 -- That said, people still have several months to work on features if
 they really want to get them in for this release.

 I'm going to integrate this feedback and post a tentative version of
 the release guidelines to the wiki.

 With all this said, I would like to move the master version to
 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
 clarified. This merely represents a tentative consensus and the
 release is still subject to a formal vote amongst PMC members.

 [1] https://spark-project.atlassian.net/browse/SPARK-1069

 - Patrick


Re: [TODO] Document the release process for Apache Spark

2014-02-08 Thread Patrick Wendell
I ported the release docs to the wiki today. Thanks for reminding me
about this Henry:

https://cwiki.apache.org/confluence/display/SPARK/Preparing+Spark+Releases

- Patrick

On Fri, Feb 7, 2014 at 11:51 AM, Henry Saputra henry.sapu...@gmail.com wrote:
 Cool, Thanks Patrick! Really appreciate it =)

 - Henry

 On Fri, Feb 7, 2014 at 11:46 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Henry,

 Let me document this on the wiki. I already keep pretty thorough
 docs on this; I just need to migrate them to the wiki. I've created a
 JIRA here:

 https://spark-project.atlassian.net/browse/SPARK-1066

 - Patrick

 On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com 
 wrote:
 Hi Patrick,

 As part of the unofficial checklist for graduation, we need to have
 documented steps to make a release.

 As the first and so far the only RE for Apache Spark, I would like to
 ask for your help to document the steps to release. This will help
 other members to do the release and take turns, to make sure all future
 PMCs and committers know how to do an Apache Spark release.

 Most of the steps are probably similar to other projects but it is
 always useful for each podling to have its own documentation to
 release artifacts.

 Really appreciate your help.


 Thanks,

 - Henry


Re: 0.9.0 forces log4j usage

2014-02-07 Thread Patrick Wendell
Hey Paul,

Thanks for digging this up. I worked on this feature and the intent
was to give users good default behavior if they didn't include any
logging configuration on the classpath.

The problem with assuming that command-line tooling is going to fix this is
that many people link against Spark as a library and run their
application using their own scripts. In this case the first thing
people see when they run an application that links against Spark is a
big ugly logging warning.

I'm not super familiar with log4j-over-slf4j, but this behavior of
returning null for the appenders seems a little weird. What is the use
case for using this and not just directly use slf4j-log4j12 like Spark
itself does?

Did you have a more general fix for this in mind? Or was your plan to
just revert the existing behavior... We might be able to add a
configuration option to disable this logging default stuff. Or we
could just rip it out - but I'd like to avoid that if possible.

- Patrick

On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown p...@mult.ifario.us wrote:
 We have a few applications that embed Spark, and in 0.8.0 and 0.8.1, we
 were able to use slf4j, but 0.9.0 broke that and unintentionally forces
 direct use of log4j as the logging backend.

 The issue is here in the org.apache.spark.Logging trait:

 https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107

 log4j-over-slf4j *always* returns an empty enumeration for appenders to the
 ROOT logger:

 https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81

 And this causes an infinite loop and an eventual stack overflow.
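
To make the failure mode concrete, a simplified sketch of the shape of the
problem (this is not the actual Logging.scala code, and the properties file
path is made up): with log4j-over-slf4j on the classpath the appender check
below never sees any appenders, so the configure-defaults branch is always
taken, and if anything in that path logs before initialization completes, the
Logging trait re-enters itself until the stack overflows.

    import org.apache.log4j.{LogManager, PropertyConfigurator}

    def initializeIfNeeded(): Unit = {
      // With log4j-over-slf4j, the root Category always reports zero appenders,
      // so this condition is true no matter what has actually been configured.
      val noAppenders = !LogManager.getRootLogger.getAllAppenders.hasMoreElements
      if (noAppenders) {
        PropertyConfigurator.configure(getClass.getResource("/log4j-default.properties"))
        // Any logging from here (e.g. a "using default log4j profile" message)
        // goes back through the Logging trait before initialization finishes.
      }
    }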

 I'm happy to submit a Jira and a patch, but it would be a significant enough
 reversal of recent changes that it's probably worth discussing before I
 sink a half hour into it.  My suggestion would be that initialization (or
 not) should be left to the user with reasonable default behavior supplied
 by the Spark command-line tooling and not forced on applications that
 incorporate Spark.

 Thoughts/opinions?

 -- Paul
 --
 p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


Re: 0.9.0 forces log4j usage

2014-02-07 Thread Patrick Wendell
Koert - my suggestion was this. We let users use any slf4j backend
they want. If we detect that they are using the log4j backend and
*also* they didn't configure any log4j appenders, we set up some nice
defaults for them. If they are using another backend, Spark doesn't
try to modify the configuration at all.
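
A sketch of what that detection could look like, assuming the slf4j-log4j12
binding installs org.slf4j.impl.Log4jLoggerFactory as its logger factory (that
class name is an assumption about the binding, not a statement of what Spark
ships):

    import org.slf4j.LoggerFactory

    object LoggingInit {
      // Factory class that the slf4j-log4j12 binding is expected to install.
      private val Log4j12Factory = "org.slf4j.impl.Log4jLoggerFactory"

      def slf4jUsesLog4j: Boolean =
        LoggerFactory.getILoggerFactory.getClass.getName == Log4j12Factory

      def main(args: Array[String]): Unit = {
        if (slf4jUsesLog4j)
          println("log4j backend detected: safe to install a default log4j config")
        else
          println("another slf4j backend is bound: leave logging configuration alone")
      }
    }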

On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers ko...@tresata.com wrote:
 Well, static binding is probably the wrong terminology, but you get the
 idea: multiple backends are not allowed and cause an even uglier warning...

 see also here:
 https://github.com/twitter/scalding/pull/636
 and here:
 https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
 all me being annoying and complaining about slf4j-log4j12 dependencies
 (which did get removed).


 On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers ko...@tresata.com wrote:

 the issue is that slf4j uses static binding. you can put only one slf4j
 backend on the classpath, and that's what it uses. more than one is not
 allowed.

 So you either keep the slf4j-log4j12 dependency for Spark, and then you
 take away people's choice of slf4j backend, which is considered bad form for
 a library; or you do not include it, and then people will always get the big
 fat ugly warning and slf4j logging will not flow to log4j.

 Including log4j itself is not necessarily a problem, I think?


 On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell pwend...@gmail.comwrote:

 This also seems relevant - but not my area of expertise (whether this
 is a valid way to check this).


 http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to

 On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  Hey Guys,
 
  Thanks for explaining. Ya this is a problem - we didn't really know
  that people are using other slf4j backends; slf4j is in there for
  historical reasons but I think we may assume in a few places that
  log4j is being used and we should minimize those.
 
  We should patch this and get a fix into 0.9.1. So some solutions I see
 are:
 
  (a) Add SparkConf option to disable this. I'm fine with this one.
 
  (b) Ask slf4j which backend is active and only try to enforce this
  default if we know slf4j is using log4j. Do either of you know if this
  is possible? Not sure if slf4j exposes this.
 
  (c) Just remove this default stuff. We'd rather not do this. The goal
  of this thing is to provide good usability for people who have linked
  against Spark and haven't done anything to configure logging. For
  beginners we try to minimize the assumptions about what else they know
  about, and I've found log4j configuration is a huge mental barrier for
  people who are getting started.
 
  Paul if you submit a patch doing (a) we can merge it in. If you have
  any idea if (b) is possible I prefer that one, but it may not be
  possible or might be brittle.
 
  - Patrick
 
  On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers ko...@tresata.com
 wrote:
  Totally agree with Paul: a library should not pick the slf4j backend.
 It
  defeats the purpose of slf4j. That big ugly warning is there to alert
  people that it's their responsibility to pick the backend...
  On Feb 7, 2014 3:55 AM, Paul Brown p...@mult.ifario.us wrote:
 
  Hi, Patrick --
 
  From slf4j, you can either backend it into log4j (which is the way
 that
  Spark is shipped) or you can route log4j through slf4j and then on to
 a
  different backend (e.g., logback).  We're doing the latter and
 manipulating
  the dependencies in the build because that's the way the enclosing
  application is set up.
 
  The issue with the current situation is that there's no way for an
 end user
  to choose to *not* use the log4j backend.  (My short-term solution
 was to
  use the Maven shade plugin to swap in a version of the Logging trait
 with
  the body of that method commented out.)  In addition to the situation
 with
  log4j-over-slf4j and the empty enumeration of ROOT appenders, you
 might
  also run afoul of someone who intentionally configured log4j with an
 empty
  set of appenders at the time that Spark is initializing.
 
  I'd be happy with any implementation that lets me choose my logging
  backend: override default behavior via system property, plug-in
  architecture, etc.  I do think it's reasonable to expect someone
 digesting
  a substantial JDK-based system like Spark to understand how to
 initialize
  logging -- surely they're using logging of some kind elsewhere in
 their
  application -- but if you want the default behavior there as a
 courtesy, it
  might be worth putting an INFO (versus a the glaring log4j WARN)
 message on
  the output that says something like Initialized default logging via
 Log4J;
  pass -Dspark.logging.loadDefaultLogger=false to disable this
 behavior. so
  that it's both convenient and explicit.
 
  Cheers.
  -- Paul
 
 
 
 
 
 
  --
  p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
 
 
  On Fri, Feb 7, 2014 at 12:05 AM

Re: [TODO] Document the release process for Apache Spark

2014-02-07 Thread Patrick Wendell
Hey Henry,

Let me document this on the wiki. I already keep pretty thorough
docs on this; I just need to migrate them to the wiki. I've created a
JIRA here:

https://spark-project.atlassian.net/browse/SPARK-1066

- Patrick

On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com wrote:
 Hi Patrick,

 As part of the unofficial checklist for graduation, we need to have
 documented steps to make a release.

 As the first and so far the only RE for Apache Spark, I would like to
 ask for your help to document the steps to release. This will help
 other members do the release and take turns, making sure all future
 PMC members and committers know how to do an Apache Spark release.

 Most of the steps are probably similar to other projects but it is
 always useful for each podling to have its own documentation to
 release artifacts.

 Really appreciate your help.


 Thanks,

 - Henry


Re: 0.9.0 forces log4j usage

2014-02-07 Thread Patrick Wendell
Ah okay, sounds good. This is what I meant earlier by "You have
some other application that directly calls log4j", i.e. you have
for historical reasons installed log4j-over-slf4j.

Would you mind trying out this fix and seeing if it works? This is
designed to be a hotfix for 0.9, not a general solution where we rip
out log4j from our published dependencies:

https://github.com/apache/incubator-spark/pull/560/files

- Patrick

On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown p...@mult.ifario.us wrote:
 Hi, Patrick --

 I forget which other component is responsible, but we're using the
 log4j-over-slf4j as part of an overall requirement to centralize logging,
 i.e., *someone* else is logging over log4j and we're pulling that in.
  (There's also some jul logging from Jersey, etc.)

 Goals:

 - Fully control/capture all possible logging.  (God forbid we have to grab
 System.out/err, but we'd do it if needed.)
 - Use the backend we like best at the moment.  (Happens to be logback.)

 Possible cases:

 - If Spark used Log4j at all, we would pull in that logging via
 log4j-over-slf4j.
 - If Spark used only slf4j and referenced no backend, we would use it as-is
 although we'd still have the log4j-over-slf4j because of other libraries.
 - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
 would exclude that one dependency (via our POM).
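
  (For the third case, a sketch of what the exclusion looks like in an sbt
  build -- a Maven POM exclusion is equivalent; the version string is just an
  example:)

    // build.sbt fragment: depend on Spark but drop its log4j binding so the
    // application can supply logback (or any other slf4j backend) itself.
    libraryDependencies += ("org.apache.spark" %% "spark-core" % "0.9.0-incubating")
      .exclude("org.slf4j", "slf4j-log4j12")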

 Best.
 -- Paul


 --
 p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


 On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Paul,

 So if your goal is ultimately to output to logback. Then why don't you
 just use slf4j and logback-classic.jar as described here [1]. Why
 involve log4j-over-slf4j at all?

 Let's say we refactored the spark build so it didn't advertise
 slf4j-log4j12 as a dependency. Would you still be using
 log4j-over-slf4j... or is this just a fix to deal with the fact that
 Spark is somewhat log4j-dependent at this point?

 [1] http://www.slf4j.org/manual.html

 - Patrick

 On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown p...@mult.ifario.us wrote:
  Hi, Patrick --
 
  That's close but not quite it.
 
  The issue that occurs is not the delegation loop mentioned in slf4j
  documentation.  The stack overflow is entirely within the code in the
 Spark
  trait:
 
  at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
  at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
  at org.apache.spark.Logging$class.log(Logging.scala:36)
  at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
 
 
  And then that repeats.
 
  As for our situation, we exclude the slf4j-log4j12 dependency when we
  import the Spark library (because we don't want to use log4j) and have
  log4j-over-slf4j already in place to ensure that all of the logging in
 the
  overall application runs through slf4j and then out through logback.  (We
 also, as another poster already mentioned, force jcl and jul through
  slf4j.)
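
  (As an aside, the jul half of that routing is a one-time install of the
  jul-to-slf4j bridge, assuming that bridge is on the classpath; jcl is
  normally handled just by swapping in jcl-over-slf4j:)

    import org.slf4j.bridge.SLF4JBridgeHandler

    // Route java.util.logging output (e.g. from Jersey) through slf4j too.
    SLF4JBridgeHandler.removeHandlersForRootLogger()
    SLF4JBridgeHandler.install()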
 
  The zen of slf4j for libraries is that the library uses the slf4j API and
  then the enclosing application can route logging as it sees fit.  Spark
  master CLI would log via slf4j and include the slf4j-log4j12 backend;
 same
  for Spark worker CLI.  Spark as a library (versus as a container) would
 not
  include any backend to the slf4j API and leave this up to the
 application.
   (FWIW, this would also avoid your log4j warning message.)
 
  But as I was saying before, I'd be happy with a situation where I can
 avoid
  log4j being enabled or configured, and I think you'll find an existing
  choice of logging framework to be a common scenario for those embedding
  Spark in other systems.
 
  Best.
  -- Paul
 
  --
  p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
 
 
  On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Paul,
 
  Looking back at your problem. I think it's the one here:
  http://www.slf4j.org/codes.html#log4jDelegationLoop
 
  So let me just be clear what you are doing so I understand. You have
  some other application that directly calls log4j. So you have to
  include log4j-over-slf4j to route those logs through slf4j to logback.
 
  At the same time you embed Spark in this application. In the past it
  was fine, but now that Spark programmatically initializes log4j, it
  screws up your application because log4j-over-slf4j doesn't work with
  applications that do this explicitly, as discussed here:
  http://www.slf4j.org/legacy.html
 
  Correct?
 
  - Patrick
 
  On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers ko...@tresata.com
 wrote:
   got it. that sounds reasonable
  
  
   On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  
   Koert - my suggestion was this. We let users use any slf4j backend
   they want. If we detect that they are using the log4j backend and
   *also* they didn't configure any log4j appenders, we set up some nice
   defaults for them. If they are using another backend

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Patrick Wendell
 I like Heiko's proposal that requires every pull request to reference a
 JIRA.  This is how things are done in Hadoop and it makes it much easier
 to, for example, find out whether an issue you came across when googling
 for an error is in a release.

I think this is a good idea and something on which there is wide
consensus. I separately was going to suggest this in a later e-mail
(it's not directly tied to versioning). One of many reasons this is
necessary is because it's becoming hard to track which features ended
up in which releases.

 I agree with Mridul about binary compatibility.  It can be a dealbreaker
 for organizations that are considering an upgrade. The two ways I'm aware
 of that cause binary compatibility are scala version upgrades and messing
 around with inheritance.  Are these not avoidable at least for minor
 releases?

This is clearly a goal but I'm hesitant to codify it until we
understand all of the reasons why it might not work. I've heard in
general with Scala there are many non-obvious things that can break
binary compatibility and we need to understand what they are. I'd
propose we add the migration tool [1] here to our build and use it for
a few months and see what happens (hat tip to Michael Armbrust).
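
For concreteness, wiring the tool into an sbt build is roughly the following;
the setting and task names here are from a current sbt-mima-plugin release, so
treat the exact keys and version numbers as assumptions:

  // project/plugins.sbt
  addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

  // build.sbt: compare the current code against the last released artifact,
  // then run `sbt mimaReportBinaryIssues` to list binary incompatibilities.
  mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "0.9.0-incubating")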

It's easy to formalize this as a requirement later; it's impossible to
go the other direction. For Scala major versions it's possible we can
cross-build between 2.10 and 2.11 to retain link-level compatibility.
It's just entirely uncharted territory and AFAIK no one who's
suggesting this is speaking from experience maintaining this guarantee
for a Scala project.

That would be the strongest convincing reason for me - if someone has
actually done this in the past in a Scala project and speaks from
experience. Most of us are speaking from the perspective of Java
projects where we understand well the trade-offs and costs of
maintaining this guarantee.

[1] https://github.com/typesafehub/migration-manager

- Patrick


Re: Proposal for Spark Release Strategy

2014-02-06 Thread Patrick Wendell
 and the
   vision and make adjustment accordingly.
  
    Releasing a 1.0.0 is a huge milestone, and if we do need to break the API
    somehow or modify internal behavior dramatically, we could take
    advantage of the 1.0.0 release as a good step for that.
  
  
   - Henry
  
  
  
   On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash and...@andrewash.com
 wrote:
   Agree on timeboxed releases as well.
  
   Is there a vision for where we want to be as a project before
 declaring
  the
   first 1.0 release?  While we're in the 0.x days per semver we can
 break
   backcompat at will (though we try to avoid it where possible), and
 that
    luxury goes away with 1.x.  I just don't want to release a 1.0 simply
   because it seems to follow after 0.9 rather than making an intentional
   decision that we're at the point where we can stand by the current
 APIs
  and
   binary compatibility for the next year or so of the major release.
  
   Until that decision is made as a group I'd rather we do an immediate
   version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
  later,
   replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
 1.0
   but not the other way around.
  
   https://github.com/apache/incubator-spark/pull/542
  
   Cheers!
   Andrew
  
  
   On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com
  wrote:
  
   +1 on time boxed releases and compatibility guidelines
  
  
   Am 06.02.2014 um 01:20 schrieb Patrick Wendell pwend...@gmail.com
 :
  
   Hi Everyone,
  
   In an effort to coordinate development amongst the growing list of
   Spark contributors, I've taken some time to write up a proposal to
   formalize various pieces of the development process. The next
 release
   of Spark will likely be Spark 1.0.0, so this message is intended in
   part to coordinate the release plan for 1.0.0 and future releases.
   I'll post this on the wiki after discussing it on this thread as
   tentative project guidelines.
  
   == Spark Release Structure ==
   Starting with Spark 1.0.0, the Spark project will follow the
 semantic
   versioning guidelines (http://semver.org/) with a few deviations.
   These small differences account for Spark's nature as a multi-module
   project.
  
   Each Spark release will be versioned:
   [MAJOR].[MINOR].[MAINTENANCE]
  
   All releases with the same major version number will have API
   compatibility, defined as [1]. Major version numbers will remain
   stable over long periods of time. For instance, 1.X.Y may last 1
 year
   or more.
  
   Minor releases will typically contain new features and improvements.
   The target frequency for minor releases is every 3-4 months. One
   change we'd like to make is to announce fixed release dates and
 merge
   windows for each release, to facilitate coordination. Each minor
   release will have a merge window where new patches can be merged, a
 QA
   window when only fixes can be merged, then a final period where
 voting
   occurs on release candidates. These windows will be announced
   immediately after the previous minor release to give people plenty
 of
   time, and over time, we might make the whole release process more
   regular (similar to Ubuntu). At the bottom of this document is an
   example window for the 1.0.0 release.
  
   Maintenance releases will occur more frequently and depend on
 specific
   patches introduced (e.g. bug fixes) and their urgency. In general
   these releases are designed to patch bugs. However, higher level
   libraries may introduce small features, such as a new algorithm,
   provided they are entirely additive and isolated from existing code
   paths. Spark core may not introduce any features.
  
   When new components are added to Spark, they may initially be marked
   as alpha. Alpha components do not have to abide by the above
   guidelines, however, to the maximum extent possible, they should try
   to. Once they are marked stable they have to follow these
   guidelines. At present, GraphX is the only alpha component of Spark.
  
   [1] API compatibility:
  
   An API is any public class or interface exposed in Spark that is not
   marked as semi-private or experimental. Release A is API compatible
   with release B if code compiled against release A *compiles cleanly*
   against B. This does not guarantee that a compiled application that
 is
   linked against version A will link cleanly against version B without
   re-compiling. Link-level compatibility is something we'll try to
   guarantee that as well, and we might make it a requirement in the
   future, but challenges with things like Scala versions have made
 this
   difficult to guarantee in the past.
  
   == Merging Pull Requests ==
   To merge pull requests, committers are encouraged to use this tool
 [2]
   to collapse the request into one commit rather than manually
   performing git merges. It will also format the commit message nicely
   in a way that can be easily parsed later when writing credits.
   Currently

Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
Hi Everyone,

In an effort to coordinate development amongst the growing list of
Spark contributors, I've taken some time to write up a proposal to
formalize various pieces of the development process. The next release
of Spark will likely be Spark 1.0.0, so this message is intended in
part to coordinate the release plan for 1.0.0 and future releases.
I'll post this on the wiki after discussing it on this thread as
tentative project guidelines.

== Spark Release Structure ==
Starting with Spark 1.0.0, the Spark project will follow the semantic
versioning guidelines (http://semver.org/) with a few deviations.
These small differences account for Spark's nature as a multi-module
project.

Each Spark release will be versioned:
[MAJOR].[MINOR].[MAINTENANCE]

All releases with the same major version number will have API
compatibility, defined as [1]. Major version numbers will remain
stable over long periods of time. For instance, 1.X.Y may last 1 year
or more.

Minor releases will typically contain new features and improvements.
The target frequency for minor releases is every 3-4 months. One
change we'd like to make is to announce fixed release dates and merge
windows for each release, to facilitate coordination. Each minor
release will have a merge window where new patches can be merged, a QA
window when only fixes can be merged, then a final period where voting
occurs on release candidates. These windows will be announced
immediately after the previous minor release to give people plenty of
time, and over time, we might make the whole release process more
regular (similar to Ubuntu). At the bottom of this document is an
example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on specific
patches introduced (e.g. bug fixes) and their urgency. In general
these releases are designed to patch bugs. However, higher level
libraries may introduce small features, such as a new algorithm,
provided they are entirely additive and isolated from existing code
paths. Spark core may not introduce any features.

When new components are added to Spark, they may initially be marked
as alpha. Alpha components do not have to abide by the above
guidelines, however, to the maximum extent possible, they should try
to. Once they are marked stable they have to follow these
guidelines. At present, GraphX is the only alpha component of Spark.

[1] API compatibility:

An API is any public class or interface exposed in Spark that is not
marked as semi-private or experimental. Release A is API compatible
with release B if code compiled against release A *compiles cleanly*
against B. This does not guarantee that a compiled application that is
linked against version A will link cleanly against version B without
re-compiling. Link-level compatibility is something we'll try to
guarantee that as well, and we might make it a requirement in the
future, but challenges with things like Scala versions have made this
difficult to guarantee in the past.

== Merging Pull Requests ==
To merge pull requests, committers are encouraged to use this tool [2]
to collapse the request into one commit rather than manually
performing git merges. It will also format the commit message nicely
in a way that can be easily parsed later when writing credits.
Currently it is maintained in a public utility repository, but we'll
merge it into mainline Spark soon.

[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py

== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1

== Deviations ==
For now, the proposal is to consider these tentative guidelines. We
can vote to formalize these as project rules at a later time after
some experience working with them. Once formalized, any deviation to
these guidelines will be subject to a lazy majority vote.

- Patrick


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
 How are Alpha components and higher level libraries which may add small
 features within a maintenance release going to be marked with that status?
  Somehow/somewhere within the code itself, or just as some kind of external
 reference?

I think we'd mark alpha features as such in the java/scaladoc. This is
what scala does with experimental features. Higher level libraries are
anything that isn't Spark core. Maybe we can formalize this more
somehow.

We might be able to annotate the new features as experimental if they
end up in a patch release. This could make it more clear.


 I would strongly encourage that developers submitting pull requests include
 within the description of that PR whether you intend the contribution to be
 mergeable at the maintenance level, minor level, or major level.  That will
 help those of us doing code reviews and merges decide where the code should
 go and how closely to scrutinize the PR for changes that are not compatible
 with the intended release level.

I'd say the default is the minor level. If contributors know it should
be added in a maintenance release, it's great if they say so. However
I'd say this is also a responsibility of the committers, since
individual contributors may not know. It will probably be a while
before major level patches are being merged :P


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
If people feel that merging the intermediate SNAPSHOT number is
significant, let's just defer merging that until this discussion
concludes.

That said - the decision to settle on 1.0 for the next release is not
just because it happens to come after 0.9. It's a conscientious
decision based on the development of the project to this point. A
major focus of the 0.9 release was tying off loose ends in terms of
backwards compatibility (e.g. spark configuration). There was some
discussion back then of maybe cutting a 1.0 release but the decision
was deferred until after 0.9.

@mridul - please see the original post for discussion about binary compatibility.

On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski andykonwin...@gmail.com wrote:
 +1 for 0.10.0 now with the option to switch to 1.0.0 after further
 discussion.
 On Feb 5, 2014 9:53 PM, Andrew Ash and...@andrewash.com wrote:

 Agree on timeboxed releases as well.

 Is there a vision for where we want to be as a project before declaring the
 first 1.0 release?  While we're in the 0.x days per semver we can break
 backcompat at will (though we try to avoid it where possible), and that
  luxury goes away with 1.x.  I just don't want to release a 1.0 simply
 because it seems to follow after 0.9 rather than making an intentional
 decision that we're at the point where we can stand by the current APIs and
 binary compatibility for the next year or so of the major release.

 Until that decision is made as a group I'd rather we do an immediate
 version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
 replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
 but not the other way around.

 https://github.com/apache/incubator-spark/pull/542

 Cheers!
 Andrew


 On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com
 wrote:

  +1 on time boxed releases and compatibility guidelines
 
 
   Am 06.02.2014 um 01:20 schrieb Patrick Wendell pwend...@gmail.com:
  
   Hi Everyone,
  
   In an effort to coordinate development amongst the growing list of
   Spark contributors, I've taken some time to write up a proposal to
   formalize various pieces of the development process. The next release
   of Spark will likely be Spark 1.0.0, so this message is intended in
   part to coordinate the release plan for 1.0.0 and future releases.
   I'll post this on the wiki after discussing it on this thread as
   tentative project guidelines.
  
   == Spark Release Structure ==
   Starting with Spark 1.0.0, the Spark project will follow the semantic
   versioning guidelines (http://semver.org/) with a few deviations.
   These small differences account for Spark's nature as a multi-module
   project.
  
   Each Spark release will be versioned:
   [MAJOR].[MINOR].[MAINTENANCE]
  
   All releases with the same major version number will have API
   compatibility, defined as [1]. Major version numbers will remain
   stable over long periods of time. For instance, 1.X.Y may last 1 year
   or more.
  
   Minor releases will typically contain new features and improvements.
   The target frequency for minor releases is every 3-4 months. One
   change we'd like to make is to announce fixed release dates and merge
   windows for each release, to facilitate coordination. Each minor
   release will have a merge window where new patches can be merged, a QA
   window when only fixes can be merged, then a final period where voting
   occurs on release candidates. These windows will be announced
   immediately after the previous minor release to give people plenty of
   time, and over time, we might make the whole release process more
   regular (similar to Ubuntu). At the bottom of this document is an
   example window for the 1.0.0 release.
  
   Maintenance releases will occur more frequently and depend on specific
   patches introduced (e.g. bug fixes) and their urgency. In general
   these releases are designed to patch bugs. However, higher level
   libraries may introduce small features, such as a new algorithm,
   provided they are entirely additive and isolated from existing code
   paths. Spark core may not introduce any features.
  
   When new components are added to Spark, they may initially be marked
   as alpha. Alpha components do not have to abide by the above
   guidelines, however, to the maximum extent possible, they should try
   to. Once they are marked stable they have to follow these
   guidelines. At present, GraphX is the only alpha component of Spark.
  
   [1] API compatibility:
  
   An API is any public class or interface exposed in Spark that is not
   marked as semi-private or experimental. Release A is API compatible
   with release B if code compiled against release A *compiles cleanly*
   against B. This does not guarantee that a compiled application that is
   linked against version A will link cleanly against version B without
   re-compiling. Link-level compatibility is something we'll try to
   guarantee that as well, and we

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-02-02 Thread Patrick Wendell
Once the release passes the vote, it takes a day or two to package it and
cut it to Maven. Coming soon!

On Sat, Feb 1, 2014 at 8:08 PM, Kapil Malik kma...@adobe.com wrote:
 Awesome ! Thanks everyone :)

 -Original Message-
 From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
 Sent: 02 February 2014 08:09
 To: dev@spark.incubator.apache.org; j...@cs.berkeley.edu Kottalam
 Subject: Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

 Yup, we’re still working on putting it on the website, but this is the final 
 release. You can download the RC5 artifacts from 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-0-9-0-incubating-rc5-td318.html.

 Matei

 On Feb 1, 2014, at 12:51 PM, Jey Kottalam j...@cs.berkeley.edu wrote:

 Hi Kapil,

 It looks to me like the artifacts in Maven are the official 0.9.0
 release, though the website has not yet been updated. The IPMC
 approved RC5 as of yesterday:

 https://mail-archives.apache.org/mod_mbox/incubator-general/201401.mbo
 x/cabpqxstjm+po7_22bdybqxk90zsy3pnxppft87-9xdff98u...@mail.gmail.com

 -Jey

 On Sat, Feb 1, 2014 at 8:19 AM, Kapil Malik kma...@adobe.com wrote:
 Hi Stevo,
 Thanks for the link. Indeed, different versions are available on the Maven
 repository, which I can clone/sync for development purposes. But I'm more
 confident about the official release version when deploying to a cluster which
 is used by multiple people.
 Hence curious about date for 0.9 official release.

 Thanks and regards,

 Kapil

 -Original Message-
 From: Stevo Slavić [mailto:ssla...@gmail.com]
 Sent: 01 February 2014 21:33
 To: dev@spark.incubator.apache.org
 Subject: Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

 Apache Spark 0.9.0 artifacts are on Maven central repo (see
 http://central.maven.org/maven2/org/apache/spark/spark-core_2.10/0.9.
 0-incubating/)

 Kind regards,
 Stevo Slavic


 On Sat, Feb 1, 2014 at 4:59 PM, Kapil Malik kma...@adobe.com wrote:

 Sent too early ... 1 week* (maybe I refreshed too fast)

 -Original Message-
 From: Kapil Malik
 Sent: 01 February 2014 21:27
 To: dev@spark.incubator.apache.org
 Subject: RE: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

 +1 for Q !
 Have been monitoring this thread from past 3 weeks in anticipation
 :) Any tentative dates for official 0.9 release ?

 Kapil Malik | kma...@adobe.com | 33430 / 8800836581

 -Original Message-
 From: C. Ross Jam [mailto:cross...@crossjam.net]
 Sent: 01 February 2014 21:18
 To: dev@spark.incubator.apache.org
 Subject: Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

 Curious lurker here. Did this vote close successfully? Should I wait
 for an official 0.9 release?

 Cheers!

 On Friday, January 24, 2014, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.






Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-28 Thread Patrick Wendell
I'll add my own +1.

On Tue, Jan 28, 2014 at 12:45 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Stephen,

 Yes this runs afoul of good practice in Maven where a given version
 shouldn't be re-used. As far as I understand though, it is required by
 the way the Apache release process works.

 The artifacts and repository content that get voted on need to exactly
 match the final release. So we can't hold a vote on a version of the
 code where everything says -rcx, then we go back and change the
 source code and do a second push to maven with code that doesn't have
 an -rcx suffix. This would effectively change the code that is being
 released.

 I was thinking as a work around that maybe we could publish a second
 set of staging artifacts that are versioned with -rcX for people to
 test against. I think as long as we make it clear that these are not
 the official artifacts being voted on it might be okay. I'm not
 totally sure if this is allowed though.

 - Patrick

 On Tue, Jan 28, 2014 at 9:01 AM, Stephen Haberman
 stephen.haber...@gmail.com wrote:
 Hi Patrick,

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1006/

 I was going to import this rc5 release into our internal Maven repo to
 try it out, but noticed that the version doesn't have rc5 in it.

 This means that, if there is an rc6, I'll have to re-import over the same
 artifacts, which is generally not a good thing given Maven assumes artifacts
 never change.

 Is this restriction required by the blessing process, or would it be
 possible to sneak rc5 into the pre-final version number?

 For now, I'll just build a local version, at the same commit, but with
 the version as 0.9.0-incubating-rc5.

 Apologies if this was discussed before and I just missed it.

 - Stephen



Re: Moving to Typesafe Config?

2014-01-27 Thread Patrick Wendell
Hey Heiko,

Spark 0.9 introduced a common config class for Spark applications. It
also (initially) supported loading config files in the nested typesafe
format, but this was removed last minute due to a bug. In 1.0 we'll
probably add support for config files, though it may not support
typesafe's tree-style config files because that conflicts with the
naming style of several spark options (we have options where x.y and
x.y.z are both named keys, and the typesafe parser doesn't allow
that).
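
To illustrate the collision with a small, self-contained example (this assumes
the parseProperties behavior discussed elsewhere on this list; the speculation
keys are the ones that came up in that thread):

  import java.util.Properties
  import com.typesafe.config.ConfigFactory

  val props = new Properties()
  props.setProperty("spark.speculation", "true")
  props.setProperty("spark.speculation.multiplier", "0.95")

  // "spark.speculation" has a child path, so it must become an object node in
  // the parsed tree, and the plain "true" value at that same path is dropped.
  val conf = ConfigFactory.parseProperties(props)
  println(conf.getDouble("spark.speculation.multiplier")) // 0.95
  println(conf.hasPath("spark.speculation"))              // true, but as an object, not a boolean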

- Patrick

On Mon, Jan 27, 2014 at 8:59 AM, Heiko Braun ike.br...@googlemail.com wrote:
 Thanks. I found the discussion myself ;)

 /heiko

 Am 27.01.2014 um 17:34 schrieb Mark Hamstra m...@clearstorydata.com:

 And it would be more helpful if I gave you a usable link 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

 Sent from my iPhone

 On Jan 27, 2014, at 8:13 AM, Heiko Braun ike.br...@googlemail.com wrote:

 Thanks Mark.

 On 27 Jan 2014, at 17:05, Mark Hamstra m...@clearstorydata.com wrote:

 Been done and undone, and will probably be redone for 1.0.  See
 https://mail.google.com/mail/ca/u/0/#search/config/143a6c39e3995882


 On Mon, Jan 27, 2014 at 7:58 AM, Heiko Braun 
 ike.br...@googlemail.comwrote:


 Is there any interest in moving to a more structured approach for
 configuring spark components? I.e. moving to the typesafe config [1]. 
 Since
 spark already leverages akka, this seems to be a reasonable choice IMO.

 [1] https://github.com/typesafehub/config

 Regards, Heiko



Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Patrick Wendell
Hey Taka,

If you build a second version you need to clean the existing assembly jar.

The reference runs of the tests are the ones on the U.C.
Berkeley Jenkins. These are passing for Branch 0.9 for both Hadoop 1
and Hadoop 2 versions, so I'm inclined to think it's an issue with
your test env or setup.

https://amplab.cs.berkeley.edu/jenkins/view/Spark/

- Patrick

On Sun, Jan 26, 2014 at 10:52 PM, Reynold Xin r...@databricks.com wrote:
 It is possible that you have generated the assembly jar using one version
 of Hadoop, and then another assembly jar with another version. Those tests
 that failed are all using a local cluster that sets up multiple processes,
 which would require launching Spark worker processes using the assembly
 jar. If that's indeed the problem, removing the extra assembly jars should
 fix them.


 On Sun, Jan 26, 2014 at 10:49 PM, Taka Shinagawa 
 taka.epsi...@gmail.comwrote:

 If I build Spark for Hadoop 1.0.4 (either SPARK_HADOOP_VERSION=1.0.4
 sbt/sbt assembly  or sbt/sbt assembly) or use the binary distribution,
 'sbt/sbt test' runs successfully.

 However, if I build Spark targeting any other Hadoop versions (e.g.
 SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly, SPARK_HADOOP_VERSION=2.2.0
 sbt/sbt assembly), I'm getting the following errors with 'sbt/sbt test':

 1) type mismatch errors with JavaPairDStream.scala
 2) following test failures
 [error] Failed tests:
 [error] org.apache.spark.ShuffleNettySuite
 [error] org.apache.spark.ShuffleSuite
 [error] org.apache.spark.FileServerSuite
 [error] org.apache.spark.DistributedSuite

 I don't have Hadoop 1.0.4 installed on my test systems (but the test
 succeeds, and failed with the installed Hadoop versions). I'm seeing these
 sbt test errors with the previous 0.9.0 RCs and 0.8.1, too.

 I'm wondering if anyone else has seen this problem or I'm missing something
 to run the test correctly.

 Thanks,
 Taka




 On Sat, Jan 25, 2014 at 5:00 PM, Sean McNamara
 sean.mcnam...@webtrends.comwrote:

  +1
 
  On 1/25/14, 4:04 PM, Mark Hamstra m...@clearstorydata.com wrote:
 
  +1
  
  
  On Sat, Jan 25, 2014 at 2:37 PM, Andy Konwinski
  andykonwin...@gmail.comwrote:
  
   +1
  
  
   On Sat, Jan 25, 2014 at 2:27 PM, Reynold Xin r...@databricks.com
  wrote:
  
+1
   
 On Jan 25, 2014, at 12:07 PM, Hossein fal...@gmail.com wrote:

 +1

 Compiled and tested on Mavericks.

 --Hossein


 On Sat, Jan 25, 2014 at 11:38 AM, Patrick Wendell
  pwend...@gmail.com
wrote:

 I'll kick off the voting with a +1.

 On Thu, Jan 23, 2014 at 11:33 PM, Patrick Wendell
  pwend...@gmail.com
   
 wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is
  attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 95d28ff3):

   
  
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=
  95d28ff3d0d20d9c583e184f9e2c5ae842d8a4d9

 The release files, including signatures, digests, etc can be
 found
   at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

   
  
 https://repository.apache.org/content/repositories/orgapachespark-1006/

 The documentation corresponding to this release can be found at:

  http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5-docs/

 Please vote on releasing this package as Apache Spark
   0.9.0-incubating!

  The vote is open until Monday, January 27, at 07:30 UTC and passes if a
  majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/

   
--
You received this message because you are subscribed to the Google
  Groups
Unofficial Apache Spark Dev Mailing List Mirror group.
To unsubscribe from this group and stop receiving emails from it,
  send an
email to apache-spark-dev-mirror+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
   
  
 
 



[RESULT] [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Patrick Wendell
Voting is now closed. This vote passes with 5 binding +1 votes and no
0 or -1 votes. This vote will now go to the IPMC list for a second
72-hour vote. Spark developers are encouraged to comment on the IPMC
vote as well.

The totals are:

+1
Patrick Wendell*
Hossein Falaki
Reynold Xin*
Andy Konwinski*
Mark Hamstra*
Sean McNamara*

0: (none)
-1: (none)

On Sun, Jan 26, 2014 at 10:58 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Taka,

 If you build a second version you need to clean the existing assembly jar.

 The reference runs of the tests are the ones on the U.C.
 Berkeley Jenkins. These are passing for Branch 0.9 for both Hadoop 1
 and Hadoop 2 versions, so I'm inclined to think it's an issue with
 your test env or setup.

 https://amplab.cs.berkeley.edu/jenkins/view/Spark/

 - Patrick

 On Sun, Jan 26, 2014 at 10:52 PM, Reynold Xin r...@databricks.com wrote:
 It is possible that you have generated the assembly jar using one version
 of Hadoop, and then another assembly jar with another version. Those tests
 that failed are all using a local cluster that sets up multiple processes,
 which would require launching Spark worker processes using the assembly
 jar. If that's indeed the problem, removing the extra assembly jars should
 fix them.


 On Sun, Jan 26, 2014 at 10:49 PM, Taka Shinagawa 
 taka.epsi...@gmail.comwrote:

 If I build Spark for Hadoop 1.0.4 (either SPARK_HADOOP_VERSION=1.0.4
 sbt/sbt assembly  or sbt/sbt assembly) or use the binary distribution,
 'sbt/sbt test' runs successfully.

 However, if I build Spark targeting any other Hadoop versions (e.g.
 SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly, SPARK_HADOOP_VERSION=2.2.0
 sbt/sbt assembly), I'm getting the following errors with 'sbt/sbt test':

 1) type mismatch errors with JavaPairDStream.scala
 2) following test failures
 [error] Failed tests:
 [error] org.apache.spark.ShuffleNettySuite
 [error] org.apache.spark.ShuffleSuite
 [error] org.apache.spark.FileServerSuite
 [error] org.apache.spark.DistributedSuite

 I don't have Hadoop 1.0.4 installed on my test systems (but the test
 succeeds, and failed with the installed Hadoop versions). I'm seeing these
 sbt test errors with the previous 0.9.0 RCs and 0.8.1, too.

 I'm wondering if anyone else has seen this problem or I'm missing something
 to run the test correctly.

 Thanks,
 Taka




 On Sat, Jan 25, 2014 at 5:00 PM, Sean McNamara
 sean.mcnam...@webtrends.comwrote:

  +1
 
  On 1/25/14, 4:04 PM, Mark Hamstra m...@clearstorydata.com wrote:
 
  +1
  
  
  On Sat, Jan 25, 2014 at 2:37 PM, Andy Konwinski
  andykonwin...@gmail.comwrote:
  
   +1
  
  
   On Sat, Jan 25, 2014 at 2:27 PM, Reynold Xin r...@databricks.com
  wrote:
  
+1
   
 On Jan 25, 2014, at 12:07 PM, Hossein fal...@gmail.com wrote:

 +1

 Compiled and tested on Mavericks.

 --Hossein


 On Sat, Jan 25, 2014 at 11:38 AM, Patrick Wendell
  pwend...@gmail.com
wrote:

 I'll kick off the voting with a +1.

 On Thu, Jan 23, 2014 at 11:33 PM, Patrick Wendell
  pwend...@gmail.com
   
 wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is
  attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 95d28ff3):

   
  
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=
  95d28ff3d0d20d9c583e184f9e2c5ae842d8a4d9

 The release files, including signatures, digests, etc can be
 found
   at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

   
  
 https://repository.apache.org/content/repositories/orgapachespark-1006/

 The documentation corresponding to this release can be found at:

  http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5-docs/

 Please vote on releasing this package as Apache Spark
   0.9.0-incubating!

  The vote is open until Monday, January 27, at 07:30 UTC and passes if a
  majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/

   
--
You received this message because you are subscribed to the Google
  Groups
Unofficial Apache Spark Dev Mailing List Mirror group.
To unsubscribe from this group and stop receiving emails from it,
  send an
email to apache-spark-dev-mirror+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
   
  
 
 



Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc4) [new thread]

2014-01-22 Thread Patrick Wendell
Hey Tom,

Matei had to remove this because it turns out that there was a fairly
serious bug in the Typesafe config library we use for parsing conf
files [1]. There wasn't an immediate solution to this so he just
removed the capability for this release and we can revisit it in the
next release.

http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

- Patrick

On Wed, Jan 22, 2014 at 8:18 AM, Tom Graves tgraves...@yahoo.com wrote:
 It looks like the latest round of changes took out spark.conf.  Are there 
 plans to add this back in (jira)?

 Tom



 On Wednesday, January 22, 2014 3:46 AM, Henry Saputra 
 henry.sapu...@gmail.com wrote:

 Would love to hear from Mridul to verify the fixes for problems he saw are
 in.


 On Tuesday, January 21, 2014, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 0771df67):

 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=0771df675363c69622404cb514bd751bc90526af

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1005/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Friday, January 24, at 11:15 UTC and passes if
 a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/



Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc4) [new thread]

2014-01-22 Thread Patrick Wendell
Btw - to be clear this was an incompatibility between Spark's config
names and constraints on names imposed by Typesafe Config. So I didn't mean to
imply there was something broken in their config library.

On Wed, Jan 22, 2014 at 9:14 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Tom,

 Matei had to remove this because it turns out that there was a fairly
 serious bug in the Typesafe config library we use for parsing conf
 files [1]. There wasn't an immediate solution to this so he just
 removed the capability for this release and we can revisit it in the
 next release.

 http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

 - Patrick

 On Wed, Jan 22, 2014 at 8:18 AM, Tom Graves tgraves...@yahoo.com wrote:
 It looks like the latest round of changes took out spark.conf.  Are there 
 plans to add this back in (jira)?

 Tom



 On Wednesday, January 22, 2014 3:46 AM, Henry Saputra 
 henry.sapu...@gmail.com wrote:

 Would love to hear from Mridul to verify the fixes for problems he saw are
 in.


 On Tuesday, January 21, 2014, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 0771df67):

 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=0771df675363c69622404cb514bd751bc90526af

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1005/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Friday, January 24, at 11:15 UTC and passes if
 a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/



Re: Config properties broken in master

2014-01-19 Thread Patrick Wendell
Hey Mridul this was patched and we cut a new release candidate. There
were several different config options which had a.b and a.b.c... they
should all work in the new RC.

On Sun, Jan 19, 2014 at 4:56 AM, Mridul Muralidharan mri...@gmail.com wrote:
 Chanced upon spill-related config options which exhibit the same pattern ...

 - Mridul

 On Sun, Jan 19, 2014 at 1:10 AM, Reynold Xin r...@databricks.com wrote:
 I also just went over the config options to see how pervasive this is. In
 addition to speculation, there is one more conflict of this kind:

 spark.locality.wait
 spark.locality.wait.node
 spark.locality.wait.process
 spark.locality.wait.rack


 spark.speculation
 spark.speculation.interval
 spark.speculation.multiplier
 spark.speculation.quantile


 On Sat, Jan 18, 2014 at 11:36 AM, Matei Zaharia 
 matei.zaha...@gmail.comwrote:

 This is definitely an important issue to fix. Instead of renaming
 properties, one solution would be to replace Typesafe Config with just
 reading Java system properties, and disable config files for this release.
 I kind of like that over renaming.

 Matei

 On Jan 18, 2014, at 11:30 AM, Mridul Muralidharan mri...@gmail.com
 wrote:

  Hi,
 
    Speculation was an example; there are others in Spark which are
  affected by this ...
  Some of them have been around for a while, so will break existing
 code/scripts.
 
  Regards,
  Mridul
 
  On Sun, Jan 19, 2014 at 12:51 AM, Nan Zhu zhunanmcg...@gmail.com
 wrote:
  change spark.speculation to spark.speculation.switch?
 
  maybe we can restrict that all properties in Spark should be three
 levels
 
 
  On Sat, Jan 18, 2014 at 2:10 PM, Mridul Muralidharan mri...@gmail.com
 wrote:
 
  Hi,
 
   Unless I am mistaken, the change to using typesafe ConfigFactory has
  broken some of the system properties we use in spark.
 
  For example: if we have both
  -Dspark.speculation=true -Dspark.speculation.multiplier=0.95
  set, then the spark.speculation property is dropped.
 
  The rules of parseProperty actually document this clearly [1]
 
 
  I am not sure what the right fix here would be (other than replacing
  use of config that is).
 
  Any thoughts ?
  I would vote -1 for 0.9 to be released before this is fixed.
 
 
  Regards,
  Mridul
 
 
  [1]
 
 http://typesafehub.github.io/config/latest/api/com/typesafe/config/ConfigFactory.html#parseProperties%28java.util.Properties,%20com.typesafe.config.ConfigParseOptions%29
 




Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)

2014-01-19 Thread Patrick Wendell
This vote is cancelled in favor of rc3 - which fixes the YARN issue
Sandy ran into.

@taka - thanks for reporting that bug. It's not enough to block this
release however. Once a fix exists we can merge it into the 0.9 branch
and it will be in 0.9.1

On Sun, Jan 19, 2014 at 12:37 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:
 I've found a problem with the cartesian method on Pyspark and filed
 as SPARK-1034
 https://spark-project.atlassian.net/browse/SPARK-1034

 0.8.1 doesn't have this problem. On Scala, cartesian method works fine.

 It's also nice if SPARK-978 can be fixed, too.
 https://spark-project.atlassian.net/browse/SPARK-978

 Thanks,
 Taka


 On Sun, Jan 19, 2014 at 1:24 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Has anybody tested against YARN 2.2?  I tried it out against a
 pseudo-distributed cluster and ran into an issue I just filed as
 SPARK-1031https://spark-project.atlassian.net/browse/SPARK-1031
 .

 thanks,
 Sandy


 On Sun, Jan 19, 2014 at 12:55 AM, Reynold Xin r...@databricks.com wrote:

  +1
 
 
  On Sat, Jan 18, 2014 at 11:11 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
    I'll kick off the voting with a +1.
  
   On Sat, Jan 18, 2014 at 11:05 PM, Patrick Wendell pwend...@gmail.com
   wrote:
Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.9.0.
   
A draft of the release notes along with the changes file is attached
to this e-mail.
   
The tag to be voted on is v0.9.0-incubating (commit 00c847a):
   
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=00c847af1d4be2fe5fad887a57857eead1e517dc
   
The release files, including signatures, digests, etc can be found
 at:
http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
   
The staging repository for this release can be found at:
   
  https://repository.apache.org/content/repositories/orgapachespark-1003/
   
The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2-docs/
   
Please vote on releasing this package as Apache Spark
 0.9.0-incubating!
   
The vote is open until Wednesday, January 22, at 07:05 UTC
and passes if a majority of at least 3 +1 PPMC votes are cast.
   
[ ] +1 Release this package as Apache Spark 0.9.0-incubating
[ ] -1 Do not release this package because ...
   
To learn more about Apache Spark, please see
http://spark.incubator.apache.org/
  
 



Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-19 Thread Patrick Wendell
Attempting to attach the release notes again (I think it may have been
blocked previously due to not having an extension).

On Sun, Jan 19, 2014 at 8:05 PM, Patrick Wendell pwend...@gmail.com wrote:
 I'll add my +1 as well

 On Sun, Jan 19, 2014 at 7:33 PM, Matei Zaharia matei.zaha...@gmail.com 
 wrote:
 +1

 Re-tested on Mac.

 Matei

 On Jan 19, 2014, at 7:09 PM, Tathagata Das tathagata.das1...@gmail.com 
 wrote:

 Starting off.
 +1


 On Sun, Jan 19, 2014 at 2:15 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit a7760eff):

 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=a7760eff4ea6a474cab68896a88550f63bae8b0d

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1004/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Wednesday, January 22, at 22:15 UTC and passes
 if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Spark 0.9.0 is a major release that adds significant new features. It updates 
Spark to Scala 2.10, simplifies high availability, and updates numerous 
components of the project. This release includes a first version of GraphX, a 
powerful new framework for graph processing that comes with a library of 
standard algorithms. In addition, Spark Streaming is now out of alpha, and 
includes significant optimizations and simplified high availability deployment.

### Scala 2.10 Support

Spark now runs on Scala 2.10, letting users benefit from the language and 
library improvements in this version.

### Configuration System

The new [SparkConf] class is now the preferred way to configure advanced 
settings on your SparkContext, though the previous Java system properties still 
work. SparkConf is especially useful in tests to make sure properties don’t 
stay set across tests.
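
For example, a minimal sketch of the new style (the master, app name, and
property values here are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Configure the context through SparkConf rather than system properties.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("SparkConfExample")
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)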

### Spark Streaming Improvements

Spark Streaming is no longer alpha, and comes with simplified high availability 
and several optimizations.

* When running on a Spark standalone cluster with the [standalone cluster high 
availability mode], you can submit a Spark Streaming driver application to the 
cluster and have it automatically recovered if either the driver or the cluster 
master crashes.
* Windowed operators have been sped up by 30-50%.
* Spark Streaming’s input source plugins (e.g. for Twitter, Kafka and Flume) 
are now separate projects, making it easier to pull in only the dependencies 
you need.
* A new StreamingListener interface has been added for monitoring statistics 
about the streaming computation.
* A few aspects of the API have been improved:
  * `DStream` and `PairDStream` classes have been moved from 
`org.apache.spark.streaming` to `org.apache.spark.streaming.dstream` to keep it 
consistent with `org.apache.spark.rdd.RDD`.
  * `DStream.foreach` has been renamed to `DStream.foreachRDD` to make it 
explicit that it works on every RDD, not every element.
  * `StreamingContext.awaitTermination()` allows you to wait for context 
shutdown and catch any exception that occurs in the streaming computation.
  * `StreamingContext.stop()` now allows stopping the StreamingContext without 
stopping the underlying SparkContext.
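
A small sketch pulling the renamed and new calls together (the socket source
and one-second batch interval are just examples):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingExample")
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD(rdd => println(rdd.count()))  // formerly DStream.foreach
    ssc.start()
    ssc.awaitTermination()  // wait for shutdown; surfaces streaming exceptions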

### GraphX Alpha

GraphX is a new API for graph processing that uses recent advances in 
graph-parallel computation. It lets you build a graph within a Spark program 
using the standard Spark operators, then process it with new graph operators 
that are optimized for distributed computation. It includes basic 
transformations, a Pregel API for iterative computation, and a standard library 
of graph loaders and analytics algorithms. By offering these features within 
the Spark engine, GraphX can significantly speed up processing tasks compared 
to workflows that use different engines.

GraphX features in this release include:

* Building graphs from arbitrary Spark RDDs
* Basic operations to transform graphs or extract subgraphs
* An optimized Pregel API that takes advantage of graph partitioning and 
indexing
* Standard algorithms including PageRank, connected components, strongly 
connected components, SVD++, and triangle counting
* Interactive use from the Spark shell
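
A minimal sketch of the RDD-based construction described above (the vertex and
edge data are made up):

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.{Edge, Graph}

    // Build a tiny graph from two RDDs and run one of the bundled algorithms.
    def pageRankExample(sc: SparkContext): Unit = {
      val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
      val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
      val graph = Graph(vertices, edges)
      graph.pageRank(0.001).vertices.collect().foreach(println)
    }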

GraphX

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-19 Thread Patrick Wendell
Eventually the notes get posted on the apache website. I attached them
to this e-mail so that people can get a sense of what is in the
release before they vote on it.

On Sun, Jan 19, 2014 at 9:57 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Hi Patrick, quick question, where are you planning to add the release notes?
 I don't think it is part of the source, is it?

 - Henry

 On Sun, Jan 19, 2014 at 8:41 PM, Patrick Wendell pwend...@gmail.com wrote:
 Attempting to attach the release notes again (I think it may have been
 blocked previously due to not having an extension).

 On Sun, Jan 19, 2014 at 8:05 PM, Patrick Wendell pwend...@gmail.com wrote:
 I'll add my +1 as well

 On Sun, Jan 19, 2014 at 7:33 PM, Matei Zaharia matei.zaha...@gmail.com 
 wrote:
 +1

 Re-tested on Mac.

 Matei

 On Jan 19, 2014, at 7:09 PM, Tathagata Das tathagata.das1...@gmail.com 
 wrote:

 Starting off.
 +1


 On Sun, Jan 19, 2014 at 2:15 PM, Patrick Wendell pwend...@gmail.com 
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit a7760eff):

 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=a7760eff4ea6a474cab68896a88550f63bae8b0d

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1004/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Wednesday, January 22, at 22:15 UTC and passes
 if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/




Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)

2014-01-18 Thread Patrick Wendell
Mridul, thanks a *lot* for pointing this out. This is indeed an issue
and something which warrants cutting a new RC.

- Patrick

On Sat, Jan 18, 2014 at 11:14 AM, Mridul Muralidharan mri...@gmail.com wrote:
 I would vote -1 for this release until we resolve config property
 issue [1] : if there is a known resolution for this (which I could not
 find unfortunately, apologies if it exists !), then will change my
 vote.

 Thanks,
 Mridul


 [1] 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

 On Thu, Jan 16, 2014 at 7:18 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 7348893):
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=7348893f0edd96dacce2f00970db1976266f7008

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1001/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Sunday, January 19, at 02:00 UTC
 and passes if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)

2014-01-18 Thread Patrick Wendell
This vote is cancelled in favor of rc2 which I'll post shortly.

On Sat, Jan 18, 2014 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote:
 Mridul, thanks a *lot* for pointing this out. This is indeed an issue
 and something which warrants cutting a new RC.

 - Patrick

 On Sat, Jan 18, 2014 at 11:14 AM, Mridul Muralidharan mri...@gmail.com 
 wrote:
 I would vote -1 for this release until we resolve config property
 issue [1] : if there is a known resolution for this (which I could not
 find unfortunately, apologies if it exists !), then will change my
 vote.

 Thanks,
 Mridul


 [1] 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

 On Thu, Jan 16, 2014 at 7:18 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 7348893):
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=7348893f0edd96dacce2f00970db1976266f7008

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1001/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Sunday, January 19, at 02:00 UTC
 and passes if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)

2014-01-18 Thread Patrick Wendell
I'll kick off the voting with a +1.

On Sat, Jan 18, 2014 at 11:05 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit 00c847a):
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=00c847af1d4be2fe5fad887a57857eead1e517dc

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1003/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Wednesday, January 22, at 07:05 UTC
 and passes if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)

2014-01-16 Thread Patrick Wendell
I also ran your example locally and it worked with 0.8.1 and
0.9.0-rc1. So it's possible somehow you are pulling in an older
version of Spark or an incompatible version of Hadoop.

- Patrick

On Thu, Jan 16, 2014 at 9:39 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Alex,

 Thanks for testing out this rc. Would you mind forking this into a different
 thread so we can discuss there?

 Also, does your application build and run correctly with spark 0.8.1? That
 would determine whether the problem is specifically with this rc...

 Patrick

 ---
 sent from my phone

 On Jan 15, 2014 11:44 PM, Alex Cozzi alexco...@gmail.com wrote:

 Oh, I forgot: I am using the “yarn” maven profile to target yarn 2.2

 Alex Cozzi
 alexco...@gmail.com
 On Jan 15, 2014, at 11:41 PM, Alex Cozzi alexco...@gmail.com wrote:

  Just testing out the rc1. I created a dependent project (using maven) and
  copied the HdfsTest.scala test, but added a single line to save the file
  back to disk:
 
  package org.apache.spark.examples
 
  import org.apache.spark._
 
  object HdfsTest {
    def main(args: Array[String]) {
      val sc = new SparkContext(args(0), "HdfsTest",
        System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
      val file = sc.textFile(args(1))
      val mapped = file.map(s => s.length).cache()
      for (iter <- 1 to 10) {
        val start = System.currentTimeMillis()
        for (x <- mapped) { x + 2 }
        // println("Processing: " + x)
        val end = System.currentTimeMillis()
        println("Iteration " + iter + " took " + (end - start) + " ms")
        mapped.saveAsTextFile("out")
      }
      System.exit(0)
    }
  }
 
  and this is my pom file:
  <project xmlns="http://maven.apache.org/POM/4.0.0"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                               http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my.examples</groupId>
    <artifactId>spark-samples</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <inceptionYear>2014</inceptionYear>

    <properties>
      <maven.compiler.source>1.6</maven.compiler.source>
      <maven.compiler.target>1.6</maven.compiler.target>
      <encoding>UTF-8</encoding>
      <scala.tools.version>2.10</scala.tools.version>
      <scala.version>2.10.0</scala.version>
    </properties>

    <repositories>
      <repository>
        <id>spark staging</id>
        <url>https://repository.apache.org/content/repositories/orgapachespark-1001</url>
      </repository>
    </repositories>

    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>

      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>0.9.0-incubating</version>
      </dependency>

      <!-- Test -->
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.specs2</groupId>
        <artifactId>specs2_${scala.tools.version}</artifactId>
        <version>1.13</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.tools.version}</artifactId>
        <version>2.0.M6-SNAP8</version>
        <scope>test</scope>
      </dependency>
    </dependencies>

    <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <testSourceDirectory>src/test/scala</testSourceDirectory>
      <plugins>
        <plugin>
          <!-- see http://davidb.github.com/scala-maven-plugin -->
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.1.6</version>
          <configuration>
            <scalaCompatVersion>2.10</scalaCompatVersion>
            <jvmArgs>
              <jvmArg>-Xms128m</jvmArg>
              <jvmArg>-Xmx2048m</jvmArg>
            </jvmArgs>
          </configuration>
          <executions>
            <execution>
              <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
              </goals>

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)

2014-01-16 Thread Patrick Wendell
I'll kick this vote off with a +1.

On Thu, Jan 16, 2014 at 10:43 AM, Patrick Wendell pwend...@gmail.com wrote:
 I also ran your example locally and it worked with 0.8.1 and
 0.9.0-rc1. So it's possible somehow you are pulling in an older
 version of Spark or an incompatible version of Hadoop.

 - Patrick

 On Thu, Jan 16, 2014 at 9:39 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Alex,

 Thanks for testing out this rc. Would you mind forking this into a different
 thread so we can discuss there?

 Also, does your application build and run correctly with spark 0.8.1? That
 would determine whether the problem is specifically with this rc...

 Patrick

 ---
 sent from my phone

 On Jan 15, 2014 11:44 PM, Alex Cozzi alexco...@gmail.com wrote:

 Oh, I forgot: I am using the “yarn” maven profile to target yarn 2.2

 Alex Cozzi
 alexco...@gmail.com
 On Jan 15, 2014, at 11:41 PM, Alex Cozzi alexco...@gmail.com wrote:

  Just testing out the rc1. I created a dependent project (using maven) and
  copied the HdfsTest.scala test, but added a single line to save the file
  back to disk:
 
  package org.apache.spark.examples
 
  import org.apache.spark._
 
  object HdfsTest {
    def main(args: Array[String]) {
      val sc = new SparkContext(args(0), "HdfsTest",
        System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
      val file = sc.textFile(args(1))
      val mapped = file.map(s => s.length).cache()
      for (iter <- 1 to 10) {
        val start = System.currentTimeMillis()
        for (x <- mapped) { x + 2 }
        // println("Processing: " + x)
        val end = System.currentTimeMillis()
        println("Iteration " + iter + " took " + (end - start) + " ms")
        mapped.saveAsTextFile("out")
      }
      System.exit(0)
    }
  }
 
  and this is my pom file:
  <project xmlns="http://maven.apache.org/POM/4.0.0"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                               http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my.examples</groupId>
    <artifactId>spark-samples</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <inceptionYear>2014</inceptionYear>

    <properties>
      <maven.compiler.source>1.6</maven.compiler.source>
      <maven.compiler.target>1.6</maven.compiler.target>
      <encoding>UTF-8</encoding>
      <scala.tools.version>2.10</scala.tools.version>
      <scala.version>2.10.0</scala.version>
    </properties>

    <repositories>
      <repository>
        <id>spark staging</id>
        <url>https://repository.apache.org/content/repositories/orgapachespark-1001</url>
      </repository>
    </repositories>

    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>

      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>0.9.0-incubating</version>
      </dependency>

      <!-- Test -->
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.specs2</groupId>
        <artifactId>specs2_${scala.tools.version}</artifactId>
        <version>1.13</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.tools.version}</artifactId>
        <version>2.0.M6-SNAP8</version>
        <scope>test</scope>
      </dependency>
    </dependencies>

    <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <testSourceDirectory>src/test/scala</testSourceDirectory>
      <plugins>
        <plugin>
          <!-- see http://davidb.github.com/scala-maven-plugin -->
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.1.6</version>
          <configuration>
            <scalaCompatVersion>2.10</scalaCompatVersion>
            <jvmArgs>
              <jvmArg>-Xms128m</jvmArg>
              <jvmArg>-Xmx2048m</jvmArg>
            </jvmArgs>
          </configuration>
          <executions>
            <execution>
              <goals>

Re: testing 0.9.0-incubating and maven

2014-01-16 Thread Patrick Wendell
Hey Alex,

Maven profiles only affect the Spark build itself. They do not
transitively affect your own build.

Check out the docs for how to deploy applications on yarn:
http://spark.incubator.apache.org/docs/latest/running-on-yarn.html

When compiling your application, you should just explicitly add the hadoop
version you depend on to your own build (e.g. a hadoop-client
dependency). Take a look at the example here where we show adding
hadoop-client:

http://spark.incubator.apache.org/docs/latest/quick-start.html

When deploying Spark applications on YARN, you actually want to mark
spark as a provided dependency in your application's Maven build and bundle
your application as an assembly jar, then submit it with a Spark YARN
bundle to a YARN cluster. The instructions are the same as they were
in 0.8.1.

For the spark jar you want to submit to YARN, you can download the
precompiled Spark one.

It might make sense to try this pipeline with 0.8.1 and get it working
there. It sounds more like you are dealing with getting the build
set up rather than a particular issue with the 0.9.0 RC.

- Patrick

On Thu, Jan 16, 2014 at 1:13 PM, Alex Cozzi alexco...@gmail.com wrote:
 Hi Patrick,
 thank you for testing. I think I found out what is wrong: I am trying to
 build my own examples, which also depend on another library that in turn
 depends on hadoop 2.2.
 What was happening is that my library brings in hadoop 2.2, while spark
 depends on hadoop 1.0.4, and then I think I get conflicting versions of the
 classes.

 A couple of things are not clear to me:

 1: do the published artifacts support YARN and hadoop 2.2 or will I need to 
 make my own build?
 2: if they do, how do I activate the profiles in my maven config? I tried mvn 
 -Pyarn compile but it does not work (maven says “[WARNING] The requested 
 profile yarn could not be activated because it does not exist.”)


 essentially I would like to specify the spark dependencies as:

 <dependencies>
   <dependency>
     <groupId>org.scala-lang</groupId>
     <artifactId>scala-library</artifactId>
     <version>${scala.version}</version>
   </dependency>

   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_${scala.tools.version}</artifactId>
     <version>0.9.0-incubating</version>
   </dependency>

 and tell maven to use the “yarn” profile for this dependency, but I do not 
 seem to be able to make it work.
 Does anybody have any suggestions?

 Alex
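
Patrick's suggestion above (mark spark as a provided dependency and pin an
explicit hadoop-client to the cluster's Hadoop version) can be sketched as
follows. This is expressed with sbt rather than Maven purely for illustration,
and the versions are simply the ones under discussion in this thread:

    // build.sbt -- illustrative sketch only, not an official recipe
    name := "spark-samples"

    scalaVersion := "2.10.0"

    libraryDependencies ++= Seq(
      // Spark itself is "provided": the Spark YARN bundle supplies it at runtime.
      "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided",
      // Pin the Hadoop client to the cluster's version (YARN 2.2 in Alex's case).
      "org.apache.hadoop" % "hadoop-client" % "2.2.0"
    )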


Re: spark code formatter?

2014-01-09 Thread Patrick Wendell
I'm also very wary of using a code formatter for the reasons already
mentioned by Reynold.

Does scalariform have a mode where it just provides style checks rather
than reformatting the code? This is something we really need for, e.g.,
reviewing the many submissions to the project.

- Patrick
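
For context, the sbt-scalariform plugin discussed below is usually wired in
roughly like this; the coordinates, version, and setting name are from memory
and should be treated as illustrative rather than authoritative:

    // project/plugins.sbt
    addSbtPlugin("com.typesafe.sbt" % "sbt-scalariform" % "1.3.0")

The plugin then contributes a settings sequence (scalariformSettings in the
versions I have seen) that is appended to the project definition, after which
sbt compile reformats sources as described in the quoted messages.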

On Wed, Jan 8, 2014 at 11:51 PM, Reynold Xin r...@databricks.com wrote:
 Thanks for doing that, DB. Not sure about others, but I'm actually strongly
 against blanket automatic code formatters, given that they can be
 disruptive. Often humans would intentionally choose to style things in a
 certain way for more clear semantics and better readability. Code
 formatters don't capture these nuances. It is pretty dangerous to just auto
 format everything.

 Maybe it'd be ok if we restrict the code formatters to a very limited set
 of things, such as indenting function parameters, etc.


 On Wed, Jan 8, 2014 at 10:28 PM, DB Tsai dbt...@alpinenow.com wrote:

 A pull request for scalariform.
 https://github.com/apache/incubator-spark/pull/365

 Sincerely,

 DB Tsai
 Machine Learning Engineer
 Alpine Data Labs
 --
 Web: http://alpinenow.com/


 On Wed, Jan 8, 2014 at 10:09 PM, DB Tsai dbt...@alpinenow.com wrote:
  We use sbt-scalariform in our company, and it can automatically format
  the coding style when runs `sbt compile`.
 
  https://github.com/sbt/sbt-scalariform
 
  We ask our developers to run `sbt compile` before commit, and it's
  really nice to see everyone has the same spacing and indentation.
 
  Sincerely,
 
  DB Tsai
  Machine Learning Engineer
  Alpine Data Labs
  --
  Web: http://alpinenow.com/
 
 
  On Wed, Jan 8, 2014 at 9:50 PM, Reynold Xin r...@databricks.com wrote:
  We have a Scala style configuration file in Shark:
  https://github.com/amplab/shark/blob/master/scalastyle-config.xml
 
  However, the scalastyle project is still pretty primitive and doesn't
 cover
  most of the use cases. It is still great to include it to cover basic
  checks such as 100-char wide lines.
 
 
  On Wed, Jan 8, 2014 at 8:02 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
 
  Not that I know of. This would be very useful to add, especially if we
 can
  make SBT automatically check the code style (or we can somehow plug
 this
  into Jenkins).
 
  Matei
 
  On Jan 8, 2014, at 11:00 AM, Michael Allman m...@allman.ms wrote:
 
   Hi,
  
   I've read the spark code style guide for contributors here:
  
  
 https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
  
   For scala code, do you have a scalariform configuration that you use
 to
  format your code to these specs?
  
   Cheers,
  
   Michael
 
 



Re: Build Changes for SBT Users

2014-01-05 Thread Patrick Wendell
Ya I was referring to the already-released version. Of course we can
update for subsequent releases...

On Sun, Jan 5, 2014 at 4:24 PM, Reynold Xin r...@databricks.com wrote:
 Why is it not possible? You can always update the script; you just can't
 update scripts for released versions.




 On Sat, Jan 4, 2014 at 9:07 PM, Patrick Wendell pwend...@gmail.com wrote:

 I agree TD - I was just saying that Reynold's proposal that we could
 update the release post-hoc is unfortunately not possible.

 On Sat, Jan 4, 2014 at 7:13 PM, Tathagata Das
 tathagata.das1...@gmail.com wrote:
  Patrick, that is right. All we are trying to ensure is to make a
  best-effort attempt to make it smooth for a new user. The script will
 try
  its best to automatically install / download sbt for the user. The
 fallback
  will be that the user will have to install sbt on their own. If the URL
  happens to change and our script fails to automatically download, then we
  are *no worse* than not providing the script at all.
 
  TD
 
 
  On Sat, Jan 4, 2014 at 7:06 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Reynold the issue is releases are immutable and we expect them to be
  downloaded for several years after the release date.
 
  On Sat, Jan 4, 2014 at 5:57 PM, Xuefeng Wu ben...@gmail.com wrote:
   Sounds reasonable. But I think few people have sbt installed, even though
  it is easy to install. We could provide this script in the online
  documentation, so users could download it and install sbt independently.
  Sounds like yet another brew install sbt?
   :)
  
   Yours, Xuefeng Wu 吴雪峰 敬上
  
   On 2014年1月5日, at 上午2:56, Patrick Wendell pwend...@gmail.com wrote:
  
   We thought about this but elected not to do this for a few reasons.
  
   1. Some people build from machines that do not have internet access
   for security reasons and retrieve dependencies from internal nexus
   repositories. So having a build dependency that relies on internet
   downloads is not desirable.
  
   2. It's hard to ensure stability of a particular URL in perpetuity.
   This is why maven central and other mirror networks exist. Keep in
   mind that we can't change the release code ever once we release it,
   and if something changed about the particular URL it could break the
   build.
  
   - Patrick
  
   On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com
  wrote:
   +1 on bundling a script similar to that one
  
  
   On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca
 
  wrote:
  
   Could we ship a shell script which downloads the sbt jar if not
  present
   (like for example
 https://github.com/holdenk/slashem/blob/master/sbt)?
  
  
   On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Hey All,
  
   Due to an ASF requirement, we recently merged a patch which
 removes
   the sbt jar from the build. This is necessary because we aren't
   allowed to distribute binary artifacts with our source packages.
  
   This means that instead of building Spark with sbt/sbt XXX,
 you'll
   need to have sbt yourself and just run sbt XXX from within the
  Spark
   directory. This is similar to the maven build, where we expect
 users
   already have maven installed.
  
   You can download sbt at http://www.scala-sbt.org/. It's okay to
 just
   download the most recent version of sbt, since sbt knows how to
 fetch
   other versions of itself and will always use the one we specify in
  our
   build file to compile spark.
  
   - Patrick
  
  
  
   --
   Cell : 425-233-8271
  
 



Build Changes for SBT Users

2014-01-04 Thread Patrick Wendell
Hey All,

Due to an ASF requirement, we recently merged a patch which removes
the sbt jar from the build. This is necessary because we aren't
allowed to distribute binary artifacts with our source packages.

This means that instead of building Spark with sbt/sbt XXX, you'll
need to have sbt yourself and just run sbt XXX from within the Spark
directory. This is similar to the maven build, where we expect users
already have maven installed.

You can download sbt at http://www.scala-sbt.org/. It's okay to just
download the most recent version of sbt, since sbt knows how to fetch
other versions of itself and will always use the one we specify in our
build file to compile spark.

- Patrick


Re: Build Changes for SBT Users

2014-01-04 Thread Patrick Wendell
We thought about this but elected not to do this for a few reasons.

1. Some people build from machines that do not have internet access
for security reasons and retrieve dependencies from internal nexus
repositories. So having a build dependency that relies on internet
downloads is not desirable.

2. It's hard to ensure stability of a particular URL in perpetuity.
This is why maven central and other mirror networks exist. Keep in
mind that we can't change the release code ever once we release it,
and if something changed about the particular URL it could break the
build.

- Patrick

On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote:
 +1 on bundling a script similar to that one


 On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote:

 Could we ship a shell script which downloads the sbt jar if not present
 (like for example https://github.com/holdenk/slashem/blob/master/sbt )?


 On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  Hey All,
 
  Due to an ASF requirement, we recently merged a patch which removes
  the sbt jar from the build. This is necessary because we aren't
  allowed to distribute binary artifacts with our source packages.
 
  This means that instead of building Spark with sbt/sbt XXX, you'll
  need to have sbt yourself and just run sbt XXX from within the Spark
  directory. This is similar to the maven build, where we expect users
  already have maven installed.
 
  You can download sbt at http://www.scala-sbt.org/. It's okay to just
  download the most recent version of sbt, since sbt knows how to fetch
  other versions of itself and will always use the one we specify in our
  build file to compile spark.
 
  - Patrick
 



 --
 Cell : 425-233-8271



Re: Build Changes for SBT Users

2014-01-04 Thread Patrick Wendell
Hey Holden,

That sounds reasonable to me. Where would we get a url we can control
though? Right now the project's web space is at incubator.apache...
but later this will change to a full apache domain. Is there somewhere
in maven central these jars are hosted... that would be the nicest
because things like repo1.maven.org basically never change.

- Patrick

On Sat, Jan 4, 2014 at 1:20 PM, Holden Karau hol...@pigscanfly.ca wrote:
 That makes sense, I think we could structure a script in such a way that it
 would overcome these problems though and probably provide a fair amount of
 benefit for people who just want to get started quickly.

 The easiest would be to have it use the system sbt if present and then fall
 back to downloading the sbt jar. As far as stability of the URL goes we
 could solve this by either having it point at a domain we control, or just
 with a clear error message indicating it failed to download sbt and the
 user needs to install sbt.

 If a restructured script in that manner would be useful I could whip up a
 pull request :)


 On Sat, Jan 4, 2014 at 10:56 AM, Patrick Wendell pwend...@gmail.com wrote:

 We thought about this but elected not to do this for a few reasons.

 1. Some people build from machines that do not have internet access
 for security reasons and retrieve dependencies from internal nexus
 repositories. So having a build dependency that relies on internet
 downloads is not desirable.

 2. It's hard to ensure stability of a particular URL in perpetuity.
 This is why maven central and other mirror networks exist. Keep in
 mind that we can't change the release code ever once we release it,
 and if something changed about the particular URL it could break the
 build.

 - Patrick

 On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote:
  +1 on bundling a script similar to that one
 
 
  On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca
 wrote:
 
  Could we ship a shell script which downloads the sbt jar if not present
  (like for example https://github.com/holdenk/slashem/blob/master/sbt )?
 
 
  On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   Hey All,
  
   Due to an ASF requirement, we recently merged a patch which removes
   the sbt jar from the build. This is necessary because we aren't
   allowed to distribute binary artifacts with our source packages.
  
   This means that instead of building Spark with sbt/sbt XXX, you'll
   need to have sbt yourself and just run sbt XXX from within the Spark
   directory. This is similar to the maven build, where we expect users
   already have maven installed.
  
   You can download sbt at http://www.scala-sbt.org/. It's okay to just
   download the most recent version of sbt, since sbt knows how to fetch
   other versions of itself and will always use the one we specify in our
   build file to compile spark.
  
   - Patrick
  
 
 
 
  --
  Cell : 425-233-8271
 




 --
 Cell : 425-233-8271


Re: Build Changes for SBT Users

2014-01-04 Thread Patrick Wendell
Reynold the issue is releases are immutable and we expect them to be
downloaded for several years after the release date.

On Sat, Jan 4, 2014 at 5:57 PM, Xuefeng Wu ben...@gmail.com wrote:
 Sounds reasonable. But I think few people have sbt installed, even though it
 is easy to install. We could provide this script in the online documentation,
 so users could download it and install sbt independently. Sounds like yet
 another brew install sbt?
 :)

 Yours, Xuefeng Wu 吴雪峰 敬上

 On 2014年1月5日, at 上午2:56, Patrick Wendell pwend...@gmail.com wrote:

 We thought about this but elected not to do this for a few reasons.

 1. Some people build from machines that do not have internet access
 for security reasons and retrieve dependencies from internal nexus
 repositories. So having a build dependency that relies on internet
 downloads is not desirable.

 2. It's hard to ensure stability of a particular URL in perpetuity.
 This is why maven central and other mirror networks exist. Keep in
 mind that we can't change the release code ever once we release it,
 and if something changed about the particular URL it could break the
 build.

 - Patrick

 On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote:
 +1 on bundling a script similar to that one


 On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote:

 Could we ship a shell script which downloads the sbt jar if not present
 (like for example https://github.com/holdenk/slashem/blob/master/sbt )?


 On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey All,

 Due to an ASF requirement, we recently merged a patch which removes
 the sbt jar from the build. This is necessary because we aren't
 allowed to distribute binary artifacts with our source packages.

 This means that instead of building Spark with sbt/sbt XXX, you'll
 need to have sbt yourself and just run sbt XXX from within the Spark
 directory. This is similar to the maven build, where we expect users
 already have maven installed.

 You can download sbt at http://www.scala-sbt.org/. It's okay to just
 download the most recent version of sbt, since sbt knows how to fetch
 other versions of itself and will always use the one we specify in our
 build file to compile spark.

 - Patrick



 --
 Cell : 425-233-8271



Re: Build Changes for SBT Users

2014-01-04 Thread Patrick Wendell
I agree TD - I was just saying that Reynold's proposal that we could
update the release post-hoc is unfortunately not possible.

On Sat, Jan 4, 2014 at 7:13 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
 Patrick, that is right. All we are trying to ensure is to make a
 best-effort attempt to make it smooth for a new user. The script will try
 its best to automatically install / download sbt for the user. The fallback
 will be that the user will have to install sbt on their own. If the URL
 happens to change and our script fails to automatically download, then we
 are *no worse* than not providing the script at all.

 TD


 On Sat, Jan 4, 2014 at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote:

 Reynold the issue is releases are immutable and we expect them to be
 downloaded for several years after the release date.

 On Sat, Jan 4, 2014 at 5:57 PM, Xuefeng Wu ben...@gmail.com wrote:
  Sounds reasonable. But I think few people have sbt installed, even though it
 is easy to install. We could provide this script in the online documentation,
 so users could download it and install sbt independently. Sounds like yet
 another brew install sbt?
  :)
 
  Yours, Xuefeng Wu 吴雪峰 敬上
 
  On 2014年1月5日, at 上午2:56, Patrick Wendell pwend...@gmail.com wrote:
 
  We thought about this but elected not to do this for a few reasons.
 
  1. Some people build from machines that do not have internet access
  for security reasons and retrieve dependencies from internal nexus
  repositories. So having a build dependency that relies on internet
  downloads is not desirable.
 
  2. It's hard to ensure stability of a particular URL in perpetuity.
  This is why maven central and other mirror networks exist. Keep in
  mind that we can't change the release code ever once we release it,
  and if something changed about the particular URL it could break the
  build.
 
  - Patrick
 
  On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com
 wrote:
  +1 on bundling a script similar to that one
 
 
  On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca
 wrote:
 
  Could we ship a shell script which downloads the sbt jar if not
 present
  (like for example https://github.com/holdenk/slashem/blob/master/sbt)?
 
 
  On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Hey All,
 
  Due to an ASF requirement, we recently merged a patch which removes
  the sbt jar from the build. This is necessary because we aren't
  allowed to distribute binary artifacts with our source packages.
 
  This means that instead of building Spark with sbt/sbt XXX, you'll
  need to have sbt yourself and just run sbt XXX from within the
 Spark
  directory. This is similar to the maven build, where we expect users
  already have maven installed.
 
  You can download sbt at http://www.scala-sbt.org/. It's okay to just
  download the most recent version of sbt, since sbt knows how to fetch
  other versions of itself and will always use the one we specify in
 our
  build file to compile spark.
 
  - Patrick
 
 
 
  --
  Cell : 425-233-8271
 



Re: Changes that affect packaging and running Spark

2014-01-03 Thread Patrick Wendell
-- Small correction

 /sbin contains administrative scripts for launching the standalone
 cluster manager:
 /sbin/start-master.sh
 /sbin/start-all.sh
 ...etc


Re: Terminology: worker vs slave

2014-01-02 Thread Patrick Wendell
Ya we've been trying to standardize on the terminology here (see glossary):

http://spark.incubator.apache.org/docs/latest/cluster-overview.html

I think slave actually isn't mentioned here at all - but references
to slave in the codebase are synonymous with worker.

- Patrick

On Thu, Jan 2, 2014 at 10:42 PM, Reynold Xin r...@databricks.com wrote:
 It is historic.

 I think we are converging towards

 worker: the slave daemon in the standalone cluster manager

 executor: the jvm process that is launched by the worker that executes tasks



 On Thu, Jan 2, 2014 at 10:39 PM, Andrew Ash and...@andrewash.com wrote:

 The terms worker and slave seem to be used interchangeably.  Are they the
 same?

 Worker is used more frequently in the codebase:

 aash@aash-mbp ~/git/spark$ git grep -i worker | wc -l
  981
 aash@aash-mbp ~/git/spark$ git grep -i slave | wc -l
  348
 aash@aash-mbp ~/git/spark$

 Does it make sense to unify on one or the other?



Disallowing null mergeCombiners

2013-12-31 Thread Patrick Wendell
Hey All,

There is a small API change that we are considering for the external
sort patch. Previously we allowed mergeCombiner to be null when map
side aggregation was not enabled. This is because it wasn't necessary
in that case since mappers didn't ship pre-aggregated values to
reducers.

Because the external sort capability also relies on the mergeCombiner
function to merge partially-aggregated on-disk segments, we now need
it all the time, even if map side aggregation is not enabled. This is a
fairly esoteric thing that I'm not sure anyone other than Shark ever
used, but I want to check in case anyone had feelings about this.

The relevant code is here:

https://github.com/apache/incubator-spark/pull/303/files#diff-f70e97c099b5eac05c75288cb215e080R72

- Patrick
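
For readers who haven't used this API, here is a minimal sketch of the three
functions combineByKey takes (this is not Spark's internal code, and the data
is made up). The change above means the third function, mergeCombiners, is now
always required:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // implicit conversions for pair RDD operations

    object CombineByKeyExample {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "CombineByKeyExample")
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

        val sums = pairs.combineByKey(
          (v: Int) => v,                  // createCombiner: start a per-key accumulator
          (acc: Int, v: Int) => acc + v,  // mergeValue: fold another value into it
          (a: Int, b: Int) => a + b)      // mergeCombiners: merge two accumulators; also
                                          // used to merge partially-aggregated on-disk segments

        sums.collect().foreach(println)
        sc.stop()
      }
    }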


Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-24 Thread Patrick Wendell
Hey Andy - these Nabble groups look great! Thanks for setting them up.


On Tue, Dec 24, 2013 at 10:49 AM, Evan Chan e...@ooyala.com wrote:

 Thanks Andy, at first glance nabble seems great, it allows search plus
 posting new topics, so it appears to be bidirectional.Now just have to
 register an account on there.


 On Sun, Dec 22, 2013 at 2:47 PM, Andy Konwinski 
 andykonwin...@gmail.com wrote:

 Per Matei's suggestion, I've set up two nabble archive lists, one to
 archive the apache dev list and one to archive the apache user list.

 user list archive: http://apache-spark-user-list.1001560.n3.nabble.com
 dev list archive:
 http://apache-spark-developers-list.1001551.n3.nabble.com

 Between these and whatever solution we end up with for the google group
 mirrors, we should have decent enough alternatives to reading via the
 apache list archives going forward.


 On Thu, Dec 19, 2013 at 11:09 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Yes, I agree that we should close down the existing Google group on Jan
  1st. While it’s more convenient to use, it’s created confusion. I hope
 that
  we can get the ASF to support better search interfaces in the future
 too. I
  think we just have to drive this from within.
 
  The Google Group should be a nice way to make the content searchable
 from
  the web. We should also see what it takes to make it mirrored on Nabble
 (
  http://www.nabble.com). I’ve found a lot of information about other
  projects there, and other Apache projects do use it.
 
  Matei
 
  On Dec 19, 2013, at 10:49 PM, Andy Konwinski andykonwin...@gmail.com
  wrote:
 
  I've set up two new unofficial google groups to mirror the Apache Spark
  user and dev lists:
 
  https://groups.google.com/forum/#!forum/apache-spark-dev-mirror
  https://groups.google.com/forum/#!forum/apache-spark-user-mirror
 
  Basically these lists each subscribe to the corresponding Apache list.
 
  They do not allow folks to subscribe directly to them. Getting emails
 from
  the Google Group would offer no advantages that I can think of and we
  really want to encourage folks to sign up for the official mailing list
  instead.
 
  The lists do allow the public to send email to them, which I think might
  be necessary since the from: field for all emails that get distributed
  via the Apache mailing list is set to the author of the email.
 
  I think this might be a great compromise. At least we can try this out
 and
  see how it goes.
 
  Matei, can you confirm that Jan 1 is the date we want to turn off the
  existing spark-users google group?
 
  We could consider using the existing spark-developers and spark-users
  google groups instead of the two new ones I just created but I think
 that
  it is much more obvious to have the lists include the word mirror in
 their
  names.
 
  The dev list mirror seems to be working, because I see the last couple
  emails from this thread in it already. I'll confirm and ensure that the
  user list mirror is working too.
 
  Thoughts?
 
  Andy
 
  P.S. Thanks to Patrick for suggesting this to me originally.
 
  On Thu, Dec 19, 2013 at 8:46 PM, Aaron Davidson ilike...@gmail.com
 wrote:
 
  I'd be fine with one-way mirrors here (Apache threads being reflected
 in
  Google groups) -- I have no idea how one is supposed to navigate the
 Apache
  list to look for historic threads.
 
 
  On Thu, Dec 19, 2013 at 7:58 PM, Mike Potts maspo...@gmail.com
 wrote:
 
  Thanks very much for the prompt and comprehensive reply!  I appreciate
  the overarching desire to integrate with apache: I'm very happy to
 hear
  that there's a move to use the existing groups as mirrors: that will
  overcome all of my objections: particularly if it's bidirectional! :)
 
 
  On Thursday, December 19, 2013 7:19:06 PM UTC-8, Andy Konwinski wrote:
 
  Hey Mike,
 
  As you probably noticed when you CC'd spark-de...@googlegroups.com,
  that list has already be reconfigured so that it no longer allows
 posting
  (and bounces emails sent to it).
 
  We will be doing the same thing to the spark...@googlegroups.com list
  too (we'll announce a date for that soon).
 
  That may sound very frustrating, and you are *not* alone feeling that
  way. We've had a long conversation with our mentors about this, and
 I've
  felt very similar to you, so I'd like to give you background.
 
  As I'm coming to see it, part of becoming an Apache project is moving
  the community *fully* over to Apache infrastructure, and more
 generally the
  Apache way of organizing the community.
 
  This applies in both the nuts-and-bolts sense of being on apache
 infra,
  but possibly more importantly, it is also a guiding principle and
 way of
  thinking.
 
  In various ways, moving to apache Infra can be a painful process, and
  IMO the loss of all the great mailing list functionality that comes
 with
  using Google Groups is perhaps the most painful step. But basically,
 the de
  facto mailing lists need to be the Apache ones, and not Google
 

Re: Akka problem when using scala command to launch Spark applications in the current 0.9.0-SNAPSHOT

2013-12-24 Thread Patrick Wendell
Evan,

This problem also exists for people who write their own applications that
depend on/include Spark. E.g. they bundle up their app and then launch the
driver with scala -cp my-bundle.jar... I've seen this cause an issue in
that setting.

- Patrick


On Tue, Dec 24, 2013 at 10:50 AM, Evan Chan e...@ooyala.com wrote:

 Hi Reynold,

 The default, documented methods of starting Spark all use the assembly
 jar, and thus java, right?

 -Evan



 On Fri, Dec 20, 2013 at 11:36 PM, Reynold Xin r...@databricks.com wrote:

 It took me hours to debug a problem yesterday on the latest master branch
 (0.9.0-SNAPSHOT), and I would like to share with the dev list in case
 anybody runs into this Akka problem.

 A little background for those of you who haven't followed closely the
 development of Spark and YARN 2.2: YARN 2.2 uses protobuf 2.5, and Akka
 uses an older version of protobuf that is not binary compatible. In order
 to have a single build that is compatible for both YARN 2.2 and pre-2.2
 YARN/Hadoop, we published a special version of Akka that builds with
 protobuf shaded (i.e. using a different package name for the protobuf
 stuff).

 However, it turned out Scala 2.10 includes a version of Akka jar in its
 default classpath (look at the lib folder in Scala 2.10 binary
 distribution). If you use the scala command to launch any Spark
 application
 on the current master branch, there is a pretty high chance that you
 wouldn't be able to create the SparkContext (stack trace at the end of the
 email). The problem is that the Akka packaged with Scala 2.10 takes
 precedence in the classloader over the special Akka version Spark
 includes.

 Before we have a good solution for this, the workaround is to use java to
 launch the application instead of scala. All you need to do is to include
 the right Scala jars (scala-library and scala-compiler) in the classpath.
 Note that the scala command is really just a simple script that calls java
 with the right classpath.


 Stack trace:

 java.lang.NoSuchMethodException:
 akka.remote.RemoteActorRefProvider.<init>(java.lang.String,
 akka.actor.ActorSystem$Settings, akka.event.EventStream,
 akka.actor.Scheduler, akka.actor.DynamicAccess)
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at

 akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:77)
 at scala.util.Try$.apply(Try.scala:161)
 at

 akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:74)
 at

 akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:85)
 at

 akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:85)
 at scala.util.Success.flatMap(Try.scala:200)
 at

 akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:85)
 at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:546)
 at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
 at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
 at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:79)
 at
 org.apache.spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:120)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:106)




 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |

 http://www.ooyala.com/ 
 http://www.facebook.com/ooyala | http://www.linkedin.com/company/ooyala | http://www.twitter.com/ooyala




Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-20 Thread Patrick Wendell
Andy and Mike,

I'd also prefer to just convert the old groups into mirrors. That way
people who are still subscribed to them will continue to get e-mails
(and most people on the list are read-only users).

Ideally we'd have the behavior that users who try to e-mail the google
group get a bounce back saying this is now a read only mirror.

That said I have *no idea* of this is possible to set-up nicely within
google groups. I defer to Andy! Having the new mirror groups also
seems like a decent solution as well...

- Patrick

On Fri, Dec 20, 2013 at 8:35 AM, Mike Potts maspo...@gmail.com wrote:
 I actually prefer that, but I didn't want my preference to get in the way of
 creating mirror groups, one way or the other :)  (My argument would be that
 since the old groups would be closing anyway, re-purposing them as mirrors
 is fair use: and less work/confusing than creating new *-mirror groups
 instead.)


 On Friday, December 20, 2013 8:29:40 AM UTC-8, Andy Konwinski wrote:

 That would be really awesome. I'm not familiar with any Google Groups
 functionality that supports that but I'll look.

 That's an argument for maybe just changing the names of the existing
 groups to something with mirror in them instead of using newly created ones.

 --
 You received this message because you are subscribed to the Google Groups
 Spark Users group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to spark-users+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/groups/opt_out.


Spark 0.8.1 Released

2013-12-19 Thread Patrick Wendell
Hi everyone,

We've just posted Spark 0.8.1, a new maintenance release that contains
some bug fixes and improvements to the 0.8 branch. The full release
notes are available at [1]. Apart from various bug fixes, 0.8.1
includes support for YARN 2.2, a high availability mode for the
standalone scheduler, and optimizations to the shuffle. We recommend
that current users update to this release. You can grab the release at
[2].

[1] http://spark.incubator.apache.org/releases/spark-release-0-8-1.html
[2] http://spark.incubator.apache.org/downloads

Thanks to the following people who contributed to this release:

Michael Armbrust, Pierre Borckmans, Evan Chan, Ewen Cheslack, Mosharaf
Chowdhury, Frank Dai, Aaron Davidson, Tathagata Das, Ankur Dave,
Harvey Feng, Ali Ghodsi, Thomas Graves, Li Guoqiang, Stephen Haberman,
Haidar Hadi, Nathan Howell, Holden Karau, Du Li, Raymond Liu, Xi Liu,
David McCauley, Michael (wannabeast), Fabrizio Milo, Mridul
Muralidharan, Sundeep Narravula, Kay Ousterhout, Nick Pentreath, Imran
Rashid, Ahir Reddy, Josh Rosen, Henry Saputra, Jerry Shao, Mingfei
Shi, Andre Schumacher, Karthik Tunga, Patrick Wendell, Neal Wiggins,
Andrew Xia, Reynold Xin, Matei Zaharia, and Wu Zeming

- Patrick


[RESULT] [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-17 Thread Patrick Wendell
The vote is now closed. This vote passes with 4 IPMC +1's and no 0 or -1 votes.

+1 (4 Total)
Marvin Humphrey
Henry Saputra
Chris Mattmann
Roman Shaposhnik

0 (0 Total)

-1 (0 Total)

* = Binding Vote

Thanks to everyone who helped vet this release.

- Patrick


Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-15 Thread Patrick Wendell
You can check out the docs mentioned in the vote thread. There is also
a pre-built binary for hadoop2 that is compiled for YARN 2.2

- Patrick

On Sun, Dec 15, 2013 at 4:31 AM, Azuryy Yu azury...@gmail.com wrote:
 yarn 2.2, not yarn 0.22, I am so sorry.


 On Sun, Dec 15, 2013 at 8:31 PM, Azuryy Yu azury...@gmail.com wrote:

 Hi,
 Spark-0.8.1 supports yarn 0.22 right? where to find the release note?
 Thanks.


 On Sun, Dec 15, 2013 at 3:20 AM, Henry Saputra 
 henry.sapu...@gmail.com wrote:

 Yeah seems like it. He was ok with our prev release.
 Let's wait for his reply

 On Saturday, December 14, 2013, Patrick Wendell wrote:

  Henry - from that thread it looks like sebb's concern was something
  different than this.
 
  On Sat, Dec 14, 2013 at 11:08 AM, Henry Saputra 
 henry.sapu...@gmail.com
  wrote:
   Hi Patrick,
  
   Yeap I agree, but technically an ASF VOTE is on the source release only,
   there is even debate about it =), so putting it in the vote staging artifact
   could confuse people because in our case we do package 3rd party
   libraries in the binary jars.
  
   I have sent email to sebb asking clarification about his concern in
   general@ list.
  
   - Henry
  
   On Sat, Dec 14, 2013 at 10:56 AM, Patrick Wendell pwend...@gmail.com
 
  wrote:
   Hey Henry,
  
   One thing a lot of people do during the vote is test the binaries and
   make sure they work. This is really valuable. If you'd like I could
   add a caveat to the vote thread explaining that we are only voting on
   the source.
  
   - Patrick
  
   On Sat, Dec 14, 2013 at 10:40 AM, Henry Saputra 
  henry.sapu...@gmail.com wrote:
   Actually we should be fine putting the binaries there as long as the
   VOTE is for the source.
  
   Let's verify with sebb in the general@ list about his concern.
  
   - Henry
  
   On Sat, Dec 14, 2013 at 10:31 AM, Henry Saputra 
  henry.sapu...@gmail.com wrote:
   Hi Patrick, as sebb has mentioned let's move the binaries from the
   voting directory in your people.apache.org directory.
   ASF release voting is for source code and not binaries, and
   technically we provide binaries for convenience.
  
   And add a link to the KEYS location in dist [1] so people can verify
   signatures.
  
   Sorry for the late response to the VOTE thread, guys.
  
   - Henry
  
   [1]
 https://dist.apache.org/repos/dist/release/incubator/spark/KEYS
  
   On Fri, Dec 13, 2013 at 6:37 PM, Patrick Wendell 
 pwend...@gmail.com
  wrote:
   The vote is now closed. This vote passes with 5 PPMC +1's and no 0
  or -1
   votes.
  
   +1 (5 Total)
   Matei Zaharia*
   Nick Pentreath*
   Patrick Wendell*
   Prashant Sharma*
   Tom Graves*
  
   0 (0 Total)
  
   -1 (0 Total)
  
   * = Binding Vote
  
   As per the incubator release guide [1] I'll be sending this to the
   general incubator list for a final vote from IPMC members.
  
   [1]
  
 
  http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote
  
  
   On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan e...@ooyala.com wrote:
  
   I'd be personally fine with a standard workflow of assemble-deps
 +
   packaging just the Spark files as separate packages, if it
 speeds up
   everyone's development time.
  
  
   On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra 
  m...@clearstorydata.com
   wrote:
  
I don't know how to make sense of the numbers, but here's what
  I've got
from a very small sample size.





Re: Scala 2.10 Merge

2013-12-14 Thread Patrick Wendell
Alright I just merged this in - so Spark is officially Scala 2.10
from here forward.

For reference I cut a new branch called scala-2.9 with the commit
immediately prior to the merge:
https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=shortlog;h=refs/heads/scala-2.9

- Patrick

On Thu, Dec 12, 2013 at 8:26 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Reymond,

 Let's move this discussion out of this thread and into the associated JIRA.
 I'll write up our current approach over there.

 https://spark-project.atlassian.net/browse/SPARK-995

 - Patrick


 On Thu, Dec 12, 2013 at 5:56 PM, Liu, Raymond raymond@intel.com wrote:

 Hi Patrick

 So what's the plan for supporting YARN 2.2 in 0.9? As far as I can
 see, if you want to support both 2.2 and 2.0, then due to the protobuf
 version incompatibility you need two versions of akka anyway.

 Akka 2.3-M1 looks like it has some small API changes; we could
 probably isolate the code the way we did for the yarn API. I remember
 it was mentioned that using reflection for the different APIs is
 preferred. So the purpose of using reflection is to use one released bin jar to
 support both versions of Hadoop/Yarn at runtime, instead of building different
 bin jars at compile time?

  Then all code related to hadoop will also be built in separate
 modules for loading on demand? That sounds like a lot of work to me. And
 you still need a shim layer and separate code for the different version
 APIs, and to depend on different versions of Akka etc. That sounds like even
 stricter demands than our current approach on master, with a dynamic class
 loader in addition, and the problems we are facing now are still there?

 Best Regards,
 Raymond Liu

 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Thursday, December 12, 2013 5:13 PM
 To: dev@spark.incubator.apache.org
 Subject: Re: Scala 2.10 Merge

 Also - the code is still there because of a recent merge that took in some
 newer changes... we'll be removing it for the final merge.


 On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  Hey Raymond,
 
  This won't work because AFAIK akka 2.3-M1 is not binary compatible
  with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need
  to still use the older protobuf library, so we'd need to support both.
 
  I'd also be concerned about having a reference to a non-released
  version of akka. Akka is the source of our hardest-to-find bugs and
  simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting.
  Of course, if you are building off of master you can maintain a fork
  that uses this.
 
  - Patrick
 
 
  On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond
   raymond@intel.com wrote:
 
  Hi Patrick
 
  What does that mean for dropping YARN 2.2? It seems the code is still
  there. You mean that if it is built against 2.2 it will break and won't
  work, right?
  Since the home-made akka build on scala 2.10 isn't there. In that case,
  can we just use akka 2.3-M1, which runs on protobuf 2.5, as a
  replacement?
 
  Best Regards,
  Raymond Liu
 
 
  -Original Message-
  From: Patrick Wendell [mailto:pwend...@gmail.com]
  Sent: Thursday, December 12, 2013 4:21 PM
  To: dev@spark.incubator.apache.org
  Subject: Scala 2.10 Merge
 
  Hi Developers,
 
  In the next few days we are planning to merge Scala 2.10 support into
  Spark. For those that haven't been following this, Prashant Sharma
  has been maintaining the scala-2.10 branch of Spark for several
  months. This branch is current with master and has been reviewed for
  merging:
 
  https://github.com/apache/incubator-spark/tree/scala-2.10
 
  Scala 2.10 support is one of the most requested features for Spark -
  it will be great to get this into Spark 0.9! Please note that *Scala
  2.10 is not binary compatible with Scala 2.9*. With that in mind, I
  wanted to give a few heads-up/requests to developers:
 
  If you are developing applications on top of Spark's master branch,
  those will need to migrate to Scala 2.10. You may want to download
  and test the current scala-2.10 branch in order to make sure you will
  be okay as Spark developments move forward. Of course, you can always
  stick with the current master commit and be fine (I'll cut a tag when
  we do the merge in order to delineate where the version changes).
  Please open new threads on the dev list to report and discuss any
  issues.
 
  This merge will temporarily drop support for YARN 2.2 on the master
  branch.
  This is because the workaround we used was only compiled for Scala 2.9.
  We are going to come up with a more robust solution to YARN 2.2
  support before releasing 0.9.
 
  Going forward, we will continue to make maintenance releases on
  branch-0.8 which will remain compatible with Scala 2.9.
 
  For those interested, the primary code changes in this merge are
  upgrading the akka version, changing the use

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-13 Thread Patrick Wendell
The vote is now closed. This vote passes with 5 PPMC +1's and no 0 or -1
votes.

+1 (5 Total)
Matei Zaharia*
Nick Pentreath*
Patrick Wendell*
Prashant Sharma*
Tom Graves*

0 (0 Total)

-1 (0 Total)

* = Binding Vote

As per the incubator release guide [1] I'll be sending this to the
general incubator list for a final vote from IPMC members.

[1]
http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote


On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan e...@ooyala.com wrote:

 I'd be personally fine with a standard workflow of assemble-deps +
 packaging just the Spark files as separate packages, if it speeds up
 everyone's development time.


 On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com
 wrote:

  I don't know how to make sense of the numbers, but here's what I've got
  from a very small sample size.
 
  For both v0.8.0-incubating and v0.8.1-incubating, building separate
  assemblies is faster than `./sbt/sbt assembly` and the times for building
  separate assemblies for 0.8.0 and 0.8.1 are about the same.
 
  For v0.8.0-incubating, `./sbt/sbt assembly` takes about 2.5x as long as
 the
  sum of the separate assemblies.
  For v0.8.1-incubating, `./sbt/sbt assembly` takes almost 8x as long as
 the
  sum of the separate assemblies.
 
  Weird.
 
 
  On Wed, Dec 11, 2013 at 11:49 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   I'll +1 myself also.
  
   For anyone who has the slow build problem: does this issue happen when
   building v0.8.0-incubating also? Trying to figure out whether it's
   related to something we added in 0.8.1 or if it's a long standing
   issue.
  
   - Patrick
  
   On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia 
 matei.zaha...@gmail.com
  
   wrote:
Woah, weird, but definitely good to know.
   
If you’re doing Spark development, there’s also a more convenient
  option
   added by Shivaram in the master branch. You can do sbt assemble-deps to
   package *just* the dependencies of each project in a special assembly
  JAR,
   and then use sbt compile to update the code. This will use the classes
   directly out of the target/scala-2.9.3/classes directories. You have to
   redo assemble-deps only if your external dependencies change.
   
Matei
   
On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com
   wrote:
   
I hope this PR https://github.com/apache/incubator-spark/pull/252can
   help.
Again this is not a blocker for the release from my side either.
   
   
On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra 
  m...@clearstorydata.com
   wrote:
   
Interesting, and confirmed: On my machine where `./sbt/sbt
 assembly`
   takes
a long, long, long time to complete (a MBP, in my case),
 building
   three
separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt
examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much
 less
   time.
   
   
   
On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma 
   scrapco...@gmail.com
wrote:
   
forgot to mention, after running sbt/sbt assembly/assembly running
sbt/sbt
examples/assembly takes just 37s. Not to mention my hardware is
 not
really
great.
   
   
On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma 
   scrapco...@gmail.com
wrote:
   
Hi Patrick and Matei,
   
Was trying out this and followed the quick start guide which says
  do
sbt/sbt assembly, like few others I was also stuck for few
 minutes
  on
linux. On the other hand if I use sbt/sbt assembly/assembly it is
   much
faster.
   
Should we change the documentation to reflect this. It will not
 be
great
for first time users to get stuck there.
   
   
On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia 
matei.zaha...@gmail.com
wrote:
   
+1
   
Built and tested it on Mac OS X.
   
Matei
   
   
On Dec 10, 2013, at 4:49 PM, Patrick Wendell 
 pwend...@gmail.com
wrote:
   
Please vote on releasing the following candidate as Apache
 Spark
(incubating) version 0.8.1.
   
The tag to be voted on is v0.8.1-incubating (commit b87d31d):
   
   
   
   
  
 
 https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e
   
The release files, including signatures, digests, etc can be
  found
at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
   
The staging repository for this release can be found at:
   
   
   https://repository.apache.org/content/repositories/orgapachespark-040/
   
The documentation corresponding to this release can be found
 at:
   
   http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/
   
For information about the contents of this release see:
   
   
   
   
  
 
 https://git-wip-us.apache.org/repos/asf

Re: Scala 2.10 Merge

2013-12-12 Thread Patrick Wendell
Also - the code is still there because of a recent merge that took in some
newer changes... we'll be removing it for the final merge.


On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Raymond,

 This won't work because AFAIK akka 2.3-M1 is not binary compatible with
 akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need to still
 use the older protobuf library, so we'd need to support both.

 I'd also be concerned about having a reference to a non-released version
 of akka. Akka is the source of our hardest-to-find bugs and simultaneously
 trying to support 2.2.3 and 2.3-M1 is a bit daunting. Of course, if you are
 building off of master you can maintain a fork that uses this.

 - Patrick


 On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.com wrote:

 Hi Patrick

  What does dropping YARN 2.2 actually mean? The code seems to still be
 there. Do you mean that a build against 2.2 will break and won't work,
 since the home-made akka build on Scala 2.10 isn't there? In that case,
 can we just use akka 2.3-M1, which runs on protobuf 2.5, as a
 replacement?

 Best Regards,
 Raymond Liu


 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Thursday, December 12, 2013 4:21 PM
 To: dev@spark.incubator.apache.org
 Subject: Scala 2.10 Merge

 Hi Developers,

 In the next few days we are planning to merge Scala 2.10 support into
 Spark. For those that haven't been following this, Prashant Sharma has been
 maintaining the scala-2.10 branch of Spark for several months. This branch
 is current with master and has been reviewed for merging:

 https://github.com/apache/incubator-spark/tree/scala-2.10

 Scala 2.10 support is one of the most requested features for Spark - it
 will be great to get this into Spark 0.9! Please note that *Scala 2.10 is
 not binary compatible with Scala 2.9*. With that in mind, I wanted to give
 a few heads-up/requests to developers:

 If you are developing applications on top of Spark's master branch, those
 will need to migrate to Scala 2.10. You may want to download and test the
 current scala-2.10 branch in order to make sure you will be okay as Spark
 developments move forward. Of course, you can always stick with the current
 master commit and be fine (I'll cut a tag when we do the merge in order to
 delineate where the version changes). Please open new threads on the dev
 list to report and discuss any issues.

 This merge will temporarily drop support for YARN 2.2 on the master
 branch.
 This is because the workaround we used was only compiled for Scala 2.9.
 We are going to come up with a more robust solution to YARN 2.2 support
 before releasing 0.9.

 Going forward, we will continue to make maintenance releases on
 branch-0.8 which will remain compatible with Scala 2.9.

 For those interested, the primary code changes in this merge are
 upgrading the akka version, changing the use of Scala 2.9's ClassManifest
 construct to Scala 2.10's ClassTag, and updating the spark shell to work
 with Scala 2.10's repl.
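
  As a rough, generic illustration of that last change (a sketch only, not
  actual Spark code, with a hypothetical object name): code that used a
  ClassManifest under Scala 2.9 uses a ClassTag context bound under 2.10.

    import scala.reflect.ClassTag

    object ClassTagSketch {
      // Scala 2.9 would typically have written:
      //   def build[T](n: Int)(implicit m: ClassManifest[T]): Array[T] = new Array[T](n)
      // Under Scala 2.10 the equivalent uses ClassTag:
      def build[T: ClassTag](n: Int): Array[T] = new Array[T](n)

      def main(args: Array[String]): Unit =
        println(build[Int](4).length)  // prints 4
    }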

 - Patrick





Re: Scala 2.10 Merge

2013-12-12 Thread Patrick Wendell
Hey Raymond,

Let's move this discussion out of this thread and into the associated JIRA.
I'll write up our current approach over there.

https://spark-project.atlassian.net/browse/SPARK-995

- Patrick


On Thu, Dec 12, 2013 at 5:56 PM, Liu, Raymond raymond@intel.com wrote:

 Hi Patrick

  So what's the plan for supporting YARN 2.2 in 0.9? As far as I can
 see, if you want to support both 2.2 and 2.0, then because of the
 protobuf version incompatibility you need two versions of akka anyway.

  Akka 2.3-M1 looks like it has some small API changes; we could
 probably isolate that code the way we did for the YARN API. I remember
 it was mentioned that using reflection for the different APIs is
 preferred. So the purpose of using reflection is to ship one release
 binary jar that supports both versions of Hadoop/YARN at runtime,
 instead of building different binary jars at compile time?

  Then all Hadoop-related code would also be built in separate modules
 for loading on demand? That sounds like a lot of work to me. You would
 still need a shim layer and separate code for each API version, each
 depending on a different akka version, etc. That sounds like even
 stricter demands than our current approach on master, with a dynamic
 class loader on top, and the problems we are facing now would still be
 there?

 Best Regards,
 Raymond Liu
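
 For illustration only, here is a minimal sketch of the reflection-based
 shim idea described above. The trait and class names are hypothetical
 (not Spark's actual classes); the point is just that a single binary jar
 can choose an implementation by name at runtime instead of committing to
 one Hadoop/YARN version at compile time.

   // Hypothetical shim interface, one implementation per Hadoop/YARN flavor.
   trait YarnShim { def version: String }
   class Hadoop1Shim extends YarnShim { def version = "hadoop 1.x / YARN alpha" }
   class Hadoop2Shim extends YarnShim { def version = "hadoop 2.2 / YARN stable" }

   object ShimLoaderSketch {
     // Load a shim by name at runtime, so only the chosen implementation
     // (and its Hadoop dependencies) ever needs to be exercised.
     def loadShim(className: String): YarnShim =
       Class.forName(className).newInstance().asInstanceOf[YarnShim]

     def main(args: Array[String]): Unit =
       println(loadShim("Hadoop2Shim").version)
   }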

 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Thursday, December 12, 2013 5:13 PM
 To: dev@spark.incubator.apache.org
 Subject: Re: Scala 2.10 Merge

 Also - the code is still there because of a recent merge that took in some
 newer changes... we'll be removing it for the final merge.


 On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  Hey Raymond,
 
  This won't work because AFAIK akka 2.3-M1 is not binary compatible
  with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need
  to still use the older protobuf library, so we'd need to support both.
 
  I'd also be concerned about having a reference to a non-released
  version of akka. Akka is the source of our hardest-to-find bugs and
  simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting.
  Of course, if you are building off of master you can maintain a fork
 that uses this.
 
  - Patrick
 
 
  On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.com
 wrote:
 
  Hi Patrick
 
  What does dropping YARN 2.2 actually mean? The code seems to still be
  there. Do you mean that a build against 2.2 will break and won't
 work, since the home-made akka build on Scala 2.10 isn't there? In that
  case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a
  replacement?
 
  Best Regards,
  Raymond Liu
 
 
  -Original Message-
  From: Patrick Wendell [mailto:pwend...@gmail.com]
  Sent: Thursday, December 12, 2013 4:21 PM
  To: dev@spark.incubator.apache.org
  Subject: Scala 2.10 Merge
 
  Hi Developers,
 
  In the next few days we are planning to merge Scala 2.10 support into
  Spark. For those that haven't been following this, Prashant Sharma
  has been maintaining the scala-2.10 branch of Spark for several
  months. This branch is current with master and has been reviewed for
 merging:
 
  https://github.com/apache/incubator-spark/tree/scala-2.10
 
  Scala 2.10 support is one of the most requested features for Spark -
  it will be great to get this into Spark 0.9! Please note that *Scala
  2.10 is not binary compatible with Scala 2.9*. With that in mind, I
  wanted to give a few heads-up/requests to developers:
 
  If you are developing applications on top of Spark's master branch,
  those will need to migrate to Scala 2.10. You may want to download
  and test the current scala-2.10 branch in order to make sure you will
  be okay as Spark developments move forward. Of course, you can always
  stick with the current master commit and be fine (I'll cut a tag when
  we do the merge in order to delineate where the version changes).
  Please open new threads on the dev list to report and discuss any
 issues.
 
  This merge will temporarily drop support for YARN 2.2 on the master
  branch.
  This is because the workaround we used was only compiled for Scala 2.9.
  We are going to come up with a more robust solution to YARN 2.2
  support before releasing 0.9.
 
  Going forward, we will continue to make maintenance releases on
  branch-0.8 which will remain compatible with Scala 2.9.
 
  For those interested, the primary code changes in this merge are
  upgrading the akka version, changing the use of Scala 2.9's
  ClassManifest construct to Scala 2.10's ClassTag, and updating the
  spark shell to work with Scala 2.10's repl.
 
  - Patrick
 
 
 



Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-11 Thread Patrick Wendell
I'll +1 myself also.

For anyone who has the slow build problem: does this issue happen when
building v0.8.0-incubating also? Trying to figure out whether it's
related to something we added in 0.8.1 or if it's a long standing
issue.

- Patrick

On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 Woah, weird, but definitely good to know.

 If you’re doing Spark development, there’s also a more convenient option 
 added by Shivaram in the master branch. You can do sbt assemble-deps to 
 package *just* the dependencies of each project in a special assembly JAR, 
 and then use sbt compile to update the code. This will use the classes 
 directly out of the target/scala-2.9.3/classes directories. You have to redo 
 assemble-deps only if your external dependencies change.
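
 As a minimal sketch of that workflow (assuming the sbt/sbt launcher script
 used elsewhere in this thread):

   sbt/sbt assemble-deps   # one-time: build the dependency-only assembly jars
   sbt/sbt compile         # after editing Spark code: rebuild just the Spark classes
   # re-run assemble-deps only when external dependencies change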

 Matei

 On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com wrote:

 I hope this PR https://github.com/apache/incubator-spark/pull/252 can help.
 Again this is not a blocker for the release from my side either.


 On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com wrote:

 Interesting, and confirmed: On my machine where `./sbt/sbt assembly` takes
 a long, long, long time to complete (a MBP, in my case), building three
 separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt
 examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less time.



 On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma scrapco...@gmail.com
 wrote:

 forgot to mention, after running sbt/sbt assembly/assembly running
 sbt/sbt
 examples/assembly takes just 37s. Not to mention my hardware is not
 really
 great.


 On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma scrapco...@gmail.com
 wrote:

 Hi Patrick and Matei,

 Was trying out this and followed the quick start guide which says do
 sbt/sbt assembly, like few others I was also stuck for few minutes on
 linux. On the other hand if I use sbt/sbt assembly/assembly it is much
 faster.

 Should we change the documentation to reflect this. It will not be
 great
 for first time users to get stuck there.


 On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia 
 matei.zaha...@gmail.com
 wrote:

 +1

 Built and tested it on Mac OS X.

 Matei


 On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.1.

 The tag to be voted on is v0.8.1-incubating (commit b87d31d):



 https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e

 The release files, including signatures, digests, etc can be found
 at:
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-040/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/

 For information about the contents of this release see:



 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8

 Please vote on releasing this package as Apache Spark
 0.8.1-incubating!

 The vote is open until Saturday, December 14th at 01:00 UTC and
 passes if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.8.1-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/




 --
 s




 --
 s





 --
 s



Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-11 Thread Patrick Wendell
Hey Tom,

I re-verified the signatures and got someone else to do it. It seemed
fine. Here is what I did.

gpg --recv-key 9E4FE3AF
wget 
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz.asc
wget 
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz
gpg --verify spark-0.8.1-incubating.tgz.asc spark-0.8.1-incubating.tgz
gpg: Signature made Tue 10 Dec 2013 02:53:15 PM PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com

On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote:
 I don't know how to make sense of the numbers, but here's what I've got
 from a very small sample size.

 For both v0.8.0-incubating and v0.8.1-incubating, building separate
 assemblies is faster than `./sbt/sbt assembly` and the times for building
 separate assemblies for 0.8.0 and 0.8.1 are about the same.

 For v0.8.0-incubating, `./sbt/sbt assembly` takes about 2.5x as long as the
 sum of the separate assemblies.
 For v0.8.1-incubating, `./sbt/sbt assembly` takes almost 8x as long as the
 sum of the separate assemblies.

 Weird.


 On Wed, Dec 11, 2013 at 11:49 AM, Patrick Wendell pwend...@gmail.com wrote:

 I'll +1 myself also.

 For anyone who has the slow build problem: does this issue happen when
 building v0.8.0-incubating also? Trying to figure out whether it's
 related to something we added in 0.8.1 or if it's a long standing
 issue.

 - Patrick

 On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Woah, weird, but definitely good to know.
 
  If you’re doing Spark development, there’s also a more convenient option
 added by Shivaram in the master branch. You can do sbt assemble-deps to
 package *just* the dependencies of each project in a special assembly JAR,
 and then use sbt compile to update the code. This will use the classes
 directly out of the target/scala-2.9.3/classes directories. You have to
 redo assemble-deps only if your external dependencies change.
 
  Matei
 
  On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com
 wrote:
 
  I hope this PR https://github.com/apache/incubator-spark/pull/252 can
 help.
  Again this is not a blocker for the release from my side either.
 
 
  On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
 
  Interesting, and confirmed: On my machine where `./sbt/sbt assembly`
 takes
  a long, long, long time to complete (a MBP, in my case), building
 three
  separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt
  examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less
 time.
 
 
 
  On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma 
 scrapco...@gmail.com
  wrote:
 
  forgot to mention, after running sbt/sbt assembly/assembly running
  sbt/sbt
  examples/assembly takes just 37s. Not to mention my hardware is not
  really
  great.
 
 
  On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma 
 scrapco...@gmail.com
  wrote:
 
  Hi Patrick and Matei,
 
  Was trying out this and followed the quick start guide which says do
  sbt/sbt assembly, like few others I was also stuck for few minutes on
  linux. On the other hand if I use sbt/sbt assembly/assembly it is
 much
  faster.
 
  Should we change the documentation to reflect this. It will not be
  great
  for first time users to get stuck there.
 
 
  On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia 
  matei.zaha...@gmail.com
  wrote:
 
  +1
 
  Built and tested it on Mac OS X.
 
  Matei
 
 
  On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.8.1.
 
  The tag to be voted on is v0.8.1-incubating (commit b87d31d):
 
 
 
 
 https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e
 
  The release files, including signatures, digests, etc can be found
  at:
  http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
 
 
 https://repository.apache.org/content/repositories/orgapachespark-040/
 
  The documentation corresponding to this release can be found at:
 
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/
 
  For information about the contents of this release see:
 
 
 
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8
 
  Please vote on releasing this package as Apache Spark
  0.8.1-incubating!
 
  The vote is open until Saturday, December 14th at 01:00 UTC and
  passes if a majority of at least 3 +1 PPMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.8.1-incubating
  [ ] -1 Do not release this package because ...
 
  To learn more

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-11 Thread Patrick Wendell
I also talked to a few people who got corrupted binaries when
downloading from people.apache.org over HTTP. In that case the checksum
failed, but when they re-downloaded it worked. So maybe just re-download
and try again?

On Wed, Dec 11, 2013 at 3:15 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Tom,

 I re-verified the signatures and got someone else to do it. It seemed
 fine. Here is what I did.

 gpg --recv-key 9E4FE3AF
 wget 
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz.asc
 wget 
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz
 gpg --verify spark-0.8.1-incubating.tgz.asc spark-0.8.1-incubating.tgz
 gpg: Signature made Tue 10 Dec 2013 02:53:15 PM PST using RSA key ID 9E4FE3AF
 gpg: Good signature from Patrick Wendell pwend...@gmail.com

 On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote:
 I don't know how to make sense of the numbers, but here's what I've got
 from a very small sample size.

 For both v0.8.0-incubating and v0.8.1-incubating, building separate
 assemblies is faster than `./sbt/sbt assembly` and the times for building
 separate assemblies for 0.8.0 and 0.8.1 are about the same.

 For v0.8.0-incubating, `./sbt/sbt assembly` takes about 2.5x as long as the
 sum of the separate assemblies.
 For v0.8.1-incubating, `./sbt/sbt assembly` takes almost 8x as long as the
 sum of the separate assemblies.

 Weird.


 On Wed, Dec 11, 2013 at 11:49 AM, Patrick Wendell pwend...@gmail.com wrote:

 I'll +1 myself also.

 For anyone who has the slow build problem: does this issue happen when
 building v0.8.0-incubating also? Trying to figure out whether it's
 related to something we added in 0.8.1 or if it's a long standing
 issue.

 - Patrick

 On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Woah, weird, but definitely good to know.
 
  If you’re doing Spark development, there’s also a more convenient option
 added by Shivaram in the master branch. You can do sbt assemble-deps to
 package *just* the dependencies of each project in a special assembly JAR,
 and then use sbt compile to update the code. This will use the classes
 directly out of the target/scala-2.9.3/classes directories. You have to
 redo assemble-deps only if your external dependencies change.
 
  Matei
 
  On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com
 wrote:
 
  I hope this PR https://github.com/apache/incubator-spark/pull/252 can
 help.
  Again this is not a blocker for the release from my side either.
 
 
  On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
 
  Interesting, and confirmed: On my machine where `./sbt/sbt assembly`
 takes
  a long, long, long time to complete (a MBP, in my case), building
 three
  separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt
  examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less
 time.
 
 
 
  On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma 
 scrapco...@gmail.com
  wrote:
 
  forgot to mention, after running sbt/sbt assembly/assembly running
  sbt/sbt
  examples/assembly takes just 37s. Not to mention my hardware is not
  really
  great.
 
 
  On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma 
 scrapco...@gmail.com
  wrote:
 
  Hi Patrick and Matei,
 
  Was trying out this and followed the quick start guide which says do
  sbt/sbt assembly, like few others I was also stuck for few minutes on
  linux. On the other hand if I use sbt/sbt assembly/assembly it is
 much
  faster.
 
  Should we change the documentation to reflect this. It will not be
  great
  for first time users to get stuck there.
 
 
  On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia 
  matei.zaha...@gmail.com
  wrote:
 
  +1
 
  Built and tested it on Mac OS X.
 
  Matei
 
 
  On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.8.1.
 
  The tag to be voted on is v0.8.1-incubating (commit b87d31d):
 
 
 
 
 https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e
 
  The release files, including signatures, digests, etc can be found
  at:
  http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
 
 
 https://repository.apache.org/content/repositories/orgapachespark-040/
 
  The documentation corresponding to this release can be found at:
 
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/
 
  For information about the contents of this release see:
 
 
 
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8
 
  Please vote on releasing

[VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.8.1.

The tag to be voted on is v0.8.1-incubating (commit bf23794a):
https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-024/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/

For information about the contents of this release see:
attached draft of release notes
attached draft of release credits
https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt

Please vote on releasing this package as Apache Spark 0.8.1-incubating!

The vote is open until Wednesday, December 11th at 21:00 UTC and
passes if a majority of at least 3 +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Spark 0.8.1-incubating
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.incubator.apache.org/
Michael Armbrust -- build fix

Pierre Borckmans -- typo fix in documentation

Evan Chan -- added `local://` scheme for dependency jars

Ewen Cheslack-Postava -- `add` method for python accumulators, support for 
setting config properties in python

Mosharaf Chowdhury -- optimized broadcast implementation

Frank Dai -- documentation fix

Aaron Davidson -- lead on shuffle file consolidation, lead on h/a mode for 
standalone scheduler, cleaned up representation of block id’s, several small 
improvements and bug fixes

Tathagata Das -- new streaming operators: `transformWith`, `leftInnerJoin`, and 
`rightOuterJoin`, fix for kafka concurrency bug

Ankur Dave -- support for pausing spot clusters on EC2

Harvey Feng -- optimization to JobConf broadcasts, minor fixes, lead on YARN 
2.2 build

Ali Ghodsi -- scheduler support for SIMR, lead on YARN 2.2 build

Thomas Graves -- lead on Spark YARN integration including secure HDFS access 
over YARN

Li Guoqiang -- fix for maven build

Stephen Haberman -- bug fix

Haidar Hadi -- documentation fix

Nathan Howell -- bug fix relating to YARN

Holden Karau -- java version of `mapPartitionsWithIndex`

Du Li -- bug fix in make-distribution.sh

Xi Lui -- bug fix and code clean-up

David McCauley -- bug fix in standalone mode JSON output

Michael (wannabeast) -- bug fix in memory store

Fabrizio Milo -- typos in documentation, minor clean-up in DAGScheduler, typo 
in scaladoc

Mridul Muralidharan -- fixes to meta-data cleaner and speculative scheduler

Sundeep Narravula -- build fix, bug fixes in scheduler and tests, minor code 
clean-up

Kay Ousterhout -- optimization to task result fetching, extensive code clean-up 
and refactoring (task schedulers, thread pools), result-fetching state in UI, 
showing task and attempt id in the UI, several bug fixes in scheduler, UI, and unit 
tests

Nick Pentreath -- implicit feedback variant of ALS algorithm

Imran Rashid -- small improvement to executor launch

Ahir Reddy -- spark support for SIMR

Josh Rosen -- reduced memory overhead for BlockInfo objects, clean up of 
BlockManager code, fix to java API auditor, code clean-up in java API, and bug 
fixes in python API

Henry Saputra -- build fix

Jerry Shao -- refactoring of fair scheduler, support for running spark as a 
specific user, bug fix

Mingfei Shi -- documentation for JobLogger

Andre Schumacher -- sortByKey in pyspark and associated changes

Karthik Tunga -- bug fix in launch script

Patrick Wendell -- added `repartition` operator, logging improvements, 
instrumentation for shuffle write, documentation improvements, fix for 
streaming example, and release management

Neal Wiggins -- minor import clean-up, documentation typo

Andrew Xia -- bug fix in UI

Reynold Xin -- optimized hash set and hash tables for primitive types, task 
killing, support for setting job properties in repl, logging improvements, Kryo 
improvements, several bug fixes, and general clean-up

Matei Zaharia -- optimized hashmap for shuffle data, pyspark documentation, 
optimizations to kryo and chill serializers

Wu Zeming -- bug fix in executors UI
DRAFT OF RELEASE NOTES FOR SPARK 0.8.1

Apache Spark 0.8.1 is a maintenance release including several bug fixes and 
performance optimizations. It also includes a few new features. Contributions 
to 0.8.1 came from 40 developers.

== High availability mode for standalone scheduler ==
The standalone scheduler now has a High Availability (H/A) mode which can 
tolerate master failures. This is particularly useful for long-running 
applications such as streaming jobs and the Shark server, where the scheduler 
master previously represented a single point of failure.

Re: [DISCUSS] About the [VOTE] Release Apache Spark 0.8.1-incubating (rc1)

2013-12-08 Thread Patrick Wendell
Hey Mark,

One constructive action you and other people can take to help us
assess the quality and completeness of this release is to download the
release, run the tests, run the release in your dev environment, read
through the documentation, etc. This is one of the main points of
releasing an RC to the community... even if you disagree with some
patches that were merged in, this is still a way you can help validate
the release.

- Patrick

On Sun, Dec 8, 2013 at 1:30 PM, Mark Hamstra m...@clearstorydata.com wrote:
 I'm aware of the changes file, but it really doesn't address the issue that
 I am raising.  The changes file just tells me what has gone into the
 release candidate.  In general, it doesn't tell me why those changes went
 in or provide any rationale by which to judge whether that is the complete
 set of changes that should go in.

 I talked some with Matei about related versioning and release issues last
 week, and I've raised them in other contexts previously, but I'm taking the
 liberty to annoy people again because I really am not happy with our
 current versioning and release process, and I really am of the opinion that
 we've got to start doing much better before I can vote in favor of a 1.0
 release.  I fully realize that this is not a 1.0 release, and that because
 we are pre-1.0 we still have a lot of flexibility with releases that break
 backward or forward compatibility and with version numbers that have
 nothing like the semantic meaning that they will eventually need to have;
 but it is not going to be easy to change our process and culture so that we
 produce the kind of stability and reliability that Spark users need to be
 able to depend upon and version numbers that clearly communicate what those
 users expect them to mean.  I think that we should start making those
 changes now.  Just because we have flexibility pre-1.0, that doesn't mean
 that we shouldn't start training ourselves now to work within the
 constraints of post-1.0 Spark.  If I'm to be happy voting for an eventual
 1.0 release candidate, I'll need to have seen at least one full development
 cycle that already adheres to the post-1.0 constraints, demonstrating the
 maturity of our development process.

 That demonstration cycle is clearly not this one -- and I understand that
 there were some compelling reasons (particularly with regard to getting a
 full release of Spark based on Scala 2.9.3 before we make the jump to
 2.10).  This patch-level release breaks binary compatibility and contains
 a lot of code that isn't anywhere close to meeting the criterion for
 inclusion in a real, post-1.0 patch-level release: essentially changes
 that every, or nearly every, existing Spark user needs (not just wants),
 and that work with all existing and future binaries built with the prior
 patch-level version of Spark as a dependency.  Like I said, we are clearly
 nowhere close to that with the move from 0.8.0 to 0.8.1; but I also haven't
 been able to recognize any alternative criterion by which to judge the
 quality and completeness of this release candidate.

 Maybe there just isn't one, and I'm just going to have to swallow my
 concerns while watching 0.8.1 go out the door; but if we don't start doing
 better on this kind of thing in the future, you are going to start hearing
 more complaining from me. I just hope that it doesn't get to the point
 where I feel compelled to actively oppose an eventual 1.0 release
 candidate.


 On Sun, Dec 8, 2013 at 12:37 PM, Henry Saputra henry.sapu...@gmail.com wrote:

 Ah, sorry for the confusion, Patrick; like you said, I was just trying to
 make people aware of this file and its purpose.

 On Sunday, December 8, 2013, Patrick Wendell wrote:

  Hey Henry,
 
  Are you suggesting we need to change something about or changes file?
  Or are you just pointing people to the file?
 
  - Patrick
 
  On Sun, Dec 8, 2013 at 11:37 AM, Henry Saputra henry.sapu...@gmail.com
  wrote:
   HI Spark devs,
  
   I have modified the Subject to avoid polluting the VOTE thread since
   it related to more info how and which commits merge back to 0.8.*
   branch.
   Please respond to the previous question to this thread.
  
   Technically the CHANGES.txt [1] file should describe the changes in a
   particular release and it is the main requirement needed to cut an ASF
   release.
  
  
   - Henry
  
   [1]
  https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt
  
   On Sun, Dec 8, 2013 at 12:03 AM, Josh Rosen rosenvi...@gmail.com
  wrote:
   We can use git log to figure out which changes haven't made it into
   branch-0.8.  Here's a quick attempt, which only lists pull requests
 that
   were only merged into one of the branches.  For completeness, this
  could be
   extended to find commits that weren't part of a merge and are only
  present
   in one branch.
  
   *Script:*
  
   MASTER_BRANCH=origin/master
   RELEASE_BRANCH=origin/branch-0.8
  
   git log --oneline --grep Merge pull

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
Hey Taka,

Could you start a separate thread to debug your build issue? In that
thread, could you paste the exact build command and entire output? The log
you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0
based on the assembly file name it is logging.

---
sent from my phone
On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:

 With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble
 completing the build process (sbt/sbt assembly) on Macbook. The sbt command
 hangs at the last step.

 ...
 ...
 [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395
 [info] Packaging

 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar
 ...
 [info] SHA-1: 0657a347240266230247693f265a5797d40c326a
 [info] Packaging

 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar
 ...
 (hangs here)
 --


 On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was
 able to build it successfully.
 ..
 ..
 [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534
 [info] Packaging

 /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
 ...
 [info] Done packaging.
 [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM
 --



 On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com
 wrote:

  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.8.1.
 
  The tag to be voted on is v0.8.1-incubating (commit bf23794a):
 
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203
 
  The release files, including signatures, digests, etc can be found at:
  http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-024/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/
 
  For information about the contents of this release see:
  attached draft of release notes
  attached draft of release credits
  https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt
 
  Please vote on releasing this package as Apache Spark 0.8.1-incubating!
 
  The vote is open until Wednesday, December 11th at 21:00 UTC and
  passes if a majority of at least 3 +1 PPMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.8.1-incubating
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.incubator.apache.org/
 



Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
For my own part I'll give a +1 to this RC.

On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:
 OK. I will post the entire output via separate email. I just upgraded
 Hadoop to 2.2.0 recently. So there might be something I need to
 remove/clean up.


 On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Take,

 Could you start a separate thread to debug your build issue? In that
 thread, could you paste the exact build command and entire output? The log
 you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0
 based on the assembly file name it is logging.

 ---
 sent from my phone
 On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:

  With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble
  completing the build process (sbt/sbt assembly) on Macbook. The sbt
 command
  hangs at the last step.
 
  ...
  ...
  [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395
  [info] Packaging
 
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar
  ...
  [info] SHA-1: 0657a347240266230247693f265a5797d40c326a
  [info] Packaging
 
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar
  ...
  (hangs here)
  --
 
 
  On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was
  able to build it successfully.
  ..
  ..
  [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534
  [info] Packaging
 
 
 /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
  ...
  [info] Done packaging.
  [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM
  --
 
 
 
  On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   Please vote on releasing the following candidate as Apache Spark
   (incubating) version 0.8.1.
  
   The tag to be voted on is v0.8.1-incubating (commit bf23794a):
  
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203
  
   The release files, including signatures, digests, etc can be found at:
   http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
   https://repository.apache.org/content/repositories/orgapachespark-024/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/
  
   For information about the contents of this release see:
   attached draft of release notes
   attached draft of release credits
   https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt
  
   Please vote on releasing this package as Apache Spark 0.8.1-incubating!
  
   The vote is open until Wednesday, December 11th at 21:00 UTC and
   passes if a majority of at least 3 +1 PPMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 0.8.1-incubating
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.incubator.apache.org/
  
 



Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
Hey Mark - ya this would be good to get in.

Does merging that particular PR put this in sufficient shape for the
0.8.1 release or are there other open patches we need to look at?

- Patrick

On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com wrote:
 SPARK-962 should be resolved before release.  See also:
 https://github.com/apache/incubator-spark/pull/195

 With the references to the way I changed Debian packaging for ClearStory,
 we should be at least 90% of the way toward doing it right for Apache.


 On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com wrote:

 For my own part I'll give a +1 to this RC.

 On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com
 wrote:
  OK. I will post the entire output via separate email. I just upgraded
  Hadoop to 2.2.0 recently. So there might be something I need to
  remove/clean up.
 
 
  On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hey Take,
 
  Could you start a separate thread to debug your build issue? In that
  thread, could you paste the exact build command and entire output? The
 log
  you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0
  based on the assembly file name it is logging.
 
  ---
  sent from my phone
  On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com
 wrote:
 
   With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble
   completing the build process (sbt/sbt assembly) on Macbook. The sbt
  command
   hangs at the last step.
  
   ...
   ...
   [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395
   [info] Packaging
  
  
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar
   ...
   [info] SHA-1: 0657a347240266230247693f265a5797d40c326a
   [info] Packaging
  
  
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar
   ...
   (hangs here)
   --
  
  
   On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I
 was
   able to build it successfully.
   ..
   ..
   [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534
   [info] Packaging
  
  
 
 /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
   ...
   [info] Done packaging.
   [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM
   --
  
  
  
   On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com
   wrote:
  
Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.8.1.
   
The tag to be voted on is v0.8.1-incubating (commit bf23794a):
   
   
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203
   
The release files, including signatures, digests, etc can be found
 at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
   
The staging repository for this release can be found at:
   
 https://repository.apache.org/content/repositories/orgapachespark-024/
   
The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/
   
For information about the contents of this release see:
attached draft of release notes
attached draft of release credits
   
 https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt
   
Please vote on releasing this package as Apache Spark
 0.8.1-incubating!
   
The vote is open until Wednesday, December 11th at 21:00 UTC and
passes if a majority of at least 3 +1 PPMC votes are cast.
   
[ ] +1 Release this package as Apache Spark 0.8.1-incubating
[ ] -1 Do not release this package because ...
   
To learn more about Apache Spark, please see
http://spark.incubator.apache.org/
   
  
 



Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
Looked into this a bit more - I think removing repl-bin is something
we should wait until 0.9 to do, because we've published it to maven in
0.8.0 and people might expect it to be there in 0.8.1.

Merging the directly referenced pull request (195) seems like a good
idea though since it fixes a bug in the script.

Is that what you are suggesting?

- Patrick

On Sun, Dec 8, 2013 at 7:30 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Mark - ya this would be good to get in.

 Does merging that particular PR put this in sufficient shape for the
 0.8.1 release or are there other open patches we need to look at?

 - Patrick

 On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com wrote:
 SPARK-962 should be resolved before release.  See also:
 https://github.com/apache/incubator-spark/pull/195

 With the references to the way I changed Debian packaging for ClearStory,
 we should be at least 90% of the way toward doing it right for Apache.


 On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com wrote:

 For my own part I'll give a +1 to this RC.

 On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com
 wrote:
  OK. I will post the entire output via separate email. I just upgraded
  Hadoop to 2.2.0 recently. So there might be something I need to
  remove/clean up.
 
 
  On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hey Take,
 
  Could you start a separate thread to debug your build issue? In that
  thread, could you paste the exact build command and entire output? The
 log
  you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0
  based on the assembly file name it is logging.
 
  ---
  sent from my phone
  On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com
 wrote:
 
   With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble
   completing the build process (sbt/sbt assembly) on Macbook. The sbt
  command
   hangs at the last step.
  
   ...
   ...
   [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395
   [info] Packaging
  
  
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar
   ...
   [info] SHA-1: 0657a347240266230247693f265a5797d40c326a
   [info] Packaging
  
  
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar
   ...
   (hangs here)
   --
  
  
   On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I
 was
   able to build it successfully.
   ..
   ..
   [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534
   [info] Packaging
  
  
 
 /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
   ...
   [info] Done packaging.
   [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM
   --
  
  
  
   On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com
   wrote:
  
Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.8.1.
   
The tag to be voted on is v0.8.1-incubating (commit bf23794a):
   
   
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203
   
The release files, including signatures, digests, etc can be found
 at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
   
The staging repository for this release can be found at:
   
 https://repository.apache.org/content/repositories/orgapachespark-024/
   
The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/
   
For information about the contents of this release see:
attached draft of release notes
attached draft of release credits
   
 https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt
   
Please vote on releasing this package as Apache Spark
 0.8.1-incubating!
   
The vote is open until Wednesday, December 11th at 21:00 UTC and
passes if a majority of at least 3 +1 PPMC votes are cast.
   
[ ] +1 Release this package as Apache Spark 0.8.1-incubating
[ ] -1 Do not release this package because ...
   
To learn more about Apache Spark, please see
http://spark.incubator.apache.org/
   
  
 



Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
Hey Mark,

What I'm asking is whether this patch is sufficient to have a working
debian build in 0.8.1, or are there other outstanding issues to make
it work? By working I mean that, within the initial design that was
contributed (with repl-bin), it works according to that approach.

We can redesign this packaging in 0.9. That will require having a PR
against Apache Spark, discussing, etc. But it doesn't need to be on
the critical path for this release.

- Patrick

On Sun, Dec 8, 2013 at 7:54 PM, Mark Hamstra m...@clearstorydata.com wrote:
 Whatever Debian package gets built has to work, so that's the first
 requirement.  I don't know how to decide whether a change is acceptable in
 0.8 or has to wait until 0.9, but the 0.9 packaging should definitely
 leverage the assembly sub-project, making repl-bin unnecessary.


 On Sun, Dec 8, 2013 at 7:46 PM, Patrick Wendell pwend...@gmail.com wrote:

 Looked into this a bit more - I think removing repl-bin is something
 we should wait until 0.9 to do, because we've published it to maven in
 0.8.0 and people might expect it to be there in 0.8.1.

 Merging the directly referenced pull request (195) seems like a good
 idea though since it fixes a bug in the script.

 Is that what you are suggesting?

 - Patrick

 On Sun, Dec 8, 2013 at 7:30 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Hey Mark - ya this would be good to get in.
 
  Does merging that particular PR put this in sufficient shape for the
  0.8.1 release or are there other open patches we need to look at?
 
  - Patrick
 
  On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
  SPARK-962 should be resolved before release.  See also:
  https://github.com/apache/incubator-spark/pull/195
 
  With the references to the way I changed Debian packaging for
 ClearStory,
  we should be at least 90% of the way toward doing it right for Apache.
 
 
  On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  For my own part I'll give a +1 to this RC.
 
  On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com
 
  wrote:
   OK. I will post the entire output via separate email. I just upgraded
   Hadoop to 2.2.0 recently. So there might be something I need to
   remove/clean up.
  
  
   On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  
   Hey Take,
  
   Could you start a separate thread to debug your build issue? In that
   thread, could you paste the exact build command and entire output?
 The
  log
   you posted here suggests the first build detected hadoop 1.0.4 not
 2.2.0
   based on the assembly file name it is logging.
  
   ---
   sent from my phone
   On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com
  wrote:
  
With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble
completing the build process (sbt/sbt assembly) on Macbook. The
 sbt
   command
hangs at the last step.
   
...
...
[info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395
[info] Packaging
   
   
  
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar
...
[info] SHA-1: 0657a347240266230247693f265a5797d40c326a
[info] Packaging
   
   
  
 
 /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar
...
(hangs here)
--
   
   
On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed,
 I
  was
able to build it successfully.
..
..
[info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534
[info] Packaging
   
   
  
 
 /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
...
[info] Done packaging.
[success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM
--
   
   
   
On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell 
 pwend...@gmail.com
wrote:
   
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.1.

 The tag to be voted on is v0.8.1-incubating (commit bf23794a):


   
  
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203

 The release files, including signatures, digests, etc can be
 found
  at:
 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

  https://repository.apache.org/content/repositories/orgapachespark-024/

 The documentation corresponding to this release can be found at:

 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/

 For information about the contents

Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)

2013-12-08 Thread Patrick Wendell
Hey Mark,

Okay, if 195 gets this in working order in branch-0.8, let's just
merge that to keep it consistent with our docs and the way this is
done in 0.8.0.

We can do a broader refactoring in 0.9. Would be great if you could
kick off a JIRA discussion or submit a PR relating to that.

- Patrick

On Sun, Dec 8, 2013 at 8:07 PM, Mark Hamstra m...@clearstorydata.com wrote:
 Well, 195 is sufficient to give you something that runs, but it doesn't run
 the same way as Spark built/distributed by other means -- e.g., after 195
 the package still uses something equivalent to the old `run` script instead
 of the current `spark-class` way.


 On Sun, Dec 8, 2013 at 8:02 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Mark,

 What I'm asking is whether this patch is sufficient to have a working
 debian build in 0.8.1, or are there other outstanding issues to make
 it work? By working I mean, within the initial design that was
 contributed (with repl-bin) it works according to that approach.

 We can redesign this packaging in 0.9. That will require having a PR
 against Apache Spark, discussing, etc. But it doesn't need to be on
 the critical path for this release.

 - Patrick

 On Sun, Dec 8, 2013 at 7:54 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
  Whatever Debian package gets built has to work, so that's the first
  requirement.  I don't know how to decide whether a change is acceptable
 in
  0.8 or has to wait until 0.9, but the 0.9 packaging should definitely
  leverage the assembly sub-project, making repl-bin unnecessary.
 
 
  On Sun, Dec 8, 2013 at 7:46 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Looked into this a bit more - I think removing repl-bin is something
  we should wait until 0.9 to do, because we've published it to maven in
  0.8.0 and people might expect it to be there in 0.8.1.
 
  Merging the directly referenced pull request (195) seems like a good
  idea though since it fixes a bug in the script.
 
  Is that what you are suggesting?
 
  - Patrick
 
  On Sun, Dec 8, 2013 at 7:30 PM, Patrick Wendell pwend...@gmail.com
  wrote:
   Hey Mark - ya this would be good to get in.
  
   Does merging that particular PR put this in sufficient shape for the
   0.8.1 release or are there other open patches we need to look at?
  
   - Patrick
  
   On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com
 
  wrote:
   SPARK-962 should be resolved before release.  See also:
   https://github.com/apache/incubator-spark/pull/195
  
    With the references to the way I changed Debian packaging for ClearStory,
    we should be at least 90% of the way toward doing it right for Apache.
  
  
   On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  
   For my own part I'll give a +1 to this RC.
  
    On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:
     OK. I will post the entire output via separate email. I just upgraded
     Hadoop to 2.2.0 recently. So there might be something I need to
     remove/clean up.
   
   
    On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote:

     Hey Taka,

     Could you start a separate thread to debug your build issue? In that
     thread, could you paste the exact build command and entire output? The
     log you posted here suggests the first build detected hadoop 1.0.4 not
     2.2.0 based on the assembly file name it is logging.

     ---
     sent from my phone
     On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:
   
 With Hadoop 2.2.0 (& Java 1.7.0_45) installed, I'm having trouble
 completing the build process (sbt/sbt assembly) on Macbook. The sbt
 command hangs at the last step.

 ...
 ...
 [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395
 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar
 ...
 [info] SHA-1: 0657a347240266230247693f265a5797d40c326a
 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar
 ...
 (hangs here)
 --


 On another Macbook with Hadoop 1.1.1 (& Java 1.7.0_45) installed, I was
 able to build it successfully.
 ..
 ..
 [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534
 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
 ...
 [info] Done packaging.
 [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM
 --



 On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote:

  Please vote on releasing the following candidate

Re: difference between 'fetchWaitTime' and 'remoteFetchTime'

2013-11-25 Thread Patrick Wendell
Hey Umar,

I dug into this a bit today out of curiosity since I also wasn't sure.

I updated the in-line documentation here:
https://github.com/apache/incubator-spark/pull/209/files

The more important metric is `fetchWaitTime` which indicates how much
of the task runtime was spent waiting for input data.

remoteFetchTime is an aggregation of all of the fetch delays for each
block... this second metric is a bit more convoluted because those
fetches can actually overlap, so if this is high it doesn't
necessarily indicate any latency hit.

- Patrick
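
A toy sketch of why the two numbers can diverge (not Spark's actual accounting; the durations below are made up purely for illustration):

  // Two hypothetical block fetches that overlap between t=50ms and t=100ms.
  case class Fetch(startMs: Long, endMs: Long)
  val fetches = Seq(Fetch(0, 100), Fetch(50, 150))

  // remoteFetchTime-style sum: each fetch's duration is added, so overlap is double-counted.
  val summedFetchTime = fetches.map(f => f.endMs - f.startMs).sum               // 200 ms

  // fetchWaitTime-style wall clock: in this toy case the task could only have been
  // blocked for 150 ms in total, however the fetches interleaved.
  val blockedWallClock = fetches.map(_.endMs).max - fetches.map(_.startMs).min  // 150 ms

  // summedFetchTime exceeds blockedWallClock purely because the fetches overlapped,
  // which is why a large remoteFetchTime does not necessarily indicate a latency hit.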

On Mon, Nov 25, 2013 at 1:23 PM, Umar Javed umarj.ja...@gmail.com wrote:
 Any clarification on this? thanks.


 On Wed, Nov 20, 2013 at 3:02 PM, Umar Javed umarj.ja...@gmail.com wrote:

 In the class ShuffleReadMetrics in executor/TaskMetrics.scala, there are
 two variables:

 1) fetchWaitTime:

 /**
  * Total time that is spent blocked waiting for shuffle to fetch data
  */

 2) remoteFetchTime:

 /**
  * The total amount of time for all the shuffle fetches.  This adds up time
  * from overlapping shuffles, so can be longer than task time
  */

 As I understand it, the difference between these two is that fetchWaitTime
 is remoteFetchTime with any overlapping time counted only once. Is
 that right? Can somebody explain the difference better?

 thanks!



Re: Documenting the release process for Apache Spark

2013-11-08 Thread Patrick Wendell
Hey Henry,

I did create release notes for this. However, I wanted to dogfood
them for the 0.8.1 release before I push them publicly, just so I know
the thing is actually comprehensive. It's quite complicated and I
don't want to publish something that leads people down the wrong path.

My thought was I would use these personally for the 0.8.1 release to
verify them, then publish them and try to have someone else do the
0.9.0 release (perhaps wishful thinking!).

- Patrick

On Thu, Nov 7, 2013 at 12:09 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Hi Patrick,

 Did you end up writing up the steps you were taking to generate the
 Apache Spark release to provide help to the next Apache Spark RE?

 I remember you were trying to create one after we released 0.8

 Thanks,

 - Henry


Re: Getting failures in FileServerSuite

2013-10-30 Thread Patrick Wendell
This may have been caused by a recent merge since a bunch of people
independently hit it in the last 48 hours.

One debugging step would be to narrow it down to which merge caused
it. I don't have time personally today, but just a suggestion for ppl
for whom this is blocking progress.

- Patrick

On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra m...@clearstorydata.com wrote:
 What JDK version are you using, Evan?

 I tried to reproduce your problem earlier today, but I wasn't even able to
 get through the assembly build -- kept hanging when trying to build the
 examples assembly.  Foregoing the assembly and running the tests would hang
 on FileServerSuite "Dynamically adding JARS locally" -- no stack trace,
 just hung.  And I was actually seeing a very similar stack trace to yours
 from a test suite of our own running against 0.8.1-SNAPSHOT -- not exactly
 the same because line numbers were different once it went into the java
 runtime, and it eventually ended up someplace a little different.  That got
 me curious about differences in Java versions, so I updated to the latest
 Oracle release (1.7.0_45).  Now it cruises right through the build and test
 of Spark master from before Matei merged your PR.  Then I logged into a
 machine that has 1.7.0_15 (7u15-2.3.7-0ubuntu1~11.10.1, actually)
 installed, and I'm right back to the hanging during the examples assembly
 (but passes FileServerSuite, oddly enough.)  Upgrading the JDK didn't
 improve the results of the ClearStory test suite I was looking at, so my
 misery isn't over; but yours might be with a newer JDK.



 On Wed, Oct 30, 2013 at 12:44 PM, Evan Chan e...@ooyala.com wrote:

 Must be a local environment thing, because AmpLab Jenkins can't
 reproduce it. :-p

 On Wed, Oct 30, 2013 at 11:10 AM, Josh Rosen rosenvi...@gmail.com wrote:
  Someone on the users list also encountered this exception:
 
 
 https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E
 
 
  On Wed, Oct 30, 2013 at 9:40 AM, Evan Chan e...@ooyala.com wrote:
 
  I'm at the latest
 
  commit f0e23a023ce1356bc0f04248605c48d4d08c2d05
  Merge: aec9bf9 a197137
  Author: Reynold Xin r...@apache.org
  Date:   Tue Oct 29 01:41:44 2013 -0400
 
 
  and seeing this when I do a test-only FileServerSuite:
 
   13/10/30 09:35:04.300 INFO DAGScheduler: Completed ResultTask(0, 0)
   13/10/30 09:35:04.307 INFO LocalTaskSetManager: Loss was due to java.io.StreamCorruptedException
   java.io.StreamCorruptedException: invalid type code: AC
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
       at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101)
       at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
       at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
       at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
       at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
       at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53)
       at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)
       at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:94)
       at org.apache.spark.rdd.MapPartitionsWithContextRDD.compute(MapPartitionsWithContextRDD.scala:40)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
       at org.apache.spark.scheduler.Task.run(Task.scala:53)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:212)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
       at java.lang.Thread.run(Thread.java:680)
 
 
  Anybody else seen this yet?
 
  I have a really simple PR and this fails without my change, so I may
  go ahead and submit it anyways.
 
  --
  --
  Evan Chan
  Staff Engineer
  e...@ooyala.com  |
 



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |



Re: Are we moving too fast or too far on 0.8.1-SNAPSHOT?

2013-10-28 Thread Patrick Wendell
Shark is not a great example in general because it uses semi-private
internal interfaces that are not guaranteed to be compatible within
minor releases.

Spark's public, documented API has always (AFAIK) maintained
compatibility within minor versions. In fact, we've been diligent to
maintain compatibility with major versions as well and there have only
been very minute changes in that API.

Over time it would be good for Shark to migrate to using higher-level APIs
(and we may need to build these).

But my point is that the public API has maintained compatibility
consistent with the norms discussed here.

- Patrick

On Mon, Oct 28, 2013 at 3:50 PM, Jey Kottalam j...@cs.berkeley.edu wrote:
 I agree that we should strive to maintain full backward compatibility
 between patch releases (i.e. incrementing the z in version x.y.z).

 On Mon, Oct 28, 2013 at 3:22 PM, Mark Hamstra m...@clearstorydata.com wrote:
 Or more to the point: What is our commitment to backward compatibility in
 point releases?

 Many Java developers will come to a library or platform versioned as x.y.z
 with the expectation that if their own code worked well using x.y.(z-1) as
 a dependency, then moving up to x.y.z will be painless and trivial.  That
 is not looking like it will be the case for Spark 0.8.0 and 0.8.1.

 We only need to look at Shark as an example of code built with a dependency
 on Spark to see the problem.  Shark 0.8.0 works with Spark 0.8.0.  Shark
 0.8.0 does not build with Spark 0.8.1-SNAPSHOT.  Presumably that lack of
 backwards compatibility will continue into the eventual release of Spark
 0.8.1, and that makes life hard on developers using Spark and Shark.  For
 example, a developer using the released version of Shark but wanting to
 pick up the bug fixes in Spark doesn't have a good option anymore since
 0.8.1-SNAPSHOT (or the eventual 0.8.1 release) doesn't work, and moving to
 the wild and woolly development on the master branches of Spark and Shark
 is not a good idea for someone trying to develop production code.  In other
 words, all of the bug fixes in Spark 0.8.1 are not accessible to this
 developer until such time as there are available 0.8.1-compatible versions
 of Shark and anything else built on Spark that this developer is using.

 The only other option is trying to cherry-pick commits from, e.g., Shark
 0.9.0-SNAPSHOT into Shark 0.8.0 until Shark 0.8.0 has been brought up to a
 point where it works with Spark 0.8.1.  But an application developer
 shouldn't need to do that just to get the bug fixes in Spark 0.8.1, and it
 is not immediately obvious just which Shark commits are necessary and
 sufficient to produce a correct, Spark-0.8.1-compatible version of Shark
 (indeed, there is no guarantee that such a thing is even possible.)  Right
 now, I believe that 67626ae3eb6a23efc504edf5aedc417197f072cf,
 488930f5187264d094810f06f33b5b5a2fde230a and
 bae19222b3b221946ff870e0cee4dba0371dea04 are necessary to get Shark to work
 with Spark 0.8.1-SNAPSHOT, but that those commits are not sufficient (Shark
 builds against Spark 0.8.1-SNAPSHOT with those cherry-picks, but I'm still
 seeing runtime errors.)

 In short, this is not a good situation, and we probably need a real 0.8
 maintenance branch that maintains backward compatibility with 0.8.0,
 because (at least to me) the current branch-0.8 of Spark looks more like
 another active development branch (in addition to the master and scala-2.10
 branches) than it does a maintenance branch.


Re: Suggestion/Recommendation for language bindings

2013-10-15 Thread Patrick Wendell
I think Ruby integration via JRuby would be a great idea.

On Tue, Oct 15, 2013 at 9:45 AM, Ryan Weald r...@weald.com wrote:
 Writing a JRuby wrapper around the existing Java bindings would be pretty
 cool. Could help to get some of the Ruby community to start using the Spark
 platform.

 -Ryan


 On Mon, Oct 14, 2013 at 12:07 PM, Aaron Babcock aaron.babc...@gmail.com wrote:

 Hey Laksh,

 Not sure if you are interested in groovy at all, but I've got the
 beginning of a project here:
 https://github.com/bunions1/groovy-spark-example

 The idea is to map groovy idioms: myRdd.collect{ row -> newRow } to
 spark api calls myRdd.map( row => newRow ) and support a good repl.

 It's not officially related to spark at all and is very early stage but
 maybe it will be a point of reference for you.
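
 A minimal Scala sketch of the call such a wrapper would delegate to (the
 master string, app name, and values here are placeholders, not from the
 groovy-spark-example project):

   import org.apache.spark.SparkContext

   // Groovy's `collect { row -> newRow }` means "map"; on the Spark side that is RDD.map,
   // not RDD.collect (which is the action that brings results back to the driver).
   val sc = new SparkContext("local", "groovy-binding-sketch")
   val newRdd = sc.parallelize(Seq(1, 2, 3)).map(row => row * 2)  // Groovy: rdd.collect { row -> row * 2 }
   println(newRdd.collect().mkString(", "))                       // 2, 4, 6
   sc.stop()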






 On Mon, Oct 14, 2013 at 12:42 PM, Laksh Gupta glaks...@gmail.com wrote:
  Hi
 
  I am interested in contributing to the project and want to start with
  supporting a new programming language on Spark. I can see that Spark
  already supports Java and Python. Would someone provide me some
  suggestions/references to start with? I think this would be a great learning
  experience for me. Thank you in advance.
 
  --
  - Laksh Gupta



Re: Spark 0.8.0: bits need to come from ASF infrastructure

2013-09-25 Thread Patrick Wendell
Yep, we definitely need to just directly point people the location at
apache.org where they can find the hashes. I just updated the release
notes and downloads page to point to that site.

I just wanted to point out that mirroring these through a CDN seems
philosophically the same as mirroring through Apache, since in neither
case do we expect the users to trust the artifact they download. We
just need to be more explicit that we are, indeed, mirroring and
explain that the trusted root is at apache.org

- Patrick

On Wed, Sep 25, 2013 at 3:56 PM, Roman Shaposhnik r...@apache.org wrote:
 On Wed, Sep 25, 2013 at 3:48 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey we've actually distributed our artifacts through amazon cloudfront
 in the past (and that is where the website links redirect to).

 Since the apache mirrors don't distribute signatures anyways,

 True, but apache dist does. IOW, it is not uncommon for those
 with automated build/fetching systems to get bits from
 one of the mirrors and then get the hashes directly from dist.

 In your current case, I don't think I know of a way to do that.

 Now, you may say that the current CDN you guys are using
 is functioning like a mirror -- well, I'd say that it needs to be
 called out like one then.

 Otherwise, as a naive user I *really* have to guess where
 to get the hashes.

 what is the difference between linking to an apache mirror vs using a more
 robust CDN? If people want to verify the downloads they need to go to
 the apache root in either case.

 Is this just a cultural thing or is there some security reason?

 A bit of both I guess.

 Thanks,
 Roman.


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-20 Thread Patrick Wendell
Henry - one thing is that, because the filenames are not included in
the signatures, I could just alter the filenames now to not include
-RCX... would that be preferable or would that necessitate another
vote?

- Patrick

On Fri, Sep 20, 2013 at 6:39 AM, Henry Saputra henry.sapu...@gmail.com wrote:
 The RC should be just the directory where the artifacts live, but the final
 name should omit the RCxx.

 Hmm, not sure if IPMCs will be picky about this, but it should not be a blocker
 to the release.

 - Henry

 On Thu, Sep 19, 2013 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Roman,

 We can do this in the future - I wasn't sure exactly what the right
 standard approach was. Just so I understand, the change you are
 proposing from what is there now is just to remove rcX from the
 file-names, correct?

 - Patrick

 On Thu, Sep 19, 2013 at 8:06 PM, Roman Shaposhnik r...@apache.org wrote:
 On Thu, Sep 19, 2013 at 5:56 PM, Patrick Wendell pwend...@gmail.com wrote:
 FYI this vote ends in 8 hours.

 I was going to test it on a fully distributed Bigtop cluster, but hit
 a few snags. That now will extend into the weekend.

 Of course, that's not that big of a deal -- I can always vote on
 incubator general once you guys move the vote over there.

 The only minor nit for the future I've noticed is that I would
 highly encourage you to follow the usual RC practices, where
 you name all of your artifacts as final bits and have a subdirectory
 that reflects the RC name. E.g. here's what a very recent
 Hadoop RC looks like:
 http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0/

 Thanks,
 Roman.


[RESULT] [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-20 Thread Patrick Wendell
The vote is now closed. Below are the vote totals.

+1 (7 Total)
Andy Konwinski
Matei Zaharia
Patrick Wendell
Konstantin Boudnik
Reynold Xin
Chris Mattmann*
Henry Saputra*

0 (1 Total)
Mark Hamstra

-1 (0 Total)

* = Binding Vote

As per the incubator release guide [1] I'll be sending this to the
general incubator list for a final vote from IPMC members.

[1] 
http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote

- Patrick

-- Forwarded message --
From: Roman Shaposhnik r...@apache.org
Date: Fri, Sep 20, 2013 at 8:10 AM
Subject: Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)
To: dev@spark.incubator.apache.org


On Thu, Sep 19, 2013 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Roman,

 We can do this in the future - I wasn't sure exactly what the right
 standard approach was. Just so I understand, the change you are
 proposing from what is there now is just to remove rcX from the
 file-names, correct?

Right. Basically your artifacts should look exactly like what
is going to be released when the vote passes.

Like I said -- it is a small nit, but it makes it easier for the
guys like me to test the RCs in the automated manner.

Thanks,
Roman.


Re: [RESULT] [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-20 Thread Patrick Wendell
Hey Henry,

Sounds good. I'll send an email to general@ shortly. I didn't realize
that this vote technically counts as passing according to those rules
(since plenty of PPMC members gave +1).

On Fri, Sep 20, 2013 at 1:30 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Thanks to Patrick for driving the first Apache Spark release. Great job so 
 far.

 A bit of clarification: the release VOTE passes with more than 3 +1
 binding votes from the Apache Spark Podling Project Management Committee (PPMC):

 +1 (7 Total)
 Andy Konwinski
 Matei Zaharia
 Patrick Wendell
 Reynold Xin
 Chris Mattmann*
 Henry Saputra*

 (* indicates IPMC)

 Since Spark is under ASF incubator we need to send another VOTE to
 general@i.a.o list.

 From the ASF release management page:
 
 It is Apache policy that all releases be formally approved by the
 responsible PMC. In the case of the incubator, the IPMC must approve
 all releases. That means there is an additional bit of voting that the
 release manager must now oversee on general@incubator in order to gain
 that approval. The release manager must inform general@incubator that
 the vote has passed on the podling's development list, and should
 indicate any IPMC votes gained during that process. A new vote on the
 release candidate artifacts must now be held on general@incubator to
 seek majority consensus from the IPMC. Previous IPMC votes issued on
 the project's development list count towards that goal. Even if there
 are sufficient IPMC votes already, it is vital that the IPMC as a whole
 is informed via a VOTE e-mail on general@incubator.
 

 We have 2 IPMC votes already, so technically we need one more unless we
 get veto votes against the release.

 - Henry


 On Fri, Sep 20, 2013 at 11:43 AM, Patrick Wendell pwend...@gmail.com wrote:
 The vote is now closed. Below are the vote totals.

 +1 (7 Total)
 Andy Konwinski
 Matei Zaharia
 Patrick Wendell
 Konstantin Boudnik
 Reynold Xin
 Chris Mattmann*
 Henry Saputra*

 0 (1 Total)
 Mark Hamstra

 -1 (0 Total)

 * = Binding Vote

 As per the incubator release guide [1] I'll be sending this to the
 general incubator list for a final vote from IPMC members.

 [1] 
 http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote

 - Patrick

 -- Forwarded message --
 From: Roman Shaposhnik r...@apache.org
 Date: Fri, Sep 20, 2013 at 8:10 AM
 Subject: Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)
 To: dev@spark.incubator.apache.org


 On Thu, Sep 19, 2013 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Roman,

 We can do this in the future - I wasn't sure exactly what the right
 standard approach was. Just so I understand, the change you are
 proposing from what is there now is just to remove rcX from the
 file-names, correct?

 Right. Basically your artifacts should look exactly like what
 is going to be released when the vote passes.

 Like I said -- it is a small nit, but it makes it easier for the
 guys like me to test the RCs in the automated manner.

 Thanks,
 Roman.


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-19 Thread Patrick Wendell
Hey Chris, the tag in github is 3b85a85, which I listed in the original
vote next to the git URL. Is there another type of tag I should be
adding?

On Thu, Sep 19, 2013 at 7:20 PM, Chris Mattmann mattm...@apache.org wrote:
 I'm currently downloading the RC (all 127mb of the bin; then onto source).
 I have a generic set of Incubator scripts so should go fine after that.

 I'm giving you a preview of my minor nit:

 We don't VOTE on github URLs -- we VOTE on ASF URLs (e.g., the tag). That
 should be corrected in future RC emails. If all checks out, should be +1
 shortly.



 -Original Message-
 From: Patrick Wendell pwend...@gmail.com
 Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Date: Thursday, September 19, 2013 5:56 PM
 To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Subject: Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

FYI this vote ends in 8 hours.

On Wed, Sep 18, 2013 at 8:56 PM, Reynold Xin r...@cs.berkeley.edu wrote:
 +1


 --
 Reynold Xin, AMPLab, UC Berkeley
 http://rxin.org



 On Wed, Sep 18, 2013 at 11:06 AM, Konstantin Boudnik c...@apache.org
wrote:

 Maven package could be run with -DskipTests that will simply build...
well,
 the package.

 +1 on the RC. The nits are indeed minor.

   Cos

 On Tue, Sep 17, 2013 at 07:20PM, Matei Zaharia wrote:
   In Maven, mvn package should also create the assembly, but the non-obvious
   thing is that it needs to happen for all projects before mvn test for core
   works. Unfortunately I don't know any easy way around that.
 
  Matei
 
  On Sep 17, 2013, at 1:46 PM, Patrick Wendell pwend...@gmail.com
wrote:
 
   Hey Mark,
  
   Good catches here. Ya the driver suite thing is sorta annoying - we
   should try to fix that in master. The audit script I wrote first
does
   an sbt/sbt assembly to avoid this. I agree though these shouldn't
   block the release (if a blocker does come up we can revisit these
   potentially when cutting a release).
  
   - Patrick
  
   On Tue, Sep 17, 2013 at 1:26 PM, Mark Hamstra
m...@clearstorydata.com
 wrote:
    There are a few nits left to pick: 'sbt/sbt publish-local' isn't generating
    correct POM files because of the way the exclusions are defined in
    SparkBuild.scala using wildcards; looks like there may be some broken doc
    links generated in that task, as well; DriverSuite doesn't like to run from
    the maven build, complaining that 'sbt/sbt assembly' needs to be run first.
  
   None of these is enough for me to give RC6 a -1.
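
    For readers less familiar with the sbt side, a rough sketch of the difference
    between a wildcard exclusion and an explicit one (assumed coordinates and
    build.sbt-style syntax, not the project's actual SparkBuild.scala); the
    wildcard form does not translate cleanly into the groupId/artifactId
    exclusions of a generated POM, which is roughly the issue being described:

      // Wildcard-style exclusion: drop everything from an organization, regardless of module name.
      libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" excludeAll(
        ExclusionRule(organization = "org.codehaus.jackson"))

      // Explicit exclusion: names both organization and module, so it maps directly onto a
      // groupId/artifactId exclusion element when the POM is generated.
      libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" exclude(
        "org.codehaus.jackson", "jackson-core-asl")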
  
  
    On Tue, Sep 17, 2013 at 11:28 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
  
   +1
  
   Tried new staging repo to make sure the issue with RC5 is fixed.
  
   Matei
  
   On Sep 17, 2013, at 2:03 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  
   Please vote on releasing the following candidate as Apache Spark
   (incubating) version 0.8.0. This will be the first incubator
 release for
   Spark in Apache.
  
   The tag to be voted on is v0.8.0-incubating (commit 3b85a85):
  

https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
  
   The release files, including signatures, digests, etc can be
found
 at:
  
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/files/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
  
 https://repository.apache.org/content/repositories/orgapachespark-059/
  
   The documentation corresponding to this release can be found at:
  
http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/docs/
  
   Please vote on releasing this package as Apache Spark
 0.8.0-incubating!
  
   The vote is open until Friday, September 20th at 09:00 UTC and
 passes if
   a majority of at least 3 +1 IPMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 0.8.0-incubating
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.incubator.apache.org/
  
  
 





[VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-17 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.8.0. This will be the first incubator release for
Spark in Apache.

The tag to be voted on is v0.8.0-incubating (commit 3b85a85):
https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/files/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-059/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/docs/

Please vote on releasing this package as Apache Spark 0.8.0-incubating!

The vote is open until Friday, September 20th at 09:00 UTC and passes if
a majority of at least 3 +1 IPMC votes are cast.

[ ] +1 Release this package as Apache Spark 0.8.0-incubating
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.incubator.apache.org/


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-17 Thread Patrick Wendell
Thanks for the feedback guys. I've changed the audit script to fix
Andy's suggestion. I also added tests for building sbt and maven
projects against the staged repository to test that artifacts are
setup correctly in maven.

I've posted RC6 which adds a very small change to this RC. This vote
is therefore cancelled in favor of RC6.

- Patrick

On Mon, Sep 16, 2013 at 9:47 PM, Andy Konwinski andykonwin...@gmail.com wrote:
 Patrick, I took a quick look over your release_auditor.py script and it's
 really great!

 Then I ran it (had to add --keyserver pgp.mit.edu to the gpg command) and
 everything passed on OS X!

 Great job and +1 from me whenever you resolve the kafka jar issue you
 mentioned.

 Andy


 On Mon, Sep 16, 2013 at 8:37 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 FWIW, I tested it otherwise and it seems good modulo this issue.

 Matei

 On Sep 16, 2013, at 6:39 PM, Patrick Wendell pwend...@gmail.com wrote:

  Hey folks, just FYI we found one minor issue with this RC (the kafka
  jar in the stream pom needs to be published as provided since it's
  not available in maven). Please still continue to test this and
  provide feedback here until the following RC is posted later.
 
  - Patrick
 
  On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin r...@cs.berkeley.edu
 wrote:
  +1
 
 
  --
  Reynold Xin, AMPLab, UC Berkeley
  http://rxin.org
 
 
 
  On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  I also wrote an audit script [1] to verify various aspects of the
  release binaries and ran it on this RC. People are welcome to run this
  themselves, but I haven't tested it on other machines yet, and some of
  the Spark tests are very sensitive to the test environment :) Output
  is pasted below:
 
  [1]
 https://github.com/pwendell/spark-utils/blob/master/release_auditor.py
 
  -
   Verifying download integrity for artifact:
  spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
  [PASSED] Artifact signature verified.
  [PASSED] Artifact MD5 verified.
  [PASSED] Artifact SHA verified.
  [PASSED] Tarball contains CHANGES.txt file
  [PASSED] Tarball contains NOTICE file
  [PASSED] Tarball contains LICENSE file
  [PASSED] README file contains disclaimer
   Verifying download integrity for artifact:
  spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
  [PASSED] Artifact signature verified.
  [PASSED] Artifact MD5 verified.
  [PASSED] Artifact SHA verified.
  [PASSED] Tarball contains CHANGES.txt file
  [PASSED] Tarball contains NOTICE file
  [PASSED] Tarball contains LICENSE file
  [PASSED] README file contains disclaimer
   Verifying download integrity for artifact:
  spark-0.8.0-incubating-rc5.tgz 
  [PASSED] Artifact signature verified.
  [PASSED] Artifact MD5 verified.
  [PASSED] Artifact SHA verified.
  [PASSED] Tarball contains CHANGES.txt file
  [PASSED] Tarball contains NOTICE file
  [PASSED] Tarball contains LICENSE file
  [PASSED] README file contains disclaimer
   Verifying build and tests for artifact:
  spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
  == Running build
  [PASSED] sbt build successful
  [PASSED] Maven build successful
  == Performing unit tests
  [PASSED] Tests successful
   Verifying build and tests for artifact:
  spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
  == Running build
  [PASSED] sbt build successful
  [PASSED] Maven build successful
  == Performing unit tests
  [PASSED] Tests successful
   Verifying build and tests for artifact:
  spark-0.8.0-incubating-rc5.tgz 
  == Running build
  [PASSED] sbt build successful
  [PASSED] Maven build successful
  == Performing unit tests
  [PASSED] Tests successful
 
  - Patrick
 
  On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.8.0. This will be the first incubator release
 for
  Spark in Apache.
 
  The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
 
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
 
  The release files, including signatures, digests, etc can be found at:
  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
 
 
 https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/
 
  Please vote on releasing this package as Apache Spark
 0.8.0-incubating!
  The vote is open until Thursday, September 19th at 05:00 UTC and
 passes
  if
  a majority of at least 3 +1 IPMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.8.0-incubating
  [ ] -1 Do not release this package because ...
 
  To learn

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Patrick Wendell
I also wrote an audit script [1] to verify various aspects of the
release binaries and ran it on this RC. People are welcome to run this
themselves, but I haven't tested it on other machines yet, and some of
the Spark tests are very sensitive to the test environment :) Output
is pasted below:

[1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py

-
 Verifying download integrity for artifact:
spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
[PASSED] Artifact signature verified.
[PASSED] Artifact MD5 verified.
[PASSED] Artifact SHA verified.
[PASSED] Tarball contains CHANGES.txt file
[PASSED] Tarball contains NOTICE file
[PASSED] Tarball contains LICENSE file
[PASSED] README file contains disclaimer
 Verifying download integrity for artifact:
spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
[PASSED] Artifact signature verified.
[PASSED] Artifact MD5 verified.
[PASSED] Artifact SHA verified.
[PASSED] Tarball contains CHANGES.txt file
[PASSED] Tarball contains NOTICE file
[PASSED] Tarball contains LICENSE file
[PASSED] README file contains disclaimer
 Verifying download integrity for artifact:
spark-0.8.0-incubating-rc5.tgz 
[PASSED] Artifact signature verified.
[PASSED] Artifact MD5 verified.
[PASSED] Artifact SHA verified.
[PASSED] Tarball contains CHANGES.txt file
[PASSED] Tarball contains NOTICE file
[PASSED] Tarball contains LICENSE file
[PASSED] README file contains disclaimer
 Verifying build and tests for artifact:
spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
== Running build
[PASSED] sbt build successful
[PASSED] Maven build successful
== Performing unit tests
[PASSED] Tests successful
 Verifying build and tests for artifact:
spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
== Running build
[PASSED] sbt build successful
[PASSED] Maven build successful
== Performing unit tests
[PASSED] Tests successful
 Verifying build and tests for artifact: spark-0.8.0-incubating-rc5.tgz 
== Running build
[PASSED] sbt build successful
[PASSED] Maven build successful
== Performing unit tests
[PASSED] Tests successful

- Patrick
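
For anyone checking a download by hand rather than with the script, the digests can
also be recomputed locally; a minimal Scala sketch (the file name and the algorithm
behind the posted .sha file are assumptions, match them to the published files):

  import java.nio.file.{Files, Paths}
  import java.security.MessageDigest

  object DigestCheck {
    // Hex digest of a downloaded artifact, to compare against the published .md5/.sha value.
    def digest(path: String, algorithm: String): String = {
      val bytes = Files.readAllBytes(Paths.get(path))
      MessageDigest.getInstance(algorithm).digest(bytes).map("%02x".format(_)).mkString
    }

    def main(args: Array[String]): Unit = {
      val file = "spark-0.8.0-incubating-rc5.tgz"    // placeholder file name
      println("MD5   " + digest(file, "MD5"))
      println("SHA-1 " + digest(file, "SHA-1"))      // use whichever algorithm the .sha file was made with
    }
  }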

On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.0. This will be the first incubator release for
 Spark in Apache.

 The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/

 Please vote on releasing this package as Apache Spark 0.8.0-incubating!
 The vote is open until Thursday, September 19th at 05:00 UTC and passes if
 a majority of at least 3 +1 IPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.8.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Patrick Wendell
Hey folks, just FYI we found one minor issue with this RC (the kafka
jar in the stream pom needs to be published as provided since it's
not available in maven). Please still continue to test this and
provide feedback here until the following RC is posted later.

- Patrick

On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin r...@cs.berkeley.edu wrote:
 +1


 --
 Reynold Xin, AMPLab, UC Berkeley
 http://rxin.org



  On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell pwend...@gmail.com wrote:

 I also wrote an audit script [1] to verify various aspects of the
 release binaries and ran it on this RC. People are welcome to run this
 themselves, but I haven't tested it on other machines yet, and some of
 the Spark tests are very sensitive to the test environment :) Output
 is pasted below:

 [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py

 -
  Verifying download integrity for artifact:
 spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
 [PASSED] Artifact signature verified.
 [PASSED] Artifact MD5 verified.
 [PASSED] Artifact SHA verified.
 [PASSED] Tarball contains CHANGES.txt file
 [PASSED] Tarball contains NOTICE file
 [PASSED] Tarball contains LICENSE file
 [PASSED] README file contains disclaimer
  Verifying download integrity for artifact:
 spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
 [PASSED] Artifact signature verified.
 [PASSED] Artifact MD5 verified.
 [PASSED] Artifact SHA verified.
 [PASSED] Tarball contains CHANGES.txt file
 [PASSED] Tarball contains NOTICE file
 [PASSED] Tarball contains LICENSE file
 [PASSED] README file contains disclaimer
  Verifying download integrity for artifact:
 spark-0.8.0-incubating-rc5.tgz 
 [PASSED] Artifact signature verified.
 [PASSED] Artifact MD5 verified.
 [PASSED] Artifact SHA verified.
 [PASSED] Tarball contains CHANGES.txt file
 [PASSED] Tarball contains NOTICE file
 [PASSED] Tarball contains LICENSE file
 [PASSED] README file contains disclaimer
  Verifying build and tests for artifact:
 spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
 == Running build
 [PASSED] sbt build successful
 [PASSED] Maven build successful
 == Performing unit tests
 [PASSED] Tests successful
  Verifying build and tests for artifact:
 spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
 == Running build
 [PASSED] sbt build successful
 [PASSED] Maven build successful
 == Performing unit tests
 [PASSED] Tests successful
  Verifying build and tests for artifact:
 spark-0.8.0-incubating-rc5.tgz 
 == Running build
 [PASSED] sbt build successful
 [PASSED] Maven build successful
 == Performing unit tests
 [PASSED] Tests successful

 - Patrick

 On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.8.0. This will be the first incubator release for
  Spark in Apache.
 
  The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
  https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
 
  The release files, including signatures, digests, etc can be found at:
  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
 
 https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/
 
  Please vote on releasing this package as Apache Spark 0.8.0-incubating!
  The vote is open until Thursday, September 19th at 05:00 UTC and passes
 if
  a majority of at least 3 +1 IPMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.8.0-incubating
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.incubator.apache.org/



Re: git commit: Hard code scala version in pom files.

2013-09-15 Thread Patrick Wendell
So Mark, does that mean you'd be OK with us hard-coding the scala
version in the branch 0.8.0 build? It just seems like the overall simplest
solution for now. Or would this cause a large problem for you guys?

We can solve this on master for 0.9, I didn't touch master at all wrt
the maven build.

- Patrick
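
For context, a minimal sketch of where the Scala version in the artifact name comes
from on the sbt side (build.sbt-style settings assumed here, not the project's actual
SparkBuild.scala):

  // With crossPaths enabled (the sbt default), the Scala version is appended to the
  // artifact name, e.g. spark-core_2.9.3-0.8.0-incubating.jar, and the generated POM's
  // artifactId carries the same suffix. Hard-coding that suffix in the hand-written
  // Maven poms keeps the two builds consistent for the 0.8.x line.
  name := "spark-core"
  organization := "org.apache.spark"
  version := "0.8.0-incubating"
  scalaVersion := "2.9.3"
  crossPaths := true   // default; set to false to publish without the _2.9.3 suffix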

On Sun, Sep 15, 2013 at 7:32 PM, Mark Hamstra m...@clearstorydata.com wrote:
 Yes, it looks like we need to do something to get 0.8.0 shipped and
 something to fix the problem longer term.  I agree that those somethings
 don't have to be the same thing, and that we can take this up again once
 the 0.8.0 dust has settled.

 Give me a day and I'll probably have more to say about how I'd like things
 to look in the future.



 On Sun, Sep 15, 2013 at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Mark,

 Thanks for providing the detailed explanation.

 My primary concern was just that this changes the published artifacts
 in a way that could break downstream consumers of these poms which may
 assume that artifact id's are immutable within a pom.xml file. For
 now, let me revert my change and test that a few important things
 still work (e.g. IDE's, etc). At a minimum I just want to make sure
 things we are advising people to do don't break under this release. If
 this doesn't break those things we can  move forward with the
 parameterized artifacts for 0.8.0.

 Just a word of caution though, there may be other downstream consumers
 of the pom files for whom this will cause a problem in the future. If
 someone presents a compelling reason, we'll have to think about
 whether we can keep publishing them like this, since this is not
 technically a valid maven format.

 - Patrick

 On Sun, Sep 15, 2013 at 6:46 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
  Ah sorry, I've gotten so used to using ClearStory's poms (where we make
  quite a lot of use of such parameterization) that I lost track of exactly
  when Spark's maven build was changed to work in a similar way.
 
  This all revolves around a basic difference of opinion as to whether the
  thing that specifies how a project is built should be a fixed, static
  document or is more of a program itself or a parameterized function that
  drives the build and results in an artifact.  SBT is of the latter
 opinion,
  while Maven (at least with Maven 3) is going the other way.  That means
  that building idiomatic Scala artifacts (which expect things like
  cross-versioning support and artifactIds that include the Scala binary
  version that was used to create them) is somewhat at odds with the Maven
  philosophy.  Hard-coding artifactIds, versions, and whatever else Maven
 now
  requires to guarantee that a pom file be a fixed, repeatable build
  description works okay for a single build of an artifact; and a user of
  just that built artifact won't have to change behavior if the pom is no
  longer parameterized.  However, users who are not just interested in
 using
  pre-built artifacts but also in modifying, adding to or reusing the code
 do
  have to change their behavior if parameterized Maven builds disappear
 (yes,
  you have pointed out the state of affairs with the 0.6 and 0.7 releases;
  I'll point out that some of those making further use of the code have
 been
  using the current, not-yet-released poms for a good while.)
 
  Without some form of parameterized Maven builds, developers who now rely
  upon such parameterized builds will have to choose to fork the Apache
 poms
  and maintain their own parameterized build, or to repeatedly and manually
  edit static Apache pom files in order to change artifactIds and
 dependency
  versions (which is a frequent need when integrating Spark into a much
  larger and more complicated technology stack), or to switch over to using
  SBT in order to get parameterized builds (which, of course, would
  necessitate a lot of other changes, not all of them welcome.)  Archetypes
  or something similar seems like a way to satisfy Maven's new requirement
  for static build configurations while at the same time providing a
  parameterized way to generate that configuration or a modified version of
  it -- solving the problem by adding a layer of abstraction.
 
 
  On Sun, Sep 15, 2013 at 6:12 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hey Mark,
 
  Could you describe a user whose behavior is changed by this, and how
  it is changed? This commit actually brings 0.8 in line with the 0.7
  and 0.6 branches, where the scala version is hard coded in the
  released artifacts:
 
 
 
 http://repo1.maven.org/maven2/org/spark-project/spark-streaming_2.9.3/0.7.3/spark-streaming_2.9.3-0.7.3.pom
 
  That seems to me to minimize the changes in user behavior as much as
  possible. It would be bad if during the 0.8 release the format of our
  released artifacts changed in a way that caused things to break for
  users. One example of something that could break is an IDE or some
  other tool that consumes these builds

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC4)

2013-09-15 Thread Patrick Wendell
Yes, we've moved onto RC5, thanks.

On Sun, Sep 15, 2013 at 10:06 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Looks like this VOTE thread has been cancelled.

 Patrick has sent VOTE for RC5 in separate thread.

 - Henry

 On Saturday, September 14, 2013, Patrick Wendell wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.0. This will be the first incubator release for
 Spark in Apache.

 The tag to be voted on is v0.8.0-incubating (commit 32fc250):
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc4/files/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-046/org/apache/spark/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc4/docs/

 Please vote on releasing this package as Apache Spark 0.8.0-incubating!

 The vote is open until Tuesday, September 17th at 10:00 UTC and passes if
 a majority of at least 3 +1 IPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.8.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/



Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)

2013-09-13 Thread Patrick Wendell
I'll post another RC in a bit which addresses Mark's comments (though
please continue to provide feedback on this one!).

Suresh - it's signed with the following key:

http://people.apache.org/~pwendell/9E4FE3AF.asc



On Fri, Sep 13, 2013 at 11:28 AM, Mark Hamstra m...@clearstorydata.com wrote:
 [X] -1 Do not release this package because ...

 Prior, out-of-band discussion:

 Thanks for the insight Mark, we need to move this discussion to the
 main VOTE thread in the dev@ list to be official.

 Mark, could you kindly reply to Patrick's VOTE email thread with the -1
 vote to make sure the community knows that there are missing pieces in the
 release artifacts proposed by Spark's RE (Patrick)

 Thanks,

 Henry


 On Thu, Sep 12, 2013 at 10:57 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
  Yeah, that may get tricky, because the check of the tests in the 'prepare'
  step and the running of the deploy goal in the 'perform' step (excuse my
  calling it 'release' previously) will want to change the build dependencies.
  We may end up needing to do as Patrick has been doing but then run a
  separate script to make sure that the yarn and repl-bin modules get properly
  versioned, tagged, and uploaded.  Maybe a maven-release-plugin expert knows
  how to get it to do just what we want, but I certainly don't see how myself
  right now.
 
 
  On Thu, Sep 12, 2013 at 10:45 PM, Matei Zaharia matei.zaha...@gmail.com
 
  wrote:
 
  Hmm, one potentially nasty issue here is if spark-core ends up depending
  on hadoop-client 2.0.x instead of 1.0.4 by default with these settings.
 We
  should make sure that doesn't happen.
 
  If you'll make another RC, here are a few other small fixes I'd suggest:
 
  - In the title tag of docs/_layout/global.html, use
  site.SPARK_VERSION_SHORT instead of SPARK_VERSION (it's kind of verbose
 now)
 
  - Fix the jets3t version thing mentioned here:
  https://github.com/mesos/spark/pull/919 (just remove the unneeded
 version
  from core/pom.xml)
 
  Matei
 
  On Sep 12, 2013, at 10:25 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
   Oh I see - okay I'll try to make sure they (a) get pushed and (b) have
   the correct version. Thanks for bringing this up, would have totally
   missed it otherwise.
  
   On Thu, Sep 12, 2013 at 10:20 PM, Mark Hamstra 
 m...@clearstorydata.com
   wrote:
    I just mean that with the yarn and repl-bin poms still specifying SNAPSHOT
    versions, any maven build that tries to use the hadoop2-yarn or repl-bin
    profile will not work because those modules will not be able to find a
    SNAPSHOT parent pom.  Including those profiles in the prepare and release
    step should fix the problem, but you may need to manually sync up the
    version of those two pom files first.
  
  
   On Thu, Sep 12, 2013 at 10:16 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Hey Mark,
  
    I haven't been including those - I'll use that flag and try to publish
    again. The last sentence there ("the maven build is broken") -- does that
    refer to an additional problem, or just the problem of me not
    including the flag?
  
   - Patrick
  
   On Thu, Sep 12, 2013 at 10:11 PM, Mark Hamstra
   m...@clearstorydata.com
   wrote:
    It's a definite "do not release" from me because you are still not picking
    up all of the modules in your prepare and release.  Are you including
    -Phadoop2-yarn,repl-bin on the command line for your mvn prepare and
    mvn release?  Because the yarn module and repl-bin module are not being
    processed by the maven-release-plugin, so the pom files for those modules
    still show their version as 0.8.0-incubating-SNAPSHOT instead of
    0.8.0-incubating.  That means that the maven build is broken.


 On Thu, Sep 12, 2013 at 3:57 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.0. This will be the first incubator release for
 Spark in Apache.

 The tag to be voted on is v0.8.0-incubating (commit ffacd17):
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/

 Please vote on releasing this package as Apache Spark 0.8.0-incubating!

 The vote is open until Saturday, June 13th at 23:00 UTC and passes if
 a majority of at least 3 +1 IPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.8.0-incubating
 [X] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/



Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)

2013-09-12 Thread Patrick Wendell
Hey guys, we actually decided on a slightly different naming
convention for the downloads. I'm going to amend the files in the next
few minutes... in case anyone happens to be looking *this instant*
(which I doubt) hold off until I update them.

On Thu, Sep 12, 2013 at 3:57 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.0. This will be the first incubator release for
 Spark in Apache.

 The tag to be voted on is v0.8.0-incubating (commit ffacd17):
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/

 Please vote on releasing this package as Apache Spark 0.8.0-incubating!

 The vote is open until Saturday, June 13th at 23:00 UTC and passes if
 a majority of at least 3 +1 IPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.8.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)

2013-09-12 Thread Patrick Wendell
Fixed!

On Thu, Sep 12, 2013 at 4:22 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey guys, we actually decided on a slightly different naming
 convention for the downloads. I'm going to amend the files in the next
 few minutes... in case anyone happens to be looking *this instant*
 (which I doubt) hold off until I update them.

 On Thu, Sep 12, 2013 at 3:57 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.0. This will be the first incubator release for
 Spark in Apache.

 The tag to be voted on is v0.8.0-incubating (commit ffacd17):
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/

 Please vote on releasing this package as Apache Spark 0.8.0-incubating!

 The vote is open until Saturday, June 13th at 23:00 UTC and passes if
 a majority of at least 3 +1 IPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.8.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


[VOTE] Release Apache Spark 0.8.0-incubating (RC3)

2013-09-12 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.8.0. This will be the first incubator release for
Spark in Apache.

The tag to be voted on is v0.8.0-incubating (commit ffacd17):
https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/

Please vote on releasing this package as Apache Spark 0.8.0-incubating!

The vote is open until Saturday, June 13th at 23:00 UTC and passes if
a majority of at least 3 +1 IPMC votes are cast.

[ ] +1 Release this package as Apache Spark 0.8.0-incubating
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.incubator.apache.org/


Re: Spark 0.8.0-incubating RC2

2013-09-11 Thread Patrick Wendell
Hey Chris,

The only issue with CHANGES.txt is that we've only recently become
more disciplined about tracking issues in JIRA and tracking version
numbers when we do make JIRA issues. If we generated a CHANGES.txt
based on JIRA, it would be largely incomplete since many changes from
the beginning of the release would be missing.

What about if I created a CHANGES.txt based on the Git history? Would
that be better than not having one at all?

- Patrick

On Wed, Sep 11, 2013 at 6:58 AM, Chris Mattmann mattm...@apache.org wrote:
 Hey Patrick,

 Looking good. If the license info and so forth has been vetted and
 looks good which it sounds like Henry and others have checked out,
 I took a look at:

 http://people.apache.org/~pwendell/spark-rc/

 And the only thing I would recommend adding is some CHANGES.txt file
 that contains a JIRA change log of what is provided in this RC.

 But I would definitely proceed to a [VOTE] thread on the RC and let's
 get this going formally.

 Great work.

 Cheers,
 Chris


 -Original Message-
 From: Mattmann, jpluser chris.a.mattm...@jpl.nasa.gov
 Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Date: Friday, September 6, 2013 4:15 PM
 To: Patrick Wendell pwend...@gmail.com
 Cc: dev@spark.incubator.apache.org dev@spark.incubator.apache.org,
 Henry Saputra henry.sapu...@gmail.com
 Subject: Re: Spark 0.8.0-incubating RC2

Awesome was going to tell you it might take a sec to sync. Woot.

OK more tonight..

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Patrick Wendell pwend...@gmail.com
Date: Friday, September 6, 2013 2:14 PM
To: jpluser chris.a.mattm...@jpl.nasa.gov
Cc: dev@spark.incubator.apache.org dev@spark.incubator.apache.org,
Henry Saputra henry.sapu...@gmail.com
Subject: Re: Spark 0.8.0-incubating RC2

Thanks Chris - also it appears that my key has now been added to this
file:

http://people.apache.org/keys/group/spark.asc

- Patrick

On Fri, Sep 6, 2013 at 1:57 PM, Mattmann, Chris A (398J)
chris.a.mattm...@jpl.nasa.gov wrote:
 Feedback coming, sorry been swamped and only recently back from
DC/DARPA
 but will reply soon (hopefully tonight).

 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++






 -Original Message-
 From: Patrick Wendell pwend...@gmail.com
 Date: Friday, September 6, 2013 1:56 PM
 To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org
 Cc: jpluser chris.a.mattm...@jpl.nasa.gov, Henry Saputra
 henry.sapu...@gmail.com
 Subject: Re: Spark 0.8.0-incubating RC2

Hey Chris, Henry... do you guys have feedback here? This was based
largely on your feedback in the last round :)

On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell pwend...@gmail.com
wrote:
 Hey Evan,

 These are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release. Matei will be merging in some doc stuff
 before the release, I'm pretty sure that includes your docs.

 - Patrick

 On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan e...@ooyala.com wrote:
 Patrick,

 I'm planning to submit documentation PRs against mesos/spark by tomorrow,
 is that OK? We really should update the docs.

 thanks,
 Evan



 On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell pwend...@gmail.com
wrote:

 No these are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release (though this RC was
 fc6fbfe7d7e9171572c898d9e90301117517e60e).

 On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra
m...@clearstorydata.com
 wrote:
  Are these RCs not getting tagged in the repository, or am I just
not
  looking in the right place?
 
 
 
  On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell
pwend...@gmail.com
 wrote:
 
  Hey All,
 
  Matei asked me to pick this up because he's travelling this
week. I
  cut a second release candidate from the head of the 0.8 branch
(on
  mesos/spark github) to address the following issues:
 
  - RC is now

Re: Spark 0.8.0-incubating RC2

2013-09-08 Thread Patrick Wendell
Thanks Henry. The MLLib files have been fixed since you ran the tool.

On Sat, Sep 7, 2013 at 11:25 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 HI Patrick,

 I ran the Apache RAT tool as shown at
 http://creadur.apache.org/rat/apache-rat/index.html:

 java -jar apache-rat-0.10.jar ~/Downloads/spark-0.8.0-src-incubating-RC2

 However, we should add the maven RAT plugin to Spark's pom.xml to support
 an integrated RAT check as part of CI later.
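 Until that plugin is wired into the build, the check can also be run ad hoc
 through Maven without touching pom.xml; a sketch, assuming the RAT 0.10
 plugin coordinates:

     # runs the Apache RAT license-header check; the report typically lands in target/rat.txt
     mvn org.apache.rat:apache-rat-plugin:0.10:check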

 - Henry

 On Sat, Sep 7, 2013 at 11:24 AM, Patrick Wendell pwend...@gmail.com wrote:
 Henry,

 Thanks a lot for your feedback.

 Could you let me know how you ran Apache RAT tool so I can reproduce this?

 My sense is that the best next step is to do an RC that is built
 against the Apache Git and also includes both `src` and `bin` in
 addition to cleaned-up license files.

 1. I only see source artifacts in Patrick's p.a.o URL. I assume the
 pre-built ones will also be published with hash and signed?

 Yes, we'll do both src and binary releases. I'll hash and sign both.

 2. For every ASF release, we need a designated release engineer (RE)
 who will drive the release process, including determining which bugs to
 include, making sure all files have the right ASF header (running the maven
 RAT plugin check), creating the release branch, updating the version for the
 next development cycle, and creating and signing the release artifacts
 correctly. I assume this would be Matei or Patrick?

 Yes, this might be me for this release because I've got the keys
 correctly set-up. I'll chat with Matei when he's back.

 3. The proposed source artifact 0.8.0-RC2's signature looks good and the
 hash looks good. However, it was generated against the github mesos:spark
 repo.
 Reminder that when we send the release proposal to
 general@incubator.a.o we need to generate RC builds using the ASF git repo
 with the right tagged branch.

 Next RC we will take care of this.

 4. I ran the RAT check for the source artifact and found that a lot of
 source files do not have the ASF license header.

  For example, some files in the repl directory have this:

 /* NSC -- new Scala compiler
  * Copyright 2005-2011 LAMP/EPFL
  * @author Paul Phillips
  */

 Not sure if we need to add an ASF header to it since we technically put
 it under the apache package.

 Scala source files under mllib are missing ASF headers.

 See comment above.

 5. Add the RE's public key to
 http://people.apache.org/keys/group/spark.asc (@Chris do we still need
 to create a KEYS file in the Spark git repo?)

 This is now finished for me :)


Re: Spark 0.8.0-incubating RC2

2013-09-07 Thread Patrick Wendell
Henry,

Thanks a lot for your feedback.

Could you let me know how you ran Apache RAT tool so I can reproduce this?

My sense is that the best next step is to do an RC that is built
against the Apache Git and also includes both `src` and `bin` in
addition to cleaned-up license files.

 1. I only see source artifacts in Patrick's p.a.o URL. I assume the
 pre-built ones will also be published with hash and signed?

Yes, we'll do both src and binary releases. I'll hash and sign both.
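
A sketch of how the signature and digest files could be produced (the artifact name is a placeholder; this mirrors the common ASF release recipe rather than the exact commands used here):

    gpg --armor --detach-sign spark-0.8.0-incubating.tgz                         # produces .tgz.asc
    gpg --print-md MD5    spark-0.8.0-incubating.tgz > spark-0.8.0-incubating.tgz.md5
    gpg --print-md SHA512 spark-0.8.0-incubating.tgz > spark-0.8.0-incubating.tgz.sha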

 2. For every ASF release, we need a designated release engineer (RE)
 who will drive the release process, including determining which bugs to
 include, making sure all files have the right ASF header (running the maven
 RAT plugin check), creating the release branch, updating the version for the
 next development cycle, and creating and signing the release artifacts
 correctly. I assume this would be Matei or Patrick?

Yes, this might be me for this release because I've got the keys
correctly set-up. I'll chat with Matei when he's back.

 3. The proposed source artifact 0.8.0-RC2's signature looks good and the
 hash looks good. However, it was generated against the github mesos:spark
 repo.
 Reminder that when we send the release proposal to
 general@incubator.a.o we need to generate RC builds using the ASF git repo
 with the right tagged branch.

Next RC we will take care of this.
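
Concretely, that would mean cutting the RC from the ASF repository and its tag rather than from mesos/spark, roughly like this (the repo URL and tag name here are assumptions):

    git clone https://git-wip-us.apache.org/repos/asf/incubator-spark.git
    cd incubator-spark
    git checkout v0.8.0-incubating    # the tag being voted on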

 4. I ran the RAT check for the source artifact and found that a lot of
 source files do not have the ASF license header.

  For example, some files in the repl directory have this:

 /* NSC -- new Scala compiler
  * Copyright 2005-2011 LAMP/EPFL
  * @author Paul Phillips
  */

 Not sure if we need to add an ASF header to it since we technically put
 it under the apache package.

 Scala source files under mllib are missing ASF headers.

See comment above.

 5. Add the RE's public key to
 http://people.apache.org/keys/group/spark.asc (@Chris do we still need
 to create a KEYS file in the Spark git repo?)

This is now finished for me :)


Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Patrick Wendell
Hey Chris, Henry... do you guys have feedback here? This was based
largely on your feedback in the last round :)

On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Evan,

 These are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release. Matei will be merging in some doc stuff
 before the release, I'm pretty sure that includes your docs.

 - Patrick

 On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan e...@ooyala.com wrote:
 Patrick,

 I'm planning to submit documentation PRs against mesos/spark by tomorrow,
 is that OK? We really should update the docs.

 thanks,
 Evan



 On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote:

 No these are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release (though this RC was
 fc6fbfe7d7e9171572c898d9e90301117517e60e).

 On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra m...@clearstorydata.com
 wrote:
  Are these RCs not getting tagged in the repository, or am I just not
  looking in the right place?
 
 
 
  On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hey All,
 
  Matei asked me to pick this up because he's travelling this week. I
  cut a second release candidate from the head of the 0.8 branch (on
  mesos/spark github) to address the following issues:
 
  - RC is now hosted in an apache web space
  - RC now includes signature
  - RC now includes MD5 and SHA512 digests
 
  [tgz]
 
 http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz
  [all files] http://people.apache.org/~pwendell/spark-rc/
 
  It would be great to get feedback on the release structure. I also
  changed the name to include src since we will be releasing both
  source and binary releases.
 
  I was a bit confused about how to attach my GPG key to the spark.asc
  file. I took the following steps.
 
  1. Created a GPG key locally
  2. Distributed the key to public key servers (gpg --send-key)
  3. Add exported key to my apache web space:
  http://people.apache.org/~pwendell/9E4FE3AF.asc
  4. Added the key fingerprint at id.apache.org
  5. Create an apache FOAF file with the key signature
 
  However, this doesn't seem sufficient to get my key on this page (at
  least, not yet):
  http://people.apache.org/keys/group/spark.asc
 
  Chris - are there other steps I missed? Is there a manual way to
  augment this file?
 
  - Patrick
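
  The key-setup steps above map to commands roughly like these (the key ID is
  taken from the .asc file name above; the choice of keyserver is an assumption):

      gpg --gen-key                                      # 1. create the key locally
      gpg --keyserver pgp.mit.edu --send-key 9E4FE3AF    # 2. publish to a public keyserver
      gpg --armor --export 9E4FE3AF > 9E4FE3AF.asc       # 3. export for the people.apache.org web space
      gpg --fingerprint 9E4FE3AF                         # 4. fingerprint to register at id.apache.org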
 




 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |

 http://www.ooyala.com/
 http://www.facebook.com/ooyala
 http://www.linkedin.com/company/ooyala
 http://www.twitter.com/ooyala

