Re: Signal/Noise Ratio
Hey Chris,

Would the following be consistent with the Apache guidelines?

(a) We establish a culture of not having overall design discussions on github. Design discussions should occur on JIRA or on the dev list. IMO this is pretty much already true, but there are a few exceptions.

(b) We add a mailing list called github@s.a.o which receives the github traffic. This way everything is available in Apache infra.

(c) Because of our use of JIRA it might make sense to have an issues@s.a.o list as well, similar to what YARN and other projects use.

The github chatter is so noisy that I think, overall, it decreases engagement with the official developer list. This is the opposite of what we want.

- Patrick

On Sat, Feb 22, 2014 at 11:34 AM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Everyone,

The biggest thing is simply making sure that the dev@project.a.o list is meaningful, and that meaningful development isn't going on elsewhere that constitutes decisions for the Apache project as reified in code contributions and overall stewardship of the effort. I noticed in a few emails from Github relating to comments on Github Pull Requests some conversation which I deemed to be relevant to the project, so I brought this up, and it came up during graduation.

Here's a general rule of thumb: it's fine if devs converse, e.g., on Github, etc., and even if it's project discussion, *so long as* that relevant project discussion makes its way in some form to the actual, bona fide project's dev@project.a.o list, giving others in the community (those not necessarily on Github, or watching Github, or part of that non-Apache conversation) the chance to comment and be part of the community-led decisions for the project there. Making its way to that bona fide Apache project dev list can happen in several ways:

1. By simply mapping, 1:1, the Github comments on which I see Apache-project-related dev discussion from time to time (and which I believe fit the criteria I'm describing above) to the project's dev@project.a.o list.

2. By not 1:1 mapping all Github conversation to the dev@project.a.o list, but to some other list, e.g., github@project.a.o (or any of the others being discussed), *so long as*, and this is key, those discussions on Github get summarized on the dev@project.a.o list, giving everyone an opportunity to participate in the development by being *here at Apache*.

3. By not worrying about Github at all and simply doing all the development here at the ASF.

4. Others...

My feeling is that some combination of #1 and #2 can pass muster, and the Apache Spark community can decide. That said, noise reduction can also lead to loss of precision and accuracy, so don't be surprised if, in reducing that noise, some key thing makes it onto a Github PR but doesn't make it onto the dev list b/c we are all human and forget to summarize it there. Even if that happens, we assume everyone has good intentions and we simply address those issues when/if they come up.

Cheers,
Chris

-----Original Message-----
From: Sandy Ryza sandy.r...@cloudera.com
Reply-To: dev@spark.incubator.apache.org
Date: Saturday, February 22, 2014 11:19 AM
To: dev@spark.incubator.apache.org
Subject: Re: Signal/Noise Ratio

Hadoop subprojects (MR, YARN, HDFS) each have a dev list that contains discussion as well as a single email whenever a JIRA is filed, and an issues list with all the JIRA activity. I think this works out pretty well. Subscribing just to the dev list, I can keep up with changes that are going to be made and follow the ones I care about. And the issues list is there if I want the firehose.

Is Apache actually prescriptive that a list with "dev" in its name needs to contain all discussion? If so, most projects I've followed are violating this.

On Fri, Feb 21, 2014 at 7:54 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote:

It looks like there's at least one other Apache project, jclouds, that sends the github notifications to a separate notifications@ list (see http://mail-archives.apache.org/mod_mbox/incubator-general/201402.mbox/%3C1391721862.67613.YahooMailNeo%40web172602.mail.ir2.yahoo.com%3E). Given that many people are annoyed by getting the messages on this list, and that there is some precedent for sending them to a different list, I'd be in favor of doing that.

On Fri, Feb 21, 2014 at 6:18 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

Sweet, great job Reynold.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-283, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Re: Signal/Noise Ratio
btw - I'd prefer reviews@s.a.o instead of github@ to remain more neutral and flexible.

On Sat, Feb 22, 2014 at 12:35 PM, Patrick Wendell pwend...@gmail.com wrote:

snip
Re: Signal/Noise Ratio
Hey All,

I created a JIRA to ask infra to create a dedicated reviews@ mailing list for this purpose: https://issues.apache.org/jira/browse/INFRA-7368

Hopefully they can migrate the github stream to this list so that people can distinguish it from developer discussions. In parallel, we are also trying to see if we can use the github status notifier rather than the constant comments from Jenkins.

- Patrick

On Sat, Feb 22, 2014 at 1:04 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

Patrick, +1 to the below. Great summary, and yes I think that would work great.

Cheers,
Chris

snip
Re: Request to review PR #605
Hey Punya,

It's sufficient to just ping the request on github rather than e-mail the dev list. Sometimes it can take a few days for people to get to looking at patches...

- Patrick

On Sat, Feb 22, 2014 at 5:17 PM, Punya Biswal pbis...@palantir.com wrote:

Hi all,

Can someone review and/or merge PR #605 (convert or move Java code)? It's been sitting for four days. Thanks!

Punya
Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Hey Everyone,

We are going to publish artifacts to Maven central in the exact same format no matter which build system we use. For normal consumers of Spark, {maven vs sbt} won't make a difference. It will make a difference for people who are extending the Spark build to do their own packaging. This is what I'm trying to gauge: does anyone do this in a way where they feel only Maven or only sbt supports their particular use case?

- Patrick

On Fri, Feb 21, 2014 at 12:40 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

Hi,

My small contrib to the discussion: sbt is able to publish Maven artifacts, generating the POM and all the signed JAR files. So even if a POM is not in the project, one can be found somewhere.

Pascal

On Fri, Feb 21, 2014 at 9:28 AM, Paul Brown p...@mult.ifario.us wrote:

As a customer of the code, I don't care *how* the code gets built, but it is important to me that the Maven artifacts (POM files, binaries, sources, javadocs) are clean, accurate, up to date, and published on Maven Central. Some examples where structure/publishing failures have been bad for users:

- For a long time (and perhaps still), Solr and Lucene were built by an Ant build that produced incorrect POMs and required potential developers to manually configure their IDEs.
- For a long time (and perhaps still), Pig was built by Ant, published incorrect POMs, and failed to publish useful auxiliary artifacts like PigUnit and the PiggyBank as Maven-addressable artifacts. (That said, thanks to Spark, we no longer use Pig...)
- For a long time (and perhaps still), Cassandra depended on non-generally-available libraries (high-scale, etc.) that made it inconvenient to embed Cassandra in a larger system. Cassandra gets a little slack because the build/structure was almost too terrible to look at prior to incubation, and it's gotten better...

And those are just a few projects at Apache that come to mind; I could make a longish list of offenders.

btw, among other things that the Spark project probably *should* do would be to publish artifacts with a classifier to distinguish the Hadoop version linked against.

I'll be a happy user of sbt-built artifacts, or if the project goes/sticks with Maven I'm more than willing to help answer questions or provide PRs for stickier items around assemblies, multiple artifacts, etc.

-- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Thu, Feb 20, 2014 at 11:56 PM, Sean Owen so...@cloudera.com wrote:

Two builds is indeed a pain, since it's an ongoing chore to keep them in sync. For example, I am already seeing that the two do not quite declare the same dependencies (see recent patch).

I think publishing artifacts to Maven Central should be considered a hard requirement, if it isn't already one from the ASF, and it may be? Certainly most people out there would be shocked if you told them Spark is not in the repo at all. And that requires at least maintaining a POM that declares the structure of the project. This does not necessarily mean using Maven to build, but it is a reason that removing the POM is going to make this a lot harder for people to consume as a project.

Maven has its pros and cons, but there are plenty of people lurking around who know it quite well. Certainly it's easier for the Hadoop people to understand and work with. On the other hand, it supports Scala only via a plugin, which is weaker support. sbt seems like a fairly new, basic, ad-hoc tool. Is there an advantage to it, other than being Scala (which is an advantage)?
-- Sean Owen | Director, Data Science | London

On Fri, Feb 21, 2014 at 4:03 AM, Patrick Wendell pwend...@gmail.com wrote:

snip
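[Editorially, Pascal's point about sbt publishing Maven-style artifacts can be made concrete with a minimal build.sbt sketch. The coordinates and repository URL below are illustrative assumptions, not Spark's actual build definition:]

  // build.sbt sketch: sbt generating and deploying a Maven-style POM.
  // All coordinates and the staging URL here are illustrative assumptions.
  organization := "org.apache.spark"
  name := "spark-core"
  version := "1.0.0-SNAPSHOT"
  scalaVersion := "2.10.3"

  publishMavenStyle := true        // emit a POM alongside the jars
  publishArtifact in Test := false // don't publish test jars

  publishTo := Some("staging" at
    "https://repository.apache.org/service/local/staging/deploy/maven2")

With settings along these lines, "sbt publish" pushes the POM, main jar, sources, and javadoc artifacts, which is how an sbt-only project could still satisfy Sean's hard requirement of clean artifacts on Maven Central.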
Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Kos - thanks for chiming in. Could you be more specific about what is available in Maven and not in sbt for these issues? I took a look at the Bigtop code relating to Spark. As far as I could tell, [1] was the main point of integration with the build system (maybe there are other integration points)?

- In order to integrate Spark well into the existing Hadoop stack, it was necessary to have a way to avoid transitive dependency duplications and possible conflicts. E.g. the Maven assembly allows us to avoid adding _all_ Hadoop libs and later merely declare a Spark package dependency on the standard Bigtop Hadoop packages. And yes - Bigtop packaging means the naming and layout would be standard across all commercial Hadoop distributions that are worth mentioning: ASF Bigtop convenience binary packages, and Cloudera or Hortonworks packages. Hence, the downstream user doesn't need to spend any effort to make sure that Spark clicks in properly.

The sbt build also allows you to plug in a Hadoop version, similar to the Maven build.

- Maven provides a relatively easy way to deal with the jar-hell problem, although the original Maven build was just shading everything into a huge lump of class files, oftentimes ending up with classes slamming on top of each other from different transitive dependencies.

AFAIK we are only using the Shade plug-in to deal with conflict resolution in the assembly jar. These are dealt with in sbt via the sbt-assembly plug-in in an identical way. Is there a difference?

[1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
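[For readers comparing the two mechanisms, a hedged sketch of the sbt-assembly side of that conflict resolution, using the sbt-assembly 0.x API of that era; the settings are illustrative, not Spark's actual build:]

  // build.sbt sketch: conflict resolution in the assembly jar via sbt-assembly.
  import sbtassembly.Plugin._
  import AssemblyKeys._

  assemblySettings

  // Mark Hadoop as "provided" so the fat jar doesn't bundle it; Bigtop-style
  // packages can then depend on the platform's own Hadoop packages.
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

  // Decide what happens when multiple dependencies ship the same file path.
  mergeStrategy in assembly := {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case "reference.conf"              => MergeStrategy.concat
    case _                             => MergeStrategy.first
  }

This plays the same role as the Maven Shade plug-in's filters and transformers, which is the equivalence Patrick is asking about.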
Re: Planned 0.9.1 release
We backport bug fixes into the 0.9 branch as they come in, so if there is a particular fix you want, you can always build from the head of branch-0.9 and expect only stability improvements compared with Spark 0.9.0.

The timing of the maintenance releases depends a bit on what bug fixes come in and their importance. I'm thinking we should propose a release pretty soon (on the order of weeks) since there are some valuable bug fixes that came in this week.

- Patrick

On Fri, Feb 21, 2014 at 2:22 PM, Gary Malouf malouf.g...@gmail.com wrote:

My team has avoided upgrading to 0.9 to this point because of the Mesos bug that has since been fixed in master. For ease of tracking, we are trying to only use tagged releases going forward, as long as they continue to be frequent or become more stable over time. Is there any timeline on cutting a tag for the 0.9.1 bug fix release?
[DISCUSS] Necessity of Maven *and* SBT Build in Spark
Hey All,

It's very high overhead having two build systems in Spark. Before getting into a long discussion about the merits of sbt vs maven, I wanted to pose a simple question to the dev list:

Is there anyone who feels that dropping either sbt or maven would have a major consequence for them?

And by "major consequence" I mean something becomes completely impossible now and can't be worked around. This is different from an inconvenience, i.e., something which can be worked around but will require some investment.

I'm posing the question in this way because, if there are features in either build system that are absolutely unavailable in the other, then we'll have to maintain both for the time being. I'm merely trying to see whether this is the case...

- Patrick
Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Hey Henry,

Yep, I wanted to reboot this since some time has passed and people may have new or changed ways of using the build. Maven makes the Apache publishing fairly seamless, but after the last two releases I believe we could make it work with sbt as well. sbt also supports publishing, and other Apache projects such as Kafka publish with sbt.

On Thu, Feb 20, 2014 at 8:50 PM, Henry Saputra henry.sapu...@gmail.com wrote:

Thanks for bringing back the build systems discussion, Patrick.

There was a long discussion way back before Spark joined the ASF, and as I remember there was no clear winner between using sbt or maven. Maven makes it easier to publish the artifacts to the Nexus repository (not sure if sbt can do the same), and as I remember one of the limitations or drawbacks of maven is the use of profiles. Matei had suggested using some kind of Hadoop client detection, as in the Parquet project, to manage the Hadoop versions and avoid profiles.

- Henry

On Thu, Feb 20, 2014 at 8:03 PM, Patrick Wendell pwend...@gmail.com wrote:

snip
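[On Henry's point about profiles, a minimal sketch of how an sbt build can pick the Hadoop version from a property rather than Maven-style profiles. The property name and default are illustrative assumptions, not Spark's actual mechanism:]

  // build.sbt sketch: select the Hadoop client version without profiles.
  // "hadoop.version" and the 1.0.4 default are illustrative assumptions.
  val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.0.4")

  libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion

  // Usage: sbt -Dhadoop.version=2.2.0 assembly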
Re: coding style discussion: explicit return type in public APIs
+1 overall.

Christopher - I agree that once the number of rules becomes large it's more efficient to pursue a "use your judgement" approach. However, since this is only 3 cases, I'd prefer to wait to see if it grows. The concern with this approach is that for newer people, contributors, etc. it's hard for them to understand what good judgement is. Many are new to Scala, so explicit rules are generally better.

- Patrick

On Wed, Feb 19, 2014 at 12:19 AM, Reynold Xin r...@databricks.com wrote:

Yes, the case you brought up is not a matter of readability or style. If it returns a different type, it should be declared (otherwise it is just wrong).

On Wed, Feb 19, 2014 at 12:17 AM, Mridul Muralidharan mri...@gmail.com wrote:

You are right. A degenerate case would be:

def createFoo = new FooImpl()

vs

def createFoo: Foo = new FooImpl()

The former will cause API instability. Reynold, maybe this is already avoided - and I understood it wrong?

Thanks, Mridul

On Wed, Feb 19, 2014 at 12:44 PM, Christopher Nguyen c...@adatao.com wrote:

Mridul, IIUUC, what you've mentioned did come to mind, but I deemed it orthogonal to the stylistic issue Reynold is talking about.

I believe you're referring to the case where there is a specific desired return type by API design, but the implementation does not declare it, in which case, of course, one must define the return type. That's an API requirement and not just a matter of readability. We could add this as an NB in the proposed guideline.

-- Christopher T. Nguyen, Co-founder CEO, Adatao http://adatao.com linkedin.com/in/ctnguyen

On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin r...@databricks.com wrote:

+1 on Christopher's suggestion.

Mridul, how would that happen? Case 3 requires the method to be invoking the constructor directly. It was implicit in my email, but the return type should be the same as the class itself.

On Tue, Feb 18, 2014 at 10:37 PM, Mridul Muralidharan mri...@gmail.com wrote:

Case 3 can be a potential issue. The current implementation might be returning a concrete class which we might want to change later - making it a type change. The intention might be to return an RDD (for example), but the inferred type might be a subclass of RDD - and future changes will cause a signature change.

Regards, Mridul

On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin r...@databricks.com wrote:

Hi guys,

I want to bring this issue to the table to see what other members of the community think, and then we can codify it in the Spark coding style guide. The topic is declaring return types explicitly in public APIs.

In general I think we should favor explicit type declaration in public APIs. However, I do think there are 3 cases where we can avoid the public API definition, because in these 3 cases the types are self-evident or repetitive:

Case 1. toString

Case 2. A method returning a string, or a val defining a string:

def name = "abcd" // this is so obvious that it is a string
val name = "edfg" // this too

Case 3. The method or variable is invoking the constructor of a class and returning that immediately. For example:

val a = new SparkContext(...)
implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new AsyncRDDActions(rdd)

Thoughts?
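[Putting Mridul's concern and Reynold's Case 3 side by side, a small hedged sketch; Foo, FooImpl, and Api are illustrative names, not Spark classes:]

  // Illustrative sketch of the return-type pitfall discussed above.
  trait Foo
  class FooImpl extends Foo

  class Api {
    // Case 2: self-evidently a String; an annotation adds nothing.
    def name = "abcd"

    // Case 3 gone wrong: the inferred public type is FooImpl, so swapping
    // the implementation class later silently changes the signature.
    def createFooRisky = new FooImpl

    // The stable form: declare the intended API type explicitly.
    def createFoo: Foo = new FooImpl
  }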
Re: Adding my wiki user id (hsaputra) as contributors in Apache Spark confluence wiki space
Hey Henry,

Ya, unfortunately I have no idea how to do this!

On Thu, Feb 13, 2014 at 9:54 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:

I can help out here as well. I am trying to develop docs around setting up Spark, Streaming and Shark, currently doing it on my wiki (docs.sigmoidanalytics.com). Would love to contribute.

Regards,
Mayur

Mayur Rustagi Ph: +919632149971 http://www.sigmoidanalytics.com https://twitter.com/mayur_rustagi

On Thu, Feb 13, 2014 at 8:28 AM, Henry Saputra henry.sapu...@gmail.com wrote:

Hi Andy,

Could you or someone with the space admin role in the Spark wiki [1] kindly help to add my userid hsaputra as a collaborator to edit/add new content in the Spark wiki space? I believe Andy's userid was granted space admin rights to the wiki.

Thank you,
- Henry

[1] https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
Re: [GitHub] incubator-spark pull request: SPARK-1078: Replace lift-json with j...
I think Aaron just meant 1.0.0 by "the next minor release."

On Tue, Feb 11, 2014 at 7:56 PM, Mark Hamstra m...@clearstorydata.com wrote:

"The situation sounds fine for the next minor release..."

I don't understand what you mean by this. According to my current understanding, the next release of Spark other than maintenance releases on 0.9.x is intended to be a major release, 1.0.0, and there are no plans for an intervening minor release, which would be 0.10.0. Thus the next minor release would be 1.1.0, and I fail to see why we would wait for that instead of putting the dependency change (assuming that it is something that we do, indeed, want) in 1.0.0.

On Tue, Feb 11, 2014 at 7:51 PM, aarondav g...@git.apache.org wrote:

Github user aarondav commented on the pull request: https://github.com/apache/incubator-spark/pull/582#issuecomment-34836430

Thanks for looking into it! The situation sounds fine for the next minor release, and I don't think this patch needs to be included in the next maintenance release anyway (following your very own [suggestion](http://mail-archives.apache.org/mod_mbox/spark-dev/201402.mbox/browser) on the dev list).

While this patch looks good to me, I am not sure I fully understand the need for it. I posted my question on the [dev list thread](http://mail-archives.apache.org/mod_mbox/spark-dev/201402.mbox/%3C945190638.685798.1391974088596.JavaMail.zimbra%40redhat.com%3E). Besides the dependency change, you also mention performance improvements. [This benchmark](http://engineering.ooyala.com/blog/comparing-scala-json-libraries) does show Jackson outperforming lift on a particular workload, but do you have another source showing how the relative performance changes with input size?
Re: Github merge script
Hey Andrew,

The intent was to be consistent with the way the merge messages looked before. But I agree it obfuscates the commit messages from the user and hides them further down. I think your proposal is good, but it might be better to use the title of their pull request rather than the first line of the most recent commit in their branch (not sure what you meant by "commit message"). Maybe you could submit a pull request for this? The script we use to merge things is in dev/merge_spark_pr.py.

Another nice thing is if people are formatting their titles with JIRAs, then it will all look nice and pretty... which is kind of the goal.

- Patrick

On Sun, Feb 9, 2014 at 11:55 PM, Andrew Ash and...@andrewash.com wrote:

The current script for merging a GitHub PR squashes the commits and sticks a "Merge pull request #123 from abc/def" at the top of the commit message. However this obscures the original commit message when doing a short git log (first line only), so the recent history is much less meaningful than before.

Compare recent history A:

* 919bd7f Prashant Sharma 86 minutes ago (origin/master, origin/HEAD) Merge pull request #567 from ScrapCodes/style2.
* 2182aa3 Martin Jaggi 8 hours ago Merge pull request #566 from martinjaggi/copy-MLlib-d.
* afc8f3c qqsun8819 10 hours ago Merge pull request #551 from qqsun8819/json-protocol.
* 94ccf86 Patrick Wendell 10 hours ago Merge pull request #569 from pwendell/merge-fixes.
* b69f8b2 Patrick Wendell 14 hours ago Merge pull request #557 from ScrapCodes/style. Closes #557.
* b6dba10 CodingCat 24 hours ago Merge pull request #556 from CodingCat/JettyUtil. Closes #556.
| * de22abc jyotiska 24 hours ago (origin/branch-0.9) Merge pull request #562 from jyotiska/master. Closes #562.
* | 2ef37c9 jyotiska 24 hours ago Merge pull request #562 from jyotiska/master. Closes #562.
| * 2e3d1c3 Patrick Wendell 24 hours ago Merge pull request #560 from pwendell/logging. Closes #560.
* | b6d40b7 Patrick Wendell 24 hours ago Merge pull request #560 from pwendell/logging. Closes #560.
* | f892da8 Patrick Wendell 25 hours ago Merge pull request #565 from pwendell/dev-scripts. Closes #565.
* | c2341c9 Mark Hamstra 32 hours ago Merge pull request #542 from markhamstra/versionBump. Closes #542.
| * 22e0a3b Qiuzhuang Lian 35 hours ago Merge pull request #561 from Qiuzhuang/master. Closes #561.
* | f0ce736 Qiuzhuang Lian 35 hours ago Merge pull request #561 from Qiuzhuang/master. Closes #561.
* | 7805080 Jey Kottalam 35 hours ago Merge pull request #454 from jey/atomic-sbt-download. Closes #454.
* | fabf174 Martin Jaggi 2 days ago Merge pull request #552 from martinjaggi/master. Closes #552.
* | 3a9d82c Andrew Ash 3 days ago Merge pull request #506 from ash211/intersection. Closes #506.
| * ce179f6 Andrew Or 3 days ago Merge pull request #533 from andrewor14/master. Closes #533.

To B: if you go back some time in history, you get a much more branched history, like this:

| * | | | | | | | | 0984647 Patrick Wendell 4 weeks ago Enable compression by default for spills
|/ / / / / / / / /
| * | | | | | | | 4e497db Tathagata Das 4 weeks ago Removed StreamingContext.registerInputStream and registerOutputStream - they were useless as InputDStream has been made to register itself. Also made DS
* | | | | | | | | fdaabdc Patrick Wendell 4 weeks ago Merge pull request #380 from mateiz/py-bayes
|\ \ \ \ \ \ \ \ \
| | | | | * | | | | c2852cf Frank Dai 4 weeks ago Indent two spaces
* | | | | | | | | | 4a805af Patrick Wendell 4 weeks ago Merge pull request #367 from ankurdave/graphx
|\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | 80e73ed Joseph E. Gonzalez 4 weeks ago Adding minimal additional functionality to EdgeRDD
* | | | | | | | | | | 945fe7a Patrick Wendell 4 weeks ago Merge pull request #408 from pwendell/external-serializers
|\ \ \ \ \ \ \ \ \ \ \
| | * | | | | | | | | | 4bafc4f Joseph E. Gonzalez 4 weeks ago adding documentation about EdgeRDD
* | | | | | | | | | | | 68641bc Patrick Wendell 4 weeks ago Merge pull request #413 from rxin/scaladoc
|\ \ \ \ \ \ \ \ \ \ \ \
| | | | | | | | * | | | | 12386b3 Frank Dai 4 weeks ago Since getLong() and getInt() have side effect, get back parentheses, and remove an empty line
| | | | | | | | * | | | | 0d94d74 Frank Dai 4 weeks ago Code clean up for mllib
* | | | | | | | | | | | | 0ca0d4d Patrick Wendell 4 weeks ago Merge pull request #401 from andrewor14/master
|\ \ \ \ \ \ \ \ \ \ \ \ \
| | | | * | | | | | | | | | af645be Ankur Dave 4 weeks ago Fix all code examples in guide
| | | | * | | | | | | | | | 2cd9358 Ankur Dave 4 weeks ago Finish 6f6f8c928ce493357d4d32e46971c5e401682ea8
* | | | | | | | | | | | | | 08b9fec Patrick Wendell 4 weeks ago Merge pull request #409 from tdas/unpersist

Ignoring the merge commits, the commit messages are much better here than in the current setup, because they're what the original author wrote. Not a pretty generic
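[A hedged sketch of the format Andrew is proposing; the helper and fields below are hypothetical, and the real logic lives in dev/merge_spark_pr.py, which is Python:]

  // Hypothetical sketch: lead the squashed commit with the PR title
  // (ideally carrying a JIRA id) instead of a generic merge line.
  case class PullRequest(number: Int, title: String, author: String, branch: String)

  def squashedCommitMessage(pr: PullRequest): String =
    s"""${pr.title}
       |
       |Author: ${pr.author}
       |
       |Closes #${pr.number} from ${pr.branch} and squashes commits.""".stripMargin

Under that scheme, a short git log shows "SPARK-1078: Replace lift-json with jackson" rather than "Merge pull request #582 from ...".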
Re: [VOTE] Graduation of Apache Spark from the Incubator
+1

To clarify to others, this is an IPMC vote, so only the IPMC votes are binding :)

On Mon, Feb 10, 2014 at 10:02 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

+1

On Mon, Feb 10, 2014 at 9:57 PM, Mark Hamstra m...@clearstorydata.com wrote:

+1

On Mon, Feb 10, 2014 at 8:27 PM, Chris Mattmann mattm...@apache.org wrote:

Hi Everyone,

This is a new VOTE to decide if Apache Spark should graduate from the Incubator. Please VOTE on the resolution pasted below the ballot. I'll leave this VOTE open for at least 72 hours.

Thanks!

[ ] +1 Graduate Apache Spark from the Incubator.
[ ] +0 Don't care.
[ ] -1 Don't graduate Apache Spark from the Incubator because...

Here is my +1 binding for graduation.

Cheers,
Chris

snip

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to fast and flexible large-scale data analysis on clusters.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Spark Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Spark Project be and hereby is responsible for the creation and maintenance of software related to fast and flexible large-scale data analysis on clusters; and be it further

RESOLVED, that the office of "Vice President, Apache Spark" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Spark Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Spark Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Spark Project:

* Mosharaf Chowdhury mosha...@apache.org
* Jason Dai jason...@apache.org
* Tathagata Das t...@apache.org
* Ankur Dave ankurd...@apache.org
* Aaron Davidson a...@apache.org
* Thomas Dudziak to...@apache.org
* Robert Evans bo...@apache.org
* Thomas Graves tgra...@apache.org
* Andy Konwinski and...@apache.org
* Stephen Haberman steph...@apache.org
* Mark Hamstra markhams...@apache.org
* Shane Huang shane_hu...@apache.org
* Ryan LeCompte ryanlecom...@apache.org
* Haoyuan Li haoy...@apache.org
* Sean McNamara mcnam...@apache.org
* Mridul Muralidharan mridul...@apache.org
* Kay Ousterhout kayousterh...@apache.org
* Nick Pentreath mln...@apache.org
* Imran Rashid iras...@apache.org
* Charles Reiss wog...@apache.org
* Josh Rosen joshro...@apache.org
* Prashant Sharma prash...@apache.org
* Ram Sriharsha har...@apache.org
* Shivaram Venkataraman shiva...@apache.org
* Patrick Wendell pwend...@apache.org
* Andrew Xia xiajunl...@apache.org
* Reynold Xin r...@apache.org
* Matei Zaharia ma...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matei Zaharia be appointed to the office of Vice President, Apache Spark, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the Apache Spark Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Spark podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator Spark podling encumbered upon the Apache Incubator Project are hereafter discharged.
Re: [TODO] Document the release process for Apache Spark
Done, thanks. Feel free to edit it directly as well :)

On Sat, Feb 8, 2014 at 11:28 PM, Henry Saputra henry.sapu...@gmail.com wrote:

Cool! Thanks Patrick. Looks good to me.

Just one small recommendation about "Get Access to Apache Nexus for Publishing Artifacts": as I remember, you need to file an INFRA ticket [1] for your Apache id to get it? If so, it's probably a good idea to add that to the wiki.

- Henry

[1] https://issues.apache.org/jira

On Sat, Feb 8, 2014 at 9:42 PM, Patrick Wendell pwend...@gmail.com wrote:

I ported the release docs to the wiki today. Thanks for reminding me about this Henry: https://cwiki.apache.org/confluence/display/SPARK/Preparing+Spark+Releases

- Patrick

On Fri, Feb 7, 2014 at 11:51 AM, Henry Saputra henry.sapu...@gmail.com wrote:

Cool, thanks Patrick! Really appreciate it =)

- Henry

On Fri, Feb 7, 2014 at 11:46 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey Henry,

Let me document this on the wiki. I've already kept pretty thorough docs on this; I just need to migrate them to the wiki. I've created a JIRA here: https://spark-project.atlassian.net/browse/SPARK-1066

- Patrick

On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com wrote:

Hi Patrick,

As part of the unofficial checklist for graduation, we need to have documented steps to make a release. As the first and so far only RE for Apache Spark, I would like to ask for your help to document the steps to release. This will help other members do the release and take turns, to make sure all future PMC members and committers know how to do an Apache Spark release.

Most of the steps are probably similar to other projects, but it is always useful for each podling to have its own documentation for releasing artifacts.

Really appreciate your help.

Thanks,
- Henry
Re: How to write test cases for the functionalities which involves actor communication
It's possible to mock out actors... we have a few examples in the code base. One is here: https://github.com/apache/incubator-spark/blob/master/core/src/test/scala/org/apache/spark/deploy/worker/WorkerWatcherSuite.scala

On Sun, Feb 9, 2014 at 6:21 AM, Nan Zhu zhunanmcg...@gmail.com wrote:

Hi, all

I have a question when trying to write some test cases for my PR. The key functionality in the PR involves actor communication between master and worker: the worker does something and returns the result to the master via a message, and I want to test whether the master does the right thing according to the number of workers existing in the cluster and the result returned from the worker. Is there any way to test this via some test cases?

Thank you

Best,
-- Nan Zhu
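[For flavor, a hedged sketch of this kind of test with Akka's TestKit, as used in Spark's test suites of that era. The actors and messages below are illustrative stand-ins, not Spark's actual master/worker protocol; see WorkerWatcherSuite for a real example:]

  // Hypothetical master/worker protocol, for illustration only.
  import akka.actor.{Actor, ActorSystem, Props}
  import akka.testkit.{TestActorRef, TestProbe}

  case class RegisterWorker(id: String)
  case class WorkerRegistered(id: String)

  class MasterActor extends Actor {
    var workers = Set.empty[String]
    def receive = {
      case RegisterWorker(id) =>
        workers += id
        sender ! WorkerRegistered(id) // reply to the "worker"
    }
  }

  object MasterActorTest extends App {
    implicit val system = ActorSystem("test")
    val probe = TestProbe()                            // stands in for a worker
    val master = TestActorRef[MasterActor](Props[MasterActor])
    probe.send(master, RegisterWorker("worker-1"))
    probe.expectMsg(WorkerRegistered("worker-1"))      // assert the reply
    assert(master.underlyingActor.workers == Set("worker-1")) // assert state
    system.shutdown()
  }

TestActorRef makes message processing synchronous, so the master's internal state can be asserted right after the send.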
[SUMMARY] Proposal for Spark Release Strategy
Hey All,

Thanks to everyone who participated in this thread. I've distilled feedback based on the discussion and wanted to summarize the conclusions:

- People seem universally +1 on semantic versioning in general.
- People seem universally +1 on having public merge windows for releases.
- People seem universally +1 on a policy of having JIRAs associated with features.
- Everyone believes link-level compatibility should be the goal. Some people think we should outright promise it now. Others think we should either not promise it or promise it later.
-- Compromise: let's do one minor release (1.0 to 1.1) to convince ourselves this is possible (some issues with Scala traits will make this tricky). Then we can codify it in writing. I've created SPARK-1069 [1] to clearly establish that this is the goal for the 1.X family of releases.
- Some people think we should add particular features before having 1.0.
-- Version 1.X indicates API stability rather than a feature set; this was clarified.
-- That said, people still have several months to work on features if they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of the release guidelines to the wiki.

With all this said, I would like to move the master version to 1.0.0-SNAPSHOT, as the main concerns with this have been addressed and clarified. This merely represents a tentative consensus, and the release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
Re: [SUMMARY] Proposal for Spark Release Strategy
:P - I'm pretty sure this can be done but it will require some work - we already use the github API in our merge script, and we could hook something like that up with the Jenkins tests. Henry, maybe you could create a JIRA for this for Spark 1.0?

- Patrick

On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra m...@clearstorydata.com wrote:

I know that it can be done -- which is different from saying that I know how to set it up.

On Feb 8, 2014, at 2:57 PM, Henry Saputra henry.sapu...@gmail.com wrote:

Patrick, do you know if there is a way to check whether a Github PR's subject/title contains a JIRA number, and have Jenkins raise a warning if it doesn't?

- Henry

On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell pwend...@gmail.com wrote:

snip
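[The check Henry describes is a one-liner once the PR title is in hand; a hedged sketch follows, with the regex and wiring illustrative rather than an existing Jenkins hook:]

  // Sketch: flag pull requests whose title lacks a JIRA id.
  object PrTitleCheck extends App {
    val jiraId = """(?i)\bSPARK-\d+\b""".r
    def hasJiraId(title: String): Boolean = jiraId.findFirstIn(title).isDefined

    assert(hasJiraId("SPARK-1078: Replace lift-json with jackson"))
    assert(!hasJiraId("Fix typo in docs")) // this one would get a warning
  }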
Re: [TODO] Document the release process for Apache Spark
I ported the release docs to the wiki today. Thanks for reminding me about this Henry: https://cwiki.apache.org/confluence/display/SPARK/Preparing+Spark+Releases

- Patrick

On Fri, Feb 7, 2014 at 11:51 AM, Henry Saputra henry.sapu...@gmail.com wrote:

snip
Re: 0.9.0 forces log4j usage
Hey Paul,

Thanks for digging this up. I worked on this feature, and the intent was to give users good default behavior if they didn't include any logging configuration on the classpath. The problem with assuming that command-line tooling is going to handle it is that many people link against Spark as a library and run their application using their own scripts. In that case, the first thing people see when they run an application that links against Spark is a big ugly logging warning.

I'm not super familiar with log4j-over-slf4j, but this behavior of returning no appenders for the ROOT logger seems a little weird. What is the use case for using this and not just directly using slf4j-log4j12 like Spark itself does?

Did you have a more general fix for this in mind? Or was your plan to just revert the existing behavior... We might be able to add a configuration option to disable this logging default stuff. Or we could just rip it out - but I'd like to avoid that if possible.

- Patrick

On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown p...@mult.ifario.us wrote:

We have a few applications that embed Spark, and in 0.8.0 and 0.8.1 we were able to use slf4j, but 0.9.0 broke that and unintentionally forces direct use of log4j as the logging backend.

The issue is here in the org.apache.spark.Logging trait: https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107

log4j-over-slf4j *always* returns an empty enumeration for appenders to the ROOT logger: https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81

And this causes an infinite loop and an eventual stack overflow.

I'm happy to submit a Jira and a patch, but it would be a significant enough reversal of recent changes that it's probably worth discussing before I sink a half hour into it. My suggestion would be that initialization (or not) should be left to the user, with reasonable default behavior supplied by the Spark command-line tooling and not forced on applications that incorporate Spark.

Thoughts/opinions?

-- Paul

-- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
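[A simplified illustration of the failure mode Paul describes; this is a sketch of the pattern, not Spark's exact Logging code. With log4j-over-slf4j on the classpath the root logger never reports appenders, so the initialization branch logs, which re-enters log, which initializes again, until the stack overflows:]

  // Sketch of the recursion; not the literal Logging.scala implementation.
  trait Logging {
    private var initialized = false
    protected def log: org.slf4j.Logger = {
      if (!initialized) initializeLogging()
      org.slf4j.LoggerFactory.getLogger(this.getClass)
    }
    private def initializeLogging(): Unit = {
      val root = org.apache.log4j.LogManager.getRootLogger
      // Under log4j-over-slf4j this enumeration is always empty...
      if (!root.getAllAppenders.hasMoreElements) {
        // ...so we log before the flag is set, re-entering log() above.
        log.info("Using Spark's default log4j profile")
      }
      initialized = true
    }
  }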
Re: 0.9.0 forces log4j usage
Koert - my suggestion was this. We let users use any slf4j backend they want. If we detect that they are using the log4j backend and *also* they didn't configure any log4j appenders, we set up some nice defaults for them. If they are using another backend, Spark doesn't try to modify the configuration at all.

On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers ko...@tresata.com wrote:

well, "static binding" is probably the wrong terminology, but you get the idea. multiple backends are not allowed and cause an even uglier warning... see also here: https://github.com/twitter/scalding/pull/636 and here: https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls -- all me being annoying and complaining about slf4j-log4j12 dependencies (which did get removed).

On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers ko...@tresata.com wrote:

the issue is that slf4j uses static binding. you can put only one slf4j backend on the classpath, and that's what it uses. more than one is not allowed. so you either keep the slf4j-log4j12 dependency for spark, and then you take away people's choice of slf4j backend (which is considered bad form for a library), or you do not include it, and then people will always get the big fat ugly warning and slf4j logging will not flow to log4j. including log4j itself is not necessarily a problem, i think?

On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:

This also seems relevant - but not my area of expertise (whether this is a valid way to check this): http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to

On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey Guys,

Thanks for explaining. Ya, this is a problem - we didn't really know that people are using other slf4j backends. slf4j is in there for historical reasons, but I think we may assume in a few places that log4j is being used, and we should minimize those. We should patch this and get a fix into 0.9.1.

So some solutions I see are:

(a) Add a SparkConf option to disable this. I'm fine with this one.

(b) Ask slf4j which backend is active and only try to enforce this default if we know slf4j is using log4j. Do either of you know if this is possible? Not sure if slf4j exposes this.

(c) Just remove this default stuff. We'd rather not do this. The goal of this thing is to provide good usability for people who have linked against Spark and haven't done anything to configure logging. For beginners we try to minimize the assumptions about what else they know about, and I've found log4j configuration is a huge mental barrier for people who are getting started.

Paul, if you submit a patch doing (a) we can merge it in. If you have any idea whether (b) is possible, I prefer that one, but it may not be possible or might be brittle.

- Patrick

On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers ko...@tresata.com wrote:

Totally agree with Paul: a library should not pick the slf4j backend. It defeats the purpose of slf4j. That big ugly warning is there to alert people that it's their responsibility to pick the backend...

On Feb 7, 2014 3:55 AM, Paul Brown p...@mult.ifario.us wrote:

Hi, Patrick --

From slf4j, you can either backend it into log4j (which is the way that Spark is shipped) or you can route log4j through slf4j and then on to a different backend (e.g., logback). We're doing the latter and manipulating the dependencies in the build, because that's the way the enclosing application is set up.

The issue with the current situation is that there's no way for an end user to choose to *not* use the log4j backend. (My short-term solution was to use the Maven shade plugin to swap in a version of the Logging trait with the body of that method commented out.) In addition to the situation with log4j-over-slf4j and the empty enumeration of ROOT appenders, you might also run afoul of someone who intentionally configured log4j with an empty set of appenders at the time that Spark is initializing.

I'd be happy with any implementation that lets me choose my logging backend: override default behavior via system property, plug-in architecture, etc. I do think it's reasonable to expect someone digesting a substantial JDK-based system like Spark to understand how to initialize logging -- surely they're using logging of some kind elsewhere in their application -- but if you want the default behavior there as a courtesy, it might be worth putting an INFO (versus the glaring log4j WARN) message on the output that says something like "Initialized default logging via Log4J; pass -Dspark.logging.loadDefaultLogger=false to disable this behavior." so that it's both convenient and explicit.

Cheers.
-- Paul

-- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Fri, Feb 7, 2014 at 12:05 AM
Re: [TODO] Document the release process for Apache Spark
Hey Henry,

Let me document this on the wiki. I've already kept pretty thorough docs on this; I just need to migrate them to the wiki. I've created a JIRA here: https://spark-project.atlassian.net/browse/SPARK-1066

- Patrick

On Fri, Feb 7, 2014 at 11:35 AM, Henry Saputra henry.sapu...@gmail.com wrote:

snip
Re: 0.9.0 forces log4j usage
Ah okay sounds good. This is what I meant earlier by You have some other application that directly calls log4j i.e. you have for historical reasons installed the log4j-over-slf4j. Would you mind trying out this fix and seeing if it works? This is designed to be a hotfix for 0.9, not a general solution where we rip out log4j from our published dependencies: https://github.com/apache/incubator-spark/pull/560/files - Patrick On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown p...@mult.ifario.us wrote: Hi, Patrick -- I forget which other component is responsible, but we're using the log4j-over-slf4j as part of an overall requirement to centralize logging, i.e., *someone* else is logging over log4j and we're pulling that in. (There's also some jul logging from Jersey, etc.) Goals: - Fully control/capture all possible logging. (God forbid we have to grab System.out/err, but we'd do it if needed.) - Use the backend we like best at the moment. (Happens to be logback.) Possible cases: - If Spark used Log4j at all, we would pull in that logging via log4j-over-slf4j. - If Spark used only slf4j and referenced no backend, we would use it as-is although we'd still have the log4j-over-slf4j because of other libraries. - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we would exclude that one dependency (via our POM). Best. -- Paul -- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Paul, So if your goal is ultimately to output to logback. Then why don't you just use slf4j and logback-classic.jar as described here [1]. Why involve log4j-over-slf4j at all? Let's say we refactored the spark build so it didn't advertise slf4j-log4j12 as a dependency. Would you still be using log4j-over-slf4j... or is this just a fix to deal with the fact that Spark is somewhat log4j dependent at this point. [1] http://www.slf4j.org/manual.html - Patrick On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown p...@mult.ifario.us wrote: Hi, Patrick -- That's close but not quite it. The issue that occurs is not the delegation loop mentioned in slf4j documentation. The stack overflow is entirely within the code in the Spark trait: at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112) at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97) at org.apache.spark.Logging$class.log(Logging.scala:36) at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94) And then that repeats. As for our situation, we exclude the slf4j-log4j12 dependency when we import the Spark library (because we don't want to use log4j) and have log4j-over-slf4j already in place to ensure that all of the logging in the overall application runs through slf4j and then out through logback. (We also, as another poster already mentioned, also force jcl and jul through slf4j.) The zen of slf4j for libraries is that the library uses the slf4j API and then the enclosing application can route logging as it sees fit. Spark master CLI would log via slf4j and include the slf4j-log4j12 backend; same for Spark worker CLI. Spark as a library (versus as a container) would not include any backend to the slf4j API and leave this up to the application. (FWIW, this would also avoid your log4j warning message.) But as I was saying before, I'd be happy with a situation where I can avoid log4j being enabled or configured, and I think you'll find an existing choice of logging framework to be a common scenario for those embedding Spark in other systems. Best. 
-- Paul -- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell pwend...@gmail.com wrote: Paul, Looking back at your problem, I think it's the one here: http://www.slf4j.org/codes.html#log4jDelegationLoop So let me just be clear about what you are doing so I understand. You have some other application that directly calls log4j. So you have to include log4j-over-slf4j to route those logs through slf4j to logback. At the same time you embed Spark in this application. In the past it was fine, but now that Spark programmatically initializes log4j, it screws up your application because log4j-over-slf4j doesn't work with applications that do this explicitly, as discussed here: http://www.slf4j.org/legacy.html Correct? - Patrick On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers ko...@tresata.com wrote: got it. that sounds reasonable On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell pwend...@gmail.com wrote: Koert - my suggestion was this. We let users use any slf4j backend they want. If we detect that they are using the log4j backend and *also* they didn't configure any log4j appenders, we set up some nice defaults for them. If they are using another backend
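To make the detect-and-default idea in that last message concrete, here is a minimal sketch, assuming log4j 1.x and the slf4j-log4j12 binder; this is an illustration of the approach, not Spark's actual Logging trait, and the pattern layout is an assumption:

import org.apache.log4j.{ConsoleAppender, Level, LogManager, PatternLayout}
import org.slf4j.LoggerFactory

object LoggingDefaults {
  def maybeInstallLog4jDefaults(): Unit = {
    // Detect whether slf4j is bound to the log4j backend at all.
    val binder = LoggerFactory.getILoggerFactory.getClass.getName
    val usingLog4j = binder == "org.slf4j.impl.Log4jLoggerFactory"
    if (usingLog4j) {
      val root = LogManager.getRootLogger
      // Only install defaults if the user configured no appenders themselves.
      if (!root.getAllAppenders.hasMoreElements) {
        root.setLevel(Level.INFO)
        root.addAppender(new ConsoleAppender(
          new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")))
      }
    }
  }
}

With any other backend (e.g. logback behind log4j-over-slf4j), the binder check fails and the code never touches log4j, which is the property Paul and Koert are after.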
Re: Proposal for Spark Release Strategy
I like Heiko's proposal that requires every pull request to reference a JIRA. This is how things are done in Hadoop and it makes it much easier to, for example, find out whether an issue you came across when googling for an error is in a release. I think this is a good idea and something on which there is wide consensus. I separately was going to suggest this in a later e-mail (it's not directly tied to versioning). One of many reasons this is necessary is because it's becoming hard to track which features ended up in which releases. I agree with Mridul about binary compatibility. It can be a dealbreaker for organizations that are considering an upgrade. The two ways I'm aware of that break binary compatibility are Scala version upgrades and messing around with inheritance. Are these not avoidable at least for minor releases? This is clearly a goal but I'm hesitant to codify it until we understand all of the reasons why it might not work. I've heard in general with Scala there are many non-obvious things that can break binary compatibility and we need to understand what they are. I'd propose we add the migration tool [1] here to our build and use it for a few months and see what happens (hat tip to Michael Armbrust). It's easy to formalize this as a requirement later; it's impossible to go the other direction. For Scala major versions it's possible we can cross-build between 2.10 and 2.11 to retain link-level compatibility. It's just entirely uncharted territory and AFAIK no one who's suggesting this is speaking from experience maintaining this guarantee for a Scala project. That would be the strongest convincing reason for me - if someone has actually done this in the past in a Scala project and speaks from experience. Most of us are speaking from the perspective of Java projects where we understand well the trade-offs and costs of maintaining this guarantee. [1] https://github.com/typesafehub/migration-manager - Patrick
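For anyone who wants to try this, wiring the migration manager (MiMa) into an sbt build is small. The sketch below assumes the plugin coordinates and key names of the 0.1.x sbt-mima-plugin era, so treat them as assumptions and check the project's README:

// project/plugins.sbt
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.6")

// build.sbt: compare the current build against the last released artifact
import com.typesafe.tools.mima.plugin.MimaPlugin.mimaDefaultSettings
import com.typesafe.tools.mima.plugin.MimaKeys.previousArtifact

mimaDefaultSettings

previousArtifact := Some("org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating")

// running `sbt mima-report-binary-issues` then lists every
// binary-incompatible change relative to the previous release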
Re: Proposal for Spark Release Strategy
and the vision and make adjustments accordingly. Releasing 1.0.0 is a huge milestone, and if we do need to break APIs somehow or modify internal behavior dramatically, we could take advantage of the 1.0.0 release as a good step at which to do so. - Henry On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x. I just don't want to release a 1.0 simply because it seems to follow after 0.9, rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group, I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then, if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote: +1 on time boxed releases and compatibility guidelines On 06.02.2014 at 01:20, Patrick Wendell pwend...@gmail.com wrote: Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines. == Spark Release Structure == Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project. Each Spark release will be versioned: [MAJOR].[MINOR].[MAINTENANCE] All releases with the same major version number will have API compatibility, as defined in [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last 1 year or more. Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months. One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time, and over time, we might make the whole release process more regular (similar to Ubuntu). At the bottom of this document is an example window for the 1.0.0 release. Maintenance releases will occur more frequently and depend on specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs. However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features. When new components are added to Spark, they may initially be marked as alpha.
Alpha components do not have to abide by the above guidelines; however, they should try to do so to the maximum extent possible. Once they are marked stable they have to follow these guidelines. At present, GraphX is the only alpha component of Spark. [1] API compatibility: An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we might make it a requirement in the future, but challenges with things like Scala versions have made this difficult to guarantee in the past. == Merging Pull Requests == To merge pull requests, committers are encouraged to use this tool [2] to collapse the request into one commit rather than manually performing git merges. It will also format the commit message nicely in a way that can be easily parsed later when writing credits. Currently
Proposal for Spark Release Strategy
Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines. == Spark Release Structure == Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project. Each Spark release will be versioned: [MAJOR].[MINOR].[MAINTENANCE] All releases with the same major version number will have API compatibility, as defined in [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last 1 year or more. Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months. One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time, and over time, we might make the whole release process more regular (similar to Ubuntu). At the bottom of this document is an example window for the 1.0.0 release. Maintenance releases will occur more frequently and depend on specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs. However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features. When new components are added to Spark, they may initially be marked as alpha. Alpha components do not have to abide by the above guidelines; however, they should try to do so to the maximum extent possible. Once they are marked stable they have to follow these guidelines. At present, GraphX is the only alpha component of Spark. [1] API compatibility: An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we might make it a requirement in the future, but challenges with things like Scala versions have made this difficult to guarantee in the past. == Merging Pull Requests == To merge pull requests, committers are encouraged to use this tool [2] to collapse the request into one commit rather than manually performing git merges. It will also format the commit message nicely in a way that can be easily parsed later when writing credits. Currently it is maintained in a public utility repository, but we'll merge it into mainline Spark soon.
[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py == Tentative Release Window for 1.0.0 == Feb 1st - April 1st: General development April 1st: Code freeze for new features April 15th: RC1 == Deviations == For now, the proposal is to consider these tentative guidelines. We can vote to formalize these as project rules at a later time after some experience working with them. Once formalized, any deviation from these guidelines will be subject to a lazy majority vote. - Patrick
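As a small illustration of the compatibility rule defined in the proposal above (purely illustrative code, not part of the proposal itself):

// [MAJOR].[MINOR].[MAINTENANCE]: releases sharing a major version are API compatible.
object VersioningRules {
  case class SparkVersion(major: Int, minor: Int, maintenance: Int)

  def parse(v: String): SparkVersion = {
    val Array(major, minor, maintenance) = v.split("\\.").map(_.toInt)
    SparkVersion(major, minor, maintenance)
  }

  def apiCompatible(a: SparkVersion, b: SparkVersion): Boolean =
    a.major == b.major

  // apiCompatible(parse("1.0.0"), parse("1.3.2")) == true
  // apiCompatible(parse("1.3.2"), parse("2.0.0")) == false
}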
Re: Proposal for Spark Release Strategy
How are Alpha components and higher level libraries which may add small features within a maintenance release going to be marked with that status? Somehow/somewhere within the code itself, or just as some kind of external reference? I think we'd mark alpha features as such in the java/scaladoc. This is what Scala does with experimental features. Higher level libraries are anything that isn't Spark core. Maybe we can formalize this more somehow. We might be able to annotate the new features as experimental if they end up in a patch release. This could make it more clear. I would strongly encourage that developers submitting pull requests include within the description of that PR whether you intend the contribution to be mergeable at the maintenance level, minor level, or major level. That will help those of us doing code reviews and merges decide where the code should go and how closely to scrutinize the PR for changes that are not compatible with the intended release level. I'd say the default is the minor level. If contributors know it should be added in a maintenance release, it's great if they say so. However, I'd say this is also a responsibility of the committers, since individual contributors may not know. It will probably be a while before major level patches are being merged :P
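A sketch of what such a scaladoc-visible marker could look like; the annotation name below is hypothetical, since the thread had not settled on one:

import scala.annotation.StaticAnnotation

/** Marks an API that may change or disappear in minor releases (hypothetical name). */
class alphaComponent extends StaticAnnotation

@alphaComponent
class SomeNewGraphApi  // readers of the generated scaladoc see the alpha marker immediately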
Re: Proposal for Spark Release Strategy
If people feel that merging the intermediate SNAPSHOT number is significant, let's just defer merging that until this discussion concludes. That said - the decision to settle on 1.0 for the next release is not just because it happens to come after 0.9. It's a conscious decision based on the development of the project to this point. A major focus of the 0.9 release was tying off loose ends in terms of backwards compatibility (e.g. spark configuration). There was some discussion back then of maybe cutting a 1.0 release but the decision was deferred until after 0.9. @mridul - please see the original post for discussion about binary compatibility. On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski andykonwin...@gmail.com wrote: +1 for 0.10.0 now with the option to switch to 1.0.0 after further discussion. On Feb 5, 2014 9:53 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x. I just don't want to release a 1.0 simply because it seems to follow after 0.9, rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group, I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then, if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote: +1 on time boxed releases and compatibility guidelines On 06.02.2014 at 01:20, Patrick Wendell pwend...@gmail.com wrote: Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines. == Spark Release Structure == Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project. Each Spark release will be versioned: [MAJOR].[MINOR].[MAINTENANCE] All releases with the same major version number will have API compatibility, as defined in [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last 1 year or more. Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months. One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time, and over time, we might make the whole release process more regular (similar to Ubuntu).
At the bottom of this document is an example window for the 1.0.0 release. Maintenance releases will occur more frequently and depend on specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs. However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features. When new components are added to Spark, they may initially be marked as alpha. Alpha components do not have to abide by the above guidelines; however, they should try to do so to the maximum extent possible. Once they are marked stable they have to follow these guidelines. At present, GraphX is the only alpha component of Spark. [1] API compatibility: An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)
It takes a day or two to package the release once it passes votes and cut it to Maven. Coming soon! On Sat, Feb 1, 2014 at 8:08 PM, Kapil Malik kma...@adobe.com wrote: Awesome ! Thanks everyone :) -Original Message- From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: 02 February 2014 08:09 To: dev@spark.incubator.apache.org; j...@cs.berkeley.edu Kottalam Subject: Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5) Yup, we're still working on putting it on the website, but this is the final release. You can download the RC5 artifacts from http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-0-9-0-incubating-rc5-td318.html. Matei On Feb 1, 2014, at 12:51 PM, Jey Kottalam j...@cs.berkeley.edu wrote: Hi Kapil, It looks to me like the artifacts in Maven are the official 0.9.0 release, though the website has not yet been updated. The IPMC approved RC5 as of yesterday: https://mail-archives.apache.org/mod_mbox/incubator-general/201401.mbox/cabpqxstjm+po7_22bdybqxk90zsy3pnxppft87-9xdff98u...@mail.gmail.com -Jey On Sat, Feb 1, 2014 at 8:19 AM, Kapil Malik kma...@adobe.com wrote: Hi Stevo, Thanks for the link. Indeed, different versions are available on the Maven repository which I can clone/sync for development purposes. But I'm more confident about the official release version when deploying to a cluster which is used by multiple people. Hence curious about the date for the official 0.9 release. Thanks and regards, Kapil -Original Message- From: Stevo Slavić [mailto:ssla...@gmail.com] Sent: 01 February 2014 21:33 To: dev@spark.incubator.apache.org Subject: Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5) Apache Spark 0.9.0 artifacts are on Maven central repo (see http://central.maven.org/maven2/org/apache/spark/spark-core_2.10/0.9.0-incubating/) Kind regards, Stevo Slavic On Sat, Feb 1, 2014 at 4:59 PM, Kapil Malik kma...@adobe.com wrote: Sent too early ... 1 week* (maybe I refreshed too fast) -Original Message- From: Kapil Malik Sent: 01 February 2014 21:27 To: dev@spark.incubator.apache.org Subject: RE: [VOTE] Release Apache Spark 0.9.0-incubating (rc5) +1 for Q ! Have been monitoring this thread from the past 3 weeks in anticipation :) Any tentative dates for the official 0.9 release ? Kapil Malik | kma...@adobe.com | 33430 / 8800836581 -Original Message- From: C. Ross Jam [mailto:cross...@crossjam.net] Sent: 01 February 2014 21:18 To: dev@spark.incubator.apache.org Subject: Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5) Curious lurker here. Did this vote close successfully? Should I wait for an official 0.9 release? Cheers! On Friday, January 24, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0.
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)
I'll add my own +1. On Tue, Jan 28, 2014 at 12:45 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Stephen, Yes, this runs afoul of good practice in Maven, where a given version shouldn't be re-used. As far as I understand though, it is required by the way the Apache release process works. The artifacts and repository content that get voted on need to exactly match the final release. So we can't hold a vote on a version of the code where everything says -rcx, then go back and change the source code and do a second push to Maven with code that doesn't have an -rcx suffix. This would effectively change the code that is being released. I was thinking as a workaround that maybe we could publish a second set of staging artifacts that are versioned with -rcX for people to test against. I think as long as we make it clear that these are not the official artifacts being voted on it might be okay. I'm not totally sure if this is allowed though. - Patrick On Tue, Jan 28, 2014 at 9:01 AM, Stephen Haberman stephen.haber...@gmail.com wrote: Hi Patrick, The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1006/ I was going to import this rc5 release into our internal Maven repo to try it out, but noticed that the version doesn't have rc5 in it. This means that, if there is an rc6, I'll have to re-import over the same artifacts, which is generally not a good thing given Maven assumes artifacts never change. Is this restriction required by the blessing process, or would it be possible to sneak rc5 into the pre-final version number? For now, I'll just build a local version, at the same commit, but with the version set to 0.9.0-incubating-rc5. Apologies if this was discussed before and I just missed it. - Stephen
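For testing a staged RC without re-versioning it locally the way Stephen describes, one alternative is to resolve straight from the staging repository; a minimal sbt sketch, using the staging URL from this thread (the repo disappears once it is dropped or promoted):

resolvers += "Apache Spark staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1006/"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"

Of course this has exactly the mutability problem Stephen raises: if rc6 is staged under the same version, locally cached artifacts go stale, so the local ivy/maven cache may need cleaning between RCs.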
Re: Moving to Typesafe Config?
Hey Heiko, Spark 0.9 introduced a common config class for Spark applications. It also (initially) supported loading config files in the nested typesafe format, but this was removed last minute due to a bug. In 1.0 we'll probably add support for config files, though it may not support typesafe's tree-style config files, because that conflicts with the naming style of several spark options (we have options where x.y and x.y.z are both named keys, and the typesafe parser doesn't allow that). - Patrick On Mon, Jan 27, 2014 at 8:59 AM, Heiko Braun ike.br...@googlemail.com wrote: Thanks. I found the discussion myself ;) /heiko On 27.01.2014 at 17:34, Mark Hamstra m...@clearstorydata.com wrote: And it would be more helpful if I gave you a usable link http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html Sent from my iPhone On Jan 27, 2014, at 8:13 AM, Heiko Braun ike.br...@googlemail.com wrote: Thanks Mark. On 27 Jan 2014, at 17:05, Mark Hamstra m...@clearstorydata.com wrote: Been done and undone, and will probably be redone for 1.0. See https://mail.google.com/mail/ca/u/0/#search/config/143a6c39e3995882 On Mon, Jan 27, 2014 at 7:58 AM, Heiko Braun ike.br...@googlemail.com wrote: Is there any interest in moving to a more structured approach for configuring spark components? I.e. moving to the typesafe config [1]. Since spark already leverages akka, this seems to be a reasonable choice IMO. [1] https://github.com/typesafehub/config Regards, Heiko
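The naming conflict Patrick mentions is easy to reproduce against the Typesafe Config API directly; a minimal sketch, with the behavior as reported in the "Config properties broken in master" thread later in this archive:

import java.util.Properties
import com.typesafe.config.ConfigFactory

object ConfigConflictDemo extends App {
  val props = new Properties()
  props.setProperty("spark.speculation", "true")            // a leaf value...
  props.setProperty("spark.speculation.multiplier", "0.95") // ...and a child under the same path

  val conf = ConfigFactory.parseProperties(props)
  // "spark.speculation" must become an object node to hold "multiplier",
  // so its flat value "true" is dropped per the parseProperties rules;
  // conf.getBoolean("spark.speculation") would now fail with a ConfigException.
  println(conf.root().render())
}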
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)
Hey Taka, If you build a second version you need to clean the existing assembly jar. The reference implementations of the tests are the ones on the U.C. Berkeley Jenkins. These are passing for Branch 0.9 for both Hadoop 1 and Hadoop 2 versions, so I'm inclined to think it's an issue with your test env or setup. https://amplab.cs.berkeley.edu/jenkins/view/Spark/ - Patrick On Sun, Jan 26, 2014 at 10:52 PM, Reynold Xin r...@databricks.com wrote: It is possible that you have generated the assembly jar using one version of Hadoop, and then another assembly jar with another version. Those tests that failed are all using a local cluster that sets up multiple processes, which would require launching Spark worker processes using the assembly jar. If that's indeed the problem, removing the extra assembly jars should fix them. On Sun, Jan 26, 2014 at 10:49 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: If I build Spark for Hadoop 1.0.4 (either SPARK_HADOOP_VERSION=1.0.4 sbt/sbt assembly or sbt/sbt assembly) or use the binary distribution, 'sbt/sbt test' runs successfully. However, if I build Spark targeting any other Hadoop versions (e.g. SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly, SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly), I'm getting the following errors with 'sbt/sbt test': 1) type mismatch errors with JavaPairDStream.scala 2) the following test failures [error] Failed tests: [error] org.apache.spark.ShuffleNettySuite [error] org.apache.spark.ShuffleSuite [error] org.apache.spark.FileServerSuite [error] org.apache.spark.DistributedSuite I don't have Hadoop 1.0.4 installed on my test systems (but the test succeeds, and fails with the installed Hadoop versions). I'm seeing these sbt test errors with the previous 0.9.0 RCs and 0.8.1, too. I'm wondering if anyone else has seen this problem or I'm missing something to run the test correctly. Thanks, Taka On Sat, Jan 25, 2014 at 5:00 PM, Sean McNamara sean.mcnam...@webtrends.com wrote: +1 On 1/25/14, 4:04 PM, Mark Hamstra m...@clearstorydata.com wrote: +1 On Sat, Jan 25, 2014 at 2:37 PM, Andy Konwinski andykonwin...@gmail.com wrote: +1 On Sat, Jan 25, 2014 at 2:27 PM, Reynold Xin r...@databricks.com wrote: +1 On Jan 25, 2014, at 12:07 PM, Hossein fal...@gmail.com wrote: +1 Compiled and tested on Mavericks. --Hossein On Sat, Jan 25, 2014 at 11:38 AM, Patrick Wendell pwend...@gmail.com wrote: I'll kick off the voting with a +1. On Thu, Jan 23, 2014 at 11:33 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 95d28ff3): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=95d28ff3d0d20d9c583e184f9e2c5ae842d8a4d9 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5 Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1006/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Monday, January 27, at 07:30 UTC and passes if a majority of at least 3 +1 PPMC votes are cast.
[ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
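A practical footnote to the stale-assembly issue Patrick and Reynold diagnose at the top of this thread: cleaning before rebuilding for a different Hadoop version avoids leaving two assembly jars around, e.g. (following the command shape already used in the thread; `clean` simply removes previously built artifacts first):

SPARK_HADOOP_VERSION=2.2.0 sbt/sbt clean assembly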
[RESULT] [VOTE] Release Apache Spark 0.9.0-incubating (rc5)
Voting is now closed. This vote passes with 5 binding +1 votes and no 0 or -1 votes. This vote will now go to the IPMC list for a second 72-hour vote. Spark developers are encouraged to comment on the IPMC vote as well. The totals are: +1: Patrick Wendell* Hossein Falaki Reynold Xin* Andy Konwinski* Mark Hamstra* Sean McNamara* 0: (none) -1: (none) On Sun, Jan 26, 2014 at 10:58 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Taka, If you build a second version you need to clean the existing assembly jar. The reference implementations of the tests are the ones on the U.C. Berkeley Jenkins. These are passing for Branch 0.9 for both Hadoop 1 and Hadoop 2 versions, so I'm inclined to think it's an issue with your test env or setup. https://amplab.cs.berkeley.edu/jenkins/view/Spark/ - Patrick On Sun, Jan 26, 2014 at 10:52 PM, Reynold Xin r...@databricks.com wrote: It is possible that you have generated the assembly jar using one version of Hadoop, and then another assembly jar with another version. Those tests that failed are all using a local cluster that sets up multiple processes, which would require launching Spark worker processes using the assembly jar. If that's indeed the problem, removing the extra assembly jars should fix them. On Sun, Jan 26, 2014 at 10:49 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: If I build Spark for Hadoop 1.0.4 (either SPARK_HADOOP_VERSION=1.0.4 sbt/sbt assembly or sbt/sbt assembly) or use the binary distribution, 'sbt/sbt test' runs successfully. However, if I build Spark targeting any other Hadoop versions (e.g. SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly, SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly), I'm getting the following errors with 'sbt/sbt test': 1) type mismatch errors with JavaPairDStream.scala 2) the following test failures [error] Failed tests: [error] org.apache.spark.ShuffleNettySuite [error] org.apache.spark.ShuffleSuite [error] org.apache.spark.FileServerSuite [error] org.apache.spark.DistributedSuite I don't have Hadoop 1.0.4 installed on my test systems (but the test succeeds, and fails with the installed Hadoop versions). I'm seeing these sbt test errors with the previous 0.9.0 RCs and 0.8.1, too. I'm wondering if anyone else has seen this problem or I'm missing something to run the test correctly. Thanks, Taka On Sat, Jan 25, 2014 at 5:00 PM, Sean McNamara sean.mcnam...@webtrends.com wrote: +1 On 1/25/14, 4:04 PM, Mark Hamstra m...@clearstorydata.com wrote: +1 On Sat, Jan 25, 2014 at 2:37 PM, Andy Konwinski andykonwin...@gmail.com wrote: +1 On Sat, Jan 25, 2014 at 2:27 PM, Reynold Xin r...@databricks.com wrote: +1 On Jan 25, 2014, at 12:07 PM, Hossein fal...@gmail.com wrote: +1 Compiled and tested on Mavericks. --Hossein On Sat, Jan 25, 2014 at 11:38 AM, Patrick Wendell pwend...@gmail.com wrote: I'll kick off the voting with a +1. On Thu, Jan 23, 2014 at 11:33 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail.
The tag to be voted on is v0.9.0-incubating (commit 95d28ff3): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=95d28ff3d0d20d9c583e184f9e2c5ae842d8a4d9 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5 Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1006/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc5-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Monday, January 27, at 07:30 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc4) [new thread]
Hey Tom, Matei had to remove this because it turns out that there was a fairly serious bug in the Typesafe config library we use for parsing conf files [1]. There wasn't an immediate solution to this so he just removed the capability for this release and we can revisit it in the next release. [1] http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html - Patrick On Wed, Jan 22, 2014 at 8:18 AM, Tom Graves tgraves...@yahoo.com wrote: It looks like the latest round of changes took out spark.conf. Are there plans to add this back in (jira)? Tom On Wednesday, January 22, 2014 3:46 AM, Henry Saputra henry.sapu...@gmail.com wrote: Would love to hear from Mridul to verify the fixes for problems he saw are in. On Tuesday, January 21, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 0771df67): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=0771df675363c69622404cb514bd751bc90526af The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4 Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1005/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Friday, January 24, at 11:15 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc4) [new thread]
Btw - to be clear, this was an incompatibility between Spark's config names and constraints on names imposed by typesafe. So I didn't mean to imply there was something broken in their config library. On Wed, Jan 22, 2014 at 9:14 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Tom, Matei had to remove this because it turns out that there was a fairly serious bug in the Typesafe config library we use for parsing conf files [1]. There wasn't an immediate solution to this so he just removed the capability for this release and we can revisit it in the next release. [1] http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html - Patrick On Wed, Jan 22, 2014 at 8:18 AM, Tom Graves tgraves...@yahoo.com wrote: It looks like the latest round of changes took out spark.conf. Are there plans to add this back in (jira)? Tom On Wednesday, January 22, 2014 3:46 AM, Henry Saputra henry.sapu...@gmail.com wrote: Would love to hear from Mridul to verify the fixes for problems he saw are in. On Tuesday, January 21, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 0771df67): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=0771df675363c69622404cb514bd751bc90526af The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4 Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1005/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc4-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Friday, January 24, at 11:15 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: Config properties broken in master
Hey Mridul, this was patched and we cut a new release candidate. There were several different config options which had a.b and a.b.c... they should all work in the new RC. On Sun, Jan 19, 2014 at 4:56 AM, Mridul Muralidharan mri...@gmail.com wrote: Chanced upon spill-related configs which exhibit the same pattern ... - Mridul On Sun, Jan 19, 2014 at 1:10 AM, Reynold Xin r...@databricks.com wrote: I also just went over the config options to see how pervasive this is. In addition to speculation, there is one more conflict of this kind: spark.locality.wait spark.locality.wait.node spark.locality.wait.process spark.locality.wait.rack spark.speculation spark.speculation.interval spark.speculation.multiplier spark.speculation.quantile On Sat, Jan 18, 2014 at 11:36 AM, Matei Zaharia matei.zaha...@gmail.com wrote: This is definitely an important issue to fix. Instead of renaming properties, one solution would be to replace Typesafe Config with just reading Java system properties, and disable config files for this release. I kind of like that over renaming. Matei On Jan 18, 2014, at 11:30 AM, Mridul Muralidharan mri...@gmail.com wrote: Hi, Speculation was an example; there are others in spark which are affected by this ... Some of them have been around for a while, so this will break existing code/scripts. Regards, Mridul On Sun, Jan 19, 2014 at 12:51 AM, Nan Zhu zhunanmcg...@gmail.com wrote: change spark.speculation to spark.speculation.switch? maybe we can require that all properties in Spark be three levels On Sat, Jan 18, 2014 at 2:10 PM, Mridul Muralidharan mri...@gmail.com wrote: Hi, Unless I am mistaken, the change to using typesafe ConfigFactory has broken some of the system properties we use in spark. For example: if we have both -Dspark.speculation=true -Dspark.speculation.multiplier=0.95 set, then the spark.speculation property is dropped. The rules of parseProperties actually document this clearly [1] I am not sure what the right fix here would be (other than replacing the use of config, that is). Any thoughts ? I would vote -1 for 0.9 to be released before this is fixed. Regards, Mridul [1] http://typesafehub.github.io/config/latest/api/com/typesafe/config/ConfigFactory.html#parseProperties%28java.util.Properties,%20com.typesafe.config.ConfigParseOptions%29
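A quick, purely illustrative way to enumerate the conflicts Reynold lists above (a key conflicts when it is also a proper prefix of another key); this is not code from the Spark tree:

object ConflictScan extends App {
  val keys = Seq(
    "spark.speculation", "spark.speculation.interval",
    "spark.speculation.multiplier", "spark.speculation.quantile",
    "spark.locality.wait", "spark.locality.wait.node",
    "spark.locality.wait.process", "spark.locality.wait.rack")

  // A key is conflicting if some other key treats it as a parent path.
  val conflicting = keys.filter(k => keys.exists(_.startsWith(k + ".")))
  println(conflicting)  // List(spark.speculation, spark.locality.wait)
}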
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)
This vote is cancelled in favor of rc3, which fixes the YARN issue Sandy ran into. @taka - thanks for reporting that bug. It's not enough to block this release, however. Once a fix exists we can merge it into the 0.9 branch and it will be in 0.9.1 On Sun, Jan 19, 2014 at 12:37 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: I've found a problem with the cartesian method on Pyspark and filed it as SPARK-1034 https://spark-project.atlassian.net/browse/SPARK-1034 0.8.1 doesn't have this problem. In Scala, the cartesian method works fine. It would also be nice if SPARK-978 could be fixed, too. https://spark-project.atlassian.net/browse/SPARK-978 Thanks, Taka On Sun, Jan 19, 2014 at 1:24 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Has anybody tested against YARN 2.2? I tried it out against a pseudo-distributed cluster and ran into an issue I just filed as SPARK-1031 https://spark-project.atlassian.net/browse/SPARK-1031 . thanks, Sandy On Sun, Jan 19, 2014 at 12:55 AM, Reynold Xin r...@databricks.com wrote: +1 On Sat, Jan 18, 2014 at 11:11 PM, Patrick Wendell pwend...@gmail.com wrote: I'll kick off the voting with a +1. On Sat, Jan 18, 2014 at 11:05 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 00c847a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=00c847af1d4be2fe5fad887a57857eead1e517dc The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1003/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Wednesday, January 22, at 07:05 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)
Attempting to attach the release notes again (I think it may have been blocked previously due to not having an extension). On Sun, Jan 19, 2014 at 8:05 PM, Patrick Wendell pwend...@gmail.com wrote: I'll add my +1 as well On Sun, Jan 19, 2014 at 7:33 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Re-tested on Mac. Matei On Jan 19, 2014, at 7:09 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Starting off. +1 On Sun, Jan 19, 2014 at 2:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit a7760eff): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=a7760eff4ea6a474cab68896a88550f63bae8b0d The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1004/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Wednesday, January 22, at 22:15 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/ Spark 0.9.0 is a major release that adds significant new features. It updates Spark to Scala 2.10, simplifies high availability, and updates numerous components of the project. This release includes a first version of GraphX, a powerful new framework for graph processing that comes with a library of standard algorithms. In addition, Spark Streaming is now out of alpha, and includes significant optimizations and simplified high availability deployment. ### Scala 2.10 Support Spark now runs on Scala 2.10, letting users benefit from the language and library improvements in this version. ### Configuration System The new [SparkConf] class is now the preferred way to configure advanced settings on your SparkContext, though the previous Java system property still works. SparkConf is especially useful in tests to make sure properties don't stay set across tests. ### Spark Streaming Improvements Spark Streaming is no longer alpha, and comes with simplified high availability and several optimizations. * When running on a Spark standalone cluster with the [standalone cluster high availability mode], you can submit a Spark Streaming driver application to the cluster and have it automatically recovered if either the driver or the cluster master crashes. * Windowed operators have been sped up by 30-50%. * Spark Streaming's input source plugins (e.g. for Twitter, Kafka and Flume) are now separate projects, making it easier to pull in only the dependencies you need. * A new StreamingListener interface has been added for monitoring statistics about the streaming computation. * A few aspects of the API have been improved: * `DStream` and `PairDStream` classes have been moved from `org.apache.spark.streaming` to `org.apache.spark.streaming.dstream` to keep it consistent with `org.apache.spark.rdd.RDD`.
* `DStream.foreach` -> `DStream.foreachRDD` to make it explicit that it works for every RDD, not every element * `StreamingContext.awaitTermination()` allows you to wait for context shutdown and catch any exception that occurs in the streaming computation. * `StreamingContext.stop()` now allows stopping of the StreamingContext without stopping the underlying SparkContext. ### GraphX Alpha GraphX is a new API for graph processing that uses recent advances in graph-parallel computation. It lets you build a graph within a Spark program using the standard Spark operators, then process it with new graph operators that are optimized for distributed computation. It includes basic transformations, a Pregel API for iterative computation, and a standard library of graph loaders and analytics algorithms. By offering these features within the Spark engine, GraphX can significantly speed up processing tasks compared to workflows that use different engines. GraphX features in this release include: * Building graphs from arbitrary Spark RDDs * Basic operations to transform graphs or extract subgraphs * An optimized Pregel API that takes advantage of graph partitioning and indexing * Standard algorithms including PageRank, connected components, strongly connected components, SVD++, and triangle counting * Interactive use from the Spark shell GraphX
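To make the new streaming shutdown semantics in those notes concrete, here is a minimal sketch against the 0.9 streaming API; the socket source, host, and port are placeholders:

import org.apache.spark.streaming.{Seconds, StreamingContext}

object ShutdownDemo {
  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[2]", "ShutdownDemo", Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder input source
    lines.foreachRDD(rdd => println("batch count: " + rdd.count()))  // note: foreachRDD, not foreach

    ssc.start()
    ssc.awaitTermination()  // blocks, and rethrows any error from the streaming computation
    // Alternatively: ssc.stop(false) stops the streaming computation but
    // leaves the underlying SparkContext running for further batch work.
  }
}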
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)
Eventually the notes get posted on the apache website. I attached them to this e-mail so that people can get a sense of what is in the release before they vote on it. On Sun, Jan 19, 2014 at 9:57 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Patrick, quick question, where are you planning to add the release notes? I don't think it is part of the source, is it? - Henry On Sun, Jan 19, 2014 at 8:41 PM, Patrick Wendell pwend...@gmail.com wrote: Attempting to attach the release notes again (I think it may have been blocked previously due to not having an extension). On Sun, Jan 19, 2014 at 8:05 PM, Patrick Wendell pwend...@gmail.com wrote: I'll add my +1 as well On Sun, Jan 19, 2014 at 7:33 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Re-tested on Mac. Matei On Jan 19, 2014, at 7:09 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Starting off. +1 On Sun, Jan 19, 2014 at 2:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit a7760eff): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=a7760eff4ea6a474cab68896a88550f63bae8b0d The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1004/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Wednesday, January 22, at 22:15 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)
Mridul, thanks a *lot* for pointing this out. This is indeed an issue and something which warrants cutting a new RC. - Patrick On Sat, Jan 18, 2014 at 11:14 AM, Mridul Muralidharan mri...@gmail.com wrote: I would vote -1 for this release until we resolve config property issue [1] : if there is a known resolution for this (which I could not find unfortunately, apologies if it exists !), then will change my vote. Thanks, Mridul [1] http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html On Thu, Jan 16, 2014 at 7:18 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 7348893): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=7348893f0edd96dacce2f00970db1976266f7008 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1001/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Sunday, January 19, at 02:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)
This vote is cancelled in favor of rc2 which I'll post shortly. On Sat, Jan 18, 2014 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote: Mridul, thanks a *lot* for pointing this out. This is indeed an issue and something which warrants cutting a new RC. - Patrick On Sat, Jan 18, 2014 at 11:14 AM, Mridul Muralidharan mri...@gmail.com wrote: I would vote -1 for this release until we resolve config property issue [1] : if there is a known resolution for this (which I could not find unfortunately, apologies if it exists !), then will change my vote. Thanks, Mridul [1] http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html On Thu, Jan 16, 2014 at 7:18 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 7348893): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=7348893f0edd96dacce2f00970db1976266f7008 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1001/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Sunday, January 19, at 02:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)
I'll kick off the voting with a +1. On Sat, Jan 18, 2014 at 11:05 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.9.0. A draft of the release notes along with the changes file is attached to this e-mail. The tag to be voted on is v0.9.0-incubating (commit 00c847a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=00c847af1d4be2fe5fad887a57857eead1e517dc The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1003/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2-docs/ Please vote on releasing this package as Apache Spark 0.9.0-incubating! The vote is open until Wednesday, January 22, at 07:05 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)
I also ran your example locally and it worked with 0.8.1 and 0.9.0-rc1. So it's possible somehow you are pulling in an older version of Spark or an incompatible version of Hadoop. - Patrick On Thu, Jan 16, 2014 at 9:39 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Alex, Thanks for testing out this rc. Would you mind forking this into a different thread so we can discuss there? Also, does your application build and run correctly with spark 0.8.1? That would determine whether the problem is specifically with this rc... Patrick --- sent from my phone On Jan 15, 2014 11:44 PM, Alex Cozzi alexco...@gmail.com wrote: Oh, I forgot: I am using the “yarn” maven profile to target yarn 2.2 Alex Cozzi alexco...@gmail.com On Jan 15, 2014, at 11:41 PM, Alex Cozzi alexco...@gmail.com wrote: Just testing out the rc1. I created a dependent project (using maven) and I copied the HdfsTest.scala test, but I added a single line to save the file back to disk:

package org.apache.spark.examples

import org.apache.spark._

object HdfsTest {
  def main(args: Array[String]) {
    val sc = new SparkContext(args(0), "HdfsTest",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
    val file = sc.textFile(args(1))
    val mapped = file.map(s => s.length).cache()
    for (iter <- 1 to 10) {
      val start = System.currentTimeMillis()
      for (x <- mapped) { x + 2 }
      // println("Processing: " + x)
      val end = System.currentTimeMillis()
      println("Iteration " + iter + " took " + (end - start) + " ms")
      mapped.saveAsTextFile("out")
    }
    System.exit(0)
  }
}

and this is my pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>my.examples</groupId>
  <artifactId>spark-samples</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <inceptionYear>2014</inceptionYear>
  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.tools.version>2.10</scala.tools.version>
    <scala.version>2.10.0</scala.version>
  </properties>
  <repositories>
    <repository>
      <id>spark staging</id>
      <url>https://repository.apache.org/content/repositories/orgapachespark-1001</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.tools.version}</artifactId>
      <version>0.9.0-incubating</version>
    </dependency>
    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2_${scala.tools.version}</artifactId>
      <version>1.13</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.tools.version}</artifactId>
      <version>2.0.M6-SNAP8</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.1.6</version>
        <configuration>
          <scalaCompatVersion>2.10</scalaCompatVersion>
          <jvmArgs>
            <jvmArg>-Xms128m</jvmArg>
            <jvmArg>-Xmx2048m</jvmArg>
          </jvmArgs>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)
I'll kick this vote off with a +1. On Thu, Jan 16, 2014 at 10:43 AM, Patrick Wendell pwend...@gmail.com wrote: I also ran your example locally and it worked with 0.8.1 and 0.9.0-rc1. So it's possible somehow you are pulling in an older version of Spark or an incompatible version of Hadoop. - Patrick On Thu, Jan 16, 2014 at 9:39 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Alex, Thanks for testing out this rc. Would you mind forking this into a different thread so we can discuss there? Also, does your application build and run correctly with spark 0.8.1? That would determine whether the problem is specifically with this rc... Patrick --- sent from my phone On Jan 15, 2014 11:44 PM, Alex Cozzi alexco...@gmail.com wrote: Oh, I forgot: I am using the “yarn” maven profile to target yarn 2.2 Alex Cozzi alexco...@gmail.com On Jan 15, 2014, at 11:41 PM, Alex Cozzi alexco...@gmail.com wrote: Just testing out the rc1. I created a dependent project (using maven) and copied the HdfsTest.scala test, but I added a single line to save the file back to disk:

  package org.apache.spark.examples

  import org.apache.spark._

  object HdfsTest {
    def main(args: Array[String]) {
      val sc = new SparkContext(args(0), "HdfsTest",
        System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
      val file = sc.textFile(args(1))
      val mapped = file.map(s => s.length).cache()
      for (iter <- 1 to 10) {
        val start = System.currentTimeMillis()
        for (x <- mapped) { x + 2 } // println("Processing: " + x)
        val end = System.currentTimeMillis()
        println("Iteration " + iter + " took " + (end - start) + " ms")
        mapped.saveAsTextFile("out")
      }
      System.exit(0)
    }
  }

and this is my pom file:

  <project xmlns="http://maven.apache.org/POM/4.0.0"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my.examples</groupId>
    <artifactId>spark-samples</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <inceptionYear>2014</inceptionYear>
    <properties>
      <maven.compiler.source>1.6</maven.compiler.source>
      <maven.compiler.target>1.6</maven.compiler.target>
      <encoding>UTF-8</encoding>
      <scala.tools.version>2.10</scala.tools.version>
      <scala.version>2.10.0</scala.version>
    </properties>
    <repositories>
      <repository>
        <id>spark staging</id>
        <url>https://repository.apache.org/content/repositories/orgapachespark-1001</url>
      </repository>
    </repositories>
    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>0.9.0-incubating</version>
      </dependency>
      <!-- Test -->
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.specs2</groupId>
        <artifactId>specs2_${scala.tools.version}</artifactId>
        <version>1.13</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.tools.version}</artifactId>
        <version>2.0.M6-SNAP8</version>
        <scope>test</scope>
      </dependency>
    </dependencies>
    <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <testSourceDirectory>src/test/scala</testSourceDirectory>
      <plugins>
        <plugin>
          <!-- see http://davidb.github.com/scala-maven-plugin -->
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.1.6</version>
          <configuration>
            <scalaCompatVersion>2.10</scalaCompatVersion>
            <jvmArgs>
              <jvmArg>-Xms128m</jvmArg>
              <jvmArg>-Xmx2048m</jvmArg>
            </jvmArgs>
          </configuration>
          <executions>
            <execution>
              <goals>
Re: testing 0.9.0-incubating and maven
Hey Alex, Maven profiles only affect the Spark build itself. They do not transitively affect your own build. Check out the docs for how to deploy applications on yarn: http://spark.incubator.apache.org/docs/latest/running-on-yarn.html When compiling your application, you should just explicitly add the hadoop version you depend on to your own build (e.g. a hadoop-client dependency). Take a look at the example here where we show adding hadoop-client: http://spark.incubator.apache.org/docs/latest/quick-start.html When deploying Spark applications on YARN, you actually want to mark spark as a provided dependency in your application's Maven build and bundle your application as an assembly jar, then submit it with a Spark YARN bundle to a YARN cluster. The instructions are the same as they were in 0.8.1. For the spark jar you want to submit to YARN, you can download the precompiled Spark one. It might make sense to try this pipeline with 0.8.1 and get it working there. It sounds more like you are dealing with getting the build set up rather than a particular issue with the 0.9.0 RC. - Patrick On Thu, Jan 16, 2014 at 1:13 PM, Alex Cozzi alexco...@gmail.com wrote: Hi Patrick, thank you for testing. I think I found out what is wrong: I am trying to build my own examples that also depend on another library which in turn depends on hadoop 2.2. What was happening is that my library brings in hadoop 2.2, while spark depends on hadoop 1.0.4, and then I think I get conflicting versions of the classes. A couple of things are not clear to me: 1: do the published artifacts support YARN and hadoop 2.2 or will I need to make my own build? 2: if they do, how do I activate the profiles in my maven config? I tried mvn -Pyarn compile but it does not work (maven says “[WARNING] The requested profile yarn could not be activated because it does not exist.”) Essentially I would like to specify the spark dependencies as:

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.tools.version}</artifactId>
      <version>0.9.0-incubating</version>
    </dependency>
  </dependencies>

and tell maven to use the “yarn” profile for this dependency, but I do not seem to be able to make it work. Does anybody have any suggestions? Alex
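[A sketch of the Maven arrangement Patrick describes: Spark marked provided, so it is not bundled into the application's assembly jar, plus an explicit hadoop-client pinned to the cluster's version. The 2.2.0 version number is an assumption matching Alex's cluster, not a value from this thread:

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-incubating</version>
    <scope>provided</scope> <!-- supplied at runtime by the Spark jar you submit to YARN -->
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version> <!-- match the Hadoop/YARN version on the cluster -->
  </dependency>

With provided scope, the conflicting Hadoop 1.x classes never end up inside the assembly jar, which is the class-conflict failure mode Alex hit.]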
Re: spark code formatter?
I'm also very wary of using a code formatter for the reasons already mentioned by Reynold. Does Scalariform have a mode where it just provides style checks rather than reformatting the code? This is something we really need for, e.g., reviewing the many submissions to the project. - Patrick On Wed, Jan 8, 2014 at 11:51 PM, Reynold Xin r...@databricks.com wrote: Thanks for doing that, DB. Not sure about others, but I'm actually strongly against blanket automatic code formatters, given that they can be disruptive. Often humans would intentionally choose to style things in a certain way for clearer semantics and better readability. Code formatters don't capture these nuances. It is pretty dangerous to just auto-format everything. Maybe it'd be ok if we restrict the code formatters to a very limited set of things, such as indenting function parameters, etc. On Wed, Jan 8, 2014 at 10:28 PM, DB Tsai dbt...@alpinenow.com wrote: A pull request for scalariform. https://github.com/apache/incubator-spark/pull/365 Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Wed, Jan 8, 2014 at 10:09 PM, DB Tsai dbt...@alpinenow.com wrote: We use sbt-scalariform in our company, and it automatically formats the code when running `sbt compile`. https://github.com/sbt/sbt-scalariform We ask our developers to run `sbt compile` before committing, and it's really nice to see everyone using the same spacing and indentation. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Wed, Jan 8, 2014 at 9:50 PM, Reynold Xin r...@databricks.com wrote: We have a Scala style configuration file in Shark: https://github.com/amplab/shark/blob/master/scalastyle-config.xml However, the scalastyle project is still pretty primitive and doesn't cover most of the use cases. It is still great to include it to cover basic checks such as 100-char wide lines. On Wed, Jan 8, 2014 at 8:02 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Not that I know of. This would be very useful to add, especially if we can make SBT automatically check the code style (or we can somehow plug this into Jenkins). Matei On Jan 8, 2014, at 11:00 AM, Michael Allman m...@allman.ms wrote: Hi, I've read the spark code style guide for contributors here: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide For scala code, do you have a scalariform configuration that you use to format your code to these specs? Cheers, Michael
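[For the check-without-rewriting direction Patrick asks about, Scalastyle (linked above via Shark's scalastyle-config.xml) reports violations rather than editing files. A minimal configuration sketch with only the 100-character rule Reynold mentions; checker class names come from Scalastyle's standard set:

  <scalastyle>
    <name>Minimal style checks (sketch)</name>
    <!-- flag lines longer than 100 characters without modifying any code -->
    <check level="error" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
      <parameters>
        <parameter name="maxLineLength">100</parameter>
      </parameters>
    </check>
  </scalastyle>
]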
Re: Build Changes for SBT Users
Ya I was referring to already released versions. Of course we can update for subsequent releases... On Sun, Jan 5, 2014 at 4:24 PM, Reynold Xin r...@databricks.com wrote: Why is it not possible? You can always update the script; you just can't update scripts for released versions. On Sat, Jan 4, 2014 at 9:07 PM, Patrick Wendell pwend...@gmail.com wrote: I agree TD - I was just saying that Reynold's proposal that we could update the release post-hoc is unfortunately not possible. On Sat, Jan 4, 2014 at 7:13 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Patrick, that is right. All we are trying to do is make a best-effort attempt to make it smooth for a new user. The script will try its best to automatically install / download sbt for the user. The fallback will be that the user will have to install sbt on their own. If the URL happens to change and our script fails to automatically download, then we are *no worse* than not providing the script at all. TD On Sat, Jan 4, 2014 at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote: Reynold, the issue is that releases are immutable and we expect them to be downloaded for several years after the release date. On Sat, Jan 4, 2014 at 5:57 PM, Xuefeng Wu ben...@gmail.com wrote: Sounds reasonable. But I think few people have sbt installed, even though it is easy to install. We could provide this script in the online documentation, so users could download it and install sbt independently. Sounds like yet another brew install sbt? :) Yours, Xuefeng Wu 吴雪峰 敬上 On January 5, 2014, at 2:56 AM, Patrick Wendell pwend...@gmail.com wrote: We thought about this but elected not to do this for a few reasons. 1. Some people build from machines that do not have internet access for security reasons and retrieve dependencies from internal Nexus repositories. So having a build dependency that relies on internet downloads is not desirable. 2. It's hard to ensure the stability of a particular URL in perpetuity. This is why maven central and other mirror networks exist. Keep in mind that we can't change the release code ever once we release it, and if something changed about the particular URL it could break the build. - Patrick On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote: +1 on bundling a script similar to that one On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote: Could we ship a shell script which downloads the sbt jar if not present (like for example https://github.com/holdenk/slashem/blob/master/sbt)? On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to an ASF requirement, we recently merged a patch which removes the sbt jar from the build. This is necessary because we aren't allowed to distribute binary artifacts with our source packages. This means that instead of building Spark with sbt/sbt XXX, you'll need to have sbt yourself and just run sbt XXX from within the Spark directory. This is similar to the maven build, where we expect users to already have maven installed. You can download sbt at http://www.scala-sbt.org/. It's okay to just download the most recent version of sbt, since sbt knows how to fetch other versions of itself and will always use the one we specify in our build file to compile spark. - Patrick -- Cell : 425-233-8271
Build Changes for SBT Users
Hey All, Due to an ASF requirement, we recently merged a patch which removes the sbt jar from the build. This is necessary because we aren't allowed to distribute binary artifacts with our source packages. This means that instead of building Spark with sbt/sbt XXX, you'll need to have sbt yourself and just run sbt XXX from within the Spark directory. This is similar to the maven build, where we expect users to already have maven installed. You can download sbt at http://www.scala-sbt.org/. It's okay to just download the most recent version of sbt, since sbt knows how to fetch other versions of itself and will always use the one we specify in our build file to compile spark. - Patrick
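[In practice the change looks like this, as a sketch, assuming sbt installed from scala-sbt.org is on your PATH:

  # before this change: the repo shipped a launcher jar
  #   sbt/sbt assembly
  # after this change: use your own sbt installation from the Spark directory
  cd incubator-spark
  sbt assembly
]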
Re: Build Changes for SBT Users
We thought about this but elected not to do this for a few reasons. 1. Some people build from machines that do not have internet access for security reasons and retrieve dependencies from internal Nexus repositories. So having a build dependency that relies on internet downloads is not desirable. 2. It's hard to ensure the stability of a particular URL in perpetuity. This is why maven central and other mirror networks exist. Keep in mind that we can't change the release code ever once we release it, and if something changed about the particular URL it could break the build. - Patrick On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote: +1 on bundling a script similar to that one On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote: Could we ship a shell script which downloads the sbt jar if not present (like for example https://github.com/holdenk/slashem/blob/master/sbt )? On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to an ASF requirement, we recently merged a patch which removes the sbt jar from the build. This is necessary because we aren't allowed to distribute binary artifacts with our source packages. This means that instead of building Spark with sbt/sbt XXX, you'll need to have sbt yourself and just run sbt XXX from within the Spark directory. This is similar to the maven build, where we expect users to already have maven installed. You can download sbt at http://www.scala-sbt.org/. It's okay to just download the most recent version of sbt, since sbt knows how to fetch other versions of itself and will always use the one we specify in our build file to compile spark. - Patrick -- Cell : 425-233-8271
Re: Build Changes for SBT Users
Hey Holden, That sounds reasonable to me. Where would we get a url we can control though? Right now the project's web space is at incubator.apache... but later this will change to a full apache domain. Is there somewhere in maven central these jars are hosted... that would be the nicest because things like repo1.maven.org basically never change. - Patrick On Sat, Jan 4, 2014 at 1:20 PM, Holden Karau hol...@pigscanfly.ca wrote: That makes sense, I think we could structure a script in such a way that it would overcome these problems though and probably provide a fair amount of benefit for people who just want to get started quickly. The easiest would be to have it use the system sbt if present and then fall back to downloading the sbt jar. As far as stability of the URL goes we could solve this by either having it point at a domain we control, or just with a clear error message indicating it failed to download sbt and the user needs to install sbt. If a restructured script in that manner would be useful I could whip up a pull request :) On Sat, Jan 4, 2014 at 10:56 AM, Patrick Wendell pwend...@gmail.com wrote: We thought about this but elected not to do this for a few reasons. 1. Some people build from machines that do not have internet access for security reasons and retrieve dependencies from internal Nexus repositories. So having a build dependency that relies on internet downloads is not desirable. 2. It's hard to ensure the stability of a particular URL in perpetuity. This is why maven central and other mirror networks exist. Keep in mind that we can't change the release code ever once we release it, and if something changed about the particular URL it could break the build. - Patrick On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote: +1 on bundling a script similar to that one On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote: Could we ship a shell script which downloads the sbt jar if not present (like for example https://github.com/holdenk/slashem/blob/master/sbt )? On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to an ASF requirement, we recently merged a patch which removes the sbt jar from the build. This is necessary because we aren't allowed to distribute binary artifacts with our source packages. This means that instead of building Spark with sbt/sbt XXX, you'll need to have sbt yourself and just run sbt XXX from within the Spark directory. This is similar to the maven build, where we expect users to already have maven installed. You can download sbt at http://www.scala-sbt.org/. It's okay to just download the most recent version of sbt, since sbt knows how to fetch other versions of itself and will always use the one we specify in our build file to compile spark. - Patrick -- Cell : 425-233-8271 -- Cell : 425-233-8271
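[A sketch of the wrapper structure Holden describes: prefer a system sbt, fall back to downloading the launcher jar, and fail with a clear message otherwise. The download URL here is a placeholder only; the whole point of this thread is that no stable URL had been agreed on:

  #!/usr/bin/env bash
  # sbt wrapper sketch: use the system sbt if present, else fetch the launcher jar
  SBT_VERSION=0.12.4                          # illustrative version
  SBT_JAR="sbt-launch-${SBT_VERSION}.jar"
  SBT_URL="http://example.org/sbt/${SBT_JAR}" # placeholder, not a real mirror

  if command -v sbt >/dev/null 2>&1; then
    exec sbt "$@"                             # system sbt takes priority
  fi
  if [ ! -f "${SBT_JAR}" ]; then
    curl -fL -o "${SBT_JAR}" "${SBT_URL}" || {
      echo "Failed to download sbt; please install it from http://www.scala-sbt.org/" >&2
      exit 1
    }
  fi
  exec java -Xmx1200m -jar "${SBT_JAR}" "$@"
]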
Re: Build Changes for SBT Users
Reynold, the issue is that releases are immutable and we expect them to be downloaded for several years after the release date. On Sat, Jan 4, 2014 at 5:57 PM, Xuefeng Wu ben...@gmail.com wrote: Sounds reasonable. But I think few people have sbt installed, even though it is easy to install. We could provide this script in the online documentation, so users could download it and install sbt independently. Sounds like yet another brew install sbt? :) Yours, Xuefeng Wu 吴雪峰 敬上 On January 5, 2014, at 2:56 AM, Patrick Wendell pwend...@gmail.com wrote: We thought about this but elected not to do this for a few reasons. 1. Some people build from machines that do not have internet access for security reasons and retrieve dependencies from internal Nexus repositories. So having a build dependency that relies on internet downloads is not desirable. 2. It's hard to ensure the stability of a particular URL in perpetuity. This is why maven central and other mirror networks exist. Keep in mind that we can't change the release code ever once we release it, and if something changed about the particular URL it could break the build. - Patrick On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote: +1 on bundling a script similar to that one On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote: Could we ship a shell script which downloads the sbt jar if not present (like for example https://github.com/holdenk/slashem/blob/master/sbt )? On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to an ASF requirement, we recently merged a patch which removes the sbt jar from the build. This is necessary because we aren't allowed to distribute binary artifacts with our source packages. This means that instead of building Spark with sbt/sbt XXX, you'll need to have sbt yourself and just run sbt XXX from within the Spark directory. This is similar to the maven build, where we expect users to already have maven installed. You can download sbt at http://www.scala-sbt.org/. It's okay to just download the most recent version of sbt, since sbt knows how to fetch other versions of itself and will always use the one we specify in our build file to compile spark. - Patrick -- Cell : 425-233-8271
Re: Build Changes for SBT Users
I agree TD - I was just saying that Reynold's proposal that we could update the release post-hoc is unfortunately not possible. On Sat, Jan 4, 2014 at 7:13 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Patrick, that is right. All we are trying to do is make a best-effort attempt to make it smooth for a new user. The script will try its best to automatically install / download sbt for the user. The fallback will be that the user will have to install sbt on their own. If the URL happens to change and our script fails to automatically download, then we are *no worse* than not providing the script at all. TD On Sat, Jan 4, 2014 at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote: Reynold, the issue is that releases are immutable and we expect them to be downloaded for several years after the release date. On Sat, Jan 4, 2014 at 5:57 PM, Xuefeng Wu ben...@gmail.com wrote: Sounds reasonable. But I think few people have sbt installed, even though it is easy to install. We could provide this script in the online documentation, so users could download it and install sbt independently. Sounds like yet another brew install sbt? :) Yours, Xuefeng Wu 吴雪峰 敬上 On January 5, 2014, at 2:56 AM, Patrick Wendell pwend...@gmail.com wrote: We thought about this but elected not to do this for a few reasons. 1. Some people build from machines that do not have internet access for security reasons and retrieve dependencies from internal Nexus repositories. So having a build dependency that relies on internet downloads is not desirable. 2. It's hard to ensure the stability of a particular URL in perpetuity. This is why maven central and other mirror networks exist. Keep in mind that we can't change the release code ever once we release it, and if something changed about the particular URL it could break the build. - Patrick On Sat, Jan 4, 2014 at 9:34 AM, Andrew Ash and...@andrewash.com wrote: +1 on bundling a script similar to that one On Sat, Jan 4, 2014 at 4:48 AM, Holden Karau hol...@pigscanfly.ca wrote: Could we ship a shell script which downloads the sbt jar if not present (like for example https://github.com/holdenk/slashem/blob/master/sbt)? On Sat, Jan 4, 2014 at 12:02 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to an ASF requirement, we recently merged a patch which removes the sbt jar from the build. This is necessary because we aren't allowed to distribute binary artifacts with our source packages. This means that instead of building Spark with sbt/sbt XXX, you'll need to have sbt yourself and just run sbt XXX from within the Spark directory. This is similar to the maven build, where we expect users to already have maven installed. You can download sbt at http://www.scala-sbt.org/. It's okay to just download the most recent version of sbt, since sbt knows how to fetch other versions of itself and will always use the one we specify in our build file to compile spark. - Patrick -- Cell : 425-233-8271
Re: Changes that affect packaging and running Spark
Small correction: /sbin contains administrative scripts for launching the standalone cluster manager: /sbin/start-master.sh, /sbin/start-all.sh, etc.
Re: Terminology: worker vs slave
Ya we've been trying to standardize on the terminology here (see glossary): http://spark.incubator.apache.org/docs/latest/cluster-overview.html I think slave actually isn't mentioned here at all - but references to slave in the codebase are synonymous with worker. - Patrick On Thu, Jan 2, 2014 at 10:42 PM, Reynold Xin r...@databricks.com wrote: It is historic. I think we are converging towards:
worker: the slave daemon in the standalone cluster manager
executor: the jvm process that is launched by the worker that executes tasks
On Thu, Jan 2, 2014 at 10:39 PM, Andrew Ash and...@andrewash.com wrote: The terms worker and slave seem to be used interchangeably. Are they the same? Worker is used more frequently in the codebase:
aash@aash-mbp ~/git/spark$ git grep -i worker | wc -l
981
aash@aash-mbp ~/git/spark$ git grep -i slave | wc -l
348
Does it make sense to unify on one or the other?
Disallowing null mergeCombiners
Hey All, There is a small API change that we are considering for the external sort patch. Previously we allowed mergeCombiners to be null when map-side aggregation was not enabled. This is because it wasn't necessary in that case, since mappers didn't ship pre-aggregated values to reducers. Because the external sort capability also relies on the mergeCombiners function to merge partially-aggregated on-disk segments, we now need it all the time, even if map-side aggregation is not enabled. This is a fairly esoteric thing that I'm not sure anyone other than Shark ever used, but I want to check in case anyone has feelings about this. The relevant code is here: https://github.com/apache/incubator-spark/pull/303/files#diff-f70e97c099b5eac05c75288cb215e080R72 - Patrick
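[For context, a sketch of the call site this change affects. combineByKey takes all three functions, and the third one can no longer be passed as null even when map-side aggregation is off. The SparkContext setup is a hypothetical local example:

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._  // implicit pair-RDD conversions

  val sc = new SparkContext("local", "mergeCombinersExample")
  val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))
  val counts = pairs.combineByKey(
    (v: Int) => v,                  // createCombiner
    (c: Int, v: Int) => c + v,      // mergeValue
    (c1: Int, c2: Int) => c1 + c2   // mergeCombiners: now always required,
  )                                 // since spilled on-disk segments must be merged
]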
Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st
Hey Andy - these Nabble groups look great! Thanks for setting them up. On Tue, Dec 24, 2013 at 10:49 AM, Evan Chan e...@ooyala.com wrote: Thanks Andy, at first glance Nabble seems great: it allows search plus posting new topics, so it appears to be bidirectional. Now I just have to register an account on there. On Sun, Dec 22, 2013 at 2:47 PM, Andy Konwinski andykonwin...@gmail.com wrote: Per Matei's suggestion, I've set up two nabble archive lists, one to archive the apache dev list and one to archive the apache user list. user list archive: http://apache-spark-user-list.1001560.n3.nabble.com dev list archive: http://apache-spark-developers-list.1001551.n3.nabble.com Between these and whatever solution we end up with for the google group mirrors, we should have decent enough alternatives to reading via the apache list archives going forward. On Thu, Dec 19, 2013 at 11:09 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yes, I agree that we should close down the existing Google group on Jan 1st. While it’s more convenient to use, it’s created confusion. I hope that we can get the ASF to support better search interfaces in the future too. I think we just have to drive this from within. The Google Group should be a nice way to make the content searchable from the web. We should also see what it takes to make it mirrored on Nabble (http://www.nabble.com). I’ve found a lot of information about other projects there, and other Apache projects do use it. Matei On Dec 19, 2013, at 10:49 PM, Andy Konwinski andykonwin...@gmail.com wrote: I've set up two new unofficial google groups to mirror the Apache Spark user and dev lists: https://groups.google.com/forum/#!forum/apache-spark-dev-mirror https://groups.google.com/forum/#!forum/apache-spark-user-mirror Basically these lists each subscribe to the corresponding Apache list. They do not allow folks to subscribe directly to them. Getting emails from the Google Group would offer no advantages that I can think of and we really want to encourage folks to sign up for the official mailing list instead. The lists do allow the public to send email to them, which I think might be necessary since the from: field for all emails that get distributed via the Apache mailing list is set to the author of the email. I think this might be a great compromise. At least we can try this out and see how it goes. Matei, can you confirm that Jan 1 is the date we want to turn off the existing spark-users google group? We could consider using the existing spark-developers and spark-users google groups instead of the two new ones I just created but I think that it is much more obvious to have the lists include the word mirror in their names. The dev list mirror seems to be working, because I see the last couple emails from this thread in it already. I'll confirm and ensure that the user list mirror is working too. Thoughts? Andy P.S. Thanks to Patrick for suggesting this to me originally. On Thu, Dec 19, 2013 at 8:46 PM, Aaron Davidson ilike...@gmail.com wrote: I'd be fine with one-way mirrors here (Apache threads being reflected in Google groups) -- I have no idea how one is supposed to navigate the Apache list to look for historic threads. On Thu, Dec 19, 2013 at 7:58 PM, Mike Potts maspo...@gmail.com wrote: Thanks very much for the prompt and comprehensive reply! 
I appreciate the overarching desire to integrate with apache: I'm very happy to hear that there's a move to use the existing groups as mirrors: that will overcome all of my objections, particularly if it's bidirectional! :) On Thursday, December 19, 2013 7:19:06 PM UTC-8, Andy Konwinski wrote: Hey Mike, As you probably noticed when you CC'd spark-de...@googlegroups.com, that list has already been reconfigured so that it no longer allows posting (and bounces emails sent to it). We will be doing the same thing to the spark...@googlegroups.com list too (we'll announce a date for that soon). That may sound very frustrating, and you are *not* alone in feeling that way. We've had a long conversation with our mentors about this, and I've felt very similarly to you, so I'd like to give you some background. As I'm coming to see it, part of becoming an Apache project is moving the community *fully* over to Apache infrastructure, and more generally the Apache way of organizing the community. This applies in both the nuts-and-bolts sense of being on apache infra, but possibly more importantly, it is also a guiding principle and way of thinking. In various ways, moving to apache infra can be a painful process, and IMO the loss of all the great mailing list functionality that comes with using Google Groups is perhaps the most painful step. But basically, the de facto mailing lists need to be the Apache ones, and not Google Groups.
Re: Akka problem when using scala command to launch Spark applications in the current 0.9.0-SNAPSHOT
Evan, This problem also exists for people who write their own applications that depend on/include Spark. E.g. they bundle up their app and then launch the driver with scala -cp my-bundle.jar... I've seen this cause an issue in that setting. - Patrick On Tue, Dec 24, 2013 at 10:50 AM, Evan Chan e...@ooyala.com wrote: Hi Reynold, The default, documented methods of starting Spark all use the assembly jar, and thus java, right? -Evan On Fri, Dec 20, 2013 at 11:36 PM, Reynold Xin r...@databricks.com wrote: It took me hours to debug a problem yesterday on the latest master branch (0.9.0-SNAPSHOT), and I would like to share it with the dev list in case anybody runs into this Akka problem. A little background for those of you who haven't followed the development of Spark and YARN 2.2 closely: YARN 2.2 uses protobuf 2.5, and Akka uses an older version of protobuf that is not binary compatible. In order to have a single build that is compatible with both YARN 2.2 and pre-2.2 YARN/Hadoop, we published a special version of Akka that builds with protobuf shaded (i.e. using a different package name for the protobuf stuff). However, it turned out Scala 2.10 includes a version of the Akka jar in its default classpath (look at the lib folder in the Scala 2.10 binary distribution). If you use the scala command to launch any Spark application on the current master branch, there is a pretty high chance that you won't be able to create the SparkContext (stack trace at the end of the email). The problem is that the Akka packaged with Scala 2.10 takes precedence in the classloader over the special Akka version Spark includes. Before we have a good solution for this, the workaround is to use java to launch the application instead of scala. All you need to do is to include the right Scala jars (scala-library and scala-compiler) in the classpath. Note that the scala command is really just a simple script that calls java with the right classpath. Stack trace:

java.lang.NoSuchMethodException: akka.remote.RemoteActorRefProvider.<init>(java.lang.String, akka.actor.ActorSystem$Settings, akka.event.EventStream, akka.actor.Scheduler, akka.actor.DynamicAccess)
  at java.lang.Class.getConstructor0(Class.java:2763)
  at java.lang.Class.getDeclaredConstructor(Class.java:2021)
  at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:77)
  at scala.util.Try$.apply(Try.scala:161)
  at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:74)
  at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:85)
  at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:85)
  at scala.util.Success.flatMap(Try.scala:200)
  at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:85)
  at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:546)
  at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
  at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
  at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:79)
  at org.apache.spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:120)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:106)

-- Evan Chan Staff Engineer e...@ooyala.com | http://www.ooyala.com/ http://www.facebook.com/ooyala http://www.linkedin.com/company/ooyala http://www.twitter.com/ooyala
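[A sketch of the workaround Reynold describes: launch the driver with java instead of the scala script, supplying only the Scala jars you need on the classpath so the Akka bundled in the Scala distribution never gets picked up. Paths, jar names, and the driver class are illustrative assumptions:

  # bypass the scala launcher so Scala 2.10's bundled Akka cannot shadow
  # Spark's protobuf-shaded Akka build
  SCALA_HOME=/opt/scala-2.10   # assumption: wherever your Scala distribution lives
  java -cp my-bundle.jar:spark-assembly.jar:$SCALA_HOME/lib/scala-library.jar:$SCALA_HOME/lib/scala-compiler.jar \
    com.example.MyDriver
]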
Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st
Andy and Mike, I'd also prefer to just convert the old groups into mirrors. That way people who are still subscribed to them will continue to get e-mails (and most people on the list are read-only users). Ideally we'd have the behavior that users who try to e-mail the google group get a bounce back saying this is now a read-only mirror. That said, I have *no idea* if this is possible to set up nicely within google groups. I defer to Andy! Having the new mirror groups also seems like a decent solution as well... - Patrick On Fri, Dec 20, 2013 at 8:35 AM, Mike Potts maspo...@gmail.com wrote: I actually prefer that, but I didn't want my preference to get in the way of creating mirror groups, one way or the other :) (My argument would be that since the old groups would be closing anyway, re-purposing them as mirrors is fair use: and less work/confusion than creating new *-mirror groups instead.) On Friday, December 20, 2013 8:29:40 AM UTC-8, Andy Konwinski wrote: That would be really awesome. I'm not familiar with any Google Groups functionality that supports that but I'll look. That's an argument for maybe just changing the names of the existing groups to something with mirror in them instead of using newly created ones.
Spark 0.8.1 Released
Hi everyone, We've just posted Spark 0.8.1, a new maintenance release that contains some bug fixes and improvements to the 0.8 branch. The full release notes are available at [1]. Apart from various bug fixes, 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone scheduler, and optimizations to the shuffle. We recommend that current users update to this release. You can grab the release at [2]. [1] http://spark.incubator.apache.org/releases/spark-release-0-8-1.html [2] http://spark.incubator.apache.org/downloads Thanks to the following people who contributed to this release: Michael Armbrust, Pierre Borckmans, Evan Chan, Ewen Cheslack, Mosharaf Chowdhury, Frank Dai, Aaron Davidson, Tathagata Das, Ankur Dave, Harvey Feng, Ali Ghodsi, Thomas Graves, Li Guoqiang, Stephen Haberman, Haidar Hadi, Nathan Howell, Holden Karau, Du Li, Raymond Liu, Xi Liu, David McCauley, Michael (wannabeast), Fabrizio Milo, Mridul Muralidharan, Sundeep Narravula, Kay Ousterhout, Nick Pentreath, Imran Rashid, Ahir Reddy, Josh Rosen, Henry Saputra, Jerry Shao, Mingfei Shi, Andre Schumacher, Karthik Tunga, Patrick Wendell, Neal Wiggins, Andrew Xia, Reynold Xin, Matei Zaharia, and Wu Zeming - Patrick
[RESULT] [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
The vote is now closed. This vote passes with 4 IPMC +1's and no 0 or -1 votes. +1 (4 Total) Marvin Humphrey Henry Saputra Chris Mattmann Roman Shaposhnik 0 (0 Total) -1 (0 Total) * = Binding Vote Thanks to everyone who helped vet this release. - Patrick
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
You can check out the docs mentioned in the vote thread. There is also a pre-built binary for hadoop2 that is compiled for YARN 2.2 - Patrick On Sun, Dec 15, 2013 at 4:31 AM, Azuryy Yu azury...@gmail.com wrote: yarn 2.2, not yarn 0.22, I am so sorry. On Sun, Dec 15, 2013 at 8:31 PM, Azuryy Yu azury...@gmail.com wrote: Hi, Spark-0.8.1 supports yarn 0.22, right? Where can I find the release notes? Thanks. On Sun, Dec 15, 2013 at 3:20 AM, Henry Saputra henry.sapu...@gmail.com wrote: Yeah seems like it. He was ok with our prev release. Let's wait for his reply On Saturday, December 14, 2013, Patrick Wendell wrote: Henry - from that thread it looks like sebb's concern was something different than this. On Sat, Dec 14, 2013 at 11:08 AM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Patrick, Yeap I agree, but technically the ASF VOTE release is on source only (there's even debate about it =) ), so putting it in the vote staging artifact could confuse people, because in our case we do package 3rd party libraries in the binary jars. I have sent an email to sebb asking for clarification about his concern on the general@ list. - Henry On Sat, Dec 14, 2013 at 10:56 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Henry, One thing a lot of people do during the vote is test the binaries and make sure they work. This is really valuable. If you'd like I could add a caveat to the vote thread explaining that we are only voting on the source. - Patrick On Sat, Dec 14, 2013 at 10:40 AM, Henry Saputra henry.sapu...@gmail.com wrote: Actually we should be fine putting the binaries there as long as the VOTE is for the source. Let's verify with sebb in the general@ list about his concern. - Henry On Sat, Dec 14, 2013 at 10:31 AM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Patrick, as sebb has mentioned, let's move the binaries from the voting directory in your people.apache.org directory. ASF release voting is for source code and not binaries; technically we provide binaries for convenience. And add a link to the KEYS location in dist [1] so people can verify signatures. Sorry for the late response to the VOTE thread, guys. - Henry [1] https://dist.apache.org/repos/dist/release/incubator/spark/KEYS On Fri, Dec 13, 2013 at 6:37 PM, Patrick Wendell pwend...@gmail.com wrote: The vote is now closed. This vote passes with 5 PPMC +1's and no 0 or -1 votes. +1 (5 Total) Matei Zaharia* Nick Pentreath* Patrick Wendell* Prashant Sharma* Tom Graves* 0 (0 Total) -1 (0 Total) * = Binding Vote As per the incubator release guide [1] I'll be sending this to the general incubator list for a final vote from IPMC members. [1] http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan e...@ooyala.com wrote: I'd be personally fine with a standard workflow of assemble-deps + packaging just the Spark files as separate packages, if it speeds up everyone's development time. On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote: I don't know how to make sense of the numbers, but here's what I've got from a very small sample size.
Re: Scala 2.10 Merge
Alright I just merged this in - so Spark is officially Scala 2.10 from here forward. For reference I cut a new branch called scala-2.9 with the commit immediately prior to the merge: https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=shortlog;h=refs/heads/scala-2.9 - Patrick On Thu, Dec 12, 2013 at 8:26 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Raymond, Let's move this discussion out of this thread and into the associated JIRA. I'll write up our current approach over there. https://spark-project.atlassian.net/browse/SPARK-995 - Patrick On Thu, Dec 12, 2013 at 5:56 PM, Liu, Raymond raymond@intel.com wrote: Hi Patrick So what's the plan for supporting YARN 2.2 in 0.9? As far as I can see, if you want to support both 2.2 and 2.0, then due to the protobuf version incompatibility issue you need two versions of akka anyway. Akka 2.3-M1 looks like it has a few API changes; we probably could isolate the code like what we did on the yarn part of the API. I remember it was mentioned that using reflection for the different APIs is preferred. So the purpose of using reflection is to use one release bin jar to support both versions of Hadoop/Yarn at runtime, instead of building different bin jars at compile time? Then all code related to hadoop will also be built in separate modules for loading on demand? This sounds to me like it involves a lot of work. And you still need to have a shim layer and separate code for the different version APIs, and depend on different versions of Akka etc. Sounds like even stricter demands versus our current approach on master, with a dynamic class loader in addition. And the problems we are facing now are still there? Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 12, 2013 5:13 PM To: dev@spark.incubator.apache.org Subject: Re: Scala 2.10 Merge Also - the code is still there because of a recent merge that took in some newer changes... we'll be removing it for the final merge. On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Raymond, This won't work because AFAIK akka 2.3-M1 is not binary compatible with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need to still use the older protobuf library, so we'd need to support both. I'd also be concerned about having a reference to a non-released version of akka. Akka is the source of our hardest-to-find bugs and simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting. Of course, if you are building off of master you can maintain a fork that uses this. - Patrick On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.com wrote: Hi Patrick What does that mean for dropping YARN 2.2? It seems the code is still there. You mean if built upon 2.2 it will break and won't work, right? Since the home-made akka build on scala 2.10 isn't there. While, for this case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a replacement? Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 12, 2013 4:21 PM To: dev@spark.incubator.apache.org Subject: Scala 2.10 Merge Hi Developers, In the next few days we are planning to merge Scala 2.10 support into Spark. For those that haven't been following this, Prashant Sharma has been maintaining the scala-2.10 branch of Spark for several months. 
This branch is current with master and has been reviewed for merging: https://github.com/apache/incubator-spark/tree/scala-2.10 Scala 2.10 support is one of the most requested features for Spark - it will be great to get this into Spark 0.9! Please note that *Scala 2.10 is not binary compatible with Scala 2.9*. With that in mind, I wanted to give a few heads-up/requests to developers: If you are developing applications on top of Spark's master branch, those will need to migrate to Scala 2.10. You may want to download and test the current scala-2.10 branch in order to make sure you will be okay as Spark developments move forward. Of course, you can always stick with the current master commit and be fine (I'll cut a tag when we do the merge in order to delineate where the version changes). Please open new threads on the dev list to report and discuss any issues. This merge will temporarily drop support for YARN 2.2 on the master branch. This is because the workaround we used was only compiled for Scala 2.9. We are going to come up with a more robust solution to YARN 2.2 support before releasing 0.9. Going forward, we will continue to make maintenance releases on branch-0.8 which will remain compatible with Scala 2.9. For those interested, the primary code changes in this merge are upgrading the akka version, changing the use of Scala 2.9's ClassManifest construct to Scala 2.10's ClassTag, and updating the spark shell to work with Scala 2.10's repl. - Patrick
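[For application authors, the ClassManifest-to-ClassTag change mentioned above usually looks like this sketch; the generic helper is a hypothetical example, not Spark code:

  // Scala 2.9 spelling (deprecated in 2.10):
  //   def mkArray[T: ClassManifest](xs: T*): Array[T] = xs.toArray
  // Scala 2.10 spelling:
  import scala.reflect.ClassTag
  def mkArray[T: ClassTag](xs: T*): Array[T] = xs.toArray  // toArray needs the runtime class

The context bound still just asks the compiler to pass evidence of T's runtime class; only the name and package of that evidence type changed.]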
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
The vote is now closed. This vote passes with 5 PPMC +1's and no 0 or -1 votes. +1 (5 Total) Matei Zaharia* Nick Pentreath* Patrick Wendell* Prashant Sharma* Tom Graves* 0 (0 Total) -1 (0 Total) * = Binding Vote As per the incubator release guide [1] I'll be sending this to the general incubator list for a final vote from IPMC members. [1] http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan e...@ooyala.com wrote: I'd be personally fine with a standard workflow of assemble-deps + packaging just the Spark files as separate packages, if it speeds up everyone's development time. On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote: I don't know how to make sense of the numbers, but here's what I've got from a very small sample size. For both v0.8.0-incubating and v0.8.1-incubating, building separate assemblies is faster than `./sbt/sbt assembly` and the times for building separate assemblies for 0.8.0 and 0.8.1 are about the same. For v0.8.0-incubating, `./sbt/sbt assembly` takes about 2.5x as long as the sum of the separate assemblies. For v0.8.1-incubating, `./sbt/sbt assembly` takes almost 8x as long as the sum of the separate assemblies. Weird. On Wed, Dec 11, 2013 at 11:49 AM, Patrick Wendell pwend...@gmail.com wrote: I'll +1 myself also. For anyone who has the slow build problem: does this issue happen when building v0.8.0-incubating also? Trying to figure out whether it's related to something we added in 0.8.1 or if it's a long-standing issue. - Patrick On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Woah, weird, but definitely good to know. If you’re doing Spark development, there’s also a more convenient option added by Shivaram in the master branch. You can do sbt assemble-deps to package *just* the dependencies of each project in a special assembly JAR, and then use sbt compile to update the code. This will use the classes directly out of the target/scala-2.9.3/classes directories. You have to redo assemble-deps only if your external dependencies change. Matei On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com wrote: I hope this PR https://github.com/apache/incubator-spark/pull/252 can help. Again this is not a blocker for the release from my side either. On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com wrote: Interesting, and confirmed: On my machine where `./sbt/sbt assembly` takes a long, long, long time to complete (a MBP, in my case), building three separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less time. On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma scrapco...@gmail.com wrote: Forgot to mention: after running sbt/sbt assembly/assembly, running sbt/sbt examples/assembly takes just 37s. Not to mention my hardware is not really great. On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma scrapco...@gmail.com wrote: Hi Patrick and Matei, I was trying this out and followed the quick start guide, which says to do sbt/sbt assembly; like a few others I was also stuck for a few minutes on linux. On the other hand, if I use sbt/sbt assembly/assembly it is much faster. Should we change the documentation to reflect this? It will not be great for first-time users to get stuck there. On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Built and tested it on Mac OS X. 
Matei On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit b87d31d): https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-040/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/ For information about the contents of this release see: https://git-wip-us.apache.org/repos/asf
Re: Scala 2.10 Merge
Also - the code is still there because of a recent merge that took in some newer changes... we'll be removing it for the final merge. On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Raymond, This won't work because AFAIK akka 2.3-M1 is not binary compatible with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need to still use the older protobuf library, so we'd need to support both. I'd also be concerned about having a reference to a non-released version of akka. Akka is the source of our hardest-to-find bugs and simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting. Of course, if you are building off of master you can maintain a fork that uses this. - Patrick On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.com wrote: Hi Patrick What does that mean for dropping YARN 2.2? It seems the code is still there. You mean if built upon 2.2 it will break and won't work, right? Since the home-made akka build on scala 2.10 isn't there. While, for this case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a replacement? Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 12, 2013 4:21 PM To: dev@spark.incubator.apache.org Subject: Scala 2.10 Merge Hi Developers, In the next few days we are planning to merge Scala 2.10 support into Spark. For those that haven't been following this, Prashant Sharma has been maintaining the scala-2.10 branch of Spark for several months. This branch is current with master and has been reviewed for merging: https://github.com/apache/incubator-spark/tree/scala-2.10 Scala 2.10 support is one of the most requested features for Spark - it will be great to get this into Spark 0.9! Please note that *Scala 2.10 is not binary compatible with Scala 2.9*. With that in mind, I wanted to give a few heads-up/requests to developers: If you are developing applications on top of Spark's master branch, those will need to migrate to Scala 2.10. You may want to download and test the current scala-2.10 branch in order to make sure you will be okay as Spark developments move forward. Of course, you can always stick with the current master commit and be fine (I'll cut a tag when we do the merge in order to delineate where the version changes). Please open new threads on the dev list to report and discuss any issues. This merge will temporarily drop support for YARN 2.2 on the master branch. This is because the workaround we used was only compiled for Scala 2.9. We are going to come up with a more robust solution to YARN 2.2 support before releasing 0.9. Going forward, we will continue to make maintenance releases on branch-0.8 which will remain compatible with Scala 2.9. For those interested, the primary code changes in this merge are upgrading the akka version, changing the use of Scala 2.9's ClassManifest construct to Scala 2.10's ClassTag, and updating the spark shell to work with Scala 2.10's repl. - Patrick
Re: Scala 2.10 Merge
Hey Raymond, Let's move this discussion out of this thread and into the associated JIRA. I'll write up our current approach over there. https://spark-project.atlassian.net/browse/SPARK-995 - Patrick On Thu, Dec 12, 2013 at 5:56 PM, Liu, Raymond raymond@intel.com wrote: Hi Patrick So what's the plan for supporting YARN 2.2 in 0.9? As far as I can see, if you want to support both 2.2 and 2.0, then due to the protobuf version incompatibility issue you need two versions of akka anyway. Akka 2.3-M1 looks like it has a few API changes; we probably could isolate the code like what we did on the yarn part of the API. I remember it was mentioned that using reflection for the different APIs is preferred. So the purpose of using reflection is to use one release bin jar to support both versions of Hadoop/Yarn at runtime, instead of building different bin jars at compile time? Then all code related to hadoop will also be built in separate modules for loading on demand? This sounds to me like it involves a lot of work. And you still need to have a shim layer and separate code for the different version APIs, and depend on different versions of Akka etc. Sounds like even stricter demands versus our current approach on master, with a dynamic class loader in addition. And the problems we are facing now are still there? Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 12, 2013 5:13 PM To: dev@spark.incubator.apache.org Subject: Re: Scala 2.10 Merge Also - the code is still there because of a recent merge that took in some newer changes... we'll be removing it for the final merge. On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Raymond, This won't work because AFAIK akka 2.3-M1 is not binary compatible with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need to still use the older protobuf library, so we'd need to support both. I'd also be concerned about having a reference to a non-released version of akka. Akka is the source of our hardest-to-find bugs and simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting. Of course, if you are building off of master you can maintain a fork that uses this. - Patrick On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.com wrote: Hi Patrick What does that mean for dropping YARN 2.2? It seems the code is still there. You mean if built upon 2.2 it will break and won't work, right? Since the home-made akka build on scala 2.10 isn't there. While, for this case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a replacement? Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 12, 2013 4:21 PM To: dev@spark.incubator.apache.org Subject: Scala 2.10 Merge Hi Developers, In the next few days we are planning to merge Scala 2.10 support into Spark. For those that haven't been following this, Prashant Sharma has been maintaining the scala-2.10 branch of Spark for several months. This branch is current with master and has been reviewed for merging: https://github.com/apache/incubator-spark/tree/scala-2.10 Scala 2.10 support is one of the most requested features for Spark - it will be great to get this into Spark 0.9! Please note that *Scala 2.10 is not binary compatible with Scala 2.9*. With that in mind, I wanted to give a few heads-up/requests to developers: If you are developing applications on top of Spark's master branch, those will need to migrate to Scala 2.10. 
You may want to download and test the current scala-2.10 branch in order to make sure you will be okay as Spark developments move forward. Of course, you can always stick with the current master commit and be fine (I'll cut a tag when we do the merge in order to delineate where the version changes). Please open new threads on the dev list to report and discuss any issues. This merge will temporarily drop support for YARN 2.2 on the master branch. This is because the workaround we used was only compiled for Scala 2.9. We are going to come up with a more robust solution to YARN 2.2 support before releasing 0.9. Going forward, we will continue to make maintenance releases on branch-0.8 which will remain compatible with Scala 2.9. For those interested, the primary code changes in this merge are upgrading the akka version, changing the use of Scala 2.9's ClassManifest construct to Scala 2.10's ClassTag, and updating the spark shell to work with Scala 2.10's repl. - Patrick
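[As an aside on the reflection idea Raymond raises, the usual shape is to pick a shim implementation by class name at runtime, so one binary serves both YARN APIs and the unselected shim's dependencies never need to be on the classpath. A sketch with entirely hypothetical class names, not the approach the project actually settled on:

  trait YarnShim {
    def start(): Unit
  }

  def loadShim(yarnVersion: String): YarnShim = {
    // hypothetical shim classes; only the selected one is ever loaded
    val className =
      if (yarnVersion.startsWith("2.2")) "org.example.Yarn22Shim"
      else "org.example.Yarn20Shim"
    Class.forName(className).newInstance().asInstanceOf[YarnShim]
  }
]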
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
I'll +1 myself also. For anyone who has the slow build problem: does this issue happen when building v0.8.0-incubating also? Trying to figure out whether it's related to something we added in 0.8.1 or if it's a long-standing issue. - Patrick On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Woah, weird, but definitely good to know. If you’re doing Spark development, there’s also a more convenient option added by Shivaram in the master branch. You can do sbt assemble-deps to package *just* the dependencies of each project in a special assembly JAR, and then use sbt compile to update the code. This will use the classes directly out of the target/scala-2.9.3/classes directories. You have to redo assemble-deps only if your external dependencies change. Matei On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com wrote: I hope this PR https://github.com/apache/incubator-spark/pull/252 can help. Again this is not a blocker for the release from my side either. On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com wrote: Interesting, and confirmed: On my machine where `./sbt/sbt assembly` takes a long, long, long time to complete (a MBP, in my case), building three separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less time. On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma scrapco...@gmail.com wrote: Forgot to mention: after running sbt/sbt assembly/assembly, running sbt/sbt examples/assembly takes just 37s. Not to mention my hardware is not really great. On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma scrapco...@gmail.com wrote: Hi Patrick and Matei, I was trying this out and followed the quick start guide, which says to do sbt/sbt assembly; like a few others I was also stuck for a few minutes on linux. On the other hand, if I use sbt/sbt assembly/assembly it is much faster. Should we change the documentation to reflect this? It will not be great for first-time users to get stuck there. On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Built and tested it on Mac OS X. Matei On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit b87d31d): https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-040/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/ For information about the contents of this release see: https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8 Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Saturday, December 14th at 01:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... 
To learn more about Apache Spark, please see http://spark.incubator.apache.org/
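For anyone hitting the slow build, the two workarounds discussed in this thread boil down to the following (a sketch assuming the 0.8.x sbt layout; assemble-deps was only in the master branch at the time):

    # Build the three assemblies separately -- much faster on the affected machines:
    ./sbt/sbt assembly/assembly
    ./sbt/sbt examples/assembly
    ./sbt/sbt tools/assembly

    # Development loop using assemble-deps: package the dependencies once,
    # then just recompile Spark's own classes as you edit.
    ./sbt/sbt assemble-deps
    ./sbt/sbt compile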
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
Hey Tom, I re-verified the signatures and got someone else to do it. It seemed fine. Here is what I did. gpg --recv-key 9E4FE3AF wget http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz.asc wget http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz gpg --verify spark-0.8.1-incubating.tgz.asc spark-0.8.1-incubating.tgz gpg: Signature made Tue 10 Dec 2013 02:53:15 PM PST using RSA key ID 9E4FE3AF gpg: Good signature from Patrick Wendell pwend...@gmail.com On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote: I don't know how to make sense of the numbers, but here's what I've got from a very small sample size. For both v0.8.0-incubating and v0.8.1-incubating, building separate assemblies is faster than `./sbt/sbt assembly` and the times for building separate assemblies for 0.8.0 and 0.8.1 are about the same. For v0.8.0-incubating, `./sbt/sbt assembly` takes about 2.5x as long as the sum of the separate assemblies. For v0.8.1-incubating, `./sbt/sbt assembly` takes almost 8x as long as the sum of the separate assemblies. Weird. On Wed, Dec 11, 2013 at 11:49 AM, Patrick Wendell pwend...@gmail.comwrote: I'll +1 myself also. For anyone who has the slow build problem: does this issue happen when building v0.8.0-incubating also? Trying to figure out whether it's related to something we added in 0.8.1 or if it's a long standing issue. - Patrick On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Woah, weird, but definitely good to know. If you’re doing Spark development, there’s also a more convenient option added by Shivaram in the master branch. You can do sbt assemble-deps to package *just* the dependencies of each project in a special assembly JAR, and then use sbt compile to update the code. This will use the classes directly out of the target/scala-2.9.3/classes directories. You have to redo assemble-deps only if your external dependencies change. Matei On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com wrote: I hope this PR https://github.com/apache/incubator-spark/pull/252 can help. Again this is not a blocker for the release from my side either. On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com wrote: Interesting, and confirmed: On my machine where `./sbt/sbt assembly` takes a long, long, long time to complete (a MBP, in my case), building three separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less time. On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma scrapco...@gmail.com wrote: forgot to mention, after running sbt/sbt assembly/assembly running sbt/sbt examples/assembly takes just 37s. Not to mention my hardware is not really great. On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma scrapco...@gmail.com wrote: Hi Patrick and Matei, Was trying out this and followed the quick start guide which says do sbt/sbt assembly, like few others I was also stuck for few minutes on linux. On the other hand if I use sbt/sbt assembly/assembly it is much faster. Should we change the documentation to reflect this. It will not be great for first time users to get stuck there. On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Built and tested it on Mac OS X. Matei On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. 
The tag to be voted on is v0.8.1-incubating (commit b87d31d): https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-040/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/ For information about the contents of this release see: https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8 Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Saturday, December 14th at 01:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
I also talked to a few people who got corrupted binaries when downloading from the people.apache HTTP. In that case the checksum failed but if they re-downloaded it worked. So maybe just re-download and try again? On Wed, Dec 11, 2013 at 3:15 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Tom, I re-verified the signatures and got someone else to do it. It seemed fine. Here is what I did. gpg --recv-key 9E4FE3AF wget http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz.asc wget http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz gpg --verify spark-0.8.1-incubating.tgz.asc spark-0.8.1-incubating.tgz gpg: Signature made Tue 10 Dec 2013 02:53:15 PM PST using RSA key ID 9E4FE3AF gpg: Good signature from Patrick Wendell pwend...@gmail.com On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote: I don't know how to make sense of the numbers, but here's what I've got from a very small sample size. For both v0.8.0-incubating and v0.8.1-incubating, building separate assemblies is faster than `./sbt/sbt assembly` and the times for building separate assemblies for 0.8.0 and 0.8.1 are about the same. For v0.8.0-incubating, `./sbt/sbt assembly` takes about 2.5x as long as the sum of the separate assemblies. For v0.8.1-incubating, `./sbt/sbt assembly` takes almost 8x as long as the sum of the separate assemblies. Weird. On Wed, Dec 11, 2013 at 11:49 AM, Patrick Wendell pwend...@gmail.comwrote: I'll +1 myself also. For anyone who has the slow build problem: does this issue happen when building v0.8.0-incubating also? Trying to figure out whether it's related to something we added in 0.8.1 or if it's a long standing issue. - Patrick On Wed, Dec 11, 2013 at 10:39 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Woah, weird, but definitely good to know. If you’re doing Spark development, there’s also a more convenient option added by Shivaram in the master branch. You can do sbt assemble-deps to package *just* the dependencies of each project in a special assembly JAR, and then use sbt compile to update the code. This will use the classes directly out of the target/scala-2.9.3/classes directories. You have to redo assemble-deps only if your external dependencies change. Matei On Dec 11, 2013, at 1:04 AM, Prashant Sharma scrapco...@gmail.com wrote: I hope this PR https://github.com/apache/incubator-spark/pull/252 can help. Again this is not a blocker for the release from my side either. On Wed, Dec 11, 2013 at 2:14 PM, Mark Hamstra m...@clearstorydata.com wrote: Interesting, and confirmed: On my machine where `./sbt/sbt assembly` takes a long, long, long time to complete (a MBP, in my case), building three separate assemblies (`./sbt/sbt assembly/assembly`, `./sbt/sbt examples/assembly`, `./sbt/sbt tools/assembly`) takes much, much less time. On Wed, Dec 11, 2013 at 12:02 AM, Prashant Sharma scrapco...@gmail.com wrote: forgot to mention, after running sbt/sbt assembly/assembly running sbt/sbt examples/assembly takes just 37s. Not to mention my hardware is not really great. On Wed, Dec 11, 2013 at 1:28 PM, Prashant Sharma scrapco...@gmail.com wrote: Hi Patrick and Matei, Was trying out this and followed the quick start guide which says do sbt/sbt assembly, like few others I was also stuck for few minutes on linux. On the other hand if I use sbt/sbt assembly/assembly it is much faster. Should we change the documentation to reflect this. It will not be great for first time users to get stuck there. 
On Wed, Dec 11, 2013 at 9:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Built and tested it on Mac OS X. Matei On Dec 10, 2013, at 4:49 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit b87d31d): https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=commit;h=b87d31dd8eb4b4e47c0138e9242d0dd6922c8c4e The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-040/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/ For information about the contents of this release see: https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8 Please vote on releasing this package as Apache Spark 0.8.1-incubating!
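A quick way to rule out the corrupted-download case Patrick mentions before debugging anything else (a sketch; the .md5 file name is an assumption about what was published next to the artifacts, and Apache digest files are not always in md5sum -c format, so compare the two values by eye):

    # Fetch the artifact and its published digest:
    wget http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz
    wget http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/spark-0.8.1-incubating.tgz.md5

    # Compute a local digest and compare it with the published one:
    md5sum spark-0.8.1-incubating.tgz
    cat spark-0.8.1-incubating.tgz.md5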
[VOTE] Release Apache Spark 0.8.1-incubating (rc2)
Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit bf23794a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-024/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/ For information about the contents of this release see: attached draft of release notes attached draft of release credits https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Wednesday, December 11th at 21:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/ Michael Armbrust -- build fix Pierre Borckmans -- typo fix in documentation Evan Chan -- added `local://` scheme for dependency jars Ewen Cheslack-Postava -- `add` method for python accumulators, support for setting config properties in python Mosharaf Chowdhury -- optimized broadcast implementation Frank Dai -- documentation fix Aaron Davidson -- lead on shuffle file consolidation, lead on h/a mode for standalone scheduler, cleaned up representation of block ids, several small improvements and bug fixes Tathagata Das -- new streaming operators: `transformWith`, `leftOuterJoin`, and `rightOuterJoin`, fix for kafka concurrency bug Ankur Dave -- support for pausing spot clusters on EC2 Harvey Feng -- optimization to JobConf broadcasts, minor fixes, lead on YARN 2.2 build Ali Ghodsi -- scheduler support for SIMR, lead on YARN 2.2 build Thomas Graves -- lead on Spark YARN integration including secure HDFS access over YARN Li Guoqiang -- fix for maven build Stephen Haberman -- bug fix Haidar Hadi -- documentation fix Nathan Howell -- bug fix relating to YARN Holden Karau -- java version of `mapPartitionsWithIndex` Du Li -- bug fix in make-distribution.sh Xi Lui -- bug fix and code clean-up David McCauley -- bug fix in standalone mode JSON output Michael (wannabeast) -- bug fix in memory store Fabrizio Milo -- typos in documentation, minor clean-up in DAGScheduler, typo in scaladoc Mridul Muralidharan -- fixes to meta-data cleaner and speculative scheduler Sundeep Narravula -- build fix, bug fixes in scheduler and tests, minor code clean-up Kay Ousterhout -- optimization to task result fetching, extensive code clean-up and refactoring (task schedulers, thread pools), result-fetching state in UI, showing task and attempt id in UI, several bug fixes in scheduler, UI, and unit tests Nick Pentreath -- implicit feedback variant of ALS algorithm Imran Rashid -- small improvement to executor launch Ahir Reddy -- spark support for SIMR Josh Rosen -- reduced memory overhead for BlockInfo objects, clean up of BlockManager code, fix to java API auditor, code clean-up in java API, and bug fixes in python API Henry Saputra -- build fix Jerry Shao -- refactoring of fair scheduler, support
for running spark as a specific user, bug fix Mingfei Shi -- documentation for JobLogger Andre Schumacher -- sortByKey in pyspark and associated changes Karthik Tunga -- bug fix in launch script Patrick Wendell -- added `repartition` operator, logging improvements, instrumentation for shuffle write, documentation improvements, fix for streaming example, and release management Neal Wiggins -- minor import clean-up, documentation typo Andrew Xia -- bug fix in UI Reynold Xin -- optimized hash set and hash tables for primitive types, task killing, support for setting job properties in repl, logging improvements, Kryo improvements, several bug fixes, and general clean-up Matei Zaharia -- optimized hashmap for shuffle data, pyspark documentation, optimizations to kryo and chill serializers Wu Zeming -- bug fix in executors UI DRAFT OF RELEASE NOTES FOR SPARK 0.8.1 Apache Spark 0.8.1 is a maintenance release including several bug fixes and performance optimizations. It also includes a few new features. Contributions to 0.8.1 came from 40 developers. == High availability mode for standalone scheduler == The standalone scheduler now has a High Availability (H/A) mode which can tolerate master failures. This is particularly useful for long-running applications such as streaming jobs and the shark server, where the scheduler master previously represented a single point of failure.
Re: [DISCUSS] About the [VOTE] Release Apache Spark 0.8.1-incubating (rc1)
Hey Mark, One constructive action you and other people can take to help us assess the quality and completeness of this release is to download the release, run the tests, run the release in your dev environment, read through the documentation, etc. This is one of the main points of releasing an RC to the community... even if you disagree with some patches that were merged in, this is still a way you can help validate the release. - Patrick On Sun, Dec 8, 2013 at 1:30 PM, Mark Hamstra m...@clearstorydata.com wrote: I'm aware of the changes file, but it really doesn't address the issue that I am raising. The changes file just tells me what has gone into the release candidate. In general, it doesn't tell me why those changes went in or provide any rationale by which to judge whether that is the complete set of changes that should go in. I talked some with Matei about related versioning and release issues last week, and I've raised them in other contexts previously, but I'm taking the liberty to annoy people again because I really am not happy with our current versioning and release process, and I really am of the opinion that we've got to start doing much better before I can vote in favor of a 1.0 release. I fully realize that this is not a 1.0 release, and that because we are pre-1.0 we still have a lot of flexibility with releases that break backward or forward compatibility and with version numbers that have nothing like the semantic meaning that they will eventually need to have; but it is not going to be easy to change our process and culture so that we produce the kind of stability and reliability that Spark users need to be able to depend upon and version numbers that clearly communicate what those users expect them to mean. I think that we should start making those changes now. Just because we have flexibility pre-1.0, that doesn't mean that we shouldn't start training ourselves now to work within the constraints of post-1.0 Spark. If I'm to be happy voting for an eventual 1.0 release candidate, I'll need to have seen at least one full development cycle that already adheres to the post-1.0 constraints, demonstrating the maturity of our development process. That demonstration cycle is clearly not this one -- and I understand that there were some compelling reasons (particularly with regard to getting a full release of Spark based on Scala 2.9.3 before we make the jump to 2.10). This patch-level release breaks binary compatibility and contains a lot of code that isn't anywhere close to meeting the criterion for inclusion in a real, post-1.0 patch-level release: essentially changes that every, or nearly every, existing Spark user needs (not just wants), and that work with all existing and future binaries built with the prior patch-level version of Spark as a dependency. Like I said, we are clearly nowhere close to that with the move from 0.8.0 to 0.8.1; but I also haven't been able to recognize any alternative criterion by which to judge the quality and completeness of this release candidate. Maybe there just isn't one, and I'm just going to have to swallow my concerns while watching 0.8.1 go out the door; but if we don't start doing better on this kind of thing in the future, you are going to start hearing more complaining from me. I just hope that it doesn't get to the point where I feel compelled to actively oppose an eventual 1.0 release candidate.
On Sun, Dec 8, 2013 at 12:37 PM, Henry Saputra henry.sapu...@gmail.com wrote: Ah, sorry for the confusion Patrick, like you said I was just trying to let people be aware of this file and the purpose of it. On Sunday, December 8, 2013, Patrick Wendell wrote: Hey Henry, Are you suggesting we need to change something about our changes file? Or are you just pointing people to the file? - Patrick On Sun, Dec 8, 2013 at 11:37 AM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Spark devs, I have modified the Subject to avoid polluting the VOTE thread, since it relates to how and which commits get merged back to the 0.8.* branch. Please respond to the previous question in this thread. Technically the CHANGES.txt [1] file should describe the changes in a particular release and it is the main requirement needed to cut an ASF release. - Henry [1] https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt On Sun, Dec 8, 2013 at 12:03 AM, Josh Rosen rosenvi...@gmail.com wrote: We can use git log to figure out which changes haven't made it into branch-0.8. Here's a quick attempt, which lists only pull requests that were merged into just one of the branches. For completeness, this could be extended to find commits that weren't part of a merge and are only present in one branch. *Script:* MASTER_BRANCH=origin/master RELEASE_BRANCH=origin/branch-0.8 git log --oneline --grep Merge pull
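Josh's script is cut off above; the idea can be sketched with standard git log options (the exact flags from his original are not preserved, so treat this as a reconstruction):

    MASTER_BRANCH=origin/master
    RELEASE_BRANCH=origin/branch-0.8

    # Pull-request merges reachable from master but not from branch-0.8:
    git log --oneline --grep="Merge pull request" $RELEASE_BRANCH..$MASTER_BRANCH

    # And the reverse: merges only in the release branch:
    git log --oneline --grep="Merge pull request" $MASTER_BRANCH..$RELEASE_BRANCH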
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)
Hey Take, Could you start a separate thread to debug your build issue? In that thread, could you paste the exact build command and entire output? The log you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0 based on the assembly file name it is logging. --- sent from my phone On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble completing the build process (sbt/sbt assembly) on Macbook. The sbt command hangs at the last step. ... ... [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar ... [info] SHA-1: 0657a347240266230247693f265a5797d40c326a [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar ... (hangs here) -- On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was able to build it successfully. .. .. [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ... [info] Done packaging. [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM -- On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit bf23794a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-024/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/ For information about the contents of this release see: attached draft of release notes attached draft of release credits https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Wednesday, December 11th at 21:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
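One thing worth checking, given that the log above shows a hadoop1.0.4 assembly being packaged: in the 0.8.x sbt build the target Hadoop version is selected with an environment variable, so the 2.2.0 build would look roughly like this (a sketch, assuming the stock 0.8.x build scripts):

    # Clean out assemblies from previous builds, then rebuild against Hadoop 2.2.0.
    # SPARK_YARN=true additionally enables the YARN support.
    sbt/sbt clean
    SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly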
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)
For my own part I'll give a +1 to this RC. On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: OK. I will post the entire output via separate email. I just upgraded Hadoop to 2.2.0 recently. So there might be something I need to remove/clean up. On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Take, Could you start a separate thread to debug your build issue? In that thread, could you paste the exact build command and entire output? The log you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0 based on the assembly file name it is logging. --- sent from my phone On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble completing the build process (sbt/sbt assembly) on Macbook. The sbt command hangs at the last step. ... ... [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar ... [info] SHA-1: 0657a347240266230247693f265a5797d40c326a [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar ... (hangs here) -- On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was able to build it successfully. .. .. [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ... [info] Done packaging. [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM -- On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit bf23794a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-024/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/ For information about the contents of this release see: attached draft of release notes attached draft of release credits https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Wednesday, December 11th at 21:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)
Hey Mark - ya this would be good to get in. Does merging that particular PR put this in sufficient shape for the 0.8.1 release or are there other open patches we need to look at? - Patrick On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com wrote: SPARK-962 should be resolved before release. See also: https://github.com/apache/incubator-spark/pull/195 With the references to the way I changed Debian packaging for ClearStory, we should be at least 90% of the way toward doing it right for Apache. On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com wrote: For my own part I'll give a +1 to this RC. On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: OK. I will post the entire output via separate email. I just upgraded Hadoop to 2.2.0 recently. So there might be something I need to remove/clean up. On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Take, Could you start a separate thread to debug your build issue? In that thread, could you paste the exact build command and entire output? The log you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0 based on the assembly file name it is logging. --- sent from my phone On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble completing the build process (sbt/sbt assembly) on Macbook. The sbt command hangs at the last step. ... ... [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar ... [info] SHA-1: 0657a347240266230247693f265a5797d40c326a [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar ... (hangs here) -- On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was able to build it successfully. .. .. [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ... [info] Done packaging. [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM -- On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit bf23794a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-024/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/ For information about the contents of this release see: attached draft of release notes attached draft of release credits https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Wednesday, December 11th at 21:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. 
[ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)
Looked into this a bit more - I think removing repl-bin is something we should wait until 0.9 to do, because we've published it to maven in 0.8.0 and people might expect it to be there in 0.8.1. Merging the directly referenced pull request (195) seems like a good idea though since it fixes a bug in the script. Is that what you are suggesting? - Patrick On Sun, Dec 8, 2013 at 7:30 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark - ya this would be good to get in. Does merging that particular PR put this in sufficient shape for the 0.8.1 release or are there other open patches we need to look at? - Patrick On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com wrote: SPARK-962 should be resolved before release. See also: https://github.com/apache/incubator-spark/pull/195 With the references to the way I changed Debian packaging for ClearStory, we should be at least 90% of the way toward doing it right for Apache. On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com wrote: For my own part I'll give a +1 to this RC. On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: OK. I will post the entire output via separate email. I just upgraded Hadoop to 2.2.0 recently. So there might be something I need to remove/clean up. On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Take, Could you start a separate thread to debug your build issue? In that thread, could you paste the exact build command and entire output? The log you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0 based on the assembly file name it is logging. --- sent from my phone On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble completing the build process (sbt/sbt assembly) on Macbook. The sbt command hangs at the last step. ... ... [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar ... [info] SHA-1: 0657a347240266230247693f265a5797d40c326a [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar ... (hangs here) -- On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was able to build it successfully. .. .. [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ... [info] Done packaging. [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM -- On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. 
The tag to be voted on is v0.8.1-incubating (commit bf23794a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-024/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/ For information about the contents of this release see: attached draft of release notes attached draft of release credits https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt Please vote on releasing this package as Apache Spark 0.8.1-incubating! The vote is open until Wednesday, December 11th at 21:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.1-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)
Hey Mark, What I'm asking is whether this patch is sufficient to have a working debian build in 0.8.1, or are there other outstanding issues to make it work? By working I mean, within the initial design that was contributed (with repl-bin) it works according to that approach. We can redesign this packaging in 0.9. That will require having a PR against Apache Spark, discussing, etc. But it doesn't need to be on the critical path for this release. - Patrick On Sun, Dec 8, 2013 at 7:54 PM, Mark Hamstra m...@clearstorydata.com wrote: Whatever Debian package gets built has to work, so that's the first requirement. I don't know how to decide whether a change is acceptable in 0.8 or has to wait until 0.9, but the 0.9 packaging should definitely leverage the assembly sub-project, making repl-bin unnecessary. On Sun, Dec 8, 2013 at 7:46 PM, Patrick Wendell pwend...@gmail.com wrote: Looked into this a bit more - I think removing repl-bin is something we should wait until 0.9 to do, because we've published it to maven in 0.8.0 and people might expect it to be there in 0.8.1. Merging the directly referenced pull request (195) seems like a good idea though since it fixes a bug in the script. Is that what you are suggesting? - Patrick On Sun, Dec 8, 2013 at 7:30 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark - ya this would be good to get in. Does merging that particular PR put this in sufficient shape for the 0.8.1 release or are there other open patches we need to look at? - Patrick On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com wrote: SPARK-962 should be resolved before release. See also: https://github.com/apache/incubator-spark/pull/195 With the references to the way I changed Debian packaging for ClearStory, we should be at least 90% of the way toward doing it right for Apache. On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com wrote: For my own part I'll give a +1 to this RC. On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: OK. I will post the entire output via separate email. I just upgraded Hadoop to 2.2.0 recently. So there might be something I need to remove/clean up. On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Take, Could you start a separate thread to debug your build issue? In that thread, could you paste the exact build command and entire output? The log you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0 based on the assembly file name it is logging. --- sent from my phone On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble completing the build process (sbt/sbt assembly) on Macbook. The sbt command hangs at the last step. ... ... [info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar ... [info] SHA-1: 0657a347240266230247693f265a5797d40c326a [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar ... (hangs here) -- On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was able to build it successfully. .. .. [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ... 
[info] Done packaging. [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM -- On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1. The tag to be voted on is v0.8.1-incubating (commit bf23794a): https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203 The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-024/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/ For information about the contents
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc2)
Hey Mark, Okay if 195 gets this in working order in the branch 0.8 let's just merge that to keep it consistent with our docs and the way this is done in 0.8.0 We can do a broader refactoring in 0.9. Would be great if you could kick off a JIRA discussion or submit a PR relating to that. - Patrick On Sun, Dec 8, 2013 at 8:07 PM, Mark Hamstra m...@clearstorydata.com wrote: Well, 195 is sufficient to give you something that runs, but it doesn't run the same way as Spark built/distributed by other means -- e.g., after 195 the package still uses something equivalent to the old `run` script instead of the current `spark-class` way. On Sun, Dec 8, 2013 at 8:02 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark, What I'm asking is whether this patch is sufficient to have a working debian build in 0.8.1, or are there other outstanding issues to make it work? By working I mean, within the initial design that was contributed (with repl-bin) it works according to that approach. We can redesign this packaging in 0.9. That will require having a PR against Apache Spark, discussing, etc. But it doesn't need to be on the critical path for this release. - Patrick On Sun, Dec 8, 2013 at 7:54 PM, Mark Hamstra m...@clearstorydata.com wrote: Whatever Debian package gets built has to work, so that's the first requirement. I don't know how to decide whether a change is acceptable in 0.8 or has to wait until 0.9, but the 0.9 packaging should definitely leverage the assembly sub-project, making repl-bin unnecessary. On Sun, Dec 8, 2013 at 7:46 PM, Patrick Wendell pwend...@gmail.com wrote: Looked into this a bit more - I think removing repl-bin is something we should wait until 0.9 to do, because we've published it to maven in 0.8.0 and people might expect it to be there in 0.8.1. Merging the directly referenced pull request (195) seems like a good idea though since it fixes a bug in the script. Is that what you are suggesting? - Patrick On Sun, Dec 8, 2013 at 7:30 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark - ya this would be good to get in. Does merging that particular PR put this in sufficient shape for the 0.8.1 release or are there other open patches we need to look at? - Patrick On Sun, Dec 8, 2013 at 6:05 PM, Mark Hamstra m...@clearstorydata.com wrote: SPARK-962 should be resolved before release. See also: https://github.com/apache/incubator-spark/pull/195 With the references to the way I changed Debian packaging for ClearStory, we should be at least 90% of the way toward doing it right for Apache. On Sun, Dec 8, 2013 at 5:29 PM, Patrick Wendell pwend...@gmail.com wrote: For my own part I'll give a +1 to this RC. On Sun, Dec 8, 2013 at 4:30 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: OK. I will post the entire output via separate email. I just upgraded Hadoop to 2.2.0 recently. So there might be something I need to remove/clean up. On Sun, Dec 8, 2013 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Take, Could you start a separate thread to debug your build issue? In that thread, could you paste the exact build command and entire output? The log you posted here suggests the first build detected hadoop 1.0.4 not 2.2.0 based on the assembly file name it is logging. --- sent from my phone On Dec 8, 2013 4:13 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: With Hadoop 2.2.0 ( Java 1.7.0_45) installed, I'm having trouble completing the build process (sbt/sbt assembly) on Macbook. The sbt command hangs at the last step. ... ... 
[info] SHA-1: ce8275f5841002164c4305c912a2892ec7c1d395 [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/tools/target/scala-2.9.3/spark-tools-assembly-0.8.1-incubating.jar ... [info] SHA-1: 0657a347240266230247693f265a5797d40c326a [info] Packaging /Users/taka/Documents/Spark/Releases/spark-0.8.1-incubating-rc2/assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop1.0.4.jar ... (hangs here) -- On another Macbook with Hadoop 1.1.1 ( Java 1.7.0_45) installed, I was able to build it successfully. .. .. [info] SHA-1: 77109cd085bd4f0d2b601b3451b35b961d357534 [info] Packaging /Users/tshinagawa/Documents/Spark/RCs/spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ... [info] Done packaging. [success] Total time: 266 s, completed Dec 8, 2013 3:03:10 PM -- On Sun, Dec 8, 2013 at 12:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate
Re: difference between 'fetchWaitTime' and 'remoteFetchTime'
Hey Umar, I dug into this a bit today out of curiosity since I also wasn't sure. I updated the in-line documentation here: https://github.com/apache/incubator-spark/pull/209/files The more important metric is `fetchWaitTime` which indicates how much of the task runtime was spent waiting for input data. remoteFetchTime is an aggregation of all of the fetch delays for each block... this second metric is a bit more convoluted because those fetches can actually overlap, so if this is high it doesn't necessarily indicate any latency hit. - Patrick On Mon, Nov 25, 2013 at 1:23 PM, Umar Javed umarj.ja...@gmail.com wrote: Any clarification on this? thanks. On Wed, Nov 20, 2013 at 3:02 PM, Umar Javed umarj.ja...@gmail.com wrote: In the class ShuffleReadMetrics in executor/TaskMetrics.scala, there are two variables: 1) fetchWaitTime: /** * Total time that is spent blocked waiting for shuffle to fetch data */ 2) remoteFetchTime /** * The total amount of time for all the shuffle fetches. This adds up time from overlapping * shuffles, so can be longer than task time */ As I understand it, the difference between these two is that fetchWaitTime is remoteFetchTime without the overlapped time counted exactly once. Is that right? Can somebody explain the difference better? thanks!
Re: Documenting the release process for Apache Spark
Hey Henry, I did create release notes for this. However, I wanted to dogfood them for the 0.8.1 release before I push them publicly, just so I know the thing is actually comprehensive. It's quite complicated and I don't want to publish something that leads people down the wrong path. My thought was I would use these personally for the 0.8.1 release to verify them, then publish them and try to have someone else do the 0.9.0 release (perhaps wishful thinking!). - Patrick On Thu, Nov 7, 2013 at 12:09 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Patrick, Did you end up writing up the steps you were taking to generate the Apache Spark release to provide help to the next Apache Spark RE? I remember you were trying to create one after we released 0.8 Thanks, - Henry
Re: Getting failures in FileServerSuite
This may have been caused by a recent merge since a bunch of people independently hit it in the last 48 hours. One debugging step would be to narrow it down to which merge caused it. I don't have time personally today, but just a suggestion for ppl for whom this is blocking progress. - Patrick On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra m...@clearstorydata.com wrote: What JDK version are you using, Evan? I tried to reproduce your problem earlier today, but I wasn't even able to get through the assembly build -- kept hanging when trying to build the examples assembly. Foregoing the assembly and running the tests would hang on FileServerSuite "Dynamically adding JARS locally" -- no stack trace, just hung. And I was actually seeing a very similar stack trace to yours from a test suite of our own running against 0.8.1-SNAPSHOT -- not exactly the same because line numbers were different once it went into the java runtime, and it eventually ended up someplace a little different. That got me curious about differences in Java versions, so I updated to the latest Oracle release (1.7.0_45). Now it cruises right through the build and test of Spark master from before Matei merged your PR. Then I logged into a machine that has 1.7.0_15 (7u15-2.3.7-0ubuntu1~11.10.1, actually) installed, and I'm right back to the hanging during the examples assembly (but passes FileServerSuite, oddly enough.) Upgrading the JDK didn't improve the results of the ClearStory test suite I was looking at, so my misery isn't over; but yours might be with a newer JDK. On Wed, Oct 30, 2013 at 12:44 PM, Evan Chan e...@ooyala.com wrote: Must be a local environment thing, because AmpLab Jenkins can't reproduce it. :-p On Wed, Oct 30, 2013 at 11:10 AM, Josh Rosen rosenvi...@gmail.com wrote: Someone on the users list also encountered this exception: https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E On Wed, Oct 30, 2013 at 9:40 AM, Evan Chan e...@ooyala.com wrote: I'm at the latest commit f0e23a023ce1356bc0f04248605c48d4d08c2d05 Merge: aec9bf9 a197137 Author: Reynold Xin r...@apache.org Date: Tue Oct 29 01:41:44 2013 -0400 and seeing this when I do a test-only FileServerSuite: 13/10/30 09:35:04.300 INFO DAGScheduler: Completed ResultTask(0, 0) 13/10/30 09:35:04.307 INFO LocalTaskSetManager: Loss was due to java.io.StreamCorruptedException java.io.StreamCorruptedException: invalid type code: AC at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27) at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:94) at org.apache.spark.rdd.MapPartitionsWithContextRDD.compute(MapPartitionsWithContextRDD.scala:40) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:226) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107) at org.apache.spark.scheduler.Task.run(Task.scala:53) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:212) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Anybody else seen this yet? I have a really simple PR and this fails without my change, so I may go ahead and submit it anyways. -- Evan Chan Staff Engineer e...@ooyala.com
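Patrick's suggestion of narrowing this down to the offending merge can be done mechanically with git bisect (a sketch; the known-good commit is a placeholder you would fill in):

    git bisect start
    git bisect bad HEAD
    git bisect good <last-known-good-commit>

    # At each step git checks out a candidate commit; run the failing suite,
    # report the result, and repeat until the culprit merge is identified:
    sbt/sbt "test-only org.apache.spark.FileServerSuite" && git bisect good || git bisect bad

    # When finished, return to where you started:
    git bisect reset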
Re: Are we moving too fast or too far on 0.8.1-SNAPSHOT?
Shark is not a great example in general because it uses semi-private internal interfaces that are not guaranteed to be compatible within minor releases. Spark's public, documented API has always (AFAIK) maintained compatibility within minor versions. In fact, we've been diligent to maintain compatibility with major versions as well and there have only been very minute changes in that API. Over time it would be good for Shark to migrate to using higher API's (and we may need to build these). But my point is that the public API has maintained compatibility consistent with the norms discussed here. - Patrick On Mon, Oct 28, 2013 at 3:50 PM, Jey Kottalam j...@cs.berkeley.edu wrote: I agree that we should strive to maintain full backward compatibility between patch releases (i.e. incrementing the z in version x.y.z). On Mon, Oct 28, 2013 at 3:22 PM, Mark Hamstra m...@clearstorydata.com wrote: Or more to the point: What is our commitment to backward compatibility in point releases? Many Java developers will come to a library or platform versioned as x.y.z with the expectation that if their own code worked well using x.y.(z-1) as a dependency, then moving up to x.y.z will be painless and trivial. That is not looking like it will be the case for Spark 0.8.0 and 0.8.1. We only need to look at Shark as an example of code built with a dependency on Spark to see the problem. Shark 0.8.0 works with Spark 0.8.0. Shark 0.8.0 does not build with Spark 0.8.1-SNAPSHOT. Presumably that lack of backwards compatibility will continue into the eventual release of Spark 0.8.1, and that makes life hard on developers using Spark and Shark. For example, a developer using the released version of Shark but wanting to pick up the bug fixes in Spark doesn't have a good option anymore since 0.8.1-SNAPSHOT (or the eventual 0.8.1 release) doesn't work, and moving to the wild and woolly development on the master branches of Spark and Shark is not a good idea for someone trying to develop production code. In other words, all of the bug fixes in Spark 0.8.1 are not accessible to this developer until such time as there are available 0.8.1-compatible versions of Shark and anything else built on Spark that this developer is using. The only other option is trying to cherry-pick commits from, e.g., Shark 0.9.0-SNAPSHOT into Shark 0.8.0 until Shark 0.8.0 has been brought up to a point where it works with Spark 0.8.1. But an application developer shouldn't need to do that just to get the bug fixes in Spark 0.8.1, and it is not immediately obvious just which Shark commits are necessary and sufficient to produce a correct, Spark-0.8.1-compatible version of Shark (indeed, there is no guarantee that such a thing is even possible.) Right now, I believe that 67626ae3eb6a23efc504edf5aedc417197f072cf, 488930f5187264d094810f06f33b5b5a2fde230a and bae19222b3b221946ff870e0cee4dba0371dea04 are necessary to get Shark to work with Spark 0.8.1-SNAPSHOT, but that those commits are not sufficient (Shark builds against Spark 0.8.1-SNAPSHOT with those cherry-picks, but I'm still seeing runtime errors.) In short, this is not a good situation, and we probably need a real 0.8 maintenance branch that maintains backward compatibility with 0.8.0, because (at least to me) the current branch-0.8 of Spark looks more like another active development branch (in addition to the master and scala-2.10 branches) than it does a maintenance branch.
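For reference, the stop-gap Mark describes would look roughly like this, using the commit hashes from his email (as he notes, Shark builds with these cherry-picks but still shows runtime errors, so they are necessary rather than sufficient):

    # In a Shark 0.8.0 checkout, cherry-pick the compatibility commits
    # Mark identifies from the Shark development branch:
    git cherry-pick 67626ae3eb6a23efc504edf5aedc417197f072cf
    git cherry-pick 488930f5187264d094810f06f33b5b5a2fde230a
    git cherry-pick bae19222b3b221946ff870e0cee4dba0371dea04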
Re: Suggestion/Recommendation for language bindings
I think Ruby integration via JRuby would be a great idea. On Tue, Oct 15, 2013 at 9:45 AM, Ryan Weald r...@weald.com wrote: Writing a JRuby wrapper around the existing Java bindings would be pretty cool. Could help to get some of the Ruby community to start using the Spark platform. -Ryan On Mon, Oct 14, 2013 at 12:07 PM, Aaron Babcock aaron.babc...@gmail.comwrote: Hey Laksh, Not sure if you are interested in groovy at all, but I've got the beginning of a project here: https://github.com/bunions1/groovy-spark-example The idea is to map groovy idioms: myRdd.collect{ row - newRow } to spark api calls myRdd.map( row = newRow) and support a good repl. Its not officially related to spark at all and is very early stage but maybe it will be a point of reference for you. On Mon, Oct 14, 2013 at 12:42 PM, Laksh Gupta glaks...@gmail.com wrote: Hi I am interested in contributing to the project and want to start with supporting a new programming language on Spark. I can see that Spark already support Java and Python. Would someone provide me some suggestion/references to start with? I think this would be a great learning experince for me. Thank you in advance. -- - Laksh Gupta
Re: Spark 0.8.0: bits need to come from ASF infrastructure
Yep, we definitely need to just directly point people to the location at apache.org where they can find the hashes. I just updated the release notes and downloads page to point to that site. I just wanted to point out that mirroring these through a CDN seems philosophically the same as mirroring through Apache, since in neither case do we expect the users to trust the artifact they download. We just need to be more explicit that we are, indeed, mirroring and explain that the trusted root is at apache.org - Patrick On Wed, Sep 25, 2013 at 3:56 PM, Roman Shaposhnik r...@apache.org wrote: On Wed, Sep 25, 2013 at 3:48 PM, Patrick Wendell pwend...@gmail.com wrote: Hey we've actually distributed our artifacts through amazon cloudfront in the past (and that is where the website links redirect to). Since the apache mirrors don't distribute signatures anyways, True, but apache dist does. IOW, it is not uncommon for those having an automated build/fetching system to get bits from one of the mirrors and then get the hashes directly from dist. In your current case, I don't think I know of a way to do that. Now, you may say that the current CDN you guys are using is functioning like a mirror -- well, I'd say that it needs to be called out like one then. Otherwise, as a naive user I *really* have to guess where to get the hashes. what is the difference between linking to an apache mirror vs using a more robust CDN? If people want to verify the downloads they need to go to the apache root in either case. Is this just a cultural thing or is there some security reason? A bit of both I guess. Thanks, Roman.
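The split Patrick and Roman are converging on -- bits from a mirror or CDN, hashes and signatures from the trusted apache.org root -- looks something like this in practice (a sketch; the mirror URL is illustrative and the dist path is an assumption):

    # Artifact from an (untrusted) mirror or CDN:
    wget http://some-mirror.example.org/spark/spark-0.8.0-incubating.tgz

    # Signature from the trusted ASF root, then verify the mirrored bits:
    wget https://www.apache.org/dist/incubator/spark/spark-0.8.0-incubating.tgz.asc
    gpg --verify spark-0.8.0-incubating.tgz.asc spark-0.8.0-incubating.tgz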
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)
Henry - one thing is that, because the filenames are not included in the signatures, I could just alter the filenames now to not include -RCX... would that be preferable or would that necessitate another vote? - Patrick On Fri, Sep 20, 2013 at 6:39 AM, Henry Saputra henry.sapu...@gmail.com wrote: The RC should be just the directory where the artifacts live but the final name should omit the RCxx. Hmm not sure if IPMCs will be picky about this but it should not be a blocker to the release. - Henry On Thu, Sep 19, 2013 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Roman, We can do this in the future - I wasn't sure exactly what the right standard approach was. Just so I understand, the change you are proposing from what is there now is just to remove rcX from the file-names, correct? - Patrick On Thu, Sep 19, 2013 at 8:06 PM, Roman Shaposhnik r...@apache.org wrote: On Thu, Sep 19, 2013 at 5:56 PM, Patrick Wendell pwend...@gmail.com wrote: FYI this vote ends in 8 hours. I was going to test it on a fully distributed Bigtop cluster, but hit a few snags. That now will extend into the weekend. Of course, that's not that big of a deal -- I can always vote on incubator general once you guys move the vote over there. The only minor nit for the future I've noticed is that I would highly encourage you to follow the usual RC practices where you name all of your artifacts as final bits and have a subdirectory that reflects the RC name. E.g. here's how a very recent Hadoop RC looks like: http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0/ Thanks, Roman.
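Patrick's observation about the filenames holds because a detached GPG signature covers only the file contents, not the name, so renamed artifacts still verify (a sketch, assuming rc6 artifact names of this shape):

    # Drop the -rc6 suffix from the artifact and its signature:
    mv spark-0.8.0-incubating-rc6.tgz spark-0.8.0-incubating.tgz
    mv spark-0.8.0-incubating-rc6.tgz.asc spark-0.8.0-incubating.tgz.asc

    # The signature still verifies, since it signs content, not names:
    gpg --verify spark-0.8.0-incubating.tgz.asc spark-0.8.0-incubating.tgz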
[RESULT] [VOTE] Release Apache Spark 0.8.0-incubating (RC6)
The vote is now closed. Below are the vote totals. +1 (7 Total) Andy Konwinski Matei Zaharia Patrick Wendell Konstantin Boudnik Reynold Xin Chris Mattmann* Henry Saputra* 0 (1 Total) Mark Hamstra -1 (0 Total) * = Binding Vote As per the incubator release guide [1] I'll be sending this to the general incubator list for a final vote from IPMC members. [1] http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote - Patrick -- Forwarded message -- From: Roman Shaposhnik r...@apache.org Date: Fri, Sep 20, 2013 at 8:10 AM Subject: Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6) To: dev@spark.incubator.apache.org On Thu, Sep 19, 2013 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Roman, We can do this in the future - I wasn't sure exactly what the right standard approach was. Just so I understand, the change you are proposing from what is there now is just to remove rcX from the file-names, correct? Right. Basically your artifacts should look exactly like what is going to be released when the vote passes. Like I said -- it is a small nit, but it makes it easier for guys like me to test the RCs in an automated manner. Thanks, Roman.
Re: [RESULT] [VOTE] Release Apache Spark 0.8.0-incubating (RC6)
Hey Henry, Sounds good. I'll send an email to general@ shortly. I didn't realize that this vote technically counts as passing according to those rules (since plenty of PPMC members gave +1). On Fri, Sep 20, 2013 at 1:30 PM, Henry Saputra henry.sapu...@gmail.com wrote: Thanks to Patrick for driving the first Apache Spark release. Great job so far. A bit of clarification: the release VOTE passes with more than 3 +1 binding votes from the Apache Spark Podling Project Management Committee (PPMC): +1 (7 Total) Andy Konwinski Matei Zaharia Patrick Wendell Konstantin Boudnik Reynold Xin Chris Mattmann* Henry Saputra* (* indicates IPMC) Since Spark is under the ASF incubator we need to send another VOTE to the general@i.a.o list. From the ASF release management page: It is Apache policy that all releases be formally approved by the responsible PMC. In the case of the incubator, the IPMC must approve all releases. That means there is an additional bit of voting that the release manager must now oversee on general@incubator in order to gain that approval. The release manager must inform general@incubator that the vote has passed on the podling's development list, and should indicate any IPMC votes gained during that process. A new vote on the release candidate artifacts must now be held on general@incubator to seek majority consensus from the IPMC. Previous IPMC votes issued on the project's development list count towards that goal. Even if there are sufficient IPMC votes already, it is vital that the IPMC as a whole is informed via a VOTE e-mail on general@incubator. We have 2 IPMC votes already, so technically we need one more unless we get veto votes against the release. - Henry On Fri, Sep 20, 2013 at 11:43 AM, Patrick Wendell pwend...@gmail.com wrote: The vote is now closed. Below are the vote totals. +1 (7 Total) Andy Konwinski Matei Zaharia Patrick Wendell Konstantin Boudnik Reynold Xin Chris Mattmann* Henry Saputra* 0 (1 Total) Mark Hamstra -1 (0 Total) * = Binding Vote As per the incubator release guide [1] I'll be sending this to the general incubator list for a final vote from IPMC members. [1] http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote - Patrick -- Forwarded message -- From: Roman Shaposhnik r...@apache.org Date: Fri, Sep 20, 2013 at 8:10 AM Subject: Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6) To: dev@spark.incubator.apache.org On Thu, Sep 19, 2013 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Roman, We can do this in the future - I wasn't sure exactly what the right standard approach was. Just so I understand, the change you are proposing from what is there now is just to remove rcX from the file-names, correct? Right. Basically your artifacts should look exactly like what is going to be released when the vote passes. Like I said -- it is a small nit, but it makes it easier for guys like me to test the RCs in an automated manner. Thanks, Roman.
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)
Hey Chris, the tag in github is 3b85a85, which I listed in the original vote next to the git URL. Is there another type of tag I should be adding? On Thu, Sep 19, 2013 at 7:20 PM, Chris Mattmann mattm...@apache.org wrote: I'm currently downloading the RC (all 127MB of the bin; then onto source). I have a generic set of Incubator scripts so it should go fine after that. I'm giving you a preview of my minor nit: We don't VOTE on github URLs -- we VOTE on ASF URLs (e.g., the tag). That should be corrected in future RC emails. If all checks out, should be +1 shortly. -Original Message- From: Patrick Wendell pwend...@gmail.com Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org Date: Thursday, September 19, 2013 5:56 PM To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org Subject: Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6) FYI this vote ends in 8 hours. On Wed, Sep 18, 2013 at 8:56 PM, Reynold Xin r...@cs.berkeley.edu wrote: +1 -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Wed, Sep 18, 2013 at 11:06 AM, Konstantin Boudnik c...@apache.org wrote: Maven package could be run with -DskipTests, which will simply build... well, the package. +1 on the RC. The nits are indeed minor. Cos On Tue, Sep 17, 2013 at 07:20PM, Matei Zaharia wrote: In Maven, mvn package should also create the assembly, but the non-obvious thing is that it needs to happen for all projects before mvn test for core works. Unfortunately I don't know any easy way around that. Matei On Sep 17, 2013, at 1:46 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark, Good catches here. Ya, the driver suite thing is sorta annoying - we should try to fix that in master. The audit script I wrote first does an sbt/sbt assembly to avoid this. I agree though these shouldn't block the release (if a blocker does come up we can revisit these potentially when cutting a release). - Patrick On Tue, Sep 17, 2013 at 1:26 PM, Mark Hamstra m...@clearstorydata.com wrote: There are a few nits left to pick: 'sbt/sbt publish-local' isn't generating correct POM files because of the way the exclusions are defined in SparkBuild.scala using wildcards; looks like there may be some broken doc links generated in that task, as well; DriverSuite doesn't like to run from the maven build, complaining that 'sbt/sbt assembly' needs to be run first. None of these is enough for me to give RC6 a -1. On Tue, Sep 17, 2013 at 11:28 AM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tried the new staging repo to make sure the issue with RC5 is fixed. Matei On Sep 17, 2013, at 2:03 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit 3b85a85): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/files/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-059/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! 
The vote is open until Friday, September 20th at 09:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
[VOTE] Release Apache Spark 0.8.0-incubating (RC6)
Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit 3b85a85): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/files/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-059/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Friday, September 20th at 09:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)
Thanks for the feedback guys. I've changed the audit script to incorporate Andy's suggestion. I also added tests for building sbt and maven projects against the staged repository, to check that artifacts are set up correctly in maven. I've posted RC6, which adds a very small change to this RC. This vote is therefore cancelled in favor of RC6. - Patrick On Mon, Sep 16, 2013 at 9:47 PM, Andy Konwinski andykonwin...@gmail.com wrote: Patrick, I took a quick look over your release_auditor.py script and it's really great! Then I ran it (had to add --keyserver pgp.mit.edu to the gpg command) and everything passed on OS X! Great job and +1 from me whenever you resolve the kafka jar issue you mentioned. Andy On Mon, Sep 16, 2013 at 8:37 PM, Matei Zaharia matei.zaha...@gmail.com wrote: FWIW, I tested it otherwise and it seems good modulo this issue. Matei On Sep 16, 2013, at 6:39 PM, Patrick Wendell pwend...@gmail.com wrote: Hey folks, just FYI we found one minor issue with this RC (the kafka jar in the stream pom needs to be published as provided since it's not available in maven). Please still continue to test this and provide feedback here until the following RC is posted later. - Patrick On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin r...@cs.berkeley.edu wrote: +1 -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell pwend...@gmail.com wrote: I also wrote an audit script [1] to verify various aspects of the release binaries and ran it on this RC. People are welcome to run this themselves, but I haven't tested it on other machines yet, and some of the Spark tests are very sensitive to the test environment :) Output is pasted below: [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py - Verifying download integrity for artifact: spark-0.8.0-incubating-bin-cdh4-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying download integrity for artifact: spark-0.8.0-incubating-bin-hadoop1-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying download integrity for artifact: spark-0.8.0-incubating-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. 
[PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying build and tests for artifact: spark-0.8.0-incubating-bin-cdh4-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful Verifying build and tests for artifact: spark-0.8.0-incubating-bin-hadoop1-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful Verifying build and tests for artifact: spark-0.8.0-incubating-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful - Patrick On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit d9e80d5): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Thursday, September 19th at 05:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)
I also wrote an audit script [1] to verify various aspects of the release binaries and ran it on this RC. People are welcome to run this themselves, but I haven't tested it on other machines yet, and some of the Spark tests are very sensitive to the test environment :) Output is pasted below: [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py - Verifying download integrity for artifact: spark-0.8.0-incubating-bin-cdh4-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying download integrity for artifact: spark-0.8.0-incubating-bin-hadoop1-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying download integrity for artifact: spark-0.8.0-incubating-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying build and tests for artifact: spark-0.8.0-incubating-bin-cdh4-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful Verifying build and tests for artifact: spark-0.8.0-incubating-bin-hadoop1-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful Verifying build and tests for artifact: spark-0.8.0-incubating-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful - Patrick On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit d9e80d5): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Thursday, September 19th at 05:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
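For anyone curious what those checks amount to, here is a condensed, independent illustration of the digest and tarball-contents checks reported above. It is not an excerpt from release_auditor.py, and the expected digest is assumed to come from the digest files published alongside the artifact.

```python
import hashlib
import tarfile

def check_artifact(tgz_path, expected_md5):
    # Digest check: recompute MD5 and compare against the published value.
    with open(tgz_path, "rb") as f:
        md5_ok = hashlib.md5(f.read()).hexdigest() == expected_md5

    # Contents check: the tarball must carry the required release files.
    with tarfile.open(tgz_path) as tar:
        names = tar.getnames()
        required = ("CHANGES.txt", "NOTICE", "LICENSE")
        files_ok = all(any(n.endswith("/" + req) for n in names)
                       for req in required)

    return md5_ok and files_ok
```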
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)
Hey folks, just FYI we found one minor issue with this RC (the kafka jar in the stream pom needs to be published as provided since it's not available in maven). Please still continue to test this and provide feedback here until the following RC is posted later. - Patrick On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin r...@cs.berkeley.edu wrote: +1 -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell pwend...@gmail.com wrote: I also wrote an audit script [1] to verify various aspects of the release binaries and ran it on this RC. People are welcome to run this themselves, but I haven't tested it on other machines yet, and some of the Spark tests are very sensitive to the test environment :) Output is pasted below: [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py - Verifying download integrity for artifact: spark-0.8.0-incubating-bin-cdh4-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying download integrity for artifact: spark-0.8.0-incubating-bin-hadoop1-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying download integrity for artifact: spark-0.8.0-incubating-rc5.tgz [PASSED] Artifact signature verified. [PASSED] Artifact MD5 verified. [PASSED] Artifact SHA verified. [PASSED] Tarball contains CHANGES.txt file [PASSED] Tarball contains NOTICE file [PASSED] Tarball contains LICENSE file [PASSED] README file contains disclaimer Verifying build and tests for artifact: spark-0.8.0-incubating-bin-cdh4-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful Verifying build and tests for artifact: spark-0.8.0-incubating-bin-hadoop1-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful Verifying build and tests for artifact: spark-0.8.0-incubating-rc5.tgz == Running build [PASSED] sbt build successful [PASSED] Maven build successful == Performing unit tests [PASSED] Tests successful - Patrick On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit d9e80d5): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! 
The vote is open until Thursday, September 19th at 05:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: git commit: Hard code scala version in pom files.
So Mark, does that mean you'd be OK with us hard coding the scala version in the branch 0.8.0 build? It just seems like the overall simplest solution for now. Or would this cause a large problem for you guys? We can solve this on master for 0.9; I didn't touch master at all wrt the maven build. - Patrick On Sun, Sep 15, 2013 at 7:32 PM, Mark Hamstra m...@clearstorydata.com wrote: Yes, it looks like we need to do something to get 0.8.0 shipped and something to fix the problem longer term. I agree that those somethings don't have to be the same thing, and that we can take this up again once the 0.8.0 dust has settled. Give me a day and I'll probably have more to say about how I'd like things to look in the future. On Sun, Sep 15, 2013 at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark, Thanks for providing the detailed explanation. My primary concern was just that this changes the published artifacts in a way that could break downstream consumers of these poms, which may assume that artifactIds are immutable within a pom.xml file. For now, let me revert my change and test that a few important things still work (e.g. IDEs, etc). At a minimum I just want to make sure things we are advising people to do don't break under this release. If this doesn't break those things we can move forward with the parameterized artifacts for 0.8.0. Just a word of caution though: there may be other downstream consumers of the pom files for whom this will cause a problem in the future. If someone presents a compelling reason, we'll have to think about whether we can keep publishing them like this, since this is not technically a valid maven format. - Patrick On Sun, Sep 15, 2013 at 6:46 PM, Mark Hamstra m...@clearstorydata.com wrote: Ah sorry, I've gotten so used to using ClearStory's poms (where we make quite a lot of use of such parameterization) that I lost track of exactly when Spark's maven build was changed to work in a similar way. This all revolves around a basic difference of opinion as to whether the thing that specifies how a project is built should be a fixed, static document, or is itself more of a program, a parameterized function that drives the build and results in an artifact. SBT is of the latter opinion, while Maven (at least with Maven 3) is going the other way. That means that building idiomatic Scala artifacts (which expect things like cross-versioning support and artifactIds that include the Scala binary version that was used to create them) is somewhat at odds with the Maven philosophy. Hard-coding artifactIds, versions, and whatever else Maven now requires to guarantee that a pom file be a fixed, repeatable build description works okay for a single build of an artifact; and a user of just that built artifact won't have to change behavior if the pom is no longer parameterized. However, users who are not just interested in using pre-built artifacts but also in modifying, adding to or reusing the code do have to change their behavior if parameterized Maven builds disappear (yes, you have pointed out the state of affairs with the 0.6 and 0.7 releases; I'll point out that some of those making further use of the code have been using the current, not-yet-released poms for a good while.) 
Without some form of parameterized Maven builds, developers who now rely upon such parameterized builds will have to choose to fork the Apache poms and maintain their own parameterized build, or to repeatedly and manually edit static Apache pom files in order to change artifactIds and dependency versions (which is a frequent need when integrating Spark into a much larger and more complicated technology stack), or to switch over to using SBT in order to get parameterized builds (which, of course, would necessitate a lot of other changes, not all of them welcome.) Archetypes or something similar seems like a way to satisfy Maven's new requirement for static build configurations while at the same time providing a parameterized way to generate that configuration or a modified version of it -- solving the problem by adding a layer of abstraction. On Sun, Sep 15, 2013 at 6:12 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark, Could you describe a user whose behavior is changed by this, and how it is changed? This commit actually brings 0.8 in line with the 0.7 and 0.6 branches, where the scala version is hard coded in the released artifacts: http://repo1.maven.org/maven2/org/spark-project/spark-streaming_2.9.3/0.7.3/spark-streaming_2.9.3-0.7.3.pom That seems to me to minimize the changes in user behavior as much as possible. It would be bad if during the 0.8 release the format of our released artifacts changed in a way that caused things to break for users. One example of something that could break is an IDE or some other tool that consumes these builds
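To make Mark's "repeatedly and manually edit static Apache pom files" option concrete, the downstream chore looks roughly like the sketch below: stamping a concrete Scala binary version into every artifactId before each rebuild. The property name ${scala.binary.version}, the module list, and the paths are illustrative assumptions, not taken from the actual Spark poms.

```python
# Hypothetical downstream chore once parameterized builds go away:
# stamp a concrete Scala binary version into each pom before building.
SCALA_BINARY_VERSION = "2.9.3"  # whatever the larger stack is pinned to

def harden_pom(path):
    with open(path) as f:
        pom = f.read()
    # e.g. spark-core_${scala.binary.version} becomes spark-core_2.9.3
    pom = pom.replace("${scala.binary.version}", SCALA_BINARY_VERSION)
    with open(path, "w") as f:
        f.write(pom)

for module in ("core", "repl", "streaming"):
    harden_pom(module + "/pom.xml")
```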
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC4)
Yes, we've moved on to RC5, thanks. On Sun, Sep 15, 2013 at 10:06 PM, Henry Saputra henry.sapu...@gmail.com wrote: Looks like this VOTE thread has been cancelled. Patrick has sent a VOTE for RC5 in a separate thread. - Henry On Saturday, September 14, 2013, Patrick Wendell wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit 32fc250): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc4/files/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-046/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc4/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Tuesday, September 17th at 10:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)
I'll post another RC in a bit which addresses Mark's comments (though please continue to provide feedback on this one!). Suresh - it's signed with the following key: http://people.apache.org/~pwendell/9E4FE3AF.asc On Fri, Sep 13, 2013 at 11:28 AM, Mark Hamstra m...@clearstorydata.com wrote: [X] -1 Do not release this package because ... Prior, out-of-band discussion: Thanks for the insight Mark, we need to move this discussion to the main VOTE thread in the dev@ list to be official. Mark, could you kindly reply to Patrick's VOTE email thread with the -1 vote to make sure the community knows that there are missing pieces in the release artifacts proposed by Spark's RE (Patrick)? Thanks, Henry On Thu, Sep 12, 2013 at 10:57 PM, Mark Hamstra m...@clearstorydata.com wrote: Yeah, that may get tricky, because the check of the tests in the 'prepare' step and the running of the deploy goal in the 'perform' step (excuse my calling it 'release' previously) will want to change the build dependencies. We may end up needing to do as Patrick has been doing, but then run a separate script to make sure that the yarn and repl-bin modules get properly versioned, tagged, and uploaded. Maybe a maven-release-plugin expert knows how to get it to do just what we want, but I certainly don't see how myself right now. On Thu, Sep 12, 2013 at 10:45 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hmm, one potentially nasty issue here is if spark-core ends up depending on hadoop-client 2.0.x instead of 1.0.4 by default with these settings. We should make sure that doesn't happen. If you'll make another RC, here are a few other small fixes I'd suggest: - In the title tag of docs/_layout/global.html, use site.SPARK_VERSION_SHORT instead of SPARK_VERSION (it's kind of verbose now) - Fix the jets3t version thing mentioned here: https://github.com/mesos/spark/pull/919 (just remove the unneeded version from core/pom.xml) Matei On Sep 12, 2013, at 10:25 PM, Patrick Wendell pwend...@gmail.com wrote: Oh I see - okay, I'll try to make sure they (a) get pushed and (b) have the correct version. Thanks for bringing this up, would have totally missed it otherwise. On Thu, Sep 12, 2013 at 10:20 PM, Mark Hamstra m...@clearstorydata.com wrote: I just mean that with the yarn and repl-bin poms still specifying SNAPSHOT versions, any maven build that tries to use the hadoop2-yarn or repl-bin profile will not work, because those modules will not be able to find a SNAPSHOT parent pom. Including those profiles in the prepare and release steps should fix the problem, but you may need to manually sync up the versions of those two pom files first. On Thu, Sep 12, 2013 at 10:16 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mark, I haven't been including those - I'll use that flag and try to publish again. The last sentence there, 'the maven build is broken' - does that refer to an additional problem, or just to the problem of me not including the flag? - Patrick On Thu, Sep 12, 2013 at 10:11 PM, Mark Hamstra m...@clearstorydata.com wrote: It's a definite 'do not release' from me, because you are still not picking up all of the modules in your prepare and release. Are you including -Phadoop2-yarn,repl-bin on the command line for your mvn prepare and mvn release? Because the yarn module and repl-bin module are not being processed by the maven-release-plugin, so the pom files for those modules still show their version as 0.8.0-incubating-SNAPSHOT instead of 0.8.0-incubating. That means that the maven build is broken. 
On Thu, Sep 12, 2013 at 3:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit ffacd17): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/ The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Saturday, September 14th at 23:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [X] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)
Hey guys, we actually decided on a slightly different naming convention for the downloads. I'm going to amend the files in the next few minutes... in case anyone happens to be looking *this instant* (which I doubt), hold off until I update them. On Thu, Sep 12, 2013 at 3:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit ffacd17): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/ The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Saturday, September 14th at 23:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC3)
Fixed! On Thu, Sep 12, 2013 at 4:22 PM, Patrick Wendell pwend...@gmail.com wrote: Hey guys, we actually decided on a slightly different naming convention for the downloads. I'm going to amend the files in the next few minutes... in case anyone happens to be looking *this instant* (which I doubt), hold off until I update them. On Thu, Sep 12, 2013 at 3:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit ffacd17): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/ The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Saturday, September 14th at 23:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
[VOTE] Release Apache Spark 0.8.0-incubating (RC3)
Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.0. This will be the first incubator release for Spark in Apache. The tag to be voted on is v0.8.0-incubating (commit ffacd17): https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating The release files, including signatures, digests, etc can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/files/ The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-034/org/apache/spark/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc3/docs/ Please vote on releasing this package as Apache Spark 0.8.0-incubating! The vote is open until Saturday, September 14th at 23:00 UTC and passes if a majority of at least 3 +1 IPMC votes are cast. [ ] +1 Release this package as Apache Spark 0.8.0-incubating [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.incubator.apache.org/
Re: Spark 0.8.0-incubating RC2
Hey Chris, The only issue with CHANGES.txt is that we've only recently become more disciplined about tracking issues in JIRA and tracking version numbers when we do make JIRA issues. If we generated a CHANGES.txt based on JIRA, it would be largely incomplete, since many changes from the beginning of the release would be missing. What about if I created a CHANGES.txt based on the Git history? Would that be better than not having one at all? - Patrick On Wed, Sep 11, 2013 at 6:58 AM, Chris Mattmann mattm...@apache.org wrote: Hey Patrick, Looking good. If the license info and so forth has been vetted and looks good (which it sounds like Henry and others have checked out), I took a look at http://people.apache.org/~pwendell/spark-rc/ and the only thing I would recommend adding is some CHANGES.txt file that contains a JIRA change log of what is provided in this RC. But I would definitely proceed to a [VOTE] thread on the RC and let's get this going formally. Great work. Cheers, Chris -Original Message- From: Mattmann, jpluser chris.a.mattm...@jpl.nasa.gov Reply-To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org Date: Friday, September 6, 2013 4:15 PM To: Patrick Wendell pwend...@gmail.com Cc: dev@spark.incubator.apache.org dev@spark.incubator.apache.org, Henry Saputra henry.sapu...@gmail.com Subject: Re: Spark 0.8.0-incubating RC2 Awesome, was going to tell you it might take a sec to sync. Woot. OK, more tonight... ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Patrick Wendell pwend...@gmail.com Date: Friday, September 6, 2013 2:14 PM To: jpluser chris.a.mattm...@jpl.nasa.gov Cc: dev@spark.incubator.apache.org dev@spark.incubator.apache.org, Henry Saputra henry.sapu...@gmail.com Subject: Re: Spark 0.8.0-incubating RC2 Thanks Chris - also it appears that my key has now been added to this file: http://people.apache.org/keys/group/spark.asc - Patrick On Fri, Sep 6, 2013 at 1:57 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote: Feedback coming, sorry, been swamped and only recently back from DC/DARPA, but will reply soon (hopefully tonight). ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Patrick Wendell pwend...@gmail.com Date: Friday, September 6, 2013 1:56 PM To: dev@spark.incubator.apache.org dev@spark.incubator.apache.org Cc: jpluser chris.a.mattm...@jpl.nasa.gov, Henry Saputra henry.sapu...@gmail.com Subject: Re: Spark 0.8.0-incubating RC2 Hey Chris, Henry... do you guys have feedback here? This was based largely on your feedback in the last round :) On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Evan, These are posted primarily for the purpose of having the Apache mentors look at the bundling format; they are not likely to be the exact commit we release. Matei will be merging in some doc stuff before the release, I'm pretty sure that includes your docs. 
- Patrick On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan e...@ooyala.com wrote: Patrick, I'm planning to submit documentation PRs against mesos/spark by tomorrow, is that OK? We really should update the docs. thanks, Evan On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote: No, these are posted primarily for the purpose of having the Apache mentors look at the bundling format; they are not likely to be the exact commit we release (though this RC was fc6fbfe7d7e9171572c898d9e90301117517e60e). On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra m...@clearstorydata.com wrote: Are these RCs not getting tagged in the repository, or am I just not looking in the right place? On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Matei asked me to pick this up because he's travelling this week. I cut a second release candidate from the head of the 0.8 branch (on mesos/spark github) to address the following issues: - RC is now
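Patrick's Git-history idea for CHANGES.txt from the exchange above can be scripted in a few lines. The tag range and line format below are assumptions for illustration, not the actual release tooling:

```python
import subprocess

# Assumed range: everything since the previous release tag. The format
# string is just one reasonable choice for a change-log line.
log = subprocess.run(
    ["git", "log", "--pretty=format:  %h  %s  (%an)", "v0.7.3..HEAD"],
    capture_output=True, text=True, check=True,
).stdout

with open("CHANGES.txt", "w") as f:
    f.write("Spark Change Log\n----------------\n\n" + log + "\n")
```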
Re: Spark 0.8.0-incubating RC2
Thanks Henry. The MLLib files have been fixed since you ran the tool. On Sat, Sep 7, 2013 at 11:25 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Patrick, I ran the Apache RAT tool as shown at http://creadur.apache.org/rat/apache-rat/index.html: java -jar apache-rat-0.10.jar ~/Downloads/spark-0.8.0-src-incubating-RC2 However, we should add the Maven RAT plugin to Spark's pom.xml to support an integrated RAT check as part of CI later. - Henry On Sat, Sep 7, 2013 at 11:24 AM, Patrick Wendell pwend...@gmail.com wrote: Henry, Thanks a lot for your feedback. Could you let me know how you ran the Apache RAT tool so I can reproduce this? My sense is that the best next step is to do an RC that is built against the Apache Git and also includes both `src` and `bin`, in addition to cleaned up license files. Some inline responses below. 1. I only see source artifacts in Patrick's p.a.o URL. I assume the pre-built ones will also be published with hashes and signed? Yes, we'll do both src and binary releases. I'll hash and sign both. 2. For every ASF release, we need a designated release engineer (RE) who will drive the release process, including determining bugs to be included, making sure all files have the right ASF header (running the maven RAT plugin check), creating the release branch, updating the version for next development, and creating and correctly signing the release artifacts. I assume this would be Matei or Patrick? Yes, this might be me for this release because I've got the keys correctly set up. I'll chat with Matei when he's back. 3. The proposed source artifact 0.8.0-RC2's signature looks good and the hash looks good. However, it was generated against the github mesos:spark repo. Reminder that when we send the proposal for release to general@incubator.a.o we need to generate RC builds using the ASF git repo with the right tagged branch. Next RC we will take care of this. 4. I ran the RAT check for the source artifact and found a lot of source files do not have the ASF license header. For example, some in the repl directory have this: /* NSC -- new Scala compiler * Copyright 2005-2011 LAMP/EPFL * @author Paul Phillips */ Not sure if we need to add the ASF header to it since we technically put it under an apache package. Scala source files under mllib are missing ASF headers. See comment above. 5. Add the public key of the RE to http://people.apache.org/keys/group/spark.asc (@Chris do we still need to create a KEYS file in the Spark git repo?) This is now finished for me :)
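Henry's one-off RAT invocation is easy to wrap in a script until the Maven plugin integration he mentions lands; the sketch below shells out to the same jar and stores the report for inspection. The jar name and source directory are the ones from his email and would vary by setup.

```python
import subprocess

# Same invocation Henry ran, captured so a CI job can archive the report.
report = subprocess.run(
    ["java", "-jar", "apache-rat-0.10.jar",
     "spark-0.8.0-src-incubating-RC2"],
    capture_output=True, text=True, check=True,
).stdout

with open("rat-report.txt", "w") as f:
    f.write(report)  # review for files flagged as missing ASF headers
```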
Re: Spark 0.8.0-incubating RC2
Henry, Thanks a lot for your feedback. Could you let me know how you ran the Apache RAT tool so I can reproduce this? My sense is that the best next step is to do an RC that is built against the Apache Git and also includes both `src` and `bin`, in addition to cleaned up license files. Some inline responses below. 1. I only see source artifacts in Patrick's p.a.o URL. I assume the pre-built ones will also be published with hashes and signed? Yes, we'll do both src and binary releases. I'll hash and sign both. 2. For every ASF release, we need a designated release engineer (RE) who will drive the release process, including determining bugs to be included, making sure all files have the right ASF header (running the maven RAT plugin check), creating the release branch, updating the version for next development, and creating and correctly signing the release artifacts. I assume this would be Matei or Patrick? Yes, this might be me for this release because I've got the keys correctly set up. I'll chat with Matei when he's back. 3. The proposed source artifact 0.8.0-RC2's signature looks good and the hash looks good. However, it was generated against the github mesos:spark repo. Reminder that when we send the proposal for release to general@incubator.a.o we need to generate RC builds using the ASF git repo with the right tagged branch. Next RC we will take care of this. 4. I ran the RAT check for the source artifact and found a lot of source files do not have the ASF license header. For example, some in the repl directory have this: /* NSC -- new Scala compiler * Copyright 2005-2011 LAMP/EPFL * @author Paul Phillips */ Not sure if we need to add the ASF header to it since we technically put it under an apache package. Scala source files under mllib are missing ASF headers. See comment above. 5. Add the public key of the RE to http://people.apache.org/keys/group/spark.asc (@Chris do we still need to create a KEYS file in the Spark git repo?) This is now finished for me :)
Re: Spark 0.8.0-incubating RC2
Hey Chris, Henry... do you guys have feedback here? This was based largely on your feedback in the last round :) On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Evan, These are posted primarily for the purpose of having the Apache mentors look at the bundling format; they are not likely to be the exact commit we release. Matei will be merging in some doc stuff before the release, I'm pretty sure that includes your docs. - Patrick On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan e...@ooyala.com wrote: Patrick, I'm planning to submit documentation PRs against mesos/spark by tomorrow, is that OK? We really should update the docs. thanks, Evan On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote: No, these are posted primarily for the purpose of having the Apache mentors look at the bundling format; they are not likely to be the exact commit we release (though this RC was fc6fbfe7d7e9171572c898d9e90301117517e60e). On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra m...@clearstorydata.com wrote: Are these RCs not getting tagged in the repository, or am I just not looking in the right place? On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Matei asked me to pick this up because he's travelling this week. I cut a second release candidate from the head of the 0.8 branch (on mesos/spark github) to address the following issues: - RC is now hosted in an apache web space - RC now includes signature - RC now includes MD5 and SHA512 digests [tgz] http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz [all files] http://people.apache.org/~pwendell/spark-rc/ It would be great to get feedback on the release structure. I also changed the name to include src since we will be releasing both source and binary releases. I was a bit confused about how to attach my GPG key to the spark.asc file. I took the following steps: 1. Created a GPG key locally 2. Distributed the key to public key servers (gpg --send-key) 3. Added the exported key to my apache web space: http://people.apache.org/~pwendell/9E4FE3AF.asc 4. Added the key fingerprint at id.apache.org 5. Created an apache FOAF file with the key signature However, this doesn't seem sufficient to get my key on this page (at least, not yet): http://people.apache.org/keys/group/spark.asc Chris - are there other steps I missed? Is there a manual way to augment this file? - Patrick -- -- Evan Chan Staff Engineer e...@ooyala.com | http://www.ooyala.com/ http://www.facebook.com/ooyala http://www.linkedin.com/company/ooyala http://www.twitter.com/ooyala
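Steps 2 and 3 of Patrick's list correspond to standard gpg invocations; here is a sketch of just those two, using the key id from his email (the explicit --keyserver mirrors the flag Andy later noted he needed when verifying). Substitute your own key id and preferred keyserver.

```python
import subprocess

KEY_ID = "9E4FE3AF"  # from Patrick's email; use your own key id

# Step 2: distribute the public key to a public keyserver.
subprocess.run(["gpg", "--keyserver", "pgp.mit.edu", "--send-key", KEY_ID],
               check=True)

# Step 3: export an armored copy to upload to the apache web space.
with open(KEY_ID + ".asc", "w") as out:
    subprocess.run(["gpg", "--armor", "--export", KEY_ID],
                   stdout=out, check=True)
```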