Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)
shane, is there anything we should do for pull requests that failed, but for unrelated issues?

best,
matt

On 09/11/2014 11:29 AM, shane knapp wrote:

...and the restart is done.

On Thu, Sep 11, 2014 at 7:38 AM, shane knapp <skn...@berkeley.edu> wrote:

jenkins is now in quiet mode, and a restart is happening soon.

On Wed, Sep 10, 2014 at 3:44 PM, shane knapp <skn...@berkeley.edu> wrote:

that's kinda what we're hoping as well. :)

On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

I'm looking forward to this. :)

Looks like Jenkins is having trouble triggering builds for new commits or after user requests (e.g. https://github.com/apache/spark/pull/2339#issuecomment-55165937). Hopefully that will be resolved tomorrow.

Nick

On Tue, Sep 9, 2014 at 5:00 PM, shane knapp <skn...@berkeley.edu> wrote:

since the power incident last thursday, the github pull request builder plugin is still not really working 100%. i found an open issue w/ jenkins [1] that could definitely be affecting us, so i will be pausing builds early thursday morning and then restarting jenkins. i'll send out a reminder tomorrow, and if this causes any problems for you, please let me know and we can work out a better time.

but, now for some good news! yesterday morning, we racked and stacked the systems for the new jenkins instance in the berkeley datacenter. tomorrow i should be able to log in to them and start getting them set up and configured. this is a major step in getting us into a much more 'production'-style environment!

anyways: thanks for your patience, and i think we've all learned that hard powering down your build system is a definite recipe for disaster. :)

shane

[1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)
it was part of the review queue, but it looks like the runs have been gc'd. oh well!

best,
matt

On 09/11/2014 12:18 PM, shane knapp wrote:

you can just click on 'rebuild', if you'd like. what project specifically? (i had forgotten that i'd killed https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/557/, which i just started a rebuild on)

On Thu, Sep 11, 2014 at 9:15 AM, Matthew Farrellee <m...@redhat.com> wrote:

shane, is there anything we should do for pull requests that failed, but for unrelated issues?

best,
matt

On 09/11/2014 11:29 AM, shane knapp wrote:

...and the restart is done.

On Thu, Sep 11, 2014 at 7:38 AM, shane knapp <skn...@berkeley.edu> wrote:

jenkins is now in quiet mode, and a restart is happening soon.

On Wed, Sep 10, 2014 at 3:44 PM, shane knapp <skn...@berkeley.edu> wrote:

that's kinda what we're hoping as well. :)

On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

I'm looking forward to this. :)

Looks like Jenkins is having trouble triggering builds for new commits or after user requests (e.g. https://github.com/apache/spark/pull/2339#issuecomment-55165937). Hopefully that will be resolved tomorrow.

Nick

On Tue, Sep 9, 2014 at 5:00 PM, shane knapp <skn...@berkeley.edu> wrote:

since the power incident last thursday, the github pull request builder plugin is still not really working 100%. i found an open issue w/ jenkins [1] that could definitely be affecting us, so i will be pausing builds early thursday morning and then restarting jenkins. i'll send out a reminder tomorrow, and if this causes any problems for you, please let me know and we can work out a better time.

but, now for some good news! yesterday morning, we racked and stacked the systems for the new jenkins instance in the berkeley datacenter. tomorrow i should be able to log in to them and start getting them set up and configured. this is a major step in getting us into a much more 'production'-style environment!

anyways: thanks for your patience, and i think we've all learned that hard powering down your build system is a definite recipe for disaster. :)

shane

[1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509
Re: [VOTE] Release Apache Spark 1.1.0 (RC4)
+1

built from sha w/ make-distribution.sh
tested basic examples (0 data) w/ local on fedora 20 (openjdk 1.7, python 2.7.5)
tested detection and log processing (25GB data) w/ mesos (0.19.0) nfs on rhel 7 (openjdk 1.7, python 2.7.5)

On 09/03/2014 03:24 AM, Patrick Wendell wrote:

Please vote on releasing the following candidate as Apache Spark version 1.1.0!

The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=2f9b2bd7844ee8393dc9c319f4fefedf95f5e460

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.1.0-rc4/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1031/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.1.0-rc4-docs/

Please vote on releasing this package as Apache Spark 1.1.0!

The vote is open until Saturday, September 06, at 08:30 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.1.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== Regressions fixed since RC3 ==
SPARK-3332 - Issue with tagging in EC2 scripts
SPARK-3358 - Issue with regression for m3.XX instances

== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with previous votes, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.0.X will not block this release.

== What default changes should I be aware of? ==
1. The default value of spark.io.compression.codec is now snappy
-- Old behavior can be restored by switching to lzf
2. PySpark now performs external spilling during aggregations.
-- Old behavior can be restored by setting spark.shuffle.spill to false.
3. PySpark uses a new heuristic for determining the parallelism of shuffle operations.
-- Old behavior can be restored by setting spark.default.parallelism to the number of cores in the cluster.
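for anyone who wants to pin the pre-1.1.0 behavior, a hedged sketch (not part of the release notes): the three default changes above expressed as the spark-submit --conf flags that restore the old defaults. the parallelism value here is a placeholder -- substitute the number of cores in your cluster.

```python
# Configuration keys and values come from the release notes above; the
# parallelism count is a placeholder, not a recommendation.
legacy_conf = {
    "spark.io.compression.codec": "lzf",  # 1.1.0 default is now snappy
    "spark.shuffle.spill": "false",       # disable PySpark external spilling
    "spark.default.parallelism": "16",    # placeholder: cores in the cluster
}

def to_submit_flags(conf):
    """Render a conf dict as spark-submit --conf arguments."""
    return " ".join("--conf %s=%s" % kv for kv in sorted(conf.items()))

flags = to_submit_flags(legacy_conf)
```

the same keys can of course be set in spark-defaults.conf or on a SparkConf instead of the command line.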
Re: spark-ec2 depends on stuff in the Mesos repo
that's not a bad idea. it would also break the circular dep in versions that results in spark X's ec2 script installing spark X-1 by default.

best,
matt

On 09/03/2014 01:17 PM, Shivaram Venkataraman wrote:

The spark-ec2 repository isn't a part of Mesos. Back in the day, Spark used to be hosted in the Mesos github organization as well, and so we put scripts that were used by Spark under the same organization.

FWIW I don't think these scripts belong in the Spark repository. They are helper scripts that set up EC2 clusters with different components like HDFS, Spark, Tachyon, etc. Also, one of the motivations for creating this repository was the ability to change these scripts without requiring a new Spark release or a new AMI, etc. We can move the repository to a different github organization like AMPLab if that makes sense.

Thanks
Shivaram

On Wed, Sep 3, 2014 at 10:06 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

Spawned by this discussion: https://github.com/apache/spark/pull/1120#issuecomment-54305831.

See these 2 lines in spark_ec2.py:
- spark_ec2 L42: https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L42
- spark_ec2 L566: https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L566

Why does the spark-ec2 script depend on stuff in the Mesos repo? Should they be moved to the Spark repo?

Nick
Re: spark-ec2 depends on stuff in the Mesos repo
oh, i see pwendell did a patch to the release branch to make the release version the --spark-version default

best,
matt

On 09/03/2014 01:30 PM, Shivaram Venkataraman wrote:

Actually the circular dependency doesn't depend on the spark-ec2 scripts -- the scripts contain download links to many Spark versions and you can configure which one should be used.

Shivaram

On Wed, Sep 3, 2014 at 10:22 AM, Matthew Farrellee <m...@redhat.com> wrote:

that's not a bad idea. it would also break the circular dep in versions that results in spark X's ec2 script installing spark X-1 by default.

best,
matt

On 09/03/2014 01:17 PM, Shivaram Venkataraman wrote:

The spark-ec2 repository isn't a part of Mesos. Back in the day, Spark used to be hosted in the Mesos github organization as well, and so we put scripts that were used by Spark under the same organization.

FWIW I don't think these scripts belong in the Spark repository. They are helper scripts that set up EC2 clusters with different components like HDFS, Spark, Tachyon, etc. Also, one of the motivations for creating this repository was the ability to change these scripts without requiring a new Spark release or a new AMI, etc. We can move the repository to a different github organization like AMPLab if that makes sense.

Thanks
Shivaram

On Wed, Sep 3, 2014 at 10:06 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

Spawned by this discussion: https://github.com/apache/spark/pull/1120#issuecomment-54305831.

See these 2 lines in spark_ec2.py:
- spark_ec2 L42: https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L42
- spark_ec2 L566: https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L566

Why does the spark-ec2 script depend on stuff in the Mesos repo? Should they be moved to the Spark repo?

Nick
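for context, the version-pinning fix discussed here can be sketched roughly like this -- a hypothetical, trimmed-down version of spark_ec2.py's option handling (which used optparse at the time), where the --spark-version default is set per release branch rather than hardcoded to an older release. the version string and usage line are illustrative, not copied from the script:

```python
from optparse import OptionParser

# Illustrative: pin the default to the release this script ships with,
# so spark X's ec2 script launches spark X by default.
DEFAULT_SPARK_VERSION = "1.1.0"  # assumed value, updated per release branch

def parse_args(argv):
    """Parse spark-ec2 style arguments; positional args are the action
    and cluster name."""
    parser = OptionParser(usage="spark-ec2 [options] <action> <cluster_name>")
    parser.add_option("-v", "--spark-version", default=DEFAULT_SPARK_VERSION,
                      help="Version of Spark to use: 'X.Y.Z' or a git hash")
    return parser.parse_args(argv)

opts, args = parse_args(["launch", "my-cluster"])
```

with this shape, cutting a release branch only requires bumping the one constant, which breaks the circular dep matt describes.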
Re: Ask something about spark
reynold, would you folks be willing to put some creative commons license information on the site and its content?

best,
matt

On 09/02/2014 06:32 PM, Reynold Xin wrote:

I think in general that is fine. It would be great if your slides come with proper attribution.

On Tue, Sep 2, 2014 at 3:31 PM, Sanghoon Lee <phoenixl...@gmail.com> wrote:

Hi, I am phoenixlee, a Spark programmer in Korea. I have a good chance this time to teach Spark to college students and office workers. This course will be done with the support of the government. Can I use the data (pictures, samples, etc.) on the Spark homepage for this course? Of course, I will include thanks and the webpage URL. It would be a good opportunity, since there are still no Spark teaching materials or education (or community) in Korea.

Thanks.
Re: Ask something about spark
CC or Apache, it'd be helpful to have it listed in the footer of pages

best,
matt

On 09/03/2014 02:23 PM, Reynold Xin wrote:

I am not sure if I can just go ahead and update the website with a creative commons license. IIRC, ASF websites are also Apache 2.0 licensed. Might need somebody from legal to chime in.

On Wed, Sep 3, 2014 at 11:15 AM, Matthew Farrellee <m...@redhat.com> wrote:

reynold, would you folks be willing to put some creative commons license information on the site and its content?

best,
matt

On 09/02/2014 06:32 PM, Reynold Xin wrote:

I think in general that is fine. It would be great if your slides come with proper attribution.

On Tue, Sep 2, 2014 at 3:31 PM, Sanghoon Lee <phoenixl...@gmail.com> wrote:

Hi, I am phoenixlee, a Spark programmer in Korea. I have a good chance this time to teach Spark to college students and office workers. This course will be done with the support of the government. Can I use the data (pictures, samples, etc.) on the Spark homepage for this course? Of course, I will include thanks and the webpage URL. It would be a good opportunity, since there are still no Spark teaching materials or education (or community) in Korea.

Thanks.
Re: Handling stale PRs
On 08/26/2014 04:57 AM, Sean Owen wrote:

On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell <pwend...@gmail.com> wrote:

Most other ASF projects I know just ignore these patches. I'd prefer if we

Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark is blessed/cursed with incredible inbound load, but would love to still see the project get this right-er than, say, Hadoop.

totally agree, this applies to patches as well as jiras. i'll add that projects that let things simply linger are missing an opportunity to engage their community. spark should capitalize on its momentum to build a smoothly running community (vs not, and accept an unbounded backlog as inevitable).

The more important thing, maybe, is how we want to deal with this culturally. And I think we need to do a better job of making sure no pull requests go unattended (i.e. waiting for committer feedback). If patches go stale, it should be because the user hasn't responded, not us.

Stale JIRAs are a symptom, not a problem per se. I also want to see the backlog cleared, but automatically closing doesn't help if the problem is too many JIRAs and not enough committer-hours to look at them. Some noise gets closed, but some easy or important fixes may disappear as well.

engagement in the community really needs to go both ways. it's reasonable for PRs that stop merging or have open comments that need resolution by the PRer to be loudly timed out. a similar thing goes for jiras: if there's a request for more information to resolve a bug and that information does not appear, half of the communication is gone and a loud time out is reasonable.

easy and important are in the eyes of the beholder. timeouts can go both ways. a jira or pr that has been around for a period of time (say 1/3 the to-close timeout) should bump up for evaluation, hopefully resulting in few easy or important issues falling through the cracks.

fyi, i'm periodically going through the pyspark jiras, trying to reproduce issues, coalesce duplicates and ask for details. i've not been given any sort of permission to do this, and i don't have any special position in the community to do this - in a well-functioning community everyone should feel free to jump in and help.

Another thing is that we should, IMO, err on the side of explicitly saying no or not yet to patches, rather than letting them linger without attention. We do get patches where the user is well intentioned, but it is

Completely agree. The solution is partly more supply of committer time on JIRAs. But that is detracting from the work the committers themselves want to do. More of the solution is reducing demand by helping people create useful, actionable, non-duplicate JIRAs from the start. Or encouraging people to resolve existing JIRAs and shepherding those in.

saying no/not-yet is a vitally important piece of information.

Elsewhere, I've found people reluctant to close JIRAs for fear of offending or turning off contributors. I think the opposite is true. There is nothing wrong with no or not now, especially accompanied with constructive feedback. Better to state for the record what is not being looked at and why, than let people work on and open the same JIRAs repeatedly.

well stated!

I have also found in the past that a culture of tolerating eternal JIRAs led people to file JIRAs in order to forget about a problem -- it's in JIRA, and it's in progress, so it feels like someone else is going to fix it later and so it can be forgotten now.

there's some value in these now-i-can-forget jiras, though i'm not personally a fan. it can be good to keep them around and reachable by search, but they should be clearly marked as no/not-yet or something similar.

For what it's worth, I think these project and culture mechanics are so important and it's my #1 concern for Spark at this stage. This challenge exists so much more here, exactly because there is so much potential. I'd love to help by trying to identify and close stale JIRAs but am afraid that tagging them is just adding to the heap of work.

+1 concern and potential!

best,
matt
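the timeout policy floated in this thread (bump for evaluation at 1/3 of the to-close timeout, loudly close at the full timeout) could look something like this -- a minimal illustration only, where the 90-day window is an assumed number, not anything the project agreed on:

```python
from datetime import datetime, timedelta

CLOSE_TIMEOUT = timedelta(days=90)  # assumed; not an agreed-upon project value

def triage(last_activity, now):
    """Classify a jira/pr by how long it has sat without activity."""
    idle = now - last_activity
    if idle >= CLOSE_TIMEOUT:
        return "close"    # loudly time out, with constructive feedback
    if idle >= CLOSE_TIMEOUT / 3:
        return "review"   # bump up for committer evaluation
    return "active"
```

the point of the sketch is that both directions are covered: submitters who go silent get a loud close, and items the committers haven't looked at get surfaced well before the close deadline.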