Re: Spark 0.9.1 release

2014-03-27 Thread Tathagata Das
I have cut another release candidate, RC3, with two important bug fixes. See the following JIRAs for more details. 1. Bug with intercepts in MLlib's GLM: https://spark-project.atlassian.net/browse/SPARK-1327 2. Bug in PySpark's RDD.top() ordering:

Re: Spark 0.9.1 release

2014-03-26 Thread Patrick Wendell
Hey TD, This one we just merged into master this morning: https://spark-project.atlassian.net/browse/SPARK-1322 It should definitely go into the 0.9 branch because there was a bug in the semantics of top() which at this point is unreleased in Python. I didn't backport it yet because I figured
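The top() ordering bug referenced in SPARK-1322 concerns the intended semantics of RDD.top(n): the n largest elements, returned in descending order. A plain-Python sketch of those semantics (an illustration only, not PySpark's actual implementation):

```python
import heapq

def rdd_top(elements, n):
    """Sketch of the semantics RDD.top(n) is meant to have after the
    SPARK-1322 fix: the n largest elements, in descending order.
    (Illustration only -- not PySpark's actual implementation.)"""
    return heapq.nlargest(n, elements)

print(rdd_top([3, 1, 9, 7, 5], 3))  # -> [9, 7, 5]
```

Since this was still unreleased in Python at the time, fixing the ordering in branch-0.9 avoided ever shipping the wrong behavior.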

Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
@evan From the discussion in the JIRA, it seems that we still don't have a clear solution for SPARK-1138. Nor do we have a sense of whether the solution is going to be small enough for a maintenance release. So I don't think we should block the release of Spark 0.9.1 for this. We can make another Spark

Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
PR 159 seems like a fairly big patch to me. And quite recent, so its impact on the scheduling is not clear. It may also depend on other changes that may have gotten into the DAGScheduler but not pulled into branch 0.9. I am not sure it is a good idea to pull that in. We can pull those changes

Re: Spark 0.9.1 release

2014-03-25 Thread Kay Ousterhout
I don't think the blacklisting is a priority, and the CPUS_PER_TASK issue was still broken after this patch (so broken that I'm convinced no one actually uses this feature!!), so I agree with TD's sentiment that this shouldn't go into 0.9.1. On Tue, Mar 25, 2014 at 10:23 PM, Tathagata Das

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 10:53 AM, Tathagata Das tathagata.das1...@gmail.com wrote: PR 159 seems like a fairly big patch to me. And quite recent, so its impact on the scheduling is not clear. It may also depend on other changes that may have gotten into the DAGScheduler but not pulled into

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
I also have a really minor fix for SPARK-1057 (upgrading fastutil), could that also make it in? -Evan On Sun, Mar 23, 2014 at 11:01 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Sorry this request is coming in a bit late, but would it be possible to backport SPARK-979[1] to

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
@Tathagata, the PR is here: https://github.com/apache/spark/pull/215 On Mon, Mar 24, 2014 at 12:02 AM, Tathagata Das tathagata.das1...@gmail.com wrote: @Shivaram, That is a useful patch but I am a bit afraid to merge it in. Randomizing the executor has performance implications, especially for Spark

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Hey Evan and TD, Modifying Spark's dependency graph in a maintenance release seems potentially harmful, especially upgrading a minor version (not just a patch version) like this. This could affect other downstream users. For instance, users could now, without knowing it, have their fastutil dependency get bumped and they hit

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Spark's dependency graph in a maintenance *Modifying* Spark's dependency graph...

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
Patrick, that is a good point. On Mon, Mar 24, 2014 at 12:14 AM, Patrick Wendell pwend...@gmail.com wrote: Spark's dependency graph in a maintenance *Modifying* Spark's dependency graph...

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
Patrick, yes, that is indeed a risk. On Mon, Mar 24, 2014 at 12:30 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Patrick, that is a good point. On Mon, Mar 24, 2014 at 12:14 AM, Patrick Wendell pwend...@gmail.com wrote: Spark's dependency graph in a maintenance *Modifying* Spark's

Re: Spark 0.9.1 release

2014-03-24 Thread Kevin Markey
1051 is essential! I'm not sure about the others, but anything that adds stability to Spark/Yarn would be helpful. Kevin Markey On 03/20/2014 01:12 PM, Tom Graves wrote: I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YARN - JIRA and [SPARK-1051] On Yarn,

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
1051 has been pulled in! Search for 1051 in https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-0.9 TD On Mon, Mar 24, 2014 at 4:26 PM, Kevin Markey kevin.mar...@oracle.com wrote: 1051 is essential! I'm not sure about the others, but anything that adds stability to

Re: Spark 0.9.1 release

2014-03-24 Thread Kevin Markey
Is there any way that [SPARK-782] (Shade ASM) can be included? I see that it is not currently backported to 0.9. But there is no single issue that has caused us more grief as we integrate spark-core with other project dependencies. There are way too many libraries out there in addition to

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
Hello Kevin, A fix for SPARK-782 would definitely simplify building against Spark. However, it's possible that a fix for this issue in 0.9.1 will break the builds (that reference Spark) of existing 0.9 users, either due to a change in the ASM version, or for being incompatible with their current
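For context, the kind of shading SPARK-782 asks for is typically done with the maven-shade-plugin's relocation feature, which rewrites a dependency's classes into a private package so they cannot clash with another version on the user's classpath. A minimal sketch; the shaded package name here is illustrative, not the actual Spark build change:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite ASM classes into a project-private package so they
           cannot conflict with another ASM version on the classpath.
           The shadedPattern below is illustrative only. -->
      <relocation>
        <pattern>org.objectweb.asm</pattern>
        <shadedPattern>org.sparkproject.asm</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

This is exactly why TD's compatibility concern matters: once the classes are relocated, any user code that referenced Spark's bundled ASM directly would break.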

Re: Spark 0.9.1 release

2014-03-20 Thread Bhaskar Dutta
It will be great if SPARK-1101 (https://spark-project.atlassian.net/browse/SPARK-1101), Umbrella for hardening Spark on YARN, can get into 0.9.1. Thanks, Bhaskar On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Hello everyone, Since the release of Spark 0.9, we

Re: Spark 0.9.1 release

2014-03-20 Thread Patrick Wendell
Hey Tom, I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA in. The pyspark one I would consider more of an enhancement so might not be appropriate for a point release. Someone

Re: Spark 0.9.1 release

2014-03-20 Thread Tom Graves
Thanks for the heads up, saw that and will make sure that is resolved before pulling into 0.9. Unless I'm missing something, they should just use sc.addJar to distribute the jar rather than relying on SPARK_YARN_APP_JAR. Tom On Thursday, March 20, 2014 3:31 PM, Patrick Wendell
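The sc.addJar approach Tom mentions can be sketched in Scala as below. This is not runnable on its own (it assumes a Spark deployment), and the app name and jar path are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Instead of exporting SPARK_YARN_APP_JAR, ship the application jar
// explicitly; executors fetch jars added this way at task launch.
// The path below is illustrative only.
val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
sc.addJar("hdfs:///user/me/my-app.jar")
```

sc.addJar is part of the public SparkContext API, which is why it is the preferred route over the YARN-specific environment variable.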

Spark 0.9.1 release

2014-03-19 Thread Tathagata Das
Hello everyone, Since the release of Spark 0.9, we have received a number of important bug fixes and we would like to make a bug-fix release of Spark 0.9.1. We are going to cut a release candidate soon and we would love it if people test it out. We have backported several bug fixes into the 0.9

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
Would be great if the garbage collection PR is also committed - if not the whole thing, at least the part to unpersist broadcast variables explicitly would be great. Currently we are running with a custom impl which does something similar, and I would like to move to the standard distribution for that.

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
If 1.0 is just round the corner, then it is fair enough to push to that, thanks for clarifying! Regards, Mridul On Wed, Mar 19, 2014 at 6:12 PM, Tathagata Das tathagata.das1...@gmail.com wrote: I agree that the garbage collection PR (https://github.com/apache/spark/pull/126) would make things