Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-06-02 Thread Marcelo Vanzin
Hi Patrick, Thanks for all the explanations, that makes sense. @DeveloperApi worries me a little bit especially because of the things Colin mentions - it's sort of hard to make people move off of APIs, or support different versions of the same API. But maybe if expectations (or lack thereof) are

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-06-02 Thread Sean Owen
On Mon, Jun 2, 2014 at 6:05 PM, Marcelo Vanzin van...@cloudera.com wrote: You mentioned something in your shading argument that kinda reminded me of something. Spark currently depends on slf4j implementations and log4j with compile scope. I'd argue that's the wrong approach if we're talking

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-31 Thread Patrick Wendell
One other consideration popped into my head: 5. Shading our dependencies could mess up our external API's if we ever return types that are outside of the spark package because we'd then be returned shaded types that users have to deal with. E.g. where before we returned an

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-31 Thread Colin McCabe
On Sat, May 31, 2014 at 10:45 AM, Patrick Wendell pwend...@gmail.com wrote: One other consideration popped into my head: 5. Shading our dependencies could mess up our external API's if we ever return types that are outside of the spark package because we'd then be returned shaded types that

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Colin McCabe
First of all, I think it's great that you're thinking about this. API stability is super important and it would be good to see Spark get on top of this. I want to clarify a bit about Hadoop. The problem that Hadoop faces is that the Java package system isn't very flexible. If you have a method

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Marcelo Vanzin
On Fri, May 30, 2014 at 12:05 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: I don't know if Scala provides any mechanisms to do this beyond what Java provides. In fact it does. You can say something like private[foo] and the annotated element will be visible for all classes under foo (where
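For readers unfamiliar with the qualified access modifier Marcelo refers to, here is a minimal sketch. The package, class, and method names (foo, bar, Registry, Helper, internalCount) are hypothetical and chosen only for illustration; nothing here is taken from the Spark code base.

```scala
// Hypothetical names throughout; this only illustrates the access rule.
package foo {
  class Registry {
    // Visible to code anywhere inside package foo (including nested
    // packages), but not to code outside of it.
    private[foo] def internalCount: Int = 42

    // Ordinary public member that any caller can use.
    def publicCount: Int = internalCount
  }

  package bar {
    object Helper {
      // foo.bar is nested under foo, so access to internalCount is allowed.
      def peek(r: Registry): Int = r.internalCount
    }
  }
}

object Demo extends App {
  val r = new foo.Registry
  println(foo.bar.Helper.peek(r))
  // r.internalCount  // would not compile: Demo is outside package foo
}
```

Plain Java has no direct equivalent, since package-private access there does not extend to subpackages. That difference is presumably why the thread treats private[spark] as a distinct API level.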

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
Hey guys, thanks for the insights. Also, I realize Hadoop has gotten way better about this with 2.2+ and I think it's great progress. We have well defined API levels in Spark and also automated checking of API violations for new pull requests. When doing code reviews we always enforce the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Marcelo Vanzin
Hi Patrick, On Fri, May 30, 2014 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: 2. private[spark] 3. @Experimental or @DeveloperApi I understand @Experimental, but when would you use @DeveloperApi instead of private[spark]? Seems to me that, for the API user, they both mean very

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Colin McCabe
On Fri, May 30, 2014 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: Hey guys, thanks for the insights. Also, I realize Hadoop has gotten way better about this with 2.2+ and I think it's great progress. We have well defined API levels in Spark and also automated checking of API

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
Spark is a bit different than Hadoop MapReduce, so maybe that's a source of some confusion. Spark is often used as a substrate for building different types of analytics applications, so @DeveloperApi are internal API's that we'd like to expose to application writers, but that might be more

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-29 Thread Patrick Wendell
[tl;dr stable API's are important - sorry, this is slightly meandering] Hey - just wanted to chime in on this as I was travelling. Sean, you bring up great points here about the velocity and stability of Spark. Many projects have fairly customized semantics around what versions actually mean

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
On 18-May-2014 5:05 am, Mark Hamstra m...@clearstorydata.com wrote: I don't understand. We never said that interfaces wouldn't change from 0.9 Agreed. to 1.0. What we are committing to is stability going forward from the 1.0.0 baseline. Nobody is disputing that backward-incompatible

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
So I think I need to clarify a few things here - particularly since this mail went to the wrong mailing list and a much wider audience than I intended it for :-) Most of the issues I mentioned are internal implementation detail of spark core : which means, we can enhance them in future without

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Sean Owen
On this note, non-binding commentary: Releases happen in local minima of change, usually created by internally enforced code freeze. Spark is incredibly busy now due to external factors -- recently a TLP, recently discovered by a large new audience, ease of contribution enabled by Github. It's

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing functionality, go through a hardening release before 1.0 But the community preferred a 1.0 :-) Regards, Mridul On 17-May-2014

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mark Hamstra
Which of the unresolved bugs in spark-core do you think will require an API-breaking change to fix? If there are none of those, then we are still essentially on track for a 1.0.0 release. The number of contributions and pace of change now is quite high, but I don't think that waiting for the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Andrew Ash
+1 on the next release feeling more like a 0.10 than a 1.0 On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Sean Owen
On Sat, May 17, 2014 at 4:52 PM, Mark Hamstra m...@clearstorydata.com wrote: Which of the unresolved bugs in spark-core do you think will require an API-breaking change to fix? If there are none of those, then we are still essentially on track for a 1.0.0 release. I don't have a particular

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
We made incompatible api changes whose impact we don't know yet completely : both from implementation and usage point of view. We had the option of getting real-world feedback from the user community if we had gone to 0.10 but the spark developers seemed to be in a hurry to get to 1.0 - so I made

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mark Hamstra
That is a past issue that we don't need to be re-opening now. The present issue, and what I am asking, is which pending bug fixes does anyone anticipate will require breaking the public API guaranteed in rc9? On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan mri...@gmail.com wrote: We made

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Kan Zhang
+1 on the running commentary here, non-binding of course :-) On Sat, May 17, 2014 at 8:44 AM, Andrew Ash and...@andrewash.com wrote: +1 on the next release feeling more like a 0.10 than a 1.0 On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
On 17-May-2014 11:40 pm, Mark Hamstra m...@clearstorydata.com wrote: That is a past issue that we don't need to be re-opening now. The present Huh ? If we need to revisit based on changed circumstances, we must - the scope of changes introduced in this release was definitely not anticipated

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mark Hamstra
I'm not trying to muzzle the discussion. All I am saying is that we don't need to have the same discussion about 0.10 vs. 1.0 that we already had. If you can tell me about specific changes in the current release candidate that occasion new arguments for why a 1.0 release is an unacceptable idea,

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Matei Zaharia
As others have said, the 1.0 milestone is about API stability, not about saying “we’ve eliminated all bugs”. The sooner you declare 1.0, the sooner users can confidently build on Spark, knowing that the application they build today will still run on Spark 1.9.9 three years from now. This is

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
On 18-May-2014 1:45 am, Mark Hamstra m...@clearstorydata.com wrote: I'm not trying to muzzle the discussion. All I am saying is that we don't need to have the same discussion about 0.10 vs. 1.0 that we already had. Agreed, no point in repeating the same discussion ... I am also trying to

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
I would make the case for interface stability not just api stability. Particularly given that we have significantly changed some of our interfaces, I want to ensure developers/users are not seeing red flags. Bugs and code stability can be addressed in minor releases if found, but behavioral

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Michael Malak
While developers may appreciate 1.0 == API stability, I'm not sure that will be the understanding of the VP who gives the green light to a Spark-based development effort. I fear a bug that silently produces erroneous results will be perceived like the FDIV bug, but in this case without the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Matei Zaharia
Yup, this is a good point, the interface includes stuff like launch scripts and environment variables. However I do think that the current features of spark-submit can all be supported in future releases. We’ll definitely have a very strict standard for modifying these later on. Matei On May

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Henry Saputra
-- Original -- From: Madhu;ma...@madhu.com; Date: Wed, May 14, 2014 09:15 AM To: devd...@spark.incubator.apache.org; Subject: Re: [VOTE] Release Apache Spark 1.0.0 (rc5) I just built rc5 on Windows 7 and tried to reproduce the problem described

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Mark Hamstra
+1, but just barely. We've got quite a number of outstanding bugs identified, and many of them have fixes in progress. I'd hate to see those efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 -- in other words, I'd like to see 1.0.1 retain a high priority relative to 1.1.0.

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-15 Thread Sandy Ryza
wrote: You need to set: spark.akka.frameSize 5 spark.default.parallelism 1 -- Original -- From: Madhu;ma...@madhu.com; Date: Wed, May 14, 2014 09:15 AM To: devd...@spark.incubator.apache.org; Subject: Re: [VOTE] Release Apache Spark

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-15 Thread Patrick Wendell
I'm cancelling this vote in favor of rc6. On Tue, May 13, 2014 at 8:01 AM, Sean Owen so...@cloudera.com wrote: On Tue, May 13, 2014 at 2:49 PM, Sean Owen so...@cloudera.com wrote: On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote: The release files, including

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-15 Thread Matei Zaharia
SHA-1 is being end-of-lived so I’d actually say switch to 512 for all of them instead. On May 13, 2014, at 6:49 AM, Sean Owen so...@cloudera.com wrote: On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote: The release files, including signatures, digests, etc. can be
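To make the digest suggestion concrete, here is a minimal sketch of computing SHA-1 and SHA-512 digests with the JDK's standard MessageDigest class. The object name, helper name, and sample payload are hypothetical; this is not the release tooling used for the RC, only an illustration of what switching algorithms involves.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper; not part of any Spark release script.
object DigestSketch {
  // Returns the lowercase hex digest of `data` under the named algorithm.
  def hexDigest(algorithm: String, data: Array[Byte]): String =
    MessageDigest
      .getInstance(algorithm)
      .digest(data)
      .map(b => "%02x".format(b & 0xff)) // mask so bytes format as unsigned
      .mkString

  def main(args: Array[String]): Unit = {
    // Placeholder payload; a real check would read the release artifact bytes.
    val payload = "example artifact contents".getBytes(StandardCharsets.UTF_8)
    println("SHA-1:   " + hexDigest("SHA-1", payload))
    println("SHA-512: " + hexDigest("SHA-512", payload))
  }
}
```

Only the algorithm name changes between the two calls. The SHA-512 digest is 128 hex characters long, versus 40 for SHA-1.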

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-14 Thread witgo
You need to set: spark.akka.frameSize 5 spark.default.parallelism 1 -- Original -- From: Madhu;ma...@madhu.com; Date: Wed, May 14, 2014 09:15 AM To: devd...@spark.incubator.apache.org; Subject: Re: [VOTE] Release Apache Spark 1.0.0 (rc5) I

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-14 Thread Sean Owen
On Tue, May 13, 2014 at 2:49 PM, Sean Owen so...@cloudera.com wrote: On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote: The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-1.0.0-rc5/ Good news is that the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-14 Thread Madhu
I built rc5 using sbt/sbt assembly on Linux without any problems. There used to be an sbt.cmd for Windows build, has that been deprecated? If so, I can document the Windows build steps that worked for me. -- View this message in context:

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread Mark Hamstra
There were a few early/test RCs this cycle that were never put to a vote. On Tue, May 13, 2014 at 8:07 AM, Nan Zhu zhunanmcg...@gmail.com wrote: just curious, where is rc4 VOTE? I searched my gmail but didn't find that? On Tue, May 13, 2014 at 9:49 AM, Sean Owen so...@cloudera.com

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread witgo
@spark.apache.org; Subject: Re: [VOTE] Release Apache Spark 1.0.0 (rc5) Hey all - there were some earlier RC's that were not presented to the dev list because issues were found with them. Also, there seems to be some issues with the reliability of the dev list e-mail. Just a heads up. I'll lead

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread Madhu
I just built rc5 on Windows 7 and tried to reproduce the problem described in https://issues.apache.org/jira/browse/SPARK-1712 It works on my machine: 14/05/13 21:06:47 INFO DAGScheduler: Stage 1 (sum at <console>:17) finished in 4.548 s 14/05/13 21:06:47 INFO TaskSchedulerImpl: Removed TaskSet