[RESULT] [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-17 Thread Patrick Wendell
Cancelled in favor of rc9. On Sat, May 17, 2014 at 12:51 AM, Patrick Wendell pwend...@gmail.com wrote: Due to the issue discovered by Michael, this vote is cancelled in favor of rc9. On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust mich...@databricks.com wrote: -1 We found a regression

Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-17 Thread Patrick Wendell
Due to the issue discovered by Michael, this vote is cancelled in favor of rc9. On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust mich...@databricks.com wrote: -1 We found a regression in the way configuration is passed to executors. https://issues.apache.org/jira/browse/SPARK-1864

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-17 Thread Patrick Wendell
I'll start the voting with a +1. On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has one bug fix and one minor feature on top of rc8: SPARK-1864:

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Sean Owen
On this note, non-binding commentary: Releases happen in local minima of change, usually created by internally enforced code freeze. Spark is incredibly busy now due to external factors -- recently a TLP, recently discovered by a large new audience, ease of contribution enabled by GitHub. It's

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API changes, add missing functionality, go through a hardening release before 1.0 But the community preferred a 1.0 :-) Regards, Mridul On 17-May-2014

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-05-17 Thread Mridul Muralidharan
I suspect this is an issue we have fixed internally here as part of a larger change - the issue we fixed was not a config issue but bugs in spark. Unfortunately we plan to contribute this as part of 1.1 Regards, Mridul On 17-May-2014 4:09 pm, sam (JIRA) j...@apache.org wrote: sam created

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mark Hamstra
Which of the unresolved bugs in spark-core do you think will require an API-breaking change to fix? If there are none of those, then we are still essentially on track for a 1.0.0 release. The number of contributions and pace of change now is quite high, but I don't think that waiting for the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Andrew Ash
+1 on the next release feeling more like a 0.10 than a 1.0 On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API changes, add missing

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Sean Owen
On Sat, May 17, 2014 at 4:52 PM, Mark Hamstra m...@clearstorydata.com wrote: Which of the unresolved bugs in spark-core do you think will require an API-breaking change to fix? If there are none of those, then we are still essentially on track for a 1.0.0 release. I don't have a particular

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
We made incompatible API changes whose impact we don't know yet completely: both from an implementation and a usage point of view. We had the option of getting real-world feedback from the user community if we had gone to 0.10, but the spark developers seemed to be in a hurry to get to 1.0 - so I made

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-17 Thread Andrew Or
+1 2014-05-17 8:53 GMT-07:00 Mark Hamstra m...@clearstorydata.com: +1 On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell pwend...@gmail.com wrote: I'll start the voting with a +1. On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mark Hamstra
That is a past issue that we don't need to be re-opening now. The present issue, and what I am asking, is which pending bug fixes does anyone anticipate will require breaking the public API guaranteed in rc9? On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan mri...@gmail.comwrote: We made

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Kan Zhang
+1 on the running commentary here, non-binding of course :-) On Sat, May 17, 2014 at 8:44 AM, Andrew Ash and...@andrewash.com wrote: +1 on the next release feeling more like a 0.10 than a 1.0 On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
On 17-May-2014 11:40 pm, Mark Hamstra m...@clearstorydata.com wrote: That is a past issue that we don't need to be re-opening now. The present Huh ? If we need to revisit based on changed circumstances, we must - the scope of changes introduced in this release was definitely not anticipated

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mark Hamstra
I'm not trying to muzzle the discussion. All I am saying is that we don't need to have the same discussion about 0.10 vs. 1.0 that we already had. If you can tell me about specific changes in the current release candidate that occasion new arguments for why a 1.0 release is an unacceptable idea,

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Matei Zaharia
As others have said, the 1.0 milestone is about API stability, not about saying “we’ve eliminated all bugs”. The sooner you declare 1.0, the sooner users can confidently build on Spark, knowing that the application they build today will still run on Spark 1.9.9 three years from now. This is

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
On 18-May-2014 1:45 am, Mark Hamstra m...@clearstorydata.com wrote: I'm not trying to muzzle the discussion. All I am saying is that we don't need to have the same discussion about 0.10 vs. 1.0 that we already had. Agreed, no point in repeating the same discussion ... I am also trying to

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-17 Thread Matei Zaharia
BTW for what it’s worth I agree this is a good option to add, the only tricky thing will be making sure the checkpoint blocks are not garbage-collected by the block store. I don’t think they will be though. Matei On May 17, 2014, at 2:20 PM, Matei Zaharia matei.zaha...@gmail.com wrote: We do
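SPARK-1855 proposes a new memory-and-local-disk checkpointing mode; for context, the existing reliable-checkpoint API (as of the 1.0 era) goes through a checkpoint directory. A minimal sketch, assuming a local-mode context and a local filesystem checkpoint path (both illustrative choices, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("checkpoint-demo"))
    // Reliable checkpointing writes RDD data to this directory (HDFS in production).
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)
    rdd.persist(StorageLevel.MEMORY_AND_DISK) // avoid recomputation when checkpointing
    rdd.checkpoint()                          // lineage is truncated after the next action
    println(rdd.count())
    sc.stop()
  }
}
```

Persisting before calling `checkpoint()` matters because the checkpoint itself triggers a recomputation of the RDD; the proposal in the JIRA would instead keep checkpoint data in the block store, which is where Matei's garbage-collection concern comes from.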

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-17 Thread Matei Zaharia
We do actually have replicated StorageLevels in Spark. You can use MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom replication factor. BTW you guys should probably have this discussion on the JIRA rather than the dev list; I think the replies somehow ended up on the
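The two options Matei mentions can be sketched as follows (a minimal example in local mode; the app name and data are illustrative). `StorageLevel`'s factory method takes `(useDisk, useMemory, deserialized, replication)` in the 1.0-era API:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ReplicatedPersistExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("replication-demo"))

    // Built-in level: memory first, spill to disk, each block stored on 2 nodes.
    val rdd = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_AND_DISK_2)

    // Custom level: same behavior but with a replication factor of 3.
    val threeWay = StorageLevel(useDisk = true, useMemory = true,
                                deserialized = true, replication = 3)
    val rdd2 = sc.parallelize(1 to 100).persist(threeWay)

    println(rdd.count() + rdd2.count())
    sc.stop()
  }
}
```

Note that replication here protects cached blocks against node loss; it is distinct from checkpointing, which also truncates the lineage.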

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
I would make the case for interface stability, not just API stability. Particularly given that we have significantly changed some of our interfaces, I want to ensure developers/users are not seeing red flags. Bugs and code stability can be addressed in minor releases if found, but behavioral

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Michael Malak
While developers may appreciate 1.0 == API stability, I'm not sure that will be the understanding of the VP who gives the green light to a Spark-based development effort. I fear a bug that silently produces erroneous results will be perceived like the FDIV bug, but in this case without the

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Matei Zaharia
Yup, this is a good point, the interface includes stuff like launch scripts and environment variables. However I do think that the current features of spark-submit can all be supported in future releases. We’ll definitely have a very strict standard for modifying these later on. Matei On May

Re: can RDD be shared across multiple Spark applications?

2014-05-17 Thread Andy Konwinski
RDDs cannot currently be shared across multiple SparkContexts without using something like the Tachyon project (which is a separate project/codebase). Andy On May 16, 2014 2:14 PM, qingyang li liqingyang1...@gmail.com wrote:

Re: can RDD be shared across multiple Spark applications?

2014-05-17 Thread Christopher Nguyen
Qing Yang, Andy is correct in answering your direct question. At the same time, depending on your context, you may be able to apply a pattern where you turn the single Spark application into a service, and multiple clients of that service can indeed share access to the same RDDs. Several groups
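The service pattern described above rests on the fact that, while an RDD cannot cross SparkContext boundaries, one long-lived context can serve concurrent jobs from multiple threads against the same cached RDD (Spark's scheduler is thread-safe). A minimal sketch, with threads standing in for client requests (the object and app names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedContextService {
  def main(args: Array[String]): Unit = {
    // One long-lived context owns the cached RDD; "clients" share it via threads.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("rdd-service"))
    val shared = sc.parallelize(1 to 1000).cache()

    // Two concurrent jobs against the same cached RDD.
    val t1 = new Thread(new Runnable {
      def run(): Unit = println("sum = " + shared.sum())
    })
    val t2 = new Thread(new Runnable {
      def run(): Unit = println("count = " + shared.count())
    })
    t1.start(); t2.start(); t1.join(); t2.join()
    sc.stop()
  }
}
```

A real service would put a network front end (REST, Thrift, etc.) in front of the shared context; sharing across separate applications/processes still requires an external store such as Tachyon, as Andy notes.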