SPARK-964 related issues
Hi everybody, two questions, both somewhat related to the discussion about JDK 8 support [1].
- Has support for Spores [2] been considered/discussed already?
- Scala 2.11 is due in the next few months [3]. Would this be a reasonable target version for Spark 1.0?
As far as I know, both JDK 8 support and Spores should become part of Scala 2.11. Regards, Heiko
[1] https://spark-project.atlassian.net/browse/SPARK-964
[2] http://docs.scala-lang.org/sips/pending/spores.html
[3] https://issues.scala-lang.org/browse/SI/component/10600?selectedTab=com.atlassian.jira.plugin.system.project%3Acomponent-roadmap-panel
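For context on what Spores would buy Spark: SIP-21 proposes closures whose captured values must be declared explicitly in a header block, so the compiler can reject accidental capture of non-serializable or overly large state. Today that discipline is manual in Spark code. The sketch below (hypothetical class and field names, plain Scala against the public RDD API) shows the usual hand-rolled equivalent: copy exactly what the closure needs into a local val so that only that value is shipped to executors.

    import org.apache.spark.rdd.RDD

    // Illustrative only: `Lookup` and its field names are made up.
    class Lookup(val offsets: Map[String, Int]) extends Serializable {

      // Risky pattern: the closure references a field, so it captures `this`
      // and the whole Lookup instance must be serialized and shipped.
      def addOffsetsCapturingThis(rdd: RDD[String]): RDD[Int] =
        rdd.map(key => offsets.getOrElse(key, 0))

      // Spore-like pattern: copy what the closure needs into a local val
      // first, so only that value is captured. SIP-21 would enforce this
      // shape at compile time instead of leaving it to code review.
      def addOffsets(rdd: RDD[String]): RDD[Int] = {
        val localOffsets = offsets
        rdd.map(key => localOffsets.getOrElse(key, 0))
      }
    }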
Re: Proposal for Spark Release Strategy
The reason I explicitly mentioned binary compatibility was because it was sort of hand-waved in the proposal as good to have. My understanding is that Scala does make it painful to ensure binary compatibility - but stability of interfaces is vital to ensure dependable platforms. Recompilation might be a viable option for developers - not for users. Regards, Mridul

On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell pwend...@gmail.com wrote: If people feel that merging the intermediate SNAPSHOT number is significant, let's just defer merging that until this discussion concludes. That said - the decision to settle on 1.0 for the next release is not just because it happens to come after 0.9. It's a conscientious decision based on the development of the project to this point. A major focus of the 0.9 release was tying off loose ends in terms of backwards compatibility (e.g. spark configuration). There was some discussion back then of maybe cutting a 1.0 release but the decision was deferred until after 0.9. @mridul - please see the original post for discussion about binary compatibility.

On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski andykonwin...@gmail.com wrote: +1 for 0.10.0 now with the option to switch to 1.0.0 after further discussion.

On Feb 5, 2014 9:53 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x. I just don't want to release a 1.0 simply because it seems to follow after 0.9, rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group, I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then, if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew

On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote: +1 on time boxed releases and compatibility guidelines

On 06.02.2014 at 01:20, Patrick Wendell pwend...@gmail.com wrote: Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines.

== Spark Release Structure ==

Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project. Each Spark release will be versioned: [MAJOR].[MINOR].[MAINTENANCE] All releases with the same major version number will have API compatibility, defined as [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last 1 year or more. Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months.
One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time, and over time, we might make the whole release process more regular (similar to Ubuntu). At the bottom of this document is an example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs. However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features.

When new components are added to Spark, they may initially be marked as alpha. Alpha components do not have to abide by the above guidelines; however, to the maximum extent possible, they should try to. Once they are marked stable they have to follow these guidelines. At present, GraphX is the only alpha component of Spark.

[1] API compatibility: An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we might make it a requirement in the future, but challenges with things like Scala versions have made this difficult to guarantee in the past.

== Merging Pull Requests ==

To merge pull requests, committers are encouraged to use this tool [2] to collapse the request into one commit rather than manually performing git merges. It will also format the commit message nicely in a way that can be easily parsed later when writing credits.
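To make the footnote's distinction concrete, here is a small sketch with a hypothetical library class (not a real Spark API): a change that preserves the "compiles cleanly" source compatibility defined above while breaking link-level (binary) compatibility.

    // Hypothetical library class. Version 1.0 shipped:
    //   class TextSaver { def save(path: String): Unit = ... }
    // Version 1.1 adds a parameter with a default value:
    class TextSaver {
      def save(path: String, overwrite: Boolean = false): Unit =
        println(s"saving to $path (overwrite=$overwrite)")
    }

    object UserCode {
      // Source written against 1.0 still compiles cleanly against 1.1,
      // because the compiler fills in the default argument -- this is API
      // compatibility as defined in the footnote.
      def run(saver: TextSaver): Unit = saver.save("/tmp/out")
      // But a jar already *compiled* against 1.0 invokes the old
      // save(String) method, which no longer exists in 1.1's bytecode, and
      // fails at runtime with NoSuchMethodError -- a link-level break.
    }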
Re: SPARK-964 related issues
Hi, Well, in the context of JDK 8 support alone, there is actually no need to migrate to Scala 2.11 or to JDK 8 itself. The trick is to use interfaces instead of abstract classes for accepting functions.

On Thu, Feb 6, 2014 at 2:11 PM, Heiko Braun ike.br...@googlemail.com wrote: Hi everybody, two questions, both somewhat related to the discussion about JDK 8 support [1]. - Has support for Spores [2] been considered/discussed already? - Scala 2.11 is due in the next few months [3]. Would this be a reasonable target version for Spark 1.0? As far as I know, both JDK 8 support and Spores should become part of Scala 2.11. Regards, Heiko [1] https://spark-project.atlassian.net/browse/SPARK-964 [2] http://docs.scala-lang.org/sips/pending/spores.html [3] https://issues.scala-lang.org/browse/SI/component/10600?selectedTab=com.atlassian.jira.plugin.system.project%3Acomponent-roadmap-panel

-- Prashant
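A sketch of the trick Prashant describes, using hypothetical function types rather than Spark's actual Java API classes: a Java 8 lambda can implement an interface with a single abstract method but cannot extend an abstract class, and a Scala trait with only abstract members compiles to a plain Java interface.

    // Hypothetical function types for a Java-facing API.

    // Compiles to an abstract Java class; Java 8 code must write
    // `new AbstractFn<String, Integer>() { ... }` -- a lambda cannot
    // extend a class.
    abstract class AbstractFn[T, R] extends Serializable {
      def call(t: T): R
    }

    // A trait with only abstract members compiles to a plain Java interface
    // with a single abstract method, so Java 8 callers can pass a lambda
    // such as `s -> s.length()` wherever an InterfaceFn is expected.
    trait InterfaceFn[T, R] extends Serializable {
      def call(t: T): R
    }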
Re: [0.9.0] Possible deadlock in shutdown hook?
Shutdown hooks should not take 15 mins as you mentioned! On the other hand, how busy was your disk when this was happening? (either due to Spark or something else?) It might just be that there was a lot of stuff to remove? Regards, Mridul

On Thu, Feb 6, 2014 at 3:50 PM, Andrew Ash and...@andrewash.com wrote: Hi Spark devs, Occasionally when hitting Ctrl-C in the Scala spark shell on 0.9.0, one of my workers goes dead in the Spark master UI. I'm using the standalone cluster and didn't ever see this while using 0.8.0, so I think it may be a regression. When I prod on the hung CoarseGrainedExecutorBackend JVM with jstack and jmap -heap, it doesn't respond unless I add the -F force flag. The heap isn't full, but there are some interesting bits in the jstack. Poking around a little, I think there may be some kind of deadlock in the shutdown hooks. Below are the threads I think are most interesting:

Thread 14308: (state = BLOCKED)
 - java.lang.Shutdown.exit(int) @bci=96, line=212 (Interpreted frame)
 - java.lang.Runtime.exit(int) @bci=14, line=109 (Interpreted frame)
 - java.lang.System.exit(int) @bci=4, line=962 (Interpreted frame)
 - org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(java.lang.Object, scala.Function1) @bci=352, line=81 (Interpreted frame)
 - akka.actor.ActorCell.receiveMessage(java.lang.Object) @bci=25, line=498 (Interpreted frame)
 - akka.actor.ActorCell.invoke(akka.dispatch.Envelope) @bci=39, line=456 (Interpreted frame)
 - akka.dispatch.Mailbox.processMailbox(int, long) @bci=24, line=237 (Interpreted frame)
 - akka.dispatch.Mailbox.run() @bci=20, line=219 (Interpreted frame)
 - akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec() @bci=4, line=386 (Interpreted frame)
 - scala.concurrent.forkjoin.ForkJoinTask.doExec() @bci=10, line=260 (Compiled frame)
 - scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(scala.concurrent.forkjoin.ForkJoinTask) @bci=10, line=1339 (Compiled frame)
 - scala.concurrent.forkjoin.ForkJoinPool.runWorker(scala.concurrent.forkjoin.ForkJoinPool$WorkQueue) @bci=11, line=1979 (Compiled frame)
 - scala.concurrent.forkjoin.ForkJoinWorkerThread.run() @bci=14, line=107 (Interpreted frame)

Thread 3865: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Thread.join(long) @bci=38, line=1280 (Interpreted frame)
 - java.lang.Thread.join() @bci=2, line=1354 (Interpreted frame)
 - java.lang.ApplicationShutdownHooks.runHooks() @bci=87, line=106 (Interpreted frame)
 - java.lang.ApplicationShutdownHooks$1.run() @bci=0, line=46 (Interpreted frame)
 - java.lang.Shutdown.runHooks() @bci=39, line=123 (Interpreted frame)
 - java.lang.Shutdown.sequence() @bci=26, line=167 (Interpreted frame)
 - java.lang.Shutdown.exit(int) @bci=96, line=212 (Interpreted frame)
 - java.lang.Terminator$1.handle(sun.misc.Signal) @bci=8, line=52 (Interpreted frame)
 - sun.misc.Signal$1.run() @bci=8, line=212 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)

Thread 3987: (state = BLOCKED)
 - java.io.UnixFileSystem.list(java.io.File) @bci=0 (Interpreted frame)
 - java.io.File.list() @bci=29, line=1116 (Interpreted frame)
 - java.io.File.listFiles() @bci=1, line=1201 (Compiled frame)
 - org.apache.spark.util.Utils$.listFilesSafely(java.io.File) @bci=1, line=466 (Interpreted frame)
 - org.apache.spark.util.Utils$.deleteRecursively(java.io.File) @bci=9, line=478 (Compiled frame)
 - org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(java.io.File) @bci=4, line=479 (Compiled frame)
 - org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(java.lang.Object) @bci=5, line=478 (Compiled frame)
 - scala.collection.IndexedSeqOptimized$class.foreach(scala.collection.IndexedSeqOptimized, scala.Function1) @bci=22, line=33 (Compiled frame)
 - scala.collection.mutable.WrappedArray.foreach(scala.Function1) @bci=2, line=34 (Compiled frame)
 - org.apache.spark.util.Utils$.deleteRecursively(java.io.File) @bci=19, line=478 (Interpreted frame)
 - org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(java.io.File) @bci=14, line=141 (Interpreted frame)
 - org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(java.lang.Object) @bci=5, line=139 (Interpreted frame)
 - scala.collection.IndexedSeqOptimized$class.foreach(scala.collection.IndexedSeqOptimized, scala.Function1) @bci=22, line=33 (Compiled frame)
 - scala.collection.mutable.ArrayOps$ofRef.foreach(scala.Function1) @bci=2, line=108 (Interpreted frame)
 - org.apache.spark.storage.DiskBlockManager$$anon$1.run() @bci=39, line=139 (Interpreted frame)

I think what happened here is that thread 14308 received the akka shutdown message and called System.exit(). This started thread 3865, which is the JVM shutting itself down. Part of that process is running the shutdown hooks, so it started thread 3987.
Re: [0.9.0] Possible deadlock in shutdown hook?
Per the book Java Concurrency in Practice, the already-running threads continue running while the shutdown hooks run. So I think the race between the writing thread and the deleting thread could be a very real possibility :/ http://stackoverflow.com/a/3332925/120915

On Thu, Feb 6, 2014 at 2:49 AM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). Build Spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell:

import org.apache.spark.storage.StorageLevel._
val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count

After about a minute, this line appears in the shell logging output:

14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms

Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15 min, then kill -9'd the JVM and restarted the process. I wonder if what's happening here is that the threads that are spewing data to disk (as that parallelize and persist would do) can write to disk faster than the cleanup threads can delete from disk. What do you think of that theory? Andrew
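A toy program (made-up directory name, not Spark code) that demonstrates the point from Java Concurrency in Practice: shutdown hooks run concurrently with threads that are still running, so a hook that deletes a directory can race with a thread that is still writing into it.

    import java.io.File

    object ShutdownRaceDemo {
      def main(args: Array[String]): Unit = {
        val dir = new File(System.getProperty("java.io.tmpdir"), "shutdown-race-demo")
        dir.mkdirs()

        val writer = new Thread(new Runnable {
          def run(): Unit = {
            var i = 0
            while (true) {            // keeps writing even during JVM shutdown
              new File(dir, s"block-$i").createNewFile()
              i += 1
            }
          }
        })
        writer.setDaemon(true)
        writer.start()

        sys.addShutdownHook {
          // Runs while the writer may still be creating files, so the
          // directory can be non-empty again by the time we try to remove it.
          Option(dir.listFiles()).getOrElse(Array.empty[File]).foreach(_.delete())
          println(s"directory removed cleanly: ${dir.delete()}")
        }

        Thread.sleep(200)   // let the writer get going, then exit normally
      }
    }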
Re: Proposal for Spark Release Strategy
Thanks for all this Patrick. I like Heiko's proposal that requires every pull request to reference a JIRA. This is how things are done in Hadoop, and it makes it much easier to, for example, find out whether an issue you came across when googling for an error is in a release. I agree with Mridul about binary compatibility. It can be a dealbreaker for organizations that are considering an upgrade. The two ways I'm aware of that cause binary incompatibility are Scala version upgrades and messing around with inheritance. Are these not avoidable at least for minor releases? -Sandy
Re: [0.9.0] Possible deadlock in shutdown hook?
Is it safe to interrupt the running thread during shutdown?

On Thu, Feb 6, 2014 at 3:27 AM, Andrew Ash and...@andrewash.com wrote: Per the book Java Concurrency in Practice, the already-running threads continue running while the shutdown hooks run. So I think the race between the writing thread and the deleting thread could be a very real possibility :/ http://stackoverflow.com/a/3332925/120915
Re: Proposal for Spark Release Strategy
I like Heiko's proposal that requires every pull request to reference a JIRA. This is how things are done in Hadoop and it makes it much easier to, for example, find out whether an issue you came across when googling for an error is in a release.

I think this is a good idea and something on which there is wide consensus. I separately was going to suggest this in a later e-mail (it's not directly tied to versioning). One of many reasons this is necessary is because it's becoming hard to track which features ended up in which releases.

I agree with Mridul about binary compatibility. It can be a dealbreaker for organizations that are considering an upgrade. The two ways I'm aware of that cause binary incompatibility are Scala version upgrades and messing around with inheritance. Are these not avoidable at least for minor releases?

This is clearly a goal but I'm hesitant to codify it until we understand all of the reasons why it might not work. I've heard in general with Scala there are many non-obvious things that can break binary compatibility, and we need to understand what they are. I'd propose we add the migration tool [1] here to our build and use it for a few months and see what happens (hat tip to Michael Armbrust). It's easy to formalize this as a requirement later; it's impossible to go the other direction. For Scala major versions it's possible we can cross-build between 2.10 and 2.11 to retain link-level compatibility. It's just entirely uncharted territory and AFAIK no one who's suggesting this is speaking from experience maintaining this guarantee for a Scala project. That would be the strongest convincing reason for me - if someone has actually done this in the past in a Scala project and speaks from experience. Most of us are speaking from the perspective of Java projects, where we understand well the trade-offs and costs of maintaining this guarantee. [1] https://github.com/typesafehub/migration-manager - Patrick
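For reference, [1] is the Typesafe Migration Manager (MiMa). A rough sketch of wiring it into an sbt build follows; the plugin version and setting names here are assumptions (they have changed across plugin releases), and the artifact coordinates are illustrative rather than Spark's actual ones.

    // project/plugins.sbt -- plugin version is illustrative
    addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

    // build.sbt -- newer plugin releases use mimaPreviousArtifacts
    // (older ones used a previousArtifact key instead)
    mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "1.0.0")

Then running `sbt mimaReportBinaryIssues` compares the current classfiles against the released artifact and reports binary-incompatible changes, which is how the "use it for a few months and see what happens" experiment could be run.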
Re: Proposal for Spark Release Strategy
Thanks Patrick for initiating the discussion about the next road map for Apache Spark. I am +1 for 0.10.0 for the next version. It will give us as a community some time to digest the process and the vision and make adjustments accordingly. Releasing a 1.0.0 is a huge milestone, and if we do need to break APIs somehow or modify internal behavior dramatically, we could take advantage of a 1.0.0 release as a good step to go to. - Henry
Re: [0.9.0] Possible deadlock in shutdown hook?
Looks like a pathological corner case here - where the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). Ideally, the delete thread would list files, remove them, and then fail when it tries to remove the non-empty directory (since another thread might be creating more in parallel). Regards, Mridul
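A sketch of the behavior Mridul describes - not Spark's actual Utils.deleteRecursively - where the cleanup lists a directory once, deletes what it saw, and lets the final directory removal fail instead of re-listing forever when a writer races it:

    import java.io.File

    object BestEffortCleanup {
      def deleteOnce(dir: File): Boolean = {
        // listFiles() returns null on I/O error or if dir is not a directory
        val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
        children.foreach { f =>
          if (f.isDirectory) deleteOnce(f) else f.delete()
        }
        // If another thread created new files after the listing, this delete
        // returns false and the shutdown hook moves on instead of blocking.
        dir.delete()
      }
    }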
Re: Proposal for Spark Release Strategy
Bleh, hit send too early again. My second paragraph was to argue for 1.0.0 instead of 0.10.0, not to hammer on the binary compatibility point.

On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza sandy.r...@cloudera.com wrote: *Would it make sense to put in something that strongly discourages binary incompatible changes when possible?
Re: Proposal for Spark Release Strategy
*Would it make sense to put in something that strongly discourages binary incompatible changes when possible?
Re: Proposal for Spark Release Strategy
Not codifying binary compatibility as a hard rule sounds fine to me. Would it make sense to put something in that . I.e. avoid making needless changes to class hierarchies. Whether Spark considers itself stable or not, users are beginning to treat it so. A responsible project will acknowledge this and provide the stability needed by its user base. I think some projects have made the mistake of waiting too long to release a 1.0.0. It allows them to put off making the hard decisions, but users and downstream projects suffer. If Spark needs to go through dramatic changes, there's always the option of a 2.0.0 that allows for this. -Sandy

On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia matei.zaha...@gmail.com wrote: I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community. Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it. Matei
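On the cross-building point Matei raises, sbt already supports building one module against several Scala versions. A minimal sketch (version strings are illustrative):

    // build.sbt -- version strings are illustrative
    scalaVersion := "2.10.4"
    crossScalaVersions := Seq("2.10.4", "2.11.1")

Running `sbt +publishLocal` then builds and publishes the module once per listed Scala version, producing separately-suffixed artifacts (spark-core_2.10, spark-core_2.11 style) for dependencies declared with %%.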
Re: Proposal for Spark Release Strategy
+1 for 0.10.0. It would give more time to study things (such as the new SparkConf) and let the community decide if any breaking API changes are needed. Also, a +1 for minor revisions not breaking code compatibility, including Scala versions. (I guess this would mean that 1.x would stay on Scala 2.10.x)
Re: Proposal for Spark Release Strategy
The other reason for waiting is stability. It would be great to have as a goal for 1.0.0 that under most heavy-use scenarios, workers and executors don't just die, which is not true today. Also, there should be minimal silent failures which are difficult to debug.
Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x I just don't want to release a 1.0 simply because it seems to follow after 0.9 rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote: +1 on time boxed releases and compatibility guidelines Am 06.02.2014 um 01:20 schrieb Patrick Wendell pwend...@gmail.com : Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0
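A side note on the Scala-version point raised above (keeping 1.x on Scala 2.10 vs. later cross-building it for 2.11): cross-building is largely a build-configuration matter. The sbt fragment below is only a hypothetical sketch, not Spark's actual build definition (which may look quite different), and the version numbers are placeholders:
// Hypothetical cross-building settings; with these in place, `sbt +package`
// compiles and packages the project once per listed Scala version.
scalaVersion := "2.10.3"
crossScalaVersions := Seq("2.10.3", "2.11.0")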
Re: Proposal for Spark Release Strategy
+1 for 1.0 The point of 1.0 is for us to self-enforce API compatibility in the context of longer-term support. If we continue down the 0.xx road, we will always have an excuse for breaking APIs. That said, a major focus of 0.9 and some of the work that is happening for 1.0 (e.g. configuration, Java 8 closure support, security) are for better API compatibility support in 1.x releases. While not perfect, Spark as it stands is already more mature than many (ASF) projects that are versioned 1.x, 2.x, or even 10.x. Software releases are always a moving target. 1.0 doesn't mean it is perfect and final. The project will still evolve. On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan e...@ooyala.com wrote: +1 for 0.10.0. It would give more time to study things (such as the new SparkConf) and let the community decide if any breaking API changes are needed. Also, a +1 for minor revisions not breaking code compatibility, including Scala versions. (I guess this would mean that 1.x would stay on Scala 2.10.x) On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Bleh, hit send to early again. My second paragraph was to argue for 1.0.0 instead of 0.10.0, not to hammer on the binary compatibility point. On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza sandy.r...@cloudera.com wrote: *Would it make sense to put in something that strongly discourages binary incompatible changes when possible? On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Not codifying binary compatibility as a hard rule sounds fine to me. Would it make sense to put something in that . I.e. avoid making needless changes to class hierarchies. Whether Spark considers itself stable or not, users are beginning to treat it so. A responsible project will acknowledge this and provide the stability needed by its user base. I think some projects have made the mistake of waiting too long to release a 1.0.0. It allows them to put off making the hard decisions, but users and downstream projects suffer. If Spark needs to go through dramatic changes, there's always the option of a 2.0.0 that allows for this. -Sandy On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia matei.zaha...@gmail.com wrote: I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community. Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it. Matei On Feb 6, 2014, at 10:43 AM, Henry Saputra henry.sapu...@gmail.com wrote: Thanks Patick to initiate the discussion about next road map for Apache Spark. I am +1 for 0.10.0 for next version. It will give us as community some time to digest the process and the vision and make adjustment accordingly. 
Release a 1.0.0 is a huge milestone and if we do need to break API somehow or modify internal behavior dramatically we could take advantage to release 1.0.0 as good step to go to. - Henry On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x I just don't want to release a 1.0 simply because it seems to follow after 0.9 rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote:
Re: Proposal for Spark Release Strategy
I don't really agree with this logic. I think we haven't broken API so far because we just keep adding stuff on to it, and we haven't bothered to clean the api up, specifically to *avoid* breaking things. Here's a handful of api breaking things that we might want to consider: * should we look at all the various configuration properties, and maybe some of them should get renamed for consistency / clarity? * do all of the functions on RDD need to be in core? or do some of them that are simple additions built on top of the primitives really belong in a utils package or something? Eg., maybe we should get rid of all the variants of the mapPartitions / mapWith / etc. just have map, and mapPartitionsWithIndex (too many choices in the api can also be confusing to the user) * are the right things getting tracked in SparkListener? Do we need to add or remove anything? This is probably not the right list of questions, that's just an idea of the kind of thing we should be thinking about. Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And given that we haven't even started that discussion, it seems possible that there could be new features we'd like to release in 0.10 before that discussion is finished. On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia matei.zaha...@gmail.comwrote: I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community. Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it. Matei On Feb 6, 2014, at 10:43 AM, Henry Saputra henry.sapu...@gmail.com wrote: Thanks Patick to initiate the discussion about next road map for Apache Spark. I am +1 for 0.10.0 for next version. It will give us as community some time to digest the process and the vision and make adjustment accordingly. Release a 1.0.0 is a huge milestone and if we do need to break API somehow or modify internal behavior dramatically we could take advantage to release 1.0.0 as good step to go to. - Henry On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x I just don't want to release a 1.0 simply because it seems to follow after 0.9 rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. 
Until that decision is made as a group I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote: +1 on time boxed releases and compatibility guidelines Am 06.02.2014 um 01:20 schrieb Patrick Wendell pwend...@gmail.com: Hi Everyone, In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines. == Spark Release Structure == Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project. Each Spark release will be versioned: [MAJOR].[MINOR].[MAINTENANCE] All releases with the same major version
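To make the mapWith / mapPartitionsWithIndex point in Imran's list above concrete, here is a rough sketch (assuming a spark shell with the usual sc and the 0.9-era RDD method names; exact signatures may differ slightly): the per-partition-setup pattern that mapWith captures can already be written with mapPartitionsWithIndex alone, which is part of why the extra variants mostly add choice rather than capability.
// The mapWith-style variant under discussion (shown only as a comment):
//   rdd.mapWith(index => new java.util.Random(index))((x, rng) => x * rng.nextDouble())
// The same computation using only mapPartitionsWithIndex:
val rdd = sc.parallelize(1 to 100, 4)
val scaled = rdd.mapPartitionsWithIndex { (index, iter) =>
  val rng = new java.util.Random(index)   // built once per partition, from its index
  iter.map(x => x * rng.nextDouble())
}
scaled.count()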
Re: Proposal for Spark Release Strategy
If the APIs are usable, stability and continuity are much more important than perfection. With many already relying on the current APIs, I think trying to clean them up will just cause pain for users and integrators. Hadoop made this mistake when they decided the original MapReduce APIs were ugly and introduced a new set of APIs to do the same thing. Even though this happened in a pre-1.0 release, three years down the road, both the old and new APIs are still supported, causing endless confusion for users. If individual functions or configuration properties have unclear names, they can be deprecated and replaced, but redoing the APIs or breaking compatibility at this point is simply not worth it. On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid im...@quantifind.com wrote: I don't really agree with this logic. I think we haven't broken API so far because we just keep adding stuff on to it, and we haven't bothered to clean the api up, specifically to *avoid* breaking things. Here's a handful of api breaking things that we might want to consider: * should we look at all the various configuration properties, and maybe some of them should get renamed for consistency / clarity? * do all of the functions on RDD need to be in core? or do some of them that are simple additions built on top of the primitives really belong in a utils package or something? Eg., maybe we should get rid of all the variants of the mapPartitions / mapWith / etc. just have map, and mapPartitionsWithIndex (too many choices in the api can also be confusing to the user) * are the right things getting tracked in SparkListener? Do we need to add or remove anything? This is probably not the right list of questions, that's just an idea of the kind of thing we should be thinking about. Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And given that we haven't even started that discussion, it seems possible that there could be new features we'd like to release in 0.10 before that discussion is finished. On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community. Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it. Matei On Feb 6, 2014, at 10:43 AM, Henry Saputra henry.sapu...@gmail.com wrote: Thanks Patick to initiate the discussion about next road map for Apache Spark. I am +1 for 0.10.0 for next version. It will give us as community some time to digest the process and the vision and make adjustment accordingly. 
Release a 1.0.0 is a huge milestone and if we do need to break API somehow or modify internal behavior dramatically we could take advantage to release 1.0.0 as good step to go to. - Henry On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x I just don't want to release a 1.0 simply because it seems to follow after 0.9 rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun ike.br...@googlemail.com wrote: +1 on time boxed releases and compatibility guidelines Am 06.02.2014 um
Re: Proposal for Spark Release Strategy
Just to echo others - The relevant question is whether we want to advertise stable APIs for users that we will support for a long time horizon. And doing this is critical to being taken seriously as a mature project. The question is not whether or not there are things we want to improve about Spark (further reduce dependencies, runtime stability, etc) - of course everyone wants to improve those things! In the next few months ahead of 1.0 the plan would be to invest effort in finishing off loose ends in the API and of course, no 1.0 release candidate will pass muster if these aren't addressed. I only see a few fairly small blockers though wrt API issues: - We should mark things that may evolve and change as semi-private developer APIs (e.g. the Spark Listener). - We need to standardize the Java API in a way that supports Java 8 lambdas. Other than that - I don't see many blockers in terms of API changes we might want to make. A lot of those were dealt with in 0.9 specifically to prepare for this. The broader question of API clean-up brings up a debate about the trade-off of compatibility with older pre-1.0 versions of Spark. This is not the primary issue under discussion and can be debated separately. The primary issue at hand is whether to have 1.0 in ~3 months vs pushing it to ~6 months from now or more. - Patrick On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza sandy.r...@cloudera.com wrote: If the APIs are usable, stability and continuity are much more important than perfection. With many already relying on the current APIs, I think trying to clean them up will just cause pain for users and integrators. Hadoop made this mistake when they decided the original MapReduce APIs were ugly and introduced a new set of APIs to do the same thing. Even though this happened in a pre-1.0 release, three years down the road, both the old and new APIs are still supported, causing endless confusion for users. If individual functions or configuration properties have unclear names, they can be deprecated and replaced, but redoing the APIs or breaking compatibility at this point is simply not worth it. On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid im...@quantifind.com wrote: I don't really agree with this logic. I think we haven't broken API so far because we just keep adding stuff on to it, and we haven't bothered to clean the api up, specifically to *avoid* breaking things. Here's a handful of api breaking things that we might want to consider: * should we look at all the various configuration properties, and maybe some of them should get renamed for consistency / clarity? * do all of the functions on RDD need to be in core? or do some of them that are simple additions built on top of the primitives really belong in a utils package or something? Eg., maybe we should get rid of all the variants of the mapPartitions / mapWith / etc. just have map, and mapPartitionsWithIndex (too many choices in the api can also be confusing to the user) * are the right things getting tracked in SparkListener? Do we need to add or remove anything? This is probably not the right list of questions, that's just an idea of the kind of thing we should be thinking about. Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And given that we haven't even started that discussion, it seems possible that there could be new features we'd like to release in 0.10 before that discussion is finished. 
On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community. Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it. Matei On Feb 6, 2014, at 10:43 AM, Henry Saputra henry.sapu...@gmail.com wrote: Thanks Patick to initiate the discussion about next road map for Apache Spark. I am +1 for 0.10.0 for next version. It will give us as community some time to digest the process and the
Re: Proposal for Spark Release Strategy
Imran: Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And moving master to 1.0.0-SNAPSHOT doesn't preclude that. If anything, it turns that ought to into must -- which is another way of saying what Reynold said: The point of 1.0 is for us to self-enforce API compatibility in the context of longer term support. If we continue down the 0.xx road, we will always have excuse for breaking APIs. 1.0.0-SNAPSHOT doesn't mean that the API is final right now. It means that what is released next will be final over what is intended to be the lengthy scope of a major release. That means that adding new features and functionality (at least to core spark) should be a very low priority for this development cycle, and establishing the 1.0 API from what is already in 0.9.0 should be our first priority. It wouldn't trouble me at all if not-strictly-necessary new features were left to hang out on the pull request queue for quite awhile until we are ready to add them in 1.1.0, if we were to do pretty much nothing else during this cycle except to get the 1.0 API to where most of us agree that it is in good shape. If we're not adding new features and extending the 0.9.0 API, then there really is no need for a 0.10.0 minor-release, whose main purpose would be to collect the API additions from 0.9.0. Bug-fixes go in 0.9.1-SNAPSHOT; bug-fixes and finalized 1.0 API go in 1.0.0-SNAPSHOT; almost all new features are put on hold and wait for 1.1.0-SNAPSHOT. ... it seems possible that there could be new features we'd like to release in 0.10... We certainly can add new features to 1.0.0, but they will have to go through a rigorous review to be certain that they are things that we really want to commit to keeping going forward. But after 1.0, that is true for any new feature proposal unless we create specifically experimental branches. So what moving to 1.0.0-SNAPSHOT really means is that we are saying that we have gone beyond the development phase where more-or-less experimental features can be added to Spark releases only to be withdrawn later -- that time is done after 1.0.0-SNAPSHOT. Now to be fair, tentative/experimental features have not been added willy-nilly to Spark over recent releases, and withdrawal/replacement has been about as limited in scope as could be fairly expected, so this shouldn't be a radically new and different development paradigm. There are, though, some experiments that were added in the past and should probably now be withdrawn (or at least deprecated in 1.0.0, withdrawn in 1.1.0.) I'll put my own contribution of mapWith, filterWith, et. al on the chopping block as an effort that, at least in its present form, doesn't provide enough extra over mapPartitionsWithIndex, and whose syntax is awkward enough that I don't believe these methods have ever been widely used, so that their inclusion in the 1.0 API is probably not warranted. There are other elements of Spark that also should be culled and/or refactored before 1.0. Imran has listed a few. I'll also suggest that there are at least parts of alternative Broadcast variable implementations that should probably be left behind. In any event, Imran is absolutely correct that we need to have a discussion about these issues. Moving to 1.0.0-SNAPSHOT forces us to begin that discussion. So, I'm +1 for 1.0.0-incubating-SNAPSHOT (and looking forward to losing the incubating!) 
On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid im...@quantifind.com wrote: I don't really agree with this logic. I think we haven't broken API so far because we just keep adding stuff on to it, and we haven't bothered to clean the api up, specifically to *avoid* breaking things. Here's a handful of api breaking things that we might want to consider: * should we look at all the various configuration properties, and maybe some of them should get renamed for consistency / clarity? * do all of the functions on RDD need to be in core? or do some of them that are simple additions built on top of the primitives really belong in a utils package or something? Eg., maybe we should get rid of all the variants of the mapPartitions / mapWith / etc. just have map, and mapPartitionsWithIndex (too many choices in the api can also be confusing to the user) * are the right things getting tracked in SparkListener? Do we need to add or remove anything? This is probably not the right list of questions, that's just an idea of the kind of thing we should be thinking about. Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And given that we haven't even started that discussion, it seems possible that there could be new features we'd like to release in 0.10 before that discussion is finished. On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia
Re: Proposal for Spark Release Strategy
I'm not sure that that is the conclusion that I would draw from the Hadoop example. I would certainly agree that maintaining and supporting both an old and a new API is a cause of endless confusion for users. If we are going to change or drop things from the API to reach 1.0, then we shouldn't be maintaining and support the prior way of doing things beyond a 1.0.0 - 1.1.0 deprecation cycle. On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza sandy.r...@cloudera.com wrote: If the APIs are usable, stability and continuity are much more important than perfection. With many already relying on the current APIs, I think trying to clean them up will just cause pain for users and integrators. Hadoop made this mistake when they decided the original MapReduce APIs were ugly and introduced a new set of APIs to do the same thing. Even though this happened in a pre-1.0 release, three years down the road, both the old and new APIs are still supported, causing endless confusion for users. If individual functions or configuration properties have unclear names, they can be deprecated and replaced, but redoing the APIs or breaking compatibility at this point is simply not worth it. On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid im...@quantifind.com wrote: I don't really agree with this logic. I think we haven't broken API so far because we just keep adding stuff on to it, and we haven't bothered to clean the api up, specifically to *avoid* breaking things. Here's a handful of api breaking things that we might want to consider: * should we look at all the various configuration properties, and maybe some of them should get renamed for consistency / clarity? * do all of the functions on RDD need to be in core? or do some of them that are simple additions built on top of the primitives really belong in a utils package or something? Eg., maybe we should get rid of all the variants of the mapPartitions / mapWith / etc. just have map, and mapPartitionsWithIndex (too many choices in the api can also be confusing to the user) * are the right things getting tracked in SparkListener? Do we need to add or remove anything? This is probably not the right list of questions, that's just an idea of the kind of thing we should be thinking about. Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And given that we haven't even started that discussion, it seems possible that there could be new features we'd like to release in 0.10 before that discussion is finished. On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community. 
Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it. Matei On Feb 6, 2014, at 10:43 AM, Henry Saputra henry.sapu...@gmail.com wrote: Thanks Patick to initiate the discussion about next road map for Apache Spark. I am +1 for 0.10.0 for next version. It will give us as community some time to digest the process and the vision and make adjustment accordingly. Release a 1.0.0 is a huge milestone and if we do need to break API somehow or modify internal behavior dramatically we could take advantage to release 1.0.0 as good step to go to. - Henry On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash and...@andrewash.com wrote: Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x I just don't want to release a 1.0 simply because it seems to follow after 0.9 rather than making an intentional decision that we're at the point where we can stand by the current APIs and
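Following up on the deprecation-cycle point above: Scala's standard @deprecated annotation is enough to keep an old entry point compiling (with a warning) through 1.0.x and then drop it in 1.1.0. The class and method names below are purely illustrative, not anything in Spark:
// Illustrative only: the old name survives one release behind @deprecated,
// delegating to its replacement, and is removed in the next minor release.
class Counters {
  def increment(name: String, delta: Long = 1L): Unit = { /* record the update */ }

  @deprecated("Use increment instead", "1.0.0")
  def inc(name: String): Unit = increment(name)
}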
Re: [0.9.0] Possible deadlock in shutdown hook?
I think the solution where we stop the writing threads and then let the deleting threads completely clean up is the best option since the final state doesn't have half-deleted temp dirs scattered across the cluster. How feasible do you think it'd be to interrupt the other threads? On Thu, Feb 6, 2014 at 10:54 AM, Mridul Muralidharan mri...@gmail.comwrote: Looks like a pathological corner case here - where the the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). Ideally, the delete thread would list files, remove them and then fail when it tries to remove the non empty directory (since other thread might be creating more in parallel). Regards, Mridul On Thu, Feb 6, 2014 at 4:19 PM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). Build spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell: import org.apache.spark.storage.StorageLevel._ val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count After about a minute, this line appears in the shell logging output: 14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15min then kill -9'd the JVM and restarted the process. I wonder if what's happening here is that the threads that are spewing data to disk (as that parallelize and persist would do) can write to disk faster than the cleanup threads can delete from disk. What do you think of that theory? Andrew On Thu, Feb 6, 2014 at 2:30 AM, Mridul Muralidharan mri...@gmail.com wrote: shutdown hooks should not take 15 mins are you mentioned ! On the other hand, how busy was your disk when this was happening ? (either due to spark or something else ?) It might just be that there was a lot of stuff to remove ? Regards, Mridul On Thu, Feb 6, 2014 at 3:50 PM, Andrew Ash and...@andrewash.com wrote: Hi Spark devs, Occasionally when hitting Ctrl-C in the scala spark shell on 0.9.0 one of my workers goes dead in the spark master UI. I'm using the standalone cluster and didn't ever see this while using 0.8.0 so I think it may be a regression. When I prod on the hung CoarseGrainedExecutorBackend JVM with jstack and jmap -heap, it doesn't respond unless I add the -F force flag. The heap isn't full, but there are some interesting bits in the jstack. Poking around a little, I think there may be some kind of deadlock in the shutdown hooks. 
Below are the threads I think are most interesting: Thread 14308: (state = BLOCKED) - java.lang.Shutdown.exit(int) @bci=96, line=212 (Interpreted frame) - java.lang.Runtime.exit(int) @bci=14, line=109 (Interpreted frame) - java.lang.System.exit(int) @bci=4, line=962 (Interpreted frame) - org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(java.lang.Object, scala.Function1) @bci=352, line=81 (Interpreted frame) - akka.actor.ActorCell.receiveMessage(java.lang.Object) @bci=25, line=498 (Interpreted frame) - akka.actor.ActorCell.invoke(akka.dispatch.Envelope) @bci=39, line=456 (Interpreted frame) - akka.dispatch.Mailbox.processMailbox(int, long) @bci=24, line=237 (Interpreted frame) - akka.dispatch.Mailbox.run() @bci=20, line=219 (Interpreted frame) - akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec() @bci=4, line=386 (Interpreted frame) - scala.concurrent.forkjoin.ForkJoinTask.doExec() @bci=10, line=260 (Compiled frame) - scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(scala.concurrent.forkjoin.ForkJoinTask) @bci=10, line=1339 (Compiled frame) - scala.concurrent.forkjoin.ForkJoinPool.runWorker(scala.concurrent.forkjoin.ForkJoinPool$WorkQueue) @bci=11, line=1979 (Compiled frame) - scala.concurrent.forkjoin.ForkJoinWorkerThread.run() @bci=14, line=107 (Interpreted frame) Thread 3865: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Thread.join(long) @bci=38, line=1280 (Interpreted frame) - java.lang.Thread.join() @bci=2, line=1354 (Interpreted frame) - java.lang.ApplicationShutdownHooks.runHooks() @bci=87, line=106 (Interpreted frame) - java.lang.ApplicationShutdownHooks$1.run() @bci=0, line=46 (Interpreted frame) - java.lang.Shutdown.runHooks() @bci=39, line=123 (Interpreted frame) - java.lang.Shutdown.sequence() @bci=26, line=167 (Interpreted frame) - java.lang.Shutdown.exit(int) @bci=96,
Is there any way to make a quick test on some pre-commit code?
Hi, all Is it always necessary to run sbt assembly when you want to test some code? Sometimes you just repeatedly change one or two lines for a failing test case, and it is really time-consuming to run sbt assembly every time. Is there any faster way? Best, -- Nan Zhu
Re: Is there any way to make a quick test on some pre-commit code?
You can do sbt/sbt assemble-deps and then just run sbt/sbt package each time. You can even do sbt/sbt ~package for automatic incremental compilation. On Thu, Feb 6, 2014 at 4:46 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all Is it always necessary to run sbt assembly when you want to test some code, Sometimes you just repeatedly change one or two lines for some failed test case, it is really time-consuming to sbt assembly every time any faster way? Best, -- Nan Zhu
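For anyone else hitting the same slow edit/test loop, here is the workflow Reynold describes, spelled out as commands from the Spark source root (nothing beyond what he already names):
sbt/sbt assemble-deps   # build the dependency assembly once
sbt/sbt package         # after each change, rebuild only Spark's own classes
sbt/sbt ~package        # or keep sbt running and recompiling on every save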
Re: Is there any way to make a quick test on some pre-commit code?
Thank you very much, Reynold -- Nan Zhu On Thursday, February 6, 2014 at 7:50 PM, Reynold Xin wrote: You can do sbt/sbt assemble-deps and then just run sbt/sbt package each time. You can even do sbt/sbt ~package for automatic incremental compilation. On Thu, Feb 6, 2014 at 4:46 PM, Nan Zhu zhunanmcg...@gmail.com (mailto:zhunanmcg...@gmail.com) wrote: Hi, all Is it always necessary to run sbt assembly when you want to test some code, Sometimes you just repeatedly change one or two lines for some failed test case, it is really time-consuming to sbt assembly every time any faster way? Best, -- Nan Zhu
Re: Discussion on strategy or roadmap should happen on dev@ list
I'd prefer a spark-jira-activity list so I can filter appropriately. A separate request, but a spark-commits list that has all commits emailed to it has been helpful at work to keep a pulse on activity. On Thu, Feb 6, 2014 at 6:27 PM, Matei Zaharia matei.zaha...@gmail.comwrote: Henry (or anyone else), do you have any preference on sending these directly to “dev versus creating another list for “issues”? I guess we can try “dev” for a while and let people decide if it gets too spammy. We’ll just have to advertise it in advance. Matei On Feb 6, 2014, at 9:55 AM, Henry Saputra henry.sapu...@gmail.com wrote: HI Matei, yeah please subscribe it for now. Once we have ASF JIRA setup for Spark it will happen automatically. - Henry On Wed, Feb 5, 2014 at 2:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hey Henry, this makes sense. I’d like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we’d be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA has been moved to the ASF, since infra can set up the forwarding automatically. But most major discussions (e.g. https://spark-project.atlassian.net/browse/SPARK-964, https://spark-project.atlassian.net/browse/SPARK-969) happen there. I think this is the model we want to have in the future — most other projects I’ve participated in also used JIRA for their discussion, and mirrored to either the “dev” list or an “issues” list. Matei On Feb 5, 2014, at 2:49 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Guys, Just friendly reminder, some of you guys may work closely or collaborate outside the dev@ list and sometimes it is easier. But, as part of Apache Software Foundation project, any decision or outcome that could or will be implemented in the Apache Spark need to happen in the dev@ list as we are open and collaborative as community. If offline discussions happen please forward the history or potential solution to the dev@ list before any action taken. Most of us work remote so email is the official channel of discussion about stuff related to development in Spark. Github pull request is not the appropriate vehicle for technical discussions. It is used primarily for review of proposed patch which means initial problem most of the times had been identified and discussed. Thanks for understanding. - Henry
Notice: JIRA messages will be forwarded to this list
As part of ASF policy, we need to archive all development discussions on a mailing list, so I’m going to do this on the dev list for now. You can filter these messages by the sender, which will be j...@spark-project.atlassian.net. If the list becomes too spammy as a result, we can create a separate “issues” list later, but I just want to have something in place to archive them for now. Matei
Re: Discussion on strategy or roadmap should happen on dev@ list
Hi, Matei, Does it mean that I will receive the notification on every issue, instead of just the ones I’m watching on? Best, -- Nan Zhu On Thursday, February 6, 2014 at 10:56 PM, Matei Zaharia wrote: You can already get commits on comm...@spark.incubator.apache.org (mailto:comm...@spark.incubator.apache.org) actually. Regarding JIRA issues, I think I’ll just try putting them on dev for now, and we can move to a separate list later. If you’d like to filter them, the address is j...@spark-project.atlassian.net (mailto:j...@spark-project.atlassian.net). Matei On Feb 6, 2014, at 6:30 PM, Andrew Ash and...@andrewash.com (mailto:and...@andrewash.com) wrote: I'd prefer a spark-jira-activity list so I can filter appropriately. A separate request, but a spark-commits list that has all commits emailed to it has been helpful at work to keep a pulse on activity. On Thu, Feb 6, 2014 at 6:27 PM, Matei Zaharia matei.zaha...@gmail.com (mailto:matei.zaha...@gmail.com)wrote: Henry (or anyone else), do you have any preference on sending these directly to “dev versus creating another list for “issues”? I guess we can try “dev” for a while and let people decide if it gets too spammy. We’ll just have to advertise it in advance. Matei On Feb 6, 2014, at 9:55 AM, Henry Saputra henry.sapu...@gmail.com (mailto:henry.sapu...@gmail.com) wrote: HI Matei, yeah please subscribe it for now. Once we have ASF JIRA setup for Spark it will happen automatically. - Henry On Wed, Feb 5, 2014 at 2:56 PM, Matei Zaharia matei.zaha...@gmail.com (mailto:matei.zaha...@gmail.com) wrote: Hey Henry, this makes sense. I’d like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we’d be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA has been moved to the ASF, since infra can set up the forwarding automatically. But most major discussions (e.g. https://spark-project.atlassian.net/browse/SPARK-964, https://spark-project.atlassian.net/browse/SPARK-969) happen there. I think this is the model we want to have in the future — most other projects I’ve participated in also used JIRA for their discussion, and mirrored to either the “dev” list or an “issues” list. Matei On Feb 5, 2014, at 2:49 PM, Henry Saputra henry.sapu...@gmail.com (mailto:henry.sapu...@gmail.com) wrote: Hi Guys, Just friendly reminder, some of you guys may work closely or collaborate outside the dev@ list and sometimes it is easier. But, as part of Apache Software Foundation project, any decision or outcome that could or will be implemented in the Apache Spark need to happen in the dev@ list as we are open and collaborative as community. If offline discussions happen please forward the history or potential solution to the dev@ list before any action taken. Most of us work remote so email is the official channel of discussion about stuff related to development in Spark. Github pull request is not the appropriate vehicle for technical discussions. It is used primarily for review of proposed patch which means initial problem most of the times had been identified and discussed. Thanks for understanding. - Henry
Test
Just sending a test email from another email address to check for bounces..
Discussion on strategy or roadmap should happen on dev@ list
I'd say let's try with dev@ list and see how spammy it is. If it is get too noisy we could always create issues@ list. - Henry On Thursday, February 6, 2014, Matei Zaharia matei.zaha...@gmail.comjavascript:_e(%7B%7D,'cvml','matei.zaha...@gmail.com'); wrote: Henry (or anyone else), do you have any preference on sending these directly to “dev versus creating another list for “issues”? I guess we can try “dev” for a while and let people decide if it gets too spammy. We’ll just have to advertise it in advance. Matei On Feb 6, 2014, at 9:55 AM, Henry Saputra henry.sapu...@gmail.com wrote: HI Matei, yeah please subscribe it for now. Once we have ASF JIRA setup for Spark it will happen automatically. - Henry On Wed, Feb 5, 2014 at 2:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hey Henry, this makes sense. I’d like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we’d be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA has been moved to the ASF, since infra can set up the forwarding automatically. But most major discussions (e.g. https://spark-project.atlassian.net/browse/SPARK-964, https://spark-project.atlassian.net/browse/SPARK-969) happen there. I think this is the model we want to have in the future — most other projects I’ve participated in also used JIRA for their discussion, and mirrored to either the “dev” list or an “issues” list. Matei On Feb 5, 2014, at 2:49 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Guys, Just friendly reminder, some of you guys may work closely or collaborate outside the dev@ list and sometimes it is easier. But, as part of Apache Software Foundation project, any decision or outcome that could or will be implemented in the Apache Spark need to happen in the dev@ list as we are open and collaborative as community. If offline discussions happen please forward the history or potential solution to the dev@ list before any action taken. Most of us work remote so email is the official channel of discussion about stuff related to development in Spark. Github pull request is not the appropriate vehicle for technical discussions. It is used primarily for review of proposed patch which means initial problem most of the times had been identified and discussed. Thanks for understanding. - Henry
Re: Discussion on strategy or roadmap should happen on dev@ list
Yes, notifications on every issue will be sent to dev@spark.incubator.apache.org. You can filter them out though by matching on (sender = j...@spark-project.atlassian.net AND to = dev@spark.incubator.apache.org). Should be fairly straightforward with Gmail. Matei On Feb 6, 2014, at 8:01 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, Matei, Does it mean that I will receive the notification on every issue, instead of just the ones I’m watching on? Best, -- Nan Zhu On Thursday, February 6, 2014 at 10:56 PM, Matei Zaharia wrote: You can already get commits on comm...@spark.incubator.apache.org (mailto:comm...@spark.incubator.apache.org) actually. Regarding JIRA issues, I think I’ll just try putting them on dev for now, and we can move to a separate list later. If you’d like to filter them, the address is j...@spark-project.atlassian.net (mailto:j...@spark-project.atlassian.net). Matei On Feb 6, 2014, at 6:30 PM, Andrew Ash and...@andrewash.com (mailto:and...@andrewash.com) wrote: I'd prefer a spark-jira-activity list so I can filter appropriately. A separate request, but a spark-commits list that has all commits emailed to it has been helpful at work to keep a pulse on activity. On Thu, Feb 6, 2014 at 6:27 PM, Matei Zaharia matei.zaha...@gmail.com (mailto:matei.zaha...@gmail.com)wrote: Henry (or anyone else), do you have any preference on sending these directly to “dev versus creating another list for “issues”? I guess we can try “dev” for a while and let people decide if it gets too spammy. We’ll just have to advertise it in advance. Matei On Feb 6, 2014, at 9:55 AM, Henry Saputra henry.sapu...@gmail.com (mailto:henry.sapu...@gmail.com) wrote: HI Matei, yeah please subscribe it for now. Once we have ASF JIRA setup for Spark it will happen automatically. - Henry On Wed, Feb 5, 2014 at 2:56 PM, Matei Zaharia matei.zaha...@gmail.com (mailto:matei.zaha...@gmail.com) wrote: Hey Henry, this makes sense. I’d like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we’d be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA has been moved to the ASF, since infra can set up the forwarding automatically. But most major discussions (e.g. https://spark-project.atlassian.net/browse/SPARK-964, https://spark-project.atlassian.net/browse/SPARK-969) happen there. I think this is the model we want to have in the future — most other projects I’ve participated in also used JIRA for their discussion, and mirrored to either the “dev” list or an “issues” list. Matei On Feb 5, 2014, at 2:49 PM, Henry Saputra henry.sapu...@gmail.com (mailto:henry.sapu...@gmail.com) wrote: Hi Guys, Just friendly reminder, some of you guys may work closely or collaborate outside the dev@ list and sometimes it is easier. But, as part of Apache Software Foundation project, any decision or outcome that could or will be implemented in the Apache Spark need to happen in the dev@ list as we are open and collaborative as community. If offline discussions happen please forward the history or potential solution to the dev@ list before any action taken. Most of us work remote so email is the official channel of discussion about stuff related to development in Spark. Github pull request is not the appropriate vehicle for technical discussions. It is used primarily for review of proposed patch which means initial problem most of the times had been identified and discussed. Thanks for understanding. 
- Henry
Re: Discussion on strategy or roadmap should happen on dev@ list
We can try it on dev, but I personally find the JIRA notifications pretty spammy ... It will clutter the dev list, and make it harder to search for useful information here. On Thu, Feb 6, 2014 at 6:27 PM, Matei Zaharia matei.zaha...@gmail.comwrote: Henry (or anyone else), do you have any preference on sending these directly to dev versus creating another list for issues? I guess we can try dev for a while and let people decide if it gets too spammy. We'll just have to advertise it in advance. Matei On Feb 6, 2014, at 9:55 AM, Henry Saputra henry.sapu...@gmail.com wrote: HI Matei, yeah please subscribe it for now. Once we have ASF JIRA setup for Spark it will happen automatically. - Henry On Wed, Feb 5, 2014 at 2:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hey Henry, this makes sense. I'd like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we'd be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA has been moved to the ASF, since infra can set up the forwarding automatically. But most major discussions (e.g. https://spark-project.atlassian.net/browse/SPARK-964, https://spark-project.atlassian.net/browse/SPARK-969) happen there. I think this is the model we want to have in the future -- most other projects I've participated in also used JIRA for their discussion, and mirrored to either the dev list or an issues list. Matei On Feb 5, 2014, at 2:49 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Guys, Just friendly reminder, some of you guys may work closely or collaborate outside the dev@ list and sometimes it is easier. But, as part of Apache Software Foundation project, any decision or outcome that could or will be implemented in the Apache Spark need to happen in the dev@ list as we are open and collaborative as community. If offline discussions happen please forward the history or potential solution to the dev@ list before any action taken. Most of us work remote so email is the official channel of discussion about stuff related to development in Spark. Github pull request is not the appropriate vehicle for technical discussions. It is used primarily for review of proposed patch which means initial problem most of the times had been identified and discussed. Thanks for understanding. - Henry
Re: Discussion on strategy or roadmap should happen on dev@ list
Henry (or anyone else), do you have any preference on sending these directly to “dev versus creating another list for “issues”? I guess we can try “dev” for a while and let people decide if it gets too spammy. We’ll just have to advertise it in advance. Matei On Feb 6, 2014, at 9:55 AM, Henry Saputra henry.sapu...@gmail.com wrote: HI Matei, yeah please subscribe it for now. Once we have ASF JIRA setup for Spark it will happen automatically. - Henry On Wed, Feb 5, 2014 at 2:56 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hey Henry, this makes sense. I’d like to add that one other vehicle for discussion has been JIRA at https://spark-project.atlassian.net/browse/SPARK. Right now the dev list is not subscribed to JIRA, but we’d be happy to subscribe it anytime if that helps. We were hoping to do this only when JIRA has been moved to the ASF, since infra can set up the forwarding automatically. But most major discussions (e.g. https://spark-project.atlassian.net/browse/SPARK-964, https://spark-project.atlassian.net/browse/SPARK-969) happen there. I think this is the model we want to have in the future — most other projects I’ve participated in also used JIRA for their discussion, and mirrored to either the “dev” list or an “issues” list. Matei On Feb 5, 2014, at 2:49 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Guys, Just friendly reminder, some of you guys may work closely or collaborate outside the dev@ list and sometimes it is easier. But, as part of Apache Software Foundation project, any decision or outcome that could or will be implemented in the Apache Spark need to happen in the dev@ list as we are open and collaborative as community. If offline discussions happen please forward the history or potential solution to the dev@ list before any action taken. Most of us work remote so email is the official channel of discussion about stuff related to development in Spark. Github pull request is not the appropriate vehicle for technical discussions. It is used primarily for review of proposed patch which means initial problem most of the times had been identified and discussed. Thanks for understanding. - Henry
Re: Is there any way to make a quick test on some pre-commit code?
This is neat, thanks Reynold ! Regards, Mridul On Fri, Feb 7, 2014 at 6:20 AM, Reynold Xin r...@databricks.com wrote: You can do sbt/sbt assemble-deps and then just run sbt/sbt package each time. You can even do sbt/sbt ~package for automatic incremental compilation. On Thu, Feb 6, 2014 at 4:46 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all Is it always necessary to run sbt assembly when you want to test some code, Sometimes you just repeatedly change one or two lines for some failed test case, it is really time-consuming to sbt assembly every time any faster way? Best, -- Nan Zhu
Re: [0.9.0] Possible deadlock in shutdown hook?
There is probably just one threadpool that has task threads -- is it possible to enumerate and interrupt just those? We may need to keep a reference to that threadpool through to the shutdown thread to make that happen. On Thu, Feb 6, 2014 at 10:36 PM, Mridul Muralidharan mri...@gmail.com wrote: Ideally, interrupting the thread writing to disk should be sufficient - though since we are in middle of shutdown when this is happening, it is best case effort anyway. Identifying which threads to interrupt will be interesting since most of them are driven by threadpool's and we cant list all threads and interrupt all of them ! Regards, Mridul On Fri, Feb 7, 2014 at 5:57 AM, Andrew Ash and...@andrewash.com wrote: I think the solution where we stop the writing threads and then let the deleting threads completely clean up is the best option since the final state doesn't have half-deleted temp dirs scattered across the cluster. How feasible do you think it'd be to interrupt the other threads? On Thu, Feb 6, 2014 at 10:54 AM, Mridul Muralidharan mri...@gmail.com wrote: Looks like a pathological corner case here - where the the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). Ideally, the delete thread would list files, remove them and then fail when it tries to remove the non empty directory (since other thread might be creating more in parallel). Regards, Mridul On Thu, Feb 6, 2014 at 4:19 PM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). Build spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell: import org.apache.spark.storage.StorageLevel._ val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count After about a minute, this line appears in the shell logging output: 14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15min then kill -9'd the JVM and restarted the process. I wonder if what's happening here is that the threads that are spewing data to disk (as that parallelize and persist would do) can write to disk faster than the cleanup threads can delete from disk. What do you think of that theory? Andrew On Thu, Feb 6, 2014 at 2:30 AM, Mridul Muralidharan mri...@gmail.com wrote: shutdown hooks should not take 15 mins are you mentioned ! On the other hand, how busy was your disk when this was happening ? (either due to spark or something else ?) It might just be that there was a lot of stuff to remove ? Regards, Mridul On Thu, Feb 6, 2014 at 3:50 PM, Andrew Ash and...@andrewash.com wrote: Hi Spark devs, Occasionally when hitting Ctrl-C in the scala spark shell on 0.9.0 one of my workers goes dead in the spark master UI. I'm using the standalone cluster and didn't ever see this while using 0.8.0 so I think it may be a regression. When I prod on the hung CoarseGrainedExecutorBackend JVM with jstack and jmap -heap, it doesn't respond unless I add the -F force flag. The heap isn't full, but there are some interesting bits in the jstack. Poking around a little, I think there may be some kind of deadlock in the shutdown hooks. 
Below are the threads I think are most interesting:

Thread 14308: (state = BLOCKED)
 - java.lang.Shutdown.exit(int) @bci=96, line=212 (Interpreted frame)
 - java.lang.Runtime.exit(int) @bci=14, line=109 (Interpreted frame)
 - java.lang.System.exit(int) @bci=4, line=962 (Interpreted frame)
 - org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(java.lang.Object, scala.Function1) @bci=352, line=81 (Interpreted frame)
 - akka.actor.ActorCell.receiveMessage(java.lang.Object) @bci=25, line=498 (Interpreted frame)
 - akka.actor.ActorCell.invoke(akka.dispatch.Envelope) @bci=39, line=456 (Interpreted frame)
 - akka.dispatch.Mailbox.processMailbox(int, long) @bci=24, line=237 (Interpreted frame)
 - akka.dispatch.Mailbox.run() @bci=20, line=219 (Interpreted frame)
 - akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec() @bci=4, line=386 (Interpreted frame)
 - scala.concurrent.forkjoin.ForkJoinTask.doExec() @bci=10, line=260 (Compiled frame)
 - scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(scala.concurrent.forkjoin.ForkJoinTask) @bci=10, line=1339
Re: Is there any way to make a quick test on some pre-commit code?
+1 On Thu, Feb 6, 2014 at 10:46 PM, Patrick Wendell pwend...@gmail.com wrote: We should document this on the wiki! On Thu, Feb 6, 2014 at 10:37 PM, Mridul Muralidharan mri...@gmail.com wrote: This is neat, thanks Reynold ! Regards, Mridul On Fri, Feb 7, 2014 at 6:20 AM, Reynold Xin r...@databricks.com wrote: You can do sbt/sbt assemble-deps and then just run sbt/sbt package each time. You can even do sbt/sbt ~package for automatic incremental compilation. On Thu, Feb 6, 2014 at 4:46 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all Is it always necessary to run sbt assembly when you want to test some code, Sometimes you just repeatedly change one or two lines for some failed test case, it is really time-consuming to sbt assembly every time any faster way? Best, -- Nan Zhu
Re: Is there any way to make a quick test on some pre-commit code?
+1 -- Nan Zhu On Friday, February 7, 2014 at 1:59 AM, Henry Saputra wrote: +1 On Thu, Feb 6, 2014 at 10:46 PM, Patrick Wendell pwend...@gmail.com (mailto:pwend...@gmail.com) wrote: We should document this on the wiki! On Thu, Feb 6, 2014 at 10:37 PM, Mridul Muralidharan mri...@gmail.com (mailto:mri...@gmail.com) wrote: This is neat, thanks Reynold ! Regards, Mridul On Fri, Feb 7, 2014 at 6:20 AM, Reynold Xin r...@databricks.com (mailto:r...@databricks.com) wrote: You can do sbt/sbt assemble-deps and then just run sbt/sbt package each time. You can even do sbt/sbt ~package for automatic incremental compilation. On Thu, Feb 6, 2014 at 4:46 PM, Nan Zhu zhunanmcg...@gmail.com (mailto:zhunanmcg...@gmail.com) wrote: Hi, all Is it always necessary to run sbt assembly when you want to test some code, Sometimes you just repeatedly change one or two lines for some failed test case, it is really time-consuming to sbt assembly every time any faster way? Best, -- Nan Zhu
Re: [0.9.0] Possible deadlock in shutdown hook?
It's highly likely that the executor's threadpool that runs the tasks is the only set of threads that writes to disk. The tasks are designed to be interrupted when the corresponding job is cancelled. So a reasonably simple way could be to actually cancel the currently active jobs, which would send the signal to the worker to stop the tasks. Currently, the DAGScheduler (https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L610) does not seem to actually cancel the jobs, only mark them as failed. So it may be a simple addition. There may be some complications with the external spilling of shuffle data to disk not stopping immediately when the task is marked for killing. Gotta try it out. TD On Thu, Feb 6, 2014 at 10:39 PM, Andrew Ash and...@andrewash.com wrote: There is probably just one threadpool that has task threads -- is it possible to enumerate and interrupt just those? We may need to keep a reference to that threadpool through to the shutdown thread to make that happen. On Thu, Feb 6, 2014 at 10:36 PM, Mridul Muralidharan mri...@gmail.com wrote: Ideally, interrupting the thread writing to disk should be sufficient - though since we are in the middle of shutdown when this is happening, it is best-effort anyway. Identifying which threads to interrupt will be interesting since most of them are driven by threadpools and we can't list all threads and interrupt all of them! Regards, Mridul On Fri, Feb 7, 2014 at 5:57 AM, Andrew Ash and...@andrewash.com wrote: I think the solution where we stop the writing threads and then let the deleting threads completely clean up is the best option, since the final state doesn't have half-deleted temp dirs scattered across the cluster. How feasible do you think it'd be to interrupt the other threads? On Thu, Feb 6, 2014 at 10:54 AM, Mridul Muralidharan mri...@gmail.com wrote: Looks like a pathological corner case here - where the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). Ideally, the delete thread would list files, remove them and then fail when it tries to remove the non-empty directory (since another thread might be creating more in parallel). Regards, Mridul On Thu, Feb 6, 2014 at 4:19 PM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). Build Spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell: import org.apache.spark.storage.StorageLevel._ val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count After about a minute, this line appears in the shell logging output: 14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15min then kill -9'd the JVM and restarted the process. I wonder if what's happening here is that the threads that are spewing data to disk (as that parallelize and persist would do) can write to disk faster than the cleanup threads can delete from disk. What do you think of that theory? Andrew On Thu, Feb 6, 2014 at 2:30 AM, Mridul Muralidharan mri...@gmail.com wrote: Shutdown hooks should not take 15 mins as you mentioned!
On the other hand, how busy was your disk when this was happening ? (either due to spark or something else ?) It might just be that there was a lot of stuff to remove ? Regards, Mridul On Thu, Feb 6, 2014 at 3:50 PM, Andrew Ash and...@andrewash.com wrote: Hi Spark devs, Occasionally when hitting Ctrl-C in the scala spark shell on 0.9.0 one of my workers goes dead in the spark master UI. I'm using the standalone cluster and didn't ever see this while using 0.8.0 so I think it may be a regression. When I prod on the hung CoarseGrainedExecutorBackend JVM with jstack and jmap -heap, it doesn't respond unless I add the -F force flag. The heap isn't full, but there are some interesting bits in the jstack. Poking around a little, I think there may be some kind of deadlock in the shutdown hooks. Below are the threads I think are most interesting: Thread 14308: (state = BLOCKED) - java.lang.Shutdown.exit(int) @bci=96, line=212 (Interpreted frame) - java.lang.Runtime.exit(int) @bci=14, line=109 (Interpreted frame)
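For concreteness, a very rough sketch of the addition TD is describing, using stand-in types and names rather than the real DAGScheduler internals (the actual scheduler tracks active jobs in a "running" set and has a job-cancellation handler, per the link above; this is illustrative only, not the actual code):

  // Stand-in for the scheduler's internal job bookkeeping.
  case class ActiveJob(jobId: Int)

  // On scheduler stop, cancel every active job so executors get the signal
  // to kill their task threads before the temp-dir cleanup starts.
  def cancelActiveJobsOnStop(running: Set[ActiveJob],
                             handleJobCancellation: Int => Unit): Unit = {
    running.map(_.jobId).foreach(handleJobCancellation)
    // ...the existing StopDAGScheduler handling (marking jobs as failed, etc.)
    // would follow here.
  }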
Re: [0.9.0] Possible deadlock in shutdown hook?
I don’t think we necessarily want to do this through the DAGScheduler because the worker might also shut down due to some unusual termination condition, like the driver node crashing. Can’t we do it at the top of the shutdown hook instead? If all the threads are in the same thread pool it might be possible to interrupt or stop the whole pool. Matei On Feb 6, 2014, at 11:30 PM, Andrew Ash and...@andrewash.com wrote: That's genius. Of course when a worker is told to shutdown it should interrupt its worker threads -- I think that would address this issue. Are you thinking to put running.map(_.jobId).foreach { handleJobCancellation } at the top of the StopDAGScheduler block? On Thu, Feb 6, 2014 at 11:05 PM, Tathagata Das tathagata.das1...@gmail.comwrote: Its highly likely that the executor with the threadpool that runs the tasks are the only set of threads that writes to disk. The tasks are designed to be interrupted when the corresponding job is cancelled. So a reasonably simple way could be to actually cancel the currently active jobs, which would send the signal to the worker to stop the tasks. Currently, the DAGScheduler https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L610 does not seem to actually cancel the jobs, only mark them as failed. So it may be a simple addition. There may be some complications with the external spilling of shuffle data to disk not stopping immediately when the task is marked for killing. Gotta try it out. TD On Thu, Feb 6, 2014 at 10:39 PM, Andrew Ash and...@andrewash.com wrote: There is probably just one threadpool that has task threads -- is it possible to enumerate and interrupt just those? We may need to keep string a reference to that threadpool through to the shutdown thread to make that happen. On Thu, Feb 6, 2014 at 10:36 PM, Mridul Muralidharan mri...@gmail.com wrote: Ideally, interrupting the thread writing to disk should be sufficient - though since we are in middle of shutdown when this is happening, it is best case effort anyway. Identifying which threads to interrupt will be interesting since most of them are driven by threadpool's and we cant list all threads and interrupt all of them ! Regards, Mridul On Fri, Feb 7, 2014 at 5:57 AM, Andrew Ash and...@andrewash.com wrote: I think the solution where we stop the writing threads and then let the deleting threads completely clean up is the best option since the final state doesn't have half-deleted temp dirs scattered across the cluster. How feasible do you think it'd be to interrupt the other threads? On Thu, Feb 6, 2014 at 10:54 AM, Mridul Muralidharan mri...@gmail.com wrote: Looks like a pathological corner case here - where the the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). Ideally, the delete thread would list files, remove them and then fail when it tries to remove the non empty directory (since other thread might be creating more in parallel). Regards, Mridul On Thu, Feb 6, 2014 at 4:19 PM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). 
Build spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell: import org.apache.spark.storage.StorageLevel._ val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count After about a minute, this line appears in the shell logging output: 14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15min then kill -9'd the JVM and restarted the process. I wonder if what's happening here is that the threads that are spewing data to disk (as that parallelize and persist would do) can write to disk faster than the cleanup threads can delete from disk. What do you think of that theory? Andrew On Thu, Feb 6, 2014 at 2:30 AM, Mridul Muralidharan mri...@gmail.com wrote: shutdown hooks should not take 15 mins are you mentioned ! On the other hand, how busy was your disk when this was happening ? (either due to spark or something else ?) It might just be that there was a lot of stuff to remove ? Regards, Mridul On Thu, Feb 6, 2014 at 3:50 PM, Andrew Ash and...@andrewash.com wrote: Hi Spark devs, Occasionally when hitting Ctrl-C in the scala spark shell on 0.9.0 one of my workers goes dead in the spark master UI. I'm using the standalone cluster and didn't ever see this while using 0.8.0 so I think it may be a
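If the shutdown hook really can reach the pool, Matei's whole-pool approach comes down to one call on an ExecutorService. A minimal sketch, where taskPool is a hypothetical reference to the executor's task thread pool rather than anything that exists in the code today:

  import java.util.concurrent.{ExecutorService, TimeUnit}

  def stopTaskThreadsOnShutdown(taskPool: ExecutorService): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread("stop-task-threads") {
      override def run(): Unit = {
        taskPool.shutdownNow()                        // interrupts all actively running tasks
        taskPool.awaitTermination(10, TimeUnit.SECONDS)
        // the existing temp-dir cleanup can then delete files without new
        // writes racing against it
      }
    })
  }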
Re: rough date for spark summit 2014
We just announced this and tickets are available now. June 30 thru July 2 in downtown SF. Details at http://spark-summit.org On Jan 30, 2014 11:13 AM, Ameet Kini ameetk...@gmail.com wrote: I know this is still a few months off and folks are rushing towards 0.9 release, but do the devs have a rough date for Spark Summit 2014? Looks like it'll be in summer, but is it Jun / July / Aug / Sep ? Even late-summer would help. Summer being a popular vacation time, a few months advance notice would be greatly appreciated (read: I missed last summit due to a pre-scheduled vacation and would hate to miss this one :) Thanks, Ameet
0.9.0 forces log4j usage
We have a few applications that embed Spark, and in 0.8.0 and 0.8.1, we were able to use slf4j, but 0.9.0 broke that and unintentionally forces direct use of log4j as the logging backend. The issue is here in the org.apache.spark.Logging trait: https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107 log4j-over-slf4j *always* returns an empty enumeration for appenders to the ROOT logger: https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81 And this causes an infinite loop and an eventual stack overflow. I'm happy to submit a Jira and a patch, but it would be a significant enough reversal of recent changes that it's probably worth discussing before I sink a half hour into it. My suggestion would be that initialization (or not) should be left to the user, with reasonable default behavior supplied by the Spark command-line tooling and not forced on applications that incorporate Spark. Thoughts/opinions? -- Paul — p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
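To make that suggestion concrete, one possible shape for a guard in the Logging trait is sketched below. This is purely an illustration of the idea, not a proposed patch: only auto-configure log4j when log4j is genuinely the slf4j backend, and otherwise leave logging setup to the embedding application. The factory class name checked here is the one used by the standard slf4j-log4j12 binding; the actual default behavior would need discussion.

  // Sketch only: skip Spark's default log4j setup unless log4j really is the
  // active slf4j backend (so log4j-over-slf4j users are left alone).
  def maybeInitializeLog4j(): Unit = {
    val binder = org.slf4j.LoggerFactory.getILoggerFactory.getClass.getName
    val log4jIsBackend = binder == "org.slf4j.impl.Log4jLoggerFactory"
    val rootHasNoAppenders =
      !org.apache.log4j.LogManager.getRootLogger.getAllAppenders.hasMoreElements
    if (log4jIsBackend && rootHasNoAppenders) {
      // load Spark's default log4j.properties here
    }
  }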
Re: [0.9.0] Possible deadlock in shutdown hook?
That definitely sound more reliable. Worth trying out if there is a reliable way of reproducing the deadlock-like scenario. TD On Thu, Feb 6, 2014 at 11:38 PM, Matei Zaharia matei.zaha...@gmail.comwrote: I don't think we necessarily want to do this through the DAGScheduler because the worker might also shut down due to some unusual termination condition, like the driver node crashing. Can't we do it at the top of the shutdown hook instead? If all the threads are in the same thread pool it might be possible to interrupt or stop the whole pool. Matei On Feb 6, 2014, at 11:30 PM, Andrew Ash and...@andrewash.com wrote: That's genius. Of course when a worker is told to shutdown it should interrupt its worker threads -- I think that would address this issue. Are you thinking to put running.map(_.jobId).foreach { handleJobCancellation } at the top of the StopDAGScheduler block? On Thu, Feb 6, 2014 at 11:05 PM, Tathagata Das tathagata.das1...@gmail.comwrote: Its highly likely that the executor with the threadpool that runs the tasks are the only set of threads that writes to disk. The tasks are designed to be interrupted when the corresponding job is cancelled. So a reasonably simple way could be to actually cancel the currently active jobs, which would send the signal to the worker to stop the tasks. Currently, the DAGScheduler https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L610 does not seem to actually cancel the jobs, only mark them as failed. So it may be a simple addition. There may be some complications with the external spilling of shuffle data to disk not stopping immediately when the task is marked for killing. Gotta try it out. TD On Thu, Feb 6, 2014 at 10:39 PM, Andrew Ash and...@andrewash.com wrote: There is probably just one threadpool that has task threads -- is it possible to enumerate and interrupt just those? We may need to keep string a reference to that threadpool through to the shutdown thread to make that happen. On Thu, Feb 6, 2014 at 10:36 PM, Mridul Muralidharan mri...@gmail.com wrote: Ideally, interrupting the thread writing to disk should be sufficient - though since we are in middle of shutdown when this is happening, it is best case effort anyway. Identifying which threads to interrupt will be interesting since most of them are driven by threadpool's and we cant list all threads and interrupt all of them ! Regards, Mridul On Fri, Feb 7, 2014 at 5:57 AM, Andrew Ash and...@andrewash.com wrote: I think the solution where we stop the writing threads and then let the deleting threads completely clean up is the best option since the final state doesn't have half-deleted temp dirs scattered across the cluster. How feasible do you think it'd be to interrupt the other threads? On Thu, Feb 6, 2014 at 10:54 AM, Mridul Muralidharan mri...@gmail.com wrote: Looks like a pathological corner case here - where the the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). Ideally, the delete thread would list files, remove them and then fail when it tries to remove the non empty directory (since other thread might be creating more in parallel). Regards, Mridul On Thu, Feb 6, 2014 at 4:19 PM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). 
Build spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell: import org.apache.spark.storage.StorageLevel._ val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count After about a minute, this line appears in the shell logging output: 14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15min then kill -9'd the JVM and restarted the process. I wonder if what's happening here is that the threads that are spewing data to disk (as that parallelize and persist would do) can write to disk faster than the cleanup threads can delete from disk. What do you think of that theory? Andrew On Thu, Feb 6, 2014 at 2:30 AM, Mridul Muralidharan mri...@gmail.com wrote: shutdown hooks should not take 15 mins are you mentioned ! On the other hand, how busy was your disk when this was happening ? (either due to spark or something else ?) It might just be that there was a lot of stuff to remove ? Regards, Mridul On Thu,
Re: [0.9.0] Possible deadlock in shutdown hook?
I think we can enumerate all current threads with the ThreadMXBean, filter to those threads with the name of executor pool in them, and interrupt them. http://docs.oracle.com/javase/6/docs/api/java/lang/management/ManagementFactory.html#getThreadMXBean%28%29 The executor threads are currently named according to the pattern Executor task launch worker-X On Thu, Feb 6, 2014 at 11:45 PM, Tathagata Das tathagata.das1...@gmail.comwrote: That definitely sound more reliable. Worth trying out if there is a reliable way of reproducing the deadlock-like scenario. TD On Thu, Feb 6, 2014 at 11:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: I don't think we necessarily want to do this through the DAGScheduler because the worker might also shut down due to some unusual termination condition, like the driver node crashing. Can't we do it at the top of the shutdown hook instead? If all the threads are in the same thread pool it might be possible to interrupt or stop the whole pool. Matei On Feb 6, 2014, at 11:30 PM, Andrew Ash and...@andrewash.com wrote: That's genius. Of course when a worker is told to shutdown it should interrupt its worker threads -- I think that would address this issue. Are you thinking to put running.map(_.jobId).foreach { handleJobCancellation } at the top of the StopDAGScheduler block? On Thu, Feb 6, 2014 at 11:05 PM, Tathagata Das tathagata.das1...@gmail.comwrote: Its highly likely that the executor with the threadpool that runs the tasks are the only set of threads that writes to disk. The tasks are designed to be interrupted when the corresponding job is cancelled. So a reasonably simple way could be to actually cancel the currently active jobs, which would send the signal to the worker to stop the tasks. Currently, the DAGScheduler https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L610 does not seem to actually cancel the jobs, only mark them as failed. So it may be a simple addition. There may be some complications with the external spilling of shuffle data to disk not stopping immediately when the task is marked for killing. Gotta try it out. TD On Thu, Feb 6, 2014 at 10:39 PM, Andrew Ash and...@andrewash.com wrote: There is probably just one threadpool that has task threads -- is it possible to enumerate and interrupt just those? We may need to keep string a reference to that threadpool through to the shutdown thread to make that happen. On Thu, Feb 6, 2014 at 10:36 PM, Mridul Muralidharan mri...@gmail.com wrote: Ideally, interrupting the thread writing to disk should be sufficient - though since we are in middle of shutdown when this is happening, it is best case effort anyway. Identifying which threads to interrupt will be interesting since most of them are driven by threadpool's and we cant list all threads and interrupt all of them ! Regards, Mridul On Fri, Feb 7, 2014 at 5:57 AM, Andrew Ash and...@andrewash.com wrote: I think the solution where we stop the writing threads and then let the deleting threads completely clean up is the best option since the final state doesn't have half-deleted temp dirs scattered across the cluster. How feasible do you think it'd be to interrupt the other threads? On Thu, Feb 6, 2014 at 10:54 AM, Mridul Muralidharan mri...@gmail.com wrote: Looks like a pathological corner case here - where the the delete thread is not getting run while the OS is busy prioritizing the thread writing data (probably with heavy gc too). 
Ideally, the delete thread would list files, remove them and then fail when it tries to remove the non empty directory (since other thread might be creating more in parallel). Regards, Mridul On Thu, Feb 6, 2014 at 4:19 PM, Andrew Ash and...@andrewash.com wrote: Got a repro locally on my MBP (the other was on a CentOS machine). Build spark, run a master and a worker with the sbin/start-all.sh script, then run this in a shell: import org.apache.spark.storage.StorageLevel._ val s = sc.parallelize(1 to 10).persist(MEMORY_AND_DISK_SER); s.count After about a minute, this line appears in the shell logging output: 14/02/06 02:44:44 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, aash-mbp.dyn.yojoe.local, 57895, 0) with no recent heart beats: 57510ms exceeds 45000ms Ctrl-C the shell. In jps there is now a worker, a master, and a CoarseGrainedExecutorBackend. Run jstack on the CGEBackend JVM, and I got the attached stacktraces. I waited around for 15min then kill -9'd the JVM and restarted the process. I wonder if what's happening here is
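One wrinkle with going through ThreadMXBean: it only hands back ThreadInfo objects, which cannot be interrupted directly. A minimal sketch of the same idea using Thread.getAllStackTraces instead, with the thread-name prefix Andrew mentions above:

  import scala.collection.JavaConverters._

  // Find the executor's task threads by name and interrupt them, so they stop
  // writing to disk before the temp-dir cleanup runs.
  def interruptTaskThreads(): Unit = {
    Thread.getAllStackTraces.keySet.asScala
      .filter(_.getName.startsWith("Executor task launch worker"))
      .foreach(_.interrupt())
  }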