[jira] [Created] (SPARK-9516) Improve Thread Dump page
Nan Zhu created SPARK-9516:

  Summary: Improve Thread Dump page
  Key: SPARK-9516
  URL: https://issues.apache.org/jira/browse/SPARK-9516
  Project: Spark
  Issue Type: New Feature
  Components: Web UI
  Reporter: Nan Zhu

Originally proposed by [~irashid] in https://github.com/apache/spark/pull/7808#issuecomment-126788335: we can enhance the current thread dump page with at least the following two new features: 1) sort threads by thread status, and 2) a filter to grep the threads.
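As a rough illustration of the two proposed features, here is a minimal Scala sketch (not the actual Web UI code; it only uses the standard JMX thread API, and the "Executor" filter string is a hypothetical example):

{code}
import java.lang.management.ManagementFactory

// Dump all live threads via the JVM's ThreadMXBean.
val threads = ManagementFactory.getThreadMXBean.dumpAllThreads(false, false)

// Feature 1: sort threads by thread status (BLOCKED, RUNNABLE, WAITING, ...).
val sorted = threads.sortBy(_.getThreadState.name)

// Feature 2: a filter to grep the threads by a substring of their names.
def grep(pattern: String) = sorted.filter(_.getThreadName.contains(pattern))

grep("Executor").foreach(t => println(s"${t.getThreadState}\t${t.getThreadName}"))
{code}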
[jira] [Commented] (SPARK-9516) Improve Thread Dump page
[ https://issues.apache.org/jira/browse/SPARK-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650090#comment-14650090 ]

Nan Zhu commented on SPARK-9516:

I can work on it after finishing SPARK-8416.

Improve Thread Dump page
  Key: SPARK-9516
  URL: https://issues.apache.org/jira/browse/SPARK-9516
  Project: Spark
  Issue Type: New Feature
  Components: Web UI
  Reporter: Nan Zhu

Originally proposed by [~irashid] in https://github.com/apache/spark/pull/7808#issuecomment-126788335: we can enhance the current thread dump page with at least the following two new features: 1) sort threads by thread status, and 2) a filter to grep the threads.
[jira] [Commented] (SPARK-9123) Spark HistoryServer loads logs too slowly and can't load the latest logs
[ https://issues.apache.org/jira/browse/SPARK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630699#comment-14630699 ]

Nan Zhu commented on SPARK-9123:

do you mind closing the duplicate JIRAs? SPARK-9124 SPARK-9125?

Spark HistoryServer loads logs too slowly and can't load the latest logs
  Key: SPARK-9123
  URL: https://issues.apache.org/jira/browse/SPARK-9123
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Affects Versions: 1.4.0
  Reporter: Xie Tingwen

When I restart the HistoryServer of Spark 1.4, it keeps loading history logs, and slowly: I have months of logs, and it spent a day loading only a small part of them. In addition, it can't load the latest logs.
[jira] [Comment Edited] (SPARK-9123) Spark HistoryServer loads logs too slowly and can't load the latest logs
[ https://issues.apache.org/jira/browse/SPARK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630699#comment-14630699 ]

Nan Zhu edited comment on SPARK-9123 at 7/17/15 3:05 AM:

do you mind closing the duplicate JIRAs, i.e. SPARK-9124 SPARK-9125?

was (Author: codingcat): do you mind closing the duplicate JIRAs? SPARK-9124 SPARK-9125?

Spark HistoryServer loads logs too slowly and can't load the latest logs
  Key: SPARK-9123
  URL: https://issues.apache.org/jira/browse/SPARK-9123
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Affects Versions: 1.4.0
  Reporter: Xie Tingwen

When I restart the HistoryServer of Spark 1.4, it keeps loading history logs, and slowly: I have months of logs, and it spent a day loading only a small part of them. In addition, it can't load the latest logs.
[jira] [Updated] (SPARK-9123) Spark HistoryServer loads logs too slowly and can't load the latest logs
[ https://issues.apache.org/jira/browse/SPARK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-9123:
  Target Version/s: (was: 1.5.0)

Spark HistoryServer loads logs too slowly and can't load the latest logs
  Key: SPARK-9123
  URL: https://issues.apache.org/jira/browse/SPARK-9123
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Affects Versions: 1.4.0
  Reporter: Xie Tingwen

When I restart the HistoryServer of Spark 1.4, it keeps loading history logs, and slowly: I have months of logs, and it spent a day loading only a small part of them. In addition, it can't load the latest logs.
[jira] [Commented] (SPARK-9123) Spark HistoryServer loads logs too slowly and can't load the latest logs
[ https://issues.apache.org/jira/browse/SPARK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630706#comment-14630706 ]

Nan Zhu commented on SPARK-9123:

just removed the target version label; see here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark. Nowadays, the target version is only set by committers, to indicate that a PR has been accepted as a possible fix for that version.

Spark HistoryServer loads logs too slowly and can't load the latest logs
  Key: SPARK-9123
  URL: https://issues.apache.org/jira/browse/SPARK-9123
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Affects Versions: 1.4.0
  Reporter: Xie Tingwen

When I restart the HistoryServer of Spark 1.4, it keeps loading history logs, and slowly: I have months of logs, and it spent a day loading only a small part of them. In addition, it can't load the latest logs.
[jira] [Closed] (SPARK-1715) Ensure actor is self-contained in DAGScheduler
[ https://issues.apache.org/jira/browse/SPARK-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu closed SPARK-1715.
  Resolution: Won't Fix

The Akka actor has been removed from the DAGScheduler.

Ensure actor is self-contained in DAGScheduler
  Key: SPARK-1715
  URL: https://issues.apache.org/jira/browse/SPARK-1715
  Project: Spark
  Issue Type: Improvement
  Components: Scheduler
  Reporter: Nan Zhu
  Assignee: Nan Zhu

Though the current supervisor-child structure works fine for fault tolerance, it violates the basic rule that an actor should be self-contained. We should forward messages from the supervisor to the child actor, so that we can eliminate the hard-coded timeout threshold for starting the DAGScheduler and provide a more convenient interface for future development, such as a parallel DAGScheduler or other changes to the DAGScheduler.
[jira] [Commented] (SPARK-6646) Spark 2.0: Rearchitecting Spark for Mobile Platforms
[ https://issues.apache.org/jira/browse/SPARK-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390367#comment-14390367 ]

Nan Zhu commented on SPARK-6646:

super cool, Spark enables Bigger than Bigger Data in mobile phones

Spark 2.0: Rearchitecting Spark for Mobile Platforms
  Key: SPARK-6646
  URL: https://issues.apache.org/jira/browse/SPARK-6646
  Project: Spark
  Issue Type: Improvement
  Components: Project Infra
  Reporter: Reynold Xin
  Assignee: Reynold Xin
  Priority: Blocker
  Attachments: Spark on Mobile - Design Doc - v1.pdf

Mobile computing is quickly rising to dominance, and by the end of 2017, it is estimated that 90% of CPU cycles will be devoted to mobile hardware. Spark's project goal can be accomplished only when Spark runs efficiently for the growing population of mobile users. Designed and optimized for modern data centers and Big Data applications, Spark is unfortunately not a good fit for mobile computing today. In the past few months, we have been prototyping the feasibility of a mobile-first Spark architecture, and today we would like to share with you our findings. This ticket outlines the technical design of Spark's mobile support, and shares results from several early prototypes.

Mobile-friendly version of the design doc: https://databricks.com/blog/2015/04/01/spark-2-rearchitecting-spark-for-mobile.html
[jira] [Commented] (SPARK-6592) API of Row trait should be presented in Scala doc
[ https://issues.apache.org/jira/browse/SPARK-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385724#comment-14385724 ]

Nan Zhu commented on SPARK-6592:

? I don't think that makes any difference, as the path of Row.scala still contains spark/sql/catalyst. I also tried to rerun build/sbt doc; same thing... maybe we need to hack SparkBuild.scala to exclude Row.scala?

API of Row trait should be presented in Scala doc
  Key: SPARK-6592
  URL: https://issues.apache.org/jira/browse/SPARK-6592
  Project: Spark
  Issue Type: Bug
  Components: Documentation, SQL
  Affects Versions: 1.3.0
  Reporter: Nan Zhu

Currently, the API of the Row class is not presented in the Scaladoc, though we have many chances to use it. The reason is that we ignore all files under catalyst in SparkBuild.scala when generating the Scaladoc (https://github.com/apache/spark/blob/f75f633b21faaf911f04aeff847f25749b1ecd89/project/SparkBuild.scala#L369). What's the best approach to fix this? [~rxin]
[jira] [Commented] (SPARK-6592) API of Row trait should be presented in Scala doc
[ https://issues.apache.org/jira/browse/SPARK-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385964#comment-14385964 ]

Nan Zhu commented on SPARK-6592:

it contains it; the reason is that the input of that line is file.getCanonicalPath, which outputs the absolute path, e.g.

{code}
scala> val f = new java.io.File("Row.class")
f: java.io.File = Row.class

scala> f.getCanonicalPath
res0: String = /Users/nanzhu/code/spark/sql/catalyst/target/scala-2.10/classes/org/apache/spark/sql/Row.class
{code}

API of Row trait should be presented in Scala doc
  Key: SPARK-6592
  URL: https://issues.apache.org/jira/browse/SPARK-6592
  Project: Spark
  Issue Type: Bug
  Components: Documentation, SQL
  Affects Versions: 1.3.0
  Reporter: Nan Zhu
  Priority: Critical

Currently, the API of the Row class is not presented in the Scaladoc, though we have many chances to use it. The reason is that we ignore all files under catalyst in SparkBuild.scala when generating the Scaladoc (https://github.com/apache/spark/blob/f75f633b21faaf911f04aeff847f25749b1ecd89/project/SparkBuild.scala#L369). What's the best approach to fix this? [~rxin]
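A hedged sketch of the exemption idea being discussed (names and paths hypothetical; this is not the actual SparkBuild.scala code): a predicate over canonical paths that drops catalyst sources from the Scaladoc input while keeping Row.scala visible.

{code}
import java.io.File

// Hypothetical filter: exclude catalyst sources from doc generation,
// but keep Row.scala so its API shows up in the Scaladoc.
def keepForScaladoc(f: File): Boolean = {
  val path = f.getCanonicalPath
  !path.contains("/sql/catalyst/") || path.endsWith("/Row.scala")
}

// e.g. sources.filter(keepForScaladoc) inside the unidoc source filter
{code}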
[jira] [Created] (SPARK-6596) fix the instruction on building scaladoc
Nan Zhu created SPARK-6596:

  Summary: fix the instruction on building scaladoc
  Key: SPARK-6596
  URL: https://issues.apache.org/jira/browse/SPARK-6596
  Project: Spark
  Issue Type: Bug
  Components: Documentation
  Affects Versions: 1.4.0
  Reporter: Nan Zhu

In README.md under the docs/ directory, it says that "You can build just the Spark scaladoc by running build/sbt doc from the SPARK_PROJECT_ROOT directory". I guess the right command is build/sbt unidoc.
[jira] [Commented] (SPARK-6592) API of Row trait should be presented in Scala doc
[ https://issues.apache.org/jira/browse/SPARK-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385616#comment-14385616 ]

Nan Zhu commented on SPARK-6592:

also cc: [~lian cheng] [~marmbrus]

API of Row trait should be presented in Scala doc
  Key: SPARK-6592
  URL: https://issues.apache.org/jira/browse/SPARK-6592
  Project: Spark
  Issue Type: Bug
  Components: Documentation, SQL
  Affects Versions: 1.3.0
  Reporter: Nan Zhu

Currently, the API of the Row class is not presented in the Scaladoc, though we have many chances to use it. The reason is that we ignore all files under catalyst in SparkBuild.scala when generating the Scaladoc (https://github.com/apache/spark/blob/f75f633b21faaf911f04aeff847f25749b1ecd89/project/SparkBuild.scala#L369). What's the best approach to fix this? [~rxin]
[jira] [Created] (SPARK-6592) API of Row trait should be presented in Scala doc
Nan Zhu created SPARK-6592:

  Summary: API of Row trait should be presented in Scala doc
  Key: SPARK-6592
  URL: https://issues.apache.org/jira/browse/SPARK-6592
  Project: Spark
  Issue Type: Bug
  Components: Documentation, SQL
  Affects Versions: 1.3.0
  Reporter: Nan Zhu

Currently, the API of the Row class is not presented in the Scaladoc, though we have many chances to use it. The reason is that we ignore all files under catalyst in SparkBuild.scala when generating the Scaladoc (https://github.com/apache/spark/blob/f75f633b21faaf911f04aeff847f25749b1ecd89/project/SparkBuild.scala#L369). What's the best approach to fix this? [~rxin]
[jira] [Created] (SPARK-6422) support customized akka system for actor-based receiver
Nan Zhu created SPARK-6422:

  Summary: support customized akka system for actor-based receiver
  Key: SPARK-6422
  URL: https://issues.apache.org/jira/browse/SPARK-6422
  Project: Spark
  Issue Type: Improvement
  Components: Streaming
  Affects Versions: 1.4.0
  Reporter: Nan Zhu

To disable Akka's fault detection system, we currently set large default values for Akka's transport failure detector threshold and heartbeat interval, and we discourage users from customizing these values. Meanwhile, we are trying to eliminate the dependency on Akka, e.g. by implementing a general RPC interface (https://github.com/apache/spark/pull/4588), etc.

Based on the above facts, enabling a customized Akka system for the actor-based receiver has two benefits: 1) users can easily set failure-detector values that are more practical in general Akka usage scenarios; 2) it reduces the dependency on Akka (in the future, we could even move the actor-based receiver to an external project).
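A hedged sketch of what a user-supplied actor system might look like (configuration values are illustrative examples using the standard Akka and Typesafe Config APIs, not a confirmed Spark interface):

{code}
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Illustrative failure-detector settings a user might prefer over
// Spark's large defaults (the numbers here are arbitrary examples).
val userConf = ConfigFactory.parseString(
  """
    |akka.remote.watch-failure-detector.threshold = 12.0
    |akka.remote.watch-failure-detector.heartbeat-interval = 1 s
  """.stripMargin)

// A customized ActorSystem the receiver could run on, falling back to
// the default configuration for everything else.
val receiverSystem =
  ActorSystem("custom-receiver-system", userConf.withFallback(ConfigFactory.load()))
{code}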
[jira] [Commented] (SPARK-4012) Uncaught OOM in ContextCleaner
[ https://issues.apache.org/jira/browse/SPARK-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355151#comment-14355151 ]

Nan Zhu commented on SPARK-4012:

[~srowen], actually I have gained more understanding of the scenario involved in the patch... let me resubmit it within the week for more feedback.

Uncaught OOM in ContextCleaner
  Key: SPARK-4012
  URL: https://issues.apache.org/jira/browse/SPARK-4012
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Reporter: Nan Zhu
  Assignee: Nan Zhu

When running a possibly memory-intensive application locally, I received the following exceptions:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Spark Context Cleaner"
Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Driver Heartbeater"
Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated
(the VM warning was repeated several more times)

I looked at the code; we might want to call Utils.tryOrExit instead of Utils.logUncaughtExceptions.
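For context, a minimal sketch of the tryOrExit pattern being proposed (simplified; Spark's actual Utils.tryOrExit routes the error through the process's uncaught-exception handler rather than exiting directly, and cleanupLoop below is a hypothetical placeholder):

{code}
// Simplified sketch of the tryOrExit idea: if the wrapped block throws
// anything fatal (e.g. OutOfMemoryError), terminate the JVM instead of
// just logging and leaving a dead cleaning thread behind.
def tryOrExit(block: => Unit): Unit = {
  try {
    block
  } catch {
    case t: Throwable =>
      t.printStackTrace()
      System.exit(1)
  }
}

// Usage inside a daemon thread's run loop (cleanupLoop is hypothetical):
// new Thread(new Runnable { def run(): Unit = tryOrExit { cleanupLoop() } }).start()
{code}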
[jira] [Created] (SPARK-6118) making package name of deploy.worker.CommandUtils and deploy.CommandUtilsSuite consistent
Nan Zhu created SPARK-6118:

  Summary: making package name of deploy.worker.CommandUtils and deploy.CommandUtilsSuite consistent
  Key: SPARK-6118
  URL: https://issues.apache.org/jira/browse/SPARK-6118
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Reporter: Nan Zhu
  Priority: Minor

I found that the object CommandUtils is placed under the deploy.worker package, while CommandUtilsSuite is under deploy. Conventionally, we put the implementation and its unit test class under the same package.
[jira] [Updated] (SPARK-4011) tighten the visibility of the members in Master/Worker class
[ https://issues.apache.org/jira/browse/SPARK-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-4011:
  Description: Currently, most of the members in Master/Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828
  was: Currently, most of the members in Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828

tighten the visibility of the members in Master/Worker class
  Key: SPARK-4011
  URL: https://issues.apache.org/jira/browse/SPARK-4011
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Reporter: Nan Zhu
  Priority: Minor

Currently, most of the members in Master/Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828
[jira] [Updated] (SPARK-4011) tighten the visibility of the members in Master/Worker class
[ https://issues.apache.org/jira/browse/SPARK-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-4011:
  Summary: tighten the visibility of the members in Master/Worker class (was: tighten the visibility of the members in Worker class)

tighten the visibility of the members in Master/Worker class
  Key: SPARK-4011
  URL: https://issues.apache.org/jira/browse/SPARK-4011
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Reporter: Nan Zhu
  Priority: Minor

Currently, most of the members in Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828
[jira] [Commented] (SPARK-4011) tighten the visibility of the members in Master/Worker class
[ https://issues.apache.org/jira/browse/SPARK-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342240#comment-14342240 ]

Nan Zhu commented on SPARK-4011:

[~srowen] I just submitted the patch and pinged you on GitHub; I also left a question there, thanks

tighten the visibility of the members in Master/Worker class
  Key: SPARK-4011
  URL: https://issues.apache.org/jira/browse/SPARK-4011
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Reporter: Nan Zhu
  Priority: Minor

Currently, most of the members in Master/Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828
[jira] [Commented] (SPARK-4011) tighten the visibility of the members in Worker class
[ https://issues.apache.org/jira/browse/SPARK-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338268#comment-14338268 ]

Nan Zhu commented on SPARK-4011:

[~sowen] not yet, but I can do that. Please assign this to me; I will try to finish it over the weekend, thanks

tighten the visibility of the members in Worker class
  Key: SPARK-4011
  URL: https://issues.apache.org/jira/browse/SPARK-4011
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Reporter: Nan Zhu
  Priority: Minor

Currently, most of the members in Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828
[jira] [Created] (SPARK-5724) misconfiguration in Akka system
Nan Zhu created SPARK-5724:

  Summary: misconfiguration in Akka system
  Key: SPARK-5724
  URL: https://issues.apache.org/jira/browse/SPARK-5724
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Affects Versions: 1.2.0, 1.1.0
  Reporter: Nan Zhu

In AkkaUtil, we set several failure-detector-related parameters as follows:

{code:title=AkkaUtil.scala|borderStyle=solid}
val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
  .withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
  s"""
  |akka.daemonic = on
  |akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
  |akka.stdout-loglevel = "ERROR"
  |akka.jvm-exit-on-fatal-error = off
  |akka.remote.require-cookie = "$requireCookie"
  |akka.remote.secure-cookie = "$secureCookie"
  |akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
  |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
  |akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
  |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
  |akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
  |akka.remote.netty.tcp.hostname = "$host"
  |akka.remote.netty.tcp.port = $port
  |akka.remote.netty.tcp.tcp-nodelay = on
  |akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
  |akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
  |akka.remote.netty.tcp.execution-pool-size = $akkaThreads
  |akka.actor.default-dispatcher.throughput = $akkaBatchSize
  |akka.log-config-on-start = $logAkkaConfig
  |akka.remote.log-remote-lifecycle-events = $lifecycleEvents
  |akka.log-dead-letters = $lifecycleEvents
  |akka.log-dead-letters-during-shutdown = $lifecycleEvents
  """.stripMargin))
{code}

Actually, there is no parameter named akka.remote.transport-failure-detector.threshold (see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html); what we have is akka.remote.watch-failure-detector.threshold.
[jira] [Commented] (SPARK-5293) Enable Spark user applications to use different versions of Akka
[ https://issues.apache.org/jira/browse/SPARK-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289258#comment-14289258 ]

Nan Zhu commented on SPARK-5293:

shall we make this JIRA an umbrella task, so that the other JIRAs like SPARK-5214 can be associated with it?

Enable Spark user applications to use different versions of Akka
  Key: SPARK-5293
  URL: https://issues.apache.org/jira/browse/SPARK-5293
  Project: Spark
  Issue Type: Improvement
  Components: Spark Core
  Affects Versions: 1.3.0
  Reporter: Reynold Xin

A lot of Spark user applications are using (or want to use) Akka. Akka as a whole can contribute great architectural simplicity and uniformity. However, because Spark depends on Akka, it is not possible for users to rely on different versions, and we have received many requests in the past asking for help about this specific issue. For example, Spark Streaming might be used as the receiver of Akka messages, but our dependency on Akka requires the upstream Akka actors to also use the identical version of Akka. Since our usage of Akka is limited (mainly for RPC and single-threaded event loop), we can replace it with alternative RPC implementations and a common event loop in Spark.
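Since the description notes Spark's Akka usage is mainly RPC plus a single-threaded event loop, here is a hedged sketch of the kind of narrow interface that could replace it (names hypothetical, not Spark's eventual API):

{code}
import scala.concurrent.Future

// A transport-agnostic RPC surface covering what Spark actually uses
// from Akka: fire-and-forget sends plus an ask/reply path. Any backend
// (Akka today, Netty or another transport later) could implement it.
trait RpcEndpoint {
  def receive(message: Any): Unit
  def receiveAndReply(message: Any, reply: Any => Unit): Unit
}

trait RpcEndpointRef {
  def send(message: Any): Unit          // one-way message
  def ask[T](message: Any): Future[T]   // request/response
}
{code}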
[jira] [Updated] (SPARK-5268) CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent
[ https://issues.apache.org/jira/browse/SPARK-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-5268:
  Target Version/s: 1.2.1

CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent
  Key: SPARK-5268
  URL: https://issues.apache.org/jira/browse/SPARK-5268
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Priority: Blocker

In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event. Consider the following case: the user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we may receive a DisassociatedEvent and the executor is restarted. This is not the expected behavior.
[jira] [Updated] (SPARK-5268) CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent
[ https://issues.apache.org/jira/browse/SPARK-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-5268:
  Summary: CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent (was: ExecutorBackend exits for irrelevant DisassociatedEvent)

CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent
  Key: SPARK-5268
  URL: https://issues.apache.org/jira/browse/SPARK-5268
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Affects Versions: 1.2.0
  Reporter: Nan Zhu

In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event. Consider the following case: the user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we may receive a DisassociatedEvent and the executor is restarted. This is not the expected behavior.
[jira] [Created] (SPARK-5268) ExecutorBackend exits for irrelevant DisassociatedEvent
Nan Zhu created SPARK-5268:

  Summary: ExecutorBackend exits for irrelevant DisassociatedEvent
  Key: SPARK-5268
  URL: https://issues.apache.org/jira/browse/SPARK-5268
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Affects Versions: 1.2.0
  Reporter: Nan Zhu

In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event. Consider the following case: the user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we may receive a DisassociatedEvent and the executor is restarted. This is not the expected behavior.
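A hedged sketch of the direction such a fix could take (it uses Akka's real DisassociatedEvent fields, but the driver-address comparison is illustrative rather than the exact Spark patch):

{code}
import akka.actor.Address
import akka.remote.DisassociatedEvent

// Only treat a disassociation as fatal when the lost peer is the driver;
// DisassociatedEvents from unrelated external actor systems are ignored.
def onDisassociated(event: DisassociatedEvent, driverAddress: Address): Unit = {
  if (event.remoteAddress == driverAddress) {
    System.err.println("Driver disassociated! Shutting down.")
    sys.exit(1)
  }
  // else: an external system the user's actor talked to went away;
  // that is not a reason to kill the executor.
}
{code}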
[jira] [Updated] (SPARK-5268) CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent
[ https://issues.apache.org/jira/browse/SPARK-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-5268:
  Priority: Blocker (was: Major)

CoarseGrainedExecutorBackend exits for irrelevant DisassociatedEvent
  Key: SPARK-5268
  URL: https://issues.apache.org/jira/browse/SPARK-5268
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Priority: Blocker

In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event. Consider the following case: the user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we may receive a DisassociatedEvent and the executor is restarted. This is not the expected behavior.
[jira] [Closed] (SPARK-4004) add akka-persistence based recovery mechanism for Master (maybe Worker)
[ https://issues.apache.org/jira/browse/SPARK-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu closed SPARK-4004.
  Resolution: Won't Fix

I'd close the PR, as I saw some discussions in https://github.com/apache/spark/pull/3825 which stated that we would introduce fewer of Akka's features to make it easier to replace Akka with Spark's own RPC framework.

add akka-persistence based recovery mechanism for Master (maybe Worker)
  Key: SPARK-4004
  URL: https://issues.apache.org/jira/browse/SPARK-4004
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that).

This would be with better performance and more flexibility than the current File based persistence Engine.
[jira] [Comment Edited] (SPARK-4004) add akka-persistence based recovery mechanism for Master (maybe Worker)
[ https://issues.apache.org/jira/browse/SPARK-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274299#comment-14274299 ]

Nan Zhu edited comment on SPARK-4004 at 1/12/15 10:30 PM:

I'd close the PR, as I saw some discussions in https://github.com/apache/spark/pull/3825 which stated that we would introduce fewer of Akka's features to make it easier to replace Akka with Spark's RPC framework.

was (Author: codingcat): I'd close the PR, as I saw some discussions in https://github.com/apache/spark/pull/3825 which stated that we would introduce fewer of Akka's features to make it easier to replace Akka with Spark's own RPC framework.

add akka-persistence based recovery mechanism for Master (maybe Worker)
  Key: SPARK-4004
  URL: https://issues.apache.org/jira/browse/SPARK-4004
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that).

This would be with better performance and more flexibility than the current File based persistence Engine.
[jira] [Commented] (SPARK-5153) flaky test of Reliable Kafka input stream with multiple topics
[ https://issues.apache.org/jira/browse/SPARK-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270964#comment-14270964 ]

Nan Zhu commented on SPARK-5153:

[~saisai_shao], yeah, I got up early to check whether the test case can pass on a lightly loaded Jenkins. Same code, just rebased: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25323/consoleFull

It passed... so I agree with you that it's just caused by the overloaded server. But I'm not sure prolonging the timeout duration is the right way to go... it's still possible for a super-overloaded server to hit that timeout again...

flaky test of Reliable Kafka input stream with multiple topics
  Key: SPARK-5153
  URL: https://issues.apache.org/jira/browse/SPARK-5153
  Project: Spark
  Issue Type: Bug
  Components: Streaming
  Affects Versions: 1.2.0
  Reporter: Nan Zhu

I have seen several irrelevant PRs fail on this test:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25254/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25248/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25251/console
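For reference, the usual alternative to a single fixed wait is polling with a bounded timeout, e.g. ScalaTest's Eventually (a generic sketch, not the actual test's code; the 30-second bound is arbitrary and, as noted above, no finite bound is immune to an overloaded machine):

{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.{Millis, Seconds, Span}

@volatile var received = 0
new Thread(new Runnable {
  def run(): Unit = { Thread.sleep(1000); received = 100 } // stand-in for slow delivery
}).start()

// Poll every 200 ms until the condition holds or 30 s elapse.
eventually(timeout(Span(30, Seconds)), interval(Span(200, Millis))) {
  assert(received == 100)
}
{code}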
[jira] [Commented] (SPARK-5174) Missing Document for starting multiple workers/supervisors in actor-based receiver
[ https://issues.apache.org/jira/browse/SPARK-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270955#comment-14270955 ]

Nan Zhu commented on SPARK-5174:

can anyone assign this to me? I'd submit a PR to fix this

Missing Document for starting multiple workers/supervisors in actor-based receiver
  Key: SPARK-5174
  URL: https://issues.apache.org/jira/browse/SPARK-5174
  Project: Spark
  Issue Type: Bug
  Components: Streaming
  Affects Versions: 1.2.0
  Reporter: Nan Zhu

Currently, the documentation about starting multiple supervisors/workers is missing, though the implementation provides this capability:

{code:title=ActorReceiver.scala|borderStyle=solid}
case props: Props =>
  val worker = context.actorOf(props)
  logInfo("Started receiver worker at:" + worker.path)
  sender ! worker

case (props: Props, name: String) =>
  val worker = context.actorOf(props, name)
  logInfo("Started receiver worker at:" + worker.path)
  sender ! worker

case _: PossiblyHarmful => hiccups.incrementAndGet()

case _: Statistics =>
  val workers = context.children
  sender ! Statistics(n.get, workers.size, hiccups.get, workers.mkString("\n"))
{code}
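A hedged usage sketch of the undocumented capability (MyWorkerActor and the supervisor reference are hypothetical placeholders; the message shapes follow the quoted ActorReceiver code):

{code}
import akka.actor.{Actor, ActorRef, Props}

class MyWorkerActor extends Actor {
  def receive = { case msg => println(s"worker got: $msg") }
}

// Per the quoted ActorReceiver code: sending Props (optionally paired
// with a name) to the receiver's supervisor spawns another worker.
def startWorkers(supervisor: ActorRef): Unit = {
  supervisor ! Props[MyWorkerActor]                 // anonymous worker
  supervisor ! ((Props[MyWorkerActor], "worker-2")) // named worker
}
{code}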
[jira] [Commented] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver
[ https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270957#comment-14270957 ]

Nan Zhu commented on SPARK-5175:

can anyone assign this to me? I'd like to submit a PR on this

bug in updating counters when starting multiple workers/supervisors in actor-based receiver
  Key: SPARK-5175
  URL: https://issues.apache.org/jira/browse/SPARK-5175
  Project: Spark
  Issue Type: Bug
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Fix For: 1.2.0

when starting multiple workers (ActorReceiver.scala), we didn't update the counters in it
[jira] [Created] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver
Nan Zhu created SPARK-5175:

  Summary: bug in updating counters when starting multiple workers/supervisors in actor-based receiver
  Key: SPARK-5175
  URL: https://issues.apache.org/jira/browse/SPARK-5175
  Project: Spark
  Issue Type: Bug
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Fix For: 1.2.0

when starting multiple workers (ActorReceiver.scala), we didn't update the counters in it
[jira] [Updated] (SPARK-5153) flaky test of Reliable Kafka input stream with multiple topics
[ https://issues.apache.org/jira/browse/SPARK-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-5153:
  Component/s: Streaming

flaky test of Reliable Kafka input stream with multiple topics
  Key: SPARK-5153
  URL: https://issues.apache.org/jira/browse/SPARK-5153
  Project: Spark
  Issue Type: Bug
  Components: Streaming
  Affects Versions: 1.2.0
  Reporter: Nan Zhu

I have seen several irrelevant PRs fail on this test:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25254/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25248/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25251/console
[jira] [Created] (SPARK-5153) flaky test of Reliable Kafka input stream with multiple topics
Nan Zhu created SPARK-5153:

  Summary: flaky test of Reliable Kafka input stream with multiple topics
  Key: SPARK-5153
  URL: https://issues.apache.org/jira/browse/SPARK-5153
  Project: Spark
  Issue Type: Bug
  Affects Versions: 1.2.0
  Reporter: Nan Zhu

I have seen several irrelevant PRs fail on this test:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25254/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25248/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25251/console
[jira] [Commented] (SPARK-4971) fix typo in BlockGenerator
[ https://issues.apache.org/jira/browse/SPARK-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258997#comment-14258997 ]

Nan Zhu commented on SPARK-4971:

didn't they say every PR should be bound to a JIRA? not sure how to handle this kind of case

fix typo in BlockGenerator
  Key: SPARK-4971
  URL: https://issues.apache.org/jira/browse/SPARK-4971
  Project: Spark
  Issue Type: Improvement
  Components: Streaming
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Priority: Trivial
  Labels: patch

BlockGeneratorListnere.onAddData should be BlockGeneratorListener.onAddData
[jira] [Commented] (SPARK-4971) fix typo in BlockGenerator
[ https://issues.apache.org/jira/browse/SPARK-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259004#comment-14259004 ]

Nan Zhu commented on SPARK-4971:

I see... thanks for the explanation

fix typo in BlockGenerator
  Key: SPARK-4971
  URL: https://issues.apache.org/jira/browse/SPARK-4971
  Project: Spark
  Issue Type: Improvement
  Components: Streaming
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Priority: Trivial
  Labels: patch

BlockGeneratorListnere.onAddData should be BlockGeneratorListener.onAddData
[jira] [Created] (SPARK-4971) fix typo in BlockGenerator
Nan Zhu created SPARK-4971:

  Summary: fix typo in BlockGenerator
  Key: SPARK-4971
  URL: https://issues.apache.org/jira/browse/SPARK-4971
  Project: Spark
  Issue Type: Improvement
  Components: Streaming
  Affects Versions: 1.2.0
  Reporter: Nan Zhu
  Priority: Trivial

BlockGeneratorListnere.onAddData should be BlockGeneratorListener.onAddData
[jira] [Commented] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222973#comment-14222973 ]

Nan Zhu commented on SPARK-3628:

hmmm... OK, but for this case, shall I submit individual patches for 0.9.x and 1.0.x, since there are some merge conflicts when applying the patch directly?

Don't apply accumulator updates multiple times for tasks in result stages
  Key: SPARK-3628
  URL: https://issues.apache.org/jira/browse/SPARK-3628
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Reporter: Matei Zaharia
  Assignee: Nan Zhu
  Priority: Blocker

In previous versions of Spark, accumulator updates only got applied once for accumulators that are only used in actions (i.e. result stages), letting you use them to deterministically compute a result. Unfortunately, this got broken in some recent refactorings. This is related to https://issues.apache.org/jira/browse/SPARK-732, but that issue is about applying the same semantics to intermediate stages too, which is more work and may not be what we want for debugging.
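To make the semantics concrete, a small hypothetical example using the Spark 1.x accumulator API: if result-stage tasks are re-run (e.g. after an executor is lost), their accumulator updates must not be applied a second time, or the count below stops being deterministic.

{code}
import org.apache.spark.SparkContext

// Hypothetical demonstration; sc is a live SparkContext supplied by the caller.
def countOnce(sc: SparkContext): Unit = {
  val acc = sc.accumulator(0)
  // An action (result stage): each of the 100 elements should be counted
  // exactly once, even if a task is speculatively executed or re-run.
  sc.parallelize(1 to 100, 4).foreach(_ => acc += 1)
  assert(acc.value == 100) // deterministic only with once-per-task semantics
}
{code}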
[jira] [Commented] (SPARK-4238) Perform network-level retry of shuffle file fetches
[ https://issues.apache.org/jira/browse/SPARK-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200334#comment-14200334 ]

Nan Zhu commented on SPARK-4238:

is it related to https://issues.apache.org/jira/browse/SPARK-4188?

Perform network-level retry of shuffle file fetches
  Key: SPARK-4238
  URL: https://issues.apache.org/jira/browse/SPARK-4238
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Reporter: Aaron Davidson
  Assignee: Aaron Davidson
  Priority: Critical

During periods of high network (or GC) load, it is not uncommon that IOExceptions crop up around connection failures when fetching shuffle files. Unfortunately, when such a failure occurs, it is interpreted as an inability to fetch the files, which causes us to mark the executor as lost and recompute all of its shuffle outputs. We should allow retrying at the network level in the event of an IOException in order to avoid this circumstance.
[jira] [Created] (SPARK-4067) refactor ExecutorUncaughtExceptionHandler as a general one as it is used like this
Nan Zhu created SPARK-4067:

  Summary: refactor ExecutorUncaughtExceptionHandler as a general one as it is used like this
  Key: SPARK-4067
  URL: https://issues.apache.org/jira/browse/SPARK-4067
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Currently, we call Utils.tryOrExit everywhere: AppClient, Executor, TaskSchedulerImpl. This makes the name of ExecutorUncaughtExceptionHandler unfit to the real use cases.
[jira] [Created] (SPARK-4011) tighten the visibility of the members in Worker class
Nan Zhu created SPARK-4011:

  Summary: tighten the visibility of the members in Worker class
  Key: SPARK-4011
  URL: https://issues.apache.org/jira/browse/SPARK-4011
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Reporter: Nan Zhu

Currently, most of the members in Worker are with public accessibility; we might wish to tighten the accessibility of them a bit. More discussion is here: https://github.com/apache/spark/pull/2828
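As an illustration of the kind of tightening discussed (member names here are hypothetical; Scala's scoped access modifiers limit visibility to an enclosing package):

{code}
package org.apache.spark.deploy.worker

// Hypothetical sketch: fields that used to be public become visible only
// within the deploy package, or fully private, shrinking the API surface.
private[deploy] class Worker {
  private[deploy] var state: String = "ALIVE" // was effectively public
  private var coresUsed: Int = 0              // internal bookkeeping only
}
{code}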
[jira] [Created] (SPARK-4012) Uncaught OOM in ContextCleaner
Nan Zhu created SPARK-4012:

  Summary: Uncaught OOM in ContextCleaner
  Key: SPARK-4012
  URL: https://issues.apache.org/jira/browse/SPARK-4012
  Project: Spark
  Issue Type: Bug
  Components: Spark Core
  Reporter: Nan Zhu
  Fix For: 1.1.0

When running a possibly memory-intensive application locally, I received the following exceptions:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Spark Context Cleaner"
Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Driver Heartbeater"
Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated
(the VM warning was repeated several more times)

I looked at the code; we might want to call Utils.tryOrExit instead of Utils.logUncaughtExceptions.
[jira] [Created] (SPARK-4004) add akka-persistence based recovery mechanism for Master (maybe Worker)
Nan Zhu created SPARK-4004:

  Summary: add akka-persistence based recovery mechanism for Master (maybe Worker)
  Key: SPARK-4004
  URL: https://issues.apache.org/jira/browse/SPARK-4004
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Since we have upgraded the akka version to 2.3.0, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that).

This would be with better performance and more flexible than the current File based persistence Engine.
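A hedged sketch of the akka-persistence idea this issue proposed (event and class names are hypothetical, using the Akka 2.3 PersistentActor API): persist registration events so the Master can replay them on recovery instead of re-reading files.

{code}
import akka.persistence.PersistentActor

case class AppRegistered(appId: String)

class MasterJournal extends PersistentActor {
  override def persistenceId = "spark-master"
  private var apps = List.empty[String]

  // Live path: journal the event, then apply it to in-memory state.
  override def receiveCommand = {
    case AppRegistered(id) => persist(AppRegistered(id)) { ev => apps ::= ev.appId }
  }

  // Recovery path: replay journaled events after a restart.
  override def receiveRecover = {
    case AppRegistered(id) => apps ::= id
  }
}
{code}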
[jira] [Updated] (SPARK-4004) add akka-persistence based recovery mechanism for Master (maybe Worker)
[ https://issues.apache.org/jira/browse/SPARK-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-4004:
  Description: Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that). This would be with better performance and more flexible than the current File based persistence Engine.
  was: Since we have upgraded the akka version to 2.3.0, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that). This would be with better performance and more flexible than the current File based persistence Engine.

add akka-persistence based recovery mechanism for Master (maybe Worker)
  Key: SPARK-4004
  URL: https://issues.apache.org/jira/browse/SPARK-4004
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that).

This would be with better performance and more flexible than the current File based persistence Engine.
[jira] [Commented] (SPARK-4004) add akka-persistence based recovery mechanism for Master (maybe Worker)
[ https://issues.apache.org/jira/browse/SPARK-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176571#comment-14176571 ]

Nan Zhu commented on SPARK-4004:

I will post a design doc after Monday.

add akka-persistence based recovery mechanism for Master (maybe Worker)
  Key: SPARK-4004
  URL: https://issues.apache.org/jira/browse/SPARK-4004
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that).

This would be with better performance and more flexibility than the current File based persistence Engine.
[jira] [Updated] (SPARK-4004) add akka-persistence based recovery mechanism for Master (maybe Worker)
[ https://issues.apache.org/jira/browse/SPARK-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-4004:
  Description: Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that). This would be with better performance and more flexibility than the current File based persistence Engine.
  was: Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that). This would be with better performance and more flexible than the current File based persistence Engine.

add akka-persistence based recovery mechanism for Master (maybe Worker)
  Key: SPARK-4004
  URL: https://issues.apache.org/jira/browse/SPARK-4004
  Project: Spark
  Issue Type: Improvement
  Affects Versions: 1.1.0
  Reporter: Nan Zhu

Since we have upgraded the akka version to 2.3.x, we can utilize the features which are actually helpful in many applications, e.g. by using persistence we can add an akka-persistence recovery mechanism to Master (maybe also Worker, but I'm not sure if we have many things to recover from that).

This would be with better performance and more flexibility than the current File based persistence Engine.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174053#comment-14174053 ]

Nan Zhu commented on SPARK-3957:

I agree with [~andrewor14]; I was also thinking about piggybacking the information on the heartbeat between the heartbeatReceiver and the executor... Not sure about the current Hadoop implementation, but in the 1.x versions, TaskStatus was piggybacked on the heartbeat between the TaskTracker and the JobTracker... to me, it's a very natural way to do this.

I accepted it this morning and have started some work, so [~devlakhani], please let me finish this, thanks.

Broadcast variable memory usage not reflected in UI
  Key: SPARK-3957
  URL: https://issues.apache.org/jira/browse/SPARK-3957
  Project: Spark
  Issue Type: Bug
  Components: Block Manager, Web UI
  Affects Versions: 1.0.2, 1.1.0
  Reporter: Shivaram Venkataraman
  Assignee: Nan Zhu

Memory used by broadcast variables is not reflected in the memory usage reported in the WebUI. For example, the executors tab shows the memory used in each executor, but this number doesn't include memory used by broadcast variables. Similarly, the storage tab only shows the list of cached RDDs and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174664#comment-14174664 ]

Nan Zhu commented on SPARK-3957:

After looking at the problem more closely, I think we might just set the tellMaster flag to true to get this information (after a put, it will report to the BlockManagerMaster), instead of introducing a fat heartbeat message or opening a new channel. The only thing we need to add is distinguishing RDDs from broadcast variables in BlockStatus.

What do you guys think about it?

Broadcast variable memory usage not reflected in UI
  Key: SPARK-3957
  URL: https://issues.apache.org/jira/browse/SPARK-3957
  Project: Spark
  Issue Type: Bug
  Components: Block Manager, Web UI
  Affects Versions: 1.0.2, 1.1.0
  Reporter: Shivaram Venkataraman
  Assignee: Nan Zhu

Memory used by broadcast variables is not reflected in the memory usage reported in the WebUI. For example, the executors tab shows the memory used in each executor, but this number doesn't include memory used by broadcast variables. Similarly, the storage tab only shows the list of cached RDDs and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174675#comment-14174675 ] Nan Zhu commented on SPARK-3957: BlockId can directly tell if the corresponding block is a broadcast variable Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu Memory used by broadcast variables are not reflected in the memory usage reported in the WebUI. For example, the executors tab shows memory used in each executor but this number doesn't include memory used by broadcast variables. Similarly the storage tab only shows list of rdds cached and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
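That observation maps onto a simple pattern match; a minimal sketch, assuming the BroadcastBlockId case class from org.apache.spark.storage (the helper function itself is hypothetical):
{code}
import org.apache.spark.storage.{BlockId, BroadcastBlockId}

// Distinguish broadcast blocks from RDD/shuffle blocks when aggregating
// storage status for the UI.
def isBroadcastBlock(id: BlockId): Boolean = id match {
  case _: BroadcastBlockId => true
  case _                   => false
}
{code}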
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174684#comment-14174684 ] Nan Zhu commented on SPARK-3957: [~andrewor14], why we didn't report broadcast variable resource usage to BlockManagerMaster in the current implementation? Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu Memory used by broadcast variables are not reflected in the memory usage reported in the WebUI. For example, the executors tab shows memory used in each executor but this number doesn't include memory used by broadcast variables. Similarly the storage tab only shows list of rdds cached and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174684#comment-14174684 ] Nan Zhu edited comment on SPARK-3957 at 10/17/14 3:04 AM: -- [~andrewor14], why we don't report broadcast variable resource usage to BlockManagerMaster in the current implementation? was (Author: codingcat): [~andrewor14], why we didn't report broadcast variable resource usage to BlockManagerMaster in the current implementation? Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu Memory used by broadcast variables are not reflected in the memory usage reported in the WebUI. For example, the executors tab shows memory used in each executor but this number doesn't include memory used by broadcast variables. Similarly the storage tab only shows list of rdds cached and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174747#comment-14174747 ] Nan Zhu commented on SPARK-3957: Ok, when working on the executor tab, I realize that we eventually need a per-executor record of broadcast usage...so I will still follow the heartbeat-based strategy Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu Memory used by broadcast variables are not reflected in the memory usage reported in the WebUI. For example, the executors tab shows memory used in each executor but this number doesn't include memory used by broadcast variables. Similarly the storage tab only shows list of rdds cached and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3736) Workers should reconnect to Master if disconnected
[ https://issues.apache.org/jira/browse/SPARK-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171358#comment-14171358 ] Nan Zhu commented on SPARK-3736: if the worker itself times out, the Master will remove the worker from idToWorker. When the worker is resumed later and sends a heartbeat to the Master again, the Master detects this by attempting to find the worker in idToWorker (search for logWarning("Got heartbeat from unregistered worker " + workerId) in Master.scala). You can simply replace the logWarning with logic that sends a message to the worker asking it to re-register Workers should reconnect to Master if disconnected -- Key: SPARK-3736 URL: https://issues.apache.org/jira/browse/SPARK-3736 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.2, 1.1.0 Reporter: Andrew Ash Assignee: Matthew Cheah Priority: Critical In standalone mode, when a worker gets disconnected from the master for some reason it never attempts to reconnect. In this situation you have to bounce the worker before it will reconnect to the master. The preferred alternative is to follow what Hadoop does -- when there's a disconnect, attempt to reconnect at a particular interval until successful (I think it repeats indefinitely every 10sec). This has been observed by: - [~pkolaczk] in http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html - [~romi-totango] in http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html - [~aash] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
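A minimal sketch of the suggested change, with stand-in types since the real handler lives in Master.scala's actor receive loop; the ReconnectWorker message name is hypothetical:
{code}
import scala.collection.mutable

case class Heartbeat(workerId: String)
case object ReconnectWorker // hypothetical message asking the worker to re-register

class MasterSketch {
  val idToWorker = mutable.Map.empty[String, Long] // workerId -> last heartbeat time

  def handleHeartbeat(msg: Heartbeat, replyToWorker: Any => Unit): Unit =
    idToWorker.get(msg.workerId) match {
      case Some(_) => idToWorker(msg.workerId) = System.currentTimeMillis()
      case None =>
        // previously only a warning; instead, tell the worker to re-register
        println(s"Got heartbeat from unregistered worker ${msg.workerId}")
        replyToWorker(ReconnectWorker)
    }
}
{code}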
[jira] [Commented] (SPARK-3736) Workers should reconnect to Master if disconnected
[ https://issues.apache.org/jira/browse/SPARK-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171360#comment-14171360 ] Nan Zhu commented on SPARK-3736: BTW, master will not send heartbeat to Worker proactively Workers should reconnect to Master if disconnected -- Key: SPARK-3736 URL: https://issues.apache.org/jira/browse/SPARK-3736 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.2, 1.1.0 Reporter: Andrew Ash Assignee: Matthew Cheah Priority: Critical In standalone mode, when a worker gets disconnected from the master for some reason it never attempts to reconnect. In this situation you have to bounce the worker before it will reconnect to the master. The preferred alternative is to follow what Hadoop does -- when there's a disconnect, attempt to reconnect at a particular interval until successful (I think it repeats indefinitely every 10sec). This has been observed by: - [~pkolaczk] in http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html - [~romi-totango] in http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html - [~aash] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name
[ https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169663#comment-14169663 ] Nan Zhu commented on SPARK-1192: yes, I resubmitted https://github.com/apache/spark/pull/2312 per Matei's request (removed some, added some); it's still valid Around 30 parameters in Spark are used but undocumented and some are having confusing name -- Key: SPARK-1192 URL: https://issues.apache.org/jira/browse/SPARK-1192 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu I grepped the code in the core component and found that around 30 parameters in the implementation are actually used but undocumented. By reading the source code, I found that some of them are actually very useful for the user. I suggest making a complete document on the parameters. Also, some parameters have confusing names: spark.shuffle.copier.threads - this parameter controls how many threads you will use when you start a Netty-based shuffle service...but from the name, we cannot get this information; spark.shuffle.sender.port - a similar problem to the above one: when you use the Netty-based shuffle receiver, you will have to set up a Netty-based sender...this parameter sets up the port used by the Netty sender, but the name cannot convey this information -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3795) Add scheduler hooks/heuristics for adding and removing executors
[ https://issues.apache.org/jira/browse/SPARK-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167131#comment-14167131 ] Nan Zhu commented on SPARK-3795: this is for YARN or standalone? Add scheduler hooks/heuristics for adding and removing executors Key: SPARK-3795 URL: https://issues.apache.org/jira/browse/SPARK-3795 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 1.1.0 Reporter: Patrick Wendell Assignee: Andrew Or To support dynamic scaling of a Spark application, Spark's scheduler will need to have hooks around explicitly decommissioning executors. We'll also need basic heuristics governing when to start/stop executors based on load. An initial goal is to keep this very simple. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark
[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167144#comment-14167144 ] Nan Zhu commented on SPARK-2962: Hi, [~mrid...@yahoo-inc.com] I think this has been fixed in https://github.com/apache/spark/pull/1313/files,
{code:title=TaskSetManager.scala|borderStyle=solid}
if (tasks(index).preferredLocations == Nil) {
  addTo(pendingTasksWithNoPrefs)
}
{code}
Now, only tasks without an explicit preference are added to pendingTasksWithNoPrefs, and NO_PREF tasks are always scheduled after NODE_LOCAL Suboptimal scheduling in spark -- Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3835) Spark applications that are killed should show up as KILLED or CANCELLED in the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165768#comment-14165768 ] Nan Zhu commented on SPARK-3835: Does this problem still exist? I once reported the same thing in SPARK-1118 Spark applications that are killed should show up as KILLED or CANCELLED in the Spark UI Key: SPARK-3835 URL: https://issues.apache.org/jira/browse/SPARK-3835 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.1.0 Reporter: Matt Cheah Labels: UI Spark applications that crash or are killed are listed as FINISHED in the Spark UI. It looks like the Master only passes back a list of Running applications and a list of Completed applications, All of the applications under Completed have status FINISHED, however if they were killed manually they should show CANCELLED, or if they failed they should read FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3835) Spark applications that are killed should show up as KILLED or CANCELLED in the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165807#comment-14165807 ] Nan Zhu commented on SPARK-3835: ah, I see, I didn't look at your description closely. Does a shutdown hook work? Spark applications that are killed should show up as KILLED or CANCELLED in the Spark UI Key: SPARK-3835 URL: https://issues.apache.org/jira/browse/SPARK-3835 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.1.0 Reporter: Matt Cheah Labels: UI Spark applications that crash or are killed are listed as FINISHED in the Spark UI. It looks like the Master only passes back a list of Running applications and a list of Completed applications, All of the applications under Completed have status FINISHED, however if they were killed manually they should show CANCELLED, or if they failed they should read FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3835) Spark applications that are killed should show up as KILLED or CANCELLED in the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165815#comment-14165815 ] Nan Zhu commented on SPARK-3835: no...it cannot capture kill -9 Spark applications that are killed should show up as KILLED or CANCELLED in the Spark UI Key: SPARK-3835 URL: https://issues.apache.org/jira/browse/SPARK-3835 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.1.0 Reporter: Matt Cheah Labels: UI Spark applications that crash or are killed are listed as FINISHED in the Spark UI. It looks like the Master only passes back a list of Running applications and a list of Completed applications, All of the applications under Completed have status FINISHED, however if they were killed manually they should show CANCELLED, or if they failed they should read FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
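For context, a quick sketch of why: a JVM shutdown hook runs on normal exit or SIGTERM, but SIGKILL (kill -9) terminates the process before any hook can run, so the Master never hears a final status. The demo object below is illustrative only.
{code}
object ShutdownHookDemo extends App {
  // Runs on normal exit and on `kill <pid>` (SIGTERM), but never on `kill -9`
  sys.addShutdownHook {
    println("shutdown hook ran; this is where a KILLED state could be reported")
  }
  println("sleeping; try `kill <pid>` vs `kill -9 <pid>`")
  Thread.sleep(60000)
}
{code}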
[jira] [Commented] (SPARK-3759) SparkSubmitDriverBootstrapper should return exit code of driver process
[ https://issues.apache.org/jira/browse/SPARK-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164595#comment-14164595 ] Nan Zhu commented on SPARK-3759: Just passing by: since the PR has been merged, should the status of the corresponding JIRA be RESOLVED? SparkSubmitDriverBootstrapper should return exit code of driver process --- Key: SPARK-3759 URL: https://issues.apache.org/jira/browse/SPARK-3759 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.1.0 Environment: Linux, Windows, Scala/Java Reporter: Eric Eijkelenboom Assignee: Eric Eijkelenboom Priority: Minor Fix For: 1.1.1, 1.2.0 Original Estimate: 24h Remaining Estimate: 24h SparkSubmitDriverBootstrapper.scala currently always returns exit code 0. Instead, it should return the exit code of the driver process. Suggested code change in SparkSubmitDriverBootstrapper, line 157:
{code}
val returnCode = process.waitFor()
sys.exit(returnCode)
{code}
Workaround for this issue: Instead of specifying 'driver.extra*' properties in spark-defaults.conf, pass these properties to spark-submit directly. This will launch the driver program without the use of SparkSubmitDriverBootstrapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3767) Support wildcard in Spark properties
[ https://issues.apache.org/jira/browse/SPARK-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164605#comment-14164605 ] Nan Zhu commented on SPARK-3767: do you mean something like -DpropertyName=EXEID, where EXEID will be interpreted in SparkDeploySchedulerBackend.scala? Support wildcard in Spark properties Key: SPARK-3767 URL: https://issues.apache.org/jira/browse/SPARK-3767 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or If the user sets spark.executor.extraJavaOptions, he/she may want to express the value in terms of the executor ID, for instance. In general it would be a feature that many will find useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
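A minimal sketch of that idea, with a hypothetical placeholder token and helper function (the actual convention would be decided in the PR):
{code}
// Expand a per-executor placeholder in spark.executor.extraJavaOptions
// before launching each executor.
def expandOptions(template: String, executorId: String): String =
  template.replace("{{EXECUTOR_ID}}", executorId)

// expandOptions("-Dlog.file=/tmp/exec-{{EXECUTOR_ID}}.log", "3")
//   returns "-Dlog.file=/tmp/exec-3.log"
{code}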
[jira] [Commented] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14146714#comment-14146714 ] Nan Zhu commented on SPARK-3628: https://github.com/apache/spark/pull/2524 Don't apply accumulator updates multiple times for tasks in result stages - Key: SPARK-3628 URL: https://issues.apache.org/jira/browse/SPARK-3628 Project: Spark Issue Type: Bug Reporter: Matei Zaharia Priority: Blocker In previous versions of Spark, accumulator updates only got applied once for accumulators that are only used in actions (i.e. result stages), letting you use them to deterministically compute a result. Unfortunately, this got broken in some recent refactorings. This is related to https://issues.apache.org/jira/browse/SPARK-732, but that issue is about applying the same semantics to intermediate stages too, which is more work and may not be what we want for debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2647) DAGScheduler plugs others when processing one JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147378#comment-14147378 ] Nan Zhu commented on SPARK-2647: isn't it the expected behaviour as we keep DAGScheduler as a single-thread mode? DAGScheduler plugs others when processing one JobSubmitted event Key: SPARK-2647 URL: https://issues.apache.org/jira/browse/SPARK-2647 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: YanTang Zhai If a few of jobs are submitted, DAGScheduler plugs others when processing one JobSubmitted event. For example ont JobSubmitted event is processed as follows and costs much time spark-akka.actor.default-dispatcher-67 daemon prio=10 tid=0x7f75ec001000 nid=0x7dd6 in Object.wait() [0x7f76063e1000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:503) at org.apache.hadoopcdh3.ipc.Client.call(Client.java:1130) - locked 0x000783b17330 (a org.apache.hadoopcdh3.ipc.Client$Call) at org.apache.hadoopcdh3.ipc.RPC$Invoker.invoke(RPC.java:241) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoopcdh3.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83) at org.apache.hadoopcdh3.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:60) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoopcdh3.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1472) at org.apache.hadoopcdh3.hdfs.DFSClient.getBlockLocations(DFSClient.java:1498) at org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$1.doCall(Cdh3DistributedFileSystem.java:208) at org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$1.doCall(Cdh3DistributedFileSystem.java:204) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getFileBlockLocations(Cdh3DistributedFileSystem.java:204) at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:233) at StorageEngineClient.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:141) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:54) at 
org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:54) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:54) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) at
[jira] [Commented] (SPARK-2855) pyspark test cases crashed for no reason
[ https://issues.apache.org/jira/browse/SPARK-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113778#comment-14113778 ] Nan Zhu commented on SPARK-2855: [~joshrosen]? pyspark test cases crashed for no reason Key: SPARK-2855 URL: https://issues.apache.org/jira/browse/SPARK-2855 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Nan Zhu I met this for several times, all scala/java test cases passed, but pyspark test cases just crashed https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17875/consoleFull -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2855) pyspark test cases crashed for no reason
[ https://issues.apache.org/jira/browse/SPARK-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113776#comment-14113776 ] Nan Zhu commented on SPARK-2855: I guess they have fixed this. A Jenkins-side mistake? pyspark test cases crashed for no reason Key: SPARK-2855 URL: https://issues.apache.org/jira/browse/SPARK-2855 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Nan Zhu I met this for several times, all scala/java test cases passed, but pyspark test cases just crashed https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17875/consoleFull -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2855) pyspark test cases crashed for no reason
[ https://issues.apache.org/jira/browse/SPARK-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113965#comment-14113965 ] Nan Zhu commented on SPARK-2855: no, see https://github.com/apache/spark/pull/1313 and search for "This particular failure was my fault" pyspark test cases crashed for no reason Key: SPARK-2855 URL: https://issues.apache.org/jira/browse/SPARK-2855 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Nan Zhu I met this for several times, all scala/java test cases passed, but pyspark test cases just crashed https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17875/consoleFull -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2855) pyspark test cases crashed for no reason
Nan Zhu created SPARK-2855: -- Summary: pyspark test cases crashed for no reason Key: SPARK-2855 URL: https://issues.apache.org/jira/browse/SPARK-2855 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Nan Zhu I met this for several times, all scala/java test cases passed, but pyspark test cases just crashed https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17875/consoleFull -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2456) Scheduler refactoring
[ https://issues.apache.org/jira/browse/SPARK-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073122#comment-14073122 ] Nan Zhu commented on SPARK-2456: maybe it's also related: https://github.com/apache/spark/pull/637 Scheduler refactoring - Key: SPARK-2456 URL: https://issues.apache.org/jira/browse/SPARK-2456 Project: Spark Issue Type: Improvement Reporter: Reynold Xin Assignee: Reynold Xin This is an umbrella ticket to track scheduler refactoring. We want to clearly define semantics and responsibilities of each component, and define explicit public interfaces for them so it is easier to understand and to contribute (also less buggy). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2454) Separate driver spark home from executor spark home
[ https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069479#comment-14069479 ] Nan Zhu commented on SPARK-2454: there is a related issue and fix: https://issues.apache.org/jira/browse/SPARK-2404 https://github.com/apache/spark/pull/1331 where we should not overwrite SPARK_HOME in spark-submit and spark-class if the user has already set it. In our scenario, the remote cluster shares a user with the login portal, say codingcat, so SPARK_HOME is set to /home/codingcat/spark-1.0; other users in the login portal have a soft link to SPARK_HOME in their own directories. However, the current scripts overwrite the already-set SPARK_HOME with the pwd of the process running spark-submit, which does not exist in the cluster, causing exceptions like "cannot run /home/local_user/spark-class" Separate driver spark home from executor spark home --- Key: SPARK-2454 URL: https://issues.apache.org/jira/browse/SPARK-2454 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.1 Reporter: Andrew Or Fix For: 1.1.0 The driver may not always share the same directory structure as the executors. It makes little sense to always re-use the driver's spark home on the executors. https://github.com/apache/spark/pull/1244/ is an open effort to fix this. However, this still requires us to set SPARK_HOME on all the executor nodes. Really we should separate this out into something like `spark.executor.home` and `spark.driver.home` rather than re-using SPARK_HOME everywhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2454) Separate driver spark home from executor spark home
[ https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063402#comment-14063402 ] Nan Zhu commented on SPARK-2454: this will explicitly make sparkHome an application-specific parameter. I just thought it might confuse the user, since sparkHome is actually a global setup for all applications/executors running on the same machine. The good thing here is that it lets the user run applications against different versions of Spark sharing the same cluster (especially when you are doing Spark dev work) Separate driver spark home from executor spark home --- Key: SPARK-2454 URL: https://issues.apache.org/jira/browse/SPARK-2454 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 The driver may not always share the same directory structure as the executors. It makes little sense to always re-use the driver's spark home on the executors. https://github.com/apache/spark/pull/1244/ is an open effort to fix this. However, this still requires us to set SPARK_HOME on all the executor nodes. Really we should separate this out into something like spark.driver.home spark.executor.home rather than re-using SPARK_HOME everywhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2454) Separate driver spark home from executor spark home
[ https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064085#comment-14064085 ] Nan Zhu commented on SPARK-2454: I see, it makes sense to me... Separate driver spark home from executor spark home --- Key: SPARK-2454 URL: https://issues.apache.org/jira/browse/SPARK-2454 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 The driver may not always share the same directory structure as the executors. It makes little sense to always re-use the driver's spark home on the executors. https://github.com/apache/spark/pull/1244/ is an open effort to fix this. However, this still requires us to set SPARK_HOME on all the executor nodes. Really we should separate this out into something like `spark.executor.home` rather than re-using SPARK_HOME everywhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2459) the user should be able to configure the resources used by JDBC server
[ https://issues.apache.org/jira/browse/SPARK-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061981#comment-14061981 ] Nan Zhu commented on SPARK-2459: yeah, spark-submit may solve the problem... then we need to modify the start-thriftserver.sh (https://github.com/apache/spark/commit/8032fe2fae3ac40a02c6018c52e76584a14b3438#diff-acab5881e22c8120bd801f4cbdee33cdR24) to call spark-submit instead of spark-class directly the user should be able to configure the resources used by JDBC server -- Key: SPARK-2459 URL: https://issues.apache.org/jira/browse/SPARK-2459 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.1 Reporter: Nan Zhu I'm trying the jdbc server I found that the jdbc server always occupies all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] -- This message was sent by Atlassian JIRA (v6.2#6252)
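A sketch of the kind of change SparkSQLEnv would need, assuming it builds its own SparkConf; the property values shown are examples, not defaults:
{code}
import org.apache.spark.SparkConf

// Honor user-supplied resource limits instead of starting with a bare
// config that grabs every core in the cluster.
val conf = new SparkConf()
  .setAppName("SparkSQL JDBC/Thrift server")
  .set("spark.cores.max", sys.props.getOrElse("spark.cores.max", "8"))
  .set("spark.executor.memory", sys.props.getOrElse("spark.executor.memory", "2g"))
{code}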
[jira] [Commented] (SPARK-1706) Allow multiple executors per worker in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062966#comment-14062966 ] Nan Zhu commented on SPARK-1706: Oh, the PR has been there for a while...I just rebased it. Anyone want to have a look? Allow multiple executors per worker in Standalone mode -- Key: SPARK-1706 URL: https://issues.apache.org/jira/browse/SPARK-1706 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Patrick Wendell Assignee: Nan Zhu Fix For: 1.1.0 Right now if people want to launch multiple executors on each machine they need to start multiple standalone workers. This is not too difficult, but it means you have extra JVM's sitting around. We should just allow users to set a number of cores they want per-executor in standalone mode and then allow packing multiple executors on each node. This would make standalone mode more consistent with YARN in the way you request resources. It's not too big of a change as far as I can see. You'd need to: 1. Introduce a configuration for how many cores you want per executor. 2. Change the scheduling logic in Master.scala to take this into account. 3. Change CoarseGrainedSchedulerBackend to not assume a 1-1 correspondence between hosts and executors. And maybe modify a few other places. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2459) the user should be able to configure the resources used by JDBC server
Nan Zhu created SPARK-2459: -- Summary: the user should be able to configure the resources used by JDBC server Key: SPARK-2459 URL: https://issues.apache.org/jira/browse/SPARK-2459 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Nan Zhu I'm trying the jdbc server I found that the jdbc server always occupy all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2459) the user should be able to configure the resources used by JDBC server
[ https://issues.apache.org/jira/browse/SPARK-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-2459: --- Description: I'm trying the jdbc server I found that the jdbc server always occupies all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] was: I'm trying the jdbc server I found that the jdbc server always occupy all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] the user should be able to configure the resources used by JDBC server -- Key: SPARK-2459 URL: https://issues.apache.org/jira/browse/SPARK-2459 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Nan Zhu I'm trying the jdbc server I found that the jdbc server always occupies all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2459) the user should be able to configure the resources used by JDBC server
[ https://issues.apache.org/jira/browse/SPARK-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059636#comment-14059636 ] Nan Zhu commented on SPARK-2459: I discussed with [~liancheng], he is working on merging the branch to master, so a new merge request may interrupt his work, he asked me to submit a JIRA first the user should be able to configure the resources used by JDBC server -- Key: SPARK-2459 URL: https://issues.apache.org/jira/browse/SPARK-2459 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Nan Zhu I'm trying the jdbc server I found that the jdbc server always occupies all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2459) the user should be able to configure the resources used by JDBC server
[ https://issues.apache.org/jira/browse/SPARK-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059636#comment-14059636 ] Nan Zhu edited comment on SPARK-2459 at 7/12/14 4:35 AM: - I discussed with [~liancheng], he is working on merging the branch to master, so a new pull request may interrupt his work, he asked me to submit a JIRA first was (Author: codingcat): I discussed with [~liancheng], he is working on merging the branch to master, so a new merge request may interrupt his work, he asked me to submit a JIRA first the user should be able to configure the resources used by JDBC server -- Key: SPARK-2459 URL: https://issues.apache.org/jira/browse/SPARK-2459 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Nan Zhu I'm trying the jdbc server I found that the jdbc server always occupies all cores in the cluster the reason is that when creating HiveContext, it doesn't set anything related to spark.cores.max or spark.executor.memory SparkSQLEnv.scala(https://github.com/apache/spark/blob/8032fe2fae3ac40a02c6018c52e76584a14b3438/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala) L41-L43 [~liancheng] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2404) spark-submit and spark-class may overwrite the already defined SPARK_HOME
Nan Zhu created SPARK-2404: -- Summary: spark-submit and spark-class may overwrite the already defined SPARK_HOME Key: SPARK-2404 URL: https://issues.apache.org/jira/browse/SPARK-2404 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Nan Zhu Fix For: 1.0.1 in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2404) spark-submit and spark-class may overwrite the already defined SPARK_HOME
[ https://issues.apache.org/jira/browse/SPARK-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-2404: --- Description: in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined Our scenario we have a login portal for all the team members to use the spark-cluster, everyone gets an account and home directory spark-1.0 is copied in the root path /, every account gets a software link to the /spark-1.0 in it's home directory spark 1.0 is deployed in a cluster, where the user name is different with any accounts except one, say nanzhu, in login portal, so when the user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-class.sh, which does not exist. We set a global SPARK_HOME to /home/nanzhu/spark-1.0 globally which is consistent with the remote cluster setup, but unfortunately, this is overwritten by the spark-class and spark-submit was: in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined spark-submit and spark-class may overwrite the already defined SPARK_HOME - Key: SPARK-2404 URL: https://issues.apache.org/jira/browse/SPARK-2404 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Fix For: 1.0.1 in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined Our scenario we have a login portal for all the team members to use the spark-cluster, everyone gets an account and home directory spark-1.0 is copied in the root path /, every account gets a software link to the /spark-1.0 in it's home directory spark 1.0 is deployed in a cluster, where the user name is different with any accounts except one, say nanzhu, in login portal, so when the user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-class.sh, which does not exist. We set a global SPARK_HOME to /home/nanzhu/spark-1.0 globally which is consistent with the remote cluster setup, but unfortunately, this is overwritten by the spark-class and spark-submit -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2404) spark-submit and spark-class may overwrite the already defined SPARK_HOME
[ https://issues.apache.org/jira/browse/SPARK-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055165#comment-14055165 ] Nan Zhu commented on SPARK-2404: PR https://github.com/apache/spark/pull/1331 spark-submit and spark-class may overwrite the already defined SPARK_HOME - Key: SPARK-2404 URL: https://issues.apache.org/jira/browse/SPARK-2404 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Fix For: 1.0.1 in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined Our scenario we have a login portal for all the team members to use the spark-cluster, everyone gets an account and home directory spark-1.0 is copied in the root path /, every account gets a software link to the /spark-1.0 in it's home directory spark 1.0 is deployed in a cluster, where the user name is different with any accounts except one, say nanzhu, in login portal, so when the user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-class.sh, which does not exist. We set a global SPARK_HOME to /home/nanzhu/spark-1.0 globally which is consistent with the remote cluster setup, but unfortunately, this is overwritten by the spark-class and spark-submit -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2404) spark-submit and spark-class may overwrite the already defined SPARK_HOME
[ https://issues.apache.org/jira/browse/SPARK-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-2404: --- Description: in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined Our scenario we have a login portal for all the team members to use the spark-cluster, everyone gets an account and home directory spark-1.0 is copied in the root path /, every account gets a soft link to the /spark-1.0 in it's home directory spark 1.0 is deployed in a cluster, where the user name is different with any accounts except one, say nanzhu, in login portal, so when the user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-class.sh, which does not exist. We set a global SPARK_HOME to /home/nanzhu/spark-1.0 globally which is consistent with the remote cluster setup, but unfortunately, this is overwritten by the spark-class and spark-submit was: in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined Our scenario we have a login portal for all the team members to use the spark-cluster, everyone gets an account and home directory spark-1.0 is copied in the root path /, every account gets a software link to the /spark-1.0 in it's home directory spark 1.0 is deployed in a cluster, where the user name is different with any accounts except one, say nanzhu, in login portal, so when the user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-class.sh, which does not exist. We set a global SPARK_HOME to /home/nanzhu/spark-1.0 globally which is consistent with the remote cluster setup, but unfortunately, this is overwritten by the spark-class and spark-submit spark-submit and spark-class may overwrite the already defined SPARK_HOME - Key: SPARK-2404 URL: https://issues.apache.org/jira/browse/SPARK-2404 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Fix For: 1.0.1 in spark-class and spark-submit the SPARK_HOME is set to the present working directory, causing the value of already defined SPARK_HOME being overwritten we should not overwrite that if SPARK_HOME has been defined Our scenario we have a login portal for all the team members to use the spark-cluster, everyone gets an account and home directory spark-1.0 is copied in the root path /, every account gets a soft link to the /spark-1.0 in it's home directory spark 1.0 is deployed in a cluster, where the user name is different with any accounts except one, say nanzhu, in login portal, so when the user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-class.sh, which does not exist. We set a global SPARK_HOME to /home/nanzhu/spark-1.0 globally which is consistent with the remote cluster setup, but unfortunately, this is overwritten by the spark-class and spark-submit -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2294) TaskSchedulerImpl and TaskSetManager do not properly prioritize which tasks get assigned to an executor
[ https://issues.apache.org/jira/browse/SPARK-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053329#comment-14053329 ] Nan Zhu commented on SPARK-2294: PR: https://github.com/apache/spark/pull/1313 TaskSchedulerImpl and TaskSetManager do not properly prioritize which tasks get assigned to an executor --- Key: SPARK-2294 URL: https://issues.apache.org/jira/browse/SPARK-2294 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Kay Ousterhout Assignee: Nan Zhu If an executor E is free, a task may be speculatively assigned to E when there are other tasks in the job that have not been launched (at all) yet. Similarly, a task without any locality preferences may be assigned to E when there was another NODE_LOCAL task that could have been scheduled. This happens because TaskSchedulerImpl calls TaskSetManager.resourceOffer (which in turn calls TaskSetManager.findTask) with increasing locality levels, beginning with PROCESS_LOCAL, followed by NODE_LOCAL, and so on until the highest currently allowed level. Now, supposed NODE_LOCAL is the highest currently allowed locality level. The first time findTask is called, it will be called with max level PROCESS_LOCAL; if it cannot find any PROCESS_LOCAL tasks, it will try to schedule tasks with no locality preferences or speculative tasks. As a result, speculative tasks or tasks with no preferences may be scheduled instead of NODE_LOCAL tasks. cc [~matei] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1525) TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1
[ https://issues.apache.org/jira/browse/SPARK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049054#comment-14049054 ] Nan Zhu commented on SPARK-1525: this is fixed by https://github.com/apache/spark/commit/f8111eaeb0e35f6aa9b1e3ec1173fff207174155 TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1 Key: SPARK-1525 URL: https://issues.apache.org/jira/browse/SPARK-1525 Project: Spark Issue Type: Bug Components: Spark Core Reporter: YanTang Zhai Priority: Minor TaskSchedulerImpl decreases availableCpus by 1 in resourceOffers process always even though spark.task.cpus is more than 1, which will schedule more tasks to some node when spark.task.cpus is more than 1. -- This message was sent by Atlassian JIRA (v6.2#6252)
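The accounting fix boils down to one line; a simplified sketch of the resourceOffers bookkeeping (variable names mirror TaskSchedulerImpl, but the loop itself is condensed and illustrative):
{code}
import scala.collection.mutable.ArrayBuffer

// Each launched task must consume spark.task.cpus cores, not a hard-coded 1,
// or nodes get oversubscribed when spark.task.cpus > 1.
def offerLoop(availableCpus: Array[Int], cpusPerTask: Int): Seq[Int] = {
  val launchedOn = ArrayBuffer.empty[Int]
  for (i <- availableCpus.indices) {
    while (availableCpus(i) >= cpusPerTask) {
      availableCpus(i) -= cpusPerTask // was: availableCpus(i) -= 1
      launchedOn += i
    }
  }
  launchedOn
}
{code}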
[jira] [Commented] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface
[ https://issues.apache.org/jira/browse/SPARK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045514#comment-14045514 ] Nan Zhu commented on SPARK-2126: PR: https://github.com/apache/spark/pull/1240 Move MapOutputTracker behind ShuffleManager interface - Key: SPARK-2126 URL: https://issues.apache.org/jira/browse/SPARK-2126 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Matei Zaharia Assignee: Nan Zhu This will require changing the interface between the DAGScheduler and MapOutputTracker to be method calls on the ShuffleManager instead. However, it will make it easier to do push-based shuffle and other ideas requiring changes to map output tracking. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2038) Don't shadow conf variable in saveAsHadoop functions
[ https://issues.apache.org/jira/browse/SPARK-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037739#comment-14037739 ] Nan Zhu commented on SPARK-2038: [~pwendell] Yeah, it's a good idea, I just submitted a new PR: https://github.com/apache/spark/pull/1137 Don't shadow conf variable in saveAsHadoop functions -- Key: SPARK-2038 URL: https://issues.apache.org/jira/browse/SPARK-2038 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Priority: Minor Labels: api-breaking Fix For: 1.1.0 This could lead to a lot of bugs. We should just change it to hadoopConf. I noticed this when reviewing SPARK-1677. -- This message was sent by Atlassian JIRA (v6.2#6252)
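An illustration of the hazard the ticket describes (the class below is a stand-in, not Spark's PairRDDFunctions): inside the method, conf silently refers to the Hadoop JobConf parameter, hiding any outer field of the same name; renaming the parameter to hadoopConf removes the ambiguity.
{code}
import org.apache.hadoop.mapred.JobConf

class Sketch(conf: Map[String, String]) { // stand-in for an outer SparkConf
  def saveAsHadoopDataset(conf: JobConf): Unit = {
    // `conf` here is the JobConf parameter, NOT the outer field of the
    // same name, which is easy to get wrong when both are in scope.
    println(conf.get("mapred.output.dir"))
  }
}
{code}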
[jira] [Commented] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface
[ https://issues.apache.org/jira/browse/SPARK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037746#comment-14037746 ] Nan Zhu commented on SPARK-2126: [~pwendell] Yes, [~markhamstra] just emailed me. I have been working on it for two evenings; it's a big change and I haven't made any significant changes yet, so I don't mind a core developer coming in to lead this, and I'm still willing to contribute anything I can Move MapOutputTracker behind ShuffleManager interface - Key: SPARK-2126 URL: https://issues.apache.org/jira/browse/SPARK-2126 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Matei Zaharia Assignee: Nan Zhu This will require changing the interface between the DAGScheduler and MapOutputTracker to be method calls on the ShuffleManager instead. However, it will make it easier to do push-based shuffle and other ideas requiring changes to map output tracking. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1471) Worker not recognize Driver state at standalone mode
[ https://issues.apache.org/jira/browse/SPARK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033984#comment-14033984 ] Nan Zhu commented on SPARK-1471: I will fix it right now Worker not recognize Driver state at standalone mode - Key: SPARK-1471 URL: https://issues.apache.org/jira/browse/SPARK-1471 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 0.9.0 Environment: standalone Reporter: shenhong When I run a spark job in standalone, ./bin/spark-class org.apache.spark.deploy.Client launch spark://v125050024.bja:7077 file:///home/yuling.sh/spark-0.9.0-incubating/examples/target/spark-examples_2.10-0.9.0-incubating.jar org.apache.spark.examples.SparkPi Here is the Worker log. 14/04/11 11:15:04 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val) scala.MatchError: FAILED (of class scala.Enumeration$Val) at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1471) Worker not recognize Driver state at standalone mode
[ https://issues.apache.org/jira/browse/SPARK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033986#comment-14033986 ] Nan Zhu commented on SPARK-1471: this has been fixed by https://github.com/apache/spark/commit/95e4c9c6fb153b7f0aa4c442c4bdb6552d326640 Worker not recognize Driver state at standalone mode - Key: SPARK-1471 URL: https://issues.apache.org/jira/browse/SPARK-1471 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 0.9.0 Environment: standalone Reporter: shenhong When I run a spark job in standalone, ./bin/spark-class org.apache.spark.deploy.Client launch spark://v125050024.bja:7077 file:///home/yuling.sh/spark-0.9.0-incubating/examples/target/spark-examples_2.10-0.9.0-incubating.jar org.apache.spark.examples.SparkPi Here is the Worker log. 14/04/11 11:15:04 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val) scala.MatchError: FAILED (of class scala.Enumeration$Val) at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2038) Don't shadow conf variable in saveAsHadoop functions
[ https://issues.apache.org/jira/browse/SPARK-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034739#comment-14034739 ] Nan Zhu commented on SPARK-2038: Ah, I see... that's fine... Don't shadow conf variable in saveAsHadoop functions -- Key: SPARK-2038 URL: https://issues.apache.org/jira/browse/SPARK-2038 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Priority: Minor Fix For: 1.1.0 This could lead to a lot of bugs. We should just change it to hadoopConf. I noticed this when reviewing SPARK-1677. -- This message was sent by Atlassian JIRA (v6.2#6252)
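For readers without the PR context, the hazard being discussed is ordinary variable shadowing: the saveAsHadoop* methods take a parameter named conf inside a scope that already has a conf. A hedged sketch (the class and field names here are illustrative, not the exact PairRDDFunctions code):

{code}
import org.apache.hadoop.mapred.JobConf

class SaveFunctions(val conf: JobConf) {
  // The parameter below shadows the field `conf`; everything in the method
  // body silently refers to the parameter, and reaching the field requires
  // an explicit `this.conf`. Renaming the parameter to `hadoopConf`, as the
  // issue suggests, removes the trap.
  def saveAsHadoopDataset(conf: JobConf): Unit = {
    println(conf.get("mapred.output.dir"))  // the parameter, not the field
  }
}
{code}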
[jira] [Commented] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface
[ https://issues.apache.org/jira/browse/SPARK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032319#comment-14032319 ] Nan Zhu commented on SPARK-2126: [~matei], how about assigning it to me? I'm interested in working on this, thanks! Move MapOutputTracker behind ShuffleManager interface - Key: SPARK-2126 URL: https://issues.apache.org/jira/browse/SPARK-2126 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Matei Zaharia This will require changing the interface between the DAGScheduler and MapOutputTracker to be method calls on the ShuffleManager instead. However, it will make it easier to do push-based shuffle and other ideas requiring changes to map output tracking. -- This message was sent by Atlassian JIRA (v6.2#6252)
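Since the issue text is abstract, here is a hedged sketch of the shape such an interface might take; the trait and method names below are purely illustrative, not a committed Spark API:

{code}
// Illustrative only: map-output registration and lookup expressed as methods
// on the shuffle manager, so the DAGScheduler never touches MapOutputTracker
// directly and alternative shuffle implementations (e.g. push-based shuffle)
// can track outputs their own way.
trait MapStatus
trait ShuffleManagerSketch {
  def registerShuffle(shuffleId: Int, numMaps: Int): Unit
  def registerMapOutput(shuffleId: Int, mapId: Int, status: MapStatus): Unit
  def getMapOutputStatuses(shuffleId: Int, reduceId: Int): Seq[MapStatus]
  def unregisterShuffle(shuffleId: Int): Unit
}
{code}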
[jira] [Commented] (SPARK-2039) Run hadoop output checks for all formats
[ https://issues.apache.org/jira/browse/SPARK-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031745#comment-14031745 ] Nan Zhu commented on SPARK-2039: PR: https://github.com/apache/spark/pull/1088 Run hadoop output checks for all formats Key: SPARK-2039 URL: https://issues.apache.org/jira/browse/SPARK-2039 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Now that SPARK-1677 allows users to disable output checks, we should just run them for all types of output formats. I'm not sure why we didn't do this originally but it might have been out of defensiveness since we weren't sure what all implementations did. -- This message was sent by Atlassian JIRA (v6.2#6252)
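For context, the switch introduced by SPARK-1677 is spark.hadoop.validateOutputSpecs. A small usage sketch (the output path is a placeholder):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// With spark.hadoop.validateOutputSpecs at its default of true, saving to an
// existing path fails fast with a FileAlreadyExistsException; setting it to
// false skips the check and restores the old behaviour.
val conf = new SparkConf()
  .setAppName("output-check-demo")
  .setMaster("local")
  .set("spark.hadoop.validateOutputSpecs", "false")
val sc = new SparkContext(conf)
sc.parallelize(1 to 10).saveAsTextFile("/tmp/existing-output-dir")  // placeholder path
{code}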
[jira] [Commented] (SPARK-2038) Don't shadow conf variable in saveAsHadoop functions
[ https://issues.apache.org/jira/browse/SPARK-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031746#comment-14031746 ] Nan Zhu commented on SPARK-2038: PR: https://github.com/apache/spark/pull/1087 Don't shadow conf variable in saveAsHadoop functions -- Key: SPARK-2038 URL: https://issues.apache.org/jira/browse/SPARK-2038 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Priority: Minor This could lead to a lot of bugs. We should just change it to hadoopConf. I noticed this when reviewing SPARK-1677. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-732) Recomputation of RDDs may result in duplicated accumulator updates
[ https://issues.apache.org/jira/browse/SPARK-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029503#comment-14029503 ] Nan Zhu commented on SPARK-732: --- I actually made a PR a long time ago: https://github.com/apache/spark/pull/228 Recomputation of RDDs may result in duplicated accumulator updates -- Key: SPARK-732 URL: https://issues.apache.org/jira/browse/SPARK-732 Project: Spark Issue Type: Bug Affects Versions: 0.7.0, 0.6.2, 0.7.1, 0.8.0, 0.7.2, 0.7.3, 0.8.1, 0.8.2, 0.9.0 Reporter: Josh Rosen Assignee: Nan Zhu Currently, Spark doesn't guard against duplicated updates to the same accumulator due to recomputations of an RDD. For example: {code} val acc = sc.accumulator(0) data.map(x => { acc += 1; f(x) }) data.count() // acc should equal data.count() here data.foreach{...} // Now, acc = 2 * data.count() because the map() was recomputed. {code} I think that this behavior is incorrect, especially because this behavior allows the addition or removal of a cache() call to affect the outcome of a computation. There's an old TODO to fix this duplicate update issue in the [DAGScheduler code|https://github.com/mesos/spark/blob/ec5e553b418be43aa3f0ccc24e0d5ca9d63504b2/core/src/main/scala/spark/scheduler/DAGScheduler.scala#L494]. I haven't tested whether recomputation due to blocks being dropped from the cache can trigger duplicate accumulator updates. Hypothetically someone could be relying on the current behavior to implement performance counters that track the actual number of computations performed (including recomputations). To be safe, we could add an explicit warning in the release notes that documents the change in behavior when we fix this. Ignoring duplicate updates shouldn't be too hard, but there are a few subtleties. Currently, we allow accumulators to be used in multiple transformations, so we'd need to detect duplicate updates at the per-transformation level. I haven't dug too deeply into the scheduler internals, but we might also run into problems where pipelining causes what is logically one set of accumulator updates to show up in two different tasks (e.g. rdd.map(x => { accum += x; ... }) and rdd.map(x => { accum += x; ... }).count() may cause what's logically the same accumulator update to be applied from two different contexts, complicating the detection of duplicate updates). -- This message was sent by Atlassian JIRA (v6.2#6252)
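Until the duplicate-update issue is resolved, the usual way to sidestep it (later spelled out in the Spark programming guide) is to update accumulators only inside actions, whose updates are applied once per successful task, rather than inside transformations that may be recomputed. A hedged sketch, reusing sc, data, and f from the example above:

{code}
// Workaround sketch: keep the transformation pure and do the counting in an
// action, so recomputing `mapped` for another action cannot double-count.
val acc = sc.accumulator(0)
val mapped = data.map(f)        // no accumulator update inside the transformation
mapped.foreach(_ => acc += 1)   // action-side update, applied once per task
{code}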
[jira] [Created] (SPARK-1976) misleading streaming document
Nan Zhu created SPARK-1976: -- Summary: misleading streaming document Key: SPARK-1976 URL: https://issues.apache.org/jira/browse/SPARK-1976 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Nan Zhu Fix For: 1.0.1 Spark Streaming requires at least two worker threads, but the document gives an example like: import org.apache.spark.api.java.function._ import org.apache.spark.streaming._ import org.apache.spark.streaming.api._ // Create a StreamingContext with a local master val ssc = new StreamingContext("local", "NetworkWordCount", Seconds(1)) http://spark.apache.org/docs/latest/streaming-programming-guide.html -- This message was sent by Atlassian JIRA (v6.2#6252)
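The fix the docs eventually adopted is to ask for at least two local threads, one for the receiver and one for processing. Restoring the string literals, a corrected example would look like:

{code}
import org.apache.spark.streaming._

// "local[2]" gives the receiver its own thread plus one for processing;
// a bare "local" starves the processing side and the job appears to hang.
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
{code}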
[jira] [Closed] (SPARK-1821) Document History Server
[ https://issues.apache.org/jira/browse/SPARK-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu closed SPARK-1821. -- Resolution: Implemented sorry, missed some documents Document History Server --- Key: SPARK-1821 URL: https://issues.apache.org/jira/browse/SPARK-1821 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Nan Zhu In 1.0, there is a new component, history server, which is not mentioned in http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/ I think we'd better add the missing document -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1821) Document History Server
Nan Zhu created SPARK-1821: -- Summary: Document History Server Key: SPARK-1821 URL: https://issues.apache.org/jira/browse/SPARK-1821 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.0 Reporter: Nan Zhu In 1.0, there is a new component in the standalone mode, history server, which is not mentioned in http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/spark-standalone.html I think we'd better add the missing document -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1821) Document History Server
[ https://issues.apache.org/jira/browse/SPARK-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-1821: --- Issue Type: Improvement (was: Bug) Document History Server --- Key: SPARK-1821 URL: https://issues.apache.org/jira/browse/SPARK-1821 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Nan Zhu In 1.0, there is a new component in the standalone mode, history server, which is not mentioned in http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/spark-standalone.html I think we'd better add the missing document -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1771) CoarseGrainedSchedulerBackend is not resilient to Akka restarts
[ https://issues.apache.org/jira/browse/SPARK-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995962#comment-13995962 ] Nan Zhu commented on SPARK-1771: [~Aaron Davidson], I think there are basically two ways to fix this bug, depending on whether we want to allow restarting the driver: 1. if we allow restarting, we may need something similar to the PersistenceEngine in the deploy package; 2. if not, we can introduce a supervisor actor to stop the DriverActor and kill the executors, similar to what we just did in the DAGScheduler (see the sketch after this message). CoarseGrainedSchedulerBackend is not resilient to Akka restarts --- Key: SPARK-1771 URL: https://issues.apache.org/jira/browse/SPARK-1771 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Aaron Davidson The exception reported in SPARK-1769 was propagated through the CoarseGrainedSchedulerBackend, and caused an Actor restart of the DriverActor. Unfortunately, this actor does not seem to have been written with Akka restartability in mind. For instance, the new DriverActor has lost all state about the prior Executors without cleanly disconnecting them. This means that the driver actually has executors attached to it, but doesn't think it does, which leads to mayhem of various sorts. -- This message was sent by Atlassian JIRA (v6.2#6252)
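A hedged sketch of option 2 from the comment above: a supervising actor whose strategy stops a failed child instead of letting Akka's default Restart directive bring up a fresh actor that has forgotten its executors. All names are illustrative, not the actual scheduler code:

{code}
import akka.actor.{Actor, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Stop

// Illustrative stand-in for the scheduler's driver actor.
class DriverActor extends Actor {
  def receive = { case _ => /* handle scheduler messages */ }
}

class DriverSupervisor extends Actor {
  // Stop the child on any failure rather than restarting it; a restarted
  // DriverActor would silently lose its executor bookkeeping, which is
  // exactly the bug described in this issue.
  override val supervisorStrategy = OneForOneStrategy() {
    case _: Exception => Stop
  }
  private val driver = context.actorOf(Props[DriverActor], "DriverActor")
  def receive = { case msg => driver forward msg }
}
{code}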
[jira] [Assigned] (SPARK-1603) flaky test case in StreamingContextSuite
[ https://issues.apache.org/jira/browse/SPARK-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu reassigned SPARK-1603: -- Assignee: Nan Zhu flaky test case in StreamingContextSuite Key: SPARK-1603 URL: https://issues.apache.org/jira/browse/SPARK-1603 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 0.9.0, 1.0.0, 0.9.1 Reporter: Nan Zhu Assignee: Nan Zhu When Jenkins was testing 5 PRs at the same time, the test results in my PR showed that the "stop gracefully" test in StreamingContextSuite failed; the stack trace is as follows: {quote} stop gracefully *** FAILED *** (8 seconds, 350 milliseconds) [info] akka.actor.InvalidActorNameException: actor name [JobScheduler] is not unique! [info] at akka.actor.dungeon.ChildrenContainer$TerminatingChildrenContainer.reserve(ChildrenContainer.scala:192) [info] at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77) [info] at akka.actor.ActorCell.reserveChild(ActorCell.scala:338) [info] at akka.actor.dungeon.Children$class.makeChild(Children.scala:186) [info] at akka.actor.dungeon.Children$class.attachChild(Children.scala:42) [info] at akka.actor.ActorCell.attachChild(ActorCell.scala:338) [info] at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:518) [info] at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:57) [info] at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:434) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$14$$anonfun$apply$mcV$sp$3.apply$mcVI$sp(StreamingContextSuite.scala:174) [info] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$14.apply$mcV$sp(StreamingContextSuite.scala:163) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$14.apply(StreamingContextSuite.scala:159) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$14.apply(StreamingContextSuite.scala:159) [info] at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1974) [info] at org.apache.spark.streaming.StreamingContextSuite.withFixture(StreamingContextSuite.scala:34) [info] at org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262) [info] at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271) [info] at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198) [info] at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271) [info] at org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingContextSuite.scala:34) [info] at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:171) [info] at org.apache.spark.streaming.StreamingContextSuite.runTest(StreamingContextSuite.scala:34) [info] at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304) [info] at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304) [info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260) [info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326) [info] at
org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304) [info] at org.apache.spark.streaming.StreamingContextSuite.runTests(StreamingContextSuite.scala:34) [info] at org.scalatest.Suite$class.run(Suite.scala:2303) [info] at org.apache.spark.streaming.StreamingContextSuite.org$scalatest$FunSuite$$super$run(StreamingContextSuite.scala:34) [info] at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310) [info] at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:362) [info] at org.scalatest.FunSuite$class.run(FunSuite.scala:1310) [info] at org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingContextSuite.scala:34) [info] at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:208) [info] at org.apache.spark.streaming.StreamingContextSuite.run(StreamingContextSuite.scala:34) [info] at org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214) [info] at