[jira] [Updated] (SPARK-4883) Add a name to the directoryCleaner thread
[ https://issues.apache.org/jira/browse/SPARK-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4883: - Affects Version/s: 1.2.0 > Add a name to the directoryCleaner thread > - > > Key: SPARK-4883 > URL: https://issues.apache.org/jira/browse/SPARK-4883 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 1.2.0 >Reporter: Shixiong Zhu >Priority: Minor > Fix For: 1.3.0, 1.2.1 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
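Editor's note: the change in SPARK-4883 is a small quality-of-life fix: give the shuffle directory cleanup thread a descriptive name so it is easy to identify in thread dumps. Below is a minimal sketch of the idea; the thread name, executor type, and placement are illustrative assumptions, not the actual Spark patch.

{code}
import java.util.concurrent.{Executors, ThreadFactory}

// A ThreadFactory that produces a named daemon thread, so the directory
// cleaner shows up clearly in jstack output instead of as "pool-N-thread-1".
object NamedCleanerThreadExample {
  private val factory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "spark-shuffle-directory-cleaner")
      t.setDaemon(true)
      t
    }
  }

  val cleanerExecutor = Executors.newSingleThreadScheduledExecutor(factory)

  def main(args: Array[String]): Unit = {
    cleanerExecutor.submit(new Runnable {
      override def run(): Unit =
        println(s"cleanup running on thread: ${Thread.currentThread().getName}")
    })
    cleanerExecutor.shutdown()
  }
}
{code}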
[jira] [Updated] (SPARK-4733) Add missing parameter comments in ShuffleDependency
[ https://issues.apache.org/jira/browse/SPARK-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4733: - Affects Version/s: 1.2.0 > Add missing parameter comments in ShuffleDependency > -- > > Key: SPARK-4733 > URL: https://issues.apache.org/jira/browse/SPARK-4733 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > Add missing Javadoc comments in ShuffleDependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-4733) Add missing parameter comments in ShuffleDependency
[ https://issues.apache.org/jira/browse/SPARK-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4733. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Takeshi Yamamuro Target Version/s: 1.3.0 > Add missing parameter comments in ShuffleDependency > -- > > Key: SPARK-4733 > URL: https://issues.apache.org/jira/browse/SPARK-4733 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Trivial > Fix For: 1.3.0 > > > Add missing Javadoc comments in ShuffleDependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
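Editor's note: the SPARK-4733 fix is documentation only, i.e. Scaladoc {{@param}} tags on the ShuffleDependency constructor. The sketch below uses a simplified stand-in class, since the real constructor has more parameters and type bounds; the parameter names are borrowed from the real class, everything else is illustrative.

{code}
/**
 * Toy stand-in for ShuffleDependency, used only to show the @param comments.
 *
 * @param partitioner    partitioner used to partition the shuffle output
 * @param keyOrdering    optional key ordering, used for sort-based shuffles
 * @param aggregator     optional map/reduce-side aggregator for combining values
 * @param mapSideCombine whether to apply the aggregator on the map side
 */
class DocumentedShuffleDependency[K, V](
    val partitioner: K => Int,
    val keyOrdering: Option[Ordering[K]] = None,
    val aggregator: Option[(V, V) => V] = None,
    val mapSideCombine: Boolean = false)
{code}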
[jira] [Closed] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4447. Resolution: Fixed Fix Version/s: 1.3.0 > Remove layers of abstraction in YARN code no longer needed after dropping > yarn-alpha > > > Key: SPARK-4447 > URL: https://issues.apache.org/jira/browse/SPARK-4447 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.3.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > > For example, YarnRMClient and YarnRMClientImpl can be merged > YarnAllocator and YarnAllocationHandler can be merged -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
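Editor's note: the refactoring SPARK-4447 describes is the usual collapse of a trait/impl split once only one implementation remains; with yarn-alpha gone, the interface layer can be folded into the concrete class. A schematic sketch with hypothetical names (not the actual Spark YARN classes):

{code}
// Before (sketch): an interface kept only so yarn-alpha and yarn-stable
// could provide different implementations.
trait RMClient {
  def register(driverHost: String): Unit
}

class RMClientImpl extends RMClient {
  override def register(driverHost: String): Unit =
    println(s"registering application master at $driverHost")
}

// After (sketch): with a single implementation remaining, the indirection
// is removed and callers use the concrete class directly.
class MergedRMClient {
  def register(driverHost: String): Unit =
    println(s"registering application master at $driverHost")
}
{code}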
[jira] [Updated] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4447: - Priority: Critical (was: Major) > Remove layers of abstraction in YARN code no longer needed after dropping > yarn-alpha > > > Key: SPARK-4447 > URL: https://issues.apache.org/jira/browse/SPARK-4447 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.3.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza >Priority: Critical > Fix For: 1.3.0 > > > For example, YarnRMClient and YarnRMClientImpl can be merged > YarnAllocator and YarnAllocationHandler can be merged -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-2458) Make failed application log visible on History Server
[ https://issues.apache.org/jira/browse/SPARK-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-2458. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Masayoshi TSUZUKI Target Version/s: 1.3.0 > Make failed application log visible on History Server > - > > Key: SPARK-2458 > URL: https://issues.apache.org/jira/browse/SPARK-2458 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Masayoshi TSUZUKI >Assignee: Masayoshi TSUZUKI > Fix For: 1.3.0 > > > The History Server is very helpful for debugging application correctness & performance after the application has finished. However, when an application fails, its link is not listed on the History Server UI and its history can't be viewed. > It would be very useful if we could check the history of a failed application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4286) Support External Shuffle Service with Mesos integration
[ https://issues.apache.org/jira/browse/SPARK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4286: - Affects Version/s: 1.2.0 > Support External Shuffle Service with Mesos integration > --- > > Key: SPARK-4286 > URL: https://issues.apache.org/jira/browse/SPARK-4286 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 1.2.0 >Reporter: Timothy Chen >Assignee: Timothy Chen > > With the new external shuffle service added, we also need to make the Mesos integration able to launch the shuffle service and support auto-scaling executors. > The Mesos executor will launch the external shuffle service and leave it running, while Spark executors scale up and down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
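Editor's note: from the application side, using an external shuffle service with dynamic allocation is a configuration matter; the Mesos-specific work in SPARK-4286 is launching and keeping the service alive on each agent, which is not shown here. A hedged sketch of the application-side settings (the app name is a placeholder):

{code}
import org.apache.spark.{SparkConf, SparkContext}

object MesosShuffleServiceExample {
  def main(args: Array[String]): Unit = {
    // Sketch: settings an application would use once an external shuffle
    // service is running on every Mesos agent (service deployment not shown).
    val conf = new SparkConf()
      .setAppName("mesos-shuffle-service-example")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.enabled", "true")
    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 100).count())
    } finally {
      sc.stop()
    }
  }
}
{code}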
[jira] [Updated] (SPARK-4389) Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located behind NAT
[ https://issues.apache.org/jira/browse/SPARK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4389: - Affects Version/s: 1.2.0 > Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located > behind NAT > - > > Key: SPARK-4389 > URL: https://issues.apache.org/jira/browse/SPARK-4389 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Josh Rosen >Priority: Minor > > We should set {{akka.remote.netty.tcp.bind-hostname="0.0.0.0"}} in our Akka configuration so that Spark drivers can be located behind NATs / work with weird DNS setups. > This is blocked by upgrading our Akka version, since this configuration is not present in Akka 2.3.4. There might be a different approach / workaround that works on our current Akka version, though. > EDIT: this is blocked by Akka 2.4, since this feature is only available in the 2.4 snapshot release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
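Editor's note: the Akka 2.4 feature this ticket is waiting on separates the bind address from the advertised address, which is exactly what a NATed driver needs. A hedged sketch of that configuration built programmatically; the hostname value is a placeholder, and, per the ticket, Spark cannot use this until it is on Akka 2.4.

{code}
import com.typesafe.config.ConfigFactory

object BindHostnameSketch {
  // Sketch of the Akka 2.4 settings: bind to all interfaces, but advertise
  // the externally reachable address so a NATed driver can still be contacted.
  val akkaConf = ConfigFactory.parseString(
    """
      |akka.remote.netty.tcp {
      |  hostname = "driver.example.com"   # address other hosts use to reach the driver
      |  bind-hostname = "0.0.0.0"         # address the driver actually binds to
      |}
    """.stripMargin)

  def main(args: Array[String]): Unit =
    println(akkaConf.getString("akka.remote.netty.tcp.bind-hostname"))
}
{code}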
[jira] [Updated] (SPARK-4951) A busy executor may be killed when dynamicAllocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4951: - Affects Version/s: 1.2.0 > A busy executor may be killed when dynamicAllocation is enabled > --- > > Key: SPARK-4951 > URL: https://issues.apache.org/jira/browse/SPARK-4951 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Shixiong Zhu > > If a task runs more than `spark.dynamicAllocation.executorIdleTimeout`, the > executor which runs this task will be killed. > The following steps (yarn-client mode) can reproduce this bug: > 1. Start `spark-shell` using > {code} > ./bin/spark-shell --conf "spark.shuffle.service.enabled=true" \ > --conf "spark.dynamicAllocation.minExecutors=1" \ > --conf "spark.dynamicAllocation.maxExecutors=4" \ > --conf "spark.dynamicAllocation.enabled=true" \ > --conf "spark.dynamicAllocation.executorIdleTimeout=30" \ > --master yarn-client \ > --driver-memory 512m \ > --executor-memory 512m \ > --executor-cores 1 > {code} > 2. Wait more than 30 seconds until there is only one executor. > 3. Run the following code (a task needs at least 50 seconds to finish) > {code} > val r = sc.parallelize(1 to 1000, 20).map{t => Thread.sleep(1000); > t}.groupBy(_ % 2).collect() > {code} > 4. Executors will be killed and allocated all the time, which makes the Job > fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
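Editor's note: the underlying issue in SPARK-4951 is that the idle timer does not take running tasks into account. Below is a simplified sketch of the kind of bookkeeping that avoids it: track running tasks per executor and only start the idle clock once the count drops to zero. This illustrates the approach and is not the actual ExecutorAllocationManager code.

{code}
import scala.collection.mutable

// Sketch: an executor becomes a removal candidate only when it has no running
// tasks and has been idle longer than the timeout.
class IdleTracker(idleTimeoutMs: Long) {
  private val runningTasks = mutable.Map.empty[String, Int].withDefaultValue(0)
  private val idleSince = mutable.Map.empty[String, Long]

  def onTaskStart(executorId: String): Unit = synchronized {
    runningTasks(executorId) += 1
    idleSince.remove(executorId) // a busy executor has no idle deadline
  }

  def onTaskEnd(executorId: String): Unit = synchronized {
    runningTasks(executorId) -= 1
    if (runningTasks(executorId) <= 0) {
      idleSince(executorId) = System.currentTimeMillis()
    }
  }

  def executorsToRemove(now: Long = System.currentTimeMillis()): Seq[String] = synchronized {
    idleSince.collect { case (id, since) if now - since >= idleTimeoutMs => id }.toSeq
  }
}
{code}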
[jira] [Updated] (SPARK-5522) Accelerate the History Server start
[ https://issues.apache.org/jira/browse/SPARK-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5522: - Affects Version/s: 1.0.0 > Accelerate the History Server start > --- > > Key: SPARK-5522 > URL: https://issues.apache.org/jira/browse/SPARK-5522 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 1.0.0 >Reporter: Liangliang Gu > Fix For: 1.4.0 > > > When starting the history server, all the log files will be fetched and parsed in order to get the applications' metadata, e.g. App Name, Start Time, Duration, etc. In our production cluster, there are 2600 log files (160G) in HDFS and it takes 3 hours to restart the history server, which is a little bit too long for us. > It would be better if the history server could show logs with missing information during start-up and fill in the missing information after fetching and parsing a log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5522) Accelerate the History Server start
[ https://issues.apache.org/jira/browse/SPARK-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5522: - Assignee: Liangliang Gu > Accelerate the History Server start > --- > > Key: SPARK-5522 > URL: https://issues.apache.org/jira/browse/SPARK-5522 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 1.0.0 >Reporter: Liangliang Gu >Assignee: Liangliang Gu > Fix For: 1.4.0 > > > When starting the history server, all the log files will be fetched and parsed in order to get the applications' metadata, e.g. App Name, Start Time, Duration, etc. In our production cluster, there are 2600 log files (160G) in HDFS and it takes 3 hours to restart the history server, which is a little bit too long for us. > It would be better if the history server could show logs with missing information during start-up and fill in the missing information after fetching and parsing a log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5522) Accelerate the History Server start
[ https://issues.apache.org/jira/browse/SPARK-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5522: - Fix Version/s: 1.4.0 > Accelerate the History Server start > --- > > Key: SPARK-5522 > URL: https://issues.apache.org/jira/browse/SPARK-5522 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 1.0.0 >Reporter: Liangliang Gu > Fix For: 1.4.0 > > > When starting the history server, all the log files will be fetched and parsed in order to get the applications' metadata, e.g. App Name, Start Time, Duration, etc. In our production cluster, there are 2600 log files (160G) in HDFS and it takes 3 hours to restart the history server, which is a little bit too long for us. > It would be better if the history server could show logs with missing information during start-up and fill in the missing information after fetching and parsing a log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
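Editor's note: the SPARK-5522 proposal amounts to registering every application immediately with placeholder metadata and filling in the details as each log is parsed in the background. A hedged sketch of that pattern (names and structure are illustrative, not the actual history provider code):

{code}
import java.util.concurrent.{ConcurrentHashMap, Executors}
import scala.collection.JavaConverters._

case class AppInfo(id: String, name: Option[String], startTime: Option[Long])

// Sketch: list every log file up front with placeholder metadata, then let a
// background pool parse each file and fill in the real values.
class LazyHistoryIndex(parse: String => AppInfo) {
  private val apps = new ConcurrentHashMap[String, AppInfo]()
  private val pool = Executors.newFixedThreadPool(4)

  def register(logFile: String): Unit = {
    apps.put(logFile, AppInfo(logFile, None, None)) // visible immediately
    pool.submit(new Runnable {
      override def run(): Unit = apps.put(logFile, parse(logFile))
    })
  }

  def listing: Seq[AppInfo] = apps.values().asScala.toSeq
}
{code}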
[jira] [Closed] (SPARK-4777) Some block memory after unrollSafely not count into used memory(memoryStore.entrys or unrollMemory)
[ https://issues.apache.org/jira/browse/SPARK-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4777. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: SuYan Target Version/s: 1.4.0 > Some block memory after unrollSafely not count into used > memory(memoryStore.entrys or unrollMemory) > --- > > Key: SPARK-4777 > URL: https://issues.apache.org/jira/browse/SPARK-4777 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: SuYan >Assignee: SuYan > Fix For: 1.4.0 > > > Some memory is not counted in the memory used by memoryStore or unrollMemory. > After thread A finishes unrollSafely, it releases its 40MB of unrollMemory (which can then be used by other threads). Thread A then waits to acquire the accountingLock so it can tryToPut blockA (30MB). Before thread A acquires the accountingLock, blockA's size is not counted in unrollMemory or memoryStore.currentMemory. > > IIUC, freeMemory should subtract that block's memory -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
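Editor's note: the race above can be reduced to a toy accounting model. Between releasing the unroll memory and acquiring the accounting lock for the put, the block's bytes are tracked nowhere, so a free-memory calculation made in that window over-counts. The sizes and structure below are illustrative only, not the MemoryStore implementation.

{code}
// Toy model of the accounting gap (single-threaded on purpose, just to show
// which counter holds the block's bytes at each step).
object UnrollAccountingGap {
  val maxMemory = 100L
  var unrollMemory = 0L   // memory reserved while unrolling blocks
  var storedMemory = 0L   // memory held by blocks already in the store
  def freeMemory: Long = maxMemory - unrollMemory - storedMemory

  def main(args: Array[String]): Unit = {
    val blockSize = 30L
    unrollMemory += blockSize   // unrollSafely reserves unroll memory
    unrollMemory -= blockSize   // ...and releases it before tryToPut runs
    // Window: the 30 bytes of the unrolled block are counted nowhere, so
    // freeMemory is overstated and another thread could claim the same space.
    println(s"freeMemory during the window: $freeMemory")
    storedMemory += blockSize   // tryToPut finally accounts for the block
    println(s"freeMemory after the put: $freeMemory")
  }
}
{code}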
[jira] [Created] (SPARK-6132) Context cleaner thread lives across SparkContexts
Andrew Or created SPARK-6132: Summary: Context cleaner thread lives across SparkContexts Key: SPARK-6132 URL: https://issues.apache.org/jira/browse/SPARK-6132 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Andrew Or Assignee: Andrew Or The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. The right behavior is to wait until all currently running clean up tasks have finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
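Editor's note: the fix the description calls for is that stop() should not just flip a flag; it should also wait for the cleaning thread to finish any in-flight work before returning. A minimal hedged sketch of that shape (not the actual ContextCleaner code):

{code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Sketch: a cleaner whose stop() joins the worker thread, so no cleanup task
// can keep running against state that belongs to a newer SparkContext.
class SketchCleaner {
  @volatile private var stopped = false
  private val tasks = new LinkedBlockingQueue[() => Unit]()

  private val thread = new Thread("sketch-context-cleaner") {
    override def run(): Unit = {
      while (!stopped) {
        val task = tasks.poll(100, TimeUnit.MILLISECONDS)
        if (task != null) task()
      }
    }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()
  def submit(task: () => Unit): Unit = tasks.put(task)

  def stop(): Unit = {
    stopped = true
    thread.join() // wait for any in-flight cleanup to finish before returning
  }
}
{code}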
[jira] [Created] (SPARK-6133) SparkContext#stop is not idempotent
Andrew Or created SPARK-6133: Summary: SparkContext#stop is not idempotent Key: SPARK-6133 URL: https://issues.apache.org/jira/browse/SPARK-6133 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Andrew Or If we call sc.stop() twice, the listener bus will attempt to log an event after stop() is called, resulting in a scary error message. This happens if Spark calls sc.stop() internally (it does this on certain error conditions) and the application code calls it again, for instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
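Editor's note: a standard way to make a stop() method idempotent is a compare-and-set on an atomic flag, so only the first caller runs the shutdown logic and later callers (including Spark's own internal stop on error) become no-ops. A hedged sketch, not the actual SparkContext code:

{code}
import java.util.concurrent.atomic.AtomicBoolean

// Sketch: only the first call to stop() performs the shutdown; repeated calls
// are silently ignored instead of logging events on a stopped listener bus.
class StoppableService {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      println("stopping listener bus and other services")
    }
  }
}

object StoppableServiceDemo extends App {
  val service = new StoppableService
  service.stop()
  service.stop() // second call is a harmless no-op
}
{code}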
[jira] [Updated] (SPARK-6132) Context cleaner thread lives across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Description: The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. This is likely to be the cause of the `JavaAPISuite`, which creates many back-to-back SparkContexts, being flaky: {code} java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) ... Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) {code} The right behavior is to wait until all currently running clean up tasks have finished. was: The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
This is likely to be the cause of the `JavaAPISuite`, which creates many back-to-back SparkContexts, being flaky: {code} java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) {code} The right behavior is to wait until all currently running clean up tasks have finished. > Context cleaner thread lives across SparkContexts > - > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Andrew Or > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky: > {code} > java.io.IOException: org.apache.s
[jira] [Updated] (SPARK-6132) Context cleaner thread lives across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Description: The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. This is likely to be the cause of the `JavaAPISuite`, which creates many back-to-back SparkContexts, being flaky. The right behavior is to wait until all currently running clean up tasks have finished. {code} java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) ... Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) {code} was: The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. This is likely to be the cause of the `JavaAPISuite`, which creates many back-to-back SparkContexts, being flaky: {code} java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) ... Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) {code} The right behavior is to wait until all currently running clean up tasks have finished. > Context cleaner thread lives across SparkContexts > - > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Andrew Or > > The context cleaner thread is not stopped properly. 
If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException:
[jira] [Updated] (SPARK-6132) Context cleaner thread lives across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Description: The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. This is likely to be the cause of the `JavaAPISuite`, which creates many back-to-back SparkContexts, being flaky: {code} java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) {code} The right behavior is to wait until all currently running clean up tasks have finished. was: The context cleaner thread is not stopped properly. If a SparkContext is started immediately after one stops, the context cleaner of the former can clean variables in the latter. This is because the cleaner.stop() just sets a flag and expects the thread to terminate asynchronously, but the code to clean broadcasts goes through `SparkEnv.get.blockManager`, which could belong to a different SparkContext. The right behavior is to wait until all currently running clean up tasks have finished. > Context cleaner thread lives across SparkContexts > - > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Andrew Or > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky: > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) > at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) > at org.apache.spark.scheduler.Task.run(Task.scala:64) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java
[jira] [Closed] (SPARK-6020) Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite
[ https://issues.apache.org/jira/browse/SPARK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6020. Resolution: Fixed > Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite > - > > Key: SPARK-6020 > URL: https://issues.apache.org/jira/browse/SPARK-6020 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Cheng Lian >Priority: Critical > > Observed in the following builds, only one of which has something to do with > SQL: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/ > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27930/ > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27929/ > org.apache.spark.sql.columnar.PartitionBatchPruningSuite.SELECT key FROM > pruningData WHERE NOT (key IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, > 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)) > {code} > Error Message > 8 did not equal 10 Wrong number of read batches: == Parsed Logical Plan == > 'Project ['key] 'Filter NOT 'key IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) >'UnresolvedRelation [pruningData], None == Analyzed Logical Plan == > Project [key#5245] Filter NOT key#5245 IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) >LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions > at ExistingRDD.scala:35 == Optimized Logical Plan == Project [key#5245] > Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) >InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData) == Physical > Plan == Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) > InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)], > (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData)) Code > Generation: false == RDD == > Stacktrace > sbt.ForkMain$ForkError: 8 did not equal 10 Wrong number of read batches: == > Parsed Logical Plan == > 'Project ['key] > 'Filter NOT 'key IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) > 'UnresolvedRelation [pruningData], None > == Analyzed Logical Plan == > Project [key#5245] > Filter NOT key#5245 IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) > LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions > at ExistingRDD.scala:35 > == Optimized Logical Plan == > Project [key#5245] > Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) > InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData) > == Physical Plan == > Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) > InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET > 
(5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)], > (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData)) > Code Generation: false > == RDD == > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply$mcV$sp(PartitionBatchPruningSuite.scala:119) > at > org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.scala:107) > at > org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.scala:107) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$cl
[jira] [Commented] (SPARK-6020) Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite
[ https://issues.apache.org/jira/browse/SPARK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344628#comment-14344628 ] Andrew Or commented on SPARK-6020: -- Ok, I will close this as resolved for now. We can always reopen it if it's flaky again. Thanks Josh. > Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite > - > > Key: SPARK-6020 > URL: https://issues.apache.org/jira/browse/SPARK-6020 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Cheng Lian >Priority: Critical > > Observed in the following builds, only one of which has something to do with > SQL: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/ > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27930/ > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27929/ > org.apache.spark.sql.columnar.PartitionBatchPruningSuite.SELECT key FROM > pruningData WHERE NOT (key IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, > 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)) > {code} > Error Message > 8 did not equal 10 Wrong number of read batches: == Parsed Logical Plan == > 'Project ['key] 'Filter NOT 'key IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) >'UnresolvedRelation [pruningData], None == Analyzed Logical Plan == > Project [key#5245] Filter NOT key#5245 IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) >LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions > at ExistingRDD.scala:35 == Optimized Logical Plan == Project [key#5245] > Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) >InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData) == Physical > Plan == Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) > InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)], > (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData)) Code > Generation: false == RDD == > Stacktrace > sbt.ForkMain$ForkError: 8 did not equal 10 Wrong number of read batches: == > Parsed Logical Plan == > 'Project ['key] > 'Filter NOT 'key IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) > 'UnresolvedRelation [pruningData], None > == Analyzed Logical Plan == > Project [key#5245] > Filter NOT key#5245 IN > (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) > LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions > at ExistingRDD.scala:35 > == Optimized Logical Plan == > Project [key#5245] > Filter NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) > InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData) > == Physical Plan == > Filter NOT key#5245 INSET > 
(5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15) > InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET > (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)], > (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, > false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] > at mapPartitions at ExistingRDD.scala:35), Some(pruningData)) > Code Generation: false > == RDD == > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply$mcV$sp(PartitionBatchPruningSuite.scala:119) > at > org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.scala:107) > at > org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.
[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Summary: Context cleaner race condition across SparkContexts (was: Context cleaner thread lives across SparkContexts) > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Andrew Or > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Affects Version/s: (was: 1.3.0) 1.0.0 > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3859) Use consistent config names for duration (with units!)
[ https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365 ] Andrew Or commented on SPARK-3859: -- The problem is we keep adding more and more of these inconsistent policies because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I had forgotten that I already opened this issue a while ago. I'm closing this one in favor of the new one. > Use consistent config names for duration (with units!) > -- > > Key: SPARK-3859 > URL: https://issues.apache.org/jira/browse/SPARK-3859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Or > > There are many configs in Spark that refer to some unit of time. However, > from the first glance it is unclear what these units are. We should find a > consistent way to append the units to the end of these config names and > deprecate the old ones in favor of the more consistent ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)
[ https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365 ] Andrew Or edited comment on SPARK-3859 at 3/3/15 5:26 PM: -- The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I had forgotten that I already opened this issue a while ago. I'm closing this one in favor of the new one. was (Author: andrewor14): The problem is we keep adding more and more of these inconsistent policies because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I had forgotten that I already opened this issue a while ago. I'm closing this one in favor of the new one. > Use consistent config names for duration (with units!) > -- > > Key: SPARK-3859 > URL: https://issues.apache.org/jira/browse/SPARK-3859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Or > > There are many configs in Spark that refer to some unit of time. However, > from the first glance it is unclear what these units are. We should find a > consistent way to append the units to the end of these config names and > deprecate the old ones in favor of the more consistent ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)
[ https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365 ] Andrew Or edited comment on SPARK-3859 at 3/3/15 5:27 PM: -- The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I had forgotten that I had already opened this issue a while ago. I'm closing this one in favor of the new one. was (Author: andrewor14): The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I had forgotten that I already opened this issue a while ago. I'm closing this one in favor of the new one. > Use consistent config names for duration (with units!) > -- > > Key: SPARK-3859 > URL: https://issues.apache.org/jira/browse/SPARK-3859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Or > > There are many configs in Spark that refer to some unit of time. However, > from the first glance it is unclear what these units are. We should find a > consistent way to append the units to the end of these config names and > deprecate the old ones in favor of the more consistent ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3859) Use consistent config names for duration (with units!)
[ https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-3859. Resolution: Duplicate > Use consistent config names for duration (with units!) > -- > > Key: SPARK-3859 > URL: https://issues.apache.org/jira/browse/SPARK-3859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Or > > There are many configs in Spark that refer to some unit of time. However, > from the first glance it is unclear what these units are. We should find a > consistent way to append the units to the end of these config names and > deprecate the old ones in favor of the more consistent ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)
[ https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365 ] Andrew Or edited comment on SPARK-3859 at 3/3/15 5:27 PM: -- The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I forgot that I had already opened this issue a while ago. I'm closing this one in favor of the new one. was (Author: andrewor14): The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I had forgotten that I had already opened this issue a while ago. I'm closing this one in favor of the new one. > Use consistent config names for duration (with units!) > -- > > Key: SPARK-3859 > URL: https://issues.apache.org/jira/browse/SPARK-3859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Or > > There are many configs in Spark that refer to some unit of time. However, > from the first glance it is unclear what these units are. We should find a > consistent way to append the units to the end of these config names and > deprecate the old ones in favor of the more consistent ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)
[ https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365 ] Andrew Or edited comment on SPARK-3859 at 3/3/15 5:28 PM: -- The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config to ask them to follow. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I forgot that I had already opened this issue a while ago. I'm closing this one in favor of the new one. was (Author: andrewor14): The problem is we keep adding more and more of these inconsistent properties because there isn't really a guideline to follow. When I review other people's patches there isn't really the "correct" way to name a new config. We will have to deprecate the old ones in a nicer fashion than what we can do today, and this is why I opened SPARK-5933. HOWEVER this one is duplicated by a more specific one I opened recently. I forgot that I had already opened this issue a while ago. I'm closing this one in favor of the new one. > Use consistent config names for duration (with units!) > -- > > Key: SPARK-3859 > URL: https://issues.apache.org/jira/browse/SPARK-3859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Or > > There are many configs in Spark that refer to some unit of time. However, > from the first glance it is unclear what these units are. We should find a > consistent way to append the units to the end of these config names and > deprecate the old ones in favor of the more consistent ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
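Editor's note: one way to get the consistency SPARK-3859 asks for is to put the unit in the value rather than the config name: accept strings like "30s" or "5min" everywhere and parse them centrally. A hedged sketch of such a parser (illustrative only, not Spark's eventual implementation):

{code}
import java.util.concurrent.TimeUnit

// Sketch: parse duration strings such as "500ms", "30s", "5min" or "2h" into
// milliseconds, so individual config names no longer encode the unit implicitly.
object DurationConf {
  private val units = Map(
    "ms"  -> TimeUnit.MILLISECONDS,
    "s"   -> TimeUnit.SECONDS,
    "min" -> TimeUnit.MINUTES,
    "h"   -> TimeUnit.HOURS)

  def parseMs(value: String): Long = {
    val (digits, suffix) = value.trim.span(_.isDigit)
    val unit = units.getOrElse(suffix.trim, TimeUnit.SECONDS) // assumed default: seconds
    unit.toMillis(digits.toLong)
  }

  def main(args: Array[String]): Unit = {
    println(parseMs("30s"))  // 30000
    println(parseMs("5min")) // 300000
  }
}
{code}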
[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Labels: backport-needed (was: ) > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Labels: backport-needed > Fix For: 1.4.0 > > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Fix Version/s: 1.4.0 > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Labels: backport-needed > Fix For: 1.4.0 > > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Target Version/s: 1.4.0, 1.3.1 (was: 1.4.0) > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Labels: backport-needed > Fix For: 1.4.0 > > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6132: - Target Version/s: 1.1.2, 1.2.2, 1.4.0, 1.3.1 (was: 1.4.0, 1.3.1) > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Labels: backport-needed > Fix For: 1.4.0 > > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent
[ https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6133: - Target Version/s: 1.2.2, 1.4.0 (was: 1.4.0) > SparkContext#stop is not idempotent > --- > > Key: SPARK-6133 > URL: https://issues.apache.org/jira/browse/SPARK-6133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.2.2, 1.4.0 > > > If we call sc.stop() twice, the listener bus will attempt to log an event > after stop() is called, resulting in a scary error message. This happens if > Spark calls sc.stop() internally (it does this on certain error conditions) > and the application code calls it again, for instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
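For illustration, a minimal Scala sketch of an idempotent stop() (hypothetical class, not the actual SparkContext code): an atomic flag makes every call after the first a no-op, so nothing tries to post to an already-stopped listener bus.

{code}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: only the first caller of stop() performs the shutdown.
class StoppableContext {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    if (!stopped.compareAndSet(false, true)) return // later calls are no-ops
    // ... stop the listener bus, DAG scheduler, block manager, etc. ...
  }
}
{code}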
[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent
[ https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6133: - Target Version/s: 1.2.2, 1.4.0, 1.3.1 (was: 1.2.2, 1.4.0) > SparkContext#stop is not idempotent > --- > > Key: SPARK-6133 > URL: https://issues.apache.org/jira/browse/SPARK-6133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.2.2, 1.4.0 > > > If we call sc.stop() twice, the listener bus will attempt to log an event > after stop() is called, resulting in a scary error message. This happens if > Spark calls sc.stop() internally (it does this on certain error conditions) > and the application code calls it again, for instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent
[ https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6133: - Fix Version/s: 1.4.0 > SparkContext#stop is not idempotent > --- > > Key: SPARK-6133 > URL: https://issues.apache.org/jira/browse/SPARK-6133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.2.2, 1.4.0 > > > If we call sc.stop() twice, the listener bus will attempt to log an event > after stop() is called, resulting in a scary error message. This happens if > Spark calls sc.stop() internally (it does this on certain error conditions) > and the application code calls it again, for instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent
[ https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6133: - Fix Version/s: 1.2.2 > SparkContext#stop is not idempotent > --- > > Key: SPARK-6133 > URL: https://issues.apache.org/jira/browse/SPARK-6133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.2.2, 1.4.0 > > > If we call sc.stop() twice, the listener bus will attempt to log an event > after stop() is called, resulting in a scary error message. This happens if > Spark calls sc.stop() internally (it does this on certain error conditions) > and the application code calls it again, for instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
[ https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6144: - Assignee: Trystan Leftwich > When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail > --- > > Key: SPARK-6144 > URL: https://issues.apache.org/jira/browse/SPARK-6144 > Project: Spark > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Trystan Leftwich >Assignee: Trystan Leftwich >Priority: Blocker > > While in cluster mode if you use ADD JAR with a HDFS sourced jar it will fail > trying to source that jar on the worker nodes with the following error: > {code} > 15/03/03 04:56:50 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 > (TID 0) > java.io.FileNotFoundException: > /yarn/nm/usercache/vagrant/appcache/application_1425166832391_0027/-19222735701425358546704_cache > (No such file or directory) > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.(FileInputStream.java:146) > {code} > PR https://github.com/apache/spark/pull/4880 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
[ https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346396#comment-14346396 ] Andrew Or commented on SPARK-6144: -- I believe this is a regression from 1.2 caused by https://github.com/apache/spark/pull/3670. > When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail > --- > > Key: SPARK-6144 > URL: https://issues.apache.org/jira/browse/SPARK-6144 > Project: Spark > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Trystan Leftwich >Priority: Blocker > > While in cluster mode if you use ADD JAR with a HDFS sourced jar it will fail > trying to source that jar on the worker nodes with the following error: > {code} > 15/03/03 04:56:50 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 > (TID 0) > java.io.FileNotFoundException: > /yarn/nm/usercache/vagrant/appcache/application_1425166832391_0027/-19222735701425358546704_cache > (No such file or directory) > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.(FileInputStream.java:146) > {code} > PR https://github.com/apache/spark/pull/4880 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6159) Distinguish between inprogress and abnormal event log history
[ https://issues.apache.org/jira/browse/SPARK-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6159: - Affects Version/s: 1.0.0 > Distinguish between inprogress and abnormal event log history > - > > Key: SPARK-6159 > URL: https://issues.apache.org/jira/browse/SPARK-6159 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Liang-Chi Hsieh >Priority: Minor > > This is a follow-up to SPARK-6107. Currently, when an application is > terminated abnormally (e.g. Ctrl + C), its log file is still in ".inprogress" > format. SPARK-6107 makes the in-progress log readable to the SparkUI. > However, we should be able to distinguish between a genuinely in-progress > case and an abnormal termination. So this fix adds a shutdown hook to > EventLoggingListener and renames the ".inprogress" log to ".abnormal". > Then we can tell which case applies when reading the log in rebuildSparkUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
[ https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6144. Resolution: Fixed Fix Version/s: 1.3.0 > When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail > --- > > Key: SPARK-6144 > URL: https://issues.apache.org/jira/browse/SPARK-6144 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Trystan Leftwich >Assignee: Trystan Leftwich >Priority: Blocker > Fix For: 1.3.0 > > > While in cluster mode if you use ADD JAR with a HDFS sourced jar it will fail > trying to source that jar on the worker nodes with the following error: > {code} > 15/03/03 04:56:50 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 > (TID 0) > java.io.FileNotFoundException: > /yarn/nm/usercache/vagrant/appcache/application_1425166832391_0027/-19222735701425358546704_cache > (No such file or directory) > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.(FileInputStream.java:146) > {code} > PR https://github.com/apache/spark/pull/4880 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6171) No class def found for HiveConf in Spark shell
Andrew Or created SPARK-6171: Summary: No class def found for HiveConf in Spark shell Key: SPARK-6171 URL: https://issues.apache.org/jira/browse/SPARK-6171 Project: Spark Issue Type: Bug Components: Spark Shell, SQL Affects Versions: 1.3.0 Reporter: Andrew Or Assignee: Michael Armbrust Priority: Blocker I ran `build/sbt clean assembly` and then started the Spark shell, clean and simple, then I hit this huge stack trace. I can still run Spark jobs no problem, but we probably shouldn't be throwing this on a clean build. {code} 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. Spark context available as sc. java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) at java.lang.Class.getConstructor0(Class.java:2803) at java.lang.Class.getConstructor(Class.java:1718) at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) at $iwC$$iwC.(:9) at $iwC.(:18) at (:20) at .(:24) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6171) No class def found for HiveConf in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347669#comment-14347669 ] Andrew Or commented on SPARK-6171: -- Marking this as a blocker because I believe it's a regression caused by https://github.com/apache/spark/pull/4387 > No class def found for HiveConf in Spark shell > -- > > Key: SPARK-6171 > URL: https://issues.apache.org/jira/browse/SPARK-6171 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Michael Armbrust >Priority: Blocker > > I ran `build/sbt clean assembly` and then started the Spark shell, clean and > simple, then I hit this huge stack trace. I can still run Spark jobs no > problem, but we probably shouldn't be throwing this on a clean build. > {code} > 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. > Spark context available as sc. > java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) > at java.lang.Class.getConstructor0(Class.java:2803) > at java.lang.Class.getConstructor(Class.java:1718) > at > org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) > at $iwC$$iwC.(:9) > at $iwC.(:18) > at (:20) > at .(:24) > at .() > at .(:7) > at .() > at $print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6171) No class def found for HiveConf in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6171: - Description: I ran `build/sbt clean assembly` and then started the Spark shell, then I hit this huge stack trace. I didn't enable hive in my build, but I wasn't planning on using SQL either. I can still run Spark jobs no problem, but we probably shouldn't be throwing this on a clean build. {code} 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. Spark context available as sc. java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) at java.lang.Class.getConstructor0(Class.java:2803) at java.lang.Class.getConstructor(Class.java:1718) at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) at $iwC$$iwC.(:9) at $iwC.(:18) at (:20) at .(:24) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI {code} was: I ran `build/sbt clean assembly` and then started the Spark shell, clean and simple, then I hit this huge stack trace. I can still run Spark jobs no problem, but we probably shouldn't be throwing this on a clean build. {code} 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. Spark context available as sc. java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) at java.lang.Class.getConstructor0(Class.java:2803) at java.lang.Class.getConstructor(Class.java:1718) at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) at $iwC$$iwC.(:9) at $iwC.(:18) at (:20) at .(:24) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI {code} > No class def found for HiveConf in Spark shell > -- > > Key: SPARK-6171 > URL: https://issues.apache.org/jira/browse/SPARK-6171 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Michael Armbrust >Priority: Blocker > > I ran `build/sbt clean assembly` and then started the Spark shell, then I hit > this huge stack trace. I didn't enable hive in my build, but I wasn't > planning on using SQL either. I can still run Spark jobs no problem, but we > probably shouldn't be throwing this on a clean build. > {code} > 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. > Spark context available as sc. 
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) > at java.lang.Class.getConstructor0(Class.java:2803) > at java.lang.Class.getConstructor(Class.java:1718) > at > org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) > at $iwC$$iwC.(:9) > at $iwC.(:18) > at (:20) > at .(:24) > at .() > at .(:7) > at .() > at $print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) ---
[jira] [Commented] (SPARK-6171) No class def found for HiveConf in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347899#comment-14347899 ] Andrew Or commented on SPARK-6171: -- Closing as cannot reproduced. It must have been something wrong with my environment... > No class def found for HiveConf in Spark shell > -- > > Key: SPARK-6171 > URL: https://issues.apache.org/jira/browse/SPARK-6171 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Michael Armbrust >Priority: Blocker > > I ran `build/sbt clean assembly` and then started the Spark shell, then I hit > this huge stack trace. I didn't enable hive in my build, but I wasn't > planning on using SQL either. I can still run Spark jobs no problem, but we > probably shouldn't be throwing this on a clean build. > {code} > 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. > Spark context available as sc. > java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) > at java.lang.Class.getConstructor0(Class.java:2803) > at java.lang.Class.getConstructor(Class.java:1718) > at > org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) > at $iwC$$iwC.(:9) > at $iwC.(:18) > at (:20) > at .(:24) > at .() > at .(:7) > at .() > at $print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6171) No class def found for HiveConf in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6171. Resolution: Cannot Reproduce > No class def found for HiveConf in Spark shell > -- > > Key: SPARK-6171 > URL: https://issues.apache.org/jira/browse/SPARK-6171 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 1.3.0 >Reporter: Andrew Or >Assignee: Michael Armbrust >Priority: Blocker > > I ran `build/sbt clean assembly` and then started the Spark shell, then I hit > this huge stack trace. I didn't enable hive in my build, but I wasn't > planning on using SQL either. I can still run Spark jobs no problem, but we > probably shouldn't be throwing this on a clean build. > {code} > 15/03/04 14:09:15 INFO SparkILoop: Created spark context.. > Spark context available as sc. > java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493) > at java.lang.Class.getConstructor0(Class.java:2803) > at java.lang.Class.getConstructor(Class.java:1718) > at > org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026) > at $iwC$$iwC.(:9) > at $iwC.(:18) > at (:20) > at .(:24) > at .() > at .(:7) > at .() > at $print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5173) support python application running on yarn cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378035#comment-14378035 ] Andrew Or commented on SPARK-5173: -- It appears not. I just closed it. > support python application running on yarn cluster mode > --- > > Key: SPARK-5173 > URL: https://issues.apache.org/jira/browse/SPARK-5173 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Lianhui Wang > Fix For: 1.3.0 > > > Currently, spark-submit does not support running a Python application in > yarn-cluster mode, so this change modifies the submit code and YARN's AM in > order to support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5173) support python application running on yarn cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5173. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Lianhui Wang > support python application running on yarn cluster mode > --- > > Key: SPARK-5173 > URL: https://issues.apache.org/jira/browse/SPARK-5173 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Lianhui Wang >Assignee: Lianhui Wang > Fix For: 1.3.0 > > > Currently, spark-submit does not support running a Python application in > yarn-cluster mode, so this change modifies the submit code and YARN's AM in > order to support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6209) ExecutorClassLoader can leak connections after failing to load classes from the REPL class server
[ https://issues.apache.org/jira/browse/SPARK-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6209: - Fix Version/s: 1.4.0 1.3.1 > ExecutorClassLoader can leak connections after failing to load classes from > the REPL class server > - > > Key: SPARK-6209 > URL: https://issues.apache.org/jira/browse/SPARK-6209 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.0.3, 1.1.2, 1.2.1, 1.3.0, 1.4.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.3.1, 1.4.0 > > > ExecutorClassLoader does not ensure proper cleanup of network connections > that it opens. If it fails to load a class, it may leak partially-consumed > InputStreams that are connected to the REPL's HTTP class server, causing that > server to exhaust its thread pool, which can cause the entire job to hang. > Here is a simple reproduction: > With > {code} > ./bin/spark-shell --master local-cluster[8,8,512] > {code} > run the following command: > {code} > sc.parallelize(1 to 1000, 1000).map { x => > try { > Class.forName("some.class.that.does.not.Exist") > } catch { > case e: Exception => // do nothing > } > x > }.count() > {code} > This job will run 253 tasks, then will completely freeze without any errors > or failed tasks. > It looks like the driver has 253 threads blocked in socketRead0() calls: > {code} > [joshrosen ~]$ jstack 16765 | grep socketRead0 | wc > 253 759 14674 > {code} > e.g. > {code} > "qtp1287429402-13" daemon prio=5 tid=0x7f868a1c nid=0x5b03 runnable > [0x0001159bd000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:152) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391) > at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141) > at > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227) > at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1044) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > at > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > at > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at java.lang.Thread.run(Thread.java:745) > {code} > Jstack on the executors shows blocking in loadClass / findClass, where a > single thread is RUNNABLE and waiting to hear back from the driver and other > executor threads are BLOCKED on object monitor synchronization at > Class.forName0(). > Remotely triggering a GC on a hanging executor allows the job to progress and > complete more tasks before hanging again. 
If I repeatedly trigger GC on all > of the executors, then the job runs to completion: > {code} > jps | grep CoarseGra | cut -d ' ' -f 1 | xargs -I {} -n 1 -P100 jcmd {} GC.run > {code} > The culprit is a {{catch}} block that ignores all exceptions and performs no > cleanup: > https://github.com/apache/spark/blob/v1.2.0/repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala#L94 > This bug has been present since Spark 1.0.0, but I suspect that we haven't > seen it before because it's pretty hard to reproduce. Triggering this error > requires a job with tasks that trigger ClassNotFoundExceptions yet are still > able to run to completion. It also requires that executors are able to leak > enough open connections to exhaust the class server's Jetty thread pool > limit, which requires that there are a large number of tasks (253+) and > either a large number of executors or a very low amount of GC pressure on > those executors (since GC will cause the leaked connections to be closed). > The fix here is pretty simple: add proper resource cleanup to this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
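For illustration, a minimal Scala sketch of the kind of cleanup the fix calls for (hypothetical helper, not the actual ExecutorClassLoader code): the stream opened against the class server is closed in a finally block, so a failed class lookup cannot leave a connection pinned in the server's thread pool.

{code}
import java.io.{ByteArrayOutputStream, InputStream}
import java.net.URL

object ClassFetchSketch {
  def fetchClassBytes(classServerUri: String, path: String): Option[Array[Byte]] = {
    var in: InputStream = null
    try {
      in = new URL(classServerUri + "/" + path).openStream()
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      Some(out.toByteArray)
    } catch {
      case _: Exception => None // class genuinely not found on the server
    } finally {
      if (in != null) in.close() // the cleanup the swallowing catch block skipped
    }
  }
}
{code}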
[jira] [Updated] (SPARK-6079) Use index to speed up StatusTracker.getJobIdsForGroup()
[ https://issues.apache.org/jira/browse/SPARK-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6079: - Affects Version/s: 1.3.0 > Use index to speed up StatusTracker.getJobIdsForGroup() > --- > > Key: SPARK-6079 > URL: https://issues.apache.org/jira/browse/SPARK-6079 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Minor > Fix For: 1.4.0 > > > {{StatusTracker.getJobIdsForGroup()}} is implemented via a linear scan over a > HashMap rather than using an index. This might be an expensive operation if > there are many (e.g. thousands) of retained jobs. We can add a new index to > speed this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
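For illustration, a minimal Scala sketch of the proposed index (hypothetical structure, not the actual listener code): a jobGroup -> jobIds map is maintained alongside the retained jobs, so the lookup is a single map access instead of a scan over all retained jobs.

{code}
import scala.collection.mutable

// Hypothetical sketch of an index from job group to job IDs.
class JobGroupIndex {
  private val jobGroupToJobIds = new mutable.HashMap[String, mutable.HashSet[Int]]

  def onJobStart(jobGroup: String, jobId: Int): Unit = synchronized {
    jobGroupToJobIds.getOrElseUpdate(jobGroup, new mutable.HashSet[Int]) += jobId
  }

  def getJobIdsForGroup(jobGroup: String): Seq[Int] = synchronized {
    jobGroupToJobIds.get(jobGroup).map(_.toSeq).getOrElse(Seq.empty)
  }
}
{code}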
[jira] [Closed] (SPARK-6088) UI is malformed when tasks fetch remote results
[ https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6088. Resolution: Fixed Fix Version/s: 1.4.0 1.3.1 Target Version/s: 1.3.1, 1.4.0 (was: 1.3.0) > UI is malformed when tasks fetch remote results > --- > > Key: SPARK-6088 > URL: https://issues.apache.org/jira/browse/SPARK-6088 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.3.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Fix For: 1.3.1, 1.4.0 > > Attachments: Screenshot 2015-02-28 18.24.42.png > > > There are three issues when tasks get remote results: > (1) The status never changes from GET_RESULT to SUCCEEDED > (2) The time to get the result is shown as the absolute time (resulting in a > non-sensical output that says getting the result took >1 million hours) > rather than the elapsed time > (3) The getting result time is included as part of the scheduler delay > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6088) UI is malformed when tasks fetch remote results
[ https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6088: - Affects Version/s: 1.3.0 > UI is malformed when tasks fetch remote results > --- > > Key: SPARK-6088 > URL: https://issues.apache.org/jira/browse/SPARK-6088 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.3.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Fix For: 1.3.1, 1.4.0 > > Attachments: Screenshot 2015-02-28 18.24.42.png > > > There are three issues when tasks get remote results: > (1) The status never changes from GET_RESULT to SUCCEEDED > (2) The time to get the result is shown as the absolute time (resulting in a > non-sensical output that says getting the result took >1 million hours) > rather than the elapsed time > (3) The getting result time is included as part of the scheduler delay > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3570) Shuffle write time does not include time to open shuffle files
[ https://issues.apache.org/jira/browse/SPARK-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-3570. Resolution: Fixed Fix Version/s: 1.4.0 1.3.1 Target Version/s: 1.3.1, 1.4.0 > Shuffle write time does not include time to open shuffle files > -- > > Key: SPARK-3570 > URL: https://issues.apache.org/jira/browse/SPARK-3570 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 0.9.2, 1.0.2, 1.1.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Fix For: 1.3.1, 1.4.0 > > Attachments: 3a_1410854905_0_job_log_waterfall.pdf, > 3a_1410957857_0_job_log_waterfall.pdf > > > Currently, the reported shuffle write time does not include time to open the > shuffle files. This time can be very significant when the disk is highly > utilized and many shuffle files exist on the machine (I'm not sure how severe > this is in 1.0 onward -- since shuffle files are automatically deleted, this > may be less of an issue because there are fewer old files sitting around). > In experiments I did, in extreme cases, adding the time to open files can > increase the shuffle write time from 5ms (of a 2 second task) to 1 second. > We should fix this for better performance debugging. > Thanks [~shivaram] for helping to diagnose this problem. cc [~pwendell] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
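For illustration, a minimal Scala sketch of the metric change being described (hypothetical helper, not the actual shuffle writer code): the clock starts before the shuffle file is opened, so slow file opens are charged to shuffle write time rather than silently dropped.

{code}
import java.io.{File, FileOutputStream}

object TimedShuffleOpen {
  // addToWriteTimeNanos is a stand-in for however the write-time metric is accumulated.
  def open(file: File, addToWriteTimeNanos: Long => Unit): FileOutputStream = {
    val start = System.nanoTime()
    val stream = new FileOutputStream(file, true) // opening can be slow on a heavily utilized disk
    addToWriteTimeNanos(System.nanoTime() - start)
    stream
  }
}
{code}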
[jira] [Updated] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
[ https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5771: - Fix Version/s: (was: 1.4.0) > Number of Cores in Completed Applications of Standalone Master Web Page > always be 0 if sc.stop() is called > -- > > Key: SPARK-5771 > URL: https://issues.apache.org/jira/browse/SPARK-5771 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.2.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu >Priority: Minor > > In Standalone mode, the number of cores under Completed Applications on the > Master web page will always be zero if sc.stop() is called, but it will be > correct if sc.stop() is not called. > The likely reason: > after sc.stop() is called, the removeExecutor function of class > ApplicationInfo is called, which reduces the variable coresGranted to zero. > The variable coresGranted is used to display the number of cores on the web > page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
[ https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-5771: -- > Number of Cores in Completed Applications of Standalone Master Web Page > always be 0 if sc.stop() is called > -- > > Key: SPARK-5771 > URL: https://issues.apache.org/jira/browse/SPARK-5771 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.2.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu >Priority: Minor > > In Standalone mode, the number of cores under Completed Applications on the > Master web page will always be zero if sc.stop() is called, but it will be > correct if sc.stop() is not called. > The likely reason: > after sc.stop() is called, the removeExecutor function of class > ApplicationInfo is called, which reduces the variable coresGranted to zero. > The variable coresGranted is used to display the number of cores on the web > page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6469) Improving documentation on YARN local directories usage
[ https://issues.apache.org/jira/browse/SPARK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6469. Resolution: Fixed Fix Version/s: 1.4.0 1.3.1 Assignee: Christophe Préaud Target Version/s: 1.3.1, 1.4.0 > Improving documentation on YARN local directories usage > --- > > Key: SPARK-6469 > URL: https://issues.apache.org/jira/browse/SPARK-6469 > Project: Spark > Issue Type: Documentation > Components: Documentation, YARN >Affects Versions: 1.0.0 >Reporter: Christophe Préaud >Assignee: Christophe Préaud >Priority: Minor > Fix For: 1.3.1, 1.4.0 > > Attachments: TestYarnVars.scala > > > According to the [Spark YARN doc > page|http://spark.apache.org/docs/latest/running-on-yarn.html#important-notes], > Spark executors will use the local directories configured for YARN, not > {{spark.local.dir}} which should be ignored. > However it should be noted that in yarn-client mode, though the executors > will indeed use the local directories configured for YARN, the driver will > not, because it is not running on the YARN cluster; the driver in yarn-client > will use the local directories defined in {{spark.local.dir}} > Can this please be clarified in the Spark YARN documentation above? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
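For illustration, a minimal spark-shell style sketch of the distinction being documented (the app name and path are placeholders): in yarn-client mode the driver runs outside the YARN cluster, so its scratch space comes from spark.local.dir (or SPARK_LOCAL_DIRS), while the executors ignore that setting and use the local directories YARN configured for them.

{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("local-dirs-example")
  .setMaster("yarn-client")
  .set("spark.local.dir", "/mnt/driver-scratch") // used by the driver only; executors use YARN's local dirs
val sc = new SparkContext(conf)
{code}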
[jira] [Updated] (SPARK-6469) Improving documentation on YARN local directories usage
[ https://issues.apache.org/jira/browse/SPARK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6469: - Component/s: Documentation > Improving documentation on YARN local directories usage > --- > > Key: SPARK-6469 > URL: https://issues.apache.org/jira/browse/SPARK-6469 > Project: Spark > Issue Type: Documentation > Components: Documentation, YARN >Affects Versions: 1.0.0 >Reporter: Christophe Préaud >Assignee: Christophe Préaud >Priority: Minor > Fix For: 1.3.1, 1.4.0 > > Attachments: TestYarnVars.scala > > > According to the [Spark YARN doc > page|http://spark.apache.org/docs/latest/running-on-yarn.html#important-notes], > Spark executors will use the local directories configured for YARN, not > {{spark.local.dir}} which should be ignored. > However it should be noted that in yarn-client mode, though the executors > will indeed use the local directories configured for YARN, the driver will > not, because it is not running on the YARN cluster; the driver in yarn-client > will use the local directories defined in {{spark.local.dir}} > Can this please be clarified in the Spark YARN documentation above? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6469) Improving documentation on YARN local directories usage
[ https://issues.apache.org/jira/browse/SPARK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6469: - Affects Version/s: 1.0.0 > Improving documentation on YARN local directories usage > --- > > Key: SPARK-6469 > URL: https://issues.apache.org/jira/browse/SPARK-6469 > Project: Spark > Issue Type: Documentation > Components: Documentation, YARN >Affects Versions: 1.0.0 >Reporter: Christophe Préaud >Assignee: Christophe Préaud >Priority: Minor > Fix For: 1.3.1, 1.4.0 > > Attachments: TestYarnVars.scala > > > According to the [Spark YARN doc > page|http://spark.apache.org/docs/latest/running-on-yarn.html#important-notes], > Spark executors will use the local directories configured for YARN, not > {{spark.local.dir}} which should be ignored. > However it should be noted that in yarn-client mode, though the executors > will indeed use the local directories configured for YARN, the driver will > not, because it is not running on the YARN cluster; the driver in yarn-client > will use the local directories defined in {{spark.local.dir}} > Can this please be clarified in the Spark YARN documentation above? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs
[ https://issues.apache.org/jira/browse/SPARK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6081: - Affects Version/s: 1.0.0 > DriverRunner doesn't support pulling HTTP/HTTPS URIs > > > Key: SPARK-6081 > URL: https://issues.apache.org/jira/browse/SPARK-6081 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 1.0.0 >Reporter: Timothy Chen >Priority: Minor > > Standalone cluster mode according to the docs supports specifying http|https > jar urls, but when actually called the urls passed to the driver runner is > not able to pull http uris due to the usage of hadoopfs get. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
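For illustration, a minimal Scala sketch of the improvement being described (hypothetical helper, not the actual DriverRunner code): when the jar URI uses http or https, it is downloaded directly instead of going through the Hadoop FileSystem API, which has no handler for those schemes.

{code}
import java.io.{File, InputStream}
import java.net.URL
import java.nio.file.{Files, StandardCopyOption}

object JarFetcherSketch {
  def fetch(jarUrl: String, destDir: File): File = {
    val dest = new File(destDir, jarUrl.split("/").last)
    if (jarUrl.startsWith("http://") || jarUrl.startsWith("https://")) {
      val in: InputStream = new URL(jarUrl).openStream()
      try {
        Files.copy(in, dest.toPath, StandardCopyOption.REPLACE_EXISTING)
      } finally {
        in.close()
      }
    } else {
      // hdfs://, file://, etc. would still go through the Hadoop FileSystem API
    }
    dest
  }
}
{code}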
[jira] [Commented] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on
[ https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380579#comment-14380579 ] Andrew Or commented on SPARK-3632: -- Ok, sounds good. > ConnectionManager can run out of receive threads with authentication on > --- > > Key: SPARK-3632 > URL: https://issues.apache.org/jira/browse/SPARK-3632 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > Fix For: 1.2.0 > > > If you turn authentication on and you are using a lot of executors. There is > a chance that all the of the threads in the handleMessageExecutor could be > waiting to send a message because they are blocked waiting on authentication > to happen. This can cause a temporary deadlock until the connection times out. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6521) executors in the same node read local shuffle file
[ https://issues.apache.org/jira/browse/SPARK-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6521: - Affects Version/s: 1.2.0 > executors in the same node read local shuffle file > -- > > Key: SPARK-6521 > URL: https://issues.apache.org/jira/browse/SPARK-6521 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 1.2.0 >Reporter: xukun > > Previously, an executor read another executor's shuffle files on the same > node over the network. This PR makes executors on the same node read shuffle > files locally in sort-based shuffle, which reduces network transfer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6537) UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish
[ https://issues.apache.org/jira/browse/SPARK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6537: - Priority: Minor (was: Major) > UIWorkloadGenerator: The main thread should not stop SparkContext until all > jobs finish > --- > > Key: SPARK-6537 > URL: https://issues.apache.org/jira/browse/SPARK-6537 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.4.0 >Reporter: Kousuke Saruta >Priority: Minor > > The main thread of UIWorkloadGenerator spawns sub-threads to launch jobs, but > it stops the SparkContext without waiting for those threads to finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6537) UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish
[ https://issues.apache.org/jira/browse/SPARK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6537. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Kousuke Saruta > UIWorkloadGenerator: The main thread should not stop SparkContext until all > jobs finish > --- > > Key: SPARK-6537 > URL: https://issues.apache.org/jira/browse/SPARK-6537 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 1.4.0 > > > The main thread of UIWorkloadGenerator spawns sub-threads to launch jobs, but > it stops the SparkContext without waiting for those threads to finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6537) UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish
[ https://issues.apache.org/jira/browse/SPARK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6537: - Affects Version/s: (was: 1.4.0) 1.0.0 > UIWorkloadGenerator: The main thread should not stop SparkContext until all > jobs finish > --- > > Key: SPARK-6537 > URL: https://issues.apache.org/jira/browse/SPARK-6537 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Kousuke Saruta >Priority: Minor > Fix For: 1.4.0 > > > The main thread of UIWorkloadGenerator spawns sub-threads to launch jobs, but > it stops the SparkContext without waiting for those threads to finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
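For illustration, a minimal spark-shell style sketch of the described fix (the workload is a placeholder; the real generator launches its own jobs): the main thread keeps references to the job-launching threads and joins them all before stopping the SparkContext.

{code}
val launchers = (1 to 4).map { i =>
  new Thread(s"job-launcher-$i") {
    override def run(): Unit = {
      // sc.parallelize(1 to 1000).count() // launch some workload against sc
    }
  }
}
launchers.foreach(_.start())
launchers.foreach(_.join()) // wait for every job thread to finish...
// sc.stop()                // ...and only then stop the context
{code}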
[jira] [Closed] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
[ https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5771. Resolution: Fixed Fix Version/s: 1.4.0 Target Version/s: 1.4.0 > Number of Cores in Completed Applications of Standalone Master Web Page > always be 0 if sc.stop() is called > -- > > Key: SPARK-5771 > URL: https://issues.apache.org/jira/browse/SPARK-5771 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.2.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu >Priority: Minor > Fix For: 1.4.0 > > > In Standalone mode, the number of cores under Completed Applications on the > Master web page will always be zero if sc.stop() is called, but it will be > correct if sc.stop() is not called. > The likely reason: > after sc.stop() is called, the removeExecutor function of class > ApplicationInfo is called, which reduces the variable coresGranted to zero. > The variable coresGranted is used to display the number of cores on the web > page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6079) Use index to speed up StatusTracker.getJobIdsForGroup()
[ https://issues.apache.org/jira/browse/SPARK-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6079. Resolution: Fixed Target Version/s: 1.4.0 > Use index to speed up StatusTracker.getJobIdsForGroup() > --- > > Key: SPARK-6079 > URL: https://issues.apache.org/jira/browse/SPARK-6079 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Minor > Fix For: 1.4.0 > > > {{StatusTracker.getJobIdsForGroup()}} is implemented via a linear scan over a > HashMap rather than using an index. This might be an expensive operation if > there are many (e.g. thousands) of retained jobs. We can add a new index to > speed this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6132) Context cleaner race condition across SparkContexts
[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381133#comment-14381133 ] Andrew Or commented on SPARK-6132: -- Looks like this is back ported in all target branches now. Thanks [~srowen]. > Context cleaner race condition across SparkContexts > --- > > Key: SPARK-6132 > URL: https://issues.apache.org/jira/browse/SPARK-6132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.1.2, 1.2.2, 1.3.1, 1.4.0 > > > The context cleaner thread is not stopped properly. If a SparkContext is > started immediately after one stops, the context cleaner of the former can > clean variables in the latter. > This is because the cleaner.stop() just sets a flag and expects the thread to > terminate asynchronously, but the code to clean broadcasts goes through > `SparkEnv.get.blockManager`, which could belong to a different SparkContext. > This is likely to be the cause of the `JavaAPISuite`, which creates many > back-to-back SparkContexts, being flaky. > The right behavior is to wait until all currently running clean up tasks have > finished. > {code} > java.io.IOException: org.apache.spark.SparkException: Failed to get > broadcast_0_piece0 of broadcast_0 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > ... > Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 > of broadcast_0 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4346) YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication
[ https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4346: - Affects Version/s: 1.0.0 > YarnClientSchedulerBack.asyncMonitorApplication should be common with > Client.monitorApplication > --- > > Key: SPARK-4346 > URL: https://issues.apache.org/jira/browse/SPARK-4346 > Project: Spark > Issue Type: Improvement > Components: Scheduler, YARN >Affects Versions: 1.0.0 >Reporter: Thomas Graves > > The YarnClientSchedulerBackend.asyncMonitorApplication routine should move > into ClientBase and be made common with monitorApplication. Make sure stop > is handled properly. > See discussion on https://github.com/apache/spark/pull/3143 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6443) Could not submit app in standalone cluster mode when HA is enabled
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Affects Version/s: 1.0.0 > Could not submit app in standalone cluster mode when HA is enabled > -- > > Key: SPARK-6443 > URL: https://issues.apache.org/jira/browse/SPARK-6443 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.0.0 >Reporter: Tao Wang >Priority: Critical > > After digging some codes, I found user could not submit app in standalone > cluster mode when HA is enabled. But in client mode it can work. > Haven't try yet. But I will verify this and file a PR to resolve it if the > problem exists. > 3/23 update: > I started a HA cluster with zk, and tried to submit SparkPi example with > command: > ./spark-submit --class org.apache.spark.examples.SparkPi --master > spark://doggie153:7077,doggie159:7077 --deploy-mode cluster > ../lib/spark-examples-1.2.0-hadoop2.4.0.jar > and it failed with error message: > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > akka.actor.ActorInitializationException: exception during creation > at akka.actor.ActorInitializationException$.apply(Actor.scala:164) > at akka.actor.ActorCell.create(ActorCell.scala:596) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: org.apache.spark.SparkException: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) > at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) > at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) > at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) > at akka.actor.ActorCell.create(ActorCell.scala:580) > ... 9 more > But in client mode it ended with correct result. So my guess is right. I will > fix it in the related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6443) Support HA in standalone cluster modehen HA is enabled
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Summary: Support HA in standalone cluster modehen HA is enabled (was: Could not submit app in standalone cluster mode when HA is enabled) > Support HA in standalone cluster modehen HA is enabled > -- > > Key: SPARK-6443 > URL: https://issues.apache.org/jira/browse/SPARK-6443 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.0.0 >Reporter: Tao Wang >Priority: Critical > > After digging some codes, I found user could not submit app in standalone > cluster mode when HA is enabled. But in client mode it can work. > Haven't try yet. But I will verify this and file a PR to resolve it if the > problem exists. > 3/23 update: > I started a HA cluster with zk, and tried to submit SparkPi example with > command: > ./spark-submit --class org.apache.spark.examples.SparkPi --master > spark://doggie153:7077,doggie159:7077 --deploy-mode cluster > ../lib/spark-examples-1.2.0-hadoop2.4.0.jar > and it failed with error message: > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > akka.actor.ActorInitializationException: exception during creation > at akka.actor.ActorInitializationException$.apply(Actor.scala:164) > at akka.actor.ActorCell.create(ActorCell.scala:596) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: org.apache.spark.SparkException: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) > at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) > at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) > at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) > at akka.actor.ActorCell.create(ActorCell.scala:580) > ... 9 more > But in client mode it ended with correct result. So my guess is right. I will > fix it in the related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Priority: Major (was: Critical) > Support HA in standalone cluster mode > - > > Key: SPARK-6443 > URL: https://issues.apache.org/jira/browse/SPARK-6443 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.0.0 >Reporter: Tao Wang > > After digging some codes, I found user could not submit app in standalone > cluster mode when HA is enabled. But in client mode it can work. > Haven't try yet. But I will verify this and file a PR to resolve it if the > problem exists. > 3/23 update: > I started a HA cluster with zk, and tried to submit SparkPi example with > command: > ./spark-submit --class org.apache.spark.examples.SparkPi --master > spark://doggie153:7077,doggie159:7077 --deploy-mode cluster > ../lib/spark-examples-1.2.0-hadoop2.4.0.jar > and it failed with error message: > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > akka.actor.ActorInitializationException: exception during creation > at akka.actor.ActorInitializationException$.apply(Actor.scala:164) > at akka.actor.ActorCell.create(ActorCell.scala:596) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: org.apache.spark.SparkException: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) > at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) > at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) > at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) > at akka.actor.ActorCell.create(ActorCell.scala:580) > ... 9 more > But in client mode it ended with correct result. So my guess is right. I will > fix it in the related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Summary: Support HA in standalone cluster mode (was: Support HA in standalone cluster modehen HA is enabled) > Support HA in standalone cluster mode > - > > Key: SPARK-6443 > URL: https://issues.apache.org/jira/browse/SPARK-6443 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.0.0 >Reporter: Tao Wang >Priority: Critical > > After digging some codes, I found user could not submit app in standalone > cluster mode when HA is enabled. But in client mode it can work. > Haven't try yet. But I will verify this and file a PR to resolve it if the > problem exists. > 3/23 update: > I started a HA cluster with zk, and tried to submit SparkPi example with > command: > ./spark-submit --class org.apache.spark.examples.SparkPi --master > spark://doggie153:7077,doggie159:7077 --deploy-mode cluster > ../lib/spark-examples-1.2.0-hadoop2.4.0.jar > and it failed with error message: > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > akka.actor.ActorInitializationException: exception during creation > at akka.actor.ActorInitializationException$.apply(Actor.scala:164) > at akka.actor.ActorCell.create(ActorCell.scala:596) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: org.apache.spark.SparkException: Invalid master URL: > spark://doggie153:7077,doggie159:7077 > at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) > at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) > at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) > at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) > at akka.actor.ActorCell.create(ActorCell.scala:580) > ... 9 more > But in client mode it ended with correct result. So my guess is right. I will > fix it in the related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Description: After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Invalid master URL: spark://doggie153:7077,doggie159:7077 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) at akka.actor.ActorCell.create(ActorCell.scala:580) ... 9 more But in client mode it ended with correct result. So my guess is right. I will fix it in the related PR. === EDIT by Andrew === >From a quick survey in the code I can confirm that client mode does support >this. [This >line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162] > splits the master URLs by comma and passes these URLs into the AppClient. In >standalone cluster mode, there is not equivalent logic to even split the >master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in >the new one (o.a.s.deploy.rest.StandaloneRestClient). Thus, this is an unsupported feature, not a bug! was: After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 
3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Invalid master URL: spark://doggie153:7077,doggie159:7077 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) at akka.actor.Actor$class.ar
[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Description: == EDIT by Andrew == >From a quick survey in the code I can confirm that client mode does support >this. [This >line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162] > splits the master URLs by comma and passes these URLs into the AppClient. In >standalone cluster mode, there is simply no equivalent logic to even split the >master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in >the new one (o.a.s.deploy.rest.StandaloneRestClient). Thus, this is an unsupported feature, not a bug! == Original description from Tao Wang == After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Invalid master URL: spark://doggie153:7077,doggie159:7077 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) at akka.actor.ActorCell.create(ActorCell.scala:580) ... 9 more But in client mode it ended with correct result. So my guess is right. I will fix it in the related PR. was: == EDIT by Andrew == >From a quick survey in the code I can confirm that client mode does support >this. [This >line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162] > splits the master URLs by comma and passes these URLs into the AppClient. In >standalone cluster mode, there is not equivalent logic to even split the >master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in >the new one (o.a.s.deploy.rest.StandaloneRestClient). Thus, this is an unsupported feature, not a bug! 
== Original description from Tao Wang == After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala
[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6443: - Description: == EDIT by Andrew == >From a quick survey in the code I can confirm that client mode does support >this. [This >line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162] > splits the master URLs by comma and passes these URLs into the AppClient. In >standalone cluster mode, there is not equivalent logic to even split the >master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in >the new one (o.a.s.deploy.rest.StandaloneRestClient). Thus, this is an unsupported feature, not a bug! == Original description from Tao Wang == After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Invalid master URL: spark://doggie153:7077,doggie159:7077 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) at akka.actor.ActorCell.create(ActorCell.scala:580) ... 9 more But in client mode it ended with correct result. So my guess is right. I will fix it in the related PR. was: After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 
3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Invalid master URL: spark://doggie153:7077,doggie159:7077 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) at org.apache.spark.deploy.Cl
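To make the missing piece concrete, a small sketch of the comma expansion that client mode performs on a HA master string and that a cluster-mode submission path would also need before validating each URL (names and structure are illustrative, not the deploy.Client or StandaloneRestClient code):
{code}
// Sketch: expand a HA master string such as
//   spark://doggie153:7077,doggie159:7077
// into one spark:// URL per master before validating each one,
// instead of rejecting the whole string as a single invalid URL.
object MasterUrlSketch {
  private val Prefix = "spark://"

  def expand(master: String): Array[String] = {
    require(master.startsWith(Prefix), s"Invalid master URL: $master")
    master.stripPrefix(Prefix).split(",").map(Prefix + _.trim)
  }

  def main(args: Array[String]): Unit = {
    expand("spark://doggie153:7077,doggie159:7077").foreach(println)
    // spark://doggie153:7077
    // spark://doggie159:7077
  }
}
{code}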
[jira] [Closed] (SPARK-6650) ExecutorAllocationManager never stops
[ https://issues.apache.org/jira/browse/SPARK-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6650. Resolution: Fixed Fix Version/s: 1.4.0 1.3.1 Assignee: Marcelo Vanzin > ExecutorAllocationManager never stops > - > > Key: SPARK-6650 > URL: https://issues.apache.org/jira/browse/SPARK-6650 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 1.3.1, 1.4.0 > > > {{ExecutorAllocationManager}} doesn't even have a stop() method. That means > that when the owning SparkContext goes away, the internal thread it uses to > schedule its activities remains alive. > That means it constantly spams the logs and does who knows what else that > could affect any future contexts that are allocated. > It's particularly evil during unit tests, since it slows down everything else > after the suite is run, leaving multiple threads behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
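A minimal sketch of the shape of the fix, with illustrative names (the real manager schedules more work than shown here): give the manager an explicit stop() that SparkContext.stop() invokes, so its scheduling thread does not outlive the context:
{code}
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

// Sketch: the allocation manager owns its scheduling thread and exposes
// stop() so the owning SparkContext can shut it down instead of leaking it.
class AllocationManagerSketch(schedule: () => Unit) {
  private val executor: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  def start(): Unit = {
    executor.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = schedule()
    }, 0, 100, TimeUnit.MILLISECONDS)
    ()
  }

  // Without this, the thread keeps running (and logging) after the
  // SparkContext is gone, which is what the unit-test slowdown showed.
  def stop(): Unit = executor.shutdownNow()
}
{code}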
[jira] [Updated] (SPARK-6673) spark-shell.cmd can't start even when spark was built in Windows
[ https://issues.apache.org/jira/browse/SPARK-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6673: - Priority: Blocker (was: Major) > spark-shell.cmd can't start even when spark was built in Windows > > > Key: SPARK-6673 > URL: https://issues.apache.org/jira/browse/SPARK-6673 > Project: Spark > Issue Type: Bug > Components: Windows >Affects Versions: 1.3.0 >Reporter: Masayoshi TSUZUKI >Priority: Blocker > > spark-shell.cmd can't start. > {code} > bin\spark-shell.cmd --master local > {code} > will get > {code} > Failed to find Spark assembly JAR. > You need to build Spark before running this program. > {code} > even when we have built spark. > This is because of the lack of the environment {{SPARK_SCALA_VERSION}} which > is used in {{spark-class2.cmd}}. > In linux scripts, this value is set as {{2.10}} or {{2.11}} by default in > {{load-spark-env.sh}}, but there are no equivalent script in Windows. > As workaround, by executing > {code} > set SPARK_SCALA_VERSION=2.10 > {code} > before execute spark-shell.cmd, we can successfully start it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6673) spark-shell.cmd can't start even when spark was built in Windows
[ https://issues.apache.org/jira/browse/SPARK-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6673: - Target Version/s: 1.3.1, 1.4.0 > spark-shell.cmd can't start even when spark was built in Windows > > > Key: SPARK-6673 > URL: https://issues.apache.org/jira/browse/SPARK-6673 > Project: Spark > Issue Type: Bug > Components: Windows >Affects Versions: 1.3.0 >Reporter: Masayoshi TSUZUKI >Priority: Blocker > > spark-shell.cmd can't start. > {code} > bin\spark-shell.cmd --master local > {code} > will get > {code} > Failed to find Spark assembly JAR. > You need to build Spark before running this program. > {code} > even when we have built spark. > This is because of the lack of the environment {{SPARK_SCALA_VERSION}} which > is used in {{spark-class2.cmd}}. > In linux scripts, this value is set as {{2.10}} or {{2.11}} by default in > {{load-spark-env.sh}}, but there are no equivalent script in Windows. > As workaround, by executing > {code} > set SPARK_SCALA_VERSION=2.10 > {code} > before execute spark-shell.cmd, we can successfully start it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6673) spark-shell.cmd can't start even when spark was built in Windows
[ https://issues.apache.org/jira/browse/SPARK-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6673: - Assignee: Masayoshi TSUZUKI > spark-shell.cmd can't start even when spark was built in Windows > > > Key: SPARK-6673 > URL: https://issues.apache.org/jira/browse/SPARK-6673 > Project: Spark > Issue Type: Bug > Components: Windows >Affects Versions: 1.3.0 >Reporter: Masayoshi TSUZUKI >Assignee: Masayoshi TSUZUKI >Priority: Blocker > > spark-shell.cmd can't start. > {code} > bin\spark-shell.cmd --master local > {code} > will get > {code} > Failed to find Spark assembly JAR. > You need to build Spark before running this program. > {code} > even when we have built spark. > This is because of the lack of the environment {{SPARK_SCALA_VERSION}} which > is used in {{spark-class2.cmd}}. > In linux scripts, this value is set as {{2.10}} or {{2.11}} by default in > {{load-spark-env.sh}}, but there are no equivalent script in Windows. > As workaround, by executing > {code} > set SPARK_SCALA_VERSION=2.10 > {code} > before execute spark-shell.cmd, we can successfully start it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6640) Executor may connect to HeartbeartReceiver before it's setup in the driver side
[ https://issues.apache.org/jira/browse/SPARK-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6640: - Assignee: Shixiong Zhu > Executor may connect to HeartbeartReceiver before it's setup in the driver > side > --- > > Key: SPARK-6640 > URL: https://issues.apache.org/jira/browse/SPARK-6640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > > Here is the current code about starting LocalBackend and creating > HeartbeatReceiver: > {code} > // Create and start the scheduler > private[spark] var (schedulerBackend, taskScheduler) = > SparkContext.createTaskScheduler(this, master) > private val heartbeatReceiver = env.actorSystem.actorOf( > Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver") > {code} > When creating LocalBackend, it will start `LocalActor`. `LocalActor` will > create Executor, and Executor's constructor will retrieve `HeartbeatReceiver`. > So we should make sure this line: > {code} > private val heartbeatReceiver = env.actorSystem.actorOf( > Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver") > {code} > happen before "creating LocalActor". > However, current codes can not guarantee that. Sometimes, creating Executor > will crash. The issue was reported by sparkdi in > http://apache-spark-user-list.1001560.n3.nabble.com/Actor-not-found-td22265.html#a22324 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6640) Executor may connect to HeartbeartReceiver before it's setup in the driver side
[ https://issues.apache.org/jira/browse/SPARK-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6640. Resolution: Fixed Fix Version/s: 1.4.0 Target Version/s: 1.4.0 > Executor may connect to HeartbeartReceiver before it's setup in the driver > side > --- > > Key: SPARK-6640 > URL: https://issues.apache.org/jira/browse/SPARK-6640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 1.4.0 > > > Here is the current code about starting LocalBackend and creating > HeartbeatReceiver: > {code} > // Create and start the scheduler > private[spark] var (schedulerBackend, taskScheduler) = > SparkContext.createTaskScheduler(this, master) > private val heartbeatReceiver = env.actorSystem.actorOf( > Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver") > {code} > When creating LocalBackend, it will start `LocalActor`. `LocalActor` will > create Executor, and Executor's constructor will retrieve `HeartbeatReceiver`. > So we should make sure this line: > {code} > private val heartbeatReceiver = env.actorSystem.actorOf( > Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver") > {code} > happen before "creating LocalActor". > However, current codes can not guarantee that. Sometimes, creating Executor > will crash. The issue was reported by sparkdi in > http://apache-spark-user-list.1001560.n3.nabble.com/Actor-not-found-td22265.html#a22324 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
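In sketch form, the fix is an ordering constraint: register the HeartbeatReceiver endpoint before anything that can construct an Executor. The snippet below is illustrative only and abstracts both steps behind plain functions:
{code}
// Sketch of the required ordering inside SparkContext initialization.
// LocalBackend -> LocalActor -> Executor looks up the receiver on startup,
// so the endpoint must be registered first. (Names are illustrative.)
class DriverInitSketch(registerHeartbeatReceiver: () => Unit,
                       createTaskScheduler: () => Unit) {
  def initialize(): Unit = {
    registerHeartbeatReceiver() // 1. make the endpoint discoverable
    createTaskScheduler()       // 2. only now may executors start and connect
  }
}
{code}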
[jira] [Closed] (SPARK-6688) EventLoggingListener should always operate on resolved URIs
[ https://issues.apache.org/jira/browse/SPARK-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6688. Resolution: Fixed Fix Version/s: 1.4.0 1.3.1 Assignee: Marcelo Vanzin > EventLoggingListener should always operate on resolved URIs > --- > > Key: SPARK-6688 > URL: https://issues.apache.org/jira/browse/SPARK-6688 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Minor > Fix For: 1.3.1, 1.4.0 > > > A small bug was introduced in 1.3.0, where a check in > EventLoggingListener.scala is performed on the non-resolved log path. This > means that if "fs.defaultFS" is not the local filesystem, and the user is > trying to store logs in the local filesystem by providing a path with no > "file:" protocol, thing will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
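For illustration, a sketch of resolving a scheme-less log path to an explicit file: URI before any checks, so the listener and the eventual FileSystem lookup agree on the location regardless of fs.defaultFS (this mirrors the intent of the fix, not its exact code):
{code}
import java.io.File
import java.net.URI

// Sketch: a path with no scheme is treated as a local path and made
// fully qualified up front; paths that already carry a scheme pass through.
object ResolveLogDirSketch {
  def resolveURI(path: String): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri
    else new File(path).getAbsoluteFile.toURI // e.g. file:/tmp/spark-events
  }

  def main(args: Array[String]): Unit = {
    println(resolveURI("/tmp/spark-events"))   // file:/tmp/spark-events
    println(resolveURI("hdfs://nn:8020/logs")) // unchanged
  }
}
{code}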
[jira] [Updated] (SPARK-6701) Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application
[ https://issues.apache.org/jira/browse/SPARK-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6701: - Description: Observed in Master and 1.3, both in SBT and in Maven (with YARN). {code} Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 sbt.ForkMain$ForkError: Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) at org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) {code} was: Observed in Master and 1.3, both in SBT and in Maven (with YARN). 
{code} Error Message Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 sbt.ForkMain$ForkError: Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) at org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) {code} > Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application > - > > Key: SPARK-6701 > URL: https://issues.apache.org/jira/browse/SPARK-6701 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 1.3.0 >Reporter: Andrew Or >Priority: Critical > > Observed in Master and 1.3, both in SBT and in Maven (with YARN). > {code} > Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, > --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, > /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, > /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) > exited with c
[jira] [Created] (SPARK-6701) Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application
Andrew Or created SPARK-6701: Summary: Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application Key: SPARK-6701 URL: https://issues.apache.org/jira/browse/SPARK-6701 Project: Spark Issue Type: Bug Components: Tests, YARN Affects Versions: 1.3.0 Reporter: Andrew Or Priority: Critical Observed in Master and 1.3, both in SBT and in Maven (with YARN). {code} Error Message Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 sbt.ForkMain$ForkError: Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) at org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6700) flaky test: run Python application in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6700. Resolution: Fixed > flaky test: run Python application in yarn-cluster mode > > > Key: SPARK-6700 > URL: https://issues.apache.org/jira/browse/SPARK-6700 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: Davies Liu >Assignee: Lianhui Wang >Priority: Critical > Labels: test, yarn > > org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in > yarn-cluster mode > Failing for the past 1 build (Since Failed#2025 ) > Took 12 sec. > Error Message > {code} > Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties, > --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp) > exited with code 1 > Stacktrace > sbt.ForkMain$ForkError: Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties, > --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp) > exited with code 1 > at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) > at > org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.scalatest.Suite$class.withFixture(Suite.scala:1122) > at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at 
scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.deploy.yarn.YarnClusterSuite.org$scalatest$BeforeAndAfterAll$$super$run(YarnClusterSuite.scala:44) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(Before
[jira] [Reopened] (SPARK-6700) flaky test: run Python application in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-6700: -- > flaky test: run Python application in yarn-cluster mode > > > Key: SPARK-6700 > URL: https://issues.apache.org/jira/browse/SPARK-6700 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: Davies Liu >Assignee: Lianhui Wang >Priority: Critical > Labels: test, yarn > > org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in > yarn-cluster mode > Failing for the past 1 build (Since Failed#2025 ) > Took 12 sec. > Error Message > {code} > Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties, > --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp) > exited with code 1 > Stacktrace > sbt.ForkMain$ForkError: Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties, > --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp) > exited with code 1 > at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) > at > org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.scalatest.Suite$class.withFixture(Suite.scala:1122) > at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) 
> at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.deploy.yarn.YarnClusterSuite.org$scalatest$BeforeAndAfterAll$$super$run(YarnClusterSuite.scala:44) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:
[jira] [Closed] (SPARK-6700) flaky test: run Python application in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6700. Resolution: Duplicate > flaky test: run Python application in yarn-cluster mode > > > Key: SPARK-6700 > URL: https://issues.apache.org/jira/browse/SPARK-6700 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: Davies Liu >Assignee: Lianhui Wang >Priority: Critical > Labels: test, yarn > > org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in > yarn-cluster mode > Failing for the past 1 build (Since Failed#2025 ) > Took 12 sec. > Error Message > {code} > Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties, > --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp) > exited with code 1 > Stacktrace > sbt.ForkMain$ForkError: Process > List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, > --master, yarn-cluster, --num-executors, 1, --properties-file, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties, > --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, > /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp) > exited with code 1 > at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) > at > org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) > at > org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.scalatest.Suite$class.withFixture(Suite.scala:1122) > at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at 
scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.deploy.yarn.YarnClusterSuite.org$scalatest$BeforeAndAfterAll$$super$run(YarnClusterSuite.scala:44) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(Be
[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6703: - Target Version/s: 1.4.0 > Provide a way to discover existing SparkContext's > - > > Key: SPARK-6703 > URL: https://issues.apache.org/jira/browse/SPARK-6703 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Patrick Wendell > > Right now it is difficult to write a Spark application in a way that can be > run independently and also be composed with other Spark applications in an > environment such as the JobServer, notebook servers, etc., where there is a > shared SparkContext. > It would be nice to provide a rendezvous point so that applications can > learn whether a SparkContext already exists before creating one. > The simplest, most surgical way I see to do this is to have an optional static > SparkContext singleton that can be retrieved as follows: > {code} > val sc = SparkContext.getOrCreate(conf = new SparkConf()) > {code} > You could also have a setter so that some outer framework/server can set it > for use by multiple downstream applications. > A more advanced version of this would have some named registry or something, > but since we only support a single SparkContext in one JVM at this point > anyway, this seems sufficient and much simpler. Another advanced option > would be to allow plugging in some other notion of configuration you'd pass > when retrieving an existing context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6703: - Affects Version/s: 1.3.0 > Provide a way to discover existing SparkContext's > - > > Key: SPARK-6703 > URL: https://issues.apache.org/jira/browse/SPARK-6703 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Patrick Wendell > > Right now it is difficult to write a Spark application in a way that can be > run independently and also be composed with other Spark applications in an > environment such as the JobServer, notebook servers, etc., where there is a > shared SparkContext. > It would be nice to provide a rendezvous point so that applications can > learn whether a SparkContext already exists before creating one. > The simplest, most surgical way I see to do this is to have an optional static > SparkContext singleton that can be retrieved as follows: > {code} > val sc = SparkContext.getOrCreate(conf = new SparkConf()) > {code} > You could also have a setter so that some outer framework/server can set it > for use by multiple downstream applications. > A more advanced version of this would have some named registry or something, > but since we only support a single SparkContext in one JVM at this point > anyway, this seems sufficient and much simpler. Another advanced option > would be to allow plugging in some other notion of configuration you'd pass > when retrieving an existing context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
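To make the getOrCreate proposal above concrete, here is a minimal sketch of how two callers in the same JVM could share a single SparkContext. It assumes the semantics described in the issue (return the already-registered context if one exists, otherwise create one from the supplied SparkConf); the app name and master below are placeholder values.
{code}
// Minimal sketch of the proposed rendezvous point, assuming getOrCreate
// returns the registered SparkContext when one exists and only creates a
// new one otherwise. App name and master are example values.
import org.apache.spark.{SparkConf, SparkContext}

object SharedContextExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shared-context-demo").setMaster("local[*]")
    val sc1 = SparkContext.getOrCreate(conf) // first caller creates the context
    val sc2 = SparkContext.getOrCreate(conf) // later callers get the same instance
    assert(sc1 eq sc2)                       // both references point at the singleton
    sc1.stop()
  }
}
{code}
An outer framework such as a job server would typically be the first caller here, and downstream applications would then pick up the shared context instead of constructing their own.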
[jira] [Closed] (SPARK-3596) Support changing the yarn client monitor interval
[ https://issues.apache.org/jira/browse/SPARK-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-3596. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Weizhong Target Version/s: 1.4.0 > Support changing the yarn client monitor interval > -- > > Key: SPARK-3596 > URL: https://issues.apache.org/jira/browse/SPARK-3596 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves >Assignee: Weizhong > Fix For: 1.4.0 > > > Right now Spark on YARN has a monitor interval that can be configured via > spark.yarn.report.interval. This is how often the client checks with the RM > to get status on the running application in cluster mode. We should allow > users to set this interval, as some may not need to check so often. There is > another JIRA filed to make it so the client doesn't have to stay around in > cluster mode. > The changes in https://github.com/apache/spark/pull/2350 further extend this > to affect client mode. > We may want to add specific configs for that. Since the monitorApplication > function is now used in multiple different scenarios, it might actually make > sense for it to take the timeout as a parameter, because you could want a > different timeout for different situations. > For instance, how quickly we poll on the client side and print information > (cluster mode) versus how quickly we recognize that the application quit and > we want to terminate (client mode): I want the latter to happen quickly, > whereas in cluster mode I might not care as much about how often updated info > is printed to the screen. I guess it's private, so we could leave it as is and > change it if we add support for that later. > My suggestion for a name would be something like > spark.yarn.client.progress.pollinterval. If we were to add separate ones in > the future, they could be something like > spark.yarn.app.ready.pollinterval and spark.yarn.app.completion.pollinterval. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
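As a rough illustration of the knob discussed above (spark.yarn.report.interval is the existing millisecond-valued property named in the issue; the value and app name below are arbitrary examples), an application could lower the client's polling interval like this:
{code}
// Sketch only: tune how often the YARN client polls the ResourceManager for
// application status. spark.yarn.report.interval is the existing property
// referenced in this issue; 5000 ms is just an example value.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("yarn-poll-interval-example")
  .set("spark.yarn.report.interval", "5000") // check with the RM every 5 seconds
{code}
The proposed spark.yarn.client.progress.pollinterval name, if it were introduced, would be set the same way.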
[jira] [Closed] (SPARK-4346) YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication
[ https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4346. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Weizhong Target Version/s: 1.4.0 > YarnClientSchedulerBack.asyncMonitorApplication should be common with > Client.monitorApplication > --- > > Key: SPARK-4346 > URL: https://issues.apache.org/jira/browse/SPARK-4346 > Project: Spark > Issue Type: Improvement > Components: Scheduler, YARN >Affects Versions: 1.0.0 >Reporter: Thomas Graves >Assignee: Weizhong > Fix For: 1.4.0 > > > The YarnClientSchedulerBackend.asyncMonitorApplication routine should move > into ClientBase and be made common with monitorApplication. Make sure stop > is handled properly. > See discussion on https://github.com/apache/spark/pull/3143 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3596) Support changing the yarn client monitor interval
[ https://issues.apache.org/jira/browse/SPARK-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486014#comment-14486014 ] Andrew Or commented on SPARK-3596: -- Transitively fixed by https://github.com/apache/spark/pull/5305 > Support changing the yarn client monitor interval > -- > > Key: SPARK-3596 > URL: https://issues.apache.org/jira/browse/SPARK-3596 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves > Fix For: 1.4.0 > > > Right now Spark on YARN has a monitor interval that can be configured via > spark.yarn.report.interval. This is how often the client checks with the RM > to get status on the running application in cluster mode. We should allow > users to set this interval, as some may not need to check so often. There is > another JIRA filed to make it so the client doesn't have to stay around in > cluster mode. > The changes in https://github.com/apache/spark/pull/2350 further extend this > to affect client mode. > We may want to add specific configs for that. Since the monitorApplication > function is now used in multiple different scenarios, it might actually make > sense for it to take the timeout as a parameter, because you could want a > different timeout for different situations. > For instance, how quickly we poll on the client side and print information > (cluster mode) versus how quickly we recognize that the application quit and > we want to terminate (client mode): I want the latter to happen quickly, > whereas in cluster mode I might not care as much about how often updated info > is printed to the screen. I guess it's private, so we could leave it as is and > change it if we add support for that later. > My suggestion for a name would be something like > spark.yarn.client.progress.pollinterval. If we were to add separate ones in > the future, they could be something like > spark.yarn.app.ready.pollinterval and spark.yarn.app.completion.pollinterval. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5931) Use consistent naming for time properties
[ https://issues.apache.org/jira/browse/SPARK-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5931. Resolution: Fixed Fix Version/s: 1.4.0 > Use consistent naming for time properties > - > > Key: SPARK-5931 > URL: https://issues.apache.org/jira/browse/SPARK-5931 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Ilya Ganelin > Fix For: 1.4.0 > > > This is SPARK-5932's sister issue. > The naming of existing time configs is inconsistent. We currently have the > following throughout the code base: > {code} > spark.network.timeout // seconds > spark.executor.heartbeatInterval // milliseconds > spark.storage.blockManagerSlaveTimeoutMs // milliseconds > spark.yarn.scheduler.heartbeat.interval-ms // milliseconds > {code} > Instead, my proposal is to simplify the config name itself and make > everything accept time using the following format: 5s, 2ms, 100us. For > instance: > {code} > spark.network.timeout = 5s > spark.executor.heartbeatInterval = 500ms > spark.storage.blockManagerSlaveTimeout = 100ms > spark.yarn.scheduler.heartbeatInterval = 400ms > {code} > All existing configs that are relevant will be deprecated in favor of the new > ones. We should do this soon before we keep introducing more time configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5931) Use consistent naming for time properties
[ https://issues.apache.org/jira/browse/SPARK-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5931: - Assignee: Ilya Ganelin (was: Andrew Or) > Use consistent naming for time properties > - > > Key: SPARK-5931 > URL: https://issues.apache.org/jira/browse/SPARK-5931 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Ilya Ganelin > Fix For: 1.4.0 > > > This is SPARK-5932's sister issue. > The naming of existing time configs is inconsistent. We currently have the > following throughout the code base: > {code} > spark.network.timeout // seconds > spark.executor.heartbeatInterval // milliseconds > spark.storage.blockManagerSlaveTimeoutMs // milliseconds > spark.yarn.scheduler.heartbeat.interval-ms // milliseconds > {code} > Instead, my proposal is to simplify the config name itself and make > everything accept time using the following format: 5s, 2ms, 100us. For > instance: > {code} > spark.network.timeout = 5s > spark.executor.heartbeatInterval = 500ms > spark.storage.blockManagerSlaveTimeout = 100ms > spark.yarn.scheduler.heartbeatInterval = 400ms > {code} > All existing configs that are relevant will be deprecated in favor of the new > ones. We should do this soon before we keep introducing more time configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
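For illustration, here is a small sketch of how the proposed suffix format (5s, 500ms, 100us) could be parsed into milliseconds. This is a hypothetical helper written for this description, not Spark's actual implementation.
{code}
// Hypothetical parser for the proposed time-string format, e.g. "5s" -> 5000 ms.
// Not Spark's real utility; sub-millisecond values are truncated.
def timeStringAsMs(str: String): Long = {
  val pattern = """(\d+)(us|ms|s|m|h)""".r
  str.trim.toLowerCase match {
    case pattern(value, unit) =>
      val v = value.toLong
      unit match {
        case "us" => v / 1000
        case "ms" => v
        case "s"  => v * 1000L
        case "m"  => v * 60L * 1000L
        case "h"  => v * 60L * 60L * 1000L
      }
    case _ =>
      throw new NumberFormatException(s"Invalid time string: $str")
  }
}

// timeStringAsMs("5s")    == 5000
// timeStringAsMs("500ms") == 500
{code}
A deprecation shim could then read a legacy key such as spark.storage.blockManagerSlaveTimeoutMs as a plain millisecond value while new keys go through the parser above.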
[jira] [Updated] (SPARK-6890) Local cluster mode in Mac is broken
[ https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6890: - Affects Version/s: 1.4.0 > Local cluster mode in Mac is broken > --- > > Key: SPARK-6890 > URL: https://issues.apache.org/jira/browse/SPARK-6890 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > > The worker can not be launched, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6890) Local cluster mode in Mac is broken
[ https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6890: - Priority: Critical (was: Blocker) > Local cluster mode in Mac is broken > --- > > Key: SPARK-6890 > URL: https://issues.apache.org/jira/browse/SPARK-6890 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Critical > > In master, local cluster mode is broken. If I run `bin/spark-submit --master > local-cluster[2,1,512]`, my executors keep failing with class not found > exception. It appears that the assembly jar is not added to the executors' > class paths. I suspect that this is caused by > https://github.com/apache/spark/pull/5085. > {code} > Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) > at java.lang.Class.getMethod0(Class.java:2774) > at java.lang.Class.getMethod(Class.java:1663) > at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) > Caused by: java.lang.ClassNotFoundException: scala.Option > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6890) Local cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6890: - Summary: Local cluster mode is broken (was: Local cluster mode in Mac is broken) > Local cluster mode is broken > > > Key: SPARK-6890 > URL: https://issues.apache.org/jira/browse/SPARK-6890 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Critical > > In master, local cluster mode is broken. If I run `bin/spark-submit --master > local-cluster[2,1,512]`, my executors keep failing with class not found > exception. It appears that the assembly jar is not added to the executors' > class paths. I suspect that this is caused by > https://github.com/apache/spark/pull/5085. > {code} > Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) > at java.lang.Class.getMethod0(Class.java:2774) > at java.lang.Class.getMethod(Class.java:1663) > at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) > Caused by: java.lang.ClassNotFoundException: scala.Option > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6890) Local cluster mode in Mac is broken
[ https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6890: - Description: In master, local cluster mode is broken. If I run `bin/spark-submit --master local-cluster[2,1,512]`, my executors keep failing with class not found exception. It appears that the assembly jar is not added to the executors' class paths. I suspect that this is caused by https://github.com/apache/spark/pull/5085. {code} Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) at java.lang.Class.getMethod0(Class.java:2774) at java.lang.Class.getMethod(Class.java:1663) at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) Caused by: java.lang.ClassNotFoundException: scala.Option at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) {code} was:The worker can not be launched, > Local cluster mode in Mac is broken > --- > > Key: SPARK-6890 > URL: https://issues.apache.org/jira/browse/SPARK-6890 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > > In master, local cluster mode is broken. If I run `bin/spark-submit --master > local-cluster[2,1,512]`, my executors keep failing with class not found > exception. It appears that the assembly jar is not added to the executors' > class paths. I suspect that this is caused by > https://github.com/apache/spark/pull/5085. > {code} > Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) > at java.lang.Class.getMethod0(Class.java:2774) > at java.lang.Class.getMethod(Class.java:1663) > at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) > Caused by: java.lang.ClassNotFoundException: scala.Option > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
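While the regression is investigated, one possible workaround sketch is to put the assembly jar back on the executors' class path explicitly. spark.executor.extraClassPath is an existing Spark property; the jar path and app name below are hypothetical examples, and this is not presented as the eventual fix.
{code}
// Sketch of a workaround for local-cluster mode: add the assembly jar to the
// executor class path by hand. The jar path is a placeholder; point it at the
// assembly built from your checkout.
import org.apache.spark.{SparkConf, SparkContext}

val assemblyJar = "/path/to/spark-assembly-1.4.0-SNAPSHOT-hadoop2.3.0.jar" // hypothetical path
val conf = new SparkConf()
  .setMaster("local-cluster[2,1,512]")
  .setAppName("local-cluster-workaround")
  .set("spark.executor.extraClassPath", assemblyJar)
val sc = new SparkContext(conf)
{code}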
[jira] [Closed] (SPARK-4848) Allow different Worker configurations in standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4848. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Nathan Kronenfeld Target Version/s: 1.4.0 > Allow different Worker configurations in standalone cluster > --- > > Key: SPARK-4848 > URL: https://issues.apache.org/jira/browse/SPARK-4848 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 > Environment: stand-alone spark cluster >Reporter: Nathan Kronenfeld >Assignee: Nathan Kronenfeld > Fix For: 1.4.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > On a stand-alone Spark cluster, much of the determination of worker > specifics, especially when one has multiple instances per node, is done only > on the master. > The master loops over instances, and starts a worker per instance on each > node. > This means that if your workers have different values of SPARK_WORKER_INSTANCES > or SPARK_WORKER_WEBUI_PORT from each other (or from the master), all values > are ignored except the one on the master. > SPARK_WORKER_PORT looks like it is unread in scripts, but read in code - I'm > not sure how it will behave, since all instances will read the same value > from the environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6890) Local cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6890: - Assignee: Marcelo Vanzin (was: Andrew Or) > Local cluster mode is broken > > > Key: SPARK-6890 > URL: https://issues.apache.org/jira/browse/SPARK-6890 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Davies Liu >Assignee: Marcelo Vanzin >Priority: Critical > > In master, local cluster mode is broken. If I run `bin/spark-submit --master > local-cluster[2,1,512]`, my executors keep failing with class not found > exception. It appears that the assembly jar is not added to the executors' > class paths. I suspect that this is caused by > https://github.com/apache/spark/pull/5085. > {code} > Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) > at java.lang.Class.getMethod0(Class.java:2774) > at java.lang.Class.getMethod(Class.java:1663) > at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) > Caused by: java.lang.ClassNotFoundException: scala.Option > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6890) Local cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493448#comment-14493448 ] Andrew Or commented on SPARK-6890: -- I'm not actively working on this. Feel free to fix it since you and Nishkam have more experience in that part of the code. > Local cluster mode is broken > > > Key: SPARK-6890 > URL: https://issues.apache.org/jira/browse/SPARK-6890 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Critical > > In master, local cluster mode is broken. If I run `bin/spark-submit --master > local-cluster[2,1,512]`, my executors keep failing with class not found > exception. It appears that the assembly jar is not added to the executors' > class paths. I suspect that this is caused by > https://github.com/apache/spark/pull/5085. > {code} > Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) > at java.lang.Class.getMethod0(Class.java:2774) > at java.lang.Class.getMethod(Class.java:1663) > at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) > Caused by: java.lang.ClassNotFoundException: scala.Option > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org