[jira] [Updated] (SPARK-4883) Add a name to the directoryCleaner thread

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4883:
-
Affects Version/s: 1.2.0

> Add a name to the directoryCleaner thread
> -
>
> Key: SPARK-4883
> URL: https://issues.apache.org/jira/browse/SPARK-4883
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.2.0
>Reporter: Shixiong Zhu
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>







[jira] [Updated] (SPARK-4733) Add missing parameter comments in ShuffleDependency

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4733:
-
Affects Version/s: 1.2.0

> Add missing parameter comments in ShuffleDependency
> --
>
> Key: SPARK-4733
> URL: https://issues.apache.org/jira/browse/SPARK-4733
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> Add missing Javadoc comments in ShuffleDependency.
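
For illustration only, a minimal Scaladoc sketch of the kind of parameter comments this issue asks for; DocumentedDependency and its simplified parameter types are stand-ins, not the actual ShuffleDependency signature:
{code}
/**
 * A sketch only (not the actual Spark source) of documenting constructor
 * parameters with Scaladoc, as this issue requests for ShuffleDependency.
 *
 * @param rdd the parent data whose output is being shuffled
 * @param partitioner decides which reduce partition each key goes to
 * @param mapSideCombine whether values are partially aggregated on the map side
 */
class DocumentedDependency[K, V](
    val rdd: Seq[(K, V)],            // simplified stand-in for RDD[Product2[K, V]]
    val partitioner: K => Int,       // simplified stand-in for a Partitioner
    val mapSideCombine: Boolean = false)
{code}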






[jira] [Closed] (SPARK-4733) Add missing parameter comments in ShuffleDependency

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4733.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Assignee: Takeshi Yamamuro
Target Version/s: 1.3.0

> Add missing parameter comments in ShuffleDependency
> --
>
> Key: SPARK-4733
> URL: https://issues.apache.org/jira/browse/SPARK-4733
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Trivial
> Fix For: 1.3.0
>
>
> Add missing Javadoc comments in ShuffleDependency.






[jira] [Closed] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4447.

   Resolution: Fixed
Fix Version/s: 1.3.0

> Remove layers of abstraction in YARN code no longer needed after dropping 
> yarn-alpha
> 
>
> Key: SPARK-4447
> URL: https://issues.apache.org/jira/browse/SPARK-4447
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.3.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
>
> For example, YarnRMClient and YarnRMClientImpl can be merged
> YarnAllocator and YarnAllocationHandler can be merged






[jira] [Updated] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4447:
-
Priority: Critical  (was: Major)

> Remove layers of abstraction in YARN code no longer needed after dropping 
> yarn-alpha
> 
>
> Key: SPARK-4447
> URL: https://issues.apache.org/jira/browse/SPARK-4447
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.3.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Critical
> Fix For: 1.3.0
>
>
> For example, YarnRMClient and YarnRMClientImpl can be merged
> YarnAllocator and YarnAllocationHandler can be merged






[jira] [Closed] (SPARK-2458) Make failed application log visible on History Server

2015-01-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-2458.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Assignee: Masayoshi TSUZUKI
Target Version/s: 1.3.0

> Make failed application log visible on History Server
> -
>
> Key: SPARK-2458
> URL: https://issues.apache.org/jira/browse/SPARK-2458
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Masayoshi TSUZUKI
>Assignee: Masayoshi TSUZUKI
> Fix For: 1.3.0
>
>
> The History Server is very helpful for debugging application correctness and 
> performance after an application has finished. However, when the application 
> failed, the link is not listed on the History Server UI and the history can't 
> be viewed.
> It would be very useful if we could check the history of failed applications.






[jira] [Updated] (SPARK-4286) Support External Shuffle Service with Mesos integration

2015-01-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4286:
-
Affects Version/s: 1.2.0

> Support External Shuffle Service with Mesos integration
> ---
>
> Key: SPARK-4286
> URL: https://issues.apache.org/jira/browse/SPARK-4286
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> With the new external shuffle service added, we also need to make the Mesos 
> integration able to launch the shuffle service and support auto-scaling 
> executors.
> The Mesos executor will launch the external shuffle service and leave it 
> running, while the Spark executors themselves remain scalable.






[jira] [Updated] (SPARK-4389) Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located behind NAT

2015-01-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4389:
-
Affects Version/s: 1.2.0

> Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located 
> behind NAT
> -
>
> Key: SPARK-4389
> URL: https://issues.apache.org/jira/browse/SPARK-4389
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> We should set {{akka.remote.netty.tcp.bind-hostname="0.0.0.0"}} in our Akka 
> configuration so that Spark drivers can be located behind NATs / work with 
> weird DNS setups.
> This is blocked by upgrading our Akka version, since this configuration is 
> not present in Akka 2.3.4. There might be a different approach or workaround 
> that works on our current Akka version, though.
> EDIT: this is blocked by Akka 2.4, since this feature is only available in 
> the 2.4 snapshot release.
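
For reference, a minimal Scala sketch of the Akka setting this issue proposes, built with Typesafe Config; the hostname and port values are purely illustrative, and bind-hostname itself only exists from Akka 2.4 on, as noted above:
{code}
import com.typesafe.config.ConfigFactory

object AkkaBindConfSketch {
  // Minimal sketch: listen on all local interfaces while advertising the
  // externally reachable hostname. Only bind-hostname is the setting this
  // issue proposes; everything else here is illustrative.
  val akkaConf = ConfigFactory.parseString(
    """
      |akka.remote.netty.tcp {
      |  hostname      = "driver.example.com"   // address other nodes connect to (illustrative)
      |  port          = 7077                   // illustrative port
      |  bind-hostname = "0.0.0.0"              // actually bind to all local interfaces
      |}
    """.stripMargin)
}

// An ActorSystem built with this config, e.g. ActorSystem("spark", AkkaBindConfSketch.akkaConf),
// would then accept connections on all interfaces while drivers sit behind NAT.
{code}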






[jira] [Updated] (SPARK-4951) A busy executor may be killed when dynamicAllocation is enabled

2015-01-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4951:
-
Affects Version/s: 1.2.0

> A busy executor may be killed when dynamicAllocation is enabled
> ---
>
> Key: SPARK-4951
> URL: https://issues.apache.org/jira/browse/SPARK-4951
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Shixiong Zhu
>
> If a task runs longer than `spark.dynamicAllocation.executorIdleTimeout`, the 
> executor that runs this task will be killed.
> The following steps (yarn-client mode) can reproduce this bug:
> 1. Start `spark-shell` using
> {code}
> ./bin/spark-shell --conf "spark.shuffle.service.enabled=true" \
> --conf "spark.dynamicAllocation.minExecutors=1" \
> --conf "spark.dynamicAllocation.maxExecutors=4" \
> --conf "spark.dynamicAllocation.enabled=true" \
> --conf "spark.dynamicAllocation.executorIdleTimeout=30" \
> --master yarn-client \
> --driver-memory 512m \
> --executor-memory 512m \
> --executor-cores 1
> {code}
> 2. Wait more than 30 seconds until there is only one executor.
> 3. Run the following code (a task needs at least 50 seconds to finish)
> {code}
> val r = sc.parallelize(1 to 1000, 20).map{t => Thread.sleep(1000); 
> t}.groupBy(_ % 2).collect()
> {code}
> 4. Executors will be killed and re-allocated over and over, which makes the job 
> fail.






[jira] [Updated] (SPARK-5522) Accelerate the History Server start

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5522:
-
Affects Version/s: 1.0.0

> Accelerate the History Server start
> ---
>
> Key: SPARK-5522
> URL: https://issues.apache.org/jira/browse/SPARK-5522
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.0
>Reporter: Liangliang Gu
> Fix For: 1.4.0
>
>
> When the history server starts, all the log files are fetched and parsed in 
> order to get the applications' metadata, e.g. App Name, Start Time, Duration, 
> etc. In our production cluster, there are 2,600 log files (160 GB) in HDFS, and 
> it takes 3 hours to restart the history server, which is a bit too long for us.
> It would be better if the history server could show logs with missing 
> information during start-up and fill in the missing information after fetching 
> and parsing each log file.
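
A hypothetical sketch of the proposed behavior (not the actual HistoryServer code); AppInfo, LazyHistoryListing and parse are made-up names used only to illustrate listing stub entries first and filling in metadata asynchronously:
{code}
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

// Each application gets a cheap placeholder entry at start-up; the expensive
// log parsing fills in name and duration in the background.
case class AppInfo(id: String, name: Option[String], durationMs: Option[Long])

class LazyHistoryListing(logPaths: Seq[String],
                         parse: String => AppInfo)
                        (implicit ec: ExecutionContext) {
  private val apps = TrieMap.empty[String, AppInfo]

  // Cheap start-up: register a stub per log file so the UI has something to show.
  logPaths.foreach(p => apps.put(p, AppInfo(p, None, None)))

  // Expensive work happens in the background, one log at a time.
  logPaths.foreach { p =>
    Future(parse(p)).foreach(info => apps.put(p, info))
  }

  def listing: Seq[AppInfo] = apps.values.toSeq
}
{code}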






[jira] [Updated] (SPARK-5522) Accelerate the History Server start

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5522:
-
Assignee: Liangliang Gu

> Accelerate the History Server start
> ---
>
> Key: SPARK-5522
> URL: https://issues.apache.org/jira/browse/SPARK-5522
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.0
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Fix For: 1.4.0
>
>
> When the history server starts, all the log files are fetched and parsed in 
> order to get the applications' metadata, e.g. App Name, Start Time, Duration, 
> etc. In our production cluster, there are 2,600 log files (160 GB) in HDFS, and 
> it takes 3 hours to restart the history server, which is a bit too long for us.
> It would be better if the history server could show logs with missing 
> information during start-up and fill in the missing information after fetching 
> and parsing each log file.






[jira] [Updated] (SPARK-5522) Accelerate the History Server start

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5522:
-
Fix Version/s: 1.4.0

> Accelerate the History Server start
> ---
>
> Key: SPARK-5522
> URL: https://issues.apache.org/jira/browse/SPARK-5522
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.0
>Reporter: Liangliang Gu
> Fix For: 1.4.0
>
>
> When the history server starts, all the log files are fetched and parsed in 
> order to get the applications' metadata, e.g. App Name, Start Time, Duration, 
> etc. In our production cluster, there are 2,600 log files (160 GB) in HDFS, and 
> it takes 3 hours to restart the history server, which is a bit too long for us.
> It would be better if the history server could show logs with missing 
> information during start-up and fill in the missing information after fetching 
> and parsing each log file.






[jira] [Closed] (SPARK-4777) Some block memory after unrollSafely is not counted into used memory (memoryStore.entries or unrollMemory)

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4777.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Assignee: SuYan
Target Version/s: 1.4.0

> Some block memory after unrollSafely is not counted into used 
> memory (memoryStore.entries or unrollMemory)
> ---
>
> Key: SPARK-4777
> URL: https://issues.apache.org/jira/browse/SPARK-4777
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: SuYan
>Assignee: SuYan
> Fix For: 1.4.0
>
>
> Some memory is not counted into the memory used by memoryStore or unrollMemory.
> After Thread A unrolls a block with unrollSafely, it releases 40 MB of 
> unrollMemory (which other threads may then use). Thread A then waits to acquire 
> accountingLock so it can tryToPut blockA (30 MB). Until Thread A acquires 
> accountingLock, blockA's size is not counted in either unrollMemory or 
> memoryStore.currentMemory.
>
> IIUC, freeMemory should subtract that block's memory.
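
A hedged sketch of one way the accounting could stay consistent; MemoryAccounting and its counters are made-up names, not the real MemoryStore internals:
{code}
import java.util.concurrent.atomic.AtomicLong

// Reserve the block's size in a "pending" counter *before* releasing the unroll
// memory, so freeMemory never over-reports while a thread is still waiting for
// the accounting lock to tryToPut the block.
class MemoryAccounting(val maxMemory: Long) {
  private val stored  = new AtomicLong(0L)  // memory held by stored blocks
  private val unroll  = new AtomicLong(0L)  // memory reserved for unrolling
  private val pending = new AtomicLong(0L)  // unrolled blocks not yet put

  def freeMemory: Long = maxMemory - stored.get - unroll.get - pending.get

  def reserveUnroll(bytes: Long): Unit = unroll.addAndGet(bytes)

  // Unrolling succeeded: move the block's size from unroll to pending instead
  // of simply dropping it, closing the unaccounted window described above.
  def finishUnroll(unrolledBytes: Long, blockSize: Long): Unit = {
    pending.addAndGet(blockSize)
    unroll.addAndGet(-unrolledBytes)
  }

  // Called once the accounting lock is held and the block is actually stored.
  def putBlock(blockSize: Long): Unit = {
    stored.addAndGet(blockSize)
    pending.addAndGet(-blockSize)
  }
}
{code}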






[jira] [Created] (SPARK-6132) Context cleaner thread lives across SparkContexts

2015-03-02 Thread Andrew Or (JIRA)
Andrew Or created SPARK-6132:


 Summary: Context cleaner thread lives across SparkContexts
 Key: SPARK-6132
 URL: https://issues.apache.org/jira/browse/SPARK-6132
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: Andrew Or
Assignee: Andrew Or


The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext.

The right behavior is to wait until all currently running clean up tasks have 
finished.
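
A hypothetical sketch of that fix direction, using made-up names rather than the actual ContextCleaner code:
{code}
import java.util.concurrent.LinkedBlockingQueue

// stop() interrupts the cleaning thread and then joins it, so no cleanup task
// can still be running against a SparkEnv that belongs to a newer context.
class SketchCleaner {
  private val tasks = new LinkedBlockingQueue[Runnable]()
  @volatile private var stopped = false

  private val thread = new Thread("sketch-cleaner") {
    override def run(): Unit = {
      while (!stopped) {
        try {
          val task = tasks.take()   // blocks until a cleanup task arrives
          task.run()
        } catch {
          case _: InterruptedException => // fall through and re-check `stopped`
        }
      }
    }
  }
  thread.setDaemon(true)
  thread.start()

  def submit(task: Runnable): Unit = tasks.put(task)

  def stop(): Unit = {
    stopped = true
    thread.interrupt()   // wake the thread if it is blocked on take()
    thread.join()        // wait for any in-flight cleanup task to finish
  }
}
{code}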






[jira] [Created] (SPARK-6133) SparkContext#stop is not idempotent

2015-03-02 Thread Andrew Or (JIRA)
Andrew Or created SPARK-6133:


 Summary: SparkContext#stop is not idempotent
 Key: SPARK-6133
 URL: https://issues.apache.org/jira/browse/SPARK-6133
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Andrew Or
Assignee: Andrew Or


If we call sc.stop() twice, the listener bus will attempt to log an event after 
stop() has been called, resulting in a scary error message. This happens, for 
instance, if Spark calls sc.stop() internally (it does so on certain error 
conditions) and the application code then calls it again.
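
A minimal sketch of an idempotent stop(), using made-up names rather than the actual SparkContext code:
{code}
import java.util.concurrent.atomic.AtomicBoolean

// The expensive shutdown work runs at most once, so a second call, whether from
// Spark internals or from application code, is a harmless no-op.
class StoppableService {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // compareAndSet returns true only for the first caller.
    if (stopped.compareAndSet(false, true)) {
      // ... release resources, stop the listener bus, etc.
      println("service stopped")
    }
    // Subsequent calls fall through silently instead of logging scary errors.
  }
}
{code}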






[jira] [Updated] (SPARK-6132) Context cleaner thread lives across SparkContexts

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Description: 
The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
This is likely to be the cause of the `JavaAPISuite`, which creates many 
back-to-back SparkContexts, being flaky:
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get 
broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
...
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of 
broadcast_0
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}
The right behavior is to wait until all currently running clean up tasks have 
finished.

  was:
The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
This is likely to be the cause of the `JavaAPISuite`, which creates many 
back-to-back SparkContexts, being flaky:
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get 
broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of 
broadcast_0
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}
The right behavior is to wait until all currently running clean up tasks have 
finished.


> Context cleaner thread lives across SparkContexts
> -
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky:
> {code}
> java.io.IOException: org.apache.s

[jira] [Updated] (SPARK-6132) Context cleaner thread lives across SparkContexts

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Description: 
The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
This is likely to be the cause of the `JavaAPISuite`, which creates many 
back-to-back SparkContexts, being flaky.

The right behavior is to wait until all currently running clean up tasks have 
finished.
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get 
broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
...
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of 
broadcast_0
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}

  was:
The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
This is likely to be the cause of the `JavaAPISuite`, which creates many 
back-to-back SparkContexts, being flaky:
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get 
broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
...
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of 
broadcast_0
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}
The right behavior is to wait until all currently running clean up tasks have 
finished.


> Context cleaner thread lives across SparkContexts
> -
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException:

[jira] [Updated] (SPARK-6132) Context cleaner thread lives across SparkContexts

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Description: 
The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
This is likely to be the cause of the `JavaAPISuite`, which creates many 
back-to-back SparkContexts, being flaky:
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get 
broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of 
broadcast_0
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}
The right behavior is to wait until all currently running clean up tasks have 
finished.

  was:
The context cleaner thread is not stopped properly. If a SparkContext is 
started immediately after one stops, the context cleaner of the former can 
clean variables in the latter.

This is because the cleaner.stop() just sets a flag and expects the thread to 
terminate asynchronously, but the code to clean broadcasts goes through 
`SparkEnv.get.blockManager`, which could belong to a different SparkContext.

The right behavior is to wait until all currently running clean up tasks have 
finished.


> Context cleaner thread lives across SparkContexts
> -
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky:
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java

[jira] [Closed] (SPARK-6020) Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite

2015-03-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6020.

Resolution: Fixed

> Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite
> -
>
> Key: SPARK-6020
> URL: https://issues.apache.org/jira/browse/SPARK-6020
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Cheng Lian
>Priority: Critical
>
> Observed in the following builds, only one of which has something to do with 
> SQL:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27930/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27929/
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite.SELECT key FROM 
> pruningData WHERE NOT (key IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
> 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30))
> {code}
> Error Message
> 8 did not equal 10 Wrong number of read batches: == Parsed Logical Plan == 
> 'Project ['key]  'Filter NOT 'key IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>'UnresolvedRelation [pruningData], None  == Analyzed Logical Plan == 
> Project [key#5245]  Filter NOT key#5245 IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions 
> at ExistingRDD.scala:35  == Optimized Logical Plan == Project [key#5245]  
> Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData)  == Physical 
> Plan == Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>   InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)],
>  (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData))  Code 
> Generation: false == RDD ==
> Stacktrace
> sbt.ForkMain$ForkError: 8 did not equal 10 Wrong number of read batches: == 
> Parsed Logical Plan ==
> 'Project ['key]
>  'Filter NOT 'key IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>   'UnresolvedRelation [pruningData], None
> == Analyzed Logical Plan ==
> Project [key#5245]
>  Filter NOT key#5245 IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>   LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions 
> at ExistingRDD.scala:35
> == Optimized Logical Plan ==
> Project [key#5245]
>  Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>   InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData)
> == Physical Plan ==
> Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>  InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)],
>  (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData))
> Code Generation: false
> == RDD ==
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply$mcV$sp(PartitionBatchPruningSuite.scala:119)
>   at 
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.scala:107)
>   at 
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.scala:107)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$cl

[jira] [Commented] (SPARK-6020) Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite

2015-03-02 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344628#comment-14344628
 ] 

Andrew Or commented on SPARK-6020:
--

Ok, I will close this as resolved for now. We can always reopen it if it's 
flaky again. Thanks Josh.

> Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite
> -
>
> Key: SPARK-6020
> URL: https://issues.apache.org/jira/browse/SPARK-6020
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Cheng Lian
>Priority: Critical
>
> Observed in the following builds, only one of which has something to do with 
> SQL:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27930/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27929/
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite.SELECT key FROM 
> pruningData WHERE NOT (key IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
> 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30))
> {code}
> Error Message
> 8 did not equal 10 Wrong number of read batches: == Parsed Logical Plan == 
> 'Project ['key]  'Filter NOT 'key IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>'UnresolvedRelation [pruningData], None  == Analyzed Logical Plan == 
> Project [key#5245]  Filter NOT key#5245 IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions 
> at ExistingRDD.scala:35  == Optimized Logical Plan == Project [key#5245]  
> Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData)  == Physical 
> Plan == Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>   InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)],
>  (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData))  Code 
> Generation: false == RDD ==
> Stacktrace
> sbt.ForkMain$ForkError: 8 did not equal 10 Wrong number of read batches: == 
> Parsed Logical Plan ==
> 'Project ['key]
>  'Filter NOT 'key IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>   'UnresolvedRelation [pruningData], None
> == Analyzed Logical Plan ==
> Project [key#5245]
>  Filter NOT key#5245 IN 
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
>   LogicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] at mapPartitions 
> at ExistingRDD.scala:35
> == Optimized Logical Plan ==
> Project [key#5245]
>  Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>   InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData)
> == Physical Plan ==
> Filter NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)
>  InMemoryColumnarTableScan [key#5245], [NOT key#5245 INSET 
> (5,10,24,25,14,20,29,1,6,28,21,9,13,2,17,22,27,12,7,3,18,16,11,26,23,8,30,19,4,15)],
>  (InMemoryRelation [key#5245,value#5246], true, 10, StorageLevel(true, true, 
> false, true, 1), (PhysicalRDD [key#5245,value#5246], MapPartitionsRDD[3202] 
> at mapPartitions at ExistingRDD.scala:35), Some(pruningData))
> Code Generation: false
> == RDD ==
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply$mcV$sp(PartitionBatchPruningSuite.scala:119)
>   at 
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.scala:107)
>   at 
> org.apache.spark.sql.columnar.PartitionBatchPruningSuite$$anonfun$checkBatchPruning$1.apply(PartitionBatchPruningSuite.

[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Summary: Context cleaner race condition across SparkContexts  (was: Context 
cleaner thread lives across SparkContexts)

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}






[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Affects Version/s: (was: 1.3.0)
   1.0.0

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}






[jira] [Commented] (SPARK-3859) Use consistent config names for duration (with units!)

2015-03-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365
 ] 

Andrew Or commented on SPARK-3859:
--

The problem is we keep adding more and more of these inconsistent policies 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I had 
forgotten that I already opened this issue a while ago. I'm closing this one in 
favor of the new one.

> Use consistent config names for duration (with units!)
> --
>
> Key: SPARK-3859
> URL: https://issues.apache.org/jira/browse/SPARK-3859
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> There are many configs in Spark that refer to some unit of time. However, 
> at first glance it is unclear what these units are. We should find a 
> consistent way to append the units to the end of these config names and 
> deprecate the old ones in favor of the more consistent ones.
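
A hypothetical sketch of the unit-suffix convention; DurationConf and parseMillis are illustrative names, not Spark's API:
{code}
import java.util.concurrent.TimeUnit

// Values carry an explicit unit suffix, so "spark.foo.timeout=30s" is
// unambiguous where a bare "30" is not.
object DurationConf {
  def parseMillis(value: String): Long = {
    val trimmed = value.trim
    val (num, unit) = trimmed.span(c => c.isDigit)
    if (num.isEmpty) sys.error(s"No numeric value in '$value'")
    val n = num.toLong
    unit match {
      case "ms" => n
      case "s"  => TimeUnit.SECONDS.toMillis(n)
      case "m"  => TimeUnit.MINUTES.toMillis(n)
      case "h"  => TimeUnit.HOURS.toMillis(n)
      case ""   => sys.error(s"No unit given in '$value'; append ms, s, m, or h")
      case u    => sys.error(s"Unknown time unit '$u' in '$value'")
    }
  }
}

// e.g. DurationConf.parseMillis("30s") == 30000L
{code}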






[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)

2015-03-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365
 ] 

Andrew Or edited comment on SPARK-3859 at 3/3/15 5:26 PM:
--

The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I had 
forgotten that I already opened this issue a while ago. I'm closing this one in 
favor of the new one.


was (Author: andrewor14):
The problem is we keep adding more and more of these inconsistent policies 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I had 
forgotten that I already opened this issue a while ago. I'm closing this one in 
favor of the new one.

> Use consistent config names for duration (with units!)
> --
>
> Key: SPARK-3859
> URL: https://issues.apache.org/jira/browse/SPARK-3859
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> There are many configs in Spark that refer to some unit of time. However, 
> at first glance it is unclear what these units are. We should find a 
> consistent way to append the units to the end of these config names and 
> deprecate the old ones in favor of the more consistent ones.






[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)

2015-03-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365
 ] 

Andrew Or edited comment on SPARK-3859 at 3/3/15 5:27 PM:
--

The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I had 
forgotten that I had already opened this issue a while ago. I'm closing this 
one in favor of the new one.


was (Author: andrewor14):
The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I had 
forgotten that I already opened this issue a while ago. I'm closing this one in 
favor of the new one.

> Use consistent config names for duration (with units!)
> --
>
> Key: SPARK-3859
> URL: https://issues.apache.org/jira/browse/SPARK-3859
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> There are many configs in Spark that refer to some unit of time. However, 
> at first glance it is unclear what these units are. We should find a 
> consistent way to append the units to the end of these config names and 
> deprecate the old ones in favor of the more consistent ones.






[jira] [Closed] (SPARK-3859) Use consistent config names for duration (with units!)

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3859.

Resolution: Duplicate

> Use consistent config names for duration (with units!)
> --
>
> Key: SPARK-3859
> URL: https://issues.apache.org/jira/browse/SPARK-3859
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> There are many configs in Spark that refer to some unit of time. However, 
> at first glance it is unclear what these units are. We should find a 
> consistent way to append the units to the end of these config names and 
> deprecate the old ones in favor of the more consistent ones.






[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)

2015-03-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365
 ] 

Andrew Or edited comment on SPARK-3859 at 3/3/15 5:27 PM:
--

The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I 
forgot that I had already opened this issue a while ago. I'm closing this one 
in favor of the new one.


was (Author: andrewor14):
The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I had 
forgotten that I had already opened this issue a while ago. I'm closing this 
one in favor of the new one.

> Use consistent config names for duration (with units!)
> --
>
> Key: SPARK-3859
> URL: https://issues.apache.org/jira/browse/SPARK-3859
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> There are many configs in Spark that refer to some unit of time. However, 
> at first glance it is unclear what these units are. We should find a 
> consistent way to append the units to the end of these config names and 
> deprecate the old ones in favor of the more consistent ones.






[jira] [Comment Edited] (SPARK-3859) Use consistent config names for duration (with units!)

2015-03-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345365#comment-14345365
 ] 

Andrew Or edited comment on SPARK-3859 at 3/3/15 5:28 PM:
--

The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config to ask them 
to follow. We will have to deprecate the old ones in a nicer fashion than what 
we can do today, and this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I 
forgot that I had already opened this issue a while ago. I'm closing this one 
in favor of the new one.


was (Author: andrewor14):
The problem is we keep adding more and more of these inconsistent properties 
because there isn't really a guideline to follow. When I review other people's 
patches there isn't really the "correct" way to name a new config. We will have 
to deprecate the old ones in a nicer fashion than what we can do today, and 
this is why I opened SPARK-5933.

HOWEVER this one is duplicated by a more specific one I opened recently. I 
forgot that I had already opened this issue a while ago. I'm closing this one 
in favor of the new one.

> Use consistent config names for duration (with units!)
> --
>
> Key: SPARK-3859
> URL: https://issues.apache.org/jira/browse/SPARK-3859
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> There are many configs in Spark that refer to some unit of time. However, 
> at first glance it is unclear what these units are. We should find a 
> consistent way to append the units to the end of these config names and 
> deprecate the old ones in favor of the more consistent ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Labels: backport-needed  (was: )

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>  Labels: backport-needed
> Fix For: 1.4.0
>
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
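
A minimal sketch of the intended behavior (illustrative only, not the actual ContextCleaner code): stop() blocks until the cleaning thread has drained its in-flight work instead of only setting a flag, so a freshly started SparkContext cannot have its variables cleaned by the old cleaner.

{code}
import java.util.concurrent.CountDownLatch

class CleanerSketch {
  @volatile private var stopped = false
  private val finished = new CountDownLatch(1)

  private val cleaningThread = new Thread("example-cleaner") {
    override def run(): Unit = {
      try {
        while (!stopped) {
          // ... dequeue and perform one cleanup task here ...
          Thread.sleep(100)
        }
      } catch {
        case _: InterruptedException => // woken up by stop()
      } finally {
        finished.countDown() // no further cleanup will touch SparkEnv after this
      }
    }
  }

  def start(): Unit = cleaningThread.start()

  def stop(): Unit = {
    stopped = true
    cleaningThread.interrupt()
    finished.await() // wait for in-flight cleanup before a new SparkContext starts
  }
}
{code}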



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Fix Version/s: 1.4.0

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>  Labels: backport-needed
> Fix For: 1.4.0
>
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Target Version/s: 1.4.0, 1.3.1  (was: 1.4.0)

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>  Labels: backport-needed
> Fix For: 1.4.0
>
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6132:
-
Target Version/s: 1.1.2, 1.2.2, 1.4.0, 1.3.1  (was: 1.4.0, 1.3.1)

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>  Labels: backport-needed
> Fix For: 1.4.0
>
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the former can 
> clean variables in the latter.
> This is because the cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely to be the cause of the `JavaAPISuite`, which creates many 
> back-to-back SparkContexts, being flaky.
> The right behavior is to wait until all currently running clean up tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6133:
-
Target Version/s: 1.2.2, 1.4.0  (was: 1.4.0)

> SparkContext#stop is not idempotent
> ---
>
> Key: SPARK-6133
> URL: https://issues.apache.org/jira/browse/SPARK-6133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.2.2, 1.4.0
>
>
> If we call sc.stop() twice, the listener bus will attempt to log an event 
> after stop() is called, resulting in a scary error message. This happens if 
> Spark calls sc.stop() internally (it does this on certain error conditions) 
> and the application code calls it again, for instance.
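
A minimal sketch of an idempotent stop(), assuming a simple guard flag (illustrative, not the actual SparkContext code):

{code}
import java.util.concurrent.atomic.AtomicBoolean

class StoppableExample {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // Only the first caller performs the shutdown; later calls return quietly
    // instead of posting events to an already-stopped listener bus.
    if (!stopped.compareAndSet(false, true)) {
      return
    }
    // ... stop listener bus, cleaner, schedulers, etc. ...
  }
}
{code}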



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6133:
-
Target Version/s: 1.2.2, 1.4.0, 1.3.1  (was: 1.2.2, 1.4.0)

> SparkContext#stop is not idempotent
> ---
>
> Key: SPARK-6133
> URL: https://issues.apache.org/jira/browse/SPARK-6133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.2.2, 1.4.0
>
>
> If we call sc.stop() twice, the listener bus will attempt to log an event 
> after stop() is called, resulting in a scary error message. This happens if 
> Spark calls sc.stop() internally (it does this on certain error conditions) 
> and the application code calls it again, for instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6133:
-
Fix Version/s: 1.4.0

> SparkContext#stop is not idempotent
> ---
>
> Key: SPARK-6133
> URL: https://issues.apache.org/jira/browse/SPARK-6133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.2.2, 1.4.0
>
>
> If we call sc.stop() twice, the listener bus will attempt to log an event 
> after stop() is called, resulting in a scary error message. This happens if 
> Spark calls sc.stop() internally (it does this on certain error conditions) 
> and the application code calls it again, for instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6133) SparkContext#stop is not idempotent

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6133:
-
Fix Version/s: 1.2.2

> SparkContext#stop is not idempotent
> ---
>
> Key: SPARK-6133
> URL: https://issues.apache.org/jira/browse/SPARK-6133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.2.2, 1.4.0
>
>
> If we call sc.stop() twice, the listener bus will attempt to log an event 
> after stop() is called, resulting in a scary error message. This happens if 
> Spark calls sc.stop() internally (it does this on certain error conditions) 
> and the application code calls it again, for instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

2015-03-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6144:
-
Assignee: Trystan Leftwich

> When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
> ---
>
> Key: SPARK-6144
> URL: https://issues.apache.org/jira/browse/SPARK-6144
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
>Priority: Blocker
>
> In cluster mode, if you use ADD JAR with an HDFS-sourced jar, it will fail 
> while trying to fetch that jar on the worker nodes, with the following error:
> {code}
> 15/03/03 04:56:50 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 
> (TID 0)
> java.io.FileNotFoundException: 
> /yarn/nm/usercache/vagrant/appcache/application_1425166832391_0027/-19222735701425358546704_cache
>  (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:146)
> {code}
> PR https://github.com/apache/spark/pull/4880
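
A hedged reproduction sketch (the jar path, UDF class, and table name are hypothetical), assuming a Hive-enabled SQLContext named sqlContext and a yarn-cluster deployment:

{code}
// Run in yarn-cluster mode; the jar lives on HDFS rather than on the local disk.
sqlContext.sql("ADD JAR hdfs:///user/vagrant/libs/example-udf.jar")
sqlContext.sql("CREATE TEMPORARY FUNCTION example_udf AS 'com.example.ExampleUDF'")
sqlContext.sql("SELECT example_udf(value) FROM some_table").collect()
{code}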



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

2015-03-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346396#comment-14346396
 ] 

Andrew Or commented on SPARK-6144:
--

I believe this is a regression from 1.2 caused by 
https://github.com/apache/spark/pull/3670.

> When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
> ---
>
> Key: SPARK-6144
> URL: https://issues.apache.org/jira/browse/SPARK-6144
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Trystan Leftwich
>Priority: Blocker
>
> In cluster mode, if you use ADD JAR with an HDFS-sourced jar, it will fail 
> while trying to fetch that jar on the worker nodes, with the following error:
> {code}
> 15/03/03 04:56:50 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 
> (TID 0)
> java.io.FileNotFoundException: 
> /yarn/nm/usercache/vagrant/appcache/application_1425166832391_0027/-19222735701425358546704_cache
>  (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:146)
> {code}
> PR https://github.com/apache/spark/pull/4880



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6159) Distinguish between inprogress and abnormal event log history

2015-03-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6159:
-
Affects Version/s: 1.0.0

> Distinguish between inprogress and abnormal event log history
> -
>
> Key: SPARK-6159
> URL: https://issues.apache.org/jira/browse/SPARK-6159
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> This is a follow-up to SPARK-6107. Currently, when an application is 
> terminated abnormally (e.g. Ctrl + C), its log file is still in ".inprogress" 
> format. SPARK-6107 makes the in-progress log readable to the SparkUI.
> However, we should be able to distinguish between a genuinely in-progress 
> application and an abnormal termination. This fix adds a shutdown hook to 
> EventLoggingListener that renames the ".inprogress" log to ".abnormal".
> Then we know which case we are dealing with when reading the log in 
> rebuildSparkUI.
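
A minimal sketch of the proposed idea (the path and names are illustrative, not the actual EventLoggingListener code): a JVM shutdown hook renames the ".inprogress" log to ".abnormal" if the application dies before a clean stop.

{code}
import java.io.File

val inProgressLog = new File("/tmp/spark-events/app-1234.inprogress") // hypothetical path

Runtime.getRuntime.addShutdownHook(new Thread("rename-abnormal-event-log") {
  override def run(): Unit = {
    if (inProgressLog.exists()) {
      val abnormal = new File(inProgressLog.getPath.stripSuffix(".inprogress") + ".abnormal")
      inProgressLog.renameTo(abnormal) // mark the run as abnormally terminated
    }
  }
})
{code}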



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

2015-03-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6144.

   Resolution: Fixed
Fix Version/s: 1.3.0

> When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
> ---
>
> Key: SPARK-6144
> URL: https://issues.apache.org/jira/browse/SPARK-6144
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
>Priority: Blocker
> Fix For: 1.3.0
>
>
> In cluster mode, if you use ADD JAR with an HDFS-sourced jar, it will fail 
> while trying to fetch that jar on the worker nodes, with the following error:
> {code}
> 15/03/03 04:56:50 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 
> (TID 0)
> java.io.FileNotFoundException: 
> /yarn/nm/usercache/vagrant/appcache/application_1425166832391_0027/-19222735701425358546704_cache
>  (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:146)
> {code}
> PR https://github.com/apache/spark/pull/4880



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6171) No class def found for HiveConf in Spark shell

2015-03-04 Thread Andrew Or (JIRA)
Andrew Or created SPARK-6171:


 Summary: No class def found for HiveConf in Spark shell
 Key: SPARK-6171
 URL: https://issues.apache.org/jira/browse/SPARK-6171
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell, SQL
Affects Versions: 1.3.0
Reporter: Andrew Or
Assignee: Michael Armbrust
Priority: Blocker


I ran `build/sbt clean assembly` and then started the Spark shell, clean and 
simple, then I hit this huge stack trace. I can still run Spark jobs no 
problem, but we probably shouldn't be throwing this on a clean build.

{code}
15/03/04 14:09:15 INFO SparkILoop: Created spark context..
Spark context available as sc.
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
at java.lang.Class.getConstructor0(Class.java:2803)
at java.lang.Class.getConstructor(Class.java:1718)
at 
org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
at $iwC$$iwC.(:9)
at $iwC.(:18)
at (:20)
at .(:24)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6171) No class def found for HiveConf in Spark shell

2015-03-04 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347669#comment-14347669
 ] 

Andrew Or commented on SPARK-6171:
--

Marking this as a blocker because I believe it's a regression caused by 
https://github.com/apache/spark/pull/4387

> No class def found for HiveConf in Spark shell
> --
>
> Key: SPARK-6171
> URL: https://issues.apache.org/jira/browse/SPARK-6171
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Michael Armbrust
>Priority: Blocker
>
> I ran `build/sbt clean assembly` and then started the Spark shell, clean and 
> simple, then I hit this huge stack trace. I can still run Spark jobs no 
> problem, but we probably shouldn't be throwing this on a clean build.
> {code}
> 15/03/04 14:09:15 INFO SparkILoop: Created spark context..
> Spark context available as sc.
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
>   at java.lang.Class.getConstructor0(Class.java:2803)
>   at java.lang.Class.getConstructor(Class.java:1718)
>   at 
> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
>   at $iwC$$iwC.(:9)
>   at $iwC.(:18)
>   at (:20)
>   at .(:24)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6171) No class def found for HiveConf in Spark shell

2015-03-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6171:
-
Description: 
I ran `build/sbt clean assembly` and then started the Spark shell, then I hit 
this huge stack trace. I didn't enable hive in my build, but I wasn't planning 
on using SQL either. I can still run Spark jobs no problem, but we probably 
shouldn't be throwing this on a clean build.

{code}
15/03/04 14:09:15 INFO SparkILoop: Created spark context..
Spark context available as sc.
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
at java.lang.Class.getConstructor0(Class.java:2803)
at java.lang.Class.getConstructor(Class.java:1718)
at 
org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
at $iwC$$iwC.(:9)
at $iwC.(:18)
at (:20)
at .(:24)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
{code}

  was:
I ran `build/sbt clean assembly` and then started the Spark shell, clean and 
simple, then I hit this huge stack trace. I can still run Spark jobs no 
problem, but we probably shouldn't be throwing this on a clean build.

{code}
15/03/04 14:09:15 INFO SparkILoop: Created spark context..
Spark context available as sc.
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
at java.lang.Class.getConstructor0(Class.java:2803)
at java.lang.Class.getConstructor(Class.java:1718)
at 
org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
at $iwC$$iwC.(:9)
at $iwC.(:18)
at (:20)
at .(:24)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
{code}


> No class def found for HiveConf in Spark shell
> --
>
> Key: SPARK-6171
> URL: https://issues.apache.org/jira/browse/SPARK-6171
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Michael Armbrust
>Priority: Blocker
>
> I ran `build/sbt clean assembly` and then started the Spark shell, then I hit 
> this huge stack trace. I didn't enable hive in my build, but I wasn't 
> planning on using SQL either. I can still run Spark jobs no problem, but we 
> probably shouldn't be throwing this on a clean build.
> {code}
> 15/03/04 14:09:15 INFO SparkILoop: Created spark context..
> Spark context available as sc.
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
>   at java.lang.Class.getConstructor0(Class.java:2803)
>   at java.lang.Class.getConstructor(Class.java:1718)
>   at 
> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
>   at $iwC$$iwC.(:9)
>   at $iwC.(:18)
>   at (:20)
>   at .(:24)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-6171) No class def found for HiveConf in Spark shell

2015-03-04 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347899#comment-14347899
 ] 

Andrew Or commented on SPARK-6171:
--

Closing as Cannot Reproduce. It must have been something wrong with my 
environment...

> No class def found for HiveConf in Spark shell
> --
>
> Key: SPARK-6171
> URL: https://issues.apache.org/jira/browse/SPARK-6171
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Michael Armbrust
>Priority: Blocker
>
> I ran `build/sbt clean assembly` and then started the Spark shell, then I hit 
> this huge stack trace. I didn't enable hive in my build, but I wasn't 
> planning on using SQL either. I can still run Spark jobs no problem, but we 
> probably shouldn't be throwing this on a clean build.
> {code}
> 15/03/04 14:09:15 INFO SparkILoop: Created spark context..
> Spark context available as sc.
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
>   at java.lang.Class.getConstructor0(Class.java:2803)
>   at java.lang.Class.getConstructor(Class.java:1718)
>   at 
> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
>   at $iwC$$iwC.(:9)
>   at $iwC.(:18)
>   at (:20)
>   at .(:24)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6171) No class def found for HiveConf in Spark shell

2015-03-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6171.

Resolution: Cannot Reproduce

> No class def found for HiveConf in Spark shell
> --
>
> Key: SPARK-6171
> URL: https://issues.apache.org/jira/browse/SPARK-6171
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Michael Armbrust
>Priority: Blocker
>
> I ran `build/sbt clean assembly` and then started the Spark shell, then I hit 
> this huge stack trace. I didn't enable hive in my build, but I wasn't 
> planning on using SQL either. I can still run Spark jobs no problem, but we 
> probably shouldn't be throwing this on a clean build.
> {code}
> 15/03/04 14:09:15 INFO SparkILoop: Created spark context..
> Spark context available as sc.
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
>   at java.lang.Class.getConstructor0(Class.java:2803)
>   at java.lang.Class.getConstructor(Class.java:1718)
>   at 
> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1026)
>   at $iwC$$iwC.(:9)
>   at $iwC.(:18)
>   at (:20)
>   at .(:24)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkI
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5173) support python application running on yarn cluster mode

2015-03-24 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378035#comment-14378035
 ] 

Andrew Or commented on SPARK-5173:
--

It appears not. I just closed it.

> support python application running on yarn cluster mode
> ---
>
> Key: SPARK-5173
> URL: https://issues.apache.org/jira/browse/SPARK-5173
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Lianhui Wang
> Fix For: 1.3.0
>
>
> Currently, spark-submit does not support running a Python application in 
> yarn-cluster mode. This change modifies the submit code and YARN's AM in 
> order to support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5173) support python application running on yarn cluster mode

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-5173.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Lianhui Wang

> support python application running on yarn cluster mode
> ---
>
> Key: SPARK-5173
> URL: https://issues.apache.org/jira/browse/SPARK-5173
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Lianhui Wang
>Assignee: Lianhui Wang
> Fix For: 1.3.0
>
>
> Currently, spark-submit does not support running a Python application in 
> yarn-cluster mode. This change modifies the submit code and YARN's AM in 
> order to support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6209) ExecutorClassLoader can leak connections after failing to load classes from the REPL class server

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6209:
-
Fix Version/s: 1.4.0
   1.3.1

> ExecutorClassLoader can leak connections after failing to load classes from 
> the REPL class server
> -
>
> Key: SPARK-6209
> URL: https://issues.apache.org/jira/browse/SPARK-6209
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.0.3, 1.1.2, 1.2.1, 1.3.0, 1.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.3.1, 1.4.0
>
>
> ExecutorClassLoader does not ensure proper cleanup of network connections 
> that it opens.  If it fails to load a class, it may leak partially-consumed 
> InputStreams that are connected to the REPL's HTTP class server, causing that 
> server to exhaust its thread pool, which can cause the entire job to hang.
> Here is a simple reproduction:
> With
> {code}
> ./bin/spark-shell --master local-cluster[8,8,512] 
> {code}
> run the following command:
> {code}
> sc.parallelize(1 to 1000, 1000).map { x =>
>   try {
>   Class.forName("some.class.that.does.not.Exist")
>   } catch {
>   case e: Exception => // do nothing
>   }
>   x
> }.count()
> {code}
> This job will run 253 tasks, then will completely freeze without any errors 
> or failed tasks.
> It looks like the driver has 253 threads blocked in socketRead0() calls:
> {code}
> [joshrosen ~]$ jstack 16765 | grep socketRead0 | wc
>  253 759   14674
> {code}
> e.g.
> {code}
> "qtp1287429402-13" daemon prio=5 tid=0x7f868a1c nid=0x5b03 runnable 
> [0x0001159bd000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
> at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
> at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
> at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1044)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745) 
> {code}
> Jstack on the executors shows blocking in loadClass / findClass, where a 
> single thread is RUNNABLE and waiting to hear back from the driver and other 
> executor threads are BLOCKED on object monitor synchronization at 
> Class.forName0().
> Remotely triggering a GC on a hanging executor allows the job to progress and 
> complete more tasks before hanging again.  If I repeatedly trigger GC on all 
> of the executors, then the job runs to completion:
> {code}
> jps | grep CoarseGra | cut -d ' ' -f 1 | xargs -I {} -n 1 -P100 jcmd {} GC.run
> {code}
> The culprit is a {{catch}} block that ignores all exceptions and performs no 
> cleanup: 
> https://github.com/apache/spark/blob/v1.2.0/repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala#L94
> This bug has been present since Spark 1.0.0, but I suspect that we haven't 
> seen it before because it's pretty hard to reproduce. Triggering this error 
> requires a job with tasks that trigger ClassNotFoundExceptions yet are still 
> able to run to completion.  It also requires that executors are able to leak 
> enough open connections to exhaust the class server's Jetty thread pool 
> limit, which requires that there are a large number of tasks (253+) and 
> either a large number of executors or a very low amount of GC pressure on 
> those executors (since GC will cause the leaked connections to be closed).
> The fix here is pretty simple: add proper resource cleanup to this class.
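
A minimal sketch of the kind of cleanup the fix adds (illustrative, not the actual ExecutorClassLoader code): the stream from the class server is always closed, even when reading the class bytes fails.

{code}
import java.io.InputStream
import java.net.URL

def readClassBytes(classServerUri: String, path: String): Option[Array[Byte]] = {
  var in: InputStream = null
  try {
    in = new URL(classServerUri + "/" + path).openStream()
    Some(Stream.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray)
  } catch {
    case _: Exception => None // class not found on the class server
  } finally {
    // Always release the connection so the Jetty thread serving it is freed.
    if (in != null) in.close()
  }
}
{code}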



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6079) Use index to speed up StatusTracker.getJobIdsForGroup()

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6079:
-
Affects Version/s: 1.3.0

> Use index to speed up StatusTracker.getJobIdsForGroup()
> ---
>
> Key: SPARK-6079
> URL: https://issues.apache.org/jira/browse/SPARK-6079
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
> Fix For: 1.4.0
>
>
> {{StatusTracker.getJobIdsForGroup()}} is implemented via a linear scan over a 
> HashMap rather than using an index.  This might be an expensive operation if 
> there are many (e.g. thousands) of retained jobs.  We can add a new index to 
> speed this up.
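
A minimal sketch of the indexing idea (names are illustrative, not the actual listener code): keep a group-to-job-ids map updated on job start so the lookup no longer scans every retained job.

{code}
import scala.collection.mutable

val jobGroupToJobIds = new mutable.HashMap[String, mutable.HashSet[Int]]

def onJobStart(jobGroup: String, jobId: Int): Unit = {
  jobGroupToJobIds.getOrElseUpdate(jobGroup, new mutable.HashSet[Int]) += jobId
}

def getJobIdsForGroup(jobGroup: String): Seq[Int] = {
  // Map lookup instead of a linear scan over all retained jobs.
  jobGroupToJobIds.get(jobGroup).map(_.toSeq).getOrElse(Seq.empty)
}
{code}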



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6088.

  Resolution: Fixed
   Fix Version/s: 1.4.0
  1.3.1
Target Version/s: 1.3.1, 1.4.0  (was: 1.3.0)

> UI is malformed when tasks fetch remote results
> ---
>
> Key: SPARK-6088
> URL: https://issues.apache.org/jira/browse/SPARK-6088
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Fix For: 1.3.1, 1.4.0
>
> Attachments: Screenshot 2015-02-28 18.24.42.png
>
>
> There are three issues when tasks get remote results:
> (1) The status never changes from GET_RESULT to SUCCEEDED
> (2) The time to get the result is shown as the absolute time (resulting in a 
> non-sensical output that says getting the result took >1 million hours) 
> rather than the elapsed time
> (3) The getting result time is included as part of the scheduler delay
> cc [~shivaram]
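
A minimal sketch of issue (2) (names are illustrative, not the actual UI code): the table should show the elapsed fetch time, not the absolute timestamp at which fetching started.

{code}
// Returns 0 when the task never fetched a remote result or a timestamp is missing.
def gettingResultDurationMs(gettingResultStart: Long, taskFinishTime: Long): Long =
  if (gettingResultStart > 0 && taskFinishTime > gettingResultStart) {
    taskFinishTime - gettingResultStart
  } else {
    0L
  }
{code}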



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6088:
-
Affects Version/s: 1.3.0

> UI is malformed when tasks fetch remote results
> ---
>
> Key: SPARK-6088
> URL: https://issues.apache.org/jira/browse/SPARK-6088
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Fix For: 1.3.1, 1.4.0
>
> Attachments: Screenshot 2015-02-28 18.24.42.png
>
>
> There are three issues when tasks get remote results:
> (1) The status never changes from GET_RESULT to SUCCEEDED
> (2) The time to get the result is shown as the absolute time (resulting in a 
> non-sensical output that says getting the result took >1 million hours) 
> rather than the elapsed time
> (3) The getting result time is included as part of the scheduler delay
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3570) Shuffle write time does not include time to open shuffle files

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3570.

  Resolution: Fixed
   Fix Version/s: 1.4.0
  1.3.1
Target Version/s: 1.3.1, 1.4.0

> Shuffle write time does not include time to open shuffle files
> --
>
> Key: SPARK-3570
> URL: https://issues.apache.org/jira/browse/SPARK-3570
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.2, 1.0.2, 1.1.0
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Fix For: 1.3.1, 1.4.0
>
> Attachments: 3a_1410854905_0_job_log_waterfall.pdf, 
> 3a_1410957857_0_job_log_waterfall.pdf
>
>
> Currently, the reported shuffle write time does not include time to open the 
> shuffle files.  This time can be very significant when the disk is highly 
> utilized and many shuffle files exist on the machine (I'm not sure how severe 
> this is in 1.0 onward -- since shuffle files are automatically deleted, this 
> may be less of an issue because there are fewer old files sitting around).  
> In experiments I did, in extreme cases, adding the time to open files can 
> increase the shuffle write time from 5ms (of a 2 second task) to 1 second.  
> We should fix this for better performance debugging.
> Thanks [~shivaram] for helping to diagnose this problem.  cc [~pwendell]
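
A minimal sketch of the measurement change (illustrative, not the actual shuffle writer code): the clock starts before the shuffle file is opened, so slow opens count toward the reported shuffle write time.

{code}
import java.io.{File, FileOutputStream}

var shuffleWriteTimeNanos = 0L

def openShuffleFile(file: File): FileOutputStream = {
  val start = System.nanoTime()
  val out = new FileOutputStream(file) // opening can be slow on a heavily utilized disk
  shuffleWriteTimeNanos += System.nanoTime() - start
  out
}
{code}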



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5771:
-
Fix Version/s: (was: 1.4.0)

> Number of Cores in Completed Applications of Standalone Master Web Page 
> always be 0 if sc.stop() is called
> --
>
> Key: SPARK-5771
> URL: https://issues.apache.org/jira/browse/SPARK-5771
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
>Priority: Minor
>
> In standalone mode, the number of cores shown under Completed Applications on 
> the Master web page will always be zero if sc.stop() is called, but it will 
> be correct if sc.stop() is not called.
> The likely reason: after sc.stop() is called, removeExecutor of class 
> ApplicationInfo is called, which reduces the variable coresGranted to zero. 
> coresGranted is used to display the number of cores on the web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reopened SPARK-5771:
--

> Number of Cores in Completed Applications of Standalone Master Web Page 
> always be 0 if sc.stop() is called
> --
>
> Key: SPARK-5771
> URL: https://issues.apache.org/jira/browse/SPARK-5771
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
>Priority: Minor
>
> In standalone mode, the number of cores shown under Completed Applications on 
> the Master web page will always be zero if sc.stop() is called, but it will 
> be correct if sc.stop() is not called.
> The likely reason: after sc.stop() is called, removeExecutor of class 
> ApplicationInfo is called, which reduces the variable coresGranted to zero. 
> coresGranted is used to display the number of cores on the web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6469) Improving documentation on YARN local directories usage

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6469.

  Resolution: Fixed
   Fix Version/s: 1.4.0
  1.3.1
Assignee: Christophe Préaud
Target Version/s: 1.3.1, 1.4.0

> Improving documentation on YARN local directories usage
> ---
>
> Key: SPARK-6469
> URL: https://issues.apache.org/jira/browse/SPARK-6469
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, YARN
>Affects Versions: 1.0.0
>Reporter: Christophe Préaud
>Assignee: Christophe Préaud
>Priority: Minor
> Fix For: 1.3.1, 1.4.0
>
> Attachments: TestYarnVars.scala
>
>
> According to the [Spark YARN doc 
> page|http://spark.apache.org/docs/latest/running-on-yarn.html#important-notes],
>  Spark executors will use the local directories configured for YARN, not 
> {{spark.local.dir}} which should be ignored.
> However it should be noted that in yarn-client mode, though the executors 
> will indeed use the local directories configured for YARN, the driver will 
> not, because it is not running on the YARN cluster; the driver in yarn-client 
> will use the local directories defined in {{spark.local.dir}}
> Can this please be clarified in the Spark YARN documentation above?
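
A hedged illustration of the point being documented (the scratch path is hypothetical): when this application is submitted with --master yarn-client, the executors use the local directories configured for YARN, while the driver, which runs outside the cluster, scratches in spark.local.dir.

{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("yarn-client-local-dirs-example")
  .set("spark.local.dir", "/data/tmp/spark") // honored only by the driver in yarn-client mode

val sc = new SparkContext(conf)
{code}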



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6469) Improving documentation on YARN local directories usage

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6469:
-
Component/s: Documentation

> Improving documentation on YARN local directories usage
> ---
>
> Key: SPARK-6469
> URL: https://issues.apache.org/jira/browse/SPARK-6469
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, YARN
>Affects Versions: 1.0.0
>Reporter: Christophe Préaud
>Assignee: Christophe Préaud
>Priority: Minor
> Fix For: 1.3.1, 1.4.0
>
> Attachments: TestYarnVars.scala
>
>
> According to the [Spark YARN doc 
> page|http://spark.apache.org/docs/latest/running-on-yarn.html#important-notes],
>  Spark executors will use the local directories configured for YARN, not 
> {{spark.local.dir}} which should be ignored.
> However it should be noted that in yarn-client mode, though the executors 
> will indeed use the local directories configured for YARN, the driver will 
> not, because it is not running on the YARN cluster; the driver in yarn-client 
> will use the local directories defined in {{spark.local.dir}}
> Can this please be clarified in the Spark YARN documentation above?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6469) Improving documentation on YARN local directories usage

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6469:
-
Affects Version/s: 1.0.0

> Improving documentation on YARN local directories usage
> ---
>
> Key: SPARK-6469
> URL: https://issues.apache.org/jira/browse/SPARK-6469
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, YARN
>Affects Versions: 1.0.0
>Reporter: Christophe Préaud
>Assignee: Christophe Préaud
>Priority: Minor
> Fix For: 1.3.1, 1.4.0
>
> Attachments: TestYarnVars.scala
>
>
> According to the [Spark YARN doc 
> page|http://spark.apache.org/docs/latest/running-on-yarn.html#important-notes],
>  Spark executors will use the local directories configured for YARN, not 
> {{spark.local.dir}} which should be ignored.
> However it should be noted that in yarn-client mode, though the executors 
> will indeed use the local directories configured for YARN, the driver will 
> not, because it is not running on the YARN cluster; the driver in yarn-client 
> will use the local directories defined in {{spark.local.dir}}
> Can this please be clarified in the Spark YARN documentation above?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs

2015-03-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6081:
-
Affects Version/s: 1.0.0

> DriverRunner doesn't support pulling HTTP/HTTPS URIs
> 
>
> Key: SPARK-6081
> URL: https://issues.apache.org/jira/browse/SPARK-6081
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 1.0.0
>Reporter: Timothy Chen
>Priority: Minor
>
> According to the docs, standalone cluster mode supports specifying http|https 
> jar URLs, but the URLs passed to the DriverRunner cannot actually be pulled 
> over HTTP because the fetch goes through Hadoop FileSystem get.
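
A minimal sketch of the idea (illustrative, not the actual DriverRunner code): fall back to a plain HTTP download when the jar URI is http or https instead of routing every scheme through the Hadoop FileSystem API.

{code}
import java.io.File
import java.net.URI
import java.nio.file.Files

def fetchJar(jarUrl: String, dest: File): Unit = {
  val uri = new URI(jarUrl)
  uri.getScheme match {
    case "http" | "https" =>
      val in = uri.toURL.openStream()
      try {
        Files.copy(in, dest.toPath) // plain HTTP(S) download
      } finally {
        in.close()
      }
    case _ =>
      // ... keep using the existing Hadoop FileSystem-based copy (hdfs://, file://, ...) ...
  }
}
{code}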



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2015-03-25 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380579#comment-14380579
 ] 

Andrew Or commented on SPARK-3632:
--

Ok, sounds good.

> ConnectionManager can run out of receive threads with authentication on
> ---
>
> Key: SPARK-3632
> URL: https://issues.apache.org/jira/browse/SPARK-3632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Fix For: 1.2.0
>
>
> If you turn authentication on and are using a lot of executors, there is a 
> chance that all of the threads in handleMessageExecutor could be waiting to 
> send a message because they are blocked waiting for authentication to happen. 
> This can cause a temporary deadlock until the connection times out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6521) executors in the same node read local shuffle file

2015-03-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6521:
-
Affects Version/s: 1.2.0

> executors in the same node read local shuffle file
> --
>
> Key: SPARK-6521
> URL: https://issues.apache.org/jira/browse/SPARK-6521
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0
>Reporter: xukun
>
> Previously, an executor read another executor's shuffle file on the same node 
> over the network. This PR makes executors on the same node read the shuffle 
> file locally in sort-based shuffle, which reduces network transport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6537) UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish

2015-03-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6537:
-
Priority: Minor  (was: Major)

> UIWorkloadGenerator: The main thread should not stop SparkContext until all 
> jobs finish
> ---
>
> Key: SPARK-6537
> URL: https://issues.apache.org/jira/browse/SPARK-6537
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Kousuke Saruta
>Priority: Minor
>
> The main thread of UIWorkloadGenerator spawns sub-threads to launch jobs, but 
> it stops the SparkContext without waiting for those threads to finish.
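
A minimal sketch of the fix idea (illustrative, not the actual UIWorkloadGenerator code): join the job-launching threads before stopping the SparkContext.

{code}
val jobThreads = (1 to 4).map { i =>
  new Thread(s"job-launcher-$i") {
    override def run(): Unit = {
      // ... launch one workload job on the shared SparkContext here ...
    }
  }
}
jobThreads.foreach(_.start())
jobThreads.foreach(_.join()) // wait for every job thread before calling sc.stop()
// sc.stop()
{code}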



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6537) UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish

2015-03-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6537.

   Resolution: Fixed
Fix Version/s: 1.4.0
 Assignee: Kousuke Saruta

> UIWorkloadGenerator: The main thread should not stop SparkContext until all 
> jobs finish
> ---
>
> Key: SPARK-6537
> URL: https://issues.apache.org/jira/browse/SPARK-6537
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 1.4.0
>
>
> The main thread of UIWorkloadGenerator spawns sub-threads to launch jobs, but 
> it stops the SparkContext without waiting for those threads to finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6537) UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish

2015-03-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6537:
-
Affects Version/s: (was: 1.4.0)
   1.0.0

> UIWorkloadGenerator: The main thread should not stop SparkContext until all 
> jobs finish
> ---
>
> Key: SPARK-6537
> URL: https://issues.apache.org/jira/browse/SPARK-6537
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Fix For: 1.4.0
>
>
> The main thread of UIWorkloadGenerator spawns sub threads to launch jobs, but 
> it stops the SparkContext without waiting for those threads to finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called

2015-03-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-5771.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Target Version/s: 1.4.0

> Number of Cores in Completed Applications of Standalone Master Web Page 
> always be 0 if sc.stop() is called
> --
>
> Key: SPARK-5771
> URL: https://issues.apache.org/jira/browse/SPARK-5771
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
>Priority: Minor
> Fix For: 1.4.0
>
>
> In standalone mode, the number of cores shown under Completed Applications on 
> the Master web page is always zero if sc.stop() is called, but it is correct 
> if sc.stop() is not called.
> The likely reason: after sc.stop() is called, the removeExecutor function of 
> class ApplicationInfo is called, which reduces the variable coresGranted to 
> zero. The variable coresGranted is used to display the number of cores on 
> the web page.
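
One possible way to keep the display correct, sketched with assumed names (this is not the actual ApplicationInfo class, nor necessarily the committed fix): snapshot the granted cores when the application finishes, before executor removal drives the counter to zero.

{code}
class AppInfoSketch {
  private var coresGranted = 0
  private var coresAtCompletion: Option[Int] = None

  def addExecutor(cores: Int): Unit = coresGranted += cores
  def removeExecutor(cores: Int): Unit = coresGranted -= cores
  def markCompleted(): Unit = coresAtCompletion = Some(coresGranted)

  // value the web page would show: the snapshot for completed applications,
  // the live counter for running ones
  def displayedCores: Int = coresAtCompletion.getOrElse(coresGranted)
}
{code}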



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6079) Use index to speed up StatusTracker.getJobIdsForGroup()

2015-03-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6079.

  Resolution: Fixed
Target Version/s: 1.4.0

> Use index to speed up StatusTracker.getJobIdsForGroup()
> ---
>
> Key: SPARK-6079
> URL: https://issues.apache.org/jira/browse/SPARK-6079
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
> Fix For: 1.4.0
>
>
> {{StatusTracker.getJobIdsForGroup()}} is implemented via a linear scan over a 
> HashMap rather than using an index.  This can be an expensive operation when 
> there are many (e.g. thousands of) retained jobs.  We can add a new index to 
> speed this up.
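
A small sketch of the indexing idea with made-up names (not the actual listener code): maintain a group-to-job-ids map alongside the retained job data so the lookup no longer scans every job.

{code}
import scala.collection.mutable

class JobGroupIndexSketch {
  private val jobIdToGroup = mutable.HashMap[Int, String]()
  private val groupToJobIds = mutable.HashMap[String, mutable.Set[Int]]()

  def jobStarted(jobId: Int, group: String): Unit = {
    jobIdToGroup(jobId) = group
    groupToJobIds.getOrElseUpdate(group, mutable.Set.empty[Int]) += jobId
  }

  def jobDropped(jobId: Int): Unit =               // when a retained job is evicted
    jobIdToGroup.remove(jobId).foreach { group =>
      groupToJobIds.get(group).foreach(_ -= jobId)
    }

  // O(jobs in the group) instead of O(all retained jobs)
  def getJobIdsForGroup(group: String): Seq[Int] =
    groupToJobIds.get(group).map(_.toSeq).getOrElse(Seq.empty)
}
{code}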



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6132) Context cleaner race condition across SparkContexts

2015-03-25 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381133#comment-14381133
 ] 

Andrew Or commented on SPARK-6132:
--

Looks like this is backported to all target branches now. Thanks [~srowen].

> Context cleaner race condition across SparkContexts
> ---
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.1.2, 1.2.2, 1.3.1, 1.4.0
>
>
> The context cleaner thread is not stopped properly. If a SparkContext is 
> started immediately after one stops, the context cleaner of the stopped 
> context can clean variables in the new one.
> This is because cleaner.stop() just sets a flag and expects the thread to 
> terminate asynchronously, but the code to clean broadcasts goes through 
> `SparkEnv.get.blockManager`, which could belong to a different SparkContext. 
> This is likely the cause of the flakiness in `JavaAPISuite`, which creates 
> many back-to-back SparkContexts.
> The right behavior is to wait until all currently running cleanup tasks have 
> finished.
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 
> of broadcast_0
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
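
A simplified sketch of the "wait for running cleanups" behavior, with invented names (the real ContextCleaner is more involved): stop() blocks until the cleaning thread has actually exited instead of only flipping a flag.

{code}
import java.util.concurrent.{CountDownLatch, TimeUnit}

class CleanerSketch {
  @volatile private var stopped = false
  private val finished = new CountDownLatch(1)

  private val thread = new Thread(new Runnable {
    def run(): Unit =
      try {
        while (!stopped) {
          Thread.sleep(10)                 // stands in for cleaning one reference
        }
      } finally {
        finished.countDown()               // no more cleanup will run after this
      }
  })
  thread.setDaemon(true)

  def start(): Unit = thread.start()

  def stop(): Unit = {
    stopped = true
    // wait until the in-flight cleanup loop has exited, so a new
    // SparkContext's state can never be touched by this cleaner
    finished.await(30, TimeUnit.SECONDS)
  }
}
{code}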



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4346) YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4346:
-
Affects Version/s: 1.0.0

> YarnClientSchedulerBack.asyncMonitorApplication should be common with 
> Client.monitorApplication
> ---
>
> Key: SPARK-4346
> URL: https://issues.apache.org/jira/browse/SPARK-4346
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>
> The YarnClientSchedulerBackend.asyncMonitorApplication routine should move 
> into ClientBase and be made common with monitorApplication.  Make sure stop 
> is handled properly.
> See discussion on https://github.com/apache/spark/pull/3143



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6443) Could not submit app in standalone cluster mode when HA is enabled

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Affects Version/s: 1.0.0

> Could not submit app in standalone cluster mode when HA is enabled
> --
>
> Key: SPARK-6443
> URL: https://issues.apache.org/jira/browse/SPARK-6443
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.0.0
>Reporter: Tao Wang
>Priority: Critical
>
> After digging through the code, I found that a user cannot submit an app in 
> standalone cluster mode when HA is enabled, although it works in client mode.
> I haven't tried it yet, but I will verify this and file a PR to resolve it if 
> the problem exists.
> 3/23 update:
> I started a HA cluster with zk, and tried to submit SparkPi example with 
> command:
> ./spark-submit  --class org.apache.spark.examples.SparkPi --master 
> spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
> ../lib/spark-examples-1.2.0-hadoop2.4.0.jar 
> and it failed with error message:
> Spark assembly has been built with Hive, including Datanucleus jars on 
> classpath
> 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
> at akka.actor.ActorCell.create(ActorCell.scala:596)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.spark.SparkException: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
> at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
> at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
> at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
> at akka.actor.ActorCell.create(ActorCell.scala:580)
> ... 9 more
> But in client mode it finished with the correct result, so my guess is right. 
> I will fix it in the related PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6443) Support HA in standalone cluster modehen HA is enabled

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Summary: Support HA in standalone cluster modehen HA is enabled  (was: 
Could not submit app in standalone cluster mode when HA is enabled)

> Support HA in standalone cluster modehen HA is enabled
> --
>
> Key: SPARK-6443
> URL: https://issues.apache.org/jira/browse/SPARK-6443
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.0.0
>Reporter: Tao Wang
>Priority: Critical
>
> After digging through the code, I found that a user cannot submit an app in 
> standalone cluster mode when HA is enabled, although it works in client mode.
> I haven't tried it yet, but I will verify this and file a PR to resolve it if 
> the problem exists.
> 3/23 update:
> I started a HA cluster with zk, and tried to submit SparkPi example with 
> command:
> ./spark-submit  --class org.apache.spark.examples.SparkPi --master 
> spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
> ../lib/spark-examples-1.2.0-hadoop2.4.0.jar 
> and it failed with error message:
> Spark assembly has been built with Hive, including Datanucleus jars on 
> classpath
> 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
> at akka.actor.ActorCell.create(ActorCell.scala:596)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.spark.SparkException: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
> at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
> at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
> at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
> at akka.actor.ActorCell.create(ActorCell.scala:580)
> ... 9 more
> But in client mode it finished with the correct result, so my guess is right. 
> I will fix it in the related PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Priority: Major  (was: Critical)

> Support HA in standalone cluster mode
> -
>
> Key: SPARK-6443
> URL: https://issues.apache.org/jira/browse/SPARK-6443
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.0.0
>Reporter: Tao Wang
>
> After digging through the code, I found that a user cannot submit an app in 
> standalone cluster mode when HA is enabled, although it works in client mode.
> I haven't tried it yet, but I will verify this and file a PR to resolve it if 
> the problem exists.
> 3/23 update:
> I started a HA cluster with zk, and tried to submit SparkPi example with 
> command:
> ./spark-submit  --class org.apache.spark.examples.SparkPi --master 
> spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
> ../lib/spark-examples-1.2.0-hadoop2.4.0.jar 
> and it failed with error message:
> Spark assembly has been built with Hive, including Datanucleus jars on 
> classpath
> 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
> at akka.actor.ActorCell.create(ActorCell.scala:596)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.spark.SparkException: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
> at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
> at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
> at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
> at akka.actor.ActorCell.create(ActorCell.scala:580)
> ... 9 more
> But in client mode it finished with the correct result, so my guess is right. 
> I will fix it in the related PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Summary: Support HA in standalone cluster mode  (was: Support HA in 
standalone cluster modehen HA is enabled)

> Support HA in standalone cluster mode
> -
>
> Key: SPARK-6443
> URL: https://issues.apache.org/jira/browse/SPARK-6443
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.0.0
>Reporter: Tao Wang
>Priority: Critical
>
> After digging through the code, I found that a user cannot submit an app in 
> standalone cluster mode when HA is enabled, although it works in client mode.
> I haven't tried it yet, but I will verify this and file a PR to resolve it if 
> the problem exists.
> 3/23 update:
> I started a HA cluster with zk, and tried to submit SparkPi example with 
> command:
> ./spark-submit  --class org.apache.spark.examples.SparkPi --master 
> spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
> ../lib/spark-examples-1.2.0-hadoop2.4.0.jar 
> and it failed with error message:
> Spark assembly has been built with Hive, including Datanucleus jars on 
> classpath
> 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
> at akka.actor.ActorCell.create(ActorCell.scala:596)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.spark.SparkException: Invalid master URL: 
> spark://doggie153:7077,doggie159:7077
> at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
> at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
> at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
> at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
> at akka.actor.ActorCell.create(ActorCell.scala:580)
> ... 9 more
> But in client mode it finished with the correct result, so my guess is right. 
> I will fix it in the related PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Description: 
After digging through the code, I found that a user cannot submit an app in 
standalone cluster mode when HA is enabled, although it works in client mode.

I haven't tried it yet, but I will verify this and file a PR to resolve it if 
the problem exists.

3/23 update:
I started a HA cluster with zk, and tried to submit SparkPi example with 
command:
./spark-submit  --class org.apache.spark.examples.SparkPi --master 
spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
../lib/spark-examples-1.2.0-hadoop2.4.0.jar 

and it failed with error message:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
at akka.actor.ActorCell.create(ActorCell.scala:580)
... 9 more

But in client mode it finished with the correct result, so my guess is right. 
I will fix it in the related PR.

=== EDIT by Andrew ===

From a quick survey in the code I can confirm that client mode does support 
this. [This 
line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162]
 splits the master URLs by comma and passes these URLs into the AppClient. In 
standalone cluster mode, there is not equivalent logic to even split the 
master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in 
the new one (o.a.s.deploy.rest.StandaloneRestClient).

Thus, this is an unsupported feature, not a bug!
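
As a rough illustration of the comma-splitting behavior described above (a sketch only; the real logic lives in SparkContext's scheduler creation), client mode effectively does something like:

{code}
object MasterUrlSketch {
  // "spark://host1:7077,host2:7077" -> Array("spark://host1:7077", "spark://host2:7077")
  def parseStandaloneMasters(master: String): Array[String] = {
    require(master.startsWith("spark://"), s"Invalid master URL: $master")
    master.stripPrefix("spark://").split(",").map(host => "spark://" + host)
  }

  def main(args: Array[String]): Unit =
    parseStandaloneMasters("spark://doggie153:7077,doggie159:7077").foreach(println)
}
{code}

The cluster-mode gateways pass the whole string through unsplit, which is why Master.toAkkaUrl rejects it as an invalid master URL in the stack trace above.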

  was:
After digging through the code, I found that a user cannot submit an app in 
standalone cluster mode when HA is enabled, although it works in client mode.

I haven't tried it yet, but I will verify this and file a PR to resolve it if 
the problem exists.

3/23 update:
I started a HA cluster with zk, and tried to submit SparkPi example with 
command:
./spark-submit  --class org.apache.spark.examples.SparkPi --master 
spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
../lib/spark-examples-1.2.0-hadoop2.4.0.jar 

and it failed with error message:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
at akka.actor.Actor$class.ar

[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Description: 
== EDIT by Andrew ==

From a quick survey in the code I can confirm that client mode does support 
this. [This 
line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162]
 splits the master URLs by comma and passes these URLs into the AppClient. In 
standalone cluster mode, there is simply no equivalent logic to even split the 
master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in 
the new one (o.a.s.deploy.rest.StandaloneRestClient).

Thus, this is an unsupported feature, not a bug!

== Original description from Tao Wang ==

After digging through the code, I found that a user cannot submit an app in 
standalone cluster mode when HA is enabled, although it works in client mode.

I haven't tried it yet, but I will verify this and file a PR to resolve it if 
the problem exists.

3/23 update:
I started a HA cluster with zk, and tried to submit SparkPi example with 
command:
./spark-submit  --class org.apache.spark.examples.SparkPi --master 
spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
../lib/spark-examples-1.2.0-hadoop2.4.0.jar 

and it failed with error message:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
at akka.actor.ActorCell.create(ActorCell.scala:580)
... 9 more

But in client mode it finished with the correct result, so my guess is right. 
I will fix it in the related PR.

  was:
== EDIT by Andrew ==

From a quick survey in the code I can confirm that client mode does support 
this. [This 
line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162]
 splits the master URLs by comma and passes these URLs into the AppClient. In 
standalone cluster mode, there is not equivalent logic to even split the 
master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in 
the new one (o.a.s.deploy.rest.StandaloneRestClient).

Thus, this is an unsupported feature, not a bug!

== Original description from Tao Wang ==

After digging through the code, I found that a user cannot submit an app in 
standalone cluster mode when HA is enabled, although it works in client mode.

I haven't tried it yet, but I will verify this and file a PR to resolve it if 
the problem exists.

3/23 update:
I started a HA cluster with zk, and tried to submit SparkPi example with 
command:
./spark-submit  --class org.apache.spark.examples.SparkPi --master 
spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
../lib/spark-examples-1.2.0-hadoop2.4.0.jar 

and it failed with error message:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala

[jira] [Updated] (SPARK-6443) Support HA in standalone cluster mode

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6443:
-
Description: 
== EDIT by Andrew ==

From a quick survey in the code I can confirm that client mode does support 
this. [This 
line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162]
 splits the master URLs by comma and passes these URLs into the AppClient. In 
standalone cluster mode, there is not equivalent logic to even split the 
master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in 
the new one (o.a.s.deploy.rest.StandaloneRestClient).

Thus, this is an unsupported feature, not a bug!

== Original description from Tao Wang ==

After digging through the code, I found that a user cannot submit an app in 
standalone cluster mode when HA is enabled, although it works in client mode.

I haven't tried it yet, but I will verify this and file a PR to resolve it if 
the problem exists.

3/23 update:
I started a HA cluster with zk, and tried to submit SparkPi example with 
command:
./spark-submit  --class org.apache.spark.examples.SparkPi --master 
spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
../lib/spark-examples-1.2.0-hadoop2.4.0.jar 

and it failed with error message:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
at akka.actor.ActorCell.create(ActorCell.scala:580)
... 9 more

But in client mode it finished with the correct result, so my guess is right. 
I will fix it in the related PR.

  was:
After digging through the code, I found that a user cannot submit an app in 
standalone cluster mode when HA is enabled, although it works in client mode.

I haven't tried it yet, but I will verify this and file a PR to resolve it if 
the problem exists.

3/23 update:
I started a HA cluster with zk, and tried to submit SparkPi example with 
command:
./spark-submit  --class org.apache.spark.examples.SparkPi --master 
spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
../lib/spark-examples-1.2.0-hadoop2.4.0.jar 

and it failed with error message:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Invalid master URL: 
spark://doggie153:7077,doggie159:7077
at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
at org.apache.spark.deploy.Cl

[jira] [Closed] (SPARK-6650) ExecutorAllocationManager never stops

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6650.

   Resolution: Fixed
Fix Version/s: 1.4.0
   1.3.1
 Assignee: Marcelo Vanzin

> ExecutorAllocationManager never stops
> -
>
> Key: SPARK-6650
> URL: https://issues.apache.org/jira/browse/SPARK-6650
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 1.3.1, 1.4.0
>
>
> {{ExecutorAllocationManager}} doesn't even have a stop() method. That means 
> that when the owning SparkContext goes away, the internal thread it uses to 
> schedule its activities remains alive.
> The leftover thread constantly spams the logs and can interfere with any 
> future contexts that are allocated.
> It's particularly evil during unit tests, since it leaves multiple threads 
> behind and slows down everything that runs after the suite.
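
A minimal sketch of the fix idea with illustrative names (not the actual ExecutorAllocationManager): own the scheduling thread and expose a stop() that the SparkContext calls on shutdown.

{code}
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

class AllocationManagerSketch {
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  def start(): Unit = {
    scheduler.scheduleWithFixedDelay(new Runnable {
      def run(): Unit = println("adjusting executor count")   // stands in for the periodic work
    }, 0, 100, TimeUnit.MILLISECONDS)
  }

  // without this, the scheduling thread outlives the owning SparkContext
  def stop(): Unit = scheduler.shutdownNow()
}
{code}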



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6673) spark-shell.cmd can't start even when spark was built in Windows

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6673:
-
Priority: Blocker  (was: Major)

> spark-shell.cmd can't start even when spark was built in Windows
> 
>
> Key: SPARK-6673
> URL: https://issues.apache.org/jira/browse/SPARK-6673
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 1.3.0
>Reporter: Masayoshi TSUZUKI
>Priority: Blocker
>
> spark-shell.cmd can't start.
> {code}
> bin\spark-shell.cmd --master local
> {code}
> will get
> {code}
> Failed to find Spark assembly JAR.
> You need to build Spark before running this program.
> {code}
> even when we have built Spark.
> This is because the environment variable {{SPARK_SCALA_VERSION}}, which is 
> used in {{spark-class2.cmd}}, is not set.
> In the Linux scripts, this value is set to {{2.10}} or {{2.11}} by default in 
> {{load-spark-env.sh}}, but there is no equivalent script for Windows.
> As a workaround, executing
> {code}
> set SPARK_SCALA_VERSION=2.10
> {code}
> before running spark-shell.cmd lets it start successfully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6673) spark-shell.cmd can't start even when spark was built in Windows

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6673:
-
Target Version/s: 1.3.1, 1.4.0

> spark-shell.cmd can't start even when spark was built in Windows
> 
>
> Key: SPARK-6673
> URL: https://issues.apache.org/jira/browse/SPARK-6673
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 1.3.0
>Reporter: Masayoshi TSUZUKI
>Priority: Blocker
>
> spark-shell.cmd can't start.
> {code}
> bin\spark-shell.cmd --master local
> {code}
> will get
> {code}
> Failed to find Spark assembly JAR.
> You need to build Spark before running this program.
> {code}
> even when we have built Spark.
> This is because the environment variable {{SPARK_SCALA_VERSION}}, which is 
> used in {{spark-class2.cmd}}, is not set.
> In the Linux scripts, this value is set to {{2.10}} or {{2.11}} by default in 
> {{load-spark-env.sh}}, but there is no equivalent script for Windows.
> As a workaround, executing
> {code}
> set SPARK_SCALA_VERSION=2.10
> {code}
> before running spark-shell.cmd lets it start successfully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6673) spark-shell.cmd can't start even when spark was built in Windows

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6673:
-
Assignee: Masayoshi TSUZUKI

> spark-shell.cmd can't start even when spark was built in Windows
> 
>
> Key: SPARK-6673
> URL: https://issues.apache.org/jira/browse/SPARK-6673
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 1.3.0
>Reporter: Masayoshi TSUZUKI
>Assignee: Masayoshi TSUZUKI
>Priority: Blocker
>
> spark-shell.cmd can't start.
> {code}
> bin\spark-shell.cmd --master local
> {code}
> will get
> {code}
> Failed to find Spark assembly JAR.
> You need to build Spark before running this program.
> {code}
> even when we have built Spark.
> This is because the environment variable {{SPARK_SCALA_VERSION}}, which is 
> used in {{spark-class2.cmd}}, is not set.
> In the Linux scripts, this value is set to {{2.10}} or {{2.11}} by default in 
> {{load-spark-env.sh}}, but there is no equivalent script for Windows.
> As a workaround, executing
> {code}
> set SPARK_SCALA_VERSION=2.10
> {code}
> before running spark-shell.cmd lets it start successfully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6640) Executor may connect to HeartbeartReceiver before it's setup in the driver side

2015-04-02 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6640:
-
Assignee: Shixiong Zhu

> Executor may connect to HeartbeartReceiver before it's setup in the driver 
> side
> ---
>
> Key: SPARK-6640
> URL: https://issues.apache.org/jira/browse/SPARK-6640
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> Here is the current code about starting LocalBackend and creating 
> HeartbeatReceiver:
> {code}
>   // Create and start the scheduler
>   private[spark] var (schedulerBackend, taskScheduler) =
> SparkContext.createTaskScheduler(this, master)
>   private val heartbeatReceiver = env.actorSystem.actorOf(
> Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver")
> {code}
> When creating LocalBackend, it starts `LocalActor`, which creates the 
> Executor, and the Executor's constructor retrieves `HeartbeatReceiver`.
> So we should make sure this line:
> {code}
> private val heartbeatReceiver = env.actorSystem.actorOf(
> Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver")
> {code}
> happens before the `LocalActor` is created.
> However, the current code cannot guarantee that, and creating the Executor 
> will sometimes crash. The issue was reported by sparkdi in 
> http://apache-spark-user-list.1001560.n3.nabble.com/Actor-not-found-td22265.html#a22324
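
A toy sketch of the ordering constraint, using an invented registry in place of the actor system (none of these names come from the real SparkContext): the receiver must be registered before anything that looks it up is constructed, otherwise you get the "Actor not found" failure described in the linked thread.

{code}
object InitOrderSketch {
  private val registry = scala.collection.mutable.Map[String, AnyRef]()

  def register(name: String, ref: AnyRef): Unit = registry(name) = ref
  def lookup(name: String): AnyRef =
    registry.getOrElse(name, throw new IllegalStateException(s"Actor not found: $name"))

  def main(args: Array[String]): Unit = {
    // 1. register the heartbeat receiver first...
    register("HeartbeatReceiver", new Object)
    // 2. ...and only then construct the backend/executor that looks it up.
    //    Reversing these two steps reproduces the reported crash.
    val receiver = lookup("HeartbeatReceiver")
    println(s"executor connected to $receiver")
  }
}
{code}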



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6640) Executor may connect to HeartbeartReceiver before it's setup in the driver side

2015-04-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6640.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Target Version/s: 1.4.0

> Executor may connect to HeartbeartReceiver before it's setup in the driver 
> side
> ---
>
> Key: SPARK-6640
> URL: https://issues.apache.org/jira/browse/SPARK-6640
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 1.4.0
>
>
> Here is the current code about starting LocalBackend and creating 
> HeartbeatReceiver:
> {code}
>   // Create and start the scheduler
>   private[spark] var (schedulerBackend, taskScheduler) =
> SparkContext.createTaskScheduler(this, master)
>   private val heartbeatReceiver = env.actorSystem.actorOf(
> Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver")
> {code}
> When creating LocalBackend, it starts `LocalActor`, which creates the 
> Executor, and the Executor's constructor retrieves `HeartbeatReceiver`.
> So we should make sure this line:
> {code}
> private val heartbeatReceiver = env.actorSystem.actorOf(
> Props(new HeartbeatReceiver(this, taskScheduler)), "HeartbeatReceiver")
> {code}
> happens before the `LocalActor` is created.
> However, the current code cannot guarantee that, and creating the Executor 
> will sometimes crash. The issue was reported by sparkdi in 
> http://apache-spark-user-list.1001560.n3.nabble.com/Actor-not-found-td22265.html#a22324



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6688) EventLoggingListener should always operate on resolved URIs

2015-04-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6688.

   Resolution: Fixed
Fix Version/s: 1.4.0
   1.3.1
 Assignee: Marcelo Vanzin

> EventLoggingListener should always operate on resolved URIs
> ---
>
> Key: SPARK-6688
> URL: https://issues.apache.org/jira/browse/SPARK-6688
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.3.1, 1.4.0
>
>
> A small bug was introduced in 1.3.0, where a check in 
> EventLoggingListener.scala is performed on the non-resolved log path. This 
> means that if "fs.defaultFS" is not the local filesystem, and the user is 
> trying to store logs in the local filesystem by providing a path with no 
> "file:" protocol, thing will fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6701) Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application

2015-04-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6701:
-
Description: 
Observed in Master and 1.3, both in SBT and in Maven (with YARN).

{code}
Process 
List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
 --master, yarn-cluster, --num-executors, 1, --properties-file, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
 --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
exited with code 1

sbt.ForkMain$ForkError: Process 
List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
 --master, yarn-cluster, --num-executors, 1, --properties-file, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
 --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
exited with code 1
at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
{code}

  was:
Observed in Master and 1.3, both in SBT and in Maven (with YARN).

{code}
Error Message

Process 
List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
 --master, yarn-cluster, --num-executors, 1, --properties-file, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
 --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
exited with code 1

sbt.ForkMain$ForkError: Process 
List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
 --master, yarn-cluster, --num-executors, 1, --properties-file, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
 --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
exited with code 1
at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
{code}


> Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application
> -
>
> Key: SPARK-6701
> URL: https://issues.apache.org/jira/browse/SPARK-6701
> Project: Spark
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Priority: Critical
>
> Observed in Master and 1.3, both in SBT and in Maven (with YARN).
> {code}
> Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
>  --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
> /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
> /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
> exited with c

[jira] [Created] (SPARK-6701) Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application

2015-04-03 Thread Andrew Or (JIRA)
Andrew Or created SPARK-6701:


 Summary: Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python 
application
 Key: SPARK-6701
 URL: https://issues.apache.org/jira/browse/SPARK-6701
 Project: Spark
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 1.3.0
Reporter: Andrew Or
Priority: Critical


Observed in Master and 1.3, both in SBT and in Maven (with YARN).

{code}
Error Message

Process 
List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
 --master, yarn-cluster, --num-executors, 1, --properties-file, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
 --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
exited with code 1

sbt.ForkMain$ForkError: Process 
List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
 --master, yarn-cluster, --num-executors, 1, --properties-file, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties,
 --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, 
/tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) 
exited with code 1
at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6700) flaky test: run Python application in yarn-cluster mode

2015-04-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6700.

Resolution: Fixed

> flaky test: run Python application in yarn-cluster mode 
> 
>
> Key: SPARK-6700
> URL: https://issues.apache.org/jira/browse/SPARK-6700
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Davies Liu
>Assignee: Lianhui Wang
>Priority: Critical
>  Labels: test, yarn
>
> org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in 
> yarn-cluster mode
> Failing for the past 1 build (Since Failed#2025 )
> Took 12 sec.
> Error Message
> {code}
> Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties,
>  --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp)
>  exited with code 1
> Stacktrace
> sbt.ForkMain$ForkError: Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties,
>  --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp)
>  exited with code 1
>   at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.org$scalatest$BeforeAndAfterAll$$super$run(YarnClusterSuite.scala:44)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(Before

[jira] [Reopened] (SPARK-6700) flaky test: run Python application in yarn-cluster mode

2015-04-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reopened SPARK-6700:
--

> flaky test: run Python application in yarn-cluster mode 
> 
>
> Key: SPARK-6700
> URL: https://issues.apache.org/jira/browse/SPARK-6700
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Davies Liu
>Assignee: Lianhui Wang
>Priority: Critical
>  Labels: test, yarn
>
> org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in 
> yarn-cluster mode
> Failing for the past 1 build (Since Failed#2025 )
> Took 12 sec.
> Error Message
> {code}
> Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties,
>  --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp)
>  exited with code 1
> Stacktrace
> sbt.ForkMain$ForkError: Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties,
>  --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp)
>  exited with code 1
>   at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.org$scalatest$BeforeAndAfterAll$$super$run(YarnClusterSuite.scala:44)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:

[jira] [Closed] (SPARK-6700) flaky test: run Python application in yarn-cluster mode

2015-04-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6700.

Resolution: Duplicate

> flaky test: run Python application in yarn-cluster mode 
> 
>
> Key: SPARK-6700
> URL: https://issues.apache.org/jira/browse/SPARK-6700
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Davies Liu
>Assignee: Lianhui Wang
>Priority: Critical
>  Labels: test, yarn
>
> org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in 
> yarn-cluster mode
> Failing for the past 1 build (Since Failed#2025 )
> Took 12 sec.
> Error Message
> {code}
> Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties,
>  --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp)
>  exited with code 1
> Stacktrace
> sbt.ForkMain$ForkError: Process 
> List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit,
>  --master, yarn-cluster, --num-executors, 1, --properties-file, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/spark3554401802242467930.properties,
>  --py-files, /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test2.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/test.py, 
> /tmp/spark-451f65e7-8e13-404f-ae7a-12a0d0394f09/result8930129095246825990.tmp)
>  exited with code 1
>   at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.org$scalatest$BeforeAndAfterAll$$super$run(YarnClusterSuite.scala:44)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(Be

[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6703:
-
Target Version/s: 1.4.0

> Provide a way to discover existing SparkContext's
> -
>
> Key: SPARK-6703
> URL: https://issues.apache.org/jira/browse/SPARK-6703
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Patrick Wendell
>
> Right now it is difficult to write a Spark application in a way that can be 
> run independently and also be composed with other Spark applications in an 
> environment such as the JobServer, notebook servers, etc., where there is a 
> shared SparkContext.
> It would be nice to provide a rendezvous point so that applications can 
> learn whether a SparkContext already exists before creating one.
> The simplest, most surgical way I see to do this is to have an optional 
> static SparkContext singleton that people can retrieve as follows:
> {code}
> val sc = SparkContext.getOrCreate(conf = new SparkConf())
> {code}
> And you could also have a setter where some outer framework/server can set it 
> for use by multiple downstream applications.
> A more advanced version of this would have some named registry or something, 
> but since we only support a single SparkContext per JVM at this point 
> anyway, this seems sufficient and much simpler. Another advanced option 
> would be to allow plugging in some other notion of configuration you'd pass 
> when retrieving an existing context.
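
The snippet below is an editorial sketch of the singleton pattern described above, assuming a hypothetical registry object; it only illustrates the proposal and is not Spark's actual implementation.

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of the proposed rendezvous point: an optional static
// SparkContext singleton. Object and method names are illustrative only.
object SparkContextRegistry {
  private var active: Option[SparkContext] = None

  // An outer framework/server (JobServer, notebook server) registers the
  // shared context it created for downstream applications.
  def setActive(sc: SparkContext): Unit = synchronized { active = Some(sc) }

  // Applications ask for an existing context before creating their own.
  def getOrCreate(conf: SparkConf): SparkContext = synchronized {
    active.getOrElse {
      val sc = new SparkContext(conf)
      active = Some(sc)
      sc
    }
  }
}
{code}

A standalone run would then reduce to SparkContextRegistry.getOrCreate(new SparkConf()), while a shared-context server would call setActive once at startup.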



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6703:
-
Affects Version/s: 1.3.0

> Provide a way to discover existing SparkContext's
> -
>
> Key: SPARK-6703
> URL: https://issues.apache.org/jira/browse/SPARK-6703
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Patrick Wendell
>
> Right now it is difficult to write a Spark application in a way that can be 
> run independently and also be composed with other Spark applications in an 
> environment such as the JobServer, notebook servers, etc., where there is a 
> shared SparkContext.
> It would be nice to provide a rendezvous point so that applications can 
> learn whether a SparkContext already exists before creating one.
> The simplest, most surgical way I see to do this is to have an optional 
> static SparkContext singleton that people can retrieve as follows:
> {code}
> val sc = SparkContext.getOrCreate(conf = new SparkConf())
> {code}
> And you could also have a setter where some outer framework/server can set it 
> for use by multiple downstream applications.
> A more advanced version of this would have some named registry or something, 
> but since we only support a single SparkContext per JVM at this point 
> anyway, this seems sufficient and much simpler. Another advanced option 
> would be to allow plugging in some other notion of configuration you'd pass 
> when retrieving an existing context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3596) Support changing the yarn client monitor interval

2015-04-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3596.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Assignee: Weizhong
Target Version/s: 1.4.0

> Support changing the yarn client monitor interval 
> --
>
> Key: SPARK-3596
> URL: https://issues.apache.org/jira/browse/SPARK-3596
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>Assignee: Weizhong
> Fix For: 1.4.0
>
>
> Right now Spark on YARN has a monitor interval that can be configured by 
> spark.yarn.report.interval. This is how often the client checks with the RM 
> to get status on the running application in cluster mode. We should allow 
> users to set this interval, as some may not need to check so often. There is 
> another JIRA filed to make it so the client doesn't have to stay around for 
> cluster mode.
> With the changes in https://github.com/apache/spark/pull/2350, this further 
> extends to affect client mode as well.
> We may want to add specific configs for that. Since the monitorApplication 
> function is now used in multiple different scenarios, it might actually make 
> sense for it to take the timeout as a parameter; you could want different 
> timeouts for different situations.
> For instance: how quickly we poll on the client side and print information 
> (cluster mode) vs. how quickly we recognize that the application quit and we 
> want to terminate (client mode). I want the latter to happen quickly, whereas 
> in cluster mode I might not care as much about how often updated info is 
> printed to the screen. I guess it's private, so we could leave it as is and 
> change it if we add support for that later.
> My suggestion for the name would be something like 
> spark.yarn.client.progress.pollinterval. If we were to add separate ones in 
> the future, they could be something like 
> spark.yarn.app.ready.pollinterval and spark.yarn.app.completion.pollinterval.
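
As an editorial illustration of how the suggested setting could be consumed, the sketch below reads a poll interval from SparkConf, falling back to the existing spark.yarn.report.interval; the helper shape and fallback chain are assumptions, not the change that was merged.

{code}
import org.apache.spark.SparkConf

// Illustrative only: a monitoring loop whose poll interval comes from the
// suggested spark.yarn.client.progress.pollinterval (milliseconds), falling
// back to the existing spark.yarn.report.interval, then to 1000 ms.
def monitorApplication(conf: SparkConf)(pollOnce: () => Boolean): Unit = {
  val intervalMs = conf.getLong("spark.yarn.client.progress.pollinterval",
    conf.getLong("spark.yarn.report.interval", 1000L))
  var finished = false
  while (!finished) {
    finished = pollOnce()                 // poll the RM and report progress
    if (!finished) Thread.sleep(intervalMs)
  }
}
{code}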



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4346) YarnClientSchedulerBackend.asyncMonitorApplication should be common with Client.monitorApplication

2015-04-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4346.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Assignee: Weizhong
Target Version/s: 1.4.0

> YarnClientSchedulerBackend.asyncMonitorApplication should be common with 
> Client.monitorApplication
> ---
>
> Key: SPARK-4346
> URL: https://issues.apache.org/jira/browse/SPARK-4346
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>Assignee: Weizhong
> Fix For: 1.4.0
>
>
> The YarnClientSchedulerBackend.asyncMonitorApplication routine should move 
> into ClientBase and be made common with monitorApplication.  Make sure stop 
> is handled properly.
> See discussion on https://github.com/apache/spark/pull/3143



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3596) Support changing the yarn client monitor interval

2015-04-08 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486014#comment-14486014
 ] 

Andrew Or commented on SPARK-3596:
--

Transitively fixed by https://github.com/apache/spark/pull/5305

> Support changing the yarn client monitor interval 
> --
>
> Key: SPARK-3596
> URL: https://issues.apache.org/jira/browse/SPARK-3596
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
> Fix For: 1.4.0
>
>
> Right now Spark on YARN has a monitor interval that can be configured by 
> spark.yarn.report.interval. This is how often the client checks with the RM 
> to get status on the running application in cluster mode. We should allow 
> users to set this interval, as some may not need to check so often. There is 
> another JIRA filed to make it so the client doesn't have to stay around for 
> cluster mode.
> With the changes in https://github.com/apache/spark/pull/2350, this further 
> extends to affect client mode as well.
> We may want to add specific configs for that. Since the monitorApplication 
> function is now used in multiple different scenarios, it might actually make 
> sense for it to take the timeout as a parameter; you could want different 
> timeouts for different situations.
> For instance: how quickly we poll on the client side and print information 
> (cluster mode) vs. how quickly we recognize that the application quit and we 
> want to terminate (client mode). I want the latter to happen quickly, whereas 
> in cluster mode I might not care as much about how often updated info is 
> printed to the screen. I guess it's private, so we could leave it as is and 
> change it if we add support for that later.
> My suggestion for the name would be something like 
> spark.yarn.client.progress.pollinterval. If we were to add separate ones in 
> the future, they could be something like 
> spark.yarn.app.ready.pollinterval and spark.yarn.app.completion.pollinterval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5931) Use consistent naming for time properties

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-5931.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Use consistent naming for time properties
> -
>
> Key: SPARK-5931
> URL: https://issues.apache.org/jira/browse/SPARK-5931
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Ilya Ganelin
> Fix For: 1.4.0
>
>
> This is SPARK-5932's sister issue.
> The naming of existing time configs is inconsistent. We currently have the 
> following throughout the code base:
> {code}
> spark.network.timeout // seconds
> spark.executor.heartbeatInterval // milliseconds
> spark.storage.blockManagerSlaveTimeoutMs // milliseconds
> spark.yarn.scheduler.heartbeat.interval-ms // milliseconds
> {code}
> Instead, my proposal is to simplify the config name itself and make 
> everything accept time using the following format: 5s, 2ms, 100us. For 
> instance:
> {code}
> spark.network.timeout = 5s
> spark.executor.heartbeatInterval = 500ms
> spark.storage.blockManagerSlaveTimeout = 100ms
> spark.yarn.scheduler.heartbeatInterval = 400ms
> {code}
> All existing configs that are relevant will be deprecated in favor of the new 
> ones. We should do this soon before we keep introducing more time configs.
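
To make the proposed format concrete, here is a minimal editorial sketch of a parser for strings like 5s, 500ms, or 100us; it is not the utility Spark actually added, just an illustration of the behavior the new configs would need.

{code}
// Minimal sketch: parse "5s" / "500ms" / "100us"; a bare number defaults to ms.
object TimeString {
  private val TimePattern = """(\d+)\s*(us|ms|s|m|h)?""".r

  def toMillis(str: String): Long = str.trim.toLowerCase match {
    case TimePattern(value, unit) =>
      val v = value.toLong
      Option(unit).getOrElse("ms") match {
        case "us" => v / 1000
        case "ms" => v
        case "s"  => v * 1000L
        case "m"  => v * 60L * 1000L
        case "h"  => v * 3600L * 1000L
      }
    case _ =>
      throw new IllegalArgumentException(s"Cannot parse time string: $str")
  }
}

// e.g. TimeString.toMillis("5s") == 5000; TimeString.toMillis("400ms") == 400
{code}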



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5931) Use consistent naming for time properties

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5931:
-
Assignee: Ilya Ganelin  (was: Andrew Or)

> Use consistent naming for time properties
> -
>
> Key: SPARK-5931
> URL: https://issues.apache.org/jira/browse/SPARK-5931
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Ilya Ganelin
> Fix For: 1.4.0
>
>
> This is SPARK-5932's sister issue.
> The naming of existing time configs is inconsistent. We currently have the 
> following throughout the code base:
> {code}
> spark.network.timeout // seconds
> spark.executor.heartbeatInterval // milliseconds
> spark.storage.blockManagerSlaveTimeoutMs // milliseconds
> spark.yarn.scheduler.heartbeat.interval-ms // milliseconds
> {code}
> Instead, my proposal is to simplify the config name itself and make 
> everything accept time using the following format: 5s, 2ms, 100us. For 
> instance:
> {code}
> spark.network.timeout = 5s
> spark.executor.heartbeatInterval = 500ms
> spark.storage.blockManagerSlaveTimeout = 100ms
> spark.yarn.scheduler.heartbeatInterval = 400ms
> {code}
> All existing configs that are relevant will be deprecated in favor of the new 
> ones. We should do this soon before we keep introducing more time configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6890) Local cluster mode in Mac is broken

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6890:
-
Affects Version/s: 1.4.0

> Local cluster mode in Mac is broken
> ---
>
> Key: SPARK-6890
> URL: https://issues.apache.org/jira/browse/SPARK-6890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Blocker
>
> The worker can not be launched, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6890) Local cluster mode in Mac is broken

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6890:
-
Priority: Critical  (was: Blocker)

> Local cluster mode in Mac is broken
> ---
>
> Key: SPARK-6890
> URL: https://issues.apache.org/jira/browse/SPARK-6890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Critical
>
> In master, local cluster mode is broken. If I run `bin/spark-submit --master 
> local-cluster[2,1,512]`, my executors keep failing with a class-not-found 
> exception. It appears that the assembly jar is not added to the executors' 
> class paths. I suspect that this is caused by 
> https://github.com/apache/spark/pull/5085.
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>   at java.lang.Class.getMethod0(Class.java:2774)
>   at java.lang.Class.getMethod(Class.java:1663)
>   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
> Caused by: java.lang.ClassNotFoundException: scala.Option
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6890) Local cluster mode is broken

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6890:
-
Summary: Local cluster mode is broken  (was: Local cluster mode in Mac is 
broken)

> Local cluster mode is broken
> 
>
> Key: SPARK-6890
> URL: https://issues.apache.org/jira/browse/SPARK-6890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Critical
>
> In master, local cluster mode is broken. If I run `bin/spark-submit --master 
> local-cluster[2,1,512]`, my executors keep failing with a class-not-found 
> exception. It appears that the assembly jar is not added to the executors' 
> class paths. I suspect that this is caused by 
> https://github.com/apache/spark/pull/5085.
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>   at java.lang.Class.getMethod0(Class.java:2774)
>   at java.lang.Class.getMethod(Class.java:1663)
>   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
> Caused by: java.lang.ClassNotFoundException: scala.Option
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6890) Local cluster mode in Mac is broken

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6890:
-
Description: 
In master, local cluster mode is broken. If I run `bin/spark-submit --master 
local-cluster[2,1,512]`, my executors keep failing with a class-not-found 
exception. It appears that the assembly jar is not added to the executors' 
class paths. I suspect that this is caused by 
https://github.com/apache/spark/pull/5085.

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.getMethod0(Class.java:2774)
at java.lang.Class.getMethod(Class.java:1663)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: scala.Option
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
{code}

  was:The worker can not be launched, 


> Local cluster mode in Mac is broken
> ---
>
> Key: SPARK-6890
> URL: https://issues.apache.org/jira/browse/SPARK-6890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Blocker
>
> In master, local cluster mode is broken. If I run `bin/spark-submit --master 
> local-cluster[2,1,512]`, my executors keep failing with a class-not-found 
> exception. It appears that the assembly jar is not added to the executors' 
> class paths. I suspect that this is caused by 
> https://github.com/apache/spark/pull/5085.
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>   at java.lang.Class.getMethod0(Class.java:2774)
>   at java.lang.Class.getMethod(Class.java:1663)
>   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
> Caused by: java.lang.ClassNotFoundException: scala.Option
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4848) Allow different Worker configurations in standalone cluster

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4848.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Assignee: Nathan Kronenfeld
Target Version/s: 1.4.0

> Allow different Worker configurations in standalone cluster
> ---
>
> Key: SPARK-4848
> URL: https://issues.apache.org/jira/browse/SPARK-4848
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
> Environment: stand-alone spark cluster
>Reporter: Nathan Kronenfeld
>Assignee: Nathan Kronenfeld
> Fix For: 1.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> On a stand-alone Spark cluster, much of the determination of worker 
> specifics, especially when one has multiple instances per node, is done only 
> on the master.
> The master loops over instances and starts a worker per instance on each 
> node.
> This means that if your workers have different values of 
> SPARK_WORKER_INSTANCES or SPARK_WORKER_WEBUI_PORT from each other (or from 
> the master), all values are ignored except the one on the master.
> SPARK_WORKER_PORT looks like it is unread in scripts, but read in code - I'm 
> not sure how it will behave, since all instances will read the same value 
> from the environment.
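
A minimal editorial sketch of the behavior the fix aims for, assuming each Worker reads these settings from its own node's environment at launch rather than inheriting values resolved on the master; only the env var names above come from the issue, the rest is illustrative.

{code}
// Illustrative only: per-node worker settings read from the local environment,
// so different nodes can legitimately use different values.
val workerInstances = sys.env.getOrElse("SPARK_WORKER_INSTANCES", "1").toInt
val workerWebUiPort = sys.env.getOrElse("SPARK_WORKER_WEBUI_PORT", "8081").toInt
val workerPort      = sys.env.get("SPARK_WORKER_PORT").map(_.toInt)  // optional
{code}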



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6890) Local cluster mode is broken

2015-04-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6890:
-
Assignee: Marcelo Vanzin  (was: Andrew Or)

> Local cluster mode is broken
> 
>
> Key: SPARK-6890
> URL: https://issues.apache.org/jira/browse/SPARK-6890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Davies Liu
>Assignee: Marcelo Vanzin
>Priority: Critical
>
> In master, local cluster mode is broken. If I run `bin/spark-submit --master 
> local-cluster[2,1,512]`, my executors keep failing with a class-not-found 
> exception. It appears that the assembly jar is not added to the executors' 
> class paths. I suspect that this is caused by 
> https://github.com/apache/spark/pull/5085.
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>   at java.lang.Class.getMethod0(Class.java:2774)
>   at java.lang.Class.getMethod(Class.java:1663)
>   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
> Caused by: java.lang.ClassNotFoundException: scala.Option
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6890) Local cluster mode is broken

2015-04-13 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493448#comment-14493448
 ] 

Andrew Or commented on SPARK-6890:
--

I'm not actively working on this. Feel free to fix it since you and Nishkam 
have more experience in that part of the code.

> Local cluster mode is broken
> 
>
> Key: SPARK-6890
> URL: https://issues.apache.org/jira/browse/SPARK-6890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Critical
>
> In master, local cluster mode is broken. If I run `bin/spark-submit --master 
> local-cluster[2,1,512]`, my executors keep failing with a class-not-found 
> exception. It appears that the assembly jar is not added to the executors' 
> class paths. I suspect that this is caused by 
> https://github.com/apache/spark/pull/5085.
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>   at java.lang.Class.getMethod0(Class.java:2774)
>   at java.lang.Class.getMethod(Class.java:1663)
>   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
> Caused by: java.lang.ClassNotFoundException: scala.Option
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


