[jira] [Closed] (SPARK-4506) Update documentation to clarify whether standalone-cluster mode is now officially supported

2014-12-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4506.

      Resolution: Fixed
   Fix Version/s: 1.1.1 (was: 1.1.2)
Target Version/s: 1.1.1, 1.2.0  (was: 1.2.0, 1.1.2)

> Update documentation to clarify whether standalone-cluster mode is now 
> officially supported
> ---
>
> Key: SPARK-4506
> URL: https://issues.apache.org/jira/browse/SPARK-4506
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
>Reporter: Josh Rosen
>Assignee: Andrew Or
> Fix For: 1.1.1, 1.2.0
>
>
> The "Launching Compiled Spark Applications" section of the Spark Standalone 
> docs claims that standalone mode only supports {{client}} deploy mode:
> {quote}
> The spark-submit script provides the most straightforward way to submit a 
> compiled Spark application to the cluster. For standalone clusters, Spark 
> currently only supports deploying the driver inside the client process that 
> is submitting the application (client deploy mode).
> {quote}
> It looks like {{standalone-cluster}} mode actually works (I've used it and 
> have heard from users that are successfully using it, too).
> The current line was added in SPARK-2259 when {{standalone-cluster}} mode 
> wasn't officially supported.  It looks like SPARK-2260 fixed a number of bugs 
> in {{standalone-cluster}} mode, so we should update the documentation if 
> we're now ready to officially support it.
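
For context, a standalone-cluster submission looks roughly like the snippet below. This is only an illustrative invocation: the master host, application class, and jar path are placeholders, not values taken from this issue, and in cluster deploy mode the jar path needs to be visible from the worker that launches the driver.

{code}
./bin/spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
{code}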






[jira] [Closed] (SPARK-4506) Update documentation to clarify whether standalone-cluster mode is now officially supported

2014-12-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4506.

   Resolution: Fixed
Fix Version/s: 1.1.2, 1.2.0

This was actually fixed by https://github.com/apache/spark/pull/2461




[jira] [Created] (SPARK-4771) Document standalone --supervise feature

2014-12-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-4771:


 Summary: Document standalone --supervise feature
 Key: SPARK-4771
 URL: https://issues.apache.org/jira/browse/SPARK-4771
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Andrew Or
Assignee: Andrew Or


We need this especially for streaming.
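
For reference, the flag in question is passed to spark-submit together with cluster deploy mode; the snippet below is only an illustrative invocation with a placeholder master URL, class, and jar path. With --supervise, the standalone Master restarts the driver if it exits with a non-zero exit status, which is why it matters for long-running streaming drivers.

{code}
./bin/spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyStreamingApp \
  /path/to/my-streaming-app.jar
{code}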






[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236301#comment-14236301
 ] 

Andrew Or commented on SPARK-4759:
--

Hey, just wanted to let you know that I am able to reproduce this locally. It is 
stuck at task 6/9, exactly as you pointed out. Investigating.

> Deadlock in complex spark job.
> --
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OS X 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.
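
The attached SparkBugReplicator.scala is not included in this archive. As a rough sketch of the merge/cache/checkpoint loop described above (with invented names, not the reporter's actual code):

{code}
// Rough sketch only: repeatedly merge new data into the previous result,
// then cache and checkpoint the merged RDD, as the description above outlines.
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // pair-RDD implicits in Spark 1.x
import org.apache.spark.rdd.RDD

def iterate(sc: SparkContext, iterations: Int): RDD[(Int, Int)] = {
  sc.setCheckpointDir("/tmp/spark-test")
  var result: RDD[(Int, Int)] = sc.parallelize(Seq.empty[(Int, Int)])
  for (i <- 1 to iterations) {
    val newData = sc.parallelize(1 to 100).map(k => (k, i))     // "new data"
    result = result.union(newData).reduceByKey(_ + _).cache()   // merge with the previous result
    result.checkpoint()                                         // checkpoint the new result
    result.count()                                              // force materialization
  }
  result
}
{code}

The description also runs the whole job twice, the second time against a fresh SparkContext in the same JVM; that part is omitted from the sketch.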






[jira] [Updated] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Affects Version/s: 1.2.0, 1.3.0




[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Summary: Deadlock in complex spark job in local mode with multiple cores  
(was: Deadlock in complex spark job in local mode)




[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Summary: Deadlock in complex spark job in local mode  (was: Deadlock in 
complex spark job.)




[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-05 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236517#comment-14236517
 ] 

Andrew Or commented on SPARK-4759:
--

Quick update: I was only able to reproduce this in local mode when multiple 
cores are used. It doesn't happen if I use only 1 core in local mode, in 
local-cluster mode, or in standalone mode. It probably has something to do with 
how we allocate cores to executors in local mode. Still investigating.




[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or commented on SPARK-4759:
--

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(sc.defaultParallelism).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(sc.defaultParallelism).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/7/14 11:53 PM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(sc.defaultParallelism).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(sc.defaultParallelism).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.




[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Summary: Deadlock in complex spark job in local mode  (was: Deadlock in 
complex spark job in local mode with multiple cores)




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/7/14 11:53 PM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8] (or simply local)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/7/14 11:57 PM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/4. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/4 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8] (or simply local)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:17 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/4. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/4 too, but finishes shortly afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:20 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:20 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:25 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).coalesce(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).coalesce(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/2. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/2 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:36 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/12. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/12 too, but finishes shortly 
afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:36 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/12. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/12 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).coalesce(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).coalesce(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/2. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/2 too, but finishes shortly afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 1:29 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 1:30 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 1:33 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.




[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237372#comment-14237372
 ] 

Andrew Or commented on SPARK-4759:
--

Found the issue. The task scheduler schedules tasks based on the preferred 
locations specified by the partition. In CoalescedRDD's partitions, we use the 
empty string as the default preferred location, even though this does not 
actually represent a real host: 
https://github.com/apache/spark/blob/e895e0cbecbbec1b412ff21321e57826d2d0a982/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala#L41

As a result, the task scheduler doesn't schedule a subset of the tasks on the 
local executor, because those tasks are supposed to be scheduled on the host "" 
(empty string), which doesn't actually exist. I have not dug into the details of 
PartitionCoalescer to understand why this only affects local mode.

I'll submit a fix shortly.
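
To make this concrete, here is a minimal, self-contained sketch (assumption: the 
object, class, and method names below are invented for illustration; this is not 
the Spark scheduler code or the actual fix). A scheduler that treats every 
locality hint as a real host will wait forever for an executor on host "", 
whereas dropping blank or unknown hints lets the task fall back to any executor:
{code}
object PreferredLocationSketch {
  final case class Task(id: Int, preferredHosts: Seq[String])

  // Keep only hints that name a host we actually know about: a bogus "" hint is
  // dropped, so the task can be offered to any executor instead of waiting forever.
  def usableHints(task: Task, knownHosts: Set[String]): Seq[String] =
    task.preferredHosts.filter(h => h.nonEmpty && knownHosts.contains(h))

  def main(args: Array[String]): Unit = {
    val known = Set("localhost")
    val coalescedTask = Task(id = 1, preferredHosts = Seq(""))  // mimics the empty-string default
    println(usableHints(coalescedTask, known))                  // List() => no locality constraint
  }
}
{code}
The sketch only illustrates the failure mode; the actual change concerns how 
CoalescedRDD reports preferred locations to the scheduler.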

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237374#comment-14237374
 ] 

Andrew Or commented on SPARK-4759:
--

[~dgshep] That's strange. I am able to reproduce this every time, and I only 
need to call "runMyJob" once. What master are you running?

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237374#comment-14237374
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 2:36 AM:
---

[~dgshep] That's strange. I am able to reproduce this every time, and I only 
need to call "runMyJob" once. What master are you running?

I just tried local, local[6], and local[*] and they all reproduced the 
deadlock. I am running the master branch with this commit: 
6eb1b6f6204ea3c8083af3fb9cd990d9f3dac89d


was (Author: andrewor14):
[~dgshep] That's strange. I am able to reproduce this every time, and I only 
need to call "runMyJob" once. What master are you running?

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237377#comment-14237377
 ] 

Andrew Or commented on SPARK-4759:
--

Hm, I'll try branch 1.1 again later tonight. There might very well be more than 
one issue causing this.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237550#comment-14237550
 ] 

Andrew Or commented on SPARK-4759:
--

Ok, yeah, you're right: I can't reproduce it from the code snippet in branch 1.1 
either. There seem to be at least two issues going on here... Can you confirm 
that the snippet does reproduce the deadlock on the master branch?

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:28 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

- EDIT -
This seems to reproduce it only on the master branch.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:29 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

=== EDIT ===
This seems to reproduce it only on the master branch, but not 1.1.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

=== EDIT ===
This seems to reproduce it only on the master branch.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:29 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

=== EDIT ===
This seems to reproduce it only on the master branch.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

- EDIT -
This seems to reproduce it only on the master branch.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237597#comment-14237597
 ] 

Andrew Or commented on SPARK-4759:
--

Hey, I have opened the following PR to fix the symptom I described earlier: 
https://github.com/apache/spark/pull/3633. I will spend some more time trying 
to understand why there is a discrepancy in reproducibility between master and 
1.1, but in both cases the patch should be sufficient to prevent this from 
happening again. Can you try it out?

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Attachment: SparkBugReplicatorSmaller.scala

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238289#comment-14238289
 ] 

Andrew Or commented on SPARK-4759:
--

I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Attachment: (was: SparkBugReplicatorSmaller.scala)

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238289#comment-14238289
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:16 PM:
---

I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.


was (Author: andrewor14):
I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Attachment: SparkBugReplicatorSmaller.scala

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238289#comment-14238289
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:17 PM:
---

I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Note that this will not work in master or branch-1.2 because there we don't 
allow spawning a SparkContext within a SparkContext, which is what the shell  
is doing here.

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.


was (Author: andrewor14):
I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-08 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238289#comment-14238289
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:17 PM:
---

I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Note that this will not work in master or branch-1.2 because there we don't 
allow spawning a SparkContext within a SparkContext, which is what the shell is 
doing here. For master and branch-1.2, use the reproduction I posted above.

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.


was (Author: andrewor14):
I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Note that this will not work in master or branch-1.2 because there we don't 
allow spawning a SparkContext within a SparkContext, which is what the shell  
is doing here.

Alternatively, you can run the same code as an application using the source file 
I have attached. The benefit of not reproducing this through the spark-shell is 
that we never have more than one SparkContext running at any given time. Although 
I don't believe this has anything to do with the root cause, it would be good to 
limit the scope of possible things that could go wrong.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.

[jira] [Updated] (SPARK-4687) SparkContext#addFile doesn't keep file folder information

2014-12-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4687:
-
Affects Version/s: 1.2.0

> SparkContext#addFile doesn't keep file folder information
> -
>
> Key: SPARK-4687
> URL: https://issues.apache.org/jira/browse/SPARK-4687
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Jimmy Xiang
>
> Files added with SparkContext#addFile are loaded with Utils#fetchFile before 
> a task starts. However, Utils#fetchFile puts all files under the Spark root 
> on the worker node. We should have an option to keep the folder information. 
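
As a hedged illustration of the behaviour described above (the paths and app 
name are hypothetical; SparkContext#addFile and SparkFiles.get are the real APIs 
involved), a file added from a subdirectory is fetched flat on the executor 
side, so only its base name survives and the folder layout is lost:
{code}
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object AddFileSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("addFile-sketch"))

    // Hypothetical file living inside a "conf/" subdirectory on the driver.
    sc.addFile("/tmp/myapp/conf/app.properties")

    sc.parallelize(1 to 2).foreach { _ =>
      // Resolved against the flat per-task download dir: the "conf/" component is gone,
      // which is exactly the folder information this issue asks to preserve.
      println(SparkFiles.get("app.properties"))
    }
    sc.stop()
  }
}
{code}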



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4338) Remove yarn-alpha support

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4338.

   Resolution: Fixed
Fix Version/s: 1.3.0

> Remove yarn-alpha support
> -
>
> Key: SPARK-4338
> URL: https://issues.apache.org/jira/browse/SPARK-4338
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4338) Remove yarn-alpha support

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4338:
-
Affects Version/s: 1.2.0

> Remove yarn-alpha support
> -
>
> Key: SPARK-4338
> URL: https://issues.apache.org/jira/browse/SPARK-4338
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-12-09 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239920#comment-14239920
 ] 

Andrew Or commented on SPARK-4447:
--

Hey [~sandyr], did you start work on this yet? If not, I'll be happy to take this 
up; otherwise I'll be happy to review it.

> Remove layers of abstraction in YARN code no longer needed after dropping 
> yarn-alpha
> 
>
> Key: SPARK-4447
> URL: https://issues.apache.org/jira/browse/SPARK-4447
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.3.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> For example, YarnRMClient and YarnRMClientImpl can be merged
> YarnAllocator and YarnAllocationHandler can be merged
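
A schematic sketch of the kind of collapse being proposed (assumption: the class 
names below are placeholders, not the actual YARN classes): once only one 
implementation remains, a trait/impl pair can become a single concrete class.
{code}
// Before: an abstraction layer that existed only to host alpha/stable variants.
trait RMClient { def register(host: String): Unit }
class RMClientImpl extends RMClient {
  def register(host: String): Unit = println(s"registering AM on $host")
}

// After: with only one implementation left, the trait and the Impl merge into one class.
class MergedRMClient { def register(host: String): Unit = println(s"registering AM on $host") }
{code}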



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4772) Accumulators leak memory, both temporarily and permanently

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4772:
-
Target Version/s: 1.3.0, 1.2.1

> Accumulators leak memory, both temporarily and permanently
> --
>
> Key: SPARK-4772
> URL: https://issues.apache.org/jira/browse/SPARK-4772
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Nathan Kronenfeld
>  Labels: accumulators
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Accumulators.localAccums is cleared at the beginning of a task, and not at 
> the end.
> This means that any locally accumulated values hang around until another task 
> is run on that thread.
> If, for some reason, the thread dies, said values hang around indefinitely.
> This is really only a big issue with very large accumulators.
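
A minimal sketch of the lifecycle problem described above (assumption: the names 
are hypothetical and this is not Spark's actual Accumulators code): per-thread 
values cleared only when the next task starts stay reachable after the task (or 
the thread) is gone, whereas clearing in a finally block at task completion 
releases them immediately:
{code}
import scala.collection.mutable

object LocalAccumSketch {
  // Per-thread scratch space, analogous in spirit to a thread-local accumulator registry.
  private val localValues = new ThreadLocal[mutable.Map[String, Long]] {
    override def initialValue(): mutable.Map[String, Long] = mutable.Map.empty
  }

  // Leaky variant: clear at the *start* of the next task, so the previous task's
  // values (and a dead thread's values) stay referenced indefinitely.
  def runTaskLeaky(body: mutable.Map[String, Long] => Unit): Unit = {
    localValues.get().clear()
    body(localValues.get())
  }

  // Safer variant: always release the references as soon as the task finishes.
  def runTask(body: mutable.Map[String, Long] => Unit): Unit =
    try body(localValues.get()) finally localValues.remove()
}
{code}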



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4772) Accumulators leak memory, both temporarily and permanently

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4772:
-
Fix Version/s: (was: 1.3.0)

> Accumulators leak memory, both temporarily and permanently
> --
>
> Key: SPARK-4772
> URL: https://issues.apache.org/jira/browse/SPARK-4772
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Nathan Kronenfeld
>  Labels: accumulators
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Accumulators.localAccums is cleared at the beginning of a task, and not at 
> the end.
> This means that any locally accumulated values hang around until another task 
> is run on that thread.
> If, for some reason, the thread dies, said values hang around indefinitely.
> This is really only a big issue with very large accumulators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4772) Accumulators leak memory, both temporarily and permanently

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4772:
-
Affects Version/s: 1.1.0

> Accumulators leak memory, both temporarily and permanently
> --
>
> Key: SPARK-4772
> URL: https://issues.apache.org/jira/browse/SPARK-4772
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Nathan Kronenfeld
>  Labels: accumulators
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Accumulators.localAccums is cleared at the beginning of a task, and not at 
> the end.
> This means that any locally accumulated values hang around until another task 
> is run on that thread.
> If, for some reason, the thread dies, said values hang around indefinitely.
> This is really only a big issue with very large accumulators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4772) Accumulators leak memory, both temporarily and permanently

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4772:
-
Affects Version/s: (was: 1.1.0)
   1.0.0

> Accumulators leak memory, both temporarily and permanently
> --
>
> Key: SPARK-4772
> URL: https://issues.apache.org/jira/browse/SPARK-4772
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Nathan Kronenfeld
>  Labels: accumulators
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Accumulators.localAccums is cleared at the beginning of a task, and not at 
> the end.
> This means that any locally accumulated values hang around until another task 
> is run on that thread.
> If, for some reason, the thread dies, said values hang around indefinitely.
> This is really only a big issue with very large accumulators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4691) Restructure a few lines in shuffle code

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4691:
-
Assignee: maji2014

> Restructure a few lines in shuffle code
> ---
>
> Key: SPARK-4691
> URL: https://issues.apache.org/jira/browse/SPARK-4691
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: maji2014
>Assignee: maji2014
>Priority: Minor
>  Labels: backport-needed
>
> aggregator and mapSideCombine judgement in 
> HashShuffleWriter.scala 
> SortShuffleWriter.scala
> HashShuffleReader.scala
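
For context, here is a hedged, REPL-style sketch of the kind of conditional 
involved (illustrative only, not the actual shuffle writer/reader code): the 
mapSideCombine flag and the presence of an aggregator are checked together, and 
the restructuring is about expressing that check once:
{code}
// Illustrative stand-in for a shuffle dependency and its aggregator/mapSideCombine check.
final case class ShuffleDepSketch[V](aggregator: Option[(V, V) => V], mapSideCombine: Boolean)

def maybeCombine[K, V](dep: ShuffleDepSketch[V], records: Iterator[(K, V)]): Iterator[(K, V)] =
  if (dep.mapSideCombine) {
    // Combining on the map side only makes sense when an aggregator is defined.
    require(dep.aggregator.isDefined, "mapSideCombine requested without an aggregator")
    val merge = dep.aggregator.get
    records.toSeq.groupBy(_._1).iterator.map { case (k, vs) => k -> vs.map(_._2).reduce(merge) }
  } else {
    records
  }
{code}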



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4691) Restructure a few lines in shuffle code

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4691:
-
Summary: Restructure a few lines in shuffle code  (was: code optimization 
for judgement)

> Restructure a few lines in shuffle code
> ---
>
> Key: SPARK-4691
> URL: https://issues.apache.org/jira/browse/SPARK-4691
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: maji2014
>Priority: Minor
>  Labels: backport-needed
>
> aggregator and mapSideCombine judgement in 
> HashShuffleWriter.scala 
> SortShuffleWriter.scala
> HashShuffleReader.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4691) Restructure a few lines in shuffle code

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4691:
-
Labels: backport-needed  (was: )

> Restructure a few lines in shuffle code
> ---
>
> Key: SPARK-4691
> URL: https://issues.apache.org/jira/browse/SPARK-4691
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: maji2014
>Assignee: maji2014
>Priority: Minor
>  Labels: backport-needed
>
> aggregator and mapSideCombine judgement in 
> HashShuffleWriter.scala 
> SortShuffleWriter.scala
> HashShuffleReader.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4691) Restructure a few lines in shuffle code

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4691:
-
Target Version/s: 1.3.0, 1.2.1
   Fix Version/s: 1.3.0

> Restructure a few lines in shuffle code
> ---
>
> Key: SPARK-4691
> URL: https://issues.apache.org/jira/browse/SPARK-4691
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: maji2014
>Assignee: maji2014
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> aggregator and mapSideCombine judgement in 
> HashShuffleWriter.scala 
> SortShuffleWriter.scala
> HashShuffleReader.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2450) Provide link to YARN executor logs on UI

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2450:
-
Priority: Major  (was: Minor)

> Provide link to YARN executor logs on UI
> 
>
> Key: SPARK-2450
> URL: https://issues.apache.org/jira/browse/SPARK-2450
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI, YARN
>Affects Versions: 1.0.0
>Reporter: Bill Havanki
>Assignee: Kostas Sakellis
>
> When running under YARN, provide links to executor logs from the web UI to 
> avoid the need to drill down through the YARN UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-09 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240238#comment-14240238
 ] 

Andrew Or commented on SPARK-4759:
--

Aha, that is a great idea. Thanks [~joshrosen]

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala, SparkBugReplicatorSmaller.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Attachment: (was: SparkBugReplicatorSmaller.scala)

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-09 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238289#comment-14238289
 ] 

Andrew Or edited comment on SPARK-4759 at 12/9/14 11:01 PM:


I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}
  sc.stop()

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().


was (Author: andrewor14):
I have a smaller reproduction for branch-1.1. It seems that we need to run the 
two jobs in two different SparkContexts in tandem to reproduce it here:

1. Run bin/spark-shell. The master doesn't matter here.
2. Copy and paste the following into the REPL
{code}
  import org.apache.spark.{SparkConf, SparkContext}

  def setup(): SparkContext = {
val conf = new SparkConf
conf.setMaster("local[8]")
conf.setAppName("test")
new SparkContext(conf)
  }

  def runMyJob(sc: SparkContext): Unit = {
val rdd = sc.parallelize(1 to 100).repartition(5).cache()
rdd.count()
val rdd2 = sc.parallelize(1 to 100).repartition(12)
rdd.union(rdd2).count()
  }

  def test(): Unit = {
var sc = setup()
runMyJob(sc)
sc.stop()
println("\n== FINISHED FIRST JOB ==\n")
sc = setup()
runMyJob(sc) // This will get stuck at task 5/17 and never finish
sc.stop()
println("\n== FINISHED SECOND JOB ==\n")
  }
{code}
3. Call test().

Note that this will not work in master or branch-1.2 because there we don't 
allow spawning a SparkContext within a SparkContext, which is what the shell is 
doing here. For master and branch-1.2, use the reproduction I posted above.

Alternatively you can run the same code as an application in the source code 
that I have attached. The benefit of not reproducing this through the 
spark-shell is that we never have more than one SparkContext 
running at any given time. Although I don't believe this has anything to do 
with the root cause, it would be good to limit the scope of possible things 
that could go wrong.

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4161) Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4161:
-
Target Version/s: 1.3.0, 1.1.2, 1.2.1  (was: 1.1.2, 1.2.1)

> Spark shell class path is not correctly set if "spark.driver.extraClassPath" 
> is set in defaults.conf
> 
>
> Key: SPARK-4161
> URL: https://issues.apache.org/jira/browse/SPARK-4161
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0, 1.2.0
> Environment: Mac, Ubuntu
>Reporter: Shay Seng
>Assignee: Guoqiang Li
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> (1) I want to launch a spark-shell + with jars that are only required by the 
> driver (ie. not shipped to slaves)
>  
> (2) I added "spark.driver.extraClassPath  /mypath/to.jar" to my 
> spark-defaults.conf
> I launched spark-shell with:  ./spark-shell
> Here I see on the WebUI that spark.driver.extraClassPath has been set, but I 
> am NOT able to access any methods in the jar.
> (3) I removed "spark.driver.extraClassPath" from my spark-default.conf
> I launched spark-shell with  ./spark-shell --driver.class.path /mypath/to.jar
> Again I see that the WebUI spark.driver.extraClassPath has been set. 
> But this time I am able to access the methods in the jar. 
> Looks like when the driver class path is loaded from spark-default.conf, the 
> REPL's classpath is not correctly appended.
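A quick way to see the difference described above from inside the spark-shell REPL is sketched below. The property lookup and {{Class.forName}} are standard calls; the class name is a hypothetical stand-in for something that lives only in the extra jar.
{code}
// Shows what the WebUI also shows: the property value is set either way.
sc.getConf.get("spark.driver.extraClassPath")

// Shows whether the REPL classpath actually picked the jar up: this throws
// ClassNotFoundException in case (2) above but succeeds in case (3).
Class.forName("com.example.SomeClassInMyJar")
{code}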



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4161) Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4161:
-
Fix Version/s: 1.3.0

> Spark shell class path is not correctly set if "spark.driver.extraClassPath" 
> is set in defaults.conf
> 
>
> Key: SPARK-4161
> URL: https://issues.apache.org/jira/browse/SPARK-4161
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0, 1.2.0
> Environment: Mac, Ubuntu
>Reporter: Shay Seng
>Assignee: Guoqiang Li
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> (1) I want to launch a spark-shell + with jars that are only required by the 
> driver (ie. not shipped to slaves)
>  
> (2) I added "spark.driver.extraClassPath  /mypath/to.jar" to my 
> spark-defaults.conf
> I launched spark-shell with:  ./spark-shell
> Here I see on the WebUI that spark.driver.extraClassPath has been set, but I 
> am NOT able to access any methods in the jar.
> (3) I removed "spark.driver.extraClassPath" from my spark-default.conf
> I launched spark-shell with  ./spark-shell --driver.class.path /mypath/to.jar
> Again I see that the WebUI spark.driver.extraClassPath has been set. 
> But this time I am able to access the methods in the jar. 
> Looks like when the driver class path is loaded from spark-default.conf, the 
> REPL's classpath is not correctly appended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4161) Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4161:
-
Labels: backport-needed  (was: )

> Spark shell class path is not correctly set if "spark.driver.extraClassPath" 
> is set in defaults.conf
> 
>
> Key: SPARK-4161
> URL: https://issues.apache.org/jira/browse/SPARK-4161
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0, 1.2.0
> Environment: Mac, Ubuntu
>Reporter: Shay Seng
>Assignee: Guoqiang Li
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> (1) I want to launch a spark-shell + with jars that are only required by the 
> driver (ie. not shipped to slaves)
>  
> (2) I added "spark.driver.extraClassPath  /mypath/to.jar" to my 
> spark-defaults.conf
> I launched spark-shell with:  ./spark-shell
> Here I see on the WebUI that spark.driver.extraClassPath has been set, but I 
> am NOT able to access any methods in the jar.
> (3) I removed "spark.driver.extraClassPath" from my spark-default.conf
> I launched spark-shell with  ./spark-shell --driver.class.path /mypath/to.jar
> Again I see that the WebUI spark.driver.extraClassPath has been set. 
> But this time I am able to access the methods in the jar. 
> Looks like when the driver class path is loaded from spark-default.conf, the 
> REPL's classpath is not correctly appended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4329) Add indexing feature for HistoryPage

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4329.

Resolution: Fixed
  Assignee: Kousuke Saruta

> Add indexing feature for HistoryPage
> 
>
> Key: SPARK-4329
> URL: https://issues.apache.org/jira/browse/SPARK-4329
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Fix For: 1.3.0
>
>
> The current HistoryPage has links only to the previous and next pages.
> I suggest adding an index so that history pages can be accessed more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4329) Add indexing feature for HistoryPage

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4329:
-
Fix Version/s: 1.3.0

> Add indexing feature for HistoryPage
> 
>
> Key: SPARK-4329
> URL: https://issues.apache.org/jira/browse/SPARK-4329
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
> Fix For: 1.3.0
>
>
> The current HistoryPage has links only to the previous and next pages.
> I suggest adding an index so that history pages can be accessed more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4771) Document standalone --supervise feature

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4771:
-
Target Version/s: 1.3.0, 1.1.2, 1.2.1  (was: 1.1.2, 1.2.1)

> Document standalone --supervise feature
> ---
>
> Key: SPARK-4771
> URL: https://issues.apache.org/jira/browse/SPARK-4771
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0
>
>
> We need this especially for streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4771) Document standalone --supervise feature

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4771:
-
Fix Version/s: 1.3.0

> Document standalone --supervise feature
> ---
>
> Key: SPARK-4771
> URL: https://issues.apache.org/jira/browse/SPARK-4771
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0
>
>
> We need this especially for streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4771) Document standalone --supervise feature

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4771.

   Resolution: Fixed
Fix Version/s: 1.2.1
   1.1.2

> Document standalone --supervise feature
> ---
>
> Key: SPARK-4771
> URL: https://issues.apache.org/jira/browse/SPARK-4771
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0, 1.1.2, 1.2.1
>
>
> We need this especially for streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4215) Allow requesting executors only on Yarn (for now)

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4215:
-
Fix Version/s: 1.3.0

> Allow requesting executors only on Yarn (for now)
> -
>
> Key: SPARK-4215
> URL: https://issues.apache.org/jira/browse/SPARK-4215
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Currently if the user attempts to call `sc.requestExecutors` or enables 
> dynamic allocation on, say, standalone mode, it just fails silently. We must 
> at the very least log a warning to say it's only available for Yarn 
> currently, or maybe even throw an exception.
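A minimal sketch of the guard being asked for, with a plain function standing in for the real scheduler-backend plumbing; the parameter names and the {{println}} are illustrative assumptions, not Spark's API.
{code}
// Returns whether the request was forwarded; warns and refuses otherwise.
def requestExecutors(numAdditional: Int, backendSupportsRequests: Boolean): Boolean = {
  if (!backendSupportsRequests) {
    // In Spark this would be a logWarning (or an exception, per the discussion above).
    println(s"Requesting $numAdditional executor(s) is only supported on YARN for now; ignoring.")
    false
  } else {
    // Forward the request to the cluster manager here.
    true
  }
}

requestExecutors(2, backendSupportsRequests = false)  // warns and returns false
{code}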



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4215) Allow requesting executors only on Yarn (for now)

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4215:
-
Labels: backport-needed  (was: )

> Allow requesting executors only on Yarn (for now)
> -
>
> Key: SPARK-4215
> URL: https://issues.apache.org/jira/browse/SPARK-4215
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Currently if the user attempts to call `sc.requestExecutors` or enables 
> dynamic allocation on, say, standalone mode, it just fails silently. We must 
> at the very least log a warning to say it's only available for Yarn 
> currently, or maybe even throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4793) way to find assembly jar is too strict

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4793:
-
Assignee: Adrian Wang

> way to find assembly jar is too strict
> --
>
> Key: SPARK-4793
> URL: https://issues.apache.org/jira/browse/SPARK-4793
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4793) way to find assembly jar is too strict

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4793:
-
Affects Version/s: 1.1.0

> way to find assembly jar is too strict
> --
>
> Key: SPARK-4793
> URL: https://issues.apache.org/jira/browse/SPARK-4793
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>Priority: Minor
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4793) way to find assembly jar is too strict

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4793:
-
Target Version/s: 1.3.0, 1.1.2, 1.2.1
   Fix Version/s: 1.3.0

> way to find assembly jar is too strict
> --
>
> Key: SPARK-4793
> URL: https://issues.apache.org/jira/browse/SPARK-4793
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>Priority: Minor
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4569) Rename "externalSorting" in Aggregator

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4569:
-
Target Version/s: 1.3.0, 1.1.2, 1.2.1
   Fix Version/s: 1.3.0

> Rename "externalSorting" in Aggregator
> --
>
> Key: SPARK-4569
> URL: https://issues.apache.org/jira/browse/SPARK-4569
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Priority: Trivial
> Fix For: 1.3.0
>
>
> While technically all spilling in Spark does result in sorting, calling this 
> variable externalSorting makes it seem like ExternalSorter will be used, when 
> in fact it only indicates whether spilling is enabled.
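In concrete terms, the rename amounts to something like the sketch below. The new name and the config key shown are a reasonable guess at the intent, not necessarily the merged change.
{code}
val conf = Map("spark.shuffle.spill" -> "true")

// Before: the name suggests ExternalSorter is involved.
val externalSorting = conf.getOrElse("spark.shuffle.spill", "true").toBoolean

// After: the name says what the flag actually controls.
val isSpillEnabled = conf.getOrElse("spark.shuffle.spill", "true").toBoolean
{code}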



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4569) Rename "externalSorting" in Aggregator

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4569:
-
Labels: backport-needed  (was: )

> Rename "externalSorting" in Aggregator
> --
>
> Key: SPARK-4569
> URL: https://issues.apache.org/jira/browse/SPARK-4569
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Priority: Trivial
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> While technically all spilling in Spark does result in sorting, calling this 
> variable externalSorting makes it seem like ExternalSorter will be used, when 
> in fact it only indicates whether spilling is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Target Version/s: 1.3.0, 1.1.2, 1.2.1

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.3.0, 1.1.2
>
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Labels: backport-needed  (was: )

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.3.0, 1.1.2
>
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Fix Version/s: 1.1.2
   1.3.0

> Deadlock in complex spark job in local mode
> ---
>
> Key: SPARK-4759
> URL: https://issues.apache.org/jira/browse/SPARK-4759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.0, 1.3.0
> Environment: Java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> Mac OSX 10.10.1
> Using local spark context
>Reporter: Davis Shepherd
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.3.0, 1.1.2
>
> Attachments: SparkBugReplicator.scala
>
>
> The attached test class runs two identical jobs that perform some iterative 
> computation on an RDD[(Int, Int)]. This computation involves 
>   # taking new data and merging it with the previous result
>   # caching and checkpointing the new result
>   # rinse and repeat
> The first time the job is run, it runs successfully, and the spark context is 
> shut down. The second time the job is run with a new spark context in the 
> same process, the job hangs indefinitely, only having scheduled a subset of 
> the necessary tasks for the final stage.
> I've been able to produce a test case that reproduces the issue, and I've 
> added some comments where some knockout experimentation has left some 
> breadcrumbs as to where the issue might be.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large

2014-12-11 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243153#comment-14243153
 ] 

Andrew Or commented on SPARK-2016:
--

This was filed before SPARK-2316 (https://github.com/apache/spark/pull/1679) 
was fixed. At least on the backend side, this should be much quicker than 
before. I don't know if we need to do some CSS magic to make the frontend side 
blazing fast too. Is this still reproducible?

> rdd in-memory storage UI becomes unresponsive when the number of RDD 
> partitions is large
> 
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Reynold Xin
>  Labels: starter
>
> Try running
> {code}
> sc.parallelize(1 to 100, 100).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors
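For alternative 0 above, the change on the rendering side is essentially a truncation. A toy Scala sketch follows; the cap and the block names are made up for illustration.
{code}
val blockRows = (0 until 1000000).map(i => s"rdd_0_$i")   // one row per cached partition
val maxRows   = 1000                                      // hypothetical cap
val shown     = blockRows.take(maxRows)                   // render only these rows
val omitted   = blockRows.size - shown.size               // report "... and N more" instead of rendering them
{code}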



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6

2014-12-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2954:
-
Fix Version/s: 1.0.3

> PySpark MLlib serialization tests fail on Python 2.6
> 
>
> Key: SPARK-2954
> URL: https://issues.apache.org/jira/browse/SPARK-2954
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.1.0, 1.0.3
>
>
> The PySpark MLlib tests currently fail on Python 2.6 due to problems 
> unpacking data from bytearray using struct.unpack:
> {code}
> **
> File "pyspark/mllib/_common.py", line 181, in __main__._deserialize_double
> Failed example:
> _deserialize_double(_serialize_double(1L)) == 1.0
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py",
>  line 1253, in __run
> compileflags, 1) in test.globs
>   File "", line 1, in 
> _deserialize_double(_serialize_double(1L)) == 1.0
>   File "pyspark/mllib/_common.py", line 194, in _deserialize_double
> return struct.unpack("d", ba[offset:])[0]
> error: unpack requires a string argument of length 8
> **
> File "pyspark/mllib/_common.py", line 184, in __main__._deserialize_double
> Failed example:
> _deserialize_double(_serialize_double(sys.float_info.max)) == x
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py",
>  line 1253, in __run
> compileflags, 1) in test.globs
>   File "", line 1, in 
> _deserialize_double(_serialize_double(sys.float_info.max)) == x
>   File "pyspark/mllib/_common.py", line 194, in _deserialize_double
> return struct.unpack("d", ba[offset:])[0]
> error: unpack requires a string argument of length 8
> **
> File "pyspark/mllib/_common.py", line 187, in __main__._deserialize_double
> Failed example:
> _deserialize_double(_serialize_double(sys.float_info.max)) == y
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py",
>  line 1253, in __run
> compileflags, 1) in test.globs
>   File "", line 1, in 
> _deserialize_double(_serialize_double(sys.float_info.max)) == y
>   File "pyspark/mllib/_common.py", line 194, in _deserialize_double
> return struct.unpack("d", ba[offset:])[0]
> error: unpack requires a string argument of length 8
> **
> {code}
> It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: 
> http://stackoverflow.com/a/15467046/590203



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2948) PySpark doesn't work on Python 2.6

2014-12-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2948:
-
Fix Version/s: 1.0.3

> PySpark doesn't work on Python 2.6
> --
>
> Key: SPARK-2948
> URL: https://issues.apache.org/jira/browse/SPARK-2948
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: CentOS 6.5 / Python 2.6.6
>Reporter: Kousuke Saruta
>Assignee: Josh Rosen
>Priority: Blocker
> Fix For: 1.1.0, 1.0.3
>
>
> In serializers.py, collections.namedtuple is redefined as follows.
> {code}
> def namedtuple(name, fields, verbose=False, rename=False):
>     cls = _old_namedtuple(name, fields, verbose, rename)
>     return _hack_namedtuple(cls)
> {code}
> The number of arguments is 4, but namedtuple for Python 2.6 takes only 3 
> arguments, so there is a mismatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2910) Test with Python 2.6 on Jenkins

2014-12-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2910:
-
Fix Version/s: 1.0.3

> Test with Python 2.6 on Jenkins
> ---
>
> Key: SPARK-2910
> URL: https://issues.apache.org/jira/browse/SPARK-2910
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.1.0, 1.0.3
>
>
> As long as we continue to support Python 2.6 in PySpark, Jenkins should test  
> with Python 2.6.
> We could downgrade the system Python to 2.6, but it might be easier / cleaner 
> to install 2.6 alongside the current Python and {{export 
> PYSPARK_PYTHON=python2.6}} in the test runner script.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2101) Python unit tests fail on Python 2.6 because of lack of unittest.skipIf()

2014-12-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2101:
-
Fix Version/s: 1.0.3

> Python unit tests fail on Python 2.6 because of lack of unittest.skipIf()
> -
>
> Key: SPARK-2101
> URL: https://issues.apache.org/jira/browse/SPARK-2101
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.0.0
>Reporter: Uri Laserson
>Assignee: Josh Rosen
> Fix For: 1.1.0, 1.0.3
>
>
> PySpark tests fail with Python 2.6 because they currently depend on 
> {{unittest.skipIf}}, which was only introduced in Python 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4839) Adding documentations about dynamic resource allocation

2014-12-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4839.

Resolution: Duplicate

Closing this as a duplicate

> Adding documentations about dynamic resource allocation
> ---
>
> Key: SPARK-4839
> URL: https://issues.apache.org/jira/browse/SPARK-4839
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Tsuyoshi OZAWA
> Fix For: 1.2.0
>
>
> There are no docs about dynamicAllocation. We should add them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4006) Spark Driver crashes whenever an Executor is registered twice

2014-12-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4006:
-
Fix Version/s: 1.0.3

> Spark Driver crashes whenever an Executor is registered twice
> -
>
> Key: SPARK-4006
> URL: https://issues.apache.org/jira/browse/SPARK-4006
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 0.9.2, 1.0.2, 1.1.0, 1.2.0
> Environment: Mesos, Coarse Grained
>Reporter: Tal Sliwowicz
>Assignee: Tal Sliwowicz
>Priority: Critical
> Fix For: 1.1.1, 1.2.0, 1.0.3
>
>
> This is a huge robustness issue for us (Taboola) in mission-critical, 
> time-sensitive (real-time) Spark jobs.
> We have long-running Spark drivers, and even though we have state-of-the-art 
> hardware, executors disconnect from time to time. In many cases, the 
> RemoveExecutor message is not received, and when the new executor registers, 
> the driver crashes. In Mesos coarse-grained mode, executor IDs are fixed. 
> The issue is with the System.exit(1) in BlockManagerMasterActor
> {code}
> private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: 
> ActorRef) {
> if (!blockManagerInfo.contains(id)) {
>   blockManagerIdByExecutor.get(id.executorId) match {
> case Some(manager) =>
>   // A block manager of the same executor already exists.
>   // This should never happen. Let's just quit.
>   logError("Got two different block manager registrations on " + 
> id.executorId)
>   System.exit(1)
> case None =>
>   blockManagerIdByExecutor(id.executorId) = id
>   }
>   logInfo("Registering block manager %s with %s RAM".format(
> id.hostPort, Utils.bytesToString(maxMemSize)))
>   blockManagerInfo(id) =
> new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, 
> slaveActor)
> }
> listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
>   }
> {code}
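A self-contained Scala sketch of the more forgiving behavior the reporter is after: replace the stale entry for the same executor and carry on instead of exiting. The types and the {{register}} method below are simplified stand-ins, and this is an assumption about the shape of a fix rather than the actual patch.
{code}
import scala.collection.mutable

// Simplified stand-in for the block manager bookkeeping (illustration only).
case class BlockManagerId(executorId: String, hostPort: String)

val blockManagerIdByExecutor = mutable.HashMap[String, BlockManagerId]()

def register(id: BlockManagerId): Unit = {
  // If the same executor re-registers, drop the stale entry and continue
  // instead of calling System.exit(1).
  blockManagerIdByExecutor.get(id.executorId).foreach { stale =>
    println(s"Replacing stale block manager $stale for executor ${id.executorId}")
    blockManagerIdByExecutor -= id.executorId
  }
  blockManagerIdByExecutor(id.executorId) = id
}
{code}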



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4006) Spark Driver crashes whenever an Executor is registered twice

2014-12-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4006:
-
Target Version/s: 1.1.1, 1.2.0, 1.0.3  (was: 1.1.1, 1.2.0)

> Spark Driver crashes whenever an Executor is registered twice
> -
>
> Key: SPARK-4006
> URL: https://issues.apache.org/jira/browse/SPARK-4006
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 0.9.2, 1.0.2, 1.1.0, 1.2.0
> Environment: Mesos, Coarse Grained
>Reporter: Tal Sliwowicz
>Assignee: Tal Sliwowicz
>Priority: Critical
> Fix For: 1.1.1, 1.2.0, 1.0.3
>
>
> This is a huge robustness issue for us (Taboola) in mission-critical, 
> time-sensitive (real-time) Spark jobs.
> We have long-running Spark drivers, and even though we have state-of-the-art 
> hardware, executors disconnect from time to time. In many cases, the 
> RemoveExecutor message is not received, and when the new executor registers, 
> the driver crashes. In Mesos coarse-grained mode, executor IDs are fixed. 
> The issue is with the System.exit(1) in BlockManagerMasterActor
> {code}
> private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: 
> ActorRef) {
> if (!blockManagerInfo.contains(id)) {
>   blockManagerIdByExecutor.get(id.executorId) match {
> case Some(manager) =>
>   // A block manager of the same executor already exists.
>   // This should never happen. Let's just quit.
>   logError("Got two different block manager registrations on " + 
> id.executorId)
>   System.exit(1)
> case None =>
>   blockManagerIdByExecutor(id.executorId) = id
>   }
>   logInfo("Registering block manager %s with %s RAM".format(
> id.hostPort, Utils.bytesToString(maxMemSize)))
>   blockManagerInfo(id) =
> new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, 
> slaveActor)
> }
> listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3000) drop old blocks to disk in parallel when memory is not large enough for caching new blocks

2014-12-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3000:
-
 Target Version/s: 1.3.0
Affects Version/s: 1.1.0

> drop old blocks to disk in parallel when memory is not large enough for 
> caching new blocks
> --
>
> Key: SPARK-3000
> URL: https://issues.apache.org/jira/browse/SPARK-3000
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Zhang, Liye
>Assignee: Zhang, Liye
> Attachments: Spark-3000 Design Doc.pdf
>
>
> In Spark, RDDs can be cached in memory for later use. The memory available for 
> caching is "*spark.executor.memory * spark.storage.memoryFraction*" for Spark 
> versions before 1.1.0, and "*spark.executor.memory * 
> spark.storage.memoryFraction * spark.storage.safetyFraction*" after 
> [SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 
> For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache 
> new blocks, old blocks may be dropped to disk to free up memory for new 
> blocks. This operation is handled by _ensureFreeSpace_ in _MemoryStore.scala_, 
> and the caller always holds an "*accountingLock*" to ensure that only one 
> thread is dropping blocks at a time. This cannot make full use of the disks' 
> throughput when the worker node has multiple disks. When testing our workload, 
> we found this to be a real bottleneck when the volume of old blocks to be 
> dropped is large. 
> We have tested the parallel method on Spark 1.0 and the speedup is significant, 
> so it is worth making the block-dropping operation parallel.
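A toy Scala sketch of the idea only (an assumption about the approach, not the attached design doc): decide which blocks to drop while holding the lock, but write them out in parallel outside it so several disks can be kept busy at once.
{code}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

val pool = Executors.newFixedThreadPool(4)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

val accountingLock = new Object
// Decide what to drop under the lock (cheap), ...
def selectBlocksToDrop(): Seq[String] =
  accountingLock.synchronized { Seq("rdd_1_0", "rdd_1_1", "rdd_1_2") }  // made-up block names
// ... but do the expensive disk writes outside it, in parallel.
def dropToDisk(block: String): Unit = println(s"spilling $block to disk")

val spills = Future.traverse(selectBlocksToDrop())(b => Future(dropToDisk(b)))
Await.result(spills, 1.minute)
pool.shutdown()
{code}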



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4754) ExecutorAllocationManager should not take in SparkContext

2014-12-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4754:
-
Fix Version/s: 1.2.1
   1.3.0

> ExecutorAllocationManager should not take in SparkContext
> -
>
> Key: SPARK-4754
> URL: https://issues.apache.org/jira/browse/SPARK-4754
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0, 1.2.1
>
>
> We should refactor ExecutorAllocationManager to not take in a SparkContext. 
> Otherwise, new developers may try to add a lot of unnecessary pointers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4754) ExecutorAllocationManager should not take in SparkContext

2014-12-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4754.

Resolution: Fixed

> ExecutorAllocationManager should not take in SparkContext
> -
>
> Key: SPARK-4754
> URL: https://issues.apache.org/jira/browse/SPARK-4754
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0, 1.2.1
>
>
> We should refactor ExecutorAllocationManager to not take in a SparkContext. 
> Otherwise, new developers may try to add a lot of unnecessary pointers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4889) HistoryServer documentation refers to non-existent spark-history-server.sh script

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4889:
-
Assignee: Ryan Williams

> HistoryServer documentation refers to non-existent spark-history-server.sh 
> script
> -
>
> Key: SPARK-4889
> URL: https://issues.apache.org/jira/browse/SPARK-4889
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> The [examples for how to start a history 
> server|https://github.com/apache/spark/blob/253b72b56fe908bbab5d621eae8a5f359c639dfd/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L164]
>  refer to a {{spark-history-server.sh}} script that doesn't exist; afaict it 
> never did.
> I believe the examples mean to refer to {{./sbin/start-history-server.sh}}, 
> and should be updated to reflect that the log directory should be specified 
> via {{-Dspark.history.fs.logDirectory}} in {{$SPARK_HISTORY_OPTS}}, not via a 
> command-line argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4889) HistoryServer documentation refers to non-existent spark-history-server.sh script

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4889:
-
 Target Version/s: 1.3.0, 1.2.1
Affects Version/s: 1.2.0

> HistoryServer documentation refers to non-existent spark-history-server.sh 
> script
> -
>
> Key: SPARK-4889
> URL: https://issues.apache.org/jira/browse/SPARK-4889
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> The [examples for how to start a history 
> server|https://github.com/apache/spark/blob/253b72b56fe908bbab5d621eae8a5f359c639dfd/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L164]
>  refer to a {{spark-history-server.sh}} script that doesn't exist; afaict it 
> never did.
> I believe the examples mean to refer to {{./sbin/start-history-server.sh}}, 
> and should be updated to reflect that the log directory should be specified 
> via {{-Dspark.history.fs.logDirectory}} in {{$SPARK_HISTORY_OPTS}}, not via a 
> command-line argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4889) HistoryServer documentation refers to non-existent spark-history-server.sh script

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4889.

Resolution: Fixed

> HistoryServer documentation refers to non-existent spark-history-server.sh 
> script
> -
>
> Key: SPARK-4889
> URL: https://issues.apache.org/jira/browse/SPARK-4889
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> The [examples for how to start a history 
> server|https://github.com/apache/spark/blob/253b72b56fe908bbab5d621eae8a5f359c639dfd/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L164]
>  refer to a {{spark-history-server.sh}} script that doesn't exist; afaict it 
> never did.
> I believe the examples mean to refer to {{./sbin/start-history-server.sh}}, 
> and should be updated to reflect that the log directory should be specified 
> via {{-Dspark.history.fs.logDirectory}} in {{$SPARK_HISTORY_OPTS}}, not via a 
> command-line argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4889) HistoryServer documentation refers to non-existent spark-history-server.sh script

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4889:
-
Fix Version/s: 1.2.1
   1.3.0

> HistoryServer documentation refers to non-existent spark-history-server.sh 
> script
> -
>
> Key: SPARK-4889
> URL: https://issues.apache.org/jira/browse/SPARK-4889
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> The [examples for how to start a history 
> server|https://github.com/apache/spark/blob/253b72b56fe908bbab5d621eae8a5f359c639dfd/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L164]
>  refer to a {{spark-history-server.sh}} script that doesn't exist; afaict it 
> never did.
> I believe the examples mean to refer to {{./sbin/start-history-server.sh}}, 
> and should be updated to reflect that the log directory should be specified 
> via {{-Dspark.history.fs.logDirectory}} in {{$SPARK_HISTORY_OPTS}}, not via a 
> command-line argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-2261) Spark application event logs are not very NameNode-friendly

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-2261.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Target Version/s: 1.3.0

> Spark application event logs are not very NameNode-friendly
> ---
>
> Key: SPARK-2261
> URL: https://issues.apache.org/jira/browse/SPARK-2261
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.3.0
>
>
> Currently, EventLoggingListener will generate application logs using, in the 
> worst case, five different entries in the file system:
> * The directory to hold the files
> * One file for the Spark version
> * One file for the event logs
> * One file to identify the compression codec of the event logs
> * One file to say the application is finished.
> It would be better to be more friendly to the NameNodes and use a single 
> entry for all of those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3060) spark-shell.cmd doesn't accept application options in Windows OS

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3060.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Assignee: Masayoshi TSUZUKI
Target Version/s: 1.3.0

> spark-shell.cmd doesn't accept application options in Windows OS
> 
>
> Key: SPARK-3060
> URL: https://issues.apache.org/jira/browse/SPARK-3060
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 1.0.2
> Environment: Windows
>Reporter: Masayoshi TSUZUKI
>Assignee: Masayoshi TSUZUKI
> Fix For: 1.3.0
>
>
> spark-shell.cmd accepts submit options ([SPARK-3006]),
> but there is no way to pass application options through spark-shell.cmd.
> This problem only affects Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4140) Document the dynamic allocation feature

2014-12-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4140.

  Resolution: Fixed
   Fix Version/s: 1.2.1
  1.3.0
Target Version/s: 1.3.0, 1.2.1  (was: 1.2.0)

> Document the dynamic allocation feature
> ---
>
> Key: SPARK-4140
> URL: https://issues.apache.org/jira/browse/SPARK-4140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0, 1.2.1
>
>
> This blocks on SPARK-3795 and SPARK-3822.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4140) Document the dynamic allocation feature

2014-12-19 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254494#comment-14254494
 ] 

Andrew Or commented on SPARK-4140:
--

I'd like to keep SPARK-3174 closed instead of re-opening it whenever a 
dynamic allocation issue comes in.

> Document the dynamic allocation feature
> ---
>
> Key: SPARK-4140
> URL: https://issues.apache.org/jira/browse/SPARK-4140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.3.0, 1.2.1
>
>
> This blocks on SPARK-3795 and SPARK-3822.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4751) Support dynamic allocation for standalone mode

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4751:
-
Priority: Critical  (was: Blocker)

> Support dynamic allocation for standalone mode
> --
>
> Key: SPARK-4751
> URL: https://issues.apache.org/jira/browse/SPARK-4751
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
>
> This is equivalent to SPARK-3822 but for standalone mode.
> This is actually a very tricky issue because the scheduling mechanism in the 
> standalone Master uses different semantics from YARN's. In standalone mode we allocate 
> resources based on cores. By default, an application will grab all the cores 
> in the cluster unless "spark.cores.max" is specified. Unfortunately, this 
> means an application could get executors of different sizes (in terms of 
> cores) if:
> 1) App 1 kills an executor
> 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker
> 3) App 1 requests an executor
> In this case, the new executor that App 1 gets back will be smaller than the 
> rest and can execute fewer tasks in parallel. Further, standalone mode is 
> subject to the constraint that only one executor can be allocated on each 
> worker per application. As a result, it is rather meaningless to request new 
> executors if the existing ones are already spread out across all nodes.
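
For context, a minimal sketch of the standalone-mode setup described above, assuming a 
hypothetical master URL, app name, and core cap; "spark.cores.max" is the only thing 
keeping the application from grabbing every core in the cluster:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative standalone-mode configuration (master URL and app name are hypothetical).
// Without "spark.cores.max" the application takes all cores in the cluster; capping it
// is what makes the per-core accounting described above matter.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("cores-cap-demo")
  .set("spark.cores.max", "8")
val sc = new SparkContext(conf)
{code}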



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4922) Support dynamic allocation for coarse-grained Mesos

2014-12-22 Thread Andrew Or (JIRA)
Andrew Or created SPARK-4922:


 Summary: Support dynamic allocation for coarse-grained Mesos
 Key: SPARK-4922
 URL: https://issues.apache.org/jira/browse/SPARK-4922
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.2.0
Reporter: Andrew Or
Priority: Critical


This brings SPARK-3174, which provided dynamic allocation of cluster resources 
to Spark on YARN applications, to Mesos coarse-grained mode. 

Note that the translation is not as trivial as adding a code path that exposes 
the request and kill mechanisms, as we did for YARN in SPARK-3822. This is 
because Mesos coarse-grained mode schedules by the number of cores allowed for 
an application (as in standalone mode) rather than the number of executors (as 
in YARN mode). For more detail, please see SPARK-4751.

If you intend to work on this, please provide a detailed design doc!
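
As a rough illustration of the cores-based scheduling mentioned above (the Mesos master 
URL and values here are assumptions, not recommendations):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: coarse-grained Mesos mode is enabled via "spark.mesos.coarse" and,
// like standalone mode, is sized by a core budget ("spark.cores.max") rather than
// an executor count, which is why the YARN-style request/kill path does not carry
// over directly.
val conf = new SparkConf()
  .setMaster("mesos://zk://zk1:2181/mesos")  // hypothetical Mesos master
  .setAppName("mesos-coarse-demo")
  .set("spark.mesos.coarse", "true")
  .set("spark.cores.max", "16")
val sc = new SparkContext(conf)
{code}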



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-12-22 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256045#comment-14256045
 ] 

Andrew Or commented on SPARK-3174:
--

Hey [~nemccarthy], I filed one at SPARK-4922 for coarse-grained mode. For 
fine-grained mode, there is already SPARK-1882, which covers dynamically 
scaling memory instead of just CPU. I believe there has been no progress on 
either issue yet.

> Provide elastic scaling within a Spark application
> --
>
> Key: SPARK-3174
> URL: https://issues.apache.org/jira/browse/SPARK-3174
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 1.0.2
>Reporter: Sandy Ryza
>Assignee: Andrew Or
> Fix For: 1.2.0
>
> Attachments: SPARK-3174design.pdf, SparkElasticScalingDesignB.pdf, 
> dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that 
> applications have a fixed allocation that doesn't grow and shrink with their 
> resource needs.  We're blocked on YARN-1197 for dynamically changing the 
> resources within executors, but we can still allocate and discard whole 
> executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.
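
A minimal sketch of the configuration surface that implements the two heuristics above 
(grow on task backlog, shrink on idle executors); property names follow the dynamic 
allocation docs, while the values here are purely illustrative:

{code}
import org.apache.spark.SparkConf

// Grow the executor count when tasks back up, shrink it when executors sit idle.
// All values are illustrative; timeouts are expressed in seconds in the 1.2-era docs.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "50")
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "5")   // pending-task backlog trigger
  .set("spark.dynamicAllocation.executorIdleTimeout", "60")      // idle-executor removal trigger
  .set("spark.shuffle.service.enabled", "true")                  // keep shuffle files when executors go away
{code}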



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4915) Wrong classname of external shuffle service in the doc for dynamic allocation

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4915:
-
Affects Version/s: 1.2.0

> Wrong classname of external shuffle service in the doc for dynamic allocation
> -
>
> Key: SPARK-4915
> URL: https://issues.apache.org/jira/browse/SPARK-4915
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, YARN
>Affects Versions: 1.2.0
>Reporter: Tsuyoshi OZAWA
> Fix For: 1.2.0, 1.3.0
>
>
> docs/job-scheduling.md says as follows:
> {quote}
> To enable this service, set `spark.shuffle.service.enabled` to `true`. In 
> YARN, this external shuffle service is implemented in 
> `org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
> `NodeManager` in your cluster. 
> {quote}
> The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
> org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
> specify.
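
For reference, a hedged sketch of the application-side setting the doc passage describes; 
the corrected class belongs to the NodeManager-side auxiliary service, so only the boolean 
flag is set from the Spark application:

{code}
import org.apache.spark.SparkConf

// Application side: just turn the external shuffle service on. The corrected class,
// org.apache.spark.network.yarn.YarnShuffleService, is what the YARN NodeManager loads
// as an auxiliary service (configured on the YARN side, in yarn-site.xml).
val conf = new SparkConf().set("spark.shuffle.service.enabled", "true")
{code}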



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4915) Wrong classname of external shuffle service in the doc for dynamic allocation

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4915.

  Resolution: Fixed
   Fix Version/s: 1.3.0
  1.2.0
Assignee: Tsuyoshi OZAWA
Target Version/s: 1.2.0, 1.3.0

> Wrong classname of external shuffle service in the doc for dynamic allocation
> -
>
> Key: SPARK-4915
> URL: https://issues.apache.org/jira/browse/SPARK-4915
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, YARN
>Affects Versions: 1.2.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 1.2.0, 1.3.0
>
>
> docs/job-scheduling.md says as follows:
> {quote}
> To enable this service, set `spark.shuffle.service.enabled` to `true`. In 
> YARN, this external shuffle service is implemented in 
> `org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
> `NodeManager` in your cluster. 
> {quote}
> The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
> org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
> specify.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4881) Use SparkConf#getBoolean instead of get().toBoolean

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4881:
-
Priority: Trivial  (was: Minor)

> Use SparkConf#getBoolean instead of get().toBoolean
> ---
>
> Key: SPARK-4881
> URL: https://issues.apache.org/jira/browse/SPARK-4881
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Priority: Trivial
>
> It's really a minor issue.
> In ApplicationMaster, there is code like the following.
> {code}
>   val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", 
> "false").toBoolean
> {code}
> I think the code can be simplified as follows.
> {code}
>   val preserveFiles = 
> sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)
> {code}
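
A self-contained version of the suggested form, for anyone trying it outside 
ApplicationMaster (the config key is real; the surrounding setup is just for illustration):

{code}
import org.apache.spark.SparkConf

// SparkConf.getBoolean parses the value and applies the default in one call,
// replacing the manual get(...).toBoolean pattern quoted above.
val sparkConf = new SparkConf()
val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)
{code}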



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4870) Add version information to driver log

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4870:
-
Priority: Minor  (was: Major)

> Add version information to driver log
> -
>
> Key: SPARK-4870
> URL: https://issues.apache.org/jira/browse/SPARK-4870
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhang, Liye
>Priority: Minor
>
> The driver log doesn't include the Spark version; version info is important 
> when testing against different Spark versions.
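
A minimal sketch of what such a log line could look like (not the actual patch; the app 
name and master here are placeholders):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Emit the running Spark version early in the driver log so runs against different
// builds are easy to tell apart. sc.version exposes the compiled-in version string.
val sc = new SparkContext(new SparkConf().setAppName("version-demo").setMaster("local[*]"))
println(s"Running Spark version ${sc.version}")
{code}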



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4870) Add version information to driver log

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4870.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Assignee: Zhang, Liye
Target Version/s: 1.3.0

> Add version information to driver log
> -
>
> Key: SPARK-4870
> URL: https://issues.apache.org/jira/browse/SPARK-4870
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhang, Liye
>Assignee: Zhang, Liye
>Priority: Minor
> Fix For: 1.3.0
>
>
> The driver log doesn't include the Spark version; version info is important 
> when testing against different Spark versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4883) Add a name to the directoryCleaner thread

2014-12-22 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4883.

  Resolution: Fixed
   Fix Version/s: 1.2.1
  1.3.0
Assignee: Shixiong Zhu
Target Version/s: 1.3.0, 1.2.1

> Add a name to the directoryCleaner thread
> -
>
> Key: SPARK-4883
> URL: https://issues.apache.org/jira/browse/SPARK-4883
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.2.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


