[jira] [Created] (SPARK-4783) Remove all System.exit calls from sparkcontext

2014-12-07 Thread David Semeria (JIRA)
David Semeria created SPARK-4783:


 Summary: Remove all System.exit calls from sparkcontext
 Key: SPARK-4783
 URL: https://issues.apache.org/jira/browse/SPARK-4783
 Project: Spark
  Issue Type: Bug
Reporter: David Semeria


A common architectural choice for integrating Spark within a larger application 
is to employ a gateway to handle Spark jobs. The gateway is a server which 
contains one or more long-running SparkContexts.

A typical server is created with the following pseudocode:
```
var keepRunning = true
while (keepRunning) {
  try {
    server.run()
  } catch (e) {
    keepRunning = log_and_examine_error(e)
  }
}
```
The problem is that SparkContext frequently calls System.exit when it 
encounters a problem, which means the server can only be re-spawned at the 
process level, which is much messier than the simple loop above.

Therefore, I believe it makes sense to replace all System.exit calls in 
SparkContext with the throwing of a fatal error. 
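As a sketch of what the proposal would enable, here is a minimal, self-contained Scala version of the gateway loop. `SparkFatalException` and the failing server body are hypothetical stand-ins for illustration, not existing Spark APIs:

```scala
// Hypothetical stand-in for the fatal error this issue proposes SparkContext
// should throw instead of calling System.exit.
class SparkFatalException(msg: String) extends RuntimeException(msg)

// Runs `body` until it completes normally, retrying on fatal errors;
// returns the number of attempts so the caller can see the respawns.
def supervise(body: Int => Unit): Int = {
  var attempt = 0
  var keepRunning = true
  while (keepRunning) {
    attempt += 1
    try {
      body(attempt)
      keepRunning = false // server returned cleanly
    } catch {
      case e: SparkFatalException =>
        // With a thrown error instead of System.exit, the gateway can log
        // the failure and respawn in-process rather than dying.
        println(s"attempt $attempt failed: ${e.getMessage}")
    }
  }
  attempt
}

// Stand-in server that fails once, then runs cleanly.
val attempts = supervise { n =>
  if (n == 1) throw new SparkFatalException("simulated context failure")
}
println(s"server recovered after $attempts attempts")
```

With System.exit the first failure would kill the JVM before any catch block could run; with a thrown error the loop above recovers in the same process.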



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4783) Remove all System.exit calls from sparkcontext

2014-12-07 Thread David Semeria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Semeria updated SPARK-4783:
-
Description: 
A common architectural choice for integrating Spark within a larger application 
is to employ a gateway to handle Spark jobs. The gateway is a server which 
contains one or more long-running SparkContexts.

A typical server is created with the following pseudocode:

var keepRunning = true
while (keepRunning) {
  try {
    server.run()
  } catch (e) {
    keepRunning = log_and_examine_error(e)
  }
}

The problem is that SparkContext frequently calls System.exit when it 
encounters a problem, which means the server can only be re-spawned at the 
process level, which is much messier than the simple loop above.

Therefore, I believe it makes sense to replace all System.exit calls in 
SparkContext with the throwing of a fatal error. 

  was:
A common architectural choice for integrating Spark within a larger application 
is to employ a gateway to handle Spark jobs. The gateway is a server which 
contains one or more long-running sparkcontexts.

A typical server is created with the following pseudo code:
```
var continue = true
while (continue){
 try {
server.run()
  } catch (e) {
  continue = log_and_examine_error(e)
}
```
The problem is that sparkcontext frequently calls System.exit when it 
encounters a problem which means the server can only be re-spawned at the 
process level, which is much more messy than the simple code above.

Therefore, I believe it makes sense to replace all System.exit calls in 
sparkcontext with the throwing of a fatal error. 


 Remove all System.exit calls from sparkcontext
 --

 Key: SPARK-4783
 URL: https://issues.apache.org/jira/browse/SPARK-4783
 Project: Spark
  Issue Type: Bug
Reporter: David Semeria

 A common architectural choice for integrating Spark within a larger 
 application is to employ a gateway to handle Spark jobs. The gateway is a 
 server which contains one or more long-running SparkContexts.
 A typical server is created with the following pseudocode:
 var keepRunning = true
 while (keepRunning) {
   try {
     server.run()
   } catch (e) {
     keepRunning = log_and_examine_error(e)
   }
 }
 The problem is that SparkContext frequently calls System.exit when it 
 encounters a problem, which means the server can only be re-spawned at the 
 process level, which is much messier than the simple loop above.
 Therefore, I believe it makes sense to replace all System.exit calls in 
 SparkContext with the throwing of a fatal error. 






[jira] [Commented] (SPARK-2808) update kafka to version 0.8.2

2014-12-07 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237150#comment-14237150
 ] 

Helena Edelson commented on SPARK-2808:
---

I've done the migration and am testing the changes against Scala 2.10.4, since 
the parent Spark pom uses:
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
This will allow usage of the new producer, among other nice additions in 0.8.2, 
though I do not know when it will be GA.

 update kafka to version 0.8.2
 -

 Key: SPARK-2808
 URL: https://issues.apache.org/jira/browse/SPARK-2808
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core
Reporter: Anand Avati

 First kafka_2.11 0.8.1 has to be released






[jira] [Commented] (SPARK-4181) Create separate options to control the client-mode AM resource allocation request

2014-12-07 Thread WangTaoTheTonic (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237163#comment-14237163
 ] 

WangTaoTheTonic commented on SPARK-4181:


I couldn't think of a use case for extraLibraryPath either. But we should honor 
extraClasspath, as some functionality, for instance the log system, loads 
resources from a user-defined classpath.
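For concreteness, a sketch of the kind of configuration split under discussion. The `spark.yarn.am.*` property names below are assumptions for illustration; the actual names were still being settled in SPARK-1953 and this issue:

```
# spark-defaults.conf fragment (illustrative only; AM property names assumed)
# Client-mode AM resources, kept separate from the driver's:
spark.yarn.am.memory             1g
spark.yarn.am.extraJavaOptions   -Dlog4j.configuration=file:/etc/spark/am-log4j.properties
# Driver-side classpath that e.g. a log system may load resources from:
spark.driver.extraClassPath      /etc/spark/custom-conf
```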

 Create separate options to control the client-mode AM resource allocation 
 request
 -

 Key: SPARK-4181
 URL: https://issues.apache.org/jira/browse/SPARK-4181
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Reporter: WangTaoTheTonic
Priority: Minor

 I found related discussion in https://github.com/apache/spark/pull/2115, 
 SPARK-1953 and SPARK-1507, and recently I found some inconvenience in 
 configuring properties such as logging when we use yarn-client mode. So if no 
 one else does the work, I will try it: maybe starting in a few days and 
 completing in the next 1 or 2 weeks.
 As I am not very familiar with Spark on YARN, any discussion and feedback is 
 welcome!






[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-07 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237178#comment-14237178
 ] 

Zhang, Liye commented on SPARK-4740:


[~pwendell], I had the same concern at first, and I verified it: I replaced 
the assembly jar files on all nodes and also restarted the cluster, but the 
problem still exists. One more solid piece of evidence is that the better 
performance does not stick to one machine; one node may perform better for 
this test and another node for a different test. 

 Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
 

 Key: SPARK-4740
 URL: https://issues.apache.org/jira/browse/SPARK-4740
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 1.2.0
Reporter: Zhang, Liye
Assignee: Reynold Xin
Priority: Blocker
 Attachments: (rxin patch better executor)TestRunner  sort-by-key - 
 Thread dump for executor 3_files.zip, (rxin patch normal executor)TestRunner  
 sort-by-key - Thread dump for executor 0 _files.zip, Spark-perf Test Report 
 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner  
 sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, 
 TestRunner  sort-by-key - Thread dump for executor 1_files (Nio-48 cores per 
 node).zip


 When testing current Spark master (1.3.0-SNAPSHOT) with spark-perf 
 (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService 
 takes much longer than the NIO-based shuffle transferService. The network 
 throughput of Netty is only about half of that of NIO. 
 We tested with standalone mode; the data set used for the test is 20 
 billion records, about 400GB in total. The spark-perf test is 
 running on a 4-node cluster with 10G NICs, 48 CPU cores per node, and 
 64GB of executor memory per node. The number of reduce tasks is set to 1000. 






[jira] [Comment Edited] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-07 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237178#comment-14237178
 ] 

Zhang, Liye edited comment on SPARK-4740 at 12/7/14 3:32 PM:
-

[~pwendell], I had the same concern at first. I verified it several times: I 
replaced the assembly jar files on all nodes and also restarted the cluster, 
but the problem still exists. One more solid piece of evidence is that the 
better-performing node does not stick to one machine; one node may perform 
better for this test and another node may do so for the next test. 


was (Author: liyezhang556520):
[~pwendell], I have the same concern first, and I verified that, and replaced 
the assembly jar files on all nodes, and also restart the cluster, but the 
problem still exists. One more solid proof is that the better performance is 
not stick to one machine, maybe one node performs better for this test and 
another node for other test. 







[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-07 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237179#comment-14237179
 ] 

Zhang, Liye commented on SPARK-4740:


Hi [~adav], I kept speculation at its default (false).







[jira] [Comment Edited] (SPARK-2808) update kafka to version 0.8.2

2014-12-07 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237150#comment-14237150
 ] 

Helena Edelson edited comment on SPARK-2808 at 12/7/14 3:35 PM:


I've done the migration and am testing the changes against Scala 2.10.4, since 
the parent Spark pom uses:
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
I do not know when it will be GA.


was (Author: helena_e):
I've done the migration, am testing the changes against Scala 2.10.4 since the 
parent spark pom uses 
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
This will allow usage of the new producer, among other nice additions in 0.8.2. 
Though I do not know when it will be GA.







[jira] [Comment Edited] (SPARK-2808) update kafka to version 0.8.2

2014-12-07 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237150#comment-14237150
 ] 

Helena Edelson edited comment on SPARK-2808 at 12/7/14 3:52 PM:


I've done the migration and am testing the changes against Scala 2.10.4, since 
the parent Spark pom uses:
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
I do not know when it will be GA.

https://github.com/apache/spark/pull/3631


was (Author: helena_e):
I've done the migration, am testing the changes against Scala 2.10.4 since the 
parent spark pom uses 
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
I do not know when it will be GA.







[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-07 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237254#comment-14237254
 ] 

koert kuipers commented on SPARK-3655:
--

I also don't like the signature
{code}
def groupByKeyAndSortValues(valueOrdering: Ordering[V], partitioner: Partitioner): RDD[(K, Iterable[V])]
{code}
since I doubt it can be implemented efficiently.

I would much prefer
{code}
def groupByKeyAndSortValues(valueOrdering: Ordering[V], partitioner: Partitioner): RDD[(K, TraversableOnce[V])]
{code}
but that is inconsistent with groupByKey (which I guess has Iterable in its 
return type for historical reasons; it used to be Seq).

 Support sorting of values in addition to keys (i.e. secondary sort)
 ---

 Key: SPARK-3655
 URL: https://issues.apache.org/jira/browse/SPARK-3655
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: koert kuipers
Assignee: Koert Kuipers
Priority: Minor

 Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? 
 There are some use cases where getting a sorted iterator of values per key is 
 helpful.






[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Davis Shepherd (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237264#comment-14237264
 ] 

Davis Shepherd commented on SPARK-4759:
---

It is possible to reproduce the issue in single-core local mode.

Simply change the partitions parameter to > 1 (in the attached version it uses 
the defaultParallelism of the Spark context, which in single-core mode is 1) in 
either of the coalesce calls.

 Deadlock in complex spark job in local mode with multiple cores
 ---

 Key: SPARK-4759
 URL: https://issues.apache.org/jira/browse/SPARK-4759
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.1, 1.2.0, 1.3.0
 Environment: Java version 1.7.0_51
 Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
 Mac OSX 10.10.1
 Using local spark context
Reporter: Davis Shepherd
Assignee: Andrew Or
Priority: Critical
 Attachments: SparkBugReplicator.scala


 The attached test class runs two identical jobs that perform some iterative 
 computation on an RDD[(Int, Int)]. This computation involves: 
   # taking new data and merging it with the previous result
   # caching and checkpointing the new result
   # rinse and repeat
 The first time the job is run, it runs successfully, and the Spark context is 
 shut down. The second time the job is run, with a new Spark context in the 
 same process, the job hangs indefinitely, only having scheduled a subset of 
 the necessary tasks for the final stage.
 I've been able to produce a test case that reproduces the issue, and I've 
 added some comments where some knockout experimentation has left 
 breadcrumbs as to where the issue might be.






[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237298#comment-14237298
 ] 

Apache Spark commented on SPARK-3655:
-

User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/3632







[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-07 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237299#comment-14237299
 ] 

koert kuipers commented on SPARK-3655:
--

I have a new pull request that implements just groupByKeyAndSortValues in Scala 
and Java. I will need some help with Python.

The pull request is here:
https://github.com/apache/spark/pull/3632

I changed the methods to return RDD[(K, TraversableOnce[V])] instead of RDD[(K, 
Iterable[V])], since I don't see a reasonable way to implement it so that it 
returns Iterables without resorting to keeping the data in memory.
The assumption made is that once you move on to the next key within a partition, 
the previous value (the TraversableOnce[V]) will no longer be used.

I personally find this API too generic and too easy to abuse or make mistakes 
with, so I would prefer a more constrained API like foldLeft.
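To make the single-pass assumption concrete, here is a plain-Scala sketch (no Spark; `groupSorted` is an illustrative helper, not the proposed API) of turning a partition iterator that is already sorted by key into (key, TraversableOnce[V]) groups without buffering values in memory:

```scala
// Illustrative helper (not a Spark API): given an iterator already sorted by
// key, yield (key, TraversableOnce[V]) groups that stream values lazily.
// Each group is only valid until the outer iterator advances to the next
// key -- the exact constraint described above -- and nothing is buffered.
def groupSorted[K, V](it: Iterator[(K, V)]): Iterator[(K, TraversableOnce[V])] = {
  val in = it.buffered
  new Iterator[(K, TraversableOnce[V])] {
    def hasNext: Boolean = in.hasNext
    def next(): (K, TraversableOnce[V]) = {
      val key = in.head._1
      // Lazily hands out values while the key matches the current one.
      val group = new Iterator[V] {
        def hasNext: Boolean = in.hasNext && in.head._1 == key
        def next(): V = in.next()._2
      }
      (key, group)
    }
  }
}

// Each group must be consumed fully before moving on, per the assumption.
val sorted = Iterator(("a", 2), ("a", 1), ("b", 3)) // sorted by key
val grouped = groupSorted(sorted).map { case (k, vs) => (k, vs.toList) }.toList
println(grouped) // List((a,List(2, 1)), (b,List(3)))
```

This is also why returning Iterable is hard: an Iterable promises repeatable traversal, which would force the values for each key back into memory.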


 Support sorting of values in addition to keys (i.e. secondary sort)
 ---

 Key: SPARK-3655
 URL: https://issues.apache.org/jira/browse/SPARK-3655
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: koert kuipers
Assignee: Koert Kuipers
Priority: Minor

 Now that spark has a sort based shuffle, can we expect a secondary sort soon? 
 There are some use cases where getting a sorted iterator of values per key is 
 helpful.






[jira] [Updated] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-07 Thread koert kuipers (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

koert kuipers updated SPARK-3655:
-
Affects Version/s: 1.2.0

 Support sorting of values in addition to keys (i.e. secondary sort)
 ---

 Key: SPARK-3655
 URL: https://issues.apache.org/jira/browse/SPARK-3655
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: koert kuipers
Assignee: Koert Kuipers
Priority: Minor

 Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? 
 There are some use cases where getting a sorted iterator of values per key is 
 helpful.






[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or commented on SPARK-4759:
--

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(sc.defaultParallelism).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(sc.defaultParallelism).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/7/14 11:53 PM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(sc.defaultParallelism).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(sc.defaultParallelism).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.







[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4759:
-
Summary: Deadlock in complex spark job in local mode  (was: Deadlock in 
complex spark job in local mode with multiple cores)

 Deadlock in complex spark job in local mode
 ---

 Key: SPARK-4759
 URL: https://issues.apache.org/jira/browse/SPARK-4759
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.1, 1.2.0, 1.3.0
 Environment: Java version 1.7.0_51
 Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
 Mac OSX 10.10.1
 Using local spark context
Reporter: Davis Shepherd
Assignee: Andrew Or
Priority: Critical
 Attachments: SparkBugReplicator.scala


 The attached test class runs two identical jobs that perform some iterative 
 computation on an RDD[(Int, Int)]. This computation involves: 
   # taking new data and merging it with the previous result
   # caching and checkpointing the new result
   # rinse and repeat
 The first time the job is run, it runs successfully, and the Spark context is 
 shut down. The second time the job is run, with a new Spark context in the 
 same process, the job hangs indefinitely, only having scheduled a subset of 
 the necessary tasks for the final stage.
 I've been able to produce a test case that reproduces the issue, and I've 
 added some comments where some knockout experimentation has left 
 breadcrumbs as to where the issue might be.






[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/7/14 11:53 PM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8] (or simply local)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8]
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.

 Deadlock in complex spark job in local mode with multiple cores
 ---

 Key: SPARK-4759
 URL: https://issues.apache.org/jira/browse/SPARK-4759
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.1, 1.2.0, 1.3.0
 Environment: Java version 1.7.0_51
 Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
 Mac OSX 10.10.1
 Using local spark context
Reporter: Davis Shepherd
Assignee: Andrew Or
Priority: Critical
 Attachments: SparkBugReplicator.scala


 The attached test class runs two identical jobs that perform some iterative 
 computation on an RDD[(Int, Int)]. This computation involves:
   # taking new data and merging it with the previous result
   # caching and checkpointing the new result
   # rinse and repeat
 The first time the job is run, it completes successfully and the Spark context 
 is shut down. The second time the job is run, with a new Spark context in the 
 same process, the job hangs indefinitely, having scheduled only a subset of 
 the necessary tasks for the final stage.
 I've produced a test case that reproduces the issue, and I've added comments 
 where some knockout experimentation has left breadcrumbs as to where the 
 issue might be.






[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/7/14 11:57 PM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/4. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/4 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[8] (or simply local)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 7/8.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:17 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def makeMyRdd(sc: SparkContext): RDD[Int] = {
  sc.parallelize(1 to 100).repartition(4).cache()
}

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = makeMyRdd(sc)
  rdd.checkpoint()
  rdd.count()
  val rdd2 = makeMyRdd(sc)
  val newRdd = rdd.union(rdd2).coalesce(4).cache()
  newRdd.checkpoint()
  newRdd.count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/4. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/4 too, but finishes shortly afterwards.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:20 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:20 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def runMyJob(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:25 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).coalesce(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).coalesce(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 1/2. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/2 too, but finishes shortly afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 4/8. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 4/8 too, but finishes shortly afterwards.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 12:36 AM:


Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/12. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/12 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).coalesce(4).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).coalesce(4)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 1/2. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 1/2 too, but finishes shortly afterwards.







[jira] [Commented] (SPARK-4783) Remove all System.exit calls from sparkcontext

2014-12-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237341#comment-14237341
 ] 

Patrick Wendell commented on SPARK-4783:


For code cleanliness, we should go through and look everywhere we call 
System.exit() and see which ones can be converted safely to exceptions.  Most 
of our use of System.exit is on the executor side, but there may be a few on 
the driver/SparkContext side.

That said, if there is a fatal exception in the SparkContext, I don't think 
your app can safely just catch the exception, log it, and create a new 
SparkContext. Is that what you are trying to do? In that case there could be 
static state around that is not properly cleaned up and will cause the new 
context to be buggy.
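A minimal sketch of the conversion suggested above, assuming fatal driver-side conditions are wrapped in a dedicated exception type instead of calling System.exit. The exception class and helper names here are hypothetical illustrations, not actual Spark API:

```scala
// Hypothetical: a dedicated fatal-error type the driver could throw in
// place of System.exit, letting the embedding application catch and log it.
final class SparkFatalException(message: String) extends RuntimeException(message)

// Before: a helper like this would call sys.exit(code), killing the gateway JVM.
// After (sketched here): it throws, so only the failing context is lost.
def reportFatal(message: String): Nothing =
  throw new SparkFatalException(message)
```

The caveat in the comment still applies: catching such an exception only helps if no leaked static state poisons the next SparkContext created in the same process.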



 Remove all System.exit calls from sparkcontext
 --

 Key: SPARK-4783
 URL: https://issues.apache.org/jira/browse/SPARK-4783
 Project: Spark
  Issue Type: Bug
Reporter: David Semeria

 A common architectural choice for integrating Spark within a larger 
 application is to employ a gateway to handle Spark jobs. The gateway is a 
 server which contains one or more long-running SparkContexts.
 A typical server is created with the following pseudo code:
 var continue = true
 while (continue) {
   try {
     server.run()
   } catch (e) {
     continue = log_and_examine_error(e)
   }
 }
 The problem is that SparkContext frequently calls System.exit when it 
 encounters a problem, which means the server can only be re-spawned at the 
 process level, which is far messier than the simple loop above.
 Therefore, I believe it makes sense to replace every System.exit call in 
 SparkContext with the throwing of a fatal error.
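The pseudo code above can be sketched as a compilable supervisor loop. This assumes the server surfaces fatal conditions as exceptions rather than exiting; `run` and `logAndExamineError` are illustrative stand-ins for server.run() and log_and_examine_error, not real gateway APIs:

```scala
// Gateway supervisor loop: retry the server while the error handler says to.
// Returns the number of restarts for observability.
def supervise(run: () => Unit, logAndExamineError: Throwable => Boolean): Int = {
  var keepGoing = true // the pseudo code's `continue` flag, renamed for clarity
  var restarts = 0
  while (keepGoing) {
    try {
      run()
      keepGoing = false // run() returned normally: clean shutdown
    } catch {
      case e: Throwable =>
        restarts += 1
        keepGoing = logAndExamineError(e) // decide whether to re-enter the loop
    }
  }
  restarts
}
```

This loop only works if failures reach it as exceptions; a System.exit anywhere below `run()` bypasses it entirely, which is the point of the ticket.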






[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 1:29 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  sc.setCheckpointDir("/tmp/spark-test")
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.checkpoint()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.







[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 1:33 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob(sc)

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.







[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237374#comment-14237374
 ] 

Andrew Or commented on SPARK-4759:
--

[~dgshep] That's strange. I am able to reproduce this every time, and I only 
need to call runMyJob once. What master are you running?







[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Davis Shepherd (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237376#comment-14237376
 ] 

Davis Shepherd commented on SPARK-4759:
---

local[2] on the latest commit of branch 1.1




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237374#comment-14237374
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 2:36 AM:
---

[~dgshep] That's strange. I am able to reproduce this every time, and I only 
need to call runMyJob once. What master are you running?

I just tried local, local[6], and local[*] and they all reproduced the 
deadlock. I am running the master branch with this commit: 
6eb1b6f6204ea3c8083af3fb9cd990d9f3dac89d


was (Author: andrewor14):
[~dgshep] That's strange. I am able to reproduce this every time, and I only 
need to call runMyJob once. What master are you running?




[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237377#comment-14237377
 ] 

Andrew Or commented on SPARK-4759:
--

Hm I'll try branch 1.1 again later tonight. There might very well be more than 
one issue that causes this.




[jira] [Updated] (SPARK-4783) System.exit() calls in SparkContext disrupt applications embedding Spark

2014-12-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4783:
---
Summary: System.exit() calls in SparkContext disrupt applications embedding 
Spark  (was: Remove all System.exit calls from sparkcontext)

 System.exit() calls in SparkContext disrupt applications embedding Spark
 

 Key: SPARK-4783
 URL: https://issues.apache.org/jira/browse/SPARK-4783
 Project: Spark
  Issue Type: Bug
Reporter: David Semeria

 A common architectural choice for integrating Spark within a larger 
 application is to employ a gateway to handle Spark jobs. The gateway is a 
 server which contains one or more long-running sparkcontexts.
 A typical server is created with the following pseudo code:
 var continue = true
 while (continue) {
   try {
     server.run()
   } catch (e) {
     continue = log_and_examine_error(e)
   }
 }
 The problem is that SparkContext frequently calls System.exit when it 
 encounters a problem, which means the server can only be re-spawned at the 
 process level. This is much messier than the simple loop above.
 Therefore, I believe it makes sense to replace all System.exit calls in 
 SparkContext with the throwing of a fatal error.
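The proposed gateway supervision loop can be sketched in plain Scala. This is only an illustration of the pattern the issue argues for, under the assumption that Spark would throw some fatal error type instead of exiting; `SparkFatalError`, `supervise`, and the restart policy are all hypothetical names, not Spark API.

```scala
object Gateway {
  // Hypothetical marker for unrecoverable SparkContext failures, thrown
  // instead of calling System.exit (illustrative only, not Spark API).
  final class SparkFatalError(msg: String) extends Error(msg)

  // Run the server in a supervision loop: recoverable errors are logged and
  // the server is restarted in-process; fatal errors stop the loop.
  // Returns the number of in-process restarts that were attempted.
  def supervise(run: () => Unit, maxRestarts: Int = 3): Int = {
    var restarts = 0
    var keepGoing = true
    while (keepGoing) {
      try {
        run()
        keepGoing = false // clean shutdown
      } catch {
        case _: SparkFatalError =>
          keepGoing = false // cannot recover in-process
        case _: Exception if restarts < maxRestarts =>
          restarts += 1 // log_and_examine_error would go here
        case _: Exception =>
          keepGoing = false // restart budget exhausted
      }
    }
    restarts
  }
}
```

The point of the sketch: with a thrown error the caller chooses the recovery policy, whereas `System.exit` forces process-level re-spawning.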






[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Davis Shepherd (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237383#comment-14237383
 ] 

Davis Shepherd commented on SPARK-4759:
---

Ok, your version does reproduce the issue against the spark-core 1.1.1 artifact 
if I copy and paste your code into the original SparkBugReplicator, but it only 
seems to hang the second time the job is run :P. This smells of a race 
condition...




[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Davis Shepherd (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237393#comment-14237393
 ] 

Davis Shepherd commented on SPARK-4759:
---

I still cannot reproduce the issue in the spark shell on tag v1.1.1




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Davis Shepherd (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237393#comment-14237393
 ] 

Davis Shepherd edited comment on SPARK-4759 at 12/8/14 3:15 AM:


I still cannot reproduce the issue with your snippet in the spark shell on tag 
v1.1.1


was (Author: dgshep):
I still cannot reproduce the issue in the spark shell on tag v1.1.1




[jira] [Comment Edited] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-07 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237299#comment-14237299
 ] 

koert kuipers edited comment on SPARK-3655 at 12/8/14 3:24 AM:
---

I have a new pull request that implements just groupByKeyAndSortValues in Scala 
and Java. I will need some help with Python.

The pull request is here:
https://github.com/apache/spark/pull/3632

I changed the methods to return RDD[(K, TraversableOnce[V])] instead of RDD[(K, 
Iterable[V])], since I don't see a reasonable way to implement it so that it 
returns Iterables without resorting to keeping the data in memory.
The assumption is that once you move on to the next key within a partition, 
the previous value (the TraversableOnce[V]) will no longer be used.

I personally find this API too generic, and too easy to abuse or make mistakes 
with, so I prefer a more constrained API like foldLeft. Or perhaps 
groupByKeyAndSortValues could be a DeveloperApi?
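The per-key sorted-values contract being discussed can be sketched in plain Scala. This is an in-memory illustration of the semantics only: the method name mirrors the pull request, but this is not its code, and on a real RDD the sort happens during the shuffle so values never need to fit in memory.

```scala
object SecondarySort {
  // In-memory sketch of groupByKeyAndSortValues semantics: group pairs by
  // key, sort each group's values, and return key -> sorted values. We sort
  // eagerly here purely for illustration.
  def groupByKeyAndSortValues[K: Ordering, V: Ordering](
      pairs: Seq[(K, V)]): Seq[(K, Seq[V])] = {
    pairs
      .groupBy(_._1)                                  // key -> Seq[(K, V)]
      .map { case (k, kvs) => (k, kvs.map(_._2).sorted) } // sort values per key
      .toSeq
      .sortBy(_._1)                                   // deterministic ordering
  }
}
```

The consume-before-advancing restriction described above (a `TraversableOnce` that is invalid once the iterator moves to the next key) is what the in-memory `Seq` version cannot show.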



was (Author: koert):
i have a new pullreq that implements just groupByKeyAndSortValues in scala and 
java. i will need some help with python.

pullreq is here:
https://github.com/apache/spark/pull/3632

i changed methods to return RDD[(K, TraversableOnce[V])] instead of RDD[(K, 
Iterable[V])], since i dont see a reasonable way to implement it so that it 
returns Iterables without resorting to keeping the data in memory.
The assumption made is that once you move on to the next key within a partition 
that the previous value (so the TraversableOnce[V]) will no longer be used.

I personally find this API too generic, and too easy to abuse or make mistakes 
with. So i prefer a more constrained API like foldLeft.


 Support sorting of values in addition to keys (i.e. secondary sort)
 ---

 Key: SPARK-3655
 URL: https://issues.apache.org/jira/browse/SPARK-3655
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: koert kuipers
Assignee: Koert Kuipers
Priority: Minor

 Now that spark has a sort based shuffle, can we expect a secondary sort soon? 
 There are some use cases where getting a sorted iterator of values per key is 
 helpful.






[jira] [Resolved] (SPARK-4646) Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark

2014-12-07 Thread Ankur Dave (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Dave resolved SPARK-4646.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 3507
[https://github.com/apache/spark/pull/3507]

 Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark
 --

 Key: SPARK-4646
 URL: https://issues.apache.org/jira/browse/SPARK-4646
 Project: Spark
  Issue Type: Improvement
  Components: GraphX
Reporter: Takeshi Yamamuro
Priority: Minor
 Fix For: 1.2.0


 This patch replaces the native quicksort with Sorter (TimSort) in Spark.
 It yielded performance gains of ~8% in my quick experiments.






[jira] [Resolved] (SPARK-4620) Add unpersist in Graph/GraphImpl

2014-12-07 Thread Ankur Dave (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Dave resolved SPARK-4620.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 3476
[https://github.com/apache/spark/pull/3476]

 Add unpersist in Graph/GraphImpl
 

 Key: SPARK-4620
 URL: https://issues.apache.org/jira/browse/SPARK-4620
 Project: Spark
  Issue Type: Improvement
  Components: GraphX
Reporter: Takeshi Yamamuro
Priority: Trivial
 Fix For: 1.2.0


 Add an interface to uncache both vertices and edges of Graph/GraphImpl.
 This interface is useful when iterative graph operations build a new graph in 
 each iteration, and the vertices and edges of previous iterations are no 
 longer needed for later iterations.
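The iterate-then-release pattern this describes can be shown without a Spark dependency. `G` below is a hypothetical stand-in for a cached distributed structure with an `unpersist` method like the one being proposed for Graph; none of these names are GraphX API.

```scala
object IterativeUnpersist {
  // Hypothetical stand-in for a cached graph (not GraphX API).
  final class G(val step: Int) {
    var persisted = true
    def unpersist(): Unit = { persisted = false }
  }

  // Iterative computation: build the next graph from the current one, then
  // release the previous iteration's cached copy, as the issue proposes.
  // Returns the final graph and how many previous graphs were released.
  def iterate(start: G, iterations: Int): (G, Int) = {
    var g = start
    var released = 0
    for (_ <- 1 to iterations) {
      val next = new G(g.step + 1) // materialize the next iteration
      g.unpersist()                // previous graph no longer needed
      released += 1
      g = next
    }
    (g, released)
  }
}
```

Without the release step, every intermediate graph stays cached for the life of the job, which is the waste the issue targets.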






[jira] [Created] (SPARK-4784) Model.fittingParamMap should store all Params

2014-12-07 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-4784:


 Summary: Model.fittingParamMap should store all Params
 Key: SPARK-4784
 URL: https://issues.apache.org/jira/browse/SPARK-4784
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 1.2.0
Reporter: Joseph K. Bradley
Priority: Minor


spark.ml's Model class should store all parameters in the fittingParamMap, not 
just the ones which were explicitly set.
[~mengxr]






[jira] [Updated] (SPARK-4784) Model.fittingParamMap should store all Params

2014-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-4784:
-
Description: 
spark.ml's Model class should store all parameters in the fittingParamMap, not 
just the ones which were explicitly set.

CC: [~mengxr]

  was:
spark.ml's Model class should store all parameters in the fittingParamMap, not 
just the ones which were explicitly set.
[~mengxr]


 Model.fittingParamMap should store all Params
 -

 Key: SPARK-4784
 URL: https://issues.apache.org/jira/browse/SPARK-4784
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 1.2.0
Reporter: Joseph K. Bradley
Priority: Minor

 spark.ml's Model class should store all parameters in the fittingParamMap, 
 not just the ones which were explicitly set.
 CC: [~mengxr]






[jira] [Commented] (SPARK-1350) YARN ContainerLaunchContext should use cluster's JAVA_HOME

2014-12-07 Thread Aniket Bhatnagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237525#comment-14237525
 ] 

Aniket Bhatnagar commented on SPARK-1350:
-

[~sandyr] Using Environment.JAVA_HOME.$() causes issues when submitting Spark 
applications from a Windows box to a YARN cluster running on Linux (with the 
Spark master set to yarn-client). This is because Environment.JAVA_HOME.$() 
resolves to %JAVA_HOME%, which is not a valid executable path on Linux. Is 
this a Spark issue or a YARN issue?

 YARN ContainerLaunchContext should use cluster's JAVA_HOME
 --

 Key: SPARK-1350
 URL: https://issues.apache.org/jira/browse/SPARK-1350
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 0.9.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 1.0.0


 {code}
 var javaCommand = "java"
 val javaHome = System.getenv("JAVA_HOME")
 if ((javaHome != null && !javaHome.isEmpty()) || env.isDefinedAt("JAVA_HOME")) {
   javaCommand = Environment.JAVA_HOME.$() + "/bin/java"
 }
 {code}
 Currently, if JAVA_HOME is specified on the client, it will be used instead 
 of the value given on the cluster.  This makes it so that Java must be 
 installed in the same place on the client as on the cluster.
 This is a possibly incompatible change that we should get in before 1.0.






[jira] [Created] (SPARK-4785) When called with arguments referring column fields, PMOD throws NPE

2014-12-07 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-4785:
-

 Summary: When called with arguments referring column fields, PMOD 
throws NPE
 Key: SPARK-4785
 URL: https://issues.apache.org/jira/browse/SPARK-4785
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Cheng Lian
Priority: Blocker


Reproduction steps with {{hive/console}}:
{code}
scala> loadTestTable("src")
scala> sql("SELECT PMOD(key, 10) FROM src LIMIT 1").collect()
...
14/12/08 15:11:31 INFO DAGScheduler: Job 0 failed: runJob at 
basicOperators.scala:141, took 0.235788 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 
0, localhost): java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseNumeric.initialize(GenericUDFBaseNumeric.java:109)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:116)
at 
org.apache.spark.sql.hive.HiveGenericUdf.returnInspector$lzycompute(hiveUdfs.scala:156)
at 
org.apache.spark.sql.hive.HiveGenericUdf.returnInspector(hiveUdfs.scala:155)
at org.apache.spark.sql.hive.HiveGenericUdf.eval(hiveUdfs.scala:174)
at 
org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:92)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at 
org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141)
at 
org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237550#comment-14237550
 ] 

Andrew Or commented on SPARK-4759:
--

Ok yeah you're right, I can't reproduce it from the code snippet in branch 1.1 
either. There seem to be at least two issues going on here... Can you confirm 
that the snippet does reproduce the deadlock on the master branch?




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:28 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

- EDIT -
This seems to reproduce it only on the master branch.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.




[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-07 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237321#comment-14237321
 ] 

Andrew Or edited comment on SPARK-4759 at 12/8/14 7:29 AM:
---

Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

=== EDIT ===
This seems to reproduce it only on the master branch, but not 1.1.


was (Author: andrewor14):
Hey I came up with a much smaller reproduction for this from your program.

1. Start spark-shell with --master local[N] where N can be anything (or simply 
local with 1 core)
2. Copy and paste the following into your REPL
{code}
def runMyJob(): Unit = {
  val rdd = sc.parallelize(1 to 100).repartition(5).cache()
  rdd.count()
  val rdd2 = sc.parallelize(1 to 100).repartition(12)
  rdd.union(rdd2).count()
}
{code}
3. runMyJob()

It should be stuck at task 5/17. Note that with local-cluster and (local) 
standalone mode, it pauses a little at 5/17 too, but finishes shortly 
afterwards.

=== EDIT ===
This seems to reproduce it only on the master branch.

 Deadlock in complex spark job in local mode
 ---

 Key: SPARK-4759
 URL: https://issues.apache.org/jira/browse/SPARK-4759
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.1, 1.2.0, 1.3.0
 Environment: Java version 1.7.0_51
 Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
 Mac OS X 10.10.1
 Using local spark context
Reporter: Davis Shepherd
Assignee: Andrew Or
Priority: Critical
 Attachments: SparkBugReplicator.scala


 The attached test class runs two identical jobs that perform some iterative 
 computation on an RDD[(Int, Int)]. This computation involves 
   # taking new data and merging it with the previous result
   # caching and checkpointing the new result
   # rinse and repeat
 The first time the job is run, it runs successfully, and the Spark context is 
 shut down. The second time the job is run with a new Spark context in the 
 same process, the job hangs indefinitely, having scheduled only a subset of 
 the necessary tasks for the final stage.
 I've been able to produce a test case that reproduces the issue, and I've 
 added some comments where some knockout experimentation has left 
 breadcrumbs as to where the issue might be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org






[jira] [Updated] (SPARK-4781) Column values become all NULL after doing ALTER TABLE CHANGE for renaming column names (Parquet external table in HiveContext)

2014-12-07 Thread Jianshi Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianshi Huang updated SPARK-4781:
-
Issue Type: Bug  (was: Improvement)

 Column values become all NULL after doing ALTER TABLE CHANGE for renaming 
 column names (Parquet external table in HiveContext)
 --

 Key: SPARK-4781
 URL: https://issues.apache.org/jira/browse/SPARK-4781
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0, 1.3.0, 1.2.1
Reporter: Jianshi Huang

 I have a table say created like follows:
 {code}
 CREATE EXTERNAL TABLE pmt (
   `sorted::cre_ts` string
 )
 STORED AS PARQUET
 LOCATION '...'
 {code}
 And I renamed the column from sorted::cre_ts to cre_ts by doing:
 {code}
 ALTER TABLE pmt CHANGE `sorted::cre_ts` cre_ts string
 {code}
 After renaming the column, the values in the column become all NULLs.
 {noformat}
 Before renaming:
 scala> sql("select `sorted::cre_ts` from pmt limit 1").collect
 res12: Array[org.apache.spark.sql.Row] = Array([12/02/2014 07:38:54])
 Execute renaming:
 scala> sql("alter table pmt change `sorted::cre_ts` cre_ts string")
 res13: org.apache.spark.sql.SchemaRDD =
 SchemaRDD[972] at RDD at SchemaRDD.scala:108
 == Query Plan ==
 Native command: executed by Hive
 After renaming:
 scala> sql("select cre_ts from pmt limit 1").collect
 res16: Array[org.apache.spark.sql.Row] = Array([null])
 {noformat}
 Jianshi
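A plausible explanation (an editor's note, not confirmed in the report): Parquet files are self-describing and store the original column name, while ALTER TABLE CHANGE only updates the Hive metastore. If column resolution is name-based, the renamed metastore column no longer matches any column in the files, so every value comes back NULL. One hedged workaround sketch under that assumption: leave the physical column name untouched and expose the friendlier name through a view, so resolution still matches the name stored in the files. The view name `pmt_renamed` is hypothetical, and `hiveContext` is assumed to be an existing HiveContext:

```scala
// Hedged sketch: instead of ALTER TABLE ... CHANGE (which only updates the
// metastore, not the names inside the Parquet files), keep the physical
// column name and expose the renamed column through a view.
hiveContext.sql("""
  CREATE VIEW pmt_renamed AS
  SELECT `sorted::cre_ts` AS cre_ts
  FROM pmt
""")

// Queries against the view see the new name, while scans of the underlying
// table still resolve `sorted::cre_ts` against the Parquet schema.
hiveContext.sql("SELECT cre_ts FROM pmt_renamed LIMIT 1").collect()
```

Whether CREATE VIEW is passed through as a native Hive command in this Spark version is itself an assumption; if it is not, the alias can be applied directly in each query instead.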



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org