Re: java.lang.StackOverflowError when calling count()

2014-08-12 Thread Tathagata Das
The long lineage causes a long/deep Java object tree (DAG of RDD objects), which needs to be serialized as part of the task creation. When serializing, the whole object DAG needs to be traversed leading to the stackoverflow error. TD On Mon, Aug 11, 2014 at 7:14 PM, randylu randyl...@gmail.com

Re: java.lang.StackOverflowError when calling count()

2014-08-12 Thread randylu
hi, TD. Thanks very much! I got it. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-StackOverflowError-when-calling-count-tp5649p11980.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: java.lang.StackOverflowError when calling count()

2014-08-11 Thread randylu
hi, TD. I also fall into the trap of long lineage, and your suggestions do work well. But i don't understand why the long lineage can cause stackover, and where it takes effect? -- View this message in context:

Re: java.lang.StackOverflowError when calling count()

2014-07-26 Thread Tathagata Das
Responses inline. On Wed, Jul 23, 2014 at 4:13 AM, lalit1303 la...@sigmoidanalytics.com wrote: Hi, Thanks TD for your reply. I am still not able to resolve the problem for my use case. I have let's say 1000 different RDD's, and I am applying a transformation function on each RDD and I want

Re: java.lang.StackOverflowError when calling count()

2014-07-23 Thread lalit1303
Hi, Thanks TD for your reply. I am still not able to resolve the problem for my use case. I have let's say 1000 different RDD's, and I am applying a transformation function on each RDD and I want the output of all rdd's combined to a single output RDD. For, this I am doing the following: *Loop

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread Nicholas Chammas
Would cache() + count() every N iterations work just as well as checkPoint() + count() to get around this issue? We're basically trying to get Spark to avoid working on too lengthy a lineage at once, right? Nick On Tue, May 13, 2014 at 12:04 PM, Xiangrui Meng men...@gmail.com wrote: After

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread lalit1303
If we do cache() + count() after say every 50 iterations. The whole process becomes very slow. I have tried checkpoint() , cache() + count(), saveAsObjectFiles(). Nothing works. Materializing RDD's lead to drastic decrease in performance if we don't materialize, we face stackoverflowerror. On

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Mayur Rustagi
We are running into same issue. After 700 or so files the stack overflows, cache, persist checkpointing dont help. Basically checkpointing only saves the RDD when it is materialized it only materializes in the end, then it runs out of stack. Regards Mayur Mayur Rustagi Ph: +1 (760) 203 3257

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Guanhua Yan
Thanks Xiangrui. After some debugging efforts, it turns out that the problem results from a bug in my code. But it's good to know that a long lineage could also lead to this problem. I will also try checkpointing to see whether the performance can be improved. Best regards, - Guanhua On 5/13/14

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Mayur Rustagi
Count causes the overall performance to drop drastically. Infact beyond 50 files it starts to hang. if i force materialization. Regards Mayur Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Tue, May 13, 2014 at 9:34 PM,

java.lang.StackOverflowError when calling count()

2014-05-12 Thread Guanhua Yan
Dear Sparkers: I am using Python spark of version 0.9.0 to implement some iterative algorithm. I got some errors shown at the end of this email. It seems that it's due to the Java Stack Overflow error. The same error has been duplicated on a mac desktop and a linux workstation, both running the