The long lineage causes a long/deep Java object tree (a DAG of RDD objects),
which needs to be serialized as part of task creation. When
serializing, the whole object DAG has to be traversed recursively, which
leads to the StackOverflowError.
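TD's explanation can be illustrated outside Spark: Python's pickle also traverses object graphs recursively, so a long enough chain of plain objects fails the same way the RDD DAG does during task serialization. A minimal sketch (pure Python, not the Spark API; the `Node` class is hypothetical):

```python
import pickle

class Node:
    """One link in a long 'lineage' chain: each object points at its
    parent, like an RDD pointing at the RDD it was derived from."""
    def __init__(self, parent):
        self.parent = parent

# Build a chain far deeper than Python's default recursion limit (~1000).
deep = None
for _ in range(10_000):
    deep = Node(deep)

try:
    pickle.dumps(deep)       # traverses the whole chain recursively,
    deep_failed = False      # like serializing the RDD DAG for a task
except RecursionError:
    deep_failed = True       # Python's analogue of StackOverflowError

shallow_ok = bool(pickle.dumps(Node(None)))  # a short lineage is fine
print("deep chain failed:", deep_failed, "| shallow chain ok:", shallow_ok)
```

The serialized size isn't the problem; the recursive walk of the graph is, which is why truncating the lineage (rather than compressing the data) is the fix.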
TD
On Mon, Aug 11, 2014 at 7:14 PM, randylu randyl...@gmail.com wrote:
hi, TD. Thanks very much! I got it.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-StackOverflowError-when-calling-count-tp5649p11980.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
hi, TD. I also fell into the trap of long lineage, and your suggestions do
work well. But I don't understand why a long lineage causes the stack
overflow, and where it takes effect?
Responses inline.
On Wed, Jul 23, 2014 at 4:13 AM, lalit1303 la...@sigmoidanalytics.com wrote:
Hi,
Thanks TD for your reply. I am still not able to resolve the problem for my
use case.
I have, let's say, 1000 different RDDs, and I am applying a transformation
function on each RDD, and I want the output of all RDDs combined into a single
output RDD. For this, I am doing the following:
*Loop
Would cache() + count() every N iterations work just as well as
checkPoint() + count() to get around this issue?
We're basically trying to get Spark to avoid working on too lengthy a
lineage at once, right?
Nick
On Tue, May 13, 2014 at 12:04 PM, Xiangrui Meng men...@gmail.com wrote:
After
If we do cache() + count() after, say, every 50 iterations, the whole process
becomes very slow.
I have tried checkpoint(), cache() + count(), and saveAsObjectFiles().
Nothing works.
Materializing RDDs leads to a drastic decrease in performance; if we don't
materialize, we face a StackOverflowError.
On
We are running into the same issue. After 700 or so files the stack overflows;
cache, persist, and checkpointing don't help.
Basically, checkpointing only saves the RDD when it is materialized, and it
only materializes at the end, so it runs out of stack.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
Thanks Xiangrui. After some debugging efforts, it turns out that the
problem results from a bug in my code. But it's good to know that a long
lineage could also lead to this problem. I will also try checkpointing to
see whether the performance can be improved.
Best regards,
- Guanhua
On 5/13/14
count() causes the overall performance to drop drastically; in fact, beyond 50
files it starts to hang if I force materialization.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue, May 13, 2014 at 9:34 PM,
Dear Sparkers:
I am using Python Spark (PySpark) version 0.9.0 to implement an iterative
algorithm. I got some errors, shown at the end of this email. It seems to be
due to a Java stack overflow error. The same error has been
duplicated on a Mac desktop and a Linux workstation, both running the