Hi,

How is the 78g distributed among the driver, daemon, and executor?

Can you please paste the logs indicating "that I don't have enough memory to 
hold the data in memory"?
Are you collecting any data in the driver?

Lastly, did you try repartitioning to create smaller, more evenly 
distributed partitions?
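
A rough sketch of what that repartitioning could look like in PySpark follows. The 128 MB target partition size and the helper names suggest_num_partitions and rebalance are assumptions for illustration, not anything taken from your job. RDD.repartition is the standard method and triggers a full shuffle; it may not help if the skew comes from a few very large keys or records.

```python
# Hypothetical helpers; the 128 MB target is an assumed starting point,
# not a Spark default.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggest_num_partitions(total_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Partition count that keeps the average partition at or below target_bytes."""
    return max(1, -(-total_bytes // target_bytes))  # ceiling division on ints


def rebalance(rdd, total_bytes):
    """Shuffle the RDD into more, smaller partitions via RDD.repartition."""
    return rdd.repartition(suggest_num_partitions(total_bytes))


# Usage with a real RDD (assumes an existing `rdd` and a rough size estimate):
#   balanced = rebalance(rdd, total_bytes=44 * 1024**3)
#   print(balanced.glom().map(len).collect())  # records per partition
```

Checking the per-partition record counts after the shuffle shows whether the data actually spread out.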

Regards,

Kapil 

-----Original Message-----
From: akhandeshi [mailto:ami.khande...@gmail.com] 
Sent: 12 November 2014 03:44
To: u...@spark.incubator.apache.org
Subject: Help with processing multiple RDDs

I have been struggling to process a set of RDDs.  Conceptually, it is not a 
large data set. No matter how much memory I give the JVM or how I partition, 
I can't seem to process this data.  I am caching the RDD.  I have tried 
persist(disk and memory), persist(memory) and persist(off_heap) with no 
success.  Currently I am giving 78g to my driver, daemon and executor memory.

Currently, it seems to have trouble with the largest partition, rdd_22_29, 
which is 25.9 GB.

The metrics page shows Summary Metrics for 29 Completed Tasks.  However, a 
few partitions are missing from the list below.  I also see warnings in the 
log file indicating that I don't have enough memory to hold the data in 
memory.  I don't understand what I am doing wrong or how I can troubleshoot. 
Any pointers will be appreciated...

14/11/11 21:28:45 WARN CacheManager: Not enough space to cache partition rdd_22_20 in memory! Free memory is 17190150496 bytes.
14/11/11 21:29:27 WARN CacheManager: Not enough space to cache partition rdd_22_13 in memory! Free memory is 17190150496 bytes.
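
To make the free-memory figure in these warnings easier to compare with the block sizes in the listing below, it can be converted to gigabytes. This is a small sketch that assumes sizes are reported in binary units (1 GB = 1024^3 bytes); the variable names are illustrative.

```python
# Free memory reported by the CacheManager warnings above.
free_bytes = 17190150496

# Assumption: sizes are reported in binary units (1 GB = 1024**3 bytes).
free_gb = free_bytes / 1024**3

# Largest cached block in the storage listing (rdd_22_29).
largest_block_gb = 25.9

print(f"free: {free_gb:.1f} GB, largest block: {largest_block_gb} GB")
print("largest block exceeds free memory:", largest_block_gb > free_gb)
```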


Block Name  Storage Level                      Size in Memory  Size on Disk  Executors
rdd_22_0    Memory Deserialized 1x Replicated  2.1 MB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_10   Memory Deserialized 1x Replicated  7.0 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_11   Memory Deserialized 1x Replicated  1290.2 MB       0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_12   Memory Deserialized 1x Replicated  1167.7 KB       0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_14   Memory Deserialized 1x Replicated  3.8 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_15   Memory Deserialized 1x Replicated  4.0 MB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_16   Memory Deserialized 1x Replicated  2.4 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_17   Memory Deserialized 1x Replicated  37.6 MB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_18   Memory Deserialized 1x Replicated  120.9 MB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_19   Memory Deserialized 1x Replicated  755.9 KB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_2    Memory Deserialized 1x Replicated  289.5 KB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_21   Memory Deserialized 1x Replicated  11.9 KB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_22   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_23   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_24   Memory Deserialized 1x Replicated  3.0 MB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_25   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_26   Memory Deserialized 1x Replicated  4.0 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_27   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_28   Memory Deserialized 1x Replicated  1846.1 KB       0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_29   Memory Deserialized 1x Replicated  25.9 GB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_3    Memory Deserialized 1x Replicated  267.1 KB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_4    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_5    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_6    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_7    Memory Deserialized 1x Replicated  14.8 KB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_8    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_9    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
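
The skew in this listing can be quantified with a short script. This is a minimal sketch, assuming the UI reports binary units (1 KB = 1024 B) and that the RDD has 30 partitions indexed 0 to 29 (29 is the highest index shown); the names used are illustrative.

```python
# Sizes copied from the storage listing above, keyed by partition index.
LISTING = {
    0: "2.1 MB", 10: "7.0 GB", 11: "1290.2 MB", 12: "1167.7 KB",
    14: "3.8 GB", 15: "4.0 MB", 16: "2.4 GB", 17: "37.6 MB",
    18: "120.9 MB", 19: "755.9 KB", 2: "289.5 KB", 21: "11.9 KB",
    22: "24.0 B", 23: "24.0 B", 24: "3.0 MB", 25: "24.0 B",
    26: "4.0 GB", 27: "24.0 B", 28: "1846.1 KB", 29: "25.9 GB",
    3: "267.1 KB", 4: "24.0 B", 5: "24.0 B", 6: "24.0 B",
    7: "14.8 KB", 8: "24.0 B", 9: "24.0 B",
}

# Assumption: binary units, as labelled in the lead-in.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(text):
    """Convert a size string such as '25.9 GB' to a byte count."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


sizes = {idx: to_bytes(text) for idx, text in LISTING.items()}
total = sum(sizes.values())
largest_idx = max(sizes, key=sizes.get)
largest_share = sizes[largest_idx] / total

# Assumption: partition indices run from 0 to 29.
missing = sorted(set(range(30)) - set(sizes))
# Blocks of 24 bytes hold essentially no data.
tiny = sorted(idx for idx, size in sizes.items() if size <= 24)

print(f"total cached: {total / 1024**3:.1f} GB")
print(f"largest: rdd_22_{largest_idx} holds {largest_share:.0%} of cached bytes")
print("partitions not in the listing:", missing)
print("near-empty partitions:", tiny)
```

The partitions absent from the listing include the two named in the CacheManager warnings, and one block accounts for more than half of the cached bytes while several others are effectively empty.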




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-processing-multiple-RDDs-tp18628.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

