I have been struggling to process a set of RDDs. Conceptually, it is not
a large data set, yet no matter how much memory I give the JVM or how I
partition the data, I can't seem to process it. I am caching the RDD. I have
tried persist(MEMORY_AND_DISK), persist(MEMORY_ONLY) and persist(OFF_HEAP)
with no success. Currently I am setting the driver, daemon and executor
memory to 78g.

Currently, it seems to have trouble with the largest partition,
rdd_22_29, which is 25.9 GB.
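
A rough way to see why a single 25.9 GB partition is a problem even with a 78g heap: as far as I understand Spark 1.x, the memory available for cached blocks is the JVM max heap multiplied by spark.storage.memoryFraction (default 0.6) and spark.storage.safetyFraction (default 0.9). The sketch below assumes those defaults, assumes the UI's "GB" means 2^30 bytes, and treats the full 78g as usable heap; the JVM usually reports somewhat less, and newer Spark versions use a different memory model.

```python
GIB = 1024 ** 3  # assumption: Spark's UI "GB" means 2**30 bytes


def storage_pool_bytes(heap_bytes, memory_fraction=0.6, safety_fraction=0.9):
    """Rough Spark 1.x estimate of one executor's cache (storage) pool.

    Assumes the defaults spark.storage.memoryFraction=0.6 and
    spark.storage.safetyFraction=0.9 and that the whole heap is usable;
    the JVM usually reports a little less than -Xmx.
    """
    return int(heap_bytes * memory_fraction * safety_fraction)


heap = 78 * GIB                  # the 78g executor memory from this post
pool = storage_pool_bytes(heap)
largest_partition = 25.9 * GIB   # rdd_22_29 as shown in the Storage tab

print(f"storage pool ~ {pool / GIB:.2f} GiB")
print(f"rdd_22_29 alone needs {largest_partition / pool:.0%} of it")
```

Under those assumptions the pool is about 42 GiB, so this one partition takes roughly 61% of the memory that every cached block on the executor has to share.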

The metrics page shows Summary Metrics for 29 Completed Tasks. However, a
few partitions are missing from the list below. I also see warnings in the
log file indicating that there is not enough memory to hold the data. I
don't understand what I am doing wrong or how to troubleshoot this. Any
pointers would be appreciated.
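
To find exactly which partitions are absent from the Storage tab listing in this message, a set difference over the block names is enough. This sketch assumes the RDD has 30 partitions (indices 0 to 29, since rdd_22_29 is the highest index listed) and that block names follow the rdd_<rddId>_<partitionIndex> pattern shown in the listing; missing_partitions is a hypothetical helper, not a Spark API.

```python
import re

# Block names copied from the Storage tab listing in this message.
BLOCK_NAMES = """rdd_22_0 rdd_22_10 rdd_22_11 rdd_22_12 rdd_22_14 rdd_22_15
rdd_22_16 rdd_22_17 rdd_22_18 rdd_22_19 rdd_22_2 rdd_22_21 rdd_22_22
rdd_22_23 rdd_22_24 rdd_22_25 rdd_22_26 rdd_22_27 rdd_22_28 rdd_22_29
rdd_22_3 rdd_22_4 rdd_22_5 rdd_22_6 rdd_22_7 rdd_22_8 rdd_22_9""".split()


def missing_partitions(block_names, num_partitions):
    """Return partition indices that have no cached rdd_<id>_<index> block."""
    cached = {int(re.fullmatch(r"rdd_\d+_(\d+)", name).group(1))
              for name in block_names}
    return sorted(set(range(num_partitions)) - cached)


# Assumption: the RDD has 30 partitions (0..29).
print(missing_partitions(BLOCK_NAMES, 30))
```

With 30 partitions this reports [1, 13, 20]; 13 and 20 are the two partitions named in the CacheManager warnings, so only partition 1 is missing without a matching warning in the excerpt.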

14/11/11 21:28:45 WARN CacheManager: Not enough space to cache partition rdd_22_20 in memory! Free memory is 17190150496 bytes.
14/11/11 21:29:27 WARN CacheManager: Not enough space to cache partition rdd_22_13 in memory! Free memory is 17190150496 bytes.
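
The free-memory figure in these warnings is easier to compare with the Storage tab sizes once it is converted to GiB. The sizes of rdd_22_13 and rdd_22_20 are not shown anywhere, so the sketch only compares against two partitions that are listed; it assumes 1024-based units and ignores any extra unroll space Spark may need while materializing a partition, which would lower the real threshold. fits_in_free_memory is a hypothetical helper.

```python
GIB = 1024 ** 3  # assumption: sizes in the UI are 1024-based

# "Free memory" value reported by both CacheManager warnings.
FREE_BYTES = 17190150496


def fits_in_free_memory(size_bytes, free_bytes=FREE_BYTES):
    """True if a block of size_bytes could fit in the reported free memory."""
    return size_bytes <= free_bytes


print(f"free memory ~ {FREE_BYTES / GIB:.2f} GiB")
print("7.0 GB block fits:", fits_in_free_memory(7.0 * GIB))    # like rdd_22_10
print("25.9 GB block fits:", fits_in_free_memory(25.9 * GIB))  # like rdd_22_29
```

The reported free memory is just over 16 GiB, so a partition of rdd_22_29's size could not be cached at that point, while one the size of rdd_22_10 could.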


Block Name  Storage Level                      Size in Memory  Size on Disk  Executors
rdd_22_0    Memory Deserialized 1x Replicated  2.1 MB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_10   Memory Deserialized 1x Replicated  7.0 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_11   Memory Deserialized 1x Replicated  1290.2 MB       0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_12   Memory Deserialized 1x Replicated  1167.7 KB       0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_14   Memory Deserialized 1x Replicated  3.8 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_15   Memory Deserialized 1x Replicated  4.0 MB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_16   Memory Deserialized 1x Replicated  2.4 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_17   Memory Deserialized 1x Replicated  37.6 MB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_18   Memory Deserialized 1x Replicated  120.9 MB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_19   Memory Deserialized 1x Replicated  755.9 KB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_2    Memory Deserialized 1x Replicated  289.5 KB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_21   Memory Deserialized 1x Replicated  11.9 KB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_22   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_23   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_24   Memory Deserialized 1x Replicated  3.0 MB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_25   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_26   Memory Deserialized 1x Replicated  4.0 GB          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_27   Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_28   Memory Deserialized 1x Replicated  1846.1 KB       0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_29   Memory Deserialized 1x Replicated  25.9 GB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_3    Memory Deserialized 1x Replicated  267.1 KB        0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_4    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_5    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_6    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_7    Memory Deserialized 1x Replicated  14.8 KB         0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_8    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
rdd_22_9    Memory Deserialized 1x Replicated  24.0 B          0.0 B         mddworker.c.fi-mdd-poc.internal:54974
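
The Size in Memory column above suggests the data is skewed rather than uniformly large. A small sketch that totals the column and measures the share of the largest block makes this concrete; it assumes the UI's KB/MB/GB units are 1024-based, and parse_sizes is a hypothetical helper written for this listing, not part of Spark.

```python
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
GIB = UNITS["GB"]

# Block name and "Size in Memory" copied from the Storage tab listing above.
STORAGE_TAB = """\
rdd_22_0 2.1 MB
rdd_22_10 7.0 GB
rdd_22_11 1290.2 MB
rdd_22_12 1167.7 KB
rdd_22_14 3.8 GB
rdd_22_15 4.0 MB
rdd_22_16 2.4 GB
rdd_22_17 37.6 MB
rdd_22_18 120.9 MB
rdd_22_19 755.9 KB
rdd_22_2 289.5 KB
rdd_22_21 11.9 KB
rdd_22_22 24.0 B
rdd_22_23 24.0 B
rdd_22_24 3.0 MB
rdd_22_25 24.0 B
rdd_22_26 4.0 GB
rdd_22_27 24.0 B
rdd_22_28 1846.1 KB
rdd_22_29 25.9 GB
rdd_22_3 267.1 KB
rdd_22_4 24.0 B
rdd_22_5 24.0 B
rdd_22_6 24.0 B
rdd_22_7 14.8 KB
rdd_22_8 24.0 B
rdd_22_9 24.0 B
"""


def parse_sizes(text):
    """Map block name -> size in bytes, assuming 1024-based UI units."""
    sizes = {}
    for line in text.splitlines():
        name, value, unit = line.split()
        sizes[name] = float(value) * UNITS[unit]
    return sizes


sizes = parse_sizes(STORAGE_TAB)
total = sum(sizes.values())
largest = max(sizes, key=sizes.get)
empty = [name for name, size in sizes.items() if size <= 24]

print(f"total cached ~ {total / GIB:.1f} GiB across {len(sizes)} blocks")
print(f"{largest} holds {sizes[largest] / total:.0%} of it")
print(f"{len(empty)} blocks are 24 B or less (effectively empty)")
```

By this count rdd_22_29 holds roughly 58% of about 44.5 GiB of cached data while nine partitions are only 24 B, which suggests the keys that produced rdd_22 are very unevenly distributed across partitions.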




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-processing-multiple-RDDs-tp18628.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
