Hi, 
I am running a word count application on a 600 MB text file, with the RDD's
StorageLevel set to StorageLevel.MEMORY_AND_DISK_2.
There are two things I can't explain:
1. The storage level shown on the UI is "Disk Serialized 2x Replicated", but I am
using StorageLevel.MEMORY_AND_DISK_2. Where is the memory information?
Storage Level: Disk Serialized 2x Replicated 
Cached Partitions: 20 
Total Partitions: 20 
Memory Size: 107.6 MB 
Disk Size: 277.1 MB             

2. My text file is 600 MB, but the memory and disk sizes shown above add up to
only about 385 MB in total (107.6 MB + 277.1 MB). Since I am using 2x
replication, I expected about 600 MB * 2 = 1200 MB. Is some compression
happening behind the scenes, or is something else going on?
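
To make the arithmetic in question 2 concrete, here is a small sketch in plain Python (no Spark involved). The variable names are only illustrative. It assumes that the cached data would be about the same size as the raw text and that the sizes on the UI cover both replicas; I am not sure either assumption is right.

```python
# Figures copied from the UI output above; names are illustrative only.
input_mb = 600.0     # size of the input text file
replication = 2      # MEMORY_AND_DISK_2 asks for two copies of each partition
memory_mb = 107.6    # "Memory Size" from the UI
disk_mb = 277.1      # "Disk Size" from the UI

# Expected size IF cached data were as large as the raw text (an assumption).
expected_mb = input_mb * replication

# What the UI actually reports, memory and disk added together.
observed_mb = memory_mb + disk_mb

# Fraction of the expected size that the UI accounts for.
ratio = observed_mb / expected_mb

print(f"expected: {expected_mb:.1f} MB")
print(f"observed: {observed_mb:.1f} MB")
print(f"observed / expected: {ratio:.1%}")
```

The observed total is roughly a third of what I expected, which is the gap I am asking about.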

Thanks!


bit1...@163.com
