Hi,

I am running a word count application on a 600M text file, and I set the RDD's StorageLevel to StorageLevel.MEMORY_AND_DISK_2. I have two questions that I can't explain:

1. The storage level shown on the UI is "Disk Serialized 2x Replicated", but I am using StorageLevel.MEMORY_AND_DISK_2. Where is the memory information? This is what the UI shows:

Storage Level: Disk Serialized 2x Replicated
Cached Partitions: 20
Total Partitions: 20
Memory Size: 107.6 MB
Disk Size: 277.1 MB
2. My text file is 600M, but the memory and disk sizes shown above add up to only about 400M (107.6M + 277.1M). Since I am using 2 replicas, I would expect about 600M * 2. Is some compression happening behind the scenes, or is something else going on?

Thanks!
bit1...@163.com
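For reference, the arithmetic behind question 2, using only the figures quoted in the post. This is a minimal sketch that assumes "600M" means 600 MB and that the expected footprint is simply input size times replication factor, as stated in the question; it does not model how Spark actually sizes cached blocks.

```python
# Figures reported by the Spark UI Storage tab (MB), copied from the post.
memory_size_mb = 107.6
disk_size_mb = 277.1

# Input size and replication factor as described in the question
# (assumption: "600M" is read as 600 MB).
input_size_mb = 600
replication = 2

# What the UI reports in total vs. what the question expects.
observed_total_mb = memory_size_mb + disk_size_mb
expected_total_mb = input_size_mb * replication

print(f"observed total: {observed_total_mb:.1f} MB")
print(f"expected total: {expected_total_mb} MB")
print(f"observed / expected: {observed_total_mb / expected_total_mb:.2f}")
```

Running it prints an observed total of 384.7 MB against an expected 1200 MB, a ratio of about 0.32, which is the gap the question is asking about.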