Re: RDD size in memory - Array[String] vs. case classes

2014-10-11 Thread Sean Owen
Yes, of course. If your number is 123456, then it takes 4 bytes as an int. But as a String in a 64-bit JVM you have an 8-byte reference, 4 bytes of object overhead, a 4-byte char count, and six 2-byte chars. Maybe more I'm not thinking of.
On Sat, Oct 11, 2014 at 6:29 AM, Liam Clarke-Hutchinson
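To make the arithmetic above concrete, here is a small back-of-envelope tally in Java. The constants are simply the rough per-part figures from the reply, not measured sizes. On a real JVM, object headers, the separate char array, and alignment padding will most likely push the String figure higher, so treat the result as a lower bound. The class name `StringVsIntSize` and the method `stringBytes` are hypothetical, chosen only for illustration.

```java
// Back-of-envelope tally using the rough figures quoted in the reply.
// These constants are the post's estimates, NOT measured JVM object sizes;
// real headers, padding and the backing char[] make actual Strings larger.
public class StringVsIntSize {
    static final int INT_BYTES = 4;        // a primitive int
    static final int REFERENCE_BYTES = 8;  // reference on a 64-bit JVM (as stated in the post)
    static final int OBJECT_OVERHEAD = 4;  // object overhead, as stated in the post
    static final int COUNT_FIELD = 4;      // the String's char count
    static final int CHAR_BYTES = 2;       // one UTF-16 char

    /** Rough lower-bound bytes to hold the decimal text of n as a String. */
    static int stringBytes(int n) {
        int chars = Integer.toString(n).length(); // e.g. 123456 -> 6 chars
        return REFERENCE_BYTES + OBJECT_OVERHEAD + COUNT_FIELD + chars * CHAR_BYTES;
    }

    public static void main(String[] args) {
        int n = 123456;
        int asString = stringBytes(n);
        System.out.println("int:    " + INT_BYTES + " bytes");
        System.out.println("String: " + asString + " bytes (lower bound)");
        // Double division, printed with locale-independent Double.toString
        System.out.println("ratio:  " + ((double) asString / INT_BYTES) + "x");
    }
}
```

For 123456 this gives 8 + 4 + 4 + 6 × 2 = 28 bytes against 4 for the int, already a 7x difference before any of the extra overhead the reply alludes to.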

RDD size in memory - Array[String] vs. case classes

2014-10-10 Thread Liam Clarke-Hutchinson
Hi all, I'm playing with Spark currently as a possible solution at work, and I've been recently working out a rough correlation between our input data size and RAM needed to cache an RDD that will be used multiple times in a job. As part of this I've been trialling different methods of