Re: OOM in SizeEstimator while using combineByKey

2015-04-15 Thread Xianjin YE
What are your JVM heap size settings? The OOM in SizeEstimator is caused by a large number of entries in the IdentityHashMap. A quick guess is that the objects in your dataset are instances of a custom class and you didn't implement the hashCode and equals methods correctly. On Wednesday, April 15, 2015 at 3:10 PM,
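
For reference, here is a minimal sketch of what a consistent equals/hashCode pair might look like on a custom key class. The class name RecordKey and its fields are hypothetical, since the original poster's class is not shown in this thread. It only illustrates the contract the reply refers to and is not a confirmed fix for the OOM.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical key class; the name and fields are illustrative only.
final class RecordKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private final int bucket;

    RecordKey(String name, int bucket) {
        this.name = name;
        this.bucket = bucket;
    }

    // Two keys are equal when all of their fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof RecordKey)) {
            return false;
        }
        RecordKey that = (RecordKey) other;
        return bucket == that.bucket && Objects.equals(name, that.name);
    }

    // Must be computed from the same fields as equals, so that equal
    // keys always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(name, bucket);
    }
}
```

With a pair like this, two separately constructed keys holding the same field values are treated as the same key by hash-based collections, which is what key-based operations such as combineByKey rely on when grouping values.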

Re: Spark, snappy and HDFS

2015-04-01 Thread Xianjin YE
Can you read a Snappy-compressed file in HDFS? It looks like libsnappy.so is not in the Hadoop native library path. On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote: Has anyone else encountered the following error when trying to read a snappy compressed sequence file from HDFS?
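
As a rough diagnostic for the suggestion above, the following sketch lists which directories on the JVM's java.library.path contain a file named libsnappy.so. The class and method names (NativeLibCheck, findLibrary) are hypothetical helpers. It assumes the JVM in question uses java.library.path to locate native libraries. Hadoop's own loader may resolve libsnappy through other means, so an empty result is a hint and not proof. On Hadoop 2.x, running `hadoop checknative -a` is likely a more direct check.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans a path-separated list of directories for a file.
final class NativeLibCheck {

    // Returns the full path of every match of fileName found in searchPath.
    static List<String> findLibrary(String searchPath, String fileName) {
        List<String> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate.getPath());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // java.library.path is the standard JVM property for native lookups.
        String path = System.getProperty("java.library.path");
        List<String> hits = findLibrary(path, "libsnappy.so");
        if (hits.isEmpty()) {
            System.out.println("libsnappy.so not found on java.library.path: " + path);
        } else {
            for (String hit : hits) {
                System.out.println("found: " + hit);
            }
        }
    }
}
```

Running this with the same JVM options as the Spark driver or executor shows whether the library directory was actually passed through to that process.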