Unfortunately, groupBy is one of the more expensive operations in Spark, since it
shuffles every value for a key across the cluster before anything is aggregated.
What is it you’re trying to do? It may be possible with one of the other *byKey
transformations.
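
For example, if the end goal is a per-key aggregate, reduceByKey combines values
map-side before the shuffle, so far less data has to be held in memory per task.
A minimal sketch of the difference (the keys, values, and sum aggregation here are
placeholders, not taken from your job):

import org.apache.spark.{SparkConf, SparkContext}

object ByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("byKey-example"))

    // Hypothetical (key, value) pairs; in practice these would come from the large input.
    val pairs = sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L)))

    // groupByKey pulls every value for a key to a single task before aggregating,
    // which can exhaust memory for skewed or very large groups.
    val viaGroup = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines values map-side first, so much less data crosses the shuffle.
    val viaReduce = pairs.reduceByKey(_ + _)

    viaReduce.collect().foreach(println)
    sc.stop()
  }
}

The same idea applies to aggregateByKey or combineByKey when the result type
differs from the value type.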

From: "SAHA, DEBOBROTA"
Date: Wednesday, September 2, 2015 at 7:46 PM
To: "user@spark.apache.org"
Subject: Unable to run Group BY on Large File

Hi,

I am getting the error below while trying to select data using Spark SQL from an
RDD table.

java.lang.OutOfMemoryError: GC overhead limit exceeded
    "Spark Context Cleaner" java.lang.InterruptedException


The file (table) size is around 113 GB, and I am running Spark 1.4 on a
standalone cluster. I tried increasing the heap size, but even 64 GB didn’t help.

I would really appreciate any help on this.

Thanks,
Debobrota
