You can increase the number of partitions and try that; a minimal sketch illustrating both suggestions follows the quoted thread below.

On Sep 3, 2015 5:33 AM, "Silvio Fiorito" <silvio.fior...@granturing.com> wrote:
> Unfortunately, groupBy is not the most efficient operation. What is it
> you’re trying to do? It may be possible with one of the other *byKey
> transformations.
>
> From: "SAHA, DEBOBROTA"
> Date: Wednesday, September 2, 2015 at 7:46 PM
> To: "'user@spark.apache.org'"
> Subject: Unable to run Group BY on Large File
>
> Hi,
>
> I am getting the below error while trying to select data using Spark SQL
> from an RDD table.
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> "Spark Context Cleaner" java.lang.InterruptedException
>
> The file or table size is around 113 GB and I am running Spark 1.4 on a
> standalone cluster. I tried to extend the heap size, but extending it to
> 64 GB also didn’t help.
>
> I would really appreciate any help on this.
>
> Thanks,
> Debobrota
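
For illustration only, here is a minimal Scala sketch of the two suggestions above: raising the shuffle parallelism and expressing the GROUP BY as a reduceByKey aggregation. The input path, delimiter, sum aggregation, and partition counts are assumptions made for this example, not details from the original job.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LargeGroupBySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("large-group-by"))
    val sqlContext = new SQLContext(sc)

    // Suggestion 1: raise shuffle parallelism so each task works on a smaller
    // slice of the ~113 GB input (the Spark 1.4 default is 200 partitions).
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000") // tune for your cluster

    // Suggestion 2: if the GROUP BY is really an aggregation, express it with
    // reduceByKey so values are combined map-side instead of being collected
    // per key in memory, which is what groupBy/groupByKey does.
    val pairs = sc.textFile("hdfs:///data/large_file")   // hypothetical input path
      .map { line =>
        val cols = line.split('|')                       // hypothetical delimiter
        (cols(0), cols(1).toDouble)                      // (group key, value)
      }

    val totals = pairs.reduceByKey(_ + _, 2000)          // explicit partition count

    totals.saveAsTextFile("hdfs:///data/group_by_totals")
    sc.stop()
  }
}

If the query genuinely needs every row per group rather than an aggregate, then more partitions plus more executor memory is about all you can do with groupBy, since all values for a single key still have to fit in memory on one task.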