Hi Silvio,

I am trying to group the data from an Oracle RAW table by loading the raw table 
into an RDD first and then registering that as a table in Spark.
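
Roughly, the load path looks like the sketch below (using the Spark 1.4 JDBC 
DataFrame reader; the connection string, credentials, table name and column 
name are placeholders, not the real ones):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("OracleRawLoad"))
  val sqlContext = new SQLContext(sc)

  // Pull the Oracle table over JDBC into a DataFrame (placeholder URL/credentials)
  val props = new java.util.Properties()
  props.setProperty("user", "app_user")
  props.setProperty("password", "app_password")
  val rawDf = sqlContext.read.jdbc(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "RAW_TABLE", props)

  // Register it so it can be queried with Spark SQL (key_col is a placeholder column)
  rawDf.registerTempTable("raw_table")
  val grouped = sqlContext.sql(
    "SELECT key_col, COUNT(*) AS cnt FROM raw_table GROUP BY key_col")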

Thanks,
Debobrota

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Wednesday, September 02, 2015 5:03 PM
To: SAHA, DEBOBROTA; 'user@spark.apache.org'
Subject: Re: Unable to run Group BY on Large File

Unfortunately, groupBy is not the most efficient operation. What is it you’re 
trying to do? It may be possible with one of the other *byKey transformations.
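
For example, if the end result is an aggregate per key, reduceByKey or 
aggregateByKey combine values on each partition before the shuffle, so the 
entire group for a key never has to sit in memory at once. A rough sketch (the 
key/value types are just an illustration, not your schema):

  // pairs stands in for your (key, value) RDD
  val pairs: org.apache.spark.rdd.RDD[(String, Long)] =
    sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L)))

  // Sum per key without materializing the whole group
  val sums = pairs.reduceByKey(_ + _)

  // Average per key: accumulate (sum, count) per partition, then merge
  val avgs = pairs
    .aggregateByKey((0L, 0L))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (a, b)   => (a._1 + b._1, a._2 + b._2))
    .mapValues { case (s, c) => s.toDouble / c }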

From: "SAHA, DEBOBROTA"
Date: Wednesday, September 2, 2015 at 7:46 PM
To: "'user@spark.apache.org<mailto:'user@spark.apache.org>'"
Subject: Unable to run Group BY on Large File

Hi ,

I am getting the error below while trying to select data using Spark SQL from 
an RDD registered as a table.

java.lang.OutOfMemoryError: GC overhead limit exceeded
    "Spark Context Cleaner" java.lang.InterruptedException


The file (table) is around 113 GB and I am running Spark 1.4 on a standalone 
cluster. I tried extending the heap size, but even going to 64 GB didn’t help.
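
For reference, this is roughly how I have been sizing memory (the values are 
examples of what I tried, not a known-good setting):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  // Sketch of the memory-related settings I have been experimenting with
  val conf = new SparkConf()
    .setAppName("GroupByLargeTable")
    .set("spark.executor.memory", "64g")   // the heap size I tried extending
  // Driver memory goes through spark-submit (--driver-memory), since the
  // driver JVM is already running by the time this conf is read.
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  // More, smaller shuffle partitions for the GROUP BY (value is just an example)
  sqlContext.setConf("spark.sql.shuffle.partitions", "800")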

I would really appreciate any help on this.

Thanks,
Debobrota
