Hi Silvio,

I am trying to group the data from an Oracle RAW table by loading the raw table 
into an RDD first and then registering that as a table in Spark.
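
Roughly, the load path looks like the sketch below (using the Spark 1.4 JDBC 
DataFrame reader; the connection string, credentials, table name and column 
name are placeholders, not the real ones):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("OracleRawLoad"))
  val sqlContext = new SQLContext(sc)

  // Pull the Oracle table over JDBC into a DataFrame (placeholder URL/credentials)
  val props = new java.util.Properties()
  props.setProperty("user", "app_user")
  props.setProperty("password", "app_password")
  val rawDf = sqlContext.read.jdbc(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "RAW_TABLE", props)

  // Register it so it can be queried with Spark SQL (key_col is a placeholder column)
  rawDf.registerTempTable("raw_table")
  val grouped = sqlContext.sql(
    "SELECT key_col, COUNT(*) AS cnt FROM raw_table GROUP BY key_col")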

Thanks,
Debobrota

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Wednesday, September 02, 2015 5:03 PM
To: SAHA, DEBOBROTA; 'user@spark.apache.org'
Subject: Re: Unable to run Group BY on Large File

Unfortunately, groupBy is not the most efficient operation. What is it you’re 
trying to do? It may be possible with one of the other *byKey transformations.
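
For example, if the end result is an aggregate per key, reduceByKey or 
aggregateByKey combine values on each partition before the shuffle, so the 
entire group for a key never has to sit in memory at once. A rough sketch (the 
key/value types are just an illustration, not your schema):

  // pairs stands in for your (key, value) RDD
  val pairs: org.apache.spark.rdd.RDD[(String, Long)] =
    sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L)))

  // Sum per key without materializing the whole group
  val sums = pairs.reduceByKey(_ + _)

  // Average per key: accumulate (sum, count) per partition, then merge
  val avgs = pairs
    .aggregateByKey((0L, 0L))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (a, b)   => (a._1 + b._1, a._2 + b._2))
    .mapValues { case (s, c) => s.toDouble / c }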

From: "SAHA, DEBOBROTA"
Date: Wednesday, September 2, 2015 at 7:46 PM
To: "'user@spark.apache.org<mailto:'user@spark.apache.org>'"
Subject: Unable to run Group BY on Large File

Hi ,

I am getting the error below while trying to select data using Spark SQL from 
an RDD registered as a table.

java.lang.OutOfMemoryError: GC overhead limit exceeded
    "Spark Context Cleaner" java.lang.InterruptedException


The file (table) is around 113 GB and I am running Spark 1.4 on a standalone 
cluster. I tried extending the heap size, but even going to 64 GB didn’t help.
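
For reference, this is roughly how I have been sizing memory (the values are 
examples of what I tried, not a known-good setting):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  // Sketch of the memory-related settings I have been experimenting with
  val conf = new SparkConf()
    .setAppName("GroupByLargeTable")
    .set("spark.executor.memory", "64g")   // the heap size I tried extending
  // Driver memory goes through spark-submit (--driver-memory), since the
  // driver JVM is already running by the time this conf is read.
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  // More, smaller shuffle partitions for the GROUP BY (value is just an example)
  sqlContext.setConf("spark.sql.shuffle.partitions", "800")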

I would really appreciate any help on this.

Thanks,
Debobrota
