Re: Unable to run Group BY on Large File

2015-09-02 Thread Silvio Fiorito
Unfortunately, groupBy is not the most efficient operation. What is it you’re 
trying to do? It may be possible with one of the other *byKey transformations.
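
[Editor's note: a minimal sketch of the *byKey alternative Silvio mentions, assuming the goal is a per-key aggregation such as a sum; the sample data and app name are hypothetical.]

import org.apache.spark.{SparkConf, SparkContext}

object ByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("byKey-example"))

    // Hypothetical (key, value) pairs standing in for the real data.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // groupByKey ships every value for a key to a single executor, which
    // can exhaust memory when groups are large or skewed.
    val grouped = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines values map-side before the shuffle, so far
    // less data crosses the network or sits in memory at once.
    val reduced = pairs.reduceByKey(_ + _)

    reduced.collect().foreach(println)
    sc.stop()
  }
}

aggregateByKey works the same way when the combined result has a different type from the input values.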

From: "SAHA, DEBOBROTA"
Date: Wednesday, September 2, 2015 at 7:46 PM
To: "'user@spark.apache.org'"
Subject: Unable to run Group BY on Large File

Hi,

I am getting the error below while trying to select data using Spark SQL from an RDD registered as a table.

java.lang.OutOfMemoryError: GC overhead limit exceeded
"Spark Context Cleaner" java.lang.InterruptedException


The file or table size is around 113 GB and I am running Spark 1.4 on a standalone cluster. I tried increasing the heap size, but even 64 GB didn't help.

I would really appreciate any help on this.

Thanks,
Debobrota


Re: Unable to run Group BY on Large File

2015-09-02 Thread Raghavendra Pandey
You can increase the number of partitions and try; a sketch follows below.
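
[Editor's note: a short sketch of raising the parallelism, assuming a pair RDD; 400 is only an illustrative partition count to tune to the cluster.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object MorePartitionsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("more-partitions"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical pair RDD standing in for the real data.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Raise parallelism explicitly before the wide operation.
    val reduced = pairs.repartition(400).reduceByKey(_ + _)

    // For Spark SQL queries, shuffle parallelism is a separate setting.
    sqlContext.setConf("spark.sql.shuffle.partitions", "400")

    reduced.count()
    sc.stop()
  }
}

More partitions mean each task holds less data at a time, which makes the GC overhead limit less likely to be hit.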


RE: Unable to run Group BY on Large File

2015-09-03 Thread SAHA, DEBOBROTA
Hi Silvio,

I am trying to group the data from an Oracle RAW table by loading the raw table into an RDD first and then registering that as a table in Spark.

Thanks,
Debobrota
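
[Editor's note: a rough sketch of the approach Debobrota describes, reading the Oracle table through the JDBC data source so the GROUP BY runs as a distributed aggregation; the connection URL, credentials, table, and column names are placeholders.]

import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object OracleGroupByExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("oracle-group-by"))
    val sqlContext = new SQLContext(sc)

    // Placeholder connection details.
    val props = new Properties()
    props.setProperty("user", "scott")
    props.setProperty("password", "tiger")

    // Load the table as a partitioned DataFrame instead of a plain RDD.
    val df = sqlContext.read.jdbc(
      "jdbc:oracle:thin:@dbhost:1521:ORCL", "RAW_TABLE", props)
    df.registerTempTable("raw_table")

    // An aggregate query lets Spark combine partial results per partition,
    // rather than materializing whole groups in memory.
    val grouped = sqlContext.sql(
      "SELECT some_key, COUNT(*) AS cnt FROM raw_table GROUP BY some_key")

    grouped.show()
    sc.stop()
  }
}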
