You can increase the number of partitions and try that; a minimal sketch illustrating both suggestions follows the quoted thread below.

On Sep 3, 2015 5:33 AM, "Silvio Fiorito" <silvio.fior...@granturing.com> wrote:
> Unfortunately, groupBy is not the most efficient operation. What is it
> you’re trying to do? It may be possible with one of the other *byKey
> transformations.
>
> From: "SAHA, DEBOBROTA"
> Date: Wednesday, September 2, 2015 at 7:46 PM
> To: "'user@spark.apache.org'"
> Subject: Unable to run Group BY on Large File
>
> Hi,
>
> I am getting the below error while trying to select data using Spark SQL
> from an RDD table.
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> "Spark Context Cleaner" java.lang.InterruptedException
>
> The file or table size is around 113 GB and I am running Spark 1.4 on a
> standalone cluster. I tried to extend the heap size, but extending it to
> 64 GB also didn’t help.
>
> I would really appreciate any help on this.
>
> Thanks,
> Debobrota
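
For illustration only, here is a minimal Scala sketch of the two suggestions above: raising the shuffle parallelism and expressing the GROUP BY as a reduceByKey aggregation. The input path, delimiter, sum aggregation, and partition counts are assumptions made for this example, not details from the original job.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LargeGroupBySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("large-group-by"))
    val sqlContext = new SQLContext(sc)

    // Suggestion 1: raise shuffle parallelism so each task works on a smaller
    // slice of the ~113 GB input (the Spark 1.4 default is 200 partitions).
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000") // tune for your cluster

    // Suggestion 2: if the GROUP BY is really an aggregation, express it with
    // reduceByKey so values are combined map-side instead of being collected
    // per key in memory, which is what groupBy/groupByKey does.
    val pairs = sc.textFile("hdfs:///data/large_file")   // hypothetical input path
      .map { line =>
        val cols = line.split('|')                       // hypothetical delimiter
        (cols(0), cols(1).toDouble)                      // (group key, value)
      }

    val totals = pairs.reduceByKey(_ + _, 2000)          // explicit partition count

    totals.saveAsTextFile("hdfs:///data/group_by_totals")
    sc.stop()
  }
}

If the query genuinely needs every row per group rather than an aggregate, then more partitions plus more executor memory is about all you can do with groupBy, since all values for a single key still have to fit in memory on one task.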