RE: Unable to run Group BY on Large File

2015-09-03 Thread SAHA, DEBOBROTA
Unfortunately, groupBy is not the most efficient operation. What is it you’re trying to do? It may be possible with one of the other *byKey transformations.
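[Editorial sketch, not from the thread: a minimal illustration of the *byKey suggestion, assuming the goal is a per-key aggregation. The pair data and app name are hypothetical. reduceByKey pre-aggregates on each map task before the shuffle, so far less data crosses the network than with groupByKey.]

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("ByKeySketch"))

    // Hypothetical (key, value) pairs standing in for the real data.
    val pairs = sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L)))

    // groupByKey would ship every value for a key to one executor:
    // val grouped = pairs.groupByKey()

    // reduceByKey combines values map-side before shuffling:
    val sums = pairs.reduceByKey(_ + _)
    sums.collect().foreach(println)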

Unable to run Group BY on Large File

2015-09-02 Thread SAHA, DEBOBROTA
Hi, I am getting the error below while trying to select data with Spark SQL from an RDD-backed table. java.lang.OutOfMemoryError: GC overhead limit exceeded "Spark Context Cleaner" java.lang.InterruptedException The file or table size is around 113 GB and I am running Spark 1.4 on a
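[Editorial sketch, not from the thread: one commonly suggested mitigation for GC-overhead failures during a large GROUP BY is to raise spark.sql.shuffle.partitions so each task holds less aggregation state. This assumes Spark 1.4's SQLContext; the table and column names are placeholders.]

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Default is 200; a ~113 GB input usually needs many more,
    // smaller shuffle partitions to stay within executor heap.
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

    // "my_table" and "key_col" stand in for the poster's schema.
    val grouped = sqlContext.sql(
      "SELECT key_col, COUNT(*) AS cnt FROM my_table GROUP BY key_col")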

Array Out Of Bounds Exception

2015-08-24 Thread SAHA, DEBOBROTA
Hi, I am using Spark 1.4 and I am getting an ArrayIndexOutOfBoundsException when I try to read from a registered table in Spark. For example, if I have 3 different text files with the content below: Scenario 1: A1|B1|C1 A2|B2|C2 Scenario 2: A1| |C1 A2| |C2 Scenario 3: A1| B1| A2| B2|
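[Editorial sketch, offered as a likely cause rather than a confirmed diagnosis: Java's String.split drops trailing empty fields by default, so rows like those in Scenario 3 come back with fewer columns than expected, and indexing the missing column throws. Passing limit -1 keeps the empty trailing fields.]

    val line = "A1| B1|"              // a Scenario 3 row

    val bad  = line.split("\\|")      // Array("A1", " B1") -- length 2
    val good = line.split("\\|", -1)  // Array("A1", " B1", "") -- length 3

    // With limit -1 every row keeps its full column count, so
    // good(2) returns "" instead of throwing
    // ArrayIndexOutOfBoundsException.
    println(good(2))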

load NULL Values in RDD

2015-08-20 Thread SAHA, DEBOBROTA
Hi, can anyone help me load a column that may or may not have NULL values into an RDD? Thanks
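[Editorial sketch, not from the thread: one way to do this, assuming a pipe-delimited text file where empty fields should become SQL NULLs. The file path, column names, and all-string schema are hypothetical; the APIs shown (SQLContext, createDataFrame, registerTempTable) are the Spark 1.4 ones.]

    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val sqlContext = new SQLContext(sc)

    // split with limit -1 keeps empty trailing fields; empty strings
    // are mapped to null so Spark SQL sees real NULLs.
    val rows = sc.textFile("/path/to/data.txt").map { line =>
      val fields = line.split("\\|", -1).map(_.trim)
      Row(fields.map(f => if (f.isEmpty) null else f): _*)
    }

    val schema = StructType(Seq(
      StructField("c1", StringType, nullable = true),
      StructField("c2", StringType, nullable = true),
      StructField("c3", StringType, nullable = true)))

    sqlContext.createDataFrame(rows, schema).registerTempTable("my_table")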