Hi Naga,

Is it failing because the driver memory is full or the executor memory is full?

Can you please try setting the property spark.cleaner.ttl, so that older RDDs and their metadata also get cleared automatically?
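In PySpark that would look roughly like the snippet below, set when building the context. This is only a minimal sketch: the app name and the 3600-second TTL are placeholders, not recommendations. The TTL is in seconds and should be longer than the lifetime of any RDD you still need.

# Minimal sketch for PySpark on Spark 1.6: set spark.cleaner.ttl when
# constructing the SparkContext so old RDDs/metadata are cleaned periodically.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("csv-loader")           # hypothetical app name
        .set("spark.cleaner.ttl", "3600"))  # placeholder value, in seconds

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

The same property can also be passed on the command line, e.g. spark-submit --conf spark.cleaner.ttl=3600.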
Can you please also provide the complete error stack trace and a code snippet?

Regards,
Pralabh Kumar

On Sun, Jun 18, 2017 at 12:06 AM, Naga Guduru <gudurun...@gmail.com> wrote:
> Hi,
>
> I am trying to load a 1.6 MB Excel file which has 16 tabs. We converted the
> Excel file to CSV and loaded the 16 CSV files into 8 tables. The job ran
> successfully on the 1st run in PySpark. When running the same job a 2nd
> time, the container gets killed due to memory issues.
>
> I am calling unpersist and clearCache on all RDDs and DataFrames after each
> file is loaded into a table. The CSV files are loaded sequentially (in a for
> loop), as some of the files go to the same table. The job runs about 15
> minutes when it succeeds and 12-15 minutes when it fails. If I increase the
> driver memory and executor memory to more than 5 GB, it succeeds.
>
> My assumption is that the driver memory is full and unpersist/clearCache are
> not working.
>
> Error: 2 GB of physical memory used and 4.6 GB of virtual memory used.
>
> Spark 1.6 running on Cloudera Enterprise.
>
> Please let me know if you need any more info.
>
> Thanks
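(For reference, the sequential per-file load described above would look roughly like the following in PySpark 1.6. This is a sketch only: the file paths, table names, and the use of the spark-csv package are assumptions, not details from the original message.)

from pyspark.sql import SQLContext

# assumes sc (SparkContext) already exists, as in the snippet earlier
sqlContext = SQLContext(sc)

# hypothetical (path, table) pairs; several files can target the same table
csv_files = [("/data/tab1.csv", "table_a"),
             ("/data/tab2.csv", "table_a"),
             ("/data/tab3.csv", "table_b")]

for path, table in csv_files:
    # spark-csv package syntax for Spark 1.6 (an assumption about the setup)
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load(path))
    df.write.mode("append").saveAsTable(table)
    df.unpersist()           # release this DataFrame's cached blocks, if any
    sqlContext.clearCache()  # drop all cached tables before the next file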