Hi Naga,

Is it failing because the driver memory is full or the executor memory is full?

Can you please try setting the property spark.cleaner.ttl, so that older RDDs and their metadata also get cleared automatically?
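In PySpark that would look roughly like the snippet below, set when building the context. This is only a minimal sketch: the app name and the 3600-second TTL are placeholders, not recommendations. The TTL is in seconds and should be longer than the lifetime of any RDD you still need.

# Minimal sketch for PySpark on Spark 1.6: set spark.cleaner.ttl when
# constructing the SparkContext so old RDDs/metadata are cleaned periodically.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("csv-loader")           # hypothetical app name
        .set("spark.cleaner.ttl", "3600"))  # placeholder value, in seconds

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

The same property can also be passed on the command line, e.g. spark-submit --conf spark.cleaner.ttl=3600.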
Can you please also provide the complete error stack trace and a code snippet?

Regards,
Pralabh Kumar

On Sun, Jun 18, 2017 at 12:06 AM, Naga Guduru <gudurun...@gmail.com> wrote:
> Hi,
>
> I am trying to load a 1.6 MB Excel file which has 16 tabs. We converted the
> Excel file to CSV and loaded the 16 CSV files into 8 tables. The job ran
> successfully on the 1st run in PySpark. When running the same job a 2nd
> time, the container gets killed due to memory issues.
>
> I am calling unpersist and clearCache on all RDDs and DataFrames after each
> file is loaded into a table. The CSV files are loaded sequentially (in a for
> loop), as some of the files go to the same table. The job runs about 15
> minutes when it succeeds and 12-15 minutes when it fails. If I increase the
> driver memory and executor memory to more than 5 GB, it succeeds.
>
> My assumption is that the driver memory is full and unpersist/clearCache are
> not working.
>
> Error: 2 GB of physical memory used and 4.6 GB of virtual memory used.
>
> Spark 1.6 running on Cloudera Enterprise.
>
> Please let me know if you need any more info.
>
> Thanks
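(For reference, the sequential per-file load described above would look roughly like the following in PySpark 1.6. This is a sketch only: the file paths, table names, and the use of the spark-csv package are assumptions, not details from the original message.)

from pyspark.sql import SQLContext

# assumes sc (SparkContext) already exists, as in the snippet earlier
sqlContext = SQLContext(sc)

# hypothetical (path, table) pairs; several files can target the same table
csv_files = [("/data/tab1.csv", "table_a"),
             ("/data/tab2.csv", "table_a"),
             ("/data/tab3.csv", "table_b")]

for path, table in csv_files:
    # spark-csv package syntax for Spark 1.6 (an assumption about the setup)
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load(path))
    df.write.mode("append").saveAsTable(table)
    df.unpersist()           # release this DataFrame's cached blocks, if any
    sqlContext.clearCache()  # drop all cached tables before the next file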