Which Spark release are you using? I assume the executor crashed due to an OOME.

Did you have a chance to capture a jmap histogram on the executor before it crashed? Have you tried giving the executor more memory?
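If it helps, this is roughly what I would run on the worker node to capture it (a sketch; it assumes the executor JVM shows up in jps as CoarseGrainedExecutorBackend, which is the usual process name for a standalone-mode executor):

    # find the executor JVM's pid on the worker node
    jps -l | grep CoarseGrainedExecutorBackend

    # class histogram of live objects for that pid
    jmap -histo:live <pid> | head -30

    # or take a full heap dump for offline analysis (MAT, jhat)
    jmap -dump:live,format=b,file=/tmp/executor.hprof <pid>

Comparing two histograms taken a few minutes apart should show which classes keep accumulating.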
Thanks

On Tue, May 10, 2016 at 8:25 PM, kramer2...@126.com <kramer2...@126.com> wrote:

> I submit my code to a Spark standalone cluster and find that the memory
> usage of the executor process keeps growing, which causes the program to
> crash.
>
> I modified the code and submitted it several times, and found that the 4
> lines below may be causing the issue:
>
>     dataframe = dataframe.groupBy(['router', 'interface']) \
>                          .agg(func.sum('bits').alias('bits'))
>     windowSpec = Window.partitionBy(dataframe['router']) \
>                        .orderBy(dataframe['bits'].desc())
>     rank = func.dense_rank().over(windowSpec)
>     ret = dataframe.select(dataframe['router'], dataframe['interface'],
>                            dataframe['bits'], rank.alias('rank')) \
>                    .filter("rank<=2")
>
> It looks a little complicated, but it is just a window function on a
> dataframe. I use the HiveContext because SQLContext does not support
> window functions yet. Without these 4 lines my code can run all night;
> adding them causes the memory leak, and the program will crash in a few
> hours.
>
> I will provide the whole code (50 lines) here: ForAsk01.py
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26921/ForAsk01.py>
> Please advise me if it is a bug.
>
> Also, here is the submit command:
>
>     nohup ./bin/spark-submit \
>       --master spark://ES01:7077 \
>       --executor-memory 4G \
>       --num-executors 1 \
>       --total-executor-cores 1 \
>       --conf "spark.storage.memoryFraction=0.2" \
>       ./ForAsk.py 1>a.log 2>b.log &
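One more thing you could try, to narrow down whether the window aggregation itself is what accumulates memory: compute the top-2 interfaces per router without a window function and watch whether the executor still grows. A rough sketch, assuming the same dataframe and column names as in your snippet and an active HiveContext/SQLContext (note it keeps at most two rows per router, whereas dense_rank can keep more when there are ties on bits):

    from pyspark.sql import functions as func

    # same aggregation as in your first line
    agg = dataframe.groupBy('router', 'interface') \
                   .agg(func.sum('bits').alias('bits'))

    # top-2 per router via the RDD API instead of a window function
    top2 = (agg.rdd
               .map(lambda r: (r['router'], (r['interface'], r['bits'])))
               .groupByKey()
               .flatMap(lambda kv: [(kv[0], i, b)
                                    for i, b in
                                    sorted(kv[1], key=lambda x: -x[1])[:2]])
               .toDF(['router', 'interface', 'bits']))

groupByKey is fine here as long as the number of interfaces per router is small. If this version runs flat overnight, that points at the window function / HiveContext path rather than the aggregation.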