I submit my code to a Spark standalone cluster and find that the memory usage of the executor process keeps growing, which eventually causes the program to crash.
I modified the code and resubmitted it several times, and found that the following 4 lines may be causing the issue:

    dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
    windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
    rank = func.dense_rank().over(windowSpec)
    ret = dataframe.select(dataframe['router'],dataframe['interface'],dataframe['bits'], rank.alias('rank')).filter("rank<=2")

It looks a little complicated, but it is just a window function on a DataFrame. I use HiveContext because SQLContext does not support window functions yet.

Without these 4 lines, my code can run all night. Adding them causes the memory leak, and the program crashes within a few hours.

The whole code (50 lines) is here:
ForAsk01.py <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26921/ForAsk01.py>

Please advise me whether this is a bug.

Here is the submit command as well:

    nohup ./bin/spark-submit \
    --master spark://ES01:7077 \
    --executor-memory 4G \
    --num-executors 1 \
    --total-executor-cores 1 \
    --conf "spark.storage.memoryFraction=0.2" \
    ./ForAsk.py 1>a.log 2>b.log &
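
To show what the four DataFrame lines are meant to compute, here is a rough plain-Python sketch. It does not use Spark at all; the sample rows and the helper name top_interfaces are made up for illustration, and it only reflects my understanding of the groupBy / dense_rank / filter steps (sum bits per router and interface, rank interfaces within each router by total bits, keep ranks 1 and 2).

```python
from collections import defaultdict

# Hypothetical sample rows: (router, interface, bits)
rows = [
    ("r1", "eth0", 100),
    ("r1", "eth0", 50),
    ("r1", "eth1", 150),
    ("r1", "eth2", 30),
    ("r2", "eth0", 10),
    ("r2", "eth1", 20),
]


def top_interfaces(rows, max_rank=2):
    # Step 1: like groupBy(['router','interface']).agg(sum('bits'))
    totals = defaultdict(int)
    for router, interface, bits in rows:
        totals[(router, interface)] += bits

    # Step 2: like Window.partitionBy('router')
    per_router = defaultdict(list)
    for (router, interface), bits in totals.items():
        per_router[router].append((interface, bits))

    # Step 3: like dense_rank() over bits descending, then filter rank <= max_rank
    result = []
    for router in sorted(per_router):
        # Interface name is only a tie-breaker to make the output order stable
        entries = sorted(per_router[router], key=lambda e: (-e[1], e[0]))
        rank = 0
        previous_bits = None
        for interface, bits in entries:
            if bits != previous_bits:
                # Dense rank: equal totals share a rank, and no rank is skipped
                rank += 1
                previous_bits = bits
            if rank <= max_rank:
                result.append((router, interface, bits, rank))
    return result


ret = top_interfaces(rows)
for row in ret:
    print(row)
```

Because dense_rank gives tied totals the same rank, a router can return more than two interfaces: in the sample data, eth0 and eth1 on r1 both total 150 and share rank 1, so eth2 still gets rank 2.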