I submit my code to a Spark standalone cluster and find that the memory usage
of the executor process keeps growing, which eventually causes the program to crash.

I modified the code and resubmitted it several times, and found that the four
lines below may be causing the issue:

    dataframe = dataframe.groupBy(['router', 'interface']).agg(func.sum('bits').alias('bits'))
    windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
    rank = func.dense_rank().over(windowSpec)
    ret = dataframe.select(dataframe['router'], dataframe['interface'],
                           dataframe['bits'], rank.alias('rank')).filter("rank <= 2")

It looks a little complicated, but it is just a window function applied to a
DataFrame. I use HiveContext because SQLContext does not support window
functions yet. Without these four lines, my code can run all night; adding
them causes the memory leak, and the program crashes within a few hours.
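
For reference, here is a minimal, self-contained version of that part. This
is only a sketch: the imports, the app name, and the tiny sample DataFrame
are mine (the real job builds the DataFrame elsewhere); only the four lines
in the middle come from my actual code.

    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as func
    from pyspark.sql.window import Window

    sc = SparkContext(appName="TopInterfacesPerRouter")
    # Window functions need HiveContext in these Spark versions.
    sqlContext = HiveContext(sc)

    # Placeholder sample data standing in for the real records.
    dataframe = sqlContext.createDataFrame(
        [('r1', 'eth0', 100), ('r1', 'eth1', 200), ('r2', 'eth0', 50)],
        ['router', 'interface', 'bits'])

    # Sum traffic per (router, interface), then keep the top two
    # interfaces per router by total bits.
    dataframe = dataframe.groupBy(['router', 'interface']).agg(func.sum('bits').alias('bits'))
    windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
    rank = func.dense_rank().over(windowSpec)
    ret = dataframe.select(dataframe['router'], dataframe['interface'],
                           dataframe['bits'], rank.alias('rank')).filter("rank <= 2")
    ret.show()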

I have provided the whole code (about 50 lines) here:  ForAsk01.py
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26921/ForAsk01.py>
Please advise me whether this is a bug.

Also, here is the submit command:

    nohup ./bin/spark-submit \
    --master spark://ES01:7077 \
    --executor-memory 4G \
    --num-executors 1 \
    --total-executor-cores 1 \
    --conf "spark.storage.memoryFraction=0.2" \
    ./ForAsk.py 1>a.log 2>b.log &
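
The same settings can also be applied programmatically through SparkConf. A
sketch of the equivalent configuration (assuming Spark 1.x, where
spark.storage.memoryFraction still applies; spark.cores.max is the
standalone-mode counterpart of --total-executor-cores):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://ES01:7077")
            .set("spark.executor.memory", "4g")          # --executor-memory 4G
            .set("spark.cores.max", "1")                 # --total-executor-cores 1
            .set("spark.storage.memoryFraction", "0.2"))
    sc = SparkContext(conf=conf)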
