Re:Re:Re: Re:Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread kramer2...@126.com
Sorry, the bug link in my previous mail was wrong.


Here is the real link:


http://apache-spark-developers-list.1001551.n3.nabble.com/Re-SQL-Memory-leak-with-spark-streaming-and-spark-sql-in-spark-1-5-1-td14603.html

At 2016-05-13 09:49:05, "李明伟" <kramer2...@126.com> wrote:

It seems we hit the same issue.

There was a bug in 1.5.1 about a memory leak, but I am using 1.6.1.

Here is the link about the bug in 1.5.1:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]" 
<ml-node+s1001560n2694...@n3.nabble.com> wrote:
I read with Spark Streaming from a port. The incoming data consists of key and
value pairs. Then I call foreachRDD on each window. There I create a Dataset
from the window and run some SQL queries on it. On the result I only call show,
to see the content. It works well, but the memory usage increases, and when it
reaches the maximum nothing works anymore. When I use more memory, the program
runs somewhat longer, but the problem persists. Because I run a program which
writes to the port, I can control exactly how much data Spark has to process.
Whether I write one key and value pair every millisecond or only one per
second, the problem is the same.

When I don't create a Dataset in the foreachRDD and only count the elements in
the RDD, everything works fine. I also use groupBy and agg functions in the
queries.
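
(For reference, a minimal PySpark sketch of the pattern described above: a
socket source, a window, a DataFrame built inside foreachRDD, and a groupBy/agg
query whose result is only shown. The host, port, field names and window sizes
are placeholders, not taken from Simon's actual code.)

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext
import pyspark.sql.functions as func

sc = SparkContext(appName="StreamingMemorySketch")
ssc = StreamingContext(sc, 1)          # 1-second batches (assumed)
sqlContext = SQLContext(sc)

# Lines of the form "key,value" arriving on a TCP port (host/port assumed)
lines = ssc.socketTextStream("localhost", 9999)
pairs = lines.map(lambda l: l.split(",")) \
             .map(lambda kv: Row(key=kv[0], value=int(kv[1])))

windowed = pairs.window(60, 20)        # 60 s window, 20 s slide (assumed)

def process(rdd):
    if rdd.isEmpty():
        return
    df = sqlContext.createDataFrame(rdd)   # DataFrame created inside foreachRDD
    df.groupBy("key").agg(func.sum("value").alias("total")).show()

windowed.foreachRDD(process)

ssc.start()
ssc.awaitTermination()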


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921p26947.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread Ted Yu
The link below doesn't refer to a specific bug.

Can you send the correct link?

Thanks 

> On May 12, 2016, at 6:50 PM, "kramer2...@126.com" <kramer2...@126.com> wrote:
> 
> It seems we hit the same issue.
> 
> There was a bug in 1.5.1 about a memory leak, but I am using 1.6.1.
> 
> Here is the link about the bug in 1.5.1:
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> 
> 
> 
> 
> 
> At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]" <[hidden 
> email]> wrote:
> I read with Spark Streaming from a port. The incoming data consists of key
> and value pairs. Then I call foreachRDD on each window. There I create a
> Dataset from the window and run some SQL queries on it. On the result I only
> call show, to see the content. It works well, but the memory usage increases,
> and when it reaches the maximum nothing works anymore. When I use more
> memory, the program runs somewhat longer, but the problem persists. Because I
> run a program which writes to the port, I can control exactly how much data
> Spark has to process. Whether I write one key and value pair every
> millisecond or only one per second, the problem is the same.
> 
> When I don't create a Dataset in the foreachRDD and only count the elements
> in the RDD, everything works fine. I also use groupBy and agg functions in
> the queries.
> 
> 
> 
>  
> 
> 
> View this message in context: Re:Re: Re:Re: Will the HiveContext cause memory 
> leak ?
> Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re:Re: Re:Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread kramer2...@126.com
It seems we hit the same issue.

There was a bug in 1.5.1 about a memory leak, but I am using 1.6.1.

Here is the link about the bug in 1.5.1:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]" 
<ml-node+s1001560n2694...@n3.nabble.com> wrote:
I read with Spark Streaming from a port. The incoming data consists of key and
value pairs. Then I call foreachRDD on each window. There I create a Dataset
from the window and run some SQL queries on it. On the result I only call show,
to see the content. It works well, but the memory usage increases, and when it
reaches the maximum nothing works anymore. When I use more memory, the program
runs somewhat longer, but the problem persists. Because I run a program which
writes to the port, I can control exactly how much data Spark has to process.
Whether I write one key and value pair every millisecond or only one per
second, the problem is the same.

When I don't create a Dataset in the foreachRDD and only count the elements in
the RDD, everything works fine. I also use groupBy and agg functions in the
queries.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921p26946.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re:Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread kramer2...@126.com
Hi Simon,

Can you describe your problem in more detail?
I suspect that my problem is caused by the window function (or maybe by the
groupBy/agg functions).
If yours is the same, maybe we should report a bug.

At 2016-05-11 23:46:49, "Simon Schiff [via Apache Spark User List]" 
<ml-node+s1001560n26930...@n3.nabble.com> wrote:
I have the same problem with a Spark 2.0.0 snapshot with Streaming. There I use
Datasets instead of DataFrames. I hope you or someone will find a solution.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921p26934.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread kramer2...@126.com
Sorry, I have to make a correction again. It may still be a memory leak,
because in the end the memory usage goes up again and eventually the streaming
program crashed.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921p26933.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread kramer2...@126.com
After 8 hours the memory usage becomes stable. The top command shows it at 75%,
which means about 12 GB of memory.

But it still does not make sense, because my workload is very small.

I use Spark to do a calculation on one CSV file every 20 seconds. The size of
the CSV file is 1.3 MB.

So Spark is using almost 10,000 times more memory than my workload. Does that
mean I would need 1 TB of RAM if the workload were 100 MB?
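
(A quick back-of-the-envelope check of those figures, as a small Python
snippet; the 16 GB total RAM is an assumption inferred from "75% is about
12 GB", the other numbers are from above.)

total_ram_gb = 16.0                    # assumed host RAM (75% of it is ~12 GB)
resident_gb = 0.75 * total_ram_gb      # ~12 GB reported by top
workload_mb = 1.3                      # one CSV file processed every 20 seconds
ratio = resident_gb * 1024 / workload_mb
print(round(ratio))                    # ~9452, i.e. roughly 10,000x the input
print(100 * ratio / (1024 * 1024))     # ~0.9 TB projected for a 100 MB workload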



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921p26927.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re:Re: Will the HiveContext cause memory leak ?

2016-05-10 Thread 李明伟
Hi Ted,

Spark version: spark-1.6.0-bin-hadoop2.6
I tried increasing the memory of the executor, but I still have the same
problem.
I can use jmap to capture something, but the output is too difficult for me to
understand.

On 2016-05-11 11:50:14, "Ted Yu" <yuzhih...@gmail.com> wrote:

Which Spark release are you using ?


I assume the executor crashed due to an OOME.


Did you have a chance to capture jmap on the executor before it crashed ?


Have you tried giving more memory to the executor ?


Thanks


On Tue, May 10, 2016 at 8:25 PM, kramer2...@126.com <kramer2...@126.com> wrote:
I submit my code to a Spark standalone cluster and find that the memory usage
of the executor process keeps growing, which causes the program to crash.

I modified the code and submitted it several times, and found that the four
lines below may be causing the issue:

dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'], dataframe['interface'], dataframe['bits'],
                       rank.alias('rank')).filter("rank<=2")

It looks a little complicated, but it is just some window functions on a
DataFrame. I use the HiveContext because the SQLContext does not support window
functions yet. Without these 4 lines, my code can run all night; adding them
causes the memory leak and the program crashes in a few hours.

I have provided the whole code (50 lines) here: ForAsk01.py
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26921/ForAsk01.py>
Please advise me if it is a bug.

Also here is the submit command

nohup ./bin/spark-submit  \
--master spark://ES01:7077 \
--executor-memory 4G \
--num-executors 1 \
--total-executor-cores 1 \
--conf "spark.storage.memoryFraction=0.2"  \
./ForAsk.py 1>a.log 2>b.log &





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.






Re: Will the HiveContext cause memory leak ?

2016-05-10 Thread Ted Yu
Which Spark release are you using ?

I assume the executor crashed due to an OOME.

Did you have a chance to capture jmap on the executor before it crashed ?

Have you tried giving more memory to the executor ?

Thanks

On Tue, May 10, 2016 at 8:25 PM, kramer2...@126.com <kramer2...@126.com>
wrote:

> I submit my code to a Spark standalone cluster and find that the memory usage
> of the executor process keeps growing, which causes the program to crash.
>
> I modified the code and submitted it several times, and found that the four
> lines below may be causing the issue:
>
> dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
> windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
> rank = func.dense_rank().over(windowSpec)
> ret = dataframe.select(dataframe['router'], dataframe['interface'], dataframe['bits'],
>                        rank.alias('rank')).filter("rank<=2")
>
> It looks a little complicated, but it is just some window functions on a
> DataFrame. I use the HiveContext because the SQLContext does not support
> window functions yet. Without these 4 lines, my code can run all night;
> adding them causes the memory leak and the program crashes in a few hours.
>
> I have provided the whole code (50 lines) here: ForAsk01.py
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26921/ForAsk01.py>
> Please advise me if it is a bug.
>
> Also here is the submit command
>
> nohup ./bin/spark-submit  \
> --master spark://ES01:7077 \
> --executor-memory 4G \
> --num-executors 1 \
> --total-executor-cores 1 \
> --conf "spark.storage.memoryFraction=0.2"  \
> ./ForAsk.py 1>a.log 2>b.log &
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


Will the HiveContext cause memory leak ?

2016-05-10 Thread kramer2...@126.com
I submit my code to a Spark standalone cluster and find that the memory usage
of the executor process keeps growing, which causes the program to crash.

I modified the code and submitted it several times, and found that the four
lines below may be causing the issue:

dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'], dataframe['interface'], dataframe['bits'],
                       rank.alias('rank')).filter("rank<=2")

It looks a little complicated, but it is just some window functions on a
DataFrame. I use the HiveContext because the SQLContext does not support window
functions yet. Without these 4 lines, my code can run all night; adding them
causes the memory leak and the program crashes in a few hours.

I have provided the whole code (50 lines) here: ForAsk01.py
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26921/ForAsk01.py>
Please advise me if it is a bug.
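
(Since ForAsk01.py itself is not reproduced in this thread, here is a minimal,
self-contained sketch of the setup those 4 lines need, i.e. a HiveContext for
the window function. The CSV path and the router/interface/bits schema are
assumptions for illustration, not the original script.)

from pyspark import SparkContext
from pyspark.sql import HiveContext, Row
from pyspark.sql.window import Window
import pyspark.sql.functions as func

sc = SparkContext(appName="TopInterfacesPerRouter")
sqlContext = HiveContext(sc)   # per the mail above, SQLContext lacked window function support

# Assumed CSV layout: router,interface,bits
rows = sc.textFile("/path/to/input.csv") \
         .map(lambda line: line.split(",")) \
         .map(lambda f: Row(router=f[0], interface=f[1], bits=int(f[2])))
dataframe = sqlContext.createDataFrame(rows)

# The 4 lines from above: sum bits per (router, interface), then keep the
# top-2 interfaces per router by dense rank.
dataframe = dataframe.groupBy(['router', 'interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'], dataframe['interface'], dataframe['bits'],
                       rank.alias('rank')).filter("rank<=2")

ret.show()
sc.stop()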

Also here is the submit command 

nohup ./bin/spark-submit  \  
--master spark://ES01:7077 \
--executor-memory 4G \
--num-executors 1 \
--total-executor-cores 1 \
--conf "spark.storage.memoryFraction=0.2"  \
./ForAsk.py 1>a.log 2>b.log &





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Will-the-HiveContext-cause-memory-leak-tp26921.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
