Re: Re: Re: Re: Re: Re: Re: How big the spark stream window could be?

2016-05-10 Thread
http://talebzadehmich.wordpress.com On 11 May 2016 at 03:01, 李明伟 <kramer2...@126.com> wrote: Hi Mich, From the ps command I can see four processes: 10409 is the master and 10603 is the worker; 12420 is the driver program and 12578 should be the executor (worker). Am I right?

Re: Re: Will the HiveContext cause memory leak?

2016-05-10 Thread
Hi Ted, Spark version: spark-1.6.0-bin-hadoop2.6. I tried increasing the executor memory; still have the same problem. I can use jmap to capture something, but the output is too difficult to understand. At 2016-05-11 11:50:14, "Ted Yu" wrote: Which Spark release…

Re: Re: Re: Re: Re: Re: How big the spark stream window could be?

2016-05-10 Thread
http://talebzadehmich.wordpress.com On 11 May 2016 at 01:22, 李明伟 <kramer2...@126.com> wrote: I actually provided them in the submit command here: nohup ./bin/spark-submit --master spark://ES01:7077 --executor-memory 4G --num-executors 1 --total-executor-cores 1 --conf "spark.storage.memoryFraction…
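
A minimal sketch of the same settings expressed as a SparkConf, for reference. The master URL and memory flags come from the submit command quoted above; the spark.storage.memoryFraction value and the app name are placeholders, since the original message is truncated. Note that --num-executors is a YARN-only flag; on a standalone master, spark.cores.max is what caps the cores.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("window-test")                   // hypothetical app name
      .setMaster("spark://ES01:7077")
      .set("spark.executor.memory", "4g")          // --executor-memory 4G
      .set("spark.cores.max", "1")                 // --total-executor-cores 1
      .set("spark.storage.memoryFraction", "0.4")  // placeholder value; real value truncated above
    val sc = new SparkContext(conf)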

Re: Re: Re: Re: Re: How big the spark stream window could be?

2016-05-10 Thread
Mich Talebzadeh, LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 10 May 2016 at 03:12, 李明伟 <kramer2...@126.com> wrote: Hi Mich, I added some more info (the spark-env.sh settings and the top command…

Re: Re: Re: Re: How big the spark stream window could be?

2016-05-09 Thread
http://talebzadehmich.wordpress.com On 9 May 2016 at 16:19, 李明伟 <kramer2...@126.com> wrote: Thanks for all the information, guys. I wrote some code to do the test, not using a window, so it only calculates data for each batch interval. I set the interval to 30 seconds and also reduced the size of…
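
A minimal sketch of the test described above: a 30-second batch interval and no window, so each batch is computed on its own and nothing accumulates across batches. The socket source and the per-batch count are assumptions; the original test code is not shown in the thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("interval-only-test")
    val ssc = new StreamingContext(conf, Seconds(30))   // 30-second batch interval

    val lines = ssc.socketTextStream("localhost", 9999) // assumed source
    lines.count().print()                               // per-batch result only; no state kept

    ssc.start()
    ssc.awaitTermination()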

Re: Re: Re: How big the spark stream window could be?

2016-05-09 Thread
…stored in memory is that a streaming source is not a persistent source, so you need to have a place to store the data. On Mon, May 9, 2016 at 4:43 PM, 李明伟 <kramer2...@126.com> wrote: Thanks. What if I use batch calculation instead of stream computing? Do I still need that much memory? For example, if the 24-hour…

Re: Re: How big the spark stream window could be?

2016-05-09 Thread
Thanks. What if I use batch calculation instead of stream computing? Do I still need that much memory? For example, if the 24-hour data set is 100 GB, do I also need 100 GB of RAM to do the one-time batch calculation? At 2016-05-09 15:14:47, "Saisai Shao" wrote:
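
On the question above: a batch job does not need RAM equal to the input size, because Spark reads the input partition by partition and can spill shuffle data to disk (more memory mainly means less spilling). A minimal sketch of such a one-shot daily job; the paths and the CSV key field are assumptions for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("daily-report"))

    val counts = sc.textFile("hdfs:///data/2016-05-09/*")  // hypothetical input path
      .map(line => (line.split(",")(0), 1L))               // assumed key column
      .reduceByKey(_ + _)                                  // aggregates without caching the input

    counts.saveAsTextFile("hdfs:///reports/2016-05-09")    // hypothetical output path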

Re: Re: How big the spark stream window could be?

2016-05-09 Thread
Thanks Mich, I guess I did not make my question clear enough. I know terms like interval and window, and I also know how to use them. The problem is that in my case I need to set the window to cover data for 24 hours or 1 hour. I am not sure if it is a good way, because the window is just too…
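
For reference, a minimal sketch of the window being questioned: 24 hours of data, recomputed every 5 minutes. Both durations must be multiples of the batch interval; the source is an assumption. A window this wide forces Spark Streaming to retain 24 hours of blocks, which is exactly the concern raised in this thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("wide-window"), Minutes(5))

    val events = ssc.socketTextStream("localhost", 9999)   // assumed source
    val last24h = events.window(Minutes(24 * 60), Minutes(5)) // 24h window, 5-minute slide
    last24h.count().print()

    ssc.start()
    ssc.awaitTermination()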

Re: Re: Re: Re: Why Spark having OutOfMemory Exception?

2016-04-20 Thread
>>> But my way is to set up a forever loop to handle continually incoming data. >>> Not sure if it is the right way to use Spark. Not sure what this means; do you use Spark Streaming, or run a batch job in the forever loop? On Wed, Apr 20, 2016 at 3:55 PM, 李明伟 <kramer2...@126.com>…
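
A minimal sketch of the "forever loop" pattern being asked about: one long-lived SparkContext running a small batch job at a fixed interval. The path, interval, and per-pass work are assumptions; as the reply suggests, Spark Streaming is usually the more idiomatic way to express this.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("loop-batch"))

    while (true) {
      val input = sc.textFile("hdfs:///incoming/*")    // hypothetical input path
      println(s"records this pass: ${input.count()}")  // placeholder per-pass work
      Thread.sleep(5 * 60 * 1000)                      // wait 5 minutes between passes
    }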

Re: Re: Re: Why Spark having OutOfMemory Exception?

2016-04-20 Thread
…spark.driver.memory and spark.driver.maxResultSize On Tue, Apr 19, 2016 at 4:06 PM, 李明伟 <kramer2...@126.com> wrote: Hi Zhan Zhang, Please see the exception trace below. It is reporting a GC overhead limit error. I am not a Java or Scala developer, so it is hard for me to understand this information. Also, reading coredump…
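
A sketch of the two settings suggested above. spark.driver.maxResultSize caps the total size of results brought back to the driver (e.g. by collect()) and can be set on the conf; spark.driver.memory only takes effect if set before the driver JVM starts, i.e. via spark-defaults.conf or the submit command, not in code. The value shown is an example, not from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("driver-limits")
      .set("spark.driver.maxResultSize", "2g") // example value; raise if collect() results are large
      // spark.driver.memory must be passed via spark-submit or spark-defaults.conf,
      // because the driver JVM heap is already fixed by the time this code runs.
    val sc = new SparkContext(conf)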

Re: Re: Why very small work load cause GC overhead limit?

2016-04-19 Thread
The memory parameters: --executor-memory 8G --driver-memory 4G. Please note that the data size is very small; the total size of the data is less than 10 MB. As for jmap, it is a little hard for me to do so. I am not a Java developer; I will google jmap first. Thanks. Regards, Mingwei

Re: Re: Why Spark having OutOfMemory Exception?

2016-04-19 Thread
…You can use a coredump to find what caused the OOM. Thanks. Zhan Zhang On Apr 18, 2016, at 9:44 PM, 李明伟 <kramer2...@126.com> wrote: Hi Samaga, Thanks very much for your reply and sorry for the delayed reply. Cassandra or Hive is a good suggestion. However, in my situation I am not…

Re: RE: Why Spark having OutOfMemory Exception?

2016-04-18 Thread
Hi Samaga, Thanks very much for your reply and sorry for the delayed reply. Cassandra or Hive is a good suggestion; however, in my situation I am not sure it makes sense. My requirement is to get the most recent 24 hours of data to generate a report, and the frequency is every 5 minutes. So if I use…
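
One way to serve a rolling 24-hour report every 5 minutes without rescanning the full window each time is reduceByKeyAndWindow with an inverse function, which adds the newest batch and subtracts the expired one incrementally. A minimal sketch; checkpointing is mandatory for this operator, and the source, key field, and paths are assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("rolling-24h"), Minutes(5))
    ssc.checkpoint("hdfs:///checkpoints/rolling-24h")    // hypothetical path

    val pairs = ssc.socketTextStream("localhost", 9999)  // assumed source
      .map(line => (line.split(",")(0), 1L))             // assumed key field

    val rolling = pairs.reduceByKeyAndWindow(
      (a: Long, b: Long) => a + b,  // values entering the window
      (a: Long, b: Long) => a - b,  // values leaving the window
      Minutes(24 * 60),             // window length: 24 hours
      Minutes(5))                   // slide: refresh every 5 minutes

    rolling.print()
    ssc.start()
    ssc.awaitTermination()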

Re: Re: How to design the input source of spark stream

2016-03-31 Thread
Hi Anthony, Thanks. You are right, the API will read all the files; no need to merge. At 2016-03-31 20:09:25, "Femi Anthony" wrote: Also, ssc.textFileStream(dataDir) will read all the files from a directory, so as far as I can see there's no need to merge the files. Just…
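
A minimal sketch of the point made above: textFileStream watches a directory and turns every file that newly appears there into input, so there is no need to pre-merge small files. The directory and batch interval are assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("dir-stream"), Seconds(30))

    val lines = ssc.textFileStream("hdfs:///incoming/")  // each new file in the directory joins the batch
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()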