Re: Spark Streaming Job Keeps growing memory over time

2016-08-09 Thread Aasish Kumar
Hi Sandeep,

I have not enabled checkpointing. I will try enabling checkpointing and
observe the memory pattern. But how does checkpointing correlate with the
memory growth? I don't know much about checkpointing.


Thanks and regards,

Aashish Kumar

Software Engineer

Avekshaa Technologies (P) Ltd. | www.avekshaa.com

+91 -9164495083

Performance Excellence Assured

*Deloitte Technology Fast 50 India *|* Technology Fast 500 APAC 2014*

*NASSCOM* Emerge 50, 2013
*Express IT Awards *- IT Innovation: Winner (silver) 2015

*Every 3,000 sheets of A4 paper cost one tree. Please do not print unless
you really need it; save the environment and energy.*

On Tue, Aug 9, 2016 at 5:30 PM, Sandeep Nemuri  wrote:

> Hi Aashish,
>
> Do you have checkpointing enabled? If not, can you try enabling
> checkpointing and observing the memory pattern?
>
> Thanks,
> Sandeep
>
>
>
> --
> *  Regards*
> *  Sandeep Nemuri*
>


Re: Spark Streaming Job Keeps growing memory over time

2016-08-09 Thread Sandeep Nemuri
Hi Aashish,

Do you have checkpointing enabled? If not, can you try enabling
checkpointing and observing the memory pattern?

Thanks,
Sandeep
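[Editor's note: a minimal sketch of what enabling checkpointing could look like with the Spark 1.6 Streaming API. The app name, batch interval, and checkpoint directory below are hypothetical, not the poster's actual code; in a real deployment the checkpoint directory should live on HDFS or another fault-tolerant shared filesystem.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical checkpoint location -- use HDFS or shared storage in production.
val checkpointDir = "/tmp/spark-checkpoints"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-to-mongo")
  val ssc = new StreamingContext(conf, Seconds(60)) // 1-minute batches
  // ... build the direct Kafka stream and aggregations here ...
  ssc.checkpoint(checkpointDir)
  ssc
}

// Recover from an existing checkpoint if present, otherwise build a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

With this pattern, stateful operations and the Kafka offsets are written to the checkpoint directory, which also lets the Spark memory-management machinery discard old metadata and RDD lineage instead of accumulating it.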



-- 
*  Regards*
*  Sandeep Nemuri*


Re: Spark Streaming Job Keeps growing memory over time

2016-08-09 Thread Mich Talebzadeh
Hi Aashish,

You are running in standalone mode on a single node.

As I read it, you start the master and 5 workers come up because of
SPARK_WORKER_INSTANCES=5. I gather you use start-slaves.sh?

Now, that is just the number of workers; with no load on them, port 8080
should show practically no memory used (idle). Also, every worker has been
allocated 1 core (SPARK_WORKER_CORES=1).

Now it all depends on how you invoke your spark-submit job and what
parameters you pass to it.

${SPARK_HOME}/bin/spark-submit \
--driver-memory 1G \
--num-executors 2 \
--executor-cores 1 \
--executor-memory 1G \
--master spark://:7077 \

What are your parameters here? In my experience, standalone mode has a mind
of its own and does not always follow what you have asked for.

If you increase the number of cores per worker, you may reduce the memory
pressure, because multiple tasks can then effectively run on subsets of your
data.
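[Editor's note: for example, a hypothetical spark-env.sh fragment along these lines would trade the five single-core workers for two workers with more cores and an explicit memory cap. The exact numbers are illustrative, not a recommendation.]

```shell
# Hypothetical spark-env.sh fragment: two workers with 4 cores and 4 GB each,
# instead of five single-core workers.
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=4g
```

Capping SPARK_WORKER_MEMORY also makes it easier to see on port 8080 whether an executor is genuinely leaking memory or simply growing up to its allowed heap.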

HTH

P.S. I don't use SPARK_MASTER_OPTS


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Spark Streaming Job Keeps growing memory over time

2016-08-09 Thread aasish.kumar
Hi,

I am running Spark v1.6.1 on a single machine in standalone mode, with 64 GB
RAM and 16 cores.

I have created five worker instances in order to get five executors, since
in standalone mode there cannot be more than one executor per worker node.

*Configuration*:

SPARK_WORKER_INSTANCES=5
SPARK_WORKER_CORES=1
SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=5"

All other configuration is left at the defaults in spark-env.sh.

I am running a Spark Streaming direct Kafka job with a 1-minute batch
interval, which takes data from Kafka and, after some aggregation, writes
the results to MongoDB.

*Problems:*

> When I start the master and slaves, it starts one master process and five
> worker processes, each consuming only about 212 MB of RAM. When I submit
> the job, it also creates 5 executor processes and 1 driver process, and
> total memory usage grows to 8 GB and keeps growing slowly over time, even
> when there is no data to process.

I am also unpersisting the cached RDDs at the end, and I have set
spark.cleaner.ttl to 600, but memory still keeps growing.
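[Editor's note: as a concrete illustration of the cache-and-unpersist pattern per batch, here is a minimal sketch using the Spark 1.6 Kafka 0.8 direct API. The broker address, topic name, aggregation, and the MongoDB sink are hypothetical placeholders, not the poster's actual code.]

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("kafka-to-mongo")
val ssc = new StreamingContext(conf, Seconds(60)) // 1-minute batches

// Hypothetical broker list and topic.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

stream.foreachRDD { rdd =>
  // Cache only for the duration of this batch, then release explicitly.
  val aggregated = rdd.map { case (_, v) => (v, 1L) }.reduceByKey(_ + _).cache()
  // writeToMongo(aggregated)  // hypothetical sink
  aggregated.unpersist(blocking = true) // free block-manager memory immediately
}
```

Calling `unpersist(blocking = true)` inside the same `foreachRDD` that cached the RDD ensures the blocks are dropped before the next batch starts, rather than waiting for `spark.cleaner.ttl` or garbage collection.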

> One more thing: I have seen that SPARK-1706 has been merged, so why am I
> unable to create multiple executors within a single worker? Also, in the
> spark-env.sh file, every setting related to executors is documented as
> applying in YARN mode only.

I have also tried running the example programs, but I see the same problem.

Any help would be greatly appreciated.

Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Job-Keeps-growing-memory-over-time-tp27498.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org