Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
Try to figure out what the env vars and arguments of the worker JVM and
Python process are. Maybe you'll get a clue.
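
For example, here is a minimal sketch of one way to do that inspection
(Linux-only, assuming /proc is readable and that the daemon command lines
contain "pyspark.daemon"); the same idea applies to the Java worker and
executor processes by matching on their command lines instead:

    import os

    def read_null_separated(path):
        # /proc/<pid>/cmdline and /proc/<pid>/environ are NUL-separated
        with open(path, "rb") as f:
            return [p.decode("utf-8", "replace")
                    for p in f.read().split(b"\0") if p]

    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            cmdline = read_null_separated("/proc/%s/cmdline" % pid)
        except (IOError, OSError):
            continue  # process exited or is not readable
        if any("pyspark.daemon" in part for part in cmdline):
            print("PID %s: %s" % (pid, " ".join(cmdline)))
            try:
                for var in read_null_separated("/proc/%s/environ" % pid):
                    if "SPARK" in var or "CORES" in var or "PYTHON" in var:
                        print("    " + var)
            except (IOError, OSError):
                pass  # environ of other users' processes is often unreadable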

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Thanks. I'll try that. Hopefully that should work.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
I started with a download of 1.6.0. These days, we use a self-compiled
1.6.2.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
I am trying to think of possible reasons why this could be happening. If
the cores are multi-threaded, should that affect the daemons? Was your Spark
built from source or downloaded as a binary? Though that should not
technically change anything.
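
One way to check the multi-threading angle is to compare the logical CPU
count (which is what a per-core daemon count would track) against the number
of physical cores. A rough sketch, assuming Linux and a parseable
/proc/cpuinfo:

    import multiprocessing

    logical = multiprocessing.cpu_count()

    # Count unique (physical id, core id) pairs; with hyper-threading,
    # several logical CPUs share the same pair.
    physical = set()
    phys_id = core_id = None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("physical id"):
                phys_id = line.split(":", 1)[1].strip()
            elif line.startswith("core id"):
                core_id = line.split(":", 1)[1].strip()
            elif not line.strip():
                if phys_id is not None and core_id is not None:
                    physical.add((phys_id, core_id))
                phys_id = core_id = None
    if phys_id is not None and core_id is not None:
        physical.add((phys_id, core_id))

    print("logical CPUs  : %d" % logical)
    print("physical cores: %d" % (len(physical) or logical))

If the two numbers differ, that would at least tell you whether the daemon
count follows the logical or the physical figure.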

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
1.6.1.

I have no idea. SPARK_WORKER_CORES should do the same.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Which version of Spark are you using? 1.6.1?

Any ideas as to why it is not working in ours?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
16.

On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav wrote:

> When you said it helped you and limited it to 2 processes in your
> cluster, how many cores did each machine have?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Hi,

I tried what you suggested and started the slave using the following
command:

start-slave.sh --cores 1 

But it still seems to start as many pyspark daemons as the number of cores
in the node (1 parent and 3 workers). Limiting it via the spark-env.sh file
by setting SPARK_WORKER_CORES=1 also didn't help.

When you said it helped you and limited it to 2 processes in your cluster,
how many cores did each machine have?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
It depends on what you want to do:

If, on any given server, you don't want Spark to use more than one core,
use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1

If you have a bunch of servers dedicated to Spark, but you don't want a
driver to use more than one core per server, then spark.executor.cores=1
tells it not to use more than 1 core per server. However, it seems it will
still start as many pyspark daemons as there are cores, but may not actually
use them.
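
As a minimal sketch of that application-side option (assuming standalone
mode and PySpark 1.6.x; the app name and the spark.cores.max value are made
up for illustration):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("limit-cores-example")   # hypothetical app name
            .set("spark.executor.cores", "1")    # at most 1 core per executor
            .set("spark.cores.max", "4"))        # total cores for this app (standalone)

    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(100)).sum())
    sc.stop()

The --cores option, by contrast, is an argument to start-slave.sh on each
worker node, so it cannot be set from inside the application.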

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Hi Mathieu,

Isn't that the same as setting "spark.executor.cores" to 1? And how can I
specify "--cores=1" from the application?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
When running the executor, put --cores=1. We use this, and I only see 2
pyspark processes; one seems to be the parent of the other and is idle.

In your case, are all the pyspark processes working?

On Mon, Jul 4, 2016 at 3:15 AM ar7  wrote:

> Hi,
>
> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
> application is run, the load on the workers seems to go higher than what
> was allocated. When I ran top, I noticed that there were too many
> Pyspark.daemons processes running. There was another mail thread regarding
> the same:
>
> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E
>
> I followed what was mentioned there, i.e. reduced the number of executor
> cores and the number of executors on one node to 1. But the number of
> pyspark.daemons processes is still not coming down. It looks like there is
> initially one Pyspark.daemons process, and this in turn spawns as many
> pyspark.daemons processes as there are cores in the machine.
>
> Any help is appreciated :)
>
> Thanks,
> Ashwin Raaghav.