Re: Limiting Pyspark.daemons
Try to figure out what the env vars and arguments of the worker JVM and the Python processes are. Maybe you'll get a clue.

On Mon, Jul 4, 2016 at 11:42 AM Mathieu Longtin wrote:
> I started with a download of 1.6.0. These days, we use a self-compiled 1.6.2.
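On Linux, that inspection might look like the sketch below. The pyspark.daemon process name is taken from the reports earlier in the thread; the fallback to the current shell's PID is only there so the /proc commands can be demonstrated on any box, even without Spark running.

```shell
# Find a pyspark.daemon worker; fall back to the current shell so the
# /proc inspection below works even when no daemon is running.
pid=$(pgrep -f pyspark.daemon | head -n1)
pid=${pid:-$$}

# Arguments the process was started with (NUL-separated in /proc).
tr '\0' ' ' < "/proc/$pid/cmdline"; echo

# Environment variables, filtered to Spark/Python-related ones
# (readable only for your own processes, or as root).
tr '\0' '\n' < "/proc/$pid/environ" | grep -iE 'spark|python' || true
```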
Re: Limiting Pyspark.daemons
Thanks. I'll try that. Hopefully that should work.

On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin wrote:
> I started with a download of 1.6.0. These days, we use a self-compiled 1.6.2.
Re: Limiting Pyspark.daemons
I started with a download of 1.6.0. These days, we use a self-compiled 1.6.2.

On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav wrote:
> If the cores are multi-threaded, should that affect the daemons? Was your
> Spark built from source or downloaded as a binary?

--
Mathieu Longtin
1-514-803-8977
Re: Limiting Pyspark.daemons
I am trying to think of possibilities as to why this could be happening. If the cores are multi-threaded, should that affect the daemons? Was your Spark built from source or downloaded as a binary? Though that shouldn't technically change anything.

On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin wrote:
> 1.6.1.
>
> I have no idea. SPARK_WORKER_CORES should do the same.

--
Regards,
Ashwin Raaghav
Re: Limiting Pyspark.daemons
1.6.1.

I have no idea. SPARK_WORKER_CORES should do the same.

On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav wrote:
> Which version of Spark are you using? 1.6.1?
>
> Any ideas as to why it is not working in ours?

--
Mathieu Longtin
1-514-803-8977
Re: Limiting Pyspark.daemons
Which version of Spark are you using? 1.6.1?

Any ideas as to why it is not working in ours?

On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin wrote:
> 16.

--
Regards,
Ashwin Raaghav
Re: Limiting Pyspark.daemons
16.

On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav wrote:
> When you said it helped you and limited it to 2 processes in your cluster,
> how many cores did each machine have?

--
Mathieu Longtin
1-514-803-8977
Re: Limiting Pyspark.daemons
Hi,

I tried what you suggested and started the slave using the following command:

start-slave.sh --cores 1

But it still seems to start as many pyspark daemons as there are cores in the node (1 parent and 3 workers). Limiting it via the spark-env.sh file by setting SPARK_WORKER_CORES=1 also didn't help.

When you said it helped you and limited it to 2 processes in your cluster, how many cores did each machine have?

On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin wrote:
> It depends on what you want to do: [...]

--
Regards,
Ashwin Raaghav
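For reference, a sketch of the spark-env.sh approach described above (standalone mode; the file sits in SPARK_HOME/conf on each worker node and is sourced by the start scripts; values here are illustrative):

```shell
# SPARK_HOME/conf/spark-env.sh
SPARK_WORKER_CORES=1       # total cores this worker offers to applications
SPARK_WORKER_INSTANCES=1   # one worker process per node (the default)
```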
Re: Limiting Pyspark.daemons
It depends on what you want to do:

If, on any given server, you don't want Spark to use more than one core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1

If you have a bunch of servers dedicated to Spark, but you don't want a driver to use more than one core per server, then spark.executor.cores=1 tells it not to use more than 1 core per server. However, it seems it will start as many pyspark daemons as there are cores, but maybe not use them.

On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav wrote:
> Isn't that the same as setting "spark.executor.cores" to 1? And how can I
> specify "--cores=1" from the application?

--
Mathieu Longtin
1-514-803-8977
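As commands, the two options above might look like this sketch (the master URL and application file name are placeholders; in Spark 1.6 start-slave.sh takes the master URL as its argument):

```shell
# Option 1: cap the worker itself, so no application can get more than
# one core on this server (master URL is a placeholder):
$SPARK_HOME/sbin/start-slave.sh --cores 1 spark://master-host:7077

# Option 2: leave the workers alone, but cap each application at one
# core per executor (app file name is a placeholder):
spark-submit --conf spark.executor.cores=1 my_app.py
```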
Re: Limiting Pyspark.daemons
Hi Mathieu,

Isn't that the same as setting "spark.executor.cores" to 1? And how can I specify "--cores=1" from the application?

On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin wrote:
> When running the executor, put --cores=1. We use this and I only see 2
> pyspark processes; one seems to be the parent of the other and is idle.

--
Regards,
Ashwin Raaghav
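One way to apply this per application without touching the worker setup is to pass the properties at submit time, as in the sketch below (the app file name is a placeholder; spark.cores.max applies in standalone mode):

```shell
# Cap each executor at one core, and (standalone mode) cap the
# application's total cores across the cluster as well.
spark-submit \
  --conf spark.executor.cores=1 \
  --conf spark.cores.max=1 \
  my_app.py
```

Setting the same properties on the application's SparkConf before the context is created would be the in-code equivalent.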
Re: Limiting Pyspark.daemons
When running the executor, put --cores=1. We use this and I only see 2 pyspark processes; one seems to be the parent of the other and is idle.

In your case, are all pyspark processes working?

On Mon, Jul 4, 2016 at 3:15 AM ar7 wrote:
> I am currently using PySpark 1.6.1 in my cluster. [...] It looks like
> initially there is one pyspark.daemons process and this in turn spawns as
> many pyspark.daemons processes as the number of cores in the machine.

--
Mathieu Longtin
1-514-803-8977
Limiting Pyspark.daemons
Hi,

I am currently using PySpark 1.6.1 in my cluster. When a pyspark application is run, the load on the workers seems to go higher than what was allocated. When I ran top, I noticed that there were too many pyspark.daemons processes running. There was another mail thread regarding the same:

https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E

I followed what was mentioned there, i.e. reduced the number of executor cores and the number of executors on one node to 1. But the number of pyspark.daemons processes is still not coming down. It looks like initially there is one pyspark.daemons process and this in turn spawns as many pyspark.daemons processes as there are cores in the machine.

Any help is appreciated :)

Thanks,
Ashwin Raaghav.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
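A quick way to watch the symptom described above is to count the daemons directly (a sketch; pgrep -f matches the full command line, and the process name is taken from the top output described in the report):

```shell
# Count processes whose full command line mentions pyspark.daemon.
# pgrep -c prints the count; a miss exits non-zero, hence the fallback.
count=$(pgrep -cf pyspark.daemon || true)
echo "pyspark.daemon processes: ${count:-0}"
```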