Re: configure to run multiple tasks on a core

2014-11-26 Thread Sean Owen
What about running, say, 2 executors per machine, each of which thinks
it should use all cores?

You can also multi-thread your map function manually, within your own code,
with careful use of a java.util.concurrent.Executor.
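
For instance, here is a rough sketch of that second option in Scala, using
mapPartitions with a small fixed-size thread pool. inputRdd, runExternalApp,
and the thread count are just placeholders for your own job, not anything
from this thread:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical: inputRdd is your RDD of work items, and runExternalApp(item)
// shells out to the C++ app and returns its result.
val threadsPerTask = 4  // assumption: tune to how I/O-bound the app is

val results = inputRdd.mapPartitions { items =>
  // One small thread pool per Spark task (i.e. per partition).
  val pool = Executors.newFixedThreadPool(threadsPerTask)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    // Start one future per item so several copies of the external app
    // run at once inside this single task...
    val futures = items.map(item => Future(runExternalApp(item))).toList
    // ...then block until they all finish before the task completes.
    futures.map(f => Await.result(f, Duration.Inf)).iterator
  } finally {
    pool.shutdown()
  }
}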



RE: configure to run multiple tasks on a core

2014-11-26 Thread Yotto Koga
Thanks Sean. That worked out well.

For anyone who happens onto this post and wants to do the same, these are the
steps I took to follow Sean's suggestion...

(Note: this is for a standalone cluster.)

Log in to the master, then:

~/spark/sbin/stop-all.sh

Edit ~/spark/conf/spark-env.sh and change the line

export SPARK_WORKER_INSTANCES=1

to the number of worker instances you want per node (e.g. 2).

I also added

export SPARK_WORKER_MEMORY=<some reasonable value>

so that the total memory used by the workers on a node stays within the
memory available on that node (e.g. 2g).

~/spark-ec2/copy-dir /root/spark/conf

~/spark/sbin/start-all.sh
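
Putting the spark-env.sh edits together, the relevant lines end up looking
something like this (2 and 2g are just the example values from above; pick
whatever fits your nodes):

export SPARK_WORKER_INSTANCES=2   # two workers per node
export SPARK_WORKER_MEMORY=2g     # per worker, so 2 x 2g in total per node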



Re: configure to run multiple tasks on a core

2014-11-26 Thread Matei Zaharia
Instead of SPARK_WORKER_INSTANCES you can also set SPARK_WORKER_CORES, to have 
one worker that thinks it has more cores.
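
For example (illustrative numbers, not from this thread): on an 8-core node,
putting

export SPARK_WORKER_CORES=16

in spark-env.sh makes the worker advertise 16 task slots, so with the default
spark.task.cpus=1 up to 16 of your tasks can run at once on those 8 cores.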

Matei



RE: configure to run multiple tasks on a core

2014-11-26 Thread Yotto Koga
Indeed. That's nice.

Thanks!

yotto



configure to run multiple tasks on a core

2014-11-25 Thread yotto
I'm running a spark-ec2 cluster.

I have a map task that calls a specialized C++ external app. The app doesn't
fully utilize its core because it needs to download/upload data as part of the
task. Looking at the worker nodes, it appears that there is one task (running
my app) per core.

I'd like to make better use of the CPU resources, in the hope of increasing
throughput, by running multiple tasks (with my app) per core in parallel.

I see there is a spark.task.cpus config setting with a default value of 1.
It appears, though, that this setting goes in the opposite direction from what
I am looking for: it assigns multiple cores to a single task.
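
(For what it's worth, that other direction would be something like running
spark-submit with --conf spark.task.cpus=2, which reserves two cores for every
task.)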

Is there a way to specify multiple tasks per core rather than multiple cores
per task?

Thanks for any help.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/configure-to-run-multiple-tasks-on-a-core-tp19834.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
