Why does this simple Spark program use only one core?

2014-11-09 Thread ReticulatedPython
So, I'm running this simple program on a 16-core system. I run it by issuing
the following command:

spark-submit --master local[*] pi.py

And the code of that program is below. When I use top to watch CPU
consumption, only one core is being utilized. Why is that? Secondly, the Spark
documentation says that the default parallelism is set by the property
spark.default.parallelism. How can I read this property from within my
Python program?

#pi.py
from pyspark import SparkContext
import random

NUM_SAMPLES = 1250

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)



Re: Why does this simple Spark program use only one core?

2014-11-09 Thread Akhil Das
You can set the following entry inside the conf/spark-defaults.conf file:

spark.cores.max 16


If you want to read the default value, you can use the following API call:

sc.defaultParallelism

where sc is your SparkContext object.
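
For illustration only, here is a minimal sketch of reading both values from
inside a PySpark program. The script name and the fallback string are made up
for the example, and it assumes sc.getConf() is available in your Spark
version:

#check_parallelism.py -- illustrative sketch, not from the original code
from pyspark import SparkContext

sc = SparkContext("local[*]", "Parallelism check")

# Default number of partitions Spark uses for operations like parallelize()
print sc.defaultParallelism

# Raw value of spark.default.parallelism, if it was set explicitly;
# the second argument is a fallback when the property is absent
print sc.getConf().get("spark.default.parallelism", "not set")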


Thanks
Best Regards


Re: Why does this simple Spark program use only one core?

2014-11-09 Thread Matei Zaharia
Call getNumPartitions() on your RDD to make sure it has the right number of 
partitions. You can also specify it when doing parallelize, e.g.

rdd = sc.parallelize(xrange(1000), 10)

This should run in parallel if you have multiple partitions and cores, but it 
might be that during part of the process only one node (e.g. the master 
process) is doing anything.
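
For illustration, here is a sketch of the original pi.py adjusted along these
lines. The changes are assumptions made for the example: the master is set to
local[*], the sample count is raised so the work is visible, and 16 partitions
are requested explicitly (Python 2 style kept to match the original):

#pi_parallel.py -- illustrative sketch based on the original pi.py
from pyspark import SparkContext
import random

NUM_SAMPLES = 100000  # more samples so the parallel work is noticeable

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

sc = SparkContext("local[*]", "Test App")         # use all local cores
rdd = sc.parallelize(xrange(0, NUM_SAMPLES), 16)  # request 16 partitions
print rdd.getNumPartitions()                      # verify; should print 16
count = rdd.map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)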

Matei

