Re: How to set the degree of parallelism in Spark SQL?

2016-05-26 Thread Mich Talebzadeh
Also worth adding that in standalone mode there is only one executor per spark-submit job. In standalone cluster mode Spark allocates resources based on cores; by default, an application will grab all the cores in the cluster. You only have one worker, which lives within the driver JVM process.
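A minimal Scala sketch of reining that in (the application name is made up; spark.cores.max and spark.executor.cores are the standard properties that cap cores in standalone mode):

  // Sketch only: cap what the application grabs instead of letting it take every core.
  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("parallelism-demo")     // hypothetical application name
    .set("spark.cores.max", "8")        // total cores this app may use cluster-wide
    .set("spark.executor.cores", "2")   // cores per executor
  val sc = new SparkContext(conf)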

Re: How to set the degree of parallelism in Spark SQL?

2016-05-26 Thread Ian
The number of executors is set when you launch the shell or an application with spark-submit; it's controlled by the --num-executors parameter: https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/. It is also important to note that cranking up the number may not help.
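A rough example of passing it at launch time (the class and jar names are placeholders, and --num-executors only takes effect when running on YARN):

  spark-submit \
    --master yarn \
    --num-executors 10 \
    --executor-cores 2 \
    --class com.example.MyApp \
    myapp.jar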

Re: How to set the degree of parallelism in Spark SQL?

2016-05-23 Thread Xinh Huynh
To the original question of parallelism and executors: you can have a parallelism of 200, even with 2 executors. In the Spark UI, you should see that the number of _tasks_ is 200 when your job involves shuffling. Executors vs. tasks: http://spark.apache.org/docs/latest/cluster-overview.html Xinh
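A small Scala shell sketch of that point (df here is a hypothetical DataFrame with a "key" column; sqlContext is the Spark 1.x shell's SQLContext):

  // Each shuffle stage is split into spark.sql.shuffle.partitions tasks,
  // no matter how few executors are available to run them.
  sqlContext.setConf("spark.sql.shuffle.partitions", "200")
  val counts = df.groupBy("key").count()   // groupBy forces a shuffle
  counts.rdd.partitions.length             // 200 partitions -> 200 tasks in that stage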

Re: How to set the degree of parallelism in Spark SQL?

2016-05-23 Thread Mathieu Longtin
Since the default is 200, I would guess you're only running 2 executors. Try to verify how many executors you are actually running with the web interface (port 8080 on the host where the master is running).
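Besides the web UI, a quick shell check of how many executors are actually registered (a sketch; getExecutorMemoryStatus also counts the driver, hence the -1):

  // One entry per block manager known to the driver, including the driver itself.
  val numExecutors = sc.getExecutorMemoryStatus.size - 1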

Re: How to set the degree of parallelism in Spark SQL?

2016-05-21 Thread Ted Yu
Looks like an equal sign is missing between partitions and 200.
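That is, the statement should read (same setting, with the equals sign added):

  sqlContext.sql("SET spark.sql.shuffle.partitions=200")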

How to set the degree of parallelism in Spark SQL?

2016-05-21 Thread SRK
Hi, How to set the degree of parallelism in Spark SQL? I am using the following but it somehow seems to allocate only two executors at a time. sqlContext.sql(" set spark.sql.shuffle.partitions 200 ") Thanks, Swetha
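For completeness, the same property can also be set outside SQL, programmatically or at submit time (a sketch; the flag form assumes you control the spark-submit command):

  // Programmatic equivalent of the SET statement (Spark 1.x SQLContext API):
  sqlContext.setConf("spark.sql.shuffle.partitions", "200")
  // Or at launch: spark-submit --conf spark.sql.shuffle.partitions=200 ...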