Re: default parallelism and mesos executors

2015-12-15 Thread Adrian Bridgett
Thanks Iulian, I'll retest with 1.6.x once it's released (probably won't 
have enough spare time to test with the RC).


On 11/12/2015 15:00, Iulian Dragoș wrote:



On Wed, Dec 9, 2015 at 4:29 PM, Adrian Bridgett wrote:


(resending as text only, since my first post on the 2nd never seemed to make it)

Using parallelize() on a dataset I'm seeing only two tasks rather
than the number of cores in the Mesos cluster. This is with Spark
1.5.1, using the Mesos coarse-grained scheduler.
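
For reference, here's a minimal pyspark sketch of what I'm doing (the
app name and dataset are illustrative, not my real job):

    from pyspark import SparkContext

    sc = SparkContext(appName="parallelism-check")  # illustrative name

    # With no numSlices argument, parallelize() falls back to
    # sc.defaultParallelism, which here reflects only the executors
    # registered so far -- hence just 2 partitions right after startup.
    rdd = sc.parallelize(range(1000))
    print(rdd.getNumPartitions())  # 2, not the cluster's core count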

Running pyspark in a console suggests that it takes a while before
the Mesos executors come online (at which point the default
parallelism changes).  If I add a "sleep 30" after initialising the
SparkContext I get the "right" number (42, by coincidence!).
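
That is, something like this crude sketch (30 seconds is just what
happened to be long enough for me):

    import time
    from pyspark import SparkContext

    sc = SparkContext(appName="parallelism-check")  # illustrative name
    time.sleep(30)  # crude: give the Mesos executors time to register
    print(sc.defaultParallelism)  # now 42 -- all the cores are visible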

I've just tried increasing minRegisteredResourcesRatio to 0.5, but
this affects neither the test case below nor my code.
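
For clarity, this is roughly how I'm setting it (a sketch, setting it
via SparkConf before the context starts):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("parallelism-check")  # illustrative name
            # fraction of resources to wait for before scheduling begins
            .set("spark.scheduler.minRegisteredResourcesRatio", "0.5")
            # how long to wait for that fraction before starting anyway
            .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s"))
    sc = SparkContext(conf=conf)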


This limitation seems to affect only the coarse-grained Mesos
scheduler; the fix will be available starting with Spark 1.6.0
(1.5.2 doesn't have it).


iulian


Is there something else I can do instead?  Perhaps it should be
looking at how many tasks _should_ be available rather than how
many currently are (I'm also using dynamicAllocation).
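
The workaround I'm leaning towards is to stop trusting
sc.defaultParallelism at startup and size the partitions explicitly,
along these lines (the expected core count is an assumption):

    # Sketch: sidestep the racy default by passing numSlices explicitly.
    EXPECTED_CORES = 42  # assumption: cores I expect the cluster to give me
    rdd = sc.parallelize(range(1000), numSlices=EXPECTED_CORES)
    print(rdd.getNumPartitions())  # 42 regardless of registration timing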

15/12/02 14:34:09 INFO mesos.CoarseMesosSchedulerBackend:
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.0
>>>
>>>
>>> print (sc.defaultParallelism)
2
>>> 15/12/02 14:34:12 INFO mesos.CoarseMesosSchedulerBackend:
Mesos task 5 is now TASK_RUNNING
15/12/02 14:34:13 INFO mesos.MesosExternalShuffleClient:
Successfully registered app
20151117-115458-164233482-5050-24333-0126 with external shuffle
service.

15/12/02 14:34:15 INFO mesos.CoarseMesosSchedulerBackend:
Registered executor:

AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@ip-10-1-200-147.ec2.internal:41194/user/Executor#-1021429650])
with ID 20151117-115458-164233482-5050-24333-S22/5
15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New
executor 20151117-115458-164233482-5050-24333-S22/5 has registered
(new total is 1)

>>> print (sc.defaultParallelism)
42

Thanks

Adrian Bridgett






--
Iulian Dragos
Reactive Apps on the JVM
www.typesafe.com



--
Adrian Bridgett | Sysadmin Engineer, OpenSignal
Office: First Floor, Scriptor Court, 155-157 Farringdon Road,
Clerkenwell, London, EC1R 3AD
Phone: +44 777-377-8251
Skype: abridgett | @adrianbridgett


Re: default parallelism and mesos executors

2015-12-11 Thread Iulian Dragoș
On Wed, Dec 9, 2015 at 4:29 PM, Adrian Bridgett wrote:

> (resending as text only, since my first post on the 2nd never seemed to make it)
>
> Using parallelize() on a dataset I'm seeing only two tasks rather than the
> number of cores in the Mesos cluster.  This is with Spark 1.5.1, using
> the Mesos coarse-grained scheduler.
>
> Running pyspark in a console suggests that it takes a while before
> the Mesos executors come online (at which point the default parallelism
> changes).  If I add a "sleep 30" after initialising the SparkContext I get
> the "right" number (42, by coincidence!).
>
> I've just tried increasing minRegisteredResourcesRatio to 0.5, but this
> affects neither the test case below nor my code.
>

This limitation seems to affect only the coarse-grained Mesos
scheduler; the fix will be available starting with Spark 1.6.0 (1.5.2
doesn't have it).

iulian


>
> Is there something else I can do instead?  Perhaps it should be looking at
> how many tasks _should_ be available rather than how many currently are (I'm
> also using dynamicAllocation).
>
> 15/12/02 14:34:09 INFO mesos.CoarseMesosSchedulerBackend: SchedulerBackend
> is ready for scheduling beginning after reached
> minRegisteredResourcesRatio: 0.0
> >>>
> >>>
> >>> print (sc.defaultParallelism)
> 2
> >>> 15/12/02 14:34:12 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 5
> is now TASK_RUNNING
> 15/12/02 14:34:13 INFO mesos.MesosExternalShuffleClient: Successfully
> registered app 20151117-115458-164233482-5050-24333-0126 with external
> shuffle service.
> 
> 15/12/02 14:34:15 INFO mesos.CoarseMesosSchedulerBackend: Registered
> executor:
> AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@ip-10-1-200-147.ec2.internal:41194/user/Executor#-1021429650])
> with ID 20151117-115458-164233482-5050-24333-S22/5
> 15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor
> 20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1)
> 
> >>> print (sc.defaultParallelism)
> 42
>
> Thanks
>
> Adrian Bridgett
>


--
Iulian Dragos
Reactive Apps on the JVM
www.typesafe.com


default parallelism and mesos executors

2015-12-09 Thread Adrian Bridgett

(resending as text only, since my first post on the 2nd never seemed to make it)

Using parallelize() on a dataset I'm seeing only two tasks rather than
the number of cores in the Mesos cluster.  This is with Spark 1.5.1,
using the Mesos coarse-grained scheduler.


Running pyspark in a console suggests that it takes a while
before the Mesos executors come online (at which point the default
parallelism changes).  If I add a "sleep 30" after initialising the
SparkContext I get the "right" number (42, by coincidence!).


I've just tried increasing minRegisteredResourcesRatio to 0.5, but this
affects neither the test case below nor my code.


Is there something else I can do instead?  Perhaps it should be looking
at how many tasks _should_ be available rather than how many currently
are (I'm also using dynamicAllocation).


15/12/02 14:34:09 INFO mesos.CoarseMesosSchedulerBackend: 
SchedulerBackend is ready for scheduling beginning after reached 
minRegisteredResourcesRatio: 0.0

>>>
>>>
>>> print (sc.defaultParallelism)
2
>>> 15/12/02 14:34:12 INFO mesos.CoarseMesosSchedulerBackend: Mesos 
task 5 is now TASK_RUNNING
15/12/02 14:34:13 INFO mesos.MesosExternalShuffleClient: Successfully 
registered app 20151117-115458-164233482-5050-24333-0126 with external 
shuffle service.


15/12/02 14:34:15 INFO mesos.CoarseMesosSchedulerBackend: Registered 
executor: 
AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@ip-10-1-200-147.ec2.internal:41194/user/Executor#-1021429650]) 
with ID 20151117-115458-164233482-5050-24333-S22/5
15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 
20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1)


>>> print (sc.defaultParallelism)
42

Thanks

Adrian Bridgett




default parallelism and mesos executors

2015-12-02 Thread Adrian Bridgett
Using parallelize() on a dataset I'm seeing only two tasks rather than
the number of cores in the Mesos cluster.  This is with Spark 1.5.1,
using the Mesos coarse-grained scheduler.


Running pyspark in a console suggests that it takes a while
before the Mesos executors come online (at which point the default
parallelism changes).  If I add a "sleep 30" after initialising the
SparkContext I get the "right" number (42, by coincidence!).


I've just tried increasing minRegisteredResourcesRatio to 0.5, but this
affects neither the test case below nor my code.


Is there something else I can do instead?  Perhaps it should be looking
at how many tasks _should_ be available rather than how many currently
are (I'm also using dynamicAllocation).


15/12/02 14:34:09 INFO mesos.CoarseMesosSchedulerBackend: 
SchedulerBackend is ready for scheduling beginning after reached 
minRegisteredResourcesRatio: 0.0

>>>
>>>
>>> print (sc.defaultParallelism)
2
>>> 15/12/02 14:34:12 INFO mesos.CoarseMesosSchedulerBackend: Mesos 
task 5 is now TASK_RUNNING
15/12/02 14:34:13 INFO mesos.MesosExternalShuffleClient: Successfully 
registered app 20151117-115458-164233482-5050-24333-0126 with external 
shuffle service.


15/12/02 14:34:15 INFO mesos.CoarseMesosSchedulerBackend: Registered 
executor: 
AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@ip-10-1-200-147.ec2.internal:41194/user/Executor#-1021429650]) 
with ID 20151117-115458-164233482-5050-24333-S22/5
15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 
20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1)


>>> print (sc.defaultParallelism)
42

--
Adrian Bridgett | Sysadmin Engineer, OpenSignal
Office: First Floor, Scriptor Court, 155-157 Farringdon Road,
Clerkenwell, London, EC1R 3AD
Phone: +44 777-377-8251
Skype: abridgett | @adrianbridgett