Hi Eli,

Interesting problem.

On Wed, Jan 18, 2012 at 20:55 -0800, Ateljevich, Eli wrote:
> I have a question about managing resources in a threadsafe way across xdist 
> -n.
> 
> My group is using py.test as a high-level driver for testing an mpi-based 
> numerical code. Many of our system-level tests wrap a system call to mpirun 
> then postprocess results. I have a decorator for the tests that hints at the 
> number of processors needed (usually something like 1,2,8).
> 
> I would like to launch as much as I can at once given the available 
> processors. For instance, if 16 processors are available there is no reason I 
> couldn't be doing a 12 and a 4 processor test. I was thinking of using xdist 
> with some modest number of processors representing the maximum number of 
> concurrent tests. The xdist test processors would launch mpi jobs when enough 
> processors become available to satisfy the np hint for that test. This would 
> be managed by having the tests "check out" cores and sleep if they aren't 
> available yet.
> 
> This design requires a threadsafe method to query, acquire and lock the count 
> of available mpi cores. I could use some sort of lock or semaphore from 
> threading, but I thought it would be good to run this by the xdist 
> cognoscenti and find out if there might be a preferred way of doing this 
> given how xdist itself distributes its work or manages threads.

pytest-xdist itself does not provide or use a method to query the number
of available processors.  Quick background on xdist: the master process
starts a number of worker processes which each collect tests (see the
output of py.test --collectonly), and the master sees the test ids of
all those collections.  It then decides on the scheduling (Each or Load
at the moment; "-n5" implies load-balancing) and sends test ids to the
nodes for execution.  It pre-loads each node with some test ids and then
waits for completions before sending more test ids to that node.
There is no node-to-node communication for co-ordination.
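
For reference, the two scheduling modes correspond to invocations like
the following ("-n5" is shorthand for load-balancing across five local
popen workers):

    py.test -n5                        # load-balance tests across 5 workers
    py.test --dist=each --tx 3*popen   # run every test on each of 3 workers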

It might be easiest not to try to extend the xdist mechanisms but to
implement an independent method which co-ordinates the number of running
MPI tests / used processors via a file or similar.  For example, on POSIX
you can read/write a file with some meta-information and use the atomic
os.rename operation.  I am not sure about the exact semantics, but this
should be doable and testable without any xdist involvement.
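
Here is a minimal sketch of what I mean, assuming a counter file that is
seeded once with the total core count (e.g. "16") before the test run;
the file name and helper names are made up:

    import os
    import time

    COUNT_FILE = "mpi_free_cores"   # hypothetical shared path, seeded
                                    # once with the total core count

    def _grab(lock):
        # os.rename is atomic on POSIX: only one process can move the
        # counter file aside at a time, the others get OSError and retry.
        while True:
            try:
                os.rename(COUNT_FILE, lock)
                return
            except OSError:
                time.sleep(0.1)

    def acquire_cores(n, poll=0.5):
        """Block until n cores are free, then check them out."""
        lock = "%s.%d" % (COUNT_FILE, os.getpid())
        while True:
            _grab(lock)
            with open(lock) as f:
                free = int(f.read())
            if free >= n:
                with open(lock, "w") as f:
                    f.write(str(free - n))
                os.rename(lock, COUNT_FILE)
                return
            os.rename(lock, COUNT_FILE)   # not enough free, put it back
            time.sleep(poll)

    def release_cores(n):
        """Return n cores to the pool."""
        lock = "%s.%d" % (COUNT_FILE, os.getpid())
        _grab(lock)
        with open(lock) as f:
            free = int(f.read())
        with open(lock, "w") as f:
            f.write(str(free + n))
        os.rename(lock, COUNT_FILE)

(This ignores crash recovery: if a test process dies while holding the
file, the counter has to be restored by hand.)
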
If you have such a method to restrict the number of MPI processes, you
can then use it from a pytest_runtest_setup hook which reads your
decorator-attributes/markers and then decides whether to wait or to run
the test.  This approach also makes you rather independent of the number
of worker processes started with "-nNUM".
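
For example, with a hypothetical @pytest.mark.mpi(np=...) marker on your
tests, a conftest.py could look roughly like this (acquire_cores and
release_cores are the helpers sketched above; marker access is shown
with the pytest 2.x get_marker API):

    # conftest.py -- rough sketch, not tested
    def pytest_runtest_setup(item):
        marker = item.get_marker("mpi")
        if marker is not None:
            np = marker.kwargs.get("np", 1)
            acquire_cores(np)       # sleeps until np cores are available
            item._mpi_np = np

    def pytest_runtest_teardown(item):
        np = getattr(item, "_mpi_np", None)
        if np is not None:
            release_cores(np)       # give the cores back

A test decorated with @pytest.mark.mpi(np=8) would then wait until eight
cores are checked in, however many xdist workers happen to be running.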

HTH,
holger