Hi Eli, interesting problem.
On Wed, Jan 18, 2012 at 20:55 -0800, Ateljevich, Eli wrote:
> I have a question about managing resources in a threadsafe way across
> xdist -n.
>
> My group is using py.test as a high-level driver for testing an MPI-based
> numerical code. Many of our system-level tests wrap a system call to
> mpirun and then postprocess the results. I have a decorator for the tests
> that hints at the number of processors needed (usually something like 1,
> 2, or 8).
>
> I would like to launch as much as I can at once given the available
> processors. For instance, if 16 processors are available, there is no
> reason I couldn't be running a 12-processor and a 4-processor test at the
> same time. I was thinking of using xdist with some modest number of
> workers representing the maximum number of concurrent tests. The xdist
> test workers would launch MPI jobs when enough processors become
> available to satisfy the np hint for each test. This would be managed by
> having the tests "check out" cores and sleep if the cores aren't
> available yet.
>
> This design requires a threadsafe method to query, acquire, and lock the
> count of available MPI cores. I could use some sort of lock or semaphore
> from threading, but I thought it would be good to run this by the xdist
> cognoscenti and find out whether there might be a preferred way of doing
> this, given how xdist itself distributes its work or manages threads.

pytest-xdist itself does not provide or use a method to query the number of
available processors.

Quick background on xdist: the master process starts a number of worker
processes which each collect tests (see the output of py.test
--collectonly), and the master sees the test ids of all those collections.
It then decides the scheduling ("Each" or "Load" at the moment; "-n5"
implies load-balancing) and sends test ids to the nodes for execution. It
pre-loads each node with a few test ids and then waits for completions
before sending more test ids to that node. There is no node-to-node
communication for co-ordination.
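To make the "check out cores and sleep" idea concrete: within a single
process, the accounting Eli describes could be a counter guarded by a
condition variable. This is only a sketch of that in-process logic (the
class name and API are made up for illustration); note that because xdist
workers are separate processes, plain threading primitives like this will
not coordinate across them, which is why a shared external mechanism such
as a file is needed.

```python
import threading

class CorePool:
    """Sketch of the 'check out cores' idea for a single process:
    a free-core counter guarded by a condition variable."""

    def __init__(self, total):
        self.available = total
        self._cond = threading.Condition()

    def acquire(self, n):
        """Block until n cores are free, then reserve them."""
        with self._cond:
            while self.available < n:
                self._cond.wait()
            self.available -= n

    def release(self, n):
        """Return n cores and wake any waiters."""
        with self._cond:
            self.available += n
            self._cond.notify_all()
```

For example, with 16 cores a 12-processor and a 4-processor test can both
acquire their cores, while an 8-processor test would block until enough
cores are released.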
It might be easiest not to extend the xdist mechanisms but to implement an
independent method which co-ordinates the number of running MPI tests /
used processors via a file or so. For example, on POSIX you can read/write
a file with some meta-information and use the atomic os.rename operation.
I am not sure about the exact semantics, but this should be doable and
testable without any xdist involvement.

If you have such a method for restricting the number of MPI processes, you
can then use it from a pytest_runtest_setup hook, which can read your
decorator attributes/markers and then decide whether to wait or to run the
test. This approach also makes you rather independent of the number of
worker processes started with "-nNUM".

HTH,
holger

_______________________________________________
py-dev mailing list
py-dev@codespeak.net
http://codespeak.net/mailman/listinfo/py-dev
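A minimal sketch of the file-based co-ordination suggested above, assuming
a POSIX filesystem where os.rename is atomic. The counter file name, the
polling intervals, and the function names are all assumptions for
illustration; renaming the counter file to a process-private name serves
as the lock, since only one renamer can succeed.

```python
import os
import time

# Hypothetical shared counter file holding the number of free MPI cores.
COUNT_FILE = "free_cores.txt"

def init_pool(total, path=COUNT_FILE):
    """Create the shared counter file with the total free-core count."""
    with open(path, "w") as f:
        f.write(str(total))

def _checkout(path):
    """Take exclusive ownership of the counter by atomically renaming it
    to a process-private name; only one process can win the rename."""
    private = "%s.%d" % (path, os.getpid())
    while True:
        try:
            os.rename(path, private)   # atomic on POSIX filesystems
            return private
        except OSError:
            time.sleep(0.05)           # another process holds it; retry

def _checkin(private, path):
    os.rename(private, path)           # put the counter back for others

def acquire(n, path=COUNT_FILE, poll=0.1):
    """Block until n cores are free, then reserve them."""
    while True:
        private = _checkout(path)
        with open(private) as f:
            free = int(f.read())
        if free >= n:
            with open(private, "w") as f:
                f.write(str(free - n))
            _checkin(private, path)
            return
        _checkin(private, path)        # not enough free yet; wait and retry
        time.sleep(poll)

def release(n, path=COUNT_FILE):
    """Return n cores to the pool."""
    private = _checkout(path)
    with open(private) as f:
        free = int(f.read())
    with open(private, "w") as f:
        f.write(str(free + n))
    _checkin(private, path)
```

In a conftest.py, a pytest_runtest_setup hook could read the test's
processor-count marker and call acquire(n), with the matching release(n)
in pytest_runtest_teardown, so that the scheme works regardless of how
many workers "-nNUM" starts.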