On 23May2019 17:04, bvdp <b...@mellowood.ca> wrote:
> Anyway, yes the problem is that I was naively using command.getoutput()
> which blocks until the command is finished. So, of course, only one process
> was being run at one time! Bad me!
>
> I guess I should be looking at subprocess.Popen(). Now, a more relevant
> question ... if I do it this way I then need to poll through a list of saved
> process IDs to see which have finished? Right? My initial thought is to
> batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
> finish, etc. Would it be foolish to send a large number (1200 in this
> case since this is the number of files) and let the OS worry about
> scheduling and have my program poll 1200 IDs?
>
> Someone mentioned the GIL. If I launch separate processes then I don't
> encounter this issue? Right?
Yes, but it becomes more painful to manage. If you're issuing distinct
separate commands anyway, dispatch many or all of them and then wait for
them as a distinct step. If the commands start thrashing the rest of the OS
resources (such as the disc) then you may want to do some capacity
limitation, such as a counter or semaphore to limit how many go at once.
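For example, here's a minimal sketch of that capacity-limited dispatch; the
command name and file names are made up, and the CPU_COUNT-1 limit is just
the figure you suggested:

    import os
    import subprocess
    import time

    # Placeholder commands: one per input file.
    commands = [["mycmd", "file%d.dat" % i] for i in range(1200)]

    MAX_RUNNING = max((os.cpu_count() or 2) - 1, 1)   # simple capacity limit
    running = []

    for cmd in commands:
        # At capacity? Poll the running children until one finishes.
        while len(running) >= MAX_RUNNING:
            running = [p for p in running if p.poll() is None]
            if len(running) >= MAX_RUNNING:
                time.sleep(0.1)
        running.append(subprocess.Popen(cmd))

    # Wait for the stragglers.
    for p in running:
        p.wait()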
Now, waiting for a subcommand can be done in a few ways.
If you're the parent of all the processes you can keep a set() of the
issued process ids and then call os.wait() repeatedly, which returns the
pid of a completed child process. Check it against your set. If you need
to act on the specific process, use a dict to map pids to some record of
the subprocess.
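A bare-bones sketch of that approach (the command and file names are
placeholders):

    import os
    import subprocess

    # Dispatch everything up front; map each child's pid to its Popen record.
    children = {}
    for i in range(10):
        p = subprocess.Popen(["mycmd", "file%d.dat" % i])
        children[p.pid] = p

    # Reap the children in whatever order the OS reports them.
    while children:
        pid, status = os.wait()
        p = children.pop(pid, None)
        if p is not None:
            print("pid %d exited with code %d" % (pid, os.WEXITSTATUS(status)))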
Alternatively, you can spawn a Python Thread for each subcommand, have
the Thread dispatch the subcommand _and_ wait for it (i.e. keep your
command.getoutput() method, but in a Thread). Main programme waits for
the Threads by join()ing them.
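Something like this, untested, using the modern subprocess.getoutput() in
place of the old commands.getoutput() (again, the command and file names are
invented):

    import subprocess
    from threading import Thread

    results = {}   # filename -> command output

    def run_one(filename):
        # The wait happens inside the thread; while the subcommand runs,
        # this thread is blocked and the GIL is released.
        results[filename] = subprocess.getoutput("mycmd " + filename)

    # One Thread per subcommand.
    threads = [Thread(target=run_one, args=("file%d.dat" % i,))
               for i in range(10)]
    for t in threads:
        t.start()
    # Main programme waits for the Threads by join()ing them.
    for t in threads:
        t.join()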
Because a thread waiting for something external (the subprocess) doesn't
hold the GIL, other stuff can proceed. Basically, if something is handed
off to the OS and then Python waits for that (via an os.* call or a
Popen.wait() call etc etc) then it will release the GIL while it is
blocked, so other Threads _will_ get to work.
This is all efficient, and there are any number of variations on the wait
step, depending on what your needs are.
The GIL isn't the disaster most people seem to think. It can be a
bottleneck for pure Python compute intensive work. But Python's
interpreted - if you _really_ want performance the core compute will be
compiled to something more efficient (e.g. a C extension) or handed to
another process (transcode video in pure Python - argh! - but call the
ffmpeg command as a subprocess - yes!); handed off, the GIL should be
released, allowing other Python side work to continue.
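For example (file names and ffmpeg arguments purely illustrative):

    import subprocess

    # Hand the heavy lifting to ffmpeg; while Python is blocked in run(),
    # the GIL is released and any other Threads can carry on.
    subprocess.run(["ffmpeg", "-i", "input.mov", "output.mp4"], check=True)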
Cheers,
Cameron Simpson <c...@cskk.id.au>