Just got a one-liner working with parallel. Super! All I ended up doing was:

    parallel mma {} ::: *mma
which whizzed through my files in less than 1/4 of the time of my
one-at-a-time script. (In case anyone is wondering, or cares, this is a
bunch of Musical Midi Accompaniment files:
https://mellowood.ca/mma/index.html).

On Fri, May 24, 2019 at 9:28 AM Rob Gaddi
<rgaddi@highlandtechnology.invalid> wrote:
> On 5/23/19 6:32 PM, Cameron Simpson wrote:
> > On 23May2019 17:04, bvdp <b...@mellowood.ca> wrote:
> >> Anyway, yes, the problem is that I was naively using
> >> command.getoutput(), which blocks until the command is finished. So,
> >> of course, only one process was being run at a time! Bad me!
> >>
> >> I guess I should be looking at subprocess.Popen(). Now, a more
> >> relevant question ... if I do it this way, do I then need to poll
> >> through a list of saved process IDs to see which have finished?
> >> Right? My initial thought is to batch them up in small groups (say
> >> CPU_COUNT-1) and wait for each batch to finish, etc. Would it be
> >> foolish to send a large number (1200 in this case, since this is the
> >> number of files), let the OS worry about scheduling, and have my
> >> program poll 1200 IDs?
> >>
> >> Someone mentioned the GIL. If I launch separate processes then I
> >> don't encounter this issue? Right?
> >
> > Yes, but it becomes more painful to manage. If you're issuing
> > distinct separate commands anyway, dispatch many or all and then
> > wait for them as a distinct step. If the commands start thrashing
> > the rest of the OS resources (such as the disc) then you may want to
> > do some capacity limitation, such as a counter or semaphore to limit
> > how many go at once.
> >
> > Now, waiting for a subcommand can be done in a few ways.
> >
> > If you're the parent of all the processes you can keep a set() of
> > the issued process ids and then call os.wait() repeatedly, which
> > returns the pid of a completed child process. Check it against your
> > set.
> > If you need to act on the specific process, use a dict to map pids
> > to some record of the subprocess.
> >
> > Alternatively, you can spawn a Python Thread for each subcommand,
> > have the Thread dispatch the subcommand _and_ wait for it (i.e. keep
> > your command.getoutput() method, but in a Thread). The main
> > programme waits for the Threads by join()ing them.
>
> I'll just note, because no one else has brought it up yet, that rather
> than manually creating threads and/or process pools for all these
> things, this is exactly what the standard concurrent.futures module is
> for. It's a fairly brilliant wrapper around all this stuff, and I feel
> like it often doesn't get enough love.
>
> --
> Rob Gaddi, Highland Technology -- www.highlandtechnology.com
> Email address domain is currently out of order. See above to fix.
> --
> https://mail.python.org/mailman/listinfo/python-list

--
**** Listen to my FREE CD at http://www.mellowood.ca/music/cedars ****
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW:   http://www.mellowood.ca
--
https://mail.python.org/mailman/listinfo/python-list
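[Editor's note: Cameron's dispatch-then-reap approach above can be sketched as follows. This is a sketch, not Bob's actual script: the placeholder "echo" commands stand in for the real `mma` invocations, and the pattern is Unix-only since it relies on os.wait(). Mixing os.wait() with the subprocess module is slightly delicate (Popen may also try to reap its own child), which is fine for a sketch but worth knowing about.]

```python
import os
import subprocess

# Placeholder commands; in the real case each would be ["mma", filename].
commands = [["echo", "one"], ["echo", "two"], ["echo", "three"]]

# Dispatch everything first, mapping pid -> Popen record so we can act
# on the specific process that finished, as suggested above.
procs = {}
for cmd in commands:
    p = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    procs[p.pid] = p

# Reap children one at a time: os.wait() blocks until some child exits
# and returns (pid, status); check the pid against our dict.
while procs:
    pid, status = os.wait()
    if pid in procs:
        del procs[pid]
        print(f"pid {pid} done, exit status {os.WEXITSTATUS(status)}")
```

Capacity limiting (Cameron's counter/semaphore point) would mean dispatching only N at a time and launching a replacement each time one is reaped.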