Re: [Tutor] Newbie Wondering About Threads
On Sun, Dec 7, 2008 at 9:35 PM, Kent Johnson <[EMAIL PROTECTED]> wrote:
> There is no need to include both the flac file name and the mp3 file
> name if the roots match. You can use os.path functions to split the
> extension or the quick-and-dirty way:
>
> mp3file = flacfile.rsplit('.', 1)[0] + '.mp3'

That is *so* what I was looking for! You guys are awesome.

Damon

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
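The os.path route Kent mentions looks like this -- a minimal sketch, where mp3_name is just an illustrative helper name, not anything from the thread:

```python
import os.path

def mp3_name(flacfile):
    """Derive the .mp3 name from a .flac name via os.path.splitext."""
    root, ext = os.path.splitext(flacfile)  # e.g. ("test", ".flac")
    return root + '.mp3'
```

Like rsplit('.', 1), splitext only splits off the final extension, so a name with extra dots such as "my.album.flac" comes out as "my.album.mp3".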
Re: [Tutor] Newbie Wondering About Threads
On Sun, Dec 7, 2008 at 3:10 PM, Damon Timm <[EMAIL PROTECTED]> wrote:
> I think I did it! Woo hoo! (cheers all around! drinks on me!)

Cool! Where are we meeting for drinks? ;-)

> flacFiles = [["test.flac","test.mp3"],["test2.flac","test2.mp3"],
>              ["test3.flac","test3.mp3"],["test4.flac","test4.mp3"],
>              ["test5.flac","test5.mp3"],["test6.flac","test6.mp3"]]

There is no need to include both the flac file name and the mp3 file
name if the roots match. You can use os.path functions to split the
extension or the quick-and-dirty way:

mp3file = flacfile.rsplit('.', 1)[0] + '.mp3'

Kent
Re: [Tutor] Newbie Wondering About Threads
On Sun, Dec 7, 2008 at 10:47 AM, Kent Johnson <[EMAIL PROTECTED]> wrote:
> A function as mentioned above would help. For the threaded solution
> the function could just start the child process and wait for it to
> finish, it doesn't have to return anything. Each thread will block on
> its associated child.

I think I did it! Woo hoo! (cheers all around! drinks on me!)

First, I found that using the Popen.communicate() function wasn't going
to work (because it sits there and waits until it's done before
continuing); so, I ditched that, created my own little function that
returns the Popen object and went from there ...

I mixed in one super-long audio file with all the others and it seems
to work without a hitch (so far) ... watching top I see both processors
running at max during the lame processing. Check it out (there are
probably sexier ways to populate the *.mp3 files but I was more
interested in the threads):

---
import time
import subprocess

totProcs = 2  # number of processes to spawn before waiting

flacFiles = [["test.flac","test.mp3"],["test2.flac","test2.mp3"],
             ["test3.flac","test3.mp3"],["test4.flac","test4.mp3"],
             ["test5.flac","test5.mp3"],["test6.flac","test6.mp3"]]

procs = []

def flac_to_mp3(flacfile, mp3file):
    print "beginning to process " + flacfile
    p = subprocess.Popen(
        ["flac","--decode","--stdout","--silent",flacfile],
        stdout=subprocess.PIPE)
    p1 = subprocess.Popen(["lame","--silent","-",mp3file],
                          stdin=p.stdout)
    return p1

while flacFiles or procs:
    procs = [p for p in procs if p.poll() is None]
    while flacFiles and len(procs) < totProcs:
        file = flacFiles.pop(0)
        procs.append(flac_to_mp3(file[0], file[1]))
    time.sleep(1)
--[EOF]--

Thanks again - onward I go!

Damon
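The same two-at-a-time fan-out can also be written with a thread pool along the lines Kent suggested; on modern Pythons (3.2+), concurrent.futures makes it very short. A hedged sketch, not the script from the thread -- the no-op `true` command stands in for the flac | lame pipeline so the example is runnable as-is:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def convert(flacfile, mp3file):
    """Run one conversion and block until the child process finishes.

    The real command would be the flac | lame pipeline; the no-op
    'true' command (which ignores its arguments) stands in for it.
    """
    subprocess.check_call(["true", flacfile])
    return mp3file

pairs = [("test.flac", "test.mp3"), ("test2.flac", "test2.mp3")]

# At most max_workers conversions run at once; each worker thread
# blocks in check_call() on its own child, just like wait().
with ThreadPoolExecutor(max_workers=2) as pool:
    done = list(pool.map(lambda p: convert(*p), pairs))
```

pool.map preserves input order, so `done` lines up with `pairs` even if the second conversion finishes first.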
Re: [Tutor] Newbie Wondering About Threads
On Sun, Dec 7, 2008 at 8:58 AM, Damon Timm <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 7, 2008 at 12:33 AM, Martin Walsh <[EMAIL PROTECTED]> wrote:
>> Here is my simplistic, not-very-well-thought-out, attempt in
>> pseudo-code, perhaps it will get you started ...
>>
>> paths = ["file1.flac","file2.flac", ... "file11.flac"]
>> procs = []
>> while paths or procs:
>>     procs = [p for p in procs if p.poll() is None]
>>     while paths and len(procs) < 2:
>>         flac = paths.pop(0)
>>         procs.append(Popen(['...', flac], ...))
>>     time.sleep(1)
>
> I think I got a little lost with the "procs = [p for p in procs if
> p.poll() is None]" statement

It's called a list comprehension:
http://personalpages.tds.net/~kent37/kk/3.html

Essentially it creates a new list from all the elements in the old list
that are still running.

> On Sun, Dec 7, 2008 at 2:58 AM, Lie Ryan <[EMAIL PROTECTED]> wrote:
>
> Yea, looks like it - I think the trick, for me, will be getting a
> dynamic list that can be iterated through ... I experimented a little
> with the .poll() function and I think I follow how it is working ...
> but really, I am going to have to do a little more "pre-thinking" than
> I had to do with the bash version ... not sure if I should create a
> class containing the list of flac files or just a number of functions
> to handle the list ... whatever way it ends up being, it is going to
> take a little thought to get it straightened out. And the
> object-oriented part is different than bash -- so, I have to "think
> different" too.

I don't think you need any classes for this. A simple list of file
names should be fine. A function that takes a file name as a parameter,
starts a process to process the file, and returns the resulting Popen
object would also be helpful.

> On Sun, Dec 7, 2008 at 8:31 AM, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> Oh neat! I will be honest, more than one screen full of code and I
> get a little overwhelmed (at this point) but I am going to check that
> idea out. I was thinking something along these lines, where I can
> send all the input/output variables along with a number argument
> (threads) to a class/function that would then handle everything ... so
> using a thread pool may make sense ...
>
> Looks like I would create a loop that went through the list of all the
> files to be converted and then sent them all off, one by one, to the
> thread pool -- which would then just dish them out so that no more
> than 2 (if I chose that) would be converting at a time?

Yes, that's right.

> I gotta try and wrap my head around it ... also, I will be using two
> subprocesses to accomplish a single command (one producing stdout and
> the other taking stdin) as well ... so they have to be packaged
> together somehow ...

A function as mentioned above would help. For the threaded solution
the function could just start the child process and wait for it to
finish, it doesn't have to return anything. Each thread will block on
its associated child.

Kent
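The comprehension's effect is easy to see in isolation with stand-in objects; FakeProc below is purely illustrative, but a real Popen object behaves the same way, with poll() returning None while its child is still running:

```python
class FakeProc:
    """Stand-in for subprocess.Popen: poll() returns None while the
    child runs, and its exit code once it has finished."""
    def __init__(self, code):
        self.code = code
    def poll(self):
        return self.code

procs = [FakeProc(None), FakeProc(0), FakeProc(None)]

# Rebind procs to a new list holding only the still-running processes.
procs = [p for p in procs if p.poll() is None]
```

Here the finished process (exit code 0) drops out of the list, leaving the two still-running ones.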
Re: [Tutor] Newbie Wondering About Threads
On Sun, Dec 7, 2008 at 12:33 AM, Martin Walsh <[EMAIL PROTECTED]> wrote:
> I'm not certain this completely explains the poor performance, if at
> all, but the communicate method of Popen objects will wait until EOF
> is reached and the process ends. So IIUC, in your example the process
> 'p' runs to completion and only then is its stdout
> (p.communicate()[0]) passed to stdin of 'p2' by the outer communicate
> call.
>
> You might try something like this (untested!) ...
>
> p1 = subprocess.Popen(
>     ["flac","--decode","--stdout","test.flac"],
>     stdout=subprocess.PIPE, stderr=subprocess.PIPE
> )
> p2 = subprocess.Popen(
>     ["lame","-","test.mp3"], stdin=p1.stdout,  # <--
>     stdout=subprocess.PIPE, stderr=subprocess.PIPE
> )
> p2.communicate()

That did the trick! Got it back down to 20s ... which is what it was
taking on the command line. Thanks for that!

> Here is my simplistic, not-very-well-thought-out, attempt in
> pseudo-code, perhaps it will get you started ...
>
> paths = ["file1.flac","file2.flac", ... "file11.flac"]
> procs = []
> while paths or procs:
>     procs = [p for p in procs if p.poll() is None]
>     while paths and len(procs) < 2:
>         flac = paths.pop(0)
>         procs.append(Popen(['...', flac], ...))
>     time.sleep(1)

I think I got a little lost with the "procs = [p for p in procs if
p.poll() is None]" statement -- I'm not sure exactly what that is doing
... but otherwise, I think that makes sense ... will have to try it out
(if not one of the more "robust" thread pool suggestions below).

On Sun, Dec 7, 2008 at 2:58 AM, Lie Ryan <[EMAIL PROTECTED]> wrote:
> I think when you do that (p2.wait() then p3.wait()), if p3 finishes
> first, you wouldn't start another p3 until p2 has finished (i.e.
> until p2.wait() returns), and if p2 finishes first, you wouldn't
> start another p2 until p3 finishes (i.e. until p3.wait() returns).
>
> The solution would be to start and wait() the subprocesses in two
> threads. Use the threading module or -- if you use Python 2.6 -- the
> new multiprocessing module.
>
> Alternatively, you could do a "non-blocking wait", i.e. poll the
> process:
>
> while True:
>     if p1.poll(): # start another p1
>     if p2.poll(): # start another p2

Yea, looks like it - I think the trick, for me, will be getting a
dynamic list that can be iterated through ... I experimented a little
with the .poll() function and I think I follow how it is working ...
but really, I am going to have to do a little more "pre-thinking" than
I had to do with the bash version ... not sure if I should create a
class containing the list of flac files or just a number of functions
to handle the list ... whatever way it ends up being, it is going to
take a little thought to get it straightened out. And the
object-oriented part is different than bash -- so, I have to "think
different" too.

On Sun, Dec 7, 2008 at 8:31 AM, Kent Johnson <[EMAIL PROTECTED]> wrote:
> A simple way to do this would be to use poll() instead of wait().
> Then you can check both processes for completion in a loop and start
> a new process when one of the current ones ends. You could keep the
> active processes in a list. Make sure you put a sleep() in the
> polling loop, otherwise the loop will consume your CPU!

Thanks for that tip - I already throttled my CPU and had to abort the
first time (without the sleep() function) ... smile.

> Another approach is to use a thread pool with one worker for each
> process. The thread would call wait() on its child process; when it
> finishes the thread will take a new task off the queue. There are
> several thread pool recipes in the Python Cookbook, for example:
> http://code.activestate.com/recipes/203871/
> http://code.activestate.com/recipes/576576/ (this one has many links
> to other pool implementations)

Oh neat! I will be honest, more than one screen full of code and I get
a little overwhelmed (at this point) but I am going to check that idea
out. I was thinking something along these lines, where I can send all
the input/output variables along with a number argument (threads) to a
class/function that would then handle everything ... so using a thread
pool may make sense ...

Looks like I would create a loop that went through the list of all the
files to be converted and then sent them all off, one by one, to the
thread pool -- which would then just dish them out so that no more than
2 (if I chose that) would be converting at a time? I gotta try and
wrap my head around it ... also, I will be using two subprocesses to
accomplish a single command (one producing stdout and the other taking
stdin) as well ... so they have to be packaged together somehow ...
hmm!

Great help everyone. Not quite as simple as single threading but I am
learning quite a bit. One day, I will figure it out. Smile.

Damon
Re: [Tutor] Newbie Wondering About Threads
On Sat, Dec 6, 2008 at 9:43 PM, Damon Timm <[EMAIL PROTECTED]> wrote:
> The last piece of my puzzle though, I am having trouble wrapping my
> head around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program
> to tackle compressing two at a time ... but not more than two at a
> time (or four, or eight, or whatever) because that's not going to
> help me at all (I have dual cores right now) ... I am having trouble
> thinking how I can create the algorithm that would do this for me ...

A simple way to do this would be to use poll() instead of wait(). Then
you can check both processes for completion in a loop and start a new
process when one of the current ones ends. You could keep the active
processes in a list. Make sure you put a sleep() in the polling loop,
otherwise the loop will consume your CPU!

Another approach is to use a thread pool with one worker for each
process. The thread would call wait() on its child process; when it
finishes the thread will take a new task off the queue. There are
several thread pool recipes in the Python Cookbook, for example:

http://code.activestate.com/recipes/203871/
http://code.activestate.com/recipes/576576/ (this one has many links
to other pool implementations)

Kent
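The thread-pool-with-a-queue approach Kent describes can be sketched in a handful of lines with the standard threading and queue modules (shown in Python 3 spelling, where the module is named queue; the no-op `true` command stands in for the real flac/lame command line so the sketch is runnable):

```python
import queue
import subprocess
import threading

def worker(tasks, done):
    """Pull file names off the queue until it is empty; each worker
    blocks in wait() on its own child process."""
    while True:
        try:
            name = tasks.get_nowait()
        except queue.Empty:
            return
        p = subprocess.Popen(["true", name])  # stand-in command
        p.wait()
        done.append(name)

tasks = queue.Queue()
done = []
for name in ["a.flac", "b.flac", "c.flac"]:
    tasks.put(name)

# Two workers => at most two conversions in flight at any moment.
threads = [threading.Thread(target=worker, args=(tasks, done))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because get_nowait() hands each file to exactly one worker, no file is converted twice, and the pool drains the queue with at most two children running at once.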
Re: [Tutor] Newbie Wondering About Threads
On Sat, 06 Dec 2008 21:43:11 -0500, Damon Timm wrote:
> On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <[EMAIL PROTECTED]>
> wrote:
>> I'm on my phone so excuse the simple reply. From what I skimmed you
>> are wrapping shell commands which is what I do all the time. Some
>> hints: 1) look into popen or subprocess in place of execute for more
>> flexibility. I use popen a lot and assigning a popen call to an
>> object name lets you parse the output and make informed decisions
>> depending on what the shell program outputs.
>
> So I took a peek at subprocess.Popen --> looks like that's the
> direction I would be headed for parallel processes ... a real simple
> way to see it work for me was:
>
> p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"])
> p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"])
> p2.wait()
> p3.wait()

I think when you do that (p2.wait() then p3.wait()), if p3 finishes
first, you wouldn't start another p3 until p2 has finished (i.e. until
p2.wait() returns), and if p2 finishes first, you wouldn't start
another p2 until p3 finishes (i.e. until p3.wait() returns).

The solution would be to start and wait() the subprocesses in two
threads. Use the threading module or -- if you use Python 2.6 -- the
new multiprocessing module.

Alternatively, you could do a "non-blocking wait", i.e. poll the
process:

while True:
    if p1.poll(): # start another p1
    if p2.poll(): # start another p2

> top showed that both cores get busy and it takes half the time! So
> that's great -- when I tried to add the flac decoding through stdout
> I was able to accomplish it as well ... I was mimicking the command
> "flac --decode --stdout test.flac | lame - test.mp3" ... see:
>
> p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
>                      stdout=subprocess.PIPE)
> p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
> p2.communicate(p.communicate()[0])
>
> That did the trick - it worked! However, it was *very* slow! The
> python script has a "real" time of 2m22.504s whereas if I run it from
> the command line it is only 0m18.594s. Not sure why this is ...
>
> The last piece of my puzzle though, I am having trouble wrapping my
> head around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program
> to tackle compressing two at a time ... but not more than two at a
> time (or four, or eight, or whatever) because that's not going to
> help me at all (I have dual cores right now) ... I am having trouble
> thinking how I can create the algorithm that would do this for me ...
>
> Thanks everyone. Maybe after a good night's sleep it will come to me.
> If you have any ideas - would love to hear them.
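One caveat about the `if p1.poll():` sketch above: poll() returns None while the child is still running and the exit status once it has ended -- and a successful exit status is 0, which is falsy, so a plain truth test would miss successful completions. Comparing against None avoids this:

```python
import subprocess

p = subprocess.Popen(["true"])  # a child that exits successfully
p.wait()                        # let it finish, for demonstration

# Note: "if p.poll():" would be False here even though the child has
# ended, because the successful exit status is 0. Test against None:
finished = p.poll() is not None
```

This is why the pseudo-code elsewhere in the thread filters with `p.poll() is None` rather than a bare truth test.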
Re: [Tutor] Newbie Wondering About Threads
Damon Timm wrote:
> On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <[EMAIL PROTECTED]> wrote:
>> I'm on my phone so excuse the simple reply.
>> From what I skimmed you are wrapping shell commands which is what I
>> do all the time. Some hints: 1) look into popen or subprocess in
>> place of execute for more flexibility. I use popen a lot and
>> assigning a popen call to an object name lets you parse the output
>> and make informed decisions depending on what the shell program
>> outputs.
>
> So I took a peek at subprocess.Popen --> looks like that's the
> direction I would be headed for parallel processes ... a real simple
> way to see it work for me was:
>
> p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"])
> p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"])
> p2.wait()
> p3.wait()
>
> top showed that both cores get busy and it takes half the time! So
> that's great -- when I tried to add the flac decoding through stdout
> I was able to accomplish it as well ... I was mimicking the command
> "flac --decode --stdout test.flac | lame - test.mp3" ... see:
>
> p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
>                      stdout=subprocess.PIPE)
> p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
> p2.communicate(p.communicate()[0])
>
> That did the trick - it worked! However, it was *very* slow! The
> python script has a "real" time of 2m22.504s whereas if I run it from
> the command line it is only 0m18.594s. Not sure why this is ...

I'm not certain this completely explains the poor performance, if at
all, but the communicate method of Popen objects will wait until EOF is
reached and the process ends. So IIUC, in your example the process 'p'
runs to completion and only then is its stdout (p.communicate()[0])
passed to stdin of 'p2' by the outer communicate call.

You might try something like this (untested!) ...

p1 = subprocess.Popen(
    ["flac","--decode","--stdout","test.flac"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
p2 = subprocess.Popen(
    ["lame","-","test.mp3"], stdin=p1.stdout,  # <--
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
p2.communicate()

... where you directly assign the stdin of 'p2' to be the stdout of
'p1'.

> The last piece of my puzzle though, I am having trouble wrapping my
> head around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program
> to tackle compressing two at a time ... but not more than two at a
> time (or four, or eight, or whatever) because that's not going to
> help me at all (I have dual cores right now) ... I am having trouble
> thinking how I can create the algorithm that would do this for me ...

Interesting problem, and not an easy one IMHO, unless you're content
with waiting for a pair of processes to complete before starting two
more. In that case you can just grab two filenames at a time from the
list, define the Popen calls, and wait for (or communicate with) both
before continuing with another pair.

But since you probably want your script to stay busy, and it's
reasonable to assume (I think!) that one of the processes may finish
much sooner or much later than the other ... well, it is a bit tricky
(for me, anyway). Here is my simplistic, not-very-well-thought-out
attempt in pseudo-code, perhaps it will get you started ...

paths = ["file1.flac","file2.flac", ... "file11.flac"]
procs = []
while paths or procs:
    procs = [p for p in procs if p.poll() is None]
    while paths and len(procs) < 2:
        flac = paths.pop(0)
        procs.append(Popen(['...', flac], ...))
    time.sleep(1)

The idea here is to keep track of running processes in a list, remove
them when they've terminated, and start (append) new processes as
necessary up to the desired max, only while there are files remaining
or processes running.

HTH,
Marty
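A small addition to the pipeline pattern above: the subprocess documentation's shell-pipeline example also closes the parent's copy of p1.stdout right after starting p2, so that p1 receives SIGPIPE if p2 exits early. A runnable sketch, with generic commands standing in for flac and lame:

```python
import subprocess

# echo | wc -c plays the role of flac | lame here.
p1 = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["wc", "-c"], stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p1.stdout.close()  # allow p1 to receive SIGPIPE if p2 exits first
out = p2.communicate()[0]  # byte count of "hello\n"
```

The data still flows directly between the two children through the OS pipe; only the parent's extra handle on it is dropped.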
Re: [Tutor] Newbie Wondering About Threads
On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <[EMAIL PROTECTED]> wrote:
> I'm on my phone so excuse the simple reply.
> From what I skimmed you are wrapping shell commands which is what I
> do all the time. Some hints: 1) look into popen or subprocess in
> place of execute for more flexibility. I use popen a lot and
> assigning a popen call to an object name lets you parse the output
> and make informed decisions depending on what the shell program
> outputs.

So I took a peek at subprocess.Popen --> looks like that's the
direction I would be headed for parallel processes ... a real simple
way to see it work for me was:

p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"])
p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"])
p2.wait()
p3.wait()

top showed that both cores get busy and it takes half the time! So
that's great -- when I tried to add the flac decoding through stdout I
was able to accomplish it as well ... I was mimicking the command
"flac --decode --stdout test.flac | lame - test.mp3" ... see:

p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
                     stdout=subprocess.PIPE)
p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
p2.communicate(p.communicate()[0])

That did the trick - it worked! However, it was *very* slow! The
python script has a "real" time of 2m22.504s whereas if I run it from
the command line it is only 0m18.594s. Not sure why this is ...

The last piece of my puzzle though, I am having trouble wrapping my
head around ... I will have a list of files
["file1.flac","file2.flac","file3.flac","etc"] and I want the program
to tackle compressing two at a time ... but not more than two at a time
(or four, or eight, or whatever) because that's not going to help me at
all (I have dual cores right now) ... I am having trouble thinking how
I can create the algorithm that would do this for me ...

Thanks everyone. Maybe after a good night's sleep it will come to me.
If you have any ideas - would love to hear them.

Damon
[Tutor] Newbie Wondering About Threads
Hi Everyone - I am a complete and utter Python newbie (as of today,
honestly) -- am interested in expanding my programming horizons beyond
bash scripting and thought Python would be a nice match for me.

To start, I thought I might try re-writing some of my bash scripts in
Python as a learning tool, and the first one I wanted to tackle was a
script that converts .flac audio files into .mp3 files ... the basic
idea is I supply a sourceDirectory and targetDirectory and then
recursively convert the source file tree into an identical target file
tree filled with mp3 files. I'm sure this has been done before (by
those much wiser than me) but I figured I can learn something as I go
... for what I've accomplished so far, it seems pretty ugly! But I'm
learning ...

Anyhow, I think I got the basics down but I had a thought: can I thread
this program to utilize all of my cores? And if so, how? Right now,
the lame audio encoder is only hitting one core ... I could do all this
faster if I could pass a variable that says: open 2 or 4 threads
instead.

Here is what I've been working on so far -- would appreciate any
insight you may have.

Thanks,
Damon

#!/usr/bin/env python

import os
import sys
import fnmatch
from os import system

fileList = []
rootDir = sys.argv[1]
targetDir = sys.argv[2]

def shell_quote(s):
    """Quote and escape the given string (if necessary) for inclusion
    in a shell command"""
    return "\"%s\"" % s.replace('"', '\\"')

def _mkdir(newdir):
    """works the way a good mkdir should :)
    - already exists, silently complete
    - regular file in the way, raise an exception
    - parent directory(ies) does not exist, make them as well
    http://code.activestate.com/recipes/82465/
    """
    if os.path.isdir(newdir):
        pass
    elif os.path.isfile(newdir):
        raise OSError("a file with the same name as the desired "
                      "dir, '%s', already exists." % newdir)
    else:
        head, tail = os.path.split(newdir)
        if head and not os.path.isdir(head):
            _mkdir(head)
        #print "_mkdir %s" % repr(newdir)
        if tail:
            os.mkdir(newdir)

# get all the flac files and directory structures
for dirpath, subFolders, files in os.walk(rootDir):
    for file in files:
        if fnmatch.fnmatch(file, '*.flac'):
            flacFileInfo = [os.path.join(dirpath, file), dirpath + "/",
                            file, dirpath.lstrip(rootDir) + "/"]
            fileList.append(flacFileInfo)

# create new directory structure and mp3 files
for sourceFile, dir, flacfile, strip in fileList:
    mp3File = shell_quote(targetDir + strip +
                          flacfile.strip('.flac') + ".mp3")
    mp3FileDir = targetDir + strip
    sourceFile = shell_quote(sourceFile)
    _mkdir(mp3FileDir)
    flacCommand = "flac --decode --stdout --silent " + sourceFile + \
                  " | lame -V4 --silent - " + mp3File
    system(flacCommand)
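A caution on two spots in the script above: str.strip() and str.lstrip() remove a *set of characters*, not a substring, so flacfile.strip('.flac') also eats leading or trailing f/l/a/c letters of the name itself (try it on 'local.flac'), and dirpath.lstrip(rootDir) can over-strip similarly. The os.path functions are safer; a sketch, with target_path being an illustrative helper name (os.path.relpath needs Python 2.6+):

```python
import os.path

def target_path(source_file, root_dir, target_dir):
    """Map a source .flac path to the matching .mp3 path under
    target_dir, mirroring the directory layout."""
    rel = os.path.relpath(source_file, root_dir)  # drop the root prefix
    base, _ = os.path.splitext(rel)               # drop the extension
    return os.path.join(target_dir, base + ".mp3")
```

For example, "/music/flac/album/song.flac" maps to "/music/mp3/album/song.mp3", and a name like "local.flac" keeps all of its letters.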