Re: [Tutor] Newbie Wondering About Threads

2008-12-07 Thread Damon Timm
On Sun, Dec 7, 2008 at 9:35 PM, Kent Johnson <[EMAIL PROTECTED]> wrote:
> There is no need to include both the flac file name and the mp3 file
> name if the roots match. You can use os.path functions to split the
> extension or the quick-and-dirty way:
>  mp3file = flacfile.rsplit('.', 1)[0] + '.mp3'

That is *so* what I was looking for!

You guys are awesome.

Damon



Re: [Tutor] Newbie Wondering About Threads

2008-12-07 Thread Kent Johnson
On Sun, Dec 7, 2008 at 3:10 PM, Damon Timm <[EMAIL PROTECTED]> wrote:

> I think I did it!  Woo hoo!  (cheers all around! drinks on me!)

Cool! Where are we meeting for drinks? ;-)

> flacFiles = [["test.flac","test.mp3"],["test2.flac","test2.mp3"],\
>["test3.flac","test3.mp3"],["test4.flac","test4.mp3"],\
>["test5.flac","test5.mp3"],["test6.flac","test6.mp3"]]

There is no need to include both the flac file name and the mp3 file
name if the roots match. You can use os.path functions to split the
extension or the quick-and-dirty way:
  mp3file = flacfile.rsplit('.', 1)[0] + '.mp3'
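
For example, both approaches side by side (a quick sketch, untested):

  import os.path

  flacfile = "test.flac"
  mp3file = os.path.splitext(flacfile)[0] + '.mp3'  # the os.path way
  mp3file = flacfile.rsplit('.', 1)[0] + '.mp3'     # the quick-and-dirty way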

Kent


Re: [Tutor] Newbie Wondering About Threads

2008-12-07 Thread Damon Timm
On Sun, Dec 7, 2008 at 10:47 AM, Kent Johnson <[EMAIL PROTECTED]> wrote:
> A function as mentioned above would help. For the threaded solution
> the function could just start the child process and wait for it to
> finish; it doesn't have to return anything. Each thread will block on
> its associated child.

I think I did it!  Woo hoo!  (cheers all around! drinks on me!)

First, I found that using the Popen.communicate() function wasn't
going to work (because it sits there and waits until the process is
done before continuing); so, I ditched that, created my own little
function that returns the Popen object, and went from there ... I
mixed in one super-long audio file with all the others and it seems to
work without a hitch (so far) ... watching top, I see both processors
running at max during the lame processing.

Check it out (there are probably sexier ways to populate the *.mp3
files but I was more interested in the threads):
---
import time
import subprocess

totProcs = 2  # number of processes to spawn before waiting
flacFiles = [["test.flac","test.mp3"],["test2.flac","test2.mp3"],
             ["test3.flac","test3.mp3"],["test4.flac","test4.mp3"],
             ["test5.flac","test5.mp3"],["test6.flac","test6.mp3"]]
procs = []

def flac_to_mp3(flacfile, mp3file):
    print "beginning to process " + flacfile
    # decode the flac to stdout and pipe it straight into lame
    p = subprocess.Popen(["flac","--decode","--stdout","--silent",flacfile],
                         stdout=subprocess.PIPE)
    p1 = subprocess.Popen(["lame","--silent","-",mp3file], stdin=p.stdout)
    return p1

while flacFiles or procs:
    # drop the encoders that have finished; keep the ones still running
    procs = [p for p in procs if p.poll() is None]
    while flacFiles and len(procs) < totProcs:
        file = flacFiles.pop(0)
        procs.append(flac_to_mp3(file[0], file[1]))
    time.sleep(1)
--[EOF]--

Thanks again - onward I go!

Damon


Re: [Tutor] Newbie Wondering About Threads

2008-12-07 Thread Kent Johnson
On Sun, Dec 7, 2008 at 8:58 AM, Damon Timm <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 7, 2008 at 12:33 AM, Martin Walsh <[EMAIL PROTECTED]> wrote:

>> Here is my simplistic, not-very-well-thought-out, attempt in
>> pseudo-code, perhaps it will get you started ...
>>
>> paths = ["file1.flac","file2.flac", ... "file11.flac"]
>> procs = []
>> while paths or procs:
>>    procs = [p for p in procs if p.poll() is None]
>>    while paths and len(procs) < 2:
>>        flac = paths.pop(0)
>>        procs.append(Popen(['...', flac], ...))
>>    time.sleep(1)
>
> I think I got a little lost with the "procs = [p for p in procs if
> p.poll() is None]" statement

It's called a list comprehension
http://personalpages.tds.net/~kent37/kk/3.html

Essentially it creates a new list from all the elements in the old
list that are still running.
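
A toy example of the same filtering pattern (nothing to do with Popen,
just the shape of it):

  nums = [1, 2, 3, 4, 5, 6]
  evens = [n for n in nums if n % 2 == 0]  # [2, 4, 6]

In your case the condition is p.poll() is None, i.e. "this process
hasn't finished yet".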

> On Sun, Dec 7, 2008 at 2:58 AM, Lie Ryan <[EMAIL PROTECTED]> wrote:

> Yea, looks like it - I think the trick, for me, will be getting a
> dynamic list that can be iterated through ... I experimented a little
> with the .poll() function and I think I follow how it is working ...
> but really, I am going to have to do a little more "pre-thinking" than
> I had to do with the bash version ... not sure if I should create a
> class containing the list of flac files or just a number of functions
> to handle the list ... whatever way it ends up being, it is going to take
> a little thought to get it straightened out.  And the object-oriented
> part is different than bash -- so, I have to "think
> different" too.

I don't think you need any classes for this. A simple list of file
names should be fine. A function that takes a file name as a
parameter, starts a process to process the file, and returns the
resulting Popen object would also be helpful.
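
A rough sketch of such a function (untested, and the name start_encoder
is just my placeholder):

import subprocess

def start_encoder(flacfile):
    """Start converting one flac file; return the lame Popen object."""
    mp3file = flacfile.rsplit('.', 1)[0] + '.mp3'
    decoder = subprocess.Popen(
        ["flac", "--decode", "--stdout", "--silent", flacfile],
        stdout=subprocess.PIPE)
    encoder = subprocess.Popen(
        ["lame", "--silent", "-", mp3file], stdin=decoder.stdout)
    return encoder  # the caller can poll() or wait() on this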

> On Sun, Dec 7, 2008 at 8:31 AM, Kent Johnson <[EMAIL PROTECTED]> wrote:

> Oh neat!  I will be honest, more than one screen full of code and I
> get a little overwhelmed (at this point) but I am going to check that
> idea out.  I was thinking something along these lines, where I can
> send all the input/output variables along with a number argument
> (threads) to a class/function that would then handle everything ... so
> using a thread pool may make sense ...
>
> Looks like I would create a loop that went through the list of all the
> files to be converted and then sent them all off, one by one, to the
> thread pool -- which would then just dish them out so that no more
> than 2 (if I chose that) would be converting at a time?

Yes, that's right.

>  I gotta try
> and wrap my head around it ... also, I will be using two subprocesses
> to accomplish a single command (one writing to stdout and the other
> reading from stdin) as well ... so they have to be packaged together somehow ...

A function as mentioned above would help. For the threaded solution
the function could just start the child process and wait for it to
finish; it doesn't have to return anything. Each thread will block on
its associated child.
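
For the worker function, something like this rough sketch (untested;
encode_and_wait is my placeholder name):

import subprocess

def encode_and_wait(flacfile, mp3file):
    # start the flac | lame pipeline, then block this thread
    # (and only this thread) until the encoding finishes
    decoder = subprocess.Popen(
        ["flac", "--decode", "--stdout", "--silent", flacfile],
        stdout=subprocess.PIPE)
    encoder = subprocess.Popen(
        ["lame", "--silent", "-", mp3file], stdin=decoder.stdout)
    encoder.wait()
    decoder.wait()  # reap the decoder as well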

Kent


Re: [Tutor] Newbie Wondering About Threads

2008-12-07 Thread Damon Timm
On Sun, Dec 7, 2008 at 12:33 AM, Martin Walsh <[EMAIL PROTECTED]> wrote:
> I'm not certain this completely explains the poor performance, if at
> all, but the communicate method of Popen objects will wait until EOF is
> reached and the process ends. So IIUC, in your example the process 'p'
> runs to completion and only then is its stdout (p.communicate()[0])
> passed to stdin of 'p2' by the outer communicate call.
>
> You might try something like this (untested!) ...
>
> p1 = subprocess.Popen(
>     ["flac","--decode","--stdout","test.flac"],
>     stdout=subprocess.PIPE, stderr=subprocess.PIPE
> )
> p2 = subprocess.Popen(
>     ["lame","-","test.mp3"], stdin=p1.stdout, # <--
>     stdout=subprocess.PIPE, stderr=subprocess.PIPE
> )
> p2.communicate()

That did the trick!  Got it back down to 20s ... which is what it was
taking on the command line.  Thanks for that!

> Here is my simplistic, not-very-well-thought-out, attempt in
> pseudo-code, perhaps it will get you started ...
>
> paths = ["file1.flac","file2.flac", ... "file11.flac"]
> procs = []
> while paths or procs:
>    procs = [p for p in procs if p.poll() is None]
>    while paths and len(procs) < 2:
>        flac = paths.pop(0)
>        procs.append(Popen(['...', flac], ...))
>    time.sleep(1)

I think I got a little lost with the "procs = [p for p in procs if
p.poll() is None]" statement -- I'm not sure exactly what that is
doing ... but otherwise, I think that makes sense ... will have to try
it out (if not one of the more "robust" thread pool suggestions
below).

On Sun, Dec 7, 2008 at 2:58 AM, Lie Ryan <[EMAIL PROTECTED]> wrote:
> I think when you do that (p2.wait() then p3.wait() ), if p3 finishes
> first, you wouldn't start another p3 until p2 has finished (i.e. until
> p2.wait() returns), and if p2 finishes first, you wouldn't start another
> p2 until p3 finishes (i.e. until p3.wait() returns).
>
> The solution would be to start and wait() the subprocesses in two
> threads. Use the threading module or -- if you use python2.6 -- the new
> multiprocessing module.
>
> Alternatively, you could do a "non-blocking wait", i.e. poll the processes.
>
> while True:
>     if p1.poll() is not None: # p1 finished; start another p1
>     if p2.poll() is not None: # p2 finished; start another p2

Yea, looks like it - I think the trick, for me, will be getting a
dynamic list that can be iterated through ... I experimented a little
with the .poll() function and I think I follow how it is working ...
but really, I am going to have to do a little more "pre-thinking" than
I had to do with the bash version ... not sure if I should create a
class containing the list of flac files or just a number of functions
to handle the list ... whatever way it ends up being, it is going to take
a little thought to get it straightened out.  And the object-oriented
part is different than bash -- so, I have to "think
different" too.

On Sun, Dec 7, 2008 at 8:31 AM, Kent Johnson <[EMAIL PROTECTED]> wrote:
> A simple way to do this would be to use poll() instead of wait(). Then
> you can check both processes for completion in a loop and start a new
> process when one of the current ones ends. You could keep the
> active processes in a list. Make sure you put a sleep() in the polling
> loop, otherwise the loop will consume your CPU!

Thanks for that tip - I already throttled my CPU and had to abort the
first time (without the sleep() function) ... smile.

> Another approach is to use a thread pool with one worker for each
> process. The thread would call wait() on its child process; when it
> finishes the thread will take a new task off the queue. There are
> several thread pool recipes in the Python cookbook, for example
> http://code.activestate.com/recipes/203871/
> http://code.activestate.com/recipes/576576/ (this one has many links
> to other pool implementations)

Oh neat!  I will be honest, more than one screen full of code and I
get a little overwhelmed (at this point) but I am going to check that
idea out.  I was thinking something along these lines, where I can
send all the input/output variables along with a number argument
(threads) to a class/function that would then handle everything ... so
using a thread pool may make sense ...

Looks like I would create a loop that went through the list of all the
files to be converted and then sent them all off, one by one, to the
thread pool -- which would then just dish them out so that no more
than 2 (if I chose that) would be converting at a time?  I gotta try
and wrap my head around it ... also, I will be using two subprocesses
to accomplish a single command (one writing to stdout and the other
reading from stdin) as well ... so they have to be packaged together somehow ...
hmm!

Great help everyone.  Not quite as simple as single threading but am
learning quite a bit.  One day, I will figure it out.  Smile.

Damon


Re: [Tutor] Newbie Wondering About Threads

2008-12-07 Thread Kent Johnson
On Sat, Dec 6, 2008 at 9:43 PM, Damon Timm <[EMAIL PROTECTED]> wrote:


> The last piece of my puzzle though, I am having trouble wrapping my
> head around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program
> to tackle compressing two at a time ... but not more than two at a
> time (or four, or eight, or whatever) because that's not going to help
> me at all (I have dual cores right now) ... I am having trouble
> thinking how I can create the algorithm that would do this for me ...

A simple way to do this would be to use poll() instead of wait(). Then
you can check both processes for completion in a loop and start a new
process when one of the current ones ends. You could keep the
active processes in a list. Make sure you put a sleep() in the polling
loop, otherwise the loop will consume your CPU!

Another approach is to use a thread pool with one worker for each
process. The thread would call wait() on its child process; when it
finishes the thread will take a new task off the queue. There are
several thread pool recipes in the Python cookbook, for example
http://code.activestate.com/recipes/203871/
http://code.activestate.com/recipes/576576/ (this one has many links
to other pool implementations)
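
A minimal sketch of the pool idea (untested; the worker function and
queue layout here are mine, not taken from those recipes):

import subprocess
import threading
import Queue  # "queue" in Python 3

def worker(q):
    while True:
        flacfile, mp3file = q.get()
        try:
            decoder = subprocess.Popen(
                ["flac", "--decode", "--stdout", "--silent", flacfile],
                stdout=subprocess.PIPE)
            encoder = subprocess.Popen(
                ["lame", "--silent", "-", mp3file], stdin=decoder.stdout)
            encoder.wait()  # this thread blocks on its own child
            decoder.wait()
        finally:
            q.task_done()

q = Queue.Queue()
for _ in range(2):  # one worker per core
    t = threading.Thread(target=worker, args=(q,))
    t.setDaemon(True)
    t.start()

for name in ["test.flac", "test2.flac", "test3.flac"]:
    q.put((name, name.rsplit('.', 1)[0] + '.mp3'))
q.join()  # block until every queued file has been encoded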

Kent


Re: [Tutor] Newbie Wondering About Threads

2008-12-06 Thread Lie Ryan
On Sat, 06 Dec 2008 21:43:11 -0500, Damon Timm wrote:

> On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <[EMAIL PROTECTED]>
> wrote:
>> I'm on my phone so excuse the simple reply. From what I skimmed you are
>> wrapping shell commands which is what I do all the time. Some hints. 1)
>> look into popen or subprocess in place of execute for more flexibility.
>> I use popen a lot and assigning a popen call to an object name lets
>> you parse the output and make informed decisions depending on what the
>> shell program outputs.
> 
> So I took a peek at subprocess.Popen --> looks like that's the direction
> I would be headed for parallel processes ... a real simple way to see it
> work for me was:
> 
> p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"]) 
> p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"]) 
> p2.wait()
> p3.wait()

I think when you do that (p2.wait() then p3.wait() ), if p3 finishes
first, you wouldn't start another p3 until p2 has finished (i.e. until
p2.wait() returns), and if p2 finishes first, you wouldn't start another
p2 until p3 finishes (i.e. until p3.wait() returns).

The solution would be to start and wait() the subprocesses in two
threads. Use the threading module or -- if you use python2.6 -- the new
multiprocessing module.
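
A bare-bones version of the two-thread idea (a sketch, untested;
"encode" is just a stand-in name, reusing your p2/p3 example):

import subprocess
import threading

def encode(wav, mp3):
    p = subprocess.Popen(["lame", "--silent", wav, mp3])
    p.wait()  # blocks only this thread, not the whole program

t2 = threading.Thread(target=encode, args=("test.wav", "test.mp3"))
t3 = threading.Thread(target=encode, args=("test2.wav", "test2.mp3"))
t2.start(); t3.start()
t2.join(); t3.join()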

Alternatively, you could do a "non-blocking wait", i.e. poll the processes.
(Note that poll() returns None while the process is still running, and
the exit code -- which may be 0 -- once it has finished, hence the
explicit None test.)

while True:
    if p1.poll() is not None: # p1 finished; start another p1
    if p2.poll() is not None: # p2 finished; start another p2

> 
> top showed that both cores get busy and it takes half the time!  So
> that's great -- when I tried to add the flac decoding through stdout I
> was able to accomplish it as well ... I was mimicing the command of
> "flac --decode --stdout test.flac | lame - test.mp3" ... see:
> 
> p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
>                      stdout=subprocess.PIPE)
> p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
> p2.communicate(p.communicate()[0])
> 
> That did the trick - it worked!  However, it was *very* slow!  The
> python script has a "real" time of 2m22.504s whereas if I run it from
> the command line it is only 0m18.594s.  Not sure why this is ...
> 
> The last piece of my puzzle though, I am having trouble wrapping my head
> around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program to
> tackle compressing two at a time ... but not more than two at a time (or
> four, or eight, or whatever) because that's not going to help me at all
> (I have dual cores right now) ... I am having trouble thinking how I can
> create the algorithm that would do this for me ...
> 
> Thanks everyone.  Maybe after a good night's sleep it will come to me.
>  If you have any ideas - would love to hear them.



Re: [Tutor] Newbie Wondering About Threads

2008-12-06 Thread Martin Walsh
Damon Timm wrote:
> On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <[EMAIL PROTECTED]> wrote:
>> I'm on my phone so excuse the simple reply.
>> From what I skimmed you are wrapping shell commands which is what I do
>> all the time. Some hints. 1) look into popen or subprocess in place of
>> execute for more flexibility. I use popen a lot and assigning a popen
>> call to an object name lets you parse the output and make informed
>> decisions depending on what the shell program outputs.
> 
> So I took a peek at subprocess.Popen --> looks like that's the
> direction I would be headed for parallel processes ... a real simple
> way to see it work for me was:
> 
> p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"])
> p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"])
> p2.wait()
> p3.wait()
> 
> top showed that both cores get busy and it takes half the time!  So
> that's great -- when I tried to add the flac decoding through stdout I
> was able to accomplish it as well ... I was mimicking the command of
> "flac --decode --stdout test.flac | lame - test.mp3" ... see:
> 
> p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
>                      stdout=subprocess.PIPE)
> p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
> p2.communicate(p.communicate()[0])
> 
> That did the trick - it worked!  However, it was *very* slow!  The
> python script has a "real" time of 2m22.504s whereas if I run it from
> the command line it is only 0m18.594s.  Not sure why this is ...

I'm not certain this completely explains the poor performance, if at
all, but the communicate method of Popen objects will wait until EOF is
reached and the process ends. So IIUC, in your example the process 'p'
runs to completion and only then is its stdout (p.communicate()[0])
passed to stdin of 'p2' by the outer communicate call.

You might try something like this (untested!) ...

p1 = subprocess.Popen(
    ["flac","--decode","--stdout","test.flac"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
p2 = subprocess.Popen(
    ["lame","-","test.mp3"], stdin=p1.stdout, # <--
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
p2.communicate()

... where you directly assign the stdin of 'p2' to be the stdout of 'p1'.
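
(One refinement you'll see in subprocess pipeline examples, noted here
as an aside: closing the parent's copy of the pipe after starting p2
lets p1 receive SIGPIPE if p2 exits early:

p1.stdout.close()  # the parent no longer needs its end of the pipe

Probably not critical here, but a good habit.)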

> 
> The last piece of my puzzle though, I am having trouble wrapping my
> head around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program
> to tackle compressing two at a time ... but not more than two at a
> time (or four, or eight, or whatever) because that's not going to help
> me at all (I have dual cores right now) ... I am having trouble
> thinking how I can create the algorithm that would do this for me ...

Interesting problem, and not an easy one IMHO, unless you're content
with waiting for a pair of processes to complete before starting two
more. In which case you can just grab two filenames at a time from the
list, define the Popen calls, and wait for (or communicate with) both
before continuing with another pair.

But since you probably want your script to stay busy, and it's
reasonable to assume (I think!) that one of the processes may finish
much sooner or much later than the other... well, it is a bit tricky
(for me, anyway).

Here is my simplistic, not-very-well-thought-out, attempt in
pseudo-code, perhaps it will get you started ...

paths = ["file1.flac","file2.flac", ... "file11.flac"]
procs = []
while paths or procs:
    procs = [p for p in procs if p.poll() is None]
    while paths and len(procs) < 2:
        flac = paths.pop(0)
        procs.append(Popen(['...', flac], ...))
    time.sleep(1)

The idea here is to keep track of running processes in a list, remove
them when they've terminated, and start (append) new processes as
necessary up to the desired max, only while there are files remaining or
processes running.

HTH,
Marty




Re: [Tutor] Newbie Wondering About Threads

2008-12-06 Thread Damon Timm
On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <[EMAIL PROTECTED]> wrote:
> I'm on my phone so excuse the simple reply.
> From what I skimmed you are wrapping shell commands which is what I do
> all the time. Some hints. 1) look into popen or subprocess in place of
> execute for more flexibility. I use popen a lot and assigning a popen
> call to an object name lets you parse the output and make informed
> decisions depending on what the shell program outputs.

So I took a peek at subprocess.Popen --> looks like that's the
direction I would be headed for parallel processes ... a real simple
way to see it work for me was:

p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"])
p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"])
p2.wait()
p3.wait()

top showed that both cores get busy and it takes half the time!  So
that's great -- when I tried to add the flac decoding through stdout I
was able to accomplish it as well ... I was mimicking the command of
"flac --decode --stdout test.flac | lame - test.mp3" ... see:

p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
                     stdout=subprocess.PIPE)
p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
p2.communicate(p.communicate()[0])

That did the trick - it worked!  However, it was *very* slow!  The
python script has a "real" time of 2m22.504s whereas if I run it from
the command line it is only 0m18.594s.  Not sure why this is ...

The last piece of my puzzle though, I am having trouble wrapping my
head around ... I will have a list of files
["file1.flac","file2.flac","file3.flac","etc"] and I want the program
to tackle compressing two at a time ... but not more than two at a
time (or four, or eight, or whatever) because that's not going to help
me at all (I have dual cores right now) ... I am having trouble
thinking how I can create the algorithm that would do this for me ...

Thanks everyone.  Maybe after a good night's sleep it will come to me.
 If you have any ideas - would love to hear them.

Damon


[Tutor] Newbie Wondering About Threads

2008-12-06 Thread Damon Timm
Hi Everyone - I am a complete and utter Python newbie (as of today,
honestly) -- am interested in expanding my programming horizons beyond
bash scripting and thought Python would be a nice match for me.

To start, I thought I may try re-writing some of my bash scripts in
Python as a learning tool for me ... and the first one I wanted to
tackle was a script that converts .flac audio files into .mp3 files
... basic idea is I supply a sourceDirectory and targetDirectory and
then recursively convert the source file tree into an identical target
file tree filled with mp3 files.

I'm sure this has been done before (by those much wiser than me) but I
figured I can learn something as I go ... for what I've accomplished
so far, it seems pretty ugly!  But I'm learning ...

Anyhow, I think I got the basics down but I had a thought: can I
thread this program to utilize all of my cores?  And if so, how?

Right now, the lame audio encoder is only hitting one core ... I could
do all this faster if I could pass a variable that says: open 2 or 4
threads instead.

Here is what I've been working on so far -- would appreciate any
insight you may have.

Thanks,
Damon

#!/usr/bin/env python

import os
import sys
import fnmatch
from os import system

fileList = []
rootDir = sys.argv[1]
targetDir = sys.argv[2]

def shell_quote(s):
    """Quote and escape the given string (if necessary) for inclusion in
    a shell command"""
    return "\"%s\"" % s.replace('"', '\\"')

def _mkdir(newdir):
    """works the way a good mkdir should :)
    - already exists, silently complete
    - regular file in the way, raise an exception
    - parent directory(ies) does not exist, make them as well
    http://code.activestate.com/recipes/82465/
    """
    if os.path.isdir(newdir):
        pass
    elif os.path.isfile(newdir):
        raise OSError("a file with the same name as the desired " \
                      "dir, '%s', already exists." % newdir)
    else:
        head, tail = os.path.split(newdir)
        if head and not os.path.isdir(head):
            _mkdir(head)
        #print "_mkdir %s" % repr(newdir)
        if tail:
            os.mkdir(newdir)

# get all the flac files and directory structures
for dirpath, subFolders, files in os.walk(rootDir):
    for file in files:
        if fnmatch.fnmatch(file, '*.flac'):
            # slice off rootDir rather than lstrip(rootDir):
            # lstrip strips characters, not a prefix string
            flacFileInfo = [os.path.join(dirpath, file), dirpath + "/",
                            file, dirpath[len(rootDir):] + "/"]
            fileList.append(flacFileInfo)

# create new directory structure and mp3 files
for sourceFile, dir, flacfile, strip in fileList:
    # rsplit off the extension rather than strip('.flac'):
    # strip removes characters from the ends, not a suffix
    mp3File = shell_quote(targetDir + strip + flacfile.rsplit('.', 1)[0] + ".mp3")
    mp3FileDir = targetDir + strip
    sourceFile = shell_quote(sourceFile)

    _mkdir(mp3FileDir)

    flacCommand = ("flac --decode --stdout --silent " + sourceFile +
                   " | lame -V4 --silent - " + mp3File)
    system(flacCommand)