Wow, this message turned out to be *LONG*. And it also took a long time to write. But I had fun with it, so ok. :-)
Michael Torrie wrote:
> Recently a post that mentioned a recipe that extended subprocess to
> allow killable processes caused me to do some thinking.  Some of my
> larger bash scripts are starting to become a bit unwieldy (hundreds of
> lines of code).  Yet for many things bash just works out so well because
> it is so close to the file system and processes.  As part of another
> project, I now have need of a really good library to make it almost as
> easy to do things in Python as it is in Bash.  With a simple wrapper
> around subprocess, I'm pretty much able to do most things.  Most of my
> complicated bash hackery involves using awk, sed, grep, and cut to
> process text, which python does quite nicely, thank you very much.  But
> there's a few things to add.
>
> To wit, I'm wanting to write a library that can deal with the following
> things:
>
> - spawn a process, feed it stdin, get stdout, stderr, and err code.
>   This is largely already accomplished by subprocess

It is accomplished by subprocess.Popen: the 'communicate' method handles
stdin, stdout and stderr, waiting for the process to terminate.  The
'wait' method just waits for the process to terminate and returns the
return code.  The 'returncode' attribute contains the return code (or
None if the process hasn't terminated yet).  You could write a
convenience wrapper function if you want to do this more tersely.

> - spawn off processes as background daemons

Couldn't you do this with subprocess by doing subprocess.Popen([prog])
and, well, nothing else?  (You may have/want to set stdin/stdout/stderr
too.  I dunno.)

> - spawn multiple processes and pipe output to input.
> - can do fancier things like bash does, like combine stderr/stdout,
>   switch stderr/stdout, redirects to and from files

That's possible with subprocess.
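For instance, such a convenience wrapper could look like this -- a
minimal sketch of my own, not anything subprocess itself provides
(the name 'run_capture' is made up):

    import subprocess
    import sys

    def run_capture(argv, input_text=""):
        """Run argv, feed it input_text on stdin, and return
        (stdout, stderr, returncode)."""
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # work with str instead of bytes
        )
        # communicate() writes stdin, reads stdout/stderr, and waits
        # for the process to terminate, avoiding pipe deadlocks.
        out, err = proc.communicate(input_text)
        return out, err, proc.returncode

    out, err, code = run_capture(
        [sys.executable, "-c", "print(input().upper())"], "hello\n")
    # out == "HELLO\n", err == "", code == 0
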
See this paragraph of <http://docs.python.org/lib/node528.html>:

> stdin, stdout and stderr specify the executed programs' standard
> input, standard output and standard error file handles, respectively.
> Valid values are PIPE, an existing file descriptor (a positive
> integer), an existing file object, and None.  PIPE indicates that a
> new pipe to the child should be created.  With None, no redirection
> will occur; the child's file handles will be inherited from the
> parent.  Additionally, stderr can be STDOUT, which indicates that the
> stderr data from the applications should be captured into the same
> file handle as for stdout.

And also <http://docs.python.org/lib/node535.html>.  Not the least
verbose, but pretty simple, and I bet it can do anything bash can.

> - transparently allow a python function or object to be a part of
>   the pipeline at any stage.

Hmmm.  I can't think very well at the moment, but you could create
file-like objects that do...I dunno, callbacks or something.  Simple
and incomplete mockup:

    class Pipe(object):
        def __init__(self, from_fh, to_fh,
                     from_callback=None, to_callback=None):
            self.from_fh = from_fh
            self.to_fh = to_fh
            self.from_callback = from_callback
            self.to_callback = to_callback

        def read(self, *args, **kwargs):
            data = self.from_fh.read(*args, **kwargs)
            if self.from_callback is not None:
                self.from_callback(data)
            return data

        def write(self, data):
            # XXX Call the callback before or after the data is
            # actually written?
            if self.to_callback is not None:
                self.to_callback(data)
            return self.to_fh.write(data)

That just passes input and output through itself, also passing it to
callback functions.  You'd have to add all the other methods too, like
readline and __iter__...  Maybe inheriting from 'file' would get most
of them.  I dunno how it works internally.

> Questions include, how would one design the interface for things,
> like assembling pipes?  Several ideas include:
>
> pipe([prog1,args],[prog2,args],...)
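Another way to get a Python function into the middle of a pipeline,
without a file-like wrapper at all, is to just sit between two Popen
objects yourself: read the first process's stdout, transform each line
in Python, and write the result to the second process's stdin.  A
rough sketch (the function name and the two example commands are my
own, not from any library):

    import subprocess
    import sys

    def pipeline_with_filter(first, py_filter, second):
        """Run `first | py_filter | second` and return second's stdout."""
        p1 = subprocess.Popen(first, stdout=subprocess.PIPE,
                              universal_newlines=True)
        p2 = subprocess.Popen(second, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE,
                              universal_newlines=True)
        for line in p1.stdout:          # the Python stage of the pipe
            p2.stdin.write(py_filter(line))
        p2.stdin.close()                # signal EOF to the second stage
        out = p2.stdout.read()
        p1.wait()
        p2.wait()
        return out

    # Like `echo hello | <upper-case in Python> | cat`:
    result = pipeline_with_filter(
        [sys.executable, "-c", "print('hello')"],
        str.upper,
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read())"],
    )
    # result == "HELLO\n"
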
>
> or
>
> run([prog1,args]).pipe([prog2,args]).pipe(...)
>
> The former doesn't deal very well with re-plumbing of the pipes, nor is
> there an easy way to redirect to and from a file.  The second syntax is
> more flexible but a bit cumbersome.  Also it doesn't allow redirection
> or flexible plumbing either.
>
> Any ideas on how I could design this?

Ok, the below is an edited-down, more formal-sounding brain dump.

Idea 1:

>>> run([prog, args], from_fh, to_fh, from_callback, to_callback).run(...)

It would basically just automate the construction of the intermediary
pipe objects suggested above.  It could also be done with tuples, like:

>>> run([prog, args], (from_fh, to_fh), (from_callback, to_callback)).run(...)

Idea 2:

This one would parse a list similar to a bash command line:

    run('prog', '>>out', '|', 'other_prog', 'arg', '>', 'foo.txt')

which would be like the bash pipeline `prog 2>&1 | other_prog arg
>foo.txt`.  ("2>&1" redirects stderr into stdout, combining the two
streams.)

"<", ">", ">>" and "|" would be keywords that behave similarly to
bash.  You would use e.g. ['>', 'foo.txt'] to pass an argument to a
keyword.  Along with string filenames, it would accept file-like
objects and file descriptors like subprocess.Popen does.

">>out" would be equivalent to bash's "2>&1".  Similar things like
">err" would work too.  They would be entirely separate keywords, not
['>>', 'out'] or something, so you could use "out" as a filename if
you wanted to.

If the last command in the pipeline didn't have its stdout and stderr
redirected somewhere, their file objects would be returned.  If you
wanted them going to your regular stdout and stderr, I guess you would
have to end the pipeline with [">out", ">>err"].

I just realized I made a mistake: in my keywords, I used ">x" for
stdout and ">>x" for stderr.  Bash uses ">x" and ">>x" for stdout
(truncate and append), and "2>x" and "2>>x" for stderr.  That could be
changed, or we could just leave it all confusing-like.  ;-)

Idea 3:

You could be dirty and just use os.system().
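For reference, here is what idea 2's example pipeline -- the bash
`prog 2>&1 | other_prog >foo.txt` -- expands to in plain subprocess
calls.  stderr=subprocess.STDOUT is the "2>&1" part, and passing an
open file object redirects the last stage's stdout.  The two program
bodies are stand-ins I made up so the sketch is self-contained:

    import subprocess
    import sys

    prog = [sys.executable, "-c",
            "import sys; sys.stdout.write('to stdout\\n'); "
            "sys.stderr.write('to stderr\\n')"]
    other_prog = [sys.executable, "-c",
                  "import sys; sys.stdout.write(sys.stdin.read().upper())"]

    with open("foo.txt", "w") as out:
        # prog 2>&1 ...
        p1 = subprocess.Popen(prog, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
        # ... | other_prog >foo.txt
        p2 = subprocess.Popen(other_prog, stdin=p1.stdout, stdout=out)
        p1.stdout.close()  # so p2 sees EOF when p1 exits
        p2.wait()
        p1.wait()
    # foo.txt now holds both streams, upper-cased by other_prog.
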
;-)