Thanks to all for your replies. I want to clarify what I mean by a pipeline. A major feature I am looking for is the ability to chain functions or scripts together, where the output of one script (usually a file) is required for another script to run, so one script has to wait for the other. I would like to do this over a cluster, where some of the scripts are distributed as separate jobs but the results are then collected together. So the ideal library would have easy facilities for expressing things like this: scripts X and Y run independently, but script Z depends on the output of X and Y (which is such-and-such file or file flag).
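To make the X/Y/Z case concrete, here is a rough sketch using only the standard library. It is not how ruffus or any other package does it; `run_pipeline`, `demo`, and the `script_x`/`script_y`/`script_z` functions are hypothetical names invented for illustration. It also runs everything in local threads, so on a real cluster the `pool.submit` call would have to be replaced by whatever submits a job and waits for its output file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def run_pipeline(tasks):
    """Run tasks in dependency order; tasks that are ready together run concurrently.

    tasks maps a name to (func, deps). func receives the output paths of its
    dependencies, in the order listed in deps, and returns the Path it wrote.
    """
    done = {}              # task name -> output Path
    pending = dict(tasks)  # tasks that have not run yet
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready once every dependency has produced its output.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in done for dep in deps)]
            if not ready:
                raise ValueError("circular or missing dependency: %s" % sorted(pending))
            # Submit the whole batch first so its tasks run side by side.
            futures = {}
            for name in ready:
                func, deps = pending.pop(name)
                futures[name] = pool.submit(func, *[done[dep] for dep in deps])
            # Wait for the batch and check that each output file really exists.
            for name, future in futures.items():
                output = future.result()
                if not output.exists():
                    raise RuntimeError("task %s did not write %s" % (name, output))
                done[name] = output
    return done


def demo(workdir):
    """Hypothetical example: X and Y are independent, Z needs both outputs."""
    def script_x():
        out = workdir / "x.txt"
        out.write_text("1 2 3")
        return out

    def script_y():
        out = workdir / "y.txt"
        out.write_text("4 5")
        return out

    def script_z(x_out, y_out):
        numbers = (x_out.read_text() + " " + y_out.read_text()).split()
        out = workdir / "z.txt"
        out.write_text(str(sum(int(n) for n in numbers)))
        return out

    tasks = {
        "Z": (script_z, ["X", "Y"]),  # listed first to show that order does not matter
        "X": (script_x, []),
        "Y": (script_y, []),
    }
    return run_pipeline(tasks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = demo(Path(tmp))
        print(results["Z"].read_text())
```

The sketch works in batches: everything whose dependencies are satisfied is submitted together, and the next batch starts only when the current one has finished. That is simpler than a full scheduler, but it means a slow task can hold up unrelated tasks in the next batch.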
Is there a way to do this? I would prefer not to use a framework that requires control of the cluster, etc., like Disco, but something lightweight and simple. Right now ruffus seems most relevant, but I am not sure. Are there other candidates?

Thank you.

On Nov 23, 4:02 am, Paul Rudin <paul.nos...@rudin.co.uk> wrote:
> per <perfr...@gmail.com> writes:
>
> > hi all,
> >
> > i am looking for a python package to make it easier to create a
> > "pipeline" of scripts (all in python). what i do right now is have a
> > set of scripts that produce certain files as output, and i simply have
> > a "master" script that checks at each stage whether the output of the
> > previous script exists, using functions from the os module. this has
> > several flaws and i am sure someone has thought of nice abstractions
> > for making these kind of wrappers easier to write.
> >
> > does anyone have any recommendations for python packages that can do
> > this?
>
> Not entirely what you're looking for, but the subprocess module is
> easier to work with for this sort of thing than os. See e.g.
> <http://docs.python.org/library/subprocess.html#replacing-shell-pipeline>