If you, or anyone else reading this thread, are interested in using Python in a more advanced way, look at generators: you can build processing chains that push Python's performance to the edge and even give good old AWK a run for its money for certain types of processing.
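To make "processing chain" concrete: each generator stage pulls one item at a time from the stage before it, so nothing runs until the final consumer asks for data. A toy sketch (the stage names, the trace list and the sample data are all made up for illustration):

```python
# Toy pipeline; stage names and sample data are invented for illustration.
trace = []

def read(items):
    # First stage: hand items on one at a time.
    for item in items:
        trace.append("read %s" % item)
        yield item

def double(numbers):
    # Second stage: transform each item as it arrives.
    for n in numbers:
        trace.append("double %s" % n)
        yield n * 2

pipeline = double(read([1, 2, 3]))  # builds the chain, runs nothing yet
assert trace == []

total = sum(pipeline)               # the consumer drives the whole chain
```

After the sum, trace shows the stages interleaved item by item (read 1, double 1, read 2, ...), which is why no intermediate list is ever built.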

Python:

    wwwlog = open("access-log")
    # the last whitespace-separated field of each line is the byte count
    bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
    # '-' means no bytes were sent, so skip those entries
    bytes = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)

Python execution time: 25.96 seconds

AWK:

    % awk '{ total += $NF } END { print total }' big-access-log

AWK execution time: 37.33 seconds

The beauty of generators is that you can plug in filters at almost any stage:

    lines = lines_from_dir("big-access-log", ".")
    lines = (line for line in lines if 'robots.txt' in line)
    log = apache_log(lines)
    addrs = set(r['host'] for r in log)

The second line drastically reduced the execution time on a 1.3GB log file, since only the lines that mention robots.txt ever reach apache_log. Without it, execution took a shameful 53 minutes; with it, execution took 93 seconds.

David Beazley gave a great talk, with an accompanying PDF, at PyCon 2008. It would be great if these generator tricks and patterns got more attention in the community.

Link if interested: http://www.dabeaz.com/generators/Generators.pdf

2009/1/16 Alfons Nonell-Canals <alfons.non...@upf.edu>:
> Hello,
> I'm developing a software package using python. I've programmed all
> necessary tools but I have to use other stuff from other people. Most of
> these external scripts are developed using awk.
>
> At the beginning I thought to "translate" them and program them in python
> but I prefer to avoid it because it means a lot of work and I should do it
> after each new version of this external stuff. I would like to integrate
> them into my python code.
>
> I know I can call them using the system environment but it is slower than if
> I call them inside the package. I know it is possible with C, do you have
> experience integrating awk into python, calling these awk scripts from
> python?
>
> Thanks in advance!
>
> Regards,
> Alfons.
>
>
> --
> ------------
> Alfons Nonell-Canals, PhD
> Chemogenomics Lab
> Research Group on Biomedical Informatics (GRIB) - IMIM/UPF
> Barcelona Biomedical Research Park (PRBB)
> C/ Doctor Aiguader, 88 - 08003 Barcelona
> alfons.non...@upf.edu - http://cgl.imim.es
> Tel. +34933160528
>
> http://alfons.elmeuportal.cat
> http://www.selenocisteina.info
>
> --
> http://mail.python.org/mailman/listinfo/python-list
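
P.S. A note on the second example above: lines_from_dir and apache_log are not standard library functions, they are helpers that have to be defined somewhere. Below is a rough sketch of what they might look like. It is an approximation written for illustration, not the code from the slides; it assumes Apache's Common Log Format, and every field name other than 'host' is an assumption.

```python
import fnmatch
import os
import re

# Hypothetical stand-ins for the helpers used in the example above.

def lines_from_dir(pattern, topdir):
    """Yield every line of every file under topdir whose name matches pattern."""
    for dirpath, _dirnames, filenames in os.walk(topdir):
        for name in fnmatch.filter(filenames, pattern):
            with open(os.path.join(dirpath, name)) as f:
                for line in f:
                    yield line

# Common Log Format: host ident user [time] "request" status bytes
LOG_RE = re.compile(r'(\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\S+) (\S+)')
FIELDS = ('host', 'ident', 'user', 'datetime', 'request', 'status', 'bytes')

def apache_log(lines):
    """Turn log lines into dicts, skipping lines that do not match."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            yield dict(zip(FIELDS, m.groups()))

# Usage with a small in-memory sample instead of a real log file.
sample = [
    '1.2.3.4 - - [16/Jan/2009:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 512\n',
    'not a log line\n',
]
lines = (line for line in sample if 'robots.txt' in line)
addrs = set(r['host'] for r in apache_log(lines))
```

After this runs, addrs holds the single host '1.2.3.4'. Because the regex match is the expensive step, putting the cheap substring filter in front of apache_log means most lines are discarded before any parsing happens.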