Hi,

Something I have to do very often is filtering / transforming line-based file contents and storing the result in an array or a dictionary.
Very often the functionality already exists in the form of a shell script using sed / awk / grep, ... and I would like to have the same implementation in my Python script.

What's a compact, efficient (no intermediate arrays generated / regexps compiled only once) way in Python to write this kind of 'pipeline'?

Example 1 (in bash), annotated with comments (and thus not working if copied / pasted):

#-------------------------------------------------------------------------------------------
cat file \                              ### read from file
 | sed 's/\/\/.*//' \                   ### remove '//' comments
 | sed 's/#.*//' \                      ### remove '#' comments
 | grep -v '^\s*$' \                    ### get rid of empty lines
 | awk '{ print $1 + $2 " " $2 }' \     ### knowing that all remaining lines always contain at least \
                                        ### two integers, calculate sum and 'keep' second number
 | grep '^42 ' \                        ### keep lines for which sum is 42
 | awk '{ print $2 }'                   ### print number

Same example in perl:

# I guess (but didn't try) that the perl example will create more intermediate
# data structures than necessary.
# Ideally the python implementation shouldn't do this, but just 'chain' iterators.
#-------------------------------------------------------------------------------------------
my $filename = "file";
open(my $fh, $filename) or die "failed opening file $filename";

# order of 'pipeline' is syntactically reversed (if compared to shell script)
my @numbers =
    map  { $_->[1] }                        # extract num 2
    grep { $_->[0] == 42 }                  # keep lines with result 42
    map  { [ $_->[0]+$_->[1], $_->[1] ] }   # calculate sum of first two nums and keep second num
    map  { [ split(' ', $_, 3) ] }          # split by white space
    grep { ! ($_ =~ /^\s*$/) }              # remove empty lines
    map  { $_ =~ s/#.*// ; $_ }             # strip '#' comments
    map  { $_ =~ s/\/\/.*// ; $_ }          # strip '//' comments
    <$fh>;

print "Numbers are:\n", join("\n", @numbers), "\n";

Thanks in advance for any suggestions on how to code this (keeping the comments).

H

--
http://mail.python.org/mailman/listinfo/python-list
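
One possible shape for such a pipeline in Python is to chain generator expressions, so that each stage pulls one line at a time from the previous stage and no intermediate list is built. The following is only a minimal sketch under assumptions: the function name numbers_with_sum_42, the pattern names and the sample lines are made up for illustration, and the input is assumed to match the description above (every remaining line starts with at least two integers).

```python
import re

# Compiled once at import time, not once per line.
SLASH_COMMENT = re.compile(r"//.*")
HASH_COMMENT = re.compile(r"#.*")


def numbers_with_sum_42(lines):
    """Lazily yield the second integer of each line whose first two integers sum to 42.

    `lines` may be any iterable of strings, e.g. an open file object.
    (Hypothetical helper name, chosen for this sketch.)
    """
    lines = (SLASH_COMMENT.sub("", line) for line in lines)       # remove '//' comments
    lines = (HASH_COMMENT.sub("", line) for line in lines)        # remove '#' comments
    lines = (line for line in lines if line.strip())              # get rid of empty lines
    fields = (line.split(None, 2) for line in lines)              # split by white space
    pairs = ((int(f[0]) + int(f[1]), int(f[1])) for f in fields)  # sum, and 'keep' second number
    hits = (pair for pair in pairs if pair[0] == 42)              # keep pairs whose sum is 42
    return (second for _total, second in hits)                    # extract second number


if __name__ == "__main__":
    # Made-up sample input standing in for the contents of "file".
    sample = [
        "40 2 // answer\n",
        "# only a comment\n",
        "\n",
        "1 2 3\n",
        "  41 1 extra # trailing\n",
    ]
    print("Numbers are:")
    for number in numbers_with_sum_42(sample):
        print(number)
```

Passing an open file object instead of the sample list (for example inside `with open("file") as fh:`) should process the file line by line; nothing is materialised until the result is consumed, e.g. with `list(...)` or `dict(...)`. Unlike the perl version, the stages appear in the same order as in the shell script.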