I'm new to Python and fairly experienced in Perl, although that experience is limited to the things I use daily.
I wrote the same script in both Perl and Python, and the output is identical. Both versions run quickly and are about the same length. Now that they both work, I've been looking over the code and wondering what Perl-specific and Python-specific improvements people more knowledgeable in each language would suggest. I'm not looking for the fewest lines, or anything else that would make the code harder to read in six months. I just want to know about any places where I'm doing something inefficiently or in a "bad" way. Both versions are below, and I'm open to comments on either.

The script reads tab-separated records from standard input and finds the best record for each unique ID (piid, field 0). The best record is defined as follows: among the records whose state (field 1) matches the desired state (field 6), take the one with the newest expiration date (field 5). If no record matches the desired state, just take the record with the newest expiration date. (Field numbers are zero-based, matching the split() indices in the code.)

Thanks for taking the time to look at these.

Shawn

##########################################################################
Perl code:
##########################################################################

#! /usr/bin/env perl

use warnings;
use strict;

my $piid;
my $row;
my %input;
my $best;
my $curr;

foreach $row (<>){
    chomp($row);
    $piid = (split(/\t/, $row))[0];
    push ( @{$input{$piid}}, $row );
}

for $piid (keys(%input)){
    $best = "";
    for $curr (@{$input{$piid}}){
        if ($best eq ""){
            $best = $curr;
        }else{
            #If the current record is the correct state
            if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
                #If existing record is the correct state
                if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
                    if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
                        $best = $curr;
                    }
                }else{
                    $best = $curr;
                }
            }else{
                #if the existing record does not have the correct state
                #and the new one has a newer expiration date
                if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
                    $best = $curr;
                }
            }
        }
    }
    print "$best\n";
}

##########################################################################
End Perl code
##########################################################################

##########################################################################
Python code
##########################################################################

#! /usr/bin/env python

import sys

input = sys.stdin
recs = {}

for row in input:
    row = row.rstrip('\n')
    piid = row.split('\t')[0]
    if recs.has_key(piid) is False:
        recs[piid] = []
    recs[piid].append(row)

for piid in recs.keys():
    best = ""
    for current in recs[piid]:
        if best == "":
            best = current;
        else:
            #If the current record is the correct state
            if current.split("\t")[1] == current.split("\t")[6]:
                #If the existing record is the correct state
                if best.split("\t")[1] == best.split("\t")[6]:
                    #If the new record has a newer exp. date
                    if current.split("\t")[5] > best.split("\t")[5]:
                        best = current
                else:
                    best = current
            else:
                #If the existing record does not have the correct state
                #and the new record has a newer exp. date
                if best.split("\t")[1] != best.split("\t")[6] and current.split("\t")[5] > best.split("\t")[5]:
                    best = current
    print best

##########################################################################
End Python code
##########################################################################

-- 
http://mail.python.org/mailman/listinfo/python-list
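One common suggestion for the Python version is to split each line once while reading, instead of calling split("\t") repeatedly during every comparison, and to let collections.defaultdict create the per-piid lists. The following is a minimal Python 3 sketch under that assumption; the function name group_by_piid is hypothetical. (The original script targets Python 2: dict.has_key() and the print statement no longer exist in Python 3.)

```python
from collections import defaultdict


def group_by_piid(lines):
    """Group tab-separated records by their first field (piid).

    Hypothetical helper: each line is split exactly once and stored as a
    list of fields, so later comparisons can index the fields directly.
    """
    groups = defaultdict(list)  # a missing piid starts as an empty list
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        groups[fields[0]].append(fields)
    return groups
```

Called as group_by_piid(sys.stdin), this returns a dict-like object that maps each piid to its records. Because each record is already a list, `"\t".join(fields)` turns one back into the original line for output.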
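The nested if/else that chooses the best record can also be expressed as a single ranking. Sort first on "is this record in the desired state?" and then on the expiration date, and take the maximum. The following is a minimal sketch of that idea, not the poster's code. It assumes that each record is already a list of fields, and that field 5 is a date string whose string order matches chronological order (e.g. YYYY-MM-DD), which the original string comparison assumes as well. The name pick_best is hypothetical.

```python
def pick_best(records):
    """Return the preferred record from a non-empty list of field lists.

    Assumed layout (from the original scripts): field 1 is the record's
    state, field 6 the desired state, field 5 a sortable expiration date.
    """
    def rank(fields):
        # True compares greater than False, so any record in the desired
        # state outranks every record that is not; the date breaks ties.
        return (fields[1] == fields[6], fields[5])

    # max() returns the first of several equally ranked records, matching
    # the original loops, which only replace on a strictly newer date.
    return max(records, key=rank)
```

For each piid's list of split records, `print("\t".join(pick_best(records)))` would reproduce one output line of the original script. The Python version compares each record's field 1 with its own field 6. The Perl version compares the existing best record's state with the current record's field 6. The two agree as long as the desired state is the same for every record of a given piid.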