I'm new to Python and fairly experienced in Perl, although that
experience is limited to the things I use daily.

I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.

Now that they're both working, I'm wondering what Perl-specific and
Python-specific improvements people more knowledgeable in each
language would suggest.

I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.

I'm including both the Perl and Python versions below, and I'm open to
comments on either. The script reads tab-separated records from
standard input and finds the best record for each unique ID (piid,
field 0, counting from zero). The best record is defined as follows:
among the records whose state (field 1) matches the desired state
(field 6), take the one with the newest expiration date (field 5). If
no record matches the desired state, then just take the one with the
newest expiration date.
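
As a cross-check on that definition, the rule can be sketched as a
sort key. This is only a sketch under assumptions, not the code from
either script: pick_best and the sample rows are hypothetical names
and data, and the expiration date is assumed to be in a format that
sorts correctly as a string (for example YYYY-MM-DD), which the string
comparisons in both scripts below also rely on.

```python
def pick_best(rows):
    """Return the best row from a list of tab-separated rows for one piid.

    Hypothetical helper: fields are zero-based, field 1 is the state,
    field 5 the expiration date, field 6 the desired state.
    """
    def rank(row):
        fields = row.split("\t")
        # Tuples compare left to right: a state match (True > False)
        # wins first, then the later expiration date.
        return (fields[1] == fields[6], fields[5])

    # max() returns the first row with the highest rank, so ties keep
    # the earlier record.
    return max(rows, key=rank)


# Made-up sample data for one piid.
rows = [
    "p1\tCA\ta\tb\tc\t2009-01-01\tNY",
    "p1\tNY\ta\tb\tc\t2008-06-30\tNY",
    "p1\tNY\ta\tb\tc\t2008-12-31\tNY",
]
best = pick_best(rows)
```

Here the first row has the newest date but the wrong state, so the
third row (the newest one whose state matches) is chosen.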

Thanks for taking the time to look at these.

Shawn

##########################################################################
Perl code:
##########################################################################
#! /usr/bin/env perl

use warnings;
use strict;

my $piid;
my $row;
my %input;
my $best;
my $curr;

foreach $row (<>){

        chomp($row);
        $piid = (split(/\t/, $row))[0];

        push ( @{$input{$piid}}, $row );
}

for $piid (keys(%input)){

        $best = "";

        for $curr (@{$input{$piid}}){
                if ($best eq ""){
                        $best = $curr;
                }else{
                        #If the current record is the correct state

                        if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
                                #If existing record is the correct state
                                if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
                                        if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
                                                $best = $curr;
                                        }
                                }else{
                                        $best = $curr;
                                }
                        }else{
                                #If the existing record does not have the correct state
                                #and the new one has a newer expiration date
                                if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and
                                    ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
                                        $best = $curr;
                                }
                        }
                }


        }
        print "$best\n";
}

##########################################################################
End Perl code
##########################################################################






##########################################################################
Python code
##########################################################################

#! /usr/bin/env python

import sys

input = sys.stdin

recs = {}

for row in input:
        row = row.rstrip('\n')
        piid = row.split('\t')[0]
        if recs.has_key(piid) is False:
                recs[piid] = []
        recs[piid].append(row)

for piid in recs.keys():
        best = ""
        for current in recs[piid]:
                if best == "":
                        best = current;
                else:
                        #If the current record is the correct state
                        if current.split("\t")[1] == current.split("\t")[6]:
                                #If the existing record is the correct state
                                if best.split("\t")[1] == best.split("\t")[6]:
                                        #If the new record has a newer exp. date
                                        if current.split("\t")[5] > best.split("\t")[5]:
                                                best = current
                                else:
                                        best = current
                        else:
                                #If the existing record does not have the correct state
                                #and the new record has a newer exp. date
                                if best.split("\t")[1] != best.split("\t")[6] and current.split("\t")[5] > best.split("\t")[5]:
                                        best = current
                        
        print best


##########################################################################
End Python code
##########################################################################
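
For comparison, the grouping loop at the top of the Python version
could also be written with dict.setdefault, which avoids the separate
has_key test (has_key no longer exists in Python 3). A minimal sketch;
group_by_piid and the sample lines are hypothetical and not part of
the script above.

```python
def group_by_piid(lines):
    """Map each piid (field 0) to the list of its rows, in input order."""
    recs = {}
    for line in lines:
        row = line.rstrip("\n")
        # Split only once at the first tab; just the piid is needed here.
        piid = row.split("\t", 1)[0]
        # setdefault returns the existing list, or stores and returns a new one.
        recs.setdefault(piid, []).append(row)
    return recs


# Made-up input lines; in the script this would be sys.stdin.
sample = ["p1\tCA\n", "p2\tNY\n", "p1\tTX\n"]
groups = group_by_piid(sample)
```

Any iterable of lines works as the argument, so sys.stdin can be
passed directly.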