On Fri, 2003-12-12 at 15:59, John Fitzgerald wrote:
> I need a list set like this:
> ID    date
> 3008 11/1/03  
> 3008 11/1/03  
> 3008 11/1/03  
> 3010 12/1/03
> 3010 12/1/03
> 
> So I need repeating ID's, with the earliest date for
> each ID. If the order of the data is preserved, I can
> use just those two columns for processing, then
> combine them back with the other columns afterward.

Warning - I've only been working in Perl for just over a week... ;^)

If I understand the unwritten goal correctly, you want to actually
modify the data, either in the original file or in a new file with the
changed dates? Why should it matter whether the order is preserved while
processing, as long as it is preserved in the result?  Preserving it
during processing also implies retaining all the data read in during the
'first pass', which isn't necessary, as you can match up IDs trivially.
Largely influenced by my single 'real' Perl project (involving multiple
100 MB+ logfiles), I tend to work with the minimum amount of data at a
time that is reasonable.

I would approach it like this:

Loop over the data file once, creating a hash with the IDs as keys and
the earliest date found as the value for each key.  Loop over the data
file a second time and instead of the date field from the file, use the
date field value retrieved from the hash, and write to a file or format
for screen presentation or whatever your goal is for this data.

I'm a rank beginner with Perl, so the following quite likely contains at
least one error, but I'd write it something like this:

my %clientbirth;        # ID => earliest date seen so far

# First pass: record the earliest date for each ID.
open INFILE, "sourcefile" or die "sourcefile open failed - $!";
while (<INFILE>)
{
        my ($uid, $birth) = split;
        $clientbirth{$uid} = earlierdate($birth,$clientbirth{$uid});
}
close INFILE;

# Second pass: rewrite each line with the earliest date for its ID.
open INFILE, "sourcefile" or die "sourcefile open failed - $!";
open OUTFILE, ">destfile" or die "destfile open failed - $!";
while (<INFILE>)
{
        my ($uid, undef, @data) = split;
        # split drops the whitespace and newline, so put them back
        print OUTFILE join(" ", $uid, $clientbirth{$uid}, @data), "\n";
}
close OUTFILE;
close INFILE;


This assumes a sub earlierdate() that returns the earlier of the two
dates presented to it, dealing with whatever the date format is (and
with an undefined second argument the first time an ID is seen).
Obviously if this all takes place in sequence it's not necessary to
close and reopen INFILE; just rewind the file to the start with "seek
(INFILE,0,0);".  Just as obviously, if you aren't intending to write the
data out to file, the second half is inappropriate... ;^)  And there's
an implied assumption that sourcefile's contents aren't subject to
change while being processed.
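
For what it's worth, here is a rough sketch of what earlierdate() might
look like for the DD-Mon-YY dates in the original sample. The helper
datekey() and the %month_num table are my own inventions for
illustration, and I'm assuming every two-digit year means 20YY, so
treat it as a starting point rather than a finished routine:

```perl
use strict;
use warnings;

# Month abbreviations as they appear in the sample data (e.g. 05-Aug-03).
my %month_num = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: turn DD-Mon-YY into a sortable number (20030805).
# Assumption: every two-digit year belongs to 20YY.
sub datekey {
    my ($date) = @_;
    my ($day, $mon, $year) = $date =~ /^(\d{1,2})-([A-Za-z]{3})-(\d{2})$/
        or die "unrecognised date '$date'";
    my $m = $month_num{ ucfirst(lc($mon)) }
        or die "unrecognised month in '$date'";
    return (2000 + $year) * 10000 + $m * 100 + $day;
}

# Return the earlier of two dates. Either argument may be undef,
# as happens the first time an ID is seen.
sub earlierdate {
    my ($x, $y) = @_;
    return $y unless defined $x;
    return $x unless defined $y;
    return datekey($x) <= datekey($y) ? $x : $y;
}
```

Called as earlierdate('19-Nov-03', '11-Jul-03') this should hand back
'11-Jul-03', and with an undef second argument it simply returns the
first.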

There are ways to accomplish it in a single pass as well.  Unless you're
assured that the 'earliest date' is also the first one to appear for a
given ID, though, it gets complicated pretty quickly, short of simply
reading all the data in and then working with it.  (Your sample data
shows out-of-order dates, e.g. for ID 3010.)
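
The 'read it all in' variant could be sketched roughly like this. The
sub name earliest_per_id() is made up for illustration, and to keep it
short I'm assuming the dates are in a form that compares correctly as
plain text (such as YYYY-MM-DD), which is NOT true of the sample data
as posted; a real date comparison would have to be swapped in for the
'lt' test:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to an array of input lines and
# returns the rewritten lines, in their original order.
# Assumption: dates sort correctly as text (e.g. YYYY-MM-DD).
sub earliest_per_id {
    my ($lines) = @_;
    my (%earliest, @rows);
    for my $line (@$lines) {
        my ($uid, $date, @rest) = split ' ', $line;
        next unless defined $date;      # skip blank or short lines
        push @rows, [$uid, @rest];      # remember the row in input order
        $earliest{$uid} = $date
            if !defined $earliest{$uid} || $date lt $earliest{$uid};
    }
    # Every row is held in memory until the whole input has been seen.
    return map {
        my ($uid, @rest) = @$_;
        join(' ', $uid, $earliest{$uid}, @rest);
    } @rows;
}

my @out = earliest_per_id([
    "3010 2003-11-19 a",
    "3010 2003-07-11 b",
]);
print "$_\n" for @out;
```

The file is only read once, but the price is that every row sits in
memory until the end, which is exactly what the two-pass version avoids.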

j

> --- Rob Dixon <[EMAIL PROTECTED]> wrote:
> > John Fitzgerald wrote:
> > >
> > > Hi, I'm fairly new to Perl, and trying to do a simple
> > > operation on a text file exported from excel.
> > > ID      Enrolled     Extraneous Columns....
> > > 3008 05-Aug-03
> > > 3008 05-Aug-03
> > > 3008 05-Aug-03
> > > 3008 05-Aug-03
> > > 3008 24-Sep-03
> > > 3009 11-Aug-03
> > > 3010 19-Nov-03
> > > 3010 11-Jul-03
> > > 3010 11-Jul-03
> > > 3010 11-Jul-03
> > > 3011 15-Jul-03
> > >
> > > As you can see, the dates for a given ID are
> > > different. What I need to do, is set the dates all to
> > > the earliest date for that ID (client-birth date). The
> > > other columns are important, but don't factor in
> > > here.

-- 
"Not all those who wander are lost."  - JRR Tolkien

