Hi All,

I'm back and with a new algorithm/solution I need help with.
I have two csv files, sorted by the first column (ID).
Each file may have all the same, none of the same, or some of the same IDs.
I would like to take these two files, and make one out of them.
Two tricks:
 - When I come across the same ID in each file I need to merge those two
lines (don't worry about the merge, I can handle that).
 - I want to hold as few lines from each file in memory as possible at any
one time (ideally, only one line from each file at a time).

Basically we are dealing with large files here and I don't want to kill my
RAM by storing all the data from both files into a hash or some other
object.

I have an algorithm I like, I'm just not certain how to implement it:
1. Examine the ID of the first line of each file.
2. If the IDs are the same, merge the two lines and print the merged line to
the final output file.
3. If they are not the same, take the file with the lesser ID and print its
lines to the final output file until its current ID is equal to or greater
than the other file's.
4. Repeat; when one file runs out, copy the remainder of the other file
straight to the output.

Any advice on how to do this?

Thanks.
--Alex
 
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm