On Wed, Oct 24, 2001 at 04:15:08PM +0100, Lucy McWilliam wrote:
>
> If anyone can think of a better (i.e. quicker) way to do it - even if it
> means having to relearn C - I'd love to know. Code snippets below. Bonus
> points for the subject line ;-)
>

If you have enough swap and RAM, you could read the file in once and
convert it to a 2D array of score[fly,fly], stored in one very long
string and accessed via substr and unpack. Assuming your packed scores
take 4 bytes each (long int or single-precision float), you would need
4*$N*$N bytes of virtual memory to hold the lot, where $N is the number
of flies.

Because of the way your file is ordered, I think you can build this
string in one pass through the file without undue swapping, provided you
have enough RAM to keep $N virtual memory pages resident at once.
Assuming a VM page size of 4k, that comes to 400M of swap and 40M of RAM
for 10000 flies.

Now, one fly at a time, do the sort to convert score[other fly] into
ranking[other fly]. Assuming fewer than 65000 flies, the rankings will
take less space than the scores, since each can be packed into 2 bytes.
This leaves you with the numbers you want packed into 2*$N*$N bytes
(200M for 10000 flies).

If you don't have enough RAM to keep the whole results string paged in,
you'll need to be a bit careful about the order in which you extract the
answers to write them out to file. You can pull them out in pairs, in
the same order as the original file, with $N VM pages of RAM.

Nick

-- 
($O= #\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ /////////////////////////////#{O$}xb|
q|HHHNIiHIHIHNNHHHHI{HHHiiHHHHHiiI|^#\/#(|}OM:-#+(iI$:-+!:- >i (!>:=#!i +-|b
q|-+ !i#=:<i) !< -:i+-:$I!)+#-:WO{|)#/\#v|I!!HHHHH!!HHH}IHHHHNNHIHIH!INHHH|b
|qx{$O}#///////////////////////////// \\\\\\\\\\\\\\\\\\\\\\\\\\\\\# =O$)
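
A minimal sketch of the packed-string approach described above, in Perl, on a toy 4-fly matrix. The helper names (set_score, get_score, rank_row) are hypothetical and made up for illustration. It also assumes that single-precision floats are precise enough for the scores and that the highest score should get rank 0. Neither of those is stated in the original post, so adjust to taste.

```perl
use strict;
use warnings;

my $N = 4;    # number of flies (tiny, for illustration only)

# One long string holding an N x N matrix of 4-byte single-precision floats.
my $scores = "\0" x (4 * $N * $N);

# Hypothetical helper: store score[$i][$j] at its 4-byte slot in the string.
sub set_score {
    my ($i, $j, $value) = @_;
    substr($scores, 4 * ($i * $N + $j), 4, pack('f', $value));
}

# Hypothetical helper: fetch score[$i][$j] back out of the string.
sub get_score {
    my ($i, $j) = @_;
    return unpack('f', substr($scores, 4 * ($i * $N + $j), 4));
}

# Convert fly $i's row of scores into a row of ranks, packed 2 bytes each.
# Assumption: rank 0 goes to the highest score.
sub rank_row {
    my ($i) = @_;
    my @row   = unpack("f$N", substr($scores, 4 * $N * $i, 4 * $N));
    my @order = sort { $row[$b] <=> $row[$a] } 0 .. $N - 1;
    my @rank;
    @rank[@order] = 0 .. $N - 1;    # position in sorted order = rank
    return pack("S$N", @rank);      # 'S' = unsigned 16-bit, native order
}

# Fill in fly 0's scores against flies 0..3 (made-up numbers).
my @demo = (0.5, 3, 1.25, 2);
set_score(0, $_, $demo[$_]) for 0 .. $#demo;

# Rank one fly at a time; the result occupies 2*$N*$N bytes.
my $ranks = join '', map { rank_row($_) } 0 .. $N - 1;

# Ranks for fly 0: prints "3 0 2 1"
print join(' ', unpack("S$N", substr($ranks, 0, 2 * $N))), "\n";
```

With real data you would fill $scores while reading the file once, then write the ranks out from $ranks in whatever order keeps the paging down. The 'S' template limits ranks to 0..65535, which is where the 2-bytes-per-rank limit on the number of flies comes from.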