> > I have a file that contains records of customer interactions.
> > The first column of the file is the batch number (INT), and the others
> > are date time, close time, etc.
> >
> > I have to sort the entire file in order of the first column, but the
> > problem is that the file is extremely large.
> >
> > For the largest customer it contains 1100 million records and the file
> > is 44GB!
> > How can I sort a file this big?
> Is there any reason not to use the system sort?  GNU sort uses an
> external R-way merge.  It's designed for this sort of thing.
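For reference, an external R-way merge sort comes down to two phases: sort memory-sized chunks and spill each one to disk as a sorted "run", then merge all the runs together while holding only one line per run in memory. Below is a minimal Python sketch of that idea (the thread is about Perl, but the structure carries over). It assumes comma-separated lines whose first field is the integer batch number; the function names, `lines_per_run`, and the CSV layout are my own assumptions, not anything taken from the original file or from GNU sort's internals.

```python
import heapq
import os
import tempfile
from itertools import islice


def batch_key(line):
    # Assumed layout: the first comma-separated field is the INT batch number.
    return int(line.split(",", 1)[0])


def _write_sorted_run(lines, tmpdir):
    # Sort one memory-sized chunk and spill it to a temporary "run" file.
    lines.sort(key=batch_key)
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(lines)
    return path


def external_sort(src, dst, lines_per_run=1_000_000, tmpdir=None):
    """Sort src into dst by batch number, keeping at most lines_per_run lines in memory."""
    runs = []
    try:
        # Phase 1: split the input into sorted runs on disk.
        with open(src) as f:
            while True:
                chunk = list(islice(f, lines_per_run))
                if not chunk:
                    break
                # Make sure every line ends with a newline so merged lines don't fuse.
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
                runs.append(_write_sorted_run(chunk, tmpdir))
        # Phase 2: R-way merge; heapq.merge holds one pending line per run.
        files = [open(p) for p in runs]
        try:
            with open(dst, "w") as out:
                out.writelines(heapq.merge(*files, key=batch_key))
        finally:
            for fh in files:
                fh.close()
    finally:
        # Clean up the temporary run files even if something failed.
        for p in runs:
            os.remove(p)
```

The key is parsed as an integer, so batch 10 sorts after batch 9; a plain lexicographic sort would put "10" before "9" (with the system sort the equivalent is a numeric key on the first field, e.g. `sort -t, -k1,1n`). With 1100 million records and a million lines per run you would have about 1100 run files open at once, which can exceed the per-process file limit; a production tool would merge in several passes instead.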

Unix sort is pretty fast, and it will work. The catch is that, IIRC, it
seemed to hit a buffer overflow somewhere between 2 and 4 GB. A database
is perfect for this. However, I think the problem was that MySQL's ORDER BY
is slow as hell. It can be sped up (slightly) with an index. You might
consider PostgreSQL, as its ORDER BY /should/ be quite a bit faster. You
might also try MongoDB or CouchDB, though you'd be putting the sort logic
in your script, and I haven't used either from Perl.

If you've already got it in a db, I'd create the index, start the query,
watch your resources get pegged, and wait. You'll get it eventually. :)
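Here's the "create the index, then start the query" approach, sketched with Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL (both of which need a running server). The table `records` and the columns `batch_no`, `opened_at`, and `closed_at` are hypothetical. The index only makes it possible for the planner to read rows in batch order instead of sorting the whole table; whether it actually does so is the database's call.

```python
import sqlite3


def create_batch_index(conn):
    # Step 1: index the sort column so ORDER BY can walk the index
    # rather than sort every row (the query planner decides).
    conn.execute("CREATE INDEX IF NOT EXISTS idx_records_batch ON records(batch_no)")


def rows_by_batch(conn):
    # Step 2: start the query; iterating the cursor fetches rows as you go.
    return conn.execute(
        "SELECT batch_no, opened_at, closed_at FROM records ORDER BY batch_no"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (batch_no INTEGER, opened_at TEXT, closed_at TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(42, "09:00", "09:05"), (7, "10:00", "10:30"), (19, "11:00", "11:10")],
    )
    create_batch_index(conn)
    for row in rows_by_batch(conn):
        print(row)
```

The SQL itself is close to what you'd run on MySQL or PostgreSQL, give or take dialect details such as `IF NOT EXISTS` on `CREATE INDEX`.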

