On Aug 7, 2011 1:15 PM, "Paul Johnson" <p...@pjcj.net> wrote:
>
> On Sun, Aug 07, 2011 at 08:58:14PM +0530, Ramprasad Prasad wrote:
> > I have a file that contains records of customer interactions.
> > The first column of the file is the batch number (INT), and the
> > other columns are date time, close time, etc.
> >
> > I have to sort the entire file in order of the first column, but
> > the problem is that the file is extremely huge.
> >
> > For the largest customer it contains 1100 million records and the
> > file is 44GB! How can I sort this big a file?
>
> Is there any reason not to use the system sort? GNU sort uses an
> external R-way merge. It's designed for this sort of thing.
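For what it's worth, the GNU sort route might look something like the
sketch below. The file names and temp dir are placeholders, and it
assumes the batch number is the first whitespace-separated field:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # -n -k1,1 sorts numerically on the first field.  -T must point at
    # a disk with room for roughly another 44GB of temp files, and -S
    # sets the in-memory buffer GNU sort uses for its external merge.
    system("sort", "-n", "-k1,1",
           "-T", "/big/tmpdir",
           "-S", "2G",
           "-o", "interactions.sorted",
           "interactions.txt") == 0
        or die "sort failed: $?";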
The Unix sort is pretty fast and it will work. The problem with it is
that it seems to buffer overflow somewhere between 2 and 4 gigs, IIRC.

A database is perfect for this, but I think the problem was that
MySQL's ORDER BY is slow as hell. It can be sped up (slightly) with an
index. You might consider PostgreSQL, as its ORDER BY /should/ be quite
a bit faster. You might also try Mongo or Couch, though then the sort
logic ends up in your script, and I haven't used either from Perl.

If you've already got the data in a db, I'd create the index, start the
query (something like the sketch below), watch your resources get
pegged, and wait. You'll get it eventually. :)
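A minimal DBI sketch of that route, assuming MySQL, a table named
interactions, and the batch number in a batch_no column (the connection
details and all names are hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:database=custdata",
                           "user", "pass", { RaiseError => 1 });

    # Build the index once; on 1100 million rows expect this step
    # alone to take a long while.
    $dbh->do("CREATE INDEX idx_batch ON interactions (batch_no)");

    # mysql_use_result streams rows from the server instead of
    # buffering the entire result set in client memory first.
    my $sth = $dbh->prepare(
        "SELECT * FROM interactions ORDER BY batch_no",
        { mysql_use_result => 1 });
    $sth->execute;
    while (my $row = $sth->fetchrow_arrayref) {
        print join("|", @$row), "\n";   # or whatever output you need
    }

Without mysql_use_result, DBD::mysql's default behaviour is to pull the
whole result set to the client before the first fetch, which won't fly
at this size.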