On Aug 7, 2011 1:15 PM, "Paul Johnson" <p...@pjcj.net> wrote:
>
> On Sun, Aug 07, 2011 at 08:58:14PM +0530, Ramprasad Prasad wrote:
> > I have a file that contains records of customer interactions.
> > The first column of the file is the batch number (INT), and the
> > other columns are date time, close time, etc.
> >
> > I have to sort the entire file in order of the first column, but
> > the problem is that the file is extremely huge.
> >
> > For the largest customer it contains 1100 million records and the
> > file is 44GB! How can I sort this big a file?
>
> Is there any reason not to use the system sort? GNU sort uses an
> external R-way merge. It's designed for this sort of thing.
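For what it's worth, the GNU sort route might look something like the
sketch below. The file names and temp dir are placeholders, and it
assumes the batch number is the first whitespace-separated field:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # -n -k1,1 sorts numerically on the first field.  -T must point at
    # a disk with room for roughly another 44GB of temp files, and -S
    # sets the in-memory buffer GNU sort uses for its external merge.
    system("sort", "-n", "-k1,1",
           "-T", "/big/tmpdir",
           "-S", "2G",
           "-o", "interactions.sorted",
           "interactions.txt") == 0
        or die "sort failed: $?";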
The Unix sort is pretty fast and it will work. The problem with it is
that it seems to buffer overflow somewhere between 2 and 4 gigs, IIRC.

A database is perfect for this, but I think the problem was that
MySQL's ORDER BY is slow as hell. It can be sped up (slightly) with an
index. You might consider PostgreSQL, as its ORDER BY /should/ be quite
a bit faster. You might also try Mongo or Couch, though then the sort
logic ends up in your script, and I haven't used either from Perl.

If you've already got the data in a db, I'd create the index, start the
query (something like the sketch below), watch your resources get
pegged, and wait. You'll get it eventually. :)
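A minimal DBI sketch of that route, assuming MySQL, a table named
interactions, and the batch number in a batch_no column (the connection
details and all names are hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:database=custdata",
                           "user", "pass", { RaiseError => 1 });

    # Build the index once; on 1100 million rows expect this step
    # alone to take a long while.
    $dbh->do("CREATE INDEX idx_batch ON interactions (batch_no)");

    # mysql_use_result streams rows from the server instead of
    # buffering the entire result set in client memory first.
    my $sth = $dbh->prepare(
        "SELECT * FROM interactions ORDER BY batch_no",
        { mysql_use_result => 1 });
    $sth->execute;
    while (my $row = $sth->fetchrow_arrayref) {
        print join("|", @$row), "\n";   # or whatever output you need
    }

Without mysql_use_result, DBD::mysql's default behaviour is to pull the
whole result set to the client before the first fetch, which won't fly
at this size.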