On Tue, 17 Mar 2015 08:58:56 +1300
worik <worik.stan...@gmail.com> wrote:

> On 16/03/15 06:43, Steve Litt wrote:
> > But IMHO, sorting 60megalines isn't something I would expect a
> > generic sort command to easily and timely do out of the box.
> 
> I would.  These days such files are getting more and more common.
> 
> But there is a warning in the man page for sort under "BUGS":
> 
>      "To sort files larger than 60MB, use sort -H; files larger than
> 704MB must be sorted in smaller pieces, then merged."
> 
> So it seems there is a bug in... "files larger than 60MB, use sort -H"
> since that did not work for the OP.
> 
> Worik

Oh, jeez, you put your finger *right* on the problem, Worik. Both the
OP and I read the manpage wrong. sort -H won't work for extremely big
files (more than 704MB). But there's a fairly easy solution...

The average line length can be found by running wc on the file and
dividing the byte count by the line count. From that, figure out how
many lines add up to roughly a 10MB file, and use split -l to break
the big file into pieces of that many lines each. Sort each piece
with a plain sort, no special options, and finally use sort -m to
merge the sorted pieces back together into one sorted file.
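
For instance, the chunk-size arithmetic could go something like this
(bigfile is just a stand-in for whatever the file is called, and the
~10MB target is an arbitrary comfortable piece size):

  bytes=$(wc -c < bigfile)          # total bytes in the file
  lines=$(wc -l < bigfile)          # total lines in the file
  avg=$(( bytes / lines ))          # average line length in bytes
  chunk=$(( 10000000 / avg ))       # lines that fit in roughly 10MB

That $chunk figure is what you'd feed to split -l.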

According to the man page, the preceding should work just fine, and it
can pretty much be automated with a simple shellscript, so you can set
it to run and have it work while you do other things.
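
A rough sketch of such a script, with the same assumptions as above
(bigfile is a placeholder name, piece. is an arbitrary prefix, and
~10MB is just a piece size plain sort should handle comfortably):

  #!/bin/sh
  # Sort a huge file by splitting it, sorting the pieces, and merging.
  f=bigfile
  bytes=$(wc -c < "$f")
  lines=$(wc -l < "$f")
  chunk=$(( 10000000 / (bytes / lines) ))   # lines per ~10MB piece

  split -l "$chunk" "$f" piece.             # makes piece.aa, piece.ab, ...
  for p in piece.*; do
      sort "$p" -o "$p"                     # plain sort on each small piece
  done
  sort -m piece.* > "$f.sorted"             # merge the presorted pieces
  rm piece.*

Save it under whatever name you like, kick it off with nohup or in
another terminal, and check back when it's done.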

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance
