On Tue, 17 Mar 2015 08:58:56 +1300 worik <worik.stan...@gmail.com> wrote:
> On 16/03/15 06:43, Steve Litt wrote:
> > But IMHO, sorting 60 megalines isn't something I would expect a
> > generic sort command to easily and timely do out of the box.
>
> I would. These days such files are getting more and more common.
>
> But there is a warning in the man page for sort under "BUGS":
>
> "To sort files larger than 60MB, use sort -H; files larger than
> 704MB must be sorted in smaller pieces, then merged."
>
> So it seems there is a bug in... "files larger than 60MB, use sort -H"
> since that did not work for the OP.
>
> Worik

Oh, jeez, you put your finger *right* on the problem, Worik. Both I and
the OP read the man page wrong. sort -H won't work for extremely big
files (more than 704MB). But there's a fairly easy solution...

Find the average line length by dividing the file's byte count by its
line count (both from wc). From that, figure out how many lines would
make roughly a 10MB file, and use split -l to break the big file into
pieces of that many lines. Sort each of those pieces with plain sort,
and finally use sort -m to merge them all back together into one
sorted file.

According to the man page, the preceding should work just fine, and it
can pretty much be automated with a simple shellscript, so you can set
it running and do other things while it works.

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance
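P.S. Here's a rough sketch of that split/sort/merge pipeline. The
filenames and the chunk size are made up for the demo (a tiny file and
a 12-byte chunk target stand in for the real 60-megaline file and the
~10MB target), but the wc / split -l / sort / sort -m steps are the
ones described above:

```shell
#!/bin/sh
# Demo stand-in for the real huge file; in practice big.txt would be
# your 60-megaline monster and CHUNK_BYTES would be ~10000000.
printf 'pear\napple\nbanana\norange\n' > big.txt
CHUNK_BYTES=12

# Average line length = total bytes / total lines (integer division).
BYTES=$(wc -c < big.txt)
LINES=$(wc -l < big.txt)
AVG=$(( BYTES / LINES ))

# How many lines fit in one chunk of roughly CHUNK_BYTES bytes.
PERCHUNK=$(( CHUNK_BYTES / AVG ))

# Split into pieces of PERCHUNK lines each: chunk.aa, chunk.ab, ...
split -l "$PERCHUNK" big.txt chunk.

# Sort each piece individually, in place.
for f in chunk.*; do
    sort "$f" -o "$f"
done

# Merge the already-sorted pieces into one sorted file, then clean up.
sort -m chunk.* > sorted.txt
rm chunk.* big.txt
```

With the demo numbers, AVG comes out to 6, so split cuts the four-line
file into two 2-line chunks before the merge; scale CHUNK_BYTES up and
the same script handles the real file.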