Leo Butler <[EMAIL PROTECTED]> writes:

> I don't know if this is relevant, but I have extracted the 2nd through 1000th
> character in the 50GB file, and there appears to be garbage (unprintable
> chars) in the first line. The remainder of the extract looks fine. Moreover,
> I split the file into 500MB chunks, sorted these and then merge sorted the
> pairs. It appears that the 500MB chunks produced by split have been stripped
> of '\n' and are garbage, as are the sorted files.

Hmm, it sounds like your input data has some very long lines.
That would explain at least part of your problem.  'sort' needs
to keep at least two lines in main memory to compare them: if single
input lines are many gigabytes long, then 'sort' must consume many
gigabytes of memory, regardless of what parameter you specify with '-S'.
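
One way to check whether very long lines are the cause is to measure the
longest line without ever holding a whole line in memory.  The following is
only a rough sketch, not part of coreutils: the function name
longest_line_length and its chunk_size parameter are made up for
illustration, and it assumes lines are terminated by '\n'.

```python
def longest_line_length(path, chunk_size=1 << 20):
    """Return the byte length of the longest line in `path`.

    The file is read in fixed-size chunks, so memory use stays near
    `chunk_size` even if a single line is many gigabytes long.
    The trailing newline is not counted.
    """
    longest = 0   # longest complete line seen so far
    current = 0   # length of the line currently being scanned
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    # No newline left in this chunk: the line continues.
                    current += len(chunk) - start
                    break
                current += nl - start
                longest = max(longest, current)
                current = 0
                start = nl + 1
    # Account for a final line that has no terminating newline.
    return max(longest, current)
```

If the result is close to the size of the file, or larger than the memory
you can spare, then the input is effectively one huge line and 'sort' cannot
compare lines without reading that much data into memory.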


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
