Lemley James - jlemle <[EMAIL PROTECTED]> writes: > I've coded a patch to sort.c from coreutils 5.3.0 for your careful > consideration (and my use ;) to perform a binary search in the merge pass, > only if the key was not found in the first merge temp file.
Thanks for that idea. How about the following patch instead? It uses just one loop rather than two nested ones, and should do two fewer key comparisons (in the common case with large merges, anyway), and I think the code is simpler overall. I installed this patch, but if you see something wrong with it please let me know so that we can improve it. I also added you to the THANKS file. Thanks! While we're on the subject of large NMERGE values, why don't we modify "sort" as follows? 1. Add an option to let the user specify NMERGE on the command line, e.g., "sort --n-way-merge=1024". 2 (probably more important in practice). Use a better heuristic for NMERGE, when the user does not specify NMERGE. Can you suggest a good heuristic, given your experience with large merges and GNU sort? For example, would it be good simply to use an NMERGE that is as large as possible (i.e., does not exhaust memory and/or file descriptors)? If not, what would be the limiting factors? --- sort.c 2 Dec 2004 06:55:31 -0000 1.302 +++ sort.c 14 Feb 2005 18:04:22 -0000 1.303 @@ -1778,18 +1778,31 @@ mergefps (char **files, size_t ntemps, s } /* The new line just read in may be larger than other lines - already in core; push it back in the queue until we encounter - a line larger than it. */ - for (i = 1; i < nfiles; ++i) - { - int cmp = compare (cur[ord[0]], cur[ord[i]]); - if (cmp < 0 || (cmp == 0 && ord[0] < ord[i])) - break; - } - t = ord[0]; - for (j = 1; j < i; ++j) - ord[j - 1] = ord[j]; - ord[i - 1] = t; + already in main memory; push it back in the queue until we + encounter a line larger than it. Optimize for the common + case where the new line is smallest. */ + { + size_t lo = 1; + size_t hi = nfiles; + size_t probe = lo; + size_t ord0 = ord[0]; + size_t count_of_smaller_lines; + + while (lo < hi) + { + int cmp = compare (cur[ord0], cur[ord[probe]]); + if (cmp < 0 || (cmp == 0 && ord0 < ord[probe])) + hi = probe; + else + lo = probe + 1; + probe = (lo + hi) / 2; + } + + count_of_smaller_lines = lo - 1; + for (j = 0; j < count_of_smaller_lines; j++) + ord[j] = ord[j + 1]; + ord[count_of_smaller_lines] = ord0; + } } if (unique && savedline) _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils