Hi Paul, thanks for the kind words! That's an elegant solution: starting the probe at 1 instead of in the middle, to catch the case where the key we want is already in the first file. I like it. I had set up a test on my machine to run lots of different scenarios overnight with the original sort.c and my modified version. I'll do the same with the patch below and let you know the results.
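For anyone following along, here is a rough sketch of how I read the "probe at 1" idea. It is not the patch itself. I am assuming the probe is the index used by the binary search that decides where the just-refilled first file belongs among the other, still-sorted merge inputs. The helper name count_smaller and the use of plain int keys are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not the code in sort.c.  keys[0] has just been
   replaced with a new value, while keys[1..n-1] are still in ascending
   order.  Return how many of keys[1..n-1] must move ahead of keys[0].
   If comparisons is non-null, store the number of comparisons made.  */
static size_t
count_smaller (const int *keys, size_t n, size_t *comparisons)
{
  size_t lo = 1;
  size_t hi = n;
  size_t probe = lo;    /* first probe at 1, not at (lo + hi) / 2 */
  size_t ncmp = 0;

  while (lo < hi)
    {
      ncmp++;
      if (keys[0] <= keys[probe])   /* on a tie the first file stays first */
        hi = probe;
      else
        lo = probe + 1;
      probe = (lo + hi) / 2;        /* later probes bisect as usual */
    }

  if (comparisons)
    *comparisons = ncmp;
  return lo - 1;
}
```

When the new key is still the smallest, the loop exits after a single comparison. Otherwise it falls back to ordinary bisection over the remaining range, so the worst case costs about one extra comparison.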
As to making NMERGE a command-line option and/or a dynamically set value, I love that idea. I don't know if there are still systems in use where the number of open files allowed is low, so I would be conservative with any new setting unless it is obvious the system is a large one. That leads me to the next topic.

Regarding large NMERGE and a good heuristic: I found that performance suffers as each file's buffer area shrinks. With either too small a -S on the command line or too large an NMERGE, seeking between merge files becomes a bottleneck(*). In practice, on one particular machine, seeking became a pretty big bottleneck when there was less than 512K of buffer memory for each file, and it really worked best with 1MB or more per merge file. I was using -S 1G with NMERGE=1024, sorting a 100GB file, and it ran very well indeed.

I am sure it will be machine dependent, but making NMERGE roughly the number of megabytes in the merge buffer, for merge buffer sizes >= 16MB, would be a good start, I think. That would least surprise most people who don't set -S or the new NMERGE parameter, and pleasantly surprise folks who have gobs of RAM and can probably also support lots of open files for fewer merge passes. Some testing is in order; it could be that 512K or 2MB per file would be the sweet spot, but I think that would be a good way to set NMERGE if it is not specified.

(*) Note: I assume it was seeking. It could have been that the number of system calls to read() and write() went way up, and that was the overhead. Either way, it was badness.

--James
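P.S. To make the heuristic above concrete, here is a minimal sketch, not a proposed patch. The names pick_nmerge, DEFAULT_NMERGE and max_open are hypothetical. I am assuming the floor should be the current compile-time value (16, as far as I recall), and the 1MB-per-file figure is just the number that worked well on my one test machine.

```c
#include <stddef.h>

/* Assumed to match the existing compile-time NMERGE.  */
enum { DEFAULT_NMERGE = 16 };

/* Hypothetical helper: choose a merge width from the merge buffer size,
   allowing about 1 MiB of buffer per input file.  The result is raised
   to DEFAULT_NMERGE if smaller, then capped at max_open, a
   caller-supplied limit on open files.  A max_open below 2 is treated
   as "no limit", since a merge needs at least two inputs.  */
static size_t
pick_nmerge (size_t buffer_bytes, size_t max_open)
{
  size_t per_file = (size_t) 1 << 20;   /* 1 MiB per merge input */
  size_t n = buffer_bytes / per_file;

  if (n < DEFAULT_NMERGE)
    n = DEFAULT_NMERGE;
  if (max_open >= 2 && n > max_open)
    n = max_open;
  return n;
}
```

With a 1 GiB buffer and a generous file limit this gives 1024, which matches the -S 1G, NMERGE=1024 run I described. On a POSIX system the cap could plausibly come from getrlimit() with RLIMIT_NOFILE, less a few descriptors for stdin, stdout, stderr and the output file, but I have not tried that.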
_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils