On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote:
> Not to look the gift horse in the mouth, but it'd be nice
> if you wrote ChangeLog entries, too. And even (gasp! :-)
> a test case or two. Of course, we'd expect such a test case
> (probably named tests/misc/sort-compress, and based on
> tests/sample-test) to have this line in it:
>
>   . $srcdir/../very-expensive
>
> If you don't have time for that, I'll take care of it, eventually.
I'm not going to stop you :-) I just haven't had the time to look into
it yet, so I've just been running the coreutils tests and then my own
tests. I was planning on adding the tests, as you say, "eventually".

> Default to just "gzip", not /bin/gzip. The latter may not exist;
> your patch already handles that, but /bin/gzip may not be the first
> gzip in PATH. Also, don't bother with the access-X_OK check.
> There's no point in incurring even such a small overhead in the
> general case, when no temporary is used.

The reason I put the access check in there is that if we default to
gzip and it doesn't exist, then of course the exec will fail, the
child will fail, and this will cause sort to fail when it really
should just skip the compression, or try another default if something
suitable exists (what about compress?). How about we just delay
determining the compression program until it's actually needed? For
example, in create_temp, right before "if (compress_program)", we
could have "if (compress_program_known)", and inside that body check
the environment variable and/or do access checks on possible defaults.

> But please address the FIXME I've added.

If we can't fork a compression process, it's not the end of the world:
we just don't compress that file, and sort still works. If we can't
fork a decompression process, though, we can't continue the sort. So I
figure we'll just try twice to fork compression processes; if we can't
do it within about a second, we're probably wasting more time waiting
to fork than we would save over plain disk access. However, we really
do need to be able to fork decompression processes, so we can afford
to wait a very long time for those. I was considering making the
number of tries for decompression processes even larger (right now, it
waits about two minutes before giving up).

> Have you considered using the gnulib hash module rather than
> rolling your own? There are examples in many of coreutils/src/*.c.
I'm not familiar with gnulib, so I didn't know a hash module existed
or think to look for one. Looking at it now, though, it seems it would
be slower because of its abstraction, unless the table fills up to the
point where growing it would make accesses faster. Since it seems to
be my time we're talking about, I'd prefer to leave the table the way
it is, because it's simple, and see later whether the gnulib module is
faster. We've already seen tests (yours :-) showing that the patch
improves large sorts (granted, they used the previous patch, not this
one), so it's still a win to put it in, and optimizations later are
just gravy.

Thanks for the advice. I've got some other work I need to catch up on
first (trust me, I'd rather do this), but hopefully I can get to work
on it tomorrow.

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils