Thanks for working on this. You've gotten further than anyone else has! Some quick comments:
Frederik Eaton <[EMAIL PROTECTED]> writes: > Is there a script for making a patch with all the right files excluded > by the way? Not yet. That's on the list of things to do. The fix will be to remove those files from the repository, and have a bootstrap script that generates them. Then "cvs diff" will do the right thing. >> shred also tries to obtain a random seed. >> It'd be nice (eventually) to have both programs >> use the same mechanism. > > I did not realize this. Yes, perhaps it would. That sounds right to me as well. > I do not have any idea, but it would be good to have random seed > functionality somewhere standard-ish. It would be nice if it were in > libc but that's an area I don't want to tackle. It's a reasonable thing to put in coreutils/lib. We can then put it into gnulib. > Also, I didn't personally think it was necessary to do anything but > look at the time and process ID for a default seed - I only added the > "/dev/random" stuff because Paul Eggert seems to think that security > is very important here If "sort -R" is used for anything security-relevant, then it will be important, yes. > - and once "/dev/random" is referenced then I figured compatibility > with other kinds of systems would become an issue No, the method used by "shred" is good enough. We don't need to worry about EGD any more. It's an obsolete hack. You might be better off starting with the code in "shred". > I would like not to spend much more time on this, by the way, Alas, more work needs to be done. Perhaps we can recruit someone else to look into it? I will put it on my list of things to do as well, but it's a pretty long list.... + * src/sort.c: add functions init_salt, get_hash; 'init_salt' is + used to provide seeding, and 'get_hash' calculates the hash value + by which a key is sorted Please use the ChangeLog style recommended by the GNU coding standards <http://www.gnu.org/prep/standards/html_node/Change-Logs.html>. For example, the first line should start with "* src/sort.c (init_salt, get_hash):". It's important to log every externally-visible identifier whose meaning has changed. Also, each external symbol (function, macro, variable) should have a comment explaining what it does. Currently I'm at a bit of a loss trying to figure out what things do, so my comments will be limited. +#ifndef _CHECKSUM_H +#define _CHECKSUM_H 1 + +#include <sys/types.h> +#include <config.h> +#include "sha1.h" +#include "md5.h" This seems overkill to me. sort is just using md5, right? + int len = strlen(str); There should be a space before the "(". There are several other instances of that. + return (len*2) >= ops->bits; Please put spaces around the "*". The parentheses aren't needed here. Also, we prefer <= to >=. This is a programming style rule that I learned from D. Val Schorre. It's an application of Leibnitz's criterion for notation: textual order should reflect numeric order. + --seed use specified seed for random number generator\n\ --seed has an operand, so it should be mentioned here. + void *ctx = alloca(digops.ctx_size); What are the bounds on digops.ctx_size? If it can be large (more than a few kilobytes) then we shouldn't rely on alloca, due to stack-overflow detection issues. + else if (key->random_hash) + { + int dig_bytes = digops.bits/8; + char diga[dig_bytes]; + char digb[dig_bytes]; + get_hash(texta, lena, diga); + get_hash(textb, lenb, digb); + diff = memcmp(diga, digb, sizeof(diga)); + } It should be possible to combine -R with -b, -d, -f, -g, -i, -M, -n, and -r. For example, sort -nR should compute the same hash for 1.0 and 01.0 that it computes for 1.00. Currently, though, the -R is silently ignored in this case. Conversely, sort -MR should compute the same hash for " Jan" that it does for "Jan"; but here, the "-M" is silently ignored. else if (key->month) diff = getmonth (texta, lena) - getmonth (textb, lenb); /* Sorting like this may become slow, so in a simple locale the user @@ -1986,7 +2038,7 @@ badfieldspec (char const *spec, char con { error (SORT_FAILURE, 0, _("%s: invalid field specification %s"), _(msgid), quote (spec)); - abort (); + abort (); // inserted to avoid a compiler warning } Please don't use //-style comments, as we can't assume C99 yet. Also, please prefer declarative sentences in comments, and put the comments on separate lines before the code they describe, e,g.: /* Avoid a compiler warning. */ abort (); + char *randseed = 0; This should be type "char const *", not "char *". Note that we prefer the const after the char. Also, the 0 should be NULL. + randseed = strdupa(optarg); We can't assume strdupa exists. But you don't need to copy optarg; just use it without copying it. + -R, --random sort by random hash of keys\n\ I would prefer the long option name --random-hash, since it's not a truly random sort. There is a key problem here: one of documentation. doc/coreutils.texi and NEWS need to be modified to say exactly what's going on, and why sorting based on a random hash is not the same as generating a random permutation, so that people who are doing security-relevant stuff won't rely on something that they shouldn't. Also, I can't easily tell from the code how many random bits are currently acquired from the environment, and whether there are enough bits to actually satisfy the claim that we are using a random hash. _______________________________________________ Bug-coreutils mailing list [email protected] http://lists.gnu.org/mailman/listinfo/bug-coreutils
