> Also, each external symbol (function, macro, variable) should have a comment > explaining what it does. Currently I'm at a bit of a loss trying to > figure out what things do, so my comments will be limited. > > +#ifndef _CHECKSUM_H > +#define _CHECKSUM_H 1 > + > +#include <sys/types.h> > +#include <config.h> > +#include "sha1.h" > +#include "md5.h" > > This seems overkill to me. sort is just using md5, right?
I wanted an easy way to switch between them. Perhaps I should have gotten rid of this macro stuff from md5sum.c as well, but at least what I added lets someone else take that on in the future: #define DIGEST_TYPE_STRING(Alg) ((Alg) == ALG_MD5 ? "MD5" : "SHA1") #define DIGEST_STREAM(Alg) ((Alg) == ALG_MD5 ? md5_stream : sha_stream) #define DIGEST_BITS(Alg) ((Alg) == ALG_MD5 ? 128 : 160) #define DIGEST_HEX_BYTES(Alg) (DIGEST_BITS (Alg) / 4) #define DIGEST_BIN_BYTES(Alg) (DIGEST_BITS (Alg) / 8) > + int len = strlen(str); > > There should be a space before the "(". There are several other instances > of that. > > + return (len*2) >= ops->bits; > > Please put spaces around the "*". The parentheses aren't needed here. It surprises me that this seems so important to you guys, but whatever. > Also, we prefer <= to >=. This is a programming style rule that I > learned from D. Val Schorre. It's an application of Leibnitz's "Leibniz's" > criterion for notation: textual order should reflect numeric order. Interesting. > + void *ctx = alloca(digops.ctx_size); > > What are the bounds on digops.ctx_size? If it can be large (more than > a few kilobytes) then we shouldn't rely on alloca, due to stack-overflow > detection issues. It's only going to be a few bytes. > + else if (key->random_hash) > + { > + int dig_bytes = digops.bits/8; > + char diga[dig_bytes]; > + char digb[dig_bytes]; > + get_hash(texta, lena, diga); > + get_hash(textb, lenb, digb); > + diff = memcmp(diga, digb, sizeof(diga)); > + } > > It should be possible to combine -R with -b, -d, -f, -g, -i, -M, -n, > and -r. For example, sort -nR should compute the same hash for 1.0 > and 01.0 that it computes for 1.00. Currently, though, the -R is > silently ignored in this case. Conversely, sort -MR should compute > the same hash for " Jan" that it does for "Jan"; but here, the "-M" is > silently ignored. I disagree that this is important. > + -R, --random sort by random hash of keys\n\ > > I would prefer the long option name --random-hash, since it's not a > truly random sort. What, in the sense that it pays attention to the keys, or that the numbers are pseudorandom? I might call something which pays attention to keys a random sort, and something which doesn't a random shuffle. I don't like your suggestion because I don't think the implementation should show up as part of the API. If it's the pseudorandomness, I think mentioning that is redundant, and the same thing I said about not wanting implementation in the API applies - a good pseudorandom number generator should be externally indistinguishable from an actual random number generator. > There is a key problem here: one of documentation. doc/coreutils.texi > and NEWS need to be modified to say exactly what's going on, and why > sorting based on a random hash is not the same as generating a random > permutation, so that people who are doing security-relevant stuff > won't rely on something that they shouldn't. Yes, I mentioned that I haven't written documentation yet. I will keep in mind that these files need to be modified. > Also, I can't easily tell from the code how many random bits are > currently acquired from the environment, and whether there are enough > bits to actually satisfy the claim that we are using a random hash. It should be 128 bits for MD5 and 160 for SHA1. Do you think I need more? Frederik _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils