Hence my question: is there some "approximate" hash function (which I could
use instead of SHA-1) which can verify that a text hashes "very close" to a
value? So that if I change, say, tabs into spaces, I won't get exactly the
same value, but I would get a "good enough"?



What you want is what's called a canonicalization function. You want to hash not T, but F(T), and F can de-tabify, and so on.


As has been mentioned, OpenPGP has a canonicalization for text and cleartext signatures (and we debate what it should be, even). XML Digital Signatures has one.

The major other issue you have to deal with is whether the canonicalization is an interpretation, a conversion, or an assurance.

If it's an interpretation, then you are hashing F(T), and there's always some slim chance that there's a collision between two texts that canonicalize to different values, but hash to the same. I wouldn't worry about it, myself. Much. (Assuming a decent canonicalizer, of course.)

If it's a conversion, then you're replacing the text with the canonicalized text. This puts the canonicalization in the face of the users, but removes the problems handwaved at above.

If it's an assurance, then if the text is not canonicalized, then it's not a valid message if it doesn't meet the requirements. A message with a tab, for example, just isn't a valid message. Don't even hash it, return an error. Or alert that it was an invalid message.

        Jon

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]

Reply via email to