Hi y'all,
This is a "ping -b" to anyone who has an interest in the integer math code
in openssl. Otto and Nils had reported a few discrepencies a while back,
and there is already a RT ticket from Otto about one aspect of this.
However, as I wade into crypto/bn/, I finding that it's a bit of a
pandorah's box (I would gone with the "rabbit hole" thing, but that's a
little too fashionable lately).
My interest is a little more fundamental than the particular error cases
that have been discussed so far - I would like to make the API more robust
against operations that leave BIGNUM variables in an inconsistent state.

BTW: my definition of a consistent state for a bignum x is for it to be
invariant under bn_fix_top(). It's a home-brewed definition for sure, but
I think it's logical enough given what "fix" means. IMHO bn_fix_top()
should only be needed to recalibrate a bignum structure after directly
manipulating the bignum's data representation - *especially* from the
point of view of API users, though this seems rational for internal code
too. There are obviously other ways for a BIGNUM structure to be
inconsistent, but I'm only interested in the relationship between the
x->top index and the x->d[0..(top-1)] array data, because that is where
we currently legitimise something I consider to be a bug.

I've started some auditing/hacking that involves hijacking some existing
macros (eg. bn_[fix|check]_top) and adding some others (eg.
bn_verify_top), and in doing this I've stumbled onto a question that I
realise Nils had also posted some time ago, so I'll quote his version of
it:
> Another question would be if both bignum representations for '0'
> should be considered legal, i.e. is {{0, 0}, 0, 2, 0, 0} the same as
> {{0, 0}, 1, 2, 0, 0} (BN_is_zero returns 1 for both representations
> of '0') ?

This observation/question turns out to be quite a nutcracker. To explain:
for some perverse reason there is a convention in openssl of accepting
that a legitimate "zero-valued" BIGNUM can either have x->top set to zero,
or have x->top set to one with the corresponding word of the bignum data
(x->d[0]) set to zero. Anyone who has done more than one high-school
induction proof or implemented any kind of loop algorithm should
intuitively feel itchy about this kind of thing, and there are some solid
arguments against it that go beyond intuition too. The best generalised
rule I could come up with was this: x->top should be the minimal number of
words needed for the data representation, which coincidentally happens to
be a "rule" that is 1-1 with the output from bn_fix_top() anyway. In
particular, only one of these two "zero" forms is the output you get from
bn_fix_top(), no matter which of the two you feed in. So in the case of
'0', this reasoning would suggest that x->top be zero. Does anyone out
there have some historical knowledge (SSLeay?) or some convincing
explanation for why the exceptional 1-word form for '0' should be
considered acceptable?
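
For reference, the normalisation I keep appealing to amounts to nothing
more than stripping leading zero words - this is a sketch of the
bn_fix_top() logic, not the macro verbatim:

    #include <openssl/bn.h>

    /* Sketch of bn_fix_top()'s effect: reduce top to the minimal
     * number of words needed for the value. Feed in either "zero"
     * form and you always come out with top == 0, i.e. only one of
     * the two forms is a fixed point of this loop. */
    static void fix_top_sketch(BIGNUM *a)
    {
        while (a->top > 0 && a->d[a->top - 1] == 0)
            a->top--;
    }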

Going on from that, this "tolerance" is the cause of quite a bit of
complexity in crypto/bn/, so as part of this "audit" I'm wondering to what
extent we can clean it up. The obvious issue is backwards compatibility
with applications or libraries that have some reliance on this perversity.
Speaking for myself, whilst I can understand some caution from people
sensitive to that, my own caution is sensitive to complexity and bugs in
openssl itself, and I think both stem rather easily from working around
this mathematically-illogical exceptional case. Currently, all the bignum
code needs to ensure it is robust against this "rule", including the
exposed API macros that are used in tight-loop algorithm code for
word-based adds, compares, etc. Read this another way: any elegant
algorithmic code that doesn't have an ugly "if()" before or after it is at
risk from this problem. IMHO this is not just a problem now for the one or
two bugs spotted so far; it's always going to remain a maintenance problem
that risks future bugs. (The "Principle of most surprise", anyone?) So the
consensus I'm looking for is this: even if no one has any convincing
explanation for *why* this alternative 1-word exception is tolerated, does
anyone have any convincing arguments against us phasing it out?
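
To make the "ugly if()" concrete, here's a hypothetical word-wise compare
in the spirit of BN_ucmp() (the names are mine, this is not the real
code); drop the marked guard and the two '0' forms compare as unequal:

    #include <openssl/bn.h>

    /* Hypothetical unsigned compare: returns -1, 0 or 1. The marked
     * guard exists only because zero has two representations; with a
     * single canonical zero (top == 0), the plain top/word comparison
     * below would already be correct on its own. */
    static int ucmp_sketch(const BIGNUM *a, const BIGNUM *b)
    {
        int i;

        if (BN_is_zero(a) && BN_is_zero(b)) /* the ugly "if()" */
            return 0;
        if (a->top != b->top)
            return a->top > b->top ? 1 : -1;
        for (i = a->top - 1; i >= 0; i--) {
            if (a->d[i] != b->d[i])
                return a->d[i] > b->d[i] ? 1 : -1;
        }
        return 0;
    }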

BTW: w.r.t. this "audit" (if you can call it such) ... I know some people
will prefer to be kept in the loop about this and have a chance to
contribute long before I start stuffing changes into CVS. Eg. anyone
maintaining their own implementations of RSA, DSA, elliptic-curve, [etc]
or maintaining interfaces to other PKC libraries or hardware might be
affected. Before figuring out how best to continue, I'll wait to see what
(if anything) people have to say first. But that's also a warning - if you
have something to say, it would be better to say it sooner rather than
later. :-)
All thoughts and feedback would be most welcome.
Cheers,
Geoff
--
Geoff Thorpe
[EMAIL PROTECTED]
http://www.openssl.org/
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]