Hi y'all,
This is a "ping -b" to anyone who has an interest in the integer math code
in openssl. Otto and Nils had reported a few discrepencies a while back,
and there is already a RT ticket from Otto about one aspect of this.
However, as I wade into crypto/bn/, I finding that it's a bit of a
pandorah's box (I would gone with the "rabbit hole" thing, but that's a
little too fashionable lately).
My interest is a little more fundamental than the particular error cases
that have been discussed so far - I would like to make the API more robust
against operations that leave BIGNUM variables in an inconsistent state.

BTW: my definition of a consistent state for a bignum x is for it to be
invariant under bn_fix_top(). It's a home-brewed definition for sure, but
I think it's logical enough given what "fix" means. IMHO bn_fix_top()
should only be needed to recalibrate a bignum structure after directly
manipulating the bignum's data representation - *especially* from the
point of view of API users, though this seems rational for internal code
too. There are obviously other ways for a BIGNUM structure to be
inconsistent, but I'm only interested in the relationship between the
x->top index and the x->d[0..(top-1)] array data, because that is where
we currently legitimise something I consider to be a bug.

I've started some auditing/hacking that involves hijacking some existing
macros (eg. bn_[fix|check]_top) and adding some others (eg.
bn_verify_top), and in doing this I've stumbled onto a question that I
realise Nils had also posted some time ago, so I'll quote his version of
it:
> Another question would be if both bignum representations for '0'
> should be considered legal, i.e. is {{0, 0}, 0, 2, 0, 0} the same as
> {{0, 0}, 1, 2, 0, 0} (BN_is_zero returns 1 for both representations
> of '0') ?

This observation/question turns out to be quite a nutcracker. To explain:
for some perverse reason there is a convention in openssl of accepting
that a legitimate "zero-valued" BIGNUM can either have x->top set to zero,
or have x->top set to one with the corresponding word of the bignum data
(x->d[0]) set to zero. Anyone who has done more than one high-school
induction proof or implemented any kind of loop algorithm should
intuitively feel itchy about this kind of thing, and there are some solid
arguments against it that go beyond intuition too. The best generalised
rule I could come up with was this: x->top should be the minimal number of
words needed for the data representation, which coincidentally happens to
be a "rule" that is 1-1 with the output from bn_fix_top() anyway. In
particular, only one of these two "zero" forms is the output you get from
bn_fix_top(), no matter which of the two you feed in. So in the case of
'0', this reasoning would suggest that x->top be zero. Does anyone out
there have some historical knowledge (SSLeay?) or some convincing
explanation for why the exceptional 1-word form for '0' should be
considered acceptable?
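
For reference, the normalisation I keep appealing to amounts to nothing
more than stripping leading zero words - this is a sketch of the
bn_fix_top() logic, not the macro verbatim:

    #include <openssl/bn.h>

    /* Sketch of bn_fix_top()'s effect: reduce top to the minimal
     * number of words needed for the value. Feed in either "zero"
     * form and you always come out with top == 0, i.e. only one of
     * the two forms is a fixed point of this loop. */
    static void fix_top_sketch(BIGNUM *a)
    {
        while (a->top > 0 && a->d[a->top - 1] == 0)
            a->top--;
    }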

Going on from that, this "tolerance" is the cause of quite a bit of
complexity in crypto/bn/, so as part of this "audit" I'm wondering to what
extent we can clean it up. The obvious issue is backwards compatibility
with applications or libraries that have some reliance on this perversity.
Speaking for myself, whilst I can understand some caution from people
sensitive to that, my own caution is sensitive to complexity and bugs in
openssl itself, and I think both stem rather easily from working around
this mathematically-illogical exceptional case. Currently, all the bignum
code needs to ensure it is robust against this "rule", including the
exposed API macros that are used in tight-loop algorithm code for
word-based adds, compares, etc. Read this another way: any elegant
algorithmic code that doesn't have an ugly "if()" before or after it is at
risk from this problem. IMHO this is not just a problem now for the one or
two bugs spotted so far; it's always going to remain a maintenance problem
that risks future bugs. (The "Principle of most surprise", anyone?) So the
consensus I'm looking for is this: even if no one has any convincing
explanation for *why* this alternative 1-word exception is tolerated, does
anyone have any convincing arguments against us phasing it out?
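
To make the "ugly if()" concrete, here's a hypothetical word-wise compare
in the spirit of BN_ucmp() (the names are mine, this is not the real
code); drop the marked guard and the two '0' forms compare as unequal:

    #include <openssl/bn.h>

    /* Hypothetical unsigned compare: returns -1, 0 or 1. The marked
     * guard exists only because zero has two representations; with a
     * single canonical zero (top == 0), the plain top/word comparison
     * below would already be correct on its own. */
    static int ucmp_sketch(const BIGNUM *a, const BIGNUM *b)
    {
        int i;

        if (BN_is_zero(a) && BN_is_zero(b)) /* the ugly "if()" */
            return 0;
        if (a->top != b->top)
            return a->top > b->top ? 1 : -1;
        for (i = a->top - 1; i >= 0; i--) {
            if (a->d[i] != b->d[i])
                return a->d[i] > b->d[i] ? 1 : -1;
        }
        return 0;
    }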

BTW: w.r.t. this "audit" (if you can call it such) ... I know some people
will prefer to be kept in the loop about this and have a chance to
contribute long before I start stuffing changes into CVS. Eg. anyone
maintaining their own implementations of RSA, DSA, elliptic-curve, [etc]
or maintaining interfaces to other PKC libraries or hardware might be
affected. Before figuring out how best to continue, I'll wait to see what
(if anything) people have to say first. But that's also a warning - if you
have something to say, it would be better to say it sooner rather than
later. :-)
All thoughts and feedback would be most welcome.
Cheers,
Geoff
--
Geoff Thorpe
[EMAIL PROTECTED]
http://www.openssl.org/
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]