On Tue, Oct 28, 2014 at 5:16 PM, Wim Lewis <w...@omnigroup.com> wrote:

>
> On Oct 23, 2014, at 10:17 PM, Vladimir Zatsepin
> <vladimir.zatse...@gmail.com> wrote:
> > Does somebody know how OPENSSL_cleanse() works?
> > I don't understand what these [17, 63, 0xF] values mean. Why were such
> > values chosen?
>
> I think it's a simplistic random number generator, like a linear
> congruential generator; it's trying to fill the buffer with random-looking
> data. I'm not sure why it's doing that instead of simply filling with a
> constant value, though.


I can't remember all the whys and wherefores; this was written over a
decade ago.

I had occasional chats with Peter Gutmann about his study of anti-forensic
scrubbing techniques, and that was at least one reason why a memset()
didn't seem good enough (ie. for cases where your physical security
paranoia is dialed all the way up). Perhaps the state of technology has
moved on to the point where it no longer makes a difference, but back then
it was at least conceivably important. Some binary storage has analog
artifacts (if you're forensically equipped to read them) that allow you to
make educated guesses about what the previously-written values might have
been, and this could allow "erased" disk or memory contents to be recovered
by people who are sufficiently motivated, financed, or socially-challenged.
Eg. a bit with an analog reading of 0.99 might be a bit that has been
written to 1 repeatedly, whereas a 0.75 might be a bit that was 0 for a
long time and recently written to 1, if you'll pardon the hand-waving.

In any case, see http://en.wikipedia.org/wiki/Gutmann_method for a deeper
discussion.

The OPENSSL_cleanse() routine clearly doesn't do anything as fancy or
comprehensive as what's described there, because the only desire in this
case was to scrub in a single pass using a simple, self-contained (fast)
algorithm and, simply, to do a better job than memset(). (In fact, the
memchr() step causes a read-only second pass, but that's still fairly
efficient compared to more industrial/tin-foil-hat methods. And besides
which, these cleanse calls are most importantly used on BIGNUMs or
symmetric keys, which are relatively small and likely to stay cache-local
for a second pass.)
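
For anyone reading along without the source to hand, the routine under
discussion looked roughly like this in the OpenSSL trees of that era
(crypto/mem_clr.c); I'm quoting from memory, so treat it as a sketch
rather than gospel:

#include <stddef.h>
#include <string.h>

unsigned char cleanse_ctr = 0;

void OPENSSL_cleanse(void *ptr, size_t len)
{
    unsigned char *p = ptr;
    size_t loop = len, ctr = cleanse_ctr;

    while (loop--) {
        *(p++) = (unsigned char)ctr;
        /* mix the current write address into the counter as we go */
        ctr += (17 + ((size_t)p & 0xF));
    }
    /* the read-only second pass: look for the final counter value */
    p = memchr(ptr, (unsigned char)ctr, len);
    if (p)
        ctr += (63 + (size_t)p);  /* remix using the match's address */
    /* carry the rolled-over state into the next cleanse call */
    cleanse_ctr = (unsigned char)ctr;
}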

> (And if super-low-quality random numbers are sufficient/desired here, why
> not just call rand()?)
>

Well, it just needs to scrub/obfuscate reasonably efficiently. Randomness
per se is not that important IMO, so super-low-quality-random is even less
so. :-) Using rand() in a loop would likely be much slower, and of course
relying on "standard" C functions like that one has a nasty habit of
opening up new cans of platform/portability worms. (Eg. what if some
platforms have a "rand" that does something weird, or is really slow and
total overkill?)

The algorithm I came up with seemed to mix and jumble relatively well in my
limited testing and "analysis" (a big word for what was a crude exercise).
It may or may not have been influenced by coffee-addled discussions with
Peter, so he deserves complete deniability. :-)

And no, the algorithm wasn't suggested to me by any black-suited gentlemen
from any government agency, in case the thought crossed your mind (as it
probably should). Anyone is free to analyse what's there and post findings
and patches if they see fit; there's no need to trust me.


>
> The code starting from memchr() is particularly odd.
>

I thought it was desirable to use pointers/addresses as part of the mixing
logic, rather than just payload. In the post-processing step to remix the
global variable (that's used to kick off each "cleanse" run), I was wary of
degenerate/harmonic behaviour if the 'ptr' and 'len' inputs to the function
were repetitive and very unrandomly aligned (particularly if the
to-be-scrubbed data is also repetitive). So that memchr() search will match
occasionally, and seems unlikely to have any particular address alignment
when it does, which is probably the characteristic I was after when working
through various examples. Again, it's been a long time...
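
Concretely, it's just the tail of the sketch above, and the thing folded
back into the global counter is the address of the match, not the data at
it:

/* excerpt from the sketch above: the post-processing remix */
p = memchr(ptr, (unsigned char)ctr, len);
if (p)
    ctr += (63 + (size_t)p);  /* the match's *address* does the mixing */
cleanse_ctr = (unsigned char)ctr;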

Oh yeah, there was one other thing that was bothering me back then. We'd run
into some weirdly aggressive toolchain issues where even the linker was
sometimes helping the compiler to optimise out logic that you wouldn't
expect. Eg. if you pass a data structure to a function that does a memset()
and then you free that data structure after the function returns, the
toolchain could optimise out the memset() on the basis that it sees no
point to doing writes to memory that you subsequently deallocate. Ie. it
assumes there are no useful side-effects to (nor dependencies on) that
memset(), and with its special knowledge about what free() means, it prunes
those writes as being "redundant". So the self-imposed requirement for a
rolling global variable, one that was data-interdependent with the cleanse
logic, was at least partly motivated by a desire not to get subverted by
any more smart-ass compilers. :-)
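
A contrived illustration of the failure mode (hypothetical code, not from
OpenSSL):

#include <stdlib.h>
#include <string.h>

/* Hypothetical example: because nothing reads 'key' between the
 * memset() and the free(), an optimising toolchain is entitled to
 * treat the memset() as a dead store and drop it entirely. */
void use_secret(void)
{
    unsigned char *key = malloc(32);
    if (key == NULL)
        return;
    /* ... derive and use the key ... */
    memset(key, 0, 32);  /* candidate for dead-store elimination */
    free(key);
}

The rolling cleanse_ctr defeats that analysis: the scrubbed buffer is read
back by memchr() and the result folded into a global that outlives the
call, so the writes have a visible side-effect the optimiser can't prune.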

Cheers,
Geoff
