On Tue, Oct 28, 2014 at 5:16 PM, Wim Lewis <w...@omnigroup.com> wrote:
> > On Oct 23, 2014, at 10:17 PM, Vladimir Zatsepin <vladimir.zatse...@gmail.com> wrote:
> >
> > Does somebody know how OPENSSL_cleanse() works? I don't understand what
> > these [17, 63, 0xF] values mean. Why were such values chosen?
>
> I think it's a simplistic random number generator, like a linear
> congruential generator - it's trying to fill the buffer with random-looking
> data. I'm not sure why it's doing that instead of simply filling with a
> constant value, though.

I can't remember all the whys and wherefores; this was written over a
decade ago. I had occasional chats with Peter Gutmann about his study of
anti-forensic scrubbing techniques, and that was at least one reason why a
memset() didn't seem good enough (i.e. for cases where your physical-security
paranoia is dialed all the way up). Perhaps the state of technology has moved
on to the point where it no longer makes a difference, but back then it was
at least conceivably important. Some binary storage has analog artifacts (if
you're forensically equipped to read them) that allow you to make educated
guesses about what the previously-written values might have been, and this
could allow "erased" disk or memory contents to be recovered by people who
are sufficiently motivated, financed, or socially-challenged. E.g. a bit with
an analog reading of 0.99 might be a bit that has been written to 1
repeatedly, whereas a 0.75 might be a bit that was 0 for a long time and only
recently written to 1, if you'll pardon the hand-waving. In any case, see
http://en.wikipedia.org/wiki/Gutmann_method for a deeper discussion.

The OPENSSL_cleanse() routine clearly doesn't do anything as fancy or
comprehensive as what's described there, because the only desire in this
case was to scrub in a single pass using a simple and self-contained (fast)
algorithm and, simply, to do a better job than memset().
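[For readers following along, the routine under discussion looked roughly like this - a from-memory sketch of the old crypto/mem_clr.c, not the authoritative source; consult the actual OpenSSL tree for the real thing:]

```c
#include <string.h>

/* Rolling global state: remixed after every call so successive
 * cleanse runs start from a different seed (sketch of the old
 * crypto/mem_clr.c implementation). */
static unsigned char cleanse_ctr = 0;

void OPENSSL_cleanse(void *ptr, size_t len)
{
    unsigned char *p = ptr;
    size_t loop = len, ctr = cleanse_ctr;

    /* Single write pass: fill the buffer with a jumbled sequence
     * that depends on both the counter and the byte's address
     * (the low nibble of p), rather than a constant value. */
    while (loop--) {
        *(p++) = (unsigned char)ctr;
        ctr += (17 + ((size_t)p & 0xF));
    }
    /* Read-only second pass: if the final counter value happens to
     * occur in the scrubbed buffer, fold that match's address back
     * into the global state. */
    p = memchr(ptr, (unsigned char)ctr, len);
    if (p)
        ctr += (63 + (size_t)p);
    cleanse_ctr = (unsigned char)ctr;
}
```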
(In fact, the memchr() step causes a read-only second pass, but that's
still fairly efficient compared to more industrial/tin-foil-hat methods.
And besides, these cleanse calls are most importantly used on BIGNUMs or
symmetric keys, which are relatively small and likely to stay cache-local
for a second pass.)

> (And if super-low-quality random numbers are sufficient/desired here, why
> not just call rand()?)

Well, it just needs to scrub/obfuscate reasonably efficiently. Randomness
per se is not that important IMO, so super-low-quality-random is even less
so. :-) Using rand() in a loop would likely be much slower, and of course
relying on "standard" C functions like that one has a nasty habit of
opening up new cans of platform/portability worms. (E.g. what if some
platform has a "rand" that does something weird, or is really slow and
total overkill?)

The algorithm I came up with seemed to mix and jumble relatively well in my
limited testing and "analysis" (a big word for what was a crude exercise).
It may or may not have been influenced by coffee-addled discussions with
Peter, so he deserves complete deniability. :-) And no, the algorithm
wasn't suggested to me by any black-suited gentlemen from any government
agency, in case the thought crossed your mind (as it probably should).
Anyone is free to analyse what's there and post findings and patches if
they see fit; there's no need to trust me.

> The code starting from memchr() is particularly odd.

I thought it was desirable to use pointers/addresses as part of the mixing
logic, rather than just payload. In the post-processing step that remixes
the global variable (the one used to kick off each "cleanse" run), I was
wary of degenerate/harmonic behaviour if the 'ptr' and 'len' inputs to the
function were repetitive and very unrandomly aligned (particularly if the
to-be-scrubbed data is also repetitive).
So that memchr() search will match occasionally, and seems unlikely to have
any particular address alignment when it does, which is probably the
characteristic I was after when working through various examples. Again,
it's been a long time...

Oh yeah, there was one other thing that was bothering me back then. We'd
run into some weirdly aggressive toolchain issues where even the linker was
sometimes helping the compiler to optimise out logic that you wouldn't
expect. E.g. if you pass a data structure to a function that does a
memset(), and then you free that data structure after the function returns,
the toolchain could optimise out the memset() on the basis that it sees no
point in doing writes to memory that you subsequently deallocate. I.e. it
assumes there are no useful side-effects to (nor dependencies on) that
memset(), and with its special knowledge about what free() means, it prunes
those writes as being "redundant". So the self-imposed requirement for a
rolling global variable, data-interdependent with the cleanse logic, was at
least partly motivated by a desire not to get subverted by any more
smart-ass compilers. :-)

Cheers,
Geoff
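[To illustrate the dead-store elimination Geoff describes: under
optimisation, a compiler that can see both the memset() and the free() may
legally drop the scrub. The sketch below shows the risky pattern and one
common countermeasure - calling memset through a volatile function pointer.
This is illustrative only, not what OpenSSL itself did:]

```c
#include <stdlib.h>
#include <string.h>

/* Risky pattern: the buffer is freed immediately after the scrub,
 * so an optimising toolchain may treat the memset() as a dead store
 * and elide it entirely. */
void bad_wipe(void)
{
    char *secret = malloc(64);
    if (!secret)
        return;
    /* ... use secret ... */
    memset(secret, 0, 64);   /* may be optimised away */
    free(secret);
}

/* Countermeasure: route the call through a volatile function
 * pointer. The compiler can no longer assume it knows the callee's
 * semantics, so it cannot prove the writes are redundant. */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

void safer_wipe(void *secret, size_t len)
{
    memset_v(secret, 0, len);
}
```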