On Wed, Sep 9, 2015 at 6:36 PM, Zack Weinberg <za...@panix.com> wrote:
> The first, simpler problem is strictly optimization. explicit_bzero
> can be optimized to memset followed by a vacuous use of the memory
> region (generating no machine instructions, but preventing the stores
> from being deleted as dead); this is valuable because the sensitive
> data is often small and fixed-size, so the memset can in turn be
> replaced by inline code.

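For reference, the optimization described in the quoted text could look roughly like the sketch below. It is only an illustration, not the actual glibc code: the name sketch_explicit_bzero is made up, and the empty asm statement with a "memory" clobber is a GCC/Clang extension that I am assuming is available.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the real explicit_bzero implementation. */
static inline void sketch_explicit_bzero(void *p, size_t n)
{
    /* Ordinary memset: the compiler may expand it inline for small,
       fixed-size buffers. */
    memset(p, 0, n);

    /* "Vacuous use" of the region: the asm body is empty, so no machine
       instructions are emitted, but the "memory" clobber tells the
       compiler that memory reachable through p may be read here, so the
       stores above cannot be deleted as dead. GCC/Clang extension. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
}
```

Because the helper is inline, a call such as sketch_explicit_bzero(key, sizeof key) on a small fixed-size array can compile down to a few store instructions.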
How valuable is the speedup from that possible inlining? You know, a call instruction is not a crime. In fact, it is *heavily* optimized on any CPU, exactly because calls happen a gazillion times every second. In my measurements on x86, a call+ret pair is cheaper than a single read-modify-write ALU op on an L1 data item!

So, just implement explicit_bzero() as a function which is prohibited from inlining, and which clears all call-clobbered registers in addition to clearing memory. This will probably also be the smallest solution in terms of code size.
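
A minimal sketch of what I mean, with several assumptions: the name sketch_explicit_bzero_call is hypothetical, the noinline attribute and extended asm are GCC/Clang extensions, and the register list assumes the System V x86-64 ABI. Only the call-clobbered integer registers are zeroed here; vector registers are left alone, so this does not yet cover "all" call-clobbered registers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line variant; never inlined into callers. */
__attribute__((noinline))
void sketch_explicit_bzero_call(void *p, size_t n)
{
    memset(p, 0, n);

#if defined(__x86_64__)
    /* Assumption: System V x86-64 ABI. Zero the call-clobbered integer
       registers (writing a 32-bit register clears the full 64 bits).
       The "memory" clobber also keeps the memset stores from being
       treated as dead. Vector registers are NOT cleared here. */
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "xorl %%ecx, %%ecx\n\t"
        "xorl %%edx, %%edx\n\t"
        "xorl %%esi, %%esi\n\t"
        "xorl %%edi, %%edi\n\t"
        "xorl %%r8d, %%r8d\n\t"
        "xorl %%r9d, %%r9d\n\t"
        "xorl %%r10d, %%r10d\n\t"
        "xorl %%r11d, %%r11d"
        :
        :
        : "rax", "rcx", "rdx", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "cc", "memory");
#else
    /* Other targets: only keep the stores alive; register clearing
       would need target-specific code. */
    __asm__ __volatile__("" : : : "memory");
#endif
}
```

Every call site is then a single call instruction, and the register-clearing code exists only once in the binary.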