On Wed, Sep 9, 2015 at 6:36 PM, Zack Weinberg <za...@panix.com> wrote:
> The first, simpler problem is strictly optimization.  explicit_bzero
> can be optimized to memset followed by a vacuous use of the memory
> region (generating no machine instructions, but preventing the stores
> from being deleted as dead); this is valuable because the sensitive
> data is often small and fixed-size, so the memset can in turn be
> replaced by inline code.
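
For concreteness, the optimization described above might look roughly
like the sketch below. It assumes GCC/Clang extended asm; the name
explicit_bzero_inline is made up for illustration and is not taken
from any libc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: memset followed by a "vacuous use" of the region.
 * Relies on GCC/Clang extended asm, so it is not portable C. */
static inline void explicit_bzero_inline(void *p, size_t n)
{
    memset(p, 0, n);
    /* Empty asm statement: emits no instructions, but the "memory"
     * clobber plus the pointer input force the compiler to assume the
     * zeroed bytes may be read here, so the stores cannot be removed
     * as dead. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
}
```

With a small, fixed n the compiler is free to expand the memset into a
few inline stores, which is the gain being claimed.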

How valuable is the speedup from possible inlining?

You know, a call instruction is not a crime.
In fact, it is *heavily* optimized on any CPU precisely because
calls happen a gazillion times every second.

In my measurements on x86, a call+ret pair is cheaper than
a single read-modify-write ALU op on a data item in L1!

So, just implement explicit_bzero() as a function that
is prohibited from being inlined, and that clears all call-clobbered
registers in addition to clearing memory.
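
A minimal sketch of the memory-clearing half, assuming the GCC/Clang
noinline attribute. The name explicit_bzero_noinline is hypothetical.
Clearing the call-clobbered registers needs architecture-specific asm
and is deliberately left out here.

```c
#include <stddef.h>

/* Hypothetical out-of-line variant. The noinline attribute is a
 * GCC/Clang extension that keeps the call from being inlined. */
__attribute__((noinline))
void explicit_bzero_noinline(void *p, size_t n)
{
    /* Volatile stores must be performed, so the compiler cannot
     * discard them even if the buffer is never read again. */
    volatile unsigned char *vp = (volatile unsigned char *)p;

    while (n--)
        *vp++ = 0;

    /* Not shown: zeroing call-clobbered registers (per-arch asm). */
}
```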

This will probably also be the smallest solution in terms of code size.
