Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-23 Thread Jörn Engel
On Sun, 22 September 2013 22:43:38 -0400, Theodore Ts'o wrote:
> On Sun, Sep 22, 2013 at 08:16:23PM -0400, Jörn Engel wrote:
> > How about we switch between the two mixing functions depending on the
> > interrupt load?  If this CPU has seen fewer than 1000 interrupts in
> > the last second, use the better one, otherwise us the cheaper one?
> 
> I guess the question here is whether it's worth it.  On a 2.8 GHz
> laptop Ivy Bridge chip the numbers are:

Then let us assume for now it is not worth it.  When I finally get
around to generating profiles for my insane system we can revisit the
issue.

> I am very strongly of the opinion that the number of systems where you
> have an embedded system with that kind of inane interrupt rate is the
> 0.001% case.

That would be one machine in 10^13?  I doubt we have reached 10^13
machines running Linux in all of history, so a single example would
defeat your very strong opinion. ;)

Anyway, let me collect some real numbers before we argue any further.
And thank you for your maintainership.  It may not appear that way,
but I have _very_ little to complain about in drivers/char/random.c.

Jörn

--
One of the painful things about our time is that those who feel certainty
are stupid, and those with any imagination and understanding are filled
with doubt and indecision.
-- Bertrand Russell
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-23 Thread Jörg-Volker Peetz
Thanks for your patience and elaborated answer.

Theodore Ts'o wrote, on 09/22/2013 23:27:
> On Sun, Sep 22, 2013 at 11:01:42PM +0200, Jörg-Volker Peetz wrote:
>> just out of interest I would like to ask why this mixing function has to be 
>> that
>> complicated. For example, even if the input is always 0 and the pool is 
>> seeded
>> with pool[0] = 1 (as in your test program) this algorithm generates some
>> (predictable) pseudo-random numbers in the pool. Is this necessary?
>>
>> To just mix in some random input filling the whole pool (seeded again with
>> pool[0] = 1) something as "simple" as
>>
>>   f->pool[0] = rol32(input[0], f->pool[2] & 31) ^ f->pool[1];
>>   f->pool[1] = rol32(input[1], f->pool[3] & 31) ^ f->pool[2];
>>   f->pool[2] = rol32(input[2], f->pool[0] & 31) ^ f->pool[3];
>>   f->pool[3] = rol32(input[3], f->pool[1] & 31) ^ f->pool[0];
>>
>> would suffice, although I didn't do any statistical tests.
> 
> The structure of the mixing functions in /dev/random has been well
> studied, and validatetd in a number of different academic papers.  So
> I prefer to stick with the basic architecture, even as it is scaled
> down for speed reasons and beause the pool is smaller.

I'm not arguing so much for speed but for simplicity and comprehensibility of
code. As far as I understand the task of fast_mix() is just to collect random
bits in a small buffer separated from the random pool?
Once in a while these collected bits are then mixed into the main random pool.
For that purpose, justifiably, a strong mixing function is used.

> One of the important things about the mixing function is that if the
> attacker knows the input values (of which the simplest example for
> analytic purposes is if the input values are all zero), we want there
> to be ample mixing across the pool.
> 
> If you look at your proposed mixing function, in the case where the
> input values are all zero, it devolves into:
> 
>f->pool[0] = f->pool[1];
>f->pool[1] = f->pool[2];
>f->pool[2] = f->pool[3];
>f->pool[3] = f->pool[0];
> 
> Yes, I know the input values will never be all zero, but in the case
> where the attacker knows what the input values are[1], but does not know
> the contents of the entropy pool, a very simplistic mixing function
> becomes relatively easy to analyze in the case where attacker knows
> the inputs and then outputs, and is trying to determine the internal
> state of the random driver.
> 
> Measuring the speed of the fast_mix function which I put together,
> it's already been speeded up compard to the original fast_mix function
> by a factor of six.  On my x86 laptop, I can run 10 million
> iterations in 0.14 seconds, so that translates to 14ns per fast_mix
> call.   (The original fast_mix function runs in 84 nanoseconds.)
> 
> So there is a cost-benefit tradeoff that we need to balance here.  If
> you have a system with 100k interrupts per second, performance is
> going to be poor no matter what, and it's not clear how common such
> systems really are.  Most modern hardware do have some kind of NAPI or
> other interrupt mitigation in place --- heck, even back in 1980's
> people had figured out how to improve the 8250 UART with the 16550A
> UART, which introdued a FIFO to decrease the interrupt load caused by
> serial ports, and things have only gotten better since then.
> 
> Out of curiosity, on your profiles, how many nanonseconds total is the
> total interrupt code path chewing up per interrupt?
> 
> Regards,
> 
>   - Ted
> 
> [1] And on systems where we don't have get_cycles() or
> random_get_entropy(), it becomes much easier for the attacker to guess
> what at least half of the input values will be!
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-23 Thread Jörg-Volker Peetz
Thanks for your patience and elaborated answer.

Theodore Ts'o wrote, on 09/22/2013 23:27:
 On Sun, Sep 22, 2013 at 11:01:42PM +0200, Jörg-Volker Peetz wrote:
 just out of interest I would like to ask why this mixing function has to be 
 that
 complicated. For example, even if the input is always 0 and the pool is 
 seeded
 with pool[0] = 1 (as in your test program) this algorithm generates some
 (predictable) pseudo-random numbers in the pool. Is this necessary?

 To just mix in some random input filling the whole pool (seeded again with
 pool[0] = 1) something as simple as

   f-pool[0] = rol32(input[0], f-pool[2]  31) ^ f-pool[1];
   f-pool[1] = rol32(input[1], f-pool[3]  31) ^ f-pool[2];
   f-pool[2] = rol32(input[2], f-pool[0]  31) ^ f-pool[3];
   f-pool[3] = rol32(input[3], f-pool[1]  31) ^ f-pool[0];

 would suffice, although I didn't do any statistical tests.
 
 The structure of the mixing functions in /dev/random has been well
 studied, and validatetd in a number of different academic papers.  So
 I prefer to stick with the basic architecture, even as it is scaled
 down for speed reasons and beause the pool is smaller.

I'm not arguing so much for speed but for simplicity and comprehensibility of
code. As far as I understand the task of fast_mix() is just to collect random
bits in a small buffer separated from the random pool?
Once in a while these collected bits are then mixed into the main random pool.
For that purpose, justifiably, a strong mixing function is used.

 One of the important things about the mixing function is that if the
 attacker knows the input values (of which the simplest example for
 analytic purposes is if the input values are all zero), we want there
 to be ample mixing across the pool.
 
 If you look at your proposed mixing function, in the case where the
 input values are all zero, it devolves into:
 
f-pool[0] = f-pool[1];
f-pool[1] = f-pool[2];
f-pool[2] = f-pool[3];
f-pool[3] = f-pool[0];
 
 Yes, I know the input values will never be all zero, but in the case
 where the attacker knows what the input values are[1], but does not know
 the contents of the entropy pool, a very simplistic mixing function
 becomes relatively easy to analyze in the case where attacker knows
 the inputs and then outputs, and is trying to determine the internal
 state of the random driver.
 
 Measuring the speed of the fast_mix function which I put together,
 it's already been speeded up compard to the original fast_mix function
 by a factor of six.  On my x86 laptop, I can run 10 million
 iterations in 0.14 seconds, so that translates to 14ns per fast_mix
 call.   (The original fast_mix function runs in 84 nanoseconds.)
 
 So there is a cost-benefit tradeoff that we need to balance here.  If
 you have a system with 100k interrupts per second, performance is
 going to be poor no matter what, and it's not clear how common such
 systems really are.  Most modern hardware do have some kind of NAPI or
 other interrupt mitigation in place --- heck, even back in 1980's
 people had figured out how to improve the 8250 UART with the 16550A
 UART, which introdued a FIFO to decrease the interrupt load caused by
 serial ports, and things have only gotten better since then.
 
 Out of curiosity, on your profiles, how many nanonseconds total is the
 total interrupt code path chewing up per interrupt?
 
 Regards,
 
   - Ted
 
 [1] And on systems where we don't have get_cycles() or
 random_get_entropy(), it becomes much easier for the attacker to guess
 what at least half of the input values will be!
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-23 Thread Jörn Engel
On Sun, 22 September 2013 22:43:38 -0400, Theodore Ts'o wrote:
 On Sun, Sep 22, 2013 at 08:16:23PM -0400, Jörn Engel wrote:
  How about we switch between the two mixing functions depending on the
  interrupt load?  If this CPU has seen fewer than 1000 interrupts in
  the last second, use the better one, otherwise us the cheaper one?
 
 I guess the question here is whether it's worth it.  On a 2.8 GHz
 laptop Ivy Bridge chip the numbers are:

Then let us assume for now it is not worth it.  When I finally get
around to generating profiles for my insane system we can revisit the
issue.

 I am very strongly of the opinion that the number of systems where you
 have an embedded system with that kind of inane interrupt rate is the
 0.001% case.

That would be one machine in 10^13?  I doubt we have reached 10^13
machines running Linux in all of history, so a single example would
defeat your very strong opinion. ;)

Anyway, let me collect some real numbers before we argue any further.
And thank you for your maintainership.  It may not appear that way,
but I have _very_ little to complain about in drivers/char/random.c.

Jörn

--
One of the painful things about our time is that those who feel certainty
are stupid, and those with any imagination and understanding are filled
with doubt and indecision.
-- Bertrand Russell
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Theodore Ts'o
On Sun, Sep 22, 2013 at 08:16:23PM -0400, Jörn Engel wrote:
> How about we switch between the two mixing functions depending on the
> interrupt load?  If this CPU has seen fewer than 1000 interrupts in
> the last second, use the better one, otherwise us the cheaper one?

I guess the question here is whether it's worth it.  On a 2.8 GHz
laptop Ivy Bridge chip the numbers are:

Original fast_mix: 84 ns
tytso's fast_mix:  14 ns
joern's fast_mix:   8 ns

In terms of absolute overhead if you are running at an insane 100k
interrupts per second, it's still only 0.84%, 0.14%, and 0.08%,
respectively.  Granted, an embedded CPU will be (much) slower, but so
will the overall overhead of the rest of the interrupt handling code
path plus whatever the overhead of the device driver will be.  The
real bug is the 100k interrupts per second workload.

How about this as a compromise?  We can add an #ifdef in the random.c
code which has the alternate fast_mix algorithm in the very rare case
that some embedded software engineer under time-pressure and suffering
under the need to use a pathetically broken hardware design, and who
starts looking in the random.c code, will find the an alternate
version.  That way, we avoid the complexity of an dynamic switching
system, plus the overhead of measuring the number of interrupts per
second.

I am very strongly of the opinion that the number of systems where you
have an embedded system with that kind of inane interrupt rate is the
0.001% case.  So IMHO it's not even worth having a dynamic
switching system, especially when it's only going to improve things
slightly.

- Ted

P.S.  The real reason for the original fast_mix() function is because
it has a separate pool for each CPU, so there's no spinlock contention
and no cache line bouncing.  And that's why the fast_mix pool is so
small --- so that the entire struct fast_pool can fit in a single CPU
cache line.  So on normal, common systems --- even mobile handsets
have multiple cores these days --- fast_mix() *is* actually much
faster than the standard entropy pool, even before any optimizations.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sun, 22 September 2013 19:36:41 -0400, Theodore Ts'o wrote:
> On Sun, Sep 22, 2013 at 04:53:34PM -0400, Jörn Engel wrote:
> > 
> > And I want to keep that function.  Essentially the point of fast_mix()
> > is to ratelimit _mix_pool_bytes().  Naïve ratelimiting would simply
> > discard the input once the ratelimit has been reached.  My proposal is
> > to still use the input bits, but use a really cheap mixing function.
> 
> Sure, but how cheap is "cheap"?  There's a balance here.  I don't buy
> the argument that we must weaken the fast_mix() at all costs because
> we have to drive the cost of fast_mix() to zero.  If we're going to do
> that, to the limit fast_mix() should be a no-op, which is ridiculous.

Agreed.  We always have a tradeoff between quality and cost.  And I
repeat yet again that driving the cost down is important, because it
allows us to collect entropy more often.  The schedule is an entropy
source I would like to tap.

> So what I've suggested already makes fast_mix() much faster.  It's not
> fast as what you've proposed, but it's pretty clear that my fast_mix()
> is better at mixing the fast_mix pool.

Agreed.

> > Your version of fast_mix() failed in the "really cheap" department.
> > As a result, it showed up in profiles and at least one idiot (me)
> > reverted to naïve ratelimiting.  It could have been worse, I was
> > explicitly asked twice to just remove the call to
> > add_interrupt_randomness().
> 
> Sure, but that's not _my_ problem.  If someone stupid adulterates
> /dev/random, that's not my responsibility.  Most people aren't going
> to be dealing with systems with truly stupid interrupt rates, and most
> people aren't going to be tampering with the random driver.

This is where I don't agree.  I very much care even about the plastic
routers running some variant of Linux modified by some embedded
engineers under insane time pressure.  If you leave them an incentive
to cripple the random generator, some of them will.  If you find
source code with a maliciously crippled random generator, the author
has plausible deniability.

So this should be _our_ problem.  Maybe not yours specifically, but
certainly that of kernel developers in general.

> I'm certainly willing to make fast_mix() faster, to reduce the
> temptation for idiots to tamper with /dev/random, but we need to
> balance the time of fast_mix() with the quality of its mixing, and to
> remember what the common case is (and it's not 100k interrupts per
> second!)

How about we switch between the two mixing functions depending on the
interrupt load?  If this CPU has seen fewer than 1000 interrupts in
the last second, use the better one, otherwise us the cheaper one?

I don't really like the idea too much.  But it would cover both the
common case and my particular degenerated one.

> In practice, we are using randomness in so many places (every single
> time we start a process for ASLR, in lots of security-related programs
> that use SSL libraries, etc.), that we are actually practically
> *never* hitting trickle_thresh.
> 
> The trickle_thresh was added originally when add_timer_randomness()
> was used for interrupts that used SA_SAMPLE_RANDOM.  Given that we
> don't use add_timer_randomness() for that, but for things that happen
> much more rarely (i.e., such as keyboard/mouse input), I'm probably
> going to end up removing the trickle thresh_check altogether.

Makes sense to me.

Jörn

--
Doubt is not a pleasant condition, but certainty is an absurd one.
-- Voltaire
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Theodore Ts'o
On Sun, Sep 22, 2013 at 04:53:34PM -0400, Jörn Engel wrote:
> 
> And I want to keep that function.  Essentially the point of fast_mix()
> is to ratelimit _mix_pool_bytes().  Naïve ratelimiting would simply
> discard the input once the ratelimit has been reached.  My proposal is
> to still use the input bits, but use a really cheap mixing function.

Sure, but how cheap is "cheap"?  There's a balance here.  I don't buy
the argument that we must weaken the fast_mix() at all costs because
we have to drive the cost of fast_mix() to zero.  If we're going to do
that, to the limit fast_mix() should be a no-op, which is ridiculous.

So what I've suggested already makes fast_mix() much faster.  It's not
fast as what you've proposed, but it's pretty clear that my fast_mix()
is better at mixing the fast_mix pool.

> Your version of fast_mix() failed in the "really cheap" department.
> As a result, it showed up in profiles and at least one idiot (me)
> reverted to naïve ratelimiting.  It could have been worse, I was
> explicitly asked twice to just remove the call to
> add_interrupt_randomness().

Sure, but that's not _my_ problem.  If someone stupid adulterates
/dev/random, that's not my responsibility.  Most people aren't going
to be dealing with systems with truly stupid interrupt rates, and most
people aren't going to be tampering with the random driver.

I'm certainly willing to make fast_mix() faster, to reduce the
temptation for idiots to tamper with /dev/random, but we need to
balance the time of fast_mix() with the quality of its mixing, and to
remember what the common case is (and it's not 100k interrupts per
second!)

> And you should do the same for add_timer_randomness(), where again you
> have ratelimiting.  Once trickle_thresh is reached your code simply
> discards most randomness.  

In practice, we are using randomness in so many places (every single
time we start a process for ASLR, in lots of security-related programs
that use SSL libraries, etc.), that we are actually practically
*never* hitting trickle_thresh.

The trickle_thresh was added originally when add_timer_randomness()
was used for interrupts that used SA_SAMPLE_RANDOM.  Given that we
don't use add_timer_randomness() for that, but for things that happen
much more rarely (i.e., such as keyboard/mouse input), I'm probably
going to end up removing the trickle thresh_check altogether.

Regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sun, 22 September 2013 17:27:52 -0400, Theodore Ts'o wrote:
> 
> The structure of the mixing functions in /dev/random has been well
> studied, and validatetd in a number of different academic papers.  So
> I prefer to stick with the basic architecture, even as it is scaled
> down for speed reasons and beause the pool is smaller.

And I want to keep that function.  Essentially the point of fast_mix()
is to ratelimit _mix_pool_bytes().  Naïve ratelimiting would simply
discard the input once the ratelimit has been reached.  My proposal is
to still use the input bits, but use a really cheap mixing function.

Your version of fast_mix() failed in the "really cheap" department.
As a result, it showed up in profiles and at least one idiot (me)
reverted to naïve ratelimiting.  It could have been worse, I was
explicitly asked twice to just remove the call to
add_interrupt_randomness().

So don't think of my patch as weakening the mixing, but as
strengthening the ratelimited mixing.  If we have few interrupts,
_mix_pool_bytes() will be run once a second, if we have many it will
be run once every 64 interrupts.  And in the latter case, the input
for _mix_pool_bytes() is much better than with naïve ratelimiting.

And you should do the same for add_timer_randomness(), where again you
have ratelimiting.  Once trickle_thresh is reached your code simply
discards most randomness.  Only once in 4096 call do you use all the
bits you get - most of which will be predictable.  Why not use a cheap
mixing function for the other 4095 calls and ensure we have many good
bits on call 4096?

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sat, 21 September 2013 17:41:18 -0400, Theodore Ts'o wrote:
> 
> BTW, just to give another example of the difference between the mixing
> funtions, try compiling the following with and without ORIG_MIX defined...

Garbage in, garbage out again.  If there is absolutely no randomness
in the input (all bits zero), my mixing function will simply shift the
pool.  And because the shifting has a period of 128, there are only
128 possible pool states.  Your mixing function is doing slightly
better, it effectively becomes an interrupt counter with a silly
output format.

But who cares?  If there is absolutely no randomness in the input, you
have lost.  That case isn't worth contemplating.  The question is
whether any randomness present in the input will get accumulated.

Without the shifting, a single unpredictable bit on, say, the lowest
position of the timestamp will be xor'ed into the same pool bit every
time.  The rest of the pool would always be predictable and the
resulting mixing function would clearly be bad.

With the shifting and using the same example, after 128 rounds every
bit of the pool is unpredictable.  Job done, we can go home now.  You
cannot achieve anything better than 128 unpredictable bits, no matter
how much you try.

Jörn

--
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sat, 21 September 2013 17:25:10 -0400, Theodore Ts'o wrote:
> On Mon, Sep 16, 2013 at 11:40:27AM -0400, Jörn Engel wrote:
> > 
> > Here is a patch to make add_interrupt_randomness() significantly
> > cheaper without significantly impacting the quality.  The second part
> > is my personal opinion and others might disagree.
> > 
> > So far this has only seen userspace performance testing, so don't
> > merge it in a hurry.
> 
> Performance testing, but your new fast_mix pool has some serious
> problems in terms of how well it works.
> 
> First of all, I think this is a typo:
> 
> > +   for (i = 0; i < 3; i++) {
> 
> This means that pool[3] is always 0, and the input[3] is never mixed
> in.  Hence, the pool is now only 96 bits instead of 128 bits, and the
> last 4 bytes of the input pool are not getting mixed in.

Yes, indeed.

> Secondly, if you change this so that we actually use the whole pool,
> and you mix in a constant set of inputs like this:
> 
>   for (i = 0; i < 100; i++) {
>   input[0] = i;
>   input[1] = i;
>   input[2] = i;
>   input[3] = i;
>   fast_mix(, input);
>   print_pool();
>   }
> 
> you see something like this:
> 
> pool:
> 0x
> 0x
> 0x
> 0x
> pool:
> 0x0001
> 0x0001
> 0x0001
> 0x0001
> pool:
> 0x0082
> 0x0082
> 0x0082
> 0x0082
> pool:
> 0x4103
> 0x4103
> 0x4103
> 0x4103
> pool:
> 0x00208184
> 0x00208184
> 0x00208184
> 0x00208184
> pool:
> 0x1040c205
> 0x1040c205
> 0x1040c205
> 0x1040c205
> 
> Granted, it's unlikely that we will be mixing numbers like this, but
> it's a great demonstration of why I added the input_rotate aspect to
> my mixing functions.

Honestly, this is a case of garbage in, garbage out.  If your
"randomness" is the same number repeated four times, the repetitions
don't add anything.  A more complicated mixing function doesn't buy
you much.  In the extreme case, you have the AES-encrypted counter
example.  No entropy on the input side, something that looks random on
the output side, and - because there is no secret key in the
complicated mixing function - anyone willing to sit down and do the
math can predict the output.

The failure case I was personally concerned about was a commutative
mixing function.  Interrupt numbers, for examples, are perfectly
predictable.  The randomness comes from the order or interrupts.  So
mix(a, b) != mix(b, a) is a hard requirement, if you excuse my silly
notation.

One variant that still fails your test above, but ensures every input
bit will eventually influence every output bit (certainly after 64
rounds) would be this:

static void fast_mix(struct fast_pool *f, __u32 input[4])
{
int i;
__u32 acc, p, carry = f->pool[3] >> 25;

for (i = 0; i < 4; i++) {
p = carry * input[i];
acc = (f->pool[i] << 7) ^ input[i] ^ carry ^ p;
carry = f->pool[i] >> 25;
f->pool[i] = acc;
}
}

Clearly this is a better mixing function, but I still think we don't
need that much quality.  If there is any randomness at all in the
interrupts, we will quickly make every bit in the pool unpredictable.
And if the interrupts are completely predictable in every bit and in
their order, well, we have lost either way.

Jörn

--
I say what I believe in.  And give reasons.
-- Pervez Hoodbhoy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Theodore Ts'o
On Sun, Sep 22, 2013 at 11:01:42PM +0200, Jörg-Volker Peetz wrote:
> just out of interest I would like to ask why this mixing function has to be 
> that
> complicated. For example, even if the input is always 0 and the pool is seeded
> with pool[0] = 1 (as in your test program) this algorithm generates some
> (predictable) pseudo-random numbers in the pool. Is this necessary?
> 
> To just mix in some random input filling the whole pool (seeded again with
> pool[0] = 1) something as "simple" as
> 
>   f->pool[0] = rol32(input[0], f->pool[2] & 31) ^ f->pool[1];
>   f->pool[1] = rol32(input[1], f->pool[3] & 31) ^ f->pool[2];
>   f->pool[2] = rol32(input[2], f->pool[0] & 31) ^ f->pool[3];
>   f->pool[3] = rol32(input[3], f->pool[1] & 31) ^ f->pool[0];
> 
> would suffice, although I didn't do any statistical tests.

The structure of the mixing functions in /dev/random has been well
studied, and validatetd in a number of different academic papers.  So
I prefer to stick with the basic architecture, even as it is scaled
down for speed reasons and beause the pool is smaller.

One of the important things about the mixing function is that if the
attacker knows the input values (of which the simplest example for
analytic purposes is if the input values are all zero), we want there
to be ample mixing across the pool.

If you look at your proposed mixing function, in the case where the
input values are all zero, it devolves into:

   f->pool[0] = f->pool[1];
   f->pool[1] = f->pool[2];
   f->pool[2] = f->pool[3];
   f->pool[3] = f->pool[0];

Yes, I know the input values will never be all zero, but in the case
where the attacker knows what the input values are[1], but does not know
the contents of the entropy pool, a very simplistic mixing function
becomes relatively easy to analyze in the case where attacker knows
the inputs and then outputs, and is trying to determine the internal
state of the random driver.

Measuring the speed of the fast_mix function which I put together,
it's already been speeded up compard to the original fast_mix function
by a factor of six.  On my x86 laptop, I can run 10 million
iterations in 0.14 seconds, so that translates to 14ns per fast_mix
call.   (The original fast_mix function runs in 84 nanoseconds.)

So there is a cost-benefit tradeoff that we need to balance here.  If
you have a system with 100k interrupts per second, performance is
going to be poor no matter what, and it's not clear how common such
systems really are.  Most modern hardware do have some kind of NAPI or
other interrupt mitigation in place --- heck, even back in 1980's
people had figured out how to improve the 8250 UART with the 16550A
UART, which introdued a FIFO to decrease the interrupt load caused by
serial ports, and things have only gotten better since then.

Out of curiosity, on your profiles, how many nanonseconds total is the
total interrupt code path chewing up per interrupt?

Regards,

- Ted

[1] And on systems where we don't have get_cycles() or
random_get_entropy(), it becomes much easier for the attacker to guess
what at least half of the input values will be!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörg-Volker Peetz
Hi Theodore,

Theodore Ts'o wrote, on 09/22/2013 05:05:
> The following fast_mix function, with the loop unrolling, is about 70%
> slower than your proposed version, but it's still four times faster
> than the original byte-based fast_mix function.  This is what I'm
> considering using as a compromise.
> 
> Any comments or objections?
> 
>   - Ted
> 
> static void fast_mix(struct fast_pool *f, __u32 input[4])
> {
>   __u32   w;
>   int i;
>   unsignedinput_rotate = f->rotate;
> 
> #if 0
>   for (i = 0; i < 4; i++) {
>   w = rol32(input[i], input_rotate) ^ f->pool[i] ^
>   f->pool[(i + 3) & 3];
>   f->pool[i] = (w >> 3) ^ twist_table[w & 7];
>   input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
>   }
> #else   /* loop unrolled for speed */
>   w = rol32(input[0], input_rotate) ^ f->pool[0] ^ f->pool[3];
>   f->pool[0] = (w >> 3) ^ twist_table[w & 7];
>   input_rotate = (input_rotate + 14) & 31;
>   w = rol32(input[1], input_rotate) ^ f->pool[1] ^ f->pool[0];
>   f->pool[1] = (w >> 3) ^ twist_table[w & 7];
>   input_rotate = (input_rotate + 7) & 31;
>   w = rol32(input[2], input_rotate) ^ f->pool[2] ^ f->pool[1];
>   f->pool[2] = (w >> 3) ^ twist_table[w & 7];
>   input_rotate = (input_rotate + 7) & 31;
>   w = rol32(input[3], input_rotate) ^ f->pool[3] ^ f->pool[2];
>   f->pool[3] = (w >> 3) ^ twist_table[w & 7];
>   input_rotate = (input_rotate + 7) & 31;
> #endif
>   f->count += 16;
>   f->rotate = input_rotate;
> }
> 

just out of interest I would like to ask why this mixing function has to be that
complicated. For example, even if the input is always 0 and the pool is seeded
with pool[0] = 1 (as in your test program) this algorithm generates some
(predictable) pseudo-random numbers in the pool. Is this necessary?

To just mix in some random input filling the whole pool (seeded again with
pool[0] = 1) something as "simple" as

  f->pool[0] = rol32(input[0], f->pool[2] & 31) ^ f->pool[1];
  f->pool[1] = rol32(input[1], f->pool[3] & 31) ^ f->pool[2];
  f->pool[2] = rol32(input[2], f->pool[0] & 31) ^ f->pool[3];
  f->pool[3] = rol32(input[3], f->pool[1] & 31) ^ f->pool[0];

would suffice, although I didn't do any statistical tests.

Best regards,
Jörg-Volker.
-- 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörg-Volker Peetz
Hi Theodore,

Theodore Ts'o wrote, on 09/22/2013 05:05:
 The following fast_mix function, with the loop unrolling, is about 70%
 slower than your proposed version, but it's still four times faster
 than the original byte-based fast_mix function.  This is what I'm
 considering using as a compromise.
 
 Any comments or objections?
 
   - Ted
 
 static void fast_mix(struct fast_pool *f, __u32 input[4])
 {
   __u32   w;
   int i;
   unsignedinput_rotate = f-rotate;
 
 #if 0
   for (i = 0; i  4; i++) {
   w = rol32(input[i], input_rotate) ^ f-pool[i] ^
   f-pool[(i + 3)  3];
   f-pool[i] = (w  3) ^ twist_table[w  7];
   input_rotate = (input_rotate + (i ? 7 : 14))  31;
   }
 #else   /* loop unrolled for speed */
   w = rol32(input[0], input_rotate) ^ f-pool[0] ^ f-pool[3];
   f-pool[0] = (w  3) ^ twist_table[w  7];
   input_rotate = (input_rotate + 14)  31;
   w = rol32(input[1], input_rotate) ^ f-pool[1] ^ f-pool[0];
   f-pool[1] = (w  3) ^ twist_table[w  7];
   input_rotate = (input_rotate + 7)  31;
   w = rol32(input[2], input_rotate) ^ f-pool[2] ^ f-pool[1];
   f-pool[2] = (w  3) ^ twist_table[w  7];
   input_rotate = (input_rotate + 7)  31;
   w = rol32(input[3], input_rotate) ^ f-pool[3] ^ f-pool[2];
   f-pool[3] = (w  3) ^ twist_table[w  7];
   input_rotate = (input_rotate + 7)  31;
 #endif
   f-count += 16;
   f-rotate = input_rotate;
 }
 

just out of interest I would like to ask why this mixing function has to be that
complicated. For example, even if the input is always 0 and the pool is seeded
with pool[0] = 1 (as in your test program) this algorithm generates some
(predictable) pseudo-random numbers in the pool. Is this necessary?

To just mix in some random input filling the whole pool (seeded again with
pool[0] = 1) something as simple as

  f-pool[0] = rol32(input[0], f-pool[2]  31) ^ f-pool[1];
  f-pool[1] = rol32(input[1], f-pool[3]  31) ^ f-pool[2];
  f-pool[2] = rol32(input[2], f-pool[0]  31) ^ f-pool[3];
  f-pool[3] = rol32(input[3], f-pool[1]  31) ^ f-pool[0];

would suffice, although I didn't do any statistical tests.

Best regards,
Jörg-Volker.
-- 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Theodore Ts'o
On Sun, Sep 22, 2013 at 11:01:42PM +0200, Jörg-Volker Peetz wrote:
 just out of interest I would like to ask why this mixing function has to be 
 that
 complicated. For example, even if the input is always 0 and the pool is seeded
 with pool[0] = 1 (as in your test program) this algorithm generates some
 (predictable) pseudo-random numbers in the pool. Is this necessary?
 
 To just mix in some random input filling the whole pool (seeded again with
 pool[0] = 1) something as simple as
 
   f-pool[0] = rol32(input[0], f-pool[2]  31) ^ f-pool[1];
   f-pool[1] = rol32(input[1], f-pool[3]  31) ^ f-pool[2];
   f-pool[2] = rol32(input[2], f-pool[0]  31) ^ f-pool[3];
   f-pool[3] = rol32(input[3], f-pool[1]  31) ^ f-pool[0];
 
 would suffice, although I didn't do any statistical tests.

The structure of the mixing functions in /dev/random has been well
studied, and validatetd in a number of different academic papers.  So
I prefer to stick with the basic architecture, even as it is scaled
down for speed reasons and beause the pool is smaller.

One of the important things about the mixing function is that if the
attacker knows the input values (of which the simplest example for
analytic purposes is if the input values are all zero), we want there
to be ample mixing across the pool.

If you look at your proposed mixing function, in the case where the
input values are all zero, it devolves into:

   f-pool[0] = f-pool[1];
   f-pool[1] = f-pool[2];
   f-pool[2] = f-pool[3];
   f-pool[3] = f-pool[0];

Yes, I know the input values will never be all zero, but in the case
where the attacker knows what the input values are[1], but does not know
the contents of the entropy pool, a very simplistic mixing function
becomes relatively easy to analyze in the case where attacker knows
the inputs and then outputs, and is trying to determine the internal
state of the random driver.

Measuring the speed of the fast_mix function which I put together,
it's already been speeded up compard to the original fast_mix function
by a factor of six.  On my x86 laptop, I can run 10 million
iterations in 0.14 seconds, so that translates to 14ns per fast_mix
call.   (The original fast_mix function runs in 84 nanoseconds.)

So there is a cost-benefit tradeoff that we need to balance here.  If
you have a system with 100k interrupts per second, performance is
going to be poor no matter what, and it's not clear how common such
systems really are.  Most modern hardware do have some kind of NAPI or
other interrupt mitigation in place --- heck, even back in 1980's
people had figured out how to improve the 8250 UART with the 16550A
UART, which introdued a FIFO to decrease the interrupt load caused by
serial ports, and things have only gotten better since then.

Out of curiosity, on your profiles, how many nanonseconds total is the
total interrupt code path chewing up per interrupt?

Regards,

- Ted

[1] And on systems where we don't have get_cycles() or
random_get_entropy(), it becomes much easier for the attacker to guess
what at least half of the input values will be!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sat, 21 September 2013 17:25:10 -0400, Theodore Ts'o wrote:
 On Mon, Sep 16, 2013 at 11:40:27AM -0400, Jörn Engel wrote:
  
  Here is a patch to make add_interrupt_randomness() significantly
  cheaper without significantly impacting the quality.  The second part
  is my personal opinion and others might disagree.
  
  So far this has only seen userspace performance testing, so don't
  merge it in a hurry.
 
 Performance testing, but your new fast_mix pool has some serious
 problems in terms of how well it works.
 
 First of all, I think this is a typo:
 
  +   for (i = 0; i  3; i++) {
 
 This means that pool[3] is always 0, and the input[3] is never mixed
 in.  Hence, the pool is now only 96 bits instead of 128 bits, and the
 last 4 bytes of the input pool are not getting mixed in.

Yes, indeed.

 Secondly, if you change this so that we actually use the whole pool,
 and you mix in a constant set of inputs like this:
 
   for (i = 0; i  100; i++) {
   input[0] = i;
   input[1] = i;
   input[2] = i;
   input[3] = i;
   fast_mix(pool, input);
   print_pool(pool);
   }
 
 you see something like this:
 
 pool:
 0x
 0x
 0x
 0x
 pool:
 0x0001
 0x0001
 0x0001
 0x0001
 pool:
 0x0082
 0x0082
 0x0082
 0x0082
 pool:
 0x4103
 0x4103
 0x4103
 0x4103
 pool:
 0x00208184
 0x00208184
 0x00208184
 0x00208184
 pool:
 0x1040c205
 0x1040c205
 0x1040c205
 0x1040c205
 
 Granted, it's unlikely that we will be mixing numbers like this, but
 it's a great demonstration of why I added the input_rotate aspect to
 my mixing functions.

Honestly, this is a case of garbage in, garbage out.  If your
randomness is the same number repeated four times, the repetitions
don't add anything.  A more complicated mixing function doesn't buy
you much.  In the extreme case, you have the AES-encrypted counter
example.  No entropy on the input side, something that looks random on
the output side, and - because there is no secret key in the
complicated mixing function - anyone willing to sit down and do the
math can predict the output.

The failure case I was personally concerned about was a commutative
mixing function.  Interrupt numbers, for examples, are perfectly
predictable.  The randomness comes from the order or interrupts.  So
mix(a, b) != mix(b, a) is a hard requirement, if you excuse my silly
notation.

One variant that still fails your test above, but ensures every input
bit will eventually influence every output bit (certainly after 64
rounds) would be this:

static void fast_mix(struct fast_pool *f, __u32 input[4])
{
int i;
__u32 acc, p, carry = f-pool[3]  25;

for (i = 0; i  4; i++) {
p = carry * input[i];
acc = (f-pool[i]  7) ^ input[i] ^ carry ^ p;
carry = f-pool[i]  25;
f-pool[i] = acc;
}
}

Clearly this is a better mixing function, but I still think we don't
need that much quality.  If there is any randomness at all in the
interrupts, we will quickly make every bit in the pool unpredictable.
And if the interrupts are completely predictable in every bit and in
their order, well, we have lost either way.

Jörn

--
I say what I believe in.  And give reasons.
-- Pervez Hoodbhoy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sat, 21 September 2013 17:41:18 -0400, Theodore Ts'o wrote:
 
 BTW, just to give another example of the difference between the mixing
 funtions, try compiling the following with and without ORIG_MIX defined...

Garbage in, garbage out again.  If there is absolutely no randomness
in the input (all bits zero), my mixing function will simply shift the
pool.  And because the shifting has a period of 128, there are only
128 possible pool states.  Your mixing function is doing slightly
better, it effectively becomes an interrupt counter with a silly
output format.

But who cares?  If there is absolutely no randomness in the input, you
have lost.  That case isn't worth contemplating.  The question is
whether any randomness present in the input will get accumulated.

Without the shifting, a single unpredictable bit on, say, the lowest
position of the timestamp will be xor'ed into the same pool bit every
time.  The rest of the pool would always be predictable and the
resulting mixing function would clearly be bad.

With the shifting and using the same example, after 128 rounds every
bit of the pool is unpredictable.  Job done, we can go home now.  You
cannot achieve anything better than 128 unpredictable bits, no matter
how much you try.

Jörn

--
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sun, 22 September 2013 17:27:52 -0400, Theodore Ts'o wrote:
 
 The structure of the mixing functions in /dev/random has been well
 studied, and validatetd in a number of different academic papers.  So
 I prefer to stick with the basic architecture, even as it is scaled
 down for speed reasons and beause the pool is smaller.

And I want to keep that function.  Essentially the point of fast_mix()
is to ratelimit _mix_pool_bytes().  Naïve ratelimiting would simply
discard the input once the ratelimit has been reached.  My proposal is
to still use the input bits, but use a really cheap mixing function.

Your version of fast_mix() failed in the really cheap department.
As a result, it showed up in profiles and at least one idiot (me)
reverted to naïve ratelimiting.  It could have been worse, I was
explicitly asked twice to just remove the call to
add_interrupt_randomness().

So don't think of my patch as weakening the mixing, but as
strengthening the ratelimited mixing.  If we have few interrupts,
_mix_pool_bytes() will be run once a second, if we have many it will
be run once every 64 interrupts.  And in the latter case, the input
for _mix_pool_bytes() is much better than with naïve ratelimiting.

And you should do the same for add_timer_randomness(), where again you
have ratelimiting.  Once trickle_thresh is reached your code simply
discards most randomness.  Only once in 4096 call do you use all the
bits you get - most of which will be predictable.  Why not use a cheap
mixing function for the other 4095 calls and ensure we have many good
bits on call 4096?

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Theodore Ts'o
On Sun, Sep 22, 2013 at 04:53:34PM -0400, Jörn Engel wrote:
 
 And I want to keep that function.  Essentially the point of fast_mix()
 is to ratelimit _mix_pool_bytes().  Naïve ratelimiting would simply
 discard the input once the ratelimit has been reached.  My proposal is
 to still use the input bits, but use a really cheap mixing function.

Sure, but how cheap is cheap?  There's a balance here.  I don't buy
the argument that we must weaken the fast_mix() at all costs because
we have to drive the cost of fast_mix() to zero.  If we're going to do
that, to the limit fast_mix() should be a no-op, which is ridiculous.

So what I've suggested already makes fast_mix() much faster.  It's not
fast as what you've proposed, but it's pretty clear that my fast_mix()
is better at mixing the fast_mix pool.

 Your version of fast_mix() failed in the really cheap department.
 As a result, it showed up in profiles and at least one idiot (me)
 reverted to naïve ratelimiting.  It could have been worse, I was
 explicitly asked twice to just remove the call to
 add_interrupt_randomness().

Sure, but that's not _my_ problem.  If someone stupid adulterates
/dev/random, that's not my responsibility.  Most people aren't going
to be dealing with systems with truly stupid interrupt rates, and most
people aren't going to be tampering with the random driver.

I'm certainly willing to make fast_mix() faster, to reduce the
temptation for idiots to tamper with /dev/random, but we need to
balance the time of fast_mix() with the quality of its mixing, and to
remember what the common case is (and it's not 100k interrupts per
second!)

 And you should do the same for add_timer_randomness(), where again you
 have ratelimiting.  Once trickle_thresh is reached your code simply
 discards most randomness.  

In practice, we are using randomness in so many places (every single
time we start a process for ASLR, in lots of security-related programs
that use SSL libraries, etc.), that we are actually practically
*never* hitting trickle_thresh.

The trickle_thresh was added originally when add_timer_randomness()
was used for interrupts that used SA_SAMPLE_RANDOM.  Given that we
don't use add_timer_randomness() for that, but for things that happen
much more rarely (i.e., such as keyboard/mouse input), I'm probably
going to end up removing the trickle thresh_check altogether.

Regards,

- Ted
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Jörn Engel
On Sun, 22 September 2013 19:36:41 -0400, Theodore Ts'o wrote:
 On Sun, Sep 22, 2013 at 04:53:34PM -0400, Jörn Engel wrote:
  
  And I want to keep that function.  Essentially the point of fast_mix()
  is to ratelimit _mix_pool_bytes().  Naïve ratelimiting would simply
  discard the input once the ratelimit has been reached.  My proposal is
  to still use the input bits, but use a really cheap mixing function.
 
 Sure, but how cheap is cheap?  There's a balance here.  I don't buy
 the argument that we must weaken the fast_mix() at all costs because
 we have to drive the cost of fast_mix() to zero.  If we're going to do
 that, to the limit fast_mix() should be a no-op, which is ridiculous.

Agreed.  We always have a tradeoff between quality and cost.  And I
repeat yet again that driving the cost down is important, because it
allows us to collect entropy more often.  The schedule is an entropy
source I would like to tap.

 So what I've suggested already makes fast_mix() much faster.  It's not
 fast as what you've proposed, but it's pretty clear that my fast_mix()
 is better at mixing the fast_mix pool.

Agreed.

  Your version of fast_mix() failed in the really cheap department.
  As a result, it showed up in profiles and at least one idiot (me)
  reverted to naïve ratelimiting.  It could have been worse, I was
  explicitly asked twice to just remove the call to
  add_interrupt_randomness().
 
 Sure, but that's not _my_ problem.  If someone stupid adulterates
 /dev/random, that's not my responsibility.  Most people aren't going
 to be dealing with systems with truly stupid interrupt rates, and most
 people aren't going to be tampering with the random driver.

This is where I don't agree.  I very much care even about the plastic
routers running some variant of Linux modified by some embedded
engineers under insane time pressure.  If you leave them an incentive
to cripple the random generator, some of them will.  If you find
source code with a maliciously crippled random generator, the author
has plausible deniability.

So this should be _our_ problem.  Maybe not yours specifically, but
certainly that of kernel developers in general.

 I'm certainly willing to make fast_mix() faster, to reduce the
 temptation for idiots to tamper with /dev/random, but we need to
 balance the time of fast_mix() with the quality of its mixing, and to
 remember what the common case is (and it's not 100k interrupts per
 second!)

How about we switch between the two mixing functions depending on the
interrupt load?  If this CPU has seen fewer than 1000 interrupts in
the last second, use the better one, otherwise us the cheaper one?

I don't really like the idea too much.  But it would cover both the
common case and my particular degenerated one.

 In practice, we are using randomness in so many places (every single
 time we start a process for ASLR, in lots of security-related programs
 that use SSL libraries, etc.), that we are actually practically
 *never* hitting trickle_thresh.
 
 The trickle_thresh was added originally when add_timer_randomness()
 was used for interrupts that used SA_SAMPLE_RANDOM.  Given that we
 don't use add_timer_randomness() for that, but for things that happen
 much more rarely (i.e., such as keyboard/mouse input), I'm probably
 going to end up removing the trickle thresh_check altogether.

Makes sense to me.

Jörn

--
Doubt is not a pleasant condition, but certainty is an absurd one.
-- Voltaire
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-22 Thread Theodore Ts'o
On Sun, Sep 22, 2013 at 08:16:23PM -0400, Jörn Engel wrote:
 How about we switch between the two mixing functions depending on the
 interrupt load?  If this CPU has seen fewer than 1000 interrupts in
 the last second, use the better one, otherwise us the cheaper one?

I guess the question here is whether it's worth it.  On a 2.8 GHz
laptop Ivy Bridge chip the numbers are:

Original fast_mix: 84 ns
tytso's fast_mix:  14 ns
joern's fast_mix:   8 ns

In terms of absolute overhead if you are running at an insane 100k
interrupts per second, it's still only 0.84%, 0.14%, and 0.08%,
respectively.  Granted, an embedded CPU will be (much) slower, but so
will the overall overhead of the rest of the interrupt handling code
path plus whatever the overhead of the device driver will be.  The
real bug is the 100k interrupts per second workload.

How about this as a compromise?  We can add an #ifdef in the random.c
code which has the alternate fast_mix algorithm in the very rare case
that some embedded software engineer under time-pressure and suffering
under the need to use a pathetically broken hardware design, and who
starts looking in the random.c code, will find the an alternate
version.  That way, we avoid the complexity of an dynamic switching
system, plus the overhead of measuring the number of interrupts per
second.

I am very strongly of the opinion that the number of systems where you
have an embedded system with that kind of inane interrupt rate is the
0.001% case.  So IMHO it's not even worth having a dynamic
switching system, especially when it's only going to improve things
slightly.

- Ted

P.S.  The real reason for the original fast_mix() function is because
it has a separate pool for each CPU, so there's no spinlock contention
and no cache line bouncing.  And that's why the fast_mix pool is so
small --- so that the entire struct fast_pool can fit in a single CPU
cache line.  So on normal, common systems --- even mobile handsets
have multiple cores these days --- fast_mix() *is* actually much
faster than the standard entropy pool, even before any optimizations.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-21 Thread Theodore Ts'o
The following fast_mix function, with the loop unrolling, is about 70%
slower than your proposed version, but it's still four times faster
than the original byte-based fast_mix function.  This is what I'm
considering using as a compromise.

Any comments or objections?

- Ted

static void fast_mix(struct fast_pool *f, __u32 input[4])
{
__u32   w;
int i;
unsignedinput_rotate = f->rotate;

#if 0
for (i = 0; i < 4; i++) {
w = rol32(input[i], input_rotate) ^ f->pool[i] ^
f->pool[(i + 3) & 3];
f->pool[i] = (w >> 3) ^ twist_table[w & 7];
input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
}
#else   /* loop unrolled for speed */
w = rol32(input[0], input_rotate) ^ f->pool[0] ^ f->pool[3];
f->pool[0] = (w >> 3) ^ twist_table[w & 7];
input_rotate = (input_rotate + 14) & 31;
w = rol32(input[1], input_rotate) ^ f->pool[1] ^ f->pool[0];
f->pool[1] = (w >> 3) ^ twist_table[w & 7];
input_rotate = (input_rotate + 7) & 31;
w = rol32(input[2], input_rotate) ^ f->pool[2] ^ f->pool[1];
f->pool[2] = (w >> 3) ^ twist_table[w & 7];
input_rotate = (input_rotate + 7) & 31;
w = rol32(input[3], input_rotate) ^ f->pool[3] ^ f->pool[2];
f->pool[3] = (w >> 3) ^ twist_table[w & 7];
input_rotate = (input_rotate + 7) & 31;
#endif
f->count += 16;
f->rotate = input_rotate;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-21 Thread Theodore Ts'o
BTW, just to give another example of the difference between the mixing
funtions, try compiling the following with and without ORIG_MIX defined...

   - Ted

#include 
#include 
#include 
#include 

/* #define ORIG_MIX */
#define ADD_ROTATE
#define DEBUG_POOL
#ifdef DEBUG_POOL
#define NUM_LOOPS 100
#else
#define NUM_LOOPS 100
#endif

typedef unsigned int __u32;

struct fast_pool {
__u32   pool[4];
unsigned long   last;
unsigned short  count;
unsigned char   rotate;
unsigned char   last_timer_intr;
};

static __u32 const twist_table[8] = {
0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
return (word << shift) | (word >> (32 - shift));
}

#ifdef ORIG_MIX
/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector.  It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
{
const char  *bytes = in;
__u32   w;
unsignedi = f->count;
unsignedinput_rotate = f->rotate;

while (nbytes--) {
w = rol32(*bytes++, input_rotate & 31) ^ f->pool[i & 3] ^
f->pool[(i + 1) & 3];
f->pool[i & 3] = (w >> 3) ^ twist_table[w & 7];
input_rotate += (i++ & 3) ? 7 : 14;
}
f->count = i;
f->rotate = input_rotate;
}
#else
static void fast_mix(struct fast_pool *f, __u32 input[4])
{
int i = f->count;
__u32 acc, carry = f->pool[3] >> 25;
unsignedinput_rotate = f->rotate;

for (i = 0; i < 4; i++) {
#ifdef ADD_ROTATE
acc = (f->pool[i] << 7) ^ rol32(input[i], input_rotate & 31) ^
carry;
#else
acc = (f->pool[i] << 7) ^ input[i] ^ carry;
#endif

carry = f->pool[i] >> 25;
f->pool[i] = acc;
input_rotate += 7;
}
f->count++;
f->rotate = input_rotate;
}
#endif

void print_pool(struct fast_pool *pool)
{
int i;

printf("pool:\n");
for (i = 0; i < 4; i++) {
printf("\t0x%08x\n", pool->pool[i]);
}
}


int main(const char *argv, int argc)
{
struct fast_pool pool;
int i;
unsigned int input[4];

memset(, 0, sizeof(struct fast_pool));
memset(, 0, sizeof(input));
pool.pool[0] = 1;

for (i = 0; i < NUM_LOOPS; i++) {
#ifdef ORIG_MIX
fast_mix(, , sizeof(input));
#else
fast_mix(, input);
#endif
#ifdef DEBUG_POOL
print_pool();
#endif
}
#ifndef DEBUG_POOL
print_pool();
#endif
return 0;
}   
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-21 Thread Theodore Ts'o
On Mon, Sep 16, 2013 at 11:40:27AM -0400, Jörn Engel wrote:
> 
> Here is a patch to make add_interrupt_randomness() significantly
> cheaper without significantly impacting the quality.  The second part
> is my personal opinion and others might disagree.
> 
> So far this has only seen userspace performance testing, so don't
> merge it in a hurry.

Performance testing, but your new fast_mix pool has some serious
problems in terms of how well it works.

First of all, I think this is a typo:

> + for (i = 0; i < 3; i++) {

This means that pool[3] is always 0, and the input[3] is never mixed
in.  Hence, the pool is now only 96 bits instead of 128 bits, and the
last 4 bytes of the input pool are not getting mixed in.

Secondly, if you change this so that we actually use the whole pool,
and you mix in a constant set of inputs like this:

for (i = 0; i < 100; i++) {
input[0] = i;
input[1] = i;
input[2] = i;
input[3] = i;
fast_mix(, input);
print_pool();
}

you see something like this:

pool:
0x
0x
0x
0x
pool:
0x0001
0x0001
0x0001
0x0001
pool:
0x0082
0x0082
0x0082
0x0082
pool:
0x4103
0x4103
0x4103
0x4103
pool:
0x00208184
0x00208184
0x00208184
0x00208184
pool:
0x1040c205
0x1040c205
0x1040c205
0x1040c205

Granted, it's unlikely that we will be mixing numbers like this, but
it's a great demonstration of why I added the input_rotate aspect to
my mixing functions.

See my testing program below.  I need to think a bit more about
whether I'm comfortable with the new fast_mix that you've proposed
with the input_rotate restored, but at the minimum I think the rotate
is needed.

Regards,

- Ted


#include 
#include 
#include 
#include 

/* #define ORIG_MIX */
#define ADD_ROTATE
/* #define DEBUG_POOL */
#ifdef DEBUG_POOL
#define NUM_LOOPS 32
#else
#define NUM_LOOPS 100
#endif

typedef unsigned int __u32;

struct fast_pool {
__u32   pool[4];
unsigned long   last;
unsigned short  count;
unsigned char   rotate;
unsigned char   last_timer_intr;
};

static __u32 const twist_table[8] = {
0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
return (word << shift) | (word >> (32 - shift));
}

#ifdef ORIG_MIX
/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector.  It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
{
const char  *bytes = in;
__u32   w;
unsignedi = f->count;
unsignedinput_rotate = f->rotate;

while (nbytes--) {
w = rol32(*bytes++, input_rotate & 31) ^ f->pool[i & 3] ^
f->pool[(i + 1) & 3];
f->pool[i & 3] = (w >> 3) ^ twist_table[w & 7];
input_rotate += (i++ & 3) ? 7 : 14;
}
f->count = i;
f->rotate = input_rotate;
}
#else
static void fast_mix(struct fast_pool *f, __u32 input[4])
{
int i = f->count;
__u32 acc, carry = f->pool[3] >> 25;
unsignedinput_rotate = f->rotate;

for (i = 0; i < 4; i++) {
#ifdef ADD_ROTATE
acc = (f->pool[i] << 7) ^ rol32(input[i], input_rotate & 31) ^
carry;
#else
acc = (f->pool[i] << 7) ^ input[i] ^ carry;
#endif

carry = f->pool[i] >> 25;
f->pool[i] = acc;
input_rotate += 7;
}
f->count++;
f->rotate = input_rotate;
}
#endif

void print_pool(struct fast_pool *pool)
{
int i;

printf("pool:\n");
for (i = 0; i < 4; i++) {
printf("\t0x%08x\n", pool->pool[i]);
}
}


int main(const char *argv, int argc)
{
struct fast_pool pool;
int i;
unsigned int input[4];

memset(, 0, sizeof(struct fast_pool));
srandom(42);

for (i = 0; i < NUM_LOOPS; i++) {
input[0] = i;
input[1] = i;
input[2] = i;
input[3] = i;
#ifdef ORIG_MIX
fast_mix(, , sizeof(input));
#else
fast_mix(, input);
#endif
#ifdef DEBUG_POOL
print_pool();
#endif
}
#ifndef DEBUG_POOL
print_pool();
#endif
return 0;
}   




--
To 

Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-21 Thread Theodore Ts'o
On Mon, Sep 16, 2013 at 11:40:27AM -0400, Jörn Engel wrote:
 
 Here is a patch to make add_interrupt_randomness() significantly
 cheaper without significantly impacting the quality.  The second part
 is my personal opinion and others might disagree.
 
 So far this has only seen userspace performance testing, so don't
 merge it in a hurry.

Performance testing, but your new fast_mix pool has some serious
problems in terms of how well it works.

First of all, I think this is a typo:

 + for (i = 0; i  3; i++) {

This means that pool[3] is always 0, and the input[3] is never mixed
in.  Hence, the pool is now only 96 bits instead of 128 bits, and the
last 4 bytes of the input pool are not getting mixed in.

Secondly, if you change this so that we actually use the whole pool,
and you mix in a constant set of inputs like this:

for (i = 0; i  100; i++) {
input[0] = i;
input[1] = i;
input[2] = i;
input[3] = i;
fast_mix(pool, input);
print_pool(pool);
}

you see something like this:

pool:
0x
0x
0x
0x
pool:
0x0001
0x0001
0x0001
0x0001
pool:
0x0082
0x0082
0x0082
0x0082
pool:
0x4103
0x4103
0x4103
0x4103
pool:
0x00208184
0x00208184
0x00208184
0x00208184
pool:
0x1040c205
0x1040c205
0x1040c205
0x1040c205

Granted, it's unlikely that we will be mixing numbers like this, but
it's a great demonstration of why I added the input_rotate aspect to
my mixing functions.

See my testing program below.  I need to think a bit more about
whether I'm comfortable with the new fast_mix that you've proposed
with the input_rotate restored, but at the minimum I think the rotate
is needed.

Regards,

- Ted


#include stdio.h
#include unistd.h
#include stdlib.h
#include string.h

/* #define ORIG_MIX */
#define ADD_ROTATE
/* #define DEBUG_POOL */
#ifdef DEBUG_POOL
#define NUM_LOOPS 32
#else
#define NUM_LOOPS 100
#endif

typedef unsigned int __u32;

struct fast_pool {
__u32   pool[4];
unsigned long   last;
unsigned short  count;
unsigned char   rotate;
unsigned char   last_timer_intr;
};

static __u32 const twist_table[8] = {
0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
return (word  shift) | (word  (32 - shift));
}

#ifdef ORIG_MIX
/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector.  It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
{
const char  *bytes = in;
__u32   w;
unsignedi = f-count;
unsignedinput_rotate = f-rotate;

while (nbytes--) {
w = rol32(*bytes++, input_rotate  31) ^ f-pool[i  3] ^
f-pool[(i + 1)  3];
f-pool[i  3] = (w  3) ^ twist_table[w  7];
input_rotate += (i++  3) ? 7 : 14;
}
f-count = i;
f-rotate = input_rotate;
}
#else
static void fast_mix(struct fast_pool *f, __u32 input[4])
{
int i = f-count;
__u32 acc, carry = f-pool[3]  25;
unsignedinput_rotate = f-rotate;

for (i = 0; i  4; i++) {
#ifdef ADD_ROTATE
acc = (f-pool[i]  7) ^ rol32(input[i], input_rotate  31) ^
carry;
#else
acc = (f-pool[i]  7) ^ input[i] ^ carry;
#endif

carry = f-pool[i]  25;
f-pool[i] = acc;
input_rotate += 7;
}
f-count++;
f-rotate = input_rotate;
}
#endif

void print_pool(struct fast_pool *pool)
{
int i;

printf(pool:\n);
for (i = 0; i  4; i++) {
printf(\t0x%08x\n, pool-pool[i]);
}
}


int main(const char *argv, int argc)
{
struct fast_pool pool;
int i;
unsigned int input[4];

memset(pool, 0, sizeof(struct fast_pool));
srandom(42);

for (i = 0; i  NUM_LOOPS; i++) {
input[0] = i;
input[1] = i;
input[2] = i;
input[3] = i;
#ifdef ORIG_MIX
fast_mix(pool, input, sizeof(input));
#else
fast_mix(pool, input);
#endif
#ifdef DEBUG_POOL
print_pool(pool);
#endif
}
#ifndef DEBUG_POOL
print_pool(pool);
#endif
return 0;
}   




--
To 

Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-21 Thread Theodore Ts'o
BTW, just to give another example of the difference between the mixing
funtions, try compiling the following with and without ORIG_MIX defined...

   - Ted

#include stdio.h
#include unistd.h
#include stdlib.h
#include string.h

/* #define ORIG_MIX */
#define ADD_ROTATE
#define DEBUG_POOL
#ifdef DEBUG_POOL
#define NUM_LOOPS 100
#else
#define NUM_LOOPS 100
#endif

typedef unsigned int __u32;

struct fast_pool {
__u32   pool[4];
unsigned long   last;
unsigned short  count;
unsigned char   rotate;
unsigned char   last_timer_intr;
};

static __u32 const twist_table[8] = {
0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
return (word  shift) | (word  (32 - shift));
}

#ifdef ORIG_MIX
/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector.  It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
{
const char  *bytes = in;
__u32   w;
unsignedi = f-count;
unsignedinput_rotate = f-rotate;

while (nbytes--) {
w = rol32(*bytes++, input_rotate  31) ^ f-pool[i  3] ^
f-pool[(i + 1)  3];
f-pool[i  3] = (w  3) ^ twist_table[w  7];
input_rotate += (i++  3) ? 7 : 14;
}
f-count = i;
f-rotate = input_rotate;
}
#else
static void fast_mix(struct fast_pool *f, __u32 input[4])
{
int i = f-count;
__u32 acc, carry = f-pool[3]  25;
unsignedinput_rotate = f-rotate;

for (i = 0; i  4; i++) {
#ifdef ADD_ROTATE
acc = (f-pool[i]  7) ^ rol32(input[i], input_rotate  31) ^
carry;
#else
acc = (f-pool[i]  7) ^ input[i] ^ carry;
#endif

carry = f-pool[i]  25;
f-pool[i] = acc;
input_rotate += 7;
}
f-count++;
f-rotate = input_rotate;
}
#endif

void print_pool(struct fast_pool *pool)
{
int i;

printf(pool:\n);
for (i = 0; i  4; i++) {
printf(\t0x%08x\n, pool-pool[i]);
}
}


int main(const char *argv, int argc)
{
struct fast_pool pool;
int i;
unsigned int input[4];

memset(pool, 0, sizeof(struct fast_pool));
memset(input, 0, sizeof(input));
pool.pool[0] = 1;

for (i = 0; i  NUM_LOOPS; i++) {
#ifdef ORIG_MIX
fast_mix(pool, input, sizeof(input));
#else
fast_mix(pool, input);
#endif
#ifdef DEBUG_POOL
print_pool(pool);
#endif
}
#ifndef DEBUG_POOL
print_pool(pool);
#endif
return 0;
}   
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH,RFC] random: make fast_mix() honor its name

2013-09-21 Thread Theodore Ts'o
The following fast_mix function, with the loop unrolling, is about 70%
slower than your proposed version, but it's still four times faster
than the original byte-based fast_mix function.  This is what I'm
considering using as a compromise.

Any comments or objections?

- Ted

static void fast_mix(struct fast_pool *f, __u32 input[4])
{
__u32   w;
int i;
unsignedinput_rotate = f-rotate;

#if 0
for (i = 0; i  4; i++) {
w = rol32(input[i], input_rotate) ^ f-pool[i] ^
f-pool[(i + 3)  3];
f-pool[i] = (w  3) ^ twist_table[w  7];
input_rotate = (input_rotate + (i ? 7 : 14))  31;
}
#else   /* loop unrolled for speed */
w = rol32(input[0], input_rotate) ^ f-pool[0] ^ f-pool[3];
f-pool[0] = (w  3) ^ twist_table[w  7];
input_rotate = (input_rotate + 14)  31;
w = rol32(input[1], input_rotate) ^ f-pool[1] ^ f-pool[0];
f-pool[1] = (w  3) ^ twist_table[w  7];
input_rotate = (input_rotate + 7)  31;
w = rol32(input[2], input_rotate) ^ f-pool[2] ^ f-pool[1];
f-pool[2] = (w  3) ^ twist_table[w  7];
input_rotate = (input_rotate + 7)  31;
w = rol32(input[3], input_rotate) ^ f-pool[3] ^ f-pool[2];
f-pool[3] = (w  3) ^ twist_table[w  7];
input_rotate = (input_rotate + 7)  31;
#endif
f-count += 16;
f-rotate = input_rotate;
}
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH,RFC] random: make fast_mix() honor its name

2013-09-16 Thread Jörn Engel
On Thu, 12 September 2013 19:31:55 -0400, Theodore Ts'o wrote:
> On Thu, Sep 12, 2013 at 05:07:17PM -0400, Jörn Engel wrote:
> > 
> > I happen to have a real-world system with >100k interrupts per second
> > and - surprise - add_interrupt_randomness() showed up prominently in
> > the profiles.  I was also told twice to just remove that call.  I
> > resisted both times and have done far more work to reduce overhead
> > while still collecting entropy.  Some others would have caved in.
> 
> Would it be possible for you to send me the perf numbers that you saw?

Here is a patch to make add_interrupt_randomness() significantly
cheaper without significantly impacting the quality.  The second part
is my personal opinion and others might disagree.

So far this has only seen userspace performance testing, so don't
merge it in a hurry.

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike


Doing a quick benchmark on my notebook, fast_mix() turned out to be
barely faster than the full mixing operation.  Doing 10M in a loop took
2.1s for the old fast_mix(), 2.5s for _mix_pool_bytes() and .12s for
my new fast_mix().  Making fast_mix() 16x less expensive gives people
less incentive to disable entropy collection.

The new fast_mix has a lower quality.  It simply rotates the entire pool
left by 39 bits and xor's the new entropy into the pool.  I think this
is good enough - if every interrupt collects just one bit of truly
unpredictable data and that bit is always in the same position, we will
accumulate 64 bits of entropy over the 64 rounds.  Given two bits in the
likely position - the lowest two bits of any one word - all 128 bits of
the pool are unpredictable.  If we wanted to collect more randomness, we
would have to either enlarge the pool or empty it more than once every
64 interrupts.

So I think the mixing quality is good enough.  If someone is concerned,
we can add a multiplication, which will fairly quickly mix every input
bit into every pool bit, at a 20% performance cost.  But really I would
rather collect from more entropy sources than improve mixing quality.

Signed-off-by: Joern Engel 
---
 drivers/char/random.c |   25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 32a6c57..36ef6e1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -555,7 +555,6 @@ struct fast_pool {
__u32   pool[4];
unsigned long   last;
unsigned short  count;
-   unsigned char   rotate;
unsigned char   last_timer_intr;
 };
 
@@ -564,21 +563,17 @@ struct fast_pool {
  * collector.  It's hardcoded for an 128 bit pool and assumes that any
  * locks that might be needed are taken by the caller.
  */
-static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
+static void fast_mix(struct fast_pool *f, __u32 input[4])
 {
-   const char  *bytes = in;
-   __u32   w;
-   unsignedi = f->count;
-   unsignedinput_rotate = f->rotate;
+   int i;
+   __u32 acc, carry = f->pool[3] >> 25;
 
-   while (nbytes--) {
-   w = rol32(*bytes++, input_rotate & 31) ^ f->pool[i & 3] ^
-   f->pool[(i + 1) & 3];
-   f->pool[i & 3] = (w >> 3) ^ twist_table[w & 7];
-   input_rotate += (i++ & 3) ? 7 : 14;
+   for (i = 0; i < 3; i++) {
+   acc = (f->pool[i] << 7) ^ input[i] ^ carry;
+   carry = f->pool[i] >> 25;
+   f->pool[i] = acc;
}
-   f->count = i;
-   f->rotate = input_rotate;
+   f->count++;
 }
 
 /*
@@ -757,9 +752,9 @@ void add_interrupt_randomness(int irq, int irq_flags)
input[3] = ip >> 32;
}
 
-   fast_mix(fast_pool, input, sizeof(input));
+   fast_mix(fast_pool, input);
 
-   if ((fast_pool->count & 1023) &&
+   if ((fast_pool->count & 63) &&
!time_after(now, fast_pool->last + HZ))
return;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH,RFC] random: make fast_mix() honor its name

2013-09-16 Thread Jörn Engel
On Thu, 12 September 2013 19:31:55 -0400, Theodore Ts'o wrote:
 On Thu, Sep 12, 2013 at 05:07:17PM -0400, Jörn Engel wrote:
  
  I happen to have a real-world system with 100k interrupts per second
  and - surprise - add_interrupt_randomness() showed up prominently in
  the profiles.  I was also told twice to just remove that call.  I
  resisted both times and have done far more work to reduce overhead
  while still collecting entropy.  Some others would have caved in.
 
 Would it be possible for you to send me the perf numbers that you saw?

Here is a patch to make add_interrupt_randomness() significantly
cheaper without significantly impacting the quality.  The second part
is my personal opinion and others might disagree.

So far this has only seen userspace performance testing, so don't
merge it in a hurry.

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike


Doing a quick benchmark on my notebook, fast_mix() turned out to be
barely faster than the full mixing operation.  Doing 10M in a loop took
2.1s for the old fast_mix(), 2.5s for _mix_pool_bytes() and .12s for
my new fast_mix().  Making fast_mix() 16x less expensive gives people
less incentive to disable entropy collection.

The new fast_mix has a lower quality.  It simply rotates the entire pool
left by 39 bits and xor's the new entropy into the pool.  I think this
is good enough - if every interrupt collects just one bit of truly
unpredictable data and that bit is always in the same position, we will
accumulate 64 bits of entropy over the 64 rounds.  Given two bits in the
likely position - the lowest two bits of any one word - all 128 bits of
the pool are unpredictable.  If we wanted to collect more randomness, we
would have to either enlarge the pool or empty it more than once every
64 interrupts.

So I think the mixing quality is good enough.  If someone is concerned,
we can add a multiplication, which will fairly quickly mix every input
bit into every pool bit, at a 20% performance cost.  But really I would
rather collect from more entropy sources than improve mixing quality.

Signed-off-by: Joern Engel jo...@logfs.org
---
 drivers/char/random.c |   25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 32a6c57..36ef6e1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -555,7 +555,6 @@ struct fast_pool {
__u32   pool[4];
unsigned long   last;
unsigned short  count;
-   unsigned char   rotate;
unsigned char   last_timer_intr;
 };
 
@@ -564,21 +563,17 @@ struct fast_pool {
  * collector.  It's hardcoded for an 128 bit pool and assumes that any
  * locks that might be needed are taken by the caller.
  */
-static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
+static void fast_mix(struct fast_pool *f, __u32 input[4])
 {
-   const char  *bytes = in;
-   __u32   w;
-   unsignedi = f-count;
-   unsignedinput_rotate = f-rotate;
+   int i;
+   __u32 acc, carry = f-pool[3]  25;
 
-   while (nbytes--) {
-   w = rol32(*bytes++, input_rotate  31) ^ f-pool[i  3] ^
-   f-pool[(i + 1)  3];
-   f-pool[i  3] = (w  3) ^ twist_table[w  7];
-   input_rotate += (i++  3) ? 7 : 14;
+   for (i = 0; i  3; i++) {
+   acc = (f-pool[i]  7) ^ input[i] ^ carry;
+   carry = f-pool[i]  25;
+   f-pool[i] = acc;
}
-   f-count = i;
-   f-rotate = input_rotate;
+   f-count++;
 }
 
 /*
@@ -757,9 +752,9 @@ void add_interrupt_randomness(int irq, int irq_flags)
input[3] = ip  32;
}
 
-   fast_mix(fast_pool, input, sizeof(input));
+   fast_mix(fast_pool, input);
 
-   if ((fast_pool-count  1023) 
+   if ((fast_pool-count  63) 
!time_after(now, fast_pool-last + HZ))
return;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/