Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sun, 22 September 2013 22:43:38 -0400, Theodore Ts'o wrote: > On Sun, Sep 22, 2013 at 08:16:23PM -0400, Jörn Engel wrote: > > How about we switch between the two mixing functions depending on the > > interrupt load? If this CPU has seen fewer than 1000 interrupts in > > the last second, use the better one, otherwise us the cheaper one? > > I guess the question here is whether it's worth it. On a 2.8 GHz > laptop Ivy Bridge chip the numbers are: Then let us assume for now it is not worth it. When I finally get around to generating profiles for my insane system we can revisit the issue. > I am very strongly of the opinion that the number of systems where you > have an embedded system with that kind of inane interrupt rate is the > 0.001% case. That would be one machine in 10^13? I doubt we have reached 10^13 machines running Linux in all of history, so a single example would defeat your very strong opinion. ;) Anyway, let me collect some real numbers before we argue any further. And thank you for your maintainership. It may not appear that way, but I have _very_ little to complain about in drivers/char/random.c. Jörn -- One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision. -- Bertrand Russell -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
Thanks for your patience and elaborated answer. Theodore Ts'o wrote, on 09/22/2013 23:27: > On Sun, Sep 22, 2013 at 11:01:42PM +0200, Jörg-Volker Peetz wrote: >> just out of interest I would like to ask why this mixing function has to be >> that >> complicated. For example, even if the input is always 0 and the pool is >> seeded >> with pool[0] = 1 (as in your test program) this algorithm generates some >> (predictable) pseudo-random numbers in the pool. Is this necessary? >> >> To just mix in some random input filling the whole pool (seeded again with >> pool[0] = 1) something as "simple" as >> >> f->pool[0] = rol32(input[0], f->pool[2] & 31) ^ f->pool[1]; >> f->pool[1] = rol32(input[1], f->pool[3] & 31) ^ f->pool[2]; >> f->pool[2] = rol32(input[2], f->pool[0] & 31) ^ f->pool[3]; >> f->pool[3] = rol32(input[3], f->pool[1] & 31) ^ f->pool[0]; >> >> would suffice, although I didn't do any statistical tests. > > The structure of the mixing functions in /dev/random has been well > studied, and validatetd in a number of different academic papers. So > I prefer to stick with the basic architecture, even as it is scaled > down for speed reasons and beause the pool is smaller. I'm not arguing so much for speed but for simplicity and comprehensibility of code. As far as I understand the task of fast_mix() is just to collect random bits in a small buffer separated from the random pool? Once in a while these collected bits are then mixed into the main random pool. For that purpose, justifiably, a strong mixing function is used. > One of the important things about the mixing function is that if the > attacker knows the input values (of which the simplest example for > analytic purposes is if the input values are all zero), we want there > to be ample mixing across the pool. > > If you look at your proposed mixing function, in the case where the > input values are all zero, it devolves into: > >f->pool[0] = f->pool[1]; >f->pool[1] = f->pool[2]; >f->pool[2] = f->pool[3]; >f->pool[3] = f->pool[0]; > > Yes, I know the input values will never be all zero, but in the case > where the attacker knows what the input values are[1], but does not know > the contents of the entropy pool, a very simplistic mixing function > becomes relatively easy to analyze in the case where attacker knows > the inputs and then outputs, and is trying to determine the internal > state of the random driver. > > Measuring the speed of the fast_mix function which I put together, > it's already been speeded up compard to the original fast_mix function > by a factor of six. On my x86 laptop, I can run 10 million > iterations in 0.14 seconds, so that translates to 14ns per fast_mix > call. (The original fast_mix function runs in 84 nanoseconds.) > > So there is a cost-benefit tradeoff that we need to balance here. If > you have a system with 100k interrupts per second, performance is > going to be poor no matter what, and it's not clear how common such > systems really are. Most modern hardware do have some kind of NAPI or > other interrupt mitigation in place --- heck, even back in 1980's > people had figured out how to improve the 8250 UART with the 16550A > UART, which introdued a FIFO to decrease the interrupt load caused by > serial ports, and things have only gotten better since then. > > Out of curiosity, on your profiles, how many nanonseconds total is the > total interrupt code path chewing up per interrupt? > > Regards, > > - Ted > > [1] And on systems where we don't have get_cycles() or > random_get_entropy(), it becomes much easier for the attacker to guess > what at least half of the input values will be! > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sun, Sep 22, 2013 at 08:16:23PM -0400, Jörn Engel wrote: > How about we switch between the two mixing functions depending on the > interrupt load? If this CPU has seen fewer than 1000 interrupts in > the last second, use the better one, otherwise us the cheaper one? I guess the question here is whether it's worth it. On a 2.8 GHz laptop Ivy Bridge chip the numbers are: Original fast_mix: 84 ns tytso's fast_mix: 14 ns joern's fast_mix: 8 ns In terms of absolute overhead if you are running at an insane 100k interrupts per second, it's still only 0.84%, 0.14%, and 0.08%, respectively. Granted, an embedded CPU will be (much) slower, but so will the overall overhead of the rest of the interrupt handling code path plus whatever the overhead of the device driver will be. The real bug is the 100k interrupts per second workload. How about this as a compromise? We can add an #ifdef in the random.c code which has the alternate fast_mix algorithm in the very rare case that some embedded software engineer under time-pressure and suffering under the need to use a pathetically broken hardware design, and who starts looking in the random.c code, will find the an alternate version. That way, we avoid the complexity of an dynamic switching system, plus the overhead of measuring the number of interrupts per second. I am very strongly of the opinion that the number of systems where you have an embedded system with that kind of inane interrupt rate is the 0.001% case. So IMHO it's not even worth having a dynamic switching system, especially when it's only going to improve things slightly. - Ted P.S. The real reason for the original fast_mix() function is because it has a separate pool for each CPU, so there's no spinlock contention and no cache line bouncing. And that's why the fast_mix pool is so small --- so that the entire struct fast_pool can fit in a single CPU cache line. So on normal, common systems --- even mobile handsets have multiple cores these days --- fast_mix() *is* actually much faster than the standard entropy pool, even before any optimizations. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sun, 22 September 2013 19:36:41 -0400, Theodore Ts'o wrote: > On Sun, Sep 22, 2013 at 04:53:34PM -0400, Jörn Engel wrote: > > > > And I want to keep that function. Essentially the point of fast_mix() > > is to ratelimit _mix_pool_bytes(). Naïve ratelimiting would simply > > discard the input once the ratelimit has been reached. My proposal is > > to still use the input bits, but use a really cheap mixing function. > > Sure, but how cheap is "cheap"? There's a balance here. I don't buy > the argument that we must weaken the fast_mix() at all costs because > we have to drive the cost of fast_mix() to zero. If we're going to do > that, to the limit fast_mix() should be a no-op, which is ridiculous. Agreed. We always have a tradeoff between quality and cost. And I repeat yet again that driving the cost down is important, because it allows us to collect entropy more often. The schedule is an entropy source I would like to tap. > So what I've suggested already makes fast_mix() much faster. It's not > fast as what you've proposed, but it's pretty clear that my fast_mix() > is better at mixing the fast_mix pool. Agreed. > > Your version of fast_mix() failed in the "really cheap" department. > > As a result, it showed up in profiles and at least one idiot (me) > > reverted to naïve ratelimiting. It could have been worse, I was > > explicitly asked twice to just remove the call to > > add_interrupt_randomness(). > > Sure, but that's not _my_ problem. If someone stupid adulterates > /dev/random, that's not my responsibility. Most people aren't going > to be dealing with systems with truly stupid interrupt rates, and most > people aren't going to be tampering with the random driver. This is where I don't agree. I very much care even about the plastic routers running some variant of Linux modified by some embedded engineers under insane time pressure. If you leave them an incentive to cripple the random generator, some of them will. If you find source code with a maliciously crippled random generator, the author has plausible deniability. So this should be _our_ problem. Maybe not yours specifically, but certainly that of kernel developers in general. > I'm certainly willing to make fast_mix() faster, to reduce the > temptation for idiots to tamper with /dev/random, but we need to > balance the time of fast_mix() with the quality of its mixing, and to > remember what the common case is (and it's not 100k interrupts per > second!) How about we switch between the two mixing functions depending on the interrupt load? If this CPU has seen fewer than 1000 interrupts in the last second, use the better one, otherwise us the cheaper one? I don't really like the idea too much. But it would cover both the common case and my particular degenerated one. > In practice, we are using randomness in so many places (every single > time we start a process for ASLR, in lots of security-related programs > that use SSL libraries, etc.), that we are actually practically > *never* hitting trickle_thresh. > > The trickle_thresh was added originally when add_timer_randomness() > was used for interrupts that used SA_SAMPLE_RANDOM. Given that we > don't use add_timer_randomness() for that, but for things that happen > much more rarely (i.e., such as keyboard/mouse input), I'm probably > going to end up removing the trickle thresh_check altogether. Makes sense to me. Jörn -- Doubt is not a pleasant condition, but certainty is an absurd one. -- Voltaire -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sun, Sep 22, 2013 at 04:53:34PM -0400, Jörn Engel wrote: > > And I want to keep that function. Essentially the point of fast_mix() > is to ratelimit _mix_pool_bytes(). Naïve ratelimiting would simply > discard the input once the ratelimit has been reached. My proposal is > to still use the input bits, but use a really cheap mixing function. Sure, but how cheap is "cheap"? There's a balance here. I don't buy the argument that we must weaken the fast_mix() at all costs because we have to drive the cost of fast_mix() to zero. If we're going to do that, to the limit fast_mix() should be a no-op, which is ridiculous. So what I've suggested already makes fast_mix() much faster. It's not fast as what you've proposed, but it's pretty clear that my fast_mix() is better at mixing the fast_mix pool. > Your version of fast_mix() failed in the "really cheap" department. > As a result, it showed up in profiles and at least one idiot (me) > reverted to naïve ratelimiting. It could have been worse, I was > explicitly asked twice to just remove the call to > add_interrupt_randomness(). Sure, but that's not _my_ problem. If someone stupid adulterates /dev/random, that's not my responsibility. Most people aren't going to be dealing with systems with truly stupid interrupt rates, and most people aren't going to be tampering with the random driver. I'm certainly willing to make fast_mix() faster, to reduce the temptation for idiots to tamper with /dev/random, but we need to balance the time of fast_mix() with the quality of its mixing, and to remember what the common case is (and it's not 100k interrupts per second!) > And you should do the same for add_timer_randomness(), where again you > have ratelimiting. Once trickle_thresh is reached your code simply > discards most randomness. In practice, we are using randomness in so many places (every single time we start a process for ASLR, in lots of security-related programs that use SSL libraries, etc.), that we are actually practically *never* hitting trickle_thresh. The trickle_thresh was added originally when add_timer_randomness() was used for interrupts that used SA_SAMPLE_RANDOM. Given that we don't use add_timer_randomness() for that, but for things that happen much more rarely (i.e., such as keyboard/mouse input), I'm probably going to end up removing the trickle thresh_check altogether. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sun, 22 September 2013 17:27:52 -0400, Theodore Ts'o wrote: > > The structure of the mixing functions in /dev/random has been well > studied, and validatetd in a number of different academic papers. So > I prefer to stick with the basic architecture, even as it is scaled > down for speed reasons and beause the pool is smaller. And I want to keep that function. Essentially the point of fast_mix() is to ratelimit _mix_pool_bytes(). Naïve ratelimiting would simply discard the input once the ratelimit has been reached. My proposal is to still use the input bits, but use a really cheap mixing function. Your version of fast_mix() failed in the "really cheap" department. As a result, it showed up in profiles and at least one idiot (me) reverted to naïve ratelimiting. It could have been worse, I was explicitly asked twice to just remove the call to add_interrupt_randomness(). So don't think of my patch as weakening the mixing, but as strengthening the ratelimited mixing. If we have few interrupts, _mix_pool_bytes() will be run once a second, if we have many it will be run once every 64 interrupts. And in the latter case, the input for _mix_pool_bytes() is much better than with naïve ratelimiting. And you should do the same for add_timer_randomness(), where again you have ratelimiting. Once trickle_thresh is reached your code simply discards most randomness. Only once in 4096 call do you use all the bits you get - most of which will be predictable. Why not use a cheap mixing function for the other 4095 calls and ensure we have many good bits on call 4096? Jörn -- You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. -- Rob Pike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sat, 21 September 2013 17:41:18 -0400, Theodore Ts'o wrote: > > BTW, just to give another example of the difference between the mixing > funtions, try compiling the following with and without ORIG_MIX defined... Garbage in, garbage out again. If there is absolutely no randomness in the input (all bits zero), my mixing function will simply shift the pool. And because the shifting has a period of 128, there are only 128 possible pool states. Your mixing function is doing slightly better, it effectively becomes an interrupt counter with a silly output format. But who cares? If there is absolutely no randomness in the input, you have lost. That case isn't worth contemplating. The question is whether any randomness present in the input will get accumulated. Without the shifting, a single unpredictable bit on, say, the lowest position of the timestamp will be xor'ed into the same pool bit every time. The rest of the pool would always be predictable and the resulting mixing function would clearly be bad. With the shifting and using the same example, after 128 rounds every bit of the pool is unpredictable. Job done, we can go home now. You cannot achieve anything better than 128 unpredictable bits, no matter how much you try. Jörn -- Those who come seeking peace without a treaty are plotting. -- Sun Tzu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sat, 21 September 2013 17:25:10 -0400, Theodore Ts'o wrote: > On Mon, Sep 16, 2013 at 11:40:27AM -0400, Jörn Engel wrote: > > > > Here is a patch to make add_interrupt_randomness() significantly > > cheaper without significantly impacting the quality. The second part > > is my personal opinion and others might disagree. > > > > So far this has only seen userspace performance testing, so don't > > merge it in a hurry. > > Performance testing, but your new fast_mix pool has some serious > problems in terms of how well it works. > > First of all, I think this is a typo: > > > + for (i = 0; i < 3; i++) { > > This means that pool[3] is always 0, and the input[3] is never mixed > in. Hence, the pool is now only 96 bits instead of 128 bits, and the > last 4 bytes of the input pool are not getting mixed in. Yes, indeed. > Secondly, if you change this so that we actually use the whole pool, > and you mix in a constant set of inputs like this: > > for (i = 0; i < 100; i++) { > input[0] = i; > input[1] = i; > input[2] = i; > input[3] = i; > fast_mix(&pool, input); > print_pool(&pool); > } > > you see something like this: > > pool: > 0x > 0x > 0x > 0x > pool: > 0x0001 > 0x0001 > 0x0001 > 0x0001 > pool: > 0x0082 > 0x0082 > 0x0082 > 0x0082 > pool: > 0x4103 > 0x4103 > 0x4103 > 0x4103 > pool: > 0x00208184 > 0x00208184 > 0x00208184 > 0x00208184 > pool: > 0x1040c205 > 0x1040c205 > 0x1040c205 > 0x1040c205 > > Granted, it's unlikely that we will be mixing numbers like this, but > it's a great demonstration of why I added the input_rotate aspect to > my mixing functions. Honestly, this is a case of garbage in, garbage out. If your "randomness" is the same number repeated four times, the repetitions don't add anything. A more complicated mixing function doesn't buy you much. In the extreme case, you have the AES-encrypted counter example. No entropy on the input side, something that looks random on the output side, and - because there is no secret key in the complicated mixing function - anyone willing to sit down and do the math can predict the output. The failure case I was personally concerned about was a commutative mixing function. Interrupt numbers, for examples, are perfectly predictable. The randomness comes from the order or interrupts. So mix(a, b) != mix(b, a) is a hard requirement, if you excuse my silly notation. One variant that still fails your test above, but ensures every input bit will eventually influence every output bit (certainly after 64 rounds) would be this: static void fast_mix(struct fast_pool *f, __u32 input[4]) { int i; __u32 acc, p, carry = f->pool[3] >> 25; for (i = 0; i < 4; i++) { p = carry * input[i]; acc = (f->pool[i] << 7) ^ input[i] ^ carry ^ p; carry = f->pool[i] >> 25; f->pool[i] = acc; } } Clearly this is a better mixing function, but I still think we don't need that much quality. If there is any randomness at all in the interrupts, we will quickly make every bit in the pool unpredictable. And if the interrupts are completely predictable in every bit and in their order, well, we have lost either way. Jörn -- I say what I believe in. And give reasons. -- Pervez Hoodbhoy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Sun, Sep 22, 2013 at 11:01:42PM +0200, Jörg-Volker Peetz wrote: > just out of interest I would like to ask why this mixing function has to be > that > complicated. For example, even if the input is always 0 and the pool is seeded > with pool[0] = 1 (as in your test program) this algorithm generates some > (predictable) pseudo-random numbers in the pool. Is this necessary? > > To just mix in some random input filling the whole pool (seeded again with > pool[0] = 1) something as "simple" as > > f->pool[0] = rol32(input[0], f->pool[2] & 31) ^ f->pool[1]; > f->pool[1] = rol32(input[1], f->pool[3] & 31) ^ f->pool[2]; > f->pool[2] = rol32(input[2], f->pool[0] & 31) ^ f->pool[3]; > f->pool[3] = rol32(input[3], f->pool[1] & 31) ^ f->pool[0]; > > would suffice, although I didn't do any statistical tests. The structure of the mixing functions in /dev/random has been well studied, and validatetd in a number of different academic papers. So I prefer to stick with the basic architecture, even as it is scaled down for speed reasons and beause the pool is smaller. One of the important things about the mixing function is that if the attacker knows the input values (of which the simplest example for analytic purposes is if the input values are all zero), we want there to be ample mixing across the pool. If you look at your proposed mixing function, in the case where the input values are all zero, it devolves into: f->pool[0] = f->pool[1]; f->pool[1] = f->pool[2]; f->pool[2] = f->pool[3]; f->pool[3] = f->pool[0]; Yes, I know the input values will never be all zero, but in the case where the attacker knows what the input values are[1], but does not know the contents of the entropy pool, a very simplistic mixing function becomes relatively easy to analyze in the case where attacker knows the inputs and then outputs, and is trying to determine the internal state of the random driver. Measuring the speed of the fast_mix function which I put together, it's already been speeded up compard to the original fast_mix function by a factor of six. On my x86 laptop, I can run 10 million iterations in 0.14 seconds, so that translates to 14ns per fast_mix call. (The original fast_mix function runs in 84 nanoseconds.) So there is a cost-benefit tradeoff that we need to balance here. If you have a system with 100k interrupts per second, performance is going to be poor no matter what, and it's not clear how common such systems really are. Most modern hardware do have some kind of NAPI or other interrupt mitigation in place --- heck, even back in 1980's people had figured out how to improve the 8250 UART with the 16550A UART, which introdued a FIFO to decrease the interrupt load caused by serial ports, and things have only gotten better since then. Out of curiosity, on your profiles, how many nanonseconds total is the total interrupt code path chewing up per interrupt? Regards, - Ted [1] And on systems where we don't have get_cycles() or random_get_entropy(), it becomes much easier for the attacker to guess what at least half of the input values will be! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
Hi Theodore, Theodore Ts'o wrote, on 09/22/2013 05:05: > The following fast_mix function, with the loop unrolling, is about 70% > slower than your proposed version, but it's still four times faster > than the original byte-based fast_mix function. This is what I'm > considering using as a compromise. > > Any comments or objections? > > - Ted > > static void fast_mix(struct fast_pool *f, __u32 input[4]) > { > __u32 w; > int i; > unsignedinput_rotate = f->rotate; > > #if 0 > for (i = 0; i < 4; i++) { > w = rol32(input[i], input_rotate) ^ f->pool[i] ^ > f->pool[(i + 3) & 3]; > f->pool[i] = (w >> 3) ^ twist_table[w & 7]; > input_rotate = (input_rotate + (i ? 7 : 14)) & 31; > } > #else /* loop unrolled for speed */ > w = rol32(input[0], input_rotate) ^ f->pool[0] ^ f->pool[3]; > f->pool[0] = (w >> 3) ^ twist_table[w & 7]; > input_rotate = (input_rotate + 14) & 31; > w = rol32(input[1], input_rotate) ^ f->pool[1] ^ f->pool[0]; > f->pool[1] = (w >> 3) ^ twist_table[w & 7]; > input_rotate = (input_rotate + 7) & 31; > w = rol32(input[2], input_rotate) ^ f->pool[2] ^ f->pool[1]; > f->pool[2] = (w >> 3) ^ twist_table[w & 7]; > input_rotate = (input_rotate + 7) & 31; > w = rol32(input[3], input_rotate) ^ f->pool[3] ^ f->pool[2]; > f->pool[3] = (w >> 3) ^ twist_table[w & 7]; > input_rotate = (input_rotate + 7) & 31; > #endif > f->count += 16; > f->rotate = input_rotate; > } > just out of interest I would like to ask why this mixing function has to be that complicated. For example, even if the input is always 0 and the pool is seeded with pool[0] = 1 (as in your test program) this algorithm generates some (predictable) pseudo-random numbers in the pool. Is this necessary? To just mix in some random input filling the whole pool (seeded again with pool[0] = 1) something as "simple" as f->pool[0] = rol32(input[0], f->pool[2] & 31) ^ f->pool[1]; f->pool[1] = rol32(input[1], f->pool[3] & 31) ^ f->pool[2]; f->pool[2] = rol32(input[2], f->pool[0] & 31) ^ f->pool[3]; f->pool[3] = rol32(input[3], f->pool[1] & 31) ^ f->pool[0]; would suffice, although I didn't do any statistical tests. Best regards, Jörg-Volker. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
The following fast_mix function, with the loop unrolling, is about 70% slower than your proposed version, but it's still four times faster than the original byte-based fast_mix function. This is what I'm considering using as a compromise. Any comments or objections? - Ted static void fast_mix(struct fast_pool *f, __u32 input[4]) { __u32 w; int i; unsignedinput_rotate = f->rotate; #if 0 for (i = 0; i < 4; i++) { w = rol32(input[i], input_rotate) ^ f->pool[i] ^ f->pool[(i + 3) & 3]; f->pool[i] = (w >> 3) ^ twist_table[w & 7]; input_rotate = (input_rotate + (i ? 7 : 14)) & 31; } #else /* loop unrolled for speed */ w = rol32(input[0], input_rotate) ^ f->pool[0] ^ f->pool[3]; f->pool[0] = (w >> 3) ^ twist_table[w & 7]; input_rotate = (input_rotate + 14) & 31; w = rol32(input[1], input_rotate) ^ f->pool[1] ^ f->pool[0]; f->pool[1] = (w >> 3) ^ twist_table[w & 7]; input_rotate = (input_rotate + 7) & 31; w = rol32(input[2], input_rotate) ^ f->pool[2] ^ f->pool[1]; f->pool[2] = (w >> 3) ^ twist_table[w & 7]; input_rotate = (input_rotate + 7) & 31; w = rol32(input[3], input_rotate) ^ f->pool[3] ^ f->pool[2]; f->pool[3] = (w >> 3) ^ twist_table[w & 7]; input_rotate = (input_rotate + 7) & 31; #endif f->count += 16; f->rotate = input_rotate; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
BTW, just to give another example of the difference between the mixing funtions, try compiling the following with and without ORIG_MIX defined... - Ted #include #include #include #include /* #define ORIG_MIX */ #define ADD_ROTATE #define DEBUG_POOL #ifdef DEBUG_POOL #define NUM_LOOPS 100 #else #define NUM_LOOPS 100 #endif typedef unsigned int __u32; struct fast_pool { __u32 pool[4]; unsigned long last; unsigned short count; unsigned char rotate; unsigned char last_timer_intr; }; static __u32 const twist_table[8] = { 0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158, 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 }; /** * rol32 - rotate a 32-bit value left * @word: value to rotate * @shift: bits to roll */ static inline __u32 rol32(__u32 word, unsigned int shift) { return (word << shift) | (word >> (32 - shift)); } #ifdef ORIG_MIX /* * This is a fast mixing routine used by the interrupt randomness * collector. It's hardcoded for an 128 bit pool and assumes that any * locks that might be needed are taken by the caller. */ static void fast_mix(struct fast_pool *f, const void *in, int nbytes) { const char *bytes = in; __u32 w; unsignedi = f->count; unsignedinput_rotate = f->rotate; while (nbytes--) { w = rol32(*bytes++, input_rotate & 31) ^ f->pool[i & 3] ^ f->pool[(i + 1) & 3]; f->pool[i & 3] = (w >> 3) ^ twist_table[w & 7]; input_rotate += (i++ & 3) ? 7 : 14; } f->count = i; f->rotate = input_rotate; } #else static void fast_mix(struct fast_pool *f, __u32 input[4]) { int i = f->count; __u32 acc, carry = f->pool[3] >> 25; unsignedinput_rotate = f->rotate; for (i = 0; i < 4; i++) { #ifdef ADD_ROTATE acc = (f->pool[i] << 7) ^ rol32(input[i], input_rotate & 31) ^ carry; #else acc = (f->pool[i] << 7) ^ input[i] ^ carry; #endif carry = f->pool[i] >> 25; f->pool[i] = acc; input_rotate += 7; } f->count++; f->rotate = input_rotate; } #endif void print_pool(struct fast_pool *pool) { int i; printf("pool:\n"); for (i = 0; i < 4; i++) { printf("\t0x%08x\n", pool->pool[i]); } } int main(const char *argv, int argc) { struct fast_pool pool; int i; unsigned int input[4]; memset(&pool, 0, sizeof(struct fast_pool)); memset(&input, 0, sizeof(input)); pool.pool[0] = 1; for (i = 0; i < NUM_LOOPS; i++) { #ifdef ORIG_MIX fast_mix(&pool, &input, sizeof(input)); #else fast_mix(&pool, input); #endif #ifdef DEBUG_POOL print_pool(&pool); #endif } #ifndef DEBUG_POOL print_pool(&pool); #endif return 0; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH,RFC] random: make fast_mix() honor its name
On Mon, Sep 16, 2013 at 11:40:27AM -0400, Jörn Engel wrote: > > Here is a patch to make add_interrupt_randomness() significantly > cheaper without significantly impacting the quality. The second part > is my personal opinion and others might disagree. > > So far this has only seen userspace performance testing, so don't > merge it in a hurry. Performance testing, but your new fast_mix pool has some serious problems in terms of how well it works. First of all, I think this is a typo: > + for (i = 0; i < 3; i++) { This means that pool[3] is always 0, and the input[3] is never mixed in. Hence, the pool is now only 96 bits instead of 128 bits, and the last 4 bytes of the input pool are not getting mixed in. Secondly, if you change this so that we actually use the whole pool, and you mix in a constant set of inputs like this: for (i = 0; i < 100; i++) { input[0] = i; input[1] = i; input[2] = i; input[3] = i; fast_mix(&pool, input); print_pool(&pool); } you see something like this: pool: 0x 0x 0x 0x pool: 0x0001 0x0001 0x0001 0x0001 pool: 0x0082 0x0082 0x0082 0x0082 pool: 0x4103 0x4103 0x4103 0x4103 pool: 0x00208184 0x00208184 0x00208184 0x00208184 pool: 0x1040c205 0x1040c205 0x1040c205 0x1040c205 Granted, it's unlikely that we will be mixing numbers like this, but it's a great demonstration of why I added the input_rotate aspect to my mixing functions. See my testing program below. I need to think a bit more about whether I'm comfortable with the new fast_mix that you've proposed with the input_rotate restored, but at the minimum I think the rotate is needed. Regards, - Ted #include #include #include #include /* #define ORIG_MIX */ #define ADD_ROTATE /* #define DEBUG_POOL */ #ifdef DEBUG_POOL #define NUM_LOOPS 32 #else #define NUM_LOOPS 100 #endif typedef unsigned int __u32; struct fast_pool { __u32 pool[4]; unsigned long last; unsigned short count; unsigned char rotate; unsigned char last_timer_intr; }; static __u32 const twist_table[8] = { 0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158, 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 }; /** * rol32 - rotate a 32-bit value left * @word: value to rotate * @shift: bits to roll */ static inline __u32 rol32(__u32 word, unsigned int shift) { return (word << shift) | (word >> (32 - shift)); } #ifdef ORIG_MIX /* * This is a fast mixing routine used by the interrupt randomness * collector. It's hardcoded for an 128 bit pool and assumes that any * locks that might be needed are taken by the caller. */ static void fast_mix(struct fast_pool *f, const void *in, int nbytes) { const char *bytes = in; __u32 w; unsignedi = f->count; unsignedinput_rotate = f->rotate; while (nbytes--) { w = rol32(*bytes++, input_rotate & 31) ^ f->pool[i & 3] ^ f->pool[(i + 1) & 3]; f->pool[i & 3] = (w >> 3) ^ twist_table[w & 7]; input_rotate += (i++ & 3) ? 7 : 14; } f->count = i; f->rotate = input_rotate; } #else static void fast_mix(struct fast_pool *f, __u32 input[4]) { int i = f->count; __u32 acc, carry = f->pool[3] >> 25; unsignedinput_rotate = f->rotate; for (i = 0; i < 4; i++) { #ifdef ADD_ROTATE acc = (f->pool[i] << 7) ^ rol32(input[i], input_rotate & 31) ^ carry; #else acc = (f->pool[i] << 7) ^ input[i] ^ carry; #endif carry = f->pool[i] >> 25; f->pool[i] = acc; input_rotate += 7; } f->count++; f->rotate = input_rotate; } #endif void print_pool(struct fast_pool *pool) { int i; printf("pool:\n"); for (i = 0; i < 4; i++) { printf("\t0x%08x\n", pool->pool[i]); } } int main(const char *argv, int argc) { struct fast_pool pool; int i; unsigned int input[4]; memset(&pool, 0, sizeof(struct fast_pool)); srandom(42); for (i = 0; i < NUM_LOOPS; i++) { input[0] = i; input[1] = i; input[2] = i; input[3] = i; #ifdef ORIG_MIX fast_mix(&pool, &input, sizeof(input)); #else fast_mix(&pool, input); #endif #ifdef DEBUG_POOL print_pool(&pool); #endif } #ifndef DEBUG_POOL print_pool(&pool); #endif