Re: early x86 unseeded randomness
On Wed, Aug 16, 2017 at 11:13:03AM +0200, Thomas Gleixner wrote: > On Tue, 15 Aug 2017, Theodore Ts'o wrote: > > If we really want to do this, I'd much rather *not* have code calling > > tsc_early_random(). We're better off having the code call > > get_random_bytes() and/or get_random_u32(), and having these systems > > use RDRAND if available, and if not, falling back to > > tsc_early_random() and then mixing it with whatever unpredictability > > we may have been able to gather so far if the CRNG hasn't been > > initialized yet. > > I agree. This is not about systems which have RDRAND. We want to support > systems which do not have it and there the TSC magic comes handy. > > > That way something like tsc_early_random() can help, but it can't make > > things worse than what we have today (excepting the performance delay > > caused by adding whatever random shite that we hope is enough to > > introduce unpredictability to the TSC --- for which I still remain > > very skeptical). > > I just rerun tests in the early boot code (interrupts disabled, no NMIs > ...) with the TSC/wbinvd voodoo on several generations of machines and > stored 4M random values in a big static array. Reading it out after boot > and running it through dieharder makes me pretty confident that we observe > real random noise coming from the internals of the microarch/pipelines/bus > interactions. > > > P.S. As I recall hpa@ has talked to some Intel architects internally > > about how much unpredictability we could really get, and how much of > > it is just because there's complex state that we can't see (which if > > we could see, might make it much more predictable), and as I recall > > Right, there is complex state which is not completely synchronous even if > all frequencies are derived from a single 24MHZ oscillator. The PWMs, the > memory access characteristics and quite some other sources of > asynchronousity allow us to utilize that and I'm pretty sure, that you > can't find two systems which expose exactly the same behaviour. > > > they didn't say anyhing definitively; but they were nervous. I'm > > Sure, they are always nervous when you ask them questions about the > internals of their chips especially when you expect authorative answers. Right, especially as this is randomness as a side-effect of the design, rather than something that was an actual design goal. You won't find CPU designers committing to semantics of accidental behaviours :) Another paper on this (which I think Paul pointed me to) is: https://www.kernel.org/doc/ols/2014/ols2014-mueller.pdf which seems to be what crypto/jitterentropy.c is based on. On arm64, we currently rely on the bootloader for entropy (either an explicit kaslr seed, or the EFI_RNG_PROTOCOL). Unfortunately, the former is often zero and the latter unimplemented, but this seems to be improving slowly. Will
Re: early x86 unseeded randomness
On Tue, 15 Aug 2017, Theodore Ts'o wrote: > If we really want to do this, I'd much rather *not* have code calling > tsc_early_random(). We're better off having the code call > get_random_bytes() and/or get_random_u32(), and having these systems > use RDRAND if available, and if not, falling back to > tsc_early_random() and then mixing it with whatever unpredictability > we may have been able to gather so far if the CRNG hasn't been > initialized yet. I agree. This is not about systems which have RDRAND. We want to support systems which do not have it and there the TSC magic comes handy. > That way something like tsc_early_random() can help, but it can't make > things worse than what we have today (excepting the performance delay > caused by adding whatever random shite that we hope is enough to > introduce unpredictability to the TSC --- for which I still remain > very skeptical). I just rerun tests in the early boot code (interrupts disabled, no NMIs ...) with the TSC/wbinvd voodoo on several generations of machines and stored 4M random values in a big static array. Reading it out after boot and running it through dieharder makes me pretty confident that we observe real random noise coming from the internals of the microarch/pipelines/bus interactions. > P.S. As I recall hpa@ has talked to some Intel architects internally > about how much unpredictability we could really get, and how much of > it is just because there's complex state that we can't see (which if > we could see, might make it much more predictable), and as I recall Right, there is complex state which is not completely synchronous even if all frequencies are derived from a single 24MHZ oscillator. The PWMs, the memory access characteristics and quite some other sources of asynchronousity allow us to utilize that and I'm pretty sure, that you can't find two systems which expose exactly the same behaviour. > they didn't say anyhing definitively; but they were nervous. I'm Sure, they are always nervous when you ask them questions about the internals of their chips especially when you expect authorative answers. > pretty sure that for Intel architects, the right answer from their > perspective is to use RDRAND, and not to play games with the TSC. > > The other thing to note here is that because Intel has RDRAND, I'm > actually not that worried about Intel; all of the > drivers/char/random.c will mix in inputs from RDRAND or RDSEED if > available. I'm actually much more worried about architectures that > don't have a hardware random number generator (e.g., some ARM > subarchitectures and MIPS). So while you might be able to come up > with something that could work on x86, the real question is it safely > generalizable to other, non-x86 architectures. And that's where it > gets much more scary. You have similar asynchronous behaviour there especially when you take RTCs into account. But also the stupid mechanism of utilizing a timer/counter vs. a constant number of execution cycles mixed with a cacheflush will expose random behaviour. You need to make some decisions which bit of the counter to use, but that merily depends on the frequency ratio of the CPU core vs. the counter. It'd be probably worthwhile to code up a generic version of the memcmp/readcounter/memset/cacheflush loop and expose it to a variety of platforms to figure out how that behaves. We won't get authorative answers from the silicon dudes, but my (not so limited) understanding of the hardware internals makes me pretty confident that there is enough noise caused by clock interference and domain interaction available on all modern CPUs. Thanks, tglx
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 07:37:05PM +0200, Thomas Gleixner wrote: > That exploits the fact that the CPU and caches run at a different non > synchronized clock than the memory controller and therefore the execution > time for both the wbinvd() and the memchr_inv() measured in TSC cycles is > non constant and random enough for the early boot randomization. Um, can we guarantee that is always true for all systems? Even, say, for Silermont, Goldmont and Goldmont Plus (Intel's SOC designs) where the memory may be located in the same chip package as the CPU/caches? And even if this is true today, can we be sure that it will be true for the forseeable future? Using multiple clocks takes more power, so I would think that on a SOC there would be a strong pressure to use a single oscillator for the whole package. If we really want to do this, I'd much rather *not* have code calling tsc_early_random(). We're better off having the code call get_random_bytes() and/or get_random_u32(), and having these systems use RDRAND if available, and if not, falling back to tsc_early_random() and then mixing it with whatever unpredictability we may have been able to gather so far if the CRNG hasn't been initialized yet. That way something like tsc_early_random() can help, but it can't make things worse than what we have today (excepting the performance delay caused by adding whatever random shite that we hope is enough to introduce unpredictability to the TSC --- for which I still remain very skeptical). - Ted P.S. As I recall hpa@ has talked to some Intel architects internally about how much unpredictability we could really get, and how much of it is just because there's complex state that we can't see (which if we could see, might make it much more predictable), and as I recall they didn't say anyhing definitively; but they were nervous. I'm pretty sure that for Intel architects, the right answer from their perspective is to use RDRAND, and not to play games with the TSC. The other thing to note here is that because Intel has RDRAND, I'm actually not that worried about Intel; all of the drivers/char/random.c will mix in inputs from RDRAND or RDSEED if available. I'm actually much more worried about architectures that don't have a hardware random number generator (e.g., some ARM subarchitectures and MIPS). So while you might be able to come up with something that could work on x86, the real question is it safely generalizable to other, non-x86 architectures. And that's where it gets much more scary.
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 04:42:47PM +0200, Thomas Gleixner wrote: > Care to read the paper? > > We tried that 6 years ago on a wide range of machines from server to stupid > first generation in order ATOM chips. All of them exposed more or less the > same behaviour and passed RND validation tests. Yeah, I read the paper, It points out that in a tight loop, they *did* see real patterns in the TSC values that they read out. But then they did better by adding usleep(delay). This is critical because this was being done in userspace, when presumably there were other processes running (kernel threads, if nothing else). Whether this would still realistic in early boot, when interrupts may not have been fully enabled, and many devices not even probed yet, and when there are no other userspace processes to schedule against --- and where you can't use usleep because the scheduler may not have been initialized yet --- and where udelay is *way* more boring that usleep --- I think is something where you can't rely on that paper. So I think a lot of care is required here. - Ted
Re: early x86 unseeded randomness
On Tue, 15 Aug 2017, Thomas Gleixner wrote: > On Tue, 15 Aug 2017, Theodore Ts'o wrote: > > On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote: > > > > > +u64 __init tsc_early_random(void) > > > > > +{ > > > > > + u64 uninitialized_var(res); > > > > > + int i; > > > > > + > > > > > + if (!boot_cpu_has(X86_FEATURE_TSC)) > > > > > + return res; > > > > > + > > > > > + res ^= rdtsc(); > > > > > + for (i = 0; i < BITS_PER_LONG; i++) { > > > > > + res ^= ((rdtsc() & 0x04) >> 2) << i; > > > > > + udelay(2); > > > > > + } > > > > > + return res; > > > > > +} > > > > Reasons why this is probably not the best idea: > > > > 1) Exactly how udelay is implemented varies from architecture to > > architecture and in some cases is different on a subarchitectural > > level. Some of them rely on reading the TSC; others rely on > > operations that will have a constant number of CPU cycles (e.g., they > > aren't doing much if any operations that might even have a tiny > > glimmer of hope of adding unpredictability). > > That's not really true. You can add random shite instead of udelay(2). The > point of this exercise is to somewhat utilize the instruction pipeline, > which causes the TSC readouts to be not even spread over a the loop and > therefor yield random results. Talking about random shite: memset(foo, 0, sizeof(foo)); res ^= rdtsc(); for (i = 0; i < BITS_PER_LONG; i++) { /* Will never happen ... */ if (memchr_inv(foo, i, sizeof(foo))) continue; res ^= ((rdtsc() & 0x04) >> 2) << i; memset(foo, i, sizeof(foo)); wbinvd(); } return res; That exploits the fact that the CPU and caches run at a different non synchronized clock than the memory controller and therefore the execution time for both the wbinvd() and the memchr_inv() measured in TSC cycles is non constant and random enough for the early boot randomization. Thanks, tglx
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 04:42:47PM +0200, Thomas Gleixner wrote: > I'm not saying it's a replacement for the run time random generator, but > it's exceptionally good and reasonably fast for the early boot > randomization. ... which suffices our purposes. It's not like we're going to generate long-term key pairs with it. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote: > Go ahead. What you actually want to do is to replace boot_cpu_has() with a > real cpuid() check because boot_cpu_has() is not initialized on real early > boot. Right. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: early x86 unseeded randomness
On Tue, 15 Aug 2017, Theodore Ts'o wrote: > On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote: > > > > +u64 __init tsc_early_random(void) > > > > +{ > > > > + u64 uninitialized_var(res); > > > > + int i; > > > > + > > > > + if (!boot_cpu_has(X86_FEATURE_TSC)) > > > > + return res; > > > > + > > > > + res ^= rdtsc(); > > > > + for (i = 0; i < BITS_PER_LONG; i++) { > > > > + res ^= ((rdtsc() & 0x04) >> 2) << i; > > > > + udelay(2); > > > > + } > > > > + return res; > > > > +} > > Reasons why this is probably not the best idea: > > 1) Exactly how udelay is implemented varies from architecture to > architecture and in some cases is different on a subarchitectural > level. Some of them rely on reading the TSC; others rely on > operations that will have a constant number of CPU cycles (e.g., they > aren't doing much if any operations that might even have a tiny > glimmer of hope of adding unpredictability). That's not really true. You can add random shite instead of udelay(2). The point of this exercise is to somewhat utilize the instruction pipeline, which causes the TSC readouts to be not even spread over a the loop and therefor yield random results. > 2) Given a dozen numbers and saying, "hmm, my human brain doesn't see > a problem, it *must* be good" is hardly the basis on which to make > this kind of security design. If you don't understand why this is a > bad idea, and can't come up with a counter example in under 5 seconds, > you probably aren't qualified to be designing a RNG which is supposed > to be cryptographically secure. Care to read the paper? We tried that 6 years ago on a wide range of machines from server to stupid first generation in order ATOM chips. All of them exposed more or less the same behaviour and passed RND validation tests. I'm not saying it's a replacement for the run time random generator, but it's exceptionally good and reasonably fast for the early boot randomization. Thanks, tglx
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote: > > > +u64 __init tsc_early_random(void) > > > +{ > > > + u64 uninitialized_var(res); > > > + int i; > > > + > > > + if (!boot_cpu_has(X86_FEATURE_TSC)) > > > + return res; > > > + > > > + res ^= rdtsc(); > > > + for (i = 0; i < BITS_PER_LONG; i++) { > > > + res ^= ((rdtsc() & 0x04) >> 2) << i; > > > + udelay(2); > > > + } > > > + return res; > > > +} Reasons why this is probably not the best idea: 1) Exactly how udelay is implemented varies from architecture to architecture and in some cases is different on a subarchitectural level. Some of them rely on reading the TSC; others rely on operations that will have a constant number of CPU cycles (e.g., they aren't doing much if any operations that might even have a tiny glimmer of hope of adding unpredictability). 2) Given a dozen numbers and saying, "hmm, my human brain doesn't see a problem, it *must* be good" is hardly the basis on which to make this kind of security design. If you don't understand why this is a bad idea, and can't come up with a counter example in under 5 seconds, you probably aren't qualified to be designing a RNG which is supposed to be cryptographically secure. 3) Depending on when you use this function in early boot, udelay() might not even have been calibrated yet. 4) As as standalone function, it doesn't take advantage of whatever randomness might have been available; and if it turns out that the tsc_early_random() is trivially predictable by someone spends more time analyizing it for particular target architectures, it could be disastrous 5) If you use this in addition to the existing get_random_u32(), it won't hurt from a cryptographic perspective, but it will end up burning 64-128 microseconds (or potentially more, depending on how udelay is implemented on the architecture/subarch and whether udelay has been calibrated yet). And it's not clear it's really better(tm). 6) It would be the ultimate in irony if Jason, who tried so hard to get this warning in because he hoped it would make Linux more secure, actually ends up making Linux *less* secure because something like this starts getting used - Ted
Re: early x86 unseeded randomness
On Tue, 15 Aug 2017, Borislav Petkov wrote: > On Tue, Aug 15, 2017 at 12:47:36PM +0200, Thomas Gleixner wrote: > > 8<--- > > > > --- a/arch/x86/kernel/tsc.c > > +++ b/arch/x86/kernel/tsc.c > > @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v > > return 0; > > } > > #endif > > + > > +u64 __init tsc_early_random(void) > > +{ > > + u64 uninitialized_var(res); > > + int i; > > + > > + if (!boot_cpu_has(X86_FEATURE_TSC)) > > + return res; > > + > > + res ^= rdtsc(); > > + for (i = 0; i < BITS_PER_LONG; i++) { > > + res ^= ((rdtsc() & 0x04) >> 2) << i; > > + udelay(2); > > + } > > + return res; > > +} > > Something like this is exactly what I was aiming at with my dumb patch. > We could use this for early boot randomness on x86. > > Should I turn it into proper patches or you want to? Go ahead. What you actually want to do is to replace boot_cpu_has() with a real cpuid() check because boot_cpu_has() is not initialized on real early boot. Thanks, tglx
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 12:47:36PM +0200, Thomas Gleixner wrote: > 8<--- > > --- a/arch/x86/kernel/tsc.c > +++ b/arch/x86/kernel/tsc.c > @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v > return 0; > } > #endif > + > +u64 __init tsc_early_random(void) > +{ > + u64 uninitialized_var(res); > + int i; > + > + if (!boot_cpu_has(X86_FEATURE_TSC)) > + return res; > + > + res ^= rdtsc(); > + for (i = 0; i < BITS_PER_LONG; i++) { > + res ^= ((rdtsc() & 0x04) >> 2) << i; > + udelay(2); > + } > + return res; > +} Something like this is exactly what I was aiming at with my dumb patch. We could use this for early boot randomness on x86. Should I turn it into proper patches or you want to? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 08:09:50AM -0400, Theodore Ts'o wrote: > On Tue, Aug 15, 2017 at 10:05:46AM +0200, Ingo Molnar wrote: > > > > * Willy Tarreau wrote: > > > > > > > > > > Nowadays we could use similar methods using RDTSC providing more > > > > > accurate > > > > > counting. This doesn't provide a lot of entropy of course, given that > > > > > a > > > > > 2 GHz machine will at most count 31 bits there. But I tend to think > > > > > that > > > > > what matters during early boot is to transform something highly > > > > > predictable > > > > > into something unlikely to be predicted (ie: an exploit having to > > > > > scan 2^31 > > > > > possible addresses will not be really usable). It's also possible to > > > > > do the > > > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation > > > > > with > > > > > the RTC by the way, and roughly provide 25 more bits > > All of this assumes that you have different clock crystals generating > all of these different clock sources. Otherwise there can be a lot > less entropy than you expected. That's what's cool with the RTC, it definitely runs on its own 32kHz crystal since it continues to beat when everything else is powered off. Willy
Re: early x86 unseeded randomness
Ingo Molnar writes: > * Willy Tarreau wrote: > >> Nowadays we could use similar methods using RDTSC providing more accurate >> counting. This doesn't provide a lot of entropy of course, given that a >> 2 GHz machine will at most count 31 bits there. But I tend to think that >> what matters during early boot is to transform something highly predictable >> into something unlikely to be predicted (ie: an exploit having to scan 2^31 >> possible addresses will not be really usable). It's also possible to do the >> same with the PIT0 counter ticking at 18.2 Hz without any correlation with >> the RTC by the way, and roughly provide 25 more bits. And if you expect >> that the BIOS has emitted a 800 Hz beep at boot, you could still have a >> divider of 1491 in PIT2 providing 10 more bits, though with a bit of >> correlation with PIT0 since they use the same 1.19 MHz source. These >> methods increase the boot time by up to one second though, but my point >> here is that when you have nothing it's always a bit better. > > One other thing besides trying to extract entropy via timing would be to > utilize > more of the machine's environment in seeding the random number generator. > > For example on x86 the E820 table is available very early on and its > addresses > could be mixed into the random pool. An external attacker often would not > know the > precise hardware configuration. > > Likewise the boot parameters string could be mixed into the initial random > pool as > well - and this way distributions could create per installation seed simply > by > appending a random number to the boot string. In fact that could be a per-boot seed, if you just re-ran update-grub in the shutdown scripts with a new value. cheers
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 10:05:46AM +0200, Ingo Molnar wrote: > > > * Willy Tarreau wrote: > > > > > > > > Nowadays we could use similar methods using RDTSC providing more > > > > accurate > > > > counting. This doesn't provide a lot of entropy of course, given that a > > > > 2 GHz machine will at most count 31 bits there. But I tend to think that > > > > what matters during early boot is to transform something highly > > > > predictable > > > > into something unlikely to be predicted (ie: an exploit having to scan > > > > 2^31 > > > > possible addresses will not be really usable). It's also possible to do > > > > the > > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation > > > > with > > > > the RTC by the way, and roughly provide 25 more bits All of this assumes that you have different clock crystals generating all of these different clock sources. Otherwise there can be a lot less entropy than you expected. > In practice it's much faster to process the e820 table and the boot > string though - and should already give a way to gain very good > inter-machine randomization. It would also cover odd things like > weird machines such as virtualization environments that often have > nothing in the first 1MB. There is nothing wrong with doing this --- and architecture maintainers are encouraged to call add_device_randomness() on any piece of data which they think might help. That being said, these values don't change much (boot string) or at all (e820 table) between boots, and if you have a large collection of machines that all have the same hardware (say, Dell desktops), and the same distro installed on them (say, Ubuntu), it doesn't give you as much inter-machine randomization as you might expect. And as far as virtualization environments are concerned, the attacker can determine quite easily from the IP address range if a VM is being run out of an Amazon, Google, or Microsoft data center. Finally, in some cases, we need the entropy *very* early --- for example for kASLR, right after we get control from the boot loader. Which is why I believe that we really do need to have the bootloader fetching random seed material from the root file system or some other trusted store to solve the inter-boot randomness problem. Again, if platform code and device drivers want to call add_device_randomness() early in the boot process --- they should absolutely be encouraged to do this. It is a good thing. *AND* it is not sufficient by itself. Cheers, - Ted
Re: early x86 unseeded randomness
On Tue, 15 Aug 2017, Ingo Molnar wrote: > * Willy Tarreau wrote: > > > Nowadays we could use similar methods using RDTSC providing more accurate > > counting. This doesn't provide a lot of entropy of course, given that a > > 2 GHz machine will at most count 31 bits there. But I tend to think that > > what matters during early boot is to transform something highly predictable > > into something unlikely to be predicted (ie: an exploit having to scan 2^31 > > possible addresses will not be really usable). It's also possible to do the > > same with the PIT0 counter ticking at 18.2 Hz without any correlation with > > the RTC by the way, and roughly provide 25 more bits. And if you expect > > that the BIOS has emitted a 800 Hz beep at boot, you could still have a > > divider of 1491 in PIT2 providing 10 more bits, though with a bit of > > correlation with PIT0 since they use the same 1.19 MHz source. These > > methods increase the boot time by up to one second though, but my point > > here is that when you have nothing it's always a bit better. > > One other thing besides trying to extract entropy via timing would be to > utilize > more of the machine's environment in seeding the random number generator. > > For example on x86 the E820 table is available very early on and its > addresses > could be mixed into the random pool. An external attacker often would not > know the > precise hardware configuration. > > Likewise the boot parameters string could be mixed into the initial random > pool as > well - and this way distributions could create per installation seed simply > by > appending a random number to the boot string. > > Both methods should be very fast and robust. Actually using RDTSC is not the worst approach. See: https://lwn.net/images/conf/rtlws11/random-hardware.pdf Below is a stupid implementation of that. Here are the resulting numbers from a dozen of boot cycles: 10a3e7af4890c0ae e7b08c8c18e6d5d9 951e12c77f79e000 ad88753ad11c9b80 db2d4dce466a3da4 b328c76d4e67368d 642edf2265e0c8a7 ef45a9f9326249d0 13e01119498797a6 0a537c8751e0349e eb67c02dc09326dd 037d4b332020538d 793fbfda06718c69 2231535c514769e5 This mechanism could also be used to seed the random generator. Thanks, tglx 8<--- --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v return 0; } #endif + +u64 __init tsc_early_random(void) +{ + u64 uninitialized_var(res); + int i; + + if (!boot_cpu_has(X86_FEATURE_TSC)) + return res; + + res ^= rdtsc(); + for (i = 0; i < BITS_PER_LONG; i++) { + res ^= ((rdtsc() & 0x04) >> 2) << i; + udelay(2); + } + return res; +}
Re: early x86 unseeded randomness
* Willy Tarreau wrote: > On Tue, Aug 15, 2017 at 09:42:54AM +0200, Ingo Molnar wrote: > > > > * Willy Tarreau wrote: > > > > > Nowadays we could use similar methods using RDTSC providing more accurate > > > counting. This doesn't provide a lot of entropy of course, given that a > > > 2 GHz machine will at most count 31 bits there. But I tend to think that > > > what matters during early boot is to transform something highly > > > predictable > > > into something unlikely to be predicted (ie: an exploit having to scan > > > 2^31 > > > possible addresses will not be really usable). It's also possible to do > > > the > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation with > > > the RTC by the way, and roughly provide 25 more bits. And if you expect > > > that the BIOS has emitted a 800 Hz beep at boot, you could still have a > > > divider of 1491 in PIT2 providing 10 more bits, though with a bit of > > > correlation with PIT0 since they use the same 1.19 MHz source. These > > > methods increase the boot time by up to one second though, but my point > > > here is that when you have nothing it's always a bit better. > > > > One other thing besides trying to extract entropy via timing would be to > > utilize > > more of the machine's environment in seeding the random number generator. > > > > For example on x86 the E820 table is available very early on and its > > addresses > > could be mixed into the random pool. An external attacker often would not > > know the > > precise hardware configuration. > > > > Likewise the boot parameters string could be mixed into the initial random > > pool as > > well - and this way distributions could create per installation seed simply > > by > > appending a random number to the boot string. > > > > Both methods should be very fast and robust. > > Definitely, just like a simple MD5SUM on the first MB of RAM including > the BIOS, and on the CMOS RAM contents, which also differ quite a bit > between systems. In practice it's much faster to process the e820 table and the boot string though - and should already give a way to gain very good inter-machine randomization. It would also cover odd things like weird machines such as virtualization environments that often have nothing in the first 1MB. I.e. 'e820' and 'boot string' are two universally available pieces of environmental data that could be used in a robust and fast fashion. Thanks, Ingo
Re: early x86 unseeded randomness
On Tue, Aug 15, 2017 at 09:42:54AM +0200, Ingo Molnar wrote: > > * Willy Tarreau wrote: > > > Nowadays we could use similar methods using RDTSC providing more accurate > > counting. This doesn't provide a lot of entropy of course, given that a > > 2 GHz machine will at most count 31 bits there. But I tend to think that > > what matters during early boot is to transform something highly predictable > > into something unlikely to be predicted (ie: an exploit having to scan 2^31 > > possible addresses will not be really usable). It's also possible to do the > > same with the PIT0 counter ticking at 18.2 Hz without any correlation with > > the RTC by the way, and roughly provide 25 more bits. And if you expect > > that the BIOS has emitted a 800 Hz beep at boot, you could still have a > > divider of 1491 in PIT2 providing 10 more bits, though with a bit of > > correlation with PIT0 since they use the same 1.19 MHz source. These > > methods increase the boot time by up to one second though, but my point > > here is that when you have nothing it's always a bit better. > > One other thing besides trying to extract entropy via timing would be to > utilize > more of the machine's environment in seeding the random number generator. > > For example on x86 the E820 table is available very early on and its > addresses > could be mixed into the random pool. An external attacker often would not > know the > precise hardware configuration. > > Likewise the boot parameters string could be mixed into the initial random > pool as > well - and this way distributions could create per installation seed simply > by > appending a random number to the boot string. > > Both methods should be very fast and robust. Definitely, just like a simple MD5SUM on the first MB of RAM including the BIOS, and on the CMOS RAM contents, which also differ quite a bit between systems. Willy
Re: early x86 unseeded randomness
* Willy Tarreau wrote: > Nowadays we could use similar methods using RDTSC providing more accurate > counting. This doesn't provide a lot of entropy of course, given that a > 2 GHz machine will at most count 31 bits there. But I tend to think that > what matters during early boot is to transform something highly predictable > into something unlikely to be predicted (ie: an exploit having to scan 2^31 > possible addresses will not be really usable). It's also possible to do the > same with the PIT0 counter ticking at 18.2 Hz without any correlation with > the RTC by the way, and roughly provide 25 more bits. And if you expect > that the BIOS has emitted a 800 Hz beep at boot, you could still have a > divider of 1491 in PIT2 providing 10 more bits, though with a bit of > correlation with PIT0 since they use the same 1.19 MHz source. These > methods increase the boot time by up to one second though, but my point > here is that when you have nothing it's always a bit better. One other thing besides trying to extract entropy via timing would be to utilize more of the machine's environment in seeding the random number generator. For example on x86 the E820 table is available very early on and its addresses could be mixed into the random pool. An external attacker often would not know the precise hardware configuration. Likewise the boot parameters string could be mixed into the initial random pool as well - and this way distributions could create per installation seed simply by appending a random number to the boot string. Both methods should be very fast and robust. Thanks, Ingo
Re: early x86 unseeded randomness
Hi Ted, On Mon, Aug 14, 2017 at 09:31:24PM -0400, Theodore Ts'o wrote: > The real fix is to do what OpenBSD does, which is to teach the > bootloader (e.g., grub) to read from some file such as > /var/lib/urandom/random-seed, and then to have the init scripts > overwrite it with a new set of entropy generated from getrandom(2) as > early as possible. > > It won't solve the CD-ROM install problem (although there is enough > entropy during the install process that after the install is done, we > should be fine, and again, *hopefully* the distro people won't be > stupid enough to generate high-value keys during the installation > process, or certainly not early in the installation process). But it > does solve most of the problem. In my opinion what matters is to combine multiple sources of entropy. I remember in the good old days when I was coding under DOS, I used to build my own random numbers using a phase detection method, counting the time it takes for the RTC to switch to the next second, similar to this : rand: xor edx, edx mov al, 0 out 70h, al in al, 71h mov ah, al count: inc edx in al, 71h cmp al, ah jz count mov eax, edx ret Nowadays we could use similar methods using RDTSC providing more accurate counting. This doesn't provide a lot of entropy of course, given that a 2 GHz machine will at most count 31 bits there. But I tend to think that what matters during early boot is to transform something highly predictable into something unlikely to be predicted (ie: an exploit having to scan 2^31 possible addresses will not be really usable). It's also possible to do the same with the PIT0 counter ticking at 18.2 Hz without any correlation with the RTC by the way, and roughly provide 25 more bits. And if you expect that the BIOS has emitted a 800 Hz beep at boot, you could still have a divider of 1491 in PIT2 providing 10 more bits, though with a bit of correlation with PIT0 since they use the same 1.19 MHz source. These methods increase the boot time by up to one second though, but my point here is that when you have nothing it's always a bit better. Willy
Re: early x86 unseeded randomness
On Mon, Aug 14, 2017 at 09:00:13PM +0200, Borislav Petkov wrote: > On Mon, Aug 14, 2017 at 11:17:37AM -0700, Linus Torvalds wrote: > > Ok, guys, you ALL need to learn that blindly just trying to get rid of > > warnings IS A HORRIBLE IDEA. > > Not blindly - I was actually suggesting/asking whether falling back to > the TSC that early during boot might make more sense than using unseeded > randomness. Especially add the least significant 32 bits to the most > significant i.e., that thing: > > tsc + (tsc << 32UL) > > as they're more unpredictable. Part of the initialization step of the entropy pools and the crng mixes in the jiffies, ktime_get_real() and the TSC. So if you are one of those people who believe in the magic of jitter-rng, we do something somewhat similar to that a part of the initialization. We just don't credit any entropy to that step. > > > But maybe those places that currently trigger the warning should just > > use "get_random_u32()" instead. That at least gets rid of the warning > > if there's a fast architected hardware random thing (ie modern x86). > > Right, that is better, at least for the RDRAND machines. Also note that the CRNG will use RDRAND if it is there; it just doesn't depend on it. The main thing about get_random_u32() is that it has a policy decision embedded into it which is that we believe that RDRAND, or whatever implements arch_get_random_{seed_}{long,int} is trusted. (e.g., that we trust that Intel, Qualcomm, etc., didn't put a back door into CPU-based random number generator, which we basically have to take on faith because there is no way to audit it). I tend to be a bit of pragmatist on these things, which is that I believe it is *unlikely* that the NSA managed to pay off or blackmail Intel into subverting RDRAND. I can't prove it, of course, but if we can get away with not trusting Intel, it's better if we can avoid investing all of our trust in Intel. On the flip side, for very early boot, unless we can do what OpenBSD does and actually embed into the boot loader reading a cryptographic seed from trusted store, it's pretty much hopeless to think we can gather entropy, and we might as well just use RDRAND, and hope for the best. Realistically speaking, the harm done from a bogus stack canary or even a subverted KASLR is very small compared with introducing a wekaness into the keygen of long-term public/private key. And yes, spraying warning around just annoys users and if it inspires them to do dumb(tm) things, it really doesn't help. So suppressing warnings during early boot is probably a good thing, since hopefully people in early boot won't be stupid enough to generate high-value keypairs. :-) > We'd still need a proper fix for the older ones. And I don't see an > easy way to change the init ordering for the stack canary as it gets > setup very very early in start_kernel() vs crng_initialize() being > an early_initcall()... Need to sleep on it. The real fix is to do what OpenBSD does, which is to teach the bootloader (e.g., grub) to read from some file such as /var/lib/urandom/random-seed, and then to have the init scripts overwrite it with a new set of entropy generated from getrandom(2) as early as possible. It won't solve the CD-ROM install problem (although there is enough entropy during the install process that after the install is done, we should be fine, and again, *hopefully* the distro people won't be stupid enough to generate high-value keys during the installation process, or certainly not early in the installation process). But it does solve most of the problem. - Ted
Re: early x86 unseeded randomness
On Mon, Aug 14, 2017 at 11:17:37AM -0700, Linus Torvalds wrote: > Ok, guys, you ALL need to learn that blindly just trying to get rid of > warnings IS A HORRIBLE IDEA. Not blindly - I was actually suggesting/asking whether falling back to the TSC that early during boot might make more sense than using unseeded randomness. Especially add the least significant 32 bits to the most significant i.e., that thing: tsc + (tsc << 32UL) as they're more unpredictable. > But maybe those places that currently trigger the warning should just > use "get_random_u32()" instead. That at least gets rid of the warning > if there's a fast architected hardware random thing (ie modern x86). Right, that is better, at least for the RDRAND machines. We'd still need a proper fix for the older ones. And I don't see an easy way to change the init ordering for the stack canary as it gets setup very very early in start_kernel() vs crng_initialize() being an early_initcall()... Need to sleep on it. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: early x86 unseeded randomness
On Mon, Aug 14, 2017 at 11:00 AM, Borislav Petkov wrote: > On Mon, Aug 14, 2017 at 10:47:47AM -0700, Linus Torvalds wrote: >> Plus on modern x86, you'll always get at least the hardware >> randomness, which is fundamentally much better anyway. > > Right, my only intention was to get rid of those: > > [0.00] random: get_random_bytes called from start_kernel+0x30/0x3d8 > with crng_init=0 Ok, guys, you ALL need to learn that blindly just trying to get rid of warnings IS A HORRIBLE IDEA. People also need to learn that *adding* warnings isn't always a good idea, exactly because then people will mindlessly react to them. I *detest* bad compiler warnings for this reason. The number of garbage patches that actually break working code that I've seen over the year is mind-numbing. > What do you propose? Keep 'em? Keeping the warning (or removing the warning itself without changing the code) is certainly preferable to trying to "fix" the warning by bogus measures, yes. > Or fix the above, snipped bit to conditionally do rdtsc() *once* or > get_random_bytes() depending on the crng state? Neither. Let's aim to make sure to fix the warning the *only* correct way - by making sure the initialization _ordering_ is correct, not by hacking around the caller code. Maybe the warning message should be clarified to say that too. Make it clear that the only acceptable fix is to change code ordering, not to play games with randomness. But maybe those places that currently trigger the warning should just use "get_random_u32()" instead. That at least gets rid of the warning if there's a fast architected hardware random thing (ie modern x86). Linus
Re: early x86 unseeded randomness
On Mon, Aug 14, 2017 at 10:47:47AM -0700, Linus Torvalds wrote: > Plus on modern x86, you'll always get at least the hardware > randomness, which is fundamentally much better anyway. Right, my only intention was to get rid of those: [0.00] random: get_random_bytes called from start_kernel+0x30/0x3d8 with crng_init=0 What do you propose? Keep 'em? Or fix the above, snipped bit to conditionally do rdtsc() *once* or get_random_bytes() depending on the crng state? > So this patch is utter and absolute garbage, and should be shot in the > head and buried very very deep. /me takes out a 44 magnum... > Please immediately delete it from the whole internet. Haha, lemme call a guy. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: early x86 unseeded randomness
On Mon, Aug 14, 2017 at 10:35 AM, Borislav Petkov wrote: > > how about we address that unseeded randomness usage during early boot by > falling back on the TSC on x86? I mean, we already do that for the stack > canary value anyway... That patch is completely broken: > + if (crng_ready()) > + get_random_bytes(&canary, sizeof(canary)); > + else > + canary = rdtsc(); > + > tsc = rdtsc(); > canary += tsc + (tsc << 32UL); So now you do rdtsc() twice, and then add them together. Adding the same value together adds absolutely zero information. Quite the reverse - it just makes the values cancel out and you're shifting away one bit. So the current code that just does an unconditional "get_random_bytes()" and then adds the TSC into it for noise when it's not ranom is actually *objectively* better than that broken crap you just tried. Plus on modern x86, you'll always get at least the hardware randomness, which is fundamentally much better anyway. So this patch is utter and absolute garbage, and should be shot in the head and buried very very deep. Please immediately delete it from the whole internet. Linus
early x86 unseeded randomness
Hi, how about we address that unseeded randomness usage during early boot by falling back on the TSC on x86? I mean, we already do that for the stack canary value anyway... --- diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h index 8abedf1d650e..e636ac6f8418 100644 --- a/arch/x86/include/asm/stackprotector.h +++ b/arch/x86/include/asm/stackprotector.h @@ -71,7 +71,11 @@ static __always_inline void boot_init_stack_canary(void) * there it already has some randomness on most systems. Later * on during the bootup the random pool has true entropy too. */ - get_random_bytes(&canary, sizeof(canary)); + if (crng_ready()) + get_random_bytes(&canary, sizeof(canary)); + else + canary = rdtsc(); + tsc = rdtsc(); canary += tsc + (tsc << 32UL); canary &= CANARY_MASK; diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 3b9e220621f8..859009daf345 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -526,8 +526,8 @@ static void bsp_init_amd(struct cpuinfo_x86 *c) va_align.mask = (upperbit - 1) & PAGE_MASK; va_align.flags= ALIGN_VA_32 | ALIGN_VA_64; - /* A random value per boot for bit slice [12:upper_bit) */ - va_align.bits = get_random_int() & va_align.mask; + /* A pseudo-random value per boot for bit slice [12:upper_bit) */ + va_align.bits = rdtsc() & va_align.mask; } if (cpu_has(c, X86_FEATURE_MWAITX)) diff --git a/drivers/char/random.c b/drivers/char/random.c index 8ad92707e45f..887cca606d7b 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -428,7 +428,6 @@ struct crng_state primary_crng = { * its value (from 0->1->2). */ static int crng_init = 0; -#define crng_ready() (likely(crng_init > 0)) static int crng_init_cnt = 0; #define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE) static void _extract_crng(struct crng_state *crng, @@ -497,6 +496,11 @@ static __u32 const twist_table[8] = { 0x, 0x3b6e20c8, 0x76dc4190, 0x4db26158, 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 }; +bool crng_ready(void) +{ + return likely(crng_init > 0); +} + /* * This function adds bytes into the entropy "pool". It does not * update the entropy estimate. The caller should call diff --git a/include/linux/random.h b/include/linux/random.h index eafea6a09361..18035ba94e43 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -197,4 +197,6 @@ static inline u32 next_pseudo_random32(u32 seed) return seed * 1664525 + 1013904223; } +extern bool crng_ready(void); + #endif /* _LINUX_RANDOM_H */ -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.