RE: x86/random: Speculation to the rescue
David,

On Tue, 1 Oct 2019, David Laight wrote:
> From: Linus Torvalds
> > Sent: 30 September 2019 17:16
> >
> > On Mon, Sep 30, 2019 at 6:16 AM Theodore Y. Ts'o wrote:
> > >
> > > Which is to say, I'm still worried that people with deep access to the
> > > implementation details of a CPU might be able to reverse engineer what
> > > a jitter entropy scheme produces. This is why I'd be curious to see
> > > the results when someone tries to attack a jitter scheme on a fully
> > > open, simple architecture such as RISC-V.
> >
> > Oh, I agree.
> >
> > One of the reasons I didn't like some of the other jitter entropy
> > things was that they seemed to rely _entirely_ on just purely
> > low-level CPU unpredictability. I think that exists, but I think it
> > makes for problems for really simple cores.
> >
> > Timing over a bigger thing and an actual interrupt (even if it's
> > "just" a timer interrupt, which is arguably much closer to the CPU and
> > has a much higher likelihood of having common frequency domains with
> > the cycle counter etc) means that I'm pretty damn convinced that a big
> > complex CPU will absolutely see issues, even if it has big caches.
>
> Agreed, you need something that is actually non-deterministic.
> While 'speculation' is difficult to predict, it is actually fully
> deterministic.

I surely agree with Linus that simple architectures could have more or less predictable, or at least searchable, behaviour. If we talk about complex x86 CPUs, I tend to disagree. Even the Intel architects cannot explain why the following scenario is not deterministic at all:

  - Single CPU
  - No NMIs
  - No MCEs
  - No DMAs in the background, nothing
  - CPU frequency is identical to TSC frequency

	volatile int foo;

	local_irq_disable();
	start = rdtscp();
	for (i = 0; i < 10; i++)
		foo++;
	end = rdtscp();
	local_irq_enable();

Repeat that loop as often as you wish and observe the end - start delta. You'll see

	min <= delta <= N * min

where N is something >= 2. The actual value of N depends on the microarchitecture, but it is not identical on two systems, and not even identical on the same system after boot and 1e6 iterations of the above. Aside from the fact that N is insanely big, there is absolutely no pattern in the delta value, even over a large number of runs.

> Until you get some perturbation from an outside source the cpu state
> (including caches and DRAM) is likely to be the same on every boot.

See above, and read Nicholas' paper. It's simply not that likely on anything halfway modern.

> For a desktop (etc) PC booting from a disk (even SSD) you'll get some
> variation.
> Boot an embedded system from onboard flash and every boot could
> well be the same (or one of a small number of results).
>
> Synchronising a signal between frequency domains might generate
> some 'randomness', but maybe not if both come from the same PLL.
>
> Even if there are variations, they may not be large enough to give
> a lot of variations in the state.
> The variations between systems could also be a lot larger than the
> variations within a system.

The variations between systems are going to be larger, as any minimal tolerance in the components has an influence. But even between two boots on a 'noiseless' embedded system, factors like temperature, PLL lock times, swing-in times of voltage regulators and other tiny details create non-deterministic behaviour.

In my former life as a hardware engineer I had to analyze such issues deeply, as they create serious trouble for some application scenarios, but those systems were based on very trivial and truly deterministic silicon parts. No commodity hardware vendor will ever go the extra mile to address these things, as the effort required to get them under control is exponential versus the effect.

Whether that's enough to create true entropy is a different question, but I do not agree with the upfront dismissal of a potentially useful entropy source.

I'm in no way saying that this should be used as the sole source of entropy, but I definitely want to explore the inherent non-determinism of modern OoO machines further. The results are way too interesting to ignore, and amazingly fast, at least with the algorithm I used in my initial post which started this whole discussion.

I let that thing run on a test box for the last two weeks while I was on vacation and gathered 32768 random bits via that debugfs hack every 100ms, i.e. a total of 1.2e7 samples amounting to ~4e11 bits. The analysis is still running, but so far it holds up.

Thanks,

	tglx
RE: x86/random: Speculation to the rescue
From: Pavel Machek
> Sent: 09 October 2019 09:03
>
> > NAND flash requires ECC so is likely to be async.
> > But I2C is clocked from the cpu end - so is fixed.
>
> RTC i2c may be clocked from the CPU end, but the time source needs to
> work when machine is off, so that has a separate crystal for
> timekeeping.

That only helps if the RTC chip lets you read its internal counters. You get one read of a few bits of 'randomness'.

> > Also an embedded system could be booting off a large serial EEPROM.
> > These have fixed timings and are clocked from the cpu end.
>
> Have you seen such system running Linux?

You can run Linux on the Nios CPU on an Altera/Intel FPGA. The kernel is likely to be loaded from the same serial EEPROM as the FPGA image. I've not personally run such a setup, but there are examples for the dev boards, so I assume some people do.

I'm not sure I'd want to run Linux on a 100MHz CPU with a slow memory interface. Better to find an FPGA with an ARM core in the corner! (We do use the Nios CPU - but for standalone code that fits in small internal memory blocks.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: x86/random: Speculation to the rescue
Hi!

> > I have many systems including SoC here, but technology needed for NAND
> > flash is different from technology for CPU, so these parts do _not_
> > share a silicon die. They do not even share same package. (Also RTC
> > tends to be on separate chip, connected using i2c).
>
> NAND flash requires ECC so is likely to be async.
> But I2C is clocked from the cpu end - so is fixed.

RTC i2c may be clocked from the CPU end, but the time source needs to work when the machine is off, so that has a separate crystal for timekeeping.

> Also an embedded system could be booting off a large serial EEPROM.
> These have fixed timings and are clocked from the cpu end.

Have you seen such a system running Linux?

Best regards,
	Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
RE: x86/random: Speculation to the rescue
From: Pavel Machek
> Sent: 07 October 2019 23:18
..
> I have many systems including SoC here, but technology needed for NAND
> flash is different from technology for CPU, so these parts do _not_
> share a silicon die. They do not even share same package. (Also RTC
> tends to be on separate chip, connected using i2c).

NAND flash requires ECC so is likely to be async. But I2C is clocked from the cpu end - so is fixed.

Also an embedded system could be booting off a large serial EEPROM. These have fixed timings and are clocked from the cpu end.

So until you get an ethernet interface up and can look at receive packet timings, there isn't likely to be any randomness at all. A high-resolution voltage (or temperature) monitor might generate noise in its LSB - but they don't often have a higher resolution than the stability of the signal.

I can't remember (from the start of this thread) why 'speculation' is expected to generate randomness. I'd have thought the loop was deterministic - but you don't know the initial state. More iterations may just be amplifying the initial small randomness, rather than generating extra randomness. So while it gets the system to boot, it hasn't actually done its job.

	David
Re: x86/random: Speculation to the rescue
On Mon 2019-10-07 07:47:34, Theodore Y. Ts'o wrote:
> On Sun, Oct 06, 2019 at 08:21:03PM +0200, Pavel Machek wrote:
> > Even without cycle counter... if we _know_ we are trying to generate
> > entropy and have MMC available, we don't care about power and
> > performance.
> >
> > So we can just...
> >
> >	issue read request on MMC
> >	while (!interrupt_done)
> >		i++
> >
> > ...and then use i++ as poor man's version of cycle counter.
>
> I suggest that you try that and see how much uncertainty you really
> have before you assume that this is actually going to work. Again, on
> "System on a Chip" systems, there is very likely a single master
> oscillator, and the eMMC is going to be on the same silicon die as
> the CPU. At least for spinning rust platters it's on a physically

I have many systems including SoC here, but the technology needed for NAND flash is different from the technology for a CPU, so these parts do _not_ share a silicon die. They do not even share the same package. (Also the RTC tends to be on a separate chip, connected using i2c.)

Would you have an example of a Linux-capable system where the eMMC is on the same chip as the CPU?

> P.S. Note that Don Davis's paper[1] claims that they were able
> to extract 100 independent unpredictable bits per _minute_. Given
> that modern init scripts want us to be able to boot in under a few

Well, for now I'm arguing that it is possible to gather entropy, not necessarily that it is going to be fast. Still, waiting a minute and a half during boot is better than generating non-random keys.

Linux already credits interrupts with some entropy, so all I really need to do is generate some interrupts. And "find /" generates lots of those on embedded systems. (Even with nfsroot, as long as the network card is not being polled...)

Best regards,
	Pavel
Re: x86/random: Speculation to the rescue
On Sun, Oct 06, 2019 at 08:21:03PM +0200, Pavel Machek wrote:
> Even without cycle counter... if we _know_ we are trying to generate
> entropy and have MMC available, we don't care about power and
> performance.
>
> So we can just...
>
>	issue read request on MMC
>	while (!interrupt_done)
>		i++
>
> ...and then use i++ as poor man's version of cycle counter.

I suggest that you try that and see how much uncertainty you really have before you assume that this is actually going to work. Again, on "System on a Chip" systems, there is very likely a single master oscillator, and the eMMC is going to be on the same silicon die as the CPU. At least for spinning rust platters it's on a physically separate peripheral, and there was a physical claim about how chaotic air turbulence might add enough uncertainty that timing HDD reads could be unpredictable (although note that the paper[1] which claimed this was written 25 years ago, and HDDs have had to get more precise, not less, in order to cram more bits into the same square millimeter).

[1] D. Davis, R. Ihaka, P.R. Fenstermacher, "Cryptographic Randomness from Air Turbulence in Disk Drives", in Advances in Cryptology -- CRYPTO '94 Conference Proceedings, edited by Yvo G. Desmedt, pp. 114--120. Lecture Notes in Computer Science #839. Heidelberg: Springer-Verlag, 1994.

But for a flash device that is on the same silicon as the CPU, and driven off of the same clock crystal, and where you are doing *reads*, so you can't even depend on SSD GC and uncertainties in programming the flash cell --- I'm going to be dubious.

If someone wants to use this as a last-ditch hope, then we can let systemd do this, where there's at least a bit more uncertainty in userspace, and maybe you could be doing something useful, like running fsck on the stateful partitions of the system.

Ultimately, though, we really want to try to get a hardware RNG, and for that matter, a cryptographic secure element, built into these devices, and the sooner we can pound it into CPU manufacturers' heads that not doing this should be professional malpractice, the better.

						- Ted

P.S. Note that Don Davis's paper[1] claims that they were able to extract 100 independent unpredictable bits per _minute_. Given that modern init scripts want us to be able to boot in under a few seconds, and we need at least 128 independent bits to securely initialize the CRNG, people who think that we can get secure random bits suitable for long-term cryptographic keys (say, such as initializing the host's SSH key) during early boot based only on HDD timings may be a bit optimistic.

P.P.S. I actually knew Don; he was at MIT Project Athena as a staff member at the same time I was working on Kerberos as my day job, and Linux's e2fsprogs, /dev/random, etc. as a hobby...
Re: x86/random: Speculation to the rescue
On Sun, Oct 6, 2019 at 11:21 AM Pavel Machek wrote:
>
> Even without cycle counter... if we _know_ we are trying to generate
> entropy and have MMC available, we don't care about power and
> performance.
>
> So we can just...
>
>	issue read request on MMC
>	while (!interrupt_done)
>		i++
>
> ...and then use i++ as poor man's version of cycle counter.
>
> [We would not want to do that in normal operation, for obvious
> reasons, just when userland is blocked and waiting for entropy.]
>
> Hmm?

I hate it, but it might be worth it as an alternative to the existing timer thing when we don't have a cycle counter. Then we'd have _something_ for those bad embedded devices.

I still absolutely hate the idea of doing disk IO for this.

	Linus
Re: x86/random: Speculation to the rescue
On Sun 2019-10-06 11:06:38, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 10:35 AM Pavel Machek wrote:
> >
> > It will not: boot is now halted because systemd wants some
> > entropy. Everything is idle and very few interrupts are
> > happening. We have spinning rust, but it is idle, and thus not
> > generating any interrupts.
>
> Yes, but we have that problem now solved.
>
> Except on embedded platforms that have garbage CPU's without even a
> cycle counter.
>
> But those won't have spinning rust anyway.
>
> Yes, bad SSD's and MMC disks (that they do have) will generate timing
> noise too, but in the absence of a cycle counter, that noise won't be
> much use.

Even without cycle counter... if we _know_ we are trying to generate entropy and have MMC available, we don't care about power and performance.

So we can just...

	issue read request on MMC
	while (!interrupt_done)
		i++

...and then use i++ as poor man's version of cycle counter.

[We would not want to do that in normal operation, for obvious reasons, just when userland is blocked and waiting for entropy.]

Hmm?

Best regards,
	Pavel
Re: x86/random: Speculation to the rescue
On Sun, Oct 6, 2019 at 10:35 AM Pavel Machek wrote:
>
> It will not: boot is now halted because systemd wants some
> entropy. Everything is idle and very few interrupts are
> happening. We have spinning rust, but it is idle, and thus not
> generating any interrupts.

Yes, but we have that problem now solved.

Except on embedded platforms that have garbage CPU's without even a cycle counter.

But those won't have spinning rust anyway.

Yes, bad SSD's and MMC disks (that they do have) will generate timing noise too, but in the absence of a cycle counter, that noise won't be much use.

	Linus
Re: x86/random: Speculation to the rescue
On Sun, Oct 6, 2019 at 4:41 AM Pavel Machek wrote:
>
> Should we have some kind of notifier chain, so that we could utilize
> better random sources (spinning rust) if we had them?

The spinning rust will get entropy on its own just thanks to the regular interrupt stuff. And the kernel trying to do IO is a bad idea.

Plus I think it's kind of pointless to do anything at all for things like spinning rust in this day and age. It's no longer relevant, and never really was in the area where this was a problem.

Also, I don't really like the notion of random (sic) notifiers that different drivers or things could attach to this thing. People will disagree about how much entropy it has anyway, and I'd rather have _one_ clear implementation that people can look at and comment on and try to actually write an academic paper on and suggest improvements to, than some generic "entropy notifier interface" that then gets whatever input somebody decides is appropriate.

We already have interfaces for "I think I have interesting data": add_interrupt_randomness(), add_device_randomness(), add_hwgenerator_randomness() are all for different sources of entropy.

	Linus
Re: x86/random: Speculation to the rescue
On Sun 2019-10-06 10:26:18, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 4:41 AM Pavel Machek wrote:
> >
> > Should we have some kind of notifier chain, so that we could utilize
> > better random sources (spinning rust) if we had them?
>
> The spinning rust will get entropy on its own just thanks to the
> regular interrupt stuff. And the kernel trying to do IO is a bad
> idea.

It will not: boot is now halted because systemd wants some entropy. Everything is idle and very few interrupts are happening. We have spinning rust, but it is idle, and thus not generating any interrupts.

> Plus I think it's kind of pointless to do anything at all for things
> like spinning rust in this day and age. It's no longer relevant, and
> never really was in the area where this was a problem.
>
> Also, I don't really like the notion of random (sic) notifiers that
> different drivers or things could attach to this thing. People will
> disagree about how much entropy it has anyway, and I'd rather have
> _one_ clear implementation that people can look at and comment on and
> try to actually write an academic paper on and suggest improvements
> to, than some generic "entropy notifier interface" that then gets
> whatever input somebody decides is appropriate.
>
> We already have interfaces for "I think I have interesting data":
> add_interrupt_randomness(), add_device_randomness(),
> add_hwgenerator_randomness() are all for different sources of
> entropy.

I'm not suggesting the notifier would invent some entropy... I agree that the kernel doing IO is strange, but I'm suggesting just that: if userspace is blocked waiting for entropy, do some I/O, and let interrupt randomness do its job.

It will work great on spinning rust. It will also work somehow on SSDs and SD cards etc., because they have separate CPUs these days. They'll certainly generate some interrupts, and we already assign some randomness to those...

It will let the machine boot, and the entropy calculation rules do not need to change.

Best regards,
	Pavel
Re: x86/random: Speculation to the rescue
Hi!

> Entropy really is hard. It's hard to generate, and it's hard to measure.

It is not hard to generate... not on a PC, not on most machines. "find /" can generate plenty of entropy... certainly on any PC.

But it does not work everywhere, and we may need other methods of generating entropy on some very embedded systems... So IMO we should have a generic driver for PC-like machines and specific drivers for the embedded stuff that needs it.

Best regards,
	Pavel
Re: x86/random: Speculation to the rescue
Hi!

On Sat 2019-09-28 16:53:52, Linus Torvalds wrote:
> On Sat, Sep 28, 2019 at 3:24 PM Thomas Gleixner wrote:
> >
> > Nicholas presented the idea to (ab)use speculative execution for random
> > number generation years ago at the Real-Time Linux Workshop:
>
> What you describe is just a particularly simple version of the jitter
> entropy. Not very reliable.
>
> But hey, here's a made-up patch. It basically does jitter entropy, but
> it uses a more complex load than the fibonacci LFSR folding: it calls
> "schedule()" in a loop, and it sets up a timer to fire.
>
> And then it mixes in the TSC in that loop.
>
> And to be fairly conservative, it then credits one bit of entropy for
> every timer tick. Not because the timer itself would be all that
> unpredictable, but because the interaction between the timer and the
> loop is going to be pretty damn unpredictable.
>
> Ok, I'm handwaving. But I do claim it really is fairly conservative to
> think that a cycle counter would give one bit of entropy when you time
> over a timer actually happening. The way that loop is written, we do
> guarantee that we'll mix in the TSC value both before and after the
> timer actually happened. We never look at the difference of TSC
> values, because the mixing makes that uninteresting, but the code does
> start out with verifying that "yes, the TSC really is changing rapidly
> enough to be meaningful".
>
> So if we want to do jitter entropy, I'd much rather do something like
> this that actually has a known fairly complex load with timers and
> scheduling.
>
> +/*
> + * If we have an actual cycle counter, see if we can
> + * generate enough entropy with timing noise
> + */
> +static void try_to_generate_entropy(void)
> +{
> +	struct {
> +		unsigned long now;
> +		struct timer_list timer;
> +	} stack;

Should we have some kind of notifier chain, so that we could utilize better random sources (spinning rust) if we had them?

Best regards,
	Pavel
Re: x86/random: Speculation to the rescue
On Tue, Oct 01, 2019 at 06:15:02PM +0200, Ahmed S. Darwish wrote:
>
> Using the "ent" tool, [2] also used to test randomness in the Stephan
> Müller LRNG paper, on a 50-byte file, produced the following
> results:

The "ent" tool is really, really useless. If you take any CRNG, even initialized with a known seed, "ent" will say that it's *GREAT*!

If you don't believe me, disable all entropy inputs into the CRNG, initialize it with "THE NSA IS OUR LORD AND MASTER", and then run it. You'll get substantially the same results. (And if we didn't, the ChaCha20 encryption algorithm would be totally broken.)

						- Ted
Re: x86/random: Speculation to the rescue
On Tue, Oct 1, 2019 at 9:15 AM Ahmed S. Darwish wrote:
>
> To test the quality of the new jitter code, I added a small patch on
> top to disable all other sources of randomness except the new jitter
> entropy code, [1] and made quick tests on the quality of getrandom(0).

You also need to make sure to disable rdrand. Even if we don't trust it, we always mix it in.

> Using the "ent" tool, [2] also used to test randomness in the Stephan
> Müller LRNG paper, on a 50-byte file, produced the following
> results:

Entropy is hard to estimate, for roughly the same reasons it's hard to generate.

The entropy estimation is entirely broken by the whitening we do: first we do the LFSR to mix things into the pools, then we whiten it when we mix it between the input pool and the final pool, and then we whiten it once more when we extract it when reading.

So the end result of urandom will look random to all the entropy tools regardless of what the starting point is. Because we use good hashes for whitening, and do all the updating of the pools while extracting, the end result had better look perfect.

The only way to even make an educated estimate of actual entropy would be to print out the raw state of the input pool when we do that "crng init done". And then you would have to automate some "reboot machine thousands of times" and start looking for patterns.

And even then you'd only have a few thousand starting points that we _claim_ have at least 128 bits of entropy in them, and you'd have a really hard time proving that is the case. You might prove that we are doing something very very wrong and don't have even remotely close to 128 bits of randomness, but just 5 bits of actual entropy or whatever - _that_ kind of pattern is easy to see. But even then /dev/urandom as a _stream_ should look fine. Only the (multiple, repeated) initial states in the input pool would show the lack of entropy.

And you'd really have to reboot things for real. And not in a VM either. Just repeating the entropy initialization wouldn't show the pattern (unless it's even more broken) because the base TSC values would be changing.

Entropy really is hard. It's hard to generate, and it's hard to measure.

	Linus
Re: x86/random: Speculation to the rescue
On Tue, Oct 01, 2019 at 09:37:39AM -0700, Kees Cook wrote:
> On Tue, Oct 01, 2019 at 06:15:02PM +0200, Ahmed S. Darwish wrote:
> > On Sat, Sep 28, 2019 at 04:53:52PM -0700, Linus Torvalds wrote:
> > > Ahmed - would you be willing to test this on your problem case (with
> > > the ext4 optimization re-enabled, of course)?
> >
> > So I pulled the patch and the revert of the ext4 revert as they're all
> > now merged in master. It of course made the problem go away...
> >
> > To test the quality of the new jitter code, I added a small patch on
> > top to disable all other sources of randomness except the new jitter
> > entropy code, [1] and made quick tests on the quality of getrandom(0).
> >
> > Using the "ent" tool, [2] also used to test randomness in the Stephan
> > Müller LRNG paper, on a 50-byte file, produced the following
> > results:
> >
> >     $ ent rand-file
> >
> >     Entropy = 7.999625 bits per byte.
> >
> >     Optimum compression would reduce the size of this 50 byte file
> >     by 0 percent.
> >
> >     Chi square distribution for 50 samples is 259.43, and randomly
> >     would exceed this value 41.11 percent of the times.
> >
> >     Arithmetic mean value of data bytes is 127.4085 (127.5 = random).
> >
> >     Monte Carlo value for Pi is 3.148476594 (error 0.22 percent).
> >
> >     Serial correlation coefficient is 0.001740 (totally uncorrelated = 0.0).
> >
> > As can be seen above, everything looks random, and almost all of the
> > statistical randomness tests matched the same kernel without the
> > "jitter + schedule()" patch added (after getting it un-stuck).
>
> Can you post the patch for [1]?

Yup, it was the quick&dirty patch below: (discussion continues after the patch)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c2f7de9dc543..26d0d2bb3337 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1177,6 +1177,8 @@ struct timer_rand_state {
  */
 void add_device_randomness(const void *buf, unsigned int size)
 {
+	return;
+
 	unsigned long time = random_get_entropy() ^ jiffies;
 	unsigned long flags;

@@ -1205,6 +1207,8 @@ static struct timer_rand_state input_timer_state = INIT_TIMER_RAND_STATE;
  */
 static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 {
+	return;
+
 	struct entropy_store	*r;
 	struct {
 		long jiffies;
@@ -1255,6 +1259,8 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 void add_input_randomness(unsigned int type, unsigned int code,
 			  unsigned int value)
 {
+	return;
+
 	static unsigned char last_value;

 	/* ignore autorepeat and the like */
@@ -1308,6 +1314,8 @@ static __u32 get_reg(struct fast_pool *f, struct pt_regs *regs)

 void add_interrupt_randomness(int irq, int irq_flags)
 {
+	return;
+
 	struct entropy_store	*r;
 	struct fast_pool	*fast_pool = this_cpu_ptr(&irq_randomness);
 	struct pt_regs		*regs = get_irq_regs();
@@ -1375,6 +1383,8 @@ EXPORT_SYMBOL_GPL(add_interrupt_randomness);
 #ifdef CONFIG_BLOCK
 void add_disk_randomness(struct gendisk *disk)
 {
+	return;
+
 	if (!disk || !disk->random)
 		return;
 	/* first major is 1, so we get >= 0x200 here */
@@ -2489,6 +2499,8 @@ randomize_page(unsigned long start, unsigned long range)
 void add_hwgenerator_randomness(const char *buffer, size_t count,
 				size_t entropy)
 {
+	return;
+
 	struct entropy_store *poolp = &input_pool;

 	if (unlikely(crng_init == 0)) {
@@ -2515,9 +2527,11 @@ EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
  */
 void add_bootloader_randomness(const void *buf, unsigned int size)
 {
+	return;
+
 	if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
 		add_hwgenerator_randomness(buf, size, size * 8);
 	else

> Another test we should do is the
> multi-boot test. Testing the stream (with ent, or with my dieharder run)
> is mainly testing the RNG algo. I'd like to see if the first 8 bytes
> out of the kernel RNG change between multiple boots of the same system.
> e.g. read the first 8 bytes, for each of 10 boots, and feed THAT
> byte "stream" into ent...

Oh, indeed, that's an excellent point... I'll prototype this and come back.

thanks,

--
Ahmed Darwish
Re: x86/random: Speculation to the rescue
On Tue, Oct 1, 2019 at 6:51 AM Borislav Petkov wrote:
>
> So when I add this by hand and do git diff, it adds a second hunk:

You and me both. We both have editors that always add line endings, and right now that file doesn't have a newline at the end of the file thanks to commit 428826f5358c ("fdt: add support for rng-seed").

> and I kinda get what it is trying to tell me but this is new. And when I
> do
>
> $ xxd drivers/char/random.c
> ..
> 000125e0: 646f 6d6e 6573 7329 3b0a  domness);.
>
> there's a 0xa at the end so what's git really trying to tell me?

The previous state of the file didn't have that 0xa at the end, so you get that

  -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
  \ No newline at end of file
  +EXPORT_SYMBOL_GPL(add_bootloader_randomness);

which is "the '-' line doesn't have a newline, the '+' line does" marker.

> > Doing something like the above to /dev/urandom is likely the right
> > thing to do eventually, but I didn't want to mix up "we can perhaps
> > improve the urandom situation too" with the basic "let's fix the boot
> > problem". The urandom behavior change would be a separate thing.
>
> So make it a separate patch and let's hammer on it during the next weeks
> and see what happens?

Yeah, probably.

> > Also, talking about "future changes". Right now
> > "try_to_generate_entropy()" is actually uninterruptible once it gets
> > started. I think we should add a test for signal_pending() too, but it
>
> Wouldn't that even increase its entropy, which would be a good thing?

I'm not sure it increases it, but it certainly shouldn't make it any worse.

	Linus
Re: x86/random: Speculation to the rescue
On Tue, Oct 01, 2019 at 06:15:02PM +0200, Ahmed S. Darwish wrote:
> On Sat, Sep 28, 2019 at 04:53:52PM -0700, Linus Torvalds wrote:
> > Ahmed - would you be willing to test this on your problem case (with
> > the ext4 optimization re-enabled, of course)?
>
> So I pulled the patch and the revert of the ext4 revert as they're all
> now merged in master. It of course made the problem go away...
>
> To test the quality of the new jitter code, I added a small patch on
> top to disable all other sources of randomness except the new jitter
> entropy code, [1] and made quick tests on the quality of getrandom(0).
>
> Using the "ent" tool, [2] also used to test randomness in the Stephan
> Müller LRNG paper, on a 50-byte file, produced the following
> results:
>
>     $ ent rand-file
>
>     Entropy = 7.999625 bits per byte.
>
>     Optimum compression would reduce the size of this 50 byte file
>     by 0 percent.
>
>     Chi square distribution for 50 samples is 259.43, and randomly
>     would exceed this value 41.11 percent of the times.
>
>     Arithmetic mean value of data bytes is 127.4085 (127.5 = random).
>
>     Monte Carlo value for Pi is 3.148476594 (error 0.22 percent).
>
>     Serial correlation coefficient is 0.001740 (totally uncorrelated = 0.0).
>
> As can be seen above, everything looks random, and almost all of the
> statistical randomness tests matched the same kernel without the
> "jitter + schedule()" patch added (after getting it un-stuck).

Can you post the patch for [1]?

Another test we should do is the multi-boot test. Testing the stream (with ent, or with my dieharder run) is mainly testing the RNG algo. I'd like to see if the first 8 bytes out of the kernel RNG change between multiple boots of the same system. e.g. read the first 8 bytes, for each of 10 boots, and feed THAT byte "stream" into ent...

--
Kees Cook
Re: x86/random: Speculation to the rescue
Hi, Sorry for the late reply as I'm also on vacation this week :-) On Sat, Sep 28, 2019 at 04:53:52PM -0700, Linus Torvalds wrote: > On Sat, Sep 28, 2019 at 3:24 PM Thomas Gleixner wrote: > > > > Nicholas presented the idea to (ab)use speculative execution for random > > number generation years ago at the Real-Time Linux Workshop: > > What you describe is just a particularly simple version of the jitter > entropy. Not very reliable. > > But hey, here's a made-up patch. It basically does jitter entropy, but > it uses a more complex load than the fibonacci LFSR folding: it calls > "schedule()" in a loop, and it sets up a timer to fire. > > And then it mixes in the TSC in that loop. > > And to be fairly conservative, it then credits one bit of entropy for > every timer tick. Not because the timer itself would be all that > unpredictable, but because the interaction between the timer and the > loop is going to be pretty damn unpredictable. > > Ok, I'm handwaving. But I do claim it really is fairly conservative to > think that a cycle counter would give one bit of entropy when you time > over a timer actually happening. The way that loop is written, we do > guarantee that we'll mix in the TSC value both before and after the > timer actually happened. We never look at the difference of TSC > values, because the mixing makes that uninteresting, but the code does > start out with verifying that "yes, the TSC really is changing rapidly > enough to be meaningful". > > So if we want to do jitter entropy, I'd much rather do something like > this that actually has a known fairly complex load with timers and > scheduling. > > And even if absolutely no actual other process is running, the timer > itself is still going to cause perturbations. And the "schedule()" > call is more complicated than the LFSR is anyway. > > It does wait for one second the old way before it starts doing this. > > Whatever. 
> I'm entirely convinced this won't make everybody happy anyway, but it's _one_ approach to handle the issue.
>
> Ahmed - would you be willing to test this on your problem case (with the ext4 optimization re-enabled, of course)?

So I pulled the patch and the revert of the ext4 revert as they're all now merged in master. It of course made the problem go away...

To test the quality of the new jitter code, I added a small patch on top to disable all other sources of randomness except the new jitter entropy code, [1] and made quick tests on the quality of getrandom(0).

Running the "ent" tool, [2] also used to test randomness in the Stephan Müller LRNG paper, on a 50-byte file produced the following results:

$ ent rand-file

Entropy = 7.999625 bits per byte.

Optimum compression would reduce the size of this 50 byte file by 0 percent.

Chi square distribution for 50 samples is 259.43, and randomly would exceed this value 41.11 percent of the times.

Arithmetic mean value of data bytes is 127.4085 (127.5 = random).

Monte Carlo value for Pi is 3.148476594 (error 0.22 percent).

Serial correlation coefficient is 0.001740 (totally uncorrelated = 0.0).

As can be seen above, everything looks random, and almost all of the statistical randomness tests matched the same kernel without the "jitter + schedule()" patch added (after getting it un-stuck).

Thanks!

[1] Nullified add_{device,timer,input,interrupt,disk,.*}_randomness()
[2] http://www.fourmilab.ch/random/

-- Ahmed Darwish
Re: x86/random: Speculation to the rescue
On Mon, Sep 30, 2019 at 09:06:36AM -0700, Linus Torvalds wrote: > Obviously, that can be a problem if you then need sshd in order to get > into a headless box, so my patch fixes things for you too, but at > least your box doesn't show the problem that Ahmed had, and the boot > completing presumably means that you got more entropy from other disk > IO being done by the rest of the boot. Right, another observation I did was that when it would wait for entropy, if I press random keys, it would get done faster because apparently it would collect entropy from the key presses too. > If you want to test my hacky "do /dev/urandom too", it was this one-liner: > > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -2027,6 +2027,7 @@ urandom_read(struct file *file, char __user > *buf, size_t nbytes, loff_t *ppos) > static int maxwarn = 10; > int ret; > > + if (!crng_ready()) try_to_generate_entropy(); > if (!crng_ready() && maxwarn > 0) { > maxwarn--; > if (__ratelimit(&urandom_warning)) > > and that should get rid of the warnings. So when I add this by hand and do git diff, it adds a second hunk: --- diff --git a/drivers/char/random.c b/drivers/char/random.c index c2f7de9dc543..93bad17bef98 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -2027,6 +2027,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) static int maxwarn = 10; int ret; + if (!crng_ready()) try_to_generate_entropy(); if (!crng_ready() && maxwarn > 0) { maxwarn--; if (__ratelimit(&urandom_warning)) @@ -2520,4 +2521,4 @@ void add_bootloader_randomness(const void *buf, unsigned int size) else add_device_randomness(buf, size); } -EXPORT_SYMBOL_GPL(add_bootloader_randomness); \ No newline at end of file +EXPORT_SYMBOL_GPL(add_bootloader_randomness); --- and I kinda get what it is trying to tell me but this is new. And when I do $ xxd drivers/char/random.c .. 000125e0: 646f 6d6e 6573 7329 3b0a domness);. 
there's a 0xa at the end so what's git really trying to tell me? Anyway, that does get rid of the warns too. > Doing something like the above to /dev/urandom is likely the right > thing to do eventually, but I didn't want to mix up "we can perhaps > improve the urandom situation too" with the basic "let's fix the boot > problem". The urandom behavior change would be a separate thing. So make it a separate patch and let's hammer on it during the next weeks and see what happens? > Also, talking about "future changes". Right now > "try_to_generate_entropy()" is actually uninterruptible once it gets > started. I think we should add a test for signal_pending() too, but it Wouldn't that even increase its entropy, which would be a good thing? > should generally complete really fairly quickly so I left it without > one just to see if anybody even notices. Right. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
RE: x86/random: Speculation to the rescue
From: Linus Torvalds > Sent: 30 September 2019 17:16 > > On Mon, Sep 30, 2019 at 6:16 AM Theodore Y. Ts'o wrote: > > > > Which is to say, I'm still worried that people with deep access to the > > implementation details of a CPU might be able to reverse engineer what > > a jitter entropy scheme produces. This is why I'd be curious to see > > the results when someone tries to attack a jitter scheme on a fully > > open, simple architecture such as RISC-V. > > Oh, I agree. > > One of the reasons I didn't like some of the other jitter entropy > things was that they seemed to rely _entirely_ on just purely > low-level CPU unpredictability. I think that exists, but I think it > makes for problems for really simple cores. > > Timing over a bigger thing and an actual interrupt (even if it's > "just" a timer interrupt, which is arguably much closer to the CPU and > has a much higher likelihood of having common frequency domains with > the cycle counter etc) means that I'm pretty damn convinced that a big > complex CPU will absolutely see issues, even if it has big caches. Agreed, you need something that is actually non-deterministic. While 'speculation' is difficult to predict, it is actually fully deterministic. Until you get some perturbation from an outside source the cpu state (including caches and DRAM) is likely to be the same on every boot. For a desktop (etc) PC booting from a disk (even SSD) you'll get some variation. Boot an embedded system from onboard flash and every boot could well be the same (or one of a small number of results). Synchronising a signal between frequency domains might generate some 'randomness', but maybe not if both come from the same PLL. Even if there are variations, they may not be large enough to give a lot of variations in the state. The variations between systems could also be a lot larger than the variations within a system. If there are 'only' 2^32 variations an exhaustive search might be possible to find an ssh key. 
David
Re: x86/random: Speculation to the rescue
Why not get entropy from the white noise that can be obtained from any attached ADC? Audio cards, some SBCs, and microcontrollers all have ADCs. Not that I'm familiar with when the kernel first needs entropy or an expert in the field. Thanks
Re: x86/random: Speculation to the rescue
On Mon, Sep 30, 2019 at 08:10:15AM +0200, Borislav Petkov wrote: > On Sun, Sep 29, 2019 at 07:59:19PM -0700, Linus Torvalds wrote: > > All my smoke testing looked fine - I disabled trusting the CPU, I > > increased the required entropy a lot, and to actually trigger the > > lockup issue without the broken user space, I made /dev/urandom do > > that "wait for entropy" thing too. > > Hohum, seems to get rid of the longish delay during boot on my test > boxes here: Yes; for me too. This makes a huge difference in my ARM emulation environments (where I wasn't using virtio-rng-device). Those VMs were very boot entropy starved -- I was waiting minutes for sshd to start. I doubt running something like dieharder on urandom would actually show any deficiencies, but I've started a test up anyway. I'll yell in a few hours if it actually has something bad to say. :) -- Kees Cook
Re: x86/random: Speculation to the rescue
On Mon, Sep 30, 2019 at 9:32 AM Peter Zijlstra wrote: > > In my experience LFSRs are good at defeating branch predictors, which > would make even in-order cores suffer lots of branch misses. And that > might be enough, maybe. Agreed, branch mis-prediction is likely fairly hard to take into account ahead of time even in an in-order CPU. But when you know the LFSR, and you know the architecture, you could just re-create the timing, and have a fairly high chance of getting the same complex pattern. And in the simple enough (ie bad) case - the embedded world - you don't need to "know" or do any deep analysis of anything or try to predict it ahead of time. You just look at what another identical machine does when given the identical starting point. So I don't think an LFSR is all that great on its own. It's complicated to predict, and it gives odd patterns, but on an in-order core I'm not convinced it gives sufficiently _different_ odd patterns across booting. This, btw, is why you shouldn't trust the "I ran the thing a billion times" on my PC, even if you were to have an old in-order Atom CPU available to you. If you didn't restart the whole CPU state from an identical starting point as you re-run them, the differences you see may simply not be real. They may be an artificial effect of cumulative changes to internal CPU branch prediction arrays and cache tag layout. I don't think it's a huge issue if you have a real load, and you have _any_ source of entropy at all, but I really do not think that an LFSR is necessarily a good idea. It's just _too_ identical across reboots, and will have very similar (but yes, complex due to branch prediction) behavior across different runs. Of course, in the "completely and utterly identical state and absolutely no timing differences anywhere" situation, even my "take timer interrupts and force at least cache misses on SMP" model doesn't protect you from just re-running the 100% identical sequence. 
But when it's a more complex load than an LFSR, I personally at least feel better about it. An LFSR I can well imagine will give the exact same (odd) timing patterns across boots even if there were earlier minor changes. But hopefully a bigger load with just a more complex footprint will have more of that. More cache misses, more DRAM accesses, more branch mispredicts, more "pipeline was broken in a _slightly_ different place due to timer". It is also, btw, why I don't mix in TSC _differences_ when I mix things in. I think it's better to actually mix in the TSC value itself. Even if you re-run the LFSR, and it has the exact same branch mis-predicts (because it's the same LFSR), if there were any timing differences from _anything_ else before you ran that LFSR, then the bits you'll be mixing in are different across boots. But if you mix in the relative difference, you might be mixing in the identical bits. The only real difference is only the initial TSC value, of course, so the added entropy is small. But when we're talking about trying to get to a total of 256 bits, a couple of bits here and there end up mattering. But no. Never any _guarantees_. There is no absolute security. Only best effort. An OoO CPU will have a _lot_ more internal state, and a lot of things that perturb that internal state, and that will make small changes in timing cause more chaos in the end. Much less to worry about. An in-order CPU will have less internal state, and so less perturbations and sources of real entropy from small differences. We can only hope there is _some_. It's not like our existing "depend on external interrupt timing" is any hard guarantee either, regardless of how long we wait or how many external interrupts we'd get. It's always a "at some point you have to make a judgement call". And we all have different levels of comfort about where that point ends up being. Linus
Re: x86/random: Speculation to the rescue
On Mon, Sep 30, 2019 at 09:15:55AM -0700, Linus Torvalds wrote: > On Mon, Sep 30, 2019 at 6:16 AM Theodore Y. Ts'o wrote: > But it _also_ means that if you have a small and excessively stupid > in-order CPU, I can almost guarantee that you will at least have cache > misses likely all the way out to memory. So a CPU-only loop like the > LFSR thing that Thomas reports generates entropy even on its own would > likely generate nothing at all on a simple in-order core - but I do In my experience LFSRs are good at defeating branch predictors, which would make even in-order cores suffer lots of branch misses. And that might be enough, maybe.
Re: x86/random: Speculation to the rescue
On Mon, Sep 30, 2019 at 6:16 AM Theodore Y. Ts'o wrote: > > Which is to say, I'm still worried that people with deep access to the > implementation details of a CPU might be able to reverse engineer what > a jitter entropy scheme produces. This is why I'd be curious to see > the results when someone tries to attack a jitter scheme on a fully > open, simple architecture such as RISC-V. Oh, I agree. One of the reasons I didn't like some of the other jitter entropy things was that they seemed to rely _entirely_ on just purely low-level CPU unpredictability. I think that exists, but I think it makes for problems for really simple cores. Timing over a bigger thing and an actual interrupt (even if it's "just" a timer interrupt, which is arguably much closer to the CPU and has a much higher likelihood of having common frequency domains with the cycle counter etc) means that I'm pretty damn convinced that a big complex CPU will absolutely see issues, even if it has big caches. But it _also_ means that if you have a small and excessively stupid in-order CPU, I can almost guarantee that you will at least have cache misses likely all the way out to memory. So a CPU-only loop like the LFSR thing that Thomas reports generates entropy even on its own would likely generate nothing at all on a simple in-order core - but I do think that with timers and real cache misses etc, it's going to be really really hard to try to figure out cycle counters even if you're a CPU expert. But the embedded market with small cores and 100% identical machines and 100% identical system images is always going to be a potential huge problem. If somebody has connections to RISC-V hw people, maybe they could bring this issue up with them? Linus
Re: x86/random: Speculation to the rescue
On Sun, Sep 29, 2019 at 11:10 PM Borislav Petkov wrote: > > so sshd does getrandom(2) while those other userspace things don't. Oh > well. Well, that's actually what systems _should_ do. Presumably sshd actually wants secure randomness, and nothing else waits for it. Obviously, that can be a problem if you then need sshd in order to get into a headless box, so my patch fixes things for you too, but at least your box doesn't show the problem that Ahmed had, and the boot completing presumably means that you got more entropy from other disk IO being done by the rest of the boot. If you want to test my hacky "do /dev/urandom too", it was this one-liner: --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -2027,6 +2027,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) static int maxwarn = 10; int ret; + if (!crng_ready()) try_to_generate_entropy(); if (!crng_ready() && maxwarn > 0) { maxwarn--; if (__ratelimit(&urandom_warning)) and that should get rid of the warnings. It's not using the full "wait_for_random_bytes()", because in the absence of a cycle counter, that would introduce the boot-time lockup for /dev/urandom too. Doing something like the above to /dev/urandom is likely the right thing to do eventually, but I didn't want to mix up "we can perhaps improve the urandom situation too" with the basic "let's fix the boot problem". The urandom behavior change would be a separate thing. Also, talking about "future changes". Right now "try_to_generate_entropy()" is actually uninterruptible once it gets started. I think we should add a test for signal_pending() too, but it should generally complete really fairly quickly so I left it without one just to see if anybody even notices. Linus
Re: x86/random: Speculation to the rescue
On Sun, Sep 29, 2019 at 11:37:06PM -0400, Theodore Y. Ts'o wrote:
> I'm OK with this as a starting point. If a jitter entropy system allows us to get past this logjam, let's do it. At least for the x86 architecture, it will be security through obscurity. And if the alternative is potentially failing where the adversary can attack the CRNG, it's my preference. It's certainly better than nothing.

Upon rereading this, this came out wrong. What I was trying to say is that in the very worst case, it will be security through obscurity, and if the alternative is "don't block, because blocking is always worse than a guessable value being returned through getrandom(0)", it's better than nothing.

Which is to say, I'm still worried that people with deep access to the implementation details of a CPU might be able to reverse engineer what a jitter entropy scheme produces. This is why I'd be curious to see the results when someone tries to attack a jitter scheme on a fully open, simple architecture such as RISC-V.

- Ted
Re: x86/random: Speculation to the rescue
On Sun, Sep 29, 2019 at 07:59:19PM -0700, Linus Torvalds wrote:
> All my smoke testing looked fine - I disabled trusting the CPU, I increased the required entropy a lot, and to actually trigger the lockup issue without the broken user space, I made /dev/urandom do that "wait for entropy" thing too.

Hohum, seems to get rid of the longish delay during boot on my test boxes here:

$ grep random /var/log/messages

<--- that's before:

Sep 30 07:46:07 cz vmunix: [0.00] random: get_random_bytes called from start_kernel+0x304/0x4ac with crng_init=0
Sep 30 07:46:07 cz vmunix: [1.505641] random: fast init done
Sep 30 07:46:07 cz vmunix: [7.124808] random: dd: uninitialized urandom read (512 bytes read)
Sep 30 07:46:07 cz vmunix: [8.507672] random: dbus-daemon: uninitialized urandom read (12 bytes read)
Sep 30 07:46:07 cz vmunix: [8.518621] random: dbus-daemon: uninitialized urandom read (12 bytes read)
Sep 30 07:46:07 cz vmunix: [8.565073] random: avahi-daemon: uninitialized urandom read (4 bytes read)
Sep 30 07:46:21 cz vmunix: [ 23.092795] random: crng init done
Sep 30 07:46:21 cz vmunix: [ 23.096419] random: 3 urandom warning(s) missed due to ratelimiting

<--- that's after, and we're 15 secs faster:

Sep 30 07:47:53 cz vmunix: [0.329599] random: get_random_bytes called from start_kernel+0x304/0x4ac with crng_init=0
Sep 30 07:47:53 cz vmunix: [1.949216] random: fast init done
Sep 30 07:47:53 cz vmunix: [4.806132] random: dd: uninitialized urandom read (512 bytes read)
Sep 30 07:47:53 cz vmunix: [5.954547] random: dbus-daemon: uninitialized urandom read (12 bytes read)
Sep 30 07:47:53 cz vmunix: [5.965483] random: dbus-daemon: uninitialized urandom read (12 bytes read)
Sep 30 07:47:53 cz vmunix: [6.014102] random: avahi-daemon: uninitialized urandom read (4 bytes read)
Sep 30 07:47:55 cz vmunix: [8.238514] random: crng init done
Sep 30 07:47:55 cz vmunix: [8.240205] random: 3 urandom warning(s) missed due to ratelimiting

Seeing how those uninitialized urandom read warns
still happen, I added a dump_stack() to see when we do wait for the random bytes first and I got this:

[5.522348] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[5.532008] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[5.579922] random: avahi-daemon: uninitialized urandom read (4 bytes read)
[5.751790] elogind-daemon[1730]: New seat seat0.
[5.756376] elogind-daemon[1730]: Watching system buttons on /dev/input/event6 (Power Button)
[5.777381] elogind-daemon[1730]: Watching system buttons on /dev/input/event3 (Power Button)
[5.781485] elogind-daemon[1730]: Watching system buttons on /dev/input/event5 (Lid Switch)
[5.783547] elogind-daemon[1730]: Watching system buttons on /dev/input/event4 (Sleep Button)
[5.885300] elogind-daemon[1730]: Watching system buttons on /dev/input/event0 (AT Translated Set 2 keyboard)
[5.911602] CPU: 2 PID: 1798 Comm: sshd Not tainted 5.3.0+ #1
[5.914672] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.39 04/16/2019
[5.917774] Call Trace:
[5.920905]  dump_stack+0x46/0x60
[5.924044]  wait_for_random_bytes.part.32+0x21/0x163
[5.927256]  ? handle_mm_fault+0x50/0xc0
[5.930425]  ? _raw_spin_unlock_irq+0x17/0x40
[5.933604]  ? __do_page_fault+0x225/0x500
[5.936763]  __x64_sys_getrandom+0x70/0xb0
[5.939902]  do_syscall_64+0x4c/0x180
[5.943003]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[5.946152] RIP: 0033:0x7f4417f4d495
[5.949225] Code: 74 4c 8d 0c 37 41 ba 3e 01 00 00 66 2e 0f 1f 84 00 00 00 00 00 4d 39 c8 73 27 4c 89 ce 31 d2 4c 89 c7 44 89 d0 4c 29 c6 0f 05 <48> 3d 00 f0 ff ff 77 2b 48 85 c0 78 0e 74 3c 49 01 c0 4d 39 c8 72
[5.952902] RSP: 002b:7ffc69e6e328 EFLAGS: 0202 ORIG_RAX: 013e
[5.956227] RAX: ffda RBX: 0020 RCX: 7f4417f4d495
[5.959530] RDX: RSI: 0020 RDI: 559262c74780
[5.962820] RBP: 559262c708b0 R08: 559262c74780 R09: 559262c747a0
[5.966104] R10: 013e R11: 0202 R12: 7ffc69e6e470
[5.969373] R13: 0002 R14: 7f4417f4d460 R15: 7fff
[7.852837] random: crng init done
[7.854637] random: 3 urandom warning(s) missed due to ratelimiting
[ 17.767786] elogind-daemon[1730]: New session c1 of user root.

so sshd does getrandom(2) while those other userspace things don't. Oh well.

-- Regards/Gruss, Boris.
Re: x86/random: Speculation to the rescue
On Sun, Sep 29, 2019 at 06:16:33PM -0700, Linus Torvalds wrote:
> - or just say "hey, a lot of people find jitter entropy reasonable, so let's just try this".

I'm OK with this as a starting point. If a jitter entropy system allows us to get past this logjam, let's do it. At least for the x86 architecture, it will be security through obscurity. And if the alternative is potentially failing where the adversary can attack the CRNG, it's my preference. It's certainly better than nothing.

That being said, I'd very much like to see someone do an analysis of how well these jitter schemes work on a RISC-V implementation (you know, the ones that were so simplistic and didn't have any speculation so they weren't vulnerable to Spectre/Meltdown). If jitter approaches turn out not to work that well on RISC-V, perhaps that will be a goad for future RISC-V chips to include the crypto extension to their ISA.

In the long term (not in time for the 5.4 merge window), I'm convinced that we should be trying as many ways of getting entropy as possible. If we're using UEFI, we should be trying to get it from UEFI's secure random number generator; if there is a TPM, we should be trying to get random numbers from the TPM, and mix them in, and so on.

After all, the reason why we lived with getrandom(0) blocking for five years was because for the vast majority of x86 platforms, it simply wasn't a problem in practice. We need to get back to that place where in practice, we've harvested as much uncertainty from hardware as possible, so most folks are comfortable that attacking the CRNG is no longer the simplest way to crack system security.

- Ted
Re: x86/random: Speculation to the rescue
On Sun, Sep 29, 2019 at 6:16 PM Linus Torvalds wrote: > > But I've committed that patch and the revert of the ext4 revert to a > local branch, I'll do some basic testing of it (which honestly on my > machines are kind of pointless, since all of them support rdrand), but > assuming it passes the basic smoke tests - and I expect it to - I'll > merge it for rc1. All my smoke testing looked fine - I disabled trusting the CPU, I increased the required entropy a lot, and to actually trigger the lockup issue without the broken user space, I made /dev/urandom do that "wait for entropy" thing too. It all looked sane to me, and the urandom part also had the side effect of then silencing all the "reading urandom without entropy" warning cases as expected. So it's merged. Note that what I merged did _not_ contain the urandom changes, that was purely for my testing. But it might well be a reasonable thing to do at some point. Of course, whether this jitter-entropy approach is reasonable in the first place ends up likely being debated, but it does seem to be the simplest way forward. Linus
Re: x86/random: Speculation to the rescue
On Sat, Sep 28, 2019 at 4:53 PM Linus Torvalds wrote: > > But hey, here's a made-up patch. It basically does jitter entropy, but > it uses a more complex load than the fibonacci LFSR folding: it calls > "schedule()" in a loop, and it sets up a timer to fire. Ok, I'm sure a lot of people will end up finding this distasteful, and I'll admit to just waffling about it myself. But I am supposed to close the merge window today, and honestly, I want _something_ to happen about the getrandom() issue during the 5.4 merge cycle. So I had a few choices - just ignore things and hope some consensus happens - start the movement to a new getrandom() interface and encourage user space to say "yeah, I don't need _secure_ random numbers" - or just say "hey, a lot of people find jitter entropy reasonable, so let's just try this". And I went with that last choice. If it works, it makes the getrandom() interface changes a non-issue. I'm not saying my patch is going to be the last word on the issue. I'm _personally_ ok with it and believe it's not crazy, and if it then makes serious people go "Eww" and send some improvements to it, then it has served its purpose. But I've committed that patch and the revert of the ext4 revert to a local branch, I'll do some basic testing of it (which honestly on my machines are kind of pointless, since all of them support rdrand), but assuming it passes the basic smoke tests - and I expect it to - I'll merge it for rc1. I also have my old readdir branch that I decided I want to merge due to the (completely independent) discussion about filldir issues, so I'll probably end up delaying rc1 until tomorrow, but just a heads up. I don't want to leave this until "some time later in the -rc series", although I will be _more_ than happy to have people send me fixes to my somewhat simplistic patch. 
That said, my patch may be simplistic, but I suspect using a loop with a real load like schedule() and arming a timer is a lot more realistic than some of the jitter entropy papers with their _very_ trivial LFSR's and some made-up pointer chasing. But yeah, I think improvements to it are also not unexpected or unreasonable ;) Linus
Re: x86/random: Speculation to the rescue
29.09.2019 04:53, Linus Torvalds wrote:
> On Sat, Sep 28, 2019 at 3:24 PM Thomas Gleixner wrote:
> > Nicholas presented the idea to (ab)use speculative execution for random number generation years ago at the Real-Time Linux Workshop:
>
> What you describe is just a particularly simple version of the jitter entropy. Not very reliable.
>
> But hey, here's a made-up patch. It basically does jitter entropy, but it uses a more complex load than the fibonacci LFSR folding: it calls "schedule()" in a loop, and it sets up a timer to fire.
>
> And then it mixes in the TSC in that loop.
>
> And to be fairly conservative, it then credits one bit of entropy for every timer tick. Not because the timer itself would be all that unpredictable, but because the interaction between the timer and the loop is going to be pretty damn unpredictable.

This looks quite similar to the refactoring proposed earlier by Stephan Müller in his paper: https://www.chronox.de/lrng/doc/lrng.pdf . Indeed, he makes a good argument that the timing of device interrupts is right now the main actual source of entropy in Linux, at the end of Section 1.1:

"""
The discussion shows that the noise sources of block devices and HIDs are a derivative of the interrupt noise source. All events used as entropy source recorded by the block device and HID noise source are delivered to the Linux kernel via interrupts.
"""

Now your patch adds the timer interrupt (while the schedule() loop is running) to the mix, essentially in the same setup as proposed.

> Ok, I'm handwaving. But I do claim it really is fairly conservative to think that a cycle counter would give one bit of entropy when you time over a timer actually happening. The way that loop is written, we do guarantee that we'll mix in the TSC value both before and after the timer actually happened. We never look at the difference of TSC values, because the mixing makes that uninteresting, but the code does start out with verifying that "yes, the TSC really is changing rapidly enough to be meaningful".
>
> So if we want to do jitter entropy, I'd much rather do something like this that actually has a known fairly complex load with timers and scheduling.
>
> And even if absolutely no actual other process is running, the timer itself is still going to cause perturbations. And the "schedule()" call is more complicated than the LFSR is anyway.
>
> It does wait for one second the old way before it starts doing this.
>
> Whatever. I'm entirely convinced this won't make everybody happy anyway, but it's _one_ approach to handle the issue.
>
> Ahmed - would you be willing to test this on your problem case (with the ext4 optimization re-enabled, of course)?
>
> And Thomas - mind double-checking that I didn't do anything questionable with the timer code..
>
> And this goes without saying - this patch is ENTIRELY untested. Apart from making people upset for the lack of rigor, it might do unspeakable crimes against your pets. You have been warned.
>
> Linus

-- Alexander E. Patrakov
Re: x86/random: Speculation to the rescue
On Sat, 28 Sep 2019, Linus Torvalds wrote: > On Sat, Sep 28, 2019 at 3:24 PM Thomas Gleixner wrote: > > > > Nicholas presented the idea to (ab)use speculative execution for random > > number generation years ago at the Real-Time Linux Workshop: > > What you describe is just a particularly simple version of the jitter > entropy. Not very reliable. I know that jitter entropy is not the most reliable source. Though its for one better than no entropy and on the other hand the results are interesting. I just looked at the test I ran overnight on one of my machines which did an aggregate test over 1024 runs both on the hacked up thing and the veritable /dev/random. The overall difference is within the noise. So I'm not trying to say that we should rely solely on that, but I think it's something we should not dismiss upfront just because paranoid theory says that jitter entropy is not reliable enough. The point is that jitter entropy relies as any other entropy source on the quality of the observed data. The non-deterministic behaviour of speculative execution seems to yield quite good input. There is another fun thing which I found while analyzing the crappy runtime behaviour of a customer application: static volatile unsigned int foo; t0 = rdtscp(); for (i = 0; i < 1; i++) foo++; /* rdtscp() or the fenced rdtsc() gives better results */ t1 = rdtscp(); Even with interrupts disabled and no NMI disturbing the loop, the resulting execution time varies depending on the microarchitecture: tmin < t < N * tmin where N is >= 2 on all Intel machines which I tested. Feeding bit 0 of t1 mixed with t0 into the LSFR gives equally good results as the hacky loop I used in the patch. Fun isn't it? > Whatever. I'm entirely convinced this won't make everybody happy > anyway, but it's _one_ approach to handle the issue. > > Ahmed - would you be willing to test this on your problem case (with > the ext4 optimization re-enabled, of course)? 
> And Thomas - mind double-checking that I didn't do anything
> questionable with the timer code..

Looks about right.

> And this goes without saying - this patch is ENTIRELY untested. Apart
> from making people upset for the lack of rigor, it might do
> unspeakable crimes against your pets. You have been warned.

As I'm about to vanish for a two-week vacation, I'm going to shut up now
and hope that the experiment gave at least some food for thought.

Thanks,

	tglx
Re: x86/random: Speculation to the rescue
On Sat, Sep 28, 2019 at 3:24 PM Thomas Gleixner wrote:
>
> Nicholas presented the idea to (ab)use speculative execution for random
> number generation years ago at the Real-Time Linux Workshop:

What you describe is just a particularly simple version of the jitter
entropy. Not very reliable.

But hey, here's a made-up patch. It basically does jitter entropy, but
it uses a more complex load than the fibonacci LFSR folding: it calls
"schedule()" in a loop, and it sets up a timer to fire. And then it
mixes in the TSC in that loop.

And to be fairly conservative, it then credits one bit of entropy for
every timer tick. Not because the timer itself would be all that
unpredictable, but because the interaction between the timer and the
loop is going to be pretty damn unpredictable.

Ok, I'm handwaving. But I do claim it really is fairly conservative to
think that a cycle counter would give one bit of entropy when you time
over a timer actually happening.

The way that loop is written, we do guarantee that we'll mix in the TSC
value both before and after the timer actually happened. We never look
at the difference of TSC values, because the mixing makes that
uninteresting, but the code does start out with verifying that "yes,
the TSC really is changing rapidly enough to be meaningful".

So if we want to do jitter entropy, I'd much rather do something like
this that actually has a known fairly complex load with timers and
scheduling. And even if absolutely no actual other process is running,
the timer itself is still going to cause perturbations. And the
"schedule()" call is more complicated than the LFSR is anyway.

It does wait for one second the old way before it starts doing this.

Whatever. I'm entirely convinced this won't make everybody happy
anyway, but it's _one_ approach to handle the issue.

Ahmed - would you be willing to test this on your problem case (with
the ext4 optimization re-enabled, of course)?
And Thomas - mind double-checking that I didn't do anything questionable
with the timer code..

And this goes without saying - this patch is ENTIRELY untested. Apart
from making people upset for the lack of rigor, it might do unspeakable
crimes against your pets. You have been warned.

                Linus

 drivers/char/random.c | 62 +-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d3beed084c0a..de434feb873a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1732,6 +1732,56 @@ void get_random_bytes(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes);
 
+
+/*
+ * Each time the timer fires, we expect that we got an unpredictable
+ * jump in the cycle counter. Even if the timer is running on another
+ * CPU, the timer activity will be touching the stack of the CPU that is
+ * generating entropy..
+ *
+ * Note that we don't re-arm the timer in the timer itself - we are
+ * happy to be scheduled away, since that just makes the load more
+ * complex, but we do not want the timer to keep ticking unless the
+ * entropy loop is running.
+ *
+ * So the re-arming always happens in the entropy loop itself.
+ */
+static void entropy_timer(struct timer_list *t)
+{
+	credit_entropy_bits(&input_pool, 1);
+}
+
+/*
+ * If we have an actual cycle counter, see if we can
+ * generate enough entropy with timing noise
+ */
+static void try_to_generate_entropy(void)
+{
+	struct {
+		unsigned long now;
+		struct timer_list timer;
+	} stack;
+
+	stack.now = random_get_entropy();
+
+	/* Slow counter - or none. Don't even bother */
+	if (stack.now == random_get_entropy())
+		return;
+
+	timer_setup_on_stack(&stack.timer, entropy_timer, 0);
+	while (!crng_ready()) {
+		if (!timer_pending(&stack.timer))
+			mod_timer(&stack.timer, jiffies+1);
+		mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
+		schedule();
+		stack.now = random_get_entropy();
+	}
+
+	del_timer_sync(&stack.timer);
+	destroy_timer_on_stack(&stack.timer);
+	mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
+}
+
 /*
  * Wait for the urandom pool to be seeded and thus guaranteed to supply
  * cryptographically secure random numbers. This applies to: the /dev/urandom
@@ -1746,7 +1796,17 @@ int wait_for_random_bytes(void)
 {
 	if (likely(crng_ready()))
 		return 0;
-	return wait_event_interruptible(crng_init_wait, crng_ready());
+
+	do {
+		int ret;
+		ret = wait_event_interruptible_timeout(crng_init_wait, crng_ready(), HZ);
+		if (ret)
+			return ret > 0 ? 0 : ret;
+
+		try_to_generate_entropy();
+	} while (!crng_ready());
+
+	return 0;
 }
 EXPORT_SYMBOL(wait_for_random_bytes);