[issue27266] Always use getrandom() in os.random() on Linux and add block=False parameter to os.urandom()
Theodore Tso added the comment:

Larry, at least on FreeBSD, it sounds like the implementation could just check the kern.random.sys.seeded sysctl, and return . (Note: what is the proposed behaviour if the PRNG is not seeded? Return Null?)

As far as OpenBSD is concerned, it's true that its getentropy(2) never blocks. But that's because OpenBSD assumes that the bootloader can read a seeded entropy file from the previous boot session, and that the CD-ROM installer will be able to gather enough entropy to save an entropy file from the beginning of the installation. So if you don't have to worry about booting your OS on an ARC Internet of Things device, you can afford to make certain simplifying assumptions.

Could Linux on x86 do something similar (read the entropy file in the bootloader)? Sure, the design isn't difficult. If someone can fund the headcount to do the work, I'd be happy to help supervise the GSOC intern or work with some engineer at some other company who is interested in getting a change like this upstream. But there will still be cases where getrandom(2) could block, simply because I can't control all of the hardware platforms that Linux might be ported to.

The real problem is that, since on real hardware platforms it's only a problem in very early boot, it's hard to make a business case to invest in solving this problem.

-- nosy: +Theodore Tso ___ Python tracker <http://bugs.python.org/issue27266> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
Theodore Tso added the comment:

Oh --- and about people wondering whether or not os.random is being used for cryptographic purposes "most of the time" --- again, welcome to my world. I get complaints all the time from people who try to do "dd if=/dev/urandom of=/dev/hdX bs=4k" and then complain this is too slow.

Creating an os.cryptorandom and os.pseudorandom might be a useful way to go here. I've often considered whether I should create a /dev/frandom for the crazies who want to use dd as a way to wipe a disk, but to date I haven't thought it was worth the effort, and I didn't want to encourage them. Besides, isn't the obviously right answer to create a quickie python script? :-)

Splitting os.random does raise the question of what os.random should do, however. If you go down that path, I'd suggest defaulting to the secure-but-slow choice.

I'd also suggest that it's completely fair to put the onus on the people who are trying to run python scripts during early boot to either add some command flags to the python interpreter, or to otherwise make adjustments. But again, that's my bias, and if people don't want to deal with trying to ask the systemd folks to make a change in their code, I'd _completely_ understand.

My design preference is that outside of boot scripts, having os.random block in the name of security is completely fair, since in that case you won't deadlock the system. People of good will may disagree, of course, and I'm not on the Python development team, so take that with whatever grain of salt you wish. At the end of the day, this is all about tradeoffs, and you know your customer/developer base better than I do.

Cheers!

-- ___ Python tracker <http://bugs.python.org/issue26839> ___
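A minimal sketch of what the os.cryptorandom / os.pseudorandom split floated in this comment might look like. The function names cryptorandom and pseudorandom are hypothetical and do not exist in the os module; os.urandom and random.Random are the real standard-library pieces underneath, and the choice of a Mersenne Twister for the fast path is an assumption made only for illustration.

```python
import os
import random

# Hypothetical names: neither function exists in Python's os module.

def cryptorandom(n):
    """Secure-but-slow choice: n bytes from the kernel's CSPRNG."""
    return os.urandom(n)

# Seed the fast generator once from the kernel, then never touch the
# kernel again. The output is predictable to anyone who recovers the
# generator state, so it must not be used for keys or tokens.
_fast = random.Random(os.urandom(16))

def pseudorandom(n):
    """Fast, NOT for secrets: n bytes from a Mersenne Twister."""
    return bytes(_fast.getrandbits(8) for _ in range(n))

if __name__ == "__main__":
    key = cryptorandom(32)      # e.g. key material
    filler = pseudorandom(4096) # e.g. test data or disk filler
    print(len(key), len(filler))
```

Under this sketch, defaulting to the secure-but-slow choice would mean that a plain, unqualified call behaves like cryptorandom.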
[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
Theodore Tso added the comment:

One of the reasons why trying to deal with randomness is hard is because a lot of it is about trust. Did Intel backdoor RDRAND to help out the NSA? You might have one answer if you work for the NSA, and perhaps another if you are willing to assume the worst about the NSA balancing its equities between its signals intelligence mission and providing a secure infrastructure for its customers and keeping the US computing industry strong. Etc., etc.

It is true that OS developers are trying to make their random number generators be initialized more quickly at boot time. Part of this is because of the dynamic which we can all see at work in the discussion of this bug. Some people care very much about not blocking; some people want Python to be useful during the boot sequences; some people care very much about security above all else; some people don't trust application programmers. (And if you fit in that camp, congratulations: now you know how I often feel when I worry about user space programmers doing potentially crazy things and I have no way of even knowing about them until the security researchers publish a web site such as http://www.factorable.net.)

From the OS's perspective, one of the problems is that it's very hard to know when you have actually achieved a securely initialized random number generator. Sure, we can say we've done this once we have accumulated at least 128 bits of entropy, but that raises the question of when you've collected a bit of entropy. There's no way to know for sure. On current systems, we assume that each interrupt gathers 1/64th of a bit of entropy on average. This is an incredibly conservative number, and on real hardware, assuming the normal bootup activity, we achieve that within about 5 seconds (plus/minus 2 seconds) after boot. On Intel, on real hardware, I'm comfortable cutting this to 1 bit of entropy per interrupt, which will speed up things considerably.
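The accounting above can be worked through as a short calculation. This is only an illustration of the arithmetic using the two per-interrupt estimates and the 128-bit threshold quoted in the comment; the helper name interrupts_needed is made up for the example and is not kernel code.

```python
from fractions import Fraction

TARGET_BITS = 128  # threshold quoted in the comment

def interrupts_needed(bits_per_interrupt):
    """Interrupts required before the entropy estimate reaches TARGET_BITS."""
    return TARGET_BITS / Fraction(bits_per_interrupt)

conservative = interrupts_needed(Fraction(1, 64))  # 1/64 bit per interrupt
optimistic = interrupts_needed(1)                  # 1 bit per interrupt

print(conservative, optimistic)  # 8192 128
```

So the conservative estimate needs 64 times as many interrupts as the 1-bit estimate. Taking the 5-second figure at face value, 8192 interrupts corresponds to something on the order of 1,600 interrupts per second during boot.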
In an ARM SOC, or if you are on a VM and you don't trust the hypervisor so you don't use virtio-rng, is one bit of entropy per interrupt going to be good enough? It's hard to say.

On the other hand, if we use too conservative a number, there is a risk that userspace programmers will (as some have advocated in the discussion on this bug) simply always use GRND_NONBLOCK, or fall back to /dev/urandom, and then if there's a security exposure, they'll cast the blame on the OS developers. The reality is that we really need to work together, because the real problem is the clueless people writing python scripts at boot time to create long-term RSA private keys for IOT devices[1]. :-)

So when people assert that developers at FreeBSD are at work trying to speed up /dev/random initialization, folks need to understand that there's no magic here. What's really happening is that we're all trying to figure out which defaults work the best. In some ways the FreeBSD folks have it easier, because they support a much narrower range of platforms. It's a lot easier to get things right on x86, where we have instructions like RDTSC and RDRAND to help us out. It's a lot harder to be sure you have things right for ARM SOCs.

There are other techniques such as trying to carry entropy over from previous boot sessions, but (a) this requires support from the boot loaders, and on an OS with a large number of architectures, that means adding support to a large number of different ways of booting the kernel --- and (b) it doesn't solve the "consumer device generating keys after a cold start when the device is freshly removed from the packaging" problem.

As far as adding knobs, such as "blocking vs non-blocking", etc., keep in mind that as you add knobs, you increase the knowledge of the system that you force onto the next layer of the stack. So this goes to the question of whether you trust that application programmers will be able to get things right.

So Ted, why does Linux expose /dev/random vs /dev/urandom?
Historical reasons; some people don't believe that relying on cryptographic random number generators is sufficient, they *want* to use entropy which has minimal reliance on the belief that NSA ***probably*** didn't leave a back door into SHA-1, for example. It is for that reason that /dev/random exists. These days, the number of people who believe that to be true is very small, but I didn't want to make changes in existing interfaces. For similar reasons I didn't want to suddenly make /dev/urandom block.

The fact that getrandom(2) blocks only until the cryptographic RNG has been initialized, and that it depends on a cryptographic RNG, is the consensus that *most* people have come to, and it reflects my recommendations that unless you ***really*** kno
[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
Theodore Tso added the comment:

I ran the experiment Colm asked me to run --- and yes, if you boot a system with Python 3.5.1 with the boot options "init=/usr/bin/python3", you're going to have a bad time.

The problem is that in a KVM environment where things are very quiet, especially if you are using a tickless kernel, if python calls getrandom(2), it will block since the entropy pool hasn't been initialized yet. But since we aren't doing anything, the system becomes completely quiescent and so no entropy can be collected.

If systemd tries to run a python script very early in the boot process, and does this in a way where no further boot time activity takes place until the python script exits, you can indeed deadlock the system.

The solution is, I think, what Donald suggested in msg267746, which is to use GRND_NONBLOCK for initializing the hash which gets used for the dict, or whatever it's used for. My understanding is that this is not a long-term cryptographic secret, and indeed it will be thrown away as soon as the python interpreter exits. Since this is before networking has been brought up, the denial-of-service attack, or whatever it is that requires you to use a strong SipHash for your Python dictionaries, shouldn't be a problem. (Which I gather has something to do with this? https://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf)

Now, I can see people being concerned that if Python *always* initializes its hash dictionaries using getrandom with GRND_NONBLOCK, it might be opening up a DoS attack. Well, in practice, once the boot sequence continues and the system is actually doing some real work, the random number generator will be initialized within a few seconds, so it won't be an issue once the system has booted.
If you want to be really paranoid, I suppose you could add some kind of command-line flag which tells Python to use GRND_NONBLOCK for the purposes of initializing its hash table for its dictionary, and only use it in the boot path. In practice, I suspect very early in the systemd boot path, before it actually starts running the boot scripts in parallel, is the only place where you are likely to run into this problem, so making it a flag that only systemd scripts have to use is probably the right thing to do. But I'll let someone else have the joys of negotiating with Lennart, and I won't blame the Python devs if using GRND_NONBLOCK unconditionally is less painful than having to work with the systemd folks. :-)
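A minimal sketch of the approach described in this comment, written in Python rather than in the interpreter's C startup code: ask getrandom(2) for the hash seed with GRND_NONBLOCK, and fall back to a weak seed if the pool is not yet initialized. The function name hash_seed and the time/pid fallback are assumptions made for illustration and are not what CPython itself does; os.getrandom and os.GRND_NONBLOCK are real but only exist on Linux with Python 3.6 or later, hence the getattr guard.

```python
import os
import random
import time

def hash_seed(n=16):
    """Return n bytes for a short-lived hash seed without ever blocking."""
    getrandom = getattr(os, "getrandom", None)  # Linux only, Python 3.6+
    if getrandom is not None:
        try:
            return getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Entropy pool not initialized yet (very early boot).
            pass
        except OSError:
            # For example, the syscall is missing on kernels before 3.17.
            pass
    # Weak fallback (an assumption of this sketch). Tolerable only because
    # the seed is not a long-term secret and dies with the process.
    weak = random.Random(time.time_ns() ^ os.getpid())
    return weak.getrandbits(8 * n).to_bytes(n, "little")

if __name__ == "__main__":
    print(hash_seed().hex())
```

The fallback branch is exactly the part that must never be reused for key generation; code that creates long-lived secrets should call the blocking form instead.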
[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
Theodore Tso added the comment:

Hi. Colm alerted me to this bug, so I thought I would chime in as the author of Linux's getrandom(2) function.

First of all, if you are OK with reading from /dev/urandom, then you might as well use getrandom's GRND_NONBLOCK flag. They are logically equivalent.

Secondly, when I decided to add this behavior to getrandom(2), it was because people were really worried that people would be using /dev/urandom for security-critical things (e.g., initializing ssh host session keys, when they'd _really_ rather the NSA not be able to trivially pwn the server) before it had been completely initialized. (And if it is not completely initialized, it would be trivially and embarrassingly easy. See https://factorable.net/weakkeys12.extended.pdf for an example of where this was rather disastrous.)

Why didn't I make /dev/urandom blocking? Because a lot of people would whine and complain. But getrandom(2) was a new interface, and so this was something I could do.

Now, before I decided to do this, I did do some benchmarks, and pre-systemd in practice on real hardware (e.g., x86 servers and laptops), I observed that you would actually see a message indicating that we had gathered 128 bits of entropy long before the root file system had been mounted. With systemd, I observed that udevd was trying to read from /dev/urandom when we had only gathered an estimated 7 bits of entropy --- but I devoutly hoped that udevd wasn't doing anything super security critical, and trying to get the systemd people to change what they are doing is mostly like trying to teach a pig to sing, so I let it be. However, in practice the kernel printk indicating that the random driver had considered itself initialized came within a single-digit number of seconds, which was quickly enough that I figured it would be safe to do.
If people are claiming that they are seeing cases where it takes over 90 seconds for the random number generator to initialize itself, please contact me directly; I'd love to know more, because that's input I would very much like to have.

However, at the end of the day, on certain hardware, if you don't have a source of initial entropy because the system doesn't have enough real hardware with real sources of entropy --- or if you don't trust your friendly cloud provider to provide you with some entropy from the hypervisor's entropy pool via virtio-rng --- you can either (a) decide to pretend you are secure, when you really aren't, (b) wait, or (c) decide that you don't *really* need a secure source of randomness because you're really just initializing a hash for some associative array, and in fact srandom(time(0)) would have been fine, and you were using getrandom(2) or /dev/urandom just because you wanted to feel like one of the cool kids.

That being said, I do know of one potential issue: if you happen to be using Microsoft Azure, the way the virtualized interrupt works meant we weren't actually getting any entropy, and this was something I didn't discover until someone sent me a patch. I have a patch[1] queued up in the random.git tree for the next kernel merge window to address that issue for Microsoft Azure servers.

[1] http://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=8748971b4f5e322236154981827bf43dec4dc470

On a Google Compute Engine (GCE) system, I just did a quick test, and the "random: non-blocking pool initialized" message appears 5.64 seconds after the system is booted. The changes I have queued up in random.git should reduce that to under a second.

All of this is neither here nor there, though. The big question is *what* does Python expect to do with the randomness. If you are just using it for computational simulation, you can do whatever you want.
If you are using it to create long-lived secrets that are intended to be secure against the depredations of a Nation-State's intelligence service, and you are on a system which really has almost no entropy available to be collected, then falling back to reading from /dev/urandom or using GRND_NONBLOCK is going to be the equivalent of saying La-La-La-La-La-Nobody-Knows-How-Secure-I-Am while keeping your ears plugged. (Now, if you are on an Intel system with RDRAND, and you trust Intel not to have given a back door to the NSA, you probably are safe, because we do actually mix in RDRAND. On the other hand, if you are using some crappy ARM SOC for some Internet of Things device, and are firing up Python right after the system boots for the first time, and creating long-lived RSA private keys within milliseconds after the system is first booted --- please tell me, so I can avoid your product like the Plague. :-) -- nosy: +Theod