[issue27266] Always use getrandom() in os.random() on Linux and add block=False parameter to os.urandom()

2016-06-08 Thread Theodore Tso

Theodore Tso added the comment:

Larry, at least on FreeBSD, it sounds like the implementation could just check the 
kern.random.sys.seeded sysctl, and return .  (Note: what is the 
proposed behaviour if the PRNG is not seeded?  Return Null?)
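A minimal sketch of the FreeBSD check mentioned above. It assumes the `sysctl(8)` command-line tool and the `kern.random.sys.seeded` name from the comment; `freebsd_prng_seeded` is a hypothetical helper. It returns `None` when the value cannot be read (for example on Linux) and leaves the policy for an unseeded PRNG to the caller, since that is exactly the open question above.

```python
import subprocess
from typing import Optional


def freebsd_prng_seeded() -> Optional[bool]:
    """Read kern.random.sys.seeded via sysctl(8); None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "kern.random.sys.seeded"],
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # no sysctl binary on this system, or it timed out
    value = result.stdout.strip()
    if result.returncode != 0 or value not in ("0", "1"):
        return None  # unknown key or unexpected output, e.g. on Linux
    return value == "1"  # what to do when unseeded is the caller's decision
```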

As far as OpenBSD is concerned, it's true that its getentropy(2) never blocks. 
But that's because OpenBSD assumes that the bootloader can read a seeded 
entropy file from the previous boot session, and that the CD-ROM installer will 
be able to gather enough entropy to save an entropy file from the beginning of 
the installation. So if you don't have to worry about booting your OS on an ARC 
Internet of Things device, you can afford to make certain simplifying 
assumptions.

Could Linux on x86 do something similar (read the entropy file in the 
bootloader)?  Sure, the design isn't difficult.  If someone can fund the 
headcount to do the work, I'd be happy to help supervise the GSOC intern or 
work with some engineer at some other company who is interested in getting a 
change like this upstream.  But there will still be cases where getrandom(2) 
could block, simply because I can't control all of the hardware platforms where 
Linux might be ported to.   The real problem is that since on real hardware 
platforms it's only a problem in very early boot, it's hard to make a business 
case to invest in solving this problem.

--
nosy: +Theodore Tso

___
Python tracker 
<http://bugs.python.org/issue27266>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()

2016-06-08 Thread Theodore Tso

Theodore Tso added the comment:

Oh --- and about people wondering whether os.urandom is being used for 
cryptographic purposes "most of the time" or not --- again, welcome to 
my world.  I get complaints all the time from people who run "dd 
if=/dev/urandom of=/dev/hdX bs=4k" and then complain that it is too slow.

Creating an os.cryptorandom and os.pseudorandom might be a useful way to go 
here.  I've often considered whether I should create a /dev/frandom for the 
crazies who want to use dd as a way to wipe a disk, but to date I haven't 
thought it was worth the effort, and I didn't want to encourage them.  Besides, 
isn't the obvious right answer to create a quickie Python script?  :-)

Splitting os.urandom does raise the question of what os.urandom should do, however. 
If you go down that path, I'd suggest defaulting to the secure-but-slow choice.
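A minimal sketch of the split suggested above, with the unqualified name defaulting to the secure choice. `cryptorandom` and `pseudorandom` are the names floated in the comment, and `default_random` is a hypothetical addition; none of them exist in Python's `os` module. It assumes Linux with Python 3.6+ for `os.getrandom`, falling back to `os.urandom` elsewhere.

```python
import os
import random


def cryptorandom(n: int) -> bytes:
    """Secure but possibly slow: waits until the kernel CSPRNG has been seeded."""
    if not hasattr(os, "getrandom"):
        return os.urandom(n)  # other platforms: the OS CSPRNG
    out = bytearray()
    while len(out) < n:
        # flags=0: block only until the kernel pool is initialized, then never again.
        out += os.getrandom(n - len(out))
    return bytes(out)


_fast_prng = random.Random()  # Mersenne Twister: fast, NOT cryptographically secure


def pseudorandom(n: int) -> bytes:
    """Fast, non-cryptographic bytes (e.g. for overwriting a disk); never for secrets."""
    if n == 0:
        return b""
    return _fast_prng.getrandbits(8 * n).to_bytes(n, "little")


# Following the suggestion above, the unqualified name defaults to the secure choice.
default_random = cryptorandom
```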

I'd also suggest that it is completely fair to put the onus on the people who 
are trying to run Python scripts during early boot to either add some 
command-line flags to the Python interpreter or otherwise make adjustments.  
But again, that's my bias, and if people don't want to deal with trying to ask 
the systemd folks to make a change in their code, I'd _completely_ understand.

My design preference is that outside of boot scripts, having os.urandom block in 
the name of security is completely fair, since in that case you won't deadlock 
the system.  People of good will may disagree, of course, and I'm not on the 
Python development team, so take that with whatever grain of salt you wish.   
At the end of the day, this is all about tradeoffs, and you know your 
customer/developer base better than I do.

Cheers!

--

___
Python tracker 
<http://bugs.python.org/issue26839>
___



[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()

2016-06-08 Thread Theodore Tso

Theodore Tso added the comment:

One of the reasons why dealing with randomness is hard is that a lot 
of it is about trust.  Did Intel backdoor RDRAND to help out the NSA?   You 
might have one answer if you work for the NSA, and perhaps another if you are 
willing to assume the worst about how the NSA balances its equities between its 
signals intelligence mission, providing a secure infrastructure for its 
customers, and keeping the US computing industry strong.   Etc., etc.

It is true that OS developers are trying to make their random number generators 
be initialized more quickly at boot time.  Part of this is because of the 
dynamic which we can all see at work in the discussion of this bug.  Some 
people care very much about not blocking; some people want Python to be 
useful during the boot sequence; some people care very much about security 
above all else; some people don't trust application programmers.  (And if you 
fit in that camp, congratulations: now you know how I often feel when I worry 
about user space programmers doing potentially crazy things that I have no way 
of even knowing about until the security researchers publish a web site 
such as http://www.factorable.net.)

From the OS's perspective, one of the problems is that it's very hard to know 
when you have actually achieved a securely initialized random number 
generator.  Sure, we can say we've done this once we have accumulated at least 
128 bits of entropy, but that raises the question of when you've collected a bit 
of entropy.  There's no way to know for sure.  On current systems, we assume 
that each interrupt gathers 1/64th of a bit of entropy on average.  This is an 
incredibly conservative number, and on real hardware, assuming normal 
bootup activity, we reach 128 bits within about 5 seconds (plus/minus 2 seconds) 
after boot.   On Intel, on real hardware, I'm comfortable raising the estimate to 
1 bit of entropy per interrupt, which will speed things up considerably.  On an 
ARM SoC, or if you are on a VM and you don't trust the hypervisor so you don't 
use virtio-rng, is one bit of entropy per interrupt going to be good enough?  
It's hard to say.
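A quick worked version of those numbers, using only the figures in the comment (a 128-bit threshold, and 1/64 bit or 1 bit credited per interrupt). It shows that the proposed x86 change cuts the number of interrupts needed by a factor of 64; `interrupts_needed` is just an illustrative helper.

```python
import math
from fractions import Fraction

SEEDED_BITS = 128  # entropy estimate at which the pool is treated as initialized


def interrupts_needed(bits_per_interrupt: Fraction, target_bits: int = SEEDED_BITS) -> int:
    """Interrupts required before the entropy estimate reaches target_bits."""
    # Exact rational arithmetic avoids floating-point rounding in the ceiling.
    return math.ceil(Fraction(target_bits) / bits_per_interrupt)


conservative = interrupts_needed(Fraction(1, 64))  # current estimate: 1/64 bit each
relaxed = interrupts_needed(Fraction(1))           # proposed on x86: 1 bit each
print(conservative, relaxed, conservative // relaxed)  # 8192 128 64
```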

On the other hand, if we use too conservative a number, there is a risk that 
userspace programmers will (as some have advocated in the discussion on this 
bug) simply always use GRND_NONBLOCK, or fall back to /dev/urandom, and then, 
if there's a security exposure, they'll cast the blame on the OS developers.  
The reality is that we really need to work together, because the real problem 
is the clueless people writing Python scripts at boot time to create long-term 
RSA private keys for IoT devices[1].  :-)

So when people assert that developers at FreeBSD are at work trying to speed up 
/dev/random initialization, folks need to understand that there's no magic 
here.  What's really happening is that we're all trying to figure out which 
defaults work best.  In some ways the FreeBSD folks have it easier, because 
they support a much narrower range of platforms.  It's a lot easier to get 
things right on x86, where we have instructions like RDTSC and RDRAND to help us 
out.  It's a lot harder to be sure you have things right for ARM SoCs.   There 
are other techniques, such as trying to carry entropy over from previous boot 
sessions, but (a) this requires support from the boot loaders, and on an OS 
with a large number of architectures, that means adding support to a large 
number of different ways of booting the kernel, and (b) it doesn't solve the 
"consumer device generating keys after a cold start when the device is freshly 
removed from the packaging" case.
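The comment describes carrying entropy across boots through the boot loader. The sketch below is a simpler userspace approximation of the same idea, an assumption rather than the design described: save random bytes for the next boot, then mix them into /dev/urandom early in that boot. On Linux, writing to /dev/urandom mixes data into the pool without crediting any entropy, so this alone does not unblock getrandom(2). `save_seed`, `load_seed`, and the 512-byte size are hypothetical.

```python
import os
from pathlib import Path

SEED_BYTES = 512  # hypothetical seed size


def save_seed(seed_path: Path, n: int = SEED_BYTES) -> None:
    """Save fresh random bytes for the next boot (owner-readable only)."""
    data = os.urandom(n)
    tmp = seed_path.with_suffix(".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the seed survives a crash
    os.replace(tmp, seed_path)  # atomic rename over the old seed


def load_seed(seed_path: Path, pool_path: str = "/dev/urandom") -> int:
    """Early in boot, mix the saved seed into the pool, then refresh the file."""
    data = seed_path.read_bytes()
    with open(pool_path, "wb") as pool:
        pool.write(data)  # on Linux this mixes data in but does NOT credit entropy
    save_seed(seed_path)  # never reuse the same seed on the next boot
    return len(data)
```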

As far as adding knobs, such as "blocking vs. non-blocking", keep in mind 
that as you add knobs, you increase the knowledge of the system that you force 
onto the next layer of the stack.  So this comes down to whether you 
trust application programmers to get things right.
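The issue title proposes a `block=False` parameter for os.urandom(). The sketch below is a hypothetical `urandom_with_block` built on Linux's `os.getrandom` (Python 3.6+); it is not an existing API. The `session_key` usage function, also hypothetical, shows the cost described above: every caller that turns the knob off must now decide what to do with `BlockingIOError`.

```python
import os


def urandom_with_block(n: int, block: bool = True) -> bytes:
    """Hypothetical os.urandom variant with a blocking knob (Linux getrandom(2)).

    block=True waits until the kernel CSPRNG is initialized; block=False raises
    BlockingIOError instead of waiting.
    """
    if not hasattr(os, "getrandom"):
        return os.urandom(n)  # no getrandom(2) here, so the knob has no effect
    flags = 0 if block else os.GRND_NONBLOCK
    out = bytearray()
    while len(out) < n:
        out += os.getrandom(n - len(out), flags)  # loop in case of a short read
    return bytes(out)


def session_key() -> bytes:
    # The knob pushes a decision onto every caller: what if the pool isn't ready?
    try:
        return urandom_with_block(32, block=False)
    except BlockingIOError:
        # Hypothetical policy: this key must be secure, so wait after all.
        return urandom_with_block(32, block=True)
```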

So Ted, why does Linux expose /dev/random vs /dev/urandom?  Historical reasons; 
some people don't believe that relying on cryptographic random number generators 
is sufficient; they *want* to use entropy which has minimal reliance on the 
belief that NSA ***probably*** didn't leave a back door into SHA-1, for 
example.  It is for that reason that /dev/random exists.  These days, the 
number of people who believe that to be true is very small, but I didn't want 
to make changes in existing interfaces.  For similar reasons I didn't want to 
suddenly make /dev/urandom block.   The fact that getrandom(2) blocks only 
until the cryptographic RNG has been initialized, and that it depends on a 
cryptographic RNG, is the consensus that *most* people have come to, and it 
reflects my recommendations that unless you ***really*** kno

[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()

2016-06-07 Thread Theodore Tso

Theodore Tso added the comment:

I ran the experiment Colm asked me to run --- and yes, if you boot a system 
with Python 3.5.1 with the boot option "init=/usr/bin/python3", you're going 
to have a bad time.   The problem is that in a KVM environment where things are 
very quiet, especially if you are using a tickless kernel, if Python calls 
getrandom(2), it will block since the entropy pool hasn't been initialized yet. 
But since nothing else is happening, the system becomes completely quiescent 
and so no entropy can be collected.  If systemd tries to run a Python script 
very early in the boot process, and does this in a way where no further boot 
time activity takes place until the Python script exits, you can indeed 
deadlock the system.

The solution is, I think, what Donald suggested in msg267746, which is to use 
GRND_NONBLOCK for initializing the hash which gets used for the dict, or 
whatever it's used for.   My understanding is that this is not a long-term 
cryptographic secret, and indeed it will be thrown away as soon as the Python 
interpreter exits.  Since this is before networking has been brought up, the 
denial-of-service attack (or whatever it is) that requires you to use a strong 
SipHash for your Python dictionaries shouldn't be a problem.   (Which I gather 
has something to do with this?   
https://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf)
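A minimal sketch of the approach attributed to Donald above: seed a per-process hash key without ever blocking. `hash_secret` is a hypothetical helper, not CPython's actual startup code, and the 16-byte default is an assumption. It assumes Linux with Python 3.6+ for `os.getrandom`, and falls back to `os.urandom` elsewhere.

```python
import os


def hash_secret(n: int = 16) -> bytes:
    """Hypothetical per-process seed for a dict/SipHash key; not a long-term secret.

    Never blocks: tries getrandom(2) with GRND_NONBLOCK and, if the kernel pool is
    not yet initialized (very early boot), falls back to /dev/urandom.
    """
    if not 0 < n <= 256:
        raise ValueError("n must be between 1 and 256")
    if not hasattr(os, "getrandom"):
        return os.urandom(n)  # non-Linux: the platform CSPRNG
    try:
        # Requests of up to 256 bytes from the urandom source are returned in full.
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        pass  # EAGAIN: pool not initialized yet; do not wait for it
    with open("/dev/urandom", "rb") as f:
        return f.read(n)  # /dev/urandom never blocks, even before initialization
```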

Now, I can see people being concerned that if Python *always* initializes its 
dictionary hash seed using getrandom with GRND_NONBLOCK, it might be opening up 
a DoS attack.   Well, once the boot sequence continues and the system is 
actually doing some real work, the random number generator will be initialized 
within a few seconds, so in practice it won't be an issue once the system has 
booted.

If you want to be really paranoid, I suppose you could add some kind of 
command-line flag which tells Python to use GRND_NONBLOCK for the purposes of 
initializing the hash seed for its dictionaries, and only use it in the boot 
path.   In practice, I suspect very early in the systemd boot path, before it 
actually starts running the boot scripts in parallel, is the only place where 
you are likely to run into this problem, so making it a flag that only 
systemd scripts have to pass is probably the right thing to do.   But I'll let 
someone else have the joys of negotiating with Lennart, and I won't blame the 
Python devs if using GRND_NONBLOCK unconditionally is less painful than having 
to work with the systemd folks.  :-)
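A sketch of the paranoid option described above, assuming a hypothetical `--early-boot` switch; CPython has no such flag, and both the option name and `getrandom_flags` are illustrative. The result would be passed as the flags argument, e.g. `os.getrandom(16, getrandom_flags())`, so only units that opt in give up blocking.

```python
import argparse
import os


def getrandom_flags(argv=None) -> int:
    """Map a hypothetical --early-boot option to getrandom(2) flags for hash seeding."""
    parser = argparse.ArgumentParser(description="early-boot-safe startup (sketch)")
    parser.add_argument(
        "--early-boot",
        action="store_true",
        help="never block on the kernel RNG; meant only for early systemd units",
    )
    args, _unknown = parser.parse_known_args(argv)  # ignore unrelated arguments
    if args.early_boot and hasattr(os, "GRND_NONBLOCK"):
        return os.GRND_NONBLOCK
    return 0  # default: block until the pool is initialized
```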

--

___
Python tracker 
<http://bugs.python.org/issue26839>
___



[issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()

2016-06-07 Thread Theodore Tso

Theodore Tso added the comment:

Hi.   Colm alerted me to this bug, so I thought I would chime in as the author 
of Linux's getrandom(2) function.

First of all, if you are OK with reading from /dev/urandom, then you might as 
well use getrandom's GRND_NONBLOCK flag.  They are logically equivalent.
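Because getrandom(2) with GRND_NONBLOCK fails with EAGAIN instead of waiting when the pool is not yet initialized, it can also serve as a cheap readiness probe; Python surfaces EAGAIN as `BlockingIOError`. A minimal sketch; `crng_initialized` is a hypothetical helper and assumes Linux with Python 3.6+ (`os.getrandom`), returning `None` elsewhere.

```python
import os
from typing import Optional


def crng_initialized() -> Optional[bool]:
    """True if getrandom(2) would return data without blocking right now."""
    if not hasattr(os, "getrandom"):
        return None  # non-Linux or Python < 3.6: cannot tell
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False  # EAGAIN: the pool has not been initialized yet
    return True


if __name__ == "__main__":
    print("CRNG initialized:", crng_initialized())
```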

Secondly, when I decided to add this behavior to getrandom(2), it was because 
people were really worried that programs would be using /dev/urandom for 
security-critical things (e.g., initializing ssh host session keys, when they'd 
_really_ rather the NSA not be able to trivially pwn the server) before it 
had been completely initialized.   (And if it is not completely initialized, 
pwning the server would be trivially and embarrassingly easy.  See 
https://factorable.net/weakkeys12.extended.pdf for an example of where this was 
rather disastrous.)

Why didn't I make /dev/urandom blocking?  Because a lot of people would whine 
and complain.   But getrandom(2) was a new interface, and so this was something 
I could do.   Now, before I decided to do this, I did do some benchmarks, and 
pre-systemd, in practice on real hardware (e.g., x86 servers and laptops), I 
observed that you would actually see a message indicating that we had gathered 
128 bits of entropy long before the root file system had been mounted.  With 
systemd, I observed that udevd was trying to read from /dev/urandom when we had 
only gathered an estimated 7 bits of entropy --- but I devoutly hoped that 
udevd wasn't doing anything super security critical, and trying to get the 
systemd people to change what they are doing is mostly like trying to teach a 
pig to sing, so I let it be.  However, in practice the kernel printk indicating 
that the random driver considered itself initialized appeared within a 
single-digit number of seconds, quickly enough that I figured it would 
be safe to do.

If people are claiming that they are seeing cases where it takes over 90 
seconds for the random number generator to initialize itself, please contact me 
directly; I'd love to know more, because that's input I would very much like to 
have.

However, at the end of the day, on certain hardware, if you don't have a source 
of initial entropy because the system doesn't have enough real hardware with 
real sources of entropy --- or if you don't trust your friendly cloud provider 
to provide you with some entropy from the hypervisor's entropy pool via 
virtio-rng --- you can either (a) decide to pretend you are secure, when you 
really aren't, (b) wait, or (c) decide that you don't *really* need a secure 
source of randomness because you're really just initializing a hash for some 
associative array, and in fact srandom(time(0)) would have been fine, and you 
were using getrandom(2) or /dev/urandom just because you wanted to feel like 
one of the cool kids.

That being said, I do know of one potential issue: if you happen to 
be using Microsoft Azure, the way the virtualized interrupt works, we weren't 
actually getting any entropy, and this was something I didn't discover until 
someone sent me a patch.  I have a patch[1] queued up in the random.git tree 
for the next kernel merge window to address that issue for Microsoft Azure 
servers. 

[1] 
http://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=8748971b4f5e322236154981827bf43dec4dc470

On a Google Compute Engine (GCE) system, I just did a quick test, and the 
"random: non-blocking pool initialized" message appears 5.64 seconds after the 
system is booted.  The changes I have queued up in random.git should reduce 
that to under a second.

All of this is neither here nor there, though.  The big question is *what* 
Python expects to do with the randomness.  If you are just using it for 
computational simulation, you can do whatever you want.   If you are using it 
to create long-lived secrets that are intended to be secure against the 
depredations of a Nation-State's intelligence service, and you are on a system 
which really has almost no entropy available to be collected, then falling back 
to reading from /dev/urandom or using GRND_NONBLOCK is going to be the 
equivalent of saying La-La-La-La-La-Nobody-Knows-How-Secure-I-Am while keeping 
your ears plugged.  (Now, if you are on an Intel system with RDRAND, and you 
trust Intel not to have given a back door to the NSA, you are probably safe, 
because we do actually mix in RDRAND.)  On the other hand, if you are using some 
crappy ARM SoC for some Internet of Things device, and are firing up Python 
right after the system boots for the first time, and creating long-lived RSA 
private keys within milliseconds after the system is first booted --- please 
tell me so, so I can avoid your product like the Plague.  :-)
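Whether RDRAND is even present can be checked from userspace. This sketch assumes the Linux /proc/cpuinfo format, where x86 kernels list CPU features on a `flags` line; it says nothing about whether to trust the instruction, which is the real question raised above. `cpu_has_rdrand` is a hypothetical helper.

```python
from typing import Optional


def cpu_has_rdrand(cpuinfo_path: str = "/proc/cpuinfo") -> Optional[bool]:
    """Report whether the 'rdrand' CPU flag is listed; None if cpuinfo is unreadable."""
    try:
        with open(cpuinfo_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                # x86 kernels list CPU features on lines starting with "flags".
                if line.startswith("flags"):
                    _, _, flags = line.partition(":")
                    return "rdrand" in flags.split()
    except OSError:
        return None
    return False  # no "flags" line at all, e.g. ARM, which reports "Features"
```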

--
nosy: +Theod