Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
On Wed, Mar 31, 2021 at 09:58:48PM -0400, Thor Lancelot Simon wrote:
> On Wed, Mar 31, 2021 at 11:24:07AM +0200, Manuel Bouyer wrote:
> > On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > >
> > > There are no virtual RNG devices on the system in question, according
> > > to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> > > taught to expose a virtio-rng device to the guest?
> >
> > There is no such thing in Xen.
>
> Is the CPU so old that it doesn't have RDRAND / RDSEED, or is Xen perhaps
> masking these CPU features from the guest?

Is there an easy way to test, on a netbsd-9 system, if the instruction is
present and working?

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
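One possible way to approach the question above is to look at the feature names the guest sees. NetBSD's cpuctl(8) has an `identify` command that prints CPU feature names; the Python sketch below scans such text for RDRAND and RDSEED. The exact output layout (feature names listed between angle brackets) is an assumption here, the SAMPLE text is hypothetical and was not captured from any system in this thread, and the function names are invented for illustration. A name showing up only says the feature is advertised to the guest, not that the instruction returns good data.

```python
import re


def advertised_features(identify_text):
    """Collect the feature names found inside <...> groups of the given text."""
    names = set()
    for group in re.findall(r"<([^<>]*)>", identify_text):
        names.update(tok.strip() for tok in group.split(",") if tok.strip())
    return names


def hardware_rng_report(identify_text):
    """Report whether RDRAND and RDSEED appear among the advertised features."""
    feats = advertised_features(identify_text)
    return {name: name in feats for name in ("RDRAND", "RDSEED")}


# Hypothetical input for illustration only (assumed layout, made-up values).
SAMPLE = """\
cpu0: features1 0x0<AVX,F16C,RDRAND>
cpu0: features5 0x0<ERMS,RDSEED,ADX,SMAP>
"""

if __name__ == "__main__":
    print(hardware_rng_report(SAMPLE))
```

On a real system the text would come from something like `cpuctl identify 0`, saved to a file and passed to hardware_rng_report().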
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
On Wed, Mar 31, 2021 at 11:24:07AM +0200, Manuel Bouyer wrote:
> On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> >
> > There are no virtual RNG devices on the system in question, according
> > to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> > taught to expose a virtio-rng device to the guest?
>
> There is no such thing in Xen.

Is the CPU so old that it doesn't have RDRAND / RDSEED, or is Xen perhaps
masking these CPU features from the guest?

Thor
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
> On Wed, Mar 31, 2021 at 12:12:31AM +, Taylor R Campbell wrote:
>> This is false.  If the VM host provided a viornd(4) device then NetBSD
>> would automatically collect, and count, entropy from the host, with no
>> manual intervention.
>
> I would love to see instructions how to do this - I have not seen a working
> viornd(4) in any of my Xen domU (but that is a very limited sample).

This isn't with Xen, and it isn't on -current, but this is what I do for
my emulated arm64 system, where the emulator runs on NetBSD/amd64 8.0:

    #!/bin/sh

    SMP=4
    MEM=8g

    qemu-system-aarch64 -M virt -cpu cortex-a57 -smp $SMP -m $MEM \
        -drive if=none,file=disk.img,id=hd0 -device virtio-blk-device,drive=hd0 \
        -netdev type=user,id=net0,hostfwd=tcp::-:22,ipv6=off \
        -nographic \
        -device virtio-net-device,netdev=net0,mac=00:11:66:33:44:55 \
        -device virtio-rng-device \
        -kernel netbsd.img -append root=ld4a

and the booted system is NetBSD/aarch64 9.0 with the unmodified GENERIC64
kernel:

    arm64# rndctl -l
    Source                 Bits Type Flags
    cpu3                   7824 vm   estimate, collect, v, t, dv
    cpu2                   8983 vm   estimate, collect, v, t, dv
    cpu1                   8351 vm   estimate, collect, v, t, dv
    cpu0                  12436 vm   estimate, collect, v, t, dv
    ld4                 8440476 disk estimate, collect, v, t, dt
    viornd0                4096 rng  estimate, collect, v
    system-power              0 power estimate, collect, v, t, dt
    autoconf                 72 ???  estimate, collect, t, dt
    printf                    0 ???  collect
    callout                 116 skew estimate, collect, v, dv
    arm64#
    arm64# dmesg | grep rnd
    [ 1.10] viornd0 at virtio29: Features: 0x1000
    arm64#
    arm64# dmesg | grep virtio29
    [ 1.10] virtio29 at simplebus0
    [ 1.10] viornd0 at virtio29: Features: 0x1000
    [ 1.10] virtio29: allocated 32768 byte for virtqueue 0 for Entropy request, size 1024
    [ 1.10] virtio29: interrupting on GIC irq 77
    arm64#

When I get to booting a post-rng-rework kernel, I'm fairly certain that
only the input from viornd0 will remain as a source with "estimate" in
the flags field.  Of course, any saved and restored entropy will also
count towards the estimate.
That said, it doesn't look like the amd64 XEN3_DOMU kernel has either
virtio* or viornd* configured; they're only in the GENERIC and ALL kernel
configs.  Also, I don't know what has to happen on the Xen "host side" to
provide those devices; virtio* is apparently supposed to be made visible
via the PCI bus (looking at amd64's GENERIC), but by the looks of it, Xen
only does "pci passthrough" to physical devices (looking at the comments
near the commented-out "pci" config statements in XEN3_DOMU), so is there
no "emulated" PCI bus where the host can provide the host side of the
virtual randomness device?

Regards,

- Håvard
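To make listings like the `rndctl -l` output in this thread easier to compare, here is a minimal Python sketch that parses such a listing and names the sources that have been credited with any bits. It assumes each row is "name, bits, type, comma-separated flags" separated by whitespace, as in the listings quoted in this thread; the function names are invented for illustration and SAMPLE is an excerpt of the arm64 guest listing above.

```python
def parse_rndctl_list(text):
    """Parse `rndctl -l` style text into a list of dicts, skipping non-data lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # header, prompt, blank line, or something unparseable
        flags = []
        if len(fields) == 4:
            flags = [f.strip() for f in fields[3].split(",") if f.strip()]
        rows.append({
            "source": fields[0],
            "bits": int(fields[1]),
            "type": fields[2],
            "flags": flags,
        })
    return rows


def credited_sources(rows):
    """Names of the sources whose Bits column is greater than zero."""
    return [r["source"] for r in rows if r["bits"] > 0]


# Excerpt of the listing quoted in this thread.
SAMPLE = """\
Source Bits Type Flags
cpu0 12436 vm estimate, collect, v, t, dv
viornd0 4096 rng estimate, collect, v
system-power 0 power estimate, collect, v, t, dt
printf 0 ??? collect
"""

if __name__ == "__main__":
    print(credited_sources(parse_rndctl_list(SAMPLE)))
```

Run against the Xen domU listing from the original report, where every device row shows 0 bits, credited_sources() would return only the "seed" entry.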
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > Date: Tue, 30 Mar 2021 23:53:43 +0200
> > From: Manuel Bouyer
> >
> > On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > > [...]
> > >
> > > Perhaps the answer is that nothing seems to be contributing anything to
> > > the entropy pool.  No matter what device I exercise, none of the numbers
> > > in the following changes:
> >
> > yes, it's been this way since the rnd rototill. Virtual devices are
> > not trusted.
> >
> > The only way is to manually seed the pool.
>
> This is false.  The virtual RNG drivers (viornd(4) [1], rump
> hyperentropy [2], maybe others) all assume the VM host provides
> samples with full entropy.  This has always been the case, and this
> didn't change at all in the rototill last year.
>
> There are no virtual RNG devices on the system in question, according
> to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> taught to expose a virtio-rng device to the guest?

There is no such thing in Xen.

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
On Wed, Mar 31, 2021 at 12:12:31AM +, Taylor R Campbell wrote:
> This is false.  If the VM host provided a viornd(4) device then NetBSD
> would automatically collect, and count, entropy from the host, with no
> manual intervention.

I would love to see instructions how to do this - I have not seen a working
viornd(4) in any of my Xen domU (but that is a very limited sample).

Martin
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
[[ sorry I've not been catching up on mailing list discussions as fast
as I had hoped to, and I'm way behind on following the entropy rototill. ]]

At Wed, 31 Mar 2021 00:12:31 +, Taylor R Campbell wrote:
Subject: Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
>
> This is false.  If the VM host provided a viornd(4) device then NetBSD
> would automatically collect, and count, entropy from the host, with no
> manual intervention.

I'll leave that idea to others more up-to-date on Xen PV drivers to
respond to.

Booting a -current GENERIC kernel (which has both Xen PV and virtio(4)
devices configured into it) in a "type='pvh'" domU only attaches the
xenbus PV devices, no virtio devices, so adding virtio might be a much
bigger task that will need further support on at least the backend, and
perhaps on the front-end too, especially to do it without QEMU.

I haven't tried whether virtio devices show up in an HVM domU, precisely
because I'm trying to avoid having to run and rely on QEMU (never mind
any performance implications of HVM).

> > Finally, if the system isn't actually collecting entropy from a device,
> > then why the heck does it allow me to think it is (i.e. by allowing me
> > to enable it and show it as enabled and collecting via "rndctl -l")?
>
> The system does collect samples from all those devices.  However, they
> are not designed to be unpredictable and there is no good reliable
> model for just how unpredictable they are, so the system doesn't
> _count_ anything from them.  See https://man.NetBSD.org/entropy.4 for
> a high-level overview.

I'm not sure the word "count" appears in entropy(4) in any context I can
make sense of w.r.t. what it means to "collect" but not "count" entropy
from those devices.

Worse, the "Flags" shown by "rndctl -l" don't seem to be directly
documented (i.e. they're not described in rndctl(8)), and even on a
kernel running on real hardware I don't see the word "count" showing
there.  After looking at the source I'm not sure the descriptions of the
RND_FLAG_* values in rnd(4) help me much either.

Based on my vague understanding of all of this, perhaps you meant to say
"estimate", instead of "count"?  That would make more sense in the
context of what I read in rnd(4) and rndctl(8), though "estimate" still
seems a little vague in meaning to me.

In any case, I don't see why an xbd disk, or a xennet interface, can't
be treated exactly as if they were real hardware (i.e. in terms of
extracting entropy from their behaviour).  This is exactly what
virtualization is all about to me -- even for paravirtualization.  After
all, in a threat-free world (i.e. specifically where I also trust other
domUs) their entropy is going to reflect (though maybe not exactly
mirror) the entropy of the underlying hardware and/or network traffic.

So (but maybe not by default) if I as the admin want to trust the
entropy available from an xbd(4) or xennet(4) device, then I should be
able to enable it with rndctl(8) and have it "count".

More importantly though, the system shouldn't mislead me into thinking it
is "counting" entropy from a device when it is actually not.  If I had
seen that there were no sources estimating/counting/whatever entropy,
and I tried to enable one and was given a nice error message about this
not being possible, then I would have looked elsewhere to find out how
to give the system more bits of entropy.  As it is, in my Xen domU system
the output of "rndctl -l" leads me to believe all of my devices are
collecting both timing and value samples, and using either one or the
other to gather entropy (though with '-v' I don't see that any bits of
entropy have been added from any of those many millions of collected
samples).

--
Greg A. Woods

Kelowna, BC     +1 250 762-7675     RoboHack
Planix, Inc.    Avoncote Farms
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
> Date: Tue, 30 Mar 2021 16:23:43 -0700
> From: "Greg A. Woods"
>
> At Tue, 30 Mar 2021 23:53:43 +0200, Manuel Bouyer wrote:
> > On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > > Perhaps the answer is that nothing seems to be contributing anything to
> > > the entropy pool.  No matter what device I exercise, none of the numbers
> > > in the following changes:
> >
> > yes, it's been this way since the rnd rototill. Virtual devices are
> > not trusted.
> >
> > The only way is to manually seed the pool.
>
> Ah, so that is definitely not what I expected!

This is false.  If the VM host provided a viornd(4) device then NetBSD
would automatically collect, and count, entropy from the host, with no
manual intervention.

> Finally, if the system isn't actually collecting entropy from a device,
> then why the heck does it allow me to think it is (i.e. by allowing me
> to enable it and show it as enabled and collecting via "rndctl -l")?

The system does collect samples from all those devices.  However, they
are not designed to be unpredictable and there is no good reliable
model for just how unpredictable they are, so the system doesn't
_count_ anything from them.  See https://man.NetBSD.org/entropy.4 for
a high-level overview.

In the past we used an essentially meaningless model, designed in a
vacuum without reference to any information about the physics of the
sources of the samples (and the same model with all sources), for
fabricating entropy estimates by examining the sample data.  This
practice no longer happens.
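The distinction between collecting samples and counting entropy can be illustrated with a toy model. The Python sketch below is not the NetBSD kernel's implementation; the class, its fields, and the trusted flag are all invented for illustration, and the 256-bit target is an assumption borrowed from the "max 256" figure in the `rndctl -s` output quoted in this thread. Every sample is mixed into the pool state, but only samples from a source with a credible entropy claim move the counter.

```python
import hashlib


class ToyPool:
    """Toy model: all samples are mixed in, only trusted sources earn credit."""

    TARGET_BITS = 256  # assumed target, see the lead-in

    def __init__(self):
        self._state = hashlib.sha256()
        self.samples_collected = 0
        self.bits_counted = 0

    def enter(self, data, claimed_bits, trusted):
        # "collect": the sample always influences the pool state
        self._state.update(data)
        self.samples_collected += 1
        # "count": credit is given only when the source's claim is believed
        if trusted:
            self.bits_counted = min(self.TARGET_BITS,
                                    self.bits_counted + claimed_bits)

    def ready(self):
        """True once enough credited bits have been counted."""
        return self.bits_counted >= self.TARGET_BITS


if __name__ == "__main__":
    pool = ToyPool()
    for i in range(1000):  # e.g. disk or network timing samples
        pool.enter(i.to_bytes(4, "little"), claimed_bits=1, trusted=False)
    print(pool.samples_collected, pool.bits_counted, pool.ready())
    # e.g. a host-provided RNG device or a saved seed
    pool.enter(b"host-provided sample", claimed_bits=256, trusted=True)
    print(pool.samples_collected, pool.bits_counted, pool.ready())
```

In this model a guest with only untrusted sources collects indefinitely without ever becoming ready, which matches the symptom described in the original report.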
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
At Tue, 30 Mar 2021 23:53:43 +0200, Manuel Bouyer wrote:
Subject: Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
>
> On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > [...]
> >
> > Perhaps the answer is that nothing seems to be contributing anything to
> > the entropy pool.  No matter what device I exercise, none of the numbers
> > in the following changes:
>
> yes, it's been this way since the rnd rototill. Virtual devices are
> not trusted.
>
> The only way is to manually seed the pool.

Ah, so that is definitely not what I expected!

Previously wasn't it up to the local admin what to trust?

I guess throwing bits into /dev/random is one way to play that game,
but I have to trust the dom0 implicitly and utterly anyway, so why not
trust the devices it presents?

This is especially true for xbd block devices.  All my blocks are belong
to dom0.

The network device is in effect no different than if it were real
hardware, so if I want to trust network traffic, then I should be able
to enable it, just as I could if it were real hardware.

The CPUs are also probably the least "virtual" things in Xen, so why not
trust them?  (Though I'm not sure I understand what entropy they can
offer in the first place.)

Finally, if the system isn't actually collecting entropy from a device,
then why the heck does it allow me to think it is (i.e. by allowing me
to enable it and show it as enabled and collecting via "rndctl -l")?

--
Greg A. Woods

Kelowna, BC     +1 250 762-7675     RoboHack
Planix, Inc.    Avoncote Farms
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
> Date: Tue, 30 Mar 2021 23:53:43 +0200
> From: Manuel Bouyer
>
> On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > [...]
> >
> > Perhaps the answer is that nothing seems to be contributing anything to
> > the entropy pool.  No matter what device I exercise, none of the numbers
> > in the following changes:
>
> yes, it's been this way since the rnd rototill. Virtual devices are
> not trusted.
>
> The only way is to manually seed the pool.

This is false.  The virtual RNG drivers (viornd(4) [1], rump
hyperentropy [2], maybe others) all assume the VM host provides
samples with full entropy.  This has always been the case, and this
didn't change at all in the rototill last year.

There are no virtual RNG devices on the system in question, according
to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
taught to expose a virtio-rng device to the guest?

[1] https://nxr.netbsd.org/xref/src/sys/dev/pci/viornd.c#245
[2] https://nxr.netbsd.org/xref/src/sys/rump/librump/rumpkern/hyperentropy.c#57

P.S.  Further discussion about Python, getrandom, and system
integration:
https://mail-index.netbsd.org/tech-userlevel/2021/01/11/msg012807.html
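On the Python side, the call that hangs in the original report is getrandom(2) asking for 0x20 (32) bytes with flags 0. A minimal sketch of probing for that condition without hanging is shown below, using os.getrandom() with os.GRND_NONBLOCK from Python's standard library. Whether os.getrandom is exposed depends on the platform and on how Python was built, so the sketch treats its absence as "unknown"; the function name is invented for illustration.

```python
import os


def getrandom_would_block(nbytes=32):
    """Return True if getrandom(2) would block right now, False if it
    returns data immediately, or None if that cannot be determined."""
    if not hasattr(os, "getrandom"):
        return None  # this Python build does not expose getrandom(2)
    try:
        os.getrandom(nbytes, os.GRND_NONBLOCK)
    except BlockingIOError:
        return True  # the kernel does not yet consider itself seeded
    except OSError:
        return None  # e.g. the system call is unsupported on this kernel
    return False


if __name__ == "__main__":
    print(getrandom_would_block())
```

A build script could use a check like this to print a clear diagnostic instead of sleeping in the kernel until interrupted.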
Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> [...]
>
> Perhaps the answer is that nothing seems to be contributing anything to
> the entropy pool.  No matter what device I exercise, none of the numbers
> in the following changes:

yes, it's been this way since the rnd rototill. Virtual devices are
not trusted.

The only way is to manually seed the pool.

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
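Whether a system is still waiting to be seeded can be read from the `sysctl kern.entropy` values shown in the original report. The Python sketch below parses that "name = value" output and reports the outstanding deficit. Treating kern.entropy.needed as the number of bits still missing before full entropy is my reading of entropy(4) and should be checked against the manual page; the function names are invented for illustration, and SAMPLE is copied from the original report.

```python
def parse_sysctl(text):
    """Turn `name = value` lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            values[name.strip()] = value.strip()
    return values


def entropy_deficit(text):
    """Bits still needed, taken from kern.entropy.needed (assumed meaning)."""
    return int(parse_sysctl(text)["kern.entropy.needed"])


def needs_seeding(text):
    """True while the reported deficit is greater than zero."""
    return entropy_deficit(text) > 0


# Values quoted in the original report in this thread.
SAMPLE = """\
kern.entropy.collection = 1
kern.entropy.depletion = 0
kern.entropy.consolidate = -11774
kern.entropy.gather = -11774
kern.entropy.needed = 128
kern.entropy.pending = 0
kern.entropy.epoch = 8
"""

if __name__ == "__main__":
    print(entropy_deficit(SAMPLE), needs_seeding(SAMPLE))
```

For the reported domU this gives a deficit of 128 bits, consistent with a 128-bit seed having been loaded and nothing else having been credited since.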
nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)
Further to this, here's the ktrace output from just before and after the
SIGINT:

      9350   9350 python   0.122852724 RET   mmap 127488830459904/0x73f34e73d000
      9350   9350 python   0.122890259 CALL  getrandom(0x73f34e742610,0x20,0)
      9350   9350 python   6.141515919 RET   getrandom -1 errno 4 Interrupted system call
      9350   9350 python   6.141522183 PSIG  SIGINT caught handler=0x73f34f182b25 mask=(): code=SI_NOINFO

So, how can getrandom(2) hang for so long despite the rest of the system
running on and doing things for over a day on a system that's been up
and running and busy building packages for nearly a week?

Perhaps the answer is that nothing seems to be contributing anything to
the entropy pool.  No matter what device I exercise, none of the numbers
in the following changes:

    # rndctl -l
    Source                 Bits Type Flags
    /dev/random               0 ???  estimate, collect, v
    xbd6                      0 disk estimate, collect, v, t, dt
    xbd5                      0 disk estimate, collect, v, t, dt
    xbd4                      0 disk estimate, collect, v, t, dt
    xbd3                      0 disk estimate, collect, v, t, dt
    xennet0                   0 net  estimate, v, t, dt
    xbd2                      0 disk estimate, collect, v, t, dt
    xbd1                      0 disk estimate, collect, v, t, dt
    xbd0                      0 disk estimate, collect, v, t, dt
    cpu15                     0 vm   estimate, collect, v, t, dv
    cpu14                     0 vm   estimate, collect, v, t, dv
    cpu13                     0 vm   estimate, collect, v, t, dv
    cpu12                     0 vm   estimate, collect, v, t, dv
    cpu11                     0 vm   estimate, collect, v, t, dv
    cpu10                     0 vm   estimate, collect, v, t, dv
    cpu9                      0 vm   estimate, collect, v, t, dv
    cpu8                      0 vm   estimate, collect, v, t, dv
    cpu7                      0 vm   estimate, collect, v, t, dv
    cpu6                      0 vm   estimate, collect, v, t, dv
    cpu5                      0 vm   estimate, collect, v, t, dv
    cpu4                      0 vm   estimate, collect, v, t, dv
    cpu3                      0 vm   estimate, collect, v, t, dv
    cpu2                      0 vm   estimate, collect, v, t, dv
    cpu1                      0 vm   estimate, collect, v, t, dv
    cpu0                      0 vm   estimate, collect, v, t, dv
    hardclock                 0 skew estimate, collect, t
    system-power              0 power estimate, collect, v, t, dt
    autoconf                  0 ???  estimate, collect, t
    seed                    128 ???  estimate, collect, v
    # rndctl -s
              0 bits mixed into pool
            128 bits currently stored in pool (max 256)
              0 bits of entropy discarded due to full pool
              0 hard-random bits generated
              0 pseudo-random bits generated
    # sysctl kern.entropy
    kern.entropy.collection = 1
    kern.entropy.depletion = 0
    kern.entropy.consolidate = -11774
    kern.entropy.gather = -11774
    kern.entropy.needed = 128
    kern.entropy.pending = 0
    kern.entropy.epoch = 8

Even if I set the network devices to collect (rndctl -c -e -t net),
nothing changes.

Again, for the record:

    # uname -a
    NetBSD b2 9.99.81 NetBSD 9.99.81 (XEN3_DOMU) #1: Tue Mar 23 14:26:58 PDT 2021 woods@b2:/build/woods/b2/current-amd64-amd64-obj/work/woods/m-NetBSD-current/sys/arch/amd64/compile/XEN3_DOMU amd64

Also, is python the only thing that calls getrandom(2) with the flags
parameter set to the recommended value of zero?

Also, is the behaviour of getrandom(2) supposed to be the same as
/dev/random, i.e. w.r.t. the note in the original announcement of the
entropy overhaul, i.e. that it should never block once the system has
achieved full entropy?

    - /dev/random no longer blocks repeatedly: it will block after boot
      until the system has full entropy, and then never again.  This means
      applications that issue repeated reads from /dev/random will no
      longer repeatedly hang.

If so then can I assume no device is actually contributing entropy and
that the system never achieved full entropy?

Should Xen domUs be running the commands recommended in the entropy
overhaul announcement to fool the system into thinking it has full
entropy?

    dd if=/dev/urandom of=/dev/random bs=32 count=1
    sysctl -w kern.entropy.consolidate=1

After I do this then I can read from /dev/random without blocking.

Can Xen domUs get entropy from their dom0?  Perhaps via xenstore?

Finally I just noticed that syslogd isn't collecting "entropy" messages
from the kernel.
My /var/log/kern does _not_ contain the following (from dmesg):

    [ 517813.480815] entropy: pid 19875 (python) blocking due to lack of entropy
    [ 520426.415882] entropy: pid 19875 (python) blocking due to lack of entropy
    [ 520468.885538] entropy: pid 19875 (python) blocking due to lack of entropy
    [ 543351.589752] entropy: pid 19875 (python) blocking due to lack of entropy
    [ 543351.589752] entropy: pid 19875 (python) blocking due to lack of