Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-04-01 Thread Manuel Bouyer
On Wed, Mar 31, 2021 at 09:58:48PM -0400, Thor Lancelot Simon wrote:
> On Wed, Mar 31, 2021 at 11:24:07AM +0200, Manuel Bouyer wrote:
> > On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > > 
> > > There are no virtual RNG devices on the system in question, according
> > > to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> > > taught to expose a virtio-rng device to the guest?
> > 
> > There is no such thing in Xen.
> 
> Is the CPU so old that it doesn't have RDRAND / RDSEED, or is Xen perhaps
> masking these CPU features from the guest?

Is there an easy way to test, on a netbsd-9 system, if the instruction is
present and working ?
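
A rough shell sketch for checking whether the CPU at least advertises
the instructions. It assumes that `cpuctl identify 0` on NetBSD/x86
prints the CPUID feature flags by name (e.g. "<...,RDRAND,...>"); the
helper names are made up for illustration. It only shows what CPUID
reports, so inside a domU a missing flag could equally mean Xen is
masking it, and a present flag does not prove the instruction actually
returns good data.

```shell
#!/bin/sh
# Hypothetical helpers; assumes cpuctl(8) output lists CPUID flags by name.

# has_cpu_feature NAME: succeed if NAME appears as a whole word on stdin.
has_cpu_feature() {
	grep -qw -- "$1"
}

# check_cpu_rng: report whether CPU 0 advertises RDRAND and RDSEED.
check_cpu_rng() {
	ident=$(cpuctl identify 0 2>/dev/null) || return 2
	for f in RDRAND RDSEED; do
		if printf '%s\n' "$ident" | has_cpu_feature "$f"; then
			echo "$f: advertised by CPUID"
		else
			echo "$f: not advertised (absent or masked by the hypervisor)"
		fi
	done
}
```

Running `check_cpu_rng` (probably as root) in the domU and in dom0 and
comparing the two would hint at whether Xen is hiding the flags.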

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-31 Thread Thor Lancelot Simon
On Wed, Mar 31, 2021 at 11:24:07AM +0200, Manuel Bouyer wrote:
> On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > 
> > There are no virtual RNG devices on the system in question, according
> > to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> > taught to expose a virtio-rng device to the guest?
> 
> There is no such thing in Xen.

Is the CPU so old that it doesn't have RDRAND / RDSEED, or is Xen perhaps
masking these CPU features from the guest?

Thor


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-31 Thread Havard Eidnes
> On Wed, Mar 31, 2021 at 12:12:31AM +, Taylor R Campbell wrote:
>> This is false.  If the VM host provided a viornd(4) device then NetBSD
>> would automatically collect, and count, entropy from the host, with no
>> manual intervention.
>
> I would love to see instructions how to do this - I have not seen a working
> viornd(4) in any of my Xen domUs (but that is a very limited sample).

This isn't with Xen, and isn't on -current, but this is what I do
for my emulated arm64 system, where the emulator runs on
NetBSD/amd64 8.0:

#!/bin/sh
SMP=4
MEM=8g
qemu-system-aarch64 -M virt -cpu cortex-a57 -smp $SMP -m $MEM \
  -drive if=none,file=disk.img,id=hd0 -device virtio-blk-device,drive=hd0 \
  -netdev type=user,id=net0,hostfwd=tcp::-:22,ipv6=off \
  -nographic \
  -device virtio-net-device,netdev=net0,mac=00:11:66:33:44:55 \
  -device virtio-rng-device \
  -kernel netbsd.img -append root=ld4a

and the booted system is NetBSD/aarch64 9.0 with the unmodified
GENERIC64 kernel:

arm64# rndctl -l
Source Bits Type  Flags
cpu3   7824 vm   estimate, collect, v, t, dv
cpu2   8983 vm   estimate, collect, v, t, dv
cpu1   8351 vm   estimate, collect, v, t, dv
cpu0  12436 vm   estimate, collect, v, t, dv
ld4 8440476 disk estimate, collect, v, t, dt
viornd0     4096 rng  estimate, collect, v
system-power  0 power estimate, collect, v, t, dt
autoconf 72 ???  estimate, collect, t, dt
printf         0 ???  collect
callout 116 skew estimate, collect, v, dv
arm64#
arm64# dmesg | grep rnd
[ 1.10] viornd0 at virtio29: Features: 0x1000
arm64# 
arm64# dmesg | grep virtio29
[ 1.10] virtio29 at simplebus0
[ 1.10] viornd0 at virtio29: Features: 0x1000
[ 1.10] virtio29: allocated 32768 byte for virtqueue 0 for Entropy request, size 1024
[ 1.10] virtio29: interrupting on GIC irq 77
arm64# 

When I get to booting a post-rng-rework kernel, I'm fairly
certain that only the input from viornd0 will remain as a source
with "estimate" in the flags field.  Of course, any saved and
restored entropy will also count towards the estimate.


That said, it doesn't look like the amd64 XEN3_DOMU kernel has either
of virtio* or viornd* configured, they're only in the GENERIC and ALL
kernel configs.  Also, I don't know what has to happen on the XEN
"host side" to provide those devices; virtio* is apparently supposed
to be made visible via the pci bus (looking at amd64's GENERIC), but
by the looks of it, XEN only does "pci passthrough" to physical
devices (looking at the comments near the commented-out "pci" config
statements in XEN3_DOMU), so no "emulated" PCI bus where the host can
provide the host-side of the randomness virtual device?
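
For anyone wanting to check this against their own kernel configs, a
small shell sketch: it reports whether a config file has an uncommented
line attaching a given device (e.g. "virtio*" or "viornd*"). The
function name is hypothetical, and it only looks at the first word of
each line, so it won't follow "include" directives.

```shell
#!/bin/sh
# kconf_has DEV FILE (hypothetical helper): succeed if FILE contains an
# uncommented line whose first word is DEV, optionally followed by a unit
# number or "*" (e.g. "viornd* at virtio?"). "include" lines are not followed.
kconf_has() {
	grep -Eq "^[[:space:]]*$1[0-9*]*[[:space:]]" "$2"
}
```

From the top of a src tree, something like
`kconf_has viornd sys/arch/amd64/conf/XEN3_DOMU && echo configured`
should come back empty-handed if the observation above is right.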

Regards,

- Håvard


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-31 Thread Manuel Bouyer
On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > Date: Tue, 30 Mar 2021 23:53:43 +0200
> > From: Manuel Bouyer 
> > 
> > On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > > [...]
> > > 
> > > Perhaps the answer is that nothing seems to be contributing anything to
> > > the entropy pool.  No matter what device I exercise, none of the numbers
> > > in the following changes:
> > 
> > yes, it's been this way since the rnd rototill. Virtual devices are
> > not trusted.
> > 
> > The only way is to manually seed the pool.
> 
> This is false.  The virtual RNG drivers (viornd(4) [1], rump
> hyperentropy [2], maybe others) all assume the VM host provides
> samples with full entropy.  This has always been the case, and this
> didn't change at all in the rototill last year.
> 
> There are no virtual RNG devices on the system in question, according
> to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> taught to expose a virtio-rng device to the guest?

There is no such thing in Xen.



Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Martin Husemann
On Wed, Mar 31, 2021 at 12:12:31AM +, Taylor R Campbell wrote:
> This is false.  If the VM host provided a viornd(4) device then NetBSD
> would automatically collect, and count, entropy from the host, with no
> manual intervention.

I would love to see instructions how to do this - I have not seen a working
viornd(4) in any of my Xen domUs (but that is a very limited sample).

Martin


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Greg A. Woods
[[ sorry I've not been catching up on mailing list discussions as fast
as I had hoped to, and I'm way behind on following the entropy rototill. ]]

At Wed, 31 Mar 2021 00:12:31 +, Taylor R Campbell  
wrote:
Subject: Re: nothing contributing entropy in Xen domUs?  (causing python3.7 
rebuild to get stuck in kernel in "entropy" during an "import" statement)
>
> This is false.  If the VM host provided a viornd(4) device then NetBSD
> would automatically collect, and count, entropy from the host, with no
> manual intervention.

I'll leave that idea to others more up-to-date on Xen PV drivers to
respond to.  Booting a -current GENERIC kernel (which has both Xen PV
and virtio(4) devices configured into it) in a "type='pvh'" domU only
attaches the xenbus PV devices, no virtio devices, so adding virtio
might be a much bigger task that will need further support on at
least the backend, and perhaps on the front-end too, especially to do
it without QEMU.  I haven't tried to see whether virtio devices show up
in an HVM domU, precisely because I'm trying to avoid having to run and
rely on QEMU (never mind any performance implications of HVM).

> > Finally, if the system isn't actually collecting entropy from a device,
> > then why the heck does it allow me to think it is (i.e. by allowing me
> > to enable it and show it as enabled and collecting via "rndctl -l")?
>
> The system does collect samples from all those devices.  However, they
> are not designed to be unpredictable and there is no good reliable
> model for just how unpredictable they are, so the system doesn't
> _count_ anything from them.  See https://man.NetBSD.org/entropy.4 for
> a high-level overview.

I'm not sure the word "count" appears in entropy(4) in any context that
helps me make sense of what it means to "collect" but not "count"
entropy from those devices.

Worse the "Flags" shown by "rndctl -l" don't seem to be directly
documented (i.e. they're not described in rndctl(8)), and even on a
kernel running on real hardware I don't see the word "count" showing
there.

After looking at the source I'm not sure the descriptions of the
RND_FLAG_* values in rnd(4) help me much either.

Based on my vague understanding of all of this, perhaps you meant to say
"estimate", instead of "count"?  That would make more sense in the
context of what I read in rnd(4) and rndctl(8), though "estimate" still
seems a little vague in meaning to me.

In any case, I don't see why an xbd disk, or a xennet interface, can't
be treated exactly as if they were real hardware (i.e. in terms of
extracting entropy from their behaviour).  This is exactly what
virtualization is all about to me -- even for paravirtualization.  After
all in a threat-free world (i.e. specifically where I also trust other
domUs) their entropy is going to reflect (though maybe not exactly
mirror) the entropy of the underlying hardware and/or network traffic.
So (but maybe not by default) if I as the admin want to trust the
entropy available from an xbd(4) or xennet(4) device, then I should be
able to enable it with rndctl(8) and have it "count".

More importantly though the system shouldn't mislead me into thinking it
is "counting" entropy from a device when it is actually not.  If I had
seen that there were no sources estimating/counting/whatever entropy,
and I tried to enable one and was given a nice error message about this
not being possible, then I would have looked elsewhere to find out how
to give the system more bits of entropy.  As it is, on my Xen domU system
the output of "rndctl -l" leads me to believe all of my devices are
collecting both timing and value samples, and using one or the other
to gather entropy (though with '-v' I don't see that any bits of
entropy have been added from any of those many millions of collected
samples).

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 




Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Taylor R Campbell
> Date: Tue, 30 Mar 2021 16:23:43 -0700
> From: "Greg A. Woods" 
> 
> At Tue, 30 Mar 2021 23:53:43 +0200, Manuel Bouyer  
> wrote:
> > On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > > Perhaps the answer is that nothing seems to be contributing anything to
> > > the entropy pool.  No matter what device I exercise, none of the numbers
> > > in the following changes:
> >
> > yes, it's been this way since the rnd rototill. Virtual devices are
> > not trusted.
> >
> > The only way is to manually seed the pool.
> 
> Ah, so that is definitely not what I expected!

This is false.  If the VM host provided a viornd(4) device then NetBSD
would automatically collect, and count, entropy from the host, with no
manual intervention.

> Finally, if the system isn't actually collecting entropy from a device,
> then why the heck does it allow me to think it is (i.e. by allowing me
> to enable it and show it as enabled and collecting via "rndctl -l")?

The system does collect samples from all those devices.  However, they
are not designed to be unpredictable and there is no good reliable
model for just how unpredictable they are, so the system doesn't
_count_ anything from them.  See https://man.NetBSD.org/entropy.4 for
a high-level overview.

In the past we used an essentially meaningless model, designed in a
vacuum without reference to any information about the physics of the
sources of the samples (and the same model for all sources), for
fabricating entropy estimates by examining the sample data.  This
practice no longer happens.


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Greg A. Woods
At Tue, 30 Mar 2021 23:53:43 +0200, Manuel Bouyer  
wrote:
Subject: Re: nothing contributing entropy in Xen domUs?  (causing python3.7 
rebuild to get stuck in kernel in "entropy" during an "import" statement)
>
> On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > [...]
> >
> > Perhaps the answer is that nothing seems to be contributing anything to
> > the entropy pool.  No matter what device I exercise, none of the numbers
> > in the following changes:
>
> yes, it's been this way since the rnd rototill. Virtual devices are
> not trusted.
>
> The only way is to manually seed the pool.

Ah, so that is definitely not what I expected!

Previously wasn't it up to the local admin what to trust?  I guess
throwing bits into /dev/random is one way to play that game, but I have
to trust the dom0 implicitly and utterly anyway, so why not trust the
devices it presents?

This is especially true for xbd block devices.  All my blocks are belong
to dom0.

The network device is in effect no different than if it were real
hardware, so if I want to trust network traffic, then I should be able
to enable it, just as I could if it were real hardware.

The CPUs are also probably the least "virtual" things in Xen, so why not
trust them?  (Though I'm not sure I understand what entropy they can
offer in the first place.)

Finally, if the system isn't actually collecting entropy from a device,
then why the heck does it allow me to think it is (i.e. by allowing me
to enable it and show it as enabled and collecting via "rndctl -l")?





Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Taylor R Campbell
> Date: Tue, 30 Mar 2021 23:53:43 +0200
> From: Manuel Bouyer 
> 
> On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > [...]
> > 
> > Perhaps the answer is that nothing seems to be contributing anything to
> > the entropy pool.  No matter what device I exercise, none of the numbers
> > in the following changes:
> 
> yes, it's been this way since the rnd rototill. Virtual devices are
> not trusted.
> 
> The only way is to manually seed the pool.

This is false.  The virtual RNG drivers (viornd(4) [1], rump
hyperentropy [2], maybe others) all assume the VM host provides
samples with full entropy.  This has always been the case, and this
didn't change at all in the rototill last year.

There are no virtual RNG devices on the system in question, according
to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
taught to expose a virtio-rng device to the guest?


[1] https://nxr.netbsd.org/xref/src/sys/dev/pci/viornd.c#245
[2] https://nxr.netbsd.org/xref/src/sys/rump/librump/rumpkern/hyperentropy.c#57


P.S.  Further discussion about Python, getrandom, and system
integration:
https://mail-index.netbsd.org/tech-userlevel/2021/01/11/msg012807.html


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Manuel Bouyer
On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> [...]
> 
> Perhaps the answer is that nothing seems to be contributing anything to
> the entropy pool.  No matter what device I exercise, none of the numbers
> in the following changes:

yes, it's been this way since the rnd rototill. Virtual devices are
not trusted.

The only way is to manually seed the pool.



nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Greg A. Woods
Further to this, here's the ktrace output from just before and after the
SIGINT:

  9350   9350 python   0.122852724 RET   mmap 127488830459904/0x73f34e73d000
  9350   9350 python   0.122890259 CALL  getrandom(0x73f34e742610,0x20,0)
  9350   9350 python   6.141515919 RET   getrandom -1 errno 4 Interrupted system call
  9350   9350 python   6.141522183 PSIG  SIGINT caught handler=0x73f34f182b25 mask=(): code=SI_NOINFO

So, how can getrandom(2) hang for so long despite the rest of the system
running on and doing things for over a day on a system that's been up
and running and busy building packages for nearly a week?

Perhaps the answer is that nothing seems to be contributing anything to
the entropy pool.  No matter what device I exercise, none of the numbers
in the following changes:


# rndctl -l
Source Bits Type  Flags
/dev/random   0 ???  estimate, collect, v
xbd6  0 disk estimate, collect, v, t, dt
xbd5  0 disk estimate, collect, v, t, dt
xbd4  0 disk estimate, collect, v, t, dt
xbd3  0 disk estimate, collect, v, t, dt
xennet0   0 net  estimate, v, t, dt
xbd2  0 disk estimate, collect, v, t, dt
xbd1  0 disk estimate, collect, v, t, dt
xbd0  0 disk estimate, collect, v, t, dt
cpu15 0 vm   estimate, collect, v, t, dv
cpu14 0 vm   estimate, collect, v, t, dv
cpu13 0 vm   estimate, collect, v, t, dv
cpu12 0 vm   estimate, collect, v, t, dv
cpu11 0 vm   estimate, collect, v, t, dv
cpu10 0 vm   estimate, collect, v, t, dv
cpu9  0 vm   estimate, collect, v, t, dv
cpu8  0 vm   estimate, collect, v, t, dv
cpu7  0 vm   estimate, collect, v, t, dv
cpu6  0 vm   estimate, collect, v, t, dv
cpu5  0 vm   estimate, collect, v, t, dv
cpu4  0 vm   estimate, collect, v, t, dv
cpu3  0 vm   estimate, collect, v, t, dv
cpu2  0 vm   estimate, collect, v, t, dv
cpu1  0 vm   estimate, collect, v, t, dv
cpu0  0 vm   estimate, collect, v, t, dv
hardclock 0 skew estimate, collect, t
system-power  0 power estimate, collect, v, t, dt
autoconf  0 ???  estimate, collect, t
seed        128 ???  estimate, collect, v
# rndctl -s
0 bits mixed into pool
  128 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated
# sysctl kern.entropy
kern.entropy.collection = 1
kern.entropy.depletion = 0
kern.entropy.consolidate = -11774
kern.entropy.gather = -11774
kern.entropy.needed = 128
kern.entropy.pending = 0
kern.entropy.epoch = 8

Even if I set the network devices to collect (rndctl -c -e -t net),
nothing changes.
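
A quick way to see which sources, if any, have actually been credited
is to filter `rndctl -l` for a nonzero Bits column. A minimal
awk-in-shell sketch, assuming the column layout shown above (Source,
Bits, Type, Flags) with a single header line; the function name is made
up for illustration.

```shell
#!/bin/sh
# credited_sources (hypothetical helper): read `rndctl -l` output on stdin
# and print the name and Bits value of each source with a nonzero Bits column.
credited_sources() {
	awk 'NR > 1 && $2 + 0 > 0 { print $1, $2 }'
}
```

Used as `rndctl -l | credited_sources`; for the listing above it should
print only the seed source with its 128 bits.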

Again, for the record:

# uname -a
NetBSD b2 9.99.81 NetBSD 9.99.81 (XEN3_DOMU) #1: Tue Mar 23 14:26:58 PDT 2021  woods@b2:/build/woods/b2/current-amd64-amd64-obj/work/woods/m-NetBSD-current/sys/arch/amd64/compile/XEN3_DOMU amd64


Also, is python the only thing that calls getrandom(2) with the flags
parameter set to the recommended value of zero?

Also, is the behaviour of getrandom(2) supposed to be the same as
/dev/random, i.e. w.r.t. the note in the original announcement of the
entropy overhaul, i.e. that it should never block once the system has
achieved full entropy?

- /dev/random no longer blocks repeatedly: it will block after boot
  until the system has full entropy, and then never again.  This means
  applications that issue repeated reads from /dev/random will no longer
  repeatedly hang.

If so then can I assume no device is actually contributing entropy and
that the system never achieved full entropy?

Should Xen domUs be running the commands recommended in the entropy
overhaul announcement to fool the system into thinking it has full
entropy?

dd if=/dev/urandom of=/dev/random bs=32 count=1
sysctl -w kern.entropy.consolidate=1

After I do this then I can read from /dev/random without blocking.
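
To confirm what the kernel thinks before and after such a seeding,
kern.entropy.needed from the sysctl output above looks like the
relevant number: it appears to be the remaining deficit in bits (128
needed with 128 of 256 stored), so 0 should mean the pool is considered
full. That reading is an assumption. A small shell sketch, with
hypothetical helper names, that pulls the value out of
`sysctl kern.entropy` output:

```shell
#!/bin/sh
# entropy_needed (hypothetical helper): print the kern.entropy.needed value
# from `sysctl kern.entropy` output read on stdin.
entropy_needed() {
	awk -F ' = ' '$1 == "kern.entropy.needed" { print $2 }'
}

# entropy_full (hypothetical helper): succeed if the kernel reports no
# remaining deficit (assumes a value of 0 means the pool is considered full).
entropy_full() {
	n=$(sysctl kern.entropy | entropy_needed)
	[ -n "$n" ] && [ "$n" -eq 0 ]
}
```

Running `sysctl kern.entropy | entropy_needed` before and after the two
commands above should show the value drop to 0 if the workaround took.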

Can Xen domUs get entropy from their dom0?  Perhaps via xenstore?

Finally I just noticed that syslogd isn't collecting "entropy" messages
from the kernel.  My /var/log/kern does _not_ contain the following
(from dmesg):

[ 517813.480815] entropy: pid 19875 (python) blocking due to lack of entropy
[ 520426.415882] entropy: pid 19875 (python) blocking due to lack of entropy
[ 520468.885538] entropy: pid 19875 (python) blocking due to lack of entropy
[ 543351.589752] entropy: pid 19875 (python) blocking due to lack of entropy
[ 543351.589752] entropy: pid 19875 (python) blocking due to lack of