On 27/03/2019 17:50, Henning Schild wrote:
> On Wed, 27 Mar 2019 17:15:37 +0100,
> Ralf Ramsauer <ralf.ramsa...@oth-regensburg.de> wrote:
>
>> On 3/26/19 6:03 PM, Jan Kiszka wrote:
>>> On 26.03.19 17:41, Ralf Ramsauer wrote:
>>>> Hi Jan,
>>>>
>>>> On 3/25/19 4:03 PM, Jan Kiszka wrote:
>>>>> On 25.03.19 14:18, Andrej Utz wrote:
>>>>>> Greetings Jailhouse developers,
>>>>>>
>>>>>> I am trying to run Jailhouse on AMD Ryzen 2700X (x86_64) with
>>>>>> B450 chipset and got into some problems.
>>>>>>
>>>>>> After whitelisting some I/O ports and putting "amd_iommu=off
>>>>>> mce=off" I managed to enable Jailhouse, but instantly lost some
>>>>>> USB ports (keyboard being one of them). After some retries I
>>>>>> noticed this happens only 80 % of the time and it seems that some
>>>>>> interrupts are never acknowledged and keep blocking the USB hub.
>>>>>>
>>>>>
>>>>> A typical pattern if the interrupt controller (IOAPIC or even
>>>>> APIC) is directly accessed by the guest. Or if the MSI-X page of a
>>>>> PCI device is passed through. Double-check if none of the
>>>>> resources is guest-assigned. Jailhouse needs to intercept them.
>>>>
>>>> Looks like jailhouse-config-create might have issues with parsing
>>>> IVRS tables on AMD. This is why both irq chips had the same ID in
>>>> our config (cf. Andrej's attachment).
>>>
>>> Hmm, another variable shadowing like we have in
>>> jailhouse-hardware-check?
>>>>
>>>> Parsing the table with hexdump, AMD's manual and five fingers on
>>>> two hands gave us the correct ID. Andrej will provide a patch soon.
>>>> (BTW, the python-parsers are really hard to read)
>>>
>>> Improvement ideas welcome.
>>>
>>>>
>>>> So our APIC IDs were wrong in the system configuration, but still,
>>>> this doesn't solve the issue.
>>>>
>>>> I double checked that the APIC region is not directly assigned to
>>>> the guest.
>>>>
>>>> So in sum, we currently face two issues on AMD:
>>>>   - Lose USB interrupts on enabling with high probability.
>>>>     Disabling jailhouse works, but won't revive it.
>>>>   - Lose our network device on cell create
>>>>
>>>> Somehow, those two problems smell related, and maybe the second
>>>> one is indirectly solved after solving the first one. Let's see.
>>>
>>> Do both interrupts have something in common? Maybe something other
>>> devices that still have working interrupts do not? Are they INTx,
>>> MSI, MSI-X?
>>
>> We had a Voodoo-Debugging session today. All interrupts that seem to
>> disappear are edge-triggered MSI-X interrupts. The first thing we
>> tried was pci=nomsi. This turns them to legacy IOAPIC interrupts, but
>> the problem pattern still remains the same.
>>
>> When using legacy interrupts, interrupts looked like this:
>>   - IRQ 25: xhci (some USB 3.1 ports), enp3s0
>>   - IRQ 29: xhci (some other USB 3.0)
>>
>> So IRQ 25 seems to be shared. The funny thing is that while USB 3.0
>> (IRQ 29) and enp3s0 (IRQ 25) died, USB 3.1 (also IRQ 25) still worked…
>>
>> After a while, we found that if there is no ethernet link (cable
>> disconnected) and no USB devices connected (we use a PS/2 keyboard for
>> enabling jh), everything seems to be stable after enabling jh. USB +
>> Ethernet works fine if we bring up devices after enabling.
>>
>> Yes, we tried turning it off and on again :-) Subsequent jailhouse
>> disable/enable sequences then seem to remain stable, it looks like
>> it's 'important' that those devices are disconnected before
>> enabling jailhouse for the first time.
>>
>> So at least we found some pattern so far.
>
> Not sure if AMD has an SMI counter, but I wonder whether the BIOS is
> messing with you. BIOSes often emulate "good old" input devs until the
> OS initializes USB, for keyboard usage in the bootloader and so on.
> The NIC could have such a thing going on for PXE.
>

We disabled all kinds of legacy and emulation options in the BIOS/UEFI,
and also its network stack (so no PXE), but the results were the same.

AMD exposes an SMI counter not as a register but as an event source
from the IOMMU. I let 'perf stat -e smi_recv -e smi_blk' run for some
minutes while stressing the hardware and randomly disabling CPU cores,
but none of the expected SMI events occurred. Even more surprisingly,
tracing with 'hwlat' in the kernel for a while produced not a single
trace entry. The hardware seems to be really tame, at least in that
respect.

Andrej
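
For anyone trying to reproduce the lost-interrupt pattern discussed above, here is a minimal sketch of one way to spot IRQ lines whose counters stop moving between two snapshots of /proc/interrupts. It assumes the usual layout (a header row of CPU columns, then one row per IRQ). The function names and the sample snapshots are made up for illustration; they are not part of Jailhouse and are not measurements from the machine in this thread.

```python
# Hypothetical sample snapshots in /proc/interrupts layout (not real data).
SAMPLE_BEFORE = """\
           CPU0       CPU1
 25:        100        200   IO-APIC   25-fasteoi   xhci_hcd, enp3s0
 29:         50         60   IO-APIC   29-fasteoi   xhci_hcd
"""
SAMPLE_AFTER = """\
           CPU0       CPU1
 25:        150        260   IO-APIC   25-fasteoi   xhci_hcd, enp3s0
 29:         50         60   IO-APIC   29-fasteoi   xhci_hcd
"""


def parse_interrupts(text):
    """Map each IRQ label to (total count over all CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Rows such as ERR/MIS carry fewer counters than there are CPUs,
        # so stop at the first non-numeric field.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def stalled_irqs(before, after):
    """Return IRQ labels whose total count is identical in both snapshots."""
    return sorted(
        irq for irq, (count, _) in after.items()
        if irq in before and before[irq][0] == count
    )


if __name__ == "__main__":
    before = parse_interrupts(SAMPLE_BEFORE)
    after = parse_interrupts(SAMPLE_AFTER)
    for irq in stalled_irqs(before, after):
        print(irq, after[irq][1])
```

On a real system one would read /proc/interrupts once before and once some seconds after enabling, while generating traffic on the devices in question. An unchanged counter is only a hint: an idle device also shows no new interrupts.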