On 27/03/2019 17:50, Henning Schild wrote:
> On Wed, 27 Mar 2019 17:15:37 +0100
> Ralf Ramsauer <ralf.ramsa...@oth-regensburg.de> wrote:
> 
>> On 3/26/19 6:03 PM, Jan Kiszka wrote:
>>> On 26.03.19 17:41, Ralf Ramsauer wrote:  
>>>> Hi Jan,
>>>>
>>>> On 3/25/19 4:03 PM, Jan Kiszka wrote:  
>>>>> On 25.03.19 14:18, Andrej Utz wrote:  
>>>>>> Greetings Jailhouse developers,
>>>>>>
>>>>>> I am trying to run Jailhouse on AMD Ryzen 2700X (x86_64) with
>>>>>> B450 chipset and
>>>>>> got into some problems.
>>>>>>
>>>>>> After whitelisting some I/O ports and putting "amd_iommu=off
>>>>>> mce=off" I managed
>>>>>> to enable Jailhouse, but instantly lost some USB ports (keyboard
>>>>>> being one of
>>>>>> them). After some retries I noticed this happens only 80 % of
>>>>>> the time and it
>>>>>> seems that some interrupts are never acknowledged and keep
>>>>>> blocking the USB hub.
>>>>>>  
>>>>>
>>>>> A typical pattern if the interrupt controller (IOAPIC or even
>>>>> APIC) is directly
>>>>> accessed by the guest. Or if the MSI-X page of a PCI device is
>>>>> passed through.
>>>>> Double-check if none of the resources is guest-assigned. Jailhouse
>>>>> needs to
>>>>> intercept them.  
>>>>
>>>> Looks like jailhouse-config-create might have issues with parsing
>>>> IVRS tables on AMD. This is why both irq chips had the same ID in
>>>> our config (cf. Andrej's attachment).  
>>>
>>> Hmm, another variable shadowing like we have in
>>> jailhouse-hardware-check? 
>>>>
>>>> Parsing the table with hexdump, AMD's manual and five fingers on
>>>> two hands gave us the correct ID. Andrej will provide a patch soon.
>>>> (BTW, the python-parsers are really hard to read)  
>>>
>>> Improvement ideas welcome.
>>>   
>>>>
>>>> So our APIC IDs were wrong in the system configuration, but still,
>>>> this doesn't solve the issue.
>>>>
>>>> I double checked that the APIC region is not directly assigned to
>>>> the guest.
>>>>
>>>> So in sum, we currently face two issues on AMD:
>>>>    - Lose USB interrupts on enabling with high probability.
>>>> Disabling jailhouse works, but won't revive it.
>>>>    - Lose our network device on cell create
>>>>
>>>> Somehow, those two problems smell related, and maybe the second
>>>> one is indirectly solved after solving the first one. Let's see.  
>>>
>>> Do both interrupts have something in common? Maybe something other
>>> devices that
>>> still have working interrupts do not? Are they INTx, MSI, MSI-X?  
>>
>> We had a Voodoo-Debugging session today. All interrupts that seem to
>> disappear are edge-triggered MSI-X interrupts. The first thing we
>> tried was pci=nomsi. This turns them to legacy IOAPIC interrupts, but
>> the problem pattern still remains the same.
>>
>> When using legacy interrupts, interrupts looked like this:
>>   - IRQ 25: xhci (some USB 3.1 ports), enp3s0
>>   - IRQ 29: xhci (some other USB 3.0)
>>
>> So IRQ 25 seems to be shared. The funny thing is that while USB 3.0
>> (IRQ 29) and enp3s0 (IRQ 25) died, USB 3.1 (also IRQ 25) still worked…
>>
>> After a while, we found that if there is no ethernet link (cable
>> disconnected) and no USB devices connected (we use a PS/2 keyboard for
>> enabling jh), everything seems to be stable after enabling jh. USB +
>> Ethernet works fine if we bring up devices after enabling.
>>
>> Yes, we tried turning it off and on again :-) Subsequent jailhouse
>> disable/enable sequences then seem to remain stable; it looks like
>> it's 'important' that those devices are disconnected before
>> enabling jailhouse for the first time.
>>
>> So at least we found some pattern so far.
> 
> Not sure if AMD has an SMI counter, but I wonder whether the BIOS is
> messing with you. BIOSes often emulate "good old" input devs until the
> OS initializes USB, for keyboard usage in the bootloader and so on.
> The NIC could have such a thing going on for PXE.
> 

We disabled all kinds of legacy and emulation stuff in BIOS/UEFI and
also its network stack (so no PXE) but results were the same.

AMD has an SMI counter not as a register but as an event source from the
IOMMU. I let 'perf stat -e smi_recv -e smi_blk' run for some minutes while
stressing the hardware and randomly disabling CPU cores, but none of the
expected SMI events occurred.
Even more surprisingly, tracing with 'hwlat' in the kernel for a while
produced not a single trace entry. Seems the hardware is really tame, at
least in that aspect.
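
As a cross-check on which interrupt lines actually stall (the IRQ 25 /
IRQ 29 pattern quoted above), a minimal sketch that diffs two snapshots
of /proc/interrupts could look like the following. The function names
parse_interrupts and stalled_irqs are made up for illustration, and the
parser assumes the usual layout (a header row of CPU columns, then one
"label: counts... description" row per interrupt). This is not something
we ran as part of the tests described here.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (total count over all CPUs, description).

    Assumes the common /proc/interrupts layout; rows such as ERR/MIS
    that carry fewer counters than there are CPUs are handled as well.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        result[label.strip()] = (sum(counts), description)
    return result


def stalled_irqs(before, after):
    """Labels that fired at least once but did not advance between snapshots."""
    return sorted(
        label
        for label, (count, _) in after.items()
        if label in before and count > 0 and before[label][0] == count
    )


if __name__ == "__main__":
    # Hypothetical usage: snapshot, wait while exercising the devices, snapshot again.
    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(5)
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    for label in stalled_irqs(first, second):
        print(label, second[label][1])
```

Note that an idle device also shows up as "stalled" here, so the result
only means something while USB and network traffic is being generated
during the wait.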

Andrej
