pci: Unplug remaining devices on pcihost reset

David Hildenbrand Tue, 29 Jan 2019 08:17:49 -0800

On 29.01.19 14:50, Pierre Morel wrote:
> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>> someone with access to the documentation comment?
>>>>
>>>> Ping. Any comments from the IBM folks?
> 
> Hi,
> 
> Sorry to have wait so long.
> At least Collin was faster.
> 
>>>>
>>>
>>>
>>> So the idea here is that if we have a PCI device that is the process of
>>> being deconfigured and we are also in the middle of a reset, then let's
>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
> 
> to me too.
> However, how do we ensure that the guest got time to respond to the 
> first deconfigure request?


30 seconds, then the reboot. On a reboot, I don't see why we should give
a guest more time. "It's dead", rip out the card as the guest refused to
hand it back. (maybe it crashed! but after a reboot the guest state is
reset and baught back to life)

> 
>>>
>>> Note:
>>>
>>> The callback function will deconfigure the the device and put it into
>>> standby mode. However, a PCI device should only go into standby from the
>>> *disabled state* (which it could already be in due to the unplug
>>> sequence), or from a *permanent error state* (something we should
>>> hopefully never see -- this means something went seriously wrong with
>>> the device).
> 
> Not completely exact, the CHSC event 0x304, on the initiative of the 
> host, force the "deconfigure state" from "configure state" generally, 
> whatever sub-state it has (enabled/disabled/error...).
> 
>>
>> Right, this should already have been checked before setting up the timer.
> 
> Apropos timer, do we need a timer or wouldn't it be better to use a 
> delay / a timer + condition?

I don't think we need a timer at all.

> 
> AFAIU we get out of the unplug without waiting for any answer from the 
> guest and we surely get the timer triggering after the reset has been done.
> That seems bad.

This is the case right now, correct.

> 
> 
>>
>>>
>>> Two things I'm concerned about:
>>>
>>> 1)
>>>
>>> What I would suggest is adding a check for the pbdev->state for
>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>> states, then we're safe to deconfigure and put into standby. If the
>>
>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>
>> So for
>> - ZPCI_FS_DISABLED
>> - ZPCI_FS_ENABLED
>> - ZPCI_FS_BLOCKED
>> - ZPCI_FS_ERROR
>> - ZPCI_FS_PERMANENT_ERROR
>>
>> We setup a timer and simply go ahead and unplug the device when the
>> timer expires ("forced unplug").
> 
> I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other 
> states?
> ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but 
> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST 
> and we do not need a timer (or any delay)

You can always expect that your guest driver is dead.

> 
>>
>> Changing that behavior might be more invasive. Simply not unplugging in
>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>> issued a request and that requests is simply lost/ignored. Not sure if
>> that is what we want. I think we need separate patches to change
>> something like that. Especially
>>
>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>> timer expires? These are corner cases to consider.
> 
> +1, we must ensure to do the work inside the unplug CB.
> 
>>
>> 2. Do we need a timer at all? Now that Patch #1 introduces
>> unplug_requests, we are free to ignore requests from the user if the
>> guest is not reacting. I would really favor getting rid of the timer
>> completely. Was there a special reason why this was introduced?
> 
> Yes, to let a chance to the guest to smoothly relinquish the device.
> (for example sync/clean the disk)
> However I do not think it is right implemented.
> 
>>
>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>> flag to remember if a request is pending. I would really favor going
>> into that direction.
> 
> I am not sure that the Intel architecture is a good example. :)

Right, we all learned that zPCI did it better. (sorry ;) )

> 
> AFAIU they do not wait for the guest to have relinquish the device.
> Or do they?
> How long do they wait?

They wait for ever. And I guess we should do the same thing. If the
guest driver is broken (and this is really a rare scenario!) we would
not get the device back. Which is perfectly fine in my point of view. In
all other scenarios I guess the guest will simply respond in a timely
manner. And ripping out stuff from the guest always feels wrong. (yes
the channel subsystem works like that, but here we actually have a choice)

If we reboot, we can unplug the device. Otherwise, let's keep it simply
and don't use a timer.

Thanks!

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset

Reply via email to