Hi Bruce,
On 5/29/2018 7:20 PM, Bruce Richardson wrote:
On Thu, May 24, 2018 at 07:55:43AM +0100, Guo, Jia wrote:
<snip>
The hot-plug failure handling mechanism is proposed as follows:
1. Add a new bus op, "handle_hot_unplug", to handle bus read/write
errors. It is bus-specific, so each kind of bus can implement its own
logic.
2. Implement the PCI bus-specific op "pci_handle_hot_unplug". Based on
the failure address, this function remaps the memory that belongs to
the unplugged device.
3. Implement a new SIGBUS handler and register it when device event
monitoring starts. Once an MMIO access raises SIGBUS, the handler
triggers the failure handling mechanism above. This keeps an app that
is doing packet processing from breaking or crashing, so it can carry
on with cleanup, fail-safe handling, or other work.
4. Also introduce a testpmd example that shows the whole procedure:
device unplug -> failure handling -> stop forwarding -> stop port ->
close port -> detach port.
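A minimal sketch of items 1 and 2, assuming a simple per-bus ops table: the generic layer asks each bus in turn whether the faulting address belongs to one of its devices, and the PCI implementation matches it against each device's BAR range. All names (hp_bus, hp_pci_dev, handle_bus_failure) and the BAR addresses are hypothetical, not DPDK's actual structures, and a real PCI handler would remap the BAR rather than only flag the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus ops table: each bus type supplies its own handler. */
struct hp_bus {
    const char *name;
    /* Returns 0 if the address belonged to a device on this bus and was
     * handled, -1 otherwise. */
    int (*handle_hot_unplug)(const void *failure_addr);
};

/* Hypothetical PCI device record with a single mapped BAR. */
struct hp_pci_dev {
    const char *bdf;
    uintptr_t bar_start;
    size_t bar_len;
    int unplugged;
};

static struct hp_pci_dev pci_devs[] = {
    { "0000:01:00.0", 0x10000, 0x1000, 0 },
    { "0000:02:00.0", 0x20000, 0x1000, 0 },
};

/* PCI-specific op: find the device whose BAR contains the address. */
static int pci_handle_hot_unplug(const void *failure_addr)
{
    uintptr_t a = (uintptr_t)failure_addr;

    for (size_t i = 0; i < sizeof(pci_devs) / sizeof(pci_devs[0]); i++) {
        struct hp_pci_dev *d = &pci_devs[i];

        if (a >= d->bar_start && a < d->bar_start + d->bar_len) {
            /* A real implementation would remap the BAR here. */
            d->unplugged = 1;
            return 0;
        }
    }
    return -1;
}

static struct hp_bus buses[] = {
    { "pci", pci_handle_hot_unplug },
};

/* Generic layer: let each bus try to claim the faulting address. */
static int handle_bus_failure(const void *failure_addr)
{
    /* Callers pass the address reported with SIGBUS, never NULL. */
    assert(failure_addr != NULL);
    for (size_t i = 0; i < sizeof(buses) / sizeof(buses[0]); i++) {
        if (buses[i].handle_hot_unplug != NULL &&
            buses[i].handle_hot_unplug(failure_addr) == 0)
            return 0;
    }
    return -1;
}
```

Keeping the lookup behind a per-bus op means the generic SIGBUS path needs no knowledge of PCI; another bus type would simply add its own entry to the table.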
Hi Jeff,
So if I understand this correctly, the proposal is that we need two
parallel solutions to handle safe removal of a device:
1. We need a solution to support unplugging of the device at the bus
level, i.e. remove the device from the list of devices and make
access to that device invalid.
2. Since the removal of the device from the software lists is not going
to be instantaneous, we need a mechanism to handle any accesses to the
device from the data path until the removal is complete. To support
that, you propose to add a SIGBUS handler which will automatically
replace any MMIO BAR mappings with some other memory that is OK to
access, presumably zeroed memory or similar.
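A minimal, Linux-only sketch of point #2, assuming the replacement memory is an anonymous zero-filled page: the handler reads si_addr, rounds it down to a page boundary and maps fresh memory over it with MAP_FIXED, so the faulting access is retried and succeeds. A shared file mapping that extends past end-of-file stands in for an unplugged device's BAR, since touching it also raises SIGBUS. The function names are hypothetical, and because POSIX does not list mmap() as async-signal-safe, this relies on Linux behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS, sigaction, mkstemp under strict -std */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size; /* set before the handler is installed */

/* Replace the faulting page with anonymous zero-filled memory so the
 * interrupted load/store can be retried safely. */
static void sigbus_remap_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    void *p = mmap((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        _exit(1); /* cannot recover */
}

static int install_sigbus_remap(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_remap_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Stand-in for a vanished BAR: a two-page shared mapping of a one-page
 * file. Returns the byte read after recovery, or -1 on setup failure. */
static int sigbus_remap_demo(void)
{
    char path[] = "/tmp/sigbus_demo_XXXXXX";
    volatile uint8_t *map;
    int fd, v;

    page_size = sysconf(_SC_PAGESIZE);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, page_size) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, 2 * (size_t)page_size, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;
    if (install_sigbus_remap() != 0) {
        munmap((void *)map, 2 * (size_t)page_size);
        return -1;
    }
    v = map[page_size + 8]; /* past EOF: SIGBUS, remap, load retried */
    assert(v == 0);         /* the replacement page is zero-filled */
    munmap((void *)map, 2 * (size_t)page_size);
    return v;
}
```

After the remap, reads from the dead region return zero and writes land in the scratch page instead of the device, so the data path keeps running until the control path stops and detaches the port.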
Is this understanding correct?
I think you are correct about that.
Point #2 seems reasonably clear to me, but for #1, presumably the trigger
to the bus needs to come from the kernel. What is planned to be used there?
About point #1, I should clarify that we will use the device event
monitor mechanism to detect the hot-unplug event. The monitor is
enabled by the app (or the fail-safe driver), which also registers an
event callback. Once a hot-unplug is detected, the user's callback is
triggered to let the app (or fail-safe driver) know about the event and
manage the process: it will call APIs to stop the device and detach it
from the bus.
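The notification path described above (the app enables the monitor, registers a callback, and the callback drives the teardown) can be sketched with a tiny callback registry. This is an illustrative mock: hp_event_callback_register, hp_event_dispatch and the event enum are hypothetical stand-ins, not EAL's real device event API. A real monitor would call the dispatch function from its own context when the kernel reports a removal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum hp_event_type { HP_EVENT_ADD, HP_EVENT_REMOVE };

typedef void (*hp_event_cb)(const char *dev_name, enum hp_event_type ev,
                            void *arg);

#define HP_MAX_CBS 8

static struct {
    hp_event_cb fn;
    void *arg;
} hp_cbs[HP_MAX_CBS];
static size_t hp_ncbs;

/* Register an application callback; returns 0 on success. */
static int hp_event_callback_register(hp_event_cb fn, void *arg)
{
    if (fn == NULL || hp_ncbs == HP_MAX_CBS)
        return -1;
    hp_cbs[hp_ncbs].fn = fn;
    hp_cbs[hp_ncbs].arg = arg;
    hp_ncbs++;
    return 0;
}

/* Called by the monitor to notify every registered callback. */
static void hp_event_dispatch(const char *dev_name, enum hp_event_type ev)
{
    for (size_t i = 0; i < hp_ncbs; i++)
        hp_cbs[i].fn(dev_name, ev, hp_cbs[i].arg);
}

/* Example app callback: only records which device must be torn down.
 * The stop/close/detach work itself would be deferred to a control
 * thread rather than done inside the callback. */
struct app_state {
    char pending_removal[32];
    int removals;
};

static void app_on_device_event(const char *dev_name, enum hp_event_type ev,
                                void *arg)
{
    struct app_state *st = arg;

    if (ev != HP_EVENT_REMOVE)
        return;
    strncpy(st->pending_removal, dev_name, sizeof(st->pending_removal) - 1);
    st->pending_removal[sizeof(st->pending_removal) - 1] = '\0';
    st->removals++;
}
```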
You also talk about using testpmd as a reference for this, but you don't
explain how an application can be notified of a device removal, or even why
that is necessary. Since all applications should now be using the proper
macros when iterating device lists, and not just assuming devices 0-N are
valid, what changes would you see a normal app having to make to be
hotplug-safe?
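To illustrate the point about iteration macros: an app that walks ports through an iterator which skips unattached entries never touches a detached port, whereas a plain loop over 0..N-1 would. The sketch mirrors the idea behind DPDK's RTE_ETH_FOREACH_DEV using hypothetical names (HP_FOREACH_PORT, hp_next_port, hp_ports); it is not the real macro or its implementation.

```c
#include <assert.h>
#include <stdint.h>

#define HP_MAX_PORTS 4

enum hp_port_state { HP_PORT_UNUSED, HP_PORT_ATTACHED };

/* Hypothetical port table; zero-initialised entries are unused. */
static enum hp_port_state hp_ports[HP_MAX_PORTS];

/* Return the first attached port id >= p, or HP_MAX_PORTS if none. */
static uint16_t hp_next_port(uint16_t p)
{
    while (p < HP_MAX_PORTS && hp_ports[p] != HP_PORT_ATTACHED)
        p++;
    return p;
}

/* Visit only attached ports. */
#define HP_FOREACH_PORT(p)                                  \
    for ((p) = hp_next_port(0); (p) < HP_MAX_PORTS;         \
         (p) = hp_next_port((uint16_t)((p) + 1)))

static unsigned count_attached(void)
{
    uint16_t p;
    unsigned n = 0;

    HP_FOREACH_PORT(p)
        n++;
    return n;
}
```

Once a removed port's entry is marked unused, every loop written this way stops seeing it without further changes to the app.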
Either an app or the fail-safe driver could use these mechanisms, but at
this stage I will first use testpmd as the reference.
As in the reply above, testpmd should enable the device event mechanism
to monitor device removal and register a callback. The device BDF list
is managed by the bus, and the hot-plug failure handling is done in the
EAL layer, so the app would be hotplug-safe.
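The testpmd teardown order from the proposal (stop forwarding -> stop port -> close port -> detach port) can be sketched as a function that walks a port through those states. The struct and step names are hypothetical flags standing in for testpmd's real port handling; the point is only the ordering, with the data path stopped first so no lcore touches the port while it is being closed and detached.

```c
#include <assert.h>
#include <stddef.h>

enum hp_step {
    STEP_STOP_FWD,
    STEP_STOP_PORT,
    STEP_CLOSE_PORT,
    STEP_DETACH_PORT,
    STEP_COUNT
};

/* Hypothetical per-port flags standing in for real port state. */
struct hp_port {
    int forwarding, started, open, attached;
};

/* Tear down a port after its device was reported removed. Steps that
 * are already done are skipped; executed steps are recorded in order.
 * Returns the number of steps performed. */
static size_t hp_teardown_port(struct hp_port *pt,
                               enum hp_step steps[STEP_COUNT])
{
    size_t n = 0;

    if (pt->forwarding) { pt->forwarding = 0; steps[n++] = STEP_STOP_FWD; }
    if (pt->started)    { pt->started = 0;    steps[n++] = STEP_STOP_PORT; }
    if (pt->open)       { pt->open = 0;       steps[n++] = STEP_CLOSE_PORT; }
    if (pt->attached)   { pt->attached = 0;   steps[n++] = STEP_DETACH_PORT; }
    return n;
}
```

Skipping steps that are already done keeps the sequence safe to run on a port that was, for example, stopped but never closed.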
Is there anything I missed? Please shout, and I will add more detail
later.
Regards,
/Bruce