Hi Julien,
On 5/1/2024 4:13 AM, Julien Grall wrote:
Hi Henry,
On 30/04/2024 04:50, Henry Wang wrote:
On 4/25/2024 10:28 PM, Julien Grall wrote:
Thanks for your feedback. After checking the b8577547236f commit
message I think I now understand your point. Do you have any
suggestion about how I can properly add the support to route/remove
the IRQ to running domains? Thanks.
I spent some time going through the GIC/vGIC code and had some
discussions with Stefano and Stewart during the last couple of days,
let me see if I can describe the use case properly now to continue
the discussion:
We have some use cases that require assigning devices to domains
after domain boot time. For example, suppose there is an FPGA on the
board which can simulate a device, and the bitstream for the FPGA is
provided and programmed after domain boot. So we need a way to assign
the device to the running domain. This series tries to implement this
use case by using a device tree overlay - users can first add the
overlay to the Xen dtb, assign the device in the overlay to a domain
with the xl command, then apply the overlay to Linux.
Thanks for the description! This helps to understand your goal :).
Thank you very much for spending your time discussing this and
providing these valuable comments!
I haven't really looked at that code in quite a while. I think we need
to make sure that the virtual and physical IRQ state matches at the
time we do the routing.
I am undecided on whether we want to simply prevent the action from
happening or try to reset the state.
There is also the question of what to do if the guest is enabling
the vIRQ before it is routed.
Sorry for bothering, would you mind elaborating a bit more about the
two cases that you mentioned above? Commit b8577547236f ("xen/arm:
Restrict when a physical IRQ can be routed/removed from/to a domain")
only said there will be undesirable effects, so I am not sure if I
understand the concerns raised above and the consequences of these
two use cases.
I will try to explain them below after I answer the rest.
I am probably wrong, but I think when we add the overlay, we are
probably fine as the interrupt is not being used before.
What if the DT overlay is unloaded and then reloaded? Wouldn't the
same interrupt be re-used? As a more generic case, this could also be
a new bitstream for the FPGA.
But even if the interrupt is brand new every time for the DT overlay,
you are effectively relaxing the check for every user (such as
XEN_DOMCTL_bind_pt_irq). So the interrupt re-use case needs to be
taken into account.
I agree. IIUC, with your explanation here and below, could we
simplify the problem to how to properly handle the removal of the IRQ
from a running guest, if we always properly remove and clean up the
information when removing the IRQ from the guest? In this way, the IRQ can
always be viewed as a brand new one when we add it back. Then the only
corner case that we need to take care of would be...
Also since we only load the device driver after the IRQ is routed to
the guest,
This is what a well-behaved guest will do. However, we need to think
what will happen if a guest misbehaves. I am not concerned about a
guest only impacting itself, I am more concerned about the case where
the rest of the system is impacted.
I am not sure the guest can enable the vIRQ before it is routed.
Xen allows the guest to enable a vIRQ even if there is no pIRQ
assigned. Thankfully, it looks like vgic_connect_hw_irq(), in
both the current and new vGIC, will return an error if we are trying
to route a pIRQ to an already enabled vIRQ.
But we need to investigate all the possible scenarios to make sure
that any inconsistencies between the physical state and virtual state
(including the LRs) will not result in a bigger problem.
The one that comes to my mind is: The physical interrupt is
de-assigned from the guest before it was EOIed. In this case, the
interrupt will still be in the LR with the HW bit set. This would
allow the guest to EOI the interrupt even if it is routed to someone
else. It is unclear what would be the impact on the other guest.
...same as this case, i.e. the case where
test_bit(_IRQ_INPROGRESS, &desc->status) || !test_bit(_IRQ_DISABLED,
&desc->status) holds when we try to remove the IRQ from a running domain.
We have 3 possible states which can be read from the LR for this case:
active, pending, and pending and active.
- I don't think we can do anything about the active state, so we should
return -EBUSY and reject the whole operation of removing the IRQ from
a running guest; the user can always retry this operation.
- For the pending (and active) case, can we clear the LR and point the
LR for the pending_irq to invalid?
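
To make the proposal above concrete, here is a minimal standalone sketch
of the decision I have in mind. It is only a model, not Xen code: the
struct model_lr type, the lr_state enum and model_release_irq() are
hypothetical names made up for illustration, and the real LR layout and
vGIC helpers are different. I am also assuming that "pending (and
active)" covers both the pending and the pending-and-active states;
whether pending-and-active should instead be rejected like active is
still an open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state field of a list register (LR). */
enum lr_state {
    LR_INVALID = 0,
    LR_PENDING,
    LR_ACTIVE,
    LR_PENDING_ACTIVE,
};

/* Hypothetical, simplified LR entry; not the real GIC/Xen layout. */
struct model_lr {
    enum lr_state state;
    bool hw;            /* models the HW bit tying the vIRQ to a pIRQ */
    unsigned int virq;
};

/*
 * Decide whether the IRQ behind this LR can be removed from a running
 * guest. Returns 0 when the LR is (or has been made) invalid, and
 * -EBUSY when the guest still has to EOI the interrupt.
 */
static int model_release_irq(struct model_lr *lr)
{
    assert(lr != NULL);

    switch ( lr->state )
    {
    case LR_ACTIVE:
        /* Nothing we can do: reject and let the user retry later. */
        return -EBUSY;

    case LR_PENDING:
    case LR_PENDING_ACTIVE: /* assumption: handled like pending */
        /* Clear the LR so the guest can no longer EOI the pIRQ. */
        lr->state = LR_INVALID;
        lr->hw = false;
        return 0;

    case LR_INVALID:
        /* Nothing in flight, the IRQ can be removed directly. */
        return 0;
    }

    return -EINVAL;
}
```

With this model, a caller would retry the removal for as long as
model_release_irq() returns -EBUSY, and only hand the pIRQ to another
domain once it returns 0.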
Kind regards,
Henry
Cheers,