On 28/01/2019 07:13, Zheng Xiang wrote: > Hi Marc, > > Thanks for your review. > > On 2019/1/26 19:38, Marc Zyngier wrote: >> Hi Zheng, >> >> On Sat, 26 Jan 2019 06:16:24 +0000, >> Zheng Xiang <zhengxia...@huawei.com> wrote: >>> >>> Currently each PCI device under a PCI Bridge shares the same device id >>> and ITS device. Assume there are two PCI devices call its_msi_prepare >>> concurrently and they are both going to find and create their ITS >>> device. There is a chance that the later one couldn't find ITS device >>> before the other one creating the ITS device. It will cause the later >>> one to create a different ITS device even if they have the same >>> device_id. >> >> Interesting finding. Is this something you've actually seen in practice >> with two devices being probed in parallel? Or something that you found >> by inspection? > > Yes, I find this problem after analyzing the reason of VM hung. At last, I > find that the virtio-gpu cannot receive the MSI interrupts due to sharing > a same event_id as virtio-serial. > > See https://lkml.org/lkml/2019/1/10/299 for the bug report. > > This problem can be reproducted with high probability by booting a Qemu/KVM > VM with a virtio-serial controller and a virtio-gpu adding to a PCI Bridge > and also adding some delay before creating ITS device.
Fair enough. Do you mind sharing your QEMU command line? It'd be useful if I could reproduce it here (and would give me a way to check that it doesn't regress). > >> >> The whole RID aliasing is such a mess, I wish we never supported >> it. Anyway, comments below. >> >>> >>> Signed-off-by: Zheng Xiang <zhengxia...@huawei.com> >>> --- >>> drivers/irqchip/irq-gic-v3-its.c | 52 >>> +++++++++++++++------------------------- >>> 1 file changed, 19 insertions(+), 33 deletions(-) >>> >>> diff --git a/drivers/irqchip/irq-gic-v3-its.c >>> b/drivers/irqchip/irq-gic-v3-its.c >>> index db20e99..397edc8 100644 >>> --- a/drivers/irqchip/irq-gic-v3-its.c >>> +++ b/drivers/irqchip/irq-gic-v3-its.c >>> @@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void) >>> raw_spin_unlock(&its_lock); >>> } >>> >>> -static struct its_device *its_find_device(struct its_node *its, u32 dev_id) >>> -{ >>> - struct its_device *its_dev = NULL, *tmp; >>> - unsigned long flags; >>> - >>> - raw_spin_lock_irqsave(&its->lock, flags); >>> - >>> - list_for_each_entry(tmp, &its->its_device_list, entry) { >>> - if (tmp->device_id == dev_id) { >>> - its_dev = tmp; >>> - break; >>> - } >>> - } >>> - >>> - raw_spin_unlock_irqrestore(&its->lock, flags); >>> - >>> - return its_dev; >>> -} >>> - >>> static struct its_baser *its_get_baser(struct its_node *its, u32 type) >>> { >>> int i; >>> @@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id) >>> static struct its_device *its_create_device(struct its_node *its, u32 >>> dev_id, >>> int nvecs, bool alloc_lpis) >>> { >>> - struct its_device *dev; >>> + struct its_device *dev = NULL, *tmp; >>> unsigned long *lpi_map = NULL; >>> unsigned long flags; >>> u16 *col_map = NULL; >>> @@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct >>> its_node *its, u32 dev_id, >>> int nr_ites; >>> int sz; >>> >>> + raw_spin_lock_irqsave(&its->lock, flags); >>> + list_for_each_entry(tmp, &its->its_device_list, entry) { >>> + if (tmp->device_id == dev_id) { >>> + dev = tmp; >>> + break; >>> + } >>> + } >>> + if (dev) { >>> + /* >>> + * We already have seen this ID, probably through >>> + * another alias (PCI bridge of some sort). No need to >>> + * create the device. >>> + */ >>> + pr_debug("Reusing ITT for devID %x\n", dev_id); >>> + raw_spin_unlock_irqrestore(&its->lock, flags); >>> + return dev; >>> + } >>> + >>> if (!its_alloc_device_table(its, dev_id)) >> >> You're now performing all sort of allocations in an atomic context, >> which is pretty horrible (and the kernel will shout at you for doing >> so). >> >> We could probably keep the current logic and wrap it around a mutex >> instead, which would give us the appropriate guarantees WRT allocations. >> Something along those lines (untested):> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c >> b/drivers/irqchip/irq-gic-v3-its.c >> index db20e992a40f..99feb62e63ba 100644 >> --- a/drivers/irqchip/irq-gic-v3-its.c >> +++ b/drivers/irqchip/irq-gic-v3-its.c >> @@ -97,9 +97,14 @@ struct its_device; >> * The ITS structure - contains most of the infrastructure, with the >> * top-level MSI domain, the command queue, the collections, and the >> * list of devices writing to it. >> + * >> + * alloc_lock has to be taken for any allocation that can happen at >> + * run time, while the spinlock must be taken to parse data structures >> + * such as the device list. >> */ >> struct its_node { >> raw_spinlock_t lock; >> + struct mutex alloc_lock; >> struct list_head entry; >> void __iomem *base; >> phys_addr_t phys_base; >> @@ -2421,6 +2426,7 @@ static int its_msi_prepare(struct irq_domain *domain, >> struct device *dev, >> struct its_device *its_dev; >> struct msi_domain_info *msi_info; >> u32 dev_id; >> + int err = 0; >> >> /* >> * We ignore "dev" entierely, and rely on the dev_id that has >> @@ -2443,6 +2449,7 @@ static int its_msi_prepare(struct irq_domain *domain, >> struct device *dev, >> return -EINVAL; >> } >> >> + mutex_lock(&its->alloc_lock); >> its_dev = its_find_device(its, dev_id); >> if (its_dev) { >> /* >> @@ -2455,11 +2462,14 @@ static int its_msi_prepare(struct irq_domain >> *domain, struct device *dev, >> } >> >> its_dev = its_create_device(its, dev_id, nvec, true); >> - if (!its_dev) >> - return -ENOMEM; >> + if (!its_dev) { >> + err = -ENOMEM; >> + goto out; >> + } >> >> pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec)); >> out: >> + mutex_unlock(&its->alloc_lock); >> info->scratchpad[0].ptr = its_dev; >> return 0; > > Should it return *err* here? Absolutely. Does it fix the problem for you? Thanks, M. -- Jazz is not dead. It just smells funny...