Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:
> So at this point, the AMD IOMMU driver does:
>
> 	swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
>
> where 'swiotlb' is a global variable indicating whether or not swiotlb
> is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
> will call swiotlb_exit() if 'swiotlb' is false.
>
> Now, that used to work fine, because swiotlb_exit() clears
> 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
> think that all the devices which have successfully probed beforehand will
> have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
> field.

Yeah.  I don't think we can do that anymore, and I also think it is
a bad idea to start with.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On Mon, Jul 5, 2021 at 8:50 PM Stefan Hajnoczi wrote:
>
> On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> >
> > On 2021/7/4 5:49 PM, Yongji Xie wrote:
> > > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > > configuration space is generally used for rarely-changing or
> > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
> > > > > ioctl should not be called frequently.
> > > > The spec uses MUST and other terms to define the precise requirements.
> > > > Here the language (especially the word "generally") is weaker and means
> > > > there may be exceptions.
> > > >
> > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
> > > > approach is reads that have side-effects. For example, imagine a field
> > > > containing an error code if the device encounters a problem unrelated to
> > > > a specific virtqueue request. Reading from this field resets the error
> > > > code to 0, saving the driver an extra configuration space write access
> > > > and possibly race conditions. It isn't possible to implement those
> > > > semantics using VDUSE_DEV_SET_CONFIG. It's another corner case, but it
> > > > makes me think that the interface does not allow full VIRTIO semantics.
> >
> > Note that though you're correct, my understanding is that config space is
> > not suitable for this kind of error propagating. And it would be very hard
> > to implement such semantics in some transports. Virtqueue should be
> > much better. As Yongji quoted, the config space is used for
> > "rarely-changing or initialization-time parameters".
> >
> > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > > handle the message failure, I'm going to add a return value to
> > > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > > be propagated to the virtio device driver. Then the virtio-blk device
> > > driver can be modified to handle that.
> > >
> > > Jason and Stefan, what do you think of this way?
>
> Why does VDUSE_DEV_GET_CONFIG need to support an error return value?

We add a timeout and return an error in case userspace never replies to
the message.

> The VIRTIO spec provides no way for the device to report errors from
> config space accesses.
>
> The QEMU virtio-pci implementation returns -1 from invalid
> virtio_config_read*() and silently discards virtio_config_write*()
> accesses.
>
> VDUSE can take the same approach with
> VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.

I noticed that virtio_config_read*() only returns -1 when we access an
invalid field. But in the VDUSE case, VDUSE_DEV_GET_CONFIG might fail
when we access a valid field. Not sure if it's OK to silently ignore
this kind of error.

Thanks,
Yongji
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On 2021/7/5 8:49 PM, Stefan Hajnoczi wrote:
> On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> > On 2021/7/4 5:49 PM, Yongji Xie wrote:
> > > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > > configuration space is generally used for rarely-changing or
> > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
> > > > > ioctl should not be called frequently.
> > > > The spec uses MUST and other terms to define the precise requirements.
> > > > Here the language (especially the word "generally") is weaker and means
> > > > there may be exceptions.
> > > >
> > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
> > > > approach is reads that have side-effects. For example, imagine a field
> > > > containing an error code if the device encounters a problem unrelated to
> > > > a specific virtqueue request. Reading from this field resets the error
> > > > code to 0, saving the driver an extra configuration space write access
> > > > and possibly race conditions. It isn't possible to implement those
> > > > semantics using VDUSE_DEV_SET_CONFIG. It's another corner case, but it
> > > > makes me think that the interface does not allow full VIRTIO semantics.
> >
> > Note that though you're correct, my understanding is that config space is
> > not suitable for this kind of error propagating. And it would be very hard
> > to implement such semantics in some transports. Virtqueue should be
> > much better. As Yongji quoted, the config space is used for
> > "rarely-changing or initialization-time parameters".
> >
> > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > > handle the message failure, I'm going to add a return value to
> > > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > > be propagated to the virtio device driver. Then the virtio-blk device
> > > driver can be modified to handle that.
> > >
> > > Jason and Stefan, what do you think of this way?
>
> Why does VDUSE_DEV_GET_CONFIG need to support an error return value?
> The QEMU virtio-pci implementation returns -1 from invalid
> virtio_config_read*() and silently discards virtio_config_write*()
> accesses.
>
> VDUSE can take the same approach with
> VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.
>
> > I'd like to stick to the current assumption that get_config won't fail.
> > That is to say,
> >
> > 1) maintain a config in the kernel, make sure the config space read can
> > always succeed
> > 2) introduce an ioctl for the vduse userspace to update the config space.
> > 3) we can synchronize with the vduse userspace during set_config
> >
> > Does this work?
>
> I noticed that caching is also allowed by the vhost-user protocol
> messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't
> know whether or not caching is in effect. The interface you outlined
> above requires caching.
>
> Is there a reason why the host kernel vDPA code needs to cache the
> configuration space?

Because:

1) The kernel can not wait forever in get_config(); this is the major
difference with vhost-user.
2) Stick to the current assumption that virtio_cread() should always
succeed.

Thanks

> Here are the vhost-user protocol messages:
>
> Virtio device config space
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> +--------+------+-------+---------+
> | offset | size | flags | payload |
> +--------+------+-------+---------+
>
> :offset: a 32-bit offset of virtio device's configuration space
> :size: a 32-bit configuration space access size in bytes
> :flags: a 32-bit value:
>   - 0: Vhost master messages used for writeable fields
>   - 1: Vhost master messages used for live migration
> :payload: Size bytes array holding the contents of the virtio
>   device's configuration space
>
> ...
>
> ``VHOST_USER_GET_CONFIG``
> :id: 24
> :equivalent ioctl: N/A
> :master payload: virtio device config space
> :slave payload: virtio device config space
>
> When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
> submitted by the vhost-user master to fetch the contents of the virtio
> device configuration space, vhost-user slave's payload size MUST match
> master's request, vhost-user slave uses zero length of payload to
> indicate an error to vhost-user master.
The vhost-user master may cache the contents to avoid repeated
``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
:id: 25
:equivalent ioctl: N/A
:master payload: virtio device config space
:slave payload: N/A

When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
submitted by the vhost-user master when the Guest changes the virtio
device configuration space and also can be used for live migration on
the destination host. The vhost-user slave must check the flags field,
and slaves MUST NOT accept SET_CONFIG for read-only configuration space
fields unless the live migration bit is set.

Stefan
[RFC PATCH 1/1] dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()
The following warning occurred sporadically on s390:

DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or rodata [addr=48cc5e2f] [len=131072]
WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 check_for_illegal_area+0xa8/0x138

It is a false-positive warning, due to broken logic in debug_dma_map_sg():
check_for_illegal_area() should check for overlap of sg elements with
kernel text or rodata, but it is called with sg_dma_len(s) instead of
s->length as a parameter. After the call to ->map_sg(), sg_dma_len()
contains the length of possibly combined sg elements in the DMA address
space, and not the individual sg element length, which would be
s->length. The check will then use the kernel start address of an sg
element and add the DMA length for the overlap check, which can result
in the false-positive warning because the DMA length can be larger than
the actual single sg element length in kernel address space.

In addition, the call to check_for_illegal_area() happens in the
iteration over mapped_ents, which will not include all individual sg
elements if any of them were combined in ->map_sg().

Fix this by using s->length instead of sg_dma_len(s). Also put the call
to check_for_illegal_area() in a separate loop, iterating over all the
individual sg elements ("nents" instead of "mapped_ents").
Fixes: 884d05970bfb ("dma-debug: use sg_dma_len accessor")
Tested-by: Niklas Schnelle
Signed-off-by: Gerald Schaefer
---
 kernel/dma/debug.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 14de1271463f..d7d44b7fe7e2 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -1299,6 +1299,12 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	if (unlikely(dma_debug_disabled()))
 		return;
 
+	for_each_sg(sg, s, nents, i) {
+		if (!PageHighMem(sg_page(s))) {
+			check_for_illegal_area(dev, sg_virt(s), s->length);
+		}
+	}
+
 	for_each_sg(sg, s, mapped_ents, i) {
 		entry = dma_entry_alloc();
 		if (!entry)
@@ -1316,10 +1322,6 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 
 		check_for_stack(dev, sg_page(s), s->offset);
 
-		if (!PageHighMem(sg_page(s))) {
-			check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
-		}
-
 		check_sg_segment(dev, s);
 
 		add_dma_entry(entry);
-- 
2.25.1
[RFC PATCH 0/1] dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()
The following warning occurred sporadically on s390:

DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or rodata [addr=48cc5e2f] [len=131072]
WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 check_for_illegal_area+0xa8/0x138

It is a false-positive warning, due to broken logic in
debug_dma_map_sg(), see the patch description. In short, the check mixes
up the kernel start address of sg elements with the length of possibly
combined sg elements in the DMA address space.

I am a bit confused by the whole logic, and not sure what would be the
best way to fix this. The false-positives should have been possible
since commit 884d05970bfb ("dma-debug: use sg_dma_len accessor"), which
is included since 2.6.31.

Also, it seems to me that even before that commit, the check would have
been wrong, or at least incomplete, because it is located in a loop that
iterates over mapped_ents instead of nents. So it would not check all
physical sg elements if any were combined in DMA address space.

Gerald Schaefer (1):
  dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()

 kernel/dma/debug.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

-- 
2.25.1
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Hi Nathan,

I may have just spotted something in these logs...

On Fri, Jul 02, 2021 at 10:55:17PM -0700, Nathan Chancellor wrote:
> [2.340956] pci :0c:00.1: Adding to iommu group 4
> [2.340996] pci :0c:00.2: Adding to iommu group 4
> [2.341038] pci :0c:00.3: Adding to iommu group 4
> [2.341078] pci :0c:00.4: Adding to iommu group 4
> [2.341122] pci :0c:00.6: Adding to iommu group 4
> [2.341163] pci :0d:00.0: Adding to iommu group 4
> [2.341203] pci :0d:00.1: Adding to iommu group 4
> [2.361821] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
> [2.361839] pci :00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
> [2.361846]  PPR X2APIC NX GT IA GA PC GA_vAPIC
> [2.361861] AMD-Vi: Interrupt remapping enabled
> [2.361865] AMD-Vi: Virtual APIC enabled
> [2.361870] AMD-Vi: X2APIC enabled
> [2.362272] AMD-Vi: Lazy IO/TLB flushing enabled

So at this point, the AMD IOMMU driver does:

	swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;

where 'swiotlb' is a global variable indicating whether or not swiotlb
is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
will call swiotlb_exit() if 'swiotlb' is false.

Now, that used to work fine, because swiotlb_exit() clears
'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
think that all the devices which have successfully probed beforehand will
have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
field.

Will
Re: [bug report] IB/usnic: Add Cisco VIC low-level hardware driver
On Mon, Jul 05, 2021 at 12:21:38PM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 05, 2021 at 02:47:36PM +0100, Robin Murphy wrote:
> > On 2021-07-05 11:23, Dan Carpenter wrote:
> > > [ Ancient code, but the bug seems real enough still. -dan ]
> > >
> > > Hello Upinder Malhi,
> > >
> > > The patch e3cf00d0a87f: "IB/usnic: Add Cisco VIC low-level hardware
> > > driver" from Sep 10, 2013, leads to the following static checker
> > > warning:
> > >
> > > 	drivers/iommu/iommu.c:2482 iommu_map()
> > > 	warn: sleeping in atomic context
> > >
> > > drivers/infiniband/hw/usnic/usnic_uiom.c
> > >    244  static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
> > >    245                                             struct usnic_uiom_reg *uiomr)
> > >
> > > This function is always called from usnic_uiom_reg_get() which is holding
> > > spin_lock(&pd->lock); so it can't sleep.
> >
> > FWIW back in those days it wasn't really well defined whether iommu_map()
> > was callable from non-sleeping contexts (the arch/arm DMA API code relied on
> > it, for instance). It was only formalised 2 years ago by 781ca2de89ba
> > ("iommu: Add gfp parameter to iommu_ops::map") which introduced the
> > might_sleep() check that's firing there. I guess these calls want to be
> > updated to iommu_map_atomic() now.
>
> Does this mean this driver doesn't work at all upstream? I would be
> quite interested to delete it.

It just means it hasn't been used with CONFIG_DEBUG_ATOMIC_SLEEP enabled
within the past two years.

regards,
dan carpenter
Re: [bug report] IB/usnic: Add Cisco VIC low-level hardware driver
On 2021-07-05 16:21, Jason Gunthorpe wrote:
> On Mon, Jul 05, 2021 at 02:47:36PM +0100, Robin Murphy wrote:
> > On 2021-07-05 11:23, Dan Carpenter wrote:
> > > [ Ancient code, but the bug seems real enough still. -dan ]
> > >
> > > Hello Upinder Malhi,
> > >
> > > The patch e3cf00d0a87f: "IB/usnic: Add Cisco VIC low-level hardware
> > > driver" from Sep 10, 2013, leads to the following static checker
> > > warning:
> > >
> > > 	drivers/iommu/iommu.c:2482 iommu_map()
> > > 	warn: sleeping in atomic context
> > >
> > > drivers/infiniband/hw/usnic/usnic_uiom.c
> > >    244  static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
> > >    245                                             struct usnic_uiom_reg *uiomr)
> > >
> > > This function is always called from usnic_uiom_reg_get() which is holding
> > > spin_lock(&pd->lock); so it can't sleep.
> >
> > FWIW back in those days it wasn't really well defined whether iommu_map()
> > was callable from non-sleeping contexts (the arch/arm DMA API code relied on
> > it, for instance). It was only formalised 2 years ago by 781ca2de89ba
> > ("iommu: Add gfp parameter to iommu_ops::map") which introduced the
> > might_sleep() check that's firing there. I guess these calls want to be
> > updated to iommu_map_atomic() now.
>
> Does this mean this driver doesn't work at all upstream? I would be
> quite interested to delete it.

I think the only time it's actually in trouble is on AMD hardware if one
of those iommu_map() calls has to allocate a new pagetable page and that
allocation has to go down whichever reclaim path actually sleeps.
Historically all the other IOMMU drivers it might have come into contact
with already used GFP_ATOMIC for their internal allocations anyway (AMD
was the only one using a mutex instead of a spinlock internally), and
some like intel-iommu still haven't relaxed that even now.

Robin.
Re: [bug report] IB/usnic: Add Cisco VIC low-level hardware driver
On Mon, Jul 05, 2021 at 02:47:36PM +0100, Robin Murphy wrote:
> On 2021-07-05 11:23, Dan Carpenter wrote:
> > [ Ancient code, but the bug seems real enough still. -dan ]
> >
> > Hello Upinder Malhi,
> >
> > The patch e3cf00d0a87f: "IB/usnic: Add Cisco VIC low-level hardware
> > driver" from Sep 10, 2013, leads to the following static checker
> > warning:
> >
> > 	drivers/iommu/iommu.c:2482 iommu_map()
> > 	warn: sleeping in atomic context
> >
> > drivers/infiniband/hw/usnic/usnic_uiom.c
> >    244  static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
> >    245                                             struct usnic_uiom_reg *uiomr)
> >
> > This function is always called from usnic_uiom_reg_get() which is holding
> > spin_lock(&pd->lock); so it can't sleep.
>
> FWIW back in those days it wasn't really well defined whether iommu_map()
> was callable from non-sleeping contexts (the arch/arm DMA API code relied on
> it, for instance). It was only formalised 2 years ago by 781ca2de89ba
> ("iommu: Add gfp parameter to iommu_ops::map") which introduced the
> might_sleep() check that's firing there. I guess these calls want to be
> updated to iommu_map_atomic() now.

Does this mean this driver doesn't work at all upstream? I would be
quite interested to delete it.

Jason
Re: [PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"
On 21/07/05 08:56AM, Marek Szyprowski wrote:
> QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
> which fails for the second and later IOMMU devices. This is intended and
> must not be fatal to the driver registration process. Also the cleanup
> path should take care of the runtime PM state, which is missing in the
> current patch. Revert the relevant changes to the QCOM IOMMU driver until
> a proper fix is prepared.

Apologies for the broken patch; I don't have an arm machine to test the
patches. Is this bug unique to the qcom iommu?

[...]

Thanks,
Amey
Re: [bug report] IB/usnic: Add Cisco VIC low-level hardware driver
On 2021-07-05 11:23, Dan Carpenter wrote:
> [ Ancient code, but the bug seems real enough still. -dan ]
>
> Hello Upinder Malhi,
>
> The patch e3cf00d0a87f: "IB/usnic: Add Cisco VIC low-level hardware
> driver" from Sep 10, 2013, leads to the following static checker
> warning:
>
> 	drivers/iommu/iommu.c:2482 iommu_map()
> 	warn: sleeping in atomic context
>
> drivers/infiniband/hw/usnic/usnic_uiom.c
>    244  static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
>    245                                             struct usnic_uiom_reg *uiomr)
>
> This function is always called from usnic_uiom_reg_get() which is holding
> spin_lock(&pd->lock); so it can't sleep.

FWIW back in those days it wasn't really well defined whether iommu_map()
was callable from non-sleeping contexts (the arch/arm DMA API code relied
on it, for instance). It was only formalised 2 years ago by 781ca2de89ba
("iommu: Add gfp parameter to iommu_ops::map") which introduced the
might_sleep() check that's firing there. I guess these calls want to be
updated to iommu_map_atomic() now.

Robin.

   246  {
   247          int i, err;
   248          size_t size;
   249          struct usnic_uiom_chunk *chunk;
   250          struct usnic_uiom_interval_node *interval_node;
   251          dma_addr_t pa;
   252          dma_addr_t pa_start = 0;
   253          dma_addr_t pa_end = 0;
   254          long int va_start = -EINVAL;
   255          struct usnic_uiom_pd *pd = uiomr->pd;
   256          long int va = uiomr->va & PAGE_MASK;
   257          int flags = IOMMU_READ | IOMMU_CACHE;
   258
   259          flags |= (uiomr->writable) ? IOMMU_WRITE : 0;
   260          chunk = list_first_entry(&uiomr->chunk_list, struct usnic_uiom_chunk,
   261                                   list);
   262          list_for_each_entry(interval_node, intervals, link) {
   263  iter_chunk:
   264                  for (i = 0; i < chunk->nents; i++, va += PAGE_SIZE) {
   265                          pa = sg_phys(&chunk->page_list[i]);
   266                          if ((va >> PAGE_SHIFT) < interval_node->start)
   267                                  continue;
   268
   269                          if ((va >> PAGE_SHIFT) == interval_node->start) {
   270                                  /* First page of the interval */
   271                                  va_start = va;
   272                                  pa_start = pa;
   273                                  pa_end = pa;
   274                          }
   275
   276                          WARN_ON(va_start == -EINVAL);
   277
   278                          if ((pa_end + PAGE_SIZE != pa) &&
   279                              (pa != pa_start)) {
   280                                  /* PAs are not contiguous */
   281                                  size = pa_end - pa_start + PAGE_SIZE;
   282                                  usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x",
   283                                            va_start, &pa_start, size, flags);
   284                                  err = iommu_map(pd->domain, va_start, pa_start,
   285                                                  size, flags);

The iommu_map() function sleeps.

   286                                  if (err) {
   287                                          usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
   288                                                    va_start, &pa_start, size, err);
   289                                          goto err_out;
   290                                  }
   291                                  va_start = va;
   292                                  pa_start = pa;
   293                                  pa_end = pa;
   294                          }
   295
   296                          if ((va >> PAGE_SHIFT) == interval_node->last) {
   297                                  /* Last page of the interval */
   298                                  size = pa - pa_start + PAGE_SIZE;
   299                                  usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x\n",
   300                                            va_start, &pa_start, size, flags);
   301                                  err = iommu_map(pd->domain, va_start, pa_start,
   302                                                  size, flags);

iommu_map() again.

   303                                  if (err) {
   304                                          usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
   305                                                    va_start, &pa_start, size, err);
   306                                          goto err_out;
   307                                  }
   308                                  break;
   309                          }
   310
   311                          if (pa != pa_start)
   312
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
>
> On 2021/7/4 5:49 PM, Yongji Xie wrote:
> > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > configuration space is generally used for rarely-changing or
> > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
> > > > ioctl should not be called frequently.
> > > The spec uses MUST and other terms to define the precise requirements.
> > > Here the language (especially the word "generally") is weaker and means
> > > there may be exceptions.
> > >
> > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
> > > approach is reads that have side-effects. For example, imagine a field
> > > containing an error code if the device encounters a problem unrelated to
> > > a specific virtqueue request. Reading from this field resets the error
> > > code to 0, saving the driver an extra configuration space write access
> > > and possibly race conditions. It isn't possible to implement those
> > > semantics using VDUSE_DEV_SET_CONFIG. It's another corner case, but it
> > > makes me think that the interface does not allow full VIRTIO semantics.
>
> Note that though you're correct, my understanding is that config space is
> not suitable for this kind of error propagating. And it would be very hard
> to implement such semantics in some transports. Virtqueue should be
> much better. As Yongji quoted, the config space is used for
> "rarely-changing or initialization-time parameters".
>
> > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > handle the message failure, I'm going to add a return value to
> > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > be propagated to the virtio device driver. Then the virtio-blk device
> > driver can be modified to handle that.
> >
> > Jason and Stefan, what do you think of this way?

Why does VDUSE_DEV_GET_CONFIG need to support an error return value?
The VIRTIO spec provides no way for the device to report errors from
config space accesses.

The QEMU virtio-pci implementation returns -1 from invalid
virtio_config_read*() and silently discards virtio_config_write*()
accesses.

VDUSE can take the same approach with
VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.

> I'd like to stick to the current assumption that get_config won't fail.
> That is to say,
>
> 1) maintain a config in the kernel, make sure the config space read can
> always succeed
> 2) introduce an ioctl for the vduse userspace to update the config space.
> 3) we can synchronize with the vduse userspace during set_config
>
> Does this work?

I noticed that caching is also allowed by the vhost-user protocol
messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't
know whether or not caching is in effect. The interface you outlined
above requires caching.

Is there a reason why the host kernel vDPA code needs to cache the
configuration space?

Here are the vhost-user protocol messages:

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space
:size: a 32-bit configuration space access size in bytes
:flags: a 32-bit value:
  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration
:payload: Size bytes array holding the contents of the virtio device's
  configuration space

...

``VHOST_USER_GET_CONFIG``
:id: 24
:equivalent ioctl: N/A
:master payload: virtio device config space
:slave payload: virtio device config space

When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
submitted by the vhost-user master to fetch the contents of the virtio
device configuration space, vhost-user slave's payload size MUST match
master's request, vhost-user slave uses zero length of payload to
indicate an error to vhost-user master.
The vhost-user master may cache the contents to avoid repeated
``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
:id: 25
:equivalent ioctl: N/A
:master payload: virtio device config space
:slave payload: N/A

When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
submitted by the vhost-user master when the Guest changes the virtio
device configuration space and also can be used for live migration on
the destination host. The vhost-user slave must check the flags field,
and slaves MUST NOT accept SET_CONFIG for read-only configuration space
fields unless the live migration bit is set.

Stefan
Re: [PATCH v2] iommu: rockchip: Fix physical address decoding
On 18/06/2021 15:00, Benjamin Gaignard wrote:
> Restore bits 39 to 32 at correct position.
> It reverses the operation done in rk_dma_addr_dte_v2().
>
> Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2")
> Reported-by: Dan Carpenter
> Signed-off-by: Benjamin Gaignard

Gentle ping on this patch :-)

Regards,
Benjamin

> ---
>  drivers/iommu/rockchip-iommu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 94b9d8e5b9a40..9febfb7f3025b 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -544,12 +544,14 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma)
>  }
>  
>  #define DT_HI_MASK GENMASK_ULL(39, 32)
> +#define DTE_BASE_HI_MASK GENMASK(11, 4)
>  #define DT_SHIFT 28
>  
>  static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr)
>  {
> -	return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) |
> -	       ((addr & DT_HI_MASK) << DT_SHIFT);
> +	u64 addr64 = addr;
> +	return (phys_addr_t)(addr64 & RK_DTE_PT_ADDRESS_MASK) |
> +	       ((addr64 & DTE_BASE_HI_MASK) << DT_SHIFT);
>  }
>  
>  static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma)
[bug report] IB/usnic: Add Cisco VIC low-level hardware driver
[ Ancient code, but the bug seems real enough still. -dan ]

Hello Upinder Malhi,

The patch e3cf00d0a87f: "IB/usnic: Add Cisco VIC low-level hardware
driver" from Sep 10, 2013, leads to the following static checker
warning:

	drivers/iommu/iommu.c:2482 iommu_map()
	warn: sleeping in atomic context

drivers/infiniband/hw/usnic/usnic_uiom.c
   244  static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
   245                                             struct usnic_uiom_reg *uiomr)

This function is always called from usnic_uiom_reg_get() which is holding
spin_lock(&pd->lock); so it can't sleep.

   246  {
   247          int i, err;
   248          size_t size;
   249          struct usnic_uiom_chunk *chunk;
   250          struct usnic_uiom_interval_node *interval_node;
   251          dma_addr_t pa;
   252          dma_addr_t pa_start = 0;
   253          dma_addr_t pa_end = 0;
   254          long int va_start = -EINVAL;
   255          struct usnic_uiom_pd *pd = uiomr->pd;
   256          long int va = uiomr->va & PAGE_MASK;
   257          int flags = IOMMU_READ | IOMMU_CACHE;
   258
   259          flags |= (uiomr->writable) ? IOMMU_WRITE : 0;
   260          chunk = list_first_entry(&uiomr->chunk_list, struct usnic_uiom_chunk,
   261                                   list);
   262          list_for_each_entry(interval_node, intervals, link) {
   263  iter_chunk:
   264                  for (i = 0; i < chunk->nents; i++, va += PAGE_SIZE) {
   265                          pa = sg_phys(&chunk->page_list[i]);
   266                          if ((va >> PAGE_SHIFT) < interval_node->start)
   267                                  continue;
   268
   269                          if ((va >> PAGE_SHIFT) == interval_node->start) {
   270                                  /* First page of the interval */
   271                                  va_start = va;
   272                                  pa_start = pa;
   273                                  pa_end = pa;
   274                          }
   275
   276                          WARN_ON(va_start == -EINVAL);
   277
   278                          if ((pa_end + PAGE_SIZE != pa) &&
   279                              (pa != pa_start)) {
   280                                  /* PAs are not contiguous */
   281                                  size = pa_end - pa_start + PAGE_SIZE;
   282                                  usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x",
   283                                            va_start, &pa_start, size, flags);
   284                                  err = iommu_map(pd->domain, va_start, pa_start,
   285                                                  size, flags);

The iommu_map() function sleeps.

   286                                  if (err) {
   287                                          usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
   288                                                    va_start, &pa_start, size, err);
   289                                          goto err_out;
   290                                  }
   291                                  va_start = va;
   292                                  pa_start = pa;
   293                                  pa_end = pa;
   294                          }
   295
   296                          if ((va >> PAGE_SHIFT) == interval_node->last) {
   297                                  /* Last page of the interval */
   298                                  size = pa - pa_start + PAGE_SIZE;
   299                                  usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x\n",
   300                                            va_start, &pa_start, size, flags);
   301                                  err = iommu_map(pd->domain, va_start, pa_start,
   302                                                  size, flags);

iommu_map() again.

   303                                  if (err) {
   304                                          usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
   305                                                    va_start, &pa_start, size, err);
   306                                          goto err_out;
   307                                  }
   308                                  break;
   309                          }
   310
   311                          if (pa != pa_start)
   312                                  pa_end += PAGE_SIZE;
   313                  }
   314
   315                  if (i == chunk->nents) {
   316                          /*
   317                           * Hit last entry of the chunk,
   318                           * hence advance to next chunk
   319                           */
   320                          chunk = list_first_entry(&chunk->list,
   321                                                   struct usnic_uiom_chunk,
   322
RE: [PATCH v5 3/8] ACPI/IORT: Add a helper to retrieve RMR memory regions
> -Original Message- > From: Jon Nettleton [mailto:j...@solid-run.com] > Sent: 04 July 2021 08:39 > To: Shameerali Kolothum Thodi > Cc: Robin Murphy ; > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; > iommu@lists.linux-foundation.org; Linuxarm ; > steven.pr...@arm.com; Guohanjun (Hanjun Guo) ; > yangyicong ; sami.muja...@arm.com; > wanghuiqiang > Subject: Re: [PATCH v5 3/8] ACPI/IORT: Add a helper to retrieve RMR memory > regions > > On Tue, Jun 29, 2021 at 7:34 PM Jon Nettleton wrote: > > > > On Mon, Jun 14, 2021 at 2:49 PM Shameerali Kolothum Thodi > > wrote: > > > > > > > > > > > > > -Original Message- > > > > From: Robin Murphy [mailto:robin.mur...@arm.com] > > > > Sent: 14 June 2021 12:23 > > > > To: Shameerali Kolothum Thodi > > > > ; > > > > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; > > > > iommu@lists.linux-foundation.org > > > > Cc: j...@solid-run.com; Linuxarm ; > > > > steven.pr...@arm.com; Guohanjun (Hanjun Guo) > > > > ; yangyicong ; > > > > sami.muja...@arm.com; wanghuiqiang > > > > Subject: Re: [PATCH v5 3/8] ACPI/IORT: Add a helper to retrieve > > > > RMR memory regions > > > > > > > > On 2021-05-24 12:02, Shameer Kolothum wrote: > > > > > Add a helper function that retrieves RMR memory descriptors > > > > > associated with a given IOMMU. This will be used by IOMMU > > > > > drivers to setup necessary mappings. > > > > > > > > > > Now that we have this, invoke it from the generic helper > > > > > interface. 
> > > > >
> > > > > Signed-off-by: Shameer Kolothum
> > > > > ---
> > > > >  drivers/acpi/arm64/iort.c | 50 +++
> > > > >  drivers/iommu/dma-iommu.c |  4
> > > > >  include/linux/acpi_iort.h |  7 ++
> > > > >  3 files changed, 61 insertions(+)
> > > > >
> > > > > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > > > > index fea1ffaedf3b..01917caf58de 100644
> > > > > --- a/drivers/acpi/arm64/iort.c
> > > > > +++ b/drivers/acpi/arm64/iort.c
> > > > > @@ -12,6 +12,7 @@
> > > > >
> > > > >  #include
> > > > >  #include
> > > > > +#include
> > > > >  #include
> > > > >  #include
> > > > >  #include
> > > > > @@ -837,6 +838,53 @@ static inline int iort_add_device_replay(struct device *dev)
> > > > >  	return err;
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * iort_iommu_get_rmrs - Helper to retrieve RMR info associated with IOMMU
> > > > > + * @iommu: fwnode for the IOMMU
> > > > > + * @head: RMR list head to be populated
> > > > > + *
> > > > > + * Returns: 0 on success, <0 failure
> > > > > + */
> > > > > +int iort_iommu_get_rmrs(struct fwnode_handle *iommu_fwnode,
> > > > > +			struct list_head *head)
> > > > > +{
> > > > > +	struct iort_rmr_entry *e;
> > > > > +	struct acpi_iort_node *iommu;
> > > > > +	int rmrs = 0;
> > > > > +
> > > > > +	iommu = iort_get_iort_node(iommu_fwnode);
> > > > > +	if (!iommu || list_empty(&iort_rmr_list))
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	list_for_each_entry(e, &iort_rmr_list, list) {
> > > > > +		int prot = IOMMU_READ | IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> > > > > +		struct iommu_resv_region *region;
> > > > > +		enum iommu_resv_type type;
> > > > > +		struct acpi_iort_rmr_desc *rmr_desc;
> > > > > +
> > > > > +		if (e->smmu != iommu)
> > > > > +			continue;
> > > > > +
> > > > > +		rmr_desc = e->rmr_desc;
> > > > > +		if (e->flags & IOMMU_RMR_REMAP_PERMITTED)
> > > > > +			type = IOMMU_RESV_DIRECT_RELAXABLE;
> > > > > +		else
> > > > > +			type = IOMMU_RESV_DIRECT;
> > > > >
> > > I have been looking at this a bit more and looking at the history of
> > RMR_REMAP_PERMITTED. Based on the history I have found it seems to me
> > like this would be the better option for prot.
> >
> > @@ -840,7 +840,7 @@ int iort_iommu_get_rmrs(struct fwnode_handle *iommu_fwnode,
> >  		return -ENODEV;
> >
> >  	list_for_each_entry(e, &iort_rmr_list, list) {
> > -		int prot = IOMMU_READ | IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> > +		int prot = IOMMU_READ | IOMMU_WRITE;
> >  		struct iommu_resv_region *region;
> >  		enum iommu_resv_type type;
> >  		struct acpi_iort_rmr_desc *rmr_desc;
> > @@ -849,10 +849,13 @@ int iort_iommu_get_rmrs(struct fwnode_handle *iommu_fwnode,
> >  			continue;
> >
> >  		rmr_desc = e->rmr_desc;
> > -		if (e->flags & IOMMU_RMR_REMAP_PERMITTED)
> > +		if (e->flags & IOMMU_RMR_REMAP_PERMITTED) {
> >  			type = IOMMU_RESV_DIRECT_RELAXABLE;
> > -		else
> > +			prot |= IOMMU_CACHE;
> > +		} else {
> >  			type = IOMMU_RESV_DIRECT;
> > +
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Sat, Jul 3, 2021 at 1:55 PM Nathan Chancellor wrote:
>
> Hi Will and Robin,
>
> On Fri, Jul 02, 2021 at 04:13:50PM +0100, Robin Murphy wrote:
> > On 2021-07-02 14:58, Will Deacon wrote:
> > > Hi Nathan,
> > >
> > > On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:
> > > > On 7/1/2021 12:40 AM, Will Deacon wrote:
> > > > > On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> > > > > > On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > > > > > > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > > > > > > `BUG: unable to handle page fault for address: 003a8290` and
> > > > > > > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > > > > > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > > > > > > The dev->dma_io_tlb_mem should be set here
> > > > > > > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > > > > > > through device_initialize.
> > > > > > >
> > > > > > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > > > > > > 'io_tlb_default_mem', which is a page-aligned allocation from
> > > > > > > memblock.
> > > > > > > The spinlock is at offset 0x24 in that structure, and looking at the
> > > > > > > register dump from the crash:
> > > > > > >
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX:  RCX: 8900572ad580
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c RDI: 1d17
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c R09:
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 R12: 0212
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 R15: 0020
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: FS: 7f1f8898ea40() GS:89005728() knlGS:
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: ES: CR0: 80050033
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d CR4: 00350ee0
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
> > > > > > >
> > > > > > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > > > > > > RDX pointing at the spinlock. Yet RAX is holding junk :/
> > > > > > >
> > > > > > > I agree that enabling KASAN would be a good idea, but I also think we
> > > > > > > probably need to get some more information out of swiotlb_tbl_map_single()
> > > > > > > to see what exactly is going wrong in there.
> > > > > >
> > > > > > I can certainly enable KASAN and if there is any debug print I can add
> > > > > > or dump anything, let me know!
> > > > >
> > > > > I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
> > > > > x86 defconfig and ran it on my laptop. However, it seems to work fine!
> > > > >
> > > > > Please can you share your .config?
> > > >
> > > > Sure thing, it is attached. It is just Arch Linux's config run through
> > > > olddefconfig. The original is below in case you need to diff it.
> > > >
> > > > https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
> > > >
> > > > If there is anything more that I can provide, please let me know.
> > >
> > > I eventually got this booting (for some reason it was causing LD to SEGV
> > > trying to link it for a while...) and sadly it works fine on my laptop.
> > > Hmm.
> >
> > Seems like it might be something specific to the amdgpu module?
> >
> > Did you manage to try again with KASAN?
>
> Yes, it took a few times to reproduce the issue but I did manage to get
> a dmesg, please find it attached. I built from commit 7d31f1c65cc9 ("swiotlb:
> fix implicit debugfs declarations") in Konrad's tree.

Looking at the logs, the use-after-free bug looks somehow relevant (and it's
nvme again; Qian's crash is about nvme too):

[2.468288] BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0
[2.468288] Read of size 8 at addr 8881d783 by task swapper/0/0
[2.468288] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1
[2.468288] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
[2.468288] Call Trace:
[2.468288]
[2.479433]
[PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"
QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
which fails for the second and later IOMMU devices. This is intended and must
not be fatal to the driver registration process. Also, the cleanup path should
take care of the runtime PM state, which is missing in the current patch.
Revert the relevant changes to the QCOM IOMMU driver until a proper fix is
prepared.

This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.

Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path")
Suggested-by: Will Deacon
Signed-off-by: Marek Szyprowski
---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 25ed444ff94d..021cf8f65ffc 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -849,12 +849,10 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
 	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
 	if (ret) {
 		dev_err(dev, "Failed to register iommu\n");
-		goto err_sysfs_remove;
+		return ret;
 	}
 
-	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
-	if (ret)
-		goto err_unregister_device;
+	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
 
 	if (qcom_iommu->local_base) {
 		pm_runtime_get_sync(dev);
@@ -863,13 +861,6 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
 	}
 
 	return 0;
-
-err_unregister_device:
-	iommu_device_unregister(&qcom_iommu->iommu);
-
-err_sysfs_remove:
-	iommu_device_sysfs_remove(&qcom_iommu->iommu);
-	return ret;
 }
 
 static int qcom_iommu_device_remove(struct platform_device *pdev)
-- 
2.17.1