Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/8/5 上午4:30, Peter Xu wrote: On Mon, Aug 03, 2020 at 06:00:34PM +0200, Eugenio Pérez wrote: On Fri, 2020-07-03 at 15:24 +0800, Jason Wang wrote: On 2020/7/2 下午11:45, Peter Xu wrote: On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: So I think we agree that a new notifier is needed? Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? That should work but I wonder something as following is better. Instead of introducing new flags, how about carry the type of event in the notifier then the device (vhost) can choose the message it want to process like: static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) { switch (event->type) { case IOMMU_MAP: case IOMMU_UNMAP: case IOMMU_DEV_IOTLB_UNMAP: ... } Thanks Hi! Sorry, I thought I had this clear but now it seems not so clear to me. Do you mean to add that switch to the current vhost_iommu_unmap_notify, and then the "type" field to the IOMMUTLBEntry? Is that the scope of the changes, or there is something I'm missing? If that is correct, what is the advantage for vhost or other notifiers? I understand that move the IOMMUTLBEntry (addr, len) -> (iova, mask) split/transformation to the different notifiers implementation could pollute them, but this is even a deeper change and vhost is not insterested in other events but IOMMU_UNMAP, isn't? On the other hand, who decide what type of event is? If I follow the backtrace of the assert in https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01015.html, it seems to me that it should be vtd_process_device_iotlb_desc. How do I know if it should be IOMMU_UNMAP or IOMMU_DEV_IOTLB_UNMAP? If I set it in some function of memory.c, I should decide the type looking the actual notifier, isn't? (Since Jason didn't reply yesterday, I'll try to; Jason, feel free to correct me...) IMHO whether to put the type into the IOMMUTLBEntry is not important. The important change should be that we introduce IOMMU_DEV_IOTLB_UNMAP (or I'd rather call it IOMMU_DEV_IOTLB directly which is shorter and cleaner). With that information we can make the failing assertion conditional for MAP/UNMAP only. Or having another dedicated device IOTLB notifier. We can also allow dev-iotlb messages to take arbitrary addr_mask (so it becomes a length of address range; imho we can keep using addr_mask for simplicity, but we can comment for addr_mask that for dev-iotlb it can be not a real mask). Yes. Thanks Thanks,
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Mon, Aug 03, 2020 at 06:00:34PM +0200, Eugenio Pérez wrote: > On Fri, 2020-07-03 at 15:24 +0800, Jason Wang wrote: > > On 2020/7/2 下午11:45, Peter Xu wrote: > > > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: > > > > So I think we agree that a new notifier is needed? > > > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? > > > > That should work but I wonder something as following is better. > > > > Instead of introducing new flags, how about carry the type of event in > > the notifier then the device (vhost) can choose the message it want to > > process like: > > > > static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) > > > > { > > > > switch (event->type) { > > > > case IOMMU_MAP: > > case IOMMU_UNMAP: > > case IOMMU_DEV_IOTLB_UNMAP: > > ... > > > > } > > > > Thanks > > > > > > Hi! > > Sorry, I thought I had this clear but now it seems not so clear to me. Do you > mean to add that switch to the current > vhost_iommu_unmap_notify, and then the "type" field to the IOMMUTLBEntry? Is > that the scope of the changes, or there is > something I'm missing? > > If that is correct, what is the advantage for vhost or other notifiers? I > understand that move the IOMMUTLBEntry (addr, > len) -> (iova, mask) split/transformation to the different notifiers > implementation could pollute them, but this is even a deeper change and vhost > is not insterested in other events but IOMMU_UNMAP, isn't? > > On the other hand, who decide what type of event is? If I follow the > backtrace of the assert in > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01015.html, it seems > to me that it should be > vtd_process_device_iotlb_desc. How do I know if it should be IOMMU_UNMAP or > IOMMU_DEV_IOTLB_UNMAP? If I set it in some > function of memory.c, I should decide the type looking the actual notifier, > isn't? (Since Jason didn't reply yesterday, I'll try to; Jason, feel free to correct me...) IMHO whether to put the type into the IOMMUTLBEntry is not important. The important change should be that we introduce IOMMU_DEV_IOTLB_UNMAP (or I'd rather call it IOMMU_DEV_IOTLB directly which is shorter and cleaner). With that information we can make the failing assertion conditional for MAP/UNMAP only. We can also allow dev-iotlb messages to take arbitrary addr_mask (so it becomes a length of address range; imho we can keep using addr_mask for simplicity, but we can comment for addr_mask that for dev-iotlb it can be not a real mask). Thanks, -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Fri, 2020-07-03 at 15:24 +0800, Jason Wang wrote: > On 2020/7/2 下午11:45, Peter Xu wrote: > > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: > > > So I think we agree that a new notifier is needed? > > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? > > That should work but I wonder something as following is better. > > Instead of introducing new flags, how about carry the type of event in > the notifier then the device (vhost) can choose the message it want to > process like: > > static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) > > { > > switch (event->type) { > > case IOMMU_MAP: > case IOMMU_UNMAP: > case IOMMU_DEV_IOTLB_UNMAP: > ... > > } > > Thanks > > Hi! Sorry, I thought I had this clear but now it seems not so clear to me. Do you mean to add that switch to the current vhost_iommu_unmap_notify, and then the "type" field to the IOMMUTLBEntry? Is that the scope of the changes, or there is something I'm missing? If that is correct, what is the advantage for vhost or other notifiers? I understand that move the IOMMUTLBEntry (addr, len) -> (iova, mask) split/transformation to the different notifiers implementation could pollute them, but this is even a deeper change and vhost is not insterested in other events but IOMMU_UNMAP, isn't? On the other hand, who decide what type of event is? If I follow the backtrace of the assert in https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01015.html, it seems to me that it should be vtd_process_device_iotlb_desc. How do I know if it should be IOMMU_UNMAP or IOMMU_DEV_IOTLB_UNMAP? If I set it in some function of memory.c, I should decide the type looking the actual notifier, isn't? Thanks!
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Tue, Jul 21, 2020 at 02:20:01PM +0800, Jason Wang wrote: > > On 2020/7/20 下午9:03, Peter Xu wrote: > > On Mon, Jul 20, 2020 at 12:02:06PM +0800, Jason Wang wrote: > > > Right, so there's no need to deal with unmap in vtd's replay > > > implementation > > > (as what generic one did). > > We don't even for now; see vtd_page_walk_info.notify_unmap. Thanks, > > > Right, but I meant the vtd_address_space_unmap() in vtd_iomm_replay(). It > looks to me it will trigger UNMAP notifiers. Should be pretty safe for now, but I agree it seems optional (as we've discussed about this previously). Thanks, -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/20 下午9:03, Peter Xu wrote: On Mon, Jul 20, 2020 at 12:02:06PM +0800, Jason Wang wrote: Right, so there's no need to deal with unmap in vtd's replay implementation (as what generic one did). We don't even for now; see vtd_page_walk_info.notify_unmap. Thanks, Right, but I meant the vtd_address_space_unmap() in vtd_iomm_replay(). It looks to me it will trigger UNMAP notifiers. Thanks
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Mon, Jul 20, 2020 at 12:02:06PM +0800, Jason Wang wrote: > Right, so there's no need to deal with unmap in vtd's replay implementation > (as what generic one did). We don't even for now; see vtd_page_walk_info.notify_unmap. Thanks, -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/17 下午10:18, Peter Xu wrote: On Thu, Jul 16, 2020 at 10:54:31AM +0800, Jason Wang wrote: On 2020/7/16 上午9:00, Peter Xu wrote: On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote: On 2020/7/10 下午9:30, Peter Xu wrote: On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: On 2020/7/9 下午10:10, Peter Xu wrote: On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. Right but then I would still prefer to have another notifier. Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a dedicated command for flushing device IOTLB. But the check for vtd_as_has_map_notifier is used to skip the device which can do demand paging via ATS or device specific way. If we have two different notifiers, vhost will be on the device iotlb notifier so we don't need it at all? But we can still have iommu notifier that only registers to UNMAP even after we introduce dev-iotlb notifier? We don't want to do page walk for them as well. TCG should be the only one so far, but I don't know.. maybe there can still be new ones? I think you're right. But looking at the codes, it looks like the check of vtd_as_has_map_notifier() was only used in: 1) vtd_iommu_replay() 2) vtd_iotlb_page_invalidate_notify() (PSI) For the replay, it's expensive anyhow. For PSI, I think it's just about one or few mappings, not sure it will have obvious performance impact. And I had two questions: 1) The codes doesn't check map for DSI or GI, does this match what spec said? (It looks to me the spec is unclear in this part) Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in vtd_iotlb_domain_invalidate(). I meant the code doesn't check whether there's an MAP notifier :) It's actually checked, because it loops over vtd_as_with_notifiers, and only MAP notifiers register to that. :) I may miss something but I don't find the code to block UNMAP notifiers? vhost_iommu_region_add() memory_region_register_iommu_notifier() memory_region_update_iommu_notify_flags() imrc->notify_flag_changed() vtd_iommu_notify_flag_changed() ? Yeah I think you're right. I might have confused with some previous implementations. Maybe we should also do similar thing for DSI/GI just like what we do in PSI. Ok. 2) for the replay() I don't see other implementations (either spapr or generic one) that did unmap (actually they skip unmap explicitly), any reason for doing this in intel IOMMU? I could be wrong, but I'd guess it's because vt-d implemented the caching mode by leveraging the same invalidation strucuture, so it's harder to make all things right (IOW, we can't clearly identify MAP with UNMAP when we receive an invalidation request, because MAP/UNMAP requests look the same). I didn't check others, but I believe spapr is doing it differently by using some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to what virtio-iommu is doing. Anyway, the point is if we have explicit MAP/UNMAP from the guest, logically the replay indeed does not need to do any unmap because we don't need to call replay() on an already existing device but only for e.g. hot plug. But this looks conflict with what memory_region_iommu_replay( ) did, for IOMMU that doesn't have a replay method, it skips UNMAP request: for (addr = 0; addr < memory_region_size(mr); addr += granularity) { iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx); if (iotlb.perm != IOMMU_NONE) { n->notify(n, ); } I guess there's no knowledge of whether guest have an explicit MAP/UMAP for this generic code. Or replay implies that guest doesn't have explicit MAP/UNMAP? I think it matches exactly with a
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Thu, Jul 16, 2020 at 10:54:31AM +0800, Jason Wang wrote: > > On 2020/7/16 上午9:00, Peter Xu wrote: > > On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote: > > > On 2020/7/10 下午9:30, Peter Xu wrote: > > > > On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: > > > > > On 2020/7/9 下午10:10, Peter Xu wrote: > > > > > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: > > > > > > > > > - If we care the performance, it's better to implement the > > > > > > > > > MAP event for > > > > > > > > > vhost, otherwise it could be a lot of IOTLB miss > > > > > > > > I feel like these are two things. > > > > > > > > > > > > > > > > So far what we are talking about is whether vt-d should have > > > > > > > > knowledge about > > > > > > > > what kind of events one iommu notifier is interested in. I > > > > > > > > still think we > > > > > > > > should keep this as answered in question 1. > > > > > > > > > > > > > > > > The other question is whether we want to switch vhost from > > > > > > > > UNMAP to MAP/UNMAP > > > > > > > > events even without vDMA, so that vhost can establish the > > > > > > > > mapping even before > > > > > > > > IO starts. IMHO it's doable, but only if the guest runs DPDK > > > > > > > > workloads. When > > > > > > > > the guest is using dynamic iommu page mappings, I feel like > > > > > > > > that can be even > > > > > > > > slower, because then the worst case is for each IO we'll need > > > > > > > > to vmexit twice: > > > > > > > > > > > > > > > > - The first vmexit caused by an invalidation to MAP the > > > > > > > > page tables, so vhost > > > > > > > > will setup the page table before IO starts > > > > > > > > > > > > > > > > - IO/DMA triggers and completes > > > > > > > > > > > > > > > > - The second vmexit caused by another invalidation to > > > > > > > > UNMAP the page tables > > > > > > > > > > > > > > > > So it seems to be worse than when vhost only uses UNMAP like > > > > > > > > right now. At > > > > > > > > least we only have one vmexit (when UNMAP). We'll have a vhost > > > > > > > > translate() > > > > > > > > request from kernel to userspace, but IMHO that's cheaper than > > > > > > > > the vmexit. > > > > > > > Right but then I would still prefer to have another notifier. > > > > > > > > > > > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU > > > > > > > have a > > > > > > > dedicated command for flushing device IOTLB. But the check for > > > > > > > vtd_as_has_map_notifier is used to skip the device which can do > > > > > > > demand > > > > > > > paging via ATS or device specific way. If we have two different > > > > > > > notifiers, > > > > > > > vhost will be on the device iotlb notifier so we don't need it at > > > > > > > all? > > > > > > But we can still have iommu notifier that only registers to UNMAP > > > > > > even after we > > > > > > introduce dev-iotlb notifier? We don't want to do page walk for > > > > > > them as well. > > > > > > TCG should be the only one so far, but I don't know.. maybe there > > > > > > can still be > > > > > > new ones? > > > > > I think you're right. But looking at the codes, it looks like the > > > > > check of > > > > > vtd_as_has_map_notifier() was only used in: > > > > > > > > > > 1) vtd_iommu_replay() > > > > > 2) vtd_iotlb_page_invalidate_notify() (PSI) > > > > > > > > > > For the replay, it's expensive anyhow. For PSI, I think it's just > > > > > about one > > > > > or few mappings, not sure it will have obvious performance impact. > > > > > > > > > > And I had two questions: > > > > > > > > > > 1) The codes doesn't check map for DSI or GI, does this match what > > > > > spec > > > > > said? (It looks to me the spec is unclear in this part) > > > > Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in > > > > vtd_iotlb_domain_invalidate(). > > > > > > I meant the code doesn't check whether there's an MAP notifier :) > > It's actually checked, because it loops over vtd_as_with_notifiers, and only > > MAP notifiers register to that. :) > > > I may miss something but I don't find the code to block UNMAP notifiers? > > vhost_iommu_region_add() > memory_region_register_iommu_notifier() > memory_region_update_iommu_notify_flags() > imrc->notify_flag_changed() > vtd_iommu_notify_flag_changed() > > ? Yeah I think you're right. I might have confused with some previous implementations. Maybe we should also do similar thing for DSI/GI just like what we do in PSI. > > > > > 2) for the replay() I don't see other implementations (either spapr or > > > > > generic one) that did unmap (actually they skip unmap explicitly), any > > > > > reason for doing this in intel IOMMU? > > > > I could be wrong, but I'd guess it's because vt-d implemented the > > > > caching mode > > > > by leveraging the same invalidation strucuture, so it's harder to make > > > > all > > > > things
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/16 上午9:00, Peter Xu wrote: On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote: On 2020/7/10 下午9:30, Peter Xu wrote: On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: On 2020/7/9 下午10:10, Peter Xu wrote: On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. Right but then I would still prefer to have another notifier. Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a dedicated command for flushing device IOTLB. But the check for vtd_as_has_map_notifier is used to skip the device which can do demand paging via ATS or device specific way. If we have two different notifiers, vhost will be on the device iotlb notifier so we don't need it at all? But we can still have iommu notifier that only registers to UNMAP even after we introduce dev-iotlb notifier? We don't want to do page walk for them as well. TCG should be the only one so far, but I don't know.. maybe there can still be new ones? I think you're right. But looking at the codes, it looks like the check of vtd_as_has_map_notifier() was only used in: 1) vtd_iommu_replay() 2) vtd_iotlb_page_invalidate_notify() (PSI) For the replay, it's expensive anyhow. For PSI, I think it's just about one or few mappings, not sure it will have obvious performance impact. And I had two questions: 1) The codes doesn't check map for DSI or GI, does this match what spec said? (It looks to me the spec is unclear in this part) Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in vtd_iotlb_domain_invalidate(). I meant the code doesn't check whether there's an MAP notifier :) It's actually checked, because it loops over vtd_as_with_notifiers, and only MAP notifiers register to that. :) I may miss something but I don't find the code to block UNMAP notifiers? vhost_iommu_region_add() memory_region_register_iommu_notifier() memory_region_update_iommu_notify_flags() imrc->notify_flag_changed() vtd_iommu_notify_flag_changed() ? But I agree with you that it should be cleaner to introduce the dev-iotlb notifier type. Yes, it can solve the issues of duplicated UNMAP issue of vhost. 2) for the replay() I don't see other implementations (either spapr or generic one) that did unmap (actually they skip unmap explicitly), any reason for doing this in intel IOMMU? I could be wrong, but I'd guess it's because vt-d implemented the caching mode by leveraging the same invalidation strucuture, so it's harder to make all things right (IOW, we can't clearly identify MAP with UNMAP when we receive an invalidation request, because MAP/UNMAP requests look the same). I didn't check others, but I believe spapr is doing it differently by using some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to what virtio-iommu is doing. Anyway, the point is if we have explicit MAP/UNMAP from the guest, logically the replay indeed does not need to do any unmap because we don't need to call replay() on an already existing device but only for e.g. hot plug. But this looks conflict with what memory_region_iommu_replay( ) did, for IOMMU that doesn't have a replay method, it skips UNMAP request: for (addr = 0; addr < memory_region_size(mr); addr += granularity) { iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx); if (iotlb.perm != IOMMU_NONE) { n->notify(n, ); } I guess there's no knowledge of whether guest have an explicit MAP/UMAP for this generic code. Or replay implies that guest doesn't have explicit MAP/UNMAP? I think it matches exactly with a hot plug case? Note that when IOMMU_NONE could also mean the translation does not exist. So it's actually trying to map
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote: > > On 2020/7/10 下午9:30, Peter Xu wrote: > > On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: > > > On 2020/7/9 下午10:10, Peter Xu wrote: > > > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: > > > > > > > - If we care the performance, it's better to implement the MAP > > > > > > > event for > > > > > > > vhost, otherwise it could be a lot of IOTLB miss > > > > > > I feel like these are two things. > > > > > > > > > > > > So far what we are talking about is whether vt-d should have > > > > > > knowledge about > > > > > > what kind of events one iommu notifier is interested in. I still > > > > > > think we > > > > > > should keep this as answered in question 1. > > > > > > > > > > > > The other question is whether we want to switch vhost from UNMAP to > > > > > > MAP/UNMAP > > > > > > events even without vDMA, so that vhost can establish the mapping > > > > > > even before > > > > > > IO starts. IMHO it's doable, but only if the guest runs DPDK > > > > > > workloads. When > > > > > > the guest is using dynamic iommu page mappings, I feel like that > > > > > > can be even > > > > > > slower, because then the worst case is for each IO we'll need to > > > > > > vmexit twice: > > > > > > > > > > > > - The first vmexit caused by an invalidation to MAP the page > > > > > > tables, so vhost > > > > > >will setup the page table before IO starts > > > > > > > > > > > > - IO/DMA triggers and completes > > > > > > > > > > > > - The second vmexit caused by another invalidation to UNMAP > > > > > > the page tables > > > > > > > > > > > > So it seems to be worse than when vhost only uses UNMAP like right > > > > > > now. At > > > > > > least we only have one vmexit (when UNMAP). We'll have a vhost > > > > > > translate() > > > > > > request from kernel to userspace, but IMHO that's cheaper than the > > > > > > vmexit. > > > > > Right but then I would still prefer to have another notifier. > > > > > > > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a > > > > > dedicated command for flushing device IOTLB. But the check for > > > > > vtd_as_has_map_notifier is used to skip the device which can do demand > > > > > paging via ATS or device specific way. If we have two different > > > > > notifiers, > > > > > vhost will be on the device iotlb notifier so we don't need it at all? > > > > But we can still have iommu notifier that only registers to UNMAP even > > > > after we > > > > introduce dev-iotlb notifier? We don't want to do page walk for them > > > > as well. > > > > TCG should be the only one so far, but I don't know.. maybe there can > > > > still be > > > > new ones? > > > > > > I think you're right. But looking at the codes, it looks like the check of > > > vtd_as_has_map_notifier() was only used in: > > > > > > 1) vtd_iommu_replay() > > > 2) vtd_iotlb_page_invalidate_notify() (PSI) > > > > > > For the replay, it's expensive anyhow. For PSI, I think it's just about > > > one > > > or few mappings, not sure it will have obvious performance impact. > > > > > > And I had two questions: > > > > > > 1) The codes doesn't check map for DSI or GI, does this match what spec > > > said? (It looks to me the spec is unclear in this part) > > Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in > > vtd_iotlb_domain_invalidate(). > > > I meant the code doesn't check whether there's an MAP notifier :) It's actually checked, because it loops over vtd_as_with_notifiers, and only MAP notifiers register to that. :) But I agree with you that it should be cleaner to introduce the dev-iotlb notifier type. > > > > > > > 2) for the replay() I don't see other implementations (either spapr or > > > generic one) that did unmap (actually they skip unmap explicitly), any > > > reason for doing this in intel IOMMU? > > I could be wrong, but I'd guess it's because vt-d implemented the caching > > mode > > by leveraging the same invalidation strucuture, so it's harder to make all > > things right (IOW, we can't clearly identify MAP with UNMAP when we receive > > an > > invalidation request, because MAP/UNMAP requests look the same). > > > > I didn't check others, but I believe spapr is doing it differently by using > > some hypercalls to deliver IOMMU map/unmap requests, which seems a bit > > close to > > what virtio-iommu is doing. Anyway, the point is if we have explicit > > MAP/UNMAP > > from the guest, logically the replay indeed does not need to do any unmap > > because we don't need to call replay() on an already existing device but > > only > > for e.g. hot plug. > > > But this looks conflict with what memory_region_iommu_replay( ) did, for > IOMMU that doesn't have a replay method, it skips UNMAP request: > > for (addr = 0; addr < memory_region_size(mr); addr += granularity) { > iotlb = imrc->translate(iommu_mr, addr,
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/10 下午9:30, Peter Xu wrote: On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: On 2020/7/9 下午10:10, Peter Xu wrote: On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. Right but then I would still prefer to have another notifier. Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a dedicated command for flushing device IOTLB. But the check for vtd_as_has_map_notifier is used to skip the device which can do demand paging via ATS or device specific way. If we have two different notifiers, vhost will be on the device iotlb notifier so we don't need it at all? But we can still have iommu notifier that only registers to UNMAP even after we introduce dev-iotlb notifier? We don't want to do page walk for them as well. TCG should be the only one so far, but I don't know.. maybe there can still be new ones? I think you're right. But looking at the codes, it looks like the check of vtd_as_has_map_notifier() was only used in: 1) vtd_iommu_replay() 2) vtd_iotlb_page_invalidate_notify() (PSI) For the replay, it's expensive anyhow. For PSI, I think it's just about one or few mappings, not sure it will have obvious performance impact. And I had two questions: 1) The codes doesn't check map for DSI or GI, does this match what spec said? (It looks to me the spec is unclear in this part) Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in vtd_iotlb_domain_invalidate(). I meant the code doesn't check whether there's an MAP notifier :) 2) for the replay() I don't see other implementations (either spapr or generic one) that did unmap (actually they skip unmap explicitly), any reason for doing this in intel IOMMU? I could be wrong, but I'd guess it's because vt-d implemented the caching mode by leveraging the same invalidation strucuture, so it's harder to make all things right (IOW, we can't clearly identify MAP with UNMAP when we receive an invalidation request, because MAP/UNMAP requests look the same). I didn't check others, but I believe spapr is doing it differently by using some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to what virtio-iommu is doing. Anyway, the point is if we have explicit MAP/UNMAP from the guest, logically the replay indeed does not need to do any unmap because we don't need to call replay() on an already existing device but only for e.g. hot plug. But this looks conflict with what memory_region_iommu_replay( ) did, for IOMMU that doesn't have a replay method, it skips UNMAP request: for (addr = 0; addr < memory_region_size(mr); addr += granularity) { iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx); if (iotlb.perm != IOMMU_NONE) { n->notify(n, ); } I guess there's no knowledge of whether guest have an explicit MAP/UMAP for this generic code. Or replay implies that guest doesn't have explicit MAP/UNMAP? (btw, the code shortcut the memory_region_notify_one(), not sure the reason) VT-d does not have that clear interface, so VT-d needs to maintain its own mapping structures, and also vt-d is using the same replay & page_walk operations to sync all these structures, which complicated the vt-d replay a bit. With that, we assume replay() can be called anytime on a device, and we won't notify duplicated MAPs to lower layer like vfio if it is mapped before. At the meantime, since we'll compare the latest mapping with the one we cached in the iova tree, UNMAP becomes possible too. AFAIK vtd_iommu_replay() did a completely UNMAP: /* * The replay can be triggered by either a invalidation or a newly * created entry. No matter what, we release existing mappings * (it means flushing caches
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: > > On 2020/7/9 下午10:10, Peter Xu wrote: > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: > > > > > - If we care the performance, it's better to implement the MAP event > > > > > for > > > > > vhost, otherwise it could be a lot of IOTLB miss > > > > I feel like these are two things. > > > > > > > > So far what we are talking about is whether vt-d should have knowledge > > > > about > > > > what kind of events one iommu notifier is interested in. I still think > > > > we > > > > should keep this as answered in question 1. > > > > > > > > The other question is whether we want to switch vhost from UNMAP to > > > > MAP/UNMAP > > > > events even without vDMA, so that vhost can establish the mapping even > > > > before > > > > IO starts. IMHO it's doable, but only if the guest runs DPDK > > > > workloads. When > > > > the guest is using dynamic iommu page mappings, I feel like that can be > > > > even > > > > slower, because then the worst case is for each IO we'll need to vmexit > > > > twice: > > > > > > > > - The first vmexit caused by an invalidation to MAP the page > > > > tables, so vhost > > > > will setup the page table before IO starts > > > > > > > > - IO/DMA triggers and completes > > > > > > > > - The second vmexit caused by another invalidation to UNMAP the > > > > page tables > > > > > > > > So it seems to be worse than when vhost only uses UNMAP like right now. > > > > At > > > > least we only have one vmexit (when UNMAP). We'll have a vhost > > > > translate() > > > > request from kernel to userspace, but IMHO that's cheaper than the > > > > vmexit. > > > > > > Right but then I would still prefer to have another notifier. > > > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a > > > dedicated command for flushing device IOTLB. But the check for > > > vtd_as_has_map_notifier is used to skip the device which can do demand > > > paging via ATS or device specific way. If we have two different notifiers, > > > vhost will be on the device iotlb notifier so we don't need it at all? > > But we can still have iommu notifier that only registers to UNMAP even > > after we > > introduce dev-iotlb notifier? We don't want to do page walk for them as > > well. > > TCG should be the only one so far, but I don't know.. maybe there can still > > be > > new ones? > > > I think you're right. But looking at the codes, it looks like the check of > vtd_as_has_map_notifier() was only used in: > > 1) vtd_iommu_replay() > 2) vtd_iotlb_page_invalidate_notify() (PSI) > > For the replay, it's expensive anyhow. For PSI, I think it's just about one > or few mappings, not sure it will have obvious performance impact. > > And I had two questions: > > 1) The codes doesn't check map for DSI or GI, does this match what spec > said? (It looks to me the spec is unclear in this part) Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in vtd_iotlb_domain_invalidate(). > 2) for the replay() I don't see other implementations (either spapr or > generic one) that did unmap (actually they skip unmap explicitly), any > reason for doing this in intel IOMMU? I could be wrong, but I'd guess it's because vt-d implemented the caching mode by leveraging the same invalidation strucuture, so it's harder to make all things right (IOW, we can't clearly identify MAP with UNMAP when we receive an invalidation request, because MAP/UNMAP requests look the same). I didn't check others, but I believe spapr is doing it differently by using some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to what virtio-iommu is doing. Anyway, the point is if we have explicit MAP/UNMAP from the guest, logically the replay indeed does not need to do any unmap because we don't need to call replay() on an already existing device but only for e.g. hot plug. VT-d does not have that clear interface, so VT-d needs to maintain its own mapping structures, and also vt-d is using the same replay & page_walk operations to sync all these structures, which complicated the vt-d replay a bit. With that, we assume replay() can be called anytime on a device, and we won't notify duplicated MAPs to lower layer like vfio if it is mapped before. At the meantime, since we'll compare the latest mapping with the one we cached in the iova tree, UNMAP becomes possible too. Thanks, -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/9 下午10:10, Peter Xu wrote: On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. Right but then I would still prefer to have another notifier. Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a dedicated command for flushing device IOTLB. But the check for vtd_as_has_map_notifier is used to skip the device which can do demand paging via ATS or device specific way. If we have two different notifiers, vhost will be on the device iotlb notifier so we don't need it at all? But we can still have iommu notifier that only registers to UNMAP even after we introduce dev-iotlb notifier? We don't want to do page walk for them as well. TCG should be the only one so far, but I don't know.. maybe there can still be new ones? I think you're right. But looking at the codes, it looks like the check of vtd_as_has_map_notifier() was only used in: 1) vtd_iommu_replay() 2) vtd_iotlb_page_invalidate_notify() (PSI) For the replay, it's expensive anyhow. For PSI, I think it's just about one or few mappings, not sure it will have obvious performance impact. And I had two questions: 1) The codes doesn't check map for DSI or GI, does this match what spec said? (It looks to me the spec is unclear in this part) 2) for the replay() I don't see other implementations (either spapr or generic one) that did unmap (actually they skip unmap explicitly), any reason for doing this in intel IOMMU? Thanks Thanks,
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: > > > - If we care the performance, it's better to implement the MAP event for > > > vhost, otherwise it could be a lot of IOTLB miss > > I feel like these are two things. > > > > So far what we are talking about is whether vt-d should have knowledge about > > what kind of events one iommu notifier is interested in. I still think we > > should keep this as answered in question 1. > > > > The other question is whether we want to switch vhost from UNMAP to > > MAP/UNMAP > > events even without vDMA, so that vhost can establish the mapping even > > before > > IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. > > When > > the guest is using dynamic iommu page mappings, I feel like that can be even > > slower, because then the worst case is for each IO we'll need to vmexit > > twice: > > > >- The first vmexit caused by an invalidation to MAP the page tables, so > > vhost > > will setup the page table before IO starts > > > >- IO/DMA triggers and completes > > > >- The second vmexit caused by another invalidation to UNMAP the page > > tables > > > > So it seems to be worse than when vhost only uses UNMAP like right now. At > > least we only have one vmexit (when UNMAP). We'll have a vhost translate() > > request from kernel to userspace, but IMHO that's cheaper than the vmexit. > > > Right but then I would still prefer to have another notifier. > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a > dedicated command for flushing device IOTLB. But the check for > vtd_as_has_map_notifier is used to skip the device which can do demand > paging via ATS or device specific way. If we have two different notifiers, > vhost will be on the device iotlb notifier so we don't need it at all? But we can still have iommu notifier that only registers to UNMAP even after we introduce dev-iotlb notifier? We don't want to do page walk for them as well. TCG should be the only one so far, but I don't know.. maybe there can still be new ones? Thanks, -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/8 下午10:16, Peter Xu wrote: On Wed, Jul 08, 2020 at 01:42:30PM +0800, Jason Wang wrote: So it should be functional equivalent to vtd_as_has_notifier(). For example: in vtd_iommu_replay() we'll skip the replay if vhost has registered the iommu notifier because vtd_as_has_map_notifier() will return false. Two questions: - Do we care the performance here? If not, vhost may just ignore the MAP event? I think we care, because vtd_page_walk() can be expensive. Ok. - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. Right but then I would still prefer to have another notifier. Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a dedicated command for flushing device IOTLB. But the check for vtd_as_has_map_notifier is used to skip the device which can do demand paging via ATS or device specific way. If we have two different notifiers, vhost will be on the device iotlb notifier so we don't need it at all? Thanks
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Wed, Jul 08, 2020 at 01:42:30PM +0800, Jason Wang wrote: > > > So it should be functional equivalent to vtd_as_has_notifier(). > > For example: in vtd_iommu_replay() we'll skip the replay if vhost has > > registered the iommu notifier because vtd_as_has_map_notifier() will return > > false. > > > Two questions: > > - Do we care the performance here? If not, vhost may just ignore the MAP > event? I think we care, because vtd_page_walk() can be expensive. > - If we care the performance, it's better to implement the MAP event for > vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/8 上午3:54, Peter Xu wrote: On Tue, Jul 07, 2020 at 04:03:10PM +0800, Jason Wang wrote: On 2020/7/3 下午9:03, Peter Xu wrote: On Fri, Jul 03, 2020 at 03:24:19PM +0800, Jason Wang wrote: On 2020/7/2 下午11:45, Peter Xu wrote: On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: So I think we agree that a new notifier is needed? Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? That should work but I wonder something as following is better. Instead of introducing new flags, how about carry the type of event in the notifier then the device (vhost) can choose the message it want to process like: static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) { switch (event->type) { case IOMMU_MAP: case IOMMU_UNMAP: case IOMMU_DEV_IOTLB_UNMAP: ... } Looks ok to me, though imo we should still keep the registration information, so VT-d knows which notifiers is interested in which events. E.g., we can still do something like vtd_as_has_map_notifier(). Is this for a better performance? I wonder whether it's worth since we can't not have both vhost and vtd to be registered into the same as. /me failed to follow this sentence.. :( Sorry, I meant "vfio" instead "vtd". So it should be functional equivalent to vtd_as_has_notifier(). For example: in vtd_iommu_replay() we'll skip the replay if vhost has registered the iommu notifier because vtd_as_has_map_notifier() will return false. Two questions: - Do we care the performance here? If not, vhost may just ignore the MAP event? - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss Thanks It'll only return true if the device is a vfio-pci device. Without vtd_as_has_map_notifier(), how could we do that?
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Tue, Jul 07, 2020 at 04:03:10PM +0800, Jason Wang wrote: > > On 2020/7/3 下午9:03, Peter Xu wrote: > > On Fri, Jul 03, 2020 at 03:24:19PM +0800, Jason Wang wrote: > > > On 2020/7/2 下午11:45, Peter Xu wrote: > > > > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: > > > > > So I think we agree that a new notifier is needed? > > > > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? > > > > > > That should work but I wonder something as following is better. > > > > > > Instead of introducing new flags, how about carry the type of event in the > > > notifier then the device (vhost) can choose the message it want to process > > > like: > > > > > > static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) > > > > > > { > > > > > > switch (event->type) { > > > > > > case IOMMU_MAP: > > > case IOMMU_UNMAP: > > > case IOMMU_DEV_IOTLB_UNMAP: > > > ... > > > > > > } > > Looks ok to me, though imo we should still keep the registration > > information, > > so VT-d knows which notifiers is interested in which events. E.g., we can > > still do something like vtd_as_has_map_notifier(). > > > Is this for a better performance? > > I wonder whether it's worth since we can't not have both vhost and vtd to be > registered into the same as. /me failed to follow this sentence.. :( > > So it should be functional equivalent to vtd_as_has_notifier(). For example: in vtd_iommu_replay() we'll skip the replay if vhost has registered the iommu notifier because vtd_as_has_map_notifier() will return false. It'll only return true if the device is a vfio-pci device. Without vtd_as_has_map_notifier(), how could we do that? -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/3 下午9:03, Peter Xu wrote: On Fri, Jul 03, 2020 at 03:24:19PM +0800, Jason Wang wrote: On 2020/7/2 下午11:45, Peter Xu wrote: On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: So I think we agree that a new notifier is needed? Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? That should work but I wonder something as following is better. Instead of introducing new flags, how about carry the type of event in the notifier then the device (vhost) can choose the message it want to process like: static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) { switch (event->type) { case IOMMU_MAP: case IOMMU_UNMAP: case IOMMU_DEV_IOTLB_UNMAP: ... } Looks ok to me, though imo we should still keep the registration information, so VT-d knows which notifiers is interested in which events. E.g., we can still do something like vtd_as_has_map_notifier(). Is this for a better performance? I wonder whether it's worth since we can't not have both vhost and vtd to be registered into the same as. So it should be functional equivalent to vtd_as_has_notifier(). Thanks So these are probably two different things: the new IOMMU_NOTIFIER_DEV_IOTLB flag is one as discussed; the other one is whether we would like to introduce IOMMUTLBEvent to include the type, so that we can avoid introduce two notifiers for one device majorly to identify dev-iotlb from unmaps. Thanks,
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Fri, Jul 03, 2020 at 03:24:19PM +0800, Jason Wang wrote: > > On 2020/7/2 下午11:45, Peter Xu wrote: > > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: > > > So I think we agree that a new notifier is needed? > > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? > > > That should work but I wonder something as following is better. > > Instead of introducing new flags, how about carry the type of event in the > notifier then the device (vhost) can choose the message it want to process > like: > > static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) > > { > > switch (event->type) { > > case IOMMU_MAP: > case IOMMU_UNMAP: > case IOMMU_DEV_IOTLB_UNMAP: > ... > > } Looks ok to me, though imo we should still keep the registration information, so VT-d knows which notifiers is interested in which events. E.g., we can still do something like vtd_as_has_map_notifier(). So these are probably two different things: the new IOMMU_NOTIFIER_DEV_IOTLB flag is one as discussed; the other one is whether we would like to introduce IOMMUTLBEvent to include the type, so that we can avoid introduce two notifiers for one device majorly to identify dev-iotlb from unmaps. Thanks, -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/2 下午11:45, Peter Xu wrote: On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: So I think we agree that a new notifier is needed? Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? That should work but I wonder something as following is better. Instead of introducing new flags, how about carry the type of event in the notifier then the device (vhost) can choose the message it want to process like: static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event) { switch (event->type) { case IOMMU_MAP: case IOMMU_UNMAP: case IOMMU_DEV_IOTLB_UNMAP: ... } Thanks
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote: > So I think we agree that a new notifier is needed? Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)? -- Peter Xu
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/1 下午4:09, Jason Wang wrote: On 2020/6/30 下午11:39, Peter Xu wrote: On Tue, Jun 30, 2020 at 10:41:10AM +0800, Jason Wang wrote: /* According to ATS spec table 2.4: * S = 0, bits 15:12 = range size: 4K * S = 1, bits 15:12 = xxx0 range size: 8K * S = 1, bits 15:12 = xx01 range size: 16K * S = 1, bits 15:12 = x011 range size: 32K * S = 1, bits 15:12 = 0111 range size: 64K * ... */ Right, but the comment is probably misleading here, since it's for the PCI-E transaction between IOMMU and device not for the device IOTLB invalidation descriptor. For device IOTLB invalidation descriptor, spec allows a [0, ~0ULL] invalidation: " 6.5.2.5 Device-TLB Invalidate Descriptor ... Size (S): The size field indicates the number of consecutive pages targeted by this invalidation request. If S field is zero, a single page at page address specified by Address [63:12] is requested to be invalidated. If S field is Set, the least significant bit in the Address field with value 0b indicates the invalidation address range. For example, if S field is Set and Address[12] is Clear, it indicates an 8KB invalidation address range with base address in Address [63:13]. If S field and Address[12] is Set and bit 13 is Clear, it indicates a 16KB invalidation address range with base address in Address [63:14], etc. " So if we receive an address whose [63] is 0 and the rest is all 1, it's then a [0, ~0ULL] invalidation. Yes. I think invalidating the whole range is always fine. It's still not arbitrary, right? E.g., we can't even invalidate (0x1000, 0x3000) with device-iotlb because of the address mask, not to say sub-pages. Yes. How about just convert to use a range [start, end] for any notifier and move the checks (e.g the assert) into the actual notifier implemented (vhost or vfio)? IOMMUTLBEntry itself is the abstraction layer of TLB entry. Hardware TLB entry is definitely not arbitrary range either (because AFAICT the hardware should only cache PFN rather than address, so at least PAGE_SIZE aligned). Introducing this flag will already make this trickier just to avoid introducing another similar struct to IOMMUTLBEntry, but I really don't want to make it a default option... Not to mention I probably have no reason to urge the rest iommu notifier users (tcg, vfio) to change their existing good code to suite any of the backend who can cooperate with arbitrary address ranges... Ok, so it looks like we need a dedicated notifiers to device IOTLB. Or we can also make a new flag for device iotlb just like current UNMAP? Then we replace the vhost type from UNMAP to DEVICE_IOTLB. But IMHO using the ARBITRARY_LENGTH flag would work in a similar way. DEVICE_IOTLB flag could also allow virtio/vhost to only receive one invalidation (now IIUC it'll receive both iotlb and device-iotlb for unmapping a page when ats=on), but then ats=on will be a must and it could break some old (misconfiged) qemu because afaict previously virtio/vhost could even work with vIOMMU (accidentally) even without ats=on. That's a bug and I don't think we need to workaround mis-configurated qemu :) IMHO it depends on the strictness we want on the qemu cmdline API. :) We should at least check libvirt to make sure it's using ats=on always, then I agree maybe we can avoid considering the rest... Thanks, Cc libvirt list, but I think we should fix libvirt if they don't provide "ats=on". Thanks Libvirt looks fine, according to the domain XML documentation[1]: QEMU's virtio devices have some attributes related to the virtio transport under the driver element: The iommu attribute enables the use of emulated IOMMU by the device. The attribute ats controls the Address Translation Service support for PCIe devices. This is needed to make use of IOTLB support (see IOMMU device). Possible values are on or off. Since 3.5.0 So I think we agree that a new notifier is needed? Thanks [1] https://libvirt.org/formatdomain.html
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/6/30 下午11:39, Peter Xu wrote: On Tue, Jun 30, 2020 at 10:41:10AM +0800, Jason Wang wrote: /* According to ATS spec table 2.4: * S = 0, bits 15:12 = range size: 4K * S = 1, bits 15:12 = xxx0 range size: 8K * S = 1, bits 15:12 = xx01 range size: 16K * S = 1, bits 15:12 = x011 range size: 32K * S = 1, bits 15:12 = 0111 range size: 64K * ... */ Right, but the comment is probably misleading here, since it's for the PCI-E transaction between IOMMU and device not for the device IOTLB invalidation descriptor. For device IOTLB invalidation descriptor, spec allows a [0, ~0ULL] invalidation: " 6.5.2.5 Device-TLB Invalidate Descriptor ... Size (S): The size field indicates the number of consecutive pages targeted by this invalidation request. If S field is zero, a single page at page address specified by Address [63:12] is requested to be invalidated. If S field is Set, the least significant bit in the Address field with value 0b indicates the invalidation address range. For example, if S field is Set and Address[12] is Clear, it indicates an 8KB invalidation address range with base address in Address [63:13]. If S field and Address[12] is Set and bit 13 is Clear, it indicates a 16KB invalidation address range with base address in Address [63:14], etc. " So if we receive an address whose [63] is 0 and the rest is all 1, it's then a [0, ~0ULL] invalidation. Yes. I think invalidating the whole range is always fine. It's still not arbitrary, right? E.g., we can't even invalidate (0x1000, 0x3000) with device-iotlb because of the address mask, not to say sub-pages. Yes. How about just convert to use a range [start, end] for any notifier and move the checks (e.g the assert) into the actual notifier implemented (vhost or vfio)? IOMMUTLBEntry itself is the abstraction layer of TLB entry. Hardware TLB entry is definitely not arbitrary range either (because AFAICT the hardware should only cache PFN rather than address, so at least PAGE_SIZE aligned). Introducing this flag will already make this trickier just to avoid introducing another similar struct to IOMMUTLBEntry, but I really don't want to make it a default option... Not to mention I probably have no reason to urge the rest iommu notifier users (tcg, vfio) to change their existing good code to suite any of the backend who can cooperate with arbitrary address ranges... Ok, so it looks like we need a dedicated notifiers to device IOTLB. Or we can also make a new flag for device iotlb just like current UNMAP? Then we replace the vhost type from UNMAP to DEVICE_IOTLB. But IMHO using the ARBITRARY_LENGTH flag would work in a similar way. DEVICE_IOTLB flag could also allow virtio/vhost to only receive one invalidation (now IIUC it'll receive both iotlb and device-iotlb for unmapping a page when ats=on), but then ats=on will be a must and it could break some old (misconfiged) qemu because afaict previously virtio/vhost could even work with vIOMMU (accidentally) even without ats=on. That's a bug and I don't think we need to workaround mis-configurated qemu :) IMHO it depends on the strictness we want on the qemu cmdline API. :) We should at least check libvirt to make sure it's using ats=on always, then I agree maybe we can avoid considering the rest... Thanks, Cc libvirt list, but I think we should fix libvirt if they don't provide "ats=on". Thanks