> From: Kunkun Jiang <jiangkun...@huawei.com>
> Sent: Thursday, March 18, 2021 8:29 PM
>
> Hi Kevin,
>
> On 2021/3/18 17:04, Tian, Kevin wrote:
> >> From: Kunkun Jiang <jiangkun...@huawei.com>
> >> Sent: Thursday, March 18, 2021 3:59 PM
> >>
> >> Hi Kevin,
> >>
> >> On 2021/3/18 14:28, Tian, Kevin wrote:
> >>>> From: Kunkun Jiang
> >>>> Sent: Wednesday, March 10, 2021 5:41 PM
> >>>>
> >>>> Hi all,
> >>>>
> >>>> In the past, we clear dirty log immediately after sync dirty log to
> >>>> userspace. This may cause redundant dirty handling if userspace
> >>>> handles dirty log iteratively:
> >>>>
> >>>> After vfio clears dirty log, new dirty log starts to generate. These
> >>>> new dirty log will be reported to userspace even if they are generated
> >>>> before userspace handles the same dirty page.
> >>>>
> >>>> Since a new dirty log tracking method for vfio based on iommu hwdbm[1]
> >>>> has been introduced in the kernel and added a new capability named
> >>>> VFIO_DIRTY_LOG_MANUAL_CLEAR, we can eliminate some redundant dirty
> >>>> handling by supporting it.
> >>> Is there any performance data showing the benefit of this new method?
> >>>
> >> Current dirty log tracking method for VFIO:
> >> [1] All pages marked dirty if not all iommu_groups have pinned_scope
> >> [2] pinned pages by various vendor drivers if all iommu_groups have
> >> pinned scope
> >>
> >> Both methods are coarse-grained and can not determine which pages are
> >> really dirty. Each round may mark the pages that are not really dirty as
> >> dirty and send them to the destination. ( It might be better if the
> >> range of the pinned_scope was smaller. ) This will result in a waste of
> >> resources.
> >>
> >> HWDBM is short for Hardware Dirty Bit Management.
> >> (e.g. smmuv3 HTTU, Hardware Translation Table Update)
> >>
> >> About SMMU HTTU:
> >> HTTU is a feature of ARM SMMUv3, it can update access flag or/and dirty
> >> state of the TTD (Translation Table Descriptor) by hardware.
> >>
> >> With HTTU, stage1 TTD is classified into 3 types:
> >>                        DBM bit    AP[2](readonly bit)
> >> 1. writable_clean         1              1
> >> 2. writable_dirty         1              0
> >> 3. readonly               0              1
> >>
> >> If HTTU_HD (manage dirty state) is enabled, smmu can change TTD from
> >> writable_clean to writable_dirty. Then software can scan TTD to sync dirty
> >> state into dirty bitmap. With this feature, we can track the dirty log of
> >> DMA continuously and precisely.
> >>
> >> The capability of VFIO_DIRTY_LOG_MANUAL_CLEAR is similar to that on
> >> the KVM side. We add this new log_clear() interface only to split the old
> >> log_sync() into two separated procedures:
> >>
> >> - use log_sync() to collect the collection only, and,
> >> - use log_clear() to clear the dirty bitmap.
> >>
> >> If you're interested in this new method, you can take a look at our set of
> >> patches.
> >> [1]
> >> https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/
> >>
> > I know what you are doing. Intel is also working on VT-d dirty bit support
> > based on above link. What I'm curious is the actual performance gain
> > with this optimization. KVM doing that is one good reference, but IOMMU
> > has different characteristics (e.g. longer invalidation latency) compared to
> > CPU MMU. It's always good to understand what a so-called optimization
> > can actually optimize in a context different from where it's originally
> > proved.😊
> >
> > Thanks
> > Kevin
>
> My understanding is that this is a new method, which is quite different
> from the previous two. So can you explain in more detail what performance
> data you want?😁
>
> Thanks,
> Kunkun Jiang

When you have HTTU enabled, compare the migration efficiency with and
without this manual clear interface.

Thanks
Kevin
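
For readers following the thread, the redundancy described in the quoted
cover letter can be sketched with a toy model. This is only an
illustration under stated assumptions, not performance data and not the
VFIO or QEMU API: the DirtyLog class, dma_write(), migrate_round() and
pages_resent() are hypothetical names, and only log_sync() and log_clear()
echo the procedures named in the thread. The model assumes the device
writes one page after the sync but before userspace sends that page.

```python
class DirtyLog:
    """Toy dirty log (hypothetical; not the VFIO or QEMU interface)."""

    def __init__(self, manual_clear):
        self.manual_clear = manual_clear
        self.dirty = set()

    def dma_write(self, page):
        # A device write marks the page dirty.
        self.dirty.add(page)

    def log_sync(self):
        # Report the current dirty pages. Without manual clear, the whole
        # log is cleared right here, as in the old behaviour.
        snapshot = set(self.dirty)
        if not self.manual_clear:
            self.dirty.clear()
        return snapshot

    def log_clear(self, page):
        # With manual clear, userspace clears a page just before sending it.
        self.dirty.discard(page)


def migrate_round(log, late_write):
    """Sync once, then send pages in order. The page `late_write` is
    written by the device after the sync but before it is sent."""
    sent = []
    for page in sorted(log.log_sync()):
        if page == late_write:
            log.dma_write(page)
        if log.manual_clear:
            log.log_clear(page)
        sent.append(page)  # the latest page content goes out here
    return sent


def pages_resent(manual_clear):
    """Pages reported dirty again in the round after the first one."""
    log = DirtyLog(manual_clear)
    for page in range(4):
        log.dma_write(page)
    migrate_round(log, late_write=3)
    return sorted(log.log_sync())
```

In this model the late write to page 3 is already covered by the first
send in both cases, but only the variant without manual clear reports the
page again in the next round. Whether this saving outweighs the cost of
the extra clear operations (for example IOMMU invalidation latency) is
exactly what the requested measurement would have to show.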