Hi Marc,

Do you have time to have a look at this? Thanks ;-)
Keqian.

On 2021/1/26 20:44, Keqian Zhu wrote:
> The intention:
>
> On the arm64 platform, we track the dirty log of vCPUs through guest memory aborts.
> KVM occupies some vCPU time of the guest to change the stage2 mapping and mark pages
> dirty. This has a heavy side effect on the VM, especially when multiple vCPUs race
> and some of them block on the kvm mmu_lock.
>
> DBM is a HW auxiliary approach to log dirty pages. The MMU changes a PTE to be
> writable if its DBM bit is set, so KVM doesn't have to occupy vCPU time to log
> dirty pages.
>
> About this patch series:
>
> The biggest problem of applying DBM to stage2 is that software must scan the PTs
> to collect the dirty state, which may cost much time and affect the downtime of
> migration.
>
> This series realizes a SW/HW combined dirty log that can effectively solve this
> problem (the SMMU side can also use this approach to solve DMA dirty log tracking).
>
> The core idea is that we do not enable hardware dirty tracking at start (we do not
> add the DBM bit). When an arbitrary PT takes a fault, we perform software tracking
> for this PT and enable hardware tracking for its *nearby* PTs (e.g. add the DBM bit
> for the nearby 16 PTs). Then, when syncing the dirty log, we already know all PTs
> with hardware dirty tracking enabled, so we do not need to scan all PTs.
>
>       mem abort point                  mem abort point
>            ↓                                 ↓
>  ---------------------------------------------------------------
>  |********|          |          |********|          |          |
>  ---------------------------------------------------------------
>       ↑                               ↑
>  set DBM bit of                  set DBM bit of
>  this PT section (64PTEs)        this PT section (64PTEs)
>
> We may worry that when the dirty rate is very high we still need to scan too many
> PTs. Our main concern is the VM stop time. With Qemu dirty rate throttling, the
> dirty memory approaches the VM stop threshold, so there are only a few PTs to scan
> after the VM stops.
>
> This has the advantage of hardware tracking, which minimizes the side effect on
> vCPUs, and also the advantage of software tracking, which controls the vCPU dirty
> rate. Moreover, software tracking lets us scan PTs at some fixed points, which
> greatly reduces scanning time. And the biggest benefit is that we can apply this
> solution to DMA dirty tracking.
>
> Test:
>
> Host: Kunpeng 920 with 128 CPUs and 512G RAM. Transparent Hugepage is disabled
> (to ensure the test result is not affected by the dissolving of block page tables
> at the early stage of migration).
> VM: 16 CPUs, 16GB RAM. Runs 4 pairs of (redis_benchmark + redis_server).
>
> Each case is run 5 times, for the software dirty log and the SW/HW combined dirty log.
>
> Test result:
>
> We gain a 5%~7% improvement of Redis QPS during VM migration.
> VM downtime is not affected fundamentally.
> About 56.7% of DBM is effectively used.
>
> Keqian Zhu (7):
>   arm64: cpufeature: Add API to report system support of HWDBM
>   kvm: arm64: Use atomic operation when update PTE
>   kvm: arm64: Add level_apply parameter for stage2_attr_walker
>   kvm: arm64: Add some HW_DBM related pgtable interfaces
>   kvm: arm64: Add some HW_DBM related mmu interfaces
>   kvm: arm64: Only write protect selected PTE
>   kvm: arm64: Start up SW/HW combined dirty log
>
>  arch/arm64/include/asm/cpufeature.h  |  12 +++
>  arch/arm64/include/asm/kvm_host.h    |   6 ++
>  arch/arm64/include/asm/kvm_mmu.h     |   7 ++
>  arch/arm64/include/asm/kvm_pgtable.h |  45 ++++++++++
>  arch/arm64/kvm/arm.c                 | 125 ++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 130 ++++++++++++++++++++++-----
>  arch/arm64/kvm/mmu.c                 |  47 +++++++++-
>  arch/arm64/kvm/reset.c               |   8 +-
>  8 files changed, 351 insertions(+), 29 deletions(-)
>
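For reviewers who want the core idea at a glance, below is a minimal user-space C
sketch of the tracking policy only; it is not code from this series, and names such
as SECTION_PTES, PTE_DBM, handle_write_fault() and the flat pte[] array are invented
for illustration. A write fault logs the faulting PTE in software and arms DBM-style
tracking for its whole section, and dirty-log sync then scans only the sections that
were armed, rather than every PT.

/*
 * Toy model of the SW/HW combined dirty log policy (illustration only,
 * not the KVM/arm64 implementation in this series).
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PTES       1024
#define SECTION_PTES  64                    /* PTEs covered by one DBM section */
#define NR_SECTIONS   (NR_PTES / SECTION_PTES)

#define PTE_WRITABLE  (1u << 0)
#define PTE_DBM       (1u << 1)             /* stand-in for the HW DBM bit */

static uint32_t pte[NR_PTES];
static bool soft_dirty[NR_PTES];            /* software dirty log */
static bool section_has_dbm[NR_SECTIONS];   /* which sections need scanning */

/* Write fault on @idx: log it in software, arm HW tracking for the section. */
static void handle_write_fault(unsigned int idx)
{
    unsigned int sec = idx / SECTION_PTES;
    unsigned int i, start = sec * SECTION_PTES;

    soft_dirty[idx] = true;
    pte[idx] |= PTE_WRITABLE;

    /* Set DBM on the nearby PTEs so later writes do not fault again. */
    for (i = start; i < start + SECTION_PTES; i++)
        pte[i] |= PTE_DBM;
    section_has_dbm[sec] = true;
}

/* Sync the dirty log: only scan sections where DBM tracking was enabled. */
static unsigned int sync_dirty_log(bool *bitmap)
{
    unsigned int sec, i, dirty = 0;

    for (sec = 0; sec < NR_SECTIONS; sec++) {
        if (!section_has_dbm[sec])
            continue;
        for (i = sec * SECTION_PTES; i < (sec + 1) * SECTION_PTES; i++) {
            /* HW would have made a DBM page writable on the first write. */
            bool hw_dirty = (pte[i] & PTE_DBM) && (pte[i] & PTE_WRITABLE);

            if (soft_dirty[i] || hw_dirty) {
                bitmap[i] = true;
                dirty++;
            }
            soft_dirty[i] = false;
            pte[i] &= ~PTE_WRITABLE;        /* write-protect again */
        }
    }
    return dirty;
}

int main(void)
{
    static bool bitmap[NR_PTES];

    handle_write_fault(5);          /* guest write faults on PTE 5 */
    pte[7] |= PTE_WRITABLE;         /* later HW-tracked write in the same section */
    printf("dirty PTEs: %u\n", sync_dirty_log(bitmap));
    return 0;
}

The actual series applies this against the stage2 page tables through the kvm_pgtable
and mmu interfaces listed in the diffstat above; the sketch only shows why dirty-log
sync does not have to scan every PT.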