On Wed, Jul 10, 2024 at 01:55:23PM +0000, Liu, Yuan1 wrote: [...]
> migrate_set_parameter max-bandwidth 1250M
>
> |-----------|--------|---------|----------|----------|------|------|
> |8 Channels |Total   |down     |throughput|pages per | send | recv |
> |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
> |-----------|--------|---------|----------|----------|------|------|
> |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
> |-----------|--------|---------|----------|----------|------|------|
> |zstd       |   20165|       24|      8579|   2391465|   810|   340|
> |-----------|--------|---------|----------|----------|------|------|
> |none       |   46063|       40|     10848|    330240|    45|    85|
> |-----------|--------|---------|----------|----------|------|------|
>
> QATzip's dirty page processing throughput is much higher than that of no
> compression.
> In this test the vCPUs are idle, so the migration can succeed even
> without compression.

Thanks!  Maybe good material to be put into the docs/ too, if Yichen is
going to pick up your doc patch when he reposts.

[...]

> I don't have much experience with postcopy, here are some of my thoughts:
>
> 1. For write-intensive VMs, this solution can improve the migration success
>    rate, because in a limited-bandwidth network scenario the dirty page
>    processing throughput drops significantly without compression. The
>    previous data shows this (pages_per_second): in the no-compression
>    precopy case, the dirty pages generated by the workload exceed what the
>    migration can process, resulting in migration failure.

Yes.

> 2. If the VM is read-intensive or has low vCPU utilization (for example, in
>    my current test scenario the vCPUs are all idle), I think no compression
>    + precopy + postcopy also cannot improve the migration performance, and
>    may also cause timeout failures due to a long migration time, same as
>    no-compression precopy.

I don't think postcopy will trigger timeout failures - postcopy should use
constant time to complete a migration, that is guest memsize / bw.  The
challenge is normally that the delay of page requests is higher than with
precopy, but in this case it might not be a big deal.  And I wonder if on
100G*2 cards it can also perform pretty well, as the delay might be minimal
even if bandwidth is throttled.

> 3. In my opinion, postcopy is a good solution in this scenario (low network
>    bandwidth, VM is not critical), because even with compression turned on
>    the migration may still fail (pages_per_second may still be less than
>    the new dirty pages), and it is hard to predict whether the VM memory is
>    compression-friendly.

Yes.

Thanks,

-- 
Peter Xu
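
For anyone wanting to reproduce a run like the rows in the table above, a
minimal HMP sketch could look like the following (the destination host and
port are placeholders, the qatzip value assumes this series is applied, and
the destination side needs the matching multifd capability and compression
setting before migrate_incoming):

    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_parameter multifd-channels 8
    (qemu) migrate_set_parameter multifd-compression qatzip
    (qemu) migrate_set_parameter max-bandwidth 1250M
    (qemu) migrate -d tcp:<dest-host>:<port>

Switching multifd-compression to zstd or none would cover the other two rows
of the table.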