On 2023/8/16 6:19, [External account] Peter Xu wrote:
On Tue, Aug 15, 2023 at 09:35:19AM -0300, Fabiano Rosas wrote:
Guoyi Tu <t...@chinatelecom.cn> writes:

When the migration of a virtual machine that uses huge pages is cancelled,
QEMU continues to process the rest of the current huge page even though
the qemu file object already has an error set. This processing, such as
compression and encryption, consumes a lot of CPU resources, which may
affect the performance of other VMs.

To terminate the migration process more quickly and minimize unnecessary
resource usage, it is necessary to check the error status of the qemu file
object at the beginning of the ram_save_target_page_legacy function and
make sure the function returns immediately if the qemu file has an error.

Signed-off-by: Guoyi Tu <t...@chinatelecom.cn>
---
   migration/ram.c | 4 ++++
   1 file changed, 4 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..3e2ebf3004 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2133,6 +2133,10 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
       ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
       int res;

+    if (qemu_file_get_error(pss->pss_channel)) {
+        return -1;
+    }

Where was the error set? Is this from cancelling via QMP? Or something
from within ram_save_target_page_legacy? We should probably make the
check closer to where the error happens. At the very least, we should move
the check into the loop.

Fabiano - I think it's in the loop (of all target pages within the same host
page), and IIUC Guoyi mentioned it's part of cancelling.

Guoyi, I assume you just saw QEMU cancel too slowly with e.g. 1G pages?
The patch looks good here.

Yes, when the migration process is cancelled, I think there is no need to handle the remaining part of the huge page; we should quit immediately.

Thanks,

