[PATCH v2] MAINTAINERS: Change my email address
The zhang.zhanghaili...@huawei.com email address has been deactivated. Change it to my new email address.

Signed-off-by: Hailiang Zhang
---
Hi Juan & Dave,

Firstly, thank you for your work on maintaining the COLO framework. I didn't have much time for it in the past days, but I may have some in the coming days since my job has changed. Because my old email address has been deactivated, I cannot use it to send this patch. Please help me merge it.

Thanks,
Hailiang
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..5d9c4243b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2967,7 +2967,7 @@ F: include/qemu/yank.h
 F: qapi/yank.json

 COLO Framework
-M: zhanghailiang
+M: Hailiang Zhang
 S: Maintained
 F: migration/colo*
 F: include/migration/colo.h
--
2.34.1
[PATCH] MAINTAINERS: Change my email address
The zhang.zhanghaili...@huawei.com email address has been deactivated. Change it to my new email address.

Signed-off-by: Hailiang Zhang
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..5d9c4243b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2967,7 +2967,7 @@ F: include/qemu/yank.h
 F: qapi/yank.json

 COLO Framework
-M: zhanghailiang
+M: Hailiang Zhang
 S: Maintained
 F: migration/colo*
 F: include/migration/colo.h
--
2.34.1
[PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing
We can migrate some dirty pages during the gap between two checkpoints; this way, we can reduce the amount of RAM migrated while the VM is stopped for a checkpoint.

Signed-off-by: Hailiang Zhang
---
 migration/colo.c       | 69 +++---
 migration/migration.h  |  1 +
 migration/trace-events |  1 +
 qapi/migration.json    |  4 ++-
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 93c5a452fb..d30c6bc4ad 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,13 @@ static COLOMode last_colo_mode;

 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)

+#define DEFAULT_RAM_PENDING_CHECK 1000
+
+/* should be calculated by bandwidth and max downtime ? */
+#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)
+
+static int checkpoint_request;
+
 bool migration_in_colo_state(void)
 {
     MigrationState *s = migrate_get_current();
@@ -516,6 +523,20 @@ static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
     colo_checkpoint_notify(data);
 }

+static bool colo_need_migrate_ram_background(MigrationState *s)
+{
+    uint64_t pending_size, pend_pre, pend_compat, pend_post;
+    int64_t max_size = THRESHOLD_PENDING_SIZE;
+
+    qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
+                              &pend_compat, &pend_post);
+    pending_size = pend_pre + pend_compat + pend_post;
+
+    trace_colo_need_migrate_ram_background(pending_size);
+    return (pending_size >= max_size);
+}
+
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QIOChannelBuffer *bioc;
@@ -571,6 +592,8 @@ static void colo_process_checkpoint(MigrationState *s)

     timer_mod(s->colo_delay_timer, current_time +
               s->parameters.x_checkpoint_delay);
+    timer_mod(s->pending_ram_check_timer,
+              current_time + DEFAULT_RAM_PENDING_CHECK);

     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
@@ -583,10 +606,25 @@ static void colo_process_checkpoint(MigrationState *s)
         if (s->state != MIGRATION_STATUS_COLO) {
             goto out;
         }
-        ret = colo_do_checkpoint_transaction(s, bioc, fb);
-        if (ret < 0) {
-            goto out;
-        }
+        if (atomic_xchg(&checkpoint_request, 0)) {
+            /* start a colo checkpoint */
+            ret = colo_do_checkpoint_transaction(s, bioc, fb);
+            if (ret < 0) {
+                goto out;
+            }
+        } else {
+            if (colo_need_migrate_ram_background(s)) {
+                colo_send_message(s->to_dst_file,
+                                  COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
+                                  &local_err);
+                if (local_err) {
+                    goto out;
+                }
+
+                qemu_savevm_state_iterate(s->to_dst_file, false);
+                qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
+            }
+        }
     }

 out:
@@ -626,6 +664,8 @@ out:
     colo_compare_unregister_notifier(&colo_compare_notifier);
     timer_del(s->colo_delay_timer);
     timer_free(s->colo_delay_timer);
+    timer_del(s->pending_ram_check_timer);
+    timer_free(s->pending_ram_check_timer);
     qemu_sem_destroy(&s->colo_checkpoint_sem);

     /*
@@ -643,6 +683,7 @@ void colo_checkpoint_notify(void *opaque)
     MigrationState *s = opaque;
     int64_t next_notify_time;

+    atomic_inc(&checkpoint_request);
     qemu_sem_post(&s->colo_checkpoint_sem);
     s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     next_notify_time = s->colo_checkpoint_time +
@@ -650,6 +691,19 @@ void colo_checkpoint_notify(void *opaque)
     timer_mod(s->colo_delay_timer, next_notify_time);
 }

+static void colo_pending_ram_check_notify(void *opaque)
+{
+    int64_t next_notify_time;
+    MigrationState *s = opaque;
+
+    if (migration_in_colo_state()) {
+        next_notify_time = DEFAULT_RAM_PENDING_CHECK +
+                           qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        timer_mod(s->pending_ram_check_timer, next_notify_time);
+        qemu_sem_post(&s->colo_checkpoint_sem);
+    }
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
@@ -657,6 +711,8 @@ void migrate_start_colo_process(MigrationState *s)
     s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST,
                                        colo_checkpoint_notify, s);
+    s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST,
+                                              colo_pending_ram_check_notify, s);
     qemu_sem_init(&s->colo_exit_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -805,6 +861,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis,
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
        colo_incoming_process_checkpoint(mis, fb, bioc, errp);
         break;
+    case COLO_MESSAGE_MIGRATE
[PATCH 0/3] Optimize VM's downtime while doing checkpoints in COLO
Hi,

This is an untested series that tries to reduce the VM's pause time while doing a checkpoint in COLO state.

The second patch reduces the total number of dirty pages that must be sent while the VM is paused for a checkpoint: instead of sending all dirty pages while the VM is paused, it sends part of them during the gap between two checkpoints, while the SVM and PVM are running.

The third patch reduces the pause time of backing up RAM into the cache on the secondary side.

Hailiang Zhang (3):
  migration/colo: wrap incoming checkpoint process into new helper
  COLO: Migrate dirty pages during the gap of checkpointing
  COLO: Optimize memory back-up process

 migration/colo.c       | 332 +
 migration/migration.h  |   1 +
 migration/ram.c        |  35 -
 migration/ram.h        |   1 +
 migration/trace-events |   1 +
 qapi/migration.json    |   4 +-
 6 files changed, 234 insertions(+), 140 deletions(-)
--
2.21.0
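The core decision in the second patch is a simple threshold check between two checkpoints. As a rough, self-contained illustration (plain C, not QEMU code; the constant and the function name merely mirror the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch's hard-coded THRESHOLD_PENDING_SIZE (100 MiB). */
#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)

/*
 * Decide whether enough dirty RAM is pending to be worth migrating in
 * the gap between two checkpoints. Simplified stand-in for the patch's
 * colo_need_migrate_ram_background(): it sums the three pending
 * estimates that qemu_savevm_state_pending() would report and compares
 * the total against the threshold.
 */
static bool need_migrate_ram_background(uint64_t pend_pre,
                                        uint64_t pend_compat,
                                        uint64_t pend_post)
{
    uint64_t pending_size = pend_pre + pend_compat + pend_post;

    return pending_size >= THRESHOLD_PENDING_SIZE;
}
```

For example, 150 MiB of pending dirty RAM exceeds the 100 MiB threshold and triggers a background pass, while 1 MiB does not. As the comment in the patch itself notes, a real threshold should probably be derived from bandwidth and max downtime rather than hard-coded.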
[PATCH 3/3] COLO: Optimize memory back-up process
This patch reduces the VM's downtime for the initial process. Previously, we copied all of the SVM's memory in the COLO preparing stage while the VM had to be stopped, which is a time-consuming process. Here we optimize it with a trick: back up every page during the migration process while COLO is enabled. Though it affects the speed of the migration, it clearly reduces the downtime of backing up all of the SVM's memory in the COLO preparing stage.

Signed-off-by: Hailiang Zhang
---
 migration/colo.c |  3 +++
 migration/ram.c  | 35 +++
 migration/ram.h  |  1 +
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d30c6bc4ad..febf010571 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "migration/failover.h"
+#include "migration/ram.h"
 #ifdef CONFIG_REPLICATION
 #include "replication.h"
 #endif
@@ -906,6 +907,8 @@ void *colo_process_incoming_thread(void *opaque)
      */
     qemu_file_set_blocking(mis->from_src_file, true);

+    colo_incoming_start_dirty_log();
+
     bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
     fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
     object_unref(OBJECT(bioc));
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..24a8aa3527 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2986,7 +2986,6 @@ int colo_init_ram_cache(void)
             }
             return -errno;
         }
-        memcpy(block->colo_cache, block->host, block->used_length);
     }
 }
@@ -3005,12 +3004,16 @@ int colo_init_ram_cache(void)
             bitmap_set(block->bmap, 0, pages);
         }
     }
+
+    return 0;
+}
+
+void colo_incoming_start_dirty_log(void)
+{
     ram_state = g_new0(RAMState, 1);
     ram_state->migration_dirty_pages = 0;
     qemu_mutex_init(&ram_state->bitmap_mutex);
     memory_global_dirty_log_start();
-
-    return 0;
 }

 /* It is needed to hold the global lock to call this helper */
@@ -3348,7 +3351,7 @@ static int ram_load_precopy(QEMUFile *f)

     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr, total_ram_bytes;
-        void *host = NULL;
+        void *host = NULL, *host_bak = NULL;
         uint8_t ch;

         /*
@@ -3378,13 +3381,26 @@ static int ram_load_precopy(QEMUFile *f)
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
-
             /*
-             * After going into COLO, we should load the Page into colo_cache.
+             * After going into COLO, we should load the Page into colo_cache.
+             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
+             * Previously, we copied all this memory in the preparing stage
+             * of COLO while we needed to stop the VM, which is a
+             * time-consuming process. Here we optimize it with a trick:
+             * back up every page during the migration process while COLO
+             * is enabled; though it affects the speed of the migration,
+             * it clearly reduces the downtime of backing up all the SVM's
+             * memory in the COLO preparing stage.
              */
-            if (migration_incoming_in_colo_state()) {
+            if (migration_incoming_colo_enabled()) {
                 host = colo_cache_from_block_offset(block, addr);
-            } else {
+                /*
+                 * After going into COLO, load the Page into colo_cache.
+                 */
+                if (!migration_incoming_in_colo_state()) {
+                    host_bak = host;
+                }
+            }
+            if (!migration_incoming_in_colo_state()) {
                 host = host_from_ram_block_offset(block, addr);
             }
             if (!host) {
@@ -3506,6 +3522,9 @@ static int ram_load_precopy(QEMUFile *f)
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
+        if (!ret && host_bak && host) {
+            memcpy(host_bak, host, TARGET_PAGE_SIZE);
+        }
     }

     ret |= wait_for_decompress_done();
diff --git a/migration/ram.h b/migration/ram.h
index a553d40751..5ceaff7cb4 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_incoming_start_dirty_log(void);

 #endif
--
2.21.0
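The heart of this optimization is the host/host_bak juggling in ram_load_precopy(). A toy model of that control flow (self-contained C over plain buffers; not the actual QEMU structures or function names):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Toy model of the patch's logic: while precopy runs with COLO enabled
 * (but before entering COLO state), each incoming page is written to
 * guest RAM as usual and then also backed up into the colo cache, so
 * the COLO preparing stage no longer needs one big bulk memcpy of all
 * RAM. Once actually in COLO state, pages go straight into the cache.
 */
static void load_one_page(uint8_t *guest_ram, uint8_t *colo_cache,
                          const uint8_t *incoming,
                          bool colo_enabled, bool in_colo_state)
{
    uint8_t *host = guest_ram;
    uint8_t *host_bak = NULL;

    if (colo_enabled) {
        host = colo_cache;            /* COLO state: load into the cache */
        if (!in_colo_state) {
            host_bak = host;          /* remember the cache slot...      */
            host = guest_ram;         /* ...but still load guest RAM     */
        }
    }
    memcpy(host, incoming, PAGE_SIZE);
    if (host_bak) {
        memcpy(host_bak, host, PAGE_SIZE);   /* incremental back-up */
    }
}
```

In the precopy-with-COLO-enabled case, both guest RAM and the cache end up holding the incoming page, which is exactly the state the old bulk memcpy in colo_init_ram_cache() used to establish all at once.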
[PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper
Split the checkpoint incoming process into a helper.

Signed-off-by: Hailiang Zhang
---
 migration/colo.c | 260 ---
 1 file changed, 133 insertions(+), 127 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 2c88aa57a2..93c5a452fb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -664,13 +664,138 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }

-static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
-                                     Error **errp)
+static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
+                  QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
+{
+    uint64_t total_size;
+    uint64_t value;
+    Error *local_err = NULL;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    trace_colo_vm_state_change("run", "stop");
+    qemu_mutex_unlock_iothread();
+
+    /* FIXME: This is unnecessary for periodic checkpoint mode */
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    colo_receive_check_message(mis->from_src_file,
+                               COLO_MESSAGE_VMSTATE_SEND, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_mutex_lock_iothread();
+    cpu_synchronize_all_pre_loadvm();
+    ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+    qemu_mutex_unlock_iothread();
+
+    if (ret < 0) {
+        error_setg(errp, "Load VM's live state (ram) error");
+        return;
+    }
+
+    value = colo_receive_message_value(mis->from_src_file,
+                                       COLO_MESSAGE_VMSTATE_SIZE, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * Read VM device state data into channel buffer,
+     * It's better to re-use the memory allocated.
+     * Here we need to handle the channel buffer directly.
+     */
+    if (value > bioc->capacity) {
+        bioc->capacity = value;
+        bioc->data = g_realloc(bioc->data, bioc->capacity);
+    }
+    total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
+    if (total_size != value) {
+        error_setg(errp, "Got %" PRIu64 " VMState data, less than expected"
+                   " %" PRIu64, total_size, value);
+        return;
+    }
+    bioc->usage = total_size;
+    qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_mutex_lock_iothread();
+    vmstate_loading = true;
+    ret = qemu_load_device_state(fb);
+    if (ret < 0) {
+        error_setg(errp, "COLO: load device state failed");
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+#ifdef CONFIG_REPLICATION
+    replication_get_error_all(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+    /* discard colo disk buffer */
+    replication_do_checkpoint_all(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+#else
+    abort();
+#endif
+    /* Notify all filters of all NIC to do checkpoint */
+    colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+    vmstate_loading = false;
+    vm_start();
+    trace_colo_vm_state_change("stop", "run");
+    qemu_mutex_unlock_iothread();
+
+    if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+        failover_set_state(FAILOVER_STATUS_RELAUNCH,
+                           FAILOVER_STATUS_NONE);
+        failover_request_active(NULL);
+        return;
+    }
+
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+static void colo_wait_handle_message(MigrationIncomingState *mis,
+                QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
 {
     COLOMessage msg;
     Error *local_err = NULL;

-    msg = colo_receive_message(f, &local_err);
+    msg = colo_receive_message(mis->from_src_file, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -678,10 +803,9 @@ static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,

     switch (msg) {
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
-        *checkpoint_request = 1;
+        colo_incoming
Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
On 2018/4/24 14:13, Wei Wang wrote:

This is the device part implementation to add a new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The device receives the guest free page hints from the driver and clears the corresponding bits in the dirty bitmap, so that those free pages are not transferred by the migration thread to the destination.

- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results (results are averaged over 10 runs)
    - Idle Guest Live Migration Time:
        Optimization v.s. Legacy = 271ms v.s. 1769ms --> ~86% reduction
    - Guest with Linux Compilation Workload (make bzImage -j4):
        - Live Migration Time (average):
          Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
        - Linux Compilation Time:
          Optimization v.s. Legacy = 4min56s v.s. 5min3s --> no obvious difference

- Source Code
    - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
    - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

ChangeLog:
v6->v7:
    virtio-balloon/virtio_balloon_poll_free_page_hints:
        - add virtio_notify() at the end to notify the driver that the
          optimization is done, which indicates that the entries have all
          been put back to the vq and are ready to be detached.
v5->v6:
    virtio-balloon: use iothread to get free page hint
v4->v5:
    1) migration:
        - bitmap_clear_dirty: update the dirty bitmap and dirty page
          count under the bitmap mutex as what other functions are doing;
        - qemu_guest_free_page_hint:
            - add comments for this function;
            - check the !block case;
            - check "offset > block->used_length" before proceeding;
            - assign used_len inside the for{} body;
            - update the dirty bitmap and dirty page counter under the
              bitmap mutex;
        - ram_state_reset:
            - rs->free_page_support: && with "migrate_postcopy" instead
              of migration_in_postcopy;
            - clear the ram_bulk_stage flag if free_page_support is true;
    2) balloon:
        - add the usage documentation of balloon_free_page_start and
          balloon_free_page_stop in code;
        - the optimization thread is named "balloon_fpo" to meet the
          requirement of "less than 14 characters";
        - virtio_balloon_poll_free_page_hints:
            - run on condition when runstate_is_running() is true;
            - add a qemu spin lock to synchronize accesses to the free
              page reporting related fields shared among the migration
              thread and the optimization thread;
        - virtio_balloon_free_page_start: just return if
          runstate_is_running() is false;
        - virtio_balloon_free_page_stop: access the free page reporting
          related fields under a qemu spin lock;
        - virtio_balloon_device_unrealize/reset: call
          virtio_balloon_free_page_stop if the free page hint feature is
          used;
        - virtio_balloon_set_status: call virtio_balloon_free_page_stop
          in case the guest is stopped by qmp when the optimization is
          running;
v3->v4:
    1) bitmap: add a new API to count 1s starting from an offset of a
       bitmap;
    2) migration:
        - qemu_guest_free_page_hint: calculate
          ram_state->migration_dirty_pages by counting how many bits of
          free pages are truly cleared. If some of the bits were already
          0, they shouldn't be deducted from
          ram_state->migration_dirty_pages. This wasn't needed for
          previous versions since we optimized the bulk stage only, where
          all bits are guaranteed to be set.
          It's needed now because we extended the usage of this
          optimization to all stages except the last stop stage. From the
          2nd stage onward, there are possibilities that some bits of
          free pages are already 0.
    3) virtio-balloon:
        - virtio_balloon_free_page_report_status: introduce a new status,
          FREE_PAGE_REPORT_S_EXIT. This status indicates that the
          optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
          the reporting is stopped, but the optimization thread still
          needs to be joined by the migration thread.
v2->v3:
    1) virtio-balloon:
        - virtio_balloon_free_page_start: poll the hints using a new
          thread;
        - use cmd id between [0x8000, UINT_MAX];
        - virtio_balloon_poll_free_page_hints:
            - stop the optimization only when it has started;
            - don't skip free pages when !poison_val;
        - add poison_val to vmsd to migrate;
        - virtio_balloon_get_features: add the F_PAGE_POISON feature when
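The dirty-page-accounting subtlety in the v3->v4 notes — deduct only bits that were actually set — can be seen in a tiny self-contained model (plain C over a raw word array, not the QEMU bitmap API; the function name is only a stand-in for qemu_guest_free_page_hint()):

```c
#include <stdint.h>

/*
 * Clear dirty-bitmap bits for a run of guest-reported free pages and
 * return how many bits were truly 1 beforehand: only those may be
 * deducted from the pending dirty-page count. From the 2nd migration
 * stage onward some bits may already be 0, so blindly subtracting
 * npages would undercount the remaining dirty pages.
 */
static uint64_t clear_free_page_hints(uint64_t *bitmap, uint64_t start,
                                      uint64_t npages)
{
    uint64_t cleared = 0;

    for (uint64_t i = start; i < start + npages; i++) {
        uint64_t mask = 1ULL << (i % 64);

        if (bitmap[i / 64] & mask) {
            bitmap[i / 64] &= ~mask;
            cleared++;            /* bit was truly set: deduct it */
        }
    }
    return cleared;
}
```

For a word holding 0b1010, hinting pages 0..3 clears two bits, so only 2 is deducted from the dirty-page counter even though 4 pages were hinted.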
Re: [Qemu-devel] Reply: [BUG] Windows 7 got stuck easily while running PCMark10 application
On 2017/12/2 2:37, Paolo Bonzini wrote:
> On 01/12/2017 18:45, Gonglei (Arei) wrote:
>> I also think it's a windows bug, the problem is that it doesn't occur on the xen platform.
> It's a race, it may just be that RTC PIO is faster in Xen because it's implemented in the hypervisor.

No. Xen does not have this problem because it injects the RTC irq without checking whether its previous irq has been cleared, whereas in KVM we do have such a check:

static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
                          int irq_level, bool line_status)
{
    ...
    if (!irq_level) {
        ioapic->irr &= ~mask;   /* clear the RTC irq in irr, or we
                                   cannot inject the next RTC irq */
        ret = 1;
        goto out;
    }

I agree that we should move the operation of clearing the RTC irq from cmos_ioport_read() to cmos_ioport_write() to ensure the action is done.

Thanks,
Hailiang

> I will try reporting it to Microsoft.
>
> Thanks, Paolo

From: Paolo Bonzini
To: Gong Lei, Zhang Hailiang, qemu-devel, Michael S. Tsirkin
Cc: Huang Weidong, Wang Xin, Xie Xiangyou
Date: 2017-12-02 01:10:08
Subject: Re: [BUG] Windows 7 got stuck easily while run PCMark10 application

On 01/12/2017 08:08, Gonglei (Arei) wrote:
> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
>     CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc
> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6
>     CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86
> vcpu0 reads 0x6 because cmos_index is 0x6 now:
>     CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6
> vcpu1 reads 0x6:
>     CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6

This seems to be a Windows bug. The easiest workaround that I can think of is to clear the interrupts already when 0xc is written, without waiting for the read (because REG_C can only be read). What do you think?

Thanks, Paolo
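The proposed workaround — clear REG_C on the index-port write instead of waiting for the data-port read — can be modeled in a few lines. This is a toy state machine, not the actual mc146818rtc code; the struct and function names are illustrative:

```c
#include <stdint.h>

#define RTC_REG_C 0x0c

/* Toy RTC state: an index/data port pair plus an IRQ line level. */
struct toy_rtc {
    uint8_t cmos_index;
    uint8_t cmos_data[128];
    int irq_level;
};

/*
 * Index-port (0x70) write: under the workaround, merely selecting
 * register C clears the pending interrupt flags and lowers the IRQ, so
 * a racing index write from another vCPU (the 0xc / 0x86 interleaving
 * in the trace above) can no longer leave the IRQ stuck high with its
 * flags unread.
 */
static void toy_index_write(struct toy_rtc *s, uint8_t val)
{
    s->cmos_index = val & 0x7f;      /* bit 7 is the NMI-disable bit */
    if (s->cmos_index == RTC_REG_C) {
        s->cmos_data[RTC_REG_C] = 0; /* clear interrupt flags...     */
        s->irq_level = 0;            /* ...and lower the IRQ line    */
    }
}

/* Data-port (0x71) read: no interrupt-clearing side effect anymore. */
static uint8_t toy_data_read(struct toy_rtc *s)
{
    return s->cmos_data[s->cmos_index];
}
```

In this model the two-vCPU race becomes harmless: whichever vCPU selects register C clears the flags immediately, regardless of which register the subsequent data reads happen to hit.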
Re: [Qemu-devel] colo-compare: segfault and assert on colo_compare_finalize
Hi,

Did you test this branch https://github.com/coloft/qemu/tree/colo-for-qemu-2.10-2017-4-22 ?

This seems to be an already-known problem. I'm not quite sure, but it may be fixed by this patch:

b19456dd0ea4eb418ad093f092adbb882be13054
char: Fix removing wrong GSource that be found by fd_in_tag

    We use fd_in_tag to find a GSource. fd_in_tag is the return value of
    g_source_attach(GSource *source, GMainContext *context); the return
    value is unique only within the same context, so we may get the same
    values with different 'context' parameters.

    It is no problem to find the right fd_in_tag by using
    g_main_context_find_source_by_id(GMainContext *context, guint source_id)
    while there is only one default main context. But colo-compare tries
    to create/use its own context, and if we pass the wrong 'context'
    parameter with the right fd_in_tag, we will find a wrong GSource to
    handle. We tried to fix the related code in commit
    b43decb015a6efeb9e3cdbdb80f6547ad7248a4c, but it didn't fix the bug
    completely, because some code still didn't pass the *right* context
    parameter to remove_fd_in_watch(). Let's fix it by recording the
    GSource directly instead of fd_in_tag.

    Signed-off-by: zhanghailiang
    Reviewed-by: Marc-André Lureau
    Message-Id: <1492564532-91680-1-git-send-email-zhang.zhanghaili...@huawei.com>
    Signed-off-by: Paolo Bonzini

Actually, we have already rewritten this part; please follow the later series.

Thanks,
Hailiang

On 2017/8/8 0:39, Eduardo Otubo wrote:
> (please ignore my last email, looks like mutt wants to play games lately)
>
> Hi all,
>
> I have found a problem in colo-compare that leads to a segmentation fault when calling qemu like this:
>
>     $ qemu-system-x86_64 -S -machine pc -object colo-compare,id=test-object
>
> First I got an assert failure:
>
>     (qemu-system-x86_64:7887): GLib-CRITICAL **: g_main_loop_quit: assertion 'loop != NULL' failed
>
> From this it looks like s->compare_loop is NULL in the function colo_compare_finalize(); then I just added a check there and the assert went away.
But then there's the segfault:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x7333f79e in pthread_join () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7333f79e in pthread_join () at /lib64/libpthread.so.0
#1  0x55c379d2 in qemu_thread_join (thread=0x77ff5160) at util/qemu-thread-posix.c:547
#2  0x55adfc1a in colo_compare_finalize (obj=0x77fd3010) at net/colo-compare.c:867
#3  0x55b2cd87 in object_deinit (obj=0x77fd3010, type=0x567432e0) at qom/object.c:453
#4  0x55b2cdf9 in object_finalize (data=0x77fd3010) at qom/object.c:467
#5  0x55b2dd80 in object_unref (obj=0x77fd3010) at qom/object.c:902
#6  0x55b319a5 in user_creatable_add_type (type=0x567499a0 "colo-compare", id=0x56749960 "test-object", qdict=0x56835750, v=0x5681a3f0, errp=0x7fffde58) at qom/object_interfaces.c:105
#7  0x55b31b02 in user_creatable_add_opts (opts=0x56749910, errp=0x7fffde58) at qom/object_interfaces.c:135
#8  0x55b31bfd in user_creatable_add_opts_foreach (opaque=0x558e9c39 , opts=0x56749910, errp=0x0) at qom/object_interfaces.c:159
#9  0x55c4aecf in qemu_opts_foreach (list=0x56157ac0 , func=0x55b31b6f , opaque=0x558e9c39 , errp=0x0) at util/qemu-option.c:1104
#10 0x558edb75 in main (argc=6, argv=0x7fffe2d8, envp=0x7fffe310) at vl.c:4520

At this point 's->thread' is '0'. Are this segfault and the above-mentioned assert triggered because I'm creating a colo-compare object without any other parameter? In a positive case, a simple workaround and error check should do it. Otherwise I'll debug a little more.

Best regards,
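The crash pattern above — finalize touching a main loop and a worker thread that were never created — suggests guarding both before tearing them down. A minimal sketch of that shape (stand-in types and names, not the actual net/colo-compare.c code):

```c
#include <stddef.h>

/* Stand-ins for GMainLoop* and QemuThread in colo-compare's state. */
struct compare_state {
    void *compare_loop;        /* NULL if the loop was never created  */
    unsigned long thread_id;   /* 0 if the worker was never started   */
};

/*
 * Finalize only what was actually initialized, so destroying a
 * half-constructed object (e.g. one created without its mandatory
 * options) neither trips the g_main_loop_quit() assertion nor joins a
 * nonexistent thread. Returns 1 if a thread join would have happened,
 * 0 otherwise.
 */
static int compare_finalize(struct compare_state *s)
{
    if (s->compare_loop) {
        /* g_main_loop_quit(s->compare_loop); would go here */
    }
    if (s->thread_id) {
        /* qemu_thread_join(...); would go here */
        return 1;
    }
    return 0;
}
```

With both guards in place, finalizing a zero-initialized state is a no-op instead of an assert plus a SIGSEGV in pthread_join().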
Re: [Qemu-devel] [PATCH 2/3] COLO: Define COLOMode without QAPI
On 2017/7/29 1:17, Dr. David Alan Gilbert wrote:
> * Markus Armbruster (arm...@redhat.com) wrote:
>> COLOMode is defined in the QAPI schema, but not used there. Of the stuff QAPI generates for it, only the typedef is actually used. Use of QAPI is pointless and only complicates things, so don't.
> Hmm, in one of the old COLO worlds I have, there's code to emit an event on exiting from COLO, and that event includes the mode it was in.

Yes, we need it in the later series.

> If the intent is to bring that or similar back then it would be worth keeping.

Agreed. ;)

> Dave
>
>> Cc: zhanghailiang
>> Signed-off-by: Markus Armbruster
>> ---
>>  include/migration/colo.h |  6 ++
>>  qapi-schema.json         | 16 
>>  2 files changed, 6 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index ff9874e..5d7c500 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -26,6 +26,12 @@ void migration_incoming_exit_colo(void);
>>  void *colo_process_incoming_thread(void *opaque);
>>  bool migration_incoming_in_colo_state(void);
>>  
>> +typedef enum {
>> +    COLO_MODE_UNKNOWN,
>> +    COLO_MODE_PRIMARY,
>> +    COLO_MODE_SECONDARY,
>> +} COLOMode;
>> +
>>  COLOMode get_colo_mode(void);
>>  
>>  /* failover */
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 9b6f6cb..3f0eb05 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1304,22 +1304,6 @@
>>             'vmstate-loaded' ] }
>>  
>>  ##
>> -# @COLOMode:
>> -#
>> -# The colo mode
>> -#
>> -# @unknown: unknown mode
>> -#
>> -# @primary: master side
>> -#
>> -# @secondary: slave side
>> -#
>> -# Since: 2.8
>> -##
>> -{ 'enum': 'COLOMode',
>> -  'data': [ 'unknown', 'primary', 'secondary'] }
>> -
>>  ##
>>  # @FailoverStatus:
>>  #
>>  # An enumeration of COLO failover status
>> -- 
>> 2.7.5
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 05/12] migration: Move colo.h to migration/
On 2017/5/15 19:04, Juan Quintela wrote:
> "Dr. David Alan Gilbert" wrote:
>> * Juan Quintela (quint...@redhat.com) wrote:
>>> There are functions only used by migration code.
>> That's only mostly true; see the current 'integrate colo frame with block replication and net compare' series (posted 22nd April). That adds colo_handle_shutdown to this header and calls it from vl.c ( https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03901.html ) — where should that go?
> Dropped. It compiled, but who knows O:-)
>> There's also a net/colo.h as well, so using the #include "colo.h" in migration is correct, but that's really scary when there are two files of the same name.

Ha, I'd like to suggest renaming net/colo.h to colo-proxy.h (colo-net.h?), which is saner, together with renaming net/colo.c to colo-proxy.c (colo-net.c?) — or any other proper name. (I didn't notice that until it went into upstream. O:-) )

Cc: Zhang Chen

> Yeap, it is scary ...
Re: [Qemu-devel] [PATCH V4 02/12] net/filter-mirror.c: Add new option to enable vnet support for filter-mirror
Hi,

On 2017/5/12 9:41, Zhang Chen wrote:
> We add the vnet_hdr option for filter-mirror; the default is disabled. If you use the virtio-net-pci net driver, please enable it. You can use it, for example:
>
>     -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr=on

Is there any way to detect whether or not the vNIC is using vnet_hdr? I don't think it is a good idea to make users confirm it themselves, especially users who may not be so familiar with the vNIC implementation in qemu.

Thanks,
Hailiang

> Signed-off-by: Zhang Chen
> ---
>  net/filter-mirror.c | 34 ++
>  qemu-options.hx     |  5 +++--
>  2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/net/filter-mirror.c b/net/filter-mirror.c
> index 72fa7c2..3766414 100644
> --- a/net/filter-mirror.c
> +++ b/net/filter-mirror.c
> @@ -38,6 +38,7 @@ typedef struct MirrorState {
>      NetFilterState parent_obj;
>      char *indev;
>      char *outdev;
> +    bool vnet_hdr;
>      CharBackend chr_in;
>      CharBackend chr_out;
>      SocketReadState rs;
> @@ -308,6 +309,13 @@ static char *filter_mirror_get_outdev(Object *obj, Error **errp)
>      return g_strdup(s->outdev);
>  }
>  
> +static char *filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
> +{
> +    MirrorState *s = FILTER_MIRROR(obj);
> +
> +    return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
> +}
> +
>  static void
>  filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
>  {
> @@ -322,6 +330,21 @@ filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static void filter_mirror_set_vnet_hdr(Object *obj,
> +                                       const char *value,
> +                                       Error **errp)
> +{
> +    MirrorState *s = FILTER_MIRROR(obj);
> +
> +    if (strcmp(value, "on") && strcmp(value, "off")) {
> +        error_setg(errp, "Invalid value for filter-mirror vnet_hdr, "
> +                   "should be 'on' or 'off'");
> +        return;
> +    }
> +
> +    s->vnet_hdr = !strcmp(value, "on");
> +}
> +
>  static char *filter_redirector_get_outdev(Object *obj, Error **errp)
>  {
>      MirrorState *s = FILTER_REDIRECTOR(obj);
> @@ -340,8 +363,19 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
>  
>  static void filter_mirror_init(Object *obj)
>  {
> +    MirrorState *s = FILTER_MIRROR(obj);
> +
>      object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
>                              filter_mirror_set_outdev, NULL);
> +
> +    /*
> +     * The vnet_hdr is disabled by default, if you want to enable
> +     * this option, you must enable all the option on related modules
> +     * (like other filter or colo-compare).
> +     */
> +    s->vnet_hdr = false;
> +    object_property_add_str(obj, "vnet_hdr", filter_mirror_get_vnet_hdr,
> +                            filter_mirror_set_vnet_hdr, NULL);
>  }
>  
>  static void filter_redirector_init(Object *obj)
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 70c0ded..1e08481 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4024,10 +4024,11 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
>  @option{tx}: the filter is attached to the transmit queue of the netdev,
>               where it will receive packets sent by the netdev.
>  
> -@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
> +@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}]
>  
>  filter-mirror on netdev @var{netdevid},mirror net packet to chardev
> -@var{chardevid}
> +@var{chardevid}, if vnet_hdr = on, filter-mirror will mirror packet
> +with vnet_hdr_len.
>  
>  @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
>  outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
On 2017/5/12 16:09, Xiao Guangrong wrote:
> On 05/11/2017 08:24 PM, Paolo Bonzini wrote:
>> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>>> -     * Scan sptes if dirty logging has been stopped, dropping those
>>> -     * which can be collapsed into a single large-page spte. Later
>>> -     * page faults will create the large-page sptes.
>>> +     * Reset each vcpu's mmu, then page faults will create the
>>> +     * large-page sptes later.
>>>       */
>>>      if ((change != KVM_MR_DELETE) &&
>>>          (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>> -        !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> -        kvm_mmu_zap_collapsible_sptes(kvm, new);
>>> +        !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>> +        kvm_for_each_vcpu(i, vcpu, kvm)
>>> +            kvm_mmu_reset_context(vcpu);
>> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but I am not sure it is enough. I think that if you do not zap the SPTEs, the page faults will use 4K SPTEs, not large ones (though I'd have to check better; CCing Xiao and Wanpeng).
>
> Yes, Paolo is right. kvm_mmu_reset_context() just reloads the vCPU's root page table; 4k mappings are still kept.
>
> There are two issues reported:
>
> - One is in kvm_mmu_slot_apply_flags(), when enabling dirty log tracking. Its root cause is that kvm_mmu_slot_remove_write_access() takes too much time. We can make the code adaptive to use the new fast-write-protect facility introduced by my patchset, i.e., if the number of pages contained in this memslot is more than TOTAL * FAST_WRITE_PROTECT_PAGE_PERCENTAGE, then we use fast-write-protect instead.
>
> - Another one is kvm_mmu_zap_collapsible_sptes(), when disabling dirty log tracking. collapsible_sptes zaps 4k mappings to make memory-read happy; it is not required by the semantics of KVM_SET_USER_MEMORY_REGION, and it is not urgent for the vCPU's running — it could be done in a separate thread using lock-break technology.

How about moving the action of stopping the dirty log into migrate_fd_cleanup() directly, which is processed in the main thread as a BH after migration is completed?
It will not have any side effect even if migration fails, or users cancel migration, no? Thanks!
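The BH idea proposed above can be sketched without QEMU's actual API. The following is a self-contained toy in plain C (QEMU's real mechanism is qemu_bh_new()/qemu_bh_schedule(), not these names): work scheduled from anywhere runs only when the main loop drains the queue, which is why doing the dirty-log stop there avoids racing with other threads.

```c
#include <stdbool.h>
#include <stddef.h>

/* A minimal bottom-half-style queue: scheduled work only runs when the
 * owning (main) loop explicitly processes the queue, never directly
 * from the caller that scheduled it. */
typedef struct BH {
    void (*cb)(void *opaque);
    void *opaque;
    struct BH *next;
} BH;

static BH *bh_queue;

static void bh_schedule(BH *bh)
{
    bh->next = bh_queue;
    bh_queue = bh;
}

/* Runs in the main loop, analogous to the main thread picking up a
 * cleanup BH after migration completes. */
static void bh_process_all(void)
{
    while (bh_queue) {
        BH *bh = bh_queue;
        bh_queue = bh->next;
        bh->cb(bh->opaque);
    }
}

static bool dirty_log_enabled = true;

/* Stand-in for the "stop dirty log" cleanup being discussed. */
static void stop_dirty_log(void *opaque)
{
    (void)opaque;
    dirty_log_enabled = false;
}
```

The point of the pattern is ordering: scheduling alone changes nothing; the state flips only once the main loop processes the queue.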
Re: [Qemu-devel] [Qemu-block] [PATCH v4 0/6] COLO block replication supports shared disk case
On 2017/5/12 3:17, Stefan Hajnoczi wrote:
On Wed, Apr 12, 2017 at 10:05:15PM +0800, zhanghailiang wrote:
COLO block replication doesn't support the shared disk case. Here we try to implement it, and this is the 4th version. Please review, any comments are welcome.

Cc: Dr. David Alan Gilbert (git)
Cc: eddie.d...@intel.com

Sorry for the delay. Feel free to ping me if I don't review within a few days when you post a patch.

That is OK. :) I was doing other things these days, and it is not quite urgent... thanks.

v4:
- Add proper comment for primary_disk in patch 2 (Stefan)
- Call bdrv_invalidate_cache() while doing checkpoint for shared disk in patch 5
v3:
- Fix some comments from Stefan and Eric
v2:
- Drop the patch which adds a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
  docs/block-replication: Add description for shared-disk case
  replication: add shared-disk and shared-disk-id options
  replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()
  replication: fix code logic with the new shared_disk option
  replication: Implement block replication for shared disk case
  nbd/replication: implement .bdrv_get_info() for nbd and replication driver

 block/nbd.c                |  12 +++
 block/replication.c        | 198 ++---
 docs/block-replication.txt | 139 ++-
 qapi/block-core.json       |  10 ++-
 4 files changed, 306 insertions(+), 53 deletions(-)

--
1.8.3.1
Re: [Qemu-devel] [PATCH v4 5/6] replication: Implement block replication for shared disk case
On 2017/5/12 3:15, Stefan Hajnoczi wrote:
On Wed, Apr 12, 2017 at 10:05:20PM +0800, zhanghailiang wrote:

@@ -612,6 +644,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
             error_propagate(errp, local_err);
             break;
         }
+    } else {
+        /*
+         * For shared disk, we need to force SVM to re-read metadata
+         * that is loaded in memory, or there will be inconsistent.
+         */
+        bdrv_invalidate_cache(s->secondary_disk->bs, &local_err);

I'm not sure this call has any effect:

    if (!(bs->open_flags & BDRV_O_INACTIVE)) {
        return;
    }

Is BDRV_O_INACTIVE set?

No, you are right, it does not take any effect. So should we set this flag for secondary_disk? Is it enough to set this flag only, or should we call bdrv_inactivate_recurse()? To be honest, I'm not quite familiar with this part.

Thanks,
Hailiang
Re: [Qemu-devel] [Qemu-block] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options
On 2017/5/12 3:08, Stefan Hajnoczi wrote:
On Wed, Apr 12, 2017 at 10:05:17PM +0800, zhanghailiang wrote:
We use these two options to identify which disk is shared.

Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
 block/replication.c  | 43 +-
 qapi/block-core.json | 10 +-
 2 files changed, 50 insertions(+), 3 deletions(-)

Aside from the ongoing discussion about this patch...

Reviewed-by: Stefan Hajnoczi

Thanks, I'll fix the related problems found by Changlong.
Re: [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options
On 2017/4/18 13:59, Xie Changlong wrote:
On 04/12/2017 10:05 PM, zhanghailiang wrote:
We use these two options to identify which disk is shared.

Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
 block/replication.c  | 43 +-
 qapi/block-core.json | 10 +-
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index bf3c395..418b81b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@ typedef struct BDRVReplicationState {
     ReplicationMode mode;
     int replication_state;
+    bool is_shared_disk;
+    char *shared_disk_id;
     BdrvChild *active_disk;
     BdrvChild *hidden_disk;
     BdrvChild *secondary_disk;
+    BdrvChild *primary_disk;
     char *top_id;
     ReplicationState *rs;
     Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool failover,
 #define REPLICATION_MODE           "mode"
 #define REPLICATION_TOP_ID         "top-id"
+#define REPLICATION_SHARED_DISK    "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
 static QemuOptsList replication_runtime_opts = {
     .name = "replication",
     .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
             .name = REPLICATION_TOP_ID,
             .type = QEMU_OPT_STRING,
         },
+        {
+            .name = REPLICATION_SHARED_DISK_ID,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = REPLICATION_SHARED_DISK,
+            .type = QEMU_OPT_BOOL,
+        },
         { /* end of list */ }
     },
 };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
     QemuOpts *opts = NULL;
     const char *mode;
     const char *top_id;
+    const char *shared_disk_id;
+    BlockBackend *blk;
+    BlockDriverState *tmp_bs;
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
                                false, errp);
@@ -125,12 +142,33 @@ static int replication_open(BlockDriverState *bs, QDict *options,
                    "The option mode's value should be primary or secondary");
         goto fail;
     }
+    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+                                          false);

What if the secondary side is supplied with 'REPLICATION_SHARED_DISK_ID'? Please refer to f4f2539bc to perfect the logic.

Hmm, we should not configure it for the secondary side, I'll fix it in the next version.

+    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+        if (!shared_disk_id) {
+            error_setg(&local_err, "Missing shared disk blk option");
+            goto fail;
+        }
+        s->shared_disk_id = g_strdup(shared_disk_id);
+        blk = blk_by_name(s->shared_disk_id);
+        if (!blk) {
+            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+            goto fail;
+        }
+        /* We have a BlockBackend for the primary disk but use BdrvChild for
+         * consistency - active_disk, secondary_disk, etc are also BdrvChild.
+         */
+        tmp_bs = blk_bs(blk);
+        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+    }
     s->rs = replication_new(bs, &replication_ops);
-    ret = 0;
-
+    qemu_opts_del(opts);
+    return 0;
 fail:
+    g_free(s->shared_disk_id);
     qemu_opts_del(opts);
     error_propagate(errp, local_err);
@@ -141,6 +179,7 @@ static void replication_close(BlockDriverState *bs)
 {
     BDRVReplicationState *s = bs->opaque;
+    g_free(s->shared_disk_id);
     if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
         replication_stop(s->rs, false, NULL);
     }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..361c932 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
 #          node who owns the replication node chain. Must not be given in
 #          primary mode.
 #
+# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk
+#                  is true, this option is required (Since: 2.10)
+#

Further explanations: For @shared-disk-id, it must/only be given when @shared-disk is enabled on the primary side.

OK.

+# @shared-disk: To indicate whether or not a disk is shared by primary VM
+#               and secondary VM. (The default is false) (Since: 2.10)

Further explanations: For @shared-disk, it must be given or not-given on both side at the same
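Setting the quoted diff aside, the option rules discussed in this review — shared-disk-id is required on the primary when shared-disk is on, and should be rejected on the secondary — can be condensed into a small validation helper. This is an illustrative sketch, not the actual replication_open() code:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    REPLICATION_MODE_PRIMARY,
    REPLICATION_MODE_SECONDARY,
} ReplicationMode;

/* Validate the option combinations from the review:
 * - shared-disk-id is required when shared-disk=on in primary mode;
 * - shared-disk-id must not be given in secondary mode.
 * Returns NULL on success, or a static error string. */
static const char *validate_shared_disk_opts(ReplicationMode mode,
                                             bool is_shared_disk,
                                             const char *shared_disk_id)
{
    if (mode == REPLICATION_MODE_SECONDARY && shared_disk_id) {
        return "shared-disk-id is only valid in primary mode";
    }
    if (is_shared_disk && mode == REPLICATION_MODE_PRIMARY &&
        !shared_disk_id) {
        return "Missing shared disk blk option";
    }
    return NULL;
}
```

A helper of this shape makes the "must not configure it for the secondary side" fix promised above a one-line check rather than scattered conditions.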
Re: [Qemu-devel] [PATCH v2 0/3] Remove old MigrationParams
On 2017/5/12 0:32, Juan Quintela wrote:
Hi

Changes from v1:
- make migrate_block_set_* take a boolean
- disable block migration in colo to maintain semantics.

Please review, Juan.

[v1]
Once upon a time there were MigrationParams (only used for block migration) and then MigrationParams used for everything else.

This series:
- create migration capabilities for block parameters
- make the migrate command line parameters use capabilities
- remove MigrationParams completely

Please, review.

Looks good to me, this makes the code more graceful.

Reviewed-by: zhanghailiang

*** BLURB HERE ***

Juan Quintela (3):
  migration: Create block capabilities for shared and enable
  migration: Remove use of old MigrationParams
  migration: Remove old MigrationParams

 include/migration/block.h     |  3 +++
 include/migration/migration.h | 14 +---
 include/migration/vmstate.h   |  1 -
 include/qemu/typedefs.h       |  1 -
 include/sysemu/sysemu.h       |  3 +--
 migration/block.c             | 17 ++
 migration/colo.c              |  7 +++---
 migration/migration.c         | 52 ---
 migration/savevm.c            | 18 +++
 qapi-schema.json              |  7 +-
 10 files changed, 68 insertions(+), 55 deletions(-)
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
On 2017/5/12 0:33, Juan Quintela wrote:
Hailiang Zhang <zhang.zhanghaili...@huawei.com> wrote:
Hi,

Hmm, you don't seem to have replaced this with anything. I think that's a behavioural change; the trick COLO did (I'm not sure if this is still the way it works) is that they initiate the first migration with block migration enabled so that the two hosts (with non-shared storage) get synced storage, and then at the completion of that first migration they switch into the checkpointing mode where they're only doing updates - that's why it gets switched off at this point prior to the 1st checkpoint. Weird, really.

I didn't catch that. Will investigate.

Yes, Dave is right. For non-shared disks, we need to enable block migration for the first cycle, to sync the disks of the two sides. After that, QEMU will go into COLO state, where we need to disable block migration.

v2 posted. My understanding is that it maintains the semantics; please test/comment.

Yes, it is right now, I have reviewed it, thanks.

Thanks, Juan.
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
Hi,

On 2017/5/4 16:51, Juan Quintela wrote:
"Dr. David Alan Gilbert" wrote:
* Juan Quintela (quint...@redhat.com) wrote:
We have changed in the previous patch to use migration capabilities for it. Notice that we continue using the old command line flags from the migrate command for the time being. Remove the set_params method as now it is empty.

Signed-off-by: Juan Quintela
---
 include/migration/migration.h |  3 +--
 migration/block.c             | 17 ++---
 migration/colo.c              |  3 ---
 migration/migration.c         |  8 +---
 migration/savevm.c            |  2 --
 5 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c19eb3f..5c6c2f0 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -332,9 +332,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
-    /* Disable block migration */
-    s->params.blk = 0;
-    s->params.shared = 0;

Hmm, you don't seem to have replaced this with anything. I think that's a behavioural change; the trick COLO did (I'm not sure if this is still the way it works) is that they initiate the first migration with block migration enabled so that the two hosts (with non-shared storage) get synced storage, and then at the completion of that first migration they switch into the checkpointing mode where they're only doing updates - that's why it gets switched off at this point prior to the 1st checkpoint. Weird, really.

I didn't catch that. Will investigate.

Yes, Dave is right. For non-shared disks, we need to enable block migration for the first cycle, to sync the disks of the two sides. After that, QEMU will go into COLO state, where we need to disable block migration.

Thanks,
Hailiang

Thanks.
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
Hi Jason,

On 2017/4/25 19:33, Jason Wang wrote:
On 2017/04/25 17:59, Hailiang Zhang wrote:
On 2017/4/25 16:41, Jason Wang wrote:
On 2017/04/24 14:03, Hailiang Zhang wrote:
On 2017/4/24 12:10, Jason Wang wrote:
On 2017/04/20 15:46, zhanghailiang wrote:

We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so it has a chance to handle the same watched fd concurrently with the main thread, which will trigger an error report:

"qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed."

Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems not elegant.

So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()?

Looks better, but I need more information, e.g. how the main thread can touch it.

Hmm, this happened quite occasionally, and we didn't catch the first place (backtrace) of removing the fd from being watched, but from the code logic, we found there should be such possible cases:

tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
 -> tcp_chr_disconnect (or char_socket_finalize)
  -> tcp_chr_free_connection
   -> remove_fd_in_watch(chr);

Anyway, it needs protection from being freed twice.

Thanks,
Hailiang

Still a little bit confused. The question is how could the main thread still call tcp_chr_write or the others in the above case?
Finally, we reproduced this bug (we use QEMU 2.6), and got the following backtrace of this problem:

(gdb) thread apply all bt

Thread 7 (Thread 0x7f407a1ff700 (LWP 23144)):
#0  0x7f41037e0db5 in _int_malloc () from /usr/lib64/libc.so.6
#1  0x7f41037e3b96 in calloc () from /usr/lib64/libc.so.6
#2  0x7f41041ad4d7 in g_malloc0 () from /usr/lib64/libglib-2.0.so.0
#3  0x7f41041a5437 in g_source_new () from /usr/lib64/libglib-2.0.so.0
#4  0x7f410a2cec9c in qio_channel_create_fd_watch (ioc=ioc@entry=0x7f410d6238c0, fd=20, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:259
#5  0x7f410a2ced01 in qio_channel_create_socket_watch (ioc=ioc@entry=0x7f410d6238c0, socket=, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:311
#6  0x7f410a2cbea7 in qio_channel_socket_create_watch (ioc=0x7f410d6238c0, condition=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-socket.c:732
#7  0x7f410a2c94d2 in qio_channel_create_watch (ioc=0x7f410d6238c0, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel.c:132
#8  0x7f410a003cd6 in io_watch_poll_prepare (source=0x7f407d00, timeout_=) at qemu-char.c:883
#9  0x7f41041a72ed in g_main_context_prepare () from /usr/lib64/libglib-2.0.so.0
#10 0x7f41041a7b7b in g_main_context_iterate.isra.24 () from /usr/lib64/libglib-2.0.so.0
#11 0x7f41041a7fba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#12 0x7f410a1e528f in colo_compare_thread (opaque=0x7f410d7d6800) at net/colo-compare.c:651
#13 0x7f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#14 0x7f410385971d in clone () from /usr/lib64/libc.so.6

Thread 6 (Thread 0x7f40799fe700 (LWP 19368)):
#0  0x7f4103b2f6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x7f410a3138d1 in qemu_cond_wait (cond=cond@entry=0x7f410cce44c0, mutex=mutex@entry=0x7f410cce44f0) at util/qemu-thread-posix.c:132
#2  0x7f410a22b1a3 in vnc_worker_thread_loop (queue=queue@entry=0x7f410cce44c0) at ui/vnc-jobs.c:228
#3  0x7f410a22b810 in vnc_worker_thread (arg=0x7f410cce44c0) at ui/vnc-jobs.c:335
#4  0x7f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x7f410385971d in clone () from /usr/lib64/libc.so.6

Thread 5 (Thread 0x7f407abff700 (LWP 19366)):
#0  0x7f4103b2f6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x7f410a3138d1 in qemu_cond_wait (cond=cond@entry=0x7f410a9fc368 <mlock_struct+40>, mutex=mutex@entry=0x7f410a9fc340 ) at util/qemu-thread-posix.c:132
#2  0x7f4109e99060 in mlock_wait () at /work/zhanghailiang/qemu-kvm/exec.c:392
#3  mlock_thread (opaque=) at /work/zhanghailiang/qemu-kvm/exec.c:407
#4  0x7f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x7f410385971d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7f40fcd83700 (LWP 19364)):
#0  0x7f4103b2f6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x7f410a3138d1 in qemu_cond_wait (cond=, mutex=mutex@entry=0x7f410aa66ca0 ) at util/qemu-thread-posix.c:132
#2  0x7f4109ed5b3b in qemu_kvm_wait_io_event (cpu=0x7f410c2bda30) at /work/zhanghailiang/qemu-kvm/cpus.c:1087
#3  qemu_kvm_cpu_thread_fn (arg=0x7f410c2bda30) at /work/zhanghailiang/qemu-kvm/cpus.c:1126
#4  0x7f4103b2bdc5 in start_thread () f
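One way to get the serialization this thread's subject asks for is a plain condition-variable hand-off: the main thread does not proceed until the compare thread has installed its handlers, so the two can never service the same fd concurrently. A self-contained pthreads sketch follows (illustrative only; QEMU's actual fix used different machinery, e.g. taking the iothread lock around qemu_chr_fe_set_handlers()):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t handlers_installed = PTHREAD_COND_INITIALIZER;
static bool worker_owns_fd;

/* Worker: take ownership of the watched fd, then announce it. */
static void *compare_thread(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&lock);
    /* ...the qemu_chr_fe_set_handlers() equivalent would go here... */
    worker_owns_fd = true;
    pthread_cond_signal(&handlers_installed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Main thread: spawn the worker and block until the hand-off is done,
 * so it never touches the fd after this point. */
static void main_thread_wait_for_handoff(pthread_t *tid)
{
    pthread_create(tid, NULL, compare_thread, NULL);
    pthread_mutex_lock(&lock);
    while (!worker_owns_fd) {
        pthread_cond_wait(&handlers_installed, &lock);
    }
    pthread_mutex_unlock(&lock);
}
```

The predicate loop around pthread_cond_wait() matters: it is correct even if the worker signals before the main thread starts waiting.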
Re: [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock
On 2017/4/26 15:32, Juan Quintela wrote:
Both the ram bitmap and the unsent bitmap are split by RAMBlock.

Signed-off-by: Juan Quintela
--
Fix compilation when DEBUG_POSTCOPY is enabled (thanks Hailiang)

Signed-off-by: Juan Quintela
---
 include/exec/ram_addr.h          |  13 +-
 include/migration/migration.h    |   3 +-
 include/migration/postcopy-ram.h |   3 -
 migration/postcopy-ram.c         |   5 +-
 migration/ram.c                  | 273 +++
 5 files changed, 119 insertions(+), 178 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6436a41..c56b35b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -39,6 +39,14 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     size_t page_size;
+    /* dirty bitmap used during migration */
+    unsigned long *bmap;
+    /* bitmap of pages that haven't been sent even once
+     * only maintained and used in postcopy at the moment
+     * where it's used to send the dirtymap at the start
+     * of the postcopy phase
+     */
+    unsigned long *unsentmap;
 };
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -360,16 +368,15 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 static inline
-uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
-                                               RAMBlock *rb,
+uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                                ram_addr_t start,
                                                ram_addr_t length,
                                                uint64_t *real_dirty_pages)
 {
     ram_addr_t addr;
-    start = rb->offset + start;
     unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
     uint64_t num_dirty = 0;
+    unsigned long *dest = rb->bmap;
     /* start address is aligned at the start of a word? */
     if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16c..e29cb01 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -266,7 +266,8 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
-void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected,
+                           unsigned long pages);
 /* For outgoing discard bitmap */
 int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..4c25f03 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -43,12 +43,9 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis);
 /*
  * Called at the start of each RAMBlock by the bitmap code.
- * 'offset' is the bitmap offset of the named RAMBlock in the migration
- * bitmap.
  * Returns a new PDS
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name);
 /*
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..e3f4a37 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -33,7 +33,6 @@
 struct PostcopyDiscardState {
     const char *ramblock_name;
-    uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
     uint16_t cur_entry;
     /*
      * Start and length of a discard range (bytes)
@@ -717,14 +716,12 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
  * returns: a new PDS.
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name)
 {
     PostcopyDiscardState *res = g_malloc0(sizeof(PostcopyDiscardState));
     if (res) {
         res->ramblock_name = name;
-        res->offset = offset;
     }
     return res;
@@ -745,7 +742,7 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 {
     size_t tp_size = qemu_target_page_size();
     /* Convert to byte offsets within the RAM block */
-    pds->start_list[pds->cur_entry] = (start - pds->offset) * tp_size;
+    pds->start_list[pds->cur_entry] = start * tp_size;
     pds->length_list[pds->cur_entry] = length * tp_size;
     trace_postcopy_discard_send_range(pds->ramblock_name, start, length);
     pds->cur_entry++;
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..235f400 100644
---
Re: [Qemu-devel] [PATCH RESEND v2 01/18] net/colo: Add notifier/callback related helpers for filter
On 2017/4/25 19:40, Jason Wang wrote:
On 2017/04/22 16:35, zhanghailiang wrote:
We will use this notifier to help COLO notify filter objects to do something, like do a checkpoint, or process a failover event.

Cc: Jason Wang
Signed-off-by: zhanghailiang
Signed-off-by: Zhang Chen
Signed-off-by: Li Zhijian
---
 net/colo.c | 105 +
 net/colo.h |  19 +++
 2 files changed, 124 insertions(+)

Went through this series and I feel that the code duplicates (at least part of) the functionality of IOThread (iothread.c). So I come to an idea: switch to using IOThread for the COLO comparing thread, then you can use aio BHs to do the inter-process communication. Thoughts?

Interesting idea, but after investigation, we found it will make things quite complex if we use IOThread for net compare (actually, I don't think we can use it directly for COLO). It seems to be special for block; you can see some code calls block helpers. Besides, we still have to realize a way to communicate with the COLO frame for filters (I mean patches 16/17/18), even if we use it after reconstructing the IOThread part. ;)

Thanks,
Hailiang

Thanks.
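For readers unfamiliar with the notifier pattern referenced above, here is a minimal self-contained version in the spirit of QEMU's Notifier/NotifierList (util/notify.c); the names are illustrative and are not the actual net/colo.c helpers from the patch:

```c
#include <stddef.h>

/* A subscriber in a singly linked notifier list. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

static void notifier_list_add(NotifierList *list, Notifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Fire every registered callback, e.g. "do checkpoint" or "failover". */
static void notifier_list_notify(NotifierList *list, void *data)
{
    for (Notifier *n = list->head; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Example subscriber: a filter that counts checkpoint events. */
static int checkpoint_events;

static void on_checkpoint(Notifier *n, void *data)
{
    (void)n;
    (void)data;
    checkpoint_events++;
}
```

The appeal of the pattern for COLO is decoupling: the COLO frame fires one notify call and every registered filter reacts, without the frame knowing which filters exist.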
Re: [Qemu-devel] [PATCH v2 05/18] COLO: Handle shutdown command for VM in COLO state
On 2017/4/24 22:51, Eric Blake wrote:
On 04/22/2017 03:25 AM, zhanghailiang wrote:
If the VM is in COLO FT state, we need to do some extra work before starting the normal shutdown process. The Secondary VM will ignore the shutdown command if users issue it directly to the Secondary VM. COLO will capture the shutdown command after a shutdown request from the user.

Cc: Paolo Bonzini
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
---
+++ b/qapi-schema.json
@@ -1187,12 +1187,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown requested from PVM to SVM. (Since 2.9)

You missed 2.9. Please fix this to state 2.10.

OK, will fix in the next version, thanks.

+#
 # Since: 2.8
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
             'vmstate-send', 'vmstate-size', 'vmstate-received',
-            'vmstate-loaded' ] }
+            'vmstate-loaded', 'guest-shutdown' ] }
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
On 2017/4/25 19:33, Jason Wang wrote:
On 2017/04/25 17:59, Hailiang Zhang wrote:
On 2017/4/25 16:41, Jason Wang wrote:
On 2017/04/24 14:03, Hailiang Zhang wrote:
On 2017/4/24 12:10, Jason Wang wrote:
On 2017/04/20 15:46, zhanghailiang wrote:

We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so it has a chance to handle the same watched fd concurrently with the main thread, which will trigger an error report:

"qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed."

Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems not elegant.

So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()?

Looks better, but I need more information, e.g. how the main thread can touch it.

Hmm, this happened quite occasionally, and we didn't catch the first place (backtrace) of removing the fd from being watched, but from the code logic, we found there should be such possible cases:

tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
 -> tcp_chr_disconnect (or char_socket_finalize)
  -> tcp_chr_free_connection
   -> remove_fd_in_watch(chr);

Anyway, it needs protection from being freed twice.

Thanks,
Hailiang

Still a little bit confused. The question is how could the main thread still call tcp_chr_write or the others in the above case?

The 'char_socket_finalize'? Hmm, I'd better reproduce it again to get the first time of removing the fd being watched...

Thanks

Thanks.
Re: [Qemu-devel] [PATCH] COLO-compare: Add compare_lock aviod comparison conflict
On 2017/4/25 19:57, Zhang Chen wrote:
On 04/20/2017 02:40 PM, Jason Wang wrote:
On 2017/04/20 14:36, Zhang Chen wrote:
On 04/20/2017 02:20 PM, Hailiang Zhang wrote:
On 2017/4/20 12:32, Zhang Chen wrote:

When network traffic is heavy, compare_pri_rs_finalize() and compare_sec_rs_finalize() have a chance to conflict. Both of them call colo_compare_connection() to compare packets, but if, during a compare_pri_rs_finalize() comparison, a secondary packet comes in and compare_sec_rs_finalize() is called, that packet will be handled twice. If the packets are the same, the pkt will be double freed.

Interesting. If I'm right, this should not happen, because all the comparing work is done in the colo compare thread, so there is no chance to access the connect_list concurrently. Besides, even if both of the packets from primary and secondary arrive at the same time, it should only be handled once; we will handle it with the later arrived one, no?

No, in my test this bug is often triggered; you can use a udp server and client to test it.

13517@1492648526.850246:colo_compare_main : packet same and release packet
13517@1492648526.850304:colo_compare_main : packet same and release packet
*** glibc detected *** /home/zhangchen/qemu-colo-apr14/x86_64-softmmu/qemu-system-x86_64: double free or corruption (out): 0x56a75210 ***
=== Backtrace: =
/lib64/libc.so.6(+0x76628)[0x753d6628]
/lib64/libc.so.6(cfree+0x6c)[0x753db5cc]

Thanks
Zhang Chen

I agree that you should check whether or not they are running in the same thread.

I found they are not running in the same thread, and I have reviewed the relative code but can't find out why we do the same job for pri_chr_in and sec_chr_in yet they run in different threads. Can anyone tell me the reason?
Log:

Breakpoint 5, compare_pri_chr_in (opaque=0x77fd1010, buf=0x73eb0950 "", size=8) at net/colo-compare.c:591
591     {
(gdb) info thread
  Id   Target Id         Frame
  18   Thread 0x7fff70bff700 (LWP 27864) "qemu-system-x86" 0x756e4ae7 in sem_timedwait () from /lib64/libpthread.so.0
  9    Thread 0x7fff6f1ff700 (LWP 27748) "qemu-system-x86" 0x756e4a00 in sem_wait () from /lib64/libpthread.so.0
  7    Thread 0x7fff701ff700 (LWP 27746) "qemu-system-x86" 0x756e261c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x72dbf700 (LWP 27743) "qemu-system-x86" 0x754320a7 in ioctl () from /lib64/libc.so.6
  4    Thread 0x735c0700 (LWP 27742) "qemu-system-x86" 0x756e5dbd in sendmsg () from /lib64/libpthread.so.0
* 3    Thread 0x73eb2700 (LWP 27741) "qemu-system-x86" compare_pri_chr_in (opaque=0x77fd1010, buf=0x73eb0950 "", size=8) at net/colo-compare.c:591
  2    Thread 0x746b3700 (LWP 27729) "qemu-system-x86" 0x75436789 in syscall () from /lib64/libc.so.6
  1    Thread 0x77fb7a80 (LWP 27725) "qemu-system-x86" 0x756e5294 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  compare_pri_chr_in (opaque=0x77fd1010, buf=0x73eb0950 "", size=8) at net/colo-compare.c:591
#1  0x55c60fba in qemu_chr_be_write_impl (s=0x5684a630, buf=0x73eb0950 "", len=8) at chardev/char.c:284
#2  0x55c6102f in qemu_chr_be_write (s=0x5684a630, buf=0x73eb0950 "", len=8) at chardev/char.c:296
#3  0x55c6a056 in tcp_chr_read (chan=0x5684aa30, cond=G_IO_IN, opaque=0x5684a630) at chardev/char-socket.c:414
#4  0x55c83dbc in qio_channel_fd_source_dispatch (source=0x568d8b80, callback=0x55c69ebf , user_data=0x5684a630) at io/channel-watch.c:84
#5  0x760c460a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#6  0x760c7e88 in ?? () from /usr/lib64/libglib-2.0.so.0
#7  0x760c835d in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#8  0x55b82e22 in colo_compare_thread (opaque=0x77fd1010) at net/colo-compare.c:703
#9  0x756de7b6 in start_thread () from /lib64/libpthread.so.0
#10 0x75439d6d in clone () from /lib64/libc.so.6
#11 0x in ?? ()
(gdb) c
Continuing.
[Switching to Thread 0x77fb7a80 (LWP 27725)]

Breakpoint 6, compare_sec_chr_in (opaque=0x77fd1010, buf=0x7fffc590 "", size=1088) at net/colo-compare.c:608
608     {
(gdb) info thread
  Id   Target Id         Frame
  18   Thread 0x7fff70bff700 (LWP 27864) "qemu-system-x86" (Exiting) 0x756de9b3 in start_thread () from /lib64/libpthread.so.0
  9    Thread 0x7fff6f1ff700 (LWP 27748) "qemu-system-x86" 0x756e4a00 in sem_wait () from /lib64/libpthread.so.0
  7    Thread 0x7fff701ff700 (LWP 27746) "qemu-system-x86" 0x756e261c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
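The double free in the trace above happens because two contexts can both conclude "packets match, release them". Whatever form the final compare_lock takes, the core of the fix is a release-once discipline under a lock, sketched here in self-contained C (illustrative, not the actual colo-compare code):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* A connection owning at most one pending packet buffer. */
typedef struct {
    pthread_mutex_t lock;
    void *pkt;
} Connection;

/* Release the packet at most once, even if called from two contexts.
 * Returns true if this caller actually freed it. */
static bool release_packet_once(Connection *conn)
{
    bool released = false;

    pthread_mutex_lock(&conn->lock);
    if (conn->pkt) {
        free(conn->pkt);
        conn->pkt = NULL;   /* mark released: a second caller is a no-op */
        released = true;
    }
    pthread_mutex_unlock(&conn->lock);
    return released;
}
```

Clearing the pointer inside the critical section is what prevents the "double free or corruption" glibc abort: the second finalize path sees NULL and backs off instead of freeing again.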
Re: [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock
On 2017/4/25 18:11, Juan Quintela wrote:
Both the ram bitmap and the unsent bitmap are split by RAMBlock.

Signed-off-by: Juan Quintela
---
 include/exec/ram_addr.h          |  13 +-
 include/migration/postcopy-ram.h |   3 -
 migration/postcopy-ram.c         |   5 +-
 migration/ram.c                  | 257 +++
 4 files changed, 109 insertions(+), 169 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6436a41..c56b35b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -39,6 +39,14 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     size_t page_size;
+    /* dirty bitmap used during migration */
+    unsigned long *bmap;
+    /* bitmap of pages that haven't been sent even once
+     * only maintained and used in postcopy at the moment
+     * where it's used to send the dirtymap at the start
+     * of the postcopy phase
+     */
+    unsigned long *unsentmap;
 };
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -360,16 +368,15 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 static inline
-uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
-                                               RAMBlock *rb,
+uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                                ram_addr_t start,
                                                ram_addr_t length,
                                                uint64_t *real_dirty_pages)
 {
     ram_addr_t addr;
-    start = rb->offset + start;
     unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
     uint64_t num_dirty = 0;
+    unsigned long *dest = rb->bmap;
     /* start address is aligned at the start of a word? */
     if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..4c25f03 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -43,12 +43,9 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis);
 /*
  * Called at the start of each RAMBlock by the bitmap code.
- * 'offset' is the bitmap offset of the named RAMBlock in the migration
- * bitmap.
  * Returns a new PDS
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name);
 /*
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..e3f4a37 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -33,7 +33,6 @@
 struct PostcopyDiscardState {
     const char *ramblock_name;
-    uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
     uint16_t cur_entry;
     /*
      * Start and length of a discard range (bytes)
@@ -717,14 +716,12 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
  * returns: a new PDS.
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name)
 {
     PostcopyDiscardState *res = g_malloc0(sizeof(PostcopyDiscardState));
     if (res) {
         res->ramblock_name = name;
-        res->offset = offset;
     }
     return res;
@@ -745,7 +742,7 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 {
     size_t tp_size = qemu_target_page_size();
     /* Convert to byte offsets within the RAM block */
-    pds->start_list[pds->cur_entry] = (start - pds->offset) * tp_size;
+    pds->start_list[pds->cur_entry] = start * tp_size;
     pds->length_list[pds->cur_entry] = length * tp_size;
     trace_postcopy_discard_send_range(pds->ramblock_name, start, length);
     pds->cur_entry++;
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..d99f6e2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -138,19 +138,6 @@ out:
     return ret;
 }
-struct RAMBitmap {
-    struct rcu_head rcu;
-    /* Main migration bitmap */
-    unsigned long *bmap;
-    /* bitmap of pages that haven't been sent even once
-     * only maintained and used in postcopy at the moment
-     * where it's used to send the dirtymap at the start
-     * of the postcopy phase
-     */
-    unsigned long *unsentmap;
-};
-typedef struct RAMBitmap RAMBitmap;
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -220,8 +207,6 @@ struct RAMState {
     uint64_t postcopy_requests;
     /* protects modification of the bitmap */
     QemuMutex bitmap_mutex;
-    /* Ram Bitmap protected by RCU */
-    RAMBitmap *ram_bitmap;
     /* The RAMBlock used in the last src_page_requests */
Re: [Qemu-devel] [PATCH RESEND v2 08/18] ram/COLO: Record the dirty pages that SVM received
On 2017/4/25 2:29, Juan Quintela wrote: zhanghailiang wrote: We record the addresses of the dirty pages that are received; this helps with flushing the pages cached in the SVM. The trick here is that we record dirty pages by re-using the migration dirty bitmap. In a later patch, we will start dirty logging for the SVM, just like migration; in this way we can record the dirty pages caused by both the PVM and the SVM, and we only flush those dirty pages from the RAM cache while doing a checkpoint. Cc: Juan Quintela Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert --- migration/ram.c | 29 + 1 file changed, 29 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 05d1b06..0653a24 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2268,6 +2268,9 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, static inline void *colo_cache_from_block_offset(RAMBlock *block, ram_addr_t offset) { +unsigned long *bitmap; +long k; + if (!offset_in_ramblock(block, offset)) { return NULL; } @@ -2276,6 +2279,17 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block, __func__, block->idstr); return NULL; } + +k = (memory_region_get_ram_addr(block->mr) + offset) >> TARGET_PAGE_BITS; +bitmap = atomic_rcu_read(&ram_state.ram_bitmap)->bmap; +/* + * During a colo checkpoint, we need the bitmap of these migrated pages. + * It helps us decide which pages in the ram cache should be flushed + * into the VM's RAM later. + */ +if (!test_and_set_bit(k, bitmap)) { +ram_state.migration_dirty_pages++; +} return block->colo_cache + offset; } @@ -2752,6 +2766,15 @@ int colo_init_ram_cache(void) memcpy(block->colo_cache, block->host, block->used_length); } rcu_read_unlock(); +/* + * Record the dirty pages sent by the PVM; we use this dirty bitmap to + * decide which pages in the cache should be flushed into the SVM's RAM. + * Here we use the same name 'ram_bitmap' as for migration.
+ */ +ram_state.ram_bitmap = g_new0(RAMBitmap, 1); +ram_state.ram_bitmap->bmap = bitmap_new(last_ram_page()); +ram_state.migration_dirty_pages = 0; + return 0; out_locked: @@ -2770,6 +2793,12 @@ out_locked: void colo_release_ram_cache(void) { RAMBlock *block; +RAMBitmap *bitmap = ram_state.ram_bitmap; + +atomic_rcu_set(&ram_state.ram_bitmap, NULL); +if (bitmap) { +call_rcu(bitmap, migration_bitmap_free, rcu); +} rcu_read_lock(); QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { You can see my split bitmap patches; I am splitting the dirty bitmap per block. I think it shouldn't make your life more difficult, but please take a look. OK, I'll look at it. I am wondering if it is faster/easier to use the page_cache.c that xbzrle uses to store the dirty pages instead of copying the whole RAMBlocks, but I don't really know. Hmm, yes, it takes a long time (depending on the VM's memory size) to back up the whole VM's memory into the cache. We can reduce that time by backing up pages one by one while loading them during the first live-migration round, because we know whether the user enabled COLO at the beginning of the first migration stage. I'd like to send those optimizations later in another series... Thanks, Hailiang Thanks, Juan.
Re: [Qemu-devel] [PATCH RESEND v2 07/18] COLO: Load dirty pages into SVM's RAM cache firstly
On 2017/4/25 2:27, Juan Quintela wrote: zhanghailiang wrote: We should not load the PVM's state directly into the SVM, because errors may happen while the SVM is receiving data, which would break the SVM. We need to ensure all data is received before loading the state into the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on the secondary side is initially the same as the SVM/PVM's memory. During a checkpoint, we first cache the PVM's dirty pages into this RAM cache, so the cache is always the same as the PVM's memory at every checkpoint; we then flush the cached RAM to the SVM after we receive all of the PVM's state. Cc: Dr. David Alan Gilbert Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian --- v2: - Move colo_init_ram_cache() and colo_release_ram_cache() out of the incoming thread since both of them need the global lock; if we keep colo_release_ram_cache() in the incoming thread, there is a potential deadlock. - Remove the bool ram_cache_enable flag, use migration_incoming_in_state() instead. - Remove the Reviewed-by tag because of the above changes. +out_locked: +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { +if (block->colo_cache) { +qemu_anon_ram_free(block->colo_cache, block->used_length); +block->colo_cache = NULL; +} +} + +rcu_read_unlock(); +return -errno; +} + +/* The global lock must be held to call this helper */ +void colo_release_ram_cache(void) +{ +RAMBlock *block; + +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { +if (block->colo_cache) { +qemu_anon_ram_free(block->colo_cache, block->used_length); +block->colo_cache = NULL; +} +} +rcu_read_unlock(); +} Create a function from the creation/removal? We have exactly two copies of the same code. Right now the code inside the function is very small, but it could be bigger, no? Yes, we add more code in the next patch (patch 08). :) Later, Juan.
Re: [Qemu-devel] [PATCH RESEND v2 04/18] COLO: integrate colo compare with colo frame
On 2017/4/25 2:18, Juan Quintela wrote: zhanghailiang wrote: For COLO FT, both the PVM and SVM run at the same time, and we only sync the state when needed. So here, let the SVM run while not doing a checkpoint, and change DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100. Besides, we forgot to release colo_checkpoint_sem and colo_delay_timer; fix them here. Cc: Jason Wang Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert diff --git a/migration/migration.c b/migration/migration.c index 353f272..2ade2aa 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -70,7 +70,7 @@ /* The delay time (in ms) between two COLO checkpoints * Note: Please change this default value to 1 when we support hybrid mode. */ -#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200 +#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100) static NotifierList migration_state_notifiers = NOTIFIER_LIST_INITIALIZER(migration_state_notifiers); 1000 or 200 * 100? Please fix either the value or the comment. OK, will fix in the next version, thanks. Later, Juan.
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
On 2017/4/25 16:41, Jason Wang wrote: On April 24, 2017 at 14:03, Hailiang Zhang wrote: On 2017/4/24 12:10, Jason Wang wrote: On April 20, 2017 at 15:46, zhanghailiang wrote: We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so the main thread has a chance to handle the same watched fd concurrently, which will trigger an error report: "qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed." Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems inelegant. So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()? Looks better, but I need more information, e.g. how can the main thread touch it? Hmm, this happened only occasionally, and we didn't catch the first place (backtrace) where the fd was removed from being watched, but from the code's logic we found these possible cases: tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect) -> tcp_chr_disconnect (or char_socket_finalize) -> tcp_chr_free_connection -> remove_fd_in_watch(chr); Anyway, it needs protection from being freed twice. Thanks, Hailiang Thanks
Re: [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"
On 2017/4/24 15:59, Kashyap Chamarthy wrote: On Sat, Apr 22, 2017 at 05:23:49PM +0800, Hailiang Zhang wrote: Hi, Hi Hailiang, I think the below patch can fix your problem. [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH https://patchwork.kernel.org/patch/9591885/ Hmm, the above patch ("qmp-cont: invalidate on RUN_STATE_PRELAUNCH") is not merged in Git, as it's stalled on a design discussion between Kevin Wolf and Vladimir. And the below patch, from you, seems not to have been submitted upstream (2.8 stable tree, perhaps). Do you intend to do so? Er, since this patch does the same thing as the above patch, I'm not sure if I should send it ... Actually, we encountered the same problem in our tests; we fixed it with the following patch: From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001 From: zhanghailiang <zhang.zhanghaili...@huawei.com> Date: Tue, 21 Mar 2017 09:44:36 +0800 Subject: [PATCH] migration: Re-activate blocks whenever migration is cancelled In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we tried to fix the bug 'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed', which occurred in the migration-cancelling process. But it seems we didn't cover all the cases; we caught a case in our tests that slipped past the old fixup: if libvirtd cancels the migration of a shutting-down VM, it sends the 'system_reset' command first and then the 'cont' command; after the VM resumes running, it triggers the above error report, because we didn't regain control of the VM's block devices. Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> Signed-off-by: Hongyang Yang <yanghongy...@huawei.com> --- block.c | 12 +++- include/block/block.h | 1 + include/migration/migration.h | 3 --- migration/migration.c | 7 +-- qmp.c | 4 +--- 5 files changed, 14 insertions(+), 13 deletions(-) [...]
Re: [Qemu-devel] [PATCH 2/3] colo-compare: Check main_loop value before call g_main_loop_quit
On 2017/4/24 12:13, Jason Wang wrote: On April 20, 2017 at 15:46, zhanghailiang wrote: If an error happens before main_loop is initialized in the colo-compare thread, QEMU goes into the finalizing process, where we call g_main_loop_quit(s->main_loop); if main_loop is NULL, there will be an error report: "(process:14861): GLib-CRITICAL **: g_main_loop_quit: assertion 'loop != NULL' failed". We need to check whether main_loop is NULL before calling g_main_loop_quit(). Do we need to check and fail early in colo_compare_thread() too? Yes, we need to check there too; will add the check in the next version, thanks. Thanks Signed-off-by: zhanghailiang --- net/colo-compare.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index a6bf419..d6a5e4c 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -770,7 +770,9 @@ static void colo_compare_finalize(Object *obj) s->worker_context, true); qemu_chr_fe_deinit(&s->chr_out); -g_main_loop_quit(s->compare_loop); +if (s->compare_loop) { +g_main_loop_quit(s->compare_loop); +} qemu_thread_join(&s->thread); /* Release all unhandled packets after the compare thread exited */
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
On 2017/4/24 12:10, Jason Wang wrote: On April 20, 2017 at 15:46, zhanghailiang wrote: We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so the main thread has a chance to handle the same watched fd concurrently, which will trigger an error report: "qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed." Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems inelegant. So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()? Thanks Fix it by serializing the compare thread's initialization with the main thread. Signed-off-by: zhanghailiang --- net/colo-compare.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index 54e6d40..a6bf419 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -83,6 +83,7 @@ typedef struct CompareState { GHashTable *connection_track_table; /* compare thread, a thread for each NIC */ QemuThread thread; +QemuSemaphore thread_ready; GMainContext *worker_context; GMainLoop *compare_loop; @@ -557,6 +558,8 @@ static void *colo_compare_thread(void *opaque) (GSourceFunc)check_old_packet_regular, s, NULL); g_source_attach(timeout_source, s->worker_context); +qemu_sem_post(&s->thread_ready); + g_main_loop_run(s->compare_loop); g_source_unref(timeout_source); @@ -707,12 +710,15 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) connection_key_equal, g_free, connection_destroy); +qemu_sem_init(&s->thread_ready, 0); sprintf(thread_name, "colo-compare %d", compare_id); qemu_thread_create(&s->thread, thread_name, colo_compare_thread, s, QEMU_THREAD_JOINABLE); compare_id++; +qemu_sem_wait(&s->thread_ready); +qemu_sem_destroy(&s->thread_ready); return; } .
Re: [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"
Hi, I think the below patch can fix your problem. [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH https://patchwork.kernel.org/patch/9591885/ Actually, we encountered the same problem in our tests; we fixed it with the following patch: From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001 From: zhanghailiang Date: Tue, 21 Mar 2017 09:44:36 +0800 Subject: [PATCH] migration: Re-activate blocks whenever migration is cancelled In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we tried to fix the bug 'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed', which occurred in the migration-cancelling process. But it seems we didn't cover all the cases; we caught a case in our tests that slipped past the old fixup: if libvirtd cancels the migration of a shutting-down VM, it sends the 'system_reset' command first and then the 'cont' command; after the VM resumes running, it triggers the above error report, because we didn't regain control of the VM's block devices.
Signed-off-by: zhanghailiang Signed-off-by: Hongyang Yang --- block.c | 12 +++- include/block/block.h | 1 + include/migration/migration.h | 3 --- migration/migration.c | 7 +-- qmp.c | 4 +--- 5 files changed, 14 insertions(+), 13 deletions(-) diff --git a/block.c b/block.c index 6e906ec..c2555b0 100644 --- a/block.c +++ b/block.c @@ -3875,6 +3875,13 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) } } +static bool is_inactivated; + +bool bdrv_is_inactivated(void) +{ +return is_inactivated; +} + void bdrv_invalidate_cache_all(Error **errp) { BlockDriverState *bs; @@ -3892,6 +3899,7 @@ void bdrv_invalidate_cache_all(Error **errp) return; } } +is_inactivated = false; } static int bdrv_inactivate_recurse(BlockDriverState *bs, @@ -3948,7 +3956,9 @@ out: for (bs = bdrv_first(); bs; bs = bdrv_next()) { aio_context_release(bdrv_get_aio_context(bs)); } - +if (!ret) { +is_inactivated = true; +} return ret; } diff --git a/include/block/block.h b/include/block/block.h index 5149260..f77b57f 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -365,6 +365,7 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf); void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp); void bdrv_invalidate_cache_all(Error **errp); int bdrv_inactivate_all(void); +bool bdrv_is_inactivated(void); /* Ensure contents are flushed to disk. 
*/ int bdrv_flush(BlockDriverState *bs); diff --git a/include/migration/migration.h b/include/migration/migration.h index 5720c88..a9a2071 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -183,9 +183,6 @@ struct MigrationState /* Flag set once the migration thread is running (and needs joining) */ bool migration_thread_running; -/* Flag set once the migration thread called bdrv_inactivate_all */ -bool block_inactive; - /* Queue of outstanding page requests from the destination */ QemuMutex src_page_req_mutex; QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; diff --git a/migration/migration.c b/migration/migration.c index 54060f7..7c3d593 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1015,14 +1015,12 @@ static void migrate_fd_cancel(MigrationState *s) if (s->state == MIGRATION_STATUS_CANCELLING && f) { qemu_file_shutdown(f); } -if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) { +if (bdrv_is_inactivated()) { Error *local_err = NULL; bdrv_invalidate_cache_all(_err); if (local_err) { error_report_err(local_err); -} else { -s->block_inactive = false; } } } @@ -1824,7 +1822,6 @@ static void migration_completion(MigrationState *s, int current_active_state, if (ret >= 0) { qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX); qemu_savevm_state_complete_precopy(s->to_dst_file, false); -s->block_inactive = true; } } qemu_mutex_unlock_iothread(); @@ -1879,8 +1876,6 @@ fail_invalidate: bdrv_invalidate_cache_all(_err);
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
On 2017/4/20 13:15, Jason Wang wrote: On 2017年04月18日 14:58, Hailiang Zhang wrote: On 2017/4/18 11:55, Jason Wang wrote: On 2017年04月17日 19:04, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 14:38, Jason Wang wrote: On 2017年04月14日 14:22, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Yes, that's kind of what I want. We don't want to use two message format. Passing a opened fd need management support, we still need a fallback if there's no management on top. For qemu/kvm, we can do all stuffs transparent to the cli by e.g socketpair() or others, but the key is to have a unified message format. 
After a deeper investigation, i think we can re-use most codes, since there is no existing way to notify xen (no ?), we still needs notify chardev socket (Be used to notify xen, it is optional.) (http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen notify chardev socket handler frame") Yes and actually you can use this for bi-directional communication. The only differences is the implementation of comparing. Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', I don't see this in master? Er, it has been merged already, please see migration/colo.c, void qmp_xen_colo_do_checkpoint(Error **errp); Aha, I see. Thanks. ;) we can re-use it to notify colo-compare objects and other filter objects to do checkpoint, for the opposite direction, we use the notify chardev socket (Only for xen). Just want to make sure I understand the design, who will trigger this command? Management? The command will be issued by XEN (xc_save ?), the original existing xen-colo-do-checkpoint command now only be used to notify block replication to do checkpoint, we can re-use it for filters too. So it was called by management. For KVM case, we probably don't need this since the comparing thread are under control of qemu. Yes, you are right. Can we just use the socket? I don't quite understand ... Just as the codes showed bellow, in this scenario, XEN notifies colo-compare and fiters do checkpoint by using qmp command, Yes, that's just what I mean. Technically Xen can use socket to do this too. Yes, great, since we have come to an agreement on the scenario, I'll update this series. Thanks, Hailiang. Thanks and colo-compare notifies XEN about net inconsistency event by using the new socket. 
So the codes will be like: diff --git a/migration/colo.c b/migration/colo.c index 91da936..813c281 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp) void qmp_xen_colo_do_checkpoint(Error **errp) { +Error *local_err = NULL; + replication_do_checkpoint_all(errp); +/* Notify colo-compare and other filters to do checkpoint */ +colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +colo_notify_filters_event(COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +} } static void colo_send_message(QEMUFile *f, COLOMessage msg, diff --git a/net/colo-compare.c b/net/colo-compare.c index 24e13f0..de975c5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void) { notifier_list_notify(_compare_notifiers, migrate_get_current()); KVM will use this notifier/callback way, and in this way, we can avoid the redundant socket. +if (s->notify_dev) { + /* Do something, notify remote side through notify dev */ +} } If we have a notify socket configured, we will send the message about net inconsistent eve
Re: [Qemu-devel] [PATCH 12/15] savevm: split the process of different stages for loadvm/savevm
On 2017/4/20 17:09, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: On 2017/4/8 1:18, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: There are several stages during loadvm/savevm process. In different stage, migration incoming processes different types of sections. We want to control these stages more accuracy, it will benefit COLO performance, we don't have to save type of QEMU_VM_SECTION_START sections everytime while do checkpoint, besides, we want to separate the process of saving/loading memory and devices state. So we add three new helper functions: qemu_loadvm_state_begin(), qemu_load_device_state() and qemu_savevm_live_state() to achieve different process during migration. Besides, we make qemu_loadvm_state_main() and qemu_save_device_state() public. Cc: Juan Quintela <quint...@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com> Reviewed-by: Dr. 
David Alan Gilbert <dgilb...@redhat.com> --- include/sysemu/sysemu.h | 6 ++ migration/savevm.c | 55 ++--- 2 files changed, 54 insertions(+), 7 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 7ed665a..95cae41 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -132,7 +132,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, uint64_t *start_list, uint64_t *length_list); +void qemu_savevm_live_state(QEMUFile *f); +int qemu_save_device_state(QEMUFile *f); + int qemu_loadvm_state(QEMUFile *f); +int qemu_loadvm_state_begin(QEMUFile *f); +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); +int qemu_load_device_state(QEMUFile *f); extern int autostart; diff --git a/migration/savevm.c b/migration/savevm.c index 9c2d239..dac478b 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -54,6 +54,7 @@ #include "qemu/cutils.h" #include "io/channel-buffer.h" #include "io/channel-file.h" +#include "migration/colo.h" #ifndef ETH_P_RARP #define ETH_P_RARP 0x8035 @@ -1279,13 +1280,21 @@ done: return ret; } -static int qemu_save_device_state(QEMUFile *f) +void qemu_savevm_live_state(QEMUFile *f) { -SaveStateEntry *se; +/* save QEMU_VM_SECTION_END section */ +qemu_savevm_state_complete_precopy(f, true); +qemu_put_byte(f, QEMU_VM_EOF); +} -qemu_put_be32(f, QEMU_VM_FILE_MAGIC); -qemu_put_be32(f, QEMU_VM_FILE_VERSION); +int qemu_save_device_state(QEMUFile *f) +{ +SaveStateEntry *se; +if (!migration_in_colo_state()) { +qemu_put_be32(f, QEMU_VM_FILE_MAGIC); +qemu_put_be32(f, QEMU_VM_FILE_VERSION); +} Note that got split out into qemu_savevm_state_header() at some point. Do you mean i should use the wrapper qemu_savevm_state_heade() here ? Yes, I think so; best to keep the code that writes the file headers in one place. OK, Will fix in next version, thanks. 
Dave Dave cpu_synchronize_all_states(); QTAILQ_FOREACH(se, _state.handlers, entry) { @@ -1336,8 +1345,6 @@ enum LoadVMExitCodes { LOADVM_QUIT = 1, }; -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); - /* -- incoming postcopy messages -- */ /* 'advise' arrives before any transfers just to tell us that a postcopy * *might* happen - it might be skipped if precopy transferred everything @@ -1942,7 +1949,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis) return 0; } -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) { uint8_t section_type; int ret = 0; @@ -2080,6 +2087,40 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +int qemu_loadvm_state_begin(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +Error *local_err = NULL; +int ret; + +if (qemu_savevm_state_blocked(_err)) { +error_report_err(local_err); +return -EINVAL; +} +/* Load QEMU_VM_SECTION_START section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to loadvm begin work: %d", ret); +} +return ret; +} + +int qemu_load_device_state(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +int ret; + +/* Load QEMU_VM_SECTION_FULL section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to load device state: %d", ret); +return ret; +} + +cpu_synchronize_all_post_init(); +return 0; +} + int save_vmstate(Monitor *mon,
Re: [Qemu-devel] [PATCH] COLO-compare: Add compare_lock aviod comparison conflict
On 2017/4/20 12:32, Zhang Chen wrote: When network traffic is heavy, compare_pri_rs_finalize() and compare_sec_rs_finalize() have a chance to conflict. Both of them call colo_compare_connection() to compare packets, but if a secondary packet arrives and calls compare_sec_rs_finalize() while a compare_pri_rs_finalize() comparison is in progress, that packet will be handled twice. If the packets are the same, the packet will be double-freed. Interesting; if I'm right, this should not happen, because all the comparison work is done in the colo-compare thread, so there is no chance of accessing conn_list concurrently. Besides, even if the packets from the primary and secondary arrive at the same time, the comparison should only be handled once; we handle it with the later-arriving one, no? Signed-off-by: Zhang Chen --- net/colo-compare.c | 8 1 file changed, 8 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index 54e6d40..686c1b4 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -79,6 +79,8 @@ typedef struct CompareState { * element type: Connection */ GQueue conn_list; +/* compare lock */ +QemuMutex compare_lock; /* hashtable to save connection */ GHashTable *connection_track_table; /* compare thread, a thread for each NIC */ @@ -619,7 +621,9 @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs) compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len); } else { /* compare connection */ +qemu_mutex_lock(&s->compare_lock); g_queue_foreach(&s->conn_list, colo_compare_connection, s); +qemu_mutex_unlock(&s->compare_lock); } } @@ -631,7 +635,9 @@ static void compare_sec_rs_finalize(SocketReadState *sec_rs) trace_colo_compare_main("secondary: unsupported packet in"); } else { /* compare connection */ +qemu_mutex_lock(&s->compare_lock); g_queue_foreach(&s->conn_list, colo_compare_connection, s); +qemu_mutex_unlock(&s->compare_lock); } } @@ -702,6 +708,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
g_queue_init(&s->conn_list); +qemu_mutex_init(&s->compare_lock); s->connection_track_table = g_hash_table_new_full(connection_key_hash, connection_key_equal, @@ -771,6 +778,7 @@ static void colo_compare_finalize(Object *obj) g_queue_foreach(&s->conn_list, colo_flush_packets, s); g_queue_clear(&s->conn_list); +qemu_mutex_destroy(&s->compare_lock); g_hash_table_destroy(s->connection_track_table); g_free(s->pri_indev);
Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
On 2017/4/19 10:02, Xiao Guangrong wrote: On 04/13/2017 05:38 PM, Hailiang Zhang wrote: On 2017/4/13 17:35, Xiao Guangrong wrote: On 04/13/2017 05:29 PM, Hailiang Zhang wrote: On 2017/4/13 17:18, Xiao Guangrong wrote: On 04/13/2017 05:05 PM, Zhanghailiang wrote: Hi, -邮件原件- 发件人: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] 代表 Xiao Guangrong 发送时间: 2017年4月13日 16:53 收件人: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com 抄送: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao Guangrong 主题: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster On 04/13/2017 04:39 PM, Xiao Guangrong wrote: On 04/13/2017 02:37 PM, Paolo Bonzini wrote: On 12/04/2017 17:51, guangrong.x...@gmail.com wrote: The root cause is that the clock will be lost if the periodic period is changed as currently code counts the next periodic time like this: next_irq_clock = (cur_clock & ~(period - 1)) + period; consider the case if cur_clock = 0x11FF and period = 0x100, then the next_irq_clock is 0x1200, however, there is only 1 clock left to trigger the next irq. Unfortunately, Windows guests (at least Windows7) change the period very frequently if it runs the attached code, so that the lost clock is accumulated, the wall-time become faster and faster Very interesting. Yes, indeed. However, I think that the above should be exactly how the RTC should work. The original RTC circuit had 22 divider stages (see page 13 of the datasheet[1], at the bottom right), and the periodic interrupt taps the rising edge of one of the dividers (page 16, second paragraph). The datasheet also never mentions a comparator being used to trigger the periodic interrupts. That was my thought before, however, after more test, i am not sure if re-configuring RegA changes these divider stages internal... Have you checked that this Windows bug doesn't happen on real hardware too? Or is the combination of driftfix=slew and changing periods that is a problem? 
I have two physical windows 7 machines, both of them have 'useplatformclock = off' and ntp disabled, the wall time is really accurate. The difference is that the physical machines are using Intel Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the issue is easily be reproduced just in ~10 mins. Our test mostly focus on 'driftfix=slew' and after this patchset the time is accurate and stable. I will do the test for dropping 'slew' and see what will happen... Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :( You mean, it only fixes the one case which with the ' driftfix=slew ' is used ? No. for both. We encountered this problem too, I have tried to fix it long time ago. https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html. (It seems that your solution is more useful) But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware, but we didn't find any clear description about it. And it seems that other virtualization platforms That is the issue, the hardware spec does not detail how the clock is counted when the timer interval is changed. What we can do at this time is that speculate it from the behaviors. Current RTC is completely unusable anyway. have this problem too: VMware: https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf Heper-v: https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/ Hmm, slower clock is understandable, does really the Windows7 on hyperV have faster clock? Did you meet it? I don't know, we didn't test it, besides, I'd like to know how long did your testcase run before you judge it is stable with 'driftfix=slew' option? (My previous patch can't fix it completely but only narrows the gap between timer in guest and real timer.) More than 12 hours. Great, I'll test and look into it ... thanks. Hi Hailiang, Does this patchset work for you? :) Yes, i think it works for us, nice work :) .
Re: [Qemu-devel] [PATCH v2] char: Fix removing wrong GSource that be found by fd_in_tag
On 2017/4/18 16:11, Marc-André Lureau wrote: Hi - Original Message - We use fd_in_tag to find a GSource, fd_in_tag is return value of g_source_attach(GSource *source, GMainContext *context), the return value is unique only in the same context, so we may get the same values with different 'context' parameters. It is no problem to find the right fd_in_tag by using g_main_context_find_source_by_id(GMainContext *context, guint source_id) while there is only one default main context. But colo-compare tries to create/use its own context, and if we pass wrong 'context' parameter with right fd_in_tag, we will find a wrong GSource to handle. We tried to fix the related codes in commit b43dec, but it didn't fix the bug completely, because we still have some codes didn't pass *right* context parameter for remove_fd_in_watch(). Let's fix it by record the GSource directly instead of fd_in_tag. You could simply name it "gsource" or "source" instead of "chr_gsource" to avoid redundancy in chr->chr_gsource. OK, besides, i forgot to update the hash, I'll fix both of them in V3, thanks. 
Signed-off-by: zhanghailianglooks good to me: Reviewed-by: Marc-André Lureau --- v2: - Fix minor comments from Marc-André Lureau --- chardev/char-fd.c | 8 chardev/char-io.c | 23 --- chardev/char-io.h | 4 ++-- chardev/char-pty.c| 6 +++--- chardev/char-socket.c | 8 chardev/char-udp.c| 8 chardev/char.c| 2 +- include/sysemu/char.h | 2 +- 8 files changed, 27 insertions(+), 34 deletions(-) diff --git a/chardev/char-fd.c b/chardev/char-fd.c index 548dd4c..7f0169d 100644 --- a/chardev/char-fd.c +++ b/chardev/char-fd.c @@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque) ret = qio_channel_read( chan, (gchar *)buf, len, NULL); if (ret == 0) { -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); qemu_chr_be_event(chr, CHR_EVENT_CLOSED); return FALSE; } @@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr, { FDChardev *s = FD_CHARDEV(chr); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { -chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in, +chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in, fd_chr_read_poll, fd_chr_read, chr, context); @@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj) Chardev *chr = CHARDEV(obj); FDChardev *s = FD_CHARDEV(obj); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { object_unref(OBJECT(s->ioc_in)); } diff --git a/chardev/char-io.c b/chardev/char-io.c index b4bb094..d781ad6 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = { .finalize = io_watch_poll_finalize, }; -guint io_add_watch_poll(Chardev *chr, +GSource *io_add_watch_poll(Chardev *chr, QIOChannel *ioc, IOCanReadHandler *fd_can_read, QIOChannelFunc fd_read, @@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr, GMainContext *context) { IOWatchPoll *iwp; -int tag; char *name; iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs, @@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr, 
g_source_set_name((GSource *)iwp, name); g_free(name); -tag = g_source_attach(>parent, context); +g_source_attach(>parent, context); g_source_unref(>parent); -return tag; +return (GSource *)iwp; } -static void io_remove_watch_poll(guint tag, GMainContext *context) +static void io_remove_watch_poll(GSource *source) { -GSource *source; IOWatchPoll *iwp; -g_return_if_fail(tag > 0); - -source = g_main_context_find_source_by_id(context, tag); -g_return_if_fail(source != NULL); - iwp = io_watch_poll_from_source(source); if (iwp->src) { g_source_destroy(iwp->src); @@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext *context) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr, GMainContext *context) +void remove_fd_in_watch(Chardev *chr) { -if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag, context); -chr->fd_in_tag = 0; +if (chr->chr_gsource) { +io_remove_watch_poll(chr->chr_gsource); +chr->chr_gsource = NULL; } } diff --git a/chardev/char-io.h b/chardev/char-io.h index 842be56..55973a7 100644 --- a/chardev/char-io.h +++ b/chardev/char-io.h @@ -29,14 +29,14 @@ #include
Re: [Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag
On 2017/4/18 21:36, Eric Blake wrote:
On 04/14/2017 05:10 AM, Marc-André Lureau wrote:
Hi

- Original Message -
We use fd_in_tag to find a GSource; fd_in_tag is the return value of g_source_attach(GSource *source, GMainContext *context), and that value is unique only within the same context, so we may get the same values with different 'context' parameters. It is no problem to find the right fd_in_tag by using g_main_context_find_source_by_id(GMainContext *context, guint source_id) while there is only one default main context. But colo-compare tries to create/use its own context, and if we pass the wrong 'context' parameter with the right fd_in_tag, we will find a wrong GSource to handle. We tied to fix the related codes in commit b43dec, but it didn't

tied->tried

Please use a bit longer commit sha1, or the full sha1; it will likely conflict otherwise in the future.

6 chars is indeed short; 7 is git's default and usually long enough, although I've encountered collisions that require 8 chars. [And Google has proved that you can have a collision across the entire hash, although that is harder to generate.] I generally use 8 or so when writing commit messages. Fortunately, even if a collision is introduced later, someone motivated enough can still resolve it by filtering out any candidates that resolve to non-commits, and among the remaining colliding SHA1s focusing on the one whose commit date predates the message with the reference.

Thanks for your explanation. I didn't notice that before; I have always been using the 6-char hash to refer to the commit...

Agreed, but I'd better update the comment to make it clearer. Thanks.
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
On 2017/4/18 11:55, Jason Wang wrote: On 2017年04月17日 19:04, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 14:38, Jason Wang wrote: On 2017年04月14日 14:22, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Yes, that's kind of what I want. We don't want to use two message format. Passing a opened fd need management support, we still need a fallback if there's no management on top. For qemu/kvm, we can do all stuffs transparent to the cli by e.g socketpair() or others, but the key is to have a unified message format. After a deeper investigation, i think we can re-use most codes, since there is no existing way to notify xen (no ?), we still needs notify chardev socket (Be used to notify xen, it is optional.) 
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen notify chardev socket handler frame") Yes and actually you can use this for bi-directional communication. The only differences is the implementation of comparing. Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', I don't see this in master? Er, it has been merged already, please see migration/colo.c, void qmp_xen_colo_do_checkpoint(Error **errp); we can re-use it to notify colo-compare objects and other filter objects to do checkpoint, for the opposite direction, we use the notify chardev socket (Only for xen). Just want to make sure I understand the design, who will trigger this command? Management? The command will be issued by XEN (xc_save ?), the original existing xen-colo-do-checkpoint command now only be used to notify block replication to do checkpoint, we can re-use it for filters too. Can we just use the socket? I don't quite understand ... Just as the codes showed bellow, in this scenario, XEN notifies colo-compare and fiters do checkpoint by using qmp command, and colo-compare notifies XEN about net inconsistency event by using the new socket. 
So the codes will be like: diff --git a/migration/colo.c b/migration/colo.c index 91da936..813c281 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp) void qmp_xen_colo_do_checkpoint(Error **errp) { +Error *local_err = NULL; + replication_do_checkpoint_all(errp); +/* Notify colo-compare and other filters to do checkpoint */ +colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +colo_notify_filters_event(COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +} } static void colo_send_message(QEMUFile *f, COLOMessage msg, diff --git a/net/colo-compare.c b/net/colo-compare.c index 24e13f0..de975c5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void) { notifier_list_notify(_compare_notifiers, migrate_get_current()); KVM will use this notifier/callback way, and in this way, we can avoid the redundant socket. +if (s->notify_dev) { + /* Do something, notify remote side through notify dev */ +} } If we have a notify socket configured, we will send the message about net inconsistent event. void colo_compare_register_notifier(Notifier *notify) How about this scenario ? See my reply above, and we need unify the message format too. Raw string is ok but we'd better have something like TLV or others. Agreed, we need it to be more standard. Thanks, Hailiang Thanks Thoughts? Thanks Thanks . . .
Re: [Qemu-devel] [PULL 2/8] replication: clarify permissions
On 2017/4/18 9:23, Eric Blake wrote:
On 03/17/2017 08:15 AM, Kevin Wolf wrote:
From: Changlong Xie

Even if hidden_disk and secondary_disk are backing files, they both need write permissions in the replication scenario. Otherwise we will encounter the exception below on the secondary side while adding the NBD server:

{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } }
{"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}}

CC: Zhang Hailiang
CC: Zhang Chen
CC: Wen Congyang

This address for Wen Congyang is different from the one listed in MAINTAINERS for replication (M: Wen Congyang), and different still from addresses my mailer has harvested from other posts (wencongy...@gmail.com). The MAINTAINERS entry is now resulting in 'undelivered mail' bounce messages; can you please submit an update to MAINTAINERS with your new preferred address? [Or gently correct me if I'm confusing two people with the same name?]

No, they are the same person; he just left his job at Fujitsu, so the entry in the MAINTAINERS file needs to be updated.

Cc: Changlong Xie
Hi Changlong, would you please send a patch to update it?

Hailiang
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
Hi Jason, On 2017/4/14 14:38, Jason Wang wrote: On 2017年04月14日 14:22, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Yes, that's kind of what I want. We don't want to use two message format. Passing a opened fd need management support, we still need a fallback if there's no management on top. For qemu/kvm, we can do all stuffs transparent to the cli by e.g socketpair() or others, but the key is to have a unified message format. After a deeper investigation, i think we can re-use most codes, since there is no existing way to notify xen (no ?), we still needs notify chardev socket (Be used to notify xen, it is optional.) 
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen notify chardev socket handler frame") Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', we can re-use it to notify colo-compare objects and other filter objects to do checkpoint, for the opposite direction, we use the notify chardev socket (Only for xen). So the codes will be like: diff --git a/migration/colo.c b/migration/colo.c index 91da936..813c281 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp) void qmp_xen_colo_do_checkpoint(Error **errp) { +Error *local_err = NULL; + replication_do_checkpoint_all(errp); +/* Notify colo-compare and other filters to do checkpoint */ +colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +colo_notify_filters_event(COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +} } static void colo_send_message(QEMUFile *f, COLOMessage msg, diff --git a/net/colo-compare.c b/net/colo-compare.c index 24e13f0..de975c5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void) { notifier_list_notify(_compare_notifiers, migrate_get_current()); +if (s->notify_dev) { + /* Do something, notify remote side through notify dev */ +} } void colo_compare_register_notifier(Notifier *notify) How about this scenario ? Thoughts? Thanks Thanks . .
Re: [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options
On 2017/4/12 22:28, Eric Blake wrote: On 04/12/2017 09:05 AM, zhanghailiang wrote: We use these two options to identify which disk is shared Signed-off-by: zhanghailiangSigned-off-by: Wen Congyang Signed-off-by: Zhang Chen --- v4: - Add proper comment for primary_disk (Stefan) v2: - Move g_free(s->shared_disk_id) to the common fail process place (Stefan) - Fix comments for these two options --- +++ b/qapi/block-core.json @@ -2661,12 +2661,20 @@ # node who owns the replication node chain. Must not be given in # primary mode. # +# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk +# is true, this option is required (Since: 2.10) +# +# @shared-disk: To indicate whether or not a disk is shared by primary VM +# and secondary VM. (The default is false) (Since: 2.10) +# # Since: 2.9 ## { 'struct': 'BlockdevOptionsReplication', 'base': 'BlockdevOptionsGenericFormat', 'data': { 'mode': 'ReplicationMode', -'*top-id': 'str' } } +'*top-id': 'str', +'*shared-disk-id': 'str', +'*shared-disk': 'bool' } } Do we need a separate bool and string? Or is it sufficient to say that if shared-disk is omitted, we are not sharing, and that if a shared-disk string is present, then we are sharing and it names the id of the shared disk. Er, Yes, We need both of them, the command line of secondary sides is like: -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\ file.driver=qcow2,top-id=active-disk0,\ file.file.filename=/mnt/ramfs/active_disk.img,\ file.backing=hidden_disk0,shared-disk=on We only need the bool shared-disk to indicate whether disk is sharing or not, but for primary side, we need to the blockdev-add command to tell primary which disk is shared. 
{ 'execute': 'blockdev-add',
  'arguments': {
      'driver': 'replication',
      'node-name': 'rep',
      'mode': 'primary',
      'shared-disk-id': 'primary_disk0',
      'shared-disk': true,
      'file': {
          'driver': 'nbd',
          'export': 'hidden_disk0',
          'server': {
              'type': 'inet',
              'data': { 'host': 'xxx.xxx.xxx.xxx', 'port': 'yyy' }
          }
      }
  }
}
Re: [Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag
Hi, On 2017/4/14 18:10, Marc-André Lureau wrote: Hi - Original Message - We use fd_in_tag to find a GSource, fd_in_tag is return value of g_source_attach(GSource *source, GMainContext *context), the return value is unique only in the same context, so we may get the same values with different 'context' parameters. It is no problem to find the right fd_in_tag by using g_main_context_find_source_by_id(GMainContext *context, guint source_id) while there is only one default main context. But colo-compare tries to create/use its own context, and if we pass wrong 'context' parameter with right fd_in_tag, we will find a wrong GSource to handle. We tied to fix the related codes in commit b43dec, but it didn't tied->tried Please use a bit longer commit sha1, or full sha1, it will likely conflict otherwise in the future. OK, I'll replace it with the full sha1. fix the bug completely, because we still have some codes didn't pass *right* context parameter for remove_fd_in_watch(). Mixing contexts sounds like a colo-compare bug, did you fix it there too? Yes, we don't have to change any other codes in colo-compare, We just call qemu_chr_fe_set_handlers() in colo-compare to detach the old GSource from main default context, and attach the new GSource to the new context. Let's fix it by record the GSource directly instead of fd_in_tag. Make sense to me, and even simplify a bit the code. We should be more careful with pointers though (the reason why tag existed in the first place I imagine). Agreed. 
Signed-off-by: zhanghailiang--- chardev/char-fd.c | 8 chardev/char-io.c | 23 --- chardev/char-io.h | 4 ++-- chardev/char-pty.c| 6 +++--- chardev/char-socket.c | 8 chardev/char-udp.c| 8 chardev/char.c| 2 +- include/sysemu/char.h | 2 +- 8 files changed, 27 insertions(+), 34 deletions(-) diff --git a/chardev/char-fd.c b/chardev/char-fd.c index 548dd4c..7f0169d 100644 --- a/chardev/char-fd.c +++ b/chardev/char-fd.c @@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque) ret = qio_channel_read( chan, (gchar *)buf, len, NULL); if (ret == 0) { -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); qemu_chr_be_event(chr, CHR_EVENT_CLOSED); return FALSE; } @@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr, { FDChardev *s = FD_CHARDEV(chr); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { -chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in, +chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in, fd_chr_read_poll, fd_chr_read, chr, context); @@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj) Chardev *chr = CHARDEV(obj); FDChardev *s = FD_CHARDEV(obj); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { object_unref(OBJECT(s->ioc_in)); } diff --git a/chardev/char-io.c b/chardev/char-io.c index b4bb094..6deb193 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = { .finalize = io_watch_poll_finalize, }; -guint io_add_watch_poll(Chardev *chr, +GSource *io_add_watch_poll(Chardev *chr, QIOChannel *ioc, IOCanReadHandler *fd_can_read, QIOChannelFunc fd_read, @@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr, GMainContext *context) { IOWatchPoll *iwp; -int tag; char *name; iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs, @@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr, g_source_set_name((GSource *)iwp, name); g_free(name); -tag = g_source_attach(>parent, context); 
+g_source_attach(>parent, context); g_source_unref(>parent); -return tag; +return (GSource *)iwp; } -static void io_remove_watch_poll(guint tag, GMainContext *context) +static void io_remove_watch_poll(GSource *source) { -GSource *source; IOWatchPoll *iwp; -g_return_if_fail(tag > 0); - -source = g_main_context_find_source_by_id(context, tag); -g_return_if_fail(source != NULL); - iwp = io_watch_poll_from_source(source); if (iwp->src) { g_source_destroy(iwp->src); @@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext *context) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr, GMainContext *context) +void remove_fd_in_watch(Chardev *chr) { -if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag, context); -chr->fd_in_tag = 0;
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Thanks .
Re: [Qemu-devel] 答复: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
On 2017/4/13 17:35, Xiao Guangrong wrote: On 04/13/2017 05:29 PM, Hailiang Zhang wrote: On 2017/4/13 17:18, Xiao Guangrong wrote: On 04/13/2017 05:05 PM, Zhanghailiang wrote: Hi, -邮件原件- 发件人: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] 代表 Xiao Guangrong 发送时间: 2017年4月13日 16:53 收件人: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com 抄送: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao Guangrong 主题: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster On 04/13/2017 04:39 PM, Xiao Guangrong wrote: On 04/13/2017 02:37 PM, Paolo Bonzini wrote: On 12/04/2017 17:51, guangrong.x...@gmail.com wrote: The root cause is that the clock will be lost if the periodic period is changed as currently code counts the next periodic time like this: next_irq_clock = (cur_clock & ~(period - 1)) + period; consider the case if cur_clock = 0x11FF and period = 0x100, then the next_irq_clock is 0x1200, however, there is only 1 clock left to trigger the next irq. Unfortunately, Windows guests (at least Windows7) change the period very frequently if it runs the attached code, so that the lost clock is accumulated, the wall-time become faster and faster Very interesting. Yes, indeed. However, I think that the above should be exactly how the RTC should work. The original RTC circuit had 22 divider stages (see page 13 of the datasheet[1], at the bottom right), and the periodic interrupt taps the rising edge of one of the dividers (page 16, second paragraph). The datasheet also never mentions a comparator being used to trigger the periodic interrupts. That was my thought before, however, after more test, i am not sure if re-configuring RegA changes these divider stages internal... Have you checked that this Windows bug doesn't happen on real hardware too? Or is the combination of driftfix=slew and changing periods that is a problem? 
I have two physical windows 7 machines, both of them have 'useplatformclock = off' and ntp disabled, the wall time is really accurate. The difference is that the physical machines are using Intel Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the issue is easily be reproduced just in ~10 mins. Our test mostly focus on 'driftfix=slew' and after this patchset the time is accurate and stable. I will do the test for dropping 'slew' and see what will happen... Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :( You mean, it only fixes the one case which with the ' driftfix=slew ' is used ? No. for both. We encountered this problem too, I have tried to fix it long time ago. https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html. (It seems that your solution is more useful) But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware, but we didn't find any clear description about it. And it seems that other virtualization platforms That is the issue, the hardware spec does not detail how the clock is counted when the timer interval is changed. What we can do at this time is that speculate it from the behaviors. Current RTC is completely unusable anyway. have this problem too: VMware: https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf Heper-v: https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/ Hmm, slower clock is understandable, does really the Windows7 on hyperV have faster clock? Did you meet it? I don't know, we didn't test it, besides, I'd like to know how long did your testcase run before you judge it is stable with 'driftfix=slew' option? (My previous patch can't fix it completely but only narrows the gap between timer in guest and real timer.) More than 12 hours. Great, I'll test and look into it ... thanks. .
Re: [Qemu-devel] 答复: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
On 2017/4/13 17:18, Xiao Guangrong wrote: On 04/13/2017 05:05 PM, Zhanghailiang wrote: Hi, -邮件原件- 发件人: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] 代表 Xiao Guangrong 发送时间: 2017年4月13日 16:53 收件人: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com 抄送: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao Guangrong 主题: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster On 04/13/2017 04:39 PM, Xiao Guangrong wrote: On 04/13/2017 02:37 PM, Paolo Bonzini wrote: On 12/04/2017 17:51, guangrong.x...@gmail.com wrote: The root cause is that the clock will be lost if the periodic period is changed as currently code counts the next periodic time like this: next_irq_clock = (cur_clock & ~(period - 1)) + period; consider the case if cur_clock = 0x11FF and period = 0x100, then the next_irq_clock is 0x1200, however, there is only 1 clock left to trigger the next irq. Unfortunately, Windows guests (at least Windows7) change the period very frequently if it runs the attached code, so that the lost clock is accumulated, the wall-time become faster and faster Very interesting. Yes, indeed. However, I think that the above should be exactly how the RTC should work. The original RTC circuit had 22 divider stages (see page 13 of the datasheet[1], at the bottom right), and the periodic interrupt taps the rising edge of one of the dividers (page 16, second paragraph). The datasheet also never mentions a comparator being used to trigger the periodic interrupts. That was my thought before, however, after more test, i am not sure if re-configuring RegA changes these divider stages internal... Have you checked that this Windows bug doesn't happen on real hardware too? Or is the combination of driftfix=slew and changing periods that is a problem? I have two physical windows 7 machines, both of them have 'useplatformclock = off' and ntp disabled, the wall time is really accurate. 
The difference is that the physical machines are using Intel Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the issue is easily be reproduced just in ~10 mins. Our test mostly focus on 'driftfix=slew' and after this patchset the time is accurate and stable. I will do the test for dropping 'slew' and see what will happen... Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :( You mean, it only fixes the one case which with the ' driftfix=slew ' is used ? No. for both. We encountered this problem too, I have tried to fix it long time ago. https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html. (It seems that your solution is more useful) But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware, but we didn't find any clear description about it. And it seems that other virtualization platforms That is the issue, the hardware spec does not detail how the clock is counted when the timer interval is changed. What we can do at this time is that speculate it from the behaviors. Current RTC is completely unusable anyway. have this problem too: VMware: https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf Heper-v: https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/ Hmm, slower clock is understandable, does really the Windows7 on hyperV have faster clock? Did you meet it? I don't know, we didn't test it, besides, I'd like to know how long did your testcase run before you judge it is stable with 'driftfix=slew' option? (My previous patch can't fix it completely but only narrows the gap between timer in guest and real timer.) Hailiang .
Re: [Qemu-devel] [PATCH 12/15] savevm: split the process of different stages for loadvm/savevm
On 2017/4/8 1:18, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: There are several stages during loadvm/savevm process. In different stage, migration incoming processes different types of sections. We want to control these stages more accuracy, it will benefit COLO performance, we don't have to save type of QEMU_VM_SECTION_START sections everytime while do checkpoint, besides, we want to separate the process of saving/loading memory and devices state. So we add three new helper functions: qemu_loadvm_state_begin(), qemu_load_device_state() and qemu_savevm_live_state() to achieve different process during migration. Besides, we make qemu_loadvm_state_main() and qemu_save_device_state() public. Cc: Juan QuintelaSigned-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert --- include/sysemu/sysemu.h | 6 ++ migration/savevm.c | 55 ++--- 2 files changed, 54 insertions(+), 7 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 7ed665a..95cae41 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -132,7 +132,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, uint64_t *start_list, uint64_t *length_list); +void qemu_savevm_live_state(QEMUFile *f); +int qemu_save_device_state(QEMUFile *f); + int qemu_loadvm_state(QEMUFile *f); +int qemu_loadvm_state_begin(QEMUFile *f); +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); +int qemu_load_device_state(QEMUFile *f); extern int autostart; diff --git a/migration/savevm.c b/migration/savevm.c index 9c2d239..dac478b 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -54,6 +54,7 @@ #include "qemu/cutils.h" #include "io/channel-buffer.h" #include "io/channel-file.h" +#include "migration/colo.h" #ifndef ETH_P_RARP #define ETH_P_RARP 0x8035 @@ -1279,13 +1280,21 @@ done: return ret; } -static int qemu_save_device_state(QEMUFile *f) +void qemu_savevm_live_state(QEMUFile 
*f) { -SaveStateEntry *se; +/* save QEMU_VM_SECTION_END section */ +qemu_savevm_state_complete_precopy(f, true); +qemu_put_byte(f, QEMU_VM_EOF); +} -qemu_put_be32(f, QEMU_VM_FILE_MAGIC); -qemu_put_be32(f, QEMU_VM_FILE_VERSION); +int qemu_save_device_state(QEMUFile *f) +{ +SaveStateEntry *se; +if (!migration_in_colo_state()) { +qemu_put_be32(f, QEMU_VM_FILE_MAGIC); +qemu_put_be32(f, QEMU_VM_FILE_VERSION); +} Note that got split out into qemu_savevm_state_header() at some point. Do you mean I should use the wrapper qemu_savevm_state_header() here? Dave cpu_synchronize_all_states(); QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { @@ -1336,8 +1345,6 @@ enum LoadVMExitCodes { LOADVM_QUIT = 1, }; -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); - /* -- incoming postcopy messages -- */ /* 'advise' arrives before any transfers just to tell us that a postcopy * *might* happen - it might be skipped if precopy transferred everything @@ -1942,7 +1949,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis) return 0; } -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) { uint8_t section_type; int ret = 0; @@ -2080,6 +2087,40 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +int qemu_loadvm_state_begin(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +Error *local_err = NULL; +int ret; + +if (qemu_savevm_state_blocked(&local_err)) { +error_report_err(local_err); +return -EINVAL; +} +/* Load QEMU_VM_SECTION_START section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to loadvm begin work: %d", ret); +} +return ret; +} + +int qemu_load_device_state(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +int ret; + +/* Load QEMU_VM_SECTION_FULL section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to load device state: %d",
ret); +return ret; +} + +cpu_synchronize_all_post_init(); +return 0; +} + int save_vmstate(Monitor *mon, const char *name) { BlockDriverState *bs, *bs1; -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH v2 0/5] Disable hotplug during migration
On 2017/4/6 21:13, Juan Quintela wrote: Hi This updates the patches with all the comments received. I moved qdev_unplug() to make linux-user compile. Please review. [RFC - v1] This series disables hotplug/unplug during migration. Thanks to Markus for explaining where I had to put the checks. Why? Because during migration we will fail if there are changes. For instance, in postcopy, if we add a memory region, we would fail. Same for other devices if they are not set up exactly the same on the destination. The idea would be to disable it, and then enable it for the things that we know work. This series is on top of my previous RAMState v2 series. Comments, please? Makes sense, this will benefit COLO too :) After the typos found by Eric are fixed in patch 5, Reviewed-by: zhanghailiang Thanks, Juan. *** BLURB HERE *** Juan Quintela (5): qdev: qdev_hotplug is really a bool qdev: Export qdev_hot_removed qdev: Move qdev_unplug() to qdev-monitor.c migration: Disable hotplug/unplug during migration ram: Remove migration_bitmap_extend() exec.c | 1 - hw/core/qdev.c | 40 +++- include/exec/ram_addr.h | 2 -- include/hw/qdev-core.h | 3 ++- migration/ram.c | 34 -- qdev-monitor.c | 45 + 6 files changed, 50 insertions(+), 75 deletions(-)
Re: [Qemu-devel] [PATCH 07/15] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
On 2017/4/8 1:06, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: We should not load the PVM's state directly into the SVM, because errors may happen while the SVM is receiving data, which would break the SVM. We need to ensure we have received all the data before loading the state into the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on the secondary side is initially the same as the SVM/PVM's memory. During each checkpoint, we first cache the PVM's dirty pages in this RAM cache, so the RAM cache is always the same as the PVM's memory at every checkpoint; then we flush this cached RAM to the SVM after we receive all of the PVM's state. You're probably going to find this interesting to merge with Juan's recent RAM block series. Probably not too hard, but he's touching a lot of the same code and rearranging things. Yes, I'll update this series on top of his series; better to send the next version after his series has been merged. Thanks, Hailiang Dave Cc: Juan Quintela Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr.
David Alan Gilbert --- include/exec/ram_addr.h | 1 + include/migration/migration.h | 4 +++ migration/colo.c | 14 + migration/ram.c | 73 ++- 4 files changed, 91 insertions(+), 1 deletion(-) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 3e79466..44e1190 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -27,6 +27,7 @@ struct RAMBlock { struct rcu_head rcu; struct MemoryRegion *mr; uint8_t *host; +uint8_t *colo_cache; /* For colo, VM's ram cache */ ram_addr_t offset; ram_addr_t used_length; ram_addr_t max_length; diff --git a/include/migration/migration.h b/include/migration/migration.h index 1735d66..93c6148 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -379,4 +379,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname, PostcopyState postcopy_state_get(void); /* Set the state and return the old state */ PostcopyState postcopy_state_set(PostcopyState new_state); + +/* ram cache */ +int colo_init_ram_cache(void); +void colo_release_ram_cache(void); #endif diff --git a/migration/colo.c b/migration/colo.c index 1e3e975..edb7f00 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -551,6 +551,7 @@ void *colo_process_incoming_thread(void *opaque) uint64_t total_size; uint64_t value; Error *local_err = NULL; +int ret; qemu_sem_init(>colo_incoming_sem, 0); @@ -572,6 +573,12 @@ void *colo_process_incoming_thread(void *opaque) */ qemu_file_set_blocking(mis->from_src_file, true); +ret = colo_init_ram_cache(); +if (ret < 0) { +error_report("Failed to initialize ram cache"); +goto out; +} + bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE); fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc)); object_unref(OBJECT(bioc)); @@ -705,11 +712,18 @@ out: if (fb) { qemu_fclose(fb); } +/* + * We can ensure BH is hold the global lock, and will join COLO + * incoming thread, so here it is not necessary to lock here again, + * Or there will be a deadlock error. 
+ */ +colo_release_ram_cache(); /* Hope this not to be too long to loop here */ qemu_sem_wait(&mis->colo_incoming_sem); qemu_sem_destroy(&mis->colo_incoming_sem); /* Must be called after failover BH is completed */ + if (mis->to_src_file) { qemu_fclose(mis->to_src_file); } diff --git a/migration/ram.c b/migration/ram.c index f289fcd..b588990 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -219,6 +219,7 @@ static RAMBlock *last_sent_block; static ram_addr_t last_offset; static QemuMutex migration_bitmap_mutex; static uint64_t migration_dirty_pages; +static bool ram_cache_enable; static uint32_t last_version; static bool ram_bulk_stage; @@ -2227,6 +2228,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, return block->host + offset; } +static inline void *colo_cache_from_block_offset(RAMBlock *block, + ram_addr_t offset) +{ +if (!offset_in_ramblock(block, offset)) { +return NULL; +} +if (!block->colo_cache) { +error_report("%s: colo_cache is NULL in block :%s", + __func__, block->idstr); +return NULL; +} +return block->colo_cache + offset; +} + /* * If a page (or a whole RDMA chunk) has been * determined to be zero, then zap it. @@ -2542,7 +2557,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) { RAMBlock *block
Re: [Qemu-devel] [PATCH 01/15] net/colo: Add notifier/callback related helpers for filter
On 2017/4/7 23:46, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: We will use this notifier to help COLO to notify filter object to do something, like do checkpoint, or process failover event. Cc: Jason Wang Signed-off-by: zhanghailiang --- net/colo.c | 92 ++ net/colo.h | 18 2 files changed, 110 insertions(+) <..> +FilterNotifier *filter_noitifier_new(FilterNotifierCallback *cb, ^ Typo - no*i*tifier Good catch, will fix it in next version. (I've not looked at this patch much, I'll leave networking stuff to Jason) OK, thanks. Dave +void *opaque, Error **errp) +{ +FilterNotifier *notify; +int ret; + +notify = (FilterNotifier *)g_source_new(_source_funcs, +sizeof(FilterNotifier)); +ret = event_notifier_init(&notify->event, false); +if (ret < 0) { +error_setg_errno(errp, -ret, "Failed to initialize event notifier"); +goto fail; +} +notify->pfd.fd = event_notifier_get_fd(&notify->event); +notify->pfd.events = G_IO_IN | G_IO_HUP | G_IO_ERR; +notify->cb = cb; +notify->opaque = opaque; +g_source_add_poll(&notify->source, &notify->pfd); + +return notify; + +fail: +g_source_destroy(&notify->source); +return NULL; +} + +int filter_notifier_set(FilterNotifier *notify, uint64_t value) +{ +ssize_t ret; + +do { +ret = write(notify->event.wfd, &value, sizeof(value)); +} while (ret < 0 && errno == EINTR); + +/* EAGAIN is fine, a read must be pending.
*/ +if (ret < 0 && errno != EAGAIN) { +return -errno; +} +return 0; +} diff --git a/net/colo.h b/net/colo.h index cd9027f..00f03b5 100644 --- a/net/colo.h +++ b/net/colo.h @@ -19,6 +19,7 @@ #include "qemu/jhash.h" #include "qemu/timer.h" #include "slirp/tcp.h" +#include "qemu/event_notifier.h" #define HASHTABLE_MAX_SIZE 16384 @@ -89,4 +90,21 @@ void connection_hashtable_reset(GHashTable *connection_track_table); Packet *packet_new(const void *data, int size); void packet_destroy(void *opaque, void *user_data); +typedef void FilterNotifierCallback(void *opaque, int value); +typedef struct FilterNotifier { +GSource source; +EventNotifier event; +GPollFD pfd; +FilterNotifierCallback *cb; +void *opaque; +} FilterNotifier; + +FilterNotifier *filter_noitifier_new(FilterNotifierCallback *cb, +void *opaque, Error **errp); +int filter_notifier_set(FilterNotifier *notify, uint64_t value); + +enum { +COLO_CHECKPOINT = 2, +COLO_FAILOVER, +}; #endif /* QEMU_COLO_PROXY_H */ -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH 15/15] COLO: flush host dirty ram from cache
On 2017/4/8 1:39, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: We don't need to flush all of the VM's RAM from the cache; only flush the dirty pages since the last checkpoint. Cc: Juan Quintela Signed-off-by: Li Zhijian Signed-off-by: Zhang Chen Signed-off-by: zhanghailiang --- migration/ram.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 6227b94..e9ba740 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2702,6 +2702,7 @@ int colo_init_ram_cache(void) migration_bitmap_rcu = g_new0(struct BitmapRcu, 1); migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages); migration_dirty_pages = 0; +memory_global_dirty_log_start(); Shouldn't there be a stop somewhere? (Probably if you failover to the secondary and colo stops?) Ha, good catch, I forgot to stop the dirty log on the secondary side. return 0; @@ -2750,6 +2751,15 @@ void colo_flush_ram_cache(void) void *src_host; ram_addr_t offset = 0; +memory_global_dirty_log_sync(); +qemu_mutex_lock(&migration_bitmap_mutex); +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { +migration_bitmap_sync_range(block->offset, block->used_length); +} +rcu_read_unlock(); +qemu_mutex_unlock(&migration_bitmap_mutex); Again this might have some fun merging with Juan's recent changes - what's really unusual about your set is that you're using this bitmap on the destination; I suspect Juan's recent changes make that trickier. Check 'Creating RAMState for migration' and 'Split migration bitmaps by ramblock'. I have reviewed these two series, and I think it's not a big problem for COLO here; we can still re-use most of the code. Thanks, Hailiang Dave trace_colo_flush_ram_cache_begin(migration_dirty_pages); rcu_read_lock(); block = QLIST_FIRST_RCU(&ram_list.blocks); -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH v2] migration: Fix colo hang in socket_accept_incoming_migration
Hi, It seems that there is no difference from your previous version. You don't have to re-send it if there are no changes. This patch has been reviewed, so you can just wait until the maintainers process it :) Thanks. On 2017/3/27 9:58, Guang Wang wrote: From: Wang Guang The channel socket was initialized manually, but forgot to set QIO_CHANNEL_FEATURE_SHUTDOWN. Thus, colo_process_incoming_thread would hang at recvmsg(). This patch just calls qio_channel_socket_new() to get the channel, which sets QIO_CHANNEL_FEATURE_SHUTDOWN already. Signed-off-by: Wang Guang Signed-off-by: zhanghailiang Reviewed-by: Eric Blake --- io/channel-socket.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/io/channel-socket.c b/io/channel-socket.c index f546c68..64b36f5 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -331,16 +331,10 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, { QIOChannelSocket *cioc; -cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); -cioc->fd = -1; +cioc = qio_channel_socket_new(); cioc->remoteAddrLen = sizeof(ioc->remoteAddr); cioc->localAddrLen = sizeof(ioc->localAddr); -#ifdef WIN32 -QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL); -#endif - - retry: trace_qio_channel_socket_accept(ioc); cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
On 2017/3/22 16:09, wang.guan...@zte.com.cn wrote: hi: yes.it is better. And should we delete Yes, you are right. #ifdef WIN32 QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL) #endif in qio_channel_socket_accept? qio_channel_socket_new already have it. 原始邮件 发件人: <zhang.zhanghaili...@huawei.com> 收件人:王广10165992 抄送人: <xuqu...@huawei.com> <dgilb...@redhat.com> <zhangchen.f...@cn.fujitsu.com> <qemu-devel@nongnu.org> 日 期 :2017年03月22日 15:03 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang Hi, On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote: > diff --git a/migration/socket.c b/migration/socket.c > > > index 13966f1..d65a0ea 100644 > > > --- a/migration/socket.c > > > +++ b/migration/socket.c > > > @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, > > > } > > > > > > trace_migration_socket_incoming_accepted() > > > > > > qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") > > > +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) > > > migration_channel_process_incoming(migrate_get_current(), > > > QIO_CHANNEL(sioc)) > > > object_unref(OBJECT(sioc)) > > > > > Is this patch ok? > Yes, i think this works, but a better way maybe to call qio_channel_set_feature() in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the socket accept fd, Or fix it by this: diff --git a/io/channel-socket.c b/io/channel-socket.c index f546c68..ce6894c 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, Error **errp) { QIOChannelSocket *cioc - -cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)) -cioc->fd = -1 + +cioc = qio_channel_socket_new() cioc->remoteAddrLen = sizeof(ioc->remoteAddr) cioc->localAddrLen = sizeof(ioc->localAddr) Thanks, Hailiang > I have test it . The test could not hang any more. 
> > > > > > > > > > > > > 原始邮件 > > > > 发件人: <zhang.zhanghaili...@huawei.com> > 收件人: <dgilb...@redhat.com> <berra...@redhat.com> > 抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu..com>王广10165992 > 日 期 :2017年03月22日 09:11 > 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang > > > > > > On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: > > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: > >> Hi, > >> > >> Thanks for reporting this, and i confirmed it in my test, and it is a bug. > >> > >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in > >> case COLO thread/incoming thread is stuck in read/write() while do failover, > >> but it didn't take effect, because all the fd used by COLO (also migration) > >> has been wrapped by qio channel, and it will not call the shutdown API if > >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). > >> > >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> > >> > >> I doubted migration cancel has the same problem, it may be stuck in write() > >> if we tried to cancel migration. > >> > >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) > >> { > >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") > >> migration_channel_connect(s, ioc, NULL) > >> ... ... > >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, > >> and the > >> migrate_fd_cancel() > >> { > >> ... ... > >> if (s->state == MIGRATION_STATUS_CANCELLING && f) { > >> qemu_file_shutdown(f) --> This will not take effect. No ? > >> } > >> } > > > > (cc'd in Daniel Berrange). > > I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) at the > > top of qio_channel_socket_new so I think that's safe isn't it? > > > > Hmm, you are right, this problem is only exist for the migration incoming fd, thanks. 
> > > Dave > > > >> Thanks, > >> Hailiang > >> > >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: > >>> Thank you。 > >>> > >>> I have test aready。 > >>> > >>> When the Primary Node panic,the Secondary Node qemu hang at the same place。 > >>> > >>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 > >>> > >>> I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. > >>> > >>> > >>> when failover,channel_shutdown could not shut down the channel. > >>> > >>> > >>> so the colo_process_incoming_thread will hang at recvmsg. > >>> > >>> > >>> I test a patch: > >>> > >>> > >>> diff --git a/migration/socket.c b/migration/socket.c > >>> > >>> > >>> index 13966f1..d65a0ea 100644 > >>> > >>> > >>> --- a/migration/socket.c > >>> > >>> >
Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
Hi, On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote: diff --git a/migration/socket.c b/migration/socket.c index 13966f1..d65a0ea 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, } trace_migration_socket_incoming_accepted() qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) migration_channel_process_incoming(migrate_get_current(), QIO_CHANNEL(sioc)) object_unref(OBJECT(sioc)) Is this patch ok? Yes, i think this works, but a better way maybe to call qio_channel_set_feature() in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the socket accept fd, Or fix it by this: diff --git a/io/channel-socket.c b/io/channel-socket.c index f546c68..ce6894c 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, Error **errp) { QIOChannelSocket *cioc; - -cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); -cioc->fd = -1; + +cioc = qio_channel_socket_new(); cioc->remoteAddrLen = sizeof(ioc->remoteAddr); cioc->localAddrLen = sizeof(ioc->localAddr); Thanks, Hailiang I have test it . The test could not hang any more. 原始邮件 发件人: <zhang.zhanghaili...@huawei.com> 收件人: <dgilb...@redhat.com> <berra...@redhat.com> 抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>王广10165992 日 期 :2017年03月22日 09:11 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: >> Hi, >> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug. 
>> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in >> case COLO thread/incoming thread is stuck in read/write() while do failover, >> but it didn't take effect, because all the fd used by COLO (also migration) >> has been wrapped by qio channel, and it will not call the shutdown API if >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). >> >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> >> >> I doubted migration cancel has the same problem, it may be stuck in write() >> if we tried to cancel migration. >> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) >> { >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") >> migration_channel_connect(s, ioc, NULL) >> ... ... >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, >> and the >> migrate_fd_cancel() >> { >> ... ... >> if (s->state == MIGRATION_STATUS_CANCELLING && f) { >> qemu_file_shutdown(f) --> This will not take effect. No ? >> } >> } > > (cc'd in Daniel Berrange). > I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) at the > top of qio_channel_socket_new so I think that's safe isn't it? > Hmm, you are right, this problem is only exist for the migration incoming fd, thanks. > Dave > >> Thanks, >> Hailiang >> >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: >>> Thank you。 >>> >>> I have test aready。 >>> >>> When the Primary Node panic,the Secondary Node qemu hang at the same place。 >>> >>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 >>> >>> I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. >>> >>> >>> when failover,channel_shutdown could not shut down the channel. >>> >>> >>> so the colo_process_incoming_thread will hang at recvmsg. 
>>> >>> >>> I test a patch: >>> >>> >>> diff --git a/migration/socket.c b/migration/socket.c >>> >>> >>> index 13966f1..d65a0ea 100644 >>> >>> >>> --- a/migration/socket.c >>> >>> >>> +++ b/migration/socket.c >>> >>> >>> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, >>> >>> >>>} >>> >>> >>> >>> >>> >>>trace_migration_socket_incoming_accepted() >>> >>> >>> >>> >>> >>>qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") >>> >>> >>> +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) >>> >>> >>>migration_channel_process_incoming(migrate_get_current(), >>> >>> >>> QIO_CHANNEL(sioc)) >>> >>> >>>object_unref(OBJECT(sioc)) >>> >>> >>> >>> >>> My test will not hang any more. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> 原始邮件 >>> >>> >>> >>> 发件人: <zhangchen.f...@cn.fujitsu..com> >>> 收件人:王广10165992
Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: Hi, Thanks for reporting this, and i confirmed it in my test, and it is a bug. Though we tried to call qemu_file_shutdown() to shutdown the related fd, in case COLO thread/incoming thread is stuck in read/write() while do failover, but it didn't take effect, because all the fd used by COLO (also migration) has been wrapped by qio channel, and it will not call the shutdown API if we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> I doubted migration cancel has the same problem, it may be stuck in write() if we tried to cancel migration. void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) { qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); migration_channel_connect(s, ioc, NULL); ... ... We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, and the migrate_fd_cancel() { ... ... if (s->state == MIGRATION_STATUS_CANCELLING && f) { qemu_file_shutdown(f); --> This will not take effect. No ? } } (cc'd in Daniel Berrange). I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); at the top of qio_channel_socket_new; so I think that's safe isn't it? Hmm, you are right, this problem is only exist for the migration incoming fd, thanks. Dave Thanks, Hailiang On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: Thank you。 I have test aready。 When the Primary Node panic,the Secondary Node qemu hang at the same place。 Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. when failover,channel_shutdown could not shut down the channel. so the colo_process_incoming_thread will hang at recvmsg. 
I test a patch: diff --git a/migration/socket.c b/migration/socket.c index 13966f1..d65a0ea 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, } trace_migration_socket_incoming_accepted() qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) migration_channel_process_incoming(migrate_get_current(), QIO_CHANNEL(sioc)) object_unref(OBJECT(sioc)) My test will not hang any more. 原始邮件 发件人: <zhangchen.f...@cn.fujitsu.com> 收件人:王广10165992 <zhang.zhanghaili...@huawei.com> 抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com> 日 期 :2017年03月21日 15:58 主 题 :Re: [Qemu-devel] 答复: Re: [BUG]COLO failover hang Hi,Wang. You can test this branch: https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk and please follow wiki ensure your own configuration correctly. http://wiki.qemu-project.org/Features/COLO Thanks Zhang Chen On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote: > > hi. > > I test the git qemu master have the same problem. 
> > (gdb) bt > > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 > > #1 0x7f658e4aa0c2 in qio_channel_read > (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", > buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114 > > #2 0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at > migration/qemu-file-channel.c:78 > > #3 0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at > migration/qemu-file.c:295 > > #4 0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, > offset=offset@entry=0) at migration/qemu-file.c:555 > > #5 0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:568 > > #6 0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:648 > > #7 0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, > errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244 > > #8 0x7f658e3e681e in colo_receive_check_message (f=<optimized > out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, > errp=errp@entry=0x7f64ef3fda08) > > at migration/colo.c:264 > > #9 0x7f658e3e740e in colo_process_incoming_thread > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 > > #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0 > > #11 0x7f65881983ed in clone () from /lib64/libc.so.6 > > (gdb) p ioc->name > > $2 = 0x7f658ff7d5c0 "migration
Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
Hi, Thanks for reporting this, and i confirmed it in my test, and it is a bug. Though we tried to call qemu_file_shutdown() to shutdown the related fd, in case COLO thread/incoming thread is stuck in read/write() while do failover, but it didn't take effect, because all the fd used by COLO (also migration) has been wrapped by qio channel, and it will not call the shutdown API if we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). Cc: Dr. David Alan GilbertI doubted migration cancel has the same problem, it may be stuck in write() if we tried to cancel migration. void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) { qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); migration_channel_connect(s, ioc, NULL); ... ... We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, and the migrate_fd_cancel() { ... ... if (s->state == MIGRATION_STATUS_CANCELLING && f) { qemu_file_shutdown(f); --> This will not take effect. No ? } } Thanks, Hailiang On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: Thank you。 I have test aready。 When the Primary Node panic,the Secondary Node qemu hang at the same place。 Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. when failover,channel_shutdown could not shut down the channel. so the colo_process_incoming_thread will hang at recvmsg. 
I test a patch: diff --git a/migration/socket.c b/migration/socket.c index 13966f1..d65a0ea 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, } trace_migration_socket_incoming_accepted() qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) migration_channel_process_incoming(migrate_get_current(), QIO_CHANNEL(sioc)) object_unref(OBJECT(sioc)) My test will not hang any more. 原始邮件 发件人: <zhangchen.f...@cn.fujitsu.com> 收件人:王广10165992 <zhang.zhanghaili...@huawei.com> 抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com> 日 期 :2017年03月21日 15:58 主 题 :Re: [Qemu-devel] 答复: Re: [BUG]COLO failover hang Hi,Wang. You can test this branch: https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk and please follow wiki ensure your own configuration correctly. http://wiki.qemu-project.org/Features/COLO Thanks Zhang Chen On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote: > > hi. > > I test the git qemu master have the same problem. 
> > (gdb) bt > > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 > > #1 0x7f658e4aa0c2 in qio_channel_read > (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", > buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114 > > #2 0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at > migration/qemu-file-channel.c:78 > > #3 0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at > migration/qemu-file.c:295 > > #4 0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, > offset=offset@entry=0) at migration/qemu-file.c:555 > > #5 0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:568 > > #6 0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:648 > > #7 0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, > errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244 > > #8 0x7f658e3e681e in colo_receive_check_message (f=<optimized > out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, > errp=errp@entry=0x7f64ef3fda08) > > at migration/colo.c:264 > > #9 0x7f658e3e740e in colo_process_incoming_thread > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 > > #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0 > > #11 0x7f65881983ed in clone () from /lib64/libc.so.6 > > (gdb) p ioc->name > > $2 = 0x7f658ff7d5c0 "migration-socket-incoming" > > (gdb) p ioc->featuresDo not support QIO_CHANNEL_FEATURE_SHUTDOWN > > $3 = 0 > > > (gdb) bt > > #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 > > #1 0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at > gmain.c:3054 > > #2 g_main_context_dispatch (context=<optimized out>, > context@entry=0x7fdccce9f590) at gmain.c:3630 > > #3
Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG] COLO failover hang
Hi,

On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:

Thank you. I have tested it already. When the Primary Node panics, the Secondary Node QEMU hangs at the same place. According to http://wiki.qemu-project.org/Features/COLO, killing the Primary Node QEMU does not reproduce the problem, but a Primary Node panic does. I think this is because the channel does not support the QIO_CHANNEL_FEATURE_SHUTDOWN feature.

Yes, you are right. When we do failover for the primary/secondary VM, we shut down the related fd in case a thread is stuck in a read/write on that fd. It seems that you didn't follow the above introduction exactly to do the test. Could you share your test procedure, especially the commands used in the test?

Thanks,
Hailiang

When failover happens, channel_shutdown() could not shut down the channel, so colo_process_incoming_thread hangs at recvmsg(). I tested a patch:

diff --git a/migration/socket.c b/migration/socket.c
index 13966f1..d65a0ea 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
     }

     trace_migration_socket_incoming_accepted();

     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
     migration_channel_process_incoming(migrate_get_current(),
                                        QIO_CHANNEL(sioc));
     object_unref(OBJECT(sioc));

My test does not hang any more.

Original Mail
From: <zhangchen.f...@cn.fujitsu.com>
To: Wang Guang 10165992 <zhang.zhanghaili...@huawei.com>
Cc: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
Date: 2017-03-21 15:58
Subject: Re: [Qemu-devel] Reply: Re: [BUG] COLO failover hang

Hi, Wang.

You can test this branch:
https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
and please follow the wiki to ensure your configuration is correct:
http://wiki.qemu-project.org/Features/COLO

Thanks
Zhang Chen

On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote:
> hi.
> I tested the current QEMU git master and it has the same problem.
> (gdb) bt
> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
> #1  0x7f658e4aa0c2 in qio_channel_read (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114
> #2  0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at migration/qemu-file-channel.c:78
> #3  0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at migration/qemu-file.c:295
> #4  0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, offset=offset@entry=0) at migration/qemu-file.c:555
> #5  0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at migration/qemu-file.c:568
> #6  0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at migration/qemu-file.c:648
> #7  0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244
> #8  0x7f658e3e681e in colo_receive_check_message (f=<optimized out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, errp=errp@entry=0x7f64ef3fda08) at migration/colo.c:264
> #9  0x7f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
> #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0
> #11 0x7f65881983ed in clone () from /lib64/libc.so.6
>
> (gdb) p ioc->name
> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
> (gdb) p ioc->features    (does not support QIO_CHANNEL_FEATURE_SHUTDOWN)
> $3 = 0
>
> (gdb) bt
> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
> #1  0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at gmain.c:3054
> #2  g_main_context_dispatch (context=<optimized out>, context@entry=0x7fdccce9f590) at gmain.c:3630
> #3  0x7fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
> #4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
> #5  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:506
> #6  0x7fdccb526187 in main_loop () at vl.c:1898
> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
>
> (gdb) p ioc->features
> $1 = 6
> (gdb) p ioc->name
> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
>
> Maybe socket_accept_incoming_migration should call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
>
> thank you.
>
> Original Mail
> From: <zhangchen.f...@cn.fujitsu.com>
> To: Wang Guang 10165992 <qemu-devel@nongnu.org>
> Cc: <zhangchen.f...@cn.fujitsu.com> <zhang.zhanghaili...@huawei.com>
> Date
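The two backtraces come down to a single feature bit: a shutdown request can only interrupt a reader blocked in recvmsg() when the QIO_CHANNEL_FEATURE_SHUTDOWN bit is set on the channel (features is 6 on the listener but 0 on the accepted socket above). Here is a minimal sketch of that gating logic; the types and function names are simplified stand-ins for illustration, not the real QIOChannel API:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for QIOChannel feature gating. */
enum {
    QIO_CHANNEL_FEATURE_SHUTDOWN = 1 << 0,
};

typedef struct {
    unsigned int features;   /* bitmask of supported features */
} Channel;

static void channel_set_feature(Channel *c, unsigned int feature)
{
    c->features |= feature;
}

/*
 * Mirrors the failure mode in the report: without the SHUTDOWN feature
 * the shutdown request is refused, and a thread blocked in recvmsg()
 * stays blocked forever.
 */
static bool channel_shutdown(Channel *c)
{
    if (!(c->features & QIO_CHANNEL_FEATURE_SHUTDOWN)) {
        return false;   /* cannot interrupt the blocked reader */
    }
    /* real code would call shutdown(2) on the underlying socket here */
    return true;
}
```

The fix proposed in the patch above is exactly the missing set-feature call on the accepted socket.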
Re: [Qemu-devel] [PATCHV2] COLO: COLO-FT.txt err
Hi,

Thanks for reporting this. This patch has the wrong format, and it seems that you pasted the content into this email directly. Please follow the instructions below to submit a patch:
http://wiki.qemu-project.org/Contribute/SubmitAPatch

Thanks,
Hailiang

On 2017/3/18 13:43, wang.guan...@zte.com.cn wrote:

There is an error in COLO-FT.txt: secondeary-disk0 should be secondary-disk0.

Signed-off-by: Guang Wang <wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index e289be2..754efda 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -139,7 +139,7 @@ Secondary:
 { 'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } } }
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 'writable': true } }
+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
 Note:
 a. The qmp command nbd-server-start and nbd-server-add must be run

Original Mail
From: Wang Guang 10165992
To: <zhang.zhanghaili...@huawei.com>
Cc: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
Date: 2017-03-18 13:38
Subject: [Qemu-devel] [PATCH] COLO: COLO-FT.txt err

There is an error in COLO-FT.txt: secondeary-disk0 should be secondary-disk0.

Signed-off-by: Guang Wang <wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index e289be2..754efda 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -139,7 +139,7 @@ Secondary:
 { 'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } } }
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 'writable': true } }
+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
Re: [Qemu-devel] [PATCH] virtio-serial-bus: Delete timer from list before free it
ping... ?

On 2017/3/8 4:01, Amit Shah wrote:
On (Mon) 06 Mar 2017 [11:29:31], zhanghailiang wrote:

Signed-off-by: zhanghailiang
Reviewed-by: Amit Shah

Michael, please pick this up.

Amit
Re: [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case
Hi Stefan,

Sorry for the delayed reply.

On 2017/2/28 1:37, Stefan Hajnoczi wrote:
On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:

Just as in the scenario of non-shared-disk block replication, we are going to implement block replication from many basic blocks that are already in QEMU. The architecture is:

[Architecture diagram: the Primary VM's virtio-blk forwards writes through an NBD client (1) to the Secondary's NBD server (2), while primary writes also go to the shared disk (3); on the Secondary, a drive-backup sync=none job buffers the original content into a hidden disk that backs the active disk (4), which buffers secondary writes (5).]

1) Primary writes will read the original data and forward it to the Secondary QEMU.
2) The hidden disk is created automatically. It buffers the original content that is modified by the primary VM. It should also be an empty disk, and the driver supports bdrv_make_empty() and a backing file.
3) Primary write requests will be written to the shared disk.
4) Secondary write requests will be buffered in the active disk, and will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen

Are there any restrictions on the shared disk? For example, must the -drive cache= mode be 'none'? If the cache mode isn't 'none', the secondary host might have old data in the host page cache, and the Secondary QEMU would have an inconsistent view of the shared disk.

During a checkpoint we will call vm_stop(), in which bdrv_flush_all() is called; is that enough?

Are image file formats like qcow2 supported for the shared disk? Extra steps are required to achieve consistency; see bdrv_invalidate_cache().

In the above scenario, there is no limitation on the formats of the shared disk. Hmm, in that case, we should call bdrv_invalidate_cache_all() during the checkpoint.

Thanks,
Hailiang

Stefan
Re: [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options
On 2017/2/28 1:10, Stefan Hajnoczi wrote:
On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:

@@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
                    "The option mode's value should be primary or secondary");
         goto fail;
     }
+    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+                                          false);
+    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+        if (!shared_disk_id) {
+            error_setg(&local_err, "Missing shared disk blk option");
+            goto fail;
+        }
+        s->shared_disk_id = g_strdup(shared_disk_id);
+        blk = blk_by_name(s->shared_disk_id);
+        if (!blk) {
+            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+            goto fail;
+        }
+        /* We can't access the root member of BlockBackend directly */
+        tmp_bs = blk_bs(blk);
+        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);

Why is this necessary? We already have the BlockBackend for the primary disk. I'm not sure why the BdrvChild is needed.

Er, the helper backup_job_create() needs the BlockDriverState of the primary disk; besides, we want to make it the same as 'secondary_disk' in struct BDRVReplicationState.

Thanks,
Hailiang

Stefan
Re: [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case
On 2017/2/28 0:46, Stefan Hajnoczi wrote:
On Fri, Jan 20, 2017 at 11:47:55AM +0800, zhanghailiang wrote:

+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on

I don't see where this patch series adds a -drive shared-disk=on option. Did I miss it?

In patch 2, we add this option. Thanks.
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2017/3/1 0:14, Andrea Arcangeli wrote:

Hello,

On Tue, Feb 28, 2017 at 09:48:26AM +0800, Hailiang Zhang wrote:
Yes, the current implementation of live snapshot supports tcg, but does not support kvm mode, for the reason I have mentioned above; if you try to implement it, I think you need to start from userfaultfd supporting KVM. There is a scenario for it, but I'm blocked by other things these days. I'm glad to discuss it with you if you plan to do it.

Yes, there were other urgent userfaultfd features needed by QEMU and CRIU queued for merging (hugetlbfs/shmem/non-cooperative support) and they're all included upstream now. Now that such work is finished, fixing the WP support to work with KVM and to provide full accuracy will be the next thing to do.

Great, looking forward to it. Thanks.

Thanks,
Andrea
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hi,

On 2017/2/27 23:37, Christian Pinto wrote:

Hello Hailiang, are there any updates on this patch series? Are you planning to release a new version?

No, userfaultfd still does not support write-protect for KVM. You can see the newest discussion about it here:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html

You say there are some issues with the current snapshot-v2 version; which issues were you referring to? On my side the only problem I have seen was that the live snapshot was not working on ARMv8, but I have fixed that and managed to successfully snapshot and restore a QEMU ARMv8 tcg machine on an ARMv8 host. I will gladly contribute these fixes once you release a new version of the patches.

Yes, the current implementation of live snapshot supports tcg, but does not support kvm mode, for the reason I have mentioned above; if you try to implement it, I think you need to start from userfaultfd supporting KVM. There is a scenario for it, but I'm blocked by other things these days. I'm glad to discuss it with you if you plan to do it.

Thanks.
Hailiang

Thanks a lot,
Christian

On 20/08/2016 08:31, Hailiang Zhang wrote:

Hi,

I updated this series, but didn't post it, because there were some problems while I tested the snapshot function. I don't know whether it is a userfaultfd issue or not, and I don't have time to investigate it this month. I have put the patches on github:
https://github.com/coloft/qemu/tree/snapshot-v2
Anyone who wants to test and modify them is welcome!

Besides, will you join LinuxCon or KVM Forum in Canada?
I wish to see you there if you join the conference ;)

Thanks,
Hailiang

On 2016/8/18 23:56, Andrea Arcangeli wrote:

Hello everyone,

I've an aa.git tree up to date on the master & userfault branches (master includes other pending VM stuff; the userfault branch only contains userfault enhancements):

https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault

I didn't have time to test KVM live memory snapshot on it yet, as I'm still working to improve it. Did anybody test it? However, I'd be happy to take any bug reports and quickly solve anything that isn't working right with the shadow MMU.

I already got a positive report for another usage of the uffd WP support:

https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f

The last few things I'm working on to finish the WP support are:

1) a pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty, to mark, in a vma with VM_UFFD_WP set in vma->vm_flags, which swap entries were generated while the pte was wrprotected.

2) to avoid all false positives, the equivalent of pte_mksoft_dirty is needed too... and that requires spare software bits on the pte, which are available on x86. I considered also taking over the soft_dirty bit, but then you couldn't do checkpoint/restore of a JIT/to-native compiler that uses the uffd WP support, so it wasn't ideal. Perhaps it would be OK, as an incremental patch, to make the two options mutually exclusive, to defer the arch changes that pte_mkuffd_wp would require for later.

3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING, or trigger a CoW in userfaultfd_writeprotect.

4) a WP selftest.

In theory things should work OK already if the userland code is tolerant against false positives through swap and after fork() and KSM. For a usage like snapshotting, false positives shouldn't be an issue (it'll just run slower if you swap, in the worst case), and point 3) above also isn't an issue because it's going to register into uffd with WP only.
The current status includes:

1) WP support for anon (with false positives... work in progress)
2) MISSING support for tmpfs and hugetlbfs
3) non-cooperative support

Thanks,
Andrea
Re: [Qemu-devel] [PATCH V10 0/2] Add new qmp commands to suppurt Xen COLO
On 2017/2/28 6:52, Stefano Stabellini wrote:
On Mon, 27 Feb 2017, Eric Blake wrote:
On 02/27/2017 04:31 PM, Stefano Stabellini wrote:

Eric, are you OK with this series going upstream? If so, do you want me to send the pull request for it, or are you going to handle it?

Both patches have my R-b, but MAINTAINERS suggests the pull request should go through the COLO Framework (zhanghailiang) or maybe Xen (you). I am fine either way. zhanghailiang?

Yes, I'm fine if you could send a pull request for this series. Thanks.
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/27 17:05, Jason Wang wrote:
On 2017-02-27 14:53, Hailiang Zhang wrote:

I think the issue is that your code cannot tell A from B.

We have a parameter 'fin_ack_seq' recording the sequence number of the 'FIN=1,ACK=1,seq=w,ack=u+1' packet, and if the ack value from the opposite side is 'w+1', we can consider this connection closed, no?

Hi Jason,

Thanks very much for your patience. Let's see what happens. Consider the VM doing an active close (reusing the figure above); we don't support tracking connections started by the VM in the current rewriter code, I mean, the Client side is the VM. Your figure is not quite correct; the process should be:

(VM) Client:                                  Server:
ESTABLISHED |                                |
            | -> FIN=1,seq=u ->              |
FIN_WAIT_1  |                                |
            | <- ACK=1,seq=v,ack=u+1 <-      |
FIN_WAIT_2  |                                | CLOSE_WAIT
            | <- FIN=1,ACK=1,seq=w,ack=u+1 <-|   handle_secondary():
            |                                |       fin_ack_seq = w
            |                                |       tcp_state = TCPS_LAST_ACK
            |                                | LAST_ACK
            | -> ACK=1,seq=u+1,ack=w+1 ->    |
TIME_WAIT   |                                | CLOSED
CLOSED      |                                |   handle_primary():
            |                                |       if (ack == fin_ack_seq + 1)
            |                                |           g_hash_table_remove()

(VM) Client:                                  Server:
ESTABLISHED |                                |
            | -> FIN=1,seq=u ->              |   handle_secondary():
            |                                |       fin_ack_seq = u
            |                                |       tcp_state = TCPS_LAST_ACK
FIN_WAIT_1  |                                |
            | <- ACK=1,seq=v,ack=u+1 <-      |   handle_primary():
            |                                |       if (ack == fin_ack_seq + 1)
            |                                |           g_hash_table_remove()

But we probably want it to be removed in TIME_WAIT/CLOSED.

Yes, we should remove it only after 2MSL, because the server side may not get the last 'ACK=1,seq=u+1,ack=w+1' packet, in which case it will resend 'FIN=1,ACK=1,seq=w,ack=u+1'.

Thanks.

Thanks
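The removal condition discussed above can be sketched as a tiny standalone model. This is hypothetical illustration code (the names Conn, conn_saw_fin_ack and conn_should_remove are ours; the real filter-rewriter tracks much more state, and per the 2MSL point the real removal should additionally be delayed):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection-tracking state for illustration. */
enum { TCPS_ESTABLISHED, TCPS_LAST_ACK };

typedef struct {
    int      tcp_state;
    uint32_t fin_ack_seq;   /* seq of the peer's FIN/ACK segment */
} Conn;

/* Record the peer's FIN/ACK: that segment must be acked with seq + 1. */
static void conn_saw_fin_ack(Conn *c, uint32_t seq)
{
    c->fin_ack_seq = seq;
    c->tcp_state = TCPS_LAST_ACK;
}

/*
 * True when the final ACK of the close handshake has been seen, so the
 * connection entry may be dropped from the hash table (modulo waiting
 * 2MSL for a possible FIN/ACK retransmission).
 */
static bool conn_should_remove(const Conn *c, uint32_t ack)
{
    return c->tcp_state == TCPS_LAST_ACK && ack == c->fin_ack_seq + 1;
}
```

In the figure above, seq=w is recorded by conn_saw_fin_ack(), and the final 'ack=w+1' satisfies conn_should_remove().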
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/27 15:34, Zhang Chen wrote:
On 02/27/2017 03:28 PM, Hailiang Zhang wrote:
On 2017/2/27 15:03, Zhang Chen wrote:
On 02/25/2017 02:58 PM, Hailiang Zhang wrote:
On 2017/2/25 11:32, Zhang Chen wrote:

Add an offset argument to colo_packet_compare_common(), and optimize colo_packet_compare_icmp() and colo_packet_compare_udp() to compare just the IP payload.

Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index e75f0ae..9853232 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode)
  * return:    0 means packet same
  *            > 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
+static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
 {
     trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
@@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
                                inet_ntoa(spkt->ip->ip_dst));

     if (ppkt->size == spkt->size) {
-        return memcmp(ppkt->data, spkt->data, spkt->size);
+        return memcmp(ppkt->data + offset, spkt->data + offset,
+                      spkt->size - offset);
     } else {
         trace_colo_compare_main("Net packet size are not the same");
         return -1;
@@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
         spkt->ip->ip_sum = ppkt->ip->ip_sum;
     }

-    res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
-                 (spkt->size - ETH_HLEN));
+    res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);

For the tcp packet check, why not ignore the IP headers, just like the udp packet check? Besides, can we compare the checksums stored in the tcp and udp headers before calling colo_packet_compare_common()? I think that will improve the comparing performance.

That's another way to compare the packets, suggested by Dr. David Alan Gilbert: it makes the two packets' IP headers the same first, then compares the whole IP packet. That way makes it explicit why we ignore the IP header in the other packet checks, like udp. For performance, ignoring the IP header skips at least 20 bytes of comparison, so ignoring the IP header has better comparing performance.

OK. Here, I think we can re-use the checksum value stored in the tcp or udp header; comparing it first, before comparing the complete tcp or udp payload, is another way to improve the performance, and it is only 2 bytes.

No, the IP header checksums are always different, because the IP header's ID field is always different.

Not the checksum in the ip header; I mean the checksum in the tcp header or udp header.

Thanks
Zhang Chen

Thanks.
Hailiang

     if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
         trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src),
@@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    ret = colo_packet_compare_common(ppkt, spkt);
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN);

     if (ret) {
         trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
@@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    if (colo_packet_compare_common(ppkt, spkt)) {
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) {
         trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size);
         qemu_hexdump((char *)ppkt->data, stderr, "colo-compare",
@@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt)
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
                                inet_ntoa(spkt->ip->ip_src),
                                inet_ntoa(spkt->ip->ip_dst));

-    return colo_packet_compare_common(ppkt, spkt);
+    return colo_packet_compare_common(ppkt, spkt, 0);
 }

 static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/27 15:03, Zhang Chen wrote:
On 02/25/2017 02:58 PM, Hailiang Zhang wrote:
On 2017/2/25 11:32, Zhang Chen wrote:

Add an offset argument to colo_packet_compare_common(), and optimize colo_packet_compare_icmp() and colo_packet_compare_udp() to compare just the IP payload.

Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index e75f0ae..9853232 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode)
  * return:    0 means packet same
  *            > 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
+static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
 {
     trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
@@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
                                inet_ntoa(spkt->ip->ip_dst));

     if (ppkt->size == spkt->size) {
-        return memcmp(ppkt->data, spkt->data, spkt->size);
+        return memcmp(ppkt->data + offset, spkt->data + offset,
+                      spkt->size - offset);
     } else {
         trace_colo_compare_main("Net packet size are not the same");
         return -1;
@@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
         spkt->ip->ip_sum = ppkt->ip->ip_sum;
     }

-    res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
-                 (spkt->size - ETH_HLEN));
+    res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);

For the tcp packet check, why not ignore the IP headers, just like the udp packet check? Besides, can we compare the checksums stored in the tcp and udp headers before calling colo_packet_compare_common()? I think that will improve the comparing performance.

That's another way to compare the packets, suggested by Dr. David Alan Gilbert: it makes the two packets' IP headers the same first, then compares the whole IP packet. That way makes it explicit why we ignore the IP header in the other packet checks, like udp. For performance, ignoring the IP header skips at least 20 bytes of comparison, so ignoring the IP header has better comparing performance.

OK. Here, I think we can re-use the checksum value stored in the tcp or udp header; comparing it first, before comparing the complete tcp or udp payload, is another way to improve the performance, and it is only 2 bytes.

Thanks.

Thanks
Zhang Chen

Thanks.
Hailiang

     if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
         trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src),
@@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    ret = colo_packet_compare_common(ppkt, spkt);
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN);

     if (ret) {
         trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
@@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    if (colo_packet_compare_common(ppkt, spkt)) {
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) {
         trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size);
         qemu_hexdump((char *)ppkt->data, stderr, "colo-compare",
@@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt)
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
                                inet_ntoa(spkt->ip->ip_src),
                                inet_ntoa(spkt->ip->ip_dst));

-    return colo_packet_compare_common(ppkt, spkt);
+    return colo_packet_compare_common(ppkt, spkt, 0);
 }

 static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
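The checksum-first idea from this exchange can be sketched standalone. This is hypothetical illustration code (the name compare_payload and its signature are ours, not colo-compare's): a 16-bit transport-checksum mismatch proves the payloads differ, while a match still requires the full memcmp, since different payloads can share a checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two packet payloads, using the 16-bit transport-layer checksum
 * (already present in the tcp/udp header) as a cheap first-pass filter.
 * Returns 0 when the payloads match, nonzero otherwise.
 */
static int compare_payload(const uint8_t *p, uint16_t p_csum,
                           const uint8_t *s, uint16_t s_csum,
                           size_t len, size_t offset)
{
    if (p_csum != s_csum) {
        return -1;              /* definitely different: only 2 bytes compared */
    }
    /* checksums match: fall back to the full byte-wise comparison */
    return memcmp(p + offset, s + offset, len - offset);
}
```

The win is the common miscompare case: most differing packets are rejected after a 2-byte compare instead of a full-payload memcmp.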
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/27 13:35, Jason Wang wrote:
On 2017-02-27 12:09, Hailiang Zhang wrote:
On 2017/2/27 11:40, Jason Wang wrote:
On 2017-02-27 11:11, Hailiang Zhang wrote:
On 2017/2/23 12:16, Jason Wang wrote:
On 2017-02-22 16:51, Hailiang Zhang wrote:
On 2017/2/22 16:45, Hailiang Zhang wrote:
On 2017/2/22 16:07, Jason Wang wrote:
On 2017-02-22 11:46, zhanghailiang wrote:

After a net connection is closed, we didn't clear its related resources in connection_track_table, which will lead to a memory leak.

Not a real leak, but it would lead to a reset of the hash table if there are too many closed connections.

Yes, you are right: there will be lots of stale connection data in the hash table if we don't remove it when a connection is closed.

Ok, so let's come up with a better title for the patch.

OK. Let's track the state of the net connection; if it is closed, its related resources will be cleared up.

The issue is that the state is tracked only partially; do we need a full state machine here?

No. IMHO, we only care about its last state, because we will do nothing even if we track the intermediate states.

Well, you care about at least the syn state too. Without a complete state machine, it's very hard to track even partial state, I believe. And you will fail to track some state transitions for sure, which makes the code fragile.

Agreed, but here things are a little different. There are some extreme cases where we can't track the complete process of closing a connection. For example (I have explained this below; it seems that you didn't get it ;)): if the VM has been running before we make it go into COLO FT state, some connections may already exist. In the extreme case, the VM goes into COLO state while some connections are in a half-closing state, and we can only track the below half-closing states in the filter-rewriter and colo-compare objects.

[...]
Er, here we track the last two states, 'FIN=1,ACK=1' and 'ACK=1' (which acks the 'FIN=1,ACK=1' packet). We will remove the connection when we get the 'ACK=1' packet, so is that enough?

But the connection is not closed in fact, no? It's legal for the remote side to continue sending tons of packets to us even after this.

Er, I'm a little confused. Here, for the server side, I think that after it got the 'ACK=1,seq=u+1,ack=w+1', the connection is closed, so I remove it from the hash table. Wrong?

Client:                                  Server:
ESTABLISHED |                           |
            | -> FIN=1,seq=u ->         |

This is case A, and ACK should be set in this segment too.

FIN_WAIT_1  |                           |
            | <- ACK=1,seq=v,ack=u+1 <- |
FIN_WAIT_2  |                           | CLOSE_WAIT
            | <- FIN=1,ACK=1,seq=w,ack=u+1 <- |
            |                           | LAST_ACK

This is case B.

            | -> ACK=1,seq=u+1,ack=w+1 -> |
TIME_WAIT   |                           | CLOSED
CLOSED      |                           |

I think the issue is that your code cannot tell A from B.

We have a parameter 'fin_ack_seq' recording the sequence number of 'FIN=1,ACK=1,seq=w,ack=u+1', and if the ack value from the opposite side is 'w+1', we can consider this connection closed, no?

Thanks

What's more, I don't think we can decide passive or active close by:

+if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {

since both cases will send FIN,ACK for sure.

I didn't quite understand. Here we have tracked the closing request no matter whether it is from the server side (passive close?) or the client side (active close?). You can refer to the 'Case 1' and 'Case 2' comments in the code.

I think you need to differentiate them, since passive close is much simpler, and it seems that your code may work only in that case.

Here, it seems that we can't track the case in which both sides send closing requests at the same time; in that case, there are only 'FIN=1' and 'ACK=1' packets.

Yes, RFC allows this.

Hmm, I'd like to remove this patch from this series and send it as a single patch after all the above questions have been solved. How about the other patches?

Looks good except for the compiling issue in patch 3.

OK, I will fix it in version 3.
Thanks

Thanks,
Hailiang

[...]
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/27 11:40, Jason Wang wrote:
On 2017-02-27 11:11, Hailiang Zhang wrote:
On 2017/2/23 12:16, Jason Wang wrote:
On 2017-02-22 16:51, Hailiang Zhang wrote:
On 2017/2/22 16:45, Hailiang Zhang wrote:
On 2017/2/22 16:07, Jason Wang wrote:
On 2017-02-22 11:46, zhanghailiang wrote:

After a net connection is closed, we didn't clear its related resources in connection_track_table, which will lead to a memory leak.

Not a real leak, but it would lead to a reset of the hash table if there are too many closed connections.

Yes, you are right: there will be lots of stale connection data in the hash table if we don't remove it when a connection is closed.

Ok, so let's come up with a better title for the patch.

OK. Let's track the state of the net connection; if it is closed, its related resources will be cleared up.

The issue is that the state is tracked only partially; do we need a full state machine here?

No. IMHO, we only care about its last state, because we will do nothing even if we track the intermediate states.

Well, you care about at least the syn state too. Without a complete state machine, it's very hard to track even partial state, I believe. And you will fail to track some state transitions for sure, which makes the code fragile.

Agreed, but here things are a little different. There are some extreme cases where we can't track the complete process of closing a connection. For example (I have explained this below; it seems that you didn't get it ;)): if the VM has been running before we make it go into COLO FT state, some connections may already exist. In the extreme case, the VM goes into COLO state while some connections are in a half-closing state, and we can only track the below half-closing states in the filter-rewriter and colo-compare objects.
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> --- net/colo.h| 4 +++ net/filter-rewriter.c | 70 +-- 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/net/colo.h b/net/colo.h index 7c524f3..cd9027f 100644 --- a/net/colo.h +++ b/net/colo.h @@ -18,6 +18,7 @@ #include "slirp/slirp.h" #include "qemu/jhash.h" #include "qemu/timer.h" +#include "slirp/tcp.h" #define HASHTABLE_MAX_SIZE 16384 @@ -69,6 +70,9 @@ typedef struct Connection { * run once in independent tcp connection */ int syn_flag; + +int tcp_state; /* TCP FSM state */ +tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */ } Connection; uint32_t connection_key_hash(const void *opaque); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index c4ab91c..7e7ec35 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt) } /* handle tcp packet from primary guest */ -static int handle_primary_tcp_pkt(NetFilterState *nf, +static int handle_primary_tcp_pkt(RewriterState *rf, Connection *conn, - Packet *pkt) + Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -97,15 +97,45 @@ static int handle_primary_tcp_pkt(NetFilterState *nf, tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +/* + * Case 1: + * The *server* side of this connect is VM, *client* tries to close + * the connection. + * + * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1' + * packet from server side. From this point, we can ensure that there + * will be no packets in the connection, except that, some errors + * happen between the path of 'filter object' and vNIC, if this rare + * case really happen, we can still create a new connection, + * So it is safe to remove the connection from connection_track_table. + * + */ +if ((conn->tcp_state == TCPS_LAST_ACK) && +(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { +fprintf(stderr, "Remove conn " Can this even compile? 
Oops, i forgot to remove it, will remove it in next version. + g_hash_table_remove(rf->connection_track_table, key); +} +} +/* + * Case 2: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * We got 'fin=1, ack=1' packet from client side, we need to + * record the seq of 'fin=1, ack=1' packet. + */ +if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) { +conn->fin_ack_seq = htonl(tcp_pkt->th_seq); +conn->tcp_state = TCPS_LAST_ACK; } return 0; } /* handle tcp packet from secon
Re: [Qemu-devel] [PATCH v2 3/3] filter-rewriter: skip net_checksum_calculate() while offset = 0
On 2017/2/24 16:08, Zhang Chen wrote: On 02/22/2017 11:46 AM, zhanghailiang wrote: While the offset of the packets' sequence numbers for the primary side and secondary side is zero, it is unnecessary to call net_checksum_calculate() to recalculate the checksum of the packets. Signed-off-by: zhanghailiang --- net/filter-rewriter.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index 7e7ec35..c9a6d43 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -93,10 +93,12 @@ static int handle_primary_tcp_pkt(RewriterState *rf, conn->offset -= (ntohl(tcp_pkt->th_ack) - 1); conn->syn_flag = 0; } -/* handle packets to the secondary from the primary */ -tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); +if (conn->offset) { This is wrong, conn->offset may be a negative value (like -1000), so you could change this to "if (conn->offset == 0) {" Er, if it is a negative value, it will still go into this if (conn->offset) branch, and we need to adjust the checksum value in that case. +/* handle packets to the secondary from the primary */ +tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); -net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +} /* * Case 1: * The *server* side of this connect is VM, *client* tries to close @@ -112,7 +114,6 @@ static int handle_primary_tcp_pkt(RewriterState *rf, */ if ((conn->tcp_state == TCPS_LAST_ACK) && (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { -fprintf(stderr, "Remove conn " This needs a fix. OK.
g_hash_table_remove(rf->connection_track_table, key); } } @@ -159,10 +160,13 @@ static int handle_secondary_tcp_pkt(RewriterState *rf, } if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) { -/* handle packets to the primary from the secondary*/ -tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); +/* Only need to adjust seq while offset is Non-zero */ +if (conn->offset) { Refer to the above comments. Thanks Zhang Chen +/* handle packets to the primary from the secondary*/ +tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); -net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +} /* * Case 2: * The *server* side of this connect is VM, *server* tries to close
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/25 11:32, Zhang Chen wrote: Add offset args for colo_packet_compare_common, optimize colo_packet_compare_icmp() and colo_packet_compare_udp() just compare the IP payload. Signed-off-by: Zhang Chen--- net/colo-compare.c | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index e75f0ae..9853232 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) +static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) { trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), inet_ntoa(ppkt->ip->ip_dst), spkt->size, @@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) inet_ntoa(spkt->ip->ip_dst)); if (ppkt->size == spkt->size) { This check is useless, because we have done such checks in every caller. -return memcmp(ppkt->data, spkt->data, spkt->size); +return memcmp(ppkt->data + offset, spkt->data + offset, + spkt->size - offset); } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) spkt->ip->ip_sum = ppkt->ip->ip_sum; } -res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN, -(spkt->size - ETH_HLEN)); +res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src), @@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) return -1; } -ret = colo_packet_compare_common(ppkt, spkt); +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. 
In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN); if (ret) { trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); @@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) return -1; } -if (colo_packet_compare_common(ppkt, spkt)) { +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) { trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", @@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) inet_ntoa(ppkt->ip->ip_dst), spkt->size, inet_ntoa(spkt->ip->ip_src), inet_ntoa(spkt->ip->ip_dst)); -return colo_packet_compare_common(ppkt, spkt); +return colo_packet_compare_common(ppkt, spkt, 0); } static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/25 11:32, Zhang Chen wrote: Add offset args for colo_packet_compare_common, optimize colo_packet_compare_icmp() and colo_packet_compare_udp() just compare the IP payload. Signed-off-by: Zhang Chen--- net/colo-compare.c | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index e75f0ae..9853232 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) +static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) { trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), inet_ntoa(ppkt->ip->ip_dst), spkt->size, @@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) inet_ntoa(spkt->ip->ip_dst)); if (ppkt->size == spkt->size) { -return memcmp(ppkt->data, spkt->data, spkt->size); +return memcmp(ppkt->data + offset, spkt->data + offset, + spkt->size - offset); } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) spkt->ip->ip_sum = ppkt->ip->ip_sum; } -res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN, -(spkt->size - ETH_HLEN)); +res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); For tcp packets check, why not ignore the ip headers, just like udp packets check ? Besides, here, can we compare the checksum stored in headers of tcp and udp before call colo_packet_compare_common(), which i think will improve the comparing performance. Thanks. 
Hailiang if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src), @@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) return -1; } -ret = colo_packet_compare_common(ppkt, spkt); +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN); if (ret) { trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); @@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) return -1; } -if (colo_packet_compare_common(ppkt, spkt)) { +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) { trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", @@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) inet_ntoa(ppkt->ip->ip_dst), spkt->size, inet_ntoa(spkt->ip->ip_src), inet_ntoa(spkt->ip->ip_dst)); -return colo_packet_compare_common(ppkt, spkt); +return colo_packet_compare_common(ppkt, spkt, 0); } static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH 1/3] COLO-compare: Add minimum packet size check and some fix
Hi, On 2017/2/25 11:32, Zhang Chen wrote: Add packet minimum size check in colo_packet_compare_udp() and colo_packet_compare_udp() like colo_packet_compare_icmp(), rename function colo_packet_compare() to colo_packet_compare_common() that we will reuse it later. Signed-off-by: Zhang Chen--- net/colo-compare.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 300f017..e75f0ae 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare(Packet *ppkt, Packet *spkt) +static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) { trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), inet_ntoa(ppkt->ip->ip_dst), spkt->size, @@ -190,6 +190,7 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt) if (ppkt->size == spkt->size) { return memcmp(ppkt->data, spkt->data, spkt->size); } else { +trace_colo_compare_main("Net packet size are not the same"); return -1; } } @@ -202,9 +203,10 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt) static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) { struct tcphdr *ptcp, *stcp; -int res; +int res, network_length; trace_colo_compare_main("compare tcp"); + if (ppkt->size != spkt->size) { if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { trace_colo_compare_main("pkt size not same"); @@ -212,6 +214,12 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) return -1; } +network_length = ppkt->ip->ip_hl * 4; +if (ppkt->size < network_length + ETH_HLEN) { I think the check here is useless, because you have such check in parse_packet_early() which is been called before these helpers. And what check you need to add is, to check if the packet's size = packet's length been record in ip header. 
Like: +++ b/net/colo.c @@ -78,6 +78,12 @@ int parse_packet_early(Packet *pkt) trace_colo_proxy_main("pkt->size < network_header + network_length"); return 1; } + +if (pkt->size < ETH_HLEN + ntohs(pkt->ip->ip_len)) { +fprintf(stderr, "pkt->size %d < pkt expect total len %ld\n", pkt->size, +pkt_MAChdr_len + ntohs(pkt->ip->ip_len)); +return -1; +} +trace_colo_compare_main("tcp packet size error"); +return -1; +} + ptcp = (struct tcphdr *)ppkt->transport_header; stcp = (struct tcphdr *)spkt->transport_header; @@ -260,10 +268,16 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) */ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) { -int ret; +int ret, network_length; trace_colo_compare_main("compare udp"); -ret = colo_packet_compare(ppkt, spkt); +network_length = ppkt->ip->ip_hl * 4; +if (ppkt->size < network_length + ETH_HLEN) { +trace_colo_compare_main("udp packet size error"); +return -1; +} + +ret = colo_packet_compare_common(ppkt, spkt); if (ret) { trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); @@ -285,12 +299,12 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) trace_colo_compare_main("compare icmp"); network_length = ppkt->ip->ip_hl * 4; -if (ppkt->size != spkt->size || -ppkt->size < network_length + ETH_HLEN) { +if (ppkt->size < network_length + ETH_HLEN) { +trace_colo_compare_main("icmp packet size error"); return -1; } -if (colo_packet_compare(ppkt, spkt)) { +if (colo_packet_compare_common(ppkt, spkt)) { trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", @@ -316,7 +330,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) inet_ntoa(ppkt->ip->ip_dst), spkt->size, inet_ntoa(spkt->ip->ip_src), inet_ntoa(spkt->ip->ip_dst)); -return colo_packet_compare(ppkt, spkt); +return colo_packet_compare_common(ppkt, spkt); } static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case
ping ... ? On 2017/1/20 11:47, zhanghailiang wrote: COLO block replication doesn't support the shared-disk case. Here we try to implement it, and this is the third version. Last posted series: https://lists.gnu.org/archive/html/qemu-block/2016-12/msg00039.html You can refer to the above link if you want to test it. I have uploaded the new version to github: https://github.com/coloft/qemu/tree/colo-developing-with-shared-disk-2016-1-20 Please review; any comments are welcome. Cc: Juan Quintela Cc: Amit Shah Cc: Dr. David Alan Gilbert (git) Cc: eddie.d...@intel.com v3: - Fix some comments from Stefan and Eric v2: - Drop the patch which adds a blk_root() helper - Fix some comments from Changlong zhanghailiang (6): docs/block-replication: Add description for shared-disk case replication: add shared-disk and shared-disk-id options replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() replication: fix code logic with the new shared_disk option replication: Implement block replication for shared disk case nbd/replication: implement .bdrv_get_info() for nbd and replication driver block/nbd.c| 12 block/replication.c| 156 +++-- docs/block-replication.txt | 139 ++-- qapi/block-core.json | 10 ++- 4 files changed, 279 insertions(+), 38 deletions(-)
Re: [Qemu-devel] [PATCH 05/15] COLO: Handle shutdown command for VM in COLO state
Hi Eric, On 2017/2/22 23:35, Eric Blake wrote: On 02/21/2017 09:42 PM, zhanghailiang wrote: If the VM is in COLO FT state, we need to do some extra work before starting the normal shutdown process. The Secondary VM will ignore the shutdown command if users issue it directly to the Secondary VM. COLO will capture the shutdown command and handle it after the shutdown request from the user. Cc: Paolo Bonzini Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert --- v19: - fix title and comment Did you miss putting v19 in the subject line? Er, some patches of this series were split from the previous v18 version, but most patches of that series have been merged upstream; that's why it has this tag here. I think it is better to remove this comment instead of using a v19 tag for this series. +++ b/qapi-schema.json @@ -1157,12 +1157,14 @@ # # @vmstate-loaded: VM's state has been loaded by SVM. # +# @guest-shutdown: shutdown require from PVM to SVM maybe s/require/requested/ ? Yes. Missing '(since 2.9)' Will fix this in next version, thanks. +# # Since: 2.8 ## { 'enum': 'COLOMessage', 'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply', 'vmstate-send', 'vmstate-size', 'vmstate-received', -'vmstate-loaded' ] } +'vmstate-loaded', 'guest-shutdown' ] }
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
Hi, On 2017/2/22 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While doing a checkpoint, we need to flush all the unhandled packets. By using the filter notifier mechanism, we can easily notify every compare object to do this; it runs inside the compare threads as a coroutine. Hi Jason and Hailiang, I will send a patch set later about a colo-compare notify mechanism for Xen, like this patch. I want to add a new chardev socket in colo-compare to connect to Xen COLO, to notify checkpoint or failover, because we have no other way to communicate with the Xen code. That means we will have two notify mechanisms. What do you think about this? I don't think you need another mechanism; what you need to do is implement a qmp command which calls colo_notify_compares_event(). It will not return until the event (checkpoint or failover) has been handled by all compares. Will this satisfy your requirement? Thanks, Hailiang Thanks Zhang Chen Cc: Jason Wang Signed-off-by: zhanghailiang Signed-off-by: Zhang Chen --- net/colo-compare.c | 72 ++ net/colo-compare.h | 20 +++ 2 files changed, 92 insertions(+) create mode 100644 net/colo-compare.h
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/22 16:07, Jason Wang wrote: On 2017年02月22日 11:46, zhanghailiang wrote: After a net connection is closed, we didn't clear its releated resources in connection_track_table, which will lead to memory leak. Not a real leak but would lead reset of hash table if too many closed connections. Yes, you are right, there will be lots of stale connection data in hash table if we don't remove it while it is been closed. Which Let't track the state of net connection, if it is closed, its related resources will be cleared up. The issue is the state were tracked partially, do we need a full state machine here? Not, IMHO, we only care about the last state of it, because, we will do nothing even if we track the intermedial states. Signed-off-by: zhanghailiang--- net/colo.h| 4 +++ net/filter-rewriter.c | 70 +-- 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/net/colo.h b/net/colo.h index 7c524f3..cd9027f 100644 --- a/net/colo.h +++ b/net/colo.h @@ -18,6 +18,7 @@ #include "slirp/slirp.h" #include "qemu/jhash.h" #include "qemu/timer.h" +#include "slirp/tcp.h" #define HASHTABLE_MAX_SIZE 16384 @@ -69,6 +70,9 @@ typedef struct Connection { * run once in independent tcp connection */ int syn_flag; + +int tcp_state; /* TCP FSM state */ +tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */ } Connection; uint32_t connection_key_hash(const void *opaque); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index c4ab91c..7e7ec35 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt) } /* handle tcp packet from primary guest */ -static int handle_primary_tcp_pkt(NetFilterState *nf, +static int handle_primary_tcp_pkt(RewriterState *rf, Connection *conn, - Packet *pkt) + Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -97,15 +97,45 @@ static int handle_primary_tcp_pkt(NetFilterState *nf, tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); net_checksum_calculate((uint8_t 
*)pkt->data, pkt->size); +/* + * Case 1: + * The *server* side of this connect is VM, *client* tries to close + * the connection. + * + * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1' + * packet from server side. From this point, we can ensure that there + * will be no packets in the connection, except that, some errors + * happen between the path of 'filter object' and vNIC, if this rare + * case really happen, we can still create a new connection, + * So it is safe to remove the connection from connection_track_table. + * + */ +if ((conn->tcp_state == TCPS_LAST_ACK) && +(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { +fprintf(stderr, "Remove conn " Can this even compile? Oops, i forgot to remove it, will remove it in next version. +g_hash_table_remove(rf->connection_track_table, key); +} +} +/* + * Case 2: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * We got 'fin=1, ack=1' packet from client side, we need to + * record the seq of 'fin=1, ack=1' packet. + */ +if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) { +conn->fin_ack_seq = htonl(tcp_pkt->th_seq); +conn->tcp_state = TCPS_LAST_ACK; } return 0; } /* handle tcp packet from secondary guest */ -static int handle_secondary_tcp_pkt(NetFilterState *nf, +static int handle_secondary_tcp_pkt(RewriterState *rf, Connection *conn, -Packet *pkt) +Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -133,8 +163,34 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf, tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +/* + * Case 2: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * We got 'ack=1' packets from server side, it acks 'fin=1, ack=1' + * packet from client side. 
Like Case 1, there should be no packets + * in the connection from now on, but the difference here is + * that if the packet is lost, we will get the resent 'fin=1,ack=1' packet. + * TODO: fix the above case. + */ +if ((conn->tcp_state == TCPS_LAST_ACK) && +(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { +
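The teardown logic in this patch boils down to two transitions: remember the sequence number of the peer's 'fin=1,ack=1' segment and enter LAST_ACK, then drop the connection from the track table when the final ACK (acknowledging fin_ack_seq + 1) is seen. A minimal standalone sketch of that check, assuming a simplified Connection struct and illustrative TCPS_* values rather than the real definitions from slirp/tcp.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative state values; the real patch uses TCPS_* from slirp/tcp.h. */
enum { TCPS_ESTABLISHED = 4, TCPS_LAST_ACK = 8 };

typedef struct Connection {
    int tcp_state;        /* last observed TCP FSM state */
    uint32_t fin_ack_seq; /* seq of the peer's 'fin=1,ack=1' segment */
} Connection;

/* One side sent FIN|ACK: remember its sequence number and mark LAST_ACK. */
static void conn_saw_fin_ack(Connection *conn, uint32_t seq)
{
    conn->fin_ack_seq = seq;
    conn->tcp_state = TCPS_LAST_ACK;
}

/* The closing handshake's final ACK acknowledges fin_ack_seq + 1; once it
 * is seen, the connection can be removed from the track table. */
static bool conn_can_remove(const Connection *conn, uint32_t ack)
{
    return conn->tcp_state == TCPS_LAST_ACK &&
           ack == conn->fin_ack_seq + 1;
}
```

This is a model of the patch's intent, not the rewriter code itself; the real code additionally converts wire byte order with ntohl() before comparing.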
Re: [Qemu-devel] [PATCH] net/colo-compare: Fix memory free error
On 2017/2/21 10:44, Zhang Chen wrote: We use g_queue_init() to init s->conn_list, so we should use g_queue_clear() to instead of g_queue_free(). Signed-off-by: Zhang ChenReviewed-by: zhanghailiang --- net/colo-compare.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 162fd6a..cf8c4c9 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -715,7 +715,7 @@ static void colo_compare_finalize(Object *obj) qemu_chr_fe_deinit(>chr_sec_in); qemu_chr_fe_deinit(>chr_out); -g_queue_free(>conn_list); +g_queue_clear(>conn_list); if (qemu_thread_is_self(>thread)) { /* compare connection */
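The underlying rule: g_queue_init() sets up a GQueue embedded in caller-owned storage, so its counterpart is g_queue_clear(), which drops the elements but leaves the struct alone; g_queue_free() additionally frees the GQueue itself and is valid only for queues obtained from g_queue_new(). A hand-rolled sketch of that ownership split (the names mirror the GLib API, but this is not GLib itself):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for GQueue, just enough to show the init/clear vs new/free
 * ownership rule the patch above relies on. */
typedef struct Queue { void **items; size_t len; } Queue;

static void queue_init(Queue *q)  { q->items = NULL; q->len = 0; }

/* Drop all elements but leave the Queue struct itself untouched. */
static void queue_clear(Queue *q) { free(q->items); queue_init(q); }

static Queue *queue_new(void)
{
    Queue *q = malloc(sizeof(*q));
    queue_init(q);
    return q;
}

/* Frees the Queue struct too -- calling this on an embedded queue, as
 * colo_compare_finalize() did before the fix, frees memory the queue
 * does not own. */
static void queue_free(Queue *q) { queue_clear(q); free(q); }
```

s->conn_list is embedded in CompareState and initialized with g_queue_init(), so g_queue_clear() is the matching teardown.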
Re: [Qemu-devel] [PATCH V7 2/2] Add a new qmp command to do checkpoint, query xen replication status
On 2017/2/8 13:24, Zhang Chen wrote: We can call this qmp command to do checkpoint outside of qemu. Xen colo will need this function. Signed-off-by: Zhang ChenSigned-off-by: Wen Congyang --- migration/colo.c | 17 qapi-schema.json | 60 2 files changed, 77 insertions(+) diff --git a/migration/colo.c b/migration/colo.c index 6fc2ade..2f98a33 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -127,6 +127,23 @@ void qmp_xen_set_replication(bool enable, bool primary, } } +ReplicationResult *qmp_query_xen_replication_status(Error **errp) +{ +Error *err = NULL; +ReplicationResult *result = g_new0(ReplicationResult, 1); Indent, line break. +replication_get_error_all(); +result->status = err ? + REPLICATION_STATUS_ERROR : + REPLICATION_STATUS_NORMAL; +error_free(err); +return result; +} + +void qmp_xen_do_checkpoint(Error **errp) +{ +replication_do_checkpoint_all(errp); +} + static void colo_send_message(QEMUFile *f, COLOMessage msg, Error **errp) { diff --git a/qapi-schema.json b/qapi-schema.json index 9445b93..719744a 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -5931,6 +5931,66 @@ 'data': { 'enable': 'bool', 'primary': 'bool', '*failover' : 'bool' } } ## +# @ReplicationStatus: +# +# Describe the status of replication. +# +# @error: Replication has an error. +# +# @normal: Replication is running normally. +# +# Since: 2.9 +## +{ 'enum': 'ReplicationStatus', + 'data': [ 'error', 'normal' ] } + +## +# @ReplicationResult: +# +# The result format for 'query-xen-replication-status'. +# +# @status: enum of @ReplicationStatus, which shows current +# replication error status +# +# Since: 2.9 +## +{ 'struct': 'ReplicationResult', + 'data': { 'status': 'ReplicationStatus'} } ^ Space + +## +# @query-xen-replication-status: +# +# Query replication status while the vm is running. +# +# Returns: A @ReplicationResult objects showing the status. 
+# +# Example: +# +# -> { "execute": "query-xen-replication-status" } +# <- { "return": { "status": "normal" } } +# +# Since: 2.9 +## +{ 'command': 'query-xen-replication-status', + 'returns': 'ReplicationResult' } + +## +# @xen-do-checkpoint: +# Maybe use the name 'xen-checkpoint-notify' or 'xen-colo-checkpoint-notify' ? +# Xen uses this command to notify replication to trigger a checkpoint. +# +# Returns: nothing. +# +# Example: +# +# -> { "execute": "xen-do-checkpoint" } +# <- { "return": {} } +# +# Since: 2.9 +## +{ 'command': 'xen-do-checkpoint' } + +## # @GICCapability: # # The struct describes capability for a specific GIC (Generic
Re: [Qemu-devel] [PATCH V7 1/2] Add a new qmp command to start/stop replication
On 2017/2/8 13:24, Zhang Chen wrote: We can call this qmp command to start/stop replication outside of qemu. Like Xen colo need this function. Signed-off-by: Zhang ChenSigned-off-by: Wen Congyang Reviewed-by: Eric Blake Reviewed-by: Stefano Stabellini Reviewed-by: zhanghailiang --- migration/colo.c | 23 +++ qapi-schema.json | 25 + 2 files changed, 48 insertions(+) diff --git a/migration/colo.c b/migration/colo.c index 93c85c5..6fc2ade 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -19,6 +19,8 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "migration/failover.h" +#include "replication.h" +#include "qmp-commands.h" #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) @@ -104,6 +106,27 @@ void colo_do_failover(MigrationState *s) } } +void qmp_xen_set_replication(bool enable, bool primary, + bool has_failover, bool failover, + Error **errp) +{ +ReplicationMode mode = primary ? + REPLICATION_MODE_PRIMARY : + REPLICATION_MODE_SECONDARY; + +if (has_failover && enable) { +error_setg(errp, "Parameter 'failover' is only for" + " stopping replication"); +return; +} + +if (enable) { +replication_start_all(mode, errp); +} else { +replication_stop_all(failover, failover ? NULL : errp); +} +} + static void colo_send_message(QEMUFile *f, COLOMessage msg, Error **errp) { diff --git a/qapi-schema.json b/qapi-schema.json index cbdffdd..9445b93 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -5906,6 +5906,31 @@ { 'command': 'xen-load-devices-state', 'data': {'filename': 'str'} } ## +# @xen-set-replication: +# +# Enable or disable replication. +# +# @enable: true to enable, false to disable. +# +# @primary: true for primary or false for secondary. +# +# @failover: #optional true to do failover, false to stop. but cannot be +#specified if 'enable' is true. default value is false. +# +# Returns: nothing. 
+# +# Example: +# +# -> { "execute": "xen-set-replication", +# "arguments": {"enable": true, "primary": false} } +# <- { "return": {} } +# +# Since: 2.9 +## +{ 'command': 'xen-set-replication', + 'data': { 'enable': 'bool', 'primary': 'bool', '*failover' : 'bool' } } + +## # @GICCapability: # # The struct describes capability for a specific GIC (Generic
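The validation in qmp_xen_set_replication() rejects 'failover' together with enable=true, since failover only makes sense when stopping replication. A simplified model of that argument check, with a bool return standing in for the Error **errp out-parameter and a flag standing in for replication_start_all()/replication_stop_all():

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of qmp_xen_set_replication()'s argument handling. Returns
 * false where the real code sets errp. */
static bool xen_set_replication(bool enable, bool has_failover,
                                bool failover, bool *running)
{
    if (has_failover && enable) {
        /* "Parameter 'failover' is only for stopping replication" */
        return false;
    }
    if (enable) {
        *running = true;        /* replication_start_all(mode, errp) */
    } else {
        *running = false;       /* replication_stop_all(failover, ...) */
        (void)failover;
    }
    return true;
}
```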
Re: [Qemu-devel] [PATCH 1/3] net/colo: fix memory double free error
On 2017/2/21 10:25, Zhang Chen wrote: On 02/20/2017 04:01 PM, zhanghailiang wrote: The 'primary_list' and 'secondary_list' members of struct Connection are not dynamically allocated through g_queue_new(), but we free them using g_queue_free(), which will lead to a double-free bug. Signed-off-by: zhanghailiang --- net/colo.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/colo.c b/net/colo.c index 6a6eacd..7d5c423 100644 --- a/net/colo.c +++ b/net/colo.c @@ -147,9 +147,7 @@ void connection_destroy(void *opaque) Connection *conn = opaque; g_queue_foreach(&conn->primary_list, packet_destroy, NULL); -g_queue_free(&conn->primary_list); g_queue_foreach(&conn->secondary_list, packet_destroy, NULL); -g_queue_free(&conn->secondary_list); I think we need to use g_queue_clear() here. Ah, you are right, my original modification would introduce a memory leak. Will fix in the next version. void g_queue_clear (GQueue *queue); Removes all the elements in queue. If queue elements contain dynamically-allocated memory, they should be freed first. Thanks Zhang Chen g_slice_free(Connection, conn); }
Re: [Qemu-devel] [PATCH v2 3/4] char: remove the right fd been watched in qemu_chr_fe_set_handlers()
On 2017/2/16 21:04, Marc-André Lureau wrote: Hi On Thu, Feb 16, 2017 at 4:49 PM Hailiang Zhang < zhang.zhanghaili...@huawei.com> wrote: Hi, On 2017/2/16 18:40, Marc-André Lureau wrote: Hi On Thu, Feb 16, 2017 at 10:08 AM zhanghailiang < zhang.zhanghaili...@huawei.com> wrote: We can call qemu_chr_fe_set_handlers() to add/remove fd been watched in 'context' which can be either default main context or other explicit context. But the original logic is not correct, we didn't remove the right fd because we call g_main_context_find_source_by_id(NULL, tag) which always try to find the Gsource from default context. Fix it by passing the right context to g_main_context_find_source_by_id(). Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Marc-André Lureau <marcandre.lur...@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> --- chardev/char-io.c | 13 + chardev/char-io.h | 2 ++ chardev/char.c| 2 +- 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/chardev/char-io.c b/chardev/char-io.c index 7dfc3f2..a69cc61 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -127,14 +127,14 @@ guint io_add_watch_poll(Chardev *chr, return tag; } -static void io_remove_watch_poll(guint tag) +static void io_remove_watch_poll(guint tag, GMainContext *context) { GSource *source; IOWatchPoll *iwp; g_return_if_fail(tag > 0); -source = g_main_context_find_source_by_id(NULL, tag); +source = g_main_context_find_source_by_id(context, tag); g_return_if_fail(source != NULL); iwp = io_watch_poll_from_source(source); @@ -146,14 +146,19 @@ static void io_remove_watch_poll(guint tag) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr) +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context) { if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag); +io_remove_watch_poll(chr->fd_in_tag, context); chr->fd_in_tag = 0; } } +void remove_fd_in_watch(Chardev *chr) +{ +qemu_remove_fd_in_watch(chr, NULL); +} + I would just change the signature of 
remove_fd_in_watch() instead of introducing a function. In that case, i have to modify all the places call this helper, About eleven places in six files,is it acceptable ? Yes, that's fine. OK, will fix it in v3, thanks. Thanks. int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, int *fds, size_t nfds) diff --git a/chardev/char-io.h b/chardev/char-io.h index d7ae5f1..117c888 100644 --- a/chardev/char-io.h +++ b/chardev/char-io.h @@ -38,6 +38,8 @@ guint io_add_watch_poll(Chardev *chr, void remove_fd_in_watch(Chardev *chr); +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context); + int io_channel_send(QIOChannel *ioc, const void *buf, size_t len); int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, diff --git a/chardev/char.c b/chardev/char.c index abd525f..5563375 100644 --- a/chardev/char.c +++ b/chardev/char.c @@ -560,7 +560,7 @@ void qemu_chr_fe_set_handlers(CharBackend *b, cc = CHARDEV_GET_CLASS(s); if (!opaque && !fd_can_read && !fd_read && !fd_event) { fe_open = 0; -remove_fd_in_watch(s); +qemu_remove_fd_in_watch(s, context); } else { fe_open = 1; } -- 1.8.3.1 otherwise, looks good to me -- Marc-André Lureau
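The bug being fixed: GLib source IDs are scoped to the GMainContext the source was attached to, so g_main_context_find_source_by_id(NULL, tag) searches only the default context and misses sources attached elsewhere. A minimal model of per-context ID lookup (not GLib itself) showing why the context must be passed through:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: each context keeps its own table of sources, and a NULL
 * context argument means "the default context" -- mirroring GLib's
 * g_main_context_find_source_by_id() semantics. */
typedef struct Source { unsigned tag; } Source;
typedef struct Context { Source *sources[8]; } Context;

static Context default_ctx;   /* zero-initialized, empty */

static Source *find_source_by_id(Context *ctx, unsigned tag)
{
    if (ctx == NULL) {
        ctx = &default_ctx;   /* old code always ended up here */
    }
    for (size_t i = 0; i < 8; i++) {
        if (ctx->sources[i] && ctx->sources[i]->tag == tag) {
            return ctx->sources[i];
        }
    }
    return NULL;              /* the tag lives in some other context */
}
```

A watch attached to a worker context (as colo-compare does) is simply invisible to a NULL-context lookup, so the old io_remove_watch_poll() could never destroy it.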
Re: [Qemu-devel] [PATCH v2 3/4] char: remove the right fd been watched in qemu_chr_fe_set_handlers()
Hi, On 2017/2/16 18:40, Marc-André Lureau wrote: Hi On Thu, Feb 16, 2017 at 10:08 AM zhanghailiang < zhang.zhanghaili...@huawei.com> wrote: We can call qemu_chr_fe_set_handlers() to add/remove fd been watched in 'context' which can be either default main context or other explicit context. But the original logic is not correct, we didn't remove the right fd because we call g_main_context_find_source_by_id(NULL, tag) which always try to find the Gsource from default context. Fix it by passing the right context to g_main_context_find_source_by_id(). Cc: Paolo BonziniCc: Marc-André Lureau Signed-off-by: zhanghailiang --- chardev/char-io.c | 13 + chardev/char-io.h | 2 ++ chardev/char.c| 2 +- 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/chardev/char-io.c b/chardev/char-io.c index 7dfc3f2..a69cc61 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -127,14 +127,14 @@ guint io_add_watch_poll(Chardev *chr, return tag; } -static void io_remove_watch_poll(guint tag) +static void io_remove_watch_poll(guint tag, GMainContext *context) { GSource *source; IOWatchPoll *iwp; g_return_if_fail(tag > 0); -source = g_main_context_find_source_by_id(NULL, tag); +source = g_main_context_find_source_by_id(context, tag); g_return_if_fail(source != NULL); iwp = io_watch_poll_from_source(source); @@ -146,14 +146,19 @@ static void io_remove_watch_poll(guint tag) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr) +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context) { if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag); +io_remove_watch_poll(chr->fd_in_tag, context); chr->fd_in_tag = 0; } } +void remove_fd_in_watch(Chardev *chr) +{ +qemu_remove_fd_in_watch(chr, NULL); +} + I would just change the signature of remove_fd_in_watch() instead of introducing a function. In that case, i have to modify all the places call this helper, About eleven places in six files,is it acceptable ? Thanks. 
int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, int *fds, size_t nfds) diff --git a/chardev/char-io.h b/chardev/char-io.h index d7ae5f1..117c888 100644 --- a/chardev/char-io.h +++ b/chardev/char-io.h @@ -38,6 +38,8 @@ guint io_add_watch_poll(Chardev *chr, void remove_fd_in_watch(Chardev *chr); +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context); + int io_channel_send(QIOChannel *ioc, const void *buf, size_t len); int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, diff --git a/chardev/char.c b/chardev/char.c index abd525f..5563375 100644 --- a/chardev/char.c +++ b/chardev/char.c @@ -560,7 +560,7 @@ void qemu_chr_fe_set_handlers(CharBackend *b, cc = CHARDEV_GET_CLASS(s); if (!opaque && !fd_can_read && !fd_read && !fd_event) { fe_open = 0; -remove_fd_in_watch(s); +qemu_remove_fd_in_watch(s, context); } else { fe_open = 1; } -- 1.8.3.1 otherwise, looks good to me
Re: [Qemu-devel] [PATCH 3/5] colo-compare: release all unhandled packets in finalize function
On 2017/2/16 10:27, Zhang Chen wrote: On 02/15/2017 04:34 PM, zhanghailiang wrote: We should release all unhandled packets before finalize colo compare. Besides, we need to free connection_track_table, or there will be a memory leak bug. Signed-off-by: zhanghailiang--- net/colo-compare.c | 20 1 file changed, 20 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index a16e2d5..809bad3 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -676,6 +676,23 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) return; } This function in my patch "colo-compare and filter-rewriter work with colo-frame " Named 'colo_flush_connection', I think use 'flush' instead of 'release' is better, OK, i will fix it in next version, thanks. Thanks Zhang Chen +static void colo_release_packets(void *opaque, void *user_data) +{ +CompareState *s = user_data; +Connection *conn = opaque; +Packet *pkt = NULL; + +while (!g_queue_is_empty(>primary_list)) { +pkt = g_queue_pop_head(>primary_list); +compare_chr_send(>chr_out, pkt->data, pkt->size); +packet_destroy(pkt, NULL); +} +while (!g_queue_is_empty(>secondary_list)) { +pkt = g_queue_pop_head(>secondary_list); +packet_destroy(pkt, NULL); +} +} + static void colo_compare_class_init(ObjectClass *oc, void *data) { UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc); @@ -707,9 +724,12 @@ static void colo_compare_finalize(Object *obj) g_main_loop_quit(s->compare_loop); qemu_thread_join(>thread); +/* Release all unhandled packets after compare thead exited */ +g_queue_foreach(>conn_list, colo_release_packets, s); g_queue_free(>conn_list); +g_hash_table_destroy(s->connection_track_table); g_free(s->pri_indev); g_free(s->sec_indev); g_free(s->outdev);
Re: [Qemu-devel] [PATCH 2/5] colo-compare: kick compare thread to exit while finalize
On 2017/2/16 10:25, Zhang Chen wrote: On 02/15/2017 04:34 PM, zhanghailiang wrote: We should call g_main_loop_quit() to notify colo compare thread to exit, Or it will run in g_main_loop_run() forever. Besides, the finalizing process can't happen in context of colo thread, it is reasonable to remove the 'if (qemu_thread_is_self(>thread))' branch. Signed-off-by: zhanghailiang--- net/colo-compare.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index fdde788..a16e2d5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -83,6 +83,8 @@ typedef struct CompareState { GHashTable *connection_track_table; /* compare thread, a thread for each NIC */ QemuThread thread; + +GMainLoop *compare_loop; } CompareState; typedef struct CompareClass { @@ -496,7 +498,6 @@ static gboolean check_old_packet_regular(void *opaque) static void *colo_compare_thread(void *opaque) { GMainContext *worker_context; -GMainLoop *compare_loop; CompareState *s = opaque; GSource *timeout_source; @@ -507,7 +508,7 @@ static void *colo_compare_thread(void *opaque) qemu_chr_fe_set_handlers(>chr_sec_in, compare_chr_can_read, compare_sec_chr_in, NULL, s, worker_context, true); -compare_loop = g_main_loop_new(worker_context, FALSE); +s->compare_loop = g_main_loop_new(worker_context, FALSE); /* To kick any packets that the secondary doesn't match */ timeout_source = g_timeout_source_new(REGULAR_PACKET_CHECK_MS); @@ -515,10 +516,10 @@ static void *colo_compare_thread(void *opaque) (GSourceFunc)check_old_packet_regular, s, NULL); g_source_attach(timeout_source, worker_context); -g_main_loop_run(compare_loop); +g_main_loop_run(s->compare_loop); g_source_unref(timeout_source); -g_main_loop_unref(compare_loop); +g_main_loop_unref(s->compare_loop); g_main_context_unref(worker_context); return NULL; } @@ -703,13 +704,11 @@ static void colo_compare_finalize(Object *obj) qemu_chr_fe_deinit(>chr_sec_in); qemu_chr_fe_deinit(>chr_out); 
-g_queue_free(>conn_list); +g_main_loop_quit(s->compare_loop); +qemu_thread_join(>thread); -if (qemu_thread_is_self(>thread)) { -/* compare connection */ -g_queue_foreach(>conn_list, colo_compare_connection, s); -qemu_thread_join(>thread); -} Before free the 's->conn_list', you should flush all queued primary packets and release all queued secondary packets here, so combine this patch with 3/5 patch as one patch is a better choose. Make sense, will fix it in next version, thanks. Thanks Zhang Chen + +g_queue_free(>conn_list); g_free(s->pri_indev); g_free(s->sec_indev);
Re: [Qemu-devel] [PATCH 3/5] colo-compare: release all unhandled packets in finalize function
On 2017/2/16 10:34, Jason Wang wrote: On 2017年02月15日 16:34, zhanghailiang wrote: We should release all unhandled packets before finalize colo compare. Besides, we need to free connection_track_table, or there will be a memory leak bug. Signed-off-by: zhanghailiang--- net/colo-compare.c | 20 1 file changed, 20 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index a16e2d5..809bad3 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -676,6 +676,23 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) return; } +static void colo_release_packets(void *opaque, void *user_data) +{ +CompareState *s = user_data; +Connection *conn = opaque; +Packet *pkt = NULL; + +while (!g_queue_is_empty(>primary_list)) { +pkt = g_queue_pop_head(>primary_list); +compare_chr_send(>chr_out, pkt->data, pkt->size); Any reason to send packets here? Yes, considering the usage case which we shut COLO for the VM to make it as a normal VM without FT. We need to remove all the filter objects. In this case, IMHO, it is necessary to release the unhandled packets. Thanks. Thanks +packet_destroy(pkt, NULL); +} +while (!g_queue_is_empty(>secondary_list)) { +pkt = g_queue_pop_head(>secondary_list); +packet_destroy(pkt, NULL); +} +} + static void colo_compare_class_init(ObjectClass *oc, void *data) { UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc); @@ -707,9 +724,12 @@ static void colo_compare_finalize(Object *obj) g_main_loop_quit(s->compare_loop); qemu_thread_join(>thread); +/* Release all unhandled packets after compare thead exited */ +g_queue_foreach(>conn_list, colo_release_packets, s); g_queue_free(>conn_list); +g_hash_table_destroy(s->connection_track_table); g_free(s->pri_indev); g_free(s->sec_indev); g_free(s->outdev); .
Re: [Qemu-devel] [PATCH 0/3] COLO: fix some bugs
On 2017/2/10 23:44, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: Hi, This series fixes three bugs of COLO. Patch 1 fixes a usage case where users want to change checkpoint-delay after an extremely big value was set before; the new value may not take effect until a long sleep finishes. Patches 2 and 3 are old patches split from a previous version a long time ago, and have been reviewed by Dave. I'd like to pick these three patches from the later COLO series, which will include some optimization and integration with block replication and the COLO net proxy. Please review, thanks. Queued. I've added the '(Since 2.9)' that Eric asked for. Thank you very much. By the way, as we discussed in patch 1, should I add the code in patch 1 which kicks the colo_checkpoint_sem to speed up the colo thread's exit? In the next version? Or post a new patch? Hailiang Dave zhanghailiang (3): COLO: fix setting checkpoint-delay not working properly COLO: Shutdown related socket fd while do failover COLO: Don't process failover request while loading VM's state include/migration/colo.h | 2 + include/migration/migration.h | 8 migration/colo.c | 102 +- migration/migration.c | 3 ++ qapi-schema.json | 4 +- 5 files changed, 108 insertions(+), 11 deletions(-) -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
On 2017/2/9 3:53, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: On 2017/1/18 19:01, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: If the net connection between primary host and secondary host breaks while COLO/COLO incoming threads are doing read() or write(). It will block until connection is timeout, and the failover process will be blocked because of it. So it is necessary to shutdown all the socket fds used by COLO to avoid this situation. Besides, we should close the corresponding file descriptors after failvoer BH shutdown them, Or there will be an error. Hi, There are two parts to this patch: a) Add some semaphores to sequence failover b) Use shutdown() At first I wondered if perhaps they should be split; but I see the reason for the semaphores is mostly to stop the race between the fd's getting closed and the shutdown() calls; so I think it's OK. Hi, Yes, you are right, maybe i should add some comments about that. Will do in next version. Do you have any problems with these semaphores during powering off the guest? No, we didn't encounter any problems or trigger any bugs in our test with this semaphores. In what places do you doubt it may has problems ? :) I just wondered about other exit cases other than failover; e.g. what if the guest shutdown or something like that, would it get stuck waiting for the colo_incoming_sem. Sorry for the late reply, too busy with our project these days ... Your worry makes sense, we have processed the shutdown case specially, Let qemu does a checkpoint process (It seems that, we should kick colo thread which might be waiting for colo_checkpoint_sem.) before exit colo therad. And for the secondary sides, if it receives shutdown request, it will exit colo incoming thread after some cleanup works. Thanks. 
Hailiang Dave Thanks, Hailiang Dave Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> --- include/migration/migration.h | 3 +++ migration/colo.c | 43 +++ 2 files changed, 46 insertions(+) diff --git a/include/migration/migration.h b/include/migration/migration.h index 487ac1e..7cac877 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -113,6 +113,7 @@ struct MigrationIncomingState { QemuThread colo_incoming_thread; /* The coroutine we should enter (back) after failover */ Coroutine *migration_incoming_co; +QemuSemaphore colo_incoming_sem; /* See savevm.c */ LoadStateEntry_Head loadvm_handlers; @@ -182,6 +183,8 @@ struct MigrationState QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; +/* The semaphore is used to notify COLO thread that failover is finished */ +QemuSemaphore colo_exit_sem; /* The semaphore is used to notify COLO thread to do checkpoint */ QemuSemaphore colo_checkpoint_sem; diff --git a/migration/colo.c b/migration/colo.c index 08b2e46..3222812 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void) /* recover runstate to normal migration finish state */ autostart = true; } +/* + * Make sure COLO incoming thread not block in recv or send, + * If mis->from_src_file and mis->to_src_file use the same fd, + * The second shutdown() will return -1, we ignore this value, + * It is harmless. 
+ */ +if (mis->from_src_file) { +qemu_file_shutdown(mis->from_src_file); +} +if (mis->to_src_file) { +qemu_file_shutdown(mis->to_src_file); +} old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void) "secondary VM", FailoverStatus_lookup[old_state]); return; } +/* Notify COLO incoming thread that failover work is finished */ +qemu_sem_post(>colo_incoming_sem); /* For Secondary VM, jump to incoming co */ if (mis->migration_incoming_co) { qemu_coroutine_enter(mis->migration_incoming_co); @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void) migrate_set_state(>state, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); +/* + * Wake up COLO thread which may blocked in recv() or send(), + * The s->rp_state.from_dst_file and s->to_dst_file may use the + * same fd, but we still shutdown the fd for twice, it is harmless. + */
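The reason the patch calls qemu_file_shutdown() (which wraps shutdown()) rather than relying on close() alone: shutdown() makes any recv()/send() already blocked on the fd return immediately (recv sees EOF), while simply closing the fd from another thread would leave the peer blocked until the connection times out. The effect can be demonstrated on a socketpair:

```c
#include <sys/socket.h>
#include <unistd.h>

/* After shutdown(SHUT_RDWR), a recv() on the same socket returns 0
 * (EOF) instead of blocking -- the property COLO failover relies on
 * to unblock its migration threads. Returns the recv() result. */
static long recv_after_shutdown(void)
{
    int sv[2];
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    shutdown(sv[0], SHUT_RDWR);                 /* unblock both directions */
    long n = recv(sv[0], buf, sizeof(buf), 0);  /* immediate EOF */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

This also motivates the semaphores in the patch: the fds must not be close()d until the threads woken by shutdown() have finished with them, hence colo_exit_sem / colo_incoming_sem sequencing the teardown.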
Re: [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly
On 2017/2/8 18:38, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: If we set checkpoint-delay through command 'migrate-set-parameters', It will not take effect until we finish last sleep chekpoint-delay, That's will be offensive espeically when we want to change its value from an extreme big one to a proper value. Fix it by using timer to realize checkpoint-delay. Yes, I think this works; you only kick the timer in COLO state, and you create the timer before going into COLO state and clean it up after we leave, so it feels safe. Are you also going to kick this semaphore when doing a failover to cause this thread to exit quickly? Ha, good suggestion! I think it will work. Accepted, thanks very much. Signed-off-by: zhanghailiangReviewed-by: Dr. David Alan Gilbert Dave --- include/migration/colo.h | 2 ++ include/migration/migration.h | 5 + migration/colo.c | 33 +++-- migration/migration.c | 3 +++ 4 files changed, 33 insertions(+), 10 deletions(-) diff --git a/include/migration/colo.h b/include/migration/colo.h index e32eef4..2bbff9e 100644 --- a/include/migration/colo.h +++ b/include/migration/colo.h @@ -35,4 +35,6 @@ COLOMode get_colo_mode(void); /* failover */ void colo_do_failover(MigrationState *s); + +void colo_checkpoint_notify(void *opaque); #endif diff --git a/include/migration/migration.h b/include/migration/migration.h index c309d23..487ac1e 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -183,6 +183,11 @@ struct MigrationState /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; +/* The semaphore is used to notify COLO thread to do checkpoint */ +QemuSemaphore colo_checkpoint_sem; +int64_t colo_checkpoint_time; +QEMUTimer *colo_delay_timer; + /* The last error that occurred */ Error *error; }; diff --git a/migration/colo.c b/migration/colo.c index 93c85c5..08b2e46 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -302,7 +302,7 @@ static void 
colo_process_checkpoint(MigrationState *s) { QIOChannelBuffer *bioc; QEMUFile *fb = NULL; -int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); Error *local_err = NULL; int ret; @@ -332,26 +332,21 @@ static void colo_process_checkpoint(MigrationState *s) qemu_mutex_unlock_iothread(); trace_colo_vm_state_change("stop", "run"); +timer_mod(s->colo_delay_timer, +current_time + s->parameters.x_checkpoint_delay); + while (s->state == MIGRATION_STATUS_COLO) { if (failover_get_state() != FAILOVER_STATUS_NONE) { error_report("failover request"); goto out; } -current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); -if (current_time - checkpoint_time < -s->parameters.x_checkpoint_delay) { -int64_t delay_ms; +qemu_sem_wait(>colo_checkpoint_sem); -delay_ms = s->parameters.x_checkpoint_delay - - (current_time - checkpoint_time); -g_usleep(delay_ms * 1000); -} ret = colo_do_checkpoint_transaction(s, bioc, fb); if (ret < 0) { goto out; } -checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); } out: @@ -364,14 +359,32 @@ out: qemu_fclose(fb); } +timer_del(s->colo_delay_timer); + if (s->rp_state.from_dst_file) { qemu_fclose(s->rp_state.from_dst_file); } } +void colo_checkpoint_notify(void *opaque) +{ +MigrationState *s = opaque; +int64_t next_notify_time; + +qemu_sem_post(>colo_checkpoint_sem); +s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +next_notify_time = s->colo_checkpoint_time + +s->parameters.x_checkpoint_delay; +timer_mod(s->colo_delay_timer, next_notify_time); +} + void migrate_start_colo_process(MigrationState *s) { qemu_mutex_unlock_iothread(); +qemu_sem_init(>colo_checkpoint_sem, 0); +s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST, +colo_checkpoint_notify, s); + migrate_set_state(>state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); colo_process_checkpoint(s); diff --git a/migration/migration.c b/migration/migration.c index f498ab8..a4861da 100644 --- 
a/migration/migration.c +++ b/migration/migration.c @@ -895,6 +895,9 @@ void qmp_migrate_set_parameters(MigrationParameters *params, Error **errp) if (params->has_x_checkpoint_delay) { s->parameters.x_checkpoint_delay = params->x_checkpoint_delay; +if (migration_in_colo_state()) { +colo_checkpoint_notify(s); +} } } --
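The fix can be reduced to deadline arithmetic: the old loop computed its sleep once, from the delay in force when the last checkpoint ended, so shrinking x_checkpoint_delay mid-sleep had no effect; the timer-based version re-arms the deadline from "now" whenever the parameter changes (colo_checkpoint_notify calling timer_mod). A toy model of the two behaviours:

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the wakeup time is fixed when the sleep starts and
 * ignores later parameter changes. */
static int64_t old_wakeup(int64_t checkpoint_time, int64_t delay_at_sleep)
{
    return checkpoint_time + delay_at_sleep;  /* fixed until it expires */
}

/* New behaviour: every parameter change recomputes the deadline from
 * the current time, as timer_mod() does in the patch. */
static int64_t new_wakeup(int64_t now, int64_t current_delay)
{
    return now + current_delay;               /* re-armed on every change */
}
```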
Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
On 2017/1/18 19:01, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: If the net connection between primary host and secondary host breaks while COLO/COLO incoming threads are doing read() or write(). It will block until connection is timeout, and the failover process will be blocked because of it. So it is necessary to shutdown all the socket fds used by COLO to avoid this situation. Besides, we should close the corresponding file descriptors after failvoer BH shutdown them, Or there will be an error. Hi, There are two parts to this patch: a) Add some semaphores to sequence failover b) Use shutdown() At first I wondered if perhaps they should be split; but I see the reason for the semaphores is mostly to stop the race between the fd's getting closed and the shutdown() calls; so I think it's OK. Hi, Yes, you are right, maybe i should add some comments about that. Will do in next version. Do you have any problems with these semaphores during powering off the guest? No, we didn't encounter any problems or trigger any bugs in our test with this semaphores. In what places do you doubt it may has problems ? :) Thanks, Hailiang Dave Signed-off-by: zhanghailiangSigned-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert Cc: Dr. 
David Alan Gilbert --- include/migration/migration.h | 3 +++ migration/colo.c | 43 +++ 2 files changed, 46 insertions(+) diff --git a/include/migration/migration.h b/include/migration/migration.h index 487ac1e..7cac877 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -113,6 +113,7 @@ struct MigrationIncomingState { QemuThread colo_incoming_thread; /* The coroutine we should enter (back) after failover */ Coroutine *migration_incoming_co; +QemuSemaphore colo_incoming_sem; /* See savevm.c */ LoadStateEntry_Head loadvm_handlers; @@ -182,6 +183,8 @@ struct MigrationState QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; +/* The semaphore is used to notify the COLO thread that failover is finished */ +QemuSemaphore colo_exit_sem; /* The semaphore is used to notify the COLO thread to do a checkpoint */ QemuSemaphore colo_checkpoint_sem; diff --git a/migration/colo.c b/migration/colo.c index 08b2e46..3222812 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void) /* recover runstate to normal migration finish state */ autostart = true; } +/* + * Make sure the COLO incoming thread does not block in recv or send. + * If mis->from_src_file and mis->to_src_file use the same fd, + * the second shutdown() will return -1; we ignore this value, + * it is harmless. 
 + */ +if (mis->from_src_file) { +qemu_file_shutdown(mis->from_src_file); +} +if (mis->to_src_file) { +qemu_file_shutdown(mis->to_src_file); +} old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void) "secondary VM", FailoverStatus_lookup[old_state]); return; } +/* Notify the COLO incoming thread that failover work is finished */ +qemu_sem_post(&mis->colo_incoming_sem); /* For Secondary VM, jump to incoming co */ if (mis->migration_incoming_co) { qemu_coroutine_enter(mis->migration_incoming_co); @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void) migrate_set_state(&s->state, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); +/* + * Wake up the COLO thread, which may be blocked in recv() or send(). + * s->rp_state.from_dst_file and s->to_dst_file may use the + * same fd, but we still shut the fd down twice; it is harmless. + */ +if (s->to_dst_file) { +qemu_file_shutdown(s->to_dst_file); +} +if (s->rp_state.from_dst_file) { +qemu_file_shutdown(s->rp_state.from_dst_file); +} + old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); if (old_state != FAILOVER_STATUS_ACTIVE) { @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void) FailoverStatus_lookup[old_state]); return; } +/* Notify the COLO thread that failover work is finished */ +qemu_sem_post(&s->colo_exit_sem); } void colo_do_failover(MigrationState *s) @@ -361,6 +389,14 @@ out: timer_del(s->colo_delay_timer); +/* Hope this is not too long to wait here */ +qemu_sem_wait(&s->colo_exit_sem); +
Re: [Qemu-devel] [libvirt] [Block Replication] Question about supporting COLO in libvirt
On 2017/2/6 20:39, Daniel P. Berrange wrote: On Mon, Feb 06, 2017 at 08:34:28PM +0800, Hailiang Zhang wrote: Hi, I'm trying to implement support for COLO in libvirt, but I found an annoying problem: libvirt does not support the command line option argument syntax we use for block replication in QEMU. That is, libvirt does not support the below syntax for block devices: -drive driver=qcow2,file.filename=test:a.qcow2 -drive file=test.qcow2,\ backing.file.filename=/dev/fdset/1 It seems to be a new syntax that libvirt does not support, though it has existed in QEMU for some time. I found some introductions at http://www.linux-kvm.org/images/3/34/Kvm-forum-2013-block-dev-configuration.pdf The command line we use for COLO is just like the above syntax. For example, for the shared disk in COLO, it is: -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\ backing.driver=raw,backing.file.filename=1.raw \ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\ file.driver=qcow2,top-id=active-disk0,\ file.file.filename=/mnt/ramfs/active_disk.img,\ file.backing=hidden_disk0,shared-disk=on For the non-shared disk in COLO, it is quite similar to the shared-disk case: -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \ -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\ file.file.filename=active_disk.qcow2,\ file.driver=qcow2,\ file.backing.file.filename=hidden_disk.qcow2,\ file.backing.driver=qcow2,\ file.backing.backing=colo1 So there seem to be two ways to solve this problem. One is to support this new option argument syntax in libvirt, but I'm not sure how difficult it would be to implement, and I don't know where to start either. Libvirt has to start supporting this new syntax. It is required for many different use cases beyond just COLO. 
For example, to be able to explicitly give QEMU details about a backing file format to remove probing, or to be able to set LUKS passwords on backing files, and more besides. Thanks for your quick reply; do you or other developers in libvirt have plans to implement it? Thanks, Hailiang Regards, Daniel
Re: [Qemu-devel] [PATCH 1/3] colo-compare: reconstruct the mutex lock usage
On 2017/2/7 17:21, Jason Wang wrote: On 2017年02月07日 16:19, Hailiang Zhang wrote: On 2017/2/7 15:57, Jason Wang wrote: On 2017年02月07日 15:54, Hailiang Zhang wrote: Hi Jason, On 2017/2/6 20:53, Jason Wang wrote: On 2017年02月06日 19:11, Hailiang Zhang wrote: On 2017/2/6 17:35, Jason Wang wrote: On 2017年02月06日 16:13, Hailiang Zhang wrote: On 2017/2/3 11:47, Jason Wang wrote: On 2017年01月24日 22:05, zhanghailiang wrote: The original 'timer_check_lock' mutex lock of struct CompareState is used to protect the 'conn_list' queue and its child queues, 'primary_list' and 'secondary_list', which is a little abused and confusing. To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock', which is used to protect the 'conn_list' queue, and use another 'conn_lock' to protect 'primary_list' and 'secondary_list'. Besides, fix some places that are missing these mutex locks. Signed-off-by: zhanghailiang<zhang.zhanghaili...@huawei.com> Instead of sticking to this kind of mutex, I think it's time to make the colo timer run in the colo thread (there's a TODO in the code). Er, it seems that we still need these mutex locks even if we make the colo timer and colo thread run in the same thread, because we may access the connect/primary/secondary lists from the colo (migration) thread concurrently. Just to make sure I understand the issue, why does it need to access the lists? Besides, it seems to be a little complex to make the colo timer run in the colo compare thread, and it is not this series' goal. Seems not, just by looking at how it was implemented in the main loop, but maybe I was wrong. This series is preparatory work for integrating COLO compare with the COLO frame and is a prerequisite for it. So we may consider implementing it later in another series. Zhang Chen, what's your opinion? The problem is this patch makes things even worse: it introduces one more mutex. Hmm, for most cases, we need to take these two locks at the same time. 
We can use one lock to protect conn_list/primary_list/secondary_list. (We need to take this lock before operating on any of these lists; as you can see in patch 2, while doing a checkpoint, we may operate on these lists in the COLO checkpoint thread concurrently.) But in the original code, we didn't take timer_check_lock in packet_enqueue() while operating on conn_list/primary_list/secondary_list, and didn't take this lock in colo_compare_connection while operating on secondary_list either. So, is it OK to use conn_lock instead of timer_check_lock, and add the lock where it is needed? I'd like to know, if the timer ran in the colo thread (this looks as simple as a g_timeout_source_new() followed by a g_source_attach()), why would we still need a mutex? And if we need it now but plan to remove it in the future, I'd like to use a big lock to simplify the code. After investigating your above suggestion, I think it works by using g_timeout_source_new() and g_source_attach(), but I'm not sure if it is a good idea to use the big lock to protect the possible concurrent cases, which seem to only happen between the COLO migration thread and the COLO compare thread. Any further suggestions? I think I first need to understand why the migration thread needs to access the lists. Er, sorry to confuse you here; to be exact, it is the COLO checkpoint thread. We reuse the migration thread to realize the checkpoint process for COLO, because QEMU enters the COLO state after a complete migration. While doing a checkpoint, we need to release all the buffered packets that have not yet been compared, so we need to access the lists in the COLO checkpoint thread. I think the better way is to notify the comparing thread and let it do the releasing. You probably need a similar mechanism to notify the checkpoint thread from the comparing thread. Hmm, that's really a good idea; it can avoid using the mutex lock. I'll try to implement it in the next version, thanks very much. :) Thanks Thanks.