[PATCH v2] MAINTAINERS: Change my email address
The zhang.zhanghaili...@huawei.com email address has been deactivated. Change it to my new email address.

Signed-off-by: Hailiang Zhang
---
Hi Juan & Dave,

Firstly, thank you for your work on maintaining the COLO framework. I didn't have much time for it in the past days, but I may have some in the coming days since my job has changed. Because my old email address has been deactivated, I cannot use it to send this patch. Please help me merge it.

Thanks,
Hailiang
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..5d9c4243b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2967,7 +2967,7 @@ F: include/qemu/yank.h
 F: qapi/yank.json

 COLO Framework
-M: zhanghailiang
+M: Hailiang Zhang
 S: Maintained
 F: migration/colo*
 F: include/migration/colo.h
--
2.34.1
[PATCH] MAINTAINERS: Change my email address
The zhang.zhanghaili...@huawei.com email address has been deactivated. Change it to my new email address.

Signed-off-by: Hailiang Zhang
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..5d9c4243b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2967,7 +2967,7 @@ F: include/qemu/yank.h
 F: qapi/yank.json

 COLO Framework
-M: zhanghailiang
+M: Hailiang Zhang
 S: Maintained
 F: migration/colo*
 F: include/migration/colo.h
--
2.34.1
[PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing
We can migrate some dirty pages during the gap between two checkpoints; this way, we can reduce the amount of RAM migrated while the VM is stopped for a checkpoint.

Signed-off-by: Hailiang Zhang
---
 migration/colo.c       | 69 +++---
 migration/migration.h  |  1 +
 migration/trace-events |  1 +
 qapi/migration.json    |  4 ++-
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 93c5a452fb..d30c6bc4ad 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,13 @@ static COLOMode last_colo_mode;

 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)

+#define DEFAULT_RAM_PENDING_CHECK 1000
+
+/* should be calculated by bandwidth and max downtime ? */
+#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)
+
+static int checkpoint_request;
+
 bool migration_in_colo_state(void)
 {
     MigrationState *s = migrate_get_current();
@@ -516,6 +523,20 @@ static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
     colo_checkpoint_notify(data);
 }

+static bool colo_need_migrate_ram_background(MigrationState *s)
+{
+    uint64_t pending_size, pend_pre, pend_compat, pend_post;
+    int64_t max_size = THRESHOLD_PENDING_SIZE;
+
+    qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
+                              &pend_compat, &pend_post);
+    pending_size = pend_pre + pend_compat + pend_post;
+
+    trace_colo_need_migrate_ram_background(pending_size);
+    return (pending_size >= max_size);
+}
+
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QIOChannelBuffer *bioc;
@@ -571,6 +592,8 @@ static void colo_process_checkpoint(MigrationState *s)

     timer_mod(s->colo_delay_timer, current_time +
               s->parameters.x_checkpoint_delay);
+    timer_mod(s->pending_ram_check_timer,
+              current_time + DEFAULT_RAM_PENDING_CHECK);

     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
@@ -583,10 +606,25 @@ static void colo_process_checkpoint(MigrationState *s)
         if (s->state != MIGRATION_STATUS_COLO) {
             goto out;
         }
-        ret = colo_do_checkpoint_transaction(s, bioc, fb);
-        if (ret < 0) {
-            goto out;
-        }
+        if (atomic_xchg(&checkpoint_request, 0)) {
+            /* start a colo checkpoint */
+            ret = colo_do_checkpoint_transaction(s, bioc, fb);
+            if (ret < 0) {
+                goto out;
+            }
+        } else {
+            if (colo_need_migrate_ram_background(s)) {
+                colo_send_message(s->to_dst_file,
+                                  COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
+                                  &local_err);
+                if (local_err) {
+                    goto out;
+                }
+
+                qemu_savevm_state_iterate(s->to_dst_file, false);
+                qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
+            }
+        }
     }

 out:
@@ -626,6 +664,8 @@ out:
     colo_compare_unregister_notifier(&colo_compare_notifier);
     timer_del(s->colo_delay_timer);
     timer_free(s->colo_delay_timer);
+    timer_del(s->pending_ram_check_timer);
+    timer_free(s->pending_ram_check_timer);
     qemu_sem_destroy(&s->colo_checkpoint_sem);

     /*
@@ -643,6 +683,7 @@ void colo_checkpoint_notify(void *opaque)
     MigrationState *s = opaque;
     int64_t next_notify_time;

+    atomic_inc(&checkpoint_request);
     qemu_sem_post(&s->colo_checkpoint_sem);
     s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     next_notify_time = s->colo_checkpoint_time +
@@ -650,6 +691,19 @@ void colo_checkpoint_notify(void *opaque)
     timer_mod(s->colo_delay_timer, next_notify_time);
 }

+static void colo_pending_ram_check_notify(void *opaque)
+{
+    int64_t next_notify_time;
+    MigrationState *s = opaque;
+
+    if (migration_in_colo_state()) {
+        next_notify_time = DEFAULT_RAM_PENDING_CHECK +
+                           qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        timer_mod(s->pending_ram_check_timer, next_notify_time);
+        qemu_sem_post(&s->colo_checkpoint_sem);
+    }
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
@@ -657,6 +711,8 @@ void migrate_start_colo_process(MigrationState *s)
     s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST,
                                        colo_checkpoint_notify, s);
+    s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST,
+                                              colo_pending_ram_check_notify, s);
     qemu_sem_init(&s->colo_exit_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -805,6 +861,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis,
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
        colo_incoming_process_checkpoint(mis, fb, bioc, errp);
         break;
+    case COLO_MESSAGE_MIGRATE
[PATCH 0/3] Optimize VM's downtime while doing checkpoints in COLO
Hi,

This is an untested series that tries to reduce the VM's pause time while doing a checkpoint in COLO state.

The second patch reduces the total number of dirty pages that must be sent while the VM is paused for a checkpoint: instead of sending all dirty pages while the VM is paused, it sends part of them during the gap between two checkpoints, while the SVM and PVM are running.

The third patch reduces the pause time of backing up RAM into the cache on the secondary side.

Hailiang Zhang (3):
  migration/colo: wrap incoming checkpoint process into new helper
  COLO: Migrate dirty pages during the gap of checkpointing
  COLO: Optimize memory back-up process

 migration/colo.c       | 332 +
 migration/migration.h  |   1 +
 migration/ram.c        |  35 -
 migration/ram.h        |   1 +
 migration/trace-events |   1 +
 qapi/migration.json    |   4 +-
 6 files changed, 234 insertions(+), 140 deletions(-)
--
2.21.0
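The core decision in the second patch is a simple threshold check between two checkpoints. As a rough, self-contained illustration (plain C, not QEMU code; the constant and the function name merely mirror the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch's hard-coded THRESHOLD_PENDING_SIZE (100 MiB). */
#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)

/*
 * Decide whether enough dirty RAM is pending to be worth migrating in
 * the gap between two checkpoints. Simplified stand-in for the patch's
 * colo_need_migrate_ram_background(): it sums the three pending
 * estimates that qemu_savevm_state_pending() would report and compares
 * the total against the threshold.
 */
static bool need_migrate_ram_background(uint64_t pend_pre,
                                        uint64_t pend_compat,
                                        uint64_t pend_post)
{
    uint64_t pending_size = pend_pre + pend_compat + pend_post;

    return pending_size >= THRESHOLD_PENDING_SIZE;
}
```

For example, 150 MiB of pending dirty RAM exceeds the 100 MiB threshold and triggers a background pass, while 1 MiB does not. As the comment in the patch itself notes, a real threshold should probably be derived from bandwidth and max downtime rather than hard-coded.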
[PATCH 3/3] COLO: Optimize memory back-up process
This patch reduces the VM's downtime for the initial process. Previously, we copied all of the SVM's memory in the COLO preparing stage while the VM had to be stopped, which is a time-consuming process. Here we optimize it with a trick: back up every page during the migration process while COLO is enabled. Though it affects the speed of the migration, it clearly reduces the downtime of backing up all of the SVM's memory in the COLO preparing stage.

Signed-off-by: Hailiang Zhang
---
 migration/colo.c |  3 +++
 migration/ram.c  | 35 +++
 migration/ram.h  |  1 +
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d30c6bc4ad..febf010571 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "migration/failover.h"
+#include "migration/ram.h"
 #ifdef CONFIG_REPLICATION
 #include "replication.h"
 #endif
@@ -906,6 +907,8 @@ void *colo_process_incoming_thread(void *opaque)
      */
     qemu_file_set_blocking(mis->from_src_file, true);

+    colo_incoming_start_dirty_log();
+
     bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
     fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
     object_unref(OBJECT(bioc));
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..24a8aa3527 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2986,7 +2986,6 @@ int colo_init_ram_cache(void)
             }
             return -errno;
         }
-        memcpy(block->colo_cache, block->host, block->used_length);
     }
 }
@@ -3005,12 +3004,16 @@ int colo_init_ram_cache(void)
             bitmap_set(block->bmap, 0, pages);
         }
     }
+
+    return 0;
+}
+
+void colo_incoming_start_dirty_log(void)
+{
     ram_state = g_new0(RAMState, 1);
     ram_state->migration_dirty_pages = 0;
     qemu_mutex_init(&ram_state->bitmap_mutex);
     memory_global_dirty_log_start();
-
-    return 0;
 }

 /* It is needed to hold the global lock to call this helper */
@@ -3348,7 +3351,7 @@ static int ram_load_precopy(QEMUFile *f)

     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr, total_ram_bytes;
-        void *host = NULL;
+        void *host = NULL, *host_bak = NULL;
         uint8_t ch;

         /*
@@ -3378,13 +3381,26 @@ static int ram_load_precopy(QEMUFile *f)
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
-
             /*
-             * After going into COLO, we should load the Page into colo_cache.
+             * After going into COLO, we should load the Page into colo_cache.
+             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
+             * Previously, we copied all this memory in the preparing stage
+             * of COLO while we needed to stop the VM, which is a
+             * time-consuming process. Here we optimize it with a trick:
+             * back up every page during the migration process while COLO
+             * is enabled; though it affects the speed of the migration,
+             * it clearly reduces the downtime of backing up all the SVM's
+             * memory in the COLO preparing stage.
              */
-            if (migration_incoming_in_colo_state()) {
+            if (migration_incoming_colo_enabled()) {
                 host = colo_cache_from_block_offset(block, addr);
-            } else {
+                /*
+                 * After going into COLO, load the Page into colo_cache.
+                 */
+                if (!migration_incoming_in_colo_state()) {
+                    host_bak = host;
+                }
+            }
+            if (!migration_incoming_in_colo_state()) {
                 host = host_from_ram_block_offset(block, addr);
             }
             if (!host) {
@@ -3506,6 +3522,9 @@ static int ram_load_precopy(QEMUFile *f)
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
+        if (!ret && host_bak && host) {
+            memcpy(host_bak, host, TARGET_PAGE_SIZE);
+        }
     }

     ret |= wait_for_decompress_done();
diff --git a/migration/ram.h b/migration/ram.h
index a553d40751..5ceaff7cb4 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_incoming_start_dirty_log(void);

 #endif
--
2.21.0
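The heart of this optimization is the host/host_bak juggling in ram_load_precopy(). A toy model of that control flow (self-contained C over plain buffers; not the actual QEMU structures or function names):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Toy model of the patch's logic: while precopy runs with COLO enabled
 * (but before entering COLO state), each incoming page is written to
 * guest RAM as usual and then also backed up into the colo cache, so
 * the COLO preparing stage no longer needs one big bulk memcpy of all
 * RAM. Once actually in COLO state, pages go straight into the cache.
 */
static void load_one_page(uint8_t *guest_ram, uint8_t *colo_cache,
                          const uint8_t *incoming,
                          bool colo_enabled, bool in_colo_state)
{
    uint8_t *host = guest_ram;
    uint8_t *host_bak = NULL;

    if (colo_enabled) {
        host = colo_cache;            /* COLO state: load into the cache */
        if (!in_colo_state) {
            host_bak = host;          /* remember the cache slot...      */
            host = guest_ram;         /* ...but still load guest RAM     */
        }
    }
    memcpy(host, incoming, PAGE_SIZE);
    if (host_bak) {
        memcpy(host_bak, host, PAGE_SIZE);   /* incremental back-up */
    }
}
```

In the precopy-with-COLO-enabled case, both guest RAM and the cache end up holding the incoming page, which is exactly the state the old bulk memcpy in colo_init_ram_cache() used to establish all at once.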
[PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper
Split the checkpoint incoming process into a helper.

Signed-off-by: Hailiang Zhang
---
 migration/colo.c | 260 ---
 1 file changed, 133 insertions(+), 127 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 2c88aa57a2..93c5a452fb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -664,13 +664,138 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }

-static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
-                                     Error **errp)
+static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
+                  QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
+{
+    uint64_t total_size;
+    uint64_t value;
+    Error *local_err = NULL;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    trace_colo_vm_state_change("run", "stop");
+    qemu_mutex_unlock_iothread();
+
+    /* FIXME: This is unnecessary for periodic checkpoint mode */
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    colo_receive_check_message(mis->from_src_file,
+                               COLO_MESSAGE_VMSTATE_SEND, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_mutex_lock_iothread();
+    cpu_synchronize_all_pre_loadvm();
+    ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+    qemu_mutex_unlock_iothread();
+
+    if (ret < 0) {
+        error_setg(errp, "Load VM's live state (ram) error");
+        return;
+    }
+
+    value = colo_receive_message_value(mis->from_src_file,
+                                       COLO_MESSAGE_VMSTATE_SIZE, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * Read VM device state data into channel buffer,
+     * It's better to re-use the memory allocated.
+     * Here we need to handle the channel buffer directly.
+     */
+    if (value > bioc->capacity) {
+        bioc->capacity = value;
+        bioc->data = g_realloc(bioc->data, bioc->capacity);
+    }
+    total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
+    if (total_size != value) {
+        error_setg(errp, "Got %" PRIu64 " VMState data, less than expected"
+                   " %" PRIu64, total_size, value);
+        return;
+    }
+    bioc->usage = total_size;
+    qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_mutex_lock_iothread();
+    vmstate_loading = true;
+    ret = qemu_load_device_state(fb);
+    if (ret < 0) {
+        error_setg(errp, "COLO: load device state failed");
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+#ifdef CONFIG_REPLICATION
+    replication_get_error_all(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+    /* discard colo disk buffer */
+    replication_do_checkpoint_all(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+#else
+    abort();
+#endif
+    /* Notify all filters of all NIC to do checkpoint */
+    colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+    vmstate_loading = false;
+    vm_start();
+    trace_colo_vm_state_change("stop", "run");
+    qemu_mutex_unlock_iothread();
+
+    if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+        failover_set_state(FAILOVER_STATUS_RELAUNCH,
+                           FAILOVER_STATUS_NONE);
+        failover_request_active(NULL);
+        return;
+    }
+
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+static void colo_wait_handle_message(MigrationIncomingState *mis,
+                QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
 {
     COLOMessage msg;
     Error *local_err = NULL;

-    msg = colo_receive_message(f, &local_err);
+    msg = colo_receive_message(mis->from_src_file, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -678,10 +803,9 @@ static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,

     switch (msg) {
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
-        *checkpoint_request = 1;
+        colo_incoming
Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
On 2018/4/24 14:13, Wei Wang wrote:

This is the device part implementation to add a new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The device receives the guest free page hints from the driver and clears the corresponding bits in the dirty bitmap, so that those free pages are not transferred by the migration thread to the destination.

- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results (results are averaged over 10 runs)
    - Idle Guest Live Migration Time:
        Optimization v.s. Legacy = 271ms v.s. 1769ms --> ~86% reduction
    - Guest with Linux Compilation Workload (make bzImage -j4):
        - Live Migration Time (average):
          Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
        - Linux Compilation Time:
          Optimization v.s. Legacy = 4min56s v.s. 5min3s --> no obvious difference

- Source Code
    - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
    - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

ChangeLog:
v6->v7:
    virtio-balloon/virtio_balloon_poll_free_page_hints:
        - add virtio_notify() at the end to notify the driver that the
          optimization is done, which indicates that the entries have all
          been put back to the vq and are ready to be detached.
v5->v6:
    virtio-balloon: use iothread to get free page hint
v4->v5:
    1) migration:
        - bitmap_clear_dirty: update the dirty bitmap and dirty page
          count under the bitmap mutex as what other functions are doing;
        - qemu_guest_free_page_hint:
            - add comments for this function;
            - check the !block case;
            - check "offset > block->used_length" before proceeding;
            - assign used_len inside the for{} body;
            - update the dirty bitmap and dirty page counter under the
              bitmap mutex;
        - ram_state_reset:
            - rs->free_page_support: && with "migrate_postcopy" instead
              of migration_in_postcopy;
            - clear the ram_bulk_stage flag if free_page_support is true;
    2) balloon:
        - add the usage documentation of balloon_free_page_start and
          balloon_free_page_stop in code;
        - the optimization thread is named "balloon_fpo" to meet the
          requirement of "less than 14 characters";
        - virtio_balloon_poll_free_page_hints:
            - run on condition when runstate_is_running() is true;
            - add a qemu spin lock to synchronize accesses to the free
              page reporting related fields shared among the migration
              thread and the optimization thread;
        - virtio_balloon_free_page_start: just return if
          runstate_is_running() is false;
        - virtio_balloon_free_page_stop: access the free page reporting
          related fields under a qemu spin lock;
        - virtio_balloon_device_unrealize/reset: call
          virtio_balloon_free_page_stop if the free page hint feature is
          used;
        - virtio_balloon_set_status: call virtio_balloon_free_page_stop
          in case the guest is stopped by qmp when the optimization is
          running;
v3->v4:
    1) bitmap: add a new API to count 1s starting from an offset of a
       bitmap;
    2) migration:
        - qemu_guest_free_page_hint: calculate
          ram_state->migration_dirty_pages by counting how many bits of
          free pages are truly cleared. If some of the bits were already
          0, they shouldn't be deducted from
          ram_state->migration_dirty_pages. This wasn't needed for
          previous versions since we optimized the bulk stage only, where
          all bits are guaranteed to be set.
          It's needed now because we extended the usage of this
          optimization to all stages except the last stop stage. From the
          2nd stage onward, there are possibilities that some bits of
          free pages are already 0.
    3) virtio-balloon:
        - virtio_balloon_free_page_report_status: introduce a new status,
          FREE_PAGE_REPORT_S_EXIT. This status indicates that the
          optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
          the reporting is stopped, but the optimization thread still
          needs to be joined by the migration thread.
v2->v3:
    1) virtio-balloon:
        - virtio_balloon_free_page_start: poll the hints using a new
          thread;
        - use cmd id between [0x8000, UINT_MAX];
        - virtio_balloon_poll_free_page_hints:
            - stop the optimization only when it has started;
            - don't skip free pages when !poison_val;
        - add poison_val to vmsd to migrate;
        - virtio_balloon_get_features: add the F_PAGE_POISON feature when
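The dirty-page-accounting subtlety in the v3->v4 notes — deduct only bits that were actually set — can be seen in a tiny self-contained model (plain C over a raw word array, not the QEMU bitmap API; the function name is only a stand-in for qemu_guest_free_page_hint()):

```c
#include <stdint.h>

/*
 * Clear dirty-bitmap bits for a run of guest-reported free pages and
 * return how many bits were truly 1 beforehand: only those may be
 * deducted from the pending dirty-page count. From the 2nd migration
 * stage onward some bits may already be 0, so blindly subtracting
 * npages would undercount the remaining dirty pages.
 */
static uint64_t clear_free_page_hints(uint64_t *bitmap, uint64_t start,
                                      uint64_t npages)
{
    uint64_t cleared = 0;

    for (uint64_t i = start; i < start + npages; i++) {
        uint64_t mask = 1ULL << (i % 64);

        if (bitmap[i / 64] & mask) {
            bitmap[i / 64] &= ~mask;
            cleared++;            /* bit was truly set: deduct it */
        }
    }
    return cleared;
}
```

For a word holding 0b1010, hinting pages 0..3 clears two bits, so only 2 is deducted from the dirty-page counter even though 4 pages were hinted.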
Re: [Qemu-devel] Reply: [BUG] Windows 7 got stuck easily while running PCMark10 application
On 2017/12/2 2:37, Paolo Bonzini wrote:
> On 01/12/2017 18:45, Gonglei (Arei) wrote:
>> I also think it's a windows bug, the problem is that it doesn't occur on the xen platform.
> It's a race, it may just be that RTC PIO is faster in Xen because it's implemented in the hypervisor.

No. Xen does not have this problem because it injects the RTC irq without checking whether its previous irq has been cleared, whereas in KVM we do have such a check:

static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
                          int irq_level, bool line_status)
{
    ...
    if (!irq_level) {
        ioapic->irr &= ~mask;   /* clear the RTC irq in irr, or we
                                   cannot inject the next RTC irq */
        ret = 1;
        goto out;
    }

I agree that we should move the operation of clearing the RTC irq from cmos_ioport_read() to cmos_ioport_write() to ensure the action is done.

Thanks,
Hailiang

> I will try reporting it to Microsoft.
>
> Thanks, Paolo

From: Paolo Bonzini
To: Gong Lei, Zhang Hailiang, qemu-devel, Michael S. Tsirkin
Cc: Huang Weidong, Wang Xin, Xie Xiangyou
Date: 2017-12-02 01:10:08
Subject: Re: [BUG] Windows 7 got stuck easily while run PCMark10 application

On 01/12/2017 08:08, Gonglei (Arei) wrote:
> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
>     CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc
> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6
>     CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86
> vcpu0 reads 0x6 because cmos_index is 0x6 now:
>     CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6
> vcpu1 reads 0x6:
>     CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6

This seems to be a Windows bug. The easiest workaround that I can think of is to clear the interrupts already when 0xc is written, without waiting for the read (because REG_C can only be read). What do you think?

Thanks, Paolo
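The proposed workaround — clear REG_C on the index-port write instead of waiting for the data-port read — can be modeled in a few lines. This is a toy state machine, not the actual mc146818rtc code; the struct and function names are illustrative:

```c
#include <stdint.h>

#define RTC_REG_C 0x0c

/* Toy RTC state: an index/data port pair plus an IRQ line level. */
struct toy_rtc {
    uint8_t cmos_index;
    uint8_t cmos_data[128];
    int irq_level;
};

/*
 * Index-port (0x70) write: under the workaround, merely selecting
 * register C clears the pending interrupt flags and lowers the IRQ, so
 * a racing index write from another vCPU (the 0xc / 0x86 interleaving
 * in the trace above) can no longer leave the IRQ stuck high with its
 * flags unread.
 */
static void toy_index_write(struct toy_rtc *s, uint8_t val)
{
    s->cmos_index = val & 0x7f;      /* bit 7 is the NMI-disable bit */
    if (s->cmos_index == RTC_REG_C) {
        s->cmos_data[RTC_REG_C] = 0; /* clear interrupt flags...     */
        s->irq_level = 0;            /* ...and lower the IRQ line    */
    }
}

/* Data-port (0x71) read: no interrupt-clearing side effect anymore. */
static uint8_t toy_data_read(struct toy_rtc *s)
{
    return s->cmos_data[s->cmos_index];
}
```

In this model the two-vCPU race becomes harmless: whichever vCPU selects register C clears the flags immediately, regardless of which register the subsequent data reads happen to hit.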
Re: [Qemu-devel] colo-compare: segfault and assert on colo_compare_finalize
Hi,

Did you test this branch https://github.com/coloft/qemu/tree/colo-for-qemu-2.10-2017-4-22 ?

This seems to be an already-known problem. I'm not quite sure, but it may be fixed by this patch:

b19456dd0ea4eb418ad093f092adbb882be13054
char: Fix removing wrong GSource that be found by fd_in_tag

    We use fd_in_tag to find a GSource. fd_in_tag is the return value of
    g_source_attach(GSource *source, GMainContext *context); the return
    value is unique only within the same context, so we may get the same
    values with different 'context' parameters.

    It is no problem to find the right fd_in_tag by using
    g_main_context_find_source_by_id(GMainContext *context, guint source_id)
    while there is only one default main context. But colo-compare tries
    to create/use its own context, and if we pass the wrong 'context'
    parameter with the right fd_in_tag, we will find a wrong GSource to
    handle. We tried to fix the related code in commit
    b43decb015a6efeb9e3cdbdb80f6547ad7248a4c, but it didn't fix the bug
    completely, because some code still didn't pass the *right* context
    parameter to remove_fd_in_watch(). Let's fix it by recording the
    GSource directly instead of fd_in_tag.

    Signed-off-by: zhanghailiang
    Reviewed-by: Marc-André Lureau
    Message-Id: <1492564532-91680-1-git-send-email-zhang.zhanghaili...@huawei.com>
    Signed-off-by: Paolo Bonzini

Actually, we have already rewritten this part; please follow the later series.

Thanks,
Hailiang

On 2017/8/8 0:39, Eduardo Otubo wrote:
> (please ignore my last email, looks like mutt wants to play games lately)
>
> Hi all,
>
> I have found a problem in colo-compare that leads to a segmentation fault when calling qemu like this:
>
>     $ qemu-system-x86_64 -S -machine pc -object colo-compare,id=test-object
>
> First I got an assert failure:
>
>     (qemu-system-x86_64:7887): GLib-CRITICAL **: g_main_loop_quit: assertion 'loop != NULL' failed
>
> From this it looks like s->compare_loop is NULL in the function colo_compare_finalize(); then I just added a check there and the assert went away.
But then there's the segfault:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x7333f79e in pthread_join () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7333f79e in pthread_join () at /lib64/libpthread.so.0
#1  0x55c379d2 in qemu_thread_join (thread=0x77ff5160) at util/qemu-thread-posix.c:547
#2  0x55adfc1a in colo_compare_finalize (obj=0x77fd3010) at net/colo-compare.c:867
#3  0x55b2cd87 in object_deinit (obj=0x77fd3010, type=0x567432e0) at qom/object.c:453
#4  0x55b2cdf9 in object_finalize (data=0x77fd3010) at qom/object.c:467
#5  0x55b2dd80 in object_unref (obj=0x77fd3010) at qom/object.c:902
#6  0x55b319a5 in user_creatable_add_type (type=0x567499a0 "colo-compare", id=0x56749960 "test-object", qdict=0x56835750, v=0x5681a3f0, errp=0x7fffde58) at qom/object_interfaces.c:105
#7  0x55b31b02 in user_creatable_add_opts (opts=0x56749910, errp=0x7fffde58) at qom/object_interfaces.c:135
#8  0x55b31bfd in user_creatable_add_opts_foreach (opaque=0x558e9c39 , opts=0x56749910, errp=0x0) at qom/object_interfaces.c:159
#9  0x55c4aecf in qemu_opts_foreach (list=0x56157ac0 , func=0x55b31b6f , opaque=0x558e9c39 , errp=0x0) at util/qemu-option.c:1104
#10 0x558edb75 in main (argc=6, argv=0x7fffe2d8, envp=0x7fffe310) at vl.c:4520

At this point 's->thread' is '0'. Are this segfault and the above-mentioned assert triggered because I'm creating a colo-compare object without any other parameter? In a positive case, a simple workaround and error check should do it. Otherwise I'll debug a little more.

Best regards,
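The crash pattern above — finalize touching a main loop and a worker thread that were never created — suggests guarding both before tearing them down. A minimal sketch of that shape (stand-in types and names, not the actual net/colo-compare.c code):

```c
#include <stddef.h>

/* Stand-ins for GMainLoop* and QemuThread in colo-compare's state. */
struct compare_state {
    void *compare_loop;        /* NULL if the loop was never created  */
    unsigned long thread_id;   /* 0 if the worker was never started   */
};

/*
 * Finalize only what was actually initialized, so destroying a
 * half-constructed object (e.g. one created without its mandatory
 * options) neither trips the g_main_loop_quit() assertion nor joins a
 * nonexistent thread. Returns 1 if a thread join would have happened,
 * 0 otherwise.
 */
static int compare_finalize(struct compare_state *s)
{
    if (s->compare_loop) {
        /* g_main_loop_quit(s->compare_loop); would go here */
    }
    if (s->thread_id) {
        /* qemu_thread_join(...); would go here */
        return 1;
    }
    return 0;
}
```

With both guards in place, finalizing a zero-initialized state is a no-op instead of an assert plus a SIGSEGV in pthread_join().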
Re: [Qemu-devel] [PATCH 2/3] COLO: Define COLOMode without QAPI
On 2017/7/29 1:17, Dr. David Alan Gilbert wrote:
> * Markus Armbruster (arm...@redhat.com) wrote:
>> COLOMode is defined in the QAPI schema, but not used there. Of the stuff QAPI generates for it, only the typedef is actually used. Use of QAPI is pointless and only complicates things, so don't.
> Hmm, in one of the old COLO worlds I have, there's code to emit an event on exiting from COLO, and that event includes the mode it was in.

Yes, we need it in the later series.

> If the intent is to bring that or similar back then it would be worth keeping.

Agreed. ;)

> Dave
>
>> Cc: zhanghailiang
>> Signed-off-by: Markus Armbruster
>> ---
>>  include/migration/colo.h |  6 ++
>>  qapi-schema.json         | 16 
>>  2 files changed, 6 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index ff9874e..5d7c500 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -26,6 +26,12 @@ void migration_incoming_exit_colo(void);
>>  void *colo_process_incoming_thread(void *opaque);
>>  bool migration_incoming_in_colo_state(void);
>>  
>> +typedef enum {
>> +    COLO_MODE_UNKNOWN,
>> +    COLO_MODE_PRIMARY,
>> +    COLO_MODE_SECONDARY,
>> +} COLOMode;
>> +
>>  COLOMode get_colo_mode(void);
>>  
>>  /* failover */
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 9b6f6cb..3f0eb05 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1304,22 +1304,6 @@
>>             'vmstate-loaded' ] }
>>  
>>  ##
>> -# @COLOMode:
>> -#
>> -# The colo mode
>> -#
>> -# @unknown: unknown mode
>> -#
>> -# @primary: master side
>> -#
>> -# @secondary: slave side
>> -#
>> -# Since: 2.8
>> -##
>> -{ 'enum': 'COLOMode',
>> -  'data': [ 'unknown', 'primary', 'secondary'] }
>> -
>>  ##
>>  # @FailoverStatus:
>>  #
>>  # An enumeration of COLO failover status
>> -- 
>> 2.7.5
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 05/12] migration: Move colo.h to migration/
On 2017/5/15 19:04, Juan Quintela wrote:
> "Dr. David Alan Gilbert" wrote:
>> * Juan Quintela (quint...@redhat.com) wrote:
>>> There are functions only used by migration code.
>> That's only mostly true; see the current 'integrate colo frame with block replication and net compare' series (posted 22nd April). That adds colo_handle_shutdown to this header and calls it from vl.c ( https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03901.html ) — where should that go?
> Dropped. It compiled, but who knows O:-)
>> There's also a net/colo.h as well, so using the #include "colo.h" in migration is correct, but that's really scary when there are two files of the same name.

Ha, I'd like to suggest renaming net/colo.h to colo-proxy.h (colo-net.h?), which is saner, together with renaming net/colo.c to colo-proxy.c (colo-net.c?) — or any other proper name. (I didn't notice that until it went into upstream. O:-) )

Cc: Zhang Chen

> Yeap, it is scary ...
Re: [Qemu-devel] [PATCH V4 02/12] net/filter-mirror.c: Add new option to enable vnet support for filter-mirror
Hi,

On 2017/5/12 9:41, Zhang Chen wrote:
> We add the vnet_hdr option for filter-mirror; the default is disabled. If you use the virtio-net-pci net driver, please enable it. You can use it, for example:
>
>     -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr=on

Is there any way to detect whether or not the vNIC is using vnet_hdr? I don't think it is a good idea to make users confirm it themselves, especially users who may not be so familiar with the vNIC implementation in qemu.

Thanks,
Hailiang

> Signed-off-by: Zhang Chen
> ---
>  net/filter-mirror.c | 34 ++
>  qemu-options.hx     |  5 +++--
>  2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/net/filter-mirror.c b/net/filter-mirror.c
> index 72fa7c2..3766414 100644
> --- a/net/filter-mirror.c
> +++ b/net/filter-mirror.c
> @@ -38,6 +38,7 @@ typedef struct MirrorState {
>      NetFilterState parent_obj;
>      char *indev;
>      char *outdev;
> +    bool vnet_hdr;
>      CharBackend chr_in;
>      CharBackend chr_out;
>      SocketReadState rs;
> @@ -308,6 +309,13 @@ static char *filter_mirror_get_outdev(Object *obj, Error **errp)
>      return g_strdup(s->outdev);
>  }
>  
> +static char *filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
> +{
> +    MirrorState *s = FILTER_MIRROR(obj);
> +
> +    return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
> +}
> +
>  static void
>  filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
>  {
> @@ -322,6 +330,21 @@ filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static void filter_mirror_set_vnet_hdr(Object *obj,
> +                                       const char *value,
> +                                       Error **errp)
> +{
> +    MirrorState *s = FILTER_MIRROR(obj);
> +
> +    if (strcmp(value, "on") && strcmp(value, "off")) {
> +        error_setg(errp, "Invalid value for filter-mirror vnet_hdr, "
> +                   "should be 'on' or 'off'");
> +        return;
> +    }
> +
> +    s->vnet_hdr = !strcmp(value, "on");
> +}
> +
>  static char *filter_redirector_get_outdev(Object *obj, Error **errp)
>  {
>      MirrorState *s = FILTER_REDIRECTOR(obj);
> @@ -340,8 +363,19 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
>  
>  static void filter_mirror_init(Object *obj)
>  {
> +    MirrorState *s = FILTER_MIRROR(obj);
> +
>      object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
>                              filter_mirror_set_outdev, NULL);
> +
> +    /*
> +     * The vnet_hdr is disabled by default, if you want to enable
> +     * this option, you must enable all the option on related modules
> +     * (like other filter or colo-compare).
> +     */
> +    s->vnet_hdr = false;
> +    object_property_add_str(obj, "vnet_hdr", filter_mirror_get_vnet_hdr,
> +                            filter_mirror_set_vnet_hdr, NULL);
>  }
>  
>  static void filter_redirector_init(Object *obj)
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 70c0ded..1e08481 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4024,10 +4024,11 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
>  @option{tx}: the filter is attached to the transmit queue of the netdev,
>               where it will receive packets sent by the netdev.
>  
> -@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
> +@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}]
>  
>  filter-mirror on netdev @var{netdevid},mirror net packet to chardev
> -@var{chardevid}
> +@var{chardevid}, if vnet_hdr = on, filter-mirror will mirror packet
> +with vnet_hdr_len.
>  
>  @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
>  outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
On 2017/5/12 16:09, Xiao Guangrong wrote:
> On 05/11/2017 08:24 PM, Paolo Bonzini wrote:
>> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>>> -     * Scan sptes if dirty logging has been stopped, dropping those
>>> -     * which can be collapsed into a single large-page spte. Later
>>> -     * page faults will create the large-page sptes.
>>> +     * Reset each vcpu's mmu, then page faults will create the
>>> +     * large-page sptes later.
>>>       */
>>>      if ((change != KVM_MR_DELETE) &&
>>>          (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>> -        !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> -        kvm_mmu_zap_collapsible_sptes(kvm, new);
>>> +        !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>> +        kvm_for_each_vcpu(i, vcpu, kvm)
>>> +            kvm_mmu_reset_context(vcpu);
>> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but I am not sure it is enough. I think that if you do not zap the SPTEs, the page faults will use 4K SPTEs, not large ones (though I'd have to check better; CCing Xiao and Wanpeng).
>
> Yes, Paolo is right. kvm_mmu_reset_context() just reloads the vCPU's root page table; 4k mappings are still kept.
>
> There are two issues reported:
>
> - One is in kvm_mmu_slot_apply_flags(), when enabling dirty log tracking. Its root cause is that kvm_mmu_slot_remove_write_access() takes too much time. We can make the code adaptive to use the new fast-write-protect facility introduced by my patchset, i.e., if the number of pages contained in this memslot is more than TOTAL * FAST_WRITE_PROTECT_PAGE_PERCENTAGE, then we use fast-write-protect instead.
>
> - Another one is kvm_mmu_zap_collapsible_sptes(), when disabling dirty log tracking. collapsible_sptes zaps 4k mappings to make memory-read happy; it is not required by the semantics of KVM_SET_USER_MEMORY_REGION, and it is not urgent for the vCPU's running — it could be done in a separate thread using lock-break technology.

How about moving the action of stopping the dirty log into migrate_fd_cleanup() directly, which is processed in the main thread as a BH after migration is completed?
It will not have any side effect even if migration fails, or users cancel migration, no? Thanks!
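The BH idea proposed above can be sketched without QEMU's actual API. The following is a self-contained toy in plain C (QEMU's real mechanism is qemu_bh_new()/qemu_bh_schedule(), not these names): work scheduled from anywhere runs only when the main loop drains the queue, which is why doing the dirty-log stop there avoids racing with other threads.

```c
#include <stdbool.h>
#include <stddef.h>

/* A minimal bottom-half-style queue: scheduled work only runs when the
 * owning (main) loop explicitly processes the queue, never directly
 * from the caller that scheduled it. */
typedef struct BH {
    void (*cb)(void *opaque);
    void *opaque;
    struct BH *next;
} BH;

static BH *bh_queue;

static void bh_schedule(BH *bh)
{
    bh->next = bh_queue;
    bh_queue = bh;
}

/* Runs in the main loop, analogous to the main thread picking up a
 * cleanup BH after migration completes. */
static void bh_process_all(void)
{
    while (bh_queue) {
        BH *bh = bh_queue;
        bh_queue = bh->next;
        bh->cb(bh->opaque);
    }
}

static bool dirty_log_enabled = true;

/* Stand-in for the "stop dirty log" cleanup being discussed. */
static void stop_dirty_log(void *opaque)
{
    (void)opaque;
    dirty_log_enabled = false;
}
```

The point of the pattern is ordering: scheduling alone changes nothing; the state flips only once the main loop processes the queue.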
Re: [Qemu-devel] [Qemu-block] [PATCH v4 0/6] COLO block replication supports shared disk case
On 2017/5/12 3:17, Stefan Hajnoczi wrote:
On Wed, Apr 12, 2017 at 10:05:15PM +0800, zhanghailiang wrote:
COLO block replication doesn't support the shared disk case. Here we try to implement it, and this is the 4th version. Please review, any comments are welcome.

Cc: Dr. David Alan Gilbert (git)
Cc: eddie.d...@intel.com

Sorry for the delay. Feel free to ping me if I don't review within a few days when you post a patch.

That is OK. :) I was doing other things these days, and it is not quite urgent... thanks.

v4:
- Add proper comment for primary_disk in patch 2 (Stefan)
- Call bdrv_invalidate_cache() while doing checkpoint for shared disk in patch 5
v3:
- Fix some comments from Stefan and Eric
v2:
- Drop the patch which adds a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
  docs/block-replication: Add description for shared-disk case
  replication: add shared-disk and shared-disk-id options
  replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()
  replication: fix code logic with the new shared_disk option
  replication: Implement block replication for shared disk case
  nbd/replication: implement .bdrv_get_info() for nbd and replication driver

 block/nbd.c                |  12 +++
 block/replication.c        | 198 ++---
 docs/block-replication.txt | 139 ++-
 qapi/block-core.json       |  10 ++-
 4 files changed, 306 insertions(+), 53 deletions(-)

--
1.8.3.1
Re: [Qemu-devel] [PATCH v4 5/6] replication: Implement block replication for shared disk case
On 2017/5/12 3:15, Stefan Hajnoczi wrote:
On Wed, Apr 12, 2017 at 10:05:20PM +0800, zhanghailiang wrote:

@@ -612,6 +644,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
             error_propagate(errp, local_err);
             break;
         }
+    } else {
+        /*
+         * For shared disk, we need to force SVM to re-read metadata
+         * that is loaded in memory, or there will be inconsistent.
+         */
+        bdrv_invalidate_cache(s->secondary_disk->bs, &local_err);

I'm not sure this call has any effect:

    if (!(bs->open_flags & BDRV_O_INACTIVE)) {
        return;
    }

Is BDRV_O_INACTIVE set?

No, you are right, it does not take any effect. So should we set this flag for secondary_disk? Is it enough to set this flag only, or should we call bdrv_inactivate_recurse()? To be honest, I'm not quite familiar with this part.

Thanks,
Hailiang
Re: [Qemu-devel] [Qemu-block] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options
On 2017/5/12 3:08, Stefan Hajnoczi wrote:
On Wed, Apr 12, 2017 at 10:05:17PM +0800, zhanghailiang wrote:
We use these two options to identify which disk is shared.

Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
 block/replication.c  | 43 +-
 qapi/block-core.json | 10 +-
 2 files changed, 50 insertions(+), 3 deletions(-)

Aside from the ongoing discussion about this patch...

Reviewed-by: Stefan Hajnoczi

Thanks, I'll fix the related problems found by Changlong.
Re: [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options
On 2017/4/18 13:59, Xie Changlong wrote:
On 04/12/2017 10:05 PM, zhanghailiang wrote:
We use these two options to identify which disk is shared.

Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
 block/replication.c  | 43 +-
 qapi/block-core.json | 10 +-
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index bf3c395..418b81b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@ typedef struct BDRVReplicationState {
     ReplicationMode mode;
     int replication_state;
+    bool is_shared_disk;
+    char *shared_disk_id;
     BdrvChild *active_disk;
     BdrvChild *hidden_disk;
     BdrvChild *secondary_disk;
+    BdrvChild *primary_disk;
     char *top_id;
     ReplicationState *rs;
     Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool failover,
 #define REPLICATION_MODE           "mode"
 #define REPLICATION_TOP_ID         "top-id"
+#define REPLICATION_SHARED_DISK    "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
 static QemuOptsList replication_runtime_opts = {
     .name = "replication",
     .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
             .name = REPLICATION_TOP_ID,
             .type = QEMU_OPT_STRING,
         },
+        {
+            .name = REPLICATION_SHARED_DISK_ID,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = REPLICATION_SHARED_DISK,
+            .type = QEMU_OPT_BOOL,
+        },
         { /* end of list */ }
     },
 };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
     QemuOpts *opts = NULL;
     const char *mode;
     const char *top_id;
+    const char *shared_disk_id;
+    BlockBackend *blk;
+    BlockDriverState *tmp_bs;
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
                                false, errp);
@@ -125,12 +142,33 @@ static int replication_open(BlockDriverState *bs, QDict *options,
                    "The option mode's value should be primary or secondary");
         goto fail;
     }
+    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+                                          false);

What if the secondary side is supplied with 'REPLICATION_SHARED_DISK_ID'? Please refer to f4f2539bc to perfect the logic.

Hmm, we should not configure it for the secondary side, I'll fix it in the next version.

+    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+        if (!shared_disk_id) {
+            error_setg(&local_err, "Missing shared disk blk option");
+            goto fail;
+        }
+        s->shared_disk_id = g_strdup(shared_disk_id);
+        blk = blk_by_name(s->shared_disk_id);
+        if (!blk) {
+            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+            goto fail;
+        }
+        /* We have a BlockBackend for the primary disk but use BdrvChild for
+         * consistency - active_disk, secondary_disk, etc are also BdrvChild.
+         */
+        tmp_bs = blk_bs(blk);
+        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+    }
     s->rs = replication_new(bs, &replication_ops);
-    ret = 0;
-
+    qemu_opts_del(opts);
+    return 0;
 fail:
+    g_free(s->shared_disk_id);
     qemu_opts_del(opts);
     error_propagate(errp, local_err);
@@ -141,6 +179,7 @@ static void replication_close(BlockDriverState *bs)
 {
     BDRVReplicationState *s = bs->opaque;
+    g_free(s->shared_disk_id);
     if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
         replication_stop(s->rs, false, NULL);
     }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..361c932 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
 #          node who owns the replication node chain. Must not be given in
 #          primary mode.
 #
+# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk
+#                  is true, this option is required (Since: 2.10)
+#

Further explanations: For @shared-disk-id, it must/only be given when @shared-disk is enabled on the primary side.

OK.

+# @shared-disk: To indicate whether or not a disk is shared by primary VM
+#               and secondary VM. (The default is false) (Since: 2.10)

Further explanations: For @shared-disk, it must be given or not-given on both side at the same
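Setting the quoted diff aside, the option rules discussed in this review — shared-disk-id is required on the primary when shared-disk is on, and should be rejected on the secondary — can be condensed into a small validation helper. This is an illustrative sketch, not the actual replication_open() code:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    REPLICATION_MODE_PRIMARY,
    REPLICATION_MODE_SECONDARY,
} ReplicationMode;

/* Validate the option combinations from the review:
 * - shared-disk-id is required when shared-disk=on in primary mode;
 * - shared-disk-id must not be given in secondary mode.
 * Returns NULL on success, or a static error string. */
static const char *validate_shared_disk_opts(ReplicationMode mode,
                                             bool is_shared_disk,
                                             const char *shared_disk_id)
{
    if (mode == REPLICATION_MODE_SECONDARY && shared_disk_id) {
        return "shared-disk-id is only valid in primary mode";
    }
    if (is_shared_disk && mode == REPLICATION_MODE_PRIMARY &&
        !shared_disk_id) {
        return "Missing shared disk blk option";
    }
    return NULL;
}
```

A helper of this shape makes the "must not configure it for the secondary side" fix promised above a one-line check rather than scattered conditions.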
Re: [Qemu-devel] [PATCH v2 0/3] Remove old MigrationParams
On 2017/5/12 0:32, Juan Quintela wrote:
Hi

Changes from v1:
- make migrate_block_set_* take a boolean
- disable block migration in colo to maintain semantics.

Please review, Juan.

[v1]
Once upon a time there were MigrationParams (only used for block migration) and then MigrationParams used for everything else.

This series:
- create migration capabilities for block parameters
- make the migrate command line parameters use capabilities
- remove MigrationParams completely

Please, review.

Looks good to me, this makes the code more graceful.

Reviewed-by: zhanghailiang

*** BLURB HERE ***

Juan Quintela (3):
  migration: Create block capabilities for shared and enable
  migration: Remove use of old MigrationParams
  migration: Remove old MigrationParams

 include/migration/block.h     |  3 +++
 include/migration/migration.h | 14 +---
 include/migration/vmstate.h   |  1 -
 include/qemu/typedefs.h       |  1 -
 include/sysemu/sysemu.h       |  3 +--
 migration/block.c             | 17 ++
 migration/colo.c              |  7 +++---
 migration/migration.c         | 52 ---
 migration/savevm.c            | 18 +++
 qapi-schema.json              |  7 +-
 10 files changed, 68 insertions(+), 55 deletions(-)
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
On 2017/5/12 0:33, Juan Quintela wrote:
Hailiang Zhang <zhang.zhanghaili...@huawei.com> wrote:
Hi,

Hmm, you don't seem to have replaced this with anything. I think that's a behavioural change; the trick COLO did (I'm not sure if this is still the way it works) is that they initiate the first migration with block migration enabled so that the two hosts (with non-shared storage) get synced storage, and then at the completion of that first migration they switch into the checkpointing mode where they're only doing updates - that's why it gets switched off at this point prior to the 1st checkpoint. Weird, really.

I didn't catch that. Will investigate.

Yes, Dave is right. For non-shared disks, we need to enable block migration for the first cycle, to sync the disks of the two sides. After that, QEMU will go into COLO state, where we need to disable block migration.

v2 posted. My understanding is that it maintains the semantics; please test/comment.

Yes, it is right now, I have reviewed it, thanks.

Thanks, Juan.
Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams
Hi,

On 2017/5/4 16:51, Juan Quintela wrote:
"Dr. David Alan Gilbert" wrote:
* Juan Quintela (quint...@redhat.com) wrote:
We have changed in the previous patch to use migration capabilities for it. Notice that we continue using the old command line flags from the migrate command for the time being. Remove the set_params method as now it is empty.

Signed-off-by: Juan Quintela
---
 include/migration/migration.h |  3 +--
 migration/block.c             | 17 ++---
 migration/colo.c              |  3 ---
 migration/migration.c         |  8 +---
 migration/savevm.c            |  2 --
 5 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c19eb3f..5c6c2f0 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -332,9 +332,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
-    /* Disable block migration */
-    s->params.blk = 0;
-    s->params.shared = 0;

Hmm, you don't seem to have replaced this with anything. I think that's a behavioural change; the trick COLO did (I'm not sure if this is still the way it works) is that they initiate the first migration with block migration enabled so that the two hosts (with non-shared storage) get synced storage, and then at the completion of that first migration they switch into the checkpointing mode where they're only doing updates - that's why it gets switched off at this point prior to the 1st checkpoint. Weird, really.

I didn't catch that. Will investigate.

Yes, Dave is right. For non-shared disks, we need to enable block migration for the first cycle, to sync the disks of the two sides. After that, QEMU will go into COLO state, where we need to disable block migration.

Thanks,
Hailiang

Thanks.
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
Hi Jason,

On 2017/4/25 19:33, Jason Wang wrote:
On 2017/04/25 17:59, Hailiang Zhang wrote:
On 2017/4/25 16:41, Jason Wang wrote:
On 2017/04/24 14:03, Hailiang Zhang wrote:
On 2017/4/24 12:10, Jason Wang wrote:
On 2017/04/20 15:46, zhanghailiang wrote:

We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so it has a chance to handle the same watched fd concurrently with the main thread, which will trigger an error report:

"qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed."

Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems not elegant.

So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()?

Looks better, but I need more information, e.g. how the main thread can touch it.

Hmm, this happened quite occasionally, and we didn't catch the first place (backtrace) of removing the fd from being watched, but from the code logic, we found there should be such possible cases:

tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
 -> tcp_chr_disconnect (or char_socket_finalize)
  -> tcp_chr_free_connection
   -> remove_fd_in_watch(chr);

Anyway, it needs protection from being freed twice.

Thanks,
Hailiang

Still a little bit confused. The question is how could the main thread still call tcp_chr_write or the others in the above case?
Finally, we reproduced this bug (we use QEMU 2.6), and got the following backtrace of this problem:

(gdb) thread apply all bt

Thread 7 (Thread 0x7f407a1ff700 (LWP 23144)):
#0  0x7f41037e0db5 in _int_malloc () from /usr/lib64/libc.so.6
#1  0x7f41037e3b96 in calloc () from /usr/lib64/libc.so.6
#2  0x7f41041ad4d7 in g_malloc0 () from /usr/lib64/libglib-2.0.so.0
#3  0x7f41041a5437 in g_source_new () from /usr/lib64/libglib-2.0.so.0
#4  0x7f410a2cec9c in qio_channel_create_fd_watch (ioc=ioc@entry=0x7f410d6238c0, fd=20, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:259
#5  0x7f410a2ced01 in qio_channel_create_socket_watch (ioc=ioc@entry=0x7f410d6238c0, socket=, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:311
#6  0x7f410a2cbea7 in qio_channel_socket_create_watch (ioc=0x7f410d6238c0, condition=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-socket.c:732
#7  0x7f410a2c94d2 in qio_channel_create_watch (ioc=0x7f410d6238c0, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel.c:132
#8  0x7f410a003cd6 in io_watch_poll_prepare (source=0x7f407d00, timeout_=) at qemu-char.c:883
#9  0x7f41041a72ed in g_main_context_prepare () from /usr/lib64/libglib-2.0.so.0
#10 0x7f41041a7b7b in g_main_context_iterate.isra.24 () from /usr/lib64/libglib-2.0.so.0
#11 0x7f41041a7fba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#12 0x7f410a1e528f in colo_compare_thread (opaque=0x7f410d7d6800) at net/colo-compare.c:651
#13 0x7f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#14 0x7f410385971d in clone () from /usr/lib64/libc.so.6

Thread 6 (Thread 0x7f40799fe700 (LWP 19368)):
#0  0x7f4103b2f6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x7f410a3138d1 in qemu_cond_wait (cond=cond@entry=0x7f410cce44c0, mutex=mutex@entry=0x7f410cce44f0) at util/qemu-thread-posix.c:132
#2  0x7f410a22b1a3 in vnc_worker_thread_loop (queue=queue@entry=0x7f410cce44c0) at ui/vnc-jobs.c:228
#3  0x7f410a22b810 in vnc_worker_thread (arg=0x7f410cce44c0) at ui/vnc-jobs.c:335
#4  0x7f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x7f410385971d in clone () from /usr/lib64/libc.so.6

Thread 5 (Thread 0x7f407abff700 (LWP 19366)):
#0  0x7f4103b2f6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x7f410a3138d1 in qemu_cond_wait (cond=cond@entry=0x7f410a9fc368 <mlock_struct+40>, mutex=mutex@entry=0x7f410a9fc340 ) at util/qemu-thread-posix.c:132
#2  0x7f4109e99060 in mlock_wait () at /work/zhanghailiang/qemu-kvm/exec.c:392
#3  mlock_thread (opaque=) at /work/zhanghailiang/qemu-kvm/exec.c:407
#4  0x7f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x7f410385971d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7f40fcd83700 (LWP 19364)):
#0  0x7f4103b2f6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x7f410a3138d1 in qemu_cond_wait (cond=, mutex=mutex@entry=0x7f410aa66ca0 ) at util/qemu-thread-posix.c:132
#2  0x7f4109ed5b3b in qemu_kvm_wait_io_event (cpu=0x7f410c2bda30) at /work/zhanghailiang/qemu-kvm/cpus.c:1087
#3  qemu_kvm_cpu_thread_fn (arg=0x7f410c2bda30) at /work/zhanghailiang/qemu-kvm/cpus.c:1126
#4  0x7f4103b2bdc5 in start_thread () f
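One way to get the serialization this thread's subject asks for is a plain condition-variable hand-off: the main thread does not proceed until the compare thread has installed its handlers, so the two can never service the same fd concurrently. A self-contained pthreads sketch follows (illustrative only; QEMU's actual fix used different machinery, e.g. taking the iothread lock around qemu_chr_fe_set_handlers()):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t handlers_installed = PTHREAD_COND_INITIALIZER;
static bool worker_owns_fd;

/* Worker: take ownership of the watched fd, then announce it. */
static void *compare_thread(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&lock);
    /* ...the qemu_chr_fe_set_handlers() equivalent would go here... */
    worker_owns_fd = true;
    pthread_cond_signal(&handlers_installed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Main thread: spawn the worker and block until the hand-off is done,
 * so it never touches the fd after this point. */
static void main_thread_wait_for_handoff(pthread_t *tid)
{
    pthread_create(tid, NULL, compare_thread, NULL);
    pthread_mutex_lock(&lock);
    while (!worker_owns_fd) {
        pthread_cond_wait(&handlers_installed, &lock);
    }
    pthread_mutex_unlock(&lock);
}
```

The predicate loop around pthread_cond_wait() matters: it is correct even if the worker signals before the main thread starts waiting.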
Re: [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock
On 2017/4/26 15:32, Juan Quintela wrote:
Both the ram bitmap and the unsent bitmap are split by RAMBlock.

Signed-off-by: Juan Quintela
--
Fix compilation when DEBUG_POSTCOPY is enabled (thanks Hailiang)

Signed-off-by: Juan Quintela
---
 include/exec/ram_addr.h          |  13 +-
 include/migration/migration.h    |   3 +-
 include/migration/postcopy-ram.h |   3 -
 migration/postcopy-ram.c         |   5 +-
 migration/ram.c                  | 273 +++
 5 files changed, 119 insertions(+), 178 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6436a41..c56b35b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -39,6 +39,14 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     size_t page_size;
+    /* dirty bitmap used during migration */
+    unsigned long *bmap;
+    /* bitmap of pages that haven't been sent even once
+     * only maintained and used in postcopy at the moment
+     * where it's used to send the dirtymap at the start
+     * of the postcopy phase
+     */
+    unsigned long *unsentmap;
 };
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -360,16 +368,15 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 static inline
-uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
-                                               RAMBlock *rb,
+uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                                ram_addr_t start,
                                                ram_addr_t length,
                                                uint64_t *real_dirty_pages)
 {
     ram_addr_t addr;
-    start = rb->offset + start;
     unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
     uint64_t num_dirty = 0;
+    unsigned long *dest = rb->bmap;
     /* start address is aligned at the start of a word? */
     if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16c..e29cb01 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -266,7 +266,8 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
-void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected,
+                           unsigned long pages);
 /* For outgoing discard bitmap */
 int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..4c25f03 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -43,12 +43,9 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis);
 /*
  * Called at the start of each RAMBlock by the bitmap code.
- * 'offset' is the bitmap offset of the named RAMBlock in the migration
- * bitmap.
  * Returns a new PDS
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name);
 /*
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..e3f4a37 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -33,7 +33,6 @@
 struct PostcopyDiscardState {
     const char *ramblock_name;
-    uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
     uint16_t cur_entry;
     /*
      * Start and length of a discard range (bytes)
@@ -717,14 +716,12 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
  * returns: a new PDS.
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name)
 {
     PostcopyDiscardState *res = g_malloc0(sizeof(PostcopyDiscardState));
     if (res) {
         res->ramblock_name = name;
-        res->offset = offset;
     }
     return res;
@@ -745,7 +742,7 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 {
     size_t tp_size = qemu_target_page_size();
     /* Convert to byte offsets within the RAM block */
-    pds->start_list[pds->cur_entry] = (start - pds->offset) * tp_size;
+    pds->start_list[pds->cur_entry] = start * tp_size;
     pds->length_list[pds->cur_entry] = length * tp_size;
     trace_postcopy_discard_send_range(pds->ramblock_name, start, length);
     pds->cur_entry++;
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..235f400 100644
---
Re: [Qemu-devel] [PATCH RESEND v2 01/18] net/colo: Add notifier/callback related helpers for filter
On 2017/4/25 19:40, Jason Wang wrote:
On 2017/04/22 16:35, zhanghailiang wrote:
We will use this notifier to help COLO notify filter objects to do something, like do a checkpoint, or process a failover event.

Cc: Jason Wang
Signed-off-by: zhanghailiang
Signed-off-by: Zhang Chen
Signed-off-by: Li Zhijian
---
 net/colo.c | 105 +
 net/colo.h |  19 +++
 2 files changed, 124 insertions(+)

Went through this series and I feel that the code duplicates (at least part of) the functionality of IOThread (iothread.c). So I come to an idea: switch to using IOThread for the COLO comparing thread, then you can use aio BHs to do the inter-process communication. Thoughts?

Interesting idea, but after investigation, we found it will make things quite complex if we use IOThread for net compare (actually, I don't think we can use it directly for COLO). It seems to be special for block; you can see some code calls block helpers. Besides, we still have to realize a way to communicate with the COLO frame for filters (I mean patches 16/17/18), even if we use it after reconstructing the IOThread part. ;)

Thanks,
Hailiang

Thanks.
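For readers unfamiliar with the notifier pattern referenced above, here is a minimal self-contained version in the spirit of QEMU's Notifier/NotifierList (util/notify.c); the names are illustrative and are not the actual net/colo.c helpers from the patch:

```c
#include <stddef.h>

/* A subscriber in a singly linked notifier list. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

static void notifier_list_add(NotifierList *list, Notifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Fire every registered callback, e.g. "do checkpoint" or "failover". */
static void notifier_list_notify(NotifierList *list, void *data)
{
    for (Notifier *n = list->head; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Example subscriber: a filter that counts checkpoint events. */
static int checkpoint_events;

static void on_checkpoint(Notifier *n, void *data)
{
    (void)n;
    (void)data;
    checkpoint_events++;
}
```

The appeal of the pattern for COLO is decoupling: the COLO frame fires one notify call and every registered filter reacts, without the frame knowing which filters exist.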
Re: [Qemu-devel] [PATCH v2 05/18] COLO: Handle shutdown command for VM in COLO state
On 2017/4/24 22:51, Eric Blake wrote:
On 04/22/2017 03:25 AM, zhanghailiang wrote:
If the VM is in COLO FT state, we need to do some extra work before starting the normal shutdown process. The Secondary VM will ignore the shutdown command if users issue it directly to the Secondary VM. COLO will capture the shutdown command after a shutdown request from the user.

Cc: Paolo Bonzini
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
---
+++ b/qapi-schema.json
@@ -1187,12 +1187,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown requested from PVM to SVM. (Since 2.9)

You missed 2.9. Please fix this to state 2.10.

OK, will fix in the next version, thanks.

+#
 # Since: 2.8
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
             'vmstate-send', 'vmstate-size', 'vmstate-received',
-            'vmstate-loaded' ] }
+            'vmstate-loaded', 'guest-shutdown' ] }
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
On 2017/4/25 19:33, Jason Wang wrote:
On 2017/04/25 17:59, Hailiang Zhang wrote:
On 2017/4/25 16:41, Jason Wang wrote:
On 2017/04/24 14:03, Hailiang Zhang wrote:
On 2017/4/24 12:10, Jason Wang wrote:
On 2017/04/20 15:46, zhanghailiang wrote:

We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so it has a chance to handle the same watched fd concurrently with the main thread, which will trigger an error report:

"qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed."

Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems not elegant.

So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()?

Looks better, but I need more information, e.g. how the main thread can touch it.

Hmm, this happened quite occasionally, and we didn't catch the first place (backtrace) of removing the fd from being watched, but from the code logic, we found there should be such possible cases:

tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
 -> tcp_chr_disconnect (or char_socket_finalize)
  -> tcp_chr_free_connection
   -> remove_fd_in_watch(chr);

Anyway, it needs protection from being freed twice.

Thanks,
Hailiang

Still a little bit confused. The question is how could the main thread still call tcp_chr_write or the others in the above case?

The 'char_socket_finalize'? Hmm, I'd better reproduce it again to get the first time of removing the fd being watched...

Thanks

Thanks.
Re: [Qemu-devel] [PATCH] COLO-compare: Add compare_lock aviod comparison conflict
On 2017/4/25 19:57, Zhang Chen wrote:
On 04/20/2017 02:40 PM, Jason Wang wrote:
On 2017/04/20 14:36, Zhang Chen wrote:
On 04/20/2017 02:20 PM, Hailiang Zhang wrote:
On 2017/4/20 12:32, Zhang Chen wrote:

When network traffic is heavy, compare_pri_rs_finalize() and compare_sec_rs_finalize() have a chance to conflict. Both of them call colo_compare_connection() to compare packets, but if, during a compare_pri_rs_finalize() comparison, a secondary packet comes in and compare_sec_rs_finalize() is called, that packet will be handled twice. If the packets are the same, the pkt will be double freed.

Interesting. If I'm right, this should not happen, because all the comparing work is done in the colo compare thread, so there is no chance to access the connect_list concurrently. Besides, even if both of the packets from primary and secondary arrive at the same time, it should only be handled once; we will handle it with the later arrived one, no?

No, in my test this bug is often triggered; you can use a udp server and client to test it.

13517@1492648526.850246:colo_compare_main : packet same and release packet
13517@1492648526.850304:colo_compare_main : packet same and release packet
*** glibc detected *** /home/zhangchen/qemu-colo-apr14/x86_64-softmmu/qemu-system-x86_64: double free or corruption (out): 0x56a75210 ***
=== Backtrace: =
/lib64/libc.so.6(+0x76628)[0x753d6628]
/lib64/libc.so.6(cfree+0x6c)[0x753db5cc]

Thanks
Zhang Chen

I agree that you should check whether or not they are running in the same thread.

I found they are not running in the same thread, and I have reviewed the relative code but can't find out why we do the same job for pri_chr_in and sec_chr_in yet they run in different threads. Can anyone tell me the reason?
Log:

Breakpoint 5, compare_pri_chr_in (opaque=0x77fd1010, buf=0x73eb0950 "", size=8) at net/colo-compare.c:591
591     {
(gdb) info thread
  Id   Target Id         Frame
  18   Thread 0x7fff70bff700 (LWP 27864) "qemu-system-x86" 0x756e4ae7 in sem_timedwait () from /lib64/libpthread.so.0
  9    Thread 0x7fff6f1ff700 (LWP 27748) "qemu-system-x86" 0x756e4a00 in sem_wait () from /lib64/libpthread.so.0
  7    Thread 0x7fff701ff700 (LWP 27746) "qemu-system-x86" 0x756e261c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x72dbf700 (LWP 27743) "qemu-system-x86" 0x754320a7 in ioctl () from /lib64/libc.so.6
  4    Thread 0x735c0700 (LWP 27742) "qemu-system-x86" 0x756e5dbd in sendmsg () from /lib64/libpthread.so.0
* 3    Thread 0x73eb2700 (LWP 27741) "qemu-system-x86" compare_pri_chr_in (opaque=0x77fd1010, buf=0x73eb0950 "", size=8) at net/colo-compare.c:591
  2    Thread 0x746b3700 (LWP 27729) "qemu-system-x86" 0x75436789 in syscall () from /lib64/libc.so.6
  1    Thread 0x77fb7a80 (LWP 27725) "qemu-system-x86" 0x756e5294 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  compare_pri_chr_in (opaque=0x77fd1010, buf=0x73eb0950 "", size=8) at net/colo-compare.c:591
#1  0x55c60fba in qemu_chr_be_write_impl (s=0x5684a630, buf=0x73eb0950 "", len=8) at chardev/char.c:284
#2  0x55c6102f in qemu_chr_be_write (s=0x5684a630, buf=0x73eb0950 "", len=8) at chardev/char.c:296
#3  0x55c6a056 in tcp_chr_read (chan=0x5684aa30, cond=G_IO_IN, opaque=0x5684a630) at chardev/char-socket.c:414
#4  0x55c83dbc in qio_channel_fd_source_dispatch (source=0x568d8b80, callback=0x55c69ebf , user_data=0x5684a630) at io/channel-watch.c:84
#5  0x760c460a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#6  0x760c7e88 in ?? () from /usr/lib64/libglib-2.0.so.0
#7  0x760c835d in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#8  0x55b82e22 in colo_compare_thread (opaque=0x77fd1010) at net/colo-compare.c:703
#9  0x756de7b6 in start_thread () from /lib64/libpthread.so.0
#10 0x75439d6d in clone () from /lib64/libc.so.6
#11 0x in ?? ()
(gdb) c
Continuing.
[Switching to Thread 0x77fb7a80 (LWP 27725)]

Breakpoint 6, compare_sec_chr_in (opaque=0x77fd1010, buf=0x7fffc590 "", size=1088) at net/colo-compare.c:608
608     {
(gdb) info thread
  Id   Target Id         Frame
  18   Thread 0x7fff70bff700 (LWP 27864) "qemu-system-x86" (Exiting) 0x756de9b3 in start_thread () from /lib64/libpthread.so.0
  9    Thread 0x7fff6f1ff700 (LWP 27748) "qemu-system-x86" 0x756e4a00 in sem_wait () from /lib64/libpthread.so.0
  7    Thread 0x7fff701ff700 (LWP 27746) "qemu-system-x86" 0x756e261c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
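The double free in the trace above happens because two contexts can both conclude "packets match, release them". Whatever form the final compare_lock takes, the core of the fix is a release-once discipline under a lock, sketched here in self-contained C (illustrative, not the actual colo-compare code):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* A connection owning at most one pending packet buffer. */
typedef struct {
    pthread_mutex_t lock;
    void *pkt;
} Connection;

/* Release the packet at most once, even if called from two contexts.
 * Returns true if this caller actually freed it. */
static bool release_packet_once(Connection *conn)
{
    bool released = false;

    pthread_mutex_lock(&conn->lock);
    if (conn->pkt) {
        free(conn->pkt);
        conn->pkt = NULL;   /* mark released: a second caller is a no-op */
        released = true;
    }
    pthread_mutex_unlock(&conn->lock);
    return released;
}
```

Clearing the pointer inside the critical section is what prevents the "double free or corruption" glibc abort: the second finalize path sees NULL and backs off instead of freeing again.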
Re: [Qemu-devel] [PATCH] ram: Split dirty bitmap by RAMBlock
On 2017/4/25 18:11, Juan Quintela wrote:
Both the ram bitmap and the unsent bitmap are split by RAMBlock.

Signed-off-by: Juan Quintela
---
 include/exec/ram_addr.h          |  13 +-
 include/migration/postcopy-ram.h |   3 -
 migration/postcopy-ram.c         |   5 +-
 migration/ram.c                  | 257 +++
 4 files changed, 109 insertions(+), 169 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6436a41..c56b35b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -39,6 +39,14 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     size_t page_size;
+    /* dirty bitmap used during migration */
+    unsigned long *bmap;
+    /* bitmap of pages that haven't been sent even once
+     * only maintained and used in postcopy at the moment
+     * where it's used to send the dirtymap at the start
+     * of the postcopy phase
+     */
+    unsigned long *unsentmap;
 };
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -360,16 +368,15 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 static inline
-uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
-                                               RAMBlock *rb,
+uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                                ram_addr_t start,
                                                ram_addr_t length,
                                                uint64_t *real_dirty_pages)
 {
     ram_addr_t addr;
-    start = rb->offset + start;
     unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
     uint64_t num_dirty = 0;
+    unsigned long *dest = rb->bmap;
     /* start address is aligned at the start of a word? */
     if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..4c25f03 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -43,12 +43,9 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis);
 /*
  * Called at the start of each RAMBlock by the bitmap code.
- * 'offset' is the bitmap offset of the named RAMBlock in the migration
- * bitmap.
  * Returns a new PDS
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name);
 /*
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..e3f4a37 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -33,7 +33,6 @@
 struct PostcopyDiscardState {
     const char *ramblock_name;
-    uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
     uint16_t cur_entry;
     /*
      * Start and length of a discard range (bytes)
@@ -717,14 +716,12 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
  * returns: a new PDS.
  */
 PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
-                                                 unsigned long offset,
                                                  const char *name)
 {
     PostcopyDiscardState *res = g_malloc0(sizeof(PostcopyDiscardState));
     if (res) {
         res->ramblock_name = name;
-        res->offset = offset;
     }
     return res;
@@ -745,7 +742,7 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 {
     size_t tp_size = qemu_target_page_size();
     /* Convert to byte offsets within the RAM block */
-    pds->start_list[pds->cur_entry] = (start - pds->offset) * tp_size;
+    pds->start_list[pds->cur_entry] = start * tp_size;
     pds->length_list[pds->cur_entry] = length * tp_size;
     trace_postcopy_discard_send_range(pds->ramblock_name, start, length);
     pds->cur_entry++;
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..d99f6e2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -138,19 +138,6 @@ out:
     return ret;
 }
-struct RAMBitmap {
-    struct rcu_head rcu;
-    /* Main migration bitmap */
-    unsigned long *bmap;
-    /* bitmap of pages that haven't been sent even once
-     * only maintained and used in postcopy at the moment
-     * where it's used to send the dirtymap at the start
-     * of the postcopy phase
-     */
-    unsigned long *unsentmap;
-};
-typedef struct RAMBitmap RAMBitmap;
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -220,8 +207,6 @@ struct RAMState {
     uint64_t postcopy_requests;
     /* protects modification of the bitmap */
     QemuMutex bitmap_mutex;
-    /* Ram Bitmap protected by RCU */
-    RAMBitmap *ram_bitmap;
     /* The RAMBlock used in the last src_page_requests */
Re: [Qemu-devel] [PATCH RESEND v2 08/18] ram/COLO: Record the dirty pages that SVM received
On 2017/4/25 2:29, Juan Quintela wrote: zhanghailiang wrote: We record the addresses of the dirty pages that are received; this helps with flushing the pages cached in the SVM. The trick here is that we record dirty pages by re-using the migration dirty bitmap. In a later patch, we will start dirty logging for the SVM, just like migration; in this way we can record the dirty pages caused by both the PVM and the SVM, and we only flush those dirty pages from the RAM cache while doing a checkpoint. Cc: Juan Quintela Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert --- migration/ram.c | 29 + 1 file changed, 29 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 05d1b06..0653a24 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2268,6 +2268,9 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, static inline void *colo_cache_from_block_offset(RAMBlock *block, ram_addr_t offset) { +unsigned long *bitmap; +long k; + if (!offset_in_ramblock(block, offset)) { return NULL; } @@ -2276,6 +2279,17 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block, __func__, block->idstr); return NULL; } + +k = (memory_region_get_ram_addr(block->mr) + offset) >> TARGET_PAGE_BITS; +bitmap = atomic_rcu_read(&ram_state.ram_bitmap)->bmap; +/* + * During a colo checkpoint, we need the bitmap of these migrated pages. + * It helps us decide which pages in the ram cache should be flushed + * into the VM's RAM later. + */ +if (!test_and_set_bit(k, bitmap)) { +ram_state.migration_dirty_pages++; +} return block->colo_cache + offset; } @@ -2752,6 +2766,15 @@ int colo_init_ram_cache(void) memcpy(block->colo_cache, block->host, block->used_length); } rcu_read_unlock(); +/* + * Record the dirty pages sent by the PVM; we use this dirty bitmap to + * decide which pages in the cache should be flushed into the SVM's RAM. + * Here we use the same name 'ram_bitmap' as for migration.
+ */ +ram_state.ram_bitmap = g_new0(RAMBitmap, 1); +ram_state.ram_bitmap->bmap = bitmap_new(last_ram_page()); +ram_state.migration_dirty_pages = 0; + return 0; out_locked: @@ -2770,6 +2793,12 @@ out_locked: void colo_release_ram_cache(void) { RAMBlock *block; +RAMBitmap *bitmap = ram_state.ram_bitmap; + +atomic_rcu_set(&ram_state.ram_bitmap, NULL); +if (bitmap) { +call_rcu(bitmap, migration_bitmap_free, rcu); +} rcu_read_lock(); QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { You can see my split bitmap patches; I am splitting the dirty bitmap per block. I think it shouldn't make your life more difficult, but please take a look. OK, I'll look at it. I am wondering if it is faster/easier to use the page_cache.c that xbzrle uses to store the dirty pages instead of copying the whole RAMBlocks, but I don't really know. Hmm, yes, it takes a long time (depending on the VM's memory size) to back up the whole VM's memory into the cache. We can reduce that time by backing up pages one by one while loading them during the first live-migration round, because we know whether the user enabled COLO at the beginning of the first migration stage. I'd like to send those optimizations later in another series... Thanks, Hailiang Thanks, Juan.
Re: [Qemu-devel] [PATCH RESEND v2 07/18] COLO: Load dirty pages into SVM's RAM cache firstly
On 2017/4/25 2:27, Juan Quintela wrote: zhanghailiang wrote: We should not load the PVM's state directly into the SVM, because errors may happen while the SVM is receiving data, which would break the SVM. We need to ensure all data is received before loading the state into the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on the secondary side is initially the same as the SVM/PVM's memory. During a checkpoint, we first cache the PVM's dirty pages into this RAM cache, so the cache is always the same as the PVM's memory at every checkpoint; we then flush the cached RAM to the SVM after we receive all of the PVM's state. Cc: Dr. David Alan Gilbert Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian --- v2: - Move colo_init_ram_cache() and colo_release_ram_cache() out of the incoming thread since both of them need the global lock; if we keep colo_release_ram_cache() in the incoming thread, there is a potential deadlock. - Remove the bool ram_cache_enable flag, use migration_incoming_in_state() instead. - Remove the Reviewed-by tag because of the above changes. +out_locked: +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { +if (block->colo_cache) { +qemu_anon_ram_free(block->colo_cache, block->used_length); +block->colo_cache = NULL; +} +} + +rcu_read_unlock(); +return -errno; +} + +/* The global lock must be held to call this helper */ +void colo_release_ram_cache(void) +{ +RAMBlock *block; + +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { +if (block->colo_cache) { +qemu_anon_ram_free(block->colo_cache, block->used_length); +block->colo_cache = NULL; +} +} +rcu_read_unlock(); +} Create a function from the creation/removal? We have exactly two copies of the same code. Right now the code inside the function is very small, but it could be bigger, no? Yes, we add more code in the next patch (patch 08). :) Later, Juan.
Re: [Qemu-devel] [PATCH RESEND v2 04/18] COLO: integrate colo compare with colo frame
On 2017/4/25 2:18, Juan Quintela wrote: zhanghailiang wrote: For COLO FT, both the PVM and SVM run at the same time, and we only sync the state when needed. So here, let the SVM run while not doing a checkpoint, and change DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100. Besides, we forgot to release colo_checkpoint_sem and colo_delay_timer; fix them here. Cc: Jason Wang Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert diff --git a/migration/migration.c b/migration/migration.c index 353f272..2ade2aa 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -70,7 +70,7 @@ /* The delay time (in ms) between two COLO checkpoints * Note: Please change this default value to 1 when we support hybrid mode. */ -#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200 +#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100) static NotifierList migration_state_notifiers = NOTIFIER_LIST_INITIALIZER(migration_state_notifiers); 1000 or 200 * 100? Please fix either the value or the comment. OK, will fix in the next version, thanks. Later, Juan.
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
On 2017/4/25 16:41, Jason Wang wrote: On April 24, 2017 at 14:03, Hailiang Zhang wrote: On 2017/4/24 12:10, Jason Wang wrote: On April 20, 2017 at 15:46, zhanghailiang wrote: We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so the main thread has a chance to handle the same watched fd concurrently, which will trigger an error report: "qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed." Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems inelegant. So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()? Looks better, but I need more information, e.g. how can the main thread touch it? Hmm, this happened only occasionally, and we didn't catch the first place (backtrace) where the fd was removed from being watched, but from the code's logic we found these possible cases: tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect) -> tcp_chr_disconnect (or char_socket_finalize) -> tcp_chr_free_connection -> remove_fd_in_watch(chr); Anyway, it needs protection from being freed twice. Thanks, Hailiang Thanks
Re: [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"
On 2017/4/24 15:59, Kashyap Chamarthy wrote: On Sat, Apr 22, 2017 at 05:23:49PM +0800, Hailiang Zhang wrote: Hi, Hi Hailiang, I think the below patch can fix your problem. [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH https://patchwork.kernel.org/patch/9591885/ Hmm, the above patch ("qmp-cont: invalidate on RUN_STATE_PRELAUNCH") is not merged in Git, as it's stalled on a design discussion between Kevin Wolf and Vladimir. And the below patch, from you, seems not to have been submitted upstream (2.8 stable tree, perhaps). Do you intend to do so? Er, since this patch does the same thing as the above patch, I'm not sure if I should send it ... Actually, we encountered the same problem in our tests; we fixed it with the following patch: From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001 From: zhanghailiang <zhang.zhanghaili...@huawei.com> Date: Tue, 21 Mar 2017 09:44:36 +0800 Subject: [PATCH] migration: Re-activate blocks whenever migration is cancelled In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we tried to fix the bug 'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed', which occurred in the migration-cancelling process. But it seems we didn't cover all the cases; we caught a case in our tests that slipped past the old fixup: if libvirtd cancels the migration of a shutting-down VM, it sends the 'system_reset' command first and then the 'cont' command; after the VM resumes running, it triggers the above error report, because we didn't regain control of the VM's block devices. Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> Signed-off-by: Hongyang Yang <yanghongy...@huawei.com> --- block.c | 12 +++- include/block/block.h | 1 + include/migration/migration.h | 3 --- migration/migration.c | 7 +-- qmp.c | 4 +--- 5 files changed, 14 insertions(+), 13 deletions(-) [...]
Re: [Qemu-devel] [PATCH 2/3] colo-compare: Check main_loop value before call g_main_loop_quit
On 2017/4/24 12:13, Jason Wang wrote: On April 20, 2017 at 15:46, zhanghailiang wrote: If an error happens before main_loop is initialized in the colo-compare thread, QEMU goes into the finalizing process, where we call g_main_loop_quit(s->main_loop); if main_loop is NULL, there will be an error report: "(process:14861): GLib-CRITICAL **: g_main_loop_quit: assertion 'loop != NULL' failed". We need to check whether main_loop is NULL before calling g_main_loop_quit(). Do we need to check and fail early in colo_compare_thread() too? Yes, we need to check there too; will add the check in the next version, thanks. Thanks Signed-off-by: zhanghailiang --- net/colo-compare.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index a6bf419..d6a5e4c 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -770,7 +770,9 @@ static void colo_compare_finalize(Object *obj) s->worker_context, true); qemu_chr_fe_deinit(&s->chr_out); -g_main_loop_quit(s->compare_loop); +if (s->compare_loop) { +g_main_loop_quit(s->compare_loop); +} qemu_thread_join(&s->thread); /* Release all unhandled packets after the compare thread exited */
Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
On 2017/4/24 12:10, Jason Wang wrote: On April 20, 2017 at 15:46, zhanghailiang wrote: We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used to detach the watched fd from the default main context, so the main thread has a chance to handle the same watched fd concurrently, which will trigger an error report: "qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed." Any way to prevent the fd from being handled by the main thread before creating the colo thread? Using a semaphore seems inelegant. So how about calling qemu_mutex_lock_iothread() before qemu_chr_fe_set_handlers()? Thanks Fix it by serializing the compare thread's initialization with the main thread. Signed-off-by: zhanghailiang --- net/colo-compare.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index 54e6d40..a6bf419 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -83,6 +83,7 @@ typedef struct CompareState { GHashTable *connection_track_table; /* compare thread, a thread for each NIC */ QemuThread thread; +QemuSemaphore thread_ready; GMainContext *worker_context; GMainLoop *compare_loop; @@ -557,6 +558,8 @@ static void *colo_compare_thread(void *opaque) (GSourceFunc)check_old_packet_regular, s, NULL); g_source_attach(timeout_source, s->worker_context); +qemu_sem_post(&s->thread_ready); + g_main_loop_run(s->compare_loop); g_source_unref(timeout_source); @@ -707,12 +710,15 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) connection_key_equal, g_free, connection_destroy); +qemu_sem_init(&s->thread_ready, 0); sprintf(thread_name, "colo-compare %d", compare_id); qemu_thread_create(&s->thread, thread_name, colo_compare_thread, s, QEMU_THREAD_JOINABLE); compare_id++; +qemu_sem_wait(&s->thread_ready); +qemu_sem_destroy(&s->thread_ready); return; } .
Re: [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"
Hi, I think the below patch can fix your problem. [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH https://patchwork.kernel.org/patch/9591885/ Actually, we encountered the same problem in our tests; we fixed it with the following patch: From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001 From: zhanghailiang Date: Tue, 21 Mar 2017 09:44:36 +0800 Subject: [PATCH] migration: Re-activate blocks whenever migration is cancelled In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we tried to fix the bug 'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed', which occurred in the migration-cancelling process. But it seems we didn't cover all the cases; we caught a case in our tests that slipped past the old fixup: if libvirtd cancels the migration of a shutting-down VM, it sends the 'system_reset' command first and then the 'cont' command; after the VM resumes running, it triggers the above error report, because we didn't regain control of the VM's block devices.
Signed-off-by: zhanghailiang Signed-off-by: Hongyang Yang --- block.c | 12 +++- include/block/block.h | 1 + include/migration/migration.h | 3 --- migration/migration.c | 7 +-- qmp.c | 4 +--- 5 files changed, 14 insertions(+), 13 deletions(-) diff --git a/block.c b/block.c index 6e906ec..c2555b0 100644 --- a/block.c +++ b/block.c @@ -3875,6 +3875,13 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) } } +static bool is_inactivated; + +bool bdrv_is_inactivated(void) +{ +return is_inactivated; +} + void bdrv_invalidate_cache_all(Error **errp) { BlockDriverState *bs; @@ -3892,6 +3899,7 @@ void bdrv_invalidate_cache_all(Error **errp) return; } } +is_inactivated = false; } static int bdrv_inactivate_recurse(BlockDriverState *bs, @@ -3948,7 +3956,9 @@ out: for (bs = bdrv_first(); bs; bs = bdrv_next()) { aio_context_release(bdrv_get_aio_context(bs)); } - +if (!ret) { +is_inactivated = true; +} return ret; } diff --git a/include/block/block.h b/include/block/block.h index 5149260..f77b57f 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -365,6 +365,7 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf); void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp); void bdrv_invalidate_cache_all(Error **errp); int bdrv_inactivate_all(void); +bool bdrv_is_inactivated(void); /* Ensure contents are flushed to disk. 
*/ int bdrv_flush(BlockDriverState *bs); diff --git a/include/migration/migration.h b/include/migration/migration.h index 5720c88..a9a2071 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -183,9 +183,6 @@ struct MigrationState /* Flag set once the migration thread is running (and needs joining) */ bool migration_thread_running; -/* Flag set once the migration thread called bdrv_inactivate_all */ -bool block_inactive; - /* Queue of outstanding page requests from the destination */ QemuMutex src_page_req_mutex; QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; diff --git a/migration/migration.c b/migration/migration.c index 54060f7..7c3d593 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1015,14 +1015,12 @@ static void migrate_fd_cancel(MigrationState *s) if (s->state == MIGRATION_STATUS_CANCELLING && f) { qemu_file_shutdown(f); } -if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) { +if (bdrv_is_inactivated()) { Error *local_err = NULL; bdrv_invalidate_cache_all(_err); if (local_err) { error_report_err(local_err); -} else { -s->block_inactive = false; } } } @@ -1824,7 +1822,6 @@ static void migration_completion(MigrationState *s, int current_active_state, if (ret >= 0) { qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX); qemu_savevm_state_complete_precopy(s->to_dst_file, false); -s->block_inactive = true; } } qemu_mutex_unlock_iothread(); @@ -1879,8 +1876,6 @@ fail_invalidate: bdrv_invalidate_cache_all(_err);
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
On 2017/4/20 13:15, Jason Wang wrote: On 2017年04月18日 14:58, Hailiang Zhang wrote: On 2017/4/18 11:55, Jason Wang wrote: On 2017年04月17日 19:04, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 14:38, Jason Wang wrote: On 2017年04月14日 14:22, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Yes, that's kind of what I want. We don't want to use two message format. Passing a opened fd need management support, we still need a fallback if there's no management on top. For qemu/kvm, we can do all stuffs transparent to the cli by e.g socketpair() or others, but the key is to have a unified message format. 
After a deeper investigation, i think we can re-use most codes, since there is no existing way to notify xen (no ?), we still needs notify chardev socket (Be used to notify xen, it is optional.) (http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen notify chardev socket handler frame") Yes and actually you can use this for bi-directional communication. The only differences is the implementation of comparing. Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', I don't see this in master? Er, it has been merged already, please see migration/colo.c, void qmp_xen_colo_do_checkpoint(Error **errp); Aha, I see. Thanks. ;) we can re-use it to notify colo-compare objects and other filter objects to do checkpoint, for the opposite direction, we use the notify chardev socket (Only for xen). Just want to make sure I understand the design, who will trigger this command? Management? The command will be issued by XEN (xc_save ?), the original existing xen-colo-do-checkpoint command now only be used to notify block replication to do checkpoint, we can re-use it for filters too. So it was called by management. For KVM case, we probably don't need this since the comparing thread are under control of qemu. Yes, you are right. Can we just use the socket? I don't quite understand ... Just as the codes showed bellow, in this scenario, XEN notifies colo-compare and fiters do checkpoint by using qmp command, Yes, that's just what I mean. Technically Xen can use socket to do this too. Yes, great, since we have come to an agreement on the scenario, I'll update this series. Thanks, Hailiang. Thanks and colo-compare notifies XEN about net inconsistency event by using the new socket. 
So the codes will be like: diff --git a/migration/colo.c b/migration/colo.c index 91da936..813c281 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp) void qmp_xen_colo_do_checkpoint(Error **errp) { +Error *local_err = NULL; + replication_do_checkpoint_all(errp); +/* Notify colo-compare and other filters to do checkpoint */ +colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +colo_notify_filters_event(COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +} } static void colo_send_message(QEMUFile *f, COLOMessage msg, diff --git a/net/colo-compare.c b/net/colo-compare.c index 24e13f0..de975c5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void) { notifier_list_notify(_compare_notifiers, migrate_get_current()); KVM will use this notifier/callback way, and in this way, we can avoid the redundant socket. +if (s->notify_dev) { + /* Do something, notify remote side through notify dev */ +} } If we have a notify socket configured, we will send the message about net inconsistent eve
Re: [Qemu-devel] [PATCH 12/15] savevm: split the process of different stages for loadvm/savevm
On 2017/4/20 17:09, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: On 2017/4/8 1:18, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: There are several stages during loadvm/savevm process. In different stage, migration incoming processes different types of sections. We want to control these stages more accuracy, it will benefit COLO performance, we don't have to save type of QEMU_VM_SECTION_START sections everytime while do checkpoint, besides, we want to separate the process of saving/loading memory and devices state. So we add three new helper functions: qemu_loadvm_state_begin(), qemu_load_device_state() and qemu_savevm_live_state() to achieve different process during migration. Besides, we make qemu_loadvm_state_main() and qemu_save_device_state() public. Cc: Juan Quintela <quint...@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com> Reviewed-by: Dr. 
David Alan Gilbert <dgilb...@redhat.com> --- include/sysemu/sysemu.h | 6 ++ migration/savevm.c | 55 ++--- 2 files changed, 54 insertions(+), 7 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 7ed665a..95cae41 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -132,7 +132,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, uint64_t *start_list, uint64_t *length_list); +void qemu_savevm_live_state(QEMUFile *f); +int qemu_save_device_state(QEMUFile *f); + int qemu_loadvm_state(QEMUFile *f); +int qemu_loadvm_state_begin(QEMUFile *f); +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); +int qemu_load_device_state(QEMUFile *f); extern int autostart; diff --git a/migration/savevm.c b/migration/savevm.c index 9c2d239..dac478b 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -54,6 +54,7 @@ #include "qemu/cutils.h" #include "io/channel-buffer.h" #include "io/channel-file.h" +#include "migration/colo.h" #ifndef ETH_P_RARP #define ETH_P_RARP 0x8035 @@ -1279,13 +1280,21 @@ done: return ret; } -static int qemu_save_device_state(QEMUFile *f) +void qemu_savevm_live_state(QEMUFile *f) { -SaveStateEntry *se; +/* save QEMU_VM_SECTION_END section */ +qemu_savevm_state_complete_precopy(f, true); +qemu_put_byte(f, QEMU_VM_EOF); +} -qemu_put_be32(f, QEMU_VM_FILE_MAGIC); -qemu_put_be32(f, QEMU_VM_FILE_VERSION); +int qemu_save_device_state(QEMUFile *f) +{ +SaveStateEntry *se; +if (!migration_in_colo_state()) { +qemu_put_be32(f, QEMU_VM_FILE_MAGIC); +qemu_put_be32(f, QEMU_VM_FILE_VERSION); +} Note that got split out into qemu_savevm_state_header() at some point. Do you mean i should use the wrapper qemu_savevm_state_heade() here ? Yes, I think so; best to keep the code that writes the file headers in one place. OK, Will fix in next version, thanks. 
Dave Dave cpu_synchronize_all_states(); QTAILQ_FOREACH(se, _state.handlers, entry) { @@ -1336,8 +1345,6 @@ enum LoadVMExitCodes { LOADVM_QUIT = 1, }; -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); - /* -- incoming postcopy messages -- */ /* 'advise' arrives before any transfers just to tell us that a postcopy * *might* happen - it might be skipped if precopy transferred everything @@ -1942,7 +1949,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis) return 0; } -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) { uint8_t section_type; int ret = 0; @@ -2080,6 +2087,40 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +int qemu_loadvm_state_begin(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +Error *local_err = NULL; +int ret; + +if (qemu_savevm_state_blocked(_err)) { +error_report_err(local_err); +return -EINVAL; +} +/* Load QEMU_VM_SECTION_START section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to loadvm begin work: %d", ret); +} +return ret; +} + +int qemu_load_device_state(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +int ret; + +/* Load QEMU_VM_SECTION_FULL section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to load device state: %d", ret); +return ret; +} + +cpu_synchronize_all_post_init(); +return 0; +} + int save_vmstate(Monitor *mon,
Re: [Qemu-devel] [PATCH] COLO-compare: Add compare_lock aviod comparison conflict
On 2017/4/20 12:32, Zhang Chen wrote: When network traffic is heavy, compare_pri_rs_finalize() and compare_sec_rs_finalize() have a chance to conflict. Both of them call colo_compare_connection() to compare packets, but if a secondary packet arrives and calls compare_sec_rs_finalize() while a compare_pri_rs_finalize() comparison is in progress, that packet will be handled twice. If the packets are the same, the packet will be double-freed. Interesting; if I'm right, this should not happen, because all the comparison work is done in the colo-compare thread, so there is no chance of accessing conn_list concurrently. Besides, even if the packets from the primary and secondary arrive at the same time, the comparison should only be handled once; we handle it with the later-arriving one, no? Signed-off-by: Zhang Chen --- net/colo-compare.c | 8 1 file changed, 8 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index 54e6d40..686c1b4 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -79,6 +79,8 @@ typedef struct CompareState { * element type: Connection */ GQueue conn_list; +/* compare lock */ +QemuMutex compare_lock; /* hashtable to save connection */ GHashTable *connection_track_table; /* compare thread, a thread for each NIC */ @@ -619,7 +621,9 @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs) compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len); } else { /* compare connection */ +qemu_mutex_lock(&s->compare_lock); g_queue_foreach(&s->conn_list, colo_compare_connection, s); +qemu_mutex_unlock(&s->compare_lock); } } @@ -631,7 +635,9 @@ static void compare_sec_rs_finalize(SocketReadState *sec_rs) trace_colo_compare_main("secondary: unsupported packet in"); } else { /* compare connection */ +qemu_mutex_lock(&s->compare_lock); g_queue_foreach(&s->conn_list, colo_compare_connection, s); +qemu_mutex_unlock(&s->compare_lock); } } @@ -702,6 +708,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
g_queue_init(&s->conn_list); +qemu_mutex_init(&s->compare_lock); s->connection_track_table = g_hash_table_new_full(connection_key_hash, connection_key_equal, @@ -771,6 +778,7 @@ static void colo_compare_finalize(Object *obj) g_queue_foreach(&s->conn_list, colo_flush_packets, s); g_queue_clear(&s->conn_list); +qemu_mutex_destroy(&s->compare_lock); g_hash_table_destroy(s->connection_track_table); g_free(s->pri_indev);
Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
On 2017/4/19 10:02, Xiao Guangrong wrote: On 04/13/2017 05:38 PM, Hailiang Zhang wrote: On 2017/4/13 17:35, Xiao Guangrong wrote: On 04/13/2017 05:29 PM, Hailiang Zhang wrote: On 2017/4/13 17:18, Xiao Guangrong wrote: On 04/13/2017 05:05 PM, Zhanghailiang wrote: Hi, -邮件原件- 发件人: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] 代表 Xiao Guangrong 发送时间: 2017年4月13日 16:53 收件人: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com 抄送: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao Guangrong 主题: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster On 04/13/2017 04:39 PM, Xiao Guangrong wrote: On 04/13/2017 02:37 PM, Paolo Bonzini wrote: On 12/04/2017 17:51, guangrong.x...@gmail.com wrote: The root cause is that the clock will be lost if the periodic period is changed as currently code counts the next periodic time like this: next_irq_clock = (cur_clock & ~(period - 1)) + period; consider the case if cur_clock = 0x11FF and period = 0x100, then the next_irq_clock is 0x1200, however, there is only 1 clock left to trigger the next irq. Unfortunately, Windows guests (at least Windows7) change the period very frequently if it runs the attached code, so that the lost clock is accumulated, the wall-time become faster and faster Very interesting. Yes, indeed. However, I think that the above should be exactly how the RTC should work. The original RTC circuit had 22 divider stages (see page 13 of the datasheet[1], at the bottom right), and the periodic interrupt taps the rising edge of one of the dividers (page 16, second paragraph). The datasheet also never mentions a comparator being used to trigger the periodic interrupts. That was my thought before, however, after more test, i am not sure if re-configuring RegA changes these divider stages internal... Have you checked that this Windows bug doesn't happen on real hardware too? Or is the combination of driftfix=slew and changing periods that is a problem? 
I have two physical windows 7 machines, both of them have 'useplatformclock = off' and ntp disabled, the wall time is really accurate. The difference is that the physical machines are using Intel Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the issue is easily be reproduced just in ~10 mins. Our test mostly focus on 'driftfix=slew' and after this patchset the time is accurate and stable. I will do the test for dropping 'slew' and see what will happen... Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :( You mean, it only fixes the one case which with the ' driftfix=slew ' is used ? No. for both. We encountered this problem too, I have tried to fix it long time ago. https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html. (It seems that your solution is more useful) But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware, but we didn't find any clear description about it. And it seems that other virtualization platforms That is the issue, the hardware spec does not detail how the clock is counted when the timer interval is changed. What we can do at this time is that speculate it from the behaviors. Current RTC is completely unusable anyway. have this problem too: VMware: https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf Heper-v: https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/ Hmm, slower clock is understandable, does really the Windows7 on hyperV have faster clock? Did you meet it? I don't know, we didn't test it, besides, I'd like to know how long did your testcase run before you judge it is stable with 'driftfix=slew' option? (My previous patch can't fix it completely but only narrows the gap between timer in guest and real timer.) More than 12 hours. Great, I'll test and look into it ... thanks. Hi Hailiang, Does this patchset work for you? :) Yes, i think it works for us, nice work :) .
Re: [Qemu-devel] [PATCH v2] char: Fix removing wrong GSource that be found by fd_in_tag
On 2017/4/18 16:11, Marc-André Lureau wrote: Hi - Original Message - We use fd_in_tag to find a GSource, fd_in_tag is return value of g_source_attach(GSource *source, GMainContext *context), the return value is unique only in the same context, so we may get the same values with different 'context' parameters. It is no problem to find the right fd_in_tag by using g_main_context_find_source_by_id(GMainContext *context, guint source_id) while there is only one default main context. But colo-compare tries to create/use its own context, and if we pass wrong 'context' parameter with right fd_in_tag, we will find a wrong GSource to handle. We tried to fix the related codes in commit b43dec, but it didn't fix the bug completely, because we still have some codes didn't pass *right* context parameter for remove_fd_in_watch(). Let's fix it by record the GSource directly instead of fd_in_tag. You could simply name it "gsource" or "source" instead of "chr_gsource" to avoid redundancy in chr->chr_gsource. OK, besides, i forgot to update the hash, I'll fix both of them in V3, thanks. 
Signed-off-by: zhanghailianglooks good to me: Reviewed-by: Marc-André Lureau --- v2: - Fix minor comments from Marc-André Lureau --- chardev/char-fd.c | 8 chardev/char-io.c | 23 --- chardev/char-io.h | 4 ++-- chardev/char-pty.c| 6 +++--- chardev/char-socket.c | 8 chardev/char-udp.c| 8 chardev/char.c| 2 +- include/sysemu/char.h | 2 +- 8 files changed, 27 insertions(+), 34 deletions(-) diff --git a/chardev/char-fd.c b/chardev/char-fd.c index 548dd4c..7f0169d 100644 --- a/chardev/char-fd.c +++ b/chardev/char-fd.c @@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque) ret = qio_channel_read( chan, (gchar *)buf, len, NULL); if (ret == 0) { -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); qemu_chr_be_event(chr, CHR_EVENT_CLOSED); return FALSE; } @@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr, { FDChardev *s = FD_CHARDEV(chr); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { -chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in, +chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in, fd_chr_read_poll, fd_chr_read, chr, context); @@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj) Chardev *chr = CHARDEV(obj); FDChardev *s = FD_CHARDEV(obj); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { object_unref(OBJECT(s->ioc_in)); } diff --git a/chardev/char-io.c b/chardev/char-io.c index b4bb094..d781ad6 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = { .finalize = io_watch_poll_finalize, }; -guint io_add_watch_poll(Chardev *chr, +GSource *io_add_watch_poll(Chardev *chr, QIOChannel *ioc, IOCanReadHandler *fd_can_read, QIOChannelFunc fd_read, @@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr, GMainContext *context) { IOWatchPoll *iwp; -int tag; char *name; iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs, @@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr, 
g_source_set_name((GSource *)iwp, name); g_free(name); -tag = g_source_attach(>parent, context); +g_source_attach(>parent, context); g_source_unref(>parent); -return tag; +return (GSource *)iwp; } -static void io_remove_watch_poll(guint tag, GMainContext *context) +static void io_remove_watch_poll(GSource *source) { -GSource *source; IOWatchPoll *iwp; -g_return_if_fail(tag > 0); - -source = g_main_context_find_source_by_id(context, tag); -g_return_if_fail(source != NULL); - iwp = io_watch_poll_from_source(source); if (iwp->src) { g_source_destroy(iwp->src); @@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext *context) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr, GMainContext *context) +void remove_fd_in_watch(Chardev *chr) { -if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag, context); -chr->fd_in_tag = 0; +if (chr->chr_gsource) { +io_remove_watch_poll(chr->chr_gsource); +chr->chr_gsource = NULL; } } diff --git a/chardev/char-io.h b/chardev/char-io.h index 842be56..55973a7 100644 --- a/chardev/char-io.h +++ b/chardev/char-io.h @@ -29,14 +29,14 @@ #include
Re: [Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag
On 2017/4/18 21:36, Eric Blake wrote:
On 04/14/2017 05:10 AM, Marc-André Lureau wrote:
Hi

- Original Message -
We use fd_in_tag to find a GSource; fd_in_tag is the return value of g_source_attach(GSource *source, GMainContext *context), and that value is unique only within the same context, so we may get the same values with different 'context' parameters. It is no problem to find the right fd_in_tag by using g_main_context_find_source_by_id(GMainContext *context, guint source_id) while there is only one default main context. But colo-compare tries to create/use its own context, and if we pass the wrong 'context' parameter with the right fd_in_tag, we will find a wrong GSource to handle. We tied to fix the related codes in commit b43dec, but it didn't

tied->tried

Please use a bit longer commit sha1, or the full sha1; it will likely conflict otherwise in the future.

6 chars is indeed short; 7 is git's default and usually long enough, although I've encountered collisions that require 8 chars. [And Google has proved that you can have a collision across the entire hash, although that is harder to generate.] I generally use 8 or so when writing commit messages. Fortunately, even if a collision is introduced later, someone motivated enough can still resolve it by filtering out any candidates that resolve to non-commits, and among the remaining colliding SHA1s focusing on the one whose commit date predates the message with the reference.

Thanks for your explanation. I didn't notice that before; I have always been using the 6-char hash to refer to the commit...

Agreed, but I'd better update the comment to make it clearer. Thanks.
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
On 2017/4/18 11:55, Jason Wang wrote: On 2017年04月17日 19:04, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 14:38, Jason Wang wrote: On 2017年04月14日 14:22, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Yes, that's kind of what I want. We don't want to use two message format. Passing a opened fd need management support, we still need a fallback if there's no management on top. For qemu/kvm, we can do all stuffs transparent to the cli by e.g socketpair() or others, but the key is to have a unified message format. After a deeper investigation, i think we can re-use most codes, since there is no existing way to notify xen (no ?), we still needs notify chardev socket (Be used to notify xen, it is optional.) 
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen notify chardev socket handler frame") Yes and actually you can use this for bi-directional communication. The only differences is the implementation of comparing. Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', I don't see this in master? Er, it has been merged already, please see migration/colo.c, void qmp_xen_colo_do_checkpoint(Error **errp); we can re-use it to notify colo-compare objects and other filter objects to do checkpoint, for the opposite direction, we use the notify chardev socket (Only for xen). Just want to make sure I understand the design, who will trigger this command? Management? The command will be issued by XEN (xc_save ?), the original existing xen-colo-do-checkpoint command now only be used to notify block replication to do checkpoint, we can re-use it for filters too. Can we just use the socket? I don't quite understand ... Just as the codes showed bellow, in this scenario, XEN notifies colo-compare and fiters do checkpoint by using qmp command, and colo-compare notifies XEN about net inconsistency event by using the new socket. 
So the codes will be like: diff --git a/migration/colo.c b/migration/colo.c index 91da936..813c281 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp) void qmp_xen_colo_do_checkpoint(Error **errp) { +Error *local_err = NULL; + replication_do_checkpoint_all(errp); +/* Notify colo-compare and other filters to do checkpoint */ +colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +colo_notify_filters_event(COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +} } static void colo_send_message(QEMUFile *f, COLOMessage msg, diff --git a/net/colo-compare.c b/net/colo-compare.c index 24e13f0..de975c5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void) { notifier_list_notify(_compare_notifiers, migrate_get_current()); KVM will use this notifier/callback way, and in this way, we can avoid the redundant socket. +if (s->notify_dev) { + /* Do something, notify remote side through notify dev */ +} } If we have a notify socket configured, we will send the message about net inconsistent event. void colo_compare_register_notifier(Notifier *notify) How about this scenario ? See my reply above, and we need unify the message format too. Raw string is ok but we'd better have something like TLV or others. Agreed, we need it to be more standard. Thanks, Hailiang Thanks Thoughts? Thanks Thanks . . .
Re: [Qemu-devel] [PULL 2/8] replication: clarify permissions
On 2017/4/18 9:23, Eric Blake wrote:
On 03/17/2017 08:15 AM, Kevin Wolf wrote:
From: Changlong Xie

Even if hidden_disk and secondary_disk are backing files, they both need write permissions in the replication scenario. Otherwise we will encounter the exception below on the secondary side while adding the NBD server:

{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } }
{"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}}

CC: Zhang Hailiang
CC: Zhang Chen
CC: Wen Congyang

This address for Wen Congyang is different from the one listed in MAINTAINERS for replication (M: Wen Congyang), and different still from addresses my mailer has harvested from other posts (wencongy...@gmail.com). The MAINTAINERS entry is now resulting in 'undelivered mail' bounce messages; can you please submit an update to MAINTAINERS with your new preferred address? [Or gently correct me if I'm confusing two people with the same name?]

No, they are the same person; he just left his job at Fujitsu, so the entry in the MAINTAINERS file needs to be updated.

Cc: Changlong Xie
Hi Changlong, would you please send a patch to update it?

Hailiang
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
Hi Jason, On 2017/4/14 14:38, Jason Wang wrote: On 2017年04月14日 14:22, Hailiang Zhang wrote: Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Yes, that's kind of what I want. We don't want to use two message format. Passing a opened fd need management support, we still need a fallback if there's no management on top. For qemu/kvm, we can do all stuffs transparent to the cli by e.g socketpair() or others, but the key is to have a unified message format. After a deeper investigation, i think we can re-use most codes, since there is no existing way to notify xen (no ?), we still needs notify chardev socket (Be used to notify xen, it is optional.) 
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen notify chardev socket handler frame") Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', we can re-use it to notify colo-compare objects and other filter objects to do checkpoint, for the opposite direction, we use the notify chardev socket (Only for xen). So the codes will be like: diff --git a/migration/colo.c b/migration/colo.c index 91da936..813c281 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp) void qmp_xen_colo_do_checkpoint(Error **errp) { +Error *local_err = NULL; + replication_do_checkpoint_all(errp); +/* Notify colo-compare and other filters to do checkpoint */ +colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +colo_notify_filters_event(COLO_CHECKPOINT, _err); +if (local_err) { +error_propagate(errp, local_err); +} } static void colo_send_message(QEMUFile *f, COLOMessage msg, diff --git a/net/colo-compare.c b/net/colo-compare.c index 24e13f0..de975c5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void) { notifier_list_notify(_compare_notifiers, migrate_get_current()); +if (s->notify_dev) { + /* Do something, notify remote side through notify dev */ +} } void colo_compare_register_notifier(Notifier *notify) How about this scenario ? Thoughts? Thanks Thanks . .
Re: [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options
On 2017/4/12 22:28, Eric Blake wrote: On 04/12/2017 09:05 AM, zhanghailiang wrote: We use these two options to identify which disk is shared Signed-off-by: zhanghailiangSigned-off-by: Wen Congyang Signed-off-by: Zhang Chen --- v4: - Add proper comment for primary_disk (Stefan) v2: - Move g_free(s->shared_disk_id) to the common fail process place (Stefan) - Fix comments for these two options --- +++ b/qapi/block-core.json @@ -2661,12 +2661,20 @@ # node who owns the replication node chain. Must not be given in # primary mode. # +# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk +# is true, this option is required (Since: 2.10) +# +# @shared-disk: To indicate whether or not a disk is shared by primary VM +# and secondary VM. (The default is false) (Since: 2.10) +# # Since: 2.9 ## { 'struct': 'BlockdevOptionsReplication', 'base': 'BlockdevOptionsGenericFormat', 'data': { 'mode': 'ReplicationMode', -'*top-id': 'str' } } +'*top-id': 'str', +'*shared-disk-id': 'str', +'*shared-disk': 'bool' } } Do we need a separate bool and string? Or is it sufficient to say that if shared-disk is omitted, we are not sharing, and that if a shared-disk string is present, then we are sharing and it names the id of the shared disk. Er, Yes, We need both of them, the command line of secondary sides is like: -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\ file.driver=qcow2,top-id=active-disk0,\ file.file.filename=/mnt/ramfs/active_disk.img,\ file.backing=hidden_disk0,shared-disk=on We only need the bool shared-disk to indicate whether disk is sharing or not, but for primary side, we need to the blockdev-add command to tell primary which disk is shared. 
{ 'execute': 'blockdev-add',
  'arguments': {
      'driver': 'replication',
      'node-name': 'rep',
      'mode': 'primary',
      'shared-disk-id': 'primary_disk0',
      'shared-disk': true,
      'file': {
          'driver': 'nbd',
          'export': 'hidden_disk0',
          'server': {
              'type': 'inet',
              'data': { 'host': 'xxx.xxx.xxx.xxx', 'port': 'yyy' }
          }
      }
  }
}
Re: [Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag
Hi, On 2017/4/14 18:10, Marc-André Lureau wrote: Hi - Original Message - We use fd_in_tag to find a GSource, fd_in_tag is return value of g_source_attach(GSource *source, GMainContext *context), the return value is unique only in the same context, so we may get the same values with different 'context' parameters. It is no problem to find the right fd_in_tag by using g_main_context_find_source_by_id(GMainContext *context, guint source_id) while there is only one default main context. But colo-compare tries to create/use its own context, and if we pass wrong 'context' parameter with right fd_in_tag, we will find a wrong GSource to handle. We tied to fix the related codes in commit b43dec, but it didn't tied->tried Please use a bit longer commit sha1, or full sha1, it will likely conflict otherwise in the future. OK, I'll replace it with the full sha1. fix the bug completely, because we still have some codes didn't pass *right* context parameter for remove_fd_in_watch(). Mixing contexts sounds like a colo-compare bug, did you fix it there too? Yes, we don't have to change any other codes in colo-compare, We just call qemu_chr_fe_set_handlers() in colo-compare to detach the old GSource from main default context, and attach the new GSource to the new context. Let's fix it by record the GSource directly instead of fd_in_tag. Make sense to me, and even simplify a bit the code. We should be more careful with pointers though (the reason why tag existed in the first place I imagine). Agreed. 
Signed-off-by: zhanghailiang--- chardev/char-fd.c | 8 chardev/char-io.c | 23 --- chardev/char-io.h | 4 ++-- chardev/char-pty.c| 6 +++--- chardev/char-socket.c | 8 chardev/char-udp.c| 8 chardev/char.c| 2 +- include/sysemu/char.h | 2 +- 8 files changed, 27 insertions(+), 34 deletions(-) diff --git a/chardev/char-fd.c b/chardev/char-fd.c index 548dd4c..7f0169d 100644 --- a/chardev/char-fd.c +++ b/chardev/char-fd.c @@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque) ret = qio_channel_read( chan, (gchar *)buf, len, NULL); if (ret == 0) { -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); qemu_chr_be_event(chr, CHR_EVENT_CLOSED); return FALSE; } @@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr, { FDChardev *s = FD_CHARDEV(chr); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { -chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in, +chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in, fd_chr_read_poll, fd_chr_read, chr, context); @@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj) Chardev *chr = CHARDEV(obj); FDChardev *s = FD_CHARDEV(obj); -remove_fd_in_watch(chr, NULL); +remove_fd_in_watch(chr); if (s->ioc_in) { object_unref(OBJECT(s->ioc_in)); } diff --git a/chardev/char-io.c b/chardev/char-io.c index b4bb094..6deb193 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = { .finalize = io_watch_poll_finalize, }; -guint io_add_watch_poll(Chardev *chr, +GSource *io_add_watch_poll(Chardev *chr, QIOChannel *ioc, IOCanReadHandler *fd_can_read, QIOChannelFunc fd_read, @@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr, GMainContext *context) { IOWatchPoll *iwp; -int tag; char *name; iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs, @@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr, g_source_set_name((GSource *)iwp, name); g_free(name); -tag = g_source_attach(>parent, context); 
+g_source_attach(>parent, context); g_source_unref(>parent); -return tag; +return (GSource *)iwp; } -static void io_remove_watch_poll(guint tag, GMainContext *context) +static void io_remove_watch_poll(GSource *source) { -GSource *source; IOWatchPoll *iwp; -g_return_if_fail(tag > 0); - -source = g_main_context_find_source_by_id(context, tag); -g_return_if_fail(source != NULL); - iwp = io_watch_poll_from_source(source); if (iwp->src) { g_source_destroy(iwp->src); @@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext *context) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr, GMainContext *context) +void remove_fd_in_watch(Chardev *chr) { -if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag, context); -chr->fd_in_tag = 0;
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
Hi Jason, On 2017/4/14 13:57, Jason Wang wrote: On 2017年02月22日 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While do checkpoint, we need to flush all the unhandled packets, By using the filter notifier mechanism, we can easily to notify every compare object to do this process, which runs inside of compare threads as a coroutine. Hi~ Jason and Hailiang. I will send a patch set later about colo-compare notify mechanism for Xen like this patch. I want to add a new chardev socket way in colo-comapre connect to Xen colo, for notify checkpoint or failover, Because We have no choice to use this way communicate with Xen codes. That's means we will have two notify mechanism. What do you think about this? Thanks Zhang Chen I was thinking the possibility of using similar way to for colo compare. E.g can we use socket? This can saves duplicated codes more or less. Since there are too many sockets used by filter and COLO, (Two unix sockets and two tcp sockets for each vNIC), I don't want to introduce more ;) , but i'm not sure if it is possible to make it more flexible and optional, abstract these duplicated codes, pass the opened fd (No matter eventfd or socket fd ) as parameter, for example. Is this way acceptable ? Thanks, Hailiang Thanks .
Re: [Qemu-devel] 答复: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
On 2017/4/13 17:35, Xiao Guangrong wrote: On 04/13/2017 05:29 PM, Hailiang Zhang wrote: On 2017/4/13 17:18, Xiao Guangrong wrote: On 04/13/2017 05:05 PM, Zhanghailiang wrote: Hi, -邮件原件- 发件人: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] 代表 Xiao Guangrong 发送时间: 2017年4月13日 16:53 收件人: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com 抄送: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao Guangrong 主题: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster On 04/13/2017 04:39 PM, Xiao Guangrong wrote: On 04/13/2017 02:37 PM, Paolo Bonzini wrote: On 12/04/2017 17:51, guangrong.x...@gmail.com wrote: The root cause is that the clock will be lost if the periodic period is changed as currently code counts the next periodic time like this: next_irq_clock = (cur_clock & ~(period - 1)) + period; consider the case if cur_clock = 0x11FF and period = 0x100, then the next_irq_clock is 0x1200, however, there is only 1 clock left to trigger the next irq. Unfortunately, Windows guests (at least Windows7) change the period very frequently if it runs the attached code, so that the lost clock is accumulated, the wall-time become faster and faster Very interesting. Yes, indeed. However, I think that the above should be exactly how the RTC should work. The original RTC circuit had 22 divider stages (see page 13 of the datasheet[1], at the bottom right), and the periodic interrupt taps the rising edge of one of the dividers (page 16, second paragraph). The datasheet also never mentions a comparator being used to trigger the periodic interrupts. That was my thought before, however, after more test, i am not sure if re-configuring RegA changes these divider stages internal... Have you checked that this Windows bug doesn't happen on real hardware too? Or is the combination of driftfix=slew and changing periods that is a problem? 
I have two physical windows 7 machines, both of them have 'useplatformclock = off' and ntp disabled, the wall time is really accurate. The difference is that the physical machines are using Intel Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the issue is easily be reproduced just in ~10 mins. Our test mostly focus on 'driftfix=slew' and after this patchset the time is accurate and stable. I will do the test for dropping 'slew' and see what will happen... Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :( You mean, it only fixes the one case which with the ' driftfix=slew ' is used ? No. for both. We encountered this problem too, I have tried to fix it long time ago. https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html. (It seems that your solution is more useful) But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware, but we didn't find any clear description about it. And it seems that other virtualization platforms That is the issue, the hardware spec does not detail how the clock is counted when the timer interval is changed. What we can do at this time is that speculate it from the behaviors. Current RTC is completely unusable anyway. have this problem too: VMware: https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf Heper-v: https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/ Hmm, slower clock is understandable, does really the Windows7 on hyperV have faster clock? Did you meet it? I don't know, we didn't test it, besides, I'd like to know how long did your testcase run before you judge it is stable with 'driftfix=slew' option? (My previous patch can't fix it completely but only narrows the gap between timer in guest and real timer.) More than 12 hours. Great, I'll test and look into it ... thanks. .
Re: [Qemu-devel] 答复: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
On 2017/4/13 17:18, Xiao Guangrong wrote: On 04/13/2017 05:05 PM, Zhanghailiang wrote: Hi, -邮件原件- 发件人: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] 代表 Xiao Guangrong 发送时间: 2017年4月13日 16:53 收件人: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com 抄送: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao Guangrong 主题: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster On 04/13/2017 04:39 PM, Xiao Guangrong wrote: On 04/13/2017 02:37 PM, Paolo Bonzini wrote: On 12/04/2017 17:51, guangrong.x...@gmail.com wrote: The root cause is that the clock will be lost if the periodic period is changed as currently code counts the next periodic time like this: next_irq_clock = (cur_clock & ~(period - 1)) + period; consider the case if cur_clock = 0x11FF and period = 0x100, then the next_irq_clock is 0x1200, however, there is only 1 clock left to trigger the next irq. Unfortunately, Windows guests (at least Windows7) change the period very frequently if it runs the attached code, so that the lost clock is accumulated, the wall-time become faster and faster Very interesting. Yes, indeed. However, I think that the above should be exactly how the RTC should work. The original RTC circuit had 22 divider stages (see page 13 of the datasheet[1], at the bottom right), and the periodic interrupt taps the rising edge of one of the dividers (page 16, second paragraph). The datasheet also never mentions a comparator being used to trigger the periodic interrupts. That was my thought before, however, after more test, i am not sure if re-configuring RegA changes these divider stages internal... Have you checked that this Windows bug doesn't happen on real hardware too? Or is the combination of driftfix=slew and changing periods that is a problem? I have two physical windows 7 machines, both of them have 'useplatformclock = off' and ntp disabled, the wall time is really accurate. 
The difference is that the physical machines are using Intel Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the issue is easily be reproduced just in ~10 mins. Our test mostly focus on 'driftfix=slew' and after this patchset the time is accurate and stable. I will do the test for dropping 'slew' and see what will happen... Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :( You mean, it only fixes the one case which with the ' driftfix=slew ' is used ? No. for both. We encountered this problem too, I have tried to fix it long time ago. https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html. (It seems that your solution is more useful) But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware, but we didn't find any clear description about it. And it seems that other virtualization platforms That is the issue, the hardware spec does not detail how the clock is counted when the timer interval is changed. What we can do at this time is that speculate it from the behaviors. Current RTC is completely unusable anyway. have this problem too: VMware: https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf Heper-v: https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/ Hmm, slower clock is understandable, does really the Windows7 on hyperV have faster clock? Did you meet it? I don't know, we didn't test it, besides, I'd like to know how long did your testcase run before you judge it is stable with 'driftfix=slew' option? (My previous patch can't fix it completely but only narrows the gap between timer in guest and real timer.) Hailiang .
Re: [Qemu-devel] [PATCH 12/15] savevm: split the process of different stages for loadvm/savevm
On 2017/4/8 1:18, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: There are several stages during loadvm/savevm process. In different stage, migration incoming processes different types of sections. We want to control these stages more accuracy, it will benefit COLO performance, we don't have to save type of QEMU_VM_SECTION_START sections everytime while do checkpoint, besides, we want to separate the process of saving/loading memory and devices state. So we add three new helper functions: qemu_loadvm_state_begin(), qemu_load_device_state() and qemu_savevm_live_state() to achieve different process during migration. Besides, we make qemu_loadvm_state_main() and qemu_save_device_state() public. Cc: Juan QuintelaSigned-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert --- include/sysemu/sysemu.h | 6 ++ migration/savevm.c | 55 ++--- 2 files changed, 54 insertions(+), 7 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 7ed665a..95cae41 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -132,7 +132,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, uint64_t *start_list, uint64_t *length_list); +void qemu_savevm_live_state(QEMUFile *f); +int qemu_save_device_state(QEMUFile *f); + int qemu_loadvm_state(QEMUFile *f); +int qemu_loadvm_state_begin(QEMUFile *f); +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); +int qemu_load_device_state(QEMUFile *f); extern int autostart; diff --git a/migration/savevm.c b/migration/savevm.c index 9c2d239..dac478b 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -54,6 +54,7 @@ #include "qemu/cutils.h" #include "io/channel-buffer.h" #include "io/channel-file.h" +#include "migration/colo.h" #ifndef ETH_P_RARP #define ETH_P_RARP 0x8035 @@ -1279,13 +1280,21 @@ done: return ret; } -static int qemu_save_device_state(QEMUFile *f) +void qemu_savevm_live_state(QEMUFile 
*f) { -SaveStateEntry *se; +/* save QEMU_VM_SECTION_END section */ +qemu_savevm_state_complete_precopy(f, true); +qemu_put_byte(f, QEMU_VM_EOF); +} -qemu_put_be32(f, QEMU_VM_FILE_MAGIC); -qemu_put_be32(f, QEMU_VM_FILE_VERSION); +int qemu_save_device_state(QEMUFile *f) +{ +SaveStateEntry *se; +if (!migration_in_colo_state()) { +qemu_put_be32(f, QEMU_VM_FILE_MAGIC); +qemu_put_be32(f, QEMU_VM_FILE_VERSION); +} Note that got split out into qemu_savevm_state_header() at some point. Do you mean I should use the wrapper qemu_savevm_state_header() here? Dave cpu_synchronize_all_states(); QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { @@ -1336,8 +1345,6 @@ enum LoadVMExitCodes { LOADVM_QUIT = 1, }; -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); - /* -- incoming postcopy messages -- */ /* 'advise' arrives before any transfers just to tell us that a postcopy * *might* happen - it might be skipped if precopy transferred everything @@ -1942,7 +1949,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis) return 0; } -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) { uint8_t section_type; int ret = 0; @@ -2080,6 +2087,40 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +int qemu_loadvm_state_begin(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +Error *local_err = NULL; +int ret; + +if (qemu_savevm_state_blocked(&local_err)) { +error_report_err(local_err); +return -EINVAL; +} +/* Load QEMU_VM_SECTION_START section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to loadvm begin work: %d", ret); +} +return ret; +} + +int qemu_load_device_state(QEMUFile *f) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +int ret; + +/* Load QEMU_VM_SECTION_FULL section */ +ret = qemu_loadvm_state_main(f, mis); +if (ret < 0) { +error_report("Failed to load device state: %d",
ret); +return ret; +} + +cpu_synchronize_all_post_init(); +return 0; +} + int save_vmstate(Monitor *mon, const char *name) { BlockDriverState *bs, *bs1; -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH v2 0/5] Disable hotplug during migration
On 2017/4/6 21:13, Juan Quintela wrote: Hi This updates the patches with all the comments received. I moved qdev_unplug() to make linux-user compile. Please review. [RFC - v1] This series disables hotplug/unplug during migration. Thanks to Markus for explaining where I had to put the checks. Why? Because during migration we will fail if there are changes. For instance, in postcopy, if we add a memory region, we would fail. Same for other devices if they are not set up exactly the same on the destination. The idea would be to disable it, and then enable it for the things that we know work. This series is on top of my previous RAMState v2 series. Comments, please? Makes sense, this will benefit COLO too :) After the typos found by Eric are fixed in patch 5, Reviewed-by: zhanghailiang Thanks, Juan. *** BLURB HERE *** Juan Quintela (5): qdev: qdev_hotplug is really a bool qdev: Export qdev_hot_removed qdev: Move qdev_unplug() to qdev-monitor.c migration: Disable hotplug/unplug during migration ram: Remove migration_bitmap_extend() exec.c | 1 - hw/core/qdev.c | 40 +++- include/exec/ram_addr.h | 2 -- include/hw/qdev-core.h | 3 ++- migration/ram.c | 34 -- qdev-monitor.c | 45 + 6 files changed, 50 insertions(+), 75 deletions(-)
Re: [Qemu-devel] [PATCH 07/15] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
On 2017/4/8 1:06, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: We should not load the PVM's state directly into the SVM, because errors may happen while the SVM is receiving data, which would break the SVM. We need to ensure we have received all the data before loading the state into the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on the secondary side is initially the same as the SVM/PVM's memory. During each checkpoint, we first cache the PVM's dirty pages in this RAM cache, so the RAM cache is always the same as the PVM's memory at every checkpoint; then we flush this cached RAM to the SVM after we receive all of the PVM's state. You're probably going to find this interesting to merge with Juan's recent RAM block series. Probably not too hard, but he's touching a lot of the same code and rearranging things. Yes, I'll update this series on top of his series; better to send the next version after his series has been merged. Thanks, Hailiang Dave Cc: Juan Quintela Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr.
David Alan Gilbert --- include/exec/ram_addr.h | 1 + include/migration/migration.h | 4 +++ migration/colo.c | 14 + migration/ram.c | 73 ++- 4 files changed, 91 insertions(+), 1 deletion(-) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 3e79466..44e1190 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -27,6 +27,7 @@ struct RAMBlock { struct rcu_head rcu; struct MemoryRegion *mr; uint8_t *host; +uint8_t *colo_cache; /* For colo, VM's ram cache */ ram_addr_t offset; ram_addr_t used_length; ram_addr_t max_length; diff --git a/include/migration/migration.h b/include/migration/migration.h index 1735d66..93c6148 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -379,4 +379,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname, PostcopyState postcopy_state_get(void); /* Set the state and return the old state */ PostcopyState postcopy_state_set(PostcopyState new_state); + +/* ram cache */ +int colo_init_ram_cache(void); +void colo_release_ram_cache(void); #endif diff --git a/migration/colo.c b/migration/colo.c index 1e3e975..edb7f00 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -551,6 +551,7 @@ void *colo_process_incoming_thread(void *opaque) uint64_t total_size; uint64_t value; Error *local_err = NULL; +int ret; qemu_sem_init(>colo_incoming_sem, 0); @@ -572,6 +573,12 @@ void *colo_process_incoming_thread(void *opaque) */ qemu_file_set_blocking(mis->from_src_file, true); +ret = colo_init_ram_cache(); +if (ret < 0) { +error_report("Failed to initialize ram cache"); +goto out; +} + bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE); fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc)); object_unref(OBJECT(bioc)); @@ -705,11 +712,18 @@ out: if (fb) { qemu_fclose(fb); } +/* + * We can ensure BH is hold the global lock, and will join COLO + * incoming thread, so here it is not necessary to lock here again, + * Or there will be a deadlock error. 
+ */ +colo_release_ram_cache(); /* Hope this not to be too long to loop here */ qemu_sem_wait(&mis->colo_incoming_sem); qemu_sem_destroy(&mis->colo_incoming_sem); /* Must be called after failover BH is completed */ + if (mis->to_src_file) { qemu_fclose(mis->to_src_file); } diff --git a/migration/ram.c b/migration/ram.c index f289fcd..b588990 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -219,6 +219,7 @@ static RAMBlock *last_sent_block; static ram_addr_t last_offset; static QemuMutex migration_bitmap_mutex; static uint64_t migration_dirty_pages; +static bool ram_cache_enable; static uint32_t last_version; static bool ram_bulk_stage; @@ -2227,6 +2228,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, return block->host + offset; } +static inline void *colo_cache_from_block_offset(RAMBlock *block, + ram_addr_t offset) +{ +if (!offset_in_ramblock(block, offset)) { +return NULL; +} +if (!block->colo_cache) { +error_report("%s: colo_cache is NULL in block :%s", + __func__, block->idstr); +return NULL; +} +return block->colo_cache + offset; +} + /* * If a page (or a whole RDMA chunk) has been * determined to be zero, then zap it. @@ -2542,7 +2557,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) { RAMBlock *block
Re: [Qemu-devel] [PATCH 01/15] net/colo: Add notifier/callback related helpers for filter
On 2017/4/7 23:46, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: We will use this notifier to help COLO to notify filter object to do something, like do checkpoint, or process failover event. Cc: Jason Wang Signed-off-by: zhanghailiang --- net/colo.c | 92 ++ net/colo.h | 18 2 files changed, 110 insertions(+) <..> +FilterNotifier *filter_noitifier_new(FilterNotifierCallback *cb, ^ Typo - no*i*tifier Good catch, will fix it in next version. (I've not looked at this patch much, I'll leave networking stuff to Jason) OK, thanks. Dave +void *opaque, Error **errp) +{ +FilterNotifier *notify; +int ret; + +notify = (FilterNotifier *)g_source_new(_source_funcs, +sizeof(FilterNotifier)); +ret = event_notifier_init(&notify->event, false); +if (ret < 0) { +error_setg_errno(errp, -ret, "Failed to initialize event notifier"); +goto fail; +} +notify->pfd.fd = event_notifier_get_fd(&notify->event); +notify->pfd.events = G_IO_IN | G_IO_HUP | G_IO_ERR; +notify->cb = cb; +notify->opaque = opaque; +g_source_add_poll(&notify->source, &notify->pfd); + +return notify; + +fail: +g_source_destroy(&notify->source); +return NULL; +} + +int filter_notifier_set(FilterNotifier *notify, uint64_t value) +{ +ssize_t ret; + +do { +ret = write(notify->event.wfd, &value, sizeof(value)); +} while (ret < 0 && errno == EINTR); + +/* EAGAIN is fine, a read must be pending.
*/ +if (ret < 0 && errno != EAGAIN) { +return -errno; +} +return 0; +} diff --git a/net/colo.h b/net/colo.h index cd9027f..00f03b5 100644 --- a/net/colo.h +++ b/net/colo.h @@ -19,6 +19,7 @@ #include "qemu/jhash.h" #include "qemu/timer.h" #include "slirp/tcp.h" +#include "qemu/event_notifier.h" #define HASHTABLE_MAX_SIZE 16384 @@ -89,4 +90,21 @@ void connection_hashtable_reset(GHashTable *connection_track_table); Packet *packet_new(const void *data, int size); void packet_destroy(void *opaque, void *user_data); +typedef void FilterNotifierCallback(void *opaque, int value); +typedef struct FilterNotifier { +GSource source; +EventNotifier event; +GPollFD pfd; +FilterNotifierCallback *cb; +void *opaque; +} FilterNotifier; + +FilterNotifier *filter_noitifier_new(FilterNotifierCallback *cb, +void *opaque, Error **errp); +int filter_notifier_set(FilterNotifier *notify, uint64_t value); + +enum { +COLO_CHECKPOINT = 2, +COLO_FAILOVER, +}; #endif /* QEMU_COLO_PROXY_H */ -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH 15/15] COLO: flush host dirty ram from cache
On 2017/4/8 1:39, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: We don't need to flush all of the VM's RAM from the cache; only flush the dirty pages since the last checkpoint. Cc: Juan Quintela Signed-off-by: Li Zhijian Signed-off-by: Zhang Chen Signed-off-by: zhanghailiang --- migration/ram.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 6227b94..e9ba740 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2702,6 +2702,7 @@ int colo_init_ram_cache(void) migration_bitmap_rcu = g_new0(struct BitmapRcu, 1); migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages); migration_dirty_pages = 0; +memory_global_dirty_log_start(); Shouldn't there be a stop somewhere? (Probably if you failover to the secondary and colo stops?) Ha, good catch, I forgot to stop the dirty log on the secondary side. return 0; @@ -2750,6 +2751,15 @@ void colo_flush_ram_cache(void) void *src_host; ram_addr_t offset = 0; +memory_global_dirty_log_sync(); +qemu_mutex_lock(&migration_bitmap_mutex); +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { +migration_bitmap_sync_range(block->offset, block->used_length); +} +rcu_read_unlock(); +qemu_mutex_unlock(&migration_bitmap_mutex); Again this might have some fun merging with Juan's recent changes - what's really unusual about your set is that you're using this bitmap on the destination; I suspect Juan's recent changes make that trickier. Check 'Creating RAMState for migration' and 'Split migration bitmaps by ramblock'. I have reviewed these two series, and I think it's not a big problem for COLO here; we can still re-use most of the code. Thanks, Hailiang Dave trace_colo_flush_ram_cache_begin(migration_dirty_pages); rcu_read_lock(); block = QLIST_FIRST_RCU(&ram_list.blocks); -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH v2] migration: Fix colo hang in socket_accept_incoming_migration
Hi, It seems that there is no difference from your previous version. You don't have to re-send it if there are no changes. This patch has been reviewed, so you can just wait until the maintainers process it :) Thanks. On 2017/3/27 9:58, Guang Wang wrote: From: Wang Guang The channel socket was initialized manually, but forgot to set QIO_CHANNEL_FEATURE_SHUTDOWN. Thus, colo_process_incoming_thread would hang at recvmsg(). This patch just calls qio_channel_socket_new() to get the channel, which sets QIO_CHANNEL_FEATURE_SHUTDOWN already. Signed-off-by: Wang Guang Signed-off-by: zhanghailiang Reviewed-by: Eric Blake --- io/channel-socket.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/io/channel-socket.c b/io/channel-socket.c index f546c68..64b36f5 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -331,16 +331,10 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, { QIOChannelSocket *cioc; -cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); -cioc->fd = -1; +cioc = qio_channel_socket_new(); cioc->remoteAddrLen = sizeof(ioc->remoteAddr); cioc->localAddrLen = sizeof(ioc->localAddr); -#ifdef WIN32 -QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL); -#endif - - retry: trace_qio_channel_socket_accept(ioc); cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
On 2017/3/22 16:09, wang.guan...@zte.com.cn wrote: hi: yes.it is better. And should we delete Yes, you are right. #ifdef WIN32 QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL) #endif in qio_channel_socket_accept? qio_channel_socket_new already have it. 原始邮件 发件人: <zhang.zhanghaili...@huawei.com> 收件人:王广10165992 抄送人: <xuqu...@huawei.com> <dgilb...@redhat.com> <zhangchen.f...@cn.fujitsu.com> <qemu-devel@nongnu.org> 日 期 :2017年03月22日 15:03 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang Hi, On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote: > diff --git a/migration/socket.c b/migration/socket.c > > > index 13966f1..d65a0ea 100644 > > > --- a/migration/socket.c > > > +++ b/migration/socket.c > > > @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, > > > } > > > > > > trace_migration_socket_incoming_accepted() > > > > > > qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") > > > +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) > > > migration_channel_process_incoming(migrate_get_current(), > > > QIO_CHANNEL(sioc)) > > > object_unref(OBJECT(sioc)) > > > > > Is this patch ok? > Yes, i think this works, but a better way maybe to call qio_channel_set_feature() in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the socket accept fd, Or fix it by this: diff --git a/io/channel-socket.c b/io/channel-socket.c index f546c68..ce6894c 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, Error **errp) { QIOChannelSocket *cioc - -cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)) -cioc->fd = -1 + +cioc = qio_channel_socket_new() cioc->remoteAddrLen = sizeof(ioc->remoteAddr) cioc->localAddrLen = sizeof(ioc->localAddr) Thanks, Hailiang > I have test it . The test could not hang any more. 
> > > > > > > > > > > > > 原始邮件 > > > > 发件人: <zhang.zhanghaili...@huawei.com> > 收件人: <dgilb...@redhat.com> <berra...@redhat.com> > 抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu..com>王广10165992 > 日 期 :2017年03月22日 09:11 > 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang > > > > > > On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: > > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: > >> Hi, > >> > >> Thanks for reporting this, and i confirmed it in my test, and it is a bug. > >> > >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in > >> case COLO thread/incoming thread is stuck in read/write() while do failover, > >> but it didn't take effect, because all the fd used by COLO (also migration) > >> has been wrapped by qio channel, and it will not call the shutdown API if > >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). > >> > >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> > >> > >> I doubted migration cancel has the same problem, it may be stuck in write() > >> if we tried to cancel migration. > >> > >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) > >> { > >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") > >> migration_channel_connect(s, ioc, NULL) > >> ... ... > >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, > >> and the > >> migrate_fd_cancel() > >> { > >> ... ... > >> if (s->state == MIGRATION_STATUS_CANCELLING && f) { > >> qemu_file_shutdown(f) --> This will not take effect. No ? > >> } > >> } > > > > (cc'd in Daniel Berrange). > > I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) at the > > top of qio_channel_socket_new so I think that's safe isn't it? > > > > Hmm, you are right, this problem is only exist for the migration incoming fd, thanks. 
> > > Dave > > > >> Thanks, > >> Hailiang > >> > >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: > >>> Thank you。 > >>> > >>> I have test aready。 > >>> > >>> When the Primary Node panic,the Secondary Node qemu hang at the same place。 > >>> > >>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 > >>> > >>> I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. > >>> > >>> > >>> when failover,channel_shutdown could not shut down the channel. > >>> > >>> > >>> so the colo_process_incoming_thread will hang at recvmsg. > >>> > >>> > >>> I test a patch: > >>> > >>> > >>> diff --git a/migration/socket.c b/migration/socket.c > >>> > >>> > >>> index 13966f1..d65a0ea 100644 > >>> > >>> > >>> --- a/migration/socket.c > >>> > >>> >
Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
Hi, On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote: diff --git a/migration/socket.c b/migration/socket.c index 13966f1..d65a0ea 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, } trace_migration_socket_incoming_accepted() qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) migration_channel_process_incoming(migrate_get_current(), QIO_CHANNEL(sioc)) object_unref(OBJECT(sioc)) Is this patch ok? Yes, i think this works, but a better way maybe to call qio_channel_set_feature() in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the socket accept fd, Or fix it by this: diff --git a/io/channel-socket.c b/io/channel-socket.c index f546c68..ce6894c 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, Error **errp) { QIOChannelSocket *cioc; - -cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); -cioc->fd = -1; + +cioc = qio_channel_socket_new(); cioc->remoteAddrLen = sizeof(ioc->remoteAddr); cioc->localAddrLen = sizeof(ioc->localAddr); Thanks, Hailiang I have test it . The test could not hang any more. 原始邮件 发件人: <zhang.zhanghaili...@huawei.com> 收件人: <dgilb...@redhat.com> <berra...@redhat.com> 抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>王广10165992 日 期 :2017年03月22日 09:11 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: >> Hi, >> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug. 
>> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in >> case COLO thread/incoming thread is stuck in read/write() while do failover, >> but it didn't take effect, because all the fd used by COLO (also migration) >> has been wrapped by qio channel, and it will not call the shutdown API if >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). >> >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> >> >> I doubted migration cancel has the same problem, it may be stuck in write() >> if we tried to cancel migration. >> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) >> { >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") >> migration_channel_connect(s, ioc, NULL) >> ... ... >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, >> and the >> migrate_fd_cancel() >> { >> ... ... >> if (s->state == MIGRATION_STATUS_CANCELLING && f) { >> qemu_file_shutdown(f) --> This will not take effect. No ? >> } >> } > > (cc'd in Daniel Berrange). > I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) at the > top of qio_channel_socket_new so I think that's safe isn't it? > Hmm, you are right, this problem is only exist for the migration incoming fd, thanks. > Dave > >> Thanks, >> Hailiang >> >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: >>> Thank you。 >>> >>> I have test aready。 >>> >>> When the Primary Node panic,the Secondary Node qemu hang at the same place。 >>> >>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 >>> >>> I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. >>> >>> >>> when failover,channel_shutdown could not shut down the channel. >>> >>> >>> so the colo_process_incoming_thread will hang at recvmsg. 
>>> >>> >>> I test a patch: >>> >>> >>> diff --git a/migration/socket.c b/migration/socket.c >>> >>> >>> index 13966f1..d65a0ea 100644 >>> >>> >>> --- a/migration/socket.c >>> >>> >>> +++ b/migration/socket.c >>> >>> >>> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, >>> >>> >>>} >>> >>> >>> >>> >>> >>>trace_migration_socket_incoming_accepted() >>> >>> >>> >>> >>> >>>qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") >>> >>> >>> +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) >>> >>> >>>migration_channel_process_incoming(migrate_get_current(), >>> >>> >>> QIO_CHANNEL(sioc)) >>> >>> >>>object_unref(OBJECT(sioc)) >>> >>> >>> >>> >>> My test will not hang any more. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> 原始邮件 >>> >>> >>> >>> 发件人: <zhangchen.f...@cn.fujitsu..com> >>> 收件人:王广10165992
Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: Hi, Thanks for reporting this, and i confirmed it in my test, and it is a bug. Though we tried to call qemu_file_shutdown() to shutdown the related fd, in case COLO thread/incoming thread is stuck in read/write() while do failover, but it didn't take effect, because all the fd used by COLO (also migration) has been wrapped by qio channel, and it will not call the shutdown API if we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> I doubted migration cancel has the same problem, it may be stuck in write() if we tried to cancel migration. void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) { qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); migration_channel_connect(s, ioc, NULL); ... ... We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, and the migrate_fd_cancel() { ... ... if (s->state == MIGRATION_STATUS_CANCELLING && f) { qemu_file_shutdown(f); --> This will not take effect. No ? } } (cc'd in Daniel Berrange). I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); at the top of qio_channel_socket_new; so I think that's safe isn't it? Hmm, you are right, this problem is only exist for the migration incoming fd, thanks. Dave Thanks, Hailiang On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: Thank you。 I have test aready。 When the Primary Node panic,the Secondary Node qemu hang at the same place。 Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. when failover,channel_shutdown could not shut down the channel. so the colo_process_incoming_thread will hang at recvmsg. 
I test a patch: diff --git a/migration/socket.c b/migration/socket.c index 13966f1..d65a0ea 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, } trace_migration_socket_incoming_accepted() qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) migration_channel_process_incoming(migrate_get_current(), QIO_CHANNEL(sioc)) object_unref(OBJECT(sioc)) My test will not hang any more. 原始邮件 发件人: <zhangchen.f...@cn.fujitsu.com> 收件人:王广10165992 <zhang.zhanghaili...@huawei.com> 抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com> 日 期 :2017年03月21日 15:58 主 题 :Re: [Qemu-devel] 答复: Re: [BUG]COLO failover hang Hi,Wang. You can test this branch: https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk and please follow wiki ensure your own configuration correctly. http://wiki.qemu-project.org/Features/COLO Thanks Zhang Chen On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote: > > hi. > > I test the git qemu master have the same problem. 
> > (gdb) bt > > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 > > #1 0x7f658e4aa0c2 in qio_channel_read > (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", > buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114 > > #2 0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at > migration/qemu-file-channel.c:78 > > #3 0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at > migration/qemu-file.c:295 > > #4 0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, > offset=offset@entry=0) at migration/qemu-file.c:555 > > #5 0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:568 > > #6 0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:648 > > #7 0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, > errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244 > > #8 0x7f658e3e681e in colo_receive_check_message (f=<optimized > out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, > errp=errp@entry=0x7f64ef3fda08) > > at migration/colo.c:264 > > #9 0x7f658e3e740e in colo_process_incoming_thread > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 > > #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0 > > #11 0x7f65881983ed in clone () from /lib64/libc.so.6 > > (gdb) p ioc->name > > $2 = 0x7f658ff7d5c0 "migration
Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
Hi, Thanks for reporting this, and i confirmed it in my test, and it is a bug. Though we tried to call qemu_file_shutdown() to shutdown the related fd, in case COLO thread/incoming thread is stuck in read/write() while do failover, but it didn't take effect, because all the fd used by COLO (also migration) has been wrapped by qio channel, and it will not call the shutdown API if we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN). Cc: Dr. David Alan GilbertI doubted migration cancel has the same problem, it may be stuck in write() if we tried to cancel migration. void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) { qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); migration_channel_connect(s, ioc, NULL); ... ... We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above, and the migrate_fd_cancel() { ... ... if (s->state == MIGRATION_STATUS_CANCELLING && f) { qemu_file_shutdown(f); --> This will not take effect. No ? } } Thanks, Hailiang On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote: Thank you。 I have test aready。 When the Primary Node panic,the Secondary Node qemu hang at the same place。 Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu will not produce the problem,but Primary Node panic can。 I think due to the feature of channel does not support QIO_CHANNEL_FEATURE_SHUTDOWN. when failover,channel_shutdown could not shut down the channel. so the colo_process_incoming_thread will hang at recvmsg. 
I test a patch: diff --git a/migration/socket.c b/migration/socket.c index 13966f1..d65a0ea 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc, } trace_migration_socket_incoming_accepted() qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) migration_channel_process_incoming(migrate_get_current(), QIO_CHANNEL(sioc)) object_unref(OBJECT(sioc)) My test will not hang any more. 原始邮件 发件人: <zhangchen.f...@cn.fujitsu.com> 收件人:王广10165992 <zhang.zhanghaili...@huawei.com> 抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com> 日 期 :2017年03月21日 15:58 主 题 :Re: [Qemu-devel] 答复: Re: [BUG]COLO failover hang Hi,Wang. You can test this branch: https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk and please follow wiki ensure your own configuration correctly. http://wiki.qemu-project.org/Features/COLO Thanks Zhang Chen On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote: > > hi. > > I test the git qemu master have the same problem. 
> > (gdb) bt > > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 > > #1 0x7f658e4aa0c2 in qio_channel_read > (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", > buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114 > > #2 0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at > migration/qemu-file-channel.c:78 > > #3 0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at > migration/qemu-file.c:295 > > #4 0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, > offset=offset@entry=0) at migration/qemu-file.c:555 > > #5 0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:568 > > #6 0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at > migration/qemu-file.c:648 > > #7 0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, > errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244 > > #8 0x7f658e3e681e in colo_receive_check_message (f=<optimized > out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, > errp=errp@entry=0x7f64ef3fda08) > > at migration/colo.c:264 > > #9 0x7f658e3e740e in colo_process_incoming_thread > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 > > #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0 > > #11 0x7f65881983ed in clone () from /lib64/libc.so.6 > > (gdb) p ioc->name > > $2 = 0x7f658ff7d5c0 "migration-socket-incoming" > > (gdb) p ioc->featuresDo not support QIO_CHANNEL_FEATURE_SHUTDOWN > > $3 = 0 > > > (gdb) bt > > #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 > > #1 0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at > gmain.c:3054 > > #2 g_main_context_dispatch (context=<optimized out>, > context@entry=0x7fdccce9f590) at gmain.c:3630 > > #3
Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG] COLO failover hang
Hi,

On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:

Thank you. I have tested it already. When the Primary Node panics, the Secondary Node QEMU hangs at the same place. According to http://wiki.qemu-project.org/Features/COLO, killing the Primary Node QEMU does not reproduce the problem, but a Primary Node panic does. I think this is because the channel does not support the QIO_CHANNEL_FEATURE_SHUTDOWN feature.

Yes, you are right. When we do failover for the primary/secondary VM, we shut down the related fd in case a thread is stuck in a read/write on that fd. It seems that you didn't follow the above introduction exactly to do the test. Could you share your test procedure, especially the commands used in the test?

Thanks,
Hailiang

When failover happens, channel_shutdown() could not shut down the channel, so colo_process_incoming_thread hangs at recvmsg(). I tested a patch:

diff --git a/migration/socket.c b/migration/socket.c
index 13966f1..d65a0ea 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
     }

     trace_migration_socket_incoming_accepted();

     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
     migration_channel_process_incoming(migrate_get_current(),
                                        QIO_CHANNEL(sioc));
     object_unref(OBJECT(sioc));

My test does not hang any more.

Original Mail
From: <zhangchen.f...@cn.fujitsu.com>
To: Wang Guang 10165992 <zhang.zhanghaili...@huawei.com>
Cc: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
Date: 2017-03-21 15:58
Subject: Re: [Qemu-devel] Reply: Re: [BUG] COLO failover hang

Hi, Wang.

You can test this branch:
https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
and please follow the wiki to ensure your configuration is correct:
http://wiki.qemu-project.org/Features/COLO

Thanks
Zhang Chen

On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote:
> hi.
> I tested the current QEMU git master and it has the same problem.
> (gdb) bt
> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
> #1  0x7f658e4aa0c2 in qio_channel_read (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114
> #2  0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at migration/qemu-file-channel.c:78
> #3  0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at migration/qemu-file.c:295
> #4  0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, offset=offset@entry=0) at migration/qemu-file.c:555
> #5  0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at migration/qemu-file.c:568
> #6  0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at migration/qemu-file.c:648
> #7  0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244
> #8  0x7f658e3e681e in colo_receive_check_message (f=<optimized out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, errp=errp@entry=0x7f64ef3fda08) at migration/colo.c:264
> #9  0x7f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
> #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0
> #11 0x7f65881983ed in clone () from /lib64/libc.so.6
>
> (gdb) p ioc->name
> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
> (gdb) p ioc->features    (does not support QIO_CHANNEL_FEATURE_SHUTDOWN)
> $3 = 0
>
> (gdb) bt
> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
> #1  0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at gmain.c:3054
> #2  g_main_context_dispatch (context=<optimized out>, context@entry=0x7fdccce9f590) at gmain.c:3630
> #3  0x7fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
> #4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
> #5  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:506
> #6  0x7fdccb526187 in main_loop () at vl.c:1898
> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
>
> (gdb) p ioc->features
> $1 = 6
> (gdb) p ioc->name
> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
>
> Maybe socket_accept_incoming_migration should call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
>
> thank you.
>
> Original Mail
> From: <zhangchen.f...@cn.fujitsu.com>
> To: Wang Guang 10165992 <qemu-devel@nongnu.org>
> Cc: <zhangchen.f...@cn.fujitsu.com> <zhang.zhanghaili...@huawei.com>
> Date
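The two backtraces come down to a single feature bit: a shutdown request can only interrupt a reader blocked in recvmsg() when the QIO_CHANNEL_FEATURE_SHUTDOWN bit is set on the channel (features is 6 on the listener but 0 on the accepted socket above). Here is a minimal sketch of that gating logic; the types and function names are simplified stand-ins for illustration, not the real QIOChannel API:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for QIOChannel feature gating. */
enum {
    QIO_CHANNEL_FEATURE_SHUTDOWN = 1 << 0,
};

typedef struct {
    unsigned int features;   /* bitmask of supported features */
} Channel;

static void channel_set_feature(Channel *c, unsigned int feature)
{
    c->features |= feature;
}

/*
 * Mirrors the failure mode in the report: without the SHUTDOWN feature
 * the shutdown request is refused, and a thread blocked in recvmsg()
 * stays blocked forever.
 */
static bool channel_shutdown(Channel *c)
{
    if (!(c->features & QIO_CHANNEL_FEATURE_SHUTDOWN)) {
        return false;   /* cannot interrupt the blocked reader */
    }
    /* real code would call shutdown(2) on the underlying socket here */
    return true;
}
```

The fix proposed in the patch above is exactly the missing set-feature call on the accepted socket.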
Re: [Qemu-devel] [PATCHV2] COLO: COLO-FT.txt err
Hi,

Thanks for reporting this. This patch has the wrong format, and it seems that you pasted the content into this email directly. Please follow the instructions below to submit a patch:
http://wiki.qemu-project.org/Contribute/SubmitAPatch

Thanks,
Hailiang

On 2017/3/18 13:43, wang.guan...@zte.com.cn wrote:

There is an error in COLO-FT.txt: secondeary-disk0 should be secondary-disk0.

Signed-off-by: Guang Wang <wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index e289be2..754efda 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -139,7 +139,7 @@ Secondary:
 { 'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } } }
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 'writable': true } }
+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
 Note:
 a. The qmp command nbd-server-start and nbd-server-add must be run

Original Mail
From: Wang Guang 10165992
To: <zhang.zhanghaili...@huawei.com>
Cc: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
Date: 2017-03-18 13:38
Subject: [Qemu-devel] [PATCH] COLO: COLO-FT.txt err

There is an error in COLO-FT.txt: secondeary-disk0 should be secondary-disk0.

Signed-off-by: Guang Wang <wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index e289be2..754efda 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -139,7 +139,7 @@ Secondary:
 { 'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } } }
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 'writable': true } }
+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
Re: [Qemu-devel] [PATCH] virtio-serial-bus: Delete timer from list before free it
ping... ?

On 2017/3/8 4:01, Amit Shah wrote:
On (Mon) 06 Mar 2017 [11:29:31], zhanghailiang wrote:

Signed-off-by: zhanghailiang
Reviewed-by: Amit Shah

Michael, please pick this up.

Amit
Re: [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case
Hi Stefan,

Sorry for the delayed reply.

On 2017/2/28 1:37, Stefan Hajnoczi wrote:
On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:

Just as in the scenario of non-shared-disk block replication, we are going to implement block replication from many basic blocks that are already in QEMU. The architecture is:

[Architecture diagram: the Primary VM's virtio-blk forwards writes through an NBD client (1) to the Secondary's NBD server (2), while primary writes also go to the shared disk (3); on the Secondary, a drive-backup sync=none job buffers the original content into a hidden disk that backs the active disk (4), which buffers secondary writes (5).]

1) Primary writes will read the original data and forward it to the Secondary QEMU.
2) The hidden disk is created automatically. It buffers the original content that is modified by the primary VM. It should also be an empty disk, and the driver supports bdrv_make_empty() and a backing file.
3) Primary write requests will be written to the shared disk.
4) Secondary write requests will be buffered in the active disk, and will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen

Are there any restrictions on the shared disk? For example, must the -drive cache= mode be 'none'? If the cache mode isn't 'none', the secondary host might have old data in the host page cache, and the Secondary QEMU would have an inconsistent view of the shared disk.

During a checkpoint we will call vm_stop(), in which bdrv_flush_all() is called; is that enough?

Are image file formats like qcow2 supported for the shared disk? Extra steps are required to achieve consistency; see bdrv_invalidate_cache().

In the above scenario, there is no limitation on the formats of the shared disk. Hmm, in that case, we should call bdrv_invalidate_cache_all() during the checkpoint.

Thanks,
Hailiang

Stefan
Re: [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options
On 2017/2/28 1:10, Stefan Hajnoczi wrote:
On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:

@@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
                    "The option mode's value should be primary or secondary");
         goto fail;
     }
+    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+                                          false);
+    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+        if (!shared_disk_id) {
+            error_setg(&local_err, "Missing shared disk blk option");
+            goto fail;
+        }
+        s->shared_disk_id = g_strdup(shared_disk_id);
+        blk = blk_by_name(s->shared_disk_id);
+        if (!blk) {
+            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+            goto fail;
+        }
+        /* We can't access the root member of BlockBackend directly */
+        tmp_bs = blk_bs(blk);
+        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);

Why is this necessary? We already have the BlockBackend for the primary disk. I'm not sure why the BdrvChild is needed.

Er, the helper backup_job_create() needs the BlockDriverState of the primary disk; besides, we want to make it the same as 'secondary_disk' in struct BDRVReplicationState.

Thanks,
Hailiang

Stefan
Re: [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case
On 2017/2/28 0:46, Stefan Hajnoczi wrote:
On Fri, Jan 20, 2017 at 11:47:55AM +0800, zhanghailiang wrote:

+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on

I don't see where this patch series adds a -drive shared-disk=on option. Did I miss it?

In patch 2, we add this option. Thanks.
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2017/3/1 0:14, Andrea Arcangeli wrote:

Hello,

On Tue, Feb 28, 2017 at 09:48:26AM +0800, Hailiang Zhang wrote:
Yes, the current implementation of live snapshot supports tcg, but does not support kvm mode, for the reason I have mentioned above; if you try to implement it, I think you need to start from userfaultfd supporting KVM. There is a scenario for it, but I'm blocked by other things these days. I'm glad to discuss it with you if you plan to do it.

Yes, there were other urgent userfaultfd features needed by QEMU and CRIU queued for merging (hugetlbfs/shmem/non-cooperative support) and they're all included upstream now. Now that such work is finished, fixing the WP support to work with KVM and to provide full accuracy will be the next thing to do.

Great, looking forward to it. Thanks.

Thanks,
Andrea
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hi,

On 2017/2/27 23:37, Christian Pinto wrote:

Hello Hailiang, are there any updates on this patch series? Are you planning to release a new version?

No, userfaultfd still does not support write-protect for KVM. You can see the newest discussion about it here:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html

You say there are some issues with the current snapshot-v2 version; which issues were you referring to? On my side the only problem I have seen was that the live snapshot was not working on ARMv8, but I have fixed that and managed to successfully snapshot and restore a QEMU ARMv8 tcg machine on an ARMv8 host. I will gladly contribute these fixes once you release a new version of the patches.

Yes, the current implementation of live snapshot supports tcg, but does not support kvm mode, for the reason I have mentioned above; if you try to implement it, I think you need to start from userfaultfd supporting KVM. There is a scenario for it, but I'm blocked by other things these days. I'm glad to discuss it with you if you plan to do it.

Thanks.
Hailiang

Thanks a lot,
Christian

On 20/08/2016 08:31, Hailiang Zhang wrote:

Hi,

I updated this series, but didn't post it, because there were some problems while I tested the snapshot function. I don't know whether it is a userfaultfd issue or not, and I don't have time to investigate it this month. I have put the patches on github:
https://github.com/coloft/qemu/tree/snapshot-v2
Anyone who wants to test and modify them is welcome!

Besides, will you join LinuxCon or KVM Forum in Canada?
I wish to see you there if you join the conference ;)

Thanks,
Hailiang

On 2016/8/18 23:56, Andrea Arcangeli wrote:

Hello everyone,

I've an aa.git tree up to date on the master & userfault branches (master includes other pending VM stuff; the userfault branch only contains userfault enhancements):

https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault

I didn't have time to test KVM live memory snapshot on it yet, as I'm still working to improve it. Did anybody test it? However, I'd be happy to take any bug reports and quickly solve anything that isn't working right with the shadow MMU.

I already got a positive report for another usage of the uffd WP support:

https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f

The last few things I'm working on to finish the WP support are:

1) a pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty, to mark, in a vma with VM_UFFD_WP set in vma->vm_flags, which swap entries were generated while the pte was wrprotected.

2) to avoid all false positives, the equivalent of pte_mksoft_dirty is needed too... and that requires spare software bits on the pte, which are available on x86. I considered also taking over the soft_dirty bit, but then you couldn't do checkpoint/restore of a JIT/to-native compiler that uses the uffd WP support, so it wasn't ideal. Perhaps it would be OK, as an incremental patch, to make the two options mutually exclusive, to defer the arch changes that pte_mkuffd_wp would require for later.

3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING, or trigger a CoW in userfaultfd_writeprotect.

4) a WP selftest.

In theory things should work OK already if the userland code is tolerant against false positives through swap and after fork() and KSM. For a usage like snapshotting, false positives shouldn't be an issue (it'll just run slower if you swap, in the worst case), and point 3) above also isn't an issue because it's going to register into uffd with WP only.
The current status includes:

1) WP support for anon (with false positives... work in progress)
2) MISSING support for tmpfs and hugetlbfs
3) non-cooperative support

Thanks,
Andrea
Re: [Qemu-devel] [PATCH V10 0/2] Add new qmp commands to suppurt Xen COLO
On 2017/2/28 6:52, Stefano Stabellini wrote:
On Mon, 27 Feb 2017, Eric Blake wrote:
On 02/27/2017 04:31 PM, Stefano Stabellini wrote:

Eric, are you OK with this series going upstream? If so, do you want me to send the pull request for it, or are you going to handle it?

Both patches have my R-b, but MAINTAINERS suggests the pull request should go through the COLO Framework (zhanghailiang) or maybe Xen (you). I am fine either way. zhanghailiang?

Yes, I'm fine if you could send a pull request for this series. Thanks.
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/27 17:05, Jason Wang wrote:
On 2017-02-27 14:53, Hailiang Zhang wrote:

I think the issue is that your code cannot tell A from B.

We have a parameter 'fin_ack_seq' recording the sequence number of the 'FIN=1,ACK=1,seq=w,ack=u+1' packet, and if the ack value from the opposite side is 'w+1', we can consider this connection closed, no?

Hi Jason,

Thanks very much for your patience. Let's see what happens. Consider the VM doing an active close (reusing the figure above); we don't support tracking connections started by the VM in the current rewriter code, I mean, the Client side is the VM. Your figure is not quite correct; the process should be:

(VM) Client:                                  Server:
ESTABLISHED |                                |
            | -> FIN=1,seq=u ->              |
FIN_WAIT_1  |                                |
            | <- ACK=1,seq=v,ack=u+1 <-      |
FIN_WAIT_2  |                                | CLOSE_WAIT
            | <- FIN=1,ACK=1,seq=w,ack=u+1 <-|   handle_secondary():
            |                                |       fin_ack_seq = w
            |                                |       tcp_state = TCPS_LAST_ACK
            |                                | LAST_ACK
            | -> ACK=1,seq=u+1,ack=w+1 ->    |
TIME_WAIT   |                                | CLOSED
CLOSED      |                                |   handle_primary():
            |                                |       if (ack == fin_ack_seq + 1)
            |                                |           g_hash_table_remove()

(VM) Client:                                  Server:
ESTABLISHED |                                |
            | -> FIN=1,seq=u ->              |   handle_secondary():
            |                                |       fin_ack_seq = u
            |                                |       tcp_state = TCPS_LAST_ACK
FIN_WAIT_1  |                                |
            | <- ACK=1,seq=v,ack=u+1 <-      |   handle_primary():
            |                                |       if (ack == fin_ack_seq + 1)
            |                                |           g_hash_table_remove()

But we probably want it to be removed in TIME_WAIT/CLOSED.

Yes, we should remove it only after 2MSL, because the server side may not get the last 'ACK=1,seq=u+1,ack=w+1' packet, in which case it will resend 'FIN=1,ACK=1,seq=w,ack=u+1'.

Thanks.

Thanks
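The removal condition discussed above can be sketched as a tiny standalone model. This is hypothetical illustration code (the names Conn, conn_saw_fin_ack and conn_should_remove are ours; the real filter-rewriter tracks much more state, and per the 2MSL point the real removal should additionally be delayed):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection-tracking state for illustration. */
enum { TCPS_ESTABLISHED, TCPS_LAST_ACK };

typedef struct {
    int      tcp_state;
    uint32_t fin_ack_seq;   /* seq of the peer's FIN/ACK segment */
} Conn;

/* Record the peer's FIN/ACK: that segment must be acked with seq + 1. */
static void conn_saw_fin_ack(Conn *c, uint32_t seq)
{
    c->fin_ack_seq = seq;
    c->tcp_state = TCPS_LAST_ACK;
}

/*
 * True when the final ACK of the close handshake has been seen, so the
 * connection entry may be dropped from the hash table (modulo waiting
 * 2MSL for a possible FIN/ACK retransmission).
 */
static bool conn_should_remove(const Conn *c, uint32_t ack)
{
    return c->tcp_state == TCPS_LAST_ACK && ack == c->fin_ack_seq + 1;
}
```

In the figure above, seq=w is recorded by conn_saw_fin_ack(), and the final 'ack=w+1' satisfies conn_should_remove().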
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/27 15:34, Zhang Chen wrote:
On 02/27/2017 03:28 PM, Hailiang Zhang wrote:
On 2017/2/27 15:03, Zhang Chen wrote:
On 02/25/2017 02:58 PM, Hailiang Zhang wrote:
On 2017/2/25 11:32, Zhang Chen wrote:

Add an offset argument to colo_packet_compare_common(), and optimize colo_packet_compare_icmp() and colo_packet_compare_udp() to compare just the IP payload.

Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index e75f0ae..9853232 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode)
  * return:    0 means packet same
  *            > 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
+static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
 {
     trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
@@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
                                inet_ntoa(spkt->ip->ip_dst));

     if (ppkt->size == spkt->size) {
-        return memcmp(ppkt->data, spkt->data, spkt->size);
+        return memcmp(ppkt->data + offset, spkt->data + offset,
+                      spkt->size - offset);
     } else {
         trace_colo_compare_main("Net packet size are not the same");
         return -1;
@@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
         spkt->ip->ip_sum = ppkt->ip->ip_sum;
     }

-    res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
-                 (spkt->size - ETH_HLEN));
+    res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);

For the tcp packet check, why not ignore the IP headers, just like the udp packet check? Besides, can we compare the checksums stored in the tcp and udp headers before calling colo_packet_compare_common()? I think that will improve the comparing performance.

That's another way to compare the packets, suggested by Dr. David Alan Gilbert: it makes the two packets' IP headers the same first, then compares the whole IP packet. That way makes it explicit why we ignore the IP header in the other packet checks, like udp. For performance, ignoring the IP header skips at least 20 bytes of comparison, so ignoring the IP header has better comparing performance.

OK. Here, I think we can re-use the checksum value stored in the tcp or udp header; comparing it first, before comparing the complete tcp or udp payload, is another way to improve the performance, and it is only 2 bytes.

No, the IP header checksums are always different, because the IP header's ID field is always different.

Not the checksum in the ip header; I mean the checksum in the tcp header or udp header.

Thanks
Zhang Chen

Thanks.
Hailiang

     if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
         trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src),
@@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    ret = colo_packet_compare_common(ppkt, spkt);
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN);

     if (ret) {
         trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
@@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    if (colo_packet_compare_common(ppkt, spkt)) {
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) {
         trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size);
         qemu_hexdump((char *)ppkt->data, stderr, "colo-compare",
@@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt)
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
                                inet_ntoa(spkt->ip->ip_src),
                                inet_ntoa(spkt->ip->ip_dst));

-    return colo_packet_compare_common(ppkt, spkt);
+    return colo_packet_compare_common(ppkt, spkt, 0);
 }

 static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/27 15:03, Zhang Chen wrote:
On 02/25/2017 02:58 PM, Hailiang Zhang wrote:
On 2017/2/25 11:32, Zhang Chen wrote:

Add an offset argument to colo_packet_compare_common(), and optimize colo_packet_compare_icmp() and colo_packet_compare_udp() to compare just the IP payload.

Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index e75f0ae..9853232 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode)
  * return:    0 means packet same
  *            > 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
+static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
 {
     trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
@@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt)
                                inet_ntoa(spkt->ip->ip_dst));

     if (ppkt->size == spkt->size) {
-        return memcmp(ppkt->data, spkt->data, spkt->size);
+        return memcmp(ppkt->data + offset, spkt->data + offset,
+                      spkt->size - offset);
     } else {
         trace_colo_compare_main("Net packet size are not the same");
         return -1;
@@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
         spkt->ip->ip_sum = ppkt->ip->ip_sum;
     }

-    res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
-                 (spkt->size - ETH_HLEN));
+    res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);

For the tcp packet check, why not ignore the IP headers, just like the udp packet check? Besides, can we compare the checksums stored in the tcp and udp headers before calling colo_packet_compare_common()? I think that will improve the comparing performance.

That's another way to compare the packets, suggested by Dr. David Alan Gilbert: it makes the two packets' IP headers the same first, then compares the whole IP packet. That way makes it explicit why we ignore the IP header in the other packet checks, like udp. For performance, ignoring the IP header skips at least 20 bytes of comparison, so ignoring the IP header has better comparing performance.

OK. Here, I think we can re-use the checksum value stored in the tcp or udp header; comparing it first, before comparing the complete tcp or udp payload, is another way to improve the performance, and it is only 2 bytes.

Thanks.

Thanks
Zhang Chen

Thanks.
Hailiang

     if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
         trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src),
@@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    ret = colo_packet_compare_common(ppkt, spkt);
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN);

     if (ret) {
         trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
@@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
         return -1;
     }

-    if (colo_packet_compare_common(ppkt, spkt)) {
+    /*
+     * Because ppkt and spkt are both in the same connection,
+     * ppkt's src ip, dst ip, src port, dst port and ip_proto are all the
+     * same as spkt's. In addition, the IP header's Identification is a
+     * random field; we can handle it in the IP fragmentation function later.
+     * So we just compare the ip payload here.
+     */
+    if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) {
         trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size);
         qemu_hexdump((char *)ppkt->data, stderr, "colo-compare",
@@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt)
                                inet_ntoa(ppkt->ip->ip_dst), spkt->size,
                                inet_ntoa(spkt->ip->ip_src),
                                inet_ntoa(spkt->ip->ip_dst));

-    return colo_packet_compare_common(ppkt, spkt);
+    return colo_packet_compare_common(ppkt, spkt, 0);
 }

 static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
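The checksum-first idea from this exchange can be sketched standalone. This is hypothetical illustration code (the name compare_payload and its signature are ours, not colo-compare's): a 16-bit transport-checksum mismatch proves the payloads differ, while a match still requires the full memcmp, since different payloads can share a checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two packet payloads, using the 16-bit transport-layer checksum
 * (already present in the tcp/udp header) as a cheap first-pass filter.
 * Returns 0 when the payloads match, nonzero otherwise.
 */
static int compare_payload(const uint8_t *p, uint16_t p_csum,
                           const uint8_t *s, uint16_t s_csum,
                           size_t len, size_t offset)
{
    if (p_csum != s_csum) {
        return -1;              /* definitely different: only 2 bytes compared */
    }
    /* checksums match: fall back to the full byte-wise comparison */
    return memcmp(p + offset, s + offset, len - offset);
}
```

The win is the common miscompare case: most differing packets are rejected after a 2-byte compare instead of a full-payload memcmp.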
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/27 13:35, Jason Wang wrote:
On 2017-02-27 12:09, Hailiang Zhang wrote:
On 2017/2/27 11:40, Jason Wang wrote:
On 2017-02-27 11:11, Hailiang Zhang wrote:
On 2017/2/23 12:16, Jason Wang wrote:
On 2017-02-22 16:51, Hailiang Zhang wrote:
On 2017/2/22 16:45, Hailiang Zhang wrote:
On 2017/2/22 16:07, Jason Wang wrote:
On 2017-02-22 11:46, zhanghailiang wrote:

After a net connection is closed, we didn't clear its related resources in connection_track_table, which will lead to a memory leak.

Not a real leak, but it would lead to a reset of the hash table if there are too many closed connections.

Yes, you are right: there will be lots of stale connection data in the hash table if we don't remove it when a connection is closed.

Ok, so let's come up with a better title for the patch.

OK. Let's track the state of the net connection; if it is closed, its related resources will be cleared up.

The issue is that the state is tracked only partially; do we need a full state machine here?

No. IMHO, we only care about its last state, because we will do nothing even if we track the intermediate states.

Well, you care about at least the syn state too. Without a complete state machine, it's very hard to track even partial state, I believe. And you will fail to track some state transitions for sure, which makes the code fragile.

Agreed, but here things are a little different. There are some extreme cases where we can't track the complete process of closing a connection. For example (I have explained this below; it seems that you didn't get it ;)): if the VM has been running before we make it go into COLO FT state, some connections may already exist. In the extreme case, the VM goes into COLO state while some connections are in a half-closing state, and we can only track the below half-closing states in the filter-rewriter and colo-compare objects.

[...]
Er, here we track the last two states, 'FIN=1,ACK=1' and 'ACK=1' (which acks the 'FIN=1,ACK=1' packet). We will remove the connection when we get the 'ACK=1' packet, so is that enough?

But the connection is not closed in fact, no? It's legal for the remote side to continue sending tons of packets to us even after this.

Er, I'm a little confused. Here, for the server side, I think that after it got the 'ACK=1,seq=u+1,ack=w+1', the connection is closed, so I remove it from the hash table. Wrong?

Client:                                  Server:
ESTABLISHED |                           |
            | -> FIN=1,seq=u ->         |

This is case A, and ACK should be set in this segment too.

FIN_WAIT_1  |                           |
            | <- ACK=1,seq=v,ack=u+1 <- |
FIN_WAIT_2  |                           | CLOSE_WAIT
            | <- FIN=1,ACK=1,seq=w,ack=u+1 <- |
            |                           | LAST_ACK

This is case B.

            | -> ACK=1,seq=u+1,ack=w+1 -> |
TIME_WAIT   |                           | CLOSED
CLOSED      |                           |

I think the issue is that your code cannot tell A from B.

We have a parameter 'fin_ack_seq' recording the sequence number of 'FIN=1,ACK=1,seq=w,ack=u+1', and if the ack value from the opposite side is 'w+1', we can consider this connection closed, no?

Thanks

What's more, I don't think we can decide passive or active close by:

+if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {

since both cases will send FIN,ACK for sure.

I didn't quite understand. Here we have tracked the closing request no matter whether it is from the server side (passive close?) or the client side (active close?). You can refer to the 'Case 1' and 'Case 2' comments in the code.

I think you need to differentiate them, since passive close is much simpler, and it seems that your code may work only in that case.

Here, it seems that we can't track the case in which both sides send closing requests at the same time; in that case, there are only 'FIN=1' and 'ACK=1' packets.

Yes, RFC allows this.

Hmm, I'd like to remove this patch from this series and send it as a single patch after all the above questions have been solved. How about the other patches?

Looks good except for the compiling issue in patch 3.

OK, I will fix it in version 3.
Thanks

Thanks,
Hailiang

[...]
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/27 11:40, Jason Wang wrote:
On 2017-02-27 11:11, Hailiang Zhang wrote:
On 2017/2/23 12:16, Jason Wang wrote:
On 2017-02-22 16:51, Hailiang Zhang wrote:
On 2017/2/22 16:45, Hailiang Zhang wrote:
On 2017/2/22 16:07, Jason Wang wrote:
On 2017-02-22 11:46, zhanghailiang wrote:

After a net connection is closed, we didn't clear its related resources in connection_track_table, which will lead to a memory leak.

Not a real leak, but it would lead to a reset of the hash table if there are too many closed connections.

Yes, you are right: there will be lots of stale connection data in the hash table if we don't remove it when a connection is closed.

Ok, so let's come up with a better title for the patch.

OK. Let's track the state of the net connection; if it is closed, its related resources will be cleared up.

The issue is that the state is tracked only partially; do we need a full state machine here?

No. IMHO, we only care about its last state, because we will do nothing even if we track the intermediate states.

Well, you care about at least the syn state too. Without a complete state machine, it's very hard to track even partial state, I believe. And you will fail to track some state transitions for sure, which makes the code fragile.

Agreed, but here things are a little different. There are some extreme cases where we can't track the complete process of closing a connection. For example (I have explained this below; it seems that you didn't get it ;)): if the VM has been running before we make it go into COLO FT state, some connections may already exist. In the extreme case, the VM goes into COLO state while some connections are in a half-closing state, and we can only track the below half-closing states in the filter-rewriter and colo-compare objects.
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> --- net/colo.h| 4 +++ net/filter-rewriter.c | 70 +-- 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/net/colo.h b/net/colo.h index 7c524f3..cd9027f 100644 --- a/net/colo.h +++ b/net/colo.h @@ -18,6 +18,7 @@ #include "slirp/slirp.h" #include "qemu/jhash.h" #include "qemu/timer.h" +#include "slirp/tcp.h" #define HASHTABLE_MAX_SIZE 16384 @@ -69,6 +70,9 @@ typedef struct Connection { * run once in independent tcp connection */ int syn_flag; + +int tcp_state; /* TCP FSM state */ +tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */ } Connection; uint32_t connection_key_hash(const void *opaque); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index c4ab91c..7e7ec35 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt) } /* handle tcp packet from primary guest */ -static int handle_primary_tcp_pkt(NetFilterState *nf, +static int handle_primary_tcp_pkt(RewriterState *rf, Connection *conn, - Packet *pkt) + Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -97,15 +97,45 @@ static int handle_primary_tcp_pkt(NetFilterState *nf, tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +/* + * Case 1: + * The *server* side of this connect is VM, *client* tries to close + * the connection. + * + * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1' + * packet from server side. From this point, we can ensure that there + * will be no packets in the connection, except that, some errors + * happen between the path of 'filter object' and vNIC, if this rare + * case really happen, we can still create a new connection, + * So it is safe to remove the connection from connection_track_table. + * + */ +if ((conn->tcp_state == TCPS_LAST_ACK) && +(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { +fprintf(stderr, "Remove conn " Can this even compile? 
Oops, i forgot to remove it, will remove it in next version. + g_hash_table_remove(rf->connection_track_table, key); +} +} +/* + * Case 2: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * We got 'fin=1, ack=1' packet from client side, we need to + * record the seq of 'fin=1, ack=1' packet. + */ +if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) { +conn->fin_ack_seq = htonl(tcp_pkt->th_seq); +conn->tcp_state = TCPS_LAST_ACK; } return 0; } /* handle tcp packet from secon
Re: [Qemu-devel] [PATCH v2 3/3] filter-rewriter: skip net_checksum_calculate() while offset = 0
On 2017/2/24 16:08, Zhang Chen wrote: On 02/22/2017 11:46 AM, zhanghailiang wrote: While the offset of the packets' sequence numbers for the primary side and secondary side is zero, it is unnecessary to call net_checksum_calculate() to recalculate the checksum of the packets. Signed-off-by: zhanghailiang --- net/filter-rewriter.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index 7e7ec35..c9a6d43 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -93,10 +93,12 @@ static int handle_primary_tcp_pkt(RewriterState *rf, conn->offset -= (ntohl(tcp_pkt->th_ack) - 1); conn->syn_flag = 0; } -/* handle packets to the secondary from the primary */ -tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); +if (conn->offset) { This is wrong, conn->offset may be a negative value (like -1000), so you could change this to "if (conn->offset == 0) {" Er, if it is a negative value, it will still go into this if (conn->offset) branch, and we need to adjust the checksum value in that case. +/* handle packets to the secondary from the primary */ +tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); -net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +} /* * Case 1: * The *server* side of this connect is VM, *client* tries to close @@ -112,7 +114,6 @@ static int handle_primary_tcp_pkt(RewriterState *rf, */ if ((conn->tcp_state == TCPS_LAST_ACK) && (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { -fprintf(stderr, "Remove conn " This needs a fix. OK.
g_hash_table_remove(rf->connection_track_table, key); } } @@ -159,10 +160,13 @@ static int handle_secondary_tcp_pkt(RewriterState *rf, } if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) { -/* handle packets to the primary from the secondary*/ -tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); +/* Only need to adjust seq while offset is Non-zero */ +if (conn->offset) { Refer to the above comments. Thanks Zhang Chen +/* handle packets to the primary from the secondary*/ +tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); -net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +} /* * Case 2: * The *server* side of this connect is VM, *server* tries to close
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/25 11:32, Zhang Chen wrote: Add offset args for colo_packet_compare_common, optimize colo_packet_compare_icmp() and colo_packet_compare_udp() just compare the IP payload. Signed-off-by: Zhang Chen--- net/colo-compare.c | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index e75f0ae..9853232 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) +static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) { trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), inet_ntoa(ppkt->ip->ip_dst), spkt->size, @@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) inet_ntoa(spkt->ip->ip_dst)); if (ppkt->size == spkt->size) { This check is useless, because we have done such checks in every caller. -return memcmp(ppkt->data, spkt->data, spkt->size); +return memcmp(ppkt->data + offset, spkt->data + offset, + spkt->size - offset); } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) spkt->ip->ip_sum = ppkt->ip->ip_sum; } -res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN, -(spkt->size - ETH_HLEN)); +res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src), @@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) return -1; } -ret = colo_packet_compare_common(ppkt, spkt); +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. 
In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN); if (ret) { trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); @@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) return -1; } -if (colo_packet_compare_common(ppkt, spkt)) { +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) { trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", @@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) inet_ntoa(ppkt->ip->ip_dst), spkt->size, inet_ntoa(spkt->ip->ip_src), inet_ntoa(spkt->ip->ip_dst)); -return colo_packet_compare_common(ppkt, spkt); +return colo_packet_compare_common(ppkt, spkt, 0); } static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH 2/3] COLO-compare: Optimize colo_packet_compare_common
On 2017/2/25 11:32, Zhang Chen wrote: Add offset args for colo_packet_compare_common, optimize colo_packet_compare_icmp() and colo_packet_compare_udp() just compare the IP payload. Signed-off-by: Zhang Chen--- net/colo-compare.c | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index e75f0ae..9853232 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) +static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) { trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), inet_ntoa(ppkt->ip->ip_dst), spkt->size, @@ -188,7 +188,8 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) inet_ntoa(spkt->ip->ip_dst)); if (ppkt->size == spkt->size) { -return memcmp(ppkt->data, spkt->data, spkt->size); +return memcmp(ppkt->data + offset, spkt->data + offset, + spkt->size - offset); } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -237,8 +238,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) spkt->ip->ip_sum = ppkt->ip->ip_sum; } -res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN, -(spkt->size - ETH_HLEN)); +res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); For tcp packets check, why not ignore the ip headers, just like udp packets check ? Besides, here, can we compare the checksum stored in headers of tcp and udp before call colo_packet_compare_common(), which i think will improve the comparing performance. Thanks. 
Hailiang if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src), @@ -277,7 +277,14 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) return -1; } -ret = colo_packet_compare_common(ppkt, spkt); +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +ret = colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN); if (ret) { trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); @@ -304,7 +311,14 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) return -1; } -if (colo_packet_compare_common(ppkt, spkt)) { +/* + * Because of ppkt and spkt are both in the same connection, + * The ppkt's src ip, dst ip, src port, dst port, ip_proto all are + * same with spkt. In addition, IP header's Identification is a random + * field, we can handle it in IP fragmentation function later. + * So we just compare the ip payload here. + */ +if (colo_packet_compare_common(ppkt, spkt, network_length + ETH_HLEN)) { trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", @@ -330,7 +344,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) inet_ntoa(ppkt->ip->ip_dst), spkt->size, inet_ntoa(spkt->ip->ip_src), inet_ntoa(spkt->ip->ip_dst)); -return colo_packet_compare_common(ppkt, spkt); +return colo_packet_compare_common(ppkt, spkt, 0); } static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH 1/3] COLO-compare: Add minimum packet size check and some fix
Hi, On 2017/2/25 11:32, Zhang Chen wrote: Add packet minimum size check in colo_packet_compare_udp() and colo_packet_compare_udp() like colo_packet_compare_icmp(), rename function colo_packet_compare() to colo_packet_compare_common() that we will reuse it later. Signed-off-by: Zhang Chen--- net/colo-compare.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 300f017..e75f0ae 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -180,7 +180,7 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare(Packet *ppkt, Packet *spkt) +static int colo_packet_compare_common(Packet *ppkt, Packet *spkt) { trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), inet_ntoa(ppkt->ip->ip_dst), spkt->size, @@ -190,6 +190,7 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt) if (ppkt->size == spkt->size) { return memcmp(ppkt->data, spkt->data, spkt->size); } else { +trace_colo_compare_main("Net packet size are not the same"); return -1; } } @@ -202,9 +203,10 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt) static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) { struct tcphdr *ptcp, *stcp; -int res; +int res, network_length; trace_colo_compare_main("compare tcp"); + if (ppkt->size != spkt->size) { if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { trace_colo_compare_main("pkt size not same"); @@ -212,6 +214,12 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) return -1; } +network_length = ppkt->ip->ip_hl * 4; +if (ppkt->size < network_length + ETH_HLEN) { I think the check here is useless, because you have such check in parse_packet_early() which is been called before these helpers. And what check you need to add is, to check if the packet's size = packet's length been record in ip header. 
Like: +++ b/net/colo.c @@ -78,6 +78,12 @@ int parse_packet_early(Packet *pkt) trace_colo_proxy_main("pkt->size < network_header + network_length"); return 1; } + +if (pkt->size < ETH_HLEN + ntohs(pkt->ip->ip_len)) { +fprintf(stderr, "pkt->size %d < pkt expect total len %ld\n", pkt->size, +pkt_MAChdr_len + ntohs(pkt->ip->ip_len)); +return -1; +} +trace_colo_compare_main("tcp packet size error"); +return -1; +} + ptcp = (struct tcphdr *)ppkt->transport_header; stcp = (struct tcphdr *)spkt->transport_header; @@ -260,10 +268,16 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) */ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) { -int ret; +int ret, network_length; trace_colo_compare_main("compare udp"); -ret = colo_packet_compare(ppkt, spkt); +network_length = ppkt->ip->ip_hl * 4; +if (ppkt->size < network_length + ETH_HLEN) { +trace_colo_compare_main("udp packet size error"); +return -1; +} + +ret = colo_packet_compare_common(ppkt, spkt); if (ret) { trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); @@ -285,12 +299,12 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) trace_colo_compare_main("compare icmp"); network_length = ppkt->ip->ip_hl * 4; -if (ppkt->size != spkt->size || -ppkt->size < network_length + ETH_HLEN) { +if (ppkt->size < network_length + ETH_HLEN) { +trace_colo_compare_main("icmp packet size error"); return -1; } -if (colo_packet_compare(ppkt, spkt)) { +if (colo_packet_compare_common(ppkt, spkt)) { trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", @@ -316,7 +330,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) inet_ntoa(ppkt->ip->ip_dst), spkt->size, inet_ntoa(spkt->ip->ip_src), inet_ntoa(spkt->ip->ip_dst)); -return colo_packet_compare(ppkt, spkt); +return colo_packet_compare_common(ppkt, spkt); } static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
Re: [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case
ping ... ? On 2017/1/20 11:47, zhanghailiang wrote: COLO block replication doesn't support the shared-disk case. Here we try to implement it, and this is the third version. Last posted series: https://lists.gnu.org/archive/html/qemu-block/2016-12/msg00039.html You can refer to the above link if you want to test it. I have uploaded the new version to github: https://github.com/coloft/qemu/tree/colo-developing-with-shared-disk-2016-1-20 Please review; any comments are welcome. Cc: Juan Quintela Cc: Amit Shah Cc: Dr. David Alan Gilbert (git) Cc: eddie.d...@intel.com v3: - Fix some comments from Stefan and Eric v2: - Drop the patch which adds a blk_root() helper - Fix some comments from Changlong zhanghailiang (6): docs/block-replication: Add description for shared-disk case replication: add shared-disk and shared-disk-id options replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() replication: fix code logic with the new shared_disk option replication: Implement block replication for shared disk case nbd/replication: implement .bdrv_get_info() for nbd and replication driver block/nbd.c| 12 block/replication.c| 156 +++-- docs/block-replication.txt | 139 ++-- qapi/block-core.json | 10 ++- 4 files changed, 279 insertions(+), 38 deletions(-)
Re: [Qemu-devel] [PATCH 05/15] COLO: Handle shutdown command for VM in COLO state
Hi Eric, On 2017/2/22 23:35, Eric Blake wrote: On 02/21/2017 09:42 PM, zhanghailiang wrote: If the VM is in COLO FT state, we need to do some extra work before starting the normal shutdown process. The Secondary VM will ignore the shutdown command if users issue it directly to the Secondary VM. COLO will capture the shutdown command and handle it after the shutdown request from the user. Cc: Paolo Bonzini Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert --- v19: - fix title and comment Did you miss putting v19 in the subject line? Er, some patches of this series were split from the previous v18 version, but most patches of that series have been merged upstream; that's why it has this tag here. I think it is better to remove this comment instead of using a v19 tag for this series. +++ b/qapi-schema.json @@ -1157,12 +1157,14 @@ # # @vmstate-loaded: VM's state has been loaded by SVM. # +# @guest-shutdown: shutdown require from PVM to SVM maybe s/require/requested/ ? Yes. Missing '(since 2.9)' Will fix this in next version, thanks. +# # Since: 2.8 ## { 'enum': 'COLOMessage', 'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply', 'vmstate-send', 'vmstate-size', 'vmstate-received', -'vmstate-loaded' ] } +'vmstate-loaded', 'guest-shutdown' ] }
Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint
Hi, On 2017/2/22 17:31, Zhang Chen wrote: On 02/22/2017 11:42 AM, zhanghailiang wrote: While doing a checkpoint, we need to flush all the unhandled packets. By using the filter notifier mechanism, we can easily notify every compare object to do this; it runs inside the compare threads as a coroutine. Hi Jason and Hailiang, I will send a patch set later about a colo-compare notify mechanism for Xen, like this patch. I want to add a new chardev socket in colo-compare to connect to Xen COLO, to notify checkpoint or failover, because we have no other way to communicate with the Xen code. That means we will have two notify mechanisms. What do you think about this? I don't think you need another mechanism; what you need to do is implement a qmp command which calls colo_notify_compares_event(). It will not return until the event (checkpoint or failover) has been handled by all compares. Will this satisfy your requirement? Thanks, Hailiang Thanks Zhang Chen Cc: Jason Wang Signed-off-by: zhanghailiang Signed-off-by: Zhang Chen --- net/colo-compare.c | 72 ++ net/colo-compare.h | 20 +++ 2 files changed, 92 insertions(+) create mode 100644 net/colo-compare.h
Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
On 2017/2/22 16:07, Jason Wang wrote: On 2017年02月22日 11:46, zhanghailiang wrote: After a net connection is closed, we didn't clear its releated resources in connection_track_table, which will lead to memory leak. Not a real leak but would lead reset of hash table if too many closed connections. Yes, you are right, there will be lots of stale connection data in hash table if we don't remove it while it is been closed. Which Let't track the state of net connection, if it is closed, its related resources will be cleared up. The issue is the state were tracked partially, do we need a full state machine here? Not, IMHO, we only care about the last state of it, because, we will do nothing even if we track the intermedial states. Signed-off-by: zhanghailiang--- net/colo.h| 4 +++ net/filter-rewriter.c | 70 +-- 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/net/colo.h b/net/colo.h index 7c524f3..cd9027f 100644 --- a/net/colo.h +++ b/net/colo.h @@ -18,6 +18,7 @@ #include "slirp/slirp.h" #include "qemu/jhash.h" #include "qemu/timer.h" +#include "slirp/tcp.h" #define HASHTABLE_MAX_SIZE 16384 @@ -69,6 +70,9 @@ typedef struct Connection { * run once in independent tcp connection */ int syn_flag; + +int tcp_state; /* TCP FSM state */ +tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */ } Connection; uint32_t connection_key_hash(const void *opaque); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index c4ab91c..7e7ec35 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt) } /* handle tcp packet from primary guest */ -static int handle_primary_tcp_pkt(NetFilterState *nf, +static int handle_primary_tcp_pkt(RewriterState *rf, Connection *conn, - Packet *pkt) + Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -97,15 +97,45 @@ static int handle_primary_tcp_pkt(NetFilterState *nf, tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset); net_checksum_calculate((uint8_t 
*)pkt->data, pkt->size); +/* + * Case 1: + * The *server* side of this connect is VM, *client* tries to close + * the connection. + * + * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1' + * packet from server side. From this point, we can ensure that there + * will be no packets in the connection, except that, some errors + * happen between the path of 'filter object' and vNIC, if this rare + * case really happen, we can still create a new connection, + * So it is safe to remove the connection from connection_track_table. + * + */ +if ((conn->tcp_state == TCPS_LAST_ACK) && +(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { +fprintf(stderr, "Remove conn " Can this even compile? Oops, i forgot to remove it, will remove it in next version. +g_hash_table_remove(rf->connection_track_table, key); +} +} +/* + * Case 2: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * We got 'fin=1, ack=1' packet from client side, we need to + * record the seq of 'fin=1, ack=1' packet. + */ +if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) { +conn->fin_ack_seq = htonl(tcp_pkt->th_seq); +conn->tcp_state = TCPS_LAST_ACK; } return 0; } /* handle tcp packet from secondary guest */ -static int handle_secondary_tcp_pkt(NetFilterState *nf, +static int handle_secondary_tcp_pkt(RewriterState *rf, Connection *conn, -Packet *pkt) +Packet *pkt, ConnectionKey *key) { struct tcphdr *tcp_pkt; @@ -133,8 +163,34 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf, tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) - conn->offset); net_checksum_calculate((uint8_t *)pkt->data, pkt->size); +/* + * Case 2: + * The *server* side of this connect is VM, *server* tries to close + * the connection. + * + * We got 'ack=1' packets from server side, it acks 'fin=1, ack=1' + * packet from client side. 
Like Case 1, there should be no packets + * in the connection from now on, but the difference here is + * that if the packet is lost, we will get the resent 'fin=1,ack=1' packet. + * TODO: fix the above case. + */ +if ((conn->tcp_state == TCPS_LAST_ACK) && +(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) { +
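The teardown logic in this patch boils down to two transitions: remember the sequence number of the peer's 'fin=1,ack=1' segment and enter LAST_ACK, then drop the connection from the track table when the final ACK (acknowledging fin_ack_seq + 1) is seen. A minimal standalone sketch of that check, assuming a simplified Connection struct and illustrative TCPS_* values rather than the real definitions from slirp/tcp.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative state values; the real patch uses TCPS_* from slirp/tcp.h. */
enum { TCPS_ESTABLISHED = 4, TCPS_LAST_ACK = 8 };

typedef struct Connection {
    int tcp_state;        /* last observed TCP FSM state */
    uint32_t fin_ack_seq; /* seq of the peer's 'fin=1,ack=1' segment */
} Connection;

/* One side sent FIN|ACK: remember its sequence number and mark LAST_ACK. */
static void conn_saw_fin_ack(Connection *conn, uint32_t seq)
{
    conn->fin_ack_seq = seq;
    conn->tcp_state = TCPS_LAST_ACK;
}

/* The closing handshake's final ACK acknowledges fin_ack_seq + 1; once it
 * is seen, the connection can be removed from the track table. */
static bool conn_can_remove(const Connection *conn, uint32_t ack)
{
    return conn->tcp_state == TCPS_LAST_ACK &&
           ack == conn->fin_ack_seq + 1;
}
```

This is a model of the patch's intent, not the rewriter code itself; the real code additionally converts wire byte order with ntohl() before comparing.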
Re: [Qemu-devel] [PATCH] net/colo-compare: Fix memory free error
On 2017/2/21 10:44, Zhang Chen wrote: We use g_queue_init() to init s->conn_list, so we should use g_queue_clear() to instead of g_queue_free(). Signed-off-by: Zhang ChenReviewed-by: zhanghailiang --- net/colo-compare.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 162fd6a..cf8c4c9 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -715,7 +715,7 @@ static void colo_compare_finalize(Object *obj) qemu_chr_fe_deinit(>chr_sec_in); qemu_chr_fe_deinit(>chr_out); -g_queue_free(>conn_list); +g_queue_clear(>conn_list); if (qemu_thread_is_self(>thread)) { /* compare connection */
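The underlying rule: g_queue_init() sets up a GQueue embedded in caller-owned storage, so its counterpart is g_queue_clear(), which drops the elements but leaves the struct alone; g_queue_free() additionally frees the GQueue itself and is valid only for queues obtained from g_queue_new(). A hand-rolled sketch of that ownership split (the names mirror the GLib API, but this is not GLib itself):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for GQueue, just enough to show the init/clear vs new/free
 * ownership rule the patch above relies on. */
typedef struct Queue { void **items; size_t len; } Queue;

static void queue_init(Queue *q)  { q->items = NULL; q->len = 0; }

/* Drop all elements but leave the Queue struct itself untouched. */
static void queue_clear(Queue *q) { free(q->items); queue_init(q); }

static Queue *queue_new(void)
{
    Queue *q = malloc(sizeof(*q));
    queue_init(q);
    return q;
}

/* Frees the Queue struct too -- calling this on an embedded queue, as
 * colo_compare_finalize() did before the fix, frees memory the queue
 * does not own. */
static void queue_free(Queue *q) { queue_clear(q); free(q); }
```

s->conn_list is embedded in CompareState and initialized with g_queue_init(), so g_queue_clear() is the matching teardown.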
Re: [Qemu-devel] [PATCH V7 2/2] Add a new qmp command to do checkpoint, query xen replication status
On 2017/2/8 13:24, Zhang Chen wrote: We can call this qmp command to do checkpoint outside of qemu. Xen colo will need this function. Signed-off-by: Zhang ChenSigned-off-by: Wen Congyang --- migration/colo.c | 17 qapi-schema.json | 60 2 files changed, 77 insertions(+) diff --git a/migration/colo.c b/migration/colo.c index 6fc2ade..2f98a33 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -127,6 +127,23 @@ void qmp_xen_set_replication(bool enable, bool primary, } } +ReplicationResult *qmp_query_xen_replication_status(Error **errp) +{ +Error *err = NULL; +ReplicationResult *result = g_new0(ReplicationResult, 1); Indent, line break. +replication_get_error_all(); +result->status = err ? + REPLICATION_STATUS_ERROR : + REPLICATION_STATUS_NORMAL; +error_free(err); +return result; +} + +void qmp_xen_do_checkpoint(Error **errp) +{ +replication_do_checkpoint_all(errp); +} + static void colo_send_message(QEMUFile *f, COLOMessage msg, Error **errp) { diff --git a/qapi-schema.json b/qapi-schema.json index 9445b93..719744a 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -5931,6 +5931,66 @@ 'data': { 'enable': 'bool', 'primary': 'bool', '*failover' : 'bool' } } ## +# @ReplicationStatus: +# +# Describe the status of replication. +# +# @error: Replication has an error. +# +# @normal: Replication is running normally. +# +# Since: 2.9 +## +{ 'enum': 'ReplicationStatus', + 'data': [ 'error', 'normal' ] } + +## +# @ReplicationResult: +# +# The result format for 'query-xen-replication-status'. +# +# @status: enum of @ReplicationStatus, which shows current +# replication error status +# +# Since: 2.9 +## +{ 'struct': 'ReplicationResult', + 'data': { 'status': 'ReplicationStatus'} } ^ Space + +## +# @query-xen-replication-status: +# +# Query replication status while the vm is running. +# +# Returns: A @ReplicationResult objects showing the status. 
+# +# Example: +# +# -> { "execute": "query-xen-replication-status" } +# <- { "return": { "status": "normal" } } +# +# Since: 2.9 +## +{ 'command': 'query-xen-replication-status', + 'returns': 'ReplicationResult' } + +## +# @xen-do-checkpoint: +# Maybe use the name 'xen-checkpoint-notify' or 'xen-colo-checkpoint-notify' ? +# Xen uses this command to notify replication to trigger a checkpoint. +# +# Returns: nothing. +# +# Example: +# +# -> { "execute": "xen-do-checkpoint" } +# <- { "return": {} } +# +# Since: 2.9 +## +{ 'command': 'xen-do-checkpoint' } + +## # @GICCapability: # # The struct describes capability for a specific GIC (Generic
Re: [Qemu-devel] [PATCH V7 1/2] Add a new qmp command to start/stop replication
On 2017/2/8 13:24, Zhang Chen wrote: We can call this qmp command to start/stop replication outside of qemu. Like Xen colo need this function. Signed-off-by: Zhang ChenSigned-off-by: Wen Congyang Reviewed-by: Eric Blake Reviewed-by: Stefano Stabellini Reviewed-by: zhanghailiang --- migration/colo.c | 23 +++ qapi-schema.json | 25 + 2 files changed, 48 insertions(+) diff --git a/migration/colo.c b/migration/colo.c index 93c85c5..6fc2ade 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -19,6 +19,8 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "migration/failover.h" +#include "replication.h" +#include "qmp-commands.h" #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) @@ -104,6 +106,27 @@ void colo_do_failover(MigrationState *s) } } +void qmp_xen_set_replication(bool enable, bool primary, + bool has_failover, bool failover, + Error **errp) +{ +ReplicationMode mode = primary ? + REPLICATION_MODE_PRIMARY : + REPLICATION_MODE_SECONDARY; + +if (has_failover && enable) { +error_setg(errp, "Parameter 'failover' is only for" + " stopping replication"); +return; +} + +if (enable) { +replication_start_all(mode, errp); +} else { +replication_stop_all(failover, failover ? NULL : errp); +} +} + static void colo_send_message(QEMUFile *f, COLOMessage msg, Error **errp) { diff --git a/qapi-schema.json b/qapi-schema.json index cbdffdd..9445b93 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -5906,6 +5906,31 @@ { 'command': 'xen-load-devices-state', 'data': {'filename': 'str'} } ## +# @xen-set-replication: +# +# Enable or disable replication. +# +# @enable: true to enable, false to disable. +# +# @primary: true for primary or false for secondary. +# +# @failover: #optional true to do failover, false to stop. but cannot be +#specified if 'enable' is true. default value is false. +# +# Returns: nothing. 
+# +# Example: +# +# -> { "execute": "xen-set-replication", +# "arguments": {"enable": true, "primary": false} } +# <- { "return": {} } +# +# Since: 2.9 +## +{ 'command': 'xen-set-replication', + 'data': { 'enable': 'bool', 'primary': 'bool', '*failover' : 'bool' } } + +## # @GICCapability: # # The struct describes capability for a specific GIC (Generic
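The validation in qmp_xen_set_replication() rejects 'failover' together with enable=true, since failover only makes sense when stopping replication. A simplified model of that argument check, with a bool return standing in for the Error **errp out-parameter and a flag standing in for replication_start_all()/replication_stop_all():

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of qmp_xen_set_replication()'s argument handling. Returns
 * false where the real code sets errp. */
static bool xen_set_replication(bool enable, bool has_failover,
                                bool failover, bool *running)
{
    if (has_failover && enable) {
        /* "Parameter 'failover' is only for stopping replication" */
        return false;
    }
    if (enable) {
        *running = true;        /* replication_start_all(mode, errp) */
    } else {
        *running = false;       /* replication_stop_all(failover, ...) */
        (void)failover;
    }
    return true;
}
```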
Re: [Qemu-devel] [PATCH 1/3] net/colo: fix memory double free error
On 2017/2/21 10:25, Zhang Chen wrote: On 02/20/2017 04:01 PM, zhanghailiang wrote: The 'primary_list' and 'secondary_list' members of struct Connection are not dynamically allocated through g_queue_new(), but we free them using g_queue_free(), which will lead to a double-free bug. Signed-off-by: zhanghailiang --- net/colo.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/colo.c b/net/colo.c index 6a6eacd..7d5c423 100644 --- a/net/colo.c +++ b/net/colo.c @@ -147,9 +147,7 @@ void connection_destroy(void *opaque) Connection *conn = opaque; g_queue_foreach(&conn->primary_list, packet_destroy, NULL); -g_queue_free(&conn->primary_list); g_queue_foreach(&conn->secondary_list, packet_destroy, NULL); -g_queue_free(&conn->secondary_list); I think we need to use g_queue_clear() here. Ah, you are right, my original modification would introduce a memory leak. Will fix in the next version. void g_queue_clear (GQueue *queue); Removes all the elements in queue. If queue elements contain dynamically-allocated memory, they should be freed first. Thanks Zhang Chen g_slice_free(Connection, conn); }
Re: [Qemu-devel] [PATCH v2 3/4] char: remove the right fd been watched in qemu_chr_fe_set_handlers()
On 2017/2/16 21:04, Marc-André Lureau wrote: Hi On Thu, Feb 16, 2017 at 4:49 PM Hailiang Zhang < zhang.zhanghaili...@huawei.com> wrote: Hi, On 2017/2/16 18:40, Marc-André Lureau wrote: Hi On Thu, Feb 16, 2017 at 10:08 AM zhanghailiang < zhang.zhanghaili...@huawei.com> wrote: We can call qemu_chr_fe_set_handlers() to add/remove fd been watched in 'context' which can be either default main context or other explicit context. But the original logic is not correct, we didn't remove the right fd because we call g_main_context_find_source_by_id(NULL, tag) which always try to find the Gsource from default context. Fix it by passing the right context to g_main_context_find_source_by_id(). Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Marc-André Lureau <marcandre.lur...@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> --- chardev/char-io.c | 13 + chardev/char-io.h | 2 ++ chardev/char.c| 2 +- 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/chardev/char-io.c b/chardev/char-io.c index 7dfc3f2..a69cc61 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -127,14 +127,14 @@ guint io_add_watch_poll(Chardev *chr, return tag; } -static void io_remove_watch_poll(guint tag) +static void io_remove_watch_poll(guint tag, GMainContext *context) { GSource *source; IOWatchPoll *iwp; g_return_if_fail(tag > 0); -source = g_main_context_find_source_by_id(NULL, tag); +source = g_main_context_find_source_by_id(context, tag); g_return_if_fail(source != NULL); iwp = io_watch_poll_from_source(source); @@ -146,14 +146,19 @@ static void io_remove_watch_poll(guint tag) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr) +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context) { if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag); +io_remove_watch_poll(chr->fd_in_tag, context); chr->fd_in_tag = 0; } } +void remove_fd_in_watch(Chardev *chr) +{ +qemu_remove_fd_in_watch(chr, NULL); +} + I would just change the signature of 
remove_fd_in_watch() instead of introducing a function. In that case, i have to modify all the places call this helper, About eleven places in six files,is it acceptable ? Yes, that's fine. OK, will fix it in v3, thanks. Thanks. int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, int *fds, size_t nfds) diff --git a/chardev/char-io.h b/chardev/char-io.h index d7ae5f1..117c888 100644 --- a/chardev/char-io.h +++ b/chardev/char-io.h @@ -38,6 +38,8 @@ guint io_add_watch_poll(Chardev *chr, void remove_fd_in_watch(Chardev *chr); +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context); + int io_channel_send(QIOChannel *ioc, const void *buf, size_t len); int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, diff --git a/chardev/char.c b/chardev/char.c index abd525f..5563375 100644 --- a/chardev/char.c +++ b/chardev/char.c @@ -560,7 +560,7 @@ void qemu_chr_fe_set_handlers(CharBackend *b, cc = CHARDEV_GET_CLASS(s); if (!opaque && !fd_can_read && !fd_read && !fd_event) { fe_open = 0; -remove_fd_in_watch(s); +qemu_remove_fd_in_watch(s, context); } else { fe_open = 1; } -- 1.8.3.1 otherwise, looks good to me -- Marc-André Lureau
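The bug being fixed: GLib source IDs are scoped to the GMainContext the source was attached to, so g_main_context_find_source_by_id(NULL, tag) searches only the default context and misses sources attached elsewhere. A minimal model of per-context ID lookup (not GLib itself) showing why the context must be passed through:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: each context keeps its own table of sources, and a NULL
 * context argument means "the default context" -- mirroring GLib's
 * g_main_context_find_source_by_id() semantics. */
typedef struct Source { unsigned tag; } Source;
typedef struct Context { Source *sources[8]; } Context;

static Context default_ctx;   /* zero-initialized, empty */

static Source *find_source_by_id(Context *ctx, unsigned tag)
{
    if (ctx == NULL) {
        ctx = &default_ctx;   /* old code always ended up here */
    }
    for (size_t i = 0; i < 8; i++) {
        if (ctx->sources[i] && ctx->sources[i]->tag == tag) {
            return ctx->sources[i];
        }
    }
    return NULL;              /* the tag lives in some other context */
}
```

A watch attached to a worker context (as colo-compare does) is simply invisible to a NULL-context lookup, so the old io_remove_watch_poll() could never destroy it.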
Re: [Qemu-devel] [PATCH v2 3/4] char: remove the right fd been watched in qemu_chr_fe_set_handlers()
Hi, On 2017/2/16 18:40, Marc-André Lureau wrote: Hi On Thu, Feb 16, 2017 at 10:08 AM zhanghailiang < zhang.zhanghaili...@huawei.com> wrote: We can call qemu_chr_fe_set_handlers() to add/remove fd been watched in 'context' which can be either default main context or other explicit context. But the original logic is not correct, we didn't remove the right fd because we call g_main_context_find_source_by_id(NULL, tag) which always try to find the Gsource from default context. Fix it by passing the right context to g_main_context_find_source_by_id(). Cc: Paolo BonziniCc: Marc-André Lureau Signed-off-by: zhanghailiang --- chardev/char-io.c | 13 + chardev/char-io.h | 2 ++ chardev/char.c| 2 +- 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/chardev/char-io.c b/chardev/char-io.c index 7dfc3f2..a69cc61 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -127,14 +127,14 @@ guint io_add_watch_poll(Chardev *chr, return tag; } -static void io_remove_watch_poll(guint tag) +static void io_remove_watch_poll(guint tag, GMainContext *context) { GSource *source; IOWatchPoll *iwp; g_return_if_fail(tag > 0); -source = g_main_context_find_source_by_id(NULL, tag); +source = g_main_context_find_source_by_id(context, tag); g_return_if_fail(source != NULL); iwp = io_watch_poll_from_source(source); @@ -146,14 +146,19 @@ static void io_remove_watch_poll(guint tag) g_source_destroy(>parent); } -void remove_fd_in_watch(Chardev *chr) +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context) { if (chr->fd_in_tag) { -io_remove_watch_poll(chr->fd_in_tag); +io_remove_watch_poll(chr->fd_in_tag, context); chr->fd_in_tag = 0; } } +void remove_fd_in_watch(Chardev *chr) +{ +qemu_remove_fd_in_watch(chr, NULL); +} + I would just change the signature of remove_fd_in_watch() instead of introducing a function. In that case, i have to modify all the places call this helper, About eleven places in six files,is it acceptable ? Thanks. 
int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, int *fds, size_t nfds) diff --git a/chardev/char-io.h b/chardev/char-io.h index d7ae5f1..117c888 100644 --- a/chardev/char-io.h +++ b/chardev/char-io.h @@ -38,6 +38,8 @@ guint io_add_watch_poll(Chardev *chr, void remove_fd_in_watch(Chardev *chr); +void qemu_remove_fd_in_watch(Chardev *chr, GMainContext *context); + int io_channel_send(QIOChannel *ioc, const void *buf, size_t len); int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len, diff --git a/chardev/char.c b/chardev/char.c index abd525f..5563375 100644 --- a/chardev/char.c +++ b/chardev/char.c @@ -560,7 +560,7 @@ void qemu_chr_fe_set_handlers(CharBackend *b, cc = CHARDEV_GET_CLASS(s); if (!opaque && !fd_can_read && !fd_read && !fd_event) { fe_open = 0; -remove_fd_in_watch(s); +qemu_remove_fd_in_watch(s, context); } else { fe_open = 1; } -- 1.8.3.1 otherwise, looks good to me
Re: [Qemu-devel] [PATCH 3/5] colo-compare: release all unhandled packets in finalize function
On 2017/2/16 10:27, Zhang Chen wrote: On 02/15/2017 04:34 PM, zhanghailiang wrote: We should release all unhandled packets before finalize colo compare. Besides, we need to free connection_track_table, or there will be a memory leak bug. Signed-off-by: zhanghailiang--- net/colo-compare.c | 20 1 file changed, 20 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index a16e2d5..809bad3 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -676,6 +676,23 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) return; } This function in my patch "colo-compare and filter-rewriter work with colo-frame " Named 'colo_flush_connection', I think use 'flush' instead of 'release' is better, OK, i will fix it in next version, thanks. Thanks Zhang Chen +static void colo_release_packets(void *opaque, void *user_data) +{ +CompareState *s = user_data; +Connection *conn = opaque; +Packet *pkt = NULL; + +while (!g_queue_is_empty(>primary_list)) { +pkt = g_queue_pop_head(>primary_list); +compare_chr_send(>chr_out, pkt->data, pkt->size); +packet_destroy(pkt, NULL); +} +while (!g_queue_is_empty(>secondary_list)) { +pkt = g_queue_pop_head(>secondary_list); +packet_destroy(pkt, NULL); +} +} + static void colo_compare_class_init(ObjectClass *oc, void *data) { UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc); @@ -707,9 +724,12 @@ static void colo_compare_finalize(Object *obj) g_main_loop_quit(s->compare_loop); qemu_thread_join(>thread); +/* Release all unhandled packets after compare thead exited */ +g_queue_foreach(>conn_list, colo_release_packets, s); g_queue_free(>conn_list); +g_hash_table_destroy(s->connection_track_table); g_free(s->pri_indev); g_free(s->sec_indev); g_free(s->outdev);
Re: [Qemu-devel] [PATCH 2/5] colo-compare: kick compare thread to exit while finalize
On 2017/2/16 10:25, Zhang Chen wrote: On 02/15/2017 04:34 PM, zhanghailiang wrote: We should call g_main_loop_quit() to notify colo compare thread to exit, Or it will run in g_main_loop_run() forever. Besides, the finalizing process can't happen in context of colo thread, it is reasonable to remove the 'if (qemu_thread_is_self(>thread))' branch. Signed-off-by: zhanghailiang--- net/colo-compare.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index fdde788..a16e2d5 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -83,6 +83,8 @@ typedef struct CompareState { GHashTable *connection_track_table; /* compare thread, a thread for each NIC */ QemuThread thread; + +GMainLoop *compare_loop; } CompareState; typedef struct CompareClass { @@ -496,7 +498,6 @@ static gboolean check_old_packet_regular(void *opaque) static void *colo_compare_thread(void *opaque) { GMainContext *worker_context; -GMainLoop *compare_loop; CompareState *s = opaque; GSource *timeout_source; @@ -507,7 +508,7 @@ static void *colo_compare_thread(void *opaque) qemu_chr_fe_set_handlers(>chr_sec_in, compare_chr_can_read, compare_sec_chr_in, NULL, s, worker_context, true); -compare_loop = g_main_loop_new(worker_context, FALSE); +s->compare_loop = g_main_loop_new(worker_context, FALSE); /* To kick any packets that the secondary doesn't match */ timeout_source = g_timeout_source_new(REGULAR_PACKET_CHECK_MS); @@ -515,10 +516,10 @@ static void *colo_compare_thread(void *opaque) (GSourceFunc)check_old_packet_regular, s, NULL); g_source_attach(timeout_source, worker_context); -g_main_loop_run(compare_loop); +g_main_loop_run(s->compare_loop); g_source_unref(timeout_source); -g_main_loop_unref(compare_loop); +g_main_loop_unref(s->compare_loop); g_main_context_unref(worker_context); return NULL; } @@ -703,13 +704,11 @@ static void colo_compare_finalize(Object *obj) qemu_chr_fe_deinit(>chr_sec_in); qemu_chr_fe_deinit(>chr_out); 
-g_queue_free(>conn_list); +g_main_loop_quit(s->compare_loop); +qemu_thread_join(>thread); -if (qemu_thread_is_self(>thread)) { -/* compare connection */ -g_queue_foreach(>conn_list, colo_compare_connection, s); -qemu_thread_join(>thread); -} Before free the 's->conn_list', you should flush all queued primary packets and release all queued secondary packets here, so combine this patch with 3/5 patch as one patch is a better choose. Make sense, will fix it in next version, thanks. Thanks Zhang Chen + +g_queue_free(>conn_list); g_free(s->pri_indev); g_free(s->sec_indev);
Re: [Qemu-devel] [PATCH 3/5] colo-compare: release all unhandled packets in finalize function
On 2017/2/16 10:34, Jason Wang wrote: On 2017年02月15日 16:34, zhanghailiang wrote: We should release all unhandled packets before finalize colo compare. Besides, we need to free connection_track_table, or there will be a memory leak bug. Signed-off-by: zhanghailiang--- net/colo-compare.c | 20 1 file changed, 20 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index a16e2d5..809bad3 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -676,6 +676,23 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) return; } +static void colo_release_packets(void *opaque, void *user_data) +{ +CompareState *s = user_data; +Connection *conn = opaque; +Packet *pkt = NULL; + +while (!g_queue_is_empty(>primary_list)) { +pkt = g_queue_pop_head(>primary_list); +compare_chr_send(>chr_out, pkt->data, pkt->size); Any reason to send packets here? Yes, considering the usage case which we shut COLO for the VM to make it as a normal VM without FT. We need to remove all the filter objects. In this case, IMHO, it is necessary to release the unhandled packets. Thanks. Thanks +packet_destroy(pkt, NULL); +} +while (!g_queue_is_empty(>secondary_list)) { +pkt = g_queue_pop_head(>secondary_list); +packet_destroy(pkt, NULL); +} +} + static void colo_compare_class_init(ObjectClass *oc, void *data) { UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc); @@ -707,9 +724,12 @@ static void colo_compare_finalize(Object *obj) g_main_loop_quit(s->compare_loop); qemu_thread_join(>thread); +/* Release all unhandled packets after compare thead exited */ +g_queue_foreach(>conn_list, colo_release_packets, s); g_queue_free(>conn_list); +g_hash_table_destroy(s->connection_track_table); g_free(s->pri_indev); g_free(s->sec_indev); g_free(s->outdev); .
Re: [Qemu-devel] [PATCH 0/3] COLO: fix some bugs
On 2017/2/10 23:44, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: Hi, This series fixes three bugs of COLO. Patch 1 fixes a usage case where users want to change checkpoint-delay after an extremely big value was set before; the new value may not take effect until a long sleep finishes. Patches 2 and 3 are old patches split from a previous version a long time ago, and have been reviewed by Dave. I'd like to pick these three patches from the later COLO series, which will include some optimization and integration with block replication and the COLO net proxy. Please review, thanks. Queued. I've added the '(Since 2.9)' that Eric asked for. Thank you very much. By the way, as we discussed in patch 1, should I add the code in patch 1 which kicks the colo_checkpoint_sem to speed up the colo thread's exit? In the next version? Or post a new patch? Hailiang Dave zhanghailiang (3): COLO: fix setting checkpoint-delay not working properly COLO: Shutdown related socket fd while do failover COLO: Don't process failover request while loading VM's state include/migration/colo.h | 2 + include/migration/migration.h | 8 migration/colo.c | 102 +- migration/migration.c | 3 ++ qapi-schema.json | 4 +- 5 files changed, 108 insertions(+), 11 deletions(-) -- 1.8.3.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .
Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
On 2017/2/9 3:53, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: On 2017/1/18 19:01, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: If the net connection between primary host and secondary host breaks while COLO/COLO incoming threads are doing read() or write(). It will block until connection is timeout, and the failover process will be blocked because of it. So it is necessary to shutdown all the socket fds used by COLO to avoid this situation. Besides, we should close the corresponding file descriptors after failvoer BH shutdown them, Or there will be an error. Hi, There are two parts to this patch: a) Add some semaphores to sequence failover b) Use shutdown() At first I wondered if perhaps they should be split; but I see the reason for the semaphores is mostly to stop the race between the fd's getting closed and the shutdown() calls; so I think it's OK. Hi, Yes, you are right, maybe i should add some comments about that. Will do in next version. Do you have any problems with these semaphores during powering off the guest? No, we didn't encounter any problems or trigger any bugs in our test with this semaphores. In what places do you doubt it may has problems ? :) I just wondered about other exit cases other than failover; e.g. what if the guest shutdown or something like that, would it get stuck waiting for the colo_incoming_sem. Sorry for the late reply, too busy with our project these days ... Your worry makes sense, we have processed the shutdown case specially, Let qemu does a checkpoint process (It seems that, we should kick colo thread which might be waiting for colo_checkpoint_sem.) before exit colo therad. And for the secondary sides, if it receives shutdown request, it will exit colo incoming thread after some cleanup works. Thanks. 
Hailiang Dave Thanks, Hailiang Dave Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com> --- include/migration/migration.h | 3 +++ migration/colo.c | 43 +++ 2 files changed, 46 insertions(+) diff --git a/include/migration/migration.h b/include/migration/migration.h index 487ac1e..7cac877 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -113,6 +113,7 @@ struct MigrationIncomingState { QemuThread colo_incoming_thread; /* The coroutine we should enter (back) after failover */ Coroutine *migration_incoming_co; +QemuSemaphore colo_incoming_sem; /* See savevm.c */ LoadStateEntry_Head loadvm_handlers; @@ -182,6 +183,8 @@ struct MigrationState QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; +/* The semaphore is used to notify COLO thread that failover is finished */ +QemuSemaphore colo_exit_sem; /* The semaphore is used to notify COLO thread to do checkpoint */ QemuSemaphore colo_checkpoint_sem; diff --git a/migration/colo.c b/migration/colo.c index 08b2e46..3222812 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void) /* recover runstate to normal migration finish state */ autostart = true; } +/* + * Make sure COLO incoming thread not block in recv or send, + * If mis->from_src_file and mis->to_src_file use the same fd, + * The second shutdown() will return -1, we ignore this value, + * It is harmless. 
+ */ +if (mis->from_src_file) { +qemu_file_shutdown(mis->from_src_file); +} +if (mis->to_src_file) { +qemu_file_shutdown(mis->to_src_file); +} old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void) "secondary VM", FailoverStatus_lookup[old_state]); return; } +/* Notify COLO incoming thread that failover work is finished */ +qemu_sem_post(>colo_incoming_sem); /* For Secondary VM, jump to incoming co */ if (mis->migration_incoming_co) { qemu_coroutine_enter(mis->migration_incoming_co); @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void) migrate_set_state(>state, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); +/* + * Wake up COLO thread which may blocked in recv() or send(), + * The s->rp_state.from_dst_file and s->to_dst_file may use the + * same fd, but we still shutdown the fd for twice, it is harmless. + */
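The reason the patch calls qemu_file_shutdown() (which wraps shutdown()) rather than relying on close() alone: shutdown() makes any recv()/send() already blocked on the fd return immediately (recv sees EOF), while simply closing the fd from another thread would leave the peer blocked until the connection times out. The effect can be demonstrated on a socketpair:

```c
#include <sys/socket.h>
#include <unistd.h>

/* After shutdown(SHUT_RDWR), a recv() on the same socket returns 0
 * (EOF) instead of blocking -- the property COLO failover relies on
 * to unblock its migration threads. Returns the recv() result. */
static long recv_after_shutdown(void)
{
    int sv[2];
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    shutdown(sv[0], SHUT_RDWR);                 /* unblock both directions */
    long n = recv(sv[0], buf, sizeof(buf), 0);  /* immediate EOF */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

This also motivates the semaphores in the patch: the fds must not be close()d until the threads woken by shutdown() have finished with them, hence colo_exit_sem / colo_incoming_sem sequencing the teardown.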
Re: [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly
On 2017/2/8 18:38, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: If we set checkpoint-delay through command 'migrate-set-parameters', It will not take effect until we finish last sleep chekpoint-delay, That's will be offensive espeically when we want to change its value from an extreme big one to a proper value. Fix it by using timer to realize checkpoint-delay. Yes, I think this works; you only kick the timer in COLO state, and you create the timer before going into COLO state and clean it up after we leave, so it feels safe. Are you also going to kick this semaphore when doing a failover to cause this thread to exit quickly? Ha, good suggestion! I think it will work. Accepted, thanks very much. Signed-off-by: zhanghailiangReviewed-by: Dr. David Alan Gilbert Dave --- include/migration/colo.h | 2 ++ include/migration/migration.h | 5 + migration/colo.c | 33 +++-- migration/migration.c | 3 +++ 4 files changed, 33 insertions(+), 10 deletions(-) diff --git a/include/migration/colo.h b/include/migration/colo.h index e32eef4..2bbff9e 100644 --- a/include/migration/colo.h +++ b/include/migration/colo.h @@ -35,4 +35,6 @@ COLOMode get_colo_mode(void); /* failover */ void colo_do_failover(MigrationState *s); + +void colo_checkpoint_notify(void *opaque); #endif diff --git a/include/migration/migration.h b/include/migration/migration.h index c309d23..487ac1e 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -183,6 +183,11 @@ struct MigrationState /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; +/* The semaphore is used to notify COLO thread to do checkpoint */ +QemuSemaphore colo_checkpoint_sem; +int64_t colo_checkpoint_time; +QEMUTimer *colo_delay_timer; + /* The last error that occurred */ Error *error; }; diff --git a/migration/colo.c b/migration/colo.c index 93c85c5..08b2e46 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -302,7 +302,7 @@ static void 
colo_process_checkpoint(MigrationState *s) { QIOChannelBuffer *bioc; QEMUFile *fb = NULL; -int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); Error *local_err = NULL; int ret; @@ -332,26 +332,21 @@ static void colo_process_checkpoint(MigrationState *s) qemu_mutex_unlock_iothread(); trace_colo_vm_state_change("stop", "run"); +timer_mod(s->colo_delay_timer, +current_time + s->parameters.x_checkpoint_delay); + while (s->state == MIGRATION_STATUS_COLO) { if (failover_get_state() != FAILOVER_STATUS_NONE) { error_report("failover request"); goto out; } -current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); -if (current_time - checkpoint_time < -s->parameters.x_checkpoint_delay) { -int64_t delay_ms; +qemu_sem_wait(>colo_checkpoint_sem); -delay_ms = s->parameters.x_checkpoint_delay - - (current_time - checkpoint_time); -g_usleep(delay_ms * 1000); -} ret = colo_do_checkpoint_transaction(s, bioc, fb); if (ret < 0) { goto out; } -checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); } out: @@ -364,14 +359,32 @@ out: qemu_fclose(fb); } +timer_del(s->colo_delay_timer); + if (s->rp_state.from_dst_file) { qemu_fclose(s->rp_state.from_dst_file); } } +void colo_checkpoint_notify(void *opaque) +{ +MigrationState *s = opaque; +int64_t next_notify_time; + +qemu_sem_post(>colo_checkpoint_sem); +s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +next_notify_time = s->colo_checkpoint_time + +s->parameters.x_checkpoint_delay; +timer_mod(s->colo_delay_timer, next_notify_time); +} + void migrate_start_colo_process(MigrationState *s) { qemu_mutex_unlock_iothread(); +qemu_sem_init(>colo_checkpoint_sem, 0); +s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST, +colo_checkpoint_notify, s); + migrate_set_state(>state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); colo_process_checkpoint(s); diff --git a/migration/migration.c b/migration/migration.c index f498ab8..a4861da 100644 --- 
a/migration/migration.c +++ b/migration/migration.c @@ -895,6 +895,9 @@ void qmp_migrate_set_parameters(MigrationParameters *params, Error **errp) if (params->has_x_checkpoint_delay) { s->parameters.x_checkpoint_delay = params->x_checkpoint_delay; +if (migration_in_colo_state()) { +colo_checkpoint_notify(s); +} } } --
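The fix can be reduced to deadline arithmetic: the old loop computed its sleep once, from the delay in force when the last checkpoint ended, so shrinking x_checkpoint_delay mid-sleep had no effect; the timer-based version re-arms the deadline from "now" whenever the parameter changes (colo_checkpoint_notify calling timer_mod). A toy model of the two behaviours:

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the wakeup time is fixed when the sleep starts and
 * ignores later parameter changes. */
static int64_t old_wakeup(int64_t checkpoint_time, int64_t delay_at_sleep)
{
    return checkpoint_time + delay_at_sleep;  /* fixed until it expires */
}

/* New behaviour: every parameter change recomputes the deadline from
 * the current time, as timer_mod() does in the patch. */
static int64_t new_wakeup(int64_t now, int64_t current_delay)
{
    return now + current_delay;               /* re-armed on every change */
}
```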
Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
On 2017/1/18 19:01, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: If the net connection between primary host and secondary host breaks while COLO/COLO incoming threads are doing read() or write(). It will block until connection is timeout, and the failover process will be blocked because of it. So it is necessary to shutdown all the socket fds used by COLO to avoid this situation. Besides, we should close the corresponding file descriptors after failvoer BH shutdown them, Or there will be an error. Hi, There are two parts to this patch: a) Add some semaphores to sequence failover b) Use shutdown() At first I wondered if perhaps they should be split; but I see the reason for the semaphores is mostly to stop the race between the fd's getting closed and the shutdown() calls; so I think it's OK. Hi, Yes, you are right, maybe i should add some comments about that. Will do in next version. Do you have any problems with these semaphores during powering off the guest? No, we didn't encounter any problems or trigger any bugs in our test with this semaphores. In what places do you doubt it may has problems ? :) Thanks, Hailiang Dave Signed-off-by: zhanghailiangSigned-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert Cc: Dr. 
David Alan Gilbert --- include/migration/migration.h | 3 +++ migration/colo.c | 43 +++ 2 files changed, 46 insertions(+) diff --git a/include/migration/migration.h b/include/migration/migration.h index 487ac1e..7cac877 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -113,6 +113,7 @@ struct MigrationIncomingState { QemuThread colo_incoming_thread; /* The coroutine we should enter (back) after failover */ Coroutine *migration_incoming_co; +QemuSemaphore colo_incoming_sem; /* See savevm.c */ LoadStateEntry_Head loadvm_handlers; @@ -182,6 +183,8 @@ struct MigrationState QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; +/* The semaphore is used to notify the COLO thread that failover is finished */ +QemuSemaphore colo_exit_sem; /* The semaphore is used to notify the COLO thread to do a checkpoint */ QemuSemaphore colo_checkpoint_sem; diff --git a/migration/colo.c b/migration/colo.c index 08b2e46..3222812 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void) /* recover runstate to normal migration finish state */ autostart = true; } +/* + * Make sure the COLO incoming thread does not block in recv or send. + * If mis->from_src_file and mis->to_src_file use the same fd, + * the second shutdown() will return -1; we ignore this value, + * it is harmless. 
 + */ +if (mis->from_src_file) { +qemu_file_shutdown(mis->from_src_file); +} +if (mis->to_src_file) { +qemu_file_shutdown(mis->to_src_file); +} old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void) "secondary VM", FailoverStatus_lookup[old_state]); return; } +/* Notify the COLO incoming thread that failover work is finished */ +qemu_sem_post(&mis->colo_incoming_sem); /* For Secondary VM, jump to incoming co */ if (mis->migration_incoming_co) { qemu_coroutine_enter(mis->migration_incoming_co); @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void) migrate_set_state(&s->state, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); +/* + * Wake up the COLO thread, which may be blocked in recv() or send(). + * s->rp_state.from_dst_file and s->to_dst_file may use the + * same fd, but we still shut the fd down twice; it is harmless. + */ +if (s->to_dst_file) { +qemu_file_shutdown(s->to_dst_file); +} +if (s->rp_state.from_dst_file) { +qemu_file_shutdown(s->rp_state.from_dst_file); +} + old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); if (old_state != FAILOVER_STATUS_ACTIVE) { @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void) FailoverStatus_lookup[old_state]); return; } +/* Notify the COLO thread that failover work is finished */ +qemu_sem_post(&s->colo_exit_sem); } void colo_do_failover(MigrationState *s) @@ -361,6 +389,14 @@ out: timer_del(s->colo_delay_timer); +/* Hope this is not too long to wait here */ +qemu_sem_wait(&s->colo_exit_sem); +
Re: [Qemu-devel] [libvirt] [Block Replication] Question about supporting COLO in libvirt
On 2017/2/6 20:39, Daniel P. Berrange wrote: On Mon, Feb 06, 2017 at 08:34:28PM +0800, Hailiang Zhang wrote: Hi, I'm trying to implement support for COLO in libvirt, but I found an annoying problem: libvirt does not support the command line option argument syntax we use for block replication in QEMU. That is, libvirt does not support the below syntax for block devices: -drive driver=qcow2,file.filename=test:a.qcow2 -drive file=test.qcow2,\ backing.file.filename=/dev/fdset/1 It seems to be a new syntax that libvirt does not support, though it has existed in QEMU for some time. I found some introductions at http://www.linux-kvm.org/images/3/34/Kvm-forum-2013-block-dev-configuration.pdf The command line we use for COLO is just like the above syntax. For example, for the shared disk in COLO, it is: -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\ backing.driver=raw,backing.file.filename=1.raw \ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\ file.driver=qcow2,top-id=active-disk0,\ file.file.filename=/mnt/ramfs/active_disk.img,\ file.backing=hidden_disk0,shared-disk=on For the non-shared disk in COLO, it is quite similar to the shared-disk case: -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \ -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\ file.file.filename=active_disk.qcow2,\ file.driver=qcow2,\ file.backing.file.filename=hidden_disk.qcow2,\ file.backing.driver=qcow2,\ file.backing.backing=colo1 So there seem to be two ways to solve this problem. One is to support this new option argument syntax in libvirt, but I'm not sure how difficult it would be to implement, and I don't know where to start either. Libvirt has to start supporting this new syntax. It is required for many different use cases beyond just COLO. 
For example, to be able to explicitly give QEMU details about a backing file format to remove probing, or to be able to set LUKS passwords on backing files, and more besides. Thanks for your quick reply; do you or other developers in libvirt have plans to implement it? Thanks, Hailiang Regards, Daniel
Re: [Qemu-devel] [PATCH 1/3] colo-compare: reconstruct the mutex lock usage
On 2017/2/7 17:21, Jason Wang wrote: On 2017年02月07日 16:19, Hailiang Zhang wrote: On 2017/2/7 15:57, Jason Wang wrote: On 2017年02月07日 15:54, Hailiang Zhang wrote: Hi Jason, On 2017/2/6 20:53, Jason Wang wrote: On 2017年02月06日 19:11, Hailiang Zhang wrote: On 2017/2/6 17:35, Jason Wang wrote: On 2017年02月06日 16:13, Hailiang Zhang wrote: On 2017/2/3 11:47, Jason Wang wrote: On 2017年01月24日 22:05, zhanghailiang wrote: The original 'timer_check_lock' mutex lock of struct CompareState is used to protect the 'conn_list' queue and its child queues, 'primary_list' and 'secondary_list', which is a little abused and confusing. To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock', which is used to protect the 'conn_list' queue, and use another 'conn_lock' to protect 'primary_list' and 'secondary_list'. Besides, fix some places that are missing these mutex locks. Signed-off-by: zhanghailiang<zhang.zhanghaili...@huawei.com> Instead of sticking to this kind of mutex, I think it's time to make the colo timer run in the colo thread (there's a TODO in the code). Er, it seems that we still need these mutex locks even if we make the colo timer and colo thread run in the same thread, because we may access the connect/primary/secondary lists from the colo (migration) thread concurrently. Just to make sure I understand the issue, why does it need to access the lists? Besides, it seems to be a little complex to make the colo timer run in the colo compare thread, and it is not this series' goal. Seems not, just by looking at how it was implemented in the main loop, but maybe I was wrong. This series is preparatory work for integrating COLO compare with the COLO frame and is a prerequisite for it. So we may consider implementing it later in another series. Zhang Chen, what's your opinion? The problem is this patch makes things even worse: it introduces one more mutex. Hmm, for most cases, we need to take these two locks at the same time. 
We can use one lock to protect conn_list/primary_list/secondary_list. (We need to take this lock before operating on any of these lists; as you can see in patch 2, while doing a checkpoint, we may operate on these lists in the COLO checkpoint thread concurrently.) But in the original code, we didn't take timer_check_lock in packet_enqueue() while operating on conn_list/primary_list/secondary_list, and didn't take this lock in colo_compare_connection while operating on secondary_list either. So, is it OK to use conn_lock instead of timer_check_lock, and add the lock where it is needed? I'd like to know, if the timer ran in the colo thread (this looks as simple as a g_timeout_source_new() followed by a g_source_attach()), why would we still need a mutex? And if we need it now but plan to remove it in the future, I'd like to use a big lock to simplify the code. After investigating your above suggestion, I think it works by using g_timeout_source_new() and g_source_attach(), but I'm not sure if it is a good idea to use the big lock to protect the possible concurrent cases, which seem to only happen between the COLO migration thread and the COLO compare thread. Any further suggestions? I think I first need to understand why the migration thread needs to access the lists. Er, sorry to confuse you here; to be exact, it is the COLO checkpoint thread. We reuse the migration thread to realize the checkpoint process for COLO, because QEMU enters the COLO state after a complete migration. While doing a checkpoint, we need to release all the buffered packets that have not yet been compared, so we need to access the lists in the COLO checkpoint thread. I think the better way is to notify the comparing thread and let it do the releasing. You probably need a similar mechanism to notify the checkpoint thread from the comparing thread. Hmm, that's really a good idea; it can avoid using the mutex lock. I'll try to implement it in the next version, thanks very much. :) Thanks Thanks.