Re: [Qemu-devel] [PATCH] block: Don't lock /dev/null and /dev/zero automatically

2018-07-22 Thread Max Reitz
On 2018-07-22 04:37, Fam Zheng wrote:
> On Sun, Jul 22, 2018 at 5:08 AM Max Reitz  wrote:
>>
>> On 2018-07-19 05:41, Fam Zheng wrote:
>>> On my Fedora 28, /dev/null is locked by some other process (couldn't
>>> inspect it due to the current lslocks limitation), so iotests 226 fails
>>> with some unexpected image locking errors because it uses qemu-io to
>>> open it.
>>>
>>> Actually it's safe to not use any lock on /dev/null or /dev/zero.
>>>
>>> Signed-off-by: Fam Zheng 
>>> ---
>>>  block/file-posix.c | 7 ++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index 60af4b3d51..8bf034108a 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -503,7 +503,12 @@ static int raw_open_common(BlockDriverState *bs, QDict 
>>> *options,
>>>  s->use_lock = false;
>>>  break;
>>>  case ON_OFF_AUTO_AUTO:
>>> -s->use_lock = qemu_has_ofd_lock();
>>> +if (!strcmp(filename, "/dev/null") ||
>>> +   !strcmp(filename, "/dev/zero")) {
>>
>> I’m not sure I like a strcmp() based on filename (though it should work
>> for all of the cases where someone would want to specify either of those
>> for a qemu block device).  Isn’t there some specific major/minor number
>> we can use to check for these two files?  Or are those Linux-specific?
> 
> Yeah, I guess major/minor numbers are Linux-specific.
> 
>>
>> I could also imagine just not locking any host_device, but since people
>> do use qcow2 immediately on block devices, maybe we do want to lock them.
> 
> That's right.
> 
>>
>> Finally, if really all you are trying to do is to make test code easier,
>> then I don’t know exactly why.  Just disabling locking in 226 shouldn’t
>> be too hard.
> 
> The tricky thing is in remembering to do that for future test cases.
> My machine seems to be somehow different from John's, so my 226
> cannot lock /dev/null, but I'm not sure whether that is the case for
> other releases, distros or system instances.

Usually we don’t need to use /dev/null, though, because null-co:// is
better suited for most purposes.  This is a very specific test of host
devices.
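
For reference, a device-number check along the lines Max suggests might look
like this (a Linux-specific sketch, not part of the patch; on Linux both files
are character devices with major 1, minor 3 for /dev/null and 5 for /dev/zero):

#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper, not in block/file-posix.c: true if fd refers
 * to /dev/null or /dev/zero.  Relies on Linux device numbering. */
static bool fd_is_dev_null_or_zero(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode)) {
        return false;
    }
    return major(st.st_rdev) == 1 &&
           (minor(st.st_rdev) == 3 || minor(st.st_rdev) == 5);
}

As the thread notes, those device numbers are not portable to other Unices,
which is one reason the patch sticks to a filename comparison.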

Maybe we should start a doc file with common good practices about
writing iotests?

Max





Re: [Qemu-devel] [PATCH for-3.0 ?] tests/libqtest: Improve kill_qemu() assert

2018-07-22 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 04:39:32PM +0100, Peter Maydell wrote:
> In kill_qemu() we have an assert that checks that the QEMU process
> didn't dump core:
> assert(!WCOREDUMP(wstatus));
> 
> Unfortunately the WCOREDUMP macro here means the resulting message
> is not very easy to comprehend on at least some systems:
> 
> ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ 
> (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) 
> & 0x80)' failed.
> 
> and it doesn't identify what signal the process took.
> 
> Instead of using a raw assert, print the information in an
> easier to understand way:
> 
> /i386/ahci/sanity: libqtest.c: kill_qemu() tried to terminate QEMU process 
> but it dumped core with signal 11
> ahci-test: tests/libqtest.c:118: kill_qemu: Assertion `0' failed.
> Aborted (core dumped)
> 
> (Of course, the really useful information would be why the QEMU
> process dumped core in the first place, but we don't have that
> by the time the test program has picked up the exit status.)
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Michael S. Tsirkin 

Up to you whether to apply in 3.0.

> ---
> In particular, the travis test config that enables gprof
> seems to (a) run into this every so often and (b) have the
> really unhelpful assertion text quoted above:
>  https://travis-ci.org/qemu/qemu/jobs/406192798
> 
> Maybe for 3.0 since it's only test code.
> 
>  tests/libqtest.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 098af6aec44..99341e1b47d 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -110,7 +110,13 @@ static void kill_qemu(QTestState *s)
>  pid = waitpid(s->qemu_pid, &wstatus, 0);
>  
>  if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> -assert(!WCOREDUMP(wstatus));
> +if (WCOREDUMP(wstatus)) {
> +fprintf(stderr,
> +"libqtest.c: kill_qemu() tried to terminate QEMU "
> +"process but it dumped core with signal %d\n",
> +WTERMSIG(wstatus));
> +assert(0);
> +}
>  }
>  }
>  }
> -- 
> 2.17.1
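
For context, the wait-status macros involved behave like this (a minimal
sketch of <sys/wait.h> usage; WCOREDUMP() itself is a non-POSIX extension,
which is why its glibc expansion quoted above is so unreadable):

#include <stdio.h>
#include <sys/wait.h>

/* Decode a status word from waitpid(), the same way the patch does. */
static void report_wait_status(int wstatus)
{
    if (WIFEXITED(wstatus)) {
        printf("exited with code %d\n", WEXITSTATUS(wstatus));
    } else if (WIFSIGNALED(wstatus)) {
        printf("killed by signal %d%s\n", WTERMSIG(wstatus),
               WCOREDUMP(wstatus) ? " (core dumped)" : "");
    }
}

With signal 11 (SIGSEGV) as in the quoted example, this prints the same
information the new error message carries.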



Re: [Qemu-devel] [PATCH 06/12] migration: do not detect zero page for compression

2018-07-22 Thread Michael S. Tsirkin
On Wed, Jul 18, 2018 at 04:46:21PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/17/2018 02:58 AM, Dr. David Alan Gilbert wrote:
> > * Xiao Guangrong (guangrong.x...@gmail.com) wrote:
> > > 
> > > 
> > > On 06/29/2018 05:42 PM, Dr. David Alan Gilbert wrote:
> > > > * Xiao Guangrong (guangrong.x...@gmail.com) wrote:
> > > > > 
> > > > > Hi Peter,
> > > > > 
> > > > > Sorry for the delay as i was busy on other things.
> > > > > 
> > > > > On 06/19/2018 03:30 PM, Peter Xu wrote:
> > > > > > On Mon, Jun 04, 2018 at 05:55:14PM +0800, guangrong.x...@gmail.com 
> > > > > > wrote:
> > > > > > > From: Xiao Guangrong 
> > > > > > > 
> > > > > > > Detecting a zero page is not light work; we can disable it
> > > > > > > for compression, which can handle all-zero data very well
> > > > > > 
> > > > > > Is there any number shows how the compression algo performs better
> > > > > > than the zero-detect algo?  Asked since AFAIU buffer_is_zero() might
> > > > > > be fast, depending on how init_accel() is done in 
> > > > > > util/bufferiszero.c.
> > > > > 
> > > > > This is the comparison between zero-detection and compression (the 
> > > > > target
> > > > > buffer is all zero bit):
> > > > > 
> > > > > Zero 810 ns Compression: 26905 ns.
> > > > > Zero 417 ns Compression: 8022 ns.
> > > > > Zero 408 ns Compression: 7189 ns.
> > > > > Zero 400 ns Compression: 7255 ns.
> > > > > Zero 412 ns Compression: 7016 ns.
> > > > > Zero 411 ns Compression: 7035 ns.
> > > > > Zero 413 ns Compression: 6994 ns.
> > > > > Zero 399 ns Compression: 7024 ns.
> > > > > Zero 416 ns Compression: 7053 ns.
> > > > > Zero 405 ns Compression: 7041 ns.
> > > > > 
> > > > > Indeed, zero-detection is faster than compression.
> > > > > 
> > > > > However, during our profiling of the live_migration thread (after
> > > > > reverting this patch),
> > > > > we noticed zero-detection costs lots of CPU:
> > > > > 
> > > > >12.01%  kqemu  qemu-system-x86_64  [.] buffer_zero_sse2
> > > > 
> > > > Interesting; what host are you running on?
> > > > Some hosts have support for the faster buffer_zero_sse4/avx2
> > > 
> > > The host is:
> > > 
> > > model name: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
> > > ...
> > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
> > > mca cmov pat pse36 clflush dts acpi
> > >   mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm 
> > > constant_tsc art arch_perfmon pebs bts
> > >   rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni 
> > > pclmulqdq dtes64 monitor
> > >   ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 
> > > sse4_2 x2apic movbe popcnt
> > >   tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch 
> > > cpuid_fault epb cat_l3
> > >   cdp_l3 intel_ppin intel_pt mba tpr_shadow vnmi flexpriority ept vpid 
> > > fsgsbase tsc_adjust bmi1
> > >   hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq 
> > > rdseed adx smap clflushopt
> > >   clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc 
> > > cqm_occup_llc cqm_mbm_total
> > >   cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp 
> > > hwp_pkg_req pku ospke
> > > 
> > > I checked and noticed "CONFIG_AVX2_OPT" has not been enabled, maybe
> > > due to a too-old glibc/gcc version:
> > > gcc version 4.4.6 20110731 (Red Hat 4.4.6-4) (GCC)
> > > glibc.x86_64 2.12
> > 
> > Yes, that's pretty old (RHEL6 ?) - I think you should get AVX2 in RHEL7.
> 
> Er, it is not easy to update glibc in the production env :(

But neither is QEMU updated in production all that easily. While we do
want to support older hosts functionally, it does not make
much sense to develop complex optimizations that only benefit
older hosts.

-- 
MST
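
For readers following along, the zero-page check under discussion is
conceptually just this (a naive sketch; QEMU's real buffer_is_zero() in
util/bufferiszero.c selects an accelerated SSE2/SSE4/AVX2 variant at runtime
via init_accel(), which is what the profile shows as buffer_zero_sse2):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive reference version: scan the page in 64-bit words.
 * Assumes len is a multiple of 8, which holds for page sizes. */
static bool buffer_is_zero_naive(const void *buf, size_t len)
{
    const uint64_t *p = buf;
    size_t i;

    for (i = 0; i < len / 8; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

The numbers above (roughly 400 ns vs 7000 ns per page) are what make the
check tempting to keep; the 12% CPU figure is what makes it tempting to drop
when compression handles zero pages anyway.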



Re: [Qemu-devel] [PATCH] linux-user/ppc: Implement swapcontext syscall

2018-07-22 Thread Laurent Vivier
Le 18/07/2018 à 22:06, Richard Henderson a écrit :
> This allows the tests generated by debian-powerpc-user-cross
> to function properly, especially tests/test-coroutine.
> 
> Technically this syscall is available to both ppc32 and ppc64,
> but only ppc32 glibc actually uses it.  Thus the ppc64 path is
> untested.
> 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/qemu.h   |  2 ++
>  linux-user/ppc/signal.c | 56 +
>  linux-user/syscall.c|  6 +
>  3 files changed, 64 insertions(+)
> 

Reviewed-by: Laurent Vivier 





Re: [Qemu-devel] [PATCH v6 10/11] linux-user: Update MIPS syscall numbers up to kernel 4.18 headers

2018-07-22 Thread Laurent Vivier
Le 19/07/2018 à 14:52, Stefan Markovic a écrit :
> From: Aleksandar Markovic 
> 
> Synchronize content of linux-user/mips/syscall_nr.h and
> linux-user/mips64/syscall_nr.h with Linux kernel 4.18 headers.
> This adds 7 new syscall numbers, the last being NR_statx.
> 
> Signed-off-by: Aleksandar Markovic 
> Signed-off-by: Stefan Markovic 
> ---
>  linux-user/mips/syscall_nr.h   |  7 +++
>  linux-user/mips64/syscall_nr.h | 14 ++
>  2 files changed, 21 insertions(+)

Reviewed-by: Laurent Vivier 



[Qemu-devel] [PATCH V10 00/20] COLO: integrate colo frame with block replication and COLO proxy

2018-07-22 Thread Zhang Chen
COLO frame, block replication and the COLO proxy (colo-compare, filter-mirror,
filter-redirector, filter-rewriter) have existed in QEMU for a long time;
it's time to integrate these three parts to make COLO really work.

In this series, we have some optimizations for the COLO frame, including
separating the process of saving RAM and device state, and using a COLO_EXIT
event to notify users that the VM has exited COLO. Most of these parts were
reviewed a long time ago in older versions, but since this series has just
been rebased on upstream, which merged a new migration series, some patches
in this series deserve review again.

We use a notifier/callback method for COLO-compare to notify the COLO frame
about inconsistent net packet events, and add a handle_event method to
NetFilterClass to help the COLO frame notify filters and colo-compare about
checkpoint/failover events; this is flexible.

For the newest version, please refer to:
https://github.com/zhangckid/qemu/tree/qemu-colo-18jul22

Please review, thanks.

V10:
 - Rebase on upstream.
 - Remove the "active" in COLOState.
 - Fix some comments.

V9:
 - Rebased on upstream codes.
 - Addressed Jason's comments add TCP state machine track in
   filter-rewriter.
 - Fix some bug in colo-compare.
 - Fix typo.
 - Add filter-rewriter failover handle.
 - Add net client type check in colo-compare.
 - Add COLO state diagram.
 - Addressed Markus's and Dave's comments.


V8:
 - Rebased on upstream codes.
 - Addressed Markus's comments in patch 10/17.
 - Addressed Markus's comments in patch 11/17.
 - Removed some comments in patch 4/17.
 - Moved the "migration_bitmap_clear_dirty()" to suitable position in
   patch 9/17.
 - Rewrote patch 07/17 to address Dave's comments.
 - Moved the "qemu_savevm_live_state" out of the
   qemu_mutex_lock_iothread.
 - Fixed the bug that in some status COLO vm crash with segmentation fault.

V7:
 - Addressed Markus's comments in 11/17.
 - Rebased on upstream.

V6:
 - Addressed Eric Blake's comments, use the enum to feedback in patch 11/17.
 - Fixed QAPI command separator problem in patch 11/17.


Zhang Chen (16):
  filter-rewriter: Add TCP state machine and fix memory leak in
connection_track_table
  colo-compare: implement the process of checkpoint
  colo-compare: use notifier to notify packets comparing result
  COLO: integrate colo compare with colo frame
  COLO: Add block replication into colo process
  COLO: Remove colo_state migration struct
  COLO: Load dirty pages into SVM's RAM cache firstly
  ram/COLO: Record the dirty pages that SVM received
  COLO: Flush memory data from ram cache
  qapi/migration.json: Rename COLO unknown mode to none mode.
  qapi: Add new command to query colo status
  savevm: split the process of different stages for loadvm/savevm
  net/net.c: Add net client type check function for COLO
  filter: Add handle_event method for NetFilterClass
  filter-rewriter: handle checkpoint and failover event
  docs: Add COLO status diagram to COLO-FT.txt

zhanghailiang (4):
  qmp event: Add COLO_EXIT event to notify users while exited COLO
  COLO: flush host dirty ram from cache
  COLO: notify net filters about checkpoint/failover event
  COLO: quick failover process by kick COLO thread

 docs/COLO-FT.txt  |  34 ++
 include/exec/ram_addr.h   |   1 +
 include/migration/colo.h  |  11 +-
 include/net/filter.h  |   5 +
 include/net/net.h |   1 +
 migration/Makefile.objs   |   2 +-
 migration/colo-comm.c |  76 --
 migration/colo-failover.c |   2 +-
 migration/colo.c  | 212 --
 migration/migration.c |  44 +++-
 migration/ram.c   | 163 -
 migration/ram.h   |   4 +
 migration/savevm.c|  53 --
 migration/savevm.h|   5 +
 migration/trace-events|   3 +
 net/colo-compare.c| 120 +++--
 net/colo-compare.h|  24 +
 net/colo.c|  10 +-
 net/colo.h|  11 +-
 net/filter-rewriter.c | 162 +++--
 net/filter.c  |  17 +++
 net/net.c |  42 
 qapi/migration.json   |  80 +-
 vl.c  |   2 -
 24 files changed, 945 insertions(+), 139 deletions(-)
 delete mode 100644 migration/colo-comm.c
 create mode 100644 net/colo-compare.h

-- 
2.17.1




[Qemu-devel] [PATCH V10 04/20] COLO: integrate colo compare with colo frame

2018-07-22 Thread Zhang Chen
For COLO FT, both the PVM and SVM run at the same time,
and we only sync the state when needed.

So here, let the SVM run while not doing a checkpoint, and change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100 (20000 ms, i.e. 20 s).

Besides, we forgot to release colo_checkpoint_sem and
colo_delay_timer; fix them here.

Signed-off-by: zhanghailiang 
Signed-off-by: Zhang Chen 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c  | 42 --
 migration/migration.c |  6 ++
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 4381067ed4..081df1835f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -25,8 +25,11 @@
 #include "qemu/error-report.h"
 #include "migration/failover.h"
 #include "replication.h"
+#include "net/colo-compare.h"
+#include "net/colo.h"
 
 static bool vmstate_loading;
+static Notifier packets_compare_notifier;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -343,6 +346,11 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err);
+if (local_err) {
+goto out;
+}
+
 /* Disable block migration */
 migrate_set_block_enabled(false, &local_err);
 qemu_savevm_state_header(fb);
@@ -400,6 +408,11 @@ out:
 return ret;
 }
 
+static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
+{
+colo_checkpoint_notify(data);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
@@ -416,6 +429,9 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 
+packets_compare_notifier.notify = colo_compare_notify_checkpoint;
+colo_compare_register_notifier(&packets_compare_notifier);
+
 /*
  * Wait for Secondary finish loading VM states and enter COLO
  * restore.
@@ -461,11 +477,21 @@ out:
 qemu_fclose(fb);
 }
 
-timer_del(s->colo_delay_timer);
-
 /* Hope this not to be too long to wait here */
 qemu_sem_wait(&s->colo_exit_sem);
 qemu_sem_destroy(&s->colo_exit_sem);
+
+/*
+ * It is safe to unregister notifier after failover finished.
+ * Besides, colo_delay_timer and colo_checkpoint_sem can't be
+ * released before unregistering the notifier, or there will be a use-after-free
+ * error.
+ */
+colo_compare_unregister_notifier(&packets_compare_notifier);
+timer_del(s->colo_delay_timer);
+timer_free(s->colo_delay_timer);
+qemu_sem_destroy(&s->colo_checkpoint_sem);
+
 /*
  * Must be called after failover BH is completed,
  * Or the failover BH may shutdown the wrong fd that
@@ -558,6 +584,11 @@ void *colo_process_incoming_thread(void *opaque)
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
+qemu_mutex_lock_iothread();
+vm_start();
+trace_colo_vm_state_change("stop", "run");
+qemu_mutex_unlock_iothread();
+
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
   &local_err);
 if (local_err) {
@@ -577,6 +608,11 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+trace_colo_vm_state_change("run", "stop");
+qemu_mutex_unlock_iothread();
+
 /* FIXME: This is unnecessary for periodic checkpoint mode */
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
  &local_err);
@@ -630,6 +666,8 @@ void *colo_process_incoming_thread(void *opaque)
 }
 
 vmstate_loading = false;
+vm_start();
+trace_colo_vm_state_change("stop", "run");
 qemu_mutex_unlock_iothread();
 
 if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
diff --git a/migration/migration.c b/migration/migration.c
index 8d56d56930..ce06941706 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -75,10 +75,8 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE (64 * 1024 * 1024)
 
-/* The delay time (in ms) between two COLO checkpoints
- * Note: Please change this default value to 1 when we support hybrid mode.
- */
-#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+/* The delay time (in ms) between two COLO checkpoints */
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
 #define DEFAULT_MIGRATE_MULTIFD_PAGE_COUNT 16
 
-- 
2.17.1




[Qemu-devel] [PATCH V10 01/20] filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table

2018-07-22 Thread Zhang Chen
We add an almost full TCP state machine in filter-rewriter, except
TCPS_LISTEN and some simplifications in the VM active-close FIN states.

After a net connection was closed, we didn't clear its related resources
in connection_track_table, which led to a memory leak.

Let's track the state of each net connection; once it is closed, its
related resources will be cleaned up.

Signed-off-by: zhanghailiang 
Signed-off-by: Zhang Chen 
---
 net/colo.c|   2 +-
 net/colo.h|   9 ++--
 net/filter-rewriter.c | 105 ++
 3 files changed, 100 insertions(+), 16 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 6dda4ed66e..97c8fc928f 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -137,7 +137,7 @@ Connection *connection_new(ConnectionKey *key)
 conn->ip_proto = key->ip_proto;
 conn->processing = false;
 conn->offset = 0;
-conn->syn_flag = 0;
+conn->tcp_state = TCPS_CLOSED;
 conn->pack = 0;
 conn->sack = 0;
 g_queue_init(&conn->primary_list);
diff --git a/net/colo.h b/net/colo.h
index da6c36dcf7..0277e0e9ba 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -18,6 +18,7 @@
 #include "slirp/slirp.h"
 #include "qemu/jhash.h"
 #include "qemu/timer.h"
+#include "slirp/tcp.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
@@ -81,11 +82,9 @@ typedef struct Connection {
 uint32_t sack;
 /* offset = secondary_seq - primary_seq */
 tcp_seq  offset;
-/*
- * we use this flag update offset func
- * run once in independent tcp connection
- */
-int syn_flag;
+
+int tcp_state; /* TCP FSM state */
+tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
 } Connection;
 
 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index f584e4eba4..f18a71bf2e 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -59,9 +59,9 @@ static int is_tcp_packet(Packet *pkt)
 }
 
 /* handle tcp packet from primary guest */
-static int handle_primary_tcp_pkt(NetFilterState *nf,
+static int handle_primary_tcp_pkt(RewriterState *rf,
   Connection *conn,
-  Packet *pkt)
+  Packet *pkt, ConnectionKey *key)
 {
 struct tcphdr *tcp_pkt;
 
@@ -74,23 +74,28 @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
 trace_colo_filter_rewriter_conn_offset(conn->offset);
 }
 
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN)) &&
+conn->tcp_state == TCPS_SYN_SENT) {
+conn->tcp_state = TCPS_ESTABLISHED;
+}
+
 if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
 /*
  * we use this flag update offset func
  * run once in independent tcp connection
  */
-conn->syn_flag = 1;
+conn->tcp_state = TCPS_SYN_RECEIVED;
 }
 
 if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
-if (conn->syn_flag) {
+if (conn->tcp_state == TCPS_SYN_RECEIVED) {
 /*
  * offset = secondary_seq - primary seq
  * ack packet sent by guest from primary node,
  * so we use th_ack - 1 get primary_seq
  */
 conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
-conn->syn_flag = 0;
+conn->tcp_state = TCPS_ESTABLISHED;
 }
 if (conn->offset) {
 /* handle packets to the secondary from the primary */
@@ -99,15 +104,63 @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
 net_checksum_calculate((uint8_t *)pkt->data + pkt->vnet_hdr_len,
pkt->size - pkt->vnet_hdr_len);
 }
+/*
+ * Case 1:
+ * Step 3:
+ * The *server* side of this connect is VM, *client* tries to close
+ * the connection.
+ *
+ * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1'
+ * packet from server side. From this point, we can ensure that there
+ * will be no packets in the connection, except that, some errors
+ * happen between the path of 'filter object' and vNIC, if this rare
+ * case really happen, we can still create a new connection,
+ * So it is safe to remove the connection from connection_track_table.
+ *
+ */
+if ((conn->tcp_state == TCPS_LAST_ACK) &&
+(ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
+conn->tcp_state = TCPS_CLOSED;
+g_hash_table_remove(rf->connection_track_table, key);
+}
+}
+
+if ((tcp_pkt->th_flags & TH_FIN) == TH_FIN) {
+/*
+ * Case 1:
+ * Step 1:
+ * The *server* side of this connect is VM, *client* tries to close
+ * the connection. We will into CLOSE_WAIT status.
+ */
+if (conn->tcp_state == TCPS_ESTABLISHED) {
+conn->tcp_state = TCPS_CLOSE_WAIT;
+}
+
+/*
+ * Case
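
For orientation, the passive-close sequence this patch tracks can be sketched
as follows (a schematic only, not text from the patch; the real code in
net/filter-rewriter.c also rewrites sequence numbers by conn->offset):

/*
 * Client closes a connection whose server side is the VM:
 *
 *   client                         VM (server)
 *     |------ FIN,ACK ------------>|  tcp_state = TCPS_CLOSE_WAIT
 *     |<----- FIN,ACK seq=S -------|  fin_ack_seq = S,
 *     |                            |  tcp_state = TCPS_LAST_ACK
 *     |------ ACK ack=S+1 -------->|  tcp_state = TCPS_CLOSED,
 *     |                            |  remove conn from
 *     |                            |  connection_track_table
 */

This is the point of the patch: once the final ACK is seen, the entry can
safely be dropped from connection_track_table instead of leaking.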

[Qemu-devel] [PATCH V10 03/20] colo-compare: use notifier to notify packets comparing result

2018-07-22 Thread Zhang Chen
It's a good idea to use notifier to notify COLO frame of
inconsistent packets comparing.

Signed-off-by: Zhang Chen 
Signed-off-by: zhanghailiang 
---
 net/colo-compare.c | 37 ++---
 net/colo-compare.h |  2 ++
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 80e6532e8b..426eab5973 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -29,6 +29,7 @@
 #include "sysemu/iothread.h"
 #include "net/colo-compare.h"
 #include "migration/colo.h"
+#include "migration/migration.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
@@ -37,6 +38,9 @@
 static QTAILQ_HEAD(, CompareState) net_compares =
QTAILQ_HEAD_INITIALIZER(net_compares);
 
+static NotifierList colo_compare_notifiers =
+NOTIFIER_LIST_INITIALIZER(colo_compare_notifiers);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
@@ -427,10 +431,7 @@ sec:
 qemu_hexdump((char *)spkt->data, stderr,
  "colo-compare spkt", spkt->size);
 
-/*
- * colo_compare_inconsistent_notify();
- * TODO: notice to checkpoint();
- */
+colo_compare_inconsistency_notify();
 }
 }
 
@@ -561,8 +562,24 @@ static int colo_old_packet_check_one(Packet *pkt, int64_t 
*check_time)
 }
 }
 
+static void colo_compare_inconsistency_notify(void)
+{
+notifier_list_notify(&colo_compare_notifiers,
+migrate_get_current());
+}
+
+void colo_compare_register_notifier(Notifier *notify)
+{
+notifier_list_add(&colo_compare_notifiers, notify);
+}
+
+void colo_compare_unregister_notifier(Notifier *notify)
+{
+notifier_remove(notify);
+}
+
 static int colo_old_packet_check_one_conn(Connection *conn,
-  void *user_data)
+   void *user_data)
 {
 GList *result = NULL;
 int64_t check_time = REGULAR_PACKET_CHECK_MS;
@@ -573,10 +590,7 @@ static int colo_old_packet_check_one_conn(Connection *conn,
 
 if (result) {
 /* Do checkpoint will flush old packet */
-/*
- * TODO: Notify colo frame to do checkpoint.
- * colo_compare_inconsistent_notify();
- */
+colo_compare_inconsistency_notify();
 return 0;
 }
 
@@ -620,11 +634,12 @@ static void colo_compare_packet(CompareState *s, 
Connection *conn,
 /*
  * If one packet arrive late, the secondary_list or
  * primary_list will be empty, so we can't compare it
- * until next comparison.
+ * until next comparison. If the packets in the list are
+ * timeout, it will trigger a checkpoint request.
  */
 trace_colo_compare_main("packet different");
 g_queue_push_head(&conn->primary_list, pkt);
-/* TODO: colo_notify_checkpoint();*/
+colo_compare_inconsistency_notify();
 break;
 }
 }
diff --git a/net/colo-compare.h b/net/colo-compare.h
index 1b1ce76aea..22ddd512e2 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -18,5 +18,7 @@
 #define QEMU_COLO_COMPARE_H
 
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
+void colo_compare_register_notifier(Notifier *notify);
+void colo_compare_unregister_notifier(Notifier *notify);
 
 #endif /* QEMU_COLO_COMPARE_H */
-- 
2.17.1
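
A minimal usage sketch of the API this patch exposes (the handler and data
names are illustrative, not from the patch; Notifier itself comes from
include/qemu/notify.h):

#include "qemu/notify.h"
#include "net/colo-compare.h"

/* Illustrative consumer: runs whenever colo-compare reports an
 * inconsistency; 'data' is migrate_get_current() per the patch. */
static void my_inconsistency_handler(Notifier *notifier, void *data)
{
    /* e.g. request a checkpoint, as the COLO frame does */
}

static Notifier my_notifier = { .notify = my_inconsistency_handler };

static void example_setup(void)
{
    colo_compare_register_notifier(&my_notifier);
    /* ... colo_compare_inconsistency_notify() now invokes the
     * handler ... */
    colo_compare_unregister_notifier(&my_notifier);
}

Patch 04 in this series wires exactly this up from migration/colo.c via
colo_compare_notify_checkpoint().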




[Qemu-devel] [PATCH V10 02/20] colo-compare: implement the process of checkpoint

2018-07-22 Thread Zhang Chen
While doing a checkpoint, we need to flush all the unhandled packets.
By using the filter notifier mechanism, we can easily notify
every compare object to do this process, which runs inside
the compare threads as a coroutine.

Signed-off-by: zhanghailiang 
Signed-off-by: Zhang Chen 
---
 include/migration/colo.h |  6 
 net/colo-compare.c   | 78 
 net/colo-compare.h   | 22 
 3 files changed, 106 insertions(+)
 create mode 100644 net/colo-compare.h

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2fe48ad353..fefb2fcf4c 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -16,6 +16,12 @@
 #include "qemu-common.h"
 #include "qapi/qapi-types-migration.h"
 
+enum colo_event {
+COLO_EVENT_NONE,
+COLO_EVENT_CHECKPOINT,
+COLO_EVENT_FAILOVER,
+};
+
 void colo_info_init(void);
 
 void migrate_start_colo_process(MigrationState *s);
diff --git a/net/colo-compare.c b/net/colo-compare.c
index dd745a491b..80e6532e8b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -27,11 +27,16 @@
 #include "qemu/sockets.h"
 #include "colo.h"
 #include "sysemu/iothread.h"
+#include "net/colo-compare.h"
+#include "migration/colo.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+static QTAILQ_HEAD(, CompareState) net_compares =
+   QTAILQ_HEAD_INITIALIZER(net_compares);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
@@ -41,6 +46,10 @@
 /* TODO: Should be configurable */
 #define REGULAR_PACKET_CHECK_MS 3000
 
+static QemuMutex event_mtx;
+static QemuCond event_complete_cond;
+static int event_unhandled_count;
+
 /*
  *  + CompareState ++
  *  |   |
@@ -87,6 +96,11 @@ typedef struct CompareState {
 IOThread *iothread;
 GMainContext *worker_context;
 QEMUTimer *packet_check_timer;
+
+QEMUBH *event_bh;
+enum colo_event event;
+
+QTAILQ_ENTRY(CompareState) next;
 } CompareState;
 
 typedef struct CompareClass {
@@ -736,6 +750,25 @@ static void check_old_packet_regular(void *opaque)
 REGULAR_PACKET_CHECK_MS);
 }
 
+/* Public API, Used for COLO frame to notify compare event */
+void colo_notify_compares_event(void *opaque, int event, Error **errp)
+{
+CompareState *s;
+
+qemu_mutex_lock(&event_mtx);
+QTAILQ_FOREACH(s, &net_compares, next) {
+s->event = event;
+qemu_bh_schedule(s->event_bh);
+event_unhandled_count++;
+}
+/* Wait all compare threads to finish handling this event */
+while (event_unhandled_count > 0) {
+qemu_cond_wait(&event_complete_cond, &event_mtx);
+}
+
+qemu_mutex_unlock(&event_mtx);
+}
+
 static void colo_compare_timer_init(CompareState *s)
 {
 AioContext *ctx = iothread_get_aio_context(s->iothread);
@@ -756,6 +789,30 @@ static void colo_compare_timer_del(CompareState *s)
 }
  }
 
+static void colo_flush_packets(void *opaque, void *user_data);
+
+static void colo_compare_handle_event(void *opaque)
+{
+CompareState *s = opaque;
+
+switch (s->event) {
+case COLO_EVENT_CHECKPOINT:
+g_queue_foreach(&s->conn_list, colo_flush_packets, s);
+break;
+case COLO_EVENT_FAILOVER:
+break;
+default:
+break;
+}
+
+assert(event_unhandled_count > 0);
+
+qemu_mutex_lock(&event_mtx);
+event_unhandled_count--;
+qemu_cond_broadcast(&event_complete_cond);
+qemu_mutex_unlock(&event_mtx);
+}
+
 static void colo_compare_iothread(CompareState *s)
 {
 object_ref(OBJECT(s->iothread));
@@ -769,6 +826,7 @@ static void colo_compare_iothread(CompareState *s)
  s, s->worker_context, true);
 
 colo_compare_timer_init(s);
+s->event_bh = qemu_bh_new(colo_compare_handle_event, s);
 }
 
 static char *compare_get_pri_indev(Object *obj, Error **errp)
@@ -926,8 +984,13 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
 net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
 
+QTAILQ_INSERT_TAIL(&net_compares, s, next);
+
 g_queue_init(&s->conn_list);
 
+qemu_mutex_init(&event_mtx);
+qemu_cond_init(&event_complete_cond);
+
 s->connection_track_table = g_hash_table_new_full(connection_key_hash,
   connection_key_equal,
   g_free,
@@ -990,6 +1053,7 @@ static void colo_compare_init(Object *obj)
 static void colo_compare_finalize(Object *obj)
 {
 CompareState *s = COLO_COMPARE(obj);
+CompareState *tmp = NULL;
 
 qemu_chr_fe_deinit(&s->chr_pri_in, false);
 qemu_chr_fe_deinit(&s->chr_sec_in, false);
@@ -997,6 +1061,16 @@ static void colo_compare_finalize(Object *obj)
 if (s->iothread) {
 colo_compare_timer_del(s);
 }
+
+qemu_bh_d
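
The synchronization pattern above (schedule one bottom half per compare
object, then wait until all of them acknowledge) is a counted-event barrier;
a generic pthread rendition, for comparison only:

#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int unhandled;

/* Producer: publish work to n consumers, block until all acknowledge. */
static void notify_and_wait(int n)
{
    pthread_mutex_lock(&mtx);
    unhandled = n;            /* one unit of work per consumer */
    /* ... hand the event to each consumer here ... */
    while (unhandled > 0) {
        pthread_cond_wait(&all_done, &mtx); /* drops mtx while waiting */
    }
    pthread_mutex_unlock(&mtx);
}

/* Consumer: called once the event has been processed. */
static void acknowledge(void)
{
    pthread_mutex_lock(&mtx);
    unhandled--;
    pthread_cond_broadcast(&all_done);
    pthread_mutex_unlock(&mtx);
}

In the patch, event_mtx, event_complete_cond and event_unhandled_count play
these roles, with qemu_bh_schedule() doing the hand-off into each compare
thread's context.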

[Qemu-devel] [PATCH V10 09/20] COLO: Flush memory data from ram cache

2018-07-22 Thread Zhang Chen
While the VM is running, the PVM may dirty some pages; we will transfer
the PVM's dirty pages to the SVM and store them into the SVM's RAM cache at
the next checkpoint time. So the content of the SVM's RAM cache will always
be the same as the PVM's memory after a checkpoint.

Instead of flushing the whole content of the RAM cache into the SVM's memory,
we do this in a more efficient way:
only flush pages that were dirtied by the PVM since the last checkpoint.
In this way, we can ensure the SVM's memory is the same as the PVM's.

Besides, we must ensure the RAM cache is flushed before loading the device state.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/ram.c| 37 +
 migration/trace-events |  2 ++
 2 files changed, 39 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index d1060f1337..4df85d69c1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3830,6 +3830,39 @@ static bool postcopy_is_running(void)
 return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that were dirtied by PVM or SVM or both.
+ */
+static void colo_flush_ram_cache(void)
+{
+RAMBlock *block = NULL;
+void *dst_host;
+void *src_host;
+unsigned long offset = 0;
+
+trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
+rcu_read_lock();
+block = QLIST_FIRST_RCU(&ram_list.blocks);
+
+while (block) {
+offset = migration_bitmap_find_dirty(ram_state, block, offset);
+
+if (offset << TARGET_PAGE_BITS >= block->used_length) {
+offset = 0;
+block = QLIST_NEXT_RCU(block, next);
+} else {
+migration_bitmap_clear_dirty(ram_state, block, offset);
+dst_host = block->host + (offset << TARGET_PAGE_BITS);
+src_host = block->colo_cache + (offset << TARGET_PAGE_BITS);
+memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+}
+}
+
+rcu_read_unlock();
+trace_colo_flush_ram_cache_end();
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 int flags = 0, ret = 0, invalid_flags = 0;
@@ -4006,6 +4039,10 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ret |= wait_for_decompress_done();
 rcu_read_unlock();
 trace_ram_load_complete(ret, seq_iter);
+
+if (!ret  && migration_incoming_in_colo_state()) {
+colo_flush_ram_cache();
+}
 return ret;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index fa0ff3f3bf..bd2d0cd25a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -102,6 +102,8 @@ ram_dirty_bitmap_sync_start(void) ""
 ram_dirty_bitmap_sync_wait(void) ""
 ram_dirty_bitmap_sync_complete(void) ""
 ram_state_resume_prepare(uint64_t v) "%" PRId64
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.17.1




[Qemu-devel] [PATCH V10 10/20] qmp event: Add COLO_EXIT event to notify users while exited COLO

2018-07-22 Thread Zhang Chen
From: zhanghailiang 

If some errors happen during VM's COLO FT stage, it's important to
notify the users of this event. Together with 'x-colo-lost-heartbeat',
users can intervene in COLO's failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 migration/colo.c| 31 +++
 qapi/migration.json | 38 ++
 2 files changed, 69 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index c083d3696f..ab484ad754 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -28,6 +28,7 @@
 #include "net/colo-compare.h"
 #include "net/colo.h"
 #include "block/block.h"
+#include "qapi/qapi-events-migration.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -514,6 +515,23 @@ out:
 qemu_fclose(fb);
 }
 
+/*
+ * There are only two reasons we can get here, some error happened
+ * or the user triggered failover.
+ */
+switch (failover_get_state()) {
+case FAILOVER_STATUS_NONE:
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+break;
+case FAILOVER_STATUS_REQUIRE:
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+break;
+default:
+abort();
+}
+
 /* Hope this not to be too long to wait here */
 qemu_sem_wait(&s->colo_exit_sem);
 qemu_sem_destroy(&s->colo_exit_sem);
@@ -745,6 +763,19 @@ out:
 error_report_err(local_err);
 }
 
+switch (failover_get_state()) {
+case FAILOVER_STATUS_NONE:
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+break;
+case FAILOVER_STATUS_REQUIRE:
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+break;
+default:
+abort();
+}
+
 if (fb) {
 qemu_fclose(fb);
 }
diff --git a/qapi/migration.json b/qapi/migration.json
index 186e8a7303..05f65680e1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -900,6 +900,44 @@
 { 'enum': 'FailoverStatus',
   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
+##
+# @COLO_EXIT:
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: report COLO mode when COLO exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 3.0
+#
+# Example:
+#
+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
+#  "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } 
}
+#
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
+# @COLOExitReason:
+#
+# The reason for a COLO exit
+#
+# @none: no failover has ever happened. This can't occur in the
+# COLO_EXIT event, only in the result of query-colo-status.
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 3.0
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'none', 'request', 'error' ] }
+
 ##
 # @x-colo-lost-heartbeat:
 #
-- 
2.17.1




[Qemu-devel] [PATCH V10 05/20] COLO: Add block replication into colo process

2018-07-22 Thread Zhang Chen
Make sure the master starts block replication after the slave's block
replication has started.

Besides, we need to activate the VM's blocks before it goes into
COLO state.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
---
 migration/colo.c  | 43 +++
 migration/migration.c |  9 +
 2 files changed, 52 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 081df1835f..e06640c3d6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -27,6 +27,7 @@
 #include "replication.h"
 #include "net/colo-compare.h"
 #include "net/colo.h"
+#include "block/block.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -56,6 +57,7 @@ static void secondary_vm_do_failover(void)
 {
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
 
 /* Can not do failover during the process of VM's loading VMstate, Or
  * it will break the secondary VM.
@@ -73,6 +75,11 @@ static void secondary_vm_do_failover(void)
 migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+replication_stop_all(true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
 /* recover runstate to normal migration finish state */
@@ -110,6 +117,7 @@ static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
 int old_state;
+Error *local_err = NULL;
 
 migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
@@ -133,6 +141,13 @@ static void primary_vm_do_failover(void)
  FailoverStatus_str(old_state));
 return;
 }
+
+replication_stop_all(true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+local_err = NULL;
+}
+
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(&s->colo_exit_sem);
 }
@@ -356,6 +371,11 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 qemu_savevm_state_header(fb);
 qemu_savevm_state_setup(fb);
 qemu_mutex_lock_iothread();
+replication_do_checkpoint_all(&local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
 qemu_savevm_state_complete_precopy(fb, false, false);
 qemu_mutex_unlock_iothread();
 
@@ -446,6 +466,12 @@ static void colo_process_checkpoint(MigrationState *s)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vm_start();
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
@@ -585,6 +611,11 @@ void *colo_process_incoming_thread(void *opaque)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
 vm_start();
 trace_colo_vm_state_change("stop", "run");
 qemu_mutex_unlock_iothread();
@@ -665,6 +696,18 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+replication_get_error_all(&local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+/* discard colo disk buffer */
+replication_do_checkpoint_all(&local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 vm_start();
 trace_colo_vm_state_change("stop", "run");
diff --git a/migration/migration.c b/migration/migration.c
index ce06941706..c97b7660af 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -385,6 +385,7 @@ static void process_incoming_migration_co(void *opaque)
 MigrationIncomingState *mis = migration_incoming_get_current();
 PostcopyState ps;
 int ret;
+Error *local_err = NULL;
 
 assert(mis->from_src_file);
 mis->largest_page_size = qemu_ram_pagesize_largest();
@@ -416,6 +417,14 @@ static void process_incoming_migration_co(void *opaque)
 
 /* we get COLO info, and know if we are in COLO mode */
 if (!ret && migration_incoming_enable_colo()) {
+/* Make sure all file formats flush their mutable metadata */
+bdrv_invalidate_cache_all(&local_err);
+if (local_err) {
+migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+MIGRATION_STATUS_FAILED);
+error_report_err(local_err);
+exit(EXIT_FAILURE);
+}
 mis->migration_incoming_co = qemu_coroutine_self();
 qemu_thread_create(&mis->colo_

[Qemu-devel] [PATCH V10 06/20] COLO: Remove colo_state migration struct

2018-07-22 Thread Zhang Chen
On the incoming side, we need to know whether migration is going into
COLO state before normal migration starts.

Instead of using a VMStateDescription to send colo_state
from the source side to the destination side, we use MIG_CMD_ENABLE_COLO
to indicate whether COLO is enabled or not.

Signed-off-by: zhanghailiang 
Signed-off-by: Zhang Chen 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/migration/colo.h |  5 +--
 migration/Makefile.objs  |  2 +-
 migration/colo-comm.c| 76 
 migration/colo.c | 13 ++-
 migration/migration.c| 23 +++-
 migration/savevm.c   | 17 +
 migration/savevm.h   |  1 +
 migration/trace-events   |  1 +
 vl.c |  2 --
 9 files changed, 57 insertions(+), 83 deletions(-)
 delete mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index fefb2fcf4c..99ce17aca7 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -28,8 +28,9 @@ void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
 /* loadvm */
-bool migration_incoming_enable_colo(void);
-void migration_incoming_exit_colo(void);
+void migration_incoming_enable_colo(void);
+void migration_incoming_disable_colo(void);
+bool migration_incoming_colo_enabled(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index c83ec47ba8..a4f3bafd86 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += tls.o channel.o savevm.o
-common-obj-y += colo-comm.o colo.o colo-failover.o
+common-obj-y += colo.o colo-failover.o
 common-obj-y += vmstate.o vmstate-types.o page_cache.o
 common-obj-y += qemu-file.o global_state.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
deleted file mode 100644
index df26e4dfe7..00
--- a/migration/colo-comm.c
+++ /dev/null
@@ -1,76 +0,0 @@
-/*
- * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
- * (a.k.a. Fault Tolerance or Continuous Replication)
- *
- * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
- * Copyright (c) 2016 FUJITSU LIMITED
- * Copyright (c) 2016 Intel Corporation
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or
- * later. See the COPYING file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "migration.h"
-#include "migration/colo.h"
-#include "migration/vmstate.h"
-#include "trace.h"
-
-typedef struct {
- bool colo_requested;
-} COLOInfo;
-
-static COLOInfo colo_info;
-
-COLOMode get_colo_mode(void)
-{
-if (migration_in_colo_state()) {
-return COLO_MODE_PRIMARY;
-} else if (migration_incoming_in_colo_state()) {
-return COLO_MODE_SECONDARY;
-} else {
-return COLO_MODE_UNKNOWN;
-}
-}
-
-static int colo_info_pre_save(void *opaque)
-{
-COLOInfo *s = opaque;
-
-s->colo_requested = migrate_colo_enabled();
-
-return 0;
-}
-
-static bool colo_info_need(void *opaque)
-{
-   return migrate_colo_enabled();
-}
-
-static const VMStateDescription colo_state = {
-.name = "COLOState",
-.version_id = 1,
-.minimum_version_id = 1,
-.pre_save = colo_info_pre_save,
-.needed = colo_info_need,
-.fields = (VMStateField[]) {
-VMSTATE_BOOL(colo_requested, COLOInfo),
-VMSTATE_END_OF_LIST()
-},
-};
-
-void colo_info_init(void)
-{
-vmstate_register(NULL, 0, &colo_state, &colo_info);
-}
-
-bool migration_incoming_enable_colo(void)
-{
-return colo_info.colo_requested;
-}
-
-void migration_incoming_exit_colo(void)
-{
-colo_info.colo_requested = false;
-}
diff --git a/migration/colo.c b/migration/colo.c
index e06640c3d6..c083d3696f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -152,6 +152,17 @@ static void primary_vm_do_failover(void)
 qemu_sem_post(&s->colo_exit_sem);
 }
 
+COLOMode get_colo_mode(void)
+{
+if (migration_in_colo_state()) {
+return COLO_MODE_PRIMARY;
+} else if (migration_incoming_in_colo_state()) {
+return COLO_MODE_SECONDARY;
+} else {
+return COLO_MODE_UNKNOWN;
+}
+}
+
 void colo_do_failover(MigrationState *s)
 {
 /* Make sure VM stopped while failover happened. */
@@ -745,7 +756,7 @@ out:
 if (mis->to_src_file) {
 qemu_fclose(mis->to_src_file);
 }
-migration_incoming_exit_colo();
+migration_incoming_disable_colo();
 
 return NULL;
 }
diff --git a/migration/migration.c b/migration/migration.c
index c97b7660af..c645f66f4e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -295,6 +295,22 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, 
const char *rbname,
 return migrate_send_rp_message(mis, msg_type, msglen, bufc);
 }
 
+static bool migration_colo_enabled;
+bool migration_i

[Qemu-devel] [PATCH V10 11/20] qapi/migration.json: Rename COLO unknown mode to none mode.

2018-07-22 Thread Zhang Chen
From: Zhang Chen 

As suggested by Markus Armbruster, rename COLO "unknown" mode to "none" mode.

Signed-off-by: Zhang Chen 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 migration/colo-failover.c |  2 +-
 migration/colo.c  |  2 +-
 qapi/migration.json   | 10 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 0ae0c41221..4854a96c92 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -77,7 +77,7 @@ FailoverStatus failover_get_state(void)
 
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
-if (get_colo_mode() == COLO_MODE_UNKNOWN) {
+if (get_colo_mode() == COLO_MODE_NONE) {
 error_setg(errp, QERR_FEATURE_DISABLED, "colo");
 return;
 }
diff --git a/migration/colo.c b/migration/colo.c
index ab484ad754..8fdb79ac73 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -160,7 +160,7 @@ COLOMode get_colo_mode(void)
 } else if (migration_incoming_in_colo_state()) {
 return COLO_MODE_SECONDARY;
 } else {
-return COLO_MODE_UNKNOWN;
+return COLO_MODE_NONE;
 }
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 05f65680e1..cdae7ddccb 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -867,18 +867,18 @@
 ##
 # @COLOMode:
 #
-# The colo mode
+# The COLO current mode.
 #
-# @unknown: unknown mode
+# @none: COLO is disabled.
 #
-# @primary: master side
+# @primary: COLO node in primary side.
 #
-# @secondary: slave side
+# @secondary: COLO node in slave side.
 #
 # Since: 2.8
 ##
 { 'enum': 'COLOMode',
-  'data': [ 'unknown', 'primary', 'secondary'] }
+  'data': [ 'none', 'primary', 'secondary'] }
 
 ##
 # @FailoverStatus:
-- 
2.17.1




[Qemu-devel] [PATCH V10 15/20] net/net.c: Add net client type check function for COLO

2018-07-22 Thread Zhang Chen
From: Zhang Chen 

We add is_colo_support_client_type() to check the net client type for
COLO-compare. Currently we just support TAP.
Suggested by Jason.

Signed-off-by: Zhang Chen 
---
 include/net/net.h  |  1 +
 net/colo-compare.c |  5 +
 net/net.c  | 14 ++
 3 files changed, 20 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 1425960f76..dcbc7ba9c0 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -212,6 +212,7 @@ void hmp_host_net_add(Monitor *mon, const QDict *qdict);
 void hmp_host_net_remove(Monitor *mon, const QDict *qdict);
 void netdev_add(QemuOpts *opts, Error **errp);
 void qmp_netdev_add(QDict *qdict, QObject **ret, Error **errp);
+bool is_colo_support_client_type(void);
 
 int net_hub_id_for_client(NetClientState *nc, int *id);
 NetClientState *net_hub_port_find(int hub_id);
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 426eab5973..b8c0240725 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -996,6 +996,11 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 return;
 }
 
+if (!is_colo_support_client_type()) {
+error_setg(errp, "COLO-compare: Net client type is not supported");
+return;
+}
+
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
 net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
 
diff --git a/net/net.c b/net/net.c
index 2a3133990c..a77ea88fff 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1733,3 +1733,17 @@ int net_fill_rstate(SocketReadState *rs, const uint8_t 
*buf, int size)
 assert(size == 0);
 return 0;
 }
+
+/* Currently, COLO just support TAP */
+bool is_colo_support_client_type(void)
+{
+NetClientState *nc;
+
+QTAILQ_FOREACH(nc, &net_clients, next) {
+if (nc->info->type != NET_CLIENT_DRIVER_TAP) {
+return false;
+}
+}
+
+return true;
+}
-- 
2.17.1




[Qemu-devel] [PATCH V10 13/20] savevm: split the process of different stages for loadvm/savevm

2018-07-22 Thread Zhang Chen
There are several stages during the loadvm/savevm process, and in each stage
the migration incoming side processes different types of sections.
We want to control these stages more accurately; it will benefit COLO
performance, since we don't have to save QEMU_VM_SECTION_START
sections every time we do a checkpoint. Besides, we want to separate
the process of saving/loading memory and device state.

So we add two new helper functions, qemu_load_device_state() and
qemu_savevm_live_state(), to achieve these different processes during migration.

Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
public, and simplify the code of qemu_save_device_state() by calling the
wrapper qemu_savevm_state_header().

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c   | 41 -
 migration/savevm.c | 36 +---
 migration/savevm.h |  4 
 3 files changed, 65 insertions(+), 16 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index b355de3f01..688d6f40b2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -30,6 +30,7 @@
 #include "block/block.h"
 #include "qapi/qapi-events-migration.h"
 #include "qapi/qmp/qerror.h"
+#include "sysemu/cpus.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -401,23 +402,34 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 
 /* Disable block migration */
 migrate_set_block_enabled(false, &local_err);
-qemu_savevm_state_header(fb);
-qemu_savevm_state_setup(fb);
 qemu_mutex_lock_iothread();
 replication_do_checkpoint_all(&local_err);
 if (local_err) {
 qemu_mutex_unlock_iothread();
 goto out;
 }
-qemu_savevm_state_complete_precopy(fb, false, false);
-qemu_mutex_unlock_iothread();
-
-qemu_fflush(fb);
 
 colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
 if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+/* Note: device state is saved into buffer */
+ret = qemu_save_device_state(fb);
+
+qemu_mutex_unlock_iothread();
+if (ret < 0) {
 goto out;
 }
+/*
+ * Only save VM's live state, which not including device state.
+ * TODO: We may need a timeout mechanism to prevent COLO process
+ * to be blocked here.
+ */
+qemu_savevm_live_state(s->to_dst_file);
+
+qemu_fflush(fb);
+
 /*
  * We need the size of the VMstate data in Secondary side,
  * With which we can decide how much data should be read.
@@ -635,6 +647,7 @@ void *colo_process_incoming_thread(void *opaque)
 uint64_t total_size;
 uint64_t value;
 Error *local_err = NULL;
+int ret;
 
 qemu_sem_init(&mis->colo_incoming_sem, 0);
 
@@ -707,6 +720,16 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+qemu_mutex_lock_iothread();
+cpu_synchronize_all_pre_loadvm();
+ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+qemu_mutex_unlock_iothread();
+
+if (ret < 0) {
+error_report("Load VM's live state (ram) error");
+goto out;
+}
+
 value = colo_receive_message_value(mis->from_src_file,
  COLO_MESSAGE_VMSTATE_SIZE, &local_err);
 if (local_err) {
@@ -738,10 +761,10 @@ void *colo_process_incoming_thread(void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_system_reset(SHUTDOWN_CAUSE_NONE);
 vmstate_loading = true;
-if (qemu_loadvm_state(fb) < 0) {
-error_report("COLO: loadvm failed");
+ret = qemu_load_device_state(fb);
+if (ret < 0) {
+error_report("COLO: load device state failed");
 qemu_mutex_unlock_iothread();
 goto out;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 33e9e7cda0..3a6aa747f9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1378,13 +1378,21 @@ done:
 return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+void qemu_savevm_live_state(QEMUFile *f)
 {
-SaveStateEntry *se;
+/* save QEMU_VM_SECTION_END section */
+qemu_savevm_state_complete_precopy(f, true, false);
+qemu_put_byte(f, QEMU_VM_EOF);
+}
 
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+int qemu_save_device_state(QEMUFile *f)
+{
+SaveStateEntry *se;
 
+if (!migration_in_colo_state()) {
+qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+}
 cpu_synchronize_all_states();
 
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1440,8 +1448,6 @@ enum LoadVMExitCodes {
 LOADVM_QUIT =  1,
 };
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
-
 /* -- incoming postcopy messages -- */
 

[Qemu-devel] [PATCH V10 07/20] COLO: Load dirty pages into SVM's RAM cache firstly

2018-07-22 Thread Zhang Chen
We should not load the PVM's state directly into the SVM, because errors may
happen while the SVM is receiving data, which would break the SVM.

We need to ensure we have received all data before loading the state into the
SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on
the secondary side is initially the same as the SVM's/PVM's memory. In the
process of a checkpoint, we first cache the PVM's dirty pages into this RAM
cache, so the RAM cache is always the same as the PVM's memory at every
checkpoint; then we flush this cached RAM to the SVM after we receive all of
the PVM's state.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
---
 include/exec/ram_addr.h |  1 +
 migration/migration.c   |  6 +++
 migration/ram.c | 83 -
 migration/ram.h |  4 ++
 migration/savevm.c  |  2 +-
 5 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index cf4ce06248..a78c1c99a7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -27,6 +27,7 @@ struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
 uint8_t *host;
+uint8_t *colo_cache; /* For colo, VM's ram cache */
 ram_addr_t offset;
 ram_addr_t used_length;
 ram_addr_t max_length;
diff --git a/migration/migration.c b/migration/migration.c
index c645f66f4e..d9683e06d3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -441,6 +441,10 @@ static void process_incoming_migration_co(void *opaque)
 error_report_err(local_err);
 exit(EXIT_FAILURE);
 }
+if (colo_init_ram_cache() < 0) {
+error_report("Init ram cache failed");
+exit(EXIT_FAILURE);
+}
 mis->migration_incoming_co = qemu_coroutine_self();
 qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
  colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
@@ -449,6 +453,8 @@ static void process_incoming_migration_co(void *opaque)
 
 /* Wait checkpoint incoming thread exit before free resource */
 qemu_thread_join(&mis->colo_incoming_thread);
+/* We hold the global iothread lock, so it is safe here */
+colo_release_ram_cache();
 }
 
 if (ret < 0) {
diff --git a/migration/ram.c b/migration/ram.c
index 52dd678092..33ebd09d70 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3314,6 +3314,20 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+ ram_addr_t offset)
+{
+if (!offset_in_ramblock(block, offset)) {
+return NULL;
+}
+if (!block->colo_cache) {
+error_report("%s: colo_cache is NULL in block :%s",
+ __func__, block->idstr);
+return NULL;
+}
+return block->colo_cache + offset;
+}
+
 /**
  * ram_handle_compressed: handle the zero page case
  *
@@ -3518,6 +3532,58 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
 qemu_mutex_unlock(&decomp_done_lock);
 }
 
+/*
+ * colo cache: this is for the secondary VM, where we cache the whole
+ * memory of the secondary VM. The global lock must be held when
+ * calling this helper.
+ */
+int colo_init_ram_cache(void)
+{
+RAMBlock *block;
+
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+block->colo_cache = qemu_anon_ram_alloc(block->used_length,
+NULL,
+false);
+if (!block->colo_cache) {
+error_report("%s: Can't alloc memory for COLO cache of block %s,"
+ "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
+ block->used_length);
+goto out_locked;
+}
+memcpy(block->colo_cache, block->host, block->used_length);
+}
+rcu_read_unlock();
+return 0;
+
+out_locked:
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+if (block->colo_cache) {
+qemu_anon_ram_free(block->colo_cache, block->used_length);
+block->colo_cache = NULL;
+}
+}
+
+rcu_read_unlock();
+return -errno;
+}
+
+/* The global lock must be held when calling this helper */
+void colo_release_ram_cache(void)
+{
+RAMBlock *block;
+
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+if (block->colo_cache) {
+qemu_anon_ram_free(block->colo_cache, block->used_length);
+block->colo_cache = NULL;
+}
+}
+rcu_read_unlock();
+}
+
 /**
  * ram_load_setup: Setup RAM for migration incoming side
  *
@@ -3534,6 +3600,7 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 
 xbzrle_load_setup();
 ramblock_recv_map_init();
+
 return 0;
 }
 
@@ -3547,6 +3614,7 @@ sta

[Qemu-devel] [PATCH V10 16/20] filter: Add handle_event method for NetFilterClass

2018-07-22 Thread Zhang Chen
Filters need to process checkpoint/failover and other events
passed by the COLO frame.

Signed-off-by: zhanghailiang 
---
 include/net/filter.h |  5 +
 net/filter.c | 17 +
 net/net.c| 28 
 3 files changed, 50 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 435acd6f82..49da666ac0 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -38,6 +38,8 @@ typedef ssize_t (FilterReceiveIOV)(NetFilterState *nc,
 
 typedef void (FilterStatusChanged) (NetFilterState *nf, Error **errp);
 
+typedef void (FilterHandleEvent) (NetFilterState *nf, int event, Error **errp);
+
 typedef struct NetFilterClass {
 ObjectClass parent_class;
 
@@ -45,6 +47,7 @@ typedef struct NetFilterClass {
 FilterSetup *setup;
 FilterCleanup *cleanup;
 FilterStatusChanged *status_changed;
+FilterHandleEvent *handle_event;
 /* mandatory */
 FilterReceiveIOV *receive_iov;
 } NetFilterClass;
@@ -77,4 +80,6 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 int iovcnt,
 void *opaque);
 
+void colo_notify_filters_event(int event, Error **errp);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index 2fd7d7d663..0f17eba143 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -17,6 +17,8 @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
+#include "net/colo.h"
+#include "migration/colo.h"
 
 static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
 {
@@ -245,11 +247,26 @@ static void netfilter_finalize(Object *obj)
 g_free(nf->netdev_id);
 }
 
+static void dummy_handle_event(NetFilterState *nf, int event, Error **errp)
+{
+switch (event) {
+case COLO_EVENT_CHECKPOINT:
+break;
+case COLO_EVENT_FAILOVER:
+object_property_set_str(OBJECT(nf), "off", "status", errp);
+break;
+default:
+break;
+}
+}
+
 static void netfilter_class_init(ObjectClass *oc, void *data)
 {
 UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
 
 ucc->complete = netfilter_complete;
+nfc->handle_event = dummy_handle_event;
 }
 
 static const TypeInfo netfilter_info = {
diff --git a/net/net.c b/net/net.c
index a77ea88fff..b4f6a2efb2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1331,6 +1331,34 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
 }
 }
 
+void colo_notify_filters_event(int event, Error **errp)
+{
+NetClientState *nc, *peer;
+NetClientDriver type;
+NetFilterState *nf;
+NetFilterClass *nfc = NULL;
+Error *local_err = NULL;
+
+QTAILQ_FOREACH(nc, &net_clients, next) {
+peer = nc->peer;
+type = nc->info->type;
+if (!peer || type != NET_CLIENT_DRIVER_TAP) {
+continue;
+}
+QTAILQ_FOREACH(nf, &nc->filters, next) {
+nfc =  NETFILTER_GET_CLASS(OBJECT(nf));
+if (!nfc->handle_event) {
+continue;
+}
+nfc->handle_event(nf, event, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+}
+
 void qmp_set_link(const char *name, bool up, Error **errp)
 {
 NetClientState *ncs[MAX_QUEUE_NUM];
-- 
2.17.1




[Qemu-devel] [PATCH V10 19/20] COLO: quick failover process by kick COLO thread

2018-07-22 Thread Zhang Chen
From: zhanghailiang 

The COLO thread may be sleeping in qemu_sem_wait(&s->colo_checkpoint_sem)
when failover begins; it is better to wake it up to speed up the
failover process.
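
The wakeup pattern itself is the usual one for a thread blocked on a
semaphore; a stand-alone sketch with plain POSIX primitives instead of
QEMU's wrappers (names invented):

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <unistd.h>

    static sem_t checkpoint_sem;
    /* volatile is enough for this toy; real code uses proper locking */
    static volatile int in_colo_state = 1;

    static void *colo_thread(void *arg)
    {
        for (;;) {
            sem_wait(&checkpoint_sem);   /* may sleep for a long time */
            if (!in_colo_state) {        /* re-check state after wakeup */
                break;                   /* failover: leave quickly */
            }
            puts("do checkpoint transaction");
        }
        puts("colo thread exits");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        sem_init(&checkpoint_sem, 0, 0);
        pthread_create(&t, NULL, colo_thread, NULL);
        sleep(1);
        in_colo_state = 0;           /* failover begins */
        sem_post(&checkpoint_sem);   /* kick the thread instead of waiting */
        pthread_join(&t, NULL);
        return 0;
    }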

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 8221a18add..e951159071 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -131,6 +131,11 @@ static void primary_vm_do_failover(void)
 
 migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
+/*
+ * kick COLO thread which might wait at
+ * qemu_sem_wait(&s->colo_checkpoint_sem).
+ */
+colo_checkpoint_notify(migrate_get_current());
 
 /*
  * Wake up COLO thread which may blocked in recv() or send(),
@@ -539,6 +544,9 @@ static void colo_process_checkpoint(MigrationState *s)
 
 qemu_sem_wait(&s->colo_checkpoint_sem);
 
+if (s->state != MIGRATION_STATUS_COLO) {
+goto out;
+}
 ret = colo_do_checkpoint_transaction(s, bioc, fb);
 if (ret < 0) {
 goto out;
-- 
2.17.1




[Qemu-devel] [PATCH V10 17/20] filter-rewriter: handle checkpoint and failover event

2018-07-22 Thread Zhang Chen
After one round of checkpointing, the states of the PVM and the SVM
become consistent, so it is unnecessary to adjust the sequence numbers
of network packets for old connections. Besides, when failover happens,
filter-rewriter enters failover mode, in which it need not handle new
TCP connections.

Signed-off-by: zhanghailiang 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c| 12 -
 net/colo.c|  8 ++
 net/colo.h|  2 ++
 net/filter-rewriter.c | 57 +++
 4 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index b8c0240725..462e822be6 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -116,6 +116,12 @@ enum {
 SECONDARY_IN,
 };
 
+static void colo_compare_inconsistency_notify(void)
+{
+notifier_list_notify(&colo_compare_notifiers,
+migrate_get_current());
+}
+
 static int compare_chr_send(CompareState *s,
 const uint8_t *buf,
 uint32_t size,
@@ -562,12 +568,6 @@ static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
 }
 }
 
-static void colo_compare_inconsistency_notify(void)
-{
-notifier_list_notify(&colo_compare_notifiers,
-migrate_get_current());
-}
-
 void colo_compare_register_notifier(Notifier *notify)
 {
 notifier_list_add(&colo_compare_notifiers, notify);
diff --git a/net/colo.c b/net/colo.c
index 97c8fc928f..49176bf07b 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -221,3 +221,11 @@ Connection *connection_get(GHashTable *connection_track_table,
 
 return conn;
 }
+
+bool connection_has_tracked(GHashTable *connection_track_table,
+ConnectionKey *key)
+{
+Connection *conn = g_hash_table_lookup(connection_track_table, key);
+
+return conn ? true : false;
+}
diff --git a/net/colo.h b/net/colo.h
index 0277e0e9ba..11c5226488 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -98,6 +98,8 @@ void connection_destroy(void *opaque);
 Connection *connection_get(GHashTable *connection_track_table,
ConnectionKey *key,
GQueue *conn_list);
+bool connection_has_tracked(GHashTable *connection_track_table,
+ConnectionKey *key);
 void connection_hashtable_reset(GHashTable *connection_track_table);
 Packet *packet_new(const void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index f18a71bf2e..c463a0c1d0 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -20,11 +20,15 @@
 #include "qemu/main-loop.h"
 #include "qemu/iov.h"
 #include "net/checksum.h"
+#include "net/colo.h"
+#include "migration/colo.h"
 
 #define FILTER_COLO_REWRITER(obj) \
 OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
 
 #define TYPE_FILTER_REWRITER "filter-rewriter"
+#define FAILOVER_MODE_ON  true
+#define FAILOVER_MODE_OFF false
 
 typedef struct RewriterState {
 NetFilterState parent_obj;
@@ -32,8 +36,14 @@ typedef struct RewriterState {
 /* hashtable to save connection */
 GHashTable *connection_track_table;
 bool vnet_hdr;
+bool failover_mode;
 } RewriterState;
 
+static void filter_rewriter_failover_mode(RewriterState *s)
+{
+s->failover_mode = FAILOVER_MODE_ON;
+}
+
 static void filter_rewriter_flush(NetFilterState *nf)
 {
 RewriterState *s = FILTER_COLO_REWRITER(nf);
@@ -269,6 +279,13 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  */
 reverse_connection_key(&key);
 }
+
+/* After failover we needn't change new TCP packet */
+if (s->failover_mode &&
+connection_has_tracked(s->connection_track_table, &key)) {
+goto out;
+}
+
 conn = connection_get(s->connection_track_table,
   &key,
   NULL);
@@ -302,11 +319,49 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 }
 }
 
+out:
 packet_destroy(pkt, NULL);
 pkt = NULL;
 return 0;
 }
 
+static void reset_seq_offset(gpointer key, gpointer value, gpointer user_data)
+{
+Connection *conn = (Connection *)value;
+
+conn->offset = 0;
+}
+
+static gboolean offset_is_nonzero(gpointer key,
+  gpointer value,
+  gpointer user_data)
+{
+Connection *conn = (Connection *)value;
+
+return conn->offset ? true : false;
+}
+
+static void colo_rewriter_handle_event(NetFilterState *nf, int event,
+   Error **errp)
+{
+RewriterState *rs = FILTER_COLO_REWRITER(nf);
+
+switch (event) {
+case COLO_EVENT_CHECKPOINT:
+g_hash_table_foreach(rs->connection_track_table,
+reset_seq_offset, NULL);
+break;
+case COLO_EVENT_FAILOVER:
+if (!g_hash_tab

[Qemu-devel] [PATCH V10 08/20] ram/COLO: Record the dirty pages that SVM received

2018-07-22 Thread Zhang Chen
We record the addresses of the dirty pages received; this helps with
flushing the pages cached for the SVM later.

The trick here is that we record dirty pages by reusing the migration
dirty bitmap. In a later patch we will start dirty logging for the SVM,
just as migration does; this way we can record the dirty pages caused by
both the PVM and the SVM, and we only flush those dirty pages from the
RAM cache when doing a checkpoint.
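
A stand-alone sketch of the recording step (a toy model of
test_and_set_bit(); names invented, not QEMU code):

    #include <stdint.h>
    #include <stdio.h>

    #define TARGET_PAGE_BITS 12
    #define NPAGES 64

    static uint64_t bmap[(NPAGES + 63) / 64];   /* 1 bit per guest page */
    static unsigned long migration_dirty_pages;

    /* Set bit nr and return its old value, like test_and_set_bit(). */
    static int test_and_set(uint64_t *map, unsigned long nr)
    {
        uint64_t mask = 1ULL << (nr % 64);
        int old = !!(map[nr / 64] & mask);

        map[nr / 64] |= mask;
        return old;
    }

    /* Called for every page received from the PVM. */
    static void record_received_page(uint64_t offset)
    {
        if (!test_and_set(bmap, offset >> TARGET_PAGE_BITS)) {
            migration_dirty_pages++;   /* first time this page is seen */
        }
    }

    int main(void)
    {
        record_received_page(0x1000);
        record_received_page(0x1000);   /* duplicate: not counted twice */
        record_received_page(0x3000);
        printf("dirty pages to flush: %lu\n", migration_dirty_pages);
        return 0;
    }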

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 33ebd09d70..d1060f1337 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3325,6 +3325,15 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block,
  __func__, block->idstr);
 return NULL;
 }
+
+/*
+* During a COLO checkpoint we need a bitmap of the migrated pages.
+* It helps us decide which pages in the RAM cache should be flushed
+* into the VM's RAM later.
+*/
+if (!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
+ram_state->migration_dirty_pages++;
+}
 return block->colo_cache + offset;
 }
 
@@ -3555,6 +3564,24 @@ int colo_init_ram_cache(void)
 memcpy(block->colo_cache, block->host, block->used_length);
 }
 rcu_read_unlock();
+/*
+* Record the dirty pages that were sent by the PVM; we use this dirty
+* bitmap to decide which pages in the cache should be flushed into the
+* SVM's RAM. Here we use the same name 'ram_bitmap' as migration does.
+*/
+if (ram_bytes_total()) {
+RAMBlock *block;
+
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
+
+block->bmap = bitmap_new(pages);
+bitmap_set(block->bmap, 0, pages);
+ }
+}
+ram_state = g_new0(RAMState, 1);
+ram_state->migration_dirty_pages = 0;
+
 return 0;
 
 out_locked:
@@ -3574,6 +3601,10 @@ void colo_release_ram_cache(void)
 {
 RAMBlock *block;
 
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+g_free(block->bmap);
+block->bmap = NULL;
+}
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 if (block->colo_cache) {
@@ -3582,6 +3613,8 @@ void colo_release_ram_cache(void)
 }
 }
 rcu_read_unlock();
+g_free(ram_state);
+ram_state = NULL;
 }
 
 /**
-- 
2.17.1




[Qemu-devel] [PATCH V10 20/20] docs: Add COLO status diagram to COLO-FT.txt

2018-07-22 Thread Zhang Chen
From: Zhang Chen 

This diagram helps users better understand COLO.
Suggested by Markus Armbruster.

Signed-off-by: Zhang Chen 
---
 docs/COLO-FT.txt | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index d7c7dcda8f..d5007895d1 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -110,6 +110,40 @@ Note:
 HeartBeat has not been implemented yet, so you need to trigger failover process
 by using 'x-colo-lost-heartbeat' command.
 
+== COLO operation status ==
+
++-----------------+
+|                 |
+|   Start COLO    |
+|                 |
++--------+--------+
+         |
+         |  Main qmp command:
+         |  migrate-set-capabilities with x-colo
+         |  migrate
+         |
+         v
++--------+--------+
+|                 |
+|  COLO running   |
+|                 |
++--------+--------+
+         |
+         |  Main qmp command:
+         |  x-colo-lost-heartbeat
+         |  or
+         |  some error happened
+         v
++--------+--------+
+|                 |  send qmp event:
+|  COLO failover  |  COLO_EXIT
+|                 |
++-----------------+
+
+COLO uses QMP commands to switch between states and to report its
+operation status. The diagram shows only the main QMP commands; the
+details can be found in the test procedure below.
+
 == Test procedure ==
 1. Startup qemu
 Primary:
-- 
2.17.1
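
For reference, the main commands in the diagram map to QMP roughly like
this (a sketch only; the migrate URI is a placeholder and the full
argument details are in the test procedure):

    -> { "execute": "migrate-set-capabilities",
         "arguments": { "capabilities": [
             { "capability": "x-colo", "state": true } ] } }
    -> { "execute": "migrate",
         "arguments": { "uri": "tcp:<secondary-host>:<port>" } }
    ... and, on failover ...
    -> { "execute": "x-colo-lost-heartbeat" }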




[Qemu-devel] [PATCH V10 12/20] qapi: Add new command to query colo status

2018-07-22 Thread Zhang Chen
Libvirt or other high-level software can use this command to query COLO status.
You can test this command like this:
{'execute':'query-colo-status'}

Signed-off-by: Zhang Chen 
---
 migration/colo.c| 21 +
 qapi/migration.json | 32 
 2 files changed, 53 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 8fdb79ac73..b355de3f01 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -29,6 +29,7 @@
 #include "net/colo.h"
 #include "block/block.h"
 #include "qapi/qapi-events-migration.h"
+#include "qapi/qmp/qerror.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -237,6 +238,26 @@ void qmp_xen_colo_do_checkpoint(Error **errp)
 #endif
 }
 
+COLOStatus *qmp_query_colo_status(Error **errp)
+{
+COLOStatus *s = g_new0(COLOStatus, 1);
+
+s->mode = get_colo_mode();
+
+switch (failover_get_state()) {
+case FAILOVER_STATUS_NONE:
+s->reason = COLO_EXIT_REASON_NONE;
+break;
+case FAILOVER_STATUS_REQUIRE:
+s->reason = COLO_EXIT_REASON_REQUEST;
+break;
+default:
+s->reason = COLO_EXIT_REASON_ERROR;
+}
+
+return s;
+}
+
 static void colo_send_message(QEMUFile *f, COLOMessage msg,
   Error **errp)
 {
diff --git a/qapi/migration.json b/qapi/migration.json
index cdae7ddccb..21b9946595 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1251,6 +1251,38 @@
 ##
 { 'command': 'xen-colo-do-checkpoint' }
 
+##
+# @COLOStatus:
+#
+# The result format for 'query-colo-status'.
+#
+# @mode: COLO running mode. If COLO is running, this field will return
+#'primary' or 'secondary'.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 3.0
+##
+{ 'struct': 'COLOStatus',
+  'data': { 'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
+# @query-colo-status:
+#
# Query COLO status while the VM is running.
+#
+# Returns: A @COLOStatus object showing the status.
+#
+# Example:
+#
+# -> { "execute": "query-colo-status" }
+# <- { "return": { "mode": "primary", "active": true, "reason": "request" } }
+#
+# Since: 3.0
+##
+{ 'command': 'query-colo-status',
+  'returns': 'COLOStatus' }
+
 ##
 # @migrate-recover:
 #
-- 
2.17.1




[Qemu-devel] [PATCH V10 14/20] COLO: flush host dirty ram from cache

2018-07-22 Thread Zhang Chen
From: zhanghailiang 

There is no need to flush all of the VM's RAM from the cache; only
the pages dirtied since the last checkpoint need to be flushed.
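
A toy model of the difference (one 64-page block, one bit per page; not
QEMU code):

    #include <stdint.h>
    #include <stdio.h>

    #define NPAGES 64

    static uint64_t bmap;   /* dirty bit per page, toy-sized */
    static char cache[NPAGES], ram[NPAGES];

    /* Copy only pages whose dirty bit is set, then clear the bit. */
    static void flush_ram_cache(void)
    {
        unsigned flushed = 0;

        for (unsigned i = 0; i < NPAGES; i++) {
            if (bmap & (1ULL << i)) {
                ram[i] = cache[i];
                bmap &= ~(1ULL << i);
                flushed++;
            }
        }
        printf("flushed %u of %u pages\n", flushed, NPAGES);
    }

    int main(void)
    {
        cache[5] = 'x';
        bmap |= 1ULL << 5;   /* only page 5 dirtied since last checkpoint */
        flush_ram_cache();   /* copies 1 page instead of all 64 */
        return 0;
    }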

Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 4df85d69c1..32bc141a2e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3581,6 +3581,7 @@ int colo_init_ram_cache(void)
 }
 ram_state = g_new0(RAMState, 1);
 ram_state->migration_dirty_pages = 0;
+memory_global_dirty_log_start();
 
 return 0;
 
@@ -3601,10 +3602,12 @@ void colo_release_ram_cache(void)
 {
 RAMBlock *block;
 
+memory_global_dirty_log_stop();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 g_free(block->bmap);
 block->bmap = NULL;
 }
+
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 if (block->colo_cache) {
@@ -3841,6 +3844,13 @@ static void colo_flush_ram_cache(void)
 void *src_host;
 unsigned long offset = 0;
 
+memory_global_dirty_log_sync();
+rcu_read_lock();
+RAMBLOCK_FOREACH_MIGRATABLE(block) {
+migration_bitmap_sync_range(ram_state, block, 0, block->used_length);
+}
+rcu_read_unlock();
+
 trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
 rcu_read_lock();
 block = QLIST_FIRST_RCU(&ram_list.blocks);
-- 
2.17.1




[Qemu-devel] [PATCH V10 18/20] COLO: notify net filters about checkpoint/failover event

2018-07-22 Thread Zhang Chen
From: zhanghailiang 

Notify all net filters about the checkpoint and failover event.

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 688d6f40b2..8221a18add 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -31,6 +31,7 @@
 #include "qapi/qapi-events-migration.h"
 #include "qapi/qmp/qerror.h"
 #include "sysemu/cpus.h"
+#include "net/filter.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -83,6 +84,12 @@ static void secondary_vm_do_failover(void)
 error_report_err(local_err);
 }
 
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_EVENT_FAILOVER, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
 /* recover runstate to normal migration finish state */
@@ -781,6 +788,14 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
+
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 vm_start();
 trace_colo_vm_state_change("stop", "run");
-- 
2.17.1




[Qemu-devel] ARM64 STR Instruction Crash Regression in TCG

2018-07-22 Thread Jason A. Donenfeld
Hello,

Gcc 7.3 compiles bash's array_flush's dual assignment using:

STP X20, X20, [X20,#0x10]

But gcc 8.1 compiles it as:

STR Q0, [X20,#0x10]

Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
results in a segfaulting process. I'm pretty sure this is a TCG bug.

In the attached tarball, please find kernel and run.sh. Calling
./run.sh will start the kernel with the bad bash executable that tries
to execute `config=({1..10})` and crashes. Also included in there
is the actual crashing bash binary, in case you'd like to disassemble
a little bit.

This is affecting builds on https://www.wireguard.com/build-status/ --
as you can see, at the moment aarch64 is failing.

Regards,
Jason

[ attachment: https://data.zx2c4.com/bash-qemu-arm64-crash.tar.xz ]



[Qemu-devel] [Bug 1769189] Re: Issue with qemu 2.12.0 + SATA

2018-07-22 Thread Peter Maloney
Could this affect virtio-scsi? I'm not so sure since it's not perfectly
reliable to reproduce, but v2.12.0 was hanging for me for a few minutes
at a time with virtio-scsi cache=writeback showing 100% disk utilization. I
never had issues booting up, and didn't try SATA. v2.11.1 was fine.

My first attempt to bisect didn't turn out right... had some false
positives I guess. The 2nd attempt (telling git the bads from first try)
got to 89e46eb477113550485bc24264d249de9fd1260a as latest good (which is
4 commits *after* the bisect by John Snow) and
7273db9d28d5e1b7a6c202a5054861c1f0bcc446 as bad.

But testing with this patch, it seems to work (or it's still a false
positive... after a bunch of usage). So I didn't finish the bisect.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1769189

Title:
  Issue with qemu 2.12.0 + SATA

Status in QEMU:
  Fix Committed

Bug description:
  [EDIT: I first thought that OVMF was the issue, but it turns out to be
  SATA]

  I had a Windows 10 VM running perfectly fine with a SATA drive. Since
  I upgraded to qemu 2.12, the guest hangs for a couple of minutes,
  works for a few seconds, and hangs again, etc. By "hang" I mean it
  doesn't freeze, but it looks like it's waiting on IO or something: I
  can move the mouse, but everything needing disk access is unresponsive.

  What doesn't work: qemu 2.12 with SATA
  What works: using VirtIO-SCSI with qemu 2.12, or downgrading qemu to
  2.11.1 and keeping SATA.

  Platform is arch linux 4.16.7 on skylake and Haswell, I have attached
  the vm xml file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1769189/+subscriptions



Re: [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG

2018-07-22 Thread Richard Henderson
On 07/22/2018 01:47 PM, Jason A. Donenfeld wrote:
> Hello,
> 
> Gcc 7.3 compiles bash's array_flush's dual assignment using:
> 
> STP X20, X20, [X20,#0x10]
> 
> But gcc 8.1 compiles it as:
> 
> STR Q0, [X20,#0x10]
> 
> Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
> results in a segfaulting process. I'm pretty sure this is a TCG bug.
> 
> In the attached tarball, please find kernel and run.sh. Calling
> ./run.sh will start the kernel with the bad bash executable that tries
> to execute `config=({1..10})` and crashes. Also included in there
> is the actual crashing bash binary, in case you'd like to disassemble
> a little bit.

Interesting.  The test passes on master with --enable-debug, but fails when
qemu is compiled with optimization...

I'll dig a bit deeper.


r~



[Qemu-devel] qemu-system-ppc -M mac99 and -soundhw es1370 doesn't start

2018-07-22 Thread Andrew Randrianasulu
Hello!

Currently I'm trying pre-releases of qemu, to avoid the situation where a
release is too buggy (2.12, for my taste: qemu-system-alpha was broken,
qemu-system-x86_64 -M q35 was broken).

using 
qemu-system-ppc --version
QEMU emulator version 2.12.91 (v3.0.0-rc1-17-g5b3ecd3d94-dirty)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

I can't start qemu-system-ppc with -M mac99 and -soundhw es1370:

qemu-system-ppc -monitor stdio -soundhw es1370 -vga std -M mac99
QEMU 2.12.91 monitor - type 'help' for more information
(qemu) qemu-system-ppc: PCI bus not available for es1370

qemu-system-ppc64 starts fine:
qemu-system-ppc64 -monitor stdio -soundhw es1370 -vga std -M mac99
QEMU 2.12.91 monitor - type 'help' for more information
(qemu) info pci
  Bus  0, device  11, function 0:
Host bridge: PCI device 106b:004b
  id ""
  Bus  0, device  12, function 0:
Class 65280: PCI device 106b:0022
  BAR0: 32 bit memory at 0x8000 [0x8007].
  id ""
  Bus  0, device  13, function 0:
USB controller: PCI device 106b:003f
  IRQ 28.
  BAR0: 32 bit memory at 0x8008 [0x800800ff].
  id ""
  Bus  0, device  14, function 0:
VGA controller: PCI device 1234:
  BAR0: 32 bit prefetchable memory at 0x8100 [0x81ff].
  BAR2: 32 bit memory at 0x8200 [0x82000fff].
  BAR6: 32 bit memory at 0x8201 [0x8201].
  id ""
  Bus  0, device  15, function 0:
Ethernet controller: PCI device 10ec:8029
  IRQ 30.
  BAR0: I/O at 0x1000 [0x10ff].
  BAR6: 32 bit memory at 0x8204 [0x8207].
  id ""
  Bus  0, device  16, function 0:
Audio controller: PCI device 1274:5000
  IRQ 27.
  BAR0: I/O at 0x1100 [0x11ff].
  id ""
(qemu) 

Note, I'm using a 32-bit qemu build, configured like this:
setarch i486 ./configure --prefix=/usr --disable-gtk --enable-virglrenderer \
  --enable-sdl --with-sdlabi=2.0 --audio-drv-list=alsa,oss \
  --host-cc=/opt/gcc49/bin/gcc --enable-opengl \
  --extra-cflags="-I/usr/X11R7/include -O3 -march=i686 -mtune=native -m32 \
  -Wno-maybe-uninitialized -Wno-nested-externs \
  -Wno-implicit-function-declaration" --disable-werror

For qemu-system-ppc -M mac99, 'info pci' shows:

qemu-system-ppc -monitor stdio -vga std -M mac99
QEMU 2.12.91 monitor - type 'help' for more information
(qemu) info pci
  Bus  0, device  11, function 0:
Host bridge: PCI device 106b:001f
  id ""
  Bus  0, device  12, function 0:
Class 65280: PCI device 106b:0022
  BAR0: 32 bit memory at 0x8000 [0x8007].
  id ""
  Bus  0, device  13, function 0:
USB controller: PCI device 106b:003f
  IRQ 28.
  BAR0: 32 bit memory at 0x8008 [0x800800ff].
  id ""
  Bus  0, device  14, function 0:
VGA controller: PCI device 1234:
  BAR0: 32 bit prefetchable memory at 0x8100 [0x81ff].
  BAR2: 32 bit memory at 0x8200 [0x82000fff].
  BAR6: 32 bit memory at 0x8201 [0x8201].
  id ""
  Bus  0, device  15, function 0:
Ethernet controller: PCI device 10ec:8029
  IRQ 30.
  BAR0: I/O at 0x1000 [0x10ff].
  BAR6: 32 bit memory at 0x8204 [0x8207].
  id ""
  Bus  0, device  14, function 0:
Host bridge: PCI device 106b:001e
  id ""
  Bus  0, device  11, function 0:
Host bridge: PCI device 106b:0020
  id ""
(qemu) 

The last two devices seem strange.

Also, the mouse seems to be confused. Try finnix-ppc-110.iso (it enables gpm
on startup; start it like "qemu-system-ppc -monitor stdio -M mac99 -cdrom
~/finnix-ppc-110.iso" and try to move the mouse inside the qemu window; it
feels like the keyboard gets some mouse clicks too, or mouse movement
generates mouse button clicks as well), or try lubuntu-16.04-desktop-powerpc.iso
from http://cdimage.ubuntu.com/lubuntu/releases/16.04/release/ (but you must
wait nearly 10 minutes until it finishes booting).



[Qemu-devel] qemu-system_x86-64 (32-bit binary) -M q35 can be crashed on viewing youtube.

2018-07-22 Thread Andrew Randrianasulu
It kept crashing, so I tried to debug it a bit ...

valgrind --leak-check=yes /dev/shm/qemu/x86_64-softmmu/qemu-system-x86_64 \
  -display sdl,gl=on -M q35 -soundhw hda \
  -cdrom /home/guest/Downloads/ISO/slax-English-US-7.0.8-x86_64.iso \
  -m 1G -enable-kvm -d trace:e1000e*

shows some errors at the very end; see the attached file.

To reproduce: wait for the live CD to finish loading, start Firefox, go to
youtube.com (it will warn you about an outdated browser, but continue anyway),
and try to click on any video.

It seems even the modern KDE Neon distribution is affected by the same bug :/
--quote-

29362@1532305439.464672:e1000e_core_write Write to register 0xd0, 4 byte(s), 
value: 0x10
29362@1532305439.464735:e1000e_irq_set_ims Setting IMS bits 0x10: 
0x154 --> 0x154
29362@1532305439.464791:e1000e_irq_msix_pending_clearing Clearing MSI-X pending 
bit for cause 0x10, IVAR config 0x800a0908, vector 0
29362@1532305439.464867:e1000e_irq_fix_icr_asserted ICR_ASSERTED bit fixed: 
0x8002
29362@1532305439.464916:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
0x8002, IMS: 0x154)
29362@1532305439.524010:e1000e_core_write Write to register 0x3818, 4 byte(s), 
value: 0x97
29362@1532305439.524097:e1000e_tx_descr 0x21180e : 27000b68 5b43600
29362@1532305439.524169:e1000e_tx_descr 0x3bf970fa : 261005c6 300
29362@1532305439.524248:e1000e_tx_descr 0x1c1c8000 : af1005d8 300
==29362== Invalid write of size 4
==29362==at 0x552E58: memcpy (string3.h:53)
==29362==by 0x552E58: m_cat (mbuf.c:143)
==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
==29362==by 0x54FE1E: ip_input (ip_input.c:190)
==29362==by 0x552478: slirp_input (slirp.c:874)
==29362==by 0x53FBE7: net_slirp_receive (slirp.c:121)
==29362==by 0x537478: nc_sendv_compat (net.c:701)
==29362==by 0x537478: qemu_deliver_packet_iov (net.c:728)
==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
==29362==by 0x538925: qemu_sendv_packet (net.c:772)
==29362==by 0x53AB71: net_hub_receive_iov (hub.c:73)
==29362==by 0x53AB71: net_hub_port_receive_iov (hub.c:124)
==29362==by 0x53748D: qemu_deliver_packet_iov (net.c:726)
==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
==29362==  Address 0x114f9b60 is 0 bytes after a block of size 2,976 alloc'd
==29362==at 0x482B29C: malloc (vg_replace_malloc.c:299)
==29362==by 0x534D3E1: g_malloc (in /usr/lib/libglib-2.0.so.0.4600.2)
==29362==by 0x552B53: m_inc.part.1 (mbuf.c:166)
==29362==by 0x552EAE: m_inc (string3.h:53)
==29362==by 0x552EAE: m_cat (mbuf.c:141)
==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
==29362==by 0x54FE1E: ip_input (ip_input.c:190)
==29362==by 0x552478: slirp_input (slirp.c:874)
==29362==by 0x53FBE7: net_slirp_receive (slirp.c:121)
==29362==by 0x537478: nc_sendv_compat (net.c:701)
==29362==by 0x537478: qemu_deliver_packet_iov (net.c:728)
==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
==29362==by 0x538925: qemu_sendv_packet (net.c:772)
==29362==by 0x53AB71: net_hub_receive_iov (hub.c:73)
==29362==by 0x53AB71: net_hub_port_receive_iov (hub.c:124)
==29362==
==29362== Invalid write of size 4
==29362==at 0x552E5E: memcpy (string3.h:53)
==29362==by 0x552E5E: m_cat (mbuf.c:143)
==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
==29362==by 0x54FE1E: ip_input (ip_input.c:190)
==29362==by 0x552478: slirp_input (slirp.c:874)
==29362==by 0x53FBE7: net_slirp_receive (slirp.c:121)
==29362==by 0x537478: nc_sendv_compat (net.c:701)
==29362==by 0x537478: qemu_deliver_packet_iov (net.c:728)
==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
==29362==by 0x538925: qemu_sendv_packet (net.c:772)
==29362==by 0x53AB71: net_hub_receive_iov (hub.c:73)
==29362==by 0x53AB71: net_hub_port_receive_iov (hub.c:124)
==29362==by 0x53748D: qemu_deliver_packet_iov (net.c:726)
==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
==29362==  Address 0x114f9b78 is 24 bytes before an unallocated block of size 
156,632 in arena "client"
==29362==
==29362== Invalid write of size 4
==29362==at 0x552E6B: memcpy (string3.h:53)
==29362==by 0x552E6B: m_cat (mbuf.c:143)
==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
==29362==by 0x54FE1E: ip_input (ip_input.c:190)

[Qemu-devel] [PATCH for-3.0] tcg/i386: Mark xmm registers call-clobbered

2018-07-22 Thread Richard Henderson
When host vector registers and operations were introduced, I failed
to mark the registers call-clobbered as required by the ABI.

Fixes: 770c2fc7bb7
Cc: qemu-sta...@nongnu.org
Reported-by: Jason A. Donenfeld 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index e87b0d445e..a91e4f1313 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3532,7 +3532,7 @@ static void tcg_target_init(TCGContext *s)
 tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
 }
 
-tcg_target_call_clobber_regs = 0;
+tcg_target_call_clobber_regs = ALL_VECTOR_REGS;
 tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_EAX);
 tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_EDX);
 tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_ECX);
-- 
2.17.1




Re: [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG

2018-07-22 Thread Richard Henderson
On 07/22/2018 02:31 PM, Richard Henderson wrote:
> On 07/22/2018 01:47 PM, Jason A. Donenfeld wrote:
>> Hello,
>>
>> Gcc 7.3 compiles bash's array_flush's dual assignment using:
>>
>> STP X20, X20, [X20,#0x10]
>>
>> But gcc 8.1 compiles it as:
>>
>> STR Q0, [X20,#0x10]
>>
>> Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
>> results in a segfaulting process. I'm pretty sure this is a TCG bug.
>>
>> In the attached tarball, please find kernel and run.sh. Calling
>> ./run.sh will start the kernel with the bad bash executable that tries
>> to execute `config=({1..10})` and crashes. Also included in there
>> is the actual crashing bash binary, in case you'd like to disassemble
>> a little bit.
> 
> Interesting.  The test passes on master with --enable-debug, but fails when
> qemu is compiled with optimization...
> 
> I'll dig a bit deeper.

The failing sequence is

0x0045ba44:  4e080e80  dup  v0.2d, x20
0x0045ba48:  9340  adrp x0, #0x4c3000
0x0045ba4c:  91098003  add  x3, x0, #0x260
0x0045ba50:  9281  movn x1, #0
0x0045ba54:  f9413002  ldr  x2, [x0, #0x260]
0x0045ba58:  3d800680  str  q0, [x20, #0x10]
...

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xffdc  dead: 1
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0  dead: 0 1

  0045ba44  
 dup_vec v128,e64,tmp2,x20
 st_vec v128,e8,tmp2,env,$0x8c0   dead: 0

...

  0045ba58  
 movi_i64 tmp4,$0x10
 add_i64 tmp3,x20,tmp4dead: 1 2
 ld_i64 tmp4,env,$0x8c0
 movi_i64 tmp6,$0x8
 add_i64 tmp5,tmp3,tmp6   dead: 2
 qemu_st_i64 tmp4,tmp3,leq,0  dead: 0 1
 ld_i64 tmp4,env,$0x8c8   dead: 1
 qemu_st_i64 tmp4,tmp5,leq,0  dead: 0 1
...

0x7fffcd2e678c:  vmovq0xe0(%r14), %xmm0
0x7fffcd2e6795:  vpbroadcastq %xmm0, %xmm1
0x7fffcd2e679a:  vmovdqu  %xmm1, 0x8c0(%r14)
...
0x7fffcd2c0e78:  vmovq%xmm0, %r12
0x7fffcd2c0e7d:  addq $0x10, %r12


The guest x20 is loaded into xmm0 for the dup at 0x45ba44, and is reused for
the store at 0x45ba58.  However, if the load at 0x45ba54 misses the TLB, then
we will have a function call, which can clobber xmm0.

With -O0, it just so happens that the function call does not clobber xmm0; with
optimization enabled, the compiler's different code generation does clobber 
xmm0.

Fix by properly considering the xmm registers to be call-clobbered, at which
point the saved value is evicted from xmm0 naturally.  Patch posted separately.


r~



Re: [Qemu-devel] [PATCH] block: Don't lock /dev/null and /dev/zero automatically

2018-07-22 Thread Fam Zheng
On Sun, Jul 22, 2018 at 10:06 PM Max Reitz  wrote:
>
> On 2018-07-22 04:37, Fam Zheng wrote:
> > On Sun, Jul 22, 2018 at 5:08 AM Max Reitz  wrote:
> >>
> >> On 2018-07-19 05:41, Fam Zheng wrote:
> >>> On my Fedora 28, /dev/null is locked by some other process (couldn't
> >>> inspect it due to the current lslocks limitation), so iotests 226 fails
> >>> with some unexpected image locking errors because it uses qemu-io to
> >>> open it.
> >>>
> >>> Actually it's safe to not use any lock on /dev/null or /dev/zero.
> >>>
> >>> Signed-off-by: Fam Zheng 
> >>> ---
> >>>  block/file-posix.c | 7 ++-
> >>>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/block/file-posix.c b/block/file-posix.c
> >>> index 60af4b3d51..8bf034108a 100644
> >>> --- a/block/file-posix.c
> >>> +++ b/block/file-posix.c
> >>> @@ -503,7 +503,12 @@ static int raw_open_common(BlockDriverState *bs, 
> >>> QDict *options,
> >>>  s->use_lock = false;
> >>>  break;
> >>>  case ON_OFF_AUTO_AUTO:
> >>> -s->use_lock = qemu_has_ofd_lock();
> >>> +if (!strcmp(filename, "/dev/null") ||
> >>> +   !strcmp(filename, "/dev/zero")) {
> >>
> >> I’m not sure I like a strcmp() based on filename (though it should work
> >> for all of the cases where someone would want to specify either of those
> >> for a qemu block device).  Isn’t there some specific major/minor number
> >> we can use to check for these two files?  Or are those Linux-specific?
> >
> > Yeah, I guess major/minor numbers are Linux-specific.
> >
> >>
> >> I could also imagine just not locking any host_device, but since people
> >> do use qcow2 immediately on block devices, maybe we do want to lock them.
> >
> > That's right.
> >
> >>
> >> Finally, if really all you are trying to do is to make test code easier,
> >> then I don’t know exactly why.  Just disabling locking in 226 shouldn’t
> >> be too hard.
> >
> > The tricky thing is in remembering to do that for future test cases.
> > My machine seems to be somehow different than John's so that my 226
> > cannot lock /dev/null, but I'm not sure that is the case for other
> > releases, distros or system instances.
>
> Usually we don’t need to use /dev/null, though, because null-co:// is
> better suited for most purposes.  This is a very specific test of host
> devices.
>
> Maybe we should start a doc file with common good practices about
> writing iotests?

Yes, mentioning the use of pseudo devices in docs/devel/testing.rst would
probably be a good idea. So is my understanding right that you prefer
fixing the test case and discarding this patch? If so I'll send another
version together with the doc update.

Fam

>
> Max
>



Re: [Qemu-devel] qemu-system_x86-64 (32-bit binary) -M q35 can be crashed on viewing youtube.

2018-07-22 Thread Fam Zheng
On Mon, 07/23 03:42, Andrew Randrianasulu wrote:
> It was crashing and crashing, so I tried to debug it a bit ... 
> 
> 
> valgrind --leak-check=yes /dev/shm/qemu/x86_64-softmmu/qemu-system-x86_64  
> -display 
> sdl,gl=on  -M q35 -soundhw 
> hda -cdrom /home/guest/Downloads/ISO/slax-English-US-7.0.8-x86_64.iso -m 
> 1G -enable-kvm -d trace:e1000e*   shows some errors at very end, see attached 
> file.
> 
> For reproduction, wait for liveCD to finish loading, start firefox, go to 
> youtube.com, it will warn you about outdated browser but continue anyway, try 
> to click on any video 
> 
> It seems even modern KDE Neon distribution affected by same bug :/

Cc'ing Jason and Samuel/Jan who maintain net and slirp respectively.

> --quote-
> 
> 29362@1532305439.464672:e1000e_core_write Write to register 0xd0, 4 byte(s), 
> value: 0x10
> 29362@1532305439.464735:e1000e_irq_set_ims Setting IMS bits 0x10: 
> 0x154 --> 0x154
> 29362@1532305439.464791:e1000e_irq_msix_pending_clearing Clearing MSI-X 
> pending 
> bit for cause 0x10, IVAR config 0x800a0908, vector 0
> 29362@1532305439.464867:e1000e_irq_fix_icr_asserted ICR_ASSERTED bit fixed: 
> 0x8002
> 29362@1532305439.464916:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x8002, IMS: 0x154)
> 29362@1532305439.524010:e1000e_core_write Write to register 0x3818, 4 
> byte(s), 
> value: 0x97
> 29362@1532305439.524097:e1000e_tx_descr 0x21180e : 27000b68 5b43600
> 29362@1532305439.524169:e1000e_tx_descr 0x3bf970fa : 261005c6 300
> 29362@1532305439.524248:e1000e_tx_descr 0x1c1c8000 : af1005d8 300
> ==29362== Invalid write of size 4
> ==29362==at 0x552E58: memcpy (string3.h:53)
> ==29362==by 0x552E58: m_cat (mbuf.c:143)
> ==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
> ==29362==by 0x54FE1E: ip_input (ip_input.c:190)
> ==29362==by 0x552478: slirp_input (slirp.c:874)
> ==29362==by 0x53FBE7: net_slirp_receive (slirp.c:121)
> ==29362==by 0x537478: nc_sendv_compat (net.c:701)
> ==29362==by 0x537478: qemu_deliver_packet_iov (net.c:728)
> ==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
> ==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
> ==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
> ==29362==by 0x538925: qemu_sendv_packet (net.c:772)
> ==29362==by 0x53AB71: net_hub_receive_iov (hub.c:73)
> ==29362==by 0x53AB71: net_hub_port_receive_iov (hub.c:124)
> ==29362==by 0x53748D: qemu_deliver_packet_iov (net.c:726)
> ==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
> ==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
> ==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
> ==29362==  Address 0x114f9b60 is 0 bytes after a block of size 2,976 alloc'd
> ==29362==at 0x482B29C: malloc (vg_replace_malloc.c:299)
> ==29362==by 0x534D3E1: g_malloc (in /usr/lib/libglib-2.0.so.0.4600.2)
> ==29362==by 0x552B53: m_inc.part.1 (mbuf.c:166)
> ==29362==by 0x552EAE: m_inc (string3.h:53)
> ==29362==by 0x552EAE: m_cat (mbuf.c:141)
> ==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
> ==29362==by 0x54FE1E: ip_input (ip_input.c:190)
> ==29362==by 0x552478: slirp_input (slirp.c:874)
> ==29362==by 0x53FBE7: net_slirp_receive (slirp.c:121)
> ==29362==by 0x537478: nc_sendv_compat (net.c:701)
> ==29362==by 0x537478: qemu_deliver_packet_iov (net.c:728)
> ==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
> ==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
> ==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
> ==29362==by 0x538925: qemu_sendv_packet (net.c:772)
> ==29362==by 0x53AB71: net_hub_receive_iov (hub.c:73)
> ==29362==by 0x53AB71: net_hub_port_receive_iov (hub.c:124)
> ==29362==
> ==29362== Invalid write of size 4
> ==29362==at 0x552E5E: memcpy (string3.h:53)
> ==29362==by 0x552E5E: m_cat (mbuf.c:143)
> ==29362==by 0x54FE1E: ip_reass (ip_input.c:341)
> ==29362==by 0x54FE1E: ip_input (ip_input.c:190)
> ==29362==by 0x552478: slirp_input (slirp.c:874)
> ==29362==by 0x53FBE7: net_slirp_receive (slirp.c:121)
> ==29362==by 0x537478: nc_sendv_compat (net.c:701)
> ==29362==by 0x537478: qemu_deliver_packet_iov (net.c:728)
> ==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
> ==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
> ==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
> ==29362==by 0x538925: qemu_sendv_packet (net.c:772)
> ==29362==by 0x53AB71: net_hub_receive_iov (hub.c:73)
> ==29362==by 0x53AB71: net_hub_port_receive_iov (hub.c:124)
> ==29362==by 0x53748D: qemu_deliver_packet_iov (net.c:726)
> ==29362==by 0x53A649: qemu_net_queue_deliver_iov (queue.c:179)
> ==29362==by 0x53A649: qemu_net_queue_send_iov (queue.c:224)
> ==29362==by 0x538901: qemu_sendv_packet_async (net.c:764)
> ==29362==  Address 0x114f9b78 is 24 byt

Re: [Qemu-devel] [PATCH qemu v2] vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pages

2018-07-22 Thread David Gibson
On Wed, Jun 20, 2018 at 07:10:12PM +1000, Alexey Kardashevskiy wrote:
> At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU
> pages and the POWER8 CPU supports the exact same set of page sizes,
> so so far things have worked fine.
> 
> However POWER9 supports a different set of sizes - 4K/64K/2M/1G - and
> the last two - 2M and 1G - are not even allowed in the paravirt interface
> (RTAS DDW) so we always end up using 64K IOMMU pages, although we could
> back guest's 16MB IOMMU pages with 2MB pages on the host.
> 
> This stores the supported host IOMMU page sizes in VFIOContainer and uses
> this later when creating a new DMA window. This uses the system page size
> (64k normally, 2M/16M/1G if hugepages used) as the upper limit of
> the IOMMU pagesize.
> 
> This changes the type of @pagesize to uint64_t as this is what
> memory_region_iommu_get_min_page_size() returns and clz64() takes.
> 
> There should be no behavioral changes on platforms other than pseries.
> The guest will keep using the IOMMU page size selected by the PHB pagesize
> property as this only changes the underlying hardware TCE table
> granularity.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> Changes:
> v2:
> * fixed biggest but smaller page size calculation
> * limit IOMMU pagesize to the system pagesize

Sorry, this fell of my radar for a bit.

I've now applied it to ppc-for-3.1.

> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  hw/vfio/common.c  |  3 +++
>  hw/vfio/spapr.c   | 21 -
>  3 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index a903692..c20524d 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -73,6 +73,7 @@ typedef struct VFIOContainer {
>  unsigned iommu_type;
>  int error;
>  bool initialized;
> +unsigned long pgsizes;
>  /*
>   * This assumes the host IOMMU can support only a single
>   * contiguous IOVA window.  We may need to generalize that in
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index fb396cf..40f0356 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1108,6 +1108,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>  info.iova_pgsizes = 4096;
>  }
>  vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
> +container->pgsizes = info.iova_pgsizes;
>  } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
> ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
>  struct vfio_iommu_spapr_tce_info info;
> @@ -1172,6 +1173,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>  }
>  
>  if (v2) {
> +container->pgsizes = info.ddw.pgsizes;
>  /*
>   * There is a default window in just created container.
>   * To make region_add/del simpler, we better remove this
> @@ -1186,6 +1188,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>  }
>  } else {
>  /* The default table uses 4K pages */
> +container->pgsizes = 0x1000;
>  vfio_host_win_add(container, info.dma32_window_start,
>info.dma32_window_start +
>info.dma32_window_size - 1,
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 259397c..becf71a 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -15,6 +15,7 @@
>  
>  #include "hw/vfio/vfio-common.h"
>  #include "hw/hw.h"
> +#include "exec/ram_addr.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -144,9 +145,27 @@ int vfio_spapr_create_window(VFIOContainer *container,
>  {
>  int ret;
>  IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> -unsigned pagesize = memory_region_iommu_get_min_page_size(iommu_mr);
> +uint64_t pagesize = memory_region_iommu_get_min_page_size(iommu_mr);
>  unsigned entries, pages;
>  struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
> +long systempagesize = qemu_getrampagesize();
> +
> +/*
> + * The host might not support the guest supported IOMMU page size,
> + * so we will use smaller physical IOMMU pages to back them.
> + */
> +if (pagesize > systempagesize) {
> +pagesize = systempagesize;
> +}
> +pagesize = 1ULL << (63 - clz64(container->pgsizes &
> +   (pagesize | (pagesize - 1;
> +if (!pagesize) {
> +error_report("Host doesn't support page size 0x%"PRIx64
> + ", the supported mask is 0x%lx",
> + memory_region_iommu_get_min_page_size(iommu_mr),
> + container->pgsizes);
> +return -EINVAL;
> +}
>  
>  /*
>   * FIXME: For VFIO iommu types which have KVM acceleration to


Re: [Qemu-devel] [PATCH v2 1/8] migration: do not wait for free thread

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:13PM +0800, guangrong.x...@gmail.com wrote:
> @@ -3113,6 +3132,8 @@ static Property migration_properties[] = {
>  DEFINE_PROP_UINT8("x-compress-threads", MigrationState,
>parameters.compress_threads,
>DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT),
> +DEFINE_PROP_BOOL("x-compress-wait-thread", MigrationState,
> +  parameters.compress_wait_thread, false),

This performance feature bit makes sense to me, but I would still
think it should be true by default to keep the old behavior:

- it might change the behavior drastically: we might be in a state
  between "normal" migration and "compressed" migration since we'll
  contain both of the pages.  Old compression users might not expect
  that.

- it might still even perform worse - an extreme case is that when
  network bandwidth is very very limited but instead we have plenty of
  CPU resources. [1]

So for me it's really a tunable that people need to understand
before turning it on.
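
For context, the knob would be flipped at runtime with something like
the following (a sketch only; this assumes the QMP parameter name
matches the "x-compress-wait-thread" property added below):

    -> { "execute": "migrate-set-parameters",
         "arguments": { "x-compress-wait-thread": true } }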

>  DEFINE_PROP_UINT8("x-decompress-threads", MigrationState,
>parameters.decompress_threads,
>DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT),
> diff --git a/migration/migration.h b/migration/migration.h
> index 64a7b33735..a46b9e6c8d 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -271,6 +271,7 @@ bool migrate_use_return_path(void);
>  bool migrate_use_compression(void);
>  int migrate_compress_level(void);
>  int migrate_compress_threads(void);
> +int migrate_compress_wait_thread(void);
>  int migrate_decompress_threads(void);
>  bool migrate_use_events(void);
>  bool migrate_postcopy_blocktime(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 52dd678092..0ad234c692 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1889,30 +1889,34 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
> ram_addr_t offset)
>  {
>  int idx, thread_count, bytes_xmit = -1, pages = -1;
> +bool wait = migrate_compress_wait_thread();
>  
>  thread_count = migrate_compress_threads();
>  qemu_mutex_lock(&comp_done_lock);
> -while (true) {
> -for (idx = 0; idx < thread_count; idx++) {
> -if (comp_param[idx].done) {
> -comp_param[idx].done = false;
> -bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> -qemu_mutex_lock(&comp_param[idx].mutex);
> -set_compress_params(&comp_param[idx], block, offset);
> -qemu_cond_signal(&comp_param[idx].cond);
> -qemu_mutex_unlock(&comp_param[idx].mutex);
> -pages = 1;
> -ram_counters.normal++;
> -ram_counters.transferred += bytes_xmit;
> -break;
> -}
> -}
> -if (pages > 0) {
> +retry:
> +for (idx = 0; idx < thread_count; idx++) {
> +if (comp_param[idx].done) {
> +comp_param[idx].done = false;
> +bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> +qemu_mutex_lock(&comp_param[idx].mutex);
> +set_compress_params(&comp_param[idx], block, offset);
> +qemu_cond_signal(&comp_param[idx].cond);
> +qemu_mutex_unlock(&comp_param[idx].mutex);
> +pages = 1;
> +ram_counters.normal++;
> +ram_counters.transferred += bytes_xmit;
>  break;
> -} else {
> -qemu_cond_wait(&comp_done_cond, &comp_done_lock);
>  }
>  }
> +
> +/*
> + * if there is no thread is free to compress the data and the user
> + * really expects the slowdown, wait it.

Considering [1] above, IMHO it might not really be a slowdown; it
depends.  Maybe only mention the fact that we're sending a normal page
instead of a compressed page if wait is not specified.

> + */
> +if (pages < 0 && wait) {
> +qemu_cond_wait(&comp_done_cond, &comp_done_lock);
> +goto retry;
> +}
>  qemu_mutex_unlock(&comp_done_lock);
>  
>  return pages;
> @@ -2226,7 +2230,10 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>   * CPU resource.
>   */
>  if (block == rs->last_sent_block && save_page_use_compression(rs)) {
> -return compress_page_with_multi_thread(rs, block, offset);
> +res = compress_page_with_multi_thread(rs, block, offset);
> +if (res > 0) {
> +return res;
> +}
>  } else if (migrate_use_multifd()) {
>  return ram_save_multifd_page(rs, block, offset);
>  }
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 186e8a7303..b4f394844b 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -462,6 +462,11 @@
>  # @compress-threads: Set compression thread count to be used i

Re: [Qemu-devel] [PATCH v2 2/8] migration: fix counting normal page for compression

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:14PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
> 
> The compressed page is not normal page
> 
> Signed-off-by: Xiao Guangrong 

I think it'll depend on how we define a "normal" page.  AFAIU it's the
count of raw (uncompressed) pages, in which case I think this is correct.

Reviewed-by: Peter Xu 

> ---
>  migration/ram.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 0ad234c692..1b016e048d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1903,7 +1903,6 @@ retry:
>  qemu_cond_signal(&comp_param[idx].cond);
>  qemu_mutex_unlock(&comp_param[idx].mutex);
>  pages = 1;
> -ram_counters.normal++;
>  ram_counters.transferred += bytes_xmit;
>  break;
>  }
> -- 
> 2.14.4
> 

Regards,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-22 Thread David Gibson
On Wed, Jul 18, 2018 at 06:03:29PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-07-18 at 16:12 +1000, David Gibson wrote:
> > On Thu, Jun 28, 2018 at 10:36:32AM +0200, Cédric Le Goater wrote:
> > > From: Benjamin Herrenschmidt 
> 
> Can you trim your replies ? It's really hard to find your comments in
> such a long patch...
> 
> > > diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> > > index 6ac8a9392da6..966a996c2eac 100644
> > > --- a/include/hw/ppc/xics.h
> > > +++ b/include/hw/ppc/xics.h
> > > @@ -194,6 +194,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
> > >  uint32_t icp_accept(ICPState *ss);
> > >  uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
> > >  void icp_eoi(ICPState *icp, uint32_t xirr);
> > > +void icp_irq(ICSState *ics, int server, int nr, uint8_t priority);
> > 
> > Hrm... are you sure you need to expose this?

[snip]
> > > +limit = pci_config_size(pdev);
> > > +if (limit <= cfg_addr) {
> > > +/* conventional pci device can be behind pcie-to-pci bridge.
> > > +   256 <= addr < 4K has no effects. */
> > > +return ~0ull;
> > > +}
> > > +val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
> > > +switch (size) {
> > > +case 1:
> > > +return val;
> > > +case 2:
> > > +return bswap16(val);
> > > +case 4:
> > > +return bswap32(val);
> > > +default:
> > > +g_assert_not_reached();
> > > +}
> > > +}
> > > +
> > > +static void pnv_phb3_check_m32(PnvPHB3 *phb)
> > > +{
> > > +uint64_t base, start, size;
> > > +MemoryRegion *parent;
> > > +PnvPBCQState *pbcq = &phb->pbcq;
> > > +
> > > +if (phb->m32_mapped) {
> > > +memory_region_del_subregion(phb->mr_m32.container, &phb->mr_m32);
> > > +phb->m32_mapped = false;
> > 
> > Could you use memory_region_set_enabled() rather than having to add
> > and delete the subregion and keep track of it in this separate
> > variable?
> 
> There was a reason for that but it's years old and I forgot... I think
> when re-enabled it moves around, among other things. So it's not more
> efficient.

Hm, ok.  It'd be good to know what this is.

> > > +}
> > > +
> > > +/* Disabled ? move on with life ... */
> > 
> > Trivial: "nothing to do" seems to be the standard way to express this.
> > Even trivialler: usual English rules don't put a space before '?' or
> > '!' punctuation.
> 
> No, that's the tasteless English rule :-) It shall be overridden by
> anybody interested in making things actually readable :-)
> 
> > > +
> > > +static void pnv_phb3_lxivt_write(PnvPHB3 *phb, unsigned idx, uint64_t 
> > > val)
> > > +{
> > > +ICSState *ics = &phb->lsis;
> > > +uint8_t server, prio;
> > > +
> > > +phb->ioda_LXIVT[idx] = val & (IODA2_LXIVT_SERVER |
> > > +  IODA2_LXIVT_PRIORITY |
> > > +  IODA2_LXIVT_NODE_ID);
> > > +server = GETFIELD(IODA2_LXIVT_SERVER, val);
> > > +prio = GETFIELD(IODA2_LXIVT_PRIORITY, val);
> > > +
> > > +/*
> > > + * The low order 2 bits are the link pointer (Type II interrupts).
> > 
> > I don't think we've currently implemented link pointers in our xics
> > code.  Do we need to do that?
> 
> Not until somebody uses them (other than pHyp).
> 
> > > + * Shift back to get a valid IRQ server.
> > > + */
> > > +server >>= 2;
> > > +
> > > +ics_simple_write_xive(ics, idx, server, prio, prio);
> > > +}
> > > +
> > > +static uint64_t *pnv_phb3_ioda_access(PnvPHB3 *phb,
> > > +  unsigned *out_table, unsigned 
> > > *out_idx)
> > > +{
> > > +uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
> > > +unsigned int index = GETFIELD(PHB_IODA_AD_TADR, adreg);
> > > +unsigned int table = GETFIELD(PHB_IODA_AD_TSEL, adreg);
> > > +unsigned int mask;
> > > +uint64_t *tptr = NULL;
> > > +
> > > +switch (table) {
> > > +case IODA2_TBL_LIST:
> > > +tptr = phb->ioda_LIST;
> > > +mask = 7;
> > 
> > I'd suggest hex for the masks.
> 
> This is more readable when matched with the documentation but not a big
> deal.

Matching the docs is a good enough reason to keep decimal.

> > > +break;
> > > +case IODA2_TBL_LXIVT:
> > > +tptr = phb->ioda_LXIVT;
> > > +mask = 7;
> > > +break;
> > > +case IODA2_TBL_IVC_CAM:
> > > +case IODA2_TBL_RBA:
> > > +mask = 31;
> > > +break;
> > > +case IODA2_TBL_RCAM:
> > > +mask = 63;
> > > +break;
> > > +case IODA2_TBL_MRT:
> > > +mask = 7;
> > > +break;
> > > +case IODA2_TBL_PESTA:
> > > +case IODA2_TBL_PESTB:
> > > +mask = 255;
> > > +break;
> > > +case IODA2_TBL_TVT:
> > > +tptr = phb->ioda_TVT;
> > > +mask = 511;
> > > +break;
> > > +case IODA2_TBL_TCAM:
> > > +case IODA2_TBL_TDR:
> > > +mask = 63;
> > > +break;
> > > +case IO

[Qemu-devel] [Bug 1353947] Re: Hypervisor with QEMU-2.0/libvirtd 1.2.2 stack when launching VM with CirrOS or Ubuntu 12.04

2018-07-22 Thread Launchpad Bug Tracker
[Expired for linux (Ubuntu) because there has been no activity for 60
days.]

** Changed in: linux (Ubuntu)
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1353947

Title:
  Hypervisor with QEMU-2.0/libvirtd 1.2.2 stack when launching VM with
  CirrOS or Ubuntu 12.04

Status in QEMU:
  Expired
Status in linux package in Ubuntu:
  Expired

Bug description:
  The issue was observed when running a hypervisor with QEMU 2.0/libvirtd 1.2.2.
  The VM network interface is attached to a PCI virtual function (SR-IOV).

  When we ran a VM with CirrOS or Ubuntu 12.04 as the guest OS, we observed a
  hypervisor hang shortly after the VM loaded.
  We observed the same issue with both Mellanox and Intel NICs.

  We tried a few {GuestOS} x {Hypervisor} combinations, with the following
  findings:
  When the hypervisor runs QEMU 1.5/libvirtd 1.1.1 - no issue observed
  When the hypervisor runs QEMU 2.0/libvirtd 1.2.2 - CirrOS and Ubuntu 12.04
  guest OSes cause a hypervisor hang
  When the hypervisor runs QEMU 2.0/libvirtd 1.2.2 - CentOS 6.4 and Ubuntu
  13.10 - no issue observed

  The problematic guest OSes run kernel versions around 3.2.y.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1353947/+subscriptions



[Qemu-devel] [Bug 1425597] Re: moving window + changing screen resolution = bug

2018-07-22 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1425597

Title:
  moving window + changing screen resolution = bug

Status in QEMU:
  Expired

Bug description:
  Steps to reproduce:
  1. Run qemu (sdl)
  2. Start moving the window
  3. At that moment the virtualized OS should change its screen resolution (for
  example, when switching from the initial qemu screen to grub)

  What I see:
  The window size doesn't change, but the internal screen resolution does, so
  the image scale is no longer 1:1 and I see the virtualized OS at the wrong
  scale.

  What I expected to see:
  The window size changes so that it stays synchronized with the internal
  resolution (as usual).

  This bug persists in the latest git version at the moment, i.e.
  3d30395f7fb3315e4ecf0de4e48790e1326bbd47

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1425597/+subscriptions



[Qemu-devel] [Bug 1353947] Re: Hypervisor with QEMU-2.0/libvirtd 1.2.2 stack when launching VM with CirrOS or Ubuntu 12.04

2018-07-22 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1353947

Title:
  Hypervisor with QEMU-2.0/libvirtd 1.2.2 stack when launching VM with
  CirrOS or Ubuntu 12.04

Status in QEMU:
  Expired
Status in linux package in Ubuntu:
  Expired

Bug description:
  The issue was observed when running a hypervisor with QEMU 2.0/libvirtd 1.2.2.
  The VM network interface is attached to a PCI virtual function (SR-IOV).

  When we ran a VM with CirrOS or Ubuntu 12.04 as the guest OS, we observed a
  hypervisor hang shortly after the VM loaded.
  We observed the same issue with both Mellanox and Intel NICs.

  We tried a few {GuestOS} x {Hypervisor} combinations, with the following
  findings:
  When the hypervisor runs QEMU 1.5/libvirtd 1.1.1 - no issue observed
  When the hypervisor runs QEMU 2.0/libvirtd 1.2.2 - CirrOS and Ubuntu 12.04
  guest OSes cause a hypervisor hang
  When the hypervisor runs QEMU 2.0/libvirtd 1.2.2 - CentOS 6.4 and Ubuntu
  13.10 - no issue observed

  The problematic guest OSes run kernel versions around 3.2.y.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1353947/+subscriptions



[Qemu-devel] [Bug 1257352] Re: kvm hangs occasionally when switching out of the qemu console

2018-07-22 Thread Launchpad Bug Tracker
[Expired for qemu (Ubuntu) because there has been no activity for 60
days.]

** Changed in: qemu (Ubuntu)
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1257352

Title:
  kvm hangs occasionally when switching out of the qemu console

Status in QEMU:
  Expired
Status in qemu package in Ubuntu:
  Expired

Bug description:
  To recreate (although this does *NOT* fail most of the time alas):

  1) press "ctrl-alt-2" to switch to the qemu console.
  2) type say "sendkey ctrl-alt-f1"
  3) press "ctrl-alt-1".

  Expected outcome: Switch to tty1 in the VM.

  Actual outcome: no switch to tty1 in the VM, and the qemu console is
  unresponsive to any keyboard input.

  
  Rather a vague problem description, I'm afraid, but this has happened to me 3
  times recently. No crash and no excessive CPU usage is observed.

  I'll grab an strace when it happens again and attach...

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: qemu-system-x86 1.6.0+dfsg-2ubuntu4
  ProcVersionSignature: Ubuntu 3.12.0-4.12-generic 3.12.1
  Uname: Linux 3.12.0-4-generic i686
  NonfreeKernelModules: nvidia
  ApportVersion: 2.12.7-0ubuntu1
  Architecture: i386
  CurrentDesktop: Unity
  Date: Tue Dec  3 15:41:40 2013
  InstallationDate: Installed on 2010-10-21 (1139 days ago)
  InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
  SourcePackage: qemu
  UpgradeStatus: Upgraded to trusty on 2013-11-01 (31 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1257352/+subscriptions



[Qemu-devel] [Bug 1257352] Re: kvm hangs occasionally when switching out of the qemu console

2018-07-22 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1257352

Title:
  kvm hangs occasionally when switching out of the qemu console

Status in QEMU:
  Expired
Status in qemu package in Ubuntu:
  Expired

Bug description:
  To recreate (although this does *NOT* fail most of the time alas):

  1) press "ctrl-alt-2" to switch to the qemu console.
  2) type say "sendkey ctrl-alt-f1"
  3) press "ctrl-alt-1".

  Expected outcome: Switch to tty1 in the VM.

  Actual outcome: no switch to tty1 in the VM, and the qemu console is
  unresponsive to any keyboard input.

  
  Rather a vague problem description, I'm afraid, but this has happened to me 3
  times recently. No crash and no excessive CPU usage is observed.

  I'll grab an strace when it happens again and attach...

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: qemu-system-x86 1.6.0+dfsg-2ubuntu4
  ProcVersionSignature: Ubuntu 3.12.0-4.12-generic 3.12.1
  Uname: Linux 3.12.0-4-generic i686
  NonfreeKernelModules: nvidia
  ApportVersion: 2.12.7-0ubuntu1
  Architecture: i386
  CurrentDesktop: Unity
  Date: Tue Dec  3 15:41:40 2013
  InstallationDate: Installed on 2010-10-21 (1139 days ago)
  InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
  SourcePackage: qemu
  UpgradeStatus: Upgraded to trusty on 2013-11-01 (31 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1257352/+subscriptions



Re: [Qemu-devel] [PATCH v2 3/8] migration: show the statistics of compression

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:15PM +0800, guangrong.x...@gmail.com wrote:
> @@ -1597,6 +1608,24 @@ static void migration_update_rates(RAMState *rs, 
> int64_t end_time)
>  rs->xbzrle_cache_miss_prev) / iter_count;
>  rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
>  }
> +
> +if (migrate_use_compression()) {
> +uint64_t comp_pages;
> +
> +compression_counters.busy_rate = (double)(compression_counters.busy -
> +rs->compress_thread_busy_prev) / iter_count;

Here I'm not sure it's correct...

"iter_count" stands for ramstate.iterations.  It's increased per
ram_find_and_save_block(), so IMHO it might contain multiple guest
pages.  However compression_counters.busy should be per guest page.
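
Something like the following is closer to what I'd expect (sketch only;
"pages_prev" is an illustrative field that doesn't exist today):

    /* normalize against guest pages handled in this window, not
     * against ram_find_and_save_block() iterations */
    uint64_t pages = ram_counters.normal + ram_counters.duplicate +
                     compression_counters.pages - rs->pages_prev;

    if (pages) {
        compression_counters.busy_rate =
            (double)(compression_counters.busy -
                     rs->compress_thread_busy_prev) / pages;
    }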

> +rs->compress_thread_busy_prev = compression_counters.busy;
> +
> +comp_pages = compression_counters.pages - rs->compress_pages_prev;
> +if (comp_pages) {
> +compression_counters.compression_rate =
> +(double)(compression_counters.reduced_size -
> +rs->compress_reduced_size_prev) /
> +(comp_pages * TARGET_PAGE_SIZE);
> +rs->compress_pages_prev = compression_counters.pages;
> +rs->compress_reduced_size_prev = 
> compression_counters.reduced_size;
> +}
> +}
>  }
>  
>  static void migration_bitmap_sync(RAMState *rs)
> @@ -1872,6 +1901,9 @@ static void flush_compressed_data(RAMState *rs)
>  qemu_mutex_lock(&comp_param[idx].mutex);
>  if (!comp_param[idx].quit) {
>  len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> +/* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> +compression_counters.reduced_size += TARGET_PAGE_SIZE - len + 8;

I would agree with Dave here - why do we store the "reduced size"
instead of the size of the compressed data (which I think should be
len - 8)?

Meanwhile, would a helper be nicer? Like:

void migrate_compressed_page_stats_update(int xfer_size)
{
    /* strip the 8-byte page header written by save_page_header() */
    compression_counters.compressed_size += xfer_size - 8;
    compression_counters.pages++;
    ram_counters.transferred += xfer_size;
}
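
The two call sites would then shrink to something like (sketch):

    migrate_compressed_page_stats_update(len);

and

    migrate_compressed_page_stats_update(bytes_xmit);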

> +compression_counters.pages++;
>  ram_counters.transferred += len;
>  }
>  qemu_mutex_unlock(&comp_param[idx].mutex);
> @@ -1903,6 +1935,10 @@ retry:
>  qemu_cond_signal(&comp_param[idx].cond);
>  qemu_mutex_unlock(&comp_param[idx].mutex);
>  pages = 1;
> +/* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> +compression_counters.reduced_size += TARGET_PAGE_SIZE -
> + bytes_xmit + 8;
> +compression_counters.pages++;
>  ram_counters.transferred += bytes_xmit;
>  break;
>  }

Regards,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 4/8] migration: introduce save_zero_page_to_file

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:16PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
> 
> It will be used by the compression threads
> 
> Signed-off-by: Xiao Guangrong 

Reviewed-by: Peter Xu 

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 5/8] migration: drop the return value of do_compress_ram_page

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:17PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
> 
> It is not used, and dropping it cleans the code up a little
> 
> Signed-off-by: Xiao Guangrong 

Reviewed-by: Peter Xu 

-- 
Peter Xu



[Qemu-devel] [RFC 1/3] vfio: split vfio_get_group() into small functions

2018-07-22 Thread Tiwei Bie
This patch splits vfio_get_group() into smaller functions.
This makes it easier to implement other vfio_get_group*()
functions in the future.

Signed-off-by: Tiwei Bie 
---
 hw/vfio/common.c | 83 
 1 file changed, 55 insertions(+), 28 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fb396cf00a..52a05532cd 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1279,37 +1279,20 @@ static void vfio_disconnect_container(VFIOGroup *group)
 }
 }
 
-VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
+static VFIOGroup *vfio_init_group(int groupfd, int groupid, AddressSpace *as,
+  Error **errp)
 {
 VFIOGroup *group;
-char path[32];
 struct vfio_group_status status = { .argsz = sizeof(status) };
 
-QLIST_FOREACH(group, &vfio_group_list, next) {
-if (group->groupid == groupid) {
-/* Found it.  Now is it already in the right context? */
-if (group->container->space->as == as) {
-return group;
-} else {
-error_setg(errp, "group %d used in multiple address spaces",
-   group->groupid);
-return NULL;
-}
-}
-}
-
 group = g_malloc0(sizeof(*group));
 
-snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-group->fd = qemu_open(path, O_RDWR);
-if (group->fd < 0) {
-error_setg_errno(errp, errno, "failed to open %s", path);
-goto free_group_exit;
-}
+group->fd = groupfd;
+group->groupid = groupid;
 
 if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
 error_setg_errno(errp, errno, "failed to get group %d status", 
groupid);
-goto close_fd_exit;
+goto free_group_exit;
 }
 
 if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
@@ -1317,16 +1300,15 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace 
*as, Error **errp)
 error_append_hint(errp,
   "Please ensure all devices within the iommu_group "
   "are bound to their vfio bus driver.\n");
-goto close_fd_exit;
+goto free_group_exit;
 }
 
-group->groupid = groupid;
 QLIST_INIT(&group->device_list);
 
 if (vfio_connect_container(group, as, errp)) {
 error_prepend(errp, "failed to setup container for group %d: ",
   groupid);
-goto close_fd_exit;
+goto free_group_exit;
 }
 
 if (QLIST_EMPTY(&vfio_group_list)) {
@@ -1337,15 +1319,60 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace 
*as, Error **errp)
 
 return group;
 
-close_fd_exit:
-close(group->fd);
-
 free_group_exit:
 g_free(group);
 
 return NULL;
 }
 
+static VFIOGroup *vfio_find_group(int groupid, AddressSpace *as,
+  Error **errp)
+{
+VFIOGroup *group;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+if (group->groupid == groupid) {
+/* Found it.  Now is it already in the right context? */
+if (group->container->space->as == as) {
+return group;
+} else {
+error_setg(errp, "group %d used in multiple address spaces",
+   group->groupid);
+return NULL;
+}
+}
+}
+
+return NULL;
+}
+
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
+{
+VFIOGroup *group;
+char path[32];
+int groupfd;
+
+group = vfio_find_group(groupid, as, errp);
+if (group) {
+return group;
+}
+
+snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
+groupfd = qemu_open(path, O_RDWR);
+if (groupfd < 0) {
+error_setg_errno(errp, errno, "failed to open %s", path);
+return NULL;
+}
+
+group = vfio_init_group(groupfd, groupid, as, errp);
+if (!group) {
+close(groupfd);
+return NULL;
+}
+
+return group;
+}
+
 void vfio_put_group(VFIOGroup *group)
 {
 if (!group || !QLIST_EMPTY(&group->device_list)) {
-- 
2.18.0




[Qemu-devel] [RFC 2/3] vfio: support getting VFIOGroup from groupfd

2018-07-22 Thread Tiwei Bie
This patch introduces an API for getting a VFIOGroup from
a group fd. This is useful when the group fd is opened by
another process and shared via a UNIX socket.

Signed-off-by: Tiwei Bie 
---
 hw/vfio/common.c  | 44 +++
 include/hw/vfio/vfio-common.h |  1 +
 2 files changed, 45 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 52a05532cd..4c19a33dd5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1279,6 +1279,30 @@ static void vfio_disconnect_container(VFIOGroup *group)
 }
 }
 
+static int vfio_groupfd_to_groupid(int groupfd)
+{
+char *tmp, group_path[PATH_MAX], *group_name;
+int groupid, len;
+
+tmp = g_strdup_printf("/proc/self/fd/%d", groupfd);
+len = readlink(tmp, group_path, sizeof(group_path));
+g_free(tmp);
+
+if (len <= 0 || len >= sizeof(group_path)) {
+return -1;
+}
+
+group_path[len] = '\0';
+
+group_name = g_path_get_basename(group_path);
+if (sscanf(group_name, "%d", &groupid) != 1) {
+groupid = -1;
+}
+g_free(group_name);
+
+return groupid;
+}
+
 static VFIOGroup *vfio_init_group(int groupfd, int groupid, AddressSpace *as,
   Error **errp)
 {
@@ -1373,6 +1397,26 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, 
Error **errp)
 return group;
 }
 
+VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as, Error **errp)
+{
+VFIOGroup *group;
+int groupid;
+
+groupid = vfio_groupfd_to_groupid(groupfd);
+if (groupid < 0) {
+error_setg(errp, "failed to get group id from group fd %d",
+   groupfd);
+return NULL;
+}
+
+group = vfio_find_group(groupid, as, errp);
+if (group) {
+return group;
+}
+
+return vfio_init_group(groupfd, groupid, as, errp);
+}
+
 void vfio_put_group(VFIOGroup *group)
 {
 if (!group || !QLIST_EMPTY(&group->device_list)) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a9036929b2..9bb1068a36 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -170,6 +170,7 @@ void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
+VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
 VFIODevice *vbasedev, Error **errp);
-- 
2.18.0




[Qemu-devel] [RFC 0/3] Supporting programming IOMMU in QEMU (vDPA/vhost-user)

2018-07-22 Thread Tiwei Bie
This patch set introduces a slave message in vhost-user that
allows the slave to share its VFIO group fd with the master,
so that QEMU can do the IOMMU programming for this group based
on the virtio device's DMA address space.

vhost-user backends which support vDPA can leverage this
message to ask the master to do the IOMMU programming in QEMU
for the vDPA device in the backend. This is helpful for
supporting vIOMMU with vDPA.
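
For context, the fd sharing itself is plain SCM_RIGHTS fd passing over
the vhost-user UNIX socket. A minimal sketch of the sending side is
below; it is illustrative only and not part of this series (message
framing and error handling are elided, and "send_group_fd" is just a
made-up name):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int send_group_fd(int sock, const void *payload, size_t len,
                             int groupfd)
    {
        struct iovec iov = { .iov_base = (void *)payload, .iov_len = len };
        char control[CMSG_SPACE(sizeof(int))];
        struct msghdr msg = {
            .msg_iov = &iov,
            .msg_iovlen = 1,
            .msg_control = control,
            .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg;

        memset(control, 0, sizeof(control));
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &groupfd, sizeof(int));

        /* the group fd travels as ancillary data with the message body */
        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }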

Tiwei Bie (3):
  vfio: split vfio_get_group() into small functions
  vfio: support getting VFIOGroup from groupfd
  vhost-user: support programming VFIO group in master

 docs/interop/vhost-user.txt|  16 +
 hw/vfio/common.c   | 127 +
 hw/virtio/vhost-user.c |  40 +++
 include/hw/vfio/vfio-common.h  |   1 +
 include/hw/virtio/vhost-user.h |   2 +
 5 files changed, 158 insertions(+), 28 deletions(-)

-- 
2.18.0




[Qemu-devel] [RFC 3/3] vhost-user: support programming VFIO group in master

2018-07-22 Thread Tiwei Bie
Introduce a slave message that allows the slave to share its
VFIO group fd with the master, so that QEMU can do the IOMMU
programming for this group based on the virtio device's DMA
address space.

vhost backends which support vDPA can leverage this message
to ask the master to do the IOMMU programming in QEMU for the
vDPA device in the backend.

Signed-off-by: Tiwei Bie 
---
 docs/interop/vhost-user.txt| 16 ++
 hw/virtio/vhost-user.c | 40 ++
 include/hw/virtio/vhost-user.h |  2 ++
 3 files changed, 58 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index f59667f498..a57a8f9451 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -397,6 +397,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CONFIG 9
 #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
+#define VHOST_USER_PROTOCOL_F_VFIO_GROUP 12
 
 Master message types
 
@@ -815,6 +816,21 @@ Slave message types
   This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
   protocol feature has been successfully negotiated.
 
+ * VHOST_USER_SLAVE_VFIO_GROUP_MSG
+
+  Id: 4
+  Equivalent ioctl: N/A
+  Slave payload: N/A
+  Master payload: N/A
+
+  When VHOST_USER_PROTOCOL_F_VFIO_GROUP is negotiated, the vhost-user
+  slave can send this request to share its VFIO group fd with the master
+  via ancillary data. After receiving this request from the slave, the
+  master closes the existing VFIO group, if any, and, if the request
+  carries a file descriptor, does the DMA programming for the new group
+  based on the virtio device's DMA address space.
+
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 ---
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b041343632..db958e24c7 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_CONFIG = 9,
 VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
 VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+VHOST_USER_PROTOCOL_F_VFIO_GROUP = 12,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -97,6 +98,7 @@ typedef enum VhostUserSlaveRequest {
 VHOST_USER_SLAVE_IOTLB_MSG = 1,
 VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
 VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
+VHOST_USER_SLAVE_VFIO_GROUP_MSG = 4,
 VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -949,6 +951,41 @@ static int 
vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 return 0;
 }
 
+static int vhost_user_slave_handle_vfio_group(struct vhost_dev *dev,
+  int *fd)
+{
+struct vhost_user *u = dev->opaque;
+VhostUserState *user = u->user;
+VirtIODevice *vdev = dev->vdev;
+int groupfd = fd[0];
+VFIOGroup *group;
+
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_VFIO_GROUP) ||
+vdev == NULL) {
+return -1;
+}
+
+if (user->vfio_group) {
+vfio_put_group(user->vfio_group);
+user->vfio_group = NULL;
+}
+
+group = vfio_get_group_from_fd(groupfd, vdev->dma_as, NULL);
+if (group == NULL) {
+return -1;
+}
+
+if (group->fd != groupfd) {
+close(groupfd);
+}
+
+user->vfio_group = group;
+fd[0] = -1;
+
+return 0;
+}
+
 static void slave_read(void *opaque)
 {
 struct vhost_dev *dev = opaque;
@@ -1021,6 +1058,9 @@ static void slave_read(void *opaque)
 ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
   fd[0]);
 break;
+case VHOST_USER_SLAVE_VFIO_GROUP_MSG:
+ret = vhost_user_slave_handle_vfio_group(dev, fd);
+break;
 default:
 error_report("Received unexpected msg type.");
 ret = -EINVAL;
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index fd660393a0..9e11473274 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -10,6 +10,7 @@
 
 #include "chardev/char-fe.h"
 #include "hw/virtio/virtio.h"
+#include "hw/vfio/vfio-common.h"
 
 typedef struct VhostUserHostNotifier {
 MemoryRegion mr;
@@ -20,6 +21,7 @@ typedef struct VhostUserHostNotifier {
 typedef struct VhostUserState {
 CharBackend *chr;
 VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
+VFIOGroup *vfio_group;
 } VhostUserState;
 
 VhostUserState *vhost_user_init(void);
-- 
2.18.0




Re: [Qemu-devel] [PATCH v2 6/8] migration: move handle of zero page to the thread

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:18PM +0800, guangrong.x...@gmail.com wrote:

[...]

> @@ -1950,12 +1971,16 @@ retry:
>  set_compress_params(&comp_param[idx], block, offset);
>  qemu_cond_signal(&comp_param[idx].cond);
>  qemu_mutex_unlock(&comp_param[idx].mutex);
> -pages = 1;
> -/* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> -compression_counters.reduced_size += TARGET_PAGE_SIZE -
> - bytes_xmit + 8;
> -compression_counters.pages++;
>  ram_counters.transferred += bytes_xmit;
> +pages = 1;

(moving this line seems unrelated; meanwhile there is more duplicated
code, so it's even better to have a helper now)

> +if (comp_param[idx].zero_page) {
> +ram_counters.duplicate++;
> +} else {
> +/* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> +compression_counters.reduced_size += TARGET_PAGE_SIZE -
> + bytes_xmit + 8;
> +compression_counters.pages++;
> +}
>  break;
>  }
>  }

[...]

> @@ -2249,15 +2308,8 @@ static int ram_save_target_page(RAMState *rs, 
> PageSearchStatus *pss,
>  return res;
>  }
>  
> -/*
> - * When starting the process of a new block, the first page of
> - * the block should be sent out before other pages in the same
> - * block, and all the pages in last block should have been sent
> - * out, keeping this order is important, because the 'cont' flag
> - * is used to avoid resending the block name.
> - */
> -if (block != rs->last_sent_block && save_page_use_compression(rs)) {
> -flush_compressed_data(rs);
> +if (save_compress_page(rs, block, offset)) {
> +return 1;

Moving the zero-page detection into the compression threads is a bit
tricky (though it seems like a good idea too). I noticed that we also
do something else for zero pages:

res = save_zero_page(rs, block, offset);
if (res > 0) {
/* Must let xbzrle know, otherwise a previous (now 0'd) cached
 * page would be stale
 */
if (!save_page_use_compression(rs)) {
XBZRLE_cache_lock();
xbzrle_cache_zero_page(rs, block->offset + offset);
XBZRLE_cache_unlock();
}
ram_release_pages(block->idstr, offset, res);
return res;
}

I'd guess that the xbzrle update of the zero page is not needed for
compression, since xbzrle is not enabled when compression is. However,
do we need to call ram_release_pages() somehow?
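
I'd have expected the completion path to mirror what save_zero_page()
does for the non-threaded path, something like the sketch below - but
whether the page may be released while a compression thread could still
be reading it is the real question:

    if (comp_param[idx].zero_page) {
        ram_counters.duplicate++;
        /* release the (zero) page, as save_zero_page() does */
        ram_release_pages(block->idstr, offset, 1);
    }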

>  }
>  
>  res = save_zero_page(rs, block, offset);
> @@ -2275,18 +2327,10 @@ static int ram_save_target_page(RAMState *rs, 
> PageSearchStatus *pss,
>  }
>  
>  /*
> - * Make sure the first page is sent out before other pages.
> - *
> - * we post it as normal page as compression will take much
> - * CPU resource.
> - */
> -if (block == rs->last_sent_block && save_page_use_compression(rs)) {
> -res = compress_page_with_multi_thread(rs, block, offset);
> -if (res > 0) {
> -return res;
> -}
> -compression_counters.busy++;
> -} else if (migrate_use_multifd()) {
> +* do not use multifd for compression as the first page in the new
> +* block should be posted out before sending the compressed page
> +*/
> +if (!save_page_use_compression(rs) && migrate_use_multifd()) {
>  return ram_save_multifd_page(rs, block, offset);
>  }
>  
> -- 
> 2.14.4
> 

Regards,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 10/18] qjson: report an error if there are multiple results

2018-07-22 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi
>
> On Fri, Jul 20, 2018 at 10:49 AM, Markus Armbruster  wrote:
>> Marc-André Lureau  writes:
>>
>>> qobject_from_jsonv() returns a single object. Let's make sure that
>>> during parsing we don't leak an intermediary object. Instead of
>>> returning the last object, set a parsing error.
>>>
>>> Also, the lexer/parser keeps consuming all the data. There might be an
>>> error set earlier. Let's keep it and not call json_parser_parse() with
>>> the remaining data. Eventually, we may teach the message parser to
>>> stop consuming the data.
>>
>> Took me a while to figure out what you mean :)
>>
>> qobject_from_jsonv() feeds a complete string to the JSON parser, with
>> json_message_parser_feed().  This actually feeds one character after the
>> other to the lexer, via json_lexer_feed_char().
>>
>> Whenever a character completes a token, the lexer feeds that token to
>> the streamer via a callback that is always json_message_process_token().
>>
>> The attentive reader may wonder what it does with trailing characters
>> that aren't a complete token.  More on that below.
>>
>> The streamer accumulates tokens, counting various parenthesis.  When all
>> counts are zero or any is negative, the group is complete, and is fed to
>> the parser via another callback.  That callback is parse_json() here.
>> There are more elsewhere.
>>
>> The attentive reader may wonder what it does with trailing tokens that
>> aren't a complete group.  More on that below.
>>
>> The callbacks all pass the tokens to json_parser_parse() to do the
>> actual parsing.  Returns the abstract syntax tree as QObject on success.
>>
>> Note that the callback can be called any number of times.
>>
>> In my opinion, this is over-engineered and under-thought.  There's more
>> flexibility than we're using, and the associated complexity breeds bugs.

Two obvious simplifications come to mind:

* Call json_message_process_token() directly.

* Move the call to json_parser_parse() into the streamer, and have the
  streamer pass QObject * instead of GQueue * to its callback (rough
  sketch below).
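
A rough sketch of the latter, with illustrative names (the real
streamer state and callback signature would differ):

    /* the streamer consumes its own token queue and hands a parse
     * result (or error) to a single callback, so users never see the
     * GQueue at all; "emit" and "opaque" are hypothetical fields */
    static void json_message_emit_group(JSONMessageParser *parser)
    {
        Error *err = NULL;
        QObject *obj = json_parser_parse(parser->tokens, NULL, &err);

        parser->emit(parser->opaque, obj, err);
        parser->tokens = g_queue_new();
    }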

I'll post patches for your consideration, but first I'll continue to
review your series.

>> In particular, parse_json() is wrong:
>>
>> s->result = json_parser_parse(tokens, s->ap, &s->err);
>>
>> If the previous call succeeded, we leak its result.  This is the leak
>> you mentioned.
>>
>> If any previous call set an error, we pass &s->err pointing to non-null,
>> which is forbidden.  If json_parser_parse() passed it to error_setg(),
>> this would crash.  Since it passes it only to error_propagate(), all
>> errors but the first one are ignored.  Latent crash bug.
>>
>> If the last call succeeds and any previous call failed, the function
>> simultaneously succeeds and fails: we return both a result and set an
>> error.
>>
>> Let's demonstrate these bugs before we fix them, by inserting the
>> appended patch before this one.
>>
>> Are the other callbacks similarly buggy?  Turns out they're okay:
>>
>> * handle_qmp_command() consumes result and error on each call.
>>
>> * process_event() does, too.
>>
>> * qmp_response() treats errors as fatal, and asserts "no previous
>>   response".
>>
>> On trailing tokens that don't form a group: they get silently ignored,
>> as demonstrated by check-qjson test cases unterminated_array(),
>> unterminated_array_comma(), unterminated_dict(),
>> unterminated_dict_comma(), all marked BUG.
>>
>> On trailing characters that don't form a token: they get silently
>> ignored, as demonstrated by check-qjson test cases
>> unterminated_string(), unterminated_sq_string(), unterminated_escape(),
>> all marked BUG, except when they aren't, as in test case
>> unterminated_literal().
>>
>> The BUG marks are all gone at the end of this series.  I guess you're
>> fixing all that :)
>>
>> Note that these "trailing characters / tokens are silently ignored" bugs
>> *do* affect the other callbacks.  Reproducer for handle_qmp_command():
>>
>> $ echo -e '{ "execute": "qmp_capabilities" }\n{ "execute": "query-name" 
>> }\n"unterminated' | socat UNIX:test-qmp STDIO
>> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 12, "major": 2}, 
>> "package": "v3.0.0-rc1-20-g6a024cd461"}, "capabilities": ["oob"]}}
>> {"return": {}}
>> {"return": {}}
>>
>> Note there's no error reported for the last line.
>>
>>> Signed-off-by: Marc-André Lureau 
>>> ---
>>>  qobject/qjson.c | 16 +++-
>>>  tests/check-qjson.c | 11 +++
>>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qobject/qjson.c b/qobject/qjson.c
>>> index ee04e61096..8a9d116150 100644
>>> --- a/qobject/qjson.c
>>> +++ b/qobject/qjson.c
>>> @@ -22,6 +22,7 @@
>>>  #include "qapi/qmp/qlist.h"
>>>  #include "qapi/qmp/qnum.h"
>>>  #include "qapi/qmp/qstring.h"
>>> +#include "qapi/qmp/qerror.h"
>>>  #include "qemu/unicode.h"
>>>
>>>  typedef struct JSONParsingState
>>> @@ -36,7 +37,20 @@ static void parse_json(JSONMessageParser *parser, GQueue 
>

Re: [Qemu-devel] [PATCH v2 7/8] migration: hold the lock only if it is really needed

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:19PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
> 
> Try to hold src_page_req_mutex only if the queue is not
> empty
> 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Xiao Guangrong 

Reviewed-by: Peter Xu 

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 8/8] migration: do not flush_compressed_data at the end of each iteration

2018-07-22 Thread Peter Xu
On Thu, Jul 19, 2018 at 08:15:20PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
> 
> flush_compressed_data() needs to wait for all compression threads to
> finish their work; after that, all threads are idle until the
> migration feeds new requests to them. Reducing these calls can
> improve the throughput and use CPU resources more effectively.
> 
> We do not need to flush all threads at the end of each iteration; the
> data can be kept locally until the memory block changes or memory
> migration starts over. In that case we will meet a dirtied page which
> may still exist in a compression thread's ring.
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  migration/ram.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 89305c7af5..fdab13821d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -315,6 +315,8 @@ struct RAMState {
>  uint64_t iterations;
>  /* number of dirty bits in the bitmap */
>  uint64_t migration_dirty_pages;
> +/* last dirty_sync_count we have seen */
> +uint64_t dirty_sync_count;

Better to suffix it with "_prev" as well, so that we can quickly
identify that it's only a cache and that it can differ from the one
in ram_counters?

>  /* protects modification of the bitmap */
>  QemuMutex bitmap_mutex;
>  /* The RAMBlock used in the last src_page_requests */
> @@ -2532,6 +2534,7 @@ static void ram_save_cleanup(void *opaque)
>  }
>  
>  xbzrle_cleanup();
> +flush_compressed_data(*rsp);

Could I ask why we need this, considering that we have
compress_threads_save_cleanup() right below?

>  compress_threads_save_cleanup();
>  ram_state_cleanup(rsp);
>  }
> @@ -3203,6 +3206,17 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  
>  ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>  
> +/*
> + * If memory migration starts over, we will meet a dirtied page which
> + * may still exist in a compression thread's ring, so we should flush
> + * the compressed data to make sure the new page is not overwritten by
> + * the old one in the destination.
> + */
> +if (ram_counters.dirty_sync_count != rs->dirty_sync_count) {
> +rs->dirty_sync_count = ram_counters.dirty_sync_count;
> +flush_compressed_data(rs);
> +}
> +
>  t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>  i = 0;
>  while ((ret = qemu_file_rate_limit(f)) == 0 ||
> @@ -3235,7 +3249,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  }
>  i++;
>  }
> -flush_compressed_data(rs);

This looks sane to me, but I'd like to see what other people think
about it too...

>  rcu_read_unlock();
>  
>  /*
> -- 
> 2.14.4
> 

Regards,

-- 
Peter Xu



[Qemu-devel] patchew.org updated

2018-07-22 Thread Paolo Bonzini
Hi all,

Fam has just updated patchew.org to run the latest version of Patchew.  Changes include:

- a new, more compact look for series visualization

- each message in the series is shown in a separate page

- links to other messages in the series show whether the message has had
replies or not

- the interdiff (version comparison) functionality can detect reordered
patches, similar to "git-tbdiff"

- an initial version of the REST API (https://patchew.org/api/v1/)

- support for colored logs (not yet useful for QEMU)

- new search terms "is:pull" and "has:replies"

- many bug fixes

Thanks,

Paolo



Re: [Qemu-devel] [PATCH v2 11/18] qjson: report error on unterminated string

2018-07-22 Thread Markus Armbruster
Marc-André Lureau  writes:

> An unterminated string will make the parser emit an error (tokens ==
> NULL). Let's report it.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  qobject/qjson.c | 3 +++
>  tests/check-qjson.c | 6 +++---
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/qobject/qjson.c b/qobject/qjson.c
> index 8a9d116150..01218c9ad6 100644
> --- a/qobject/qjson.c
> +++ b/qobject/qjson.c
> @@ -37,6 +37,9 @@ static void parse_json(JSONMessageParser *parser, GQueue 
> *tokens)
>  {
>  JSONParsingState *s = container_of(parser, JSONParsingState, parser);
>  
> +if (!tokens && !s->err) {
> +error_setg(&s->err, QERR_JSON_PARSING);
> +}
>  if (s->result || s->err) {
>  if (s->result) {
>  qobject_unref(s->result);

This doesn't fix the JSON parser, it "fixes" one of its users!  Other
users remain broken.  Reproducer for QMP (already mentioned in my review
of the previous patch):

$ echo -e '{ "execute": "qmp_capabilities" }\n{ "execute": "query-name" 
}\n"unterminated' | socat UNIX:test-qmp STDIO
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 12, "major": 2}, 
"package": "v3.0.0-rc1-20-g6a024cd461"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}

Note there's no error reported for the last line.

The simplification of the JSON parser I have in mind might make this
easy to fix properly.  I'll look into it.
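
For qobject_from_jsonv() itself, a leak-free shape for the callback
could look like this (sketch only, built from the fields and calls
already shown in this series):

    static void parse_json(JSONMessageParser *parser, GQueue *tokens)
    {
        JSONParsingState *s = container_of(parser, JSONParsingState,
                                           parser);

        if (s->err) {
            return;             /* keep the first error, ignore the rest */
        }
        if (!tokens) {          /* lexer/streamer error, e.g. unterminated */
            error_setg(&s->err, QERR_JSON_PARSING);
            return;
        }
        if (s->result) {
            /* a second complete value: drop the first one and fail */
            qobject_unref(s->result);
            s->result = NULL;
            error_setg(&s->err, QERR_JSON_PARSING);
            return;
        }
        s->result = json_parser_parse(tokens, s->ap, &s->err);
    }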

[...]



Re: [Qemu-devel] [PATCH v2 12/18] qjson: return parsing error if unterminated input

2018-07-22 Thread Markus Armbruster
Marc-André Lureau  writes:

> Signed-off-by: Marc-André Lureau 
> ---
>  qobject/json-streamer.c | 4 +++-
>  qobject/qjson.c | 5 -
>  tests/check-qjson.c | 8 
>  3 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> index c51c2021f9..065c551332 100644
> --- a/qobject/json-streamer.c
> +++ b/qobject/json-streamer.c
> @@ -126,7 +126,9 @@ int json_message_parser_feed(JSONMessageParser *parser,
>  
>  int json_message_parser_flush(JSONMessageParser *parser)
>  {
> -return json_lexer_flush(&parser->lexer);
> +int ret = json_lexer_flush(&parser->lexer);
> +
> +return ret ?: g_queue_get_length(parser->tokens);
>  }
>  
>  void json_message_parser_destroy(JSONMessageParser *parser)
> diff --git a/qobject/qjson.c b/qobject/qjson.c
> index 01218c9ad6..8afdc1e06a 100644
> --- a/qobject/qjson.c
> +++ b/qobject/qjson.c
> @@ -64,7 +64,10 @@ QObject *qobject_from_jsonv(const char *string, va_list 
> *ap, Error **errp)
>  
>  json_message_parser_init(&state.parser, parse_json);
>  json_message_parser_feed(&state.parser, string, strlen(string));
> -json_message_parser_flush(&state.parser);
> +if (json_message_parser_flush(&state.parser) != 0 &&
> +!state.err) {
> +error_setg(&state.err, QERR_JSON_PARSING);
> +}
>  json_message_parser_destroy(&state.parser);
>  
>  error_propagate(errp, state.err);

Again, this leaves other users broken.  Reproducer for QMP:

$ echo -e '{ "execute": "qmp_capabilities" }\n{ "execute": "query-name" 
}\n[' | socat UNIX:/work/armbru/images/test-qmp STDIO
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 12, "major": 2}, 
"package": "v3.0.0-rc1-21-g975ad3dcf2"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}

Note there's no error reported for the last line.

The simplification of the JSON parser I have in mind might make this
easy to fix properly.  I'll look into it.