Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-16 Thread Dr. David Alan Gilbert
* Alexander Graf (ag...@suse.de) wrote:

snip

 Can you please test whether the patch below makes things work for you again?

The patch below fixes RDMA migration (same host); however, see comments.

 Alex
 
 From ef6fde21007e62529799264f57a65c6bb3d0d414 Mon Sep 17 00:00:00 2001
 From: Alexander Graf ag...@suse.de
 Date: Sat, 14 Feb 2015 00:21:01 +0100
 Subject: [PATCH] migration: Read JSON VM description on incoming migration
 
 One of the really nice things about the VM description format is that it goes
 over the wire when live migration is happening. Unfortunately QEMU today closes
 any socket once it sees VM_EOF coming, so we never give the VMDESC the chance to
 actually land on the wire.
 
 This patch makes QEMU read the description as well. This way we ensure that
 anything wire tapping us in between will get the chance to also interpret the
 stream.
 
 Along the way we also fix virt tests that assume that number_bytes_sent on the
 sender side is equal to number_bytes_read which was true before the VMDESC
 patches and is true again with this patch.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 
 diff --git a/savevm.c b/savevm.c
 index 8040766..ff4bead 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -929,6 +929,7 @@ int qemu_loadvm_state(QEMUFile *f)
      uint8_t section_type;
      unsigned int v;
      int ret;
 +    int file_error_after_eof = -1;
 
      if (qemu_savevm_state_blocked(&local_err)) {
          error_report("%s", error_get_pretty(local_err));
 @@ -1034,6 +1035,22 @@ int qemu_loadvm_state(QEMUFile *f)
          }
      }
 
 +    file_error_after_eof = qemu_file_get_error(f);
 +
 +    /*
 +     * Try to read in the VMDESC section as well, so that dumping tools that
 +     * intercept our migration stream have the chance to see it.
 +     */
 +    if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {

You could use qemu_peek_byte for that?

 +        uint32_t size = qemu_get_be32(f);
 +        uint8_t *buf = g_malloc(size);
 +
 +        if (buf) {
 +            qemu_get_buffer(f, buf, size);
 +            g_free(buf);
 +        }

This is slightly dangerous; a malformed file could send you a huge
value and get you to allocate lots of memory for no good reason.

You could do something clever; but personally I'd just loop around a
nice small buffer until it's gone.

As mentioned on IRC; I'm still worried though that this is only
a fix for loading on newer versions; migration to an older QEMU
with the same machine type would fail.
(Yes I know mythically that no one cares about this; but I do).

Dave

 +    }
 +
      cpu_synchronize_all_post_init();
 
      ret = 0;
 @@ -1045,7 +1062,8 @@ out:
      }
 
      if (ret == 0) {
 -        ret = qemu_file_get_error(f);
 +        /* We may not have a VMDESC section, so ignore relative errors */
 +        ret = file_error_after_eof;
      }
 
      return ret;
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
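
Dave's suggestion above, looping over a small buffer instead of allocating
whatever size the stream claims, could look roughly like the sketch below.
It is only an illustration and not QEMU code: toy_stream, toy_read and
drain_section are hypothetical names standing in for QEMUFile and its read
helpers, and the 256-byte scratch size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for QEMUFile; not a QEMU type. */
struct toy_stream {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/* Copy up to n bytes into buf; returns how many bytes were actually read. */
static size_t toy_read(struct toy_stream *s, uint8_t *buf, size_t n)
{
    size_t avail;

    assert(s->pos <= s->len);
    avail = s->len - s->pos;
    if (n > avail) {
        n = avail;
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/*
 * Discard `size` bytes through a fixed scratch buffer, so a bogus length
 * field never turns into a large allocation.
 * Returns 0 if the whole section was consumed, -1 if the stream ended early.
 */
static int drain_section(struct toy_stream *s, uint32_t size)
{
    uint8_t scratch[256];

    while (size > 0) {
        size_t want = size < sizeof(scratch) ? size : sizeof(scratch);
        size_t got = toy_read(s, scratch, want);

        if (got == 0) {
            return -1; /* stream ran dry before the claimed size */
        }
        size -= (uint32_t)got;
    }
    return 0;
}
```

With this shape a malformed length only costs a bounded amount of stack;
the loop simply stops when the stream has nothing more to give.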



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-16 Thread Alexander Graf


On 16.02.15 19:57, Dr. David Alan Gilbert wrote:
 * Alexander Graf (ag...@suse.de) wrote:
 
 +    file_error_after_eof = qemu_file_get_error(f);
 +
 +    /*
 +     * Try to read in the VMDESC section as well, so that dumping tools that
 +     * intercept our migration stream have the chance to see it.
 +     */
 +    if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {
 
 You could use qemu_peek_byte for that?

It's what I had originally, but qemu_peek_byte() at the end of the day
is the exact same as qemu_get_byte(), except that it doesn't increment the
internal buffer counter. So any error conditions that occur because the read
failed still happen with peek_byte and are a lot less intuitive.

 
 +        uint32_t size = qemu_get_be32(f);
 +        uint8_t *buf = g_malloc(size);
 +
 +        if (buf) {
 +            qemu_get_buffer(f, buf, size);
 +            g_free(buf);
 +        }
 
 This is slightly dangerous; a malformed file could send you a huge
 value and get you to allocate lots of memory for no good reason.
 
 You could do something clever; but personally I'd just loop around a
 nice small buffer until it's gone.

Good idea. Will change.

 As mentioned on IRC; I'm still worried though that this is only
 a fix for loading on newer versions; migration to an older QEMU
 with the same machine type would fail.
 (Yes I know mythically that no one cares about this; but I do).

Yeah, I guess I'll follow up with a fix to disable VMDESC submission on
older versions, just to be on the safe side.


Alex
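
Alex's point above about qemu_peek_byte() can be illustrated with a toy
buffered reader. This is a simplified model under stated assumptions, not
QEMU's implementation: toy_file, toy_fill, toy_peek_byte and toy_get_byte
are hypothetical names, and the 4-byte buffer and the -1 error value are
made up for the example. The point it tries to show is that peeking still
has to refill the buffer, so a failed read sets the error flag either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffered reader; a toy model, not QEMUFile. */
struct toy_file {
    const uint8_t *src;   /* backing data the "socket" would deliver */
    size_t src_len;
    size_t src_pos;
    uint8_t buf[4];       /* small internal buffer */
    size_t buf_len;
    size_t buf_pos;
    int error;            /* 0 = ok, -1 = read failed / end of data */
};

/* Refill the internal buffer if it is exhausted; record an error on EOF. */
static void toy_fill(struct toy_file *f)
{
    size_t n;

    if (f->buf_pos < f->buf_len) {
        return; /* still have buffered bytes */
    }
    assert(f->src_pos <= f->src_len);
    n = f->src_len - f->src_pos;
    f->buf_len = 0;
    f->buf_pos = 0;
    if (n == 0) {
        f->error = -1;
        return;
    }
    if (n > sizeof(f->buf)) {
        n = sizeof(f->buf);
    }
    memcpy(f->buf, f->src + f->src_pos, n);
    f->src_pos += n;
    f->buf_len = n;
}

/* Look at the next byte without consuming it; may trigger a refill. */
static int toy_peek_byte(struct toy_file *f)
{
    toy_fill(f);
    if (f->buf_pos >= f->buf_len) {
        return 0;
    }
    return f->buf[f->buf_pos];
}

/* Same as peek, plus advancing the buffer position by one. */
static int toy_get_byte(struct toy_file *f)
{
    int b = toy_peek_byte(f);

    if (f->buf_pos < f->buf_len) {
        f->buf_pos++;
    }
    return b;
}
```

In this model, peeking past the end of the data leaves the error flag set
exactly as a get would, which is the side effect Alex describes.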



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-16 Thread Paolo Bonzini


On 16/02/2015 21:24, Alexander Graf wrote:
 As mentioned on IRC; I'm still worried though that this is only
  a fix for loading on newer versions; migration to an older QEMU
  with the same machine type would fail.
  (Yes I know mythically that no one cares about this; but I do).
 Yeah, I guess I'll follow up with a fix to disable VMDESC submission on
 older versions, just to be on the safe side.

Can you make it a capability?

Paolo



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-16 Thread Paolo Bonzini


On 16/02/2015 22:08, Alexander Graf wrote:
  Can you make it a capability?
 When did live migration start to have capability negotiation? :)

Only capability without negotiation. :)

Negotiation is done above QEMU.

Paolo



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-16 Thread Alexander Graf


On 16.02.15 22:06, Paolo Bonzini wrote:
 
 
 On 16/02/2015 21:24, Alexander Graf wrote:
 As mentioned on IRC; I'm still worried though that this is only
 a fix for loading on newer versions; migration to an older QEMU
 with the same machine type would fail.
 (Yes I know mythically that no one cares about this; but I do).
 Yeah, I guess I'll follow up with a fix to disable VMDESC submission on
 older versions, just to be on the safe side.
 
 Can you make it a capability?

When did live migration start to have capability negotiation? :)


Alex



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-13 Thread Dr. David Alan Gilbert
* Alexander Graf (ag...@suse.de) wrote:
 
 
 On 13.02.15 01:29, Lucas Meneghel Rodrigues wrote:
  Copying Alex.
  
  OK, after bisecting, this is what I've got:
  
  8118f0950fc77cce7873002a5021172dd6e040b5 is the first bad commit
  commit 8118f0950fc77cce7873002a5021172dd6e040b5
  Author: Alexander Graf ag...@suse.de
  Date:   Thu Jan 22 15:01:39 2015 +0100
  
  migration: Append JSON description of migration stream
  
  One of the annoyances of the current migration format is the fact that
  it's not self-describing. In fact, it's not properly describing at all.
  Some code randomly scattered throughout QEMU elaborates roughly how to
  read and write a stream of bytes.
  
  We discussed an idea during KVM Forum 2013 to add a JSON description of
  the migration protocol itself to the migration stream. This patch
  adds a section after the VM_END migration end marker that contains
  description data on what the device sections of the stream are
  composed of.
  
  This approach is backwards compatible with any QEMU version reading the
  stream, because QEMU just stops reading after the VM_END marker and ignores
  any data following it.
  
  With an additional external program this allows us to decipher the
  contents of any migration stream and hopefully make migration bugs easier
  to track down.
  
  Signed-off-by: Alexander Graf ag...@suse.de
  Signed-off-by: Amit Shah amit.s...@redhat.com
  Signed-off-by: Juan Quintela quint...@redhat.com
  
  :04 04 e9aac242a61fbd05bbb0daa3e8877970e738
  61df81f831bc86b29f65883523ea95abb36f1ec5 Mhw
  :04 04 fe0659bed17d86c43657c26622d64fd44a1af037
  7092a6b6515a3d0077f68ff2d80dbd74597a244f Minclude
  :04 04 d90d6f1fe839abf21a45eaba5829d5a6a22abeb1
  c2b1dcda197d96657458d699c185e39ae45f3c6c Mmigration
  :100644 100644 98895fee81edfbc659fc42d467e930d06b1afa7d
  80407662ad3ed860d33a9d35f5c44b1d19c4612b Msavevm.c
  :04 04 cf218bc2b841cd51ebe3972635be2cfbb1de9dfa
  7aaf3d10ef7f73413b228e854fe6f04317151e46 Mtests
  
  So there you go. I'm going to sleep, if you need any extra help let me know.
 
 So the major difference with this patch applied is that the sender could
 send more data than the receiver wants to read. I can't see the actual
 migrate command you used down there.
 
 I haven't seen this actually being a problem so far, as the receiver
 just close()s its file descriptor once it hits VM_EOF. This should only
 break senders if they expect they can send more. That said, I think I
 only tested offline migration (via exec:), so maybe QEMU is behaving
 badly and actually wants to send all data and just fails the migration
 without?

Hmm, for such an odd change to the migration stream it's a surprise you
didn't test it live.

The only obvious thing to me that could go wrong would be that
if the destination closed its migration fd when it received what it thought
was a terminator then the source could get upset at its failure to send
the last few kB with the JSON in it.

Dave

 
 
 Alex
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-13 Thread Lucas Meneghel Rodrigues

Alex, Dave:

Virt-Test fd migration starts by sending an fd to the source VM:

22:20:40 DEBUG| Send file descriptor migfd_28_1423786840 to source VM.
22:20:40 DEBUG| (monitor hmp1) Sending command 'getfd migfd_28_1423786840'


later on...

22:20:42 INFO | Migrating to fd:migfd_28_1423786840
22:20:42 DEBUG| (monitor hmp1) Sending command 'migrate -d fd:migfd_28_1423786840'
22:20:42 DEBUG| Send command: migrate -d fd:migfd_28_1423786840

Attached to this message you can find a .tar.bz2 file (~36 KB) with
virt-test results. It contains extra information, such as a record of
VM registers taken periodically during the testing process.


Cheers,

Lucas

On Thu, Feb 12, 2015 at 10:36 PM, Alexander Graf ag...@suse.de wrote:



So the major difference with this patch applied is that the sender could
send more data than the receiver wants to read. I can't see the actual
migrate command you used down there.

I haven't seen this actually being a problem so far, as the receiver
just close()s its file descriptor once it hits VM_EOF. This should only
break senders if they expect they can send more. That said, I think I
only tested offline migration (via exec:), so maybe QEMU is behaving
badly and actually wants to send all data and just fails the migration
without?


Alex



run-2015-02-12-22.20.21.tar.bz2
Description: application/bzip-compressed-tar


Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-13 Thread Alexander Graf


On 13.02.15 10:04, Dr. David Alan Gilbert wrote:
 Hmm, for such an odd change to the migration stream it's a surprise you
 didn't test it live.

Well, let's say I don't remember explicitly testing it live - I probably
did at one point.

I just verified that migrating with tcp:... works fine in master.


Alex



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-13 Thread Lucas Meneghel Rodrigues



On Fri, Feb 13, 2015 at 9:18 AM, Alexander Graf ag...@suse.de wrote:



Well, let's say I don't remember explicitly testing it live - I probably
did at one point.

I just verified that migrating with tcp:... works fine in master.


It is working fine with tcp migration in master indeed. The thing is,
virt-test tests a bunch of variants, among them fd. fd is the only one
failing from the list of things we do test (which also happens to be the
virt-test default test set).






Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-13 Thread Dr. David Alan Gilbert
* Alexander Graf (ag...@suse.de) wrote:
 
 
 Well, let's say I don't remember explicitly testing it live - I probably
 did at one point.
 
 I just verified that migrating with tcp:... works fine in master.

Yes, that's fair.

My suspicion (for which I have no proof) is that it might depend on the
amount of buffer in the connection; if there's enough buffer to hold
your JSON description it'll work, because you'll have sent the JSON
before the destination has spotted the terminator; if you've
not got much buffering (e.g. on a local fd) then the source might
get stuck trying to write the json or error because the destination
has closed the fd.

Dave

 
 
 Alex
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
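
The failure mode Dave suspects above (the source erroring out because the
destination closed the fd after the terminator) can be reproduced in
miniature with a local socket pair. This is a sketch under assumptions and
not the virt-test setup: the one-byte "marker" and the JSON-looking
"trailer" are placeholders, send_trailer_after_peer_close is a hypothetical
helper name, and the exact errno may differ between systems.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Model: the sender writes an end marker, the receiver reads it and closes
 * its end right away, then the sender tries to write trailing data.
 * Returns 0 if the trailing write succeeded, the errno it failed with
 * otherwise, or -1 if the socket pair could not be created.
 */
static int send_trailer_after_peer_close(void)
{
    int sv[2];
    const char marker = 0x00; /* placeholder end-of-stream marker */
    const char trailer[] = "{\"vmdesc\": \"placeholder\"}";
    char seen;
    ssize_t n;
    int err;

    /* Get an error return from write() instead of dying from SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    /* Sender: end marker. Receiver: consume it, then close immediately. */
    n = write(sv[0], &marker, 1);
    assert(n == 1);
    n = read(sv[1], &seen, 1);
    assert(n == 1);
    close(sv[1]);

    /* Sender: trailing description data that nobody will read. */
    n = write(sv[0], trailer, sizeof(trailer) - 1);
    err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}
```

On a Linux AF_UNIX stream socket the trailing write is expected to fail
with EPIPE. A transport with more buffering between the two ends, such as
TCP, may accept the same write without reporting an error, which would fit
the observation that only the fd variant fails.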



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-13 Thread Alexander Graf


On 13.02.15 12:23, Lucas Meneghel Rodrigues wrote:
 
 
 It is working fine with tcp migration in master indeed. The thing is,
 virt-test tests a bunch of variants, among them fd. fd is the only one
 failing from the list of things we do test (which also happen to be the
 virt-test default test set).

Can you please test whether the patch below makes things work for you again?


Alex

From ef6fde21007e62529799264f57a65c6bb3d0d414 Mon Sep 17 00:00:00 2001
From: Alexander Graf ag...@suse.de
Date: Sat, 14 Feb 2015 00:21:01 +0100
Subject: [PATCH] migration: Read JSON VM description on incoming migration

One of the really nice things about the VM description format is that it goes
over the wire when live migration is happening. Unfortunately QEMU today closes
any socket once it sees VM_EOF coming, so we never give the VMDESC the chance to
actually land on the wire.

This patch makes QEMU read the description as well. This way we ensure that
anything wire tapping us in between will get the chance to also interpret the
stream.

Along the way we also fix virt tests that assume that number_bytes_sent on the
sender side is equal to number_bytes_read which was true before the VMDESC
patches and is true again with this patch.

Signed-off-by: Alexander Graf ag...@suse.de

diff --git a/savevm.c b/savevm.c
index 8040766..ff4bead 100644
--- a/savevm.c
+++ b/savevm.c
@@ -929,6 +929,7 @@ int qemu_loadvm_state(QEMUFile *f)
     uint8_t section_type;
     unsigned int v;
     int ret;
+    int file_error_after_eof = -1;

     if (qemu_savevm_state_blocked(&local_err)) {
         error_report("%s", error_get_pretty(local_err));
@@ -1034,6 +1035,22 @@ int qemu_loadvm_state(QEMUFile *f)
         }
     }

+    file_error_after_eof = qemu_file_get_error(f);
+
+    /*
+     * Try 

Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-12 Thread Lucas Meneghel Rodrigues
OK, indeed I can reproduce the problem. It's specific to the file descriptor
migration. An easy way to reproduce it is by doing:

git clone https://github.com/autotest/virt-test.git

cd virt-test
./run -t qemu --bootstrap
./run -t qemu --tests type_specific.io-github-autotest-qemu.migrate.default.fd

That's it. I will see if I can bisect this quickly to pinpoint the QEMU
commit that brought the regression.

The qemu master commit I just tested is:

commit 449008f86418583a1f0fb946cf91ee7b4797317d
Merge: 5c697ae bc5baff
Author: Peter Maydell peter.mayd...@linaro.org
Date:   Wed Feb 11 05:14:41 2015 +

Merge remote-tracking branch
'remotes/awilliam/tags/vfio-update-20150210.0' into staging

RCU fixes and cleanup (Paolo Bonzini)
Switch to v2 IOMMU interface (Alex Williamson)
DEBUG build fix (Alexey Kardashevskiy)

# gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID 3BB08B22
# gpg: Good signature from Alex Williamson alex.william...@redhat.com
# gpg: aka Alex Williamson a...@shazbot.org
# gpg: aka Alex Williamson alwil...@redhat.com
# gpg: aka Alex Williamson alex.l.william...@gmail.com


* remotes/awilliam/tags/vfio-update-20150210.0:
  vfio: Fix debug message compile error
  vfio: Use vfio type1 v2 IOMMU interface
  vfio: unmap and free BAR data in instance_finalize
  vfio: free dynamically-allocated data in instance_finalize
  vfio: cleanup vfio_get_device error path, remove vfio_populate_device callback
  memory: unregister AddressSpace MemoryListener within BQL

Signed-off-by: Peter Maydell peter.mayd...@linaro.org


On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues look...@gmail.com
 wrote:

 From what the log says, after a round of migrations 'info migrate' does
 not respond after 4 minutes, timing out. Virt Test then shuts down the VM.
 When it tries to check the qcow2 image, it is corrupted. I'm checking out
 the latest master to see how reproducible this problem is.

 On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela quint...@redhat.com
 wrote:


 Hi

 while testing my changes I noticed that virt-test was failing.  I
 checked out master, and the failures are there.

 This is one extract of the log after the 1st failure.  Notice that it
 fails randomly, not every time.

 I have to go to bed right now, so if anybody beats me to a fix, I
 would be happy when I wake up.

 Thanks, Juan.


 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:07 DEBUG| (monitor hmp1)capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off
 22:54:07 DEBUG| (monitor hmp1)Migration status: active
 22:54:07 DEBUG| (monitor hmp1)total time: 2003 milliseconds
 22:54:07 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:07 DEBUG| (monitor hmp1)setup: 3 milliseconds
 22:54:07 DEBUG| (monitor hmp1)transferred ram: 67619 kbytes
 22:54:07 DEBUG| (monitor hmp1)throughput: 268.61 mbps
 22:54:07 DEBUG| (monitor hmp1)remaining ram: 103056 kbytes
 22:54:07 DEBUG| (monitor hmp1)total ram: 1065796 kbytes
 22:54:07 DEBUG| (monitor hmp1)duplicate: 224304 pages
 22:54:07 DEBUG| (monitor hmp1)skipped: 0 pages
 22:54:07 DEBUG| (monitor hmp1)normal: 16380 pages
 22:54:07 DEBUG| (monitor hmp1)normal bytes: 65520 kbytes
 22:54:07 DEBUG| (monitor hmp1)dirty sync count: 0
 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs)
 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate'
 22:54:09 DEBUG| Send command: info migrate
 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:09 DEBUG| (monitor hmp1)capabilities: xbzrle: off rdma-pin-all:
 off auto-converge: off zero-blocks: off
 22:54:09 DEBUG| (monitor hmp1)Migration status: active
 22:54:09 DEBUG| (monitor hmp1)total time: 4008 milliseconds
 22:54:09 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:09 DEBUG| (monitor hmp1)setup: 3 milliseconds
 22:54:09 DEBUG| (monitor hmp1)transferred ram: 131397 kbytes
 22:54:09 DEBUG| (monitor hmp1)throughput: 268.57 mbps
 22:54:09 DEBUG| (monitor hmp1)remaining ram: 31392 kbytes
 22:54:09 DEBUG| (monitor hmp1)total ram: 1065796 kbytes
 22:54:09 DEBUG| (monitor hmp1)duplicate: 226311 pages
 22:54:09 DEBUG| (monitor hmp1)skipped: 0 pages
 22:54:09 DEBUG| (monitor hmp1)normal: 32289 pages
 22:54:09 DEBUG| (monitor hmp1)normal bytes: 129156 kbytes
 22:54:09 DEBUG| (monitor hmp1)dirty sync count: 0
 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs)
 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate'
 22:54:11 DEBUG| Send command: info migrate
 22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register
 status
 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880)
 22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor)
 22:58:11 INFO | [qemu output] (Process terminated with status 

Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-12 Thread Lucas Meneghel Rodrigues
On Thu, Feb 12, 2015 at 8:56 PM, Lucas Meneghel Rodrigues look...@gmail.com
 wrote:

 OK, indeed I can reproduce the problem. It's specific to the
 file descriptor migration. An easy way to reproduce it is by doing:

 git clone https://github.com/autotest/virt-test.git

 cd virt-test
 ./run -t qemu --bootstrap
 ./run -t qemu
 --tests type_specific.io-github-autotest-qemu.migrate.default.fd


A little correction here, it should've been:

./run -t qemu
--tests type_specific.io-github-autotest-qemu.migrate.default.fd --qemu-bin
/path/to/qemu-built-from-master



 That's it. I will see if I can bisect this quickly to pinpoint the QEMU
 commit that brought the regression.

 The qemu master commit I just tested is:

 commit 449008f86418583a1f0fb946cf91ee7b4797317d
 Merge: 5c697ae bc5baff
 Author: Peter Maydell peter.mayd...@linaro.org
 Date:   Wed Feb 11 05:14:41 2015 +

 Merge remote-tracking branch
 'remotes/awilliam/tags/vfio-update-20150210.0' into staging

 RCU fixes and cleanup (Paolo Bonzini)
 Switch to v2 IOMMU interface (Alex Williamson)
 DEBUG build fix (Alexey Kardashevskiy)

 # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID
 3BB08B22
 # gpg: Good signature from Alex Williamson 
 alex.william...@redhat.com
 # gpg: aka Alex Williamson a...@shazbot.org
 # gpg: aka Alex Williamson alwil...@redhat.com
 # gpg: aka Alex Williamson 
 alex.l.william...@gmail.com

 * remotes/awilliam/tags/vfio-update-20150210.0:
   vfio: Fix debug message compile error
   vfio: Use vfio type1 v2 IOMMU interface
   vfio: unmap and free BAR data in instance_finalize
   vfio: free dynamically-allocated data in instance_finalize
   vfio: cleanup vfio_get_device error path, remove
 vfio_populate_device callback
   memory: unregister AddressSpace MemoryListener within BQL

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org


 On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues 
 look...@gmail.com wrote:

 From what the log says, after a round of migrations 'info migrate' does
 not respond after 4 minutes, timing out. Virt Test then shuts down the VM.
 When it tries to check the qcow2 image, it is corrupted. I'm checking out
 the latest master to see how reproducible this problem is.

 On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela quint...@redhat.com
 wrote:


 Hi

 While testing my changes I noticed that virt-test was failing.  I
 checked out master, and the failures are there.

 This is an extract of the log after the first failure.  Note that it
 fails randomly, not every time.

 I have to go to bed right now, so if anybody beats me to a fix, I
 would be happy when I wake up.

 Thanks, Juan.


 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:07 DEBUG| (monitor hmp1)capabilities: xbzrle: off
 rdma-pin-all: off auto-converge: off zero-blocks: off
 22:54:07 DEBUG| (monitor hmp1)Migration status: active
 22:54:07 DEBUG| (monitor hmp1)total time: 2003 milliseconds
 22:54:07 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:07 DEBUG| (monitor hmp1)setup: 3 milliseconds
 22:54:07 DEBUG| (monitor hmp1)transferred ram: 67619 kbytes
 22:54:07 DEBUG| (monitor hmp1)throughput: 268.61 mbps
 22:54:07 DEBUG| (monitor hmp1)remaining ram: 103056 kbytes
 22:54:07 DEBUG| (monitor hmp1)total ram: 1065796 kbytes
 22:54:07 DEBUG| (monitor hmp1)duplicate: 224304 pages
 22:54:07 DEBUG| (monitor hmp1)skipped: 0 pages
 22:54:07 DEBUG| (monitor hmp1)normal: 16380 pages
 22:54:07 DEBUG| (monitor hmp1)normal bytes: 65520 kbytes
 22:54:07 DEBUG| (monitor hmp1)dirty sync count: 0
 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs)
 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate'
 22:54:09 DEBUG| Send command: info migrate
 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:09 DEBUG| (monitor hmp1)capabilities: xbzrle: off
 rdma-pin-all: off auto-converge: off zero-blocks: off
 22:54:09 DEBUG| (monitor hmp1)Migration status: active
 22:54:09 DEBUG| (monitor hmp1)total time: 4008 milliseconds
 22:54:09 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:09 DEBUG| (monitor hmp1)setup: 3 milliseconds
 22:54:09 DEBUG| (monitor hmp1)transferred ram: 131397 kbytes
 22:54:09 DEBUG| (monitor hmp1)throughput: 268.57 mbps
 22:54:09 DEBUG| (monitor hmp1)remaining ram: 31392 kbytes
 22:54:09 DEBUG| (monitor hmp1)total ram: 1065796 kbytes
 22:54:09 DEBUG| (monitor hmp1)duplicate: 226311 pages
 22:54:09 DEBUG| (monitor hmp1)skipped: 0 pages
 22:54:09 DEBUG| (monitor hmp1)normal: 32289 pages
 22:54:09 DEBUG| (monitor hmp1)normal bytes: 129156 kbytes
 22:54:09 DEBUG| (monitor hmp1)dirty sync count: 0
 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs)
 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate'
 22:54:11 

Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-12 Thread Lucas Meneghel Rodrigues
From what the log says, after a round of migrations 'info migrate' does not
respond after 4 minutes, timing out. Virt Test then shuts down the VM. When
it tries to check the qcow2 image, it is corrupted. I'm checking out the
latest master to see how reproducible this problem is.

On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela quint...@redhat.com wrote:


 Hi

 While testing my changes I noticed that virt-test was failing.  I
 checked out master, and the failures are there.

 This is an extract of the log after the first failure.  Note that it
 fails randomly, not every time.

 I have to go to bed right now, so if anybody beats me to a fix, I
 would be happy when I wake up.

 Thanks, Juan.


 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:07 DEBUG| (monitor hmp1)capabilities: xbzrle: off rdma-pin-all:
 off auto-converge: off zero-blocks: off
 22:54:07 DEBUG| (monitor hmp1)Migration status: active
 22:54:07 DEBUG| (monitor hmp1)total time: 2003 milliseconds
 22:54:07 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:07 DEBUG| (monitor hmp1)setup: 3 milliseconds
 22:54:07 DEBUG| (monitor hmp1)transferred ram: 67619 kbytes
 22:54:07 DEBUG| (monitor hmp1)throughput: 268.61 mbps
 22:54:07 DEBUG| (monitor hmp1)remaining ram: 103056 kbytes
 22:54:07 DEBUG| (monitor hmp1)total ram: 1065796 kbytes
 22:54:07 DEBUG| (monitor hmp1)duplicate: 224304 pages
 22:54:07 DEBUG| (monitor hmp1)skipped: 0 pages
 22:54:07 DEBUG| (monitor hmp1)normal: 16380 pages
 22:54:07 DEBUG| (monitor hmp1)normal bytes: 65520 kbytes
 22:54:07 DEBUG| (monitor hmp1)dirty sync count: 0
 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs)
 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate'
 22:54:09 DEBUG| Send command: info migrate
 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:09 DEBUG| (monitor hmp1)capabilities: xbzrle: off rdma-pin-all:
 off auto-converge: off zero-blocks: off
 22:54:09 DEBUG| (monitor hmp1)Migration status: active
 22:54:09 DEBUG| (monitor hmp1)total time: 4008 milliseconds
 22:54:09 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:09 DEBUG| (monitor hmp1)setup: 3 milliseconds
 22:54:09 DEBUG| (monitor hmp1)transferred ram: 131397 kbytes
 22:54:09 DEBUG| (monitor hmp1)throughput: 268.57 mbps
 22:54:09 DEBUG| (monitor hmp1)remaining ram: 31392 kbytes
 22:54:09 DEBUG| (monitor hmp1)total ram: 1065796 kbytes
 22:54:09 DEBUG| (monitor hmp1)duplicate: 226311 pages
 22:54:09 DEBUG| (monitor hmp1)skipped: 0 pages
 22:54:09 DEBUG| (monitor hmp1)normal: 32289 pages
 22:54:09 DEBUG| (monitor hmp1)normal bytes: 129156 kbytes
 22:54:09 DEBUG| (monitor hmp1)dirty sync count: 0
 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs)
 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate'
 22:54:11 DEBUG| Send command: info migrate
 22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register
 status
 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880)
 22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor)
 22:58:11 INFO | [qemu output] (Process terminated with status 0)
 22:58:11 DEBUG| VM virt-tests-vm1 down (monitor)
 22:58:11 DEBUG| Host does not support OpenVSwitch: Missing command:
 ovs-vswitchd
 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10763)
 22:58:11 DEBUG| Shutting down VM virt-tests-vm1 (shell)
 22:58:11 DEBUG| Login command: 'ssh -o UserKnownHostsFile=/dev/null -o
 StrictHostKeyChecking=no -o PreferredAuthentications=password -p 5000
 root@192.168.10.200'
 22:58:11 DEBUG| virt-tests-vm1 alive now. Used to failed to get register
 info from guest 9 times
 22:58:13 INFO | [qemu output] (Process terminated with status 0)
 22:58:13 DEBUG| VM virt-tests-vm1 down (shell)
 22:58:14 DEBUG| Host does not support OpenVSwitch: Missing command:
 ovs-vswitchd
 22:58:14 DEBUG| Checking image file
 /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2
 22:58:14 DEBUG| Running '/bin/qemu-img info
 /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
 22:58:14 DEBUG| Running '/bin/qemu-img check
 /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
 22:58:14 ERROR| [stdout]
 22:58:14 ERROR| [stdout] 1 errors were found on the image.
 22:58:14 ERROR| [stdout] Data may be corrupted, or further writes to the
 image may corrupt it.
 22:58:14 ERROR| [stdout] 13495/163840 = 8.24% allocated, 0.03% fragmented,
 0.00% compressed clusters
 22:58:14 ERROR| [stdout] Image end offset: 885129216
 22:58:14 ERROR| [stderr] ERROR cluster 13505 refcount=1 reference=2
 22:58:14 ERROR| Errors found on image:
 '/mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
 22:58:14 WARNI| virt-tests-vm1 is not alive. Can not query the register
 status
 22:58:14 DEBUG| Thread quit. Used to failed to get register info from
 guest 20150212-225320-Mb1E4VV7 for 1 times.




-- 
Lucas


Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-12 Thread Alexander Graf


On 13.02.15 01:29, Lucas Meneghel Rodrigues wrote:
 Copying Alex.
 
 OK, after bisecting, this is what I've got:
 
 8118f0950fc77cce7873002a5021172dd6e040b5 is the first bad commit
 commit 8118f0950fc77cce7873002a5021172dd6e040b5
 Author: Alexander Graf ag...@suse.de
 Date:   Thu Jan 22 15:01:39 2015 +0100
 
 migration: Append JSON description of migration stream
 
 One of the annoyances of the current migration format is the fact that
 it's not self-describing. In fact, it's not properly describing at all.
 Some code randomly scattered throughout QEMU elaborates roughly how to
 read and write a stream of bytes.
 
 We discussed an idea during KVM Forum 2013 to add a JSON description of
 the migration protocol itself to the migration stream. This patch
 adds a section after the VM_END migration end marker that contains
 description data on what the device sections of the stream are
 composed of.
 
 This approach is backwards compatible with any QEMU version reading the
 stream, because QEMU just stops reading after the VM_END marker and
 ignores any data following it.
 
 With an additional external program this allows us to decipher the
 contents of any migration stream and hopefully make migration bugs
 easier to track down.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 Signed-off-by: Amit Shah amit.s...@redhat.com
 Signed-off-by: Juan Quintela quint...@redhat.com
 
 :040000 040000 e9aac242a61fbd05bbb0daa3e8877970e738 61df81f831bc86b29f65883523ea95abb36f1ec5 M hw
 :040000 040000 fe0659bed17d86c43657c26622d64fd44a1af037 7092a6b6515a3d0077f68ff2d80dbd74597a244f M include
 :040000 040000 d90d6f1fe839abf21a45eaba5829d5a6a22abeb1 c2b1dcda197d96657458d699c185e39ae45f3c6c M migration
 :100644 100644 98895fee81edfbc659fc42d467e930d06b1afa7d 80407662ad3ed860d33a9d35f5c44b1d19c4612b M savevm.c
 :040000 040000 cf218bc2b841cd51ebe3972635be2cfbb1de9dfa 7aaf3d10ef7f73413b228e854fe6f04317151e46 M tests
 
 So there you go. I'm going to sleep, if you need any extra help let me know.

So the major difference with this patch applied is that the sender could
send more data than the receiver wants to read. I can't see the actual
migrate command you used down there.

I haven't seen this actually being a problem so far, as the receiver
just close()s its file descriptor once it hits VM_EOF. This should only
break senders if they expect they can send more. That said, I think I
only tested offline migration (via exec:), so maybe QEMU is behaving
badly and actually wants to send all data, and just fails the migration
if it can't?


Alex
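
To illustrate the concern above, here is a minimal C sketch of what a sender
sees when the receiver closes its descriptor right after the end marker. It is
not QEMU code: it uses a UNIX socketpair inside one process, and
DEMO_EOF_MARKER and demo_send_after_receiver_close() are made-up names for
this example. Whether QEMU's real sender treats such an error as a failed
migration is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical end-of-stream marker, standing in for the real one. */
#define DEMO_EOF_MARKER 0x00

/*
 * Send a short stream that ends with the marker, let the receiver close
 * its end as soon as it reads the marker, then try to send trailing data.
 * Returns the errno from the trailing write, 0 if that write succeeded,
 * or -1 if the setup itself failed.
 */
int demo_send_after_receiver_close(void)
{
    int sv[2];
    const unsigned char stream[] = { 0x01, 0x02, DEMO_EOF_MARKER };
    const char trailer[] = "{\"trailing\": \"description\"}";
    unsigned char byte = 0xff;
    ssize_t n;
    int err = 0;

    /* The stream must end with the marker for the demo to make sense. */
    assert(stream[sizeof(stream) - 1] == DEMO_EOF_MARKER);

    /* Ignore SIGPIPE so the failed write reports an errno instead of
     * killing the process. This changes the process-wide disposition. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* Sender: the regular stream, terminated by the marker. */
    if (write(sv[0], stream, sizeof(stream)) != (ssize_t)sizeof(stream)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Receiver: read byte by byte until the marker, then close. */
    do {
        n = read(sv[1], &byte, 1);
    } while (n == 1 && byte != DEMO_EOF_MARKER);
    close(sv[1]);

    /* Sender: trailing section that nobody is left to read. */
    n = write(sv[0], trailer, sizeof(trailer) - 1);
    if (n < 0) {
        err = errno;
    }
    close(sv[0]);
    return err;
}
```

Called from a small main() on Linux, demo_send_after_receiver_close() is
expected to return EPIPE, i.e. the trailing write fails because the peer is
already gone.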



Re: [Qemu-devel] HEAD is failing virt-test on migration tests

2015-02-12 Thread Lucas Meneghel Rodrigues
Copying Alex.

OK, after bisecting, this is what I've got:

8118f0950fc77cce7873002a5021172dd6e040b5 is the first bad commit
commit 8118f0950fc77cce7873002a5021172dd6e040b5
Author: Alexander Graf ag...@suse.de
Date:   Thu Jan 22 15:01:39 2015 +0100

migration: Append JSON description of migration stream

One of the annoyances of the current migration format is the fact that
it's not self-describing. In fact, it's not properly describing at all.
Some code randomly scattered throughout QEMU elaborates roughly how to
read and write a stream of bytes.

We discussed an idea during KVM Forum 2013 to add a JSON description of
the migration protocol itself to the migration stream. This patch
adds a section after the VM_END migration end marker that contains
description data on what the device sections of the stream are composed
of.

This approach is backwards compatible with any QEMU version reading the
stream, because QEMU just stops reading after the VM_END marker and
ignores any data following it.

With an additional external program this allows us to decipher the
contents of any migration stream and hopefully make migration bugs
easier to track down.

Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Amit Shah amit.s...@redhat.com
Signed-off-by: Juan Quintela quint...@redhat.com

:040000 040000 e9aac242a61fbd05bbb0daa3e8877970e738 61df81f831bc86b29f65883523ea95abb36f1ec5 M hw
:040000 040000 fe0659bed17d86c43657c26622d64fd44a1af037 7092a6b6515a3d0077f68ff2d80dbd74597a244f M include
:040000 040000 d90d6f1fe839abf21a45eaba5829d5a6a22abeb1 c2b1dcda197d96657458d699c185e39ae45f3c6c M migration
:100644 100644 98895fee81edfbc659fc42d467e930d06b1afa7d 80407662ad3ed860d33a9d35f5c44b1d19c4612b M savevm.c
:040000 040000 cf218bc2b841cd51ebe3972635be2cfbb1de9dfa 7aaf3d10ef7f73413b228e854fe6f04317151e46 M tests

So there you go. I'm going to sleep, if you need any extra help let me know.

Cheers,

Lucas
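
For readers who want to picture the trailing section that the commit message
above describes, here is a rough C sketch of a reader for it. The layout is an
assumption made for this example (one section-type byte, a 32-bit big-endian
length, then the JSON text), and DEMO_SECTION_VMDESC and demo_read_vmdesc()
are hypothetical names, not QEMU's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed section-type value for the description section. */
#define DEMO_SECTION_VMDESC 0x06

/*
 * Parse the bytes that follow the end marker. Returns a malloc'd,
 * NUL-terminated copy of the JSON text, or NULL if the section is
 * absent, truncated, or memory runs out. The caller frees the result.
 */
char *demo_read_vmdesc(const uint8_t *buf, size_t len)
{
    uint32_t size;
    char *json;

    assert(buf != NULL);

    /* Need at least the type byte and the 4-byte length. */
    if (len < 5 || buf[0] != DEMO_SECTION_VMDESC) {
        return NULL;
    }

    /* 32-bit big-endian length of the JSON payload. */
    size = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 8) | (uint32_t)buf[4];

    /* Reject a length that runs past the data we actually have. */
    if (size > len - 5) {
        return NULL;
    }

    json = malloc((size_t)size + 1);
    if (json == NULL) {
        return NULL;
    }
    memcpy(json, buf + 5, size);
    json[size] = '\0';
    return json;
}
```

An old receiver never gets this far, since it stops at the end marker; only a
reader that deliberately continues past the marker would run code like this.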

On Thu, Feb 12, 2015 at 8:56 PM, Lucas Meneghel Rodrigues look...@gmail.com
 wrote:

 OK, indeed I can reproduce the problem. It's specific to the
 file descriptor migration. An easy way to reproduce it is by doing:

 git clone https://github.com/autotest/virt-test.git

 cd virt-test
 ./run -t qemu --bootstrap
 ./run -t qemu
 --tests type_specific.io-github-autotest-qemu.migrate.default.fd

 That's it. I will see if I can bisect this quickly to pinpoint the QEMU
 commit that brought the regression.

 The qemu master commit I just tested is:

 commit 449008f86418583a1f0fb946cf91ee7b4797317d
 Merge: 5c697ae bc5baff
 Author: Peter Maydell peter.mayd...@linaro.org
 Date:   Wed Feb 11 05:14:41 2015 +

 Merge remote-tracking branch
 'remotes/awilliam/tags/vfio-update-20150210.0' into staging

 RCU fixes and cleanup (Paolo Bonzini)
 Switch to v2 IOMMU interface (Alex Williamson)
 DEBUG build fix (Alexey Kardashevskiy)

 # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID
 3BB08B22
 # gpg: Good signature from Alex Williamson 
 alex.william...@redhat.com
 # gpg: aka Alex Williamson a...@shazbot.org
 # gpg: aka Alex Williamson alwil...@redhat.com
 # gpg: aka Alex Williamson 
 alex.l.william...@gmail.com

 * remotes/awilliam/tags/vfio-update-20150210.0:
   vfio: Fix debug message compile error
   vfio: Use vfio type1 v2 IOMMU interface
   vfio: unmap and free BAR data in instance_finalize
   vfio: free dynamically-allocated data in instance_finalize
   vfio: cleanup vfio_get_device error path, remove
 vfio_populate_device callback
   memory: unregister AddressSpace MemoryListener within BQL

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org


 On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues 
 look...@gmail.com wrote:

 From what the log says, after a round of migrations 'info migrate' does
 not respond after 4 minutes, timing out. Virt Test then shuts down the VM.
 When it tries to check the qcow2 image, it is corrupted. I'm checking out
 the latest master to see how reproducible this problem is.

 On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela quint...@redhat.com
 wrote:


 Hi

 While testing my changes I noticed that virt-test was failing.  I
 checked out master, and the failures are there.

 This is an extract of the log after the first failure.  Note that it
 fails randomly, not every time.

 I have to go to bed right now, so if anybody beats me to a fix, I
 would be happy when I wake up.

 Thanks, Juan.


 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
 22:54:07 DEBUG| (monitor hmp1)capabilities: xbzrle: off
 rdma-pin-all: off auto-converge: off zero-blocks: off
 22:54:07 DEBUG| (monitor hmp1)Migration status: active
 22:54:07 DEBUG| (monitor hmp1)total time: 2003 milliseconds
 22:54:07 DEBUG| (monitor hmp1)expected downtime: 300 milliseconds
 22:54:07 DEBUG| (monitor hmp1)