* Juan Quintela (quint...@redhat.com) wrote: > "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote: > > * Juan Quintela (quint...@redhat.com) wrote: > >> "Dr. David Alan Gilbert (git)" <dgilb...@redhat.com> wrote: > >> > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > >> > > >> > My e4d633207 patch has an over zealous sanity check that checked > >> > the lengths of the RAM Blocks on source/destination were the same. This > >> > isn't true because of the 'used_length' trick for RAM blocks like the > >> > ACPI table that vary in size. > >> > > >> > Prior to that patch RDMA would also fail in this case, but it should > >> > now work with the changes in the set e4d633207 is in. > >> > > >> > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > >> > > >> > Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f > >> > --- > >> > migration/rdma.c | 13 +++++++------ > >> > 1 file changed, 7 insertions(+), 6 deletions(-) > >> > > >> > diff --git a/migration/rdma.c b/migration/rdma.c > >> > index f106b2a..1d094b0 100644 > >> > --- a/migration/rdma.c > >> > +++ b/migration/rdma.c > >> > @@ -3338,14 +3338,15 @@ static int qemu_rdma_registration_stop(QEMUFile > >> > *f, void *opaque, > >> > for (i = 0; i < nb_dest_blocks; i++) { > >> > network_to_dest_block(&rdma->dest_blocks[i]); > >> > > >> > - /* We require that the blocks are in the same order */ > >> > + /* We require that the blocks are in the same order, > >> > + * but the used_length trick for acpi blocks means that > >> > + * the destination can validly be larger than the source > >> > + */ > >> > if (rdma->dest_blocks[i].length != local->block[i].length) { > >> > >> Should we change the check to be that destination is bigger or equal > >> than source? > >> > >> With your change, we only remove the check? > > > > I'm actually going to drop this change; so keep the error if they're > > different. > > > > My argument works like this (I've not yet found a good way to test it): > > > > 1) The source sends to the destination a list of RAM blocks in the > > qemu-file stream > > 2) The destination performs a resize on the RAM blocks to match the > > source > > so at this point the destination's block sizes should match. > > Humm, I *thought* that what the resize does is getting it bigger, but if > destination is bigger, it does nothing, no?
The code in migration/ram.c calls qemu_ram_resize for length != block->used_length and similalry qemu_ram_resize always seems to set block->used_length. I'm going to have to have more of a dig into this and figure out what's going on. Dave > > > > 3) The source sends a series of RDMA block registration requests for the > > RAM > > 4) The destination sends a list of RAM registrations back to the source > > 5) This check is checking that this destination list matches the local > > list > > 6) As long as (4) happens after (2) then the size that the destination > > sees > > should always match the source. > > 7) I think 4 is after 2 due to a qemu_fflush > > > > So keeping this check guards against 7 not really being true and/or > > the destination populating it's list of blocks prior to (2) - which I have > > a sneaky feeling might be happening, but am not sure yet. > > > > > > > Dave > > > >> > >> Thanks, Juan. > >> > >> > >> > - ERROR(errp, "Block %s/%d has a different length %" > >> > PRIu64 > >> > - "vs %" PRIu64, local->block[i].block_name, > >> > i, > >> > - local->block[i].length, > >> > + fprintf(stderr, "INFO: Block %s/%d has a different > >> > length %" > >> > + PRIu64 "vs %" PRIu64, > >> > local->block[i].block_name, > >> > + i, local->block[i].length, > >> > rdma->dest_blocks[i].length); > >> > - rdma->error_state = -EINVAL; > >> > - return -EINVAL; > >> > } > >> > local->block[i].remote_host_addr = > >> > rdma->dest_blocks[i].remote_host_addr; > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK