Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

Peter Xu Wed, 24 May 2017 04:23:52 -0700

On Wed, May 24, 2017 at 12:37:20PM +0300, Alexey wrote:
> On Wed, May 24, 2017 at 03:53:05PM +0800, Peter Xu wrote:
> > On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:
> > > This patch provides blocktime calculation per vCPU,
> > > as a summary and as a overlapped value for all vCPUs.
> > > 
> > > This approach was suggested by Peter Xu, as an improvements of
> > > previous approch where QEMU kept tree with faulted page address and cpus 
> > > bitmask
> > > in it. Now QEMU is keeping array with faulted page address as value and 
> > > vCPU
> > > as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > list for blocktime per vCPU (could be traced with page_fault_addr)
> > > 
> > > Blocktime will not calculated if postcopy_blocktime field of
> > > MigrationIncomingState wasn't initialized.
> > > 
> > > Signed-off-by: Alexey Perevalov <a.pereva...@samsung.com>
> > > ---
> > >  migration/postcopy-ram.c | 102 
> > > ++++++++++++++++++++++++++++++++++++++++++++++-
> > >  migration/trace-events   |   5 ++-
> > >  2 files changed, 105 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index d647769..e70c44b 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -23,6 +23,7 @@
> > >  #include "postcopy-ram.h"
> > >  #include "sysemu/sysemu.h"
> > >  #include "sysemu/balloon.h"
> > > +#include <sys/param.h>
> > >  #include "qemu/error-report.h"
> > >  #include "trace.h"
> > >  
> > > @@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char 
> > > *block_name, void *host_addr,
> > >      return 0;
> > >  }
> > >  
> > > +static int get_mem_fault_cpu_index(uint32_t pid)
> > > +{
> > > +    CPUState *cpu_iter;
> > > +
> > > +    CPU_FOREACH(cpu_iter) {
> > > +        if (cpu_iter->thread_id == pid) {
> > > +            return cpu_iter->cpu_index;
> > > +        }
> > > +    }
> > > +    trace_get_mem_fault_cpu_index(pid);
> > > +    return -1;
> > > +}
> > > +
> > > +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> > > +        RAMBlock *rb)
> > > +{
> > > +    int cpu;
> > > +    unsigned long int nr_bit;
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> > > +    int64_t now_ms;
> > > +
> > > +    if (!dc || ptid == 0) {
> > > +        return;
> > > +    }
> > > +    cpu = get_mem_fault_cpu_index(ptid);
> > > +    if (cpu < 0) {
> > > +        return;
> > > +    }
> > > +    nr_bit = get_copied_bit_offset(addr);
> > > +    if (test_bit(nr_bit, mis->copied_pages)) {
> > > +        return;
> > > +    }
> > > +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +    if (dc->vcpu_addr[cpu] == 0) {
> > > +        atomic_inc(&dc->smp_cpus_down);
> > > +    }
> > > +
> > > +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> > > +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
> > > +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> > 
> > Looks like this is not what you and Dave have discussed?
> > 
> > (Btw, sorry to have not followed the thread recently, so I just went
> >  over the discussion again...)
> > 
> > What I see that Dave suggested is (I copied from Dave's email):
> > 
> > blocktime_start:
> >    set CPU stall address
> >    check bitmap entry
> >      if set then zero stall-address
> > 
> > While here it is:
> > 
> > blocktime_start:
> >    check bitmap entry
> >      if set then return
> >    set CPU stall address
> > 
> > I don't think current version can really solve the risk condition. See
> > this possible sequence:
> > 
> >        receive-thread             fault-thread 
> >        --------------             ------------
> >                                   blocktime_start
> >                                     check bitmap entry,
> >                                       if set then return
> >        blocktime_end
> >          set bitmap entry
> >          read CPU stall address,
> >            if none-0 then zero it
> >                                     set CPU stall address [1]
> >         
> > Then imho the address set at [1] will be stall again until forever.
> > 
> agree, I check is in incorrect order
> 
> > I think we should follow exactly what Dave has suggested.
> > 
> > And.. after a second thought, I am afraid even this would not satisfy
> > all risk conditions. What if we consider the UFFDIO_COPY ioctl in?
> > AFAIU after UFFDIO_COPY the faulted vcpu can be running again, then
> > the question is, can it quickly trigger another page fault?
> >
> yes, it can
> 
> > Firstly, a workable sequence is (adding UFFDIO_COPY ioctl in, and
> > showing vcpu-thread X as well):
> > 
> >   receive-thread       fault-thread        vcpu-thread X
> >   --------------       ------------        -------------
> >                                            fault at addr A1
> >                        fault_addr[X]=A1
> >   UFFDIO_COPY page A1
> >   check fault_addr[X] with A1
> >     if match, clear fault_addr[X]
> >                                            vcpu X starts
> > 
> > This is fine.
> > 
> > While since "vcpu X starts" can be right after UFFDIO_COPY, can this
> > be possible?
> Previous picture isn't possible, due to mark_postcopy_blocktime_end
> is being called right after ioctl, and vCPU is waking up
> inside ioctl, so check fault_addr will be after vcpu X starts.
> 
> > 
> >   receive-thread       fault-thread        vcpu-thread X
> >   --------------       ------------        -------------
> >                                            fault at addr A1
> >                        fault_addr[X]=A1
> >   UFFDIO_COPY page A1
> >                                            vcpu X starts
> >                                            fault at addr A2
> >                        fault_addr[X]=A2
> >   check fault_addr[X] with A1
> >     if match, clear fault_addr[X]
> >         ^
> >         |
> >         +---------- here it will not match since now fault_addr[X]==A2
> > 
> > Then looks like fault_addr[X], which is currently A1, will stall
> > again?
> 
> It will be another address(A2), but probably the same vCPU and if in
> this case blocktime_start will be called before blocktime_end we lost
> block time for page A1. Address of the page is unique key in this
> process, not vCPU ;)


I failed to understand the last sentence, anyway...

> Here maybe reasonable to wake up vCPU after blocktime_end.

... I guess this can solve the problem. :)

> 
> 
> 
> > 
> > (I feel like finally we may need something like a per-cpu lock... or I
> >  must have missed something)
> I think no, because locking time of the vCPU is critical in this process.

Yes.

> > 
> > > +
> > > +    trace_mark_postcopy_blocktime_begin(addr, dc, 
> > > dc->page_fault_vcpu_time[cpu],
> > > +            cpu);
> > > +}
> > > +
> > > +static void mark_postcopy_blocktime_end(uint64_t addr)
> > > +{
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> > > +    int i, affected_cpu = 0;
> > > +    int64_t now_ms;
> > > +    bool vcpu_total_blocktime = false;
> > > +    unsigned long int nr_bit;
> > > +
> > > +    if (!dc) {
> > > +        return;
> > > +    }
> > > +    /* mark that page as copied */
> > > +    nr_bit = get_copied_bit_offset(addr);
> > > +    set_bit_atomic(nr_bit, mis->copied_pages);
> > > +
> > > +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +
> > > +    /* lookup cpu, to clear it,
> > > +     * that algorithm looks straighforward, but it's not
> > > +     * optimal, more optimal algorithm is keeping tree or hash
> > > +     * where key is address value is a list of  */
> > > +    for (i = 0; i < smp_cpus; i++) {
> > > +        uint64_t vcpu_blocktime = 0;
> > > +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
> > > +            continue;
> > > +        }
> > > +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
> > 
> > Why use *__nocheck() rather than atomic_xchg() or even atomic_read()?
> > Same thing happened a lot in current patch.
> atomic_read/atomic_xchg for mingw/(gcc on arm32) has build time check,
> 
> QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));
> 
> it prevents using 64 atomic operation on 32 architecture, just mingw I
> think, but postcopy-ram.c isn't compiling for mingw.
> On other 32 platforms as I know clang/gcc allow to use 8 bytes
> long variables in built atomic operations. In arm32 it allows in
> builtin. But QEMU on arm32 still
> has that sanity check, and I think it's bug, so I just worked it around.
> Maybe better was to fix it.
> 
> I tested in docker, using follow command:
> make docker-test-build@debian-armhf-cross
> 
> And got following error
> 
> /tmp/qemu-test/src/migration/postcopy-ram.c: In function
> 'mark_postcopy_blocktime_begin':
> /tmp/qemu-test/src/include/qemu/compiler.h:86:30: error: static
> assertion failed: "not expecting: sizeof(*&dc->vcpu_addr[cpu]) >
> sizeof(void *)"
>  #define QEMU_BUILD_BUG_ON(x) _Static_assert(!(x), "not expecting: " #x)
> 
> when I used atomic_xchg,
> I agree with you, but I think need to fix atomic.h firstly and add additional
> #ifdef there.
> 
> And I didn't want to split 64 bit values onto 32 bit values, but I saw
> in mailing list people are doing it.

If this is a bug, then I guess the best way is to fix it. But before
that - can a 32bit system really do 64bit atomic ops? Is it really a
bug at all?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

Reply via email to