On Sep 25 16:18, Peter Maydell wrote:
> On 28 August 2018 at 21:03, Aaron Lindsay <aclin...@gmail.com> wrote:
> > On Jun 28 17:40, Peter Maydell wrote:
> >> I had a look through the patchset, and I still can't see
> >> anything here which indicates how we're handling migration.
> >
> > I'll admit that I had not previously thought through everything
> > exhaustively, but I think I've convinced myself that *almost* everything
> > is already handled by calling .readfn/writefn on the registers for
> > migration... more below.
> 
> Thanks for writing up this explanation; it's very helpful. Sorry
> it's taken me a while to get to it -- you unfortunately sent it
> just as I was going on holiday...

No problem - I noticed you went quiet on the list right when I sent it
and figured I scared you away!

I've got a v6 staged - If you don't come up with significant design
changes based on my response below, I'll plan to send it in the next few
days.

> >> If this field is only ever used as a temporary while we're
> >> between op_start and op_finish, then we can just return it from
> >> start and take it as an argument to finish, and we're fine
> >> (migration of the PMCCNTR_EL0 register will be done by calling
> >> its readfn on the source and its writefn on the destination).
> >>
> >> If there's a reason for having this be a field in the state struct
> >> (I think you said counter overflow)? then we need to be sure that
> >> migration is going to do the right thing and that we don't for instance
> >> lose overflow indications during a migration.
> >
> > I think we do need additional state for supporting counter overflow
> > during normal operation (more than just between op_start and op_finish),
> > but I don't think sending additional state over the wire is necessary
> > for migration. Let me lay out my reasoning to see if you think it makes
> > sense:
> >
> > I observe that we seem to deal mostly with the following values with
> > respect to PMU counters, and I think it might be easier to discuss this
> > in terms of values rather than variables:
> >
> >         AC = current architectural value of PMCCNTR
> >         AL = architectural value of PMCCNTR when last observed/updated by
> >              the OS
> >         UC = current underlying counter value
> >         UL = underlying counter value when PMCCNTR last observed/updated by
> >              OS
> >         D = difference between underlying counter value and architectural
> >             value
> >
> > At any given point in time, the following should then be true during
> > normal execution (i.e. between OS accesses to the counter):
> >         UL - AL == D == UC - AC
> >
> > The approach taken previous to my patchset was to store D in
> > cp15.c15_ccnt most of the time, updating that variable to hold AC when
> > read (by doing `c15_ccnt = UC - c15_ccnt`... an implementation using `AC
> > = UC - D`, which is a different way to write `D = UC - AC` from above).
> > This operation was reversed after a write completes, so that c15_ccnt
> > again holds a value corresponding to D.
> >
> > Only storing D works well and minimizes memory usage if you don't care
> > about detecting overflows. However, without some frame of reference for
> > what the last observed architectural value was relative to its current
> > value, we have no way to know if the counter has overflowed in the
> > intervening time. In other words, we need to know both AC and AL to know
> > if the counter has overflowed. Assuming we store D and can obtain UC we
> > can calculate AC, but we need to store at least one additional piece of
> > information to obtain AL (either AL itself or UL).
> 
> In this scheme, how do you tell the difference between "AC > AL
> because it hasn't overflowed" and "AC > AL because it has overflowed
> and run all the way round past AL again" ? That is, I'm not sure that
> storing AL helps you to reliably detect overflow.

I wrestled with how to accomplish generic overflow detection in a
foolproof way that fits with the event-driven design. My current
implementation (see [1]) adds a function called ns_per_count for each
counter type, which is used to setup a timer to overflow at the earliest
time a given counter is expected to overflow. My thinking is that even
if this isn't perfect (i.e. if there's a little 'skid'), it is likely to
be close enough to prevent the type of wrap-around double-overflow you
describe. I'm not wedded to this design - please don't hesitate to let
me know if you think of something better. 

> > In my patchset so far, I'm storing AL in c15_ccnt and D in
> > c15_ccnt_delta during normal operation and AL/AC in c15_ccnt, UL/UC in
> > c15_ccnt_delta between op_start and op_finish (note that AL == AC, UL ==
> > UC, and D == 0 during this time). Also note that UL/UC are only stored
> > between op_start and op_finish calls to avoid losing time, and are not
> > needed to maintain architectural state across migrations.
> 
> > Now for migration - The read/writefn's already marshal/unmarshal both
> > the counter value and overflow state to their architectural values in
> > PMCCNTR and PMOVSCLR. I think this would work on its own if we made
> > two big (and wrong) assumptions:
> >         1. PMOVSCLR is read after PMCCNTR on save (ensures all overflows
> >            which have occurred are recorded in PMOVSCLR) and written before
> >            it on load (ensures any overflow bits in PMOVSCLR cause the
> >            interrupt line to be raised)
> 
> There is indeed no guaranteed ordering in which registers are saved/loaded.
> Also, the load function mustn't raise interrupt lines: it has to transfer
> the state just of this device, and trust that the device on the other
> end restores its state into one that matches the current level of the
> interrupt line.

Okay, got it.

> >         2. Time (as measured by `qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)`) 
> > does
> >            not change during the process of saving or loading CP registers
> >            (ensures PMCCNTR can't have overflowed between when it was saved
> >            and when PMOVSCLR was later saved).
> 
> I think this assumption is true, isn't it? (VM migration happens while
> the VM is stopped, and do_vm_stop() calls cpu_disable_ticks() which
> means QEMU_CLOCK_VIRTUAL won't advance.)

Ah, okay. I wasn't aware of this detail.

-Aaron

[1] - http://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg06843.html

Reply via email to