On Sep 25 16:18, Peter Maydell wrote: > On 28 August 2018 at 21:03, Aaron Lindsay <aclin...@gmail.com> wrote: > > On Jun 28 17:40, Peter Maydell wrote: > >> I had a look through the patchset, and I still can't see > >> anything here which indicates how we're handling migration. > > > > I'll admit that I had not previously thought through everything > > exhaustively, but I think I've convinced myself that *almost* everything > > is already handled by calling .readfn/writefn on the registers for > > migration... more below. > > Thanks for writing up this explanation; it's very helpful. Sorry > it's taken me a while to get to it -- you unfortunately sent it > just as I was going on holiday...
No problem - I noticed you went quiet on the list right when I sent it and figured I scared you away! I've got a v6 staged - If you don't come up with significant design changes based on my response below, I'll plan to send it in the next few days. > >> If this field is only ever used as a temporary while we're > >> between op_start and op_finish, then we can just return it from > >> start and take it as an argument to finish, and we're fine > >> (migration of the PMCCNTR_EL0 register will be done by calling > >> its readfn on the source and its writefn on the destination). > >> > >> If there's a reason for having this be a field in the state struct > >> (I think you said counter overflow)? then we need to be sure that > >> migration is going to do the right thing and that we don't for instance > >> lose overflow indications during a migration. > > > > I think we do need additional state for supporting counter overflow > > during normal operation (more than just between op_start and op_finish), > > but I don't think sending additional state over the wire is necessary > > for migration. Let me lay out my reasoning to see if you think it makes > > sense: > > > > I observe that we seem to deal mostly with the following values with > > respect to PMU counters, and I think it might be easier to discuss this > > in terms of values rather than variables: > > > > AC = current architectural value of PMCCNTR > > AL = architectural value of PMCCNTR when last observed/updated by > > the OS > > UC = current underlying counter value > > UL = underlying counter value when PMCCNTR last observed/updated by > > OS > > D = difference between underlying counter value and architectural > > value > > > > At any given point in time, the following should then be true during > > normal execution (i.e. between OS accesses to the counter): > > UL - AL == D == UC - AC > > > > The approach taken previous to my patchset was to store D in > > cp15.c15_ccnt most of the time, updating that variable to hold AC when > > read (by doing `c15_ccnt = UC - c15_ccnt`... an implementation using `AC > > = UC - D`, which is a different way to write `D = UC - AC` from above). > > This operation was reversed after a write completes, so that c15_ccnt > > again holds a value corresponding to D. > > > > Only storing D works well and minimizes memory usage if you don't care > > about detecting overflows. However, without some frame of reference for > > what the last observed architectural value was relative to its current > > value, we have no way to know if the counter has overflowed in the > > intervening time. In other words, we need to know both AC and AL to know > > if the counter has overflowed. Assuming we store D and can obtain UC we > > can calculate AC, but we need to store at least one additional piece of > > information to obtain AL (either AL itself or UL). > > In this scheme, how do you tell the difference between "AC > AL > because it hasn't overflowed" and "AC > AL because it has overflowed > and run all the way round past AL again" ? That is, I'm not sure that > storing AL helps you to reliably detect overflow. I wrestled with how to accomplish generic overflow detection in a foolproof way that fits with the event-driven design. My current implementation (see [1]) adds a function called ns_per_count for each counter type, which is used to setup a timer to overflow at the earliest time a given counter is expected to overflow. My thinking is that even if this isn't perfect (i.e. if there's a little 'skid'), it is likely to be close enough to prevent the type of wrap-around double-overflow you describe. I'm not wedded to this design - please don't hesitate to let me know if you think of something better. > > In my patchset so far, I'm storing AL in c15_ccnt and D in > > c15_ccnt_delta during normal operation and AL/AC in c15_ccnt, UL/UC in > > c15_ccnt_delta between op_start and op_finish (note that AL == AC, UL == > > UC, and D == 0 during this time). Also note that UL/UC are only stored > > between op_start and op_finish calls to avoid losing time, and are not > > needed to maintain architectural state across migrations. > > > Now for migration - The read/writefn's already marshal/unmarshal both > > the counter value and overflow state to their architectural values in > > PMCCNTR and PMOVSCLR. I think this would work on its own if we made > > two big (and wrong) assumptions: > > 1. PMOVSCLR is read after PMCCNTR on save (ensures all overflows > > which have occurred are recorded in PMOVSCLR) and written before > > it on load (ensures any overflow bits in PMOVSCLR cause the > > interrupt line to be raised) > > There is indeed no guaranteed ordering in which registers are saved/loaded. > Also, the load function mustn't raise interrupt lines: it has to transfer > the state just of this device, and trust that the device on the other > end restores its state into one that matches the current level of the > interrupt line. Okay, got it. > > 2. Time (as measured by `qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)`) > > does > > not change during the process of saving or loading CP registers > > (ensures PMCCNTR can't have overflowed between when it was saved > > and when PMOVSCLR was later saved). > > I think this assumption is true, isn't it? (VM migration happens while > the VM is stopped, and do_vm_stop() calls cpu_disable_ticks() which > means QEMU_CLOCK_VIRTUAL won't advance.) Ah, okay. I wasn't aware of this detail. -Aaron [1] - http://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg06843.html