On Wednesday 15 January 2014 17:59:10 Julian Stecklina wrote: > On Mi, 2014-01-15 at 17:04 +0100, Roger Pau Monné wrote: > > On 15/01/14 14:45, Julian Stecklina wrote: > > > On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote: > > >> On 15/01/14 13:05, Julian Stecklina wrote: > > >>> On 01/14/2014 05:13 PM, Peter Grehan wrote: > > >>>> Hi Julian, > > >>>> > > >>>>> is anyone working on KVM clock support for FreeBSD? If not, I > > >>>>> might take a shot at it. > > >>>> > > >>>> None I know of: go for it :) > > >>> > > >>> Works for me so far: > > >>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bde > > >>> dc65c3148f> >> > > >> Looking > > >> > > >> at the code it seems some common routines could be shared > > >> between the KVM PV clock and the Xen PV clock > > >> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to > > >> the guest has exactly the same format (see struct vcpu_time_info in > > >> Xen public headers). > > > > > > Yes, I know. Didn't get around to making it pretty yesterday evening. ;) > > > I'll post an updated patch, when I have some time. Any suggestions where > > > to put the two common functions? > > > > > >> At a first sight the KVM clock can benefit from using scale_delta > > >> (which is going to be faster than the C version implemented in > > >> kvmclock_get_timecount), > > > > > > At least somewhat on amd64. 32-bit looks pretty identical. > > > > > >> and xen_fetch_vcpu_tinfo is exactly the same > > >> > > >> as kvmclock_fetch. > > > > > > I think xen_fetch_vcpu_tinfo has a subtle bug: > > > 217 do { > > > 218 dst->version = src->version; > > > 219 rmb(); > > > 220 dst->tsc_timestamp = src->tsc_timestamp; > > > 221 dst->system_time = src->system_time; > > > 222 dst->tsc_to_system_mul = src->tsc_to_system_mul; > > > 223 dst->tsc_shift = src->tsc_shift; > > > 224 rmb(); > > > 225 } while ((src->version & 1) | (dst->version ^ > > > > > > src->version)); > > > > > > In line 225 src->version is potentially read twice. If you consider the > > > following schedule: > > > > > > host starts updating data, sets src->version to 1 > > > guest reads src->version (1) and stores it into dst->version. > > > guest copies inconsistent data > > > guest reads src->version (1) and computes xor with dst->version. > > > host finishes updating data and sets src->version to 2 > > > guest reads src->version (2) and checks whether lower bit is not set. > > > while loop exits with inconsistent data! > > > > > > I think the C standard (at least C11) permits this as it does not > > > specify in which order the two reads in line 225 need to happen. > > > > According to the operator precedence and associativity in C > > (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_preceden > > ce), if I'm reading it right, the condition in the while line will be > > evaluated as follows (because of the left-to-right associativity of the > > > > | operator): > > 1. (src->version & 1) > > 2. (dst->version ^ src->version) > > 3. result of 1 | result of 2 > > > > So AFAICT the flow that you describe could never happen, because > > (src->version & 1) is always done before (dst->version ^ src->version). > > Operator precedence doesn't matter. Consider it written like this: > > or(and(src->version, 1), xor(dst->version, src->version)) > > C gives you no guarantees whether the and or the xor will be executed > first. There is no sequence point in between. And even with a sequence > point, the C11 memory model gives you no guarantees about how the reads > are ordered. > > This discussion is somewhat theoretical, because > a) the hypervisor will probably update the data structure in the > current vCPU context (making memory consistency issues go away). > b) the compiler will likely not be an asshole. ;) > > Pseudocode for a better fetch could be: > > dst->version = src->version; > rmb(); > ... read data ... > rmb(); > version_post = src->version; > rmb(); <- keeps the compiler from reading src->version multiple times > > and then doing the test with version_post and dst->version. > Unfortunately, rmb() expands into lfence, even if there is no need for > that here.
There is the __compiler_membar() macro in <sys/cdefs.h> that you could use if this code is x86-specific (and thus knows it only needs a compiler barrier). -- John Baldwin _______________________________________________ freebsd-virtualization@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization To unsubscribe, send any mail to "freebsd-virtualization-unsubscr...@freebsd.org"