On Tue, Aug 11, 2015 at 3:52 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>
>
> On 07/08/2015 19:03, Alvise Rigo wrote:
>> +static inline int cpu_physical_memory_excl_atleast_one_clean(ram_addr_t addr)
>> +{
>> +    unsigned long *bitmap = ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE];
>> +    unsigned long next, end;
>> +
>> +    if (likely(smp_cpus <= BITS_PER_LONG)) {
>
> This only works if smp_cpus divides BITS_PER_LONG, i.e. BITS_PER_LONG %
> smp_cpus == 0.

You are right,

>
> Why not have a separate function for the cpu == smp_cpus case?

I'll rework this part a bit.

>
> I don't think real hardware has ll/sc per CPU. Can we have the bitmap as:
>
> - 0 if one or more CPUs have the address set to exclusive, _and_ no CPU
>   has done a concurrent access
>
> - 1 if no CPUs have the address set to exclusive, _or_ one CPU has done
>   a concurrent access.
>
> Then:
>
> - ll sets the bit to 0, and requests a flush if it was 1
>
> - when setting a TLB entry, set it to TLB_EXCL if the bitmap has 0
>
> - in the TLB_EXCL slow path, set the bit to 1 and, for conditional
>   stores, succeed if the bit was 0
>
> - when removing an exclusive entry, set the bit to 1

This can lead to an excessive rate of flush requests, since for one CPU
that removes the TLB_EXCL flag, all the others that are competing for the
same excl address will need to flush the entire cache and start all over
again.

If for a start we want to have a simpler implementation, I can revert
back to the one-bit design.

alvise

>
> Paolo
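
For reference, the one-bit scheme quoted above could be sketched roughly as follows. This is only an illustration of the four rules, not code from the series: ExclBitmap, NUM_PAGES and the excl_* helpers are hypothetical names, the bitmap is a plain array rather than ram_list.dirty_memory[], and the read-modify-write steps are not atomic, which a real multi-threaded implementation would need.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define NUM_PAGES 128 /* hypothetical size, for illustration only */

/* One bit per page:
 *   0 = some CPU holds the page exclusive and nobody has disturbed it
 *   1 = no CPU holds it exclusive, or a concurrent access happened */
typedef struct {
    unsigned long bits[(NUM_PAGES + BITS_PER_LONG - 1) / BITS_PER_LONG];
} ExclBitmap;

/* Start with every bit at 1: nothing is exclusive yet. */
static void excl_init(ExclBitmap *b)
{
    memset(b->bits, 0xff, sizeof(b->bits));
}

static bool excl_test(const ExclBitmap *b, size_t page)
{
    assert(page < NUM_PAGES);
    return (b->bits[page / BITS_PER_LONG] >> (page % BITS_PER_LONG)) & 1UL;
}

/* Set the bit to 'value' and return the previous value (NOT atomic). */
static bool excl_exchange(ExclBitmap *b, size_t page, bool value)
{
    bool old = excl_test(b, page);
    unsigned long mask = 1UL << (page % BITS_PER_LONG);

    if (value) {
        b->bits[page / BITS_PER_LONG] |= mask;
    } else {
        b->bits[page / BITS_PER_LONG] &= ~mask;
    }
    return old;
}

/* ll: set the bit to 0; the caller must request a flush if it was 1. */
static bool excl_load_link(ExclBitmap *b, size_t page)
{
    return excl_exchange(b, page, false);
}

/* TLB fill: the entry gets TLB_EXCL if the bitmap has 0. */
static bool excl_tlb_fill_is_excl(const ExclBitmap *b, size_t page)
{
    return !excl_test(b, page);
}

/* TLB_EXCL slow path: set the bit to 1; a conditional store
 * succeeds only if the bit was 0. */
static bool excl_slow_path_store(ExclBitmap *b, size_t page)
{
    return !excl_exchange(b, page, true);
}

/* Removing an exclusive entry: set the bit to 1. */
static void excl_remove(ExclBitmap *b, size_t page)
{
    excl_exchange(b, page, true);
}
```

With this layout, an ll followed by a conditional store on the same page succeeds once, and any intervening excl_remove() or slow-path store makes the next conditional store fail. It also shows where the flush concern comes from: every ll that finds the bit at 1 has to request a flush.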