Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead
On 05/03/2015 10:55 PM, Juergen Gross wrote:
> I did a small measurement of the pure locking functions on bare metal
> without and with my patches.
>
> spin_lock() for the first time (lock and code not in cache) dropped from
> about 600 to 500 cycles.
>
> spin_unlock() for first time dropped from 145 to 87 cycles.
>
> spin_lock() in a loop dropped from 48 to 45 cycles.
>
> spin_unlock() in the same loop dropped from 24 to 22 cycles.

Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
main difference will be whether the branch predictor is warmed up rather
than whether the lock itself is in dcache, but it's much more likely that
the lock code is icache-hot if the code is lock intensive, making the
cold case moot. But that's pure speculation.

Could you see any differences in workloads beyond microbenchmarks?

Not that it's my call at all, but I think we'd need to see some concrete
improvements in real workloads before adding the complexity of more
pvops.

    J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead
On 04/30/2015 03:53 AM, Juergen Gross wrote:
> Paravirtualized spinlocks produce some overhead even if the kernel is
> running on bare metal. The main reason are the more complex locking
> and unlocking functions. Especially unlocking is no longer just one
> instruction but so complex that it is no longer inlined.
>
> This patch series addresses this issue by adding two more pvops
> functions to reduce the size of the inlined spinlock functions. When
> running on bare metal unlocking is again basically one instruction.

Out of curiosity, is there a measurable difference?

    J
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/11/2015 03:28 PM, Linus Torvalds wrote:
> On Feb 11, 2015 3:15 PM, "Jeremy Fitzhardinge" <jer...@goop.org> wrote:
> >
> > Right now it needs to be a locked operation to prevent read-reordering.
> > x86 memory ordering rules state that all writes are seen in a globally
> > consistent order, and are globally ordered wrt reads *on the same
> > addresses*, but reads to different addresses can be reordered wrt
> > writes.
>
> The modern x86 rules are actually much tighter than that.
>
> Every store is a release, and every load is an acquire. So a
> non-atomic store is actually a perfectly fine unlock. All preceding
> stores will be seen by other cpu's before the unlock, and while reads
> can pass stores, they only pass *earlier* stores.

Right, so in this particular instance, the read of the SLOWPATH flag
*can't* pass the previous unlock store, hence the need for an atomic
unlock or some other mechanism to prevent the read from being reordered.

    J
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> I agree, and I have to admit I am not sure I fully understand why
> unlock uses the locked add. Except we need a barrier to avoid the race
> with the enter_slowpath() users, of course. Perhaps this is the only
> reason?

Right now it needs to be a locked operation to prevent read-reordering.
x86 memory ordering rules state that all writes are seen in a globally
consistent order, and are globally ordered wrt reads *on the same
addresses*, but reads to different addresses can be reordered wrt writes.

So, if the unlocking add were not a locked operation:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_slowpath(lock, prev);

then the read of lock->tickets.tail can be reordered before the unlock,
which introduces a race:

	/* read reordered here */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) /* false */
		/* ... */;

	/* other CPU sets SLOWPATH and blocks */

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	/* other CPU hung */

So it doesn't *have* to be a locked operation. This should also work:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	lfence();	/* prevent read reordering */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_slowpath(lock, prev);

but in practice a locked add is cheaper than an lfence (or at least was).

This *might* be OK, but I think it's on dubious ground:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	/* read overlaps write, and so is ordered */
	if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
		__ticket_unlock_slowpath(lock, prev);

because I think Intel and AMD differed in interpretation about how
overlapping but different-sized reads & writes are ordered (or it simply
isn't architecturally defined).
If the slowpath flag is moved to head, then it would always have to be
locked anyway, because it needs to be atomic against other CPUs' RMW
operations setting the flag.

    J
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10/2015 05:26 AM, Oleg Nesterov wrote:
> On 02/10, Raghavendra K T wrote:
>> On 02/10/2015 06:23 AM, Linus Torvalds wrote:
>>
>>> 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>>> 	if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ...
>>>
>>> into something like
>>>
>>> 	val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
>>> 	if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...
>>>
>>> would be the right thing to do. Somebody should just check that I got
>>> that shift right, and that the tail is in the high bytes (head really
>>> needs to be high to work, if it's in the low byte(s) the xadd would
>>> overflow from head into tail which would be wrong).
>>
>> Unfortunately xadd could result in head overflow as tail is high.
>>
>> The other option was repeated cmpxchg which is bad I believe.
>> Any suggestions?
>
> Stupid question... what if we simply move SLOWPATH from .tail to .head?
> In this case arch_spin_unlock() could do xadd(tickets.head) and check
> the result.

Well, right now, "tail" is manipulated by locked instructions by CPUs
who are contending for the ticketlock, but head can be manipulated
unlocked by the CPU which currently owns the ticketlock. If SLOWPATH
moved into head, then non-owner CPUs would be touching head, requiring
everyone to use locked instructions on it.

That's the theory, but I don't see much (any?) code which depends on
that.

Ideally we could find a way so that pv ticketlocks could use a plain
unlocked add for the unlock like the non-pv case, but I just don't see a
way to do it.

> In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
> the whole .head_tail. Plus obviously more boring changes. This needs a
> separate patch even _if_ this can work.

Definitely.

> BTW. If we move "clear slowpath" into "lock" path, then probably trylock
> should be changed too? Something like below, we just need to clear SLOWPATH
> before cmpxchg.

How important / widely used is trylock these days?

    J

> Oleg.
> --- x/arch/x86/include/asm/spinlock.h
> +++ x/arch/x86/include/asm/spinlock.h
> @@ -109,7 +109,8 @@ static __always_inline int arch_spin_try
> 	if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
> 		return 0;
>
> -	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
> +	new.tickets.head = old.tickets.head;
> +	new.tickets.tail = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) + TICKET_LOCK_INC;
>
> 	/* cmpxchg is a full barrier, so nothing can move before it */
> 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/06/2015 06:49 AM, Raghavendra K T wrote:
> Paravirt spinlock clears slowpath flag after doing unlock.
> As explained by Linus currently it does:
> 	prev = *lock;
> 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>
> 	/* add_smp() is a full mb() */
>
> 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> 		__ticket_unlock_slowpath(lock, prev);
>
> which is *exactly* the kind of things you cannot do with spinlocks,
> because after you've done the "add_smp()" and released the spinlock
> for the fast-path, you can't access the spinlock any more. Exactly
> because a fast-path lock might come in, and release the whole data
> structure.

Yeah, that's an embarrassingly obvious bug in retrospect.

> Linus suggested that we should not do any writes to lock after unlock(),
> and we can move slowpath clearing to fastpath lock.

Yep, that seems like a sound approach.

> However it brings additional case to be handled, viz., slowpath still
> could be set when somebody does arch_trylock. Handle that too by ignoring
> slowpath flag during lock availability check.
>
> Reported-by: Sasha Levin
> Suggested-by: Linus Torvalds
> Signed-off-by: Raghavendra K T
> ---
> arch/x86/include/asm/spinlock.h | 70 -
> 1 file changed, 34 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index 625660f..0829f86 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
> 	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
> }
>
> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
> +{
> +	arch_spinlock_t old, new;
> +	__ticket_t diff;
> +
> +	old.tickets = READ_ONCE(lock->tickets);

Couldn't the caller pass in the lock state that it read rather than
re-reading it?
> +	diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
> +
> +	/* try to clear slowpath flag when there are no contenders */
> +	if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
> +	    (diff == TICKET_LOCK_INC)) {
> +		new = old;
> +		new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
> +		cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
> +	}
> +}
> +
> #else /* !CONFIG_PARAVIRT_SPINLOCKS */
> static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
> 						   __ticket_t ticket)
> @@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
> {
> }
>
> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
> +{
> +}
> +
> #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>
> static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
> 	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
>
> 	inc = xadd(&lock->tickets, inc);
> -	if (likely(inc.head == inc.tail))
> +	if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))

The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.

> 		goto out;
>
> 	inc.tail &= ~TICKET_SLOWPATH_FLAG;
> @@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
> 	} while (--count);
> 	__ticket_lock_spinning(lock, inc.tail);
> }
> -out:	barrier();	/* make sure nothing creeps before the lock is taken */
> +out:
> +	__ticket_check_and_clear_slowpath(lock);
> +
> +	barrier();	/* make sure nothing creeps before the lock is taken */

Which means that if the "goto out" path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.
> }
>
> static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
> @@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
> 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
> }
>
> -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
> -					    arch_spinlock_t old)
> -{
> -	arch_spinlock_t new;
> -
> -	BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
> -
> -	/* Perform the unlock on the "before" copy */
> -	old.tickets.head += TICKET_LOCK_INC;

NB (see below)

> -
> -	/* Clear the slowpath flag */
> -	new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
> -
> -	/*
> -	 * If the lock is uncontended, clear the flag - us
Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
On 08/13/2013 01:02 PM, Raghavendra K T wrote:
> * Ingo Molnar [2013-08-13 18:55:52]:
>
>> Would be nice to have a delta fix patch against tip:x86/spinlocks, which
>> I'll then backmerge into that series via rebasing it.
>
> There was a namespace collision of PER_CPU lock_waiting variable when
> we have both Xen and KVM enabled.
>
> Perhaps this week wasn't for me. Had run 100 times randconfig in a loop
> for the fix sent earlier :(.
>
> Ingo, below delta patch should fix it, IIRC, I hope you will be folding this
> back to patch 14/14 itself. Else please let me know.
> I have already run allnoconfig, allyesconfig, randconfig with below patch.
> But will test again. This should apply on top of tip:x86/spinlocks.
>
> ---8<---
> From: Raghavendra K T
>
> Fix Namespace collision for lock_waiting
>
> Signed-off-by: Raghavendra K T
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d442471..b8ef630 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -673,7 +673,7 @@ struct kvm_lock_waiting {
> static cpumask_t waiting_cpus;
>
> /* Track spinlock on which a cpu is waiting */
> -static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
> +static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);

Has static stopped meaning static?
    J

> static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> {
> @@ -685,7 +685,7 @@ static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> 	if (in_nmi())
> 		return;
>
> -	w = &__get_cpu_var(lock_waiting);
> +	w = &__get_cpu_var(klock_waiting);
> 	cpu = smp_processor_id();
> 	start = spin_time_start();
>
> @@ -756,7 +756,7 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>
> 	add_stats(RELEASED_SLOW, 1);
> 	for_each_cpu(cpu, &waiting_cpus) {
> -		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> +		const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
> 		if (ACCESS_ONCE(w->lock) == lock &&
> 		    ACCESS_ONCE(w->want) == ticket) {
> 			add_stats(RELEASED_SLOW_KICKED, 1);
Re: [PATCH RFC V9 1/19] x86/spinlock: Replace pv spinlocks with pv ticketlocks
On 06/01/2013 12:21 PM, Raghavendra K T wrote:
> x86/spinlock: Replace pv spinlocks with pv ticketlocks
>
> From: Jeremy Fitzhardinge

I'm not sure what the etiquette is here; I did the work while at Citrix,
but jer...@goop.org is my canonical email address. The Citrix address is
dead and bounces, so is useless for anything. Probably best to change it.

    J

> Rather than outright replacing the entire spinlock implementation in
> order to paravirtualize it, keep the ticket lock implementation but add
> a couple of pvops hooks on the slow path (long spin on lock, unlocking
> a contended lock).
>
> Ticket locks have a number of nice properties, but they also have some
> surprising behaviours in virtual environments. They enforce a strict
> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> scheduler does not schedule the cpus in the correct order, the system can
> waste a huge amount of time spinning until the next cpu can take the lock.
>
> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>
> To address this, we add two hooks:
> - __ticket_spin_lock which is called after the cpu has been
>   spinning on the lock for a significant number of iterations but has
>   failed to take the lock (presumably because the cpu holding the lock
>   has been descheduled). The lock_spinning pvop is expected to block
>   the cpu until it has been kicked by the current lock holder.
> - __ticket_spin_unlock, which on releasing a contended lock
>   (there are more cpus with tail tickets), it looks to see if the next
>   cpu is blocked and wakes it if so.
>
> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> functions causes all the extra code to go away.
> Signed-off-by: Jeremy Fitzhardinge
> Reviewed-by: Konrad Rzeszutek Wilk
> Tested-by: Attilio Rao
> [ Raghavendra: Changed SPIN_THRESHOLD ]
> Signed-off-by: Raghavendra K T
> ---
> arch/x86/include/asm/paravirt.h       | 32
> arch/x86/include/asm/paravirt_types.h | 10 ++
> arch/x86/include/asm/spinlock.h       | 53 +++--
> arch/x86/include/asm/spinlock_types.h |  4 --
> arch/x86/kernel/paravirt-spinlocks.c  | 15 +
> arch/x86/xen/spinlock.c               |  8 -
> 6 files changed, 61 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index cfdc9ee..040e72d 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
>
> #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
>
> -static inline int arch_spin_is_locked(struct arch_spinlock *lock)
> +static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
> +						   __ticket_t ticket)
> {
> -	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
> +	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
> }
>
> -static inline int arch_spin_is_contended(struct arch_spinlock *lock)
> +static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
> +						 __ticket_t ticket)
> {
> -	return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
> -}
> -#define arch_spin_is_contended	arch_spin_is_contended
> -
> -static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
> -{
> -	PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
> -}
> -
> -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
> -						 unsigned long flags)
> -{
> -	PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
> -}
> -
> -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
> -{
> -	return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
> -}
> -
> -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> -{
> -	PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
> +	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
> }
>
> #endif
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 0db1fca..d5deb6d 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -327,13 +327,11 @@ struct pv_mmu_ops {
> };
>
> struct arch_spinlock;
> +#include
> +
> struct pv_lock_ops {
> -	int (*spin_is_locked)(st
Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks
On 06/01/2013 01:14 PM, Andi Kleen wrote:
> FWIW I use the paravirt spinlock ops for adding lock elision
> to the spinlocks.

Does lock elision still use the ticketlock algorithm/structure, or are
they different? If they're still basically ticketlocks, then it seems to
me that they're complementary - hle handles the fastpath, and pv the
slowpath.

> This needs to be done at the top level (so the level you're removing)
>
> However I don't like the pv mechanism very much and would
> be fine with using a static key hook in the main path
> like I do for all the other lock types.

Right.

    J
Re: [PATCH 09/14] xen: events: Remove redundant check on unsigned variable
To be honest I'd nack this kind of patch. The test is redundant only in
the most trivial sense, one that the compiler can easily optimise away.
The point of the test is to make sure that the range is OK even if the
type subsequently becomes signed (to hold a -ve error, for example).

    J

Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 16, 2012 at 12:20:41PM +0530, Tushar Behera wrote:
>> No need to check whether unsigned variable is less than 0.
>>
>> CC: Konrad Rzeszutek Wilk
>
> Acked-by: Konrad Rzeszutek Wilk
>
>> CC: Jeremy Fitzhardinge
>> CC: xen-de...@lists.xensource.com
>> CC: virtualization@lists.linux-foundation.org
>> Signed-off-by: Tushar Behera
>> ---
>>  drivers/xen/events.c | 2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
>> index 4293c57..cadd7d1 100644
>> --- a/drivers/xen/events.c
>> +++ b/drivers/xen/events.c
>> @@ -216,7 +216,7 @@ static void xen_irq_info_pirq_init(unsigned irq,
>>  */
>> static unsigned int evtchn_from_irq(unsigned irq)
>> {
>> -	if (unlikely(WARN(irq < 0 || irq >= nr_irqs, "Invalid irq %d!\n", irq)))
>> +	if (unlikely(WARN(irq >= nr_irqs, "Invalid irq %d!\n", irq)))
>> 		return 0;
>>
>> 	return info_for_irq(irq)->evtchn;
>> --
>> 1.7.4.1

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks
On 05/13/2012 11:45 AM, Raghavendra K T wrote:
> On 05/07/2012 08:22 PM, Avi Kivity wrote:
>
> I could not come up with pv-flush results (also Nikunj had clarified that
> the result was on non-PLE)
>
>> I'd like to see those numbers, then.
>>
>> Ingo, please hold on the kvm-specific patches, meanwhile.
>
> 3 guests 8GB RAM, 1 used for kernbench
> (kernbench -f -H -M -o 20) other for cpuhog (shell script with while
> true do hackbench)
>
> 1x: no hogs
> 2x: 8hogs in one guest
> 3x: 8hogs each in two guests
>
> kernbench on PLE:
> Machine: IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU with 32
> core, with 8 online cpus and 4*64GB RAM.
>
> The average is taken over 4 iterations with 3 runs each (4*3=12), and
> stdev is calculated over the mean reported in each run.
>
> A): 8 vcpu guest
>
>              BASE                  BASE+patch             %improvement w.r.t.
>              mean (sd)             mean (sd)              patched kernel time
> case 1*1x:   61.7075 (1.17872)     60.93 (1.475625)       1.27605
> case 1*2x:   107.2125 (1.3821349)  97.506675 (1.3461878)  9.95401
> case 1*3x:   144.3515 (1.8203927)  138.9525 (0.58309319)  3.8855
>
> B): 16 vcpu guest
>
>              BASE                  BASE+patch             %improvement w.r.t.
>              mean (sd)             mean (sd)              patched kernel time
> case 2*1x:   70.524 (1.5941395)    69.68866 (1.9392529)   1.19867
> case 2*2x:   133.0738 (1.4558653)  124.8568 (1.4544986)   6.58114
> case 2*3x:   206.0094 (1.3437359)  181.4712 (2.9134116)   13.5218
>
> C): 32 vcpu guest
>
>              BASE                  BASE+patch             %improvement w.r.t.
>              mean (sd)             mean (sd)              patched kernel time
> case 4*1x:   100.61046 (2.7603485) 85.48734 (2.6035035)   17.6905

What does the "4*1x" notation mean? Do these workloads have overcommit
of the PCPU resources?

When I measured it, even quite small amounts of overcommit led to large
performance drops with non-pv ticket locks (on the order of 10%
improvements when there were 5 busy VCPUs on a 4 cpu system). I never
tested it on larger machines, but I guess that represents around 25%
overcommit, or 40 busy VCPUs on a 32-PCPU system.
    J
Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks
On 05/07/2012 06:49 AM, Avi Kivity wrote:
> On 05/07/2012 04:46 PM, Srivatsa Vaddagiri wrote:
>> * Raghavendra K T [2012-05-07 19:08:51]:
>>
>>> I'll get hold of a PLE machine and come up with the numbers soon, but
>>> I'll expect the improvement around 1-3% as it was in the last version.
>>
>> Deferring preemption (when vcpu is holding lock) may give us better
>> than 1-3% results on PLE hardware. Something worth trying IMHO.
>
> Is the improvement so low, because PLE is interfering with the patch, or
> because PLE already does a good job?

How does PLE help with ticket scheduling on unlock? I thought it would
just help with the actual spin loops.

    J
Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks
On 04/16/2012 09:36 AM, Ian Campbell wrote:
> On Mon, 2012-04-16 at 16:44 +0100, Konrad Rzeszutek Wilk wrote:
>> On Sat, Mar 31, 2012 at 09:37:45AM +0530, Srivatsa Vaddagiri wrote:
>>> * Thomas Gleixner [2012-03-31 00:07:58]:
>>>
>>>> I know that Peter is going to go berserk on me, but if we are running
>>>> a paravirt guest then it's simple to provide a mechanism which allows
>>>> the host (aka hypervisor) to check that in the guest just by looking
>>>> at some global state.
>>>>
>>>> So if a guest exits due to an external event it's easy to inspect the
>>>> state of that guest and avoid to schedule away when it was interrupted
>>>> in a spinlock held section. That guest/host shared state needs to be
>>>> modified to indicate the guest to invoke an exit when the last nested
>>>> lock has been released.
>>>
>>> I had attempted something like that long back:
>>>
>>> http://lkml.org/lkml/2010/6/3/4
>>>
>>> The issue is with ticketlocks though. VCPUs could go into a spin w/o
>>> a lock being held by anybody. Say VCPUs 1-99 try to grab a lock in
>>> that order (on a host with one cpu). VCPU1 wins (after VCPU0 releases it)
>>> and releases the lock. VCPU1 is next eligible to take the lock. If
>>> that is not scheduled early enough by host, then remaining vcpus would keep
>>> spinning (even though lock is technically not held by anybody) w/o making
>>> forward progress.
>>>
>>> In that situation, what we really need is for the guest to hint to host
>>> scheduler to schedule VCPU1 early (via yield_to or something similar).
>>>
>>> The current pv-spinlock patches however do not track which vcpu is
>>> spinning at what head of the ticketlock. I suppose we can consider
>>> that optimization in future and see how much benefit it provides (over
>>> plain yield/sleep the way it's done now).
>>
>> Right. I think Jeremy played around with this some time?
> 5/11 "xen/pvticketlock: Xen implementation for PV ticket locks" tracks
> which vcpus are waiting for a lock in "cpumask_t waiting_cpus" and
> tracks which lock each is waiting for in per-cpu "lock_waiting". This is
> used in xen_unlock_kick to kick the right CPU. There's a loop over only
> the waiting cpus to figure out who to kick.

Yes, and AFAIK the KVM pv-ticketlock patches do the same thing. If a
(V)CPU is asleep, then sending it a kick is pretty much equivalent to a
yield to (not precisely, but it should get scheduled soon enough, and it
won't be competing with a pile of VCPUs with no useful work to do).

    J
Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
On 01/19/2012 12:54 AM, Srivatsa Vaddagiri wrote:
>> That logic relies on the "kick" being level triggered, so that "kick"
>> before "block" will cause the block to fall out immediately. If you're
>> using "hlt" as the block and it has the usual edge-triggered behaviour,
>> what stops a "kick-before-hlt" from losing the kick?
>
> Hmm ... 'hlt' should result in a check for a kick request (in hypervisor
> context) before the vcpu is put to sleep. IOW vcpu1 that is attempting
> to kick vcpu0 will set a 'somebody_tried_kicking_vcpu0' flag, which the
> hypervisor should check before it puts vcpu0 to sleep because of the
> trapped 'hlt' instruction.
>
> Won't that trap the 'kick-before-hlt' case? What am I missing here?

Nothing, that sounds fine. It wasn't clear to me that your kick
operation left persistent state, and so has a level-triggered effect on
hlt.

    J
Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
On 01/17/2012 10:33 PM, Srivatsa Vaddagiri wrote:
> * Marcelo Tosatti [2012-01-17 09:02:11]:
>
>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>>> +{
>>> +	int cpu;
>>> +	int apicid;
>>> +
>>> +	add_stats(RELEASED_SLOW, 1);
>>> +
>>> +	for_each_cpu(cpu, &waiting_cpus) {
>>> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>>> +		if (ACCESS_ONCE(w->lock) == lock &&
>>> +		    ACCESS_ONCE(w->want) == ticket) {
>>> +			add_stats(RELEASED_SLOW_KICKED, 1);
>>> +			apicid = per_cpu(x86_cpu_to_apicid, cpu);
>>> +			kvm_kick_cpu(apicid);
>>> +			break;
>>> +		}
>>> +	}
>>
>> What prevents a kick from being lost here, if say, the waiter is at
>> local_irq_save in kvm_lock_spinning, before the lock/want assignments?
>
> The waiter does check for the lock becoming available before actually
> sleeping:
>
> +	/*
> +	 * check again make sure it didn't become free while
> +	 * we weren't looking.
> +	 */
> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
> +		add_stats(TAKEN_SLOW_PICKUP, 1);
> +		goto out;
> +	}

That logic relies on the "kick" being level triggered, so that "kick"
before "block" will cause the block to fall out immediately. If you're
using "hlt" as the block and it has the usual edge-triggered behaviour,
what stops a "kick-before-hlt" from losing the kick?

    J
Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
On 01/16/2012 09:24 PM, Alexander Graf wrote:
> This is true in case you're spinning. If on overcommit spinlocks would
> instead of spin just yield(), we wouldn't have any vcpu running that's
> just waiting for a late ticket.

Yes, but the reality is that most spinlocks are held for a short period
of time and there's a low likelihood of being preempted while within a
spinlock critical section. Therefore if someone else tries to get the
spinlock and there's contention, it's always worth spinning for a little
while because the lock will likely become free soon.

At least that's the case if the lock has low contention (shallow queue
depth and not in slow state). Again, maybe it makes sense to never spin
for deep queues or already-slowstate locks.

> We still have an issue finding the point in time when a vcpu could run
> again, which is what this whole series is about. My point above was that
> instead of doing a count loop, we could just do the normal spin dance
> and set the threshold to when we enable the magic to have another spin
> lock notify us in the CPU. That way we
>
> * don't change the uncontended case

I don't follow you. What do you mean by "the normal spin dance"? What do
you mean by "have another spinlock notify us in the CPU"? Don't change
which uncontended case? Do you mean in the locking path? Or the unlock
path? Or both?

> * can set the threshold on the host, which knows how contended the
>   system is

Hm, I'm not convinced that knowing how contended the system is is all
that useful overall. What's important is how contended a particular lock
is, and what state the current holder is in. If it's not currently
running, then knowing the overall system contention would give you some
idea about how long you need to wait for it to be rescheduled, but
that's getting pretty indirect.
I think the "slowpath if preempted while spinning" idea I mentioned in
the other mail is probably worth following up, since that gives specific
actionable information to the guest from the hypervisor. But lots of
caveats.

[[ A possible mechanism:

   * register ranges of [er]ips with the hypervisor
   * each range is paired with a "resched handler block"
   * if a vcpu is preempted within such a range, make sure it is
     rescheduled in the resched handler block

This is obviously akin to the exception mechanism, but it is partially
implemented by the hypervisor. It allows the spinlock code to be
unchanged from native, but makes use of a resched rather than an
explicit counter to determine when to slowpath the lock. And it's a nice
general mechanism that could be potentially useful elsewhere.

Unfortunately, it doesn't change the unlock path at all; it still needs
to explicitly test if a VCPU needs to be kicked on unlock. ]]

> And since we control what spin locks look like, we can for example
> always keep the pointer to it in a specific register so that we can
> handle pv_lock_ops.lock_spinning() inside there and fetch all the
> information we need from our pt_regs.

You've left a pile of parts of an idea lying around, but I'm not sure
what shape you intend it to be.

>>> Speaking of which - have you benchmarked performance degradation of pv
>>> ticket locks on bare metal? Last time I checked, enabling all the PV
>>> ops did incur significant slowdown which is why I went through the
>>> work to split the individual pv ops features up to only enable a few
>>> for KVM guests.
>>
>> The whole point of the pv-ticketlock work is to keep the pvops hooks
>> out of the locking fast path, so that the calls are only made on the
>> slow path - that is, when spinning too long on a contended lock, and
>> when releasing a lock that's in a "slow" state. In the fast path case
>> of no contention, there are no pvops, and the executed code path is
>> almost identical to native.
> You're still changing a tight loop that does nothing (CPU detects it and
> saves power) into something that performs calculations.

It still has a "pause" instruction in that loop, so that CPU mechanism will still come into play. "pause" doesn't directly "save power"; it's more about making sure that memory dependence cycles are broken and that two competing threads will make similar progress. Besides, I'm not sure adding a dec+test to a loop that's already got a memory read and compare in it is adding much in the way of "calculations".

J
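For concreteness, the loop shape being debated, as a userspace C sketch (SPIN_THRESHOLD and the helper names are hypothetical): the only additions over a native spin are the counter decrement and test, and the pause hint still runs on every iteration:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_THRESHOLD (1u << 11)   /* hypothetical spin count before blocking */

static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause");  /* breaks memory-dependence cycles in the loop */
#endif
}

/* Returns true if our ticket came up while spinning, false if the caller
 * should fall into the blocking slowpath and wait for a kick. */
static bool spin_for_ticket(_Atomic unsigned short *head, unsigned short my_ticket)
{
    for (unsigned count = SPIN_THRESHOLD; count != 0; count--) {   /* the added dec+test */
        if (atomic_load_explicit(head, memory_order_acquire) == my_ticket)
            return true;             /* lock acquired by plain spinning */
        cpu_relax();                 /* unchanged native loop body */
    }
    return false;                    /* spun too long: block */
}
```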
Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
On 01/16/2012 07:55 PM, Avi Kivity wrote:
> On 01/16/2012 08:40 AM, Jeremy Fitzhardinge wrote:
>>> That means we're spinning for n cycles, then notify the spinlock holder
>>> that we'd like to get kicked and go sleeping. While I'm pretty sure that it
>>> improves the situation, it doesn't solve all of the issues we have.
>>>
>>> Imagine we have an idle host. All vcpus can freely run and everyone can
>>> fetch the lock as fast as on real machines. We don't need to / want to go
>>> to sleep here. Locks that take too long are bugs that need to be solved on
>>> real hw just as well, so all we do is possibly incur overhead.
>> I'm not quite sure what your concern is. The lock is under contention, so
>> there's nothing to do except spin; all this patch adds is a variable
>> decrement/test to the spin loop, but that's not going to waste any more CPU
>> than the non-counting case. And once it falls into the blocking path, it's a
>> win because the VCPU isn't burning CPU any more.
> The wakeup path is slower though. The previous lock holder has to
> hypercall, and the new lock holder has to be scheduled, and transition
> from halted state to running (a vmentry). So it's only a clear win if
> we can do something with the cpu other than go into the idle loop.

Not burning power is a win too.

Actually what you want is something like "if you preempt a VCPU while it's spinning in a lock, then push it into the slowpath and don't reschedule it without a kick". But I think that interface would have a lot of fiddly corners.

J
Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
On 01/17/2012 01:47 AM, Avi Kivity wrote:
> On 01/16/2012 04:13 PM, Raghavendra K T wrote:
>>> Please drop all of these and replace with tracepoints in the appropriate
>>> spots. Everything else (including the histogram) can be reconstructed from
>>> the tracepoints in userspace.
>>>
>> I think Jeremy pointed that tracepoints use spinlocks and hence debugfs
>> is the option.. no ?
>>
> Yeah, I think you're right. What a pity.

Yes, I went through the same thought process.

J
Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
problem gets.

> Speaking of which - have you benchmarked performance degradation of pv ticket
> locks on bare metal? Last time I checked, enabling all the PV ops did incur
> significant slowdown which is why I went through the work to split the
> individual pv ops features up to only enable a few for KVM guests.

The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state. In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.

But as I mentioned above, I'd like to see some benchmarks to prove that's the case.

J

>
>> Changes in V4:
>> - rebased to 3.2.0 pre.
>> - use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching. (Avi)
>> - fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and related
>>   changes for UNHALT path to make pv ticket spinlock migration friendly. (Avi, Marcello)
>> - Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
>>   and capability (KVM_CAP_PVLOCK_KICK) (Avi)
>> - Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
>> - cumulative variable type changed (int ==> u32) in add_stat (Konrad)
>> - remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
>>
>> Changes in V3:
>> - rebased to 3.2-rc1
>> - use halt() instead of wait for kick hypercall.
>> - modify kick hypercall to do wakeup of halted vcpu.
>> - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c).
>> - fix the potential race when zero_stat is read.
>> - export debugfs_create_32 and add documentation to API.
>> - use static inline and enum instead of ADDSTAT macro.
>> - add barrier() after setting kick_vcpu.
>> - empty static inline function for kvm_spinlock_init.
>> - combine the patches one and two to reduce overhead.
>> - make KVM_DEBUGFS depend on DEBUGFS.
>> - include debugfs header unconditionally.
>>
>> Changes in V2:
>> - rebased patches to -rc9
>> - synchronization related changes based on Jeremy's changes
>>   (Jeremy Fitzhardinge ) pointed by Stephan Diestelhorst
>> - enabling 32 bit guests
>> - split patches into two more chunks
>>
>> Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5):
>>   Add debugfs support to print u32-arrays in debugfs
>>   Add a hypercall to KVM hypervisor to support pv-ticketlocks
>>   Added configuration support to enable debug information for KVM Guests
>>   pv-ticketlocks support for linux guests running on KVM hypervisor
>>   Add documentation on Hypercalls and features used for PV spinlock
>>
>> Test Set up :
>> The BASE patch is pre 3.2.0 + Jeremy's following patches:
>>   xadd (https://lkml.org/lkml/2011/10/4/328)
>>   x86/ticketlock (https://lkml.org/lkml/2011/10/12/496).
>> Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as BASE
>> (Note: locked add change is not taken yet)
>>
>> Results:
>> The performance gain is mainly because of reduced busy-wait time.
>> From the results we can see that patched kernel performance is similar to
>> BASE when there is no lock contention. But once we start seeing more
>> contention, patched kernel outperforms BASE (non PLE).
>> On PLE machine we do not see greater performance improvement because of PLE
>> complementing halt().
>>
>> 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
>> (kernbench -f -H -M -o 20), others for cpuhog (shell script while
>> true with an instruction)
>>
>> scenario A: unpinned
>>
>> 1x: no hogs
>> 2x: 8hogs in one guest
>> 3x: 8hogs each in two guests
>>
>> scenario B: unpinned, run kernbench on all the guests, no hogs.
>>
>> Dbench on PLE machine:
>> dbench run on all the guests simultaneously with
>> dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).
>>
>> Result for Non PLE machine :
>>
>> Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 cores, 64GB RAM
>>
>>               BASE                 BASE+patch           %improvement
>>               mean (sd)            mean (sd)
>> Scenario A:
>> case 1x:      164.233 (16.5506)    163.584 (15.4598)    0.39517
>> case 2x:      897.654 (543.993)    328.63 (103.771)     63.3901
>> case 3x:      2855.73 (2201.41)    315.029 (111.854)
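The fast-path/slow-path split described earlier in this thread can be sketched in userspace C11 (the flag layout and all names here are illustrative assumptions, not the actual kernel code): unlock is a single add, and the pvops-style call happens only when a waiter has flagged the lock as slow:

```c
#include <stdatomic.h>
#include <stdint.h>

#define TICKET_SLOWPATH_FLAG ((uint16_t)1)
#define TICKET_LOCK_INC      ((uint16_t)2)

struct arch_spinlock { _Atomic uint16_t head; _Atomic uint16_t tail; };

static int kicks;   /* stands in for the kick hypercall, so we can observe it */

static void pv_kick_waiter(struct arch_spinlock *lock, uint16_t next)
{
    (void)lock; (void)next;
    kicks++;        /* a real implementation would hypercall to wake the waiter */
}

/* Fast path: one add. The pvops-style call is reached only when a
 * long-spinning waiter has flagged the lock as "slow". */
static void ticket_unlock(struct arch_spinlock *lock)
{
    uint16_t next = (uint16_t)(atomic_fetch_add_explicit(&lock->head, TICKET_LOCK_INC,
                                                         memory_order_release)
                               + TICKET_LOCK_INC);
    if (atomic_load_explicit(&lock->tail, memory_order_relaxed) & TICKET_SLOWPATH_FLAG)
        pv_kick_waiter(lock, next);   /* slow path only */
}
```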
Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
On 01/10/2012 05:44 AM, Stefano Stabellini wrote:
> On Mon, 9 Jan 2012, Andrew Jones wrote:
>> I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
>> arch/x86/pci/xen.c it would be pretty easy to review for equivalence.
>> Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere else and
>> compile in the 3 files. I don't think it makes much sense to do it
>> though. XEN_DOM0 keeps things tidier now and might be useful later.
> we can keep things clean with the following:
>
> #ifdef CONFIG_LOCAL_APIC && CONFIG_IO_APIC && CONFIG_ACPI && CONFIG_PCI_XEN
>
> #define XEN_DOM0
>
> #endif
>
> in include/xen/xen.h.
>
> So in the source files we can still '#ifdef XEN_DOM0', but at the same
> time we can get rid of the build symbol: everybody wins.

No, really, I think this is silly. This kind of dependency information should be encoded in the Kconfig system, and not reimplemented in an ad-hoc way. If there were a clean way to do what Andrew wants then I'd support it, but this thread has descended into a spiral of madness, which makes me think it's a lost cause.

If the root complaint is that "customers think that anything set in .config is a supported feature", then the solutions are to support all the features in .config, re-educate the customers that they're wrong, or maintain a local patch to do this stuff.

J
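For reference, expressing the dependency in Kconfig rather than in a header might look roughly like this (a sketch only - the exact symbol names and defaults would need checking against the real Kconfig files):

```
config XEN_DOM0
	def_bool y
	depends on XEN && PCI_XEN
	depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI
```

That keeps the dependency logic in one declarative place where the configuration tools can reason about it, instead of duplicating it in a header.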
Re: [PATCH v2 0/6] Collected vdso/vsyscall fixes for 3.1
On 08/03/2011 06:59 AM, Andrew Lutomirski wrote:
> On Wed, Aug 3, 2011 at 9:56 AM, Konrad Rzeszutek Wilk wrote:
>> On Wed, Aug 03, 2011 at 09:53:22AM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Aug 03, 2011 at 09:31:48AM -0400, Andy Lutomirski wrote:
>>>> This fixes various problems that cropped up with the vdso patches.
>>>>
>>>> - Patch 1 fixes an information leak to userspace.
>>>> - Patches 2 and 3 fix the kernel build on gold.
>>>> - Patches 4 and 5 fix Xen (I hope).
>>>> - Patch 6 (optional) adds a trace event to vsyscall emulation. It will
>>>>   make it easier to handle performance regression reports :)
>>> Hm, you seemed to have the x86 maintainers on your email..
>> I definitely need some coffee. What I meant was that you were missing putting
>> the x86 maintainers on this patchset. They are the ones that will handle
>> this.
>>
>> I put them on the To list for you.
> Are you sure about that coffee? I'm pretty sure I had x...@kernel.org in
> there.

What's the state of this series? Has tip.git picked it up, or does it need another go-around? I just booted a Linus tree and got a backtrace that looks like this issue, though I haven't looked into it in detail yet.

Thanks,
J
Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)
On 08/08/2011 06:01 AM, Steven Rostedt wrote: > On Sat, 2011-08-06 at 11:22 -0700, Jeremy Fitzhardinge wrote: >> On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote: >>> On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote: >>>> * Ingo Molnar wrote: >>>> >>>>> * Randy Dunlap wrote: >>>>> >>>>>> On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote: >>>>>>> On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote: >>>>>>>>>>> These build failures are still triggering upstream: >>>>>>>>>>> >>>>>>>>>>> arch/x86/xen/trace.c:44:2: error: array index in initializer not >>>>>>>>>>> of integer type >>>>>>>>>>> arch/x86/xen/trace.c:44:2: error: (near initialization for >>>>>>>>>>> ‘xen_hypercall_names’) >>>>>>>>>>> arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared >>>>>>>>>>> here (not in a function) >>>>>>>>>>> arch/x86/xen/trace.c:45:2: error: array index in initializer not >>>>>>>>>>> of integer type >>>>>>>>>>> arch/x86/xen/trace.c:45:2: error: (near initialization for >>>>>>>>>>> ‘xen_hypercall_names’) >>>>>>>>>> Oh, that I haven't seen. Can you send me the .config for that please. >>>>>>>>> You can't be trying very hard then. I see lots of these (but no, >>>>>>>> Ah, I am getting it now. Thanks for reporting it. >>>>>>> This should do the trick: >>>>>> Acked-by: Randy Dunlap >>>>> That patch did the trick here too: >>>>> >>>>> Acked-by: Ingo Molnar >>>> Except that i'm still seeing the occasional build failure - see the >>>> error log below. Config attached. >>> Much appreciate for the report. I believe this fix (which I think hit >>> linux-next yesterday?) should do that: >>> >>> commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd >>> Author: Jeremy Fitzhardinge >>> Date: Wed Aug 3 09:43:44 2011 -0700 >>> >>> xen/tracing: it looks like we wanted CONFIG_FTRACE >>> >>> Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER. 
>>> >>> Reported-by: Sander Eikelenboom >>> Tested-by: Sander Eikelenboom >>> Signed-off-by: Jeremy Fitzhardinge >>> Signed-off-by: Konrad Rzeszutek Wilk >>> >>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile >>> index 45e94ac..3326204 100644 >>> --- a/arch/x86/xen/Makefile >>> +++ b/arch/x86/xen/Makefile >>> @@ -15,7 +15,7 @@ obj-y := enlighten.o setup.o multicalls.o >>> mmu.o irq.o \ >>> grant-table.o suspend.o platform-pci-unplug.o \ >>> p2m.o >>> >>> -obj-$(CONFIG_FUNCTION_TRACER) += trace.o >>> +obj-$(CONFIG_FTRACE) += trace.o >>> >> I'm not sure this is correct either. Maybe it should be >> CONFIG_TRACEPOINTS? Steven? > Actually, I believe the correct answer is: > > CONFIG_EVENT_TRACING OK, thanks. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)
On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote: > On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote: >> * Ingo Molnar wrote: >> >>> * Randy Dunlap wrote: >>> >>>> On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote: >>>>> On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote: >>>>>>>>> These build failures are still triggering upstream: >>>>>>>>> >>>>>>>>> arch/x86/xen/trace.c:44:2: error: array index in initializer not of >>>>>>>>> integer type >>>>>>>>> arch/x86/xen/trace.c:44:2: error: (near initialization for >>>>>>>>> ‘xen_hypercall_names’) >>>>>>>>> arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared >>>>>>>>> here (not in a function) >>>>>>>>> arch/x86/xen/trace.c:45:2: error: array index in initializer not of >>>>>>>>> integer type >>>>>>>>> arch/x86/xen/trace.c:45:2: error: (near initialization for >>>>>>>>> ‘xen_hypercall_names’) >>>>>>>> Oh, that I haven't seen. Can you send me the .config for that please. >>>>>>> You can't be trying very hard then. I see lots of these (but no, >>>>>> Ah, I am getting it now. Thanks for reporting it. >>>>> This should do the trick: >>>> Acked-by: Randy Dunlap >>> That patch did the trick here too: >>> >>> Acked-by: Ingo Molnar >> Except that i'm still seeing the occasional build failure - see the >> error log below. Config attached. > Much appreciate for the report. I believe this fix (which I think hit > linux-next yesterday?) should do that: > > commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd > Author: Jeremy Fitzhardinge > Date: Wed Aug 3 09:43:44 2011 -0700 > > xen/tracing: it looks like we wanted CONFIG_FTRACE > > Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER. 
> > Reported-by: Sander Eikelenboom > Tested-by: Sander Eikelenboom > Signed-off-by: Jeremy Fitzhardinge > Signed-off-by: Konrad Rzeszutek Wilk > > diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile > index 45e94ac..3326204 100644 > --- a/arch/x86/xen/Makefile > +++ b/arch/x86/xen/Makefile > @@ -15,7 +15,7 @@ obj-y := enlighten.o setup.o multicalls.o > mmu.o irq.o \ > grant-table.o suspend.o platform-pci-unplug.o \ > p2m.o > > -obj-$(CONFIG_FUNCTION_TRACER) += trace.o > +obj-$(CONFIG_FTRACE) += trace.o > I'm not sure this is correct either. Maybe it should be CONFIG_TRACEPOINTS? Steven? Thanks, J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 5/5] x86-64: Add user_64bit_mode paravirt op
On 07/26/2011 08:20 PM, Andy Lutomirski wrote: > Three places in the kernel assume that the only long mode CPL 3 > selector is __USER_CS. This is not true on Xen -- Xen's sysretq > changes cs to the magic value 0xe033. > > Two of the places are corner cases, but as of "x86-64: Improve > vsyscall emulation CS and RIP handling" > (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault > if called with Xen's extra CS selector. This causes a panic when > older init builds die. > > It seems impossible to make Xen use __USER_CS reliably without > taking a performance hit on every system call, so this fixes the > tests instead with a new paravirt op. It's a little ugly because > ptrace.h can't include paravirt.h. > > Signed-off-by: Andy Lutomirski > Reported-by: Konrad Rzeszutek Wilk > --- > arch/x86/include/asm/desc.h |4 ++-- > arch/x86/include/asm/paravirt_types.h |6 ++ > arch/x86/include/asm/ptrace.h | 19 +++ > arch/x86/kernel/paravirt.c|4 > arch/x86/kernel/step.c|2 +- > arch/x86/kernel/vsyscall_64.c |6 +- > arch/x86/mm/fault.c |2 +- > arch/x86/xen/enlighten.c |1 + > 8 files changed, 35 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index 7b439d9..41935fa 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -27,8 +27,8 @@ static inline void fill_ldt(struct desc_struct *desc, const > struct user_desc *in > > desc->base2 = (info->base_addr & 0xff00) >> 24; > /* > - * Don't allow setting of the lm bit. It is useless anyway > - * because 64bit system calls require __USER_CS: > + * Don't allow setting of the lm bit. It would confuse > + * user_64bit_mode and would get overridden by sysret anyway. 
>*/ > desc->l = 0; > } > diff --git a/arch/x86/include/asm/paravirt_types.h > b/arch/x86/include/asm/paravirt_types.h > index 2c76521..8e8b9a4 100644 > --- a/arch/x86/include/asm/paravirt_types.h > +++ b/arch/x86/include/asm/paravirt_types.h > @@ -41,6 +41,7 @@ > > #include > #include > +#include > > struct page; > struct thread_struct; > @@ -63,6 +64,11 @@ struct paravirt_callee_save { > struct pv_info { > unsigned int kernel_rpl; > int shared_kernel_pmd; > + > +#ifdef CONFIG_X86_64 > + u16 extra_user_64bit_cs; /* __USER_CS if none */ > +#endif > + > int paravirt_enabled; > const char *name; > }; > diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h > index 94e7618..3566454 100644 > --- a/arch/x86/include/asm/ptrace.h > +++ b/arch/x86/include/asm/ptrace.h > @@ -131,6 +131,9 @@ struct pt_regs { > #ifdef __KERNEL__ > > #include > +#ifdef CONFIG_PARAVIRT > +#include > +#endif > > struct cpuinfo_x86; > struct task_struct; > @@ -187,6 +190,22 @@ static inline int v8086_mode(struct pt_regs *regs) > #endif > } > > +#ifdef CONFIG_X86_64 > +static inline bool user_64bit_mode(struct pt_regs *regs) > +{ > +#ifndef CONFIG_PARAVIRT > + /* > + * On non-paravirt systems, this is the only long mode CPL 3 > + * selector. We do not allow long mode selectors in the LDT. > + */ > + return regs->cs == __USER_CS; > +#else > + /* Headers are too twisted for this to go in paravirt.h. */ > + return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; Is this necessary because usermode may sometimes be on __USER_CS or sometimes on Xen's? Could we just commit to one or the other and make it a simple comparison? What if __USER_CS were a variable? J > +#endif > +} > +#endif > + > /* > * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode > * when it traps. 
The previous stack will be directly underneath the saved > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c > index 613a793..d90272e 100644 > --- a/arch/x86/kernel/paravirt.c > +++ b/arch/x86/kernel/paravirt.c > @@ -307,6 +307,10 @@ struct pv_info pv_info = { > .paravirt_enabled = 0, > .kernel_rpl = 0, > .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */ > + > +#ifdef CONFIG_X86_64 > + .extra_user_64bit_cs = __USER_CS, > +#endif > }; > > struct pv_init_ops pv_init_ops = { > diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c > index 7977f0c..c346d11 100644 > --- a/arch/x86/kernel/step.c > +++ b/arch/x86/kernel/step.c > @@ -74,7 +74,7 @@ static int is_setting_trap_flag(struct task_struct *child, > struct pt_regs *regs) > > #ifdef CONFIG_X86_64 > case 0x40 ... 0x4f: > - if (regs->cs != __USER_CS) > + if (!user_64bit_mode(regs)) > /* 32-bit mode: register increment */ > return 0; > /* 64-bit mode: REX p
Re: [PATCH 5/7] Xen: fix whitespaces, tabs coding style issue in drivers/xen/xenbus/xenbus_client.c
On 07/26/2011 04:16 AM, ruslanpisa...@gmail.com wrote:
> From: Ruslan Pisarev
>
> This is a patch to the xenbus_client.c file that fixes up whitespace and tab
> errors found by the checkpatch.pl tool.
>
> Signed-off-by: Ruslan Pisarev
> ---
> drivers/xen/xenbus/xenbus_client.c | 18 +-
> 1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index cdacf92..ff4f03c 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -43,15 +43,15 @@
>  const char *xenbus_strstate(enum xenbus_state state)
>  {
>  	static const char *const name[] = {
> -		[ XenbusStateUnknown ] = "Unknown",
> -		[ XenbusStateInitialising ] = "Initialising",
> -		[ XenbusStateInitWait ] = "InitWait",
> -		[ XenbusStateInitialised ] = "Initialised",
> -		[ XenbusStateConnected ] = "Connected",
> -		[ XenbusStateClosing ] = "Closing",
> -		[ XenbusStateClosed ] = "Closed",
> -		[XenbusStateReconfiguring] = "Reconfiguring",
> -		[XenbusStateReconfigured] = "Reconfigured",
> +		[XenbusStateUnknown] = "Unknown",
> +		[XenbusStateInitialising] = "Initialising",
> +		[XenbusStateInitWait] = "InitWait",
> +		[XenbusStateInitialised] = "Initialised",
> +		[XenbusStateConnected] = "Connected",
> +		[XenbusStateClosing] = "Closing",
> +		[XenbusStateClosed] = "Closed",
> +		[XenbusStateReconfiguring] = "Reconfiguring",
> +		[XenbusStateReconfigured] = "Reconfigured",
>  	};

Eh, I think this looks worse now.

J
Re: [PATCH 09/11] xen/xenbus: use printk_ratelimited() instead of printk_ratelimit()
On 06/16/2011 05:14 AM, Manuel Zerpies wrote:
> Since printk_ratelimit() shouldn't be used anymore (see comment in
> include/linux/printk.h), replace it with printk_ratelimited()

Looks OK to me, but please fix the indentation of the rest of the statement to match.

Thanks,
J

> Signed-off-by: Manuel Zerpies
> ---
> drivers/xen/xenbus/xenbus_xs.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index 5534690..4055858 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -44,6 +44,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include "xenbus_comms.h"
>
> @@ -270,8 +271,7 @@ static void *xs_talkv(struct xenbus_transaction t,
>  	}
>
>  	if (msg.type != type) {
> -		if (printk_ratelimit())
> -			printk(KERN_WARNING
> +		printk_ratelimited(KERN_WARNING
>  			"XENBUS unexpected type [%d], expected [%d]\n",
>  			msg.type, type);
>  		kfree(ret);
Re: [patch] xen: off by one errors in multicalls.c
On 06/02/2011 09:45 PM, Dan Carpenter wrote: > b->args[] has MC_ARGS elements, so the comparison here should be > ">=" instead of ">". Otherwise we read past the end of the array > one space. Yeah, looks like a correct fix. Fortunately I don't think anything currently hits that path in practice, though there are some pending patches which will exercise it more. Thanks, J > Signed-off-by: Dan Carpenter > --- > This is a static checker patch and I haven't tested it. Please > review carefully. > > diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c > index 8bff7e7..1b2b73f 100644 > --- a/arch/x86/xen/multicalls.c > +++ b/arch/x86/xen/multicalls.c > @@ -189,10 +189,10 @@ struct multicall_space __xen_mc_entry(size_t args) > unsigned argidx = roundup(b->argidx, sizeof(u64)); > > BUG_ON(preemptible()); > - BUG_ON(b->argidx > MC_ARGS); > + BUG_ON(b->argidx >= MC_ARGS); > > if (b->mcidx == MC_BATCH || > - (argidx + args) > MC_ARGS) { > + (argidx + args) >= MC_ARGS) { > mc_stats_flush(b->mcidx == MC_BATCH ? 
FL_SLOTS : FL_ARGS); > xen_mc_flush(); > argidx = roundup(b->argidx, sizeof(u64)); > @@ -206,7 +206,7 @@ struct multicall_space __xen_mc_entry(size_t args) > ret.args = &b->args[argidx]; > b->argidx = argidx + args; > > - BUG_ON(b->argidx > MC_ARGS); > + BUG_ON(b->argidx >= MC_ARGS); > return ret; > } > > @@ -216,7 +216,7 @@ struct multicall_space xen_mc_extend_args(unsigned long > op, size_t size) > struct multicall_space ret = { NULL, NULL }; > > BUG_ON(preemptible()); > - BUG_ON(b->argidx > MC_ARGS); > + BUG_ON(b->argidx >= MC_ARGS); > > if (b->mcidx == 0) > return ret; > @@ -224,14 +224,14 @@ struct multicall_space xen_mc_extend_args(unsigned long > op, size_t size) > if (b->entries[b->mcidx - 1].op != op) > return ret; > > - if ((b->argidx + size) > MC_ARGS) > + if ((b->argidx + size) >= MC_ARGS) > return ret; > > ret.mc = &b->entries[b->mcidx - 1]; > ret.args = &b->args[b->argidx]; > b->argidx += size; > > - BUG_ON(b->argidx > MC_ARGS); > + BUG_ON(b->argidx >= MC_ARGS); > return ret; > } > > ___ > Virtualization mailing list > Virtualization@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/virtualization > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
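Dan's point, reduced to a userspace toy (the array name and capacity are made up): with `>` the guard still accepts an index equal to the array size, i.e. one element past the end, while `>=` closes the hole:

```c
#include <stdbool.h>

#define MC_ARGS 8   /* hypothetical capacity, as in a b->args[MC_ARGS] array */

/* Broken guard: lets idx == MC_ARGS through, one past the last valid index. */
static bool in_bounds_broken(unsigned idx) { return !(idx > MC_ARGS); }

/* Fixed guard: valid indexes are 0 .. MC_ARGS-1 only. */
static bool in_bounds_fixed(unsigned idx)  { return !(idx >= MC_ARGS); }
```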
Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS
On 04/09/2011 03:34 AM, Ian Campbell wrote:
>>> Actually it does - see the "#ifndef CONFIG_X86_CMPXCHG" section
>>> in asm/cmpxchg_32.h.
>> Hm, OK. Still, I'm happiest with that dependency in case someone
>> knobbles the cpu to exclude cmpxchg and breaks things.
> Dropping the TSC patch is sensible though?

You mean dropping the TSC dependency? Yes, I think so.

J
Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS
On 04/08/2011 08:42 AM, Jan Beulich wrote:
>>>> On 08.04.11 at 17:25, Jeremy Fitzhardinge wrote:
>> On 04/07/2011 11:38 PM, Ian Campbell wrote:
>>> Is there any downside to this patch (is X86_CMPXCHG in the same sort of
>>> boat?)
>> Only if we don't use cmpxchg in shared memory with other domains or the
>> hypervisor. (I don't think it will dynamically switch between real and
>> emulated cmpxchg depending on availability.)
> Actually it does - see the "#ifndef CONFIG_X86_CMPXCHG" section
> in asm/cmpxchg_32.h.

Hm, OK. Still, I'm happiest with that dependency in case someone knobbles the cpu to exclude cmpxchg and breaks things.

J
Re: [PATCH] xen: drop anti-dependency on X86_VISWS
On 04/07/2011 11:38 PM, Ian Campbell wrote:
>> Not really. The TSC register is a requirement, but that's going to be
>> present on any CPU which can boot Xen. We don't need any of the
>> kernel's TSC machinery though.
> So why the Kconfig dependency then? In principle a kernel compiled for a
> non-TSC processor (which meets the other requirements for Xen, such as
> PAE support) will run just fine under Xen on a newer piece of hardware.

Not sure where it came from. It was probably never needed, or just added for some secondary effect we wanted.

> Is there any downside to this patch (is X86_CMPXCHG in the same sort of
> boat?)

Only if we don't use cmpxchg in shared memory with other domains or the hypervisor. (I don't think it will dynamically switch between real and emulated cmpxchg depending on availability.)

J
Re: [PATCH] xen: drop anti-dependency on X86_VISWS
On 04/06/2011 11:58 PM, Ian Campbell wrote:
> On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
>> From: Ian Campbell
>> Date: Mon, 4 Apr 2011 10:55:55 +0100
>>
>>> You mean the "!X86_VISWS" I presume? It doesn't make sense to me either.
>> No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC
>> dependency.
> TSC is a real dependency of the Xen interfaces.

Not really. The TSC register is a requirement, but that's going to be present on any CPU which can boot Xen. We don't need any of the kernel's TSC machinery though.

J
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 01/19/2011 08:23 AM, Srivatsa Vaddagiri wrote:
> On Mon, Jan 17, 2011 at 08:52:22PM +0530, Srivatsa Vaddagiri wrote:
>> I think this is still racy ..
>>
>> Unlocker                     Locker
>>
>> test slowpath
>>    -> false
>>
>>                              set slowpath flag
>>                              test for lock pickup
>>                                 -> fail
>>                              block
>>
>> unlock
>>
>> unlock needs to happen first before testing slowpath? I have made that change
>> for my KVM guest and it seems to be working well with that change .. Will
>> cleanup and post my patches shortly
> Patch below fixes the race described above. You can fold this to your patch
> 13/14 if you agree this is in right direction.
>
> Signed-off-by: Srivatsa Vaddagiri
>
> ---
> arch/x86/include/asm/spinlock.h | 7 +++
> arch/x86/kernel/paravirt-spinlocks.c | 22 +-
> 2 files changed, 8 insertions(+), 21 deletions(-)
>
> Index: linux-2.6.37/arch/x86/include/asm/spinlock.h
> ===
> --- linux-2.6.37.orig/arch/x86/include/asm/spinlock.h
> +++ linux-2.6.37/arch/x86/include/asm/spinlock.h
> @@ -55,7 +55,7 @@ static __always_inline void __ticket_unl
>
>  /* Only defined when CONFIG_PARAVIRT_SPINLOCKS defined, but may as
>   * well leave the prototype always visible.
>   */
> -extern void __ticket_unlock_release_slowpath(struct arch_spinlock *lock);
> +extern void __ticket_unlock_slowpath(struct arch_spinlock *lock);
>
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>
> @@ -166,10 +166,9 @@ static __always_inline int arch_spin_try
>  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
>  	barrier();	/* prevent reordering out of locked region */
> +	__ticket_unlock_release(lock);
>  	if (unlikely(__ticket_in_slowpath(lock)))
> -		__ticket_unlock_release_slowpath(lock);
> -	else
> -		__ticket_unlock_release(lock);
> +		__ticket_unlock_slowpath(lock);
>  	barrier();	/* prevent reordering into locked region */
>  }
>
> Index: linux-2.6.37/arch/x86/kernel/paravirt-spinlocks.c
> ===
> --- linux-2.6.37.orig/arch/x86/kernel/paravirt-spinlocks.c
> +++ linux-2.6.37/arch/x86/kernel/paravirt-spinlocks.c
> @@ -22,33 +22,21 @@ EXPORT_SYMBOL(pv_lock_ops);
>   * bits. However, we need to be careful about this because someone
>   * may just be entering as we leave, and enter the slowpath.
>   */
> -void __ticket_unlock_release_slowpath(struct arch_spinlock *lock)
> +void __ticket_unlock_slowpath(struct arch_spinlock *lock)
>  {
>  	struct arch_spinlock old, new;
>
>  	BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
>
>  	old = ACCESS_ONCE(*lock);
> -
>  	new = old;
> -	new.tickets.head += TICKET_LOCK_INC;
>
>  	/* Clear the slowpath flag */
>  	new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
> +	if (new.tickets.head == new.tickets.tail)
> +		cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
>
> -	/*
> -	 * If there's currently people waiting or someone snuck in
> -	 * since we read the lock above, then do a normal unlock and
> -	 * kick. If we managed to unlock with no queued waiters, then
> -	 * we can clear the slowpath flag.
> -	 */
> -	if (new.tickets.head != new.tickets.tail ||
> -	    cmpxchg(&lock->head_tail,
> -		    old.head_tail, new.head_tail) != old.head_tail) {
> -		/* still people waiting */
> -		__ticket_unlock_release(lock);
> -	}
> -
> +	/* Wake up an appropriate waiter */
>  	__ticket_unlock_kick(lock, new.tickets.head);

Does the __ticket_unlock_kick need to be unconditional? If the cmpxchg() fails, then it means someone else has come in and changed the lock under our feet, and presumably they'll do the kick as needed? Or is there the possibility that the kick will get lost?

There's certainly no harm in doing a redundant kick, and I'd expect that case to be pretty rare...

J
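The clear-flag-only-if-uncontended dance above can be sketched with C11 atomics (packing head and tail into one word, and the flag position, are assumptions for illustration): the flag is cleared only when no waiters remain, and a failed compare-exchange harmlessly leaves it set for the next unlocker:

```c
#include <stdatomic.h>
#include <stdint.h>

/* head in the low 16 bits, tail in the high 16 bits, so the whole lock
 * word can be compare-exchanged at once. Flag placement is an assumption. */
#define SLOWPATH_FLAG_WORD ((uint32_t)1 << 16)   /* low bit of the tail half */

/* Clear the slowpath flag, but only if nobody is waiting; returns nonzero
 * on success. A lost race leaves the flag set, which is safe. */
static int clear_slowpath_if_uncontended(_Atomic uint32_t *head_tail)
{
    uint32_t old = atomic_load(head_tail);
    uint32_t new = old & ~SLOWPATH_FLAG_WORD;
    uint16_t head = (uint16_t)new;
    uint16_t tail = (uint16_t)(new >> 16);

    if (head != tail)
        return 0;           /* still waiters: leave the flag for them */

    /* Fails harmlessly if someone snuck in since the load above. */
    return atomic_compare_exchange_strong(head_tail, &old, new);
}
```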
Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On 01/22/2011 06:53 AM, Rik van Riel wrote: > The main question that remains is whether the PV ticketlocks are > a large enough improvement to also merge those. I expect they > will be, and we'll see so in the benchmark numbers. The pathological worst-case of ticket locks in a virtual environment isn't very hard to hit, so I expect they'll make a huge difference there. On lightly loaded systems (little or no CPU overcommit) then they should be close to a no-op. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On 01/20/2011 03:59 AM, Srivatsa Vaddagiri wrote: >> At least in the Xen code, a current owner isn't very useful, because we >> need the current owner to kick the *next* owner to life at release time, >> which we can't do without some structure recording which ticket belongs >> to which cpu. > If we had a yield-to [1] sort of interface _and_ information on which vcpu > owns a lock, then lock-spinners can yield-to the owning vcpu, while the > unlocking vcpu can yield-to the next-vcpu-in-waiting. Perhaps, but the core problem is how to find "next-vcpu-in-waiting" efficiently. Once you have that info, there's a number of things you can usefully do with it. > The key here is not to > sleep when waiting for locks (as implemented by current patch-series, which > can > put other VMs at an advantage by giving them more time than they are entitled > to) Why? If a VCPU can't make progress because it's waiting for some resource, then why not schedule something else instead? Presumably when the VCPU does become runnable, the scheduler will credit its previous blocked state and let it run in preference to something else. > and also to ensure that lock-owner as well as the next-in-line lock-owner > are not unduly made to wait for cpu. > > Is there a way we can dynamically expand the size of lock only upon contention > to include additional information like owning vcpu? Have the lock point to a > per-cpu area upon contention where additional details can be stored perhaps? As soon as you add a pointer to the lock, you're increasing its size. If we had a pointer in there already, then all of this would be moot. If auxiliary per-lock data is uncommon, then using a hash keyed on lock address would be one way to do it. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On 01/20/2011 03:42 AM, Srivatsa Vaddagiri wrote: > On Wed, Jan 19, 2011 at 10:53:52AM -0800, Jeremy Fitzhardinge wrote: >>> The reason for wanting this should be clear I guess, it allows PI. >> Well, if we can expand the spinlock to include an owner, then all this >> becomes moot... > How so? Having an owner will not eliminate the need for pv-ticketlocks > afaict. We still need a mechanism to reduce latency in scheduling the next > thread-in-waiting, which is achieved by your patches. No, sorry, I should have been clearer. I meant that the effort of not increasing the lock size to record "in slowpath" state would become moot. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 01/19/2011 10:39 AM, Srivatsa Vaddagiri wrote: > I have tested quite extensively with booting a 16-vcpu guest (on a 16-pcpu > host) > and running kernel compile (with 32-threads). Without this patch, I had > difficulty booting/shutting-down successfully (it would hang mid-way). Sounds good. But I like to test with "make -j 100-200" to really give things a workout. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
On 01/19/2011 09:21 AM, Peter Zijlstra wrote: > On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote: >> Add two hypercalls to KVM hypervisor to support pv-ticketlocks. >> >> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or >> it >> is woken up because of an event like interrupt. >> >> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu. >> >> The presence of these hypercalls is indicated to guest via >> KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding >> patch to pass up the presence of this feature to guest via cpuid. Patch to >> qemu >> will be sent separately. > > I didn't really read the patch, and I totally forgot everything from > when I looked at the Xen series, but does the Xen/KVM hypercall > interface for this include the vcpu to await the kick from? > > My guess is not, since the ticket locks used don't know who the owner > is, which is of course, sad. There are FIFO spinlock implementations > that can do this though.. although I think they all have a bigger memory > footprint. At least in the Xen code, a current owner isn't very useful, because we need the current owner to kick the *next* owner to life at release time, which we can't do without some structure recording which ticket belongs to which cpu. (A reminder: the big problem with ticket locks is not with the current owner getting preempted, but making sure the next VCPU gets scheduled quickly when the current one releases; without that all the waiting VCPUs burn the timeslices until the VCPU scheduler gets around to scheduling the actual next in line.) At present, the code needs to scan an array of percpu "I am waiting on lock X with ticket Y" structures to work out who's next. The search is somewhat optimised by keeping a cpuset of which CPUs are actually blocked on spinlocks, but it's still going to scale badly with lots of CPUs. 
I haven't thought of a good way to improve on this; an obvious approach is to just add a pointer to the spinlock and hang an explicit linked list off it, but that's incompatible with wanting to avoid expanding the lock. You could have a table of auxiliary per-lock data hashed on the lock address, but it's not clear to me that it's an improvement on the array approach, especially given the synchronization issues of keeping that structure up to date (do we have a generic lockless hashtable implementation?). But perhaps it's one of those things that makes sense at larger scales. > The reason for wanting this should be clear I guess, it allows PI. Well, if we can expand the spinlock to include an owner, then all this becomes moot... J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 01/17/2011 07:22 AM, Srivatsa Vaddagiri wrote: > On Tue, Nov 16, 2010 at 01:08:44PM -0800, Jeremy Fitzhardinge wrote: >> From: Jeremy Fitzhardinge >> >> Maintain a flag in both LSBs of the ticket lock which indicates whether >> anyone is in the lock slowpath and may need kicking when the current >> holder unlocks. The flags are set when the first locker enters >> the slowpath, and cleared when unlocking to an empty queue. >> >> In the specific implementation of lock_spinning(), make sure to set >> the slowpath flags on the lock just before blocking. We must do >> this before the last-chance pickup test to prevent a deadlock >> with the unlocker: >> >> Unlocker Locker >> test for lock pickup >> -> fail >> test slowpath + unlock >> -> false >> set slowpath flags >> block >> >> Whereas this works in any ordering: >> >> Unlocker Locker >> set slowpath flags >> test for lock pickup >> -> fail >> block >> test slowpath + unlock >> -> true, kick > I think this is still racy .. > > Unlocker Locker > > > test slowpath > -> false > > set slowpath flag > test for lock pickup > -> fail > block > > > unlock > > unlock needs to happen first before testing slowpath? I have made that change > for my KVM guest and it seems to be working well with that change .. Will > cleanup and post my patches shortly I think you're probably right; when I last tested this code, it was hanging at about the rate this kind of race would cause. And in my previous analysis of similar schemes (the current pv spinlock code), it was always correct to do the "slowpath" test after doing the unlock. *But* I haven't yet had the chance to specifically go through and analyse your patch in detail to make sure there's nothing else going on, so take this as provisional. How much testing have you given it? Thanks, J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/pvclock-xen: zero last_value on resume
On 11/26/2010 02:40 AM, Ingo Molnar wrote: > * Jeremy Fitzhardinge wrote: > >> On 10/26/2010 09:59 AM, Jeremy Fitzhardinge wrote: >>> If the guest domain has been suspend/resumed or migrated, then the >>> system clock backing the pvclock clocksource may revert to a smaller >>> value (ie, can be non-monotonic across the migration/save-restore). >>> Make sure we zero last_value in that case so that the domain >>> continues to see clock updates. >> Ping? Looks like this fell through the gaps. > Does not apply cleanly here - mind resending the latest version? It rebased cleanly to 2.6.37-rc3. You can pull it from: git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git pvclock-resume Jeremy Fitzhardinge (1): x86/pvclock: zero last_value on resume arch/x86/include/asm/pvclock.h |1 + arch/x86/kernel/pvclock.c |5 + arch/x86/xen/time.c|2 ++ 3 files changed, 8 insertions(+), 0 deletions(-) Thanks, J Subject: [PATCH] x86/pvclock: zero last_value on resume If the guest domain has been suspend/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. 
Signed-off-by: Jeremy Fitzhardinge diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h index 7f7e577..31d84ac 100644 --- a/arch/x86/include/asm/pvclock.h +++ b/arch/x86/include/asm/pvclock.h @@ -11,6 +11,7 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src); void pvclock_read_wallclock(struct pvclock_wall_clock *wall, struct pvclock_vcpu_time_info *vcpu, struct timespec *ts); +void pvclock_resume(void); /* * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 008b91e..42eb330 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -83,6 +83,11 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src) static atomic64_t last_value = ATOMIC64_INIT(0); +void pvclock_resume(void) +{ + atomic64_set(&last_value, 0); +} + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { struct pvclock_shadow_time shadow; diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index b2bb5aa..5da5e53 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -426,6 +426,8 @@ void xen_timer_resume(void) { int cpu; + pvclock_resume(); + if (xen_clockevent != &xen_vcpuop_clockevent) return; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/pvclock-xen: zero last_value on resume
On 10/26/2010 09:59 AM, Jeremy Fitzhardinge wrote: > > If the guest domain has been suspend/resumed or migrated, then the > system clock backing the pvclock clocksource may revert to a smaller > value (ie, can be non-monotonic across the migration/save-restore). > Make sure we zero last_value in that case so that the domain > continues to see clock updates. Ping? Looks like this fell through the gaps. J > [ I don't know if kvm needs an analogous fix or not. ] > > Signed-off-by: Jeremy Fitzhardinge > Cc: Stable Kernel > Reported-by: Eelco Dolstra > Reported-by: Olivier Hanesse > Bisected-by: Cédric Schieli > Tested-by: Cédric Schieli > > diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h > index cd02f32..6226870 100644 > --- a/arch/x86/include/asm/pvclock.h > +++ b/arch/x86/include/asm/pvclock.h > @@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info > *src); > void pvclock_read_wallclock(struct pvclock_wall_clock *wall, > struct pvclock_vcpu_time_info *vcpu, > struct timespec *ts); > +void pvclock_resume(void); > > #endif /* _ASM_X86_PVCLOCK_H */ > diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c > index 239427c..a4f07c1 100644 > --- a/arch/x86/kernel/pvclock.c > +++ b/arch/x86/kernel/pvclock.c > @@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct > pvclock_vcpu_time_info *src) > > static atomic64_t last_value = ATOMIC64_INIT(0); > > +void pvclock_resume(void) > +{ > + atomic64_set(&last_value, 0); > +} > + > cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) > { > struct pvclock_shadow_time shadow; > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c > index b2bb5aa..5da5e53 100644 > --- a/arch/x86/xen/time.c > +++ b/arch/x86/xen/time.c > @@ -426,6 +426,8 @@ void xen_timer_resume(void) > { > int cpu; > > + pvclock_resume(); > + > if (xen_clockevent != &xen_vcpuop_clockevent) > return; > > > > > > ___ > Virtualization mailing list > 
Virtualization@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/virtualization ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks
On 11/17/2010 02:34 AM, Jan Beulich wrote: >> Actually, on second thoughts, maybe it doesn't matter so much. The main >> issue is making sure that the interrupt will make the VCPU drop out of >> xen_poll_irq() - if it happens before xen_poll_irq(), it should leave >> the event pending, which will cause the poll to return immediately. I >> hope. Certainly disabling interrupts for some of the function will make >> it easier to analyze with respect to interrupt nesting. > That's not my main concern. Instead, what if you get interrupted > anywhere here, the interrupt handler tries to acquire another > spinlock and also has to go into the slow path? It'll overwrite part > or all of the outer context's state. That doesn't matter if the outer context doesn't end up blocking. If it has already blocked then it will unblock as a result of the interrupt; if it hasn't yet blocked, then the inner context will leave the event pending and cause it to not block. Either way, it no longer uses or needs that per-cpu state: it will return to the spin loop and (maybe) get re-entered, setting it all up again. I think there is a problem with the code as posted because it sets up the percpu data before clearing the pending event, so it can end up blocking with bad percpu data. >> Another issue may be making sure the writes and reads of "w->want" and >> "w->lock" are ordered properly to make sure that xen_unlock_kick() never >> sees an inconsistent view of the (lock,want) tuple. The risk being that >> xen_unlock_kick() sees a random, spurious (lock,want) pairing and sends >> the kick event to the wrong VCPU, leaving the deserving one hung. > Yes, proper operation sequence (and barriers) is certainly > required here. If you allowed nesting, this may even become > simpler (as you'd have a single write making visible the new > "head" pointer, after having written all relevant fields of the > new "head" structure). 
Yes, simple nesting should be quite straightforward (ie allowing an interrupt handler to take some other lock than the one the outer context is waiting on). J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 11/17/2010 04:21 AM, Peter Zijlstra wrote: > On Tue, 2010-11-16 at 13:08 -0800, Jeremy Fitzhardinge wrote: >> Maintain a flag in both LSBs of the ticket lock which indicates whether >> anyone is in the lock slowpath and may need kicking when the current >> holder unlocks. The flags are set when the first locker enters >> the slowpath, and cleared when unlocking to an empty queue. > So here you say you set both LSBs in order to keep head == tail working, > but the code seems to suggest you only use the tail LSB. > > I think I see why using only one LSB is sufficient, but some consistency > would be nice :-) I tried that initially, but it turned out more messy. The problem is that the flag can change while you're spinning on the lock, so you need to mask it out every time you read tail before you can compare it to head; if head has the flag set too, you just need to mask it out of there as well: ticket = xadd(lock, 2 << TICKET_SHIFT); ticket.tail &= ~TICKET_SLOW_FLAGS; while (ticket.head != ticket.tail) { relax(); ticket.head = lock->head /* & ~TICKET_SLOW_FLAGS */; } IOW setting both doesn't help anything, and just requires an extra mask in the spin loop (and anywhere else that uses 'head'). And hey, extra bit. Bound to be useful for something. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks
On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote: > On 11/17/2010 12:11 AM, Jan Beulich wrote: >>>>> On 16.11.10 at 22:08, Jeremy Fitzhardinge wrote: >>> +static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) >>> { >>> - struct xen_spinlock *xl = (struct xen_spinlock *)lock; >>> - struct xen_spinlock *prev; >>> int irq = __get_cpu_var(lock_kicker_irq); >>> - int ret; >>> + struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting); >>> + int cpu = smp_processor_id(); >>> u64 start; >>> >>> /* If kicker interrupts not initialized yet, just spin */ >>> if (irq == -1) >>> - return 0; >>> + return; >>> >>> start = spin_time_start(); >>> >>> - /* announce we're spinning */ >>> - prev = spinning_lock(xl); >>> + w->want = want; >>> + w->lock = lock; >>> + >>> + /* This uses set_bit, which atomic and therefore a barrier */ >>> + cpumask_set_cpu(cpu, &waiting_cpus); >> Since you don't allow nesting, don't you need to disable >> interrupts before you touch per-CPU state? > Yes, I think you're right - interrupts need to be disabled for the bulk > of this function. Actually, on second thoughts, maybe it doesn't matter so much. The main issue is making sure that the interrupt will make the VCPU drop out of xen_poll_irq() - if it happens before xen_poll_irq(), it should leave the event pending, which will cause the poll to return immediately. I hope. Certainly disabling interrupts for some of the function will make it easier to analyze with respect to interrupt nesting. Another issue may be making sure the writes and reads of "w->want" and "w->lock" are ordered properly to make sure that xen_unlock_kick() never sees an inconsistent view of the (lock,want) tuple. The risk being that xen_unlock_kick() sees a random, spurious (lock,want) pairing and sends the kick event to the wrong VCPU, leaving the deserving one hung. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 11/17/2010 12:56 AM, Jeremy Fitzhardinge wrote: > On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote: >> But, yes, %z0 sounds interesting. Is it documented anywhere? I think >> I've tried to use it in the past and run into gcc bugs. > This one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590 > > Should be OK in this case because there's no 64-bit values to be seen... Hm, it fails when __ticket_t is 16 bits: /home/jeremy/git/linux/arch/x86/include/asm/spinlock.h: Assembler messages: /home/jeremy/git/linux/arch/x86/include/asm/spinlock.h:73: Error: suffix or operands invalid for `or' lock; ors $1, 2(%rbx) #, So I don't think that's going to work out... J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 11/17/2010 12:58 AM, Avi Kivity wrote: >> Actually in this case I'm pretty sure there's already a "set bit" >> function which will do the job. set_bit(), I guess, though it takes a >> bit number rather than a mask... >> > > > set_bit() operates on a long, while the intel manuals recommend > against operating on operands of different size, especially with > locked operations. I think newer processors have more relaxed > requirements, though. Despite its prototype, set_bit() is pretty specifically using "orb" for the constant case, or bts otherwise (I don't know what size memory operation bts is considered to generate). J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote: > But, yes, %z0 sounds interesting. Is it documented anywhere? I think > I've tried to use it in the past and run into gcc bugs. This one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590 Should be OK in this case because there's no 64-bit values to be seen... J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/14] x86/ticketlock: add slowpath logic
On 11/17/2010 12:31 AM, Jan Beulich wrote: >>>> On 16.11.10 at 22:08, Jeremy Fitzhardinge wrote: >> +static inline void __ticket_enter_slowpath(struct arch_spinlock *lock) >> +{ >> +if (sizeof(lock->tickets.tail) == sizeof(u8)) >> +asm (LOCK_PREFIX "orb %1, %0" >> + : "+m" (lock->tickets.tail) >> + : "i" (TICKET_SLOWPATH_FLAG) : "memory"); >> +else >> +asm (LOCK_PREFIX "orw %1, %0" >> + : "+m" (lock->tickets.tail) >> + : "i" (TICKET_SLOWPATH_FLAG) : "memory"); >> +} > Came only now to mind: Here and elsewhere, did you try using > %z0 to have gcc produce the opcode suffix character, rather > than having these somewhat ugly if()-s? Actually in this case I'm pretty sure there's already a "set bit" function which will do the job. set_bit(), I guess, though it takes a bit number rather than a mask... But, yes, %z0 sounds interesting. Is it documented anywhere? I think I've tried to use it in the past and run into gcc bugs. Thanks, J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks
On 11/17/2010 12:11 AM, Jan Beulich wrote: >>>> On 16.11.10 at 22:08, Jeremy Fitzhardinge wrote: >> +static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) >> { >> -struct xen_spinlock *xl = (struct xen_spinlock *)lock; >> -struct xen_spinlock *prev; >> int irq = __get_cpu_var(lock_kicker_irq); >> -int ret; >> +struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting); >> +int cpu = smp_processor_id(); >> u64 start; >> >> /* If kicker interrupts not initialized yet, just spin */ >> if (irq == -1) >> -return 0; >> +return; >> >> start = spin_time_start(); >> >> -/* announce we're spinning */ >> -prev = spinning_lock(xl); >> +w->want = want; >> +w->lock = lock; >> + >> +/* This uses set_bit, which atomic and therefore a barrier */ >> +cpumask_set_cpu(cpu, &waiting_cpus); > Since you don't allow nesting, don't you need to disable > interrupts before you touch per-CPU state? Yes, I think you're right - interrupts need to be disabled for the bulk of this function. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks
From: Jeremy Fitzhardinge Replace the old Xen implementation of PV spinlocks with and implementation of xen_lock_spinning and xen_unlock_kick. xen_lock_spinning simply registers the cpu in its entry in lock_waiting, adds itself to the waiting_cpus set, and blocks on an event channel until the channel becomes pending. xen_unlock_kick searches the cpus in waiting_cpus looking for the one which next wants this lock with the next ticket, if any. If found, it kicks it by making its event channel pending, which wakes it up. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/xen/spinlock.c | 281 ++- 1 files changed, 36 insertions(+), 245 deletions(-) diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 3d9da72..5feb897 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -19,32 +19,21 @@ #ifdef CONFIG_XEN_DEBUG_FS static struct xen_spinlock_stats { - u64 taken; u32 taken_slow; - u32 taken_slow_nested; u32 taken_slow_pickup; u32 taken_slow_spurious; - u32 taken_slow_irqenable; - u64 released; u32 released_slow; u32 released_slow_kicked; #define HISTO_BUCKETS 30 - u32 histo_spin_total[HISTO_BUCKETS+1]; - u32 histo_spin_spinning[HISTO_BUCKETS+1]; u32 histo_spin_blocked[HISTO_BUCKETS+1]; - u64 time_total; - u64 time_spinning; u64 time_blocked; } spinlock_stats; static u8 zero_stats; -static unsigned lock_timeout = 1 << 10; -#define TIMEOUT lock_timeout - static inline void check_zero(void) { if (unlikely(zero_stats)) { @@ -73,22 +62,6 @@ static void __spin_time_accum(u64 delta, u32 *array) array[HISTO_BUCKETS]++; } -static inline void spin_time_accum_spinning(u64 start) -{ - u32 delta = xen_clocksource_read() - start; - - __spin_time_accum(delta, spinlock_stats.histo_spin_spinning); - spinlock_stats.time_spinning += delta; -} - -static inline void spin_time_accum_total(u64 start) -{ - u32 delta = xen_clocksource_read() - start; - - __spin_time_accum(delta, spinlock_stats.histo_spin_total); - spinlock_stats.time_total += delta; -} - static inline 
void spin_time_accum_blocked(u64 start) { u32 delta = xen_clocksource_read() - start; @@ -105,214 +78,76 @@ static inline u64 spin_time_start(void) return 0; } -static inline void spin_time_accum_total(u64 start) -{ -} -static inline void spin_time_accum_spinning(u64 start) -{ -} static inline void spin_time_accum_blocked(u64 start) { } #endif /* CONFIG_XEN_DEBUG_FS */ -struct xen_spinlock { - unsigned char lock; /* 0 -> free; 1 -> locked */ - unsigned short spinners;/* count of waiting cpus */ +struct xen_lock_waiting { + struct arch_spinlock *lock; + __ticket_t want; }; static DEFINE_PER_CPU(int, lock_kicker_irq) = -1; +static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting); +static cpumask_t waiting_cpus; -#if 0 -static int xen_spin_is_locked(struct arch_spinlock *lock) -{ - struct xen_spinlock *xl = (struct xen_spinlock *)lock; - - return xl->lock != 0; -} - -static int xen_spin_is_contended(struct arch_spinlock *lock) -{ - struct xen_spinlock *xl = (struct xen_spinlock *)lock; - - /* Not strictly true; this is only the count of contended - lock-takers entering the slow path. */ - return xl->spinners != 0; -} - -static int xen_spin_trylock(struct arch_spinlock *lock) -{ - struct xen_spinlock *xl = (struct xen_spinlock *)lock; - u8 old = 1; - - asm("xchgb %b0,%1" - : "+q" (old), "+m" (xl->lock) : : "memory"); - - return old == 0; -} - -static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners); - -/* - * Mark a cpu as interested in a lock. Returns the CPU's previous - * lock of interest, in case we got preempted by an interrupt. - */ -static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl) -{ - struct xen_spinlock *prev; - - prev = __get_cpu_var(lock_spinners); - __get_cpu_var(lock_spinners) = xl; - - wmb(); /* set lock of interest before count */ - - asm(LOCK_PREFIX " incw %0" - : "+m" (xl->spinners) : : "memory"); - - return prev; -} - -/* - * Mark a cpu as no longer interested in a lock. 
Restores previous - * lock of interest (NULL for none). - */ -static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock *prev) -{ - asm(LOCK_PREFIX " decw %0" - : "+m" (xl->spinners) : : "memory"); - wmb(); /* decrement count before restoring lock */ - __get_cpu_var(lock_spinners) = prev; -} - -static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable) +stati
[PATCH 12/14] x86/ticketlocks: when paravirtualizing ticket locks, increment by 2
From: Jeremy Fitzhardinge Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 18 +- arch/x86/include/asm/spinlock_types.h | 10 +- 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index cfa80b5..9e1c7ce 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -36,17 +36,17 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) { if (sizeof(lock->tickets.head) == sizeof(u8)) - asm (LOCK_PREFIX "incb %0" -: "+m" (lock->tickets.head) : : "memory"); + asm (LOCK_PREFIX "addb %1, %0" +: "+m" (lock->tickets.head) : "i" (TICKET_LOCK_INC) : "memory"); else - asm (LOCK_PREFIX "incw %0" -: "+m" (lock->tickets.head) : : "memory"); + asm (LOCK_PREFIX "addw %1, %0" +: "+m" (lock->tickets.head) : "i" (TICKET_LOCK_INC) : "memory"); } #else static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) { - lock->tickets.head++; + lock->tickets.head += TICKET_LOCK_INC; } #endif @@ -84,7 +84,7 @@ static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, u */ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock) { - register struct __raw_tickets tickets = { .tail = 1 }; + register struct __raw_tickets tickets = { .tail = TICKET_LOCK_INC }; if (sizeof(lock->tickets.head) == sizeof(u8)) asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" @@ -136,7 +136,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) if (old.tickets.head != old.tickets.tail) return 0; - new.head_tail = old.head_tail + (1 << TICKET_SHIFT); + new.head_tail 
= old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT); /* cmpxchg is a full barrier, so nothing can move before it */ return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail; @@ -144,7 +144,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { - __ticket_t next = lock->tickets.head + 1; + __ticket_t next = lock->tickets.head + TICKET_LOCK_INC; __ticket_unlock_release(lock); __ticket_unlock_kick(lock, next); barrier(); /* prevent reordering into locked region */ @@ -161,7 +161,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock) { struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); - return ((tmp.tail - tmp.head) & TICKET_MASK) > 1; + return ((tmp.tail - tmp.head) & TICKET_MASK) > TICKET_LOCK_INC; } #define arch_spin_is_contended arch_spin_is_contended diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index 72e154e..0553c0b 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -7,7 +7,13 @@ #include -#if (CONFIG_NR_CPUS < 256) +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define __TICKET_LOCK_INC 2 +#else +#define __TICKET_LOCK_INC 1 +#endif + +#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC)) typedef u8 __ticket_t; typedef u16 __ticketpair_t; #else @@ -15,6 +21,8 @@ typedef u16 __ticket_t; typedef u32 __ticketpair_t; #endif +#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC) + #define TICKET_SHIFT (sizeof(__ticket_t) * 8) #define TICKET_MASK((__ticket_t)((1 << TICKET_SHIFT) - 1)) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 10/14] x86/pvticketlock: use callee-save for lock_spinning
From: Jeremy Fitzhardinge Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/paravirt.h |2 +- arch/x86/include/asm/paravirt_types.h |2 +- arch/x86/kernel/paravirt-spinlocks.c |2 +- arch/x86/xen/spinlock.c |3 ++- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index c864775..6f275ca 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -719,7 +719,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) { - PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket); + PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket); } static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 1078474..53f249a 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -315,7 +315,7 @@ struct pv_mmu_ops { struct arch_spinlock; struct pv_lock_ops { - void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket); + struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket); }; diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index c2e010e..4251c1d 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -9,7 +9,7 @@ struct pv_lock_ops pv_lock_ops = { #ifdef CONFIG_SMP - 
.lock_spinning = paravirt_nop, + .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop), .unlock_kick = paravirt_nop, #endif }; diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 5feb897..c31c5a3 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -137,6 +137,7 @@ out: w->lock = NULL; spin_time_accum_blocked(start); } +PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning); static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next) { @@ -189,7 +190,7 @@ void xen_uninit_lock_cpu(int cpu) void __init xen_init_spinlocks(void) { - pv_lock_ops.lock_spinning = xen_lock_spinning; + pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning); pv_lock_ops.unlock_kick = xen_unlock_kick; } -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 14/14] x86/ticketlocks: tidy up __ticket_unlock_kick()
From: Jeremy Fitzhardinge __ticket_unlock_kick() is now only called from known slowpaths, so there's no need for it to do any checking of its own. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/paravirt.h |2 +- arch/x86/include/asm/spinlock.h | 14 -- 2 files changed, 1 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 6f275ca..7755b16 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -722,7 +722,7 @@ static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned t PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket); } -static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) +static inline void __ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) { PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 8d1cb42..70675bc 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -90,10 +90,6 @@ static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, u { } -static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) -{ -} - #endif /* CONFIG_PARAVIRT_SPINLOCKS */ /* @@ -133,16 +129,6 @@ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spin return tickets.tickets; } -/* - * If a spinlock has someone waiting on it, then kick the appropriate - * waiting cpu. - */ -static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, __ticket_t next) -{ - if (unlikely(lock->tickets.tail != next)) - ticket_unlock_kick(lock, next); -} - static __always_inline void arch_spin_lock(struct arch_spinlock *lock) { register struct __raw_tickets inc; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 06/14] x86/ticketlock: make __ticket_spin_trylock common
From: Jeremy Fitzhardinge Make trylock code common regardless of ticket size. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 49 +++-- arch/x86/include/asm/spinlock_types.h |6 +++- 2 files changed, 14 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index f722f96..3afb1a7 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) out: barrier(); /* make sure nothing creeps before the lock is taken */ } -#if (NR_CPUS < 256) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { - unsigned int tmp, new; - - asm volatile("movzwl %2, %0\n\t" -"cmpb %h0,%b0\n\t" -"leal 0x100(%" REG_PTR_MODE "0), %1\n\t" -"jne 1f\n\t" -LOCK_PREFIX "cmpxchgw %w1,%2\n\t" -"1:" -"sete %b1\n\t" -"movzbl %b1,%0\n\t" -: "=&a" (tmp), "=&q" (new), "+m" (lock->slock) -: -: "memory", "cc"); - - return tmp; -} -#else -static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) -{ - unsigned tmp; - unsigned new; - - asm volatile("movl %2,%0\n\t" -"movl %0,%1\n\t" -"roll $16, %0\n\t" -"cmpl %0,%1\n\t" -"leal 0x0001(%" REG_PTR_MODE "0), %1\n\t" -"jne 1f\n\t" -LOCK_PREFIX "cmpxchgl %1,%2\n\t" -"1:" -"sete %b1\n\t" -"movzbl %b1,%0\n\t" -: "=&a" (tmp), "=&q" (new), "+m" (lock->slock) -: -: "memory", "cc"); - - return tmp; + arch_spinlock_t old, new; + + old.tickets = ACCESS_ONCE(lock->tickets); + if (old.tickets.head != old.tickets.tail) + return 0; + + new.head_tail = old.head_tail + (1 << TICKET_SHIFT); + + /* cmpxchg is a full barrier, so nothing can move before it */ + return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail; } -#endif static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) { diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index e3ad1e3..72e154e 100644 --- 
a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -9,8 +9,10 @@ #if (CONFIG_NR_CPUS < 256) typedef u8 __ticket_t; +typedef u16 __ticketpair_t; #else typedef u16 __ticket_t; +typedef u32 __ticketpair_t; #endif #define TICKET_SHIFT (sizeof(__ticket_t) * 8) @@ -18,14 +20,14 @@ typedef u16 __ticket_t; typedef struct arch_spinlock { union { - unsigned int slock; + __ticketpair_t head_tail; struct __raw_tickets { __ticket_t head, tail; } tickets; }; } arch_spinlock_t; -#define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } } +#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } } typedef struct { unsigned int lock; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 11/14] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks
From: Jeremy Fitzhardinge The code size expands somewhat, and it's probably better to just call a function rather than inline it. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/Kconfig |3 +++ kernel/Kconfig.locks |2 +- 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e832768..a615c9c 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -531,6 +531,9 @@ config PARAVIRT_SPINLOCKS If you are unsure how to answer this question, answer N. +config ARCH_NOINLINE_SPIN_UNLOCK + def_bool PARAVIRT_SPINLOCKS + config PARAVIRT_CLOCK bool diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 88c92fb..3216c22 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE ARCH_INLINE_SPIN_LOCK_IRQSAVE config INLINE_SPIN_UNLOCK - def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) + def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && !ARCH_NOINLINE_SPIN_UNLOCK config INLINE_SPIN_UNLOCK_BH def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 01/14] x86/ticketlock: clean up types and accessors
From: Jeremy Fitzhardinge A few cleanups to the way spinlocks are defined and accessed: - define __ticket_t which is the size of a spinlock ticket (ie, enough bits to hold all the cpus) - Define struct arch_spinlock as a union containing plain slock and the head and tail tickets - Use head and tail to implement some of the spinlock predicates. - Make all ticket variables unsigned. - Use TICKET_SHIFT to form constants Most of this will be used in later patches. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 24 ++-- arch/x86/include/asm/spinlock_types.h | 20 ++-- 2 files changed, 28 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 3089f70..d6d5784 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -56,11 +56,9 @@ * much between them in performance though, especially as locks are out of line. */ #if (NR_CPUS < 256) -#define TICKET_SHIFT 8 - static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - short inc = 0x0100; + unsigned short inc = 1 << TICKET_SHIFT; asm volatile ( LOCK_PREFIX "xaddw %w0, %1\n" @@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { - int tmp, new; + unsigned int tmp, new; asm volatile("movzwl %2, %0\n\t" "cmpb %h0,%b0\n\t" @@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) : "memory", "cc"); } #else -#define TICKET_SHIFT 16 - static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - int inc = 0x0001; - int tmp; + unsigned inc = 1 << TICKET_SHIFT; + unsigned tmp; asm volatile(LOCK_PREFIX "xaddl %0, %1\n" "movzwl %w0, %2\n\t" @@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { - int tmp; - int new; + unsigned tmp; + unsigned new; 
asm volatile("movl %2,%0\n\t" "movl %0,%1\n\t" @@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) static inline int __ticket_spin_is_locked(arch_spinlock_t *lock) { - int tmp = ACCESS_ONCE(lock->slock); + struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); - return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1)); + return !!(tmp.tail ^ tmp.head); } static inline int __ticket_spin_is_contended(arch_spinlock_t *lock) { - int tmp = ACCESS_ONCE(lock->slock); + struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); - return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1; + return ((tmp.tail - tmp.head) & TICKET_MASK) > 1; } #ifndef CONFIG_PARAVIRT_SPINLOCKS diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index dcb48b2..e3ad1e3 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -5,11 +5,27 @@ # error "please don't include this file directly" #endif +#include + +#if (CONFIG_NR_CPUS < 256) +typedef u8 __ticket_t; +#else +typedef u16 __ticket_t; +#endif + +#define TICKET_SHIFT (sizeof(__ticket_t) * 8) +#define TICKET_MASK((__ticket_t)((1 << TICKET_SHIFT) - 1)) + typedef struct arch_spinlock { - unsigned int slock; + union { + unsigned int slock; + struct __raw_tickets { + __ticket_t head, tail; + } tickets; + }; } arch_spinlock_t; -#define __ARCH_SPIN_LOCK_UNLOCKED { 0 } +#define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } } typedef struct { unsigned int lock; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 05/14] x86/ticketlock: make __ticket_spin_lock common
From: Jeremy Fitzhardinge Aside from the particular form of the xadd instruction, they're identical. So factor out the xadd and use common code for the rest. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 42 ++ 1 files changed, 20 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 1b81809..f722f96 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) * save some instructions and make the code more elegant. There really isn't * much between them in performance though, especially as locks are out of line. */ -#if (NR_CPUS < 256) -static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) +static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock) { - register struct __raw_tickets inc = { .tail = 1 }; + register struct __raw_tickets tickets = { .tail = 1 }; + + if (sizeof(lock->tickets.head) == sizeof(u8)) + asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" + : "+r" (tickets), "+m" (lock->tickets) + : : "memory", "cc"); + else + asm volatile (LOCK_PREFIX "xaddl %0, %1\n" +: "+r" (tickets), "+m" (lock->tickets) +: : "memory", "cc"); - asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" - : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc"); + return tickets; +} + +static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) +{ + register struct __raw_tickets inc; + + inc = __ticket_spin_claim(lock); for (;;) { if (inc.head == inc.tail) @@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) out: barrier(); /* make sure nothing creeps before the lock is taken */ } +#if (NR_CPUS < 256) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { unsigned int tmp, new; @@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) return 
tmp; } #else -static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) -{ - register struct __raw_tickets inc = { .tail = 1 }; - - asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t" -: "+r" (inc), "+m" (lock->tickets) -: : "memory", "cc"); - - for (;;) { - if (inc.head == inc.tail) - goto out; - cpu_relax(); - inc.head = ACCESS_ONCE(lock->tickets.head); - } -out: barrier(); /* make sure nothing creeps before the lock is taken */ -} - static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { unsigned tmp; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 13/14] x86/ticketlock: add slowpath logic
From: Jeremy Fitzhardinge Maintain a flag in both LSBs of the ticket lock which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue. In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker:

Unlocker                  Locker
                          test for lock pickup
                              -> fail
test slowpath + unlock
    -> false
                          set slowpath flags
                          block

Whereas this works in any ordering:

Unlocker                  Locker
                          set slowpath flags
                          test for lock pickup
                              -> fail
                          block
test slowpath + unlock
    -> true, kick

Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 53 - arch/x86/include/asm/spinlock_types.h |2 + arch/x86/kernel/paravirt-spinlocks.c | 37 +++ arch/x86/xen/spinlock.c |4 ++ 4 files changed, 88 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 9e1c7ce..8d1cb42 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -53,7 +53,38 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) /* How long a lock should spin before we consider blocking */ #define SPIN_THRESHOLD (1 << 11) -#ifndef CONFIG_PARAVIRT_SPINLOCKS +/* Only defined when CONFIG_PARAVIRT_SPINLOCKS defined, but may as + * well leave the prototype always visible. */ +extern void __ticket_unlock_release_slowpath(struct arch_spinlock *lock); + +#ifdef CONFIG_PARAVIRT_SPINLOCKS + +/* + * Return true if someone is in the slowpath on this lock. This + * should only be used by the current lock-holder.
+ */ +static inline bool __ticket_in_slowpath(struct arch_spinlock *lock) +{ + return !!(lock->tickets.tail & TICKET_SLOWPATH_FLAG); +} + +static inline void __ticket_enter_slowpath(struct arch_spinlock *lock) +{ + if (sizeof(lock->tickets.tail) == sizeof(u8)) + asm (LOCK_PREFIX "orb %1, %0" +: "+m" (lock->tickets.tail) +: "i" (TICKET_SLOWPATH_FLAG) : "memory"); + else + asm (LOCK_PREFIX "orw %1, %0" +: "+m" (lock->tickets.tail) +: "i" (TICKET_SLOWPATH_FLAG) : "memory"); +} + +#else /* !CONFIG_PARAVIRT_SPINLOCKS */ +static inline bool __ticket_in_slowpath(struct arch_spinlock *lock) +{ + return false; +} static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) { @@ -84,18 +115,22 @@ static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, u */ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock) { - register struct __raw_tickets tickets = { .tail = TICKET_LOCK_INC }; + register struct arch_spinlock tickets = { + { .tickets.tail = TICKET_LOCK_INC } + }; if (sizeof(lock->tickets.head) == sizeof(u8)) asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" - : "+r" (tickets), "+m" (lock->tickets) + : "+r" (tickets.head_tail), "+m" (lock->tickets) : : "memory", "cc"); else asm volatile (LOCK_PREFIX "xaddl %0, %1\n" -: "+r" (tickets), "+m" (lock->tickets) +: "+r" (tickets.head_tail), "+m" (lock->tickets) : : "memory", "cc"); - return tickets; + tickets.tickets.tail &= ~TICKET_SLOWPATH_FLAG; + + return tickets.tickets; } /* @@ -144,9 +179,11 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { - __ticket_t next = lock->tickets.head + TICKET_LOCK_INC; - __ticket_unlock_release(lock); - __ticket_unlock_kick(lock, next); + barrier(); /* prevent reordering out of locked region */ + if (unlikely(__ticket_in_slowpath(lock))) + __ticket_unlock_release_slowpath(lock); + else + __ticket_unlock_release(lock); 
barrie
[PATCH 07/14] x86/spinlocks: replace pv spinlocks with pv ticketlocks
From: Jeremy Fitzhardinge Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow paths (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which, on releasing a contended lock (there are more cpus with tail tickets), looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away.
Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/paravirt.h | 30 +++--- arch/x86/include/asm/paravirt_types.h |8 + arch/x86/include/asm/spinlock.h | 44 +++-- arch/x86/kernel/paravirt-spinlocks.c | 15 +- arch/x86/xen/spinlock.c |7 - 5 files changed, 50 insertions(+), 54 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 18e3b8a..c864775 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -717,36 +717,14 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS) -static inline int arch_spin_is_locked(struct arch_spinlock *lock) +static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) { - return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock); + PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket); } -static inline int arch_spin_is_contended(struct arch_spinlock *lock) +static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) { - return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock); -} -#define arch_spin_is_contended arch_spin_is_contended - -static __always_inline void arch_spin_lock(struct arch_spinlock *lock) -{ - PVOP_VCALL1(pv_lock_ops.spin_lock, lock); -} - -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock, - unsigned long flags) -{ - PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags); -} - -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock) -{ - return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock); -} - -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock) -{ - PVOP_VCALL1(pv_lock_ops.spin_unlock, lock); + PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } #endif diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index b82bac9..1078474 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ 
b/arch/x86/include/asm/paravirt_types.h @@ -315,12 +315,8 @@ struct pv_mmu_ops { struct arch_spinlock; struct pv_lock_ops { - int (*spin_is_locked)(struct arch_spinlock *lock); - int (*spin_is_contended)(struct arch_spinlock *lock); - void (*spin_lock)(struct arch_spinlock *lock); - void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long flags); - int (*spin_trylock)(struct arch_spinlock *lock); - void (*spin_unlock)(struct arch_spinlock *lock); + void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket); + void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket); }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 3afb1a7..8e379d3 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -50,6 +50,21 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) } #endif +/* How long a lock should spin before we consider blocking */ +#define SPIN_THRESHOLD (1 << 11) + +#ifndef CONFIG_PARAVIRT_SPINLOCKS + +static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) +{ +} + +static __always_inline void ticket_unlock_kick(struct arch_spi
[PATCH 08/14] x86/ticketlock: collapse a layer of functions
From: Jeremy Fitzhardinge Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 35 +-- 1 files changed, 5 insertions(+), 30 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 8e379d3..cfa80b5 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -108,7 +108,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, __t ticket_unlock_kick(lock, next); } -static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) +static __always_inline void arch_spin_lock(struct arch_spinlock *lock) { register struct __raw_tickets inc; @@ -128,7 +128,7 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) out: barrier(); /* make sure nothing creeps before the lock is taken */ } -static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) +static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) { arch_spinlock_t old, new; @@ -142,7 +142,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail; } -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) +static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { __ticket_t next = lock->tickets.head + 1; __ticket_unlock_release(lock); @@ -150,46 +150,21 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) barrier(); /* prevent reordering into locked region */ } -static inline int __ticket_spin_is_locked(arch_spinlock_t *lock) +static inline int arch_spin_is_locked(arch_spinlock_t *lock) { struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); return !!(tmp.tail ^ tmp.head); } -static inline int __ticket_spin_is_contended(arch_spinlock_t *lock) +static inline int 
arch_spin_is_contended(arch_spinlock_t *lock) { struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); return ((tmp.tail - tmp.head) & TICKET_MASK) > 1; } - -static inline int arch_spin_is_locked(arch_spinlock_t *lock) -{ - return __ticket_spin_is_locked(lock); -} - -static inline int arch_spin_is_contended(arch_spinlock_t *lock) -{ - return __ticket_spin_is_contended(lock); -} #define arch_spin_is_contended arch_spin_is_contended -static __always_inline void arch_spin_lock(arch_spinlock_t *lock) -{ - __ticket_spin_lock(lock); -} - -static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) -{ - return __ticket_spin_trylock(lock); -} - -static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) -{ - __ticket_spin_unlock(lock); -} - static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) { -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 00/14] PV ticket locks without expanding spinlock
From: Jeremy Fitzhardinge Hi all, This is a revised version of the pvticket lock series. The early part of the series is mostly unchanged: it converts the bulk of the ticket lock code into C and makes the "small" and "large" ticket code common. The only changes are the incorporation of various review comments. The latter part of the series converts from pv spinlocks to pv ticket locks (ie, using the ticket lock fastpath as-is, but adding pv ops for the ticketlock slowpaths). The significant difference here is that rather than adding a new ticket_t-sized element to arch_spinlock_t - effectively doubling the size - I steal the LSB of the tickets themselves to store a bit. This allows the structure to remain the same size, but at the cost of halving the max number of CPUs (127 for an 8-bit ticket, and a hard max of 32767 overall). The extra bit (well, two, but one is unused) indicates whether the lock has gone into "slowpath state", which means one of its lockers has entered its slowpath and has blocked in the hypervisor. This means the current lock-holder needs to make sure it gets kicked out of the hypervisor on unlock. The spinlock remains in slowpath state until the last unlock happens (ie there are no more queued lockers). This code survives for a while with moderate testing (make -j 100 on 8 VCPUs on a 4 PCPU system), but locks up after about 20 iterations, so there's still some race/deadlock in there (probably something misordered), but I think the basic approach is sound.
Thanks, J Jeremy Fitzhardinge (14): x86/ticketlock: clean up types and accessors x86/ticketlock: convert spin loop to C x86/ticketlock: Use C for __ticket_spin_unlock x86/ticketlock: make large and small ticket versions of spin_lock the same x86/ticketlock: make __ticket_spin_lock common x86/ticketlock: make __ticket_spin_trylock common x86/spinlocks: replace pv spinlocks with pv ticketlocks x86/ticketlock: collapse a layer of functions xen/pvticketlock: Xen implementation for PV ticket locks x86/pvticketlock: use callee-save for lock_spinning x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks x86/ticketlocks: when paravirtualizing ticket locks, increment by 2 x86/ticketlock: add slowpath logic x86/ticketlocks: tidy up __ticket_unlock_kick() arch/x86/Kconfig |3 + arch/x86/include/asm/paravirt.h | 30 +--- arch/x86/include/asm/paravirt_types.h |8 +- arch/x86/include/asm/spinlock.h | 236 +--- arch/x86/include/asm/spinlock_types.h | 32 - arch/x86/kernel/paravirt-spinlocks.c | 52 +-- arch/x86/xen/spinlock.c | 281 + kernel/Kconfig.locks |2 +- 8 files changed, 231 insertions(+), 413 deletions(-) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 04/14] x86/ticketlock: make large and small ticket versions of spin_lock the same
From: Jeremy Fitzhardinge Make the bulk of __ticket_spin_lock look identical for large and small number of cpus. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 23 --- 1 files changed, 8 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 0170ba9..1b81809 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) #if (NR_CPUS < 256) static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - register union { - struct __raw_tickets tickets; - unsigned short slock; - } inc = { .slock = 1 << TICKET_SHIFT }; + register struct __raw_tickets inc = { .tail = 1 }; asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" - : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc"); + : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc"); for (;;) { - if (inc.tickets.head == inc.tickets.tail) + if (inc.head == inc.tail) goto out; cpu_relax(); - inc.tickets.head = ACCESS_ONCE(lock->tickets.head); + inc.head = ACCESS_ONCE(lock->tickets.head); } out: barrier(); /* make sure nothing creeps before the lock is taken */ } @@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) #else static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - unsigned inc = 1 << TICKET_SHIFT; - __ticket_t tmp; + register struct __raw_tickets inc = { .tail = 1 }; asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t" -: "+r" (inc), "+m" (lock->slock) +: "+r" (inc), "+m" (lock->tickets) : : "memory", "cc"); - tmp = inc; - inc >>= TICKET_SHIFT; - for (;;) { - if ((__ticket_t)inc == tmp) + if (inc.head == inc.tail) goto out; cpu_relax(); - tmp = ACCESS_ONCE(lock->tickets.head); + inc.head = ACCESS_ONCE(lock->tickets.head); } out: barrier(); /* make sure nothing creeps before the lock is taken */ } -- 1.7.2.3 ___ Virtualization mailing list 
Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 02/14] x86/ticketlock: convert spin loop to C
From: Jeremy Fitzhardinge The inner loop of __ticket_spin_lock isn't doing anything very special, so reimplement it in C. For the 8 bit ticket lock variant, we use a register union to get direct access to the lower and upper bytes in the tickets, but unfortunately gcc won't generate a direct comparison between the two halves of the register, so the generated asm isn't quite as pretty as the hand-coded version. However benchmarking shows that this is actually a small improvement in runtime performance on some benchmarks, and never a slowdown. We also need to make sure there's a barrier at the end of the lock loop to make sure that the compiler doesn't move any instructions from within the locked region into the region where we don't yet own the lock. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 58 +++--- 1 files changed, 29 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index d6d5784..f48a6e3 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -58,21 +58,21 @@ #if (NR_CPUS < 256) static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - unsigned short inc = 1 << TICKET_SHIFT; - - asm volatile ( - LOCK_PREFIX "xaddw %w0, %1\n" - "1:\t" - "cmpb %h0, %b0\n\t" - "je 2f\n\t" - "rep ; nop\n\t" - "movb %1, %b0\n\t" - /* don't need lfence here, because loads are in-order */ - "jmp 1b\n" - "2:" - : "+Q" (inc), "+m" (lock->slock) - : - : "memory", "cc"); + register union { + struct __raw_tickets tickets; + unsigned short slock; + } inc = { .slock = 1 << TICKET_SHIFT }; + + asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" + : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc"); + + for (;;) { + if (inc.tickets.head == inc.tickets.tail) + goto out; + cpu_relax(); + inc.tickets.head = ACCESS_ONCE(lock->tickets.head); + } +out: barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock) @@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { unsigned inc = 1 << TICKET_SHIFT; - unsigned tmp; + __ticket_t tmp; - asm volatile(LOCK_PREFIX "xaddl %0, %1\n" -"movzwl %w0, %2\n\t" -"shrl $16, %0\n\t" -"1:\t" -"cmpl %0, %2\n\t" -"je 2f\n\t" -"rep ; nop\n\t" -"movzwl %1, %2\n\t" -/* don't need lfence here, because loads are in-order */ -"jmp 1b\n" -"2:" -: "+r" (inc), "+m" (lock->slock), "=&r" (tmp) -: -: "memory", "cc"); + asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t" +: "+r" (inc), "+m" (lock->slock) +: : "memory", "cc"); + + tmp = inc; + inc >>= TICKET_SHIFT; + + for (;;) { + if ((__ticket_t)inc == tmp) + goto out; + cpu_relax(); + tmp = ACCESS_ONCE(lock->tickets.head); + } +out: barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
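[Editorial note] The C spin loop introduced by the patch above can be sketched in user space. This is a hypothetical, simplified stand-in: C11 atomics replace the kernel's LOCK_PREFIX xadd, ACCESS_ONCE() and cpu_relax(), and the names are illustrative rather than the kernel's own.

```c
#include <stdint.h>
#include <stdatomic.h>
#include <assert.h>

/* Hypothetical user-space sketch of the ticket-lock spin loop in C.
 * C11 atomics stand in for the kernel's locked xadd and
 * ACCESS_ONCE(); this is not the kernel's actual implementation. */
struct raw_tickets {
	_Atomic uint8_t head;	/* ticket currently being served */
	_Atomic uint8_t tail;	/* next ticket to hand out */
};

static void ticket_lock(struct raw_tickets *lock)
{
	/* the atomic fetch-add on tail plays the role of "xaddw %w0, %1" */
	uint8_t me = atomic_fetch_add(&lock->tail, 1);

	for (;;) {
		if (atomic_load(&lock->head) == me)
			break;	/* our turn: lock taken */
		/* cpu_relax() would go here in kernel code */
	}
	/* the atomic load doubles as the barrier() ending the loop */
}

static void ticket_unlock(struct raw_tickets *lock)
{
	atomic_fetch_add(&lock->head, 1);
}
```

In the uncontended case, head == tail immediately after the fetch-add, so the loop exits on its first iteration — the same fast path the patch preserves.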
[PATCH 03/14] x86/ticketlock: Use C for __ticket_spin_unlock
From: Jeremy Fitzhardinge If we don't need to use a locked inc for unlock, then implement it in C. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 32 +--- 1 files changed, 17 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index f48a6e3..0170ba9 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -33,9 +33,21 @@ * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock * (PPro errata 66, 92) */ -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) +{ + if (sizeof(lock->tickets.head) == sizeof(u8)) + asm (LOCK_PREFIX "incb %0" +: "+m" (lock->tickets.head) : : "memory"); + else + asm (LOCK_PREFIX "incw %0" +: "+m" (lock->tickets.head) : : "memory"); + +} #else -# define UNLOCK_LOCK_PREFIX +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) +{ + lock->tickets.head++; +} #endif /* @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) return tmp; } - -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) -{ - asm volatile(UNLOCK_LOCK_PREFIX "incb %0" -: "+m" (lock->slock) -: -: "memory", "cc"); -} #else static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) return tmp; } +#endif static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) { - asm volatile(UNLOCK_LOCK_PREFIX "incw %0" -: "+m" (lock->slock) -: -: "memory", "cc"); + __ticket_unlock_release(lock); + barrier(); /* prevent reordering into locked region */ } -#endif static inline int __ticket_spin_is_locked(arch_spinlock_t *lock) { -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization
On 11/15/2010 12:14 PM, Peter Zijlstra wrote: > On Mon, 2010-11-15 at 12:03 -0800, H. Peter Anvin wrote: >> On 11/15/2010 12:00 PM, Jeremy Fitzhardinge wrote: >>> Another approach I discussed with PeterZ and Mathieu is to steal the LSB >>> of the ticket counters (halving the max CPU count) to use as a "there is >>> someone in slowpath waiting on this lock". But I haven't spent the time >>> to work out an algorithm to maintain that flag (or flags, since there >>> are bits available) in a correct and efficient way. >>> >> Definitely worth pondering. > Right, so the idea was to make the ticket increment 2, which would leave > the LSB of both the head and tail available. I think that if one were to > set both (using a cmpxchg), the ticket fast-path wouldn't need any > changes since head==tail is still the correct condition for acquisition. > > Then the unlock needs an added conditional: > if (tail & 1) > unlock_slowpath() The tricky part is knowing how to clear the bit(s) on the last person dropping out of the slow path, and making that race-free with respect to new lockers entering the slow path. I guess you could leave it in slowpath state until you're the last unlocker (i.e., you're unlocking into the uncontended state), whereupon you also clear the bits; that would probably need a cmpxchg to make it safe with respect to new lockers entering the slowpath. As a heuristic, it shouldn't be too bad performance-wise, since (handwaving) if ticketholder N has entered the slowpath, it's likely that N+1 will as well. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
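[Editorial note] The increment-by-2 idea discussed above can be sketched roughly as follows. This is hypothetical: the names, the flag placement in tail, and the plain (non-atomic) stores are illustrative only — a real version needs cmpxchg and a race-free way to clear the flag, which is exactly the open problem in the thread.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of stealing the LSB of the ticket counters:
 * tickets advance by 2, so bit 0 is free to mean "someone is
 * waiting in the slowpath".  Plain stores stand in for the
 * cmpxchg a real implementation would need. */
#define TICKET_LOCK_INC		2
#define TICKET_SLOWPATH_FLAG	1u

struct tickets { uint16_t head, tail; };

/* fast-path claim: unchanged apart from the increment size */
static uint16_t ticket_claim(struct tickets *t)
{
	uint16_t mine = t->tail & (uint16_t)~TICKET_SLOWPATH_FLAG;
	t->tail += TICKET_LOCK_INC;	/* really a locked xadd */
	return mine;
}

/* head == tail (flag masked off) is still the acquisition test */
static int ticket_acquired(const struct tickets *t, uint16_t mine)
{
	return (t->head & (uint16_t)~TICKET_SLOWPATH_FLAG) == mine;
}

static void ticket_enter_slowpath(struct tickets *t)
{
	t->tail |= TICKET_SLOWPATH_FLAG;	/* really a cmpxchg */
}

/* the unlock-side conditional from the mail: if (tail & 1) kick */
static int ticket_unlock_needs_kick(const struct tickets *t)
{
	return (t->tail & TICKET_SLOWPATH_FLAG) != 0;
}
```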
Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization
On 11/12/2010 02:20 PM, H. Peter Anvin wrote: > On 11/12/2010 02:17 PM, Jeremy Fitzhardinge wrote: >> On 11/12/2010 02:12 PM, H. Peter Anvin wrote: >>> On 11/03/2010 07:59 AM, Jeremy Fitzhardinge wrote: >>>> - with an unmodified struct spinlock, it can check to see if >>>> head == tail after unlock; if not, then there's someone else >>>> trying to lock, and we can do a kick. Unfortunately this >>>> generates very high level of redundant kicks, because the >>>> waiting CPU might not have blocked yet (which is the common >>>> case) >>>> >>> How high is "very high" here -- most of the time (so that any mitigation >>> on the slow patch is useless)? >> I'll need to remeasure, but I think around 90% of the slowpath entries >> were spurious without this. In other words, when spinlocks do contend, >> most of the time it isn't very serious and the other cpu doesn't spend >> much time spinning. >> > 90% of the slowpath entries is one thing, my real question is the > fraction of fastpath entries that get diverted to the slowpath. It > affects where mitigation needs to happen. There are two questions: how many unlock events *must* go into the slowpath for correctness reasons (i.e., because the corresponding lock also went slowpath and got blocked there), and how many end up going into the slowpath due to imperfect heuristics? The number of lock events which go slowpath is very dependent on the workload of the kernel in question and of the machine overall. On a system with no CPU overcommit it should be zero (assuming that in the native case no Linux spinlock remains contended for so long that it will trigger the slowpath). On a very overcommitted system, it comes down to the likelihood that a VCPU will get preempted while running in a critical region: Tcrit * Phz, where Tcrit is the critical section time in seconds and Phz is the preemption rate of the VCPU scheduler in Hz.
So, for example, a lock with a 10us critical section and a 100Hz preemption rate will have a 0.1% chance of getting preempted and possibly causing the other lockers to enter the slow path. On the unlock side, it needs to test whether the lock has any waiters in a slowpath state. A conservative test is whether there are any outstanding tickets, but in my measurements 90% of CPUs which spun on a lock ended up getting it without having to take the slowpath. This led me to investigate more precise tests, which is currently a count of slowpath-entering CPUs waiting on the lock. Another approach I discussed with PeterZ and Mathieu is to steal the LSB of the ticket counters (halving the max CPU count) to use as a "there is someone in slowpath waiting on this lock". But I haven't spent the time to work out an algorithm to maintain that flag (or flags, since there are bits available) in a correct and efficient way. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
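[Editorial note] The back-of-the-envelope estimate above (Tcrit * Phz) works out as follows for the quoted example; the function name is illustrative.

```c
#include <assert.h>

/* Probability that a VCPU is preempted inside a critical section,
 * per the estimate in the mail: p ~= Tcrit * Phz, with Tcrit in
 * seconds and Phz in Hz.  For a 10us critical section at a 100Hz
 * preemption rate this gives roughly 0.1%. */
static double preempt_probability(double tcrit_sec, double phz)
{
	return tcrit_sec * phz;
}
```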
Re: [PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common
On 11/13/2010 02:48 AM, Eric Dumazet wrote: > Le samedi 13 novembre 2010 à 18:17 +0800, Américo Wang a écrit : >> On Wed, Nov 03, 2010 at 10:59:47AM -0400, Jeremy Fitzhardinge wrote: >> >>> + union { >>> + struct __raw_tickets tickets; >>> + __ticketpair_t slock; >>> + } tmp, new; >>> + int ret; >>> + >>> + tmp.tickets = ACCESS_ONCE(lock->tickets); >>> + if (tmp.tickets.head != tmp.tickets.tail) >>> + return 0; >>> + >>> + new.slock = tmp.slock + (1 << TICKET_SHIFT); >>> + >>> + ret = cmpxchg(&lock->ticketpair, tmp.slock, new.slock) == tmp.slock; >>> + barrier(); /* just make nothing creeps before lock is >>> claimed */ >> This one should be smp_wmb(), right? No CONFIG_X86_OOSTORE protected. > cmpxchg() is a full memory barrier, no need for smp_wmb() or barrier() Agreed. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock
On 11/13/2010 02:05 AM, Américo Wang wrote: > On Wed, Nov 03, 2010 at 10:59:44AM -0400, Jeremy Fitzhardinge wrote: >> * On PPro SMP or if we are using OOSTORE, we use a locked operation to >> unlock >> * (PPro errata 66, 92) >> */ >> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX >> +static __always_inline void __ticket_unlock_release(struct arch_spinlock >> *lock) >> +{ >> +if (sizeof(lock->tickets.head) == sizeof(u8)) >> +asm (LOCK_PREFIX "incb %0" >> + : "+m" (lock->tickets.head) : : "memory"); >> +else >> +asm (LOCK_PREFIX "incw %0" >> + : "+m" (lock->tickets.head) : : "memory"); > This 'if/else' really should be done with #ifdef, even though > the compiler may be smart enough to remove it. No, we depend on if/else with constant arguments doing the right thing all over the kernel. It is always preferable to use it instead of #ifdef where possible, so that the two branches of code are always subjected to compiler checking, even if they're not being used. >> + >> +} >> #else >> -# define UNLOCK_LOCK_PREFIX >> +static __always_inline void __ticket_unlock_release(struct arch_spinlock >> *lock) >> +{ >> +barrier(); >> +lock->tickets.head++; >> +barrier(); > The second barrier() is not needed. Agreed. It gets removed in a later patch. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
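[Editorial note] The pattern defended in the reply above can be shown in plain C: with a constant sizeof() condition, the compiler folds the if() and emits only one branch, while both branches are still parsed and type-checked — which #ifdef does not give you. The macro below is a hypothetical stand-in for the incb/incw asm pair.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical stand-in for __ticket_unlock_release(): pick the
 * byte or word increment with a compile-time-constant if().  The
 * untaken branch is discarded by constant folding, but both
 * branches are always compiled, unlike with #ifdef. */
struct lock8  { uint8_t  head; };
struct lock16 { uint16_t head; };

#define ticket_head_inc(lock)						\
	do {								\
		if (sizeof((lock)->head) == sizeof(uint8_t))		\
			(*(uint8_t *)&(lock)->head)++;	/* "incb" */	\
		else							\
			(*(uint16_t *)&(lock)->head)++;	/* "incw" */	\
	} while (0)
```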
Re: [PATCH 01/20] x86/ticketlock: clean up types and accessors
On 11/13/2010 01:57 AM, Américo Wang wrote: > On Wed, Nov 03, 2010 at 10:59:42AM -0400, Jeremy Fitzhardinge wrote: > ... >> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) >> { >> -int inc = 0x0001; >> -int tmp; >> +unsigned inc = 1 << TICKET_SHIFT; >> +unsigned tmp; > Please don't use old implicit-int. I don't mind much either way, but I don't think I've seen anyone worry about this before. >> -return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1; >> +return ((tmp.tail - tmp.head) & TICKET_MASK) > 1; > > There is a type promotion here. ... >> } >> >> #ifndef CONFIG_PARAVIRT_SPINLOCKS >> diff --git a/arch/x86/include/asm/spinlock_types.h >> b/arch/x86/include/asm/spinlock_types.h >> index dcb48b2..4582640 100644 >> --- a/arch/x86/include/asm/spinlock_types.h >> +++ b/arch/x86/include/asm/spinlock_types.h >> @@ -5,11 +5,27 @@ >> # error "please don't include this file directly" >> #endif >> >> +#include >> + >> +#if (CONFIG_NR_CPUS < 256) >> +typedef u8 __ticket_t; >> +#else >> +typedef u16 __ticket_t; >> +#endif >> + >> +#define TICKET_SHIFT(sizeof(__ticket_t) * 8) >> +#define TICKET_MASK ((1 << TICKET_SHIFT) - 1) > > So here you may need to cast the result to __ticket_t. OK. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH -next] xen: fix header export to userspace
On 11/13/2010 08:44 AM, Randy Dunlap wrote: > From: Randy Dunlap > > scripts/headers_install.pl prevents "__user" from being exported > to userspace headers, so just use compiler.h to make sure that > __user is defined and avoid the error. > > unifdef: linux-next-20101112/xx64/usr/include/xen/privcmd.h.tmp: 79: > Premature EOF (#if line 33 depth 1) Ah, OK, thanks. I was wondering what the proper fix for this was. I'll stick this in my tree. Thanks, J > Signed-off-by: Randy Dunlap > Cc: Jeremy Fitzhardinge > Cc: Konrad Rzeszutek Wilk > Cc: xen-de...@lists.xensource.com (moderated for non-subscribers) > Cc: virtualizat...@lists.osdl.org > Cc: Tony Finch > --- > include/xen/privcmd.h |5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > --- linux-next-20101112.orig/include/xen/privcmd.h > +++ linux-next-20101112/include/xen/privcmd.h > @@ -34,13 +34,10 @@ > #define __LINUX_PUBLIC_PRIVCMD_H__ > > #include > +#include > > typedef unsigned long xen_pfn_t; > > -#ifndef __user > -#define __user > -#endif > - > struct privcmd_hypercall { > __u64 op; > __u64 arg[5]; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization
On 11/12/2010 02:12 PM, H. Peter Anvin wrote: > On 11/03/2010 07:59 AM, Jeremy Fitzhardinge wrote: >> - with an unmodified struct spinlock, it can check to see if >> head == tail after unlock; if not, then there's someone else >> trying to lock, and we can do a kick. Unfortunately this >> generates very high level of redundant kicks, because the >> waiting CPU might not have blocked yet (which is the common >> case) >> > How high is "very high" here -- most of the time (so that any mitigation > on the slow patch is useless)? I'll need to remeasure, but I think around 90% of the slowpath entries were spurious without this. In other words, when spinlocks do contend, most of the time it isn't very serious and the other cpu doesn't spend much time spinning. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 04/20] x86/ticketlock: make large and small ticket versions of spin_lock the same
On 11/12/2010 04:19 AM, Srivatsa Vaddagiri wrote: > On Wed, Nov 03, 2010 at 10:59:45AM -0400, Jeremy Fitzhardinge wrote: >> Make the bulk of __ticket_spin_lock look identical for large and small >> number of cpus. > [snip] > >> #if (NR_CPUS < 256) >> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) >> { >> -register union { >> -struct __raw_tickets tickets; >> -unsigned short slock; >> -} inc = { .slock = 1 << TICKET_SHIFT }; >> +register struct __raw_tickets inc = { .tail = 1 }; > [snip] > >> #else >> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) >> { >> -unsigned inc = 1 << TICKET_SHIFT; >> -__ticket_t tmp; >> +register struct __raw_tickets inc = { .tickets.tail = 1 }; > s/.tickets//? > > Otherwise I get a compile error for NR_CPUS > 256, with just 4 patches > applied. Yeah, likely. That's precisely why I wanted to make them the same ;). J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/3] VIDEO: xen-fb, switch to for_each_console
On 11/03/2010 11:28 AM, Jiri Slaby wrote: > Use newly added for_each_console for iterating consoles. > > Signed-off-by: Jiri Slaby > Cc: Jeremy Fitzhardinge Sure, if that's what all the kids are doing these days. Acked-by: Jeremy Fitzhardinge J > Cc: Chris Wright > Cc: virtualizat...@lists.osdl.org > Cc: xen-de...@lists.xensource.com > Cc: linux-fb...@vger.kernel.org > --- > drivers/video/xen-fbfront.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c > index 428d273..4abb0b9 100644 > --- a/drivers/video/xen-fbfront.c > +++ b/drivers/video/xen-fbfront.c > @@ -492,7 +492,7 @@ xenfb_make_preferred_console(void) > return; > > acquire_console_sem(); > - for (c = console_drivers; c; c = c->next) { > + for_each_console(c) { > if (!strcmp(c->name, "tty") && c->index == 0) > break; > } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock
On 11/03/2010 11:13 AM, Eric Dumazet wrote: > Le mercredi 03 novembre 2010 à 10:59 -0400, Jeremy Fitzhardinge a > écrit : >> From: Jeremy Fitzhardinge >> >> If we don't need to use a locked inc for unlock, then implement it in C. >> >> Signed-off-by: Jeremy Fitzhardinge >> --- >> arch/x86/include/asm/spinlock.h | 33 ++--- >> 1 files changed, 18 insertions(+), 15 deletions(-) >> >> diff --git a/arch/x86/include/asm/spinlock.h >> b/arch/x86/include/asm/spinlock.h >> index 6711d36..082990a 100644 >> --- a/arch/x86/include/asm/spinlock.h >> +++ b/arch/x86/include/asm/spinlock.h >> @@ -33,9 +33,23 @@ >> * On PPro SMP or if we are using OOSTORE, we use a locked operation to >> unlock >> * (PPro errata 66, 92) >> */ >> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX >> +static __always_inline void __ticket_unlock_release(struct arch_spinlock >> *lock) >> +{ >> +if (sizeof(lock->tickets.head) == sizeof(u8)) >> +asm (LOCK_PREFIX "incb %0" >> + : "+m" (lock->tickets.head) : : "memory"); >> +else >> +asm (LOCK_PREFIX "incw %0" >> + : "+m" (lock->tickets.head) : : "memory"); >> + >> +} >> #else >> -# define UNLOCK_LOCK_PREFIX >> +static __always_inline void __ticket_unlock_release(struct arch_spinlock >> *lock) >> +{ >> +barrier(); > technically speaking, it should be : smp_wmb() Perhaps. In practice it won't make a difference because it is defined as barrier() unless OOSTORE is defined, in which case we need to do a locked increment anyway. Thanks, J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 18/20] x86/ticketlock: remove .slock
From: Jeremy Fitzhardinge Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock_types.h |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index b396ed5..def8010 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -20,7 +20,6 @@ typedef u32 __ticketpair_t; typedef struct arch_spinlock { union { - unsigned int slock; __ticketpair_t ticketpair; struct __raw_tickets { __ticket_t head, tail; @@ -31,7 +30,7 @@ typedef struct arch_spinlock { #endif } arch_spinlock_t; -#define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } } +#define __ARCH_SPIN_LOCK_UNLOCKED { { .tickets = {0, 0} } } typedef struct { unsigned int lock; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 02/20] x86/ticketlock: convert spin loop to C
On 11/03/2010 11:11 AM, Eric Dumazet wrote: > Le mercredi 03 novembre 2010 à 10:59 -0400, Jeremy Fitzhardinge a > écrit : >> From: Jeremy Fitzhardinge >> >> The inner loop of __ticket_spin_lock isn't doing anything very special, >> so reimplement it in C. >> >> For the 8 bit ticket lock variant, we use a register union to get direct >> access to the lower and upper bytes in the tickets, but unfortunately gcc >> won't generate a direct comparison between the two halves of the register, >> so the generated asm isn't quite as pretty as the hand-coded version. >> However benchmarking shows that this is actually a small improvement in >> runtime performance on some benchmarks, and never a slowdown. >> >> We also need to make sure there's a barrier at the end of the lock loop >> to make sure that the compiler doesn't move any instructions from within >> the locked region into the region where we don't yet own the lock. >> >> Signed-off-by: Jeremy Fitzhardinge >> --- >> arch/x86/include/asm/spinlock.h | 58 >> +++--- >> 1 files changed, 29 insertions(+), 29 deletions(-) >> >> diff --git a/arch/x86/include/asm/spinlock.h >> b/arch/x86/include/asm/spinlock.h >> index d6d5784..6711d36 100644 >> --- a/arch/x86/include/asm/spinlock.h >> +++ b/arch/x86/include/asm/spinlock.h >> @@ -58,21 +58,21 @@ >> #if (NR_CPUS < 256) >> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) >> { >> -unsigned short inc = 1 << TICKET_SHIFT; >> - >> -asm volatile ( >> -LOCK_PREFIX "xaddw %w0, %1\n" >> -"1:\t" >> -"cmpb %h0, %b0\n\t" >> -"je 2f\n\t" >> -"rep ; nop\n\t" >> -"movb %1, %b0\n\t" >> -/* don't need lfence here, because loads are in-order */ >> -"jmp 1b\n" >> -"2:" >> -: "+Q" (inc), "+m" (lock->slock) >> -: >> -: "memory", "cc"); >> +register union { >> +struct __raw_tickets tickets; >> +unsigned short slock; >> +} inc = { .slock = 1 << TICKET_SHIFT }; >> + >> +asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" >> + : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc"); >> + >> 
+for (;;) { >> +if (inc.tickets.head == inc.tickets.tail) >> +return; >> +cpu_relax(); >> +inc.tickets.head = ACCESS_ONCE(lock->tickets.head); >> +} >> +barrier(); /* make sure nothing creeps before the lock is >> taken */ > Isnt this barrier() never reached ? Sorry, a later patch makes this clearer. I should have folded it in. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 05/20] x86/ticketlock: make __ticket_spin_lock common
From: Jeremy Fitzhardinge Aside from the particular form of the xadd instruction, they're identical. So factor out the xadd and use common code for the rest. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 42 ++ 1 files changed, 20 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 7586d7a..4f9fa24 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -69,13 +69,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) * save some instructions and make the code more elegant. There really isn't * much between them in performance though, especially as locks are out of line. */ -#if (NR_CPUS < 256) -static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) +static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock) { - register struct __raw_tickets inc = { .tail = 1 }; + register struct __raw_tickets tickets = { .tail = 1 }; + + if (sizeof(lock->tickets.head) == sizeof(u8)) + asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" + : "+r" (tickets), "+m" (lock->tickets) + : : "memory", "cc"); + else + asm volatile (LOCK_PREFIX "xaddl %0, %1\n" +: "+r" (tickets), "+m" (lock->tickets) +: : "memory", "cc"); - asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" - : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc"); + return tickets; +} + +static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) +{ + register struct __raw_tickets inc; + + inc = __ticket_spin_claim(lock); for (;;) { if (inc.head == inc.tail) @@ -86,6 +100,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) barrier(); /* make sure nothing creeps before the lock is taken */ } +#if (NR_CPUS < 256) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { unsigned int tmp, new; @@ -105,23 +120,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) return 
tmp; } #else -static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) -{ - register struct __raw_tickets inc = { .tickets.tail = 1 }; - - asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t" -: "+r" (inc), "+m" (lock->tickets) -: : "memory", "cc"); - - for (;;) { - if (inc.head == inc.tail) - return; - cpu_relax(); - inc.head = ACCESS_ONCE(lock->tickets.head); - } - barrier(); /* make sure nothing creeps before the lock is taken */ -} - static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { unsigned tmp; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 09/20] xen/pvticketlock: Xen implementation for PV ticket locks
From: Jeremy Fitzhardinge Replace the old Xen implementation of PV spinlocks with an implementation of xen_lock_spinning and xen_unlock_kick. xen_lock_spinning simply registers the cpu in its entry in lock_waiting, adds itself to the waiting_cpus set, and blocks on an event channel until the channel becomes pending. xen_unlock_kick searches the cpus in waiting_cpus looking for the one which next wants this lock with the next ticket, if any. If found, it kicks it by making its event channel pending, which wakes it up. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/xen/spinlock.c | 281 ++- 1 files changed, 36 insertions(+), 245 deletions(-) diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 7e9eb52..5feb897 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -19,32 +19,21 @@ #ifdef CONFIG_XEN_DEBUG_FS static struct xen_spinlock_stats { - u64 taken; u32 taken_slow; - u32 taken_slow_nested; u32 taken_slow_pickup; u32 taken_slow_spurious; - u32 taken_slow_irqenable; - u64 released; u32 released_slow; u32 released_slow_kicked; #define HISTO_BUCKETS 30 - u32 histo_spin_total[HISTO_BUCKETS+1]; - u32 histo_spin_spinning[HISTO_BUCKETS+1]; u32 histo_spin_blocked[HISTO_BUCKETS+1]; - u64 time_total; - u64 time_spinning; u64 time_blocked; } spinlock_stats; static u8 zero_stats; -static unsigned lock_timeout = 1 << 10; -#define TIMEOUT lock_timeout - static inline void check_zero(void) { if (unlikely(zero_stats)) { @@ -73,22 +62,6 @@ static void __spin_time_accum(u64 delta, u32 *array) array[HISTO_BUCKETS]++; } -static inline void spin_time_accum_spinning(u64 start) -{ - u32 delta = xen_clocksource_read() - start; - - __spin_time_accum(delta, spinlock_stats.histo_spin_spinning); - spinlock_stats.time_spinning += delta; -} - -static inline void spin_time_accum_total(u64 start) -{ - u32 delta = xen_clocksource_read() - start; - - __spin_time_accum(delta, spinlock_stats.histo_spin_total); - spinlock_stats.time_total += delta; -} - -static inline 
void spin_time_accum_blocked(u64 start) { u32 delta = xen_clocksource_read() - start; @@ -105,214 +78,76 @@ static inline u64 spin_time_start(void) return 0; } -static inline void spin_time_accum_total(u64 start) -{ -} -static inline void spin_time_accum_spinning(u64 start) -{ -} static inline void spin_time_accum_blocked(u64 start) { } #endif /* CONFIG_XEN_DEBUG_FS */ -struct xen_spinlock { - unsigned char lock; /* 0 -> free; 1 -> locked */ - unsigned short spinners;/* count of waiting cpus */ +struct xen_lock_waiting { + struct arch_spinlock *lock; + __ticket_t want; }; static DEFINE_PER_CPU(int, lock_kicker_irq) = -1; +static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting); +static cpumask_t waiting_cpus; -#if 0 -static int xen_spin_is_locked(struct arch_spinlock *lock) -{ - struct xen_spinlock *xl = (struct xen_spinlock *)lock; - - return xl->lock != 0; -} - -static int xen_spin_is_contended(struct arch_spinlock *lock) -{ - struct xen_spinlock *xl = (struct xen_spinlock *)lock; - - /* Not strictly true; this is only the count of contended - lock-takers entering the slow path. */ - return xl->spinners != 0; -} - -static int xen_spin_trylock(struct arch_spinlock *lock) -{ - struct xen_spinlock *xl = (struct xen_spinlock *)lock; - u8 old = 1; - - asm("xchgb %b0,%1" - : "+q" (old), "+m" (xl->lock) : : "memory"); - - return old == 0; -} - -static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners); - -/* - * Mark a cpu as interested in a lock. Returns the CPU's previous - * lock of interest, in case we got preempted by an interrupt. - */ -static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl) -{ - struct xen_spinlock *prev; - - prev = __get_cpu_var(lock_spinners); - __get_cpu_var(lock_spinners) = xl; - - wmb(); /* set lock of interest before count */ - - asm(LOCK_PREFIX " incw %0" - : "+m" (xl->spinners) : : "memory"); - - return prev; -} - -/* - * Mark a cpu as no longer interested in a lock. 
Restores previous - * lock of interest (NULL for none). - */ -static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock *prev) -{ - asm(LOCK_PREFIX " decw %0" - : "+m" (xl->spinners) : : "memory"); - wmb(); /* decrement count before restoring lock */ - __get_cpu_var(lock_spinners) = prev; -} - -static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable) +stati
[PATCH 19/20] x86/ticketlocks: use overlapping read to eliminate mb()
From: Jeremy Fitzhardinge When reading the 'waiting' counter, use a longer-than-necessary read which also overlaps 'head'. This read is guaranteed to be in-order with respect to any unlock writes to 'head', thereby eliminating the need for an explicit mb() to enforce the read-after-write ordering. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 14 +++--- arch/x86/include/asm/spinlock_types.h | 24 2 files changed, 27 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index c7455e1..a16b6e4 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -86,14 +86,14 @@ static inline void __ticket_sub_waiting(struct arch_spinlock *lock) static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock *lock) { - /* -* Make sure everyone sees the unlock write before we check the -* waiting count. The processor ordering doesn't guarantee this -* because they're different memory locations. + /* +* lock->waiting_head is a union element which aliases both +* 'waiting' and 'head'. Since the memory access overlaps +* 'head', the read is forced to be in-order with respect to +* unlock writes to 'head', eliminating the need for an +* explicit mb(). (Intel memory ordering rules.) 
*/ - mb(); - - return ACCESS_ONCE(lock->waiting) != 0; + return ((__ticket_t)ACCESS_ONCE(lock->waiting_head)) != 0; } #endif /* CONFIG_PARAVIRT_SPINLOCKS */ diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index def8010..307ef0b 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -18,6 +18,24 @@ typedef u32 __ticketpair_t; #define TICKET_SHIFT (sizeof(__ticket_t) * 8) #define TICKET_MASK((1 << TICKET_SHIFT) - 1) +#ifdef CONFIG_PARAVIRT_SPINLOCKS +typedef struct arch_spinlock { + union { + struct { + __ticket_t waiting; + union { + __ticketpair_t ticketpair; + struct __raw_tickets { + __ticket_t head, tail; + } tickets; + }; + }; + __ticketpair_t waiting_head; /* aliases waiting and head */ + }; +} arch_spinlock_t; + +#define __ARCH_SPIN_LOCK_UNLOCKED { { { 0, { 0 } } } } +#else typedef struct arch_spinlock { union { __ticketpair_t ticketpair; @@ -25,12 +43,10 @@ typedef struct arch_spinlock { __ticket_t head, tail; } tickets; }; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - __ticket_t waiting; -#endif } arch_spinlock_t; -#define __ARCH_SPIN_LOCK_UNLOCKED { { .tickets = {0, 0} } } +#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } } +#endif typedef struct { unsigned int lock; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
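For readers outside the kernel tree, the aliasing trick can be sketched in plain C. This is an illustrative mock-up only: it keeps just 'waiting' and 'head' (no 'tail'), uses hypothetical names, and assumes a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock-up of the overlap: a 16-bit member laid over two
 * adjacent 8-bit members, so a single wide load covers both bytes.
 * On x86, a load from a location overlapping an earlier store is
 * ordered after that store, which is what lets the wide read stand in
 * for the mb(). (The real struct also carries 'tail'; omitted here.) */
union waiting_head {
	struct {
		uint8_t waiting;	/* count of blocked waiters */
		uint8_t head;		/* ticket currently being served */
	};
	uint16_t both;			/* overlaps 'waiting' and 'head' */
};

/* One wide load spanning 'head', truncated back to the 'waiting' byte
 * (little-endian: the low byte of 'both' is 'waiting'). */
static uint8_t read_waiting_wide(const union waiting_head *w)
{
	return (uint8_t)w->both;
}
```

The cast mirrors the patch's `(__ticket_t)ACCESS_ONCE(lock->waiting_head)`: the value that survives is only `waiting`, but the access itself is wide enough to be ordered against unlock stores to `head`.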
[PATCH 10/20] x86/pvticketlock: keep count of blocked cpus
From: Jeremy Fitzhardinge When a CPU blocks by calling into __ticket_lock_spinning, keep a count in the spinlock. This allows __ticket_lock_kick to more accurately tell whether it has any work to do (in many cases, a spinlock may be contended, but none of the waiters have gone into blocking). This adds two locked instructions to the spinlock slow path (once the lock has already spun for SPIN_THRESHOLD iterations), and adds another one or two bytes to struct arch_spinlock. We need to make sure we increment the waiting counter before doing the last-chance check of the lock to see if we picked it up in the meantime. If we don't then there's a potential deadlock:

	lock holder		lock waiter
				clear event channel
				check lock for pickup (did not)
	release lock
	check waiting counter
		(=0, no kick)
				add waiting counter
				block (=deadlock)

Moving the "add waiting counter" earlier avoids the deadlock:

	lock holder		lock waiter
				clear event channel
				add waiting counter
				check lock for pickup (did not)
	release lock
	check waiting counter
		(=1, kick)
				block (and immediately wake)

Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 27 ++- arch/x86/include/asm/spinlock_types.h |3 +++ arch/x86/xen/spinlock.c |4 3 files changed, 33 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index a79dfee..3deabca 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -65,6 +65,31 @@ static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, u { } +static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock *lock) +{ + return false; +} +#else +static inline void __ticket_add_waiting(struct arch_spinlock *lock) +{ + if (sizeof(lock->waiting) == sizeof(u8)) + asm (LOCK_PREFIX "addb $1, %0" : "+m" (lock->waiting) : : "memory"); + else + asm (LOCK_PREFIX "addw $1, %0" : "+m" (lock->waiting) : : "memory"); +} + +static inline void __ticket_sub_waiting(struct arch_spinlock 
*lock) +{ + if (sizeof(lock->waiting) == sizeof(u8)) + asm (LOCK_PREFIX "subb $1, %0" : "+m" (lock->waiting) : : "memory"); + else + asm (LOCK_PREFIX "subw $1, %0" : "+m" (lock->waiting) : : "memory"); +} + +static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock *lock) +{ + return ACCESS_ONCE(lock->waiting) != 0; +} #endif /* CONFIG_PARAVIRT_SPINLOCKS */ /* @@ -106,7 +131,7 @@ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spin */ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, __ticket_t next) { - if (unlikely(lock->tickets.tail != next)) + if (unlikely(__ticket_lock_waiters(lock))) ticket_unlock_kick(lock, next); } diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index 48dafc3..b396ed5 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -26,6 +26,9 @@ typedef struct arch_spinlock { __ticket_t head, tail; } tickets; }; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + __ticket_t waiting; +#endif } arch_spinlock_t; #define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } } diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 5feb897..b8f09d5 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -119,6 +119,8 @@ static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) /* Only check lock once pending cleared */ barrier(); + __ticket_add_waiting(lock); + /* check again make sure it didn't become free while we weren't looking */ if (ACCESS_ONCE(lock->tickets.head) == want) { @@ -133,6 +135,8 @@ static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); out: + __ticket_sub_waiting(lock); + cpumask_clear_cpu(cpu, &waiting_cpus); w->lock = NULL; spin_time_accum_blocked(start); -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org 
https://lists.linux-foundation.org/mailman/listinfo/virtualization
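A userspace sketch of the waiter-side ordering described above, using C11 atomics. This is a hand-rolled illustration, not the kernel's code: `struct pv_waitlock` and the stubbed-out blocking are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct pv_waitlock {
	_Atomic uint8_t head;		/* ticket currently being served */
	_Atomic uint8_t waiting;	/* count of blocked cpus */
};

/* Slow-path sketch: bump 'waiting' *before* the last-chance check of
 * 'head', so an unlocker running in between sees waiting != 0 and
 * issues a kick.  Returns true if the lock was picked up without
 * blocking. */
static bool slow_path_block(struct pv_waitlock *l, uint8_t want)
{
	bool picked_up;

	atomic_fetch_add(&l->waiting, 1);	/* add waiting counter */

	/* check again: did the lock become free in the meantime? */
	picked_up = (atomic_load(&l->head) == want);
	if (!picked_up) {
		/* a real implementation blocks here until kicked */
	}

	atomic_fetch_sub(&l->waiting, 1);
	return picked_up;
}
```

Doing the fetch_add first is exactly the "add waiting counter earlier" move from the commit message: once the counter is visible, the unlocker's check cannot miss a waiter that is about to block.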
[PATCH 20/20] x86/ticketlock: rename ticketpair to head_tail
From: Jeremy Fitzhardinge Make it clearer what fields head_tail is actually overlapping with. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h |2 +- arch/x86/include/asm/spinlock_types.h |4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index a16b6e4..6850884 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -174,7 +174,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) new.slock = tmp.slock + (1 << TICKET_SHIFT); - ret = cmpxchg(&lock->ticketpair, tmp.slock, new.slock) == tmp.slock; + ret = cmpxchg(&lock->head_tail, tmp.slock, new.slock) == tmp.slock; barrier(); /* just make nothing creeps before lock is claimed */ return ret; diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index 307ef0b..df031ec 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -24,7 +24,7 @@ typedef struct arch_spinlock { struct { __ticket_t waiting; union { - __ticketpair_t ticketpair; + __ticketpair_t head_tail; struct __raw_tickets { __ticket_t head, tail; } tickets; @@ -38,7 +38,7 @@ typedef struct arch_spinlock { #else typedef struct arch_spinlock { union { - __ticketpair_t ticketpair; + __ticketpair_t head_tail; struct __raw_tickets { __ticket_t head, tail; } tickets; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 14/20] x86/ticketlock: loosen ordering restraints on unlock
From: Jeremy Fitzhardinge It's only necessary to prevent the compiler from reordering code out of the locked region past the unlock. Putting too many barriers in prevents the compiler from generating the best code when PV spinlocks are enabled. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h |8 +--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index b9a1aae..d6de5e7 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -46,9 +46,7 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) #else static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) { - barrier(); lock->tickets.head++; - barrier(); } #endif @@ -184,7 +182,11 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { - __ticket_t next = lock->tickets.head + 1; + __ticket_t next; + + barrier(); /* prevent reordering out of locked region */ + + next = lock->tickets.head + 1; __ticket_unlock_release(lock); __ticket_unlock_kick(lock, next); } -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 13/20] x86/pvticketlock: make sure unlock is seen by everyone before checking waiters
From: Jeremy Fitzhardinge If we don't put a real memory barrier between the unlocking increment of the queue head and the check for lock waiters, we can end up with a deadlock as a result of the unlock write being reordered with respect to the waiters read. In effect, the check on the waiters count could happen before the unlock, which leads to a deadlock:

	unlocker		locker
	check waiters (none)
				check lock head (timeout)
				add to waiters
				block
	release lock		=> deadlock

Adding a mb() barrier prevents the unlocker's waiter check from happening before the lock release. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 3deabca..b9a1aae 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -88,6 +88,13 @@ static inline void __ticket_sub_waiting(struct arch_spinlock *lock) static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock *lock) { + /* +* Make sure everyone sees the unlock write before we check the +* waiting count. The processor ordering doesn't guarantee this +* because they're different memory locations. +*/ + mb(); + return ACCESS_ONCE(lock->waiting) != 0; } #endif /* CONFIG_PARAVIRT_SPINLOCKS */ -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
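The unlock-side requirement can be sketched the same way (C11 atomics, names hypothetical; the seq_cst fence plays the role of the kernel's mb(), ordering the 'head' store before the 'waiting' load).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct pv_kicklock {
	_Atomic uint8_t head;
	_Atomic uint8_t waiting;
};

/* Returns true if a kick would be issued. */
static bool unlock_and_check_waiters(struct pv_kicklock *l)
{
	/* release the lock: the next ticket holder may now enter */
	atomic_fetch_add_explicit(&l->head, 1, memory_order_release);

	/* the mb(): without a full fence, the 'waiting' load could be
	 * satisfied before the 'head' store becomes visible, since they
	 * are different memory locations */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&l->waiting, memory_order_relaxed) != 0;
}
```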
[PATCH 02/20] x86/ticketlock: convert spin loop to C
From: Jeremy Fitzhardinge The inner loop of __ticket_spin_lock isn't doing anything very special, so reimplement it in C. For the 8 bit ticket lock variant, we use a register union to get direct access to the lower and upper bytes in the tickets, but unfortunately gcc won't generate a direct comparison between the two halves of the register, so the generated asm isn't quite as pretty as the hand-coded version. However benchmarking shows that this is actually a small improvement in runtime performance on some benchmarks, and never a slowdown. We also need to make sure there's a barrier at the end of the lock loop to make sure that the compiler doesn't move any instructions from within the locked region into the region where we don't yet own the lock. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 58 +++--- 1 files changed, 29 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index d6d5784..6711d36 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -58,21 +58,21 @@ #if (NR_CPUS < 256) static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - unsigned short inc = 1 << TICKET_SHIFT; - - asm volatile ( - LOCK_PREFIX "xaddw %w0, %1\n" - "1:\t" - "cmpb %h0, %b0\n\t" - "je 2f\n\t" - "rep ; nop\n\t" - "movb %1, %b0\n\t" - /* don't need lfence here, because loads are in-order */ - "jmp 1b\n" - "2:" - : "+Q" (inc), "+m" (lock->slock) - : - : "memory", "cc"); + register union { + struct __raw_tickets tickets; + unsigned short slock; + } inc = { .slock = 1 << TICKET_SHIFT }; + + asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" + : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc"); + + for (;;) { + if (inc.tickets.head == inc.tickets.tail) + return; + cpu_relax(); + inc.tickets.head = ACCESS_ONCE(lock->tickets.head); + } + barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock) @@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { unsigned inc = 1 << TICKET_SHIFT; - unsigned tmp; + __ticket_t tmp; - asm volatile(LOCK_PREFIX "xaddl %0, %1\n" -"movzwl %w0, %2\n\t" -"shrl $16, %0\n\t" -"1:\t" -"cmpl %0, %2\n\t" -"je 2f\n\t" -"rep ; nop\n\t" -"movzwl %1, %2\n\t" -/* don't need lfence here, because loads are in-order */ -"jmp 1b\n" -"2:" -: "+r" (inc), "+m" (lock->slock), "=&r" (tmp) -: -: "memory", "cc"); + asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t" +: "+r" (inc), "+m" (lock->slock) +: : "memory", "cc"); + + tmp = inc; + inc >>= TICKET_SHIFT; + + for (;;) { + if ((__ticket_t)inc == tmp) + return; + cpu_relax(); + tmp = ACCESS_ONCE(lock->tickets.head); + } + barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
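The same loop shape can be written portably with C11 atomics. This is a toy ticket lock for illustration only: it splits head and tail into separate atomics, losing the kernel's single-xadd ticket grab but keeping the algorithm visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_ticketlock {
	_Atomic uint8_t head;	/* ticket currently allowed in */
	_Atomic uint8_t tail;	/* next ticket to hand out */
};

static void toy_lock(struct toy_ticketlock *l)
{
	/* take a ticket (the kernel does this with one LOCK XADD) */
	uint8_t me = atomic_fetch_add_explicit(&l->tail, 1,
					       memory_order_relaxed);

	/* spin until our ticket comes up -- the loop the patch
	 * rewrites in C */
	while (atomic_load_explicit(&l->head, memory_order_acquire) != me)
		;	/* cpu_relax() would go here */
}

static void toy_unlock(struct toy_ticketlock *l)
{
	atomic_fetch_add_explicit(&l->head, 1, memory_order_release);
}
```

The acquire load in the spin loop is the portable analogue of the barrier the commit message mentions: nothing from the critical section may be hoisted above the point where head matches our ticket.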
[PATCH 07/20] x86/spinlocks: replace pv spinlocks with pv ticketlocks
From: Jeremy Fitzhardinge Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which, on releasing a contended lock (there are more cpus with tail tickets), looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. 
Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/paravirt.h | 30 +++--- arch/x86/include/asm/paravirt_types.h |8 + arch/x86/include/asm/spinlock.h | 44 +++-- arch/x86/kernel/paravirt-spinlocks.c | 15 +- arch/x86/xen/spinlock.c |7 - 5 files changed, 50 insertions(+), 54 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 5653f43..79c77e5 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -722,36 +722,14 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS) -static inline int arch_spin_is_locked(struct arch_spinlock *lock) +static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) { - return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock); + PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket); } -static inline int arch_spin_is_contended(struct arch_spinlock *lock) +static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) { - return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock); -} -#define arch_spin_is_contended arch_spin_is_contended - -static __always_inline void arch_spin_lock(struct arch_spinlock *lock) -{ - PVOP_VCALL1(pv_lock_ops.spin_lock, lock); -} - -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock, - unsigned long flags) -{ - PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags); -} - -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock) -{ - return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock); -} - -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock) -{ - PVOP_VCALL1(pv_lock_ops.spin_unlock, lock); + PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } #endif diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index db9ef55..968ee31 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ 
b/arch/x86/include/asm/paravirt_types.h @@ -316,12 +316,8 @@ struct pv_mmu_ops { struct arch_spinlock; struct pv_lock_ops { - int (*spin_is_locked)(struct arch_spinlock *lock); - int (*spin_is_contended)(struct arch_spinlock *lock); - void (*spin_lock)(struct arch_spinlock *lock); - void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long flags); - int (*spin_trylock)(struct arch_spinlock *lock); - void (*spin_unlock)(struct arch_spinlock *lock); + void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket); + void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket); }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 507f83c..06815f3 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -52,6 +52,21 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) } #endif +/* How long a lock should spin before we consider blocking */ +#define SPIN_THRESHOLD (1 << 11) + +#ifndef CONFIG_PARAVIRT_SPINLOCKS + +static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) +{ +} + +static __always_inline void ticket_unlock_kick(struct arch_spi
[PATCH 15/20] x86/ticketlock: prevent compiler reordering into locked region
From: Jeremy Fitzhardinge Add a barrier() at the end of __raw_spin_unlock() to prevent instructions from after the locked region from being reordered into it. In theory doing so shouldn't cause any problems, but in practice, the system locks up under load... Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index d6de5e7..158b330 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -189,6 +189,8 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) next = lock->tickets.head + 1; __ticket_unlock_release(lock); __ticket_unlock_kick(lock, next); + + barrier(); /* prevent reordering into locked region */ } static inline int arch_spin_is_locked(arch_spinlock_t *lock) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 16/20] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks
From: Jeremy Fitzhardinge The code size expands somewhat, and its probably better to just call a function rather than inline it. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/Kconfig |3 +++ kernel/Kconfig.locks |2 +- 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index cea0cd9..e67bf49 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -577,6 +577,9 @@ config PARAVIRT_SPINLOCKS If you are unsure how to answer this question, answer N. +config ARCH_NOINLINE_SPIN_UNLOCK + def_bool PARAVIRT_SPINLOCKS + config PARAVIRT_CLOCK bool diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 88c92fb..3216c22 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE ARCH_INLINE_SPIN_LOCK_IRQSAVE config INLINE_SPIN_UNLOCK - def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) + def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && !ARCH_NOINLINE_SPIN_UNLOCK config INLINE_SPIN_UNLOCK_BH def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 17/20] x86/ticketlock: clarify barrier in arch_spin_lock
From: Jeremy Fitzhardinge Make sure the barrier in arch_spin_lock is definitely in the code path. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 158b330..c7455e1 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -151,13 +151,13 @@ static __always_inline void arch_spin_lock(struct arch_spinlock *lock) do { if (inc.head == inc.tail) - return; + goto out; cpu_relax(); inc.head = ACCESS_ONCE(lock->tickets.head); } while (--count); __ticket_lock_spinning(lock, inc.tail); } - barrier(); /* make sure nothing creeps before the lock is taken */ +out: barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 12/20] x86/pvticketlock: use callee-save for unlock_kick as well
From: Jeremy Fitzhardinge The unlock code is typically inlined throughout the kernel, so it's useful to make sure there's minimal register pressure overhead from the presence of the unlock_kick pvop call. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/paravirt.h |2 +- arch/x86/include/asm/paravirt_types.h |2 +- arch/x86/kernel/paravirt-spinlocks.c |2 +- arch/x86/xen/spinlock.c |3 ++- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index dbd5914..b6ff89a 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -729,7 +729,7 @@ static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned t static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) { - PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); + PVOP_VCALLEE2(pv_lock_ops.unlock_kick, lock, ticket); } #endif diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 9ad6d088..bb02831 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -317,7 +317,7 @@ struct pv_mmu_ops { struct arch_spinlock; struct pv_lock_ops { struct paravirt_callee_save lock_spinning; - void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket); + struct paravirt_callee_save unlock_kick; }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index 4251c1d..efad219 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -10,7 +10,7 @@ struct pv_lock_ops pv_lock_ops = { #ifdef CONFIG_SMP .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop), - .unlock_kick = paravirt_nop, + .unlock_kick = __PV_IS_CALLEE_SAVE(paravirt_nop), #endif }; EXPORT_SYMBOL(pv_lock_ops); diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 2bd51d8..bf2f409 100644 --- 
a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -159,6 +159,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next) } } } +PV_CALLEE_SAVE_REGS_THUNK(xen_unlock_kick); static irqreturn_t dummy_handler(int irq, void *dev_id) { @@ -195,7 +196,7 @@ void xen_uninit_lock_cpu(int cpu) void __init xen_init_spinlocks(void) { pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning); - pv_lock_ops.unlock_kick = xen_unlock_kick; + pv_lock_ops.unlock_kick = PV_CALLEE_SAVE(xen_unlock_kick); } #ifdef CONFIG_XEN_DEBUG_FS -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 00/20] x86: ticket lock rewrite and paravirtualization
From: Jeremy Fitzhardinge Hi all, This series does two major things: 1. It converts the bulk of the implementation to C, and makes the "small ticket" and "large ticket" code common. Only the actual size-dependent asm instructions are specific to the ticket size. The resulting generated asm is very similar to the current hand-written code. This results in a very large reduction in lines of code. 2. Get rid of pv spinlocks, and replace them with pv ticket locks. Currently we have the notion of "pv spinlocks" where a pv-ops backend can completely replace the spinlock implementation with a new one. This has two disadvantages: - its a completely separate spinlock implementation, and - there's an extra layer of indirection in front of every spinlock operation. To replace this, this series introduces the notion of pv ticket locks. In this design, the ticket lock fastpath is the standard ticketlock algorithm. However, after an iteration threshold it falls into a slow path which invokes a pv-op to block the spinning CPU. Conversely, on unlock it does the normal unlock, and then checks to see if it needs to do a special "kick" to wake the next CPU. The net result is that the pv-op calls are restricted to the slow paths, and the normal fast-paths are largely unaffected. There are still some overheads, however: - When locking, there's some extra tests to count the spin iterations. There are no extra instructions in the uncontended case though. - When unlocking, there are two ways to detect when it is necessary to kick a blocked CPU: - with an unmodified struct spinlock, it can check to see if head == tail after unlock; if not, then there's someone else trying to lock, and we can do a kick. Unfortunately this generates very high level of redundant kicks, because the waiting CPU might not have blocked yet (which is the common case) - With a struct spinlock modified to include a "waiters" field, to keep count of blocked CPUs, which is a much tighter test. 
But it does increase the size of each spinlock by 50% (doubled with padding). The series is very fine-grained, and I've left a lightly cleaned up history of the various options I evaluated, since they're not all obvious. Jeremy Fitzhardinge (20): x86/ticketlock: clean up types and accessors x86/ticketlock: convert spin loop to C x86/ticketlock: Use C for __ticket_spin_unlock x86/ticketlock: make large and small ticket versions of spin_lock the same x86/ticketlock: make __ticket_spin_lock common x86/ticketlock: make __ticket_spin_trylock common x86/spinlocks: replace pv spinlocks with pv ticketlocks x86/ticketlock: collapse a layer of functions xen/pvticketlock: Xen implementation for PV ticket locks x86/pvticketlock: keep count of blocked cpus x86/pvticketlock: use callee-save for lock_spinning x86/pvticketlock: use callee-save for unlock_kick as well x86/pvticketlock: make sure unlock is seen by everyone before checking waiters x86/ticketlock: loosen ordering restraints on unlock x86/ticketlock: prevent compiler reordering into locked region x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks x86/ticketlock: clarify barrier in arch_spin_lock x86/ticketlock: remove .slock x86/ticketlocks: use overlapping read to eliminate mb() x86/ticketlock: rename ticketpair to head_tail arch/x86/Kconfig |3 + arch/x86/include/asm/paravirt.h | 30 +--- arch/x86/include/asm/paravirt_types.h |8 +- arch/x86/include/asm/spinlock.h | 250 +++-- arch/x86/include/asm/spinlock_types.h | 41 +- arch/x86/kernel/paravirt-spinlocks.c | 15 +-- arch/x86/xen/spinlock.c | 282 + kernel/Kconfig.locks |2 +- 8 files changed, 221 insertions(+), 410 deletions(-) -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
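In outline, the fast/slow split the series builds up looks like this (illustrative C with stubbed pvops; SPIN_THRESHOLD and the hook names come from the series, everything else is a sketch, not the kernel's implementation):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SPIN_THRESHOLD (1 << 11)	/* iteration budget from the series */

struct pvt_lock {
	_Atomic uint8_t head, tail;
	_Atomic uint8_t waiting;	/* count of blocked cpus */
};

/* pvop stubs: on bare metal these compile to nothing; a hypervisor
 * backend (e.g. Xen) supplies real block/kick implementations */
static void lock_spinning(struct pvt_lock *l, uint8_t t) { (void)l; (void)t; }
static void unlock_kick(struct pvt_lock *l, uint8_t n)   { (void)l; (void)n; }

static void pvt_spin_lock(struct pvt_lock *l)
{
	uint8_t me = atomic_fetch_add(&l->tail, 1);

	for (;;) {
		/* fast path: plain ticket spin, bounded */
		for (unsigned count = SPIN_THRESHOLD; count; count--) {
			if (atomic_load(&l->head) == me)
				return;
			/* cpu_relax() */
		}
		/* slow path: ask the pvop to block us until kicked */
		lock_spinning(l, me);
	}
}

static void pvt_spin_unlock(struct pvt_lock *l)
{
	uint8_t next = atomic_fetch_add(&l->head, 1) + 1;

	/* the "waiters" variant of the kick test: only kick when
	 * someone has actually blocked, avoiding redundant kicks */
	if (atomic_load(&l->waiting) != 0)
		unlock_kick(l, next);
}
```

The point of the structure is visible here: the uncontended paths never call a pvop, so the extra indirection of the old pv spinlocks disappears from the fast path.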
[PATCH 11/20] x86/pvticketlock: use callee-save for lock_spinning
From: Jeremy Fitzhardinge Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/paravirt.h |2 +- arch/x86/include/asm/paravirt_types.h |2 +- arch/x86/kernel/paravirt-spinlocks.c |2 +- arch/x86/xen/spinlock.c |3 ++- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 79c77e5..dbd5914 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -724,7 +724,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned ticket) { - PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket); + PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket); } static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned ticket) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 968ee31..9ad6d088 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -316,7 +316,7 @@ struct pv_mmu_ops { struct arch_spinlock; struct pv_lock_ops { - void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket); + struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket); }; diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index c2e010e..4251c1d 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -9,7 +9,7 @@ struct pv_lock_ops pv_lock_ops = { #ifdef CONFIG_SMP - 
.lock_spinning = paravirt_nop, + .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop), .unlock_kick = paravirt_nop, #endif }; diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index b8f09d5..2bd51d8 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -141,6 +141,7 @@ out: w->lock = NULL; spin_time_accum_blocked(start); } +PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning); static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next) { @@ -193,7 +194,7 @@ void xen_uninit_lock_cpu(int cpu) void __init xen_init_spinlocks(void) { - pv_lock_ops.lock_spinning = xen_lock_spinning; + pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning); pv_lock_ops.unlock_kick = xen_unlock_kick; } -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 01/20] x86/ticketlock: clean up types and accessors
From: Jeremy Fitzhardinge A few cleanups to the way spinlocks are defined and accessed: - define __ticket_t which is the size of a spinlock ticket (ie, enough bits to hold all the cpus) - Define struct arch_spinlock as a union containing plain slock and the head and tail tickets - Use head and tail to implement some of the spinlock predicates. - Make all ticket variables unsigned. - Use TICKET_SHIFT to form constants Most of this will be used in later patches. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 24 ++-- arch/x86/include/asm/spinlock_types.h | 20 ++-- 2 files changed, 28 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 3089f70..d6d5784 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -56,11 +56,9 @@ * much between them in performance though, especially as locks are out of line. */ #if (NR_CPUS < 256) -#define TICKET_SHIFT 8 - static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - short inc = 0x0100; + unsigned short inc = 1 << TICKET_SHIFT; asm volatile ( LOCK_PREFIX "xaddw %w0, %1\n" @@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { - int tmp, new; + unsigned int tmp, new; asm volatile("movzwl %2, %0\n\t" "cmpb %h0,%b0\n\t" @@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) : "memory", "cc"); } #else -#define TICKET_SHIFT 16 - static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - int inc = 0x0001; - int tmp; + unsigned inc = 1 << TICKET_SHIFT; + unsigned tmp; asm volatile(LOCK_PREFIX "xaddl %0, %1\n" "movzwl %w0, %2\n\t" @@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { - int tmp; - int new; + unsigned tmp; + unsigned new; 
asm volatile("movl %2,%0\n\t" "movl %0,%1\n\t" @@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) static inline int __ticket_spin_is_locked(arch_spinlock_t *lock) { - int tmp = ACCESS_ONCE(lock->slock); + struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); - return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1)); + return !!(tmp.tail ^ tmp.head); } static inline int __ticket_spin_is_contended(arch_spinlock_t *lock) { - int tmp = ACCESS_ONCE(lock->slock); + struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); - return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1; + return ((tmp.tail - tmp.head) & TICKET_MASK) > 1; } #ifndef CONFIG_PARAVIRT_SPINLOCKS diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index dcb48b2..4582640 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -5,11 +5,27 @@ # error "please don't include this file directly" #endif +#include + +#if (CONFIG_NR_CPUS < 256) +typedef u8 __ticket_t; +#else +typedef u16 __ticket_t; +#endif + +#define TICKET_SHIFT (sizeof(__ticket_t) * 8) +#define TICKET_MASK((1 << TICKET_SHIFT) - 1) + typedef struct arch_spinlock { - unsigned int slock; + union { + unsigned int slock; + struct __raw_tickets { + __ticket_t head, tail; + } tickets; + }; } arch_spinlock_t; -#define __ARCH_SPIN_LOCK_UNLOCKED { 0 } +#define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } } typedef struct { unsigned int lock; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
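For readers following along, the layout the patch introduces — one lock word viewed either as a plain integer or as a head/tail ticket pair — can be sketched in plain userspace C. This is an illustrative stand-in, not the kernel code: `ticketlock_t` and the helper names are invented for the sketch, and it assumes NR_CPUS < 256 so a ticket is 8 bits.

```c
#include <stdint.h>

typedef uint8_t ticket_t;              /* __ticket_t when NR_CPUS < 256 */

#define TICKET_SHIFT (sizeof(ticket_t) * 8)
#define TICKET_MASK  ((1u << TICKET_SHIFT) - 1)

typedef struct {
    union {
        uint16_t slock;                /* the whole lock word */
        struct {
            ticket_t head;             /* ticket now being served */
            ticket_t tail;             /* next ticket to hand out */
        } tickets;
    };
} ticketlock_t;

/* Locked iff a ticket has been handed out that hasn't been served yet. */
static int ticket_is_locked(ticketlock_t *lock)
{
    return !!(lock->tickets.tail ^ lock->tickets.head);
}

/* Contended iff more than one waiter is queued behind the holder;
 * the subtraction wraps correctly because tickets are unsigned. */
static int ticket_is_contended(ticketlock_t *lock)
{
    return ((ticket_t)(lock->tickets.tail - lock->tickets.head) & TICKET_MASK) > 1;
}
```

Working on `head` and `tail` by name is what lets the later patches drop the shift-and-mask arithmetic the old `slock`-only predicates needed.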
[PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common
From: Jeremy Fitzhardinge Make trylock code common regardless of ticket size. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 55 +--- arch/x86/include/asm/spinlock_types.h |3 ++ 2 files changed, 19 insertions(+), 39 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 4f9fa24..507f83c 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -100,48 +100,25 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) barrier(); /* make sure nothing creeps before the lock is taken */ } -#if (NR_CPUS < 256) static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) { - unsigned int tmp, new; - - asm volatile("movzwl %2, %0\n\t" -"cmpb %h0,%b0\n\t" -"leal 0x100(%" REG_PTR_MODE "0), %1\n\t" -"jne 1f\n\t" -LOCK_PREFIX "cmpxchgw %w1,%2\n\t" -"1:" -"sete %b1\n\t" -"movzbl %b1,%0\n\t" -: "=&a" (tmp), "=&q" (new), "+m" (lock->slock) -: -: "memory", "cc"); - - return tmp; -} -#else -static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) -{ - unsigned tmp; - unsigned new; - - asm volatile("movl %2,%0\n\t" -"movl %0,%1\n\t" -"roll $16, %0\n\t" -"cmpl %0,%1\n\t" -"leal 0x0001(%" REG_PTR_MODE "0), %1\n\t" -"jne 1f\n\t" -LOCK_PREFIX "cmpxchgl %1,%2\n\t" -"1:" -"sete %b1\n\t" -"movzbl %b1,%0\n\t" -: "=&a" (tmp), "=&q" (new), "+m" (lock->slock) -: -: "memory", "cc"); - - return tmp; + union { + struct __raw_tickets tickets; + __ticketpair_t slock; + } tmp, new; + int ret; + + tmp.tickets = ACCESS_ONCE(lock->tickets); + if (tmp.tickets.head != tmp.tickets.tail) + return 0; + + new.slock = tmp.slock + (1 << TICKET_SHIFT); + + ret = cmpxchg(&lock->ticketpair, tmp.slock, new.slock) == tmp.slock; + barrier(); /* just make nothing creeps before lock is claimed */ + + return ret; } -#endif static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) { diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h index 4582640..48dafc3 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -9,8 +9,10 @@ #if (CONFIG_NR_CPUS < 256) typedef u8 __ticket_t; +typedef u16 __ticketpair_t; #else typedef u16 __ticket_t; +typedef u32 __ticketpair_t; #endif #define TICKET_SHIFT (sizeof(__ticket_t) * 8) @@ -19,6 +21,7 @@ typedef u16 __ticket_t; typedef struct arch_spinlock { union { unsigned int slock; + __ticketpair_t ticketpair; struct __raw_tickets { __ticket_t head, tail; } tickets; -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
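The unified trylock above can be sketched in userspace C: snapshot the ticket pair, bail out if the lock is held, otherwise try to bump the tail with a single compare-and-swap over the whole pair. C11 atomics stand in for the kernel's `cmpxchg()`; the names and the 8-bit-ticket assumption (NR_CPUS < 256, little-endian byte order) are the sketch's, not the kernel's.

```c
#include <stdint.h>
#include <stdatomic.h>

#define TICKET_SHIFT 8                 /* bits per ticket */

typedef struct { _Atomic uint16_t ticketpair; } ticketlock_t;

static int ticket_trylock(ticketlock_t *lock)
{
    uint16_t old  = atomic_load(&lock->ticketpair);
    uint8_t  head = old & 0xff;            /* low byte: next to serve */
    uint8_t  tail = old >> TICKET_SHIFT;   /* high byte: next to hand out */

    if (head != tail)                      /* held or waited on: give up */
        return 0;

    /* Take one ticket: tail += 1, i.e. add 1 << TICKET_SHIFT to the pair. */
    uint16_t new = old + (1u << TICKET_SHIFT);

    /* Succeeds only if nobody raced us between the load and here. */
    return atomic_compare_exchange_strong(&lock->ticketpair, &old, new);
}
```

The point of the cmpxchg formulation is exactly what the commit message says: one C body works for both ticket sizes, so the two hand-written asm variants can go away.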
[PATCH 08/20] x86/ticketlock: collapse a layer of functions
From: Jeremy Fitzhardinge Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 35 +-- 1 files changed, 5 insertions(+), 30 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 06815f3..a79dfee 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -110,7 +110,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, __t ticket_unlock_kick(lock, next); } -static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) +static __always_inline void arch_spin_lock(struct arch_spinlock *lock) { register struct __raw_tickets inc; @@ -130,7 +130,7 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock) barrier(); /* make sure nothing creeps before the lock is taken */ } -static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) +static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) { union { struct __raw_tickets tickets; @@ -150,53 +150,28 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) return ret; } -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock) +static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { __ticket_t next = lock->tickets.head + 1; __ticket_unlock_release(lock); __ticket_unlock_kick(lock, next); } -static inline int __ticket_spin_is_locked(arch_spinlock_t *lock) +static inline int arch_spin_is_locked(arch_spinlock_t *lock) { struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); return !!(tmp.tail ^ tmp.head); } -static inline int __ticket_spin_is_contended(arch_spinlock_t *lock) +static inline int arch_spin_is_contended(arch_spinlock_t *lock) { struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); return ((tmp.tail - tmp.head) & TICKET_MASK) > 1; } - -static inline 
int arch_spin_is_locked(arch_spinlock_t *lock) -{ - return __ticket_spin_is_locked(lock); -} - -static inline int arch_spin_is_contended(arch_spinlock_t *lock) -{ - return __ticket_spin_is_contended(lock); -} #define arch_spin_is_contended arch_spin_is_contended -static __always_inline void arch_spin_lock(arch_spinlock_t *lock) -{ - __ticket_spin_lock(lock); -} - -static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) -{ - return __ticket_spin_trylock(lock); -} - -static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) -{ - __ticket_spin_unlock(lock); -} - static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) { -- 1.7.2.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock
From: Jeremy Fitzhardinge

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge
---
 arch/x86/include/asm/spinlock.h | 33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 6711d36..082990a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,23 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm (LOCK_PREFIX "incb %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm (LOCK_PREFIX "incw %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	barrier();
+	lock->tickets.head++;
+	barrier();
+}
 #endif
 
 /*
@@ -93,14 +107,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +150,12 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	__ticket_unlock_release(lock);
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.2.3
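On a CPU without the PPro errata, the patch's C unlock is just an unlocked increment of the head ticket bracketed by compiler barriers. The sketch below models that in userspace (names invented; `barrier()` written out as GCC's memory-clobber asm, which is also how the kernel defines it):

```c
#include <stdint.h>

/* Compiler barrier: forbid reordering memory accesses across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

typedef struct {
    struct { uint8_t head, tail; } tickets;
} ticketlock_t;

static void ticket_unlock_release(ticketlock_t *lock)
{
    barrier();               /* keep the critical section above the store */
    lock->tickets.head++;    /* hand the lock to the next ticket holder */
    barrier();               /* keep later accesses below the store */
}
```

On x86 a plain aligned byte store is enough for release semantics, which is why dropping the `lock inc` is safe on non-errata hardware, and why the unlock can shrink back to essentially one instruction.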
[PATCH 04/20] x86/ticketlock: make large and small ticket versions of spin_lock the same
From: Jeremy Fitzhardinge Make the bulk of __ticket_spin_lock look identical for large and small number of cpus. Signed-off-by: Jeremy Fitzhardinge --- arch/x86/include/asm/spinlock.h | 23 --- 1 files changed, 8 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 082990a..7586d7a 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -72,19 +72,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock) #if (NR_CPUS < 256) static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - register union { - struct __raw_tickets tickets; - unsigned short slock; - } inc = { .slock = 1 << TICKET_SHIFT }; + register struct __raw_tickets inc = { .tail = 1 }; asm volatile (LOCK_PREFIX "xaddw %w0, %1\n" - : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc"); + : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc"); for (;;) { - if (inc.tickets.head == inc.tickets.tail) + if (inc.head == inc.tail) return; cpu_relax(); - inc.tickets.head = ACCESS_ONCE(lock->tickets.head); + inc.head = ACCESS_ONCE(lock->tickets.head); } barrier(); /* make sure nothing creeps before the lock is taken */ } @@ -110,21 +107,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock) #else static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock) { - unsigned inc = 1 << TICKET_SHIFT; - __ticket_t tmp; + register struct __raw_tickets inc = { .tickets.tail = 1 }; asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t" -: "+r" (inc), "+m" (lock->slock) +: "+r" (inc), "+m" (lock->tickets) : : "memory", "cc"); - tmp = inc; - inc >>= TICKET_SHIFT; - for (;;) { - if ((__ticket_t)inc == tmp) + if (inc.head == inc.tail) return; cpu_relax(); - tmp = ACCESS_ONCE(lock->tickets.head); + inc.head = ACCESS_ONCE(lock->tickets.head); } barrier(); /* make sure nothing creeps before the lock is taken */ } -- 1.7.2.3 ___ Virtualization mailing list 
Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
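The now-identical lock body can be sketched in userspace C: atomically fetch the old head/tail pair while bumping the tail (the kernel's `xadd`), then spin until the head catches up to the ticket we drew. C11 atomics stand in for the inline asm; the names and 8-bit-ticket packing are the sketch's assumptions.

```c
#include <stdint.h>
#include <stdatomic.h>

typedef struct { _Atomic uint16_t slock; } ticketlock_t;

static void ticket_lock(ticketlock_t *lock)
{
    /* inc = { .tail = 1 }: add 1 to the tail byte, fetching the old pair. */
    uint16_t old = atomic_fetch_add(&lock->slock, 1u << 8);
    uint8_t  me  = old >> 8;                     /* the ticket we drew */

    /* Spin until head reaches our ticket (cpu_relax() in the kernel). */
    while ((uint8_t)(atomic_load(&lock->slock) & 0xff) != me)
        ;
}
```

Because the `xadd` returns both fields at once, the fast path needs no second load: if `head == me` already, the loop exits on its first test.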
Re: [PATCH] xen: xenfs: privcmd: check put_user() return code
On 10/29/2010 10:44 AM, Ian Campbell wrote:
> On Fri, 2010-10-29 at 18:18 +0100, Jeremy Fitzhardinge wrote:
>> On 10/28/2010 04:39 AM, Vasiliy Kulikov wrote:
>>> put_user() may fail. In this case propagate error code from
>>> privcmd_ioctl_mmap_batch().
>> Thanks for looking at this. I'm in two minds about this; the existing
>> logic is such that these put_users can only fail if something else has
>> already failed and is returning an error. I guess it would be useful
>> to get an EFAULT if you've got a problem writing back the results.
>>
>> IanC, any opinion?
> Not a strong one.
>
> Perhaps what we really want in this case is for traverse_pages to return
> the total number of callback failures it encountered rather than
> aborting after the first failure?
>
> On the other hand you are correct that gather_array() has already
> touched all the pages which we are going to be touching here, so how
> likely is a new failure at this point anyway?

I could think of two cases: the array is mapped RO, so only the
writeback fails, or someone changes the mapping under our feet from
another thread.

    J

> Ian.
>
>> Thanks,
>> J
>>
>>> Signed-off-by: Vasiliy Kulikov
>>> ---
>>> Compile tested.
>>>
>>> drivers/xen/xenfs/privcmd.c | 8 ++--
>>> 1 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
>>> index f80be7f..2eb04c8 100644
>>> --- a/drivers/xen/xenfs/privcmd.c
>>> +++ b/drivers/xen/xenfs/privcmd.c
>>> @@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
>>>  	xen_pfn_t *mfnp = data;
>>>  	struct mmap_batch_state *st = state;
>>>
>>> -	put_user(*mfnp, st->user++);
>>> -
>>> -	return 0;
>>> +	return put_user(*mfnp, st->user++);
>>>  }
>>>
>>>  static struct vm_operations_struct privcmd_vm_ops;
>>>
>>> @@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata)
>>>  	up_write(&mm->mmap_sem);
>>>
>>>  	if (state.err > 0) {
>>> -		ret = 0;
>>> -
>>>  		state.user = m.arr;
>>> -		traverse_pages(m.num, sizeof(xen_pfn_t),
>>> +		ret = traverse_pages(m.num, sizeof(xen_pfn_t),
>>>  			       &pagelist,
>>>  			       mmap_return_errors, &state);
>>>  	}
Re: [PATCH] xen: xenfs: privcmd: check put_user() return code
On 10/28/2010 04:39 AM, Vasiliy Kulikov wrote:
> put_user() may fail. In this case propagate error code from
> privcmd_ioctl_mmap_batch().

Thanks for looking at this. I'm in two minds about this; the existing
logic is such that these put_users can only fail if something else has
already failed and is returning an error. I guess it would be useful
to get an EFAULT if you've got a problem writing back the results.

IanC, any opinion?

Thanks,
    J

> Signed-off-by: Vasiliy Kulikov
> ---
> Compile tested.
>
> drivers/xen/xenfs/privcmd.c | 8 ++--
> 1 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
> index f80be7f..2eb04c8 100644
> --- a/drivers/xen/xenfs/privcmd.c
> +++ b/drivers/xen/xenfs/privcmd.c
> @@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
>  	xen_pfn_t *mfnp = data;
>  	struct mmap_batch_state *st = state;
>
> -	put_user(*mfnp, st->user++);
> -
> -	return 0;
> +	return put_user(*mfnp, st->user++);
>  }
>
>  static struct vm_operations_struct privcmd_vm_ops;
>
> @@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata)
>  	up_write(&mm->mmap_sem);
>
>  	if (state.err > 0) {
> -		ret = 0;
> -
>  		state.user = m.arr;
> -		traverse_pages(m.num, sizeof(xen_pfn_t),
> +		ret = traverse_pages(m.num, sizeof(xen_pfn_t),
>  			       &pagelist,
>  			       mmap_return_errors, &state);
>  	}
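The shape of the fix under discussion — a per-item callback returns 0 or an error, and the traversal propagates the first failure instead of silently returning 0 — can be modelled in plain userspace C. Everything here is a simplified stand-in: `fake_put_user` mimics `put_user()` failing on a bad user pointer, and `traverse` plays the role of `traverse_pages()`.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for put_user(): "faults" when the user pointer is NULL. */
static int fake_put_user(int val, int *uptr)
{
    if (!uptr)
        return -EFAULT;
    *uptr = val;
    return 0;
}

typedef int (*visit_fn)(int *item, void *state);

/* Like traverse_pages(): visit each item, bail out on the first error. */
static int traverse(int *items, size_t n, visit_fn fn, void *state)
{
    for (size_t i = 0; i < n; i++) {
        int ret = fn(&items[i], state);
        if (ret)
            return ret;      /* propagate instead of returning 0 blindly */
    }
    return 0;
}

struct state { int *user; };

/* The callback after the patch: forward put_user's return value. */
static int return_errors(int *item, void *st_)
{
    struct state *st = st_;
    return fake_put_user(*item, st->user ? st->user++ : NULL);
}
```

Ian's alternative — counting failures rather than aborting on the first — would change `traverse` to accumulate non-zero returns instead of returning early.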
Re: [Xen-devel] Re: [PATCH] x86/pvclock-xen: zero last_value on resume
On 10/26/2010 10:48 AM, Glauber Costa wrote:
> On Tue, 2010-10-26 at 09:59 -0700, Jeremy Fitzhardinge wrote:
>> If the guest domain has been suspend/resumed or migrated, then the
>> system clock backing the pvclock clocksource may revert to a smaller
>> value (ie, can be non-monotonic across the migration/save-restore).
>> Make sure we zero last_value in that case so that the domain
>> continues to see clock updates.
>>
>> [ I don't know if kvm needs an analogous fix or not. ]
> After migration, save/restore, etc, we issue an ioctl where we tell
> the host the last clock value. That (in theory) guarantees monotonicity.
>
> I am not opposed to this patch in any way, however.

Thanks. HPA, do you want to take this, or shall I send it on?

Thanks,
    J
[PATCH] x86/pvclock-xen: zero last_value on resume
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

[ I don't know if kvm needs an analogous fix or not. ]

Signed-off-by: Jeremy Fitzhardinge
Cc: Stable Kernel
Reported-by: Eelco Dolstra
Reported-by: Olivier Hanesse
Bisected-by: Cédric Schieli
Tested-by: Cédric Schieli

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
 			    struct pvclock_vcpu_time_info *vcpu,
 			    struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+	atomic64_set(&last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
 	int cpu;
 
+	pvclock_resume();
+
 	if (xen_clockevent != &xen_vcpuop_clockevent)
 		return;
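Why zeroing `last_value` matters: pvclock reads are clamped against a running maximum so the clocksource never appears to go backwards, which means a stale maximum from before a migration would freeze the guest's view of time until the (now smaller) backing clock caught up. The sketch below models the clamp and the resume hook in userspace C, with C11 atomics standing in for `atomic64_*()`:

```c
#include <stdint.h>
#include <stdatomic.h>

/* Largest value ever returned; readers never go below this. */
static _Atomic uint64_t last_value;

/* The fix: forget the pre-migration maximum on resume. */
static void pvclock_resume(void)
{
    atomic_store(&last_value, 0);
}

/* Return max(raw backing-clock reading, anything returned before). */
static uint64_t pvclock_read(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    while (raw > last) {
        /* On failure, last is reloaded and the comparison retried. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            break;           /* we installed the new maximum */
    }
    return raw > last ? raw : last;
}
```

Without `pvclock_resume()`, a post-migration `raw` smaller than the old maximum would be clamped up to it indefinitely, and the domain would see no clock updates.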
Re: xen PV on HVM and initial domain merge in linux-next
Yes all ok.

    J

"Stefano Stabellini" wrote:
> On Wed, 20 Oct 2010, Stephen Rothwell wrote:
>> Hi Stefano,
>>
>> [just casting the net a bit wider ...]
>>
>> On Tue, 19 Oct 2010 18:51:47 +0100 Stefano Stabellini wrote:
>>>
>>> I forgot to CC the LKML and linux-next...
>>>
>>> On Tue, 19 Oct 2010, Stefano Stabellini wrote:
>>>> Stephen,
>>>> I have two patch series to merge in linux-next:
>>>>
>>>> PV on HVM: receive interrupts as xen events
>>>> xen: initial domain support
>>>>
>>>> they have all the acked-by needed and are both stable since several
>>>> weeks, however they depend on Konrad's xen-pcifront series and for this
>>>> reason I waited until now to ask for a merge in linux-next.
>>>>
>>>> Could you please pull:
>>>>
>>>> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git linux-next-initial-domain-v4
>>>>
>>>> it contains both series rebased on Konrad's pcifront series merged on
>>>> linux-next (warning: it still contains the merge commit of
>>>> xen-pcifront-0.8.2 in linux-next).
>>>> Let me know if you have any conflicts or if you need me to change the
>>>> branch somehow.
>>
>> Not following the Xen development at all, I would like to have a positive
>> reply from the listed Xen contacts, please,
>
> Sure.
> Jeremy?
>
>> I do have concerns that this is turning up so late, but I realise that
>> that is mainly due to a misunderstanding on the part of some of the Xen
>> community.
>
> Thank you very much for understanding!
>
>> Also, the above tree is based on next-20101019 which means that I cannot
>> use it as is. All the trees merged into linux-next must be based on some
>> other stable tree (almost always Linus' tree). linux-next is rebuilt
>> from scratch every day, so I cannot ever include a previous day's version.
>>
>> Merging in other stable trees is OK (as long as the other maintainer is
>> aware of that and makes sure that their tree does not rebase).
>>
>> Basically what you send to me should be what you intend to send to Linus
>> during the next merge window.
>
> All right.
> I merged Jeremy's and Konrad's branches (the ones you just merged on
> linux-next) on top of linux 2.6.36 rc8, then I rebased my series on top
> of the result.
> Please checkout this branch:
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.36-rc8-initial-domain-v5
>
> and let me know if it is suitable, it shouldn't have any merge
> conflicts.
>
> Cheers,
>
> Stefano

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.