On Tue, 2013-06-11 at 11:57 -0400, Waiman Long wrote: > This is an interesting patch and I think it is useful for workloads that > run on systems with a large number of CPUs.
I would say it is definitely a fun academic patch, now if it is something for a production environment remains to be seen. > > > diff --git a/arch/x86/include/asm/spinlock_types.h > > b/arch/x86/include/asm/spinlock_types.h > > index ad0ad07..cdaefdd 100644 > > --- a/arch/x86/include/asm/spinlock_types.h > > +++ b/arch/x86/include/asm/spinlock_types.h > > @@ -7,12 +7,18 @@ > > > > #include<linux/types.h> > > > > -#if (CONFIG_NR_CPUS< 256) > > +#if (CONFIG_NR_CPUS< 128) > > typedef u8 __ticket_t; > > typedef u16 __ticketpair_t; > > -#else > > +#define TICKET_T_CMP_GE(a, b) (UCHAR_MAX / 2>= (unsigned char)((a) - (b))) > > +#elif (CONFIG_NR_CPUS< 32768) > > typedef u16 __ticket_t; > > typedef u32 __ticketpair_t; > > +#define TICKET_T_CMP_GE(a, b) (USHRT_MAX / 2>= (unsigned short)((a) - (b))) > > +#else > > +typedef u32 __ticket_t; > > +typedef u64 __ticketpair_t; > > +#define TICKET_T_CMP_GE(a, b) (UINT_MAX / 2>= (unsigned int)((a) - (b))) > > #endif > > It is theoretically possible that a large number of CPUs (says 64 or > more with CONFIG_NR_CPUS < 128) can acquire a ticket from the lock > before the check for TICKET_T_CMP_GE() in tkt_spin_pass(). So the check > will fail even when there is a large number of CPUs contending for the > lock. The chance of this happening is, of course, extremely rare. This > is not an error as the lock is still working as it should be without > your change. Can you explain this more. How can you acquire the ticket and update at the same time? If a queue has been set, then you can't acquire the ticket as the head has a 1 appended to it. > > > > +/* > > + * TKT_Q_SWITCH is twice the number of CPUs that must be spinning on a > > + * given ticket lock to motivate switching to spinning on a queue. > > + * The reason that it is twice the number is because the bottom bit of > > + * the ticket is reserved for the bit that indicates that a queue is > > + * associated with the lock. > > + */ > > +#define TKT_Q_SWITCH (16 * 2) > > + > > +/* > > + * TKT_Q_NQUEUES is the number of queues to maintain. Large systems > > + * might have multiple highly contended locks, so provide more queues for > > + * systems with larger numbers of CPUs. > > + */ > > +#define TKT_Q_NQUEUES (DIV_ROUND_UP(NR_CPUS + TKT_Q_SWITCH - 1, > > TKT_Q_SWITCH) * 2) > > + > > +/* The queues themselves. */ > > +struct tkt_q_head tkt_q_heads[TKT_Q_NQUEUES]; > > I am a bit concern about the size of the head queue table itself. RHEL6, > for example, had defined CONFIG_NR_CPUS to be 4096 which mean a table > size of 256. Maybe it is better to dynamically allocate the table at > init time depending on the actual number of CPUs in the system. Yeah, it can be allocated dynamically at boot. > > > +/* > > + * Return a pointer to the queue header associated with the specified lock, > > + * or return NULL if there is no queue for the lock or if the lock's queue > > + * is in transition. > > + */ > > +static struct tkt_q_head *tkt_q_find_head(arch_spinlock_t *asp) > > +{ > > + int i; > > + int start; > > + > > + start = i = tkt_q_hash(asp); > > + do > > + if (tkt_q_heads[i].ref == asp) > > + return&tkt_q_heads[i]; > > + while ((i = tkt_q_next_slot(i)) != start); > > + return NULL; > > +} > > With a table size of 256 and you have to scan the whole table to find > the right head queue. This can be a significant overhead. I will suggest > setting a limiting of how many entries it scans before it aborts rather > than checking the whole table. We could add a limit, but in practice I'm not sure that would have any issue. I thought the same thing when I first saw this, but to hit most of the list, would require a large collision in the hash algorithm, would could probably be fixed with a better hash. > > > +/* > > + * Hand the lock off to the first CPU on the queue. > > + */ > > +void tkt_q_do_wake(arch_spinlock_t *asp) > > +{ > > + struct tkt_q_head *tqhp; > > + struct tkt_q *tqp; > > + > > + /* If the queue is still being set up, wait for it. */ > > + while ((tqhp = tkt_q_find_head(asp)) == NULL) > > + cpu_relax(); > > + > > + for (;;) { > > + > > + /* Find the first queue element. */ > > + tqp = ACCESS_ONCE(tqhp->spin); > > + if (tqp != NULL) > > + break; /* Element exists, hand off lock. */ > > + if (tkt_q_try_unqueue(asp, tqhp)) > > + return; /* No element, successfully removed queue. */ > > + cpu_relax(); > > + } > > + if (ACCESS_ONCE(tqhp->head_tkt) != -1) > > + ACCESS_ONCE(tqhp->head_tkt) = -1; > > In case NR_CPUS is 32768 or higher, the ticket will be of type u32 and > tqhp->head_tkt is s32. So -1 will be a valid ticket number. You may have > to conditionally define head_tkt to be s64 when the ticket is u32. Good point. > > Do you have any data on how much this patch can actually improve > performance on certain workloads? This will help the discussion here. Yeah, that's come up already in the thread. Linus wants to see hard numbers *and* an explanation of why the contended locks can't be fixed, before he even considers merging this type of change. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/