On Thu, Dec 15, 2016 at 11:21:05PM +0800, Boqun Feng wrote:
> There are some places inside RCU core, where we need to iterate all mask
> (->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
> corresponding CPUs. The current code iterates all possible CPUs in this
> leaf node and then checks with the mask to see whether the bit is set.
>
> However, given the fact that most bits in cpu_possible_mask are set but
> rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
> its friends are usually more sparse than cpu_possible_mask), it's better
> to iterate in the other way, that is iterating mask bits in a leaf node.
> By doing so, we can save several checks in the loop, moreover, that fast
> path checking(e.g. ->qsmask == 0) could then be consolidated into the
> loop logic.
>
> This patch introduce for_each_leaf_node_cpu() to iterate mask bits in a
> more efficient way.
>
> By design, The CPUs whose bits are set in the leaf node masks should be
> a subset of possible CPUs, so we don't need extra check with
> cpu_possible(), however, a WARN_ON_ONCE() is put to check whether there
> are some nasty cases we miss, and we skip that "impossible" CPU in that
> case.
>
> Signed-off-by: Boqun Feng <[email protected]>
Acked-by: Mark Rutland <[email protected]>

Thanks,
Mark.

> ---
>  kernel/rcu/tree.h | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index c0a4bf8f1ed0..b35da5b5dab1 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -295,6 +295,25 @@ struct rcu_node {
>  	     cpu <= rnp->grphi; \
>  	     cpu = cpumask_next((cpu), cpu_possible_mask))
>
> +
> +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> +/*
> + * Iterate over all CPUs a leaf RCU node which are still masked in
> + * @mask.
> + *
> + * Note @rnp has to be a leaf node and @mask has to belong to @rnp. And we
> + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> + * masks of a leaf node never set a bit for an "impossible" CPU.
> + */
> +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> +	     (cpu) <= (rnp)->grphi; \
> +	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
> +						  (cpu) - (rnp)->grplo + 1)) \
> +		if (WARN_ON_ONCE(!cpu_possible(cpu))) \
> +			continue; \
> +		else
> +
>  /*
>   * Union to allow "aggregate OR" operation on the need for a quiescent
>   * state by the normal and expedited grace periods.
> --
> 2.10.2
>

