On Fri, Apr 12, 2013 at 11:38:04PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 12, 2013 at 04:54:02PM -0700, Josh Triplett wrote:
> > On Fri, Apr 12, 2013 at 04:19:13PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
> > > 
> > > Systems with HZ=100 can have slow bootup times due to the default
> > > three-jiffy delays between quiescent-state forcing attempts.  This
> > > commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
> > > on the value of HZ.  However, this would break very large systems that
> > > require more time between quiescent-state forcing attempts.  This
> > > commit therefore also ups the default delay by one jiffy for each
> > > 256 CPUs that might be on the system (based off of nr_cpu_ids at
> > > runtime, -not- NR_CPUS at build time).
> > > 
> > > Reported-by: Paul Mackerras <pau...@au1.ibm.com>
> > > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > 
> > Something seems very wrong if RCU regularly hits the fqs code during
> > boot; feels like there's some more straightforward solution we're
> > missing.  What causes these CPUs to fall under RCU's scrutiny during
> > boot yet not actually hit the RCU codepaths naturally?
> 
> The problem is that they are running HZ=100, so that RCU will often
> take 30-60 milliseconds per grace period.  At that point, you only
> need 16-30 grace periods to chew up a full second, so it is not all
> that hard to eat up the additional 8-12 seconds of boot time that
> they were seeing.  IIRC, UP boot was costing them 4 seconds.
> 
> For HZ=1000, this would translate to 800ms to 1.2s, which is nowhere
> near as annoying.

That raises two questions, though.  First, who calls synchronize_rcu()
repeatedly during boot, and could they call call_rcu() instead to avoid
blocking for an RCU grace period?  Second, why does RCU need 3-6 jiffies
to resolve a grace period during boot?  That suggests that RCU doesn't
actually resolve a grace period until the force-quiescent-state
machinery kicks in, meaning that the normal quiescent-state mechanism
didn't work.

> > Also, a comment below.
> > 
> > > --- a/kernel/rcutree.h
> > > +++ b/kernel/rcutree.h
> > > @@ -342,7 +342,17 @@ struct rcu_data {
> > >  #define RCU_FORCE_QS             3       /* Need to force quiescent state. */
> > >  #define RCU_SIGNAL_INIT          RCU_SAVE_DYNTICK
> > >  
> > > -#define RCU_JIFFIES_TILL_FORCE_QS         3      /* for rsp->jiffies_force_qs */
> > > +#if HZ > 500
> > > +#define RCU_JIFFIES_TILL_FORCE_QS         3      /* for jiffies_till_first_fqs */
> > > +#elif HZ > 250
> > > +#define RCU_JIFFIES_TILL_FORCE_QS         2
> > > +#else
> > > +#define RCU_JIFFIES_TILL_FORCE_QS         1
> > > +#endif
> > 
> > This seems like it really wants to use a duration calculated directly
> > from HZ; perhaps (HZ/100)?
> 
> Very possibly a direct calculation is the way to go, but HZ/100 would give
> a 10-tick delay at HZ=1000, which is too high -- the value of 3 ticks for
> HZ=1000 works well.  But I could do something like this:
> 
> #define RCU_JIFFIES_TILL_FORCE_QS (((HZ + 199) / 300) + ((HZ + 199) / 300 ? 0 : 1))
> 
> Or maybe a bit better:
> 
> #define RCU_JTFQS_SE ((HZ + 199) / 300)
> #define RCU_JIFFIES_TILL_FORCE_QS (RCU_JTFQS_SE + (RCU_JTFQS_SE ? 0 : 1))
> 
> This would come reasonably close to the values shown above.  Would
> this work for you?

I'd argue that if you need something that complex, you should just
explicitly write it as a step function:

#define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/