On Wed, 2007-07-25 at 07:29 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> > Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:
> >
> > > > What a truly putrid patch. I am suspecting that this was a quick
> > > > get-you-out-of-trouble thing, which then got forgotten about.
> > > >
> > > > We have two months to do the "right fix". Please?
> > >
> > > Working on it...
> >
> > Ideally the patch would DTRT even on non preemptible kernels,
> > aka do cond_resched()s when needed.
>
> First is to rework the batch structure to make it more manageable. That
> is, patch #1 will keep the page list in per-cpu (and thus non-preempt),
> but the batch "head" will be on the stack.
>
> Now, there are two approaches regarding getting rid of the
> get_cpu/put_cpu:
>
>  - One is to have a small number of entries for the page list in the
> batch structure on the stack, and attempt to gfp' a page for more. If
> that fails, we can still free, though with less batching, using only the
> few entries in the batch struct itself. That's Hugh's initial approach
> iirc.
>
>  - Another is to hook up with those folks who've been asking for a
> notifier that we are being preempted/scheduled out. In this case, I can
> happily access the per-cpu list, and just trigger a batch flush if we
> happen to be scheduled out.
>
> I tend to prefer the former solution though, gfp should be fast, and
> there is no need to force a flush if we get scheduled out. It would be
> rare to hit the worst case scenario of falling back to the few page
> heads in the batch itself. On the other hand, that solution has the
> problem of bloating the stack a bit (with the few page pointers) even in
> the case where I plan to use the extended batch outside of zap_*, such
> as fork, mprotect, ....
>
> So I'll first do patch #1, which will not fix the problem, but will make
> the fix easier to fit in, in the meantime, please provide feedback of
> your preferred solution for avoiding the get/put_cpu of the 2 above,
> unless you find a good 3rd one.

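
To make the first option concrete, here is a rough user-space sketch of
the fallback behaviour, not kernel code. Everything in it is an
assumption made for illustration: the names (struct batch, batch_add,
BATCH_INLINE, BATCH_EXT), the sizes, and the use of malloc() as a
stand-in for the gfp'd page. The try_alloc flag only exists so the
allocation-failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_INLINE 8     /* hypothetical: slots embedded in the on-stack batch */
#define BATCH_EXT    512   /* hypothetical: slots in the optional extension array */

struct batch {
	void  **slots;         /* points at inline_slots or at the extension */
	size_t  nr, max;       /* used and available slots */
	size_t  flushes;       /* how many flushes happened (illustration only) */
	int     try_alloc;     /* 0 simulates the allocation failing */
	void   *inline_slots[BATCH_INLINE];
};

static void batch_init(struct batch *b, int try_alloc)
{
	b->slots = b->inline_slots;
	b->nr = 0;
	b->max = BATCH_INLINE;
	b->flushes = 0;
	b->try_alloc = try_alloc;
}

/* Stand-in for freeing the batched pages: this sketch only drops them. */
static void batch_flush(struct batch *b)
{
	if (b->nr == 0)
		return;
	b->nr = 0;
	b->flushes++;
}

static void batch_add(struct batch *b, void *page)
{
	if (b->nr == b->max && b->slots == b->inline_slots && b->try_alloc) {
		/* Inline slots are full: try once to get the larger array. */
		void **ext = malloc(BATCH_EXT * sizeof(*ext));

		b->try_alloc = 0;
		if (ext) {
			memcpy(ext, b->inline_slots, b->nr * sizeof(*ext));
			b->slots = ext;
			b->max = BATCH_EXT;
		}
	}
	if (b->nr == b->max)
		batch_flush(b);    /* no room left: flush with whatever we have */
	assert(b->nr < b->max);
	b->slots[b->nr++] = page;
}

static void batch_finish(struct batch *b)
{
	batch_flush(b);
	if (b->slots != b->inline_slots)
		free(b->slots);
	b->slots = b->inline_slots;
	b->max = BATCH_INLINE;
}
```

When the allocation fails, the batch keeps working out of the inline
slots and simply flushes more often, which is the "less batching" worst
case described above.
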
I too would prefer the former solution. I think preemption notifiers are
a particularly iffy hack. You could perhaps use C99 variable length
arrays to avoid the stack waste when not needed; however, Andi once told
me that generates rather dubious code.
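
For reference, the variable length array idea would look roughly like
the following in plain C99. This is only a user-space sketch under
assumed names (drain_in_chunks, release_chunk, and the scratch
parameter are all hypothetical); it shows the caller choosing the
on-stack scratch size at run time, so a caller that needs little
batching pays for little stack.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for freeing a staged chunk: this sketch only clears it. */
static void release_chunk(void **chunk, size_t n)
{
	for (size_t i = 0; i < n; i++)
		chunk[i] = NULL;
}

/*
 * Hypothetical helper: walk n pointers in chunks, staging each chunk in
 * on-stack scratch space sized by the caller. Returns the number of
 * chunks ("flushes") that were needed.
 */
static size_t drain_in_chunks(void *const *pages, size_t n, size_t scratch)
{
	size_t cap = scratch ? scratch : 1;  /* a VLA must have nonzero size */
	void *stage[cap];                    /* C99 variable length array */
	size_t flushes = 0;

	assert(pages != NULL || n == 0);

	for (size_t done = 0; done < n; ) {
		size_t take = (n - done < cap) ? n - done : cap;

		for (size_t i = 0; i < take; i++)
			stage[i] = pages[done + i];
		release_chunk(stage, take);
		done += take;
		flushes++;
	}
	return flushes;
}
```

The stack cost now depends on a run-time value, so the compiler cannot
bound it statically, which may be part of why the generated code is not
pretty.
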