On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:
>
> > > What a truly putrid patch. I am suspecting that this was a quick
> > > get-you-out-of-trouble thing, which then got forgotten about.
> > >
> > > We have two months to do the "right fix". Please?
> >
> > Working on it...
>
> Ideally the patch would DTRT even on non preemptible kernels,
> aka do cond_resched()s when needed.
The first step is to rework the batch structure to make it more manageable. That is, patch #1 will keep the page list per-cpu (and thus non-preemptible), but the batch "head" will be on the stack.

Now, there are two approaches to getting rid of the get_cpu()/put_cpu():

- One is to have a small number of page-list entries in the batch structure on the stack, and attempt to allocate a page (gfp) for more. If that fails, we can still free, though with less batching, using only the few entries in the batch struct itself. That's Hugh's initial approach, IIRC.

- Another is to hook up with those folks who've been asking for a notifier telling us that we are being preempted/scheduled out. In this case, I can happily access the per-cpu list, and just trigger a batch flush if we happen to be scheduled out.

I tend to prefer the former solution, though: gfp should be fast, and there is no need to force a flush if we get scheduled out. It would be rare to hit the worst-case scenario of falling back to the few page heads in the batch itself. On the other hand, that solution has the problem of bloating the stack a bit (with the few page pointers), even in the cases where I plan to use the extended batch outside of zap_*, such as fork, mprotect, ...

So I'll first do patch #1, which will not fix the problem but will make the fix easier to fit in. In the meantime, please provide feedback on which of the two solutions above you prefer for avoiding the get/put_cpu, unless you find a good third one.

Cheers,
Ben.