Nathan Lynch <nath...@linux.ibm.com> writes:
> Michael Ellerman <m...@ellerman.id.au> writes:
>> Nathan Lynch <nath...@linux.ibm.com> writes:
>>> Michael Ellerman <m...@ellerman.id.au> writes:
>>>> Nathan Lynch <nath...@linux.ibm.com> writes:
>>>>> Laurent Dufour <lduf...@linux.ibm.com> writes:
>>>>>> On 28/07/2020 at 19:37, Nathan Lynch wrote:
>>>>>>> The drmem lmb list can have hundreds of thousands of entries, and
>>>>>>> unfortunately lookups take the form of linear searches. As long as
>>>>>>> this is the case, traversals have the potential to monopolize the CPU
>>>>>>> and provoke lockup reports, workqueue stalls, and the like unless
>>>>>>> they explicitly yield.
>>>>>>>
>>>>>>> Rather than placing cond_resched() calls within various
>>>>>>> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
>>>>>>> expression of the loop macro itself so users can't omit it.
>>>>>>
>>>>>> Isn't it too much to call cond_resched() on every LMB?
>>>>>>
>>>>>> Could that be less frequent, say every 10 or 100? I don't really know.
>>>>>
>>>>> Everything done within for_each_drmem_lmb() is relatively heavyweight
>>>>> already. E.g. calling dlpar_remove_lmb()/dlpar_add_lmb() can take dozens
>>>>> of milliseconds. I don't think cond_resched() is an expensive check in
>>>>> this context.
>>>>
>>>> Hmm, mostly.
>>>>
>>>> But there are quite a few cases like drmem_update_dt_v1():
>>>>
>>>> 	for_each_drmem_lmb(lmb) {
>>>> 		dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
>>>> 		dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
>>>> 		dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
>>>> 		dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));
>>>>
>>>> 		dr_cell++;
>>>> 	}
>>>>
>>>> which will compile to a pretty tight loop at the moment.
>>>>
>>>> Or drmem_update_dt_v2(), which has two loops over all lmbs.
>>>>
>>>> And although the actual TIF check is cheap, the function call to do it
>>>> is not free.
>>>>
>>>> So I worry this is going to make some of those long loops take even
>>>> longer.
>>>
>>> That's fair, and I was wrong - some of the loop bodies are relatively
>>> simple, not doing allocations or taking locks, etc.
>>>
>>> One way to deal with this is to keep for_each_drmem_lmb() as-is and add
>>> a new iterator that can reschedule, e.g. for_each_drmem_lmb_slow().
>>
>> If we did that, how many call sites would need converting?
>> Is it ~2 or ~20 or ~200?
>
> At a glance I would convert 15-20 out of the 24 users in the tree I'm
> looking at. Let me know if I should do a v2 with that approach.
OK, that's a bunch of churn then, if we're planning to rework the code
significantly in the near future.

One thought, which I possibly should not put in writing, is that we
could use the alignment of the pointer as a poor man's substitute for a
counter, eg:

+static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
+{
+	if ((unsigned long)lmb % PAGE_SIZE == 0)
+		cond_resched();
+
+	return ++lmb;
+}

I think the lmbs are allocated in a block, so I think that will work.
Maybe PAGE_SIZE is not the right size to use, but you get the idea.

Gross I know, but it might be OK as a short-term solution?
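Concretely, the hook-up could look something like the sketch below.
This is untested, and it assumes the usual drmem_info->lmbs /
drmem_info->n_lmbs bookkeeping for the iterator bounds:

+/* Same bounds as today's iterator, but step via drmem_lmb_next(). */
+#define for_each_drmem_lmb(lmb)					\
+	for ((lmb) = &drmem_info->lmbs[0];			\
+	     (lmb) < &drmem_info->lmbs[drmem_info->n_lmbs];	\
+	     (lmb) = drmem_lmb_next(lmb))

Since sizeof(struct drmem_lmb) doesn't divide PAGE_SIZE evenly, the
pointer only lands on a page boundary every few hundred entries
(assuming the array itself is page-aligned), which is roughly the
rescheduling frequency a real counter would give us anyway.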
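And if we go with the for_each_drmem_lmb_slow() idea instead, a rough
sketch might be the below - the name is Nathan's from earlier in the
thread, the body is just a guess:

+/* Reschedule on every step; only for loops with heavyweight bodies. */
+#define for_each_drmem_lmb_slow(lmb)				\
+	for ((lmb) = &drmem_info->lmbs[0];			\
+	     (lmb) < &drmem_info->lmbs[drmem_info->n_lmbs];	\
+	     cond_resched(), (lmb)++)

Heavyweight users like the dlpar_add_lmb()/dlpar_remove_lmb() loops
would convert to the _slow variant, while tight loops like
drmem_update_dt_v1() would stay on the plain iterator.

cheers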