Hello, Michael. On Tue, May 23, 2017 at 02:44:23PM -0500, Michael Bringmann wrote: > On 05/16/2017 10:55 AM, Tejun Heo wrote: > > Hello, Michael. > > > > On Mon, May 15, 2017 at 10:48:04AM -0500, Michael Bringmann wrote: > >>>> --- a/kernel/workqueue.c > >>>> +++ b/kernel/workqueue.c > >>>> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const > >>>> struct workqueue_attrs *attrs) > >>>> copy_workqueue_attrs(pool->attrs, attrs); > >>>> pool->node = target_node; > >>>> > >>>> + cpumask_copy(pool->attrs->cpumask, > >>>> cpumask_of(smp_processor_id())); > >>> > >>> What prevents a cpu getting added right here tho? > >> > >> PowerPC has only one control path to add/remove CPUs via DLPAR operations. > >> Even so, the underlying code is protected through multiple locks. > > > > The more I look at the patch, the less sense it seems to make. So, > > whenever we create a new pool, we ignore the requested cpumask and > > override it with the cpumask of the current thread? > > No. As I mentioned previously, the operation/problem occurs within a DLPAR > hotplug add/remove operation. This is happening to a node which previously
But that's what the code is doing. Whenever it creates a new unbound pool, it ends up ignoring the requested cpumask and overwrites it with the cpumask containing self. > did not have any CPUs associated to it -- we are trying to add more resources > to an LPAR / partition. At this point, the cpumask for the node is empty / > zero. > Sorry for not being more clear on this point earlier. ... > > A new unbound workqueue and thus unbound pool can also be created from > > paths outside cpu hotplug, so get_unbound_pool() can race against > > hotplug. Can you please explain the failures that you see in more > > detail? I'm sure your patch works around the issue somehow but it > > doesn't look like the right fix. > > We fill in an empty cpumask field with a guaranteed non-empty value. > I verified that the incoming cpumask in the attrs was zero at this point > preceding the failure. If we proceed without putting in a useful value, > we go to 'wake_up_process()' (kernel/sched/core.c) next to wakeup the new > worker for the new unbound pool. While there, the code runs through > 'select_task_rq()' and invokes cpumask_any() on a copy of the cpumask. > Unfortunately, running that function over an empty/non-initialized cpumask > returns an index beyond the end of the list, resulting shortly thereafter > in an instruction/data fetch exception. > > If you have a suggestion for an alternate non-empty value to use, I would > be happy to try it. Can you please post the backtrace of the problematic worker pool being created (WARN_ON empty cpumask while creating a new pool)? Thanks. -- tejun