Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-06 Thread Jeff Moyer
Tejun Heo  writes:

> Hello, Jens.
>
> On Thu, Dec 06, 2012 at 07:19:34PM +0100, Jens Axboe wrote:
>> We need to expose it. Once the binding is set from the kernel side on a
>> kernel thread, it can't be modified.
>
> That's only if kthread_bind() is used.  Calling set_cpus_allowed_ptr()
> doesn't set PF_THREAD_BOUND and userland can adjust affinity like any
> other tasks.
>
>> Binding either for performance reasons or for ensuring that we
>> explicitly don't run in some places is a very useful feature.
>
> Sure, but I think this is too specific.  Something more generic would
> be much better.  It can be as simple as generating a uevent.

I'm in favor of a more general approach.  For now, I'm still going to
push a patch that at least binds to the proper numa node.  I'll post
that after I've tested it.

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-06 Thread Tejun Heo
Hello, Jens.

On Thu, Dec 06, 2012 at 07:19:34PM +0100, Jens Axboe wrote:
> We need to expose it. Once the binding is set from the kernel side on a
> kernel thread, it can't be modified.

That's only if kthread_bind() is used.  Calling set_cpus_allowed_ptr()
doesn't set PF_THREAD_BOUND and userland can adjust affinity like any
other tasks.
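
[Editor's note: a minimal userland sketch of the distinction Tejun is drawing, not part of the thread's patch. A task bound only via set_cpus_allowed_ptr() carries no PF_THREAD_BOUND, so its affinity can be changed from userspace with the ordinary syscalls; shown here against the calling process as a safe stand-in for a kthread pid.]

```python
import os

# Retarget a task's CPU affinity from userland, the same way one would
# move a kthread that was bound with set_cpus_allowed_ptr() rather than
# kthread_bind().  pid 0 means "the calling process".

def retarget(pid, cpus):
    os.sched_setaffinity(pid, set(cpus))   # equivalent to taskset -p
    return os.sched_getaffinity(pid)       # read back what stuck

if __name__ == "__main__":
    cur = os.sched_getaffinity(0)
    print(retarget(0, cur) == cur)
```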

> Binding either for performance reasons or for ensuring that we
> explicitly don't run in some places is a very useful feature.

Sure, but I think this is too specific.  Something more generic would
be much better.  It can be as simple as generating a uevent.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-06 Thread Jens Axboe
On 2012-12-06 19:01, Tejun Heo wrote:
> As for the original patch, I think it's a bit too much to expose to
> userland.  It's probably a good idea to bind the flusher to the local
> node but do we really need to expose an interface to let userland
> control the affinity directly?  Do we actually have a use case at
> hand?

We need to expose it. Once the binding is set from the kernel side on a
kernel thread, it can't be modified.

Binding either for performance reasons or for ensuring that we
explicitly don't run in some places is a very useful feature.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-06 Thread Tejun Heo
Hello,

On Thu, Dec 06, 2012 at 01:08:18PM -0500, Jeff Moyer wrote:
> > As for the original patch, I think it's a bit too much to expose to
> > userland.  It's probably a good idea to bind the flusher to the local
> > node but do we really need to expose an interface to let userland
> > control the affinity directly?  Do we actually have a use case at
> > hand?
> 
> Yeah, folks pinning realtime processes to a particular cpu don't want
> the flusher threads interfering with their latency.  I don't have any
> performance numbers on hand to convince you of the benefit, though.

What I don't get is, RT tasks win over bdi flushers every time, and I'm
skeptical that allowing or disallowing bdi flushers on a particular CPU
would make a big difference on non-RT kernels anyway.  If the use case calls for
stricter isolation, there's isolcpus.  While I can see why someone
might think that they need something like this, I'm not sure it's
actually something necessary.

And, even if it's actually something necessary, I think we'll probably
be better off with adding a mechanism to notify userland of new
kthreads and let userland adjust affinity using the usual mechanism
rather than adding dedicated knobs for each kthread user.
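
[Editor's note: a sketch of what the userland side of such a generic mechanism might look like. The notification source is entirely hypothetical — nothing like it existed at the time — and the name patterns are illustrative; only the policy lookup and the affinity syscall are real mechanisms.]

```python
# Hypothetical userland policy agent: the kernel notifies us of each new
# kthread (mechanism invented for illustration), and we apply a site-wide
# affinity policy using the usual syscalls instead of per-kthread knobs.
import fnmatch
import os

POLICY = {
    "flush-*": {2, 3},   # keep bdi flushers off the RT CPUs 0-1
    "kswapd*": {2, 3},
}

def allowed_cpus(comm, default):
    """Return the CPU set the policy assigns to a task name."""
    for pattern, cpus in POLICY.items():
        if fnmatch.fnmatch(comm, pattern):
            return cpus
    return default

def on_new_kthread(pid, comm, default=frozenset({0, 1, 2, 3})):
    """Event handler: pin the new kthread per policy."""
    os.sched_setaffinity(pid, allowed_cpus(comm, default))

if __name__ == "__main__":
    print(sorted(allowed_cpus("flush-8:0", set())))
```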

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-06 Thread Jeff Moyer
Tejun Heo  writes:

> Hmmm... cpu binding usually is done by kthread_bind() or explicit
> set_cpus_allowed_ptr() by the kthread itself.  The node part of the
> API was added later because there was no way to control where the
> stack is allocated and we often ended up with kthreads which are bound
> to a CPU with stack on a remote node.
>
> I don't know.  @node usually controls memory allocation and it could
> be surprising for it to control cpu binding, especially because most
> kthreads which are bound to CPU[s] require explicit affinity
> management as CPUs go up and down.  I don't know.  Maybe I'm just too
> used to the existing interface.

OK, I can understand this line of reasoning.

> As for the original patch, I think it's a bit too much to expose to
> userland.  It's probably a good idea to bind the flusher to the local
> node but do we really need to expose an interface to let userland
> control the affinity directly?  Do we actually have a use case at
> hand?

Yeah, folks pinning realtime processes to a particular cpu don't want
the flusher threads interfering with their latency.  I don't have any
performance numbers on hand to convince you of the benefit, though.

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-06 Thread Tejun Heo
Hello,

On Tue, Dec 04, 2012 at 05:26:26PM -0500, Jeff Moyer wrote:
> I think it's a bit more involved than that.  If you look at
> kthread_create_on_node, the node portion only applies to where the
> memory comes from, it says nothing of scheduling.  To wit:
> 
> /*
>   
>  * root may have changed our (kthreadd's) priority or CPU 
> mask.
>  * The kernel thread should not inherit these properties. 
>   
>  */
> sched_setscheduler_nocheck(create.result, SCHED_NORMAL, 
> );
> set_cpus_allowed_ptr(create.result, cpu_all_mask);
> 
> So, if I were to make the change you suggested, I would be modifying the
> existing behaviour.  The way things stand, I think
> kthread_create_on_node violates the principle of least surprise.  ;-)  I
> would prefer a variant that affected scheduling behaviour as well as
> memory placement.  Tejun, Peter, Ingo, what are your opinions?

Hmmm... cpu binding usually is done by kthread_bind() or explicit
set_cpus_allowed_ptr() by the kthread itself.  The node part of the
API was added later because there was no way to control where the
stack is allocated and we often ended up with kthreads which are bound
to a CPU with stack on a remote node.

I don't know.  @node usually controls memory allocation and it could
be surprising for it to control cpu binding, especially because most
kthreads which are bound to CPU[s] require explicit affinity
management as CPUs go up and down.  I don't know.  Maybe I'm just too
used to the existing interface.

As for the original patch, I think it's a bit too much to expose to
userland.  It's probably a good idea to bind the flusher to the local
node but do we really need to expose an interface to let userland
control the affinity directly?  Do we actually have a use case at
hand?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jens Axboe
On 2012-12-04 23:26, Jeff Moyer wrote:
> Jens Axboe  writes:
> 
> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>   spin_lock_bh(&bdi->wb_lock);
>   bdi->wb.task = task;
>   spin_unlock_bh(&bdi->wb_lock);
> + mutex_lock(&bdi->flusher_cpumask_mutex);
> + ret = set_cpus_allowed_ptr(task,
> + bdi->flusher_cpumask);
> + mutex_unlock(&bdi->flusher_cpumask_mutex);

>  It'd be very useful if we had a kthread_create_on_cpumask() instead
 of a _node() variant, since the latter could easily be implemented on
 top of the former. But not really a show stopper for the patch...
>>>
>>> Hmm, if it isn't too scary, I might give this a try.
>>
>> Should not be, pretty much just removing the node part of the create
>> struct passed in and making it a cpumask. And for the on_node() case,
> cpumask_of_node() will do the trick.
> 
> I think it's a bit more involved than that.  If you look at
> kthread_create_on_node, the node portion only applies to where the
> memory comes from, it says nothing of scheduling.  To whit:
> 
> /*
>  * root may have changed our (kthreadd's) priority or CPU mask.
>  * The kernel thread should not inherit these properties.
>  */
> sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
> set_cpus_allowed_ptr(create.result, cpu_all_mask);
> 
> So, if I were to make the change you suggested, I would be modifying the
> existing behaviour.  The way things stand, I think
> kthread_create_on_node violates the principle of least surprise.  ;-)  I
> would prefer a variant that affected scheduling behaviour as well as
> memory placement.  Tejun, Peter, Ingo, what are your opinions?

Huh, you are right, I completely missed that set_cpus_allowed_ptr() uses
cpu_all_mask and not cpumask_of_node(node). Doesn't make a lot of sense to
me... And yes, in any case, it definitely is a bad API, not very
logical.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jeff Moyer
Jens Axboe  writes:

 @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
	spin_lock_bh(&bdi->wb_lock);
	bdi->wb.task = task;
	spin_unlock_bh(&bdi->wb_lock);
 +	mutex_lock(&bdi->flusher_cpumask_mutex);
 +	ret = set_cpus_allowed_ptr(task,
 +		bdi->flusher_cpumask);
 +	mutex_unlock(&bdi->flusher_cpumask_mutex);
>>>
>>> It'd be very useful if we had a kthread_create_on_cpumask() instead
>>> of a _node() variant, since the latter could easily be implemented on
>>> top of the former. But not really a show stopper for the patch...
>> 
>> Hmm, if it isn't too scary, I might give this a try.
>
> Should not be, pretty much just removing the node part of the create
> struct passed in and making it a cpumask. And for the on_node() case,
>> cpumask_of_node() will do the trick.

I think it's a bit more involved than that.  If you look at
kthread_create_on_node, the node portion only applies to where the
memory comes from, it says nothing of scheduling.  To wit:

/*
 * root may have changed our (kthreadd's) priority or CPU mask.
 * The kernel thread should not inherit these properties.
 */
sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
set_cpus_allowed_ptr(create.result, cpu_all_mask);

So, if I were to make the change you suggested, I would be modifying the
existing behaviour.  The way things stand, I think
kthread_create_on_node violates the principle of least surprise.  ;-)  I
would prefer a variant that affected scheduling behaviour as well as
memory placement.  Tejun, Peter, Ingo, what are your opinions?

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Dave Chinner
On Tue, Dec 04, 2012 at 09:42:55AM -0500, Jeff Moyer wrote:
> Dave Chinner  writes:
> 
> > On Mon, Dec 03, 2012 at 01:53:39PM -0500, Jeff Moyer wrote:
> >> +static ssize_t cpu_list_store(struct device *dev,
> >> +  struct device_attribute *attr, const char *buf, size_t count)
> >> +{
> >> +  struct backing_dev_info *bdi = dev_get_drvdata(dev);
> >> +  struct bdi_writeback *wb = &bdi->wb;
> >> +  cpumask_var_t newmask;
> >> +  ssize_t ret;
> >> +  struct task_struct *task;
> >> +
> >> +  if (!alloc_cpumask_var(&newmask, GFP_KERNEL))
> >> +  return -ENOMEM;
> >> +
> >> +  ret = cpulist_parse(buf, newmask);
> >> +  if (!ret) {
> >> +  spin_lock(&bdi->wb_lock);
> >> +  task = wb->task;
> >> +  if (task)
> >> +  get_task_struct(task);
> >> +  spin_unlock(&bdi->wb_lock);
> >> +  if (task) {
> >> +  ret = set_cpus_allowed_ptr(task, newmask);
> >> +  put_task_struct(task);
> >> +  }
> >
> > Why is this set here outside the bdi->flusher_cpumask_mutex?
> 
> The cpumask mutex protects updates to bdi->flusher_cpumask, it has
> nothing to do with the call to set_cpus_allowed.  We are protected from
> concurrent calls to cpu_list_store by the sysfs mutex that is taken on
> entry.  I understand that this is non-obvious, and it wouldn't be wrong
> to hold the mutex here.  If you'd like me to do that for clarity, that
> would be ok with me.

At minimum it needs a comment like this otherwise someone is going
to come along and ask "why is that safe?" like I just did. I'd
prefer the code to be obviously consistent to avoid the need for
commenting about the special case, especially when the obviously
correct code is simpler ;)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jens Axboe
On 2012-12-04 21:23, Jeff Moyer wrote:
> Jens Axboe  writes:
> 
>> On 2012-12-03 19:53, Jeff Moyer wrote:
>>> Hi,
>>>
>>> In realtime environments, it may be desirable to keep the per-bdi
>>> flusher threads from running on certain cpus.  This patch adds a
>>> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
>>> the flusher threads to the same numa node as the backing device (though
>>> I could be convinced to make it a mask of all cpus to avoid a change in
>>> behaviour).
>>
>> Looks sane, and I think defaulting to the home node is a sane default.
>> One comment:
>>
>>> +   ret = cpulist_parse(buf, newmask);
>>> +   if (!ret) {
>>> +   spin_lock(&bdi->wb_lock);
>>> +   task = wb->task;
>>> +   if (task)
>>> +   get_task_struct(task);
>>> +   spin_unlock(&bdi->wb_lock);
>>
>> bdi->wb_lock needs to be bh safe. The above should have caused lockdep
>> warnings for you.
> 
> No lockdep complaints.  I'll double check that's enabled (but I usually
> have it enabled...).
> 
>>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>> spin_lock_bh(&bdi->wb_lock);
>>> bdi->wb.task = task;
>>> spin_unlock_bh(&bdi->wb_lock);
>>> +   mutex_lock(&bdi->flusher_cpumask_mutex);
>>> +   ret = set_cpus_allowed_ptr(task,
>>> +   bdi->flusher_cpumask);
>>> +   mutex_unlock(&bdi->flusher_cpumask_mutex);
>>
>> It'd be very useful if we had a kthread_create_on_cpumask() instead
>> of a _node() variant, since the latter could easily be implemented on
>> top of the former. But not really a show stopper for the patch...
> 
> Hmm, if it isn't too scary, I might give this a try.

Should not be, pretty much just removing the node part of the create
struct passed in and making it a cpumask. And for the on_node() case,
cpumask_of_node() will do the trick.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jeff Moyer
Jens Axboe  writes:

> On 2012-12-03 19:53, Jeff Moyer wrote:
>> Hi,
>> 
>> In realtime environments, it may be desirable to keep the per-bdi
>> flusher threads from running on certain cpus.  This patch adds a
>> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
>> the flusher threads to the same numa node as the backing device (though
>> I could be convinced to make it a mask of all cpus to avoid a change in
>> behaviour).
>
> Looks sane, and I think defaulting to the home node is a sane default.
> One comment:
>
>> +ret = cpulist_parse(buf, newmask);
>> +if (!ret) {
>> +spin_lock(&bdi->wb_lock);
>> +task = wb->task;
>> +if (task)
>> +get_task_struct(task);
>> +spin_unlock(&bdi->wb_lock);
>
> bdi->wb_lock needs to be bh safe. The above should have caused lockdep
> warnings for you.

No lockdep complaints.  I'll double check that's enabled (but I usually
have it enabled...).

>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>  spin_lock_bh(&bdi->wb_lock);
>>  bdi->wb.task = task;
>>  spin_unlock_bh(&bdi->wb_lock);
>> +mutex_lock(&bdi->flusher_cpumask_mutex);
>> +ret = set_cpus_allowed_ptr(task,
>> +bdi->flusher_cpumask);
>> +mutex_unlock(&bdi->flusher_cpumask_mutex);
>
> It'd be very useful if we had a kthread_create_on_cpumask() instead
> of a _node() variant, since the latter could easily be implemented on
> top of the former. But not really a show stopper for the patch...

Hmm, if it isn't too scary, I might give this a try.

Thanks!
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jens Axboe
On 2012-12-03 19:53, Jeff Moyer wrote:
> Hi,
> 
> In realtime environments, it may be desirable to keep the per-bdi
> flusher threads from running on certain cpus.  This patch adds a
> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
> the flusher threads to the same numa node as the backing device (though
> I could be convinced to make it a mask of all cpus to avoid a change in
> behaviour).

Looks sane, and I think defaulting to the home node is a sane default.
One comment:

> + ret = cpulist_parse(buf, newmask);
> + if (!ret) {
> + spin_lock(&bdi->wb_lock);
> + task = wb->task;
> + if (task)
> + get_task_struct(task);
> + spin_unlock(&bdi->wb_lock);

bdi->wb_lock needs to be bh safe. The above should have caused lockdep
warnings for you.

> + if (task) {
> + ret = set_cpus_allowed_ptr(task, newmask);
> + put_task_struct(task);
> + }
> + if (ret == 0) {
> + mutex_lock(&bdi->flusher_cpumask_mutex);
> + cpumask_copy(bdi->flusher_cpumask, newmask);
> + mutex_unlock(&bdi->flusher_cpumask_mutex);
> + ret = count;
> + }
> + }

> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>   spin_lock_bh(&bdi->wb_lock);
>   bdi->wb.task = task;
>   spin_unlock_bh(&bdi->wb_lock);
> + mutex_lock(&bdi->flusher_cpumask_mutex);
> + ret = set_cpus_allowed_ptr(task,
> + bdi->flusher_cpumask);
> + mutex_unlock(&bdi->flusher_cpumask_mutex);

It'd be very useful if we had a kthread_create_on_cpumask() instead
of a _node() variant, since the latter could easily be implemented on
top of the former. But not really a show stopper for the patch...
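
[Editor's note: for context, the interface under review would be driven like this from userspace. The bdi name "8:0" and the helper below are illustrative; the patch was never merged in this form, so the demo writes to a scratch file instead of the (nonexistent) sysfs attribute.]

```python
import tempfile

def set_flusher_cpus(cpulist, path="/sys/class/bdi/8:0/cpu_list"):
    """Write a kernel-format cpu list (e.g. "2-3" or "0,4-7") to the
    proposed per-bdi cpu_list attribute.  Any writable path works, which
    keeps the helper demonstrable without the patch applied."""
    with open(path, "w") as f:
        f.write(cpulist + "\n")

if __name__ == "__main__":
    # The sysfs file only exists with the patch applied, so demonstrate
    # against a temporary file.
    with tempfile.NamedTemporaryFile("r", suffix=".cpu_list") as scratch:
        set_flusher_cpus("2-3", path=scratch.name)
        print(scratch.read().strip())
```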

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jeff Moyer
Dave Chinner  writes:

> On Mon, Dec 03, 2012 at 01:53:39PM -0500, Jeff Moyer wrote:
>> +static ssize_t cpu_list_store(struct device *dev,
>> +struct device_attribute *attr, const char *buf, size_t count)
>> +{
>> +struct backing_dev_info *bdi = dev_get_drvdata(dev);
>> +struct bdi_writeback *wb = &bdi->wb;
>> +cpumask_var_t newmask;
>> +ssize_t ret;
>> +struct task_struct *task;
>> +
>> +if (!alloc_cpumask_var(&newmask, GFP_KERNEL))
>> +return -ENOMEM;
>> +
>> +ret = cpulist_parse(buf, newmask);
>> +if (!ret) {
>> +spin_lock(&bdi->wb_lock);
>> +task = wb->task;
>> +if (task)
>> +get_task_struct(task);
>> +spin_unlock(&bdi->wb_lock);
>> +if (task) {
>> +ret = set_cpus_allowed_ptr(task, newmask);
>> +put_task_struct(task);
>> +}
>
> Why is this set here outside the bdi->flusher_cpumask_mutex?

The cpumask mutex protects updates to bdi->flusher_cpumask, it has
nothing to do with the call to set_cpus_allowed.  We are protected from
concurrent calls to cpu_list_store by the sysfs mutex that is taken on
entry.  I understand that this is non-obvious, and it wouldn't be wrong
to hold the mutex here.  If you'd like me to do that for clarity, that
would be ok with me.

> Also, I'd prefer it named "..._lock" as that is the normal
> convention for such variables. You can tell the type of lock from
> the declaration or the use...

I'm sure I can find counter-examples, but it doesn't really matter to
me.  I'll change it.

>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>  spin_lock_bh(&bdi->wb_lock);
>>  bdi->wb.task = task;
>>  spin_unlock_bh(&bdi->wb_lock);
>> +mutex_lock(&bdi->flusher_cpumask_mutex);
>> +ret = set_cpus_allowed_ptr(task,
>> +bdi->flusher_cpumask);
>> +mutex_unlock(&bdi->flusher_cpumask_mutex);
>
> As it is set under the lock here

It's done under the lock here since we need to keep bdi->flusher_cpumask
from changing during the call to set_cpus_allowed.

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jeff Moyer
Dave Chinner da...@fromorbit.com writes:

 On Mon, Dec 03, 2012 at 01:53:39PM -0500, Jeff Moyer wrote:
 +static ssize_t cpu_list_store(struct device *dev,
 +struct device_attribute *attr, const char *buf, size_t count)
 +{
 +struct backing_dev_info *bdi = dev_get_drvdata(dev);
 +struct bdi_writeback *wb = bdi-wb;
 +cpumask_var_t newmask;
 +ssize_t ret;
 +struct task_struct *task;
 +
 +if (!alloc_cpumask_var(newmask, GFP_KERNEL))
 +return -ENOMEM;
 +
 +ret = cpulist_parse(buf, newmask);
 +if (!ret) {
 +spin_lock(bdi-wb_lock);
 +task = wb-task;
 +if (task)
 +get_task_struct(task);
 +spin_unlock(bdi-wb_lock);
 +if (task) {
 +ret = set_cpus_allowed_ptr(task, newmask);
 +put_task_struct(task);
 +}

 Why is this set here outside the bdi-flusher_cpumask_mutex?

The cpumask mutex protects updates to bdi-flusher_cpumask, it has
nothing to do with the call to set_cpus_allowed.  We are protected from
concurrent calls to cpu_list_store by the sysfs mutex that is taken on
entry.  I understand that this is non-obvious, and it wouldn't be wrong
to hold the mutex here.  If you'd like me to do that for clarity, that
would be ok with me.

> Also, I'd prefer it named "..._lock" as that is the normal
> convention for such variables. You can tell the type of lock from
> the declaration or the use...

I'm sure I can find counter-examples, but it doesn't really matter to
me.  I'll change it.

>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>  spin_lock_bh(&bdi->wb_lock);
>>  bdi->wb.task = task;
>>  spin_unlock_bh(&bdi->wb_lock);
>> +mutex_lock(&bdi->flusher_cpumask_mutex);
>> +ret = set_cpus_allowed_ptr(task,
>> +bdi->flusher_cpumask);
>> +mutex_unlock(&bdi->flusher_cpumask_mutex);
>
> As it is set under the lock here

It's done under the lock here since we need to keep bdi->flusher_cpumask
from changing during the call to set_cpus_allowed.

Cheers,
Jeff


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jens Axboe
On 2012-12-03 19:53, Jeff Moyer wrote:
> Hi,
>
> In realtime environments, it may be desirable to keep the per-bdi
> flusher threads from running on certain cpus.  This patch adds a
> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
> the flusher threads to the same numa node as the backing device (though
> I could be convinced to make it a mask of all cpus to avoid a change in
> behaviour).

Looks sane, and I think defaulting to the home node is a sane default.
One comment:

> +	ret = cpulist_parse(buf, newmask);
> +	if (!ret) {
> +		spin_lock(&bdi->wb_lock);
> +		task = wb->task;
> +		if (task)
> +			get_task_struct(task);
> +		spin_unlock(&bdi->wb_lock);

bdi->wb_lock needs to be bh safe. The above should have caused lockdep
warnings for you.

> +		if (task) {
> +			ret = set_cpus_allowed_ptr(task, newmask);
> +			put_task_struct(task);
> +		}
> +		if (ret == 0) {
> +			mutex_lock(&bdi->flusher_cpumask_mutex);
> +			cpumask_copy(bdi->flusher_cpumask, newmask);
> +			mutex_unlock(&bdi->flusher_cpumask_mutex);
> +			ret = count;
> +		}
> +	}
>
> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
> 	spin_lock_bh(&bdi->wb_lock);
> 	bdi->wb.task = task;
> 	spin_unlock_bh(&bdi->wb_lock);
> +	mutex_lock(&bdi->flusher_cpumask_mutex);
> +	ret = set_cpus_allowed_ptr(task,
> +		bdi->flusher_cpumask);
> +	mutex_unlock(&bdi->flusher_cpumask_mutex);

It'd be very useful if we had a kthread_create_cpu_on_cpumask() instead
of a _node() variant, since the latter could easily be implemented on
top of the former. But not really a show stopper for the patch...

-- 
Jens Axboe



Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jeff Moyer
Jens Axboe <jax...@fusionio.com> writes:

> On 2012-12-03 19:53, Jeff Moyer wrote:
>> Hi,
>>
>> In realtime environments, it may be desirable to keep the per-bdi
>> flusher threads from running on certain cpus.  This patch adds a
>> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
>> the flusher threads to the same numa node as the backing device (though
>> I could be convinced to make it a mask of all cpus to avoid a change in
>> behaviour).
>
> Looks sane, and I think defaulting to the home node is a sane default.
> One comment:

>> +	ret = cpulist_parse(buf, newmask);
>> +	if (!ret) {
>> +		spin_lock(&bdi->wb_lock);
>> +		task = wb->task;
>> +		if (task)
>> +			get_task_struct(task);
>> +		spin_unlock(&bdi->wb_lock);
>
> bdi->wb_lock needs to be bh safe. The above should have caused lockdep
> warnings for you.

No lockdep complaints.  I'll double check that's enabled (but I usually
have it enabled...).

>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>> 	spin_lock_bh(&bdi->wb_lock);
>> 	bdi->wb.task = task;
>> 	spin_unlock_bh(&bdi->wb_lock);
>> +	mutex_lock(&bdi->flusher_cpumask_mutex);
>> +	ret = set_cpus_allowed_ptr(task,
>> +		bdi->flusher_cpumask);
>> +	mutex_unlock(&bdi->flusher_cpumask_mutex);
>
> It'd be very useful if we had a kthread_create_cpu_on_cpumask() instead
> of a _node() variant, since the latter could easily be implemented on
> top of the former. But not really a show stopper for the patch...

Hmm, if it isn't too scary, I might give this a try.

Thanks!
Jeff


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jens Axboe
On 2012-12-04 21:23, Jeff Moyer wrote:
> Jens Axboe <jax...@fusionio.com> writes:
>
>> On 2012-12-03 19:53, Jeff Moyer wrote:
>>> Hi,
>>>
>>> In realtime environments, it may be desirable to keep the per-bdi
>>> flusher threads from running on certain cpus.  This patch adds a
>>> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
>>> the flusher threads to the same numa node as the backing device (though
>>> I could be convinced to make it a mask of all cpus to avoid a change in
>>> behaviour).
>>
>> Looks sane, and I think defaulting to the home node is a sane default.
>> One comment:
>>
>>> +	ret = cpulist_parse(buf, newmask);
>>> +	if (!ret) {
>>> +		spin_lock(&bdi->wb_lock);
>>> +		task = wb->task;
>>> +		if (task)
>>> +			get_task_struct(task);
>>> +		spin_unlock(&bdi->wb_lock);
>>
>> bdi->wb_lock needs to be bh safe. The above should have caused lockdep
>> warnings for you.
>
> No lockdep complaints.  I'll double check that's enabled (but I usually
> have it enabled...).
>
>>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>> 	spin_lock_bh(&bdi->wb_lock);
>>> 	bdi->wb.task = task;
>>> 	spin_unlock_bh(&bdi->wb_lock);
>>> +	mutex_lock(&bdi->flusher_cpumask_mutex);
>>> +	ret = set_cpus_allowed_ptr(task,
>>> +		bdi->flusher_cpumask);
>>> +	mutex_unlock(&bdi->flusher_cpumask_mutex);
>>
>> It'd be very useful if we had a kthread_create_cpu_on_cpumask() instead
>> of a _node() variant, since the latter could easily be implemented on
>> top of the former. But not really a show stopper for the patch...
>
> Hmm, if it isn't too scary, I might give this a try.

Should not be, pretty much just removing the node part of the create
struct passed in and making it a cpumask. And for the on_node() case,
cpumask_of_node() will do the trick.
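
A rough sketch of the suggestion, with heavily hypothetical names (neither
helper exists in this form at this point; it is exactly what the thread is
debating adding):

```c
/* Hypothetical: a kthread creation helper that takes a cpumask. */
struct task_struct *kthread_create_on_cpumask(int (*threadfn)(void *data),
					      void *data,
					      const struct cpumask *mask,
					      const char namefmt[], ...);

/* The _node() variant could then be a thin wrapper: */
#define kthread_create_on_node(threadfn, data, node, namefmt, ...)	\
	kthread_create_on_cpumask((threadfn), (data),			\
				  cpumask_of_node(node), namefmt,	\
				  ## __VA_ARGS__)
```

(Jeff's follow-up in this thread points out that the existing _node()
variant only controls memory placement, not scheduling, so this mapping
would change current behaviour.)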

-- 
Jens Axboe



Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Dave Chinner
On Tue, Dec 04, 2012 at 09:42:55AM -0500, Jeff Moyer wrote:
> Dave Chinner <da...@fromorbit.com> writes:
>
>> On Mon, Dec 03, 2012 at 01:53:39PM -0500, Jeff Moyer wrote:
>>> +static ssize_t cpu_list_store(struct device *dev,
>>> +	struct device_attribute *attr, const char *buf, size_t count)
>>> +{
>>> +	struct backing_dev_info *bdi = dev_get_drvdata(dev);
>>> +	struct bdi_writeback *wb = &bdi->wb;
>>> +	cpumask_var_t newmask;
>>> +	ssize_t ret;
>>> +	struct task_struct *task;
>>> +
>>> +	if (!alloc_cpumask_var(&newmask, GFP_KERNEL))
>>> +		return -ENOMEM;
>>> +
>>> +	ret = cpulist_parse(buf, newmask);
>>> +	if (!ret) {
>>> +		spin_lock(&bdi->wb_lock);
>>> +		task = wb->task;
>>> +		if (task)
>>> +			get_task_struct(task);
>>> +		spin_unlock(&bdi->wb_lock);
>>> +		if (task) {
>>> +			ret = set_cpus_allowed_ptr(task, newmask);
>>> +			put_task_struct(task);
>>> +		}
>>
>> Why is this set here outside the bdi->flusher_cpumask_mutex?
>
> The cpumask mutex protects updates to bdi->flusher_cpumask; it has
> nothing to do with the call to set_cpus_allowed.  We are protected from
> concurrent calls to cpu_list_store by the sysfs mutex that is taken on
> entry.  I understand that this is non-obvious, and it wouldn't be wrong
> to hold the mutex here.  If you'd like me to do that for clarity, that
> would be ok with me.

At minimum it needs a comment like this, otherwise someone is going
to come along and ask "why is that safe?" like I just did.  I'd
prefer the code to be obviously consistent to avoid the need for
commenting about the special case, especially when the obviously
correct code is simpler ;)
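
A sketch of the "obviously consistent" variant Dave is asking for,
i.e. holding the cpumask lock across both the affinity call and the
copy (not actual patch code, just an illustration of the shape):

```c
	ret = cpulist_parse(buf, newmask);
	if (!ret) {
		mutex_lock(&bdi->flusher_cpumask_mutex);
		spin_lock_bh(&bdi->wb_lock);
		task = wb->task;
		if (task)
			get_task_struct(task);
		spin_unlock_bh(&bdi->wb_lock);
		if (task) {
			ret = set_cpus_allowed_ptr(task, newmask);
			put_task_struct(task);
		}
		if (!ret) {
			cpumask_copy(bdi->flusher_cpumask, newmask);
			ret = count;
		}
		mutex_unlock(&bdi->flusher_cpumask_mutex);
	}
```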

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jeff Moyer
Jens Axboe <jax...@fusionio.com> writes:

>>>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>>> 	spin_lock_bh(&bdi->wb_lock);
>>>> 	bdi->wb.task = task;
>>>> 	spin_unlock_bh(&bdi->wb_lock);
>>>> +	mutex_lock(&bdi->flusher_cpumask_mutex);
>>>> +	ret = set_cpus_allowed_ptr(task,
>>>> +		bdi->flusher_cpumask);
>>>> +	mutex_unlock(&bdi->flusher_cpumask_mutex);
>>>
>>> It'd be very useful if we had a kthread_create_cpu_on_cpumask() instead
>>> of a _node() variant, since the latter could easily be implemented on
>>> top of the former. But not really a show stopper for the patch...
>>
>> Hmm, if it isn't too scary, I might give this a try.
>
> Should not be, pretty much just removing the node part of the create
> struct passed in and making it a cpumask. And for the on_node() case,
> cpumask_of_node() will do the trick.

I think it's a bit more involved than that.  If you look at
kthread_create_on_node, the node portion only applies to where the
memory comes from, it says nothing of scheduling.  To wit:

	/*
	 * root may have changed our (kthreadd's) priority or CPU mask.
	 * The kernel thread should not inherit these properties.
	 */
	sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
	set_cpus_allowed_ptr(create.result, cpu_all_mask);

So, if I were to make the change you suggested, I would be modifying the
existing behaviour.  The way things stand, I think
kthread_create_on_node violates the principle of least surprise.  ;-)  I
would prefer a variant that affected scheduling behaviour as well as
memory placement.  Tejun, Peter, Ingo, what are your opinions?

Cheers,
Jeff


Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-04 Thread Jens Axboe
On 2012-12-04 23:26, Jeff Moyer wrote:
> Jens Axboe <jax...@fusionio.com> writes:
>
>>>>> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>>>>> 	spin_lock_bh(&bdi->wb_lock);
>>>>> 	bdi->wb.task = task;
>>>>> 	spin_unlock_bh(&bdi->wb_lock);
>>>>> +	mutex_lock(&bdi->flusher_cpumask_mutex);
>>>>> +	ret = set_cpus_allowed_ptr(task,
>>>>> +		bdi->flusher_cpumask);
>>>>> +	mutex_unlock(&bdi->flusher_cpumask_mutex);
>>>>
>>>> It'd be very useful if we had a kthread_create_cpu_on_cpumask() instead
>>>> of a _node() variant, since the latter could easily be implemented on
>>>> top of the former. But not really a show stopper for the patch...
>>>
>>> Hmm, if it isn't too scary, I might give this a try.
>>
>> Should not be, pretty much just removing the node part of the create
>> struct passed in and making it a cpumask. And for the on_node() case,
>> cpumask_of_node() will do the trick.
>
> I think it's a bit more involved than that.  If you look at
> kthread_create_on_node, the node portion only applies to where the
> memory comes from, it says nothing of scheduling.  To wit:
>
> 	/*
> 	 * root may have changed our (kthreadd's) priority or CPU mask.
> 	 * The kernel thread should not inherit these properties.
> 	 */
> 	sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
> 	set_cpus_allowed_ptr(create.result, cpu_all_mask);
>
> So, if I were to make the change you suggested, I would be modifying the
> existing behaviour.  The way things stand, I think
> kthread_create_on_node violates the principle of least surprise.  ;-)  I
> would prefer a variant that affected scheduling behaviour as well as
> memory placement.  Tejun, Peter, Ingo, what are your opinions?

Huh, you are right, I completely missed that set_cpus_allowed_ptr() uses
cpu_all_mask and not cpumask_of_node(node). Doesn't make a lot of sense
to me... And yes, in any case, it definitely is a bad API, not very
logical.

-- 
Jens Axboe



Re: [patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-03 Thread Dave Chinner
On Mon, Dec 03, 2012 at 01:53:39PM -0500, Jeff Moyer wrote:
> Hi,
> 
> In realtime environments, it may be desirable to keep the per-bdi
> flusher threads from running on certain cpus.  This patch adds a
> cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
> the flusher threads to the same numa node as the backing device (though
> I could be convinced to make it a mask of all cpus to avoid a change in
> behaviour).

The default seems reasonable to me.

> Comments, as always, are appreciated.
.

> +static ssize_t cpu_list_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct backing_dev_info *bdi = dev_get_drvdata(dev);
> + struct bdi_writeback *wb = &bdi->wb;
> + cpumask_var_t newmask;
> + ssize_t ret;
> + struct task_struct *task;
> +
> + if (!alloc_cpumask_var(&newmask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + ret = cpulist_parse(buf, newmask);
> + if (!ret) {
> + spin_lock(&bdi->wb_lock);
> + task = wb->task;
> + if (task)
> + get_task_struct(task);
> + spin_unlock(&bdi->wb_lock);
> + if (task) {
> + ret = set_cpus_allowed_ptr(task, newmask);
> + put_task_struct(task);
> + }

Why is this set here outside the bdi->flusher_cpumask_mutex?

Also, I'd prefer it named "..._lock" as that is the normal
convention for such variables. You can tell the type of lock from
the declaration or the use...



> @@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
>   spin_lock_bh(&bdi->wb_lock);
>   bdi->wb.task = task;
>   spin_unlock_bh(&bdi->wb_lock);
> + mutex_lock(&bdi->flusher_cpumask_mutex);
> + ret = set_cpus_allowed_ptr(task,
> + bdi->flusher_cpumask);
> + mutex_unlock(&bdi->flusher_cpumask_mutex);

As it is set under the lock here

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[patch,v2] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-03 Thread Jeff Moyer
Hi,

In realtime environments, it may be desirable to keep the per-bdi
flusher threads from running on certain cpus.  This patch adds a
cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
the flusher threads to the same numa node as the backing device (though
I could be convinced to make it a mask of all cpus to avoid a change in
behaviour).
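
The cpu_list file takes the same list syntax the kernel's
cpulist_parse() accepts ("0-3,6" and the like).  For readers unfamiliar
with that format, a rough userspace illustration of the parsing (the
helper name is ours, not part of the patch):

```python
def parse_cpulist(s):
    """Parse a Linux cpulist string like "0-3,6" into a sorted CPU list.

    Mirrors the syntax accepted by the kernel's cpulist_parse(), which
    the proposed cpu_list sysfs file uses.
    """
    cpus = set()
    for chunk in s.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            lo, hi = chunk.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)

print(parse_cpulist("0-3,6"))  # -> [0, 1, 2, 3, 6]
```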

Comments, as always, are appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmo...@redhat.com>

--
changes from v1->v2:
- fixed missing free in error path of bdi_init
- fixed up unchecked references to task in the store function

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 2a9a9ab..68263e0 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -18,6 +18,7 @@
 #include <linux/writeback.h>
 #include <linux/atomic.h>
 #include <linux/sysctl.h>
+#include <linux/mutex.h>
 
 struct page;
 struct device;
@@ -105,6 +106,9 @@ struct backing_dev_info {
 
struct timer_list laptop_mode_wb_timer;
 
+   cpumask_t *flusher_cpumask; /* used for writeback thread scheduling */
+   struct mutex flusher_cpumask_mutex;
+
 #ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
struct dentry *debug_stats;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d3ca2b3..1cc1ff2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <linux/slab.h>
 #include <trace/events/writeback.h>
 
 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
@@ -221,12 +222,61 @@ static ssize_t max_ratio_store(struct device *dev,
 }
 BDI_SHOW(max_ratio, bdi->max_ratio)
 
+static ssize_t cpu_list_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t count)
+{
+   struct backing_dev_info *bdi = dev_get_drvdata(dev);
+   struct bdi_writeback *wb = &bdi->wb;
+   cpumask_var_t newmask;
+   ssize_t ret;
+   struct task_struct *task;
+
+   if (!alloc_cpumask_var(&newmask, GFP_KERNEL))
+   return -ENOMEM;
+
+   ret = cpulist_parse(buf, newmask);
+   if (!ret) {
+   spin_lock(&bdi->wb_lock);
+   task = wb->task;
+   if (task)
+   get_task_struct(task);
+   spin_unlock(&bdi->wb_lock);
+   if (task) {
+   ret = set_cpus_allowed_ptr(task, newmask);
+   put_task_struct(task);
+   }
+   if (ret == 0) {
+   mutex_lock(&bdi->flusher_cpumask_mutex);
+   cpumask_copy(bdi->flusher_cpumask, newmask);
+   mutex_unlock(&bdi->flusher_cpumask_mutex);
+   ret = count;
+   }
+   }
+   free_cpumask_var(newmask);
+
+   return ret;
+}
+
+static ssize_t cpu_list_show(struct device *dev,
+   struct device_attribute *attr, char *page)
+{
+   struct backing_dev_info *bdi = dev_get_drvdata(dev);
+   ssize_t ret;
+
+   mutex_lock(&bdi->flusher_cpumask_mutex);
+   ret = cpulist_scnprintf(page, PAGE_SIZE-1, bdi->flusher_cpumask);
+   mutex_unlock(&bdi->flusher_cpumask_mutex);
+
+   return ret;
+}
+
 #define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
 
 static struct device_attribute bdi_dev_attrs[] = {
__ATTR_RW(read_ahead_kb),
__ATTR_RW(min_ratio),
__ATTR_RW(max_ratio),
+   __ATTR_RW(cpu_list),
__ATTR_NULL,
 };
 
@@ -428,6 +478,7 @@ static int bdi_forker_thread(void *ptr)
writeback_inodes_wb(&bdi->wb, 1024,
WB_REASON_FORKER_THREAD);
} else {
+   int ret;
/*
 * The spinlock makes sure we do not lose
 * wake-ups when racing with 'bdi_queue_work()'.
@@ -437,6 +488,14 @@ static int bdi_forker_thread(void *ptr)
spin_lock_bh(&bdi->wb_lock);
bdi->wb.task = task;
spin_unlock_bh(&bdi->wb_lock);
+   mutex_lock(&bdi->flusher_cpumask_mutex);
+   ret = set_cpus_allowed_ptr(task,
+   bdi->flusher_cpumask);
+   mutex_unlock(&bdi->flusher_cpumask_mutex);
+   if (ret)
+   printk_once("%s: failed to bind flusher"
+   " thread %s, error %d\n",
+   __func__, task->comm, ret);
wake_up_process(task);
}
bdi_clear_pending(bdi);
@@ -509,6 +568,17 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
dev_name(dev));
if (IS_ERR(wb->task))
return PTR_ERR(wb->task);
+   } else {
+  

