Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-08 Thread Christoph Hellwig
On Wed, Jun 07, 2017 at 03:04:11PM -0700, Omar Sandoval wrote:
> On Sat, Jun 03, 2017 at 04:04:02PM +0200, Christoph Hellwig wrote:
> > Currently we only create hctx for online CPUs, which can lead to a lot
> > of churn due to frequent soft offline / online operations.  Instead
> > allocate one for each present CPU to avoid this and dramatically simplify
> > the code.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Oh man, this cleanup is great. Did you run blktests on this? block/008
> does a bunch of hotplugging while I/O is running.

I haven't run blktests yet; in fact, when I did the work, blktests
didn't exist yet.  But thanks for the reminder, I'll run it now.


Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-07 Thread Omar Sandoval
On Sat, Jun 03, 2017 at 04:04:02PM +0200, Christoph Hellwig wrote:
> Currently we only create hctx for online CPUs, which can lead to a lot
> of churn due to frequent soft offline / online operations.  Instead
> allocate one for each present CPU to avoid this and dramatically simplify
> the code.
> 
> Signed-off-by: Christoph Hellwig 

Oh man, this cleanup is great. Did you run blktests on this? block/008
does a bunch of hotplugging while I/O is running.


Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-07 Thread Christoph Hellwig
On Wed, Jun 07, 2017 at 05:10:46PM +0800, Ming Lei wrote:
> One thing I'm not sure about is that we may need to handle new CPU
> hotplug after initialization.  Without a CPU hotplug handler, the
> system may not scale well when more CPUs are added to sockets.

Adding physical CPUs to sockets is a very rare activity, and we
should not optimize for it.  Taking CPUs that are physically present
offline and online is the case that is interesting, and that's what
this patchset changes.
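
For reference, a minimal, illustrative sketch of the three CPU masks being
distinguished here.  It is not part of the patch; the function name is made
up, and only the mask iterators and pr_info() are real kernel interfaces.

#include <linux/cpumask.h>
#include <linux/printk.h>

/*
 * possible: every CPU id this boot could ever use (fixed at boot time),
 * present:  CPUs physically plugged in right now,
 * online:   CPUs currently available for scheduling work.
 * This series keys the hctx mapping off "present", so a soft
 * offline/online cycle no longer forces a queue remap.
 */
static void example_dump_cpu_masks(void)
{
	unsigned int cpu;

	for_each_possible_cpu(cpu)
		pr_info("possible: cpu%u\n", cpu);
	for_each_present_cpu(cpu)
		pr_info("present:  cpu%u\n", cpu);
	for_each_online_cpu(cpu)
		pr_info("online:   cpu%u\n", cpu);
}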

> Another thing is that I don't see how NVMe handles this situation:
> blk_mq_update_nr_hw_queues() is called in nvme_reset_work(), so does
> that mean a reset needs to be triggered after new CPUs are added to
> the system?

Yes.
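
(Roughly, the pattern being described looks like the sketch below.  This is
not the actual nvme_reset_work() code; blk_mq_update_nr_hw_queues() is the
real blk-mq entry point, everything else is a made-up placeholder.)

#include <linux/blk-mq.h>
#include <linux/cpumask.h>

/*
 * Hypothetical driver reset path: after the controller has been
 * re-initialized (e.g. following a physical CPU hot-add), tell blk-mq
 * how many hardware queues to spread over the present CPUs.
 */
static void example_reset_update_queues(struct blk_mq_tag_set *set,
					unsigned int queues_supported)
{
	unsigned int nr = queues_supported;

	/* no point in having more hw queues than present CPUs */
	if (nr > num_present_cpus())
		nr = num_present_cpus();

	blk_mq_update_nr_hw_queues(set, nr);
}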

> I have tried adding new CPUs at runtime in QEMU, and I don't see new
> hw queues being added, whether or not this patchset is applied.

Do you even see the CPUs in your VM?  For physical hotplug you'll
need to reserve spots in the cpumap beforehand.
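
(Put differently: a hot-added CPU can only ever get a software queue if the
platform reserved an id for it at boot, since blk-mq sizes its mq_map[] by
nr_cpu_ids when the tag set is allocated.  A rough, illustrative check, not
code from this series:)

#include <linux/cpumask.h>
#include <linux/types.h>

static bool example_cpu_mappable(unsigned int cpu)
{
	/* no slot reserved by firmware/QEMU at boot -> can never be mapped */
	if (!cpu_possible(cpu))
		return false;

	/* with this series, present (not merely online) is what counts */
	return cpu_present(cpu);
}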


Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-07 Thread Ming Lei
On Sat, Jun 03, 2017 at 04:04:02PM +0200, Christoph Hellwig wrote:
> Currently we only create hctx for online CPUs, which can lead to a lot
> of churn due to frequent soft offline / online operations.  Instead
> allocate one for each present CPU to avoid this and dramatically simplify
> the code.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-mq.c | 120 +
>  block/blk-mq.h |   5 --
>  include/linux/cpuhotplug.h |   1 -
>  3 files changed, 11 insertions(+), 115 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 1bcccedcc74f..66ca9a090984 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,9 +37,6 @@
>  #include "blk-wbt.h"
>  #include "blk-mq-sched.h"
>  
> -static DEFINE_MUTEX(all_q_mutex);
> -static LIST_HEAD(all_q_list);
> -
>  static void blk_mq_poll_stats_start(struct request_queue *q);
>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
>  static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
> @@ -1966,8 +1963,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>   INIT_LIST_HEAD(&__ctx->rq_list);
>   __ctx->queue = q;
>  
> - /* If the cpu isn't online, the cpu is mapped to first hctx */
> - if (!cpu_online(i))
> + /* If the cpu isn't present, the cpu is mapped to first hctx */
> + if (!cpu_present(i))
>   continue;
>  
>   hctx = blk_mq_map_queue(q, i);
> @@ -2010,8 +2007,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>   }
>  }
>  
> -static void blk_mq_map_swqueue(struct request_queue *q,
> -const struct cpumask *online_mask)
> +static void blk_mq_map_swqueue(struct request_queue *q)
>  {
>   unsigned int i, hctx_idx;
>   struct blk_mq_hw_ctx *hctx;
> @@ -2029,13 +2025,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
>   }
>  
>   /*
> -  * Map software to hardware queues
> +  * Map software to hardware queues.
> +  *
> +  * If the cpu isn't present, the cpu is mapped to first hctx.
>*/
> - for_each_possible_cpu(i) {
> - /* If the cpu isn't online, the cpu is mapped to first hctx */
> - if (!cpumask_test_cpu(i, online_mask))
> - continue;
> -
> + for_each_present_cpu(i) {
>   hctx_idx = q->mq_map[i];
>   /* unmapped hw queue can be remapped after CPU topo changed */
>   if (!set->tags[hctx_idx] &&
> @@ -2321,16 +2315,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>   blk_queue_softirq_done(q, set->ops->complete);
>  
>   blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> -
> - get_online_cpus();
> - mutex_lock(&all_q_mutex);
> -
> - list_add_tail(&q->all_q_node, &all_q_list);
>   blk_mq_add_queue_tag_set(set, q);
> - blk_mq_map_swqueue(q, cpu_online_mask);
> -
> - mutex_unlock(&all_q_mutex);
> - put_online_cpus();
> + blk_mq_map_swqueue(q);
>  
>   if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
>   int ret;
> @@ -2356,18 +2342,12 @@ void blk_mq_free_queue(struct request_queue *q)
>  {
>   struct blk_mq_tag_set   *set = q->tag_set;
>  
> - mutex_lock(&all_q_mutex);
> - list_del_init(&q->all_q_node);
> - mutex_unlock(&all_q_mutex);
> -
>   blk_mq_del_queue_tag_set(q);
> -
>   blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
>  }
>  
>  /* Basically redo blk_mq_init_queue with queue frozen */
> -static void blk_mq_queue_reinit(struct request_queue *q,
> - const struct cpumask *online_mask)
> +static void blk_mq_queue_reinit(struct request_queue *q)
>  {
>   WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
>  
> @@ -2380,76 +2360,12 @@ static void blk_mq_queue_reinit(struct request_queue *q,
>* involves free and re-allocate memory, worthy doing?)
>*/
>  
> - blk_mq_map_swqueue(q, online_mask);
> + blk_mq_map_swqueue(q);
>  
>   blk_mq_sysfs_register(q);
>   blk_mq_debugfs_register_hctxs(q);
>  }
>  
> -/*
> - * New online cpumask which is going to be set in this hotplug event.
> - * Declare this cpumasks as global as cpu-hotplug operation is invoked
> - * one-by-one and dynamically allocating this could result in a failure.
> - */
> -static struct cpumask cpuhp_online_new;
> -
> -static void blk_mq_queue_reinit_work(void)
> -{
> - struct request_queue *q;
> -
> - mutex_lock(&all_q_mutex);
> - /*
> -  * We need to freeze and reinit all existing queues.  Freezing
> -  * involves synchronous wait for an RCU grace period and doing it
> -  * one by one may take a long time.  Start freezing all queues in
> -  * one swoop and then wait for the completions so that freezing can
> -  * take place in parallel.
> -  */
> - list_for_each_entry(q, 

Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-04 Thread Sagi Grimberg

Nice cleanup!

Reviewed-by: Sagi Grimberg 


[PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-03 Thread Christoph Hellwig
Currently we only create hctx for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations.  Instead
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c | 120 +
 block/blk-mq.h |   5 --
 include/linux/cpuhotplug.h |   1 -
 3 files changed, 11 insertions(+), 115 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1bcccedcc74f..66ca9a090984 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,9 +37,6 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
-
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
@@ -1966,8 +1963,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;
 
-   /* If the cpu isn't online, the cpu is mapped to first hctx */
-   if (!cpu_online(i))
+   /* If the cpu isn't present, the cpu is mapped to first hctx */
+   if (!cpu_present(i))
continue;
 
hctx = blk_mq_map_queue(q, i);
@@ -2010,8 +2007,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q,
-  const struct cpumask *online_mask)
+static void blk_mq_map_swqueue(struct request_queue *q)
 {
unsigned int i, hctx_idx;
struct blk_mq_hw_ctx *hctx;
@@ -2029,13 +2025,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
}
 
/*
-* Map software to hardware queues
+* Map software to hardware queues.
+*
+* If the cpu isn't present, the cpu is mapped to first hctx.
 */
-   for_each_possible_cpu(i) {
-   /* If the cpu isn't online, the cpu is mapped to first hctx */
-   if (!cpumask_test_cpu(i, online_mask))
-   continue;
-
+   for_each_present_cpu(i) {
hctx_idx = q->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
@@ -2321,16 +2315,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
blk_queue_softirq_done(q, set->ops->complete);
 
blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-
-   get_online_cpus();
-   mutex_lock(&all_q_mutex);
-
-   list_add_tail(&q->all_q_node, &all_q_list);
blk_mq_add_queue_tag_set(set, q);
-   blk_mq_map_swqueue(q, cpu_online_mask);
-
-   mutex_unlock(&all_q_mutex);
-   put_online_cpus();
+   blk_mq_map_swqueue(q);
 
if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
int ret;
@@ -2356,18 +2342,12 @@ void blk_mq_free_queue(struct request_queue *q)
 {
struct blk_mq_tag_set   *set = q->tag_set;
 
-   mutex_lock(&all_q_mutex);
-   list_del_init(&q->all_q_node);
-   mutex_unlock(&all_q_mutex);
-
blk_mq_del_queue_tag_set(q);
-
blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q,
-   const struct cpumask *online_mask)
+static void blk_mq_queue_reinit(struct request_queue *q)
 {
WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2380,76 +2360,12 @@ static void blk_mq_queue_reinit(struct request_queue *q,
 * involves free and re-allocate memory, worthy doing?)
 */
 
-   blk_mq_map_swqueue(q, online_mask);
+   blk_mq_map_swqueue(q);
 
blk_mq_sysfs_register(q);
blk_mq_debugfs_register_hctxs(q);
 }
 
-/*
- * New online cpumask which is going to be set in this hotplug event.
- * Declare this cpumasks as global as cpu-hotplug operation is invoked
- * one-by-one and dynamically allocating this could result in a failure.
- */
-static struct cpumask cpuhp_online_new;
-
-static void blk_mq_queue_reinit_work(void)
-{
-   struct request_queue *q;
-
-   mutex_lock(&all_q_mutex);
-   /*
-* We need to freeze and reinit all existing queues.  Freezing
-* involves synchronous wait for an RCU grace period and doing it
-* one by one may take a long time.  Start freezing all queues in
-* one swoop and then wait for the completions so that freezing can
-* take place in parallel.
-*/
-   list_for_each_entry(q, &all_q_list, all_q_node)
-   blk_freeze_queue_start(q);
-   list_for_each_entry(q, &all_q_list, all_q_node)
-   blk_mq_freeze_queue_wait(q);
-
-   list_for_each_entry(q, &all_q_list, all_q_node)
-