Re: [dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu
On 06/28/2017 04:07 PM, Brian King wrote:
> On 06/28/2017 04:59 PM, Jens Axboe wrote:
>> On 06/28/2017 03:54 PM, Jens Axboe wrote:
>>> On 06/28/2017 03:12 PM, Brian King wrote:
>>>> -static inline int part_in_flight(struct hd_struct *part)
>>>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>>>  {
>>>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>>>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>>>
>>> One obvious improvement would be to not do this twice, but only have to
>>> loop once. Instead of making this an array, make it a structure with a
>>> read and write count.
>>>
>>> It still doesn't really fix the issue of someone running on a kernel
>>> with a ton of possible CPUs configured. But it does reduce the overhead
>>> by 50%.
>>
>> Or something as simple as this:
>>
>> #define part_stat_read_double(part, field1, field2)			\
>> ({									\
>>	typeof((part)->dkstats->field1) res = 0;			\
>>	unsigned int _cpu;						\
>>	for_each_possible_cpu(_cpu) {					\
>>		res += per_cpu_ptr((part)->dkstats, _cpu)->field1;	\
>>		res += per_cpu_ptr((part)->dkstats, _cpu)->field2;	\
>>	}								\
>>	res;								\
>> })
>>
>> static inline unsigned long part_in_flight(struct hd_struct *part)
>> {
>>	return part_stat_read_double(part, in_flight[0], in_flight[1]);
>> }
>
> I'll give this a try and also see about running some more exhaustive
> runs to see if there are any cases where we go backwards in performance.
>
> I'll also run with partitions and see how that impacts this.

And do something nuts, like setting NR_CPUS to 512 or whatever. What do
distros ship with?

--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu
On 06/28/2017 04:59 PM, Jens Axboe wrote:
> On 06/28/2017 03:54 PM, Jens Axboe wrote:
>> On 06/28/2017 03:12 PM, Brian King wrote:
>>> -static inline int part_in_flight(struct hd_struct *part)
>>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>>  {
>>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>>
>> One obvious improvement would be to not do this twice, but only have to
>> loop once. Instead of making this an array, make it a structure with a
>> read and write count.
>>
>> It still doesn't really fix the issue of someone running on a kernel
>> with a ton of possible CPUs configured. But it does reduce the overhead
>> by 50%.
>
> Or something as simple as this:
>
> #define part_stat_read_double(part, field1, field2)			\
> ({									\
>	typeof((part)->dkstats->field1) res = 0;			\
>	unsigned int _cpu;						\
>	for_each_possible_cpu(_cpu) {					\
>		res += per_cpu_ptr((part)->dkstats, _cpu)->field1;	\
>		res += per_cpu_ptr((part)->dkstats, _cpu)->field2;	\
>	}								\
>	res;								\
> })
>
> static inline unsigned long part_in_flight(struct hd_struct *part)
> {
>	return part_stat_read_double(part, in_flight[0], in_flight[1]);
> }

I'll give this a try and also see about running some more exhaustive
runs to see if there are any cases where we go backwards in performance.

I'll also run with partitions and see how that impacts this.

Thanks,

Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu
On 06/28/2017 04:49 PM, Jens Axboe wrote:
> On 06/28/2017 03:12 PM, Brian King wrote:
>> This patch converts the in_flight counter in struct hd_struct from a
>> pair of atomics to a pair of percpu counters. This eliminates a couple
>> of atomics from the hot path. When running this on a Power system, to
>> a single null_blk device with 80 submission queues, irq mode 0, with
>> 80 fio jobs, I saw IOPs go from 1.5M IO/s to 11.4M IO/s.
>
> This has been done before, but I've never really liked it. The reason is
> that it means that reading the part stat inflight count now has to
> iterate over every possible CPU. Did you use partitions in your testing?
> How many CPUs were configured? When I last tested this a few years ago
> on even a quad core nehalem (which is notoriously shitty for cross-node
> latencies), it was a net loss.

I did not use partitions. I was running this on a 4 socket Power 8
machine with 5 cores per socket, running with 4 threads per core, so a
total of 80 logical CPUs were usable in Linux.

I was missing the fact that part_round_stats_single calls part_in_flight
and had only noticed the sysfs and procfs users of part_in_flight
previously.

-Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu
On 06/28/2017 03:54 PM, Jens Axboe wrote:
> On 06/28/2017 03:12 PM, Brian King wrote:
>> -static inline int part_in_flight(struct hd_struct *part)
>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>  {
>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>
> One obvious improvement would be to not do this twice, but only have to
> loop once. Instead of making this an array, make it a structure with a
> read and write count.
>
> It still doesn't really fix the issue of someone running on a kernel
> with a ton of possible CPUs configured. But it does reduce the overhead
> by 50%.

Or something as simple as this:

#define part_stat_read_double(part, field1, field2)			\
({									\
	typeof((part)->dkstats->field1) res = 0;			\
	unsigned int _cpu;						\
	for_each_possible_cpu(_cpu) {					\
		res += per_cpu_ptr((part)->dkstats, _cpu)->field1;	\
		res += per_cpu_ptr((part)->dkstats, _cpu)->field2;	\
	}								\
	res;								\
})

static inline unsigned long part_in_flight(struct hd_struct *part)
{
	return part_stat_read_double(part, in_flight[0], in_flight[1]);
}

--
Jens Axboe
Re: [dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu
On 06/28/2017 03:12 PM, Brian King wrote:
> This patch converts the in_flight counter in struct hd_struct from a
> pair of atomics to a pair of percpu counters. This eliminates a couple
> of atomics from the hot path. When running this on a Power system, to
> a single null_blk device with 80 submission queues, irq mode 0, with
> 80 fio jobs, I saw IOPs go from 1.5M IO/s to 11.4M IO/s.

This has been done before, but I've never really liked it. The reason is
that it means that reading the part stat inflight count now has to
iterate over every possible CPU. Did you use partitions in your testing?
How many CPUs were configured? When I last tested this a few years ago
on even a quad core nehalem (which is notoriously shitty for cross-node
latencies), it was a net loss.

I do agree that we should do something about it, and it's one of those
items I've highlighted in talks about blk-mq on pending issues to fix
up. It's just not great as it currently stands, but I don't think per
CPU counters is the right way to fix it, at least not for the inflight
counter.

--
Jens Axboe
[dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu
This patch converts the in_flight counter in struct hd_struct from a
pair of atomics to a pair of percpu counters. This eliminates a couple
of atomics from the hot path. When running this on a Power system, to
a single null_blk device with 80 submission queues, irq mode 0, with
80 fio jobs, I saw IOPs go from 1.5M IO/s to 11.4M IO/s.

Signed-off-by: Brian King
---
 block/bio.c               |  4 ++--
 block/blk-core.c          |  4 ++--
 block/blk-merge.c         |  2 +-
 block/genhd.c             |  2 +-
 block/partition-generic.c |  6 +++---
 drivers/md/dm.c           | 10 ++
 include/linux/genhd.h     | 18 +-
 7 files changed, 24 insertions(+), 22 deletions(-)

diff -puN include/linux/genhd.h~blk_in_flight_atomic_remove include/linux/genhd.h
--- linux-block/include/linux/genhd.h~blk_in_flight_atomic_remove	2017-06-28 16:06:43.037948079 -0500
+++ linux-block-bjking1/include/linux/genhd.h	2017-06-28 16:06:43.064947978 -0500
@@ -87,6 +87,7 @@ struct disk_stats {
 	unsigned long ticks[2];
 	unsigned long io_ticks;
 	unsigned long time_in_queue;
+	unsigned long in_flight[2];
 };

 #define PARTITION_META_INFO_VOLNAMELTH	64
@@ -120,7 +121,6 @@ struct hd_struct {
 	int make_it_fail;
 #endif
 	unsigned long stamp;
-	atomic_t in_flight[2];
 #ifdef	CONFIG_SMP
 	struct disk_stats __percpu *dkstats;
 #else
@@ -362,23 +362,23 @@ static inline void free_part_stats(struc
 #define part_stat_sub(cpu, gendiskp, field, subnd)			\
 	part_stat_add(cpu, gendiskp, field, -subnd)

-static inline void part_inc_in_flight(struct hd_struct *part, int rw)
+static inline void part_inc_in_flight(int cpu, struct hd_struct *part, int rw)
 {
-	atomic_inc(&part->in_flight[rw]);
+	part_stat_inc(cpu, part, in_flight[rw]);
 	if (part->partno)
-		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
+		part_stat_inc(cpu, &part_to_disk(part)->part0, in_flight[rw]);
 }

-static inline void part_dec_in_flight(struct hd_struct *part, int rw)
+static inline void part_dec_in_flight(int cpu, struct hd_struct *part, int rw)
 {
-	atomic_dec(&part->in_flight[rw]);
+	part_stat_dec(cpu, part, in_flight[rw]);
 	if (part->partno)
-		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
+		part_stat_dec(cpu, &part_to_disk(part)->part0, in_flight[rw]);
 }

-static inline int part_in_flight(struct hd_struct *part)
+static inline unsigned long part_in_flight(struct hd_struct *part)
 {
-	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
+	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
 }

 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
diff -puN block/bio.c~blk_in_flight_atomic_remove block/bio.c
--- linux-block/block/bio.c~blk_in_flight_atomic_remove	2017-06-28 16:06:43.041948064 -0500
+++ linux-block-bjking1/block/bio.c	2017-06-28 16:06:43.065947974 -0500
@@ -1737,7 +1737,7 @@ void generic_start_io_acct(int rw, unsig
 	part_round_stats(cpu, part);
 	part_stat_inc(cpu, part, ios[rw]);
 	part_stat_add(cpu, part, sectors[rw], sectors);
-	part_inc_in_flight(part, rw);
+	part_inc_in_flight(cpu, part, rw);

 	part_stat_unlock();
 }
@@ -1751,7 +1751,7 @@ void generic_end_io_acct(int rw, struct

 	part_stat_add(cpu, part, ticks[rw], duration);
 	part_round_stats(cpu, part);
-	part_dec_in_flight(part, rw);
+	part_dec_in_flight(cpu, part, rw);

 	part_stat_unlock();
 }
diff -puN block/blk-core.c~blk_in_flight_atomic_remove block/blk-core.c
--- linux-block/block/blk-core.c~blk_in_flight_atomic_remove	2017-06-28 16:06:43.045948049 -0500
+++ linux-block-bjking1/block/blk-core.c	2017-06-28 16:06:43.066947970 -0500
@@ -2435,7 +2435,7 @@ void blk_account_io_done(struct request
 		part_stat_inc(cpu, part, ios[rw]);
 		part_stat_add(cpu, part, ticks[rw], duration);
 		part_round_stats(cpu, part);
-		part_dec_in_flight(part, rw);
+		part_dec_in_flight(cpu, part, rw);

 		hd_struct_put(part);
 		part_stat_unlock();
@@ -2493,7 +2493,7 @@ void blk_account_io_start(struct request
 			hd_struct_get(part);
 		}
 		part_round_stats(cpu, part);
-		part_inc_in_flight(part, rw);
+		part_inc_in_flight(cpu, part, rw);
 		rq->part = part;
 	}
diff -puN block/blk-merge.c~blk_in_flight_atomic_remove block/blk-merge.c
--- linux-block/block/blk-merge.c~blk_in_flight_atomic_remove	2017-06-28 16:06:43.048948038 -0500
+++ linux-block-bjking1/block/blk-merge.c	2017-06-28 16:06:43.067947967 -0500
@@ -634,7 +634,7 @@ static void blk_account_io_merge(struct
 		part = req->part;
Re: [dm-devel] [PATCH] libmultipath: update 3PARdata builtin config
On 06/26/2017 09:03 PM, Benjamin Marzinski wrote:
> This updated config comes from hp.

It would be nice to have more information. Why and when is this needed?

BTW: HPE 'MSA 205x' and 'StoreVirtual 3200' (LeftHand) are missing.

> Signed-off-by: Benjamin Marzinski
> ---
>  libmultipath/hwtable.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index 390d143..54bdcfc 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -49,6 +49,8 @@ static struct hwentry default_hw[] = {
> 		.hwhandler = "1 alua",
> 		.prio_name = PRIO_ALUA,
> 		.no_path_retry = 18,
> +		.fast_io_fail = 10,
> +		.dev_loss = MAX_DEV_LOSS_TMO,
> 	},
> 	{
> 		/* RA8000 / ESA12000 */
Re: [dm-devel] [PATCH RESEND] dm-raid: Do not call BUG() in __rdev_sectors()
On Wed, Jun 28 2017 at 4:04P -0400, Hannes Reinecke wrote:

> __rdev_sectors() might be called for an invalid/non-existing
> RAID set during raid_ctr(), which is perfectly fine and no
> reason to panic.
>
> Signed-off-by: Hannes Reinecke
> ---
>  drivers/md/dm-raid.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index 7d89322..9bbb596 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -1571,7 +1571,8 @@ static sector_t __rdev_sectors(struct raid_set *rs)
> 		return rdev->sectors;
> 	}
>
> -	BUG(); /* Constructor ensures we got some. */
> +	/* No valid raid devices */
> +	return 0;
> }
>
> /* Calculate the sectors per device and per array used for @rs */
> --
> 1.8.5.6

The use of BUG() is certainly suspect. BUG() is very rarely the right
answer. But I don't think returning 0 makes sense for all
__rdev_sectors() callers, e.g. rs_setup_recovery().

__rdev_sectors() could easily be made to check if ti->error is passed as
a non-NULL pointer: if so, set *error, return 0, have the ctr check for
0, and gracefully fail the ctr by returning the ti->error that
__rdev_sectors() set. If the error pointer passed to __rdev_sectors() is
NULL, resort to BUG() still? Would really like to avoid BUG(), but I'll
defer to Heinz on what might be a better response.

Alternatively, you could look to see where "Constructor ensures we got
some." and fix the fact that it clearly isn't ensuring as much for your
case?

Mike
Re: [dm-devel] [PATCH 2/2] multipath: attempt at common multipath.rules
On Tue, 2017-06-27 at 23:41 +0200, Martin Wilck wrote:
> > The other change is the redhat multipath rules remove the partition
> > device nodes for devices claimed by multipath. The udev rule will
> > only do this one time (both to keep from running partx on every
> > event, and so that if users manually reread the partition table, we
> > don't keep removing them when clearly they are wanted). Redhat does
> > this because we had multiple customer issues where they were using
> > the scsi partitions instead of the kpartx devices. Obviously, with
> > setting the partition devices to not ready and clearing their
> > fs_type, this isn't essential, but it has helped make customers do
> > the right thing.
>
> Isn't this racy? You call "partx -d" on the parent device (e.g. sda).
> At this point in time, the kernel will already have detected the
> partitions and "add" uevents for them are likely to follow up quickly.
> You generate "remove" events that may race with the "add" - or am I
> overlooking something?
>
> I'd feel better if we'd call "partx -d" while processing the "add"
> uevent for the _partitions_, ideally late in that process. That would
> make (almost) sure that "add" was finished when "remove" was
> generated.

See below. It turns out that my idea doesn't work quite like your
approach, because users can't simply run "partx -a /dev/sd$X" to get the
deleted partitions back. They need an additional, manual
"echo change > /sys/class/block/sd$X/uevent" first to make "partx -a"
work. So, if you can confirm that uevent races are no problem, your
technique seems to be more user-friendly.

> > +LABEL="check_kpartx"
> > +
> > +IMPORT{db}="DM_MULTIPATH_NEED_KPARTX"
>
> As remarked above, I'd vote for removing "MULTIPATH" from this
> property name.
>
> > +ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}!="1", IMPORT{db}="DM_SUBSYSTEM_UDEV_FLAG1"
> > +ENV{DM_SUBSYSTEM_UDEV_FLAG1}=="1", GOTO="end_mpath"
> > +ACTION!="change", GOTO="end_mpath"
> > +ENV{DM_UUID}!="mpath-?*", GOTO="end_mpath"
> > +ENV{DM_ACTIVATION}=="1", ENV{DM_MULTIPATH_NEED_KPARTX}="1"
> > +ENV{DM_SUSPENDED}=="1", GOTO="end_mpath"
> > +ENV{DM_ACTION}=="PATH_FAILED", GOTO="end_mpath"
>
> The previous code had ENV{DM_ACTION}=="PATH_FAILED|PATH_REINSTATED",
> why did you drop the latter?
> Anyway, I think both aren't needed because in 11-dm-mpath.rules,
> "DM_ACTIVATION" is set to "0" in the "reload if paths are
> lost/recovered" case. I think the cleaner way to express the logic
> here would be:
>
> # don't rerun kpartx on "reload" events (see 11-dm-mpath.rules)
> ENV{DM_ACTIVATION}=="0", GOTO="end_mpath"
> # don't run kpartx when there are no paths
> ENV{MPATH_DEVICE_READY}=="0", GOTO="end_mpath"

The ENV{DM_ACTIVATION}=="0" can be left out, too, I think. I'll post a
patch with what I think the setup should be soon.

Best
Martin

--
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Re: [dm-devel] [PATCH] dm-raid: Do not call BUG() in __rdev_sectors()
On 06/28/2017 09:55 AM, Hannes Reinecke wrote:
> __rdev_sectors() might be called for an invalid/non-existing
> RAID set during raid_ctr(), which is perfectly fine and no
> reason to panic.
>
> Signed-off-by: Hannes Reinecke
> ---
>  drivers/md/dm-raid.c          | 3 ++-
>  drivers/scsi/pmcraid.c        | 2 +-
>  drivers/scsi/qla4xxx/ql4_os.c | 2 +-
>  3 files changed, 4 insertions(+), 3 deletions(-)

Gnaa. Those bits shouldn't be here. Will be resending.

Cheers,

Hannes

--
Dr. Hannes Reinecke	Teamlead Storage & Networking
h...@suse.de		+49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
[dm-devel] [PATCH RESEND] dm-raid: Do not call BUG() in __rdev_sectors()
__rdev_sectors() might be called for an invalid/non-existing
RAID set during raid_ctr(), which is perfectly fine and no
reason to panic.

Signed-off-by: Hannes Reinecke
---
 drivers/md/dm-raid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 7d89322..9bbb596 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -1571,7 +1571,8 @@ static sector_t __rdev_sectors(struct raid_set *rs)
 		return rdev->sectors;
 	}

-	BUG(); /* Constructor ensures we got some. */
+	/* No valid raid devices */
+	return 0;
 }

 /* Calculate the sectors per device and per array used for @rs */
--
1.8.5.6
Re: [dm-devel] [PATCH] dm-raid: Do not call BUG() in __rdev_sectors()
On Wed, Jun 28, 2017 at 09:55:03AM +0200, Hannes Reinecke wrote:
> __rdev_sectors() might be called for an invalid/non-existing
> RAID set during raid_ctr(), which is perfectly fine and no
> reason to panic.
>
> Signed-off-by: Hannes Reinecke
> ---
>  drivers/md/dm-raid.c          | 3 ++-
>  drivers/scsi/pmcraid.c        | 2 +-
>  drivers/scsi/qla4xxx/ql4_os.c | 2 +-

Ahm, accidental git commit -a?

--
Johannes Thumshirn	Storage
jthumsh...@suse.de	+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[dm-devel] [PATCH] dm-raid: Do not call BUG() in __rdev_sectors()
__rdev_sectors() might be called for an invalid/non-existing
RAID set during raid_ctr(), which is perfectly fine and no
reason to panic.

Signed-off-by: Hannes Reinecke
---
 drivers/md/dm-raid.c          | 3 ++-
 drivers/scsi/pmcraid.c        | 2 +-
 drivers/scsi/qla4xxx/ql4_os.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 7d89322..9bbb596 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -1571,7 +1571,8 @@ static sector_t __rdev_sectors(struct raid_set *rs)
 		return rdev->sectors;
 	}

-	BUG(); /* Constructor ensures we got some. */
+	/* No valid raid devices */
+	return 0;
 }

 /* Calculate the sectors per device and per array used for @rs */
diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
index 56928d6..4b66b92 100644
--- a/drivers/scsi/pmcraid.c
+++ b/drivers/scsi/pmcraid.c
@@ -3089,7 +3089,7 @@ static int pmcraid_eh_target_reset_handler(struct scsi_cmnd *scmd)
 /**
  * pmcraid_eh_host_reset_handler - adapter reset handler callback
  *
- * @scmd: pointer to scsi_cmd that was sent to a resource of adapter
+ * @shost: pointer to scsi_host
  *
  * Initiates adapter reset to bring it up to operational state
  *
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 5d55de5..11bd2da 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -9367,7 +9367,7 @@ static int qla4xxx_is_eh_active(struct Scsi_Host *shost)
 /**
  * qla4xxx_eh_host_reset - kernel callback
- * @cmd: Pointer to Linux's SCSI command structure
+ * @host: Pointer to Linux's SCSI host structure
  *
  * This routine is invoked by the Linux kernel to perform fatal error
  * recovery on the specified adapter.
--
1.8.5.6