Re: [patch] block layer: kmemcheck fixes
On Fri, Feb 08, 2008 at 02:56:09PM -0800, Arjan van de Ven wrote:
> Nick Piggin wrote:
> >>> Maybe cpus these days have so much store bandwidth that doing
> >>> things like the above is OK, but I doubt it :-)
> >> on modern x86 cpus the memset may even be faster if the memory isn't
> >> in cache; the "explicit" method ends up doing Write Allocate on the
> >> cache lines (so read them from memory) even though they then end up
> >> being written entirely.
> >> With memset the CPU is told that the entire range is set to a new
> >> value, and the WA can be avoided for the whole-cachelines in the
> >> range.
> >
> > Don't you have write combining store buffers? Or is it still
> > speculatively issuing the reads even before the whole cacheline is
> > combined?
>
> x86 memory order model doesn't quite allow that; and you need a
> "series" of at least 64 bytes without any other memory accesses in
> between, even if that would not happen in practice.

OK, fair enough... then it will be a very nice test to see if it helps.
I'm sure you could have an arch specific initialisation function if it
makes a significant difference.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] block layer: kmemcheck fixes
Nick Piggin wrote:
> > > Maybe cpus these days have so much store bandwidth that doing
> > > things like the above is OK, but I doubt it :-)
> >
> > on modern x86 cpus the memset may even be faster if the memory isn't
> > in cache; the "explicit" method ends up doing Write Allocate on the
> > cache lines (so read them from memory) even though they then end up
> > being written entirely.
> > With memset the CPU is told that the entire range is set to a new
> > value, and the WA can be avoided for the whole-cachelines in the
> > range.
>
> Don't you have write combining store buffers? Or is it still
> speculatively issuing the reads even before the whole cacheline is
> combined?

x86 memory order model doesn't quite allow that; and you need a "series"
of at least 64 bytes without any other memory accesses in between, even
if that would not happen in practice.
Re: [patch] block layer: kmemcheck fixes
On Fri, Feb 08, 2008 at 07:09:07AM -0800, Arjan van de Ven wrote:
> David Miller wrote:
> > From: Linus Torvalds <[EMAIL PROTECTED]>
> > Date: Thu, 7 Feb 2008 09:42:56 -0800 (PST)
> >
> > > Can we please just stop doing these one-by-one assignments, and
> > > just do something like
> > >
> > >	memset(rq, 0, sizeof(*rq));
> > >	rq->q = q;
> > >	rq->ref_count = 1;
> > >	INIT_HLIST_NODE(&rq->hash);
> > >	RB_CLEAR_NODE(&rq->rb_node);
> > >
> > > instead?
> > >
> > > The memset() is likely faster and smaller than one-by-one
> > > assignments anyway, even if the one-by-ones can avoid initializing
> > > some field or there ends up being a double initialization..
> >
> > The problem is store buffer compression. At least a few years
> > ago this made a huge difference in sk_buff initialization in the
> > networking.
> >
> > Maybe cpus these days have so much store bandwidth that doing
> > things like the above is OK, but I doubt it :-)
>
> on modern x86 cpus the memset may even be faster if the memory isn't in
> cache; the "explicit" method ends up doing Write Allocate on the cache
> lines (so read them from memory) even though they then end up being
> written entirely.
> With memset the CPU is told that the entire range is set to a new value,
> and the WA can be avoided for the whole-cachelines in the range.

Don't you have write combining store buffers? Or is it still speculatively
issuing the reads even before the whole cacheline is combined?
Re: [patch] block layer: kmemcheck fixes
David Miller wrote:
> From: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Thu, 7 Feb 2008 09:42:56 -0800 (PST)
>
> > Can we please just stop doing these one-by-one assignments, and just
> > do something like
> >
> >	memset(rq, 0, sizeof(*rq));
> >	rq->q = q;
> >	rq->ref_count = 1;
> >	INIT_HLIST_NODE(&rq->hash);
> >	RB_CLEAR_NODE(&rq->rb_node);
> >
> > instead?
> >
> > The memset() is likely faster and smaller than one-by-one assignments
> > anyway, even if the one-by-ones can avoid initializing some field or
> > there ends up being a double initialization..
>
> The problem is store buffer compression. At least a few years
> ago this made a huge difference in sk_buff initialization in the
> networking.
>
> Maybe cpus these days have so much store bandwidth that doing
> things like the above is OK, but I doubt it :-)

on modern x86 cpus the memset may even be faster if the memory isn't in
cache; the "explicit" method ends up doing Write Allocate on the cache
lines (so read them from memory) even though they then end up being
written entirely.
With memset the CPU is told that the entire range is set to a new value,
and the WA can be avoided for the whole-cachelines in the range.
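Arjan's write-allocate point can be made concrete with a small sketch.
Everything below is a hypothetical userspace illustration — the `struct
req` layout and both init functions are invented for this example, not
the kernel's actual code: both routines leave the structure in the same
state, but the memset() variant presents one contiguous store over the
whole range, which is the case where a CPU can skip reading whole cache
lines that are about to be entirely overwritten.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical request-like structure, for illustration only --
 * the real struct request lives in include/linux/blkdev.h and is
 * much larger. */
struct req {
	int errors;
	int ref_count;
	void *sense;
	void *end_io_data;
	char cmd[16];
};

/* Field-by-field style: each isolated store may have to write-allocate
 * (read) a cold cache line before modifying part of it. */
static void init_fields(struct req *rq)
{
	rq->errors = 0;
	rq->ref_count = 1;
	rq->sense = NULL;
	rq->end_io_data = NULL;
	memset(rq->cmd, 0, sizeof(rq->cmd));
}

/* memset style: one contiguous store over the whole structure, then
 * only the fields whose initial value is nonzero. */
static void init_memset(struct req *rq)
{
	memset(rq, 0, sizeof(*rq));
	rq->ref_count = 1;
}
```

Which variant wins in practice is exactly the hardware-dependent question
being debated in the thread; the sketch only shows that the two are
functionally equivalent.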
Re: [patch] block layer: kmemcheck fixes
On Thu, Feb 07 2008, Linus Torvalds wrote:
>
> On Thu, 7 Feb 2008, Ingo Molnar wrote:
> >	INIT_HLIST_NODE(&rq->hash);
> >	RB_CLEAR_NODE(&rq->rb_node);
> > -	rq->ioprio = 0;
> > -	rq->buffer = NULL;
> > -	rq->ref_count = 1;
> > -	rq->q = q;
> > -	rq->special = NULL;
> > -	rq->data_len = 0;
> > -	rq->data = NULL;
> > -	rq->nr_phys_segments = 0;
> > -	rq->sense = NULL;
> > -	rq->end_io = NULL;
> > -	rq->end_io_data = NULL;
> > -	rq->completion_data = NULL;
> > -	rq->next_rq = NULL;
> > +	rq->completion_data = NULL;
> > +	/* rq->elevator_private */
> > +	/* rq->elevator_private2 */
> > +	/* rq->rq_disk */
> > +	/* rq->start_time */
> > +	rq->nr_phys_segments = 0;
> > +	/* rq->nr_hw_segments */
> > +	rq->ioprio = 0;
> > +	rq->special = NULL;
> > +	rq->buffer = NULL;
> ...
>
> Can we please just stop doing these one-by-one assignments, and just do
> something like
>
>	memset(rq, 0, sizeof(*rq));
>	rq->q = q;
>	rq->ref_count = 1;
>	INIT_HLIST_NODE(&rq->hash);
>	RB_CLEAR_NODE(&rq->rb_node);
>
> instead?
>
> The memset() is likely faster and smaller than one-by-one assignments
> anyway, even if the one-by-ones can avoid initializing some field or
> there ends up being a double initialization..

I looked into this approach and we can't currently do that here, since
some members of the request are set from blk_alloc_request(), and the io
scheduler then attaches private data to it. So we have to preserve at
least ->cmd_flags, ->elevator_private, and ->elevator_private2. rq_init()
is also used for stored requests, so we cannot just rely on clearing at
allocation time. So I'd prefer being a little conservative here. The
below reorders rq_init() a bit and clears some more members to be on the
safer side, adding a comment on why we cannot memset() and an associated
comment in blkdev.h.
diff --git a/block/blk-core.c b/block/blk-core.c
index 4afb39c..fba4ca7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -102,27 +102,38 @@ struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blk_get_backing_dev_info);
 
+/*
+ * We can't just memset() the structure, since the allocation path
+ * already stored some information in the request.
+ */
 void rq_init(struct request_queue *q, struct request *rq)
 {
 	INIT_LIST_HEAD(&rq->queuelist);
 	INIT_LIST_HEAD(&rq->donelist);
-
-	rq->errors = 0;
+	rq->q = q;
+	rq->sector = rq->hard_sector = (sector_t) -1;
+	rq->nr_sectors = rq->hard_nr_sectors = 0;
+	rq->current_nr_sectors = rq->hard_cur_sectors = 0;
 	rq->bio = rq->biotail = NULL;
 	INIT_HLIST_NODE(&rq->hash);
 	RB_CLEAR_NODE(&rq->rb_node);
+	rq->rq_disk = NULL;
+	rq->nr_phys_segments = 0;
+	rq->nr_hw_segments = 0;
 	rq->ioprio = 0;
+	rq->special = NULL;
 	rq->buffer = NULL;
+	rq->tag = -1;
+	rq->errors = 0;
 	rq->ref_count = 1;
-	rq->q = q;
-	rq->special = NULL;
+	rq->cmd_len = 0;
+	memset(rq->cmd, 0, sizeof(rq->cmd));
 	rq->data_len = 0;
+	rq->sense_len = 0;
 	rq->data = NULL;
-	rq->nr_phys_segments = 0;
 	rq->sense = NULL;
 	rq->end_io = NULL;
 	rq->end_io_data = NULL;
-	rq->completion_data = NULL;
 	rq->next_rq = NULL;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 90392a9..e1888cc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -137,7 +137,9 @@ enum rq_flag_bits {
 #define BLK_MAX_CDB	16
 
 /*
- * try to put the fields that are referenced together in the same cacheline
+ * try to put the fields that are referenced together in the same cacheline.
+ * if you modify this structure, be sure to check block/blk-core.c:rq_init()
+ * as well!
  */
 struct request {
 	struct list_head queuelist;

-- 
Jens Axboe
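For illustration, the constraint Jens describes — the allocation path and
the io scheduler have already stored into the request, so a blind
memset() would destroy their state — could in principle also be met by
saving those members around the clear. This is only a userspace sketch
with an invented struct layout (field names borrowed from the thread),
not the approach the patch above actually takes:

```c
#include <string.h>
#include <stddef.h>

/* Simplified stand-in for struct request; the members mirror the ones
 * named in the thread but the layout is invented for illustration. */
struct req {
	unsigned int cmd_flags;	/* set by blk_alloc_request() */
	void *elevator_private;	/* set by the io scheduler */
	void *elevator_private2;	/* set by the io scheduler */
	int ref_count;
	int errors;
	void *sense;
};

/* One conceivable way to keep a full memset() while honoring the
 * constraint: save the already-populated members, clear the whole
 * structure, then restore them. */
static void rq_reinit(struct req *rq)
{
	unsigned int flags = rq->cmd_flags;
	void *ep1 = rq->elevator_private;
	void *ep2 = rq->elevator_private2;

	memset(rq, 0, sizeof(*rq));

	rq->cmd_flags = flags;
	rq->elevator_private = ep1;
	rq->elevator_private2 = ep2;
	rq->ref_count = 1;
}
```

The save/restore adds a few extra loads and stores per request, which is
presumably part of why the patch instead opts for explicit per-field
clearing with comments.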
Re: [patch] block layer: kmemcheck fixes
On Thu, 7 Feb 2008, David Miller wrote:
>
> Maybe cpus these days have so much store bandwidth that doing
> things like the above is OK, but I doubt it :-)

I seriously doubt the same is true for the IO requests (which are
different anyway, and tend to happen at a much lower frequency than
high-speed networking).

I also suspect that your timings were very hardware-dependent in the
first place (and yes, the long-term effect is likely working against
that "optimization"), and depend a lot on just how sparse the
initialization can be.

		Linus
Re: [patch] block layer: kmemcheck fixes
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Thu, 7 Feb 2008 09:42:56 -0800 (PST)

> Can we please just stop doing these one-by-one assignments, and just do
> something like
>
>	memset(rq, 0, sizeof(*rq));
>	rq->q = q;
>	rq->ref_count = 1;
>	INIT_HLIST_NODE(&rq->hash);
>	RB_CLEAR_NODE(&rq->rb_node);
>
> instead?
>
> The memset() is likely faster and smaller than one-by-one assignments
> anyway, even if the one-by-ones can avoid initializing some field or
> there ends up being a double initialization..

The problem is store buffer compression. At least a few years
ago this made a huge difference in sk_buff initialization in the
networking.

Maybe cpus these days have so much store bandwidth that doing
things like the above is OK, but I doubt it :-)
Re: [patch] block layer: kmemcheck fixes
On Thu, Feb 07 2008, Ingo Molnar wrote:
>
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> > On Thu, 7 Feb 2008, Ingo Molnar wrote:
> > >	INIT_HLIST_NODE(&rq->hash);
> > >	RB_CLEAR_NODE(&rq->rb_node);
> > > -	rq->ioprio = 0;
> > > -	rq->buffer = NULL;
> > > -	rq->ref_count = 1;
> > > -	rq->q = q;
> > > -	rq->special = NULL;
> > > -	rq->data_len = 0;
> > > -	rq->data = NULL;
> > > -	rq->nr_phys_segments = 0;
> > > -	rq->sense = NULL;
> > > -	rq->end_io = NULL;
> > > -	rq->end_io_data = NULL;
> > > -	rq->completion_data = NULL;
> > > -	rq->next_rq = NULL;
> > > +	rq->completion_data = NULL;
> > > +	/* rq->elevator_private */
> > > +	/* rq->elevator_private2 */
> > > +	/* rq->rq_disk */
> > > +	/* rq->start_time */
> > > +	rq->nr_phys_segments = 0;
> > > +	/* rq->nr_hw_segments */
> > > +	rq->ioprio = 0;
> > > +	rq->special = NULL;
> > > +	rq->buffer = NULL;
> > ...
> >
> > Can we please just stop doing these one-by-one assignments, and just
> > do something like
> >
> >	memset(rq, 0, sizeof(*rq));
> >	rq->q = q;
> >	rq->ref_count = 1;
> >	INIT_HLIST_NODE(&rq->hash);
> >	RB_CLEAR_NODE(&rq->rb_node);
> >
> > instead?
> >
> > The memset() is likely faster and smaller than one-by-one assignments
> > anyway, even if the one-by-ones can avoid initializing some field or
> > there ends up being a double initialization..
>
> i definitely agree and do that for all code i write.
>
> But if someone does item by item initialization for some crazy
> performance reason (networking folks tend to have such constructs), it
> should be done i think how i've done it in the patch: by systematically
> listing _every_ field in the structure, in the same order, and
> indicating it clearly when it is not initialized and why.

That assumes that people find the references in two places when adding
members to a structure, which is not very likely (people are lazy!).

> and there it already shows that we do not initialize a few other
> members that could cause problems later on:
>
> +	rq->data_len = 0;
> +	/* rq->sense_len */
> +	rq->data = NULL;
> +	rq->sense = NULL;
>
> why is sense_len not initialized - while data_len is?

because sense isn't set; when someone sets ->sense they should set
sense_len as well.

> In any case, these days the memclear instructions are dirt cheap and we
> should just always initialize everything to zero by default, especially
> if it's almost all zero-initialized anyway.

Completely agree, some of these are just dormant bugs waiting to happen.
Clearing everything is the sanest approach.

-- 
Jens Axboe
Re: [patch] block layer: kmemcheck fixes
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Thu, 7 Feb 2008, Ingo Molnar wrote:
> >	INIT_HLIST_NODE(&rq->hash);
> >	RB_CLEAR_NODE(&rq->rb_node);
> > -	rq->ioprio = 0;
> > -	rq->buffer = NULL;
> > -	rq->ref_count = 1;
> > -	rq->q = q;
> > -	rq->special = NULL;
> > -	rq->data_len = 0;
> > -	rq->data = NULL;
> > -	rq->nr_phys_segments = 0;
> > -	rq->sense = NULL;
> > -	rq->end_io = NULL;
> > -	rq->end_io_data = NULL;
> > -	rq->completion_data = NULL;
> > -	rq->next_rq = NULL;
> > +	rq->completion_data = NULL;
> > +	/* rq->elevator_private */
> > +	/* rq->elevator_private2 */
> > +	/* rq->rq_disk */
> > +	/* rq->start_time */
> > +	rq->nr_phys_segments = 0;
> > +	/* rq->nr_hw_segments */
> > +	rq->ioprio = 0;
> > +	rq->special = NULL;
> > +	rq->buffer = NULL;
> ...
>
> Can we please just stop doing these one-by-one assignments, and just do
> something like
>
>	memset(rq, 0, sizeof(*rq));
>	rq->q = q;
>	rq->ref_count = 1;
>	INIT_HLIST_NODE(&rq->hash);
>	RB_CLEAR_NODE(&rq->rb_node);
>
> instead?
>
> The memset() is likely faster and smaller than one-by-one assignments
> anyway, even if the one-by-ones can avoid initializing some field or
> there ends up being a double initialization..

i definitely agree and do that for all code i write.

But if someone does item by item initialization for some crazy
performance reason (networking folks tend to have such constructs), it
should be done i think how i've done it in the patch: by systematically
listing _every_ field in the structure, in the same order, and
indicating it clearly when it is not initialized and why.

and there it already shows that we do not initialize a few other members
that could cause problems later on:

+	rq->data_len = 0;
+	/* rq->sense_len */
+	rq->data = NULL;
+	rq->sense = NULL;

why is sense_len not initialized - while data_len is?

In any case, these days the memclear instructions are dirt cheap and we
should just always initialize everything to zero by default, especially
if it's almost all zero-initialized anyway.

	Ingo
Re: [patch] block layer: kmemcheck fixes
On Thu, Feb 07 2008, Linus Torvalds wrote:
>
> On Thu, 7 Feb 2008, Ingo Molnar wrote:
> >	INIT_HLIST_NODE(&rq->hash);
> >	RB_CLEAR_NODE(&rq->rb_node);
> > -	rq->ioprio = 0;
> > -	rq->buffer = NULL;
> > -	rq->ref_count = 1;
> > -	rq->q = q;
> > -	rq->special = NULL;
> > -	rq->data_len = 0;
> > -	rq->data = NULL;
> > -	rq->nr_phys_segments = 0;
> > -	rq->sense = NULL;
> > -	rq->end_io = NULL;
> > -	rq->end_io_data = NULL;
> > -	rq->completion_data = NULL;
> > -	rq->next_rq = NULL;
> > +	rq->completion_data = NULL;
> > +	/* rq->elevator_private */
> > +	/* rq->elevator_private2 */
> > +	/* rq->rq_disk */
> > +	/* rq->start_time */
> > +	rq->nr_phys_segments = 0;
> > +	/* rq->nr_hw_segments */
> > +	rq->ioprio = 0;
> > +	rq->special = NULL;
> > +	rq->buffer = NULL;
> ...
>
> Can we please just stop doing these one-by-one assignments, and just do
> something like
>
>	memset(rq, 0, sizeof(*rq));
>	rq->q = q;
>	rq->ref_count = 1;
>	INIT_HLIST_NODE(&rq->hash);
>	RB_CLEAR_NODE(&rq->rb_node);
>
> instead?
>
> The memset() is likely faster and smaller than one-by-one assignments
> anyway, even if the one-by-ones can avoid initializing some field or
> there ends up being a double initialization..

I completely agree, the fidgeting with single members quickly loses
appeal. Ingo, I'll merge and integrate your fixes before leaving; I'll
be here all day tomorrow as well.

-- 
Jens Axboe
Re: [patch] block layer: kmemcheck fixes
On Thu, 7 Feb 2008, Ingo Molnar wrote:
>	INIT_HLIST_NODE(&rq->hash);
>	RB_CLEAR_NODE(&rq->rb_node);
> -	rq->ioprio = 0;
> -	rq->buffer = NULL;
> -	rq->ref_count = 1;
> -	rq->q = q;
> -	rq->special = NULL;
> -	rq->data_len = 0;
> -	rq->data = NULL;
> -	rq->nr_phys_segments = 0;
> -	rq->sense = NULL;
> -	rq->end_io = NULL;
> -	rq->end_io_data = NULL;
> -	rq->completion_data = NULL;
> -	rq->next_rq = NULL;
> +	rq->completion_data = NULL;
> +	/* rq->elevator_private */
> +	/* rq->elevator_private2 */
> +	/* rq->rq_disk */
> +	/* rq->start_time */
> +	rq->nr_phys_segments = 0;
> +	/* rq->nr_hw_segments */
> +	rq->ioprio = 0;
> +	rq->special = NULL;
> +	rq->buffer = NULL;
...

Can we please just stop doing these one-by-one assignments, and just do
something like

	memset(rq, 0, sizeof(*rq));
	rq->q = q;
	rq->ref_count = 1;
	INIT_HLIST_NODE(&rq->hash);
	RB_CLEAR_NODE(&rq->rb_node);

instead?

The memset() is likely faster and smaller than one-by-one assignments
anyway, even if the one-by-ones can avoid initializing some field or
there ends up being a double initialization..

		Linus
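Outside the kernel tree, the pattern Linus proposes looks roughly like
the sketch below. The stand-in node types and the struct layout are
invented so the example compiles on its own, and INIT_HLIST_NODE() /
RB_CLEAR_NODE() are approximated by directly setting the fields they
would set (in the kernel, RB_CLEAR_NODE() makes a node its own parent):

```c
#include <string.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's hlist/rbtree node types, just so
 * the sketch is self-contained. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct rb_node { unsigned long parent_color;
		 struct rb_node *left, *right; };

struct req {
	void *q;
	int ref_count;
	struct hlist_node hash;
	struct rb_node rb_node;
	int ioprio;
	void *sense;
};

/* One memset() for everything, then only the handful of fields whose
 * initial value is not all-zero. */
static void rq_init_sketch(void *q, struct req *rq)
{
	memset(rq, 0, sizeof(*rq));
	rq->q = q;
	rq->ref_count = 1;
	/* next/pprev are already NULL from the memset, which is what
	 * INIT_HLIST_NODE() would produce anyway. */
	rq->rb_node.parent_color =
		(unsigned long)&rq->rb_node; /* RB_CLEAR_NODE() analogue */
}
```

Note how the memset makes the hlist initialization redundant — exactly
the "double initialization" Linus says is acceptable for the sake of
simplicity.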
[patch] block layer: kmemcheck fixes
* Jens Axboe <[EMAIL PROTECTED]> wrote:

> [...] but may not post anything until after my vacation.

oh, you're going on a vacation. I am sitting on a few block layer
patches you might be interested in :-)

i am playing with Vegard Nossum's kmemcheck on x86 (with much help from
Pekka Enberg for the SLUB details) and it's getting quite promising.
Kmemcheck is in essence Valgrind for the native kernel - it is "Da Bomb"
in terms of kernel object lifetime and access validation debugging
helpers.

it promptly triggered a few uninitialized accesses in the block layer
(amongst others), resulting in the following 4 fixes (find them below):

  block: restructure rq_init() to make it safer
  block: fix access to uninitialized field
  block: initialize rq->tag
  block: rq->cmd* initialization

i _think_ the actual uninitialized memory accesses are only latent bugs,
not actual runtime bugs, because they relate to SCSI init sequences that
do not truly need the elevator's attention - but rq_init() sure looked a
bit dangerous in this regard, and the elv_next_request() access to those
uninitialized fields is not nice.

Do these fixes look good to you? I had them in testing for a few days
already.

	Ingo

->
Subject: block: restructure rq_init() to make it safer
From: Ingo Molnar <[EMAIL PROTECTED]>

reorder the initializations done in rq_init() to both align them to
memory order, and to document them better. They now match the field
ordering of "struct request" in blkdev.h.
No change in functionality:

   text	   data	    bss	    dec	    hex	filename
   8426	      0	    20844620fe	blk-core.o.before
   8426	      0	    20844620fe	blk-core.o.after

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 block/blk-core.c |   51 +++
 1 file changed, 35 insertions(+), 16 deletions(-)

Index: linux/block/blk-core.c
===================================================================
--- linux.orig/block/blk-core.c
+++ linux/block/blk-core.c
@@ -106,24 +106,43 @@ void rq_init(struct request_queue *q, st
 {
 	INIT_LIST_HEAD(&rq->queuelist);
 	INIT_LIST_HEAD(&rq->donelist);
-
-	rq->errors = 0;
-	rq->bio = rq->biotail = NULL;
+	rq->q = q;
+	/* rq->cmd_flags */
+	/* rq->cmd_type */
+	/* rq->sector */
+	/* rq->hard_sector */
+	/* rq->nr_sectors */
+	/* rq->hard_nr_sectors */
+	/* rq->current_nr_sectors */
+	/* rq->hard_cur_sectors */
+	rq->bio = NULL;
+	rq->biotail = NULL;
 	INIT_HLIST_NODE(&rq->hash);
 	RB_CLEAR_NODE(&rq->rb_node);
-	rq->ioprio = 0;
-	rq->buffer = NULL;
-	rq->ref_count = 1;
-	rq->q = q;
-	rq->special = NULL;
-	rq->data_len = 0;
-	rq->data = NULL;
-	rq->nr_phys_segments = 0;
-	rq->sense = NULL;
-	rq->end_io = NULL;
-	rq->end_io_data = NULL;
-	rq->completion_data = NULL;
-	rq->next_rq = NULL;
+	rq->completion_data = NULL;
+	/* rq->elevator_private */
+	/* rq->elevator_private2 */
+	/* rq->rq_disk */
+	/* rq->start_time */
+	rq->nr_phys_segments = 0;
+	/* rq->nr_hw_segments */
+	rq->ioprio = 0;
+	rq->special = NULL;
+	rq->buffer = NULL;
+	/* rq->tag */
+	rq->errors = 0;
+	rq->ref_count = 1;
+	/* rq->cmd_len */
+	/* rq->cmd[] */
+	rq->data_len = 0;
+	/* rq->sense_len */
+	rq->data = NULL;
+	rq->sense = NULL;
+	/* rq->timeout */
+	/* rq->retries */
+	rq->end_io = NULL;
+	rq->end_io_data = NULL;
+	rq->next_rq = NULL;
 }
 
 static void req_bio_endio(struct request *rq, struct bio *bio,

Subject: block: fix access to uninitialized field
From: Ingo Molnar <[EMAIL PROTECTED]>

kmemcheck caught the following uninitialized memory access in the block
layer:

 kmemcheck: Caught uninitialized read:
 EIP = c020e596 (elv_next_request+0x63/0x154), address f74985dc, size 32

 Pid: 1, comm: swapper Not tainted (2.6.24 #5)
 EIP: 0060:[] EFLAGS: 00010046 CPU: 0
 EIP is at elv_next_request+0x63/0x154
 EAX:  EBX: f74b83b0 ECX: 0002 EDX:
 ESI: f74985c0 EDI: f7c5bbf0 EBP: f7c5bc00 ESP: f7c5bbf0