Re: NULL deref around blkmq in v4.0-rc1–rc7
On Saturday 2015-04-11 19:46, Linus Torvalds wrote: >On Thu, Apr 9, 2015 at 5:07 PM, Linus Torvalds > wrote: >> On Thu, Apr 9, 2015 at 3:29 PM, Jan Engelhardt wrote: >>> >>> Yes, this seems to solve it for me. >> >> Can you humor me, and try that patch on top of a few different kernel >> versions that you had trouble with. > >Jan? I'm inclined to commit the patch for 4.0 even though it is late >(it's not like it can hurt), but since you seem to be the only one >seeing this issue, I'd really like you to double-check it.. Yeah, just commit. I'm currently "stuck" in the weekend. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On 04/11/2015 11:46 AM, Linus Torvalds wrote: On Thu, Apr 9, 2015 at 5:07 PM, Linus Torvalds wrote: On Thu, Apr 9, 2015 at 3:29 PM, Jan Engelhardt wrote: Yes, this seems to solve it for me. Can you humor me, and try that patch on top of a few different kernel versions that you had trouble with. Jan? I'm inclined to commit the patch for 4.0 even though it is late (it's not like it can hurt), but since you seem to be the only one seeing this issue, I'd really like you to double-check it.. I already committed for 4.1 (marked stable), but I'd be fine with adding it to this series as well obviously. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Thu, Apr 9, 2015 at 5:07 PM, Linus Torvalds wrote: > On Thu, Apr 9, 2015 at 3:29 PM, Jan Engelhardt wrote: >> >> Yes, this seems to solve it for me. > > Can you humor me, and try that patch on top of a few different kernel > versions that you had trouble with. Jan? I'm inclined to commit the patch for 4.0 even though it is late (it's not like it can hurt), but since you seem to be the only one seeing this issue, I'd really like you to double-check it.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Saturday 2015-04-11 19:46, Linus Torvalds wrote: On Thu, Apr 9, 2015 at 5:07 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Thu, Apr 9, 2015 at 3:29 PM, Jan Engelhardt jeng...@inai.de wrote: Yes, this seems to solve it for me. Can you humor me, and try that patch on top of a few different kernel versions that you had trouble with. Jan? I'm inclined to commit the patch for 4.0 even though it is late (it's not like it can hurt), but since you seem to be the only one seeing this issue, I'd really like you to double-check it.. Yeah, just commit. I'm currently stuck in the weekend. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Thu, Apr 9, 2015 at 5:07 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Thu, Apr 9, 2015 at 3:29 PM, Jan Engelhardt jeng...@inai.de wrote: Yes, this seems to solve it for me. Can you humor me, and try that patch on top of a few different kernel versions that you had trouble with. Jan? I'm inclined to commit the patch for 4.0 even though it is late (it's not like it can hurt), but since you seem to be the only one seeing this issue, I'd really like you to double-check it.. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On 04/11/2015 11:46 AM, Linus Torvalds wrote: On Thu, Apr 9, 2015 at 5:07 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Thu, Apr 9, 2015 at 3:29 PM, Jan Engelhardt jeng...@inai.de wrote: Yes, this seems to solve it for me. Can you humor me, and try that patch on top of a few different kernel versions that you had trouble with. Jan? I'm inclined to commit the patch for 4.0 even though it is late (it's not like it can hurt), but since you seem to be the only one seeing this issue, I'd really like you to double-check it.. I already committed for 4.1 (marked stable), but I'd be fine with adding it to this series as well obviously. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On 04/09/2015 03:12 PM, Linus Torvalds wrote: On Thu, Apr 9, 2015 at 11:24 AM, Jan Engelhardt wrote: It's fairly consistent (reproducible?). Only 1 in 15 or so (have not kept track really) attempts does it not die. With frame pointers: [] scsi_queue_rq+0x2e8/0x3d2 [] __blk_mq_run_hw_queue+0x19b/0x2a2 [] ? blk_mq_merge_queue_io+0x75/0x147 [] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [] blk_mq_run_hw_queue+0x4f/0x99 [] blk_sq_make_request+0x163/0x170 Ok, good. So the cmd comes from struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req); which in turn is just return (void *) rq + sizeof(*rq); which in turn is written by some crazy monkey on crack. That's some shit code. Why the hell you'd write it that way, when the natural thing to do would be just return rq + 1; without the sizeof, and without the cast. The particular crazy monkey on crack is Jens Axboe, in commit 320ae51feed5c. Jens, really. This code is shit. That's a bit rough on a single line of code like that, don't you think? But yes, rq + 1 is identical and cleaner... That ->sense_buffer thing is supposed to be initialized by the blk_mq_ops.init_request() function, which is called - if it exists = when the array of requests ('->rqs[]') is initialized. And that code too looks like crap. It seems to be very clever, trying to allocaet big contiguous chunks of RAM for the requests, but then the initialization sequence is questionable as hell. It takes that nonzeroed allocation, and zeroes a few fields randomly. The rest will contain whatever garbage data they used to. Does this entirely untested patch make any difference? And Jens, this all really looks very fishy. When I look at these kinds of core functions, and find just *stupid* code like this, it makes me unhappy. Not sure why it isn't all zeroed, definitely the saner thing to do at init time. So patch looks fine, should be applied regardless of whether or not it fixes this issue. But it's _always_ been like that, so it's not a change in behavior. It is fragile, so perhaps some SCSI change modified alloc behavior and it's not causing and issue. And if this is mpt, we recently ran into some list corruption issues due to a bug in the driver. It hit on reboot, but it was scan related, so could be a boot issue as well. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Thu, Apr 9, 2015 at 11:24 AM, Jan Engelhardt wrote: > > It's fairly consistent (reproducible?). Only 1 in 15 or so (have not kept > track > really) attempts does it not die. > > With frame pointers: > [] scsi_queue_rq+0x2e8/0x3d2 > [] __blk_mq_run_hw_queue+0x19b/0x2a2 > [] ? blk_mq_merge_queue_io+0x75/0x147 > [] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] > [] blk_mq_run_hw_queue+0x4f/0x99 > [] blk_sq_make_request+0x163/0x170 Ok, good. So the cmd comes from struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req); which in turn is just return (void *) rq + sizeof(*rq); which in turn is written by some crazy monkey on crack. That's some shit code. Why the hell you'd write it that way, when the natural thing to do would be just return rq + 1; without the sizeof, and without the cast. The particular crazy monkey on crack is Jens Axboe, in commit 320ae51feed5c. Jens, really. This code is shit. That ->sense_buffer thing is supposed to be initialized by the blk_mq_ops.init_request() function, which is called - if it exists = when the array of requests ('->rqs[]') is initialized. And that code too looks like crap. It seems to be very clever, trying to allocaet big contiguous chunks of RAM for the requests, but then the initialization sequence is questionable as hell. It takes that nonzeroed allocation, and zeroes a few fields randomly. The rest will contain whatever garbage data they used to. Does this entirely untested patch make any difference? And Jens, this all really looks very fishy. When I look at these kinds of core functions, and find just *stupid* code like this, it makes me unhappy. Anyway, I assume that the actual "disk" in question is mpt fusion? Linus block/blk-mq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index b7b8933ec241..33c428530193 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1457,7 +1457,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, do { page = alloc_pages_node(set->numa_node, - GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, + GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO, this_order); if (page) break; @@ -1479,8 +1479,6 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, left -= to_do * rq_size; for (j = 0; j < to_do; j++) { tags->rqs[i] = p; - tags->rqs[i]->atomic_flags = 0; - tags->rqs[i]->cmd_flags = 0; if (set->ops->init_request) { if (set->ops->init_request(set->driver_data, tags->rqs[i], hctx_idx, i,
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Thursday 2015-04-09 19:38, Linus Torvalds wrote: >> >> I reran bisect just to be sure. >> It now shows v4.0-rc1~9 is bad, v4.0-rc1~9^1 is ok, and v4.0-rc~9^2 is >> ok too. So this means that the combination of the both ~9 childs work >> badly together. > >Ok, that's just _odd_. >[...] >So I get the feeling that the oops you are seeing is likely not >consistent, and may depend on allocation patterns or similar. It's fairly consistent (reproducible?). Only 1 in 15 or so (have not kept track really) attempts does it not die. With frame pointers: BUG: unable to handle kernel paging request at 1000 IP: [] scsi_init_cmd_errh+0x2a/0x62 PGD 0 Oops: 0002 [#1] SMP Modules linked in: xfs crc32c_generic libcrc32c dm_crypt xts gf128mul algif_skcipher af_alg sd_mod mptsas scsi_transport_sas mptscsih mptbase dm_mod sg ipv6 CPU: 0 PID: 403 Comm: kworker/u2:1 Not tainted 4.0.0-rc7+ #55 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: 88007b686f60 ti: 88007bcb4000 task.ti: 88007bcb4000 RIP: 0010:[] [] scsi_init_cmd_errh+0x2a/0x62 RSP: 0018:88007bcb77a8 EFLAGS: 00010246 RAX: RBX: 88007bf8d800 RCX: 0018 RDX: 88007bf7ab70 RSI: RDI: 1000 RBP: 88007bcb77a8 R08: 88007beb9c40 R09: R10: R11: ea0001fe17c0 R12: 88007bf7ab70 R13: R14: 88007bf8d800 R15: 88007bf7aa00 FS: () GS:88007fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 1000 CR3: 7cb0d000 CR4: 07f0 Stack: 88007bcb7818 81286d59 88007b686f60 88007bc24000 88007bf7ab78 88007bf8d968 88007be56c00 88007bc24000 88007cbfb400 88007bcb7850 88007be56c08 88007bf7aa00 Call Trace: [] scsi_queue_rq+0x2e8/0x3d2 [] __blk_mq_run_hw_queue+0x19b/0x2a2 [] ? blk_mq_merge_queue_io+0x75/0x147 [] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [] blk_mq_run_hw_queue+0x4f/0x99 [] blk_sq_make_request+0x163/0x170 [] generic_make_request+0x97/0xd6 [] submit_bio+0x10d/0x12c [] ? __lru_cache_add+0x1e/0x3f [] mpage_bio_submit+0x25/0x2c [] mpage_readpages+0xf8/0x10c [] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [] xfs_vm_readpages+0x18/0x1a [xfs] [] __do_page_cache_readahead+0x137/0x1d3 [] ondemand_readahead+0x20a/0x21b [] page_cache_sync_readahead+0x38/0x3a [] generic_file_read_iter+0x191/0x4fb [] ? xfs_ilock+0x32/0x5d [xfs] [] xfs_file_read_iter+0x1c2/0x213 [xfs] [] new_sync_read+0x74/0x98 [] __vfs_read+0x14/0x3b [] vfs_read+0x74/0xc1 [] kernel_read+0x3c/0x4a [] prepare_binprm+0x117/0x11f [] do_execveat_common.isra.31+0x3b2/0x5d8 [] do_execve+0x27/0x29 [] call_usermodehelper+0x10a/0x138 [] ? call_usermodehelper+0x49/0x49 [] ret_from_fork+0x58/0x90 [] ? call_usermodehelper+0x49/0x49 Code: c3 55 48 89 fa 48 c7 87 b0 00 00 00 00 00 00 00 c7 87 f4 00 00 00 00 00 00 00 48 8b bf 10 01 00 00 31 c0 b9 18 00 00 00 48 89 e5 ab 66 83 ba cc 00 00 00 00 75 2a 48 8b 8a d8 00 00 00 8a 01 RIP [] scsi_init_cmd_errh+0x2a/0x62 RSP CR2: 1000 ---[ end trace fbec0fe487830b1d ]--- >and %rdi is 0x1000. It seems to be simply > > memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); > >where 'cmd->sense_buffer' has some insane value ("PAGE_SIZE" or just a >flipped bit, or whatever) Having been observed on two isolated different systems, I don't think so much that it would be a broken HW-induced bitflip. Oh yeah, if anybody likes, I can hand out the virtualbox image. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Thursday 2015-04-09 19:38, Linus Torvalds wrote: I reran bisect just to be sure. It now shows v4.0-rc1~9 is bad, v4.0-rc1~9^1 is ok, and v4.0-rc~9^2 is ok too. So this means that the combination of the both ~9 childs work badly together. Ok, that's just _odd_. [...] So I get the feeling that the oops you are seeing is likely not consistent, and may depend on allocation patterns or similar. It's fairly consistent (reproducible?). Only 1 in 15 or so (have not kept track really) attempts does it not die. With frame pointers: BUG: unable to handle kernel paging request at 1000 IP: [812853c9] scsi_init_cmd_errh+0x2a/0x62 PGD 0 Oops: 0002 [#1] SMP Modules linked in: xfs crc32c_generic libcrc32c dm_crypt xts gf128mul algif_skcipher af_alg sd_mod mptsas scsi_transport_sas mptscsih mptbase dm_mod sg ipv6 CPU: 0 PID: 403 Comm: kworker/u2:1 Not tainted 4.0.0-rc7+ #55 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: 88007b686f60 ti: 88007bcb4000 task.ti: 88007bcb4000 RIP: 0010:[812853c9] [812853c9] scsi_init_cmd_errh+0x2a/0x62 RSP: 0018:88007bcb77a8 EFLAGS: 00010246 RAX: RBX: 88007bf8d800 RCX: 0018 RDX: 88007bf7ab70 RSI: RDI: 1000 RBP: 88007bcb77a8 R08: 88007beb9c40 R09: R10: R11: ea0001fe17c0 R12: 88007bf7ab70 R13: R14: 88007bf8d800 R15: 88007bf7aa00 FS: () GS:88007fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 1000 CR3: 7cb0d000 CR4: 07f0 Stack: 88007bcb7818 81286d59 88007b686f60 88007bc24000 88007bf7ab78 88007bf8d968 88007be56c00 88007bc24000 88007cbfb400 88007bcb7850 88007be56c08 88007bf7aa00 Call Trace: [81286d59] scsi_queue_rq+0x2e8/0x3d2 [8119e64d] __blk_mq_run_hw_queue+0x19b/0x2a2 [8119e901] ? blk_mq_merge_queue_io+0x75/0x147 [a00fa34a] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [8119edeb] blk_mq_run_hw_queue+0x4f/0x99 [8119fab9] blk_sq_make_request+0x163/0x170 [81196a8b] generic_make_request+0x97/0xd6 [81196bd7] submit_bio+0x10d/0x12c [810d5e15] ? __lru_cache_add+0x1e/0x3f [81142af5] mpage_bio_submit+0x25/0x2c [8114387d] mpage_readpages+0xf8/0x10c [a00fa34a] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [a00f9c45] xfs_vm_readpages+0x18/0x1a [xfs] [810d4e5c] __do_page_cache_readahead+0x137/0x1d3 [810d5102] ondemand_readahead+0x20a/0x21b [810d5262] page_cache_sync_readahead+0x38/0x3a [810cd1c5] generic_file_read_iter+0x191/0x4fb [a010b2cf] ? xfs_ilock+0x32/0x5d [xfs] [a01023c2] xfs_file_read_iter+0x1c2/0x213 [xfs] [81118e63] new_sync_read+0x74/0x98 [81119aef] __vfs_read+0x14/0x3b [81119b8a] vfs_read+0x74/0xc1 [8111d977] kernel_read+0x3c/0x4a [8111dbd2] prepare_binprm+0x117/0x11f [8111f10d] do_execveat_common.isra.31+0x3b2/0x5d8 [8111f35a] do_execve+0x27/0x29 [81050e07] call_usermodehelper+0x10a/0x138 [81050cfd] ? call_usermodehelper+0x49/0x49 [8133b3d8] ret_from_fork+0x58/0x90 [81050cfd] ? call_usermodehelper+0x49/0x49 Code: c3 55 48 89 fa 48 c7 87 b0 00 00 00 00 00 00 00 c7 87 f4 00 00 00 00 00 00 00 48 8b bf 10 01 00 00 31 c0 b9 18 00 00 00 48 89 e5 f3 ab 66 83 ba cc 00 00 00 00 75 2a 48 8b 8a d8 00 00 00 8a 01 RIP [812853c9] scsi_init_cmd_errh+0x2a/0x62 RSP 88007bcb77a8 CR2: 1000 ---[ end trace fbec0fe487830b1d ]--- and %rdi is 0x1000. It seems to be simply memset(cmd-sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); where 'cmd-sense_buffer' has some insane value (PAGE_SIZE or just a flipped bit, or whatever) Having been observed on two isolated different systems, I don't think so much that it would be a broken HW-induced bitflip. Oh yeah, if anybody likes, I can hand out the virtualbox image. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On 04/09/2015 03:12 PM, Linus Torvalds wrote: On Thu, Apr 9, 2015 at 11:24 AM, Jan Engelhardt jeng...@inai.de wrote: It's fairly consistent (reproducible?). Only 1 in 15 or so (have not kept track really) attempts does it not die. With frame pointers: [81286d59] scsi_queue_rq+0x2e8/0x3d2 [8119e64d] __blk_mq_run_hw_queue+0x19b/0x2a2 [8119e901] ? blk_mq_merge_queue_io+0x75/0x147 [a00fa34a] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [8119edeb] blk_mq_run_hw_queue+0x4f/0x99 [8119fab9] blk_sq_make_request+0x163/0x170 Ok, good. So the cmd comes from struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req); which in turn is just return (void *) rq + sizeof(*rq); which in turn is written by some crazy monkey on crack. That's some shit code. Why the hell you'd write it that way, when the natural thing to do would be just return rq + 1; without the sizeof, and without the cast. The particular crazy monkey on crack is Jens Axboe, in commit 320ae51feed5c. Jens, really. This code is shit. That's a bit rough on a single line of code like that, don't you think? But yes, rq + 1 is identical and cleaner... That -sense_buffer thing is supposed to be initialized by the blk_mq_ops.init_request() function, which is called - if it exists = when the array of requests ('-rqs[]') is initialized. And that code too looks like crap. It seems to be very clever, trying to allocaet big contiguous chunks of RAM for the requests, but then the initialization sequence is questionable as hell. It takes that nonzeroed allocation, and zeroes a few fields randomly. The rest will contain whatever garbage data they used to. Does this entirely untested patch make any difference? And Jens, this all really looks very fishy. When I look at these kinds of core functions, and find just *stupid* code like this, it makes me unhappy. Not sure why it isn't all zeroed, definitely the saner thing to do at init time. So patch looks fine, should be applied regardless of whether or not it fixes this issue. But it's _always_ been like that, so it's not a change in behavior. It is fragile, so perhaps some SCSI change modified alloc behavior and it's not causing and issue. And if this is mpt, we recently ran into some list corruption issues due to a bug in the driver. It hit on reboot, but it was scan related, so could be a boot issue as well. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL deref around blkmq in v4.0-rc1–rc7
On Thu, Apr 9, 2015 at 11:24 AM, Jan Engelhardt jeng...@inai.de wrote: It's fairly consistent (reproducible?). Only 1 in 15 or so (have not kept track really) attempts does it not die. With frame pointers: [81286d59] scsi_queue_rq+0x2e8/0x3d2 [8119e64d] __blk_mq_run_hw_queue+0x19b/0x2a2 [8119e901] ? blk_mq_merge_queue_io+0x75/0x147 [a00fa34a] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs] [8119edeb] blk_mq_run_hw_queue+0x4f/0x99 [8119fab9] blk_sq_make_request+0x163/0x170 Ok, good. So the cmd comes from struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req); which in turn is just return (void *) rq + sizeof(*rq); which in turn is written by some crazy monkey on crack. That's some shit code. Why the hell you'd write it that way, when the natural thing to do would be just return rq + 1; without the sizeof, and without the cast. The particular crazy monkey on crack is Jens Axboe, in commit 320ae51feed5c. Jens, really. This code is shit. That -sense_buffer thing is supposed to be initialized by the blk_mq_ops.init_request() function, which is called - if it exists = when the array of requests ('-rqs[]') is initialized. And that code too looks like crap. It seems to be very clever, trying to allocaet big contiguous chunks of RAM for the requests, but then the initialization sequence is questionable as hell. It takes that nonzeroed allocation, and zeroes a few fields randomly. The rest will contain whatever garbage data they used to. Does this entirely untested patch make any difference? And Jens, this all really looks very fishy. When I look at these kinds of core functions, and find just *stupid* code like this, it makes me unhappy. Anyway, I assume that the actual disk in question is mpt fusion? Linus block/blk-mq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index b7b8933ec241..33c428530193 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1457,7 +1457,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, do { page = alloc_pages_node(set-numa_node, - GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, + GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO, this_order); if (page) break; @@ -1479,8 +1479,6 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, left -= to_do * rq_size; for (j = 0; j to_do; j++) { tags-rqs[i] = p; - tags-rqs[i]-atomic_flags = 0; - tags-rqs[i]-cmd_flags = 0; if (set-ops-init_request) { if (set-ops-init_request(set-driver_data, tags-rqs[i], hctx_idx, i,