Re: usercopy whitelist woe in scsi_sense_cache

2018-04-21 Thread Paolo Valente


> On 20 Apr 2018, at 22:23, Kees Cook wrote:
> 
> On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente  
> wrote:
>> I'm missing something here.  When the request gets completed in the
>> first place, the hook bfq_finish_requeue_request gets called, and that
>> hook clears both ->elv.priv elements (as the request has a non-null
>> elv.icq).  So, when bfq gets the same request again, those elements
>> must be NULL.  What am I getting wrong?
>> 
>> I have some more concern on this point, but I'll stick to this for the
>> moment, to not create more confusion.
> 
> I don't know the "how", I only found the "what". :)

Got it, although I think you did much more than that ;)

Anyway, my reply was exactly to Jens' detailed description of the
how, and my concern is that there seems to be an inconsistency in
that description.  In addition, Jens is proposing a patch based on
that description.  If this inconsistency is not resolved, that patch
may eliminate the symptom at hand, but it may not fix the real cause,
and may even bury it deeper.
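
For reference, the clearing I mentioned above happens at the end of
BFQ's finish hook; schematically, it looks like this (a simplified
sketch of bfq_finish_requeue_request, not the verbatim code):

static void bfq_finish_requeue_request(struct request *rq)
{
	/* (statistics update and bfq_queue cleanup elided) */

	/*
	 * The request is done or requeued: clear both elevator-private
	 * pointers, so that a later reuse of this rq cannot see stale
	 * bfqq/bic values.
	 */
	rq->elv.priv[0] = NULL;
	rq->elv.priv[1] = NULL;
}

If both pointers are always cleared here, then a non-NULL priv
element on re-allocation should be impossible; that is exactly the
inconsistency I am worried about.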

> If you want, grab
> the reproducer VM linked to earlier in this thread; it'll hit the
> problem within about 30 seconds of running the reproducer.
> 

Yep.  Actually, I've been investigating this kind of failure, in
different incarnations, for months now.  Other examples in this
respect are the srp-test failures reported by Bart, e.g., here [1].
According to my analysis, the cause of the problem is somewhere in
blk-mq, outside bfq.  Unfortunately, I haven't managed to find exactly
where it is, mainly because of my limited expertise on blk-mq
internals, so I have asked Jens, Mike and anyone else knowledgeable
for help and suggestions.  Those requests probably got lost somewhere
in those threads, but your results, Kees, and the analysis that
followed from Jens now seem to be carrying us toward the solution of
this not-so-recent issue.  Time will tell.


Thanks,
Paolo

[1] https://www.spinics.net/lists/linux-block/msg22760.html


> -Kees
> 
> -- 
> Kees Cook
> Pixel Security



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-19 Thread Paolo Valente


> On 18 Apr 2018, at 16:30, Jens Axboe wrote:
> 
> On 4/18/18 3:08 AM, Paolo Valente wrote:
>> 
>> 
>>> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>>> 
>>> On 4/17/18 3:48 PM, Jens Axboe wrote:
>>>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
>>>>>> On 4/17/18 3:25 PM, Kees Cook wrote:
>>>>>>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  
>>>>>>> wrote:
>>>>>>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>>>>>>> there is some kind of uninitialized-but-not-NULL state that can leak
>>>>>>>> in there?
>>>>>>> 
>>>>>>> Got it. This fixes it for me:
>>>>>>> 
>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>>>> index 0dc9e341c2a7..859df3160303 100644
>>>>>>> --- a/block/blk-mq.c
>>>>>>> +++ b/block/blk-mq.c
>>>>>>> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
>>>>>>> request_queue *q,
>>>>>>> 
>>>>>>>   rq = blk_mq_rq_ctx_init(data, tag, op);
>>>>>>>   if (!op_is_flush(op)) {
>>>>>>> -   rq->elv.icq = NULL;
>>>>>>> +   memset(&rq->elv, 0, sizeof(rq->elv));
>>>>>>>   if (e && e->type->ops.mq.prepare_request) {
>>>>>>>   if (e->type->icq_cache && rq_ioc(bio))
>>>>>>>   blk_mq_sched_assign_ioc(rq, bio);
>>>>>>> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>>>>>>>   e->type->ops.mq.finish_request(rq);
>>>>>>>   if (rq->elv.icq) {
>>>>>>>   put_io_context(rq->elv.icq->ioc);
>>>>>>> -   rq->elv.icq = NULL;
>>>>>>> +   memset(&rq->elv, 0, sizeof(rq->elv));
>>>>>>>   }
>>>>>>>   }
>>>>>> 
>>>>>> This looks like a BFQ problem, this should not be necessary. Paolo,
>>>>>> you're calling your own prepare request handler from the insert
>>>>>> as well, and your prepare request does nothing if rq->elv.icq == NULL.
>>>>> 
>>>>> I sent the patch anyway, since it's kind of a robustness improvement,
>>>>> I'd hope. If you fix BFQ also, please add:
>>>> 
>>>> It's also a memset() in the hot path, would prefer to avoid that...
>>>> The issue here is really the convoluted bfq usage of insert/prepare,
>>>> I'm sure Paolo can take it from here.
>>> 
>> 
>> Hi,
>> I'm very sorry for tuning in so late but, at the same time, very
>> glad to find the problem probably already solved ;) (my delay was
>> not intentional, I swear)
>> 
>>> Does this fix it?
>>> 
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index f0ecd98509d8..d883469a1582 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -4934,8 +4934,11 @@ static void bfq_prepare_request(struct request *rq, 
>>> struct bio *bio)
>>> bool new_queue = false;
>>> bool bfqq_already_existing = false, split = false;
>>> 
>>> -   if (!rq->elv.icq)
>>> +   if (!rq->elv.icq) {
>>> +   rq->elv.priv[0] = rq->elv.priv[1] = NULL;
>>> return;
>>> +   }
>>> +
>> 
>> This does solve the problem at hand.  But it also raises a question,
>> related to a possible subtle bug.
>> 
>> For BFQ, !rq->elv.icq basically means "this request is not for me, as
>> I am an icq-based scheduler".  But, IIUC the main points in this
>> thread, that assumption is false.  If it is actually false, then I
>> hope that all requests with !rq->elv.icq that are sent to BFQ do
>> satisfy the condition (at_head || blk_rq_is_passthrough(rq)).  In
>> fact, requests that do not satisfy that condition are exactly those
>> that BFQ must put in a bfq_queue.  So, even if this patch makes the
>> crash disappear, we drive BFQ completely crazy (and we may expect
>> other strange failures) if we send BFQ a request with !(at_head ||
>> blk_rq_is_passthrough(rq)) and !rq->elv.icq: BFQ has to put that rq
>> into a bfq_queue, but simply cannot.

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-18 Thread Paolo Valente


> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
> 
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
>>>> On 4/17/18 3:25 PM, Kees Cook wrote:
>>>>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
>>>>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>>>>> there is some kind of uninitialized-but-not-NULL state that can leak
>>>>>> in there?
>>>>> 
>>>>> Got it. This fixes it for me:
>>>>> 
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 0dc9e341c2a7..859df3160303 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
>>>>> request_queue *q,
>>>>> 
>>>>>   rq = blk_mq_rq_ctx_init(data, tag, op);
>>>>>   if (!op_is_flush(op)) {
>>>>> -   rq->elv.icq = NULL;
>>>>> +   memset(&rq->elv, 0, sizeof(rq->elv));
>>>>>   if (e && e->type->ops.mq.prepare_request) {
>>>>>   if (e->type->icq_cache && rq_ioc(bio))
>>>>>   blk_mq_sched_assign_ioc(rq, bio);
>>>>> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>>>>>   e->type->ops.mq.finish_request(rq);
>>>>>   if (rq->elv.icq) {
>>>>>   put_io_context(rq->elv.icq->ioc);
>>>>> -   rq->elv.icq = NULL;
>>>>> +   memset(&rq->elv, 0, sizeof(rq->elv));
>>>>>   }
>>>>>   }
>>>> 
>>>> This looks like a BFQ problem, this should not be necessary. Paolo,
>>>> you're calling your own prepare request handler from the insert
>>>> as well, and your prepare request does nothing if rq->elv.icq == NULL.
>>> 
>>> I sent the patch anyway, since it's kind of a robustness improvement,
>>> I'd hope. If you fix BFQ also, please add:
>> 
>> It's also a memset() in the hot path, would prefer to avoid that...
>> The issue here is really the convoluted bfq usage of insert/prepare,
>> I'm sure Paolo can take it from here.
> 

Hi,
I'm very sorry for tuning in so late but, at the same time, very
glad to find the problem probably already solved ;) (my delay was
not intentional, I swear)

> Does this fix it?
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index f0ecd98509d8..d883469a1582 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4934,8 +4934,11 @@ static void bfq_prepare_request(struct request *rq, 
> struct bio *bio)
>   bool new_queue = false;
>   bool bfqq_already_existing = false, split = false;
> 
> - if (!rq->elv.icq)
> + if (!rq->elv.icq) {
> + rq->elv.priv[0] = rq->elv.priv[1] = NULL;
>   return;
> + }
> +

This does solve the problem at hand.  But it also raises a question,
related to a possible subtle bug.

For BFQ, !rq->elv.icq basically means "this request is not for me, as
I am an icq-based scheduler".  But, IIUC the main points in this
thread, that assumption is false.  If it is actually false, then I
hope that all requests with !rq->elv.icq that are sent to BFQ do
satisfy the condition (at_head || blk_rq_is_passthrough(rq)).  In
fact, requests that do not satisfy that condition are exactly those
that BFQ must put in a bfq_queue.  So, even if this patch makes the
crash disappear, we drive BFQ completely crazy (and we may expect
other strange failures) if we send BFQ a request with !(at_head ||
blk_rq_is_passthrough(rq)) and !rq->elv.icq: BFQ has to put that rq
into a bfq_queue, but simply cannot.
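
Just to make that assumption checkable, one could add something like
the following at the top of bfq_insert_request (only a sketch, not
even compile-tested):

static void bfq_insert_request(struct blk_mq_hw_ctx *hctx,
			       struct request *rq, bool at_head)
{
	/*
	 * Sketch: a request without an icq must be a passthrough or
	 * an at_head insertion; anything else would need a bfq_queue,
	 * which BFQ cannot provide without an icq.
	 */
	WARN_ON_ONCE(!rq->elv.icq &&
		     !(at_head || blk_rq_is_passthrough(rq)));

	/* ... rest of bfq_insert_request unchanged ... */
}

If this warning ever fired, we would have proof that such requests do
reach BFQ.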

Jens, or anyone else, could you please shed some light on this and
explain how things actually stand?

Thanks,
Paolo

>   bic = icq_to_bic(rq->elv.icq);
> 
>   spin_lock_irq(&bfqd->lock);
> 
> -- 
> Jens Axboe



Re: [PATCH] blk-mq: Clear out elevator private data

2018-04-18 Thread Paolo Valente


> On 17 Apr 2018, at 23:42, Kees Cook wrote:
> 
> Some elevators may not correctly check rq->rq_flags & RQF_ELVPRIV, and
> may attempt to read rq->elv fields. When requests got reused, this
> caused BFQ to think it already had a bfqq (rq->elv.priv[1]) allocated.

Hi Kees,
where does BFQ get confused and operate on a request not destined to
it?  I'm asking because I took care to always avoid exactly this kind
of mistake.

Thanks,
Paolo

> This could lead to odd behaviors like having the sense buffer address
> slowly start incrementing. This eventually tripped HARDENED_USERCOPY
> and KASAN.
> 
> This patch wipes all of rq->elv instead of just rq->elv.icq. While
> it shouldn't technically be needed, this ends up being a robustness
> improvement that should lead to at least finding bugs in elevators faster.
> 
> Reported-by: Oleksandr Natalenko 
> Fixes: bd166ef183c26 ("blk-mq-sched: add framework for MQ capable IO 
> schedulers")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
> In theory, BFQ needs to also check the RQF_ELVPRIV flag, but I'll leave that
> to Paolo to figure out. Also, my Fixes line is kind of a best-guess. This
> is where icq was originally wiped, so it seemed as good a commit as any.
> ---
> block/blk-mq.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..859df3160303 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct 
> request_queue *q,
> 
>   rq = blk_mq_rq_ctx_init(data, tag, op);
>   if (!op_is_flush(op)) {
> - rq->elv.icq = NULL;
> + memset(&rq->elv, 0, sizeof(rq->elv));
>   if (e && e->type->ops.mq.prepare_request) {
>   if (e->type->icq_cache && rq_ioc(bio))
>   blk_mq_sched_assign_ioc(rq, bio);
> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>   e->type->ops.mq.finish_request(rq);
>   if (rq->elv.icq) {
>   put_io_context(rq->elv.icq->ioc);
> - rq->elv.icq = NULL;
> + memset(&rq->elv, 0, sizeof(rq->elv));
>   }
>   }
> 
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security
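
For reference, the rq->elv being wiped here is the elevator-private
area of struct request, and it shares a union with the flush fields
(abridged from the 4.16-era include/linux/blkdev.h; double-check
against the exact tree):

	/*
	 * Three pointers are available for the IO schedulers; flush
	 * requests are never put on the IO scheduler, so the flush
	 * fields share space with the elevator data.
	 */
	union {
		struct {
			struct io_cq		*icq;
			void			*priv[2];
		} elv;

		struct {
			unsigned int		seq;
			struct list_head	list;
			rq_end_io_fn		*saved_end_io;
		} flush;
	};

This union is one plausible source of uninitialized-but-not-NULL
leftovers: a request that last went through the flush path can leave
non-NULL bits behind exactly where elv.icq and elv.priv[] live.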

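As for the RQF_ELVPRIV check suggested above: blk_mq_get_request()
sets that flag only after ->prepare_request() has run, so it marks
exactly the requests whose rq->elv contents the elevator itself
initialized.  A minimal helper on the BFQ side might look like this
(a hypothetical sketch, untested; the helper name is made up):

	/*
	 * Return the bfq_queue set by bfq_prepare_request(), or NULL
	 * if this request was never prepared and rq->elv may hold
	 * stale data.
	 */
	static struct bfq_queue *bfq_rq_bfqq_checked(struct request *rq)
	{
		if (!(rq->rq_flags & RQF_ELVPRIV))
			return NULL;

		return rq->elv.priv[1];
	}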


Re: General protection fault with use_blk_mq=1.

2018-03-28 Thread Paolo Valente


> On 29 Mar 2018, at 05:22, Jens Axboe wrote:
> 
> On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>> On 03/28/2018 06:02 PM, Jens Axboe wrote:
>>> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>>>> I am not subscribed to any of the lists on the To list here, please CC
>>>> me on any replies.
>>>> 
>>>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>>>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1.
>>>> 
>>>> The crash looks like:
>>>> 
>> 
>>>> 
>>>> Looking through the code, I'd guess that this is dying inside
>>>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>>>> is pointing at.
>>> 
>>> Leaving the whole thing here for Paolo - it's crashing off insertion of
>>> a request coming out of SG_IO. Don't think we've seen this BFQ failure
>>> case before.
>>> 
>>> You can mitigate this by switching the scsi-mq devices to mq-deadline
>>> instead.
>>> 
>> 
>> I'm thinking that I should also be able to mitigate it by disabling
>> CONFIG_DEBUG_BLK_CGROUP.
>> 
>> That should remove that entire chunk of code.
>> 
>> Of course, that won't help if this is actually a symptom of a bigger
>> problem.
> 
> Yes, it's not a given that it will fully mask the issue at hand. But
> turning off BFQ has a much higher chance of working for you.
> 
> This time actually CC'ing Paolo.
> 

Hi Zephaniah,
if you are actually interested in the benefits of BFQ (low latency,
high responsiveness, fairness, ...), then it may be worth trying what
you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP.  All the more
so because this option enables the heavy computation of debug cgroup
statistics, which you probably don't use.
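
For context, the statistics update that crashes is compiled only
under that option; the pattern in block/bfq-cgroup.c is roughly the
following (simplified from memory, so take it as a sketch):

#ifdef CONFIG_DEBUG_BLK_CGROUP
/* Full per-cgroup debug statistics: this is the path that ends up in
 * blkg_rwstat_add()/percpu_counter_add_batch() in your trace. */
void bfqg_stats_update_io_add(struct bfq_group *bfqg,
			      struct bfq_queue *bfqq, unsigned int op)
{
	blkg_rwstat_add(&bfqg->stats.queued, op, 1);
	bfqg_stats_end_empty_time(&bfqg->stats);
	/* (further debug-only accounting elided) */
}
#else
static inline void bfqg_stats_update_io_add(struct bfq_group *bfqg,
					    struct bfq_queue *bfqq,
					    unsigned int op) { }
#endif

With the option off, the whole call collapses to a no-op, and that
entire chunk of code disappears from your kernel.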

In addition, the outcome of your attempt without
CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
- if no failure occurs, then the issue is likely to be confined to
that debugging code (which, on the bright side, is likely to be of
interest to only a handful of developers);
- if the issue still shows up, then we get new hints on this odd
failure.

Finally, consider that this issue has been reported to disappear as
of 4.16 [1] and, as a plus, that the service quality of BFQ got a
further boost precisely with 4.16.

Looking forward to your feedback, in case you try BFQ without
CONFIG_DEBUG_BLK_CGROUP,
Paolo

[1] https://www.spinics.net/lists/linux-block/msg21422.html

> 
> -- 
> Jens Axboe



Re: General protection fault with use_blk_mq=1.

2018-03-28 Thread Paolo Valente


> On 29 Mar 2018, at 03:02, Jens Axboe wrote:
> 
> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>> I am not subscribed to any of the lists on the To list here, please CC
>> me on any replies.
>> 
>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1.
>> 
>> The crash looks like:
>> 
>> [ 5466.075993] general protection fault:  [#1] PREEMPT SMP PTI
>> [ 5466.075997] Modules linked in: esp4 xfrm4_mode_tunnel fuse usblp
>> uvcvideo pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
>> ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4
>> xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_rapl
>> joydev serio_raw wmi_bmof iwldvm iwlwifi shpchp kvm_intel kvm irqbypass
>> autofs4 algif_skcipher nls_iso8859_1 nls_cp437 crc32_pclmul
>> ghash_clmulni_intel
>> [ 5466.076022] CPU: 3 PID: 10573 Comm: pool Tainted: G   O
>> 4.15.13-f1-dirty #148
>> [ 5466.076024] Hardware name: Hewlett-Packard HP EliteBook Folio
>> 9470m/18DF, BIOS 68IBD Ver. F.44 05/22/2013
>> [ 5466.076029] RIP: 0010:percpu_counter_add_batch+0x2b/0xb0
>> [ 5466.076031] RSP: 0018:a556c47afb58 EFLAGS: 00010002
>> [ 5466.076033] RAX: 95cda87ce018 RBX: 95cda87cdb68 RCX:
>> 
>> [ 5466.076034] RDX: 3fff RSI: 896495c4 RDI:
>> 895b2bed
>> [ 5466.076036] RBP: 3fff R08:  R09:
>> 95cb7d5f8148
>> [ 5466.076037] R10: 0200 R11:  R12:
>> 0001
>> [ 5466.076038] R13: 95cda87ce088 R14: 95cda6ebd100 R15:
>> a556c47afc58
>> [ 5466.076040] FS:  7f25f5305700() GS:95cdbeac()
>> knlGS:
>> [ 5466.076042] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 5466.076043] CR2: 7f25e807e0a8 CR3: 0003ed5a6001 CR4:
>> 001606e0
>> [ 5466.076044] Call Trace:
>> [ 5466.076050]  bfqg_stats_update_io_add+0x58/0x100
>> [ 5466.076055]  bfq_insert_requests+0xec/0xd80
>> [ 5466.076059]  ? blk_rq_append_bio+0x8f/0xa0
>> [ 5466.076061]  ? blk_rq_map_user_iov+0xc3/0x1d0
>> [ 5466.076065]  blk_mq_sched_insert_request+0xa3/0x130
>> [ 5466.076068]  blk_execute_rq+0x3a/0x50
>> [ 5466.076070]  sg_io+0x197/0x3e0
>> [ 5466.076073]  ? dput+0xca/0x210
>> [ 5466.076077]  ? mntput_no_expire+0x11/0x1a0
>> [ 5466.076079]  scsi_cmd_ioctl+0x289/0x400
>> [ 5466.076082]  ? filename_lookup+0xe1/0x170
>> [ 5466.076085]  sd_ioctl+0xc7/0x1a0
>> [ 5466.076088]  blkdev_ioctl+0x4d4/0x8c0
>> [ 5466.076091]  block_ioctl+0x39/0x40
>> [ 5466.076094]  do_vfs_ioctl+0x92/0x5e0
>> [ 5466.076097]  ? __fget+0x73/0xc0
>> [ 5466.076099]  SyS_ioctl+0x74/0x80
>> [ 5466.076102]  do_syscall_64+0x60/0x110
>> [ 5466.076106]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> [ 5466.076109] RIP: 0033:0x7f25f75fef47
>> [ 5466.076110] RSP: 002b:7f25f53049a8 EFLAGS: 0246 ORIG_RAX:
>> 0010
>> [ 5466.076112] RAX: ffda RBX: 000c RCX:
>> 7f25f75fef47
>> [ 5466.076114] RDX: 7f25f53049b0 RSI: 2285 RDI:
>> 000c
>> [ 5466.076115] RBP: 0010 R08: 7f25e8007818 R09:
>> 0200
>> [ 5466.076116] R10: 0001 R11: 0246 R12:
>> 
>> [ 5466.076118] R13:  R14: 7f25f8a6b5e0 R15:
>> 7f25e80173e0
>> [ 5466.076120] Code: 41 55 49 89 fd bf 01 00 00 00 41 54 49 89 f4 55 89
>> d5 53 e8 18 e1 bb ff 48 c7 c7 c4 95 64 89 e8 dc e9 fb ff 49 8b 45 20 48
>> 63 d5 <65> 8b 18 48 63 db 4c 01 e3 48 39 d3 7d 0a f7 dd 48 63 ed 48 39
>> [ 5466.076147] RIP: percpu_counter_add_batch+0x2b/0xb0 RSP: a556c47afb58
>> [ 5466.076149] ---[ end trace 8d7eb80aafef4494 ]---
>> [ 5466.670153] note: pool[10573] exited with preempt_count 2
>> 
>> (I only have the one instance right this minute as a result of not
>> having remote syslog setup before now.)
>> 
>> This is clearly deep in the blk_mq code, and it goes away when I remove
>> the use_blk_mq kernel command line parameters.
>> 
>> My next obvious step is to try and disable the load of the vbox modules.
>> 
>> I can include the full dmesg output if it would be helpful.
>> 
>> The system is an older HP Ultrabook, and the root partition is sda1 (an
>> SSD) -> a LUKS encrypted partition -> LVM -> BTRFS.
>> 
>> The kernel is a stock 4.15.11, however I only recently added the blk_mq
>> options, so while I can state that I have seen this on multiple kernels
>> in the 4.15.x series, I have not tested earlier kernels in this
>> configuration.
>> 
>> Looking through the code, I'd guess that this is dying inside
>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>> is pointing at.
> 
> Leaving the whole thing here for Paolo - it's crashing off insertion of
> a request coming out of SG_IO. Don't think we've seen this BFQ failure
> case before.
> 

Actually, we have.  Found and reported by Ming ab

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-10-10 Thread Paolo Valente

> On 10 Oct 2017, at 14:34, Johannes Thumshirn wrote:
> 
> Hi John,
> 
> On Tue, Oct 10, 2017 at 01:24:52PM +0100, John Garry wrote:
>> It's using cfq (for non-mq) and mq-deadline (obviously for mq).
> 
> Please be aware that cfq and mq-deadline are _not_ comparable; for a realistic
> comparison please use deadline and mq-deadline, or cfq and bfq.
> 

Please set low_latency=0 for bfq if yours is just a maximum-throughput test.

Thanks,
Paolo

>> root@(none)$ pwd
>> /sys/devices/platform/HISI0162:01/host0/port-0:0/expander-0:0/port-0:0:7/end_device-0:0:7
>> root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
>> noop [cfq]
> 
> Maybe missing CONFIG_IOSCHED_DEADLINE?
> 
> Thanks,
>   Johannes
> 
> -- 
> Johannes Thumshirn  Storage
> jthumsh...@suse.de+49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850