Re: usercopy whitelist woe in scsi_sense_cache

2018-04-20 Thread Oleksandr Natalenko

Hi.

On 20.04.2018 22:23, Kees Cook wrote:

I don't know the "how", I only found the "what". :) If you want, grab
the reproducer VM linked to earlier in this thread; it'll hit the
problem within about 30 seconds of running the reproducer.


Just to avoid a possible confusion I should note that I've removed the 
reproducer from my server, but I can re-upload it if needed.


--
  Oleksandr Natalenko (post-factum)


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko

Hi.

17.04.2018 23:47, Kees Cook wrote:

I sent the patch anyway, since it's kind of a robustness improvement,
I'd hope. If you fix BFQ also, please add:

Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name>
Root-caused-by: Kees Cook <keesc...@chromium.org>

:) I gotta task-switch to other things!

Thanks for the pointers, and thank you Oleksandr for providing the 
reproducer!


That was great fun to read. Thank you for digging into it.

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko

Hi.

17.04.2018 05:12, Kees Cook wrote:
Turning off HARDENED_USERCOPY and turning on KASAN, I see the same 
report:


[   38.274106] BUG: KASAN: slab-out-of-bounds in _copy_to_user+0x42/0x60
[   38.274841] Read of size 22 at addr 8800122b8c4b by task smartctl/1064
[   38.275630]
[   38.275818] CPU: 2 PID: 1064 Comm: smartctl Not tainted 4.17.0-rc1-ARCH+ #266

[   38.276631] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   38.277690] Call Trace:
[   38.277988]  dump_stack+0x71/0xab
[   38.278397]  ? _copy_to_user+0x42/0x60
[   38.278833]  print_address_description+0x6a/0x270
[   38.279368]  ? _copy_to_user+0x42/0x60
[   38.279800]  kasan_report+0x243/0x360
[   38.280221]  _copy_to_user+0x42/0x60
[   38.280635]  sg_io+0x459/0x660
...

Though we get slightly more details (some we already knew):

[   38.301330] Allocated by task 329:
[   38.301734]  kmem_cache_alloc_node+0xca/0x220
[   38.302239]  scsi_mq_init_request+0x64/0x130 [scsi_mod]
[   38.302821]  blk_mq_alloc_rqs+0x2cf/0x370
[   38.303265]  blk_mq_sched_alloc_tags.isra.4+0x7d/0xb0
[   38.303820]  blk_mq_init_sched+0xf0/0x220
[   38.304268]  elevator_switch+0x17a/0x2c0
[   38.304705]  elv_iosched_store+0x173/0x220
[   38.305171]  queue_attr_store+0x72/0xb0
[   38.305602]  kernfs_fop_write+0x188/0x220
[   38.306049]  __vfs_write+0xb6/0x330
[   38.306436]  vfs_write+0xe9/0x240
[   38.306804]  ksys_write+0x98/0x110
[   38.307181]  do_syscall_64+0x6d/0x1d0
[   38.307590]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   38.308142]
[   38.308316] Freed by task 0:
[   38.308652] (stack is not available)
[   38.309060]
[   38.309243] The buggy address belongs to the object at 8800122b8c00
[   38.309243]  which belongs to the cache scsi_sense_cache of size 96
[   38.310625] The buggy address is located 75 bytes inside of
[   38.310625]  96-byte region [8800122b8c00, 8800122b8c60)


With a hardware watchpoint, I've isolated the corruption to here:

bfq_dispatch_request+0x2be/0x1610:
__bfq_dispatch_request at block/bfq-iosched.c:3902
3900            if (rq) {
3901    inc_in_driver_start_rq:
3902                    bfqd->rq_in_driver++;
3903    start_rq:
3904                    rq->rq_flags |= RQF_STARTED;
3905            }

Through some race condition(?), rq_in_driver is also sense_buffer, and
it can get incremented.
…
I still haven't figured this out, though... anyone have a moment to look 
at this?


By any chance, have you tried to simplify the reproducer environment, or 
does it still need my complex layout to trigger things even with KASAN?


Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Oleksandr Natalenko
Hi.

On Thursday 12 April 2018 20:44:37 CEST Kees Cook wrote:
> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
> up timeouts"), which seems insane given that null_blk isn't even built
> in the .config. I managed to get the testing automated now for a "git
> bisect run ...", so I'm restarting again to hopefully get a better
> answer. Normally the Oops happens in the first minute of running. I've
> set the timeout to 10 minutes for a "good" run...

Apparently, you do this on Linus' tree, right? If so, I won't compile it here 
then.

Thanks for taking care of this.

Regards,
  Oleksandr




Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko

Hi, Kees, Paolo et al.

10.04.2018 08:53, Kees Cook wrote:

Unfortunately I only had a single hang with no dumps. I haven't been
able to reproduce it since. :(


For your convenience I've prepared a VM that contains a reproducer.

It consists of 3 disk images (sda.img is for the system, it is 
Arch-based, sdb/sdc.img are for RAID). They are available (in a 
compressed form) to download here [1].


RAID is built as RAID10 with far2 layout, on top of it there is a LUKS 
container (can be opened either with keyfile under the /root or using 
"qwerty" password). There's one LVM PV, one VG and one volume on top of 
LUKS containing XFS. RAID is automatically assembled during the boot, so 
you don't have to worry about it.


I run the VM like this:

$ qemu-system-x86_64 -display gtk,gl=on -machine q35,accel=kvm -cpu 
host,+vmx -enable-kvm -netdev user,id=user.0 -device 
virtio-net,netdev=user.0 -usb -device nec-usb-xhci,id=xhci -device 
usb-tablet,bus=xhci.0 -serial stdio -smp 4 -m 512 -hda sda.img -hdb 
sdb.img -hdc sdc.img


The system is accessible via both VGA and serial console. The user is 
"root", the password is "qwerty".


Under the /root folder there is a reproducer script (reproducer.sh). It 
does trivial things like enabling sysrq, opening LUKS device, mounting a 
volume, running a background I/O (this is an important part, actually, 
since I wasn't able to trigger the issue without the background I/O) 
and, finally, running the smartctl in a loop. If you are lucky, within a 
minute or two you'll get the first warning followed shortly by 
subsequent bugs and I/O stall (htop is pre-installed for your 
convenience too).


Notable changes in this VM comparing to generic defaults:

1) blk-mq is enabled via kernel cmdline (scsi_mod.use_blk_mq=1 is in 
/etc/default/grub)
2) BFQ is set via udev (check /etc/udev/rules.d/10-io-scheduler.rules 
file)


Again, I wasn't able to reproduce the usercopy warning/bug and I/O hang 
without all these components being involved.


Hope you enjoy it.

P.S. I haven't tested Linus' master branch yet. For now, this VM runs 
v4.16 kernel.


Regards,
  Oleksandr

[1] https://natalenko.name/myfiles/usercopy_bfq_woe/


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko

Hi.

10.04.2018 08:35, Oleksandr Natalenko wrote:

- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.


Looks like it cannot be disabled via kernel cmdline, so I have to
re-compile the kernel, right? I can certainly do that anyway.


Okay, I've recompiled the kernel without hardened usercopy:

[root@archlinux ~]# zgrep USERCOPY /proc/config.gz
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set

and I cannot reproduce the issue anymore. I/O doesn't hang regardless of 
how long I hammer it.


Eeeh? Maybe this is a matter of some cleanup code path that runs once the 
warn/bug condition is hit with hardening enabled? I'm just guessing here 
again.


Will work towards checking Linus' master branch now…

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko

Hi.

09.04.2018 22:30, Kees Cook wrote:

echo 1 | tee /sys/block/sd*/queue/nr_requests


I can't get this below "4".


Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is 
enforced explicitly in queue_requests_store(). It is the same for me.



echo 1 | tee /sys/block/sd*/device/queue_depth


I've got this now too.
Ah! dm-crypt too. I'll see if I can get that added easily to my tests.
And XFS! You love your corner cases. ;)


Yeah, so far this wonderful configuration has allowed me to uncover a 
bunch of bugs, and see, we are not done yet ;).



Two other questions, since you can reproduce this easily:
- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.


Looks like it cannot be disabled via kernel cmdline, so I have to 
re-compile the kernel, right? I can certainly do that anyway.



- does it reproduce with Linus's current tree?


Will try this too.


That would imply missing locking, yes? Yikes. But I'd expect
use-after-free or something, or bad data, not having the pointer slip
forward?


I still think this has something to do with the blk-mq re-queuing capability 
and how BFQ implements it, because there are no signs of the issue popping 
up with Kyber so far.



Quick update: I added dm-crypt (with XFS on top) and it hung my system
almost immediately. I got no warnings at all, though.


Did your system hang on the smartctl hammering too? Have you got some stack 
traces to compare with mine?


Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
 on this please?



The thing I can't figure out is how req->sense is slipping forward in
(and even beyond!) the allocation.


*cough* race condition *cough* ☺

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko

Hi.

09.04.2018 11:35, Christoph Hellwig wrote:

I really can't make sense of that report.


Sorry, I have nothing to add there so far. I just see the symptom of 
something going wrong in the ioctl code path invoked by smartctl, but I 
have no idea what the minimal environment to reproduce it is. I'll try to 
collect more info.



And I'm also curious why
you think 17cb960f29c2 should change anything for that code path.


Maybe I've just misread that code.

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-08 Thread Oleksandr Natalenko
st me 
not being persistent enough.

It looks like this code path was re-written completely in 17cb960f29c2, but 
that was merged for the upcoming v4.17 only, and thus I haven't tried it yet.

Kees took a brief look at it already: [1]. This is what smartctl does [2] 
(just a usual strace capture when the bug is not triggered).

Christoph, do you have some idea on why this can happen?

Thanks.

Regards,
  Oleksandr

[1] https://marc.info/?l=linux-scsi=152287333013845=2
[2] https://gist.github.com/pfactum/6f58f8891468aeba1ab2cc9f45668735




Re: usercopy whitelist woe in scsi_sense_cache

2018-04-06 Thread Oleksandr Natalenko

Hi.

05.04.2018 20:52, Kees Cook wrote:

Okay. My qemu gets mad about that and wants the format=raw argument,
so I'm using:

-drive file=sda.img,format=raw \
-drive file=sdb.img,format=raw \

How are you running your smartctl? I'm doing this now:

[1]   Running while :; do
( smartctl -a /dev/sda; smartctl -a /dev/sdb ) > /dev/null;
done &


Yes, so do I.


I assume I'm missing something from your .config, but since I don't
boot with an initramfs, I had to tweak it a bit. I'll try again...


Let me describe what both the VM and the server have in common:

1. have 4 CPUs
2. are EFI-based
3. use blk-mq with BFQ scheduler (it is set up via udev rule during 
boot)

4. have zswap enabled
5. have 2 SATA disks with RAID10 on top of it (layout f2)
6. have LUKS on top of the RAID, and LVM on top of the LUKS

VM has machine type "q35", BTW.

Do you think something of what's mentioned above is relevant for the 
code path in question?


Thanks.

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-05 Thread Oleksandr Natalenko

05.04.2018 16:32, Oleksandr Natalenko wrote:

"-hda sda.img -hdb sda.img"


"-hda sda.img -hdb sdb.img", of course, I don't pass the same disk twice 
☺


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-05 Thread Oleksandr Natalenko

Hi.

05.04.2018 16:21, Kees Cook wrote:

I had a VM running over night with:

[1]   Running while :; do
smartctl -a /dev/sda > /dev/null;
done &
[2]-  Running while :; do
ls --color=auto -lR / > /dev/null 2> /dev/null;
done &
[3]+  Running while :; do
sleep $(( $RANDOM % 100 )); sync; echo 3 > /proc/sys/vm/drop_caches;
done &

and I haven't seen the issue. :(

FWIW, I'm using the ahci qemu driver:

-drive file=disk-image.raw,if=none,id=drive0,format=raw \
-device ahci,id=bus0 \
-device ide-drive,bus=bus0.0,drive=drive0

Does this match your qemu instance?


Well, not really. I just pass 2 raw disks as "-hda sda.img -hdb sda.img" 
(it is a playground VM for me with RAID10, LVM and LUKS inside, but I 
hope this doesn't matter). Does passing "-hda" differ from your 
approach?


Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-05 Thread Oleksandr Natalenko

Hi.

04.04.2018 23:25, Kees Cook wrote:

Thanks for the report! I hope someone more familiar with sg_io() can
help explain the changing buffer offset... :P


Also, FYI, I kept the server running with smartctl periodically invoked, 
and it was still triggering BUGs. I considered them more or less harmless 
until the server got stuck with high I/O wait this morning after the next 
smartctl invocation. So it isn't harmless, it seems…


It could be unrelated, of course, since the journal didn't give me any 
hint (or a stack trace) on what happened, thus I'll monitor how things 
behave without smartctl too.


Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-04 Thread Oleksandr Natalenko

Hi.

04.04.2018 23:25, Kees Cook wrote:

Actually, I can trigger a BUG too:

[  129.259213] usercopy: Kernel memory exposure attempt detected from SLUB
object 'scsi_sense_cache' (offset 119, size 22)!


Wow, yeah, that's totally outside the slub object_size. How did you
trigger this? Just luck or something specific?


Just luck, I suppose. It usually comes after the first warning if you 
wait long enough (maybe, a couple of extra minutes).


To give you an idea regarding variety of offsets, I've summarised kernel 
log from the server:


$ sudo journalctl -kb | grep "Kernel memory exposure attempt detected" | 
grep -oE 'offset [0-9]+, size [0-9]+' | sort | uniq -c

  9 offset 107, size 22
  6 offset 108, size 22
  8 offset 109, size 22
  7 offset 110, size 22
  5 offset 111, size 22
  5 offset 112, size 22
  2 offset 113, size 22
  2 offset 114, size 22
  1 offset 115, size 22
  1 offset 116, size 22
  1 offset 119, size 22
  1 offset 85, size 22


I'd really like to understand how the buffer position can be
changing... I'd expect that to break all kinds of things (i.e.
free()ing the slab later would break too...)


I haven't checked the code yet, but the first thing that comes to my 
mind is some uninitialised variable. Just guessing here, though.



Thanks for the report! I hope someone more familiar with sg_io() can
help explain the changing buffer offset... :P


Hopefully, SCSI people are Cc'ed here properly…

Thanks!

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-04 Thread Oleksandr Natalenko

Hi.

On Wednesday 4 April 2018 22:21:53 CEST Kees Cook wrote:

...
That means scsi_sense_cache should be 96 bytes in size? But a 22 byte
read starting at offset 94 happened? That seems like a 20 byte read
beyond the end of the SLUB object? Though if it were reading past the
actual end of the object, I'd expect the hardened usercopy BUG (rather
than the WARN) to kick in. Ah, it looks like
/sys/kernel/slab/scsi_sense_cache/slab_size shows this to be 128 bytes
of actual allocation, so the 20 bytes doesn't strictly overlap another
object (hence no BUG):
...


Actually, I can trigger a BUG too:

[  129.259213] usercopy: Kernel memory exposure attempt detected from SLUB
object 'scsi_sense_cache' (offset 119, size 22)!
[  129.265167] [ cut here ]
[  129.267579] kernel BUG at mm/usercopy.c:100!

And also offset can be different, as you may see:

[   55.993224] Bad or missing usercopy whitelist? Kernel memory exposure
attempt detected from SLUB object 'scsi_sense_cache' (offset 76, size 22)!
[   55.998678] WARNING: CPU: 0 PID: 1305 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0

It looks like only the size stays the same.


Can you send me your .config? What SCSI drivers are you using in the
VM and on the real server?


This is an Arch kernel with a config available here [1].

For both server and VM "lspci -vv" shows "ahci" in use. Is this what you 
are asking for?


Are you able to see what ioctl()s smartctl is issuing? I'll try to
reproduce this on my end...


As per [2], strace shows "SG_IO" requests. Is this detailed enough?

Thanks for looking into it.

Regards,
  Oleksandr

[1] https://git.archlinux.org/svntogit/packages.git/plain/trunk/config?
h=packages/linux=d7625be23f83416491d202d5cea96e5a871fb216
[2] https://gist.github.com/6f58f8891468aeba1ab2cc9f45668735


usercopy whitelist woe in scsi_sense_cache

2018-04-04 Thread Oleksandr Natalenko
Hi, Kees, David et al.

With v4.16 I get the following dump while using smartctl:

===
[  261.260617] [ cut here ]
[  261.262135] Bad or missing usercopy whitelist? Kernel memory exposure 
attempt detected from SLUB object 'scsi_sense_cache' (offset 94, size 22)!
[  261.267672] WARNING: CPU: 2 PID: 27041 at mm/usercopy.c:81 usercopy_warn
+0x7e/0xa0
[  261.273624] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel 
kvm iTCO_wdt ppdev irqbypass bochs_drm ttm iTCO_vendor_support drm_kms_helper 
drm psmouse input_leds led_class pcspkr joydev intel_agp parport_pc mousedev 
evdev syscopyarea intel_gtt i2c_i801 sysfillrect parport rtc_cmos sysimgblt 
qemu_fw_cfg mac_hid agpgart fb_sys_fops lpc_ich ip_tables x_tables xfs 
dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c 
crc32c_generic dm_crypt algif_skcipher af_alg dm_mod raid10 md_mod hid_generic 
usbhid hid sr_mod cdrom sd_mod crct10dif_pclmul uhci_hcd crc32_pclmul 
crc32c_intel ghash_clmulni_intel pcbc serio_raw ahci atkbd aesni_intel 
xhci_pci aes_x86_64 ehci_pci libahci crypto_simd libps2 glue_helper xhci_hcd 
ehci_hcd libata cryptd usbcore usb_common i8042 serio virtio_scsi scsi_mod
[  261.300752]  virtio_blk virtio_net virtio_pci virtio_ring virtio
[  261.305534] CPU: 2 PID: 27041 Comm: smartctl Not tainted 4.16.0-1-ARCH #1
[  261.309936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[  261.313668] RIP: 0010:usercopy_warn+0x7e/0xa0
[  261.315653] RSP: 0018:ab5aca6cfc40 EFLAGS: 00010286
[  261.320038] RAX:  RBX: 8e8cd893605e RCX: 
0001
[  261.322215] RDX: 8001 RSI: 83eb4672 RDI: 

[  261.325680] RBP: 0016 R08:  R09: 
0282
[  261.328462] R10: 83e896b1 R11: 0001 R12: 
0001
[  261.330584] R13: 8e8cd8936074 R14: 8e8cd893605e R15: 
0016
[  261.332748] FS:  7f5a81bdf040() GS:8e8cdf70() knlGS:

[  261.337929] CS:  0010 DS:  ES:  CR0: 80050033
[  261.343128] CR2: 7fff3a6790a8 CR3: 18228006 CR4: 
00160ee0
[  261.345976] Call Trace:
[  261.350620]  __check_object_size+0x130/0x1a0
[  261.355775]  sg_io+0x269/0x3f0
[  261.360729]  ? path_lookupat+0xaa/0x1f0
[  261.364027]  ? current_time+0x18/0x70
[  261.366684]  scsi_cmd_ioctl+0x257/0x410
[  261.369871]  ? xfs_bmapi_read+0x1c3/0x340 [xfs]
[  261.372231]  sd_ioctl+0xbf/0x1a0 [sd_mod]
[  261.375456]  blkdev_ioctl+0x8ca/0x990
[  261.381156]  ? read_null+0x10/0x10
[  261.384984]  block_ioctl+0x39/0x40
[  261.388739]  do_vfs_ioctl+0xa4/0x630
[  261.392624]  ? vfs_write+0x164/0x1a0
[  261.396658]  SyS_ioctl+0x74/0x80
[  261.399563]  do_syscall_64+0x74/0x190
[  261.402685]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  261.414154] RIP: 0033:0x7f5a8115ed87
[  261.417184] RSP: 002b:7fff3a65a458 EFLAGS: 0246 ORIG_RAX: 
0010
[  261.427362] RAX: ffda RBX: 7fff3a65a700 RCX: 
7f5a8115ed87
[  261.432075] RDX: 7fff3a65a470 RSI: 2285 RDI: 
0003
[  261.436200] RBP: 7fff3a65a750 R08: 0010 R09: 

[  261.446689] R10:  R11: 0246 R12: 
55b5481d9ce0
[  261.450059] R13:  R14: 55b5481d3550 R15: 
00da
[  261.455103] Code: 48 c7 c0 f1 af e5 83 48 0f 44 c2 41 50 51 41 51 48 89 f9 
49 89 f1 4d 89 d8 4c 89 d2 48 89 c6 48 c7 c7 48 b0 e5 83 e8 32 a7 e3 ff <0f> 
0b 48 83 c4 18 c3 48 c7 c6 44 0d e5 83 49 89 f1 49 89 f3 eb 
[  261.467988] ---[ end trace 75034b3832c364e4 ]---
===

I can easily reproduce it with a qemu VM and 2 virtual SCSI disks by calling 
smartctl in a loop and doing some usual background I/O. The warning is 
triggered within 3 minutes or so (not instantly).

Initially, it was produced on my server after a kernel update (because disks 
are monitored with smartctl via Zabbix).

Looks like the thing was introduced with 
0afe76e88c57d91ef5697720aed380a339e3df70.

Any idea how to deal with this please? If needed, I can provide any additional 
info, and also I'm happy/ready to test any proposed patches.

Thanks.

Regards,
  Oleksandr




Re: [PATCH v11 0/7] block, scsi, md: Improve suspend and resume

2017-11-09 Thread Oleksandr Natalenko
Then,

Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>

On Thursday 9 November 2017 17:55:58 CET Jens Axboe wrote:
> On 11/09/2017 09:54 AM, Bart Van Assche wrote:
> > On Thu, 2017-11-09 at 07:16 +0100, Oleksandr Natalenko wrote:
> >> is this something known to you, or it is just my fault applying this
> >> series to v4.13? Except having this warning, suspend/resume works for
> >> me:
> >> 
> >> [   27.383846] sd 0:0:0:0: [sda] Starting disk
> >> [   27.383976] sd 1:0:0:0: [sdb] Starting disk
> >> [   27.451218] sdb: Attempt to allocate non-preempt request in
> >> preempt-only
> >> mode.
> >> [   27.459640] [ cut here ]
> >> [   27.464521] WARNING: CPU: 0 PID: 172 at block/blk-core.c:823
> >> blk_queue_enter+0x222/0x280> 
> > Hello Oleksandr,
> > 
> > Thanks for the testing. The warning that you reported is expected. Maybe
> > it should be left out to avoid confusing users.
> 
> If the warning is expected, then it should be removed. It'll just confuse
> users and cause useless bug reports.
> 
> > Jens, this patch series still applies cleanly on top of your
> > for-4.15/block
> > branch. Are you fine with this patch series or do you perhaps want me to
> > repost it with Oleksandr's Tested-by tag added to each patch?
> 
> Since you need to kill the warning anyway, let's get it respun.




Re: [PATCH v11 0/7] block, scsi, md: Improve suspend and resume

2017-11-08 Thread Oleksandr Natalenko
Bart,

is this something known to you, or is it just my fault for applying this series to 
v4.13? Apart from this warning, suspend/resume works for me:

===
[   27.383846] sd 0:0:0:0: [sda] Starting disk
[   27.383976] sd 1:0:0:0: [sdb] Starting disk
[   27.451218] sdb: Attempt to allocate non-preempt request in preempt-only 
mode.
[   27.459640] [ cut here ]
[   27.464521] WARNING: CPU: 0 PID: 172 at block/blk-core.c:823 
blk_queue_enter+0x222/0x280
[   27.470867] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel 
iTCO_wdt bochs_drm iTCO_vendor_support ppdev kvm ttm irqbypass evdev 
input_leds drm_kms_helper joydev psmouse led_class lpc_ich pcspkr i2c_i801 
mousedev parport_pc mac_hid qemu_fw_cfg drm parport syscopyarea sysfillrect 
sysimgblt button fb_sys_fops intel_agp intel_gtt sch_fq_codel ip_tables 
x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c 
crc32c_generic dm_crypt algif_skcipher af_alg dm_mod dax raid10 md_mod sr_mod 
sd_mod cdrom hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel 
ghash_clmulni_intel uhci_hcd pcbc serio_raw atkbd libps2 ahci aesni_intel 
xhci_pci ehci_pci aes_x86_64 crypto_simd glue_helper xhci_hcd ehci_hcd libahci 
cryptd libata usbcore usb_common i8042 serio virtio_pci
[   27.501799]  virtio_net virtio_scsi scsi_mod virtio_ring virtio
[   27.503639] CPU: 0 PID: 172 Comm: md0_raid10 Not tainted 4.13.0-pf13 #1
[   27.505492] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[   27.507693] task: 88001f6aa340 task.stack: c95e8000
[   27.509516] RIP: 0010:blk_queue_enter+0x222/0x280
[   27.511623] RSP: 0018:c95ebb70 EFLAGS: 00010282
[   27.512978] RAX: 0042 RBX:  RCX: 

[   27.514389] RDX:  RSI: 88001f80dbd8 RDI: 
88001f80dbd8
[   27.516339] RBP: c95ebbd0 R08: 028e R09: 

[   27.519083] R10: c95ebc50 R11:  R12: 
0001
[   27.521298] R13: 88001deaa100 R14:  R15: 
88001deaa100
[   27.523577] FS:  () GS:88001f80() knlGS:

[   27.525889] CS:  0010 DS:  ES:  CR0: 80050033
[   27.527928] CR2: 5568d4d27858 CR3: 1983b000 CR4: 
001406f0
[   27.529721] Call Trace:
[   27.530622]  ? wait_woken+0x80/0x80
[   27.531739]  generic_make_request+0xf1/0x320
[   27.532806]  submit_bio+0x73/0x150
[   27.533775]  ? submit_bio+0x73/0x150
[   27.534773]  md_super_write.part.58+0xbd/0xe0 [md_mod]
[   27.536078]  md_update_sb.part.59+0x534/0x840 [md_mod]
[   27.537468]  ? percpu_ref_switch_to_percpu+0x36/0x40
[   27.538862]  md_check_recovery+0x452/0x510 [md_mod]
[   27.540273]  raid10d+0x62/0x1420 [raid10]
[   27.541757]  ? schedule+0x3d/0xb0
[   27.542744]  ? schedule+0x3d/0xb0
[   27.544013]  ? schedule_timeout+0x208/0x390
[   27.546399]  md_thread+0x120/0x160 [md_mod]
[   27.548810]  ? md_thread+0x120/0x160 [md_mod]
[   27.550394]  ? wait_woken+0x80/0x80
[   27.551840]  kthread+0x124/0x140
[   27.551846]  ? state_show+0x2f0/0x2f0 [md_mod]
[   27.551848]  ? kthread_create_on_node+0x70/0x70
[   27.551852]  ? SyS_exit_group+0x14/0x20
[   27.551857]  ret_from_fork+0x25/0x30
[   27.551859] Code: 00 00 e9 6d fe ff ff 31 c0 e9 66 fe ff ff 49 8b 87 e0 01 
00 00 48 c7 c7 78 73 95 81 c6 05 6c 11 80 00 01 48 8b 30 e8 0f 0f de ff <0f> 
ff e9 97 fe ff ff 49 8b b7 a8 01 00 00 89 c2 83 e6 20 0f 85 
[   27.551882] ---[ end trace ba6164315560503f ]---
[   27.701328] ata4: SATA link down (SStatus 0 SControl 300)
[   27.710425] ata5: SATA link down (SStatus 0 SControl 300)
[   27.714620] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   27.722375] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   27.733520] ata6: SATA link down (SStatus 0 SControl 300)
[   27.738315] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   27.743962] ata3.00: configured for UDMA/100
[   27.747153] ata2.00: configured for UDMA/100
[   27.750833] ata1.00: configured for UDMA/100
[   27.781627] usb 2-1: reset high-speed USB device number 2 using xhci_hcd
[   27.963142] PM: resume of devices complete after 627.949 msecs
[   27.971546] OOM killer enabled.
[   27.978424] Restarting tasks ... done.
===

Thanks.

On Monday 30 October 2017 23:41:58 CET Bart Van Assche wrote:
> It is known that during the resume following a hibernate, especially when
> using an md RAID1 array created on top of SCSI devices, sometimes the system
> hangs instead of coming up properly. This patch series fixes that
> problem. These patches have been tested on top of the block layer for-next
> branch. Please consider these changes for kernel v4.15.

> Changes between v10 and v11:
> - Left out the three md patches because a deadlock was reported when using
> XFS on top of md RAID 1. This deadlock occurred because the md kernel
> thread got frozen before the kernel thread running 

Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing

2017-11-07 Thread Oleksandr Natalenko

Hi Ming, Jens.

What is the fate of this patchset please?

03.10.2017 16:03, Ming Lei wrote:

Hi Jens,

Please consider this patchset for V4.15, and it fixes one
kind of long-term I/O hang issue in either block legacy path
or blk-mq.

The current SCSI quiesce isn't safe and easily triggers I/O deadlock.

Once SCSI device is put into QUIESCE, no new request except for
RQF_PREEMPT can be dispatched to SCSI successfully, and
scsi_device_quiesce() just simply waits for completion of I/Os
dispatched to SCSI stack. It isn't enough at all.

Because new requests can still be coming in, but none of the allocated
requests can be dispatched successfully, the request pool can be
consumed up easily.

Then a request with RQF_PREEMPT can't be allocated and waits forever,
so the system hangs forever, such as during system suspend or while
sending SCSI domain validation in the case of transport_spi.

Both IO hang inside system suspend[1] or SCSI domain validation
were reported before.

This patchset introduces a preempt-only mode, and solves the issue
by allowing only RQF_PREEMPT requests during SCSI quiesce.

Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
them both.

V8:
- fix one race as pointed out by Bart
- pass 'op' to blk_queue_enter() as suggested by Christoph

V7:
- add Reviewed-by & Tested-by
- one line change in patch 5 for checking preempt request

V6:
- borrow Bart's idea of preempt only, with clean
  implementation(patch 5/patch 6)
- needn't any external driver's dependency, such as MD's
change

V5:
- fix one tiny race by introducing blk_queue_enter_preempt_freeze()
given this change is small enough compared with V4, I added
tested-by directly

V4:
- reorganize patch order to make it more reasonable
- support nested preempt freeze, as required by SCSI transport spi
- check preempt freezing in slow path of of blk_queue_enter()
- add "SCSI: transport_spi: resume a quiesced device"
- wake up freeze queue in setting dying for both blk-mq and legacy
- rename blk_mq_[freeze|unfreeze]_queue() in one patch
- rename .mq_freeze_wq and .mq_freeze_depth
- improve comment

V3:
- introduce q->preempt_unfreezing to fix one bug of preempt freeze
- call blk_queue_enter_live() only when queue is preempt frozen
- cleanup a bit on the implementation of preempt freeze
- only patch 6 and 7 are changed

V2:
- drop the 1st patch in V1 because percpu_ref_is_dying() is
enough as pointed by Tejun
- introduce preempt version of blk_[freeze|unfreeze]_queue
- sync between preempt freeze and normal freeze
- fix warning from percpu-refcount as reported by Oleksandr


[1] https://marc.info/?t=150340250100013=3=2


Thanks,
Ming

Bart Van Assche (1):
  block: Convert RQF_PREEMPT into REQ_PREEMPT

Ming Lei (7):
  blk-mq: only run hw queues for blk-mq
  block: tracking request allocation with q_usage_counter
  block: pass 'op' to blk_queue_enter()
  percpu-refcount: introduce __percpu_ref_tryget_live
  blk-mq: return if queue is frozen via current blk_freeze_queue_start
  block: support PREEMPT_ONLY
  SCSI: set block queue at preempt only when SCSI device is put into
quiesce

 block/blk-core.c| 66 
+

 block/blk-mq-debugfs.c  |  2 +-
 block/blk-mq.c  | 26 
 block/blk-mq.h  |  1 -
 block/blk-timeout.c |  2 +-
 block/blk.h |  2 +-
 drivers/ide/ide-atapi.c |  3 +-
 drivers/ide/ide-io.c|  2 +-
 drivers/ide/ide-pm.c|  4 +--
 drivers/scsi/scsi_lib.c | 31 +++
 fs/block_dev.c  |  4 +--
 include/linux/blk-mq.h  |  4 +--
 include/linux/blk_types.h   |  6 
 include/linux/blkdev.h  | 10 ---
 include/linux/percpu-refcount.h | 27 ++---
 15 files changed, 137 insertions(+), 53 deletions(-)


Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume

2017-10-21 Thread Oleksandr Natalenko
Well,

I've cherry-picked this series for current upstream/master branch, and got 
this while performing another suspend try:

===
[   62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing 
to freeze, wq_busy=0):
[   62.421150] xfsaild/dm-7D0   289  2 0x8000
[   62.425800] Call Trace:
[   62.428902]  __schedule+0x239/0x870
[   62.431834]  schedule+0x33/0x90
[   62.434156]  _xfs_log_force+0x143/0x280 [xfs]
[   62.438767]  ? schedule_timeout+0x188/0x390
[   62.443592]  ? wake_up_q+0x80/0x80
[   62.446545]  ? xfsaild+0x18d/0x780 [xfs]
[   62.449702]  xfs_log_force+0x2c/0x90 [xfs]
[   62.453217]  xfsaild+0x18d/0x780 [xfs]
[   62.456717]  kthread+0x124/0x140
[   62.459237]  ? kthread+0x124/0x140
[   62.461818]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[   62.465146]  ? kthread_create_on_node+0x70/0x70
[   62.467331]  ret_from_fork+0x25/0x30
[   62.474386] Restarting kernel threads ... done.
===

After this it looks like the system tried to freeze anyway:

===
[   62.478290] OOM killer enabled.
[   62.481711] Restarting tasks ... done.
[   62.488931] PM: suspend exit
[   62.491497] PM: suspend entry (s2idle)
[   62.493445] PM: Syncing filesystems ... done.
[   63.774220] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   63.782707] OOM killer disabled.
[   63.785226] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) 
done.
[   63.861548] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[   63.868153] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   63.868463] sd 1:0:0:0: [sdb] Stopping disk
[   63.873216] sd 0:0:0:0: [sda] Stopping disk
===

but got hung completely. After some time hung task was detected:

===
[  247.531069] INFO: task systemd-sleep:663 blocked for more than 120 seconds.
[  247.535307]   Not tainted 4.14.0-pf0 #1
[  247.537820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[  247.541015] systemd-sleep   D0   663  1 0x
[  247.542706] Call Trace:
[  247.543386]  __schedule+0x239/0x870
[  247.544351]  schedule+0x33/0x90
[  247.545197]  suspend_devices_and_enter+0x61b/0x890
[  247.546539]  ? wait_woken+0x80/0x80
[  247.547517]  pm_suspend+0x340/0x3b0
[  247.548550]  state_store+0x5a/0x90
[  247.549646]  kobj_attr_store+0xf/0x20
[  247.550649]  sysfs_kf_write+0x37/0x40
[  247.551640]  kernfs_fop_write+0x11c/0x1a0
[  247.552708]  __vfs_write+0x37/0x150
[  247.553641]  ? SYSC_newfstat+0x44/0x70
[  247.554628]  vfs_write+0xb1/0x1a0
[  247.09]  SyS_write+0x55/0xc0
[  247.556366]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  247.557667] RIP: 0033:0x7f56b74ec8d4
[  247.558616] RSP: 002b:7fff141c7738 EFLAGS: 0246 ORIG_RAX: 
0001
[  247.560667] RAX: ffda RBX: 55dd61863290 RCX: 
7f56b74ec8d4
[  247.562639] RDX: 0007 RSI: 55dd61864eb0 RDI: 
0004
[  247.564874] RBP: 7f56b77b3240 R08: 55dd61863370 R09: 
7f56b79c88c0
[  247.566875] R10: 000a R11: 0246 R12: 

[  247.569213] R13: 55dd61863290 R14: 55dd61863d08 R15: 
ffea
===

P.S. Ming's current series is enough for 4.13 to not experience any issues 
like this.

On středa 18. října 2017 17:27:14 CEST Bart Van Assche wrote:
> I think this version (v10) has significant advantages over the most recent
> patch series posted by Ming Lei to address suspend, resume and SPI domain
> validation. So it would be appreciated if you could switch to this series
> for testing suspend and resume.


Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume

2017-10-17 Thread Oleksandr Natalenko
I've indeed tested some previous versions of this patchset, but still use 
Ming's solution.

Can it be clarified which one (Bart's or Ming's) is a correct approach?

On středa 18. října 2017 1:28:14 CEST Jens Axboe wrote:
> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> > Hello Jens,
> > 
> > It is known that during the resume following a hibernate, especially when
> > using an md RAID1 array created on top of SCSI devices, sometimes the
> > system hangs instead of coming up properly. This patch series fixes that
> > problem. These patches have been tested on top of the block layer for-next
> > branch. Please consider these changes for kernel v4.15.
> 
> The folks that had trouble with this issue, have they tested the
> patchset? I'd like to get a solution queued up for this sooner rather
> than later, but I'd greatly prefer if we had some folks that have
> tested and verified it.




Re: [PATCH V10 0/8] blk-mq-sched: improve sequential I/O performance

2017-10-14 Thread Oleksandr Natalenko
Hi.

By any chance, could this be backported to 4.14? I'm confused by "SCSI: 
allow to pass null rq to scsi_prep_state_check()" since it uses refactored 
flags.

===
if (req && !(req->rq_flags & RQF_PREEMPT))
===

Is it safe to revert to REQ_PREEMPT here, or should rq_flags also be replaced 
with cmd_flags?

Thanks.

On sobota 14. října 2017 11:22:24 CEST Ming Lei wrote:
> Hi Jens,
> 
> In Red Hat internal storage testing wrt. blk-mq schedulers, we found that I/O
> performance is quite bad with mq-deadline, especially for sequential I/O
> on some multi-queue SCSI devices (lpfc, qla2xxx, SRP...)
> 
> Turns out one big issue causes the performance regression: requests are
> still dequeued from the sw queue/scheduler queue even when the LLD's queue is
> busy, so I/O merging becomes quite difficult, and sequential I/O
> performance degrades a lot.
> 
> This issue became one of the main reasons for reverting default SCSI_MQ
> in v4.13.
> 
> These 8 patches improve the situation and recover the lost performance.
> 
> With this change, SCSI-MQ sequential I/O performance is improved a lot; Paolo
> reported that mq-deadline performance improved much[2] in his dbench test
> wrt V2. A performance improvement on lpfc/qla2xxx was also observed with
> V1.[1]
> 
> [1] http://marc.info/?l=linux-block&m=150151989915776&w=2
> [2] https://marc.info/?l=linux-block&m=150217980602843&w=2
> 
> gitweb:
>   https://github.com/ming1/linux/commits/blk_mq_improve_scsi_mpath_perf_V10_part1
> 
> V10:
>   - fix hang with kyber used
>   - address comments from Bart
> 
> V9:
>   - change title of patch1
>   - rename blk_mq_get_budget() as blk_mq_get_dispatch_budget()
>   - make check on .get_budget/.put_budget cleaner
>   - all are suggested by Jens, thank for review
> 
> V8:
>   - introduce helper of blk_mq_get_budget
>   - handle failure path of get_budget
> 
> V7:
>   - introduce .get_budget/.put_budget, and IO merge gets improved
>   compared with V6, and in theory this approach is better than the way
>   taken in block legacy
> 
>   - drop patch of 'blk-mq-sched: don't dequeue request until all in
> ->dispatch are flushed'
> 
> V6:
>   - address comments from Christoph
>   - drop the 1st patch which changes blk_mq_request_bypass_insert(),
>   which should belong to dm-mpath's improvement
>   - move ' blk-mq-sched: move actual dispatching into one helper'
>   as 2nd patch, and use the introduced helper to simplify dispatch
>   logic
>   - merge two previous patches into one for improving dispatch from sw queue
>   - make comment/commit log line width as ~70, as suggested by
> Christoph
> 
> V5:
>   - address some comments from Omar
>   - add Tested-by & Reviewed-by tag
>   - use direct issue for blk_mq_request_bypass_insert(), and
>   start to consider to improve sequential I/O for dm-mpath
>   - only include part 1(the original patch 1 ~ 6), as suggested
>   by Omar
> 
> V4:
>   - add Reviewed-by tag
>   - some trivial change: typo fix in commit log or comment,
>   variable name, no actual functional change
> 
> V3:
>   - totally round robin for picking req from ctx, as suggested
>   by Bart
>   - remove one local variable in __sbitmap_for_each_set()
>   - drop patches of single dispatch list, which can improve
>   performance on mq-deadline, but cause a bit degrade on
>   none because all hctxs need to be checked after ->dispatch
>   is flushed. Will post it again once it is mature.
>   - rebase on v4.13-rc6 with block for-next
> 
> V2:
>   - dequeue requests from sw queues in round-robin style
>   as suggested by Bart, and introduce one helper in sbitmap
>   for this purpose
>   - improve bio merge via hash table from sw queue
>   - add comments about using DISPATCH_BUSY state in lockless way,
>   simplifying handling on busy state,
>   - hold ctx->lock when clearing ctx busy bit as suggested
>   by Bart
> 
> 
> Ming Lei (8):
>   blk-mq-sched: dispatch from scheduler only after progress is made on
> ->dispatch
>   blk-mq-sched: move actual dispatching into one helper
>   sbitmap: introduce __sbitmap_for_each_set()
>   block: kyber: check if there is request in ctx in kyber_has_work()
>   blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
>   blk-mq-sched: improve dispatching from sw queue
>   SCSI: allow to pass null rq to scsi_prep_state_check()
>   SCSI: implement .get_budget and .put_budget for blk-mq
> 
>  block/blk-mq-sched.c| 148 +---
>  block/blk-mq-sched.h|   2 +-
>  block/blk-mq.c  |  82 +--
>  block/blk-mq.h  |  22 ++-
>  block/kyber-iosched.c   |   2 +-
>  drivers/scsi/scsi_lib.c |  79 ++
>  include/linux/blk-mq.h  |  13 +
>  include/linux/sbitmap.h |  64 +++--
>  8 files changed, 342 

Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing

2017-10-03 Thread Oleksandr Natalenko
Also

Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>

for whole v8.

On úterý 3. října 2017 16:03:58 CEST Ming Lei wrote:
> Hi Jens,
> 
> Please consider this patchset for V4.15, and it fixes one
> kind of long-term I/O hang issue in either block legacy path
> or blk-mq.
> 
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Once SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() just simply waits for completion of I/Os
> dispatched to SCSI stack. It isn't enough at all.
> 
> Because new requests can still come in, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated and wait forever,
> then system hangs forever, such as during system suspend or
> sending SCSI domain validation in case of transport_spi.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch introduces preempt only mode, and solves the issue
> by allowing RQF_PREEMPT only during SCSI quiesce.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all.
> 
> V8:
>   - fix one race as pointed out by Bart
>   - pass 'op' to blk_queue_enter() as suggested by Christoph
> 
> V7:
>   - add Reviewed-by & Tested-by
>   - one line change in patch 5 for checking preempt request
> 
> V6:
>   - borrow Bart's idea of preempt only, with clean
> implementation(patch 5/patch 6)
>   - needn't any external driver's dependency, such as MD's
>   change
> 
> V5:
>   - fix one tiny race by introducing blk_queue_enter_preempt_freeze()
>   given this change is small enough compared with V4, I added
>   tested-by directly
> 
> V4:
>   - reorganize patch order to make it more reasonable
>   - support nested preempt freeze, as required by SCSI transport spi
>   - check preempt freezing in slow path of blk_queue_enter()
>   - add "SCSI: transport_spi: resume a quiesced device"
>   - wake up freeze queue in setting dying for both blk-mq and legacy
>   - rename blk_mq_[freeze|unfreeze]_queue() in one patch
>   - rename .mq_freeze_wq and .mq_freeze_depth
>   - improve comment
> 
> V3:
>   - introduce q->preempt_unfreezing to fix one bug of preempt freeze
>   - call blk_queue_enter_live() only when queue is preempt frozen
>   - cleanup a bit on the implementation of preempt freeze
>   - only patch 6 and 7 are changed
> 
> V2:
>   - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough as pointed by Tejun
>   - introduce preempt version of blk_[freeze|unfreeze]_queue
>   - sync between preempt freeze and normal freeze
>   - fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> Thanks,
> Ming
> 
> Bart Van Assche (1):
>   block: Convert RQF_PREEMPT into REQ_PREEMPT
> 
> Ming Lei (7):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   block: pass 'op' to blk_queue_enter()
>   percpu-refcount: introduce __percpu_ref_tryget_live
>   blk-mq: return if queue is frozen via current blk_freeze_queue_start
>   block: support PREEMPT_ONLY
>   SCSI: set block queue at preempt only when SCSI device is put into
> quiesce
> 
>  block/blk-core.c| 66 +
>  block/blk-mq-debugfs.c  |  2 +-
>  block/blk-mq.c  | 26 
>  block/blk-mq.h  |  1 -
>  block/blk-timeout.c |  2 +-
>  block/blk.h |  2 +-
>  drivers/ide/ide-atapi.c |  3 +-
>  drivers/ide/ide-io.c|  2 +-
>  drivers/ide/ide-pm.c|  4 +--
>  drivers/scsi/scsi_lib.c | 31 +++
>  fs/block_dev.c  |  4 +--
>  include/linux/blk-mq.h  |  4 +--
>  include/linux/blk_types.h   |  6 
>  include/linux/blkdev.h  | 10 ---
>  include/linux/percpu-refcount.h | 27 ++---
>  15 files changed, 137 insertions(+), 53 deletions(-)




Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing

2017-09-28 Thread Oleksandr Natalenko
Hey.

I can confirm that v6 of your patchset still works well for me. Tested on 
v4.13 kernel.

Thanks.

On středa 27. září 2017 10:52:41 CEST Ming Lei wrote:
> On Wed, Sep 27, 2017 at 04:27:51PM +0800, Ming Lei wrote:
> > On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
> > > Hi Ming.
> > > 
> > > Ming Lei - 27.09.17, 13:48:
> > > > Hi,
> > > > 
> > > > The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> > > > 
> > > > Once SCSI device is put into QUIESCE, no new request except for
> > > > RQF_PREEMPT can be dispatched to SCSI successfully, and
> > > > scsi_device_quiesce() just simply waits for completion of I/Os
> > > > dispatched to SCSI stack. It isn't enough at all.
> > > > 
> > > > Because new requests can still come in, but all the allocated
> > > > requests can't be dispatched successfully, so request pool can be
> > > > consumed up easily.
> > > > 
> > > > Then request with RQF_PREEMPT can't be allocated and wait forever,
> > > > meantime scsi_device_resume() waits for completion of RQF_PREEMPT,
> > > > then system hangs forever, such as during system suspend or
> > > > sending SCSI domain validation.
> > > > 
> > > > Both IO hang inside system suspend[1] or SCSI domain validation
> > > > were reported before.
> > > > 
> > > > This patch introduces preempt only mode, and solves the issue
> > > > by allowing RQF_PREEMPT only during SCSI quiesce.
> > > > 
> > > > Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> > > > them all.
> > > > 
> > > > V6:
> > > > - borrow Bart's idea of preempt only, with clean
> > > > 
> > > >   implementation(patch 5/patch 6)
> > > > 
> > > > - needn't any external driver's dependency, such as MD's
> > > > change
> > > 
> > > Do you want me to test with v6 of the patch set? If so, it would be nice
> > > if
> > > you´d make a v6 branch in your git repo.
> > 
> > Hi Martin,
> > 
> > I'd appreciate it much if you could run V6 and provide your test result;
> > the branches follow:
> > 
> > https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6
> > 
> > https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6
> 
> Also follows the branch against V4.13:
> 
> https://github.com/ming1/linux/tree/v4.13-safe-scsi-quiesce_V6_for_test
> 
> https://github.com/ming1/linux.git #v4.13-safe-scsi-quiesce_V6_for_test



Re: [PATCH V4 0/10] block/scsi: safe SCSI quiescing

2017-09-11 Thread Oleksandr Natalenko
For v4 with regard to suspend/resume:

Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>

On pondělí 11. září 2017 13:10:11 CEST Ming Lei wrote:
> Hi,
> 
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Once SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() just simply waits for completion of I/Os
> dispatched to SCSI stack. It isn't enough at all.
> 
> Because new requests can still come in, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated and wait forever,
> meantime scsi_device_resume() waits for completion of RQF_PREEMPT,
> then system hangs forever, such as during system suspend or
> sending SCSI domain validation.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch introduces preempt freeze, and solves the issue
> by preempt freezing block queue during SCSI quiesce, and allows
> to allocate request of RQF_PREEMPT when queue is in this state.
> 
> Oleksandr verified that V3 does fix the hang during suspend/resume,
> and Cathy verified that revised V3 fixes hang in sending
> SCSI domain validation.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all by introducing/unifying blk_freeze_queue_preempt() and
> blk_unfreeze_queue_preempt(), and cleanup is done together.
> 
> The patchset can be found in the following gitweb:
> 
>   https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V4
> 
> V4:
>   - reorganize patch order to make it more reasonable
>   - support nested preempt freeze, as required by SCSI transport spi
>   - check preempt freezing in slow path of blk_queue_enter()
>   - add "SCSI: transport_spi: resume a quiesced device"
>   - wake up freeze queue in setting dying for both blk-mq and legacy
>   - rename blk_mq_[freeze|unfreeze]_queue() in one patch
>   - rename .mq_freeze_wq and .mq_freeze_depth
>   - improve comment
> 
> V3:
>   - introduce q->preempt_unfreezing to fix one bug of preempt freeze
>   - call blk_queue_enter_live() only when queue is preempt frozen
>   - cleanup a bit on the implementation of preempt freeze
>   - only patch 6 and 7 are changed
> 
> V2:
>   - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough as pointed by Tejun
>   - introduce preempt version of blk_[freeze|unfreeze]_queue
>   - sync between preempt freeze and normal freeze
>   - fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> Thanks,
> Ming
> 
> 
> Ming Lei (10):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   blk-mq: rename blk_mq_[freeze|unfreeze]_queue
>   blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
>   block: rename .mq_freeze_wq and .mq_freeze_depth
>   block: pass flags to blk_queue_enter()
>   block: introduce preempt version of blk_[freeze|unfreeze]_queue
>   block: allow to allocate req with RQF_PREEMPT when queue is preempt
> frozen
>   SCSI: transport_spi: resume a quiesced device
>   SCSI: preempt freeze block queue when SCSI device is put into quiesce
> 
>  block/bfq-iosched.c   |   2 +-
>  block/blk-cgroup.c|   8 +-
>  block/blk-core.c  |  95 
>  block/blk-mq.c| 180 --
>  block/blk-mq.h|   1 -
>  block/blk-timeout.c   |   2 +-
>  block/blk.h   |  12 +++
>  block/elevator.c  |   4 +-
>  drivers/block/loop.c  |  24 ++---
>  drivers/block/rbd.c   |   2 +-
>  drivers/nvme/host/core.c  |   8 +-
>  drivers/scsi/scsi_lib.c   |  25 +-
>  drivers/scsi/scsi_transport_spi.c |   3 +
>  fs/block_dev.c|   4 +-
>  include/linux/blk-mq.h|  15 ++--
>  include/linux/blkdev.h|  32 +--
>  16 files changed, 313 insertions(+), 104 deletions(-)




Re: [PATCH V3 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Oleksandr Natalenko
Again,

Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>

On sobota 2. září 2017 15:08:32 CEST Ming Lei wrote:
> Hi,
> 
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
> can be dispatched to SCSI successfully, and scsi_device_quiesce() just
> simply waits for completion of I/Os dispatched to SCSI stack. It isn't
> enough at all.
> 
> Because new request still can be allocated, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated, and system may
> hang forever, such as during system suspend or SCSI domain validation.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch introduces preempt freeze, and tries to solve the issue
> by preempt freezing block queue during SCSI quiesce, and allows
> to allocate request of RQF_PREEMPT when queue is preempt-frozen.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all by introducing blk_freeze_queue_preempt() and
> blk_unfreeze_queue_preempt(), also unifying current interfaces for
> freezing queue between block legacy and blk-mq.
> 
> Oleksandr has verified that this patchset V2 fixes his I/O hang
> during suspend/resume cycle.
> 
> V3:
>   - introduce q->preempt_unfreezing to fix one bug of preempt freeze
>   - call blk_queue_enter_live() only when queue is preempt frozen
>   - cleanup a bit on the implementation of preempt freeze
>   - only patch 6 and 7 are changed
> 
> V2:
>   - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough as pointed by Tejun
>   - introduce preempt version of blk_[freeze|unfreeze]_queue
>   - sync between preempt freeze and normal freeze
>   - fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> 
> Ming Lei (8):
>   blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
>   blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
>   blk-mq: only run hw queues for blk-mq
>   blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
>   block: tracking request allocation with q_usage_counter
>   block: introduce preempt version of blk_[freeze|unfreeze]_queue
>   block: allow to allocate req with REQF_PREEMPT when queue is preempt
> frozen
>   SCSI: preempt freeze block queue when SCSI device is put into quiesce
> 
>  block/bfq-iosched.c  |   2 +-
>  block/blk-cgroup.c   |   8 +--
>  block/blk-core.c |  53 ---
>  block/blk-mq.c   | 170 +++
>  block/blk-mq.h   |   1 -
>  block/blk.h  |  17 +
>  block/elevator.c |   4 +-
>  drivers/block/loop.c |  16 ++---
>  drivers/block/rbd.c  |   2 +-
>  drivers/nvme/host/core.c |   8 +--
>  drivers/scsi/scsi_lib.c  |  22 +-
>  include/linux/blk-mq.h   |  15 +++--
>  include/linux/blkdev.h   |  21 +-
>  13 files changed, 273 insertions(+), 66 deletions(-)




Re: [PATCH V2 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Oleksandr Natalenko
With regard to suspend/resume cycle:

Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>

On pátek 1. září 2017 20:49:49 CEST Ming Lei wrote:
> Hi,
> 
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
> can be dispatched to SCSI successfully, and scsi_device_quiesce() just
> simply waits for completion of I/Os dispatched to SCSI stack. It isn't
> enough at all.
> 
> Because new request still can be allocated, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated, and system may
> hang forever, such as during system suspend or SCSI domain validation.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch tries to solve the issue by freezing block queue during
> SCSI quiescing, and allowing to allocate request of RQF_PREEMPT
> when queue is frozen.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all by introducing preempt version of blk_freeze_queue() and
> blk_unfreeze_queue().
> 
> V2:
>   - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough as pointed by Tejun
> 
>   - introduce preempt version of blk_[freeze|unfreeze]_queue
> 
>   - sync between preempt freeze and normal freeze
> 
>   - fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> 
> Ming Lei (8):
>   blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
>   blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
>   blk-mq: only run hw queues for blk-mq
>   blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
>   block: tracking request allocation with q_usage_counter
>   block: allow to allocate req with REQF_PREEMPT when queue is frozen
>   block: introduce preempt version of blk_[freeze|unfreeze]_queue
>   SCSI: freeze block queue when SCSI device is put into quiesce
> 
>  block/bfq-iosched.c  |   2 +-
>  block/blk-cgroup.c   |   8 ++--
>  block/blk-core.c |  50 
>  block/blk-mq.c   | 119 ---
>  block/blk-mq.h   |   1 -
>  block/blk.h  |   6 +++
>  block/elevator.c |   4 +-
>  drivers/block/loop.c |  16 +++
>  drivers/block/rbd.c  |   2 +-
>  drivers/nvme/host/core.c |   8 ++--
>  drivers/scsi/scsi_lib.c  |  21 -
>  include/linux/blk-mq.h   |  15 +++---
>  include/linux/blkdev.h   |  20 +++-
>  13 files changed, 206 insertions(+), 66 deletions(-)




Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-09-01 Thread oleksandr

Hi.

01.09.2017 05:45, Ming Lei wrote:

Could you try the following patch against this patchset to see
if there is still the warning?


With this patch I wasn't able to trigger the per-CPU-related warning.

Also, for 8th patch you've written:

I have sent one delta patch to the list, which will only call 
blk_queue_enter_live() iff the SCSI device is in QUIESCE.


This refers to the patch I've just tested, right?


Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-08-31 Thread Oleksandr Natalenko
Tested against v4.13-rc7. With this patchset it looks like I/O doesn't hang, 
but once (just once, not each time) I've got the following stacktrace on 
resume:

===
[   55.577173] ata1.00: Security Log not supported
[   55.580690] ata1.00: configured for UDMA/100
[   55.582257] [ cut here ]
[   55.583924] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[   55.587489] WARNING: CPU: 3 PID: 646 at lib/percpu-refcount.c:361 
percpu_ref_reinit+0x21/0x80
[   55.590073] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt 
kvm_intel bochs_drm ppdev kvm ttm iTCO_vendor_support drm_kms_helper irqbypass 
8139too input_leds drm evdev psmouse led_class pcspkr syscopyarea joydev 
sysfillrect lpc_ich 8139cp parport_pc sysimgblt mousedev intel_agp i2c_i801 
fb_sys_fops mii mac_hid intel_gtt parport qemu_fw_cfg button sch_fq_codel 
ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio 
libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax raid10 
md_mod sr_mod cdrom sd_mod hid_generic usbhid hid uhci_hcd crct10dif_pclmul 
crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_rng ahci xhci_pci 
serio_raw pcbc ehci_pci xhci_hcd rng_core atkbd libps2 libahci ehci_hcd libata 
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd
[   55.611580]  usbcore virtio_pci scsi_mod usb_common virtio_ring virtio 
i8042 serio
[   55.614305] CPU: 3 PID: 646 Comm: kworker/u8:23 Not tainted 4.13.0-pf1 #1
[   55.616611] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[   55.619903] Workqueue: events_unbound async_run_entry_fn
[   55.621888] task: 88001b271e00 task.stack: c9a2c000
[   55.623674] RIP: 0010:percpu_ref_reinit+0x21/0x80
[   55.625751] RSP: :c9a2fdb0 EFLAGS: 00010002
[   55.628687] RAX: 0002 RBX: 88001dd80768 RCX: 
88001dd80758
[   55.631475] RDX: 0001 RSI: 0212 RDI: 
81f3e2f0
[   55.633694] RBP: c9a2fdc0 R08: 000cc61e7800 R09: 
88001f9929c0
[   55.637144] R10: ffec3296 R11: 7fff R12: 
0246
[   55.642456] R13: 88001f410800 R14: 88001f414300 R15: 

[   55.644832] FS:  () GS:88001f98() knlGS:

[   55.647388] CS:  0010 DS:  ES:  CR0: 80050033
[   55.649608] CR2:  CR3: 1aa5 CR4: 
001406e0
[   55.652688] Call Trace:
[   55.654597]  blk_unfreeze_queue+0x2f/0x50
[   55.656794]  scsi_device_resume+0x28/0x70 [scsi_mod]
[   55.659059]  scsi_dev_type_resume+0x38/0x90 [scsi_mod]
[   55.660875]  async_sdev_resume+0x15/0x20 [scsi_mod]
[   55.662564]  async_run_entry_fn+0x36/0x150
[   55.664241]  process_one_work+0x1de/0x430
[   55.666018]  worker_thread+0x47/0x3f0
[   55.667387]  kthread+0x125/0x140
[   55.672740]  ? process_one_work+0x430/0x430
[   55.674971]  ? kthread_create_on_node+0x70/0x70
[   55.677110]  ret_from_fork+0x25/0x30
[   55.679098] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 
fb 48 c7 c7 f0 e2 f3 81 e8 0a de 32 00 49 89 c4 48 8b 43 08 a8 03 75 42 <0f> 
ff 48 83 63 08 fd 65 ff 05 31 7d cc 7e 48 8b 53 08 f6 c2 03 
[   55.684822] ---[ end trace dbbf5aed3cf35331 ]---
[   55.714306] PM: resume of devices complete after 500.175 msecs
[   55.717299] OOM killer enabled.
===

Here:

===
355 void percpu_ref_reinit(struct percpu_ref *ref)
356 {
357 unsigned long flags;
358 
359 spin_lock_irqsave(&percpu_ref_switch_lock, flags);
360 
361 WARN_ON_ONCE(!percpu_ref_is_zero(ref));   // <--
362 
363 ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
364 percpu_ref_get(ref);
365 __percpu_ref_switch_mode(ref, NULL);
366 
367 spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
368 }
===

On čtvrtek 31. srpna 2017 19:38:34 CEST Ming Lei wrote:
> On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> > Since I'm in CC, does this series aim to replace 2 patches I've tested
> > before:
> > 
> > blk-mq: add requests in the tail of hctx->dispatch
> > blk-mq: align to legacy's implementation of blk_execute_rq
> > 
> > ?
> 
> Yeah, this solution is more generic, and the old one in the above
> two patches may inevitably run into the same deadlock.
> 
> Oleksandr, could you test this patchset and provide the feedback?
> 
> BTW, it fixes the I/O hang in my raid10 test, but I just write
> 'devices' to pm_test.
> 
> Thanks!




Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-08-31 Thread Oleksandr Natalenko
Since I'm in CC, does this series aim to replace 2 patches I've tested before:

blk-mq: add requests in the tail of hctx->dispatch
blk-mq: align to legacy's implementation of blk_execute_rq

?

On čtvrtek 31. srpna 2017 19:27:19 CEST Ming Lei wrote:
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
> can be dispatched to SCSI successfully, and scsi_device_quiesce() just
> simply waits for completion of I/Os dispatched to SCSI stack. It isn't
> enough at all.
> 
> Because new requests can still come in, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated, and system may
> hang forever, such as during system suspend or SCSI domain validation.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch tries to solve the issue by freezing block queue during
> SCSI quiescing, and allowing to allocate request of RQF_PREEMPT
> when queue is frozen.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all by unifying blk_freeze_queue() and blk_unfreeze_queue().
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> Ming Lei (9):
>   percpu-refcount: introduce percpu_ref_is_dead()
>   blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
>   blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
>   blk-mq: only run hw queues for blk-mq
>   block: introduce blk_drain_queue()
>   blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
>   block: tracking request allocation with q_usage_counter
>   block: allow to allocate req with REQF_PREEMPT when queue is frozen
>   SCSI: freeze block queue when SCSI device is put into quiesce
> 
>  block/bfq-iosched.c |  2 +-
>  block/blk-cgroup.c  |  8 +++
>  block/blk-core.c| 49 +
>  block/blk-mq.c  | 49 +
>  block/blk-mq.h  |  1 -
>  block/blk.h |  5 +
>  block/elevator.c|  4 ++--
>  drivers/block/loop.c| 16 +++---
>  drivers/block/rbd.c |  2 +-
>  drivers/ide/ide-pm.c|  3 ++-
>  drivers/nvme/host/core.c|  8 +++
>  drivers/scsi/scsi_lib.c | 23 +--
>  include/linux/blk-mq.h  | 13 ++-
>  include/linux/blkdev.h  | 18 +--
>  include/linux/percpu-refcount.h | 17 ++
>  15 files changed, 153 insertions(+), 65 deletions(-)


[PATCH 1/1] scsi: qla4xxx: change some variables to hexadecimal string via %pm, %pmR and %pM formats

2013-07-06 Thread Oleksandr Khoshaba
The patch converts several hand-rolled hexadecimal dumps to the '%pm', '%pmR' and 
'%pM' formats, which print the bytes as a hexadecimal string without separators, 
reversed without separators, and with ':' separators, respectively.

Signed-off-by: Oleksandr Khoshaba oleksandr.khosh...@gmail.com
---
 drivers/scsi/qla4xxx/ql4_mbx.c |6 ++
 drivers/scsi/qla4xxx/ql4_nx.c  |8 ++--
 drivers/scsi/qla4xxx/ql4_os.c  |   15 ---
 3 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/qla4xxx/ql4_mbx.c b/drivers/scsi/qla4xxx/ql4_mbx.c
index a501bea..004ada6 100644
--- a/drivers/scsi/qla4xxx/ql4_mbx.c
+++ b/drivers/scsi/qla4xxx/ql4_mbx.c
@@ -1837,10 +1837,8 @@ int qla4xxx_set_param_ddbentry(struct scsi_qla_host *ha,
	ptid = (uint16_t *)&fw_ddb_entry->isid[1];
	*ptid = cpu_to_le16((uint16_t)ddb_entry->sess->target_id);
 
-	DEBUG2(ql4_printk(KERN_INFO, ha, "ISID [%02x%02x%02x%02x%02x%02x]\n",
-			  fw_ddb_entry->isid[5], fw_ddb_entry->isid[4],
-			  fw_ddb_entry->isid[3], fw_ddb_entry->isid[2],
-			  fw_ddb_entry->isid[1], fw_ddb_entry->isid[0]));
+	DEBUG2(ql4_printk(KERN_INFO, ha, "ISID [%pmR]\n",
+			  fw_ddb_entry->isid));
 
	iscsi_opts = le16_to_cpu(fw_ddb_entry->iscsi_options);
	memset(fw_ddb_entry->iscsi_alias, 0, sizeof(fw_ddb_entry->iscsi_alias));
diff --git a/drivers/scsi/qla4xxx/ql4_nx.c b/drivers/scsi/qla4xxx/ql4_nx.c
index eaf00c1..2bc1046 100644
--- a/drivers/scsi/qla4xxx/ql4_nx.c
+++ b/drivers/scsi/qla4xxx/ql4_nx.c
@@ -3455,12 +3455,8 @@ int qla4_8xxx_get_sys_info(struct scsi_qla_host *ha)
	ha->phy_port_num = sys_info->port_num;
	ha->iscsi_pci_func_cnt = sys_info->iscsi_pci_func_cnt;
 
-	DEBUG2(printk("scsi%ld: %s: "
-		      "mac %02x:%02x:%02x:%02x:%02x:%02x "
-		      "serial %s\n", ha->host_no, __func__,
-		      ha->my_mac[0], ha->my_mac[1], ha->my_mac[2],
-		      ha->my_mac[3], ha->my_mac[4], ha->my_mac[5],
-		      ha->serial_number));
+	DEBUG2(printk("scsi%ld: %s: mac %pM serial %s\n", ha->host_no,
+		      __func__, ha->my_mac, ha->serial_number));
 
status = QLA_SUCCESS;
 
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index b246b3c..2797b8f 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -4786,13 +4786,9 @@ static int qla4xxx_compare_tuple_ddb(struct scsi_qla_host *ha,
	 * ISID would not match firmware generated ISID.
	 */
	if (is_isid_compare) {
-		DEBUG2(ql4_printk(KERN_INFO, ha, "%s: old ISID [%02x%02x%02x"
-			"%02x%02x%02x] New ISID [%02x%02x%02x%02x%02x%02x]\n",
-			__func__, old_tddb->isid[5], old_tddb->isid[4],
-			old_tddb->isid[3], old_tddb->isid[2], old_tddb->isid[1],
-			old_tddb->isid[0], new_tddb->isid[5], new_tddb->isid[4],
-			new_tddb->isid[3], new_tddb->isid[2], new_tddb->isid[1],
-			new_tddb->isid[0]));
+		DEBUG2(ql4_printk(KERN_INFO, ha,
+				  "%s: old ISID [%pmR] New ISID [%pmR]\n",
+				  __func__, old_tddb->isid, new_tddb->isid));
 
		if (memcmp(&old_tddb->isid[0], &new_tddb->isid[0],
			   sizeof(old_tddb->isid)))
@@ -6211,10 +6207,7 @@ qla4xxx_sysfs_ddb_get_param(struct iscsi_bus_flash_session *fnode_sess,
		rc = sprintf(buf, "%u\n", fnode_conn->keepalive_timeout);
		break;
	case ISCSI_FLASHNODE_ISID:
-		rc = sprintf(buf, "%02x%02x%02x%02x%02x%02x\n",
-			     fnode_sess->isid[0], fnode_sess->isid[1],
-			     fnode_sess->isid[2], fnode_sess->isid[3],
-			     fnode_sess->isid[4], fnode_sess->isid[5]);
+		rc = sprintf(buf, "%pm\n", fnode_sess->isid);
		break;
	case ISCSI_FLASHNODE_TSID:
		rc = sprintf(buf, "%u\n", fnode_sess->tsid);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] scsi: qla2xxx: change some variables to hexadecimal string via %*phN format

2013-06-29 Thread Oleksandr Khoshaba
The patch modifies some variables using the format '%*phN' and prints them as a 
hexadecimal string with no separator.

Signed-off-by: Oleksandr Khoshaba oleksandr.khosh...@gmail.com
---
 drivers/scsi/qla2xxx/qla_gs.c  |   12 ++--
 drivers/scsi/qla2xxx/qla_isr.c |9 ++---
 drivers/scsi/qla2xxx/qla_mr.c  |   10 ++
 3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 8b2f75a..4155e15 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -1394,16 +1394,8 @@ qla2x00_fdmi_rhba(scsi_qla_host_t *vha)
qla2x00_update_ms_fdmi_iocb(vha, size + 16);
 
ql_dbg(ql_dbg_disc, vha, 0x202e,
-	    "RHBA identifier = "
-	    "%02x%02x%02x%02x%02x%02x%02x%02x size=%d.\n",
-	    ct_req->req.rhba.hba_identifier[0],
-	    ct_req->req.rhba.hba_identifier[1],
-	    ct_req->req.rhba.hba_identifier[2],
-	    ct_req->req.rhba.hba_identifier[3],
-	    ct_req->req.rhba.hba_identifier[4],
-	    ct_req->req.rhba.hba_identifier[5],
-	    ct_req->req.rhba.hba_identifier[6],
-	    ct_req->req.rhba.hba_identifier[7], size);
+	    "RHBA identifier = %8phN size=%d.\n",
+	    ct_req->req.rhba.hba_identifier, size);
ql_dump_buffer(ql_dbg_disc + ql_dbg_buffer, vha, 0x2076,
entries, size);
 
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index d2a4c75..b269682 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -2212,16 +2212,11 @@ check_scsi_status:
 out:
if (logit)
		ql_dbg(ql_dbg_io, fcport->vha, 0x3022,
-		    "FCP command status: 0x%x-0x%x (0x%x) "
-		    "nexus=%ld:%d:%d portid=%02x%02x%02x oxid=0x%x "
-		    "cdb=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x len=0x%x "
-		    "rsp_info=0x%x resid=0x%x fw_resid=0x%x.\n",
+		    "FCP command status: 0x%x-0x%x (0x%x) nexus=%ld:%d:%d portid=%02x%02x%02x oxid=0x%x cdb=%10phN len=0x%x rsp_info=0x%x resid=0x%x fw_resid=0x%x.\n",
		    comp_status, scsi_status, res, vha->host_no,
		    cp->device->id, cp->device->lun, fcport->d_id.b.domain,
		    fcport->d_id.b.area, fcport->d_id.b.al_pa, ox_id,
-		    cp->cmnd[0], cp->cmnd[1], cp->cmnd[2], cp->cmnd[3],
-		    cp->cmnd[4], cp->cmnd[5], cp->cmnd[6], cp->cmnd[7],
-		    cp->cmnd[8], cp->cmnd[9], scsi_bufflen(cp), rsp_info_len,
+		    cp->cmnd, scsi_bufflen(cp), rsp_info_len,
resid_len, fw_resid_len);
 
if (!res)
diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
index a6df558..8c3ffcc 100644
--- a/drivers/scsi/qla2xxx/qla_mr.c
+++ b/drivers/scsi/qla2xxx/qla_mr.c
@@ -2517,16 +2517,10 @@ check_scsi_status:
 
if (logit)
		ql_dbg(ql_dbg_io, fcport->vha, 0x3058,
-		    "FCP command status: 0x%x-0x%x (0x%x) "
-		    "nexus=%ld:%d:%d tgt_id: 0x%x lscsi_status: 0x%x "
-		    "cdb=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x len=0x%x "
-		    "rsp_info=0x%x resid=0x%x fw_resid=0x%x "
-		    "sense_len=0x%x, par_sense_len=0x%x, rsp_info_len=0x%x\n",
+		    "FCP command status: 0x%x-0x%x (0x%x) nexus=%ld:%d:%d tgt_id: 0x%x lscsi_status: 0x%x cdb=%10phN len=0x%x rsp_info=0x%x resid=0x%x fw_resid=0x%x sense_len=0x%x, par_sense_len=0x%x, rsp_info_len=0x%x\n",
		    comp_status, scsi_status, res, vha->host_no,
		    cp->device->id, cp->device->lun, fcport->tgt_id,
-		    lscsi_status, cp->cmnd[0], cp->cmnd[1], cp->cmnd[2],
-		    cp->cmnd[3], cp->cmnd[4], cp->cmnd[5], cp->cmnd[6],
-		    cp->cmnd[7], cp->cmnd[8], cp->cmnd[9], scsi_bufflen(cp),
+		    lscsi_status, cp->cmnd, scsi_bufflen(cp),
rsp_info_len, resid_len, fw_resid_len, sense_len,
par_sense_len, rsp_info_len);
 
-- 
1.7.9.5



[PATCH 1/1] scsi: qla2xxx: change some variables to hexadecimal string via %*phC or %*phN format

2013-06-23 Thread Oleksandr Khoshaba
The patch modifies some variables using the format '%*ph[CN]' and prints them 
as a hexadecimal string with a ':' separator or none.

Signed-off-by: Oleksandr Khoshaba oleksandr.khosh...@gmail.com
---
 drivers/scsi/qla2xxx/qla_bsg.c |8 +---
 drivers/scsi/qla2xxx/qla_gs.c  |   82 ++--
 drivers/scsi/qla2xxx/qla_init.c|   25 +++
 drivers/scsi/qla2xxx/qla_os.c  |   10 +
 drivers/scsi/qla2xxx/qla_target.c  |   74 +---
 drivers/scsi/qla2xxx/tcm_qla2xxx.c |   18 +++-
 6 files changed, 51 insertions(+), 166 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_bsg.c b/drivers/scsi/qla2xxx/qla_bsg.c
index 39719f8..8e19079 100644
--- a/drivers/scsi/qla2xxx/qla_bsg.c
+++ b/drivers/scsi/qla2xxx/qla_bsg.c
@@ -1333,12 +1333,8 @@ qla24xx_iidma(struct fc_bsg_job *bsg_job)
 
if (rval) {
		ql_log(ql_log_warn, vha, 0x704c,
-		    "iIDMA cmd failed for %02x%02x%02x%02x%02x%02x%02x%02x -- "
-		    "%04x %x %04x %04x.\n", fcport->port_name[0],
-		    fcport->port_name[1], fcport->port_name[2],
-		    fcport->port_name[3], fcport->port_name[4],
-		    fcport->port_name[5], fcport->port_name[6],
-		    fcport->port_name[7], rval, fcport->fp_speed, mb[0], mb[1]);
+		    "iIDMA cmd failed for %8phN -- %04x %x %04x %04x.\n",
+		    fcport->port_name, rval, fcport->fp_speed, mb[0], mb[1]);
		rval = (DID_ERROR << 16);
	} else {
		if (!port_param->mode) {
diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index d0ea8b9..8b2f75a 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -226,17 +226,8 @@ qla2x00_ga_nxt(scsi_qla_host_t *vha, fc_port_t *fcport)
		fcport->d_id.b.domain = 0xf0;
 
	ql_dbg(ql_dbg_disc, vha, 0x2063,
-	    "GA_NXT entry - nn %02x%02x%02x%02x%02x%02x%02x%02x "
-	    "pn %02x%02x%02x%02x%02x%02x%02x%02x "
-	    "port_id=%02x%02x%02x.\n",
-	    fcport->node_name[0], fcport->node_name[1],
-	    fcport->node_name[2], fcport->node_name[3],
-	    fcport->node_name[4], fcport->node_name[5],
-	    fcport->node_name[6], fcport->node_name[7],
-	    fcport->port_name[0], fcport->port_name[1],
-	    fcport->port_name[2], fcport->port_name[3],
-	    fcport->port_name[4], fcport->port_name[5],
-	    fcport->port_name[6], fcport->port_name[7],
+	    "GA_NXT entry - nn %8phN pn %8phN port_id=%02x%02x%02x.\n",
+	    fcport->node_name, fcport->port_name,
	    fcport->d_id.b.domain, fcport->d_id.b.area,
	    fcport->d_id.b.al_pa);
	}
@@ -448,17 +439,9 @@ qla2x00_gnn_id(scsi_qla_host_t *vha, sw_info_t *list)
		       ct_rsp->rsp.gnn_id.node_name, WWN_SIZE);
 
		ql_dbg(ql_dbg_disc, vha, 0x2058,
-		    "GID_PT entry - nn %02x%02x%02x%02x%02x%02x%02X%02x "
-		    "pn %02x%02x%02x%02x%02x%02x%02X%02x "
+		    "GID_PT entry - nn %8phN pn %8phN "
		    "portid=%02x%02x%02x.\n",
-		    list[i].node_name[0], list[i].node_name[1],
-		    list[i].node_name[2], list[i].node_name[3],
-		    list[i].node_name[4], list[i].node_name[5],
-		    list[i].node_name[6], list[i].node_name[7],
-		    list[i].port_name[0], list[i].port_name[1],
-		    list[i].port_name[2], list[i].port_name[3],
-		    list[i].port_name[4], list[i].port_name[5],
-		    list[i].port_name[6], list[i].port_name[7],
+		    list[i].node_name, list[i].port_name,
		    list[i].d_id.b.domain, list[i].d_id.b.area,
		    list[i].d_id.b.al_pa);
	}
@@ -798,17 +781,8 @@ qla2x00_sns_ga_nxt(scsi_qla_host_t *vha, fc_port_t *fcport)
		fcport->d_id.b.domain = 0xf0;
 
	ql_dbg(ql_dbg_disc, vha, 0x2061,
-	    "GA_NXT entry - nn %02x%02x%02x%02x%02x%02x%02x%02x "
-	    "pn %02x%02x%02x%02x%02x%02x%02x%02x "
-	    "port_id=%02x%02x%02x.\n",
-	    fcport->node_name[0], fcport->node_name[1],
-	    fcport->node_name[2], fcport->node_name[3],
-	    fcport->node_name[4], fcport->node_name[5],
-	    fcport->node_name[6], fcport->node_name[7],
-	    fcport->port_name[0], fcport->port_name[1],
-	    fcport->port_name[2], fcport->port_name[3],
-	    fcport->port_name[4], fcport->port_name[5],
-	    fcport->port_name[6], fcport->port_name[7],
+   GA_NXT entry - nn