Re: [PATCH] Revert "scsi-mq: Always unprepare before requeuing a request"
Bart Van Assche writes:
> On Wed, 2017-08-16 at 22:51 -0400, Martin K. Petersen wrote:
>> > When I checked earlier today the ipr patch was not yet in linux-next
>>
>> That's weird. They were both committed two weeks ago.
>>
>> They appear to be in there now, though:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/drivers/scsi?ofs=50
>
> Hello Martin,
>
> As far as I can see commit 270065e92c31 ("scsi-mq: Always unprepare before
> requeuing a request") was yesterday in linux-next but not the ipr fix
> (commit b0e17a9b0df2 ("scsi: ipr: Fix scsi-mq lockdep issue")). The
> ipr fix is in today's linux-next but was not in linux-next yesterday (the
> Australian time zone applies to the linux-next date labels):
> $ git tag --contains 270065e92c31
> next-20170816
> $ git tag --contains b0e17a9b0df2
> next-20170817
> $ git show next-20170816 | head -3
> tag next-20170816
> Tagger: Stephen Rothwell
> Date: Wed Aug 16 15:26:43 2017 +1000
> $ git show next-20170817 | head -3
> tag next-20170817
> Tagger: Stephen Rothwell
> Date: Thu Aug 17 16:34:30 2017 +1000
>
> I think this means that the ipr fix went upstream before it ended up in
> linux-next.

It was in linux-next, but as a different commit. I don't know why.

$ git log -1 --format=oneline next-20170816 drivers/scsi/ipr.c
48b580cacfae123471f8cd43ca81b0e53c9cf702 scsi: ipr: Fix scsi-mq lockdep issue
$ git tag --contains 48b580cacfae123471f8cd43ca81b0e53c9cf702
next-20170809
next-20170810
next-20170811
next-20170815
next-20170816

cheers
HBA recommended as FC target
Hello, I am looking for an FC HBA that works with the Linux target. Can someone recommend an HBA type to me? Cheers, Thomas
Sniffing FC traffic
Hello, I would like to create a setup that allows me to sniff FC traffic. Is this possible with Linux, or can someone recommend a setup that works? I want to avoid buying a 120k USD fabric analyzer. Cheers, Thomas
Re: HBA recommended as FC target
On Fri, Aug 18, 2017 at 8:36 AM, Thomas Glanzmann wrote:
> Hello,
> I am looking for an FC HBA that works with the Linux target. Can someone
> recommend an HBA type to me?
>
> Cheers,
> Thomas

Hello

Any of the Qlogic qla24xx or qla25xx and higher that allow you to disable initiator mode will work. I use qla25xx 8Gbit in all my Target arrays.

Thanks
Laurence
Re: Sniffing FC traffic
On Fri, Aug 18, 2017 at 8:37 AM, Thomas Glanzmann wrote:
> Hello,
> I would like to create a setup that allows me to sniff FC traffic. Is this
> possible with Linux, or can someone recommend a setup that works? I want
> to avoid buying a 120k USD fabric analyzer.
>
> Cheers,
> Thomas

There is no way to do this using adapters and generic F/C with Linux as the O/S. The ability to enable debugging in the F/C drivers will expose some of the internals, but there is no way to sniff directly as far as I am aware.
Many switches allow port level tracing facilities, but inline sniffing using Linux and generic hosts is not possible.
We have Finisars for inline tracing when we have to debug host and fabric issues.
Software-based FCoE using libfc and, for example, the Intel cards will allow Wireshark tracing, but that is F/C encapsulated in Ethernet packets, hence the Wireshark ability.

Thanks
Laurence
Re: Sniffing FC traffic
On 08/18/2017 08:31 AM, Laurence Oberman wrote:
> On Fri, Aug 18, 2017 at 8:37 AM, Thomas Glanzmann wrote:
>> I would like to create a setup that allows me to sniff FC traffic. Is this
>> possible with Linux, or can someone recommend a setup that works? I want
>> to avoid buying a 120k USD fabric analyzer.
>
> There is no way to do this using adapters and generic F/C with Linux as the O/S.
> The ability to enable debugging in the F/C drivers will expose some of the internals,
> but there is no way to sniff directly as far as I am aware.
> Many switches allow port level tracing facilities, but inline sniffing using Linux and
> generic hosts is not possible.
> We have Finisars for inline tracing when we have to debug host and fabric issues.
> Software-based FCoE using libfc and, for example, the Intel cards will allow Wireshark
> tracing, but that is F/C encapsulated in Ethernet packets, hence the Wireshark ability.

We have had some success using a (Teledyne) LeCroy analyzer. Its GUI (SierraNet) runs only on a Windows host, but it is capable of exporting a capture to Wireshark. This mostly shows just the ELS and FCP transactions; to debug low-level link issues you'd have to work with the original capture in SierraNet.

Like any other kind of analyzer, the cost will be proportional to the speed of the link you're trying to sniff and the amount of capture depth you need. You may be able to save some money by getting secondhand equipment and/or running the link at a lower speed when you need to debug something.

Regards,
Steven J. Magnani
www.digidescorp.com
"I claim this network for MARS! Earthling, return my space modulator!"
Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Christoph,

On 08/06/17 20:06, Paul Menzel wrote:
> On 2017-08-05 11:30, Christoph Hellwig wrote:
>> On Thu, Aug 03, 2017 at 07:42:15PM +0200, Paul Menzel wrote:
>>> Since the merge window opened for Linux 4.13, I am unable to resume
>>> from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am
>>> unable to enter anything, and the system seems to be hung. Magic SysRq
>>> keys still work, but powering the system off doesn’t work. The power
>>> button also does not work.
>>>
>>> Please find the stack trace with Linux 4.13-rc3 captured over the serial
>>> console below.
>>
>> Is this really -rc3? rc3 has a commit to disable block runtime pm for
>> blk-mq, which is now the default for scsi. So with -rc1 we've seen
>> similar reports, but rc3 would be odd and suggest we have further problems.
>
> Yes, this was 4.13-rc3. Rebuilding the Linux kernel from commit 0fdd951c
> (Merge tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media)
> shows the same behavior.

Just an update: this is still present in Linux 4.13-rc5+, that is, commit 04d49f3638d0 (Merge tag 'drm-fixes-for-v4.13-rc6' of git://people.freedesktop.org/~airlied/linux).

Kind regards,

Paul
Re: HBA recommended as FC target
Hello Laurence,

* Laurence Oberman [2017-08-18 15:26]:
> Any of the Qlogic qla24xx or qla25xx and higher that allow you to
> disable initiator mode will work. I use qla25xx 8Gbit in all my
> Target arrays.

Thank you for your recommendations. I ordered a Qlogic QLE2462-HP -PX2510401 - 4GB 2-Port Fibre on eBay.

* Laurence Oberman [2017-08-18 15:31]:
> There is no way to do this using adapters and generic F/C with Linux
> as the O/S. The ability to enable debugging in the F/C drivers will
> expose some of the internals, but there is no way to sniff directly as
> far as I am aware.

I enabled debugging in the Linux target once, and I could see quite detailed information. That will probably be enough for me; at the time I was debugging SCSI reservations.

> Many switches allow port level tracing facilities, but inline sniffing
> using Linux and generic hosts is not possible.

Are you aware of facilities in entry-level Brocade switches that I can use? I only need to trace one port.

Cheers,
Thomas
Re: Sniffing FC traffic
Hello Steve,

> We have had some success using a (Teledyne) LeCroy analyzer. Its GUI

I found a LeCroy FC analyzer on eBay for 500 EUR. I'm not buying it yet, but I'll keep it in mind. That is much less than the cost I had in mind (100k EUR).

Cheers,
Thomas
[Bug 196707] New: Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Ma
https://bugzilla.kernel.org/show_bug.cgi?id=196707

Bug ID: 196707
Summary: Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Manager"
Product: SCSI Drivers
Version: 2.5
Kernel Version: 4.12.8
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: AACRAID
Assignee: scsi_drivers-aacr...@kernel-bugs.osdl.org
Reporter: kernel@yaze-ag.de
CC: a...@borouhin.com, ar...@maven.pl, bug...@grubelek.pl, chris...@niksun.com, davyg...@pobox.com, har...@snel.ws, he...@fitmsg.net, hm10...@gmail.com, rodrigoaguileraparr...@gmail.com, scnai...@hotmail.com, scsi_drivers-aacr...@kernel-bugs.osdl.org
Regression: No

Created attachment 258013
  --> https://bugzilla.kernel.org/attachment.cgi?id=258013&action=edit
Kernel output of 3 kernel starts with connected / disconnected HDDs and different kernels

+++ This bug was initially created as a clone of Bug #151661 +++

Hi,

my English is not very good; I hope you understand what I write here.

Kernel 4.12.8 crashes after errors from aacraid and after the start of the "Kernel Device Manager" when I connect 8 HDDs (powered on) to the ICP9087MA RAID controller. (See attachment 0_...)

The crash happens before the kernel opens the log files in /var/log; I found NO entries in the log files. So I configured GRUB (in /etc/default/grub) and the kernel to use the serial line (/dev/ttyS0, COM1) as the console, and on my notebook I captured the serial output of the booting kernel with minicom.

If I disconnect the 8 HDDs (power off), kernel 4.12.8 boots normally and I can use the system, but without the two host drives (2 x RAID-5) of the ICP9087MA. (See attachment 1_...)

When I use various 3.x.x kernels (from kernel.org or Debian), everything works fine with the 8 HDDs connected to the ICP9087MA RAID controller. (See attachment 2_...)
Attachment (output of 3 kernel starts in one attachment):
0_kernel_crash_(4.12.8)_with_all_drives_at_the_ICP9087MA.cap
1_normal_start_(4.12.8)_without_any_drive_at_the_ICP9087MA.cap
2_normal_start_(3.18.65)_with_all_drives_at_the_ICP9087MA.cap

--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
[PATCH] sd: preserve sysfs updates to max_sectors_kb
prevent systemd-udevd from changing a device's sysfs entry
max_sectors_kb back to the default value.
 - max_sectors_kb can be tweaked for better performance.
 - udev can be triggered by sg_logs -t or scsi_temperature, ...
 - sd_revalidate_disk is called from udev by ioctl BLKRRPART

Reviewed-by: Scott Teel
Signed-off-by: Don Brace
---
 drivers/scsi/sd.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bea36ad..457dc7c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3055,6 +3055,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	sector_t old_capacity = sdkp->capacity;
 	unsigned char *buffer;
 	unsigned int dev_max, rw_max;
+	unsigned int max_sectors;
 
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
 				      "sd_revalidate_disk\n"));
@@ -3128,9 +3129,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
 				      (sector_t)BLK_DEF_MAX_SECTORS);
 
-	/* Combine with controller limits */
-	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
+	/* Check for max_sectors_kb update through sysfs */
+	if (q->limits.max_sectors < min(rw_max, queue_max_hw_sectors(q)))
+		max_sectors = q->limits.max_sectors;
+	else
+		max_sectors = min(rw_max, queue_max_hw_sectors(q));
 
+	/* Combine with controller limits */
+	q->limits.max_sectors = max_sectors;
 	set_capacity(disk, logical_to_sectors(sdp, sdkp->capacity));
 	sd_config_write_same(sdkp);
 	kfree(buffer);
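A minimal user-space sketch of the clamping rule this patch introduces, under stated assumptions: revalidate_max_sectors(), min_uint() and sectors_to_kb() are hypothetical stand-ins rather than sd.c code, and max_sectors is taken to be in 512-byte sectors, so max_sectors_kb is half of it. The rule keeps a smaller value already present in q->limits.max_sectors and otherwise uses min(rw_max, max_hw_sectors).

```c
/*
 * Hypothetical model of the proposed sd_revalidate_disk() logic.
 * None of these names exist in the kernel; they only mirror the
 * arithmetic in the patch.
 */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* current_max: value already in q->limits.max_sectors (possibly from sysfs) */
static unsigned int revalidate_max_sectors(unsigned int current_max,
					   unsigned int rw_max,
					   unsigned int max_hw_sectors)
{
	unsigned int limit = min_uint(rw_max, max_hw_sectors);

	/* Keep a smaller, previously configured value... */
	if (current_max < limit)
		return current_max;
	/* ...otherwise fall back to the device/controller limit. */
	return limit;
}

/* 512-byte sectors to KiB, assuming max_sectors_kb == max_sectors / 2. */
static unsigned int sectors_to_kb(unsigned int sectors)
{
	return sectors >> 1;
}
```

With the "echo 64" from this thread (64 KiB, i.e. 128 sectors), revalidate_max_sectors(128, 2560, 1024) stays at 128, while revalidate_max_sectors(2048, 2560, 1024) is clamped to 1024. The branch is equivalent to min(current_max, limit), so the value can only shrink across revalidations: whatever is already in q->limits.max_sectors before the first revalidate is kept too, not only values written through sysfs, which may be worth checking in review.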
Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
On 08/17/2017 10:52 AM, Bart Van Assche wrote: > On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote: >> On 08/16/2017 12:21 PM, Bart Van Assche wrote: >>> On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote: As of next-20170809, linux-next on powerpc boot hung with below trace message. [ ... ] A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq: Always unprepare ...) in the merge branch 'scsi/for-next' System booted fine when the below commit is reverted: commit 270065e92c317845d69095ec8e3d18616b5b39d5 Author: Bart Van Assche Date: Thu Aug 3 14:40:14 2017 -0700 scsi: scsi-mq: Always unprepare before requeuing a request >>> >>> Hello Brian and Michael, >>> >>> Do you agree that this probably indicates a bug in the PowerPC block driver >>> that is used to access the boot disk? Anyway, since a solution is not yet >>> available, I will submit a revert for this patch. >> >> I've been looking at this a bit, and can recreate the issue, but haven't >> got to root cause of the issue as of yet. 
If I do a sysrq-w while the system is hung
>> during boot I see this:
>>
>> [ 25.561523] Workqueue: events_unbound async_run_entry_fn
>> [ 25.561527] Call Trace:
>> [ 25.561529] [c001697873f0] [c00169701600] 0xc00169701600 (unreliable)
>> [ 25.561534] [c001697875c0] [c001ab78] __switch_to+0x2e8/0x430
>> [ 25.561539] [c00169787620] [c091ccb0] __schedule+0x310/0xa00
>> [ 25.561543] [c001697876f0] [c091d3e0] schedule+0x40/0xb0
>> [ 25.561548] [c00169787720] [c0921e40] schedule_timeout+0x200/0x430
>> [ 25.561553] [c00169787810] [c091db10] io_schedule_timeout+0x30/0x70
>> [ 25.561558] [c00169787840] [c091e978] wait_for_common_io.constprop.3+0x178/0x280
>> [ 25.561563] [c001697878c0] [c047f7ec] blk_execute_rq+0x7c/0xd0
>> [ 25.561567] [c00169787910] [c0614cd0] scsi_execute+0x100/0x230
>> [ 25.561572] [c00169787990] [c060d29c] scsi_report_opcode+0xbc/0x170
>> [ 25.561577] [c00169787a50] [d4fe6404] sd_revalidate_disk+0xe04/0x1620 [sd_mod]
>> [ 25.561583] [c00169787b80] [d4fe6d84] sd_probe_async+0xb4/0x230 [sd_mod]
>> [ 25.561588] [c00169787c00] [c010fc44] async_run_entry_fn+0x74/0x210
>> [ 25.561593] [c00169787c90] [c0102f48] process_one_work+0x198/0x480
>> [ 25.561598] [c00169787d30] [c01032b8] worker_thread+0x88/0x510
>> [ 25.561603] [c00169787dc0] [c010b030] kthread+0x160/0x1a0
>> [ 25.561608] [c00169787e30] [c000b3a4] ret_from_kernel_thread+0x5c/0xb8
>>
>> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID arrays don't support
>> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting sdev->no_report_opcodes = 1
>> in ipr's slave configure. This seems to eliminate the boot hang for me, but is only working around
>> the issue. Since this command is not supported by ipr, it should return with an illegal request.
>> When I'm hung at this point, there is nothing outstanding to the adapter / driver. I'll continue
>> debugging...
>
> (+linux-scsi)
>
> Hello Brian,
>
> Is kernel debugging enabled on your test system? Is lockdep enabled?
> Anyway, stack traces like the above usually mean that a request got stuck in
> a block or scsi driver (ipr in this case). Information about pending requests,
> including the SCSI CDB, is available under /sys/kernel/debug/block (see also
> commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

I think I have an understanding of what is going on and why Bart's patch is causing problems for ipr. I can work around the boot hang in ipr, but ultimately I think we need to figure out a fix in scsi / block. I added some tracing and confirmed it's not a matter of commands getting stuck in ipr. The issue is we are retrying failed commands until we finally run out of time. This is what I see:

1. sd_revalidate_disk calls scsi_report_opcode
2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
3. ipr returns the command with DID_ERROR
4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
7. This results in the command getting scmd->retries zeroed out when it gets re-queued, since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for quite a long time...
9. Finally, the command has been retried so long we trip over the overall retry timer in scsi_soft
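Steps 4 to 7 can be illustrated with a small, self-contained C model. This is a hypothetical simulation, not midlayer code: struct retry_cmd, model_prep() and dispatch_failing_cmd() are invented names. It assumes, as described in the steps above, that prep zeroes the retry counter and only runs while the DONTPREP flag is clear, and that the maybe_retry path gives up once the incremented retry count exceeds the allowed count.

```c
#include <stdbool.h>

/* Hypothetical, simplified command state; not struct scsi_cmnd. */
struct retry_cmd {
	bool dontprep;	/* stands in for RQF_DONTPREP */
	int retries;	/* stands in for scmd->retries */
	int allowed;	/* stands in for scmd->allowed */
};

/* Prep only runs when DONTPREP is clear, and it zeroes the retry counter. */
static void model_prep(struct retry_cmd *cmd)
{
	if (!cmd->dontprep) {
		cmd->retries = 0;
		cmd->dontprep = true;
	}
}

/*
 * Dispatch a command that fails with a retryable error every time.
 * clear_dontprep_on_requeue models the behaviour of the reverted commit.
 * Returns the number of dispatches, capped at max_dispatch so that the
 * "never gives up" case still terminates.
 */
static int dispatch_failing_cmd(struct retry_cmd *cmd,
				bool clear_dontprep_on_requeue,
				int max_dispatch)
{
	int sent = 0;

	while (sent < max_dispatch) {
		model_prep(cmd);
		sent++;
		/* maybe_retry: stop once the incremented count exceeds allowed */
		if (++cmd->retries > cmd->allowed)
			break;
		/* requeue path */
		if (clear_dontprep_on_requeue)
			cmd->dontprep = false;
	}
	return sent;
}
```

With allowed = 3, the old path sends the command 4 times (one attempt plus three retries). With DONTPREP cleared on requeue, the counter restarts at zero on every pass and the loop only ends at the artificial cap.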
Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb
On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote: > prevent systemd-udevd from changing a device's sysfs entry > max_sectors_kb back to the default value. > - max_sectors_kb can be tweaked for better performance. > - udev can be triggered by sg_logs -t or scsi_temperature, ... > - sd_revalidate_disk is called from udev by ioctl BLKRRPART Hello Don, Which udev rule changes max_sectors_kb back to the default? Why do you want to change the kernel code instead of modifying that udev rule? What software changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps something else? Thanks, Bart.
[PATCH] ipr: Set no_report_opcodes for RAID arrays
Since ipr RAID arrays do not support the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES command, set no_report_opcodes to prevent it from being sent.

Signed-off-by: Brian King
---

Index: linux-2.6.git/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.git.orig/drivers/scsi/ipr.c
+++ linux-2.6.git/drivers/scsi/ipr.c
@@ -4935,6 +4935,7 @@ static int ipr_slave_configure(struct sc
 		}
 		if (ipr_is_vset_device(res)) {
 			sdev->scsi_level = SCSI_SPC_3;
+			sdev->no_report_opcodes = 1;
 			blk_queue_rq_timeout(sdev->request_queue,
 					     IPR_VSET_RW_TIMEOUT);
 			blk_queue_max_hw_sectors(sdev->request_queue, IPR_VSET_MAX_SECTORS);
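A hypothetical model of why this one-line change is enough. It assumes, as we read scsi_report_opcode() in kernels of this era, that the function returns -EINVAL without sending anything when sdev->no_report_opcodes is set or the device reports a SCSI level below SPC-3. Because ipr sets scsi_level to SCSI_SPC_3 for RAID (vset) devices, the level check alone would not stop the command. The struct, enum and function names are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>

/* Arbitrary values; only the ordering matters for this model. */
enum model_level { MODEL_SPC_2 = 1, MODEL_SPC_3 = 2 };

/* Hypothetical subset of struct scsi_device state. */
struct model_sdev {
	int scsi_level;
	bool no_report_opcodes;
	bool is_vset;		/* stands in for ipr_is_vset_device(res) */
};

/* Mirrors the patched vset branch of ipr_slave_configure(). */
static void model_slave_configure(struct model_sdev *sdev)
{
	if (sdev->is_vset) {
		sdev->scsi_level = MODEL_SPC_3;
		sdev->no_report_opcodes = true;	/* the one-line fix */
	}
}

/* Assumed early-exit check at the top of scsi_report_opcode(). */
static int model_report_opcode_gate(const struct model_sdev *sdev)
{
	if (sdev->no_report_opcodes || sdev->scsi_level < MODEL_SPC_3)
		return -EINVAL;	/* MAINTENANCE IN is never sent */
	return 0;		/* the command would be issued */
}
```

Without the flag, a vset device configured to SPC-3 would pass the gate and the unsupported command would be issued, which is the path traced earlier in this thread.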
RE: [PATCH] sd: preserve sysfs updates to max_sectors_kb
> -----Original Message-----
> From: Bart Van Assche [mailto:bart.vanass...@wdc.com]
> Sent: Friday, August 18, 2017 4:06 PM
> To: h...@infradead.org; Viswas G ; Gerry Morong ; Mahesh Rajashekhara ; posw...@suse.com; Scott Benesh ; Don Brace ; Bader Ali - Saleh ; Kevin Barnett ; joseph.szczy...@hpe.com; Scott Teel ; j...@linux.vnet.ibm.com; Justin Lindley ; John Hall
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb
>
> On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote:
> > prevent systemd-udevd from changing a device's sysfs entry
> > max_sectors_kb back to the default value.
> > - max_sectors_kb can be tweaked for better performance.
> > - udev can be triggered by sg_logs -t or scsi_temperature, ...
> > - sd_revalidate_disk is called from udev by ioctl BLKRRPART
>
> Hello Don,
>
> Which udev rule changes max_sectors_kb back to the default? Why do you want
> to change the kernel code instead of modifying that udev rule? What software
> changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps
> something else?
>
> Thanks,
>
> Bart.

As far as I can see, udev looks for file access in sysfs.
I am not exactly sure which rule changes this. It was added in more recent distros. Can someone help me out?

I wanted to change the kernel code because it looks to me like any time sd_revalidate_disk is called, max_sectors is reset to its maximum value. Anyone tweaking max_sectors_kb for performance reasons will find that the setting is not persistent. If this distills down to a simpler rule change, then all the better.

From my testing, I set /sys/block/sdd/queue/max_sectors_kb to some value:

echo 64 > /sys/block/sdd/queue/max_sectors_kb

Then I run sg_logs -t /dev/sdd, and the value is reset back to its original value. Other utilities can also trigger udev to run.
udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[8537223.347520] change /devices/pci:00/:00:03.0/:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block)
UDEV [8537223.399243] change /devices/pci:00/:00:03.0/:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block)
...

manager->fd_inotify = udev_watch_init(manager->udev);
sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager);

on_inotify (systemd source code: src/udev/udevd.c)
  synthesize_change
    ioctl --> BLKRRPART
-- Start of kernel code. --
blkdev_ioctl (block/ioctl.c)
  case BLKRRPART:
    blkdev_reread_part (block/ioctl.c)
      __blkdev_reread_part (block/ioctl.c)
        rescan_partitions (block/partition-generic.c)
          if (disk->fops->revalidate_disk)
                  disk->fops->revalidate_disk(disk);
-- sd driver (drivers/scsi/sd.c) --
sd_revalidate_disk

Thanks for your input,
Don Brace
ESC - Smart Storage
Microsemi Corporation
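Don's reproduction can be scripted from C as a sketch. The helper names write_uint_attr(), read_uint_attr() and reread_partitions() are hypothetical; BLKRRPART from <linux/fs.h> is the real ioctl named in the call chain above, and issuing it on the whole-disk node should reach sd_revalidate_disk() through rescan_partitions(). It typically needs CAP_SYS_ADMIN and can fail with EBUSY while partitions on the disk are in use. As we understand the "watch" option described in udev(7), udevd's inotify watch is on the device node itself, and a change event is synthesized when that node is closed after being opened for writing, which would explain why a tool like sg_logs ends up on the same path.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKRRPART */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Write an unsigned value to an attribute file, e.g. .../queue/max_sectors_kb. */
static int write_uint_attr(const char *path, unsigned int val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -errno;
	if (fprintf(f, "%u\n", val) < 0) {
		fclose(f);
		return -EIO;
	}
	/* stdio buffers the write; a rejected store surfaces when fclose() flushes. */
	if (fclose(f) != 0)
		return -errno;
	return 0;
}

/* Read an unsigned value back from an attribute file. */
static int read_uint_attr(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -errno;
	n = fscanf(f, "%u", val);
	fclose(f);
	return n == 1 ? 0 : -EINVAL;
}

/* Ask the kernel to re-read the partition table of a whole-disk node. */
static int reread_partitions(const char *devnode)
{
	int fd = open(devnode, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (ioctl(fd, BLKRRPART) < 0)
		ret = -errno;
	close(fd);
	return ret;
}
```

On a test box, the equivalent of Don's sequence would be write_uint_attr("/sys/block/sdd/queue/max_sectors_kb", 64), then reread_partitions("/dev/sdd"), then read_uint_attr() on the same attribute. Per the report above, the value read back would be the original default rather than 64.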
Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
> I think I have an understanding of what is going on and why Bart's patch is
> causing problems for ipr. I can work around the boot hang in ipr, but
> ultimately I think we need to figure out a fix in scsi / block. I added some
> tracing and confirmed it's not a matter of commands getting stuck in ipr. The
> issue is we are retrying failed commands until we finally run out of time.
> This is what I see:
>
> 1. sd_revalidate_disk calls scsi_report_opcode
> 2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
> 3. ipr returns the command with DID_ERROR
> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
> 7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
>    since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
> 8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
>    quite a long time...
> 9. Finally, the command has been retried so long we trip over the overall retry timer
>    in scsi_softirq_done and we timeout the command.
>
> I'll follow up with a patch to ipr to work around the hang, but I think we need to somehow preserve
> the retry counter in the scsi command, as this will likely cause issues with other drivers.

Hello Brian,

Thanks for the detailed analysis. This is very helpful. Have you considered changing the ipr driver so that it terminates REPORT SUPPORTED OPERATION CODES commands with the appropriate check condition code instead of DID_ERROR?

Thanks,

Bart.
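To make the suggestion concrete: with a CHECK CONDITION instead of DID_ERROR, the midlayer can inspect the sense data, and an ILLEGAL REQUEST sense key (0x5), here with ASC 0x20 (INVALID COMMAND OPERATION CODE), is not something a retry will fix. The sketch below decodes fixed-format (response codes 0x70/0x71) and descriptor-format (0x72/0x73) sense data using the SPC byte layouts. decode_sense() and worth_retrying() are hypothetical helpers, and the retry policy is a deliberately simplified stand-in for the midlayer's scsi_decide_disposition()/scsi_check_sense() logic, not a copy of it.

```c
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_ILLEGAL_REQUEST	0x05
#define ASC_INVALID_COMMAND_OPCODE	0x20

struct sense_info {
	uint8_t key, asc, ascq;
};

/* Returns 0 on success, -1 if the buffer is too short or not recognized. */
static int decode_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
	uint8_t resp;

	if (len < 1)
		return -1;
	resp = buf[0] & 0x7f;		/* bit 7 is the VALID bit in fixed format */
	if ((resp == 0x70 || resp == 0x71) && len >= 14) {
		out->key = buf[2] & 0x0f;	/* fixed: key in byte 2 */
		out->asc = buf[12];
		out->ascq = buf[13];
		return 0;
	}
	if ((resp == 0x72 || resp == 0x73) && len >= 4) {
		out->key = buf[1] & 0x0f;	/* descriptor: key in byte 1 */
		out->asc = buf[2];
		out->ascq = buf[3];
		return 0;
	}
	return -1;
}

/* Simplified policy: retry unless the target said the request is illegal. */
static int worth_retrying(const uint8_t *buf, size_t len)
{
	struct sense_info si;

	if (decode_sense(buf, len, &si) != 0)
		return 1;	/* no usable sense, treat like a transport error */
	return si.key != SENSE_KEY_ILLEGAL_REQUEST;
}
```

A fixed-format buffer with sense key 0x5 and ASC 0x20 yields worth_retrying() == 0, so the command would complete immediately instead of looping through requeue and prep.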
Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb
On Fri, 2017-08-18 at 21:29 +0000, Don Brace wrote:
> As far as I can see, udev looks for file access in sysfs.
> I am not exactly sure which rule changes this. It was added in more recent
> distros. Can someone help me out?

Hello Don,

Can you check on your test system which udev rule changes max_sectors_kb? I have checked two recent Linux distros but haven't been able to find such a udev rule:

$ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc -l
0

Thanks,

Bart.
[Bug 196707] Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Manager
https://bugzilla.kernel.org/show_bug.cgi?id=196707

Dave Carroll (david.carr...@microsemi.com) changed:

What | Removed | Added
CC   |         | david.carr...@microsemi.com

--- Comment #1 from Dave Carroll (david.carr...@microsemi.com) ---
Created attachment 258015
  --> https://bugzilla.kernel.org/attachment.cgi?id=258015&action=edit
Patch to move pci check to pcie cards

Hi,

I've attached a patch based on Linus' current tree, but it should also apply to your kernel. Can you try this patch and report back?

Thanks,
-Dave Carroll
Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
On 08/18/2017 04:41 PM, Bart Van Assche wrote:
> On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
>> I think I have an understanding of what is going on and why Bart's patch is
>> causing problems for ipr. I can work around the boot hang in ipr, but
>> ultimately I think we need to figure out a fix in scsi / block. I added some
>> tracing and confirmed it's not a matter of commands getting stuck in ipr. The
>> issue is we are retrying failed commands until we finally run out of time.
>> This is what I see:
>>
>> 1. sd_revalidate_disk calls scsi_report_opcode
>> 2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
>> 3. ipr returns the command with DID_ERROR
>> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
>> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
>> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
>> 7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
>>    since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
>> 8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
>>    quite a long time...
>> 9. Finally, the command has been retried so long we trip over the overall retry timer
>>    in scsi_softirq_done and we timeout the command.
>>
>> I'll follow up with a patch to ipr to work around the hang, but I think we need to somehow preserve
>> the retry counter in the scsi command, as this will likely cause issues with other drivers.
>
> Hello Brian,
>
> Thanks for the detailed analysis. This is very helpful. Have you considered
> changing the ipr driver so that it terminates REPORT SUPPORTED OPERATION
> CODES commands with the appropriate check condition code instead of DID_ERROR?

Yes.
That data is actually in the sense buffer, but since I'm also setting DID_ERROR, scsi_decide_disposition isn't using it. I've got a patch to do just as you suggest, to stop setting DID_ERROR when there is more detailed error data available, but it will need some additional testing before I submit, as it will impact much more than just this case. To add to my analysis above, #9 should not be there... It looks like jiffies_at_alloc would also be getting reinitialized in this case, resulting in a perpetual retry, which is what I was seeing. Thanks, Brian -- Brian King Power Linux I/O IBM Linux Technology Center
Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
On Fri, 2017-08-18 at 16:57 -0500, Brian King wrote: > To add to my analysis above, #9 should not be there... It looks like > jiffies_at_alloc would also be getting reinitialized in this case, resulting > in > a perpetual retry, which is what I was seeing. Hello Brian, Some time ago I noticed that jiffies_at_alloc is indeed set while a command is being prepared instead of at command allocation time. I think that behavior was introduced in 2005 through commit b21a41385118 ("[SCSI] add global timeout to the scsi mid-layer"). At that time SCSI commands were allocated at prep time and freed at unprep time. Recently that has been changed such that a SCSI command (struct scsi_cmnd) has the same lifetime as struct request. In other words, it was not possible in 2005 but it is possible today to set jiffies_at_alloc at command allocation time instead of when a command is being prepared. Do you want me to submit a patch that implements this change? Bart.
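A hypothetical simulation of the jiffies_at_alloc point. It assumes the overall deadline in scsi_softirq_done() has the form jiffies_at_alloc + (allowed + 1) * timeout compared against the current time; tick counts here are plain integers with no wraparound handling (unlike the kernel's time_before()), and struct deadline_cmd, overall_deadline_passed() and attempts_until_deadline() are invented names. Stamping the start time every time prep runs moves the deadline forward with each retry; stamping it once at allocation, as proposed above, lets it expire.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct scsi_cmnd fields used here. */
struct deadline_cmd {
	unsigned long jiffies_at_alloc;
	int allowed;
};

/* Overall deadline check; plain comparison, no jiffies wraparound handling. */
static bool overall_deadline_passed(const struct deadline_cmd *cmd,
				    unsigned long now, unsigned long timeout)
{
	unsigned long wait_for = (unsigned long)(cmd->allowed + 1) * timeout;

	return cmd->jiffies_at_alloc + wait_for < now;
}

/*
 * Simulate a command that fails quickly on every attempt (each attempt
 * takes attempt_ticks). stamp_on_prep models setting jiffies_at_alloc each
 * time prep runs; otherwise it is set once, at allocation. Returns the
 * attempt on which the overall deadline fires, or max_attempts if it never
 * does.
 */
static int attempts_until_deadline(bool stamp_on_prep, int allowed,
				   unsigned long timeout,
				   unsigned long attempt_ticks, int max_attempts)
{
	struct deadline_cmd cmd = { .jiffies_at_alloc = 0, .allowed = allowed };
	unsigned long now = 0;
	int attempt;

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		if (stamp_on_prep)
			cmd.jiffies_at_alloc = now;	/* prep re-runs after each requeue */
		now += attempt_ticks;			/* the attempt fails */
		if (overall_deadline_passed(&cmd, now, timeout))
			return attempt;
	}
	return max_attempts;
}
```

With allowed = 5, a 120-tick timeout and attempts that each fail after 1 tick, the allocation-time stamp ends the loop on attempt 721, while the prep-time stamp never fires and only stops at the 10000 cap, matching the perpetual retry described above. The numbers are illustrative, not ipr's or sd's actual settings.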
[Bug 196707] Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Manager
https://bugzilla.kernel.org/show_bug.cgi?id=196707

--- Comment #2 from Andreas Gerlich (kernel@yaze-ag.de) ---
Created attachment 258017
  --> https://bugzilla.kernel.org/attachment.cgi?id=258017&action=edit
Kernel_output_after_patch_from_Dave_Carroll.cap

Hello Dave Carroll,

I applied the patch and attached the output of the kernel start (4.12.8). The kernel crashes again at the "Kernel Device Manager" after errors from aacraid.

Best Regards
Andreas Gerlich