date:20170818

Re: [PATCH] Revert "scsi-mq: Always unprepare before requeuing a request"

2017-08-18 Thread Michael Ellerman

Bart Van Assche  writes:

> On Wed, 2017-08-16 at 22:51 -0400, Martin K. Petersen wrote:
>> > When I checked earlier today the ipr patch was not yet in linux-next
>> 
>> That's weird. They were both committed two weeks ago.
>> 
>> They appear to be in there now, though:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/drivers/scsi?ofs=50
>
> Hello Martin,
>
> As far as I can see commit 270065e92c31 ("scsi-mq: Always unprepare before
> requeuing a request") was yesterday in linux-next but not the ipr fix
> (commit b0e17a9b0df2 ("scsi: ipr: Fix scsi-mq lockdep issue")). The
> ipr fix is in today's linux-next but was not in linux-next yesterday (the
> Australian time zone applies to the linux-next date labels):
> $ git tag --contains 270065e92c31
> next-20170816
> $ git tag --contains b0e17a9b0df2
> next-20170817
> $ git show next-20170816 | head -3
> tag next-20170816
> Tagger: Stephen Rothwell 
> Date:   Wed Aug 16 15:26:43 2017 +1000
> $ git show next-20170817 | head -3  
> tag next-20170817
> Tagger: Stephen Rothwell 
> Date:   Thu Aug 17 16:34:30 2017 +1000
>
> I think this means that the ipr fix went upstream before it ended up in
> linux-next.

It was in linux-next, but as a different commit. I don't know why.

$ git log -1 --format=oneline next-20170816 drivers/scsi/ipr.c
48b580cacfae123471f8cd43ca81b0e53c9cf702 scsi: ipr: Fix scsi-mq lockdep issue


$ git tag --contains 48b580cacfae123471f8cd43ca81b0e53c9cf702
next-20170809
next-20170810
next-20170811
next-20170815
next-20170816


cheers

HBA recommended as FC target

2017-08-18 Thread Thomas Glanzmann

Hello,
I am looking for an FC HBA that works with the Linux target. Can someone
recommend an HBA type to me?

Cheers,
Thomas

Sniffing FC traffic

2017-08-18 Thread Thomas Glanzmann

Hello,
I would like to create a setup that allows me to sniff FC traffic. Is this
possible with Linux, or can someone recommend a setup that works? I want
to avoid buying a 120k USD fabric analyzer.

Cheers,
Thomas

Re: HBA recommended as FC target

2017-08-18 Thread Laurence Oberman

On Fri, Aug 18, 2017 at 8:36 AM, Thomas Glanzmann  wrote:
> Hello,
> I am looking for an FC HBA that works with the Linux target. Can someone
> recommend an HBA type to me?
>
> Cheers,
> Thomas

Hello

Any of the Qlogic qla24xx or qla25xx and higher that allow you to
disable initiator mode will work.
I use qla25xx 8Gbit in all my Target arrays.

Thanks
Laurence

Re: Sniffing FC traffic

2017-08-18 Thread Laurence Oberman

On Fri, Aug 18, 2017 at 8:37 AM, Thomas Glanzmann  wrote:
> Hello,
> I would like to create a setup that allows me to sniff FC traffic. Is this
> possible with Linux, or can someone recommend a setup that works? I want
> to avoid buying a 120k USD fabric analyzer.
>
> Cheers,
> Thomas

There is no way to do this using adapters and generic F/C with Linux as the O/S.
The ability to enable debugging in the F/C drivers will expose some of
the internals but there is no way to sniff directly as far as I am
aware.

Many switches allow port level tracing facilities but inline sniffing
using Linux and generic hosts is not possible.

We have Finisars for inline tracing when we have to debug host and
fabric issues.

Software based FCOE using libfc and the Intel cards for example will
allow Wireshark tracing but that is encapsulated F/C in Ethernet
packets hence the Wireshark ability.

Thanks
Laurence

Re: Sniffing FC traffic

2017-08-18 Thread Steve Magnani




On 08/18/2017 08:31 AM, Laurence Oberman wrote:
> On Fri, Aug 18, 2017 at 8:37 AM, Thomas Glanzmann wrote:
>> I would like to create a setup that allows me to sniff FC traffic. Is this
>> possible with Linux, or can someone recommend a setup that works? I want
>> to avoid buying a 120k USD fabric analyzer.
>
> There is no way to do this using adapters and generic F/C with Linux as the O/S.
> The ability to enable debugging in the F/C drivers will expose some of
> the internals but there is no way to sniff directly as far as I am
> aware.
>
> Many switches allow port level tracing facilities but inline sniffing
> using Linux and generic hosts is not possible.
>
> We have Finisars for inline tracing when we have to debug host and
> fabric issues.
>
> Software based FCOE using libfc and the Intel cards for example will
> allow Wireshark tracing but that is encapsulated F/C in Ethernet
> packets hence the Wireshark ability.


We have had some success using a (Teledyne) LeCroy analyzer. Its GUI 
(SierraNet) runs only on a Windows host, but it is capable of exporting 
a capture to Wireshark. This mostly shows just the ELS and FCP 
transactions; to debug low-level link issues you'd have to work with the 
original capture in SierraNet.


Like any other kind of analyzer the cost will be proportional to the 
speed of the link you're trying to sniff and the amount of capture depth 
you need. You may be able to save some money by getting secondhand 
equipment and/or running the link at lower speed when you need to debug 
something.


Regards,

 Steven J. Magnani   "I claim this network for MARS!
www.digidescorp.com Earthling, return my space modulator!"


Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-18 Thread Paul Menzel


Dear Christoph,


On 08/06/17 20:06, Paul Menzel wrote:
> On 2017-08-05 11:30, Christoph Hellwig wrote:
>> On Thu, Aug 03, 2017 at 07:42:15PM +0200, Paul Menzel wrote:
>>> Since the merge window opened for Linux 4.13, I am unable to resume
>>> from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am
>>> unable to enter anything, and the system seems to be hung. Magic SysRq keys
>>> still work though, but powering the system off doesn’t work. The power
>>> button also does not work.
>>>
>>> Please find the stack trace with Linux 4.13-rc3 captured over the serial
>>> console below.
>>
>> Is this really -rc3?  rc3 has a commit to disable block runtime pm
>> for blk-mq, which is now the default for scsi.  So with -rc1 we've
>> seen similar reports, but rc3 would be odd and suggest we have further
>> problems.
>
> Yes, this was 4.13-rc3. Rebuilding the Linux kernel from commit 0fdd951c
> (Merge tag 'media/v4.13-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media) shows
> the same behavior.

Just an update: this is still present in Linux 4.13-rc5+, that is,
commit 04d49f3638d0 (Merge tag 'drm-fixes-for-v4.13-rc6' of
git://people.freedesktop.org/~airlied/linux).



Kind regards,

Paul

Re: HBA recommended as FC target

2017-08-18 Thread Thomas Glanzmann

Hello Laurence,

* Laurence Oberman  [2017-08-18 15:26]:
> Any of the Qlogic qla24xx or qla25xx and higher that allow you to
> disable initiator mode will work.  I use qla25xx 8Gbit in all my
> Target arrays.

Thank you for your recommendation. I ordered a Qlogic QLE2462-HP
- PX2510401 - 4GB 2-Port Fibre on eBay.

* Laurence Oberman  [2017-08-18 15:31]:
> There is no way to do this using adapters and generic F/C with Linux
> as the O/S. The ability to enable debugging in the F/C drivers will
> expose some of the internals but there is no way to sniff directly as
> far as I am aware.

I enabled debugging in the Linux target once and I could see quite
detailed information. This will probably be enough for me. At the time I
was debugging SCSI reservations.

> Many switches allow port level tracing facilities but inline sniffing
> using Linux and generic hosts is not possible.

Are you aware of facilities in entry-level Brocade switches that I can
use? I only need to trace one port.

Cheers,
Thomas

Re: Sniffing FC traffic

2017-08-18 Thread Thomas Glanzmann

Hello Steve,

> We have had some success using a (Teledyne) LeCroy analyzer. Its GUI

I found a LeCroy FC analyzer on eBay for 500 EUR. I'm not buying it
yet, but I will keep it in mind. That is much less than the cost I had in
mind (100k EUR).

Cheers,
Thomas

[Bug 196707] New: Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Ma

2017-08-18 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=196707

            Bug ID: 196707
           Summary: Adaptec ICP9087MA: aacraid prints "AAC: Host adapter
                    is dead (or got a PCI error) -1" twice times around
                    errors of aacraid and the kernel crashes after
                    starting the "Kernel Device Manager"
           Product: SCSI Drivers
           Version: 2.5
    Kernel Version: 4.12.8
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: AACRAID
          Assignee: scsi_drivers-aacr...@kernel-bugs.osdl.org
          Reporter: kernel@yaze-ag.de
                CC: a...@borouhin.com, ar...@maven.pl, bug...@grubelek.pl,
                    chris...@niksun.com, davyg...@pobox.com,
                    har...@snel.ws, he...@fitmsg.net, hm10...@gmail.com,
                    rodrigoaguileraparr...@gmail.com,
                    scnai...@hotmail.com,
                    scsi_drivers-aacr...@kernel-bugs.osdl.org
        Regression: No

Created attachment 258013
  --> https://bugzilla.kernel.org/attachment.cgi?id=258013&action=edit
Kernel output of 3 kernel starts with connected / disconnected HDDs and
different kernels

+++ This bug was initially created as a clone of Bug #151661 +++

Hi,

My English is not very good; I hope you can understand what I write here.

Kernel 4.12.8 crashes after errors from aacraid, right after the
"Kernel Device Manager" starts, when 8 HDDs are connected (powered on) to the
ICP9087MA RAID controller. (See attachment 0_...)

The crash happens before the kernel opens the log files in /var/log, so I
found no entries in the log files. I therefore configured grub (in
/etc/default/grub) and the kernel to use the serial line (/dev/ttyS0, COM1) as
the console, and captured the serial output of the booting kernel with minicom
on my notebook.

If I disconnect the 8 HDDs (power off), kernel 4.12.8 starts normally and I
can use the system, but without the two host drives (2 x RAID-5) of the
ICP9087MA. (See attachment 1_...)

With various 3.x.x kernels (from kernel.org or Debian), everything works fine
when the 8 HDDs are connected to the ICP9087MA RAID controller. (See
attachment 2_...)

Attachment (output of 3 Kernel starts in one attachment):

0_kernel_crash_(4.12.8)_with_all_drives_at_the_ICP9087MA.cap
1_normal_start_(4.12.8)_without_any_drive_at_the_ICP9087MA.cap
2_normal_start_(3.18.65)_with_all_drives_at_the_ICP9087MA.cap

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.

[PATCH] sd: preserve sysfs updates to max_sectors_kb

2017-08-18 Thread Don Brace

prevent systemd-udevd from changing a device's sysfs entry
max_sectors_kb back to the default value.
 - max_sectors_kb can be tweaked for better performance.
 - udev can be triggered by sg_logs -t or scsi_temperature, ...
 - sd_revalidate_disk is called from udev by ioctl BLKRRPART

Reviewed-by: Scott Teel 
Signed-off-by: Don Brace 
---
 drivers/scsi/sd.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bea36ad..457dc7c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3055,6 +3055,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	sector_t old_capacity = sdkp->capacity;
 	unsigned char *buffer;
 	unsigned int dev_max, rw_max;
+	unsigned int max_sectors;
 
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
 				      "sd_revalidate_disk\n"));
@@ -3128,9 +3129,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
 				      (sector_t)BLK_DEF_MAX_SECTORS);
 
-	/* Combine with controller limits */
-	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
+	/* Check for max_sectors_kb update through sysfs */
+	if (q->limits.max_sectors < min(rw_max, queue_max_hw_sectors(q)))
+		max_sectors = q->limits.max_sectors;
+	else
+		max_sectors = min(rw_max, queue_max_hw_sectors(q));
 
+	/* Combine with controller limits */
+	q->limits.max_sectors = max_sectors;
 	set_capacity(disk, logical_to_sectors(sdp, sdkp->capacity));
 	sd_config_write_same(sdkp);
 	kfree(buffer);
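
The selection logic in the second hunk can be looked at in isolation. The following is a minimal user-space sketch, not kernel code: `model_pick_max_sectors()` and its parameter names are hypothetical, with `cur` standing in for the current `q->limits.max_sectors`, `rw_max` for the device-derived limit, and `hw_max` for `queue_max_hw_sectors(q)`.

```c
/* Hypothetical user-space model of the patch's selection logic. */
unsigned int model_min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * cur:    value currently stored in the queue limits (possibly set via sysfs)
 * rw_max: limit derived from the device
 * hw_max: controller limit
 */
unsigned int model_pick_max_sectors(unsigned int cur,
				    unsigned int rw_max,
				    unsigned int hw_max)
{
	unsigned int limit = model_min_uint(rw_max, hw_max);

	/* Keep a smaller existing value; otherwise fall back to the limit. */
	if (cur < limit)
		return cur;
	return limit;
}
```

In this model the result is simply the minimum of the three inputs, so across repeated calls the value can shrink but never grow back on its own.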

Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Brian King

On 08/17/2017 10:52 AM, Bart Van Assche wrote:
> On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
>> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
>>> On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
>>>> As of next-20170809, linux-next on powerpc boot hung with below trace
>>>> message.
>>>>
>>>> [ ... ]
>>>>
>>>> A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
>>>> Always unprepare ...) in the merge branch 'scsi/for-next'
>>>>
>>>> System booted fine when the below commit is reverted:
>>>>
>>>> commit 270065e92c317845d69095ec8e3d18616b5b39d5
>>>> Author: Bart Van Assche
>>>> Date:   Thu Aug 3 14:40:14 2017 -0700
>>>>
>>>>     scsi: scsi-mq: Always unprepare before requeuing a request
>>>
>>> Hello Brian and Michael,
>>>
>>> Do you agree that this probably indicates a bug in the PowerPC block driver
>>> that is used to access the boot disk? Anyway, since a solution is not yet
>>> available, I will submit a revert for this patch.
>>
>> I've been looking at this a bit, and can recreate the issue, but haven't
>> got to root cause of the issue as of yet. If I do a sysrq-w while the
>> system is hung during boot I see this:
>>
>> [   25.561523] Workqueue: events_unbound async_run_entry_fn
>> [   25.561527] Call Trace:
>> [   25.561529] [c001697873f0] [c00169701600] 0xc00169701600 (unreliable)
>> [   25.561534] [c001697875c0] [c001ab78] __switch_to+0x2e8/0x430
>> [   25.561539] [c00169787620] [c091ccb0] __schedule+0x310/0xa00
>> [   25.561543] [c001697876f0] [c091d3e0] schedule+0x40/0xb0
>> [   25.561548] [c00169787720] [c0921e40] schedule_timeout+0x200/0x430
>> [   25.561553] [c00169787810] [c091db10] io_schedule_timeout+0x30/0x70
>> [   25.561558] [c00169787840] [c091e978] wait_for_common_io.constprop.3+0x178/0x280
>> [   25.561563] [c001697878c0] [c047f7ec] blk_execute_rq+0x7c/0xd0
>> [   25.561567] [c00169787910] [c0614cd0] scsi_execute+0x100/0x230
>> [   25.561572] [c00169787990] [c060d29c] scsi_report_opcode+0xbc/0x170
>> [   25.561577] [c00169787a50] [d4fe6404] sd_revalidate_disk+0xe04/0x1620 [sd_mod]
>> [   25.561583] [c00169787b80] [d4fe6d84] sd_probe_async+0xb4/0x230 [sd_mod]
>> [   25.561588] [c00169787c00] [c010fc44] async_run_entry_fn+0x74/0x210
>> [   25.561593] [c00169787c90] [c0102f48] process_one_work+0x198/0x480
>> [   25.561598] [c00169787d30] [c01032b8] worker_thread+0x88/0x510
>> [   25.561603] [c00169787dc0] [c010b030] kthread+0x160/0x1a0
>> [   25.561608] [c00169787e30] [c000b3a4] ret_from_kernel_thread+0x5c/0xb8
>>
>> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID
>> arrays don't support the MAINTENANCE_IN /
>> MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting
>> sdev->no_report_opcodes = 1 in ipr's slave configure. This seems to
>> eliminate the boot hang for me, but is only working around the issue.
>> Since this command is not supported by ipr, it should return with an
>> illegal request. When I'm hung at this point, there is nothing outstanding
>> to the adapter / driver. I'll continue debugging...
> 
> (+linux-scsi)
> 
> Hello Brian,
> 
> Is kernel debugging enabled on your test system? Is lockdep enabled?
> Anyway, stack traces like the above usually mean that a request got stuck in
> a block or scsi driver (ipr in this case). Information about pending requests,
> including the SCSI CDB, is available under /sys/kernel/debug/block (see also
> commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

I think I have an understanding of what is going on and why Bart's patch is
causing problems for ipr. I can work around the boot hang in ipr, but
ultimately I think we need to figure out a fix in scsi / block. I added some
tracing and confirmed it's not a matter of commands getting stuck in ipr.
The issue is we are retrying failed commands until we finally run out of
time. This is what I see:

1. sd_revalidate_disk calls scsi_report_opcode
2. ipr RAID arrays don't support MAINTENANCE_IN /
   MI_REPORT_SUPPORTED_OPERATION_CODES
3. ipr returns the command with DID_ERROR
4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries,
   and returns NEEDS_RETRY
5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which
   calls scsi_mq_requeue_cmd
6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior
   we did not
7. This results in the command getting scmd->retries zeroed out when it gets
   re-queued, since we go through prep again and we lose our retry counter,
   resulting in lots and lots of retries.
8. Since the default command timeout for an ipr RAID array is 120 seconds,
   these retries go on for quite a long time...
9. Finally, the command has been retried so long we trip over the overall
   retry timer in scsi_soft

Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb

2017-08-18 Thread Bart Van Assche

On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote:
> prevent systemd-udevd from changing a device's sysfs entry
> max_sectors_kb back to the default value.
>  - max_sectors_kb can be tweaked for better performance.
>  - udev can be triggered by sg_logs -t or scsi_temperature, ...
>  - sd_revalidate_disk is called from udev by ioctl BLKRRPART

Hello Don,

Which udev rule changes max_sectors_kb back to the default? Why do you want
to change the kernel code instead of modifying that udev rule? What software
changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps
something else?

Thanks,

Bart.

[PATCH] ipr: Set no_report_opcodes for RAID arrays

2017-08-18 Thread Brian King

Since ipr RAID arrays do not support the MAINTENANCE_IN /
MI_REPORT_SUPPORTED_OPERATION_CODES, set no_report_opcodes
to prevent it from being sent.

Signed-off-by: Brian King 
---

Index: linux-2.6.git/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.git.orig/drivers/scsi/ipr.c
+++ linux-2.6.git/drivers/scsi/ipr.c
@@ -4935,6 +4935,7 @@ static int ipr_slave_configure(struct sc
 		}
 		if (ipr_is_vset_device(res)) {
 			sdev->scsi_level = SCSI_SPC_3;
+			sdev->no_report_opcodes = 1;
 			blk_queue_rq_timeout(sdev->request_queue,
 					     IPR_VSET_RW_TIMEOUT);
 			blk_queue_max_hw_sectors(sdev->request_queue, IPR_VSET_MAX_SECTORS);

RE: [PATCH] sd: preserve sysfs updates to max_sectors_kb

2017-08-18 Thread Don Brace

> -Original Message-
> From: Bart Van Assche [mailto:bart.vanass...@wdc.com]
> Sent: Friday, August 18, 2017 4:06 PM
> To: h...@infradead.org; Viswas G ; Gerry
> Morong ; Mahesh Rajashekhara
> ; posw...@suse.com; Scott
> Benesh ; Don Brace
> ; Bader Ali - Saleh
> ; Kevin Barnett
> ; joseph.szczy...@hpe.com; Scott Teel
> ; j...@linux.vnet.ibm.com; Justin Lindley
> ; John Hall 
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb
> 
> EXTERNAL EMAIL
> 
> 
> On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote:
> > prevent systemd-udevd from changing a device's sysfs entry
> > max_sectors_kb back to the default value.
> >  - max_sectors_kb can be tweaked for better performance.
> >  - udev can be triggered by sg_logs -t or scsi_temperature, ...
> >  - sd_revalidate_disk is called from udev by ioctl BLKRRPART
> 
> Hello Don,
> 
> Which udev rule changes max_sectors_kb back to the default? Why do you
> want
> to change the kernel code instead of modifying that udev rule? What
> software
> changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps
> something else?
> 
> Thanks,
> 
> Bart.

As far as I can see, udev looks for file access in sysfs. 
I am not exactly sure which rule changes this. It was added in more recent
distros. Can someone help me out?

I wanted to change the kernel code because it looks to me like anytime
sd_revalidate_disk is called max_sectors is reset to its maximum value. Anyone
tweaking max_sectors_kb for performance reasons will find that it is not
persistent.

If this distills down to a simpler rule change, then all the better.

From my testing:

I set /sys/block/sdd/queue/max_sectors_kb to some value.
echo 64 > /sys/block/sdd/queue/max_sectors_kb
I run sg_logs -t /dev/sdd and the value is reset back to its original value.
Other utilities can also trigger udev to run. 

udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[8537223.347520] change   /devices/pci:00/:00:03.0/:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block)
UDEV  [8537223.399243] change   /devices/pci:00/:00:03.0/:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block)
...
manager->fd_inotify = udev_watch_init(manager->udev);
sd_event_add_io(manager->event, &manager->inotify_event,
                manager->fd_inotify, EPOLLIN, on_inotify, manager);
  on_inotify (systemd source code: src/udev/udevd.c)
    synthesize_change
      ioctl --> BLKRRPART
  ----------------------------------------
  Start of kernel code.
  ----------------------------------------
  blkdev_ioctl (block/ioctl.c)
    CASE:BLKRRPART: blkdev_reread_part (block/ioctl.c)
      _blkdev_reread_part (block/ioctl.c)
        rescan_partitions (block/partition-generic.c)
          if (disk->fops->revalidate_disk)
                  disk->fops->revalidate_disk(disk);
  ----------------------------------------
  sd driver (drivers/scsi/sd.c)
    sd_revalidate_disk

Thanks for your input,
Don Brace
ESC - Smart Storage
Microsemi Corporation
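
The user-space end of the call chain traced above is a single ioctl. As a rough sketch for reproducing the revalidation without involving udev, the helper below issues `BLKRRPART` on a device node; the function name `reread_partitions()` is hypothetical, while `BLKRRPART` itself comes from `<linux/fs.h>`. It assumes a Linux host, and the ioctl typically needs root privileges and may fail if the device is busy.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKRRPART */

/*
 * Ask the kernel to re-read the partition table of a block device.
 * Returns 0 on success or a negative errno value on failure.
 * (Hypothetical helper name; only the ioctl itself is a real interface.)
 */
int reread_partitions(const char *dev_path)
{
	int fd = open(dev_path, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (ioctl(fd, BLKRRPART) < 0)
		ret = -errno;	/* save errno before close() can change it */
	close(fd);
	return ret;
}
```

Reading /sys/block/sdd/queue/max_sectors_kb before and after calling `reread_partitions("/dev/sdd")` should show whether the value written through sysfs survives the revalidation on a given kernel.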

Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Bart Van Assche

On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
> I think I have an understanding of what is going on and why Bart's patch is
> causing problems for ipr. I can work around the boot hang in ipr, but
> ultimately I think we need to figure out a fix in scsi / block. I added some
> tracing and confirmed it's not a matter of commands getting stuck in ipr.
> The issue is we are retrying failed commands until we finally run out of
> time. This is what I see:
>
> 1. sd_revalidate_disk calls scsi_report_opcode
> 2. ipr RAID arrays don't support MAINTENANCE_IN /
>    MI_REPORT_SUPPORTED_OPERATION_CODES
> 3. ipr returns the command with DID_ERROR
> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries,
>    and returns NEEDS_RETRY
> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which
>    calls scsi_mq_requeue_cmd
> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior
>    we did not
> 7. This results in the command getting scmd->retries zeroed out when it gets
>    re-queued, since we go through prep again and we lose our retry counter,
>    resulting in lots and lots of retries.
> 8. Since the default command timeout for an ipr RAID array is 120 seconds,
>    these retries go on for quite a long time...
> 9. Finally, the command has been retried so long we trip over the overall
>    retry timer in scsi_softirq_done and we timeout the command.
>
> I'll follow up with a patch to ipr to workaround the hang, but I think we
> need to somehow preserve the retry counter in the scsi command, as this
> will likely cause issues with other drivers.

Hello Brian,

Thanks for the detailed analysis. This is very helpful. Have you considered
changing the ipr driver so that it terminates REPORT SUPPORTED OPERATION
CODES commands with the appropriate check condition code instead of DID_ERROR?

Thanks,

Bart.
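
To illustrate the difference between the two completions, here is a simplified user-space sketch. Everything prefixed `model_`/`MODEL_` is hypothetical and only mirrors the numeric values of the host byte DID_ERROR (0x07), the CHECK CONDITION status (0x02), and the ILLEGAL REQUEST sense key (0x05); the real decision logic in scsi_decide_disposition handles many more cases. The sense buffer uses the fixed format with ASC/ASCQ 0x20/0x00 (INVALID COMMAND OPERATION CODE).

```c
#include <string.h>

#define MODEL_DID_OK			0x00
#define MODEL_DID_ERROR			0x07
#define MODEL_STATUS_CHECK_CONDITION	0x02
#define MODEL_SENSE_ILLEGAL_REQUEST	0x05

enum model_disposition { MODEL_DONE, MODEL_FAIL, MODEL_RETRY };

/* Fill an 18-byte fixed-format sense buffer. */
void model_build_sense(unsigned char buf[18], unsigned char key,
		       unsigned char asc, unsigned char ascq)
{
	memset(buf, 0, 18);
	buf[0] = 0x70;		/* fixed format, current error */
	buf[2] = key & 0x0f;	/* sense key */
	buf[7] = 10;		/* additional sense length (bytes 8..17) */
	buf[12] = asc;
	buf[13] = ascq;
}

/* Simplified model: a host-level error hides whatever the sense data says. */
enum model_disposition model_decide(unsigned int host_byte,
				    unsigned int status,
				    const unsigned char *sense)
{
	if (host_byte == MODEL_DID_ERROR)
		return MODEL_RETRY;	/* sense data is never examined */
	if (status == MODEL_STATUS_CHECK_CONDITION) {
		if ((sense[2] & 0x0f) == MODEL_SENSE_ILLEGAL_REQUEST)
			return MODEL_FAIL;	/* unsupported command: no retry */
		return MODEL_RETRY;	/* simplification for other sense keys */
	}
	return MODEL_DONE;
}
```

With identical sense data, the model retries when the host byte is DID_ERROR and fails the command immediately when only the check condition is reported.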

Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb

2017-08-18 Thread Bart Van Assche

On Fri, 2017-08-18 at 21:29 +, Don Brace wrote:
> As far as I can see, udev looks for file access in sysfs. 
> I am not exactly sure which rule changes this. It was added in more recent
> distros. Can someone help me out?

Hello Don,

Can you check on your test system which udev rule changes max_sectors_kb? I
have checked two recent Linux distros but haven't been able to find such a
udev rule:
$ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc -l
0

Thanks,

Bart.

[Bug 196707] Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Manager

2017-08-18 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=196707

Dave Carroll (david.carr...@microsemi.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.carr...@microsemi.com

--- Comment #1 from Dave Carroll (david.carr...@microsemi.com) ---
Created attachment 258015
  --> https://bugzilla.kernel.org/attachment.cgi?id=258015&action=edit
Patch to move pci check to pcie cards

Hi,

I've attached a patch based on Linus' current tree, which should also apply
to your kernel. Can you try this patch and report back?

Thanks, -Dave Carroll

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.

Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Brian King

On 08/18/2017 04:41 PM, Bart Van Assche wrote:
> On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
>> I think I have an understanding of what is going on and why Bart's patch is
>> causing problems for ipr. I can work around the boot hang in ipr, but
>> ultimately I think we need to figure out a fix in scsi / block. I added some
>> tracing and confirmed it's not a matter of commands getting stuck in ipr.
>> The issue is we are retrying failed commands until we finally run out of
>> time. This is what I see:
>>
>> 1. sd_revalidate_disk calls scsi_report_opcode
>> 2. ipr RAID arrays don't support MAINTENANCE_IN /
>>    MI_REPORT_SUPPORTED_OPERATION_CODES
>> 3. ipr returns the command with DID_ERROR
>> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries,
>>    and returns NEEDS_RETRY
>> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which
>>    calls scsi_mq_requeue_cmd
>> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior
>>    we did not
>> 7. This results in the command getting scmd->retries zeroed out when it gets
>>    re-queued, since we go through prep again and we lose our retry counter,
>>    resulting in lots and lots of retries.
>> 8. Since the default command timeout for an ipr RAID array is 120 seconds,
>>    these retries go on for quite a long time...
>> 9. Finally, the command has been retried so long we trip over the overall
>>    retry timer in scsi_softirq_done and we timeout the command.
>>
>> I'll follow up with a patch to ipr to workaround the hang, but I think we
>> need to somehow preserve the retry counter in the scsi command, as this
>> will likely cause issues with other drivers.
> 
> Hello Brian,
> 
> Thanks for the detailed analysis. This is very helpful. Have you considered
> changing the ipr driver so that it terminates REPORT SUPPORTED OPERATION
> CODES commands with the appropriate check condition code instead of DID_ERROR?

Yes. That data is actually in the sense buffer, but since I'm also setting
DID_ERROR, scsi_decide_disposition isn't using it. I've got a patch to do
just as you suggest, to stop setting DID_ERROR when there is more detailed
error data available, but it will need some additional testing before I
submit, as it will impact much more than just this case.

To add to my analysis above, #9 should not be there... It looks like
jiffies_at_alloc would also be getting reinitialized in this case, resulting in
a perpetual retry, which is what I was seeing.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center
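
The retry behaviour Brian describes can be reproduced in a small user-space simulation. This is only a sketch under simplifying assumptions: `struct model_cmd`, `model_prep()` and `model_dispatch_count()` are hypothetical names, the command always fails with a retryable error, and "prep" does nothing except zero the retry counter. The retry test mirrors the increment-then-compare pattern from step 4.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a SCSI command. */
struct model_cmd {
	int retries;	/* retries counted so far */
	int allowed;	/* maximum number of retries */
};

/* In this model, preparing a command resets its retry counter. */
void model_prep(struct model_cmd *cmd)
{
	cmd->retries = 0;
}

/*
 * Dispatch a command that always fails with a retryable error.
 * unprep_on_requeue: requeueing forces the command through prep again.
 * max_steps: safety cap so the simulation always terminates.
 * Returns the number of dispatches performed.
 */
int model_dispatch_count(bool unprep_on_requeue, int allowed, int max_steps)
{
	struct model_cmd cmd = { .retries = 0, .allowed = allowed };
	bool prepared = false;
	int dispatched = 0;

	while (dispatched < max_steps) {
		if (!prepared) {
			model_prep(&cmd);
			prepared = true;
		}
		dispatched++;			/* command is sent and fails */
		if (++cmd.retries > cmd.allowed)
			break;			/* retries exhausted, give up */
		if (unprep_on_requeue)
			prepared = false;	/* next pass re-runs prep */
	}
	return dispatched;
}
```

With `allowed` set to 5, the model gives up after 6 dispatches when the prepared state is kept across requeues, and only stops at `max_steps` when every requeue re-runs prep.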

Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Bart Van Assche

On Fri, 2017-08-18 at 16:57 -0500, Brian King wrote:
> To add to my analysis above, #9 should not be there... It looks like
> jiffies_at_alloc would also be getting reinitialized in this case,
> resulting in a perpetual retry, which is what I was seeing.

Hello Brian,

Some time ago I noticed that jiffies_at_alloc is indeed set while a command
is being prepared instead of at command allocation time. I think that
behavior was introduced in 2005 through commit b21a41385118 ("[SCSI] add
global timeout to the scsi mid-layer"). At that time SCSI commands were
allocated at prep time and freed at unprep time. Recently that has been
changed such that a SCSI command (struct scsi_cmnd) has the same lifetime as
struct request. In other words, it was not possible in 2005 but it is
possible today to set jiffies_at_alloc at command allocation time instead of
when a command is being prepared. Do you want me to submit a patch that
implements this change?

Bart.
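
The effect of where the timestamp is taken can be sketched in user space as follows. The names `struct model_req` and `model_timeout_tick()` are hypothetical, `stamp` plays the role of jiffies_at_alloc, and the overall timeout check is reduced to a single subtraction; this is an illustration of the idea, not the mid-layer's actual code.

```c
#include <stdbool.h>

struct model_req {
	unsigned long stamp;	/* models jiffies_at_alloc */
};

/*
 * Simulate repeated failing attempts and report the tick at which the
 * overall timeout is detected, or 0 if it is never detected before
 * `horizon` ticks have elapsed.
 * stamp_at_prep: true means every re-prep overwrites the timestamp.
 */
unsigned long model_timeout_tick(bool stamp_at_prep,
				 unsigned long ticks_per_attempt,
				 unsigned long wait_for,
				 unsigned long horizon)
{
	unsigned long now = 0;
	struct model_req req = { .stamp = now };	/* stamped at allocation */

	if (ticks_per_attempt == 0)
		return 0;	/* avoid a loop that makes no progress */

	while (now < horizon) {
		if (stamp_at_prep)
			req.stamp = now;	/* prep resets the start time */
		now += ticks_per_attempt;	/* the attempt runs and fails */
		if (now - req.stamp > wait_for)
			return now;		/* overall timeout trips */
	}
	return 0;
}
```

When the stamp is taken once at allocation, the timeout trips as soon as the accumulated time exceeds `wait_for`; when each prep overwrites it, the elapsed time never exceeds one attempt and the timeout is never reached in this model.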

[Bug 196707] Adaptec ICP9087MA: aacraid prints "AAC: Host adapter is dead (or got a PCI error) -1" twice times around errors of aacraid and the kernel crashes after starting the "Kernel Device Manager

2017-08-18 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=196707

--- Comment #2 from Andreas Gerlich (kernel@yaze-ag.de) ---
Created attachment 258017
  --> https://bugzilla.kernel.org/attachment.cgi?id=258017&action=edit
Kernel_output_after_patch_from_Dave_Carroll.cap

Hello Dave Carroll,

I applied the patch and attached the output of the kernel start (4.12.8).

The kernel crashes again at the "Kernel Device Manager" after errors from
aacraid.

Best Regards
Andreas Gerlich

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
