Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-21 Thread Martin K. Petersen

Brian,

>> Thanks for the detailed analysis. This is very helpful. Have you
>> considered to change the ipr driver such that it terminates REPORT
>> SUPPORTED OPERATION CODES commands with the appropriate check
>> condition code instead of DID_ERROR?
>
> Yes. That data is actually in the sense buffer, but since I'm also
> setting DID_ERROR, scsi_decide_disposition isn't using it. I've got a
> patch to do just as you suggest, to stop setting DID_ERROR when there
> is more detailed error data available, but it will need some
> additional testing before I submit, as it will impact much more than
> just this case.

I agree. In this case where a command is not supported, a check
condition would be a better way to signal the failure to the SCSI
midlayer.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Bart Van Assche
On Fri, 2017-08-18 at 16:57 -0500, Brian King wrote:
> To add to my analysis above, #9 should not be there... It looks like
> jiffies_at_alloc would also be getting reinitialized in this case, resulting 
> in
> a perpetual retry, which is what I was seeing.

Hello Brian,

Some time ago I noticed that jiffies_at_alloc is indeed set while a command
is being prepared instead of at command allocation time. I think that
behavior was introduced in 2005 through commit b21a41385118 ("[SCSI] add
global timeout to the scsi mid-layer"). At that time SCSI commands were
allocated at prep time and freed at unprep time. Recently that has been
changed such that a SCSI command (struct scsi_cmnd) has the same lifetime as
struct request. In other words, it was not possible in 2005 but it is
possible today to set jiffies_at_alloc at command allocation time instead of
when a command is being prepared. Do you want me to submit a patch that
implements this change?

Bart.



Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Brian King
On 08/18/2017 04:41 PM, Bart Van Assche wrote:
> On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
>> I think I have an understanding what is going on and why Bart's patch is 
>> causing problems for ipr.
>> I can work around the boot hang in ipr, but ultimately I think we need to 
>> figure out a fix
>> in scsi / block. I added some tracing and confirmed its not a matter of 
>> commands getting stuck
>> in ipr. The issue is we are retrying failed commands until we finally run 
>> out of time. This is
>> what I see:
>>
>> 1. sd_revalidate_disk calls scsi_report_opcode
>> 2. ipr RAID arrays don't support MAINTENANCE_IN / 
>> MI_REPORT_SUPPORTED_OPERATION_CODES
>> 3. ipr returns the command with DID_ERROR
>> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, 
>> and returns NEEDS_RETRY
>> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which 
>> calls scsi_mq_requeue_cmd
>> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior 
>> we did not
>> 7. This results in the command getting scmd->retries zeroed out when it gets 
>> re-queued,
>>since we go through prep again and we lose our retry counter, resulting 
>> in lots and lots of retries.
>> 8. Since the default command timeout for an ipr RAID array is 120 seconds, 
>> these retries go on for
>>quite a long time...
>> 9. Finally, the command has been retried so long we trip over the overall 
>> retry timer
>>in scsi_softirq_done and we timeout the command.
>>
>> I'll follow up with a patch to ipr to workaround the hang, but I think we 
>> need to somehow preserve
>> the retry counter in the scsi command, as this will likely cause issues with 
>> other drivers. 
> 
> Hello Brian,
> 
> Thanks for the detailed analysis. This is very helpful. Have you considered
> to change the ipr driver such that it terminates REPORT SUPPORTED OPERATION
> CODES commands with the appropriate check condition code instead of DID_ERROR?

Yes. That data is actually in the sense buffer, but since I'm also setting 
DID_ERROR,
scsi_decide_disposition isn't using it. I've got a patch to do just as you 
suggest,
to stop setting DID_ERROR when there is more detailed error data available, 
but it will need some additional testing before I submit, as it will impact much
more than just this case. 

To add to my analysis above, #9 should not be there... It looks like
jiffies_at_alloc would also be getting reinitialized in this case, resulting in
a perpetual retry, which is what I was seeing.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Bart Van Assche
On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
> I think I have an understanding what is going on and why Bart's patch is 
> causing problems for ipr.
> I can work around the boot hang in ipr, but ultimately I think we need to 
> figure out a fix
> in scsi / block. I added some tracing and confirmed its not a matter of 
> commands getting stuck
> in ipr. The issue is we are retrying failed commands until we finally run out 
> of time. This is
> what I see:
> 
> 1. sd_revalidate_disk calls scsi_report_opcode
> 2. ipr RAID arrays don't support MAINTENANCE_IN / 
> MI_REPORT_SUPPORTED_OPERATION_CODES
> 3. ipr returns the command with DID_ERROR
> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and 
> returns NEEDS_RETRY
> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which 
> calls scsi_mq_requeue_cmd
> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior 
> we did not
> 7. This results in the command getting scmd->retries zeroed out when it gets 
> re-queued,
>since we go through prep again and we lose our retry counter, resulting in 
> lots and lots of retries.
> 8. Since the default command timeout for an ipr RAID array is 120 seconds, 
> these retries go on for
>quite a long time...
> 9. Finally, the command has been retried so long we trip over the overall 
> retry timer
>in scsi_softirq_done and we timeout the command.
> 
> I'll follow up with a patch to ipr to workaround the hang, but I think we 
> need to somehow preserve
> the retry counter in the scsi command, as this will likely cause issues with 
> other drivers. 

Hello Brian,

Thanks for the detailed analysis. This is very helpful. Have you considered
to change the ipr driver such that it terminates REPORT SUPPORTED OPERATION
CODES commands with the appropriate check condition code instead of DID_ERROR?

Thanks,

Bart.

Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-18 Thread Brian King
On 08/17/2017 10:52 AM, Bart Van Assche wrote:
> On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
>> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
>>> On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
 As of next-20170809, linux-next on powerpc boot hung with below trace
 message.

 [ ... ]

 A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
 Always unprepare ...) in the merge branch 'scsi/for-next'

 System booted fine when the below commit is reverted: 

 commit 270065e92c317845d69095ec8e3d18616b5b39d5
 Author: Bart Van Assche 
 Date:   Thu Aug 3 14:40:14 2017 -0700

 scsi: scsi-mq: Always unprepare before requeuing a request
>>>
>>> Hello Brian and Michael,
>>>
>>> Do you agree that this probably indicates a bug in the PowerPC block driver
>>> that is used to access the boot disk? Anyway, since a solution is not yet
>>> available, I will submit a revert for this patch.
>>
>> I've been looking at this a bit, and can recreate the issue, but haven't
>> got to root cause of the issue as of yet. If I do a sysrq-w while the system 
>> is hung
>> during boot I see this:
>>
>> [   25.561523] Workqueue: events_unbound async_run_entry_fn
>> [   25.561527] Call Trace:
>> [   25.561529] [c001697873f0] [c00169701600] 0xc00169701600 
>> (unreliable)
>> [   25.561534] [c001697875c0] [c001ab78] __switch_to+0x2e8/0x430
>> [   25.561539] [c00169787620] [c091ccb0] __schedule+0x310/0xa00
>> [   25.561543] [c001697876f0] [c091d3e0] schedule+0x40/0xb0
>> [   25.561548] [c00169787720] [c0921e40] 
>> schedule_timeout+0x200/0x430
>> [   25.561553] [c00169787810] [c091db10] 
>> io_schedule_timeout+0x30/0x70
>> [   25.561558] [c00169787840] [c091e978] 
>> wait_for_common_io.constprop.3+0x178/0x280
>> [   25.561563] [c001697878c0] [c047f7ec] blk_execute_rq+0x7c/0xd0
>> [   25.561567] [c00169787910] [c0614cd0] scsi_execute+0x100/0x230
>> [   25.561572] [c00169787990] [c060d29c] 
>> scsi_report_opcode+0xbc/0x170
>> [   25.561577] [c00169787a50] [d4fe6404] 
>> sd_revalidate_disk+0xe04/0x1620 [sd_mod]
>> [   25.561583] [c00169787b80] [d4fe6d84] 
>> sd_probe_async+0xb4/0x230 [sd_mod]
>> [   25.561588] [c00169787c00] [c010fc44] 
>> async_run_entry_fn+0x74/0x210
>> [   25.561593] [c00169787c90] [c0102f48] 
>> process_one_work+0x198/0x480
>> [   25.561598] [c00169787d30] [c01032b8] worker_thread+0x88/0x510
>> [   25.561603] [c00169787dc0] [c010b030] kthread+0x160/0x1a0
>> [   25.561608] [c00169787e30] [c000b3a4] 
>> ret_from_kernel_thread+0x5c/0xb8
>>
>> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID 
>> arrays don't support
>> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting 
>> sdev->no_report_opcodes = 1
>> in ipr's slave configure. This seems to eliminate the boot hang for me, but 
>> is only working around
>> the issue. Since this command is not supported by ipr, it should return with 
>> an illegal request.
>> When I'm hung at this point, there is nothing outstanding to the adapter / 
>> driver. I'll continue
>> debugging...
> 
> (+linux-scsi)
> 
> Hello Brian,
> 
> Is kernel debugging enabled on your test system? Is lockdep enabled?
> Anyway, stack traces like the above usually mean that a request got stuck in
> a block or scsi driver (ipr in this case). Information about pending requests,
> including the SCSI CDB, is available under /sys/kernel/debug/block (see also
> commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

I think I have an understanding what is going on and why Bart's patch is 
causing problems for ipr.
I can work around the boot hang in ipr, but ultimately I think we need to 
figure out a fix
in scsi / block. I added some tracing and confirmed its not a matter of 
commands getting stuck
in ipr. The issue is we are retrying failed commands until we finally run out 
of time. This is
what I see:

1. sd_revalidate_disk calls scsi_report_opcode
2. ipr RAID arrays don't support MAINTENANCE_IN / 
MI_REPORT_SUPPORTED_OPERATION_CODES
3. ipr returns the command with DID_ERROR
4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and 
returns NEEDS_RETRY
5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which 
calls scsi_mq_requeue_cmd
6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we 
did not
7. This results in the command getting scmd->retries zeroed out when it gets 
re-queued,
   since we go through prep again and we lose our retry counter, resulting in 
lots and lots of retries.
8. Since the default command timeout for an ipr RAID array is 120 seconds, 
these retries go on for
   quite a long time...
9. Finally, the command has been retried so long we trip over the overall 

Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-17 Thread Bart Van Assche
On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
> > On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
> > > As of next-20170809, linux-next on powerpc boot hung with below trace
> > > message.
> > > 
> > > [ ... ]
> > > 
> > > A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
> > > Always unprepare ...) in the merge branch 'scsi/for-next'
> > > 
> > > System booted fine when the below commit is reverted: 
> > > 
> > > commit 270065e92c317845d69095ec8e3d18616b5b39d5
> > > Author: Bart Van Assche 
> > > Date:   Thu Aug 3 14:40:14 2017 -0700
> > > 
> > > scsi: scsi-mq: Always unprepare before requeuing a request
> > 
> > Hello Brian and Michael,
> > 
> > Do you agree that this probably indicates a bug in the PowerPC block driver
> > that is used to access the boot disk? Anyway, since a solution is not yet
> > available, I will submit a revert for this patch.
> 
> I've been looking at this a bit, and can recreate the issue, but haven't
> got to root cause of the issue as of yet. If I do a sysrq-w while the system 
> is hung
> during boot I see this:
> 
> [   25.561523] Workqueue: events_unbound async_run_entry_fn
> [   25.561527] Call Trace:
> [   25.561529] [c001697873f0] [c00169701600] 0xc00169701600 
> (unreliable)
> [   25.561534] [c001697875c0] [c001ab78] __switch_to+0x2e8/0x430
> [   25.561539] [c00169787620] [c091ccb0] __schedule+0x310/0xa00
> [   25.561543] [c001697876f0] [c091d3e0] schedule+0x40/0xb0
> [   25.561548] [c00169787720] [c0921e40] 
> schedule_timeout+0x200/0x430
> [   25.561553] [c00169787810] [c091db10] 
> io_schedule_timeout+0x30/0x70
> [   25.561558] [c00169787840] [c091e978] 
> wait_for_common_io.constprop.3+0x178/0x280
> [   25.561563] [c001697878c0] [c047f7ec] blk_execute_rq+0x7c/0xd0
> [   25.561567] [c00169787910] [c0614cd0] scsi_execute+0x100/0x230
> [   25.561572] [c00169787990] [c060d29c] 
> scsi_report_opcode+0xbc/0x170
> [   25.561577] [c00169787a50] [d4fe6404] 
> sd_revalidate_disk+0xe04/0x1620 [sd_mod]
> [   25.561583] [c00169787b80] [d4fe6d84] 
> sd_probe_async+0xb4/0x230 [sd_mod]
> [   25.561588] [c00169787c00] [c010fc44] 
> async_run_entry_fn+0x74/0x210
> [   25.561593] [c00169787c90] [c0102f48] 
> process_one_work+0x198/0x480
> [   25.561598] [c00169787d30] [c01032b8] worker_thread+0x88/0x510
> [   25.561603] [c00169787dc0] [c010b030] kthread+0x160/0x1a0
> [   25.561608] [c00169787e30] [c000b3a4] 
> ret_from_kernel_thread+0x5c/0xb8
> 
> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID 
> arrays don't support
> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting 
> sdev->no_report_opcodes = 1
> in ipr's slave configure. This seems to eliminate the boot hang for me, but 
> is only working around
> the issue. Since this command is not supported by ipr, it should return with 
> an illegal request.
> When I'm hung at this point, there is nothing outstanding to the adapter / 
> driver. I'll continue
> debugging...

(+linux-scsi)

Hello Brian,

Is kernel debugging enabled on your test system? Is lockdep enabled?
Anyway, stack traces like the above usually mean that a request got stuck in
a block or scsi driver (ipr in this case). Information about pending requests,
including the SCSI CDB, is available under /sys/kernel/debug/block (see also
commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

Bart.