I've managed to get the machine to reproduce this fairly regularly
now.

Without a debug kernel it still results in a panic, just at a later
stage, or so I believe. The non-debug panic message is "command not
in queue".

In each non-debug panic I've seen, the cm_flags indicate that the
command being dequeued is on the busy queue and not on the expected
free or ready queue which is being processed at the time.

The triggering issue seems to be the adapter reset code run from
mfi_timeout.

I've had a good look but can't see how a cm could be in a queue yet
have its cm_flags set to that of a different queue, as all manipulation
seems to be done via the "mfi_<method> ## name" macros, which
all correctly maintain the queue / cm_flags relationship.
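
For anyone not familiar with the pattern, here is a minimal user-space
sketch of what I mean by macro-generated queue helpers. It is not the
driver source: struct cmd, struct softc, CMD_QUEUE and the CM_ON_* bits
are names made up for illustration, and it assumes one flag bit per queue.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical flag bits, one per queue (illustrative names only). */
#define CM_ON_FREE  0x1u
#define CM_ON_READY 0x2u
#define CM_ON_BUSY  0x4u
#define CM_ON_MASK  (CM_ON_FREE | CM_ON_READY | CM_ON_BUSY)

struct cmd {
    TAILQ_ENTRY(cmd) link;   /* list linkage */
    unsigned flags;          /* which queue the command claims to be on */
};
TAILQ_HEAD(cmd_list, cmd);

struct softc {
    struct cmd_list q_free, q_ready, q_busy;
};

/*
 * Token pasting generates an enqueue/dequeue pair per queue. Each helper
 * changes list membership and the flag bit together, and asserts that the
 * flag matches the queue being worked on.
 */
#define CMD_QUEUE(name, bit)                                           \
static inline void enqueue_##name(struct softc *sc, struct cmd *cm)    \
{                                                                      \
    assert((cm->flags & CM_ON_MASK) == 0);                             \
    TAILQ_INSERT_TAIL(&sc->q_##name, cm, link);                        \
    cm->flags |= (bit);                                                \
}                                                                      \
static inline struct cmd *dequeue_##name(struct softc *sc)             \
{                                                                      \
    struct cmd *cm = TAILQ_FIRST(&sc->q_##name);                       \
    if (cm != NULL) {                                                  \
        assert((cm->flags & (bit)) != 0);                              \
        TAILQ_REMOVE(&sc->q_##name, cm, link);                         \
        cm->flags &= ~(bit);                                           \
    }                                                                  \
    return cm;                                                         \
}

CMD_QUEUE(free, CM_ON_FREE)
CMD_QUEUE(ready, CM_ON_READY)
CMD_QUEUE(busy, CM_ON_BUSY)

/* Move one command free -> ready -> busy and return its final flags. */
unsigned walk_one_command(void)
{
    struct softc sc;
    struct cmd cm = { .flags = 0 };
    struct cmd *c;

    TAILQ_INIT(&sc.q_free);
    TAILQ_INIT(&sc.q_ready);
    TAILQ_INIT(&sc.q_busy);

    enqueue_free(&sc, &cm);
    c = dequeue_free(&sc);
    enqueue_ready(&sc, c);
    c = dequeue_ready(&sc);
    enqueue_busy(&sc, c);
    return c->flags;
}
```

As long as every change goes through helpers like these, a command
taken off one queue should never carry another queue's flag, which is
why the mismatch I'm seeing is so puzzling.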

At this point I believe it could be a thread being interrupted by
a timeout part way through the processing of a queue request, hence
the queue and cm_flags being out of sync.
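
To make the suspected window concrete, here is a small user-space sketch
of a two-step enqueue observed part way through. It is only an
illustration of the hypothesis: the hook argument is a hypothetical
stand-in for a timeout firing between the two steps, none of the names
come from the driver, and I have not confirmed that such a window
actually exists there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CM_ON_READY 0x2u   /* hypothetical flag bit for the ready queue */

struct cmd {
    struct cmd *next;
    unsigned flags;
};

struct queue {
    struct cmd *head;
};

/* Walk the list to see whether cm is really linked into it. */
static bool on_list(const struct queue *q, const struct cmd *cm)
{
    for (const struct cmd *p = q->head; p != NULL; p = p->next)
        if (p == cm)
            return true;
    return false;
}

/* True when list membership and the flag bit agree. */
bool cm_consistent(const struct queue *q, const struct cmd *cm)
{
    return on_list(q, cm) == ((cm->flags & CM_ON_READY) != 0);
}

typedef void (*hook_fn)(const struct queue *, const struct cmd *, void *);

/* Two-step enqueue; hook runs in the gap between linking and flagging. */
void enqueue_ready(struct queue *q, struct cmd *cm, hook_fn hook, void *arg)
{
    assert((cm->flags & CM_ON_READY) == 0);
    cm->next = q->head;            /* step 1: link into the list */
    q->head = cm;
    if (hook != NULL)
        hook(q, cm, arg);          /* simulated interruption */
    cm->flags |= CM_ON_READY;      /* step 2: record membership */
}

/* Hook that records whether the state looked consistent mid-update. */
static void record_state(const struct queue *q, const struct cmd *cm,
    void *arg)
{
    *(bool *)arg = cm_consistent(q, cm);
}

/* Returns 1 if the mismatch is visible mid-update and gone afterwards. */
int window_demo(void)
{
    struct queue q = { NULL };
    struct cmd cm = { NULL, 0 };
    bool seen_consistent = true;

    enqueue_ready(&q, &cm, record_state, &seen_consistent);
    return !seen_consistent && cm_consistent(&q, &cm);
}
```

Anything that inspects or moves the command inside that gap would see a
cm that is on one list while its flags say otherwise.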

Any pointers on how to debug this issue further / fix it would be most
appreciated.

   Regards
   Steve

----- Original Message ----- From: "Steven Hartland"
Testing a new machine which is based on 8.3-RELEASE with the mfi
driver from 8-STABLE and just got a panic.


The below is a transcription, hand copied from the console:-
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd5: hard error cmd=write 90827650-90827905
mfi0: I/O error, status= 46 scsi_status= 240
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd5: hard error cmd=write 90827394-90827649
mfi0: I/O error, status= 46 scsi_status= 240
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd5: hard error cmd=write 90827138-90827393
mfi0: I/O error, status= 46 scsi_status= 240
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd5: hard error cmd=write 90826882-90827137
mfi0: I/O error, status= 2 scsi_status= 2
mfi0: sense error 112, sense_key 6, asc 41, ascq 0
mfisyspd4: hard error cmd=write 90830466-90830721
mfi0: I/O error, status= 2 scsi_status= 2
mfi0: sense error 112, sense_key 6, asc 41, ascq 0
mfisyspd5: hard error cmd=write 90830722-90830977
mfi0: Adapter RESET condition detected
mfi0: First state FW reset initiated...
mfi0: ADP_RESET_TBOLT: HostDiag=a0
mfi0: first state of reset complete, second state initiated...
mfi0: Second state FW reset initiated...
panic: _mtx_lock_sleep: recursed on non-recursive mutex MFI I/O lock @ /usr/src/sys/dev/mfi/mfi_tbolt:346

cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x178
_mtx_lock_sleep() at _mtx_lock_sleep+0x152
_mtx_lock_flags() at _mtx_lock_flags+0x80
mfi_tbolt_init_MFI_queue() at mfi_tbolt_init_MFI_queue+0x72
mfi_timeout() at mfi_timeout+0x27
softclock() at softclock+0x2aa
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80005ccd00, rbp = 0 ---
KDB: enter panic
[thread pid 12 tid 100020 ]
Stopped at kdb_enter+0x3b: movq    $0,0x51cb32(%rip)
db>
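
The panic above is the generic case of a thread re-acquiring a mutex it
already holds. A user-space analogue, assuming POSIX threads and an
error-checking mutex (which returns EDEADLK rather than panicking), might
look like the sketch below. timeout_handler and init_queue are
hypothetical stand-ins, not the driver functions, and I am only assuming
from the backtrace that the lock was already held further up the call
chain.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK in strict modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t io_lock;

/* Stand-in for a routine that takes the lock itself. */
static int init_queue(void)
{
    int rc = pthread_mutex_lock(&io_lock);

    if (rc != 0)
        return rc;              /* EDEADLK if this thread already holds it */
    /* ... work under the lock would go here ... */
    pthread_mutex_unlock(&io_lock);
    return 0;
}

/* Stand-in for a handler that calls init_queue with the lock held. */
int timeout_handler(void)
{
    int rc;

    pthread_mutex_lock(&io_lock);
    rc = init_queue();          /* second acquisition by the same thread */
    pthread_mutex_unlock(&io_lock);
    return rc;
}

/* Set up an error-checking mutex and return what the nested lock reported. */
int recursion_demo(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    assert(rc == 0);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&io_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = timeout_handler();
    pthread_mutex_destroy(&io_lock);
    return rc;
}
```

In this sketch recursion_demo() returns EDEADLK from the nested lock
attempt; the kernel's non-recursive mutex panics in the same situation
instead of returning an error.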

So questions:-
1. What are the "hard error" errors? The machine was testing IO
with dd but due to the panic I can't tell if that was the cause.
2. Looking at the code, it seems like the reset was tripped by
a firmware bug, is that the case?
3. Is the fix for the panic a simple one we can test?

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
