----- Original Message ----- From: "Steven Hartland" <kill...@multiplay.co.uk>

Sounds like you have made some good progress. I looked at your prior locking
changes and they look good. I haven't had time to go through the queue changes
yet.

Just to update people on this: it's taken quite some time to track down the
random issues causing panics, but I believe I made a breakthrough last night.

It seems that the cleanup interaction between mfi_cmd's and tbolt_cmd's is
flawed, meaning it's possible for tbolt commands to be processed after the
caller has already received a response, cleaned up and returned the mfi_cmd to
the free queue.

This means it's anyone's guess what the result of the tbolt cleanup is, as it
could well be operating on an mfi_cmd that's either now in the free queue or,
even worse, has already been reused.
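To make the failure mode concrete, here is a minimal user-space sketch, assuming heavily simplified stand-in structures (the struct layouts and the helpers mfi_alloc, mfi_release_early, tbolt_cleanup and demo_stale_cleanup are hypothetical, not the driver's code). A one-entry free list forces immediate reuse, so a late tbolt cleanup that trusts its stale back-pointer damages the command's new owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the driver's structures. */
struct tbolt_cmd;

struct mfi_command {
    int in_use;               /* 1 while a caller owns the command */
    int owner;                /* id of the request that currently owns it */
    struct tbolt_cmd *tb;     /* paired tbolt command, or NULL */
};

struct tbolt_cmd {
    struct mfi_command *mfi;  /* back-pointer to the paired mfi command */
};

/* A one-entry free list, so a released command is reused immediately. */
static struct mfi_command pool[1];

static struct mfi_command *mfi_alloc(int owner) {
    if (pool[0].in_use)
        return NULL;
    pool[0] = (struct mfi_command){ .in_use = 1, .owner = owner, .tb = NULL };
    return &pool[0];
}

/* The flawed ordering: the mfi command goes back on the free list while
 * its tbolt command still points at it. */
static void mfi_release_early(struct mfi_command *cm) {
    cm->in_use = 0;
}

/* Late tbolt cleanup that trusts its stale back-pointer. */
static void tbolt_cleanup(struct tbolt_cmd *tb) {
    tb->mfi->tb = NULL;       /* writes into whoever owns the slot now */
    tb->mfi = NULL;
}

/* Returns 1 if the late cleanup damaged the command's new owner. */
static int demo_stale_cleanup(void) {
    struct tbolt_cmd tb1, tb2;

    struct mfi_command *cm = mfi_alloc(1);
    assert(cm != NULL);
    cm->tb = &tb1;
    tb1.mfi = cm;
    mfi_release_early(cm);                   /* caller already got its response */

    struct mfi_command *cm2 = mfi_alloc(2);  /* same slot, reused */
    assert(cm2 == cm);
    cm2->tb = &tb2;
    tb2.mfi = cm2;

    tbolt_cleanup(&tb1);                     /* runs too late */
    return cm2->owner == 2 && cm2->tb == NULL;
}
```

demo_stale_cleanup() returns 1: request 2 loses its tbolt pairing because of cleanup meant for request 1. In the real driver the outcome depends on timing, which fits the random panics described above.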

It's also possible this was the underlying issue you may well have been seeing,
which caused you to add the mfi_tbolt_complete_cmd calls to
mfi_tbolt_send_frame in r242681.

If this is correct, then I believe the right fix is to ensure that
mfi_tbolt_return_cmd is only ever called from mfi_release_command, thus
ensuring completion ordering is always correct. I'm testing fixes for this
theory now, and initial debugging has given good results.
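A minimal sketch of that ordering rule, assuming simplified stand-in structures. The sketch_* functions are hypothetical analogues of mfi_release_command and mfi_tbolt_return_cmd, not their real bodies. The point is that the tbolt side is torn down from inside the release path, before the mfi command can be reused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct tbolt_cmd;

struct mfi_command {
    int in_use;               /* 1 while a caller owns the command */
    struct tbolt_cmd *tb;     /* paired tbolt command, or NULL */
};

struct tbolt_cmd {
    int in_use;               /* 1 while paired with an mfi command */
    struct mfi_command *mfi;  /* back-pointer to the paired mfi command */
};

static void sketch_pair(struct mfi_command *cm, struct tbolt_cmd *tb) {
    cm->in_use = 1;
    tb->in_use = 1;
    cm->tb = tb;
    tb->mfi = cm;
}

/* Hypothetical analogue of mfi_tbolt_return_cmd(): only ever called from
 * sketch_release_command(), so the mfi command is still owned here. */
static void sketch_tbolt_return_cmd(struct tbolt_cmd *tb) {
    assert(tb->mfi != NULL && tb->mfi->in_use);
    tb->mfi->tb = NULL;
    tb->mfi = NULL;
    tb->in_use = 0;
}

/* Hypothetical analogue of mfi_release_command(): the tbolt side is torn
 * down first, and only then is the mfi command made available for reuse. */
static void sketch_release_command(struct mfi_command *cm) {
    assert(cm->in_use);
    if (cm->tb != NULL)
        sketch_tbolt_return_cmd(cm->tb);
    cm->in_use = 0;
}
```

With a single call site, any tbolt cleanup that happens after the mfi command is freed trips the assertion instead of silently corrupting a reused command.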

The patch of fixes is really growing, so it's definitely going to need someone
to review it in detail when I'm done.

What do you think of the above; does it make sense? Doug, would you be willing
to review the patch when I'm done, before I commit it?

OK, I think I'm done.

The good news is I've managed to fix all the panics and cases of commands being
processed incorrectly that we've seen here. The bad news is the patch is now
really quite large, as a lot of issues were found while debugging the core
problems.

The main fixes are:-
1. Ensure that the IO lock is not dropped during tbolt ISR processing, as
dropping it can cause some very nasty issues when two threads end up
processing the same tbolt cmd.

2. Ensure that the interaction between mfi_cmd's and tbolt_cmd's is correct,
specifically in their cleanup, total number and range checks; if it isn't,
again some very nasty issues can occur.

3. Ensure that tbolt init doesn't break MFI indexing by assuming it always
gets the first mfi command structure.
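Items 2 and 3 can be illustrated with a small sketch, assuming hypothetical pool sizes and a 1-based SMID numbering (both are assumptions of the sketch, not taken from the patch). Init keeps the mfi command it was actually handed rather than assuming slot 0. Completion lookups reject out-of-range indices, and a command is never completed twice. Per item 1, the completion loop would run with the IO lock held for the whole pass; this single-threaded sketch has no lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool sizes; the tbolt pool is deliberately smaller than the
 * mfi pool, so their indices must never be assumed to line up. */
#define MAX_MFI_CMDS   4
#define MAX_TBOLT_CMDS 3

struct mfi_command { int index; int in_use; };
struct tbolt_cmd   { int index; int in_use; };

static struct mfi_command mfi_pool[MAX_MFI_CMDS];
static struct tbolt_cmd   tbolt_pool[MAX_TBOLT_CMDS];

static void pools_init(void) {
    for (int i = 0; i < MAX_MFI_CMDS; i++)
        mfi_pool[i] = (struct mfi_command){ .index = i, .in_use = 0 };
    for (int i = 0; i < MAX_TBOLT_CMDS; i++)
        tbolt_pool[i] = (struct tbolt_cmd){ .index = i, .in_use = 0 };
}

static struct mfi_command *mfi_alloc(void) {
    for (int i = 0; i < MAX_MFI_CMDS; i++) {
        if (!mfi_pool[i].in_use) {
            mfi_pool[i].in_use = 1;
            return &mfi_pool[i];
        }
    }
    return NULL;
}

/* Item 3: keep the command init was actually given instead of assuming it
 * is &mfi_pool[0]; returns its index, or -1 if the pool is exhausted. */
static struct mfi_command *init_cmd;

static int tbolt_init(void) {
    init_cmd = mfi_alloc();
    return init_cmd != NULL ? init_cmd->index : -1;
}

/* Item 2: map a 1-based SMID to a tbolt command, rejecting anything outside
 * the tbolt pool instead of indexing past its end. */
static struct tbolt_cmd *tbolt_lookup(unsigned smid) {
    if (smid == 0 || smid > MAX_TBOLT_CMDS)
        return NULL;
    return &tbolt_pool[smid - 1];
}

/* Completion pass (the caller would hold the IO lock throughout, per item 1).
 * Out-of-range or already-completed commands are skipped, never processed
 * twice. Returns the number of commands completed. */
static int complete_smids(const unsigned *smids, size_t n) {
    int done = 0;
    for (size_t i = 0; i < n; i++) {
        struct tbolt_cmd *tb = tbolt_lookup(smids[i]);
        if (tb == NULL || !tb->in_use)
            continue;
        tb->in_use = 0;
        done++;
    }
    return done;
}
```

For example, if mfi slot 0 is already taken when init runs, tbolt_init() returns 1 rather than silently using slot 0. A completion batch of SMIDs {1, 1, 3, 9, 0} with tbolt slots 0 and 2 outstanding completes exactly two commands.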

The rest of the fixes are for things like potential NULL pointer dereferences,
locks not being dropped in error paths, etc. Full details of all the fixes
are in the patch, which can be found here:-
http://blog.multiplay.co.uk/dropzone/freebsd/zz-mfi-queue.patch

It should be noted that while the changes now make the driver functionally
correct, the promotion of the IO lock to the upper layers isn't ideal and
could do with optimising.

   Regards
   Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

_______________________________________________
freebsd-stable@freebsd.org mailing list