Understandable ;)

I was able to produce another panic; however, this time the backtrace is 
different. This round resulted in a GP fault:

(Unfortunately, this machine does not support a serial console, so I have 
had to transcribe the backtrace by hand below.)

[0]> $c
kmdb_enter+0xb()
debug_enter+0x37(...)
panicsys+0x3fd(...)
vpanic+0x15d()
panic+0x9c()
die+0xea(...)
trap+0x3d0(...)
dnet`dnet_send+0x7f(...)
dnet`dnet_m_tx+0x85(...)
dls`dls_tx+0x1d(...)
...
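
As a rough illustration of the restructuring floated in the quoted message
below (kick the transmitter once per packet chain from dnet_m_tx instead of
once per packet from dnet_send), here is a minimal user-space sketch in C.
It is not the dnet code: every fake_* name is hypothetical, a pthread mutex
stands in for the kernel mutex on dnetp->intrlock, and holding the lock
across the whole chain is an assumption rather than anything taken from the
webrev. Ring-full handling and the ISR are not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-instance state. */
struct fake_dnet {
	pthread_mutex_t intrlock;	/* plays the role of dnetp->intrlock */
	unsigned queued;		/* packets queued but not yet kicked */
	unsigned kicks;			/* times the "transmitter" was poked */
};

/* Hypothetical packet chain, standing in for an mblk_t chain. */
struct fake_pkt {
	struct fake_pkt *next;
};

void
fake_dnet_init(struct fake_dnet *dp)
{
	pthread_mutex_init(&dp->intrlock, NULL);
	dp->queued = 0;
	dp->kicks = 0;
}

/* Queue one packet. Caller must hold intrlock; no kick happens here. */
static void
fake_dnet_send(struct fake_dnet *dp, struct fake_pkt *pkt)
{
	(void) pkt;
	dp->queued++;
}

/* Poke the transmitter once. Caller must hold intrlock. */
static void
fake_dnet_kick(struct fake_dnet *dp)
{
	dp->kicks++;
	dp->queued = 0;
}

/*
 * Transmit a whole chain: take the lock once, queue every packet,
 * then kick once at the end. Returns the number of packets queued.
 */
unsigned
fake_dnet_m_tx(struct fake_dnet *dp, struct fake_pkt *chain)
{
	unsigned n = 0;

	assert(dp != NULL);
	pthread_mutex_lock(&dp->intrlock);
	for (struct fake_pkt *p = chain; p != NULL; p = p->next) {
		fake_dnet_send(dp, p);
		n++;
	}
	if (n > 0)
		fake_dnet_kick(dp);	/* one kick per chain, not per packet */
	pthread_mutex_unlock(&dp->intrlock);
	return (n);
}
```

With a three-packet chain, fake_dnet_m_tx() takes the lock once, queues
three packets and bumps kicks by exactly one; an empty chain does not kick
at all.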

Steve

Garrett D'Amore wrote:
> Please review the mutexes very carefully.  I suspect a potential 
> out-of-order mutex acquisition.
> 
> It would be informative to post the stack backtrace from the panic 
> here.   (It's a lot easier to look at a backtrace than to try to 
> reproduce it myself. :-)
> 
>    -- Garrett
> 
> Steven Stallion wrote:
>> Garrett/All,
>>
>> Looks like the fix wasn't much of a fix; I may have just stumbled on a 
>> pre-existing issue.
>>
>> I erred on the safe side and updated the mutex handling in dnet_send 
>> to be a bit more aggressive; the behavior matches precisely what 
>> existed in dnet prior to any of my changes.
>>
>> The panic occurs less frequently, but the race condition still exists.
>>
>> Essentially, the panic is raised in mutex_vector_enter as a result of 
>> trying to obtain a lock on dnetp->intrlock via mutex_enter.
>>
>> debug64 and debug32 builds do not exhibit this behavior (I am at a 
>> loss as to why this is occurring in obj64 builds only).
>>
>> I suspect something is going wrong in the ISR (dnet_intr). A 
>> possible solution is to move the code which kicks the transmitter up 
>> into dnet_m_tx - this will result in a single interrupt per packet 
>> chain rather than one per packet.
>>
>> At this point, I would like to have someone else verify that this is 
>> indeed an issue (see below) before I do much more. The device I am 
>> testing this on is known for being a bit difficult (Cogent chipset).
>>
>> To reproduce:
>>
>> Apply the dnet patch provided in the webrev and build an obj64 version 
>> of the driver. Plumb the interface and start pushing traffic (I was 
>> issuing 'rsh <host> find /' to the NICDRV client). A panic should 
>> result within a couple of minutes.
>>
>> Any ideas?
>>
>> Steve
>>
>> Steven Stallion wrote:
>>> A quick update:
>>>
>>> Yesterday, while switching over to the auto nicdrv scripts Alan 
>>> mentioned, I also changed over to the non-debug version of the driver 
>>> and almost immediately ran into a panic.
>>>
>>> I managed to create an interesting race condition in dnet_send that 
>>> only shows up in the non-debug version of the driver. I am a bit 
>>> surprised, since this really should affect the debug version equally; 
>>> however, I was never able to reproduce the condition there.
>>>
>>> Long story short, I was attempting to be cute with my mutex handling.
>>>
>>> Everything is now back on track, and I should have a new set of 
>>> NICDRV results later this evening.
>>>
>>> Steve
>>>
>>
> 

-- 
Yet magic and hierarchy
arise from the same source,
and this source has a null pointer.

Reference the NULL within NULL,
it is the gateway to all wizardry.
_______________________________________________
driver-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
