Please review the mutex handling very carefully.  I suspect a potential
out-of-order mutex acquisition.
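
To illustrate what a consistent acquisition order looks like, here is a
minimal userland sketch.  It uses POSIX threads as a stand-in for the
kernel mutex_enter/mutex_exit calls, so it is an analogy, not dnet code.
The struct, the txlock field, and the function names are hypothetical;
only the intrlock name comes from this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Hypothetical soft state; only the intrlock name appears in the thread. */
struct softstate {
    pthread_mutex_t txlock;   /* assumed to be taken first */
    pthread_mutex_t intrlock; /* assumed to be taken second */
    int kicks;                /* shared counter guarded by both locks */
};

/* Every caller takes the locks in the same order: txlock, then intrlock. */
static void lock_both(struct softstate *sp)
{
    assert(sp != NULL);
    pthread_mutex_lock(&sp->txlock);
    pthread_mutex_lock(&sp->intrlock);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(struct softstate *sp)
{
    assert(sp != NULL);
    pthread_mutex_unlock(&sp->intrlock);
    pthread_mutex_unlock(&sp->txlock);
}

/* Each worker bumps the counter ITERATIONS times under both locks. */
static void *worker(void *arg)
{
    struct softstate *sp = arg;

    for (int i = 0; i < ITERATIONS; i++) {
        lock_both(sp);
        sp->kicks++;
        unlock_both(sp);
    }
    return NULL;
}

/* Runs two workers to completion; returns the final count, or -1 on error. */
int run_ordered_demo(void)
{
    struct softstate s;
    pthread_t a, b;

    pthread_mutex_init(&s.txlock, NULL);
    pthread_mutex_init(&s.intrlock, NULL);
    s.kicks = 0;

    if (pthread_create(&a, NULL, worker, &s) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &s) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_mutex_destroy(&s.intrlock);
    pthread_mutex_destroy(&s.txlock);
    return s.kicks;
}
```

If one path instead took intrlock before txlock while another took them
in the order shown, the two threads could each hold one lock and wait
forever on the other.  That is the kind of ordering problem worth ruling
out in a review.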

It would be informative to post the stack backtrace from the panic
here.  (It's a lot easier to look at a backtrace than to try to
reproduce it myself. :-)

    -- Garrett

Steven Stallion wrote:
> Garrett/All,
>
> Looks like the fix wasn't much of a fix; I may have just stumbled on a 
> pre-existing issue.
>
> I erred on the safe side and updated the mutex handling in dnet_send
> to be a bit more aggressive; the behavior matches precisely what
> existed in dnet prior to any of my changes.
>
> The panic occurs less frequently, but the race condition still exists.
>
> Essentially, the panic is raised in mutex_vector_enter as a result of 
> trying to obtain a lock on dnetp->intrlock via mutex_enter.
>
> debug64 and debug32 builds do not exhibit this behavior (I am at a 
> loss as to why this is occurring in obj64 builds only).
>
> I suspect something is going wrong in the ISR (dnet_intr). A
> possible solution is to move the code which kicks the transmitter up
> into dnet_m_tx - this will result in a single interrupt per packet
> chain rather than one per packet.
>
> At this point, I would like to have someone else verify that this is 
> indeed an issue (see below) before I do much more. The device I am 
> testing this on is known for being a bit difficult (Cogent chipset).
>
> To reproduce:
>
> Apply the dnet patch provided in the webrev and build an obj64 version 
> of the driver. Plumb the interface and start pushing traffic (I was 
> issuing 'rsh <host> find /' to the NICDRV client). A panic should 
> result within a couple of minutes.
>
> Any ideas?
>
> Steve
>
> Steven Stallion wrote:
>> A quick update:
>>
>> Yesterday, while switching over to the auto nicdrv scripts Alan 
>> mentioned, I also changed over to the non-debug version of the driver 
>> and almost immediately ran into a panic.
>>
>> I managed to create an interesting race condition in dnet_send that 
>> only shows up in the non-debug version of the driver. I am a bit
>> surprised, since this really should affect the debug version equally;
>> however, I was never able to duplicate the condition there.
>>
>> Long story short, I was attempting to be cute with my mutex handling.
>>
>> Everything is now back on track, and I should have a new set of 
>> NICDRV results later this evening.
>>
>> Steve
>>
>

_______________________________________________
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
