Garrett/All,

Looks like the fix wasn't much of a fix; I may have just stumbled on a 
pre-existing issue.

I erred on the safe side and updated the mutex handling in dnet_send to 
be a bit more aggressive; the behavior now matches precisely what existed 
in dnet prior to any of my changes.

The panic occurs less frequently, but the race condition still exists.

Essentially, the panic is raised in mutex_vector_enter as a result of 
trying to obtain a lock on dnetp->intrlock via mutex_enter.

debug64 and debug32 builds do not exhibit this behavior (I am at a loss 
as to why this is occurring in obj64 builds only).

I suspect something is going wrong in the ISR (dnet_intr). A possible 
solution is to move the code which kicks the transmitter up into 
dnet_m_tx - this will result in a single interrupt per packet chain 
rather than one per packet.
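
To make that idea concrete, here is a rough userland sketch of the two 
shapes. Nothing in it is real dnet or GLDv3 code: struct pkt, struct 
fake_nic, queue_packet, kick_transmitter and both tx_* functions are 
hypothetical stand-ins, and it assumes that each kick of the transmitter 
corresponds to one transmit interrupt later on. Locking is left out 
entirely; the only point is how often the transmitter gets kicked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real dnet/GLDv3 types. */
struct pkt {
	struct pkt *next;	/* next packet in the chain */
};

struct fake_nic {
	unsigned queued;	/* packets placed on the simulated TX ring */
	unsigned kicks;		/* times the transmitter was kicked */
};

/* Place one packet on the simulated ring without kicking the transmitter. */
void
queue_packet(struct fake_nic *nic, const struct pkt *p)
{
	assert(nic != NULL && p != NULL);
	nic->queued++;
}

/* Simulated "go look at the ring" request to the hardware. */
void
kick_transmitter(struct fake_nic *nic)
{
	assert(nic != NULL);
	nic->kicks++;
}

/* Shape 1: the send path kicks the transmitter for every packet. */
void
tx_kick_per_packet(struct fake_nic *nic, const struct pkt *chain)
{
	const struct pkt *p;

	for (p = chain; p != NULL; p = p->next) {
		queue_packet(nic, p);
		kick_transmitter(nic);
	}
}

/* Shape 2: queue the whole chain, then kick once from the caller. */
void
tx_kick_per_chain(struct fake_nic *nic, const struct pkt *chain)
{
	const struct pkt *p;
	unsigned before = nic->queued;

	for (p = chain; p != NULL; p = p->next)
		queue_packet(nic, p);

	/* Only kick if something was actually queued. */
	if (nic->queued != before)
		kick_transmitter(nic);
}
```

For a chain of three packets, the first shape kicks three times and the 
second kicks once. Whether that is enough to close the race in the real 
driver is a separate question; this only shows the restructuring.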

At this point, I would like to have someone else verify that this is 
indeed an issue (see below) before I do much more. The device I am 
testing this on is known for being a bit difficult (Cogent chipset).

To reproduce:

Apply the dnet patch provided in the webrev and build an obj64 version 
of the driver. Plumb the interface and start pushing traffic (I was 
issuing 'rsh <host> find /' to the NICDRV client). A panic should result 
within a couple of minutes.

Any ideas?

Steve

Steven Stallion wrote:
> A quick update:
> 
> Yesterday, while switching over to the auto nicdrv scripts Alan 
> mentioned, I also changed over to the non-debug version of the driver 
> and almost immediately ran into a panic.
> 
> I managed to create an interesting race condition in dnet_send that only 
> shows up in the non-debug version of the driver. I am a bit surprised 
> since this really should affect the debug version equally, however I was 
> never able to duplicate the condition.
> 
> Long story short, I was attempting to be cute with my mutex handling.
> 
> Everything is now back on track, and I should have a new set of NICDRV 
> results later this evening.
> 
> Steve
> 

-- 
Yet magic and hierarchy
arise from the same source,
and this source has a null pointer.

Reference the NULL within NULL,
it is the gateway to all wizardry.
