Please review the mutexes very carefully. I suspect a potential out-of-order mutex acquisition.
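
By "out of order" I mean two code paths taking the same pair of locks in opposite orders. Here is a rough, stand-alone sketch of a rank check that would flag that. It is not dnet code: only intrlock is named in this thread, and txlock, the rank values, and every function name below are assumptions made up for illustration. The sketch also assumes locks are dropped in the reverse of the order they were taken, which real mutexes do not require.

```c
#include <assert.h>   /* for the caller-side checks mentioned below */
#include <stdbool.h>

/*
 * Hypothetical lock ranks: a lock may only be taken while every lock
 * already held has a strictly lower rank. "txlock" is an assumed second
 * lock; only "intrlock" appears in the thread.
 */
enum lock_rank { RANK_INTRLOCK = 1, RANK_TXLOCK = 2 };

#define MAX_HELD 8

/* Ranks currently held by one thread, in acquisition order. */
struct held_locks {
    enum lock_rank stack[MAX_HELD];
    int depth;
};

/* Record an acquisition; return false if it violates the rank order. */
static bool rank_enter(struct held_locks *h, enum lock_rank r)
{
    if (h->depth >= MAX_HELD)
        return false;
    /* The stack is strictly increasing, so checking the top is enough. */
    if (h->depth > 0 && h->stack[h->depth - 1] >= r)
        return false;   /* out of order, or re-entering a held lock */
    h->stack[h->depth++] = r;
    return true;
}

/* Record a release; this sketch only accepts the most recent lock. */
static bool rank_exit(struct held_locks *h, enum lock_rank r)
{
    if (h->depth == 0 || h->stack[h->depth - 1] != r)
        return false;
    h->depth--;
    return true;
}

/* A path that takes intrlock, then txlock, and unwinds in reverse. */
static bool ordered_path(void)
{
    struct held_locks h = { .depth = 0 };
    return rank_enter(&h, RANK_INTRLOCK) && rank_enter(&h, RANK_TXLOCK) &&
        rank_exit(&h, RANK_TXLOCK) && rank_exit(&h, RANK_INTRLOCK);
}

/* A path that takes txlock first; the second enter is rejected. */
static bool inverted_path(void)
{
    struct held_locks h = { .depth = 0 };
    return rank_enter(&h, RANK_TXLOCK) && rank_enter(&h, RANK_INTRLOCK);
}
```

With this, assert(ordered_path()) holds and assert(!inverted_path()) holds. The same idea can be applied by hand when reading the driver: list the locks each path takes, in order, and look for two paths that disagree.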

It would be informative to post the stack backtrace from the panic here.
(It's a lot easier to look at a backtrace than to try to reproduce it
myself. :-)

    -- Garrett

Steven Stallion wrote:
> Garrett/All,
>
> Looks like the fix wasn't much of a fix; I may have just stumbled on a
> pre-existing issue.
>
> I erred on the safe side and updated the mutex handling in dnet_send
> to be a bit more aggressive; the behavior matches precisely what
> existed in dnet prior to any of my changes.
>
> The panic occurs less frequently, but the race condition still exists.
>
> Essentially, the panic is raised in mutex_vector_enter as a result of
> trying to obtain a lock on dnetp->intrlock via mutex_enter.
>
> debug64 and debug32 builds do not exhibit this behavior (I am at a
> loss as to why this is occurring in obj64 builds only).
>
> I suspect something is running afoul in the ISR (dnet_intr). A
> possible solution is to move the code which kicks the transmitter up
> into dnet_m_tx - this will result in a single interrupt per packet
> chain rather than one per packet.
>
> At this point, I would like to have someone else verify that this is
> indeed an issue (see below) before I do much more. The device I am
> testing this on is known for being a bit difficult (Cogent chipset).
>
> To reproduce:
>
> Apply the dnet patch provided in the webrev and build an obj64 version
> of the driver. Plumb the interface and start pushing traffic (I was
> issuing 'rsh <host> find /' to the NICDRV client). A panic should
> result within a couple of minutes.
>
> Any ideas?
>
> Steve
>
> Steven Stallion wrote:
>> A quick update:
>>
>> Yesterday, while switching over to the auto nicdrv scripts Alan
>> mentioned, I also changed over to the non-debug version of the driver
>> and almost immediately ran into a panic.
>>
>> I managed to create an interesting race condition in dnet_send that
>> only shows up in the non-debug version of the driver. I am a bit
>> surprised since this really should affect the debug version equally;
>> however, I was never able to duplicate the condition.
>>
>> Long story short, I was attempting to be cute with my mutex handling.
>>
>> Everything is now back on track, and I should have a new set of
>> NICDRV results later this evening.
>>
>> Steve
>>
_______________________________________________
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss