Garrett/All,

Looks like the fix wasn't much of a fix; I may have just stumbled on a pre-existing issue.
I erred on the safe side and updated the mutex handling in dnet_send to
be a bit more aggressive; the behavior now matches precisely what existed
in dnet prior to any of my changes. The panic occurs less frequently, but
the race condition still exists.

Essentially, the panic is raised in mutex_vector_enter as a result of
trying to obtain a lock on dnetp->intrlock via mutex_enter. debug64 and
debug32 builds do not exhibit this behavior (I am at a loss as to why
this occurs in obj64 builds only). I suspect something is going wrong in
the ISR (dnet_intr). A possible solution is to move the code which kicks
the transmitter up into dnet_m_tx - this will result in a single
interrupt per packet chain rather than one per packet.

At this point, I would like to have someone else verify that this is
indeed an issue (see below) before I do much more. The device I am
testing this on is known for being a bit difficult (Cogent chipset).

To reproduce:

Apply the dnet patch provided in the webrev and build an obj64 version of
the driver. Plumb the interface and start pushing traffic (I was issuing
'rsh <host> find /' to the NICDRV client). A panic should result within a
couple of minutes.

Any ideas?

Steve

Steven Stallion wrote:
> A quick update:
>
> Yesterday, while switching over to the auto nicdrv scripts Alan
> mentioned, I also changed over to the non-debug version of the driver
> and almost immediately ran into a panic.
>
> I managed to create an interesting race condition in dnet_send that only
> shows up in the non-debug version of the driver. I am a bit surprised
> since this really should affect the debug version equally, however I was
> never able to duplicate the condition.
>
> Long story short, I was attempting to be cute with my mutex handling.
>
> Everything is now back on track, and I should have a new set of NICDRV
> results later this evening.
>
> Steve
>

--
Yet magic and hierarchy arise from the same source, and this source has a null pointer.
Reference the NULL within NULL, it is the gateway to all wizardry.
_______________________________________________
driver-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
