Just to follow up: I happened to be on the machine's console when the most recent network outage occurred. When it started working again, I immediately saw a console message:

SUNW-MSG-ID: SUNOS-8000-1L, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Aug 19 14:15:00 PDT 2008
PLATFORM: PDSMi, CSN: 0123456789, HOSTNAME: grommit
SOURCE: eft, REV: 1.16
EVENT-ID: ca6dfb00-511d-6729-b83a-ef594726fc65
DESC: The EFT Diagnosis Engine encountered telemetry for which it is unable to produce a diagnosis. Refer to http://sun.com/msg/SUNOS-8000-1L for more information.
AUTO-RESPONSE: Error reports from the component will be logged for examination by Sun.
IMPACT: Automated diagnosis and response for these events will not occur.
REC-ACTION: Run pkgchk -n SUNWfmd to ensure that fault management software is installed properly. Contact Sun for support.

This seems to match up with the ereport.io.device.stall/restored messages reported by fmdump.

cheers,
steve

Stephen Lau wrote:
> Hi Ted,
> I'm quite sure I didn't observe any of these timeouts/errors in
> snv_75. Looking at the network pings from our network monitor graphs,
> all the outages occur after I live-upgraded to snv_95. It's a fairly
> typical SAMP system... I'm not running anything specific to trigger the
> network outages. Some of the apps that are running are: Apache2, PHP 5,
> MySQL, Mailman, Dovecot, Postfix, etc. The system board is a Supermicro
> 5015M-MT+ with the dual Intel e1000g NICs.
>
> cheers,
> steve
>
> Ted You wrote:
>
>> Hi Stephen,
>>
>> From the ereports, it seems that the device hung and was reset by
>> the driver automatically. Starting with snv_77, the e1000g driver
>> supports FMA, so we see the FMA ereports now. The problem might have
>> existed in snv_75.
>>
>> We have not seen this problem before. The recent bug fixes in snv_95
>> have nothing to do with this problem. I need to try to reproduce the
>> problem on our local systems. Could you please let me know what
>> applications or tests you have been running on your system?
>>
>> Thanks,
>> Ted
>>
>>
>> Stephen Lau wrote:
>>
>>> I'm seeing flaky outages on my snv_95 system (just recently LU'd from
>>> snv_75, where I didn't have any issues).
>>>
>>> I seem to get outages of a few minutes (5-6) throughout the day, with
>>> no rhyme or reason as to why.
>>>
>>> fmdump -e shows repeated occurrences of:
>>> Aug 18 12:24:57.2612 ereport.io.device.stall
>>> Aug 18 12:24:57.5985 ereport.io.service.restored
>>>
>>> /usr/X11/bin/scanpci shows my e1000g devices to be:
>>> pci bus 0x000d cardnum 0x00 function 0x00: vendor 0x8086 device 0x108c
>>> Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
>>>
>>> pci bus 0x000e cardnum 0x00 function 0x00: vendor 0x8086 device 0x109a
>>> Intel Corporation 82573L Gigabit Ethernet Controller
>>>
>>> From my reading it looks like there were 1 or 2 e1000g issues that
>>> were purportedly fixed in snv_95, but I'm still seeing these
>>> problems. Does anyone have any idea as to what might be the issue, and
>>> whether or not there is a workaround? I'm debating pulling
>>> back the e1000g driver from my snv_75 LU slice and seeing if that
>>> works, but I'm hoping someone has a workaround for now. :)
>>>
>>> This message posted from opensolaris.org
>>> _______________________________________________
>>> networking-discuss mailing list
>>> [email protected]
>
>
>
-- 
stephen lau | [EMAIL PROTECTED] | www.whacked.net
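One way to compare the fmdump ereports against the outages is to pair each ereport.io.device.stall with the ereport.io.service.restored that follows it and measure the gap. The following is a minimal Python sketch that assumes fmdump -e prints its default TIME/CLASS layout, as in the excerpt quoted in the original post. fmdump prints timestamps without a year, so the sketch passes the year in by hand. The names stall_durations and SAMPLE are hypothetical illustrations and are not part of any Solaris tool.

```python
from datetime import datetime

# Excerpt from the fmdump -e output quoted in the thread (header line included).
SAMPLE = [
    "TIME                 CLASS",
    "Aug 18 12:24:57.2612 ereport.io.device.stall",
    "Aug 18 12:24:57.5985 ereport.io.service.restored",
]

STALL = "ereport.io.device.stall"
RESTORED = "ereport.io.service.restored"


def stall_durations(lines, year=2008):
    """Pair each stall ereport with the next restored ereport.

    Returns a list of (stall_time, seconds_until_restored) tuples.
    Assumes the 'Mon DD HH:MM:SS.ffff CLASS' layout shown above; fmdump
    omits the year, so it is passed in and year rollovers are not handled.
    """
    durations = []
    stall_at = None
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # header, blank, or otherwise short line
        stamp, cls = " ".join(parts[:3]), parts[3]
        try:
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
        except ValueError:
            continue  # not a timestamped event line
        if cls == STALL:
            stall_at = when
        elif cls == RESTORED and stall_at is not None:
            durations.append((stall_at, (when - stall_at).total_seconds()))
            stall_at = None  # wait for the next stall before pairing again
    return durations


if __name__ == "__main__":
    for start, secs in stall_durations(SAMPLE):
        # prints: Aug 18 12:24:57.261200 0.3373 s
        print(start.strftime("%b %d %H:%M:%S.%f"), f"{secs:.4f} s")
```

For the one pair quoted in the thread, the sketch reports a gap of about 0.34 seconds. That is much shorter than the 5-6 minute outages described, so it may be worth checking whether each outage lines up with a single stall/restore pair or with several in a row. On a live system, the script could read piped fmdump -e output by passing sys.stdin to stall_durations instead of SAMPLE.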
