Re: SunFire X2200 ilo's bge1 DOWN/UP
On Mon, Jun 03, 2013 at 09:25:33AM +0300, Daniel Braniss wrote:
> > On Fri, May 31, 2013 at 08:24:47AM +0300, Daniel Braniss wrote:
> > > > On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
> > > > > > --/04w6evG8XlLl3ft
> > > > > > Content-Type: text/x-diff; charset=us-ascii
> > > > > > Content-Disposition: attachment; filename="bge.media_sts.diff"
> > > > > >
> > > > > > Index: sys/dev/bge/if_bge.c
> > > > > > ===
> > > > > > --- sys/dev/bge/if_bge.c (revision 251021)
> > > > > > +++ sys/dev/bge/if_bge.c (working copy)
> > > > > > @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
> > > > > >
> > > > > >      BGE_LOCK(sc);
> > > > > >
> > > > > > +    if ((ifp->if_flags & IFF_UP) == 0) {
> > > > > > +        BGE_UNLOCK(sc);
> > > > > > +        return;
> > > > > > +    }
> > > > > >      if (sc->bge_flags & BGE_FLAG_TBI) {
> > > > > >          ifmr->ifm_status = IFM_AVALID;
> > > > > >          ifmr->ifm_active = IFM_ETHER;
> > > > > >
> > > > > > --/04w6evG8XlLl3ft--
> > > > > after 18hs, the logs are empty!
> > > > > it seems the patch fixes the problem.
> > > > >
> > > > > now maybe it's time to hunt for who is randomly calling for
> > > > > bge_ifmedia_sts ...
> > > >
> > > > It could be any number of daemons that query interface state such as an
> > > > SNMP server, ladvd, etc.
> > > >
> > > > If you wanted help you could modify the patch so that it does something
> > > > like this:
> > > >
> > > #include
> > > > if (/* test for IFF_UP */) {
> > > >     BGE_UNLOCK(sc);
> > > >     if_printf(ifp, "state queried on down interface by pid %d (%s)",
> > >         --| add a \n
> > > >         curthread->td_proc->p_pid, curthread->td_proc->p_comm);
> > > >     return;
> > > > }
> > > >
> > > > --
> > > > John Baldwin
> > > snmpd calls this several times a second (difficult to measure since
> > > syslog just says
> > >     last message repeated 22 times)
> > > in any case, the DOWN/UP appears once every few hours, oh well.
> > > I have now stopped the snmpd daemon, maybe there is someone else ...
> >
> > I have no idea why snmpd wants to know media status for interfaces
> > that are put into down state. The media status resolved after
> > bringing up the interface may differ from the one that was seen
> > before.
> > The patch also makes dhclient think the driver got a valid link
> > regardless of link establishment. I guess that wouldn't be an
> > issue though. I'll commit the patch after some more testing.
> >
> > Thanks for reporting and testing!
> >
> no problem!
>
> after more than 3 days, there were no more 'reports', so snmpd was the
> culprit.
> the snmpd we use is from ports, I'll try and see what's going on ...
>

FYI: Committed in r251481.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SunFire X2200 ilo's bge1 DOWN/UP
> On Fri, May 31, 2013 at 08:24:47AM +0300, Daniel Braniss wrote:
> > > On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
> > > > > --/04w6evG8XlLl3ft
> > > > > Content-Type: text/x-diff; charset=us-ascii
> > > > > Content-Disposition: attachment; filename="bge.media_sts.diff"
> > > > >
> > > > > Index: sys/dev/bge/if_bge.c
> > > > > ===
> > > > > --- sys/dev/bge/if_bge.c (revision 251021)
> > > > > +++ sys/dev/bge/if_bge.c (working copy)
> > > > > @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
> > > > >
> > > > >      BGE_LOCK(sc);
> > > > >
> > > > > +    if ((ifp->if_flags & IFF_UP) == 0) {
> > > > > +        BGE_UNLOCK(sc);
> > > > > +        return;
> > > > > +    }
> > > > >      if (sc->bge_flags & BGE_FLAG_TBI) {
> > > > >          ifmr->ifm_status = IFM_AVALID;
> > > > >          ifmr->ifm_active = IFM_ETHER;
> > > > >
> > > > > --/04w6evG8XlLl3ft--
> > > > after 18hs, the logs are empty!
> > > > it seems the patch fixes the problem.
> > > >
> > > > now maybe it's time to hunt for who is randomly calling for
> > > > bge_ifmedia_sts ...
> > >
> > > It could be any number of daemons that query interface state such as an
> > > SNMP server, ladvd, etc.
> > >
> > > If you wanted help you could modify the patch so that it does something
> > > like this:
> > >
> > #include
> > > if (/* test for IFF_UP */) {
> > >     BGE_UNLOCK(sc);
> > >     if_printf(ifp, "state queried on down interface by pid %d (%s)",
> >         --| add a \n
> > >         curthread->td_proc->p_pid, curthread->td_proc->p_comm);
> > >     return;
> > > }
> > >
> > > --
> > > John Baldwin
> > snmpd calls this several times a second (difficult to measure since
> > syslog just says
> >     last message repeated 22 times)
> > in any case, the DOWN/UP appears once every few hours, oh well.
> > I have now stopped the snmpd daemon, maybe there is someone else ...
>
> I have no idea why snmpd wants to know media status for interfaces
> that are put into down state. The media status resolved after
> bringing up the interface may differ from the one that was seen
> before.
> The patch also makes dhclient think the driver got a valid link
> regardless of link establishment. I guess that wouldn't be an
> issue though. I'll commit the patch after some more testing.
>
> Thanks for reporting and testing!
>
no problem!

after more than 3 days, there were no more 'reports', so snmpd was the
culprit.
the snmpd we use is from ports, I'll try and see what's going on ...

thanks
danny

> >
> > thanks,
> > danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Fri, May 31, 2013 at 08:24:47AM +0300, Daniel Braniss wrote:
> > On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
> > > > --/04w6evG8XlLl3ft
> > > > Content-Type: text/x-diff; charset=us-ascii
> > > > Content-Disposition: attachment; filename="bge.media_sts.diff"
> > > >
> > > > Index: sys/dev/bge/if_bge.c
> > > > ===
> > > > --- sys/dev/bge/if_bge.c (revision 251021)
> > > > +++ sys/dev/bge/if_bge.c (working copy)
> > > > @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
> > > >
> > > >      BGE_LOCK(sc);
> > > >
> > > > +    if ((ifp->if_flags & IFF_UP) == 0) {
> > > > +        BGE_UNLOCK(sc);
> > > > +        return;
> > > > +    }
> > > >      if (sc->bge_flags & BGE_FLAG_TBI) {
> > > >          ifmr->ifm_status = IFM_AVALID;
> > > >          ifmr->ifm_active = IFM_ETHER;
> > > >
> > > > --/04w6evG8XlLl3ft--
> > > after 18hs, the logs are empty!
> > > it seems the patch fixes the problem.
> > >
> > > now maybe it's time to hunt for who is randomly calling for
> > > bge_ifmedia_sts ...
> >
> > It could be any number of daemons that query interface state such as an
> > SNMP server, ladvd, etc.
> >
> > If you wanted help you could modify the patch so that it does something
> > like this:
> >
> #include
> > if (/* test for IFF_UP */) {
> >     BGE_UNLOCK(sc);
> >     if_printf(ifp, "state queried on down interface by pid %d (%s)",
>         --| add a \n
> >         curthread->td_proc->p_pid, curthread->td_proc->p_comm);
> >     return;
> > }
> >
> > --
> > John Baldwin
> snmpd calls this several times a second (difficult to measure since
> syslog just says
>     last message repeated 22 times)
> in any case, the DOWN/UP appears once every few hours, oh well.
> I have now stopped the snmpd daemon, maybe there is someone else ...

I have no idea why snmpd wants to know media status for interfaces
that are put into down state. The media status resolved after
bringing up the interface may differ from the one that was seen
before.
The patch also makes dhclient think the driver got a valid link
regardless of link establishment. I guess that wouldn't be an
issue though. I'll commit the patch after some more testing.

Thanks for reporting and testing!

>
> thanks,
> danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
> On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
> > > --/04w6evG8XlLl3ft
> > > Content-Type: text/x-diff; charset=us-ascii
> > > Content-Disposition: attachment; filename="bge.media_sts.diff"
> > >
> > > Index: sys/dev/bge/if_bge.c
> > > ===
> > > --- sys/dev/bge/if_bge.c (revision 251021)
> > > +++ sys/dev/bge/if_bge.c (working copy)
> > > @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
> > >
> > >      BGE_LOCK(sc);
> > >
> > > +    if ((ifp->if_flags & IFF_UP) == 0) {
> > > +        BGE_UNLOCK(sc);
> > > +        return;
> > > +    }
> > >      if (sc->bge_flags & BGE_FLAG_TBI) {
> > >          ifmr->ifm_status = IFM_AVALID;
> > >          ifmr->ifm_active = IFM_ETHER;
> > >
> > > --/04w6evG8XlLl3ft--
> > after 18hs, the logs are empty!
> > it seems the patch fixes the problem.
> >
> > now maybe it's time to hunt for who is randomly calling for bge_ifmedia_sts
> > ...
>
> It could be any number of daemons that query interface state such as an
> SNMP server, ladvd, etc.
>
> If you wanted help you could modify the patch so that it does something like
> this:
>
#include
> if (/* test for IFF_UP */) {
>     BGE_UNLOCK(sc);
>     if_printf(ifp, "state queried on down interface by pid %d (%s)",
        --| add a \n
>         curthread->td_proc->p_pid, curthread->td_proc->p_comm);
>     return;
> }
>
> --
> John Baldwin

snmpd calls this several times a second (difficult to measure since
syslog just says
    last message repeated 22 times)
in any case, the DOWN/UP appears once every few hours, oh well.
I have now stopped the snmpd daemon, maybe there is someone else ...

thanks,
danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
> > --/04w6evG8XlLl3ft
> > Content-Type: text/x-diff; charset=us-ascii
> > Content-Disposition: attachment; filename="bge.media_sts.diff"
> >
> > Index: sys/dev/bge/if_bge.c
> > ===
> > --- sys/dev/bge/if_bge.c (revision 251021)
> > +++ sys/dev/bge/if_bge.c (working copy)
> > @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
> >
> >      BGE_LOCK(sc);
> >
> > +    if ((ifp->if_flags & IFF_UP) == 0) {
> > +        BGE_UNLOCK(sc);
> > +        return;
> > +    }
> >      if (sc->bge_flags & BGE_FLAG_TBI) {
> >          ifmr->ifm_status = IFM_AVALID;
> >          ifmr->ifm_active = IFM_ETHER;
> >
> > --/04w6evG8XlLl3ft--
> after 18hs, the logs are empty!
> it seems the patch fixes the problem.
>
> now maybe it's time to hunt for who is randomly calling for bge_ifmedia_sts
> ...

It could be any number of daemons that query interface state such as an
SNMP server, ladvd, etc.

If you wanted help you could modify the patch so that it does something like
this:

    if (/* test for IFF_UP */) {
        BGE_UNLOCK(sc);
        if_printf(ifp, "state queried on down interface by pid %d (%s)",
            curthread->td_proc->p_pid, curthread->td_proc->p_comm);
        return;
    }

--
John Baldwin
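[Editorial sketch] The debugging idea above can be tried outside the kernel. The following is a minimal userspace sketch, not driver code: `struct sk_proc` and `sk_format_query_log` are hypothetical names standing in for `curthread->td_proc` and `if_printf()`, and the pid in the usage note is made up. It assumes, as `if_printf()` does, that the message is prefixed with the interface name, and it includes the trailing `\n` that was pointed out later in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the fields read from curthread->td_proc. */
struct sk_proc {
    int p_pid;          /* process id of the caller */
    const char *p_comm; /* command name of the caller */
};

/*
 * Build the log line that identifies who queried a down interface.
 * Returns what snprintf() returns: the length the full message needs.
 */
static int
sk_format_query_log(char *buf, size_t len, const char *ifname,
    const struct sk_proc *p)
{
    assert(buf != NULL && ifname != NULL && p != NULL);
    return snprintf(buf, len,
        "%s: state queried on down interface by pid %d (%s)\n",
        ifname, p->p_pid, p->p_comm);
}
```

Called with the interface name "bge1" and a caller of pid 1234 named "snmpd", this would produce one line per query, so a daemon polling several times a second shows up quickly in the log.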
Re: SunFire X2200 ilo's bge1 DOWN/UP
>
> --/04w6evG8XlLl3ft
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
>
> On Tue, May 28, 2013 at 09:55:24AM +0300, Daniel Braniss wrote:
> > > On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > > > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > > > > hi, after upgrading to 9.1-stable, this particular hardware -
> > > > > > > > SunFire X2200,
> > > > > > >
> > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1'
> > > > > > > output.
> > > > > > >
> > > > > >
> > > > > > bge0: 0x009003> mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6
> > > > > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > > miibus2: on bge0
> > > > > > brgphy0: PHY 1 on miibus2
> > > > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > > > > bge1: 0x009003> mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6
> > > > > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > > miibus3: on bge1
> > > > > > brgphy1: PHY 1 on miibus3
> > > > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > > > >
> > > > > > sf-10> ifconfig bge1
> > > > > > bge1: flags=8802 metric 0 mtu 1500
> > > > > >         options=8009b TE>
> > > > > >         ether 00:1b:24:5d:5b:be
> > > > > >         nd6 options=21
> > > > > >         media: Ethernet autoselect (100baseTX )
> > > > > >         status: active
> > > > > >
> > > > >
> > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > > > > Do you have some network script run by cron?
> > > >
> > > > no scripts.
> > > > this port is shared with the ILO/IPMI, and back in March you fixed a
> > > > problem that it was hanging soon after it was initialized by the driver,
> > > > (r248226 - but I'm not sure if it was ever MFC'ed).
> > >
> > > It was MFCed.
> > >
> > > > Initially I thought it could be caused by connections to it from other
> > > > hosts (either via the web, or ssh) so I killed them, but it didn't help.
> > > > without that patch the connection fails, and I don't see any DOWN/UP.
> > >
> > > Could you check how many interrupts you get from bge1?
> > > Ideally you shouldn't get any interrupts for bge1.
> >
> > it's not even mentioned :-)
> > sf-04> vmstat -i
> > interrupt                          total       rate
> > irq3: uart1                          964          0
> > irq4: uart0                            6          0
> > irq14: ata0                       227354          0
> > irq17: bge0                      1021981          2
> > irq21: ohci0                          28          0
> > irq22: ehci0                           2          0
> > irq23: atapci1                    293228          0
> > cpu0:timer                     383244076       1124
> > cpu1:timer                       2225144          6
> > cpu2:timer                       2056087          6
> > cpu3:timer                       2093943          6
> > Total                          391162813       1147
> >
>
> Then the only way a link UP/DOWN event could be generated for a DOWN
> interface would be a media status query (e.g. ifconfig -a) triggered
> by an external application. Most drivers I touched check the IFF_UP
> flag before poking the media status register. However, I'm not sure
> that is what you're seeing, because you do not use any network script
> run by cron.
> Anyway, try the attached patch and let me know whether it makes any
> difference.
>
> > > > > > > > is toggling bge1 DOWN/UP every few hours, this port is being
> > > > > > > > used by the ILO.
> > > > > > > > To check, I upgraded another identical host, and the same
> > > > > > > > problem appears.
> > > > > > >
> > > > > > > What is the last known working revision?
> > > > > >
> > > > > > I have no idea, but I have older versions, and I'll start from the
> > > > > > oldest (9.1-prerelease), but
> > > > > > it will take time, since it takes hours till it happens.
> > > > > >
> > > > >
> > > > > ok.
>
> --/04w6evG8XlLl3ft
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="bge.media_sts.diff"
>
> Index: sys/dev/bge/if_bge.c
> ===
> --- sys/dev/bge/if_bge.c
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Tue, May 28, 2013 at 09:55:24AM +0300, Daniel Braniss wrote:
> > On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > > > hi, after upgrading to 9.1-stable, this particular hardware -
> > > > > > > SunFire X2200,
> > > > > >
> > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
> > > > > >
> > > > >
> > > > > bge0: 0x009003> mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6
> > > > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > miibus2: on bge0
> > > > > brgphy0: PHY 1 on miibus2
> > > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > > > bge1: 0x009003> mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6
> > > > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > miibus3: on bge1
> > > > > brgphy1: PHY 1 on miibus3
> > > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > > >
> > > > > sf-10> ifconfig bge1
> > > > > bge1: flags=8802 metric 0 mtu 1500
> > > > >         options=8009b TE>
> > > > >         ether 00:1b:24:5d:5b:be
> > > > >         nd6 options=21
> > > > >         media: Ethernet autoselect (100baseTX )
> > > > >         status: active
> > > > >
> > > >
> > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > > > Do you have some network script run by cron?
> > >
> > > no scripts.
> > > this port is shared with the ILO/IPMI, and back in March you fixed a
> > > problem that it was hanging soon after it was initialized by the driver,
> > > (r248226 - but I'm not sure if it was ever MFC'ed).
> >
> > It was MFCed.
> >
> > > Initially I thought it could be caused by connections to it from other
> > > hosts (either via the web, or ssh) so I killed them, but it didn't help.
> > > without that patch the connection fails, and I don't see any DOWN/UP.
> >
> > Could you check how many interrupts you get from bge1?
> > Ideally you shouldn't get any interrupts for bge1.
>
> it's not even mentioned :-)
> sf-04> vmstat -i
> interrupt                          total       rate
> irq3: uart1                          964          0
> irq4: uart0                            6          0
> irq14: ata0                       227354          0
> irq17: bge0                      1021981          2
> irq21: ohci0                          28          0
> irq22: ehci0                           2          0
> irq23: atapci1                    293228          0
> cpu0:timer                     383244076       1124
> cpu1:timer                       2225144          6
> cpu2:timer                       2056087          6
> cpu3:timer                       2093943          6
> Total                          391162813       1147
>

Then the only way a link UP/DOWN event could be generated for a DOWN
interface would be a media status query (e.g. ifconfig -a) triggered
by an external application. Most drivers I touched check the IFF_UP
flag before poking the media status register. However, I'm not sure
that is what you're seeing, because you do not use any network script
run by cron.
Anyway, try the attached patch and let me know whether it makes any
difference.

> > > > > > > is toggling bge1 DOWN/UP every few hours, this port is being
> > > > > > > used by the ILO.
> > > > > > > To check, I upgraded another identical host, and the same problem
> > > > > > > appears.
> > > > > >
> > > > > > What is the last known working revision?
> > > > >
> > > > > I have no idea, but I have older versions, and I'll start from the
> > > > > oldest (9.1-prerelease), but
> > > > > it will take time, since it takes hours till it happens.
> > > > >
> > > >
> > > > ok.

Index: sys/dev/bge/if_bge.c
===
--- sys/dev/bge/if_bge.c (revision 251021)
+++ sys/dev/bge/if_bge.c (working copy)
@@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar

     BGE_LOCK(sc);

+    if ((ifp->if_flags & IFF_UP) == 0) {
+        BGE_UNLOCK(sc);
+        return;
+    }
     if (sc->bge_flags & BGE_FLAG_TBI) {
         ifmr->ifm_status = IFM_AVALID;
         ifmr->ifm_active = IFM_ETHER;
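[Editorial sketch] The shape of the patch above (take the lock, return early if the interface is administratively down, release the lock on every path) can be modelled in plain C. This is only an illustration under stated assumptions: `struct sk_softc`, `sk_lock`, `sk_unlock`, `sk_poll_phy` and `sk_media_status` are hypothetical stand-ins, not the real bge(4) definitions, and the "lock" is just a counter so the balance can be checked. `SK_IFF_UP` uses the same value as `IFF_UP` in FreeBSD's `<net/if.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_IFF_UP 0x1u /* administrative "up" bit (value of IFF_UP) */

/* Hypothetical stand-in for the driver's per-device state. */
struct sk_softc {
    unsigned if_flags; /* interface flags */
    int lock_depth;    /* models BGE_LOCK / BGE_UNLOCK nesting */
    int phy_polls;     /* number of times the PHY was touched */
};

static void sk_lock(struct sk_softc *sc)   { sc->lock_depth++; }
static void sk_unlock(struct sk_softc *sc) { sc->lock_depth--; }

/* Pretend to read link state from the PHY; only counts the access. */
static void sk_poll_phy(struct sk_softc *sc) { sc->phy_polls++; }

/*
 * Media status query with the IFF_UP guard.
 * Returns true if the PHY was polled, false if the query was skipped.
 */
static bool
sk_media_status(struct sk_softc *sc)
{
    assert(sc != NULL);
    sk_lock(sc);
    if ((sc->if_flags & SK_IFF_UP) == 0) {
        sk_unlock(sc); /* early return must still drop the lock */
        return false;
    }
    sk_poll_phy(sc);
    sk_unlock(sc);
    return true;
}
```

The point of the guard is that a status query from userland no longer touches the hardware of an interface that was never brought up, which is what appeared to generate the spurious link events in this thread.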
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Tue, May 28, 2013 at 10:57:22AM +0300, Daniel Braniss wrote:
>
> [...]
> > 1. r248226 in head was MFC'd to stable/9 as r248858. Validation:
> >
> > http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log
> >
> > So the answer: whether or not you have that MFC in stable/9 depends on
> > what SVN rev your kernel is.
>
> I do a svnsync then I convert to mercurial so from the svn logs I see that
> the highest rev number is 250960.
>
> [...]
> >
> > That "piggybacking" crap never should have been invented. All it has
> > done is cause problems for every OS I know of (including Windows) since
> > its inception, and is also exactly why today almost all vendors I've
> > seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
> > It's admission the "piggybacking" method doesn't work. And may it rot
> > in hell for all I care, while simultaneously feeling very sorry for
> > those who have to suffer/deal with it.
> >
> > This is just another reason why I've always been very picky about what
> > hardware I'd buy for server deployments. Vendors never actually
> > disclose this crap until you've shelled out money for the hardware, by
> > which point it's too late and you're suffering. Really great model --
> > for the pocketbook. :/
> >
>
> I couldn't agree more!
>
> [...]
>
> in the case of the SunFire X2200, it has 4 bge ports, the
> 2nd, bge1, is only used by the ilo, it's not enabled (UP'ed),
> it doesn't have an interrupt assigned, it's, as far as I can tell,
> just annoying to have the DOWN/UP messages - unless something more sinister
> is lurking.

Does output from "ps -auxH | grep kernel/bge" show anything for bge1?
What about "vmstat -i -a" (you might be surprised about the -a flag and
what shows up compared to just using -i). Gut feeling says it will show
up there. (See vmstat(8) for what -a does)

Possibly interrupt generation isn't what's "triggering" the bge(4)
device to see link going up/down; maybe this is done via some memory
mapped I/O, which would explain why "vmstat -i" shows nothing for bge1
(no interrupts ever generated). That doesn't change the fact that the
driver still is being told via some means that link is going up/down.

Just a general FYI (probably not relevant here too much, but I often
have to point it out for younger SAs (not saying anyone here is one, but
the list is archived...)): there is a very distinct difference between a
link being physically up/down vs. administratively up/down. With *IX
ifconfig, the social assumption is that there's a 1:1 correlation
between those (especially with Ethernet devices), when in reality it
depends on the device driver and all subsystems in between.

I remember quite clearly on some OSes (can't remember if BSD or Linux or
Solaris) where "ifconfig xxx down" on certain devices would still result
in packets being passed across xxx. This used to shock me when I was
younger, but nowadays doesn't because I have a better understanding of
why.

ifconfig is just a generic tool that interfaces with a lot of things and
tries to do too much, in my opinion. On BSD we tend to cram as much
crap into ifconfig as humanly possible, while on other OSes separate
per-device tools/utilities have been developed to segregate the intended
behaviours/desires.

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
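[Editorial sketch] The distinction between administratively up and physically up can be seen in the ifconfig output quoted elsewhere in this thread: `flags=8802` together with `status: active`. The sketch below decodes that value. The `SK_` constants are local copies for illustration, using the values FreeBSD's `<net/if.h>` gives to `IFF_UP`, `IFF_BROADCAST`, `IFF_SIMPLEX` and `IFF_MULTICAST`; they should be checked against the header on the system in question. `struct sk_if_state` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of interface flag bits, for illustration only. */
#define SK_IFF_UP        0x1u    /* administratively up */
#define SK_IFF_BROADCAST 0x2u
#define SK_IFF_SIMPLEX   0x800u
#define SK_IFF_MULTICAST 0x8000u

/* What "ifconfig bge1" reports: the flag word and the media status. */
struct sk_if_state {
    unsigned flags;    /* the flags= value */
    bool media_active; /* true when the "status:" line says active */
};

/* Administrative state: did someone run "ifconfig xxx up"? */
static bool
sk_admin_up(const struct sk_if_state *s)
{
    return (s->flags & SK_IFF_UP) != 0;
}

/* Link reported as present although the interface was never brought up. */
static bool
sk_link_without_admin_up(const struct sk_if_state *s)
{
    return s->media_active && !sk_admin_up(s);
}
```

With `flags = 0x8802` and `media_active = true`, `sk_admin_up()` is false while `sk_link_without_admin_up()` is true: the port has link (here presumably because the ILO is using it) even though the OS never marked it UP.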
Re: SunFire X2200 ilo's bge1 DOWN/UP
[...]
> 1. r248226 in head was MFC'd to stable/9 as r248858. Validation:
>
> http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log
>
> So the answer: whether or not you have that MFC in stable/9 depends on
> what SVN rev your kernel is.

I do a svnsync then I convert to mercurial so from the svn logs I see that
the highest rev number is 250960.

[...]
>
> That "piggybacking" crap never should have been invented. All it has
> done is cause problems for every OS I know of (including Windows) since
> its inception, and is also exactly why today almost all vendors I've
> seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
> It's admission the "piggybacking" method doesn't work. And may it rot
> in hell for all I care, while simultaneously feeling very sorry for
> those who have to suffer/deal with it.
>
> This is just another reason why I've always been very picky about what
> hardware I'd buy for server deployments. Vendors never actually
> disclose this crap until you've shelled out money for the hardware, by
> which point it's too late and you're suffering. Really great model --
> for the pocketbook. :/
>

I couldn't agree more!

[...]

in the case of the SunFire X2200, it has 4 bge ports, the
2nd, bge1, is only used by the ilo, it's not enabled (UP'ed),
it doesn't have an interrupt assigned, it's, as far as I can tell,
just annoying to have the DOWN/UP messages - unless something more sinister
is lurking.

thanks,
danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Mon, May 27, 2013 at 11:49:31PM -0700, Jeremy Chadwick wrote:
> Other question: is there any correlation between the amount of time that
> goes by between events with, say, ARP/MAC address expiry in "arp -a"? I
> mention this because I know some of the ASF methods have historically
> shown two MAC addresses on the same physif, and I can see how this might
> confuse some stacks.

Never mind -- I thought about this more, and it's irrelevant.

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: SunFire X2200 ilo's bge1 DOWN/UP
> On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote: > > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote: > > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > > > > > hi, after upgrading to 9.1-stable, this particular hardware - > > > > > > SunFire X2200, > > > > > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > > > > > > > > > > > > > bge0: > > > 0x009003> mem > > > > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 > > > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > > miibus2: on bge0 > > > > brgphy0: PHY 1 on miibus2 > > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd > > > > bge1: > > > 0x009003> mem > > > > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 > > > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > > miibus3: on bge1 > > > > brgphy1: PHY 1 on miibus3 > > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > > bge1: Ethernet address: 00:1b:24:5d:5b:be > > > > > > > > sf-10> ifconfig bge1 > > > > bge1: flags=8802 metric 0 mtu 1500 > > > > > > > > options=8009b > > > TE> > > > > ether 00:1b:24:5d:5b:be > > > > nd6 options=21 > > > > media: Ethernet autoselect (100baseTX ) > > > > status: active > > > > > > > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events. > > > Do you have some network script run by cron? > > > > no scripts. > > this port is shared with the ILO/IPMI, and back in March you fixed a problem > > that it was hanging soon after it was initialized by the driver, > > (r248226 - but I'm not sure if it was ever MFC'ed). > > It was MFCed. 
> >
> > Initially I thought it could be caused by connections to it from other
> > hosts (either via the web, or ssh) so I killed them, but it didn't help.
> > without that patch the connection fails, and I don't see any DOWN/UP.
>
> Could you check how many interrupts you get from bge1?
> Ideally you shouldn't get any interrupts for bge1.

it's not even mentioned :-)

sf-04> vmstat -i
interrupt                          total       rate
irq3: uart1                          964          0
irq4: uart0                            6          0
irq14: ata0                       227354          0
irq17: bge0                      1021981          2
irq21: ohci0                          28          0
irq22: ehci0                           2          0
irq23: atapci1                    293228          0
cpu0:timer                     383244076       1124
cpu1:timer                       2225144          6
cpu2:timer                       2056087          6
cpu3:timer                       2093943          6
Total                          391162813       1147

> > > > > > is toggling bge1 DOWN/UP every few hours, this port is being used
> > > > > > by the ILO.
> > > > > > To check, I upgraded another identical host, and the same problem
> > > > > > appears.
> > > > >
> > > > > What is the last known working revision?
> > > >
> > > > I have no idea, but I have older versions, and I'll start from the
> > > > oldest
> > > > (9.1-prerelease), but
> > > > it will take time, since it takes hours till it happens.
> > > >
> > >
> > > ok.
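The interrupt check requested in this message can be scripted. What follows is only a sketch, not something posted in the thread: `irq_total` is a made-up helper name, and it assumes the `vmstat -i` layout seen here, with the device name after the `irqNN:` label and the total in the second-to-last column. It prints 0 when the device has no line at all, which is the "not even mentioned" case for bge1.

```sh
#!/bin/sh
# irq_total DEV: read `vmstat -i` output on stdin and print the interrupt
# total recorded for DEV, or 0 if DEV does not appear in the table.
# Hypothetical helper; assumes lines shaped like "irq17: bge0  1021981  2".
irq_total() {
    awk -v dev="$1" '
    {
        # Device names sit between the "irqNN:" label and the two numeric
        # columns (total, rate), so only fields 2..NF-2 are checked.
        for (i = 2; i <= NF - 2; i++)
            if ($i == dev) total = $(NF - 1)
    }
    END { print total + 0 }'
}
```

Usage on the affected host would be along the lines of `vmstat -i | irq_total bge1`. A non-zero result would mean the host driver is servicing interrupts for a port that is supposed to be down.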
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote: > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote: > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire > > > > > X2200, > > > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > > > > > > > > > > bge0: > > 0x009003> mem > > > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 > > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > miibus2: on bge0 > > > brgphy0: PHY 1 on miibus2 > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd > > > bge1: > > 0x009003> mem > > > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 > > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > miibus3: on bge1 > > > brgphy1: PHY 1 on miibus3 > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > bge1: Ethernet address: 00:1b:24:5d:5b:be > > > > > > sf-10> ifconfig bge1 > > > bge1: flags=8802 metric 0 mtu 1500 > > > > > > options=8009b > > TE> > > > ether 00:1b:24:5d:5b:be > > > nd6 options=21 > > > media: Ethernet autoselect (100baseTX ) > > > status: active > > > > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events. > > Do you have some network script run by cron? > > no scripts. > this port is shared with the ILO/IPMI, and back in March you fixed a problem > that it was hanging soon after it was initialized by the driver, > (r248226 - but I'm not sure if it was ever MFC'ed). > Initialy I thought it could be caused by connections to it from other > hosts (either via the web, or ssh) so I killed them, but it didn't help. 
> without that patch the connection fails, and I don't see any DOWN/UP.

Two things:

1. r248226 in head was MFC'd to stable/9 as r248858. Validation:

   http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log

   So the answer: whether or not you have that MFC in stable/9 depends on
   what SVN rev your kernel is.

2. Is there some way to verify that the ASF/iLO/IPMI bits (i.e. the IPMI
   firmware itself) are not shutting down bge1's PHY intentionally? Unless
   the IPMI module chooses to log something useful (e.g. "I'm doing this"),
   I'm not sure how you'd figure that out.

Other question: is there any correlation between the amount of time that
goes by between events with, say, ARP/MAC address expiry in "arp -a"? I
mention this because I know some of the ASF methods have historically
shown two MAC addresses on the same physif, and I can see how this might
confuse some stacks.

That "piggybacking" crap never should have been invented. All it has
done is cause problems for every OS I know of (including Windows) since
its inception, and is also exactly why today almost all vendors I've
seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
It's an admission that the "piggybacking" method doesn't work. And may it
rot in hell for all I care, while simultaneously feeling very sorry for
those who have to suffer/deal with it.

This is just another reason why I've always been very picky about what
hardware I'd buy for server deployments. Vendors never actually
disclose this crap until you've shelled out money for the hardware, by
which point it's too late and you're suffering. Really great model --
for the pocketbook. :/

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
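Point 1 above comes down to comparing the kernel's SVN revision with r248858. A rough way to script that comparison is sketched here, as an illustration rather than anything from the thread. `has_revision` is a hypothetical helper, and it assumes the kernel ident string carries an `rNNNNNN:` tag, as `uname -v` typically does for kernels built from an SVN checkout. A kernel built from a converted tree (hg, for example) may carry no such tag, in which case the function simply fails.

```sh
#!/bin/sh
# has_revision MIN: read a kernel ident string (e.g. `uname -v` output) on
# stdin and succeed only if it names an SVN revision >= MIN.
# Hypothetical helper; assumes the ident contains " rNNNNNN" followed by
# ":", "M" (modified tree) or a space.
has_revision() {
    rev=$(sed -n 's/.* r\([0-9][0-9]*\)[:M ].*/\1/p' | head -n 1)
    [ -n "$rev" ] && [ "$rev" -ge "$1" ]
}
```

For example, `uname -v | has_revision 248858 && echo "bge MFC r248858 should be included"`. The comparison is only meaningful for a stable/9 kernel, since r248858 is a stable/9 commit and the numbers are repository-wide.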
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote: > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote: > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire > > > > > X2200, > > > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > > > > > > > > > > bge0: > > 0x009003> mem > > > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 > > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > miibus2: on bge0 > > > brgphy0: PHY 1 on miibus2 > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd > > > bge1: > > 0x009003> mem > > > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 > > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > miibus3: on bge1 > > > brgphy1: PHY 1 on miibus3 > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > bge1: Ethernet address: 00:1b:24:5d:5b:be > > > > > > sf-10> ifconfig bge1 > > > bge1: flags=8802 metric 0 mtu 1500 > > > > > > options=8009b > > TE> > > > ether 00:1b:24:5d:5b:be > > > nd6 options=21 > > > media: Ethernet autoselect (100baseTX ) > > > status: active > > > > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events. > > Do you have some network script run by cron? > > no scripts. > this port is shared with the ILO/IPMI, and back in March you fixed a problem > that it was hanging soon after it was initialized by the driver, > (r248226 - but I'm not sure if it was ever MFC'ed). It was MFCed. > Initialy I thought it could be caused by connections to it from other > hosts (either via the web, or ssh) so I killed them, but it didn't help. 
> without that patch the connection fails, and I don't see any DOWN/UP.

Could you check how many interrupts you get from bge1?
Ideally you shouldn't get any interrupts for bge1.

> > > > > is toggling bge1 DOWN/UP every few hours, this port is being used by
> > > > > the ILO.
> > > > > To check, I upgraded another identical host, and the same problem
> > > > > appears.
> > > >
> > > > What is the last known working revision?
> > >
> > > I have no idea, but I have older versions, and I'll start from the oldest
> > > (9.1-prerelease), but
> > > it will take time, since it takes hours till it happens.
> > >
> >
> > ok.
Re: SunFire X2200 ilo's bge1 DOWN/UP
...
> There are ways you can speed up the replication time. I tend to flood a
> server with TCP while I've heard of it happening under UDP flood too.
>
> Here's a nice way to flood a server with TCP (assuming you have SSH access
> to the system via keys):
>
> sh -c 'while :;do dd if=/dev/urandom of=/dev/stdout bs=1m count=1024 | ssh HOST2KILL /sbin/md5; done'
>
> Run that about 16 times in separate screen sessions from various other
> hosts on your network, taking care to replace "HOST2KILL" with the hostname
> or IP of the box with the SunFire X2200.
>
> Let that run for a while, and then when you think you've had a reset (if
> you weren't standing there watching for one)…
>
> grep 'bge.*DOWN' /var/log/messages
>
> On a system that has booted and stayed up-and-running, there shouldn't be
> any messages like this:
>
> bge0: link state changed to DOWN
>
> When you actually get this message (if your experience is like ours),
> you'll be down for 90 seconds while the NIC resets.
>
> However, since you say you have some older 9.1 releases… I'd start by
> first trying to bring the replication time of the problem down by using
> TCP and/or UDP floods. That way you'll be able to test for resolution of
> the problem as you progress up to stable/9 (where the problem should be
> fixed by the aforementioned SVN revisions -- specific to your hardware).
...
> any ideas?
>
>
> Well, you say the connection is OK… so it doesn't sound like a full reset
> as it was in our case (we have a different chipset).
>
> But I agree that a log full of those would be annoying.
>
> Try getting up to stable/9 in its current state (note: stable/8 also has
> all the aforementioned revisions too).
> --
> Devin

Hi Devin,
the kernel is pretty new, actually last Friday's, and the svn says it's r250960.
the bge1 port is not UP, it's shared with the onboard BMC/ILO/IPMI thingy.
connecting to it via ssh gets me into its ILO manager:
...
Sun(TM) Embedded Lights Out Manager
Copyright 2004-2006 Sun Microsystems, Inc. All rights reserved.
Version 3.23
...
and so typing
start AgentInfo/console
I can get to the 'serial' console.

cheers, and thanks,
danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
> On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote: > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire > > > > X2200, > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > > > > > > > bge0: > > mem > > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > miibus2: on bge0 > > brgphy0: PHY 1 on miibus2 > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > bge0: Ethernet address: 00:1b:24:5d:5b:bd > > bge1: > > mem > > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > miibus3: on bge1 > > brgphy1: PHY 1 on miibus3 > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > bge1: Ethernet address: 00:1b:24:5d:5b:be > > > > sf-10> ifconfig bge1 > > bge1: flags=8802 metric 0 mtu 1500 > > > > options=8009b > TE> > > ether 00:1b:24:5d:5b:be > > nd6 options=21 > > media: Ethernet autoselect (100baseTX ) > > status: active > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events. > Do you have some network script run by cron? no scripts. this port is shared with the ILO/IPMI, and back in March you fixed a problem that it was hanging soon after it was initialized by the driver, (r248226 - but I'm not sure if it was ever MFC'ed). Initialy I thought it could be caused by connections to it from other hosts (either via the web, or ssh) so I killed them, but it didn't help. without that patch the connection fails, and I don't see any DOWN/UP. > > > > > is toggeling bge1 DOWN/UP every few hours, this port is being used by > > > > the ILO. 
> > > > To check, I upgraded another identical host, and the same problem
> > > > appears.
> > >
> > > What is the last known working revision?
> >
> > I have no idea, but I have older versions, and I'll start from the oldest
> > (9.1-prerelease), but
> > it will take time, since it takes hours till it happens.
> >
>
> ok.
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote: > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire > > > X2200, > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > > > > bge0: > mem > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > miibus2: on bge0 > brgphy0: PHY 1 on miibus2 > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > bge0: Ethernet address: 00:1b:24:5d:5b:bd > bge1: > mem > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > miibus3: on bge1 > brgphy1: PHY 1 on miibus3 > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > bge1: Ethernet address: 00:1b:24:5d:5b:be > > sf-10> ifconfig bge1 > bge1: flags=8802 metric 0 mtu 1500 > > options=8009b TE> > ether 00:1b:24:5d:5b:be > nd6 options=21 > media: Ethernet autoselect (100baseTX ) > status: active > Because bge1 is not UP, I wonder how you get link UP/DOWN events. Do you have some network script run by cron? > > > is toggeling bge1 DOWN/UP every few hours, this port is being used by the > > > ILO. > > > To check, I upgraded another identical host, and the same problem > > > appears. > > > > What is the last known working revision? > > I have no idea, but I have older versions, and ill start from the oldets > (9.1-prerelease), but > it will take time, since it takes hours till it happens. > ok. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SunFire X2200 ilo's bge1 DOWN/UP
On May 27, 2013, at 12:59 AM, Daniel Braniss wrote: On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200, If you're truly running stable/9, and it's up-to-date, you should have have already SVN revisions 248858 and 250650. Both of which have significant impact for (a) the SunFire X2200 (r248858) and (b) the DOWN/UP problem (r250650). Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. bge0: mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz miibus2: on bge0 brgphy0: PHY 1 on miibus2 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow bge0: Ethernet address: 00:1b:24:5d:5b:bd bge1: mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz miibus3: on bge1 brgphy1: PHY 1 on miibus3 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow bge1: Ethernet address: 00:1b:24:5d:5b:be sf-10> ifconfig bge1 bge1: flags=8802 metric 0 mtu 1500 options=8009b ether 00:1b:24:5d:5b:be nd6 options=21 media: Ethernet autoselect (100baseTX ) status: active Saw similar things happening over here with different broadcom chipset, and the above revisions helped significantly (URLs below): http://svnweb.freebsd.org/base?view=revision&revision=248858 http://svnweb.freebsd.org/base?view=revision&revision=250650 is toggeling bge1 DOWN/UP every few hours, this port is being used by the ILO. To check, I upgraded another identical host, and the same problem appears. What is the last known working revision? I have no idea, but I have older versions, and ill start from the oldets (9.1-prerelease), but it will take time, since it takes hours till it happens. 
There are ways you can speed up the replication time. I tend to flood a server with TCP while I've heard of it happening under UDP flood too. Here's a nice way to flood a server with TCP (assuming you have SSH access to the system via keys): sh -c 'while :;do dd if=/dev/urandom of=/dev/stdout bs=1m count=1024 | ssh HOST2KILL /sbin/md5; done' Run that about 16 times in separate screen sessions from various other hosts on your network, taking care to replace "HOST2KILL" with the hostname or IP of the box with the SunFire X2200. Let that run for a while, and then when you think you've had a reset (if you weren't standing there watching for one)… grep 'bge.*DOWN' /var/log/messages On a system that has booted and stayed up-and-running, there shouldn't be any messages like this: bge0: link state changed to DOWN When you actually get this message (if your experience is like ours), you'll be down for 90 seconds while the NIC resets. However, since you say you have some older 9.1 releases… I'd start by first trying to bring the replication time of the problem down by using TCP and/or UDP floods. That way you'll be able to test for resolution of the problem as you progress up to stable/9 (where the problem should be fixed by the aforementioned SVN revisions -- specific to your hardware). There is not correlation with time, since they happend at totaly different times. I rebooted both hosts at almost the same time. 
one host :
uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP

and
uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP

this is not serious, the ilo (ssh) connection is ok, but it's annoying, we have more than 10 of these hosts, and if I upgrade all of them, the logs will fill up with this :-)

any ideas?

Well, you say the connection is OK… so it doesn't sound like a full reset as it was in our case (we have a different chipset).

But I agree that a log full of those would be annoying.

Try getting up to stable/9 in its current state (note: stable/8 also has all the aforementioned revisions).
--
Devin
Re: SunFire X2200 ilo's bge1 DOWN/UP
> On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200, > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > bge0: mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz miibus2: on bge0 brgphy0: PHY 1 on miibus2 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow bge0: Ethernet address: 00:1b:24:5d:5b:bd bge1: mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz miibus3: on bge1 brgphy1: PHY 1 on miibus3 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow bge1: Ethernet address: 00:1b:24:5d:5b:be sf-10> ifconfig bge1 bge1: flags=8802 metric 0 mtu 1500 options=8009b ether 00:1b:24:5d:5b:be nd6 options=21 media: Ethernet autoselect (100baseTX ) status: active > > is toggeling bge1 DOWN/UP every few hours, this port is being used by the > > ILO. > > To check, I upgraded another identical host, and the same problem appears. > > What is the last known working revision? I have no idea, but I have older versions, and ill start from the oldets (9.1-prerelease), but it will take time, since it takes hours till it happens. > > > There > > is not correlation with time, since they happend at totaly different times. > > I rebooted both hosts at almost the same time. 
> > one host :
> > uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
> > May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
> > May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
> > May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
> > May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP
> >
> > and
> > uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
> >
> > May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
> > May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP
> >
> > this is not serious, the ilo (ssh) connection is ok, but it's annoying, we
> > have more than 10 of these hosts, and if I upgrade all of them, the logs
> > will fill up with this :-)
> >
> > any ideas?
> >
> > cheers,
> > danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,

Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.

> is toggling bge1 DOWN/UP every few hours, this port is being used by the ILO.
> To check, I upgraded another identical host, and the same problem appears.

What is the last known working revision?

> There
> is no correlation with time, since they happened at totally different times.
> I rebooted both hosts at almost the same time.
> one host :
> uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
> May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
> May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
> May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
> May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP
>
> and
> uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
>
> May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
> May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP
>
> this is not serious, the ilo (ssh) connection is ok, but it's annoying, we
> have more than 10 of these hosts, and if I upgrade all of them, the logs
> will fill up with this :-)
>
> any ideas?
>
> cheers,
> danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
> Is this your bug?
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=171121

no, this bge is only used for the ilo, it happens even if it's idling, i.e. with no
active connection.
it is also very erratic: it happens at random intervals, from 3 to 10 hours, and
the down/up 'hiccup' lasts between less than a second and about 3 seconds.

thanks,
danny
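The flap lengths described in this message can be read straight out of the syslog lines quoted elsewhere in the thread. The sketch below is illustrative only: `flap_durations` is a made-up name, and it assumes the default syslog layout `Mon DD HH:MM:SS host kernel: ifN: link state changed to DOWN|UP`. It pairs each DOWN with the next UP for the same host and interface and prints the gap in seconds. Syslog timestamps carry no year, so the script only corrects for a flap that crosses a single midnight.

```sh
#!/bin/sh
# flap_durations: read syslog lines on stdin and, for every DOWN followed by
# an UP on the same host/interface, print how long the link was reported down.
# Hypothetical helper; assumes "Mon DD HH:MM:SS host kernel: ifN: link state
# changed to DOWN|UP" as seen in /var/log/messages on the affected hosts.
flap_durations() {
    awk '
    / link state changed to (DOWN|UP)$/ {
        split($3, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        ifname = $6
        sub(/:$/, "", ifname)
        key = $4 " " ifname                    # host + interface
        if ($NF == "DOWN") {
            down_at[key] = now
            stamp[key] = $1 " " $2 " " $3
        } else if (key in down_at) {
            secs = now - down_at[key]
            if (secs < 0) secs += 86400        # flap crossed midnight
            printf "%s %s %s down for %ds\n", stamp[key], $4, ifname, secs
            delete down_at[key]
        }
    }'
}
```

Run as `flap_durations < /var/log/messages`. On the log lines posted at the start of the thread it reports 3 seconds for both sf-04 events and 0 seconds for the sf-10 event.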
Re: SunFire X2200 ilo's bge1 DOWN/UP
> Daniel Braniss wrote:
> > hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
> > is toggling bge1 DOWN/UP every few hours, this port is being used by the
> > ILO.
> > To check, I upgraded another identical host, and the same problem appears.
> > There
> > is no correlation with time, since they happened at totally different times.
> > I rebooted both hosts at almost the same time.
> > one host :
> > uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
> > May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
> > May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
> > May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
> > May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP
> >
> > and
> > uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
> >
> > May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
> > May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP
> >
> > this is not serious, the ilo (ssh) connection is ok, but it's annoying, we
> > have more than 10 of these hosts, and if I upgrade all of them, the logs
> > will fill up with this :-)
>
> What revision are you running?
>
Friday morning's, probably r250960 (I run svnsync then convert to hg :-)

> There was a problem report in February
> http://lists.freebsd.org/pipermail/freebsd-net/2013-February/034715.html
> http://lists.freebsd.org/pipermail/freebsd-net/2013-March/034778.html
>
> I provided access to Yongari to our Sun Fire X2100 M2 (bge 5715C) and he
> fixed the problem. (in revision r248226, I don't know if it was MFCed)
>
> http://lists.freebsd.org/pipermail/freebsd-net/2013-March/034922.html
>
> Miroslav Lachman

well, it seems different, it just goes down, then a few secs later goes up.

thanks,
danny
Re: SunFire X2200 ilo's bge1 DOWN/UP
Daniel Braniss wrote:
> hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
> is toggling bge1 DOWN/UP every few hours, this port is being used by the ILO.
> To check, I upgraded another identical host, and the same problem appears.
> There
> is no correlation with time, since they happened at totally different times.
> I rebooted both hosts at almost the same time.
> one host :
> uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
> May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
> May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
> May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
> May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP
>
> and
> uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
>
> May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
> May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP
>
> this is not serious, the ilo (ssh) connection is ok, but it's annoying, we
> have more than 10 of these hosts, and if I upgrade all of them, the logs
> will fill up with this :-)

What revision are you running?

There was a problem report in February
http://lists.freebsd.org/pipermail/freebsd-net/2013-February/034715.html
http://lists.freebsd.org/pipermail/freebsd-net/2013-March/034778.html

I provided access to Yongari to our Sun Fire X2100 M2 (bge 5715C) and he
fixed the problem. (in revision r248226, I don't know if it was MFCed)

http://lists.freebsd.org/pipermail/freebsd-net/2013-March/034922.html

Miroslav Lachman
Re: SunFire X2200 ilo's bge1 DOWN/UP
Is this your bug?

http://www.freebsd.org/cgi/query-pr.cgi?pr=171121
SunFire X2200 ilo's bge1 DOWN/UP
hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
is toggling bge1 DOWN/UP every few hours, this port is being used by the ILO.
To check, I upgraded another identical host, and the same problem appears.
There is no correlation with time, since they happened at totally different times.
I rebooted both hosts at almost the same time.

one host :
uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP

and
uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00

May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP

this is not serious, the ilo (ssh) connection is ok, but it's annoying, we have
more than 10 of these hosts, and if I upgrade all of them, the logs will fill up
with this :-)

any ideas?

cheers,
danny