Re: sunfire v120 gem interfaces
On Fri, Nov 13, 2015 at 12:36:40PM +1000, David Gwynne wrote: > > > On 13 Nov 2015, at 12:16, Ryan Freeman wrote: > > > > On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote: > >> any joy? i mean, failure? > > > > Well I got something different. I've noticed the failures only seem to > > happen > > when my roommates arrive home. I can use my stuff remotely all day from > > work > > without a hitch, roommates come home and usually within an hr there is an > > internet complaint. > > > > Since I started using the little scripts to detect connection failure > > and down/up the iface in question, things had been pretty good simply in the > > fact that nobody could really notice before it fixed itself. > > > > Today the machine dropped to ddb>! of course i couldn't remember a damn > > thing to type :( i got trace, terribly sorry it wasn't more... > > > > ddb> trace > > extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at > > extent_free > > +0x174 > > iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at > > iommu_dvmamap_unl > > oad+0x74 > > gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, > > 80 > > 00) at gem_rint+0x160 > > gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154 > > intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc > > sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at > > sparc > > _interrupt+0x298 > > gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, > > 40009b73c10) a > > t gem_ioctl+0x19c > > ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c > > sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190 > > syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4 > > softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c > > ddb> > > that is interesting. if you're still in ddb, can you go sh panic? > > if not, not biggy. > > my gut feeling is our ring accounting is wonky. 
mpi@ and jmatthew@ have
> tweaks to gem(4) for mpsafety which might fix this. ill poke them to see if
> they would share.

I scraped some more stuff from another panic, not running w/ the jmatthew
patch yet though...

Connected to /dev/cuaU0 (speed 9600)

ddb> trace
extent_free(400012600c0, 0, 0, 0, 1fef078, 86fc) at extent_free+0x174
iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
gem_intr(400014ac000, c005, 2000, 0, 0, 8000) at gem_intr+0x154
intr_handler(e0017ec8, 4000117ae00, 1b5e78e1, 0, 800, 2) at intr_handler+0xc
sparc_interrupt(0, 400014b, 80206910, 40017d87c60, 40009f34cb0, 0) at sparc_interrupt+0x298
gem_ioctl(400014ac048, 400014ac000, 40017d87c60, 40017d87c60, 0, 400096ca950) at gem_ioctl+0x19c
ifioctl(0, 80206910, 40017d87c60, 400096ca950, 1012d74, 0) at ifioctl+0x38c
sys_ioctl(0, 40017d87db8, 40017d87df8, 0, 0, 14b) at sys_ioctl+0x190
syscall(40017d87ed0, 436, 198ac20888, 198ac2088c, 0, 0) at syscall+0x3c4
softtrap(3, 80206910, fffd8138, 0, 0, 1ff7fff6df8) at softtrap+0x19c
ddb> sh panic
extent_free: extent `psycho0 dvma', region not within extent
ddb> ps
   PID   PPID   PGRP   UID  S  FLAGS  WAIT    COMMAND
*22395   2599  32097     0  7    0x2          ifconfig
  2599  32097  32097     0  3   0x8a  pause   sh
 32097   1585  32097     0  3   0x8a  pause   sh
  1585  27132  27132     0  3   0x80  piperd  cron
 21846      1  21846    77  2   0x90          dhclient
 13160      1  13160     0  3   0x80  poll    dhclient
  5578   7747   5578  1000  3   0x83  ttyin   ksh
  7747  16002  16002  1000  3   0x90  select  sshd
 16002  28625  16002     0  3   0x92  poll    sshd
  4106    195    195     0  3   0x83  poll    pftop
   195  24715    195  1000  3   0x8b  pause   ksh
 24715   5976   5976  1000  3   0x90  select  sshd
  5976  28625   5976     0  3   0x92  poll    sshd
 28625      1  28625     0  3   0x80  select  sshd
 29463  19386  29463  1000  3   0x83  kqread  tail
 19386  24409  19386  1000  3   0x8b  pause   ksh
 24409   7564   7564  1000  3   0x90  select  sshd
  7564      1   7564     0  3   0x92  poll    sshd
Re: sunfire v120 gem interfaces
On Fri, Nov 13, 2015 at 12:36:40PM +1000, David Gwynne wrote: > > > On 13 Nov 2015, at 12:16, Ryan Freeman wrote: > > > > On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote: > >> any joy? i mean, failure? > > > > Well I got something different. I've noticed the failures only seem to > > happen > > when my roommates arrive home. I can use my stuff remotely all day from > > work > > without a hitch, roommates come home and usually within an hr there is an > > internet complaint. > > > > Since I started using the little scripts to detect connection failure > > and down/up the iface in question, things had been pretty good simply in the > > fact that nobody could really notice before it fixed itself. > > > > Today the machine dropped to ddb>! of course i couldn't remember a damn > > thing to type :( i got trace, terribly sorry it wasn't more... > > > > ddb> trace > > extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at > > extent_free > > +0x174 > > iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at > > iommu_dvmamap_unl > > oad+0x74 > > gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, > > 80 > > 00) at gem_rint+0x160 > > gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154 > > intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc > > sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at > > sparc > > _interrupt+0x298 > > gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, > > 40009b73c10) a > > t gem_ioctl+0x19c > > ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c > > sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190 > > syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4 > > softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c > > ddb> > > that is interesting. if you're still in ddb, can you go sh panic? > > if not, not biggy. Sadly, I am not. 
as it is my router, I had to reboot to get back online to send the mail. If it triggers again I will make sure I include that. > my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have > tweaks to gem(4) for mpsafety which might fix this. ill poke them to see if > they would share. I am willing to try anything! :) I will reiterate that I am just running 5.8 stable (with mtier binpatches for errata); if it requires me to bump up to -current, no biggie :) > > dlg > > > > > > > > >> > >>> On 9 Nov 2015, at 10:40 AM, Ryan Freeman wrote: > >>> > >>> On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote: > can you get the ifconfig output when its locked up? and a copy of what > systat mb is showing? > > cheers, > dlg > >>> > >>> Thanks David, > >>> > >>> I have setup a script to try and capture this immediately when it happens. > >>> > >>> FWIW here is the output as it is now, working: > >>> > >>> 16:35 ryan@void:~$ ifconfig > >>> lo0: flags=8049 mtu 32768 > >>> priority: 0 > >>> groups: lo > >>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 > >>> inet6 ::1 prefixlen 128 > >>> inet 127.0.0.1 netmask 0xff00 > >>> gem0: flags=8867 > >>> mtu 1500 > >>> lladdr 00:03:ba:2b:47:70 > >>> priority: 0 > >>> groups: egress > >>> media: Ethernet autoselect (100baseTX full-duplex) > >>> status: active > >>> inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255 > >>> gem1: > >>> flags=8b63 > >>> mtu 1500 > >>> lladdr 00:03:ba:2b:47:71 > >>> priority: 0 > >>> media: Ethernet autoselect (100baseTX full-duplex) > >>> status: active > >>> inet 10.16.1.30 netmask 0xffe0 broadcast 10.16.1.31 > >>> inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2 > >>> inet6 2001:470:b:6cf::1 prefixlen 64 > >>> enc0: flags=0<> > >>> priority: 0 > >>> groups: enc > >>> status: active > >>> vlan100: flags=8843 mtu 1500 > >>> lladdr 00:03:ba:2b:47:71 > >>> description: servers > >>> priority: 0 > >>> vlan: 100 parent interface: gem1 > >>> groups: vlan > >>> status: active > >>> 
inet 10.21.1.30 netmask 0xffe0 broadcast 10.21.1.31 > >>> inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5 > >>> inet6 2001:470:eac8:666::1 prefixlen 64 > >>> vlan101: flags=8843 mtu 1500 > >>> lladdr 00:03:ba:2b:47:71 > >>> description: workstations > >>> priority: 0 > >>> vlan: 101 parent interface: gem1 > >>> groups: vlan > >>> status: active > >>> inet 10.21.8.254 netmask 0xff80 broadcast 10.21.8.255 > >>> inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6 > >>> inet6 2001:470:eac8:a::1 prefixlen 64 > >>> vlan102: flags=8843 mtu 1500 > >>> lladdr 00:03:ba:2b:47:71 > >>> description: wireless > >>> priority: 0 > >>>
Re: sunfire v120 gem interfaces
> On 13 Nov 2015, at 12:16, Ryan Freeman wrote:
>
> On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
>> any joy? i mean, failure?
>
> Well I got something different. I've noticed the failures only seem to happen
> when my roommates arrive home. I can use my stuff remotely all day from work
> without a hitch, roommates come home and usually within an hr there is an
> internet complaint.
>
> Since I started using the little scripts to detect connection failure
> and down/up the iface in question, things had been pretty good simply in the
> fact that nobody could really notice before it fixed itself.
>
> Today the machine dropped to ddb>! of course i couldn't remember a damn
> thing to type :( i got trace, terribly sorry it wasn't more...
>
> ddb> trace
> extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at extent_free+0x174
> iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
> gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
> gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154
> intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
> sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
> gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
> ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
> sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
> syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
> softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
> ddb>

that is interesting. if you're still in ddb, can you go sh panic?

if not, not biggy.

my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have tweaks
to gem(4) for mpsafety which might fix this. ill poke them to see if they would
share.
dlg > > > >> >>> On 9 Nov 2015, at 10:40 AM, Ryan Freeman wrote: >>> >>> On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote: can you get the ifconfig output when its locked up? and a copy of what systat mb is showing? cheers, dlg >>> >>> Thanks David, >>> >>> I have setup a script to try and capture this immediately when it happens. >>> >>> FWIW here is the output as it is now, working: >>> >>> 16:35 ryan@void:~$ ifconfig >>> lo0: flags=8049 mtu 32768 >>> priority: 0 >>> groups: lo >>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 >>> inet6 ::1 prefixlen 128 >>> inet 127.0.0.1 netmask 0xff00 >>> gem0: flags=8867 >>> mtu 1500 >>> lladdr 00:03:ba:2b:47:70 >>> priority: 0 >>> groups: egress >>> media: Ethernet autoselect (100baseTX full-duplex) >>> status: active >>> inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255 >>> gem1: >>> flags=8b63 >>> mtu 1500 >>> lladdr 00:03:ba:2b:47:71 >>> priority: 0 >>> media: Ethernet autoselect (100baseTX full-duplex) >>> status: active >>> inet 10.16.1.30 netmask 0xffe0 broadcast 10.16.1.31 >>> inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2 >>> inet6 2001:470:b:6cf::1 prefixlen 64 >>> enc0: flags=0<> >>> priority: 0 >>> groups: enc >>> status: active >>> vlan100: flags=8843 mtu 1500 >>> lladdr 00:03:ba:2b:47:71 >>> description: servers >>> priority: 0 >>> vlan: 100 parent interface: gem1 >>> groups: vlan >>> status: active >>> inet 10.21.1.30 netmask 0xffe0 broadcast 10.21.1.31 >>> inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5 >>> inet6 2001:470:eac8:666::1 prefixlen 64 >>> vlan101: flags=8843 mtu 1500 >>> lladdr 00:03:ba:2b:47:71 >>> description: workstations >>> priority: 0 >>> vlan: 101 parent interface: gem1 >>> groups: vlan >>> status: active >>> inet 10.21.8.254 netmask 0xff80 broadcast 10.21.8.255 >>> inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6 >>> inet6 2001:470:eac8:a::1 prefixlen 64 >>> vlan102: flags=8843 mtu 1500 >>> lladdr 00:03:ba:2b:47:71 >>> description: 
wireless >>> priority: 0 >>> vlan: 102 parent interface: gem1 >>> groups: vlan >>> status: active >>> inet 10.21.9.254 netmask 0xff80 broadcast 10.21.9.255 >>> inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7 >>> inet6 2001:470:eac8:b::1 prefixlen 64 >>> vlan2: flags=8843 mtu 1500 >>> lladdr 00:03:ba:2b:47:71 >>> description: transit >>> priority: 0 >>> vlan: 2 parent interface: gem1 >>> groups: vlan >>> status: active >>> inet 172.21.1.2 netmask 0xfffc broadcast 172.21.1.3 >>> tun0: flags=51 mtu 1500 >>> priority: 0 >>> groups: tun >>> status: dow
Re: sunfire v120 gem interfaces
On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> any joy? i mean, failure?

Well I got something different. I've noticed the failures only seem to happen
when my roommates arrive home. I can use my stuff remotely all day from work
without a hitch, roommates come home and usually within an hr there is an
internet complaint.

Since I started using the little scripts to detect connection failure
and down/up the iface in question, things had been pretty good simply in the
fact that nobody could really notice before it fixed itself.

Today the machine dropped to ddb>! of course i couldn't remember a damn
thing to type :( i got trace, terribly sorry it wasn't more...

ddb> trace
extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at extent_free+0x174
iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154
intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
ddb>

>
> > On 9 Nov 2015, at 10:40 AM, Ryan Freeman wrote:
> >
> > On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
> >> can you get the ifconfig output when its locked up? and a copy of what
> >> systat mb is showing?
> >>
> >> cheers,
> >> dlg
> >
> > Thanks David,
> >
> > I have setup a script to try and capture this immediately when it happens.
> > > > FWIW here is the output as it is now, working: > > > > 16:35 ryan@void:~$ ifconfig > > lo0: flags=8049 mtu 32768 > >priority: 0 > >groups: lo > >inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 > >inet6 ::1 prefixlen 128 > >inet 127.0.0.1 netmask 0xff00 > > gem0: flags=8867 > > mtu 1500 > >lladdr 00:03:ba:2b:47:70 > >priority: 0 > >groups: egress > >media: Ethernet autoselect (100baseTX full-duplex) > >status: active > >inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255 > > gem1: > > flags=8b63 > > mtu 1500 > >lladdr 00:03:ba:2b:47:71 > >priority: 0 > >media: Ethernet autoselect (100baseTX full-duplex) > >status: active > >inet 10.16.1.30 netmask 0xffe0 broadcast 10.16.1.31 > >inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2 > >inet6 2001:470:b:6cf::1 prefixlen 64 > > enc0: flags=0<> > >priority: 0 > >groups: enc > >status: active > > vlan100: flags=8843 mtu 1500 > >lladdr 00:03:ba:2b:47:71 > >description: servers > >priority: 0 > >vlan: 100 parent interface: gem1 > >groups: vlan > >status: active > >inet 10.21.1.30 netmask 0xffe0 broadcast 10.21.1.31 > >inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5 > >inet6 2001:470:eac8:666::1 prefixlen 64 > > vlan101: flags=8843 mtu 1500 > >lladdr 00:03:ba:2b:47:71 > >description: workstations > >priority: 0 > >vlan: 101 parent interface: gem1 > >groups: vlan > >status: active > >inet 10.21.8.254 netmask 0xff80 broadcast 10.21.8.255 > >inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6 > >inet6 2001:470:eac8:a::1 prefixlen 64 > > vlan102: flags=8843 mtu 1500 > >lladdr 00:03:ba:2b:47:71 > >description: wireless > >priority: 0 > >vlan: 102 parent interface: gem1 > >groups: vlan > >status: active > >inet 10.21.9.254 netmask 0xff80 broadcast 10.21.9.255 > >inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7 > >inet6 2001:470:eac8:b::1 prefixlen 64 > > vlan2: flags=8843 mtu 1500 > >lladdr 00:03:ba:2b:47:71 > >description: transit > >priority: 0 > >vlan: 2 parent interface: 
gem1 > >groups: vlan > >status: active > >inet 172.21.1.2 netmask 0xfffc broadcast 172.21.1.3 > > tun0: flags=51 mtu 1500 > >priority: 0 > >groups: tun > >status: down > >inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffc > > gif0: flags=8051 mtu 1280 > >priority: 0 > >groups: gif egress > >tunnel: inet 96.54.13.103 -> 216.218.226.238 > >inet6 fe80::203:baff:fe2b:4770%gif0 -> prefixlen 64 scopeid 0xa > >inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 12
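The "little scripts" mentioned in this message are not shown anywhere in the thread. As a rough sketch only, a watchdog along those lines might look like the following. The function names, the `DRY_RUN` switch, the `gemwatch` log tag, the 60-second interval and the choice of ping target are all assumptions made for illustration; only the `ifconfig down`/`up` and `dhclient` sequence comes from the original post.

```shell
#!/bin/sh
# Sketch of an interface watchdog (hypothetical; not the script used in the thread).

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# link_ok TARGET: true if one ICMP echo is answered within about 2 seconds.
link_ok() {
    ping -c 1 -w 2 "$1" >/dev/null 2>&1
}

# cycle_iface IFACE [dhcp]: the down/up sequence from the original post.
# dhclient is rerun only when the second argument is "dhcp" (gem0 in the thread).
cycle_iface() {
    run ifconfig "$1" down
    run ifconfig "$1" up
    if [ "${2:-}" = "dhcp" ]; then
        run dhclient "$1"
    fi
}

# check_once IFACE TARGET [dhcp]: cycle the interface only when the probe fails.
check_once() {
    if link_ok "$2"; then
        return 0
    fi
    logger -t gemwatch "no reply from $2, cycling $1" 2>/dev/null || true
    cycle_iface "$1" "${3:-}"
}

# Loop only when asked to, so the functions can be sourced and tried alone.
if [ "${1:-}" = "--run" ]; then
    while :; do
        check_once "$2" "$3" "${4:-}"
        sleep 60
    done
fi
```

Saved under an assumed name such as gemwatch.sh, it could be started as root with `sh gemwatch.sh --run gem0 <upstream address> dhcp` for the DHCP-configured external interface, or without the trailing `dhcp` for gem1. Setting `DRY_RUN=1` only prints the commands, which is a safer way to try it on a production router. Note that this only hides the symptom; it does nothing about the panic itself.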
Re: sunfire v120 gem interfaces
On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> any joy? i mean, failure?

Last night my script triggered three times, hooray ;) unfortunately my eyes
do not even notice much of a difference outside of system load values in the
systat output :(

gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255
gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255
gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255

#
   8 users    Load 0.69 0.43 0.29                     Mon Nov  9 20:31:11 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    56         129
                             2048    32        1025
lo0
gem0                         2048    18     4   124    18
gem1                         2048    12     4   124    12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0

   8 users    Load 0.44 0.39 0.29                     Mon Nov  9 20:32:11 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    52         129
                             2048    25        1025
lo0
gem0                         2048    11     4   124    11
gem1                         2048    12     4   124    12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0

   8 users    Load 0.11 0.18 0.16                     Mon Nov  9 21:54:11 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    55         129
                             2048    28        1025
lo0
gem0                         2048    18     4   124    18
gem1                         2048    10     4   124    10
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0

>
> > On 9 Nov 2015, at 10:40 AM, Ryan Freeman wrote:
> >
> > On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
> >> can you get the
Re: sunfire v120 gem interfaces
any joy? i mean, failure? > On 9 Nov 2015, at 10:40 AM, Ryan Freeman wrote: > > On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote: >> can you get the ifconfig output when its locked up? and a copy of what >> systat mb is showing? >> >> cheers, >> dlg > > Thanks David, > > I have setup a script to try and capture this immediately when it happens. > > FWIW here is the output as it is now, working: > > 16:35 ryan@void:~$ ifconfig > lo0: flags=8049 mtu 32768 >priority: 0 >groups: lo >inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 >inet6 ::1 prefixlen 128 >inet 127.0.0.1 netmask 0xff00 > gem0: flags=8867 mtu > 1500 >lladdr 00:03:ba:2b:47:70 >priority: 0 >groups: egress >media: Ethernet autoselect (100baseTX full-duplex) >status: active >inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255 > gem1: > flags=8b63 > mtu 1500 >lladdr 00:03:ba:2b:47:71 >priority: 0 >media: Ethernet autoselect (100baseTX full-duplex) >status: active >inet 10.16.1.30 netmask 0xffe0 broadcast 10.16.1.31 >inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2 >inet6 2001:470:b:6cf::1 prefixlen 64 > enc0: flags=0<> >priority: 0 >groups: enc >status: active > vlan100: flags=8843 mtu 1500 >lladdr 00:03:ba:2b:47:71 >description: servers >priority: 0 >vlan: 100 parent interface: gem1 >groups: vlan >status: active >inet 10.21.1.30 netmask 0xffe0 broadcast 10.21.1.31 >inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5 >inet6 2001:470:eac8:666::1 prefixlen 64 > vlan101: flags=8843 mtu 1500 >lladdr 00:03:ba:2b:47:71 >description: workstations >priority: 0 >vlan: 101 parent interface: gem1 >groups: vlan >status: active >inet 10.21.8.254 netmask 0xff80 broadcast 10.21.8.255 >inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6 >inet6 2001:470:eac8:a::1 prefixlen 64 > vlan102: flags=8843 mtu 1500 >lladdr 00:03:ba:2b:47:71 >description: wireless >priority: 0 >vlan: 102 parent interface: gem1 >groups: vlan >status: active >inet 10.21.9.254 netmask 0xff80 broadcast 
10.21.9.255 >inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7 >inet6 2001:470:eac8:b::1 prefixlen 64 > vlan2: flags=8843 mtu 1500 >lladdr 00:03:ba:2b:47:71 >description: transit >priority: 0 >vlan: 2 parent interface: gem1 >groups: vlan >status: active >inet 172.21.1.2 netmask 0xfffc broadcast 172.21.1.3 > tun0: flags=51 mtu 1500 >priority: 0 >groups: tun >status: down >inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffc > gif0: flags=8051 mtu 1280 >priority: 0 >groups: gif egress >tunnel: inet 96.54.13.103 -> 216.218.226.238 >inet6 fe80::203:baff:fe2b:4770%gif0 -> prefixlen 64 scopeid 0xa >inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 128 > pflow0: flags=41 mtu 1492 >priority: 0 >pflow: sender: 127.0.0.1 receiver: 127.0.0.1:9995 version: 5 >groups: pflow > pflog0: flags=141 mtu 33144 >priority: 0 >groups: pflog > > 16:36 ryan@void:~$ systat -b mb >8 usersLoad 0.21 0.25 0.26 Sun Nov 8 16:37:12 2015 > > IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM > > System0 25648 129 > > 2048241025 > > lo0 > > gem0 204811 4 12411 > > gem1 204812 4 12412 > > enc0 > > vlan100 > > vlan101 > > vlan102 > > vlan2 > > tun0 > > gif0 > > pflow0 > > pflog0 > >> >>> On 9 Nov 2015, at 09:36, Ryan Freeman wrote: >>> >>> Hey tech@, >>> >>> At my wits end here, I recently got a sunfire v120 from work for pretty >>> cheap. >>> Quite excited to have some non x86 hardware, I set it up as a
Re: sunfire v120 gem interfaces
Ryan Freeman:

> However, for some reason after sometimes mere hours -- othertimes days at a
> time, the gem0 interface needs to be cycled: [...] starting to wonder
> if this machine is at its EOL and the network ports are dying :(

I see the same problem with the gem in my Blade 150.

-- 
Christian "naddy" Weisgerber                          na...@mips.inka.de
Re: sunfire v120 gem interfaces
I had problems with my dual AC200 carp setup, in that the interfaces would
periodically stop receiving packets. Transmission still worked though, so the
carp wouldn't fail over...

Machines are retired now, but I believe details exist in the archives
somewhere. I also believe henning@ had similar issues in the past.

/Alexander

On November 9, 2015 12:36:33 AM GMT+01:00, Ryan Freeman wrote:
>Hey tech@,
>
>At my wits end here, I recently got a sunfire v120 from work for pretty cheap.
>Quite excited to have some non x86 hardware, I set it up as a router.
>
>However, for some reason after sometimes mere hours -- othertimes days at a
>time, the gem0 interface needs to be cycled:
>
>ifconfig gem0 down
>ifconfig gem0 up
>dhclient gem0
>
>no packets pass until that has been done. At first I have been placing the
>blame squarely on the Hitron modem we have in the house from shaw cable,
>but now I've noticed the issue happen twice on the internal interface as well,
>gem1. All VLANs I have setup stop responding until gem1 is cycled.
>
>gem1 is just used by a collection of vlan(4) interfaces, so traffic resumes
>immediately after interface gem1 down/up.
>
>I've tried to turn on ifconfig gem0 debug to catch anything wierd, but there
>has been nothing of interest there. Dmesg attached, starting to wonder
>if this machine is at its EOL and the network ports are dying :(
>
>This issue occurred with the 5.7 release as well.
>
>dmesg:
>console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
>Copyright (c) 1982, 1986, 1989, 1991, 1993
>        The Regents of the University of California. All rights reserved.
>Copyright (c) 1995-2015 OpenBSD. All rights reserved.
>http://www.OpenBSD.org > >OpenBSD 5.8 (GENERIC) #0: Thu Oct 22 00:24:09 PDT 2015 >r...@void.inter.lan:/usr/src/sys/arch/sparc64/compile/GENERIC >real mem = 1073741824 (1024MB) >avail mem = 1039228928 (991MB) >mpath0 at root >scsibus0 at mpath0: 256 targets >mainbus0 at root: Sun Fire V120 (UltraSPARC-IIe 648MHz) >cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 3.3) @ 648 MHz >cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K >external (64 b/l) >psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0 >psycho0: bus range 0-2, PCI bus 0 >psycho0: dvma map c000-dfff >pci0 at psycho0 >ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x13 >pci1 at ppb0 bus 1 >ebus0 at pci1 dev 12 function 0 "Sun RIO EBus" rev 0x01 >"flashprom" at ebus0 addr 0-f not configured >clock1 at ebus0 addr 0-1fff: mk48t59 >lom0 at ebus0 addr 20-23 ivec 0x2a: LOMlite2 rev 3.12 >alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz >clock >iic0 at alipm0 >"max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs >spdmem0 at iic0 addr 0x54: 512MB SDRAM registered ECC PC133CL2 >spdmem1 at iic0 addr 0x55: 512MB SDRAM registered ECC PC133CL2 >ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00 >power0 at ebus1 addr 2000-2007 ivec 0x25 >com0 at ebus1 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo >com0: console >com1 at ebus1 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo >gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7c6, >address 00:03:ba:2b:47:70 >ukphy0 at gem0 phy 1: Generic IEEE 802.3u media interface, rev. 
1: OUI >0x0010dd, model 0x0002 >ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 0x7e4, version >1.0, legacy support >pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: >DMA, channel 0 configured to native-PCI, channel 1 configured to >native-PCI >pciide0: using ivec 0x7cc for native-PCI interrupt >atapiscsi0 at pciide0 channel 0 drive 0 >scsibus1 at atapiscsi0: 2 targets >cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom >removable >cd0(pciide0:0:0): using PIO mode 4, DMA mode 2 >pciide0: channel 1 disabled (no drives) >gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7dc, >address 00:03:ba:2b:47:71 >ukphy1 at gem1 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI >0x0010dd, model 0x0002 >ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 0x7e6, version >1.0, legacy support >usb0 at ohci0: USB revision 1.0 >uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1 >usb1 at ohci1: USB revision 1.0 >uhub1 at usb1 "Sun OHCI root hub" rev 1.00/1.00 addr 1 >ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x13 >pci2 at ppb1 bus 2 >siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec >0x7e0, using 8K of on-board RAM >scsibus2 at siop0: 16 targets, initiator 7 >sym0 at scsibus2 targ 0 lun 0: SCSI3 >0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN8731804D9 >sd0 at scsibus0 targ 0 lun 0: SCSI3 >0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN8731804D9 >sd0: 34732MB, 512 bytes/sector, 71132959 sectors >probe(siop0:1:0): Check Condition (error 0x70) on opcode 0x0 >SENSE KEY: Hardware Error > ASC/ASCQ: Defect List Error > FRU CODE: 0x7 >sym1 at scsibus2 targ 1 lun 0: SCSI3 >0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0B
Re: sunfire v120 gem interfaces
On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
> can you get the ifconfig output when its locked up? and a copy of what systat
> mb is showing?
>
> cheers,
> dlg

Thanks David,

I have setup a script to try and capture this immediately when it happens.

FWIW here is the output as it is now, working:

16:35 ryan@void:~$ ifconfig
lo0: flags=8049 mtu 32768
        priority: 0
        groups: lo
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff00
gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfc00 broadcast 96.54.15.255
gem1: flags=8b63 mtu 1500
        lladdr 00:03:ba:2b:47:71
        priority: 0
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 10.16.1.30 netmask 0xffe0 broadcast 10.16.1.31
        inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2
        inet6 2001:470:b:6cf::1 prefixlen 64
enc0: flags=0<>
        priority: 0
        groups: enc
        status: active
vlan100: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: servers
        priority: 0
        vlan: 100 parent interface: gem1
        groups: vlan
        status: active
        inet 10.21.1.30 netmask 0xffe0 broadcast 10.21.1.31
        inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5
        inet6 2001:470:eac8:666::1 prefixlen 64
vlan101: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: workstations
        priority: 0
        vlan: 101 parent interface: gem1
        groups: vlan
        status: active
        inet 10.21.8.254 netmask 0xff80 broadcast 10.21.8.255
        inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6
        inet6 2001:470:eac8:a::1 prefixlen 64
vlan102: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: wireless
        priority: 0
        vlan: 102 parent interface: gem1
        groups: vlan
        status: active
        inet 10.21.9.254 netmask 0xff80 broadcast 10.21.9.255
        inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7
        inet6 2001:470:eac8:b::1 prefixlen 64
vlan2: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: transit
        priority: 0
        vlan: 2 parent interface: gem1
        groups: vlan
        status: active
        inet 172.21.1.2 netmask 0xfffc broadcast 172.21.1.3
tun0: flags=51 mtu 1500
        priority: 0
        groups: tun
        status: down
        inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffc
gif0: flags=8051 mtu 1280
        priority: 0
        groups: gif egress
        tunnel: inet 96.54.13.103 -> 216.218.226.238
        inet6 fe80::203:baff:fe2b:4770%gif0 -> prefixlen 64 scopeid 0xa
        inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 128
pflow0: flags=41 mtu 1492
        priority: 0
        pflow: sender: 127.0.0.1 receiver: 127.0.0.1:9995 version: 5
        groups: pflow
pflog0: flags=141 mtu 33144
        priority: 0
        groups: pflog

16:36 ryan@void:~$ systat -b mb
   8 users    Load 0.21 0.25 0.26                     Sun Nov  8 16:37:12 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    48         129
                             2048    24        1025
lo0
gem0                         2048    11     4   124    11
gem1                         2048    12     4   124    12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0

>
> > On 9 Nov 2015, at 09:36, Ryan Freeman wrote:
> >
> > Hey tech@,
> >
> > At my wits end here, I recently got a sunfire v120 from work for pretty
> > cheap.
> > Quite excited to have some non x86 hardware, I set it up as a router.
> >
> > However, for some reason after sometimes mere hours -- othertimes days at a
> > time, the gem0 interface needs to be cycled:
> >
> > ifconfig gem0 down
> > ifconfig gem0 up
> > dhclient gem0
> >
> > no packets pass until that has been done. At first I have been
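The capture script referred to in this message is not included in the thread either. A minimal sketch of how the two requested outputs could be appended to a log is given below; `snapshot`, `capture_state`, the `--capture` switch and the log layout are hypothetical names chosen for this example, while `ifconfig` and `systat -b mb` are the commands actually shown above.

```shell
#!/bin/sh
# Sketch of a state-capture helper (hypothetical; the thread does not show the real one).

# snapshot LOG CMD [ARG...]: append a "### command" header and the command's
# output (stdout and stderr) to LOG.
snapshot() {
    log=$1
    shift
    {
        printf '### %s\n' "$*"
        "$@" 2>&1
    } >>"$log"
}

# capture_state IFACE LOG: write a timestamp line, then the two outputs asked
# for in the thread: ifconfig for the interface and a one-shot "systat -b mb".
capture_state() {
    printf '=== %s %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >>"$2"
    snapshot "$2" ifconfig "$1"
    snapshot "$2" systat -b mb
}

# Capture only when asked to, so the functions can be sourced and tried alone.
if [ "${1:-}" = "--capture" ]; then
    capture_state "$2" "$3"
fi
```

The idea would be to call `capture_state gem0 /var/log/gem-capture.log` (path assumed) from whatever detects the outage, before the interface is cycled, so the log reflects the locked-up state rather than the recovered one.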
Re: sunfire v120 gem interfaces
can you get the ifconfig output when its locked up? and a copy of what systat
mb is showing?

cheers,
dlg

> On 9 Nov 2015, at 09:36, Ryan Freeman wrote:
>
> Hey tech@,
>
> At my wits end here, I recently got a sunfire v120 from work for pretty cheap.
> Quite excited to have some non x86 hardware, I set it up as a router.
>
> However, for some reason after sometimes mere hours -- othertimes days at a
> time, the gem0 interface needs to be cycled:
>
> ifconfig gem0 down
> ifconfig gem0 up
> dhclient gem0
>
> no packets pass until that has been done. At first I have been placing the
> blame squarely on the Hitron modem we have in the house from shaw cable,
> but now I've noticed the issue happen twice on the internal interface as well,
> gem1. All VLANs I have setup stop responding until gem1 is cycled.
>
> gem1 is just used by a collection of vlan(4) interfaces, so traffic resumes
> immediately after interface gem1 down/up.
>
> I've tried to turn on ifconfig gem0 debug to catch anything wierd, but there
> has been nothing of interest there. Dmesg attached, starting to wonder
> if this machine is at its EOL and the network ports are dying :(
>
> This issue occurred with the 5.7 release as well.
>
> dmesg:
> console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>        The Regents of the University of California. All rights reserved.
> Copyright (c) 1995-2015 OpenBSD. All rights reserved.
http://www.OpenBSD.org > > OpenBSD 5.8 (GENERIC) #0: Thu Oct 22 00:24:09 PDT 2015 >r...@void.inter.lan:/usr/src/sys/arch/sparc64/compile/GENERIC > real mem = 1073741824 (1024MB) > avail mem = 1039228928 (991MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root: Sun Fire V120 (UltraSPARC-IIe 648MHz) > cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 3.3) @ 648 MHz > cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 > b/l) > psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0 > psycho0: bus range 0-2, PCI bus 0 > psycho0: dvma map c000-dfff > pci0 at psycho0 > ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x13 > pci1 at ppb0 bus 1 > ebus0 at pci1 dev 12 function 0 "Sun RIO EBus" rev 0x01 > "flashprom" at ebus0 addr 0-f not configured > clock1 at ebus0 addr 0-1fff: mk48t59 > lom0 at ebus0 addr 20-23 ivec 0x2a: LOMlite2 rev 3.12 > alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock > iic0 at alipm0 > "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs > spdmem0 at iic0 addr 0x54: 512MB SDRAM registered ECC PC133CL2 > spdmem1 at iic0 addr 0x55: 512MB SDRAM registered ECC PC133CL2 > ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00 > power0 at ebus1 addr 2000-2007 ivec 0x25 > com0 at ebus1 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo > com0: console > com1 at ebus1 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo > gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7c6, address > 00:03:ba:2b:47:70 > ukphy0 at gem0 phy 1: Generic IEEE 802.3u media interface, rev. 
1: OUI > 0x0010dd, model 0x0002 > ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 0x7e4, version 1.0, > legacy support > pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, > channel 0 configured to native-PCI, channel 1 configured to native-PCI > pciide0: using ivec 0x7cc for native-PCI interrupt > atapiscsi0 at pciide0 channel 0 drive 0 > scsibus1 at atapiscsi0: 2 targets > cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom removable > cd0(pciide0:0:0): using PIO mode 4, DMA mode 2 > pciide0: channel 1 disabled (no drives) > gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7dc, address > 00:03:ba:2b:47:71 > ukphy1 at gem1 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI > 0x0010dd, model 0x0002 > ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 0x7e6, version 1.0, > legacy support > usb0 at ohci0: USB revision 1.0 > uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1 > usb1 at ohci1: USB revision 1.0 > uhub1 at usb1 "Sun OHCI root hub" rev 1.00/1.00 addr 1 > ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x13 > pci2 at ppb1 bus 2 > siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, > using 8K of on-board RAM > scsibus2 at siop0: 16 targets, initiator 7 > sym0 at scsibus2 targ 0 lun 0: SCSI3 > 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN8731804D9 > sd0 at scsibus0 targ 0 lun 0: SCSI3 0/direct > fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN8731804D9 > sd0: 34732MB, 512 bytes/sector, 71132959 sectors > probe(siop0:1:0): Check Condition (error 0x70) on opcode 0x0 >SENSE KEY: Hardware Error > ASC/ASCQ: Defect List Error > FRU CODE: 0x7 > sym1 at scsibus2 targ 1 lun 0: SCSI3 > 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL12316NCUL > sd1 at scsibus0 targ 1 lun 0: SCSI3 0/direct > fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL12316NCUL > siop1 at pci2 d