On Jun 4, 2010, at 3:35 AM, Pyun YongHyeon wrote:

> On Thu, Jun 03, 2010 at 09:29:20AM +0300, Nikolay Denev wrote:
>> On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:
>>
>>> On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
>>>> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
>>>>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Recently I started to experience an if_sge(4) related panic.
>>>>>> It happens almost every time I try to download a torrent file, for
>>>>>> example.
>>>>>> Copying large files over NFS seems not to trigger it, but I haven't
>>>>>> tested extensively.
>>>>>>
>>>>>> Here is the panic message:
>>>>>>
>>>>>> Fatal trap 12: page fault while in kernel mode
>>>>>> cpuid = 0; apic id = 00
>>>>>> fault virtual address   = 0x8
>>>>>> fault code              = supervisor write data, page not present
>>>>>> instruction pointer     = 0x20:0xffffffff80230413
>>>>>> stack pointer           = 0x28:0xffffff80001e9280
>>>>>> frame pointer           = 0x28:0xffffff80001e9510
>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>>>> current process         = 12 (irq19: sge0)
>>>>>> trap number             = 12
>>>>>> panic: page fault
>>>>>> cpuid = 0
>>>>>> Uptime: 1d20h56m20s
>>>>>> Cannot dump. Device not defined or unavailable
>>>>>> Automatic reboot in 15 seconds - press a key on the console to abort
>>>>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
>>>>>>
>>>>>> My swap is on a zvol, so I don't have a dump. I'll try to attach a disk on
>>>>>> the eSATA port and dump there if needed.
>>>>>
>>>>> Here is some info from the crashdump:
>>>>>
>>>>> (kgdb) #0  doadump () at pcpu.h:223
>>>>> #1  0xffffffff802fb149 in boot (howto=260)
>>>>>     at /usr/src/sys/kern/kern_shutdown.c:416
>>>>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
>>>>>     at /usr/src/sys/kern/kern_shutdown.c:590
>>>>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0,
>>>>>     eva=Variable "eva" is not available.
>>>>> )
>>>>>     at /usr/src/sys/amd64/amd64/trap.c:777
>>>>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0,
>>>>>     usermode=0)
>>>>>     at /usr/src/sys/amd64/amd64/trap.c:693
>>>>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
>>>>>     at /usr/src/sys/amd64/amd64/trap.c:451
>>>>> #6  0xffffffff804eb977 in calltrap ()
>>>>>     at /usr/src/sys/amd64/amd64/exception.S:223
>>>>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
>>>>>     at /usr/src/sys/dev/sge/if_sge.c:1591
>>>>
>>>> Try this.  sge_encap() can sometimes return an error with m_head set to
>>>> NULL:
>>>>
>>>
>>> Thanks John. Committed in r208512.
>>>
>>>> Index: if_sge.c
>>>> ===================================================================
>>>> --- if_sge.c	(revision 208375)
>>>> +++ if_sge.c	(working copy)
>>>> @@ -1588,7 +1588,8 @@
>>>>  		if (m_head == NULL)
>>>>  			break;
>>>>  		if (sge_encap(sc, &m_head)) {
>>>> -			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>>> +			if (m_head != NULL)
>>>> +				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>>>  			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>>>  			break;
>>>>  		}
>>>>
>>>> --
>>>> John Baldwin
>>
>> After the patch I experienced several network outages (ping reporting "no
>> buffer space available")
>> that were resolved by ifconfig down/up of the sge(4) interface.
>>
>
> Because I don't have access to sge(4) controllers I never had a chance
> to run it. Does ping(8) generate "no buffer space available" when
> the system is in idle state? Could you show me more information on
> how you checked network outages?
>
It happened 4-5 times recently. I didn't do an extensive investigation, but yes,
ping returned "no buffer space available" when I tried pinging from the machine
itself. It was also unreachable from other hosts on the network.
I'm not sure what you mean by idle state, but there was a torrent client running
on the machine, which printed errors about being unable to reach peers.

>> I can see that most of the other drivers that handle XXX_encap() returning
>> with m_head pointing to NULL break when this condition
>
> Yes, most drivers written/touched by me behave like that.
>
>> is hit, i.e.:
>>
>> Index: if_sge.c
>> ===================================================================
>> --- if_sge.c	(revision 208375)
>> +++ if_sge.c	(working copy)
>> @@ -1588,7 +1588,8 @@
>>  		if (m_head == NULL)
>>  			break;
>>  		if (sge_encap(sc, &m_head)) {
>> -			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>> +			if (m_head == NULL)
>> +				break;
>>  			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>  			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>  			break;
>>  		}
>>
>> But here in sge(4) we always set IFF_DRV_OACTIVE.
>> Do you think this can be the source of the problem?
>>
>
> A more correct way to set IFF_DRV_OACTIVE would be to check the number
> of queued frames or just exit the transmit loop. If there are no
> queued frames, IFF_DRV_OACTIVE would never be cleared, which in turn
> causes ENOBUFS in ping(8). I think your change looks more reasonable
> to me. Do you still see the same issue with the change you suggested?

I have been running with this change for about a day now without any issues.

Thanks,
Niki
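
For anyone following along, here is a rough userland sketch of why the two variants of the patch behave differently. It is only a toy model of the transmit loop, not the real driver: every toy_* name is made up for illustration, the queue is a small array rather than ifp->if_snd, and the "oactive" flag merely stands in for IFF_DRV_OACTIVE. The assumption, taken from the discussion above, is that encap can fail either with the mbuf still intact or with m_head set to NULL.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QMAX 8

/* Hypothetical stand-ins for the kernel types; not the FreeBSD API. */
struct toy_mbuf { int len; };

struct toy_ifnet {
    struct toy_mbuf *queue[TOY_QMAX]; /* pending frames, index 0 is the head */
    size_t           qlen;
    bool             oactive;         /* models IFF_DRV_OACTIVE */
};

enum toy_policy { TOY_ALWAYS_SET_OACTIVE, TOY_BREAK_ON_NULL };

/* Models an encap failure that leaves the caller's pointer NULL. */
int toy_encap_fail_null(struct toy_mbuf **m) { *m = NULL; return 1; }

/* Models an encap failure that leaves the mbuf intact (e.g. ring full). */
int toy_encap_fail_keep(struct toy_mbuf **m) { (void)m; return 1; }

/* Remove and return the head of the queue, or NULL if it is empty. */
struct toy_mbuf *toy_dequeue(struct toy_ifnet *ifp)
{
    struct toy_mbuf *m;

    if (ifp->qlen == 0)
        return NULL;
    m = ifp->queue[0];
    for (size_t i = 1; i < ifp->qlen; i++)
        ifp->queue[i - 1] = ifp->queue[i];
    ifp->qlen--;
    return m;
}

/* Put a frame back at the head of the queue; silently drops it if full. */
void toy_prepend(struct toy_ifnet *ifp, struct toy_mbuf *m)
{
    if (ifp->qlen >= TOY_QMAX)
        return;
    for (size_t i = ifp->qlen; i > 0; i--)
        ifp->queue[i] = ifp->queue[i - 1];
    ifp->queue[0] = m;
    ifp->qlen++;
}

/* Toy transmit loop with the two behaviours discussed in the thread. */
void toy_start(struct toy_ifnet *ifp, int (*encap)(struct toy_mbuf **),
    enum toy_policy policy)
{
    struct toy_mbuf *m_head;

    if (ifp->oactive)
        return;                 /* a "busy" interface takes no new work */
    for (;;) {
        m_head = toy_dequeue(ifp);
        if (m_head == NULL)
            break;
        if (encap(&m_head)) {
            if (m_head == NULL) {
                /* Nothing left to requeue. The first patch still marks
                 * the interface busy here; the second one just breaks. */
                if (policy == TOY_ALWAYS_SET_OACTIVE)
                    ifp->oactive = true;
                break;
            }
            toy_prepend(ifp, m_head);
            ifp->oactive = true;
            break;
        }
    }
}
```

With a single queued frame and toy_encap_fail_null, the TOY_ALWAYS_SET_OACTIVE policy ends with an empty queue and the busy flag set, which matches the stuck state described above where nothing is queued to clear it. TOY_BREAK_ON_NULL leaves the flag clear in that case, and still sets it when the frame was kept and put back on the queue.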