On Jun 4, 2010, at 3:35 AM, Pyun YongHyeon wrote:

> On Thu, Jun 03, 2010 at 09:29:20AM +0300, Nikolay Denev wrote:
>> On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:
>> 
>>> On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
>>>> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
>>>>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Recently I started to experience an if_sge(4)-related panic.
>>>>>> It happens almost every time I try to download a torrent file, for example.
>>>>>> Copying large files over NFS does not seem to trigger it, but I haven't
>>>>>> tested extensively.
>>>>>> 
>>>>>> Here is the panic message:
>>>>>> 
>>>>>> Fatal trap 12: page fault while in kernel mode
>>>>>> cpuid = 0; apic id = 00
>>>>>> fault virtual address   = 0x8
>>>>>> fault code              = supervisor write data, page not present
>>>>>> instruction pointer     = 0x20:0xffffffff80230413
>>>>>> stack pointer           = 0x28:0xffffff80001e9280
>>>>>> frame pointer           = 0x28:0xffffff80001e9510
>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>>>> current process         = 12 (irq19: sge0)
>>>>>> trap number             = 12
>>>>>> panic: page fault
>>>>>> cpuid = 0
>>>>>> Uptime: 1d20h56m20s
>>>>>> Cannot dump. Device not defined or unavailable
>>>>>> Automatic reboot in 15 seconds - press a key on the console to abort
>>>>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
>>>>>> 
>>>>>> My swap is on a zvol, so I don't have a dump. I'll try to attach a disk to
>>>>>> the eSATA port and dump there if needed.
>>>>> 
>>>>> Here is some info from the crash dump:
>>>>> 
>>>>> (kgdb) #0  doadump () at pcpu.h:223
>>>>> #1  0xffffffff802fb149 in boot (howto=260)
>>>>>   at /usr/src/sys/kern/kern_shutdown.c:416
>>>>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
>>>>>   at /usr/src/sys/kern/kern_shutdown.c:590
>>>>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0, eva=Variable "eva" is not available.
>>>>> )
>>>>>   at /usr/src/sys/amd64/amd64/trap.c:777
>>>>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0, usermode=0)
>>>>>   at /usr/src/sys/amd64/amd64/trap.c:693
>>>>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
>>>>>   at /usr/src/sys/amd64/amd64/trap.c:451
>>>>> #6  0xffffffff804eb977 in calltrap ()
>>>>>   at /usr/src/sys/amd64/amd64/exception.S:223
>>>>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
>>>>>   at /usr/src/sys/dev/sge/if_sge.c:1591
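The fault address 0x8 fits a NULL mbuf being passed to IFQ_DRV_PREPEND(), whose first step is a store into m->m_nextpkt; frame #7 (if_sge.c:1591) appears to be the line in sge_start_locked() that calls it. The sketch below uses a hypothetical struct, mhdr_sketch, as a stand-in for the kernel's mbuf header and assumes that m_nextpkt is its second pointer, directly after m_next, as in FreeBSD mbufs of this period. It only illustrates the offset arithmetic; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the first two members of the mbuf header.
 * Assumption: m_next comes first and m_nextpkt second, as in FreeBSD
 * mbufs of this period; all other fields are omitted.
 */
struct mhdr_sketch {
	struct mhdr_sketch *mh_next;	/* next buffer in this chain */
	struct mhdr_sketch *mh_nextpkt;	/* next packet on a queue */
};

/* Two pointers in a row: on common ABIs there is no padding between them. */
static_assert(offsetof(struct mhdr_sketch, mh_nextpkt) == sizeof(void *),
    "mh_nextpkt should directly follow mh_next");

/*
 * Address that "m->m_nextpkt = ..." would write to if m were NULL.
 * Computed with offsetof() rather than by dereferencing NULL, which
 * would be undefined behaviour in C.
 */
static uintptr_t
nextpkt_fault_address(void)
{
	return ((uintptr_t)offsetof(struct mhdr_sketch, mh_nextpkt));
}
```

On amd64 (LP64) nextpkt_fault_address() evaluates to 8, matching the reported fault virtual address; on a 32-bit platform the same write would land at 0x4.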
>>>> 
>>>> Try this. sge_encap() can sometimes return an error with m_head set to NULL:
>>>> 
>>> 
>>> Thanks, John. Committed in r208512.
>>> 
>>>> Index: if_sge.c
>>>> ===================================================================
>>>> --- if_sge.c       (revision 208375)
>>>> +++ if_sge.c       (working copy)
>>>> @@ -1588,7 +1588,8 @@
>>>>            if (m_head == NULL)
>>>>                    break;
>>>>            if (sge_encap(sc, &m_head)) {
>>>> -                  IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>>> +                  if (m_head != NULL)
>>>> +                          IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>>>                    ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>>>                    break;
>>>>            }
>>>> 
>>>> -- 
>>>> John Baldwin
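The contract John describes can be modelled in userland. Everything below is hypothetical: struct pkt and struct sndq stand in for the mbuf and ifp->if_snd, q_prepend() repeats the steps of IFQ_DRV_PREPEND(), and encap_sketch() invents two failure modes, one that frees the packet and clears the caller's pointer and one that leaves it intact for a retry. start_sketch() applies the check committed in r208512. It is a sketch under those assumptions, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only the queue link and a length. */
struct pkt {
	struct pkt *nextpkt;
	int len;
};

/* Hypothetical stand-in for the driver's send queue (ifp->if_snd). */
struct sndq {
	struct pkt *head;
	struct pkt *tail;
	int len;
};

/* Same steps as IFQ_DRV_PREPEND(): the very first store goes through p. */
static void
q_prepend(struct sndq *q, struct pkt *p)
{
	p->nextpkt = q->head;	/* p == NULL would fault right here */
	if (q->tail == NULL)
		q->tail = p;
	q->head = p;
	q->len++;
}

/* Take the packet at the head of the queue, or NULL if it is empty. */
static struct pkt *
q_dequeue(struct sndq *q)
{
	struct pkt *p = q->head;

	if (p != NULL) {
		assert(q->len > 0);
		q->head = p->nextpkt;
		if (q->head == NULL)
			q->tail = NULL;
		p->nextpkt = NULL;
		q->len--;
	}
	return (p);
}

/*
 * Hypothetical encap: some errors free the packet and set *pp to NULL,
 * others leave it for a retry. Both failure conditions are invented.
 */
static int
encap_sketch(struct pkt **pp, int ring_full)
{
	if ((*pp)->len <= 0) {		/* unusable frame: dropped */
		free(*pp);
		*pp = NULL;
		return (EIO);
	}
	if (ring_full)
		return (ENOBUFS);	/* *pp untouched, caller may requeue */
	free(*pp);			/* stands in for handing it to the NIC */
	*pp = NULL;
	return (0);
}

/* Drain the queue; returns 0, or the encap error that stopped the loop. */
static int
start_sketch(struct sndq *q, int ring_full)
{
	struct pkt *m;
	int error;

	while ((m = q_dequeue(q)) != NULL) {
		error = encap_sketch(&m, ring_full);
		if (error != 0) {
			/* The r208512 check: requeue only a packet that still exists. */
			if (m != NULL)
				q_prepend(q, m);
			/* The real driver also sets IFF_DRV_OACTIVE here. */
			return (error);
		}
	}
	return (0);
}
```

Without the m != NULL test, the first failure mode would hand NULL to q_prepend(), which is the crash seen in the backtrace.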
>> 
>> After the patch I experienced several network outages (ping reported "no
>> buffer space available") that were resolved by an ifconfig down/up of the
>> sge(4) interface.
>> 
> 
> Because I don't have access to sge(4) controllers, I never had a chance
> to run it. Does ping(8) generate "no buffer space available" when
> the system is idle? Could you show me more information on
> how you checked for network outages?
> 

It happened 4-5 times recently. I didn't investigate extensively, but yes,
ping returned "no buffer space available" when I tried pinging from the
machine itself, and the machine was unreachable from other hosts on the network.
I'm not sure what you mean by idle state, but there was a torrent client running
on the machine, which printed errors about being unable to reach peers.


>> I can see that most of the other drivers that handle XXX_encap() returning
>> with m_head set to NULL simply break out of the transmit loop when this condition
> 
> Yes, most drivers written/touched by me behave like that.
> 
>> is hit, i.e.:
>> 
>> Index: if_sge.c
>> ===================================================================
>> --- if_sge.c (revision 208375)
>> +++ if_sge.c (working copy)
>> @@ -1588,7 +1588,9 @@
>>              if (m_head == NULL)
>>                      break;
>>              if (sge_encap(sc, &m_head)) {
>> +                    if (m_head == NULL)
>> +                            break;
>>                      IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>                      ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>                      break;
>>              }
>> 
>> But here in sge(4) we always set IFF_DRV_OACTIVE.
>> Do you think this could be the source of the problem?
>> 
> 
> A more correct way would be to check the number of queued frames before
> setting IFF_DRV_OACTIVE, or just to exit the transmit loop. If there are
> no queued frames, IFF_DRV_OACTIVE would never be cleared, which in turn
> causes ENOBUFS in ping(8). I think your change is more reasonable.
> Do you still see the same issue with the change you suggested?
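Pyun's explanation can be reproduced with a small state model. The sketch below is hypothetical throughout (struct nic_sketch, SNDQ_MAX and the *_sketch functions are invented) and assumes, as he describes, that IFF_DRV_OACTIVE is cleared only when the TX completion path reclaims queued frames, and that the start routine does nothing while the flag is set. With the r208512 behaviour, an encap failure that drops the frame leaves the flag set with an empty ring, so the send queue fills and output fails with ENOBUFS; Nikolay's variant just leaves the loop.

```c
#include <assert.h>
#include <errno.h>

#define SNDQ_MAX 4	/* hypothetical send-queue limit, tiny for the demo */

/* Hypothetical driver state; every field is a stand-in. */
struct nic_sketch {
	int oactive;		/* IFF_DRV_OACTIVE */
	int tx_cnt;		/* frames on the TX ring, not yet completed */
	int snd_len;		/* frames waiting in the send queue */
	int encap_fails;	/* 1: every encap fails and frees the frame */
	int fixed;		/* 1: Nikolay's variant, 0: r208512 behaviour */
};

/* Stack-side enqueue; a full queue is what ping(8) reports as ENOBUFS. */
static int
output_sketch(struct nic_sketch *sc)
{
	assert(sc->snd_len >= 0 && sc->snd_len <= SNDQ_MAX);
	if (sc->snd_len == SNDQ_MAX)
		return (ENOBUFS);
	sc->snd_len++;
	return (0);
}

/* Transmit loop; like the driver, it does nothing while OACTIVE is set. */
static void
start_sketch(struct nic_sketch *sc)
{
	if (sc->oactive)
		return;
	while (sc->snd_len > 0) {
		sc->snd_len--;			/* dequeue one frame */
		if (!sc->encap_fails) {
			sc->tx_cnt++;		/* frame now owned by the ring */
			continue;
		}
		/* encap failed and freed the frame: m_head == NULL */
		if (!sc->fixed)
			sc->oactive = 1;	/* set although nothing is queued */
		break;
	}
}

/* TX completion: with no frames on the ring there is nothing to reclaim. */
static void
txeof_sketch(struct nic_sketch *sc)
{
	if (sc->tx_cnt == 0)
		return;
	sc->tx_cnt = 0;
	sc->oactive = 0;
}
```

In the model the stuck state lasts until something resets oactive from outside, which would be consistent with an ifconfig down/up curing the outage, since reinitializing the interface typically resets the driver flags.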

I've been running with this change for a day or so now without any issues.


Thanks,
Niki
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
