Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Edwin
Max Laier ([EMAIL PROTECTED]) wrote:
> 
> Edwin, what do you have for CFLAGS?  Can you try to downgrade to "-O" for now 
> so that we have a better chance to get a full view?
> 

Max, 

I have no CFLAGS or COPTFLAGS in /etc/make.conf - this was a basic
kern-developer install on a blank PC. The only thing that's a little 
different about the box that i use to compile is that it's a dual
processor machine - but no -j# options used in compilation of the kernel.

The compile is proceeding with the following as an example of the output
from make/cc:

$ grep netinet /tmp/make.DEBUG1.output |grep fastfwd
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith
-Winline -Wcast-qual  -fformat-extensions -std=c99 -g -nostdinc -I-  -I.
-I/usr/src/sys -I/usr/src/sys/contrib/dev/acpica
-I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath
-I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm
-I/usr/src/sys/dev/twa -D_KERNEL -include opt_global.h
-fno-common -finline-limit=8000 --param inline-unit-growth=100
--param large-function-growth=1000
-mno-align-long-strings -mpreferred-stack-boundary=2  -mno-mmx -mno-3dnow
-mno-sse -mno-sse2 -ffreestanding -Werror  /usr/src/sys/netinet/ip_fastfwd.c
$ 

are you referring to the -fformat-extensions, -fno-common and -finline...etc
optimizations as well? or just the -O v. -O2/-O3/-Os one? 

If yes to the -f* optimizations - besides commenting out parts of the makefiles
- is there a 'normal' way to disable them?

FWIW - I also had (I think) the same problem with the 5.3 release - but I 
never worked it out - just other things on my plate, so I don't believe it's
a recent code change (ie. 5.4 timeframe) if it does turn out to be a code 
change.

it also has something to do with the load on the box - I'm testing with
small udp packets (using iperf) - if I step up the size - I have to step
up the bandwidth in order to cause the panic. 


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Max Laier
On Sunday 24 July 2005 17:42, Simon 'corecode' Schubert wrote:
> On 24.07.2005, at 16:19, Edwin wrote:
> > (kgdb) f 13
> > #13 0xc068f6e9 in ip_fastforward (m=0xc12e2300) at
> > /usr/src/sys/netinet/ip_fastfwd.c:572
> > (kgdb) i loc
> > ip = (struct ip *) 0xc12f000e
> > m0 = (struct mbuf *) 0xc12f000e
> > ro = {ro_rt = 0xc11ee420, ro_dst = {sa_len = 16 '\020', sa_family = 2
> > '\002',
> > sa_data = "\000\000ˬ\002\005\000\000\000\000\000\000\000"}}
> > dst = (struct sockaddr_in *) 0xc76bfc3c
> > ia = (struct in_ifaddr *) 0x0
> > ifa = (struct ifaddr *) 0x0
> > ifp = (struct ifnet *) 0xc0f91800
> > odest = {s_addr = 84060352}
> > dest = {s_addr = 84060352}
> > sum = 0
> > ip_len = 0
> > error = 84060352
> > hlen = -1057417216
> > mtu = 0
> > __func__ = "ip_fastforward"
>
> error == 84060352 == dest.s_addr
> hlen == -1057417216 == 0xc0f91800 == ifp
>
> > (kgdb) f 12
> > #12 0xc0692b74 in ip_fragment (ip=0xc12f000e, m_frag=0xc76bfc6c,
> > mtu=-1056775680, if_hwassist_flags=0, sw_csum=1)
> > at /usr/src/sys/netinet/ip_output.c:967
> > 967 m->m_next = m_copy(m0, off, len);
> > (kgdb) i loc
> > mhip = (struct ip *) 0xc102e240
> > m = (struct mbuf *) 0xc102e200
> > mhlen = 20
> > error = 0
> > hlen = 20
> > len = 1480
> > off = 1500
> > m0 = (struct mbuf *) 0xc12e2300
> > firstlen = 1480
> > mnext = (struct mbuf **) 0xc12e2304
> > nfrags = 1
>
> mtu (parameter) == -1056775680 == 0xc102e200 == m
>
> your stack (or gdb) seems seriously broken

Not necessarily.  This can well be an effect of higher optimization levels.

Edwin, what do you have for CFLAGS?  Can you try to downgrade to "-O" for now 
so that we have a better chance to get a full view?

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News



Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Simon 'corecode' Schubert

On 24.07.2005, at 16:19, Edwin wrote:

> (kgdb) f 13
> #13 0xc068f6e9 in ip_fastforward (m=0xc12e2300) at
> /usr/src/sys/netinet/ip_fastfwd.c:572
> (kgdb) i loc
> ip = (struct ip *) 0xc12f000e
> m0 = (struct mbuf *) 0xc12f000e
> ro = {ro_rt = 0xc11ee420, ro_dst = {sa_len = 16 '\020', sa_family = 2
> '\002',
> sa_data = "\000\000ˬ\002\005\000\000\000\000\000\000\000"}}
> dst = (struct sockaddr_in *) 0xc76bfc3c
> ia = (struct in_ifaddr *) 0x0
> ifa = (struct ifaddr *) 0x0
> ifp = (struct ifnet *) 0xc0f91800
> odest = {s_addr = 84060352}
> dest = {s_addr = 84060352}
> sum = 0
> ip_len = 0
> error = 84060352
> hlen = -1057417216
> mtu = 0
> __func__ = "ip_fastforward"


error == 84060352 == dest.s_addr
hlen == -1057417216 == 0xc0f91800 == ifp


> (kgdb) f 12
> #12 0xc0692b74 in ip_fragment (ip=0xc12f000e, m_frag=0xc76bfc6c,
> mtu=-1056775680, if_hwassist_flags=0, sw_csum=1)
> at /usr/src/sys/netinet/ip_output.c:967
> 967 m->m_next = m_copy(m0, off, len);
> (kgdb) i loc
> mhip = (struct ip *) 0xc102e240
> m = (struct mbuf *) 0xc102e200
> mhlen = 20
> error = 0
> hlen = 20
> len = 1480
> off = 1500
> m0 = (struct mbuf *) 0xc12e2300
> firstlen = 1480
> mnext = (struct mbuf **) 0xc12e2304
> nfrags = 1


mtu (parameter) == -1056775680 == 0xc102e200 == m

your stack (or gdb) seems seriously broken

cheers
  simon

--
Serve - BSD +++  RENT this banner advert  +++ASCII Ribbon   /"\
Work - Mac  +++  space for low $$$ NOW!1  +++  Campaign \ /
Party Enjoy Relax   |   http://dragonflybsd.org  Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz   Mail + News   / \




Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Edwin
New kernel: ident D1-0723 (same as D1-0722 - but w/ IPFIREWALL* options removed)

same traces asked for previously.

Thanks again,
/Edwin



kgdb kernel.debug /usr/local/STORAGE/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc0460ef6 in db_fncall (dummy1=0, dummy2=0, dummy3=43, dummy4=0xc76bf9f4 
"(úkÇ")
at /usr/src/sys/ddb/db_command.c:531
#2  0xc0460d04 in db_command (last_cmdp=0xc08be624, cmd_table=0x0, 
aux_cmd_tablep=0xc083e324, 
aux_cmd_tablep_end=0xc083e340) at /usr/src/sys/ddb/db_command.c:349
#3  0xc0460dcc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#4  0xc0462951 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc06277f2 in kdb_trap (type=3, code=0, tf=0xc76bfb30) at 
/usr/src/sys/kern/subr_kdb.c:468
#6  0xc07ad874 in trap (frame=
  {tf_fs = -949288936, tf_es = -1067319280, tf_ds = -1065287664, tf_edi = 
1, tf_esi = -1065233792, tf_ebp = -949224592, tf_isp = -949224612, tf_ebx = 
-949224548, tf_edx = 0, tf_ecx = -1060921344, tf_eax = 18, tf_trapno = 3, 
tf_err = 0, tf_eip = -1067289229, tf_cs = -1065287672, tf_eflags = 658, tf_esp 
= -949224560, tf_ss = -1067377425})
at /usr/src/sys/i386/i386/trap.c:584
#7  0xc079deaa in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#8  0xc76b0018 in ?? ()
#9  0xc0620010 in sched_runnable () at /usr/src/sys/kern/sched_4bsd.c:641
#10 0xc0611cef in panic (fmt=0xc081d280 "m_copym, offset > size of mbuf chain")
at /usr/src/sys/kern/kern_shutdown.c:550
#11 0xc064172c in m_copym (m=0x0, off0=1500, len=1480, wait=1) at 
/usr/src/sys/kern/uipc_mbuf.c:385
#12 0xc0692b74 in ip_fragment (ip=0xc12f000e, m_frag=0xc76bfc6c, 
mtu=-1056775680, if_hwassist_flags=0, sw_csum=1)
at /usr/src/sys/netinet/ip_output.c:967
#13 0xc068f6e9 in ip_fastforward (m=0xc12e2300) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
#14 0xc0672759 in ether_demux (ifp=0xc0f9, m=0xc12e2300) at 
/usr/src/sys/net/if_ethersubr.c:770
#15 0xc06724f5 in ether_input (ifp=0xc0f9, m=0xc12e2300) at 
/usr/src/sys/net/if_ethersubr.c:631
#16 0xc070a9e7 in sis_rxeof (sc=0xc0f9) at /usr/src/sys/pci/if_sis.c:1636
#17 0xc070ae6f in sis_intr (arg=0xc0f9) at /usr/src/sys/pci/if_sis.c:1841
#18 0xc0600130 in ithread_loop (arg=0xc0ec6880) at 
/usr/src/sys/kern/kern_intr.c:547
#19 0xc05ff5a4 in fork_exit (callout=0xc06c , arg=0xc0ec6880, 
frame=0xc76bfd48)
at /usr/src/sys/kern/kern_fork.c:791
#20 0xc079df0c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb) f 13
#13 0xc068f6e9 in ip_fastforward (m=0xc12e2300) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
(kgdb) l
567 m->m_pkthdr.csum_flags |= CSUM_IP;
568 /*
569  * ip_fragment expects ip_len and ip_off in 
host byte
570  * order but returns all packets in network 
byte order
571  */
572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
573 (~ifp->if_hwassist & 
CSUM_DELAY_IP))) {
574 goto drop;
575 }
576 KASSERT(m != NULL, ("null mbuf and no error"));
(kgdb) i loc
ip = (struct ip *) 0xc12f000e
m0 = (struct mbuf *) 0xc12f000e
ro = {ro_rt = 0xc11ee420, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002', 
sa_data = "\000\000ˬ\002\005\000\000\000\000\000\000\000"}}
dst = (struct sockaddr_in *) 0xc76bfc3c
ia = (struct in_ifaddr *) 0x0
ifa = (struct ifaddr *) 0x0
ifp = (struct ifnet *) 0xc0f91800
odest = {s_addr = 84060352}
dest = {s_addr = 84060352}
sum = 0
ip_len = 0
error = 84060352
hlen = -1057417216
mtu = 0
__func__ = "ip_fastforward"
(kgdb) p *ip
$1 = {ip_hl = 5, ip_v = 4, ip_tos = 0 '\0', ip_len = 10240, ip_id = 33436, 
ip_off = 0, ip_ttl = 64 '@', 
  ip_p = 17 '\021', ip_sum = 59733, ip_src = {s_addr = 67479744}, ip_dst = 
{s_addr = 84060352}}
(kgdb) p *m
$2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc12f000e "E", 
mh_len = 40, mh_flags = 3, 
mh_type = 1}, M_dat = {MH = {MH_pkthdr = {rcvif = 0xc0f9, len = 40, 
header = 0x0, csum_flags = 769, 
csum_data = 0, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = {ext_buf = 
0xc12f "", ext_free = 0, 
  ext_args = 

Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Max Laier
On Sunday 24 July 2005 04:38, Edwin wrote:
> If I understand correctly...(albeit an overly brief understanding :))
>
> 1. ethernet packet comes in - stuck into an mbuf
> 2. ether_demux calls ip_fastforward passing the mbuf struct
> 3. mbuf struct is copied/munged into ip struct by mtod
> 4. ntohs is called to change ip->ip_len to host byte order
>   incidentally - ip_len should be set to ntohs(ip->ip_len)
>   as well - it seems like neither one of those calls worked?
> 5. also - the call to set hlen to ip->ip_hl <<2 didn't work out well
>   either - right? since hlen = -1057417216, and i think it's
>   supposed to be 20 (5*4) - am I correct there as well?

4. and 5. are strange but not of too much significance.  Given that we got 
through the initial sanity checks and that neither is used further down, this 
might just be an optimization effect.  You could try to mark ip_len as 
volatile.

> 6. due to ip->ip_len being in network byte order still a little
>   gremlin helps us to think we have a 10240 byte packet and we
>   need to fragment it...
> 7. in ip_fragment - ip->ip_len is still 10240 - so we assume that we
>   need to make several fragments - however, the mbuf is correct
>   (len = 40)
> 8. in ip_fragment - to create the 'second' fragment, we try to copy
>   1480 bytes @ offset 1500 out of the mbuf that only has a valid
>   data length of 40-bytes???

That's what happens, yes.

> Are we really looking for the cause of ip->ip_len not being in the correct
> order @ the right time then? - in that case - there's two possibilities
> that I see - and I don't think that ntohs not working (1) is too realistic,
> so I would suppose we are looking for what flipped it in the first place?
>
>   1. either ntohs didn't work for some reason, or
>   2. it was already in host order, and the ntohs call flipped it back to
>   network order

Neither seems very likely.  My guess is really *something* along the way 
messing things up - pfil is the only suspect I have, right now.

> If you feel that it's a ipfw/ipfil issue - I can easily take IPFIREWALL*
> options out of the kernel and build a new one - just give me about 15
> minutes.

Yes please and make sure it isn't loaded as a module either.

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News



Re: help w/panic under heavy load - 5.4

2005-07-23 Thread Edwin
Max/et.al.,

replies to your message in-line below...

If I understand correctly...(albeit an overly brief understanding :))

1. ethernet packet comes in - stuck into an mbuf
2. ether_demux calls ip_fastforward passing the mbuf struct
3. mbuf struct is copied/munged into ip struct by mtod
4. ntohs is called to change ip->ip_len to host byte order
   incidentally - ip_len should be set to ntohs(ip->ip_len)
   as well - it seems like neither one of those calls worked?
5. also - the call to set hlen to ip->ip_hl <<2 didn't work out well
   either - right? since hlen = -1057417216, and i think it's
   supposed to be 20 (5*4) - am I correct there as well?
6. due to ip->ip_len being in network byte order still a little
   gremlin helps us to think we have a 10240 byte packet and we
   need to fragment it...
7. in ip_fragment - ip->ip_len is still 10240 - so we assume that we
   need to make several fragments - however, the mbuf is correct
   (len = 40)
8. in ip_fragment - to create the 'second' fragment, we try to copy
   1480 bytes @ offset 1500 out of the mbuf that only has a valid
   data length of 40-bytes???

Are we really looking for the cause of ip->ip_len not being in the correct
order @ the right time then? - in that case - there's two possibilities that
I see - and I don't think that ntohs not working (1) is too realistic, so
I would suppose we are looking for what flipped it in the first place?

1. either ntohs didn't work for some reason, or 
2. it was already in host order, and the ntohs call flipped it back to
network order

If you feel that it's a ipfw/ipfil issue - I can easily take IPFIREWALL* options
out of the kernel and build a new one - just give me about 15 minutes.

cheers. /edwin


Max Laier ([EMAIL PROTECTED]) wrote:
> On Saturday 23 July 2005 20:41, Edwin wrote:
> > Kernel name: D1-0722 (for reference)
> >
> > mbsd05# kgdb kernel.debug /usr/local/STORAGE/crash/vmcore.5
> > #13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at
> > /usr/src/sys/netinet/ip_fastfwd.c:572 warning: Source file is more recent
> > than executable.
> 
> Let's hope that's still correct ...
> 

it is - result of manual patch application and removal - just the 
timestamp/dates on the file are different (verified by a diff against
a clean source tree just now, to make sure again).

> > 572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
> > (kgdb) l
> > 567 m->m_pkthdr.csum_flags |= CSUM_IP;
> > 568 /*
> > 569  * ip_fragment expects ip_len and ip_off in 
> > host byte
> > 570  * order but returns all packets in network 
> > byte order
> > 571  */
> > 572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
> > 573 (~ifp->if_hwassist & 
> > CSUM_DELAY_IP))) {
> > 574 goto drop;
> > 575 }
> > 576 KASSERT(m != NULL, ("null mbuf and no error"));
> > (kgdb) i loc
> > ip = (struct ip *) 0xc12f700e
> > m0 = (struct mbuf *) 0xc12f700e
> > ro = {ro_rt = 0xc11f8420, ro_dst = {sa_len = 16 '\020', sa_family = 2
> > '\002', sa_data = "\000\000ˬ\002\005\000\000\000\000\000\000\000"}}
> > dst = (struct sockaddr_in *) 0xc76bfc3c
> > ia = (struct in_ifaddr *) 0x0
> > ifa = (struct ifaddr *) 0x0
> > ifp = (struct ifnet *) 0xc0f91800
> > odest = {s_addr = 84060352}
> > dest = {s_addr = 84060352}
> > sum = 0
> > ip_len = 0
> 
> This should not happen. ip_len is initialized from ntohs(ip->ip_len) and never 
> touched again.  Anyway, let's look some more ...

Is it accurate to say that ip->ip_len is 10240 at this point, but it should
be 40?

At line 542 of ip_fastfwd.c 1.17.2.7, the ip->ip_len <= mtu test should
evaluate to true and take the first branch, but it evaluates to false and
takes the else branch (hence the ip_fragment section) - because ip_len is
still in network order?

    if (ip->ip_len <= mtu ||
        (ifp->if_hwassist & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) {
            /*
             * Restore packet header fields to original values
             */
            ip->ip_len = htons(ip->ip_len);
            ip->ip_off = htons(ip->ip_off);
            /*
             * Send off the packet via outgoing interface
             */
            error = (*ifp->if_output)(ifp, m,
                            (struct sockaddr *)dst, ro.ro_rt);
    } else {
            /*
             * Handle EMSGSIZE with icmp reply needfrag for TCP MTU discovery
             */
            if (ip->ip_off & IP_DF) {
                    ipstat.ips_cantfrag++;
                    icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
                            0, ifp);
                    goto consumed;
            } else {
   

Re: help w/panic under heavy load - 5.4

2005-07-23 Thread Max Laier
On Saturday 23 July 2005 20:41, Edwin wrote:
> Kernel name: D1-0722 (for reference)
>
> mbsd05#   kgdb kernel.debug /usr/local/STORAGE/crash/vmcore.5
> #13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at
> /usr/src/sys/netinet/ip_fastfwd.c:572 warning: Source file is more recent
> than executable.

Let's hope that's still correct ...

> 572   if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
> (kgdb) l
> 567   m->m_pkthdr.csum_flags |= CSUM_IP;
> 568   /*
> 569* ip_fragment expects ip_len and ip_off in 
> host byte
> 570* order but returns all packets in network 
> byte order
> 571*/
> 572   if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
> 573   (~ifp->if_hwassist & 
> CSUM_DELAY_IP))) {
> 574   goto drop;
> 575   }
> 576   KASSERT(m != NULL, ("null mbuf and no error"));
> (kgdb) i loc
> ip = (struct ip *) 0xc12f700e
> m0 = (struct mbuf *) 0xc12f700e
> ro = {ro_rt = 0xc11f8420, ro_dst = {sa_len = 16 '\020', sa_family = 2
> '\002', sa_data = "\000\000ˬ\002\005\000\000\000\000\000\000\000"}}
> dst = (struct sockaddr_in *) 0xc76bfc3c
> ia = (struct in_ifaddr *) 0x0
> ifa = (struct ifaddr *) 0x0
> ifp = (struct ifnet *) 0xc0f91800
> odest = {s_addr = 84060352}
> dest = {s_addr = 84060352}
> sum = 0
> ip_len = 0

This should not happen. ip_len is initialized from ntohs(ip->ip_len) and never 
touched again.  Anyway, let's look some more ...

> error = 84060352
> hlen = -1057417216
> mtu = 0
> __func__ = "ip_fastforward"
> (kgdb) p *ip
> $1 = {ip_hl = 5, ip_v = 4, ip_tos = 0 '\0', ip_len = 10240, ip_id = 61249,

ip_len should be 40 as ip_len is supposed to be in HOST BYTE ORDER at this 
point.  Feeding 10240 to ntohs() gives the correct value, so something 
obviously went wrong.

Let's see how we got here:
355 does the byteorder flip to host byte order
366 pfil OUT
451 pfil IN
527 first check ip_len < if_mtu etc ...

Obviously, the only thing that might mess with the byte order (unless I missed 
something along the way) is one of the pfil consumers.

***
*** What firewall(s) are you running with?
***

> ip_off = 0, ip_ttl = 63 '?', ip_p = 17 '\021', ip_sum = 31921, ip_src =
> {s_addr = 67479744}, ip_dst = {s_addr = 84060352}} (kgdb) p *m
> $2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc12f700e "E",
> mh_len = 40, mh_flags = 3, mh_type = 1}, M_dat = {MH = {MH_pkthdr = {rcvif
> = 0xc0f9, len = 40, header = 0x0, csum_flags = 769, csum_data = 0, tags

40, there you have it - no need to fragment at all!

> /usr/src/sys/netinet/ip_output.c:967
> 967   m->m_next = m_copy(m0, off, len);
> (kgdb) l
> 962   len = ip->ip_len - off;
> 963   m->m_flags |= M_LASTFRAG;
> 964   } else
> 965   mhip->ip_off |= IP_MF;
> 966   mhip->ip_len = htons((u_short)(len + mhlen));
> 967   m->m_next = m_copy(m0, off, len);
> 968   if (m->m_next == NULL) {/* copy failed */
> 969   m_free(m);
> 970   error = ENOBUFS;/* ??? */
> 971   ipstat.ips_odropped++;

Just to make sure, we didn't touch the original packet at this point so the 
above values are still the ones we based the (wrong) decision on.

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News



Re: help w/panic under heavy load - 5.4

2005-07-23 Thread Edwin
Max Laier ([EMAIL PROTECTED]) wrote:
> On Saturday 23 July 2005 15:53, Edwin wrote:
> 
> Can we see one complete picture, please.  This includes:
> 
>   A trace
>   local vars in ip_fastforward including unfolded ip, m, ro.ro_rt and ifp.
>   local vars in ip_fragment().
> 
> Thanks.
> 
> -- 
> /"\  Best regards,  | [EMAIL PROTECTED]
> \ /  Max Laier  | ICQ #67774661
>  X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
> / \  ASCII Ribbon Campaign  | Against HTML Mail and News

Absolutely ;) - Thanks for taking the time! 

cheers. /edwin


Kernel name: D1-0722 (for reference)


mbsd05# kgdb kernel.debug /usr/local/STORAGE/crash/vmcore.5

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc04611f6 in db_fncall (dummy1=0, dummy2=0, dummy3=43, dummy4=0xc76bf9f4 
"(úkÇ")
at /usr/src/sys/ddb/db_command.c:531
#2  0xc0461004 in db_command (last_cmdp=0xc08c9264, cmd_table=0x0, 
aux_cmd_tablep=0xc08483b8, 
aux_cmd_tablep_end=0xc08483d4) at /usr/src/sys/ddb/db_command.c:349
#3  0xc04610cc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#4  0xc0462c51 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0627af2 in kdb_trap (type=3, code=0, tf=0xc76bfb30) at 
/usr/src/sys/kern/subr_kdb.c:468
#6  0xc07b6394 in trap (frame=
  {tf_fs = -949288936, tf_es = -1067319280, tf_ds = -1065222128, tf_edi = 
1, tf_esi = -1065197495, tf_ebp = -949224592, tf_isp = -949224612, tf_ebx = 
-949224548, tf_edx = 0, tf_ecx = -1060921344, tf_eax = 18, tf_trapno = 3, 
tf_err = 0, tf_eip = -1067288461, tf_cs = -1065222136, tf_eflags = 658, tf_esp 
= -949224560, tf_ss = -1067376657})
at /usr/src/sys/i386/i386/trap.c:584
#7  0xc07a69ca in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#8  0xc76b0018 in ?? ()
#9  0xc0620010 in schedcpu () at /usr/src/sys/kern/sched_4bsd.c:461
#10 0xc0611fef in panic (fmt=0xc0820008 "default") at 
/usr/src/sys/kern/kern_shutdown.c:550
#11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1) at 
/usr/src/sys/kern/uipc_mbuf.c:385
#12 0xc069b694 in ip_fragment (ip=0xc12f700e, m_frag=0xc76bfc6c, 
mtu=-1056788992, if_hwassist_flags=0, sw_csum=1)
at /usr/src/sys/netinet/ip_output.c:967
#13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
#14 0xc0672a59 in ether_demux (ifp=0xc0f9, m=0xc12e6c00) at 
/usr/src/sys/net/if_ethersubr.c:770
#15 0xc06727f5 in ether_input (ifp=0xc0f9, m=0xc12e6c00) at 
/usr/src/sys/net/if_ethersubr.c:631
#16 0xc0713507 in sis_rxeof (sc=0xc0f9) at /usr/src/sys/pci/if_sis.c:1636
#17 0xc071398f in sis_intr (arg=0xc0f9) at /usr/src/sys/pci/if_sis.c:1841
#18 0xc0600430 in ithread_loop (arg=0xc0ec6880) at 
/usr/src/sys/kern/kern_intr.c:547
#19 0xc05ff8a4 in fork_exit (callout=0xc060030c , arg=0xc0ec6880, 
frame=0xc76bfd48)
at /usr/src/sys/kern/kern_fork.c:791
#20 0xc07a6a2c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb) f 13
#13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
warning: Source file is more recent than executable.

572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
(kgdb) l
567 m->m_pkthdr.csum_flags |= CSUM_IP;
568 /*
569  * ip_fragment expects ip_len and ip_off in 
host byte
570  * order but returns all packets in network 
byte order
571  */
572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
573 (~ifp->if_hwassist & 
CSUM_DELAY_IP))) {
574 goto drop;
575 }
576 KASSERT(m != NULL, ("null mbuf and no error"));
(kgdb) i loc
ip = (struct ip *) 0xc12f700e
m0 = (struct mbuf *) 0xc12f700e
ro = {ro_rt = 0xc11f8420, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002', 
sa_data = "\000\000ˬ\002\005\000\000\000\000\000\000\000"}}
dst = (struct sockaddr_in *) 0xc76bfc3c
ia = (struct in_ifaddr *) 0x0
ifa = (struct ifaddr *) 0x0
ifp = (struct ifnet *) 0xc0f91800
odest = {s_addr = 84060352}
dest = {s_addr = 84060352}
sum = 0
ip_len = 0
error = 84060352
hlen = -1057417216
mtu = 0
__func__ = "ip_fastforward"
(kgdb) p *ip
$1 = {ip_hl = 5, ip_v = 4, 

Re: help w/panic under heavy load - 5.4

2005-07-23 Thread Max Laier
On Saturday 23 July 2005 15:53, Edwin wrote:

Can we see one complete picture, please.  This includes:

  A trace
  local vars in ip_fastforward including unfolded ip, m, ro.ro_rt and ifp.
  local vars in ip_fragment().

Thanks.

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News



Re: help w/panic under heavy load - 5.4

2005-07-23 Thread Edwin
comments in-line.

Giorgos Keramidas ([EMAIL PROTECTED]) wrote:
> 
> This looks rather strange.  ip_fastforward() should pass an mtu of 1500
> but somehow the negative strange value gets passed.  It would be
> interesting to see the value of ``mtu'' in frame 13 too, if you still
> have this crash dump stored somewhere.

included right below - I have a few other questions while I'm looking at these:

 - the value of hlen: from the code it looks like it should be
   ip->ip_hl << 2 (5*4 = 20, right?), not some large negative value - plus
   the value of hlen is sort of close to the crazy mtu value
 - ip->ip_len = 10240 ??? not sure why this would be either
 - I think mtu should have a value of 1500 here - both the ifp struct and
   the ro struct have values of 1500. But even if it did have a value of
   1500, since ip->ip_len = 10240 it's still going to drop through to the
   else of line 542 (i.e. ip->ip_len <= mtu), which leaves either
   can't-frag/drop or frag (where we are, I think) - unless I'm missing
   something


(kgdb) f 13
#13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
warning: Source file is more recent than executable.

572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
(kgdb) i loc
ip = (struct ip *) 0xc12f700e
m0 = (struct mbuf *) 0xc12f700e
ro = {ro_rt = 0xc11f8420, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002',
sa_data = "\000\000\300\250\002\005\000\000\000\000\000\000\000"}}
dst = (struct sockaddr_in *) 0xc76bfc3c
ia = (struct in_ifaddr *) 0x0
ifa = (struct ifaddr *) 0x0
ifp = (struct ifnet *) 0xc0f91800
odest = {s_addr = 84060352}
dest = {s_addr = 84060352}
sum = 0
ip_len = 0
error = 84060352
hlen = -1057417216
mtu = 0
__func__ = "ip_fastforward"
(kgdb) p *ip
$1 = {ip_hl = 5, ip_v = 4, ip_tos = 0 '\0', ip_len = 10240, ip_id = 61249, 
ip_off = 0, ip_ttl = 63 '?',  ip_p = 17 '\021', ip_sum = 31921, ip_src = 
{s_addr = 67479744}, ip_dst = {s_addr = 84060352}}
(kgdb)


> 
> You are not running a kernel with optimization and/or architecture-
> dependent optimization flags, right?
> 

Not that I know of - I have added CPU_GEODE/CPU_SOEKRIS to my config, but I
get the same crash on the generic config as well. This is a soekris net4801
box (w/ geode proc - i586). Generic 'make buildkernel KERNCONF=D1-0722'
command line (i.e. no other make/compiler options).


mbsd05# diff /root/kernels/D1-0722 /root/kernels/GENERIC 
21,22d20
< makeoptions   DEBUG=-g
< 
24c22
< #cpu  I486_CPU
---
> cpu   I486_CPU
26,27c24,25
< #cpu  I686_CPU
< ident D1-0722
---
> cpu   I686_CPU
> ident GENERIC
31,48d28
< 
< options   KDB
< options   DDB
< options   INVARIANTS
< options   INVARIANT_SUPPORT
< 
< options   CPU_SOEKRIS
< options   CPU_GEODE
< 
< options   HZ=1000
< options   DEVICE_POLLING
< 
< options   IPFIREWALL
< options   IPFIREWALL_VERBOSE
< options   IPFIREWALL_VERBOSE_LIMIT
< options   IPFIREWALL_DEFAULT_TO_ACCEPT
< options   DUMMYNET
< options   IPDIVERT
mbsd05# 



Re: help w/panic under heavy load - 5.4

2005-07-22 Thread Giorgos Keramidas
On 2005-07-22 17:53, Edwin <[EMAIL PROTECTED]> wrote:
>
> I also patched ip_fastforward.c w/ your patch - still a crash - still
> same type bogus mtu value - a few lines from kgdb included @ end of
> message.
> (kgdb) f 13
> #13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at 
> /usr/src/sys/netinet/ip_fastfwd.c:572
> 572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
> (kgdb) p ro.ro_rt->rt_rmx
> $1 = {rmx_mtu = 1500, rmx_expire = 333905919, rmx_pksent = 3868}

The route entry has mtu = 1500.

> (kgdb) p *ifp
> $3 = {if_softc = 0xc0f91800, if_link = {tqe_next = 0xc0f9, tqe_prev = 
> 0xc08ebe84},
>   if_xname = "sis0", '\0' , if_dname = 0xc0f2ec2c "sis", 
> if_dunit = 0,
>   if_addrhead = {tqh_first = 0xc0ec, tqh_last = 0xc1040460}, if_klist = {
> kl_lock = 0xc08e5a40, kl_list = {slh_first = 0x0}}, if_pcount = 0, 
> if_carp = 0x0,
>   if_bpf = 0x0, if_index = 1, if_timer = 5, if_nvlans = 0, if_flags = 34883,
>   if_capabilities = 72, if_capenable = 72, if_linkmib = 0x0, if_linkmiblen = 
> 0,
>   if_data = {ifi_type = 6 '\006', ifi_physical = 0 '\0', ifi_addrlen = 6 
> '\006',
> ifi_hdrlen = 18 '\022', ifi_link_state = 2 '\002', ifi_recvquota = 0 '\0',
> ifi_xmitquota = 0 '\0', ifi_datalen = 80 'P', ifi_mtu = 1500, ifi_metric 
> = 0,

The interface also has an mtu of 1500 (ifi_mtu in the last line above).

> #10 0xc0611fef in panic (fmt=0xc0820008 "default")
> at /usr/src/sys/kern/kern_shutdown.c:550
> #11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
> at /usr/src/sys/kern/uipc_mbuf.c:385
> #12 0xc069b694 in ip_fragment (ip=0xc12f700e, m_frag=0xc76bfc6c, 
> mtu=-1056788992,
> if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
> #13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at 
> /usr/src/sys/netinet/ip_fastfwd.c:572

This looks rather strange.  ip_fastforward() should pass an mtu of 1500
but somehow the negative strange value gets passed.  It would be
interesting to see the value of ``mtu'' in frame 13 too, if you still
have this crash dump stored somewhere.

You are not running a kernel with optimization and/or architecture-
dependent optimization flags, right?



Re: help w/panic under heavy load - 5.4

2005-07-22 Thread Edwin

Hi Giorgos,

I'm sorry - I have so many kernels I was trying - I believe I overwrote that
particular kernel/kernel.debug set - so I created a new kernel as a baseline
with the same options per my notes - and included the output from the crash
below.

It does crash in the same fashion, and the kgdb output shows an MTU of the
same type of value (-1056788992 vs. -1056787456).

I also patched ip_fastforward.c w/ your patch - still a crash - still the
same type of bogus mtu value - a few lines from kgdb included @ end of
message.

Thanks again,
-Edwin



the variables you were asking about from this crash.

(kgdb) f 13
#13 0xc06933c1 in ip_fastforward (m=0xc12e6c00) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
(kgdb) p ro.ro_rt->rt_rmx
$1 = {rmx_mtu = 1500, rmx_expire = 333905919, rmx_pksent = 3868}
(kgdb) p ifp
$2 = (struct ifnet *) 0xc0f91800
(kgdb) p *ifp
$3 = {if_softc = 0xc0f91800, if_link = {tqe_next = 0xc0f9, tqe_prev = 
0xc08ebe84}, 
  if_xname = "sis0", '\0' , if_dname = 0xc0f2ec2c "sis", 
if_dunit = 0, 
  if_addrhead = {tqh_first = 0xc0ec, tqh_last = 0xc1040460}, if_klist = {
kl_lock = 0xc08e5a40, kl_list = {slh_first = 0x0}}, if_pcount = 0, if_carp 
= 0x0, 
  if_bpf = 0x0, if_index = 1, if_timer = 5, if_nvlans = 0, if_flags = 34883, 
  if_capabilities = 72, if_capenable = 72, if_linkmib = 0x0, if_linkmiblen = 0, 
  if_data = {ifi_type = 6 '\006', ifi_physical = 0 '\0', ifi_addrlen = 6 
'\006', 
ifi_hdrlen = 18 '\022', ifi_link_state = 2 '\002', ifi_recvquota = 0 '\0', 
ifi_xmitquota = 0 '\0', ifi_datalen = 80 'P', ifi_mtu = 1500, ifi_metric = 
0, 
ifi_baudrate = 1000, ifi_ipackets = 50, ifi_ierrors = 0, ifi_opackets = 
3914, 
ifi_oerrors = 0, ifi_collisions = 0, ifi_ibytes = 6146, ifi_obytes = 
213356, 
ifi_imcasts = 40, ifi_omcasts = 29, ifi_iqdrops = 0, ifi_noproto = 0, 
ifi_hwassist = 0, ifi_epoch = 0, ifi_lastchange = {tv_sec = 0, tv_usec = 
0}}, 
  if_multiaddrs = {tqh_first = 0xc0fab3e0, tqh_last = 0xc0fabcc0}, if_amcount = 
0, 
  if_output = 0xc0671e04 , if_input = 0xc0672598 , 
  if_start = 0xc0713c10 , if_ioctl = 0xc071497c , 
  if_watchdog = 0xc0714b04 , if_init = 0xc0713f60 , 
  if_resolvemulti = 0xc0672e48 , if_spare1 = 0x0, if_spare2 
= 0x0, 
  if_spare3 = 0x0, if_spare_flags1 = 0, if_spare_flags2 = 0, if_snd = {ifq_head 
= 0x0, 
ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 127, ifq_drops = 0, ifq_mtx = {
  mtx_object = {lo_class = 0xc0880b3c, lo_name = 0xc0f9180c "sis0", 
lo_type = 0xc0829304 "if send queue", lo_flags = 196608, lo_list = {
  tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, 
  mtx_recurse = 0}, ifq_drv_head = 0x0, ifq_drv_tail = 0x0, ifq_drv_len = 
0, 
ifq_drv_maxlen = 127, altq_type = 0, altq_flags = 1, altq_disc = 0x0, 
altq_ifp = 0xc0f91800, altq_enqueue = 0, altq_dequeue = 0, altq_request = 
0, 
altq_clfier = 0x0, altq_classify = 0, altq_tbr = 0x0, altq_cdnr = 0x0}, 
  if_broadcastaddr = 0xc07db600 "������", lltables = 0x0, if_label 
= 0x0, 
  if_prefixhead = {tqh_first = 0x0, tqh_last = 0xc0f91968}, if_afdata = {
0x0 , 0xc0faaab0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
0x0}, 
  if_afdata_initialized = 1, if_afdata_mtx = {mtx_object = {lo_class = 
0xc0880b3c, 
  lo_name = 0xc08292f4 "if_afdata", lo_type = 0xc08292f4 "if_afdata", 
  lo_flags = 196608, lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness 
= 0x0}, 
mtx_lock = 4, mtx_recurse = 0}, if_starttask = {ta_link = {stqe_next = 
0x0}, 
ta_pending = 0, ta_priority = 0, ta_func = 0xc06711c0 , 
ta_context = 0xc0f91800}}
(kgdb) 


for reference going forward - this kernel was named D1-0722, and I'm making
cross correlations to save the kernels/debugs/cores.


new kernel crash - all options compiled, sysctl ipff=1, polling not enabled


fb54c# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 21 tid 100015 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 21 tid 100015 td 0xc0ecc780
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102ae00,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c12f700e,c76bfc6c,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c12e6c00) at ip_fastforward+0x6ed
ether_demux(c0f9,c12e6c00,3c,c0f8a8d8,a) at ether_demux+0x259
ether_input(c0f9,c12e6c00,c0f902d0,0,c08336ab) at ether_input+0x25d
sis_rxeof(c0f9) at sis_rxeof+0x1ab
sis_intr(c0f9) at sis_intr+0xf3
ithread_loop(c0ec6880,c76bfd48,c0ec6880,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6880,c76bfd48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76bfd7c, ebp = 0 ---
db> 
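
A small aside on reading the ddb trace above: ddb prints call arguments
as bare hex words, so they need converting before they can be compared
with the decimal values kgdb shows.  The helper below is a hypothetical
convenience, not an existing tool.

```c
#include <stdlib.h>

/*
 * Turn one hex word from a ddb "where" line into a number.
 * ddb_arg() is a made-up name used only for this sketch.
 */
static unsigned long
ddb_arg(const char *word)
{
	return strtoul(word, NULL, 16);
}
```

ddb_arg("5dc") is 1500 and ddb_arg("5c8") is 1480, matching off0 and
len in the kgdb m_copym frame.  If the third word on the ip_fragment()
line really is the mtu argument, which is an assumption about how ddb
walks the i386 stack, then ip_fragment() saw 1500 there and not the
negative number kgdb prints.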

(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc04611f6 in db_fncall (dummy1=0, dummy2=0, dummy3=43, dummy4=0xc76bf9f4 
"(�k�")
at /usr/src/sys/ddb/db_command.c:531
#2  0xc0461004 in db_command (last_cmdp=0xc08c9264, cmd_table=0x

Re: help w/panic under heavy load - 5.4

2005-07-21 Thread Giorgos Keramidas
On 2005-07-21 14:57, Giorgos Keramidas <[EMAIL PROTECTED]> wrote:
> On 2005-07-20 11:41, Edwin <[EMAIL PROTECTED]> wrote:
> > I'm trying to understand the particulars about this - I get the null pointer
> > part, but as to ip_fragment - it's fragmenting mbufs to handle ip packets
> > during switching? and its failing trying to copy data past the end of the
> > chain?
>
> ip_fastfwd() thinks that it should fragment the packet because it somehow
> calculates a bogus ``mtu'' value.  See the mtu value in frame 12 of the stack
> trace below.
>
> > #10 0xc0611fef in panic (fmt=0xc0820008 "default") at 
> > /usr/src/sys/kern/kern_shutdown.c:550
> > #11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
> > at /usr/src/sys/kern/uipc_mbuf.c:385
> > #12 0xc069b694 in ip_fragment (ip=0xc11bd80e, m_frag=0xc76bfc6c, 
> > mtu=-1056787456,
> > if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
>
> The ``mtu'' is a huge negative integer value, which is definitely a problem
> here.  Somehow, ip_fastforward() calculates a very wrong value for the
> ``mtu''.

The check for finding the right MTU in ip_output.c is a bit different,
as it includes a check for RTF_UP:

    if (ip->ip_off & IP_DF) {
        error = EMSGSIZE;
        /*
         * This case can happen if the user changed the MTU
         * of an interface after enabling IP on it.  Because
         * most netifs don't keep track of routes pointing to
         * them, there is no way for one to update all its
         * routes when the MTU is changed.
         */
        if ((ro->ro_rt->rt_flags & (RTF_UP | RTF_HOST)) &&
            (ro->ro_rt->rt_rmx.rmx_mtu > ifp->if_mtu)) {
            ro->ro_rt->rt_rmx.rmx_mtu = ifp->if_mtu;
        }
        ipstat.ips_cantfrag++;
        goto bad;
    }

The check for RTF_UP doesn't exist in ip_fastfwd.c, except perhaps
through the ip_findroute() call.  I'm probably confused, but it seems
that ip_findroute() in ip_fastfwd.c may still return a route entry for a
gateway that is not yet RTF_UP, since the check for the RTF_UP flag is
not done for the dst.rt_gateway route entry too.

This may be the cause of the invalid MTU value you're seeing.  Can you
try the following patch for ip_fastfwd.c?

The diff is also available online at:
http://people.freebsd.org/~keramida/diff/fastfwd-mtu.patch

%%%
Index: ip_fastfwd.c
===
RCS file: /home/ncvs/src/sys/netinet/ip_fastfwd.c,v
retrieving revision 1.28
diff -u -r1.28 ip_fastfwd.c
--- ip_fastfwd.c    4 May 2005 13:09:19 -   1.28
+++ ip_fastfwd.c    21 Jul 2005 14:38:35 -
@@ -537,12 +537,13 @@
}
 
/*
-* Check if packet fits MTU or if hardware will fragement for us
+* Check if packet fits MTU or if hardware will fragment for us.
+* If necessary, update the MTU of the route entry too.
 */
-   if (ro.ro_rt->rt_rmx.rmx_mtu)
-   mtu = min(ro.ro_rt->rt_rmx.rmx_mtu, ifp->if_mtu);
-   else
-   mtu = ifp->if_mtu;
+   if ((ro.ro_rt->rt_flags & (RTF_UP | RTF_HOST)) &&
+   ro.ro_rt->rt_rmx.rmx_mtu > ifp->if_mtu)
+   ro.ro_rt->rt_rmx.rmx_mtu = ifp->if_mtu;
+   mtu = ifp->if_mtu;
 
if (ip->ip_len <= mtu ||
(ifp->if_hwassist & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) {
%%%
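
For reference, the MTU choice made by the lines this patch removes can
be modelled outside the kernel.  This is a minimal sketch of that logic
only; pick_mtu() and its plain integer arguments are made up for
illustration and stand in for the route and interface fields.

```c
/*
 * Model of the MTU choice in the removed lines: use the smaller of the
 * route MTU and the interface MTU, or the interface MTU when the route
 * carries no MTU of its own.  pick_mtu() is a hypothetical name.
 */
static unsigned long
pick_mtu(unsigned long route_mtu, unsigned long if_mtu)
{
	if (route_mtu != 0)
		return route_mtu < if_mtu ? route_mtu : if_mtu;
	return if_mtu;
}
```

With a route MTU and an interface MTU of 1500 each, pick_mtu() gives
1500; a route MTU of 0 falls back to the interface value.  Neither path
can produce a negative result from sane inputs, so a bogus value would
have to come from a bad route entry, a bad ifp, or the debugger.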


Re: help w/panic under heavy load - 5.4

2005-07-21 Thread Giorgos Keramidas
On 2005-07-20 11:41, Edwin <[EMAIL PROTECTED]> wrote:
> I'm trying to understand the particulars about this - I get the null pointer
> part, but as to ip_fragment - it's fragmenting mbufs to handle ip packets
> during switching? and its failing trying to copy data past the end of the
> chain?

ip_fastfwd() thinks that it should fragment the packet because it somehow
calculates a bogus ``mtu'' value.  See the mtu value in frame 12 of the stack
trace below.

> mbsd05# kgdb kernel.debug /tmp/crash/vmcore.3
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
> Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
> #0  doadump () at pcpu.h:159
> 159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> (kgdb) where
> #0  doadump () at pcpu.h:159
> #1  0xc04611f6 in db_fncall (dummy1=0, dummy2=0, dummy3=-1, dummy4=0xc76bf9f4 
> "(οΏ½kοΏ½")
> at /usr/src/sys/ddb/db_command.c:531
> #2  0xc0461004 in db_command (last_cmdp=0xc08c9264, cmd_table=0x0, 
> aux_cmd_tablep=0xc08483b8, 
> aux_cmd_tablep_end=0xc08483d4) at /usr/src/sys/ddb/db_command.c:349
> #3  0xc04610cc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
> #4  0xc0462c51 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
> #5  0xc0627af2 in kdb_trap (type=3, code=0, tf=0xc76bfb30) at 
> /usr/src/sys/kern/subr_kdb.c:468
> #6  0xc07b6394 in trap (frame=
>   {tf_fs = -949288936, tf_es = -1067319280, tf_ds = -1065222128, tf_edi = 
> 1, tf_esi = -1065
> 197495, tf_ebp = -949224592, tf_isp = -949224612, tf_ebx = -949224548, tf_edx 
> = 0, tf_ecx = -10
> 60921344, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1067288461, tf_cs 
> = -1065222136, tf_eflags = 658, tf_esp = -949224560, tf_ss = -1067376657}) at 
> /usr/src/sys/i386/i386/trap.c:584
> #7  0xc07a69ca in calltrap () at /usr/src/sys/i386/i386/exception.s:140
> #8  0xc76b0018 in ?? ()
> #9  0xc0620010 in schedcpu () at /usr/src/sys/kern/sched_4bsd.c:461
> #10 0xc0611fef in panic (fmt=0xc0820008 "default") at 
> /usr/src/sys/kern/kern_shutdown.c:550
> #11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
> at /usr/src/sys/kern/uipc_mbuf.c:385
> #12 0xc069b694 in ip_fragment (ip=0xc11bd80e, m_frag=0xc76bfc6c, 
> mtu=-1056787456, 
> if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967

The ``mtu'' is a huge negative integer value, which is definitely a problem
here.  Somehow, ip_fastforward() calculates a very wrong value for the ``mtu''.
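
As a cross-check on the numbers in frame 11, here is a sketch of the
usual IP fragment sizing.  It assumes ip_fragment() sizes fragments in
the standard way (payload rounded down to a multiple of 8, since
fragment offsets are carried in 8-byte units); this has not been checked
against the 5.4 source, and the function names are illustrative.

```c
/*
 * Payload bytes per fragment: MTU minus the IP header, rounded down to
 * a multiple of 8.  Names here are made up for this sketch.
 */
static int
frag_payload(int mtu, int hlen)
{
	return (mtu - hlen) & ~7;
}

/* Offset in the original packet where the second fragment's data starts. */
static int
second_frag_offset(int mtu, int hlen)
{
	return hlen + frag_payload(mtu, hlen);
}
```

With mtu = 1500 and a 20-byte header, frag_payload() is 1480 and
second_frag_offset() is 1500, the same as len and off0 in the m_copym
frame.  That is what a sane MTU of 1500 would produce, which sits oddly
with the negative mtu kgdb prints in frame 12.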

> 6933c1 in ip_fastforward (m=0xc11ab100) at 
> /usr/src/sys/netinet/ip_fastfwd.c:572

If you have this particular crash dump, can you show me a dump of the
``ro.ro_rt->rt_rmx'' and the ``ifp'' structure that ip_fastforward() is using?

One of these two seems to have an invalid mtu value.


Re: help w/panic under heavy load - 5.4

2005-07-20 Thread Edwin
Giorgos/John/et.al :)

I have compiled/tested/traced about 15 separate kernels for this, and am happy
to provide crashdumps/etc to anyone interested :)

I decided to start over - create a GENERIC kernel
(w/ DDB/KDB/INVARIANTS/INVARIANT_SUPPORT) and see what I started to get if I
could reproduce the problem more specifically.

Just using the GENERIC w/ debug kernel - I did make it crash - although it took
some handholding, lots of throwing packets at it and running processes on the
box, about 5-10 minutes - didn't really try to reproduce it - since it really
wasn't the fast panic that I was concerned about before. I've included the
panic below here anyhow.

What I did notice - was w/o any options - and turning on ip.fastforwarding via
sysctl - the crash was reproducible consistently with the (pretty much) generic
kernel, same kernel traces as before basically. I also received an 'interrupt
storm' message on the console from the ip.fastforwarding trace - have seen that
a few times in the past when polling was not enabled before it panic'd.

I welcome all comments/thoughts/directions - happy to poke/prod/compile/debug -
just really don't know where to go from here.

Thanks for your help!
/Edwin




Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
sysctl: ip.fastforwarding=0 <--- turned off

ospfd# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 27 tid 100021 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 27 tid 100021 td 0xc0ed0180
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102bb00,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c124100e,c76d1a04,5dc,0,1) at ip_fragment+0x214
ip_output(c1201200,0,c76d19d0,1,0,0) at ip_output+0x74c
ip_forward(c1201200,0) at ip_forward+0x2d4
ip_input(c1201200) at ip_input+0x4a7
netisr_processqueue(c08ec138) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db>

Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
Sysctl: ip.fastforwarding=1

fb54c# Interrupt storm detected on "irq10: sis0 sis1+"; throttling interrupt 
source
fb54c#
fb54c#
fb54c#
fb54c# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 21 tid 100015 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 21 tid 100015 td 0xc0ecc780
kdb_enter(c08165b2) at kdb_enter+0x2b
panic(c081ab91,0,c0760a0c,c1028800,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c121880e,c76bfc6c,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11f2600) at ip_fastforward+0x6ed
ether_demux(c0f9,c11f2600,52,c0f8b8d8,a) at ether_demux+0x259
ether_input(c0f9,c11f2600,c0f902cc,0,c0826fc6) at ether_input+0x25d
sis_rxeof(c0f9) at sis_rxeof+0x18b
sis_intr(c0f9) at sis_intr+0xa3
ithread_loop(c0ec6880,c76bfd48,c0ec6880,c05feb3c,0) at ithread_loop+0x124
fork_exit(c05feb3c,c0ec6880,c76bfd48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76bfd7c, ebp = 0 ---
db> doadump
No such command
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> reset

.




Giorgos Keramidas ([EMAIL PROTECTED]) wrote:
> On 2005-07-19 22:03, Edwin <[EMAIL PROTECTED]> wrote:
> > Hi John,
> >
> > Updated the kernel, same crash under load, looks like m is null, you're 
> > right.
> >
> > Not quite sure where to go from here. I'm happy to do the footwork - just 
> > still real
> > hazy on the BSD kernel part of things.
> >
> > panic: m_copym, offset > size of mbuf chain
> > KDB: enter: panic
> > [thread pid 27 tid 100021 ]
> > Stopped at  kdb_enter+0x2b: nop
> > db> where
> > Tracing pid 27 tid 100021 td 0xc0ed0180
> > kdb_enter(c0821a6a) at kdb_enter+0x2b
> > panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
> > m_copym(0,5dc,5c8,1,14) at m_copym+0x60
> > ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
> > ip_fastforward(c11fee00) at ip_fastforward+0x6ed
> > ether_demux(c0f9,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
> > ether_input(c0f9,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
> > sis_rxeof(c0f9,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
> > sis_poll(c0f9,0,5) at sis_poll+0x7f
> > netisr_poll(0) at netisr_poll+0x188
> > swi_net(0) at swi_net+0x81
> > ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
> > fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
> 
> Both tracebacks contain sis_poll() somewhere in the call stack?  Are you
> using POLLING?  If yes, can you try without POLLING and see if the crash
> can still be reproduced?
> 
>

Re: help w/panic under heavy load - 5.4

2005-07-20 Thread Edwin
Hi Giorgos,

Yes - I'm using polling, but it still panics even w/ polling disabled or not
compiled in. Still reproducible - same scenario (high load - actually, not even
really high load - relative load - small network packets).

I did both (output included below):
- disable polling via sysctl
- re-compile new kernel w/o option

It appears to be still the same error - traces the same w/ the exception of
sis_poll versus sis_intr.

I have tried various different options in my kernel before posting - w/ and w/o
ipff, ipfw, polling, didn't seem to make a difference - but then again - I
wasn't getting traces from DDB w/ INVARIANTS - so not for sure.

I'm trying to understand the particulars about this - I get the null pointer
part, but as to ip_fragment - it's fragmenting mbufs to handle ip packets
during switching? and it's failing trying to copy data past the end of the
chain?
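
On the "copy past the end of the chain" question, a toy model may help.
It is a sketch only: struct toy_mbuf and toy_seek() are invented here
and are much simpler than the real mbuf code, but they show how walking
an offset along a chain ends at NULL when the chain is shorter than the
offset.

```c
#include <stddef.h>

/* Toy stand-in for an mbuf: only a length and a next pointer. */
struct toy_mbuf {
	int len;
	struct toy_mbuf *next;
};

/*
 * Skip 'off' bytes into the chain, as a copy routine must before it can
 * start copying.  Returns the buffer holding that byte, or NULL when
 * the chain ends first, which is roughly the condition the
 * "offset > size of mbuf chain" assertion reports.
 */
static struct toy_mbuf *
toy_seek(struct toy_mbuf *m, int off)
{
	while (m != NULL && off >= m->len) {
		off -= m->len;
		m = m->next;
	}
	return m;
}
```

For a chain of 100 and 60 bytes, toy_seek() at offset 120 lands in the
second buffer, while offset 1500 runs off the end and returns NULL.  In
these traces the offset is 1500 and m is 0x0, so the chain apparently
held fewer bytes than the fragmenting code expected.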

Thanks!
/edwin




Giorgos Keramidas ([EMAIL PROTECTED]) wrote:
<>
> > ether_input(c0f9,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
> > sis_rxeof(c0f9,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
> > sis_poll(c0f9,0,5) at sis_poll+0x7f
> > netisr_poll(0) at netisr_poll+0x188
> > swi_net(0) at swi_net+0x81
> > ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
> > fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
> 
> Both tracebacks contain sis_poll() somewhere in the call stack?  Are you
> using POLLING?  If yes, can you try without POLLING and see if the crash
> can still be reproduced?
> 
> - Giorgos
> 



DDB output from disabling polling via sysctl - trace

fb54c# sysctl kern.polling.enable=0
kern.polling.enable: 1 -> 0
fb54c# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 21 tid 100015 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 21 tid 100015 td 0xc0ecc780
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102b400,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c11bd80e,c76bfc6c,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11ab100) at ip_fastforward+0x6ed
ether_demux(c0f9,c11ab100,52,c0f8abc0,29) at ether_demux+0x259
ether_input(c0f9,c11ab100,c0f902d0,0,c08336ab) at ether_input+0x25d
sis_rxeof(c0f9) at sis_rxeof+0x1ab
sis_intr(c0f9) at sis_intr+0xf3
ithread_loop(c0ec6880,c76bfd48,c0ec6880,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6880,c76bfd48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76bfd7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> 

mbsd05# kgdb kernel.debug /tmp/crash/vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc04611f6 in db_fncall (dummy1=0, dummy2=0, dummy3=-1, dummy4=0xc76bf9f4 
"(�k�")
at /usr/src/sys/ddb/db_command.c:531
#2  0xc0461004 in db_command (last_cmdp=0xc08c9264, cmd_table=0x0, 
aux_cmd_tablep=0xc08483b8, 
aux_cmd_tablep_end=0xc08483d4) at /usr/src/sys/ddb/db_command.c:349
#3  0xc04610cc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#4  0xc0462c51 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0627af2 in kdb_trap (type=3, code=0, tf=0xc76bfb30) at 
/usr/src/sys/kern/subr_kdb.c:468
#6  0xc07b6394 in trap (frame=
  {tf_fs = -949288936, tf_es = -1067319280, tf_ds = -1065222128, tf_edi = 
1, tf_esi = -1065
197495, tf_ebp = -949224592, tf_isp = -949224612, tf_ebx = -949224548, tf_edx = 
0, tf_ecx = -10
60921344, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1067288461, tf_cs = 
-1065222136, tf_eflags = 658, tf_esp = -949224560, tf_ss = -1067376657}) at 
/usr/src/sys/i386/i386/trap.c:584
#7  0xc07a69ca in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#8  0xc76b0018 in ?? ()
#9  0xc0620010 in schedcpu () at /usr/src/sys/kern/sched_4bsd.c:461
#10 0xc0611fef in panic (fmt=0xc0820008 "default") at 
/usr/src/sys/kern/kern_shutdown.c:550
#11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
at /usr/src/sys/kern/uipc_mbuf.c:385
#12 0xc069b694 in ip_fragment (ip=0xc11bd80e, m_frag=0xc76bfc6c, 
mtu=-1056787456, 
if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967

6933c1 in ip_fastforward (m=0xc11ab100) at /usr/src/sys/netinet/ip_fastfwd.c:572
#14 0xc0672a59 in ether_de

Re: help w/panic under heavy load - 5.4

2005-07-20 Thread Giorgos Keramidas
On 2005-07-19 22:03, Edwin <[EMAIL PROTECTED]> wrote:
> Hi John,
>
> Updated the kernel, same crash under load, looks like m is null, you're right.
>
> Not quite sure where to go from here. I'm happy to do the footwork - just 
> still real
> hazy on the BSD kernel part of things.
>
> panic: m_copym, offset > size of mbuf chain
> KDB: enter: panic
> [thread pid 27 tid 100021 ]
> Stopped at  kdb_enter+0x2b: nop
> db> where
> Tracing pid 27 tid 100021 td 0xc0ed0180
> kdb_enter(c0821a6a) at kdb_enter+0x2b
> panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
> m_copym(0,5dc,5c8,1,14) at m_copym+0x60
> ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
> ip_fastforward(c11fee00) at ip_fastforward+0x6ed
> ether_demux(c0f9,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
> ether_input(c0f9,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
> sis_rxeof(c0f9,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
> sis_poll(c0f9,0,5) at sis_poll+0x7f
> netisr_poll(0) at netisr_poll+0x188
> swi_net(0) at swi_net+0x81
> ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
> fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---

Both tracebacks contain sis_poll() somewhere in the call stack?  Are you
using POLLING?  If yes, can you try without POLLING and see if the crash
can still be reproduced?

- Giorgos



Re: help w/panic under heavy load - 5.4

2005-07-19 Thread Edwin
Hi John,

Updated the kernel, same crash under load, looks like m is null, you're right.

Not quite sure where to go from here. I'm happy to do the footwork - just still
real hazy on the BSD kernel part of things.

Thanks for the help!

/Edwin


Results from KDB/DDB/INVARIANTS/INVARIANT_SUPPORT - same crash (ddb and kgdb
output)


panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 27 tid 100021 ]
Stopped at  kdb_enter+0x2b: nop 
db> where
Tracing pid 27 tid 100021 td 0xc0ed0180
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11fee00) at ip_fastforward+0x6ed
ether_demux(c0f9,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
ether_input(c0f9,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
sis_rxeof(c0f9,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
sis_poll(c0f9,0,5) at sis_poll+0x7f
netisr_poll(0) at netisr_poll+0x188
swi_net(0) at swi_net+0x81
ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> reset






mbsd05# kgdb kernel.debug /tmp/crash/vmcore.1 
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc04611f6 in db_fncall (dummy1=0, dummy2=0, dummy3=43, dummy4=0xc76d19c0 
"�\031m�")
at /usr/src/sys/ddb/db_command.c:531
#2  0xc0461004 in db_command (last_cmdp=0xc08c9264, cmd_table=0x0, 
aux_cmd_tablep=0xc08483b8, aux_cmd_tablep_end=0xc08483d4)
at /usr/src/sys/ddb/db_command.c:349
#3  0xc04610cc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#4  0xc0462c51 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0627af2 in kdb_trap (type=3, code=0, tf=0xc76d1afc)
at /usr/src/sys/kern/subr_kdb.c:468
#6  0xc07b6394 in trap (frame=
  {tf_fs = -949157864, tf_es = -1067319280, tf_ds = -1065222128, tf_edi = 
1, tf_esi = -
1065197495, tf_ebp = -949150916, tf_isp = -949150936, tf_ebx = -949150872, 
tf_edx = 0, tf_e
cx = -1060921344, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1067288461, 
tf_cs = -1065222136, tf_eflags = 646, tf_esp = -949150884, tf_ss = -1067376657})
at /usr/src/sys/i386/i386/trap.c:584
#7  0xc07a69ca in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#8  0xc76d0018 in ?? ()
#9  0xc0620010 in schedcpu () at /usr/src/sys/kern/sched_4bsd.c:461
#10 0xc0611fef in panic (fmt=0xc0820008 "default")
at /usr/src/sys/kern/kern_shutdown.c:550
#11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
at /usr/src/sys/kern/uipc_mbuf.c:385
#12 0xc069b694 in ip_fragment (ip=0xc123180e, m_frag=0xc76d1c38, 
mtu=-1056778752, 
if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
#13 0xc06933c1 in ip_fastforward (m=0xc11fee00) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
#14 0xc0672a59 in ether_demux (ifp=0xc0f9, m=0xc11fee00)
at /usr/src/sys/net/if_ethersubr.c:770
#15 0xc06727f5 in ether_input (ifp=0xc0f9, m=0xc11fee00)
at /usr/src/sys/net/if_ethersubr.c:631
#16 0xc0713507 in sis_rxeof (sc=0xc0f9) at /usr/src/sys/pci/if_sis.c:1636
#17 0xc07137cf in sis_poll (ifp=0xc0f9, cmd=POLL_ONLY, count=0)
at /usr/src/sys/pci/if_sis.c:1769
#18 0xc05f8280 in netisr_poll () at /usr/src/sys/kern/kern_poll.c:384
#19 0xc0679985 in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:338
#20 0xc0600430 in ithread_loop (arg=0xc0ec6580) at 
/usr/src/sys/kern/kern_intr.c:547
#21 0xc05ff8a4 in fork_exit (callout=0xc060030c , arg=0xc0ec6580, 
frame=0xc76d1d48) at /usr/src/sys/kern/kern_fork.c:791
#22 0xc07a6a2c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb) f 12
#12 0xc069b694 in ip_fragment (ip=0xc123180e, m_frag=0xc76d1c38, 
mtu=-1056778752, 
if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
967 m->m_next = m_copy(m0, off, len);
(kgdb) f 11
#11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
at /usr/src/sys/kern/uipc_mbuf.c:385
385 KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain"));
(kgdb) l
380 KASSERT(len >= 0, ("m_copym, negative len %d", len));
381   

Re: help w/panic under heavy load - 5.4

2005-07-19 Thread Edwin
Hi John,

Re-compiled with INVARIANTS/INVARIANT_SUPPORT, included the gdb output below -
same situation (put heavy load on the box - incidentally - small 68 byte UDP
packets) - fwiw.

my buildkernel kept failing on the options DDB (even tried GENERIC kernel) - so
I'm sure I'm doing something wrong there - just didn't figure it out yet.

I wanted to get back to you with the output for the above asap though - I'm
happy to input/output whatever commands you would like if necessary.

Thanks!
/Edwin


mbsd05# kgdb kernel.debug /tmp/crash/vmcore.5
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc060c474 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc060c6f2 in panic (fmt=0xc081e5ad "m_copym, offset > size of mbuf chain") 
at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc063beb8 in m_copym (m=0x0, off0=1500, len=1480, wait=1) at 
/usr/src/sys/kern/uipc_mbuf.c:385
#4  0xc06996b4 in ip_fragment (ip=0xc124400e, m_frag=0xc7692c38, 
mtu=-1056768768, if_hwassist_flags=0, sw_csum=1)
at /usr/src/sys/netinet/ip_output.c:967
#5  0xc069132e in ip_fastforward (m=0xc120e700) at 
/usr/src/sys/netinet/ip_fastfwd.c:572
#6  0xc066cd99 in ether_demux (ifp=0xc0f9, m=0xc120e700) at 
/usr/src/sys/net/if_ethersubr.c:770
#7  0xc066cb1d in ether_input (ifp=0xc0f9, m=0xc120e700) at 
/usr/src/sys/net/if_ethersubr.c:631
#8  0xc0711597 in sis_rxeof (sc=0xc0f9) at /usr/src/sys/pci/if_sis.c:1636
#9  0xc071185f in sis_poll (ifp=0xc0f9, cmd=POLL_ONLY, count=0) at 
/usr/src/sys/pci/if_sis.c:1769
#10 0xc05f2f5c in netisr_poll () at /usr/src/sys/kern/kern_poll.c:384
#11 0xc0673cc5 in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:338
#12 0xc05fb10c in ithread_loop (arg=0xc0ec6480) at 
/usr/src/sys/kern/kern_intr.c:547
#13 0xc05fa580 in fork_exit (callout=0xc05fafe8 , arg=0xc0ec6480, 
frame=0xc7692d48)
at /usr/src/sys/kern/kern_fork.c:791
#14 0xc07a26cc in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb) f 4
#4  0xc06996b4 in ip_fragment (ip=0xc124400e, m_frag=0xc7692c38, 
mtu=-1056768768, if_hwassist_flags=0, sw_csum=1)
at /usr/src/sys/netinet/ip_output.c:967
967 m->m_next = m_copy(m0, off, len);
(kgdb) f 3
#3  0xc063beb8 in m_copym (m=0x0, off0=1500, len=1480, wait=1) at 
/usr/src/sys/kern/uipc_mbuf.c:385
385 KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain"));
(kgdb) quit
mbsd05# 




John Baldwin ([EMAIL PROTECTED]) wrote:
> On Monday 18 July 2005 11:42 pm, Edwin wrote:
> > Hi,
> >
> > I have a recurring (re-producible) panic on the 5.3/5.4 kernels and I would
> > like to ask for some help in tracking it down. :) - it could be some
> > misconfig on my part - but i have tried several different configs of the
> > kernel - ultimately w/ polling on/off, ipfw on/off, ipfastforwarding on/off
> > - although with ipff off - the box still crashes but in a different
> > location - it will even crash w/ GENERIC kernel under heavy load.
> >
> > I'm not quite sure where to look past the below (ie. what variables/etc to
> > present to the list).
> 
> Try turning INVARIANTS and INVARIANT_SUPPORT on in your kernel and see if you 
> can reproduce this.  Also, try to get a traceback in ddb if possible as 
> sometimes ddb gives more reliable stack traces.  It looks like your m is 
> NULL, in which case the KASSERT() on the previous line should fire if 
> INVARIANTS is on.
> 
> -- 
> John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: help w/panic under heavy load - 5.4

2005-07-19 Thread John Baldwin
On Monday 18 July 2005 11:42 pm, Edwin wrote:
> Hi,
>
> I have a recurring (re-producible) panic on the 5.3/5.4 kernels and I would
> like to ask for some help in tracking it down. :) - it could be some
> misconfig on my part - but i have tried several different configs of the
> kernel - ultimately w/ polling on/off, ipfw on/off, ipfastforwarding on/off
> - although with ipff off - the box still crashes but in a different
> location - it will even crash w/ GENERIC kernel under heavy load.
>
> I'm not quite sure where to look past the below (ie. what variables/etc to
> present to the list).

Try turning INVARIANTS and INVARIANT_SUPPORT on in your kernel and see if you 
can reproduce this.  Also, try to get a traceback in ddb if possible as 
sometimes ddb gives more reliable stack traces.  It looks like your m is 
NULL, in which case the KASSERT() on the previous line should fire if 
INVARIANTS is on.
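
To illustrate why the assertion only fires with INVARIANTS, here is a
userland sketch of a compile-time switchable check.  DEMO_INVARIANTS,
DEMO_KASSERT and demo_copy_check() are invented names; the real
KASSERT() panics the kernel, whereas this stand-in only records the
message so it can be inspected.

```c
#include <stddef.h>

#ifndef DEMO_INVARIANTS
#define DEMO_INVARIANTS 1	/* stand-in for "options INVARIANTS" */
#endif

/* Last failure message recorded by DEMO_KASSERT, or NULL if none. */
static const char *demo_last_failure = NULL;

#if DEMO_INVARIANTS
#define DEMO_KASSERT(exp, msg) \
	do { if (!(exp)) demo_last_failure = (msg); } while (0)
#else
#define DEMO_KASSERT(exp, msg) do { } while (0)	/* compiled out */
#endif

/* Mimics the check at the top of the copy loop: m must not be NULL. */
static int
demo_copy_check(const void *m)
{
	demo_last_failure = NULL;
	DEMO_KASSERT(m != NULL, "m_copym, offset > size of mbuf chain");
	return demo_last_failure == NULL;
}
```

With the switch on, demo_copy_check(NULL) returns 0 and leaves the
message in demo_last_failure.  Built with -DDEMO_INVARIANTS=0 the check
disappears entirely, and a NULL m would only be noticed later, when
something dereferences it.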

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org