Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Edwin
Max Laier ([EMAIL PROTECTED]) wrote:
> 
> Edwin, what do you have for CFLAGS?  Can you try to downgrade to "-O" for now 
> so that we have a better chance to get a full view?
> 

Max, 

I have no CFLAGS or COPTFLAGS in /etc/make.conf; this was a basic
kern-developer install on a blank PC. The only thing slightly unusual
about the box I compile on is that it's a dual-processor machine, but I
didn't use any -j# option when compiling the kernel.

The compile proceeds as in the following example output
from make/cc:

$ grep netinet /tmp/make.DEBUG1.output |grep fastfwd
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual
  -fformat-extensions -std=c99 -g -nostdinc -I-  -I. -I/usr/src/sys
  -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/altq
  -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/pf
  -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd
  -I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -D_KERNEL
  -include opt_global.h -fno-common -finline-limit=8000
  --param inline-unit-growth=100 --param large-function-growth=1000
  -mno-align-long-strings -mpreferred-stack-boundary=2  -mno-mmx -mno-3dnow
  -mno-sse -mno-sse2 -ffreestanding -Werror  /usr/src/sys/netinet/ip_fastfwd.c
$ 

Are you referring to the -fformat-extensions, -fno-common, -finline-limit etc.
options as well, or just to -O vs. -O2/-O3/-Os?

If you mean the -f* options too, is there a 'normal' way to disable them,
other than commenting out parts of the makefiles?

FWIW, I think I also had the same problem with the 5.3 release, but I
never tracked it down because I had other things on my plate. So if this
does turn out to be a code problem, I don't believe it's a recent change
(i.e. in the 5.4 timeframe).

It also depends on the load on the box. I'm testing with small UDP
packets (using iperf); if I increase the packet size, I also have to
increase the bandwidth to trigger the panic.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Max Laier
On Sunday 24 July 2005 17:42, Simon 'corecode' Schubert wrote:
> On 24.07.2005, at 16:19, Edwin wrote:
> > (kgdb) f 13
> > #13 0xc068f6e9 in ip_fastforward (m=0xc12e2300) at
> > /usr/src/sys/netinet/ip_fastfwd.c:572
> > (kgdb) i loc
> > ip = (struct ip *) 0xc12f000e
> > m0 = (struct mbuf *) 0xc12f000e
> > ro = {ro_rt = 0xc11ee420, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002',
> > sa_data = "\000\000\300\250\002\005\000\000\000\000\000\000\000"}}
> > dst = (struct sockaddr_in *) 0xc76bfc3c
> > ia = (struct in_ifaddr *) 0x0
> > ifa = (struct ifaddr *) 0x0
> > ifp = (struct ifnet *) 0xc0f91800
> > odest = {s_addr = 84060352}
> > dest = {s_addr = 84060352}
> > sum = 0
> > ip_len = 0
> > error = 84060352
> > hlen = -1057417216
> > mtu = 0
> > __func__ = "ip_fastforward"
>
> error == 84060352 == dest.s_addr
> hlen == -1057417216 == 0xc0f91800 == ifp
>
> > (kgdb) f 12
> > #12 0xc0692b74 in ip_fragment (ip=0xc12f000e, m_frag=0xc76bfc6c,
> > mtu=-1056775680, if_hwassist_flags=0, sw_csum=1)
> > at /usr/src/sys/netinet/ip_output.c:967
> > 967 m->m_next = m_copy(m0, off, len);
> > (kgdb) i loc
> > mhip = (struct ip *) 0xc102e240
> > m = (struct mbuf *) 0xc102e200
> > mhlen = 20
> > error = 0
> > hlen = 20
> > len = 1480
> > off = 1500
> > m0 = (struct mbuf *) 0xc12e2300
> > firstlen = 1480
> > mnext = (struct mbuf **) 0xc12e2304
> > nfrags = 1
>
> mtu (parameter) == -1056775680 == 0xc102e200 == m
>
> your stack (or gdb) seems seriously broken

Not necessarily.  This could well be an effect of higher optimization levels.

Edwin, what do you have for CFLAGS?  Can you try to downgrade to "-O" for now 
so that we have a better chance to get a full view?

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News




Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Simon 'corecode' Schubert

On 24.07.2005, at 16:19, Edwin wrote:

(kgdb) f 13
#13 0xc068f6e9 in ip_fastforward (m=0xc12e2300)
    at /usr/src/sys/netinet/ip_fastfwd.c:572
(kgdb) i loc
ip = (struct ip *) 0xc12f000e
m0 = (struct mbuf *) 0xc12f000e
ro = {ro_rt = 0xc11ee420, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002',
    sa_data = "\000\000\300\250\002\005\000\000\000\000\000\000\000"}}
dst = (struct sockaddr_in *) 0xc76bfc3c
ia = (struct in_ifaddr *) 0x0
ifa = (struct ifaddr *) 0x0
ifp = (struct ifnet *) 0xc0f91800
odest = {s_addr = 84060352}
dest = {s_addr = 84060352}
sum = 0
ip_len = 0
error = 84060352
hlen = -1057417216
mtu = 0
__func__ = "ip_fastforward"


error == 84060352 == dest.s_addr
hlen == -1057417216 == 0xc0f91800 == ifp
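Decoding the first value makes this concrete. A minimal sketch, assuming kgdb on this i386 box prints in_addr.s_addr as one little-endian host integer while the bytes in memory are in network order; s_addr_to_dotted() and s_addr_is() are hypothetical helpers, not kernel or libc functions. For 84060352 the result is 192.168.2.5, so the slot kgdb labels error holds a destination address, which is not a plausible errno value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * On i386 the four bytes of s_addr sit in memory in network order, but kgdb
 * prints them as one little-endian integer.  Taking the bytes
 * least-significant first therefore recovers the dotted quad, whatever the
 * endianness of the machine running this code.
 */
static void s_addr_to_dotted(uint32_t printed, char *buf, size_t len)
{
    assert(len >= 16);  /* room for "255.255.255.255" plus NUL */
    snprintf(buf, len, "%u.%u.%u.%u",
             (unsigned)(printed & 0xffu),
             (unsigned)((printed >> 8) & 0xffu),
             (unsigned)((printed >> 16) & 0xffu),
             (unsigned)((printed >> 24) & 0xffu));
}

/* Convenience check: does a kgdb-printed s_addr decode to this address? */
static int s_addr_is(uint32_t printed, const char *dotted)
{
    char buf[16];
    s_addr_to_dotted(printed, buf, sizeof buf);
    return strcmp(buf, dotted) == 0;
}
```

The same decoding turns ip_src = 67479744 from the later "p *ip" output into 192.168.5.4.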


(kgdb) f 12
#12 0xc0692b74 in ip_fragment (ip=0xc12f000e, m_frag=0xc76bfc6c,
mtu=-1056775680, if_hwassist_flags=0, sw_csum=1)
    at /usr/src/sys/netinet/ip_output.c:967
967 m->m_next = m_copy(m0, off, len);
(kgdb) i loc
mhip = (struct ip *) 0xc102e240
m = (struct mbuf *) 0xc102e200
mhlen = 20
error = 0
hlen = 20
len = 1480
off = 1500
m0 = (struct mbuf *) 0xc12e2300
firstlen = 1480
mnext = (struct mbuf **) 0xc12e2304
nfrags = 1


mtu (parameter) == -1056775680 == 0xc102e200 == m

your stack (or gdb) seems seriously broken
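The two pointer equivalences can be checked mechanically: kgdb shows hlen and mtu as signed ints, and converting them to uint32_t exposes the raw 32-bit pattern. This is a userland sketch with hypothetical helper names; it only confirms the arithmetic, not why the values coincide. Matching patterns like these are also what you would expect if the optimizer had reused one register or stack slot for several variables, so on their own they don't prove the stack is corrupted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * kgdb prints an int local in signed decimal.  Converting it to uint32_t is
 * well defined (the value is reduced modulo 2^32) and yields the raw 32-bit
 * pattern, which can be compared with a pointer kgdb printed in hex.
 */
static uint32_t raw_bits(int32_t printed)
{
    return (uint32_t)printed;
}

/* True if a signed local and a pointer hold the same 32-bit pattern. */
static bool same_bits(int32_t printed, uint32_t pointer)
{
    return raw_bits(printed) == pointer;
}

/* Print one comparison in the style of the notes above. */
static void show(const char *local, int32_t printed,
                 const char *ptrname, uint32_t pointer)
{
    assert(local != NULL && ptrname != NULL);
    printf("%s == %ld == 0x%08lx %s %s\n", local, (long)printed,
           (unsigned long)raw_bits(printed),
           same_bits(printed, pointer) ? "==" : "!=", ptrname);
}
```

For example, show("hlen", -1057417216, "ifp", 0xc0f91800u) prints "hlen == -1057417216 == 0xc0f91800 == ifp", and the mtu/m pair gives 0xc102e200 the same way.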

cheers
  simon

--
Serve - BSD +++  RENT this banner advert  +++ASCII Ribbon   /"\
Work - Mac  +++  space for low $$$ NOW!1  +++  Campaign \ /
Party Enjoy Relax   |   http://dragonflybsd.org  Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz   Mail + News   / \





Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Edwin
New kernel: ident D1-0723 (same as D1-0722, but with the IPFIREWALL* options removed).

Here are the same traces you asked for previously.

Thanks again,
/Edwin



kgdb kernel.debug /usr/local/STORAGE/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc0460ef6 in db_fncall (dummy1=0, dummy2=0, dummy3=43, dummy4=0xc76bf9f4 "(úkÇ")
    at /usr/src/sys/ddb/db_command.c:531
#2  0xc0460d04 in db_command (last_cmdp=0xc08be624, cmd_table=0x0, aux_cmd_tablep=0xc083e324,
    aux_cmd_tablep_end=0xc083e340) at /usr/src/sys/ddb/db_command.c:349
#3  0xc0460dcc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#4  0xc0462951 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc06277f2 in kdb_trap (type=3, code=0, tf=0xc76bfb30) at /usr/src/sys/kern/subr_kdb.c:468
#6  0xc07ad874 in trap (frame=
      {tf_fs = -949288936, tf_es = -1067319280, tf_ds = -1065287664, tf_edi = 1,
      tf_esi = -1065233792, tf_ebp = -949224592, tf_isp = -949224612,
      tf_ebx = -949224548, tf_edx = 0, tf_ecx = -1060921344, tf_eax = 18,
      tf_trapno = 3, tf_err = 0, tf_eip = -1067289229, tf_cs = -1065287672,
      tf_eflags = 658, tf_esp = -949224560, tf_ss = -1067377425})
    at /usr/src/sys/i386/i386/trap.c:584
#7  0xc079deaa in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#8  0xc76b0018 in ?? ()
#9  0xc0620010 in sched_runnable () at /usr/src/sys/kern/sched_4bsd.c:641
#10 0xc0611cef in panic (fmt=0xc081d280 "m_copym, offset > size of mbuf chain")
    at /usr/src/sys/kern/kern_shutdown.c:550
#11 0xc064172c in m_copym (m=0x0, off0=1500, len=1480, wait=1)
    at /usr/src/sys/kern/uipc_mbuf.c:385
#12 0xc0692b74 in ip_fragment (ip=0xc12f000e, m_frag=0xc76bfc6c, mtu=-1056775680,
    if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
#13 0xc068f6e9 in ip_fastforward (m=0xc12e2300)
    at /usr/src/sys/netinet/ip_fastfwd.c:572
#14 0xc0672759 in ether_demux (ifp=0xc0f9, m=0xc12e2300)
    at /usr/src/sys/net/if_ethersubr.c:770
#15 0xc06724f5 in ether_input (ifp=0xc0f9, m=0xc12e2300)
    at /usr/src/sys/net/if_ethersubr.c:631
#16 0xc070a9e7 in sis_rxeof (sc=0xc0f9) at /usr/src/sys/pci/if_sis.c:1636
#17 0xc070ae6f in sis_intr (arg=0xc0f9) at /usr/src/sys/pci/if_sis.c:1841
#18 0xc0600130 in ithread_loop (arg=0xc0ec6880)
    at /usr/src/sys/kern/kern_intr.c:547
#19 0xc05ff5a4 in fork_exit (callout=0xc06c , arg=0xc0ec6880, frame=0xc76bfd48)
    at /usr/src/sys/kern/kern_fork.c:791
#20 0xc079df0c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb) f 13
#13 0xc068f6e9 in ip_fastforward (m=0xc12e2300)
    at /usr/src/sys/netinet/ip_fastfwd.c:572
572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
(kgdb) l
567 m->m_pkthdr.csum_flags |= CSUM_IP;
568 /*
569  * ip_fragment expects ip_len and ip_off in host byte
570  * order but returns all packets in network byte order
571  */
572 if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
573     (~ifp->if_hwassist & CSUM_DELAY_IP))) {
574     goto drop;
575 }
576 KASSERT(m != NULL, ("null mbuf and no error"));
(kgdb) i loc
ip = (struct ip *) 0xc12f000e
m0 = (struct mbuf *) 0xc12f000e
ro = {ro_rt = 0xc11ee420, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002',
    sa_data = "\000\000\300\250\002\005\000\000\000\000\000\000\000"}}
dst = (struct sockaddr_in *) 0xc76bfc3c
ia = (struct in_ifaddr *) 0x0
ifa = (struct ifaddr *) 0x0
ifp = (struct ifnet *) 0xc0f91800
odest = {s_addr = 84060352}
dest = {s_addr = 84060352}
sum = 0
ip_len = 0
error = 84060352
hlen = -1057417216
mtu = 0
__func__ = "ip_fastforward"
(kgdb) p *ip
$1 = {ip_hl = 5, ip_v = 4, ip_tos = 0 '\0', ip_len = 10240, ip_id = 33436, ip_off = 0, ip_ttl = 64 '@',
  ip_p = 17 '\021', ip_sum = 59733, ip_src = {s_addr = 67479744}, ip_dst = {s_addr = 84060352}}
(kgdb) p *m
$2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc12f000e "E", mh_len = 40, mh_flags = 3,
    mh_type = 1}, M_dat = {MH = {MH_pkthdr = {rcvif = 0xc0f9, len = 40, header = 0x0, csum_flags = 769,
        csum_data = 0, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = {ext_buf = 0xc12f "", ext_free = 0,
          ext_args = 

Re: help w/panic under heavy load - 5.4

2005-07-24 Thread Max Laier
On Sunday 24 July 2005 04:38, Edwin wrote:
> If I understand correctly...(albeit an overly brief understanding :))
>
> 1. ethernet packet comes in - stuck into an mbuf
> 2. ether_demux calls ip_fastforward passing the mbuf struct
> 3. mbuf struct is copied/munged into ip struct by mtod
> 4. ntohs is called to change ip->ip_len to host byte order
>   incidentally - ip_len should be set to ntohs(ip->ip_len)
>   as well - it seems like neither one of those calls worked?
> 5. also - the call to set hlen to ip->ip_hl <<2 didn't work out well
>   either - right? since hlen = -1057417216, and I think it's
>   supposed to be 20 (5*4) - am I correct there as well?

Points 4 and 5 are strange but not very significant.  Given that we got
through the initial sanity checks and that neither value is used further down,
this might just be an optimization effect.  You could try marking ip_len as
volatile.
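A sketch of what marking the locals volatile amounts to, as a userland stand-in for the top of ip_fastforward(). parse_ip_header() and struct hdr_info are hypothetical, and the header is read from raw bytes rather than through struct ip; in the kernel the change would just be adding volatile to the existing declarations. Because every access to a volatile object has to be performed as written, the compiler has to keep these values in memory, so kgdb's "i loc" has a better chance of showing what was actually computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values the top of ip_fastforward() derives from the IP header. */
struct hdr_info {
    int hlen;          /* header length in bytes: ip_hl << 2 */
    uint16_t ip_len;   /* total length, host byte order */
};

/*
 * Hypothetical stand-in: pkt points at the start of an IPv4 header.
 * volatile forces each store and load of these locals to actually happen,
 * which in practice keeps them in their own memory slots instead of being
 * folded into registers that are later reused for other variables.
 */
static struct hdr_info parse_ip_header(const uint8_t *pkt)
{
    assert(pkt != NULL);
    volatile int hlen = (pkt[0] & 0x0f) << 2;   /* low nibble of byte 0 is ip_hl */
    volatile uint16_t ip_len =                  /* bytes 2..3, big-endian on the wire */
        (uint16_t)((pkt[2] << 8) | pkt[3]);
    struct hdr_info info = { hlen, ip_len };
    return info;
}
```

For the 40-byte packet in the trace the header starts 0x45 0x00 0x00 0x28, which gives hlen 20 and ip_len 40.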

> 6. due to ip->ip_len being in network byte order still a little
>   gremlin helps us to think we have a 10240 byte packet and we
>   need to fragment it...
> 7. in ip_fragment - ip->ip_len is still 10240 - so we assume that we
>   need to make several fragments - however, the mbuf is correct
>   (len = 40)
> 8. in ip_fragment - to create the 'second' fragment, we try to copy
>   1480 bytes @ offset 1500 out of the mbuf that only has a valid
>   data length of 40-bytes???

That's what happens, yes.
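Steps 6 to 8 can be replayed in userland. This is a simplified model rather than the kernel code: fake_mbuf, swap16(), copym_offset_ok() and first_bad_fragment() are hypothetical names that only follow the shape of ip_fragment() and of the offset walk in m_copym(), and mtu is taken to be 1500, which is consistent with len = 1480 in frame 12. Byte-swapping the 10240 (0x2800) that kgdb showed gives 40 (0x0028), the mbuf's real length. Fed the unswapped value, the loop asks m_copym() for offset 1500 of a 40-byte chain on the very first extra fragment (nfrags = 1), which matches frames 11 and 12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: only the fields the walk below needs. */
struct fake_mbuf {
    int m_len;
    struct fake_mbuf *m_next;
};

/* Swap the two bytes of a 16-bit value (what ntohs() does on i386). */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * Follows the shape of the offset walk at the start of m_copym(): skip whole
 * mbufs until off lands inside one.  Returns false where the kernel would
 * hit "m_copym, offset > size of mbuf chain" (running off the end, m == NULL).
 */
static bool copym_offset_ok(const struct fake_mbuf *m, int off)
{
    while (off > 0) {
        if (m == NULL)
            return false;
        if (off < m->m_len)
            break;
        off -= m->m_len;
        m = m->m_next;
    }
    return true;
}

/*
 * Simplified fragment loop in the spirit of ip_fragment(): the first fragment
 * keeps hlen + len bytes and every later fragment starts at off.  Returns
 * nfrags at the first m_copy() whose offset is past the chain, or -1 if
 * every offset is valid.
 */
static int first_bad_fragment(int ip_len, int hlen, int mtu,
                              const struct fake_mbuf *chain)
{
    assert(hlen >= 0 && mtu > hlen);
    int len = (mtu - hlen) & ~7;   /* fragment payload, multiple of 8 */
    int nfrags = 1;
    for (int off = hlen + len; off < ip_len; off += len, nfrags++) {
        if (!copym_offset_ok(chain, off))
            return nfrags;
    }
    return -1;
}
```

With the swapped length of 40 the loop never runs, since the first offset (1500) is already past ip_len.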

> Are we really looking for the cause of ip->ip_len not being in the correct
> order @ the right time then? - in that case - there's two possibilities
> that I see - and I don't think that ntohs not working (1) is too realistic,
> so I would suppose we are looking for what flipped it in the first place?
>
>   1. either ntohs didn't work for some reason, or
>   2. it was already in host order, and the ntohs call flipped it back to
>   network order

Neither seems very likely.  My guess is that *something* along the way is
messing things up; pfil is the only suspect I have right now.

> If you feel that it's a ipfw/ipfil issue - I can easily take IPFIREWALL*
> options out of the kernel and build a new one - just give me about 15
> minutes.

Yes, please, and make sure it isn't loaded as a module either.
