Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-29 Thread c0re
And one more workaround:

On 7.2

ifconfig mtu
em0: flags=8843 metric 0 mtu 1500
gif0: flags=8051 metric 0 mtu 1280

Apache running on em0 alias address.

From a remote machine I tried to download a 10 KB file while watching tcpdump.
No kernel panic. However, 7.2 does not respond to the GET request: the Apache
access log shows that it answered with OK, but there are no TCP push packets
from 7.2, so the file does not download. But at least no kernel panic :)
Then I set the MTU on em0 to 1200 and everything is okay: the file downloads
successfully.

On 7.3 with the same MTUs (1500 and 1280) the kernel panics when I try to
download the file. When I configure MTU 1200 on em0 there is no kernel panic
and the file downloads successfully.

So it looks like the kernel panic is related to MTU discovery.
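
To make the MTU relationship concrete, here is a rough sketch in Python. It is not from the thread: the helper names are hypothetical, the header sizes assume IPv4 and TCP without options, and it assumes the connection's segment size was derived from em0's MTU while the forwarded packets actually leave through gif0 (MTU 1280).

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits(segment_mtu, path_mtu):
    """True if a full-sized packet built for segment_mtu fits through path_mtu."""
    packet_len = mss_for_mtu(segment_mtu) + IPV4_HEADER + TCP_HEADER
    return packet_len <= path_mtu


# em0 MTU values tried in the message above, against gif0's MTU of 1280.
for em0_mtu in (1500, 1200):
    print(em0_mtu, mss_for_mtu(em0_mtu), fits(em0_mtu, 1280))
```

Under these assumptions a full-sized packet built for MTU 1500 does not fit through the tunnel, while one built for MTU 1200 does, which matches the behaviour described above.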

Hope this helps to resolve the problem.

Cheers.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-28 Thread c0re
I also tried using a GIF interface instead of GRE: same result and the same
backtrace in kgdb, a kernel panic.


Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-23 Thread c0re
I tried RELENG_7_3, RELENG_7, RELENG_8_0 and RELENG_8; the result is the same:
a kernel panic.

This is the backtrace from RELENG_8:

host# kgdb kernel.debug /var/crash/vmcore.7
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal double fault:
eip = 0xc0933a38
esp = 0xe460bfd8
ebp = 0xe460c068
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
Uptime: 4m42s
Physical memory: 1011 MB
Dumping 68 MB: 53 37 21 5

Reading symbols from /boot/kernel/if_gre.ko...Reading symbols from
/boot/kernel/if_gre.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_gre.ko
#0  doadump () at pcpu.h:246
246 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:246
#1  0xc0891997 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xc0891c89 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:579
#3  0xc0b4525b in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:971
#4  0xc0933a38 in flowtable_lookup (ft=0xc4478000, ssa=0xe460c108,
dsa=0xe460c088, fibnum=0, flags=2050) at /usr/src/sys/net/flowtable.c:1115
#5  0xc093428c in flowtable_lookup_mbuf (ft=0xc4478000, m=0xc4470e00, af=2)
at /usr/src/sys/net/flowtable.c:607
#6  0xc09b141f in ip_output (m=0xc4470e00, opt=0x0, ro=0x0, flags=0,
imo=0x0, inp=0xc4c5d44c) at /usr/src/sys/netinet/ip_output.c:164
#7  0xc09baba6 in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1187
#8  0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#9  0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#10 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#11 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#12 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#13 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#14 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#15 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#16 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#17 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#18 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#19 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#20 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#21 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#22 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#23 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#24 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#25 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#26 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#27 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#28 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#29 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#30 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#31 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#32 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#33 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#34 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#35 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#36 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#37 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#38 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#39 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#40 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#41 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#42 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#43 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#44 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#45 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
/usr/src/sys/netinet/tcp_output.c:1248
#46 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload
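
The repeating tcp_output / tcp_mtudisc pattern in a backtrace like this can also be spotted mechanically. Below is a minimal sketch, assuming the kgdb "bt" output has been saved as text; the function names and the regular expression are my own, and the sample is a few frames copied from the backtrace above.

```python
import re
from collections import Counter

# Matches "#7  0xc09baba6 in tcp_output (" as well as "#0  doadump (".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(")


def frame_functions(backtrace):
    """Return the function name of every frame line, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def repeated_pairs(names):
    """Count adjacent (callee, caller) pairs that occur more than once."""
    pairs = Counter(zip(names, names[1:]))
    return {pair: count for pair, count in pairs.items() if count > 1}


sample = """\
#6  0xc09b141f in ip_output (m=0xc4470e00, opt=0x0, ro=0x0, flags=0,
#7  0xc09baba6 in tcp_output (tp=0xc4c5f768) at
#8  0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#9  0xc09bac9b in tcp_output (tp=0xc4c5f768) at
#10 0xc09bd5a8 in tcp_mtudisc (inp=0xc4c5d44c, errno=0) at tcp_offload.h:282
#11 0xc09bac9b in tcp_output (tp=0xc4c5f768) at
"""

print(repeated_pairs(frame_functions(sample)))
```

Any pair that shows up many times, with both directions present, points at mutual recursion rather than an ordinary deep call chain.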

Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-22 Thread c0re
Bjoern A. Zeeb, I sent you an e-mail with a link to download the kernel and dump.

I also reproduced the kernel panic on virtual machines.

You need 2 FreeBSD machines for the GRE tunnel.
The first one just needs a GRE tunnel, like:

ifconfig em0 inet 10.0.0.1  netmask 255.255.255.0
ifconfig gre0 create
ifconfig gre0 inet 192.168.0.1 192.168.0.2 tunnel 10.10.0.1 10.10.0.2 netmask 255.255.255.252 link1 up
route add 10.10.0.3/32 10.10.0.2

This machine will also act as the client that connects to the remote one, so
we need to install some browser like lynx.

Second machine:
Default installation of FreeBSD 7.3 with "src" checked in distributions.
After the install, recompile the kernel (mainly for IPFIREWALL_FORWARD support):

# Local additions
options IPFIREWALL                      #firewall
options IPFIREWALL_VERBOSE              #enable logging to syslogd(8)
options IPFIREWALL_VERBOSE_LIMIT=1000   #limit verbosity
options IPFIREWALL_FORWARD              #packet destination changes
options IPDIVERT                        #divert sockets
options IPSTEALTH                       #support for stealth forwarding
options DUMMYNET
device  carp

And make kernel KERNCONF=MYKERNEL

Reboot and configure the network and firewall:

ifconfig em0 inet 10.10.0.2  netmask 255.255.255.0
ifconfig em0 alias inet 10.0.0.3 netmask 255.255.255.255
ifconfig gre0 create
ifconfig gre0 inet 192.168.0.2 192.168.0.1 tunnel 10.0.0.2 10.0.0.1 netmask 255.255.255.252 link1 up

ipfw add 00100 fwd 192.168.0.1 icmp from 10.0.0.3 to any out via em0
ipfw add 00200 fwd 192.168.0.1 tcp from 10.0.0.3 80 to any out via em0
ipfw add 00300 fwd 192.168.0.1 tcp from 10.0.0.3 443 to any out via em0
ipfw add 00400 allow ip from any to any

At this point you can check an ICMP ping from 10.0.0.1 to 10.0.0.3 and run
ipfw show to see that the ipfw fwd counters are increasing.
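
Checking the counters can be scripted as well. The following is only a sketch: it assumes "ipfw show" prints one rule per line as rule number, packet count, byte count and rule text, and the sample text with its counter values is made up for illustration, not captured from the test machines.

```python
def parse_ipfw_show(output):
    """Parse lines of the form '<rule> <packets> <bytes> <rule text>'."""
    rules = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4 and all(p.isdigit() for p in parts[:3]):
            number, packets, nbytes, body = parts
            rules.append((int(number), int(packets), int(nbytes), body))
    return rules


def idle_fwd_rules(rules):
    """Rule numbers of fwd rules that have not matched any packet yet."""
    return [number for number, packets, _, body in rules
            if body.startswith("fwd ") and packets == 0]


# Illustrative sample only; the counters are invented.
sample = """\
00100    4   336 fwd 192.168.0.1 icmp from 10.0.0.3 to any out via em0
00200    0     0 fwd 192.168.0.1 tcp from 10.0.0.3 80 to any out via em0
00300    0     0 fwd 192.168.0.1 tcp from 10.0.0.3 443 to any out via em0
00400   57  6021 allow ip from any to any
"""

print(idle_fwd_rules(parse_ipfw_show(sample)))
```

After the ping test, rule 100 should no longer appear in the idle list; the TCP rules only start counting once an HTTP or HTTPS connection is made.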

Next we need some TCP service. I used apache2.
So in the port /usr/ports/www/apache20 run make install clean.
Set apache20_enable="YES" in rc.conf.
In /usr/local/etc/apache2/httpd.conf:
change "Listen 80" to "Listen 10.0.0.3:80"
and add a virtual host with a 10 KB index.html:
NameVirtualHost 10.0.0.3:80
<VirtualHost 10.0.0.3:80>
   DocumentRoot /usr/local/www/test
</VirtualHost>

mkdir /usr/local/www/test
dd if=/dev/random of=/usr/local/www/test/index.html bs=10k count=1

/usr/local/etc/rc.d/apache2 start

At this point everything is ready to panic :)
From the first machine I try lynx http://10.0.0.3/

On the second machine I see a kernel panic.

When I was testing, I got no panic the first time, so I generated Apache SSL
certs and edited ssl.conf. But the next time I built the same configuration, not
only port 443 but also port 80 connections caused a kernel panic.


Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-21 Thread Bjoern A. Zeeb

On Tue, 20 Apr 2010, pluknet wrote:


On 20 April 2010 15:48, John Baldwin  wrote:

On Tuesday 20 April 2010 2:53:16 am c0re wrote:

Hello All!
I've upgraded freebsd from 7.0 to 7.3 and all was good until I tryed to
configure gre interface and use ipfw fwd.
I'm actually does not know what was the point of failure in my
configuration.

[ some details snipped ]

It worked about one week and then I made some configuration changes:
added gre interface and 2 aliases:

# cat /etc/rc.conf |grep
ifconfig_xl0="inet 192.168.0.10  netmask 255.255.255.0"
ifconfig_xl0_alias0="192.168.0.11 netmask 255.255.255.255"
ifconfig_xl0_alias1="192.168.0.12 netmask 255.255.255.255"
cloned_interfaces="gre0"
ifconfig_gre0="inet 192.168.250.6 192.168.250.5 tunnel 192.168.0.12
192.168.200.15 netmask 255.255.255.252 link1 up"

and

# cat /etc/rc.local
#!/bin/sh
ipfw add fwd 192.168.250.5 icmp from 192.168.0.11 to any out via xl0
ipfw add fwd 192.168.250.5 tcp from 192.168.0.11 443 to any out via xl0
ipfw add allow ip from any to any

# ifconfig gre0
gre0: flags=b050 metric 0 mtu
1476
        tunnel inet 192.168.0.12 --> 192.168.200.15
        inet 192.168.250.6 --> 192.168.250.5 netmask 0xfffc

I shutted down gre interface to prevent requests via gre to buggy IP.

The main idea of such configurations was: fwd all connections to https to
192.168.0.1 via gre interface.
And also I made apache configurations to make it listen on 192.168.0.11 too.

And make some tests: ping 192.168.0.11 - was fine, goes via gre. Telnet to
192.168.0.11  443 was fine too. Then I tryed to make browser https
connection to 192.168.0.11. Apache showed me certificate warning and I
accepted, then in browser nothing happened, it was trying to open page. But
server got kernel panic at that moment.

At first time I thought that it was some power failure, I tryed 2 more times
and got same behaviour.

So https works without kernel panic via 192.168.0.10 address but kernel
panics when I try do https via 192.168.0.11 address that source-forwarded
via gre.


Looks like the TCP output path got stuck in an infinite recursion loop until
it exhausted the kernel stack:


# cd /usr/obj/usr/src/sys/MYKERNEL
# kgdb kernel.debug /var/crash/vmcore.2
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal double fault:
eip = 0xc08e3ba3
esp = 0xccf6dfc4
ebp = 0xccf6e274
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
Uptime: 7m14s
Physical memory: 235 MB
Dumping 35 MB: 20 4

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
/boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
Reading symbols from /boot/kernel/if_gre.ko...Reading symbols from
/boot/kernel/if_gre.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_gre.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from
/boot/kernel/linux.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linux.ko
#0  doadump () at pcpu.h:196
196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc07f2857 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07f2b29 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0a7ea2b in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:983
#4  0xc08e3ba3 in ipfw_chk (args=0xccf6e28c) at
/usr/src/sys/netinet/ip_fw2.c:2465
#5  0xc08e6ce1 in ipfw_check_out (arg=0x0, m0=0xccf6e390, ifp=0xc25c5c00,
dir=2, inp=0xc28ba708) at /usr/src/sys/netinet/ip_fw_pfil.c:248
#6  0xc08a1968 in pfil_run_hooks (ph=0xc0c55240, mp=0xccf6e420,
ifp=0xc25c5c00, dir=2, inp=0xc28ba708) at /usr/src/sys/net/pfil.c:78
#7  0xc08eb6f2 in ip_output (m=0xc2710b00, opt=0x0, ro=0xccf6e3f4, flags=0,
imo=0x0, inp=0xc28ba708) at /usr/src/sys/netinet/ip_output.c:443
#8  0xc08f4016 in tcp_output (tp=0xc25b2570) at
/usr/src/sys/netinet/tcp_output.c:1134


[twiddle]


#47 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
#48 0xc08f4105 in tcp_output (tp=0xc25b2570) at
/usr/src/sys/netinet/tcp_output.c:1195
#49 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
---Type <return> to continue, or q <return> to quit---
#50 0xc08f4105 in tcp_output (tp=0xc25b2570) at
/usr/src/sys/netinet/tcp_output.c:1195
#51 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
#52 0xc08f4105 in tcp_output (tp=0xc25b2570) at
/usr/src/sys/netinet/tcp_output.c:1195
#53 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
#54 0xc08f4105 in tcp_output (tp=0xc25b2570) at
/usr/src/sys/netinet/tcp_output.c:1195
#55 0xc08fdcf8 in tcp_

Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-20 Thread pluknet
On 20 April 2010 15:48, John Baldwin  wrote:
> On Tuesday 20 April 2010 2:53:16 am c0re wrote:
>> Hello All!
>> I've upgraded freebsd from 7.0 to 7.3 and all was good until I tryed to
>> configure gre interface and use ipfw fwd.
>> I'm actually does not know what was the point of failure in my
>> configuration.
>>
>> [ some details snipped ]
>>
>> It worked about one week and then I made some configuration changes:
>> added gre interface and 2 aliases:
>>
>> # cat /etc/rc.conf |grep
>> ifconfig_xl0="inet 192.168.0.10  netmask 255.255.255.0"
>> ifconfig_xl0_alias0="192.168.0.11 netmask 255.255.255.255"
>> ifconfig_xl0_alias1="192.168.0.12 netmask 255.255.255.255"
>> cloned_interfaces="gre0"
>> ifconfig_gre0="inet 192.168.250.6 192.168.250.5 tunnel 192.168.0.12
>> 192.168.200.15 netmask 255.255.255.252 link1 up"
>>
>> and
>>
>> # cat /etc/rc.local
>> #!/bin/sh
>> ipfw add fwd 192.168.250.5 icmp from 192.168.0.11 to any out via xl0
>> ipfw add fwd 192.168.250.5 tcp from 192.168.0.11 443 to any out via xl0
>> ipfw add allow ip from any to any
>>
>> # ifconfig gre0
>> gre0: flags=b050 metric 0 mtu
>> 1476
>>         tunnel inet 192.168.0.12 --> 192.168.200.15
>>         inet 192.168.250.6 --> 192.168.250.5 netmask 0xfffc
>>
>> I shutted down gre interface to prevent requests via gre to buggy IP.
>>
>> The main idea of such configurations was: fwd all connections to https to
>> 192.168.0.1 via gre interface.
>> And also I made apache configurations to make it listen on 192.168.0.11 too.
>>
>> And make some tests: ping 192.168.0.11 - was fine, goes via gre. Telnet to
>> 192.168.0.11  443 was fine too. Then I tryed to make browser https
>> connection to 192.168.0.11. Apache showed me certificate warning and I
>> accepted, then in browser nothing happened, it was trying to open page. But
>> server got kernel panic at that moment.
>>
>> At first time I thought that it was some power failure, I tryed 2 more times
>> and got same behaviour.
>>
>> So https works without kernel panic via 192.168.0.10 address but kernel
>> panics when I try do https via 192.168.0.11 address that source-forwarded
>> via gre.
>
> Looks like the TCP output path got stuck in an infinite recursion loop until
> it exhausted the kernel stack:
>
>> # cd /usr/obj/usr/src/sys/MYKERNEL
>> # kgdb kernel.debug /var/crash/vmcore.2
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "i386-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>>
>> Fatal double fault:
>> eip = 0xc08e3ba3
>> esp = 0xccf6dfc4
>> ebp = 0xccf6e274
>> cpuid = 0; apic id = 00
>> panic: double fault
>> cpuid = 0
>> Uptime: 7m14s
>> Physical memory: 235 MB
>> Dumping 35 MB: 20 4
>>
>> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
>> /boot/kernel/acpi.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/acpi.ko
>> Reading symbols from /boot/kernel/if_gre.ko...Reading symbols from
>> /boot/kernel/if_gre.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/if_gre.ko
>> Reading symbols from /boot/kernel/linux.ko...Reading symbols from
>> /boot/kernel/linux.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/linux.ko
>> #0  doadump () at pcpu.h:196
>> 196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>> (kgdb) bt
>> #0  doadump () at pcpu.h:196
>> #1  0xc07f2857 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
>> #2  0xc07f2b29 in panic (fmt=Variable "fmt" is not available.
>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>> #3  0xc0a7ea2b in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:983
>> #4  0xc08e3ba3 in ipfw_chk (args=0xccf6e28c) at
>> /usr/src/sys/netinet/ip_fw2.c:2465
>> #5  0xc08e6ce1 in ipfw_check_out (arg=0x0, m0=0xccf6e390, ifp=0xc25c5c00,
>> dir=2, inp=0xc28ba708) at /usr/src/sys/netinet/ip_fw_pfil.c:248
>> #6  0xc08a1968 in pfil_run_hooks (ph=0xc0c55240, mp=0xccf6e420,
>> ifp=0xc25c5c00, dir=2, inp=0xc28ba708) at /usr/src/sys/net/pfil.c:78
>> #7  0xc08eb6f2 in ip_output (m=0xc2710b00, opt=0x0, ro=0xccf6e3f4, flags=0,
>> imo=0x0, inp=0xc28ba708) at /usr/src/sys/netinet/ip_output.c:443
>> #8  0xc08f4016 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1134
>> #9  0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #10 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #11 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #12 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #13 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #14 0xc08f4105 in tcp_output (tp=0xc

Re: FreeBSD 7.3, reboot after panic: double fault

2010-04-20 Thread John Baldwin
On Tuesday 20 April 2010 2:53:16 am c0re wrote:
> Hello All!
> I've upgraded freebsd from 7.0 to 7.3 and all was good until I tryed to
> configure gre interface and use ipfw fwd.
> I'm actually does not know what was the point of failure in my
> configuration.
>
> [ some details snipped ]
> 
> It worked about one week and then I made some configuration changes:
> added gre interface and 2 aliases:
> 
> # cat /etc/rc.conf |grep
> ifconfig_xl0="inet 192.168.0.10  netmask 255.255.255.0"
> ifconfig_xl0_alias0="192.168.0.11 netmask 255.255.255.255"
> ifconfig_xl0_alias1="192.168.0.12 netmask 255.255.255.255"
> cloned_interfaces="gre0"
> ifconfig_gre0="inet 192.168.250.6 192.168.250.5 tunnel 192.168.0.12
> 192.168.200.15 netmask 255.255.255.252 link1 up"
> 
> and
> 
> # cat /etc/rc.local
> #!/bin/sh
> ipfw add fwd 192.168.250.5 icmp from 192.168.0.11 to any out via xl0
> ipfw add fwd 192.168.250.5 tcp from 192.168.0.11 443 to any out via xl0
> ipfw add allow ip from any to any
> 
> # ifconfig gre0
> gre0: flags=b050 metric 0 mtu
> 1476
> tunnel inet 192.168.0.12 --> 192.168.200.15
> inet 192.168.250.6 --> 192.168.250.5 netmask 0xfffc
> 
> I shutted down gre interface to prevent requests via gre to buggy IP.
> 
> The main idea of such configurations was: fwd all connections to https to
> 192.168.0.1 via gre interface.
> And also I made apache configurations to make it listen on 192.168.0.11 too.
> 
> And make some tests: ping 192.168.0.11 - was fine, goes via gre. Telnet to
> 192.168.0.11  443 was fine too. Then I tryed to make browser https
> connection to 192.168.0.11. Apache showed me certificate warning and I
> accepted, then in browser nothing happened, it was trying to open page. But
> server got kernel panic at that moment.
> 
> At first time I thought that it was some power failure, I tryed 2 more times
> and got same behaviour.
> 
> So https works without kernel panic via 192.168.0.10 address but kernel
> panics when I try do https via 192.168.0.11 address that source-forwarded
> via gre.

Looks like the TCP output path got stuck in an infinite recursion loop until 
it exhausted the kernel stack:

> # cd /usr/obj/usr/src/sys/MYKERNEL
> # kgdb kernel.debug /var/crash/vmcore.2
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> Fatal double fault:
> eip = 0xc08e3ba3
> esp = 0xccf6dfc4
> ebp = 0xccf6e274
> cpuid = 0; apic id = 00
> panic: double fault
> cpuid = 0
> Uptime: 7m14s
> Physical memory: 235 MB
> Dumping 35 MB: 20 4
> 
> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
> /boot/kernel/acpi.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/acpi.ko
> Reading symbols from /boot/kernel/if_gre.ko...Reading symbols from
> /boot/kernel/if_gre.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/if_gre.ko
> Reading symbols from /boot/kernel/linux.ko...Reading symbols from
> /boot/kernel/linux.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/linux.ko
> #0  doadump () at pcpu.h:196
> 196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> (kgdb) bt
> #0  doadump () at pcpu.h:196
> #1  0xc07f2857 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
> #2  0xc07f2b29 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3  0xc0a7ea2b in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:983
> #4  0xc08e3ba3 in ipfw_chk (args=0xccf6e28c) at
> /usr/src/sys/netinet/ip_fw2.c:2465
> #5  0xc08e6ce1 in ipfw_check_out (arg=0x0, m0=0xccf6e390, ifp=0xc25c5c00,
> dir=2, inp=0xc28ba708) at /usr/src/sys/netinet/ip_fw_pfil.c:248
> #6  0xc08a1968 in pfil_run_hooks (ph=0xc0c55240, mp=0xccf6e420,
> ifp=0xc25c5c00, dir=2, inp=0xc28ba708) at /usr/src/sys/net/pfil.c:78
> #7  0xc08eb6f2 in ip_output (m=0xc2710b00, opt=0x0, ro=0xccf6e3f4, flags=0,
> imo=0x0, inp=0xc28ba708) at /usr/src/sys/netinet/ip_output.c:443
> #8  0xc08f4016 in tcp_output (tp=0xc25b2570) at
> /usr/src/sys/netinet/tcp_output.c:1134
> #9  0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
> #10 0xc08f4105 in tcp_output (tp=0xc25b2570) at
> /usr/src/sys/netinet/tcp_output.c:1195
> #11 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
> #12 0xc08f4105 in tcp_output (tp=0xc25b2570) at
> /usr/src/sys/netinet/tcp_output.c:1195
> #13 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
> #14 0xc08f4105 in tcp_output (tp=0xc25b2570) at
> /usr/src/sys/netinet/tcp_output.c:1195
> #15 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
> #16 0xc08f4105 in 
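
The mutual recursion visible in the backtrace can be illustrated with a toy model. This is not the kernel code: the function names merely mirror the frames above, the 40-byte header size assumes IPv4 and TCP without options, MAX_DEPTH stands in for the finite kernel stack, and the idea that the recomputed segment size comes from a route whose MTU is larger than the interface the forwarded packet leaves through is an assumption, not something established in this thread.

```python
IP_TCP_HEADERS = 40  # assumption: IPv4 + TCP headers without options
MAX_DEPTH = 50       # stand-in for the finite kernel stack


class MessageTooBig(Exception):
    """Toy equivalent of the output path failing with EMSGSIZE."""


class StackExhausted(Exception):
    """Toy equivalent of the double fault."""


def ip_output(packet_len, egress_mtu):
    if packet_len > egress_mtu:
        raise MessageTooBig()


def tcp_mtudisc(conn, depth):
    # Recompute the segment size from the MTU the route reports, then retry.
    conn["mss"] = conn["route_mtu"] - IP_TCP_HEADERS
    return tcp_output(conn, depth + 1)


def tcp_output(conn, depth=0):
    if depth > MAX_DEPTH:
        raise StackExhausted()
    try:
        ip_output(conn["mss"] + IP_TCP_HEADERS, conn["egress_mtu"])
    except MessageTooBig:
        return tcp_mtudisc(conn, depth)
    return depth


def send(route_mtu, egress_mtu):
    conn = {"mss": route_mtu - IP_TCP_HEADERS,
            "route_mtu": route_mtu,
            "egress_mtu": egress_mtu}
    try:
        return tcp_output(conn)
    except StackExhausted:
        return "stack exhausted"


print(send(1500, 1280))  # retry never shrinks the packet
print(send(1200, 1280))  # packet fits on the first attempt
```

In the model the retry only terminates if the recomputed segment size actually fits the egress interface; when it does not, each retry adds two more frames until the stack limit is hit.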

Re: reboot after panic

2008-05-06 Thread Clifton Royston
On Tue, May 06, 2008 at 06:56:42AM -0700, Jeremy Chadwick wrote:
> On Tue, May 06, 2008 at 09:47:59AM -0400, Stephen Clark wrote:
...
> > but there is never one. It is like it hangs trying to dump the memory image.
> >
> > This mother board has both sata and pata controllers but I am using only 
> > pata
> > drives.
> 
> A kernel panic causes the kernel to dump all memory contents (from start
> to end) to whatever swap device is available.  It's written to the disk
> in a fairly "raw" format, with some header data of some sort I think.
> After it's done, the system should reboot.
> 
> My guess is that you either don't have any swap defined, swap is defined
> incorrectly (disklabel -r output would be useful), or your swap space is
> smaller than your total amount of memory.  (Swap should usually be 2x
> RAM).
> 
> dumpdir and dumpdev are used during the startup process, where
> savecore(8) is called.  The memory dump on the swap device is extracted
> and stored in a file in $dumpdir, which you can examine later.  Keep in
> mind that savecore(8) will use /dev/dumpdev, which is a symlink to
> whatever device your swap lives on -- and that's determined by reading
> /etc/fstab.
> 
> Does this help?  :-)

  You might consider the possibility that he is correct in what he has
said, rather precisely, is going on.  FreeBSD 6.2 (and apparently 6.1)
can indeed double-fault or hang during the panic dump in which case no
reboot occurs and there is nothing successfully dumped to analyze for
debugging, either.  It is likely that it only occurs with certain
hardware conditions or configurations, which is why not everyone would
see it, but that's not the same thing as it being a hardware problem.

  I've been seeing this on all of my SMP machines since October, and
have reported it onlist, and I've been successfully running FreeBSD for
nearly 10 years (starting with 3.3) and BSD/OS for years before that. 
It ain't necessarily PEBKAC.

  -- Clifton

-- 
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: reboot after panic

2008-05-06 Thread Stephen Clark

Jeremy Chadwick wrote:

On Tue, May 06, 2008 at 09:47:59AM -0400, Stephen Clark wrote:

Jeremy Chadwick wrote:

On Fri, May 02, 2008 at 09:40:20AM -0400, Stephen Clark wrote:

Mine is a nvidia 6300 mb with a dual core amd processor. I am causing the panic
while trying to develope a DD for a EVDO usb modem - so it is not a great 
problem - I was just surprised it wasn't rebooting. This is a 6.1 system.


Yes it is sort of discouraging that it is hard to get answers when you 
aren't running the latest and greatest kernel. In our case we have over 
500 units in
the field running a mix of 4.9 and 6.1 and it is not feasible to 
continually upgrade them, especially since there is no documented way to 
reliably upgrade

a remote installation.

Does the system reboot OK if you issue the "reboot" command?

If not, then the problem is likely with the reboot method being used
(ACPI vs. non-ACPI) or ACPI tweakage prior to reboot, and not anything
to do with panics.  See the following two sysctls:

hw.acpi.disable_on_reboot
hw.acpi.handle_reboot

It reboots fine when I "shutdown -r now". It is only after a panic
that it hangs. I have it set to save the crash dump:
dumpdev="AUTO"  # Device to crashdump to (device name, AUTO, or NO).
dumpdir="/var/crash"  # Directory where crash dumps are to be stored

but there is never one. It is like it hangs trying to dump the memory image.

This mother board has both sata and pata controllers but I am using only pata
drives.


A kernel panic causes the kernel to dump all memory contents (from start
to end) to whatever swap device is available.  It's written to the disk
in a fairly "raw" format, with some header data of some sort I think.
After it's done, the system should reboot.

My guess is that you either don't have any swap defined, swap is defined
incorrectly (disklabel -r output would be useful), or your swap space is
smaller than your total amount of memory.  (Swap should usually be 2x
RAM).

dumpdir and dumpdev are used during the startup process, where
savecore(8) is called.  The memory dump on the swap device is extracted
and stored in a file in $dumpdir, which you can examine later.  Keep in
mind that savecore(8) will use /dev/dumpdev, which is a symlink to
whatever device your swap lives on -- and that's determined by reading
/etc/fstab.

Does this help?  :-)


Hi Jeremy,

Thanks for the response but I think I have everything set up OK.

from top:
Mem: 33M Active, 19M Inact, 56M Wired, 54M Buf, 762M Free
Swap: 2048M Total, 2048M Free

$ sudo disklabel /dev/ad0s1
# /dev/ad0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   204800        0    4.2BSD     1024  8192    23
  b:  4194304   204800      swap
  c: 78156162        0    unused        0     0         # "raw" part, don't edit
  d: 45879682  4399104    4.2BSD     2048 16384    89
  e:   409600 50278786    4.2BSD     2048 16384    97
  f:  2097152 50688386    4.2BSD     2048 16384    89
  g: 12685312 52785538    4.2BSD     2048 16384    89
  h: 12685312 65470850    4.2BSD     2048 16384    89
J301002:~

1 gig of memory
$ sysctl  -a |grep physmem
hw.physmem: 929439744

$ ls -al /dev/dumpdev
lrwxr-xr-x  1 root  wheel  11 May  6 05:39 /dev/dumpdev -> /dev/ad0s1b

$ less /etc/fstab
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ad0s1b     none        swap    sw       0     0
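
As a sanity check on the numbers above, here is a small Python sketch comparing the swap partition with physical memory. It assumes disklabel reports sizes in 512-byte sectors; the function name is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: disklabel sizes are in 512-byte sectors


def dump_fits(swap_sectors, physmem_bytes):
    """True if a full memory dump would fit on the swap partition."""
    return swap_sectors * SECTOR_BYTES >= physmem_bytes


swap_sectors = 4194304  # size of partition b in the disklabel output
physmem = 929439744     # hw.physmem from sysctl

print(swap_sectors * SECTOR_BYTES // 2**20, "MiB swap")
print(physmem // 2**20, "MiB physical memory")
print(dump_fits(swap_sectors, physmem))
```

By this arithmetic the 2048 MiB swap partition is more than twice the physical memory, so swap size alone should not prevent the dump.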

Any other ideas?

Regards,
Steve
--

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)




Re: reboot after panic

2008-05-06 Thread Jeremy Chadwick
On Tue, May 06, 2008 at 09:47:59AM -0400, Stephen Clark wrote:
> Jeremy Chadwick wrote:
>> On Fri, May 02, 2008 at 09:40:20AM -0400, Stephen Clark wrote:
>>> Mine is a nvidia 6300 mb with a dual core amd processor. I am causing the 
>>> panic
>>> while trying to develope a DD for a EVDO usb modem - so it is not a great 
>>> problem - I was just surprised it wasn't rebooting. This is a 6.1 system.
>>>
>>> Yes it is sort of discouraging that it is hard to get answers when you 
>>> aren't running the latest and greatest kernel. In our case we have over 
>>> 500 units in
>>> the field running a mix of 4.9 and 6.1 and it is not feasible to 
>>> continually upgrade them, especially since there is no documented way to 
>>> reliably upgrade
>>> a remote installation.
>>
>> Does the system reboot OK if you issue the "reboot" command?
>>
>> If not, then the problem is likely with the reboot method being used
>> (ACPI vs. non-ACPI) or ACPI tweakage prior to reboot, and not anything
>> to do with panics.  See the following two sysctls:
>>
>> hw.acpi.disable_on_reboot
>> hw.acpi.handle_reboot
>
> It reboots fine when I "shutdown -r now". It is only after a panic
> that it hangs. I have it set to save the crash dump:
> dumpdev="AUTO"  # Device to crashdump to (device name, AUTO, or NO).
> dumpdir="/var/crash"# Directory where crash dumps are to be stored
>
> but there is never one. It is like it hangs trying to dump the memory image.
>
> This mother board has both sata and pata controllers but I am using only pata
> drives.

A kernel panic causes the kernel to dump all memory contents (from start
to end) to whatever swap device is available.  It's written to the disk
in a fairly "raw" format, with some header data of some sort I think.
After it's done, the system should reboot.

My guess is that you either don't have any swap defined, swap is defined
incorrectly (disklabel -r output would be useful), or your swap space is
smaller than your total amount of memory.  (Swap should usually be 2x
RAM).

dumpdir and dumpdev are used during the startup process, where
savecore(8) is called.  The memory dump on the swap device is extracted
and stored in a file in $dumpdir, which you can examine later.  Keep in
mind that savecore(8) will use /dev/dumpdev, which is a symlink to
whatever device your swap lives on -- and that's determined by reading
/etc/fstab.
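
The size condition above can be checked mechanically. The following is a
minimal sketch, assuming a POSIX shell; swap_holds_dump is a hypothetical
helper name, the 1 GB swap figure is made up for illustration, and the RAM
figure is the "real memory" value from the dmesg posted in this thread. A
real dump also needs a little room for its header, so an exact fit should
be treated as too tight.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a swap partition is large enough
# to hold a full memory dump.
# Arguments: physical memory in bytes, swap size in kilobytes.
swap_holds_dump() {
    ram_bytes=$1
    swap_kb=$2
    # Round RAM up to whole kilobytes before comparing.
    ram_kb=$(( (ram_bytes + 1023) / 1024 ))
    if [ "$swap_kb" -ge "$ram_kb" ]; then
        echo "ok"
    else
        echo "too small"
    fi
}

# On a FreeBSD box the inputs could be read with (check your release):
#   sysctl -n hw.physmem     # RAM in bytes
#   swapinfo -k              # swap devices, sizes in 1K blocks
# Example: RAM figure from the dmesg in this thread, made-up 1 GB swap.
swap_holds_dump 938409984 1048576
```

If this prints "too small", the dump cannot fit and there will be nothing
for savecore(8) to extract on the next boot.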

Does this help?  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: reboot after panic

2008-05-06 Thread Stephen Clark

Jeremy Chadwick wrote:
> On Fri, May 02, 2008 at 09:40:20AM -0400, Stephen Clark wrote:
>> Mine is a nvidia 6300 mb with a dual core amd processor. I am causing the
>> panic while trying to develop a DD for an EVDO usb modem - so it is not a
>> great problem - I was just surprised it wasn't rebooting. This is a 6.1
>> system.
>>
>> Yes it is sort of discouraging that it is hard to get answers when you
>> aren't running the latest and greatest kernel. In our case we have over
>> 500 units in the field running a mix of 4.9 and 6.1 and it is not feasible
>> to continually upgrade them, especially since there is no documented way
>> to reliably upgrade a remote installation.
>
> Does the system reboot OK if you issue the "reboot" command?
>
> If not, then the problem is likely with the reboot method being used
> (ACPI vs. non-ACPI) or ACPI tweakage prior to reboot, and not anything
> to do with panics.  See the following two sysctls:
>
> hw.acpi.disable_on_reboot
> hw.acpi.handle_reboot

It reboots fine when I "shutdown -r now". It is only after a panic
that it hangs. I have it set to save the crash dump:
dumpdev="AUTO"  # Device to crashdump to (device name, AUTO, or NO).
dumpdir="/var/crash"  # Directory where crash dumps are to be stored

but there is never one. It is like it hangs trying to dump the memory image.

This mother board has both sata and pata controllers but I am using only pata
drives.

dmesg from boot.
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #36: Fri May  2 09:02:26 EDT 2008
[EMAIL PROTECTED]:/mnt/src/sys/i386/compile/WOLFPAC6SMP
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (2410.99-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f32  Stepping = 2
  Features=0x178bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x3
  Cores per package: 2
real memory  = 938409984 (894 MB)
avail memory = 908926976 (866 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 2
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0:  on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: reservation of 1bf0, 10 (3) failed
acpi0: reservation of 2bf0, 10 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0:  on acpi0
cpu1:  on acpi0
acpi_button0:  on acpi0
acpi_button1:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 0.0 (no driver attached)
pci0:  at device 0.1 (no driver attached)
pci0:  at device 0.2 (no driver attached)
pci0:  at device 0.3 (no driver attached)
pci0:  at device 0.4 (no driver attached)
pci0:  at device 0.5 (no driver attached)
pci0:  at device 0.6 (no driver attached)
pci0:  at device 0.7 (no driver attached)
pcib1:  at device 2.0 on pci0
pci1:  on pcib1
pcib2:  at device 3.0 on pci0
pci2:  on pcib2
pcib3:  at device 4.0 on pci0
pci3:  on pcib3
pci0:  at device 5.0 (no driver attached)
pci0:  at device 9.0 (no driver attached)
isab0:  at device 10.0 on pci0
isa0:  on isab0
pci0:  at device 10.1 (no driver attached)
pci0:  at device 10.2 (no driver attached)
ohci0:  mem 0xfe02f000-0xfe02 irq 21 at device 11.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0:  on ohci0
usb0: USB revision 1.0
usbd_get_string: getting lang failed, using 0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 8 ports with 8 removable, self powered
ehci0:  mem 0xfe02e000-0xfe02e0ff at device 11.1 on pci0
ehci0: [GIANT-LOCKED]
usb1: waiting for BIOS to give up control
usb1: timed out waiting for BIOS
usb1: EHCI version 1.0
usb1: companion controller, 8 ports each: usb0
usb1:  on ehci0
usb1: USB revision 2.0
uhub1: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 8 ports with 8 removable, self powered
atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf400-0xf40f at device 13.0 on pci0
ata0:  on atapci0
ata1:  on atapci0
atapci1:  port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xe000-0xe00f mem 0xfe02d000-0xfe02dfff irq 22 at device 14.0 on pci0
ata2:  on atapci1
ata3:  on atapci1
pcib4:  at device 16.0 on pci0
pci4:  on pcib4
rl0:  port 0xac00-0xacff mem 0xfdaff000-0xfdaff0ff irq 18 at device 6.0 on pci4
miibus0:  on rl0
rlphy0:  on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:50:fc:fb:1e:82
rl1:  port 0xa800-0xa8ff mem 0xfdafe000-0xfdafe0ff irq 16 at device 8.0 on pci4
miibus1:  on rl1
rlphy1:  on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 1

Re: reboot after panic

2008-05-06 Thread Jeremy Chadwick
On Fri, May 02, 2008 at 09:40:20AM -0400, Stephen Clark wrote:
> Mine is a nvidia 6300 mb with a dual core amd processor. I am causing the
> panic while trying to develop a DD for an EVDO usb modem - so it is not a
> great problem - I was just surprised it wasn't rebooting. This is a 6.1
> system.
>
> Yes it is sort of discouraging that it is hard to get answers when you
> aren't running the latest and greatest kernel. In our case we have over
> 500 units in the field running a mix of 4.9 and 6.1 and it is not feasible
> to continually upgrade them, especially since there is no documented way
> to reliably upgrade a remote installation.

Does the system reboot OK if you issue the "reboot" command?

If not, then the problem is likely with the reboot method being used
(ACPI vs. non-ACPI) or ACPI tweakage prior to reboot, and not anything
to do with panics.  See the following two sysctls:

hw.acpi.disable_on_reboot
hw.acpi.handle_reboot

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: reboot after panic

2008-05-06 Thread Scott Oertel
Stephen Clark wrote:
> Hello List
>
> How do I get my freebsd 6.1 box to automatically reboot after a panic?
>
> Thanks,
> Steve

According to the handbook this is the default behavior unless you have
the KDB option enabled in your kernel, in which case adding KDB_UNATTENDED
would cause the machine not to break to the debugger and to reboot after
a panic.

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-options.html
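
The rule described above can be checked against a kernel config file. This
is a minimal sketch, assuming a POSIX shell and a grep that supports -E;
panic_behaviour is a hypothetical helper name, and it only encodes the rule
as stated in this message (debugger option present without the unattended
option means the box stops at the debugger).

```shell
#!/bin/sh
# Hypothetical helper: report how a kernel built from the given config
# file is expected to behave on panic, per the rule quoted above.
panic_behaviour() {
    conf=$1
    # Match an active "options KDB" line; commented-out lines start with '#'
    # and therefore do not match the anchored pattern.
    if grep -Eq '^[[:space:]]*options[[:space:]]+KDB([[:space:]]|#|$)' "$conf"; then
        if grep -Eq '^[[:space:]]*options[[:space:]]+KDB_UNATTENDED([[:space:]]|#|$)' "$conf"; then
            echo "reboot"      # debugger compiled in, but told not to wait
        else
            echo "debugger"    # panic drops to the debugger prompt
        fi
    else
        echo "reboot"          # no debugger option: default behaviour
    fi
}

# Assumed usage (WOLFPAC6SMP is the config name seen in the dmesg in this thread):
#   panic_behaviour /usr/src/sys/i386/conf/WOLFPAC6SMP
```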


-Scott Oertel


FreeBSD 6.2 kernel parameters (Was Re: reboot after panic)

2008-05-04 Thread Clifton Royston
  I got a couple requests for the tuning settings I've been using; it
seems I'm not the only one who's had problems with FreeBSD 6.x
stability as compared to 4.x.

  I don't understand the kernel well enough to say whether any of these
are to any extent "right", but despite being pure voodoo, they seem to
have helped substantially in performance and stability.  I picked some
of them off the discussion about tuning required to get ZFS running
stably, ditto for discussion of ggatec/ggated.  (The first 3 settings
in sysctl - through kern.ipc.somaxconn - were carried over from my 4.x
config.)

  Here's what I've got, and any solid recommendations from the kernel
developers will I'm sure get close attention, and not only from me. 
I.e. if you can authoritatively tell me I'm an idiot, and some of these
aren't really helping, or tell me what will, great.  As I say, they
seem to help.

  Ditto if someone can really say authoritatively whether 6.3 is in
practice more stable and higher performance than 6.2; with all the
discussed problems on list, and the discussions of known fixes not
being committed, I have not felt confident about making such a move.

/boot/loader.conf:
kern.ipc.nmbclusters="32768"
vm.kmem_size="512M"
vm.kmem_size_max="512M"
kern.maxvnodes="40"

/etc/sysctl.conf:
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
# TUNING VALUES confirmed by Cal's mailserver testing
kern.maxfiles=16384
kern.maxfilesperproc=16384
# Speculative: enlarge listen queue for large number of incoming TCP conns
# Default listen queue size = 128, 1024 recommended for busy webservers
kern.ipc.somaxconn=1024
# From FreeBSD mailing list, reported on improving stability with
# ggatec/ggated.
net.inet.tcp.sendspace=1048576
net.inet.tcp.recvspace=1048576
kern.ipc.maxsockbuf=2049152
# Disable hyperthreading "logical CPUs"
machdep.hlt_logical_cpus=1

  -- Clifton

-- 
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: reboot after panic

2008-05-02 Thread Stephen Clark

Clifton Royston wrote:
> On Thu, May 01, 2008 at 03:15:51PM -0400, Stephen Clark wrote:
>> Matthew X. Economou wrote:
>>> Steve,
>>>
>>> I recall having to set dumpdev in /etc/rc.conf before I could get
>>> FreeBSD to reboot automatically after a panic.  I have dumpdev=AUTO
>>> set on all of my headless servers.  If you are feeling especially
>>> brave, you can also set fsck_y_enable=YES and background_fsck=NO.
>>>
>>> Good luck!  ;)
>>
>> Hmmm... I have that set. It only seems to not reboot on one
>> system I have.
>
>   FWIW, I've had problems with 6.2 not rebooting reliably on several
> SMP P4 Xeon systems; the problem seems to be that sometimes while
> dumping it either freezes completely or double-faults and hangs at that
> point until physically reset.
>
>   This problem appeared simultaneously on several SMP servers when they
> were upgraded to 6.2, after they had run reliably for years on FreeBSD
> 4.x.  Adding insult to injury, when it does dump successfully, I don't
> reliably get an image saved in /var/crash.  (And if I did, it doesn't
> appear that it would do me any good as nobody is interested any longer
> in problems with 6.2.)
>
>   Thankfully, via a combination of adding RAM and tuning kernel
> parameters I eventually got them to where they'll reliably stay up for
> reasonably long stretches, certainly more than the 20 days uptime I was
> getting when I first upgraded them.
>   -- Clifton

Thanks Clifton,

Mine is a nvidia 6300 mb with a dual core amd processor. I am causing the
panic while trying to develop a DD for an EVDO usb modem - so it is not a
great problem - I was just surprised it wasn't rebooting. This is a 6.1
system.

Yes it is sort of discouraging that it is hard to get answers when you
aren't running the latest and greatest kernel. In our case we have over
500 units in the field running a mix of 4.9 and 6.1 and it is not feasible
to continually upgrade them, especially since there is no documented way
to reliably upgrade a remote installation.

Steve



Re: reboot after panic

2008-05-01 Thread Clifton Royston
On Thu, May 01, 2008 at 03:15:51PM -0400, Stephen Clark wrote:
> Matthew X. Economou wrote:
> >Steve,
> >
> >I recall having to set dumpdev in /etc/rc.conf before I could get
> >FreeBSD to reboot automatically after a panic.  I have dumpdev=AUTO
> >set on all of my headless servers.  If you are feeling especially
> >brave, you can also set fsck_y_enable=YES and background_fsck=NO.
> >
> >Good luck!  ;)
> >
> Hmmm... I have that set. It only seems to not reboot on one
> system I have.

  FWIW, I've had problems with 6.2 not rebooting reliably on several
SMP P4 Xeon systems; the problem seems to be that sometimes while
dumping it either freezes completely or double-faults and hangs at that
point until physically reset.

  This problem appeared simultaneously on several SMP servers when they
were upgraded to 6.2, after they had run reliably for years on FreeBSD
4.x.  Adding insult to injury, when it does dump successfully, I don't
reliably get an image saved in /var/crash.  (And if I did, it doesn't
appear that it would do me any good as nobody is interested any longer
in problems with 6.2.)

  Thankfully, via a combination of adding RAM and tuning kernel
parameters I eventually got them to where they'll reliably stay up for
reasonably long stretches, certainly more than the 20 days uptime I was
getting when I first upgraded them.
  -- Clifton
 
-- 
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: reboot after panic

2008-05-01 Thread Stephen Clark

Matthew X. Economou wrote:
> Steve,
>
> I recall having to set dumpdev in /etc/rc.conf before I could get
> FreeBSD to reboot automatically after a panic.  I have dumpdev=AUTO
> set on all of my headless servers.  If you are feeling especially
> brave, you can also set fsck_y_enable=YES and background_fsck=NO.
>
> Good luck!  ;)

Hmmm... I have that set. It only seems to not reboot on one
system I have.

Thanks,
Steve



RE: reboot after panic

2008-05-01 Thread Matthew X. Economou
Steve,

I recall having to set dumpdev in /etc/rc.conf before I could get
FreeBSD to reboot automatically after a panic.  I have dumpdev=AUTO
set on all of my headless servers.  If you are feeling especially
brave, you can also set fsck_y_enable=YES and background_fsck=NO.
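
For reference, the settings mentioned above would look roughly like this in
/etc/rc.conf. This is a sketch rather than a recommendation: dumpdir is
taken from elsewhere in this thread, and letting fsck answer "yes" to
everything trades some safety for unattended recovery on a headless box.

```shell
# /etc/rc.conf fragment (sketch; rc.conf is read as sh variable assignments)
dumpdev="AUTO"          # let the system pick the dump device (normally swap)
dumpdir="/var/crash"    # where savecore(8) stores the extracted dump
fsck_y_enable="YES"     # run fsck -y if the initial preen pass fails
background_fsck="NO"    # check filesystems at boot instead of in the background
```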

Good luck!  ;)

-- 
"I slashread your textcast about jargon and nodnodnod with your
cyber-sentiment." - gad_zuki!




Re: reboot after panic

2008-05-01 Thread Wilko Bulte
Quoting Stephen Clark, who wrote on Thu, May 01, 2008 at 08:44:42AM -0400 ..
> Hello List
> 
> How do I get my freebsd 6.1 box to automatically reboot after a panic?

It should do that automatically?

> Thanks,
> Steve
--- end of quoted text ---

-- 
Wilko Bulte [EMAIL PROTECTED]


reboot after panic

2008-05-01 Thread Stephen Clark

Hello List

How do I get my freebsd 6.1 box to automatically reboot after a panic?

Thanks,
Steve


reboot after panic: page fault

2008-04-13 Thread Spil Oss
Hi all,

Posted a message earlier about a panic due to a privileged instruction
fault. As a result of that I am now running a kernel with debug
symbols.

Last night my server crashed again. I'm wondering whether the community
is interested, since the crash is probably due to a faulty memory
module.

01:20 irssi segfaulted
Apr 13 01:30:41 gigabeast savecore: reboot after panic: page fault
Apr 13 01:30:41 gigabeast savecore: writing core to vmcore.1
Apr 13 01:30:49 gigabeast kernel: pid 537 (testparm), uid 0: exited on signal 11 (core dumped)
Apr 13 01:30:50 gigabeast kernel: pid 544 (smbd), uid 0: exited on signal 6 (core dumped)

This morning the machine was in an unusable state (ssh unreachable,
display wouldn't switch on), machine wouldn't start at all (no POST)
until I removed the memory module that I added after the previous
crash.

Kind regards,

Spil.


Re: reboot after panic: privileged instruction fault

2008-04-11 Thread Jeremy Chadwick
On Fri, Apr 11, 2008 at 11:22:12AM +0200, Spil Oss wrote:
> Hi Bjoern,
> 
> Was looking at that page, but my kernel doesn't have debug enabled.

Rebuild the kernel with debugging symbols, or do you not have the disk
space for it?

> Someone suggested getting a backtrace using the vanilla kernel, that
> kernel should still be in /boot/kernel but I can't get it to fly!

I can't see how that's going to work.  That's not a good suggestion.
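
kgdb needs a kernel image with symbols that match the kernel which wrote
the dump, which is why an unrelated kernel from /boot does not help. A
sketch of the usual sequence, assuming a POSIX shell; latest_vmcore is a
hypothetical helper, and the kernel.debug path is an assumption based on
the build directory shown in the uname output elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the highest-numbered vmcore.N in a crash
# directory (savecore numbers dumps vmcore.0, vmcore.1, ...).
latest_vmcore() {
    dir=${1:-/var/crash}
    n=$(ls "$dir" 2>/dev/null \
        | sed -n 's/^vmcore\.\([0-9][0-9]*\)$/\1/p' \
        | sort -n | tail -n 1)
    if [ -n "$n" ]; then
        echo "$dir/vmcore.$n"
    fi
}

# Assumed usage once the kernel has been rebuilt with
# "makeoptions DEBUG=-g" in its config file and has panicked again:
#   kgdb /usr/obj/usr/src/sys/BEASTIE70/kernel.debug "$(latest_vmcore)"
#   (kgdb) bt
```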

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: reboot after panic: privileged instruction fault

2008-04-11 Thread Spil Oss
Hi Bjoern,

Was looking at that page, but my kernel doesn't have debug enabled.
Someone suggested getting a backtrace using the vanilla kernel, that
kernel should still be in /boot/kernel but I can't get it to fly!

/boot/kernel.old]# kgdb /boot/kernel.old/kernel /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
Cannot access memory at address 0xc0c04f54
(kgdb)

Kind regards,

Spil.

On 11/04/2008, Bjoern A. Zeeb <[EMAIL PROTECTED]> wrote:
> On Fri, 11 Apr 2008, Spil Oss wrote:
>
> > Yesterday my to-be server running FreeBSD 7.0 #0 has rebooted after a
> > kernel panic.
> >
> > FreeBSD newserver.example.net 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Fri
> > Apr  4 07:22:22 CEST 2008
> > [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BEASTIE70  i386
> >
> > Please find messages and kernel-configuration attached.
> >
>
> Could you get a backtrace?
>
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>
> might help you with further debugging.
>
> --
> Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT
> Software is harder than hardware  so better get it right the first time.
>


Re: reboot after panic: privileged instruction fault

2008-04-11 Thread Bjoern A. Zeeb

On Fri, 11 Apr 2008, Spil Oss wrote:

> Yesterday my to-be server running FreeBSD 7.0 #0 has rebooted after a
> kernel panic.
>
> FreeBSD newserver.example.net 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Fri
> Apr  4 07:22:22 CEST 2008
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BEASTIE70  i386
>
> Please find messages and kernel-configuration attached.

Could you get a backtrace?

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html

might help you with further debugging.

--
Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT
Software is harder than hardware  so better get it right the first time.


reboot after panic: privileged instruction fault

2008-04-11 Thread Spil Oss
gabeast kernel: fwohci0: Link S400, max_rec 2048 bytes.
Apr 10 11:59:03 gigabeast kernel: firewire0:  on fwohci0
Apr 10 11:59:03 gigabeast kernel: sbp0:  on firewire0
Apr 10 11:59:03 gigabeast kernel: dcons_crom0:  on firewire0
Apr 10 11:59:03 gigabeast kernel: dcons_crom0: bus_addr 0x1138000
Apr 10 11:59:03 gigabeast kernel: fwohci0: Initiate bus reset
Apr 10 11:59:03 gigabeast kernel: fwohci0: BUS reset
Apr 10 11:59:03 gigabeast kernel: fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
Apr 10 11:59:03 gigabeast kernel: pci1:  at device 1.3 (no driver attached)
Apr 10 11:59:03 gigabeast kernel: isab0:  at device 31.0 on pci0
Apr 10 11:59:03 gigabeast kernel: isa0:  on isab0
Apr 10 11:59:03 gigabeast kernel: atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xbfa0-0xbfaf at device 31.1 on pci0
Apr 10 11:59:03 gigabeast kernel: ata0:  on atapci0
Apr 10 11:59:03 gigabeast kernel: ata0: [ITHREAD]
Apr 10 11:59:03 gigabeast kernel: ata1:  on atapci0
Apr 10 11:59:03 gigabeast kernel: ata1: [ITHREAD]
Apr 10 11:59:03 gigabeast kernel: pci0:  at device 31.5 (no driver attached)
Apr 10 11:59:03 gigabeast kernel: acpi_tz0:  on acpi0
Apr 10 11:59:03 gigabeast kernel: atkbdc0:  port 0x60,0x64 irq 1 on acpi0
Apr 10 11:59:03 gigabeast kernel: atkbd0:  irq 1 on atkbdc0
Apr 10 11:59:03 gigabeast kernel: kbd0 at atkbd0
Apr 10 11:59:03 gigabeast kernel: atkbd0: [GIANT-LOCKED]
Apr 10 11:59:03 gigabeast kernel: atkbd0: [ITHREAD]
Apr 10 11:59:03 gigabeast kernel: psm0:  irq 12 on atkbdc0
Apr 10 11:59:03 gigabeast kernel: psm0: [GIANT-LOCKED]
Apr 10 11:59:03 gigabeast kernel: psm0: [ITHREAD]
Apr 10 11:59:03 gigabeast kernel: psm0: model GlidePoint, device ID 0
Apr 10 11:59:03 gigabeast kernel: pmtimer0 on isa0
Apr 10 11:59:03 gigabeast kernel: orm0:  at iomem 0xc-0xcd7ff,0xcd800-0xcdfff,0xce000-0xce7ff,0xce800-0xcefff,0xcf000-0xcf7ff,0xcf800-0xc pnpid ORM on isa0
Apr 10 11:59:03 gigabeast kernel: sc0:  at flags 0x100 on isa0
Apr 10 11:59:03 gigabeast kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Apr 10 11:59:03 gigabeast kernel: vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Apr 10 11:59:03 gigabeast kernel: Timecounter "TSC" frequency 1398818687 Hz quality 800
Apr 10 11:59:03 gigabeast kernel: Timecounters tick every 1.000 msec
Apr 10 11:59:03 gigabeast kernel: ipfw2 (+ipv6) initialized, divert enabled, rule-based forwarding disabled, default to deny, logging limited to 5 packets/entry by default
Apr 10 11:59:03 gigabeast kernel: firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
Apr 10 11:59:03 gigabeast kernel: firewire0: bus manager 0 (me)
Apr 10 11:59:03 gigabeast kernel: ad0: 38154MB  at ata0-master UDMA100
Apr 10 11:59:03 gigabeast kernel: Trying to mount root from ufs:/dev/ad0s1a
Apr 10 11:59:03 gigabeast kernel: WARNING: / was not properly dismounted
Apr 10 11:59:03 gigabeast kernel: WARNING: /home was not properly dismounted
Apr 10 11:59:03 gigabeast kernel: WARNING: /tmp was not properly dismounted
Apr 10 11:59:03 gigabeast kernel: WARNING: /usr was not properly dismounted
Apr 10 11:59:03 gigabeast kernel: WARNING: /var was not properly dismounted
Apr 10 11:59:04 gigabeast savecore: reboot after panic: privileged instruction fault
Apr 10 11:59:04 gigabeast savecore: writing core to vmcore.0
Apr 10 11:59:04 gigabeast kernel: bge0: link state changed to UP
Apr 10 11:59:13 gigabeast saslauthd[518]: detach_tty  : could not lock pid file /var/run/saslauthd/saslauthd.pid: Resource temporarily unavailable
Apr 10 11:59:13 gigabeast saslauthd[517]: detach_tty  : Cannot start saslauthd
Apr 10 11:59:13 gigabeast saslauthd[517]: detach_tty  : Another instance of saslauthd is currently running
Apr 10 11:59:13 gigabeast ntpd[567]: ntpd 4.2.0-a Thu Apr  3 22:57:31 UTC 2008 (1)
Apr 10 11:59:14 gigabeast ntpd[567]: sendto(213.84.187.156): Network is unreachable
Apr 10 11:59:15 gigabeast ntpd[567]: sendto(194.109.22.18): Network is unreachable
Apr 10 11:59:16 gigabeast ntpd[567]: sendto(192.87.36.4): Network is unreachable
Apr 10 11:59:17 gigabeast ntpd[567]: sendto(85.12.29.43): Network is unreachable
Apr 10 11:59:18 gigabeast ntpd[567]: sendto(82.94.107.211): Network is unreachable
Apr 10 11:59:19 gigabeast ntpd[567]: sendto(212.142.28.67): Network is unreachable
Apr 10 12:00:18 gigabeast ntpd[567]: sendto(194.109.22.18): Network is unreachable
Apr 10 12:00:18 gigabeast ntpd[567]: sendto(213.84.187.156): Network is unreachable
Apr 10 12:00:21 gigabeast ntpd[567]: sendto(82.94.107.211): Network is unreachable
Apr 10 12:00:22 gigabeast ntpd[567]: sendto(212.142.28.67): Network is unreachable
Apr 10 12:00:22 gigabeast ntpd[567]: sendto(192.87.36.4): Network is unreachable
Apr 10 12:00:23 gigabeast ntpd[567]: sendto(85.12.29.43): Network is unreachable
Apr 10 12:00:43 gigabeast fsck: /dev/ad0s1g: 55 files, 2294 used, 8125169 free (33 frags, 1015642 blocks, 0.0% fragmentation)
Apr 10 12:00:50 gigabeast fsck: /dev/ad0s1d: UNREF FILE I=11  OWNER=root 
M

Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-12-13 Thread Vivek Khera


On Nov 14, 2007, at 10:13 AM, Vivek Khera wrote:

> I'm running 6.2-REL.  The old kernel was -p5, now without the zero
> copy sockets, i'm running -p8.  I'll know in a couple of days if
> this is our solution.

For the archives:

Removing zero copy sockets seems to have fixed the issue.  Not a
single panic on that box since, and it used to panic within 3-4 days
under the load it has.
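
Removing the feature means rebuilding the kernel without the option. A
sketch of the config edit, assuming a POSIX shell and a sed that accepts
-E; comment_out_zcs is a hypothetical helper and MYKERNEL is a placeholder
config name, not one taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a kernel config with the active
# "options ZERO_COPY_SOCKETS" line commented out; other lines pass through.
comment_out_zcs() {
    sed -E 's/^([[:space:]]*options[[:space:]]+ZERO_COPY_SOCKETS([[:space:]]|#|$))/#\1/' "$1"
}

# Assumed usage: write the edited config to a new file, review it, then
# rebuild and install the kernel from it in the usual way.
#   comment_out_zcs /usr/src/sys/i386/conf/MYKERNEL > /tmp/MYKERNEL.nozcs
```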




Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-14 Thread Kris Kennaway

Vivek Khera wrote:
> On Nov 13, 2007, at 7:49 PM, Kris Kennaway wrote:
>
>>> notification.
>>> In the meantime, your best bet is to disable ZERO_COPY_SOCKETS.
>>
>> There is a chance this was a recent regression, previously in 7.0 they
>> were believed to work.
>
> I'm running 6.2-REL.  The old kernel was -p5, now without the zero copy
> sockets, i'm running -p8.  I'll know in a couple of days if this is our
> solution.

According to alc, if the page is being wired by something else then ZCS
has never worked properly.


Kris


Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-14 Thread Vivek Khera


On Nov 13, 2007, at 7:49 PM, Kris Kennaway wrote:

>> notification.
>> In the meantime, your best bet is to disable ZERO_COPY_SOCKETS.
>
> There is a chance this was a recent regression, previously in 7.0
> they were believed to work.

I'm running 6.2-REL.  The old kernel was -p5, now without the zero
copy sockets, i'm running -p8.  I'll know in a couple of days if this
is our solution.



Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Kip Macy
Various calls that downgrade permissions or virtually copy a pmap in
pmap.c now remove PG_W (and did not 6 months ago). This may be the
cause of the regression. It would probably be better (and faster) if
the pages were "held" instead of wired.

-Kip

On Nov 13, 2007 4:49 PM, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> Kip Macy wrote:
> > Unfortunately, ZERO_COPY_SOCKETs have long been a known source of
> > problems. I think also, when a page is copied as part of COW the new
> > page is unwired (see pmap_copy et al.), this could lead to
> > socow_iodone unwiring after send a page that was not wired. An added
> > issue is that parts of the VM assume that COW and wired are mutually
> > exclusive which the socow code violates.
> >
> > At some point in the near future I may be adding support for doing
> > zero copy send without COW for blocking sockets. The one down side of
> > this approach is that if you have multiple threads in your process it
> > widens the window during which they can stomp on data that you're
> > sending. Nonetheless, this would be a bug in the application code.
> > More complicated would be zero-copy non-COW send on non-blocking
> > sockets as it would require an extension to kevent for completion
> > notification.
> >
> > In the meantime, your best bet is to disable ZERO_COPY_SOCKETS.
>
> There is a chance this was a recent regression, previously in 7.0 they
> were believed to work.
>
> Kris
>
> >  -Kip
> >
> > On Nov 13, 2007 1:59 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> >> On Nov 13, 2007, at 4:50 PM, Vlad GALU wrote:
> >>
> >>>> vmio = 1
> >>>> offset = Unhandled dwarf expression opcode 0x93
> >>>> (kgdb)
> >>>
> >>>Do you happen to have ZERO_COPY_SOCKETS in your kernel config?
> >>
> >> Yes, I do.  Are they known to be bad under certain loads or just in
> >> general.  I don't have this issue with any other web server running
> >> the same kernel config but those are amd64 boxes mostly.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Kris Kennaway

Kip Macy wrote:
> Unfortunately, ZERO_COPY_SOCKETs have long been a known source of
> problems. I think also, when a page is copied as part of COW the new
> page is unwired (see pmap_copy et al.), this could lead to
> socow_iodone unwiring after send a page that was not wired. An added
> issue is that parts of the VM assume that COW and wired are mutually
> exclusive which the socow code violates.
>
> At some point in the near future I may be adding support for doing
> zero copy send without COW for blocking sockets. The one down side of
> this approach is that if you have multiple threads in your process it
> widens the window during which they can stomp on data that you're
> sending. Nonetheless, this would be a bug in the application code.
> More complicated would be zero-copy non-COW send on non-blocking
> sockets as it would require an extension to kevent for completion
> notification.
>
> In the meantime, your best bet is to disable ZERO_COPY_SOCKETS.

There is a chance this was a recent regression, previously in 7.0 they
were believed to work.

Kris

>  -Kip
>
> On Nov 13, 2007 1:59 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
>> On Nov 13, 2007, at 4:50 PM, Vlad GALU wrote:
>>
>>>>vmio = 1
>>>>offset = Unhandled dwarf expression opcode 0x93
>>>> (kgdb)
>>>
>>>Do you happen to have ZERO_COPY_SOCKETS in your kernel config?
>>
>> Yes, I do.  Are they known to be bad under certain loads or just in
>> general.  I don't have this issue with any other web server running
>> the same kernel config but those are amd64 boxes mostly.



Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Vivek Khera


On Nov 13, 2007, at 5:13 PM, Kip Macy wrote:

> In the meantime, your best bet is to disable ZERO_COPY_SOCKETS.

Thanks for the info.  I'm putting the new kernel in place and will see
what happens and report back.




Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Kip Macy
Unfortunately, ZERO_COPY_SOCKETs have long been a known source of
problems. I also think that when a page is copied as part of COW, the
new page is unwired (see pmap_copy et al.); this could lead to
socow_iodone unwiring, after the send, a page that was not wired. An
added issue is that parts of the VM assume that COW and wired are
mutually exclusive, which the socow code violates.

At some point in the near future I may be adding support for doing
zero-copy send without COW for blocking sockets. The one downside of
this approach is that if you have multiple threads in your process, it
widens the window during which they can stomp on data that you're
sending. Nonetheless, this would be a bug in the application code.
More complicated would be zero-copy non-COW send on non-blocking
sockets, as it would require an extension to kevent for completion
notification.

In the meantime, your best bet is to disable ZERO_COPY_SOCKETS.
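
A quick way to confirm whether a given kernel config actually enables the
option is sketched below. This is only an illustration: has_zero_copy is a
hypothetical helper name, and it assumes the option appears as an
uncommented "options ZERO_COPY_SOCKETS" line in the config file.

```shell
# Sketch only: has_zero_copy is a hypothetical helper, not a FreeBSD tool.
# Succeeds (exit 0) if the kernel config file given as $1 contains an
# uncommented "options ZERO_COPY_SOCKETS" line, and fails otherwise.
has_zero_copy() {
    grep -Eq '^[[:space:]]*options[[:space:]]+ZERO_COPY_SOCKETS([[:space:]]|#|$)' "$1"
}
```

For example, "has_zero_copy /usr/src/sys/i386/conf/KCI32SMP && echo enabled"
(the path is an assumption; use wherever your config lives). If the option
is reported, comment the line out, then rebuild and reinstall the kernel as
usual.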


 -Kip



On Nov 13, 2007 1:59 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
>
> On Nov 13, 2007, at 4:50 PM, Vlad GALU wrote:
>
> >>vmio = 1
> >>offset = Unhandled dwarf expression opcode 0x93
> >> (kgdb)
> >>
> >
> >Do you happen to have ZERO_COPY_SOCKETS in your kernel config?
> >
> >
>
> Yes, I do.  Are they known to be bad under certain loads or just in
> general.  I don't have this issue with any other web server running
> the same kernel config but those are amd64 boxes mostly.
>
>


Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Vlad GALU
On 11/13/07, Vivek Khera <[EMAIL PROTECTED]> wrote:
>
> On Nov 13, 2007, at 4:50 PM, Vlad GALU wrote:
>
> >>vmio = 1
> >>offset = Unhandled dwarf expression opcode 0x93
> >> (kgdb)
> >>
> >
> >Do you happen to have ZERO_COPY_SOCKETS in your kernel config?
> >
> >
>
> Yes, I do.  Are they known to be bad under certain loads or just in
> general.  I don't have this issue with any other web server running
> the same kernel config but those are amd64 boxes mostly.

Remove, retry :) This thing bit me hard in the past too, see the
freebsd-fs@ archives.



-- 
Mahnahmahnah!


Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Vivek Khera


On Nov 13, 2007, at 4:50 PM, Vlad GALU wrote:

>>vmio = 1
>>offset = Unhandled dwarf expression opcode 0x93
>> (kgdb)
>
>Do you happen to have ZERO_COPY_SOCKETS in your kernel config?

Yes, I do.  Are they known to be bad under certain loads or just in
general?  I don't have this issue with any other web server running
the same kernel config, but those are mostly amd64 boxes.




Re: reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Vlad GALU
On 11/13/07, Vivek Khera <[EMAIL PROTECTED]> wrote:
> I've got a Dell 1750 box that was rock-solid stable running 4.11 for a
> couple of years now operating a pretty busy website backend.  A month
> or so ago we wiped it clean and repurposed it to run a different
> website running Drupal with a Varnish front-end cache using FreeBSD
> 6.2-RELEASE-p5.  The system is i386 and has 1Gb of RAM.
>
> Uname output: FreeBSD mb.kcilink.com 6.2-RELEASE-p5 FreeBSD 6.2-
> RELEASE-p5 #0: Wed Jun 27 10:47:15 EDT 2007
> [EMAIL PROTECTED]:/n/lorax1/usr6/obj.i386/n/lorax1/usr6/src/sys/
> KCI32SMP  i386
>
>
> The last week or so, it has been crashing regularly.  Sometimes twice
> per day, and sometimes it runs for two days without a problem.  I
> finally managed to make it dump a crashlog and core, and discovered
> that the panic was:
>
>   reboot after panic: vm_page_unwire: invalid wire count: 0
>
> I google around and found one old PR #33637 which had a patch but that
> was for FreeBSD 4.5.  I have also found two other mentions of this
> panic, one on the mailing lists with no responses, and another for a
> PR from 6.1-PRERELEASE, PR #94578, which has no comments on it.
>
> According to the http and varnish logs, we're not being particularly
> hit very hard when the panic happens, but I don't know if we lose some
> log data during the panic.
>
> I have the core and the kernel.debug.  I'm not sure what info to
> extract from it beyond the backtrace.  The watchdog timer fired and
> dropped me to DDB, so I just typed "watchdog" and "c" and let it
> finish dumping.
>
> Here's the backtrace, and "bt full" output.
>
>
> # kgdb kernel.debug /var/crash/vmcore.0
> [GDB will not be able to debug user-mode threads: /usr/lib/
> libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
> panic: vm_page_unwire: invalid wire count: 0
> cpuid = 1
> KDB: stack backtrace:
> kdb_backtrace(100,c5a76000,c0e88ab0,0,d90d82c8,...) at kdb_backtrace
> +0x29
> panic(c06b011f,0,c0e88ab0,efe80900,c057b96a,...) at panic+0x114
> vm_page_unwire(c0e88ab0,0) at vm_page_unwire+0x68
> vfs_vmio_release(d90d82c8) at vfs_vmio_release+0xa2
> getnewbuf(0,0,4000,4000) at getnewbuf+0x2bc
> getblk(c6f81550,4f5,0,4000,0,...) at getblk+0x360
> ffs_balloc_ufs2(c6f81550,13d4000,0,fa,c4f32780,...) at
> ffs_balloc_ufs2+0x1606
> ffs_write(efe80bec) at ffs_write+0x2ec
> VOP_WRITE_APV(c06e06a0,efe80bec) at VOP_WRITE_APV+0xce
> vn_write(c59c8000,efe80cbc,c51cf400,0,c5a76000) at vn_write+0x1ee
> dofilewrite(c5a76000,c,c59c8000,efe80cbc,,...) at dofilewrite
> +0x77
> kern_writev(c5a76000,c,efe80cbc,821bba3,fa,...) at kern_writev+0x3b
> write(c5a76000,efe80d04) at write+0x45
> syscall(3b,809003b,bfbf003b,0,bfbfeaa4,...) at syscall+0x2bf
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (4, FreeBSD ELF32, write), eip = 0x483d732f, esp =
> 0xbfbfe9dc, ebp = 0xbfbfea08 ---
> Uptime: 1d20h51m58s
> Dumping 1023 MB (2 chunks)
>chunk 0: 1MB (159 pages) ... ok
>chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879
> 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607
> 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335
> 319 303 287 271 255 239 223 207 191 175 159 143 127 111
> 95
> interrupt              total
> irq4: sio0             21758
> irq15: ata1                1
> irq16: bge0          4544565
> irq17: bge1         17684238
> irq18: amr0           588223
> cpu0: timer        323148326
> cpu2: timer        323148294
> cpu1: timer        323148331
> cpu3: timer        323148344
> Total             1315432158
> KDB: stack backtrace:
> kdb_backtrace(c069ec5d,4e67e6de,0,c06ea170,c06e9818,...) at
> kdb_backtrace+0x29
> watchdog_fire(c07120e0,c8,efe80634,c065c821,efe8063c,...) at
> watchdog_fire+0x9d
> hardclock(efe8063c) at hardclock+0x115
> lapic_handle_timer(0) at lapic_handle_timer+0x51
> Xtimerint(c4fe6000,1,efe806a8,c066d57b,c4fe6000,...) at Xtimerint+0x30
> getit(c4fe6000,c4fe6000,4,efe806c0,c0496f97,...) at getit+0x88
> DELAY(1) at DELAY+0x3

reboot after panic: vm_page_unwire: invalid wire count: 0

2007-11-13 Thread Vivek Khera
I've got a Dell 1750 box that was rock-solid stable running 4.11 for a
couple of years, operating a pretty busy website backend.  A month
or so ago we wiped it clean and repurposed it to run a different
website running Drupal with a Varnish front-end cache using FreeBSD
6.2-RELEASE-p5.  The system is i386 and has 1 GB of RAM.


Uname output: FreeBSD mb.kcilink.com 6.2-RELEASE-p5 FreeBSD 6.2-RELEASE-p5 #0: Wed Jun 27 10:47:15 EDT 2007 [EMAIL PROTECTED]:/n/lorax1/usr6/obj.i386/n/lorax1/usr6/src/sys/KCI32SMP  i386



The last week or so, it has been crashing regularly.  Sometimes it
crashes twice per day, and sometimes it runs for two days without a
problem.  I finally managed to make it dump a crashlog and core, and
discovered that the panic was:

 reboot after panic: vm_page_unwire: invalid wire count: 0

I googled around and found one old PR, #33637, which had a patch, but
that was for FreeBSD 4.5.  I have also found two other mentions of this
panic: one on the mailing lists with no responses, and another in a
PR from 6.1-PRERELEASE, PR #94578, which has no comments on it.

According to the http and varnish logs, we're not being hit
particularly hard when the panic happens, but I don't know if we lose
some log data during the panic.

I have the core and the kernel.debug.  I'm not sure what info to
extract from it beyond the backtrace.  The watchdog timer fired and
dropped me to DDB, so I just typed "watchdog" and "c" and let it
finish dumping.


Here's the backtrace, and "bt full" output.


# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
panic: vm_page_unwire: invalid wire count: 0
cpuid = 1
KDB: stack backtrace:
kdb_backtrace(100,c5a76000,c0e88ab0,0,d90d82c8,...) at kdb_backtrace+0x29
panic(c06b011f,0,c0e88ab0,efe80900,c057b96a,...) at panic+0x114
vm_page_unwire(c0e88ab0,0) at vm_page_unwire+0x68
vfs_vmio_release(d90d82c8) at vfs_vmio_release+0xa2
getnewbuf(0,0,4000,4000) at getnewbuf+0x2bc
getblk(c6f81550,4f5,0,4000,0,...) at getblk+0x360
ffs_balloc_ufs2(c6f81550,13d4000,0,fa,c4f32780,...) at ffs_balloc_ufs2+0x1606
ffs_write(efe80bec) at ffs_write+0x2ec
VOP_WRITE_APV(c06e06a0,efe80bec) at VOP_WRITE_APV+0xce
vn_write(c59c8000,efe80cbc,c51cf400,0,c5a76000) at vn_write+0x1ee
dofilewrite(c5a76000,c,c59c8000,efe80cbc,,...) at dofilewrite+0x77
kern_writev(c5a76000,c,efe80cbc,821bba3,fa,...) at kern_writev+0x3b
write(c5a76000,efe80d04) at write+0x45
syscall(3b,809003b,bfbf003b,0,bfbfeaa4,...) at syscall+0x2bf
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (4, FreeBSD ELF32, write), eip = 0x483d732f, esp = 0xbfbfe9dc, ebp = 0xbfbfea08 ---
Uptime: 1d20h51m58s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879
863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607
591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95
interrupt              total
irq4: sio0             21758
irq15: ata1                1
irq16: bge0          4544565
irq17: bge1         17684238
irq18: amr0           588223
cpu0: timer        323148326
cpu2: timer        323148294
cpu1: timer        323148331
cpu3: timer        323148344
Total             1315432158
KDB: stack backtrace:
kdb_backtrace(c069ec5d,4e67e6de,0,c06ea170,c06e9818,...) at kdb_backtrace+0x29
watchdog_fire(c07120e0,c8,efe80634,c065c821,efe8063c,...) at watchdog_fire+0x9d
hardclock(efe8063c) at hardclock+0x115
lapic_handle_timer(0) at lapic_handle_timer+0x51
Xtimerint(c4fe6000,1,efe806a8,c066d57b,c4fe6000,...) at Xtimerint+0x30
getit(c4fe6000,c4fe6000,4,efe806c0,c0496f97,...) at getit+0x88
DELAY(1) at DELAY+0x3b
amr_quartz_poll_command1(c4fe6000,c51fbff0,0,0,1000,...) at amr_quartz_poll_command1+0x1af
amr_setup_polled_dmamap(c51fbff0,c4fef800,1,0) at amr_setup_polled_dmamap+0x94
bus_dmamap_load(c4ffe380,0,c0c22000,1,c0496cd4,c51fbff0,1) at bus_dmamap_load+0x4b5
amr_quartz_poll_command(c51fbff0) at amr_quartz_poll_command+0x51
amr_dump_blocks(c4fe6000,0,4cb25e,c0c22000,80) at amr_dump_blocks+0x5f
amrd_dump(c515b700,c0c22000,0,9964bc00,0,1) at amrd_dump+0x7c
cb_
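
When comparing a panic like this one against the backtraces in older PRs,
it can help to reduce a KDB trace to bare function names. A rough sketch,
assuming the trace has been saved as text with each frame on a single line
in the "name(args) at name+0xoff" form; frames is a hypothetical helper
name, not an existing tool:

```shell
# Sketch only: frames is a hypothetical helper, not part of FreeBSD.
# Prints the function name of every "name(args) at ..." line found in the
# files given as arguments, or on stdin when no file is given.
# Lines that are not stack frames (uptime, dump progress, ...) are skipped.
frames() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)(.*) at.*$/\1/p' "$@"
}
```

Running it over the first trace in this message would list kdb_backtrace,
panic, vm_page_unwire, vfs_vmio_release and so on, one per line, which is
easier to diff against another report than the raw output.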

system does not reboot after panic (Was: savecore: first and last dump headers disagree)

2005-05-23 Thread Palle Girgensohn
--On Monday, May 23, 2005 10.30.47 +0200 Palle Girgensohn
<[EMAIL PROTECTED]> wrote:



Hi!

We have an amd64 system that still experiences crashes after installing
5.4, mostly during high loads. (It's been unstable all the time, really;
see previous posts.)

I've added dumpdev="/dev/amrd0s2b", and some time ago I did get
coredumps, but with latest versions of the kernel, savecore does not give
me a dump, instead it says:

savecore: first and last dump headers disagree on /dev/amrd0s2b
savecore: unsaved dumps found but not saved

What can I do to fix this? I guess I need a core dump to proceed in
finding the problem?



Peter Holm tipped me off about using savecore -f. Hopefully this will give
me a core next time. This one was already destroyed by swapping. :(
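
For reference, a minimal sketch of how that could be made to happen at
boot, assuming the stock rc.conf knobs dumpdev, dumpdir and savecore_flags
behave as documented in rc.conf(5) on this release. The device name is the
one from this thread; /var/crash is only the usual default directory.

```shell
# /etc/rc.conf fragment (sketch; values are assumptions, adjust as needed)
dumpdev="/dev/amrd0s2b"   # swap partition that receives the kernel dump
dumpdir="/var/crash"      # directory where savecore writes the core at boot
savecore_flags="-f"       # force extraction even if dump headers disagree
```

Leaving -f set permanently may re-extract a stale dump on every boot, so it
is probably best removed again once a usable core has been captured.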




Also, the machine does not reboot after a panic, that's an even bigger
problem, really, it needs console hands-on to revive every time.


This is really *the* main issue. It won't reboot automatically; it just
sits there waiting for keyboard action... :(  There is no debugger in the
kernel. Would adding KDB and KDB_UNATTENDED help? I doubt it. Is there
anything else that can be done?


/Palle



Last time it crashed (last week, before updating to 5.4-RELEASE; that
system was a few weeks older on the RELENG_5_4 branch), it seems to have
gotten stuck dumping the core. Can this be the problem?

---
Fatal trap 12: page fault while in kernel mode
cpuid = 0: apic id = 00
fault virtual address= 0x00
...
trap number  = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 1d23h50m36s
Dumping 2047 MB
 16 32

The cursor sits at the position after "32".

It seems to me that it fails to dump the core. Can this be it? On previous
crashes, before dumpdev was set, it would hang before that.

The machine is Dell 2850 w/ Perc raid, Dual CPUs, SMP with hyperthreading
OFF in BIOS. Enclosing the KERNEL config, almost a GENERIC kernel. I can
provide more info if required.

So, in short, three questions, really.

- How can I get rid of the crashes? (heh)
- How can I get the system to do unattended reboot when crashed?
- How do I get a coredump?

Any help appreciated.

/Palle


Diffing GENERIC vs KERNEL:

--- GENERIC  Tue Apr 12 15:57:01 2005
+++ KERNEL   Fri Apr 29 22:27:41 2005
@@ -20,7 +20,9 @@

 machine amd64
 cpu HAMMER
-ident   GENERIC
+ident   KERNEL
+
+makeoptions DEBUG=-g

 # To statically compile in device wiring instead of /boot/device.hints
 #hints  "GENERIC.hints" # Default places to look for devices.
@@ -45,7 +47,7 @@
 options COMPAT_43   # Needed by COMPAT_LINUX32
 options COMPAT_IA32 # Compatible with i386 binaries
 options COMPAT_FREEBSD4 # Compatible with FreeBSD4
-options COMPAT_LINUX32  # Compatible with i386 linux binaries
+#options COMPAT_LINUX32  # Compatible with i386 linux binaries
 options SCSI_DELAY=15000 # Delay (in ms) before probing SCSI
 options KTRACE  # ktrace(1) support
 options SYSVSHM # SYSV-style shared memory
@@ -64,10 +66,10 @@

 # Enabling NO_MIXED_MODE gives a performance improvement on some motherboards
 # but does not work with some boards (mostly nVidia chipset based).
-#options NO_MIXED_MODE   # Don't penalize working chipsets
+options NO_MIXED_MODE   # Don't penalize working chipsets

 # Linux 32-bit ABI support
-options LINPROCFS   # Cannot be a module yet.
+#options LINPROCFS   # Cannot be a module yet.

 # Bus support.  Do not remove isa, even if you have no isa slots
 device  acpi
@@ -260,3 +262,19 @@
 device  firewire# FireWire bus code
 device  sbp # SCSI over FireWire (Requires scbus and da)
 device  fwe # Ethernet over FireWire (non-standard!)
+
+# SMP
+options SMP
+
+# SysV stuff
+# This provides support for System V shared memory.
+#
+options SYSVSHM
+options SYSVSEM
+options SYSVMSG
+options SHMMAXPGS=65536
+options SEMMNI=40
+options SEMMNS=240
+options SEMUME=40
+options SEMMNU=120




