Re: Stability problems with 7-stable (after 7.1 - 7.2 - 7-stable)

2009-12-18 Thread Alexander Leidinger

Quoting Boris Samorodov b...@ipt.ru (from Thu, 17 Dec 2009 20:55:44 +0300):


Ivan Voras ivo...@freebsd.org writes:

Alexander Leidinger wrote:

Hi,

please CC me on replies.


Seems you were not CCed...


I'm now subscribed to stable@, thanks for forwarding this.


I have a system which was at 7.1-pX. After the update to 7.2-p5 it
started to exhibit deadlocks after some minutes of uptime.

With 7.1 (generic kernel) it was running fine, with 7.2 generic the
problems started directly.

The system is now at 7-stable with a custom kernel
(http://www.Leidinger.net/test/ALCATRAZ), basically generic without
unneeded drivers plus witness/invariants/sw-watchdog.

The system is an AMD Dual Core with NVidia MCP61 chipset
(http://www.Leidinger.net/test/dmesg.alcatraz), 2 GB RAM, 2
harddisks and FreeBSD 32bit install.


Some generic things to try:
- did you monitor the system with something (top or systat
-vm) to see if there is something unusual, like interrupt storms?


When I had the initial problems, I asked for a KVM-switch to be  
connected to the system (not a free service). In SU mode I didn't see  
any problem. When starting the system but not the jails, I didn't see  
any problem (cvsup/buildworld/...). When I started the jails, I  
started to see the problems.



- no physical access is a problem; If you do manage it, I'd
say try running single user for some time with systat -vm just to see
what happens.


This is not an option now.


I would not trust ZFS in 7-stable since it lags a bit behind patches
done to 8 but 7.2 should be fine - at least I don't have any such
problems with it (though no AMD boxes to test them with it).


Ivan, the system started out without ZFS; only after I started to see
deadlocks did I switch to ZFS. This _improved_ the situation. Now
the system survives between 3h and about 11h without a deadlock. If I
run a script every 5 minutes which logs 4 text lines to the root (UFS)
and runs 3x sync + sleep 5 + 3x sync, the frequency of deadlocks
increases.
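
For anyone who wants to reproduce that load pattern, here is a minimal
Python sketch of it. This is not the script running on the machine; the
function name, the log path and the content of the logged lines are
assumptions made up for illustration. Only the sequence (4 lines
appended, 3x sync, sleep 5, 3x sync) is taken from the description above.

```python
import os
import time
from datetime import datetime


def stress_once(log_path, lines=4, syncs=3, pause=5.0,
                sync=os.sync, sleep=time.sleep):
    """Append a few lines to a log file, then sync, pause, and sync again.

    log_path is a hypothetical file on the root (UFS) filesystem.
    sync and sleep are parameters only so the function can be exercised
    without touching the disks; os.sync needs Python 3.3+ on Unix.
    """
    # Log the text lines (their content is an assumption).
    with open(log_path, "a") as fh:
        for i in range(lines):
            fh.write("%s sync-test line %d\n"
                     % (datetime.now().isoformat(), i + 1))

    # First burst of syncs.
    for _ in range(syncs):
        sync()

    # Pause, then the second burst of syncs.
    sleep(pause)
    for _ in range(syncs):
        sync()
```

Calling stress_once() with a path on the root filesystem from a cron job
every 5 minutes would approximate the pattern; whether it triggers the
deadlock on another machine is of course not guaranteed.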



If you haven't updated your ZFS pools, I'd suggest reverting back to
7.1, then building or downloading an 8.0 kernel and try it with 7.1
userland (reboot -k ...) simply to see if it helps.


IIRC there were KBI changes (ifconfig?) which prevent me from going back
to 7.1 without access to the console. As this is a production machine
(it hosts not only my blog/website/mails, but stuff from other people
too), the goal is to stabilize this system now.


Kib analyzed 2 crashdumps I had (watchdog triggered) and he thinks
they are because of ZFS deadlocks. So the initial problem (without
ZFS) is not known yet, but this info will hopefully allow me to stabilize
the system further (see also my mail about at least 57 unmerged ZFS
patches).


Bye,
Alexander.

--
Universities are places of knowledge.  The freshman each bring a little
in with them, and the seniors take none away, so knowledge accumulates.

http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Stability problems with 7-stable (after 7.1 - 7.2 - 7-stable)

2009-12-15 Thread Alexander Leidinger

Hi,

please CC me on replies.

I have a system which was at 7.1-pX. After the update to 7.2-p5 it  
started to exhibit deadlocks after some minutes of uptime.


With 7.1 (generic kernel) it was running fine, with 7.2 generic the  
problems started directly.


The system is now at 7-stable with a custom kernel  
(http://www.Leidinger.net/test/ALCATRAZ), basically generic without  
unneeded drivers plus witness/invariants/sw-watchdog.


The system is an AMD Dual Core with NVidia MCP61 chipset  
(http://www.Leidinger.net/test/dmesg.alcatraz), 2 GB RAM, 2 harddisks  
and FreeBSD 32bit install.


On the system are 3 jails (one postfix+mysql+apache, one  
mysql+apache+some-perl-service, one apache+mysql+xmpp-server). All of  
them have a 7-stable world.


The 2 disks were configured with 3 partition pairs for root-mirror,
swap-mirror, and jail-mirror.


I tested with and without SMP, both schedulers, with  
WITNESS/INVARIANTS, and by removing one part of each mirror (to rule  
out that the disks are not in sync). In all cases the system was not  
stable and deadlocked after several minutes (even with only the  
mail-jail up and running). First no interaction via ssh is possible  
anymore, then even ping does not work anymore. After configuring the  
watchdog, I got at least the system back online automatically... :(


After reading  
http://www.mail-archive.com/freebsd-stable@freebsd.org/msg96901.html I  
decided to switch the FS for the jails to ZFS (currently only on one  
harddisk, the other partition for it is still with UFS, but not  
mounted at all) as a test.


Now with a little bit of kernel tuning for ZFS
(http://www.Leidinger.net/test/loader.conf.alcatraz) I was able to
keep the system up for about 3h with all jails activated (I started
one jail after another, waiting 1h before starting the next one).
After that there was no access via ssh and no ping, but also no reboot
from the sw-watchdog; I had to do a remote power-off/-on. After that I
didn't have any crashdump (in the watchdog cases I had dumps, but since
I have recompiled the kernel since then, I cannot provide useful output).


The current gmirror status output is at
   http://www.Leidinger.net/test/gmirror.alcatraz

The system has no serial console. I have no physical access.

For such a small setup I would expect that 7.2-GENERIC is more than  
enough. At least 7.1-GENERIC was running without any problem.


Does this problem sound familiar to someone, any ideas what to try,  
anyone with patches I could test?


Bye,
Alexander.

--
I'm not a real movie star -- I've still got the same wife I started out
with twenty-eight years ago.
-- Will Rogers

http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137


Re: Stability problems with 7-stable (after 7.1 - 7.2 - 7-stable)

2009-12-15 Thread Ivan Voras

Alexander Leidinger wrote:

Hi,

please CC me on replies.

I have a system which was at 7.1-pX. After the update to 7.2-p5 it 
started to exhibit deadlocks after some minutes of uptime.


With 7.1 (generic kernel) it was running fine, with 7.2 generic the 
problems started directly.


The system is now at 7-stable with a custom kernel 
(http://www.Leidinger.net/test/ALCATRAZ), basically generic without 
unneeded drivers plus witness/invariants/sw-watchdog.


The system is an AMD Dual Core with NVidia MCP61 chipset 
(http://www.Leidinger.net/test/dmesg.alcatraz), 2 GB RAM, 2 harddisks 
and FreeBSD 32bit install.


Some generic things to try:
	- did you monitor the system with something (top or systat -vm) to see 
if there is something unusual, like interrupt storms?
	- no physical access is a problem; If you do manage it, I'd say try 
running single user for some time with systat -vm just to see what happens.
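
Since there is no console to watch, the interrupt check suggested above
could also be logged from a script. A rough Python sketch, assuming the
usual three-column layout of "vmstat -i" output (interrupt name, total,
rate): the sample text and the threshold of 10000 interrupts/s are
invented for illustration and are not taken from the machine in
question.

```python
# Illustrative sample in the layout printed by "vmstat -i" on FreeBSD.
# The numbers are made up; they are NOT from the system in this thread.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                          12          0
irq21: ohci0 ehci0               9000000      30000
cpu0: timer                      1200000       2000
Total                           10200012      32000
"""


def parse_vmstat_i(text):
    """Return (name, total, rate) tuples for every data row."""
    rows = []
    for line in text.splitlines():
        # The name may contain spaces, so split off the last two columns.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # skips the header line
        rows.append((name.strip(), int(total), int(rate)))
    return rows


def find_storms(rows, threshold=10000):
    """Return (name, rate) for sources at or above the threshold.

    The threshold is an arbitrary assumption; the "Total" summary row
    is ignored.
    """
    return [(name, rate) for name, _total, rate in rows
            if name != "Total" and rate >= threshold]


print(find_storms(parse_vmstat_i(SAMPLE)))
```

On the real system the text would come from running "vmstat -i"
periodically and appending the result to a log, so the last entries
before a hang can be inspected after the reboot.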


I would not trust ZFS in 7-stable since it lags a bit behind patches 
done to 8 but 7.2 should be fine - at least I don't have any such 
problems with it (though no AMD boxes to test them with it).


If you haven't updated your ZFS pools, I'd suggest reverting back to 
7.1, then building or downloading an 8.0 kernel and try it with 7.1 
userland (reboot -k ...) simply to see if it helps.

