panic in kevent
Hi

One of my DL360G5 boxes running 7.0 had a panic this night:

    jb-2 ~$ uname -rsv
    FreeBSD 7.0-RELEASE-p4 FreeBSD 7.0-RELEASE-p4 #2: Thu Sep 4 10:49:27 CEST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DL360G5

The config is a GENERIC with some pf, IPSEC and ALTQ stuff enabled.

    jb-2 /usr/obj/usr/src/sys/DL360G5# kgdb kernel.debug /var/crash/vmcore.0
    [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
    GNU gdb 6.1.1 [FreeBSD]
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "amd64-marcel-freebsd".

    Unread portion of the kernel message buffer:
    panic: page fault
    cpuid = 1
    Uptime: 40d22h42m5s
    Physical memory: 10225 MB
    Dumping 867 MB: 852 836 820 804 788 772 756 740 724 708 692 676 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 356

    #0 doadump () at pcpu.h:194
    194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
    (kgdb) where
    #0 doadump () at pcpu.h:194
    #1 0x0004 in ?? ()
    #2 0x804bb259 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
    #3 0x804bb65d in panic (fmt=0x104 bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
    #4 0x8079ec84 in trap_fatal (frame=0xff01b33229f0, eva=18446742984664492240) at /usr/src/sys/amd64/amd64/trap.c:724
    #5 0x8079f055 in trap_pfault (frame=0xb6337780, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
    #6 0x8079f998 in trap (frame=0xb6337780) at /usr/src/sys/amd64/amd64/trap.c:410
    #7 0x8078560e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
    #8 0x80494b0b in knlist_remove_kq (knl=0xff0114407748, kn=0xff0054f5fc30, knlislocked=0, kqislocked=0) at /usr/src/sys/kern/kern_event.c:1615
    #9 0x80495f58 in kqueue_register (kq=Variable "kq" is not available.
    ) at /usr/src/sys/kern/kern_event.c:956
    #10 0x804962f3 in kern_kevent (td=0xff01b33229f0, fd=Variable "fd" is not available.
    ) at /usr/src/sys/kern/kern_event.c:673
    #11 0x80496ca5 in kevent (td=0xff01b33229f0, uap=0xb6337be0) at /usr/src/sys/kern/kern_event.c:594
    #12 0x8079f2d7 in syscall (frame=0xb6337c70) at /usr/src/sys/amd64/amd64/trap.c:852
    #13 0x8078581b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
    #14 0x10999ccc in ?? ()
    Previous frame inner to this frame (corrupt stack?)
    (kgdb)

Please let me know if I can help with anything else. Is there any way to know which app caused this?

I did some googling, with only one or two similar crashes as result, although the hits didn't give much. I've never had this crash before.

Thanks
-- 
Johan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
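On the question of which application triggered the panic: frames #10 and #11 of the backtrace still carry the `td` (thread) pointer, so the owning process can usually be read out of the dump. What follows is only a minimal sketch. It assumes the same `kernel.debug` and `vmcore.0` as in the report, that kgdb reads commands from standard input the way stock gdb does, and that the `struct thread` / `struct proc` fields `td_proc`, `p_comm` and `p_pid` are readable in this dump. The function name `show_faulting_process` is made up for illustration.

```shell
# Hypothetical helper: print the name and PID of the process whose
# kevent() syscall ended in the page fault.
# $1 = debug kernel, $2 = crash dump
show_faulting_process() {
    # Assumption: kgdb accepts commands on stdin like gdb does.
    # Frame 11 is kevent(), where td is still available.
    kgdb "$1" "$2" <<'EOF'
frame 11
print td->td_proc->p_comm
print td->td_proc->p_pid
EOF
}
```

It would be called as `show_faulting_process kernel.debug /var/crash/vmcore.0`; the same three commands can also be typed at the `(kgdb)` prompt by hand.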
Re: Jails and PF states on localhost
No-one with any clues or recommendations? :/
CCing to -stable too..

Thanks
-- 
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/

On Oct 29, 2007, at 09:37 , Johan Ström wrote:

> Hello
>
> I got a FreeBSD 6.2 box running a few jails, with a pretty strict PF ruleset. I got a problem with traffic between two of the jails. Both have public IPs (one of them has two, using the jail-multiple-ip-patch). The problem I have is when they are to talk with each other.
>
> First let me describe the PF ruleset (somewhat stripped down, but this should be the relevant stuff):
>
>     jail1=xx.xx.xx.131
>     jail2a=xx.xx.xx.133
>     jail2b=xx.xx.xx.134
>
>     scrub in all
>     block drop in log
>
>     # base system talk to itself
>     pass in on lo0 inet from 127.0.0.1 to 127.0.0.1
>     # all can talk out
>     pass out on em0 proto tcp flags S/SA modulate state
>     pass out on em0 proto udp keep state
>     # jails talk to themselves
>     pass in on lo0 inet from $jail1 to $jail1
>     pass in on lo0 inet from {$jail2a $jail2b} to {$jail2a $jail2b}
>     # let smtp in on jail1
>     pass in on {lo0 em0} inet proto tcp from any to $jail1 port smtp flags S/SA modulate state
>
> Okay, so the problem occurs when jail2 shall talk to jail1 on port 25 (smtp). From the above rules, when the traffic leaves jail2 (traffic comes from $jail2b it seems) it should match the last rule and create a state. And so it does!
>
>     self tcp xx.xx.xx:25 <- xx.xx.xx.134:57557 SYN_SENT:ESTABLISHED
>        [3014249759 + 65536](+2074393365) wscale 1 [4121000179 + 65536] (+541973245) wscale 1
>        age 00:01:03, expires in 00:00:01, 7:10 pkts, 384:640 bytes
>
> So the SYN arrives at $jail1, but the SYN-ACK fails to go back to $jail2b (where the state should let the packet back in?), which is also seen in the following row from pflog0:
>
>     09:30:34.370402 rule 1/0(match): block in on lo0: (tos 0x0, ttl 64, id 35618, offset 0, flags [DF], proto: TCP (6), length: 64) xx.xx.xx.131.25 > xx.xx.xx.134.57557: S 793675827:793675827(0) ack 4121000179 win 65535
>
> So.. What have I missed? The state is created, but it doesn't seem to match enough bytes or something? 384:640 matched bytes, so it matches in both directions?
>
> Any clues are welcome!
> Thanks
> -- 
> Johan Ström
> Stromnet
> [EMAIL PROTECTED]
> http://www.stromnet.se/
> ___
> [EMAIL PROTECTED] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-pf
> To unsubscribe, send any mail to "[EMAIL PROTECTED]"
scrambled (gmirror) dmesg output
Hello

I'm playing with a new box running RELENG_7.0 from yesterday. I got two discs with gmirror on ad[6|14]s1a and zfs-mirror on s1d.

When I do atacontrol detach ata7 (detach ad14), I get this in dmesg:

(first time)

    subdisk14: detached
    ad14: detached
    GEOM_MIRROR: Device gm1b: provider ad14s1bG dEiOsMc_oMnInReRcOtRe:d .De vice gm1: provider ad14s1a disconnected.

(second time, detaching again after reattach)

    subdisk14: detached
    ad14: detached
    GEOMG_EMOIMR_RMOIRR:R ORD:e viDceev icgem 1bg:m 1p:r opvriodveird era d1a4ds114bs 1dai sdciosncnoencnteecdt.ed.

huh? :) Some print racing or something?

Btw, I'm doing ZFS'ed root as on the wiki, but I added gmirror to the root partition too (and steps to install from one disc to the other, then boot over and add the original disc to the mirrors).. I've documented the steps (or at least the commands and some simple comments); would anyone be interested in having it, on the wiki or otherwise?

-- 
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
Re: scrambled (gmirror) dmesg output
On Dec 1, 2007, at 12:37 , Jeremy Chadwick wrote:

> On Sat, Dec 01, 2007 at 12:16:45PM +0100, Johan Ström wrote:
>> Hello
>>
>> I'm playing with a new box running RELENG_7.0 from yesterday. I got two discs with gmirror on ad[6|14]s1a and zfs-mirror on s1d.
>>
>> When I do atacontrol detach ata7 (detach ad14), I get this in dmesg:
>>
>> (first time)
>>
>>     subdisk14: detached
>>     ad14: detached
>>     GEOM_MIRROR: Device gm1b: provider ad14s1bG dEiOsMc_oMnInReRcOtRe:d .De vice gm1: provider ad14s1a disconnected.
>>
>> (second time, detaching again after reattach)
>>
>>     subdisk14: detached
>>     ad14: detached
>>     GEOMG_EMOIMR_RMOIRR:R ORD:e viDceev icgem 1bg:m 1p:r opvriodveird era d1a4ds114bs 1dai sdciosncnoencnteecdt.ed.
>>
>> huh? :) Some print racing or something?
>
> The problem isn't specific to GEOM or ZFS. It's a known issue with two kernel printf()s being called simultaneously. There are older threads discussing the issue. I can dig up URLs if you want to read them, but I don't have them available quickly...

Just what I thought then. I've just never seen it on 6.x (where I use gmirror), so I was a bit curious.

Btw, zfs doesn't seem to be very "chatty" in dmesg? I.e. losing discs, starting to rebuild discs etc... Isn't that something one would want in logs?

Thanks!

> -- 
> | Jeremy Chadwick                                jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: I just broke out of a FreeBSD jail.. Known bug??
On Dec 28, 2007, at 13:41 , Edwin Groothuis wrote:

> On Fri, Dec 28, 2007 at 01:15:38PM +0100, Johan Ström wrote:
>> That's my home dir on core!.. That should very much not be visible there! I have full access now (from the wrong jail!)
>>
>> Known bug or did I just stumble upon something pretty bad??
>
> You didn't really break out of it, the person who managed the machine did something he shouldn't have done: Moving the directories while the jail(s) were running.
>
> It should be mentioned in the BUGS section of the jail(8) command.

Yes, that's true.. Without "super-root" doing that, the "breakout" would never happen. But it's still a bug, so yes, I guess it should be mentioned in BUGS (and in the handbook too? not sure where this kind of "special feature" is noted) unless it's fixed.

-- 
Johan
I just broke out of a FreeBSD jail.. Known bug??
Hello list!

I'm running a FreeBSD 6.2-p8 box with a few jails. The other day a user of mine uploaded a number of files to one jail, then I (in the actual system outside of all jails) moved that directory to another jail.. When I later did some chdir'ing in the original jail, I found myself standing in my other jail's pwd and being able to read/manipulate files!..

Example:

    jb-1 (the base machine, jailbox-1)
    shell (jail 1)
    core (jail 2)

    shell /home/johan# pwd
    /home/johan
    shell /home/johan# ls
    .cshrc        .irssi        .login_conf   .mailrc       .profile      .shrc         .zcompdump    public_html
    .histfile     .login        .mail_aliases .noident      .rhosts       .ssh          .zshrc
    shell /home/johan# mkdir test
    shell /home/johan# cd test
    shell /home/johan/test# touch asd
    shell /home/johan/test# ls -al
    total 4
    drwxr-xr-x  2 root   root   512 Dec 28 13:09 .
    drwxr-x--x  6 johan  johan  512 Dec 28 13:09 ..
    -rw-r--r--  1 root   root     0 Dec 28 13:09 asd
    shell /home/johan/test#

Then moving it on the root box:

    jb-1 /usr/jails# mv shell/home/johan/test core/home/johan/
    jb-1 /usr/jails#

And back on the shell jail:

    shell /home/johan/test# ls
    asd
    shell /home/johan/test# pwd
    pwd: .: No such file or directory
    shell /home/johan/test# cd ..
    shell /home/johan# ls
    .cshrc         .lesshst       .mailrc        .shrc          .vimrc         file.big       roundcube.sql  www.tar.gz
    .histfile      .login         .mysql_history .ssh           .zcompdump     pics           stuff
    .history       .login_conf    .profile       .vim           .zshrc         postfix-2.4.5  test
    .irssi         .mail_aliases  .rhosts        .viminfo       cacert.pem     public_html    vmail.tar.gz
    shell /home/johan#

That's my home dir on core!.. That should very much not be visible there! I have full access now (from the wrong jail!)

Known bug or did I just stumble upon something pretty bad??

-- 
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
Backup solution suggestions
Hello

I'm looking to invest in some new hardware for backup, probably some kind of NAS (a 4-disk 1U NAS or something in that size). The thing is that I won't be the only one with access to this box, thus I would like to secure my data.

What I would like is encryption both for the transfer to the box and on disk. The data on disk should not be readable by anyone but me (i.e. the other user(s) of the box should not be able to read it, at least not without a big effort).

So, I'm wondering what the best solution might be.. Tarballing all my stuff, encrypting it with GPG or something and just dumping it there with NFS would be the easiest solution, but maybe not the best. I've been thinking about running a GELI image on my box, and storing that on the NAS over NFS.. would that be doable/secure/stable?

Another idea would be to go with some regular 1U box running some FBSD, doing scp to the box and geli locally on the box, but that would require me to have the encryption keys on that box (which would be shared, so no good idea).

Any other ideas? Being able to rsync to the backup storage instead of just sending big encrypted tarballs would be very nice (and I guess that would be possible with the geli version).

Maybe not the perfect list for this, but it is somewhat FreeBSD specific and I'm sure some other people on the list have had similar situations :)

-- 
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
Re: Backup solution suggestions
First of all, thanks for your extensive answer!

On Jan 15, 2008, at 13:34 , Jeremy Chadwick wrote:

> On Tue, Jan 15, 2008 at 10:52:56AM +0100, Johan Ström wrote:
>> I'm looking to invest in some new hardware for backup, probably some kind of NAS (a 4-disk 1U NAS or something in that size). The thing is that I won't be the only one with access to this box, thus I would like to secure my data.
>
> In my experience, your best bet when it comes to backups like what you want (1U box with 4 disks, or a 2U box with 8 or more) is to simply buy a server with the specifications you want, and run FreeBSD on it. I cannot recommend commercial products for something of this "scale" (e.g. small/medium).
>
> I could list off all the reasons why [as a small hosting provider] I avoid proprietary backup solutions, but the list is quite long. The two main reasons:
>
> 1) Proprietary solutions often use proprietary hardware. How do you know what's inside of that mystery box? What if it uses a SATA controller you know has h/w-level bugs in it? What if something in the device fails; are you going to be charged an arm and a leg for a replacement part? Does it even HAVE user-serviceable parts? etc...
>
> I feel much more confident relying on hardware that I'm familiar with, e.g. I know what motherboard is in the server I buy or build, I know who makes it, I know if it's compatible with FreeBSD or Linux, I know the SATA controller works and isn't flaky, I know the SATA backplane actually works properly and supports hot-swapping, and I know if I need replacement parts I can get them promptly.
>
> Also, if the h/w I buy turns out to have compatibility problems or performance issues, I can always return it, get my money back, and try other h/w; with a proprietary solution you're "stuck with it", and if something's broken about it which the vendor can't/won't fix, you're screwed.
>
> 2) Proprietary solutions also means proprietary software. This is pretty much guaranteed regardless of what h/w is used. What if the volume manager used for your array has a bug and your data is corrupt? You have no way of really "knowing" this until it's too late, and you only have one person to turn to: the vendor.

All good points there, cannot argue against that. Certainly something to think about before doing any purchases. The only thing against that right now is size (we've got "cheap" access to a rack with limited depth); I haven't really found any good 1U chassis that aren't too deep. Admittedly I haven't spent very much time looking yet, but.. :)

> I prefer to have freedom of choice when it comes to backup methods. "Hmm, dump/restore isn't working out very well, so maybe I'll try ZFS, or bacula, or tar over NFS, or rsync, or...".
>
>> What I would like is encryption both for the transfer to the box, and encrypted on disk. The data on disk should not be readable by anyone but me (ie the other user(s) of the box should not be able to read it, at least not without a big effort).
>
> I'm curious what the reason is for on-disk encryption? Is it necessary for something *only you* will have access to? What's the concern here?

I think I wrote that I *won't* be the only one with access to the box. Sorry if that wasn't clear. It will be shared with a friend of mine (or rather his company). I do trust him, but to keep some level of security I don't want him (or rather, someone with access to his box) to be able to read my files (and the other way around for his files).

>> So, I'm wondering what the best solution might be.. Tar'balling all my stuff and encrypt it with GPG or something and just dump it there with NFS would be the easiest solution, but maybe not the best. I've been thinking about running a GELI image on my box, and store that on the NAS over NFS.. would that be doable/secure/stable?
>
> I would recommend avoiding NFS unless the machine you're running nfsd/mountd/portmap on has no direct way to talk to the Internet. It's impossible to get NFS-related daemons to bind solely to one IP/interface on FreeBSD, which imposes a security risk. If the machine is behind NAT, you're very likely safe (unless the public has some way of accessing another machine on that NAT network). Thus, if you choose to go the NFS route, have it on a segregated network.

The box will be on a separate LAN only accessible by our two boxes. No internet connectivity. But the client boxes of course have internet connectivity (but those would only be NFS clients, not servers).

> That said -- what we use in our production environment is dump/restore over SSH over a dedicated LAN. I wrote a series of scripts that do this, using SSH keys for the SSH portion. Incrementals are done 6 days a week, with fulls done once a week. I use a si
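The scripts mentioned in the quoted text are cut off in the archive and are not reproduced here. Purely as an illustration of a "one full per week, incrementals on the other six days" schedule with dump(8) over SSH, a sketch could look like the following. The choice of Sunday for the full dump, the use of level 1 for the incrementals, and the function names `dump_level_for_day` and `dump_over_ssh` are all assumptions, not anything taken from the thread.

```shell
# Hypothetical helper: pick a dump(8) level from the ISO weekday
# (1 = Monday ... 7 = Sunday). Assumption: the weekly full runs on Sunday.
dump_level_for_day() {
    if [ "$1" -eq 7 ]; then
        echo 0      # full dump
    else
        echo 1      # everything changed since the last level 0
    fi
}

# Hypothetical helper: stream a dump of filesystem $1 to host $2 over SSH
# and store it there as file $3. Defined here, not called.
dump_over_ssh() {
    level=$(dump_level_for_day "$(date +%u)")
    # -L snapshots a live filesystem, -a skips tape-length estimates,
    # -u records the run in /etc/dumpdates, -f - writes to stdout.
    dump "-${level}" -L -a -u -f - "$1" | ssh "$2" "cat > '$3'"
}
```

With key-based SSH login, as described in the quoted text, such a function could be run from cron without a password prompt.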
Re: Backup solution suggestions
On Jan 15, 2008, at 15:03 , Ronald Klop wrote:

> This sounds like a problem for 'tarsnap'. It's from the same author as portsnap.
>
> http://www.daemonology.net/blog/2006-09-13-encrypted-backup.html
> http://www.daemonology.net/blog/2007-08-29-tarsnap-update.html
> http://www.tarsnap.com/
>
> I never used it so I don't know more about it than you can find in these URLs.

Indeed, sounds like that could be what I'm looking for. However, there's nothing public yet, only a "private" beta.. And from the sound of it, only a hosted service? Nothing that one will be able to put up oneself.

http://news.ycombinator.com/item?id=81221

And a remotely hosted service I can have at home, but it's the bandwidth of the internet link that limits me..

-- 
Johan
Re: Backup solution suggestions
On Jan 15, 2008, at 13:44 , Jeremy Chadwick wrote:

> On Tue, Jan 15, 2008 at 12:40:02PM +0100, Vladimir Botka wrote:
>> On Tue, 15 Jan 2008 10:52:56 +0100, Johan Ström <[EMAIL PROTECTED]> wrote:
>>> Hello
>>>
>>> I'm looking to invest in some new hardware for backup, probably some kind of NAS (a 4-disk 1U NAS or something in that size). The thing is that I won't be the only one with access to this box, thus I would like to secure my data.
>>>
>>> What I would like is encryption both for the transfer to the box and on disk. The data on disk should not be readable by anyone but me (i.e. the other user(s) of the box should not be able to read it, at least not without a big effort).
>>>
>>> So, I'm wondering what the best solution might be.. Tarballing all my stuff, encrypting it with GPG or something and just dumping it there with NFS would be the easiest solution, but maybe not the best. I've been thinking about running a GELI image on my box, and storing that on the NAS over NFS.. would that be doable/secure/stable?
>>>
>>> Another idea would be to go with some regular 1U box running some FBSD, doing scp to the box and geli locally on the box, but that would require me to have the encryption keys on that box (which would be shared, so no good idea).
>>>
>>> Any other ideas? Being able to rsync to the backup storage instead of just sending big encrypted tarballs would be very nice (and I guess that would be possible with the geli version).
>>>
>>> Maybe not the perfect list for this, but it is somewhat FreeBSD specific and I'm sure some other people on the list have had similar situations :)
>>>
>>> -- 
>>> Johan Ström
>>> Stromnet
>>> [EMAIL PROTECTED]
>>> http://www.stromnet.se/
>>
>> Hello,
>> As for the encryption on the transfer, I use security/sfs to mount the remote directory for backup and then rsync locally.
>
> I thought SFS looked pretty neat until I saw this in the documentation:
>
>     Finally, you must export all the local-directorys in your sfsrwsd_config to localhost via NFS version 3.
>
> See my mail to Johan, as it documents a known "issue" with nfsd/mountd/portmap on FreeBSD (re: binding to INADDR_ANY and using dynamically-allocated port numbers).
>
> This circles back to my "if you HAVE to use NFS, do so on a dedicated network which has no public access" statement.

SFS indeed looked very nice, but as I understand it, it doesn't provide the encrypted-on-disk feature I need. As mentioned earlier, I don't want to store crypto keys on the backup machine itself, otherwise I could have used geli or something.

Thanks
-- 
Johan
Re: Backup solution suggestions
On Jan 15, 2008, at 22:09 , Aristedes Maniatis wrote:

> On 15/01/2008, at 8:52 PM, Johan Ström wrote:
>> I'm looking to invest in some new hardware for backup, probably some kind of NAS (a 4-disk 1U NAS or something in that size). The thing is that I won't be the only one with access to this box, thus I would like to secure my data.
>>
>> What I would like is encryption both for the transfer to the box and on disk. The data on disk should not be readable by anyone but me (i.e. the other user(s) of the box should not be able to read it, at least not without a big effort).
>
> Take a look at bacula. It is a proper backup system, meaning that it does incremental backups, etc. Storage pools can be encrypted. Not sure if the network stream can be, but that could be solved with an ssh tunnel. And it is open source, reliable and runs nicely on FreeBSD.

My main problem with existing solutions is this "gap" of encryption on the backup server side. I don't want my data to be readable outside of my box (without the encryption keys, of course), so as soon as I send it off from my box I want it to be encrypted over the link, and down on the disk. Not decrypted on the remote box, to then be encrypted again (with keys available on that box) and then stored to disk. That would allow any users of that box (yes, sure, you can have file permissions, but let's assume someone else has root access there) to read my files.

Simple example:
I create a regular tarball (gzipped maybe) with some files I want to back up. Then I encrypt this file with e.g. gpg. Then I send off this file using some unspecified network protocol to the storage server. Encrypted all the way, from my end to the remote disk.. The downside is that it is a static file.. not a "dynamic filesystem", nothing I can mount and have easy access to individual files from. *That's* what I'm looking for.

-- 
Johan
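The "simple example" above (tarball, then gpg, then some network transport) can be sketched as a single pipeline, so that no unencrypted copy is ever written anywhere. This is only an illustration under assumptions: the function name `ship_encrypted_backup` is invented, SSH is picked as the "unspecified network protocol", and the GPG recipient is assumed to be a public key already in the local keyring.

```shell
# Hypothetical helper: archive directory $1, encrypt it to GPG recipient $2,
# and store the ciphertext as file $4 on host $3. Only encrypted bytes
# leave the local machine; the private key never touches the backup box.
ship_encrypted_backup() {
    # gpg reads the tarball on stdin and writes ciphertext to stdout.
    tar -czf - -C "$1" . |
        gpg --encrypt --recipient "$2" |
        ssh "$3" "cat > '$4'"   # sketch: assumes $4 contains no single quotes
}
```

As the message itself notes, the result is still one static file per run; nothing on the remote side can be mounted or browsed per file.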
Re: Backup solution suggestions [ggated]
On Jan 16, 2008, at 23:27 , Ulrich Spoerlein wrote:

> On Wed, 16.01.2008 at 00:26:34 +0100, Johan Ström wrote:
>> I create a regular tarball (gzipped maybe) with some files I want to back up. Then I encrypt this file with e.g. gpg. Then I send off this file using some unspecified network protocol to the storage server. Encrypted all the way, from my end to the remote disk.. The downside is that it is a static file.. not a "dynamic filesystem", nothing I can mount and have easy access to individual files from. *That's* what I'm looking for.
>
> Export the disk on the backup server with ggated. Bind it on the client with ggatec. Slap a GELI or GBDE encryption on top of it and then put a ZFS on top of it.
>
> You can mount/import this "remote" ZFS at will and do your zfs send/receive on your local box. Nothing ever leaves your box unencrypted.

Now that is a cool solution! That actually sounds like something doable.

I tried it out some at home between a 6.2 box (client) and a 7.0 box (server), hosting the system in a ZFS "sparse volume" with a predefined size, exported that via ggated and connected ggatec on the client box. I then did some experimentation with just newfs, and it worked great!

The only downside with this would be that the size is fixed. So I played around a bit with setting the volsize property in ZFS and it seemed to work just fine. zfs list reported the new, bigger size. Restarted ggatec and did a growfs, and then remounted.. Yay, bigger disk :)

Then I went on to do some geli tests: geli'ed /dev/ggate0 and newfs'ed, mounted and played around a bit. All fine.. Now came the problem: I unmounted it, expanded the zfs volume a bit more, restarted ggatec and tried to attach it using geli again (note, I have no idea if this is supposed to work at all, I'm just testing. Haven't read such things anywhere). Now I got Invalid argument.

I'm not really sure about how GEOM works, but if I recall correctly it uses the last sectors of the disk? If I moved X bytes of data from the old end of the disk to the new end of the disk, would that make GELI work?

If I can get that to work, then this would be a kickass solution (all encryption stuff works great, I don't have to allocate all space immediately, I can expand it later without destroying data and starting from scratch etc).

Some other questions, more related to ggated/c: Is this stable? Working well? How does it handle failure situations? Anyone using it for production systems? Yes, this is for backup only, so minor glitches might be acceptable for me, but I'd rather know about those beforehand.

I did some dd from urandom to the disk, with and without GELI.. I did notice some slightly lower speeds: I was able to write around 11MB/s without GELI, with GELI it did around 9.5MB/s. The client machine is no super box, but it's not that bad (A64 3200, 1G mem with not much load).

Input and ideas? Thank you very much :)

-- 
Johan
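The layering tried above (ZFS sparse volume on the server, exported with ggated, attached on the client with ggatec, GELI on top) can be sketched roughly as follows. The pool name `tank`, the volume name `backup`, the 20G size, the client address 192.168.0.10 and both function names are placeholders invented for this sketch, and it uses a plain UFS mount as in the test described above rather than the ZFS-on-top variant from the suggestion.

```shell
# On the backup server. Hypothetical names throughout; defined, not called.
export_backup_volume() {
    zfs create -s -V 20G tank/backup          # -s makes it a sparse volume
    echo "192.168.0.10 RW /dev/zvol/tank/backup" >> /etc/gg.exports
    ggated                                    # serves what gg.exports lists
}

# On the client ($1 = backup server, $2 = mount point).
# First-time setup, done once by hand before this is used:
#   geli init /dev/ggate0 ; geli attach /dev/ggate0 ; newfs /dev/ggate0.eli
attach_encrypted_backup() {
    # ggatec prints the name of the device it created, e.g. ggate0
    dev=$(ggatec create -o rw "$1" /dev/zvol/tank/backup) || return 1
    geli attach "/dev/${dev}" || return 1     # asks for the passphrase here
    mount "/dev/${dev}.eli" "$2"
}
```

The passphrase (or key file) is only ever entered on the client, so the server sees nothing but encrypted blocks, which is the property asked for earlier in the thread.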
Re: Backup solution suggestions
On Jan 16, 2008, at 19:02 , Toomas Aas wrote:

> Johan Ström wrote:
>> My main problem with existing solutions is this "gap" of encryption on the backup server side. I don't want my data to be readable outside of my box (without the encryption keys, of course), so as soon as I send it off from my box I want it to be encrypted over the link, and down on the disk. Not decrypted on the remote box, to then be encrypted again (with keys available on that box) and then stored to disk. That would allow any users of that box (yes, sure, you can have file permissions, but let's assume someone else has root access there) to read my files.
>>
>> Simple example:
>> I create a regular tarball (gzipped maybe) with some files I want to back up. Then I encrypt this file with e.g. gpg. Then I send off this file using some unspecified network protocol to the storage server. Encrypted all the way, from my end to the remote disk.. The downside is that it is a static file.. not a "dynamic filesystem", nothing I can mount and have easy access to individual files from. *That's* what I'm looking for.
>
> As a long-time user of Amanda and regular lurker on their mailing list, I've noticed that the latest versions of Amanda have encryption capabilities. They seem to fit your needs in that encryption can be performed entirely on the backup client ("your box") side if one opts to set things up that way. I haven't used encryption with Amanda myself, so this is just what I've heard on the list and read from the wiki just now:
>
> http://wiki.zmanda.com/index.php/How_To:Set_up_data_encryption
>
> As for the ease of restore, it's not quite *that* easy, i.e. you can't just transparently mount the backup as a filesystem and copy files from there. Amanda has a command-line-ftp-like recovery interface, where you can specify which files/subdirectories and from which date you want recovered. It's been easy enough for me.

Looked through that page; seems like pretty much work right now. And I looked through the Amanda docs, and I've got to say, when calling themselves "Amanda is the world's most popular Open Source Backup and Archiving software." one would expect somewhat better docs.. hehe.

Anyway, I will look more into the ggated suggestion from another post before digging deeper into Amanda :)

-- 
Johan
Re: Backup solution suggestions [ggated]
On Jan 17, 2008, at 09:30 , Ulrich Spoerlein wrote: On Jan 17, 2008 1:31 AM, Johan Ström <[EMAIL PROTECTED]> wrote: Export the disk on the backup server with ggated. Bind it on the client with ggatec. Slap a GELI or GBDE encryption on top of it and then put a ZFS on top of it. You can mount/import this "remote" ZFS at will and do your zfs send/receive on your local box. Nothing ever leaves your box unencrypted. Now that is a cool solution! That actually sounds like something doable. I tried it out some at home between a 6.2 box (client) and 7.0 box (server), hosting the system in a ZFS "sparse volume" with a predefined size, exported that via ggated and connected ggatec on the client box. I then did some experimentation with just newfs, and it worked great! The only downside with this would be that the size is fixed. So I played around a bit with setting the volsize property in ZFS and it seemd to work just fine. zfs list reported the new, bigger, size. Restarted ggatec and did a growfs, and then remounted.. Yay bigger disk :) Then I went on do do some geli test, geli'ed /dev/ggate0 and newfs'ed, mounted and played around a bit. All fine.. Now came the problem, i unmounetd it, expanded the zfs volume a bit more, restarted ggatec and tried to attach it using geli again (note, I have no idea if this is supposed to work at all, I'm just testing. Havent read such things anywhere). Now I got Invalid argument. Im not realy sure about how GEOM works, but if I recall correct it uses the last sectors of the disk? If I moved X bytes of data from old end of disk to new end of disk, would that make GELI work? If I can get that to work, then this would be a kickass solution (all encryption stuff works great, I don't have to allocate all space immediatly, I can expand it later without destroying data and starting from scratch etc). I'm pretty certain that GELI cannot handle variable sized disks. But you could add GVIRSTOR into the mix. 
But I'd just allocate the necessary space and be done with it. Adding yet another layer is asking for trouble, imho. Okay. Some other questions, more related to ggated/c. Is this stable? Good working? how does it handle failure situations? Anyone using it for production systems? From my personal experience (which is rather limited): No, barely, bad, hell no. There were/are some open PRs about ggate. I had troubles with gmirror+ggate in that it would deadlock every other hour on SMP systems (try removing option PREEMPTION if that bug hits you). Your no,barely, bad hell no seems to fit pretty good.. I did some testing during the night with the above (non-production) setup. What I did was doing some rsyncing over the night: while true ; do echo "`date` Clearing vmail" >> logfile rm -rf vmail echo "`date` Starting rsync" >> logfile rsync -vr /usr/var/vmail . |tee -a logfile echo "`date` Rsync finished " >> logfile done I started this at ~02.0. The results? A freshly rebooted 6.2 (6.2- RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #0: Fri Jul 27 15:47:50 UTC 2007) box in the morning.. Looking at the messages: Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844480512, length=4096)] Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844484608, length=4096)] Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844488704, length=4096)] Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844492800, length=4096)] Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844517376, length=4096)] ... more of the same... Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE (offset=8844480512, length=4096)]error = 5 Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). 
ggate0.eli[WRITE(offset=8844640256, length=32768)]
..more of the same...
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844988416, length=4096)]
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844484608, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844488704, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844492800, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844517376, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844992512, length=4096)]
...more of the same...
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli
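On the question earlier in this message about where GELI keeps its metadata: geli(8) stores it in the last sector of the provider. That is consistent with the attach failing once the volume has grown, since the last sector is then somewhere else. A minimal sketch of the arithmetic in plain sh follows; the function name and the example sizes are made up for illustration, and it only computes offsets without touching any device.

```shell
#!/bin/sh
# Hypothetical helper: byte offset of the last sector of a provider,
# which is where GELI looks for its metadata.
# $1 = provider size in bytes, $2 = sector size in bytes
last_sector_offset() {
    echo $(( $1 - $2 ))
}

old_size=$(( 10 * 1024 * 1024 * 1024 ))   # assumed: 10 GiB before growing
new_size=$(( 12 * 1024 * 1024 * 1024 ))   # assumed: 12 GiB after growing
sector=512                                # assumed sector size

echo "metadata was written at: $(last_sector_offset "$old_size" "$sector")"
echo "attach now looks at:     $(last_sector_offset "$new_size" "$sector")"
```

The two offsets differ by exactly the amount the volume grew, so after a resize the attach reads a sector that never held GELI metadata.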
HP ProLiant DL360 G5 success stories?
Hello!

I'm looking into getting a new server box to replace a Supermicro box, which unfortunately has a bunch of problems with heat, random hangups, crappy IPMI/remote admin capabilities etc..

What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0) and 4G of RAM and 4x 146GB SAS disks on the Smart Array P400i card.

I've googled and looked through the list archives trying to find success stories/problem reports using FreeBSD on this box, but haven't found very much.. The only thing was http://www.freebsd.org/platforms/amd64/motherboards.html which says "Functional", which isn't very informative ;)

So.. Does anyone have any experience with this combo (DL360 G5 / P400i)? Furthermore, does anyone run 7.0 on this? Or should I still stick with 6.3...

Load will be a couple of jails mainly running apache + php + mysql (or at least that's where the load will be).

Thanks!

-- Johan Ström Stromnet [EMAIL PROTECTED] http://www.stromnet.se/
Re: HP ProLiant DL360 G5 success stories?
First of all, nice with all these positive answers! Thank you all (without responding to each and every post :))!

On Mar 12, 2008, at 12:35 PM, Pete French wrote:

What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0) and 4G of RAM and 4x 146Gb SAS disks on the Smart Array P400i card. ... So.. Does anyone have any experience with this combo (DL360 G5 / P400i)?

We have around 20 machines like that and they work beautifully. We run 7.0/amd64 on the machines now, but we have run 6.2/i386 in the past and that worked fine - though you will only be able to use the first 3.5 gig of RAM.

I don't have any plans on running i386; I'm running amd64 on the Supermicro box now without any problems (that I can relate to that, at least). How long have you run 7.0 (before release)? From all the other responses it seems lots of ppl use 7.0 on these without any problems at all.

Furthermore, anyone run 7.0 on this? Or should I still stick with

We run 7.0 on these machines and it works fine - I always prefer 7.0 to 6.3 on SMP machines as it performs better. Also 7.0 works well with the iLO on these machines - I seem to recall when I installed 6.X that it didn't work too well and I had to use boot floppy images. I'd say go for 7.0 and amd64 if you can.

This is where I'm a bit curious. What OS interaction does iLO do? What needs to be "compatible", I mean. On my current box I have an IPMI card that gives me (when it's working..) SOL capabilities.. To what degree can I remote control with iLO? If I've understood correctly, I get the exact console as on screen, with keyboard access, over web/ssh/telnet. Does this work well? This is one of my important reasons for changing, since it's so crappy on my current box, and when the box is a couple of miles away it's quite nice to have it working flawlessly..

iLO over internet? Possible, impossible? Encryption? (yes I know, not exactly FreeBSD-related questions but.. )

Another thing, how is it with physical monitoring? Temperatures/fan speeds/voltage?

Thank you (all)! :)

-- Johan
Re: HP ProLiant DL360 G5 success stories?
On Mar 12, 2008, at 12:26 PM, Krassimir Slavchev wrote:

Johan Ström wrote:

Hello! Im looking into getting a new server box to replace a Supermicro box, which unfortunately have a bunch of problems with heat, random hangups, crappy IPMI/remote admin capabilities etc.. What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0) and 4G of RAM and 4x 146Gb SAS disks on the Smart Array P400i card.

I have some DL360 G5 (3.0GHz E53xx) with 8G, 4x146G SAS and P400i and they work perfectly with 7.0 (amd64) and sched ULE!

I've googled and looked through the list archives trying to find success stories/problem reports using FreeBSD on this box, but haven't found very much.. Only thing was

Try to find the "Performance!" thread on this list

Found it, but the main point seemed to be: don't compile postgres stuff with incompatible thread-safety settings? Or did I miss something? http://www.mail-archive.com/freebsd-stable@freebsd.org/msg92787.html

http://www.freebsd.org/platforms/amd64/motherboards.html which says "Functional" which isnt very informative ;) So.. Does anyone have any experience with this combo (DL360 G5 / P400i)?

Yes, P400i works fine with the ciss(4) driver

Can I offline/online drives, rebuild arrays etc. from the OS?

Thank you!
Re: HP ProLiant DL360 G5 success stories?
On Mar 12, 2008, at 11:27 PM, Pete French wrote:

How long have you run 7.0 (before release)? From all the other responses it seems lots of ppl use 7.0 on these without any problems at all.

I've been running it since last September - never had any problem with it, and am pretty convinced it is stable.

Sounds good :)

This is where I'm a bit curious. What OS interaction does iLO do? That needs to be "compatible" i mean.

Booting from the CD - I had one FreeBSD/iLO combination which would not boot from the emulated CD. I needed to use the floppies and do a network install. That was painful - I can't remember the version though. Certainly I have had no such problems with 7.0 and the latest iLO.

I see.. well, if I ever need to boot this box from CD (reinstall) I'll probably do it with local access anyway..

SOL capabilities.. To what degree can I remote control with iLO? If

It acts as a complete console - just as if you were sitting in front of the machine. You can see the screen, use keyboard and mouse, and attach images as CDs or floppies.

iLO over internet? Possible, impossible? Encryption? (yes i know, not exactly freebsd related questions but.. )

iLO runs over https so it is encrypted. It does run better from a Windows client than anything else, sadly - but I keep a Windows box around for this purpose. Have just installed a set of machines somewhere in Louisiana remotely, whilst sitting in bed in London with a cup of tea : using an OSX laptop :-) I love iLO...

Just what I thought then. Thanks :)
Re: HP ProLiant DL360 G5 success stories?
On Mar 12, 2008, at 11:37 PM, Joe Koberg wrote: The iLO is a completely separate management processor with its own network port. It runs its own OS and has its own IP address. It runs an SSL webserver for access. The iLO is accessible over the network any time the machine is plugged into power. I am not sure about IPMI access to it. Okay, kind of what I "expected" (havent read up very much on it yet). The "normal" iLO option will give you exact textual console screen output and keyboard control from the moment of power-on. It will also let you toggle power and hit the reset button. I believe it uses a java applet in the browser. The "advanced" iLO option, which is license-key-unlocked, also provides graphical remote console, and virtual media. You can upload a CD or floppy image and then boot the server from it. I suspect the compatibility issue appears here - the virtual media probably emulates USB mass storage, and the OS must be able to boot from it. I see... So for a box that is going to run fbsd in console mode, and hopefully never need to boot from CD after install, it sounds like the normal mode will work splendid. But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre-OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option? "iLO 2 displays this information through the remote console applet while in the server pre-operating system state, enabling a non-licensed iLO 2 to observe and interact with the server during POST activities. A non- licensed iLO 2 cannot use remote console access after the server completes POST and begins to load the operating system. The iLO 2 Advanced License enables access to the remote console at all times." So.. Then what? I have to configure FreeBSD to use a serial console and continue with using serial console instead? 
Later in the same doc:

• iLO 2 Standard (unlicensed:)
  NOTE: The features annotated with an asterisk (*) are not supported on all systems.
  o Virtual Power and Reset control
  o Remote serial console through POST only
  ...
  o Serial access*

Am I missing something here, or will I only be able to access the console during POST, unless I configure the box to use a serial console? Hope you can shed some light here :)

It has full reporting of hardware state and management log details, and the "home page" is a big summary with any faults outlined in red.

Yes, that was what I expected. But can I retrieve the data some other way? IPMI, SNMP or something? I would like to gather the stats at a central management site. Further investigation in the manual seems to indicate that no SNMP access is available, but there is some XML "RIBCL" interface I can use (yes, this is in standard mode too :))

Thank you!

-- Johan
Re: HP ProLiant DL360 G5 success stories?
On Mar 13, 2008, at 12:40 AM, Joe Koberg wrote:

Johan Ström wrote:

But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre-OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option?

I don't do the purchasing and we get all Advanced iLO, so I will take your word for it. The older generations supported text console (i have a 360G2 that does so). We use the HP Management agents under Windows for all SNMP reporting so I can't comment on the reporting method under other OS's.

I see. Can anyone else maybe shed some light here?

Thanks
Re: HP ProLiant DL360 G5 success stories?
On Mar 13, 2008, at 1:01 AM, Sean Winn wrote: For using HP blades and standard iLO (no licensed advance features), it works perfectly well, installing both FreeBSD 5 and 6 on the blades I've tried, using a remote install CD from the Java applet (there's one for remote devices like disks, and one for remote console); there's only text mode but that's plenty to install the OS and enable it to the point of using SSH to manage it from there on in. I'd hope the iLO hardware/software is relatively common to all the HP range :) text mode access continues at all times - the iLO interface is just a remote screen/keyboard onto it, even POST BIOS boot. The external devices are USB mass storage ones, but I didn't have problems booting off the CD and installing it for 6.2. Well.. The blades seems to be an exception: • iLO 2 Standard Blade Edition (unlicensed blade server): o Remote Console and IRC This is not listed under "iLO 2 Standard (unlicensed:)"... I guess that means I'm out of luck unless I want to bang up another $400 (listing price).. Which I'd rather not :) Anyone running those 360G5's using serial console on a normal licensed iLO? The iLO web interface nominally requires/is tested only on Windows/ IE setup, but I do my work from a Mac running Safari, and have no problems to date. On 13/03/2008, at 10:41 AM, Johan Ström wrote: On Mar 13, 2008, at 12:40 AM, Joe Koberg wrote: Johan Ström wrote: But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre- OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option? I don't do the purchasing and we get all Advanced iLO, so I will take your word for it. The older generations supported text console (i have a 360G2 that does so). We use the HP Management agents under Windows for all SNMP reporting so I can't comment on the reporting method under other OS's. I see. 
Can anyone else maybe shed some light here? Thanks
Re: HP ProLiant DL360 G5 success stories?
On Mar 13, 2008, at 4:57 PM, David Schutt wrote: Johan Ström wrote: On Mar 13, 2008, at 1:01 AM, Sean Winn wrote: For using HP blades and standard iLO (no licensed advance features), it works perfectly well, installing both FreeBSD 5 and 6 on the blades I've tried, using a remote install CD from the Java applet (there's one for remote devices like disks, and one for remote console); there's only text mode but that's plenty to install the OS and enable it to the point of using SSH to manage it from there on in. I'd hope the iLO hardware/software is relatively common to all the HP range :) text mode access continues at all times - the iLO interface is just a remote screen/keyboard onto it, even POST BIOS boot. The external devices are USB mass storage ones, but I didn't have problems booting off the CD and installing it for 6.2. Well.. The blades seems to be an exception: • iLO 2 Standard Blade Edition (unlicensed blade server): o Remote Console and IRC This is not listed under "iLO 2 Standard (unlicensed:)"... I guess that means I'm out of luck unless I want to bang up another $400 (listing price).. Which I'd rather not :) Anyone running those 360G5's using serial console on a normal licensed iLO? Yes. We have one DL360 G5, and I was able to get serial console working using information I found in this thread -- http://lists.freebsd.org/pipermail/freebsd-proliant/2007-October/000303.html Only downside is that the physical COM1 becomes unavailable, which caused some consternation when trying to monitor a UPS :-) That is probably not a big deal for me. Then I guess this should work fine.. I've just played around some on 7.0 with serial console on a regular port (not HP box though), and it seems to work fine. From what I could tell i can just SSH to iLO and enter 'vps' and I get the serial port, and that this works very good (http://lists.freebsd.org/pipermail/freebsd-proliant/2007-August/000292.html ). If anyone thinks opposite, I'd appreciate a line. 
:) Thanks!
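For reference, a serial console on FreeBSD 7 of the kind discussed above usually comes down to two small config changes. This is a generic sketch, not taken from the linked freebsd-proliant posts: it assumes the iLO virtual serial port shows up as the first serial port (ttyd0 on 7.x) and a speed of 9600, and both have to match the BIOS/iLO settings on the actual box.

```shell
# /boot/loader.conf: send loader and kernel console to the serial port
console="comconsole"
# comconsole_speed="9600"   # assumption: must match the BIOS/iLO speed

# /etc/ttys: login prompt on the first serial port (ttyd0 on 7.x).
# Change the existing ttyd0 line from "dialup off" to "vt100 on":
#   ttyd0  "/usr/libexec/getty std.9600"  dialup  off secure   (stock)
#   ttyd0  "/usr/libexec/getty std.9600"  vt100   on  secure   (edited)
```

After a reboot the loader, kernel messages and a login prompt should all appear on the serial line, which is what the iLO virtual serial port then relays over SSH.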
Re: HP ProLiant DL360 G5 success stories?
On Mar 16, 2008, at 8:36 AM, Ulf Zimmermann wrote: On Wed, Mar 12, 2008 at 06:40:49PM -0500, Joe Koberg wrote: Johan Str?m wrote: But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre-OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option? I don't do the purchasing and we get all Advanced iLO, so I will take your word for it. The older generations supported text console (i have a 360G2 that does so). We use the HP Management agents under Windows for all SNMP reporting so I can't comment on the reporting method under other OS's. iLO2 ActiveX based remote console (Integrated KVM) can still do text only console without license but it doesn't work too well IMHO. The Java based console is the same, text will work out license but graphics mode and that includes certain VESA text modes. Standard iLO gives the graphical console and virtual media. On Blade servers the graphical access and virtual media is included. And the Advanced license gives extra stuff like integration into AD for authentication afik. How about SSH mode? SSH and view textmode at boot (serial rdr in bios too?) and console @ serial in fbsd (bootloader and on). Does that work good or "not to well" either? Lets hope it works out good now at least, I ordered the box, without full license though, but I guess I can always get that later on if it turns out to work like crap.. But for once I'm purchasing quality brand hardware.. So it should work with me instead of against me... I hope :) Thank you all for all of your replies! ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: HP ProLiant DL360 G5 success stories?
On Mar 17, 2008, at 11:46 AM, Ulf Zimmermann wrote: On Mon, Mar 17, 2008 at 08:33:20AM +0100, Johan Str?m wrote: On Mar 16, 2008, at 8:36 AM, Ulf Zimmermann wrote: On Wed, Mar 12, 2008 at 06:40:49PM -0500, Joe Koberg wrote: Johan Str?m wrote: But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre-OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option? I don't do the purchasing and we get all Advanced iLO, so I will take your word for it. The older generations supported text console (i have a 360G2 that does so). We use the HP Management agents under Windows for all SNMP reporting so I can't comment on the reporting method under other OS's. iLO2 ActiveX based remote console (Integrated KVM) can still do text only console without license but it doesn't work too well IMHO. The Java based console is the same, text will work out license but graphics mode and that includes certain VESA text modes. Standard iLO gives the graphical console and virtual media. On Blade servers the graphical access and virtual media is included. And the Advanced license gives extra stuff like integration into AD for authentication afik. How about SSH mode? SSH and view textmode at boot (serial rdr in bios too?) and console @ serial in fbsd (bootloader and on). Does that work good or "not to well" either? Lets hope it works out good now at least, I ordered the box, without full license though, but I guess I can always get that later on if it turns out to work like crap.. But for once I'm purchasing quality brand hardware.. So it should work with me instead of against me... I hope :) Thank you all for all of your replies! iLO1 (used on DL360 g3, g4, g4p and DL380 g3, g4) had text console via ssh and I have used it often because of cut+paste. Unfortunatly as far I know iLO2 (used on g5) does not support ssh text console. Hm.. 
No, not a native text console, but the virtual serial port should work under SSH if I'm reading the manual correctly:

" Although additional configuration steps are required to use Remote Serial Console (as compared to using the remote console or IRC), the Remote Serial Console allows telnet or SSH users to interact with the server remotely and without requiring an iLO 2 Advanced license and is the only way a true text-based remote console is presented by iLO 2. "

If I've understood correctly, the "text console" mode from iLO 1 has been removed altogether in favor of graphical mode (the internal workings have been changed).
Re: HP ProLiant DL360 G5 success stories?
On Mar 17, 2008, at 9:52 AM, Jeremy Chadwick wrote: On Mon, Mar 17, 2008 at 08:33:20AM +0100, Johan Ström wrote: On Mar 16, 2008, at 8:36 AM, Ulf Zimmermann wrote: On Wed, Mar 12, 2008 at 06:40:49PM -0500, Joe Koberg wrote: Johan Str?m wrote: But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre-OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option? I don't do the purchasing and we get all Advanced iLO, so I will take your word for it. The older generations supported text console (i have a 360G2 that does so). We use the HP Management agents under Windows for all SNMP reporting so I can't comment on the reporting method under other OS's. iLO2 ActiveX based remote console (Integrated KVM) can still do text only console without license but it doesn't work too well IMHO. The Java based console is the same, text will work out license but graphics mode and that includes certain VESA text modes. Standard iLO gives the graphical console and virtual media. On Blade servers the graphical access and virtual media is included. And the Advanced license gives extra stuff like integration into AD for authentication afik. How about SSH mode? SSH and view textmode at boot (serial rdr in bios too?) and console @ serial in fbsd (bootloader and on). Does that work good or "not to well" either? I have to chime in here. Who cares if it has SSH support? iLO, LOM, and serial console should all be done over a *private network*, and should NOT be hooked up to a publicly-accessible network or given public IPs. I cannot stress how important this is. DO NOT put stuff like this on the public Internet: you will regret it. 
The advantage to iLO is that it's the equivalent of KVM-over-IP, supporting virtual media too (read: an ISO image on your laptop/local client machine being used as a CD on the server itself, thus you can install whatever OS you want, etc.). You get NATIVE VGA CONSOLE remotely on the machine -- there is no "serial console", and that's always best. I've seen it in action, and it's *awesome*. For advanced license yes. Thats another $400 or so (which might not be very much money for big corps but for me and my one server installation its more..) Said iLO capability usually works over a series of TCP or UDP ports, somtimes even supporting HTTP (on the iLO module itself!) which means if its on a private network, you can tunnel to it using SSH or similar utilities via another box in the co-lo. Then simply access 127.0.0.1:whatever in the ActiveX, Java, or native Win32/Linux client and voila -- you have the machines' native VGA console in front of you, with no issues relating to serial console. No more "ohhh, the bootup configuration uses 9600bps, but our serial console servers are configured to use 115200bps... but the disk isn't booting so it's still using 9600bps at that stage, now I HAVE to go to the datacenter" scenarios. Yep, there are some downsides with serial console. But if it works, i'd rather use a normal ssh client in my terminal together with the virtual serial port than sitting in a web browser. But i'll guess I'm going to evaluate the serial port option when I get the box, and if it isnt working to good i'll just have to throw up the money and get the advanced license (even if i'd rather use that money on more "fun" things..) I do not trust IPMI based on stories I have heard from Yahoo! SAs, talking about how every implementation is different (so much for a "standard"), and how the number of bugs in Supermicro's IPMI implementation are absurd. 
Supposedly Intel and others have done a better job with it, but I lost all interest in it once I found that there was no real "standard". Besides, anything that "piggybacks" on top of an existing LAN port (even some iLO implementations do this!) is worth avoiding. I do not want to deal with a single NIC emitting two separate MAC addresses -- and that's what happens. It's sometimes referred to as "ASF" as well.

I've got a Supermicro IPMI card now and.. I'm afraid I cannot describe it with better words than "crappy toy".. Constant IPMI card restarts/crashes, the serial console Java browser applet that stops responding, firmware upgrades that b0rk the card totally etc...

-- Johan
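The tunnelling approach described above (iLO on a private network, reached through another box in the co-lo) can be sketched with plain ssh port forwarding. The host names, the address and the local port below are placeholders, and only the HTTPS web interface is forwarded here; the remote console and virtual media use further ports that would need their own forwards, so check the iLO documentation for those.

```shell
#!/bin/sh
# Hypothetical helper: build an ssh command that forwards a local port
# to the HTTPS port of an iLO reachable only from a jump host.
# $1 = local port, $2 = iLO private address, $3 = user@jumphost
ilo_tunnel_cmd() {
    local_port=$1
    ilo_addr=$2
    jumphost=$3
    echo "ssh -N -L ${local_port}:${ilo_addr}:443 ${jumphost}"
}

# Placeholder values: 10.0.0.5 is the iLO, jumphost.example.org the co-lo box.
ilo_tunnel_cmd 8443 10.0.0.5 admin@jumphost.example.org
```

Running the printed command and then browsing to https://127.0.0.1:8443/ reaches the iLO web interface without the iLO ever having a public address.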
FreeBSD 7 and multiple IP (mijail-patch in 6.x)
Hello

I have a machine running 6.2 right now, which is being replaced. And since SMP performance is much better on 7.x I'd like to go with 7.0 (and many ppl have indeed verified that it works well on this box, HP DL360 G5)...

But now that I've started to set up the machine, I recalled that I've patched the 6.2 box with the FreeBSD mijail patch (http://www.digitaldaemon.com/FreeBSD/FreeBSD/FreeBSD_6.2-STABLE-mijail.patch ). However, I cannot find anything about FreeBSD 7 and a similar patch. A quick look at the patch vs the 7.x source tells me it won't apply cleanly, but from what I've seen so far, it could maybe be done. The differences I've seen don't look too advanced, but then again, I'm not a kernel developer...

So, I'd like to know if anyone has considered this on 7.x, or if anyone can tell me immediately that this won't work or will be LOTS of work, or whether it is just a matter of adjusting some patch lines? I.e., how big are the changes from 6.x to 7.x in these sections?

Thank you for any answers or pointers.

-- Johan Ström Stromnet [EMAIL PROTECTED] http://www.stromnet.se/
Re: FreeBSD 7 and multiple IP (mijail-patch in 6.x)
On Apr 3, 2008, at 8:39 PM, Bjoern A. Zeeb wrote: On Mon, 31 Mar 2008, Johan Ström wrote: Hi, I got a machine running 6.2 right now, which is being replaced. And since SMP performance is much better on 7.x I'd like to go with 7.0 (and many ppl have indeed verified that it works good on this box, HP DL360 G5)... But, now when I start to setup the machine, I recalled that i've patched the 6.2 box with the freebsd mijail patch (http://www.digitaldaemon.com/FreeBSD/FreeBSD/FreeBSD_6.2-STABLE-mijail.patch ). However, I cannot find anywhere about FreeBSD 7 and a similar patch. A quick look at the patch vs the 7.x source tells me it won't apply cleanly, but from what I've seen quickly, it could maybe be done. The differences I've seen doesn't look too advanced, but then again, I'm not a kernel developer... So, I'd like to know if anyone considered this on 7.x, or if anyone can tell me immediately that this wont work or will be LOTS of work, or just some patch line adjusting? Ie, how big are the changes from 6.x to 7.x in these sections? I had planned to have a patch for multiv4/v6 jails last month but it's not yet publicly available. I have sent it off to some people for review. In case the above is a successor of pjd's multi-ip v4 jail patch I can give you a plain forward port to a FreeBSD 7 system (which might have possible locking issues I have never experienced). All depends on how quickly you need it. Hello, thanks for your answer. Yep, the patch i've been using on 6 looks very much like pjd's (http://people.freebsd.org/~pjd/patches/mijail5.patch ). Are you using this Fbsd7-port, or do you have any idea if anyone does/how much it have been tested? I have no need for IPv6 right now, so if nothing else, I'd be glad to test the 7-port of pjd's to see if it works. That sounds kindof what I thought to do so.. :) Thank you! 
-- Johan
ZFS deadlock
Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

That worked for a while, then it stopped working too (this was over ssh). When trying a local login I only got

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to have replied...

In my current conf, I think my kmem/kmem_max is at 512MB (not sure though, since I edited my file yesterday for the next reboot), with 2G of system RAM.. Normally I'd run kmem(max) at 1G (with an arcsize of 512M; currently it is at the default), but since I just got back to 2G total mem after some hardware problems I've been running at those lows (1G total is kind of tight with ZFS..)

Well, just wanted to report... The box is not totally dead yet, i.e. I can still do Ctrl-T on the console, but that's it.. I don't really know what more I can do, so.. I don't have KDB/DDB. I'll wait another hour or so before I hard reboot it, unless it "unlocks" or anyone has any suggestions.

Thanks

-- Johan Ström Stromnet [EMAIL PROTECTED] http://www.stromnet.se/
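The kmem and ARC settings mentioned in this message live in /boot/loader.conf. As a small sketch, the helper below just prints the three lines for a given kmem size and ARC limit; the function name is made up, the tunables are the stock FreeBSD ones, and the values are simply the ones quoted in the message (1G kmem, 512M ARC on a 2G box), not a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: print loader.conf lines for ZFS memory tuning.
# $1 = kmem size (used for both vm.kmem_size and vm.kmem_size_max)
# $2 = upper limit for the ZFS ARC
zfs_loader_conf() {
    printf 'vm.kmem_size="%s"\n' "$1"
    printf 'vm.kmem_size_max="%s"\n' "$1"
    printf 'vfs.zfs.arc_max="%s"\n' "$2"
}

# Values as described in the message above.
zfs_loader_conf 1024M 512M
```

The output can be appended to /boot/loader.conf by hand; the settings only take effect after a reboot, which matches the "edited my file yesterday for the next reboot" remark.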
Re: ZFS deadlock
On Apr 8, 2008, at 9:32 AM, Jeremy Chadwick wrote:

> On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
>> I don't have KDB/DDB. I'll wait another hour or so before I hard reboot it, unless it "unlocks" or anyone has any suggestions.
>
> I don't think there are any suggestions left to give. Many people, including myself, have experienced this kind of problem. It's well-documented both on my Common Issues page, and the official FreeBSD ZFS Wiki.

Ah.. I guess I was just too restrictive with the googling on "zfs:&buf_hash_table.ht_locks[i].ht_lock".

> ZFS is still considered highly experimental, so if your data is at all important to you, perform backups or switch to another filesystem provider.

That I am aware of. Thanks.
Re: ZFS deadlock
On Apr 8, 2008, at 9:37 AM, LI Xin wrote:

> Johan Ström wrote:
>> I don't have KDB/DDB. I'll wait another hour or so before I hard reboot it, unless it "unlocks" or anyone has any suggestions.
>
> The key is to increase your kmem and prevent it from being exhausted. I think more recent OpenSolaris's ZFS code has some improvements but I do not have spare devices at hand to test and debug :(

Yep, I never had the problem when I was running with 2G total mem, but then one stick (damn consumer crap) failed and I was left with 1G, and I started to have random problems. Going to tune kmem back up now that I have more mem again, and I'm thinking about putting in 4G too..

> Maybe pjd@ would get a new import at some point? I have cc'ed him.
>
> Cheers,
> --
> Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
> FreeBSD - The Power to Serve!
Re: ZFS deadlock
On Apr 8, 2008, at 9:40 AM, LI Xin wrote:

> For your question: just reboot would be fine, you may want to tune your arc size (to be smaller) and kmem space (to be larger), which would reduce the chance that this would happen, or eliminate it, depending on your workload.

Back online now, with kmem/kmem_max at 1G and arcsize at 512M. Are those reasonable on a 2G machine? I think I've read that somewhere, but I cannot find it (arc at least) in the TuningGuide now.

> This situation is not recoverable and you can trust ZFS that you will not lose data if they are already sync'ed.

Actually, I've had a lot of hard crashes lately on this machine (bad hw) but not a single time have I lost data (to my knowledge at least...). In that regard, compared to UFS, ZFS is waaay better! :)

> --
> Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
> FreeBSD - The Power to Serve!
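The values mentioned in this exchange (kmem/kmem_max at 1G, ARC at 512M) would normally go into /boot/loader.conf. A minimal sketch, assuming the FreeBSD 7 loader tunables vm.kmem_size, vm.kmem_size_max and vfs.zfs.arc_max; the helper name zfs_loader_conf and its sanity check are made up for illustration, and whether these numbers suit a 2G machine still depends on the workload:

```shell
# Hypothetical helper: print a candidate /boot/loader.conf fragment.
# $1 = kmem size in MB, $2 = ZFS ARC maximum in MB.
zfs_loader_conf() {
    kmem_mb=$1
    arc_mb=$2
    # Assumption for this sketch: the ARC cap should stay below kmem.
    if [ "$arc_mb" -ge "$kmem_mb" ]; then
        echo "arc_max should be smaller than kmem_size" >&2
        return 1
    fi
    printf 'vm.kmem_size="%sM"\n' "$kmem_mb"
    printf 'vm.kmem_size_max="%sM"\n' "$kmem_mb"
    printf 'vfs.zfs.arc_max="%sM"\n' "$arc_mb"
}

# Values from the thread: 1G kmem, 512M ARC.
zfs_loader_conf 1024 512
```

The tunables only take effect after a reboot, which matches the "edited my file yesterday for next reboot" remark in the original report.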
connect(): Operation not permitted
Hello

I have a FreeBSD 7 machine running mail services (among other things). This machine recently replaced a FreeBSD 6.2 machine doing the same tasks. Now and then I need to send a lot of mail to customers (mailing list), and one thing I've noticed after the change is that when I open a lot of connections in quick succession (high connection rate, even if they are very short-lived) inside a jail (dunno if that has anything to do with it though), I start to get Operation not permitted in return to connect().

I've seen this in the PHP app that sends mail, when it tried to connect to localhost, as well as from postfix when it has been trying to connect to amavisd on localhost, but also from postfix when it has tried to connect to remote SMTP servers.

I do have PF for filtering, but there are no max-src-conn-rate limits enabled for any of the rules used for this. However, for one of the jails I do have an hfsc queue limiting the outgoing mail traffic from one jailed IP. But I'm not sure that this would be the problem, since I've also seen the problem when doing localhost connects in the jail, and also in other jails on an entirely different IP that is not affected.

Does anyone have any clues about what I can look at and tune to fix this?

Thanks!

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
Re: connect(): Operation not permitted
First of all, for freebsd-pf subscribers: I posted my original problem (at the bottom) to freebsd-net earlier, but the replies seem to point to PF so I'll CC there too..

On May 17, 2008, at 5:19 PM, Alex Trull wrote:

> Hi Johan and List,
>
> In my case a few months ago it was pahu. Don't give that fine fellow an account on your precious system !
>
> But seriously, I had a pf-firewalled jail being used for DNS testing, with large numbers of udp "connections" hanging around in pf state. While the default udp timeout settings in PF are lower than those of the tcp timeouts, they were still too high for it to remove the states in time before hitting the default 10k state limit!
>
> If this is the case with you - run 'pfctl -s state | wc -l' - when there is traffic load you may see that hitting 10k states if you've not tuned that variable.
>
> What to do next - up the state limit or lower the state timeouts. I did both, to be safe.
>
> in /etc/pf.conf these must be at the very top of the file:
>
> # options
> # 10k is insanely low, lets raise it..
> set limit { frags 16384, states 32768 }
> # timeouts - see 'pfctl -s timeouts' for options - you will want to
> # change the tcp ones rather than the udp ones for your smtp setup.
> # but these are mine, I set them for the dns traffic.
> set timeout { udp.first 15, udp.single 5, udp.multiple 30 }
>
> don't forget to:
>
> $ /etc/rc.d/pf check && /etc/rc.d/pf reload

Ok, I looked over the PF states now, but I'm not quite sure that's what is causing it. I have the default limit of 10k states, normally I seem to have around ~800 states, and when I start my test script that tries to send as many mails as possible (using PHP's Pear::Mail, creating a connection, sending, disconnecting, creating a new connection.. and so on), I can clearly see the PF state counter (pfctl -vsi) increase, but the script aborts with Operation not permitted way before I hit 10k, it's rather around 3-4k..

If I then wait a few seconds and run the script again, I can see the number of states increase even more, and if I do this enough times I finally hit around 9700 states. But at this point (states exhausted), I don't get Operation not permitted; instead it just seems that the script blocks for a few seconds while states clear up, then continues running until it gets an Operation not permitted.

So, from the above results, I can't say that it looks like it's the states?

Just tried to disable the altq rule now too, no change (not that I expected one, since it's on bce0, not lo0).

Another thing, which might be more appropriate for freebsd-pf though.. Why would it create states at all for this traffic, when my pf.conf rule is "pass on lo0 inet from $jail to $jail" (I have a block drop in rule to drop all traffic)? A check with pfctl -vsr reveals that the actual rule inserted is "pass on lo0 inet from 123.123.123.123 to 123.123.123.123 flags S/SA keep state". Where did that "keep state" come from?

Thanks for ideas :)

> HTH,
> Alex
>
> On Sat, 2008-05-17 at 16:33 +0200, Johan Ström wrote:
>> Hello
>>
>> I have a FreeBSD 7 machine running mail services (among other things). This machine recently replaced a FreeBSD 6.2 machine doing the same tasks. Now and then I need to send a lot of mail to customers (mailing list), and one thing I've noticed after the change is that when I open a lot of connections in quick succession (high connection rate, even if they are very short-lived) inside a jail (dunno if that has anything to do with it though), I start to get Operation not permitted in return to connect().
>>
>> I've seen this in the PHP app that sends mail, when it tried to connect to localhost, as well as from postfix when it has been trying to connect to amavisd on localhost, but also from postfix when it has tried to connect to remote SMTP servers.
>>
>> I do have PF for filtering, but there are no max-src-conn-rate limits enabled for any of the rules used for this. However, for one of the jails I do have an hfsc queue limiting the outgoing mail traffic from one jailed IP. But I'm not sure that this would be the problem, since I've also seen the problem when doing localhost connects in the jail, and also in other jails on an entirely different IP that is not affected.
>>
>> Does anyone have any clues about what I can look at and tune to fix this?
>>
>> Thanks!
>>
>> --
>> Johan Ström
>> Stromnet
>> [EMAIL PROTECTED]
>> http://www.stromnet.se/
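The state-count check discussed in this message ('pfctl -s state | wc -l' against the 10k default limit) can be wrapped in a small helper so the numbers are easier to compare while a test script runs. This is only a sketch: the function name state_usage and the 80% warning threshold are assumptions made for illustration, not anything pf itself provides.

```shell
# Hypothetical helper: report pf state-table usage as a percentage.
# $1 = current number of states, $2 = configured state limit.
state_usage() {
    count=$1
    limit=$2
    pct=$((count * 100 / limit))
    echo "states: $count/$limit (${pct}%)"
    # Arbitrary threshold for this sketch: warn once usage reaches 80%.
    if [ "$pct" -ge 80 ]; then
        echo "warning: state table nearly full" >&2
    fi
}

# Sample values from the thread: ~800 states when idle, ~9700 when exhausted.
state_usage 800 10000
state_usage 9700 10000

# On the real host the live count could be fed in instead, for example:
#   state_usage "$(pfctl -s state | wc -l | tr -d ' ')" 10000
```

Running it in a loop next to the mail test would show whether connect() starts failing at 3-4k states, as observed, or only once the table is actually full.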
Re: connect(): Operation not permitted
On May 18, 2008, at 9:19 AM, Matthew Seaman wrote:

> Johan Ström wrote:
>> drop all traffic)? A check with pfctl -vsr reveals that the actual rule inserted is "pass on lo0 inet from 123.123.123.123 to 123.123.123.123 flags S/SA keep state". Where did that "keep state" come from?
>
> 'flags S/SA keep state' is the default now for tcp filter rules -- that was new in 7.0 reflecting the upstream changes made between the 4.0 and 4.1 releases of OpenBSD. If you want a stateless rule, append 'no state'.
>
> http://www.openbsd.org/faq/pf/filter.html#state

Thanks! I was actually looking around in the pf.conf manpage yesterday but failed to find it; looking closer today I saw it. I applied no state (and quick) to the rule, and now no state is created. And the problem I had in the first place seems to have been resolved too, even though it didn't look like a state problem.. (it started to deny new connections long before the state table was full, although maybe I wasn't looking for updates fast enough or something).

Anyways, thanks to all for helping me out, and of course thanks to everybody involved in FreeBSD/pf for great products! Cannot be said enough times ;)
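For readers who want to see what the changed rule looks like, here is a sketch of the stateless loopback rule described above (the original "pass on lo0 inet from $jail to $jail" with quick and no state added). The helper name stateless_lo_rule is hypothetical, and 123.123.123.123 is the placeholder address used in the thread, not a real jail IP.

```shell
# Hypothetical helper: emit a pf.conf macro plus a stateless loopback rule
# for a single jail address ($1).
stateless_lo_rule() {
    jail_ip=$1
    printf 'jail = "%s"\n' "$jail_ip"
    # $jail is a pf macro, so it is printed literally (single quotes).
    printf 'pass quick on lo0 inet from $jail to $jail no state\n'
}

stateless_lo_rule 123.123.123.123

# On the real host the fragment could be parse-checked without loading it:
#   stateless_lo_rule 123.123.123.123 | pfctl -nf -
```

After a reload, 'pfctl -vsr' should list the rule without the implicit "flags S/SA keep state" suffix.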
Re: FreeBSD with a Gigabyte GA-K8NSC?
On Sep 3, 2006, at 14:13 , Johan Ström wrote:

> Hi
>
> I'm about to get a "new" server... In this case what I'm looking at is a Gigabyte GA-K8NSC mobo with nForce3 250Gb chipset, and an AMD 64 3200+ Venice S939.
>
> Does anyone have any experience with FreeBSD (6.1) and this mobo/chipset? Does the network work? How good? SATA? Any stability/performance issues?
>
> I did notice it was mentioned on http://www.freebsd.org/platforms/amd64/motherboards.html on 5.4 with the only comment "Sound and USB untested.".. So.. anyone got more detailed experience than that?
>
> Thanks :)
> --
> Johan Ström
> [EMAIL PROTECTED]

Hi again,

I got the mobo now and everything I've tested seems to work fine; network (Marvell Gigabit Ethernet) works perfectly (although I'm just using 100mbit, haven't tested gig), and SATA seems to work.. Somewhat... That's part of why I post this..

I have two disks plugged in currently, two of these:

ad4: 286187MB at ata2-master SATA150

(ad4 and ad6), one on each SATA port... When I only access ad4 (the system disk) and don't touch ad6 (the old system disk, moving some data from there now.. soon to be gmirrored with ad4) it works fine. But as soon as I start to transfer data from ad6 to ad4 (or rather, from ad6s1f to gm0s1f of which ad4 is provider), the system becomes veeerrry slow...
gstat reports speeds of around 30MB/s: dT: 0.501 flag_I 50us sizeof 288 i -1 L(q) ops/sr/s kBps ms/rw/s kBps ms/w %busy Name 17395 12200 577.2383 490598.5 99.2| ad4 17395 12200 577.2383 490598.5 99.2| mirror/gm0 0 0 0 00.0 0 00.00.0| ad4s1 17395 12200 577.2383 490598.5 99.2| mirror/gm0s1 0387387 495701.1 0 00.0 43.0| ad6 3 2 2 32 583.1 0 00.0 116.5| mirror/gm0s1a 1 0 0 00.0 0 00.00.0| mirror/gm0s1b 0 0 0 00.0 0 00.00.0| mirror/gm0s1c 0 0 0 00.0 0 00.00.0| mirror/gm0s1d 0 0 0 00.0 0 00.00.0| mirror/gm0s1e 13393 10168 576.0383 490598.5 95.3| mirror/gm0s1f 0387387 495701.1 0 00.0 43.2| ad6s1 0 0 0 00.0 0 00.00.0| ad6s1a 0 0 0 00.0 0 00.00.0| ad6s1b 0 0 0 00.0 0 00.00.0| ad6s1c 0 0 0 00.0 0 00.00.0| ad6s1d 0 0 0 00.0 0 00.00.0| ad6s1e 0387387 495701.1 0 00.0 44.0| ad6s1f Those busy figures.. on the gmirror they fly up to > 100% all the time and are red.. on the ad6 figures they are 40-50% all the time (during copy that is).. Any ideas? dmesg: Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. 
FreeBSD 6.1-RELEASE-p7 #0: Wed Sep 20 09:21:41 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ELFI Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(tm) 64 Processor 3200+ (2009.79-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x20ff0 Stepping = 0 Features=0x78bfbffMCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2> Features2=0x1 AMD Features=0xe2500800 AMD Features2=0x1 real memory = 1073676288 (1023 MB) avail memory = 1024299008 (976 MB) ACPI APIC Table: ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 cpu0: on acpi0 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff,0xcf0-0xcf3 on acpi0 pci0: on pcib0 agp0: mem 0xf800-0xf9ff at device 0.0 on pci0 isab0: at device 1.0 on pci0 isa0: on isab0 pci0: at device 1.1 (no driver attached) ohci0: mem 0xfd005000-0xfd005fff irq 20 at device 2.0 on pci0 ohci0: [GIANT-LOCKED] usb0: OHCI version 1.0, legacy support usb0: SMM does not respond, resetting usb0: on ohci0 usb0: USB revision 1.0 uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 4 ports with 4 removable, self powered ohci1: mem 0xfd00-0xfd000fff irq 21 at device 2.1 on pci0 ohci1: [GIANT-LOCKED] usb1: OHCI version 1.0, legacy support usb1: SMM does not respond, resetting usb1: on ohci1 usb1: USB revision 1.0 uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 4 ports with 4 removable, self powered ehci0: mem 0
Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?
us1: on xl0 xlphy0: <3Com internal media interface> on miibus1 xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto xl0: Ethernet address: 00:10:5a:dc:5e:aa fwohci0: port 0x8400-0x847f mem 0xfdffe000-0xfdffe7ff irq 19 at device 12.0 on pci2 fwohci0: OHCI version 1.0 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 00:10:dc:00:00:77:83:dc fwohci0: Phy 1394a available S400, 3 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: on fwohci0 fwe0: on firewire0 if_fwe0: Fake Ethernet address: 02:10:dc:77:83:dc fwe0: Ethernet address: 02:10:dc:77:83:dc fwe0: if_start running deferred for Giant sbp0: on firewire0 fwohci0: Initiate bus reset fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me) firewire0: bus manager 0 (me) fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FAST] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A ppc0: port 0x378-0x37f,0x778-0x77b irq 7 on acpi0 ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode ppbus0: on ppc0 ppbus0: IEEE1284 device found /NIBBLE/ECP Probing for PnP devices on ppbus0: ppbus0: SCP,VLINK plip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: flags 0x1 irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] orm0: at iomem 0xc-0xce7ff,0xd-0xd17ff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0 ums0: Microsoft Microsoft 5-Button Mouse with IntelliEye(TM), rev 1.10/3.00, addr 2, iclass 3/1 ums0: 5 buttons and Z dir. 
Timecounter "TSC" frequency 2210091714 Hz quality 800 Timecounters tick every 1.000 msec module_register_init: MOD_LOAD (amr_linux, 0x80654f90, 0) error 6 ad0: 156334MB at ata0-master UDMA133 acd0: DVDR at ata1-master UDMA33 ad4: 286187MB at ata2-master SATA150 GEOM_MIRROR: Device gm0 created (id=316220990). GEOM_MIRROR: Device gm0: provider ad4 detected. GEOM_MIRROR: Device gm0: provider ad4 activated. GEOM_MIRROR: Device gm0: provider mirror/gm0 launched. Trying to mount root from ufs:/dev/mirror/gm0s1a So.. Wtf is the problem here?... Hope someone can help me.. Thanks On Sep 23, 2006, at 18:41 , Johan Ström wrote: On Sep 3, 2006, at 14:13 , Johan Ström wrote: Hi I'm about to get a "new" server... In this case what I'm looking at is a Gigabyte GA-K8NSC mobo with nForce3 250Gb chipset, and a AMD 64 3200+ Venice S939. Does anyone have any experience with FreeBSD (6.1) and this mobo/ chipset? Does the network work? How good? SATA? Any stability/ performance issues? I did notice it was mentioned on http://www.freebsd.org/platforms/ amd64/motherboards.html on 5.4 with the only comment "Sound and USB untested.".. So.. anyone got more detailed experience than that? Thanks :) -- Johan Ström [EMAIL PROTECTED] Hi again, I got the mobo now and everything I've tested seems to work fine, network (Marvell Gigabit Ethernet) works perfect (altough just using 100mbit, havent tested gig), and sata seems to work.. Somewhat... Thats part of why I post this.. I got two disks plugged in currently, two pieces of ad4: 286187MB at ata2-master SATA150 (ad4 and ad6) on one SATA each... When I only access ad4 (the system disk) and dont touch ad6 (the old system disk, moving some data form there now.. soon to be gmirrored with ad4) it works fine. But as soon as i start to transer data from ad6 to ad4 (or rather, from ad4s1f to gm0s1f of which ad6 is provider), the system becomes veeerrry slow... 
Its still usable but it takes several seconds (sometimes as much as 10-20) to ie exectue a simple command like ls, top, su... gstat reports speeds of around 30MB/s: dT: 0.501 flag_I 50us sizeof 288 i -1 L(q) ops/sr/s kBps ms/rw/s kBps ms/w %busy Name 17395 12200 577.2383 490598.5 99.2| ad4 17395 12200 577.2383 490598.5 99.2| mirror/gm0 0 0 0 00.0 0 00.00.0| ad4s1 17395 12200 577.2383 490598.5 99.2| mirror/gm0s1 0387387 495701.1 0 00.0 43.0| ad6 3 2 2 32 583.1 0 00.0 116.5| mirror/gm0s1a 1 0 0 00.0 0 00.00.0| mirror/gm0s1b 0 0 0 00.0 0 00.00.0| mirror/gm0s1c 0 0 0 00.0 0 00.00.0| mirror/gm0s1d 0 0 0 00.0 0 00.00.0| mirror/gm0s1e 13393 10168 576.0383 490598.5 95.3| mirror/
Re: Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?
Fcking great.. Waking up and noting that the box has rebooted itself during the night... Yay!!... No kernel dumps, nothing in the message log.. Nada... (this was on the "first" box, that is the one first in this thread)

What exactly does "kernel dumps on /dev/mirror/gm0s1b" mean? Not that it saves any kernel dumps at least.. But otoh I have no clue why it crashed at all, and whether it even tried to dump the kernel or just blacked out as when I tried to debug clamd...

--
Johan

On Sep 24, 2006, at 14:23 , Johan Ström wrote:

Okay, I got some problems here now... I'm trying to get clamav's clamd to work.. It fails with an abort in libc: it coredumps directly on start, trace: http://sial.org/pbot/19922 Truss output: www.stromnet.org/~johan/clamd.log

Okay... something seems to be f*cked in the nss/ldap stuff.. Anyway, when running with gdb --args /usr/local/sbin/clamd --debug it works fine!... No coredump or anything, until I decide to kill clamd.. kill breaks into gdb, and when I run continue to process the signal and let it die, the whole fcking box dies! Screen goes black and it reboots.. No panic messages or anything...

I have reproduced this two times now on the box in the dmesg in the earlier mail... Then I moved the disk to another, pretty similar box, same chipset I think but not exactly the same mobo.. tried the above commands, and bam, exactly the same problem.. screen just goes black and the box reboots...

Dmesg from that box: if I can get the crap up and running. Now the fs is broken or some shit; I get this on boot, after it has started a few services:

Starting jails:/usr: bad dir ino 32125198 at offset 512: mangled entry
panic: ufs_dirbad: bad dir
Uptime: 54s
GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed
GEOM_MIRROR: Device gm0 destroyed.
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds...

Ok, now I rebooted to single-user mode and enabled the dumpdev in rc.conf... , then saw it continue booting and I checked for the line saying kernel dumps on /dev/mirror/gm0s1b...
it was there... and then it booted further and got by the place it crashed before, but a minute later when i try to login to crashes on the same inode... AND STILL!.. it says Cannot dump. No dump device defined... WTF??... brokeness brokeness.. Okay, after some fscking its back up. dmesg from second box which i can crash with clamd...: Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 6.1-RELEASE-p7 #0: Wed Sep 20 09:21:41 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ELFI Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(tm) 64 Processor 3200+ (2210.09-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0xfc0 Stepping = 0 Features=0x78bfbffE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2> AMD Features=0xe0500800 real memory = 1073676288 (1023 MB) avail memory = 1024299008 (976 MB) ACPI APIC Table: ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0 cpu0: on acpi0 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff,0xcf0-0xcf3 on acpi0 pci0: on pcib0 agp0: mem 0xf000-0xf7ff at device 0.0 on pci0 isab0: at device 1.0 on pci0 isa0: on isab0 pci0: at device 1.1 (no driver attached) ohci0: mem 0xfe02f000-0xfe02 irq 21 at device 2.0 on pci0 ohci0: [GIANT-LOCKED] usb0: OHCI version 1.0, legacy support usb0: on ohci0 usb0: USB revision 1.0 uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 4 ports with 4 removable, self powered ohci1: mem 0xfe02e000-0xfe02efff irq 22 at device 2.1 on pci0 ohci1: [GIANT-LOCKED] usb1: OHCI version 1.0, legacy support usb1: on ohci1 usb1: USB revision 1.0 uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 4 ports with 4 removable, self powered ehci0: mem 0xfe02d000-0xfe02d0ff irq 23 at 
device 2.2 on pci0 ehci0: [GIANT-LOCKED] usb2: EHCI version 1.0 usb2: companion controllers, 4 ports each: usb0 usb1 usb2: on ehci0 usb2: USB revision 2.0 uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1 uhub2: 8 ports with 8 removable, self powered nve0: port 0xf000-0xf007 mem 0xfe02c000-0xfe02cfff irq 21 at device 5.0 on pci0 nve0: Ethernet address 00:11:09:c5:fc:9e miibus0: on nve0 ukphy0: on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto nve0: Ethernet address: 00:11:09:c5:fc:9e pci0: at device 6.0 (no driver attached) atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xdc00-0xdc0f at device 8.0 on pci0 ata0: on atapci0 ata1: on atapci0 atapci1: port 0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xc80
Re: Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?
On Sep 25, 2006, at 08:45 , Jiawei Ye wrote:

> On 9/25/06, Johan Ström <[EMAIL PROTECTED]> wrote:
>> Fcking great.. Waking up and noting that the box has rebooted itself during the night... Yay!!... No kernel dumps, nothing in the message log.. Nada... (this was on the "first" box, that is the one first in this thread)
>>
>> What exactly does "kernel dumps on /dev/mirror/gm0s1b" mean? Not that it saves any kernel dumps at least.. But otoh I have no clue why it crashed at all, and whether it even tried to dump the kernel or just blacked out as when I tried to debug clamd...
>>
>> --
>> Johan
>
> It means that the system died and released the sphincter when it did. If you have
>
> dumpdev='AUTO'
> dumpdir='/var/crash'
>
> in your rc.conf, then you can find the crash dump in ${dumpdir}, then you can use kgdb to retrieve the backtrace from the dump.

I have dumpdev="/dev/mirror/gm0s1b", but savecore doesn't extract any dumps :/

> Jiawei
>
> --
> "If it looks like a duck, walks like a duck, and quacks like a duck, then to the end user it's a duck, and end users have made it pretty clear they want a duck; whether the duck drinks hot chocolate or coffee is irrelevant."
Re: Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?
On Sep 25, 2006, at 09:55 , Alban Hertroys wrote:

> On Sep 25, 2006, at 8:36, Johan Ström wrote:
>> What exactly does kernel dumps on /dev/mirror/gm0s1b mean? Not that it saves any kernel dumps at least.. But otoh I have no clue why it
>
> It means exactly that. IIRC kernel dumps are created in swap space and on the next boot are moved to ${dumpdir}. I'm pretty certain this is explained nicely in the handbook[1].

Probably is; yes, I know what it means, I was just pretty upset at the moment..;)

> AFAIK kernels can only be dumped on real devices, not on virtual devices like /dev/mirror/*. In that case your setup is not going to get you any dumps.

In earlier FBSD (6.0 I think?) one got an ioctl error when trying to dumpon to a gmirror device, but if I recall correctly this has been changed since (I don't get an ioctl error anymore at least...)

> Besides that, it is probably not a very good idea to mirror your swap. I am certain it is bad for performance; whether it'd gain you reliability is beyond my knowledge. This has been discussed before, you probably want to check the archives.

Performance yes, but I think I've read that it is "best" anyway; if one of your disks dies, you don't want to lose half your swap, since that would not be very good if there is anything swapped out to that disk..

> [1] Which I didn't check as I'm about to be in a hurry...
>
> --
> Alban Hertroys
>
> Priest to alien: "We want to know, is there a higher being?".
> Alien: "Well, actually that's why we're here, we're sheer out of virgins".
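The dump setup discussed in these replies comes down to two rc.conf knobs plus a dump device the kernel can actually write to. A minimal sketch, assuming a swap partition on a plain (non-gmirror) disk; the device name /dev/ad8s1b and the helper name dump_rc_conf are hypothetical and not taken from this thread:

```shell
# Hypothetical helper: print rc.conf lines for kernel crash dumps.
# $1 = dump device (assumed to be a raw swap partition), $2 = dump directory.
dump_rc_conf() {
    printf 'dumpdev="%s"\n' "$1"
    printf 'dumpdir="%s"\n' "$2"
}

# /dev/ad8s1b is a made-up example device, not one from the thread.
dump_rc_conf /dev/ad8s1b /var/crash

# After a panic and reboot, the dump would then be examined roughly like this:
#   savecore /var/crash /dev/ad8s1b    # copy the dump out of swap
#   kgdb kernel.debug /var/crash/vmcore.0
```

Whether a gmirror provider can serve as the dump device on a given release is exactly the open question in the thread, so the sketch sidesteps it by assuming an unmirrored partition.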
Network polling
Hi

I just tried to enable network polling on my router box, a P2 400MHz with 3 different NICs (one internal, I think it's the fxp one):

fxp0: port 0x7c60-0x7c7f mem 0xf3dff000-0xf3df,0xf3f0-0xf3ff irq 11 at device 3.0 on pci0
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: [GIANT-LOCKED]
rl0: port 0x7400-0x74ff mem 0xf3efef00-0xf3efefff irq 10 at device 16.0 on pci0
miibus1: on rl0
rlphy0: on miibus1
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: [GIANT-LOCKED]
sis0: port 0x7800-0x78ff mem 0xf3eff000-0xf3ef irq 9 at device 20.0 on pci0
sis0: Silicon Revision: DP83815C

gw-1 ~$ uname -a
FreeBSD gw-1.stromnet.org 6.1-RELEASE-p10 FreeBSD 6.1-RELEASE-p10 #1: Fri Oct 13 16:59:41 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ROUTER.POLLING i386

The kernel is GENERIC + carp+pfsync+ipsec+polling..

Ok, so when I transfer data between sis0 and rl0 for example, I get a very high interrupt rate, ~40% or so.. I'm using openvpn on the box (laptop on rl0), so the packets are maybe chopped up into smaller fragments, I'm not sure.. But anyway, I got the idea that I should try to enable polling on the interfaces instead. So I did: recompiled with polling and enabled polling on all three ifs (man polling says all three should be supported).

Any difference? None! Still at 40% interrupts when loading ~10MBit (can't seem to get much more since ovpn floors the CPU at that speed).

So, shouldn't the interrupts go down somewhat now that I enabled polling? Or did I get this all wrong ;)

Thanks
Johan
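For reference, on 6.x polling is switched on per interface with ifconfig(8) once the kernel is built with options DEVICE_POLLING. A small sketch that only prints the commands for the three NICs named in this post; the helper name polling_cmds is made up for illustration, and nothing is executed against real interfaces:

```shell
# Hypothetical helper: print the ifconfig command that enables polling
# for each interface name given as an argument.
polling_cmds() {
    for nic in "$@"; do
        echo "ifconfig $nic polling"
    done
}

# The three interfaces from the dmesg output in this post.
polling_cmds fxp0 rl0 sis0

# On the router itself the output could be piped to sh, and the result
# checked per interface, for example:
#   ifconfig fxp0 | grep POLLING
#   vmstat -i        # compare interrupt counts before and after
```

Checking the interface flags first is a cheap way to rule out the case where polling was compiled in but never actually enabled on an interface.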
Re: OpenSSH 4.6 error: channel 0: chan_read_failed for istate 3
On Mar 20, 2007, at 16:04 , Dominik Zalewski wrote:

> Hi All,
>
> After upgrading to openssh-portable-4.6.p1,1 I'm getting following messages in logs:
>
> error: channel 0: chan_read_failed for istate 3
>
> Although ssh works fine.

Hi, just want to report that I've started to see the same thing since I upgraded to 4.6.p1,1. Every night my backup server uses scp to transfer files from my box, and I see this:

Mar 23 21:00:04 elfi sshd[76875]: Accepted publickey for root from 2001:xxx::xxx:xx port 63449 ssh2
Mar 23 21:01:18 elfi sshd[76875]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:01:18 elfi sshd[76875]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:01:18 elfi sshd[77389]: Accepted publickey for root from 2001:xxx::xxx:xx port 63450 ssh2
Mar 23 21:53:31 elfi sshd[77389]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:32 elfi sshd[77389]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:34 elfi sshd[85742]: Accepted publickey for root from 2001:xxx::xxx:xx port 49493 ssh2
Mar 23 21:53:34 elfi sshd[85742]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:34 elfi sshd[85742]: error: channel 0: chan_read_failed for istate 3

The backup process works by first executing a pre-script, then scp'ing, then executing a post-script.. so those errors look like they appear right when the ssh session is disconnected.

Anyone else with clues?

Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
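To see how the errors line up with individual sessions, the log excerpt above can be tallied per sshd process. A rough sketch using grep and awk on syslog-style lines; the function name count_chan_errors is an assumption, and the sample input is a subset of the lines quoted in this message:

```shell
# Hypothetical helper: count chan_read_failed errors per sshd process.
# Reads syslog lines on stdin, prints "<count> sshd[<pid>]" per process.
count_chan_errors() {
    grep 'chan_read_failed' |
        awk '{ sub(/:$/, "", $5); n[$5]++ } END { for (p in n) print n[p], p }' |
        sort -k2
}

# Sample input: a subset of the log lines quoted in this message.
count_chan_errors <<'EOF'
Mar 23 21:00:04 elfi sshd[76875]: Accepted publickey for root from 2001:xxx::xxx:xx port 63449 ssh2
Mar 23 21:01:18 elfi sshd[76875]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:01:18 elfi sshd[76875]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:31 elfi sshd[77389]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:32 elfi sshd[77389]: error: channel 0: chan_read_failed for istate 3
EOF
```

Two errors per process, logged at the moment each session ends, would be consistent with the observation that they appear at disconnect.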
ATA driver/gmirror problems, multiple boxes...
, march 12, feb 13, jan 22, jan 18)

For crus:

Apr 23 13:46:14 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.
Apr 13 09:57:49 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.

I think it has happened once more, but that's it..

For gw-1 it's luckily only once so far.. At least with the current install; it had problems when the Maxtor disks were running in it (and I think it was 6.0 back then).

So.. Three different boxes, with three different chipsets... With three different crash scenarios.. But they all have problems.. So where is the actual problem? The HW? The chipset drivers? The gmirror code?

I have run SMART tests on the crashing disks, no errors.. I have run powermax (Maxtor's own test program) a while back on the Maxtor disks, no problems.. I have tried changing SATA cables on some of the disks, no difference..

Does anyone have any clue about what can be causing this? What is most likely? How do we hunt this down?

Thank you.

Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
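When comparing several boxes like this, it can help to tally the disconnect events per mirror and provider straight from the logs. A sketch, assuming syslog lines in the format quoted above; the function name count_disconnects is made up, and the sample input is the two crus lines from this message:

```shell
# Hypothetical helper: count GEOM_MIRROR disconnects per device/provider.
# Reads syslog lines on stdin, prints "<count> <device> <provider>".
count_disconnects() {
    grep 'GEOM_MIRROR: Device .*: provider .* disconnected' |
        sed 's/.*Device \([^:]*\): provider \([^ ]*\) disconnected.*/\1 \2/' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Sample input: the two log lines quoted for crus.
count_disconnects <<'EOF'
Apr 23 13:46:14 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.
Apr 13 09:57:49 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.
EOF
```

If the same provider keeps dropping out on one box while different providers drop on another, that would point more toward a specific disk or cable than toward the gmirror code.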
Crashed gmirror, single disk marked SYNC and wont boot...
Hi FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 [EMAIL PROTECTED]:/usr/obj/usr/ src/sys/ROUTER.POLLING i386 (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, IPSEC, also pfsync and carp) This weekend I had a disk failing on me in a machine running gmirror gm0 with 2 providers (ad0 and ad6). The whole box froze with no screen output, and on hard reboot I got some LBA errors etc from ad0, after a few reboots it got up and running though (I wasnt at the screen, had do do it by phone so couldn't really debug very well). As soon as the box got up, I removed ad0 from the gmirror, so ad6 was the only provider. Today I got a new disk that would replace ad0.. Now remeber, ad6 was the only disk in the mirror. I took the box down fine, replaced the disk. ad0 was now gone and instead I hade ad4 (ad4 +6 is SATA, ad0 was IDE). Changed so I booted of the old SATA.. Okay, there came the first problem; the boot loader gave me the usual options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1 i got the same prompt again.. F5 nothing at all.. Funny!... The system refused to load the loader (or whatever the 1-9 menu thingy is called) kernel or anything.. So I finally plugged the old ad0 disk into the machine to at least get it booted, thinking it would go up on the gmirror.. Nope..: (got the new ad4 out here) ad0: 38166MB at ata0-master UDMA100 ad6: 152627MB at ata3-master SATA150 GEOM_MIRROR: Device gm0 created (id=4029378995). GEOM_MIRROR: Device gm0: provider ad6 detected. Root mount waiting for: GMIRROR Root mount waiting for: GMIRROR Root mount waiting for: GMIRROR Root mount waiting for: GMIRROR GEOM_MIRROR: Force device gm0 start due to timeout. Trying to mount root from ufs:/dev/mirror/gm0s1a Manual root filesystem specification: : Mount using filesystem eg. ufs:da0s1a ? List valid disk boot devices Abort manual input mountroot> Okey... so why wouldnt it load my mirror from ad6 now?? 
I just did a clean shutdown without problems.. It didnt even recognize any slices on ad6s1 (altough the ad6s1 was found)... I entered ad0s1 as root and booted from there, ofcourse i got to emergency shell since fstab looked for the gmirror devices, which didnt exist.. Some more digging into gmirror, I did a gmirror dump ad6: Metadata on /dev/ad6: magic: GEOM::MIRROR version: 3 name: gm0 mid: 4029378995 did: 449032193 all: 3 genid: 0 syncid: 5 priority: 0 slice: 4096 balance: round-robin mediasize: 20416757248 sectorsize: 512 syncoffset: 0 mflags: NONE dflags: SYNCHRONIZING hcprovider: provsize: 160041885696 MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f Some googling indicated that SYNCHRONIZING means that its not "complete" and wont mount? Is that correct? Why would it be in that state then, I just shut it down fine... And where the f*ck did my slices go??.. Did a sysctl kern.geom.mirror.debug=2 and tried to gmirror activate the mirror: GEOM_MIRROR[1]: Creating device gm0 (id=4029378995). GEOM_MIRROR[0]: Device gm0 created (id=4029378995). GEOM_MIRROR[1]: root_mount_hold 0xc3539510 GEOM_MIRROR[1]: Adding disk ad6 to gm0. GEOM_MIRROR[2]: Adding disk ad6. GEOM_MIRROR[2]: Disk ad6 connected. GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0). GEOM_MIRROR[0]: Device gm0: provider ad6 detected. GEOM_MIRROR[2]: Tasting ad6s1. GEOM_MIRROR[0]: Force device gm0 start due to timeout. GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510 GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed. GEOM_MIRROR[2]: Metadata on ad6 updated. GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0 GEOM_MIRROR[0]: Device gm0 destroyed. GEOM_MIRROR[1]: Thread exiting. GEOM_MIRROR[1]: Consumer ad6 destroyed. Soo.. What is going on here? Anyone with some clues? Currently running on the ad0 disk, no raid at all.. Lets hope it doesnt die on me (havent had any signs of that since sunday when it froze and gave boot errors now so I'm hoping..). 
The data loss from using ad0 instead of ad6 is probably minimal. It's a router, so more or less only logging seems to have been lost. For now I just want to understand what happened here and how to prevent it, and how to get back up on a gmirror with ad6 and ad4 (to be plugged in) so I can throw ad0 out.

Thanks

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/
Re: Crashed gmirror, single disk marked SYNC and wont boot...
On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote: On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Ström wrote: Hi FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 [EMAIL PROTECTED]:/usr/obj/usr/ src/sys/ROUTER.POLLING i386 (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, IPSEC, also pfsync and carp) This weekend I had a disk failing on me in a machine running gmirror gm0 with 2 providers (ad0 and ad6). The whole box froze with no screen output, and on hard reboot I got some LBA errors etc from ad0, after a few reboots it got up and running though (I wasnt at the screen, had do do it by phone so couldn't really debug very well). As soon as the box got up, I removed ad0 from the gmirror, so ad6 was the only provider. Today I got a new disk that would replace ad0.. Now remeber, ad6 was the only disk in the mirror. I took the box down fine, replaced the disk. ad0 was now gone and instead I hade ad4 (ad4 +6 is SATA, ad0 was IDE). Changed so I booted of the old SATA.. Okay, there came the first problem; the boot loader gave me the usual options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1 i got the same prompt again.. F5 nothing at all.. Funny!... The system refused to load the loader (or whatever the 1-9 menu thingy is called) kernel or anything.. So I finally plugged the old ad0 disk into the machine to at least get it booted, thinking it would go up on the gmirror.. Nope..: (got the new ad4 out here) ad0: 38166MB at ata0-master UDMA100 ad6: 152627MB at ata3-master SATA150 GEOM_MIRROR: Device gm0 created (id=4029378995). GEOM_MIRROR: Device gm0: provider ad6 detected. Root mount waiting for: GMIRROR Root mount waiting for: GMIRROR Root mount waiting for: GMIRROR Root mount waiting for: GMIRROR GEOM_MIRROR: Force device gm0 start due to timeout. Trying to mount root from ufs:/dev/mirror/gm0s1a Manual root filesystem specification: : Mount using filesystem eg. ufs:da0s1a ? 
List valid disk boot devices
Abort manual input
mountroot>

Okay... so why wouldn't it load my mirror from ad6 now?? I just did a clean shutdown without problems. It didn't even recognize any slices on ad6s1 (although the ad6s1 was found)...

It loaded your mirror just fine, you confuse things. Gmirror started in degraded state, as one could expect, but it seems there is no 'a' partition on your gm0s1 slice (or the entire bsdlabel is gone). You could try to recreate it based on the bsdlabel from ad0 (if it should be the same), but I've no idea how it disappeared. Anyway, gmirror seems to work properly.

Okay. So it tries to load, finds no partition table, and ignores and unloads gm0?

Some more digging into gmirror, I did a gmirror dump ad6:

Metadata on /dev/ad6:
magic: GEOM::MIRROR
version: 3
name: gm0
mid: 4029378995
did: 449032193
all: 3

You have a 3-way mirror?

Uhm.. never had more than 2 disks in this machine..

genid: 0
syncid: 5
priority: 0
slice: 4096
balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
provsize: 160041885696
MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f

BTW. Your provider size is 149GB and the mirror only uses 19GB, which means you mirrored a 149GB disk with a 19GB disk and you waste 130GB (it's unused).

Yes, the ad0 disk was (is) only 40GB, so only the first 40GB of that disk was in the mirror (the rest was in another slice with its own label, although if I run fdisk on the disk it seems to not be there at all). But hum, 19?? It should be 40 (or somewhere around there at least). From the ad0 mount:

Filesystem    1K-blocks     Used     Avail Capacity  Mounted on
/dev/ad0s1a      507630    85142    381878    18%    /
/dev/ad0s1e      507630       20    467000     0%    /tmp
/dev/ad0s1f    10154158  1176410   8165416    13%    /usr
/dev/ad0s1d     1506190    80326   1305370     6%    /var
/dev/ad0s1g    24174212  6939804  15300472    31%    /var/squid

swapinfo:
/dev/ad0s1b     1022536        0   1022536     0%

~35GB... I compared slice 1 on ad0 vs ad6, and both have the exact same size.
Some googling indicated that SYNCHRONIZING means that its not "complete" and wont mount? Is that correct? Why would it be in that state then, I just shut it down fine... And where the f*ck did my slices go??.. SYNCHRONIZING means that this component was/is being synchronized. It seems that you removed/lost the master disk, while it was synchronizing. It should work anyway. Okay thats odd.. ad6 was the only disk in the mirror when I shut down (shutdown -p now, and it powered off by itself..) so it should have been good.. BTW. You confuse things again. Your slice is just fine (ad6s1), you don't have partitions, AFAIU. Seems I did yes,
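The size figures discussed in this message can be checked with a little arithmetic. The sketch below converts the `mediasize` and `provsize` byte counts from the quoted `gmirror dump ad6` output into GiB, and sums the 1K-block counts from the df/swapinfo output for ad0. The df figures are as read from the flattened table in the message, so treat them as a best-effort reading; the variable names are just for illustration.

```python
# Byte counts copied from the `gmirror dump ad6` output quoted in the thread.
MEDIASIZE = 20416757248   # "mediasize": size of the mirror device gm0
PROVSIZE = 160041885696   # "provsize": size of the underlying provider ad6

# 1K-block counts for ad0s1a, e, f, d, g and the swap partition ad0s1b,
# as read from the df/swapinfo output in the same message (best-effort reading).
AD0_KBLOCKS = [507630, 507630, 10154158, 1506190, 24174212, 1022536]

GIB = 1024 ** 3


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / GIB


mirror_gib = to_gib(MEDIASIZE)
provider_gib = to_gib(PROVSIZE)
unused_gib = provider_gib - mirror_gib
ad0_total_gib = to_gib(sum(AD0_KBLOCKS) * 1024)

print(f"mirror (mediasize):  {mirror_gib:7.2f} GiB")
print(f"provider (provsize): {provider_gib:7.2f} GiB")
print(f"unused on provider:  {unused_gib:7.2f} GiB")
print(f"ad0 filesystems+swap:{ad0_total_gib:7.2f} GiB")
```

The first three numbers round to the 19GB, 149GB and 130GB quoted in the reply. The last one comes out at roughly 36 GiB, close to the "~35GB" in the message and well above the 19 GiB recorded in the mirror metadata, which is the mismatch being puzzled over.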
Re: Crashed gmirror, single disk marked SYNC and wont boot...
On Aug 21, 2007, at 17:53, Johan Ström wrote:

On Aug 21, 2007, at 16:31, Pawel Jakub Dawidek wrote:

All in all, your partition table seems to be gone. If you created it on gmirror before (gm0s1) you may still have the same partition table on the other half of the mirror. You can try to move it to ad6 with bsdlabel and verify if you can see file systems inside the partitions.

Okay, tried that now. Saved the ad0s1 label and reloaded it onto ad6s1. Now I have the same partition table on ad6s1 as on ad0s1. Trying to mount any of them, though, gives me "incorrect super block", and fsck cannot find any superblocks either.

So, what to do now then? Just forget ad6 and start from scratch from ad2? (As I said, the data isn't really very old.) I'm thinking about doing a complete reinstall on ad4+ad6 then. Can I do that? fdisk both with a full partition on both, create a new gmirror between ad6s1/ad4s1 (or should I go on ad4/ad6?), create slices, use dump | restore (of course with apps shut down so no data is changed, or at least nothing that I care about) to copy all files from ad2 to the new mirror. What more do I need to do? bsdlabel -B on both to write boot blocks? Is there anything else to think about?

OK, just for the record: I plugged both SATA disks in, cleared them, created a new mirror on both of them, sliced it up and ran dump -0 -L -f - / | restore -r -f - for all filesystems, plus bsdlabel -B. And what I missed in the above text: fdisk -B to write the boot0 code. Now it has booted fine on the mirror!

Although, one thing that I got curious about. In the fdisk manpage it says -b can be used to change the bootcode, and that the default is /boot/mbr. What is this? I checked its md5 against boot0 and it's not the same (although I guess it might just be some boot0 with a different config). I never found any references to this mbr file in either the man pages or the handbook.
Again, thanks for the help :)
Re: Crashed gmirror, single disk marked SYNC and wont boot...
On Aug 24, 2007, at 12:21, CyberLeo Kitsana wrote:

Johan Ström wrote:

Although, one thing that I got curious about. In the fdisk manpage it says -b can be used to change the bootcode, and that the default is /boot/mbr. What is this? I checked its md5 against boot0 and it's not the same (although I guess it might just be some boot0 with a different config). I never found any references to this mbr file in either the man pages or the handbook.

boot0 is the pretty 'F1 FreeBSD' type boot menu. mbr is more like your standard MS bootloader, that just boots the active slice of the current disk. The latter is my favorite, as I despise multi-booting.

I see. Shouldn't this info be in the manpages/handbook somewhere? Like referenced from the boot0cfg manpage or something, and in the boot section of the handbook.
Apache2, mod_python and nss_ldap: Coredump...
Hi I got a new 6.0-STABLE box. Rebuilt kernel and world 2 hours ago (against RELENG_6), so it should be pretty new. Im trying to have apache 2.0.55, mod_python 3.1.4 and nss_ldap 239, all the latest from ports. The problem I have is this: If i have LoadModule python_module libexec/apache2/mod_python.so in my httpd.conf, and at the same time have either "group: files ldap" and/or "passwd: files ldap" in my nsswitch.conf, i get Segfaults. Example: [EMAIL PROTECTED]:~$ apachectl configtest Syntax OK Segmentation fault (core dumped) [EMAIL PROTECTED]:~$ However, apache itself is running fine, even using mod_python. If i remove either the LoadModule or both the ldap-entrys in nsswitch, the segfaults dissappear. I've compiled httpd with debug symbols, and this is what I found with gdb (httpd -t is same as apachectl configtest): [EMAIL PROTECTED]:~$ gdb httpd GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd"... (gdb) run -t Starting program: /usr/local/sbin/httpd -t warning: Unable to get location for thread creation breakpoint: generic error [New LWP 100104] [New Thread 0x80ab000 (LWP 100104)] Warning: DocumentRoot [/usr/local/nagios/share] does not exist Syntax OK Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x80ab000 (LWP 100104)] 0x in ?? () (gdb) where #0 0x in ?? () #1 0x28be6744 in ?? () from /usr/local/lib/nss_ldap.so.1 #2 0x28bf2200 in ?? () from /usr/local/lib/nss_ldap.so.1 #3 0x280ba3d8 in ?? () from /libexec/ld-elf.so.1 #4 0xbfbfe618 in ?? 
() #5 0x280a0b26 in _rtld_error () from /libexec/ld-elf.so.1 #6 0x28bef998 in _fini () from /usr/local/lib/nss_ldap.so.1 #7 0x280b9018 in tls_dtv_generation () from /libexec/ld-elf.so.1 #8 0x280ba3d8 in ?? () from /libexec/ld-elf.so.1 #9 0xbfbfe628 in ?? () #10 0x280a1076 in elf_hash () from /libexec/ld-elf.so.1 #11 0x280a3958 in dlclose () from /libexec/ld-elf.so.1 #12 0x284de64c in _nsdbtaddsrc () from /lib/libc.so.6 #13 0x284de20f in endhostent () from /lib/libc.so.6 #14 0x284de6cc in _nsdbtaddsrc () from /lib/libc.so.6 #15 0x284fd35f in __cxa_finalize () from /lib/libc.so.6 #16 0x284fcf9a in exit () from /lib/libc.so.6 #17 0x0806f0ee in destroy_and_exit_process (process=0x80b6098, process_exit_value=0) at main.c:216 #18 0x0806faa6 in main (argc=2, argv=0xbfbfe838) at main.c:565 (gdb) So, seems the segfault appears when apache calls exit(), explains why it seems to work good otherwise... Googling gave me some similar problem (bug 65220), however that bug seemd to affect other programs, so far I've only encountered this problem with apache. Currently I've compiled apache with the following: portinstall apache-2.0.55 -M "WITH_DBM=bdb WITH_BERKELEYDB=db4 WITH_LDAP=1 WITH_MPM=prefork WITH_THREADS=yes WITH_THREADS_MODULES=yes WITH_DEBUG=1" The threads stuff was added after some suspect gdb'ing around a pthread function (can't remember exact name now.. something pthread_cancel.. the symptoms where the same, segfault just before exit). mod_python is installed without any special options, there isnt realy any (ie no option to turn of threads). Does anyone have any clue about whats going on here? Thanks! Johan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Apache2, mod_python and nss_ldap: Coredump...
On 10 nov 2005, at 00.25, Brian Fundakowski Feldman wrote: On Wed, Nov 09, 2005 at 10:20:26AM +0100, Johan Ström wrote: Hi I got a new 6.0-STABLE box. Rebuilt kernel and world 2 hours ago (against RELENG_6), so it should be pretty new. Im trying to have apache 2.0.55, mod_python 3.1.4 and nss_ldap 239, all the latest from ports. The problem I have is this: If i have LoadModule python_module libexec/apache2/mod_python.so in my httpd.conf, and at the same time have either "group: files ldap" and/or "passwd: files ldap" in my nsswitch.conf, i get Segfaults. Example: [EMAIL PROTECTED]:~$ apachectl configtest Syntax OK Segmentation fault (core dumped) [EMAIL PROTECTED]:~$ However, apache itself is running fine, even using mod_python. If i remove either the LoadModule or both the ldap-entrys in nsswitch, the segfaults dissappear. I've compiled httpd with debug symbols, and this is what I found with gdb (httpd -t is same as apachectl configtest): [...] (gdb) where #0 0x in ?? () #1 0x28be6744 in ?? () from /usr/local/lib/nss_ldap.so.1 #2 0x28bf2200 in ?? () from /usr/local/lib/nss_ldap.so.1 Can you try making sure that nss_ldap gets built and linked with -g, and is not stripped, so that all symbols and debug info are preserved as well? Looks to be atexit(3)-related, from here, but the symbols should clear things up. Hi, thanks for the answer! I *think* i got the nss_ldap.so to not be strip'd, at least I cant find any call in the port Makefile or the sources makefile/configure stuff that would strip it. Same result as before, no new symbols.. Strange? I'm compiling with -g and -O0.. However, I've noticed one thing, if I run gdb httpd and then run -t, I get this: [EMAIL PROTECTED]:~$ gdb httpd GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. 
There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd"... (gdb) run -t Starting program: /usr/local/sbin/httpd -t warning: Unable to get location for thread creation breakpoint: generic error [New LWP 100128] [New Thread 0x80fa000 (LWP 100128)] wWarning: DocumentRoot [/usr/local/nagios/share] does not exist Syntax OK [New LWP 100128] Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to LWP 100128] 0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2 (gdb) where #0 0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2 Error accessing memory address 0x28bcd7a8: Bad address. (gdb) Thats the pthread_cancel thing I was talking about before... However, if I do run httpd -t and then check the dump with gdb httpd - c httpd.core, I get the same as first posted. Did the test over and over again, got the same pthread_cancel error, reading the same memory address, re-ran httpd -t a couple of times and seems I only get these pthread_cancel calls... Is there any way to check if a lib is strip'd/got debug symbols or not? Thanks Johan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Apache2, mod_python and nss_ldap: Coredump
On 10 nov 2005, at 13.55, Stephane Bortzmeyer wrote:

On Wed, Nov 09, 2005 at 01:46:37PM +0100, Johan Ström <[EMAIL PROTECTED]> wrote a message of 112 lines which said:

I'm trying to run apache 2.0.55, mod_python 3.1.4 and nss_ldap 239, all the latest from ports. The problem I have is this: If I have LoadModule python_module libexec/apache2/mod_python.so in my httpd.conf, and at the same time have either "group: files ldap" and/or "passwd: files ldap" in my nsswitch.conf, I get segfaults. Example:

The only thing I can say is that I have the same problem on FreeBSD 5.4-RELEASE.

Interesting... So it seems I'm not the only one with problems then. CC'ing this to the freebsd-stable list (and the correct mod_python mail address; I had it wrong in the first mail to apache-users).
Re: Apache2, mod_python and nss_ldap: Coredump...
On 10 nov 2005, at 12.54, Johan Ström wrote: On 10 nov 2005, at 00.25, Brian Fundakowski Feldman wrote: On Wed, Nov 09, 2005 at 10:20:26AM +0100, Johan Ström wrote: Hi I got a new 6.0-STABLE box. Rebuilt kernel and world 2 hours ago (against RELENG_6), so it should be pretty new. Im trying to have apache 2.0.55, mod_python 3.1.4 and nss_ldap 239, all the latest from ports. The problem I have is this: If i have LoadModule python_module libexec/apache2/mod_python.so in my httpd.conf, and at the same time have either "group: files ldap" and/or "passwd: files ldap" in my nsswitch.conf, i get Segfaults. Example: [EMAIL PROTECTED]:~$ apachectl configtest Syntax OK Segmentation fault (core dumped) [EMAIL PROTECTED]:~$ However, apache itself is running fine, even using mod_python. If i remove either the LoadModule or both the ldap-entrys in nsswitch, the segfaults dissappear. I've compiled httpd with debug symbols, and this is what I found with gdb (httpd -t is same as apachectl configtest): [...] (gdb) where #0 0x in ?? () #1 0x28be6744 in ?? () from /usr/local/lib/nss_ldap.so.1 #2 0x28bf2200 in ?? () from /usr/local/lib/nss_ldap.so.1 Can you try making sure that nss_ldap gets built and linked with -g, and is not stripped, so that all symbols and debug info are preserved as well? Looks to be atexit(3)-related, from here, but the symbols should clear things up. Hi, thanks for the answer! I *think* i got the nss_ldap.so to not be strip'd, at least I cant find any call in the port Makefile or the sources makefile/ configure stuff that would strip it. Same result as before, no new symbols.. Strange? I'm compiling with -g and -O0.. However, I've noticed one thing, if I run gdb httpd and then run - t, I get this: [EMAIL PROTECTED]:~$ gdb httpd GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. 
Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd"... (gdb) run -t Starting program: /usr/local/sbin/httpd -t warning: Unable to get location for thread creation breakpoint: generic error [New LWP 100128] [New Thread 0x80fa000 (LWP 100128)] wWarning: DocumentRoot [/usr/local/nagios/share] does not exist Syntax OK [New LWP 100128] Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to LWP 100128] 0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2 (gdb) where #0 0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2 Error accessing memory address 0x28bcd7a8: Bad address. (gdb) Thats the pthread_cancel thing I was talking about before... However, if I do run httpd -t and then check the dump with gdb httpd -c httpd.core, I get the same as first posted. Did the test over and over again, got the same pthread_cancel error, reading the same memory address, re-ran httpd -t a couple of times and seems I only get these pthread_cancel calls... Is there any way to check if a lib is strip'd/got debug symbols or not? Thanks Johan Okay, some news here then.. Thanks to David Adam I used file to determine if it was striped or not, seems it was.. So now I've fixed it, not striped anymore (the install command striped it, i missed that).. New debug output then: (gdb) where #0 0x in ?? () #1 0x28bd9730 in __do_global_dtors_aux () from /usr/local/lib/ nss_ldap.so.1 #2 0x28be2984 in _fini () from /usr/local/lib/nss_ldap.so.1 #3 0x280b5018 in tls_dtv_generation () from /libexec/ld-elf.so.1 #4 0x280b63d8 in ?? () from /libexec/ld-elf.so.1 #5 0xbfbfe628 in ?? 
() #6 0x2809d076 in elf_hash () from /libexec/ld-elf.so.1 #7 0x2809f958 in dlclose () from /libexec/ld-elf.so.1 #8 0x284b064c in _nsdbtaddsrc () from /lib/libc.so.6 #9 0x284b020f in endhostent () from /lib/libc.so.6 #10 0x284b06cc in _nsdbtaddsrc () from /lib/libc.so.6 #11 0x284cf35f in __cxa_finalize () from /lib/libc.so.6 #12 0x284cef9a in exit () from /lib/libc.so.6 #13 0x0806b746 in destroy_and_exit_process (process=0x80a4090, process_exit_value=0) at main.c:216 #14 0x0806c0fe in main (argc=2, argv=0xbfbfe838) at main.c:565 (Also sent this to the other lists this thread is discussed in). Johan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
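On the question of how to tell whether a library has been stripped: besides running file(1) on it, as was done above, one can look directly for a full symbol table in the ELF section headers, since stripping removes the section of type SHT_SYMTAB. The sketch below does only that. The function name `has_symtab` is made up for illustration, and the check says nothing about `-g` debug info, which lives in separate `.debug_*` sections.

```python
import struct

SHT_SYMTAB = 2  # section type of the full (static) symbol table in the ELF spec


def has_symtab(data: bytes) -> bool:
    """Return True if the ELF image in `data` contains a SHT_SYMTAB section."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian

    # Fields that follow the 16-byte e_ident block in the ELF header.
    layout = "HHIQQQIHHHHHH" if is_64bit else "HHIIIIIHHHHHH"
    fields = struct.unpack_from(endian + layout, data, 16)
    e_shoff = fields[5]        # file offset of the section header table
    e_shentsize = fields[10]   # size of one section header
    e_shnum = fields[11]       # number of section headers

    for index in range(e_shnum):
        # sh_type is the second 32-bit word of every section header,
        # in both the 32-bit and the 64-bit layout.
        offset = e_shoff + index * e_shentsize + 4
        (sh_type,) = struct.unpack_from(endian + "I", data, offset)
        if sh_type == SHT_SYMTAB:
            return True
    return False
```

To use it, read the library into memory and pass the bytes, for example `has_symtab(open("/usr/local/lib/nss_ldap.so.1", "rb").read())`. A result of `False` would correspond to the stripped state that the install step produced here.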
Page fault, GEOM problem??
Ok, just got this not so very nice error on a RELENG_6_0 box (built from sources this morning, GENERIC kernel minus drivers I dont use): Nov 17 15:35:43 elfi kernel: subdisk10: detached Nov 17 15:35:43 elfi kernel: ad10: detached Nov 17 15:35:43 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=85720528 Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad10s1 disconnected. Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134373376, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134438912, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268591104, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268607488, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268623872, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268640256, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=20151026176, length=2048)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=32299655680, length=8192)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=37363671552, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=38349087232, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=45453566464, length=16384)] Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). 
ad10s1[READ(offset=54459458048, length=131072)] Nov 17 17:59:18 elfi syslogd: kernel boot file is /boot/kernel/kernel Nov 17 17:59:18 elfi kernel: Nov 17 17:59:18 elfi kernel: Nov 17 17:59:18 elfi kernel: Fatal trap 12: page fault while in kernel mode Nov 17 17:59:18 elfi kernel: fault virtual address = 0x48 Nov 17 17:59:18 elfi kernel: fault code = supervisor read, page not present Nov 17 17:59:18 elfi kernel: instruction pointer= 0x20:0xc0506b92 Nov 17 17:59:18 elfi kernel: stack pointer = 0x28:0xd56d7c9c Nov 17 17:59:18 elfi kernel: frame pointer = 0x28:0xd56d7c9c Nov 17 17:59:18 elfi kernel: code segment = base 0x0, limit 0xf, type 0x1b Nov 17 17:59:18 elfi kernel: = DPL 0, pres 1, def32 1, gran 1 Nov 17 17:59:18 elfi kernel: processor eflags = interrupt enabled, resume, IOPL = 0 Nov 17 17:59:18 elfi kernel: current process= 36 (swi4: clock sio) Nov 17 17:59:18 elfi kernel: trap number= 12 Nov 17 17:59:18 elfi kernel: panic: page fault Nov 17 17:59:18 elfi kernel: Uptime: 8h55m1s ad10 and ad6, 2 brand new Maxtor Maxline 300GB SATA, attached to a Promise PDC40518 SATA150 controller, makes a GEOM mirror gm0s1. I've been running this stuff in another "test" machine (MSI K8N neo Platinum, KT333 chip I believe), and I havent had a single problem. I moved the disks/controllercard to my "real" server 24 hours ago, with the only apparent "problem" I seemd to have was this: Nov 17 07:06:12 elfi kernel: xl0: transmission error: 90 Nov 17 07:06:12 elfi kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes Nov 17 07:06:18 elfi kernel: xl0: watchdog timeout Nov 17 07:06:18 elfi kernel: xl0: link state changed to DOWN Nov 17 07:06:18 elfi kernel: vlan5: link state changed to DOWN Nov 17 07:06:20 elfi kernel: xl0: link state changed to UP Nov 17 07:06:20 elfi kernel: vlan5: link state changed to UP Comming and going... 
these problems only appeared during the first 20-30 minutes after boot, then they disappeared totally (and yes, there was plenty of IO on the net going on both during and after these messages). Sometimes I just got the first two messages and nothing "happened", but sometimes the watchdog message came and the network died for a minute or so.

Here is dmesg from the last boot (directly after the crash):

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-RELEASE #0: Thu Nov 17 00:49:29 CET 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ELFI
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(TM) XP 1900+ (1599.56-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x662 Stepping = 2
Features=0x383fbffMCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
AMD Features=0xc0480800
real memory = 536854528 (511 MB)
avail memory = 516014080 (492 MB)
ioapic0: Changing APIC ID to 2
ioapic0 irqs 0-23 on motherboard
npx0: [FAST]
npx0: on motherboard
n
Re: Page fault, GEOM problem??
On 18 nov 2005, at 10.17, Xin LI wrote: On 11/18/05, Johan Ström <[EMAIL PROTECTED]> wrote: Ok, just got this not so very nice error on a RELENG_6_0 box (built from sources this morning, GENERIC kernel minus drivers I dont use): The network card is the exact same model as the one I used in the "test" machine, didn't have any problems there.. [...] So, any ideas what this can be? If there were a disk crash, wish I have a hard time believing since I ran powermax (maxtor test program) on both of these disk 3 weeks ago and they have been running fine w/o a single problem since I started using them, why didn't just GEOM kick in and run on the other disk? Pagefaulting is not a way to react if a disk goes dead.. Hope someone can help me/this problem doesn't occur any more... but I suppose that is to much to hope for... Would you please consider trying to obtain a crashdump and send the backtrace so we can investigate more? (Hints can be found at http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers- handbook/kerneldebug.html#KERNELDEBUG-OBTAIN) Thanks for answer Doesnt look like I got any "usable" dump devices.. When booting i get GEOM_MIRROR: Device gm0s1 created (id=4118114647). GEOM_MIRROR: Device gm0s1: provider ad6s1 detected. GEOM_MIRROR: Device gm0s1: provider ad10s1 detected. GEOM_MIRROR: Device gm0s1: provider ad6s1 activated. GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched. GEOM_MIRROR: Device gm0s1: rebuilding provider ad10s1. Trying to mount root from ufs:/dev/mirror/gm0s1a WARNING: / was not properly dismounted Loading configuration files. No suitable dump device was found. Entropy harvesting: interrupts ethernet point_to_point kickstart . swapon: adding /dev/mirror/gm0s1b as swap device Then naturally: /etc/rc: WARNING: Dump device does not exist. Savecore not run. Looked around in the rc-scripts and tried to figure out what it did, the dumpon script tries to autolookup a good dump device but finds none.. 
According to the page you linked to, the dumpon command has to be executed AFTER swapon. Why are the rc scripts trying to run it before swapon then?

Anyway, I tried to run dumpon manually on my swap device:

$ dumpon -v /dev/mirror/gm0s1b
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

That didn't work too well. I also tried savecore manually:

$ savecore /var/crash/ /dev/mirror/gm0s1b
savecore: no dumps found

That didn't work very well either (but that is probably expected, since there were no working dumps).

Google showed me another thread on this list about dumping to gmirror swap, but it was just a question (whether it was supported) without any answers. Same error as I got.

Hope this helps. Thanks again

Johan

Thanks,
--
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net
Re: Page fault, GEOM problem??
Hi!

On 18 nov 2005, at 18.43, Xin LI wrote:

> Hi, Johan,
>
> On 11/18/05, Johan Ström <[EMAIL PROTECTED]> wrote:
>> On 18 nov 2005, at 10.17, Xin LI wrote:
>> [snip]
>> Doesn't look like I got any "usable" dump devices.. When booting I get
>> [...]
>> Loading configuration files.
>> No suitable dump device was found.
>> Entropy harvesting: interrupts ethernet point_to_point kickstart .
>> swapon: adding /dev/mirror/gm0s1b as swap device
>
> I see, so both of your SATA disks are in the same mirror group...
>
>> Then naturally:
>> /etc/rc: WARNING: Dump device does not exist. Savecore not run.
>> Looked around in the rc-scripts and tried to figure out what it did, the dumpon script tries to autolookup a good dump device but finds none..
>
> Unfortunately, kernel dumps are currently not supported on every device, for some technical reasons (probably to simplify the crash code so it does not make more mistakes^Wdamage)
>
>> According to the page you linked to, the dumpon command has to be executed AFTER swapon.. Why is the rc scripts trying to run it before swapon then?
>
> I guess this is because dumpon can now detect the dump device automatically, but I'm not quite sure about this. Will look for the reason. I think either the Handbook should be updated, or the code should be corrected. What I am very curious about is why dumpon is "BEFORE" savecore. Maybe I have some misunderstanding...

Sorry, partly my mistake.. I think I misunderstood how savecore works below (when I tried it manually in my last mail).. But the messages above are directly from boot; it seems it tries dumpon before savecore?

Relevant boot log from the last boot:

ad0: 2441MB at ata0-master UDMA33
acd0: CDROM at ata1-master PIO4
ad6: 286188MB at ata3-master SATA150
ad10: 286188MB at ata5-master SATA150
GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 activated.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a
Loading configuration files.
dumpon: ioctl(DIOCSKERNELDUMP) : Operation not supported
(this DIOCSKERNELDUMP message is probably because I specified dumpdev in rc.conf, so it forced usage of gm0s1b instead of letting the scripts autodetect..)
Entropy harvesting: interrupts ethernet point_to_point kickstart .
swapon: adding /dev/mirror/gm0s1b as swap device
Starting file system checks:
/dev/mirror/gm0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1a: clean, 213811 free (771 frags, 26630 blocks, 0.3% fragmentation)
/dev/mirror/gm0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1e: clean, 1012917 free (85 frags, 126604 blocks, 0.0% fragmentation)
/dev/mirror/gm0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1f: clean, 115955787 free (40747 frags, 14489380 blocks, 0.0% fragmentation)
/dev/mirror/gm0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1d: clean, 1983354 free (4834 frags, 247315 blocks, 0.2% fragmentation)
Starting devd.
Mounting NFS file systems: .
Creating and/or trimming log files: .
Starting syslogd.
Checking for core dump on /dev/mirror/gm0s1b...
savecore: no dumps found
Starting named.

So, it seems it does run savecore after running dumpon and mounting disks etc... Is that wrong?

>> Anyway, tried to do dumpon manually on my swap drive:
>> $ dumpon -v /dev/mirror/gm0s1b
>> dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported
>> Didn't work too good.. Also tried savecore manually:
>> $ savecore /var/crash/ /dev/mirror/gm0s1b
>> savecore: no dumps found

(This was my mistake, of course there are no dumps when I didn't have a dump device when it crashed..)

>> Didnt work very good either (but probably expected since there was no working dumps..)
>> Google showed me some other thread in this list about gmirror swap dump, just a question (if it was supported) w/o any answers tho. Same error as I got.
>
> It seems that this cannot be worked around easily. If possible, my suggestion is that you attach a third disk and create a swap partition on it for the crash dump. If this is not feasible, then adding DDB and KDB may give us a chance to catch the panic, and you can use the "trace" command at the ddb> prompt to obtain a simplified backtrace; there is a good chance that it would reveal what is happening. I have cc'ed Pawel, who is very knowledgeable in this area, and let's see whether he has some better suggestions :-)

Okay, I just added an old but working 2 gig disk to the system, made it swap, ran swapon on it and:

[EMAIL PROTECTED]:~$ dumpon -v /dev/ad0s1b
kernel dumps on /dev/ad0s1b

Great! :) So, let's see when/if it dies next time... Before I took it down for the dump disk, it had been running fine for 1d 1h (since the boot after the crash), however probably not as loaded as the day i
Re: Page fault, GEOM problem??
On 18 nov 2005, at 23.39, Michal Mertl wrote:

> Johan Ström wrote:
>> Hi!
>> On 18 nov 2005, at 18.43, Xin LI wrote:
>>> Hi, Johan,
>> <large snip>
>> So, it seems it does run savecore after running dumpon and mounting disks etc... Is that wrong?
>
> No, this is normal. When you run savecore you need to have mounted filesystems. In order to mount the filesystems they may have to be checked. The fsck program requires a large amount of memory to check larger filesystems, so the swap has to be enabled. Core dumps are written to the dump device (swap) from the end whereas the swap is normally used from the beginning (or the other way around). Therefore there's quite a big chance that, even when the swap has to be used for fsck, the core dump is intact and usable.
>
> If the usage of the swap by fsck corrupts the core dump, you may start in single user mode after the next crash and run the commands manually (without enabling swap).
>
> As to why you can write kernel core dumps only to certain devices: at the time when the kernel is dumping core it is usually in a pretty bad state, kernel internals may be corrupted and so on. The dumping code is therefore written to be quite low level so that even a wedged kernel can be dumped. The dumping code is part of the hard disk controller's driver. gmirror is quite a high-level device and GEOM itself needs a working scheduler, so there will probably never be a way to dump on gmirror-provided swap.
>
> When you issue the dumpon command, a check is performed whether the driver for the disk you want to dump on supports kernel core dumps.
>
> Michal

Well, that makes sense... Then that is right at least.. :)

I just noticed another thing... My disk performance... sucks! :P

Some examples (from an otherwise unloaded system):

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=1024 count=100
100+0 records in
100+0 records out
102400 bytes transferred in 77.014797 secs (13296146 bytes/sec)

real    1m17.100s
user    0m0.244s
sys     0m10.140s

13MB/s from /dev/zero?? This was to my home dir (gm0s1f, last label on the slice/disk).. When I try to open a new window in screen (ctrl-a-c) it takes forever (or rather, bash takes forever) to init while the above dd is running...

Well, iostat during dd:

[EMAIL PROTECTED]:~$ iostat
      tty             ad0              ad6             ad10             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0  164  2.19   0  0.00  50.52   3  0.17  50.99   3  0.17   1  0  1  1 97

0.17MB/s?? Am I misreading these iostats or something?.. Load averages directly after the dd completed are 0.36, 0.15, 0.05, so the dd doesn't cause enough load to make bash that slow... Gotta be something else...

Running diskinfo -t gives me good values (for /dev/ad6 and /dev/ad10):

Transfer rates:
        outside:       102400 kbytes in   1.846578 sec =    55454 kbytes/sec
        middle:        102400 kbytes in   1.879855 sec =    54472 kbytes/sec
        inside:        102400 kbytes in   3.147158 sec =    32537 kbytes/sec

So it shouldn't be the disk itself.. those values are the same as when I had the disk in the "temp" system.. However, I never tried any dd speed tests there.

Btw, I tried a regular cp on a dir tree of some gigs, same slooow speed..

Maybe my custom kernel is fucked up or something? It's just a GENERIC with some unused device drivers removed, so it would be strange... I'll recompile during the night and test GENERIC tomorrow, reporting back..

I did try moving the cards (network/vga/sata) around in the PCI slots, in case there were any strange conflicts... No difference, except I only got one txerror from xl since last boot (wooh!)

No crash so far.

--
Johan
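Michal's remark in the quoted part of this mail, that the dump is written from the end of the dump device while swap fills up from the start, can be put into rough numbers. The sketch below only illustrates that simplified model (he notes himself that it may be the other way around). The function name and the example sizes are made up for this sketch; they are not taken from the thread or from the kernel.

```python
def dump_survives(swap_mb, dump_mb, swap_used_mb):
    """Simplified model (assumption): the dump occupies the last dump_mb of
    the device and swap usage grows from offset 0. The dump stays intact as
    long as the two regions do not meet."""
    if dump_mb > swap_mb:
        raise ValueError("dump does not fit on the device")
    dump_start = swap_mb - dump_mb
    return swap_used_mb <= dump_start


# Hypothetical numbers: 2048 MB of swap holding a 512 MB dump.
print(dump_survives(2048, 512, 300))   # fsck pushed 300 MB into swap
print(dump_survives(2048, 512, 1700))  # fsck pushed 1700 MB into swap
```

Under this model the dump is only at risk once swap usage during fsck reaches into the last part of the partition, which matches the "quite a big chance" wording above.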
Re: Page fault, GEOM problem??
On 19 nov 2005, at 02.35, Pawel Jakub Dawidek wrote:

> On Sat, Nov 19, 2005 at 01:55:57AM +0100, Johan Ström wrote:
> +> I just noticed another thing... My disk performance... sucks! :P
> +>
> +> Some examples (from an otherwise unloaded system):
> +>
> +> [EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=1024 count=100
> +> 100+0 records in
> +> 100+0 records out
> +> 102400 bytes transferred in 77.014797 secs (13296146 bytes/sec)
>
> You won't get more with such a small block size. Try bs=128k.

Hi

Can't say that a bigger block size did much better..

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 98.519181 secs (13304211 bytes/sec)

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=512k count=1
^C3587+0 records in
3587+0 records out
1880621056 bytes transferred in 145.049578 secs (12965367 bytes/sec)

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=50k count=1
1+0 records in
1+0 records out
51200 bytes transferred in 38.536217 secs (13286203 bytes/sec)

All this time, iostat's MB/s column wouldn't go over 0.24MB/s...

Back on GENERIC:

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 99.497358 secs (13173415 bytes/sec)

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 39.019239 secs (13436654 bytes/sec)

Still slow.. However, iostat goes up as high as 5.64MB/s on each disk in the mirror.

> --
> Pawel Jakub Dawidek http://www.wheel.pl
> [EMAIL PROTECTED] http://www.FreeBSD.org
> FreeBSD committer Am I Evil? Yes, I Am!
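To compare the runs in this mail at a glance, a small script can pull the rate out of each dd summary line. This is only a sketch: the regular expression assumes the "N bytes transferred in S secs (R bytes/sec)" wording shown here, and only the reported rate is used, because some of the byte counts do not agree with the reported rate and elapsed time (digits were presumably lost in the archive).

```python
import re

# Summary lines copied from the dd runs in this mail.
DD_OUTPUT = """\
131072 bytes transferred in 98.519181 secs (13304211 bytes/sec)
1880621056 bytes transferred in 145.049578 secs (12965367 bytes/sec)
51200 bytes transferred in 38.536217 secs (13286203 bytes/sec)
131072 bytes transferred in 99.497358 secs (13173415 bytes/sec)
524288000 bytes transferred in 39.019239 secs (13436654 bytes/sec)
"""

# Capture the elapsed seconds and the rate dd printed in parentheses.
PATTERN = re.compile(r"transferred in ([\d.]+) secs \((\d+) bytes/sec\)")

rates = [int(m.group(2)) for m in PATTERN.finditer(DD_OUTPUT)]
spread = max(rates) / min(rates)

for rate in rates:
    print(f"{rate / 1e6:.1f} MB/s")
print(f"fastest/slowest = {spread:.3f}")
```

All five rates land within a few percent of each other, which suggests that block size is not the limiting factor on this machine.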
Re: Page fault, GEOM problem??
On 19 nov 2005, at 00.30, Michal Mertl wrote:

> Parv wrote:
>> in message <[EMAIL PROTECTED]>, wrote Michal Mertl thusly...
>>> Johan Ström wrote:
>>>> On 18 nov 2005, at 18.43, Xin LI wrote:
>>>> ...
>>>> So, it seems it does run savecore after running dumpon and mounting disks etc... Is that wrong?
>>>
>>> No, this is normal. When you run savecore you need to have mounted filesystems. In order to mount the filesystems they may have to be checked. The fsck program requires a large amount of memory to check larger filesystems, so the swap has to be enabled. Core dumps are written to the dump device (swap) from the end whereas the swap is normally used from the beginning (or the other way around). Therefore there's quite a big chance that, even when the swap has to be used for fsck, the core dump is intact and usable.
>>
>> Is there any formula to calculate the size of swap to account for fsck & core dump while assigning swap size (short of having two swap partitions)?
>
> None that I know of. Someone posted some figures about fsck's memory consumption to a FreeBSD mailing list. I really don't remember, but I think it was something like a few MB of memory per quite a lot of GB of file system space. E.g. fsck on "normally" sized file systems (at most a couple of hundred GB) doesn't normally consume all of a "normally" sized memory (>=256MB) and thus doesn't need to swap.
>
>>> If the usage of the swap by fsck corrupts the core dump, you may start in single user mode after the next crash and run the commands manually (without enabling swap).
>>
>> Is that after kernel (re)boots? And would the commands to be executed be savecore followed by swapon?
>
> If the dump got corrupted by fsck, you would have to wait for another crash and dump. Then you would reboot and start in single user mode, repair the file systems without swap enabled (fsck would crash on the large file system(s)) and then run savecore. Swapon is then irrelevant; you probably don't need swap for savecore. After running savecore you can continue to multi user normally (exit from the single user shell).
>
> I didn't try all of that but I believe it should work.
>
> Michal

I just got another coredump, hadn't had one since the first one. From messages:

Nov 29 20:36:54 elfi kernel: subdisk10: detached
Nov 29 20:36:54 elfi kernel: ad10: detached
Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=426562704
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad10s1 disconnected.
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134373376, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134389760, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134438912, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268591104, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268607488, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268623872, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5966307328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5967650816, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5968355328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5968584704, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5969715200, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5971795968, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5972697088, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063848960, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063865344, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063881728, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063914496, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16064324096, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16064340480, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[
Re: Page fault, GEOM problem??
On 29 nov 2005, at 21.10, Johan Ström wrote:

> I just got another coredump, hadn't had one since the first one. From messages:
>
> Nov 29 20:36:54 elfi kernel: subdisk10: detached
> Nov 29 20:36:54 elfi kernel: ad10: detached
> Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=426562704
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad10s1 disconnected.
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134373376, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134389760, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134438912, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268591104, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268607488, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268623872, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5966307328, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5967650816, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5968355328, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5968584704, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5969715200, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5971795968, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=5972697088, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063848960, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063865344, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063881728, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16063914496, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16064324096, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16064340480, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16064373248, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=16064471552, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18761523712, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18762850816, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18762867200, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18762883584, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18762899968, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18762949120, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18762965504, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18846032384, length=131072)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18846228992, length=131072)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18846441984, length=131072)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18846638592, length=131072)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=20110369280, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=2011168, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=20111696384, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=21073961472, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=21073977856, length=16384)]
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=21844845056, length=
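With this many near-identical lines it can help to summarize them. A rough Python sketch follows. The regular expression is written against the message format seen in this log only, and `summarize` is a hypothetical helper name for this sketch. Error 6 is ENXIO ("Device not configured" on FreeBSD), which fits the provider having been detached just before the writes failed.

```python
import errno
import re
from collections import Counter

# Matches one "Request failed" entry as it appears in the log above.
LINE_RE = re.compile(
    r"GEOM_MIRROR: Request failed \(error=(\d+)\)\. "
    r"(\w+)\[(READ|WRITE)\(offset=(\d+), length=(\d+)\)\]"
)


def summarize(log_text):
    """Count failed requests per (provider, operation, errno name) and add
    up the number of bytes those requests covered."""
    counts = Counter()
    total_bytes = 0
    for m in LINE_RE.finditer(log_text):
        err, provider, op, _offset, length = m.groups()
        name = errno.errorcode.get(int(err), err)
        counts[(provider, op, name)] += 1
        total_bytes += int(length)
    return counts, total_bytes


# Two entries copied from the log above.
SAMPLE = """\
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=18846032384, length=131072)]
"""

counts, total_bytes = summarize(SAMPLE)
print(counts, total_bytes)
```

Feeding the whole of /var/log/messages to `summarize` would show whether anything other than writes to ad10s1 failed; truncated entries like the last one above are simply skipped by the pattern.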
Re: Page fault, GEOM problem??
On 22 jan 2006, at 22.58, Michael S. Eubanks wrote:

> ...snip...
>> Can there be problems with the mobo/controller card? Or is it more likely to be driver related? Promise lists my motherboard (asus a7v333) in their manual for the controller card (promise sataII 150 TX4).
> ...snip...
>
> After looking at the dmesg output, I am curious whether you are using the promise sataII 150 TX4 controller for the raid disks? I see you are using 6.0-RELEASE whereas I'm using 5.4-STABLE with that particular controller. My dmesg output for the disk array looks like the following:

Hi! Thanks for the response!

Yes, this is a Promise SATAII 150 TX4 controller.. But afaik it doesn't do RAID??

> ad4: 238475MB [484521/16/63] at ata2-master SATA150
> ad6: 238475MB [484521/16/63] at ata3-master SATA150
> ad8: 238475MB [484521/16/63] at ata4-master SATA150
> ad10: 238475MB [484521/16/63] at ata5-master SATA150
> ar0: 953900MB [65535/255/63] status: READY subdisks:
> disk0 READY on ad4 at ata2-master
> disk1 READY on ad6 at ata3-master
> disk2 READY on ad8 at ata4-master
> disk3 READY on ad10 at ata5-master
>
> The device I mount as my raid filesystem is ar0s1 and I believe it corresponds to ``device ataraid'' in the kernel. I read the raid mirroring page in the handbook, although I'm thinking your controller should represent each disk as ``ar0'' and handle the mirroring itself (possibly consisting of two sets of two disks). I really don't know though.

No /dev/ar*..

> It looks like the RAID1 mirroring tutorial is for systems that don't actually have a raid controller. Hence, the RAID0 tutorial is the one that I would be using if I did not use the promise controller. Because I _DO_ use the controller, I am simply able to manipulate the ar0 disk array as a single disk. I imagine your setup will differ, but I hope this helps.

Afaik this card doesn't have any RAID functionality (I've never read anything about it either, on the web, on the card's box or anywhere else..). I'm running GENERIC, which does include ataraid..

What does your dmesg identify your card as?

atapci0: port 0xb800-0xb87f, 0xb400-0xb4ff mem 0xfb80-0xfb800fff,0xfb00-0xfb01 irq 19 at device 12.0 on pci0

Is it the same PDC chipset?

--
Johan

> -Michael
Re: Page fault, GEOM problem??
On 23 jan 2006, at 01.17, Michael S. Eubanks wrote:

> On Sun, 2006-01-22 at 23:51 +0100, Johan Ström wrote:
> ...snip...
>> On 22 jan 2006, at 22.58, Michael S. Eubanks wrote:
>> This card does afaik dont have raid functionalitys (I've never read anything about it either on the web, the cards box or anywhere else..). I'm running GENERIC, which does include ataraid..
>> What does your dmesg identify your card as?
>> atapci0: port 0xb800-0xb87f, 0xb400-0xb4ff mem 0xfb80-0xfb800fff,0xfb00-0xfb01 irq 19 at device 12.0 on pci0
>> Is it the same PDC chipset?
>> --
>> Johan
>
> No, I have a different controller. My mistake. I think what is happening is that the DMA read command is failing, which causes the device to be disconnected, and the kernel can't write to the disk from that point on (this is somewhat obvious given the output below).
>
> Nov 29 20:36:54 elfi kernel: subdisk10: detached
> Nov 29 20:36:54 elfi kernel: ad10: detached
> Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=426562704
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad10s1 disconnected.
>
> The message in the last line above is generated in any of the following scenarios (from g_mirror.c):
>
> 1. Device wasn't running yet, but disk disappear.
> 2. Disk was active and disapppear.
> 3. Disk disappear during synchronization process.
>
> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
>
> As for recovering the disk, I remember seeing something in a previous post about booting to single user mode and using fsck after a core dump. I'm assuming the disks worked initially and that you were able to label them etc? Is there any possibility that the disk state may be altered by a power saving feature or setting in the BIOS, and FreeBSD just doesn't know it happened until the next time it tries to access the disk?

For recovering, I've always done a direct reboot; gmirror rebuilds the mirror and fsck is run. No problems reading labels etc, and there never have been; the only problem has been these sporadic crashes.. And the read/write performance (see earlier in the thread)...

This is a server, so all BIOS settings for power saving are (should be) shut off. The BIOS should thus never make the disk go to sleep.

> -Michael

Thanks for trying to help!

--
Johan
Re: Page fault, GEOM problem??
On 23 jan 2006, at 09.53, Michael S. Eubanks wrote:

> On Mon, 2006-01-23 at 06:43 +0100, Johan Ström wrote:
>
> Wish I could be of more help. :) Have you tried to toggle the sysctl dma flags? I've seen similar posts in the past with read timeouts caused by dma being enabled.
>
> # sysctl -a | grep dma
> ...
> hw.ata.ata_dma: 1 <=== Try turning this one off (1 ==> 0).
> hw.ata.atapi_dma: 1
> ...

Disabling DMA, wouldn't that give me pretty bad performance?

> -Michael
Re: Page fault, GEOM problem?? (also: using an ASUS A7N8X-XE/nForce2 Ultra 400?)
On 23 jan 2006, at 14.15, Michael S. Eubanks wrote:
> On Mon, 2006-01-23 at 10:24 +0100, Johan Ström wrote:
>> On 23 jan 2006, at 09.53, Michael S. Eubanks wrote:
>>> On Mon, 2006-01-23 at 06:43 +0100, Johan Ström wrote:
>>>
>>> Wish I could be of more help. :) Have you tried to toggle the sysctl DMA flags? I've seen similar posts in the past with read timeouts caused by DMA being enabled.
>>>
>>> # sysctl -a | grep dma
>>> ...
>>> hw.ata.ata_dma: 1 <=== Try turning this one off (1 ==> 0).
>>> hw.ata.atapi_dma: 1
>>> ...
>>
>> Disabling DMA, wouldn't that give me pretty bad performance?
>>
>>> -Michael
>
> If it was not the problem, you could always change it back. It *should* be possible to simply set the control mode on those two disks (``man rc.early'', ``man atacontrol''). Unfortunately, the problem is noted as errata in several FreeBSD versions, tending to appear on SATA disks. I believe this is also a problem with some Linux setups. If you google ``FreeBSD hw.ata.ata_dma RELEASE'' you will eventually find the following page relating to Asus motherboards:
>
> http://www.ryxi.com/freebsd/63-668-write-dma-other-similar-errors-read.shtml
>
> I picked it out based on the following line in the dmesg output:
>
> Nov 29 20:46:09 elfi kernel: ACPI APIC Table:
>
> I'd say it's worth a shot. You might even try turning both the flags off temporarily to see what you get. Your guess is as good as mine. :)

Okay, tried turning it off. The disk I/O speeds went even lower: a whopping 9-10MB/s and lots of load ;) And since the crashes come randomly (I haven't been able to reproduce them on demand), I don't really want to run it like this ;)

I did another test. I moved the controller card and the disks to my MSI K8N Neo motherboard (with AMD64 3200+ etc), and immediately I got write speeds of ~49MB/s:

$ dd if=/dev/zero of=bigfile.zero bs=1024 count=124
1024024576 bytes transferred in 21.974227 secs (46601164 bytes/sec)

Compared to

$ dd if=/dev/zero of=bigfile.zero bs=1024 count=124
1024024576 bytes transferred in 78.897708 secs (12979142 bytes/sec)

All tests were done in /dev/mirror/gm0s1f on /usr (ufs, NFS exported, local, soft-updates, acls).

So I guess this mobo is just plain fucked and needs to be replaced with something newer ;) The bad thing is that this is Socket A, so there aren't many choices left in the mobo market. However, I found an ASUS A7N8X-XE NF ULTRA 400 SOCKET A with Nforce2 Ultra 400 chipset. Does anyone have any knowledge about this chipset? How well does it work with FreeBSD? I'll do some googling, but if someone is using this successfully or unsuccessfully, please let me know :)

-- Johan
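The two dd summary lines in the message above can be compared directly. What follows is only a small sketch in Python, assuming the FreeBSD dd summary format exactly as quoted; the helper name `parse_dd_summary` is made up for illustration and is not part of any tool mentioned in the thread.

```python
import re

# Matches the dd summary line quoted in the message, e.g.
# "1024024576 bytes transferred in 21.974227 secs (46601164 bytes/sec)"
DD_SUMMARY = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs")


def parse_dd_summary(line):
    """Return (bytes, seconds, MB/s) for one dd summary line (hypothetical helper)."""
    m = DD_SUMMARY.search(line)
    if m is None:
        raise ValueError("not a dd summary line: %r" % line)
    nbytes = int(m.group(1))
    secs = float(m.group(2))
    # MB here means 10^6 bytes, matching dd's own bytes/sec figure.
    return nbytes, secs, nbytes / secs / 1e6


new_board = parse_dd_summary(
    "1024024576 bytes transferred in 21.974227 secs (46601164 bytes/sec)")
old_board = parse_dd_summary(
    "1024024576 bytes transferred in 78.897708 secs (12979142 bytes/sec)")
speedup = new_board[2] / old_board[2]
print("new: %.1f MB/s, old: %.1f MB/s, speedup: %.1fx"
      % (new_board[2], old_board[2], speedup))
```

With the figures from the message this works out to roughly 46.6 MB/s against 13.0 MB/s, so the same controller and disks ran about 3.6 times faster on the other motherboard.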
Re: Page fault, GEOM problem??
On 23 jan 2006, at 20.16, Paul T. Root wrote:
> My friend's disks are SATA. The jumper was to force the drives to use the SATA 1.x 1.5 gig standard instead of the faster SATA 2.x standard. Older cards can have trouble recognizing newer disks. His were recognized, but very flaky. They've been solid since.

These disks should be SATA150 afaik (Maxtor MaXLine III 300Gb). The Promise card is named SATAII 150, so there shouldn't be any mismatch. Both card and disks support NCQ. I don't know about FreeBSD on the other hand; I haven't found a way to enable/disable this.

> Johan Ström wrote:
>> On 23 jan 2006, at 15.29, Paul T. Root wrote:
>>> I'm coming in very late here, and only have some hearsay. But, a friend of mine has built a new hobby machine, with twin 160G drives on a 3Ware 8006, working as a stripe. He had a bunch of problems with stability of the drives until I gave him a couple of tiny (half size) jumpers, that he put on the drive. Smooth sailing since then. If needed, I can find what the jumpers did. But looking through the controller's doco should give you a clue.
>>
>> As far as I know, SATA drives don't have jumpers. Mine don't seem to, at least. There are two unused pins but I doubt they are for jumpers.
>
> -- Paul Root "Few people know what to do when hula girls attack." - Sam, age 8
Re: Page fault, GEOM problem?? (also: using a ASUS A7N8X-XE/nForce2 utlra400?)
On 23 jan 2006, at 20.01, Johan Ström wrote:
> [...]
> However, I found an ASUS A7N8X-XE NF ULTRA 400 SOCKET A with Nforce2 Ultra 400 chipset. Does anyone have any knowledge about this chipset? How well does it work with FreeBSD? I'll do some googling, but if someone is using this successfully or unsuccessfully, please let me know :)

Got the board now, and everything seems to work great: fine transfer speeds and no crashes so far (1 day..). Let's hope this thread ends here :)

-- Johan
Re: SCSI device timeout
On 1 feb 2006, at 11.42, Holm Tiffe wrote:
> Johan Ström wrote:
>> On 1 feb 2006, at 10.57, Holm Tiffe wrote:
>>> Derkjan de Haan wrote:
>>>> All,
>>>> Today, after a cvsup (RELENG_6) and a rebuild of kernel and world, my system no longer boots. It hangs on
>>>>
>>>> Waiting 5 seconds for SCSI devices to settle
>>>>
>>>> Booting from the previous kernel allows my system to boot again. Please let me know if I can do anything to diagnose further.
>>>> regards, Derkjan de Haan
>>>
>>> I have exactly the same problem here on an ASUS A7V333 motherboard and an Adaptec 3960D SCSI controller. The problem seems to be in the ACPI interrupt routing. I've updated the mainboard BIOS to the last available version in the meantime (1018.004 Beta) with no luck. Disabling ACPI completely helps booting the machine again.
>>
>> Hi
>> I got one of those motherboards, however with no SCSI card but a Promise. I've had huge problems with it (check out the "Page fault, GEOM problem??" thread). The problems I had were random crashes and very bad speed to the disks. It was solved by replacing the mobo with a new one with an nforce2 chipset. Got great speeds now and haven't had a crash since I installed it (roughly a week now).
>> Johan Ström [EMAIL PROTECTED] http://www.stromnet.org/
>
> No Johan, my A7V333 has no problem, it has been running for around 2 years now as my personal workstation here at work 24/7. I've cvsupped RELENG_6 again for an hour or so and the newly built kernel runs flawlessly. There are some new patches in the pci code.
> Regards, Holm
> -- L&P::Kommunikation GbR Holm Tiffe * Administration, Development FreibergNet.de Internet Systems phone +49 3731 419010 Bereich Server & Technik fax +49 3731 4196026 D-09599 Freiberg * Am St. Niclas Schacht 13 http://www.freibergnet.de

Hi, yes, I've been running it for around 2-3 years too, but with Linux. A couple of months ago I switched to FreeBSD and problems began to occur. It might not be the same problem, however.
Re: gmirror/disk problems!
On 10 feb 2006, at 07.15, Johan Ström wrote:
> Hi list!
> I've been experiencing problems earlier with gmirror (thread "Page fault, GEOM problem??"). My gmirror crashed, and the box completely froze. Now I got a new mobo, and it has been working great since (no crashes, and I get a decent 40-50MB/s read/write instead of ~10-20). This morning I woke up to this:
>
> subdisk4: detached
> ad4: detached
> unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=187595536
> unknown: timeout waiting to issue command
> unknown: error issueing READ_DMA command
> GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
> GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=134373376, length=16384)]
> GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=134438912, length=16384)]
> GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=268591104, length=16384)]
> GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=268607488, length=16384)]
> GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=268656640, length=16384)]
> GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=5966399488, length=2048)]
> GEOM_MIRROR: Request failed (error=5). ad4s1[READ (offset=96048882176, length=32768)]
>
> Just like "old times"... However, no page faults! Yay. But what is going on here? Why does the ATA controller (or whatever it is) think it needs to detach my disk? And how do I reattach it? I have tried some stuff with atacontrol:
>
> $ atacontrol list
> ATA channel 0:
>     Master: acd0 ATA/ATAPI revision 0
>     Slave: no device present
> ATA channel 1:
>     Master: no device present
>     Slave: no device present
> ATA channel 2:
>     Master: no device present
>     Slave: no device present
> ATA channel 3:
>     Master: ad6 Serial ATA v1.0
>     Slave: no device present
> $ atacontrol attach ata2
> atacontrol: ioctl(IOCATAATTACH): File exists
> $ atacontrol reinit ata2
> < here i get a long system wide block>
> Master: no device present
> Slave: no device present
> $
>
> Okay, so no luck reinitializing it. I don't really want to reboot the box each time this might happen. But I'm happy that it doesn't crash totally anymore, heh.
>
> dmesg of current system:
>
> Feb 2 19:39:09 elfi syslogd: kernel boot file is /boot/kernel/kernel
> Feb 2 19:39:09 elfi kernel: Copyright (c) 1992-2005 The FreeBSD Project.
> Feb 2 19:39:09 elfi kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> Feb 2 19:39:09 elfi kernel: The Regents of the University of California. All rights reserved.
> Feb 2 19:39:09 elfi kernel: FreeBSD 6.0-RELEASE #2: Thu Dec 1 20:18:30 CET 2005
> Feb 2 19:39:09 elfi kernel: [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
> Feb 2 19:39:09 elfi kernel: ACPI APIC Table:
> Feb 2 19:39:09 elfi kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
> Feb 2 19:39:09 elfi kernel: CPU: AMD Athlon(tm) XP (1200.01-MHz 686-class CPU)
> Feb 2 19:39:09 elfi kernel: Origin = "AuthenticAMD" Id = 0x662 Stepping = 2
> Feb 2 19:39:09 elfi kernel: Features=0x383fbffMCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
> Feb 2 19:39:09 elfi kernel: AMD Features=0xc04808003DNow+,3DNow>
> Feb 2 19:39:09 elfi kernel: real memory = 536674304 (511 MB)
> Feb 2 19:39:09 elfi kernel: avail memory = 515833856 (491 MB)
> Feb 2 19:39:09 elfi kernel: ioapic0 irqs 0-23 on motherboard
> Feb 2 19:39:09 elfi kernel: npx0: [FAST]
> Feb 2 19:39:09 elfi kernel: npx0: on motherboard
> Feb 2 19:39:09 elfi kernel: npx0: INT 16 interface
> Feb 2 19:39:09 elfi kernel: acpi0: on motherboard
> Feb 2 19:39:09 elfi kernel: acpi0: Power Button (fixed)
> Feb 2 19:39:09 elfi kernel: pci_link0: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link1: irq 5 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link2: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link3: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link4: irq 11 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link5: irq 5 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link6: irq 5 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link7: irq 3 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link8: irq 5 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link9: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link10: irq 11 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link11: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link12: irq 5 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link13: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link14: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link15: irq 10 on acpi0
> Feb 2 19:39:09 elfi kernel: pci_link16: irq 0 on acpi0
> Feb 2 19:39:09 elfi kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> Feb 2 19:39:09 elfi kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
> Feb 2 19:39:09 elfi kernel: cpu0: on acpi0
> Feb 2 19:39:09 elfi kernel: acpi_throttle0: on cpu0
> Feb 2 19:39:09 elfi kernel: pcib0: port
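For readers trying to make sense of the "Request failed" lines quoted in this thread, a short Python sketch that tallies them per provider, operation and error code may help. It assumes the log format exactly as it appears in these messages; `summarize_failures` is a hypothetical helper, and the two errno names are the FreeBSD values for 5 and 6 (EIO, an I/O error, and ENXIO, "Device not configured", which fits a disk that has been detached).

```python
import re
from collections import Counter

# errno values as defined in FreeBSD's <sys/errno.h>
ERRNO_NAMES = {5: "EIO", 6: "ENXIO"}

# Tolerates both spellings seen in the thread:
# "ad4s1[WRITE (offset=..." and "ad4s1[WRITE(offset=..."
FAILED = re.compile(
    r"GEOM_MIRROR: Request failed \(error=(\d+)\)\.\s*"
    r"(\w+)\[(READ|WRITE)\s*\(offset=(\d+), length=(\d+)\)\]"
)


def summarize_failures(log_text):
    """Count failed requests per (provider, operation, errno name)."""
    counts = Counter()
    for err, provider, op, _offset, _length in FAILED.findall(log_text):
        name = ERRNO_NAMES.get(int(err), "errno %s" % err)
        counts[(provider, op, name)] += 1
    return counts


# A few lines copied from the message above.
sample = """\
GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=134373376, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE (offset=134438912, length=16384)]
GEOM_MIRROR: Request failed (error=5). ad4s1[READ (offset=96048882176, length=32768)]
"""

summary = summarize_failures(sample)
for key, n in sorted(summary.items()):
    print(key, n)
```

In the sample, the writes fail with ENXIO (the device is already gone) and the single read fails with EIO, which is consistent with the disk having dropped off the bus rather than with bad sectors, though the log alone cannot prove that.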
Re: gmirror/disk problems!
On 10 feb 2006, at 07.43, Johan Ström wrote:
> On 10 feb 2006, at 07.15, Johan Ström wrote:
>> [...]
Re: gmirror/disk problems!
On 10 feb 2006, at 07.15, Johan Ström wrote:
> Hi list!
> I've been experiencing problems earlier with gmirror (thread "Page fault, GEOM problem??"). My gmirror crashed, and the box completely froze. Now I got a new mobo, and it has been working great since (no crashes, and I get a decent 40-50MB/s read/write instead of ~10-20). This morning I woke up to this:
> ...
> I could try to move the disks to the Promise SATA2 TX4 card I bought for the old mobo (which didn't have SATA)... But I'd rather find the problem ;) Hope someone can help.
> Thanks
> Johan

And now it happened again:

Feb 26 00:13:27 elfi kernel: subdisk4: detached
Feb 26 00:13:27 elfi kernel: ad4: detached
Feb 26 00:13:27 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=11660623
Feb 26 00:13:27 elfi kernel: unknown: timeout waiting to issue command
Feb 26 00:13:27 elfi kernel: unknown: error issueing READ_DMA command
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=5). ad4s1[READ(offset=5970206720, length=16384)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5974401024, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5976973312, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5977153536, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5977333760, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5977530368, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5977710592, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5977907200, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5978087424, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE(offset=5978939392, length=114688)]

And then on reboot:

Feb 26 20:17:53 elfi kernel: ad4: 286188MB at ata2-master SATA150
Feb 26 20:17:53 elfi kernel: ad6: 286188MB at ata3-master SATA150
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1 created (id=4118114647).
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
Feb 26 20:17:53 elfi kernel: Root mount waiting for: GMIRROR
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.

It is rebuilding currently. This problem has occurred many times now. Does anyone know why this happens? Is there some bug somewhere that needs to be hunted down? In GEOM? In the ata driver? This needs to be solved.

Johan
GEOM problems again...
Hi

I've had problems before with GEOM mirror and my SATA drives, and I've posted about it here before too. The solution seemed to be a change of motherboard; this reduced the crashes very much (and the speeds achieved also improved greatly, from 10-15MB/s to 40-50MB/s). After the change I had one or two crashes, but since then it has been running for well over 50-60 days without any problems.

Then, 11 days ago, I upgraded to 6.1... And now I got these "crashes" again (that is, the mirror has crashed; the system still runs fine):

May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
May 21 02:04:58 elfi kernel: subdisk6: detached
May 21 02:04:58 elfi kernel: ad6: detached
May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=11006308352, length=2048)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=164847927296, length=131072)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=256680296448, length=32768)]error = 6

Some info about the controller and disks:

May 9 22:46:52 elfi kernel: ata1: on atapci0
May 9 22:46:52 elfi kernel: atapci1: controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0
May 9 22:46:52 elfi kernel: ad4: 286188MB at ata2-master SATA150
May 9 22:46:52 elfi kernel: ad6: 286188MB at ata3-master SATA150
May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created (id=4118114647).
May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 activated.
May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
May 9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/mirror/gm0s1a

Anyone got any new clues? Afaik the disks should be working fine (they are 6 months old and this same problem has occurred multiple times...)

Hope to solve this ;)

Thanks
Johan
maxproc limit exceeded by uid 0
Hi

Today I woke up and was not able to log in to my system (ssh). Some stuff worked (DNS for example, this box runs bind), although the IMAP server didn't work too well. Anyway, I checked the local console:

maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).

Repeated 23 times... I was not able to do anything, neither locally nor remotely. ACPI didn't work very well either, except for giving me

acpi: suspend request ignored (not ready yet)

the second time I pressed the power button. So a hard reboot it was.

Anyway, I'm using the default login.conf, which has unlimited for all resource limits. So wtf is this? As far as I know there shouldn't be any processes running away, but you never know. The only candidates would be a "umount -f /some/nfs" and a "df -h" that were running (the umount as root), but both hung since the NFS volume was unreachable, so why would this fork like this?

I don't know what more info could be useful; there isn't much more in the logs.

Ström
Re: maxproc limit exceeded by uid 0
On 22 jun 2006, at 09.57, [LoN]Kamikaze wrote:
> Johan Ström wrote:
>> Anyway.. I'm using default login.conf, which have unlimited for all resource limits.. So wtf is this?
>
> Look at
> # sysctl kern.maxproc

Okay, 4096 procs... But what were those 4k procs? On my newly booted system I got 127. Well, I guess there is no way to find out now.
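While the box is still responsive, one way to see what is filling the process table is to tally `ps` output per user and command. The Python below is only a sketch: `count_by_user_and_command` is a hypothetical helper, it assumes two-column output such as `ps -axo user,comm` produces, and the sample listing is invented for illustration, not taken from the affected machine.

```python
from collections import Counter


def count_by_user_and_command(ps_output):
    """Tally processes per (user, command) from two-column ps output."""
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the "USER COMMAND" header row
        parts = line.split(None, 1)
        if len(parts) == 2:
            counts[(parts[0], parts[1].strip())] += 1
    return counts


# Invented sample in the shape of `ps -axo user,comm` output.
sample = """\
USER     COMMAND
root     init
root     umount
root     df
root     df
www      httpd
"""

counts = count_by_user_and_command(sample)
top = counts.most_common(1)
print(top)
```

On a live system the input could come from `subprocess.check_output(["ps", "-axo", "user,comm"], text=True)`, and running something like this from cron before the limit is reached would leave a record of which command was multiplying.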
Re: maxproc limit exceeded by uid 0
On 22 jun 2006, at 17.42, Dan Nelson wrote:
> If it ever happens again, you can drop to the debugger with Ctrl-Alt-ESC and run "ps" to get a list of running processes. You might even be able to recover by killing some offending processes with "kill 9 ", then continue with "c".

Hm, I tried this on a 6.1 GENERIC box just now, and Ctrl-Alt-ESC doesn't seem to give me any debugger. I suppose I have to recompile with DDB for this? Is this recommended for servers where I normally don't need DDB?

Johan
Re: maxproc limit exceeded by uid 0
On 22 jun 2006, at 18.45, Dan Nelson wrote:
> In the last episode (Jun 22), Johan Ström said:
>> On 22 jun 2006, at 17.42, Dan Nelson wrote:
>>> If it ever happens again, you can drop to the debugger with Ctrl-Alt-ESC and run "ps" to get a list of running processes. You might even be able to recover by killing some offending processes with "kill 9 ", then continue with "c".
>>
>> Hm, I tried this on a 6.1 GENERIC box just now, ctrl-alt-esc doesnt seem to give me any debugger... I suppose I have to recompile with DDB for this? Is this recommended for servers where I normally dont need DDB?
>
> Right; DDB isn't in GENERIC. The problem with not including DDB on servers you don't think you'll need it on is: the one time you need it, it's not there :)

Very true ;) But are there any reasons NOT to have it on my servers?

-- Johan
Re: SNMP access to pf ALTQ data?
On 8 jul 2006, at 09.18, J. Buck Caldwell wrote:
> Forgive the cross-posting, but I think I need a wider audience. Is it possible to track pf ALTQ usage with MRTG? I notice that FreeBSD's built-in bsnmpd has a module and mibs to support pf, but I know too little about SNMP to figure out how to access the queue stats. Specifically, I'm looking to make a series of MRTG graphs that show the total bytes that pass through each queue. I figure if worst comes to worst, I can work out a separate program that parses the output of 'pfctl -vsq' and returns that as MRTG-readable input, but it would be much smoother to get it via SNMP, if it can be done.

I've got one of those, a small Python script which feeds the data into an rrd file:

https://svn.stromnet.org/repos/misc/trunk/rrd/pfque-rrd.py

It works fine; the only problem I have is that when I reload my rules (that is, reset the counters), the graph goes mad ;) Although, if there is some way to do this via SNMP instead, I would also like to know.

The above script uses tftp to move the rrd files to my graphing host. I call it from crontab every minute. For the graphing I use this:

https://svn.stromnet.org/repos/misc/trunk/rrd/pfque-graph.py

And the result looks like this:

http://stats.stromnet.org/router/details.php?file=pfqueue_out

If you look at the last month/year graphs, you see the problem with resetting the counters.

> Any help would be appreciated. I'm sure others would be interested in this as well.
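The counter-reset problem described above (graphs going "mad" after a ruleset reload) is commonly handled by treating a counter that moves backwards as an unknown sample instead of a huge delta. The following is a minimal sketch of that idea in Python, not the logic of the linked scripts; `byte_rate` is a hypothetical helper and the sample counters are invented.

```python
def byte_rate(prev, curr, interval_secs):
    """Bytes per second between two counter samples; None if the counter was reset."""
    if interval_secs <= 0:
        raise ValueError("interval must be positive")
    delta = curr - prev
    if delta < 0:
        # The counter went backwards: the ruleset was reloaded and the queue
        # counters restarted from zero, so this interval is unknown.
        return None
    return delta / interval_secs


# Invented per-minute byte counters with one reset in the middle.
samples = [1000, 61000, 121000, 500, 60500]
rates = [byte_rate(a, b, 60) for a, b in zip(samples, samples[1:])]
print(rates)
```

The interval that spans the reset comes out as `None` rather than as a large negative or wrapped value, so a graph drawn from these rates shows a gap instead of a spike. RRDtool's documentation describes a comparable setup, a DERIVE data source with a minimum of 0, which may be worth checking against the rrd file definition used here.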
Re: GEOM problems again...
On 21 May 2006, at 11.16, Johan Ström wrote:
> [...]
> Anyone got any new clues? Afaik the disks should be working fine (they are 6 months old and this same problem has occurred multiple times...)
> Hope to solve this ;)
> Thanks
> Johan

Here we go again:

Jul 7 16:20:09 elfi kernel: ad4: FAILURE - device detached
Jul 7 16:20:09 elfi kernel: subdisk4: detached
Jul 7 16:20:09 elfi kernel: ad4: detached
Jul 7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=88896847872, length=32768)]error = 6

However, there are no read timeouts etc. as before, just this. 18 days of uptime this time (I've rebooted for other reasons since my last mail). It always seems to be ad4 that is disconnecting. I'm going to do some disk tests on it, but I doubt they will show anything, since I've had similar problems from day one (I did tests at that time without problems) with this gmirror setup (new disks).

Johan
Re: GEOM problems again...
On 10 jul 2006, at 11.09, Johan Ström wrote:
> On 21 May 2006, at 11.16, Johan Ström wrote:
>> [...]
>
> Here we go again
> [...]
> It always seems to be ad4 that is disconnecting.. I'm going to do some disk tests on it but i doubt it will give anything since i've had similiar problems from day one (did tests at that time w/o problems) with this gmirror setup (new disks).
> Johan

Follow-up: I ran Maxtor's own test program over the disk, full-length test. Not a single problem. After the reboot the RAID is rebuilding fine:

GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.

As usual, it seems I cannot get the controller/driver to redetect the disk using atacontrol etc.

Johan
Re: GEOM problems again...
On 10 Jul 2006, at 13.59, Johan Ström wrote:

> [..]
>
> Followup: I ran over the disk with Maxtor's own test program, full length test. Not a single problem. After reboot the raid is rebuilding fine:
>
> GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.
>
> As usual it seems I cannot get the controller/driver to redetect the disk using atacontrol etc..
>
> Johan

And now again... raid gone degraded only 2 days after reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=120776474624, length=32768)]error = 6

$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386

Still no luck with atacontrol... Is there any way to debug this further?

I've tested the disk, the SATA cables are new... I've had similar problems with the other motherboard... I don't think this is related to hw problems, but rather a software problem that needs to be solved; this is not something one can call stable ;)

So, any pointers on how to enable more debugging, or anything that could give some clues?

Johan
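Since the question of which disk drops out keeps coming up, a small tally over the syslog lines can make the pattern easier to see. The following is only a sketch: the sample lines are copied from the logs quoted in this thread, while the function name `count_detaches` and the regular expression are illustrative choices, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Sample lines copied from the kernel logs quoted in this thread.
LOG = """\
May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
May 21 02:04:58 elfi kernel: subdisk6: detached
Jul 7 16:20:09 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
"""

# Matches only the ata(4) "device detached" failure line and captures the disk name.
DETACH = re.compile(r"kernel: (ad\d+): FAILURE - device detached")


def count_detaches(text):
    """Return a Counter mapping disk name -> number of detach events."""
    return Counter(m.group(1) for m in DETACH.finditer(text))


if __name__ == "__main__":
    for dev, n in count_detaches(LOG).most_common():
        print(dev, n)
```

Run against the full /var/log/messages history instead of the sample, the same tally would show whether the detaches really cluster on one disk or are spread across both.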
Re: ATA problems again ... (was: Re: GEOM problems again...)
On 13 Jul 2006, at 14.26, Robert Watson wrote:

> On Thu, 13 Jul 2006, Johan Ström wrote:
>
>> And now again... raid gone degraded only 2 days after reboot!
>>
>> Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
>> Jul 12 22:22:50 elfi kernel: subdisk4: detached
>> Jul 12 22:22:50 elfi kernel: ad4: detached
>> Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
>> Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=120776474624, length=32768)]error = 6
>>
>> $ uname -a
>> FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386
>>
>> Still no luck with atacontrol... Is there any way to debug this further?
>>
>> I've tested the disk, the SATA cables are new... I've had similar problems with the other motherboard... I don't think this is related to hw problems, but rather a software problem that needs to be solved; this is not something one can call stable ;)
>>
>> So, any pointers on how to enable more debugging, or anything that could give some clues?
>
> I don't have a whole lot to add to this thread, but have changed the subject to make sure that the right people are reading this. This is likely either a hardware problem (motherboard/cable/drive) or a driver problem. GEOM and the mirror driver seem to be behaving as desired (the mirror detaches a drive reported by the driver as being bad). Could you post the dmesg -v output for the probing of the ata controller and driver?

dmesg -v? I got the full dmesg from dmesg.boot (this has been posted earlier in this thread too):

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP (1200.01-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x662 Stepping = 2
Features=0x383fbffMCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
AMD Features=0xc0480800
real memory = 536674304 (511 MB)
avail memory = 515805184 (491 MB)
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xf800-0xfbff at device 0.0 on pci0
pci0: at device 0.1 (no driver attached)
pci0: at device 0.2 (no driver attached)
pci0: at device 0.3 (no driver attached)
pci0: at device 0.4 (no driver attached)
pci0: at device 0.5 (no driver attached)
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
ohci0: mem 0xfebfb000-0xfebfbfff irq 20 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: mem 0xfebfc000-0xfebfcfff irq 21 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
ehci0: mem 0xfebfdc00-0xfebfdcff irq 22 at device 2.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 8 ports with 8 removable, self powered
nve0: port 0xdc00-0xdc07 mem 0xfebfe000-0xfebfefff irq 20 at device 4.0 on pci0
nve0: Ethernet address 00:13:d4:bf:5b:79
miibus0: on nve0
rlphy0: on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
nve0: Ethernet address: 00:13:d4:bf:5b:79
pci0: at device 6.0 (no driver attached)
pcib1: at device 8.0 on pci0
pci2: on pcib1
pci2: at device 6.0 (no driver attached)
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xcc00-0xcc7f mem 0xfeafec00-0xfeafec7f irq 17 at device 9.0 on pci2
miibus1: on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus1
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:76:ef:c6:36
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 9.0 on pci0
ata0: on atapci0
ata1: on atapci0
atapci1: port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0
ata2: on atapci1
ata3: on atapci1
pcib2: at device 30.0 on pci0
pci1: on pcib2
acpi_button0: on acpi0
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
ppc0: port 0x378-0x37f,0x778
Re: ATA problems again ...
On 17 Jul 2006, at 00.53, Mike Tancsa wrote:

> At 03:02 PM 14/07/2006, Miroslav Lachman wrote:
>
>> After reboot (command reboot), system boot up with both disks attached and start autosynchronization. I do not know, if this is hw or sw error, I got
>
> Install the smartmontools from /usr/ports/sysutils/smartmontools/ and post the output of smartctl -a /dev/ad8

I tried this on my SATA disk ad6:

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor MaXLine III family
Device Model:     Maxtor 7L300S0
Serial Number:    L60CJKPH
Firmware Version: BANC1G10
User Capacity:    300,090,728,448 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Mon Jul 17 11:54:35 2006 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART Error Log Version: 1
No Errors Logged

Any other output from smartctl that can help? Both my ad4 and ad6 are the same.

Now I had yet another crash.. They have become much more frequent in the last few days... Two crashes ago I changed the SATA cable to ad4; I wonder if that had anything to do with it... On the other hand, this time it was ad6 that got lost, so why would ad4's cable make any difference.. I'll change ad6's cable too when I've taken the box down..

Jul 16 03:27:25 elfi kernel: ad6: FAILURE - device detached
Jul 16 03:27:25 elfi kernel: subdisk6: detached
Jul 16 03:27:25 elfi kernel: ad6: detached
Jul 16 03:27:25 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.
Jul 16 03:27:25 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=27210082304, length=2048)]error = 6
Jul 16 03:27:25 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=35153985536, length=32768)]error = 6
Jul 16 03:27:25 elfi kernel: ufs_access(): Error retrieving ACL on object (6).
Jul 16 03:27:25 labdator kernel: nfs: server 192.168.1.2 not responding, still trying
Jul 16 03:27:25 labdator kernel: nfs: server 192.168.1.2 OK

Those last 3 messages seem to be closely related to the gmirror going into degraded mode: some ACL reading and a mounted NFS file system (192.168.1.2 is the failing box).

Is there some way to enable more debug info output or something?

Johan
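The `error = 6` in the g_vfs_done() lines is an errno value; 6 is ENXIO ("Device not configured" on FreeBSD), which fits a request sent to a disk that has just been detached. As a sketch of how such lines could be pulled apart for later analysis, the snippet below parses one line into its fields. The regular expression and the function name `parse_g_vfs_done` are assumptions made for illustration, and the symbolic name is looked up on whatever platform runs the script rather than on the FreeBSD box itself.

```python
import errno
import re

# Pattern for the g_vfs_done() lines as they appear in the logs of this thread.
G_VFS = re.compile(
    r"g_vfs_done\(\):(?P<provider>\S+?)\[(?P<op>READ|WRITE)\s*"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<error>\d+)"
)


def parse_g_vfs_done(line):
    """Return a dict describing one g_vfs_done() line, or None if it does not match."""
    m = G_VFS.search(line)
    if m is None:
        return None
    code = int(m.group("error"))
    return {
        "provider": m.group("provider"),
        "op": m.group("op"),
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "error": code,
        # Symbolic errno name as known to the local platform.
        "errno_name": errno.errorcode.get(code, "unknown"),
    }


if __name__ == "__main__":
    sample = ("Jul 16 03:27:25 elfi kernel: g_vfs_done():mirror/gm0s1f"
              "[READ (offset=27210082304, length=2048)]error = 6")
    print(parse_g_vfs_done(sample))
```

Collecting the offsets this way would also show whether the failed requests hit the same region of the disk or are scattered, which is one hint for telling a media problem from a controller or driver problem.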
Re: ATA problems again ...
On 17 Jul 2006, at 16.51, Mike Tancsa wrote:

> At 05:59 AM 17/07/2006, Johan Ström wrote:
>
>> On 17 Jul 2006, at 00.53, Mike Tancsa wrote:
>>
>>> At 03:02 PM 14/07/2006, Miroslav Lachman wrote:
>>>
>>>> After reboot (command reboot), system boot up with both disks attached and start autosynchronization. I do not know, if this is hw or sw error, I got
>>>
>>> Install the smartmontools from /usr/ports/sysutils/smartmontools/ and post the output of smartctl -a /dev/ad8
>>
>> I tried this on my SATA disk ad6:
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Maxtor MaXLine III family
>> Device Model:     Maxtor 7L300S0
>> Serial Number:    L60CJKPH
>> Firmware Version: BANC1G10
>> User Capacity:    300,090,728,448 bytes
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   7
>> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
>> Local Time is:    Mon Jul 17 11:54:35 2006 CEST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> Any other output from smartctl that can help? Both my ad4 and ad6 are the same.
>
> This at least rules out the disks being bad for the most part. It still could be bad cables, but if you changed those out then it's doubtful. Perhaps try updating to RELENG_6? If it's a gmirror issue, I think there have been a number of fixes.

Just ran PowerMax (Maxtor's own testing software) full length test on ad6, not a single problem.. Same result as on ad4 a couple of days ago.. So no, I doubt it's the disks.

I've changed the other SATA cable too now (the one on ad6); this was a fresh one never used before. I'll change ad4's too when I take it down for reboot.

I'm currently running RELENG_6_1, however from May 9th. Have there been any ata/gmirror changes merged to 6_1 since then? If I run RELENG_6 instead, how big is the chance that other problems might arise? ;) I've never used anything other than "stable"..

Thanks
Johan
Re: ATA problems again ...
On 17 Jul 2006, at 17.40, Miroslav Lachman wrote:

> Mike Tancsa wrote:
>
> [..]
>
>> Install the smartmontools from /usr/ports/sysutils/smartmontools/ and post the output of smartctl -a /dev/ad8
>
> smartmontools was previously installed and running as a daemon without any bad reports. I cannot run "smartctl -a /dev/ad8" now, because my server housing provider replaced the HDD with a new one, and after an hour of synchronization: "ad8: FAILURE - device detached". So the provider replaced the whole server; only ad4 is an original piece of HW. On the new server synchronization was much faster than on the previous server (1:30 hours compared to 5 hours), so I think it was a HW problem. Now I am running a stress test, copying /usr/ports to another partition in an infinite loop. I will post results later. (On the bad server, the test failed after about 30 minutes. On another server the test has been running fine for a second day, so I think if the disk does not fail after 1 day, the problem is solved.)
>
> At last - now I think this was not GEOM/gmirror related. I tried removing the ad8 provider from gmirror (gm0), booting the system from gm0 with one provider (ad4) and testing ad8 mounted separately - ad8 failed again.

Just got another one..

Jul 25 13:30:47 elfi kernel: ad4: FAILURE - device detached
Jul 25 13:30:47 elfi kernel: subdisk4: detached
Jul 25 13:30:47 elfi kernel: ad4: detached
Jul 25 13:30:47 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=46318008320, length=2048)]error = 6
Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=77269614592, length=16384)]error = 6

6 days uptime when this occurred...

Both disks are tested with PowerMax without a single problem (same with smartctl), and both SATA cables are new. So the only hw problem that I can't rule out would be the mobo, but that is quite new too...

Solutions? Try RELENG_6 as recommended earlier?

Thanks
Johan
Re: ATA problems again ... This time system froze!
On Jul 28, 2006, at 13:15, Johan Ström wrote:

> [..]
>
> Both disks are tested with PowerMax without a single problem (same with smartctl), and both SATA cables are new. So the only hw problem that I can't rule out would be the mobo, but that is quite new too...
>
> Solutions? Try RELENG_6 as recommended earlier?

Okay, still on 6.1-RELEASE:

FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386

Uptime approx 12 days since the last reboot for the raid fix... Just got home to meet a box which doesn't respond to SSH.. the monitor tells me it has crashed totally. From /var/log/messages:

Aug 16 00:58:37 elfi kernel: ad4: FAILURE - device detached
Aug 16 00:58:37 elfi kernel: subdisk4: detached
Aug 16 00:58:37 elfi kernel: ad4: detached
Aug 16 00:58:37 elfi kernel: GEOM_MIRROR: Cannot write metadata on ad4s1 (device=gm0s1, error=6).
Aug 16 00:58:37 elfi kernel: GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
Aug 16 00:58:37 elfi last message repeated 2 times
Aug 16 00:58:37 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Aug 16 00:58:37 elfi kernel: g_vfs_done():mirror/gm0s1f[READ (offset=112910630912, length=32768)]error = 6
Aug 16 00:58:37 labdator kernel: nfs: server 192.168.1.2 not responding, still trying
Aug 16 00:58:37 labdator kernel: nfs: server 192.168.1.2 OK
Aug 16 03:04:21 elfi syslogd: kernel boot file is /boot/kernel/kernel
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2325168128, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2325184512, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2325200896, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2325217280, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2325233664, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2325250048, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2319169536, length=2048)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE (offset=2312404992, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: Copyright (c) 1992-2006 The FreeBSD Project.
Aug 16 03:04:21 elfi kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug 16 03:04:21 elfi kernel: The Regents of the University of California. All rights reserved.
Aug 16 03:04:21 elfi kernel: FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006
...(regular boot stuff)...

(labdator is a box with an elfi NFS export mounted)

dmesg shows me some other stuff not in messages:

ad4: FAILURE - device detached
subdisk4: detached
ad4: detached
GEOM_MIRROR: Cannot write metadata on ad4s1 (device=gm0s1, error=6).
GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
g_vfs_done():mirror/gm0s1f[READ(offset=112910630912, length=327
Q about gmirror's "metadata sector"
Hi

If I've understood correctly, gmirror uses the last sector on the provider for its metadata. If one uses http://www.onlamp.com/pub/a/bsd/2005/11/10/FreeBSD_Basics.html to set up a gmirror'ed system, that is, having a fully used disk where the last sector is in use (right?) and converting it to a gmirror, this will overwrite whatever is on the last sector, right?

That sector will probably not have been written to on a non-full fs, but if the fs gets full later, is there any risk that the metadata sector gets overwritten? Does one have to shrink the fs/slice manually or something to make sure this does not happen?

I haven't seen anyone mention this anywhere, so I'm just curious how it works.

Thanks
Johan
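To make the question concrete, the arithmetic can be sketched as below. It assumes only what is stated above, namely that the metadata lives in the provider's last sector. The function names, the 512-byte sector size and the example sizes are illustrative; the capacity is the whole-disk figure from the smartctl output earlier in the thread, not the size of the ad4s1 slice that is actually mirrored. The sketch says nothing about how UFS or gmirror behave; it only shows where the last sector is and when a byte range would reach into it.

```python
def metadata_sector_offset(provider_bytes, sector_size=512):
    """Byte offset of the provider's last sector (where the metadata is assumed to live)."""
    if provider_bytes < sector_size or provider_bytes % sector_size:
        raise ValueError("provider size must be a positive multiple of the sector size")
    return provider_bytes - sector_size


def fs_overlaps_metadata(fs_start, fs_bytes, provider_bytes, sector_size=512):
    """True if a filesystem occupying [fs_start, fs_start + fs_bytes) reaches into the last sector."""
    return fs_start + fs_bytes > metadata_sector_offset(provider_bytes, sector_size)


if __name__ == "__main__":
    disk = 300_090_728_448  # "User Capacity" reported by smartctl in this thread
    print(metadata_sector_offset(disk))
    # A filesystem spanning the whole provider would include the last sector...
    print(fs_overlaps_metadata(0, disk, disk))
    # ...while one that ends a sector earlier would not.
    print(fs_overlaps_metadata(0, disk - 512, disk))
```

So the question boils down to whether anything on the existing filesystem ever uses that final 512-byte range of the provider.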
FreeBSD with a Gigabyte GA-K8NSC?
Hi

I'm about to get a "new" server... In this case what I'm looking at is a Gigabyte GA-K8NSC mobo with the nForce3 250Gb chipset, and an AMD 64 3200+ Venice S939.

Does anyone have any experience with FreeBSD (6.1) and this mobo/chipset? Does the network work? How well? SATA? Any stability/performance issues?

I did notice it was mentioned on http://www.freebsd.org/platforms/amd64/motherboards.html for 5.4 with the only comment "Sound and USB untested.".. So.. anyone got more detailed experience than that?

Thanks :)

-- 
Johan Ström
[EMAIL PROTECTED]