Re: 8-STABLE/amd64 semi-regular crash with "kernel trap 12 with interrupts disabled" in "process 12 (swi4: clock)"
On 01/20/11 03:05, Lev Serebryakov wrote: Hello, Eugene. You wrote on 19 January 2011 at 12:50:25: Yes, I'd missed that it's PRERELEASE already. The backtrace points to the problem in em_local_timer() fixed in CURRENT 7 days ago; take a look: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/e1000/if_em.c#rev1.65 I run my servers with this commit backported manually as it has not been MFC'd yet. Ok, I'll try it (rebuilding the system now). Did it help the problem? I think I saw a related panic today, so I'm going to try updating past the time this was MFC'ed to 8, which I think was Sat Jan 22 01:34:08 2011 UTC (10 days, 17 hours ago). Thanks. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: about thumper aka sun fire x4500
On 01/17/12 17:09, Jeremy Chadwick wrote: On Tue, Jan 17, 2012 at 06:59:08PM +0100, peter h wrote: I have been beating on one of these for a few days; I have used FreeBSD 9.0 and 8.2. Both fail when I engage > 10 disks: the system crashes and the message "Hyper transport sync flood" ends up in the BIOS error log (but nothing reaches syslog since the reboot is immediate). Using a zfs raidz of 25 disks and typing "zpool scrub" will bring the system down in seconds. Is anyone using an x4500 who can confirm that it works? Or is this box broken? I've seen what is probably the same base issue but on multiple x4100m2 systems running FreeBSD 7 or 8 a few years ago. For me the instant reboot and HT sync flood error happened when I fetched a ~200MB file via HTTP using an onboard Intel NIC and wrote it out to a simple zfs mirror on 2 disks. I may have tried the nvidia ethernet ports as an alternative but that driver had its own issues at the time. This was never a problem with FFS instead of ZFS. I could repeat it fairly easily by running fetch in a loop (can't remember if writing the output to disk was necessary to trigger it). The workaround I found was to buy a cheap Intel PCIe NIC and use that instead of the onboard ports. If a zpool scrub triggers it for you, I doubt my workaround will help, but I wanted to relate my experience. Given the above diagram, I'm sure you can figure out how "flooding" might occur. :-) I'm not sure what "sync flood" means (vs. I/O flooding). As I understand it, a sync flood is a purposeful reaction to an error condition, somewhat of a last-ditch effort to regain control over the system (which ends up rebooting). I'm pulling this out of my memory from a few years ago. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Load when idl on stable
On 06/05/12 15:37, Albert Shih wrote: On 03/06/2012 at 23:55:06+0200, Oliver Pinter wrote: I think this is the old thread: http://freebsd.1045724.n5.nabble.com/High-load-event-idl-td5671431.html Yes. But because I didn't find any solution, I am raising the problem again. Does the interrupt rerouting not help? Well, I have no idea what you are talking about, but I tried every solution described in the thread you mentioned. I didn't find any solution. Regards. NB: I forgot to say I'm not a developer, just a sysadmin. I use Stable just to report here any problems I get. Try changing kern.eventtimer.timer: % sysctl kern.eventtimer.timer=LAPIC To display your choices ordered by quality: % sysctl kern.eventtimer ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
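For anyone trying this, a minimal sketch of the suggested commands; the timers listed and their quality values are illustrative and will differ from machine to machine:

  # list the available event timers, with quality in parentheses
  % sysctl kern.eventtimer.choice
  kern.eventtimer.choice: HPET(450) LAPIC(400) i8254(100) RTC(0)
  # switch the active timer on the running system
  % sysctl kern.eventtimer.timer=LAPIC
  # make the choice persistent across reboots
  % echo 'kern.eventtimer.timer=LAPIC' >> /etc/sysctl.conf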
Re: How to bind a route to a network adapter and not IP
On 06/15/12 12:19, Hans Petter Selasky wrote: Hi, Maybe there is a simple answer, but how do I bind a route to a network interface in 8-stable? Is that possible at all? I'm asking because the routes I add in my network setup are lost because of ARP packet drops. I.E. they exist for a while, but not forever like I want to. --HPS Is route add x.x.x.x -iface em0 what you want? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
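If that is indeed what is wanted, a sketch of binding a route to an interface and making it survive reboots; the 192.0.2.0/24 network and the "lab" route name are placeholders:

  # attach a network route to em0 instead of a next-hop IP
  % route add -net 192.0.2.0/24 -iface em0
  # verify it
  % netstat -rn | grep em0
  # persist it in /etc/rc.conf
  static_routes="lab"
  route_lab="-net 192.0.2.0/24 -iface em0"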
Stale NFS file handles on 8.x amd64
I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers (usually just 2) accessing mail on a Netapp over NFSv3 via imapd. Delivery is via procmail, which doesn't touch the dovecot metadata, and webmail uses imapd. Client connections to imapd go to random servers and I don't yet have a solid means to keep certain users on certain servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran into stale NFS file handles causing index/uidlist corruption, which made inboxes appear empty when they were not. In some situations the corrupt index had to be deleted manually. I first suspected dovecot 1.2 since it was upgraded at the same time, but I downgraded to 1.1 and it's doing the same thing. I don't really have a wealth of details to go on yet and I usually stay quiet until I do, and half the time it is difficult to reproduce myself, so I've had to put it in production to get a feel for progress. This only happens a dozen or so times per weekday, but I feel the need to start taking bigger steps. I'll probably do what I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x on the remaining servers. A binary search is within possibility if I can reproduce the symptoms often enough, even if I have to put a test server in production for a few hours. Any tips on where we could start looking, or alterations I could try making, such as sysctls to return to older behavior? It might be worth noting that I've seen a considerable increase in traffic from my mail servers since the 8.x upgrade timeframe, on the order of 5-10x as much traffic to the NFS server. Dovecot tries its hardest to flush out the access cache when needed and it was working well enough since about 1.0.16 (years ago). It seems like FreeBSD is what regressed in this scenario. Dovecot 2.x is going in a different direction from my situation and I'm not ready to start testing that immediately if I can avoid it, as it will involve some restructuring. Thanks for any input.
For now the following errors are about all I have to go on:
Nov 29 11:07:54 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 13:19:51 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 14:35:41 server1 dovecot: IMAP(user2): o_stream_send(/home/user2/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 15:07:05 server1 dovecot: IMAP(user3): read(mail, uid=128990) failed: Stale NFS file handle
Nov 29 11:57:22 server2 dovecot: IMAP(user4): open(/egr/mail/shared/vprgs/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 14:04:22 server2 dovecot: IMAP(user5): o_stream_send(/home/user5/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 14:27:21 server2 dovecot: IMAP(user6): o_stream_send(/home/user6/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 15:44:38 server2 dovecot: IMAP(user7): open(/egr/mail/shared/decs/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 19:04:54 server2 dovecot: IMAP(user8): o_stream_send(/home/user8/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 06:32:11 server3 dovecot: IMAP(user9): open(/egr/mail/shared/cmsc/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 10:03:58 server3 dovecot: IMAP(user10): o_stream_send(/home/user10/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Stale NFS file handles on 8.x amd64
On 11/29/10 20:35, Chuck Swiger wrote: Hi, Adam-- On Nov 29, 2010, at 5:06 PM, Adam McDougall wrote: I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers (usually just 2) accessing mail on a Netapp over NFSv3 via imapd. delivery is via procmail which doesn't touch the dovecot metadata and webmail uses imapd. Client connections to imapd go to random servers and I don't yet have solid means to keep certain users on certain servers. Are you familiar with: http://wiki1.dovecot.org/NFS Basically, you're running a "try to avoid doing this" configuration, but it does discuss some options to improve the situation. If you can tolerate the performance hit, try disabling the NFS attribute cache... Regards, I am familiar with that page, have taken it into account, and have worked closely with Timo, the author of Dovecot; my mail servers have been running close enough to perfect on 7.x for years. The FreeBSD version is the only major change that I can think of at this point, other than the versions of other ports. I'm planning to revert some to 7.x to make sure. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Stale NFS file handles on 8.x amd64
On 11/30/10 02:49, Doug Barton wrote: On 11/29/2010 17:06, Adam McDougall wrote: I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers (usually just 2) accessing mail on a Netapp over NFSv3 via imapd. There are a whole lot more variables that I haven't seen covered yet. Are you using TCP mounts or UDP mounts? Try toggling that setting and see if your performance increases. Are you using rpc.lockd, or not? Try toggling that. What mount options are you using other than TCP/UDP? What does the network topology look like? It's very likely that we can help you here, but more information is needed. Doug I am using TCP mounts; the mounts in question use either rw,bg,tcp,nosuid or rw,bg,tcp,noexec in fstab, which is what I was using in 7.x except for intr. I'm much more concerned about the corruption than the performance at this point, but I could try UDP when I get a chance, although I wasn't using it on 7. I am running lockd and statd and was on 7 too. I was using options NFSLOCKD on both 7 and 8. The Netapp I am accessing is on the same local /24 subnet and it does not traverse any firewalls or routers to get there. Two of the four clients are on the same switch as the NFS server, the other two clients are on a different layer-2 switch but the same VLAN. Today is a bit busy so I may only have time for discussion or simple non-invasive changes during the day, but I'll compile a list of suggestions at least. Thanks. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
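For context, a sketch of what the fstab entries being described typically look like, plus the rc.conf knobs that control the lock daemons; the server and path names here are placeholders, not the poster's:

  # /etc/fstab
  netapp:/vol/mail    /home       nfs   rw,bg,tcp,nosuid   0   0
  netapp:/vol/shared  /egr/mail   nfs   rw,bg,tcp,noexec   0   0
  # /etc/rc.conf
  rpc_lockd_enable="YES"
  rpc_statd_enable="YES"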
Re: Stale NFS file handles on 8.x amd64
On 11/30/10 09:33, John Baldwin wrote: On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote: I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers (usually just 2) accessing mail on a Netapp over NFSv3 via imapd. delivery is via procmail which doesn't touch the dovecot metadata and webmail uses imapd. Client connections to imapd go to random servers and I don't yet have solid means to keep certain users on certain servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran into Stale NFS file handles causing index/uidlist corruption causing inboxes to appear as empty when they were not. In some situations their corrupt index had to be deleted manually. I first suspected dovecot 1.2 since it was upgraded at the same time but I downgraded to 1.1 and its doing the same thing. I don't really have a wealth of details to go on yet and I usually stay quiet until I do, and half the time it is difficult to reproduce myself so I've had to put it in production to get a feel for progress. This only happens a dozen or so times per weekday but I feel the need to start taking bigger steps. I'll probably do what I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x on the remaining servers. A binary search is within possibility if I can reproduce the symptoms often enough even if I have to put a test server in production for a few hours. There were some changes to allow more concurrency in the NFS client in 8 (and 7.2+) that caused ESTALE errors to occur on open(2) more frequently. You can try setting 'vfs.lookup_shared=0' to disable the extra concurrency (but at a performance cost) as a workaround. The most recent 7.x and 8.x have some changes to open(2) to minimize ESTALE errors that I think get it back to the same level as when lookup_shared is set to 0. I tried vfs.lookup_shared=0 on two of the three already with no help (forgot what it was called or I would have mentioned it), and I also tried vfs.nfs.prime_access_cache=1 on a guess on all three but that didn't help either. I'll go through the other suggestions and see where it gets me. Thanks all for the input. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
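A minimal sketch of the workaround described above; vfs.lookup_shared is a loader tunable (whether it can also be flipped at runtime via sysctl depends on the branch), and vfs.nfs.prime_access_cache is the other knob the poster mentions experimenting with:

  # /boot/loader.conf
  vfs.lookup_shared=0
  # the access-cache experiment mentioned above
  vfs.nfs.prime_access_cache=1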
Re: Stale NFS file handles on 8.x amd64
On 11/30/10 08:33, Rick Macklem wrote: I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers (usually just 2) accessing mail on a Netapp over NFSv3 via imapd. delivery is via procmail which doesn't touch the dovecot metadata and webmail uses imapd. Client connections to imapd go to random servers and I don't yet have solid means to keep certain users on certain servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran into Stale NFS file handles causing index/uidlist corruption causing inboxes to appear as empty when they were not. In some situations their corrupt index had to be deleted manually. I first suspected dovecot 1.2 since it was upgraded at the same time but I downgraded to 1.1 and its doing the same thing. I don't really have a wealth of details to go on yet and I usually stay quiet until I do, and half the time it is difficult to reproduce myself so I've had to put it in production to get a feel for progress. This only happens a dozen or so times per weekday but I feel the need to start taking bigger steps. I'll probably do what I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x on the remaining servers. A binary search is within possibility if I can reproduce the symptoms often enough even if I have to put a test server in production for a few hours. Any tips on where we could start looking, or alterations I could try making such as sysctls to return to older behavior? It might be worth noting that I've seen a considerable increase in traffic from my mail servers since the 8.x upgrade timeframe, on the order of 5-10x as much traffic to the NFS server. dovecot tries its hardest to flush out the access cache when needed and it was working well enough since about 1.0.16 (years ago). It seems like FreeBSD is what regressed in this scenario. dovecot 2.x is going in a different direction from my situation and I'm not ready to start testing that immediately if I can avoid it as it will involve some restructuring. Thanks for any input. 
For now the following errors are about all I have to go on:
Nov 29 11:07:54 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 13:19:51 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 14:35:41 server1 dovecot: IMAP(user2): o_stream_send(/home/user2/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 15:07:05 server1 dovecot: IMAP(user3): read(mail, uid=128990) failed: Stale NFS file handle
Nov 29 11:57:22 server2 dovecot: IMAP(user4): open(/egr/mail/shared/vprgs/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 14:04:22 server2 dovecot: IMAP(user5): o_stream_send(/home/user5/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 14:27:21 server2 dovecot: IMAP(user6): o_stream_send(/home/user6/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 15:44:38 server2 dovecot: IMAP(user7): open(/egr/mail/shared/decs/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 19:04:54 server2 dovecot: IMAP(user8): o_stream_send(/home/user8/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 06:32:11 server3 dovecot: IMAP(user9): open(/egr/mail/shared/cmsc/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 10:03:58 server3 dovecot: IMAP(user10): o_stream_send(/home/user10/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Others have made good suggestions. One more thing you could try is disabling the negative name caching by setting the option "negnametimeo=0". The addition of negative name caching is also in FreeBSD7, but it is a fairly recent change, so your FreeBSD7 boxes may not have had it. I also think trying "dot-locking" and running without statd and lockd (you can mount with the "nolock" option) would be worthwhile. And, of course, disabling attribute caching is mentioned on the web page others cited. Good luck with it, rick ps: Unfortunately the NFS protocol cannot support full POSIX file system semantics, so some apps can never run correctly on NFS-mounted volumes. NFSv4 comes closer, but it still can't provide full POSIX semantics. I'll give negnametimeo=0 a try on one server starting tonight; I'll be busy tomorrow and don't want to risk making anything potentially worse than it is yet. I can't figure out how to disable the attr cache in FreeBSD. Neither suggestion seems to be valid, and years ago when I looked into it I got the impression that you can't, but I'd love to be proven wrong. I'll try dotlock when I can. Would disabling statd and lockd be the same as using nolock on all mounts? The vacation binary is the only thing I can think of that might use it; I'm not sure how well it would like missing it, which is how I discovered I needed it in the first place. Also, if disabling lockd shows an improvement, could it lead to further investigation or is it just a workaround?
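Pulling the mount-side suggestions from this exchange together, a hedged sketch of an fstab line a client could experiment with; the server and path are placeholders, negnametimeo requires a client new enough to have the option, and nolock trades NFS locking for purely local locking as discussed above:

  netapp:/vol/mail  /home  nfs  rw,bg,tcp,nosuid,nolock,negnametimeo=0  0  0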
Re: Stale NFS file handles on 8.x amd64
On Wed, Dec 01, 2010 at 06:33:06PM -0500, Rick Macklem wrote: > > I'll give negnametimeo=0 a try on one server starting tonight, I'll be > busy tomorrow and don't want to risk making anything potentially worse > than it is yet. I can't figure out how to disable the attr cache in > FreeBSD. Neither suggestions seem to be valid, and years ago when I > looked into it I got the impression that you can't, but I'd love to be > proven wrong.
I just looked and, yea, you are correct, in that the cached attributes are still used while NMODIFIED is set if the mtime isn't within the current second. (I'm not going to venture a guess as to why this is done at this time:-) But "acregmon=0,acregmax=0,acdirmin=0,acdirmax=0" looks like it comes close, from a quick inspection of the code. I haven't tested this. You do have to set both min and max == 0, or max just gets set to min instead of 0. The *dir* ones apply to directories and the *reg* ones otherwise.
Ok. I'm applying it right now (with s/mon/min/).
> I'll try dotlock when I can. Would disabling statd and > lockd be the same as using nolock on all mounts?
Nope. If you kill off lockd and statd without using the "nolock" option, I think all file lock operations will fail with ENOTSUPPORTED, whereas when you mount with "nolock", the lock ops will be done locally in the client (i.e. seen by other processes in the same client, but not by other clients).
This last go-around I told dovecot to use dotlock for more things, not just Maildir files (apparently), and it apparently made it stop using fcntl because it no longer complained when I temporarily stopped lockd. It did not help the Stale files though, nor did turning off lockd, so I'm still hunting.
> The vacation binary > is > the only thing I can think of that might use it, not sure how well it > would like missing it which is how I discovered I needed it in the > first > place. Also, if disabling lockd shows an improvement, could it lead to > further investigation or is it just a workaround?
Well, it's a workaround in the sense that you are avoiding the NLM and NSM protocols. These are fundamentally flawed protocol designs imho, but some folks find that they work ok for them. Imho, the two big flaws are:
1 - Allowing a blocking lock in the server. Then what happens if the client is network partitioned when the server finally acquires the lock for the client? (NFSv4 only allows the server to block for a very short time before it replies. The client must "poll" until the lock is available, if the client app. allows blocking. In other words, the client does the blocking.)
2 - It depends upon the NSM to decide if a node is up/down. I'm not sure what the NSM actually does, but it's along the lines of an IP broadcast to see if the other host(s) are responding and then sets up/down based on how recently it saw a message from a given host. (NFSv4 requires that the server recognize a lock request where the client had state that predates this boot and reply with an error that tells the client to recover its lock state using special variants of the lock ops. Imho, this does a much better job of making sure the server and clients maintain a consistent set of lock state. The server may throw away lock state for an NFSv4 client if it hasn't renewed the state within a lease time and then the client will be given an "expired" error to tell it that it has lost locks.
This should only happen when a network partitioning exceeds the lease duration and, in the case of the FreeBSD NFSv4 server, it has also received a conflicting lock request from another client whose lease has not expired.) I always appreciate learning more. I have a fairly strict firewall in place and don't see any denies related to rpc/nfs but I'll keep my eyes open. I'm presently allowing all traffic between the client and the nfs server on all ports, tcp and udp. The firewall was put in place after the problems started, as part of a different project. Probably a lot more glop than you expected, but I couldn't resist a chance to put in a plug for NFSv4 file locking. Btw, you could try NFSv4 mounts, since Netapp and the experimental FreeBSD8 client both support them. I'll keep that in mind too, I tried it a while back before the revamp. From a firewall standpoint (all around), I would love to get rid of v3 when the time comes. Although, I would probably invest my time into testing dovecot v2 where the author tries to keep users on their own server anyway. Either would probably be a bigger project. I have some solaris clients that are too smart for their own good for example and broke when I enabled v4 on the netapp in the past (permission issues). Good lock (oh, I meant luck:-) with it, rick HA ha :) ___
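For completeness, a sketch of the attribute-cache experiment from this exchange expressed as mount options; as Rick cautions above, this only comes close to disabling the cache and was untested, and the server and path are placeholders:

  netapp:/vol/mail  /home  nfs  rw,bg,tcp,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0  0  0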
Re: Stale NFS file handles on 8.x amd64
On 12/04/10 18:39, Adam McDougall wrote: On Wed, Dec 01, 2010 at 06:33:06PM -0500, Rick Macklem (and others) wrote: (various suggestions) I had to call off the experimentation with live users and resorted to running all IMAP connections from a single NFS client for now. I was causing a little too much of a stir. Here are the things I can remember trying:
- negnametimeo=0 (no perceived difference)
- acregmin=0,acregmax=0,acdirmin=0,acdirmax=0 (slowed down imap but otherwise no perceived difference)
- stopped statd and lockd after changing dovecot to use dotlock (no perceived difference)
- vfs.lookup_shared=0 (no perceived difference)
- dovecot 1.1 (no better than 1.2 regarding this issue)
Did not try (yet?): udp, nfsv4, freebsd 7
I feel compelled to keep the users happy for now while I look into the nfs director included with dovecot 2.0 designed to keep individual IMAP users on their own server, so I'm taking a stab at the recommended usage with newer software rather than more debugging with previous versions. I may return to other tactics if the future path slows down, I'm still curious why things changed. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: fxp performance with POLLING
Bartosz Stec wrote: BTW overall SAMBA performance still sucks on 7.1-pre as much as on RELENG_5 ...:( - 7.5 MB/s peak. 7.5 MB/s is 75% efficiency on a 100Mbit card. Not amazing, but not "sucks". Where do you see faster performance? Between Windows machines on the same hardware, or a Linux server? It sucks because that is peak performance; the average is about 5-6 MB/s. I tried polling only because I found some suggestions on mailing lists that it could improve performance with SAMBA on FreeBSD. As you see at the top of this thread - not in my case :) I also tried sysctl tunings and smb.conf settings, also suggested on mailing lists, with no or very little improvement noticed. Most of the suggestions unfortunately end with "change OS to Linux if you want to use SAMBA". I think I will try to change the NIC to 1Gbit - hope that helps :) Or maybe there's some "FreeBSD and SAMBA tuning guide" which I haven't found? Please try experimenting with "socket options" in smb.conf. I've found that some tuning is desirable on any OS with Samba, but these are the values that worked best for me with Windows XP clients in mind. Win2003 clients seemed much faster without tuning (same base code as XP 64bit) and I suspect it has a different SMB implementation. I'd suggest starting with "socket options = TCP_NODELAY IPTOS_THROUGHPUT SO_RCVBUF=8192 SO_SNDBUF=8192" and if you aren't satisfied, experiment with the numbers and which options are enabled. Be sure that the client has been disconnected from Samba completely to make sure you are testing the values in the config file. I'm pretty sure with these tunings I was able to get closer to 10MB/sec on 100Mbit, which satisfies me for the average user.
# Most people will find that this option gives better performance.
# See smb.conf(5) and /usr/share/doc/samba-doc/htmldocs/speed.html
# for details
# You may want to add the following on a Linux system:
# SO_RCVBUF=8192 SO_SNDBUF=8192
# socket options = TCP_NODELAY
# For some reason, 8192 is pretty fast on a XP lab 100Mb client. Other sizes tested and disappointing in that situation. Windows Server 2k3 on gig is much faster, and likes larger values. There might be some merit in testing 49152 in some situations. (20080617)
# TCP_NODELAY makes a huge improvement. IPTOS_THROUGHPUT is negligible locally.
# mcdouga9 20070110
socket options = TCP_NODELAY IPTOS_THROUGHPUT SO_RCVBUF=8192 SO_SNDBUF=8192
# socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
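To verify each change, a hedged sketch of a quick re-test cycle; the share, user, and file names are placeholders, and the rc script path assumes the Samba port's default install location:

  # restart smbd so the new socket options take effect (or fully disconnect the client first)
  % /usr/local/etc/rc.d/samba restart
  # time a large transfer from a test client; smbclient prints the average rate when done
  % smbclient //server/share -U user -c 'get bigfile.bin /dev/null'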
Re: am2 MBs - 4g + SCSI wipes out root partition
Jeremy Chadwick wrote: On Sat, Oct 11, 2008 at 04:45:29PM +0200, Gary Jennejohn wrote: On Sat, 11 Oct 2008 03:13:16 -0700 Jeremy Chadwick <[EMAIL PROTECTED]> wrote: On Sat, Oct 11, 2008 at 11:30:57AM +0200, Gary Jennejohn wrote: On Fri, 10 Oct 2008 14:29:37 -0300 JoaoBR <[EMAIL PROTECTED]> wrote: I tried MBs from Asus, Abit and Gigabyte, all with the same result. Same hardware with SATA works perfectly. Same hardware with SCSI and up to 3.5 gigs installed works perfectly. What catches my attention is that all these MBs do not have the memory hole remapping feature, so the complete 4 gigs are available, which normally was not the case with amd64 MBs for the Athlon 64 CPUs. Does someone have an opinion on whether this is a FreeBSD issue, an MB failure, or a SCSI driver problem? It's a driver problem. If you want to use SCSI then you'll have to limit memory to 3.5 GB. What you're saying is that Adaptec and LSI Logic SCSI controllers behave badly (and can cause data loss) on amd64 systems which contain more than 3.5GB of RAM. This is a very big claim. Have you talked to Scott Long about this? Please expand on this, and provide evidence or references. I need to document this in my Wiki if it is indeed true. See the freebsd-scsi thread with Subject "data corruption with ahc driver and 4GB of memory using a FBSD-8 64-bit installation?" from Wed, 30 Jan 2008. This was for ahc, but the bit-rot which Scott mentions in his reply might also apply to the LSI Logic controllers. Basically the driver doesn't correctly handle DMA above 4GB. Since the PCI hole gets mapped above 4GB it causes problems. The (S)ATA drivers don't seem to have this problem. Thank you -- this is the exact information I was looking for. I will update my Wiki page to reflect this quite major problem. I am using some LSI (mpt driver) Ultra4 (U320 SCSI) and LSI SAS controllers in FreeBSD 7.x amd64 with 20G of RAM, and an Adaptec (aac driver) 5th-generation RAID card with 8G of RAM; neither has such corruption problems. Providing this as a counter-example just to document some evidence of which products seem to work fine. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: am2 MBs - 4g + SCSI wipes out root partition
Jeremy Chadwick wrote: On Sat, Oct 11, 2008 at 12:26:29PM -0400, Adam McDougall wrote: Jeremy Chadwick wrote: On Sat, Oct 11, 2008 at 04:45:29PM +0200, Gary Jennejohn wrote: On Sat, 11 Oct 2008 03:13:16 -0700 Jeremy Chadwick <[EMAIL PROTECTED]> wrote: On Sat, Oct 11, 2008 at 11:30:57AM +0200, Gary Jennejohn wrote: On Fri, 10 Oct 2008 14:29:37 -0300 JoaoBR <[EMAIL PROTECTED]> wrote: I tried MBs from Asus, Abit and Gigabyte, all with the same result. Same hardware with SATA works perfectly. Same hardware with SCSI and up to 3.5 gigs installed works perfectly. What catches my attention is that all these MBs do not have the memory hole remapping feature, so the complete 4 gigs are available, which normally was not the case with amd64 MBs for the Athlon 64 CPUs. Does someone have an opinion on whether this is a FreeBSD issue, an MB failure, or a SCSI driver problem? It's a driver problem. If you want to use SCSI then you'll have to limit memory to 3.5 GB. What you're saying is that Adaptec and LSI Logic SCSI controllers behave badly (and can cause data loss) on amd64 systems which contain more than 3.5GB of RAM. This is a very big claim. Have you talked to Scott Long about this? Please expand on this, and provide evidence or references. I need to document this in my Wiki if it is indeed true. See the freebsd-scsi thread with Subject "data corruption with ahc driver and 4GB of memory using a FBSD-8 64-bit installation?" from Wed, 30 Jan 2008. This was for ahc, but the bit-rot which Scott mentions in his reply might also apply to the LSI Logic controllers. Basically the driver doesn't correctly handle DMA above 4GB. Since the PCI hole gets mapped above 4GB it causes problems. The (S)ATA drivers don't seem to have this problem. Thank you -- this is the exact information I was looking for. I will update my Wiki page to reflect this quite major problem. I am using some LSI (mpt driver) Ultra4 (U320 SCSI) and LSI SAS controllers in FreeBSD 7.x amd64 with 20G of RAM, and an Adaptec (aac driver) 5th-generation RAID card with 8G of RAM; neither has such corruption problems. Providing this as a counter-example just to document some evidence of which products seem to work fine. Is your LSI SAS controller driven by mpt(4) or mfi(4)? Let's break down what we know for sure at this point:
aac(4) - not affected
aha(4) - unknown
ahb(4) - unknown
ahc(4) - affected
ahd(4) - unknown; no one answered the OP's question in the thread
asr(4) - unknown
ips(4) - unknown
mpt(4) - not affected
mfi(4) - unknown
sym(4) - unknown
Could the problem be specific to certain firmware revisions on the cards? Also adding Scott Long to the CC list. All the LSI controllers I reported on are driven by mpt; I have no mfi devices. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
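For anyone trying to map their own hardware onto the list above, a hedged sketch of checking which driver has claimed a controller; the driver names searched for are just the ones from the table:

  # the attached driver name prefixes each pciconf -l line
  % pciconf -l | egrep '^(aac|aha|ahb|ahc|ahd|asr|ips|mpt|mfi|sym)'
  # or search the boot messages
  % dmesg | egrep 'ahc|ahd|mpt|aac|mfi|sym'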
Re: Disk top usage PIDs
Eduardo Meyer wrote: Hello, I have a serious issue. Sometimes something happens and my disk performance finds its limit quickly. I follow along with gstat and iostat -xw1, and everything usually looks just fine, with %b around 20 and 0 to 1 pending i/o requests. Suddenly I get 30 or 40 pending requests and %b is always at 100% (or more). fstat and lsof give me no hint, because the type of programs as well as the number of them is just the same. How can I find the PID which is hammering my disk? Is there an "iotop" or "disktop" tool or something like that? It's a mail server. I have pop3 and imap, and I also have maildrop and sometimes httpd working around the busiest mount point. I have also started AUDIT, however all I can get are the top PIDs which issue read/write requests, not the requests which take longer to perform (the busiest ones). Or should I look for some special audit class or event other than open, read and write? Thank you in advance. top -mio ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
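A sketch of the suggested command and a batch variant; the flags are from top(1) on FreeBSD, and the choice of sort field is an assumption about what is most useful here:

  # interactive, per-process I/O statistics instead of CPU
  % top -m io -o total
  # two batch snapshots (the second shows deltas), handy for capturing an incident from a script
  % top -m io -o total -b -d 2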
Fwd: aac0 issues, Controller is no longer running
Forwarding to the list because mail to ema...@freebsd.org bounced at the location it is apparently being forwarded to, so I'm not sure if it was received. I'll add that the problem seems to follow a pattern where it happens only twice after a reboot (usually in the morning when rsync runs) and then it's fine until the next intentional reboot. --- Begin Message --- Ed, I've been having issues with the aac0 driver with a Sun STK raid card (Adaptec 5805). After FBSD upgrades and a reboot, it will lock up when rsync backups run at night. After doing this about 2-3 times over 2-3 days, it will stabilize and be rock solid. If you could assist me in maybe debugging it somehow, I would greatly appreciate it. Info:
Controller Status: Optimal
Channel description : SAS/SATA
Controller Model : Sun STK RAID INT
Controller Serial Number : 00809AA0182
Physical Slot: 48
Temperature : 64 C/ 147 F (Normal)
Installed memory : 256 MB
BIOS : 5.2-0 (15583)
Firmware : 5.2-0 (15583)
Driver : 5.2-0 (15583)
Boot Flash : 5.2-0 (15583)
Raid5+HSP setup with 8 SAS disks. Code here:
code = AAC_GET_FWSTATUS(sc);
if (code != AAC_UP_AND_RUNNING) {
	device_printf(sc->aac_dev, "WARNING! Controller is no "
	    "longer running! code= 0x%x\n", code);
And one that mentions: "COMMAND %p TIMEOUT AFTER %d SECONDS\n", TIA, -- Bryan G. Seitz --- End Message --- ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Fwd: aac0 issues, Controller is no longer running
Xin LI wrote: Hi, Adam, Adam McDougall wrote: [...] I've been having issues with the aac0 driver with a Sun STK raid card (Adaptec 5805). After FBSD upgrades and a reboot, it will lock up when rsync backups run at night. After doing this about 2-3 times over 2-3 days, it will stabilize and be rock solid. If you could assist me in maybe debugging it somehow, I would greatly appreciate it. Which release were you using in the past (it seems you didn't have the problem with that release, if I understand correctly), and which release are you upgrading to? That information would help us to narrow down the problem. Cheers, -- Xin LI http://www.delphij.net/ We've been tracking 7-stable since the system was installed about half a year ago. This issue has occurred every few months whenever we update to a more recent -stable. I haven't seen any recent changes to aac in the last few months; this last upgrade was purely compulsory. This is the first time I can recall where it has hung up 3 nights in a row after the upgrade; usually it's just 2. It seems that it hung up last night about 3am or possibly earlier, judging by when remote network connections started timing out. I'm guessing it was tickled by the nightly periodic find scripts. Thanks. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 7.2 dies in zfs
On Sat, Nov 21, 2009 at 11:36:43AM -0800, Jeremy Chadwick wrote: On Sat, Nov 21, 2009 at 08:07:40PM +0100, Johan Hendriks wrote: > Randy Bush wrote: > > imiho, zfs can not be called production ready if it crashes if you > > do not stand on your left leg, put your right hand in the air, and > > burn some eye of newt. > > This is not a rant, but where do you read that on FreeBSD 7.2 ZFS has > been marked as production ready. > As far as i know, on FreeBSD 8.0 ZFS is called production ready. > > If you boot your system it probably tell you it is still experimental. > > Try running FreeBSD 7-Stable to get the latest ZFS version which on > FreeBSD is 13 > On 7.2 it is still at 6 (if I remember it right). RELENG_7 uses ZFS v13, RELENG_8 uses ZFS v18. RELENG_7 and RELENG_8 both, more or less, behave the same way with regards to ZFS. Both panic on kmem exhaustion. No one has answered my question as far as what's needed to stabilise ZFS on either 7.x or 8.x.
I have a stable public ftp/http/rsync/cvsupd mirror that runs ZFS v13. It has been stable since mid-May. I have not had a kmem panic on any of my ZFS systems for a long time; it's a matter of making sure there is enough kmem at boot (not depending on kmem_size_max) and that it is big enough that fragmentation does not cause a premature allocation failure due to lack of a large-enough contiguous chunk. This requires the platform to support a kmem size that is "big enough"... i386 can barely muster 1.6G and sometimes that might not be enough. I'm pretty sure all of my currently existing ZFS systems are amd64 where the kmem can now be huge. On the busy fileserver with 20 gigs of ram running FreeBSD 8.0-RC2 #21: Tue Oct 27 21:45:41 EDT 2009, I currently have:
vfs.zfs.arc_max=16384M
vfs.zfs.arc_min=4096M
vm.kmem_size=18G
The ARC settings here try to encourage it to favor the ARC cache instead of whatever else Inactive memory in 'top' contains. On other systems that are hit less hard, I simply set: vm.kmem_size="20G" I even do this on systems with much less RAM; it doesn't seem to matter except that it works, and this is on an amd64 with only 8G. Most of my ZFS systems are 7.2-stable, some are 8.0-something. Anything with v13 is much better than v6, but 8.0 has additional fixes that have not been backported to 7 yet. I don't consider the additional fixes in 8 required for my uses yet, although I'm planning on moving forward eventually. I would consider 2G kmem a realistic minimum on a system that will see some serious disk IO (regardless of how much ram the system actually contains, as long as the kmem size can be set that big and the system does not blow chunks). Hope this personal experience helps.
The people who need to answer the question are those who are familiar with the code. Specifically: Kip Macy, Pawel Jakub Dawidek, and anyone else who knows the internals. Everyone else in the user community is simply guessing + going crazy trying to figure out a solution. As much as I appreciate all the work that has been done to bring ZFS to FreeBSD -- and I do mean that! -- we need answers at this point. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
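A sketch of how the tunables quoted above are normally set in /boot/loader.conf; the sizes mirror the poster's 20GB machine and are not general recommendations:

  # /boot/loader.conf
  vm.kmem_size="18G"
  vfs.zfs.arc_min="4096M"
  vfs.zfs.arc_max="16384M"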
Re: 7.2 dies in zfs
On Sun, Nov 22, 2009 at 10:00:03AM +0100, Svein Skogen (listmail account) wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Adam McDougall wrote: > On Sat, Nov 21, 2009 at 11:36:43AM -0800, Jeremy Chadwick wrote: > > > On Sat, Nov 21, 2009 at 08:07:40PM +0100, Johan Hendriks wrote: > > Randy Bush wrote: > > > imiho, zfs can not be called production ready if it crashes if you > > > do not stand on your left leg, put your right hand in the air, and > > > burn some eye of newt. > > > > This is not a rant, but where do you read that on FreeBSD 7.2 ZFS has > > been marked as production ready. > > As far as i know, on FreeBSD 8.0 ZFS is called production ready. > > > > If you boot your system it probably tell you it is still experimental. > > > > Try running FreeBSD 7-Stable to get the latest ZFS version which on > > FreeBSD is 13 > > On 7.2 it is still at 6 (if I remember it right). > > RELENG_7 uses ZFS v13, RELENG_8 uses ZFS v18. > > RELENG_7 and RELENG_8 both, more or less, behave the same way with > regards to ZFS. Both panic on kmem exhaustion. No one has answered my > question as far as what's needed to stabilise ZFS on either 7.x or 8.x. > > I have a stable public ftp/http/rsync/cvsupd mirror that runs ZFS v13. > It has been stable since mid may. I have not had a kmem panic on any > of my ZFS systems for a long time, its a matter of making sure there is > enough kmem at boot (not depending on kmem_size_max) and that it is big enough > that fragmentation does not cause a premature allocation failure due to lack > of large-enough contiguous chunk. This requires the platform to support a > kmem size that is "big enough"... i386 can barely muster 1.6G and sometimes > that might not be enough. I'm pretty sure all of my currently existing ZFS > systems are amd64 where the kmem can now be huge. On the busy fileserver with > 20 gigs of ram running FreeBSD 8.0-RC2 #21: Tue Oct 27 21:45:41 EDT 2009, > I currently have: > vfs.zfs.arc_max=16384M > vfs.zfs.arc_min=4096M > vm.kmem_size=18G > The arc settings here are to try to encourage it to favor the arc cache > instead of whatever else Inactive memory in 'top' contains. Very interesting. For my iscsi backend (running istgt from ports), I had to change the arc_max below 128M to stop iSCSI initiators generating timeouts when the cache flushed. (This is on a system with a megaraid 8308ELP handling the disk back end, with the disks in two RAID5 arrays of four disks each, zpooled as one big pool). When I had more than 128M arc_max, zfs on regular times ate all available resources to flush to disk, leaving the istgt waiting, and iSCSI initiators timed out and had to reconnect. The iSCSI initiators are the built-in software initator in VMWare ESX 4i. //Svein I could understand that happening. I've seen situations in the past where my kmem was smaller than I wanted it to be, and within a few days the overall ZFS disk IO would become incredibly slow because it was trying to flush out the ARC way too often because of external intense memory pressure on the ARC. Assuming you have a large amount of ram, I wonder if setting kmem_size, arc_min and arc_max sufficiently large and using modern code would help as long as you made sure other processes on the machine don't squeeze down Wired memory in top too much. In such a situation, I would expect it to operate fine while the ARC has enough kmem to expand as much as it wants to, and it might either hit a wall later or perhaps given enough ARC the reclamation might be tolerable. 
Or, if 128M ARC is good enough for you, leave it :) ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
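For the iSCSI backend case above, a hedged sketch of capping the ARC and then checking whether it actually stays within bounds; the 128M figure is Svein's, not a general recommendation:

  # /boot/loader.conf
  vfs.zfs.arc_max="128M"
  # after a reboot, watch the ARC's current and maximum size
  % sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max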
Re: nscd again (nis client cache)
I wanted to say Thanks!!! for this example, because before this point I was under the impression that nscd/cached was of no use for NIS clients, only LDAP or maybe other directory systems that I don't use. I tried "cache compat" as below for passwd and group and it works! Our NIS entries at work are big enough that without the cache, top takes 7+ seconds to open, ssh login takes a few seconds, and samba logins were concerningly slow. I did not try samba connections, but the other methods are much faster now on the second run. Wanted to post this for the archive too. On Sat, Jan 19, 2008 at 02:17:11PM +0300, Michael Bushkov wrote: Hi Denis, Several things: 1. You definitely can't use cache for *_compat sources. I mean lines like "group_compat: cache nis" aren't supported. 2. Cache should work ok with the configuration you've mentioned in your first example, i.e.: "group: cache compat". Just checking - why do you think that cache isn't working? The correct way to determine it is to perform the same query twice. During the first pass (when query is not cached), the request will be processed by NIS module and you'll have all the NIS-related stuff in the logs. On the second pass the request should be handled by scd module - and you shouldn't see any activity in NIS logs. It would be great to see the debug log (with nscd log turned on) separately - for the first and the second pass. It would help to find the error in nscd, if there is one. With best regards, Michael Bushkov On Jan 17, 2008, at 9:55 PM, Denis Barov wrote: >> Hello! >> >> I found some strange behaviour of NIS/nscd when NIS in compat mode. In >> /etc/nsswitch.conf I have: >> >> netgroup: cache compat >> passwd: cache compat >> group:cache compat ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
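A sketch of the working setup described above; note the caching daemon is named nscd on newer branches and cached on some older 7.x systems, so the rc.conf knob and rc script name may differ on your release:

  # /etc/nsswitch.conf
  passwd: cache compat
  group:  cache compat
  # /etc/rc.conf (use cached_enable="YES" on releases that still call it cached)
  nscd_enable="YES"
  # start it, then time a lookup twice; the second pass should hit the cache
  % /etc/rc.d/nscd start
  % time getent passwd someuser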
Re: gjournal panic 7.0-RC1
Ivan Voras wrote: Chris wrote: Came back to see box had rebooted itself from a journal related panic. panic: Journal overflow (joffset=49905408 active=499691355136 inactive=4990$ cpuid = 0 AFAIK this means that the journal is too small for your machine - try doubling it until there are no more panics. If so, this is the same class of errors as ZFS (some would call it "tuning errors"), only this time the space reserved for the on-disk journal is too small, and the fast drives fill it up before data can be transfered from the journal to the data area. I did some experimentation with gjournal a few weeks ago to determine how I might partition a new server, as well as how large to make my journals and where. I did find that for the computers I have tested so far, a 1 gig (default size) journal seems to be sufficient, but half of that or less is asking for trouble and I could not find any workarounds to reduce the chances of panic when I was already stuck with a too-small journal I created a while ago. I also found the -s parameter is vague in that it does not say what units it accepts (appears to be bytes) and I *could not* get it to make a journal inside a data partition any bigger than somewhere around 1.7 gigs. Some values of -s seemed to wrap around to a smaller number, while other values gave errors about being too small (when they weren't) or invalid size. The only way I could force a journal size 2G or larger was to make a separate partition for journal. On the server I was setting up, I decided to make my (journaled) data partitions da0s1d,e,f and the journals da0s2d,e,f. I'm just getting this out there to the list because I don't have time to debug it further. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
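A hedged sketch of creating journals with an explicit size and of the separate-journal-provider layout mentioned above; device names and the 2GB figure are placeholders, and per the poster's observation -s appears to take a size in bytes:

  # journal embedded in the data provider, explicit ~2GB journal area
  % gjournal label -s 2147483648 da0s1f
  # journal on its own provider, data on another (works around the size limit seen above)
  % gjournal label da0s1d da0s2d
  # create the filesystem with the gjournal flag set
  % newfs -J /dev/da0s1f.journal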
How to debug or explain slow gjournal writes?
I am evaluating using gjournal on my servers. This one test system is running 7.0-RELEASE at the moment on a Dell PE2650 with dual 2ghz xeon, ahc0: , and some seagate 36g 10k disks. I had the opportunity to try placing the journal consumer device on a dedicated disk. Whether or not I use a separate journal consumer device from the data consumer, I can expect the overall write speed to be slower because it is effectively writing everything twice. However when I watch it write just to the journal consumer device, the MB/sec it writes to the journal consumer device never seems to exceed around 50% of the speed that the disk is capable of. Its almost as if it is competing with IO being written to the provider, but that might be a coincidence. When I see it flush the journal consumer to the data consumer, it seems to read from a memory cache and write speed to the data consumer is full speed. Comparing with a more modern system with a mpt0: Adapter> and two FUJITSU MAY2073RCSUN72G 0401 in a gmirror, I see journal write speed approx 40M/sec but data write speed of approx 45M/sec, which is a lot closer together, but still shows a difference. I'm fairly sure a single drive on its own could transfer faster than that, but I haven't tested it recently. I could if it would be helpful. Example with data,journal together just so I can put the terminology together with the device listing below: Writing to da2.journal shows input at approx 35M/sec and writes to the da2 journal at approx 35M/sec for a few seconds. Then the journal switches, input speed to da2.journal drops to 0, and apparently an in-memory journal gets copied to the data consumer on da2 at 65M/sec. Then it goes back to 35M/sec in and 35M/sec to the journal, repeating as expected. It seems to avoid writing to the data and journal at the same time, which is probably intentional to avoid head thrashing, especially if the journal fits in memory. Due to the slow journal write speed, and the expected double-writing of data, I only see a resulting write speed to da2.journal of approximately 16-22MB/sec. There is no improvement in the journal consumer write speed if I put it on a separate disk. The overall write speed is also the same. Additionally, when I have the journal on a gmirror (the data is too), the journal write speed is variable between 8 and 35MB/sec, it fluxuates pretty wildly second to second. As soon as I deactivate one side of the mirror, or get rid of the mirror under the journal, it is a consistent 35MB/sec. I'm looking for input on debugging, tuning, questions, bonehead errors, etc because I would like to get the most out of this setup if possible and not just settle for an inconsistent 16-22MB/sec. Thanks. Geom name: gjournal 170802896 ID: 170802896 Providers: 1. Name: da2.journal Mediasize: 35346332672 (33G) Sectorsize: 512 Mode: r1w1e1 Consumers: 1. Name: da2 Mediasize: 36420075008 (34G) Sectorsize: 512 Mode: r1w1e1 Jend: 36420074496 Jstart: 35346332672 Role: Data,Journal ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
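A hedged sketch of the measurements being described, for anyone wanting to reproduce them; the device and mount names are placeholders, and the raw dd writes destroy data so they belong on scratch disks only:

  # per-provider throughput averaged over 10 seconds, to smooth out journal switches
  % gstat -I 10s
  # raw sequential write speed of the bare disk, for a baseline
  % dd if=/dev/zero of=/dev/da3 bs=1m count=2048
  # the same amount of data pushed through the journaled filesystem
  % dd if=/dev/zero of=/mnt/test/file bs=1m count=2048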
Re: How to debug or explain slow gjournal writes?
On Thu, Mar 13, 2008 at 05:57:41PM -0400, Adam McDougall wrote: I am evaluating using gjournal on my servers. This one test system is running 7.0-RELEASE at the moment on a Dell PE2650 with dual 2ghz xeon, ahc0: , and some seagate 36g 10k Additionally, when I have the journal on a gmirror (the data is too), the journal write speed is variable between 8 and 35MB/sec; it fluctuates pretty wildly second to second. As soon as I deactivate one side of the mirror, or get rid of the mirror under the journal, it is a consistent 35MB/sec.
You know, the more I probe, the more it seems to unravel to an issue I have seen on at least 3 FreeBSD 7.x systems so far: SOMETIMES writes have inconsistent speed if the disks have been used in some sort of raid. Key words are "sometimes" and "have been". It seems like a disk is more likely to misbehave or just be consistently slow if it used to be in a raid like a gmirror. Sometimes I could deactivate a disk from a mirror and use it independently (like for a journal) but it would stay slow. Other times, with a fresher setup, it would perform fine.
- On my brand new desktop, I set up a zfs mirror and my writes went into the toilet. I could hear the writes getting interrupted every few seconds while the disk heads seeked. I deactivated the mirror, and the writes would go to the first disk at full speed again. And when I formatted the second disk with whatever FS I wanted, it would give me full speed. But mirror them, and performance would drop back down by several factors. Somehow I avoided this by switching my desktop install to amd64 freebsd and re-setting up the mirror with zfs, but I think that was chance.
- On a friend's home server, sometimes when I would set up a multidisk raid (using different zfs methods, or geom raids), I would run gstat and watch the performance bounce all around between the disks in the raid. It would continue to bounce between 8M/sec and something more reasonable for his ata disks like 40M/sec. If I still had the opportunity to experiment with that again, I would try to reproduce it and look at it more closely with gstat -I 10 or so. I bet it was completely stalling out when I saw it be slow on a one-second average. I found a combination of sata controllers that gave fair and consistent (but not impressive) performance and called it good because it exceeded what his 1Gbit network could likely push or pull to it for a file server.
- On this Dell 2650, I keep setting up different zfs mirrors, raidz, gmirror, gstripe, and sometimes the write speed is fine, sometimes it keeps bouncing around or stalling out for brief periods. Not always every drive at the same time. It's too loud in the server room to hear if this one is seeking while it's slow. I can never see anything obvious in top -S, gstat, systat -vmstat that might cause a spike. I know the scsi bus on this one is capped at 160 so I don't expect to see 3-4 drives run at the same time and get full performance. But ALWAYS if you use dd to write directly to a drive, even multiple drives, or put a filesystem on a single drive, write speed was fine.
I'm looking for input on debugging, tuning, questions, bonehead errors, etc because I would like to get the most out of this setup if possible and not just settle for an inconsistent 16-22MB/sec. Thanks.
Geom name: gjournal 170802896
ID: 170802896
Providers:
1. Name: da2.journal
   Mediasize: 35346332672 (33G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: da2
   Mediasize: 36420075008 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 36420074496
   Jstart: 35346332672
   Role: Data,Journal
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
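A hedged sketch of the gmirror operations involved in the comparison above, for anyone reproducing it; gm0 and da3 are placeholders, and deactivating a component of course degrades the mirror until it is reactivated and rebuilt:

  # see the mirror's components and watch per-disk load
  % gmirror status
  % gstat -f 'da[0-9]$'
  # drop one side of the mirror and benchmark it on its own
  % gmirror deactivate gm0 da3
  # ... run the write tests against da3 ...
  # put it back; gmirror will synchronize it again
  % gmirror activate gm0 da3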
Re: ubsa speed limit
Dominic Fandrey wrote: When I download a single file it seems that the download speed is limited to 32k (raw data as shown by netstat). Under Windows I can reach values around 60k. I can achieve more throughput (though not as much as under Windows) when downloading several files at once. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]" Try this hack; it uses concepts I gathered from people patching the Linux driver, basically increasing the block size of transfers. Edit ubsa.c and recompile/reload the ubsa driver. It's located around line 362. Basically, replace both occurrences of UGETW(ed->wMaxPacketSize) with 2048. I think the default is 512 and you can play with different values to evaluate their effect on speed. I saw a large performance boost from 2048, I think at least an 80k/sec transfer rate.
} else if (UE_GET_DIR(ed->bEndpointAddress) == UE_DIR_IN &&
    UE_GET_XFERTYPE(ed->bmAttributes) == UE_BULK) {
	ucom->sc_bulkin_no = ed->bEndpointAddress;
-	ucom->sc_ibufsize = UGETW(ed->wMaxPacketSize);
+	ucom->sc_ibufsize = 2048;
+	// ucom->sc_ibufsize = UGETW(ed->wMaxPacketSize);
} else if (UE_GET_DIR(ed->bEndpointAddress) == UE_DIR_OUT &&
    UE_GET_XFERTYPE(ed->bmAttributes) == UE_BULK) {
	ucom->sc_bulkout_no = ed->bEndpointAddress;
-	ucom->sc_obufsize = UGETW(ed->wMaxPacketSize);
+	ucom->sc_obufsize = 2048;
+	// ucom->sc_obufsize = UGETW(ed->wMaxPacketSize);
}
}
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: good/best practices for gmirror and gjournal on a pair of disks?
George Hartzell wrote: I've been running many of my systems for some time now using gmirror on a pair of identical disks, as described by Ralf at: http://people.freebsd.org/~rse/mirror/ Each disk has a single slice that covers almost all of the disk. These slices are combined into the gmirror device (gm0), which is then carved up by bsdlabel into gm0a (/), gm0b (swap), gm0d (/var), gm0e (/tmp), and gm0f (/usr). My latest machine is using Seagate 1TB disks so I thought I should add gjournal to the mix to avoid ugly fsck's if/when the machine doesn't shut down cleanly. I ended up just creating a gm0f.journal and using it for /usr, which basically seems to be working. I'm left with a couple of questions though:

- I've read in the gjournal man page that when it is "... configured on top of gmirror(8) or graid3(8) providers, it also keeps them in a consistent state..." I've been trying to figure out if this simply falls out of how gjournal works or if there's explicit collusion with gmirror/graid3 but can't come up with a satisfactory explanation. Can someone walk me through it? Since I'm only gjournal'ing a portion of the underlying gmirror device I assume that I don't get this benefit?

- I've also read in the gjournal man page "... that sync(2) and fsync(2) system calls do not work as expected anymore." Does this invalidate any of the assumptions made by various database packages such as postgresql, sqlite, berkeley db, etc. about if/when/whether their data is safely on the disk?

- What's the cleanest gjournal adaptation of rse's two-disk-mirror-everything setup that would be able to avoid tedious gmirror syncs? The best I've come up with is to do two slices per disk, combine the slices into a pair of gmirror devices, bsdlabel the first into gm0a (/), gm0b (swap), gm0d (/var) and gm0e (/tmp) and bsdlabel the second into a gm1f which gets a gjournal device. Alternatively, would it work and/or make sense to give each disk a single slice, combine them into a gmirror, put a gjournal on top of that, then use bsdlabel to slice it up into partitions? Is anyone using gjournal and gmirror for all of the system on a pair of disks in some other configuration? Thanks, g. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"

I am pasting below the instructions I would use to convert a recently installed system with only / (root) and swap to be using gmirror+gjournal. It is in mediawiki markup format so it could be pasted into one if desired. I based my gmirror steps on the instructions from http://people.freebsd.org/~rse/mirror/ so that's why some of the words sound familiar. I also have similar instructions for setting up a gmirrored da0s1a and da0s1b alongside a zfs mirror containing the rest. I decided to journal /usr /var /tmp and leave / as a standard UFS partition because it is so small, fsck doesn't take long anyway and hopefully it doesn't get written to enough to cause damage by an abrupt reboot. Because I'm not journaling the root partition, I chose to ignore the possibility of gjournal marking the mirror clean. Sudden reboots don't happen enough on servers for me to care. And all my servers got abruptly rebooted this Sunday and they all came up fine :) I believe gjournal uses 1G for journal (2x512) which seemed to be sufficient on all of the systems where I have used the default, but I quickly found that using a smaller journal is a bad idea and leads to panics that I was unable to avoid with tuning.
Considering 1G was such a close value, I chose to go several times above the default journal size (disk is cheap and I want to be sure) but I ran into problems using gjournal label -s (size) rejecting my sizes or wrapping the value around to something too low. As a workaround I chose to use a separate partition for each journal. I quickly ran out of partitions in a bsd disklabel so I decided to partition each disk into two slices; the first for data and the second for journals. This also made it easier to line up disk devices so they made more sense as a pair, for example: gm0s1d(data) + gm0s2d(journal) = /usr. I will note that if you accidentally put a gjournal label in the 'wrong' spot on your disk, you might make a tough situation for yourself getting rid of it. I have had plenty of times where I applied a gjournal label, discovered something unideal with it, but every time I did 'gjournal stop foo' the label would automatically get detected as a child of a different part of the disk because it could be seen and I could not unload it. That is part of why I use -h for gjournal label, and use slices+partitions, and the first partition is at offset 16, some of which may have been for gmirror's sake too. ==Softwa
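For what it's worth, the core of that layout boils down to something like the following untested sketch; the provider names (gm0s1d for data, gm0s2d for its journal) are just examples of the pairing described above, so substitute your own:

# Label the data provider with a separate journal provider, hardcoding names (-h):
gjournal label -h /dev/mirror/gm0s1d /dev/mirror/gm0s2d
# Create the filesystem with the gjournal flag set:
newfs -J /dev/mirror/gm0s1d.journal
# Mount it async as gjournal(8) suggests, e.g. this line in /etc/fstab:
# /dev/mirror/gm0s1d.journal  /usr  ufs  rw,async  2  2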
Re: Disk access/MPT under ESX3.5
On Mon, May 19, 2008 at 11:02:31AM +0200, Daniel Ponticello wrote: Hello, monitor# camcontrol negotiate 0:0 -W 16 Current Parameters: (pass0:mpt0:0:0:0): sync parameter: 0 (pass0:mpt0:0:0:0): offset: 0 (pass0:mpt0:0:0:0): bus width: 8 bits (pass0:mpt0:0:0:0): disconnection is enabled (pass0:mpt0:0:0:0): tagged queueing is enabled monitor# dd if=/dev/zero of=/var/tmp/dead.file bs=1024k count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 32.421679 secs (32341817 bytes/sec) monitor# dd if=/dev/zero of=/var/tmp/dead.file bs=1024k count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 20.355797 secs (51512402 bytes/sec) No improvements. But it looks like it did not renegotiate the transfer data rate, which for some odd reason is set at 3.3MB/s instead of 320MB/s. I made some tests using Linux 2.6.18 (Debian): debiantest:/home/daniel# uname -a Linux debiantest 2.6.18-6-686 #1 SMP Sun Feb 10 22:11:31 UTC 2008 i686 GNU/Linux scsi0 : ioc0: LSI53C1030, FwRev=h, Ports=1, MaxQ=128, IRQ=169 Vendor: VMware Model: Virtual disk Rev: 1.0 Type: Direct-Access ANSI SCSI revision: 02 target0:0:0: Beginning Domain Validation target0:0:0: Domain Validation skipping write tests target0:0:0: Ending Domain Validation target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU RDSTRM RTI WRFLOW PCOMP (6.25 ns, offset 127) debiantest:/home/daniel# dd if=/dev/zero of=dead.file bs=1024k count=1000 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 5.01316 seconds, 209 MB/s For the Linux test, are you sure it didn't cache part of the write before returning? You may need to add some syncs and make it part of the elapsed time. Just checking because this seems to be my experience. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
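For example, something like this makes the Linux number include the flush; conv=fsync is GNU dd, and the second form should behave the same on both systems:

dd if=/dev/zero of=dead.file bs=1024k count=1000 conv=fsync
# or, portable on both Linux and FreeBSD, time the write plus an explicit sync:
time sh -c 'dd if=/dev/zero of=dead.file bs=1024k count=1000 && sync'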
Re: How go back from X.Y-RELEASE-pZ to X.Y-RELEASE?
On 11/23/2012 6:22 AM, Peter Olsson wrote: We are currently using cvs for both source and ports. I have begun changing to portsnap for ports, and I would also like to try changing at least some of our servers to freebsd-update. But all servers have been patched, using either RELENG_8_3 or RELENG_9_0 as cvs tag. I need to revert them to their respective RELEASE to be able to use freebsd-update. Complete reinstall from e.g. CD is not an option, and I don't want to upgrade to a newer RELEASE at the moment. Can I change the cvs tags to RELENG_8_3_0_RELEASE or RELENG_9_0_0_RELEASE, and then build/install world and kernel as usual? Or will that method cause problems for the system or the installed ports? Thanks! -- Peter Olsson p...@leissner.se That is what I would do. Certainly try it on a non-critical system first, and take proper consideration for the potential vulnerabilities that will come back until freebsd-update succeeds. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
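Roughly what that amounts to, assuming csup and a supfile along the usual lines; the supfile path is an example, and double-check the tag name before building:

# In the src supfile (e.g. /root/src-supfile), change the tag line:
#   *default release=cvs tag=RELENG_8_3
# to:
#   *default release=cvs tag=RELENG_8_3_0_RELEASE
csup /root/src-supfile
cd /usr/src
make buildworld && make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
# reboot to single user, then:
make installworld && mergemaster -Ui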
Samsung SSD 840 PRO fails to probe
Hello, My co-worker ordered a Samsung 840 PRO series SSD for his desktop but we found 9.0-rel would not probe it and 9.1-rc3 shows some errors. I got past the problem with a workaround of disabling AHCI mode in the BIOS which drops it to IDE mode and it detects fine, although it runs a little slower. Is there something I can try to make it probe properly in AHCI mode? We also tried moving it to the SATA data and power cables from the working SATA HD so I don't think it is the port or controller driver. The same model motherboard from another computer did the same thing. Thanks.

dmesg line when it is working:
ada0: ATA-9 SATA 3.x device

dmesg lines when it is not working: (hand transcribed from a picture)
(aprobe0:ahcich0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 05 00
(aprobe0:ahcich0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich0:0:0): RES: 51 04 00 00 00 40 00 00 00 00 00
(aprobe0:ahcich0:0:0): Retrying command
(aprobe0:ahcich0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 05 00
(aprobe0:ahcich0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich0:0:0): RES: 51 04 00 00 00 40 00 00 00 00 00
(aprobe0:ahcich0:0:0): Error 5, Retries exhausted
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Samsung SSD 840 PRO fails to probe
On 11/26/12 14:27, Alexander Motin wrote: Hi. On 26.11.2012 20:51, Adam McDougall wrote: My co-worker ordered a Samsung 840 PRO series SSD for his desktop but we found 9.0-rel would not probe it and 9.1-rc3 shows some errors. I got past the problem with a workaround of disabling AHCI mode in the BIOS which drops it to IDE mode and it detects fine, although runs a little slower. Is there something I can try to make it probe properly in AHCI mode? We also tried moving it to the SATA data and power cables from the working SATA HD so I don't think it is the port or controller driver. The same model motherboard from another computer did the same thing. Thanks. dmesg line when it is working: ada0: ATA-9 SATA 3.x device dmesg lines when it is not working: (hand transcribed from a picture) (aprobe0:ahcich0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 05 00 (aprobe0:ahcich0:0:0): CAM status: ATA Status Error (aprobe0:ahcich0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT ) (aprobe0:ahcich0:0:0): RES: 51 04 00 00 00 40 00 00 00 00 00 (aprobe0:ahcich0:0:0): Retrying command (aprobe0:ahcich0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 05 00 (aprobe0:ahcich0:0:0): CAM status: ATA Status Error (aprobe0:ahcich0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT ) (aprobe0:ahcich0:0:0): RES: 51 04 00 00 00 40 00 00 00 00 00 (aprobe0:ahcich0:0:0): Error 5, Retries exhausted I believe that is SSD's firmware bug. Probably it declares support for SATA Asynchronous Notifications in its IDENTIFY data, but returns error on attempt to enable it. Switching controller to legacy mode disables that functionality and so works as workaround. Patch below should workaround the problem from the OS side:

--- ata_xpt.c   (revision 243561)
+++ ata_xpt.c   (working copy)
@@ -745,6 +745,14 @@ probedone(struct cam_periph *periph, union ccb *do
 		goto noerror;

 	/*
+	 * Some Samsung SSDs report supported Asynchronous Notification,
+	 * but return ABORT on attempt to enable it.
+	 */
+	} else if (softc->action == PROBE_SETAN &&
+	    status == CAM_ATA_STATUS_ERROR) {
+		goto noerror;
+
+	/*
 	 * SES and SAF-TE SEPs have different IDENTIFY commands,
 	 * but SATA specification doesn't tell how to identify them.
 	 * Until better way found, just try another if first fail.

Thanks for the prompt response and patch, that worked! ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: time issues and ZFS
On 01/22/13 07:27, Julian Stecklina wrote: Thus spake Daniel Braniss : In the meantime here is some info: Intel(R) Xeon(R) CPU E5645: running with no problems LAPIC(600) HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100) RTC(0) Intel(R) Xeon(R) CPU X5550: this is the problematic one, at least for the moment HPET(450) HPET1(440) HPET2(440) HPET3(440) LAPIC(400) i8254(100) RTC(0) Does anyone know why the LAPIC is given a lower priority than HPET in this case? If you have an LAPIC, it should always be preferred to HPET, unless something is seriously wrong with it... Julian ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" This may help: "Problem with LAPIC timer is that it stops working when CPU goes to C3 or deeper idle state. These states are not enabled by default, so unless you enabled them explicitly, it is safe to use LAPIC. In any case present 9-STABLE system should prevent you from using unsafe C-state if LAPIC timer is used. From all other perspectives LAPIC is preferable, as it is faster and easier to operate than HPET. Latest CPUs fixed the LAPIC timer problem, so I don't think that switching to it will be pessimistic in the foreseeable future. -- Alexander Motin" ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
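To see whether deep C-states are actually in play on a given box before worrying about the LAPIC timer, something like this should do it (a cx_lowest of C1 means C3 is never entered):

sysctl dev.cpu.0.cx_supported dev.cpu.0.cx_usage
sysctl hw.acpi.cpu.cx_lowest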
Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems
On 01/22/13 05:19, Kai Gallasch wrote: Hi. (I am sending this to the "stable" list, because it may be kernel related.) On 9.1-RELEASE I am witnessing lockups of the openldap slapd daemon. The slapd runs for some days and then hangs, consuming high amounts of CPU. In this state slapd can only be restarted by SIGKILL.

# procstat -kk 71195
  PID    TID COMM   TDNAME KSTACK
71195 149271 slapd  -      mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678 __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7

On UFS2 slapd runs fine, without showing the error. Has anyone else running openldap-server on FreeBSD 9.1 inside a jail seen similar problems? I have seen openldap spin the cpu and even run out of memory to get killed on some of our test systems running ~9.1-rel with zfs. No jails. I'm not sure what would have put load on our test systems other than nightly scripts. I had to focus my attention on other servers so I don't have one to inspect at this point, but I won't be surprised if I see this in production. Thanks for the tip about it being ZFS related, and I'll let you know if I find anything out. This is mostly a "me too" reply. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Why does poudriere always rebuild nginx and GraphicsMagick13?
On Fri, Feb 15, 2013 at 12:37:19AM +0100, Rainer Duffner wrote: Am 12.02.2013 um 23:11 schrieb Baptiste Daroussin : > On Tue, Feb 12, 2013 at 10:59:28PM +0100, Rainer Duffner wrote: >> Hi, >> >> poudriere 2.2 here, running on 9.1-amd64 >> >> Of the 730-ish ports, whenever I run a build, it always rebuilds the above two ports. >> Even if nothing changed. >> Options changed, deleting: GraphicsMagick-nox11-1.3.16_1.txz >> Options changed, deleting: nginx-1.2.6,1.txz Somehow, it thinks the options have changed. Maybe, the options-file has an error? Regards, Rainer Try deleting the options file for each and run poudriere twice to test. I had the same problem with mailman and it turned out I was missing a required but not enforced option due to another option I had selected. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
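Concretely, something along these lines; the options directory layout and the port origins are assumptions, so adjust them to your poudriere setup:

rm -rf /usr/local/etc/poudriere.d/options/www_nginx
rm -rf /usr/local/etc/poudriere.d/options/graphics_GraphicsMagick13
poudriere options www/nginx
poudriere options graphics/GraphicsMagick13
# then run the bulk build twice and see whether the rebuilds stop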
Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout
On 05/29/13 10:21, Oliver Fromme wrote: > Steven Hartland wrote: > > Have you checked your sata cables and psu outputs? > > > > Both of these could be the underlying cause of poor signalling. > > I can't easily check that because it is a cheap rented > server in a remote location. > > But I don't believe it is bad cabling or PSU anyway, or > otherwise the problem would occur intermittently all the > time if the load on the disks is sufficiently high. > But it only occurs at tags=3 and above. At tags=2 it does > not occur at all, no matter how hard I hammer on the disks. > > At the moment I'm inclined to believe that it is either > a bug in the HDD firmware or in the controller. The disks > aren't exactly new, they're 400 GB Samsung ones that are > several years old. I think it's not uncommon to have bugs > in the NCQ implementation in such disks. > > The only thing that puzzles me is the fact that the problem > also disappears completely when I reduce the SATA rev from > II to I, even at tags=32. > > Best regards >Oliver > > Jeremy Chadwick knows of some hardware faults with IXP600/700; there may be more information in the freebsd-fs mailing list archives, or you can discuss it with him: http://docs.freebsd.org/cgi/mid.cgi?20130414194440.GB38338 That email mentions port multipliers but the problems may extend beyond that. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
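For the archives, the two knobs being talked about here look roughly like this; the device and channel numbers are examples, not taken from the original report:

# limit the NCQ/tag depth on a suspect disk at runtime:
camcontrol tags ada0 -N 2
# or pin the channel to SATA rev 1 (1.5Gbps) via /boot/loader.conf and reboot:
# hint.ahcich.0.sata_rev="1"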
Re: ZFS pool with 4k sector size
On 08/22/13 04:23, Trond Endrestøl wrote: On Thu, 22 Aug 2013 11:40+0400, Michael BlackHeart wrote: Hello, I'd like to know what is the best way to convert my pool from 512b sector size to 4k sector size. Hardware: 2 x2Tb WD Green with 4k physical sector size Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s) Device Model: WDC WD20EARX-00PASB0 Serial Number: WD-WCAZA8280575 LU WWN Device Id: 5 0014ee 206032063 Firmware Version: 51.0AB51 User Capacity: 2 000 398 934 016 bytes [2,00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS (minor revision not indicated) SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s) Local Time is: Thu Aug 22 11:33:16 2013 MSK SMART support is: Available - device has SMART capability. SMART support is: Enabled They're running in a mirror pool: storage state: ONLINE scan: resilvered 48K in 0h0m with 0 errors on Thu Jul 25 19:18:01 2013 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada3 ONLINE 0 0 0 ada7 ONLINE 0 0 0 zdb info storage: version: 5000 name: 'storage' state: 0 txg: 1292269 pool_guid: 18442220950447532371 hostid: 708219113 hostname: 'diablo.miekoff.local' vdev_children: 1 vdev_tree: type: 'root' id: 0 guid: 18442220950447532371 create_txg: 4 children[0]: type: 'mirror' id: 0 guid: 4289294206539029185 metaslab_array: 33 metaslab_shift: 34 ashift: 9 asize: 2000394125312 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 16348588566764560218 path: '/dev/ada3' phys_path: '/dev/ada3' whole_disk: 1 DTL: 95 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 7655198429866445090 path: '/dev/ada7' phys_path: '/dev/ada7' whole_disk: 1 DTL: 97 create_txg: 4 features_for_read: As you see ashift is 9 (512b). I know a common solution with gnop and export-mport pool, but how should I manage mirror this way? Should I create a mirror on gnop-ed devices and then export-import? I'm afraid you're out of luck. You need to backup the data somehow, recreate the pool with ashift=12, and restore the data. A better option would be to buy a couple of new drives, assuming you can connect them to the current system, create a new mirrored pool with ashift=12, and transfer the data using a recursive set of snapshots on the current pool and a ZFS send stream sent to the new pool. You can zpool detach storage ada7, gnop create -S 4k ada7, zpool create storage2 ada7.nop, then copy all of your data into storage2 manually. When done, destroy the original zpool then zpool attach storage2 ada7.nop ada3 which will resilver ada7 onto ada3 to complete the new mirror. Then I'd zpool export storage2, destroy the nop or just reboot, and re-import storage2 as storage if you wish to rename it. The risk is losing all of your data if there is a problem while you only have one valid copy. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
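Spelled out as commands, the procedure from the reply above looks roughly like this (same device names as in the post, and the usual warning applies: there is only one copy of the data while it is in progress):

zpool detach storage ada7
gnop create -S 4096 ada7
zpool create storage2 ada7.nop
# ... copy everything from storage to storage2 (rsync, or zfs send | zfs recv) ...
zpool destroy storage
zpool attach storage2 ada7.nop ada3     # resilvers the data onto ada3
zpool export storage2
gnop destroy ada7.nop                   # or simply reboot
zpool import storage2 storage           # optional: rename back to 'storage'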
Re: Is there a linux_base available for RELENG_9?
On 03/09/2015 20:44, Chris H wrote: > I performed an svn update for both src (r279796), > and ports (r380829) last night. building/installing > world/kernel, went as one would hope. Upgrading ports > was a different story. Given this box has an nVidia card. > I usually start by upgrading emulators/linux_base; which > according to UPDATING; meant linux_base-f10 --> linux_base-c6. > I deinstalled x11/nvidia-driver, followed by > emulators/linux_base-f10. I then attempted to make install > emulators/linux_base-c6, which resulted in a message > that it wasn't supported. What was the exact error? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SSH hung with an OpenSSH_6.6.1p1 --> OpenSSH_5.8p2_hpn13v11
On 03/26/2015 21:25, Wu ShuKun wrote: > Okay > % ssh -v -o "KexAlgorithms diffie-hellman-group-exchange-sha1" 10.41.172.19 > OpenSSH_6.6.1p1, OpenSSL 1.0.1l-freebsd 15 Jan 2015 > debug1: Reading configuration data /etc/ssh/ssh_config > debug1: Connecting to 10.41.172.19 [10.41.172.19] port 22. > debug1: Connection established. > debug1: identity file /home/wsk/.ssh/id_rsa type -1 > debug1: identity file /home/wsk/.ssh/id_rsa-cert type -1 > debug1: identity file /home/wsk/.ssh/id_dsa type -1 > debug1: identity file /home/wsk/.ssh/id_dsa-cert type -1 > debug1: identity file /home/wsk/.ssh/id_ecdsa type -1 > debug1: identity file /home/wsk/.ssh/id_ecdsa-cert type -1 > debug1: identity file /home/wsk/.ssh/id_ed25519 type -1 > debug1: identity file /home/wsk/.ssh/id_ed25519-cert type -1 > debug1: Enabling compatibility mode for protocol 2.0 > debug1: Local version string SSH-2.0-OpenSSH_6.6.1_hpn13v11 FreeBSD-20140420 > debug1: Remote protocol version 2.0, remote software version > OpenSSH_5.8p2_hpn13v11 FreeBSD-20110503 > debug1: match: OpenSSH_5.8p2_hpn13v11 FreeBSD-20110503 pat OpenSSH_5* > compat 0x0c00 > debug1: SSH2_MSG_KEXINIT sent > debug1: SSH2_MSG_KEXINIT received > debug1: kex: server->client aes128-ctr hmac-md5 none > debug1: kex: client->server aes128-ctr hmac-md5 none > debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<3072<8192) sent > debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP > Connection closed by 10.41.172.19 > % Can you try stopping sshd on the server side and run /usr/sbin/sshd -Dd then SSH in and see if the server provides a reason for disconnecting the client? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ports/base ntpd rc.d script with WITHOUT_NTP=yes
On 04/08/2015 12:48, Matt Smith wrote: > Hi, > > I just upgraded my server to 10.1-STABLE r281264 and when I ran > mergemaster it told me that /etc/rc.d/ntpd was stale and would I like to > delete it. It's never done this before. I've figured out it's because I > have WITHOUT_NTP=yes in /etc/src.conf. I did this because I use the > ports version of ntpd and thus wanted to remove the base installed > version so that when I run commands like ntpq it's using my possibly > newer port installed version and not the older one. > > However, the port version doesn't have its own rc script. It usually > uses the base version with ntpd_program and ntpd_config set. With this > latest change it means I have to have the base version installed again. > Is it possible to get the port version to have its own rc script? > net/openntpd has an rc script if you don't mind switching. It is very very simple to configure. Ideally the original problem should be solved too but I ran into the same problem with Kerberos. I didn't get anywhere in the bug report where I argued the system scripts still worked fine except for recent changes in them causing a regression and failure with the port. Both situations could probably use a contributed patch to make an rc script. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
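As a starting point, an untested sketch of what a standalone rc.d script for the port's ntpd might look like, dropped in as /usr/local/etc/rc.d/ntpd; the paths and flags are assumptions to be checked against the installed ntpd(8):

#!/bin/sh
# PROVIDE: ntpd
# REQUIRE: DAEMON FILESYSTEMS
. /etc/rc.subr

name="ntpd"
rcvar="ntpd_enable"
command="/usr/local/sbin/ntpd"
pidfile="/var/run/ntpd.pid"

load_rc_config $name
: ${ntpd_enable:="NO"}
: ${ntpd_config:="/usr/local/etc/ntp.conf"}
command_args="-c ${ntpd_config} -p ${pidfile}"

run_rc_command "$1"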
Re: Getting going with a new Dell 7810
On 06/16/2015 12:55, Richard Kuhns wrote: > Greetings all, > > I've just received a new Dell Precision 7810. I've installed FreeBSD > 10.1 (UEFI boot), checked out sources, built world & kernel and am now > running r284449. So far, so good. > > The problem is Xorg. I'm running the latest Xorg in ports; I just did a > 'make install clean' in /usr/ports/x11/xorg with no errors. > > The display card is a FirePro W4100. lspci shows: > > 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. > [AMD/ATI] Cape Verde GL [FirePro W4100] > If it is brand new, it is probably not supported and probably won't be for a while. Please see https://wiki.freebsd.org/Graphics for a list of devices which does include your Radeon HD 4670. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 4TB Western Digital My Book 1230 USB hard drive not working on 10.2
On 08/02/2015 21:22, Paul Mather wrote: > I have a 4TB external USB drive (Western Digital My Book 1230) that I am > trying to use under FreeBSD/amd64 10.2 (10.2-PRERELEASE #0 r286052: Wed Jul > 29 20:59:39 EDT 2015). This system has a MSI 760GMA-P34 (FX) motherboard. > > The drive probes unreliably when plugged in to a USB 3 port. It reliably > probes when plugged into a USB 2 port. However, it works in neither cases. > Attempting to dd from the drive results in a "dd: /dev/da0: Invalid argument". > > When plugged in to a USB 2 port, this is how the drive is probed: > > ugen6.2: at usbus6 > umass0: on > usbus6 > umass0: SCSI over Bulk-Only; quirks = 0xc001 > umass0:9:0:-1: Attached to scbus9 > da0 at umass-sim0 bus 0 scbus9 target 0 lun 0 > da0: Fixed Direct Access SPC-4 SCSI device > da0: Serial Number 57434334453056594A4C4A4A > da0: 40.000MB/s transfers > da0: 3815415MB (976746240 4096 byte sectors: 255H 63S/T 60799C) > da0: quirks=0x2 > ses0 at umass-sim0 bus 0 scbus9 target 0 lun 1 > ses0: Fixed Enclosure Services SPC-4 SCSI device > ses0: Serial Number 57434334453056594A4C4A4A > ses0: 40.000MB/s transfers > ses0: SCSI-3 ENC Device > > When booting with it connected to a USB 3 port, this is what is output: > > xhci0: mem 0xfeafe000-0xfeaf irq 18 > at device 0.0 on pci3 > xhci0: 64 bytes context size, 64-bit DMA > usbus0 on xhci0 > [[...]] > ohci0: mem 0xfe7fe000-0xfe7fefff irq > 16 at device 18.0 on pci0 > usbus1 on ohci0 > ohci1: mem 0xfe7fd000-0xfe7fdfff irq > 16 at device 18.1 on pci0 > usbus2 on ohci1 > ehci0: mem 0xfe7ff800-0xfe7ff8ff > irq 17 at device 18.2 on pci0 > usbus3: EHCI version 1.0 > usbus3 on ehci0 > ohci2: mem 0xfe7fc000-0xfe7fcfff irq > 18 at device 19.0 on pci0 > usbus4 on ohci2 > ohci3: mem 0xfe7f7000-0xfe7f7fff irq > 18 at device 19.1 on pci0 > usbus5 on ohci3 > ehci1: mem 0xfe7ff400-0xfe7ff4ff > irq 19 at device 19.2 on pci0 > usbus6: EHCI version 1.0 > usbus6 on ehci1 > [[...]] > ohci4: mem 0xfe7f6000-0xfe7f6fff irq > 18 at device 20.5 on pci0 > usbus7 on ohci4 > [[...]] > usbus0: 5.0Gbps Super Speed USB v3.0 > usbus1: 12Mbps Full Speed USB v1.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > usbus4: 12Mbps Full Speed USB v1.0 > usbus5: 12Mbps Full Speed USB v1.0 > usbus6: 480Mbps High Speed USB v2.0 > usbus7: 12Mbps Full Speed USB v1.0 > ugen7.1: at usbus7 > uhub0: on usbus7 > ugen6.1: at usbus6 > uhub1: on usbus6 > ugen5.1: at usbus5 > uhub2: on usbus5 > ugen4.1: at usbus4 > uhub3: on usbus4 > ugen3.1: at usbus3 > uhub4: on usbus3 > ugen2.1: at usbus2 > uhub5: on usbus2 > ugen1.1: at usbus1 > uhub6: on usbus1 > ugen0.1: <0x1912> at usbus0 > uhub7: <0x1912 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 > uhub0: 2 ports with 2 removable, self powered > uhub2: 3 ports with 3 removable, self powered > uhub3: 3 ports with 3 removable, self powered > uhub5: 3 ports with 3 removable, self powered > uhub6: 3 ports with 3 removable, self powered > uhub7: 8 ports with 8 removable, self powered > [[...]] > Root mount waiting for: usbus6 usbus3 usbus0 > Root mount waiting for: usbus6 usbus3 usbus0 > uhub1: 6 ports with 6 removable, self powered > uhub4: 6 ports with 6 removable, self powered > ugen0.2: at usbus0 > umass0: on > usbus0 > umass0: SCSI over Bulk-Only; quirks = 0x8000 > Root mount waiting for: usbus0 > [[...]] > Root mount waiting for: usbus0 > Root mount waiting for: usbus0 > umass0: Get Max Lun not supported (USB_ERR_TIMEOUT) > umass0:9:0:-1: Attached to scbus9 > [[...]] > da0 at umass-sim0 bus 0 scbus9 target 0 lun 
0 > da0: < > Fixed Direct Access SCSI device > da0: Serial Number WCC4E0VYJLJJ > da0: 400.000MB/s transfers > da0: 3815415MB (976746240 4096 byte sectors: 255H 63S/T 60799C) > da0: quirks=0x2 > > > This external USB drive works fine under OSX Yosemite and also when plugged > in to my Raspberry Pi 2 running OSMC. > > Is there anyone using this model of USB drive under FreeBSD/amd64 10.2? Is > it a matter of finding the correct quirk to get it working? > > Cheers, > > Paul. The trouble detecting is probably related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196332 I have trouble with my 2T WD "My passport" but I did not create a bug report because I found the one above which didn't get a reply. As far as "dd: /dev/da0: Invalid argument", did you supply a bs= argument to dd? I noticed: "da0: 3815415MB (976746240 4096 byte sectors" which means it is reporting as a 4k drive and it will reject 512 byte IO requests like dd would use without bs=. I ran into that issue when I was testing some old FC drives formatted for 520 byte sectors for ECC instead of 512. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
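In other words, since the drive is exporting 4096-byte sectors, something like this should succeed where a bare dd (which defaults to 512-byte blocks) gets EINVAL:

diskinfo -v da0                            # confirms the sector size the device reports
dd if=/dev/da0 of=/dev/null bs=4k count=1000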
Re: [CTF] pkg 1.6.0
On 09/22/2015 03:20, Ranjan1018 . wrote: > 2015-09-21 23:27 GMT+02:00 Baptiste Daroussin : > >> Hi all, >> >> We are about to release pkg 1.6.0. pkg-devel has been updated to 1.5.99.13 >> aka >> 1.6.0 rc3 that we hope will become the new pkg 1.6.0 by the end of the >> week >> (release planned for Saturday 26th of September if no important issues are >> raised) > running version 1.5.3 I have this error message: > # pkg upgrade > Updating FreeBSD repository catalogue... > FreeBSD repository is up-to-date. > All repositories are up-to-date. > Checking for upgrades (40 candidates): 100% > Processing candidates (40 candidates): 100% > Checking integrity... done (1 conflicting) > pkg: Cannot solve problem using SAT solver: > dependency rule: package Thunar(l) depends on: > xfce4-tumbler(r)xfce4-tumbler(l) > upgrade rule: upgrade local xfce4-tumbler-0.1.31_1 to remote > xfce4-tumbler-0.1.31_1 > cannot install package xfce4-tumbler, remove it from request? [Y/n]: > pkg: cannot find xfce4-tumbler in the request > pkg: cannot solve job using SAT solver > Checking integrity... done (0 conflicting) > Your packages are up to date. > > With this version I have been able to update the packages. Same here, I ran into a conflict with Thunar on at least two computers with 1.5.6: pkg: Cannot solve problem using SAT solver: dependency rule: package Thunar(l) depends on: xfce4-tumbler(r)xfce4-tumbler(l) upgrade rule: upgrade local xfce4-tumbler-0.1.31_1 to remote xfce4-tumbler-0.1.31_1 cannot install package xfce4-tumbler, remove it from request? [Y/n]: ^C I upgraded to 1.5.99.13 without any problems and it handles Thunar fine without any workarounds. Just some extra warnings the first time when upgrading to 1.5.99 from my own repo: # pkg upgrade Updating pkg-desktop repository catalogue... Fetching meta.txz: 100% 260 B 0.3kB/s 00:01 Fetching packagesite.txz: 100% 217 KiB 222.0kB/s 00:01 Processing entries: 0% pkg: Skipping unknown key 'messages' Processing entries: 2% pkg: Skipping unknown key 'messages' Processing entries: 4% pkg: Skipping unknown key 'messages' Processing entries: 5% pkg: Skipping unknown key 'messages' Processing entries: 6% pkg: Skipping unknown key 'messages' pkg: Skipping unknown key 'messages' ... Processing entries: 100% pkg-desktop repository update completed. 914 packages processed. New version of pkg detected; it needs to be installed first. pkg: Skipping unknown key 'messages' Checking integrity... done (0 conflicting) The following 1 package(s) will be affected (of 0 checked): Installed packages to be UPGRADED: pkg: 1.5.6 -> 1.5.99.13 ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: nfs lockd errors after NetApp software upgrade.
Try changing bool_t do_tcp = FALSE; to TRUE in /usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel and try again. I think this makes it match Linux client behavior. I suspect I ran into the same issue as you. I do think I used nolockd as a workaround temporarily. I can provide some more details if it works. On 12/19/19 9:21 AM, Daniel Braniss wrote: > > >> On 19 Dec 2019, at 16:09, Rick Macklem wrote: >> >> Daniel Braniss wrote: >> [stuff snipped] >>> all mounts are nfsv3/tcp >> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't >> know when >> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times. > can the replay cache have any influence here? I tend to remember way back > issues > with it, >> >> To me, it looks like a network configuration issue. > that was/is my gut feelings too, but, as far as we can tell, nothing has > changed in the network infrastructure, > the problems appeared after the NetAPP’s software was updated, it was working > fine till then. > > the problems are also happening on freebsd 12.1 > >> You could capture packets (maybe when a client first starts rpc.statd and >> rpc.lockd) >> and then look at them in wireshark. I'd disable startup of rpc.lockd and >> rpc.statd >> at boot for a test client and then run something like: >> # tcpdump -s 0 -w out.pcap host >> - and then start rpc.statd and rpc.lockd >> Then I'd look at out.pcap in wireshark (much better at decoding this stuff >> than >> tcpdump). I'd look for things like different reply IP addresses from the >> Netapp, >> which might confuse this tired old NLM protocol Sun devised in the mid-1980s. >> > it’s going to be an interesting week end :-( > >>> the error is also appearing on freebsd-11.2-stable, I’m now checking if >>> it’s also >>> happening on 12.1 >>> btw, the NetApp version is 9.3P17 >> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even >> try to implement it, because I knew the protocol was badly broken) and I >> avoid >> fiddling with it. As such, it won't have changed much since around FreeBSD7. > and we haven’t had any issues with it for years, so you must have done > something good > > cheers, > danny > >> >> rick >> >> cheers, >>danny >> >>> rick >>> >>> Cheers >>> >>> Richard >>> (NetApp admin) >>> >>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss >>> mailto:da...@cs.huji.ac.il>> wrote: >>> >>> On 18 Dec 2019, at 16:55, Rick Macklem mailto:rmack...@uoguelph.ca>> wrote: Daniel Braniss wrote: > Hi, > The server with the problems is running FreeBSD 11.1 stable, it was > working fine for >several months, > but after a software upgrade of our NetAPP server it’s reporting many > lockd errors >and becomes catatonic, > ...
> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not > responding > Dec 18 13:11:45 moo-09 last message repeated 7 times > Dec 18 13:12:55 moo-09 last message repeated 8 times > Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive > again > Dec 18 13:13:10 moo-09 last message repeated 8 times > Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen > queue >overflow: 194 already in queue awaiting acceptance (1 occurrences) > Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen > queue >overflow: 193 already in queue awaiting acceptance (3957 > occurrences) > Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen > queue >overflow: 193 already in queue awaiting acceptance … Seems like their software upgrade didn't improve handling of NLM RPCs? Appears to be handling RPCs slowly and/or intermittently. Note that no one tests it with IPv6, so at least make sure you are still using IPv4 for the mounts and try and make sure IP broadcast works between client and Netapp. I think the NLM and NSM (rpc.statd) still use IP broadcast sometimes. >>> we are ipv4 - we have our own class c :-) Maybe the network guys can suggest more w.r.t. why, but as I've stated before, the NLM is a fundamentally broken protocol which was never published by Sun, so I suggest you avoid using it if at all possible. >>> well, at the moment the ball is on NetAPP court, and switching to NFSv4 at >>> the moment is out of the question, it’s >>> a production server used by several thousand students. >>> - If the locks don't need to be seen by other clients, you can just use the "nolockd" mount option. or - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers should support NFSv4.1, which is a much better protocol that NFSv4.0. Good luck with it, rick >>> thanks >>> danny >>> … any ideas? thanks, danny ___ >>
Re: nfs lockd errors after NetApp software upgrade.
On 12/22/19 12:01 PM, Rick Macklem wrote: > Well, I've noted the flawed protocol. Here's an example (from my limited > understanding of these protocols, where there has never been a published > spec) : > - The NLM supports a "blocking lock request" that goes something like this... >- client requests lock and is willing to wait for it >- if server has a conflicting lock on the file, it replies "I'll acquire > the lock for > you when I can and let you know". > --> When the conflicting lock is released, the server acquires the lock > and does > a callback (server->client RPC) to tell the client it now has the > lock. > You don't have to think about this for long to realize that any network > unreliability > or partitioning could result in trouble. > The kernel RPC layer may do some retries of the RPCs (this is controlled by > the > parameters set for the RPC), but at some point the protocol asks the NSM > (rpc.statd) if the machine is "up" and then uses the NSM's answer to deal > with it. > (The NSM basically pokes other systems and notes they are "up" if they get > replies to these pokes. It uses IP broadcast at some point.) > > Now, maybe switching to TCP will make the RPCs reliable enough that it will > work, or maybe it won't? (It certainly sounds like the Netapp upgrade is > causing > some kind of network issue, and the NLM doesn't tolerate that well.) > > rick tl;dr I think netapp effectively nerfed UDP lockd performance in newer versions, maybe cluster mode. From my very un-fun experience after migrating our volumes off an older netapp onto a new netapp with flash drives (plenty fast) running Ontap 9.x ("cluster mode"), our typical IO load from idle time IMAP connections was enough to overwhelm the new netapp and drive performance into the ground. The same IO that was perfectly fine on the old netapp. Going into a workday in this state was absolutely not possible. I opened a high priority ticket with netapp, didn't really get anywhere that very long day and settled on nolockd so I could go home and sleep. Both my hunch later and netapp support suggested switching lockd traffic to TCP even though I had no network problems (the old netapp was fine). I think I still run into occasional load issues but the newer netapp OS seemed way more capable of this load when using TCP lockd. Of course they also suggested switching to nfsv4 but I could not seriously entertain validating that type of change for production in less than a day. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
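For completeness, a rough, untested sketch of the change I suggested earlier in this thread for switching the client NLM to TCP; the variable is quoted from the suggestion above, so verify it against your source tree:

# edit /usr/src/sys/nlm/nlm_prot_impl.c and change
#     bool_t do_tcp = FALSE;
# to
#     bool_t do_tcp = TRUE;
cd /usr/src
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
# then reboot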
[vfs_bio] Re: Fatal trap 12: page fault while in kernel mode (with potential cause, fix?)
On Mon, Apr 23, 2007 at 11:55:52AM -0400, Kris Kennaway wrote: On Mon, Apr 23, 2007 at 05:35:47PM +0200, Kai wrote: > On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote: > > On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote: > > > > > > Hello all, > > > > > > We're running into regular panics on our webserver after upgrading > > > from 4.x to 6.2-stable: > > > > Hi all, > > To continue this story, a colleague wrote a small program in C that launches > 40 threads to randomly append and write to 10 files on an NFS mounted > filesystem. > > If I keep removing the files on one of the other machines in a while loop, > the first system panics: > > Fatal trap 12: page fault while in kernel mode > cpuid = 1; apic id = 01 > fault virtual address = 0x34 > fault code = supervisor read, page not present > instruction pointer = 0x20:0xc06bdefa > stack pointer = 0x28:0xeb9f69b8 > frame pointer = 0x28:0xeb9f69c4 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags= interrupt enabled, resume, IOPL = 0 > current process = 73626 (nfscrash) > trap number = 12 > panic: page fault > cpuid = 1 > Uptime: 3h2m14s > > Sounds like a nice denial of service problem. I can hand the program to > developers on request. Please send it to me. Panics are always much easier to get fixed if they come with a test case that developer can use to reproduce it. Kris I have been working on this problem all weekend and I have a strong hunch at this point that it is a result of 1.424 of sys/kern/vfs_bio.c which was between FreeBSD 5.1 and 5.2. This hunch is currently being verified by a system that was cvsupped to code just before 1.424, and it has been running about 7 times longer than the usual time required to crash. I am currently attempting to craft a patch for 6.2 that essentially backs out the change to see if that works, but if this information can help send a FreeBSD developer down the right trail to a proper fix, great. I will follow up with more detailed findings and results tonight or soon. links: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.423;r2=1.424 related to 1.424: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.420&r2=1.421 Commit emails: http://docs.freebsd.org/cgi/mid.cgi?200311150845.hAF8jawU027349 http://docs.freebsd.org/cgi/mid.cgi?20030445.hAB4jbYw093253 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
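For reference, one way to produce the back-out patch, assuming the tree is a real CVS checkout (a cvsup'd tree without CVS/ directories won't work for this; in that case grab the reverse diff via cvsweb instead):

cd /usr/src/sys/kern
cvs diff -u -r 1.424 -r 1.423 vfs_bio.c > /tmp/vfs_bio-backout.diff
patch < /tmp/vfs_bio-backout.diff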
Re: [vfs_bio] Re: Fatal trap 12: page fault while in kernel mode (with potential cause)
On Sun, Jun 24, 2007 at 12:30:20AM -0400, Adam McDougall wrote: On Mon, Apr 23, 2007 at 11:55:52AM -0400, Kris Kennaway wrote: On Mon, Apr 23, 2007 at 05:35:47PM +0200, Kai wrote: > On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote: > > On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote: > > > > > > Hello all, > > > > > > We're running into regular panics on our webserver after upgrading > > > from 4.x to 6.2-stable: > > > > Hi all, > > To continue this story, a colleague wrote a small program in C that launches > 40 threads to randomly append and write to 10 files on an NFS mounted > filesystem. > > If I keep removing the files on one of the other machines in a while loop, > the first system panics: > > Fatal trap 12: page fault while in kernel mode > cpuid = 1; apic id = 01 > fault virtual address = 0x34 > fault code = supervisor read, page not present > instruction pointer = 0x20:0xc06bdefa > stack pointer = 0x28:0xeb9f69b8 > frame pointer = 0x28:0xeb9f69c4 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags= interrupt enabled, resume, IOPL = 0 > current process = 73626 (nfscrash) > trap number = 12 > panic: page fault > cpuid = 1 > Uptime: 3h2m14s > > Sounds like a nice denial of service problem. I can hand the program to > developers on request. Please send it to me. Panics are always much easier to get fixed if they come with a test case that developer can use to reproduce it. Kris I have been working on this problem all weekend and I have a strong hunch at this point that it is a result of 1.424 of sys/kern/vfs_bio.c which was between FreeBSD 5.1 and 5.2. This hunch is currently being verified by a system that was cvsupped to code just before 1.424, and it has been running about 7 times longer than the usual time required to crash. I am currently attempting to craft a patch for 6.2 that essentially backs out the change to see if that works, but if this information can help send a FreeBSD developer down the right trail to a proper fix, great. I will follow up with more detailed findings and results tonight or soon. links: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.423;r2=1.424 related to 1.424: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.420&r2=1.421 Commit emails: http://docs.freebsd.org/cgi/mid.cgi?200311150845.hAF8jawU027349 http://docs.freebsd.org/cgi/mid.cgi?20030445.hAB4jbYw093253 ___ If I turn on invariants, I get the following panic instead, much quicker, and happens with at least as far back as 5.0-RELEASE: panic: bundirty: buffer 0x8e2e95f8 still on queue 1 cpuid = 1 Uptime: 35s Dumping 511 MB (2 chunks) chunk 0: 1MB (153 pages) ... ok chunk 1: 511MB (130816 pages) 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. 
in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:172 #1 0x8028d699 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #2 0x8028d12b in panic (fmt=0x80443458 "bundirty: buffer %p still on queue %d") at /usr/src/sys/kern/kern_shutdown.c:565 #3 0x802e1e78 in bundirty (bp=0x8e2e95f8) at /usr/src/sys/kern/vfs_bio.c:1055 #4 0x802e3eb1 in brelse (bp=0x8e2e95f8) at /usr/src/sys/kern/vfs_bio.c:1370 #5 0x803550e8 in nfs_writebp (bp=0x8e2e95f8, force=0, td=0x0) at /usr/src/sys/nfsclient/nfs_vnops.c:3005 #6 0x802e5197 in getblk (vp=0xff000c23e5d0, blkno=0, size=14400, slpflag=256, slptimeo=0, flags=0) at buf.h:412 #7 0x80344f13 in nfs_getcacheblk (vp=0xff000c23e5d0, bn=0, size=14400, td=0xff0015b274c0) at /usr/src/sys/nfsclient/nfs_bio.c:1252 #8 0x8034616c in nfs_write (ap=0x0) at /usr/src/sys/nfsclient/nfs_bio.c:1068 #9 0x80405ee4 in VOP_WRITE_APV (vop=0x805a0260, a=0x976bfa10) at vnode_if.c:698 #10 0x80303d2c in vn_write (fp=0xff000f524000, uio=0x976bfb50, active_cred=0x0, flags=0, td=0xff0015b274c0) at vnode_if.h:372 #11 0x802ba2e5 in dofilewrite (td=0xff0015b274c0, fd=3, fp=0xff000f524000, auio=0x976bfb50, offset=0, flags=0) at file.h:253 #12 0x802ba5e1 in kern_writev (td=0xff0015b274c0, fd=3, auio=0x976bfb50) at /us
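In case it helps anyone reproduce this, turning on invariants is roughly the following; I'm assuming a GENERIC-based config, and the two options need to go together:

cd /usr/src/sys/i386/conf        # or amd64/conf, depending on the box
cp GENERIC INVKERNEL
printf 'options INVARIANTS\noptions INVARIANT_SUPPORT\n' >> INVKERNEL
cd /usr/src
make buildkernel KERNCONF=INVKERNEL && make installkernel KERNCONF=INVKERNEL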
Hook up idmapd to build in 6-stable?
I am beginning to dabble with NFSv4 client functionality. I noticed idmapd is not built in -stable but it has been in -current since src/sbin/Makefile v. 1.163 (13 months ago). Should it be hooked up to the build? Thanks ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Hook up idmapd to build in 6-stable?
On Sat, Nov 17, 2007 at 07:44:31PM +, Ceri Davies wrote: On Fri, Nov 16, 2007 at 08:31:04PM -0500, Adam McDougall wrote: > I am beginning to dabble with NFSv4 client functionality. I noticed > idmapd is not built in -stable but it has been in -current since src/sbin/Makefile > v. 1.163 (13 months ago). Should it be hooked up to the build? Thanks At the time I was looking at it in -current, idmapd worked fine but the client had serious issues (nothing on an NFSv4 mount could be executed, for instance) which I couldn't track down, so I stopped working with it. I think that hooking up idmapd could be a good thing to do in order to expose those problems, but I'm concerned that it may give the impression that our NFSv4 client is any use, which it appears not to be (at least 13 months ago; apologies if this is no longer the case). Ceri I hadn't realized until I read the manpage that the nfs4 client was considered incomplete, I suppose one of the first clues was idmapd not being hooked up to the build :) and while I got a mount working with idmapd, it was discovered that ctimes appear to be broken:

> ls -lc
> total 5
drwxr-xr-x 5 nobody nobody 4096 Dec 31 1969 Maildir
-rw--- 1 nobody nobody 1029 Mar 3 1970 foo

I doubt I will continue looking at the nfsv4 client in this state, and I don't really have a reason to use it at this point in time, but perhaps it would be better to state the apparent usefulness in a manpage to not give false impressions. Of course, that also takes work. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Working mini PCIe wireless?
On Tue, Dec 04, 2007 at 12:02:58PM -0500, Michael Proto wrote: Geoff Buckingham wrote: > I recently purchased the Dell I6400 Ubuntu laptop, with the express intent of > running FreeBSD on it. > > My intention was to use this as a working machine using PC-BSD, I.E. RELENG_6, limiting tinkering, experimentation to other systems. > > It comes with an Intel 3945[1] mini PCIe card, which I knew would not work. So > I ordered myself an Azurewave Atheros based mini PCIe card to replace it. > Unfortunately the ath_hal[2] doesn't recognise the hardware revision, so that > doesn't work either. > > Does anybody know of a mini PCIe wireless card that works under RELENG_6? > Failing that an ExpressCard? (The 6400 has no Card Bus) > > Notes: > > [1]I know there is a 3945 driver for 7 and an older rev for 6, I tried it > it locks up the machine. > [2] Sam has a more recent HAL on his "people" page, but the headers have > changed making it non-trivial, for me at least, to compile into a > RELENG_6 kernel/module. > I don't have access to my laptop at the moment (and hence can't pull the exact kernel output regarding the adapter), but I have a ThinkPad R60 with the ThinkPad a/b/g miniPCI-e wireless card that worked fine under FreeBSD 6.2. I believe you can find the actual part here: http://shop.lenovo.com/SEUILibrary/controller/e/web/LenovoPortal/en_US/catalog.workflow:item.detail?GroupID=38&Code=40Y7026&current-category-id=DD119CA6FA0E4518A4086EB8FF1FDD2B&model-number=9456 Part number: 40Y7026 I ordered the same card from amazon, it works with my Dell Latitude D820 but for some reason I get an occasional NMI (once a day or less) which panics FreeBSD or bluescreens windows. I found a workaround in FreeBSD, the POWERFAIL_NMI option. It beeps and logs an event instead of panicking. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
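The workaround amounts to a one-line kernel config change, roughly as below; this was i386 in my case, and the config name is arbitrary, so check sys/<arch>/conf/NOTES to confirm the option exists for your platform:

cd /usr/src/sys/i386/conf
cp GENERIC LAPTOP
echo 'options POWERFAIL_NMI' >> LAPTOP
cd /usr/src
make buildkernel KERNCONF=LAPTOP && make installkernel KERNCONF=LAPTOP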
Re: Also seeing 2 x quad-core system slower that 2 x dual core
On Fri, Dec 07, 2007 at 12:39:22PM +, Pete French wrote: Just as a followup to this - I spent some time going through all the suggestions and advice that people gave me regarding this problem. It turns out that the newer servers shipped by HP have different cache settings to the older ones on their RAID controllers, plus I get very different results from my benchmarks depending on how long the machines have been booted for and what activity has occurred on them (probably due to things ending up in cache). Upshot - if the machines are configured identically, and an identical install is made and an identical test done then we get identical performance as expected. Part of the reason for posting this though is that a lot of people have been worrying about 8x CPU performance, and this thread won't have helped. So I wanted to say that now I am convinced that (for my workload) these machines are fine. To the point where I have installed 7.0-BETA4 on the ten new 8 core servers for a very large load on the webfarm this morning. I'm pleased to say that it went off perfectly, the servers took the load and we had no problems at all. We are running CGI scripts against mySQL under apache22 basically - which is a pretty common thing to do. I am using ULE and the amd64 version of the OS. 7.0 is excellent as far as I am concerned, and I don't think people should be worried about deploying it on 8 core machines. My experience has been that it is fine and is also somewhat faster than 6.3 on the same hardware. -pete. I feel I have to throw in my experience too just for the archives. This previous weekend I had the opportunity to play with a dual quad-core dell precision 690, and I saw no huge performance problems. All around I was fairly happy with standard compile tests, did some rm. I didn't do anything involved like databases or php. I could do a buildkernel with a few additional kernel modules in 29 seconds :) but buildworld doesn't seem to benefit much from -j16 on an 8 core (not unexpected). I didn't have any useful parallel loads to test on it. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Swapping caused by very large (regular) file size
On Sat, Jan 12, 2008 at 12:05:38AM -0500, John Baldwin wrote: On Friday 11 January 2008 10:31:47 pm Peter Jeremy wrote: > On Fri, Jan 11, 2008 at 06:44:20PM +0100, Kris Kennaway wrote: > >Ian West wrote: > >> dd if=/dev/zero bs=32768 of=junkfile count=10 seems to do it quite > >> reliably on all the boxes I have tested ? > > > >I am unable to reproduce this on 7.0. > > I can't reproduce it on 6.3-PRERELEASE/amd64 with 1GB RAM. > > vmstat -s;dd if=/dev/zero bs=32768 of=junkfile count=10;vmstat -s > shows the following changes: > 2 swap pager pageins > 2 swap pager pages paged in > 4 swap pager pageouts > 5 swap pager pages paged out >24 vnode pager pageins >78 vnode pager pages paged in > 0 vnode pager pageouts > 0 vnode pager pages paged out You may not have a fast enough disk. We have noticed an issue at work but only on faster controllers (e.g. certain mfi(4) drive configurations) when doing I/O to a single file like the dd command mentioned causes the buffer cache to fill up. The problem being that we can't lock the vm object to recycle pages when we hit the limit that is supposed to prevent this because all the pages in the cache are for the file (vm object) we are working on. Stephan (ups@) says this is fixed in 7. The tell-tale sign that we see is pagedaemon starts chewing up lots of CPU as the kernel tries to realign the page queues along with I/O throughput going down the toilet and being very erratic. -- John Baldwin These are the same symptoms as on a friend's system a little while back with 6.x. I forwarded him a message from this thread and he agreed, and he confirmed having an mfi. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: current zfs tuning in RELENG_7 (AMD64) suggestions ?
On Fri, May 01, 2009 at 04:42:09PM -0400, Mike Tancsa wrote: I gave the AMD64 version of 7.2 RC2 a spin and all installed as expected off the dvd. INTEL S3200SHV MB, Core2Duo, 4G of RAM. The writes are all within the normal variance of the tests except for b). Is there anything else that should be tuned? Not that I am looking for any "magic bullets" but I just want to run this backup server as best as possible

          ---Sequential Output--- ---Sequential Input-- --Random--
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    MB    K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
a 5000    98772 54.7 153111 31.5 100015 21.1 178730 85.3 368782 32.5 161.6 0.6
b 5000    101271 57.9 154765 31.5 61325 13.9 176741 84.6 372477 32.8 149.3 0.6
c 5000    102331 57.1 159559 29.5 105767 17.4 144410 63.8 299317 19.9 167.9 0.6
d 5000    107308 58.6 175004 32.4 117926 18.8 143657 63.4 305126 20.0 167.6 0.6

You might want to try running gstat -I 10 during the test to see how fast each drive flushes the cache from ram and if there are any disks slower than the others. I've found some cards or slots cause drives to perform slower than other drives in the system, dragging down performance of the raid to the slowest drive(s). Individual performance testing of the drives outside of the raid might reveal something too, even just to find out what the maximum sequential speed of one drive is so you know that 4x(speed) is the best to hope for in raid tests. ZFS tends to cache heavily at the start of each write and you will probably see it bounce between no IO and furious writes, until the ram cache fills up more and it has no choice but to write almost constantly. This can affect the results between runs. I would recommend a larger count= that results in a test run of 30-60 seconds at least. Additionally, try other zfs raid types such as mirror and stripe to see if raidz is acting as an unexpectedly large bottleneck; I've found its serial write speed usually leaves something to be desired. Even if the other raid levels won't work realistically in the long run, it's useful to raise the bar to find out what extra performance your IO setup can push. It could be useful to compare with gstripe and graid3 for further hardware performance evaluation. On the other hand, if you can read/write data faster than your network connection can push, you're probably at a workable level. Also, I believe that zfs uses a cluster size up to 128k (queueing multiple writes if it can, depending on the disk subsystem) so I think the computer has to do extra work if you are giving it bs=2048k since zfs will have to cut that into 16 pieces, sending one piece to each drive. You might try bs=512k or bs=128k for example to see if this has a positive effect. In a traditional raid5 setup, I've found I get head over heels the best performance when my bs= is the same size as the raid stripe size multiplied by the number of drives, and this gets weird when you have an odd number of drives because your optimum write size might be something like 768k which probably no application is going to produce :) Also it makes it hard to optimize UFS for a larger stripe size when the cluster sizes are generally limited to 16k such as in Solaris. Results tend to fluctuate a bit.
offsitetmp# dd if=/dev/zero of=/tank1/test bs=2048k count=1000
1000+0 records in
1000+0 records out
2097152000 bytes transferred in 10.016818 secs (209363092 bytes/sec)
offsitetmp# dd if=/dev/zero of=/tank1/test bs=2048k count=1000
1000+0 records in
1000+0 records out
2097152000 bytes transferred in 10.733547 secs (195382943 bytes/sec)
Drives are raidz:
ad1: 1430799MB at ata3-master SATA300
ad2: 1430799MB at ata4-master SATA300
ad3: 1430799MB at ata5-master SATA300
ad4: 1430799MB at ata6-master SATA300 on ich9
---Mike Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet since 1994 www.sentex.net Cambridge, Ontario Canada www.sentex.net/mike ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
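To act on the block-size suggestion above, a rough comparison run could look like the following; the sizes and counts are only examples (scale the counts so each run lasts 30-60 seconds), and gstat is run in a second terminal while the writes are going.

# compare a few block sizes against the same pool
dd if=/dev/zero of=/tank1/test bs=128k count=64000
dd if=/dev/zero of=/tank1/test bs=512k count=16000
dd if=/dev/zero of=/tank1/test bs=2048k count=4000
# meanwhile, in another terminal, watch per-disk throughput
gstat -I 10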
Re: RFT: ZFS MFC
On Fri, May 15, 2009 at 05:02:22PM -0700, Kip Macy wrote: I've MFC'd ZFS v13 to RELENG_7 in a work branch. Please test if you can. http://svn.freebsd.org/base/user/kmacy/ZFS_MFC/ The standard disclaimers apply. This has only been lightly tested in a VM. Please do not use it with data you care about at this time. Thanks, Kip Seems to work for me so far. I had a zfs send hang partway through, and noticed a notable speed difference depending on the direction, but this is literally the first time I've tried zfs send/recv and the systems are set up differently, so I have no idea if it would have happened anyway. Eventually I could probably make these test systems more similar to give a fair test, but I wanted to mention it so others could check. Thanks for working on the MFC, I'm excited to see progress there! It will be a factor in some upcoming server plans even if the MFC doesn't happen for months. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
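For reference, the send/recv test was along these lines; the pool, dataset, snapshot, and host names below are made up for illustration.

zfs snapshot tank/data@test1
zfs send tank/data@test1 | ssh otherhost zfs receive backup/data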
Re: ZFS booting without partitions (was: ZFS boot on zfs mirror)
I encountered the same symptoms today on both a 32bit and 64bit brand new install using gptzfsboot. It works for me when I use a copy of loader from an 8-current box with zfs support compiled in. I haven't looked into it much yet but it might help you. If you want, you can try the loader I am using from: http://www.egr.msu.edu/~mcdouga9/loader On Thu, May 28, 2009 at 10:41:42PM +0200, Lorenzo Perone wrote: On 28.05.2009, at 21:46, Mickael MAILLOT wrote: > hi, > > did you erase gmirror meta ? (on the last sector) > with: gmirror clear ad6 ohps I had forgotten that. just did it (in single user mode), but it didn't help :( Shall I repeat any of the other steps after clearing gmirror meta? thanx a lot for your help... Lorenzo > 2009/5/28 Lorenzo Perone : >> Hi, >> >> I tried hard... but without success ;( >> >> the result is, when choosing the disk with the zfs boot >> sectors in it (in my case F5, which goes to ad6), the kernel >> is not found. the console shows: >> >> forth not found >> definitions not found >> only not found >> (the above repeated several times) >> >> can't load 'kernel' >> >> and I get thrown to the loader prompt. >> lsdev does not show any ZFS devices. >> >> Strange thing: if I boot from the other disk, F1, which is my >> ad4 containing the normal ufs system I used to make up the other >> one, and escape to the loader prompt, lsdev actually sees the >> zpool which is on the other disk, and shows: >> zfs0: tank >> >> I tried booting with boot zfs:tank or zfs:tank:/boot/kernel/kernel, >> but there I get the panic: free: guard1 fail message. >> (would boot zfs:tank:/boot/kernel/kernel be correct, anyways?) >> >> Sure I'm doing something wrong, but what...? Is it a problem that >> the pool is made out of the second disk only (ad6)? >> >> Here are my details (note: latest stable and biosdisk.c merged >> with changes shown in r185095. no problems in buildworld/kernel): >> () ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
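If you want to try it, the swap is roughly the following sketch; keep the old loader around so it can be restored from a fixit environment if the replacement misbehaves.

cp /boot/loader /boot/loader.old
fetch -o /tmp/loader http://www.egr.msu.edu/~mcdouga9/loader
cp /tmp/loader /boot/loader
chmod 555 /boot/loader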
Re: ZFS booting without partitions
I'm thinking that too. I spent some time taking stabs at figuring it out yesterday but didn't get anywhere useful. I did try compiling the -current src/sys/boot tree on 7.2 after a couple of header tweaks to make it compile, but the loader still didn't work. The working loader is the same file size as the broken loader unless it was compiled on i386, and then it is ~30k bigger for some reason (it shrinks to the same size as the rest if I force it to use the same 32bit compilation flags as used on amd64). Just mentioning this in case it saves someone else some time. I'm real pleased it works at all. Kip Macy wrote: Odds are that there are more changes that were made in HEAD to the loader that need to be MFC'd. -Kip On Mon, Jun 1, 2009 at 3:55 AM, Alberto Villa wrote: On Mon, Jun 1, 2009 at 12:06 PM, Henri Hennebert wrote: This is the file /boot/loader from 7.2-STABLE which is wrong. You can find a copy from 8.0-CURRENT and a script (that I tested on a USB key and is running for me): replacing /boot/loader with yours did the job, thanks! -- Alberto Villa ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
I have a proof of concept system doing this. I started with a 7.2 install on zfs root, compiled world and kernel from 8, took a snapshot and made a clone for the 7.2 install, and proceeded to upgrade the current fs to 8.0. After updating the loader.conf in the 7.2 zfs to point to its own cloned fs, I can pick which one to boot with a simple "zpool set bootfs=z/ROOT/7.2" or "zpool set bootfs=z/ROOT/8.0" before rebooting. I also tried rsyncing from an FFS-based system into a new ZFS in that same zpool, used DESTDIR with installkernel and installworld to update the imported OS to support zfs, set up its boot loader and misc config files, and was able to boot from it using zpool to set it as the bootfs. Somewhat like shifting around OS images in a virtualization environment, except it's easy to reach inside the "image" to upgrade/modify it, copy them between systems, and there is no execution overhead while running one since it's still on bare metal (but only one running OS per server of course). This makes it very easy to swap an OS onto another server if you need better/lesser hardware or just want to test. Dan Naumov wrote: This reminds me. I was reading the release and upgrade notes of OpenSolaris 2009.6 and noted one thing about upgrading from a previous version to the new one: When you pick the "upgrade OS" option in the OpenSolaris installer, it will check if you are using a ZFS root partition and if you are, it intelligently suggests taking a current snapshot of the root filesystem. After you finish the upgrade and do a reboot, the boot menu offers you the option of booting the new upgraded version of the OS or alternatively _booting from the snapshot taken by the upgrade installation procedure_. Reading that made me pause for a second and made me go "WOW", this is how UNIX system upgrades should be done. Any hope of us lowly users ever seeing something like this implemented in FreeBSD? :) - Dan Naumov On Tue, Jun 2, 2009 at 9:47 PM, Zaphod Beeblebrox wrote: The system boots from a pair of drives in a gmirror. Not because you can't boot from ZFS, but because it's just so darn stable (and it predates the use of ZFS). Really there are two camps here --- booting from ZFS is the use of ZFS as the machine's own filesystem. This is one goal of ZFS that is somewhat imperfect on FreeBSD at the moment. ZFS file servers are another goal where booting from ZFS is not really required and only marginally beneficial. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
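A rough sketch of that clone-and-switch flow, using the z/ROOT/* naming from above; the dataset names and the upgrade steps themselves are only examples.

# keep the existing 7.2 root as its own boot environment
zfs snapshot z/ROOT/7.2@pre-8.0
zfs clone z/ROOT/7.2@pre-8.0 z/ROOT/8.0
# ... upgrade the clone to 8.0 and point its loader.conf at itself ...
# then pick which root the next boot uses
zpool set bootfs=z/ROOT/8.0 z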
Re: ZFS booting without partitions
Henri Hennebert wrote: Kip Macy wrote: On Mon, Jun 1, 2009 at 10:21 AM, Adam McDougall wrote: I'm thinking that too. I spent some time taking stabs at figuring it out yesterday but didn't get anywhere useful. I did try compiling the -current src/sys/boot tree on 7.2 after a couple of header tweaks to make it compile, but the loader still didn't work. The working loader is the same file size as the broken loader unless it was compiled on i386, and then it is ~30k bigger for some reason (it shrinks to the same size as the rest if I force it to use the same 32bit compilation flags as used on amd64). Just mentioning this in case it saves someone else some time. I'm real pleased it works at all. If someone has the time to track down the differences I'll MFC them. I'm not using ZFS boot at the moment so I have no way of testing. At last I get this F.G diff!!! The problem was in libstand.a. By the way, the patch also takes into account the update from Doug Rabson to address my problem with too many devices / pools. Happy to help on this one. I can confirm that this fixes my loader when I patch, compile, and install libstand, then compile and install the loader. Thanks for finding it! ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
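For anyone applying the patch by hand, the rebuild order that worked here was roughly the following sketch (stock /usr/src paths assumed; since the problem was in libstand.a, it has to be rebuilt and installed before the loader).

cd /usr/src/lib/libstand && make obj && make && make install
cd /usr/src/sys/boot && make obj && make && make install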
Re: /boot/loader and RELENG_7 (WAS: gptzfsboot and RELENG_7)
Dan Naumov wrote: Ah, so there is still a (small) piece of 8-CURRENT needed to have a working 7-STABLE zfs boot configuration? I am getting really confused now: if I add LOADER_ZFS_SUPPORT=yes to my /etc/make.conf, the RELENG_7 system will be built with zfs boot support, but I still need the actual /boot/loader from 8-CURRENT? Is that getting MFC'ed into RELENG_7 anytime soon? Where are all make.conf options documented by the way? Neither /usr/share/examples/etc/make.conf nor "man make.conf" makes any reference to the LOADER_ZFS_SUPPORT option. - Dan Naumov See the last emails in the thread "ZFS booting without partitions"; it has a patch which makes the -stable loader work (some changes that didn't get merged to -stable). I think the change from 8 to 64 is all that's needed, but I applied the whole patch when I tested it. Also, yes, LOADER_ZFS_SUPPORT causes gptzfsboot to be compiled as well as adding ZFS support to the loader (and other stuff). It may not have been documented (I didn't check). ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
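For reference, enabling it is just one line in /etc/make.conf before rebuilding world (or at least the boot code under src/sys/boot):

# /etc/make.conf
LOADER_ZFS_SUPPORT=yes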
Re: pw groupadd/useradd fail when the nscd cache is used for name/group resolution
Michael Proto wrote: On Mon, Jul 13, 2009 at 12:57 PM, Ulrich Spörlein wrote: On Thu, 09.07.2009 at 16:13:25 +0300, Vlad Galu wrote: I've stumbled upon this while installing postgres. In /etc/nsswitch.conf I had "group: cache files compat" and "passwd: cache files compat". Once I commented them out things started working again. Before the change, this is how it looked:
-- cut here --
[r...@vgalu /usr/ports/databases/postgresql84-server]# pw group add pgsql -g 70
pw: group disappeared during update
[r...@vgalu /usr/ports/databases/postgresql84-server]# pw group add pgsql -g 70
pw: group 'pgsql' already exists
[r...@vgalu /usr/ports/databases/postgresql84-server]#
-- and here --
Shouldn't 'files' be used upon a cache miss? If this is a PEBKAC, sorry for the noise. Just a me too. This is most likely because nscd is also caching negative lookups. The usual workaround would be to restart it using /etc/rc.d/nscd restart. A slightly lower-impact alternative would be to use "nscd -i passwd" to invalidate the password cache. -Proto I was intending to report this soon as well (it's been on my list for a while) as a problematic issue while installing ports. The other issue I had was that Java would crash immediately if I had nscd running (configured to cache YP). I plan to report that soon if it still happens with 1.6. I probably tested with 1.4 or 1.5. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
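For reference, both workarounds mentioned above as they would be run on the affected host:

# invalidate just the cached passwd and group lookups
nscd -i passwd
nscd -i group
# or restart the daemon entirely, dropping the whole cache
/etc/rc.d/nscd restart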
Re: Hangs with mrsas?
On 03/07/2016 14:09, Garrett Wollman wrote: > I have a new Dell server with a typical Dell hardware RAID. pciconf > identifies it as "MegaRAID SAS-3 3008 [Fury]"; mfiutil reports: > > mfi0 Adapter: > Product Name: PERC H330 Adapter >Serial Number: 5AT00PI > Firmware: 25.3.0.0016 > RAID Levels: > Battery Backup: not present >NVRAM: 32K > Onboard Memory: 0M > Minimum Stripe: 64K > Maximum Stripe: 64K > > Since I'm running ZFS I have the RAID functions disabled and the > drives are presented as "system physical drives" ("mfisyspd[0-3]" when > using mfi(4)). I wanted to use mrsas(4) instead, so that I could have > direct access to the drives' SMART functions, and this seemed to work > after I set the hw.mfi.mrsas_enable tunable, with one major exception: > all drive access would hang after about 12 hours and the machine would > require a hard reset to come back up. > > Has anyone seen this before? The driver in head doesn't appear to be > any newer. > > -GAWollman I did some similar testing in late Jan but perhaps not long enough to notice your symptoms. I'm pretty certain I used mrsas_enable since that is what I would plan to use in production. I had a H330-mini with the same firmware rev in a R430. I was testing with some 2.5" Seagate ST9600205SS 600gb disks from another system. What kind of disks were you using and in what kind of configuration? Does a simpler config stay up? If you are using SSD, I wonder if disks would survive? SSD firmware issue? Was it hard hung at the console too? Can you enter DDB? If you don't mind, which Dell model is this? Sorry I don't have any directly helpful suggestions but you have good timing because this could very well influence hardware choices. Thanks. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
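For reference, the tunable mentioned above is a boot-time setting, e.g. in /boot/loader.conf:

# /boot/loader.conf
hw.mfi.mrsas_enable="1"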
Boot hang on Xen after r318347/(310418)
Hello, Recently I made a new build of 11-STABLE but encountered a boot hang at this state: http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png It is easy to reproduce, I can just boot from any 11 or 12 ISO that contains the commit. I compiled various svn revisions to confirm that r318347 caused the issue and r318346 is fine. With r318347 or later including the latest 11-STABLE, the system will only boot with one virtual CPU in XenServer. Any more cpus and it hangs. I also tried a 12 kernel from head this afternoon and I have the same hang. I had this issue on XenServer 7 (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I also did much of my testing with a GENERIC kernel to try to rule out kernel configuration mistakes. When it hangs, the performance monitoring in Xen tells me at least one CPU is pegged. r318674 boots fine on physical hardware without Xen involved. Looking at r318347 which mentions EARLY_AP_STARTUP and later seeing r318763 which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to my kernel but it turned the hang into a panic but with any number of CPUs: http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png I think I verified that this happens with EARLY_AP_STARTUP before r318347 too so I'll assume it is a different problem. I may need to do some experimentation to figure out how to get the console to pass through hotkeys to drop into a kernel debugger. I could also try modifying the kernel config if I can make it print information about the hang. Is there anything else I can provide that might help? Would you prefer this be entered in a bugzilla report? Thanks. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Boot hang on Xen after r318347/(310418)
On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote: > On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote: > > Hello, > > > > Recently I made a new build of 11-STABLE but encountered a boot hang > > at this state: > > http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png > > > > It is easy to reproduce, I can just boot from any 11 or 12 ISO that > > contains the commit. > > I have just tested latest HEAD (r318861) and stable/11 (r318854) and > they both work fine on my environment (a VM with 4 vCPUs and 2GB of > RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input, > he has been doing some tests on HEAD and AFAIK he hasn't seen any > issues. > > > I compiled various svn revisions to confirm that r318347 caused the > > issue and r318346 is fine. With r318347 or later including the latest > > 11-STABLE, the system will only boot with one virtual CPU in XenServer. > > Any more cpus and it hangs. I also tried a 12 kernel from head this > > afternoon and I have the same hang. I had this issue on XenServer 7 > > (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I > > also did much of my testing with a GENERIC kernel to try to rule out > > kernel configuration mistakes. When it hangs, the performance > > monitoring in Xen tells me at least one CPU is pegged. r318674 boots > > fine on physical hardware without Xen involved. > > > > Looking at r318347 which mentions EARLY_AP_STARTUP and later seeing > > r318763 which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to > > my kernel but it turned the hang into a panic but with any number of > > CPUs: > > http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png > > I guess this is on stable/11 right? The panic looks easier to debug > that the hang, so let's start by this one. Can you enable the serial > console and kernel debug options in order to get a trace? With just > this it's almost impossible to know what went wrong. Yes this was on stable/11 amd64. > If you still have that kernel around (and it's debug symbols), can you > do: > > $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0x80793344 > > (The address is the instruction pointer on the crash image, I think I > got it right) I'll reproduce this soon and get the results from that command. > In order to compile a stable/11 kernel with full debugging support you > will have to add: > > # For full debugger support use (turn off in stable branch): > options BUF_TRACKING# Track buffer history > options DDB # Support DDB. > options FULL_BUF_TRACKING # Track more buffer history > options GDB # Support remote GDB. > options DEADLKRES # Enable the deadlock resolver > options INVARIANTS # Enable calls of extra sanity checking > options INVARIANT_SUPPORT # Extra sanity checks of internal > structures, required by INVARIANTS > options WITNESS # Enable checks to detect deadlocks and > cycles > options WITNESS_SKIPSPIN# Don't run witness on spinlocks for > speed > options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones > > To your kernel config file. I'll work on that soon too when I get a chance, thanks. > > Just to be sure, this is an amd64 kernel right? yes > > Roger. 
___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
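A minimal sketch of building a kernel with the debug options quoted above; the XENDEBUG config name is made up, and the option list is only the subset I expect to need.

cd /usr/src/sys/amd64/conf
cp GENERIC XENDEBUG
cat >> XENDEBUG <<'EOF'
options DDB
options GDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
EOF
cd /usr/src
make buildkernel KERNCONF=XENDEBUG
make installkernel KERNCONF=XENDEBUG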
Re: Boot hang on Xen after r318347/(310418)
On 05/25/2017 09:28, Adam McDougall wrote: > On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote: > >> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote: >>> Hello, >>> >>> Recently I made a new build of 11-STABLE but encountered a boot hang >>> at this state: >>> http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png >>> >>> It is easy to reproduce, I can just boot from any 11 or 12 ISO that >>> contains the commit. >> >> I have just tested latest HEAD (r318861) and stable/11 (r318854) and >> they both work fine on my environment (a VM with 4 vCPUs and 2GB of >> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input, >> he has been doing some tests on HEAD and AFAIK he hasn't seen any >> issues. >> >>> I compiled various svn revisions to confirm that r318347 caused the >>> issue and r318346 is fine. With r318347 or later including the latest >>> 11-STABLE, the system will only boot with one virtual CPU in XenServer. >>> Any more cpus and it hangs. I also tried a 12 kernel from head this >>> afternoon and I have the same hang. I had this issue on XenServer 7 >>> (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I >>> also did much of my testing with a GENERIC kernel to try to rule out >>> kernel configuration mistakes. When it hangs, the performance >>> monitoring in Xen tells me at least one CPU is pegged. r318674 boots >>> fine on physical hardware without Xen involved. >>> >>> Looking at r318347 which mentions EARLY_AP_STARTUP and later seeing >>> r318763 which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to >>> my kernel but it turned the hang into a panic but with any number of >>> CPUs: >>> http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png >> >> I guess this is on stable/11 right? The panic looks easier to debug >> that the hang, so let's start by this one. Can you enable the serial >> console and kernel debug options in order to get a trace? With just >> this it's almost impossible to know what went wrong. > > Yes this was on stable/11 amd64. > >> >> Roger. I worked on this today and the short version is recent kernels no longer hang or panic with EARLY_AP_STARTUP which includes the 20170602 iso images of 11 and 12. Adding EARLY_AP_STARTUP to my kernel config appears to prevent the hang and something between r318855 (May 24) and r319554 (today, June 3) prevents the panic. I'm tempted to figure out which commit but I already spent hours bisecting and building today, so since this seems to be a forward working solution, I'm content. Thanks. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
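For reference, the workaround amounts to one extra option in the custom kernel config; the config name below is hypothetical.

# sys/amd64/conf/MYXENKERNEL
include GENERIC
ident   MYXENKERNEL
options EARLY_AP_STARTUP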
Re: Fatal trap 12: page fault while in kernel mode
On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote: On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote: > > Hello all, > > We're running into regular panics on our webserver after upgrading > from 4.x to 6.2-stable: Hi Again, The panics keep happening, so I'm trying alternate kernel setups. This is a trace of a panic on a default SMP kernel with debugging symbols.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x34
fault code = supervisor read, page not present
#7  0xc06bdefa in vfs_vmio_release (bp=0xdbec2560) at atomic.h:146
#8  0xc06be728 in getnewbuf (slpflag=0, slptimeo=0, size=6585, maxsize=8192) at ../../../kern/vfs_bio.c:1779
#9  0xc06bfccc in getblk (vp=0xca2cfdd0, blkno=8438, size=6585, slpflag=0, slptimeo=0, flags=0) at ../../../kern/vfs_bio.c:2497
#10 0xc075ad41 in nfs_getcacheblk (vp=0xca2cfdd0, bn=8438, size=6585, td=0xc8cd1c00) at ../../../nfsclient/nfs_bio.c:1261
#11 0xc075a978 in nfs_write (ap=0x0) at ../../../nfsclient/nfs_bio.c:1069
#12 0xc089fde6 in VOP_WRITE_APV (vop=0xc0984440, a=0xeb9cfbec) at vnode_if.c:698
#13 0xc06dbb26 in vn_write (fp=0xc8940e10, uio=0xeb9cfcbc, active_cred=0xc89ee880, flags=0, td=0xc8cd1c00) at vnode_if.h:372
#14 0xc0698f63 in dofilewrite (td=0xc8cd1c00, fd=5, fp=0xc8940e10, auio=0xeb9cfcbc, offset=Unhandled dwarf expression opcode 0x93 ) at file.h:252
#15 0xc0698e07 in kern_writev (td=0xc8cd1c00, fd=5, auio=0xeb9cfcbc) at ../../../kern/sys_generic.c:402
#16 0xc0698d2d in write (td=0xc8cd1c00, uap=0xc8cd1c00) at ../../../kern/sys_generic.c:326
I believe I am seeing the same panic on my samba servers, sometimes from NFS and sometimes from FFS. I see it on i386 and amd64 alike. I do not know how to manually trigger it, but I do have two servers sitting in DDB from after the panic, waiting for more experienced hands to continue the debugging from what I have already done. I filed a PR with as many details as I could think of, and it would be wonderful if someone could look at it and either tell me what else to do in DDB, or I could provide remote access to the existing DDB session to a developer. Both servers crashed in vfs_vmio_release, but one was through NFS and one through FFS. pr 111831 http://docs.freebsd.org/cgi/mid.cgi?200704181924.l3IJOMUL088901 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"