Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Jan 24, 2013, at 4:24 PM, Wojciech Puchar wrote: >> > Except it is on paper reliability. This "on paper" reliability saved my ass numerous times. For example, I had one home NAS server machine with a flaky SATA controller that from time to time would fail to detect one of the four drives on reboot. This degraded my pool several times, and even rebooting from a state where, say, disk4 had failed into one where disk3 was failed did not corrupt any data. I don't think this is possible with any other open source FS, let alone hardware RAID, which would drop the whole array because of this. I have never personally lost any data on ZFS. Yes, performance is another topic, and you must know what you are doing and what your usage pattern is, but from a reliability standpoint ZFS looks more durable to me than anything else. P.S.: My home NAS has been running freebsd-CURRENT with ZFS since the first version available. Several drives have died, and twice the pool was expanded by replacing all drives one by one and resilvering; not a single byte was lost. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
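The kind of degraded-pool survival described above comes down to parity reconstruction. As a loose illustration (not ZFS's actual on-disk logic, which adds variable stripe width and block checksums), single-parity redundancy in the RAID-Z1 style lets any one missing disk be rebuilt by XOR-ing the survivors:

```python
# Loose illustration of single-parity reconstruction (RAID-Z1-style).
# Real ZFS RAID-Z adds variable stripe width and block checksums, but the
# core property is the same: any ONE missing disk in a stripe can be
# rebuilt by XOR-ing the surviving disks with the parity block.

def parity(blocks):
    """XOR equal-sized blocks together; used both to compute parity
    and to reconstruct a missing block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"disk1data", b"disk2data", b"disk3data"]  # one stripe, 3 data disks
p = parity(data)                                   # parity disk contents

# Simulate disk2 going missing: rebuild it from the survivors + parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
```

Losing two disks breaks the XOR identity, which is why raidz1 tolerates exactly one failure per vdev (raidz2/raidz3 add further parity blocks for two and three failures).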
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Jan 23, 2013, at 11:09 PM, Mark Felder wrote: > On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees wrote: > >> >> So we have to take your word for it? >> Provide a link if you're going to make assertions, or they're no more than >> your own opinion. > > I've heard this same thing -- every vdev == 1 drive in performance. I've > never seen any proof/papers on it though. Here is a blog post that describes why this is true for IOPS: http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance
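For readers skimming the thread, the short version of the linked post's argument: a raidz vdev must involve every member disk in each logical random I/O, so the whole vdev delivers roughly one disk's worth of IOPS, while adding more vdevs multiplies IOPS. A back-of-envelope sketch; the DISK_IOPS figure and the helper function are assumed for illustration, not taken from the post:

```python
# Back-of-envelope model of the "every raidz vdev == 1 drive" IOPS claim.
# The DISK_IOPS figure and the pool layouts are assumed for illustration,
# not measurements from the thread.

DISK_IOPS = 100  # roughly what a 7200rpm SATA drive does for random I/O

def pool_iops(vdevs, disks_per_vdev, layout):
    if layout == "raidz":
        # A full-stripe read/write touches every member disk, so the
        # whole vdev seeks as one unit: ~one disk's worth of IOPS.
        per_vdev = DISK_IOPS
    elif layout == "mirror":
        # Random reads can be spread across mirror sides.
        per_vdev = DISK_IOPS * disks_per_vdev
    else:
        raise ValueError(layout)
    return vdevs * per_vdev

# The same 12 disks, arranged two ways:
print(pool_iops(1, 12, "raidz"))   # one 12-wide raidz vdev: ~1 disk of IOPS
print(pool_iops(6, 2, "mirror"))   # six 2-way mirror vdevs: ~12 disks of IOPS
```

This is why the usual advice for IOPS-bound pools is many narrow vdevs (e.g. striped mirrors) rather than one wide raidz.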
Re: pgbench performance is lagging compared to Linux and DragonflyBSD?
On Nov 8, 2012, at 12:56 PM, Wojciech Puchar wrote: >> EC> That thread starts here: >> EC> http://lists.freebsd.org/pipermail/freebsd-arch/2010-April/010143.html >> Year 2010! And we are still limited by MAXPHYS (128K) transfers :( > put > options MAXPHYS=2097152 > in your kernel config. > > EVERYTHING works in all production machines for over a year > > > the only exception is my laptop with an OCZ Petrol SSD that hangs on any > transfer >1MB; i've set it to 0.5MB here. Have you measured the performance increase? I'm also interested in a bigger MAXBSIZE, as this is what the NFS server uses as its maximum transfer size. Linux and Solaris can do up to 1MB.
Re: pgbench performance is lagging compared to Linux and DragonflyBSD?
On Nov 7, 2012, at 4:48 PM, Wojciech Puchar wrote: >>> >>> actually FreeBSD defaults are good for COMMON usage. and can be >>> tuned. >>> >>> default MAXBSIZE is one exception. >> >> "Common usage" is vague. While FreeBSD might do ok for some applications >> (dev box, simple workstation/laptop, etc), there are other areas that >> require additional tuning to get better perf that arguably shouldn't as much >> (or there should be templates for doing so): 10GbE and mbuf and network >> tuning; file server and file descriptor, network tuning, etc; low latency >> desktop and scheduler tweaking; etc. > > still any idea why MAXBSIZE is 128kB by default? for modern hard disks it is a > disaster. 2 or even 4 megabytes is OK. > >> >> Not to say that freebsd is entirely at fault, but because it's more of a >> commodity OS than Linux, more tweaking is required... > actually IMHO much more tweaking is needed with linux, at least from what i > know from other people. And they are not newbies Actually MAXBSIZE is 64k and MAXPHYS is 128k. There was a thread about NFS performance where it was mentioned that a bigger MAXBSIZE leads to KVA fragmentation.
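A rough way to see why small MAXBSIZE/MAXPHYS values hurt spinning disks: every I/O pays a roughly fixed per-operation overhead, which larger transfers amortize. The overhead and media-rate numbers below are assumed for illustration, not measurements:

```python
# Crude model of sequential throughput vs. I/O size: each operation pays a
# fixed overhead, so larger transfers amortize it. SEEK_MS and MEDIA_MBS
# are assumed, illustrative numbers, not measurements.

SEEK_MS = 0.5       # per-operation overhead (command setup, rotational slack)
MEDIA_MBS = 150.0   # sustained media transfer rate, MB/s

def throughput_mbs(io_size_kb):
    xfer_ms = io_size_kb / 1024.0 / MEDIA_MBS * 1000.0
    total_ms = SEEK_MS + xfer_ms
    return (io_size_kb / 1024.0) / (total_ms / 1000.0)

for size_kb in (64, 128, 1024, 2048):   # MAXBSIZE, MAXPHYS, 1M, 2M
    print(f"{size_kb:5d} KB I/O -> {throughput_mbs(size_kb):6.1f} MB/s")
```

With these assumed numbers, 64K transfers reach only about half the media rate, while 1-2 MB transfers approach it; the exact crossover depends on the real per-op overhead, but the shape of the curve is the point.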
Re: NFS server bottlenecks
On Oct 23, 2012, at 2:36 AM, Rick Macklem wrote: > Ivan Voras wrote: >> On 20 October 2012 13:42, Nikolay Denev wrote: >> >>> Here are the results from testing both patches: >>> http://home.totalterror.net/freebsd/nfstest/results.html >>> Both tests ran for about 14 hours (a bit too much, but I wanted to >>> compare different zfs recordsize settings), >>> and were done first after a fresh reboot. >>> The only noticeable difference seems to be much more context >>> switches with Ivan's patch. >> >> Thank you very much for your extensive testing! >> >> I don't know how to interpret the rise in context switches; as this is >> kernel code, I'd expect no context switches. I hope someone else can >> explain. >> >> But, you have also shown that my patch doesn't do any better than >> Rick's even on a fairly large configuration, so I don't think there's >> value in adding the extra complexity, and Rick knows NFS much better >> than I do. >> >> But there are a few other things I'm interested in: like why >> does your load average spike almost into the twenties, and how come that with >> 24 drives in RAID-10 you only push 600 Mbit/s through the 10 >> Gbit/s Ethernet. Have you tested your drive setup locally (AESNI >> shouldn't be a bottleneck, you should be able to encrypt well into >> the Gbyte/s range) and the network? >> >> If you have the time, could you repeat the tests but with a recent >> Samba server and a CIFS mount on the client side? This is probably not >> important, but I'm just curious how it would perform on your >> machine. > > Oh, I realized that, if you are testing 9/stable (and not head), you > won't have r227809. Without that, all reads on a given file will > be serialized, because the server will acquire an exclusive lock on > the vnode. > > The patch for r227809 in head is at: > http://people.freebsd.org/~rmacklem/lkshared.patch > This should apply fine to a 9 system (but not 8.n), I think.
> > Good luck with it and have fun, rick Thanks, I've applied the patch by hand because of some differences and I'm now rebuilding. In case they are still needed, here are the "dd" tests with a loopback UDP mount: http://home.totalterror.net/freebsd/nfstest/udp-dd.html Over UDP, writing degrades much worse...
Re: NFS server bottlenecks
On Oct 20, 2012, at 10:45 PM, Outback Dingo wrote: > On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras wrote: >> On 20 October 2012 14:45, Rick Macklem wrote: >>> Ivan Voras wrote: >> I don't know how to interpret the rise in context switches; as this is kernel code, I'd expect no context switches. I hope someone else can explain. >>> Don't the mtx_lock() calls spin for a little while and then context >>> switch if another thread still has it locked? >> >> Yes, but are in-kernel context switches also counted? I was assuming >> they are light-weight enough not to count. >> >>> Hmm, I didn't look, but were there any tests using UDP mounts? >>> (I would have thought that your patch would mainly affect UDP mounts, >>> since that is when my version still has the single LRU queue/mutex. >> >> Another assumption - I thought UDP was the default. >> >>> As I think you know, my concern with your patch would be correctness >>> for UDP, not performance.) >> >> Yes. > > I've got a similar box config here, with 2x 10GbE Intel NICs, and 24 2TB > drives on an LSI controller. > I'm watching the thread patiently, kind of looking for results and > answers, though I'm also tempted to > run benchmarks on my system to see if I get similar results. I also > considered that netmap might be an option, > but I'm not sure it would help NFS, since it's hard to tell if > it's a network bottleneck, though it appears > to be network related. Doesn't look like a network issue to me. From my observations it's more like some overhead in nfs and arc. The boxes easily push 10G with a simple iperf test. Running two iperf tests over each port of the dual-ported 10G NICs gives 960MB/sec regardless of which machine is the server.
Also, I've seen over 960MB/sec over NFS with this setup, but I can't understand what type of workload was able to do this. At some point I was able to do this with simple dd, then after a reboot I was no longer able to push this traffic. I'm thinking something like ARC/kmem fragmentation might be the issue?
Re: NFS server bottlenecks
On Oct 20, 2012, at 4:00 PM, Nikolay Denev wrote: > > On Oct 20, 2012, at 3:11 PM, Ivan Voras wrote: > >> On 20 October 2012 13:42, Nikolay Denev wrote: >> >>> Here are the results from testing both patches: >>> http://home.totalterror.net/freebsd/nfstest/results.html >>> Both tests ran for about 14 hours (a bit too much, but I wanted to compare >>> different zfs recordsize settings), >>> and were done first after a fresh reboot. >>> The only noticeable difference seems to be much more context switches with >>> Ivan's patch. >> >> Thank you very much for your extensive testing! >> >> I don't know how to interpret the rise in context switches; as this is >> kernel code, I'd expect no context switches. I hope someone else can >> explain. >> >> But, you have also shown that my patch doesn't do any better than >> Rick's even on a fairly large configuration, so I don't think there's >> value in adding the extra complexity, and Rick knows NFS much better >> than I do. >> >> But there are a few other things I'm interested in: like why >> does your load average spike almost into the twenties, and how come that with >> 24 drives in RAID-10 you only push 600 Mbit/s through the 10 >> Gbit/s Ethernet. Have you tested your drive setup locally (AESNI >> shouldn't be a bottleneck, you should be able to encrypt well into >> the Gbyte/s range) and the network? >> >> If you have the time, could you repeat the tests but with a recent >> Samba server and a CIFS mount on the client side? This is probably not >> important, but I'm just curious how it would perform on your >> machine.
> > The first iozone local run finished; I'll paste just the result here, and > also the same test over NFS for comparison: > (This is iozone doing 8k sized IO ops, on a ZFS dataset with recordsize=8k)
>
> NFS:
>                                                    random   random
>         KB  reclen   write  rewrite    read   reread     read    write
>   33554432       8    4973     5522    2930     2906     2908     3886
>
> Local:
>                                                    random   random
>         KB  reclen   write  rewrite    read   reread     read    write
>   33554432       8   34740    41390  135442   142534    24992    12493
>
> P.S.: I forgot to mention that the network uses a 9K MTU. Here are the full results of the test on the local fs: http://home.totalterror.net/freebsd/nfstest/local_fs/ I'm now running the same test on an NFS mount over the loopback interface on the NFS server machine.
Re: NFS server bottlenecks
On Oct 20, 2012, at 3:11 PM, Ivan Voras wrote: > On 20 October 2012 13:42, Nikolay Denev wrote: > >> Here are the results from testing both patches: >> http://home.totalterror.net/freebsd/nfstest/results.html >> Both tests ran for about 14 hours (a bit too much, but I wanted to compare >> different zfs recordsize settings), >> and were done first after a fresh reboot. >> The only noticeable difference seems to be much more context switches with >> Ivan's patch. > > Thank you very much for your extensive testing! > > I don't know how to interpret the rise in context switches; as this is > kernel code, I'd expect no context switches. I hope someone else can > explain. > > But, you have also shown that my patch doesn't do any better than > Rick's even on a fairly large configuration, so I don't think there's > value in adding the extra complexity, and Rick knows NFS much better > than I do. > > But there are a few other things I'm interested in: like why > does your load average spike almost into the twenties, and how come that with > 24 drives in RAID-10 you only push 600 Mbit/s through the 10 > Gbit/s Ethernet. Have you tested your drive setup locally (AESNI > shouldn't be a bottleneck, you should be able to encrypt well into > the Gbyte/s range) and the network? > > If you have the time, could you repeat the tests but with a recent > Samba server and a CIFS mount on the client side? This is probably not > important, but I'm just curious how it would perform on your > machine.
The first iozone local run finished; I'll paste just the result here, and also the same test over NFS for comparison: (This is iozone doing 8k sized IO ops, on a ZFS dataset with recordsize=8k)

NFS:
                                                   random   random
        KB  reclen   write  rewrite    read   reread     read    write
  33554432       8    4973     5522    2930     2906     2908     3886

Local:
                                                   random   random
        KB  reclen   write  rewrite    read   reread     read    write
  33554432       8   34740    41390  135442   142534    24992    12493

P.S.: I forgot to mention that the network uses a 9K MTU.
Re: NFS server bottlenecks
On Oct 20, 2012, at 3:11 PM, Ivan Voras wrote: > On 20 October 2012 13:42, Nikolay Denev wrote: > >> Here are the results from testing both patches: >> http://home.totalterror.net/freebsd/nfstest/results.html >> Both tests ran for about 14 hours (a bit too much, but I wanted to compare >> different zfs recordsize settings), >> and were done first after a fresh reboot. >> The only noticeable difference seems to be much more context switches with >> Ivan's patch. > > Thank you very much for your extensive testing! > > I don't know how to interpret the rise in context switches; as this is > kernel code, I'd expect no context switches. I hope someone else can > explain. > > But, you have also shown that my patch doesn't do any better than > Rick's even on a fairly large configuration, so I don't think there's > value in adding the extra complexity, and Rick knows NFS much better > than I do. > > But there are a few other things I'm interested in: like why > does your load average spike almost into the twenties, and how come that with > 24 drives in RAID-10 you only push 600 Mbit/s through the 10 > Gbit/s Ethernet. Have you tested your drive setup locally (AESNI > shouldn't be a bottleneck, you should be able to encrypt well into > the Gbyte/s range) and the network? > > If you have the time, could you repeat the tests but with a recent > Samba server and a CIFS mount on the client side? This is probably not > important, but I'm just curious how it would perform on your > machine. I've now started this test locally. From previous iozone runs I remember the local speed being much better, but I will wait for this test to finish, as the comparison will be better. But I think there is still something fishy… I have cases where I have reached 1000MB/s over NFS (from network stats, not local machine stats), but sometimes it is very slow even for a file that is completely in the ARC.
Rick mentioned that this could be due to RPC overhead and network round-trip time, but earlier in this thread I did a test only on the server, by mounting the NFS-exported ZFS dataset locally and running some tests with "dd": > To take the network out of the equation I redid the test by mounting the same > filesystem over NFS on the server: > > [18:23]root@goliath:~# mount -t nfs -o > rw,hard,intr,tcp,nfsv3,rsize=1048576,wsize=1048576 > localhost:/tank/spa_db/undo /mnt > [18:24]root@goliath:~# dd if=/mnt/data.dbf of=/dev/null bs=1M > 30720+1 records in > 30720+1 records out > 32212262912 bytes transferred in 79.793343 secs (403696120 bytes/sec) > [18:25]root@goliath:~# dd if=/mnt/data.dbf of=/dev/null bs=1M > 30720+1 records in > 30720+1 records out > 32212262912 bytes transferred in 12.033420 secs (2676900110 bytes/sec) > > During the first run I saw several nfsd threads in top, along with dd, and > again zero disk I/O. > There was an increase in memory usage because of the double buffering > ARC->buffercache. > The second run was with all of the nfsd threads totally idle, reading > directly from the buffercache.
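The two dd runs quoted above already quantify the buffer-cache effect; computing the rates and the ratio from the byte counts and times in that output:

```python
# Rates and speedup computed from the dd output quoted above: the same
# 30 GiB file, first read through nfsd (cold), then served from the
# buffer cache (warm).
BYTES = 32212262912
cold = BYTES / 79.793343   # first run
warm = BYTES / 12.033420   # second run
print(f"cold: {cold / 1e6:.0f} MB/s")     # ~404 MB/s
print(f"warm: {warm / 1e6:.0f} MB/s")     # ~2677 MB/s
print(f"speedup: {warm / cold:.1f}x")
```

A roughly 6.6x gap with zero disk I/O in both runs points at per-request cost (RPC processing, copies between ARC and buffer cache) rather than the storage itself.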
Re: NFS server bottlenecks
On Oct 18, 2012, at 6:11 PM, Nikolay Denev wrote: > > On Oct 15, 2012, at 5:34 PM, Ivan Voras wrote: > >> On 15 October 2012 16:31, Nikolay Denev wrote: >>> >>> On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote: >> >>>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch >>>> >>>> It should apply to HEAD without Rick's patches. >>>> >>>> It's a bit different approach than Rick's, breaking down locks even more. >>> >>> Applied and compiled OK, I will be able to test it tomorrow. >> >> Ok, thanks! >> >> The differences should be most visible in edge cases with a larger >> number of nfsd processes (16+) and many CPU cores. > > I'm now rebooting with your patch, and hopefully will have some results > tomorrow. > Here are the results from testing both patches: http://home.totalterror.net/freebsd/nfstest/results.html Both tests ran for about 14 hours (a bit too much, but I wanted to compare different zfs recordsize settings), and were done first after a fresh reboot. The only noticeable difference seems to be much more context switches with Ivan's patch.
Re: NFS server bottlenecks
On Oct 15, 2012, at 5:34 PM, Ivan Voras wrote: > On 15 October 2012 16:31, Nikolay Denev wrote: >> >> On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote: > >>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch >>> >>> It should apply to HEAD without Rick's patches. >>> >>> It's a bit different approach than Rick's, breaking down locks even more. >> >> Applied and compiled OK, I will be able to test it tomorrow. > > Ok, thanks! > > The differences should be most visible in edge cases with a larger > number of nfsd processes (16+) and many CPU cores. I'm now rebooting with your patch, and hopefully will have some results tomorrow.
Re: syncing large mmaped files
On Oct 18, 2012, at 3:08 AM, Tristan Verniquet wrote: > > I want to work with large (1-10G) files in memory but eventually sync them > back out to disk. The problem is that the sync process appears to lock the > file in the kernel for the duration of the sync, which can run into minutes. This > prevents other processes from reading from the file (unless they already have > it mapped) for this whole time. Is there any way to prevent this? I think I > read in a post somewhere about OpenBSD implementing partial writes when it > hits a file with lots of dirty pages in order to prevent this. Is there > anything available for FreeBSD, or is there another way around it? > > Sorry if this is the wrong mailing list. Isn't msync(2) what you are looking for?
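Note that msync(2) takes an address range, so dirty pages can be written back in slices rather than in one sync over the whole multi-gigabyte mapping, which is one way to approximate the partial write-back behaviour described above. A sketch using Python's mmap module, whose flush() maps to msync() on POSIX systems; the 1 MB chunk size is an arbitrary illustration:

```python
# Flush a dirty mapping back to disk in slices rather than all at once.
# mmap.flush(offset, size) maps to msync() on POSIX systems; the chunk
# size below is an arbitrary, illustrative choice.
import mmap
import os
import tempfile

CHUNK = 1 << 20          # flush 1 MB at a time
SIZE = 4 * CHUNK

fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)
with mmap.mmap(fd, SIZE) as m:   # MAP_SHARED by default
    m[:] = b"x" * SIZE           # dirty every page of the mapping
    for off in range(0, SIZE, CHUNK):
        m.flush(off, CHUNK)      # msync() just this 1 MB slice
data = os.pread(fd, 4, 0)        # confirm the data reached the file
os.close(fd)
os.unlink(path)
```

Whether chunked msync() actually shortens the window in which other readers are blocked depends on how the kernel serializes writeback on the vnode, which is exactly the behaviour being asked about here.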
Re: NFS server bottlenecks
On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote: > On 13/10/2012 17:22, Nikolay Denev wrote: > >> drc3.patch applied and build cleanly and shows nice improvement! >> >> I've done a quick benchmark using iozone over the NFS mount from the Linux >> host. >> > > Hi, > > If you are already testing, could you please also test this patch: > > http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch > > It should apply to HEAD without Rick's patches. > > It's a bit different approach than Rick's, breaking down locks even more. > Applied and compiled OK, I will be able to test it tomorrow.
Re: NFS server bottlenecks
On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote: > On 13/10/2012 17:22, Nikolay Denev wrote: > >> drc3.patch applied and build cleanly and shows nice improvement! >> >> I've done a quick benchmark using iozone over the NFS mount from the Linux >> host. >> > > Hi, > > If you are already testing, could you please also test this patch: > > http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch > > It should apply to HEAD without Rick's patches. > > It's a bit different approach than Rick's, breaking down locks even more. > I will try to apply it to RELENG_9, as that's what I'm running, and compare the results.
Re: NFS server bottlenecks
On Oct 13, 2012, at 5:05 AM, Rick Macklem wrote: > I wrote: >> Oops, I didn't get the "readahead" option description >> quite right in the last post. The default read ahead >> is 1, which does result in "rsize * 2", since there is >> the read + 1 readahead. >> >> "rsize * 16" would actually be for the option "readahead=15" >> and for "readahead=16" the calculation would be "rsize * 17". >> >> However, the example was otherwise ok, I think? rick > > I've attached the patch drc3.patch (it assumes drc2.patch has already been > applied) that replaces the single mutex with one for each hash list > for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200. > > These patches are also at: > http://people.freebsd.org/~rmacklem/drc2.patch > http://people.freebsd.org/~rmacklem/drc3.patch > in case the attachments don't get through. > > rick > ps: I haven't tested drc3.patch a lot, but I think it's ok? drc3.patch applied and built cleanly and shows a nice improvement! I've done a quick benchmark using iozone over the NFS mount from the Linux host.

drc2.patch (but with NFSRVCACHE_HASHSIZE=500)

TEST WITH 8K - Auto Mode
	Using Minimum Record Size 8 KB
	Using Maximum Record Size 8 KB
	Using minimum file size of 2097152 kilobytes.
	Using maximum file size of 2097152 kilobytes.
	O_DIRECT feature enabled
	SYNC Mode.
	OPS Mode. Output is in operations per second.
	Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
	Time Resolution = 0.01 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.

                                                  random   random
       KB  reclen   write  rewrite    read   reread     read    write
  2097152       8    1919     1914    2356     2321     2335     1706

TEST WITH 1M - Auto Mode
	Using Minimum Record Size 1024 KB
	Using Maximum Record Size 1024 KB
	Using minimum file size of 2097152 kilobytes.
	Using maximum file size of 2097152 kilobytes.
	O_DIRECT feature enabled
	SYNC Mode.
	OPS Mode. Output is in operations per second.
	Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
	Time Resolution = 0.01 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.

                                                  random   random
       KB  reclen   write  rewrite    read   reread     read    write
  2097152    1024      73       64     477      486      496       61

drc3.patch

TEST WITH 8K - Auto Mode
	Using Minimum Record Size 8 KB
	Using Maximum Record Size 8 KB
	Using minimum file size of 2097152 kilobytes.
	Using maximum file size of 2097152 kilobytes.
	O_DIRECT feature enabled
	SYNC Mode.
	OPS Mode. Output is in operations per second.
	Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
	Time Resolution = 0.01 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.

                                                  random   random
       KB  reclen   write  rewrite    read   reread     read    write
  2097152       8    2108     2397    3001     3013     3010     2389

TEST WITH 1M - Auto Mode
	Using Minimum Record Size 1024 KB
	Using Maximum Record Size 1024 KB
	Using minimum file size of 2097152 kilobytes.
	Using maximum file size of 2097152 kilobytes.
	O_DIRECT feature enabled
	SYNC Mode.
	OPS Mode. Output is in operations per second.
	Command line use
Re: NFS server bottlenecks
On Oct 11, 2012, at 1:09 AM, Rick Macklem wrote: > Nikolay Denev wrote: >> On Oct 10, 2012, at 3:18 AM, Rick Macklem >> wrote: >> >>> Nikolay Denev wrote: >>>> On Oct 4, 2012, at 12:36 AM, Rick Macklem >>>> wrote: >>>> >>>>> Garrett Wollman wrote: >>>>>> <>>>>> said: >>>>>> >>>>>>>> Simple: just use a sepatate mutex for each list that a cache >>>>>>>> entry >>>>>>>> is on, rather than a global lock for everything. This would >>>>>>>> reduce >>>>>>>> the mutex contention, but I'm not sure how significantly since >>>>>>>> I >>>>>>>> don't have the means to measure it yet. >>>>>>>> >>>>>>> Well, since the cache trimming is removing entries from the >>>>>>> lists, >>>>>>> I >>>>>>> don't >>>>>>> see how that can be done with a global lock for list updates? >>>>>> >>>>>> Well, the global lock is what we have now, but the cache trimming >>>>>> process only looks at one list at a time, so not locking the list >>>>>> that >>>>>> isn't being iterated over probably wouldn't hurt, unless there's >>>>>> some >>>>>> mechanism (that I didn't see) for entries to move from one list >>>>>> to >>>>>> another. Note that I'm considering each hash bucket a separate >>>>>> "list". (One issue to worry about in that case would be >>>>>> cache-line >>>>>> contention in the array of hash buckets; perhaps >>>>>> NFSRVCACHE_HASHSIZE >>>>>> ought to be increased to reduce that.) >>>>>> >>>>> Yea, a separate mutex for each hash list might help. There is also >>>>> the >>>>> LRU list that all entries end up on, that gets used by the >>>>> trimming >>>>> code. >>>>> (I think? I wrote this stuff about 8 years ago, so I haven't >>>>> looked >>>>> at >>>>> it in a while.) >>>>> >>>>> Also, increasing the hash table size is probably a good idea, >>>>> especially >>>>> if you reduce how aggressively the cache is trimmed. >>>>> >>>>>>> Only doing it once/sec would result in a very large cache when >>>>>>> bursts of >>>>>>> traffic arrives. 
>>>>>> >>>>>> My servers have 96 GB of memory so that's not a big deal for me. >>>>>> >>>>> This code was originally "production tested" on a server with >>>>> 1Gbyte, >>>>> so times have changed a bit;-) >>>>> >>>>>>> I'm not sure I see why doing it as a separate thread will >>>>>>> improve >>>>>>> things. >>>>>>> There are N nfsd threads already (N can be bumped up to 256 if >>>>>>> you >>>>>>> wish) >>>>>>> and having a bunch more "cache trimming threads" would just >>>>>>> increase >>>>>>> contention, wouldn't it? >>>>>> >>>>>> Only one cache-trimming thread. The cache trim holds the (global) >>>>>> mutex for much longer than any individual nfsd service thread has >>>>>> any >>>>>> need to, and having N threads doing that in parallel is why it's >>>>>> so >>>>>> heavily contended. If there's only one thread doing the trim, >>>>>> then >>>>>> the nfsd service threads aren't spending time either contending >>>>>> on >>>>>> the >>>>>> mutex (it will be held less frequently and for shorter periods). >>>>>> >>>>> I think the little drc2.patch which will keep the nfsd threads >>>>> from >>>>> acquiring the mutex and doing the trimming most of the time, might >>>>> be >>>>> sufficient. I still don't see why a separate trimming thread will >>>>> be >>>>> an advantage. I'd also be worried
Re: NFS server bottlenecks
On Oct 10, 2012, at 3:18 AM, Rick Macklem wrote: > Nikolay Denev wrote: >> On Oct 4, 2012, at 12:36 AM, Rick Macklem >> wrote: >> >>> Garrett Wollman wrote: >>>> <>>> said: >>>> >>>>>> Simple: just use a sepatate mutex for each list that a cache >>>>>> entry >>>>>> is on, rather than a global lock for everything. This would >>>>>> reduce >>>>>> the mutex contention, but I'm not sure how significantly since I >>>>>> don't have the means to measure it yet. >>>>>> >>>>> Well, since the cache trimming is removing entries from the lists, >>>>> I >>>>> don't >>>>> see how that can be done with a global lock for list updates? >>>> >>>> Well, the global lock is what we have now, but the cache trimming >>>> process only looks at one list at a time, so not locking the list >>>> that >>>> isn't being iterated over probably wouldn't hurt, unless there's >>>> some >>>> mechanism (that I didn't see) for entries to move from one list to >>>> another. Note that I'm considering each hash bucket a separate >>>> "list". (One issue to worry about in that case would be cache-line >>>> contention in the array of hash buckets; perhaps >>>> NFSRVCACHE_HASHSIZE >>>> ought to be increased to reduce that.) >>>> >>> Yea, a separate mutex for each hash list might help. There is also >>> the >>> LRU list that all entries end up on, that gets used by the trimming >>> code. >>> (I think? I wrote this stuff about 8 years ago, so I haven't looked >>> at >>> it in a while.) >>> >>> Also, increasing the hash table size is probably a good idea, >>> especially >>> if you reduce how aggressively the cache is trimmed. >>> >>>>> Only doing it once/sec would result in a very large cache when >>>>> bursts of >>>>> traffic arrives. >>>> >>>> My servers have 96 GB of memory so that's not a big deal for me. 
>>>> >>> This code was originally "production tested" on a server with >>> 1Gbyte, >>> so times have changed a bit;-) >>> >>>>> I'm not sure I see why doing it as a separate thread will improve >>>>> things. >>>>> There are N nfsd threads already (N can be bumped up to 256 if you >>>>> wish) >>>>> and having a bunch more "cache trimming threads" would just >>>>> increase >>>>> contention, wouldn't it? >>>> >>>> Only one cache-trimming thread. The cache trim holds the (global) >>>> mutex for much longer than any individual nfsd service thread has >>>> any >>>> need to, and having N threads doing that in parallel is why it's so >>>> heavily contended. If there's only one thread doing the trim, then >>>> the nfsd service threads aren't spending time either contending on >>>> the >>>> mutex (it will be held less frequently and for shorter periods). >>>> >>> I think the little drc2.patch which will keep the nfsd threads from >>> acquiring the mutex and doing the trimming most of the time, might >>> be >>> sufficient. I still don't see why a separate trimming thread will be >>> an advantage. I'd also be worried that the one cache trimming thread >>> won't get the job done soon enough. >>> >>> When I did production testing on a 1Gbyte server that saw a peak >>> load of about 100RPCs/sec, it was necessary to trim aggressively. >>> (Although I'd be tempted to say that a server with 1Gbyte is no >>> longer relevant, I recently recall someone trying to run FreeBSD >>> on a i486, although I doubt they wanted to run the nfsd on it.) >>> >>>>> The only negative effect I can think of w.r.t. having the nfsd >>>>> threads doing it would be a (I believe negligible) increase in RPC >>>>> response times (the time the nfsd thread spends trimming the >>>>> cache). >>>>> As noted, I think this time would be negligible compared to disk >>>>> I/O >>>>> and network transit times in the total RPC response time? >>>> >>>> With adaptive mutexes, many
Re: NFS server bottlenecks
On Oct 9, 2012, at 5:12 PM, Nikolay Denev wrote: > > On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote: > >> Garrett Wollman wrote: >>> <>> said: >>> >>>>> Simple: just use a sepatate mutex for each list that a cache entry >>>>> is on, rather than a global lock for everything. This would reduce >>>>> the mutex contention, but I'm not sure how significantly since I >>>>> don't have the means to measure it yet. >>>>> >>>> Well, since the cache trimming is removing entries from the lists, I >>>> don't >>>> see how that can be done with a global lock for list updates? >>> >>> Well, the global lock is what we have now, but the cache trimming >>> process only looks at one list at a time, so not locking the list that >>> isn't being iterated over probably wouldn't hurt, unless there's some >>> mechanism (that I didn't see) for entries to move from one list to >>> another. Note that I'm considering each hash bucket a separate >>> "list". (One issue to worry about in that case would be cache-line >>> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE >>> ought to be increased to reduce that.) >>> >> Yea, a separate mutex for each hash list might help. There is also the >> LRU list that all entries end up on, that gets used by the trimming code. >> (I think? I wrote this stuff about 8 years ago, so I haven't looked at >> it in a while.) >> >> Also, increasing the hash table size is probably a good idea, especially >> if you reduce how aggressively the cache is trimmed. >> >>>> Only doing it once/sec would result in a very large cache when >>>> bursts of >>>> traffic arrives. >>> >>> My servers have 96 GB of memory so that's not a big deal for me. >>> >> This code was originally "production tested" on a server with 1Gbyte, >> so times have changed a bit;-) >> >>>> I'm not sure I see why doing it as a separate thread will improve >>>> things. 
>>>> There are N nfsd threads already (N can be bumped up to 256 if you >>>> wish) >>>> and having a bunch more "cache trimming threads" would just increase >>>> contention, wouldn't it? >>> >>> Only one cache-trimming thread. The cache trim holds the (global) >>> mutex for much longer than any individual nfsd service thread has any >>> need to, and having N threads doing that in parallel is why it's so >>> heavily contended. If there's only one thread doing the trim, then >>> the nfsd service threads aren't spending time either contending on the >>> mutex (it will be held less frequently and for shorter periods). >>> >> I think the little drc2.patch which will keep the nfsd threads from >> acquiring the mutex and doing the trimming most of the time, might be >> sufficient. I still don't see why a separate trimming thread will be >> an advantage. I'd also be worried that the one cache trimming thread >> won't get the job done soon enough. >> >> When I did production testing on a 1Gbyte server that saw a peak >> load of about 100RPCs/sec, it was necessary to trim aggressively. >> (Although I'd be tempted to say that a server with 1Gbyte is no >> longer relevant, I recently recall someone trying to run FreeBSD >> on a i486, although I doubt they wanted to run the nfsd on it.) >> >>>> The only negative effect I can think of w.r.t. having the nfsd >>>> threads doing it would be a (I believe negligible) increase in RPC >>>> response times (the time the nfsd thread spends trimming the cache). >>>> As noted, I think this time would be negligible compared to disk I/O >>>> and network transit times in the total RPC response time? >>> >>> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G >>> network connectivity, spinning on a contended mutex takes a >>> significant amount of CPU time. (For the current design of the NFS >>> server, it may actually be a win to turn off adaptive mutexes -- I >>> should give that a try once I'm able to do more testing.) 
>>> >> Have fun with it. Let me know when you have what you think is a good patch. >> >> rick >> >>> -GAWollman >>> ___ >>> freebsd-hackers@freebsd.org mailing
Re: NFS server bottlenecks
On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote: > Garrett Wollman wrote: >> <> said: >> Simple: just use a separate mutex for each list that a cache entry is on, rather than a global lock for everything. This would reduce the mutex contention, but I'm not sure how significantly since I don't have the means to measure it yet. >>> Well, since the cache trimming is removing entries from the lists, I >>> don't >>> see how that can be done with a global lock for list updates? >> >> Well, the global lock is what we have now, but the cache trimming >> process only looks at one list at a time, so not locking the list that >> isn't being iterated over probably wouldn't hurt, unless there's some >> mechanism (that I didn't see) for entries to move from one list to >> another. Note that I'm considering each hash bucket a separate >> "list". (One issue to worry about in that case would be cache-line >> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE >> ought to be increased to reduce that.) >> > Yea, a separate mutex for each hash list might help. There is also the > LRU list that all entries end up on, that gets used by the trimming code. > (I think? I wrote this stuff about 8 years ago, so I haven't looked at > it in a while.) > > Also, increasing the hash table size is probably a good idea, especially > if you reduce how aggressively the cache is trimmed. > >>> Only doing it once/sec would result in a very large cache when >>> bursts of >>> traffic arrives. >> >> My servers have 96 GB of memory so that's not a big deal for me. >> > This code was originally "production tested" on a server with 1Gbyte, > so times have changed a bit;-) > >>> I'm not sure I see why doing it as a separate thread will improve >>> things. >>> There are N nfsd threads already (N can be bumped up to 256 if you >>> wish) >>> and having a bunch more "cache trimming threads" would just increase >>> contention, wouldn't it? >> >> Only one cache-trimming thread. 
The cache trim holds the (global) >> mutex for much longer than any individual nfsd service thread has any >> need to, and having N threads doing that in parallel is why it's so >> heavily contended. If there's only one thread doing the trim, then >> the nfsd service threads aren't spending time either contending on the >> mutex (it will be held less frequently and for shorter periods). >> > I think the little drc2.patch which will keep the nfsd threads from > acquiring the mutex and doing the trimming most of the time, might be > sufficient. I still don't see why a separate trimming thread will be > an advantage. I'd also be worried that the one cache trimming thread > won't get the job done soon enough. > > When I did production testing on a 1Gbyte server that saw a peak > load of about 100RPCs/sec, it was necessary to trim aggressively. > (Although I'd be tempted to say that a server with 1Gbyte is no > longer relevant, I recently recall someone trying to run FreeBSD > on a i486, although I doubt they wanted to run the nfsd on it.) > >>> The only negative effect I can think of w.r.t. having the nfsd >>> threads doing it would be a (I believe negligible) increase in RPC >>> response times (the time the nfsd thread spends trimming the cache). >>> As noted, I think this time would be negligible compared to disk I/O >>> and network transit times in the total RPC response time? >> >> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G >> network connectivity, spinning on a contended mutex takes a >> significant amount of CPU time. (For the current design of the NFS >> server, it may actually be a win to turn off adaptive mutexes -- I >> should give that a try once I'm able to do more testing.) >> > Have fun with it. Let me know when you have what you think is a good patch. 
> > rick
> >> -GAWollman

My quest for IOPS over NFS continues :) So far I'm not able to achieve more than about 3000 8K read requests over NFS, while the server locally gives much more. And this is all from a file that is completely in the ARC cache, no disk IO involved.

I've snatched a sample DTrace script from the net:
[ http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes ]

And modified it for our new NFS server:

#!/usr/sbin/dtrace -qs

fbt:kernel:nfsrvd_*:entry
{
        self->ts = timestamp;
        @counts[probefunc] = count();
}

fbt:kernel:nfsrvd_*:return
/ self->ts > 0 /
{
        this->delta = (timestamp - self->ts)/100;
}

fbt:kernel:nfsrvd_*:return
/ self
Re: NFS server bottlenecks
On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote:
> [ ... same exchange as quoted in the previous message ... ]
> > rick

I was doing some NFS testing with a RELENG_9 machine and a Linux RHEL machine over a 10G network, and noticed the same nfsd threads issue.

Previously I would read a 32G file locally on the FreeBSD ZFS/NFS server with "dd if=/tank/32G.bin of=/dev/null bs=1M" to cache it completely in ARC (the machine has 196G RAM); doing this again locally I would get close to 4GB/sec read - completely from the cache. But if I try to read the file over NFS from the Linux machine I only get about 100MB/sec, sometimes a bit more, and all of the nfsd threads are clearly visible in top. pmcstat also showed the same mutex contention as in the original post.

I've now applied
accessing geom stats from the kernel
Hello,

I have a small four-SATA-bay machine (an HP ex470) which I'm using as a NAS with FreeBSD+ZFS. It has a dual-color LED for each SATA bay (red and blue; purple if both are lit together), which I'm controlling either from userspace, by writing data to the enclosure management ioport, or, recently, via a kernel module I wrote that uses the led(4) framework and exports the LEDs as device nodes in /dev.

What I'm wondering now is: is there an easy way to access geom/disk stats from within the kernel, so I can make the LEDs flash only during disk activity, without having to do it from userspace?

--
Regards,
Nikolay Denev

___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"