Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Tue May 31 11, Bruce Evans wrote:

On Mon, 30 May 2011 m...@freebsd.org wrote:

On Mon, May 30, 2011 at 8:25 AM, Bruce Evans b...@optusnet.com.au wrote:

On Sat, 28 May 2011 m...@freebsd.org wrote:

...

Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Using /dev/zero always thrashes caches by the amount source buffer size + target buffer size (unless the arch uses nontemporal memory accesses for uiomove, which none do AFAIK). So a large source buffer is always just a pessimization. A large target buffer size is also a pessimization, but for the target buffer a fairly large size is needed to amortize the large syscall costs. In this PR, the target buffer size is 64K. ZERO_REGION_SIZE is 64K on i386 and 2M on amd64. 64K+64K on i386 is good for thrashing the L1 cache.

That depends -- is the cache virtually or physically addressed? The zero_region only has 4k (PAGE_SIZE) of unique physical addresses, so most of the cache thrashing is due to the user-space buffer if the cache is physically addressed.

Oops. I now remember thinking that the much larger source buffer would be OK since it only uses 1 physical page. But it is apparently virtually addressed.

It will only have a noticeable impact on a current L2 cache in competition with other threads. It is hard to fit everything in the L1 cache even with non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0 for the source (D)cache and 4K for the target cache might work). On amd64, 2M+2M is good for thrashing most L2 caches. In this PR, the thrashing is limited by the target buffer size to about 64K+64K, up from 4K+64K, and it is marginal whether the extra thrashing from the larger source buffer makes much difference. The old zbuf source buffer size of PAGE_SIZE was already too large.

Wouldn't this depend on how far down from the use of the buffer the actual copy happens? Another advantage of a large virtual buffer is that it reduces the number of times the copy loop in uiomove has to return up to the device layer that initiated the copy. This is all pretty fast, but again assuming a physical cache, fewer trips is better.

Yes, I had forgotten that I have to keep going back to the uiomove() level for each iteration. That's a lot of overhead, although not nearly as much as going back to the user level. If this is actually important to optimize, then I might add a repeat count to uiomove() and copyout() (actually a different function for the latter).

linux-2.6.10 uses a mmapped /dev/zero and has had this since Y2K according to its comment. Sigh. You will never beat that by copying, but I think mmapping /dev/zero is only much more optimal for silly benchmarks.

linux-2.6.10 also has a seekable /dev/zero. Seeks don't really work, but some of them succeed and keep the offset at 0. ISTR a FreeBSD PR about the file offset for /dev/zero not working because it is garbage instead of 0. It is clearly a Linuxism to depend on it being nonzero. IIRC, the file offset for device files is at best implementation-defined in POSIX.

i think you refer to [1]. i posted a patch as a followup to that PR, but later noticed that it is completely wrong. there was also a discussion on @hackers i opened up with the subject line "seeking into /dev/{null,zero}", however not much came out of it.

POSIX doesn't have anything to say about seeking in connection with /dev/{null,zero}. it only states that:

  "The behavior of lseek() on devices which are incapable of seeking is implementation-defined. The value of the file offset associated with such a device is undefined."

so basically we can decide for ourselves whether /dev/{null,zero} shall be capable or incapable of seeking. i really think this issue should be solved once and for all and then also mentioned in the zero(4) and null(4) man pages. so the question is: how do we want /dev/zero and /dev/null to behave when seeking into the devices? right now HEAD features the following semantics:

  reading from /dev/null != seeking
  writing to /dev/null  != seeking
  reading from /dev/zero == seeking
  writing to /dev/zero  != seeking

please don't get me wrong: i'm NOT saying the current semantics are wrong. the issue in question is: the semantics need to be agreed upon and then documented once and for all in the zero(4) and null(4) man pages, so people don't trip over this question every couple of years, over and over again.

cheers.
alex

[1] http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/152485

--
a13x
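[Editorial note: the semantics listed above are easy to check empirically. The small user-space probe below prints what lseek(2) returns on each device and where the file offset ends up after a read or write. It is an illustrative sketch, not a program from the thread; the file name seektest.c and the function probe() are invented here.]

/*
 * seektest.c - probe the lseek(2) semantics of /dev/null and /dev/zero.
 * The output is exactly the implementation-defined behaviour under
 * discussion, so it will differ between kernels.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static void
probe(const char *path, int do_write)
{
	char buf[16] = { 0 };
	off_t off;
	int fd;

	if ((fd = open(path, O_RDWR)) == -1) {
		perror(path);
		return;
	}
	/* Does an explicit seek succeed, and to what offset? */
	off = lseek(fd, 100, SEEK_SET);
	printf("%s: lseek(fd, 100, SEEK_SET) -> %jd\n", path, (intmax_t)off);

	/* Does I/O advance the offset? */
	if (do_write)
		(void)write(fd, buf, sizeof(buf));
	else
		(void)read(fd, buf, sizeof(buf));
	printf("%s: offset after %s: %jd\n", path,
	    do_write ? "write" : "read", (intmax_t)lseek(fd, 0, SEEK_CUR));
	close(fd);
}

int
main(void)
{
	probe("/dev/null", 0);
	probe("/dev/null", 1);
	probe("/dev/zero", 0);
	probe("/dev/zero", 1);
	return (0);
}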
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Sunday 29 May 2011 05:01:57 m...@freebsd.org wrote:

On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:

To me it looks like it's not able to cache the zeroes anymore. Is this intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't help.

Hmm. I don't have access to my FreeBSD box over the weekend, but I'll run this on my box when I get back to work. Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Indeed it does. I couldn't find any authoritative docs stating whether or not the cache on this CPU is virtually indexed, but apparently at least some of it is.

Regards,
Pieter
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Tue, May 31, 2011 at 2:48 PM, Pieter de Goeje pie...@degoeje.nl wrote:

On Sunday 29 May 2011 05:01:57 m...@freebsd.org wrote:

On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:

To me it looks like it's not able to cache the zeroes anymore. Is this intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't help.

Hmm. I don't have access to my FreeBSD box over the weekend, but I'll run this on my box when I get back to work. Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Indeed it does. I couldn't find any authoritative docs stating whether or not the cache on this CPU is virtually indexed, but apparently at least some of it is.

On my physical box (some Dell thing from about 2008), I ran 10 loops of dd if=/dev/zero of=/dev/null bs=XX count=XX, where bs went by powers of 2 from 512 bytes to 2M and count was set so that dd always transferred 8GB. I compared ZERO_REGION_SIZE of 64k and 2M on amd64. The summary of the ministat(1) output is:

bs=512b - no difference
bs=1K   - no difference
bs=2k   - no difference
bs=4k   - no difference
bs=8k   - no difference
bs=16k  - no difference
bs=32k  - no difference
bs=64k  - no difference
bs=128k - 2M is 0.69% faster
bs=256k - 2M is 0.98% faster
bs=512k - 2M is 0.65% faster
bs=1M   - 2M is 1.02% faster
bs=2M   - 2M is 2.17% slower

I'll play again with a 4K buffer. For some applications (/dev/zero) a small size is sufficient. For some (md(4)) a ZERO_REGION_SIZE at least as large as the sector size is desired, so that a single kernel buffer pointer can be used to set up a uio for VOP_WRITE(9). Attached is the ministat output; I hope it makes it. :-)

Thanks,
matthew

x /data/zero-amd64-small/zero-512.txt
+ /data/zero-amd64-large/zero-512.txt
[box plot omitted]
    N        Min        Max     Median        Avg       Stddev
x  10  13.564276  13.666499  13.590373  13.591993  0.030172083
+  10  13.49174   13.616263  13.569925  13.568006  0.033884281
No difference proven at 95.0% confidence

x /data/zero-amd64-small/zero-1024.txt
+ /data/zero-amd64-large/zero-1024.txt
[box plot omitted]
    N       Min       Max    Median        Avg       Stddev
x  10  7.155384  7.182849  7.168076  7.1661382   0.01041489
+  10  7.124263  7.207363  7.170449  7.1647896  0.023453662
No difference proven at 95.0% confidence

x /data/zero-amd64-small/zero-2048.txt
+ /data/zero-amd64-large/zero-2048.txt
[box plot omitted]
    N       Min       Max    Median        Avg       Stddev
x  10  3.827242  3.867095  3.837901  3.839988   0.012983755
+  10  3.809213  3.843682  3.835748  3.8302765  0.011340307
No difference proven at 95.0% confidence

x /data/zero-amd64-small/zero-4096.txt
+ /data/zero-amd64-large/zero-4096.txt
[box plot omitted]
    N       Min       Max    Median        Avg        Stddev
x  10  2.165541  2.201224  2.173227  2.1769029   0.013803193
+  10  2.161362  2.185911  2.172388  2.1719634  0.0088129371
No difference proven at 95.0% confidence

x /data/zero-amd64-small/zero-8192.txt
+ /data/zero-amd64-large/zero-8192.txt
[box plot omitted; the rest of this dataset is truncated in the archive]
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Tue, May 31, 2011 at 3:47 PM, m...@freebsd.org wrote:

On Tue, May 31, 2011 at 2:48 PM, Pieter de Goeje pie...@degoeje.nl wrote:

On Sunday 29 May 2011 05:01:57 m...@freebsd.org wrote:

On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:

To me it looks like it's not able to cache the zeroes anymore. Is this intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't help.

Hmm. I don't have access to my FreeBSD box over the weekend, but I'll run this on my box when I get back to work. Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Indeed it does. I couldn't find any authoritative docs stating whether or not the cache on this CPU is virtually indexed, but apparently at least some of it is.

On my physical box (some Dell thing from about 2008), I ran 10 loops of dd if=/dev/zero of=/dev/null bs=XX count=XX, where bs went by powers of 2 from 512 bytes to 2M and count was set so that dd always transferred 8GB. I compared ZERO_REGION_SIZE of 64k and 2M on amd64. The summary of the ministat(1) output is:

bs=512b - no difference
bs=1K   - no difference
bs=2k   - no difference
bs=4k   - no difference
bs=8k   - no difference
bs=16k  - no difference
bs=32k  - no difference
bs=64k  - no difference
bs=128k - 2M is 0.69% faster
bs=256k - 2M is 0.98% faster
bs=512k - 2M is 0.65% faster
bs=1M   - 2M is 1.02% faster
bs=2M   - 2M is 2.17% slower

I'll play again with a 4K buffer.

The data is harder to parse precisely, but in general it looks like on my box using a 4K buffer results in significantly worse performance when the dd(1) block size is larger than 4K. How much worse depends on the block size, but it goes from 6% at bs=8k to 17% at bs=256k. Showing 4k/64k/2M ZERO_REGION_SIZE graphically in the ministat(1) output also makes it clear that the difference between 64k and 2M is nearly insignificant on my box compared to using 4k.

http://people.freebsd.org/~mdf/zero-ministat.txt

Cheers,
matthew
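[Editorial note: for readers who want to reproduce runs like the above, a bare-bones C microbenchmark in the spirit of the dd invocation might look like the sketch below. It is illustrative only (the file name zbench.c is invented here) and has minimal error handling.]

/*
 * zbench.c - time reads from /dev/zero at a given block size, in the
 * spirit of "dd if=/dev/zero of=/dev/null bs=... count=...".
 */
#include <sys/time.h>

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct timeval t0, t1;
	long long done, total;
	double secs;
	size_t bs;
	char *buf;
	int fd;

	bs = (argc > 1) ? (size_t)atol(argv[1]) : 64 * 1024;
	total = 8LL * 1024 * 1024 * 1024;	/* 8GB, as in the runs above */
	buf = malloc(bs);
	fd = open("/dev/zero", O_RDONLY);
	if (buf == NULL || fd == -1)
		return (1);

	gettimeofday(&t0, NULL);
	for (done = 0; done < total; )
		done += read(fd, buf, bs);	/* assumes full, error-free reads */
	gettimeofday(&t1, NULL);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("bs=%zu: %.2f GB/s\n", bs, done / secs / 1e9);
	return (0);
}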
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Sat, 28 May 2011 m...@freebsd.org wrote:

On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:

On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:

Author: mdf
Date: Fri May 13 18:48:00 2011
New Revision: 221853
URL: http://svn.freebsd.org/changeset/base/221853

Log:
  Use a globally visible region of zeros for both /dev/zero and the md
  device. There are likely other kernel uses of blobs of zeros that can
  be converted.

  Reviewed by:  alc
  MFC after:    1 week

This change seems to reduce /dev/zero performance by 68% as measured by this command: dd if=/dev/zero of=/dev/null bs=64k count=10.

x dd-8-stable
+ dd-9-current

Argh, hard \xa0s. [...binary garbage deleted]

This particular measurement was against 8-stable, but the results are the same for -current just before this commit. Basically throughput drops from ~13GB/sec to 4GB/sec. Hardware is a Phenom II X4 945 with 8GB of 800MHz DDR2 memory. FreeBSD/amd64 is installed. This processor has 6MB of L3 cache. To me it looks like it's not able to cache the zeroes anymore. Is this intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't help.

Hmm. I don't have access to my FreeBSD box over the weekend, but I'll run this on my box when I get back to work. Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Using /dev/zero always thrashes caches by the amount source buffer size + target buffer size (unless the arch uses nontemporal memory accesses for uiomove, which none do AFAIK). So a large source buffer is always just a pessimization. A large target buffer size is also a pessimization, but for the target buffer a fairly large size is needed to amortize the large syscall costs. In this PR, the target buffer size is 64K. ZERO_REGION_SIZE is 64K on i386 and 2M on amd64. 64K+64K on i386 is good for thrashing the L1 cache.

It will only have a noticeable impact on a current L2 cache in competition with other threads. It is hard to fit everything in the L1 cache even with non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0 for the source (D)cache and 4K for the target cache might work). On amd64, 2M+2M is good for thrashing most L2 caches. In this PR, the thrashing is limited by the target buffer size to about 64K+64K, up from 4K+64K, and it is marginal whether the extra thrashing from the larger source buffer makes much difference. The old zbuf source buffer size of PAGE_SIZE was already too large.

The source buffer size only needs to be large enough to amortize loop overhead. 1 cache line is enough in most cases. uiomove() and copyout() unfortunately don't support copying from register space, so there must be a source buffer.

This may limit the bandwidth by a factor of 2 in some cases, since most modern CPUs can execute either 2 64-bit stores or 1 64-bit store and 1 64-bit load per cycle if everything is already in the L1 cache. However, target buffers for /dev/zero (or any user i/o) probably need to be larger than the L1 cache to amortize the syscall overhead, so there are usually plenty of cycles to spare for the unnecessary loads while the stores wait for caches. This behaviour is easy to see for regular files too (regular files get copied out from the buffer cache). You have limited control on the amount of thrashing by changing the target buffer size, and can determine cache sizes by looking at throughputs.

Bruce
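[Editorial note: Bruce's point that one cache line of source suffices can be sketched in user space. In the loop below the same 64-byte zero source is re-read for every chunk of the target, so the source stays resident in L1 and essentially all cache traffic comes from the target side. Illustrative only; the names zsrc and zero_fill are invented here.]

/*
 * One-cache-line zero source: after the first iteration the source is
 * always an L1 hit, so the copy's cache footprint is the target's.
 */
#include <stddef.h>
#include <string.h>

#define	CACHE_LINE	64

static const char zsrc[CACHE_LINE];	/* one cache line of zeros */

void
zero_fill(char *dst, size_t len)
{
	size_t n;

	while (len > 0) {
		n = (len < CACHE_LINE) ? len : CACHE_LINE;
		memcpy(dst, zsrc, n);	/* source hits L1 every pass */
		dst += n;
		len -= n;
	}
}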
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Mon, May 30, 2011 at 8:25 AM, Bruce Evans b...@optusnet.com.au wrote:

On Sat, 28 May 2011 m...@freebsd.org wrote:

On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:

On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:

Author: mdf
Date: Fri May 13 18:48:00 2011
New Revision: 221853
URL: http://svn.freebsd.org/changeset/base/221853

Log:
  Use a globally visible region of zeros for both /dev/zero and the md
  device. There are likely other kernel uses of blobs of zeros that can
  be converted.

  Reviewed by:  alc
  MFC after:    1 week

This change seems to reduce /dev/zero performance by 68% as measured by this command: dd if=/dev/zero of=/dev/null bs=64k count=10.

x dd-8-stable
+ dd-9-current

Argh, hard \xa0s. [...binary garbage deleted]

This particular measurement was against 8-stable, but the results are the same for -current just before this commit. Basically throughput drops from ~13GB/sec to 4GB/sec. Hardware is a Phenom II X4 945 with 8GB of 800MHz DDR2 memory. FreeBSD/amd64 is installed. This processor has 6MB of L3 cache. To me it looks like it's not able to cache the zeroes anymore. Is this intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't help.

Hmm. I don't have access to my FreeBSD box over the weekend, but I'll run this on my box when I get back to work. Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Using /dev/zero always thrashes caches by the amount source buffer size + target buffer size (unless the arch uses nontemporal memory accesses for uiomove, which none do AFAIK). So a large source buffer is always just a pessimization. A large target buffer size is also a pessimization, but for the target buffer a fairly large size is needed to amortize the large syscall costs. In this PR, the target buffer size is 64K. ZERO_REGION_SIZE is 64K on i386 and 2M on amd64. 64K+64K on i386 is good for thrashing the L1 cache.

That depends -- is the cache virtually or physically addressed? The zero_region only has 4k (PAGE_SIZE) of unique physical addresses, so most of the cache thrashing is due to the user-space buffer if the cache is physically addressed.

It will only have a noticeable impact on a current L2 cache in competition with other threads. It is hard to fit everything in the L1 cache even with non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0 for the source (D)cache and 4K for the target cache might work). On amd64, 2M+2M is good for thrashing most L2 caches. In this PR, the thrashing is limited by the target buffer size to about 64K+64K, up from 4K+64K, and it is marginal whether the extra thrashing from the larger source buffer makes much difference. The old zbuf source buffer size of PAGE_SIZE was already too large.

Wouldn't this depend on how far down from the use of the buffer the actual copy happens? Another advantage of a large virtual buffer is that it reduces the number of times the copy loop in uiomove has to return up to the device layer that initiated the copy. This is all pretty fast, but again assuming a physical cache, fewer trips is better.

Thanks,
matthew

The source buffer size only needs to be large enough to amortize loop overhead. 1 cache line is enough in most cases. uiomove() and copyout() unfortunately don't support copying from register space, so there must be a source buffer. This may limit the bandwidth by a factor of 2 in some cases, since most modern CPUs can execute either 2 64-bit stores or 1 64-bit store and 1 64-bit load per cycle if everything is already in the L1 cache. However, target buffers for /dev/zero (or any user i/o) probably need to be larger than the L1 cache to amortize the syscall overhead, so there are usually plenty of cycles to spare for the unnecessary loads while the stores wait for caches. This behaviour is easy to see for regular files too (regular files get copied out from the buffer cache). You have limited control on the amount of thrashing by changing the target buffer size, and can determine cache sizes by looking at throughputs.

Bruce
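[Editorial note: the "repeat count" Bruce floats in the next message might look like the sketch below. To be clear, uiomove_rep() is NOT a real FreeBSD interface; it is a hypothetical wrapper around the real uiomove(9), invented here to show how a driver could drain a whole uio from one small kernel buffer without returning to the device layer once per buffer-sized chunk.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/uio.h>

/*
 * Hypothetical: copy the same kernel buffer of size n into the uio
 * over and over until the request is satisfied or an error occurs.
 */
static int
uiomove_rep(void *cp, int n, struct uio *uio)
{
	int error, len;

	error = 0;
	while (uio->uio_resid > 0 && error == 0) {
		len = n;
		if (len > uio->uio_resid)
			len = uio->uio_resid;
		error = uiomove(cp, len, uio);	/* the real uiomove(9) */
	}
	return (error);
}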
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Mon, 30 May 2011 m...@freebsd.org wrote:

On Mon, May 30, 2011 at 8:25 AM, Bruce Evans b...@optusnet.com.au wrote:

On Sat, 28 May 2011 m...@freebsd.org wrote:

...

Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I think that will restore things to the original performance.

Using /dev/zero always thrashes caches by the amount source buffer size + target buffer size (unless the arch uses nontemporal memory accesses for uiomove, which none do AFAIK). So a large source buffer is always just a pessimization. A large target buffer size is also a pessimization, but for the target buffer a fairly large size is needed to amortize the large syscall costs. In this PR, the target buffer size is 64K. ZERO_REGION_SIZE is 64K on i386 and 2M on amd64. 64K+64K on i386 is good for thrashing the L1 cache.

That depends -- is the cache virtually or physically addressed? The zero_region only has 4k (PAGE_SIZE) of unique physical addresses, so most of the cache thrashing is due to the user-space buffer if the cache is physically addressed.

Oops. I now remember thinking that the much larger source buffer would be OK since it only uses 1 physical page. But it is apparently virtually addressed.

It will only have a noticeable impact on a current L2 cache in competition with other threads. It is hard to fit everything in the L1 cache even with non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0 for the source (D)cache and 4K for the target cache might work). On amd64, 2M+2M is good for thrashing most L2 caches. In this PR, the thrashing is limited by the target buffer size to about 64K+64K, up from 4K+64K, and it is marginal whether the extra thrashing from the larger source buffer makes much difference. The old zbuf source buffer size of PAGE_SIZE was already too large.

Wouldn't this depend on how far down from the use of the buffer the actual copy happens? Another advantage of a large virtual buffer is that it reduces the number of times the copy loop in uiomove has to return up to the device layer that initiated the copy. This is all pretty fast, but again assuming a physical cache, fewer trips is better.

Yes, I had forgotten that I have to keep going back to the uiomove() level for each iteration. That's a lot of overhead, although not nearly as much as going back to the user level. If this is actually important to optimize, then I might add a repeat count to uiomove() and copyout() (actually a different function for the latter).

linux-2.6.10 uses a mmapped /dev/zero and has had this since Y2K according to its comment. Sigh. You will never beat that by copying, but I think mmapping /dev/zero is only much more optimal for silly benchmarks.

linux-2.6.10 also has a seekable /dev/zero. Seeks don't really work, but some of them succeed and keep the offset at 0. ISTR a FreeBSD PR about the file offset for /dev/zero not working because it is garbage instead of 0. It is clearly a Linuxism to depend on it being nonzero. IIRC, the file offset for device files is at best implementation-defined in POSIX.

Bruce
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:

Author: mdf
Date: Fri May 13 18:48:00 2011
New Revision: 221853
URL: http://svn.freebsd.org/changeset/base/221853

Log:
  Use a globally visible region of zeros for both /dev/zero and the md
  device. There are likely other kernel uses of blobs of zeros that can
  be converted.

  Reviewed by:  alc
  MFC after:    1 week

This change seems to reduce /dev/zero performance by 68% as measured by this command: dd if=/dev/zero of=/dev/null bs=64k count=10.

x dd-8-stable
+ dd-9-current
[box plot omitted]
    N            Min            Max         Median            Avg         Stddev
x   5  1.2573578e+10  1.3156063e+10  1.2827355e+10   1.290079e+10  2.4951207e+08
+   5  4.1271391e+09  4.1453925e+09  4.1295157e+09  4.1328097e+09      7487363.6
Difference at 95.0% confidence
        -8.76798e+09 +/- 2.57431e+08
        -67.9647% +/- 1.99547%
        (Student's t, pooled s = 1.76511e+08)

This particular measurement was against 8-stable, but the results are the same for -current just before this commit. Basically throughput drops from ~13GB/sec to 4GB/sec. Hardware is a Phenom II X4 945 with 8GB of 800MHz DDR2 memory. FreeBSD/amd64 is installed. This processor has 6MB of L3 cache. To me it looks like it's not able to cache the zeroes anymore. Is this intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't help.

Regards,
Pieter de Goeje
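[Editorial sanity check, not part of the original mail: the headline figure is just the relative change in the mean throughputs above, in bytes/second:

  (4.1328097e+09 - 1.290079e+10) / 1.290079e+10 ~= -0.6796

which matches the -67.9647% that ministat reports, i.e. roughly 12.9 GB/s before the commit versus 4.1 GB/s after it.]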
Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm
Matthew D Fleming m...@freebsd.org writes:

Log:
  Use a globally visible region of zeros for both /dev/zero and the md
  device. There are likely other kernel uses of blobs of zeros that can
  be converted.

Excellent, thank you!

DES
--
Dag-Erling Smørgrav - d...@des.no
svn commit: r221853 - in head/sys: dev/md dev/null sys vm
Author: mdf
Date: Fri May 13 18:48:00 2011
New Revision: 221853
URL: http://svn.freebsd.org/changeset/base/221853

Log:
  Use a globally visible region of zeros for both /dev/zero and the md
  device.  There are likely other kernel uses of blobs of zeros that can
  be converted.

  Reviewed by:	alc
  MFC after:	1 week

Modified:
  head/sys/dev/md/md.c
  head/sys/dev/null/null.c
  head/sys/sys/systm.h
  head/sys/vm/vm_kern.c

Modified: head/sys/dev/md/md.c
==============================================================================
--- head/sys/dev/md/md.c	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/dev/md/md.c	Fri May 13 18:48:00 2011	(r221853)
@@ -205,9 +205,6 @@ struct md_s {
 	vm_object_t object;
 };
 
-/* Used for BIO_DELETE on MD_VNODE */
-static u_char zero[PAGE_SIZE];
-
 static struct indir *
 new_indir(u_int shift)
 {
@@ -560,7 +557,8 @@ mdstart_vnode(struct md_s *sc, struct bi
 	 * that the two cases end up having very little in common.
 	 */
 	if (bp->bio_cmd == BIO_DELETE) {
-		zerosize = sizeof(zero) - (sizeof(zero) % sc->sectorsize);
+		zerosize = ZERO_REGION_SIZE -
+		    (ZERO_REGION_SIZE % sc->sectorsize);
 		auio.uio_iov = &aiov;
 		auio.uio_iovcnt = 1;
 		auio.uio_offset = (vm_ooffset_t)bp->bio_offset;
@@ -573,7 +571,7 @@ mdstart_vnode(struct md_s *sc, struct bi
 		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 		error = 0;
 		while (auio.uio_offset < end) {
-			aiov.iov_base = zero;
+			aiov.iov_base = __DECONST(void *, zero_region);
 			aiov.iov_len = end - auio.uio_offset;
 			if (aiov.iov_len > zerosize)
 				aiov.iov_len = zerosize;

Modified: head/sys/dev/null/null.c
==============================================================================
--- head/sys/dev/null/null.c	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/dev/null/null.c	Fri May 13 18:48:00 2011	(r221853)
@@ -65,8 +65,6 @@ static struct cdevsw zero_cdevsw = {
 	.d_flags =	D_MMAP_ANON,
 };
 
-static void *zbuf;
-
 /* ARGSUSED */
 static int
 null_write(struct cdev *dev __unused, struct uio *uio, int flags __unused)
@@ -95,10 +93,19 @@ null_ioctl(struct cdev *dev __unused, u_
 static int
 zero_read(struct cdev *dev __unused, struct uio *uio, int flags __unused)
 {
+	void *zbuf;
+	ssize_t len;
 	int error = 0;
 
-	while (uio->uio_resid > 0 && error == 0)
-		error = uiomove(zbuf, MIN(uio->uio_resid, PAGE_SIZE), uio);
+	KASSERT(uio->uio_rw == UIO_READ,
+	    ("Can't be in %s for write", __func__));
+	zbuf = __DECONST(void *, zero_region);
+	while (uio->uio_resid > 0 && error == 0) {
+		len = uio->uio_resid;
+		if (len > ZERO_REGION_SIZE)
+			len = ZERO_REGION_SIZE;
+		error = uiomove(zbuf, len, uio);
+	}
 	return (error);
 }
 
@@ -111,7 +118,6 @@ null_modevent(module_t mod __unused, int
 	case MOD_LOAD:
 		if (bootverbose)
 			printf("null: <null device, zero device>\n");
-		zbuf = (void *)malloc(PAGE_SIZE, M_TEMP, M_WAITOK | M_ZERO);
 		null_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &null_cdevsw, 0,
 		    NULL, UID_ROOT, GID_WHEEL, 0666, "null");
 		zero_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &zero_cdevsw, 0,
@@ -121,7 +127,6 @@ null_modevent(module_t mod __unused, int
 	case MOD_UNLOAD:
 		destroy_dev(null_dev);
 		destroy_dev(zero_dev);
-		free(zbuf, M_TEMP);
 		break;
 	case MOD_SHUTDOWN:

Modified: head/sys/sys/systm.h
==============================================================================
--- head/sys/sys/systm.h	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/sys/systm.h	Fri May 13 18:48:00 2011	(r221853)
@@ -125,6 +125,9 @@ extern char static_hints[];	/* by config
 
 extern char **kenvp;
 
+extern const void *zero_region;	/* address space maps to a zeroed page */
+#define	ZERO_REGION_SIZE	(2048 * 1024)
+
 /*
  * General function declarations.
  */

Modified: head/sys/vm/vm_kern.c
==============================================================================
--- head/sys/vm/vm_kern.c	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/vm/vm_kern.c	Fri May 13 18:48:00 2011	(r221853)
@@ -91,6 +91,9 @@ vm_map_t exec_map=0;
 vm_map_t pipe_map;
 vm_map_t buffer_map=0;
 
+const void *zero_region;
+CTASSERT((ZERO_REGION_SIZE & PAGE_MASK) == 0);
+
 /*
  *	kmem_alloc_nofault:
  *
@@ -527,6 +530,35 @@ kmem_free_wakeup(map, addr, size)
 	vm_map_unlock(map);
 }
 
+static void
+kmem_init_zero_region(void)
+{
+	vm_offset_t addr;
+	vm_page_t m;
+	unsigned int i;
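[Editorial note: the diff is truncated above, mid-function. As a hedged reconstruction (the function and flag names below follow the FreeBSD 9-era KPI but are written from memory, not quoted from the commit), an initializer of this shape allocates ZERO_REGION_SIZE worth of kernel VA, backs the whole range with a single zeroed, wired physical page mapped at every page offset, and then makes the region read-only -- which is why zero_region has only PAGE_SIZE of unique physical addresses, as discussed earlier in the thread.]

static void
kmem_init_zero_region(void)
{
	vm_offset_t addr, i;
	vm_page_t m;

	addr = kmem_alloc_nofault(kernel_map, ZERO_REGION_SIZE);
	m = vm_page_alloc(NULL, OFF_TO_IDX(addr - VM_MIN_KERNEL_ADDRESS),
	    VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO);
	if ((m->flags & PG_ZERO) == 0)
		pmap_zero_page(m);
	/* One physical page, mapped at every page of the virtual range. */
	for (i = 0; i < ZERO_REGION_SIZE; i += PAGE_SIZE)
		pmap_qenter(addr + i, &m, 1);
	/* Nothing may ever write through zero_region. */
	(void)vm_map_protect(kernel_map, addr, addr + ZERO_REGION_SIZE,
	    VM_PROT_READ, TRUE);

	zero_region = (const void *)addr;
}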