Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-06-03 Thread Alexander Best
On Tue May 31 11, Bruce Evans wrote:
 On Mon, 30 May 2011 m...@freebsd.org wrote:
 
 On Mon, May 30, 2011 at 8:25 AM, Bruce Evans b...@optusnet.com.au wrote:
 On Sat, 28 May 2011 m...@freebsd.org wrote:
 ...
 Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
 think that will restore things to the original performance.
 
 Using /dev/zero always thrashes caches by the amount source buffer
 size + target buffer size (unless the arch uses nontemporal memory
 accesses for uiomove, which none do AFAIK).  So a large source buffer
 is always just a pessimization.  A large target buffer size is also a
 pessimization, but for the target buffer a fairly large size is needed
 to amortize the large syscall costs.  In this PR, the target buffer
 size is 64K.  ZERO_REGION_SIZE is 64K on i386 and 2M on amd64.  64K+64K
 on i386 is good for thrashing the L1 cache.
 
 That depends -- is the cache virtually or physically addressed?  The
 zero_region only has 4k (PAGE_SIZE) of unique physical addresses.  So
 most of the cache thrashing is due to the user-space buffer, if the
 cache is physically addressed.
 
 Oops.  I now remember thinking that the much larger source buffer would be
 OK since it only uses 1 physical page.  But it is apparently virtually
 addressed.
 
 It will only have a
 noticeable impact on a current L2 cache in competition with other
 threads.  It is hard to fit everything in the L1 cache even with
 non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0
 for the source (D)cache and 4K for the target cache might work).  On
 amd64, 2M+2M is good for thrashing most L2 caches.  In this PR, the
 thrashing is limited by the target buffer size to about 64K+64K, up
 from 4K+64K, and it is marginal whether the extra thrashing from the
 larger source buffer makes much difference.
 
 The old zbuf source buffer size of PAGE_SIZE was already too large.
 
 Wouldn't this depend on how far down from the use of the buffer the
 actual copy happens?  Another advantage to a large virtual buffer is
 that it reduces the number of times the copy loop in uiomove has to
 return up to the device layer that initiated the copy.  This is all
 pretty fast, but again assuming a physical cache fewer trips is
 better.
 
 Yes, I had forgotten that I have to keep going back to the uiomove()
 level for each iteration.  That's a lot of overhead although not nearly
 as much as going back to the user level.  If this is actually important
 to optimize, then I might add a repeat count to uiomove() and copyout()
 (actually a different function for the latter).
 
 linux-2.6.10 uses a mmapped /dev/zero and has had this since Y2K
 according to its comment.  Sigh.  You will never beat that by copying,
 but I think mmapping /dev/zero is only much more optimal for silly
 benchmarks.
 
 linux-2.6.10 also has a seekable /dev/zero.  Seeks don't really work,
but some of them succeed and keep the offset at 0.  ISTR
 a FreeBSD PR about the file offset for /dev/zero not working because
 it is garbage instead of 0.  It is clearly a Linuxism to depend on it
 being nonzero.  IIRC, the file offset for device files is at best
 implementation-defined in POSIX.

i think you refer to [1]. i posted a patch as a follow-up to that PR, but later
noticed that it is completely wrong. there was also a discussion i opened up on
@hackers with the subject line "seeking into /dev/{null,zero}"; however, not
much came out of it. POSIX doesn't have anything to say about seeking in
connection with /dev/{null,zero}. it only states that:

The behavior of lseek() on devices which are incapable of seeking is 
implementation-defined.
The value of the file offset associated with such a device is undefined.

so basically we can decide for ourselves whether /dev/{null,zero} shall be
capable or incapable of seeking.

i really think this issue should be solved once and for all and then also
mentioned in the zero(4) and null(4) man pages. so the question is:

how do we want /dev/zero and /dev/null to behave when seeking into the devices?

right now HEAD features the following semantics (a quick userland check
follows below the list):

reading from /dev/null does not advance the file offset
writing to /dev/null does not advance the file offset
reading from /dev/zero advances the file offset
writing to /dev/zero does not advance the file offset
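
for reference, a quick userland check of those four cases plus an explicit
lseek() (a sketch only; the file name and the probe() helper are mine, and the
output of course depends on whatever semantics the running kernel implements):

/*
 * seekcheck.c: probe how read(2)/write(2) move the file offset on
 * /dev/null and /dev/zero, and whether an explicit lseek(2) succeeds.
 * Sketch only; results depend on the kernel's chosen semantics.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
probe(const char *path, int writing)
{
	char buf[512];
	off_t off;
	int fd;

	memset(buf, 0, sizeof(buf));
	fd = open(path, writing ? O_WRONLY : O_RDONLY);
	if (fd == -1) {
		perror(path);
		return;
	}
	if (writing)
		(void)write(fd, buf, sizeof(buf));
	else
		(void)read(fd, buf, sizeof(buf));
	off = lseek(fd, 0, SEEK_CUR);	/* did the I/O move the offset? */
	printf("%s after %s: offset %jd, lseek(4096, SEEK_SET) %s\n",
	    path, writing ? "write" : "read", (intmax_t)off,
	    lseek(fd, 4096, SEEK_SET) == -1 ? "fails" : "succeeds");
	close(fd);
}

int
main(void)
{
	probe("/dev/null", 0);
	probe("/dev/null", 1);
	probe("/dev/zero", 0);
	probe("/dev/zero", 1);
	return (0);
}

compile it with cc and compare the reported offsets against the list above.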

please don't get me wrong: i'm NOT saying the current semantics are wrong. the
point is that the semantics need to be agreed upon and then documented once and
for all in the zero(4) and null(4) man pages, so people don't trip over this
question every couple of years.

cheers.
alex

[1] http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/152485

 
 Bruce


-- 
a13x


Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-31 Thread Pieter de Goeje
On Sunday 29 May 2011 05:01:57 m...@freebsd.org wrote:
 On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:
  To me it looks like it's not able to cache the zeroes anymore. Is this
  intentional? I tried to change ZERO_REGION_SIZE back to 64K but that
  didn't help.

 Hmm.  I don't have access to my FreeBSD box over the weekend, but I'll
 run this on my box when I get back to work.

 Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
 think that will restore things to the original performance.

Indeed it does. I couldn't find any authoritative docs stating whether or not 
the cache on this CPU is virtually indexed, but apparently at least some of 
it is.

Regards,

Pieter


Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-31 Thread mdf
On Tue, May 31, 2011 at 2:48 PM, Pieter de Goeje pie...@degoeje.nl wrote:
 On Sunday 29 May 2011 05:01:57 m...@freebsd.org wrote:
 On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:
  To me it looks like it's not able to cache the zeroes anymore. Is this
  intentional? I tried to change ZERO_REGION_SIZE back to 64K but that
  didn't help.

 Hmm.  I don't have access to my FreeBSD box over the weekend, but I'll
 run this on my box when I get back to work.

 Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
 think that will restore things to the original performance.

 Indeed it does. I couldn't find any authoritative docs stating whether or not
 the cache on this CPU is virtually indexed, but apparently at least some of
 it is.

On my physical box (some Dell thing from about 2008), I ran 10 loops
of dd if=/dev/zero of=/dev/null bs=XX count=XX where bs went by powers
of 2 from 512 bytes to 2M, and count was set so that the dd always
transferred 8GB.  I compared ZERO_REGION_SIZE of 64k and 2M on amd64.

The summary of the ministat(1) output is:

bs=512b - no difference
bs=1K - no difference
bs=2k - no difference
bs=4k - no difference
bs=8k - no difference
bs=16k - no difference
bs=32k - no difference
bs=64k - no difference
bs=128k - 2M is 0.69% faster
bs=256k - 2M is 0.98% faster
bs=512k - 2M is 0.65% faster
bs=1M - 2M is 1.02% faster
bs=2M - 2M is 2.17% slower

I'll play again with a 4K buffer.  For some applications (/dev/zero) a
small size is sufficient.  For some (md(4)) a ZERO_REGION_SIZE at
least as large as the sectorsize is desired so that a single kernel
buffer pointer can be used to set up a uio for VOP_WRITE(9).
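
As a userland analogue of what the committed md(4) code does (a sketch only;
the kernel version in the diff at the end of the thread points an iovec at
zero_region and issues VOP_WRITE(9) rather than pwrite(2), and ZBUF_SIZE here
merely stands in for ZERO_REGION_SIZE):

/*
 * Zero a byte range of a file from one static buffer of zeros, at least a
 * sector's worth per call: a userland analogue of the BIO_DELETE path in
 * the committed md.c, which points an iovec at zero_region instead.
 */
#include <sys/types.h>
#include <unistd.h>

#define	ZBUF_SIZE	(64 * 1024)	/* stands in for ZERO_REGION_SIZE */

static const char zbuf[ZBUF_SIZE];	/* static storage is already zeroed */

static int
zero_range(int fd, off_t off, off_t len, size_t sectorsize)
{
	size_t zerosize, chunk;
	ssize_t n;

	/* Largest multiple of the sector size that fits in the buffer. */
	zerosize = ZBUF_SIZE - (ZBUF_SIZE % sectorsize);
	while (len > 0) {
		chunk = len > (off_t)zerosize ? zerosize : (size_t)len;
		n = pwrite(fd, zbuf, chunk, off);  /* same pointer every time */
		if (n == -1)
			return (-1);
		off += n;
		len -= n;
	}
	return (0);
}

The zerosize calculation is the reason for wanting the region at least as
large as the sector size: every write stays a whole number of sectors while
still coming from the one shared buffer.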

Attached is the ministat output; I hope it makes it. :-)

Thanks,
matthew
x /data/zero-amd64-small/zero-512.txt
+ /data/zero-amd64-large/zero-512.txt
+--+
| +   x|
|+  + +x*x+  +x x *  x+x +x|
|  |__|__AM___MA___|___|   |
+--+
N   Min   MaxMedian   AvgStddev
x  10 13.564276 13.666499 13.590373 13.591993   0.030172083
+  10  13.49174 13.616263 13.569925 13.568006   0.033884281
No difference proven at 95.0% confidence



x /data/zero-amd64-small/zero-1024.txt
+ /data/zero-amd64-large/zero-1024.txt
+--+
|++   ++xx  x  x +* ++ xx++|
|   ||___AAM__M|_| |
+--+
N   Min   MaxMedian   AvgStddev
x  10  7.155384  7.182849  7.168076 7.16613820.01041489
+  10  7.124263  7.207363  7.170449 7.1647896   0.023453662
No difference proven at 95.0% confidence



x /data/zero-amd64-small/zero-2048.txt
+ /data/zero-amd64-large/zero-2048.txt
+--+
|  +   |
|+  +  +xx   *x   +* xx+   ++xx   x|
||_|A_M__M__A_|_|  |
+--+
N   Min   MaxMedian   AvgStddev
x  10  3.827242  3.867095  3.837901  3.839988   0.012983755
+  10  3.809213  3.843682  3.835748 3.8302765   0.011340307
No difference proven at 95.0% confidence



x /data/zero-amd64-small/zero-4096.txt
+ /data/zero-amd64-large/zero-4096.txt
+--+
|+ +   ++xxx   x   + + * x+  ++   x   x   x|
|   |___AM_M_A___|_|   |
+--+
N   Min   MaxMedian   AvgStddev
x  10  2.165541  2.201224  2.173227 2.1769029   0.013803193
+  10  2.161362  2.185911  2.172388 2.1719634  0.0088129371
No difference proven at 95.0% confidence



x /data/zero-amd64-small/zero-8192.txt
+ /data/zero-amd64-large/zero-8192.txt
+--+
|+x|
|+   x  +  +  +x  +x++x+  x xx  +   x x|
| |__|___A__M_A_|___|  |

Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-31 Thread mdf
On Tue, May 31, 2011 at 3:47 PM,  m...@freebsd.org wrote:
 On Tue, May 31, 2011 at 2:48 PM, Pieter de Goeje pie...@degoeje.nl wrote:
 On Sunday 29 May 2011 05:01:57 m...@freebsd.org wrote:
 On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:
  To me it looks like it's not able to cache the zeroes anymore. Is this
  intentional? I tried to change ZERO_REGION_SIZE back to 64K but that
  didn't help.

 Hmm.  I don't have access to my FreeBSD box over the weekend, but I'll
 run this on my box when I get back to work.

 Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
 think that will restore things to the original performance.

 Indeed it does. I couldn't find any authoritative docs stating whether or not
 the cache on this CPU is virtually indexed, but apparently at least some of
 it is.

 On my physical box (some Dell thing from about 2008), I ran 10 loops
 of dd if=/dev/zero of=/dev/null bs=XX count=XX where bs went by powers
 of 2 from 512 bytes to 2M, and count was set so that the dd always
 transferred 8GB.  I compared ZERO_REGION_SIZE of 64k and 2M on amd64.

 The summary of the ministat(1) output is:

 bs=512b - no difference
 bs=1K - no difference
 bs=2k - no difference
 bs=4k - no difference
 bs=8k - no difference
 bs=16k - no difference
 bs=32k - no difference
 bs=64k - no difference
 bs=128k - 2M is 0.69% faster
 bs=256k - 2M is 0.98% faster
 bs=512k - 2M is 0.65% faster
 bs=1M - 2M is 1.02% faster
 bs=2M - 2M is 2.17% slower

 I'll play again with a 4K buffer.

The data is harder to parse precisely, but in general it looks like on
my box using a 4K buffer results in significantly worse performance
when the dd(1) block size is larger than 4K.  How much worse depends
on the block size, but it goes from 6% at bs=8k to 17% at bs=256k.
Showing 4k/64k/2M ZERO_REGION_SIZE graphically in the ministat(1)
output also makes it clear that the difference between 64k and 2M is
nearly insignificant on my box compared to using 4k.

http://people.freebsd.org/~mdf/zero-ministat.txt

Cheers,
matthew


Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-30 Thread Bruce Evans

On Sat, 28 May 2011 m...@freebsd.org wrote:


On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl wrote:

On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:

Author: mdf
Date: Fri May 13 18:48:00 2011
New Revision: 221853
URL: http://svn.freebsd.org/changeset/base/221853

Log:
  Usa a globally visible region of zeros for both /dev/zero and the md
  device.  There are likely other kernel uses of blob of zeros than can
  be converted.

  Reviewed by:	alc
  MFC after:	1 week


This change seems to reduce /dev/zero performance by 68% as measured by this
command: dd if=/dev/zero of=/dev/null bs=64k count=10.

x dd-8-stable
+ dd-9-current
+-+
|+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?|


Argh, hard \xa0.

[...binary garbage deleted]


This particular measurement was against 8-stable but the results are the same
for -current just before this commit. Basically throughput drops from
~13GB/sec to 4GB/sec.

Hardware is a Phenom II X4 945 with 8GB of 800 MHz DDR2 memory. FreeBSD/amd64
is installed. This processor has 6MB of L3 cache.

To me it looks like it's not able to cache the zeroes anymore. Is this
intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't
help.


Hmm.  I don't have access to my FreeBSD box over the weekend, but I'll
run this on my box when I get back to work.

Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
think that will restore things to the original performance.


Using /dev/zero always thrashes caches by the amount source buffer
size + target buffer size (unless the arch uses nontemporal memory
accesses for uiomove, which none do AFAIK).  So a large source buffer
is always just a pessimization.  A large target buffer size is also a
pessimization, but for the target buffer a fairly large size is needed
to amortize the large syscall costs.  In this PR, the target buffer
size is 64K.  ZERO_REGION_SIZE is 64K on i386 and 2M on amd64.  64K+64K
on i386 is good for thrashing the L1 cache.  It will only have a
noticeable impact on a current L2 cache in competition with other
threads.  It is hard to fit everything in the L1 cache even with
non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0
for the source (D)cache and 4K for the target cache might work).  On
amd64, 2M+2M is good for thrashing most L2 caches.  In this PR, the
thrashing is limited by the target buffer size to about 64K+64K, up
from 4K+64K, and it is marginal whether the extra thrashing from the
larger source buffer makes much difference.

The old zbuf source buffer size of PAGE_SIZE was already too large.
The source buffer size only needs to be large enough to amortize
loop overhead.  1 cache line is enough in most cases.  uiomove()
and copyout() unfortunately don't support copying from register
space, so there must be a source buffer.  This may limit the bandwidth
by a factor of 2 in some cases, since most modern CPUs can execute
either 2 64-bit stores or 1 64-bit store and 1 64-bit load per cycle
if everything is already in the L1 cache.  However, target buffers
for /dev/zero (or any user i/o) probably need to be larger than the
L1 cache to amortize the syscall overhead, so there are usually plenty
of cycles to spare for the unnecessary loads while the stores wait for
caches.

This behaviour is easy to see for regular files too (regular files get
copied out from the buffer cache).  You have limited control on the
amount of thrashing by changing the target buffer size, and can determine
cache sizes by looking at throughputs.
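
For anyone who wants to repeat that experiment without dd(1), a minimal
sketch of such a measurement (the file name, sizes and minimal argument
checking are all arbitrary; step the buffer size through powers of two and
look for the knees in the reported throughput):

/*
 * zerobench.c: copy from /dev/zero to /dev/null with a given buffer size
 * and report throughput.  A sketch of the dd-style measurement discussed
 * in this thread; buffer size and total size are arbitrary parameters.
 *
 * Usage: zerobench <bufsize-bytes> <total-MB>
 */
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct timeval t0, t1;
	char *buf;
	size_t bufsize;
	long long total, done;
	double secs;
	int zfd, nfd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s bufsize total-MB\n", argv[0]);
		return (1);
	}
	bufsize = strtoul(argv[1], NULL, 0);
	total = strtoll(argv[2], NULL, 0) * 1024 * 1024;
	buf = malloc(bufsize);
	zfd = open("/dev/zero", O_RDONLY);
	nfd = open("/dev/null", O_WRONLY);
	if (buf == NULL || zfd == -1 || nfd == -1) {
		perror("setup");
		return (1);
	}

	gettimeofday(&t0, NULL);
	for (done = 0; done < total; done += bufsize) {
		if (read(zfd, buf, bufsize) != (ssize_t)bufsize ||
		    write(nfd, buf, bufsize) != (ssize_t)bufsize) {
			perror("copy");
			return (1);
		}
	}
	gettimeofday(&t1, NULL);

	/* Elapsed wall-clock time and resulting throughput. */
	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("bufsize %zu: %.1f MB/s\n", bufsize,
	    (double)total / (1024 * 1024) / secs);
	free(buf);
	return (0);
}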

Bruce

Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-30 Thread mdf
On Mon, May 30, 2011 at 8:25 AM, Bruce Evans b...@optusnet.com.au wrote:
 On Sat, 28 May 2011 m...@freebsd.org wrote:

 On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje pie...@degoeje.nl
 wrote:

 On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:

 Author: mdf
 Date: Fri May 13 18:48:00 2011
 New Revision: 221853
 URL: http://svn.freebsd.org/changeset/base/221853

 Log:
   Usa a globally visible region of zeros for both /dev/zero and the md
   device.  There are likely other kernel uses of blob of zeros than
 can
   be converted.

   Reviewed by:        alc
   MFC after:  1 week

 This change seems to reduce /dev/zero performance by 68% as measured by
 this
 command: dd if=/dev/zero of=/dev/null bs=64k count=10.

 x dd-8-stable
 + dd-9-current

 +-+
 |+
  |

 Argh, hard \xa0.

 [...binary garbage deleted]

 This particular measurement was against 8-stable but the results are the
 same
 for -current just before this commit. Basically throughput drops from
 ~13GB/sec to 4GB/sec.

 Hardware is a Phenom II X4 945 with 8GB of 800 MHz DDR2 memory.
 FreeBSD/amd64
 is installed. This processor has 6MB of L3 cache.

 To me it looks like it's not able to cache the zeroes anymore. Is this
 intentional? I tried to change ZERO_REGION_SIZE back to 64K but that
 didn't
 help.

 Hmm.  I don't have access to my FreeBSD box over the weekend, but I'll
 run this on my box when I get back to work.

 Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
 think that will restore things to the original performance.

 Using /dev/zero always thrashes caches by the amount source buffer
 size + target buffer size (unless the arch uses nontemporal memory
 accesses for uiomove, which none do AFAIK).  So a large source buffer
 is always just a pessimization.  A large target buffer size is also a
 pessimization, but for the target buffer a fairly large size is needed
 to amortize the large syscall costs.  In this PR, the target buffer
 size is 64K.  ZERO_REGION_SIZE is 64K on i386 and 2M on amd64.  64K+64K
 on i386 is good for thrashing the L1 cache.

That depends -- is the cache virtually or physically addressed?  The
zero_region only has 4k (PAGE_SIZE) of unique physical addresses.  So
most of the cache thrashing is due to the user-space buffer, if the
cache is physically addressed.


  It will only have a
 noticeable impact on a current L2 cache in competition with other
 threads.  It is hard to fit everything in the L1 cache even with
 non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0
 for the source (D)cache and 4K for the target cache might work).  On
 amd64, 2M+2M is good for thrashing most L2 caches.  In this PR, the
 thrashing is limited by the target buffer size to about 64K+64K, up
 from 4K+64K, and it is marginal whether the extra thrashing from the
 larger source buffer makes much difference.

 The old zbuf source buffer size of PAGE_SIZE was already too large.

Wouldn't this depend on how far down from the use of the buffer the
actual copy happens?  Another advantage to a large virtual buffer is
that it reduces the number of times the copy loop in uiomove has to
return up to the device layer that initiated the copy.  This is all
pretty fast, but again assuming a physical cache fewer trips is
better.
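
For reference, the loop the commit ended up with in zero_read() (the full
diff is at the end of the thread) makes the trade-off concrete: each pass
hands uiomove() at most ZERO_REGION_SIZE bytes, so a larger region means
fewer trips back up to this level per read(2).

	/* Excerpt from the committed null.c zero_read(). */
	zbuf = __DECONST(void *, zero_region);
	while (uio->uio_resid > 0 && error == 0) {
		len = uio->uio_resid;
		if (len > ZERO_REGION_SIZE)
			len = ZERO_REGION_SIZE;
		error = uiomove(zbuf, len, uio);
	}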

Thanks,
matthew

 The source buffer size only needs to be large enough to amortize
 loop overhead.  1 cache line is enough in most cases.  uiomove()
 and copyout() unfortunately don't support copying from register
 space, so there must be a source buffer.  This may limit the bandwidth
 by a factor of 2 in some cases, since most modern CPUs can execute
 either 2 64-bit stores or 1 64-bit store and 1 64-bit load per cycle
 if everything is already in the L1 cache.  However, target buffers
 for /dev/zero (or any user i/o) probably need to be larger than the
 L1 cache to amortize the syscall overhead, so there are usually plenty
 of cycles to spare for the unnecessary loads while the stores wait for
 caches.

 This behaviour is easy to see for regular files too (regular files get
 copied out from the buffer cache).  You have limited control on the
 amount of thrashing by changing the target buffer size, and can determine
 cache sizes by looking at throughputs.

 Bruce


Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-30 Thread Bruce Evans

On Mon, 30 May 2011 m...@freebsd.org wrote:


On Mon, May 30, 2011 at 8:25 AM, Bruce Evans b...@optusnet.com.au wrote:

On Sat, 28 May 2011 m...@freebsd.org wrote:

...
Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
think that will restore things to the original performance.


Using /dev/zero always thrashes caches by the amount source buffer
size + target buffer size (unless the arch uses nontemporal memory
accesses for uiomove, which none do AFAIK).  So a large source buffer
is always just a pessimization.  A large target buffer size is also a
pessimization, but for the target buffer a fairly large size is needed
to amortize the large syscall costs.  In this PR, the target buffer
size is 64K.  ZERO_REGION_SIZE is 64K on i386 and 2M on amd64.  64K+64K
on i386 is good for thrashing the L1 cache.


That depends -- is the cache virtually or physically addressed?  The
zero_region only has 4k (PAGE_SIZE) of unique physical addresses.  So
most of the cache thrashing is due to the user-space buffer, if the
cache is physically addressed.


Oops.  I now remember thinking that the much larger source buffer would be
OK since it only uses 1 physical page.  But it is apparently virtually
addressed.


It will only have a

noticeable impact on a current L2 cache in competition with other
threads.  It is hard to fit everything in the L1 cache even with
non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0
for the source (D)cache and 4K for the target cache might work).  On
amd64, 2M+2M is good for thrashing most L2 caches.  In this PR, the
thrashing is limited by the target buffer size to about 64K+64K, up
from 4K+64K, and it is marginal whether the extra thrashing from the
larger source buffer makes much difference.

The old zbuf source buffer size of PAGE_SIZE was already too large.


Wouldn't this depend on how far down from the use of the buffer the
actual copy happens?  Another advantage to a large virtual buffer is
that it reduces the number of times the copy loop in uiomove has to
return up to the device layer that initiated the copy.  This is all
pretty fast, but again assuming a physical cache fewer trips is
better.


Yes, I had forgotten that I have to keep going back to the uiomove()
level for each iteration.  That's a lot of overhead although not nearly
as much as going back to the user level.  If this is actually important
to optimize, then I might add a repeat count to uiomove() and copyout()
(actually a different function for the latter).

linux-2.6.10 uses a mmapped /dev/zero and has had this since Y2K
according to its comment.  Sigh.  You will never beat that by copying,
but I think mmapping /dev/zero is only much more optimal for silly
benchmarks.
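
For comparison, the mmap route from userland looks like this (a sketch;
FreeBSD's /dev/zero supports it via the D_MMAP_ANON flag visible in the
null.c diff below, and MAP_ANON gives the same zero-fill pages without
opening a device at all):

/*
 * mapzero.c: get zero-filled memory by mapping /dev/zero instead of
 * read(2)ing from it; nothing is copied, the VM system supplies
 * copy-on-write zero-fill pages.
 */
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	size_t len = 1024 * 1024;
	char *p;
	int fd;

	fd = open("/dev/zero", O_RDWR);
	if (fd == -1) {
		perror("/dev/zero");
		return (1);
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return (1);
	}
	printf("first byte: %d\n", p[0]);	/* always 0 */
	p[0] = 1;				/* private copy-on-write page */
	munmap(p, len);
	close(fd);
	return (0);
}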

linux-2.6.10 also has a seekable /dev/zero.  Seeks don't really work,
but some of them succeed and keep the offset at 0.  ISTR
a FreeBSD PR about the file offset for /dev/zero not working because
it is garbage instead of 0.  It is clearly a Linuxism to depend on it
being nonzero.  IIRC, the file offset for device files is at best
implementation-defined in POSIX.

Bruce

Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-28 Thread Pieter de Goeje
On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:
 Author: mdf
 Date: Fri May 13 18:48:00 2011
 New Revision: 221853
 URL: http://svn.freebsd.org/changeset/base/221853

 Log:
   Usa a globally visible region of zeros for both /dev/zero and the md
   device.  There are likely other kernel uses of blob of zeros than can
   be converted.

   Reviewed by:	alc
   MFC after:  1 week


This change seems to reduce /dev/zero performance by 68% as measured by this 
command: dd if=/dev/zero of=/dev/null bs=64k count=10.

x dd-8-stable
+ dd-9-current
+-+
|+|
|+|
|+|
|+x  x|
|+  x x  x|
|A   |MA_||
+-+
N   Min   MaxMedian   AvgStddev
x   5 1.2573578e+10 1.3156063e+10 1.2827355e+10  1.290079e+10 2.4951207e+08
+   5 4.1271391e+09 4.1453925e+09 4.1295157e+09 4.1328097e+09 7487363.6
Difference at 95.0% confidence
-8.76798e+09 +/- 2.57431e+08
-67.9647% +/- 1.99547%
(Student's t, pooled s = 1.76511e+08)

This particular measurement was against 8-stable but the results are the same 
for -current just before this commit. Basically throughput drops from 
~13GB/sec to 4GB/sec.

Hardware is a Phenom II X4 945 with 8GB of 800 MHz DDR2 memory. FreeBSD/amd64 
is installed. This processor has 6MB of L3 cache.

To me it looks like it's not able to cache the zeroes anymore. Is this 
intentional? I tried to change ZERO_REGION_SIZE back to 64K but that didn't 
help.

Regards,

Pieter de Goeje


Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-15 Thread Dag-Erling Smørgrav
Matthew D Fleming m...@freebsd.org writes:
 Log:
   Usa a globally visible region of zeros for both /dev/zero and the md
   device.  There are likely other kernel uses of blob of zeros than can
   be converted.

Excellent, thank you!

DES
-- 
Dag-Erling Smørgrav - d...@des.no


svn commit: r221853 - in head/sys: dev/md dev/null sys vm

2011-05-13 Thread Matthew D Fleming
Author: mdf
Date: Fri May 13 18:48:00 2011
New Revision: 221853
URL: http://svn.freebsd.org/changeset/base/221853

Log:
  Usa a globally visible region of zeros for both /dev/zero and the md
  device.  There are likely other kernel uses of blob of zeros than can
  be converted.
  
  Reviewed by:  alc
  MFC after:    1 week

Modified:
  head/sys/dev/md/md.c
  head/sys/dev/null/null.c
  head/sys/sys/systm.h
  head/sys/vm/vm_kern.c

Modified: head/sys/dev/md/md.c
==
--- head/sys/dev/md/md.c	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/dev/md/md.c	Fri May 13 18:48:00 2011	(r221853)
@@ -205,9 +205,6 @@ struct md_s {
vm_object_t object;
 };
 
-/* Used for BIO_DELETE on MD_VNODE */
-static u_char zero[PAGE_SIZE];
-
 static struct indir *
 new_indir(u_int shift)
 {
@@ -560,7 +557,8 @@ mdstart_vnode(struct md_s *sc, struct bi
 * that the two cases end up having very little in common.
 */
 	if (bp->bio_cmd == BIO_DELETE) {
-		zerosize = sizeof(zero) - (sizeof(zero) % sc->sectorsize);
+		zerosize = ZERO_REGION_SIZE -
+		    (ZERO_REGION_SIZE % sc->sectorsize);
 		auio.uio_iov = &aiov;
 		auio.uio_iovcnt = 1;
 		auio.uio_offset = (vm_ooffset_t)bp->bio_offset;
@@ -573,7 +571,7 @@ mdstart_vnode(struct md_s *sc, struct bi
 		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 		error = 0;
 		while (auio.uio_offset < end) {
-			aiov.iov_base = zero;
+			aiov.iov_base = __DECONST(void *, zero_region);
 			aiov.iov_len = end - auio.uio_offset;
 			if (aiov.iov_len > zerosize)
 				aiov.iov_len = zerosize;

Modified: head/sys/dev/null/null.c
==
--- head/sys/dev/null/null.c	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/dev/null/null.c	Fri May 13 18:48:00 2011	(r221853)
@@ -65,8 +65,6 @@ static struct cdevsw zero_cdevsw = {
.d_flags =  D_MMAP_ANON,
 };
 
-static void *zbuf;
-
 /* ARGSUSED */
 static int
 null_write(struct cdev *dev __unused, struct uio *uio, int flags __unused)
@@ -95,10 +93,19 @@ null_ioctl(struct cdev *dev __unused, u_
 static int
 zero_read(struct cdev *dev __unused, struct uio *uio, int flags __unused)
 {
+	void *zbuf;
+	ssize_t len;
 	int error = 0;
 
-	while (uio->uio_resid > 0 && error == 0)
-		error = uiomove(zbuf, MIN(uio->uio_resid, PAGE_SIZE), uio);
+	KASSERT(uio->uio_rw == UIO_READ,
+	    ("Can't be in %s for write", __func__));
+	zbuf = __DECONST(void *, zero_region);
+	while (uio->uio_resid > 0 && error == 0) {
+		len = uio->uio_resid;
+		if (len > ZERO_REGION_SIZE)
+			len = ZERO_REGION_SIZE;
+		error = uiomove(zbuf, len, uio);
+	}
 
 	return (error);
 }
@@ -111,7 +118,6 @@ null_modevent(module_t mod __unused, int
 	case MOD_LOAD:
 		if (bootverbose)
 			printf("null: <null device, zero device>\n");
-		zbuf = (void *)malloc(PAGE_SIZE, M_TEMP, M_WAITOK | M_ZERO);
 		null_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &null_cdevsw, 0,
 		    NULL, UID_ROOT, GID_WHEEL, 0666, "null");
 		zero_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &zero_cdevsw, 0,
@@ -121,7 +127,6 @@ null_modevent(module_t mod __unused, int
case MOD_UNLOAD:
destroy_dev(null_dev);
destroy_dev(zero_dev);
-   free(zbuf, M_TEMP);
break;
 
case MOD_SHUTDOWN:

Modified: head/sys/sys/systm.h
==
--- head/sys/sys/systm.h	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/sys/systm.h	Fri May 13 18:48:00 2011	(r221853)
@@ -125,6 +125,9 @@ extern char static_hints[]; /* by config
 
 extern char **kenvp;
 
+extern const void *zero_region;	/* address space maps to a zeroed page */
+#define	ZERO_REGION_SIZE	(2048 * 1024)
+
 /*
  * General function declarations.
  */

Modified: head/sys/vm/vm_kern.c
==
--- head/sys/vm/vm_kern.c	Fri May 13 18:46:20 2011	(r221852)
+++ head/sys/vm/vm_kern.c	Fri May 13 18:48:00 2011	(r221853)
@@ -91,6 +91,9 @@ vm_map_t exec_map=0;
 vm_map_t pipe_map;
 vm_map_t buffer_map=0;
 
+const void *zero_region;
+CTASSERT((ZERO_REGION_SIZE & PAGE_MASK) == 0);
+
 /*
  * kmem_alloc_nofault:
  *
@@ -527,6 +530,35 @@ kmem_free_wakeup(map, addr, size)
vm_map_unlock(map);
 }
 
+static void
+kmem_init_zero_region(void)
+{
+   vm_offset_t addr;
+   vm_page_t m;
+   unsigned int i;