Hi,
I've done some benchmarks regarding this series now. In particular, I've
created a 7G image, installed Arch Linux to a partition in the first 2G
and created an empty ext4 partition for benchmarking in the remaining 5G.
My first test consisted of running bonnie++ ("bonnie++ -d [scratch
partition] -s 4g -n 0 -x 16 -Z /dev/urandom", i.e., 4G files for the
I/O performance tests, no file creation tests, repeated 16 times) with
different metadata overlap check modes: none, constant (only those
checks which can be performed in constant time) and cached (the current
default). The reason I didn't test "all" (perform all overlap checks)
is that it will only make a difference (when compared to "cached") if
there are snapshots (right now, at least). I put the underlying image
file on /tmp (tmpfs) for minimal true I/O latency (to maximize the
relative overhead of the checks).
The second test was basically the same, except that I took 100
(internal) snapshots beforehand and used 2G files instead of 4G. In
this case, I also tested the "all" scenario.
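For reference, a sketch of how those snapshots can be created with
qemu-img's "snapshot -c" subcommand (the image path and snapshot names
here are made up; the commands are only built, not executed):

```python
# Build the qemu-img invocations for creating 100 internal snapshots.
# The image path is hypothetical; each command could then be run with
# subprocess.run(cmd, check=True).
image = "/tmp/test.qcow2"

def snapshot_cmds(image, count=100):
    return [["qemu-img", "snapshot", "-c", f"sn{i}", image]
            for i in range(count)]

cmds = snapshot_cmds(image)
```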
I performed the third test on an HDD instead of tmpfs to maximize the
overhead of non-cached overlap checks (that is, right now, checking for
overlaps with inactive L2 tables), which require disk I/O. I used
-drive cache=none for this test (in contrast to the others, which ran
on a tmpfs anyway). Also, I used 256M files since 2G just took too much
time. :)
As far as I understand it, the I/O throughput should be pretty much the
same for all scenarios; the latency is the value in question, since the
overlap checks should affect the latency only.
Basically, I didn't get any results indicating a performance hit. The
raw HDD test data sometimes had a standard deviation greater than the
average itself (!), so I removed some outliers there. The differences
between the averages rarely exceed the standard deviations, and when
they do, there is often no trend at all. The only real trend exceeding
the standard deviation is for block writes in my first test; however,
that trend is negative, indicating that the overlap checks actually
sped things up (which is obviously counterintuitive). The difference,
however, is below 1 % anyway.
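To illustrate the kind of outlier handling I mean, here is a minimal
sketch (with made-up sample values and a simple z-score cutoff, which
only approximates what I actually did):

```python
import statistics

def trimmed_stats(samples, z=1.5):
    # Mean and sample standard deviation after dropping samples more
    # than z standard deviations away from the initial mean.
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    kept = [s for s in samples if abs(s - mean) <= z * sd]
    return statistics.mean(kept), statistics.stdev(kept)

# Hypothetical HDD latency samples (in µs) with one outlier; note that
# the raw stddev (~33400) even exceeds the raw mean (30200) here.
samples = [15000, 16000, 14500, 15500, 90000]
avg, sd = trimmed_stats(samples)  # roughly 15250 ± 645
```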
The only major differences visible (exceeding the combined standard
deviation of the two values in question) occurred during the HDD test:
the duration of putc, block writes and rewrites was much greater (about
10 to 20 %; bear in mind, however, that the standard deviation is of
the same magnitude) for "constant" and "cached" than for "none" and
"all". On the other hand, the putc and rewrite latency was much better
for "constant" and "cached" than for "none" and "all". That the
durations differ so greatly is a sign to me that the data from this
test is not really usable (since I think they should be the same for
all scenarios). If we were to ignore that, and the fact that "none"
actually showed higher latency than "all" for both affected latencies,
we could conclude that "all" is really much slower than "constant" or
"cached". But then again, the block write latency was even smaller for
"all" than for "cached" and "constant", so I'd just disregard these
benchmarks (for the HDD).
All in all, I don't see any significant performance difference when
benchmarking on a tmpfs (which should maximize the overhead of
"constant" and "cached"), and the data from my HDD benchmarks is
probably statistically unusable. The only comparison it would have been
useful for is that of "all" against "cached", but since "all" will not
be the default (and anyone explicitly enabling it is responsible for
the slower I/O themselves), that comparison isn't actually that
important anyway.
I've attached a CSV file containing the edited results, that is, the
averages and standard deviations for the tests performed by bonnie++,
excluding some outliers from the HDD benchmark; I think the values are
given in microseconds.
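Since the cells are of the form average±standard deviation, here is a
minimal sketch of how to split them for post-processing (the example
row is a truncated copy of the first data row of the attachment):

```python
def parse_cell(cell):
    # Split an "avg±stddev" cell from the CSV into two floats.
    avg, sd = cell.split("±")
    return float(avg), float(sd)

row = "None,1376±12,1112658±4230"  # truncated first data row
name, *cells = row.split(",")
stats = [parse_cell(c) for c in cells]
# stats == [(1376.0, 12.0), (1112658.0, 4230.0)]
```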
Max
,putc,put_block,rewrite,putc_latency,put_block_latency,rewrite_latency
tmpfs (4G),,,,,,
None,1376±12,1112658±4230,463507±37777,9934±1021,6740±144,33143±7043
Constant,1338±8,1107024±2347,476831±35860,10014±1090,6846±254,34115±5867
Cached,1366±9,1104980±3903,463079±39164,10413±1158,6794±181,33820±6338
tmpfs+snapshots (2G),,,,,,
None,1392±8,1127802±6131,459712±24960,9614±1132,6523±120,27823±6142
Constant,1396±11,1126206±4224,453260±25135,9178±1249,6579±126,23221±7521
Cached,1394±7,1119699±6276,467627±27784,9470±951,6616±137,27254±6785
All,1398±12,1123245±3805,472850±29836,9500±823,6617±169,28555±6206
HDD+snapshots (256M),,,,,,
None,850±21,72862±9765,32392±3900,15954±1942,126226±28733,148929±37483
Constant,986±17,84240±9009,38247±1117,10843±1214,101496±29997,106627±19098
Cached,988±19,84111±8460,37691±1379,10310±761,101249±42366,103950±14453
All,873±13,75392±5506,34556±1273,12690±563,95506±4917,134220±24256