Option (2) is the best path for USERS of Subversion. More toggles add risk and complexity. Improvements should "just work" out of the box, unless there is some technical hurdle. A 25% increase in disk usage is nothing today in exchange for even a fraction more speed on operations that happen thousands of times a day on a typical team. And here the gain is more than a fraction!
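To put numbers on that trade-off, here is a small Python sketch that recomputes the speedup factors and the relative growth in on-disk size from the benchmark figures quoted below (zlib-5 is "before", LZ4 is "after"). The data layout and helper names (`speedup`, `growth_pct`, `summarize`, `total_disk_growth_pct`) are illustrative only; the figures themselves are Evgeny's.

```python
# Benchmark figures as quoted by Evgeny Kotkov: (zlib-5, LZ4) pairs.
# Times are in seconds, on-disk sizes in MB.
BENCHMARKS = {
    "compressible file, 1.17 GB": {
        "import_s": (40.79, 11.97), "export_s": (6.30, 3.13), "disk_mb": (384, 529)},
    "incompressible file, 833 MB": {
        "import_s": (32.16, 8.22), "export_s": (2.71, 2.06), "disk_mb": (766, 778)},
    "TortoiseSVN trunk, 213 MB": {
        "import_s": (17.83, 10.36), "export_s": (1.62, 1.15), "disk_mb": (75, 104)},
    "binary files, 1.68 GB": {
        "import_s": (55.10, 15.84), "export_s": (8.56, 4.34), "disk_mb": (662, 807)},
}


def speedup(before: float, after: float) -> float:
    """How many times faster the LZ4 run is than the zlib-5 run."""
    return before / after


def growth_pct(before: float, after: float) -> float:
    """Relative increase in on-disk size, in percent."""
    return (after / before - 1.0) * 100.0


def summarize(benchmarks: dict) -> dict:
    """Per data set: import/export speedup and disk growth, rounded to 0.1."""
    rows = {}
    for name, b in benchmarks.items():
        rows[name] = {
            "import_speedup": round(speedup(*b["import_s"]), 1),
            "export_speedup": round(speedup(*b["export_s"]), 1),
            "disk_growth_pct": round(growth_pct(*b["disk_mb"]), 1),
        }
    return rows


def total_disk_growth_pct(benchmarks: dict) -> float:
    """Disk growth across all data sets combined (weighted by size)."""
    before = sum(b["disk_mb"][0] for b in benchmarks.values())
    after = sum(b["disk_mb"][1] for b in benchmarks.values())
    return round(growth_pct(before, after), 1)


if __name__ == "__main__":
    for name, row in summarize(BENCHMARKS).items():
        print(f"{name}: import {row['import_speedup']}x, "
              f"export {row['export_speedup']}x, disk +{row['disk_growth_pct']}%")
    print(f"all four data sets: disk +{total_disk_growth_pct(BENCHMARKS)}%")
```

The speedups it prints match the factors in the quoted results. The combined disk growth (about 17.5%) is weighted by size, so the large incompressible file pulls it down; for mostly compressible content, the per-set figures of roughly 22% to 39% are the more relevant ones.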
Great quantitative metrics, Evgeny.

On Fri, Aug 18, 2017 at 2:58 PM, Evgeny Kotkov <evgeny.kot...@visualsvn.com> wrote:

> Evgeny Kotkov <evgeny.kot...@visualsvn.com> writes:
>
> > (B) For the on-disk data, we start using LZ4 compression by default
> > (in format 8 repositories).
> >
> > The reasoning behind this is that currently, zlib compression is a
> > hotspot that can limit the performance of both read and write
> > operations on the repository. It also affects how well Subversion
> > works when dealing with large and, possibly, incompressible files
> > (and I tend to think that it's a fairly important use case).
> >
> > Switching to a faster compression algorithm that is also used by
> > other various file system implementations should improve the
> > performance of such operations in a visible way. Please note that
> > this change is a trade-off between the compression ratio and speed
> > of the operations. The repositories using LZ4 compression would
> > require a bit more disk space. The amount of the required additional
> > space is proportional to the difference between the compression
> > ratio of LZ4 and zlib-5, which can be roughly estimated as around
> > 30-35% for compressible binary and text files, although that may
> > vary depending on the actual data.
> >
> > To illustrate how these changes will affect the speed of some of the
> > operations, the 'svn import' of a 2 GB file over HTTP on LAN in my
> > environment takes 18 seconds instead of 63 seconds.
>
> Here are some additional zlib-5 vs. LZ4 benchmarks to consider:
>
> (All tests were performed on the SSD drive using the file:// protocol.
> The results should be interpreted as "before is zlib-5, after is LZ4".
> Also, the results over http:// are somewhat similar in terms of the
> improvement factor and are omitted for brevity. "Import time" is
> for "svn import", "Export time" is for "svnbench null-export".)
>
> - One compressible file, 1.17 GB:
>
>   Import time: 40.79 s → 11.97 s (3.4x faster)
>   Export time: 6.30 s → 3.13 s (2.0x faster)
>   Compression ratio: 31.8% → 43.8% (384 MB → 529 MB on disk)
>
> - One incompressible file, 833 MB:
>
>   Import time: 32.16 s → 8.22 s (3.9x faster)
>   Export time: 2.71 s → 2.06 s (1.3x faster)
>   Compression ratio: 91.9% → 93.3% (766 MB → 778 MB on disk)
>
> - Multiple source code files (TortoiseSVN trunk), 213 MB, ~7,000 files:
>
>   Import time: 17.83 s → 10.36 s (1.7x faster)
>   Export time: 1.62 s → 1.15 s (1.4x faster)
>   Compression ratio: 35.2% → 48.8% (75 MB → 104 MB on disk)
>
> - Multiple binary files, 1.68 GB, 25 files:
>
>   Import time: 55.10 s → 15.84 s (3.5x faster)
>   Export time: 8.56 s → 4.34 s (2.0x faster)
>   Compression ratio: 38.4% → 46.9% (662 MB → 807 MB on disk)
>
>
> Reiterating over the whole topic of the default compression algorithm for
> the repositories, I think that we have the following options to choose
> from:
>
> (1) Make LZ4 compression optional in format 8 repositories, and still use
>     zlib-5 compression by default.
>
>     With this approach, users would have to have "compression=lz4" in
>     fsfs.conf to use it. Personally, I would expect a number of such users
>     to be quite low, because they would have to both upgrade the repository
>     to fsfs format 8 and use non-default fsfs.conf settings.
>
>     This option means that we'd keep our existing performance
>     characteristics with read and write operations being limited by the
>     compression speed of zlib-5 (which isn't exactly fast) for most of the
>     users. It also means that the expected size and the compression ratio
>     of the repository data would remain unchanged.
>
> (2) Compress with LZ4 by default in all (new and upgraded) format 8
>     repositories.
>
>     This approach means that a much bigger part of our users will have
>     their data compressed with LZ4, and will get the visible read and write
>     performance improvement.
>     It also means that the compression ratio of
>     the on disk data will be lower than with zlib-5, and the projected
>     size of the repositories will increase accordingly.
>
>     One additional point to consider here is that such change may be
>     going a bit against the policy of adding a new optional feature and
>     switching the default in the next minor release.
>
> (3) Compress with LZ4 by default, but only in new format 8 repositories.
>
>     This option is similar to (2), but with a more limited scope where
>     LZ4 compression is only used for the new repositories created with
>     Subversion 1.10 binaries.
>
>
> Personally, I find the significant speed improvement for both read and
> write operations from LZ4 compression quite important, and I think that
> the actual reduction in the compression ratio is acceptable, considering
> the gained benefits. I also think that the risks associated with switching
> the default on-disk format are low in this particular case, considering
> that the LZ4 library is stable. (It has been available for a long time and
> is used by projects like Linux Kernel and ZFS).
>
> In other words, I think that we would benefit from using LZ4 compression
> by default.
>
> Among the options (2) and (3) that make LZ4 the new default compression
> algorithm, I think that option (2) is better. The reasoning here is that
> using LZ4 compression would improve the performance even for existing
> repositories by making new commits faster and by speeding up read
> operations for the new committed files. Apart from this, option (3)
> needs implementation and is probably going to have a couple of related
> challenges, which can be otherwise avoided.
>
> With all that in mind, I propose that we do (2). Any objections?
>
>
> Thanks,
> Evgeny Kotkov

--
Jacek Materna
Chief Technology Officer
Assembla
+1 210 410 7661