On Windows, NTFS stores very small files (probably something like < 256 bytes, but the limit is not documented and not strictly stable) directly inside their Master File Table (MFT) record, so they don't use 'a whole cluster'.
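The allocation arithmetic behind the cluster argument is easy to sketch. This assumes 4096-byte clusters and takes the rough 256-byte resident limit above at face value; both are illustrative assumptions, and `allocated_size` is a hypothetical helper, not an NTFS or Subversion API:

```python
# Illustrative assumption: the guessed "< 256 bytes" resident limit. NTFS does
# not document a fixed value, so treat this constant as hypothetical.
ASSUMED_RESIDENT_LIMIT = 256


def allocated_size(file_size: int, cluster_size: int = 4096,
                   small_files_resident: bool = False) -> int:
    """Approximate the bytes a file occupies on disk.

    Non-resident data is rounded up to a whole number of clusters. When
    small_files_resident is True, files under ASSUMED_RESIDENT_LIMIT are
    treated as stored in file-system metadata (on NTFS, the MFT record)
    and allocate no clusters at all.
    """
    if file_size == 0:
        return 0
    if small_files_resident and file_size < ASSUMED_RESIDENT_LIMIT:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


if __name__ == "__main__":
    for size in (100, 300, 5000):
        print(size, allocated_size(size),
              allocated_size(size, small_files_resident=True))
```

Under these assumptions a 100-byte file costs no clusters when resident storage applies, while a 300-byte file still occupies a full 4096-byte cluster, which is the waste that storing small pristines as BLOBs would avoid.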
Nice work on all the research!

Bert

On Tue, Oct 23, 2018 at 6:12 PM, Branko Čibej <br...@apache.org> wrote:

> On 22.10.2018 22:14, Evgeny Kotkov wrote:
> > Branko Čibej <br...@apache.org> writes:
> >
> >> Still missing is a mechanism for the libsvn_wc (and possibly
> >> libsvn_client) to determine the capabilities of the working copy at
> >> runtime (this will be needed for deciding whether to use compressed
> >> pristines).
> >
> > FWIW, I tried the idea of using LZ4 to compress the pristines and storing
> > small pristines as blobs in the `PRISTINE` table. I was particularly
> > interested in how such a change would affect the performance and what
> > kind of obstacles would have to be dealt with.
>
> Nice! I did some simpler tests by compressing exported trees, but this
> is definitely better.
>
> > In the attachment you will find a more or less functional implementation
> > of this idea that might be useful to some extent. The patch is a proof of
> > concept: it doesn't include the WC compatibility bits and most certainly
> > doesn't have everything necessary in place. But in the meantime, I think
> > that it might give a good approximation of what can be expected from the
> > approach.
> >
> > The patch applies to the `better-pristines` branch.
> >
> > A couple of observations:
> >
> > - As expected, the combined size of the pristines is halved when the data
> >   itself is compressible, thus making the working copy 25% smaller.
>
> Yes, that was my observation as well. In fact, though, storing small
> BLOBs in the database itself should have even better effects, since the
> space on disk actually used by a file is rounded up to the nearest
> cluster size, but SQLite's blocks are typically much smaller than that.
>
> > - A variety of the callers currently access the pristine contents by
> >   reading the corresponding files. That doesn't work in the case of
> >   compressed pristines or pristines stored as BLOBs.
> >
> >   I think that ideally we would want to use streams as much as possible,
> >   and only spill the uncompressed pristine contents to temporary files
> >   when we need to pass them to external tools, etc.; and that those
> >   temporary files need to be backed by a work queue to avoid leaving them
> >   in place in case of an application crash.
>
> Yes and yes. Keeping those temporary spilled files on disk could turn
> out to be a problem; finding a reasonable time to delete them without
> having to run cleanup will be rather important, I think.
>
> >   The patch does that kind of plumbing to some extent, but that part of
> >   the work is not complete. The starting point is around
> >   wc_db_pristine.c: svn_wc__db_pristine_get_path().
> >
> > - Using BLOBs to store the pristine contents didn't have a measurable
> >   impact on the speed of the WC operations such as checkout in my
> >   experiments on Windows. These experiments were not comprehensive, and
> >   I also didn't run the tests on *nix.
>
> I wouldn't expect much change in performance but would expect better use
> of the disk, as explained above.
>
> > - There's also the deprecated svn_wc_get_pristine_copy_path() public API
> >   that would require plumbing to maintain compatibility; the patch does
> >   this by spilling the pristine contents into a temporary file whose
> >   lifetime is attached to the `result_pool`.
>
> Ack; that's one reasonable definition of "lifetime." But I suspect that
> any users of that function expect the pristine file to survive at least
> until the next WC cleanup.
>
> > (I probably won't be able to continue the work on this patch in the near
> > future; posting this in case it might be useful.)
>
> Thanks, it definitely is useful!
>
> -- Brane
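For readers following along, here is a minimal sketch of the scheme discussed in this thread: compress each pristine, keep small ones as BLOBs in an SQLite table, write larger ones as files, read everything back through a stream, and spill to a temporary file only on demand. It is not Subversion's code. zlib stands in for LZ4 (which is not in Python's standard library), and the `PristineStore` class, the `pristine` table schema, and the 4096-byte `INLINE_LIMIT` cutoff are all hypothetical. The work-queue-backed cleanup of spilled files is not modeled; the caller deletes them.

```python
import hashlib
import os
import sqlite3
import tempfile
import zlib

# Hypothetical cutoff: compressed pristines smaller than this are stored as
# BLOBs in the database; larger ones are written to separate files.
INLINE_LIMIT = 4096


class PristineStore:
    """Toy pristine store (hypothetical; zlib stands in for LZ4)."""

    def __init__(self, wc_dir: str) -> None:
        self.dir = os.path.join(wc_dir, "pristine")
        os.makedirs(self.dir, exist_ok=True)
        self.db = sqlite3.connect(os.path.join(wc_dir, "wc.db"))
        # Invented schema: data is NULL when the pristine lives in a file.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pristine ("
            " checksum TEXT PRIMARY KEY, size INTEGER NOT NULL, data BLOB)"
        )

    def install(self, contents: bytes) -> str:
        checksum = hashlib.sha1(contents).hexdigest()
        packed = zlib.compress(contents)
        blob = None
        if len(packed) < INLINE_LIMIT:
            blob = packed  # small: keep the compressed bytes in the row
        else:
            with open(os.path.join(self.dir, checksum), "wb") as f:
                f.write(packed)  # large: compressed bytes go to a file
        self.db.execute(
            "INSERT OR REPLACE INTO pristine VALUES (?, ?, ?)",
            (checksum, len(contents), blob),
        )
        self.db.commit()
        return checksum

    def _packed_chunks(self, checksum: str, chunk_size: int):
        """Yield the compressed bytes from the row or from the file."""
        row = self.db.execute(
            "SELECT data FROM pristine WHERE checksum = ?", (checksum,)
        ).fetchone()
        if row is None:
            raise KeyError(checksum)
        if row[0] is not None:
            yield row[0]
        else:
            with open(os.path.join(self.dir, checksum), "rb") as f:
                while chunk := f.read(chunk_size):
                    yield chunk

    def stream(self, checksum: str, chunk_size: int = 65536):
        """Yield the uncompressed contents chunk by chunk."""
        decomp = zlib.decompressobj()
        for packed in self._packed_chunks(checksum, chunk_size):
            yield decomp.decompress(packed)
        yield decomp.flush()

    def spill_to_tempfile(self, checksum: str) -> str:
        """Write uncompressed contents to a temp file for external tools.

        The caller must delete the file; a real implementation would record
        it in a work queue so that a crash does not leave it behind.
        """
        fd, path = tempfile.mkstemp(prefix="pristine-")
        try:
            with os.fdopen(fd, "wb") as f:
                for chunk in self.stream(checksum):
                    f.write(chunk)
        except BaseException:
            os.remove(path)
            raise
        return path

    def close(self) -> None:
        self.db.close()
```

Keeping `stream()` as the primary read path means callers never need a file on disk; only `spill_to_tempfile()` materializes one, which mirrors the "streams first, spill only when an external tool needs a file" approach from the thread.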