Hi Warren,
Very interesting. As I said, I don't understand how these things work
internally; I just thought I'd put it out here and see what the more
knowledgeable people thought of these comparisons. I know that Fossil
records artefact changes, but I've no idea how the compression or
local/remote sync works, or what the repercussions would be in either
system.
I ought to clarify that the games I work on are based on sound rather than
graphics, and so that the sounds can easily be edited and changed without
loss of quality, they are stored as uncompressed PCM data. I suppose I could
convert them to FLAC, but given that the game engine doesn't support this
format, they would have to be converted back to PCM before every test.
As for the executable, sometimes that gets included because we forget to
delete it after testing an executable copy and don't use Fossil's ignore
feature. Although, to be fair, with some of my non-game projects I do
include executables and binary libraries for the simple reason that in a
mainstream compiled language like C++, with many assets to manage,
compilation can be a real ballache - maybe that's because I'm so new to
such languages. So if I include the executable, I can always get the latest
build without having to spend hours trying to figure out how to recompile
the thing.
Same with libraries - some libraries that I use are proprietary, so
including them in source form isn't an option. Especially in an interpreted
language, where I can simply run the code from the source script, I always
include the libraries it needs to run. Again, maybe a mistake on my part,
but I always like to be prepared when I'm checking out a new branch or
version to test or work with.
The Fossil and Git commit tests were both local, with no remotes attached,
so neither had a speed advantage over the other due to network usage.
I had no idea about the checksum setting. Despite three years' usage, it
seems I've only scratched the surface of Fossil. Again, that's another
reason why I put some of my thoughts out there: some users, and of course
the developers themselves, are much more knowledgeable about these things
than I am.
Cheers.
Damien.
-----Original Message-----
From: Warren Young
Sent: Friday, August 11, 2017 9:08 PM
To: Fossil SCM user's discussion
Subject: Re: [fossil-users] Fossil performance and optimisation
On Aug 11, 2017, at 7:10 AM, Damien Sykes-Lindley
<dam...@dcpendleton.plus.com> wrote:
I couldn't help noticing there seemed to be a silence on speed
comparisons.
There have been many threads on this over the years. Just for a start,
search the list archives for “NetBSD”.
After cloning and working with several publicised Fossil repositories, I
can't help but notice that the majority of them are rather small.
Yes, that’s best practice for most any DVCS. Even the Linux project follows
this philosophy:
http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html
That is, the single monolithic repo you see when cloning “the Linux Git
repo” is something of an illusion, which the developers of Linux don’t
actually deal with very much.
Most of the projects that I am involved with are games...Of course these
will contain binary files
That “of course” needn’t be a foregone conclusion.
Many asset formats are available in text forms, which are friendly for use
in version control systems. For example, you may be able to store 3D models
in the repository in COLLADA format and some 2D assets in SVG.
For the bitmapped textures, it’s better to store those as uncompressed
bitmap formats, then compress them during the build process to whatever
format you’ll use within the game engine and for distribution.
A 1-pixel change to a Windows BMP file causes a much smaller change to the
size of a Fossil repository than a 1-pixel change to a JPEG or PNG does,
because in a compressed format that one-pixel difference can throw off the
rest of the compressed stream, causing much of the rest of the file to
change.
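You can see the effect with a quick sketch in Python, using zlib as a
stand-in for PNG-style stream compression (this is only an illustration of
the principle, not Fossil's actual delta code):

```python
import zlib

# Two 64 KiB buffers of raw "pixel" data differing in a single byte,
# standing in for a 1-pixel edit to an uncompressed bitmap.
raw_a = bytes(range(256)) * 256
edited = bytearray(raw_a)
edited[1000] ^= 0xFF          # the "1-pixel" change
raw_b = bytes(edited)

# The uncompressed versions differ in exactly one byte...
raw_diff = sum(a != b for a, b in zip(raw_a, raw_b))

# ...but compress each version, as a PNG-style format does, and the two
# streams diverge: the back-references and Huffman coding downstream of
# the edit all get re-encoded.
comp_a = zlib.compress(raw_a)
comp_b = zlib.compress(raw_b)
comp_diff = sum(a != b for a, b in zip(comp_a, comp_b))

print(raw_diff)               # 1
print(comp_diff > raw_diff)   # True: far more than one byte changed
```

A delta-compressing store like Fossil sees the tiny difference in the raw
case, but a near-total rewrite in the compressed case.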
This can be tricky to manage. You might think TIFF is a good file format
for this purpose, but you’re forgetting all the metadata in it that changes
simply when a file is opened and re-saved. (Timestamps, GUIDs, etc.) It’s
better to go with a bare “box of pixels” format like Windows BMP.
All of this does make the checkout size bigger, but Fossil’s delta
compression has two positive consequences here:
1. The Fossil repository size will probably be as small or even smaller. A
zlib-compressed Windows BMP file is going to be about the same size as a PNG
file with the same content.
2. If those files are changed multiple times between initial creation and
product ship time, the delta compression will do a far better job if the
input data isn’t already compressed. This is how you get the high
compression ratios you see on most Fossil repositories by visiting their
/stat page. My biggest repository is rocking along at 39:1 compression
ratio, and it hasn’t been rebuilt and recompressed lately.
(generally an executable
Why would you include generated files in a version control repository?
Fossil is not a networked file system. If you try to treat it like one, it
will take its revenge on you.
dependency libraries
In source code form only, perhaps.
Even then, it’s better to hold those in separate repositories.
It would be nice if Fossil had a sub-modules feature like Git to help with
this, so that opening the main repository also caused sub-Fossils to be
cloned and opened in subdirectories. Meanwhile, you have to do manual
“fossil open --nested” commands, but it’s a one-time hassle.
Nested checkins would also be nice. That is, if a file changes in a nested
checkout, a “fossil ci” from the top level should offer to check in the
changes on the sub-project.
Also note that all commits were tests only and so weren't synced to
remotes. Naturally this means that commits are even slower when syncing.
It also means that local differences are a smaller percentage of the total
time taken for many operations, since the time may be swamped by network
I/O.
For instance, I notice in your tests that you seem to be comparing “fossil
ci” to “git commit”, where the fair test would be against “git commit -a &&
git push”.
1. Git seems to do better at compressing and opening smaller repositories,
while Fossil triumphs over larger ones.
Be careful with such comparisons.
Fossil repositories aren’t kept optimally small, since that would increase
the time for checkins and such. Every now and then, even after an initial
import, you want to look into “fossil rebuild” and some of its more advanced
options.
This is what I was getting at with my comments about the 39:1
compression ratio I’m currently seeing on my largest Fossil repository. I
expect I could make it smaller, if I did such a rebuild.
I have no idea if Git has some similar “rebuild” feature, though I will
speculate that the per-file filesystem overheads will eat away at a lot of
any advantages Git has. Be sure you’re calculating size-on-disk, not the
total size of the files alone. That is, a 1 byte file on a filesystem with
a 4K block size takes 4K plus a directory entry, not 1 byte.
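The size-on-disk arithmetic is just rounding up to whole blocks; a minimal
sketch (the 4 KiB block size is an assumption that varies by filesystem):

```python
def size_on_disk(file_size: int, block_size: int = 4096) -> int:
    """Logical file size rounded up to whole filesystem blocks."""
    if file_size == 0:
        return 0                    # many filesystems allocate no blocks
    return -(-file_size // block_size) * block_size   # ceiling division

# A 1-byte file still occupies a full 4 KiB block (plus its directory
# entry) - the overhead a many-small-files object store pays repeatedly.
print(size_on_disk(1))       # 4096
print(size_on_disk(4096))    # 4096
print(size_on_disk(4097))    # 8192
```

Summing `size_on_disk()` over every file in `.git/` gives a fairer
comparison against the size of a single-file Fossil repository.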
Fossil, by keeping all artifacts in a single file, does not have this
overhead. The occasional need for a “rebuild” is probably its closest
analog.
3. The speed of a commit in Git seems to be dependent on the size of the
change. The bigger the changes, naturally the slower the commits. Commits
in Fossil seem, with a few discrepancies (notably commit times in repos 1
and 2), to be dependent on the size of the repository.
That’s probably due to the repo-cksum setting, which defaults to “on,” and
which has no equivalent in Git. You’ll probably gain a lot of speed by
turning that off:
https://www.fossil-scm.org/index.html/help?cmd=settings
I was very interested to find an article hidden in the depths of the
Fossil website
(http://fossil-scm.org/index.html/event?name=be8f2f3447ef2ea3344f8058b6733aa08c08336f)
That’s a summary of the NetBSD threads I referred to above.
are there any plans to optimise Fossil in the future?
My sense is that it depends on people scratching their own itches. The
SQLite, Fossil, and Tcl projects don’t need Fossil to be faster, so it’s
fine for now. If someone wants to come along and make Fossil support huge
repositories, I’m sure the patches would be thoughtfully reviewed and
possibly accepted.
One option not covered in the tech note you found is the possibility of
narrow and shallow clones:
1. Narrow: The local clone doesn’t contain the history of all assets in the
remote repository (e.g. just one subdirectory)
2. Shallow: The local clone contains only the most recent history of all
assets in the remote repository. With depth=1, you get the effect of
old-style VCSes like Subversion, except that you have the option to build up
more history in the local repository as time goes on.
Both would help significantly, but no one has stepped up to do either yet.
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users