Here is my "inefficient" attempt at exposing the Git protocol over a Fossil repository. It is written in Ruby: http://fossil.webstream.io/fossil_ruby/artifact/e10666fa65ebce2e

How it works:
* It uses the Dumb Protocol (https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols#The-Dumb-Protocol), which basically exposes the .git directory over HTTP.
* In the Dumb Protocol, the client starts by requesting the "info/refs" file, which calls the "fetch_info_refs" method.
* The "fetch_info_refs" method then iterates over all open Fossil branches and generates a response containing the Git SHA1 hash associated with each branch's Fossil commit.

How the Git SHA1 hashes are calculated:
* Git has a very simple database layout, and this script takes advantage of that.
* To generate the Git SHA1 hash for a Fossil manifest (which happens in the get_git_uuid_for_commit(fossil_manifest) method), the script generates a Git commit object (https://git-scm.com/book/en/v2/Git-Internals-Git-Objects#Commit-Objects) and then calculates its SHA1 hash.
* The Git commit object is created in the method fetch_git_commit(fossil_manifest), which in turn uses the Git SHA1 hashes of the manifest's parents and tree. To fetch the Git SHA1 of the parents, it simply recurses.
* Creating the Git tree object is rather easy: it calls fetch_git_tree(manifest, path) recursively, where "path" is "/" for the first call. You can find the "files_level_hash_for" method here - http://fossil.webstream.io/fossil_ruby/artifact/03189394837f1db6 and more information on Git tree objects here - https://git-scm.com/book/en/v2/Git-Internals-Git-Objects#Tree-Objects

Calculating the Git SHA1 hash for a Fossil manifest requires the script to already know the Git SHA1 hashes of its parents, and recursively calculating the parents' IDs is very expensive. So I use a table, "git_objects", to store the Git SHA1 ID of each object once it has been calculated. When the path "info/refs" is fetched for the first time, the script generates the Git SHA1 IDs for all manifests using recursion, and any subsequent request reads directly from the cache (the "git_objects" table).
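
To make the hashing step concrete, here is a minimal sketch of how a Git object ID is derived and how an "info/refs" response could be assembled. It is not the code from the linked artifact: the method names git_object_id, build_git_commit and info_refs, and their argument shapes, are made up for illustration. The only facts relied on are that Git hashes "<type> <size>\0<content>" with SHA1 and that the dumb protocol's info/refs lists one "<sha1>\t<ref>" per line.

```ruby
require 'digest/sha1'

# Git object ID: SHA1 over "<type> <byte size>\0<content>".
def git_object_id(type, content)
  header = "#{type} #{content.bytesize}\0".b
  Digest::SHA1.hexdigest(header + content.b)
end

# Text of a Git commit object (hypothetical helper, not from the script).
# `author` is expected as "Name <email> <unix time> <tz offset>".
def build_git_commit(tree_id, parent_ids, author, message)
  lines = ["tree #{tree_id}"]
  parent_ids.each { |parent_id| lines << "parent #{parent_id}" }
  lines << "author #{author}"
  lines << "committer #{author}"
  lines.join("\n") + "\n\n" + message
end

# Body of info/refs: one "<sha1>\trefs/heads/<branch>\n" line per branch.
# `branch_heads` maps branch name => Git commit ID (assumed input shape).
def info_refs(branch_heads)
  branch_heads.map { |branch, commit_id| "#{commit_id}\trefs/heads/#{branch}\n" }.join
end

# Same ID that `echo 'test content' | git hash-object --stdin` reports:
git_object_id("blob", "test content\n")
# => "d670460b4b4aece5915caf5c68d12f560a9fe3e4"
```

The commit's ID would then be git_object_id("commit", build_git_commit(...)), which is why the parents' IDs have to be known before a child can be hashed.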

However, as noted earlier, the code is inefficient. Since it uses recursion, it fails with "stack level too deep" for repositories with too many commits. Also, the "git_objects" table gets too large, containing hundreds of thousands of rows if not millions (especially the Git tree objects, as you need to create a row for every parent directory of a changed file in a commit).

So, I started to redesign it.
* To avoid the SystemStackError, I used topological sorting and processed the Fossil manifests iteratively rather than recursively. Instead of building the Git objects on demand (when the request comes in), the script builds them in advance (e.g., when Fossil writes a new manifest).
* Write to the file system instead of a table. The script initialises a "git init --bare" repository and writes Git objects to the <git>/objects directory directly. An SQL table is still needed to find the Git commit object ID associated with a Fossil manifest, as well as the Git blob object IDs for files. But the table no longer needs rows for Git tree objects, which saves a huge number of rows.
* Periodically run "git gc" to pack the objects into a Packfile (https://git-scm.com/book/en/v2/Git-Internals-Packfiles), so that the repository size doesn't increase like crazy. (The script writes the full content of each file to disk and leaves it to git-gc to generate and store the deltas.)
* I also added some concurrency using threads and a mutex, so that multiple threads write to the file system while the main thread creates Git objects. (Yes, I'm aware that threads are evil - https://www.sqlite.org/faq.html#q6 - but I was just using this opportunity to teach myself more about Ruby mutexes and fibers.) :-)

I have not yet committed the code for the above redesign. But if you are interested, I can commit it in the next 1-2 days. A better plan would be to understand the Git Packfile format and write directly to it.
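
Since the redesigned code is not committed yet, the following is only a rough sketch of two of its ideas: ordering manifests parents-first without recursion, and writing a loose object straight into the objects directory. It uses Kahn's algorithm for the ordering, which may differ from what the actual script does. The names parents_first_order and write_loose_object, and the shape of the parents_of hash, are assumptions for illustration.

```ruby
require 'digest/sha1'
require 'zlib'
require 'fileutils'

# Kahn's algorithm, no recursion: returns commit IDs ordered so that every
# parent comes before its children.
# `parents_of` maps commit ID => array of parent IDs (assumed input shape).
def parents_first_order(parents_of)
  pending = {} # commit => number of parents not yet emitted
  children_of = Hash.new { |hash, key| hash[key] = [] }
  parents_of.each do |commit, parents|
    pending[commit] = parents.size
    parents.each { |parent| children_of[parent] << commit }
  end

  ready = pending.select { |_, count| count.zero? }.keys
  order = []
  until ready.empty?
    commit = ready.shift
    order << commit
    children_of[commit].each do |child|
      pending[child] -= 1
      ready << child if pending[child].zero?
    end
  end
  raise ArgumentError, "cycle or unknown parent" unless order.size == parents_of.size
  order
end

# Writes one loose object to <git_dir>/objects/xx/yyyy... and returns its ID.
# A loose object is the zlib-deflated "<type> <size>\0<content>" string.
def write_loose_object(git_dir, type, content)
  store = "#{type} #{content.bytesize}\0".b + content.b
  id = Digest::SHA1.hexdigest(store)
  path = File.join(git_dir, "objects", id[0, 2], id[2..-1])
  unless File.exist?(path)
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, Zlib::Deflate.deflate(store))
  end
  id
end
```

Processing manifests in the order returned by parents_first_order means each parent's Git ID is already known when its child is built, so no deep call stack is needed.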
Pack format documentation: https://www.kernel.org/pub/software/scm/git/docs/v1.4.3/technical/pack-format.txt

__
Vikrant Chaudhary
http://webstream.io

On 19 December 2015 at 20:37, Richard Hipp <[email protected]> wrote:
> Would it be good to support the "git:" URL scheme for
> clone/push/pull/sync? In other words, teach Fossil to understand the
> GIT wire protocol, translating content to and from the GIT format as
> it crosses the wire?
>
> This would allow you to "clone" repos off of GitHub. Or to
> automatically sync your Fossil repositories on GitHub.
>
> I'd be willing to work on this as my Christmas project (assuming
> nothing more pressing comes up over the Holiday). You can help by
> looking up documentation on the Git wire protocol for
> clone/push/pull/sync and sending me links.
>
> --
> D. Richard Hipp
> [email protected]
> _______________________________________________
> fossil-dev mailing list
> [email protected]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/fossil-dev
