Re: Implementing reftable in Git
On Fri, 2018-05-11 at 11:31 +0200, Michael Haggerty wrote:
> On Wed, May 9, 2018 at 4:33 PM, Christian Couder wrote:
> > I might start working on implementing reftable in Git soon.
> > [...]
>
> Nice. It'll be great to have a reftable implementation in git core
> (and ideally libgit2, as well). It seems to me that it could someday
> become the new default reference storage method. The file format is
> considerably more complicated than the current loose/packed scheme,
> which is definitely a disadvantage (for example, for other Git
> implementations). But implementing it *with good performance and
> without races* might be no more complicated than the current scheme.

I am somewhat concerned about perf, because as I recall, we have a
bunch of code which effectively loads all refs, which will be more
expensive with reftable than packed-refs (though maybe cheaper than
loose refs). But maybe we have eliminated this code or can work around
it.

> Testing will be important. There are already many tests specifically
> about testing loose/packed reference storage. These will always have
> to run against repositories that are forced to use that reference
> scheme. And there will need to be new tests specifically about the
> reftable scheme. Both classes of tests should be run every time. That
> much is pretty obvious.
>
> But currently, there are a lot of tests that assume the loose/packed
> reference format on disk even though the tests are not really related
> to references at all. ISTM that these should be converted to work at
> a higher level, for example using `for-each-ref`, `rev-parse`, etc. to
> examine references rather than reading reference files directly. That
> way the tests should run correctly regardless of which scheme is in
> use.

I agree with that, and I think some of my patches from years ago
attempted to do that. I probably should have broken those out into a
separate series so that they could have been applied separately.

> And since it's too expensive to run the whole test suite with both
> reference storage schemes, it seems to me that the reference storage
> scheme that is used while running the scheme-neutral tests should be
> easy to choose at runtime.

I ran the whole suite with both schemes during my testing, and I think
it was quite valuable in flushing out bugs.

> David Turner did some analogous work for wiring up and testing his
> proposed LMDB ref storage backend that might be useful [1]. I'm CCing
> him, since he might have thoughts on this topic.

Inline, above.
Re: Implementing reftable in Git
On Wed, May 9, 2018 at 4:33 PM, Christian Couder wrote:
> I might start working on implementing reftable in Git soon.
> [...]

Nice. It'll be great to have a reftable implementation in git core
(and ideally libgit2, as well). It seems to me that it could someday
become the new default reference storage method. The file format is
considerably more complicated than the current loose/packed scheme,
which is definitely a disadvantage (for example, for other Git
implementations). But implementing it *with good performance and
without races* might be no more complicated than the current scheme.

Testing will be important. There are already many tests specifically
about testing loose/packed reference storage. These will always have
to run against repositories that are forced to use that reference
scheme. And there will need to be new tests specifically about the
reftable scheme. Both classes of tests should be run every time. That
much is pretty obvious.

But currently, there are a lot of tests that assume the loose/packed
reference format on disk even though the tests are not really related
to references at all. ISTM that these should be converted to work at a
higher level, for example using `for-each-ref`, `rev-parse`, etc. to
examine references rather than reading reference files directly. That
way the tests should run correctly regardless of which scheme is in
use.

And since it's too expensive to run the whole test suite with both
reference storage schemes, it seems to me that the reference storage
scheme that is used while running the scheme-neutral tests should be
easy to choose at runtime.

David Turner did some analogous work for wiring up and testing his
proposed LMDB ref storage backend that might be useful [1]. I'm CCing
him, since he might have thoughts on this topic.
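The suggestion above, that tests examine references through
`for-each-ref` rather than by reading files under `$GIT_DIR`, could be
sketched roughly as follows. This is only an illustration in Python
(Git's own test suite is written in shell), and the helper names
`parse_for_each_ref` and `list_refs` are hypothetical, not anything
from Git itself. The only Git features assumed are `git -C <path>` and
`git for-each-ref --format=...` with the `%(objectname)` and
`%(refname)` atoms.

```python
import subprocess
from typing import Dict

# One "<object id> <refname>" pair per output line.
REF_FORMAT = "%(objectname) %(refname)"


def parse_for_each_ref(output: str) -> Dict[str, str]:
    """Turn `git for-each-ref` output into a {refname: object id} map."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        oid, refname = line.split(" ", 1)
        refs[refname] = oid
    return refs


def list_refs(repo_path: str) -> Dict[str, str]:
    """Ask Git for the refs instead of reading loose/packed files.

    Hypothetical helper: it never looks inside $GIT_DIR, so it does not
    depend on which reference storage scheme the repository uses.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref", f"--format={REF_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_for_each_ref(result.stdout)
```

A test written against `list_refs()` would compare the returned map to
the expected refs, instead of checking for a file such as
`.git/refs/heads/<branch>`.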
Regarding the reftable spec itself: I recently gave a little internal
talk about it, and while preparing the talk I noticed a couple of
things that should maybe be tweaked:

* The spec proposes to change `$GIT_DIR/refs`, which is currently a
  directory that holds the loose refs, into a file that holds the
  table of contents of reftable files comprising the full set of
  references. This was my suggestion. I was thinking that this would
  prevent old refs code from being used accidentally on a
  reftable-enabled repository, while still enabling old versions of
  Git to recognize this as a git directory [2]. I think that the
  latter is important to make things like `git rev-parse --git-dir`
  work correctly, even if the installed version of git can't actually
  *read* the repository.

  The problem is that `is_git_directory()` checks not only whether
  `$GIT_DIR/refs` exists, but also whether it is executable (i.e.,
  since it is normally a directory, that it is searchable). It would
  be silly to make the reftable table of contents executable, so this
  doesn't seem like a good approach after all.

  So probably `$GIT_DIR/refs` should continue to be a directory. If
  it's there, it would probably make sense to place the reftable files
  and maybe the ToC inside of it. We would have to rely on older Git
  versions refusing to work in the directory because its `config` file
  has an unrecognized `core.repositoryFormatVersion`, but that should
  be OK I think.

* The scheme for naming reftable files [3] is, I believe, just a
  suggestion as far as the spec is concerned (except for the use of
  `.ref`/`.log` file extensions). It might be less unwieldy to use
  `%d` rather than `%08d`, and more convenient to name compacted files
  `${min_update_index}-${max_update_index}_${n}.{ref,log}` to make it
  clearer to see by inspection what each file contains. That would
  also make it unnecessary, in most cases, to insert a `_${n}` to make
  the filename unique.
Michael

[1] https://github.com/dturner-tw/git/tree/dturner/pluggable-backends
[2] https://github.com/git/git/blob/ccdcbd54c4475c2238b310f7113ab3075b5abc9c/setup.c#L309-L347
[3] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#layout
    https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#compaction
[4] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#footer
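The file-naming idea in the message above (unpadded `%d` indexes and a
`${min_update_index}-${max_update_index}` stem, with `_${n}` added only
when needed) can be sketched as below. This is a minimal illustration
of that suggestion only. The function name `compacted_table_name` and
its parameters are made up for the example, and nothing here is taken
from the spec or from JGit.

```python
from typing import Optional


def compacted_table_name(
    min_update_index: int,
    max_update_index: int,
    ext: str = "ref",
    n: Optional[int] = None,
) -> str:
    """Build a name like "1-37.ref" or "1-37_2.log".

    Hypothetical helper: `n` is the optional uniquifier, appended as
    "_<n>" only when the caller needs it to avoid a name collision.
    """
    if ext not in ("ref", "log"):
        raise ValueError("extension must be 'ref' or 'log'")
    if min_update_index > max_update_index:
        raise ValueError("min_update_index must not exceed max_update_index")
    stem = "%d-%d" % (min_update_index, max_update_index)
    if n is not None:
        stem += "_%d" % n
    return "%s.%s" % (stem, ext)


# Zero-padded form, shown only to compare "%08d" with "%d".
padded = "%08d" % 1
```

With this scheme the range of update indexes covered by a file can be
read off its name, which is the "clearer by inspection" property the
message asks for.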
Re: Implementing reftable in Git
On Wed, May 09 2018, Stefan Beller wrote:
> Hi Christian,
>
> On Wed, May 9, 2018 at 7:33 AM, Christian Couder wrote:
>> Hi,
>>
>> I might start working on implementing reftable in Git soon.
>
> Cool! Everyone is waiting for it as they dream about the
> performance and correctness benefits this brings.
>
> Benefits that I know of:
> * performance in repos with many refs
> * no capitalization issues on case insensitive FS
> * replay-ability of the last fetch ("show the last reflog
>   of any ref under refs/remotes/origin") is easier to do
>   in a correct way. (This is one of my motivations to desire reftables)
> * We *might* be able to use reftables in negotiation later
>   ("client: Last I fetched, you said your latest transaction
>   number was '5' with the hash over all refs to be ;
>   server: ok, here are the refs and the pack, you're welcome").
>
> Why are you (or rather booking.com) interested in this?

We have a lot of refs, which is a longer-term scalability issue (which
I've implemented hacks around (ref archiving)), and we also run into
the capitalization issues you mentioned.
Re: Implementing reftable in Git
On Wed, 2018-05-09 at 10:54 -0700, Jonathan Nieder wrote:
> Carlos Martín Nieto wrote:
> > On Wed, 2018-05-09 at 09:48 -0700, Jonathan Nieder wrote:
> > > If you would like the patches at
> > > https://git.eclipse.org/r/q/topic:reftable
> > > relicensed for Git's use so that you don't need to include that
> > > license header, let me know. Separate from any legal concerns,
> > > if you're doing a straight port, a one-line comment crediting
> > > the JGit project would still be appreciated, of course.
> > [...]
> > Would you expect that this port would keep the Eclipse Distribution
> > License or would it get relicensed to GPLv2?
>
> I think you're way overcomplicating things.
>
> The patches are copyright Google. We can handle issues as they come.

Fair enough. I just wanted to avoid coming back to this in a few
months and realising we can't use it at all.

Cheers,
cmn
Re: Implementing reftable in Git
On Wed, May 9, 2018 at 10:48 AM, Jonathan Nieder wrote:
> Stefan Beller wrote:
>
>> * We *might* be able to use reftables in negotiation later
>>   ("client: Last I fetched, you said your latest transaction
>>   number was '5' with the hash over all refs to be ;
>>   server: ok, here are the refs and the pack, you're welcome").
>
> Do you mean that reftable's reflog layout makes this easier?
>
> It's not clear to me why this wouldn't work with the current
> reflogs.

Because of D/F conflicts we may not know all remote refs (and their
ref logs), such that "the hash over all refs" on the remote is error
prone to compute. Without transaction numbers it is also cumbersome
for the server to remember the state.

We could try it based on the current refs, but I'd think it is not
easy to do, whereas reftables bring some subtle advantages that allow
for such easier negotiation.

> [...]
>> On Wed, May 9, 2018 at 7:33 AM, Christian Couder wrote:
>
>>> During the last Git Merge conference last March Stefan talked about
>>> reftable. In Alex Vandiver's notes [1] it is asked that people
>>> announce it on the list when they start working on it,
>>
>> Mostly because many parties want to see it implemented
>> and were not sure when they could start implementing it.
>
> And to coordinate / help each other!

Yes. Usually open source contributions are so sparse that just doing
it and then sending it to the mailing list does not produce contention
or conflict (double work), but this seemed like a race condition
waiting to happen. ;)

>> With that said, please implement it in a way that it can not just be used as
>> a refs backend, but can easily be re-used to write ref advertisements
>> onto the wire?
>
> Can you spell this out a little more for me? At first glance it's not
> obvious to me how knowing about this potential use would affect the
> initial code.

Yeah me neither.

I just want to make Christian aware of the potential use cases that
come afterwards, so it can influence his design decisions for the
implementation.
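To make the "hash over all refs" idea from the message above a little
more concrete, here is a rough sketch of one way such a digest could be
computed. Everything in it is an assumption made for illustration: the
function name `refs_digest`, the choice of SHA-1, and the
"<oid> <refname>\n" serialization are hypothetical and are not part of
any Git protocol or of the reftable spec.

```python
import hashlib
from typing import Mapping


def refs_digest(refs: Mapping[str, str]) -> str:
    """Hash a full {refname: object id} snapshot into one hex digest.

    Hypothetical scheme: refs are serialized in sorted refname order so
    that both sides compute the same value for the same set of refs.
    """
    h = hashlib.sha1()
    for refname in sorted(refs):
        h.update(("%s %s\n" % (refs[refname], refname)).encode("utf-8"))
    return h.hexdigest()
```

Under this sketch a client would send its last-seen transaction number
together with the digest, and the server would compare the digest with
the one it computes for that transaction. As the message points out,
this only works if both sides really know the complete set of refs.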
Re: Implementing reftable in Git
Carlos Martín Nieto wrote:
> On Wed, 2018-05-09 at 09:48 -0700, Jonathan Nieder wrote:
>> If you would like the patches at https://git.eclipse.org/r/q/topic:reftable
>> relicensed for Git's use so that you don't need to include that
>> license header, let me know. Separate from any legal concerns, if
>> you're doing a straight port, a one-line comment crediting the JGit
>> project would still be appreciated, of course.
[...]
> Would you expect that this port would keep the Eclipse Distribution
> License or would it get relicensed to GPLv2?

I think you're way overcomplicating things.

The patches are copyright Google. We can handle issues as they come.

Jonathan
Re: Implementing reftable in Git
Hi all,

On Wed, 2018-05-09 at 09:48 -0700, Jonathan Nieder wrote:
> Hi,
>
> Christian Couder wrote:
>
> > I might start working on implementing reftable in Git soon.
>
> Yay!
>
> [...]
> > So I think the most straightforward and compatible way to do it would
> > be to port the JGit implementation.
>
> I suspect following the spec[1] would be even more compatible, since it
> would force us to tighten the spec where it is unclear.
>
> > It looks like the
> > JGit repo and the reftable code there are licensed under the Eclipse
> > Distribution License - v 1.0 [7] which is very similar to the 3-Clause
> > BSD License also called Modified BSD License
>
> If you would like the patches at https://git.eclipse.org/r/q/topic:reftable
> relicensed for Git's use so that you don't need to include that
> license header, let me know. Separate from any legal concerns, if
> you're doing a straight port, a one-line comment crediting the JGit
> project would still be appreciated, of course.
>
> That said, I would not be surprised if going straight from the spec is
> easier than porting the code.

Would you expect that this port would keep the Eclipse Distribution
License or would it get relicensed to GPLv2?

We would also want to have reftable functionality in the libgit2
project, but it has a slightly different license from git (GPLv2 with
linking exception) which requires explicit consent from the authors
for us to port over the code from git with its GPLv2 license.

The libgit2 project does have permission from Shawn to relicense his
git code, but this would presumably not cover this kind of porting. I
don't believe we would have issues if the code remained this BSD-like
license.

Sorry for being difficult, but fewer distinct reimplementations is
probably a good thing overall.

cc the core libgit2 team

Cheers,
cmn
Re: Implementing reftable in Git
Stefan Beller wrote:

> * We *might* be able to use reftables in negotiation later
>   ("client: Last I fetched, you said your latest transaction
>   number was '5' with the hash over all refs to be ;
>   server: ok, here are the refs and the pack, you're welcome").

Do you mean that reftable's reflog layout makes this easier?

It's not clear to me why this wouldn't work with the current
reflogs.

[...]
> On Wed, May 9, 2018 at 7:33 AM, Christian Couder wrote:

>> During the last Git Merge conference last March Stefan talked about
>> reftable. In Alex Vandiver's notes [1] it is asked that people
>> announce it on the list when they start working on it,
>
> Mostly because many parties want to see it implemented
> and were not sure when they could start implementing it.

And to coordinate / help each other!

[...]
> I volunteer for reviewing.

\o/

[...]
> With that said, please implement it in a way that it can not just be used as
> a refs backend, but can easily be re-used to write ref advertisements
> onto the wire?

Can you spell this out a little more for me? At first glance it's not
obvious to me how knowing about this potential use would affect the
initial code.

Thanks,
Jonathan
Re: Implementing reftable in Git
Hi Christian,

On Wed, May 9, 2018 at 7:33 AM, Christian Couder wrote:
> Hi,
>
> I might start working on implementing reftable in Git soon.

Cool! Everyone is waiting for it as they dream about the
performance and correctness benefits this brings.

Benefits that I know of:
* performance in repos with many refs
* no capitalization issues on case insensitive FS
* replay-ability of the last fetch ("show the last reflog
  of any ref under refs/remotes/origin") is easier to do
  in a correct way. (This is one of my motivations to desire reftables)
* We *might* be able to use reftables in negotiation later
  ("client: Last I fetched, you said your latest transaction
  number was '5' with the hash over all refs to be ;
  server: ok, here are the refs and the pack, you're welcome").

Why are you (or rather booking.com) interested in this?

> During the last Git Merge conference last March Stefan talked about
> reftable. In Alex Vandiver's notes [1] it is asked that people
> announce it on the list when they start working on it,

Mostly because many parties want to see it implemented
and were not sure when they could start implementing it.

> and it appears
> that there is a reference implementation in JGit.

The reference implementation can be used in tests to see if we can
interact with it, using the JGIT prerequisite.

> Looking it up, there is indeed some documentation [2], code [3], tests
> [4] and other related stuff [5] in the JGit repo. It looks like the
> JGit repo and the reftable code there are licensed under the Eclipse
> Distribution License - v 1.0 [7] which is very similar to the 3-Clause
> BSD License also called Modified BSD License which is GPL compatible
> according to gnu.org [9]. So from a quick look it appears that I
> should be able to port the JGit to Git if I just keep the copyright
> and license header comments in all the related files.
>
> So I think the most straightforward and compatible way to do it would
> be to port the JGit implementation.

I would think you can go by the spec and then test if it is compatible
with JGit; that way the spec will be ironed out in corner cases.

> Thanks in advance for any suggestion or comment about this.

I volunteer for reviewing.

(Advanced:)
The spec allows for some tunable parameters, and JGit's use is heavily
optimized for the server side. I think git-core may need to have
slightly different tweaks in different situations, e.g. block sizes
and how many restarts are put into the block. On the FS we may want to
have faster access at the cost of more disk space, whereas in the
future when using reftables on the wire as well for ref advertisement
we may want to opt for smallest tables (largest blocks, no restarts).

With that said, please implement it in a way that it can not just be used as
a refs backend, but can easily be re-used to write ref advertisements
onto the wire?

Thanks,
Stefan
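The space-versus-access trade-off behind "how many restarts are put
into the block" can be illustrated with a toy prefix-compression
sketch. It assumes the usual meaning of a restart: an entry stored with
its full name, so a reader can start decoding there without scanning
from the beginning of the block. This is not the reftable binary
format. The names `encode_block`, `decode_block`, `stored_bytes` and
the `restart_interval` parameter are hypothetical and exist only for
this example.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (bytes shared with previous name, suffix)


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two names."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_block(names: List[str], restart_interval: int) -> Tuple[List[Entry], List[int]]:
    """Prefix-compress sorted names; every restart stores a full name."""
    if restart_interval < 1:
        raise ValueError("restart_interval must be at least 1")
    entries: List[Entry] = []
    restarts: List[int] = []
    prev = ""
    for i, name in enumerate(names):
        if i % restart_interval == 0:
            restarts.append(i)  # seekable entry, no shared prefix
            shared = 0
        else:
            shared = shared_prefix_len(prev, name)
        entries.append((shared, name[shared:]))
        prev = name
    return entries, restarts


def decode_block(entries: List[Entry]) -> List[str]:
    """Rebuild the full names by replaying the entries in order."""
    names: List[str] = []
    prev = ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        names.append(prev)
    return names


def stored_bytes(entries: List[Entry]) -> int:
    """Rough size measure: only the suffix bytes that must be stored."""
    return sum(len(suffix) for _, suffix in entries)
```

With a restart at every entry, each name is stored in full, so lookups
can jump anywhere but the block is large. With a single restart at the
start of the block, the stored suffixes are as short as possible, but a
reader has to decode sequentially, which matches the "smallest tables"
case described for the wire.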
Re: Implementing reftable in Git
Hi,

Christian Couder wrote:

> I might start working on implementing reftable in Git soon.

Yay!

[...]
> So I think the most straightforward and compatible way to do it would
> be to port the JGit implementation.

I suspect following the spec[1] would be even more compatible, since it
would force us to tighten the spec where it is unclear.

> It looks like the
> JGit repo and the reftable code there are licensed under the Eclipse
> Distribution License - v 1.0 [7] which is very similar to the 3-Clause
> BSD License also called Modified BSD License

If you would like the patches at https://git.eclipse.org/r/q/topic:reftable
relicensed for Git's use so that you don't need to include that
license header, let me know. Separate from any legal concerns, if
you're doing a straight port, a one-line comment crediting the JGit
project would still be appreciated, of course.

That said, I would not be surprised if going straight from the spec is
easier than porting the code.

Thanks,
Jonathan

[1] https://eclipse.googlesource.com/jgit/jgit/+/master/Documentation/technical/reftable.md
Re: Implementing reftable in Git
On Wed, May 9, 2018 at 4:33 PM, Christian Couder wrote:
> Hi,
>
> I might start working on implementing reftable in Git soon.

Adding Michael Haggerty who did lots of work on ref stuff. He probably
can give a few suggestions.

You probably should also look at the last attempt to add lmdb as a new
ref backend. I'm not sure why it's still not in, maybe it wasn't the
right time (e.g. infrastructure was not ready).

> During the last Git Merge conference last March Stefan talked about
> reftable. In Alex Vandiver's notes [1] it is asked that people
> announce it on the list when they start working on it, and it appears
> that there is a reference implementation in JGit.
>
> Looking it up, there is indeed some documentation [2], code [3], tests
> [4] and other related stuff [5] in the JGit repo. It looks like the
> JGit repo and the reftable code there are licensed under the Eclipse
> Distribution License - v 1.0 [7] which is very similar to the 3-Clause
> BSD License also called Modified BSD License which is GPL compatible
> according to gnu.org [9]. So from a quick look it appears that I
> should be able to port the JGit to Git if I just keep the copyright
> and license header comments in all the related files.
>
> So I think the most straightforward and compatible way to do it would
> be to port the JGit implementation.
>
> Thanks in advance for any suggestion or comment about this.
>
> Reftable was first described by Shawn and then discussed last July on
> the list [6].
>
> My work on this would be sponsored by Booking.com.
>
> Thanks,
> Christian.
>
> [1] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
> [2] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md
> [3] https://github.com/eclipse/jgit/tree/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/reftable
> [4] https://github.com/eclipse/jgit/tree/master/org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/storage/reftable
> [5] https://github.com/eclipse/jgit/tree/master/org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/debug
> [6] https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=s-dpz5dr1af+vhvetkg2...@mail.gmail.com/
> [7] http://www.eclipse.org/org/documents/edl-v10.php
> [8] https://opensource.org/licenses/BSD-3-Clause
> [9] https://www.gnu.org/licenses/license-list.en.html#ModifiedBSD

--
Duy
Re: Implementing reftable in Git
On 5/9/2018 10:33 AM, Christian Couder wrote:
> Hi,
>
> I might start working on implementing reftable in Git soon.
>
> During the last Git Merge conference last March Stefan talked about
> reftable. In Alex Vandiver's notes [1] it is asked that people
> announce it on the list when they start working on it, and it appears
> that there is a reference implementation in JGit.

Thanks for starting on this! In addition to the performance gains,
this will help a lot of users with case-insensitive file systems from
getting case-errors on refnames.

> Looking it up, there is indeed some documentation [2], code [3], tests
> [4] and other related stuff [5] in the JGit repo. It looks like the
> JGit repo and the reftable code there are licensed under the Eclipse
> Distribution License - v 1.0 [7] which is very similar to the 3-Clause
> BSD License also called Modified BSD License which is GPL compatible
> according to gnu.org [9]. So from a quick look it appears that I
> should be able to port the JGit to Git if I just keep the copyright
> and license header comments in all the related files.
>
> So I think the most straightforward and compatible way to do it would
> be to port the JGit implementation.
>
> Thanks in advance for any suggestion or comment about this.
>
> Reftable was first described by Shawn and then discussed last July on
> the list [6].

The hope is that such a direct port should be possible, but someone
else should comment on the porting process.

This is also something that could be created independently based on
the documentation you mention. I was planning to attempt that during a
hackathon in July, but I'm happy you are able to start earlier (and
that you are announcing your intentions).

I would be happy to review your patch series, so please keep me
posted.

Thanks,
-Stolee
Implementing reftable in Git
Hi,

I might start working on implementing reftable in Git soon.

During the last Git Merge conference last March Stefan talked about
reftable. In Alex Vandiver's notes [1] it is asked that people
announce it on the list when they start working on it, and it appears
that there is a reference implementation in JGit.

Looking it up, there is indeed some documentation [2], code [3], tests
[4] and other related stuff [5] in the JGit repo. It looks like the
JGit repo and the reftable code there are licensed under the Eclipse
Distribution License - v 1.0 [7], which is very similar to the
3-Clause BSD License (also called the Modified BSD License), which is
GPL compatible according to gnu.org [9]. So from a quick look it
appears that I should be able to port the JGit implementation to Git
if I just keep the copyright and license header comments in all the
related files.

So I think the most straightforward and compatible way to do it would
be to port the JGit implementation.

Thanks in advance for any suggestion or comment about this.

Reftable was first described by Shawn and then discussed last July on
the list [6].

My work on this would be sponsored by Booking.com.

Thanks,
Christian.

[1] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
[2] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md
[3] https://github.com/eclipse/jgit/tree/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/reftable
[4] https://github.com/eclipse/jgit/tree/master/org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/storage/reftable
[5] https://github.com/eclipse/jgit/tree/master/org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/debug
[6] https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=s-dpz5dr1af+vhvetkg2...@mail.gmail.com/
[7] http://www.eclipse.org/org/documents/edl-v10.php
[8] https://opensource.org/licenses/BSD-3-Clause
[9] https://www.gnu.org/licenses/license-list.en.html#ModifiedBSD