Re: reftable [v5]: new ref storage format

2017-08-14 Thread Jeff King
On Mon, Aug 14, 2017 at 04:05:05PM +, David Turner wrote:

> > All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and
> > prevent multiple processes from using the DB concurrently. In that case,
> > maintaining coherence with other NFS clients is a non-issue. It strikes me 
> > that git
> > doesn't require concurrent multi-process access anyway, and any particular
> > process would only use the DB for a short time before closing it and going 
> > away.
> 
> Git, in general, does require concurrent multi-process access, depending on 
> what 
> that means.
> 
> For example, a post-receive hook might call some git command which opens the 
> ref database.  This means that git receive-pack would have to close and 
> re-open the ref database.  More generally, a fair number of git commands are
> implemented in terms of other git commands, and might need the same treatment.
> We could, in general, close and re-open the database around fork/exec, but I 
> am
> not sure that this solves the general problem -- by mere happenstance, one 
> might
> be e.g. pushing in one terminal while running git checkout in another.  This 
> is 
> especially true with git worktrees, which share one ref database across 
> multiple
> working directories.

Yeah, I'd agree that git's multi-process way of working would probably
cause some headaches if there were a broad lock.

I had the impression that Howard meant we would lock for _read_
operations, too. If so, I think that's probably going to cause a
noticeable performance problem for servers.  A repository which is
serving fetches to a lot of clients (even if some of those are noops)
has to send the current ref state out to each client. I don't think we'd
want to add a serial bottleneck to that portion of each process, which
can otherwise happen totally in parallel.

Serializing writes is probably not so big a deal as long as it is kept
to the portion where the process is actively writing out values. And as
long as there's a reasonable backoff/retry protocol; right now we don't
generally bother retrying ref locks because they're taken individually,
so racing on a lock almost certainly[1] means that you've lost the
sha1-lease and need to restart the larger operation.

-Peff

[1] Actually, we've found this isn't always true. Things like ref
packing require taking locks for correctness, which means they can
interfere with actual ref updates. That's yet another thing it would
be nice to get rid of when moving away from the loose/packed
storage.


RE: reftable [v5]: new ref storage format

2017-08-14 Thread David Turner
> -Original Message-
> From: Howard Chu [mailto:h...@symas.com]
> Sent: Monday, August 14, 2017 8:31 AM
> To: spea...@spearce.org
> Cc: David Turner <david.tur...@twosigma.com>; ava...@gmail.com;
> ben.a...@acegi.com.au; dborow...@google.com; git@vger.kernel.org;
> gits...@pobox.com; mhag...@alum.mit.edu; p...@peff.net;
> sbel...@google.com; sto...@gmail.com
> Subject: Re: reftable [v5]: new ref storage format
> 
> Howard Chu wrote:
> > The primary issue with using LMDB over NFS is with performance. All
> > reads are performed thru accesses of mapped memory, and in general,
> > NFS implementations don't cache mmap'd pages. I believe this is a
> > consequence of the fact that they also can't guarantee cache
> > coherence, so the only way for an NFS client to see a write from
> > another NFS client is by always refetching pages whenever they're accessed.
> 
> > LMDB's read lock management also wouldn't perform well over NFS; it
> > also uses an mmap'd file. On a local filesystem LMDB read locks are
> > zero cost since they just atomically update a word in the mmap. Over
> > NFS, each update to the mmap would also require an msync() to
> > propagate the change back to the server. This would seriously limit
> > the speed with which read transactions may be opened and closed.
> > (Ordinarily opening and closing a read txn can be done with zero
> > system calls.)
> 
> All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and
> prevent multiple processes from using the DB concurrently. In that case,
> maintaining coherence with other NFS clients is a non-issue. It strikes me 
> that git
> doesn't require concurrent multi-process access anyway, and any particular
> process would only use the DB for a short time before closing it and going 
> away.

Git, in general, does require concurrent multi-process access, depending on 
what 
that means.

For example, a post-receive hook might call some git command which opens the 
ref database.  This means that git receive-pack would have to close and 
re-open the ref database.  More generally, a fair number of git commands are
implemented in terms of other git commands, and might need the same treatment.
We could, in general, close and re-open the database around fork/exec, but I am
not sure that this solves the general problem -- by mere happenstance, one might
be e.g. pushing in one terminal while running git checkout in another.  This is 
especially true with git worktrees, which share one ref database across multiple
working directories.



Re: reftable [v5]: new ref storage format

2017-08-14 Thread Howard Chu

Howard Chu wrote:
The primary issue with using LMDB over NFS is with performance. All reads are 
performed thru accesses of mapped memory, and in general, NFS implementations 
don't cache mmap'd pages. I believe this is a consequence of the fact that 
they also can't guarantee cache coherence, so the only way for an NFS client 
to see a write from another NFS client is by always refetching pages whenever 
they're accessed.


LMDB's read lock management also wouldn't perform well over NFS; it also uses an mmap'd file. On a local filesystem LMDB read locks are zero cost since they just atomically update a word in the mmap. Over NFS, each update to the mmap would also require an msync() to propagate the change back to the server. This would seriously limit the speed with which read transactions may be opened and closed. (Ordinarily opening and closing a read txn can be done with zero system calls.) 


All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and 
prevent multiple processes from using the DB concurrently. In that case, 
maintaining coherence with other NFS clients is a non-issue. It strikes me 
that git doesn't require concurrent multi-process access anyway, and any 
particular process would only use the DB for a short time before closing it 
and going away.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: reftable [v5]: new ref storage format

2017-08-09 Thread Howard Chu

Shawn Pearce wrote:

On Sun, Aug 6, 2017 at 4:37 PM, Ben Alex  wrote:

> Just on the LmdbJava specific pieces:
>
> On Mon, Aug 7, 2017 at 8:56 AM, Shawn Pearce  wrote:



I don't know if we need a larger key size. $DAY_JOB limits ref names
to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
about the general masses who might be using our software and expect
ref names thus far to be as long as PATH_MAX on their system. Most
systems run PATH_MAX around 1024.


The key size limit in LMDB can be safely raised to around 2KB or so without 
any issues. There's also work underway in LMDB 1.0 to raise the limit to 2GB, 
but in general it would be silly to use such large keys.



Mostly at $DAY_JOB its because we can't virtualize the filesystem
calls the C library is doing.

In git-core, I'm worried about the caveats related to locking. Git
tries to work nicely on NFS,


That may be a problem in current LMDB 0.9, but needs further clarification.


and it seems LMDB wouldn't. Git also runs
fine on a read-only filesystem, and LMDB gets a little weird about
that.


Not sure what you're talking about. LMDB works perfectly fine on read-only 
filesystems, it just enforces that it is used in read-only mode.



Finally, Git doesn't have nearly the risks LMDB has about a
crashed reader or writer locking out future operations until the locks
have been resolved. This is especially true with shared user
repositories, where another user might setup and own the semaphore.


All locks disappear when the last process using the DB environment exits.
If only a single process is using the DB environment, there's no issue. If 
multiple processes are sharing the DB environment concurrently, the write lock 
cleans up automatically when the writer terminates; stale reader locks would 
require a call to mdb_reader_check() to clean them up.


The primary issue with using LMDB over NFS is with performance. All reads are 
performed thru accesses of mapped memory, and in general, NFS implementations 
don't cache mmap'd pages. I believe this is a consequence of the fact that 
they also can't guarantee cache coherence, so the only way for an NFS client 
to see a write from another NFS client is by always refetching pages whenever 
they're accessed.


This is also why LMDB doesn't provide user-level VFS hooks - it's generally 
impractical to emulate mmap from application level. You could always write a 
FUSE driver if that's really what you need to do, but again, the performance 
of such a solution is pretty horrible.


LMDB's read lock management also wouldn't perform well over NFS; it also uses 
an mmap'd file. On a local filesystem LMDB read locks are zero cost since they 
just atomically update a word in the mmap. Over NFS, each update to the mmap 
would also require an msync() to propagate the change back to the server. This 
would seriously limit the speed with which read transactions may be opened and 
closed. (Ordinarily opening and closing a read txn can be done with zero 
system calls.)


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: reftable [v5]: new ref storage format

2017-08-08 Thread Shawn Pearce
On Tue, Aug 8, 2017 at 12:52 AM, Jeff King  wrote:
> On Mon, Aug 07, 2017 at 03:40:48PM +, David Turner wrote:
>
>> > -Original Message-
>> > From: Shawn Pearce [mailto:spea...@spearce.org]
>> > In git-core, I'm worried about the caveats related to locking. Git tries 
>> > to work
>> > nicely on NFS, and it seems LMDB wouldn't. Git also runs fine on a 
>> > read-only
>> > filesystem, and LMDB gets a little weird about that. Finally, Git doesn't 
>> > have
>> > nearly the risks LMDB has about a crashed reader or writer locking out 
>> > future
>> > operations until the locks have been resolved. This is especially true 
>> > with shared
>> > user repositories, where another user might setup and own the semaphore.
>>
>> FWIW, git has problems with stale lock file in the event of a crash 
>> (refs/foo.lock
>> might still exist, and git does nothing to clean it up).
>>
>> In my testing (which involved a *lot* of crashing), I never once had to 
>> clean up a
>> stale LMDB lock.  That said, I didn't test on a RO filesystem.
>
> Yeah, I'd expect LMDB to do much better than Git in a crash, because it
> relies on flock. So when the kernel goes away, so too does your lock
> (ditto if a git process dies without remembering to remove the lock,
> though I don't think we've ever had such a bug).
>
> But that's also why it may not work well over NFS (though my impression
> is that flock _does_ work on modern NFS; I've been lucky enough not to
> ever use it). Lack of NFS support wouldn't be a show-stopper for most
> people, but it would be for totally replacing the existing code, I'd
> think. I'm just not clear on what the state of lmdb-on-nfs is.
>
> Assuming it could work, the interesting tradeoffs to me are:
>
>   - something like reftable is hyper-optimized for high-latency
> block-oriented access. It's not clear to me if lmdb would even be
> usable for the distributed storage case Shawn has.
>
>   - reftable is more code for us to implement, but we'd "own" the whole
> stack down to the filesystem. That could be a big win for debugging
> and optimizing for our use case.
>
>   - reftable is re-inventing a lot of the database wheel. lmdb really is
> a debugged, turn-key solution.
>
> I'm not opposed to a world where lmdb becomes the standard solution and
> Google does their own bespoke thing. But that's easy for me to say
> because I'm not Google. I do care about keeping complexity and bugs to a
> minimum for most users, and it's possible that lmdb could do that. But
> if it can't become the baseline standard (due to NFS issues), then we'd
> still want something to replace the current loose/packed storage. And if
> reftable does that, then lmdb becomes a lot less interesting.

Peff, thank you for this summary. It echos my opinions as well.

On the one hand, I love the idea of offloading the database stuff to
lmdb. But its got two technical blockers for me: behavior on NFS, and
virtualizing onto a different filesystem in userspace.

I really need a specialized reference store on a virtualized
distributed storage. The JGit reftable implementation fits that need
today. So we're probably going to go ahead and deploy that in our
environment.

I'd like to start writing a prototype reftable in C for git-core soon,
but I've been distracted by the JGit version first. It would be good
to have something to compare against the lmdb approach for git-core
before we make any decisions about what git-core wants to promote as
the new standard for ref storage.


Re: reftable [v5]: new ref storage format

2017-08-08 Thread Jeff King
On Mon, Aug 07, 2017 at 03:40:48PM +, David Turner wrote:

> > -Original Message-
> > From: Shawn Pearce [mailto:spea...@spearce.org]
> > In git-core, I'm worried about the caveats related to locking. Git tries to 
> > work
> > nicely on NFS, and it seems LMDB wouldn't. Git also runs fine on a read-only
> > filesystem, and LMDB gets a little weird about that. Finally, Git doesn't 
> > have
> > nearly the risks LMDB has about a crashed reader or writer locking out 
> > future
> > operations until the locks have been resolved. This is especially true with 
> > shared
> > user repositories, where another user might setup and own the semaphore.
> 
> FWIW, git has problems with stale lock file in the event of a crash 
> (refs/foo.lock 
> might still exist, and git does nothing to clean it up).
> 
> In my testing (which involved a *lot* of crashing), I never once had to clean 
> up a
> stale LMDB lock.  That said, I didn't test on a RO filesystem.

Yeah, I'd expect LMDB to do much better than Git in a crash, because it
relies on flock. So when the kernel goes away, so too does your lock
(ditto if a git process dies without remembering to remove the lock,
though I don't think we've ever had such a bug).

But that's also why it may not work well over NFS (though my impression
is that flock _does_ work on modern NFS; I've been lucky enough not to
ever use it). Lack of NFS support wouldn't be a show-stopper for most
people, but it would be for totally replacing the existing code, I'd
think. I'm just not clear on what the state of lmdb-on-nfs is.

Assuming it could work, the interesting tradeoffs to me are:

  - something like reftable is hyper-optimized for high-latency
block-oriented access. It's not clear to me if lmdb would even be
usable for the distributed storage case Shawn has.

  - reftable is more code for us to implement, but we'd "own" the whole
stack down to the filesystem. That could be a big win for debugging
and optimizing for our use case.

  - reftable is re-inventing a lot of the database wheel. lmdb really is
a debugged, turn-key solution.

I'm not opposed to a world where lmdb becomes the standard solution and
Google does their own bespoke thing. But that's easy for me to say
because I'm not Google. I do care about keeping complexity and bugs to a
minimum for most users, and it's possible that lmdb could do that. But
if it can't become the baseline standard (due to NFS issues), then we'd
still want something to replace the current loose/packed storage. And if
reftable does that, then lmdb becomes a lot less interesting.

-Peff


Re: reftable [v5]: new ref storage format

2017-08-08 Thread Jeff King
On Mon, Aug 07, 2017 at 07:41:43AM -0700, Shawn Pearce wrote:

> > As such if JGit wanted to use a longer key size, it is possible to implement
> > similar automatic builds and packaging into JGit.
> 
> I don't know if we need a larger key size. $DAY_JOB limits ref names
> to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
> about the general masses who might be using our software and expect
> ref names thus far to be as long as PATH_MAX on their system. Most
> systems run PATH_MAX around 1024.

GitHub limits to 255 (for the fully-qualified name, so including
"refs/heads/"). I don't recall ever seeing any complaints about that,
though I suppose it's not out of the realm of possibility for somebody
with a multi-byte encoding to hit with a real name (it's configurable,
so I'm not sure if Enterprise customers in Asia might ever bump it).  I
do think something like 1024 would be well into "you're insane if you
really want to name your branch this" territory.

-Peff


RE: reftable [v5]: new ref storage format

2017-08-07 Thread David Turner
> -Original Message-
> From: Shawn Pearce [mailto:spea...@spearce.org]
> In git-core, I'm worried about the caveats related to locking. Git tries to 
> work
> nicely on NFS, and it seems LMDB wouldn't. Git also runs fine on a read-only
> filesystem, and LMDB gets a little weird about that. Finally, Git doesn't have
> nearly the risks LMDB has about a crashed reader or writer locking out future
> operations until the locks have been resolved. This is especially true with 
> shared
> user repositories, where another user might setup and own the semaphore.

FWIW, git has problems with stale lock file in the event of a crash 
(refs/foo.lock 
might still exist, and git does nothing to clean it up).

In my testing (which involved a *lot* of crashing), I never once had to clean 
up a
stale LMDB lock.  That said, I didn't test on a RO filesystem.


Re: reftable [v5]: new ref storage format

2017-08-07 Thread Shawn Pearce
On Sun, Aug 6, 2017 at 4:37 PM, Ben Alex  wrote:
> Just on the LmdbJava specific pieces:
>
> On Mon, Aug 7, 2017 at 8:56 AM, Shawn Pearce  wrote:
>>
>> Looks pretty complete. Its a Java wrapper around the C implementation
>> of LMDB, which may be sufficient for reference storage. Keys are
>> limited to 511 bytes, so insanely long reference names would have to
>> be rejected. Reftable allows reference names up to the file's
>> `page_size`, minus overhead (~15 bytes) and value (20 bytes).
>
>
> For clarification LmdbJava code doesn't enforce a particular key size limit.
> For puts the caller nominates the size in the buffer they present for
> storage, and for get-style operations (cursors etc) the LMDB database stores
> the key size and LmdbJava adjusts the Java-visible buffer accordingly.
>
> A 511 byte key limit is specified at compile time for the native LMDB
> library. For convenience the native library is compiled for 64-bit Windows,
> Linux and Mac OS and included in the LmdbJava JAR, and this compilation is
> performed using default values (including the 511 key limit) by the
> https://github.com/lmdbjava/native project. Users can specify a different
> native library to use (eg one packaged by their OS or separately compiled
> using an LmdbJava Native-like automatic build) with a larger key size if
> they wish.
>
> As such if JGit wanted to use a longer key size, it is possible to implement
> similar automatic builds and packaging into JGit.

I don't know if we need a larger key size. $DAY_JOB limits ref names
to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
about the general masses who might be using our software and expect
ref names thus far to be as long as PATH_MAX on their system. Most
systems run PATH_MAX around 1024.

The limitation of needing native JARs, and having such a low compile
time constant, may be annoying to some.

>> A downside for JGit is getting these two open source projects cleared.
>> We would have to get approval from our sponsor (Eclipse Foundation) to
>> use both lmdbjava (Apache License) and LMDB (LMDB license).
>
>
> I can't speak for the other contributors, but I'm happy to review LmdbJava's
> license if this assisted. For example changing to the OpenLDAP License would
> seem a reasonable variation given users of LmdbJava already need to accept
> the OpenLDAP License to use it. Kristoffer, do you have thoughts on this?

Thanks for considering it, but please don't change your licensing just
because of JGit. Its unlikely we can use LMDB for a lot of technical
reasons.

>> Plus it
>> looks like lmdbjava still relies on local disk and isn't giving us a
>> way to patch in a virtual filesystem the way I need to at $DAY_JOB.
>
>
> LMDB's mdb_env_open requires a const char* path, so we can pass through any
> char array desired. But I think you'll find LMDB native can't map to a
> virtual file system implemented by JVM code (the LMDB caveats section has
> further local file system considerations).

Mostly at $DAY_JOB its because we can't virtualize the filesystem
calls the C library is doing.

In git-core, I'm worried about the caveats related to locking. Git
tries to work nicely on NFS, and it seems LMDB wouldn't. Git also runs
fine on a read-only filesystem, and LMDB gets a little weird about
that. Finally, Git doesn't have nearly the risks LMDB has about a
crashed reader or writer locking out future operations until the locks
have been resolved. This is especially true with shared user
repositories, where another user might setup and own the semaphore.


Re: reftable [v5]: new ref storage format

2017-08-06 Thread Shawn Pearce
On Sun, Aug 6, 2017 at 9:56 AM, Ævar Arnfjörð Bjarmason
 wrote:
> On Sun, Aug 06 2017, Shawn Pearce jotted:
>
>> 5th iteration of the reftable storage format.
>
> I haven't kept up with all of the discussion, sorry if these comments
> repeat something that's already mentioned.
>
>> ### Version 1
>>
>> A repository must set its `$GIT_DIR/config` to configure reftable:
>>
>> [core]
>> repositoryformatversion = 1
>> [extensions]
>> reftable = true
>
> David Turner's LMDB proposal specified a extensions.refStorage config
> variable instead. I think this is a much better idea, cf. the mistake we
> already made with grep.extendedRegexp & grep.patternType. I.e. to have
> 'extensions.refStorage = reftable' instead of 'extensions.reftable =
> true'.
>
> If we grow another storage backend this'll become messy, and it won't be
> obvious to the user that the configuration is mutually exclusive (which
> it surely will be), so we'll end up having to special case it similar to
> the grep.[extendedRegexp,patternType] (i.e. either make one override the
> other, or make specifying >1 an error, a hassle with the config API).

Good catch. I've fixed this to use extensions.refStorage. Thanks!


>> Performance testing indicates reftable is faster for lookups (51%
>> faster, 11.2 usec vs.  5.4 usec), although reftable produces a
>> slightly larger file (+ ~3.2%, 28.3M vs 29.2M):
>>
>> format|  size  | seek cold | seek hot  |
>> -:|---:|--:|--:|
>> mh-alt| 28.3 M | 23.4 usec | 11.2 usec |
>> reftable  | 29.2 M | 19.9 usec |  5.4 usec |
>>
>> [mh-alt]: 
>> https://public-inbox.org/git/camy9t_hcnyc1g8xwoowhe7nn0aefyybskv2aomb_fe+wgve...@mail.gmail.com/
>
> Might be worth noting "based on WIP Java implementation". I started
> searching for patches for this new format & found via
> 

Re: reftable [v5]: new ref storage format

2017-08-06 Thread Ævar Arnfjörð Bjarmason

On Sun, Aug 06 2017, Shawn Pearce jotted:

> 5th iteration of the reftable storage format.

I haven't kept up with all of the discussion, sorry if these comments
repeat something that's already mentioned.

> ### Version 1
>
> A repository must set its `$GIT_DIR/config` to configure reftable:
>
> [core]
> repositoryformatversion = 1
> [extensions]
> reftable = true

David Turner's LMDB proposal specified a extensions.refStorage config
variable instead. I think this is a much better idea, cf. the mistake we
already made with grep.extendedRegexp & grep.patternType. I.e. to have
'extensions.refStorage = reftable' instead of 'extensions.reftable =
true'.

If we grow another storage backend this'll become messy, and it won't be
obvious to the user that the configuration is mutually exclusive (which
it surely will be), so we'll end up having to special case it similar to
the grep.[extendedRegexp,patternType] (i.e. either make one override the
other, or make specifying >1 an error, a hassle with the config API).

> Performance testing indicates reftable is faster for lookups (51%
> faster, 11.2 usec vs.  5.4 usec), although reftable produces a
> slightly larger file (+ ~3.2%, 28.3M vs 29.2M):
>
> format|  size  | seek cold | seek hot  |
> -:|---:|--:|--:|
> mh-alt| 28.3 M | 23.4 usec | 11.2 usec |
> reftable  | 29.2 M | 19.9 usec |  5.4 usec |
>
> [mh-alt]: 
> https://public-inbox.org/git/camy9t_hcnyc1g8xwoowhe7nn0aefyybskv2aomb_fe+wgve...@mail.gmail.com/

Might be worth noting "based on WIP Java implementation". I started
searching for patches for this new format & found via

Re: reftable [v5]: new ref storage format

2017-08-05 Thread Shawn Pearce
5th iteration of the reftable storage format.

You can read a rendered version of this here:
https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md

Significant changes from v4:
- Supported Michael Haggerty's multi-level index.
- Restart table now appears at start of block, instead of end.
- The `restart_offset` is now 3-bytes, instead of 4-bytes.
- Footer stores `obj_id_len` abbreviation used by obj blocks.

- Clarified log-only files can exist.
- Clarified use of `position` for byte in file, `offset` for byte in block.
- Clarified peeling, and reference name encoding as bag of bytes.
- Corrected extensions.reftable value to be `true`.


# reftable

[TOC]

## Overview

### Problem statement

Some repositories contain a lot of references (e.g.  android at 866k,
rails at 31k).  The existing packed-refs format takes up a lot of
space (e.g.  62M), and does not scale with additional references.
Lookup of a single reference requires linearly scanning the file.

Atomic pushes modifying multiple references require copying the
entire packed-refs file, which can be a considerable amount of data
moved (e.g. 62M in, 62M out) for even small transactions (2 refs
modified).

Repositories with many loose references occupy a large number of disk
blocks from the local file system, as each reference is its own file
storing 41 bytes (and another file for the corresponding reflog).
This negatively affects the number of inodes available when a large
number of repositories are stored on the same filesystem.  Readers can
be penalized due to the larger number of syscalls required to traverse
and read the `$GIT_DIR/refs` directory.

### Objectives

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push with `O(size_of_update)` operations.
- Combine reflog storage with ref storage for small transactions.
- Separate reflog storage for base refs and historical logs.

### Description

A reftable file is a portable binary file format customized for
reference storage. References are sorted, enabling linear scans,
binary search lookup, and range scans.

Storage in the file is organized into blocks.  Prefix compression
is used within a single block to reduce disk space.  Block size is
tunable by the writer.

### Performance

Space used, packed-refs vs. reftable:

repository | packed-refs | reftable | % original | avg ref  | avg obj
---|:|-:|---:|-:|:
android|  62.2 M |   34.4 M | 55.2%  | 33 bytes | 8 bytes
rails  |   1.8 M |1.1 M | 57.7%  | 29 bytes | 6 bytes
git|  78.7 K |   44.0 K | 60.0%  | 50 bytes | 6 bytes
git (heads)|   332 b |274 b | 83.1%  | 34 bytes | 0 bytes

Scan (read 866k refs), by reference name lookup (single ref from 866k
refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):

format  | cache | scan| by name| by SHA-1
|--:|:|---:|---:
packed-refs | cold  |  402 ms | 409,660.1 usec | 412,535.8 usec
packed-refs | hot   | |   6,844.6 usec |  20,110.1 usec
reftable| cold  |  112 ms |  33.9 usec | 323.2 usec
reftable| hot   | |  20.2 usec | 320.8 usec

Space used for 149,932 log entries for 43,061 refs,
reflog vs. reftable:

format| size  | avg entry
--|--:|---:
$GIT_DIR/logs | 173 M | 1209 bytes
reftable  |   5 M |   37 bytes

## Details

### Peeling

References stored in a reftable are peeled, a record for an annotated
(or signed) tag records both the tag object, and the object it refers
to.

### Reference name encoding

Reference names are an uninterpreted sequence of bytes that must pass
[git-check-ref-format][ref-fmt] as a valid reference name.

[ref-fmt]: https://git-scm.com/docs/git-check-ref-format

### Network byte order

All multi-byte, fixed width fields are in network byte order.

### Ordering

Blocks are lexicographically ordered by their first reference.

### Directory/file conflicts

The reftable format accepts both `refs/heads/foo` and
`refs/heads/foo/bar` as distinct references.

This property is useful for retaining log records in reftable, but may
confuse versions of Git using `$GIT_DIR/refs` directory tree to
maintain references.  Users of reftable may choose to continue to
reject `foo` and `foo/bar` type conflicts to prevent problems for
peers.

## File format

### Structure

A reftable file has the following high-level structure:

first_block {
  header
  first_ref_block
}
ref_blocks*
ref_index?
obj_blocks*
obj_index?
log_blocks*
log_index?
footer

A log-only file omits the `ref_blocks`, `ref_index`,