Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread Jeff King
On Tue, Jun 23, 2015 at 02:18:36PM -0400, David Turner wrote:

  Can you describe a bit more about the reflog handling?
  
  One of the problems we've had with large-ref repos is that the reflog
  storage is quite inefficient. You can pack all the refs, but you may
  still be stuck with a bunch of reflog files with one entry, wasting a
  whole inode. Doing a git repack when you have a million of those has
  horrible cold-cache performance. Basically anything that isn't
  one-file-per-reflog would be a welcome change. :)
 
 Reflogs are stored in the database as well.  There is one header entry
 per ref to indicate that a reflog is present, and then one database
 entry per reflog entry; the entries are stored consecutively and
 immediately following the header so that it's fast to iterate over them.

OK, that makes sense. I did notice that the storage for the refdb grows
rapidly. If I add a million refs (like refs/tags/$i) with a simple
reflog message of "foo", I end up with a 500MB database file.

That's _probably_ OK, because a million is getting into crazy
territory[1].  But it's 500 bytes per ref, each with one reflog entry.
Our ideal lower bound is probably something like 100 bytes per reflog
entry:

  - 20 bytes for old sha1
  - 20 bytes for new sha1
  - ~50 bytes for name, email, timestamp
  - ~6 bytes for refname (100 is the longest unique part)

That assumes we store binary[2] (and not just the raw reflog lines), and
reconstruct the reflog lines on the fly. It also assumes we use some
kind of trie-like storage (where we can amortize the cost of storing
refs/tags/ across all of the entries).
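
For concreteness, a packed entry might look vaguely like this C struct
(purely illustrative; none of these field names or sizes are a real
proposal, and the variable-length text would follow the fixed part):

  #include <stdint.h>

  struct packed_reflog_entry {
          unsigned char old_sha1[20];
          unsigned char new_sha1[20];
          uint32_t timestamp;     /* seconds since epoch */
          int16_t tz;             /* e.g. -0400 */
          uint16_t ident_len;     /* length of "name <email>" that follows */
          uint16_t name_len;      /* length of the unique refname suffix */
          /* then ident_len + name_len bytes of text, plus the message */
  };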

Of course that neglects lmdb's overhead, and the storage of the ref tip
itself. But it would hopefully give us a ballpark for an optimal
solution. We don't have to hit that, of course, but it's food for
thought.

[1] The homebrew/homebrew repository on GitHub has almost half a million
ref updates. Since this is storing not just refs but all ref
updates, that's actually the interesting number (and optimizing the
per-reflog-entry size is more interesting than the per-ref size).

[2] I'm hesitant to suggest binary formats in general, but given that
this is a blob embedded inside lmdb, I think it's OK. If we were to
pursue the log-structured idea I suggested earlier, I'm torn on
whether it should be binary or not.

  It has also been a dream of mine to stop tying the reflogs specifically
  to the refs. I.e., have a spot for reflogs of branches that no longer
  exist, which allows us to retain them for deleted branches.
 [...]
 That would be cool, and I don't think it would be hard to add to my
 current code; we could simply replace the header with a tombstone.
 But I would prefer to wait until the series is merged; then we can build
 on top of it.

Yeah, I think you can add it easily to basically any system that does
not have the filesystem D/F conflicts in its storage (i.e., having
refs/foo does not block data under refs/foo/bar).

  But it may also be worth going with a slightly slower database if we can
  get wider compatibility for free.
 
 There's a JNI interface to LMDB, which is, of course, not native.  I
 don't think it would be too hard to entirely rewrite LMDB in Java, but
 I'm not going to have time to do it for the foreseeable future.  I've
 asked Howard Chu if he knows of any efforts in progress.

Yeah, I think JNI is not enough for Eclipse folks. I don't think this is
a task that you would necessarily need to take on. More just something
to think about for the future when picking a format.

 Thanks, that's valuable.  For the refs backend, opening the LMDB
 database for writing is sufficient to block other writers.  Do you think
 it would be valuable to provide a git hold-ref-lock command that simply
 reads refs from stdin and keeps them locked until it reads EOF from
 stdin?  That would allow cross-backend ref locking.

I'm not sure what you would use it for. If you want to update the refs,
then you can specify a whole transaction with git update-ref --stdin,
and that should work whatever backend you choose. Is there some other
operation you want where you hold the lock for a longer period of time?

-Peff


Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread Shawn Pearce
On Tue, Jun 23, 2015 at 4:47 AM, Jeff King p...@peff.net wrote:

 One of the problems we've had with large-ref repos is that the reflog
 storage is quite inefficient.

Yup. We ran into this with Gerrit Code Review years ago. The
refs/changes/... namespace created by Gerrit Code Review is 1 ref per
snapshot per code review, and never modified. Reflogs for these are
always exactly one record. We broke down and modified JGit to add an
API that allowed Gerrit Code Review to disable recording reflogs for
specific updates just to avoid creating reflogs under refs/changes/.

In our JGit DFS implementation we store reflogs in databases to
eliminate these overheads. It works well for us. Hopefully the feature
can come to git-core through this series.

 It has also been a dream of mine to stop tying the reflogs specifically
 to the refs. I.e., have a spot for reflogs of branches that no longer
 exist, which allows us to retain them for deleted branches. Then you can
 possibly recover from a branch deletion, whereas now you have to dig
 through git fsck's dangling output. And the reflog, if you don't
 expire it, becomes a suitable audit log to find out what happened to
 each branch when (whereas now it is full of holes when things get
 deleted).

Yes. $DAY_JOB's DFS implementation never expires reflogs, allowing it
to be used as a history to inspect what happened. It's been useful a
couple of times to investigate and recover from a few accidental
deletions.

Once you never expire reflog records, you have to consider at what
point you stop paying attention to the reflog entries for graph
reachability during repack and fsck. Users still expect to be able to
force push or delete a branch and have a set of objects disappear from
the repository.

I am looking forward to something like this in git-core. I delete
branches in my local repos and then regret that. Then remember HEAD
has a reflog and hope I can find it somewhere in there. Usually I
fail, and am sad. :(

 I was thinking of actually moving to a log-structured ref storage.
 Something like:

   - any ref write puts a line at the end of a single logfile that
 contains the ref name, along with the normal reflog data

   - the logfile is the source of truth for the ref state. If you want to
 know the value of any ref, you can read it backwards to find the
 last entry for the ref. Everything else is an optimization.

 Let's call the number of refs N, and the number of ref updates in
 the log U.

   - we keep a key/value index mapping the name of any branch that exists
 to the byte offset of its entry in the logfile. This would probably
 be in some binary key/value store (like LMDB). Without this,
 resolving a ref is O(U), which is horrible. With it, it should be
 O(1) or O(lg N), depending on the index data structure.

This ... would be fantastic.

There are some issues with append. Before appending we would need to
verify the last record actually ends with an LF. If there was a power
failure and only part of the last record wrote, you can't append
without that record separator in place.

If that last record was truncated, and an LF was wedged in to do a new
append, we can't trust that intermediate record. A CRC at the end of
the record might make it safer to know the record is intact or bogus
due to an earlier failed write that wasn't completed.
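
To make that concrete, a strawman record framing (field names invented
here, not a real proposal) could be:

  len_4 refname NUL old_sha1_20 new_sha1_20 ident time msg NUL crc_4 LF

where the CRC covers everything before it, so a torn tail is detectable
even if someone later wedged an LF after the partial write.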

What about the case of never expiring the reflog? This log would grow
forever. You may eventually need to archive old sections of it (e.g. 1
year ago?) to maintain an audit log, while keeping the latest entry
for each ref to rebuild the index.

   - the index can also contain other optimizations. E.g., rather than
 point to the entry in the logfile, it can include the sha1 directly
 (to avoid an extra level of indirection). It may want to include the
 peeled value, as the current packed-refs file does.

+1 to always storing the peeled value. This was a major improvement
for $DAY_JOB's Git servers as peeling tags on the fly can be costly
when your storage is something remote, such as NFS. Unfortunately the
current wire protocol demands peeled information to serve a ref
advertisement.

One thing we do is always peel all refs. We record a bit to state it's
been peeled, but there is no peeled value because the ref is pointing
to a non-tag object (e.g. refs/heads/master points to a commit).

I guess this puts an index structure at something like:

  refname \0 log_idx_4 sha1_20 ('n' | 'p' sha1_20)

Or refname + 26 bytes for heads and refname + 46 bytes for tags.


Updating the index on updates to a ref would be costly, as it's O(N).
You could skip some index updates. Record in the header of the index
the length of the reflog file used to build it. When reading the
index, scan the reflog from that position to the end and patch those
updates in memory. Rewrites of the index could then be deferred until
the scan delta on the log is high, or the next gc.
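
A sketch of that read path (every name here is hypothetical, just to
make the idea concrete):

  #include <stdint.h>

  struct ref_index {
          uint64_t built_thru;    /* log offset the index was built from */
          /* ... refname -> (sha1, log offset) mapping ... */
  };

  /* hypothetical helpers, not real functions */
  int read_index(struct ref_index *idx);
  int next_log_record(const char *log, uint64_t *pos, struct ref_index *idx);

  static int load_refs(const char *log, struct ref_index *idx)
  {
          uint64_t pos;

          if (read_index(idx) < 0)
                  return -1;
          /* replay anything appended since the index snapshot */
          for (pos = idx->built_thru; next_log_record(log, &pos, idx) > 0; )
                  ;       /* each record is patched into idx in memory */
          return 0;
  }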

   - Reading all of the 

Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread Jeff King
On Tue, Jun 23, 2015 at 08:10:03PM +0700, Duy Nguyen wrote:

- we keep a key/value index mapping the name of any branch that exists
  to the byte offset of its entry in the logfile. This would probably
 
 One key/value mapping per branch, pointing to the latest reflog entry,
 or one key/value mapping for each reflog entry?

Yeah, sorry, I meant to point only to the latest entry (and then from
there if you want to actually walk the reflog, you can do so by
following the backreference to the previous entry).

  be in some binary key/value store (like LMDB). Without this,
  resolving a ref is O(U), which is horrible. With it, it should be
  O(1) or O(lg N), depending on the index data structure.
 
 I'm thinking of the user with small or medium repos, in terms of refs,
 who does not want an extra dependency. If we store one mapping per
 branch, then the size of this mapping is small enough that the index
 in a text file is ok. If we also store the offset to the previous
 reflog entry of the same branch in the current reflog entry, like a
 back pointer, then we could jump back faster.
 
 Or do you have something else in mind? Current reflog structure won't
 work because I think you bring back the reflog graveyard with this,
 and I don't want to lose that

I hadn't really thought about having multiple formats for the index. But
in theory, yes, you could, and the lowest common denominator could just
use the filesystem. Or even something similar to the packed-refs file,
where we have to write the whole thing to make a single update. That
doesn't perform well, but it's dirt simple and might be OK if you have
only a handful of refs.
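
E.g., a dirt-simple text index could be one line per ref (a made-up
format, just to illustrate the lowest common denominator):

  <sha1> <byte offset of latest log entry> refs/heads/master

rewritten wholesale on every update, exactly like packed-refs is today.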

-Peff


Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread Duy Nguyen
On Wed, Jun 24, 2015 at 1:09 PM, Shawn Pearce spea...@spearce.org wrote:
 I chose to use LMDB for the database.  LMDB has a few features that make
 it suitable for usage in git:

 One of the complaints that Shawn had about sqlite is that there is no
 native Java implementation, which makes it hard for JGit to ship a
 compatible backend. I suspect the same is true for LMDB, but it is
 probably a lot simpler than sqlite (so reimplementation might be
 possible).

 Yes. Whatever the default standard format is for git-core, we need
 that format to be easily supportable from JGit. Loading code via JNI
 is not easily supportable.

I'm under the impression that this will be opt-in, not completely
replacing the fs-based ref backend. Anyway, any recommendation about
database format or engine that is more friendly to Java and JGit (and
preferably has good C support too)?
-- 
Duy


Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread Jeff King
On Tue, Jun 23, 2015 at 11:09:40PM -0700, Shawn Pearce wrote:

 Yes. $DAY_JOB's DFS implementation never expires reflogs, allowing it
 to be used as a history to inspect what happened. It's been useful a
 couple of times to investigate and recover from a few accidental
 deletions.
 
 Once you never expire reflog records, you have to consider at what
 point you stop paying attention to the reflog entries for graph
 reachability during repack and fsck. Users still expect to be able to
 force push or delete a branch and have a set of objects disappear from
 the repository.

Yeah, we face this problem at GitHub. We actually write every single ref
write to $GIT_DIR/audit_log, which is essentially a reflog with the
refname prepended. The key, though, is that it isn't ever _read_ by git
for reachability. So it becomes an immutable log of what happened, and
we can happily prune the reflog to drop objects.

In a log-structured ref storage world, I think I'd include a single bit
per entry for "use this for reachability". Then you could soft-expire
reflog entries by dropping their reachability bit, but still retain them
in your audit_log. The alternative is to just copy the entries to an
archival log.

 There are some issues with append. Before appending we would need to
 verify the last record actually ends with an LF. If there was a power
 failure and only part of the last record wrote, you can't append
 without that record separator in place.

Yeah, I think that is straightforward. You have to take a lock on the
whole log anyway, so it's OK to fixup the previous entry.
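
Something like this under the lock would probably do (a sketch, not
code from any actual series):

  #include <unistd.h>

  static int ensure_trailing_newline(int fd)
  {
          char c;
          off_t size = lseek(fd, 0, SEEK_END);

          if (size <= 0)
                  return 0;       /* empty log; nothing to fix up */
          if (pread(fd, &c, 1, size - 1) != 1)
                  return -1;
          if (c != '\n' && write(fd, "\n", 1) != 1)
                  return -1;      /* terminate the torn record first */
          return 0;
  }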

 If that last record was truncated, and an LF was wedged in to do a new
 append, we can't trust that intermediate record. A CRC at the end of
 the record might make it safer to know the record is intact or bogus
 due to an earlier failed write that wasn't completed.

I suspect you could get by with just realizing that the entry doesn't
parse (that's what we do now for reflogs). But the idea of per-entry
consistency checks is appealing. You could also include the CRC for the
previous entry (remember that we would probably have a back-pointer to
some byte offset to say "this is the current ref state that I am
building on"). Then you can walk back the whole chain to know that it
hasn't been damaged.

If you want to get very fancy, replace your CRC with a cryptographically
strong hash, and you've just reinvented a blockchain. :)

 What about the case of never expiring the reflog? This log would grow
 forever. You may eventually need to archive old sections of it (e.g. 1
 year ago?) to maintain an audit log, while keeping the latest entry
 for each ref to rebuild the index.

Yeah, that's certainly an option. I'd say that's somewhat outside the
scope of git. If git provides the ability to prune entries completely
(i.e., what reflog expire does now) and to soft-expire them, then that
is enough for anyone to build whatever sort of archival system they want
(e.g., soft-expire for reachability as desired, and then occasionally
git reflog show >your-archive && git reflog expire).

 +1 to always storing the peeled value. This was a major improvement
 for $DAY_JOB's Git servers as peeling tags on the fly can be costly
 when your storage is something remote, such as NFS. Unfortunately the
 current wire protocol demands peeled information to serve a ref
 advertisement.

Even on good disks, it makes the initial ref advertisement from
git-upload-pack _way_ cheaper, because we don't have to actually touch
the object database at all. It's basically just blitting out the
packed-refs file.

 One thing we do is always peel all refs. We record a bit to state it's
 been peeled, but there is no peeled value because the ref is pointing
 to a non-tag object (e.g. refs/heads/master points to a commit).

Yeah, since c29c46f (pack-refs: add fully-peeled trait, 2013-03-18) we
implicitly do this in packed-refs; if there's no peel line after the
entry, it cannot be peeled. We could do the same here, but I think I
favor being more implicit (I'd probably add a few bits of flags to
each entry, and this could be one such flag).

 Updating the index on updates to a ref would be costly, as it's O(N).

It depends how you implement the index. A straight text index would be
O(N). Replacing the index with a real key/value store should be very
fast. But unless we are going to write our own, that's going to
introduce a dependency (possibly one we can ship as we do with xdiff,
but the whole JGit thing is an open question).

 You could skip some index updates. Record in the header of the index
 the length of the reflog file used to build it. When reading the
 index, scan the reflog from that position to the end and patch those
 updates in memory. Rewrites of the index could then be deferred until
 the scan delta on the log is high, or the next gc.

Yeah, basically use the log as a journal. You save (or at least
amortize) O(# of refs) work for the writers, at the cost of O(# of
recent updates) work 

Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread brian m. carlson
On Wed, Jun 24, 2015 at 05:49:20AM -0400, Jeff King wrote:
 I don't know how much that helps for the JGit situation. It punts the
 native code out of JGit, but people using JGit still have to have the
 native helper from git on their system.  I have no problems at all with
 pluggable $FANCY_DB that not everybody supports.  But I think we would
 want _some_ baseline that is reasonably performant, and that everybody
 will support. I'm not sure putting the index into a flat file is
 performant enough. Is there any basic key/value store that has both a
 C and a pure-Java version (e.g., berkeley db)?

Berkeley DB has switched to the AGPLv3 for new versions.  Besides being
unpalatable for many people, it's also incompatible with the GPLv2.  I
do otherwise like Berkeley DB: it performs reasonably well and is
available on most systems.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187




Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread David Turner
On Tue, 2015-06-23 at 23:27 +0200, Michael Haggerty wrote:
 On 06/23/2015 09:53 PM, David Turner wrote:
  On Tue, 2015-06-23 at 17:51 +0200, Michael Haggerty wrote:
  [...]
  * I don't like the fact that you have replaced `struct ref_transaction
  *` with `void *` in the public interface. On a practical level, I like
  the bit of type-safety that comes with the more specific declaration.
  But on a more abstract level, I think that the concept of a transaction
  could be useful across backends, for example in utility functions that
  verify that a proposed set of updates are internally consistent. I would
  rather see either
 
* backends extend a basic `struct ref_transaction` to suit their
  needs, and upcast/downcast pointers at the module boundary, or
 
* `struct ref_transaction` itself gets a `void *` member that backends
  can use for whatever purposes they want.
  
  There are no common fields between refs-be-file transactions and
  refs-be-lmdb transactions.  I don't see much gain from adding an empty
  ref_transaction that backends could extend, since we would have to
  explicitly upcast/downcast all over the place.
 
 If you ask me, it would be better to do a bunch of up/downcasts within
 the single module (via two helper functions that could even do
 consistency checks) than have no help from the compiler in preventing
 people from passing unrelated pointer types into the `void *transaction`
 argument. Plus the `struct ref_transaction *` variables scattered
 throughout the code are a lot more self-explanatory than `void *`.

I'll take a look at what that would look like.

  * Regarding MERGE_HEAD: you take the point of view that it must continue
  to be stored as a file. And yet it must also behave somewhat like a
  reference; for example, `git rev-parse MERGE_HEAD` works today.
  MERGE_HEAD is also used for reachability, right?
 
  Another point of view is that MERGE_HEAD is a plain old boring
  reference, but there is some other metadata related to it that the refs
  backend has to store. The file-based backend would have special-case
  code to read the additional data from the tail of the loose refs file
  (and be sure to write the metadata when writing the reference), but
  other backends could store the reference with the rest but do their own
  thing with the metadata. So I guess I'm wondering whether the refs API
  needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
  its metadata.
  
  You are probably right that this is a good idea.
  
  * Don't the same considerations that apply to MERGE_HEAD also apply to
  FETCH_HEAD?
  
  All of the tests pass without any special handling of FETCH_HEAD.
 
 That's odd. From git-fetch.txt:
 
 The names of refs that are fetched, together with the object names
 they point at, are written to `.git/FETCH_HEAD`.  This information
 may be used by scripts or other git commands, such as
 linkgit:git-pull[1].
 
 It seems like the test suite is reading FETCH_HEAD via the refs API in a
 couple of places. I don't understand why these don't fail when LMDB is
 being used...

You are right; I did add some special-case code for FETCH_HEAD.

  * Rehash of the last two points: I expected one backend function that is
  used to initialize the refs backend when a new repository is created
  (e.g., in `git init`). The file-based backend would use this function to
  create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
  a second function that is called once every time git runs in an existing
  repository (this one might, for example, open a database connection).
  And maybe even a third one that closes down the database connection
  before git exits. Would you please explain how this actually works?
  
  LMDB doesn't really have the concept of a connection.  It's basically
  just a couple of files that communicate using shared memory (and maybe
  some other locking that I haven't paid attention to).  There is the
  concept of a transaction, which is the unit of concurrency (each
  thread may only have one open transaction).  Transactions are either
  read-only or read-write, and there can only be one read-write
  transaction open at a time (across the entire system).  Read-only
  transactions take a snapshot of the DB state at transaction start time.
  This combination of features means that we need to be a bit clever about
  read-only transactions; if a read-write transaction occurs in a separate
  process, we need to restart any read-only transactions to pick up its
  changes.
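  
  With the LMDB C API that restart is cheap; mdb_txn_reset() and
  mdb_txn_renew() are the real calls, though the little wrapper below is
  only an illustration, not code from the series:
  
    #include <lmdb.h>
  
    /* drop our read snapshot and re-acquire it to see newer commits */
    static int refresh_read_txn(MDB_txn *txn)
    {
            mdb_txn_reset(txn);
            return mdb_txn_renew(txn);      /* 0 on success */
    }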
 
 If you are thinking about an *unrelated* separate process, then Git's
 philosophy is that if our process is reading *some* valid state of the
 references, it's all good even if that state is not quite the newest.
 After all, who's to say whether our process ran before or after the
 other process? As long as each process sees self-consistent views of the
 world as it existed at some recent time, we're satisfied.

No, I'm thinking 

Re: RFC/Pull Request: Refs db backend

2015-06-24 Thread David Turner
On Wed, 2015-06-24 at 05:14 -0400, Jeff King wrote:
 On Tue, Jun 23, 2015 at 02:18:36PM -0400, David Turner wrote:
 
   Can you describe a bit more about the reflog handling?
   
   One of the problems we've had with large-ref repos is that the reflog
   storage is quite inefficient. You can pack all the refs, but you may
   still be stuck with a bunch of reflog files with one entry, wasting a
   whole inode. Doing a git repack when you have a million of those has
   horrible cold-cache performance. Basically anything that isn't
   one-file-per-reflog would be a welcome change. :)
  
  Reflogs are stored in the database as well.  There is one header entry
  per ref to indicate that a reflog is present, and then one database
  entry per reflog entry; the entries are stored consecutively and
  immediately following the header so that it's fast to iterate over them.
 
 OK, that makes sense. I did notice that the storage for the refdb grows
 rapidly. If I add a million refs (like refs/tags/$i) with a simple
 reflog message of "foo", I end up with a 500MB database file.
 
 That's _probably_ OK, because a million is getting into crazy
 territory[1].  But it's 500 bytes per ref, each with one reflog entry.
 Our ideal lower bound is probably something like 100 bytes per reflog
 entry:
 
   - 20 bytes for old sha1
   - 20 bytes for new sha1
   - ~50 bytes for name, email, timestamp
   - ~6 bytes for refname (100 is the longest unique part)
 
 That assumes we store binary[2] (and not just the raw reflog lines), and
 reconstruct the reflog lines on the fly. It also assumes we use some
 kind of trie-like storage (where we can amortize the cost of storing
 refs/tags/ across all of the entries).
 
 Of course that neglects lmdb's overhead, and the storage of the ref tip
 itself. But it would hopefully give us a ballpark for an optimal
 solution. We don't have to hit that, of course, but it's food for
 thought.
 
 [1] The homebrew/homebrew repository on GitHub has almost half a million
 ref updates. Since this is storing not just refs but all ref
 updates, that's actually the interesting number (and optimizing the
 per-reflog-entry size is more interesting than the per-ref size).
 
 [2] I'm hesitant to suggest binary formats in general, but given that
 this is a blob embedded inside lmdb, I think it's OK. If we were to
 pursue the log-structured idea I suggested earlier, I'm torn on
 whether it should be binary or not.

I could try a binary format.  I was optimizing for simplicity,
debuggability, recoverability, and compatibility with the existing text
format, but I wouldn't have to.  I don't know how much this would save.
Unfortunately, given the way LMDB works, a trie-like storage to save on
refs/tags does not seem possible (of course, we could hard-code some
hacks like \001=refs/tags, \002=refs/heads, etc., but that is a
micro-optimization that might not be worth it).

Also, the reflog header has some overhead (it's an entire extra record
per ref). The header exists to implement reflog creation/existence
checking.  I didn't really try to understand why we have the distinction
between empty and nonexistent reflogs; I just copied it.  If we didn't
have that distinction, we could eliminate that overhead.

  Thanks, that's valuable.  For the refs backend, opening the LMDB
  database for writing is sufficient to block other writers.  Do you think
  it would be valuable to provide a git hold-ref-lock command that simply
  reads refs from stdin and keeps them locked until it reads EOF from
  stdin?  That would allow cross-backend ref locking.
 
 I'm not sure what you would use it for. If you want to update the refs,
 then you can specify a whole transaction with git update-ref --stdin,
 and that should work whatever backend you choose. Is there some other
 operation you want where you hold the lock for a longer period of time?

I'm sure I had a reason for this at the time I wrote it, but now I can't
think of what it was.  Nevermind!



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Michael Haggerty
On 06/23/2015 02:50 AM, David Turner wrote:
 I've revived and modified Ronnie Sahlberg's work on the refs db
 backend.  
 
 The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
 I recognize that there have been changes to the refs code since then,
 and that there are some further changes in-flight from e.g. Michael
 Haggerty.  If there is interest in this, I can rebase once Michael's
 changes land.

It's awesome that you are working on this!

I'm reading through your commits and will add comments as they pop into
my head...

* I initially read "refs-be-files" to be a short version of "references,
they be files". I might never be able to get that pronunciation out of
my head :-)

* It would be more modest to call the files implementing the LMDB
backend refs-be-lmdb.[c,h] rather than refs-be-db.[c,h].

* I wonder whether `refname_is_safe()` might eventually have to become
backend-specific. For example, maybe one backend will have to impose a
limit of 128 characters or something? No matter, though...it can be
moved later.

* You have put `format_reflog_msg()` in the public interface. It
probably makes sense, because more than one backend might want to use
it. But another backend might want to store (refname, old_sha1,
new_sha1, ...) as separate columns in a database. As long as
`format_reflog_msg()` is seen as a helper function and is not called by
any of the shared code, it shouldn't be a problem.

* I wonder whether `init_backend()` will be general enough. I'm thinking
by analogy with object constructors, which usually need class-specific
arguments during their initialization even though afterwards objects of
different classes can be used interchangeably. So I guess the idea is
that a typical `init_backend()` function will have to dig around itself
to find whatever additional information that it needs (e.g., from the
git configuration or the filesystem or whatever). So I think this is OK.

* Your methods for bulk updates are I think analogous to the
`initial_ref_transaction_commit()` function that I recently submitted
[1]. Either way, the goal is to abstract away the fact that the
file-based backend uses packed and loose references with tradeoffs that
callers currently have to know about.

* I don't like the fact that you have replaced `struct ref_transaction
*` with `void *` in the public interface. On a practical level, I like
the bit of type-safety that comes with the more specific declaration.
But on a more abstract level, I think that the concept of a transaction
could be useful across backends, for example in utility functions that
verify that a proposed set of updates are internally consistent. I would
rather see either

  * backends extend a basic `struct ref_transaction` to suit their
needs, and upcast/downcast pointers at the module boundary, or

  * `struct ref_transaction` itself gets a `void *` member that backends
can use for whatever purposes they want.
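
For the first option, I am picturing something roughly like this
(a sketch only; the names are invented):

  struct ref_transaction {
          /* common fields could live here, even if empty for now */
  };

  struct lmdb_transaction {
          struct ref_transaction base;    /* must stay the first member */
          /* backend-specific state, e.g. the MDB_txn pointer */
  };

  /* the downcast happens in one place, where checks could be added */
  static inline struct lmdb_transaction *to_lmdb(struct ref_transaction *t)
  {
          return (struct lmdb_transaction *)t;
  }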

* Regarding MERGE_HEAD: you take the point of view that it must continue
to be stored as a file. And yet it must also behave somewhat like a
reference; for example, `git rev-parse MERGE_HEAD` works today.
MERGE_HEAD is also used for reachability, right?

Another point of view is that MERGE_HEAD is a plain old boring
reference, but there is some other metadata related to it that the refs
backend has to store. The file-based backend would have special-case
code to read the additional data from the tail of the loose refs file
(and be sure to write the metadata when writing the reference), but
other backends could store the reference with the rest but do their own
thing with the metadata. So I guess I'm wondering whether the refs API
needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
its metadata.

* Don't the same considerations that apply to MERGE_HEAD also apply to
FETCH_HEAD?

* I'm showing my ignorance of LMDB, but I don't see where the LMDB
backend initializes its database during `git init-db`. Is that what
`init_env()` does? But it looks like `init_env()` is called on every git
invocation (via `git_config_early()`). Puzzled.

* Meanwhile, `create_default_files()` (in `builtin/init-db`) still
creates directories `refs`, `refs/heads`, and `refs/tags`.

* Rehash of the last two points: I expected one backend function that is
used to initialize the refs backend when a new repository is created
(e.g., in `git init`). The file-based backend would use this function to
create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
a second function that is called once every time git runs in an existing
repository (this one might, for example, open a database connection).
And maybe even a third one that closes down the database connection
before git exits. Would you please explain how this actually works?
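
In other words, something along these lines (just a sketch of what I
had in mind, not a demand for these exact signatures):

  struct ref_be {
          const char *name;
          /* run once when a new repository is created (git init) */
          int (*initdb)(const char *gitdir);
          /* run every time git starts up in an existing repository */
          int (*setup)(const char *gitdir);
          /* optional: close database connections before git exits */
          void (*teardown)(void);
  };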

* `lmdb_init_backend()` leaks `path` if `env` is already set (in which
case it needn't compute `path` in the first place).

* You have the constraint that submodules need to use the same reference

Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Duy Nguyen
On Tue, Jun 23, 2015 at 6:47 PM, Jeff King p...@peff.net wrote:
 I was thinking of actually moving to a log-structured ref storage.
 Something like:

   - any ref write puts a line at the end of a single logfile that
 contains the ref name, along with the normal reflog data

   - the logfile is the source of truth for the ref state. If you want to
 know the value of any ref, you can read it backwards to find the
 last entry for the ref. Everything else is an optimization.

 Let's call the number of refs N, and the number of ref updates in
 the log U.

   - we keep a key/value index mapping the name of any branch that exists
 to the byte offset of its entry in the logfile. This would probably

One key/value mapping per branch, pointing to the latest reflog entry,
or one key/value mapping for each reflog entry?

 be in some binary key/value store (like LMDB). Without this,
 resolving a ref is O(U), which is horrible. With it, it should be
 O(1) or O(lg N), depending on the index data structure.

I'm thinking of the user with small or medium repos, in terms of refs,
who does not want an extra dependency. If we store one mapping per
branch, then the size of this mapping is small enough that the index
in a text file is ok. If we also store the offset to the previous
reflog entry of the same branch in the current reflog entry, like a
back pointer, then we could jump back faster.

Or do you have something else in mind? Current reflog structure won't
work because I think you bring back the reflog graveyard with this,
and I don't want to lose that

   - the index can also contain other optimizations. E.g., rather than
 point to the entry in the logfile, it can include the sha1 directly
 (to avoid an extra level of indirection). It may want to include the
 peeled value, as the current packed-refs file does.

   - Reading all of the reflogs (e.g., for repacking) is O(U), just like
 it is today. Except the storage for the logfile is a lot more
 compact than what we store today, with one reflog per file.

   - Reading a single reflog is _also_ O(U), which is not as good as
 today. But if each log entry contains a byte offset of the previous
 entry, you can follow the chain (it is still slightly worse, because
 you are jumping all over the file, rather than reading a compact set
 of lines).

   - Pruning the reflog entries from the logfile requires rewriting the
 whole thing. That's similar to today, where we rewrite each of the
 reflog files.
-- 
Duy


Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
On Mon, 2015-06-22 at 22:36 -0700, Junio C Hamano wrote:
 David Turner dtur...@twopensource.com writes:
 
  I've revived and modified Ronnie Sahlberg's work on the refs db
  backend.  
 
  The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
  I recognize that there have been changes to the refs code since then,
  and that there are some further changes in-flight from e.g. Michael
  Haggerty.  If there is interest in this, I can rebase once Michael's
  changes land.
  ...
  The db backend runs git for-each-ref about 30% faster than the files
  backend with fully-packed refs on a repo with ~120k refs.  It's also
  about 4x faster than using fully-unpacked refs.  In addition, and
  perhaps more importantly, it avoids case-conflict issues on OS X.
 
  I chose to use LMDB for the database...
  ...
  Ronnie Sahlberg's original version of this patchset used tdb.  The
  advantage of tdb is that it's smaller (~125k).  The disadvantages are
  that tdb is hard to build on OS X.  It's also not in homebrew.  So lmdb
  seemed simpler.
 
 "If there is interest"?  Shut up and take my money ;-)
 
 More seriously, that's great that you stepped up to resurrect this
 topic.  In a sense, the choice of sample database backend does not
 matter.  I do not care if it is tdb, lmdb, or even Berkeley DB as
 long as it functions. ;-)
 
 As long as the interface between ref-transaction system on the Git
 side and the database backend is designed right, your lmdb thing can
 serve as a reference implementation for other people to plug other
 database backends to the same interface, right? 

Yes.

  As one step to
 validate the interface to the database backends, it would be nice to
 eventually have at least two backends that talk to meaningfully
 different systems, but we have to start somewhere, and for now lmdb
 is as good a place to start as any other db backend.
 
 I wonder if we can do a filesystem backend on top of the same
 backend interface---is that too much impedance mismatch to make it
 impractical?

The patch series does include a filesystem backend, which is simply the
current ref infrastructure with extremely minor changes.  



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Stefan Beller
[+ronniesahlb...@gmail.com, FYI]

On Mon, Jun 22, 2015 at 5:50 PM, David Turner dtur...@twopensource.com wrote:
 I've revived and modified Ronnie Sahlberg's work on the refs db
 backend.

Awesome!


 The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
 I recognize that there have been changes to the refs code since then,
 and that there are some further changes in-flight from e.g. Michael
 Haggerty.  If there is interest in this, I can rebase once Michael's
 changes land.

Originally I wanted to continue on Ronnie's work, but because of the churn
in refs I stopped it for a while and took care of other projects (and wanted
to come back eventually). Thanks for reviving this topic!

 The changes can be found here:
 https://github.com/dturner-tw/git.git on the dturner/pluggable-backends
 branch

 The db backend code was added in the penultimate commit; the rest is
 just code rearrangement and minor changes to make alternate backends
 possible.  There ended up being a fair amount of this rearrangement, but
 the end result is that almost the entire git test suite runs under the
 db backend without error (see below for details).

Looking at the end result in refs-be-db.c, it feels like there are more
functions in the refs_be_db struct; did this originate from other design
choices? IIRC Ronnie wanted to have as few functions in there as
possible, and share as much of the code between the databases, such
that the glue between the db and the refs code is minimal.

Some random comments from looking over the branch briefly:

In the latest commit, (refs: tests for db backend), I am unsure about the
copyright annotations. At least a sole Copyright (c) 2007 Junio C Hamano
doesn't make sense to me. ;)

Typo in commit message bisect: use refs insfrastructure for BISECT_START

Some commits contain a ChangeId, which is a Gerrit leftover. :(

Thanks,
Stefan


 The db backend runs git for-each-ref about 30% faster than the files
 backend with fully-packed refs on a repo with ~120k refs.  It's also
 about 4x faster than using fully-unpacked refs.  In addition, and
 perhaps more importantly, it avoids case-conflict issues on OS X.

 I chose to use LMDB for the database.  LMDB has a few features that make
 it suitable for usage in git:

 1. It is relatively lightweight; it requires only one header file, and
 the library itself is under 300k (as opposed to 700k for
 e.g. sqlite).

 2. It is well-tested: it's been used in OpenLDAP for years.

 3. It's very fast.  LMDB's benchmarks show that it is among
 the fastest key-value stores.

 4. It has a relatively simple concurrency story; readers don't
 block writers and writers don't block readers.

 Ronnie Sahlberg's original version of this patchset used tdb.  The
 advantage of tdb is that it's smaller (~125k).  The disadvantages are
 that tdb is hard to build on OS X.  It's also not in homebrew.  So lmdb
 seemed simpler.

 To test this backend's correctness, I hacked test-lib.sh and
 test-lib-functions.sh to run all tests under the refs backend. Dozens
 of tests use manual ref/reflog reading/writing, or create submodules
 without passing --refs-backend-type to git init.  If those tests are
 changed to use the update-ref machinery or test-refs-be-db (or, in the
 case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
 the only remaining failing tests are the git-new-workdir tests and the
 gitweb tests.

 Please let me know how it would be best to proceed.



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
On Tue, 2015-06-23 at 07:47 -0400, Jeff King wrote:
 On Mon, Jun 22, 2015 at 08:50:56PM -0400, David Turner wrote:
 
  The db backend runs git for-each-ref about 30% faster than the files
  backend with fully-packed refs on a repo with ~120k refs.  It's also
  about 4x faster than using fully-unpacked refs.  In addition, and
  perhaps more importantly, it avoids case-conflict issues on OS X.
 
 Neat.
 
 Can you describe a bit more about the reflog handling?
 
 One of the problems we've had with large-ref repos is that the reflog
 storage is quite inefficient. You can pack all the refs, but you may
 still be stuck with a bunch of reflog files with one entry, wasting a
 whole inode. Doing a git repack when you have a million of those has
 horrible cold-cache performance. Basically anything that isn't
 one-file-per-reflog would be a welcome change. :)

Reflogs are stored in the database as well.  There is one header entry
per ref to indicate that a reflog is present, and then one database
entry per reflog entry; the entries are stored consecutively and
immediately following the header so that it's fast to iterate over them.
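
Roughly, the key layout looks like this (an illustrative sketch, not
the exact on-disk encoding):

  refs/heads/master            -> current sha1 of the ref
  logs/refs/heads/master       -> header: a reflog exists for this ref
  logs/refs/heads/master+ts1   -> first reflog entry
  logs/refs/heads/master+ts2   -> second reflog entry, ...

so the entries sort directly after their header and a cursor can walk
them in order.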

 It has also been a dream of mine to stop tying the reflogs specifically
 to the refs. I.e., have a spot for reflogs of branches that no longer
 exist, which allows us to retain them for deleted branches. Then you can
 possibly recover from a branch deletion, whereas now you have to dig
 through git fsck's dangling output. And the reflog, if you don't
 expire it, becomes a suitable audit log to find out what happened to
 each branch when (whereas now it is full of holes when things get
 deleted).

That would be cool, and I don't think it would be hard to add to my
current code; we could simply replace the header with a tombstone.
But I would prefer to wait until the series is merged; then we can build
on top of it.

 I dunno. Maybe I am overthinking it. But it really feels like the _refs_
 are a key/value thing, but the _reflogs_ are not. You can cram them into
 a key/value store, but you're probably operating on them as a big blob,
 then.

Reflogs are, conceptually, queues. I agree that a raw key-value store is
not a good way to store queues, but a B-Tree is not so terrible, since
it offers relatively fast iteration (amortized constant time IIRC).

  I chose to use LMDB for the database.  LMDB has a few features that make
  it suitable for usage in git:
 
 One of the complaints that Shawn had about sqlite is that there is no
 native Java implementation, which makes it hard for JGit to ship a
 compatible backend. I suspect the same is true for LMDB, but it is
 probably a lot simpler than sqlite (so reimplementation might be
 possible).
 
 But it may also be worth going with a slightly slower database if we can
 get wider compatibility for free.

There's a JNI interface to LMDB, which is, of course, not native.  I
don't think it would be too hard to entirely rewrite LMDB in Java, but
I'm not going to have time to do it for the foreseeable future.  I've
asked Howard Chu if he knows of any efforts in progress.

  To test this backend's correctness, I hacked test-lib.sh and
  test-lib-functions.sh to run all tests under the refs backend. Dozens
  of tests use manual ref/reflog reading/writing, or create submodules
  without passing --refs-backend-type to git init.  If those tests are
  changed to use the update-ref machinery or test-refs-be-db (or, in the
  case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
  the only remaining failing tests are the git-new-workdir tests and the
  gitweb tests.
 
 I think we'll need to bump core.repositoryformatversion, too. See the
 patches I just posted here:
 
   http://thread.gmane.org/gmane.comp.version-control.git/272447

Thanks, that's valuable.  For the refs backend, opening the LMDB
database for writing is sufficient to block other writers.  Do you think
it would be valuable to provide a git hold-ref-lock command that simply
reads refs from stdin and keeps them locked until it reads EOF from
stdin?  That would allow cross-backend ref locking. 



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
On Tue, 2015-06-23 at 17:23 +0700, Duy Nguyen wrote:
 On Tue, Jun 23, 2015 at 7:50 AM, David Turner
dtur...@twopensource.com wrote:
  To test this backend's correctness, I hacked test-lib.sh and
  test-lib-functions.sh to run all tests under the refs backend.
 
 Now we have two. split-index also benefits from running through full
 test suite like this. I propose we make make test run the test suite
 twice. The first run is with default configuration, no split index, no
 fancy ref backend. The second run enables split-index and switches to
 new backend, running through all test cases. In future we can also
 enable packv4 in this second run. There won't be a third run.
 
 When the second ref backend comes, we can switch between the two
 backends using a random number generator where we control both
 algorithm and seed, so that when a test fails, the user can give us
 their seed and we can re-run with the same configuration.

I'm not in love with this idea, because it makes it hard to do
exhaustive testing efficiently.  I would rather have make test run
through all tests under all combinations -- or at least all relevant
tests.  We could perhaps mark tests with a list of features that they
exercise, so that we don't have to run e.g. t8xxx with alternate refs
backends.  

  Dozens of tests use manual ref/reflog reading/writing, or create submodules
  without passing --refs-backend-type to git init.  If those tests are
  changed to use the update-ref machinery or test-refs-be-db (or, in the
  case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
  the only remaining failing tests are the git-new-workdir tests and the
  gitweb tests.
 
 I haven't read the series, but I guess you should also add a few tests
 to run on the first run, so new code is exercised a bit even if people
 skip the second run.

I did this already, yes.



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
On Tue, 2015-06-23 at 17:51 +0200, Michael Haggerty wrote:
 On 06/23/2015 02:50 AM, David Turner wrote:
  I've revived and modified Ronnie Sahlberg's work on the refs db
  backend.  
  
  The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
  I recognize that there have been changes to the refs code since then,
  and that there are some further changes in-flight from e.g. Michael
  Haggerty.  If there is interest in this, I can rebase once Michael's
  changes land.
 
 It's awesome that you are working on this!
 
 I'm reading through your commits and will add comments as they pop into
 my head...
 
  * I initially read "refs-be-files" to be a short version of "references,
  they be files". I might never be able to get that pronunciation out of
  my head :-)

That's OK so long as I can keep pronouncing "reflog" as "re-flog". ;)

 * It would be more modest to call the files implementing the LMDB
 backend refs-be-lmdb.[c,h] rather than refs-be-db.[c,h].

Agreed.  Will fix.

 * I wonder whether `refname_is_safe()` might eventually have to become
 backend-specific. For example, maybe one backend will have to impose a
 limit of 128 characters or something? No matter, though...it can be
 moved later.

I think it would be an error to allow backends to impose additional
limits on ref names.  The limits imposed by the current backend have
been the cause of much sadness here at Twitter (primarily,
case-conflicts combined with d/f conflicts).

 * You have put `format_reflog_msg()` in the public interface. It
 probably makes sense, because more than one backend might want to use
 it. But another backend might want to store (refname, old_sha1,
 new_sha1, ...) as separate columns in a database. As long as
 `format_reflog_msg()` is seen as a helper function and is not called by
 any of the shared code, it shouldn't be a problem.

Agreed.

 * I wonder whether `init_backend()` will be general enough. 

We can always upgrade it later.

 * Your methods for bulk updates are I think analogous to the
 `initial_ref_transaction_commit()` function that I recently submitted
 [1]. Either way, the goal is to abstract away the fact that the
 file-based backend uses packed and loose references with tradeoffs that
 callers currently have to know about.

Yes, I saw your work after I had already started mine.

 * I don't like the fact that you have replaced `struct ref_transaction
 *` with `void *` in the public interface. On a practical level, I like
 the bit of type-safety that comes with the more specific declaration.
 But on a more abstract level, I think that the concept of a transaction
 could be useful across backends, for example in utility functions that
 verify that a proposed set of updates are internally consistent. I would
 rather see either
 
   * backends extend a basic `struct ref_transaction` to suit their
 needs, and upcast/downcast pointers at the module boundary, or
 
   * `struct ref_transaction` itself gets a `void *` member that backends
 can use for whatever purposes they want.

There are no common fields between refs-be-file transactions and
refs-be-lmdb transactions.  I don't see much gain from adding an empty
ref_transaction that backends could extend, since we would have to
explicitly upcast/downcast all over the place.

 * Regarding MERGE_HEAD: you take the point of view that it must continue
 to be stored as a file. And yet it must also behave somewhat like a
 reference; for example, `git rev-parse MERGE_HEAD` works today.
 MERGE_HEAD is also used for reachability, right?
 
 Another point of view is that MERGE_HEAD is a plain old boring
 reference, but there is some other metadata related to it that the refs
 backend has to store. The file-based backend would have special-case
 code to read the additional data from the tail of the loose refs file
 (and be sure to write the metadata when writing the reference), but
 other backends could store the reference with the rest but do their own
 thing with the metadata. So I guess I'm wondering whether the refs API
 needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
 its metadata.

You are probably right that this is a good idea.

 * Don't the same considerations that apply to MERGE_HEAD also apply to
 FETCH_HEAD?

All of the tests pass without any special handling of FETCH_HEAD.

 * I'm showing my ignorance of LMDB, but I don't see where the LMDB
 backend initializes its database during `git init-db`. Is that what
 `init_env()` does? But it looks like `init_env()` is called on every git
 invocation (via `git_config_early()`). Puzzled.

There is no need to explicitly create the database (other than with
mkdir); init_env does everything for you.
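
Under the hood that is just the stock LMDB environment setup; a sketch
of the sort of thing init_env presumably does (illustrative only, not
the actual code from the series):

  #include <lmdb.h>

  static MDB_env *open_refs_env(const char *path)
  {
          MDB_env *env;

          if (mdb_env_create(&env))
                  return NULL;
          mdb_env_set_mapsize(env, (size_t)1 << 30);      /* example size */
          if (mdb_env_open(env, path, 0, 0644)) {         /* creates the data files */
                  mdb_env_close(env);
                  return NULL;
          }
          return env;
  }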

 * Meanwhile, `create_default_files()` (in `builtin/init-db`) still
 creates directories `refs`, `refs/heads`, and `refs/tags`.

Yeah, that's legit.  I'll create a backend initdb function, and rename
init to setup.

 * Rehash of the last two points: I expected one backend function that is
 used to 

RE: RFC/Pull Request: Refs db backend

2015-06-23 Thread Randall S. Becker
 -Original Message-
 From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
 Behalf Of David Turner
 Sent: June 23, 2015 4:22 PM
 To: Randall S. Becker
 Cc: 'Stefan Beller'; 'git mailing list'; 'ronnie sahlberg'
 Subject: Re: RFC/Pull Request: Refs db backend
 
  Just to beg a request: LMDB is not available on some MPP architectures to
 which git has been ported. If it comes up, I beg you not to add this as a
 dependency to base git components.
 
 My changes make `configure` check for the presence of liblmdb. The LMDB
 code is only built if liblmdb is present.  So, I think we're good.

Thanks :) You have no idea how much, from a "burnt by that in other projects" POV.

Cheers,
Randall



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 On Tue, 2015-06-23 at 15:53 -0400, David Turner wrote:
  * Regarding MERGE_HEAD: you take the point of view that it must continue
  to be stored as a file. And yet it must also behave somewhat like a
  reference; for example, `git rev-parse MERGE_HEAD` works today.
  MERGE_HEAD is also used for reachability, right?
  
  Another point of view is that MERGE_HEAD is a plain old boring
  reference, but there is some other metadata related to it that the refs
  backend has to store. The file-based backend would have special-case
  code to read the additional data from the tail of the loose refs file
  (and be sure to write the metadata when writing the reference), but
  other backends could store the reference with the rest but do their own
  thing with the metadata. So I guess I'm wondering whether the refs API
  needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
  its metadata.
 
 You are probably right that this is a good idea.

 On reflection, I think it might make sense to keep MERGE_HEAD as a file.
 The problem is that not only would refs backends have to add new
 MERGE_HEAD-handling functions, but we would also need new plumbing
 commands to allow scripts to access the complete contents of MERGE_HEAD.
 That seems more complicated to me.  

I think you are talking about FETCH_HEAD, but I tend to agree.


Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
On Tue, 2015-06-23 at 10:16 -0700, Stefan Beller wrote:
  The db backend code was added in the penultimate commit; the rest is
  just code rearrangement and minor changes to make alternate backends
  possible.  There ended up being a fair amount of this rearrangement,
  but the end result is that almost the entire git test suite runs
  under the db backend without error (see below for details).
 
 Looking at the end result in refs-be-db.c it feels like there are more
 functions in the refs_be_db struct, did this originate from other 
  design choices? IIRC Ronnie wanted to have as few functions in 
 there as possible, and share as much of the code between the 
 databases, such that the glue between the db and the refs code is 
 minimal.

I didn't actually spend that much time reading Ronnie's backend code.
My code aims to be thoroughly compatible.  I spent a ton of
time making sure that the git test suite passed.  I don't know if an
alternate approach would have been as compatible.

The requirement for reflog storage did complicate things a bit.

I also didn't see a strong need to abstract the database, since LMDB is
common, widely compatible, and tiny.  

 Some random comments from looking over the branch briefly:
 
 In the latest commit, (refs: tests for db backend), I am unsure about 
 the copyright annotations. At least a sole Copyright (c) 2007 Junio C
 Hamano doesn't make sense to me. ;)

Will fix, thanks.

 Typo in commit message bisect: use refs insfrastructure for 
 BISECT_START

Will fix, thanks.

 Some commits contain a ChangeId, which is a Gerrit leftover. :(

Those were left over from Ronnie's patches; since you are a Googler and
you think we don't need them, I'll remove them. 




RE: RFC/Pull Request: Refs db backend

2015-06-23 Thread Randall S. Becker
 -Original Message-
 From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
 Behalf Of David Turner
 Sent: June 23, 2015 4:05 PM
 To: Stefan Beller
 Cc: git mailing list; ronnie sahlberg
 Subject: Re: RFC/Pull Request: Refs db backend
 
 On Tue, 2015-06-23 at 10:16 -0700, Stefan Beller wrote:
   The db backend code was added in the penultimate commit; the rest is
   just code rearrangement and minor changes to make alternate backends
   possible.  There ended up being a fair amount of this rearrangement,
   but the end result is that almost the entire git test suite runs
   under the db backend without error (see below for
  details).
 
  Looking at the end result in refs-be-db.c, it feels like there are more
  functions in the refs_be_db struct; did this originate from other
  design choices? IIRC Ronnie wanted to have as few functions in there
  as possible, and share as much of the code between the databases, such
  that the glue between the db and the refs code is minimal.
 
 I didn't actually spend that much time reading Ronnie's backend code.
  My code aims to be thoroughly compatible.  I spent a ton of time
 making sure that the git test suite passed.  I don't know if an alternate
 approach would have been as compatible.
 
 The requirement for reflog storage did complicate things a bit.
 
 I also didn't see a strong need to abstract the database, since LMDB is 
 common,
 widely compatible, and tiny.

Just to beg a request: LMDB is not available on some MPP architectures to which 
git has been ported. If it comes up, I beg you not to add this as a dependency 
to base git components.

Thanks,
Randall

-- Brief whoami: NonStopUNIX developer since approximately 
UNIX(421664400)/NonStop(2112884442)
-- In my real life, I talk too much.





Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Michael Haggerty
On 06/23/2015 09:53 PM, David Turner wrote:
 On Tue, 2015-06-23 at 17:51 +0200, Michael Haggerty wrote:
 [...]
 * I don't like the fact that you have replaced `struct ref_transaction
 *` with `void *` in the public interface. On a practical level, I like
 the bit of type-safety that comes with the more specific declaration.
 But on a more abstract level, I think that the concept of a transaction
 could be useful across backends, for example in utility functions that
 verify that a proposed set of updates are internally consistent. I would
 rather see either

   * backends extend a basic `struct ref_transaction` to suit their
 needs, and upcast/downcast pointers at the module boundary, or

   * `struct ref_transaction` itself gets a `void *` member that backends
 can use for whatever purposes they want.
 
 There are no common fields between refs-be-file transactions and
 refs-be-lmdb transactions.  I don't see much gain from adding an empty
 ref_transaction that backends could extend, since we would have to
 explicitly upcast/downcast all over the place.

If you ask me, it would be better to do a bunch of up/downcasts within
the single module (via two helper functions that could even do
consistency checks) than have no help from the compiler in preventing
people from passing unrelated pointer types into the `void *transaction`
argument. Plus the `struct ref_transaction *` variables scattered
throughout the code are a lot more self-explanatory than `void *`.
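
Concretely, I mean something like the following (a sketch with
hypothetical names; the point is just that the casts live in two
small helpers inside the lmdb module):

  #include <lmdb.h>       /* for MDB_txn */

  /* Shared base type that every backend embeds. */
  struct ref_transaction {
          unsigned int state;     /* e.g. open/closed flags */
  };

  /* lmdb-specific transaction, downcast only inside refs-be-lmdb.c. */
  struct lmdb_transaction {
          struct ref_transaction base;    /* must be the first member */
          MDB_txn *txn;
  };

  static struct lmdb_transaction *lmdb_transaction(struct ref_transaction *t)
  {
          /* a consistency check on t could go here */
          return (struct lmdb_transaction *)t;
  }

  static struct ref_transaction *base_transaction(struct lmdb_transaction *t)
  {
          return &t->base;
  }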

 * Regarding MERGE_HEAD: you take the point of view that it must continue
 to be stored as a file. And yet it must also behave somewhat like a
 reference; for example, `git rev-parse MERGE_HEAD` works today.
 MERGE_HEAD is also used for reachability, right?

 Another point of view is that MERGE_HEAD is a plain old boring
 reference, but there is some other metadata related to it that the refs
 backend has to store. The file-based backend would have special-case
 code to read the additional data from the tail of the loose refs file
 (and be sure to write the metadata when writing the reference), but
 other backends could store the reference with the rest but do their own
 thing with the metadata. So I guess I'm wondering whether the refs API
 needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
 its metadata.
 
 You are probably right that this is a good idea.
 
 * Don't the same considerations that apply to MERGE_HEAD also apply to
 FETCH_HEAD?
 
 All of the tests pass without any special handling of FETCH_HEAD.

That's odd. From git-fetch.txt:

The names of refs that are fetched, together with the object names
they point at, are written to `.git/FETCH_HEAD`.  This information
may be used by scripts or other git commands, such as
linkgit:git-pull[1].

It seems like the test suite is reading FETCH_HEAD via the refs API in a
couple of places. I don't understand why these don't fail when LMDB is
being used...

 * I'm showing my ignorance of LMDB, but I don't see where the LMDB
 backend initializes its database during `git init-db`. Is that what
 `init_env()` does? But it looks like `init_env()` is called on every git
 invocation (via `git_config_early()`). Puzzled.
 
 There is no need to explicitly create the database (other than with
 mkdir); init_env does everything for you.

OK.
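
(For anyone unfamiliar with LMDB, that step really is only a handful
of calls; roughly the following, where the wrapper and the map size
are assumptions, but the mdb_* functions are the real LMDB C API:)

  #include <lmdb.h>

  static MDB_env *env;

  /* Hypothetical helper: open the refs database directory lazily. */
  static int refs_lmdb_init_env(const char *path)
  {
          int ret;

          if (env)
                  return 0;       /* already open in this process */

          if ((ret = mdb_env_create(&env)))
                  return ret;
          /* LMDB wants a fixed map size up front; value is illustrative. */
          mdb_env_set_mapsize(env, (size_t)1 << 30);
          /* "path" must already exist -- the mkdir mentioned above. */
          ret = mdb_env_open(env, path, 0, 0644);
          if (ret) {
                  mdb_env_close(env);
                  env = NULL;
          }
          return ret;
  }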

 * Rehash of the last two points: I expected one backend function that is
 used to initialize the refs backend when a new repository is created
 (e.g., in `git init`). The file-based backend would use this function to
 create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
 a second function that is called once every time git runs in an existing
 repository (this one might, for example, open a database connection).
 And maybe even a third one that closes down the database connection
 before git exits. Would you please explain how this actually works?
 
 LMDB doesn't really have the concept of a connection.  It's basically
 just a couple of files that communicate using shared memory (and maybe
 some other locking that I haven't paid attention to).  There is the
 concept of a transaction, which is the unit of concurrency (each
 thread may only have one open transaction).  Transactions are either
 read-only or read-write, and there can only be one read-write
 transaction open at a time (across the entire system).  Read-only
 transactions take a snapshot of the DB state at transaction start time.
 This combination of features means that we need to be a bit clever about
 read-only transactions; if a read-write transaction occurs in a separate
 process, we need to restart any read-only transactions to pick up its
 changes.

If you are thinking about an *unrelated* separate process, then Git's
philosophy is that if our process is reading *some* valid state of the
references, it's all good even if that state is not quite the newest.
After all, who's to say whether our process ran before or after the
other process? As long as each 
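
(Footnote on the mechanics: the restart David describes maps directly
onto LMDB's reset/renew calls. A sketch -- the helper is hypothetical,
the mdb_* calls are real:)

  #include <lmdb.h>

  /*
   * Refresh a long-lived read-only transaction so it sees updates
   * committed by other processes since it was begun.
   */
  static int refresh_read_txn(MDB_env *env, MDB_txn **txn)
  {
          if (*txn) {
                  mdb_txn_reset(*txn);            /* drop the old snapshot */
                  if (!mdb_txn_renew(*txn))       /* grab a fresh one */
                          return 0;
                  mdb_txn_abort(*txn);
                  *txn = NULL;
          }
          return mdb_txn_begin(env, NULL, MDB_RDONLY, txn);
  }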

Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
 Just to beg a request: LMDB is not available on some MPP architectures to 
 which git has been ported. If it comes up, I beg you not to add this as a 
 dependency to base git components.

My changes make `configure` check for the presence of liblmdb. The LMDB
code is only built if liblmdb is present.  So, I think we're good.
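
(In other words, the usual optional-dependency pattern; roughly like
this, where the USE_LMDB macro and the backend names are assumptions
on my part:)

  /* refs.c (sketch): only register the lmdb backend when configure
   * found liblmdb and defined USE_LMDB. */
  #include <string.h>

  struct ref_be;
  extern struct ref_be refs_be_files;
  #ifdef USE_LMDB
  extern struct ref_be refs_be_lmdb;
  #endif

  static struct ref_be *find_ref_backend(const char *name)
  {
          if (!strcmp(name, "files"))
                  return &refs_be_files;
  #ifdef USE_LMDB
          if (!strcmp(name, "lmdb"))
                  return &refs_be_lmdb;
  #endif
          return NULL;
  }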



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread David Turner
On Tue, 2015-06-23 at 15:53 -0400, David Turner wrote:
  * Regarding MERGE_HEAD: you take the point of view that it must continue
  to be stored as a file. And yet it must also behave somewhat like a
  reference; for example, `git rev-parse MERGE_HEAD` works today.
  MERGE_HEAD is also used for reachability, right?
  
  Another point of view is that MERGE_HEAD is a plain old boring
  reference, but there is some other metadata related to it that the refs
  backend has to store. The file-based backend would have special-case
  code to read the additional data from the tail of the loose refs file
  (and be sure to write the metadata when writing the reference), but
  other backends could store the reference with the rest but do their own
  thing with the metadata. So I guess I'm wondering whether the refs API
  needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
  its metadata.
 
 You are probably right that this is a good idea.

On reflection, I think it might make sense to keep MERGE_HEAD as a file.
The problem is that not only would refs backends have to add new
MERGE_HEAD-handling functions, but we would also need new plumbing
commands to allow scripts to access the complete contents of MERGE_HEAD.
That seems more complicated to me.  



Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Jeff King
On Mon, Jun 22, 2015 at 08:50:56PM -0400, David Turner wrote:

 The db backend runs git for-each-ref about 30% faster than the files
 backend with fully-packed refs on a repo with ~120k refs.  It's also
 about 4x faster than using fully-unpacked refs.  In addition, and
 perhaps more importantly, it avoids case-conflict issues on OS X.

Neat.

Can you describe a bit more about the reflog handling?

One of the problems we've had with large-ref repos is that the reflog
storage is quite inefficient. You can pack all the refs, but you may
still be stuck with a bunch of reflog files with one entry, wasting a
whole inode. Doing a git repack when you have a million of those has
horrible cold-cache performance. Basically anything that isn't
one-file-per-reflog would be a welcome change. :)

It has also been a dream of mine to stop tying the reflogs specifically
to the refs. I.e., have a spot for reflogs of branches that no longer
exist, which allows us to retain them for deleted branches. Then you can
possibly recover from a branch deletion, whereas now you have to dig
through git fsck's dangling output. And the reflog, if you don't
expire it, becomes a suitable audit log to find out what happened to
each branch when (whereas now it is full of holes when things get
deleted).

Does your solution handle something like that?

I was thinking of actually moving to a log-structured ref storage.
Something like:

  - any ref write puts a line at the end of a single logfile that
contains the ref name, along with the normal reflog data

  - the logfile is the source of truth for the ref state. If you want to
know the value of any ref, you can read it backwards to find the
last entry for the ref. Everything else is an optimization.

Let's call the number of refs N, and the number of ref updates in
the log U.

  - we keep a key/value index mapping the name of any branch that exists
to the byte offset of its entry in the logfile. This would probably
be in some binary key/value store (like LMDB). Without this,
resolving a ref is O(U), which is horrible. With it, it should be
O(1) or O(lg N), depending on the index data structure.

  - the index can also contain other optimizations. E.g., rather than
point to the entry in the logfile, it can include the sha1 directly
(to avoid an extra level of indirection). It may want to include the
peeled value, as the current packed-refs file does.

  - Reading all of the reflogs (e.g., for repacking) is O(U), just like
it is today. Except the storage for the logfile is a lot more
compact than what we store today, with one reflog per file.

  - Reading a single reflog is _also_ O(U), which is not as good as
today. But if each log entry contains a byte offset of the previous
entry, you can follow the chain (it is still slightly worse, because
you are jumping all over the file, rather than reading a compact set
of lines).

  - Pruning the reflog entries from the logfile requires rewriting the
whole thing. That's similar to today, where we rewrite each of the
reflog files.

One of the nice properties of this system is that it should be very
resilient to corruption and races. Most of the operations are either
appending to a file, or writing to a tempfile and renaming in place.
The exception is the key/value index, but if we run into any problems
there, it can be rebuilt by walking over the logfile (for a cost of
O(U)).

I dunno. Maybe I am overthinking it. But it really feels like the _refs_
are a key/value thing, but the _reflogs_ are not. You can cram them into
a key/value store, but you're probably operating on them as a big blob,
then.
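
(To make the shape of that concrete, here is one possible layout for
a logfile entry and its index record. Purely illustrative -- nothing
below is from an actual implementation:)

  #include <stdint.h>

  /* One appended record per ref update; variable-length strings follow. */
  struct log_entry_hdr {
          uint64_t prev_offset;           /* previous entry for this ref, 0 if none */
          unsigned char old_sha1[20];
          unsigned char new_sha1[20];
          uint32_t ident_len;             /* "name <email> timestamp tz" */
          uint32_t refname_len;
          uint32_t msg_len;               /* reflog message */
  };

  /* Key/value index: refname -> current state of the ref. */
  struct ref_index_entry {
          uint64_t last_offset;           /* newest log_entry_hdr for this ref */
          unsigned char sha1[20];         /* current value, avoids one indirection */
          unsigned char peeled_sha1[20];  /* like the peeled lines in packed-refs */
  };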

 I chose to use LMDB for the database.  LMDB has a few features that make
 it suitable for usage in git:

One of the complaints that Shawn had about sqlite is that there is no
native Java implementation, which makes it hard for JGit to ship a
compatible backend. I suspect the same is true for LMDB, but it is
probably a lot simpler than sqlite (so reimplementation might be
possible).

But it may also be worth going with a slightly slower database if we can
get wider compatibility for free.

 To test this backend's correctness, I hacked test-lib.sh and
 test-lib-functions.sh to run all tests under the refs backend. Dozens
 of tests use manual ref/reflog reading/writing, or create submodules
 without passing --refs-backend-type to git init.  If those tests are
 changed to use the update-ref machinery or test-refs-be-db (or, in the
 case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
 the only remaining failing tests are the git-new-workdir tests and the
 gitweb tests.

I think we'll need to bump core.repositoryformatversion, too. See the
patches I just posted here:

  http://thread.gmane.org/gmane.comp.version-control.git/272447

-Peff

Re: RFC/Pull Request: Refs db backend

2015-06-23 Thread Duy Nguyen
On Tue, Jun 23, 2015 at 12:36 PM, Junio C Hamano gits...@pobox.com wrote:
 If there is interest?  Shut up and take my money ;-)

Yeah. This may be the next big thing since pack bitmap. It's even
better if it enters 'master' hand in hand with pack protocol v2, but I
think v2 needs more time.

On Tue, Jun 23, 2015 at 7:50 AM, David Turner dtur...@twopensource.com wrote:
 To test this backend's correctness, I hacked test-lib.sh and
 test-lib-functions.sh to run all tests under the refs backend.

Now we have two. split-index also benefits from running through the full
test suite like this. I propose we make `make test` run the test suite
twice. The first run is with default configuration, no split index, no
fancy ref backend. The second run enables split-index and switches to
new backend, running through all test cases. In future we can also
enable packv4 in this second run. There won't be a third run.

When the second ref backend comes, we can switch between the two
backends using a random number generator where we control both
algorithm and seed, so that when a test fails, the user can give us
their seed and we can re-run with the same configuration.

 Dozens of tests use manual ref/reflog reading/writing, or create submodules
 without passing --refs-backend-type to git init.  If those tests are
 changed to use the update-ref machinery or test-refs-be-db (or, in the
 case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
 the only remaining failing tests are the git-new-workdir tests and the
 gitweb tests.

I haven't read the series, but I guess you should also add a few tests
to run on the first run, so new code is exercised a bit even if people
skip the second run.
-- 
Duy


Re: RFC/Pull Request: Refs db backend

2015-06-22 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 I've revived and modified Ronnie Sahlberg's work on the refs db
 backend.  

 The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
 I recognize that there have been changes to the refs code since then,
 and that there are some further changes in-flight from e.g. Michael
 Haggerty.  If there is interest in this, I can rebase once Michael's
 changes land.
 ...
 The db backend runs git for-each-ref about 30% faster than the files
 backend with fully-packed refs on a repo with ~120k refs.  It's also
 about 4x faster than using fully-unpacked refs.  In addition, and
 perhaps more importantly, it avoids case-conflict issues on OS X.

 I chose to use LMDB for the database...
 ...
 Ronnie Sahlberg's original version of this patchset used tdb.  The
 advantage of tdb is that it's smaller (~125k).  The disadvantages are
 that tdb is hard to build on OS X.  It's also not in homebrew.  So lmdb
 seemed simpler.

If there is interest?  Shut up and take my money ;-)

More seriously, that's great that you stepped up to resurrect this
topic.  In a sense, the choice of sample database backend does not
matter.  I do not care if it is tdb, lmdb, or even Berkeley DB as
long as it functions. ;-)

As long as the interface between the ref-transaction system on the Git
side and the database backend is designed right, your lmdb thing can
serve as a reference implementation for other people to plug other
database backends to the same interface, right?  As one step to
validate the interface to the database backends, it would be nice to
eventually have at least two backends that talk to meaningfully
different systems, but we have to start somewhere, and for now
lmdb is as good a place to start as any other db backend.

I wonder if we can do a filesystem backend on top of the same
backend interface---or is there too much impedance mismatch to make
that practical?

Thanks.