Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-17 Thread David Turner
On Sun, 2016-02-14 at 19:04 +0700, Duy Nguyen wrote:
> On Sat, Feb 6, 2016 at 2:44 AM, David Turner <
> dtur...@twopensource.com> wrote:
> > +static char *get_refdb_path(const char *base)
> > +{
> > +   static struct strbuf path_buf = STRBUF_INIT;
> > +   strbuf_reset(_buf);
> > +   strbuf_addf(_buf, "%s/refdb", base);
> > +   return path_buf.buf;
> > +}
> ...
> > +static int lmdb_init_db(struct strbuf *err, int shared)
> > +{
> > +   /*
> > +* To create a db, all we need to do is make a directory
> > for
> > +* it to live in; lmdb will do the rest.
> > +*/
> > +
> > +   if (!db_path)
> > +   db_path =
> > xstrdup(real_path(get_refdb_path(get_git_common_dir(;
> 
> This works for multiple worktrees. But scripts may have harder time
> getting the path. The recommended way is "git rev-parse --git-path
> refdb" but because "refdb" is not registered in path.c:common_list[],
> that command becomes git_path("refdb") instead of
> get_refdb(get_git_... like here. And I will need to know that
> .git/refdb is _not_ per-worktree when I migrate/convert main worktree
> (it's very likely I have to go that route to solve .git/config issue
> in multi worktree).

I'll fix common_list.  

> The solution is register refdb to common_list[] and you can do
> git_path("refdb") here. But then what happens when another backend is
> added? Will the new backend use the same path "refdb", or say
> "refdb.sqlite"? If all backends share the name "refdb", why can't we
> just reuse "refs" instead because the default filesystem-based 
> backend is technically just another backend?

I'll change it to "refs.lmdb".

I don't want to use "refs" because it makes it hard for software to
distinguish the format (in some cases, like sqlite, it won't even be a
directory).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-16 Thread David Turner
On Mon, 2016-02-15 at 16:57 +0700, Duy Nguyen wrote:
> On Sun, Feb 14, 2016 at 7:04 PM, Duy Nguyen 
> wrote:
> > On Sat, Feb 6, 2016 at 2:44 AM, David Turner <
> > dtur...@twopensource.com> wrote:
> > > +static char *get_refdb_path(const char *base)
> > > +{
> > > +   static struct strbuf path_buf = STRBUF_INIT;
> > > +   strbuf_reset(_buf);
> > > +   strbuf_addf(_buf, "%s/refdb", base);
> > > +   return path_buf.buf;
> > > +}
> > ...
> > > +static int lmdb_init_db(struct strbuf *err, int shared)
> > > +{
> > > +   /*
> > > +* To create a db, all we need to do is make a directory
> > > for
> > > +* it to live in; lmdb will do the rest.
> > > +*/
> > > +
> > > +   if (!db_path)
> > > +   db_path =
> > > xstrdup(real_path(get_refdb_path(get_git_common_dir(;
> > 
> > This works for multiple worktrees. But scripts may have harder time
> > getting the path. The recommended way is "git rev-parse --git-path
> > refdb" but because "refdb" is not registered in
> > path.c:common_list[],
> > that command becomes git_path("refdb") instead of
> > get_refdb(get_git_... like here. And I will need to know that
> > .git/refdb is _not_ per-worktree when I migrate/convert main
> > worktree
> > (it's very likely I have to go that route to solve .git/config
> > issue
> > in multi worktree).
> > 
> > The solution is register refdb to common_list[] and you can do
> > git_path("refdb") here. But then what happens when another backend
> > is
> > added? Will the new backend use the same path "refdb", or say
> > "refdb.sqlite"? If all backends share the name "refdb", why can't
> > we
> > just reuse "refs" instead because the default filesystem-based
> > backend
> > is technically just another backend?
> 
> To answer myself: I forgot that there were per-worktree refs (e.g.
> refs/bisect). It makes me wonder if we should put per-worktree refs
> to
> lmdb as well (maybe one per worktree if we don't want to put all in
> one db). One of the advantages of moving away from fs-based backend
> is
> the ability to deal with case sensitivity, "nested" refs (e.g. a/b
> and
> a/b/c are both refs).

I don't think case-sensitivity is a huge problem here: per-worktree
refs are now HEAD or refs/bisect/*, and refs/bisect is controlled
entirely by the bisection machinery (that is, it won't have user
-defined refs with case conflicts).

>  With this split, I think some refs are still
> left behind.. Sorry if this was discussed before, I haven't followed
> this closely.

Previous discussion:
http://comments.gmane.org/gmane.comp.version-control.git/275097
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-15 Thread Duy Nguyen
On Sun, Feb 14, 2016 at 7:04 PM, Duy Nguyen  wrote:
> On Sat, Feb 6, 2016 at 2:44 AM, David Turner  wrote:
>> +static char *get_refdb_path(const char *base)
>> +{
>> +   static struct strbuf path_buf = STRBUF_INIT;
>> +   strbuf_reset(_buf);
>> +   strbuf_addf(_buf, "%s/refdb", base);
>> +   return path_buf.buf;
>> +}
> ...
>> +static int lmdb_init_db(struct strbuf *err, int shared)
>> +{
>> +   /*
>> +* To create a db, all we need to do is make a directory for
>> +* it to live in; lmdb will do the rest.
>> +*/
>> +
>> +   if (!db_path)
>> +   db_path = 
>> xstrdup(real_path(get_refdb_path(get_git_common_dir(;
>
> This works for multiple worktrees. But scripts may have harder time
> getting the path. The recommended way is "git rev-parse --git-path
> refdb" but because "refdb" is not registered in path.c:common_list[],
> that command becomes git_path("refdb") instead of
> get_refdb(get_git_... like here. And I will need to know that
> .git/refdb is _not_ per-worktree when I migrate/convert main worktree
> (it's very likely I have to go that route to solve .git/config issue
> in multi worktree).
>
> The solution is register refdb to common_list[] and you can do
> git_path("refdb") here. But then what happens when another backend is
> added? Will the new backend use the same path "refdb", or say
> "refdb.sqlite"? If all backends share the name "refdb", why can't we
> just reuse "refs" instead because the default filesystem-based backend
> is technically just another backend?

To answer myself: I forgot that there were per-worktree refs (e.g.
refs/bisect). It makes me wonder if we should put per-worktree refs to
lmdb as well (maybe one per worktree if we don't want to put all in
one db). One of the advantages of moving away from fs-based backend is
the ability to deal with case sensitivity, "nested" refs (e.g. a/b and
a/b/c are both refs). With this split, I think some refs are still
left behind.. Sorry if this was discussed before, I haven't followed
this closely.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-14 Thread Duy Nguyen
On Sat, Feb 6, 2016 at 2:44 AM, David Turner  wrote:
> +static char *get_refdb_path(const char *base)
> +{
> +   static struct strbuf path_buf = STRBUF_INIT;
> +   strbuf_reset(_buf);
> +   strbuf_addf(_buf, "%s/refdb", base);
> +   return path_buf.buf;
> +}
...
> +static int lmdb_init_db(struct strbuf *err, int shared)
> +{
> +   /*
> +* To create a db, all we need to do is make a directory for
> +* it to live in; lmdb will do the rest.
> +*/
> +
> +   if (!db_path)
> +   db_path = 
> xstrdup(real_path(get_refdb_path(get_git_common_dir(;

This works for multiple worktrees. But scripts may have harder time
getting the path. The recommended way is "git rev-parse --git-path
refdb" but because "refdb" is not registered in path.c:common_list[],
that command becomes git_path("refdb") instead of
get_refdb(get_git_... like here. And I will need to know that
.git/refdb is _not_ per-worktree when I migrate/convert main worktree
(it's very likely I have to go that route to solve .git/config issue
in multi worktree).

The solution is register refdb to common_list[] and you can do
git_path("refdb") here. But then what happens when another backend is
added? Will the new backend use the same path "refdb", or say
"refdb.sqlite"? If all backends share the name "refdb", why can't we
just reuse "refs" instead because the default filesystem-based backend
is technically just another backend?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-12 Thread Michael Haggerty
On 02/05/2016 08:44 PM, David Turner wrote:
> Add a database backend for refs using LMDB.  This backend runs git
> for-each-ref about 30% faster than the files backend with fully-packed
> refs on a repo with ~120k refs.  It's also about 4x faster than using
> fully-unpacked refs.  In addition, and perhaps more importantly, it
> avoids case-conflict issues on OS X.
> 
> LMDB has a few features that make it suitable for usage in git:
> 

0. It is licensed under the OpenLDAP Public License, which is apparently
GPL compatible [1].

[1] http://www.gnu.org/licenses/license-list.en.html#newOpenLDAP

> 1. It is relatively lightweight; it requires only one header file, and
> the library code takes under 64k at runtime.
> 
> 2. It is well-tested: it's been used in OpenLDAP for years.
> 
> 3. It's very fast.  LMDB's benchmarks show that it is among
> the fastest key-value stores.
> 
> 4. It has a relatively simple concurrency story; readers don't
> block writers and writers don't block readers.
> 
> Ronnie Sahlberg's original version of this patchset used tdb.  The
> major disadvantage of tdb is that tdb is hard to build on OS X.  It's
> also not in homebrew.  So lmdb seemed simpler.
> 
> To test this backend's correctness, I hacked test-lib.sh and
> test-lib-functions.sh to run all tests under the refs backend. Dozens
> of tests use manual ref/reflog reading/writing, or create submodules
> without passing --ref-storage to git init.  If those tests are
> changed to use the update-ref machinery or test-refs-lmdb-backend (or,
> in the case of packed-refs, corrupt refs, and dumb fetch tests, are
> skipped), the only remaining failing tests are the git-new-workdir
> tests and the gitweb tests.

Would it possible to build your "hack" into the test suite? For example,
perhaps one could implement an option that requests tests to use LMDB
wherever possible and skip the tests that are not LMDB-compatible. Over
time, we could try to increase the fraction of tests that are
LMDB-compatible by (for example) using `git update-ref` and `git
rev-parse` rather than reading/writing reference files via the
filesystem directly.

Otherwise, I'm worried that LMDB support will bitrot quickly because it
will be too much of a nuisance to exercise the code.

> Signed-off-by: David Turner 
> ---
>  .gitignore |1 +
>  Documentation/config.txt   |7 +
>  Documentation/git-clone.txt|3 +-
>  Documentation/git-init.txt |2 +-
>  Documentation/technical/refs-lmdb-backend.txt  |   52 +
>  Documentation/technical/repository-version.txt |5 +
>  Makefile   |   12 +
>  configure.ac   |   33 +
>  contrib/workdir/git-new-workdir|3 +
>  refs.c |3 +
>  refs.h |2 +
>  refs/lmdb-backend.c| 2052 
> 
>  setup.c|   11 +-
>  test-refs-lmdb-backend.c   |   64 +
>  transport.c|7 +-
>  15 files changed, 2250 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/technical/refs-lmdb-backend.txt
>  create mode 100644 refs/lmdb-backend.c
>  create mode 100644 test-refs-lmdb-backend.c
> 
> diff --git a/.gitignore b/.gitignore
> index 1c2f832..87d45a2 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -199,6 +199,7 @@
>  /test-path-utils
>  /test-prio-queue
>  /test-read-cache
> +/test-refs-lmdb-backend
>  /test-regex
>  /test-revision-walking
>  /test-run-command
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 06d3659..bc222c9 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1153,6 +1153,13 @@ difftool..cmd::
>  difftool.prompt::
>   Prompt before each invocation of the diff tool.
>  
> +extensions.refsStorage::
> + Type of refs storage backend. Default is to use the original
> + files based ref storage.  When set to "lmdb", refs are stored in
> + the lmdb database backend.  This setting reflects the refs
> + backend that is currently in use; it is not possible to change
> + the backend by changing this setting.
> +

Let's give the users a pointer to how they *can* change this setting. Maybe

Type of refs storage backend. Default is to use the original
files based ref storage.  When set to "lmdb", refs are stored in
an LMDB database.  This setting reflects the refs storage
backend that was chosen via the --ref-storage option when the
repository was originally created. It is currently
not possible to change the refs storage backend of an
existing repository.


>  fetch.recurseSubmodules::
>   This option can be either set to a boolean value or to 'on-demand'.
>

Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-12 Thread David Turner
On Fri, 2016-02-12 at 18:01 +0100, Michael Haggerty wrote:
> On 02/05/2016 08:44 PM, David Turner wrote:
> > Add a database backend for refs using LMDB.  This backend runs git
> > for-each-ref about 30% faster than the files backend with fully
> > -packed
> > refs on a repo with ~120k refs.  It's also about 4x faster than
> > using
> > fully-unpacked refs.  In addition, and perhaps more importantly, it
> > avoids case-conflict issues on OS X.
> > 
> > LMDB has a few features that make it suitable for usage in git:
> > 
> 
> 0. It is licensed under the OpenLDAP Public License, which is
> apparently
> GPL compatible [1].
> 
> [1] http://www.gnu.org/licenses/license-list.en.html#newOpenLDAP

Oh, yeah, I checked that before choosing it and then forgot all about
it. 

> > 1. It is relatively lightweight; it requires only one header file,
> > and
> > the library code takes under 64k at runtime.
> > 
> > 2. It is well-tested: it's been used in OpenLDAP for years.
> > 
> > 3. It's very fast.  LMDB's benchmarks show that it is among
> > the fastest key-value stores.
> > 
> > 4. It has a relatively simple concurrency story; readers don't
> > block writers and writers don't block readers.
> > 
> > Ronnie Sahlberg's original version of this patchset used tdb.  The
> > major disadvantage of tdb is that tdb is hard to build on OS X. 
> >  It's
> > also not in homebrew.  So lmdb seemed simpler.
> > 
> > To test this backend's correctness, I hacked test-lib.sh and
> > test-lib-functions.sh to run all tests under the refs backend.
> > Dozens
> > of tests use manual ref/reflog reading/writing, or create
> > submodules
> > without passing --ref-storage to git init.  If those tests are
> > changed to use the update-ref machinery or test-refs-lmdb-backend
> > (or,
> > in the case of packed-refs, corrupt refs, and dumb fetch tests, are
> > skipped), the only remaining failing tests are the git-new-workdir
> > tests and the gitweb tests.
> 
> Would it possible to build your "hack" into the test suite? For
> example,
> perhaps one could implement an option that requests tests to use LMDB
> wherever possible and skip the tests that are not LMDB-compatible.
> Over
> time, we could try to increase the fraction of tests that are
> LMDB-compatible by (for example) using `git update-ref` and `git
> rev-parse` rather than reading/writing reference files via the
> filesystem directly.
> 
> Otherwise, I'm worried that LMDB support will bitrot quickly because
> it
> will be too much of a nuisance to exercise the code.

I can add that.  It requires minor changes to a large number of tests,
but that's OK.

> > Signed-off-by: David Turner 
> > ---
> >  .gitignore |1 +
> >  Documentation/config.txt   |7 +
> >  Documentation/git-clone.txt|3 +-
> >  Documentation/git-init.txt |2 +-
> >  Documentation/technical/refs-lmdb-backend.txt  |   52 +
> >  Documentation/technical/repository-version.txt |5 +
> >  Makefile   |   12 +
> >  configure.ac   |   33 +
> >  contrib/workdir/git-new-workdir|3 +
> >  refs.c |3 +
> >  refs.h |2 +
> >  refs/lmdb-backend.c| 2052
> > 
> >  setup.c|   11 +-
> >  test-refs-lmdb-backend.c   |   64 +
> >  transport.c|7 +-
> >  15 files changed, 2250 insertions(+), 7 deletions(-)
> >  create mode 100644 Documentation/technical/refs-lmdb-backend.txt
> >  create mode 100644 refs/lmdb-backend.c
> >  create mode 100644 test-refs-lmdb-backend.c
> > 
> > diff --git a/.gitignore b/.gitignore
> > index 1c2f832..87d45a2 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -199,6 +199,7 @@
> >  /test-path-utils
> >  /test-prio-queue
> >  /test-read-cache
> > +/test-refs-lmdb-backend
> >  /test-regex
> >  /test-revision-walking
> >  /test-run-command
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 06d3659..bc222c9 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -1153,6 +1153,13 @@ difftool..cmd::
> >  difftool.prompt::
> > Prompt before each invocation of the diff tool.
> >  
> > +extensions.refsStorage::
> > +   Type of refs storage backend. Default is to use the
> > original
> > +   files based ref storage.  When set to "lmdb", refs are
> > stored in
> > +   the lmdb database backend.  This setting reflects the refs
> > +   backend that is currently in use; it is not possible to
> > change
> > +   the backend by changing this setting.
> > +
> 
> Let's give the users a pointer to how they *can* change this setting.
> Maybe
> 
>   Type of refs storage backend. Default is to use the original
>   

Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-11 Thread Michael Haggerty
On 02/05/2016 08:44 PM, David Turner wrote:
> Add a database backend for refs using LMDB.  This backend runs git
> for-each-ref about 30% faster than the files backend with fully-packed
> refs on a repo with ~120k refs.  It's also about 4x faster than using
> fully-unpacked refs.  In addition, and perhaps more importantly, it
> avoids case-conflict issues on OS X.

When I compile this using gcc with -Werror=pointer-arith, I get the
following errors:

> refs/lmdb-backend.c: In function ‘rename_reflog_ent’:
> refs/lmdb-backend.c:1068:39: error: pointer of type ‘void *’ used in 
> arithmetic [-Werror=pointer-arith]
>   strbuf_add(_key_buf, key.mv_data + key.mv_size - 8, 8);
>^
> refs/lmdb-backend.c:1068:53: error: pointer of type ‘void *’ used in 
> arithmetic [-Werror=pointer-arith]
>   strbuf_add(_key_buf, key.mv_data + key.mv_size - 8, 8);

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

2016-02-11 Thread David Turner
On Thu, 2016-02-11 at 09:48 +0100, Michael Haggerty wrote:
> On 02/05/2016 08:44 PM, David Turner wrote:
> > Add a database backend for refs using LMDB.  This backend runs git
> > for-each-ref about 30% faster than the files backend with fully
> > -packed
> > refs on a repo with ~120k refs.  It's also about 4x faster than
> > using
> > fully-unpacked refs.  In addition, and perhaps more importantly, it
> > avoids case-conflict issues on OS X.
> 
> When I compile this using gcc with -Werror=pointer-arith, I get the
> following errors:
> 
> > refs/lmdb-backend.c: In function ‘rename_reflog_ent’:
> > refs/lmdb-backend.c:1068:39: error: pointer of type ‘void *’ used
> > in arithmetic [-Werror=pointer-arith]
> >   strbuf_add(_key_buf, key.mv_data + key.mv_size - 8, 8);
> >^
> > refs/lmdb-backend.c:1068:53: error: pointer of type ‘void *’ used
> > in arithmetic [-Werror=pointer-arith]
> >   strbuf_add(_key_buf, key.mv_data + key.mv_size - 8, 8);

Fixed, thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html