Re: reftable [v5]: new ref storage format

2017-08-14 Thread Howard Chu

Howard Chu wrote:
> The primary issue with using LMDB over NFS is with performance. All reads are
> performed thru accesses of mapped memory, and in general, NFS implementations
> don't cache mmap'd pages. I believe this is a consequence of the fact that
> they also can't guarantee cache coherence, so the only way for an NFS client
> to see a write from another NFS client is by always refetching pages whenever
> they're accessed.
>
> LMDB's read lock management also wouldn't perform well over NFS; it also uses
> an mmap'd file. On a local filesystem LMDB read locks are zero cost since they
> just atomically update a word in the mmap. Over NFS, each update to the mmap
> would also require an msync() to propagate the change back to the server. This
> would seriously limit the speed with which read transactions may be opened and
> closed. (Ordinarily opening and closing a read txn can be done with zero
> system calls.)


All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and 
prevent multiple processes from using the DB concurrently. In that case, 
maintaining coherence with other NFS clients is a non-issue. It strikes me 
that git doesn't require concurrent multi-process access anyway, and any 
particular process would only use the DB for a short time before closing it 
and going away.
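The EXCLUSIVE flag is only proposed here, so the sketch below does not use LMDB at all. It approximates the intended semantics (at most one process using the database at a time) with a non-blocking, whole-file POSIX write lock taken via fcntl() with F_SETLK on a hypothetical lock file. The names exclusive_open() and second_process_refused() are made up for illustration, and whether such record locks are honoured over NFS depends on the lock support of the particular client and server.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open (creating if needed) a hypothetical lock file and take a
 * non-blocking, whole-file POSIX write lock on it.
 * Returns the fd, or -1 if another process already holds the lock
 * (errno is then EACCES or EAGAIN) or the file cannot be opened.
 * The lock lasts until this process closes the fd or exits. */
static int exclusive_open(const char *lockpath)
{
    assert(lockpath != NULL);
    int fd = open(lockpath, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = through end of file, i.e. the whole file */

    if (fcntl(fd, F_SETLK, &fl) < 0) {  /* F_SETLK fails instead of waiting */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Demo helper: fork a child that calls exclusive_open() on the same path.
 * POSIX record locks belong to a process and are not inherited by fork(),
 * so the child competes like an unrelated process would.
 * Returns 1 if the child was refused, 0 if it got the lock, -1 on error. */
static int second_process_refused(const char *lockpath)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = exclusive_open(lockpath);
        _exit(fd < 0 ? 1 : 0);  /* exit status 1 = refused (or open failed) */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 1;
}
```

Closing the descriptor, or simply exiting, releases the lock, which fits the short-lived, one-process-at-a-time usage described above.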


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: reftable [v5]: new ref storage format

2017-08-09 Thread Howard Chu

Shawn Pearce wrote:

> On Sun, Aug 6, 2017 at 4:37 PM, Ben Alex <ben.a...@acegi.com.au> wrote:
>
> > Just on the LmdbJava specific pieces:
> >
> > On Mon, Aug 7, 2017 at 8:56 AM, Shawn Pearce <spea...@spearce.org> wrote:
>
> I don't know if we need a larger key size. $DAY_JOB limits ref names
> to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
> about the general masses who might be using our software and expect
> ref names thus far to be as long as PATH_MAX on their system. Most
> systems run PATH_MAX around 1024.


The key size limit in LMDB can be safely raised to around 2KB or so without 
any issues. There's also work underway in LMDB 1.0 to raise the limit to 2GB, 
but in general it would be silly to use such large keys.
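To put the numbers from this exchange side by side, here is a small sketch. It assumes LMDB 0.9's default compile-time key limit of 511 bytes (MDB_MAXKEYSIZE) and models the raised limit as 2048 bytes; the enum names and refname_fits() are hypothetical, and nothing here calls LMDB.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limits for illustration only. */
enum {
    KEYSIZE_DEFAULT     = 511,   /* assumed LMDB 0.9 default MDB_MAXKEYSIZE */
    KEYSIZE_RAISED      = 2048,  /* the "around 2KB" raised limit */
    REFNAME_TYPICAL_MAX = 1024   /* the ~1024-byte PATH_MAX figure above */
};

/* True if a ref name, stored as its raw bytes without the trailing NUL,
 * fits within a key limit of max_key bytes. */
static bool refname_fits(const char *refname, size_t max_key)
{
    assert(refname != NULL);
    return strlen(refname) <= max_key;
}
```

With these assumptions a PATH_MAX-length ref name is rejected at the default limit but fits comfortably once the limit is raised to about 2KB.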



> Mostly at $DAY_JOB it's because we can't virtualize the filesystem
> calls the C library is doing.
>
> In git-core, I'm worried about the caveats related to locking. Git
> tries to work nicely on NFS,


That may be a problem in current LMDB 0.9, but needs further clarification.


> and it seems LMDB wouldn't. Git also runs
> fine on a read-only filesystem, and LMDB gets a little weird about
> that.


Not sure what you're talking about. LMDB works perfectly fine on read-only 
filesystems, it just enforces that it is used in read-only mode.



> Finally, Git doesn't have nearly the risks LMDB has about a
> crashed reader or writer locking out future operations until the locks
> have been resolved. This is especially true with shared user
> repositories, where another user might set up and own the semaphore.


All locks disappear when the last process using the DB environment exits.
If only a single process is using the DB environment, there's no issue. If 
multiple processes are sharing the DB environment concurrently, the write lock 
cleans up automatically when the writer terminates; stale reader locks would 
require a call to mdb_reader_check() to clean them up.
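The general idea of clearing stale reader entries can be sketched as: walk a table of reader slots and free every slot whose owning process no longer exists. This is not meant to mirror how mdb_reader_check() is actually implemented; it swaps in a simple kill(pid, 0) liveness probe, and the slot layout, clear_stale_readers() and dead_pid_for_demo() are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical reader slot: which process holds it and which snapshot
 * (txn id) it pins. pid 0 marks a free slot. Not LMDB's layout. */
struct reader_slot {
    pid_t pid;
    unsigned long txnid;
};

/* kill() with signal 0 only checks that the process exists. EPERM means
 * it exists but belongs to another user, so treat it as alive. */
static int pid_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Free every occupied slot whose owner has exited; return how many. */
static int clear_stale_readers(struct reader_slot *slots, size_t n)
{
    int cleared = 0;
    assert(slots != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].pid != 0 && !pid_alive(slots[i].pid)) {
            slots[i].pid = 0;
            slots[i].txnid = 0;
            cleared++;
        }
    }
    return cleared;
}

/* Demo helper: return the pid of a child that has already exited and
 * been reaped (so the pid no longer exists), or -1 if fork() fails. */
static pid_t dead_pid_for_demo(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                /* child: exit immediately */
    if (pid > 0)
        waitpid(pid, NULL, 0);   /* reap it so no zombie remains */
    return pid;
}
```

A bare pid check can be fooled if the operating system reuses a pid, which is one reason a real implementation may prefer a sturdier test of liveness.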


The primary issue with using LMDB over NFS is with performance. All reads are 
performed thru accesses of mapped memory, and in general, NFS implementations 
don't cache mmap'd pages. I believe this is a consequence of the fact that 
they also can't guarantee cache coherence, so the only way for an NFS client 
to see a write from another NFS client is by always refetching pages whenever 
they're accessed.


This is also why LMDB doesn't provide user-level VFS hooks - it's generally 
impractical to emulate mmap from application level. You could always write a 
FUSE driver if that's really what you need to do, but again, the performance 
of such a solution is pretty horrible.


LMDB's read lock management also wouldn't perform well over NFS; it also uses 
an mmap'd file. On a local filesystem LMDB read locks are zero cost since they 
just atomically update a word in the mmap. Over NFS, each update to the mmap 
would also require an msync() to propagate the change back to the server. This 
would seriously limit the speed with which read transactions may be opened and 
closed. (Ordinarily opening and closing a read txn can be done with zero 
system calls.)
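To make the zero-system-call point concrete, here is a sketch of a reader-slot table living in a shared, mmap'd file. Starting or ending a read transaction is a single atomic store into the mapping; the optional msync() models the extra step the message says NFS would need on every such update. The layout (one 64-bit word per slot), NSLOTS and the function names are hypothetical and are not LMDB's lock-file format, and the sketch assumes 64-bit atomics are lock-free so they behave correctly across processes sharing the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define NSLOTS 64

/* Hypothetical lock-file layout: one word per reader slot holding the
 * txn id that reader is using (0 = idle). */
struct lock_region {
    _Atomic uint64_t slot[NSLOTS];
};

/* Map the lock file MAP_SHARED so every process sees the same words. */
static struct lock_region *map_lock_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct lock_region)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct lock_region),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Begin (txnid != 0) or end (txnid == 0) a read txn in slot idx.
 * Locally this is one atomic store and no system call. With force_sync
 * set, it also calls msync(), i.e. one system call per begin/end. */
static int reader_set(struct lock_region *lr, unsigned idx,
                      uint64_t txnid, int force_sync)
{
    assert(idx < NSLOTS);
    atomic_store(&lr->slot[idx], txnid);
    if (force_sync)
        return msync(lr, sizeof *lr, MS_SYNC);
    return 0;
}
```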




Re: [PATCH 02/16] refs: add methods for misc ref operations

2015-12-17 Thread Howard Chu
Junio C Hamano  pobox.com> writes:

> 
> David Turner  twopensource.com> writes:
> 
> >  struct ref_be {
> >      struct ref_be *next;
> >      const char *name;
> >      ref_transaction_commit_fn *transaction_commit;
> > +
> > +    pack_refs_fn *pack_refs;
> > +    peel_ref_fn *peel_ref;
> > +    create_symref_fn *create_symref;
> > +
> > +    resolve_ref_unsafe_fn *resolve_ref_unsafe;
> > +    verify_refname_available_fn *verify_refname_available;
> > +    resolve_gitlink_ref_fn *resolve_gitlink_ref;
> >  };
> 
> This may have been pointed out in the previous reviews by somebody
> else, but I think it is more customary to declare a struct member
> that is a pointer to a customization function without leading '*',
> i.e.
> 
>   typedef TYPE (*customize_fn)(ARGS);
> 
>   struct vtable {
>       ...
>       customize_fn fn;
>       ...
>   };
> 
> in our codebase (cf. string_list::cmp, prio_queue::compare).

(LMDB author here, just passing by)

IMO you're making a mistake. You should always typedef the thing itself, not
a pointer to the thing. If you only typedef the pointer, you can only use
the typedef to declare pointers and never the thing itself. If you typedef
the actual thing, you can use the typedef to declare both the thing and
pointer to thing.

It's particularly useful to have function typedefs because later on you can
use the actual typedef to declare instances of that function type in header
files, and guarantee that your function definitions match what they're
intended to match. Otherwise, assigning a bare (*func) to something that
mismatches only generates a warning, instead of an error.
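A small C sketch of the style being recommended: the typedef names the function type itself, so the same name declares both the vtable's pointer member and the functions that implement it. compare_fn, sorter_vtable, by_length() and by_bytes() are hypothetical names, not taken from the git patch.

```c
#include <assert.h>
#include <string.h>

/* Typedef the function type itself, not a pointer to it. */
typedef int compare_fn(const char *a, const char *b);

/* The typedef declares the pointer member of a vtable... */
struct sorter_vtable {
    compare_fn *cmp;
};

/* ...and also declares the functions themselves. If a definition below
 * had a different signature, the compiler would reject it as a
 * conflicting declaration, rather than depending on how it treats an
 * incompatible function-pointer assignment. */
static compare_fn by_length;
static compare_fn by_bytes;

static int by_length(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    return (la > lb) - (la < lb);
}

static int by_bytes(const char *a, const char *b)
{
    return strcmp(a, b);
}

static const struct sorter_vtable length_sorter = { by_length };
static const struct sorter_vtable bytes_sorter = { by_bytes };
```

With a pointer-only typedef such as typedef int (*compare_fn)(...), the two "static compare_fn ...;" declarations could not be written, because the name could only declare pointers.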



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Wishlist: git fetch --reference

2014-08-21 Thread Howard Chu
I maintain multiple copies of the same repo because I keep each one checked 
out to different branch/rev levels. It would be nice if, similar to clone 
--reference, we could also use git fetch --reference to reference a local repo 
when doing a fetch to pull in updates.




Re: Wishlist: git fetch --reference

2014-08-21 Thread Howard Chu

Jeff King wrote:

> On Thu, Aug 21, 2014 at 07:57:47PM -0700, Howard Chu wrote:
>
> > I maintain multiple copies of the same repo because I keep each one checked
> > out to different branch/rev levels. It would be nice if, similar to clone
> > --reference, we could also use git fetch --reference to reference a local
> > repo when doing a fetch to pull in updates.
>
> I think it is just spelled:
>
>    echo $reference_repo >>.git/objects/info/alternates
>    git fetch
>
> We need --reference with clone because that first line needs to happen
> after clone runs git init but before it runs git fetch. And if you
> cloned with --reference, of course, the alternates file remains and
> further fetches will automatically use it.


Aha, thanks, hadn't realized that. Just checked and yes, the alternates file 
is already set in all of these different copies.
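For illustration, a C sketch of what the shell line in Jeff's reply accomplishes: recording the reference repository's object directory in .git/objects/info/alternates, which git reads as one path per line, plus a check for whether it is already listed, as turned out to be the case here. has_alternate(), add_alternate(), join_path() and the example path are hypothetical; this is not git's own code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build "<gitdir>/<suffix>" into buf; 0 on success, -1 if it won't fit. */
static int join_path(char *buf, size_t len, const char *gitdir, const char *suffix)
{
    int n = snprintf(buf, len, "%s/%s", gitdir, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* 1 if <gitdir>/objects/info/alternates already lists objdir, else 0. */
static int has_alternate(const char *gitdir, const char *objdir)
{
    char path[4096], line[4096];
    if (join_path(path, sizeof path, gitdir, "objects/info/alternates") != 0)
        return 0;
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;                           /* no file yet: nothing listed */
    int found = 0;
    while (!found && fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline */
        found = strcmp(line, objdir) == 0;
    }
    fclose(f);
    return found;
}

/* Append objdir as one line unless it is already listed. 0 ok, -1 error. */
static int add_alternate(const char *gitdir, const char *objdir)
{
    char path[4096];
    assert(gitdir != NULL && objdir != NULL);
    if (has_alternate(gitdir, objdir))
        return 0;
    /* objects/ and objects/info/ normally exist; create them if not. */
    if (join_path(path, sizeof path, gitdir, "objects") != 0 ||
        (mkdir(path, 0777) != 0 && errno != EEXIST))
        return -1;
    if (join_path(path, sizeof path, gitdir, "objects/info") != 0 ||
        (mkdir(path, 0777) != 0 && errno != EEXIST))
        return -1;
    if (join_path(path, sizeof path, gitdir, "objects/info/alternates") != 0)
        return -1;
    FILE *f = fopen(path, "a");             /* append mode, like the shell's >> */
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", objdir) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```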

