Re: reftable [v5]: new ref storage format
Howard Chu wrote:
> The primary issue with using LMDB over NFS is with performance. All reads are performed thru accesses of mapped memory, and in general, NFS implementations don't cache mmap'd pages. I believe this is a consequence of the fact that they also can't guarantee cache coherence, so the only way for an NFS client to see a write from another NFS client is by always refetching pages whenever they're accessed.
>
> LMDB's read lock management also wouldn't perform well over NFS; it also uses an mmap'd file. On a local filesystem LMDB read locks are zero cost since they just atomically update a word in the mmap. Over NFS, each update to the mmap would also require an msync() to propagate the change back to the server. This would seriously limit the speed with which read transactions may be opened and closed. (Ordinarily opening and closing a read txn can be done with zero system calls.)

All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and prevent multiple processes from using the DB concurrently. In that case, maintaining coherence with other NFS clients is a non-issue. It strikes me that git doesn't require concurrent multi-process access anyway, and any particular process would only use the DB for a short time before closing it and going away.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
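The EXCLUSIVE flag above is a proposal, not something LMDB 0.9 offers. As a rough sketch of the same effect at the application level, assuming a hypothetical lock file next to the DB and hypothetical helpers `open_exclusive()` / `close_exclusive()`, a process could hold a non-blocking `flock(2)` lock for as long as it uses the DB, and open the environment with LMDB's existing `MDB_NOLOCK` flag, which leaves all concurrency control to the caller. `flock()` semantics over NFS are platform-dependent (Linux emulates it with whole-file byte-range locks since 2.6.12), so treat this as an illustration rather than a tested NFS recipe.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) a lock file and take an
 * exclusive, non-blocking flock() on it.  Returns the fd that holds the lock,
 * or -1 if another open file description already holds it, or on error. */
int open_exclusive(const char *lockpath)
{
    int fd = open(lockpath, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        /* EWOULDBLOCK: someone else is using the DB right now. */
        close(fd);
        return -1;
    }
    /* The caller could now open the LMDB environment with MDB_NOLOCK. */
    return fd;
}

/* Closing the descriptor (or exiting) releases the flock() lock. */
void close_exclusive(int fd)
{
    close(fd);
}
```

Because the lock disappears when the holding process exits, a crashed process does not leave the DB locked, which fits the short-lived, one-process-at-a-time usage described above.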
Re: reftable [v5]: new ref storage format
Shawn Pearce wrote:
> On Sun, Aug 6, 2017 at 4:37 PM, Ben Alex <ben.a...@acegi.com.au> wrote:
>> Just on the LmdbJava specific pieces:
>>
>> On Mon, Aug 7, 2017 at 8:56 AM, Shawn Pearce <spea...@spearce.org> wrote:
>
> I don't know if we need a larger key size. $DAY_JOB limits ref names to ~200 bytes in a hook. I think GitHub does similar. But I'm worried about the general masses who might be using our software and expect ref names thus far to be as long as PATH_MAX on their system. Most systems run PATH_MAX around 1024.

The key size limit in LMDB can be safely raised to around 2KB or so without any issues. There's also work underway in LMDB 1.0 to raise the limit to 2GB, but in general it would be silly to use such large keys.

> Mostly at $DAY_JOB it's because we can't virtualize the filesystem calls the C library is doing.
>
> In git-core, I'm worried about the caveats related to locking. Git tries to work nicely on NFS,

That may be a problem in current LMDB 0.9, but needs further clarification.

> and it seems LMDB wouldn't. Git also runs fine on a read-only filesystem, and LMDB gets a little weird about that.

Not sure what you're talking about. LMDB works perfectly fine on read-only filesystems; it just enforces that it is used in read-only mode.

> Finally, Git doesn't have nearly the risks LMDB has about a crashed reader or writer locking out future operations until the locks have been resolved. This is especially true with shared user repositories, where another user might set up and own the semaphore.

All locks disappear when the last process using the DB environment exits. If only a single process is using the DB environment, there's no issue. If multiple processes are sharing the DB environment concurrently, the write lock cleans up automatically when the writer terminates; stale reader locks would require a call to mdb_reader_check() to clean them up.

The primary issue with using LMDB over NFS is with performance. All reads are performed thru accesses of mapped memory, and in general, NFS implementations don't cache mmap'd pages. I believe this is a consequence of the fact that they also can't guarantee cache coherence, so the only way for an NFS client to see a write from another NFS client is by always refetching pages whenever they're accessed.

This is also why LMDB doesn't provide user-level VFS hooks - it's generally impractical to emulate mmap from application level. You could always write a FUSE driver if that's really what you need to do, but again, the performance of such a solution is pretty horrible.

LMDB's read lock management also wouldn't perform well over NFS; it also uses an mmap'd file. On a local filesystem LMDB read locks are zero cost since they just atomically update a word in the mmap. Over NFS, each update to the mmap would also require an msync() to propagate the change back to the server. This would seriously limit the speed with which read transactions may be opened and closed. (Ordinarily opening and closing a read txn can be done with zero system calls.)

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
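To make the "zero system calls" point concrete, here is a toy reader table, assuming a hypothetical slot layout (not LMDB's actual reader-table structure) kept in a `MAP_SHARED` file mapping. Starting or ending a read is just a compare-and-swap or atomic store on the mapping; the hypothetical `sync_each_update` switch adds the per-update `msync()` that an NFS client would need. The stale-slot sweep uses `kill(pid, 0)` as a swapped-in liveness test, not necessarily what `mdb_reader_check()` does, and like any PID-based check it only makes sense for processes on the same host.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define NREADERS 16
#define TABLE_SIZE (NREADERS * sizeof(struct reader_slot))

/* Hypothetical slot layout for illustration only. */
struct reader_slot {
    atomic_int pid;            /* 0 means the slot is free */
    _Atomic uint64_t txnid;    /* snapshot this reader is using (0 = none yet) */
};

struct reader_table {
    struct reader_slot *slots; /* lives in a MAP_SHARED file mapping */
    int pid;                   /* cached at open, so begin/end need no getpid() */
    int sync_each_update;      /* 1 = msync() after every change, as over NFS */
};

int table_open(struct reader_table *t, const char *path, int sync_each_update)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)TABLE_SIZE) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, TABLE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    t->slots = p;
    t->pid = (int)getpid();
    t->sync_each_update = sync_each_update;
    return 0;
}

static void publish(struct reader_table *t)
{
    if (t->sync_each_update)   /* the extra system call an NFS client would pay */
        msync(t->slots, TABLE_SIZE, MS_SYNC);
}

/* Claim a free slot with one compare-and-swap; returns the slot or -1 if full. */
int begin_read(struct reader_table *t, uint64_t txnid)
{
    for (int i = 0; i < NREADERS; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&t->slots[i].pid, &expected, t->pid)) {
            atomic_store(&t->slots[i].txnid, txnid);
            publish(t);
            return i;
        }
    }
    return -1;
}

void end_read(struct reader_table *t, int slot)
{
    atomic_store(&t->slots[slot].txnid, 0);
    atomic_store(&t->slots[slot].pid, 0);
    publish(t);
}

/* Oldest snapshot still in use; a writer would consult this before reusing pages. */
uint64_t oldest_reader(struct reader_table *t)
{
    uint64_t oldest = UINT64_MAX;
    for (int i = 0; i < NREADERS; i++) {
        if (atomic_load(&t->slots[i].pid) != 0) {
            uint64_t id = atomic_load(&t->slots[i].txnid);
            if (id != 0 && id < oldest)
                oldest = id;
        }
    }
    return oldest;
}

/* Free slots whose owning process no longer exists; returns how many were freed. */
int reader_check(struct reader_table *t)
{
    int cleared = 0;
    for (int i = 0; i < NREADERS; i++) {
        int pid = atomic_load(&t->slots[i].pid);
        if (pid != 0 && kill((pid_t)pid, 0) == -1 && errno == ESRCH) {
            atomic_store(&t->slots[i].txnid, 0);
            if (atomic_compare_exchange_strong(&t->slots[i].pid, &pid, 0))
                cleared++;
        }
    }
    if (cleared)
        publish(t);
    return cleared;
}
```

With `sync_each_update` set to 0, `begin_read()` and `end_read()` make no system calls at all; setting it to 1 turns each of them into an `msync()`, which is the cost described above for NFS.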
Re: [PATCH 02/16] refs: add methods for misc ref operations
Junio C Hamano pobox.com> writes:

> David Turner twopensource.com> writes:
>
> >  struct ref_be {
> >      struct ref_be *next;
> >      const char *name;
> >      ref_transaction_commit_fn *transaction_commit;
> > +
> > +    pack_refs_fn *pack_refs;
> > +    peel_ref_fn *peel_ref;
> > +    create_symref_fn *create_symref;
> > +
> > +    resolve_ref_unsafe_fn *resolve_ref_unsafe;
> > +    verify_refname_available_fn *verify_refname_available;
> > +    resolve_gitlink_ref_fn *resolve_gitlink_ref;
> >  };
>
> This may have been pointed out in the previous reviews by somebody else, but I think it is more customary to declare a struct member that is a pointer to a customization function without leading '*', i.e.
>
>     typedef TYPE (*customize_fn)(ARGS);
>
>     struct vtable {
>         ...
>         customize_fn fn;
>         ...
>     };
>
> in our codebase (cf. string_list::cmp, prio_queue::compare).

(LMDB author here, just passing by)

IMO you're making a mistake. You should always typedef the thing itself, not a pointer to the thing. If you only typedef the pointer, you can only use the typedef to declare pointers and never the thing itself. If you typedef the actual thing, you can use the typedef to declare both the thing and pointer to thing.

It's particularly useful to have function typedefs because later on you can use the actual typedef to declare instances of that function type in header files, and guarantee that your function definitions match what they're intended to match. Otherwise, assigning a bare (*func) to something that mismatches only generates a warning, instead of an error.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
-- 
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
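A small illustration of the style recommended above, using hypothetical names (`compare_fn`, `by_name`, `by_length`, `struct sorter`, `min_index`): typedef the function type itself, declare the implementations through that typedef, and write the `*` only where a pointer is stored. If a definition later drifts from `compare_fn`, the compiler rejects it as a conflicting declaration, rather than, depending on compiler and flags, merely warning about an incompatible pointer assignment. A pointer typedef such as `typedef int (*compare_ptr)(const char *, const char *);` could not be used to write the `by_name` / `by_length` prototypes.

```c
#include <stddef.h>
#include <string.h>

/* Typedef the function type itself, not a pointer to it. */
typedef int compare_fn(const char *a, const char *b);

/* Prototypes declared through the typedef: a definition below that does not
 * match compare_fn is a "conflicting types" error, not just a warning. */
compare_fn by_name;
compare_fn by_length;

/* The '*' appears only where a pointer is actually stored. */
struct sorter {
    compare_fn *cmp;
};

int by_name(const char *a, const char *b)
{
    return strcmp(a, b);
}

int by_length(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    return (la > lb) - (la < lb);
}

/* Index of the smallest element of v[0..n-1] under s->cmp (requires n >= 1). */
size_t min_index(const struct sorter *s, const char *const *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (s->cmp(v[i], v[best]) < 0)
            best = i;
    return best;
}
```

For example, with `{ "refs/tags/v1", "refs/heads/main" }`, a sorter initialized as `{ by_name }` picks index 1 and one initialized as `{ by_length }` picks index 0.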
Wishlist: git fetch --reference
I maintain multiple copies of the same repo because I keep each one checked out to different branch/rev levels. It would be nice if, similar to clone --reference, we could also use git fetch --reference to reference a local repo when doing a fetch to pull in updates.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
Re: Wishlist: git fetch --reference
Jeff King wrote:
> On Thu, Aug 21, 2014 at 07:57:47PM -0700, Howard Chu wrote:
>
>> I maintain multiple copies of the same repo because I keep each one checked out to different branch/rev levels. It would be nice if, similar to clone --reference, we could also use git fetch --reference to reference a local repo when doing a fetch to pull in updates.
>
> I think it is just spelled:
>
>     echo $reference_repo >>.git/objects/info/alternates
>     git fetch
>
> We need --reference with clone because that first line needs to happen after clone runs git init but before it runs git fetch.
>
> And if you cloned with --reference, of course, the alternates file remains and further fetches will automatically use it.

Aha, thanks, hadn't realized that. Just checked and yes, the alternates file is already set in all of these different copies.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
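The alternates file Jeff points to is plain text: per git's repository-layout documentation, each line names another object directory that this repository may borrow objects from. A C sketch of appending an entry, using a hypothetical helper `add_alternate()` that also skips paths already listed, so re-running it across several checkouts of the same repo stays harmless (the paths in the usage below are made up):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: add objdir to an alternates file (one path per line)
 * unless it is already listed.  Returns 1 if appended, 0 if already present,
 * -1 on I/O error. */
int add_alternate(const char *alternates_path, const char *objdir)
{
    FILE *f = fopen(alternates_path, "r");
    if (f) {                        /* a missing file just means "no entries yet" */
        char *line = NULL;
        size_t cap = 0;
        ssize_t len;
        int found = 0;
        while (!found && (len = getline(&line, &cap, f)) != -1) {
            if (len > 0 && line[len - 1] == '\n')
                line[--len] = '\0'; /* strip the trailing newline */
            found = strcmp(line, objdir) == 0;
        }
        free(line);
        fclose(f);
        if (found)
            return 0;
    }
    f = fopen(alternates_path, "a"); /* creates the file if needed */
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", objdir) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 1 : -1;
}
```

Calling `add_alternate(".git/objects/info/alternates", "/srv/git/base.git/objects")` twice appends the line once and returns 0 the second time.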