Re: [PATCH] Alternate object pool mechanism updates.
On Tue, 16 Aug 2005, Junio C Hamano wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
> > We've got a "git prune-packed", it would be good to have a "git
> > prune-alternate" or something equivalent.
>
> If you have GIT_ALTERNATE_DIRECTORIES environment variable, git
> prune-packed will remove objects from your repository if it is
> found in somebody else's pack.  I am not sure if this is the
> behaviour we would want.

Well, it may be good enough if the "master" repository is regularly
packed..

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Alternate object pool mechanism updates.
Linus Torvalds <[EMAIL PROTECTED]> writes:

> We've got a "git prune-packed", it would be good to have a "git
> prune-alternate" or something equivalent.

If you have GIT_ALTERNATE_DIRECTORIES environment variable, git
prune-packed will remove objects from your repository if it is
found in somebody else's pack.  I am not sure if this is the
behaviour we would want.
Re: [PATCH] Alternate object pool mechanism updates.
Linus Torvalds <[EMAIL PROTECTED]> writes:

> Btw, looking at the code, it strikes me that using ":" to separate the
> alternate object directories in the file is rather strange.

Yes, I admit that one was done in a quick and dirty way.  Patches
welcome [*1*] ;-)

> Anyway, I don't think "alternates" is necessarily sensible as a "object"
> information. Sure, it's about alternate objects, but the thing is, object
> directories can be shared across many projects, but "alternates" to me
> makes more sense as a per-project thing.

Well, I have to think about this a bit more, but I have to say
there was some thinking behind the way things are right now.

$GIT_DIR/info describes properties of the repository.  That's
why refs, graft and rev-cache go there.

$GIT_OBJECT_DIRECTORY/info describes the properties of the
object pool (I am inventing words as I speak, but an object pool
is a directory that can be combined with other object pools to
make an object database).  So objects/info/packs talks about the
packs in it, but not about packs it borrows from its alternates.
The alternates file in question talks about what other object
pools you need to consult to obtain the objects it refers to but
lacks itself.

If two repositories share a particular object pool as their
.git/objects directory, by symlinking .git/objects or by using
the GIT_OBJECT_DIRECTORY environment variable, it does not matter
from which repository you look at this object pool.  The set of
objects it refers to but lacks itself, and from which other
pools these objects can be obtained, do not depend on from which
repository you are looking at it.

While I agree with everything you said about "maybe logical but
confusing", I have to disagree with you about this one.

> What this all is leading up to is that I think we'd be better off with a
> totally new "git config" file, in ".git/config", and we'd have all the
> startup configuration there.

I think what _is_ lacking is an easy way to have per repository
configuration that can be shared among "opt-in" developers.  The
graft file naturally falls into this category, and probably the
Porcelain standard .git/info/exclude file as well.

Although we ended up doing .git/hooks, it is a per repository
thing and logically it _could_ be moved to .git/info/hooks
[*2*].  And that might also be a nice thing to share among
"opt-in" developers.

[Footnote]

*1* Sorry I could not resist --- I always wanted to say this.

*2* I do not think we _should_ move it under .git/info, though.
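To make the "alternates belong to the object pool" argument concrete, here is a minimal sketch in C.  It is not code from the patch: struct object_pool, pool_has() and find_object() are hypothetical names, object ids are plain strings, and it assumes a pool consults only the alternates it lists itself (it does not follow the alternates of its alternates).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an object pool; not a structure from git. */
struct object_pool {
	const char *dir;				/* e.g. ".git/objects" */
	const char *const *objects;			/* NULL-terminated ids held here */
	const struct object_pool *const *alternates;	/* NULL-terminated, may be NULL */
};

/* Does this pool itself hold the object? */
static int pool_has(const struct object_pool *pool, const char *id)
{
	const char *const *obj;

	for (obj = pool->objects; obj && *obj; obj++)
		if (!strcmp(*obj, id))
			return 1;
	return 0;
}

/*
 * Return the pool that supplies id: the pool itself first, then each
 * alternate it lists.  Only direct alternates are consulted (assumption).
 */
static const struct object_pool *find_object(const struct object_pool *pool,
					     const char *id)
{
	const struct object_pool *const *alt;

	assert(pool != NULL && id != NULL);
	if (pool_has(pool, id))
		return pool;
	for (alt = pool->alternates; alt && *alt; alt++)
		if (pool_has(*alt, id))
			return *alt;
	return NULL;
}
```

The result depends only on the pool and the alternates recorded with it, so two repositories that share the pool get the same answer, which is the point made above.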
Re: [PATCH] Alternate object pool mechanism updates.
On Tue, 16 Aug 2005, Linus Torvalds wrote:

> Finally, I have to say that that "info" directory is confusing. Namely,
> there's two of them - the "git info" and the "object info" directories are
> totally different directories - maybe logical, but to me it smells like
> "info" is here a code-name for "misc files that don't make sense anywhere
> else".
>
> What this all is leading up to is that I think we'd be better off with a
> totally new "git config" file, in ".git/config", and we'd have all the
> startup configuration there. Including things like alternate object
> directories, perhaps standard preferences for that particular repo, and
> things like the "grafts" thing.
>
> Wouldn't that be nice?

I'd originally proposed the .git/info directory because I keep multiple
working trees for the same repository, by having symlinks for .git/objects
and .git/refs, and I could also get other per-repository things to be
shared properly without knowing exactly what they are if they're in a
subdirectory of .git that could be a symlink. This would mean that a
".git/config" would be per-working-tree, like .git/index or .git/HEAD, not
per-repository like ".git/info/config". Of course, the core didn't have
anything to go in .git/info at the time, so it didn't really get tacked
down.

(I find it convenient to have mainline and my latest work both checked out
for reference while I'm generating a series of commits for a patch set,
and I don't want three different repositories which could be out of sync;
this also keeps the repository safely out of pwd, since I have the actual
repositories as ~/git/{project}.git/)

	-Daniel
*This .sig left intentionally blank*
Re: [PATCH] Alternate object pool mechanism updates.
On Sun, 14 Aug 2005, Junio C Hamano wrote:

> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
> > I think this is great - especially for places like kernel.org, where a lot
> > of repos end up being related to each other, yet independent.
>
> Yes.  There is one shortcoming in the current git-clone -s in
> the proposed updates branch.  If the parent repository has
> alternates on its own, that information should be copied to the
> cloned one as well (e.g. Jeff has alternates pointing at you,
> and I clone from Jeff with -s flag --- I should list not just
> Jeff but also you to borrow from in my alternates file).

Btw, looking at the code, it strikes me that using ":" to separate the
alternate object directories in the file is rather strange.

Maybe allow a different format for the file? Or at least allow '\n' as an
alternate separator (but it would be nice to allow comments too).

Finally, I have to say that that "info" directory is confusing. Namely,
there's two of them - the "git info" and the "object info" directories are
totally different directories - maybe logical, but to me it smells like
"info" is here a code-name for "misc files that don't make sense anywhere
else".

Anyway, I don't think "alternates" is necessarily sensible as a "object"
information. Sure, it's about alternate objects, but the thing is, object
directories can be shared across many projects, but "alternates" to me
makes more sense as a per-project thing.

What this all is leading up to is that I think we'd be better off with a
totally new "git config" file, in ".git/config", and we'd have all the
startup configuration there. Including things like alternate object
directories, perhaps standard preferences for that particular repo, and
things like the "grafts" thing.

Wouldn't that be nice?

		Linus
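The file format suggested above (':' or '\n' as separator, plus comments) could be parsed roughly as follows.  This is only a sketch under stated assumptions, not what the patch does: for_each_alt_entry() and alt_entry_fn are hypothetical names, and '#' at the start of an entry is assumed to be the comment marker, which nobody in the thread has specified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one entry (not NUL-terminated). */
typedef void (*alt_entry_fn)(const char *entry, size_t len, void *data);

/*
 * Walk an alternates buffer whose entries are separated by ':' or '\n'.
 * An entry starting with '#' is treated as a comment that runs to the
 * end of the line (assumed syntax).  Calls cb for every non-empty entry
 * if cb is not NULL, and returns the number of entries found.
 */
static size_t for_each_alt_entry(const char *buf, size_t size,
				 alt_entry_fn cb, void *data)
{
	const char *cp = buf, *ep = buf + size;
	size_t count = 0;

	assert(buf != NULL);
	while (cp < ep) {
		const char *start = cp;

		if (*cp == '#') {
			/* comment: skip to the end of the line */
			while (cp < ep && *cp != '\n')
				cp++;
		} else {
			while (cp < ep && *cp != ':' && *cp != '\n')
				cp++;
			if (cp != start) {
				if (cb)
					cb(start, (size_t)(cp - start), data);
				count++;
			}
		}
		if (cp < ep)
			cp++;	/* step over the separator */
	}
	return count;
}
```

Because ':' is still accepted, a string in the existing environment variable format parses the same way as before; only the newline and comment handling is new.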
Re: [PATCH] Alternate object pool mechanism updates.
Linus Torvalds <[EMAIL PROTECTED]> writes:

> I think this is great - especially for places like kernel.org, where a lot
> of repos end up being related to each other, yet independent.

Yes.  There is one shortcoming in the current git-clone -s in
the proposed updates branch.  If the parent repository has
alternates on its own, that information should be copied to the
cloned one as well (e.g. Jeff has alternates pointing at you,
and I clone from Jeff with -s flag --- I should list not just
Jeff but also you to borrow from in my alternates file).

> However, exactly for places like kernel.org it would _also_ be nice if
> there was some way to prune objects that have been merged back into the
> parent.

Yes.  Another possibility is to use git-relink which was written
exactly to solve this in a different way.
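The clone -s shortcoming described above amounts to writing the parent's object directory followed by the parent's own alternates into the new alternates file.  A minimal sketch in C, assuming the colon-separated format and absolute paths; inherit_alternates() is a hypothetical helper, not something in git-clone-script:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the alternates string for a clone made
 * with "-s".  parent_objects is the parent's object directory and
 * parent_alts is the contents of the parent's own alternates file
 * (colon separated; may be NULL or empty).  Returns a malloc()ed
 * string the caller must free, or NULL on allocation failure.
 */
static char *inherit_alternates(const char *parent_objects,
				const char *parent_alts)
{
	size_t len, alts_len;
	char *out;

	assert(parent_objects != NULL);
	len = strlen(parent_objects);
	alts_len = parent_alts ? strlen(parent_alts) : 0;

	/* parent dir + ':' + parent's alternates + NUL */
	out = malloc(len + 1 + alts_len + 1);
	if (!out)
		return NULL;
	memcpy(out, parent_objects, len);
	if (alts_len) {
		out[len++] = ':';
		memcpy(out + len, parent_alts, alts_len);
		len += alts_len;
	}
	out[len] = '\0';
	return out;
}
```

In the Jeff example, cloning from Jeff would yield Jeff's object directory followed by the directory Jeff already borrows from.  Relative paths in the parent's file would need to be resolved first; the sketch does not attempt that.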
Re: [PATCH] Alternate object pool mechanism updates.
On Sun, 14 Aug 2005, Junio C Hamano wrote:
>
> Ok, so the one in the proposed updates branch says
> info/alternates.
>
> With this, your recent cg-clone -l can be made to still use
> individual .git/object/??/ hierarchy to keep objects newly
> created in each repository while sharing the inherited objects
> from the parent repository, which would probably alleviate the
> multi-user environment worries you express in the comments for
> the option.  The git-clone-script in the proposed updates branch
> has such a change.

I think this is great - especially for places like kernel.org, where a lot
of repos end up being related to each other, yet independent.

However, exactly for places like kernel.org it would _also_ be nice if
there was some way to prune objects that have been merged back into the
parent. In other words, imagine that people start using my kernel tree as
their source of "alternate" objects, which works wonderfully well, but
then as I pull from them, nothing ever removes the objects that are now
duplicate.

We've got a "git prune-packed", it would be good to have a "git
prune-alternate" or something equivalent.

		Linus
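The core check a "git prune-alternate" would need is whether an object already exists in an alternate pool.  The following is a sketch only: no such command exists in this thread, loose_object_path() and alternate_has_loose_object() are hypothetical names, and it looks only at loose objects under the xx/ fan-out directories, ignoring packs in the alternate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build "<objdir>/xx/<remaining 38 hex digits>" for a 40-digit hex
 * SHA1.  Returns 0 on success, -1 if hex is not 40 characters long
 * or the path does not fit in buf.
 */
static int loose_object_path(char *buf, size_t size,
			     const char *objdir, const char *hex)
{
	int n;

	assert(buf != NULL && objdir != NULL && hex != NULL);
	if (strlen(hex) != 40)
		return -1;
	n = snprintf(buf, size, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Does the alternate pool hold this object as a loose file? */
static int alternate_has_loose_object(const char *alt_objdir, const char *hex)
{
	char path[4096];

	if (loose_object_path(path, sizeof(path), alt_objdir, hex) < 0)
		return 0;
	return access(path, F_OK) == 0;
}
```

A pruning tool would walk the local xx/ directories and unlink each file for which this returns true.  Whether that is safe depends on the alternate never losing the object afterwards, which is the concern raised about relying on a regularly packed "master" repository.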
Re: [PATCH] Alternate object pool mechanism updates.
Petr Baudis <[EMAIL PROTECTED]> writes:

> What about calling it rather info/alternates (or info/alternate)? It
> looks better, sounds better, is more namespace-ecological tab-completes
> fine and you don't type it that often anyway. :-)

Ok, so the one in the proposed updates branch says
info/alternates.

With this, your recent cg-clone -l can be made to still use
individual .git/object/??/ hierarchy to keep objects newly
created in each repository while sharing the inherited objects
from the parent repository, which would probably alleviate the
multi-user environment worries you express in the comments for
the option.  The git-clone-script in the proposed updates branch
has such a change.
Re: [PATCH] Alternate object pool mechanism updates.
Petr Baudis <[EMAIL PROTECTED]> writes:

> What about calling it rather info/alternates (or info/alternate)? It
> looks better, sounds better, is more namespace-ecological tab-completes
> fine and you don't type it that often anyway. :-)

Thanks for the suggestion.  Will fix and keep it in the pu
branch for now just in case somebody else suggests a name even
better.
Re: [PATCH] Alternate object pool mechanism updates.
Dear diary, on Sat, Aug 13, 2005 at 11:09:13AM CEST, I got a letter
where Junio C Hamano <[EMAIL PROTECTED]> told me that...
> It was a mistake to use GIT_ALTERNATE_OBJECT_DIRECTORIES
> environment variable to specify what alternate object pools to
> look for missing objects when working with an object database.
> It is not a property of the process running the git commands,
> but a property of the object database that is partial and needs
> other object pools to complete the set of objects it lacks.
>
> This patch allows you to have $GIT_OBJECT_DIRECTORY/info/alt
> file whose contents is in exactly the same format as the
> environment variable, to let an object database name alternate
> object pools it depends on.
>
> Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>

What about calling it rather info/alternates (or info/alternate)? It
looks better, sounds better, is more namespace-ecological tab-completes
fine and you don't type it that often anyway. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone.  -- Alan Cox
[PATCH] Alternate object pool mechanism updates.
It was a mistake to use GIT_ALTERNATE_OBJECT_DIRECTORIES
environment variable to specify what alternate object pools to
look for missing objects when working with an object database.
It is not a property of the process running the git commands,
but a property of the object database that is partial and needs
other object pools to complete the set of objects it lacks.

This patch allows you to have $GIT_OBJECT_DIRECTORY/info/alt
file whose contents is in exactly the same format as the
environment variable, to let an object database name alternate
object pools it depends on.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>
---

 cache.h      |    5 +-
 fsck-cache.c |    8 ++-
 sha1_file.c  |  146 --
 3 files changed, 88 insertions(+), 71 deletions(-)

8150a422f79cc461316052b52263289b851d4820
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -278,9 +278,10 @@ struct checkout {
 extern int checkout_entry(struct cache_entry *ce, struct checkout *state);
 
 extern struct alternate_object_database {
-	char *base;
+	struct alternate_object_database *next;
 	char *name;
-} *alt_odb;
+	char base[0]; /* more */
+} *alt_odb_list;
 extern void prepare_alt_odb(void);
 
 extern struct packed_git {
diff --git a/fsck-cache.c b/fsck-cache.c
--- a/fsck-cache.c
+++ b/fsck-cache.c
@@ -456,13 +456,13 @@ int main(int argc, char **argv)
 	fsck_head_link();
 	fsck_object_dir(get_object_directory());
 	if (check_full) {
-		int j;
+		struct alternate_object_database *alt;
 		struct packed_git *p;
 		prepare_alt_odb();
-		for (j = 0; alt_odb[j].base; j++) {
+		for (alt = alt_odb_list; alt; alt = alt->next) {
 			char namebuf[PATH_MAX];
-			int namelen = alt_odb[j].name - alt_odb[j].base;
-			memcpy(namebuf, alt_odb[j].base, namelen);
+			int namelen = alt->name - alt->base;
+			memcpy(namebuf, alt->base, namelen);
 			namebuf[namelen - 1] = 0;
 			fsck_object_dir(namebuf);
 		}
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -222,84 +222,100 @@ char *sha1_pack_index_name(const unsigne
 	return base;
 }
 
-struct alternate_object_database *alt_odb;
+struct alternate_object_database *alt_odb_list;
+static struct alternate_object_database **alt_odb_tail;
 
 /*
  * Prepare alternate object database registry.
- * alt_odb points at an array of struct alternate_object_database.
- * This array is terminated with an element that has both its base
- * and name set to NULL.  alt_odb[n] comes from n'th non-empty
- * element from colon separated ALTERNATE_DB_ENVIRONMENT environment
- * variable, and its base points at a statically allocated buffer
- * that contains "/the/directory/corresponding/to/.git/objects/...",
- * while its name points just after the slash at the end of
- * ".git/objects/" in the example above, and has enough space to hold
- * 40-byte hex SHA1, an extra slash for the first level indirection,
- * and the terminating NUL.
- * This function allocates the alt_odb array and all the strings
- * pointed by base fields of the array elements with one xmalloc();
- * the string pool immediately follows the array.
+ *
+ * The variable alt_odb_list points at the list of struct
+ * alternate_object_database.  The elements on this list come from
+ * non-empty elements from colon separated ALTERNATE_DB_ENVIRONMENT
+ * environment variable, and $GIT_OBJECT_DIRECTORY/info/alt file,
+ * whose contents is exactly in the same format as that environment
+ * variable.  Its base points at a statically allocated buffer that
+ * contains "/the/directory/corresponding/to/.git/objects/...", while
+ * its name points just after the slash at the end of ".git/objects/"
+ * in the example above, and has enough space to hold 40-byte hex
+ * SHA1, an extra slash for the first level indirection, and the
+ * terminating NUL.
  */
-void prepare_alt_odb(void)
+static void link_alt_odb_entries(const char *alt, const char *ep)
 {
-	int pass, totlen, i;
 	const char *cp, *last;
-	char *op = NULL;
-	const char *alt = gitenv(ALTERNATE_DB_ENVIRONMENT) ? : "";
+	struct alternate_object_database *ent;
+
+	last = alt;
+	do {
+		for (cp = last; cp < ep && *cp != ':'; cp++)
+			;
+		if (last != cp) {
+			/* 43 = 40-byte + 2 '/' + terminating NUL */
+			int pfxlen = cp - last;
+			int entlen = pfxlen + 43;
+
+			ent = xmalloc(sizeof(*ent) + entlen);
+			*alt_odb_tail = ent;
+			alt_odb_tail = &(ent->next);
+			ent->next = NULL;
+
+			memcpy(ent->base, last, p