Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
Responding to a few comments... On 2016-08-14 7:44 AM, Christian Couder wrote: multiple_commits) ... but here multiple_commits is the last argument. It would be better if it was more consistent. (Johannes made the same comment.) Yes. Will do. multiple_commits = (todo_list->next) != NULL; Why not "last_commit" instead of "multiple_commits"? Because it *isn't*. You can see that in pick_commits(), I set multiple_commits outside of the `for todo_list` loop. It is not re-evaluated at every iteration of the loop. As per my comment when emailing the patch "I intentionally print the '--continue' hint even in the case where it's last of n commits that fails. " I think it makes much more sense that "this is the message you always get when cherry-picking multiple commits as opposed to "this is the message you sometimes get, except when it's the last one". (Yes, the careful observer will realize that if when cherry-picking multiple commits, there are conflicts in the second-last and last then the --continue from the second-last will result in multiple_commits being set to 0. I can live with that.) On 2016-08-16 4:44 AM, Remi Galan Alfonso wrote: Hi Stephen, Stephen Morton writes: +if (multiple_commits) + advise(_("after resolving the conflicts, mark the corrected paths with 'git add ' or 'git rm '\n" +"then continue with 'git %s --continue'\n" +"or cancel with 'git %s --abort'" ), action_name(opts), action_name(opts)); +else +advise(_("after resolving the conflicts, mark the corrected paths\n" +"with 'git add ' or 'git rm '\n" +"and commit the result with 'git commit'")); In both cases (multiple_commits or not), the beginning of the advise is nearly the same, with only a '\n' in the middle being the difference: multiple_commits: "after resolving the conflicts, mark the corrected paths with 'git add ' or 'git rm '\n" !multiple_commits: "after resolving the conflicts, mark the corrected paths\n with 'git add ' or 'git rm '\n" ~~~^ In 'multiple_commits' case the advise is more than 80 characters long, did you forget the '\n' in that case? A previous comment had indicated that having 4 lines was too many. And I tend to agree. So I tried to squash it into 3. Back in xterm days, 80 characters was sacrosanct, but is it really a big deal to exceed it now? On 2016-08-14 7:44 AM, Christian Couder wrote: ...but please try to send a real patch. There is https://github.com/git/git/blob/master/Documentation/SubmittingPatches and also SubmitGit that can help you do that. Agreed. I just want to send a patch that stands a reasonable chance of getting accepted. Stephen -- Stephen Morton, 7750 SR Product Group, SW Development Tools/DevOps w: +1-613-784-6026 (int: 2-825-6026) m: +1-613-302-2589 | EST Time Zone -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
On Mon, Aug 1, 2016 at 5:12 AM, Johannes Schindelin wrote: > Hi Stephen, > > On Wed, 27 Jul 2016, Stephen Morton wrote: > >> On Wed, Jul 27, 2016 at 11:03 AM, Johannes Schindelin >> wrote: >> > >> > On Wed, 27 Jul 2016, Stephen Morton wrote: >> > >> >> diff --git a/sequencer.c b/sequencer.c >> >> index cdfac82..ce06876 100644 >> >> --- a/sequencer.c >> >> +++ b/sequencer.c >> >> @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct >> >> replay_opts *opts) >> >> else >> >> advise(_("after resolving the conflicts, mark >> >> the corrected paths\n" >> >> "with 'git add ' or 'git rm >> >> '\n" >> >> -"and commit the result with 'git >> >> commit'")); >> >> +"then continue the %s with 'git %s >> >> --continue'\n" >> >> +"or cancel the %s operation with 'git >> >> %s --abort'" ), action_name(opts), action_name(opts), >> >> action_name(opts), action_name(opts)); >> > >> > That is an awful lot of repetition right there, with an added >> > inconsistency that the action is referred to by its name alone in the >> > "--continue" case, but with "operation" added in the "--abort" case. >> > >> > And additionally, in the most common case (one commit to cherry-pick), the >> > advice now suggests a more complicated operation than necessary: a simply >> > `git commit` would be enough, then. >> > >> > Can't we have a test whether this is the last of the commits to be >> > cherry-picked, and if so, have the simpler advice again? >> >> Ok, knowing that I'm not on the last element of the sequencer is >> beyond my git code knowledge. > > Oh, my mistake: I meant to say that this information could be easily > provided by `pick_commits()` if it passed it to `print_advice()` via > `do_pick_commit()`. > > Ciao, > Johannes (Finally getting back to this.) Something like this, then Johannes? (I intentionally print the '--continue' hint even in the case where it's last of n commits that fails.) ~/ws/extern/git (maint *%>) > git diff diff --git a/sequencer.c b/sequencer.c index 617a3df..e0071aa 100644 --- a/sequencer.c +++ b/sequencer.c @@ -154,7 +154,7 @@ static void free_message(struct commit *commit, struct commit_message *msg) unuse_commit_buffer(commit, msg->message); } -static void print_advice(int show_hint, struct replay_opts *opts) +static void print_advice(int show_hint, int multiple_commits, struct replay_opts *opts) { char *msg = getenv("GIT_CHERRY_PICK_HELP"); @@ -174,14 +174,14 @@ static void print_advice(int show_hint, struct replay_opts *opts) advise(_("after resolving the conflicts, mark the corrected paths\n" "with 'git add ' or 'git rm '")); else -if (! file_exists(git_path_seq_dir())) -advise(_("after resolving the conflicts, mark the corrected paths\n" -"with 'git add ' or 'git rm '\n" - "and commit the result with 'git commit'")); -else +if (multiple_commits) advise(_("after resolving the conflicts, mark the corrected paths with 'git add ' or 'git rm '\n" "then continue with 'git %s --continue'\n" "or cancel with 'git %s --abort'" ), action_name(opts), action_name(opts)); +else +advise(_("after resolving the conflicts, mark the corrected paths\n" +"with 'git add ' or 'git rm '\n" +"and commit the result with 'git commit'")); } } @@ -445,7 +445,7 @@ static int allow_empty(struct replay_opts *opts, struct commit *commit) return 1; } -static int do_pick_commit(struct commit *commit, struct replay_opts *opts) +static int do_pick_commit(struct commit *commit, struct replay_opts *opts, i
Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
On Mon, Aug 1, 2016 at 5:12 AM, Johannes Schindelin wrote: Hi Stephen, On Wed, 27 Jul 2016, Stephen Morton wrote: On Wed, Jul 27, 2016 at 11:03 AM, Johannes Schindelin wrote: > > On Wed, 27 Jul 2016, Stephen Morton wrote: > >> diff --git a/sequencer.c b/sequencer.c >> index cdfac82..ce06876 100644 >> --- a/sequencer.c >> +++ b/sequencer.c >> @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct >> replay_opts *opts) >> else >> advise(_("after resolving the conflicts, mark >> the corrected paths\n" >> "with 'git add ' or 'git rm '\n" >> - "and commit the result with 'git commit'")); >> + "then continue the %s with 'git %s >> --continue'\n" >> + "or cancel the %s operation with 'git >> %s --abort'" ), action_name(opts), action_name(opts), >> action_name(opts), action_name(opts)); > > That is an awful lot of repetition right there, with an added > inconsistency that the action is referred to by its name alone in the > "--continue" case, but with "operation" added in the "--abort" case. > > And additionally, in the most common case (one commit to cherry-pick), the > advice now suggests a more complicated operation than necessary: a simply > `git commit` would be enough, then. > > Can't we have a test whether this is the last of the commits to be > cherry-picked, and if so, have the simpler advice again? Ok, knowing that I'm not on the last element of the sequencer is beyond my git code knowledge. Oh, my mistake: I meant to say that this information could be easily provided by `pick_commits()` if it passed it to `print_advice()` via `do_pick_commit()`. Ciao, Johannes Formatting on previous email was terrible, plus the diff wasn't performed against origin. Re-sending. (Finally getting back to this.) Something like the diff below, then Johannes? (I intentionally print the '--continue' hint even in the case whereit's last of n commits that fails.) Stephen ~/ws/extern/git (maint *%>) > git diff @{u} diff --git a/sequencer.c b/sequencer.c index c6362d6..e0071aa 100644 --- a/sequencer.c +++ b/sequencer.c @@ -154,7 +154,7 @@ static void free_message(struct commit *commit, struct commit_message *msg) unuse_commit_buffer(commit, msg->message); } -static void print_advice(int show_hint, struct replay_opts *opts) +static void print_advice(int show_hint, int multiple_commits, struct replay_opts *opts) { char *msg = getenv("GIT_CHERRY_PICK_HELP"); @@ -174,9 +174,14 @@ static void print_advice(int show_hint, struct replay_opts *opts) advise(_("after resolving the conflicts, mark the corrected paths\n" "with 'git add ' or 'git rm '")); else - advise(_("after resolving the conflicts, mark the corrected paths\n" -"with 'git add ' or 'git rm '\n" -"and commit the result with 'git commit'")); +if (multiple_commits) + advise(_("after resolving the conflicts, mark the corrected paths with 'git add ' or 'git rm '\n" +"then continue with 'git %s --continue'\n" +"or cancel with 'git %s --abort'" ), action_name(opts), action_name(opts)); +else +advise(_("after resolving the conflicts, mark the corrected paths\n" +"with 'git add ' or 'git rm '\n" +"and commit the result with 'git commit'")); } } @@ -440,7 +445,7 @@ static int allow_empty(struct replay_opts *opts, struct commit *commit) return 1; } -static int do_pick_commit(struct commit *commit, struct replay_opts *opts) +static int do_pick_commit(struct commit *commit, struct replay_opts *opts, int multiple_commits) { unsigned char head[20]; struct commit *base, *next, *parent; @@ -595,7 +600,7 @@ static int do_pick_commit(struct commit *commit, struct replay_opts *opts) : _("could not apply %s... %s"), find_unique_abbrev(commit->object.oid.hash, DEFAULT_ABBREV),
Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
On 2016-08-03 3:54 PM, Junio C Hamano wrote: Jeff King writes: I agree it would be a good property to have. I think it's hard to do atomically, though. Certainly we can wait to tell the other side "your push has been recorded" until after the hook is run. But we would already have updated the refs locally at that point (and we must -- that is part of the interface to the post-receive hooks, that the refs are already in place). So would we roll-back the ref update then? Even that suffers from power failures, etc. So I'm not sure if making it truly atomic is all the feasible. As long as the requirement is that post- hook must see the updated ref in place, I do not think it is feasible to give "the hook always runs once" guarantee, without cooperation by other parts of the flow (think: pulling the power at an arbitrary point in the process). A receiving repository can implement it all in the userland, I would think, though: * A pre-receive hook records the intention to update a ref (from what old commit to what new commit), and does not return until that record is securely in a database; * A post-receive hook checks the entry in the database above (it _must_ find one), and atomically does its thing and marks the entry "done"; * A separate sweeper scans the database for entries not yet marked as "done", sees if the ref has been already updated, and atomically does its thing and marks the entry "done" (the same can be done as part of a post-receive for previously pushed commit that pre-receive recorded but did not manage to run post-receive before the power was pulled or the user did \C-c). As you originally described, the non-atomicity is not new; as long as we have necessary hooks in place on the git-core side for repositories that want a stronger guarantee, I do not think there is any more thing we need to do on this topic. If we can narrow the window in a non-intrusive way, that would be a good thing to do, though. I certainly understand not being able to make it atomic when faced with say "pulling the power at an arbitrary point in the process". That seems to me almost along the lines of disaster recovery contingency plans. But could we not guarantee that if there is no problem on the receiving end, that "IF a commit is received and the ref updated, THEN the post-receive hook is guaranteed to run". The not-so-uncommon situation where I saw this was where a user had second-thoughts and hit Ctrl-C in the middle of a push. The push went through --the ref was updated-- but the post-receive hooks were not run. Steve -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
On 2016-07-25 6:22 PM, Jeff King wrote: On Mon, Jul 25, 2016 at 12:34:04PM +0200, Jan Smets wrote: I have always assumed the post-receive hook to be executed whenever a commit is "accepted" by the (gitolite) server. That does not seem to be true any more. Generally, yeah, I would expect that to be the case, too. Since 9658846 is appears that, when a client bails out, the pre-receive hook continues to run and the commit is written to the repository, but no post-receive hook is executed. No signal of any kind is received in the hook, not even a sig pipe when the post- hook is writing to stdout whilst the client has disconnected. I see. The problem is that cmd_receive_pack() does this: execute_commands(commands, unpack_status, &si); if (pack_lockfile) unlink_or_warn(pack_lockfile); if (report_status) report(commands, unpack_status); run_receive_hook(commands, "post-receive", 1); run_update_post_hook(commands); It reports the status to the client, and _then_ runs the post-receive hook. But if that reporting fails (either because of an error, or if we just get SIGPIPE because the client hung up), then we don't actually run the hooks. Leaving 9658846 out of it entirely, it is always going to be racy whether we notice that the client hung up during the pre-receive step. E.g.: - your pre-receive might not write any output, so the muxer has nothing to write to the other side, and we never notice that the connection closed until we write the status out in report() - if NO_PTHREADS is set, the sideband muxer runs in a sub-process, not a sub-thread. And thus we don't know of a hangup except by checking the result of finish_async(), which we never do. - the client could hang up just _after_ we've written the pre-receive output, but before report() is called, so there's nothing to notice until we're in report() So I think 9658846 just made that race a bit longer, because it means that a write error in the sideband muxer during the pre-receive hook means we return an error via finish_async() rather than unceremoniously calling exit() from a sub-thread. So we have a longer period during which we might actually finish off execute_commands() but not make it out of report(). And the real solution is to make cmd_receive_pack() more robust, and try harder to run the hooks even when the client hangs up or we have some other reporting error (because getting data back to the user is only one use of post-receive hooks; they are also used to queue jobs or do maintenance). But that's a bit tricky, as it requires report() to ignore SIGPIPE, and to stop using write_or_die() or any other functions that can exit (some of which happen at a lower level). Plus if a client does hangup, we don't want our hook to die with SIGPIPE either, so we'd want to proxy the data into /dev/null. -Peff Sounds tricky. I do think it's important, almost a 'data integrity' issue, that IF a commit is received, THEN the post-receive hook must be run. Too much mission-critical stuff is based on post-receive hooks. The alternatives, as I see them --either document that the post-receive hook cannot be fully trusted and that all such uses must change to asynchronous polling, or otherwise just say that nobody should hit Ctrl-C during a push (not even reflexively when their lizard-brain says "Woops, no!") and hope that network issues don't cause the same thing-- are simply not realistic. Stephen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
On Wed, Jul 27, 2016 at 11:03 AM, Johannes Schindelin wrote: > Hi Stephen, > > On Wed, 27 Jul 2016, Stephen Morton wrote: > >> Here is my patch then. (Personally, I would add some capitalization and >> punctuation, but I didn't see much of that in the existing code.) I'm >> not a regular pull-requester, do I do that, or can somebody else handle >> that for me? > > The process of the patch submission is described in > Documentation/SubmittingPatches (yes, it is a bit involved, and it is > slightly easier when you use http://submitgit.herokuapp.com/, but please > note that this process has served us well over one decade). > > Please also note that top-posting is highly discouraged on this list: > > A: Because it messes up the order in which people normally read text. >>Q: Why is top-posting such a bad thing? >>>A: Top-posting. >>>>Q: What is the most annoying thing in e-mail? > > Now to your patch: > >> diff --git a/sequencer.c b/sequencer.c >> index cdfac82..ce06876 100644 >> --- a/sequencer.c >> +++ b/sequencer.c >> @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct >> replay_opts *opts) >> else >> advise(_("after resolving the conflicts, mark >> the corrected paths\n" >> "with 'git add ' or 'git rm >> '\n" >> -"and commit the result with 'git commit'")); >> +"then continue the %s with 'git %s >> --continue'\n" >> +"or cancel the %s operation with 'git >> %s --abort'" ), action_name(opts), action_name(opts), >> action_name(opts), action_name(opts)); > > That is an awful lot of repetition right there, with an added > inconsistency that the action is referred to by its name alone in the > "--continue" case, but with "operation" added in the "--abort" case. > > And additionally, in the most common case (one commit to cherry-pick), the > advice now suggests a more complicated operation than necessary: a simply > `git commit` would be enough, then. > > Can't we have a test whether this is the last of the commits to be > cherry-picked, and if so, have the simpler advice again? > > Ciao, > Johannes Ok, knowing that I'm not on the last element of the sequencer is beyond my git code knowledge. I see that in do_pick_commit() , we do not have a copy of the todo_list. I would assume that would be necessary, but I'm not certain. I could file_exists(git_path_seq_dir()). This works to determine if one or many commits are being cherry-picked / reverted, although it will return true even on the last of n cherry-picks. I think that is still reasonable. I was trying to just take the same text as 'git status' already displays. It could indeed be made more concise. Happy to use the submission process, I just didn't know it. Thanks for letting me know. (Yup, sorry about the top-posting. I just wan't careful.) Stephen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
Here is my patch then. (Personally, I would add some capitalization and punctuation, but I didn't see much of that in the existing code.) I'm not a regular pull-requester, do I do that, or can somebody else handle that for me? diff --git a/sequencer.c b/sequencer.c index cdfac82..ce06876 100644 --- a/sequencer.c +++ b/sequencer.c @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct replay_opts *opts) else advise(_("after resolving the conflicts, mark the corrected paths\n" "with 'git add ' or 'git rm '\n" -"and commit the result with 'git commit'")); +"then continue the %s with 'git %s --continue'\n" +"or cancel the %s operation with 'git %s --abort'" ), action_name(opts), action_name(opts), action_name(opts), action_name(opts)); } } Stephen On Tue, Jul 26, 2016 at 4:44 PM, Stephen Morton wrote: > On Tue, Jul 26, 2016 at 4:30 PM, Jeff King wrote: >> On Tue, Jul 26, 2016 at 01:18:55PM -0700, Stefan Beller wrote: >> >>> > Would it be possible to expand the hint message to tell users to run >>> > 'git cherry-pick --continue' >>> >>> Instead of expanding I'd go for replacing? >>> >>> I'd say the user is tempted for 2 choices, >>> a) aborting (for various reasons) >>> b) fix and continue. >> >> Yeah, I'd agree with this. >> >> I think that advice comes from a time when you could only cherry-pick a >> single commit. These days you can do several in a single run, and that's >> why "git cherry-pick --continue" was invented. >> >> So I think we would need to make sure that the "cherry-pick --continue" >> advice applies in both cases (and that we do not need to give different >> advice depending on whether we are in a single or multiple cherry-pick). >> >> I did some basic tests and it _seems_ to work to use --continue in >> either case. Probably due to 093a309 (revert: allow cherry-pick >> --continue to commit before resuming, 2011-12-10), but I didn't dig. >> >> -Peff > > The 'git status' text for a rebase/am/cherry-pick is > > fix conflicts and then run "git --continue" > use "git --skip" to skip this patch" > use "git --abort" to cancel the operation > > (The --cancel text varies a bit actually, but that's the gist of it.) > > The rebase/cherry-pick conflict case should really indicate how to > mark the conflict as resolved as that's the specific situation the > user is in. I don't know if there are guidelines to hint line length, > or how many actions should be on one line but if the above text was > changed to have this as the "fix" text, possibly over two lines, I > think that would do it. > > fix conflicts with 'git add ' or 'git rm '" and then > run "git --continue" > > Stephen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
On Tue, Jul 26, 2016 at 4:30 PM, Jeff King wrote: > On Tue, Jul 26, 2016 at 01:18:55PM -0700, Stefan Beller wrote: > >> > Would it be possible to expand the hint message to tell users to run >> > 'git cherry-pick --continue' >> >> Instead of expanding I'd go for replacing? >> >> I'd say the user is tempted for 2 choices, >> a) aborting (for various reasons) >> b) fix and continue. > > Yeah, I'd agree with this. > > I think that advice comes from a time when you could only cherry-pick a > single commit. These days you can do several in a single run, and that's > why "git cherry-pick --continue" was invented. > > So I think we would need to make sure that the "cherry-pick --continue" > advice applies in both cases (and that we do not need to give different > advice depending on whether we are in a single or multiple cherry-pick). > > I did some basic tests and it _seems_ to work to use --continue in > either case. Probably due to 093a309 (revert: allow cherry-pick > --continue to commit before resuming, 2011-12-10), but I didn't dig. > > -Peff The 'git status' text for a rebase/am/cherry-pick is fix conflicts and then run "git --continue" use "git --skip" to skip this patch" use "git --abort" to cancel the operation (The --cancel text varies a bit actually, but that's the gist of it.) The rebase/cherry-pick conflict case should really indicate how to mark the conflict as resolved as that's the specific situation the user is in. I don't know if there are guidelines to hint line length, or how many actions should be on one line but if the above text was changed to have this as the "fix" text, possibly over two lines, I think that would do it. fix conflicts with 'git add ' or 'git rm '" and then run "git --continue" Stephen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git cherry-pick conflict error message is deceptive when cherry-picking multiple commits
When I cherry-pick n commits and one of the first (n-1), fails, what I should do is resolve the conflict, 'git add' it, and then run 'git cherry-pick --continue'. However git advises me to error: could not apply d0fb756... Commit message for test commit hint: after resolving the conflicts, mark the corrected paths hint: with 'git add ' or 'git rm ' hint: and commit the result with 'git commit' I suspect this is not just a little bit deceptive but actually a bit destructive, as doing a 'git commit' removes some of the sequencing information. If I do a 'git commit', while there is .git/sequencer information and 'git cherry-pick --continue' will work, there is no indication in 'git status' that a cherry-pick is in progress. It would be extremely easy for somebody to follow cherry-pick's hints by running 'git commit' and think that they were done, not realizing that there were m more commits remaining to be cherry-picked. Would it be possible to expand the hint message to tell users to run 'git cherry-pick --continue' This patch is *not* meant as a serious patch for submission --I'm sure it's all wrong-- it's just a proof of concept and showing some goodwill on my part that I'm trying to help and I'm willing to put in some work if I can be pointed more in the right direction. Regards, Stephen diff --git a/sequencer.c b/sequencer.c index cdfac82..b8fa534 100644 --- a/sequencer.c +++ b/sequencer.c @@ -171,14 +171,21 @@ static void print_advice(int show_hint, struct replay_opts *opts) if (show_hint) { if (opts->no_commit) advise(_("after resolving the conflicts, mark the corrected paths\n" "with 'git add ' or 'git rm '")); - else - advise(_("after resolving the conflicts, mark the corrected paths\n" -"with 'git add ' or 'git rm '\n" -"and commit the result with 'git commit'")); + else if (! file_exists(git_path_seq_dir())) { + advise(_("SCM: after resolving the conflicts, mark the corrected paths\n" + "with 'git add ' or 'git rm '\n" + "and commit the result with 'git commit'")); +} +else +{ +advise(_("SCM: after resolving the conflicts, mark the corrected paths\n" + "with 'git add ' or 'git rm '\n" + "and continue the %s with 'git %s --continue'"), action_name(opts), action_name(opts)); +} } } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git smudge filter fails
On Thu, Mar 10, 2016 at 5:04 PM, Junio C Hamano wrote: > Jeff King writes: > >> On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote: >> >>> I am a bit confused because this is basically the example used in >>> ProGit [1] and it is fundamentally broken. In fact, if I understand >>> correctly, this means that smudge filters cannot be relied upon to >>> provide any 'keyword expansion' type tasks because they will all by >>> nature have to query the file with 'git log'. >> >> Interesting. Perhaps I am missing something (I am far from an expert in >> clean/smudge filters, which I do not generally use myself), but the >> example in ProGit looks kind of bogus to me. I don't think it ever would >> have worked reliably, under any version of git. >> >>> (Note that although in my example I used 'git checkout', with an only >>> slightly more complicated example I can make it fail on 'git pull' >>> which is perhaps a much more realistic use case. That was probably >>> implied in your answer, I just wanted to mention it.) >> >> Yeah, I think the issue is basically the same for several commands which >> update the worktree and the HEAD. Most of them are going to do the >> worktree first. > > You can have a pair of branches A and B that have forked long time > ago, and have a path F that has been changed identically since they > forked, perhaps by cherry-picking the same change. This happens all > the time. > > If there were some mechanism that modifies the checked out version > of F with information that depends on the history that leads to A > (e.g. "which commit that is an ancestor of A last modified F?") > when you check out branch A, it will become invalid when you do "git > checkout B", because the path F will not change because they are the > same between the branches. In short, CVS $Id$-style substitutions > that depend on the history fundamentally does not work, unless you > are willing to always rewrite the whole working tree every time you > switch branches. > > The smudge and clean filters are given _only_ the blob contents and > nothing else, not "which commit (or tree) the blob is taken from", > not "which path this blob sits in that tree-ish", not "what branch > am I on" and this is a very much deliberate design decision made in > order to avoid leading people to a misguided attempt to mimick CVS > $Id$-style substitutions. > I will raise an Issue with ProGit. It's perhaps beyond the scope of my original question, but for situations where I need a "last change date" embedded in a file (e.g. because a protocol standard requires it), is there any recommended way to do so? We've the hard way that hardcoding makes merging/cherry-picking a bit of a nightmare and should be avoided. Is a post-checkout hook the way to go? I've actually found the smudge filter to be very slow for this application as each file is processed in series; a post-commit hook that could operate on files in parallel would likely be substantially faster. Stephen (Sorry about the earlier top-posting. I didn't realize what gmail was doing until after it had happened.) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Sample pre-push hook can crash
Thanks for the information Junio. It is just interesting that although the pre-push hook receives the remote_sha value from the server, it does not get 'git merge-base $remote_sha $local_sha' which is what a check that iterated over all outgoing commits would really need. (I'm sure this is a simple-minded assessment. I don't pretend to have spent even an order of magnitude less time working on git than you have. I'm not trying to be foolishly arrogant.) I do think it would be worth replacing the existing example with yours because the existing example will crash anytime somebody's workspace is up to date. Stephen On Fri, Mar 11, 2016 at 3:04 PM, Junio C Hamano wrote: > Stephen Morton writes: > >> That is interesting, so in the case of a non-ff push, there is no way >> for a pre-push hook to know what is being pushed in order to run? > > If you were up-to-date from the other side once: > > ---A---B---C > > and built one new commit on top: > > ---A---B---C---D > > when you attempt to push it out, there are various possibilities. > > An ff situation is simple. They didn't do anything, so the hook > gets "we'd be updating their ref from C to D", and "rev-list C..D" > would tell you that you would need to make sure D is sane. > > If your push does not fast-forward, that would mean something has > happened on the other side while you were working on D. Perhaps > they accepted another commit that you haven't seen: > > ---A---B---C---E > > and because you haven't fetched from them, even though the hook may > say "we'd be updating their ref from E to D", you haven't heard of > E, you do not know where it would be relative to the history you > have: > > E??? > > ---A---B---C---D > > Or perhaps they themselves rewound their history and they now have > this E at the tip: > > ---A---B---C > \ > E > > But again, because you do not yet know where E is relative to your > history, you cannot say C is where you can stop checking your side > of the history. > > Or perhaps they somehow obtained your D without you pushing into > them (e.g. you pushed to the "next" tree and they fixed your commit > and the result was merged there) and have something like this: > > ---A---B---C---D---E > > In this case, D is not even a new commit from their point of view, > and updating their tip E with your old D would lose the fixup E they > made for D, but again, you do not know what E is yet, you cannot say > "they have this already, so there is no check I need to do". > > In summary, you cannot even say which ones you have need to be > checked. > > If you _are_ using @{u} tracking, then you would know at least they > used to have up to C in their repository to limit your check, but > you cannot unconditionally say ref@{u}.. without making sure ref@{u} > makes sense in the first place. > > An obvious alternative for the sample script would be to instead let > the non-ff push pass the pre-push check (as you may have other > arrangements to forbid non-ff pushes) without actually checking > anything. As this sample script is to serve as a sample, I think > such an advanced consideration of loosening checks depending on the > situation should be left to the end users. > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Sample pre-push hook can crash
That is interesting, so in the case of a non-ff push, there is no way for a pre-push hook to know what is being pushed in order to run? Steve On Thu, Mar 10, 2016 at 4:43 PM, Junio C Hamano wrote: > Stephen Morton writes: > >> The sample pre-push hook provided with git [1] will crash if the local >> repo is not up to date with the remote as $remote_sha is not present >> in the local repo. I'm not sure if this patch is exactly correct, it's >> just provided as an example. >> >> Given that people are likely crafting their own solutions based on the >> examples, it's probably good to get right. > > It's probably good to get right, but I do not think use of @{u} is > making it right, unfortunately. You may not necessarily have @{u} > configured, and you may not even pushing to the configured remote > branch. > > The spirit of the sample hook, I think, is to validate the new > commits you are publishing, so if you cannot even determine which > ones are new and which ones are not, failing the "push" by exiting > with non-zero status is the right behaviour for this sample. > > So perhaps something like this may be more appropriate as an > example. > > templates/hooks--pre-push.sample | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/templates/hooks--pre-push.sample > b/templates/hooks--pre-push.sample > index 6187dbf..7ef6780 100755 > --- a/templates/hooks--pre-push.sample > +++ b/templates/hooks--pre-push.sample > @@ -41,7 +41,12 @@ do > fi > > # Check for WIP commit > - commit=`git rev-list -n 1 --grep '^WIP' "$range"` > + commit=`git rev-list -n 1 --grep '^WIP' "$range"` || { > + # we do not even know about the range... > + echo >&2 "Non-ff update to $remote_ref" > + echo >&2 "fetch from there first" > + exit 1 > + } > if [ -n "$commit" ] > then > echo >&2 "Found WIP commit in $local_ref, not pushing" -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Sample pre-push hook can crash
The sample pre-push hook provided with git [1] will crash if the local repo is not up to date with the remote as $remote_sha is not present in the local repo. I'm not sure if this patch is exactly correct, it's just provided as an example. Given that people are likely crafting their own solutions based on the examples, it's probably good to get right. diff --git a/templates/hooks--pre-push.sample b/templates/hooks--pre-push.sample index 6187dbf..99ed984 100755 --- a/templates/hooks--pre-push.sample +++ b/templates/hooks--pre-push.sample @@ -36,9 +36,9 @@ do # New branch, examine all commits range="$local_sha" else # Update to existing branch, examine new commits - range="$remote_sha..$local_sha" + range="@{u}..$local_sha" fi # Check for WIP commit commit=`git rev-list -n 1 --grep '^WIP' "$range"` (This is just something I noticed and thought you might be interested in, but is not important to me. I actually do care about the smudge filter issue.) Stephen [1] https://github.com/git/git/blob/master/templates/hooks--pre-push.sample -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git smudge filter fails
I am a bit confused because this is basically the example used in ProGit [1] and it is fundamentally broken. In fact, if I understand correctly, this means that smudge filters cannot be relied upon to provide any 'keyword expansion' type tasks because they will all by nature have to query the file with 'git log'. (Note that although in my example I used 'git checkout', with an only slightly more complicated example I can make it fail on 'git pull' which is perhaps a much more realistic use case. That was probably implied in your answer, I just wanted to mention it.) Steve [1] https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes On Wed, Mar 9, 2016 at 8:59 PM, Jeff King wrote: > On Wed, Mar 09, 2016 at 01:29:31PM -0500, Stephen Morton wrote: > >> git config --local filter.dater.smudge 'myDate=`git log >> --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e >> "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"' > > Your filter is running "git log" without a revision parameter, which > means it is looking at HEAD. > > And here > >> git checkout no_foo >> git checkout master >> cat foo.c >> #observe keyword expansion lost > > You are expecting this second one to do: > > 1. Switch HEAD to "master". > > 2. Checkout files which need updating. Looking at HEAD in your filter > then examines "master", and you see the commit timestamp of the > destination. > > But that isn't how it is implemented. Checkout will handle the file > checkout _first_, as that is the part that is likely to run into > problems (e.g., rejecting a switch because it would lose changes in the > working tree). Only at the very end, after the change to the working > tree has succeeded, do we update HEAD. > > I think the order you are expecting is conceptually cleaner, but I don't > think we would want to switch it around (for reasons of efficiency and > robustness). And I don't think we would want to make a promise about the > ordering to callers either way, as it binds our implementation. > > So is there a way to reliably know the destination of a checkout? My > first thought was that we could add a placeholder similar to "%f" that > your filter could use. I'm not sure how we would handle the corner cases > there, though (e.g., do we always have a "destination" to report? If > not, what do we give the script?). > > I suspect you could also hack something together with a post-checkout > script, though it would probably be a lot less efficient (and might also > have some weird corner cases). > > -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git smudge filter fails
A git smudge filter, at least one that relies on the results from 'git log' does not seem to work on file A when doing a 'git update' from a revision where file A doesn't exist to a revision where it does exist. Below is a simple recipe to reproduce. This appears to me to be a bug. If not, why is it expected and is there anything I can do to work around this behaviour? Steve mkdir git_test cd git_test/ git init . touch bar.c git add . git commit -am "Initial commit. foo.c not here yet." git tag no_foo touch foo.c git add . git commit -am "Add foo, no content" echo 'Date is $Date$' >> foo.c git commit -am "Add date to foo.c" echo 'foo.c filter=dater' > .git/info/attributes git config --local filter.dater.smudge 'myDate=`git log --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"' git config --local filter.dater.clean 'sed -e "s/\(\\$\)Date[^\\$]*\\$/\1Date\\$/g"' rm -f foo.c git checkout -- foo.c cat foo.c # observe keyword expansion git checkout no_foo git checkout master cat foo.c #observe keyword expansion lost -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Slow git pushes: sitting 1 minute in pack-objects
On Mon, Mar 16, 2015 at 6:15 PM, Jeff King wrote: > On Mon, Mar 09, 2015 at 09:37:25PM -0400, Stephen Morton wrote: > >> 3. Not sure how long this part takes. It takes 1/3 - 1/2 of the time >> when straced, but I think it's much less, as little as 10s when not >> straced. >> It then reads a bunch of what look like objects from filehandle 0 >> (presumably stdin, read from the remote end?) >> It then tries to lstat() these filenames in .git/ >> ./git/refs/, .git/heads/, etc. It always fails ENOENT. >> It fails some 120,000 times. This could be a problem. Though I haven't >> checked to see if this happens on a fast push on another machine. > > Hmm. The "push" process must feed the set of object boundaries to > "pack-objects" so it knows what to pack (i.e., what we want to send, and > what the other side has). > > 120,000 is an awfully large number of objects to be pass there, though. > Does the repo you are pushing to by any chance have an extremely large > number of refs (e.g., on the order of 120K tags)? No. There are on the order of 9,500 refs (mostly tags) but nowhere near 120k. > > Can you try building git with this patch which would tell us more about > where those objects are coming from? > Those are all rather blunt debugging methods, but hopefully it can get > us a sense of where the time is going. > > -Peff Thanks Peff, I haven't tried your patch, but I tried to backtrack a bit and double-check that the problem always happened in a fresh clone with no other modifications. It did _not_ happen in a new clone --a push took just 5s -- and I think the culprit could have been "repack.writebitmaps=true". Although I had thought writebitmaps was not originally enabled, I now suspect that it was. Let me follow up on that first, before I recompile git with your changes. Thanks again, I really appreciate the help. Steve -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Slow git pushes: sitting 1 minute in pack-objects
Thanks Peff, I've done an strace and here's what I see. I'll try to put relevant information in as legible a form as possible. The operation is cpu-bound on a single core (note that yes, delta compression is using 8 threads. so that's obviously not the bottleneck) for the duration of the pack-objects. Timing is tough, because it's slower when straced. I don't know if it matters, but this push is going to a fork on the remote end most of whose objects are in the objects/info/alternatives location. (Which is still on the local drive of the remote, not a network drive or anything like that.) 1. > GIT_TRACE=1 time time git push 12:21:28.980410 git.c:349 trace: built-in: git 'push' 12:21:29.099547 run-command.c:341 trace: run_command: 'ssh' '-p' '7999' 'git@server.privacy' 'git-receive-pack '\''~smorton/panos.git'\''' 12:21:42.863968 run-command.c:341 trace: run_command: 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' 12:21:42.871170 exec_cmd.c:134 trace: exec: 'git' 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' 12:21:42.907783 git.c:349 trace: built-in: git 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' 2. At this point, "git pack-objects" is forked 3. Not sure how long this part takes. It takes 1/3 - 1/2 of the time when straced, but I think it's much less, as little as 10s when not straced. It then reads a bunch of what look like objects from filehandle 0 (presumably stdin, read from the remote end?) It then tries to lstat() these filenames in .git/ ./git/refs/, .git/heads/, etc. It always fails ENOENT. It fails some 120,000 times. This could be a problem. Though I haven't checked to see if this happens on a fast push on another machine. This takes a lot of time when straced, but I think less time when not straced. 4. Then it just sits there for almost 1 minute. It uses up almost 100% of a single core during this period. It spends a lot of time running lots of brk() (memory allocation) commands and then getting (polled by?) a SIGALRM every 1s. About 5.5+ GB (about the repo size) of VIRT memory is allocated. Only about 400M is ever RES. 5. Then we get to the "Counting objects" phase. Note the number of threads specified: so it Counting objects: 4, done. Delta compression using up to 8 threads. Compressing objects: 100% (4/4), done. ... 6. Time. Note the 121,000 pagefaults 46.08user 0.67system 0:49.47elapsed 94%CPU (0avgtext+0avgdata 1720240maxresident)k 0inputs+16outputs (0major+121199minor)pagefaults 0swaps Steve On Mon, Mar 9, 2015 at 3:53 AM, Jeff King wrote: > On Thu, Mar 05, 2015 at 04:03:07PM -0500, Stephen Morton wrote: > >> I'm experiencing very slow git pushes. On the order of 1 minute to push a >> trivial one-line change. When I set GIT_TRACE=1, I see that it seems to be >> taking a lot of time in the pack-objects phase. > > Can you tell what pack-objects is doing? Perhaps "strace -f -tt" might > help. Or even just "time git push", which will tell us whether we're > spinning on CPU or something else. > > If you watch the progress meter, where does it spend its time? In > "counting objects", "compressing objects", or "writing objects"? Or > after that? > > You said the repo is big. Is it reasonably packed (try running `git > gc`). Pack-objects may look at objects that you know the other side has > in order to create deltas against them. If you have some crazy situation > (e.g., you have several thousand packfiles) that can slow down object > access quite a bit. A minute is more than I would expect, but if VFS > operations in your VM are slow, that could be it. > > -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Slow git pushes: sitting 1 minute in pack-objects
8GB of RAM. Sorry, typo. Steve On Thu, Mar 5, 2015 at 7:25 PM, Duy Nguyen wrote: > On Fri, Mar 6, 2015 at 4:03 AM, Stephen Morton > wrote: >> I'm experiencing very slow git pushes. On the order of 1 minute to push a >> trivial one-line change. When I set GIT_TRACE=1, I see that it seems to be >> taking a lot of time in the pack-objects phase. >> >> Others are not seeing this with the same repo, but I'm the only one working >> in a VM. >> >> ... >> >> Details: >> git version 2.1.4 >> OS: CentOS 6.6 64-bit in a VM. >> repo size: huge. 6 GB .git directory, around 800 MB working tree. >> VM has 8 MB RAM and 8 cores. > > Is it 8 GB or MB RAM? > >> CPU: i7, 8 core (4 cores hyperthreaded) > -- > Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Slow git pushes: sitting 1 minute in pack-objects
(Apologies, after a day I'm cross-posting from git.users. I think the question is maybe too technical for that group.) I'm experiencing very slow git pushes. On the order of 1 minute to push a trivial one-line change. When I set GIT_TRACE=1, I see that it seems to be taking a lot of time in the pack-objects phase. Others are not seeing this with the same repo, but I'm the only one working in a VM. ``` ~/ws/git/repo.1/repo > date; git push mortons; date Wed Mar 4 15:03:11 EST 2015 15:03:11.086758 git.c:349 trace: built-in: git 'push' 'mortons' 15:03:11.126665 run-command.c:341 trace: run_command: 'ssh' '-p' '7999' 'git@privacy.privacy' 'git-receive-pack '\''~mortons/repo.git'\''' 15:03:20.383341 run-command.c:341 trace: run_command: 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' 15:03:20.383945 exec_cmd.c:134 trace: exec: 'git' 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' 15:03:20.385168 git.c:349 trace: built-in: git 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' Counting objects: 4, done. Delta compression using up to 8 threads. Compressing objects: 100% (4/4), done. Writing objects: 100% (4/4), 20.86 KiB | 0 bytes/s, done. Total 4 (delta 0), reused 0 (delta 0) To ssh://git@privacy.privacy:7999/~mortons/repo.git 5fe662f..a137bda my_branch -> my_branch Wed Mar 4 15:04:22 EST 2015 ``` After it was slow at first, I tried setting these which did not help repack.writebitmaps=true pack.windowmemory=100m Details: git version 2.1.4 OS: CentOS 6.6 64-bit in a VM. repo size: huge. 6 GB .git directory, around 800 MB working tree. VM has 8 MB RAM and 8 cores. CPU: i7, 8 core (4 cores hyperthreaded) It is an ext4 filesystem on the guest linux drive. On the host side, it is a .vmdk file and the virtualization software used is virtualbox. While the drive is dynamically allocated, after I ran into this issue, I used fallocate to create a 50GB dummy file and then delete it to ensure that there was headroom in the drive and that dynamic allocation slowness was not the issue, and subsequent pushes were still slow. I have not experienced any filesystem slowness issues in the months I've been using this vm. Any ideas? I'm evaluating a move to git and this is the kind of thing that could derail it. Thanks, Stephen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Git Scaling: What factors most affect Git performance for a large repo?
This is fantastic. I really appreciate all the answers. And it's great that I think I've sparked some general discussion that could lead somewhere too. Notes: I'm currently using 2.1.3. I'll move to 2.3.x I'm experimenting with git-annex to reduce repo size on disk. We'll see. I could remove all tags older than /n/ years old in the active repo and just maintain them in the historical repo. (We have quite a lot of CI-generated tags.) It sounds like that might improve performance. Questions: 1. Ævar : I'm a bit concerned by your statement that git rebases take about 1-2 s per commit. Does that mean that a "git pull --rebase", if it is picking up say 120 commits (not at all unrealistic), could potentially take 4 minutes to complete? Or have I misinterpreted your comment. 2. I'd not heard about bitmap indexes before this thread but it sounds like they should help me. In limited searching I can't find much useful documentation about them. It is also not clear to me if I have to explicitly run "git repack --write-bitmap-indexes" or if git will automatically detect when they're needed; first experiments seem to indicate that I need to explicitly generate them. I assume that once the index is there, git will just use it automatically. Steve On Thu, Feb 19, 2015 at 7:03 PM, brian m. carlson wrote: > On Thu, Feb 19, 2015 at 04:26:58PM -0500, Stephen Morton wrote: >> I posted this to comp.version-control.git.user and didn't get any response. I >> think the question is plumbing-related enough that I can ask it here. >> >> I'm evaluating the feasibility of moving my team from SVN to git. We have a >> very >> large repo. [1] We will have a central repo using GitLab (or similar) that >> everybody works with. Forks, code sharing, pull requests etc. will be done >> through this central server. >> >> By 'performance', I guess I mean speed of day to day operations for devs. >> >>* (Obviously, trivially, a (non-local) clone will be slow with a large >> repo.) >>* Will a few simultaneous clones from the central server also slow down >> other concurrent operations for other users? > > This hasn't been a problem for us at $DAYJOB. Git doesn't lock anything > on fetches, so each process is independent. We probably have about > sixty developers (and maybe twenty other occasional users) that manage > to interact with our Git server all day long. We also have probably > twenty smoker (CI) systems pulling at two hour intervals, or, when > there's nothing to do, every two minutes, plus probably fifteen to > twenty build systems pulling hourly. > > I assume you will provide adequate resources for your server. > >>* Will 'git pull' be slow? >>* 'git push'? > > The most pathological case I've seen for git push is a branch with a > single commit merged into the main development branch. As of Git 2.3.0, > the performance regression here is fixed. > > Obviously, the speed of your network connection will affect this. Even > at 30 MB/s, cloning several gigabytes of data takes time. Git tries > hard to eliminate sending a lot of data, so if your developers keep > reasonably up-to-date, the cost of establishing the connection will tend > to dominate. > > I see pull and push times that are less than 2 seconds in most cases. > >>* 'git commit'? (It is listed as slow in reference [3].) >>* 'git stautus'? (Slow again in reference 3 though I don't see it.) > > These can be slow with slow disks or over remote file systems. I > recommend not doing that. I've heard rumbles that disk performance is > better on Unix, but I don't use Windows so I can't say. > > You should keep your .gitignore files up-to-date to avoid enumerating > untracked files. There's some work towards making this less of an > issue. > > git blame can be somewhat slow, but it's not something I use more than > about once a day, so it doesn't bother me that much. > >> Assuming I can put lots of resources into a central server with lots of CPU, >> RAM, fast SSD, fast networking, what aspects of the repo are most likely to >> affect devs' experience? >>* Number of commits >>* Sheer disk space occupied by the repo > > The number of files can impact performance due to the number of stat()s > required. > >>* Number of tags. >>* Number of branches. > > The number of tags and branches individually is really less relevant > than the total number of refs (tags, branches, remote branches, etc). > Very large numbers of refs can impact performance on pushe
Re: Git Scaling: What factors most affect Git performance for a large repo?
On Thu, Feb 19, 2015 at 5:21 PM, Stefan Beller wrote: > On Thu, Feb 19, 2015 at 1:26 PM, Stephen Morton > wrote: >> I posted this to comp.version-control.git.user and didn't get any response. I >> think the question is plumbing-related enough that I can ask it here. >> >> I'm evaluating the feasibility of moving my team from SVN to git. We have a >> very >> large repo. [1] >> >> [1] (Yes, I'm investigating ways to make our repo not so large etc. That's >> beyond the scope of the discussion I'd like to have with this >> question. Thanks.) > > What do you mean by large? > * lots of files > * large files > * or even large binary files (bad to diff/merge) > * long history (i.e. lots of small changes) > * impactful history (changes which rewrite nearly everything from scratch) > > For reference, the linux > * has 48414 files, in 3128 directories > * the largest file is 1.1M, the whole repo is 600M > * no really large binary files > * more than 500051 changes/commits including merges > * started in 2004 (when git was invented essentially) > * the .git folder is 1.4G compared to the 600M files, >indicating it may have been rewritting 3 times (well this >metric is bogus, there is lots of compression >going on in .git) > > and linux seems to be doing ok with git. > > So as long as you cannot pinpoint your question on what you are exactly > concerned about, there will be no helpful answer I guess. > > linux is by no means a really large project, there are other projects way > larger than that (I am thinking about the KDE project for example) > and they do fine as well. > > Thanks, > Stefan Hi Stefan, I think I addressed most of this in my original post with the paragraph "Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo, 50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up branches" which are just one little dangling commit required to change the code a little bit between a commit and a tag that was not quite made from it.)" To that I'd add 25k files, no major rewrites, no huge binary files, but lots of a few MB binary files with many revisions. But even without details of my specific concerns, I thought that perhaps the git developers know what limits git's performance even if large projects like the kernel are not hitting these limits. Steve -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Git Scaling: What factors most affect Git performance for a large repo?
I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask it here. I'm evaluating the feasibility of moving my team from SVN to git. We have a very large repo. [1] We will have a central repo using GitLab (or similar) that everybody works with. Forks, code sharing, pull requests etc. will be done through this central server. By 'performance', I guess I mean speed of day to day operations for devs. * (Obviously, trivially, a (non-local) clone will be slow with a large repo.) * Will a few simultaneous clones from the central server also slow down other concurrent operations for other users? * Will 'git pull' be slow? * 'git push'? * 'git commit'? (It is listed as slow in reference [3].) * 'git stautus'? (Slow again in reference 3 though I don't see it.) * Some operations might not seem to be day-to-day but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc then they can become bottlenecks. (e.g. 'git branch --contains' seems terribly adversely affected by large numbers of branches.) * Others? Assuming I can put lots of resources into a central server with lots of CPU, RAM, fast SSD, fast networking, what aspects of the repo are most likely to affect devs' experience? * Number of commits * Sheer disk space occupied by the repo * Number of tags. * Number of branches. * Binary objects in the repo that cause it to bloat in size [1] * Other factors? Of the various HW items listed above --CPU speed, number of cores, RAM, SSD, networking-- which is most critical here? (Stash recommends 1.5 x repo_size x number of concurrent clones of available RAM. I assume that is good advice in general.) Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo, 50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up branches" which are just one little dangling commit required to change the code a little bit between a commit a tag that was not quite made from it.) While there's lots of information online, much of it is old [3] and with git constantly evolving I don't know how valid it still is. Then there's anecdotal evidence that is of questionable value.[2] Are many/all of the issues Facebook identified [3] resolved? (Yes, I understand Facebook went with Mercurial. But I imagine the git team nevertheless took their analysis to heart.) Thanks, Steve [1] (Yes, I'm investigating ways to make our repo not so large etc. That's beyond the scope of the discussion I'd like to have with this question. Thanks.) [2] The large amounts of anecdotal evidence relate to the "why don't you try it yourself?" response to my question. I will I I have to but setting up a properly methodical study is time consuming and difficult --I don't want to produce poor anecdotal numbers that don't really hold up-- and if somebody's already done the work, then I should leverage it. [3] http://thread.gmane.org/gmane.comp.version-control.git/189776 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html