from:"\"Stephen Morton\""

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-08-17 Thread Stephen Morton


Responding to a few comments...

On 2016-08-14 7:44 AM, Christian Couder wrote:

multiple_commits)
... but here multiple_commits is the last argument.
It would be better if it was more consistent.

(Johannes made the same comment.)
Yes. Will do.




multiple_commits = (todo_list->next) != NULL;
Why not "last_commit" instead of "multiple_commits"?



Because it *isn't*. You can see that in pick_commits(), I set 
multiple_commits outside of the `for todo_list` loop. It is not 
re-evaluated at every iteration of the loop. As per my comment when 
emailing the patch "I intentionally print the '--continue' hint even in 
the case where it's last of n commits that fails. " I think it makes 
much more sense that "this is the message you always get when 
cherry-picking multiple commits as opposed to "this is the message you 
sometimes get, except when it's the last one". (Yes, the careful 
observer will realize that if when cherry-picking multiple commits, 
there are conflicts in the second-last and last then the --continue  
from the second-last will result in multiple_commits being set to 0. I 
can live with that.)



On 2016-08-16 4:44 AM, Remi Galan Alfonso wrote:

Hi Stephen,

Stephen Morton  writes:

+if  (multiple_commits)
+   advise(_("after resolving the conflicts,
mark the corrected paths with 'git add ' or 'git rm '\n"
+"then continue with 'git %s
--continue'\n"
+"or cancel with 'git %s
--abort'" ), action_name(opts), action_name(opts));
+else
+advise(_("after resolving the
conflicts, mark the corrected paths\n"
+"with 'git add ' or 'git
rm '\n"
+"and commit the result with
'git commit'"));

In both cases (multiple_commits or not), the beginning of the advise
is nearly the same, with only a '\n' in the middle being the
difference:

multiple_commits:
  "after resolving the conflicts, mark the corrected paths with 'git
  add ' or 'git rm '\n"

!multiple_commits:
  "after resolving the conflicts, mark the corrected paths\n with 'git
  add ' or 'git rm '\n"
   ~~~^

In 'multiple_commits' case the advise is more than 80 characters long,
did you forget the '\n' in that case?
A previous comment had indicated that having 4 lines was too many. And I 
tend to agree. So I tried to squash it into 3. Back in xterm days, 80 
characters was sacrosanct, but is it really a big deal to exceed it now?



On 2016-08-14 7:44 AM, Christian Couder wrote:

...but please try to send a real patch.

There is https://github.com/git/git/blob/master/Documentation/SubmittingPatches
and also SubmitGit that can help you do that.
Agreed. I just want to send a patch that stands a reasonable chance of 
getting accepted.


Stephen


--
Stephen Morton, 7750 SR Product Group, SW Development Tools/DevOps
w: +1-613-784-6026 (int: 2-825-6026) m: +1-613-302-2589 | EST Time Zone

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-08-10 Thread Stephen Morton

On Mon, Aug 1, 2016 at 5:12 AM, Johannes Schindelin
 wrote:
> Hi Stephen,
>
> On Wed, 27 Jul 2016, Stephen Morton wrote:
>
>> On Wed, Jul 27, 2016 at 11:03 AM, Johannes Schindelin
>>  wrote:
>> >
>> > On Wed, 27 Jul 2016, Stephen Morton wrote:
>> >
>> >> diff --git a/sequencer.c b/sequencer.c
>> >> index cdfac82..ce06876 100644
>> >> --- a/sequencer.c
>> >> +++ b/sequencer.c
>> >> @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct
>> >> replay_opts *opts)
>> >> else
>> >> advise(_("after resolving the conflicts, mark
>> >> the corrected paths\n"
>> >>  "with 'git add ' or 'git rm 
>> >> '\n"
>> >> -"and commit the result with 'git 
>> >> commit'"));
>> >> +"then continue the %s with 'git %s
>> >> --continue'\n"
>> >> +"or cancel the %s operation with 'git
>> >> %s --abort'" ),  action_name(opts), action_name(opts),
>> >> action_name(opts), action_name(opts));
>> >
>> > That is an awful lot of repetition right there, with an added
>> > inconsistency that the action is referred to by its name alone in the
>> > "--continue" case, but with "operation" added in the "--abort" case.
>> >
>> > And additionally, in the most common case (one commit to cherry-pick), the
>> > advice now suggests a more complicated operation than necessary: a simply
>> > `git commit` would be enough, then.
>> >
>> > Can't we have a test whether this is the last of the commits to be
>> > cherry-picked, and if so, have the simpler advice again?
>>
>> Ok, knowing that I'm not on the last element of the sequencer is
>> beyond my git code knowledge.
>
> Oh, my mistake: I meant to say that this information could be easily
> provided by `pick_commits()` if it passed it to `print_advice()` via
> `do_pick_commit()`.
>
> Ciao,
> Johannes

(Finally getting back to this.)
Something like this, then Johannes?
(I intentionally print the '--continue' hint even in the case where
it's last of n commits that fails.)

~/ws/extern/git (maint *%>) > git diff
diff --git a/sequencer.c b/sequencer.c
index 617a3df..e0071aa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -154,7 +154,7 @@ static void free_message(struct commit *commit,
struct commit_message *msg)
unuse_commit_buffer(commit, msg->message);
 }

-static void print_advice(int show_hint, struct replay_opts *opts)
+static void print_advice(int show_hint, int multiple_commits, struct
replay_opts *opts)
 {
char *msg = getenv("GIT_CHERRY_PICK_HELP");

@@ -174,14 +174,14 @@ static void print_advice(int show_hint, struct
replay_opts *opts)
advise(_("after resolving the conflicts, mark
the corrected paths\n"
 "with 'git add ' or 'git rm '"));
else
-if  (! file_exists(git_path_seq_dir()))
-advise(_("after resolving the
conflicts, mark the corrected paths\n"
-"with 'git add ' or
'git rm '\n"
- "and commit the
result with 'git commit'"));
-else
+if  (multiple_commits)
advise(_("after resolving the
conflicts, mark the corrected paths with 'git add ' or 'git rm
'\n"
 "then continue with 'git %s
--continue'\n"
 "or cancel with 'git %s
--abort'" ), action_name(opts), action_name(opts));
+else
+advise(_("after resolving the
conflicts, mark the corrected paths\n"
+"with 'git add ' or
'git rm '\n"
+"and commit the result with
'git commit'"));
}
 }

@@ -445,7 +445,7 @@ static int allow_empty(struct replay_opts *opts,
struct commit *commit)
return 1;
 }

-static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
+static int do_pick_commit(struct commit *commit, struct replay_opts
*opts, i

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-08-10 Thread Stephen Morton

On Mon, Aug 1, 2016 at 5:12 AM, Johannes Schindelin 
 wrote:

Hi Stephen,

On Wed, 27 Jul  2016, Stephen Morton wrote:


On Wed, Jul  27, 2016 at 11:03 AM, Johannes Schindelin
  wrote:
>
> On Wed,  27 Jul 2016, Stephen Morton wrote:
>
>>  diff --git a/sequencer.c b/sequencer.c
>>  index cdfac82..ce06876 100644
>> ---  a/sequencer.c
>> +++  b/sequencer.c
>> @@  -176,7 +176,8 @@ static void print_advice(int show_hint, struct
>>  replay_opts *opts)
>> else
>> advise(_("after resolving the conflicts, mark
>> the  corrected paths\n"
>>  "with 'git add ' or 'git rm '\n"
>> -  "and commit the result with 'git commit'"));
>> +  "then continue the %s with 'git %s
>>  --continue'\n"
>> +  "or cancel the %s operation with 'git
>> %s  --abort'" ),  action_name(opts), action_name(opts),
>>  action_name(opts), action_name(opts));
>
> That is  an awful lot of repetition right there, with an added
>  inconsistency that the action is referred to by its name alone in the
>  "--continue" case, but with "operation" added in the "--abort" case.
>
> And  additionally, in the most common case (one commit to cherry-pick), the
> advice  now suggests a more complicated operation than necessary: a simply
> `git  commit` would be enough, then.
>
> Can't  we have a test whether this is the last of the commits to be
>  cherry-picked, and if so, have the simpler advice again?

Ok, knowing  that I'm not on the last element of the sequencer is
beyond my  git code knowledge.


Oh, my mistake:  I meant to say that this information could be easily
provided by  `pick_commits()` if it passed it to `print_advice()` via
 `do_pick_commit()`.

Ciao,
Johannes


Formatting on previous email was terrible, plus the diff wasn't 
performed against origin. Re-sending.


(Finally getting back to this.)

Something like the diff below, then Johannes?

(I intentionally print the '--continue' hint even in the case whereit's 
last of n commits that fails.)



Stephen

~/ws/extern/git (maint *%>) > git diff @{u}
diff --git a/sequencer.c b/sequencer.c
index c6362d6..e0071aa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -154,7 +154,7 @@ static void free_message(struct commit *commit, 
struct commit_message *msg)

unuse_commit_buffer(commit, msg->message);
 }

-static void print_advice(int show_hint, struct replay_opts *opts)
+static void print_advice(int show_hint, int multiple_commits, struct 
replay_opts *opts)

 {
char *msg = getenv("GIT_CHERRY_PICK_HELP");

@@ -174,9 +174,14 @@ static void print_advice(int show_hint, struct 
replay_opts *opts)
advise(_("after resolving the conflicts, mark 
the corrected paths\n"
 "with 'git add ' or 'git rm 
'"));

else
-   advise(_("after resolving the conflicts, mark 
the corrected paths\n"
-"with 'git add ' or 'git rm 
'\n"
-"and commit the result with 'git 
commit'"));

+if  (multiple_commits)
+   advise(_("after resolving the conflicts, 
mark the corrected paths with 'git add ' or 'git rm '\n"
+"then continue with 'git %s 
--continue'\n"
+"or cancel with 'git %s 
--abort'" ), action_name(opts), action_name(opts));

+else
+advise(_("after resolving the 
conflicts, mark the corrected paths\n"
+"with 'git add ' or 'git 
rm '\n"
+"and commit the result with 
'git commit'"));

}
 }

@@ -440,7 +445,7 @@ static int allow_empty(struct replay_opts *opts, 
struct commit *commit)

return 1;
 }

-static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
+static int do_pick_commit(struct commit *commit, struct replay_opts 
*opts, int multiple_commits)

 {
unsigned char head[20];
struct commit *base, *next, *parent;
@@ -595,7 +600,7 @@ static int do_pick_commit(struct commit *commit, 
struct replay_opts *opts)

  : _("could not apply %s... %s"),
  find_unique_abbrev(commit->object.oid.hash, 
DEFAULT_ABBREV),

Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran

2016-08-03 Thread Stephen Morton


On 2016-08-03 3:54 PM, Junio C Hamano wrote:

Jeff King  writes:


I agree it would be a good property to have. I think it's hard to do
atomically, though. Certainly we can wait to tell the other side "your
push has been recorded" until after the hook is run. But we would
already have updated the refs locally at that point (and we must -- that
is part of the interface to the post-receive hooks, that the refs are
already in place). So would we roll-back the ref update then? Even that
suffers from power failures, etc.

So I'm not sure if making it truly atomic is all the feasible.

As long as the requirement is that post- hook must see the updated
ref in place, I do not think it is feasible to give "the hook always
runs once" guarantee, without cooperation by other parts of the flow
(think: pulling the power at an arbitrary point in the process).

A receiving repository can implement it all in the userland, I would
think, though:

  * A pre-receive hook records the intention to update a ref (from
what old commit to what new commit), and does not return until
that record is securely in a database;

  * A post-receive hook checks the entry in the database above (it
_must_ find one), and atomically does its thing and marks the
entry "done";

  * A separate sweeper scans the database for entries not yet marked
as "done", sees if the ref has been already updated, and
atomically does its thing and marks the entry "done" (the same
can be done as part of a post-receive for previously pushed
commit that pre-receive recorded but did not manage to run
post-receive before the power was pulled or the user did \C-c).

As you originally described, the non-atomicity is not new; as long
as we have necessary hooks in place on the git-core side for
repositories that want a stronger guarantee, I do not think there is
any more thing we need to do on this topic.  If we can narrow the
window in a non-intrusive way, that would be a good thing to do,
though.



I certainly understand not being able to make it atomic when faced with 
say "pulling the power at an arbitrary point in the process". That seems 
to me almost along the lines of disaster recovery contingency plans. But 
could we not guarantee that if there is no problem on the receiving end, 
that "IF a commit is received and the ref updated, THEN the post-receive 
hook is guaranteed to run".


The not-so-uncommon situation where I saw this was where a user had 
second-thoughts and hit Ctrl-C in the middle of a push. The push went 
through --the ref was updated-- but the post-receive hooks were not run.


Steve



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran

2016-08-02 Thread Stephen Morton


On 2016-07-25 6:22 PM, Jeff King wrote:

On Mon, Jul 25, 2016 at 12:34:04PM +0200, Jan Smets wrote:


I have always assumed the post-receive hook to be executed whenever a commit
is "accepted" by the (gitolite) server. That does not seem to be true any
more.

Generally, yeah, I would expect that to be the case, too.


Since 9658846 is appears that, when a client bails out, the pre-receive hook
continues to run and the commit is written to the repository, but no
post-receive hook is executed. No signal of any kind is received in the
hook, not even a sig pipe when the post- hook is writing to stdout whilst
the client has disconnected.

I see. The problem is that cmd_receive_pack() does this:

 execute_commands(commands, unpack_status, &si);
 if (pack_lockfile)
 unlink_or_warn(pack_lockfile);
 if (report_status)
 report(commands, unpack_status);
 run_receive_hook(commands, "post-receive", 1);
 run_update_post_hook(commands);

It reports the status to the client, and _then_ runs the post-receive
hook. But if that reporting fails (either because of an error, or if we
just get SIGPIPE because the client hung up), then we don't actually run
the hooks.

Leaving 9658846 out of it entirely, it is always going to be racy
whether we notice that the client hung up during the pre-receive step.
E.g.:

   - your pre-receive might not write any output, so the muxer has
 nothing to write to the other side, and we never notice that the
 connection closed until we write the status out in report()

   - if NO_PTHREADS is set, the sideband muxer runs in a sub-process, not
 a sub-thread. And thus we don't know of a hangup except by checking
 the result of finish_async(), which we never do.

   - the client could hang up just _after_ we've written the pre-receive
 output, but before report() is called, so there's nothing to notice
 until we're in report()

So I think 9658846 just made that race a bit longer, because it means
that a write error in the sideband muxer during the pre-receive hook
means we return an error via finish_async() rather than unceremoniously
calling exit() from a sub-thread. So we have a longer period during
which we might actually finish off execute_commands() but not make it
out of report().

And the real solution is to make cmd_receive_pack() more robust, and try
harder to run the hooks even when the client hangs up or we have some
other reporting error (because getting data back to the user is only one
use of post-receive hooks; they are also used to queue jobs or do
maintenance).

But that's a bit tricky, as it requires report() to ignore SIGPIPE, and
to stop using write_or_die() or any other functions that can exit (some
of which happen at a lower level). Plus if a client does hangup, we
don't want our hook to die with SIGPIPE either, so we'd want to proxy
the data into /dev/null.

-Peff


Sounds tricky. I do think it's important, almost a 'data integrity' 
issue, that IF a commit is received, THEN the post-receive hook must be 
run. Too much mission-critical stuff is based on post-receive hooks.


The alternatives, as I see them --either document that the post-receive 
hook cannot be fully trusted and that all such uses must change to 
asynchronous polling, or otherwise just say that nobody should hit 
Ctrl-C during a push (not even reflexively when their lizard-brain says 
"Woops, no!") and hope that network issues don't cause the same thing-- 
are simply not realistic.


Stephen



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-07-27 Thread Stephen Morton

On Wed, Jul 27, 2016 at 11:03 AM, Johannes Schindelin
 wrote:
> Hi Stephen,
>
> On Wed, 27 Jul 2016, Stephen Morton wrote:
>
>> Here is my patch then. (Personally, I would add some capitalization and
>> punctuation, but I didn't see much of that in the existing code.) I'm
>> not a regular pull-requester, do I do that, or can somebody else handle
>> that for me?
>
> The process of the patch submission is described in
> Documentation/SubmittingPatches (yes, it is a bit involved, and it is
> slightly easier when you use http://submitgit.herokuapp.com/, but please
> note that this process has served us well over one decade).
>
> Please also note that top-posting is highly discouraged on this list:
>
> A: Because it messes up the order in which people normally read text.
>>Q: Why is top-posting such a bad thing?
>>>A: Top-posting.
>>>>Q: What is the most annoying thing in e-mail?
>
> Now to your patch:
>
>> diff --git a/sequencer.c b/sequencer.c
>> index cdfac82..ce06876 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct
>> replay_opts *opts)
>> else
>> advise(_("after resolving the conflicts, mark
>> the corrected paths\n"
>>  "with 'git add ' or 'git rm 
>> '\n"
>> -"and commit the result with 'git commit'"));
>> +"then continue the %s with 'git %s
>> --continue'\n"
>> +"or cancel the %s operation with 'git
>> %s --abort'" ),  action_name(opts), action_name(opts),
>> action_name(opts), action_name(opts));
>
> That is an awful lot of repetition right there, with an added
> inconsistency that the action is referred to by its name alone in the
> "--continue" case, but with "operation" added in the "--abort" case.
>
> And additionally, in the most common case (one commit to cherry-pick), the
> advice now suggests a more complicated operation than necessary: a simply
> `git commit` would be enough, then.
>
> Can't we have a test whether this is the last of the commits to be
> cherry-picked, and if so, have the simpler advice again?
>
> Ciao,
> Johannes

Ok, knowing that I'm not on the last element of the sequencer is
beyond my git code knowledge. I see that in do_pick_commit() , we do
not have a copy of the todo_list. I would assume that would be
necessary, but I'm not certain. I could
file_exists(git_path_seq_dir()). This works to determine if one or
many commits are being cherry-picked / reverted, although it will
return true even on the last of n cherry-picks. I think that is still
reasonable.

I was trying to just take the same text as 'git status' already
displays. It could indeed be made more concise.

Happy to use the submission process, I just didn't know it. Thanks for
letting me know.

(Yup, sorry about the top-posting. I just wan't careful.)

Stephen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-07-27 Thread Stephen Morton

Here is my patch then. (Personally, I would add some capitalization
and punctuation, but I didn't see much of that in the existing code.)
I'm not a regular pull-requester, do I do that, or can somebody else
handle that for me?


diff --git a/sequencer.c b/sequencer.c
index cdfac82..ce06876 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct
replay_opts *opts)
else
advise(_("after resolving the conflicts, mark
the corrected paths\n"
 "with 'git add ' or 'git rm '\n"
-"and commit the result with 'git commit'"));
+"then continue the %s with 'git %s
--continue'\n"
+"or cancel the %s operation with 'git
%s --abort'" ),  action_name(opts), action_name(opts),
action_name(opts), action_name(opts));
}
 }


Stephen



On Tue, Jul 26, 2016 at 4:44 PM, Stephen Morton
 wrote:
> On Tue, Jul 26, 2016 at 4:30 PM, Jeff King  wrote:
>> On Tue, Jul 26, 2016 at 01:18:55PM -0700, Stefan Beller wrote:
>>
>>> > Would it be possible to expand the hint message to tell users to run
>>> > 'git cherry-pick --continue'
>>>
>>> Instead of expanding I'd go for replacing?
>>>
>>> I'd say the user is tempted for 2 choices,
>>> a) aborting (for various reasons)
>>> b) fix and continue.
>>
>> Yeah, I'd agree with this.
>>
>> I think that advice comes from a time when you could only cherry-pick a
>> single commit. These days you can do several in a single run, and that's
>> why "git cherry-pick --continue" was invented.
>>
>> So I think we would need to make sure that the "cherry-pick --continue"
>> advice applies in both cases (and that we do not need to give different
>> advice depending on whether we are in a single or multiple cherry-pick).
>>
>> I did some basic tests and it _seems_ to work to use --continue in
>> either case. Probably due to 093a309 (revert: allow cherry-pick
>> --continue to commit before resuming, 2011-12-10), but I didn't dig.
>>
>> -Peff
>
> The 'git status' text for a rebase/am/cherry-pick is
>
> fix conflicts and then run "git  --continue"
> use "git  --skip" to skip this patch"
> use "git  --abort" to cancel the  operation
>
> (The --cancel text varies a bit actually, but that's the gist of it.)
>
> The rebase/cherry-pick conflict case should really indicate how to
> mark the conflict as resolved as that's the specific situation the
> user is in. I don't know if there are guidelines to hint line length,
> or how many actions should be on one line but if the above text was
> changed to have this as the "fix" text, possibly over two lines, I
> think that would do it.
>
> fix conflicts with 'git add ' or 'git rm '" and then
> run "git  --continue"
>
> Stephen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-07-26 Thread Stephen Morton

On Tue, Jul 26, 2016 at 4:30 PM, Jeff King  wrote:
> On Tue, Jul 26, 2016 at 01:18:55PM -0700, Stefan Beller wrote:
>
>> > Would it be possible to expand the hint message to tell users to run
>> > 'git cherry-pick --continue'
>>
>> Instead of expanding I'd go for replacing?
>>
>> I'd say the user is tempted for 2 choices,
>> a) aborting (for various reasons)
>> b) fix and continue.
>
> Yeah, I'd agree with this.
>
> I think that advice comes from a time when you could only cherry-pick a
> single commit. These days you can do several in a single run, and that's
> why "git cherry-pick --continue" was invented.
>
> So I think we would need to make sure that the "cherry-pick --continue"
> advice applies in both cases (and that we do not need to give different
> advice depending on whether we are in a single or multiple cherry-pick).
>
> I did some basic tests and it _seems_ to work to use --continue in
> either case. Probably due to 093a309 (revert: allow cherry-pick
> --continue to commit before resuming, 2011-12-10), but I didn't dig.
>
> -Peff

The 'git status' text for a rebase/am/cherry-pick is

fix conflicts and then run "git  --continue"
use "git  --skip" to skip this patch"
use "git  --abort" to cancel the  operation

(The --cancel text varies a bit actually, but that's the gist of it.)

The rebase/cherry-pick conflict case should really indicate how to
mark the conflict as resolved as that's the specific situation the
user is in. I don't know if there are guidelines to hint line length,
or how many actions should be on one line but if the above text was
changed to have this as the "fix" text, possibly over two lines, I
think that would do it.

fix conflicts with 'git add ' or 'git rm '" and then
run "git  --continue"

Stephen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-07-26 Thread Stephen Morton

When I cherry-pick n commits and one of the first (n-1), fails, what I
should do is resolve the conflict, 'git add' it, and then run 'git
cherry-pick --continue'. However git advises me to

error: could not apply d0fb756... Commit message for test commit
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add ' or 'git rm '
hint: and commit the result with 'git commit'

I suspect this is not just a little bit deceptive but actually a bit
destructive, as doing a 'git commit' removes some of the sequencing
information. If I do a 'git commit', while there is .git/sequencer
information and 'git cherry-pick --continue' will work, there is no
indication in 'git status' that a cherry-pick is in progress.

It would be extremely easy for somebody to follow cherry-pick's hints
by running 'git commit' and think that they were done, not realizing
that there were m more commits remaining to be cherry-picked.

Would it be possible to expand the hint message to tell users to run
'git cherry-pick --continue'

This patch is *not* meant as a serious patch for submission --I'm sure
it's all wrong-- it's just a proof of concept and showing some
goodwill on my part that I'm trying to help and I'm willing to put in
some work if I can be pointed more in the right direction.

Regards,
Stephen




diff --git a/sequencer.c b/sequencer.c
index cdfac82..b8fa534 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -171,14 +171,21 @@ static void print_advice(int show_hint, struct
replay_opts *opts)

if (show_hint) {
if (opts->no_commit)
advise(_("after resolving the conflicts, mark
the corrected paths\n"
 "with 'git add ' or 'git rm '"));
-   else
-   advise(_("after resolving the conflicts, mark
the corrected paths\n"
-"with 'git add ' or 'git rm '\n"
-"and commit the result with 'git commit'"));
+   else if  (! file_exists(git_path_seq_dir())) {
+   advise(_("SCM: after resolving the conflicts,
mark the corrected paths\n"
+ "with 'git add ' or 'git rm '\n"
+ "and commit the result with 'git commit'"));
+}
+else
+{
+advise(_("SCM: after resolving the conflicts, mark the
corrected paths\n"
+ "with 'git add ' or 'git rm '\n"
+ "and continue the %s with 'git %s --continue'"),
action_name(opts), action_name(opts));
+}
}
 }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git smudge filter fails

2016-03-15 Thread Stephen Morton

On Thu, Mar 10, 2016 at 5:04 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote:
>>
>>> I am a bit confused because this is basically the example used in
>>> ProGit [1] and it is fundamentally broken. In fact, if I understand
>>> correctly, this means that smudge filters cannot be relied upon to
>>> provide any 'keyword expansion' type tasks because they will all by
>>> nature have to query the file with 'git log'.
>>
>> Interesting. Perhaps I am missing something (I am far from an expert in
>> clean/smudge filters, which I do not generally use myself), but the
>> example in ProGit looks kind of bogus to me. I don't think it ever would
>> have worked reliably, under any version of git.
>>
>>> (Note that although in my example I used 'git checkout', with an only
>>> slightly more complicated example I can make it fail on 'git pull'
>>> which is perhaps a much more realistic use case. That was probably
>>> implied in your answer, I just wanted to mention it.)
>>
>> Yeah, I think the issue is basically the same for several commands which
>> update the worktree and the HEAD. Most of them are going to do the
>> worktree first.
>
> You can have a pair of branches A and B that have forked long time
> ago, and have a path F that has been changed identically since they
> forked, perhaps by cherry-picking the same change.  This happens all
> the time.
>
> If there were some mechanism that modifies the checked out version
> of F with information that depends on the history that leads to A
> (e.g. "which commit that is an ancestor of A last modified F?")
> when you check out branch A, it will become invalid when you do "git
> checkout B", because the path F will not change because they are the
> same between the branches.  In short, CVS $Id$-style substitutions
> that depend on the history fundamentally does not work, unless you
> are willing to always rewrite the whole working tree every time you
> switch branches.
>
> The smudge and clean filters are given _only_ the blob contents and
> nothing else, not "which commit (or tree) the blob is taken from",
> not "which path this blob sits in that tree-ish", not "what branch
> am I on" and this is a very much deliberate design decision made in
> order to avoid leading people to a misguided attempt to mimick CVS
> $Id$-style substitutions.
>


I will raise an Issue with ProGit.

It's perhaps beyond the scope of my original question, but for
situations where I need a "last change date" embedded in a file (e.g.
because a protocol standard requires it), is there any recommended way
to do so? We've the hard way that hardcoding makes
merging/cherry-picking a bit of a nightmare and should be avoided. Is
a post-checkout hook the way to go? I've actually found the smudge
filter to be very slow for this application as each file is processed
in series; a post-commit hook that could operate on files in parallel
would likely be substantially faster.

Stephen

(Sorry about the earlier top-posting. I didn't realize what gmail was
doing until after it had happened.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Sample pre-push hook can crash

2016-03-11 Thread Stephen Morton

Thanks for the information Junio. It is just interesting that although
the pre-push hook receives the remote_sha value from the server, it
does not get 'git merge-base $remote_sha $local_sha' which is what a
check that iterated over all outgoing commits would really need. (I'm
sure this is a simple-minded assessment. I don't pretend to have spent
even an order of magnitude less time working on git than you have. I'm
not trying to be foolishly arrogant.)

I do think it would be worth replacing the existing example with yours
because the existing example will crash anytime somebody's workspace
is up to date.

Stephen



On Fri, Mar 11, 2016 at 3:04 PM, Junio C Hamano  wrote:
> Stephen Morton  writes:
>
>> That is interesting, so in the case of a non-ff push, there is no way
>> for a pre-push hook to know what is being pushed in order to run?
>
> If you were up-to-date from the other side once:
>
> ---A---B---C
>
> and built one new commit on top:
>
> ---A---B---C---D
>
> when you attempt to push it out, there are various possibilities.
>
> An ff situation is simple.  They didn't do anything, so the hook
> gets "we'd be updating their ref from C to D", and "rev-list C..D"
> would tell you that you would need to make sure D is sane.
>
> If your push does not fast-forward, that would mean something has
> happened on the other side while you were working on D.  Perhaps
> they accepted another commit that you haven't seen:
>
> ---A---B---C---E
>
> and because you haven't fetched from them, even though the hook may
> say "we'd be updating their ref from E to D", you haven't heard of
> E, you do not know where it would be relative to the history you
> have:
>
>   E???
>
> ---A---B---C---D
>
> Or perhaps they themselves rewound their history and they now have
> this E at the tip:
>
> ---A---B---C
> \
>  E
>
> But again, because you do not yet know where E is relative to your
> history, you cannot say C is where you can stop checking your side
> of the history.
>
> Or perhaps they somehow obtained your D without you pushing into
> them (e.g. you pushed to the "next" tree and they fixed your commit
> and the result was merged there) and have something like this:
>
> ---A---B---C---D---E
>
> In this case, D is not even a new commit from their point of view,
> and updating their tip E with your old D would lose the fixup E they
> made for D, but again, you do not know what E is yet, you cannot say
> "they have this already, so there is no check I need to do".
>
> In summary, you cannot even say which ones you have need to be
> checked.
>
> If you _are_ using @{u} tracking, then you would know at least they
> used to have up to C in their repository to limit your check, but
> you cannot unconditionally say ref@{u}.. without making sure ref@{u}
> makes sense in the first place.
>
> An obvious alternative for the sample script would be to instead let
> the non-ff push pass the pre-push check (as you may have other
> arrangements to forbid non-ff pushes) without actually checking
> anything.  As this sample script is to serve as a sample, I think
> such an advanced consideration of loosening checks depending on the
> situation should be left to the end users.
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Sample pre-push hook can crash

2016-03-11 Thread Stephen Morton

That is interesting, so in the case of a non-ff push, there is no way
for a pre-push hook to know what is being pushed in order to run?

Steve


On Thu, Mar 10, 2016 at 4:43 PM, Junio C Hamano  wrote:
> Stephen Morton  writes:
>
>> The sample pre-push hook provided with git [1] will crash if the local
>> repo is not up to date with the remote as $remote_sha is not present
>> in the local repo. I'm not sure if this patch is exactly correct, it's
>> just provided as an example.
>>
>> Given that people are likely crafting their own solutions based on the
>> examples, it's probably good to get right.
>
> It's probably good to get right, but I do not think use of @{u} is
> making it right, unfortunately.  You may not necessarily have @{u}
> configured, and you may not even pushing to the configured remote
> branch.
>
> The spirit of the sample hook, I think, is to validate the new
> commits you are publishing, so if you cannot even determine which
> ones are new and which ones are not, failing the "push" by exiting
> with non-zero status is the right behaviour for this sample.
>
> So perhaps something like this may be more appropriate as an
> example.
>
>  templates/hooks--pre-push.sample | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/templates/hooks--pre-push.sample 
> b/templates/hooks--pre-push.sample
> index 6187dbf..7ef6780 100755
> --- a/templates/hooks--pre-push.sample
> +++ b/templates/hooks--pre-push.sample
> @@ -41,7 +41,12 @@ do
> fi
>
> # Check for WIP commit
> -   commit=`git rev-list -n 1 --grep '^WIP' "$range"`
> +   commit=`git rev-list -n 1 --grep '^WIP' "$range"` || {
> +   # we do not even know about the range...
> +   echo >&2 "Non-ff update to $remote_ref"
> +   echo >&2 "fetch from there first"
> +   exit 1
> +   }
> if [ -n "$commit" ]
> then
> echo >&2 "Found WIP commit in $local_ref, not pushing"
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Sample pre-push hook can crash

2016-03-10 Thread Stephen Morton

The sample pre-push hook provided with git [1] will crash if the local
repo is not up to date with the remote as $remote_sha is not present
in the local repo. I'm not sure if this patch is exactly correct, it's
just provided as an example.

Given that people are likely crafting their own solutions based on the
examples, it's probably good to get right.

diff --git a/templates/hooks--pre-push.sample b/templates/hooks--pre-push.sample
index 6187dbf..99ed984 100755
--- a/templates/hooks--pre-push.sample
+++ b/templates/hooks--pre-push.sample
@@ -36,9 +36,9 @@ do
# New branch, examine all commits
range="$local_sha"
else
# Update to existing branch, examine new commits
-   range="$remote_sha..$local_sha"
+   range="@{u}..$local_sha"
fi

# Check for WIP commit
commit=`git rev-list -n 1 --grep '^WIP' "$range"`


(This is just something I noticed and thought you might be interested
in, but is not important to me. I actually do care about the smudge
filter issue.)

Stephen

[1] https://github.com/git/git/blob/master/templates/hooks--pre-push.sample
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git smudge filter fails

2016-03-10 Thread Stephen Morton

I am a bit confused because this is basically the example used in
ProGit [1] and it is fundamentally broken. In fact, if I understand
correctly, this means that smudge filters cannot be relied upon to
provide any 'keyword expansion' type tasks because they will all by
nature have to query the file with 'git log'.


(Note that although in my example I used 'git checkout', with an only
slightly more complicated example I can make it fail on 'git pull'
which is perhaps a much more realistic use case. That was probably
implied in your answer, I just wanted to mention it.)

Steve

[1] https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes



On Wed, Mar 9, 2016 at 8:59 PM, Jeff King  wrote:
> On Wed, Mar 09, 2016 at 01:29:31PM -0500, Stephen Morton wrote:
>
>> git config --local filter.dater.smudge 'myDate=`git log
>> --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e
>> "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"'
>
> Your filter is running "git log" without a revision parameter, which
> means it is looking at HEAD.
>
> And here
>
>> git checkout no_foo
>> git checkout master
>> cat foo.c
>> #observe keyword expansion lost
>
> You are expecting this second one to do:
>
>   1. Switch HEAD to "master".
>
>   2. Checkout files which need updating. Looking at HEAD in your filter
>  then examines "master", and you see the commit timestamp of the
>  destination.
>
> But that isn't how it is implemented. Checkout will handle the file
> checkout _first_, as that is the part that is likely to run into
> problems (e.g., rejecting a switch because it would lose changes in the
> working tree). Only at the very end, after the change to the working
> tree has succeeded, do we update HEAD.
>
> I think the order you are expecting is conceptually cleaner, but I don't
> think we would want to switch it around (for reasons of efficiency and
> robustness). And I don't think we would want to make a promise about the
> ordering to callers either way, as it binds our implementation.
>
> So is there a way to reliably know the destination of a checkout? My
> first thought was that we could add a placeholder similar to "%f" that
> your filter could use. I'm not sure how we would handle the corner cases
> there, though (e.g., do we always have a "destination" to report? If
> not, what do we give the script?).
>
> I suspect you could also hack something together with a post-checkout
> script, though it would probably be a lot less efficient (and might also
> have some weird corner cases).
>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

git smudge filter fails

2016-03-09 Thread Stephen Morton

A git smudge filter, at least one that relies on the results from 'git
log' does not seem to work
on file A when doing a 'git update' from a revision where file A
doesn't exist to a revision where
it does exist.

Below is a simple recipe to reproduce.

This appears to me to be a bug. If not, why is it expected and is
there anything I can do to
work around this behaviour?

Steve

mkdir git_test
cd git_test/
git init .
touch bar.c
git add .
git commit -am "Initial commit. foo.c not here yet."
git tag no_foo

touch foo.c
git add .
git commit -am "Add foo, no content"
echo 'Date is $Date$' >> foo.c
git commit -am "Add date to foo.c"
echo 'foo.c filter=dater' > .git/info/attributes
git config --local filter.dater.smudge 'myDate=`git log
--pretty=format:"%cd" --date=iso -1 -- %f`; sed -e
"s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"'
git config --local filter.dater.clean 'sed -e
"s/\(\\$\)Date[^\\$]*\\$/\1Date\\$/g"'
rm -f foo.c
git checkout -- foo.c
cat foo.c
# observe keyword expansion

git checkout no_foo
git checkout master
cat foo.c
#observe keyword expansion lost
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Slow git pushes: sitting 1 minute in pack-objects

2015-03-19 Thread Stephen Morton

On Mon, Mar 16, 2015 at 6:15 PM, Jeff King  wrote:
> On Mon, Mar 09, 2015 at 09:37:25PM -0400, Stephen Morton wrote:
>
>> 3. Not sure how long this part takes. It takes 1/3 - 1/2 of the time
>> when straced, but I think it's much less, as little as 10s when not
>> straced.
>> It then reads a bunch of what look like objects from filehandle 0
>> (presumably stdin, read from the remote end?)
>> It then tries to lstat() these filenames in .git/
>> ./git/refs/, .git/heads/, etc. It always fails ENOENT.
>> It fails some 120,000 times. This could be a problem. Though I haven't
>> checked to see if this happens on a fast push on another machine.
>
> Hmm. The "push" process must feed the set of object boundaries to
> "pack-objects" so it knows what to pack (i.e., what we want to send, and
> what the other side has).
>
> 120,000 is an awfully large number of objects to be pass there, though.
> Does the repo you are pushing to by any chance have an extremely large
> number of refs (e.g., on the order of 120K tags)?


No. There are on the order of 9,500 refs (mostly tags) but nowhere near 120k.



>
> Can you try building git with this patch which would tell us more about
> where those objects are coming from?




> Those are all rather blunt debugging methods, but hopefully it can get
> us a sense of where the time is going.
>
> -Peff


Thanks Peff,


I haven't tried your patch, but I tried to backtrack a bit and
double-check that the problem always happened in a fresh clone with no
other modifications.

It did _not_ happen in a new clone --a push took just 5s -- and I
think the culprit could have been "repack.writebitmaps=true". Although
I had thought writebitmaps was not originally enabled, I now suspect
that it was. Let me follow up on that first, before I recompile git
with your changes.


Thanks again, I really appreciate the help.

Steve
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Slow git pushes: sitting 1 minute in pack-objects

2015-03-09 Thread Stephen Morton

Thanks Peff,

I've done an strace and here's what I see. I'll try to put relevant
information in as legible a form as possible. The operation is
cpu-bound on a single core (note that yes, delta compression is using
8 threads. so that's obviously not the bottleneck) for the duration of
the pack-objects.

Timing is tough, because it's slower when straced.

I don't know if it matters, but this push is going to a fork on the
remote end most of whose objects are in the objects/info/alternatives
location. (Which is still on the local drive of the remote, not a
network drive or anything like that.)

1.
 > GIT_TRACE=1 time time git push
12:21:28.980410 git.c:349   trace: built-in: git 'push'
12:21:29.099547 run-command.c:341   trace: run_command: 'ssh' '-p'
'7999' 'git@server.privacy' 'git-receive-pack
'\''~smorton/panos.git'\'''
12:21:42.863968 run-command.c:341   trace: run_command:
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
12:21:42.871170 exec_cmd.c:134  trace: exec: 'git'
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
12:21:42.907783 git.c:349   trace: built-in: git
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'

2. At this point, "git pack-objects" is forked

3. Not sure how long this part takes. It takes 1/3 - 1/2 of the time
when straced, but I think it's much less, as little as 10s when not
straced.
It then reads a bunch of what look like objects from filehandle 0
(presumably stdin, read from the remote end?)
It then tries to lstat() these filenames in .git/
./git/refs/, .git/heads/, etc. It always fails ENOENT.
It fails some 120,000 times. This could be a problem. Though I haven't
checked to see if this happens on a fast push on another machine.

This takes a lot of time when straced, but I think less time when not straced.

4. Then it just sits there for almost 1 minute. It uses up almost 100%
of a single core during this period. It spends a lot of time running
lots of brk() (memory allocation) commands and then getting (polled
by?) a SIGALRM every 1s.
About 5.5+ GB (about the repo size) of VIRT memory is allocated. Only
about 400M is ever RES.

5. Then we get to the "Counting objects" phase.
Note the number of threads specified: so it

Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
...

6. Time. Note the 121,000 pagefaults

46.08user 0.67system 0:49.47elapsed 94%CPU (0avgtext+0avgdata
1720240maxresident)k
0inputs+16outputs (0major+121199minor)pagefaults 0swaps

Steve

On Mon, Mar 9, 2015 at 3:53 AM, Jeff King  wrote:
> On Thu, Mar 05, 2015 at 04:03:07PM -0500, Stephen Morton wrote:
>
>> I'm experiencing very slow git pushes. On the order of 1 minute to push a
>> trivial one-line change. When I set GIT_TRACE=1, I see that it seems to be
>> taking a lot of time in the pack-objects phase.
>
> Can you tell what pack-objects is doing? Perhaps "strace -f -tt" might
> help. Or even just "time git push", which will tell us whether we're
> spinning on CPU or something else.
>
> If you watch the progress meter, where does it spend its time? In
> "counting objects", "compressing objects", or "writing objects"? Or
> after that?
>
> You said the repo is big. Is it reasonably packed (try running `git
> gc`). Pack-objects may look at objects that you know the other side has
> in order to create deltas against them. If you have some crazy situation
> (e.g., you have several thousand packfiles) that can slow down object
> access quite a bit. A minute is more than I would expect, but if VFS
> operations in your VM are slow, that could be it.
>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Slow git pushes: sitting 1 minute in pack-objects

2015-03-05 Thread Stephen Morton

8GB of RAM.

Sorry, typo.

Steve


On Thu, Mar 5, 2015 at 7:25 PM, Duy Nguyen  wrote:
> On Fri, Mar 6, 2015 at 4:03 AM, Stephen Morton
>  wrote:
>> I'm experiencing very slow git pushes. On the order of 1 minute to push a
>> trivial one-line change. When I set GIT_TRACE=1, I see that it seems to be
>> taking a lot of time in the pack-objects phase.
>>
>> Others are not seeing this with the same repo, but I'm the only one working
>> in a VM.
>>
>> ...
>>
>> Details:
>> git version 2.1.4
>> OS: CentOS 6.6 64-bit in a VM.
>> repo size: huge. 6 GB .git directory, around 800 MB working tree.
>> VM has 8 MB RAM and 8 cores.
>
> Is it 8 GB or MB RAM?
>
>> CPU: i7, 8 core (4 cores hyperthreaded)
> --
> Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Slow git pushes: sitting 1 minute in pack-objects

2015-03-05 Thread Stephen Morton

(Apologies, after a day I'm cross-posting from git.users. I think the question
is maybe too technical for that group.)

I'm experiencing very slow git pushes. On the order of 1 minute to push a
trivial one-line change. When I set GIT_TRACE=1, I see that it seems to be
taking a lot of time in the pack-objects phase.

Others are not seeing this with the same repo, but I'm the only one working
in a VM.

```
~/ws/git/repo.1/repo > date; git push mortons; date
Wed Mar  4 15:03:11 EST 2015
15:03:11.086758 git.c:349   trace: built-in: git 'push' 'mortons'
15:03:11.126665 run-command.c:341   trace: run_command: 'ssh' '-p'
'7999' 'git@privacy.privacy' 'git-receive-pack
'\''~mortons/repo.git'\'''
15:03:20.383341 run-command.c:341   trace: run_command:
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
15:03:20.383945 exec_cmd.c:134  trace: exec: 'git'
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
15:03:20.385168 git.c:349   trace: built-in: git
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 20.86 KiB | 0 bytes/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To ssh://git@privacy.privacy:7999/~mortons/repo.git
   5fe662f..a137bda  my_branch -> my_branch
Wed Mar  4 15:04:22 EST 2015

```

After it was slow at first, I tried setting these which did not help

repack.writebitmaps=true
pack.windowmemory=100m



Details:
git version 2.1.4
OS: CentOS 6.6 64-bit in a VM.
repo size: huge. 6 GB .git directory, around 800 MB working tree.
VM has 8 MB RAM and 8 cores.
CPU: i7, 8 core (4 cores hyperthreaded)

It is an ext4 filesystem on the guest linux drive.
On the host side, it is a .vmdk file and the virtualization software used is
virtualbox. While the drive is dynamically allocated, after I ran into
this issue,
I used fallocate to create a 50GB dummy file and then delete it to
ensure that
there was headroom in the drive and that dynamic allocation slowness was not the
issue, and subsequent pushes were still slow.
I have not experienced any filesystem slowness issues in the
months I've been using this vm.


Any ideas? I'm evaluating a move to git and this is the kind of thing
that could derail it.
Thanks,

Stephen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Stephen Morton

This is fantastic. I really appreciate all the answers. And it's great
that I think I've sparked some general discussion that could lead
somewhere too.

Notes:

I'm currently using 2.1.3. I'll move to 2.3.x

I'm experimenting with git-annex to reduce repo size on disk. We'll see.

I could remove all tags older than /n/ years old in the active repo
and just maintain them in the historical repo. (We have quite a lot of
CI-generated tags.) It sounds like that might improve performance.

Questions:

1. Ævar : I'm a bit concerned by your statement that git rebases take
about 1-2 s per commit. Does that mean that a "git pull --rebase", if
it is picking up say 120 commits (not at all unrealistic), could
potentially take 4 minutes to complete? Or have I misinterpreted your
comment.

2. I'd not heard about bitmap indexes before this thread but it sounds
like they should help me. In limited searching I can't find much
useful documentation about them. It is also not clear to me if I have
to explicitly run "git repack --write-bitmap-indexes" or if git will
automatically detect when they're needed; first experiments seem to
indicate that I need to explicitly generate them. I assume that once
the index is there, git will just use it automatically.


Steve


On Thu, Feb 19, 2015 at 7:03 PM, brian m. carlson
 wrote:
> On Thu, Feb 19, 2015 at 04:26:58PM -0500, Stephen Morton wrote:
>> I posted this to comp.version-control.git.user and didn't get any response. I
>> think the question is plumbing-related enough that I can ask it here.
>>
>> I'm evaluating the feasibility of moving my team from SVN to git. We have a 
>> very
>> large repo. [1] We will have a central repo using GitLab (or similar) that
>> everybody works with. Forks, code sharing, pull requests etc. will be done
>> through this central server.
>>
>> By 'performance', I guess I mean speed of day to day operations for devs.
>>
>>* (Obviously, trivially, a (non-local) clone will be slow with a large 
>> repo.)
>>* Will a few simultaneous clones from the central server also slow down
>>  other concurrent operations for other users?
>
> This hasn't been a problem for us at $DAYJOB.  Git doesn't lock anything
> on fetches, so each process is independent.  We probably have about
> sixty developers (and maybe twenty other occasional users) that manage
> to interact with our Git server all day long.  We also have probably
> twenty smoker (CI) systems pulling at two hour intervals, or, when
> there's nothing to do, every two minutes, plus probably fifteen to
> twenty build systems pulling hourly.
>
> I assume you will provide adequate resources for your server.
>
>>* Will 'git pull' be slow?
>>* 'git push'?
>
> The most pathological case I've seen for git push is a branch with a
> single commit merged into the main development branch.  As of Git 2.3.0,
> the performance regression here is fixed.
>
> Obviously, the speed of your network connection will affect this.  Even
> at 30 MB/s, cloning several gigabytes of data takes time.  Git tries
> hard to eliminate sending a lot of data, so if your developers keep
> reasonably up-to-date, the cost of establishing the connection will tend
> to dominate.
>
> I see pull and push times that are less than 2 seconds in most cases.
>
>>* 'git commit'? (It is listed as slow in reference [3].)
>>* 'git stautus'? (Slow again in reference 3 though I don't see it.)
>
> These can be slow with slow disks or over remote file systems.  I
> recommend not doing that.  I've heard rumbles that disk performance is
> better on Unix, but I don't use Windows so I can't say.
>
> You should keep your .gitignore files up-to-date to avoid enumerating
> untracked files.  There's some work towards making this less of an
> issue.
>
> git blame can be somewhat slow, but it's not something I use more than
> about once a day, so it doesn't bother me that much.
>
>> Assuming I can put lots of resources into a central server with lots of CPU,
>> RAM, fast SSD, fast networking, what aspects of the repo are most likely to
>> affect devs' experience?
>>* Number of commits
>>* Sheer disk space occupied by the repo
>
> The number of files can impact performance due to the number of stat()s
> required.
>
>>* Number of tags.
>>* Number of branches.
>
> The number of tags and branches individually is really less relevant
> than the total number of refs (tags, branches, remote branches, etc).
> Very large numbers of refs can impact performance on pushe

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stephen Morton

On Thu, Feb 19, 2015 at 5:21 PM, Stefan Beller  wrote:
> On Thu, Feb 19, 2015 at 1:26 PM, Stephen Morton
>  wrote:
>> I posted this to comp.version-control.git.user and didn't get any response. I
>> think the question is plumbing-related enough that I can ask it here.
>>
>> I'm evaluating the feasibility of moving my team from SVN to git. We have a 
>> very
>> large repo. [1]
>>
>> [1] (Yes, I'm investigating ways to make our repo not so large etc. That's
>> beyond the scope of the discussion I'd like to have with this
>> question. Thanks.)
>
> What do you mean by large?
> * lots of files
> * large files
> * or even large binary files (bad to diff/merge)
> * long history (i.e. lots of small changes)
> * impactful history (changes which rewrite nearly everything from scratch)
>
> For reference, the linux
> * has 48414 files, in 3128 directories
> * the largest file is 1.1M, the whole repo is 600M
> * no really large binary files
> * more than 500051 changes/commits including merges
> * started in 2004 (when git was invented essentially)
> * the .git folder is 1.4G compared to the 600M files,
>indicating it may have been rewritting 3 times (well this
>metric is bogus, there is lots of compression
>going on in .git)
>
> and linux seems to be doing ok with git.
>
> So as long as you cannot pinpoint your question on what you are exactly
> concerned about, there will be no helpful answer I guess.
>
> linux is by no means a really large project, there are other projects way
> larger than that (I am thinking about the KDE project for example)
> and they do fine as well.
>
> Thanks,
> Stefan

Hi Stefan,

I think I addressed most of this in my original post with the paragraph

 "Assume ridiculous numbers. Let me exaggerate: say 1 million commits,
15 GB repo,
  50k tags, 1,000 branches. (Due to historical code fixups, another
5,000 "fix-up
  branches" which are just one little dangling commit required to
change the code
  a little bit between a commit and a tag that was not quite made from it.)"

To that I'd add 25k files,
no major rewrites,
no huge binary files, but lots of a few MB binary files with many revisions.

But even without details of my specific concerns, I thought that
perhaps the git developers know what limits git's performance even if
large projects like the kernel are not hitting these limits.

Steve
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stephen Morton

I posted this to comp.version-control.git.user and didn't get any response. I
think the question is plumbing-related enough that I can ask it here.

I'm evaluating the feasibility of moving my team from SVN to git. We have a very
large repo. [1] We will have a central repo using GitLab (or similar) that
everybody works with. Forks, code sharing, pull requests etc. will be done
through this central server.

By 'performance', I guess I mean speed of day to day operations for devs.

   * (Obviously, trivially, a (non-local) clone will be slow with a large repo.)
   * Will a few simultaneous clones from the central server also slow down
 other concurrent operations for other users?
   * Will 'git pull' be slow?
   * 'git push'?
   * 'git commit'? (It is listed as slow in reference [3].)
   * 'git stautus'? (Slow again in reference 3 though I don't see it.)
   * Some operations might not seem to be day-to-day but if they are called
 frequently by the web front-end to GitLab/Stash/GitHub etc then
 they can become bottlenecks. (e.g. 'git branch --contains' seems terribly
 adversely affected by large numbers of branches.)
   * Others?


Assuming I can put lots of resources into a central server with lots of CPU,
RAM, fast SSD, fast networking, what aspects of the repo are most likely to
affect devs' experience?
   * Number of commits
   * Sheer disk space occupied by the repo
   * Number of tags.
   * Number of branches.
   * Binary objects in the repo that cause it to bloat in size [1]
   * Other factors?

Of the various HW items listed above --CPU speed, number of cores, RAM, SSD,
networking-- which is most critical here?

(Stash recommends 1.5 x repo_size x number of concurrent clones of
available RAM.
I assume that is good advice in general.)

Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo,
50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up
branches" which are just one little dangling commit required to change the code
a little bit between a commit a tag that was not quite made from it.)

While there's lots of information online, much of it is old [3] and with git
constantly evolving I don't know how valid it still is. Then there's anecdotal
evidence that is of questionable value.[2]
Are many/all of the issues Facebook identified [3] resolved? (Yes, I
understand Facebook went with Mercurial. But I imagine the git team nevertheless
took their analysis to heart.)


Thanks,
Steve


[1] (Yes, I'm investigating ways to make our repo not so large etc. That's
beyond the scope of the discussion I'd like to have with this
question. Thanks.)
[2] The large amounts of anecdotal evidence relate to the "why don't you try it
yourself?" response to my question. I will I I have to but setting up a
properly methodical study is time consuming and difficult --I don't want to
produce poor anecdotal numbers that don't really hold up-- and if somebody's
already done the work, then I should leverage it.
[3] http://thread.gmane.org/gmane.comp.version-control.git/189776
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran

Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

Re: git smudge filter fails

Re: Sample pre-push hook can crash

Re: Sample pre-push hook can crash

Sample pre-push hook can crash

Re: git smudge filter fails

git smudge filter fails

Re: Slow git pushes: sitting 1 minute in pack-objects

Re: Slow git pushes: sitting 1 minute in pack-objects

Re: Slow git pushes: sitting 1 minute in pack-objects

Slow git pushes: sitting 1 minute in pack-objects

Re: Git Scaling: What factors most affect Git performance for a large repo?

Re: Git Scaling: What factors most affect Git performance for a large repo?

Git Scaling: What factors most affect Git performance for a large repo?

22 matches

Site Navigation

Mail list logo

Footer information