
Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Junio C Hamano
While I do not think it is a bad idea to add an optional way to write the
contents of conflicted stages out to separate files, I do not think it is a
good idea to change what happens to an add/add conflict by default.

Two integrators separately picking up the same patch that adds a file,
and letting the copies diverge before they are merged, should not be
all that surprising; it is just like two integrators picking up the
same patch that changes an existing path and letting it evolve
independently. From that point of view, I do not think there is a
fundamental difference between edit/edit on one hand and add/add and
rename/rename conflicts on the other. The latter certainly are much
rarer, but that is because edits happen a lot more often than adds.

There certainly are cases where a conflict is easier to resolve when
the merged tip versions are unrelated or have diverged too vastly, with
or without a common merge base. As I already agreed, it would be
useful in such a case to have an option to write the conflicted stages
to separate files, letting an external merge tool examine them and
help you resolve them. But that is not limited to resolving a vast
difference between contents involved in an add/add conflict, is it?
The same tool (which can be driven as a mergetool backend) would be
just as useful in reconciling the difference between contents involved
in an edit/edit conflict, no?

If anything, if rename/rename behaves differently by always writing
the result out to separate files, it is that codepath whose behaviour
should be fixed to match how add/add conflicts are dealt with.
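The conflicted stages discussed above live in the index and can be listed with `git ls-files -u`; as a rough illustration of what a mergetool backend would consume (the parser below is a sketch of mine, not git code), its output can be gathered like this:

```python
def parse_unmerged(ls_files_u_output):
    """Parse `git ls-files -u` output into {path: {stage: (mode, sha)}}.

    Each line has the form "<mode> <sha> <stage>\t<path>".
    Stage 1 is the merge base, stage 2 "ours", stage 3 "theirs";
    an add/add conflict has no stage-1 entry.
    """
    entries = {}
    for line in ls_files_u_output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        mode, sha, stage = meta.split()
        entries.setdefault(path, {})[int(stage)] = (mode, sha)
    return entries
```

Each stage can then be written out to its own file with `git show :<stage>:<path>`, which is essentially what writing the conflicted stages to separate files amounts to.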


[GSoC][PATCH] git-ci: use pylint to analyze the git-p4 code

2018-03-12 Thread Viet Hung Tran
Hello Mr. Schneider,

Here is the link to the passing CI build I tested on my fork:
https://travis-ci.org/VietHTran/git/builds/35210

In order to test the .travis.yml configuration alone, I used a sample
Python script called "test.py", which I had already tested successfully
on my computer, instead of the git-p4.py from the patch I submitted,
since git-p4.py still reports a lot of errors when run through pylint.

It is possible to fix all of the git-p4.py errors. However, I think it
would be best to fix each error separately, in multiple commits (each
fixing a specific problem such as indentation, variable naming, etc.),
since a lot of changes need to be made to the code. Since the
microproject requires that I submit one patch, I decided to submit a
patch that changes .travis.yml only. If you like, I can also submit the
patches to git-p4.py that fix the pylint errors.

Thank you for the feedback,

Viet

Signed-off-by: Viet Hung Tran 
---
 .travis.yml | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/.travis.yml b/.travis.yml
index 5f5ee4f3b..581d75319 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -46,6 +46,16 @@ matrix:
 - docker
   before_install:
   script: ci/run-linux32-docker.sh
+- env: jobname=Pylint
+  compiler:
+  addons:
+apt:
+  packages:
+  - python
+  before_install:
+  install: pip install --user pylint
+  script: pylint git-p4.py
+  after_failure:
 - env: jobname=StaticAnalysis
   os: linux
   compiler:
-- 
2.16.2.440.gc6284da



Fwd: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Elijah Newren
[Re-sending because this computer happened to have plain-text mode
turned off for some weird reason, and thus the email bounced]

Hi,

On Mon, Mar 12, 2018 at 3:19 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> Does this mean that e.g. in this case of merging two files, one
> containing "foo" and one containing "bar":
>
> (
> rm -rf /tmp/test.git &&
> git init /tmp/test.git &&
> cd /tmp/test.git &&
> echo foo >README &&
> git add README &&
> git commit -mfoo &&
> git checkout --orphan trunk &&
> git reset --hard &&
> echo bar >README &&
> git add README &&
> git commit -mbar &&
> git merge --allow-unrelated-histories master;
> cat README
> )
>
> That instead of getting:
>
> <<<<<<< HEAD
> bar
> =======
> foo
> >>>>>>> master
>
> I'd now get these split into different files?

As currently implemented, yes.  However, I was more concerned with the
idea of handling files differently based on whether or not they are
similar, rather than with the precise definition of "similar" in this
context.

As far as the definition of similarity goes, estimate_similarity() is
currently used by rename detection to compare files recorded at
different pathnames.  By contrast, in this context, we are comparing
two files which were recorded with the same pathname.  That suggests
the heuristic could be a bit different and use more than just
estimate_similarity().  (e.g. "We consider these files similar IF more
than 50% of the lines match OR both files are less than 2K.")
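A toy version of such a heuristic (the thresholds and the line-based comparison here are illustrative only; git's actual estimate_similarity() works on hashed content chunks, not whole lines) might look like:

```python
def roughly_similar(a, b, min_ratio=0.5, small=2048):
    """Sketch of the colliding-file heuristic described above:
    treat two files as "similar" if more than half their lines match,
    OR both are small enough (< 2K) that a two-way merge is easy to
    eyeball anyway."""
    if len(a) < small and len(b) < small:
        return True  # small files: always two-way merge
    lines_a = a.splitlines()
    lines_b = b.splitlines()
    common = len(set(lines_a) & set(lines_b))
    denom = max(len(lines_a), len(lines_b), 1)
    return common / denom > min_ratio
```

Note how the small-file escape hatch would also cover the "fs-path" config-file case raised later in the thread, where no lines match at all.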


> I don't mind this being a configurable option if you want it, but I
> don't think it should be on by default, reasons:

I don't think a configurable option makes sense, at least not for my
purposes.  Having rename/rename conflicts be "safely" mis-detected as
rename/add or add/add, and having rename/add conflicts be "safely"
mis-detected as add/add is my overriding concern.  Thus, making these
three conflict types behave consistently is what I need.  Options
would make that more difficult for me, and would thus feel like a step
backwards.

git am/rebase has been doing such mis-detections for years (almost
since the "dawn" of git time), but it feels really broken to me
because the conflict types aren't handled consistently.  (The facts
that (a) I'm the only one that has added rename/add testcases to
git.git, (b) that I've added all but one of the rename/rename(2to1)
testcases to the git.git testsuite, and (c) that rename/add has had
multiple bugs for many years, all combine to suggest to me that folks
just don't hit those conflict types in practice and thus that they
just aren't noticing this breakage -- yet.)

I also want to allow such mis-detections for cherry-picks and merges
because of the significant (nearly order-of-magnitude in some cases)
performance improvements I can get in rename detection if it's
allowed.


>  1) There's lots of cases where we totally screw up the "is this
> similar?" check, in particular with small files.
>
> E.g. let's say you have a config file like 'fs-path "/tmp/git"' and
> in two branches you change that to 'fs-path "/opt/git"' and 'fs-path
> "/var/git"'. The rename detection will think this these have nothing
> to do with each other since they share no common lines, but to a
> human reader they're really similar, and would make sense in the
> context of resolving a bigger merge where /{opt,var}/git changes are
> conflicting.
>
> This is not some theoretical concern: lots of things use small 5-10
> line config files to configure some app, and some combination of
> indentation changes plus a couple of changed lines will make git's
> rename detection give up entirely, even though to a human reader
> they're 95% the same.

Fair enough.  The small files case could potentially be handled by
just changing the similarity metric for these conflict types, as noted
above.  If it's a small file, that might be the easiest way for a user
to deal with it too.

I'm not sure I see the problem with the bigger files, though.  If you
have bigger files with less than 50% of the lines matching, then
you'll essentially end up with a great big conflict block with one
file on one side and the other file on the other side, which doesn't
seem that different to me than having them be in two separate files.
In fact, separate files seems easier to deal with because then the
user can run e.g. 'git diff --no-index --color-words FILE1 FILE2',
something that they can't do when it's in one file.  That has bothered
me more than once, and made me wish they were just in separate files.


>  2) This will play havoc with already established merge tools on top of
> git which a lot of users use instead of manually resolving these in
> vi or whatever.
>
> If we made this the default they'd need to deal with this new
> state, and even if 

Re: Git Merge contributor summit notes

2018-03-12 Thread Brandon Williams
On 03/12, Jeff King wrote:
> On Sat, Mar 10, 2018 at 02:01:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> > >  - (peff) Time to deprecate the git anonymous protocol?
> > [...]
> > 
> > I think the conclusion was that nobody cares about the git:// protocol,
> > but people do care about it being super easy to spin up a server, and
> > currently it's easiest to spin up git://, but we could also ship with
> > some git-daemon mode that had a stand-alone webserver (or ssh server) to
> > get around that.
> 
> I don't think keeping support for git:// is too onerous at this point
> (especially because it should make the jump to protocol v2 with the
> rest). But it really is a pretty dated protocol, lacking any kind of
> useful security properties (yes, I know, if we're all verifying signed
> tags it's great, but realistically people are fetching the tip of master
> over a hijack-able TCP connection and running arbitrary code on the
> result). It might be nice if it went away completely so we don't have to
> warn people off of it.
> 
> The only thing git:// really has going over git-over-http right now is
> that it doesn't suffer from the stateless-rpc overhead. But if we unify
> that behavior in v2, then any advantage goes away.

It's still my intention to unify this behavior in v2 but then begin
working on improving negotiation as a whole (once v2 is in) so that we
can hopefully get rid of the nasty corner cases that exist in http://.
Since v2 will be hidden behind a config anyway, it may be prudent to
wait until negotiation gets better before we entertain making v2 the
default (well, there's also the need to wait for hosting providers to
begin supporting it).

> 
> I do agree we should have _something_ that is easy to spin up. But it
> would be wonderful if git-over-http could become that, and we could just
> deprecate git://. I suppose it's possible people build clients without
> curl, but I suspect that's an extreme minority these days (most third
> party hosters don't seem to offer git:// at all).
> 
> -Peff

-- 
Brandon Williams


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Elijah Newren
I'm worried that my attempt to extract add/add from the rest of the
discussion with rename/add and rename/rename resulted in a false sense
of simplification.  Trying to handle all the edge and corner cases and
remain consistent sometimes gets a little hairy.  Anyway...

On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder  wrote:
> Elijah Newren wrote:
>> On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder  wrote:
>

> Sorry for the lack of clarity.  My understanding was that the proposed
> behavior was to write two files:
>
> ${path}~HEAD
> ${path}~MERGE

(Just to be clear: The proposed behavior was to do that only when the
colliding files were dissimilar, and to two-way merge them otherwise.)

> My proposal is instead to write one file:
>
> ${path}
>
> with the content that would have gone to ${path}~HEAD.  This is what
> already happens when trying to merge binary files.

Thanks for clarifying.

I feel the analogy to merging binary files breaks down here in more
than one way:

1)

Merging binary files is more complex than you suggest here.  In
particular, when creating a virtual merge base, the resolution chosen
is not the version of the file from HEAD, but the version of the file
from the merge base.  Nasty problems would result otherwise for the
recursive case.

If we tried to match how merging binary files behaves, we would run
into the problem that the colliding-file conflict case has no common
version of the file from a merge base.  So the same strategy just
doesn't apply.
The closest would be outright deleting both versions of the file for
the virtual merge base and punting to the outer merge to deal with it.
That might be okay, but seems really odd to me.  I feel like it'd
unnecessarily increase the number of conflicts users will see, though
maybe it wouldn't be horrible if this was only done when the files
were considered dissimilar anyway.

2)

Merging binary files has only 3 versions of the file to store at a
single $path in the index, which fit naturally in stages 1-3;
rename/add and rename/rename(2to1) have 4 and 6, respectively.  Having
three versions of a file from a 3-way merge makes fairly intuitive
sense.  Having 4 or 6 versions of a file at a path from a 3-way merge
is inherently more complex.  I think that just using the version from
HEAD makes it much more likely that users resolve things incorrectly
than in the case of a binary merge.
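The stage constraint can be shown with a toy model (illustrative only, not git's actual data structures): the index addresses one blob per (path, stage) slot, and stages run only 0-3, so the six versions involved in a rename/rename(2to1) conflict cannot all be recorded at the colliding path.

```python
VALID_STAGES = (0, 1, 2, 3)  # 0 = merged, 1 = base, 2 = ours, 3 = theirs

class ToyIndex:
    """Minimal model of index addressing: one blob per (path, stage)."""
    def __init__(self):
        self.slots = {}

    def record(self, path, stage, blob):
        if stage not in VALID_STAGES:
            raise ValueError("no stage %d exists in the index" % stage)
        self.slots[(path, stage)] = blob
```

Trying to record a hypothetical "stage 4" fails, which is why the extra versions must go somewhere else entirely (e.g. separate stages at the original pathnames).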


> [...]
>>> Can you add something more about the motivation to the commit message?
>>> E.g. is this about performance, interaction with some tools, to
>>> support some particular workflow, etc?
>>
>> To be honest, I'm a little unsure how to do that without even more
>> excessive and repetitive wording across commits.
>
> Simplest way IMHO is to just put the rationale in patch 5/5. :)  In
> other words, explain the rationale for the end-user facing change in the
> same patch that changes the end-user facing behavior.

Patches 3, 4, and 5 all change end-user facing behavior -- one patch
per conflict type to make the three types behave the same (the first
two patches add more testcases, and write a common function all three
types can use).  The rationale for the sets of changes is largely the
same, and is somewhat lengthy.  Should I repeat the rationale in full
in all three places?

>> Let me attempt here, and maybe you
>> can suggest how to change my commit messages?
>>
>>   * When files are wildly dissimilar -- as you mentioned -- it'd be
>> easier for users to resolve conflicts if we wrote files out to
>> separate paths instead of two-way merging them.
>
> Today what we do (in both the wildly-dissimilar case and the
> less-dissimilar case) is write one proposed resolution to the worktree
> and put the competing versions in the index.  Tools like "git
> mergetool" are then able to pull the competing versions out of the
> index to allow showing them at the same time.
>
> My bias is that I've used VCSes before that wrote multiple competing
> files to the worktree and I have been happier with my experience
> resolving conflicts in git.  E.g. at any step I can run a build to try
> out the current proposed resolution, and there's less of a chance of
> accidentally committing a ~HEAD file.
>
> [...]
>> There are three types of conflicts representing two (possibly
>> unrelated) files colliding at the same path: add/add, rename/add, and
>> rename/rename(2to1).  add/add does the two-way merge of the colliding
>> files, and the other two conflict types write the two colliding files
>> out to separate paths.
>
> Interesting.  I would be tempted to resolve this inconsistency the
> other way: by doing a half-hearted two-way merge (e.g. by picking one
> of the two versions of the colliding file) and marking the path as
> conflicted in the index.  That way it's more similar to edit/edit,
> too.

Your proposal is underspecified; there are more complicated cases
where your wording could have different 

Re: [RFC v2] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-12 Thread Igor Djordjevic
Hi Dscho,

On 12/03/2018 11:20, Johannes Schindelin wrote:
> 
> > > [...] and cannot introduce ambiguities when rebasing the
> > > changes introduced by M (i.e. the "amendments" we talked about).
> >
> > Hmm, not following here, which ambiguities are we talking about?
> 
> U1' vs U2' of course. Those are two things that can be different, even if
> they ideally would have identical trees.
> 
> Phillip's strategy does not leave that room for ambiguity.

Ehm, in Sergey`s approach, this is not an issue, but a feature :)

If U1' != U2', it just means a more complex rebase happened, but it 
doesn`t compromise the result (rebased merge) in any way.

On the other hand, if U1' == U2', we can be pretty sure that merge 
rebasing went as clean as possible.

That`s the idea, at least.

Regards, Buga


Re: [RFC] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-12 Thread Igor Djordjevic
Hi Dscho,

On 12/03/2018 11:46, Johannes Schindelin wrote:
> 
> > Sometimes one just needs to read the manual, and I don`t really think
> > this is a ton complicated, but just something we didn`t really have
> > before (real merge rebasing), so it requires a moment to grasp the
> > concept.
> 
> If that were the case, we would not keep getting bug reports about
> --preserve-merges failing to reorder patches.

Not sure where that is heading, but what I`m arguing is that 
introducing new commands and concepts (`merge`, and with `-R`) just 
makes the situation even worse (more stuff to grasp).

Reusing existing concepts where possible doesn`t have this problem.

> > Saying in favor of `--rebase-merges`, you mean as a separate option,
> > alongside `--recreate-merges` (once that series lands)?
> 
> No. I am against yet another option. The only reason I pollute the option
> name space further with --recreate-merges is that it would be confusing to
> users if the new mode was called --preserve-merges=v2 (but work *totally
> differently*).

I see. So I take it you`re thinking about renaming `--recreate-merges` 
to `--rebase-merges` instead?

That would seem sensible, too, I think, being the default usage mode 
in the first place. Being able to actually (re)create merges, too, 
once user goes interactive, would be "just" an additional (nice and 
powerful) feature on top of it.

Regards, Buga


Re: [PATCH 0/2] Make cvs tests pass with '-x' tracing and /bin/sh

2018-03-12 Thread Jeff King
On Thu, Mar 08, 2018 at 01:38:42PM +0100, SZEDER Gábor wrote:

> > I installed 'cvs' and whatnot to run t94* and t96* tests, and sure
> > enough, 5 tests in 2 test scripts fail with '-x' tracing and /bin/sh.
> > I think I will be able to get around to send v2 during the weekend.
> 
> Well, I wasn't able to get around to it, and in the meantime
> 'sg/test-x' got merged into 'next'.  So here are the updates for those
> two test scripts.
> 
> The commit message of the first patch is perhaps unnecessarily long, but
> it's mostly about explaining why the affected test was working
> reasonably well despite the rather weird 'test_cmp this that >>out
> 2>&1' thing.

You know I would never complain about a long commit message. :)

Both patches look OK to me. My only comment on the first one was "you
should just use 'return'", but I see Eric beat me to it.

I do think the postimage of the second one is a little less readable.
That's not a big deal to me, but I'm pretty sure I would intentionally
write it the "original" way if I found myself in a similar situation.
Which makes me wonder whether we'll end up accidentally re-introducing
these kinds of "-x" failures. But as I said before, I'm willing to wait
and see.

-Peff


Re: [RFC] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-12 Thread Igor Djordjevic
On 12/03/2018 13:56, Sergey Organov wrote:
> 
> > > I agree with both of you that `pick ` is inflexible 
> > > (not to say just plain wrong), but I never thought about it like that.
> > >
> > > If we are to extract further mentioned explicit old:new merge 
> > > parameter mapping to a separate discussion point, what we`re 
> > > eventually left with is just replacing this:
> > >
> > >   merge -R -C  
> > >
> > > ... with this:
> > >
> > >   pick  
> >
> > I see where you are coming from.
> >
> > I also see where users will be coming from. Reading a todo list in the
> > editor is as much documentation as it is a "program to execute". And I am
> > afraid that reading a command without even mentioning the term "merge"
> > once is pretty misleading in this setting.
> >
> > And even from the theoretical point of view: cherry-picking non-merge
> > commits is *so much different* from "rebasing merge commits" as discussed
> > here, so much so that using the same command would be even more
> > misleading.
> 
> This last statement is plain wrong when applied to the method in the
> [RFC] you are replying to. Using the method in [RFC], "cherry-pick
> non-merge" is nothing more or less than reduced version of generic
> "cherry-pick merge", exactly as it should be.
> 
> Or, in other words, "cherry-pick merge" is generalization of
> "cherry-pick non-merge" to multiple parents.

I think Sergey does have a point here, his approach showing it.

Phillip`s simplification might be further from it, though, but we`re 
talking implementation again - the important mental model should just 
be "rebasing a commit" (merge or non-merge); how we`re doing it is 
irrelevant for the user, the point (goal) is the same.


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-12 Thread Jeff King
On Mon, Mar 12, 2018 at 09:45:27AM -0400, Derrick Stolee wrote:

> > diff --git a/builtin/branch.c b/builtin/branch.c
> > index 8dcc2ed058..4d674e86d5 100644
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sorting)
> > memset(&array, 0, sizeof(array));
> > +   filter->with_commit_tag_algo = 1;
> > filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
> > if (filter->verbose)
> > 
> > drops my run of "git branch -a --contains HEAD~100" from 8.6s to
> > 0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
> > a smaller number of branches, we may actually end up slower (because we
> > dig further down in history, and don't benefit from the multiple-branch
> > speedup).
> 
> It's good to know that we already have an algorithm for the multi-head
> approach. Things like `git branch -vv` are harder to tease out because the
> graph walk is called by the line-format code.

Yeah, the ahead/behind stuff will need some work. Part of it is just
code structuring. We know ahead of time which branches (and their
upstreams) are going to need this ahead/behind computation, so we should
be able to collect them all for a single call.

But I'm not sure if a general multi-pair ahead/behind is going to be
easy. I don't have even experimental code for that. :)

We have a multi-pair ahead/behind command which we use at GitHub, but it
does each pair separately. It leans heavily on reachability bitmaps, so
the main advantage is that it's able to amortize the cost of loading the
bitmaps (both off disk, but also we sometimes have to walk to complete
the bitmaps).
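Each per-pair computation reduces to two reachability walks; a minimal sketch over a parent-pointer map (the names are mine, and real implementations replace the walks with precomputed reachability bitmaps to get the amortization described above):

```python
def reachable(commit, parents):
    """All commits reachable from `commit` via the parents map."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, ()))
    return seen

def ahead_behind(ours, theirs, parents):
    """(ahead, behind): commits only on our side / only on their side."""
    a, b = reachable(ours, parents), reachable(theirs, parents)
    return len(a - b), len(b - a)
```

A multi-pair version would compute each tip's reachable set once and reuse it across pairs, which is where most of the amortized savings come from.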

-Peff


Re: [RFC] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-12 Thread Igor Djordjevic
Hi Dscho,

On 12/03/2018 11:37, Johannes Schindelin wrote:
> 
> > If we are to extract further mentioned explicit old:new merge 
> > parameter mapping to a separate discussion point, what we`re 
> > eventually left with is just replacing this:
> >
> > merge -R -C  
> >
> > ... with this:
> >
> > pick  
> 
> I see where you are coming from.
> 
> I also see where users will be coming from. Reading a todo list in the
> editor is as much documentation as it is a "program to execute". And I am
> afraid that reading a command without even mentioning the term "merge"
> once is pretty misleading in this setting.
> 
> And even from the theoretical point of view: cherry-picking non-merge
> commits is *so much different* from "rebasing merge commits" as discussed
> here, so much so that using the same command would be even more
> misleading.

I would disagree here, as it seems you`re going too much into 
implementation and theory, where it shouldn`t really matter from 
the user`s point of view - the point is to rebase a commit, `pick` it 
from one place and plant it elsewhere.

Yes, some commits might have a bit different semantics than others 
(merge vs non-merge), but that should just be an implementation detail, 
in my opinion, with no need to leak it in the user`s face (more than 
necessary).

I feel that "merge" is a command that works really well in the 
mindset of (re)creating merges. But if we are "only" rebasing an 
existing merge, `pick` seems much more appropriate (to me, at least), 
and it aligns with what I`m already expecting `pick` to be doing.

Down below, whether we are (re)creating the merge or doing magic to 
somehow just port it over should be irrelevant. So "rebase" equals 
"pick and plant" (port), not "merge".

> > That is what I had in mind, seeming possibly more straightforward and 
> > beautifully aligned with previously existing (and well known) 
> > `rebase` terminology.
> >
> > Not to say this would make it possible to use other `rebase -i` todo 
> > list commands, too, like if you want to amend/edit merge commit after 
> > it was rebased, you would write:
> >
> > edit  
> >
> > ..., where in case you would simply like to reword its commit 
> > message, it would be just:
> >
> > reword  
> >
> >
> > Even `squash` and `fixup` could have their place in combination with 
> > a (to be rebased) merge commit, albeit in pretty exotic rebases, 
> > thus these could probably be just disallowed - for the time being, at 
> > least.
> 
> Sure, for someone who read the manual, that would be easy to use. Of
> course, that's the minority.

I`m not following you here - the point is these are already existing 
commands, which would still fit in just nicely, so nothing new to 
learn nor read.

Now, if we are to discuss use cases where people don`t even know what 
they`re doing, I would think that misses the point. Besides, it`s 
always easier to make more mistakes when you introduce yet more 
commands/semantics to think about/learn, and I think it can be avoided 
here, for the better.

> Also: the `edit` command is poorly named to begin with. A much cleaner
> design would be to introduce the `break` command as suggested by Stephan.

This is orthogonal to what we`re discussing. Existing commands might 
not be perfect, but that`s what we have now, so let`s be consistent, 
not putting additional burden on the user there, at least.

But for the record - I tend to agree, I often find myself wondering 
if `edit`-ed commit means `rebase` stops after applying the changes 
and _before_ making the commit itself (so we just edit and 
--continue), or _after_ it (so we edit, `commit --amend` and 
--continue).

> > The real power would be buried in implementation, learning to rebase 
> > merge commits, so the user is left with a very familiar interface, slightly 
> > adapted to accommodate the somewhat different nature of a merge commit in 
> > comparison to an ordinary one, and to allow a bit more interactive 
> > rebase functionality, but it would pretty much stay the same, without 
> > even a need to learn about new `merge`, `-R`, `-C`, and so on.
> >
> > Yes, those would have its purpose, but for real merging then 
> > (creating new merges, or recreating old ones), not necessarily for 
> > merge rebasing.
> >
> > With state of `merge -R -C ...` (that `-R` being the culprit), it 
> > kind of feels like we`re now trying to bolt "rebase merges" 
> > functionality onto a totally different one (recreate merges, serving 
> > a different purpose), making them both needlessly dependent on each 
> > other, further complicating user interface, making it more confusing 
> > and less tunable as per each separate functionality needs (rebase vs. 
> > recreate).
> >
> > I guess I`m the one to pretty much blame here, too, as I really 
> > wanted `--recreate-merges` to handle "rebase merges" better, only to 
> > later realize it might not be the best tool for the job, and that a 
> > more separate approach would be better (at least 

Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-03-12 Thread Jeff King
On Mon, Mar 12, 2018 at 04:37:47PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > We could even give it an environment variable, which would allow
> > something like:
> >
> >   tar xf maybe-evil.git.tar
> >   cd maybe-evil
> >   export GIT_TRUST_REPO=false
> >   git log
> [...]
> As an internal implementation detail, this is so obviously fragile
> that it wouldn't give me any feeling of security. ;-)  So it should be
> strictly an improvement.
> 
> As a public-facing feature, I suspect it's a bad idea for exactly that
> reason.

So that pretty much kills off the GIT_TRUST_REPO idea, I guess.

> FWIW for pager specifically I am going for a whitelisting approach:
> new commands would have to explicitly set ALLOW_PAGER if they want to
> respect pager config.  That doesn't guarantee people think about it
> again as things evolve but it should at least help with getting the
> right setting for new plumbing.

I suspect we'd be about as well off with the "don't trust the repo"
internal flag. Touching the ALLOW_PAGER setup code is about as likely to
set off red flags for the developers (or reviewers) as code that checks
the "trust" flag.

Forcing a whitelist on ALLOW_PAGER _is_ more likely to catch people
adding new commands. But I don't think we actually want to add more
commands to the "safe to run in a malicious repo" list. It's already a
slightly sketchy concept. This is really all about upload-pack and its
existing promises.

But ALLOW_PAGER would _just_ fix the pager issue. When we inevitably
find another problem spot, it won't help us there. But a global "trust"
flag might.

I dunno. I guess I'm OK with either approach, but it seems like the
global trust flag has more room to grow.

-Peff


Re: [PATCH v3 12/35] serve: introduce git-serve

2018-03-12 Thread Jeff King
On Tue, Mar 06, 2018 at 07:29:02AM +0100, Jeff King wrote:

> > We want to do better (e.g. see [1]) but that's a bigger change than
> > the initial protocol v2.
> > 
> > As Brandon explained it to me, we really do want to use stateless-rpc
> > semantics by default, since that's just better for maintainability.
> > Instead of having two protocols, one that is sane and one that
> > struggles to hoist that into stateless-rpc, there would be one
> > stateless baseline plus capabilities to make use of state.
> 
> Yes, I think that would be a nice end-game. It just wasn't clear to me
> where we'd be in the interim.

After some more thinking about this, and a little chatting with Brandon
at the contrib summit, I'm willing to soften my position on this.

Basically I was concerned about this as a regression where git-over-ssh
would stop working in a few corner cases. And it would cease to be
available as an escape hatch for those cases where http wouldn't work.

But we may be OK in this "interim" period (before unified
stateful-negotiation bits are added back) because v2 would not yet be
the default. So the ssh cases can't regress without flipping the v2
switch manually, and any escape hatch would continue to work by flipping
back to v1 anyway.

So it's probably OK to continue experimenting in this direction and see
how often it's a problem in practice.

-Peff


Re: Git Merge contributor summit notes

2018-03-12 Thread Jeff King
On Sat, Mar 10, 2018 at 02:01:14PM +0100, Ævar Arnfjörð Bjarmason wrote:

> >  - (peff) Time to deprecate the git anonymous protocol?
> [...]
> 
> I think the conclusion was that nobody cares about the git:// protocol,
> but people do care about it being super easy to spin up a server, and
> currently it's easiest to spin up git://, but we could also ship with
> some git-daemon mode that had a stand-alone webserver (or ssh server) to
> get around that.

I don't think keeping support for git:// is too onerous at this point
(especially because it should make the jump to protocol v2 with the
rest). But it really is a pretty dated protocol, lacking any kind of
useful security properties (yes, I know, if we're all verifying signed
tags it's great, but realistically people are fetching the tip of master
over a hijack-able TCP connection and running arbitrary code on the
result). It might be nice if it went away completely so we don't have to
warn people off of it.

The only thing git:// really has going over git-over-http right now is
that it doesn't suffer from the stateless-rpc overhead. But if we unify
that behavior in v2, then any advantage goes away.

I do agree we should have _something_ that is easy to spin up. But it
would be wonderful if git-over-http could become that, and we could just
deprecate git://. I suppose it's possible people build clients without
curl, but I suspect that's an extreme minority these days (most third
party hosters don't seem to offer git:// at all).

-Peff


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-03-12 Thread Jonathan Nieder
Jeff King wrote:

> We could even give it an environment variable, which would allow
> something like:
>
>   tar xf maybe-evil.git.tar
>   cd maybe-evil
>   export GIT_TRUST_REPO=false
>   git log

Interesting idea.  Putting it in an envvar means it gets inherited by
child processes, which if I understand you correctly is a good thing.

[...]
>   1. We have to manually annotate any "dangerous" code to act more
>  safely when it sees the flag. Which means it's highly likely to
>  miss a spot, or to add a new feature which doesn't respect it. And
>  suddenly that's a security hole. So I'm concerned it may create a
>  false sense of security and actually make things worse.

As an internal implementation detail, this is so obviously fragile
that it wouldn't give me any feeling of security. ;-)  So it should be
strictly an improvement.

As a public-facing feature, I suspect it's a bad idea for exactly that
reason.

FWIW for pager specifically I am going for a whitelisting approach:
new commands would have to explicitly set ALLOW_PAGER if they want to
respect pager config.  That doesn't guarantee people think about it
again as things evolve but it should at least help with getting the
right setting for new plumbing.
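To make the whitelisting idea concrete, here is a minimal sketch of how an opt-in pager flag could be consulted. The ALLOW_PAGER name comes from the mail above; the bit values and helper are illustrative stand-ins, not git's actual builtin command table.

```c
#include <assert.h>

/* Hypothetical flag bits, mirroring the proposal: pager config is
 * honored only for commands that explicitly opt in via ALLOW_PAGER. */
#define RUN_SETUP	(1 << 0)
#define ALLOW_PAGER	(1 << 1)

/* A command may spawn a pager only if it carries the opt-in flag. */
static int may_use_pager(int cmd_options)
{
	return (cmd_options & ALLOW_PAGER) != 0;
}
```

Under this scheme a porcelain command like `git log` would carry `RUN_SETUP | ALLOW_PAGER`, while plumbing such as upload-pack would carry neither flag and so could never be talked into respecting `pager.*` config.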

Thanks,
Jonathan


Re: Git Merge contributor summit notes

2018-03-12 Thread Jeff King
On Fri, Mar 09, 2018 at 04:06:18PM -0800, Alex Vandiver wrote:

> It was great to meet some of you in person!  Some notes from the
> Contributor Summit at Git Merge are below.  Taken in haste, so
> my apologies if there are any mis-statements.

Thanks very much for these notes!

I think in future years we should do a better job of making sure we have
an official note-taker so that this stuff makes it onto the list. I was
very happy when you announced part-way through the summit that you had
already been taking notes. :)

>   "Does anyone think there's a compelling reason for git to exist?"
> - peff

Heh, those words did indeed escape my mouth.

Your notes look accurate overall from a brief skim. I'm still post-trip
recovering, but I may try to follow-up and expand on a few areas where I
have thoughts. And I'd encourage others to do the same as a way of
bridging the discussion back to the list.

-Peff


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-03-12 Thread Jeff King
On Mon, Mar 12, 2018 at 03:43:55PM -0700, Jonathan Nieder wrote:

> Hi,
> 
> Jeff King wrote:
> > On Thu, Feb 22, 2018 at 01:26:34PM -0800, Jonathan Nieder wrote:
> 
> >> Keep in mind that git upload-archive (a read-only command, just like
> >> git upload-pack) also already has the same issues.
> >
> > Yuck. I don't think we've ever made a historical promise about that. But
> > then, I don't think the promise about upload-pack has ever really been
> > documented, except in mailing list discussions.
> 
> Sorry to revive this old side-thread.  Good news: for a dashed command
> like git-upload-archive, the pager selection code only runs for
> commands with RUN_SETUP or RUN_SETUP_GENTLY:
> 
>   if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
>   !(p->option & DELAY_PAGER_CONFIG))
>   use_pager = check_pager_config(p->cmd);
> 
> None of upload-pack, receive-pack, git-serve, or upload-archive set
> those flags, so we (narrowly) escape trouble here.

Right, I saw that earlier. But I actually think that is stale from the
days when it wasn't safe to call check_pager_config() too early. So I
could very well see somebody removing it and causing a spooky
vulnerability at a distance.

> Later today I should be able to send a cleanup to make the behavior
> more obvious.

Thanks. I'm still on the fence over the whole builtin concept, but
certainly a "don't ever turn on a pager" flag seems like a reasonable
thing to have.

An alternative approach is some kind of global for "don't trust the
local repo" flag. That could be respected from very low-level code
(e.g., where we read and/or respect the pager command, but also in other
places like hooks, other config that runs arbitrary commands, etc). And
then upload-pack would set that to "do not trust", and other programs
would default to "trust".

We could even give it an environment variable, which would allow
something like:

  tar xf maybe-evil.git.tar
  cd maybe-evil
  export GIT_TRUST_REPO=false
  git log

without worrying about pager.log config, etc. My two concerns with this
approach would be:

  1. We have to manually annotate any "dangerous" code to act more
 safely when it sees the flag. Which means it's highly likely to
     miss a spot, or to add a new feature which doesn't respect it. And
 suddenly that's a security hole. So I'm concerned it may create a
 false sense of security and actually make things worse.

  2. As a global, I'm not sure how it would interact with multi-repo
 processes like submodules. In theory it ought to go into the
 repository struct, but it would often need to be set globally
 before we've even discovered the repo.

 That might be fine, though. It's really more about context than
 about a specific repo (so you may say "don't trust this repo", and
 that extends to any submodules you happen to access, too).

I dunno. I think (2) is probably OK, but (1) really gives me pause.
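For discussion's sake, the "don't trust the local repo" global might boil down to something as small as the helper below. GIT_TRUST_REPO is the hypothetical variable from the example above; nothing like it exists in git today, and the point of concern (1) is exactly that every dangerous code path would have to remember to call such a helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Treat the repository as trusted unless GIT_TRUST_REPO is explicitly
 * set to "false". Low-level code (pager selection, hooks, config that
 * runs commands) would consult this before acting on repo-local data. */
static int repo_is_trusted(void)
{
	const char *v = getenv("GIT_TRUST_REPO");	/* hypothetical */
	return !v || strcmp(v, "false") != 0;
}
```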

-Peff


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Jonathan Nieder
Hi,

Hilco Wijbenga wrote:
> On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder  wrote:

>> Interesting.  I would be tempted to resolve this inconsistency the
>> other way: by doing a half-hearted two-way merge (e.g. by picking one
>> of the two versions of the colliding file) and marking the path as
>> conflicted in the index.  That way it's more similar to edit/edit,
>> too.
>
> If work is going to be done in this area, would it be possible to
> include making auto-merging (in general) optional? Preferably,
> configurable by file (or glob) but I'd already be happy with a global
> setting to opt out.

Have you experimented with the 'merge' attribute (see "git help
attributes")?  E.g. you can put

 * -merge

in .gitattributes or .git/info/attributes.
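Since the original request was for per-file or per-glob control: attribute patterns use the same syntax as .gitignore, so the opt-out can be scoped rather than global. The paths below are purely illustrative:

```
*.min.js        -merge
generated/**    -merge
```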

If that helps, then a patch adding a pointer to the most helpful place
(maybe git-merge.txt?) would be very welcome.

Thanks and hope that helps,
Jonathan


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Hilco Wijbenga
On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder  wrote:
> Interesting.  I would be tempted to resolve this inconsistency the
> other way: by doing a half-hearted two-way merge (e.g. by picking one
> of the two versions of the colliding file) and marking the path as
> conflicted in the index.  That way it's more similar to edit/edit,
> too.

If work is going to be done in this area, would it be possible to
include making auto-merging (in general) optional? Preferably,
configurable by file (or glob) but I'd already be happy with a global
setting to opt out.

I keep running into bugs caused by Git's automatic merging. (And I
don't see how this could be improved without teaching Git the
specifics of various file types.) It's especially hard when rebasing
large numbers of commits. The bug is introduced early on but I will
not notice anything is wrong until late in the rebase.

(Since I seem to keep asking for things that turn out to already have
been implemented ... if this is already possible please just point me
to the right setting and consider me a happy camper. :-) )


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-03-12 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Feb 22, 2018 at 01:26:34PM -0800, Jonathan Nieder wrote:

>> Keep in mind that git upload-archive (a read-only command, just like
>> git upload-pack) also already has the same issues.
>
> Yuck. I don't think we've ever made a historical promise about that. But
> then, I don't think the promise about upload-pack has ever really been
> documented, except in mailing list discussions.

Sorry to revive this old side-thread.  Good news: for a dashed command
like git-upload-archive, the pager selection code only runs for
commands with RUN_SETUP or RUN_SETUP_GENTLY:

	if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
	    !(p->option & DELAY_PAGER_CONFIG))
		use_pager = check_pager_config(p->cmd);

None of upload-pack, receive-pack, git-serve, or upload-archive set
those flags, so we (narrowly) escape trouble here.

Later today I should be able to send a cleanup to make the behavior
more obvious.

Thanks again,
Jonathan


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Igor Djordjevic
Hi Dscho,

On 11/03/2018 23:04, Igor Djordjevic wrote:
> 
> I`m yet to read (and reason about) your whole (very informative) 
> reply, but I just wanted to address this part first, as it might be a 
> clear end-game situation already, due to a mutual agreement, all the 
> rest being purely academic, interesting, but not any more (that) 
> important to discuss.

Ok, here`s the follow-up.

It`s "for discussion sake only", nothing really groundbreaking in 
here, I would think.

On 11/03/2018 16:40, Johannes Schindelin wrote:
> 
> > > > The main problem with this decision is that we still don't see how
> > > > and when to stop for user amendment using this method. OTOH, the
> > > > original has this issue carefully discussed.
> > >
> > > Why would we want to stop, unless there are merge conflicts?
> >
> > Because we can reliably know that something "unusual" happened - and by
> > that I don`t necessarily mean "wrong", but just might be worth user
> > inspection.
> 
> We have a similar conundrum in recursive merges. Remember how multiple
> merge bases are merged recursively? There can be merge conflicts, too, in
> *any* of the individual merges involved, and indeed, there are (under
> relatively rare circumstances).
> 
> Since we already faced that problem, and we already answered it by
> presenting possibly nested merge conflicts, I am in strong favor of
> keeping our new scenario consistent: present possibly-nested merge
> conflicts.

This is something I didn`t really know (possibly-nested merge 
conflicts already being a regular part of Git user experience), 
thanks for explaining it.

In the light of this, I can only agree, let`s keep it consistent.

If anyone ever decides / finds out there`s a better approach in 
regards to user experience, this might get revised, but it`s a 
different beast altogether, yes.

> As far as I understand, one of the arguments in favor of the current
> approach was: there is no good way to tell the user where they are, and
> how to continue from there. So better just to continue and present the
> user with the entire set of conflicts, and have an obvious way out.

Yes, I see this as the main concern, too. I would have expected that 
being in a kind of a "limbo" for a while shouldn`t be too bad, but I 
guess that`s too academic (and inexperienced) thought, and from a 
practical point of view one may not really know how to approach the 
(iterative) conflicts in the first place, not knowing his exact 
position (nor what`s to come)...?

Or, might be we _can_ provide enough clues on where we currently are 
(even if still inside some intermediate state)...? But, this still 
might be a topic for the future, indeed, and unrelated to rebasing 
merges alone (as you pointed out already).

> > For example, situation like this (M is made on A3 with `-s ours`, 
> > obsoleting Bx commits):
> >
> > (1) ---X8--X9 (master)
> >|\
> >| A1---A2---A3
> >| \
> >|  M (topic)
> >| /
> >\-B1---B2---B3
> >
> > ... where we want to rebase M onto X9 is what I would call "usual 
> > stuff", but this situation (M is still made on A3 with `-s ours`, 
> > obsoleting Bx commits, but note cherry-picked B2'):
> >
> > (2) ---X8--B2'--X9 (master)
> >|\
> >| A1---A2---A3
> >| \
> >|  M (topic)
> >| /
> >\-B1---B2---B3
> >
> > ... where we still want to rebase M onto X9 is what we might consider 
> > "unusual", because we noticed that something that shouldn`t be part 
> > of the rebased merge commit (due to previous `-s ours`) actually got 
> > in there (due to later cherry-pick), and just wanting the user to 
> > check and confirm.
> 
> We already have those scenarios when performing a regular interactive
> rebase, where a patch was already applied upstream. In the normal case,
> the user is not even shown B2, thanks to the --cherry-pick option used in
> generating the todo list.
> 
> Granted, in some cases --cherry-pick does not detect that, and then we
> generate a todo list including B2, and when that patch is applied, the
> interactive rebase stops, saying that there are no changes to be
> committed.
> 
> And this behavior is exactly the same with --recreate-merges!
> 
> So I do not think that it would make sense to bother the user *again* when
> rebasing the merge commit.

This seems fair enough. Phillip also pointed out it might be more 
annoyance than help, but as no one was really sure of the possibilities 
we`re discussing here, I thought it better to play it a bit on the 
safe side, for the first time, at least.

I would still like to see more examples of where this U1' == U2' 
check actually helps, and counter ones, where it only serves to annoy. 
Might be we only discover them in the future, though, once the new 
functionality is in use.

> If there are merge conflicts, yes, we will have to. If there are none
> (even if your U1' != 

Re: [PATCH v4 05/35] upload-pack: factor out processing lines

2018-03-12 Thread Brandon Williams
On 03/12, Brandon Williams wrote:
> On 03/01, Junio C Hamano wrote:
> > Brandon Williams  writes:
> > 
> > > Factor out the logic for processing shallow, deepen, deepen_since, and
> > > deepen_not lines into their own functions to simplify the
> > > 'receive_needs()' function in addition to making it easier to reuse some
> > > of this logic when implementing protocol_v2.
> > 
> > These little functions that still require their incoming data to
> > begin with fixed prefixes feels a bit strange way to refactor the
> > logic for later reuse (when I imagine "reuse", the first use case
> > that comes to my mind is "this data source our new code reads from
> > gives the same data as the old 'shallow' packet used to give, but in
> > a different syntax"---so I'd restructure the code in such a way that
> > the caller figures out the syntax part and the called helper just
> > groks the "information contents" unwrapped from the surface syntax;
> > the syntax may be different in the new codepath but once unwrapped,
> > the "information contents" to be processed would not be different
> > hence we can reuse the helper).
> > 
> > IOW, I would have expected the caller to be not like this:
> > 
> > > -	if (skip_prefix(line, "shallow ", &arg)) {
> > > -		struct object_id oid;
> > > -		struct object *object;
> > > -		if (get_oid_hex(arg, &oid))
> > > -			die("invalid shallow line: %s", line);
> > > -		object = parse_object(&oid);
> > > -		if (!object)
> > > -			continue;
> > > -		if (object->type != OBJ_COMMIT)
> > > -			die("invalid shallow object %s", oid_to_hex(&oid));
> > > -		if (!(object->flags & CLIENT_SHALLOW)) {
> > > -			object->flags |= CLIENT_SHALLOW;
> > > -			add_object_array(object, NULL, &shallows);
> > > -		}
> > > +	if (process_shallow(line, &shallows))
> > > 		continue;
> > > +	if (process_deepen(line, &depth))
> > > 		continue;
> > ...
> > 
> > but more like
> > 
> > 	if (skip_prefix(line, "shallow ", &arg)) {
> > 		process_shallow(arg, &shallows);
> > 		continue;
> > 	}
> > 	if (skip_prefix(line, "deepen ", &arg)) {
> > 		process_deepen(arg, &depth);
> > 		continue;
> > 	}
> > 	...
> > 
> > I need to defer the final judgment until I see how they are used,
> > though.  It's not too big a deal either way---it just felt "not
> > quite right" to me.
> 
> This is actually a really good point (and maybe the same point stefan
> was trying to make on an old revision of this series).  I think it makes
> much more sense to refactor the code to have a structure like you've
> outlined.  I'll fix this for the next version.

And then I started writing the code and now I don't know which I
prefer.  The issue is that it's for processing a line which has some
well-defined structure, and moving the check for "shallow " away from the
rest of the code which does the processing makes it a little less clear
how that shallow line is to be defined.

-- 
Brandon Williams


Re: [PATCH v4 05/35] upload-pack: factor out processing lines

2018-03-12 Thread Brandon Williams
On 03/01, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Factor out the logic for processing shallow, deepen, deepen_since, and
> > deepen_not lines into their own functions to simplify the
> > 'receive_needs()' function in addition to making it easier to reuse some
> > of this logic when implementing protocol_v2.
> 
> These little functions that still require their incoming data to
> begin with fixed prefixes feels a bit strange way to refactor the
> logic for later reuse (when I imagine "reuse", the first use case
> that comes to my mind is "this data source our new code reads from
> gives the same data as the old 'shallow' packet used to give, but in
> a different syntax"---so I'd restructure the code in such a way that
> the caller figures out the syntax part and the called helper just
> groks the "information contents" unwrapped from the surface syntax;
> the syntax may be different in the new codepath but once unwrapped,
> the "information contents" to be processed would not be different
> hence we can reuse the helper).
> 
> IOW, I would have expected the caller to be not like this:
> 
> > -	if (skip_prefix(line, "shallow ", &arg)) {
> > -		struct object_id oid;
> > -		struct object *object;
> > -		if (get_oid_hex(arg, &oid))
> > -			die("invalid shallow line: %s", line);
> > -		object = parse_object(&oid);
> > -		if (!object)
> > -			continue;
> > -		if (object->type != OBJ_COMMIT)
> > -			die("invalid shallow object %s", oid_to_hex(&oid));
> > -		if (!(object->flags & CLIENT_SHALLOW)) {
> > -			object->flags |= CLIENT_SHALLOW;
> > -			add_object_array(object, NULL, &shallows);
> > -		}
> > +	if (process_shallow(line, &shallows))
> > 		continue;
> > +	if (process_deepen(line, &depth))
> > 		continue;
>   ...
> 
> but more like
> 
> 	if (skip_prefix(line, "shallow ", &arg)) {
> 		process_shallow(arg, &shallows);
> 		continue;
> 	}
> 	if (skip_prefix(line, "deepen ", &arg)) {
> 		process_deepen(arg, &depth);
> 		continue;
> 	}
>   ...
> 
> I need to defer the final judgment until I see how they are used,
> though.  It's not too big a deal either way---it just felt "not
> quite right" to me.

This is actually a really good point (and maybe the same point stefan
was trying to make on an old revision of this series).  I think it makes
much more sense to refactor the code to have a structure like you've
outlined.  I'll fix this for the next version.

-- 
Brandon Williams


Re: [PATCH v4 19/35] push: pass ref patterns when pushing

2018-03-12 Thread Brandon Williams
On 03/02, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Construct a list of ref patterns to be passed to 'get_refs_list()' from
> > the refspec to be used during the push.  This list of ref patterns will
> > be used to allow the server to filter the ref advertisement when
> > communicating using protocol v2.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  transport.c | 26 +-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> When you are pushing 'master', we no longer hear what the other end
> has at 'next', with this change, no?
> 
> In a project whose 'master' is extended primarily by merging topics
> that have been cooking in 'next', old way of pushing would only have
> transferred the merge commits and resulting trees but not bulk of
> the blob data because they are all available on 'next', would it
> make the object transfer far less efficient, I wonder?
> 
> I guess it is OK only because the push side of the current protocol
> does not do common ancestor discovery exchange ;-)

Yep, though we've been throwing around ideas of adding that in push v2
after we figure out a good way to improve negotiation with fetch.

> 
> >
> > diff --git a/transport.c b/transport.c
> > index dfc603b36..bf7ba6879 100644
> > --- a/transport.c
> > +++ b/transport.c
> > @@ -1026,11 +1026,35 @@ int transport_push(struct transport *transport,
> > int porcelain = flags & TRANSPORT_PUSH_PORCELAIN;
> > int pretend = flags & TRANSPORT_PUSH_DRY_RUN;
> > int push_ret, ret, err;
> > +   struct refspec *tmp_rs;
> > +   struct argv_array ref_patterns = ARGV_ARRAY_INIT;
> > +   int i;
> >  
> > if (check_push_refs(local_refs, refspec_nr, refspec) < 0)
> > return -1;
> >  
> > -	remote_refs = transport->vtable->get_refs_list(transport, 1, NULL);
> > +	tmp_rs = parse_push_refspec(refspec_nr, refspec);
> > +	for (i = 0; i < refspec_nr; i++) {
> > +		const char *pattern = NULL;
> > +
> > +		if (tmp_rs[i].dst)
> > +			pattern = tmp_rs[i].dst;
> > +		else if (tmp_rs[i].src && !tmp_rs[i].exact_sha1)
> > +			pattern = tmp_rs[i].src;
> > +
> > +		if (pattern) {
> > +			if (tmp_rs[i].pattern)
> > +				argv_array_push(&ref_patterns, pattern);
> > +			else
> > +				expand_ref_pattern(&ref_patterns, pattern);
> > +		}
> > +	}
> > +
> > +	remote_refs = transport->vtable->get_refs_list(transport, 1,
> > +						       &ref_patterns);
> > +
> > +	argv_array_clear(&ref_patterns);
> > +	free_refspec(refspec_nr, tmp_rs);
> >  
> > if (flags & TRANSPORT_PUSH_ALL)
> > match_flags |= MATCH_REFS_ALL;

-- 
Brandon Williams


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Ævar Arnfjörð Bjarmason

On Mon, Mar 12 2018, Elijah Newren jotted:

> Hi everyone,
>
> I'd like to change add/add conflict resolution.  Currently when such a
> conflict occurs (say at ${path}), git unconditionally does a two-way
> merge of the two files and sticks the result in the working tree at
> ${path}.
>
> I would like to make it instead first check whether the two files are
> similar; if they are, then do the two-way merge, but if they're not,
> then instead write the two files out to separate paths (${path}~HEAD
> and ${path}~$MERGE, while making sure that ${path} is removed from the
> working copy).
>
> Thoughts?
>
> I have a patch series[1] with more details and other changes, but
> wanted to especially get feedback on this issue even from folks that
> didn't have enough time to read the patches or even the cover letter.

Does this mean that e.g. in this case of merging two files, one
containing "foo" and one containing "bar":

(
rm -rf /tmp/test.git &&
git init /tmp/test.git &&
cd /tmp/test.git &&
echo foo >README &&
git add README &&
git commit -mfoo &&
git checkout --orphan trunk &&
git reset --hard &&
echo bar >README &&
git add README &&
git commit -mbar &&
git merge --allow-unrelated-histories master;
cat README
)

That instead of getting:

<<<<<<< HEAD
bar
=======
foo
>>>>>>> master

I'd now get these split into different files?

I'm assuming by similarity you're talking about the same heuristic we
apply for git diff -M, i.e. in cases where rename detection would
consider a file removed/added instead of moved, you'd want two files
instead of the two-way merge.

I don't mind this being a configurable option if you want it, but I
don't think it should be on by default, reasons:

 1) There's lots of cases where we totally screw up the "is this
similar?" check, in particular with small files.

E.g. let's say you have a config file like 'fs-path "/tmp/git"' and
in two branches you change that to 'fs-path "/opt/git"' and 'fs-path
"/var/git"'. The rename detection will think these have nothing
to do with each other since they share no common lines, but to a
human reader they're really similar, and would make sense in the
context of resolving a bigger merge where /{opt,var}/git changes are
conflicting.

This is not some theoretical concern, there's lots of things that
e.g. use small 5-10 line config files to configure some app that
because of some combo of indentation changes and changing a couple
of lines will make git's rename detection totally give up, but to a
human reader they're 95% the same.

 2) This will play havoc with already established merge tools on top of
git which a lot of users use instead of manually resolving these in
vi or whatever.

If we made this the default they'd need to deal with this new
state, and even if it's not the default we'll have some confused
users wondering why Emacs Ediff or whatever isn't showing the right
thing because it isn't supporting this yet.

So actually, given that last point in #2 I'm slightly negative on the
whole thing, but maybe splitting it into some new format existing tools
don't understand is compelling enough to justify the downstream breakage.

I don't think we've ever documented the format we leave the tree in
after a failed merge as equivalent to plumbing, but for the purposes of
tools that build on top of git it really is.
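Regarding point #1 above, a toy line-based similarity score makes the failure mode concrete. Git's real heuristic (diffcore-rename) scores hashed content chunks rather than whole lines, but it shares the property that two one-line files differing in that single line score 0%, no matter how alike they look to a human:

```c
#include <assert.h>
#include <string.h>

/* Crude sketch: fraction of the lines of `a` that also appear
 * verbatim somewhere in `b`. Not git's actual algorithm. */
static double line_similarity(const char *a, const char *b)
{
	char line[256];
	int total = 0, common = 0;
	const char *p = a;

	while (*p) {
		const char *nl = strchr(p, '\n');
		size_t len = nl ? (size_t)(nl - p + 1) : strlen(p);

		if (len < sizeof(line)) {
			memcpy(line, p, len);
			line[len] = '\0';
			total++;
			if (strstr(b, line))
				common++;	/* line of a also occurs in b */
		}
		p += len;
	}
	return total ? (double)common / total : 0.0;
}
```

With this score, the config-file example above ('fs-path "/tmp/git"' vs. 'fs-path "/opt/git"') comes out at 0% similar, exactly the case where a size-based cutoff would wrongly split the files.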


Re: [PATCH v4 18/35] fetch: pass ref patterns when fetching

2018-03-12 Thread Brandon Williams
On 03/02, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > diff --git a/builtin/fetch.c b/builtin/fetch.c
> > index 850382f55..695fafe06 100644
> > --- a/builtin/fetch.c
> > +++ b/builtin/fetch.c
> > @@ -332,11 +332,25 @@ static struct ref *get_ref_map(struct transport 
> > *transport,
> > struct ref *rm;
> > struct ref *ref_map = NULL;
> > struct ref **tail = _map;
> > +   struct argv_array ref_patterns = ARGV_ARRAY_INIT;
> >  
> > /* opportunistically-updated references: */
> > 	struct ref *orefs = NULL, **oref_tail = &orefs;
> >  
> > -	const struct ref *remote_refs = transport_get_remote_refs(transport, NULL);
> > +	const struct ref *remote_refs;
> > +
> > +	for (i = 0; i < refspec_count; i++) {
> > +		if (!refspecs[i].exact_sha1) {
> > +			if (refspecs[i].pattern)
> > +				argv_array_push(&ref_patterns, refspecs[i].src);
> > +			else
> > +				expand_ref_pattern(&ref_patterns, refspecs[i].src);
> > +		}
> > +	}
> > +
> > +	remote_refs = transport_get_remote_refs(transport, &ref_patterns);
> > +
> > +	argv_array_clear(&ref_patterns);
> 
> Is the idea here, which is shared with 17/35 about ls-remote, that
> we used to grab literally everything they have in remote_refs, but
> we have code in place to filter that set using refspecs given in the
> remote.*.fetch configuration, so it is OK as long as we grab everything
> that would match the remote.*.fetch pattern?  That is, grabbing too
> much is acceptable, but if we populated ref_patterns[] with too few
> patterns and fail to ask refs that would match our refspec it would
> be a bug?

Yes that's the idea.  Right now we're in the state where we ask for
everything (since there is no server side filtering) and the client just
does its own filtering after the fact using the refspec.  So if we end
up not sending enough ref patterns to match what the refspec is, it
would be a bug.
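The contract described here ("over-matching is fine, under-matching is a bug") can be pictured client-side as a simple glob filter. This is an illustrative sketch, not the actual protocol-v2 ref-prefix implementation:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* A ref survives if it matches any requested pattern; an empty
 * pattern list means "advertise everything". Sending patterns that
 * match too much is harmless (the client re-filters with the real
 * refspec); sending too few would silently drop needed refs. */
static int ref_matches(const char *refname, const char **patterns, int n)
{
	int i;

	if (!n)
		return 1;
	for (i = 0; i < n; i++)
		if (!fnmatch(patterns[i], refname, 0))
			return 1;
	return 0;
}
```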

> 
> The reason behind this question is that I am wondering if/how we can
> take advantage of this remote-side pre-filtering while doing "fetch
> --prune".

Hmm maybe, assuming prune then means "get rid of all remote-tracking
branches that don't match the user provided refspec"

> 
> Thanks.
> 
> >  
> > if (refspec_count) {
> > struct refspec *fetch_refspec;

-- 
Brandon Williams


Re: [PATCH v4 12/35] serve: introduce git-serve

2018-03-12 Thread Brandon Williams
On 03/01, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> >  Documentation/technical/protocol-v2.txt | 171 
> 
> Unlike other things in Documentation/technical/, this is not listed
> on TECH_DOCS list in Documentation/Makefile.  Shouldn't it be?

Yes it should, I'll fix that.
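For reference, the fix presumably amounts to one more line in Documentation/Makefile alongside the existing entries (a sketch, not the actual patch):

```
TECH_DOCS += technical/protocol-v2
```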

-- 
Brandon Williams


Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-12 Thread Ævar Arnfjörð Bjarmason

On Mon, Mar 12 2018, Junio C. Hamano jotted:

> On Mon, Mar 12, 2018 at 11:56 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> As someone who expects to use this (although hopefully in slightly
>> modified form), it's very useful if we can keep the useful semantics in
>> gc.* config values without needing some external job finding repos and
>> creating *.keep files to get custom behavior.
>>
>> E.g. I have the use-case of wanting to set this on servers that I know
>> are going to be used for cloning some big repos in user's ~ directory
>> manually, so if I can set something sensible in /etc/gitconfig that's
>> great, but it sucks a lot more to need to write some cronjob that goes
>> hunting for repos in those ~ directories and tweaks *.keep files.
>
> Yeah, but that is exactly what I suggested, no? That is, if you don't do any
> specific marking to describe _which_ ones need to be kept, this new thing
> would kick in and pick the largest one and repack all others. If you choose
> to want more control, on the other hand, you can mark those packs you
> would want to keep, and this mechanism will not kick in to countermand
> your explicit settings done via those .keep files.

Yes, this configurable mechanism as it stands only needs /etc/gitconfig.

What I was pointing out in this mail is that we really should get the
advanced use-cases right as well (see my
87a7vdqegi@evledraar.gmail.com for details) via the config, because
it's a pain to cross the chasm between setting config centrally on the
one hand, and needing to track down .git's in arbitrary locations on the
FS (you may not have cloned them yourself) to set *.keep flags.

Doubly so if the machines in question are just the laptops of some
developers. It's relatively easy to tell them "we work with git repos,
run this git config commands", not so easy to have them install & keep
up-to-date some arbitrary cronjob that needs to hunt down their repos
and set *.keep flags.


Re: [PATCH v4 02/35] pkt-line: allow peeking a packet line without consuming it

2018-03-12 Thread Brandon Williams
On 03/01, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> > +{
> > +   if (reader->line_peeked) {
> > +   reader->line_peeked = 0;
> > +   return reader->status;
> > +   }
> > +
> > +	reader->status = packet_read_with_status(reader->fd,
> > +						 &reader->src_buffer,
> > +						 &reader->src_len,
> > +						 reader->buffer,
> > +						 reader->buffer_size,
> > +						 &reader->pktlen,
> > +						 reader->options);
> > +
> > +   switch (reader->status) {
> > +   case PACKET_READ_EOF:
> > +   reader->pktlen = -1;
> > +   reader->line = NULL;
> > +   break;
> > +   case PACKET_READ_NORMAL:
> > +   reader->line = reader->buffer;
> > +   break;
> > +   case PACKET_READ_FLUSH:
> > +   reader->pktlen = 0;
> > +   reader->line = NULL;
> > +   break;
> > +   }
> > +
> > +   return reader->status;
> > +}
> 
> With the way _peek() interface interacts with the reader instance
> (which by the way I find is well designed), it is understandable
> that we want almost everything available in reader's fields, but
> having to manually clear the pktlen field upon a non-NORMAL status
> feels a bit strange.
> 
> Perhaps that is because the underlying packet_read_with_status()
> does not set *pktlen in these cases?  Shouldn't it be doing that so
> the caller does not have to?

That's true, I'll fix that.


-- 
Brandon Williams


Re: Bug in "revision.c: --all adds HEAD from all worktrees" ?

2018-03-12 Thread Ævar Arnfjörð Bjarmason

On Thu, Nov 16 2017, Luke Diamand jotted:

> On 15 November 2017 at 22:08, Junio C Hamano  wrote:
>> Luke Diamand  writes:
>>
>>> Quite a few of the worktrees have expired - their head revision has
>>> been GC'd and no longer points to anything sensible
>>> (gc.worktreePruneExpire). The function other_head_refs() in worktree.c
>>> bails out if there's an error, which I think is the problem. I wonder
>>> if it should instead just report something and then keep going.
>>
>> Am I correct to understand that your "git fsck" would fail because
>> these HEAD refs used by other stale worktrees are pointing at
>> missing objects?
>
> git fsck says:
>
> Checking object directories: 100% (256/256), done.
> Checking objects: 100% (1434634/1434634), done.
> error: HEAD: invalid reflog entry
> 7fa2b7ee4bc0d11529f659db8b13ed1f537d2a98
> error: HEAD: invalid reflog entry
> 7fa2b7ee4bc0d11529f659db8b13ed1f537d2a98
> error: HEAD: invalid reflog entry
> 7e79e09e8a7382f91610f7255a1b99ea59f68c0b
> error: refs/stash: invalid reflog entry
> feeb35e7b045d28943c706e761d0a2ac8206af2f
> error: refs/remotes/origin/master: invalid reflog entry
> 7fa2b7ee4bc0d11529f659db8b13ed1f537d2a98
> Checking connectivity: 1419477, done.
> missing tree 1480c0a7ed2ad59ae701667292399c38d294658e
> missing tree ca2a01116bfbbd1fcbcf9812b95d8dc6c39e69d5
> missing tree 5b7c41e547fc5c4c840e5b496da13d3daebc5fbe
> ...
> ...
>
>>
>> What do you mean by "expired"?  "Even though I want to keep using
>> them, Git for some reason decided to destroy them." or "I no longer
>> use them but kept them lying around."?
>
> git worktree automatically prunes work trees:
>
> "The working tree’s administrative files in the repository (see
> "DETAILS" below) will eventually be removed automatically (see
> gc.worktreePruneExpire in git-config(1)),"
>
> In my case I didn't actually want them removed, but fortunately
> there's nothing important in them (there certainly isn't anymore...).
>
>>
>> If the latter, I wonder if "worktree prune" to remove the
>> administrative information for them would unblock you?
>
> It doesn't seem to help.
>
> $ git worktree prune -n
> 
> $ git worktree prune
> $ git fetch
> remote: Counting objects: 35, done.
> remote: Compressing objects: 100% (20/20), done.
> remote: Total 21 (delta 17), reused 5 (delta 1)
> Unpacking objects: 100% (21/21), done.
> fatal: bad object HEAD
> error: ssh://whatever/myrepol did not send all necessary objects
> $ /usr/bin/git-2.7.3 fetch
> 

Digging up this old thread. I've also noticed this bug because I tried
to "git repack -A -d" a repo on a GitLab server and just got "fatal: bad
object HEAD".

Bisect revealed that the reason was that GitLab itself runs 2.14.3,
which is the last release version that doesn't have Duy's d0c39a49cc
("revision.c: --all adds HEAD from all worktrees", 2017-08-23), and that
the HEAD revision of some worktrees was corrupt (GitLab creates squash-*
worktrees for some reason).

Doing a "git worktree prune" beforehand makes it work.

This can be reproduced with:

(
rm -rf /tmp/git &&
git clone --bare https://github.com/git/git.git /tmp/git &&
cd /tmp/git &&
git worktree add blah HEAD &&
echo  > worktrees/blah/HEAD
)

Now, obviously the root cause is that the repo is corrupt through some
other bug (since fixed?), but the error message here is really bad, and
should at least say "fatal: bad object HEAD in worktree blah" or
something like that.
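The repro above can be extended into a full round trip that also shows the workaround. The sketch below is illustrative only: the scratch directory, committer identity, and the bogus object name written into the worktree's HEAD are all made up, and it avoids the network clone from the original recipe.

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
g() { git -c user.email=a@b.c -c user.name=a "$@"; }
g commit -q --allow-empty -m init
git worktree add ../blah HEAD >/dev/null 2>&1
# Simulate the corruption: point the linked worktree's HEAD at a missing object.
echo 0123456789abcdef0123456789abcdef01234567 > .git/worktrees/blah/HEAD
# With the worktree HEAD corrupt, --all walks may die with "bad object HEAD".
git rev-list --all >/dev/null 2>&1 || echo "rev-list --all: fatal with corrupt worktree HEAD"
# The workaround from the thread: prune the stale administrative files.
rm -rf ../blah
git worktree prune --expire now
git rev-list --all >/dev/null && echo "rev-list --all: ok after prune"
```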


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Jonathan Nieder
Hi again,

Elijah Newren wrote:
> On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder  wrote:

>> Would this behavior be configurable or unconditional?  I suspect I
>> would want it turned off in my own use.
>>
>> On the other hand, in the case of wild difference between the two
>> files, skipping the two-way merge and just writing one of the versions
>> to the worktree (like we do for binary files) sounds like something I
>> would like in my own use.
>
> I think you just said the exact opposite thing in these last two
> paragraphs; that you wouldn't want my proposed behavior and that you'd
> want it.  I suspect that may mean that I misunderstood something you
> said here.  Could you clarify?

Sorry for the lack of clarity.  My understanding was that the proposed
behavior was to write two files:

${path}~HEAD
${path}~MERGE

My proposal is instead to write one file:

${path}

with the content that would have gone to ${path}~HEAD.  This is what
already happens when trying to merge binary files.

[...]
>> Can you add something more about the motivation to the commit message?
>> E.g. is this about performance, interaction with some tools, to
>> support some particular workflow, etc?
>
> To be honest, I'm a little unsure how to do that without even more
> excessive and repetitive wording across commits.

Simplest way IMHO is to just put the rationale in patch 5/5. :)  In
other words, explain the rationale for the end-user facing change in the
same patch that changes the end-user facing behavior.

> Let me attempt here, and maybe you
> can suggest how to change my commit messages?
>
>   * When files are wildly dissimilar -- as you mentioned -- it'd be
> easier for users to resolve conflicts if we wrote files out to
> separate paths instead of two-way merging them.

Today what we do (in both the wildly-dissimilar case and the
less-dissimilar case) is write one proposed resolution to the worktree
and put the competing versions in the index.  Tools like "git
mergetool" are then able to pull the competing versions out of the
index to allow showing them at the same time.
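For concreteness, the competing versions that tools pull out of the index can be inspected directly with the `:<stage>:<path>` syntax. A throwaway edit/edit example (repository, identity, and contents are arbitrary):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
g() { git -c user.email=a@b.c -c user.name=a "$@"; }
main=$(git symbolic-ref --short HEAD)
echo v1 > f && g add f && g commit -qm base
g checkout -qb topic && echo v2 > f && g commit -qam topic
g checkout -q "$main" && echo v3 > f && g commit -qam ours
g merge topic >/dev/null 2>&1 || true   # edit/edit conflict on f
git show :1:f   # merge base version  ("v1"); absent for add/add
git show :2:f   # our version         ("v3")
git show :3:f   # their version       ("v2")
```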

My bias is that I've used VCSes before that wrote multiple competing
files to the worktree and I have been happier with my experience
resolving conflicts in git.  E.g. at any step I can run a build to try
out the current proposed resolution, and there's less of a chance of
accidentally committing a ~HEAD file.

[...]
> There are three types of conflicts representing two (possibly
> unrelated) files colliding at the same path: add/add, rename/add, and
> rename/rename(2to1).  add/add does the two-way merge of the colliding
> files, and the other two conflict types write the two colliding files
> out to separate paths.

Interesting.  I would be tempted to resolve this inconsistency the
other way: by doing a half-hearted two-way merge (e.g. by picking one
of the two versions of the colliding file) and marking the path as
conflicted in the index.  That way it's more similar to edit/edit,
too.

Thanks,
Jonathan


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Elijah Newren
Hi,

Cool, thanks for taking a look!

On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder  wrote:
>
> My immediate reaction is that it seems inconsistent with the rest of
> merge behavior.  Why would add/add behave this way but edit/edit not
> behave this way?

Fair enough.  I have two separate reasons for believing that edit/edit
IS different than add/add; further, the current behavior for add/add
is inconsistent with the rest of merge behavior on a different axis.
I think it's helpful to get the motivation for my changes before
trying to explain why those are different and before delving into the
other inconsistency, so I'll add the explanation of this claim at the
end of this email.

> Would this behavior be configurable or unconditional?  I suspect I
> would want it turned off in my own use.
>
> On the other hand, in the case of wild difference between the two
> files, skipping the two-way merge and just writing one of the versions
> to the worktree (like we do for binary files) sounds like something I
> would like in my own use.

I think you just said the exact opposite thing in these last two
paragraphs; that you wouldn't want my proposed behavior and that you'd
want it.  I suspect that may mean that I misunderstood something you
said here.  Could you clarify?

> Can you add something more about the motivation to the commit message?
> E.g. is this about performance, interaction with some tools, to
> support some particular workflow, etc?

To be honest, I'm a little unsure how to do that without even more
excessive and repetitive wording across commits.  Let me attempt here,
and maybe you can suggest how to change my commit messages?

  * When files are wildly dissimilar -- as you mentioned -- it'd be
easier for users to resolve conflicts if we wrote files out to
separate paths instead of two-way merging them.
  * There is a weird inconsistency between handling of add/add,
rename/add, and rename/rename(2to1).  I want this inconsistency fixed.
  * There is a significant performance optimization possible for
rename detection in git merge if we remove the above inconsistency and
treat these conflict types the same.  (Actually, we only need them to
be the same in the special case where a renamed file is unmodified on
the un-renamed side of history, but I don't want to special case that
situation because it sounds like a recipe for inconsistent results).
  * If we insist on these conflict types being handled differently
because there really is some important distinction between them, then
I'm almost certain that build_fake_ancestor() and its usage in
am/rebase is broken/wrong.  This is because the usage of
build_fake_ancestor(), as currently implemented, will cause
mis-detection of one conflict type as another.

> Thanks and hope that helps,
> Jonathan

As promised above, here are my reasons for believing that edit/edit IS
fundamentally different than add/add for the behavior considered here,
as well as my explanation of the weird inconsistency add/add currently
has with the rest of merge behavior:

==Reason 1==
edit/edit means that ${path} existed in the merge base as well as both
sides.  It is more likely than not that ${path} on each side is
similar to the merge base and thus similar (though less so) to each
other.  On the other hand, add/add means ${path} did NOT exist in the
merge base.  Thus, we cannot use the same reason to believe they are
similar.  The only reason we have to assume similarity is the
filename, which, while a useful indicator, gives us a weird
inconsistency within git for handling pathname collisions -- see
"Weird inconsistency" below.

==Reason 2==
In merges, git does rename detection, but not copy or break detection.
In particular, break detection is what would allow us to find out that
the edited files are not similar to the original.  add/add conflicts
automatically come to us "pre-broken" (is there a better term for
knowing that two files are not paired?), though it is possible that
they just happen to be similar and should be paired/joined.

==Follow on to reason 2==
Not doing break detection means we get e.g. rename/add-source cases
wrong, so there are valid arguments for saying we should add break
detection.  However, it would be computationally expensive and would
require a fair amount of work re-thinking all the cases in
merge-recursive to make sure we get rename-breaks right (there are
FIXMEs and comments and testcases I added years ago to help someone
along the path if they want to try).  Since rename/add-source just
isn't a conflict type that occurs much (if it all) in real world
repositories, the motivation to implement it is somewhat low.

==Weird inconsistency git currently exhibits==
There are three types of conflicts representing two (possibly
unrelated) files colliding at the same path: add/add, rename/add, and
rename/rename(2to1).  add/add does the two-way merge of the colliding
files, and the other two conflict types write the two colliding files
out to separate paths.

Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-12 Thread Junio C Hamano
On Mon, Mar 12, 2018 at 11:56 AM, Ævar Arnfjörð Bjarmason
 wrote:
> As someone who expects to use this (although hopefully in slightly
> modified form), it's very useful if we can keep the useful semantics in
> gc.* config values without needing some external job finding repos and
> creating *.keep files to get custom behavior.
>
> E.g. I have the use-case of wanting to set this on servers that I know
> are going to be used for cloning some big repos in user's ~ directory
> manually, so if I can set something sensible in /etc/gitconfig that's
> great, but it sucks a lot more to need to write some cronjob that goes
> hunting for repos in those ~ directories and tweaks *.keep files.

Yeah, but that is exactly what I suggested, no? That is, if you don't do any
specific marking to describe _which_ ones need to be kept, this new thing
would kick in and pick the largest one and repack all others. If you choose
to want more control, on the other hand, you can mark those packs you
would want to keep, and this mechanism will not kick in to countermand
your explicit settings done via those .keep files.
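The manual .keep marking Junio refers to can be exercised today without any of the proposed patches. A minimal sketch in a scratch repository (identity and commit contents are made up; the point is only that `repack -a -d` leaves a pack with a matching .keep file alone):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
g() { git -c user.email=a@b.c -c user.name=a "$@"; }
g commit -q --allow-empty -m one
git repack -a -d -q
pack=$(ls .git/objects/pack/pack-*.pack)
# Mark the pack as precious: "repack -a -d" and "gc" will not explode it.
touch "${pack%.pack}.keep"
g commit -q --allow-empty -m two
git repack -a -d -q
test -f "$pack" && echo "kept pack survived repack -a -d"
```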


Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-12 Thread Ævar Arnfjörð Bjarmason

On Tue, Mar 06 2018, Nguyễn Thái Ngọc Duy jotted:

> pack-objects could be a big memory hog especially on large repos,
> everybody knows that. The suggestion to stick a .keep file on the
> giant base pack to avoid this problem is also known for a long time.
>
> Let's do the suggestion automatically instead of waiting for people to
> come to Git mailing list and get the advice. When a certain condition
> is met, "gc --auto" tells "git repack" to keep the base pack around.
> The end result would be two packs instead of one.
>
> On linux-2.6.git, valgrind massif reports 1.6GB heap in "pack all"
> case, and 535MB [1] in "pack all except the base pack" case. We save
> roughly 1GB memory by excluding the base pack.
>
> gc --auto decides to do this based on an estimation of pack-objects
> memory usage, which is quite accurate at least for the heap part, and
> whether that fits in half of system memory (the assumption here is for
> desktop environment where there are many other applications running).
>
> Since the estimation may be inaccurate and that 1/2 threshold is
> really arbitrary, give the user a finer control over this mechanism:
> if the largest pack is larger than gc.bigBasePackThreshold, it's kept.
>
> PS. A big chunk of the remaining 535MB is the result of pack-objects
> running rev-list internally. This will be dealt with when we could run
> rev-list externally. Right now we can't because pack-objects internal
> rev-list does more regarding unreachable objects, which cannot be done
> by "git rev-list".
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt |   7 ++
>  Documentation/git-gc.txt |  13 
>  builtin/gc.c | 153 +--
>  builtin/pack-objects.c   |   2 +-
>  config.mak.uname |   1 +
>  git-compat-util.h|   4 +
>  pack-objects.h   |   2 +
>  t/t6500-gc.sh|  29 
>  8 files changed, 204 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index f57e9cf10c..120cf6bac9 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1549,6 +1549,13 @@ gc.autoDetach::
>   Make `git gc --auto` return immediately and run in background
>   if the system supports it. Default is true.
>
> +gc.bigBasePackThreshold::
> + Make `git gc --auto` only enable `--keep-base-pack` when the
> + base pack's size is larger than this limit (in bytes).
> + Defaults to zero, which disables this check and lets
> + `git gc --auto` determine when to enable `--keep-base-pack`
> + based on memory usage.
> +

I'm really keen to use this (and would be happy to apply a patch on
top), but want to get your thoughts first, see also my just-sent
87bmftqg1n@evledraar.gmail.com
(https://public-inbox.org/git/87bmftqg1n@evledraar.gmail.com/).

The thing I'd like to change is that the underlying --keep-pack= takes a
list of paths (good!), but then I think this patch needlessly
complicates things by talking about "base packs" and having the
implementation limitation that we only ever pass one --keep-pack down to
pack-objects (bad!).

Why don't we instead just have a gc.* variable that you can set to some
size of pack that we'll always implicitly *.keep? That way I can
e.g. clone a 5GB pack and set the limit to 2GB, then keep adding new
content per the rules of gc.autoPackLimit, until I finally create a
larger than 2GB pack, at that point I'll have 5GB, 2GB, and some smaller
packs and loose objects.

We already have pack.packSizeLimit, perhaps we could call this
e.g. gc.keepPacksSize=2GB?

Or is there a use-case for still having the concept of a "base" pack? Is
it magic in some way? Maybe I'm missing something but I don't see why,
we can just stop thinking about whether some one pack is larger than X,
and consider all packs larger than X specially.

But if we do maybe an extra gc.keepBasePack=true?

Finally I wonder if there should be something equivalent to
gc.autoPackLimit for this. I.e. with my proposed semantics above it's
possible that we end up growing forever, i.e. I could have 1000 2GB
packs and then 50 very small packs per gc.autoPackLimit.

Maybe we need a gc.keepPackLimit=100 to deal with that, then e.g. if
gc.keepPacksSize=2GB is set and we have 101 >= 2GB packs, we'd pick the
two smallest ones and not issue a --keep-pack for those, although then
maybe our memory use would spike past the limit.

I don't know, maybe we can leave that for later, but I'm quite keen to
turn the top-level config variable into something that just considers
size instead of "base" if possible, and it seems we're >95% of the way
to that already with this patch.

Finally, I don't like the way the current implementation conflates a
"size" variable with auto detecting the size from memory, leaving no way
to fallback to the auto-detection if you set it manually.

I think we should split out the auto-memory behavior into another

Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-12 Thread Ævar Arnfjörð Bjarmason

On Wed, Mar 07 2018, Junio C Hamano jotted:

> Duy Nguyen  writes:
>>> But to those who say "packs larger than this value is too big" via
>>> configuration, keeping only the largest of these above-threshold
>>> packs would look counter-intuitive, wouldn't it, I wonder?
>>
>> I think I'll just clarify this in the document. There may be a use
>> case for keeping multiple large packs, but I don't see it (*). We can
>> deal with it when it comes.
>
> When the project's history grows too much, a large pack that holds
> its first 10 years of stuff, together with another one that holds
> its second 20 years of stuff, may both be larger than the threshold
> and want to be kept.  If we pick only the largest one, we would
> explode the other one and repack together with loose objects.
>
> But realistically, those who would want to control the way in which
> their repository is packed to such a degree are very likely to add
> ".keep" files to these two packfiles themselves, so the above would
> probably not a concern.  Perhaps we shouldn't do the "automatically
> pick the largest one and exclude from repacking" when there is a
> packfile that is marked with ".keep"?

As someone who expects to use this (although hopefully in slightly
modified form), it's very useful if we can keep the useful semantics in
gc.* config values without needing some external job finding repos and
creating *.keep files to get custom behavior.

E.g. I have the use-case of wanting to set this on servers that I know
are going to be used for cloning some big repos in user's ~ directory
manually, so if I can set something sensible in /etc/gitconfig that's
great, but it sucks a lot more to need to write some cronjob that goes
hunting for repos in those ~ directories and tweaks *.keep files.


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Jonathan Nieder
Hi,

Elijah Newren wrote:

> Hi everyone,
>
> I'd like to change add/add conflict resolution.  Currently when such a
> conflict occurs (say at ${path}), git unconditionally does a two-way
> merge of the two files and sticks the result in the working tree at
> ${path}.
>
> I would like to make it instead first check whether the two files are
> similar; if they are, then do the two-way merge, but if they're not,
> then instead write the two files out to separate paths (${path}~HEAD
> and ${path}~$MERGE, while making sure that ${path} is removed from the
> working copy).
>
> Thoughts?

My immediate reaction is that it seems inconsistent with the rest of
merge behavior.  Why would add/add behave this way but edit/edit not
behave this way?

Would this behavior be configurable or unconditional?  I suspect I
would want it turned off in my own use.

On the other hand, in the case of wild difference between the two
files, skipping the two-way merge and just writing one of the versions
to the worktree (like we do for binary files) sounds like something I
would like in my own use.

Can you add something more about the motivation to the commit message?
E.g. is this about performance, interaction with some tools, to
support some particular workflow, etc?

Thanks and hope that helps,
Jonathan

> I have a patch series[1] with more details and other changes, but
> wanted to especially get feedback on this issue even from folks that
> didn't have enough time to read the patches or even the cover letter.
>
> Thanks,
> Elijah
>
> [1] https://public-inbox.org/git/20180305171125.22331-1-new...@gmail.com/


Re: [PATCH v2 4/5] pack-objects: show some progress when counting kept objects

2018-03-12 Thread Ævar Arnfjörð Bjarmason

On Tue, Mar 06 2018, Nguyễn Thái Ngọc Duy jotted:

> We only show progress when there are new objects to be packed. But
> when --keep-pack is specified on the base pack, we will exclude most
> of objects. This makes 'pack-objects' stay silent for a long time
> while the counting phase is going.
>
> Let's show some progress whenever we visit an object instead. The
> number of packed objects will be shown after if it's not the same as
> the number of visited objects.
>
> Since the meaning of this number has changed, use another word instead
> of "Counting" to hint about the change.

Can you elaborate on how the meaning has changed? With/without this on
linux.git I get:

With:

Enumerating objects: 5901144, done.
Getting object details: 100% (5901145/5901145), done.
Delta compression using up to 8 threads.

Without:

Counting objects: 5901145, done.
Delta compression using up to 8 threads.

So now we're seemingly off-by-one but otherwise doing the same thing?

As for as user feedback goes we might as well have said "Reticulating
splines", but I have some bias towards keeping the current "Counting
objects..." phrasing. We ourselves have other docs referring to it that
aren't changed by this patch, and there's
e.g. https://githubengineering.com/counting-objects/ and lots of other
3rd party docs that refer to this.


Opinions on changing add/add conflict resolution?

2018-03-12 Thread Elijah Newren
Hi everyone,

I'd like to change add/add conflict resolution.  Currently when such a
conflict occurs (say at ${path}), git unconditionally does a two-way
merge of the two files and sticks the result in the working tree at
${path}.

I would like to make it instead first check whether the two files are
similar; if they are, then do the two-way merge, but if they're not,
then instead write the two files out to separate paths (${path}~HEAD
and ${path}~$MERGE, while making sure that ${path} is removed from the
working copy).

Thoughts?
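For anyone who wants to see the current behavior this would change, here is a self-contained sketch of an add/add conflict between dissimilar files (scratch repository; identity and file contents are arbitrary):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
g() { git -c user.email=a@b.c -c user.name=a "$@"; }
main=$(git symbolic-ref --short HEAD)
g commit -q --allow-empty -m base
g checkout -qb side
printf 'entirely unrelated content A\n' > collide
g add collide && g commit -qm side
g checkout -q "$main"
printf 'entirely unrelated content B\n' > collide
g add collide && g commit -qm ours
g merge side >/dev/null 2>&1 || echo "add/add conflict"
git ls-files -u collide    # stages 2 and 3: the two colliding versions
grep -c '^<<<<<<<' collide # today: a two-way merge lands at ${path}
```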

I have a patch series[1] with more details and other changes, but
wanted to especially get feedback on this issue even from folks that
didn't have enough time to read the patches or even the cover letter.

Thanks,
Elijah


[1] https://public-inbox.org/git/20180305171125.22331-1-new...@gmail.com/


Re: [RFC PATCH 0/5] Improve path collision conflict resolutions

2018-03-12 Thread Elijah Newren
On Mon, Mar 5, 2018 at 9:11 AM, Elijah Newren  wrote:

>   2) Doing content merges for a rename before looking at the path collision
>  between a rename and some other file.  In particular (in what I most
>  suspect others might have an objection to from this series), record
>  that content-merged file -- which may have conflict markers -- in the
>  index at the appropriate higher stage.
>
> 2)
>
> Previously, rename/rename(2to1) conflict resolution for the colliding path
> would just accept the index changes made by unpack_trees(), meaning that
> each of the higher order stages in the index for the path collision would
> implicitly ignore any changes to each renamed file from the other side of
> history.  Since, as noted above, all traces of the rename-source path were
> removed from both the index and the working tree, this meant that the index
> was missing information about changes to such files.  If the user tried to
> resolve the conflict using the index rather than the working copy, they
> would end up with a silent loss of changes.
>
> I "fixed" this by doing the three-way content merge for each renamed-file,
> and then recorded THAT in the index at either stage 2 or 3 as appropriate.
> Since that merge might have conflict markers, that could mean recording in
> the index a file with conflict markers as though it were a given side.
> (See patch 2 for a more detailed explanation.)  I figure this might be the
> most controversial change I made.  I can think of a few alternatives, but I
> liked all of them less.  Opinions?

Actually, I realized this last weekend that this wasn't unprecedented
after all, and thus likely not at all controversial.  In particular,
there is already a case where git stores the sha1 of a file with
conflict markers from a "provisional content merge" at a non-zero
stage in the index: If a recursive merge is needed and there is a file
with content conflicts when creating the virtual merge base, then the
file with all the conflict markers will be accepted as part of the
virtual merge base and, depending on how the outer merge goes, the
sha1 of this file with conflict markers can appear in the index at
stage 1.
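That precedent can be demonstrated with a criss-cross merge, where the recursive strategy's virtual merge base itself conflicts and its marker-laden blob surfaces at stage 1. The sketch below is illustrative (scratch repository, arbitrary identity, branch names, and contents):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
g() { git -c user.email=a@b.c -c user.name=a "$@"; }
main=$(git symbolic-ref --short HEAD)
echo base > f && g add f && g commit -qm base
g checkout -qb x && echo x > f && g commit -qam x
g checkout -q "$main" && echo y > f && g commit -qam y
ycommit=$(git rev-parse HEAD)
# Criss-cross: each side merges the other, resolving the conflict differently.
g merge x >/dev/null 2>&1 || { echo xy > f; g add f; g commit -qm M1; }
g checkout -q x
g merge "$ycommit" >/dev/null 2>&1 || { echo yx > f; g add f; g commit -qm M2; }
# Merging M1 and M2 has two merge bases; merge-recursive builds a conflicted
# virtual ancestor, and its blob -- conflict markers and all -- lands at stage 1.
g checkout -q "$main"
g merge x >/dev/null 2>&1 || true
git show :1:f | grep -c '^<<<<<<<'
```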

So, that really only leaves the changes to the working tree files.
And based on a few factors including things mentioned in the cover
letter, I suspect most would only have an opinion about how this patch
series affects add/add conflicts.  I'll send a separate email to ask
about that, to make it clear I want folks opinions on that issue even
if they don't have enough time to review the patch series or even read
my really long cover letter.


Re: [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields

2018-03-12 Thread Duy Nguyen
On Fri, Mar 09, 2018 at 02:54:53PM -0800, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy   writes:
> 
> > @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry 
> > *entry)
> > entry->depth = 0;
> >  
> > oi.sizep = &entry->size;
> > -   oi.typep = &entry->type;
> > +   oi.typep = &type;
> > if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
> > /*
> >  * We failed to get the info from this pack for some reason;
> > @@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry 
> > *entry)
> >  */
> > entry->type = sha1_object_info(entry->idx.oid.hash,
> >&entry->size);
> 
> The comment immediately before this pre-context reads as such:
> 
>   /*
>* We failed to get the info from this pack for some reason;
>* fall back to sha1_object_info, which may find another copy.
>* And if that fails, the error will be recorded in entry->type
>* and dealt with in prepare_pack().
>*/
> 
> The rest of the code relies on the ability of entry->type to record
> the error by storing an invalid (negative) type; otherwise, it cannot
> detect an error where (1) the entry in _this_ pack was corrupt, and
> (2) we wished to find another copy of the object elsewhere (which
> would overwrite the negative entry->type we assign here), but we
> didn't find any.
> 
> How should we propagate the error we found here down the control
> flow in this new code?

Good catch! I don't have any magic trick to do this, so I'm adding an
extra bit to store type validity. Something like this as a fixup patch
(I'll resend the whole series soon).

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fd217cb51f..f164f1797b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -265,7 +265,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
struct git_istream *st = NULL;
 
if (!usable_delta) {
-   if (entry->type == OBJ_BLOB &&
+   if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.oid.hash, &type, &size,
NULL)) != NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
-   enum object_type type = entry->type;
+   enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
  dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0;   /* explicit */
else if (!entry->in_pack)
to_reuse = 0;   /* can't reuse what we don't have */
-   else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+   else if (oe_type(entry) == OBJ_REF_DELTA ||
+oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
-   else if (entry->type != entry->in_pack_type)
+   else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0;   /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0;   /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
 * And then all remaining commits and tags.
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (objects[i].type != OBJ_COMMIT &&
-   objects[i].type != OBJ_TAG)
+   if (oe_type(&objects[i]) != OBJ_COMMIT &&
+   oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
 * And then all the trees.
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (objects[i].type != OBJ_TREE)
+   if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -1067,7 +1068,7 @@ static void create_object_entry(const struct object_id 
*oid,
entry = packlist_alloc(&to_pack, oid->hash, index_pos);
entry->hash = hash;
if (type)
-   entry->type = type;
+   oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1433,9 +1434,9 @@ static void check_object(struct object_entry *entry)
switch (entry->in_pack_type) {
  



submodule.<name>.ignore vs git add -u

2018-03-12 Thread Miklos Vajna
Hi,

Let's say I have a fairly simple submodule setup where I do 'git
checkout' inside the submodule to check out a different commit, so the
outer repo 'git diff' shows a submodule update.

In that case

git config submodule.<name>.ignore all

makes 'git diff' or 'git commit -a' ignore the change in the outer repo,
but not 'git add -u'.

Reading the git-config documentation to see if this is intentional behavior,
I'm a bit confused. It specifies that:

- "git status" and the diff family: handle this setting
- git submodule commands: ignore this setting

So what about 'git add -u' -- is it expected that it ignores this setting
as well?

I guess either the doc should say 'git add -u' doesn't handle this
setting or 'git add -u' should handle it. Happy to try to make a patch
that does the latter, but I thought I'd better ask first. :-)

Thanks,

Miklos


signature.asc
Description: Digital signature


Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-03-12 Thread Derrick Stolee

On 2/26/2018 9:32 PM, Derrick Stolee wrote:

This patch is new to the series due to the interactions with the lockfile API
and the hashfile API. I need to ensure the hashfile writes the hash value at
the end of the file, but keep the file descriptor open so the lock is valid.

I welcome any suggestions to this patch or to the way I use it in the commit
that follows.

-- >8 --


I haven't gotten any feedback on this step of the patch. Could someone 
take a look and let me know what you think?


Thanks,
-Stolee


If we want to use a hashfile on the temporary file for a lockfile, then
we need hashclose() to fully write the trailing hash but also keep the
file descriptor open.

Signed-off-by: Derrick Stolee 
---
  csum-file.c | 10 +++---
  csum-file.h |  1 +
  2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 5eda7fb..302e6ae 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -66,9 +66,13 @@ int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
flush(f, f->buffer, the_hash_algo->rawsz);
if (flags & CSUM_FSYNC)
fsync_or_die(f->fd, f->name);
-   if (close(f->fd))
-   die_errno("%s: sha1 file error on close", f->name);
-   fd = 0;
+   if (flags & CSUM_KEEP_OPEN)
+   fd = f->fd;
+   else {
+   if (close(f->fd))
+   die_errno("%s: sha1 file error on close", f->name);
+   fd = 0;
+   }
} else
fd = f->fd;
if (0 <= f->check_fd) {
diff --git a/csum-file.h b/csum-file.h
index 992e5c0..b7c0e48 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -29,6 +29,7 @@ extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
  /* hashclose flags */
  #define CSUM_CLOSE1
  #define CSUM_FSYNC2
+#define CSUM_KEEP_OPEN 4
  
  extern struct hashfile *hashfd(int fd, const char *name);

  extern struct hashfile *hashfd_check(const char *name);




Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

[...]

> The biggest difference is that it is easy for me to see the motivation
> behind Phillip's strategy, whereas I am still puzzled why one would come
> up with a complicated strategy that splits merge commits and re-merges
> them later, and why it should work in general (I still suspect that this
> is not the case).

Because I believe that rebasing a simple commit (1 parent) should be
nothing else but a reduced version of rebasing any commit (N parents) at
N=1. The [RFC v2] being discussed provides exactly such a method.

OTOH, check what Phillip's version does at N=1. Is it the same as the
"rebase simple commit" strategy you already happily use? If not, please
explain why it must be different.

> Where "easy" meant that I had to spend 1h still to figure out why using
> the unrebased merge parents as merge bases.

That's because you try to figure out something that is not there in the
[RFC v2]. I suggest to forget everything you've already imagined and
just read the [RFC v2] proposal afresh. It should take about 10 minutes
or less to get it. Really.

> The same amount of time did not allow me to wrap my head around
> Sergey's verbose explanations.

Honestly, I don't believe it, sorry, but I'm willing to explain anything
you wish to be explained in _[RFC v2]_.

> But I'll take your word for it that the strategies are equivalent, and go
> with the one that has both a simpler explanation (in my mind, at least),
> and a more robust implementation.

It's up to you, and it'd still be much better than what we have now, but
you will need to face the (I think unfortunate) consequences I just
summarized elsewhere in the thread.

-- Sergey


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-12 Thread Derrick Stolee

On 3/3/2018 12:15 AM, Jeff King wrote:

On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:


This is a resubmission of Jeff King's patch series to speed up git tag
--contains with some changes. It's been cooking for a while as:

Replying to this 6-year-old thread:

Is there any chance this could be resurrected?  We are using
phabricator, which uses `git branch --contains` as part of its
workflow.  Our repo has ~1000 branches on it, and the contains
operation is eating up all our CPU (and time).  It would be very
helpful to us to make this faster!

(The original thread is at
https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/

Sorry, this got thrown on my "to respond" pile and languished.


Thanks for adding me to the thread. It's good to know the pain point 
people are having around commit graph walks.



There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base
traversals, which may walk over the same commits multiple times.

These days "tag --contains" uses a different algorithm that can look at
all heads in a single traversal. But the downside is that it's
depth-first, so it tends to walk down to the roots. That's generally OK
for tags, since you often have ancient tags that mean getting close to
the roots anyway.

But for branches, they're more likely to be recent, and you can get away
without going very deep into the history.

So it's a tradeoff. There's no run-time switch to flip between them, but
a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sorting)
  
  	memset(&array, 0, sizeof(array));
  
+	filter->with_commit_tag_algo = 1;

filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
  
  	if (filter->verbose)


drops my run of "git branch -a --contains HEAD~100" from 8.6s to
0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
a smaller number of branches, we may actually end up slower (because we
dig further down in history, and don't benefit from the multiple-branch
speedup).


It's good to know that we already have an algorithm for the multi-head 
approach. Things like `git branch -vv` are harder to tease out because 
the graph walk is called by the line-format code.



I tried to do a "best of both" algorithm in:

  https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/

which finds arbitrary numbers of merge bases in a single traversal.  It
did seem to work, but I felt uneasy about some of the corner cases.
I've been meaning to revisit it, but obviously have never gotten around
to it.

The second slow thing is that during the traversal we load each commit
object from disk. The solution there is to keep the parent information
in a faster cache. I had a few proposals over the years, but I won't
even bother to dig them up, because there's quite recent and promising
work in this area from Derrick Stolee:

   
https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is
about using commit timestamps as a proxy for generation numbers. And
Stolee's patches actually leave room for real, trustable generation
numbers.

Once we have the serialized commit graph and generation numbers, I think
the final step would just be to teach the "tag --contains" algorithm to
stop walking down unproductive lines of history. And in fact, I think we
can forget about the best-of-both multi-tip merge-base idea entirely.
Because if you can use the generation numbers to avoid going too deep,
then a depth-first approach is fine. And we'd just want to flip
git-branch over to using that algorithm by default.


I'll keep this in mind as a target for performance measurements in the 
serialized commit graph patch and the following generation number patch.


Thanks,
-Stolee
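The "stop walking down unproductive lines of history" step that generation numbers enable can be sketched with a toy reachability walk. This is an illustrative model in plain Python (the `parents`/`gen` maps are hypothetical), not git's implementation:

```python
def contains(parents, gen, tip, target):
    """Return True if target is an ancestor of (or equal to) tip.

    gen[c] must be a generation number: 1 for roots, otherwise
    1 + max(gen of parents).  Any commit other than target whose
    generation is at or below gen[target] cannot have target as an
    ancestor, so the walk prunes there instead of descending to the
    roots.
    """
    cutoff = gen[target]
    stack, seen = [tip], set()
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c in seen or gen[c] <= cutoff:
            continue  # pruned: everything below here is too shallow
        seen.add(c)
        stack.extend(parents[c])
    return False

# A tiny DAG: root <- a, root <- b, merge m of (a, b), tip t on top.
parents = {"root": [], "a": ["root"], "b": ["root"],
           "m": ["a", "b"], "t": ["m"]}
gen = {"root": 1, "a": 2, "b": 2, "m": 3, "t": 4}
```

Asking `contains(parents, gen, "a", "b")` stops immediately, because `a`'s generation already equals `b`'s: the pruning does the work that commit timestamps only approximate.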


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Sergey Organov
Hi Buga,

Igor Djordjevic  writes:

> Hi Dscho,

[...]

> I think the root of misunderstanding might be coming from the fact
> that Sergey was mainly describing a general concept (without a
> strictly defined implementation strategy, not being restricted to a
> specific one), where Phillip came up with a solution that eventually
> seems to use the same concept (as those transformations above should
> show), but simplifying it further inside a concrete implementation.

As a side-note, starting from a sound general concept leaves hope of
ending up with something like Git, while starting from an implementation,
however nice it is, carries the danger of ending up with something like Bzr.

-- Sergey


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-12 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

> Hi Buga,
>
> On Sun, 11 Mar 2018, Igor Djordjevic wrote:
>
>> I agree with both of you that `pick <merge-commit>` is inflexible 
>> (not to say just plain wrong), but I never thought about it like that.
>> 
>> If we are to extract further mentioned explicit old:new merge 
>> parameter mapping to a separate discussion point, what we`re 
>> eventually left with is just replacing this:
>> 
>>  merge -R -C <original-merge-commit> <new-tips>
>> 
>> ... with this:
>> 
>>  pick <original-merge-commit> <new-tips>
>
> I see where you are coming from.
>
> I also see where users will be coming from. Reading a todo list in the
> editor is as much documentation as it is a "program to execute". And I am
> afraid that reading a command without even mentioning the term "merge"
> once is pretty misleading in this setting.
>
> And even from the theoretical point of view: cherry-picking non-merge
> commits is *so much different* from "rebasing merge commits" as discussed
> here, so much so that using the same command would be even more
> misleading.

This last statement is plain wrong when applied to the method in the
[RFC] you are replying to. Using the method in [RFC], "cherry-pick
non-merge" is nothing more or less than a reduced version of the generic
"cherry-pick merge", exactly as it should be.

Or, in other words, "cherry-pick merge" is a generalization of
"cherry-pick non-merge" to multiple parents.

-- Sergey


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:
> Hi Sergey,

[...]

> That is misrepresenting what happened.

No, it's you who are spreading misinformation, probably unintentional,
but still.

> First, you came up with a strategy. I pointed out shortcomings that
> implied that we cannot use it unchanged. Then, Buga fixed your strategy by
> using additional steps (making the process more complicated than before,
> still without a simple-enough explanation for my liking, fixing the
> shortcomings). Then, Phillip presented a super-simple strategy and Buga
> confirmed that it also fixes the shortcomings I pointed out.

Except that you've missed very essential thing: even before Phillip
presented his method, the original has been fixed and simultaneously
became even simpler. It's now entirely described in [RFC v2] that you
apparently still refuse to read.

> I am very excited that we finally found something that works *and* is easy
> to reason about.

You have a chance to be even more excited, as we in fact have two of them
that both work and are both easy to reason about.

> Let's focus on that strategy rather than going back to the strategy which
> has known flaws and only an unsatisfyingly complex explanation.

Not that fast, as it now has no known flaws and still has surprisingly
simple explanation. It also has its own niceties that are currently
being discussed elsewhere in the thread.

-- Sergey


Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:
> Hi Sergey,
>
> On Wed, 7 Mar 2018, Sergey Organov wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > How can your approach -- which relies *very much* on having the
>> > original parent commits -- not *require* that consistency check?
>> 
>> I don't understand what you mean, sorry. Could you please point me to
>> the *require* you talk about in the original proposal?
>
> Imagine a todo list that contains this line
>
>   merge -C abcdef 123456
>
> and now the user edits it (this is an interactive rebase, after all),
> adding another merge head:
>
>   merge -C abcdef 987654 123456
>
> Now your strategy would have a serious problem: to find the original
> version of 987654. If there was one.

We are talking about different checks then. My method has a built-in
check that Phillip's one doesn't. All the external checks, if any, will
have to be the same.

>
>> > What would your approach (that still has no satisfyingly trivial
>> > explanation, in my mind)
>> 
>> Here is one-liner: rebase sides of the merge commit and then 3-way
>> merge them, using original merge commit as merge base.
>
> But I already pointed out how that would undo a commit having been
> dropped.

No. Not for this version. You did it for originally flawed version of
the method, that has been already fixed by addition of "using original
merge commit as merge base" in the above sentence, and that was the
exact reason for [RFC v2], that in turn is explicitly stated at the
beginning of [RFC v2].

>
>> > do if somebody edited a `merge` command and let it merge a completely
>> > unrelated commit?
>> 
>> Don't see a problem, sorry. The method should still work, provided you have
>> original merge commit and two new parents for the new merge.
>
> That is assuming a lot. That is exactly what this consistency check is
> for, that I mentioned earlier, and which you listed as a downside of
> Phillip's strategy (forgetting that your strategy has the same downside,
> so...).

Again, we are talking about different checks. My method has a built-in
check that Phillip's doesn't. All the external checks, if any, will have
to be the same.

> But I guess that you are still talking about the non-interactive version
> of the rebase, and missed that our conversation proceeded to the point
> where we want that same strategy to work *also* in the interactive version
> (and not have a completely different functionality depending whether you
> use --interactive or not)?

For me, non-interactive is an interactive one with unmodified todo list.

-- Sergey


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Sergey Organov
Hi Buga,

Igor Djordjevic  writes:

[...]

> That said, *if* we decide we like temporary commit U1' == U2' consistency 
> check (especially for non-interactive rebase, maybe), we can produce 
> these after the fact for the sake of the check only.

I don't believe interactive vs. non-interactive split is actually
helpful. I'd consider non-interactive just a special case of interactive
when user didn't edit the todo list, nothing more. No special treatment
should be required.

For one, consistency checks in both modes have similar importance, even
if only because there could be parts of history being interactively
rebased which the user didn't intend to edit, nor actually edited during
given session.

Now let me get back to pros and cons of the two approaches to rebasing
merges we have. Below I still advocate my approach by further discussing
the differences, but simultaneously I'd like to emphasize that whatever
particular way of rebasing merges will finally be used, it will be a
huge step forward and I'm glad I've raised the issue in the first place.

First, please consider the fact that my "rebase sides" method has yet
another nice property: it reduces back to the original "rebase the commit"
operation when you apply it to a non-merge commit. In other words, it's
a true generalization on top of rebasing a simple commit.

OTOH, Phillip's approach, when reduced to non-merge commit, still does a
version of rebase, but very specific one, and in inverse manner. I.e.,
rather than merging changes of the commit to be rebased into the new
base, it merges changes introduced by the new base into the commit being
rebased.

One consequence is that when conflict occurs, Phillip's approach will
give surprising order of ours vs theirs changes, inverted with respect
to those of the usual rebase of non-merge commit, while my approach will
give exact non-merge commit semantics. It could likely be fixed by
slightly modifying Phillip's approach, but it will make its
implementation more complex.

Another consequence is that, provided my version is used, all options
that tune "simple commit rebase" behavior will automagically work for
rebasing merge commits, in exactly the same manner. OTOH, Phillip's
approach, without special attention in implementation, will do the same
thing no matter what the -m, -s, or -X options say.

Yet another consequence is that my approach will likely result in better
code reuse. Even though mine seems to be harder to implement stand-alone
than Phillip's one, it should be actually easier to implement inside the
"git rebase", as it will use exactly the same machinery that "git
rebase" already uses to rebase simple commits, adding only final "git
merge-recursive" (or "git merge-resolve", or "git merge-octopus", -- any
of them  will do the job), which current implementation already performs
as well, for re-creating merges from scratch.

Second thought, unrelated to the above. To me it seems that my "rebasing
sides" approach, being entirely symmetric, is cleaner than incremental
merging suggested by Phillip, as with my approach one will still deal
with branches independently, in the same way as for simple commits,
until the single final merge operation. This comes with the drawback of
one additional step in my approach when compared to Phillip's one,
though, but then mine has definitely no issues with the exact order of
merges.

Overall, to me it seems that unmodified Phillip's approach will bring
unnecessarily wide set of new user experiences, and fixing it will
require some tricks in implementation, for no apparent reason.

-- Sergey
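The "rebase sides of the merge, then 3-way merge them using the original merge commit as base" recipe can be modelled at the file level. The sketch below is a deliberately tiny toy (trees as Python dicts, a naive per-path 3-way merge that errors on conflicts), not git's merge machinery; names like `U1_new` echo the temporary commits U1/U2 from the thread:

```python
def merge3(base, ours, theirs):
    """Naive per-path three-way merge of dict-shaped trees.
    A missing path means "absent"; a genuine conflict raises."""
    result = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            merged = o            # both sides agree
        elif o == b:
            merged = t            # only "theirs" changed
        elif t == b:
            merged = o            # only "ours" changed
        else:
            raise ValueError("conflict in %s" % path)
        if merged is not None:
            result[path] = merged
    return result

# History: base O, parent branches A and B, their merge M (carrying a
# tweak recorded only in M), and a new upstream U to rebase onto.
O = {"f": "1"}
A = {"f": "1", "a": "from side A"}
B = {"f": "1", "b": "from side B"}
M = {"f": "1", "a": "from side A", "b": "from side B", "fix": "made in M"}
U = {"f": "2"}

A_new = merge3(O, A, U)            # rebase parent A onto U
B_new = merge3(O, B, U)            # rebase parent B onto U
U1_new = merge3(A, A_new, M)       # rebase M-as-seen-from-A onto A'
U2_new = merge3(B, B_new, M)       # rebase M-as-seen-from-B onto B'
M_new = merge3(M, U1_new, U2_new)  # final merge, old M as merge base
```

In this model both parents' changes, the upstream change, and the tweak recorded only in M all survive in `M_new`; with a single parent the same steps collapse into the ordinary rebase of that commit, which is the N=1 reduction argued for above.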


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-12 Thread Johannes Schindelin
Hi Buga,

I just have two thoughts to contribute as answer, so please excuse the
heavily cut quoted text:

On Sun, 11 Mar 2018, Igor Djordjevic wrote:

> Sometimes one just needs to read the manual, and I don`t really think
> this is a ton complicated, but just something we didn`t really have
> before (real merge rebasing), so it requires a moment to grasp the
> concept.

If that were the case, we would not keep getting bug reports about
--preserve-merges failing to reorder patches.

> Saying in favor of `--rebase-merges`, you mean as a separate option,
> alongside `--recreate-merges` (once that series lands)?

No. I am against yet another option. The only reason I pollute the option
name space further with --recreate-merges is that it would be confusing to
users if the new mode was called --preserve-merges=v2 (but work *totally
differently*).

Ciao,
Dscho


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-12 Thread Johannes Schindelin
Hi Buga,

On Sun, 11 Mar 2018, Igor Djordjevic wrote:

> I agree with both of you that `pick <merge-commit>` is inflexible 
> (not to say just plain wrong), but I never thought about it like that.
> 
> If we are to extract further mentioned explicit old:new merge 
> parameter mapping to a separate discussion point, what we`re 
> eventually left with is just replacing this:
> 
>   merge -R -C <original-merge-commit> <new-tips>
> 
> ... with this:
> 
>   pick <original-merge-commit> <new-tips>

I see where you are coming from.

I also see where users will be coming from. Reading a todo list in the
editor is as much documentation as it is a "program to execute". And I am
afraid that reading a command without even mentioning the term "merge"
once is pretty misleading in this setting.

And even from the theoretical point of view: cherry-picking non-merge
commits is *so much different* from "rebasing merge commits" as discussed
here, so much so that using the same command would be even more
misleading.

> That is what I had in mind, seeming possibly more straightforward and 
> beautifully aligned with previously existing (and well known) 
> `rebase` terminology.
> 
> Not to say this would make it possible to use other `rebase -i` todo 
> list commands, too, like if you want to amend/edit merge commit after 
> it was rebased, you would write:
> 
>   edit  
> 
> ..., where in case you would simply like to reword its commit 
> message, it would be just:
> 
>   reword  
> 
> 
> Even `squash` and `fixup` could have their place in combination with 
> a (to be rebased) merge commit, albeit in a pretty exotic rebases, 
> thus these could probably be just disallowed - for the time being, at 
> least.

Sure, for someone who read the manual, that would be easy to use. Of
course, that's the minority.

Also: the `edit` command is poorly named to begin with. A much cleaner
design would be to introduce the `break` command as suggested by Stephan.

> The real power would be buried in implementation, learning to rebase 
> merge commits, so user is left with a very familiar interface, slightly 
> adapted to accommodate a bit different nature of merge commit in 
> comparison to an ordinary one, also to allow a bit more of interactive 
> rebase functionality, but it would pretty much stay the same, without 
> even a need to learn about new `merge`, `-R`, `-C`, and so on.
> 
> Yes, those would have its purpose, but for real merging then 
> (creating new merges, or recreating old ones), not necessarily for 
> merge rebasing.
> 
> With state of `merge -R -C ...` (that `-R` being the culprit), it 
> kind of feels like we`re now trying to bolt "rebase merges" 
> functionality onto a totally different one (recreate merges, serving 
> a different purpose), making them both needlessly dependent on each 
> other, further complicating user interface, making it more confusing 
> and less tunable as per each separate functionality needs (rebase vs. 
> recreate).
> 
> I guess I`m the one to pretty much blame here, too, as I really 
> wanted `--recreate-merges` to handle "rebase merges" better, only to 
> later realize it might not be the best tool for the job, and that a 
> more separate approach would be better (at least not through the same 
> `merge` todo list command)...
> 
> [1] 
> https://public-inbox.org/git/f3872fb9-01bc-b2f1-aee9-cfc0e4db7...@gmail.com/

Well, the `-R` option is no worse than `git merge`'s `-s <strategy>`
option (which *also* changes the strategies rather drastically).

Ciao,
Dscho


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-12 Thread Johannes Schindelin
Hi Buga,

On Sun, 11 Mar 2018, Igor Djordjevic wrote:

> On 11/03/2018 16:47, Johannes Schindelin wrote:
> > 
> > > Having explained all this, I realized this is the same "essentially
> > > merging the new tips into the original pretending that the new tips
> > > were not rebased but merged into upstream" as Phillip`s one, just
> > > that we have additional temporary commits U1 and U2 (as per
> > > mentioned "patch theory") :)
> > 
> > But if the old tips had been merged into upstream (resulting in the
> > new tips), then the merge bases would be *the old tips*.
> 
> Exactly, and that is what part you`ve cut out of the quote was 
> showing :) By Phillip`s implementation, we would start with *old tips* 
> as merge bases, indeed (old tips being U1 and U2 in this case),

I really do not see how it would make sense to take the original merge
commit as merge base in this scenario. It makes no *logical* sense: in
which interpretation did you develop changes in two divergent directions
from that merge commit?

Whereas if you use the old tips as merge bases, you can say very easily
what those two directions were: one merged with other merge parents, the
other direction rebased on top of upstream. There. Two divergent sets of
changes that we want to reconcile ("merge"). Easy as apple pie.

> where it further gets transformed as previously written:
> 
> > git merge-recursive U1 -- M U1'
> > tree="$(git write-tree)"
> > git merge-recursive U2 -- $tree U2'
> > tree="$(git write-tree)"
> > 
> > ..., where we know U1 = U2 = M (in regards to trees), so this is the 
> > same as:
> > 
> > git merge-recursive M -- M U1'
> > tree="$(git write-tree)"
> > git merge-recursive M -- $tree U2'
> > tree="$(git write-tree)"
> 
> Here, `git merge-recursive M -- M U1'` simply equals to U1' tree 
> (being a fast-forward merge), so we can write the two merges above as
> a single merge, too:
> 
> > git merge-recursive M -- U1' U2'
> > tree="$(git write-tree)"
> > 
> > ... which is exactly what Sergey`s (updated) approach suggests, 
> > merging U1' and U2' with M as merge-base (and shown inside that 
> > sample implementation script I provided[1]) :)
> 
> So from *old tips* being the rebased merge base (Phillip), we got to 
> *old merge commit* being the rebased merge base (Sergey), or vice 
> versa. Does this shed a bit more light on it now? Or you wanted to 
> point out something else in the first place...?

Okay, I'll trust you that these stunts show that the two strategies are
equivalent as to what their results are.

The biggest difference is that it is easy for me to see the motivation
behind Phillip's strategy, whereas I am still puzzled why one would come
up with a complicated strategy that splits merge commits and re-merges
them later, and why it should work in general (I still suspect that this
is not the case).

Where "easy" meant that I had to spend 1h still to figure out why using
the unrebased merge parents as merge bases. The same amount of time did
not allow me to wrap my head around Sergey's verbose explanations.

But I'll take your word for it that the strategies are equivalent, and go
with the one that has both a simpler explanation (in my mind, at least),
and a more robust implementation.

> > I am still not sure for what scenarios Phillip's strategy is the same as
> > Sergey's (updated) one, as the former strategy can do completely without
> > temporary commits [...]
> 
> I think the root of misunderstanding might be coming from the fact 
> that Sergey was mainly describing a general concept (without a 
> strictly defined implementation strategy, not being restricted to a 
> specific one), where Phillip came up with a solution that eventually 
> seems to use the same concept (as those transformations above should 
> show), but simplifying it further inside a concrete implementation.

Well, Sergey started off by suggesting the "rebase the patch relatively to
the first parent always" strategy, then came up with a long-ish email
describing a different approach (which I slowly realize is related to the
first strategy, and it would have been *much* appreciated if it was not
left to the reader to figure that one out), then incorporated what I
called your hack (again, no clear and concise description what changed,
just throwing a bunch of big bones to the dogs with the next long-ish
document).

So I will not apologize for stopping to pay so much attention to that
sub-thread at some point.

> By saying that Phillip "simplified it", even though transformations
> shown above might show different, I mean he managed to further decompose
> what Sergey was aiming for, abstracting temporary commits U1 and U2 out
> of the equation, thus making them optional, but not required.

That is not how I read Phillip's mail. It was more like "how about this
instead". And it was simple enough, with clear example how to implement
it, that I thought about that single mail for an hour, until I was

Re: [GSoC] [PATCH] travis-ci: added clang static analysis

2018-03-12 Thread Lars Schneider
Hi,

That looks interesting but I agree with Dscho that we should not limit
this to master/maint.

I assume you did run this on TravisCI already? Can you share a link?
I assume you did find errors? Can we fix them or are there too many?
If there are existing errors, how do we define a "successful" build?

Thanks for working on this,
Lars

> On 05 Mar 2018, at 21:04, SiddharthaMishra  wrote:
> 
> Added a job to run clang static code analysis on the master and maint branch
> 
> Signed-off-by: SiddharthaMishra 
> ---
> .travis.yml   | 17 -
> ci/run-static-analysis.sh |  9 -
> 2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/.travis.yml b/.travis.yml
> index 4684b3f4f..9b891d182 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -48,7 +48,7 @@ matrix:
>   before_install:
>   before_script:
>   script: ci/run-linux32-docker.sh
> -- env: jobname=StaticAnalysis
> +- env: jobname=CocciStaticAnalysis
>   os: linux
>   compiler:
>   addons:
> @@ -59,6 +59,21 @@ matrix:
>   before_script:
>   script: ci/run-static-analysis.sh
>   after_failure:
> +- if: branch IN (master, maint)
> +  env: jobname=ClangStaticAnalysis
> +  os: linux
> +  compiler:
> +  addons:
> +apt:
> +  sources:
> +  - ubuntu-toolchain-r-test
> +  - llvm-toolchain-trusty
> +  packages:
> +  - clang
> +  before_install:
> +  before_script:
> +  script: ci/run-static-analysis.sh
> +  after_failure:
> - env: jobname=Documentation
>   os: linux
>   compiler:
> diff --git a/ci/run-static-analysis.sh b/ci/run-static-analysis.sh
> index fe4ee4e06..6ae032f54 100755
> --- a/ci/run-static-analysis.sh
> +++ b/ci/run-static-analysis.sh
> @@ -5,6 +5,13 @@
> 
> . ${0%/*}/lib-travisci.sh
> 
> -make coccicheck
> +case "$jobname" in
> +ClangStaticAnalysis)
> + scan-build -analyze-headers --status-bugs make
> + ;;
> +CocciStaticAnalysis)
> + make coccicheck
> + ;;
> +esac
> 
> save_good_tree
> -- 
> 2.16.2.248.ge2408a6f7.dirty
> 



Re: [GSoC][PATCH] git-ci: use pylint to analyze the git-p4 code

2018-03-12 Thread Lars Schneider
Hi Viet,

> On 12 Mar 2018, at 03:20, Viet Hung Tran  wrote:
> 
> This is my submission as a microproject for the Google Summer of code.
> I apologize for not setting the [GSoC] in my previous email
> at <20180312020855.7950-1-viethtran1...@gmail.com>.
> Please ignore it.

No worries :-)

> Add a new job named Pylint to .travis.yml in order to analyze the git-p4
> Python code. Although Travis CI has yet to implement continuous integration
> for multiple languages, the Python package can be installed using apt
> packages. From there, pylint can be installed using pip and used to analyze
> the git-p4.py file.

That looks interesting!
I assume you did run this on TravisCI already? Can you share a link?
I assume you did find errors? Can we fix them or are there too many?
If there are existing errors, how do we define a "successful" build?

Thanks for working on this,
Lars

> 
> Signed-off-by: Viet Hung Tran 
> ---
> .travis.yml | 10 ++
> 1 file changed, 10 insertions(+)
> 
> diff --git a/.travis.yml b/.travis.yml
> index 5f5ee4f3b..581d75319 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -46,6 +46,16 @@ matrix:
> - docker
>   before_install:
>   script: ci/run-linux32-docker.sh
> +- env: jobname=Pylint
> +  compiler:
> +  addons:
> +apt:
> +  packages:
> +  - python
> +  before_install:
> +  install: pip install --user pylint
> +  script: pylint git-p4.py
> +  after_failure:
> - env: jobname=StaticAnalysis
>   os: linux
>   compiler:
> -- 
> 2.16.2.440.gc6284da
> 



Re: [PATCH 1/1] git-p4: add format-patch subcommand

2018-03-12 Thread Luke Diamand
On 26 February 2018 at 23:48, Eric Sunshine  wrote:
> On Mon, Feb 26, 2018 at 6:48 AM, Luke Diamand  wrote:
>> It takes a list of P4 changelists and generates a patch for
>> each one, using "p4 describe".
>> [...]

Just to say that I almost got this working reasonably well, but it
gets pretty nasty making binary files work - in order to generate the
git binary format I need a base85 encoder which exists in Python3, but
not in Python2. And while I could obviously write it in Python easily
enough, it starts to feel like an awful lot of code implementing (and
maintaining) a slightly second-rate "git diff". And while it's
tempting to just not support binary files, in reality Perforce repos
seem to end up with lots of them.
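For reference, the encoder Luke mentions is base64.b85encode() (Python 3.4+), whose documentation says it uses the same base85 variant as git-style binary diffs; git additionally prefixes each output line with a length character ('A'-'Z' for 1-26 payload bytes, 'a'-'z' for 27-52). A minimal standalone sketch, with an illustrative helper name that is not from the patch:

```python
import base64

def encode_binary_hunk(data):
    """Encode raw bytes as git binary-patch lines (illustrative sketch).

    Each line carries up to 52 bytes of payload, prefixed with a length
    character: 'A'..'Z' for 1..26 bytes, 'a'..'z' for 27..52 bytes.
    """
    lines = []
    for i in range(0, len(data), 52):
        chunk = data[i:i + 52]
        n = len(chunk)
        prefix = chr(ord('A') + n - 1) if n <= 26 else chr(ord('a') + n - 27)
        # pad=True zero-pads the chunk to a multiple of 4 bytes, matching
        # git's whole-group output; b85encode uses git's base85 alphabet.
        lines.append(prefix + base64.b85encode(chunk, pad=True).decode('ascii'))
    return lines

print(encode_binary_hunk(b'\x00\x00\x00\x00'))  # prints ['D00000']
```

This covers the encoding side only; the surrounding "literal <size>" header and zlib compression that git's binary patches also use are left out of the sketch.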

So I think I might have another go at the previous scheme. What I will
have to do is to first create a faked-up commit representing the point
prior to the shelved commit for each file ("for file@rev-1 in files"),
and then create the real commit ("for file@rev in files"). It's
probably not as grim as it sounds, and it means that we end up with
git doing all the hard work of diffing files, rather than relying on
Perforce's diff engine in conjunction with some Python.


>> Signed-off-by: Luke Diamand 
>> ---
>> diff --git a/git-p4.py b/git-p4.py
>> @@ -3749,6 +3761,277 @@ class P4Branches(Command):
>> +class P4MakePatch(Command,P4UserMap):
>> +def run(self, args):
>> +if self.output and not os.path.isdir(self.output):
>> +sys.exit("output directory %s does not exist" % self.output)
>
> For consistency with "git format-patch", this could create the output
> directory automatically rather than erroring out.
>
>> +if self.strip_depot_prefix:
>> +self.clientSpec = getClientSpec()
>> +else:
>> +self.clientSpec = None
>> +
>> +self.loadUserMapFromCache()
>> +if len(args) < 1:
>> +return False
>
> Would it make sense to handle this no-arguments case earlier before
> doing work, such as loading the user map from cache?
>
>> +for change in args:
>> +self.make_patch(int(change))
>> +
>> +return True
>> +
>> +def p4_fetch_delta(self, change, files, shelved=False):
>> +cmd = ["describe"]
>> +if shelved:
>> +cmd += ["-S"]
>> +cmd += ["-du"]
>> +cmd += ["%s" % change]
>> +cmd = p4_build_cmd(cmd)
>> +
>> +p4 = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)
>> +try:
>> +result = p4.stdout.readlines()
>> +except EOFError:
>> +pass
>> +in_diff = False
>> +matcher = re.compile(r'\s+(.*)#(\d+)\s+\(text\)\s+')
>> +diffmatcher = re.compile("Differences ...")
>
> I'm not familiar with the output of "p4 describe", but does it include
> user-supplied text? If so, would it be safer to anchor 'diffmatcher',
> for instance, "^Differences ...$"?
>
>> +delta = ""
>> +skip_next_blank_line = False
>> +
>> +for line in result:
>> +if diffmatcher.match(line):
>
> Stepping back, does "Differences..." appear on a line of its own? If
> so, why does this need to be a regex at all? Can't it just do a direct
> string comparison?
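To illustrate the point about the unescaped pattern (a standalone sketch, not part of the patch): re.match() only anchors at the start of the string, and each unescaped "." matches any character, so unrelated describe output could be mistaken for the marker line, whereas a direct comparison is unambiguous:

```python
import re

# The pattern from the patch, with its dots left unescaped.
diffmatcher = re.compile("Differences ...")

# Each "." matches any character, so this false-positives:
assert diffmatcher.match("Differences abc")

# A direct comparison on the stripped line does exactly what is meant:
line = "Differences ...\n"
assert line.rstrip() == "Differences ..."

# Or, if a regex is kept, escape the dots and anchor both ends:
assert not re.match(r"^Differences \.\.\.$", "Differences abc")
assert re.match(r"^Differences \.\.\.$", "Differences ...")
```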
>
>> +in_diff = True
>> +continue
>> +
>> +if in_diff:
>> +
>> +if skip_next_blank_line and \
>> +line.rstrip() == "":
>> +skip_next_blank_line = False
>> +continue
>> +
>> +m = matcher.match(line)
>> +if m:
>> +file = self.map_path(m.group(1))
>> +ver = m.group(2)
>> +delta += "diff --git a%s b%s\n" % (file, file)
>> +delta += "--- a%s\n" % file
>> +delta += "+++ b%s\n" % file
>> +skip_next_blank_line = True
>> +else:
>> +delta += line
>> +
>> +delta += "\n"
>> +
>> +exitCode = p4.wait()
>> +if exitCode != 0:
>> +raise IOError("p4 '%s' command failed" % str(cmd))
>
> Since p4.stdout.readlines() gathered all output from the command
> before massaging it into Git format, can't the p4.wait() be done
> earlier, just after the output has been read?
>
>> +return delta
>> +
>> +def make_patch(self, change):
>> +[...]
>> +# add new or deleted files
>> +for file in files:
>> +[...]
>> +if add or delete:
>> +if add:
>> +[...]
>> +else:
>> +[...]
>> +
>> +[...]
>> +
>> +if add:
>> +prefix = "+"
>> +else:
>> +prefix = "-"
>> +
>> +if delete:
>> +[...]
>> +