Re: git merge-tree: bug report and some feature requests

2018-01-24 Thread Josh Bleecher Snyder
Thanks, Ed. I think I'll pursue the libgit2 route; sounds promising.


>> But the alternative appears to be punting entirely, as libgit2 does,
>> and merely providing something akin to three index entries.
>
> Indeed, when I added merge to libgit2, we put the higher-level conflict
> analysis into application code because there was not much interest in it
> at the time.  I've been meaning to add this to `git_status` in libgit2,
> but it's not been a high priority.

Is your conflict analysis application code public? I might be game to
do some of the legwork to get it into libgit2's git_status (although
I'm probably not the right person to do the API design). At a minimum,
it would be helpful as a reference, as I'm probably about to recreate
some subset of it myself.


-josh


Re: git merge-tree: bug report and some feature requests

2018-01-22 Thread Josh Bleecher Snyder
>> I'm experimenting with some new porcelain for interactive rebase. One
>> goal is to leave the work tree untouched for most operations. It looks
>> to me like 'git merge-tree' may be the right plumbing command for
>> doing the merge part of the pick work of the todo list, one commit at
>> a time. If I'm wrong about this, I'd love pointers; what follows may
>> still be interesting anyway.
>
> I don't have a concrete alternative (yet?) but here are some pointers
> to two alternate merge-without-touching-working-tree possibilities, if
> your current route doesn't pan out as well as you like:
>
> I posted some patches last year to make merge-recursive.c be able to
> do merges without touching the working tree.  Adding a few flags would
> then enable it for any of 'merge', 'cherry-pick', 'am', or
> 'rebase'...though for unsuccessful merges, there's a clear question of
> what/how conflicts should be reported to the user.  That probably
> depends a fair amount on the precise use-case.
>
> Although that series was placed on the backburner due to the immediate
> driver of the feature going away, I'm still interested in such a
> change, though I think it would fall out as a nice side effect of
> implementing Junio's proposed ideal-world-merge-recursive rewrite[1].
> I have started looking into that[2], but no guarantees about how
> quickly I'll find time to finish or even whether I will.
>
> [1] https://public-inbox.org/git/xmqqd147kpdm@gitster.mtv.corp.google.com
> [2] https://github.com/newren/git/blob/ort/ort-cover-letter contains
> overview of ideas and notes to myself about what I was hoping to
> accomplish; currently it doesn't even compile or do anything

Thanks for the pointer. That does seem promising.

And yes, I see now that serialization of conflicts is decidedly
challenging. More on that below.


>> 4. API suggestion
>>
>> Here's what I really want 'git merge-tree' to output. :)
> ...
>> If the merge had conflicts, write the "as merged as possible" tree to
>
> You'd need to define "as merged as possible" more carefully, because I
> thought you meant a tree containing all the three-way merge conflict
> markers and such being present in the "resolved" file, but from your
> parenthetical note below it appears you think that is a different tree
> that would also be useful to diff against the first one.  That leaves
> me wondering what the first tree is. (Is it just the tree where for
> each path, if that path had no conflicts associated with it then it's
> the merge-resolved-file, and otherwise it's the file contents from the
> merge-base?).

FWIW, the parenthetical suggestion was indeed what I had in mind. But
non-content conflicts appear to make that a non-starter. Or at least
woefully incomplete.
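
To make the non-content case concrete, here is a minimal sketch in Python. It is an illustration only, not code from git or libgit2; the function name and the status labels are hypothetical. It classifies a single path from its blob ids on the three sides, and shows that with "ours" and "theirs" fixed the answer still depends on the base, which is why two trees alone cannot carry a modify/delete conflict.

```python
def classify(base, ours, theirs):
    """Classify one path from its blob ids on three sides (None = absent).

    Hypothetical helper: a simplified three-way rule that ignores renames.
    """
    if ours == theirs:
        return "clean"          # both sides agree (including both deleted)
    if base == ours:
        return "clean"          # only theirs changed, so take theirs
    if base == theirs:
        return "clean"          # only ours changed, so take ours
    if base is None:
        return "add/add"        # both sides added different content
    if ours is None or theirs is None:
        return "modify/delete"  # one side edited what the other removed
    return "content"            # both sides edited the same file


# Same "ours" and "theirs" every time; only the base differs.
ours, theirs = "v2", None
for base in ("v1", "v2", None):
    print(repr(base), "->", classify(base, ours, theirs))
```

Only the first base yields a conflict; the other two are a clean deletion and a clean addition. That matches the "three index entries" approach mentioned above for libgit2.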


> Both of these trees are actually rather non-trivial to define.  The
> wording above isn't actually sufficient, because content conflicts
> aren't the only kind of conflict.  More on that below.
>
> There is already a bunch of code in merge-recursive.c to create a
> forcibly-merged-accepting-conflict-markers-in-the-resolution and
> record it as a tree (this is used for creating virtual merge bases in
> the recursive case, namely when there isn't a single merge-base for
> the two branches you are merging).  It might be reusable for what you
> want here, but it's not immediately clear whether all the things it
> does are appropriate; someone would have to consider the non-content
> (path-based) conflicts carefully.

Ack. I assume this is also the code that generates the existing 'git
merge-tree' patches, which include conflict markers.


>> the object database and give me its sha, and then also give me the
>> three-way merge diff output for all conflicts, as a regular patch
>> against that tree, using full path names and shas. (Alternatively,
>> maybe better, give me a second sha for a tree containing all the
>> three-way merge diff patches applied, which I can diff against the
>> first tree to find the conflict patches.)
>
> As far as I can tell, you're assuming that it's possible with two
> trees that are crafted "just right", that you can tell where the merge
> conflicts are, with binary files being your only difficulty.  Content
> conflicts aren't the only type that exist; there are also path-based
> conflicts.  These types of conflicts also make it difficult to know how
> the two trees you are requesting should even be created.
>
> For example, if there is a modify/delete conflict, how can that be
> determined from just two trees?  If the first tree has the base
> version of the file, then the second tree either has a file at the
> same position or it doesn't.  Neither case looks like a conflict, but
> the original merge had one.  You need more information.  The exact
> same thing can be said for rename/delete conflicts.
>
> Similarly, rename/add (one side renames an existing file to some new
> path (say, "new_path"), and the other adds a brand new file at
> "new_path"), or rename/rename(2to1) (each 

git merge-tree: bug report and some feature requests

2018-01-20 Thread Josh Bleecher Snyder
Hi, all.

I'm experimenting with some new porcelain for interactive rebase. One
goal is to leave the work tree untouched for most operations. It looks
to me like 'git merge-tree' may be the right plumbing command for
doing the merge part of the pick work of the todo list, one commit at
a time. If I'm wrong about this, I'd love pointers; what follows may
still be interesting anyway.

I've encountered some bumps with 'git merge-tree'. A bug report and
some feature requests follow. Apologies for the long email.


1. Bug

When a binary file containing NUL is added on only one side, the
resulting patch is malformed. Reproduction script:

mkdir test
cd test
git init .
touch shared
git add shared
git commit -m "base"
git checkout -b left
echo "left" > x
printf '\1\0\1\n' > binary
git add x binary
git commit -m "left"
git checkout master
git checkout -b right
echo "right" > x
git add x
git commit -m "right"
git merge-tree master right left
git merge-tree master right left | xxd
cd ..
rm -rf test

The merge-tree results I get with 2.15.1 are:

added in remote
  their  100644 ddc50ce55647db1421b18aa33417442e29f63d2f binary
@@ -0,0 +1 @@
+added in both
  our    100644 c376d892e8b105bd712d06ec5162b5f31ce949c3 x
  their  100644 45cf141ba67d59203f02a54f03162f3fcef57830 x
@@ -1 +1,5 @@
+<<<<<<< .our
 right
+=======
+left
+>>>>>>> .their

0000: 6164 6465 6420 696e 2072 656d 6f74 650a  added in remote.
0010: 2020 7468 6569 7220 2031 3030 3634 3420    their  100644 
0020: 6464 6335 3063 6535 3536 3437 6462 3134  ddc50ce55647db14
0030: 3231 6231 3861 6133 3334 3137 3434 3265  21b18aa33417442e
0040: 3239 6636 3364 3266 2062 696e 6172 790a  29f63d2f binary.
0050: 4040 202d 302c 3020 2b31 2040 400a 2b01  @@ -0,0 +1 @@.+.
0060: 6164 6465 6420 696e 2062 6f74 680a 2020  added in both.  
0070: 6f75 7220 2020 2031 3030 3634 3420 6333  our    100644 c3
0080: 3736 6438 3932 6538 6231 3035 6264 3731  76d892e8b105bd71
0090: 3264 3036 6563 3531 3632 6235 6633 3163  2d06ec5162b5f31c
00a0: 6539 3439 6333 2078 0a20 2074 6865 6972  e949c3 x.  their
00b0: 2020 3130 3036 3434 2034 3563 6631 3431    100644 45cf141
00c0: 6261 3637 6435 3932 3033 6630 3261 3534  ba67d59203f02a54
00d0: 6630 3331 3632 6633 6663 6566 3537 3833  f03162f3fcef5783
00e0: 3020 780a 4040 202d 3120 2b31 2c35 2040  0 x.@@ -1 +1,5 @
00f0: 400a 2b3c 3c3c 3c3c 3c3c 202e 6f75 720a  @.+<<<<<<< .our.
0100: 2072 6967 6874 0a2b 3d3d 3d3d 3d3d 3d0a   right.+=======.
0110: 2b6c 6566 740a 2b3e 3e3e 3e3e 3e3e 202e  +left.+>>>>>>> .
0120: 7468 6569 720a                           their.

Note that the "added in both" explanation appears to be part of the
diff for binary. The diff line should be '\1\0\1\n', but it is only
'\1', obviously suggesting a C string operation gone awry.
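
The symptom can be reproduced outside git. The sketch below (Python, an illustration only; the merge-tree source was not inspected for this) copies the blob into a C char array via ctypes and reads it back the way strlen()-based code would. The result matches the bytes at offset 0x5e in the hex dump, where "+" and 0x01 are followed directly by "added in both".

```python
import ctypes

# Contents written by the reproduction script: printf '\1\0\1\n' > binary
blob = b"\x01\x00\x01\n"

# create_string_buffer copies the bytes into a NUL-terminated C array.
# .value reads it back as a C string, stopping at the first NUL.
as_c_string = ctypes.create_string_buffer(blob).value

expected_line = b"+" + blob         # what a length-aware writer would emit
observed_line = b"+" + as_c_string  # what the merge-tree output contains

print("expected:", expected_line)
print("observed:", observed_line)
```

The observed line has also lost its trailing newline, which is why the next explanation line runs on directly after it.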

I haven't checked whether regular 'git diff' operations contain a
similar bug. (The NUL would have to be pretty far into the file, to
confuse the binary file detection heuristic, but that is possible.)
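
For reference, a sketch of the kind of heuristic meant here. The 8000-byte window is an assumption based on my recollection of git's source, not something verified for this report, and looks_binary is a hypothetical name.

```python
FIRST_FEW_BYTES = 8000  # assumed size of the prefix that gets inspected


def looks_binary(data: bytes) -> bool:
    """Treat data as binary if a NUL appears within the inspected prefix."""
    return b"\0" in data[:FIRST_FEW_BYTES]


small = b"\x01\x00\x01\n"                        # the file from the repro
late_nul = b"a" * FIRST_FEW_BYTES + b"\x00tail\n"  # NUL just past the window

print(looks_binary(small))
print(looks_binary(late_nul))
```

Under that assumption, the file from the reproduction script would be classified as binary, while a file like late_nul would be treated as text and could reach a diff code path that stops at the NUL.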

I don't see any particularly good work-arounds. Looking for all
possible explanations as a trigger to detect a malformed patch runs
into false positives with the explanation "merged", which occurs in
regular code.


2. Feature suggestion

Related to the bug, may I suggest a flag to omit unnecessary patches?
For "added in remote" and "deleted in remote", I don't actually need
the patch--I can grab the blob contents from the SHA myself if needed.
These cases need special handling anyway (to create/delete the file),
so the (often large) patch doesn't add much. Omitting it would also
provide a workaround for the bug.
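
As a sketch of what a consumer would do instead of reading the patch: pull the mode, SHA and path out of the entry lines and fetch the blob separately. The regular expression is inferred only from the sample output in this report, and parse_entry is a hypothetical helper, so treat both as assumptions.

```python
import re

# Shape inferred from lines such as:
#   "  their  100644 ddc50ce55647db1421b18aa33417442e29f63d2f binary"
ENTRY = re.compile(
    r"^  (?P<side>\w+)\s+(?P<mode>[0-7]{6}) (?P<sha>[0-9a-f]{40}) (?P<path>.+)$"
)


def parse_entry(line):
    """Return side/mode/sha/path for an entry line, or None for other lines."""
    match = ENTRY.match(line)
    return match.groupdict() if match else None


entry = parse_entry(
    "  their  100644 ddc50ce55647db1421b18aa33417442e29f63d2f binary"
)
print(entry)
print(parse_entry("@@ -0,0 +1 @@"))
```

The blob contents can then be read with 'git cat-file blob <sha>'. A diff context line that happens to have the same shape would still fool this parser, which is part of why an explicit flag would be preferable.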


3. Feature suggestion

There's no direct indication of whether any given file's merge
succeeded. Currently I sniff for merge conflicts by looking for
"+<<<<<<< .our", which feels like an ugly kludge. Could we provide an
explicit indicator? (And maybe also one for binary vs text
processing?)

Note that binary file merge conflicts don't generate patches with
three-way merge markers but instead say "warning: Cannot merge binary
files: binary (.our vs. .their)". Looking for this case even further
complicates the output parser.
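
For concreteness, a sketch of the sniffing described above (Python; sniff_conflicts is a hypothetical name, and it assumes the caller has captured the warning text together with the rest of the output). It is the kludge itself, not a proposed fix: a file whose added content contains a line equal to the marker would be miscounted.

```python
import re

CONFLICT_MARKER = "+<<<<<<< .our"
BINARY_WARNING = re.compile(
    r"^warning: Cannot merge binary files: (.+) \(\.our vs\. \.their\)$"
)


def sniff_conflicts(output):
    """Return (number of text conflict hunks, list of binary conflict paths)."""
    text_conflicts = 0
    binary_paths = []
    for line in output.splitlines():
        if line == CONFLICT_MARKER:
            text_conflicts += 1
            continue
        match = BINARY_WARNING.match(line)
        if match:
            binary_paths.append(match.group(1))
    return text_conflicts, binary_paths


# The "added in both" section from the bug report above.
sample = "\n".join([
    "added in both",
    "  our    100644 c376d892e8b105bd712d06ec5162b5f31ce949c3 x",
    "  their  100644 45cf141ba67d59203f02a54f03162f3fcef57830 x",
    "@@ -1 +1,5 @@",
    "+<<<<<<< .our",
    " right",
    "+=======",
    "+left",
    "+>>>>>>> .their",
])
print(sniff_conflicts(sample))
```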


4. API suggestion

Here's what I really want 'git merge-tree' to output. :)

If the merge had no conflicts, write the resulting tree to the object
database and give me its sha. I can always diff that tree against
branch1's tree if I want to see what has changed.

If the merge had conflicts, write the "as merged as possible" tree to
the object database and give me its sha, and then also give me the
three-way merge diff output for all conflicts, as a regular patch
against that tree, using full path names and shas. (Alternatively,
maybe better, give me a second sha for a tree containing all the
three-way merge diff patches applied, which I can diff against the
first tree to find the conflict patches.)

I'm not sure what to do about binary merge conflicts, since they
aren't representable with three-way markers. Maybe 

Re: Bug report: $program_name in error message

2016-12-18 Thread Josh Bleecher Snyder
>> To reproduce, run 'git submodule' from within a bare repo. Result:
>>
>> $ git submodule
>> fatal: $program_name cannot be used without a working tree.
>>
>> Looks like the intent was for $program_name to be interpolated.
>
> Which version of git do you use?

$ git version
git version 2.11.0


>> As an aside, I sent a message a few days ago about a segfault when
>> working with a filesystem with direct_io on, but it appears not to
>> have made it to the archives on marc.info. Am I perhaps still
>> greylisted?
>
> Both emails show up in my mailbox (subscribed to the mailing list),
> so I think that just nobody answered your first email as the answer
> may be non trivial.

Thanks for the confirmation.

-josh


Bug report: $program_name in error message

2016-12-18 Thread Josh Bleecher Snyder
To reproduce, run 'git submodule' from within a bare repo. Result:

$ git submodule
fatal: $program_name cannot be used without a working tree.

Looks like the intent was for $program_name to be interpolated.


As an aside, I sent a message a few days ago about a segfault when
working with a filesystem with direct_io on, but it appears not to
have made it to the archives on marc.info. Am I perhaps still
greylisted?

Thanks,
Josh


Segfault in git_config_set_multivar_in_file_gently with direct_io in FUSE filesystem

2016-12-16 Thread Josh Bleecher Snyder
I am using git with a simple in-memory FUSE filesystem. When I enable
direct_io, I get a segfault from
git_config_set_multivar_in_file_gently during git clone.

I have full reproduction instructions using Go and macOS at
https://github.com/josharian/gitbug. It also includes a stack trace in
case anyone wants to try blind debugging. Happy to provide more info
if it will help.

Thanks,
Josh


Bug report: checkout-index --temp path is not always relative to the current directory

2014-12-30 Thread Josh Bleecher Snyder
In the section on using --temp, 'git help checkout-index' says:

The path field is always relative to the current directory and the
temporary file names are always relative to the top level directory.

However, this can be false when an absolute path to the file is provided.

Reproduction:

~$ git version
git version 2.2.1
~$ mkdir demo
~$ cd demo
~/demo$ mkdir a
~/demo$ echo a > a/f
~/demo$ mkdir b
~/demo$ echo b > b/f
~/demo$ git init .
Initialized empty Git repository in ~/demo/.git/
~/demo$ git add .
~/demo$ git commit -m init
[master (root-commit) 2afa910] init
 2 files changed, 2 insertions(+)
 create mode 100644 a/f
 create mode 100644 b/f
~/demo$ echo b2 > b/f
~/demo$ cd a
~/demo/a$ git checkout-index --temp -- `git rev-parse --show-toplevel`/b/f
.merge_file_xm8RTd f
~/demo/a$ cat ../.merge_file_xm8RTd
b

Note that if f in the checkout-index output is interpreted as relative
to the current directory, it would refer to a/f, whereas in fact it is
b/f.
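
The mismatch can be spelled out with plain path arithmetic. In this sketch (Python), /home/user/demo is a hypothetical stand-in for ~/demo; nothing here calls git.

```python
import posixpath

toplevel = "/home/user/demo"                    # hypothetical location of ~/demo
cwd = posixpath.join(toplevel, "a")             # where checkout-index was run
requested = posixpath.join(toplevel, "b", "f")  # absolute path on the command line
reported = "f"                                  # path field printed by checkout-index

# What the documentation implies: resolve the path field against the cwd.
documented = posixpath.normpath(posixpath.join(cwd, reported))

# What the path field would need to be for the documentation to hold.
consistent = posixpath.relpath(requested, cwd)

print(documented)
print(requested)
print(consistent)
```

Resolving the reported field against the current directory lands on a/f rather than the requested b/f; a field of ../b/f would have been consistent with the documented behavior.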

This led to https://github.com/golang/go/issues/9476.

Thanks,
Josh