Re: Re: Merge with git-pasky II.

2005-04-18 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

 On Sun, 17 Apr 2005, Ingo Molnar wrote:
  
  in fact, this attack cannot even be proven to be malicious, purely via 
  the email from Malice: it could be incredible bad luck that caused that 
  good-looking patch to be mistakenly matching a dangerous object.
 
 I really hate theoretical discussions.

i was only replying to your earlier point:

   Almost all attacks on sha1 will depend on _replacing_ a file with 
   a bogus new one. So guys, instead of using sha256 or going 
   overboard, just make sure that when you synchronize, you NEVER 
   import a file you already have.

which point i still believe is subtly wrong. You were suggesting to 
concentrate on file replacement to counter most of the practical 
attacks, while i pointed out an attack _using the same basic mechanism 
that your point above supposed_.

[ if you can replace a file with a known hash, with a bogus new one, and 
  you still have enough control over the contents of your bogus new file 
  that it is 1) a valid file that builds 2) compromises the kernel, then 
  you likely have the same amount of control my 'theoretical' attack
  requires. ]

 And the thing is, _if_ somebody finds a way to make sha1 act as just a 
 complex parity bit, and comes up with generating a clashing object 
 that actually makes sense, then going to sha256 is likely pointless 
 too [...]

yes, that's why i suggested to not actually trust the hash to be 
cryptographically secure, but to just assume it's a good generic hash we 
can design a DB around, and to turn -DCOLLISION_CHECK on and enforce 
consistency rules on boundaries.

[ it's not bad to keep sha1 because even my suggested enhancement still
  leaves 'content-less trust-pointers to untrusted content via email'
  vectors open against attack (maintainer sends you an email that commit
  X in Malice's repository Y is fine to pull, and you pull it blindly,
  while the attacker has replaced his content with the compromised one
  meanwhile), but it at least validates the bulk traffic that goes into
  the DB: patches via emails and trusted repositories. ]

so all i was suggesting was to extend your suggested 'overwrite 
collision check' to a stricter 'content we throw away and use the sha1 
shortcut for needs to be checked against the in-DB content as well'.

in other words, your suggested 'rename check' is checking for 'positive 
duplicate content', while my addition would also check for 'negative 
duplicate content' as well.

but as usual, i could be wrong, so dont take this too serious :-)

Ingo
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-17 Thread David Woodhouse
On Sat, 2005-04-16 at 17:33 +0200, Johannes Schindelin wrote:
  But if it can be done cheaply enough at a later date even though we end
  up repeating ourselves, and if it can be done _well_ enough that we
  shouldn't have just asked the user in the first place, then yes, OK I
  agree.
 
 The repetition could be helped by using a cache.

Perhaps. Since neither such a cache nor even the commit comments are
strictly part of the git data, they probably shouldn't be included in
the sha1 hash of the commit object. However, I don't see a fundamental
reason why we couldn't store them in the same file but omit them from
the hash calculations. That also allows us to retrospectively edit
commit comments without completely changing the entire subsequent
history.

Or is that a little too heretical a suggestion?

-- 
dwmw2

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-17 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 The compromise relies on you having reviewed something harmless, while 
 in reality what happened within the DB was far less harmless. And the 
 DB remains self-consistent: neither fsck, nor others importing your 
 tree will be able to detect the compromise. This attack can only be 
 detected when you apply the patch, after that point all the 
 information (except Malice's message in your inbox) is gone.

in fact, this attack cannot even be proven to be malicious, purely via 
the email from Malice: it could be incredible bad luck that caused that 
good-looking patch to be mistakenly matching a dangerous object.

In fact this could happen even today, _accidentally_. (but i'm willing 
to bet that hell will be freezing over first, and i'll have some really 
good odds ;) There's probably a much higher likelyhood of Linus' tree 
getting corrupted in some old fashioned way and introducing a security 
hole by accident)

Ingo
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-17 Thread Petr Baudis
Dear diary, on Mon, Apr 18, 2005 at 01:29:05AM CEST, I got a letter
where Herbert Xu [EMAIL PROTECTED] told me that...
 I get the feeling that it isn't that bad.  For example, if we did it
 at the points where the blobs actually entered the tree, then the cost
 is always proportional to the change size (the number of new blobs).

No. The collision check is done in the opposite cache - when you want to
write a blob and there is already a file of the same hash in the tree.
So either the blob is already in the database, or you have a collision.

Therefore, the cost is proportional to the size of what stays unchanged.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-17 Thread Linus Torvalds


On Mon, 18 Apr 2005, Herbert Xu wrote:
 
 I wasn't disputing that of course.  However, the same effect can be
 achieved in using a single hash with a bigger length, e.g., sha256
 or sha512.

No it cannot.

If somebody actually literally totally breaks that hash, length won't 
matter. There are (bad) hashes where you can literally edit the content of 
the file, and make sure that the end result has the same hash.

In that case, when the hash algorithm has actually been broken, the length 
of the hash ends up being not very relevant. 

For example, you might hash your file by blocking it up in 16-byte
blocks, and xoring all blocks together - the result is a 16-byte hash.  
It's a terrible hash, and obviously trivially breakable, and once broken
it does _not_ help to make it use its 32-byte cousin. Not at all. You can 
just modify the breaking thing to equally cheaply make modifications to a 
file and get the 32-byte hash right again.

Is that kind of breakage likely for sha1? Hell no. Is it possible? In your 
in theory world where practice doesn't matter, yes.

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-17 Thread Kenneth Johansson
Petr Baudis wrote:
Dear diary, on Mon, Apr 18, 2005 at 01:29:05AM CEST, I got a letter
where Herbert Xu [EMAIL PROTECTED] told me that...
I get the feeling that it isn't that bad.  For example, if we did it
at the points where the blobs actually entered the tree, then the cost
is always proportional to the change size (the number of new blobs).

No. The collision check is done in the opposite cache - when you want to
write a blob and there is already a file of the same hash in the tree.
So either the blob is already in the database, or you have a collision.
Therefore, the cost is proportional to the size of what stays unchanged.
?? now I'm confused. Surly the only cost involved is to never write over 
a file that already exist in the cache and that is already done NOW as 
far as I read the code. So there is NO extra cost in detecting an collision.



-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-17 Thread Petr Baudis
Dear diary, on Mon, Apr 18, 2005 at 02:49:06AM CEST, I got a letter
where Herbert Xu [EMAIL PROTECTED] told me that...
 Therefore the only conclusion I can draw is that we're only calling
 update-cache on the set of changed files, or at most a small superset
 of them.  In that case, the cost of the collision check *is* proportional
 to the size of the change.

Yes, of course, sorry for the confusion.  We only consider files you
either specify manually or which have their stat metadata changed
relative to the directory cache. (That is from the git-pasky
perspective; from the plumbing perspective, the user just does
update-cache on whatever he picks.)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-16 Thread David Lang
On Fri, Apr 15, 2005 at 08:32:46AM -0700, Linus Torvalds wrote:
In other words, I'm right. I'm always right, but sometimes I'm more 
right
than other times. And dammit, when I say files don't matter, I'm 
really
really Right(tm).
You're right, of course (All Hail Linus!), if you can make it work
efficiently enough.
Just to put something else on the table, here's how I'd go about
tracking renames and the like, in another world where Linus /does/
make the odd mistake - it's basically a unique id for files in the
repository, added when the file is first recognised and updated when
update-cache adds a new version to the cache. Renames copy the id
across to the new name, and add it into the cache.
This gives you an O(n) way to tell what file was what across
renames, and it might even be useful in Linus' world, or if someone
wanted to build a traditional SCM on top of a git-a-like.
Attached is a patch, and a rename-file.c to use it.
Simon
given that you have multiple machines creating files, how do you deal with 
the idea of the same 'unique id' being assigned to different files by 
different machines?

David Lang

--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-16 Thread Johannes Schindelin
Hi,

On Fri, 15 Apr 2005, David Woodhouse wrote:

 But if it can be done cheaply enough at a later date even though we end
 up repeating ourselves, and if it can be done _well_ enough that we
 shouldn't have just asked the user in the first place, then yes, OK I
 agree.

The repetition could be helped by using a cache.

Ciao,
Dscho
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-16 Thread Simon Fowler
On Sat, Apr 16, 2005 at 06:03:33PM +0200, Petr Baudis wrote:
 Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
 where Simon Fowler [EMAIL PROTECTED] told me that...
  On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote:
   Simon
   
   given that you have multiple machines creating files, how do you deal 
   with 
   the idea of the same 'unique id' being assigned to different files by 
   different machines?
   
  The id is a sha1 hash of the current time and the full path of the
  file being added - the chances of that being replicated without
  malicious intent is extremely small. There are other things that
  could be used, like the hostname, username of the person running the
  program, etc, but I don't really see them being necessary.
 
 Why not just use UUID?
 
Hey, everything else in git seems to use sha1, so I just copied
Linus' sha1 code ;-)

All I wanted was something that had a good chance of being unique
across any potential set of distributed repositories, to avoid the
chance of accidental clashes. A sha1 hash of something that's not
likely to be replicated is a simple way to do that.

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 


signature.asc
Description: Digital signature


Re: Merge with git-pasky II.

2005-04-15 Thread Junio C Hamano
 CL == Christopher Li [EMAIL PROTECTED] writes:

 - Result is this object $SHA1 with mode $mode at $path (takes
 one of the trees); you can do update-cache --cacheinfo (if
 you want to muck with dircache) or cat-file blob (if you want
 to get the file) or both.

CL Is that SHA1 for tree or the file object?

I am talking about a single file here.


-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-15 Thread Christopher Li
On Fri, Apr 15, 2005 at 12:43:47AM -0700, Junio C Hamano wrote:
  CL == Christopher Li [EMAIL PROTECTED] writes:
 
 CL Is that SHA1 for tree or the file object?
 
 I am talking about a single file here.

Then do you emit the entry for it's parents directory?

e.g. /foo/bar get created. foo doesn't exists. You have
to create foo first. You don't have mode information for
foo yet. If it give the top level tree, the SCM can check it
out by tree. hopefully have the mode on directory correctly.
Well, if they care about those little details.

Chris
 
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-15 Thread David Woodhouse
On Fri, 2005-04-15 at 11:36 +0200, Ingo Molnar wrote:
 do such cases occur frequently? In the kernel at least it's not too 
 typical. 

Isn't it? I thought it was a fairly accurate representation of the
process I make a whole bunch of changes to files I maintain, pulling
from Linus while occasionally asking him to pull from my tree. Sometimes
my files are changed by someone else in Linus' tree, and sometimes I
change files that I don't actually own..

-- 
dwmw2


-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-15 Thread Theodore Ts'o
On Fri, Apr 15, 2005 at 02:03:08PM +0200, Johannes Schindelin wrote:
 I disagree. In order to be trusted, this thing has to catch the following
 scenario:
 
 Skywalker and Solo start from the same base. They commit quite a lot to
 their trees. In between, Skywalker commits a tree, where the function
 kazoom() has been added to the file deathstar.c, but Solo also added
 this function, but to the file moon.c. A file-based merge would have no
 problem merging each file, such that in the end, kazoom() is defined
 twice.
 
 The same problems arise when one tries to merge line-wise, i.e. when for
 each line a (possibly different) merge-parent is sought.

Be careful.  There is a very big tradeoff between 100% perfections in
catching these sorts of errors, and usability.  There exists SCM's
where you are not allowed to do commit such merges until you do a test
compile, or run a regression test suite (that being the only way to
catch these sorts of problems when we merge two branches like this).  

BitKeeper never caught this sort of thing, and we trusted it.  In
practice it was also rarely a problem.

I'll also note that BitKeeper doesn't restrict you from doing a
committing a changeset when you have modified files that have yet to
be checked in to the tree.  Same issue; you can accidentally check in
changesets that in trees that won't build, but if we added this kind
of SCM-by-straightjacket philosophy it would decrease our productivity
and people would simply not use such an SCM, thus negating its
effectiveness.

- Ted
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-15 Thread Junio C Hamano
 PB == Petr Baudis [EMAIL PROTECTED] writes:

PB I can't see the conflicts between what I want and what Linus wants.
PB After all, Linus says that I can use the directory cache in any way I
PB please (well, the user can, but I'm speaking for him ;-). So I'm doing
PB so, and with your tool I would get into problems, since it is suddenly
PB imposing a policy on what should be in the index.

I think our misunderstanding is coming from the use of the word
merge tree.  I think you have been assuming that I wanted you
to run merge-trees -o ,,merge --- which would certainly cause
me to muck with your dircache there.  I totally agree with you
that that is a *BAD* *THING*.  No question there.

However, my assumption has been different.  I was assuming that
you would run merge-trees -o merge~tree (i.e. different from
your merge tree), so that you can get the merge results in a
form parsable by you.  And then, using that information, you can
make your changes in ,,merge.  After you are done with that
information, you can remove merge~trees, of course.

The format I chose for the merge result in a form parsable by
you happens to be a dircache in merge~tree, with minimum
number of files checked out when merge cannot be automatically
done safely.  In the simplest case of not having any conflicting
merge between $C and $merged, Cogito can immediately run
write-tree in merge~tree (not ,,merge) to obtain its tree-ID
$T, so that it can feed it to diff-tree to compare it with
whatever tree state Cogito wants to apply the merges between $C
and $merged to.

I still do not understand what you do in ,,merge directory, but
here is one way you can update the user working directory
in-place without having a ,,merge directory [*2*].  You can run
your git diff between $C and $T [*1*].  The result is the diff
you need to apply on top of your user's working files.  If the
user does not like the result of running that diff, it can
easily be reversed.

If a manual merge were needed between $C and $merged, Cogito
could guide the user through that manual edit in merge~tree,
and run update-cache on those hand merged files in merge~tree,
before running write-tree in merge~tree to obtain $T; after
that, everything else is the same.

You make interesting points in other parts of your message I
need to regurgitate for a while, so I would not comment on them
in this message.

[Footnote]

*1* I really like the convenience of being able to use tree-ID
and commit-ID interchangeably there.  Thanks.

*2* I understand that this would change the user's git-tools
experience a bit.  The user will not be told to go to ,,merge
and commit there which will reflected back to your working tree
anymore.  Instead the merge happens in-place.  Committing, not
committing, or further hand-fixing the merge is up to the user.
I suspect this change might even be for the better.

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-14 Thread Junio C Hamano
 PB == Petr Baudis [EMAIL PROTECTED] writes:

PB Bah, you outran me. ;-)

Just being in a different timezone, I guess.

PB I'll change it to use the cool git-pasky stuff (commit-id etc) and its
PB style of committing - that is, it will merely record the update-caches
PB to be done upon commit, and it will read-tree the branch we are merging
PB to instead of the ancestor. (So that git diff gives useful output.)

Sorry, I have not seen what you have been doing since pasky 0.3,
and I have not even started to understand the mental model of
the world your tool is building.  That said, my gut feeling is
that telling this script about git-pasky's world model might be
a mistake.  I'd rather see you consider the script as mere part
of the plumbing.  Maybe adding an extra parameter to the script
to let the user explicitly specify the common ancestor to use
would be needed, but I would prefer git-pasky-merge to do its
own magic (converting symbolic commit names into raw commit
names and such) before calling this low level script.

That way people like me who have not migrated to your framework
can still keep using it.  All the script currently needs is a
bare git object database; i.e., nothing other than what is in
.git/objects and a couple of commit record SHA1s as its
parameters.  No .git/heads/, no .git/HEAD.local, no .git/tags,
are involved for it to work, and I would prefer to keep things
that way if possible.

 * show-diff updates to add -r flag to squelch diffs for files not in
 the working directory.  This is mainly useful when verifying the
 result of an automated merge.

PB -r traditionally means recursive - what's the reasoning behind the
PB choice of this letter?

Well, '-r' is not necessarily recursive. ls -r is reverse, sort
-r is reverse.  less -r is raw.  cat -r is reversible.
nethack -r is race ;-).  You are thinking as an SCM person so
it may look that way.  diff -r is recursive.  darcs add -r
is recursive.  But even in the SCM world, cvs add -r is not
(it means read-only) neither co -r (explicit revision) ;-).

I would rather pick '-q' if I were doing the patch today, but I
was too tired and did not think of a letter when I wrote it.  I
guess '-r' stood for removed, but I agree it is a bad choice.
Any objections to '-q'?

PB use strict?

Not in this iteration but eventually yes.

PB It'd be simpler to do just

PB my @common = (map { $common{$_} }
PB   sort { $b = $a }
PB   keys %common)

Well, actually you spotted a bug between the implementation and
what I wanted to do.  It should have been:

map { $_-[0] }
sort { $b-[1] = $a-[1] }
map { [ $common{$_} = $_ ] } keys %common

That is, sort [anscestor = number of times it appears] tuple by
the number of times it appears in decreasing order, and
project the resulting list to a list of ancestors.  It is trying
to deal with the following pattern in rev-tree output:

TIMESTAMP1 EDGE1:1 ANCESTOR1:3 ANCESTOR2:3
TIMESTAMP2 EDGE2:2 ANCESTOR1:3

and when the above happens I wanted to pick up ANCESTOR1, but
that was without no sound reason.

PB But I really think this is a horrible heuristic. I believe you should
PB take the latest commit in the --edges output, and from that choose the
PB base whose rev-tree --edges the_base merged_branch has the least lines
PB on output. (That is, the path to it is shortest - ideally it's already
PB part of the merged_branch.)

I'll try something along that line.  Honestly the ancestor
selection part was what I had most trouble with.  Thanks.

PB What about

PB sub OLDMODE { 0 }
PB sub NEWMODE { 1 }
PB sub NEWSHA { 2 }

PB and then using that when accessing the tuple? Would make the code
PB much more readable.

Totally agreed; readability cleanup is needed, just as use
strict you mentioned, before it is ready for public
consumption.  Remember, however, the primary purpose of the
message was to share it with Linus so that I can ask his opinion
while the script was still slushy; the contents that array
contained was still changing then and was too early for symbolic
constants.  I'll do that in the next round.

PB It is a good idea to check merge's exit code and give a notice at the
PB end if there were any conflicts.

In principle yes, but I noticed that merge already gave me a
nice warning message when it found conflicts, so there was no
need to do so myself in this case.  See sample output:

$ perl ./git-merge.perl \
71796686221a0a56ccc25b02386ed8ea648da14d \
bb95843a5a0f397270819462812735ee29796fb4 
Common ancestor: 9f02d4d233223462d3f6217b5837b786e6286ba4
O - COPYING
O - README
...
O - write-tree.c
A M write-blob.c
A M show-diff.c
...
A M update-cache.c
A M git-merge.perl
B M merge-tree.c
MRG Makefile
merge: warning: conflicts during merge
$ 

 +# Create a temporary directory and go there.
 +system 'rm', '-rf', ',,merge-temp';

PB Can't we call it just ,,merge?

I'd rather have a command line option 

Re: Re: Re: Merge with git-pasky II.

2005-04-14 Thread Petr Baudis
Dear diary, on Thu, Apr 14, 2005 at 10:23:26PM CEST, I got a letter
where Erik van Konijnenburg [EMAIL PROTECTED] told me that...
 On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote:
  Hmm. I actually don't like this naming. I think it's not too consistent,
  is irregular, therefore parsing it would be ugly. What I propose:
  
  12c\tname - legend
- original file
  D - tree #1 removed file
   D- tree #2 removed file
  DD- both trees removed file
  M - tree #1 modified file
   M
  DM*   - conflict, tree #1 removed file, tree #2 modified file
  MD*
  MM- exact same modification
  MM*   - different modifications, merging
  
  This is generic, theoretically scales well even to more trees, is easy
  to parse trivially, still is human readable (actually the asterisk in
  the 'conflict' column is there basically only for the humans), is
  completely regular and consistent.
 
 Detail: perhaps use underscore instead of space, to avoid space/tab typos
 that are invisible on paper and user friendly mail clients?

I'd go for dots in that case. Looks less intrusive. :^)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-14 Thread Christopher Li
Is that some thing you want to see? Maybe clean up the error printing.


Chris

--- /dev/null   2003-01-30 05:24:37.0 -0500
+++ merge.py2005-04-14 16:34:39.0 -0400
@@ -0,0 +1,76 @@
+#!/usr/bin/env python
+
+import re
+import sys
+import os
+from pprint import pprint
+
+def get_tree(commit):
+data = os.popen(cat-file commit %s%commit).read()
+return re.findall(r(?m)^tree (\w+), data)[0]
+
+PREFIX = 0
+PATH = -1
+SHA = -2
+ORIGSHA = -3
+
+def get_difftree(old, new):
+lines = os.popen(diff-tree %s %s%(old, new)).read().split(\x00)
+patterns = (r(\*)(\d+)-(\d+)\s(\w+)\s(\w+)-(\w+)\s(.*),
+   r([+-])(\d+)\s(\w+)\s(\w+)\s(.*))
+res = {}
+for l in lines:
+   if not l: continue
+   for p in patterns:
+   m = re.findall(p, l)
+   if m:
+   m = m[0]
+   res[m[-1]] = m
+   break
+   else:
+   raise difftree: unknow line, l
+return res
+
+def analyze(diff1, diff2):
+diff1only = [ diff1[k] for k in diff1 if k not in diff2 ]
+diff2only = [ diff2[k] for k in diff2 if k not in diff1 ]
+both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ]
+
+action(diff1only)
+action(diff2only)
+action_two(both)
+
+def action(diffs):
+for act in diffs:
+   if act[PREFIX] == *:
+   print modify, act[PATH], act[SHA]
+   elif act[PREFIX] == '-':
+   print remove, act[PATH], act[SHA]
+   elif act[PREFIX] == '+':
+   print add, act[PATH], act[SHA]
+   else:
+   raise unknow action
+
+def action_two(diffs):
+for act1, act2 in diffs:
+   if len(act1) == len(act2):  # same kind type
+   if act1[PREFIX] == act2[PREFIX]:
+   if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': 
+   return action(act1)
+   if act1[PREFIX]=='*':
+   print do_merge, act1[PATH], act1[ORIGSHA], act1[SHA], 
act2[SHA]
+   return
+   print unable to handle, act[PATH]
+   print one side wants, act1[PREFIX]
+   print the other side wants, act2[PREFIX]
+   
+
+args = sys.argv[1:]
+if len(args)!=3:
+print Usage merge.py common rev1 rev2
+trees = map(get_tree, args)
+print checkout-tree, trees[0]
+diff1 = get_difftree(trees[0], trees[1])
+diff2 = get_difftree(trees[0], trees[2])
+analyze(diff1, diff2)
+
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html