Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread David Roundy
On Tue, Apr 19, 2005 at 09:49:12AM -0700, Linus Torvalds wrote:
 On Tue, 19 Apr 2005, Tupshin Harper wrote:
  I suspect that any use of wildcards in a new format would be impossible
  for darcs since it wouldn't allow darcs to construct dependencies,
  though I'll leave it to david to respond to that.
 
 Note that git _does_ very efficiently (and I mean _very_) expose the 
 changed files.
 
 So if this kind of darcs patch is always the same pattern just repeated
 over n files, then you really don't need to even list the files at all.
 Git gives you a very efficient file listing by just doing a diff-tree
 (which does not diff the _contents_ - it really just gives you a pretty
 much zero-cost which files changed listing).

The catch is that it's possible to have a darcs patch that doesn't change
any files, or that affects files without changing them.  If I rename
function foo to bar, I might want to do

darcs replace foo bar *.c

which would issue a replace on all files, which means that when this patch
is merged with any patches that add occurrences of foo in a file, that will
get modified to a bar, regardless of whether there was previously an
occurrence of foo in that file.

I think we might (when working with git--it would be problematic within
darcs straight) be able to work out some sort of a wildcard replace
scheme, so it could be something like

replace foo bar in: mm/*.c

The regexp bit could be left out, if we restrict the definition of tokens
in token replaces--which probably isn't a troublesome limitation.  By
default darcs uses two tokenizing schemes, one which allows . in tokens
(usually relevant in Makefiles), and one which doesn't, and basically
matches C identifiers.  We could allow for both of these if we had a second
option:

replace filename foo.h bar.h in: mm/*.c

We'd just need to expand the wildcards when translating from the git
repository into darcs patches.
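
As a rough illustration of the wildcard expansion described above, here is a minimal Python sketch. The line syntax is taken from the two examples in this message; the function name, the two character classes, and the use of `fnmatch` are assumptions made for illustration only, not anything darcs or git actually implements. Note that `fnmatch`'s `*` also matches `/`, which a real scheme might not want.

```python
import fnmatch
import re

# Hypothetical syntax, modelled on the two example lines in the message.
LINE = re.compile(
    r"^replace (?P<filename>filename )?(?P<old>\S+) (?P<new>\S+) in: (?P<glob>\S+)$"
)

# Assumed character classes for the two tokenizing schemes.
IDENT_CHARS = "[_a-zA-Z0-9]"      # roughly C identifiers
FILENAME_CHARS = "[_a-zA-Z0-9.]"  # additionally allows "." in tokens


def expand_replace(line, tree_paths):
    """Turn one wildcard replace line into one explicit replace per file."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"not a wildcard replace line: {line!r}")
    chars = FILENAME_CHARS if m["filename"] else IDENT_CHARS
    files = sorted(p for p in tree_paths if fnmatch.fnmatchcase(p, m["glob"]))
    return [(chars, m["old"], m["new"], path) for path in files]


if __name__ == "__main__":
    paths = ["mm/slab.c", "mm/mmap.c", "mm/slab.h", "fs/inode.c"]
    for patch in expand_replace("replace foo bar in: mm/*.c", paths):
        print(patch)
```

The expansion needs the file list of the tree the commit applies to, which is why it can only happen at translation time.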

 So that combination would be 100% reliable _if_ you always split up darcs 
 patches to common elements. 
 
 And note that there does not have to be a 1:1 relationship between a git
 commit and a darcs patch. For example, say that you have a darcs patch
 that does a combination of change token x to token y in 100 files and
 rename file a into b. I don't know if you do those kind of combination 
 patches at all, but if you do, why not just split them up into two? That 
 way the list of files changed _does_ 100% determine the list of files for 
 the token exchange.

We do allow multiple sorts of changes (in darcs terminology, multiple
primitive patches) in a single patch.

One *could* have multiple git commits for a single darcs patch, but that
seems ugly and I'd rather avoid it.  In my view, a revision control system is
more about communication than history (which is why by default, darcs
doesn't do history), and grouping changes together is how we express
which changes go together.  Of course, we could still have a grouping at
a higher level, so that a single changeset could consist of multiple git
commits (for example by recognizing that identical commit logs mean that
it's a single change), but that adds a layer of complexity that I'd like to
avoid if possible.
-- 
David Roundy
http://www.darcs.net
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread David Roundy
On Tue, Apr 19, 2005 at 02:25:18PM +0200, Petr Baudis wrote:
 Dear diary, on Tue, Apr 19, 2005 at 02:20:55PM CEST, I got a letter
 where Juliusz Chroboczek [EMAIL PROTECTED] told me that...
   The problem is that there is no sequence of alien versions that one
   can differentiate.  Git has a branched history, with each version
   that follows a merge having multiple parents.
  
  Yep.  I've just realised that this morning.  Is there some notion of
  ``primary parent'' as in Arch?  Can a changeset have 0 parents?
 
 Yes, the root commit. Usually, there is only one, but there may be
 multiple of them theoretically.

Incidentally (and completely off-topic for this thread), wouldn't there be
a sha1 tree hash corresponding to a completely empty directory, and
couldn't one use that as the parent for the root? Would there be any reason
to do so? Just a silly thought...
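
For what it's worth, the hash of an empty tree is easy to compute. The sketch below assumes git's present-day object format (SHA-1 over a `<type> <size>\0` header followed by the body); the format in use when this message was written may have differed, and the function name is made up for the example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of a git object: header '<kind> <size>' + NUL, then the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    # An empty directory is a tree object with a zero-length body.
    print(git_object_id("tree", b""))
    # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

So such a well-known ID does exist; whether a root commit should point at it as a parent is a separate question, since a parent is a commit and this is a tree.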
-- 
David Roundy
http://www.darcs.net


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread David Roundy
On Tue, Apr 19, 2005 at 02:20:55PM +0200, Juliusz Chroboczek wrote:
 [Removing Linus from CC, keeping the Git list -- or should we remove it?]

I think leaving much of this on git would be appropriate, since there are
issues of how to relate to git that should be relevant.

  If we do it right (automatically tagging like crazy people), darcs
  users between themselves can cherry-pick all they like, without
  introducing inconsistencies or losing interoperability with git.
 
 You've lost me here.  How can you cherry-pick if every tag depends on
 the preceding patches?  Or are you thinking of pulling just the patch
 and not the tag -- in that case, what happens when you push to git a
 Darcs patch that depends on a patch that originated with git?

Yes, I'm thinking of pulling patches from one darcs repo to another.  If we
cherry-pick in this way, we need to create a git-tag for each patch that
we pull without its associated tag.  To git, this would look like two
separate changes that have the same commit log, except that they have
different parents and different committers and commit dates.

I don't think this will be a problem for git, and since darcs will
recognize the two patches as the identical darcs patch (we'll need to put
somewhere in the git commit log a magic word indicating that this patch
originated in darcs), there won't be a problem for darcs either.

In case I haven't been clear (which seems likely), the scenario is that
darcs user 1 makes the following changes to his darcs version of a
git-based repository:

changes in 1: A - B
tags in 1:    A1  B1

Darcs user 2 wants B, but not A, and didn't do any development:

changes in 2: B
tags in 2:    B2

User 2 pushes to git, and now git has (where P is the parent of both of the
above):

git:
P - B/B2  (where B/B2 is the commit log with B2 as committer info and B
as the author info and long comment)

User 1 pushes (everything) to git and merges the two (patch M, which has
two parents, B1 and B2):

git:

   ------ B/B2 ------
  /                  \
P -- A/A1 -- B/B1 --- M

It's a little lame, and if user 2 doesn't do any real work, the git-using
person might be annoyed, but I think it's doable.
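
The scenario above can be sketched in Python as follows. Everything here is hypothetical: the `Commit` record stands in for a git commit, the short IDs are invented, and the `# Darcs ID:` marker is borrowed from the one-line format Linus suggests elsewhere in this thread as a guess at what the "magic word" might look like.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed marker line inside the commit log.
DARCS_ID = re.compile(r"^# Darcs ID: ([0-9a-f]+)$", re.MULTILINE)


@dataclass(frozen=True)
class Commit:
    name: str       # stand-in for the git commit hash
    parents: tuple  # names of the parent commits
    log: str        # free-form commit log


def darcs_id(commit):
    """Return the darcs patch ID recorded in the log, or None."""
    m = DARCS_ID.search(commit.log)
    return m.group(1) if m else None


def group_by_darcs_patch(commits):
    """Map each darcs patch ID to the git commits that carry it."""
    groups = defaultdict(list)
    for c in commits:
        pid = darcs_id(c)
        if pid is not None:
            groups[pid].append(c.name)
    return dict(groups)


# The history from the diagram: B appears twice, with different parents.
HISTORY = [
    Commit("P", (), "initial import"),
    Commit("A/A1", ("P",), "change A\n\n# Darcs ID: aaaa"),
    Commit("B/B1", ("A/A1",), "change B\n\n# Darcs ID: bbbb"),
    Commit("B/B2", ("P",), "change B\n\n# Darcs ID: bbbb"),
    Commit("M", ("B/B1", "B/B2"), "merge"),
]

if __name__ == "__main__":
    print(group_by_darcs_patch(HISTORY))
```

Git sees B/B1 and B/B2 as unrelated commits, while the shared ID lets the darcs side treat them as one patch.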
-- 
David Roundy
http://www.darcs.net


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread Ralph Corderoy

Hi Ray,

 Give me a case where assuming it's a replace will do the wrong thing,
 for C code, where it's a variable or function name.

How about two patches.

1.  s/foo/bar/ throughout file because foo() has been decided upon
as the name of a new globally visible forthcoming function but was
already in use as a static function.

2.  Add definition of new foo().

Patch 1 mustn't be a `darcs replace' despite it changing every occurrence
of the C token foo into bar.

Cheers,


Ralph.



Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Juliusz Chroboczek
  Aye, that will require some metadata on the git side (the hack,
  suggested by Linus, of using git hashes to notice moves won't work).

 So, why won't it work?

Because two files can legitimately have identical contents without
being ``the same'' file from the VC system's point of view.

In other words, two files may happen to have the same contents but
have distinct histories.

Juliusz




Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread David Roundy
On Mon, Apr 18, 2005 at 08:38:25AM -0700, Linus Torvalds wrote:
 On Mon, 18 Apr 2005, David Roundy wrote:
   In particular, it would make life (that is, life interacting back
  and forth with git) easier if we were to embed darcs patches in their
  entirety in the git comment block.
 
 Hell no.

I was afraid that would be the response...

 The commit _does_ specify the patch uniquely and exactly, so I really 
 don't see the point. You can always get the patch by just doing a
 
   git diff $parent_tree $thistree
 
 so putting the patch in the comment is not an option.

The issue is that in darcs the parent and child trees *don't* uniquely or
exactly specify the patch.  In fact, even the output of git diff will
depend on what version of diff you're using (e.g. if someone were to use
BSD diff rather than GNU diff).

  As I say, it's a bit ugly, and before we explore the idea further, it would
  be nice to know if this would cause Linus to vomit in disgust and/or refuse
  patches from darcs users.
 
 That's definitely the case. I will _not_ be taking random files etc just 
 to keep other peoples stuff straightened up.

Okay.

  Another slightly less noxious possibility would be to store the darcs
  patch as a hidden file, if git were given the concept of
  commit-specific files.
 
 No, git will not track commit-specific files. There's the comment
 section, and that _is_ the commit-specific file. But I will refuse to
 take any comments that aren't just human-readable explanations, together
 with maybe one extra line of
 
   # Darcs ID: 780c057447d4feef015a905aaf6c87db894ff58c
 
 (others will want to track _their_ PR numbers etc) and that's it. The 
 actual darcs data that that ID refers to can obviously be maintained in 
 _another_ git archive, but it's not one I'm going to carry about.

The trouble is that the philosophies of darcs and git are about as orthogonal
as they come.  Git treats the content as fundamental, whereas in darcs the
changes are fundamental.  Since in darcs there can be different changes
that lead from the same parent to the same child--and these differences are
meaningful when merges happen--when interacting with git, we either need
to restrict darcs to only describe changes in a way that can be uniquely
determined by a parent and child, or we need to have extra metadata
somewhere.

For bidirectional functionality, we either need to avoid the use of
advanced darcs features, or we need to include that information in git
somehow, or we need to keep a parallel darcs archive holding that
information.

Would a small amount of human-readable change information be acceptable in
the free-form comment area? In the rename thread I got the impression this
would be okay for renames.  For example,

rename foo bar

or (this is less important, but you might consider it to be a useful
human-readable comment)

replace [_a-zA-Z0-9] old_variable new_variable file/path

Currently these two patch types account for almost the sum total of the
cases where different patches lead to the same resulting trees.
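
To make the role of the character class in the `replace` line concrete, here is a minimal Python sketch of a token replace. It is an illustration under assumptions, not darcs's implementation: the function name is invented, and it skips any safety checks a real tool would perform before applying the rule.

```python
import re


def replace_tokens(text, token_chars, old, new):
    """Rename whole tokens equal to `old`; a token is a maximal run of
    characters matching the class `token_chars`, e.g. "[_a-zA-Z0-9]"."""
    token = re.compile(f"{token_chars}+")
    return token.sub(
        lambda m: new if m.group(0) == old else m.group(0), text
    )


if __name__ == "__main__":
    src = "old_variable = old_variable_2 + 1"
    # Only the exact token changes; old_variable_2 is a different token.
    print(replace_tokens(src, "[_a-zA-Z0-9]", "old_variable", "new_variable"))
```

The same text with a different character class can tokenize differently, which is why the class has to travel with the patch.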
-- 
David Roundy


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread David Roundy
On Tue, Apr 19, 2005 at 02:55:05AM +0200, Juliusz Chroboczek wrote:
 [Using git as a backend for Darcs.]
...
   1. remove the assumption that patch IDs have a fixed format.  Patch
   IDs should be opaque blobs of binary data that Darcs only compares
   for equality.
 
  I'm not really comfortable with this,
 
 Why?

I'm not clear why it would be necessary, and it takes the only immutable
piece of information regarding a patch, and makes it variable.  Just seems
dangerous and complicated, and I'm not sure why we'd need to do it.

 Suppose I record a patch in Darcs; it gets a Darcs id.  I push it into
 git, at which point it gets a git id, whether we want it to or not.
 What do we do when we pull that patch back into darcs?
 
 Either we arbitrarily discard one of the ids (which one?), or we keep
 both.  If there's more pulling/pushing going on on the git side, we
 definitely need to keep both.

Or alternatively, we could have a one-to-one mapping between git IDs and
darcs IDs, which is what I'd do.

  I think when dealing with git (and probably also with *any* other SCM,
  arch being a possible exception), we need to consider the exchange
  medium to be not a patch, but a tag.
 
 We're thinking in opposite directions -- you're thinking of the alien
 versions as integrals of Darcs patches, I'm thinking of Darcs patches
 as derivatives of alien versions.
 
   You:  alien version = Darcs tag
 
   Me:   Darcs patch = pair of successive alien versions
 
 My gut instinct is that the second model can be made to work almost
 seamlessly, unlike the first one.  But that's just a guess.

The problem is that there is no sequence of alien versions that one can
differentiate.  Git has a branched history, with each version that follows
a merge having multiple parents.  How do you define that change?  It's easy
enough to do if we tag each git version in darcs, since we know what the
two parents are, and we know what the final state is, but there *is* no
translation from a single git ID either to a single patch(1) patch, or to a
single darcs patch--unless you treat its parents as tags.

The key is that we can't make git work like darcs, so we'll have to make
darcs work like git.  If we do it right (automatically tagging like crazy
people), darcs users between themselves can cherry-pick all they like,
without introducing inconsistencies or losing interoperability with git.

To summarize how I'd see the mapping between git information and darcs, a
git commit would be composed of one darcs patch and one darcs tag.  With
this mapping, I don't believe we lose any information, and I believe we'll
be able to (except that patches would have to be uniquely determined by a
pair of trees) simply translate the darcs system right back again, since
it's a one-to-one correspondence of information.

My proposed mapping:

tree 6ff0e9f3d131bd110d32829f0b14f07da8313c45
# This is a darcs tag ID
parent abd62b9caee377595a9bf75f363328c82a38f86e
# This is the context of both a patch and tag.
author James Bottomley [EMAIL PROTECTED] 1113879319 -0700
# This is the author and date of the patch
committer Linus Torvalds [EMAIL PROTECTED](none) 1113879319 -0700
# This is the author and date of the tag
# Everything below would be the name and long comment of the patch

[PATCH] SCSI trees, merges and git status

Doing the latest SCSI merge exposed two bugs in your merge script:

1) It doesn't like a completely new directory (the misc tree contains a
   new drivers/scsi/lpfc)
2) the merge testing logic is wrong.  You only want to exit 1 if the
   merge fails. 
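
The proposed mapping can be sketched as a small Python function that splits a commit into the darcs-side pieces named in the annotations above. The dictionary keys, the function name, and the placeholder author and committer lines are all made up for illustration; only the header layout (tree, parent, author, committer, blank line, message) is taken from the example.

```python
def split_commit(text):
    """Split a git commit into the pieces of the proposed darcs mapping."""
    header, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # merges have several parents
        else:
            fields[key] = value
    name, _, long_comment = message.partition("\n")
    return {
        "tag_id": fields["tree"],            # tree hash -> darcs tag ID
        "context": fields["parent"],         # parents -> context of patch and tag
        "patch_author": fields["author"],    # author line -> patch author and date
        "tag_author": fields["committer"],   # committer line -> tag author and date
        "patch_name": name,
        "patch_comment": long_comment.strip(),
    }


# Placeholder identities; the hashes are the ones from the example above.
SAMPLE = """\
tree 6ff0e9f3d131bd110d32829f0b14f07da8313c45
parent abd62b9caee377595a9bf75f363328c82a38f86e
author A. Author <author@example.invalid> 1113879319 -0700
committer C. Committer <committer@example.invalid> 1113879319 -0700

[PATCH] SCSI trees, merges and git status

Doing the latest SCSI merge exposed two bugs in your merge script.
"""

if __name__ == "__main__":
    for key, value in split_commit(SAMPLE).items():
        print(f"{key}: {value}")
```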


-- 
David Roundy
http://www.darcs.net


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread David Roundy
On Mon, Apr 18, 2005 at 06:42:11PM -0700, Ray Lee wrote:
 On Mon, 2005-04-18 at 21:05 -0400, Kevin Smith wrote:
  You could guess, but that's not good enough for darcs to be able to
  reliably commute the patches later.

 Who said anything about guessing? If a user replaces all instances of
 foo with bar, that's as close to proof as you can ever get, without
 recording intent of the user at the time it's done. Now, I realize that
 darcs *does* record intent, but I claim that's immaterial.

The problem is, how do you know how to define a token? That's also included
in a darcs patch.  And a darcs user may choose not to use a replace patch,
if (for example) he's renaming a local variable, since he might not want to
mess with other functions in the same file.

Guessing the author's intent cannot reliably reproduce the author's stated
intent.  Either we need to include that information in one form or another
(and in one location or another), or we've got to simply disallow replaces
(and moves?) when interacting with git.
-- 
David Roundy
http://www.darcs.net


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Juliusz Chroboczek
[Removing Linus from CC, keeping the Git list -- or should we remove it?]

 I'm not clear why it would be necessary, and it takes the only immutable
 piece of information regarding a patch, and makes it variable.

Er... I'm not suggesting to make it variable, just to make it an
opaque blob of bytes (still immutable).  I see from the examples you
give below that you agree that the format needs extending, so I
suspect we're actually agreeing here, just failing to communicate.

about having multiple ids per patch:

 Or alternatively, we could have a one-to-one mapping between git IDs and
 darcs IDs, which is what I'd do.

Okay, you've convinced me.  It's much simpler that way, we'll see how
well it works.

 The problem is that there is no sequence of alien versions that one can
 differentiate.  Git has a branched history, with each version that follows
 a merge having multiple parents.

Yep.  I've just realised that this morning.  Is there some notion of
``primary parent'' as in Arch?  Can a changeset have 0 parents?

 If we do it right (automatically tagging like crazy people), darcs
 users between themselves can cherry-pick all they like, without
 introducing inconsistencies or losing interoperability with git.

You've lost me here.  How can you cherry-pick if every tag depends on
the preceding patches?  Or are you thinking of pulling just the patch
and not the tag -- in that case, what happens when you push to git a
Darcs patch that depends on a patch that originated with git?

I've started interfacing Haskell with git this week-end, that's
something we'll need whichever model we choose.  We should be able to
start playing with actually modifying Darcs after next week-end.

Juliusz


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Petr Baudis
Dear diary, on Tue, Apr 19, 2005 at 02:20:55PM CEST, I got a letter
where Juliusz Chroboczek [EMAIL PROTECTED] told me that...
  The problem is that there is no sequence of alien versions that one can
  differentiate.  Git has a branched history, with each version that follows
  a merge having multiple parents.
 
 Yep.  I've just realised that this morning.  Is there some notion of
 ``primary parent'' as in Arch?  Can a changeset have 0 parents?

Yes, the root commit. Usually, there is only one, but there may be
multiple of them theoretically.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Tupshin Harper wrote:
 
 I suspect that any use of wildcards in a new format would be impossible 
 for darcs since it wouldn't allow darcs to construct dependencies, 
 though I'll leave it to david to respond to that.

Note that git _does_ very efficiently (and I mean _very_) expose the 
changed files.

So if this kind of darcs patch is always the same pattern just repeated
over n files, then you really don't need to even list the files at all.  
Git gives you a very efficient file listing by just doing a diff-tree  
(which does not diff the _contents_ - it really just gives you a pretty
much zero-cost which files changed listing).

So that combination would be 100% reliable _if_ you always split up darcs 
patches to common elements. 

And note that there does not have to be a 1:1 relationship between a git
commit and a darcs patch. For example, say that you have a darcs patch
that does a combination of change token x to token y in 100 files and
rename file a into b. I don't know if you do those kind of combination 
patches at all, but if you do, why not just split them up into two? That 
way the list of files changed _does_ 100% determine the list of files for 
the token exchange.

Linus


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Patrick McFarland
On Monday 18 April 2005 10:05 pm, Kevin Smith wrote:
 The big feature of a darcs replace patch is that it works forward and
 backward in time. Let me try to come up with an example that can help
 explain it. Hopefully I'll get it right. Let's start with a file like
 this that exists in a project for which both you and I have darcs repos:

 cat
 dog
 fish

 Now, you change it to:

 cat dog
 dog
 fish

 while I simultaneously do a replace of dog with plant, resulting in:

 cat
 plant
 fish

 We merge. The final result in both of our trees is:

 cat plant
 plant
 fish

 Notice that just by looking at my diffs, you can't tell that I used a
 replace operation. I didn't just replace the instances of dog that
 were in my file at that moment. I conceptually replaced all instances,
 including ones that aren't there yet.

I think that's the best explanation of how it works. And that is partially why 
darcs is so powerful.
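
Kevin's cat/dog/fish example can be replayed with a small Python sketch. It is a deliberate simplification: real darcs merges by commuting patches, whereas here the merge is approximated by applying "my" change on top of "your" change, and the helper names are invented. It does show why the two encodings of my change differ even though they give the same tree on their own.

```python
import re

BASE = ["cat", "dog", "fish"]


def apply_hunk(lines, index, old, new):
    """Literal line edit: replace lines[index], which must equal `old`."""
    if lines[index] != old:
        raise ValueError("hunk does not apply")
    return lines[:index] + [new] + lines[index + 1:]


def apply_replace(lines, old, new):
    """Rule-style edit: rename every whole-word occurrence of `old`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [pattern.sub(new, line) for line in lines]


# Your change: line 1 becomes "cat dog".
yours = apply_hunk(BASE, 0, "cat", "cat dog")

# My change recorded as a replace rule: it also rewrites the dog you added.
merged_with_rule = apply_replace(yours, "dog", "plant")

# My change recorded as a plain hunk on line 2: your new dog is left alone.
merged_with_hunk = apply_hunk(yours, 1, "dog", "plant")

if __name__ == "__main__":
    print(merged_with_rule)  # ['cat plant', 'plant', 'fish']
    print(merged_with_hunk)  # ['cat dog', 'plant', 'fish']
```

Applied to `BASE` alone, both encodings yield the same three lines, so a before/after diff of my tree cannot tell them apart.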

-- 
Patrick Diablo-D3 McFarland || [EMAIL PROTECTED]
Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music. -- Kristian Wilson, Nintendo, Inc, 1989




Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Ray Lee
(Sorry for the delayed reply -- I'm living on tape delay for a bit.)

On Mon, 2005-04-18 at 22:05 -0400, Kevin Smith wrote:
 The other is replace every instance of identifier `foo` with
 identifier `bar`.
 
 That could be derived, however, by a particularly smart parser [1].
 
 No, it can't. Seriously. A darcs replace patch is encoded as rules, not
 effects, and it is impossible to derive the rules just by looking at the
 results. Not difficult. Impossible.
   
  If I do a token replace in an editor (say one of those fancy new-fangled
  refactoring thangs, or good ol' vi), a token-level comparator can
  discover what I did. That link I sent is an example of one such beast.
 
 The big feature of a darcs replace patch is that it works forward and
 backward in time.

That's *not* a feature of the token replace patch, however. That's a
feature of the darcs commutation machinery, correct? (With the obvious
caveat that darcs can only *do* the commutation if it has correctly
nuanced darcs-style token replace patches, rather than mere ASCII
textual diffs.)

 Let me try to come up with an example that can help
 explain it. Hopefully I'll get it right. Let's start with a file like
 this that exists in a project for which both you and I have darcs repos:
 
 cat
 dog
 fish
 
 Now, you change it to:
 
 cat dog
 dog
 fish
 
 while I simultaneously do a replace of dog with plant, resulting in:
 
 cat
 plant
 fish
 
 We merge. The final result in both of our trees is:
 
 cat plant
 plant
 fish

Okay, that all makes sense.

 Notice that just by looking at my diffs, you can't tell that I used a
 replace operation.

Here's where we disagree. If you checkpoint your tree before the
replace, and immediately after, the only differences in the
source-controlled files would be due to the replace. And since the
language of the file is known (and thereby the tokenization -- it *is*
well-defined), then a tokenizer that compares the before and after trees
(for just the files that changed, obviously), can discover what you did,
and promote the mere ASCII diff into a token-replace diff. (The same
sort of idea could be done for reindention, I'd hope.)

 I didn't just replace the instances of dog that
 were in my file at that moment. I conceptually replaced all instances,
 including ones that aren't there yet.

Well yes, that's exactly what we want. And the key point of all of this
is that there's no magic here. The darcs machinery does all the
commutations such that the patches can wiggle together without
conflicts. To do its job, of course, it needs nuanced patches, rather
than the quite literal ones generated by diff.

We agree on everything except that it's provable that one can discover a
replace operation, given a before and after tree.

 Now, I should mention here that I personally dislike the replace
 operation, and I think it is more dangerous than helpful. However, other
 darcs users are quite happy with it, and it certainly is a creative and
 powerful feature.

It's creative alright, though I had the same misgivings. In my common
code workflow, I almost never have global tokens -- all my replaces
would be per function, so I never saw an opportunity to use it when I
was screwing around with darcs.

 Other creative patch types have also been dreamed of. For example, a
 powerful language-specific refactoring operation has been discussed as a
 far-future possibility. That would be safe, and cool.

<subliminal>indention patch type, indention patch type...</subliminal>

   Automated refactoring tools, for example, perform the
   rename+modify as an atomic operation.
  [...]
 Although there are no such nifty refactoring tools available today, they
 will exist at some point.

Yeah, I spent some time drooling over the refactoring editors before
slapping myself and deciding I'd wait for others to live on that
bleeding edge for a while. I've had to clean up too much code from other
people.

 Even without tools, many shops have policies against checking in code
 that won't compile. If you rename a java class, you must simultaneously
 perform the rename and modify the class name inside. If you commit
 between those steps, it's broken.

I'm trying hard to find a nice way to say that's silly. I'm failing. My
suggestion in that case would be that the local coder commit many
patches to a local repository, one of which is the rename. Then upon
completion of the refactoring, the set of patches is committed to the
group repository. Tags before and after preserve the repository's
precondition that it always compiles.

 [I do realize that the kernel doesn't have java code, by the way.]

Don't worry, I didn't think that you did :-).

Ray



Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Tupshin Harper
Ray Lee wrote:
 Here's where we disagree. If you checkpoint your tree before the
 replace, and immediately after, the only differences in the
 source-controlled files would be due to the replace.

This is assuming that you only have one replace and no other operations
recorded in the patch. If you have multiple replaces or a replace and a
traditional diff recorded in the same patch, then this is not true.

 And since the
 language of the file is known (and thereby the tokenization -- it *is*
 well-defined), then a tokenizer that compares the before and after trees
 (for just the files that changed, obviously), can discover what you did,
 and promote the mere ASCII diff into a token-replace diff. (The same
 sort of idea could be done for reindention, I'd hope.)
 

See above for one set of limitations on this. A more fundamental problem 
comes back to intent. If I have a file foo before:
a1
a2
and after:
b1
b2
is that a replace [_a-zA-Z0-9] a b foo patch, or is that a
-a1
-a2
+b1
+b2
patch? Note that this comes down to heuristics, and no matter what you 
use, you will be wrong sometimes,  *and* the choice that is made can 
substantively affect the contents of the repository after additional 
patches are applied.

 We agree on everything except that it's provable that one can discover a
 replace operation, given a before and after tree.
 

It's provable that you can not.
-Tupshin


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Ray Lee
On Tue, 2005-04-19 at 10:22 +0200, Juliusz Chroboczek wrote:
   Aye, that will require some metadata on the git side (the hack,
   suggested by Linus, of using git hashes to notice moves won't work).
 
  So, why won't it work?
 
 Because two files can legitimately have identical contents without
 being ``the same'' file from the VC system's point of view.
 
 In other words, two files may happen to have the same contents but
 have distinct histories.

Eh, let's not talk using integral/summation view across all the patches
that ever could have come in against the file. We're hamstringing
ourselves if we do that, and it's not what darcs does. darcs looks at a
differential view of the changes, and for a mv, it looks at it when it
happens.

darcs does a darcs mv to commit a file move patch to whatever
logging or patch repository it keeps below the surface.

The equivalent in git would be to have a given tree, move a file via
bash's mv, and then checkpoint a new tree. (I'm sure there's details in
there, but that's plumbing, and what we have Petr for.)

A differential comparison of the two trees shows no content changed, but
a file label was modified. Ergo, a rename occurred.

QED.

~r.



Re: [darcs-devel] Darcs and git: plan of action

2005-04-18 Thread Ray Lee
On Mon, 2005-04-18 at 08:20 -0400, David Roundy wrote:
 Putting darcs patches *into* git is more complicated, since we'll want to
 get them back again without modification.  Normal hunk patches would be
 no problem, provided we never change our diff algorithm (which has been
 discussed recently, in the context of making hunks better align with blocks
 of code).  We could perhaps tell users not to use replace patches.  But
 avoiding mv patches would be downright silly.

Okay, I still haven't used git yet (and have only toyed around with
darcs for a bit), so take what I'm saying with a grain of salt.
Regardless, I think you may be asking the wrong question. The tracking
of renames was bandied about pretty thoroughly on-list from Wednesday
through Friday (for far better commentary and insight, see Linus'
messages with subject: Merge with git-pasky II.)

git does track changesets that describe the parent tree(s) and the
result. The trees track filenames and hashes. So, doing a fairly
straightforward compare on two trees will let you immediately discover
renames that have occurred, as the filename in the tree changed while
the hash didn't.
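As a rough illustration of that tree comparison, here is a minimal Python sketch. It assumes each tree can be flattened into a plain {path: content_hash} mapping; the function name find_renames and the sample paths and hashes are hypothetical and are not part of git or darcs.

```python
def find_renames(old_tree, new_tree):
    """Return (old_path, new_path) pairs whose content hash is unchanged.

    Both arguments are hypothetical {path: content_hash} mappings that
    stand in for two git tree objects.
    """
    removed = {p: h for p, h in old_tree.items() if p not in new_tree}
    added = {p: h for p, h in new_tree.items() if p not in old_tree}

    # Index the removed paths by hash so each added path is a single lookup.
    by_hash = {}
    for path, digest in sorted(removed.items()):
        by_hash.setdefault(digest, []).append(path)

    renames = []
    for path, digest in sorted(added.items()):
        candidates = by_hash.get(digest)
        if candidates:
            # Pair each removed path with at most one added path.
            renames.append((candidates.pop(0), path))
    return renames


# Made-up example trees: mm/util.c moves to mm/helpers.c with the same
# hash, while README changes content in place.
old = {"mm/slab.c": "a1", "mm/util.c": "b2", "README": "c3"}
new = {"mm/slab.c": "a1", "mm/helpers.c": "b2", "README": "c4"}
print(find_renames(old, new))
```

Only paths that vanish from one tree and appear in the other with an identical hash are reported, so a file that is moved and edited in the same step is not found by this check.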

So, the question then becomes, can an outside tool cheaply derive all
the information that darcs would need to perform its work? The renames
should be easy, as long as no content changed during the rename. As for
token replacement (and whitespace changes, etc.), that could be
discovered via domain-specific parsers (something specific per language,
for example). Linus tossed a link to one such tool (hmm, where was it.
Sheesh. You sure write a lot, dude :-).)

http://minnie.tuhs.org/Programs   (see Ctcompare)

...which should be viewed more as a proof-of-concept than a mergeable
code-set. It does show that diff's vocabulary is sadly lacking in
expressiveness, and improving that, I think, would be a useful area to
expend effort. 

Again, I may be off here, especially considering I've a backlog of a
couple hundred messages to read since the weekend. (You guys need to go
outside more often.)

Ray



Re: [darcs-devel] Darcs and git: plan of action

2005-04-18 Thread linux
 Hell no.
 
 The commit _does_ specify the patch uniquely and exactly, so I really 
 don't see the point. You can always get the patch by just doing a

   git diff $parent_tree $thistree

 so putting the patch in the comment is not an option.

Er... no.

One of darcs' big points is that it has at least two fundamentally
different *kinds* of patches.  One is the classic diff(1) style.

The other is: replace every instance of identifier `foo` with identifier `bar`.

Note that merging such a patch with another that adds a new instance
of foo has a quite different effect from a similar diff-style patch.
Even though both have identical effects on the tree to which they were
initially merged.

And darcs is specifically intended to support additional kinds of patches.
Again, all in order that the patch can work better when applied to
trees *other* than the one it was originally developed against.


Anyway, the point is that, in the darcs world, it is NOT possible to
reconstruct a patch from the before and after trees.


Re: [darcs-devel] Darcs and git: plan of action

2005-04-18 Thread Ray Lee
On Mon, 2005-04-18 at 21:04 +, [EMAIL PROTECTED] wrote:
 The other is: replace every instance of identifier `foo` with identifier `bar`.

That could be derived, however, by a particularly smart parser [1].
Alternately, that itself could be embedded in the comment for patches
sourced from darcs. Of course, that means patches from others are less
commutable than from other darcs users, but that's the price you'd pay
for relying on the user to explicitly note a token rename.

  [1] An example: http://minnie.tuhs.org/Programs/Ctcompare/index.html

As for darcs mv, that can be derived from the before/after pictures of
the trees.

 And darcs is specifically intended to support additional kinds of patches.

Anything missing out of what I listed above? (darcs has adddir and
addfile, IIRC, but those are trivially discovered via inspection of the
trees as well, I think.)

 Anyway, the point is that, in the darcs world, it is NOT possible to
 reconstruct a patch from the before and after trees.

Not yet, and maybe not ever, but I think we can certainly get closer to
discovering what the coder was thinking during a changeset.

Ray



Re: [darcs-devel] Darcs and git: plan of action

2005-04-18 Thread Kevin Smith
Ray Lee wrote:
 On Mon, 2005-04-18 at 21:04 +, [EMAIL PROTECTED] wrote:
 
The other is: replace every instance of identifier `foo` with identifier `bar`.
 
 
 That could be derived, however, by a particularly smart parser [1].

No, it can't. Seriously. A darcs replace patch is encoded as rules, not
effects, and it is impossible to derive the rules just by looking at the
results. Not difficult. Impossible. You could guess, but that's not good
enough for darcs to be able to reliably commute the patches later.

I am curious whether Linus's suggestion about including the
corresponding darcs patch id in the git commit comments would be good
enough.

 As for darcs mv, that can be derived from the before/after pictures of
 the trees.

Perhaps. If a file is moved and edited within the same commit, I'm not
sure that you can be certain whether it was done with a 'darcs mv' or
not. Requiring separate checkins for the rename and the subsequent
modify would make things easier on SCM's, but is impractical in real
life. Automated refactoring tools, for example, perform the
rename+modify as an atomic operation.
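To make the difficulty concrete, here is a small Python sketch of a rename combined with an edit. It assumes files are available as {path: text} mappings. The similarity heuristic (difflib.SequenceMatcher with an arbitrary 0.6 threshold) and the names digest and guess_renames are illustrative assumptions, not something darcs or git is stated to do in this thread. The result is only a guess, which is exactly the uncertainty described above.

```python
import hashlib
from difflib import SequenceMatcher


def digest(text):
    """Content hash of a file's text (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def guess_renames(old_files, new_files, threshold=0.6):
    """Guess (old_path, new_path, score) for files that vanished and reappeared.

    The threshold is an arbitrary assumption; a high score suggests a
    rename but cannot prove that a 'darcs mv' was ever recorded.
    """
    removed = {p: t for p, t in old_files.items() if p not in new_files}
    added = {p: t for p, t in new_files.items() if p not in old_files}
    guesses = []
    for new_path, new_text in sorted(added.items()):
        best = None
        for old_path, old_text in sorted(removed.items()):
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (old_path, new_path, score)
        if best:
            guesses.append(best)
    return guesses


# A class renamed together with its file, as a refactoring tool would do.
old_files = {"Foo.java": "class Foo {\n    int count;\n}\n"}
new_files = {"Bar.java": "class Bar {\n    int count;\n}\n"}

# The exact-hash comparison no longer recognises the rename.
print(digest(old_files["Foo.java"]) == digest(new_files["Bar.java"]))  # False
print(guess_renames(old_files, new_files))
```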

Now, git might not need to deal with any of this, because it only needs
to work with the kernel project. But darcs does have to deal with this
wide range of uses, as does just about any other SCM.

I'm *not* advocating cluttering up git with features that are not
directly needed for kernel development. I'm just trying to clarify the
facts so everyone can understand what darcs is trying to do.

Kevin


Re: [darcs-devel] Darcs and git: plan of action

2005-04-18 Thread Kevin Smith
Ray Lee wrote:
 On Mon, 2005-04-18 at 21:05 -0400, Kevin Smith wrote:
 
The other is: replace every instance of identifier `foo` with
identifier `bar`.

That could be derived, however, by a particularly smart parser [1].

No, it can't. Seriously. A darcs replace patch is encoded as rules, not
effects, and it is impossible to derive the rules just by looking at the
results. Not difficult. Impossible.
 
 
 Okay, either I'm a sight stupider than I thought, or I'm not
 communicating well. Same net effect either way, I 'spose.
 
 If I do a token replace in an editor (say one of those fancy new-fangled
 refactoring thangs, or good ol' vi), a token-level comparator can
 discover what I did. That link I sent is an example of one such beast.

The big feature of a darcs replace patch is that it works forward and
backward in time. Let me try to come up with an example that can help
explain it. Hopefully I'll get it right. Let's start with a file like
this that exists in a project for which both you and I have darcs repos:

cat
dog
fish

Now, you change it to:

cat dog
dog
fish

while I simultaneously do a replace of dog with plant, resulting in:

cat
plant
fish

We merge. The final result in both of our trees is:

cat plant
plant
fish

Notice that just by looking at my diffs, you can't tell that I used a
replace operation. I didn't just replace the instances of dog that
were in my file at that moment. I conceptually replaced all instances,
including ones that aren't there yet.
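The cat/dog/fish example can be replayed with a toy Python model. This is only a sketch under stated assumptions: replace_token treats tokens as C-style identifiers, merge_lines is a naive line-by-line three-way merge, and neither is how darcs actually commutes patches. It does show that the same edit gives different merge results depending on whether it is recorded as a rule or as a hunk.

```python
import re


def replace_token(text, old, new):
    """Replace every whole-token occurrence of `old` with `new`.

    Tokens are assumed to be C-style identifiers for this sketch.
    """
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(old) + r"(?![A-Za-z0-9_])"
    return re.sub(pattern, lambda match: new, text)


def merge_lines(base, a, b):
    """Naive three-way merge for files with the same number of lines."""
    out = []
    for o, x, y in zip(base.splitlines(), a.splitlines(), b.splitlines()):
        if x == o:
            out.append(y)   # only b changed this line, or nobody did
        elif y == o or x == y:
            out.append(x)   # only a changed it, or both made the same change
        else:
            raise ValueError("conflicting edits to the same line")
    return "\n".join(out) + "\n"


base = "cat\ndog\nfish\n"
yours = "cat dog\ndog\nfish\n"              # your edit to the first line
mine = replace_token(base, "dog", "plant")  # my replace, seen as a tree

# Recorded as a rule: the replace is re-applied to whatever you added.
merged_as_rule = replace_token(yours, "dog", "plant")

# Recorded as a plain hunk: only the line I actually touched changes.
merged_as_hunk = merge_lines(base, yours, mine)

print(merged_as_rule)
print(merged_as_hunk)
```

My tree is identical in both cases, so the before and after trees alone cannot say which of the two merge results was intended.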

Now, I should mention here that I personally dislike the replace
operation, and I think it is more dangerous than helpful. However, other
darcs users are quite happy with it, and it certainly is a creative and
powerful feature.

Other creative patch types have also been dreamed of. For example, a
powerful language-specific refactoring operation has been discussed as a
far-future possibility. That would be safe, and cool.

Automated refactoring tools, for example, perform the
rename+modify as an atomic operation.
 
 And that's harder, I agree. But unless I'm missing some nifty
 refactoring editor out there that integrates with darcs during the edit
 session, the user *still* has to tell the SCM about the rename manually.

Although there are no such nifty refactoring tools available today, they
will exist at some point. If they existed today, the world would be a
better place.

Even without tools, many shops have policies against checking in code
that won't compile. If you rename a java class, you must simultaneously
perform the rename and modify the class name inside. If you commit
between those steps, it's broken. [I do realize that the kernel doesn't
have java code, by the way.]

I should also mention that I currently believe that Linus is correct
that explicit rename tracking is not required for git. I have every hope
that his plan for handling the more general case of moved text will
take care of renames as a side effect. I don't know if that will be
sufficient to allow a two-way lossless gateway between git and darcs or
other systems that do track renames explicitly.

Kevin