On Sun, Nov 22, 2020 at 10:08:45AM +0100, Ben Franksen wrote:
> Am 21.11.20 um 20:39 schrieb James Cook:
> > I've been trying to put together a proof, and ran into a strange
> > example: what if AB and B' are both repositories, but A and B don't
> > commute? I.e. A can only be applied if B hasn't been applied yet, but B
> > doesn't depend on A.
> > 
> > My questions: Is this situation disallowed by some existing patch
> > theory formalization? Is there a good reason to rule it out? Is it
> > actually possible in Darcs? (I doubt the last one.)
> 
> TL;DR:
> 
> I don't believe that any existing text on patch theory has ever fully
> formalized this requirement, but I may be wrong; e.g. the Camp paper may
> have. (It's been quite a while since I last looked at the paper.)
> 
> There are very good reasons to rule this out, indeed it is essential to
> do so, otherwise there is no point in naming primitive patches.
> 
> It is possible in Darcs, but only if you manually craft patches and not
> with darcs commands, unless you are exceptionally unlucky ;-)
> 
> > As far as I can tell, the example is consistent with the patch theory
> > axioms in [0] and [1]: we can simply say that in our patch theory,
> > nothing ever commutes, and then the axioms don't have much to say.
> > 
> > The Camp theory pdf [2] makes a claim that seems to rule out the
> > example: in section 8.1, "Merge preparation", it's asserted that as a
> > first step toward merging two repos, we can move the common patches to
> > the beginning. In this case, that would imply B can be moved to the
> > beginning of the repo AB. However, I'm not sure where that's proved,
> > and I also can't see why the above situation is inconsistent with the
> > axioms in that write-up. (Admittedly, I haven't read it completely ---
> > if someone can point me to the right parts, I'd appreciate it.)
> 
> The property you need to rule this out is a global one: you must assume
> that each newly recorded patch has a universally unique name. The patch
> laws are all local properties, so they cannot give you that.
> 
> Darcs tries to ensure this by adding a random number (taken from system
> entropy i.e. cryptographically secure, as far as that is possible) to
> the patch name when a patch is recorded (or amended etc). But it is
> clear that this is no guarantee, since it cannot prevent someone from
> manually creating a patch that has the same identity as a completely
> unrelated patch somewhere else. If you pull such a patch, your repo
> invariants are broken and you may get crashes (or worse: inconsistent
> behavior).
> 
> This is a great, perhaps the greatest, weakness of patch theories a la
> Darcs. We have discussed this at great length during the past years on
> darcs-devel in varying contexts. It crops up again and again.

If we could easily commute a patch to its minimal context, would that
solve the problem? (E.g., the name of the patch is a hash of all the
names in its minimal context together with the content of the patch
when commuted to that minimal context.)

I remember you said earlier that there is no known way to efficiently
compute that minimal context. Also, even if it were known, commuting
the patch to that context could be expensive. Are those the reasons
that scheme doesn't work?
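
To make the naming scheme concrete, here is a minimal sketch in Python.
It assumes the minimal context and the commuted content are already
known (which is exactly the expensive part). The function patch_name,
the use of SHA-256, and the string encoding of a patch are all
illustrative assumptions, not anything Darcs does today.

```python
import hashlib

def patch_name(context_names, content):
    """Hash the names of the minimal context together with the content.

    Hypothetical helper: `content` stands for the patch as it reads once
    commuted to its minimal context.
    """
    h = hashlib.sha256()
    # Sort so the name does not depend on the order the context is listed in.
    for field in [*sorted(context_names), content]:
        data = field.encode("utf-8")
        # Length-prefix each field so different field boundaries cannot collide.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

# Names chain: b's name covers a's name, which in turn covers a's content.
a = patch_name([], "hunk 1: +hello")
b = patch_name([a], "hunk 2: -hello +goodbye")
```

With such names, the same content recorded in a different context gets
a different name, so a patch cannot claim an existing name unless it
has the same content and the same minimal context.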

Here's a sketch of an idea to try to overcome the computational
concern, inspired by pijul. I don't know if it works because the
details are lacking. (Also I wouldn't be surprised if something like
this has already been discussed; this doesn't feel very original.)

For simplicity, assume prim patches are all hunks and the repo is one
text file.

For every primitive patch in the repo, keep track of an interval of
lines "touched" by the patch. When a patch p is first applied, its
interval is the lines it added (and maybe one extra line before and
after). As hunks are added, p's interval is adjusted in some
appropriate way. If lines from the beginning or end of p's hunk are
deleted or replaced by a later patch, p's interval shrinks. Eventually,
p might be reduced to the empty interval, but we still keep track of
the interval even then --- we think of p as living "between" two lines
of the file. This is important because any later patch that touches
that region depends on p.

Updating every interval every time a patch is added could be expensive.
Maybe this could be mitigated by assigning unique identifiers to the
lines of the file, and recording the intervals in terms of those
identifiers.
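
A rough sketch of that mitigation in Python, under several assumptions
of mine: deleted lines are kept as dead "tombstones" rather than
removed (loosely in the spirit of pijul), an interval is recorded as
the IDs of the first and last lines a hunk adds or deletes, and the
optional extra line before and after is left out. The class TextFile
and its methods are hypothetical.

```python
from itertools import count

BEGIN, END = 0, 1  # sentinel IDs bracketing the file (hypothetical)

class TextFile:
    def __init__(self):
        self._fresh = count(2)  # source of unique line IDs
        # Each entry is [line_id, text, alive]; the sentinels are never alive.
        self.lines = [[BEGIN, "", False], [END, "", False]]
        self.intervals = {}  # patch name -> (first touched ID, last touched ID)

    def index(self, line_id):
        return next(i for i, e in enumerate(self.lines) if e[0] == line_id)

    def apply_hunk(self, patch, after_id, new_texts, delete_ids=()):
        """Mark delete_ids dead, insert new_texts after after_id."""
        if after_id == END or not (new_texts or delete_ids):
            raise ValueError("bad hunk")
        for entry in self.lines:
            if entry[0] in delete_ids:
                entry[2] = False  # tombstone: the entry stays in place
        pos = self.index(after_id) + 1
        added = [[next(self._fresh), text, True] for text in new_texts]
        self.lines[pos:pos] = added
        ids = [*delete_ids, *(e[0] for e in added)]
        touched = sorted(self.index(i) for i in ids)
        self.intervals[patch] = (self.lines[touched[0]][0],
                                 self.lines[touched[-1]][0])

    def text(self):
        return [text for _, text, alive in self.lines if alive]

    def span(self, patch):
        """Current list positions of a patch's recorded interval."""
        lo, hi = self.intervals[patch]
        return self.index(lo), self.index(hi)

f = TextFile()
f.apply_hunk("p", BEGIN, ["a", "b"])         # p adds lines with IDs 2 and 3
f.apply_hunk("q", BEGIN, ["x"])              # q inserts a line above p's
f.apply_hunk("r", 4, [], delete_ids=[2, 3])  # r deletes both of p's lines
```

Here p's recorded interval never has to be rewritten: q shifts p's
lines and r deletes them, yet the stored pair of IDs still locates the
region p occupies "between" the surviving lines.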

Could these "touched" intervals be used to compute a minimal context
for a new patch? The patch's name could be a hash of four things: that
minimal context; the lines added; the unique ID of the line before the
hunk is added; and metadata like the user-supplied description.
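
As a sketch of how that might look in Python: the version below treats
the minimal context as the set of patches whose touched interval
overlaps the new hunk, which is an assumption on my part and not
something shown to be correct here. Intervals are given as inclusive
line positions for brevity; minimal_context, hunk_name, the JSON
encoding and SHA-256 are all hypothetical choices.

```python
import hashlib
import json

def minimal_context(spans, lo, hi):
    """Names of patches whose touched span intersects positions lo..hi."""
    return sorted(name for name, (start, end) in spans.items()
                  if start <= hi and lo <= end)

def hunk_name(context, added_lines, before_id, description):
    """Hash the four ingredients through an unambiguous JSON encoding."""
    payload = json.dumps([sorted(context), list(added_lines),
                          before_id, description])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Touched spans of three earlier patches (inclusive line positions).
spans = {"p": (1, 3), "q": (5, 8), "r": (10, 12)}

# A new hunk touching positions 6..7 overlaps only q's span.
context = minimal_context(spans, 6, 7)
name = hunk_name(context, ["new line"], before_id=42, description="example")
```

Because each name in the context is itself a hash over that patch's
own context, listing only the directly overlapping patches would still
pin down the whole history the hunk depends on.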

-- 
James