Re: [darcs-users] Request for more details

Dalon Work Thu, 12 Nov 2020 04:22:26 -0800

Ah, I figured it out. The baseline when importing patches is the set of
patches common to the two repositories.


Dalon

On Wed, Nov 11, 2020, 12:09 PM Dalon Work <dww...@gmail.com> wrote:

> Mr. Franksen,
>
> Thank you for the detailed reply! I think I see better how this works now.
> Patch theory details the laws that a given implementation (primitive patch
> theory) must follow. Your current implementation has not been formally
> proven, but is checked with random testing. (The QuickCheck that you
> mention.) Is this correct?
>
> I can see that I have perhaps been a bit too cavalier with my use of
> "equality" as well, so I appreciate the more nuanced explanations. I don't
> know haskell, but I will try to figure out enough to work through the code
> that you've pointed me to. I also found (again) the explanation of how to
> do a merge in the "Understanding Darcs" Wikibook, so that combined with
> your answer helped me understand that part of it.
>
> As far as my scalability questions, I'm glad to know (as I suspected) that
> you don't have to start over to reconstruct the repository. My second
> question about "batches" wasn't phrased very well, and may be just
> ridiculous, so please let me know if it is. The idea is to create
> "mega-patches" that are composites of several patches. Would it be
> theoretically possible (i.e. correct) to operate on mega-patches, and only
> when that fails, break them down and operate on single patches? From your
> comments, it seems this kind of optimization may not be necessary.
>
> As a followup question about merging patches from different repositories,
> I assume that internally, the patches have been applied in some sequence.
> As the patches are modified or added, the patches are allowed to move
> around in that sequence, following the appropriate rules. When you pull
> patches from another repository, how do you determine where in the sequence
> to place the new patches? The implicit assumption in the commutation
> diagram is that you have a common base context to work from. So how do you
> find that common state? Or perhaps, you don't have to. Perhaps you just
> start applying the new patches and seeing if they work? I'm a little fuzzy
> on this point.
>
> Again, thank you for the explanation. It's pretty interesting!
>
> Dalon
>
>
> On Wed, Nov 11, 2020 at 6:10 AM Ben Franksen <ben.frank...@online.de>
> wrote:
>
>> Hi Dalon
>>
>> Am 10.11.20 um 14:51 schrieb Dalon Work:
>> > I recently discovered Darcs, and probably like lots of people, I'm very
>> > intrigued by the "Patch Theory" behind it. I've read everything I could
>> > find, and to be fair, much of it was over my head. I think I understand
>> the
>> > basic ideas, but I still have a lot of questions. I would appreciate it
>> if
>> > anyone could point me at some materials I missed or misunderstood, or
>> could
>> > explain in greater detail how it all works.
>>
>> Unfortunately there is no ultimate reference for patch theory that
>> covers every aspect and every variant implemented by darcs.
>>
>> > So, starting with a set of patches with a linear history: AB, and if you
>> > want to "undo" A, you have to reorder A & B so you can undo A.
>>
>> Correct. However, note that there are two ways to undo a patch A: you
>> can record a new patch that inverts all changes made by A. Or you can
>> just drop it from the end of the sequence. The first is 'darcs rollback
>> + darcs record', the second is 'darcs obliterate'.
>>
>> > Take AB,
>> > commute them to get B'A', and then post-multiply by the inverse of A':
>> >
>> > B'A'.inv(A) = B'.
>>
>> These sequences are not equal, though they have the same effect. In
>> general we consider two sequences of patches equal if they differ only
>> by commutation; writing A^ for the inverse of A and [] for the empty
>> sequence, AA^ and [] are not the same. In /some/ contexts we can treat
>> them as equivalent, for instance when applying them to a tree, but not
>> in general.
>>
>> > I understand the basic diagram that gives the equation 0 -> AB = B'A'
>> -> 1,
>> > where we start with context 0, and want to get to context 1. But I don't
>> > understand how we actually solve for B' and A'? It seems we have 2
>> > unknowns, but only 1 equation. This is the part I can't seem to find.
>>
>> The underlying theory of (so called) primitive patches must have come
>> already equipped with such a commute operation. How it is implemented
>> depends on exactly what the primitive patches are (e.g. their
>> representation) and on what the designer of the theory wants to achieve.
>> As an extreme example, a theory in which no two primitive patches can be
>> commuted is valid.
>>
>> > Is
>> > this a manual step? I can see how if the two patches have nothing to do
>> > with each other, then B' = B, and A' = A. But what if they do influence
>> > each other, but don't actually conflict?
>>
>> If you are able to read Haskell, you could look at
>>
>> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs
>> to find out how it is done in darcs. This is not covered in any text on
>> patch theory: these texts always assume a primitive patch theory as
>> given (subject to certain laws).
>>
>> > Then, in the case of a merge of two parallel patches, how do we know
>> what
>> > the "unique" merge result is?
>>
>> Uniqueness of (clean i.e. unconflicted) merge follows from uniqueness of
>> commute, given the usual patch laws, since the unconflicted merge is
>> defined as "invert one of the two patches, commute, then invert result
>> back".
>>
>> > (Assuming no conflicts). The equation is now
>> > 0 -> AU = BV -> 1, but now we don't even know what the ending context 1
>> > looks like! This to me looks like we have 3 unknowns, and only 1
>> equation.
>> > What am I missing here?
>>
>> Again, the key is to assume you already have a function commute that
>> takes a sequentional pair of patches and gives you back another
>> sequential pair.
>>
>> > My next question is about scalability. In a system with hundreds of
>> > thousands of patches, it seems like a scalability nightmare, as the
>> naive
>> > description of how the system works makes it seems that to reconstruct
>> the
>> > state of the repository would require building the working tree from the
>> > empty context every time.
>>
>> Correct. Without additional efforts, a "naked" patch theory would be
>> extremely slow!
>>
>> > There has to be a way to "snapshot" a set of
>> > patches so you have a new ground-zero to start from.
>>
>> And there is (in darcs). We call it the "pristine tree", which is very
>> much like a "tree object" in git (also stored in a hashed format,
>> similar to git). It corresponds to the current state of a repo i.e. the
>> sequence of recorded patches. You could, in principle, store
>> intermediate snapshots, too, e.g. for each tag, but this is not
>> currently done in darcs.
>>
>> Darcs has other "snapshot" like optimisations. For instance, the
>> sequence of patches in a repo is broken up into so called inventories,
>> which are sequences of patches terminated by a tag that depends on all
>> previous patches. This means that most operations only have to work with
>> the "current inventory" which contains (only) the patches since the
>> latest tag.
>>
>> > And then, importing
>> > patches from an outside repository, is it possible to "batch" them, and
>> > work on a sets of patches inside of comparing patches 1 at a time?
>> (ignore
>> > conflicts for now).
>>
>> Not sure I understand the question correctly. Ultimately you need to
>> merge patches one by one and also apply them one by one.
>>
>> > It seems to me
>> > that you would have to take all the incoming patches, and 1 at a time,
>> > modify the context for each one of them to apply them on top of the
>> > importing repository. It sounds painfully slow.
>>
>> Starting with the current state (the pristine tree) and applying patches
>> (or unapplying them) to it to get a new state is pretty much optimised
>> in darcs but dpenending on the number and size of patches can still take
>> some time. An extreme case is doing 'darcs check' on the current darcs
>> repo with about 13000 patches; this takes about 15 seconds on my laptop.
>>
>> But normal operations seldom consider more than a hand full of patches,
>> perhaps a few hundred in bad cases.
>>
>> > How do you determine which patches are dependent on which patches? I
>> know
>> > the user can manually specify dependencies, but there has to be some
>> > automated way to figure out the important ones, the ones that would
>> break
>> > things if not maintained correctly.
>>
>> Simple: B depends on A if A;B does not commute.
>>
>> > Naively thinking, would you have to try
>> > and commute the incoming patch with every other stored patch? Or can
>> you do
>> > that until you find a match, and attach it to the "dependency tree", or
>> are
>> > there encoded manual rules?
>>
>> There is no explicit representation of the dependency graph anywhere in
>> darcs. It wouldn't help, since patches can change representation when
>> commuted, so even if you new beforehand that A and B will successfully
>> commute that still doesn't give you the result of the commute.
>>
>> To understand how darcs merges patches it is necessary to consider patch
>> "names". The name of a patch is a hash of all its meta-data (name, log
>> message, timestamp, author, and some random noise added when you
>> record). This is what identifies a patch semantically. Darcs assumes
>> that patches with the same "name" really are commuted versions of the
>> same original patch. This is insecure, but allows to quickly compare
>> large sets of patches for equality.
>>
>> When we merge patches from another repo, we first find the set of all
>> patches common to both the local and the remote repo. What we actually
>> have to merge are the remaining sequences of patches i.e. those that are
>> only in our repo and those that are only on the remote one.
>>
>> Finding the common set of patches is further optimised using inventories
>> but I guess explaining that would lead us too far away.
>>
>> > What is the internal patch format for darcs? Like, could I implement a
>> > crude patch-based system using unix diff/patch utils?
>>
>> No, that is not possible. The basic patch operations MUST strictly
>> adhere to the laws of patch theory.
>>
>> > Or do you have to
>> > store extra metadata to get the mechanics to work correctly? So far, all
>> > I've found is the high-level "a patch is a transformation function".
>> Great.
>> > What does that function actually look like in real-life?
>>
>> As I mention above, this is not covered by the existing texts on patch
>> theory. The point is that this function must be reified as data!
>>
>> I will give you an example. In Darcs, the most important primitive patch
>> type is called "hunk". A hunk consists of the following data:
>>
>>  * file name
>>  * position (line number)
>>  * lines to remove
>>  * lines to add
>>
>> In short, we have
>>
>>   Hunk filename pos [ContentLine] [ContentLine]
>>
>> where ContentLine is how you represent a line of text.
>>
>> To commute two Hunks very roughly goes as follows:
>>
>> If the filenames differ, they commute trivially (i.e. without change).
>> If what the first patch adds overlaps with what the second one removes,
>> then they fail to commute. Otherwise, we can commute them by adapting
>> the line numbers. The details are a bit tricky (fiddling with line
>> numbers, what exactly means "overlap") but not hard to reproduce.
>>
>> For details, see
>>
>> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs#214
>>
>> The definition of the primitive patch data type is
>> here:
>> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Core.hs#40
>>
>> It is not difficult to define your own primitive patch theory, but
>> making sure it conforms to all the required patch laws is another
>> matter! We never formally proved that for darcs, but instead rely on
>> random property testing using QuickCheck.
>>
>> Cheers
>> Ben
>>
>> _______________________________________________
>> darcs-users mailing list
>> darcs-users@osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/darcs-users
>>
>

_______________________________________________
darcs-users mailing list
darcs-users@osuosl.org
https://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] Request for more details

Reply via email to