Ah, I figured it out. The baseline when importing patches is the set of patches common to the two repositories.
Dalon On Wed, Nov 11, 2020, 12:09 PM Dalon Work <dww...@gmail.com> wrote: > Mr. Franksen, > > Thank you for the detailed reply! I think I see better how this works now. > Patch theory details the laws that a given implementation (primitive patch > theory) must follow. Your current implementation has not been formally > proven, but is checked with random testing. (The QuickCheck that you > mention.) Is this correct? > > I can see that I have perhaps been a bit too cavalier with my use of > "equality" as well, so I appreciate the more nuanced explanations. I don't > know haskell, but I will try to figure out enough to work through the code > that you've pointed me to. I also found (again) the explanation of how to > do a merge in the "Understanding Darcs" Wikibook, so that combined with > your answer helped me understand that part of it. > > As far as my scalability questions, I'm glad to know (as I suspected) that > you don't have to start over to reconstruct the repository. My second > question about "batches" wasn't phrased very well, and may be just > ridiculous, so please let me know if it is. The idea is to create > "mega-patches" that are composites of several patches. Would it be > theoretically possible (i.e. correct) to operate on mega-patches, and only > when that fails, break them down and operate on single patches? From your > comments, it seems this kind of optimization may not be necessary. > > As a followup question about merging patches from different repositories, > I assume that internally, the patches have been applied in some sequence. > As the patches are modified or added, the patches are allowed to move > around in that sequence, following the appropriate rules. When you pull > patches from another repository, how do you determine where in the sequence > to place the new patches? The implicit assumption in the commutation > diagram is that you have a common base context to work from. So how do you > find that common state? Or perhaps, you don't have to. Perhaps you just > start applying the new patches and seeing if they work? I'm a little fuzzy > on this point. > > Again, thank you for the explanation. It's pretty interesting! > > Dalon > > > On Wed, Nov 11, 2020 at 6:10 AM Ben Franksen <ben.frank...@online.de> > wrote: > >> Hi Dalon >> >> Am 10.11.20 um 14:51 schrieb Dalon Work: >> > I recently discovered Darcs, and probably like lots of people, I'm very >> > intrigued by the "Patch Theory" behind it. I've read everything I could >> > find, and to be fair, much of it was over my head. I think I understand >> the >> > basic ideas, but I still have a lot of questions. I would appreciate it >> if >> > anyone could point me at some materials I missed or misunderstood, or >> could >> > explain in greater detail how it all works. >> >> Unfortunately there is no ultimate reference for patch theory that >> covers every aspect and every variant implemented by darcs. >> >> > So, starting with a set of patches with a linear history: AB, and if you >> > want to "undo" A, you have to reorder A & B so you can undo A. >> >> Correct. However, note that there are two ways to undo a patch A: you >> can record a new patch that inverts all changes made by A. Or you can >> just drop it from the end of the sequence. The first is 'darcs rollback >> + darcs record', the second is 'darcs obliterate'. >> >> > Take AB, >> > commute them to get B'A', and then post-multiply by the inverse of A': >> > >> > B'A'.inv(A) = B'. >> >> These sequences are not equal, though they have the same effect. In >> general we consider two sequences of patches equal if they differ only >> by commutation; writing A^ for the inverse of A and [] for the empty >> sequence, AA^ and [] are not the same. In /some/ contexts we can treat >> them as equivalent, for instance when applying them to a tree, but not >> in general. >> >> > I understand the basic diagram that gives the equation 0 -> AB = B'A' >> -> 1, >> > where we start with context 0, and want to get to context 1. But I don't >> > understand how we actually solve for B' and A'? It seems we have 2 >> > unknowns, but only 1 equation. This is the part I can't seem to find. >> >> The underlying theory of (so called) primitive patches must have come >> already equipped with such a commute operation. How it is implemented >> depends on exactly what the primitive patches are (e.g. their >> representation) and on what the designer of the theory wants to achieve. >> As an extreme example, a theory in which no two primitive patches can be >> commuted is valid. >> >> > Is >> > this a manual step? I can see how if the two patches have nothing to do >> > with each other, then B' = B, and A' = A. But what if they do influence >> > each other, but don't actually conflict? >> >> If you are able to read Haskell, you could look at >> >> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs >> to find out how it is done in darcs. This is not covered in any text on >> patch theory: these texts always assume a primitive patch theory as >> given (subject to certain laws). >> >> > Then, in the case of a merge of two parallel patches, how do we know >> what >> > the "unique" merge result is? >> >> Uniqueness of (clean i.e. unconflicted) merge follows from uniqueness of >> commute, given the usual patch laws, since the unconflicted merge is >> defined as "invert one of the two patches, commute, then invert result >> back". >> >> > (Assuming no conflicts). The equation is now >> > 0 -> AU = BV -> 1, but now we don't even know what the ending context 1 >> > looks like! This to me looks like we have 3 unknowns, and only 1 >> equation. >> > What am I missing here? >> >> Again, the key is to assume you already have a function commute that >> takes a sequentional pair of patches and gives you back another >> sequential pair. >> >> > My next question is about scalability. In a system with hundreds of >> > thousands of patches, it seems like a scalability nightmare, as the >> naive >> > description of how the system works makes it seems that to reconstruct >> the >> > state of the repository would require building the working tree from the >> > empty context every time. >> >> Correct. Without additional efforts, a "naked" patch theory would be >> extremely slow! >> >> > There has to be a way to "snapshot" a set of >> > patches so you have a new ground-zero to start from. >> >> And there is (in darcs). We call it the "pristine tree", which is very >> much like a "tree object" in git (also stored in a hashed format, >> similar to git). It corresponds to the current state of a repo i.e. the >> sequence of recorded patches. You could, in principle, store >> intermediate snapshots, too, e.g. for each tag, but this is not >> currently done in darcs. >> >> Darcs has other "snapshot" like optimisations. For instance, the >> sequence of patches in a repo is broken up into so called inventories, >> which are sequences of patches terminated by a tag that depends on all >> previous patches. This means that most operations only have to work with >> the "current inventory" which contains (only) the patches since the >> latest tag. >> >> > And then, importing >> > patches from an outside repository, is it possible to "batch" them, and >> > work on a sets of patches inside of comparing patches 1 at a time? >> (ignore >> > conflicts for now). >> >> Not sure I understand the question correctly. Ultimately you need to >> merge patches one by one and also apply them one by one. >> >> > It seems to me >> > that you would have to take all the incoming patches, and 1 at a time, >> > modify the context for each one of them to apply them on top of the >> > importing repository. It sounds painfully slow. >> >> Starting with the current state (the pristine tree) and applying patches >> (or unapplying them) to it to get a new state is pretty much optimised >> in darcs but dpenending on the number and size of patches can still take >> some time. An extreme case is doing 'darcs check' on the current darcs >> repo with about 13000 patches; this takes about 15 seconds on my laptop. >> >> But normal operations seldom consider more than a hand full of patches, >> perhaps a few hundred in bad cases. >> >> > How do you determine which patches are dependent on which patches? I >> know >> > the user can manually specify dependencies, but there has to be some >> > automated way to figure out the important ones, the ones that would >> break >> > things if not maintained correctly. >> >> Simple: B depends on A if A;B does not commute. >> >> > Naively thinking, would you have to try >> > and commute the incoming patch with every other stored patch? Or can >> you do >> > that until you find a match, and attach it to the "dependency tree", or >> are >> > there encoded manual rules? >> >> There is no explicit representation of the dependency graph anywhere in >> darcs. It wouldn't help, since patches can change representation when >> commuted, so even if you new beforehand that A and B will successfully >> commute that still doesn't give you the result of the commute. >> >> To understand how darcs merges patches it is necessary to consider patch >> "names". The name of a patch is a hash of all its meta-data (name, log >> message, timestamp, author, and some random noise added when you >> record). This is what identifies a patch semantically. Darcs assumes >> that patches with the same "name" really are commuted versions of the >> same original patch. This is insecure, but allows to quickly compare >> large sets of patches for equality. >> >> When we merge patches from another repo, we first find the set of all >> patches common to both the local and the remote repo. What we actually >> have to merge are the remaining sequences of patches i.e. those that are >> only in our repo and those that are only on the remote one. >> >> Finding the common set of patches is further optimised using inventories >> but I guess explaining that would lead us too far away. >> >> > What is the internal patch format for darcs? Like, could I implement a >> > crude patch-based system using unix diff/patch utils? >> >> No, that is not possible. The basic patch operations MUST strictly >> adhere to the laws of patch theory. >> >> > Or do you have to >> > store extra metadata to get the mechanics to work correctly? So far, all >> > I've found is the high-level "a patch is a transformation function". >> Great. >> > What does that function actually look like in real-life? >> >> As I mention above, this is not covered by the existing texts on patch >> theory. The point is that this function must be reified as data! >> >> I will give you an example. In Darcs, the most important primitive patch >> type is called "hunk". A hunk consists of the following data: >> >> * file name >> * position (line number) >> * lines to remove >> * lines to add >> >> In short, we have >> >> Hunk filename pos [ContentLine] [ContentLine] >> >> where ContentLine is how you represent a line of text. >> >> To commute two Hunks very roughly goes as follows: >> >> If the filenames differ, they commute trivially (i.e. without change). >> If what the first patch adds overlaps with what the second one removes, >> then they fail to commute. Otherwise, we can commute them by adapting >> the line numbers. The details are a bit tricky (fiddling with line >> numbers, what exactly means "overlap") but not hard to reproduce. >> >> For details, see >> >> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs#214 >> >> The definition of the primitive patch data type is >> here: >> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Core.hs#40 >> >> It is not difficult to define your own primitive patch theory, but >> making sure it conforms to all the required patch laws is another >> matter! We never formally proved that for darcs, but instead rely on >> random property testing using QuickCheck. >> >> Cheers >> Ben >> >> _______________________________________________ >> darcs-users mailing list >> darcs-users@osuosl.org >> https://lists.osuosl.org/mailman/listinfo/darcs-users >> >
_______________________________________________ darcs-users mailing list darcs-users@osuosl.org https://lists.osuosl.org/mailman/listinfo/darcs-users