AntC <anthony_clayden <at> clear.net.nz> writes: > ...
OK, here's a very speculative approach for how to deal with hunk changes in a context-independent way. (Compare my post that talks about the approach for file-ids and tracing the 'same' file through repos and moves: http://lists.osuosl.org/pipermail/darcs-users/2012-September/026682.html )

I'm aiming to support the same behaviour and validation as darcs hunk changes, not to introduce any of the semantic-oriented wishfulness that's been discussed in this thread.

The VCS gives each text line an 'internal' id that is:
- unique across all repos; and
- persistent wherever the line goes, or however its position in the file changes.

Since each line must have got into the repo only through a hunk change, the line-id is to be a pair (hunk-id, offset) where:
- hunk-id is the ppid of the hunk change, or a guid, or whatever suits
- offset is the relative position of the line in the text getting inserted

To spell out the consequences:
- if the line shuffles up/down the file because of inserts/deletes around it, that doesn't change the (hunk-id, offset)
- if some lines of the hunk later get deleted, that doesn't change the (hunk-id, offset) of the lines that remain. So there'll be a 'gap' in the offset sequence.
- if some lines get inserted in the middle of the hunk, that doesn't change the (hunk-id, offset) of the original lines. So there'll be an 'intruder' amongst the offset numbering.
- if some lines get moved around the file (or moved to a different file), that doesn't change the (hunk-id, offset) of the moved lines. So there'll be offset numbering out of sequence.

This implies there's a move-lines operation, for similar purposes as darcs' move-file operation. I think it also needs a copy-lines operation.
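To make the consequences above concrete, here is a minimal Python sketch. It is an illustration only, not anything darcs does: the names LineId and ids_for_hunk are made up, the hunk-ids "h1" and "h2" stand in for real ppids or guids, and counting offsets from 0 is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LineId:
    """Hypothetical line-id: the pair (hunk-id, offset)."""
    hunk_id: str  # stand-in for the hunk patch's ppid or a guid
    offset: int   # position of the line within the inserted text (0-based here)


def ids_for_hunk(hunk_id: str, n_lines: int) -> List[LineId]:
    """Assign ids to the n_lines lines that one hunk patch inserts."""
    return [LineId(hunk_id, i) for i in range(n_lines)]


# A file is modelled as an ordered list of line-ids.
file_lines = ids_for_hunk("h1", 4)        # h1:0 h1:1 h1:2 h1:3

# Delete one line of the hunk: the survivors keep their ids (a 'gap').
del file_lines[1]                         # h1:0 h1:2 h1:3

# Insert a later hunk in the middle: original ids unchanged ('intruders').
file_lines[1:1] = ids_for_hunk("h2", 2)   # h1:0 h2:0 h2:1 h1:2 h1:3

# Move the first line to the end: same id, offsets now out of sequence.
file_lines.append(file_lines.pop(0))      # h2:0 h2:1 h1:2 h1:3 h1:0
```

At no point does an edit rewrite an existing id; only the ordering of ids within the file changes.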
We need the VCS to maintain two 'internal' maps:
- (file-id, line-num) -> (hunk-id, offset)
- (hunk-id, offset) -> content

Notes:

(Yes, I know the file-to-line structure is horribly inefficient, and fails to capture that lines are in contiguous blocks, but for now I'm just trying to get the idea across.)

(The line-nums don't have to be integral, nor do they have to be sequential. Their only purpose is relative ordering of the lines. I've seen one cunning scheme where the line-nums were rationals, so that for line insertions, the line numbering interval could be arbitrarily sub-divided.)

(Replace token changes the content, but retains the (hunk-id, offset) and line position. I'm assuming that replace token can't merge or split lines.)

(A given (hunk-id, offset) could appear in more than one file, or even more than one position within the same file. That would be the result of a copy-lines.)

Then a hunk-change patch is represented as a triple of:
- (hunk-id, offset) position for the operation
- a list of (hunk-id, offset, content) to delete, and
- a list of (hunk-id, offset, content) to insert. (Where the to-insert hunk-id is this patch's ppid.)

(Either of those lists could be empty, meaning this is a delete-only or insert-only operation.)

Pre-condition for applying a hunk patch:
- all of the to-delete (hunk-id, offset)s must exist in the target repo;
- and their content must match the to-delete content;
- and they must appear contiguously and in the same relative sequence, within some file (not necessarily the file they came from in the source of the patch)
- none of the to-insert (hunk-id, offset)s may already exist in the target repo. (That would be a duplicate patch.)

Applying the patch is just like hunk change:
- delete the to-delete lines
- insert in their place the to-insert lines
- those might be different numbers of lines, so renumber the (file-id, line-num)s throughout.
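A rough Python sketch of the two maps and of the pre-conditions, for illustration only. Everything here is an assumption of the sketch rather than a worked-out design: the names Repo, PatchError and apply_hunk are invented; the first map is collapsed into an ordered list per file, so 'renumbering' happens implicitly; the position is taken to be the first to-delete line, or, for an insert-only patch, the line before which the new lines go; and content entries for deleted lines are simply left in the map.

```python
from typing import Dict, List, Tuple

LineId = Tuple[str, int]      # (hunk-id, offset)
Entry = Tuple[str, int, str]  # (hunk-id, offset, content)


class PatchError(Exception):
    """Raised when a pre-condition for applying a hunk patch fails."""


class Repo:
    def __init__(self) -> None:
        # file-id -> ordered line-ids; list order plays the role of line-num
        self.files: Dict[str, List[LineId]] = {}
        # (hunk-id, offset) -> content
        self.content: Dict[LineId, str] = {}


def apply_hunk(repo: Repo, position: LineId,
               to_delete: List[Entry], to_insert: List[Entry]) -> None:
    del_ids = [(h, o) for h, o, _ in to_delete]
    ins_ids = [(h, o) for h, o, _ in to_insert]

    # None of the to-insert ids may already exist (duplicate patch).
    if any(i in repo.content for i in ins_ids):
        raise PatchError("duplicate patch")
    # Every to-delete id must exist, with matching content.
    for h, o, text in to_delete:
        if repo.content.get((h, o)) != text:
            raise PatchError("to-delete line missing or content differs")
    # Assumption of this sketch: the position names the first to-delete line.
    if del_ids and del_ids[0] != position:
        raise PatchError("position does not match first to-delete line")

    # The to-delete ids must form one contiguous run, in order, in some file.
    for lines in repo.files.values():
        for i, line_id in enumerate(lines):
            if line_id == position and lines[i:i + len(del_ids)] == del_ids:
                lines[i:i + len(del_ids)] = ins_ids  # delete, insert in place
                for h, o, text in to_insert:
                    repo.content[(h, o)] = text
                return
    raise PatchError("to-delete lines not found as a contiguous run")


# Usage: replace line h1:1 ("b") by two fresh lines from hunk h2.
repo = Repo()
repo.files["f"] = [("h1", 0), ("h1", 1), ("h1", 2)]
repo.content = {("h1", 0): "a", ("h1", 1): "b", ("h1", 2): "c"}
apply_hunk(repo, ("h1", 1), [("h1", 1, "b")],
           [("h2", 0, "x"), ("h2", 1, "y")])
```

Note that the patch names no file and no line number: the target is found purely by looking up the line-ids, which is the context-independence being aimed at.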
One wrinkle: as described so far, there's no way to insert fresh lines 'after' existing lines.
- Especially, we can't insert fresh lines into an empty file,
- or into the end of a file.

But notice in this case that darcs' insertion point is last-line-num + 1. (That is, line number 1 for an empty file. This is a 'ghost': there is no line number 1.)

I'll use the same trick:
- every hunk map has an extra 'invisible' after-last line (where its content is <endofhunk> or somesuch)
- if a hunk operation deletes all lines from a hunk, that after-last line remains
- and every file map has an extra 'invisible' after-last line that points to that remnant.
- so a newly-added file automatically gets an (invisible) hunk; we might as well have its hunk-id the same as the file-id.

AntC
