(bcc: mpm, in the hope of getting pointers to any notes he has handy on this 
subject)
> On Dec 18, 2016, at 4:31 PM, Gregory Szorc <gregory.sz...@gmail.com> wrote:
> 
> Mercurial currently stores file copy/rename metadata as a "header" in filelog 
> revision data. Furthermore, there is some wonkiness with p1 and p2 in the 
> filelog when copies are at play (see _filecommit() in localrepo.py). This 
> metadata means copies/renames can be followed without expensive run-time 
> "similarity" detection, which is great, especially for large repositories.
> 
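
For readers unfamiliar with the "header" mentioned above, here is a minimal
sketch of how it can be split from the file content. It assumes the usual
filelog framing: a `\x01\n` marker, `key: value` lines (`copy` and `copyrev`
for copies), and a closing `\x01\n`. The function name `parse_filelog_meta` is
made up for illustration and is not Mercurial's own API.

```python
META_MARKER = b"\x01\n"


def parse_filelog_meta(data: bytes):
    """Split raw filelog revision data into (metadata dict, file content).

    Hypothetical helper for illustration only; it assumes the header is
    framed by two b"\\x01\\n" markers and holds b"key: value" lines.
    """
    if not data.startswith(META_MARKER):
        # No header at all: the whole blob is file content.
        return {}, data
    # Find the closing marker, searching past the opening one.
    end = data.index(META_MARKER, len(META_MARKER))
    header = data[len(META_MARKER):end]
    meta = {}
    for line in header.splitlines():
        key, value = line.split(b": ", 1)
        meta[key] = value
    return meta, data[end + len(META_MARKER):]


# Fabricated example blob: a file recorded as copied from old/name.txt.
raw = b"\x01\ncopy: old/name.txt\ncopyrev: " + b"a" * 40 + b"\n\x01\nhello\n"
meta, content = parse_filelog_meta(raw)
```

Because this header lives in the same bytes as the file content, it is hashed
together with that content, which is why a wrong or missing entry cannot be
fixed later without changing the node.
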
> However, people or automated processes don't always perform the necessary 
> invocations of `hg copy` or `hg rename` to record copy/rename metadata. And 
> historically there have been a number of bugs or feature deficiencies where 
> copy/rename metadata is lost or not recorded where it should have been. 
> Coupled with the design of having copy metadata in the filelog data (which is 
> part of the hash and the merkle tree contributing to the changeset node), 
> this means that if copy metadata isn't correct from the beginning, it is 
> wrong forever. That's a pretty painful constraint.
> 
> Copy/rename inaccuracy is a frequent complaint among Mozilla developers 
> doing lots of code archeology. In short, they can't trust the recorded 
> metadata, so they fall back to a Git conversion of the repo when they know 
> copies/renames are in play (Git performs copy/rename detection at operation 
> run-time).
> 
> I recall a very informal conversation with mpm at the 3.8 Sprint in March 
> about this topic and he seemed to express a desire to move copy/rename 
> detection/metadata out of filelogs. I vaguely recall him suggesting it be 
> computed at run-time and cached if performance dictates. I also recall him 
> saying that modern research in the area of copy detection has enabled 
> better solutions than "measure the percentage of identical lines."
> 
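
As a point of comparison, the "percentage of identical lines" baseline can be
sketched in a few lines with Python's standard `difflib`. This is only an
illustration of the naive approach, not Mercurial's or Git's actual detection
code; the function names and the 0.5 threshold are assumptions.

```python
import difflib


def line_similarity(old: str, new: str) -> float:
    """Return a 0.0-1.0 score based on matching lines in both texts."""
    a = old.splitlines()
    b = new.splitlines()
    if not a and not b:
        return 1.0  # two empty files are treated as identical
    return difflib.SequenceMatcher(None, a, b).ratio()


def guess_renames(removed, added, threshold=0.5):
    """Pair each added file with its most similar removed file.

    `removed` and `added` map path -> file text. Returns a list of
    (source, destination, score) for pairs scoring at least `threshold`.
    """
    matches = []
    for dst, new_text in added.items():
        best = max(
            ((line_similarity(old_text, new_text), src)
             for src, old_text in removed.items()),
            default=None,
        )
        if best is not None and best[0] >= threshold:
            matches.append((best[1], dst, best[0]))
    return matches


removed = {"old.py": "a\nb\nc\nd\n"}
added = {"new.py": "a\nb\nc\nx\n", "other.py": "z\n"}
renames = guess_renames(removed, added)
```

Every added file is compared against every removed file, so the cost grows
with the product of the two sets. That is the run-time expense that recorded
metadata avoids and that a cache would have to absorb.
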
> I was wondering if there have been any formal discussions or proposals on the 
> future of copy metadata. I am most interested in:
> 
> * Whether there are plans for (or even an extension implementation of) a 
> supplemental copy metadata "database." The goal would be to correct 
> deficiencies in the set-in-stone filelog-based metadata.
> * Whether there are plans to move copy metadata out of filelog revisions 
> completely. (This would make the filelogs simpler and more clearly separate 
> file content from metadata.)
> * If we're talking about new designs for copy/rename metadata, should 
> improvements to linkrev be discussed at the same time?

I’m also interested in this, because it’d be nice for remotefilelog et al. to 
be able to transmit copy information as a separate step on some setups.
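
To make the supplemental "database" idea from the first bullet concrete, here
is a purely hypothetical sketch: corrections live outside the filelog and are
consulted before the recorded metadata. Nothing here corresponds to an
existing Mercurial API or extension; the class, method names, and key layout
are assumptions.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]         # (path, file node hex), hypothetical layout
CopySource = Tuple[str, str]  # (source path, source file node hex)


class CopyOverlay:
    """Hypothetical overlay of corrections on top of recorded copy data."""

    def __init__(self, recorded: Dict[Key, CopySource]) -> None:
        # `recorded` stands in for what the filelog headers already say.
        self._recorded = recorded
        self._corrections: Dict[Key, Optional[CopySource]] = {}

    def correct(self, key: Key, source: Optional[CopySource]) -> None:
        """Record a fix; `None` means "this was not actually a copy"."""
        self._corrections[key] = source

    def copysource(self, key: Key) -> Optional[CopySource]:
        """Prefer a correction, else fall back to the recorded metadata."""
        if key in self._corrections:
            return self._corrections[key]
        return self._recorded.get(key)


overlay = CopyOverlay({("b.txt", "11" * 20): ("a.txt", "22" * 20)})
# A rename that was never recorded with `hg rename` gets added after the fact.
overlay.correct(("d.txt", "33" * 20), ("c.txt", "44" * 20))
```

Since the corrections are not part of any hash, they could be amended at any
time, and they could also be exchanged separately from file contents.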

> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
