On 07/03/2011 18:24, Jay Savage wrote:
I'm working on a project to track changes to text files over time. The goal is to build a data set that tags or tokenizes each word in the file with the version where it was introduced. Basically I want to create data that could drive something similar to this: http://hint.fm/projects/historyflow/ (just the data, not the dataviz). I haven't been able to find any information on doing this in Perl (or otherwise, for that matter). I suspect it's because I don't know the right search terms. Right now I'm looking at rolling my own solution using Algorithm::Diff::Callback and the word definitions from Text::WordDiff to tag hunks with revision information. I feel like this is probably a solved problem and I'm just looking in the wrong place, though... Any pointers would be appreciated.
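For readers following the thread, here is a minimal sketch of the word-tagging idea described above. It is not taken from the history flow paper or from any module's documentation. It assumes whitespace-separated words (much cruder than the word definitions in Text::WordDiff) and uses a hand-rolled longest-common-subsequence table in place of Algorithm::Diff so that it runs on core Perl alone. The names tokenize and tag_revision, and the sample revisions, are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into words on whitespace.
# (Assumption: a real project would want a richer definition of "word".)
sub tokenize { return split ' ', $_[0] }

# Hypothetical helper: $old is the previous revision as [word, revision]
# pairs, $new_words the words of the new revision, $rev its number.
# Words aligned by a longest common subsequence keep their old tag;
# every other word in the new revision is tagged with $rev.
sub tag_revision {
    my ($old, $new_words, $rev) = @_;
    my @a = map { $_->[0] } @$old;
    my @b = @$new_words;

    # $lcs[$i][$j] = LCS length of @a[$i..$#a] and @b[$j..$#b]
    my @lcs = map { [ (0) x (@b + 1) ] } 0 .. scalar(@a);
    for my $i (reverse 0 .. $#a) {
        for my $j (reverse 0 .. $#b) {
            if ($a[$i] eq $b[$j]) {
                $lcs[$i][$j] = $lcs[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
                $lcs[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    # Walk the table from the top-left corner to align the two sequences.
    my @tagged;
    my ($i, $j) = (0, 0);
    while ($j < @b) {
        if ($i < @a && $a[$i] eq $b[$j]) {
            push @tagged, [ $b[$j], $old->[$i][1] ];   # unchanged: keep old tag
            $i++;
            $j++;
        }
        elsif ($i < @a && $lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $i++;                                      # word removed in this revision
        }
        else {
            push @tagged, [ $b[$j], $rev ];            # word introduced in this revision
            $j++;
        }
    }
    return \@tagged;
}

# Made-up sample revisions, oldest first.
my @revisions = (
    "the quick fox",
    "the quick brown fox",
    "a quick brown fox jumps",
);

my $tagged = [];
for my $rev (1 .. @revisions) {
    $tagged = tag_revision($tagged, [ tokenize($revisions[$rev - 1]) ], $rev);
}

# prints: a/3 quick/1 brown/2 fox/1 jumps/3
print join(' ', map { "$_->[0]/$_->[1]" } @$tagged), "\n";
```

Each revision is diffed only against its immediate predecessor, so a word that is deleted and later restored is tagged as new. Whether that is the right behaviour depends on how the problem is defined, and moved text is likewise treated as a deletion plus an insertion.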
It is certainly not a 'solved problem', or the paper that you (indirectly) refer to would not have been written. What you are doing is non-trivial, but it is described in detail in the paper:

<http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf>

I see the purpose of this list as helping people code an algorithm in Perl, not trying to establish an algorithm in the first place. That said, it isn't uncommon for problems posted here to grab the imagination of the more experienced minds, and you may well get offers of help. But your job is first to describe the problem accurately, then to construct an algorithm that would solve it, and finally to code it up in your chosen programming language. If that language is Perl then this list should be able to help.

Since the referenced article works from the archived change log that Wikipedia maintains, I suggest that you investigate that log and aim for something comparable.

Cheers,

Rob