On 07/03/2011 18:24, Jay Savage wrote:

I'm working on a project to track changes to text files over
time. The goal is to build a data set that tags or tokenizes each
word in the file with the version where it was introduced. Basically I
want to create data that could drive something similar to this:
http://hint.fm/projects/historyflow/ (just the data, not the
dataviz).

I haven't been able to find any information on doing this in Perl
(or otherwise, for that matter). I suspect it's because I don't know
the right search terms.

Right now I'm looking at rolling my own solution using
Algorithm::Diff::Callback and the word definitions from
Text::WordDiff to tag hunks with revision information.

I feel like this is probably a solved problem and I'm just looking in
the wrong place, though...

Any pointers would be appreciated.

It is certainly not a 'solved problem', or the paper that you
(indirectly) refer to would not have been written. What you are doing is
non-trivial, but is described in detail in the paper

  <http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf>

I see the purpose of this list to be to help people code an algorithm in
Perl, not to try to establish an algorithm in the first place. That
said, it isn't uncommon for problems posted here to grab the imagination
of the more experienced minds, and you may well have offers of help here.

But your job is first to describe the problem accurately, then to
construct an algorithm that would solve it, and finally to code it up in
your chosen programming language. If that language is Perl then this
list should be able to help.

Since the referenced article works from the archived change log that
Wikipedia maintains, I suggest that you should investigate that log and
aim for something comparable.
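
As a rough starting point for the word-tagging step described in the
question, here is a minimal sketch. It is not the method from the paper.
It assumes whitespace-separated words (simpler than the word definitions
in Text::WordDiff) and uses a small hand-rolled longest-common-subsequence
routine in place of Algorithm::Diff, so that it runs with core Perl only.
The names lcs_pairs and tag_words are made up for illustration.

```perl
use strict;
use warnings;

# Return [old_index, new_index] pairs for one longest common
# subsequence of the two word lists (plain dynamic programming).
sub lcs_pairs {
    my ($old, $new) = @_;
    my ($n, $m) = (scalar @$old, scalar @$new);

    # $len[$i][$j] holds the LCS length of old[i..] and new[j..].
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($old->[$i] eq $new->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    # Walk the table from the top-left corner to recover the matches.
    my @pairs;
    my ($i, $j) = (0, 0);
    while ($i < $n && $j < $m) {
        if    ($old->[$i] eq $new->[$j])              { push @pairs, [ $i++, $j++ ] }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { $i++ }
        else                                          { $j++ }
    }
    return @pairs;
}

# Take revision texts oldest first; return [word, revision] pairs for
# the newest text, where revision is the 1-based number of the
# revision in which that word was introduced.
sub tag_words {
    my @revisions = @_;
    my @tagged;     # [word, revision] pairs for the previous revision
    my $rev = 0;
    for my $text (@revisions) {
        $rev++;
        my @words  = split ' ', $text;
        my @old    = map { $_->[0] } @tagged;
        my @origin = ($rev) x @words;    # default: new in this revision
        for my $pair (lcs_pairs(\@old, \@words)) {
            my ($i, $j) = @$pair;
            $origin[$j] = $tagged[$i][1];    # surviving word keeps its tag
        }
        @tagged = map { [ $words[$_], $origin[$_] ] } 0 .. $#words;
    }
    return @tagged;
}

my @tagged = tag_words(
    'the quick fox',
    'the quick brown fox',
    'a quick brown fox jumps',
);
print join(' ', map { "$_->[0]/$_->[1]" } @tagged), "\n";
# prints: a/3 quick/1 brown/2 fox/1 jumps/3
```

The table-based LCS needs time and memory proportional to the product of
the two word counts, so for files of any real size you would probably want
to swap lcs_pairs for a proper diff module. Words that are moved rather
than kept in place are treated as deleted and re-introduced, which may or
may not be what you want.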

Cheers,

Rob
