Experiments with quote anchoring performance and flexibility

Robert Knight Tue, 28 May 2019 06:10:34 -0700

Hello,

Hypothesis currently uses the dom-anchor-text-quote library for anchoring 
quotes in HTML and PDF documents. It has served us well for a long time, but we 
need better performance (especially when there are hundreds of annotations on a 
page and many are affected by content changes) and more flexibility 
(specifically, the ability to ignore or discount certain differences between 
the document content and the quotes when anchoring).


Given these needs, I’ve started experimenting with a new approximate string 
matching implementation (https://github.com/robertknight/approx-string-match-js) 
and a library that uses it for anchoring quote selectors 
(https://github.com/robertknight/anchor-quote). In testing, I’m putting a 
strong emphasis on “real world” cases: web pages and PDFs that have been 
annotated by Hypothesis users.

At the moment this work is still in the experimentation phase, but if it pans 
out well, I think both libraries could be a good fit for the Apache Annotator 
project. The string matching implementation is fairly solid and offers 
significant performance improvements over diff-match-patch (see the README of 
the anchor-quote project for some early benchmarks). The quote anchoring 
library is still in early development.
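
For readers unfamiliar with approximate matching, here is a minimal sketch of 
the general idea: find where a quote best fits in a document's text while 
tolerating a few edits. It uses a plain dynamic-programming search (Sellers' 
algorithm), chosen for clarity rather than speed. It is not the API or the 
algorithm of approx-string-match-js or anchor-quote; the names 
`findApproxMatch` and `ApproxMatch` are hypothetical.

```typescript
// Hypothetical illustration only, not the API of approx-string-match-js.
interface ApproxMatch {
  end: number;    // index in `text` just past the last matched character
  errors: number; // edit distance between `quote` and the matched substring
}

// Return the earliest best match of `quote` in `text` with at most
// `maxErrors` edits (insertions, deletions, substitutions), or null.
function findApproxMatch(
  text: string,
  quote: string,
  maxErrors: number
): ApproxMatch | null {
  // col[i] = minimum edit distance between the first i characters of `quote`
  // and any substring of `text` ending at the current text position.
  const col: number[] = [];
  for (let i = 0; i <= quote.length; i++) {
    col.push(i);
  }

  let best: ApproxMatch | null = null;
  for (let j = 0; j < text.length; j++) {
    // col[0] stays 0 because a match may start at any text position.
    let diag = col[0];
    for (let i = 1; i <= quote.length; i++) {
      const up = col[i]; // value from the previous text position
      const cost = quote[i - 1] === text[j] ? 0 : 1;
      col[i] = Math.min(diag + cost, up + 1, col[i - 1] + 1);
      diag = up;
    }
    const errors = col[quote.length];
    if (errors <= maxErrors && (best === null || errors < best.errors)) {
      best = { end: j + 1, errors };
    }
  }
  return best;
}

const exact = findApproxMatch("The quick brown fox", "quick brown", 2);
const fuzzy = findApproxMatch("The quick brwn fox", "quick brown", 2);
const none = findApproxMatch("hello world", "xyz", 1);
```

In this sketch, `exact` reports zero errors, `fuzzy` still finds the quote 
after the page text lost a character, and `none` is null because nothing is 
within the error budget. The cost is proportional to text length times quote 
length, which is the kind of overhead a faster implementation aims to reduce.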

Kind Regards,
Robert Knight
