Hello,

Hypothesis currently uses the dom-anchor-text-quote library for anchoring quotes in HTML and PDF documents. It has served us well for a long time, but we need better performance (especially when there are hundreds of annotations on a page and many are affected by content changes) and more flexibility (specifically, the ability to ignore or discount certain differences between the document content and the quotes when anchoring).

Given these needs, I’ve started experimenting with a new approximate string matching implementation (https://github.com/robertknight/approx-string-match-js) and a library that uses it for anchoring quote selectors (https://github.com/robertknight/anchor-quote). In testing, I’m putting a strong emphasis on “real world” cases: web pages and PDFs that have been annotated by Hypothesis users.

This is still in the experimentation phase, but if it pans out, I think both libraries could be a good fit for the Apache Annotator project. The string matching implementation is fairly solid and offers significant performance improvements over diff-match-patch (see the README of the anchor-quote project for some early benchmarks). The quote anchoring library is very much in early development.

Kind Regards,
Robert Knight