Great to see you here, Sasha!

On Wed, May 10, 2017 at 5:39 PM Sasha Goodman <[email protected]> wrote:
> P.S. This afternoon I streamlined the TextQuoteSelector and
> TextPositionSelector to work (in principle) consistently with Randall
> Leed's implementation that used NodeIterator and textContents.

Neat :).

I think my takeaway from the simple example thread, and something of which many of us were likely already well aware, is that there's a desire for a good highlighter implementation. A way to highlight text is often the first example people want to see.

While I hope to see experimentation with implementations that try to limit the impact on the DOM, I think <mark> or <span> wrapping of text nodes is still the easiest to understand. In this approach, the actual wrapping is easy. The difficult part is iteration.

Now, some quick background on node iteration. I chose to use NodeIterator rather than TreeWalker for my dom-seek library because it meant that the seek function could be stateless, support seeking forward and backward, and still be able to return the number of characters consumed by a seek. The need to know whether to include the current node's content in the seek count is met by NodeIterator's "pointerBeforeReferenceNode". Essentially, a NodeIterator stores a point before or after a node, rather than simply a current node.

However, using NodeIterator to traverse a Range is not great. Since NodeIterator's referenceNode is read-only (it has no settable currentNode), the best that can be done is to root the iterator at the commonAncestorContainer of the Range and check each node against the Range. Range has compareNode, comparePoint, and isPointInRange. I have no idea how expensive these are. Iterating all the nodes under the commonAncestorContainer doesn't feel great to begin with. TreeWalker might be more appropriate since its currentNode could be set to startContainer directly. TreeWalker also appears to have consistent platform support.

All of this is complicated by the Range being able to point to offsets within text nodes. For the purposes of highlighting with wrapper elements it's necessary to split the boundary nodes.
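
A rough sketch of that boundary split, under a few assumptions: `splitBoundaries` is a hypothetical name (not a function from dom-seek or any library mentioned here), only boundaries that fall strictly inside a text node are split, and the Range is reset explicitly afterwards rather than relying on how the browser adjusts live ranges. It uses only `Text.splitText`, `Range.setStart`, and `Range.setEnd`.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE

// Hypothetical helper: split the text nodes at the Range boundaries so that
// each boundary ends up at the edge of a text node.
function splitBoundaries(range) {
  // Capture the boundary points before mutating anything.
  let { startContainer, startOffset, endContainer, endOffset } = range;

  // Split the end first. If both boundaries share one text node, the
  // original node keeps [0, endOffset), so startOffset is still valid below.
  if (endContainer.nodeType === TEXT_NODE &&
      endOffset > 0 && endOffset < endContainer.length) {
    endContainer.splitText(endOffset);
  }

  if (startContainer.nodeType === TEXT_NODE &&
      startOffset > 0 && startOffset < startContainer.length) {
    // splitText returns the new node holding the text after the offset.
    const tail = startContainer.splitText(startOffset);
    if (endContainer === startContainer) {
      // The end boundary now lives in the new tail node.
      endContainer = tail;
      endOffset -= startOffset;
    }
    startContainer = tail;
    startOffset = 0;
  }

  // Reset the Range explicitly to the boundaries computed above.
  range.setStart(startContainer, startOffset);
  range.setEnd(endContainer, endOffset);
  return range;
}
```

After this runs, a boundary that began inside a text node sits at that node's edge, so each text node is either fully inside or fully outside the Range. Boundaries in element nodes are left untouched by this sketch.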

I think there are probably a number of libraries for range splitting, but I propose we write one under our repo.

We might also find that normalizing the endpoints of a Range in some fashion is a helpful prerequisite. There is a library I found that does this, but I found its algorithm terribly confusing. I put time into rewriting it without dependencies. Despite some initial excitement, the author never fully vetted and accepted my pull request: https://github.com/webmodules/range-normalize/pull/2

In conclusion, I think there'd be value in bringing some functional utilities into Apache Annotator for dealing with iteration, range splitting, and range normalization, with the goal of providing a very succinct and simple highlighter that looks like this:

```
for (const node of textNodes(range)) {
  const mark = document.createElement('mark');
  node.replaceWith(mark);
  mark.appendChild(node);
}
```

Some care needs to be taken that whatever iteration we use is not invalidated by the replacement of the text node with its wrapper.

The fact that a simple example like this is hard to produce is evidence of the underlying complexity described in the paragraphs above. When I see people wanting a simple highlighter, what I hear is that they actually need simple abstractions upon which to build a highlighter. The highlighter itself should be easy. Often, the highlighters that projects provide are not shipped standalone or don't do exactly what the author needs (use spans instead of marks, add a particular class, coalesce overlapping highlights or not, etc.). There is lots of room to do different things, but being able to simply get the nodes to be highlighted is the prerequisite task that contains most of the complexity.

That's all (and probably way too much) for now.

Finding all the tools for all these things is enough of a pain that I think we should have a comprehensive set of such utilities in Apache Annotator, even if that risks looking like a bit of NIH syndrome.
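
On the point about iteration being invalidated by replacement, one possible way around it is to finish iterating before touching the DOM. The following is only a sketch: `wrapTextNodes` is a hypothetical name, the `doc` parameter stands in for `document`, and `nodes` can be any iterable of text nodes (for example, whatever the proposed `textNodes(range)` ends up returning).

```javascript
// Hypothetical helper: wrap each text node in a new element.
// The iterable is drained into an array first, so a lazy iterator that
// walks the live DOM is finished before any node is moved.
function wrapTextNodes(nodes, doc, tagName = 'mark') {
  const snapshot = Array.from(nodes);
  const wrappers = [];
  for (const node of snapshot) {
    const wrapper = doc.createElement(tagName);
    node.replaceWith(wrapper);   // put the wrapper where the text node was
    wrapper.appendChild(node);   // then move the text node inside it
    wrappers.push(wrapper);
  }
  return wrappers;
}
```

In a browser this might be called as `wrapTextNodes(textNodes(range), document)`. Taking a snapshot trades some memory for not having to reason about how a given iterator reacts to mutation, which may or may not be the right trade for large ranges.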

Unless anyone objects, I think I'll aim to ship libraries for these:

- Node iteration (https://github.com/tilgovi/dom-node-iterator)
- Tree walking (might not need a library if support is good)
- Range splitting
- Range normalization (see my pull request, referenced above)
- Range iterating
- Text distance (https://github.com/tilgovi/dom-seek)

If anyone wants to start on any of the above, you're welcome to depend on libraries that are outside Apache Annotator.

In the case of libraries that I've written, there is value in bringing them into Apache Annotator because they are all written in ES6 but not packaged to be consumed as ES6. Bringing them inside our repo means better code deduplication by tree shaking in tools like rollup and webpack. They could be packaged as ES6 where they are, but if I'm going to spend time improving the packaging, I would rather just toss out the existing packaging and get the benefits of the monorepo, with all that build/test boilerplate done once for all of them.
