Exciting to see this conversation happening. ^_^

Randall, how feasible would it be to bring your libraries (even via 
copy/paste) into the Apache Annotator repo soon? I believe (according to 
GitHub) you're the author/owner of 90%+ of the code in them and, consequently, 
able to do that if you believe that's the right step.


Sasha, your classes modeled around the selector and a "builder" sound very 
similar to the hopes I wrote up in 
https://cwiki.apache.org/confluence/display/ANNO/Planning


I'd very much like to combine these efforts in some way.


Additionally--and the thing driving me personally at the moment--I have to 
present on Apache Annotator next Wednesday!

https://apachecon2017.sched.com/event/AbBW


Consequently, I'd very much love it if we (collectively) could build a demo 
together! There's plenty to talk about wrt annotation, community building, and 
the Web Annotation Data Model & Protocol, as well as why we (those of us that 
are here, at least) have chosen to start collaborating at the ASF.


At any rate, I plan to be coding on all the things leading up to Wednesday, so 
I'd be most grateful for any help, input, pointers, and code (hehe) that anyone 
wants to toss in ahead of my codez. Let's code together!


Thanks, all!

Benjamin

--

http://bigbluehat.com/

http://linkedin.com/in/benjaminyoung

________________________________
From: Randall Leeds <[email protected]>
Sent: Thursday, May 11, 2017 3:34:24 PM
To: [email protected]
Subject: DOM Iteration (was Re: Just a simple example?)

Great to see you here, Sasha!

On Wed, May 10, 2017 at 5:39 PM Sasha Goodman <[email protected]>
wrote:

>
> P.S. This afternoon I streamlined the TextQuoteSelector and
> TextPositionSelector to work (in principle) consistently with Randall
> Leeds's implementation that used NodeIterator and textContent.
>
>
Neat :).

I think my takeaway from the simple example thread, and something of which
many of us were likely already well aware, is that there's a desire for a
good highlighter implementation. A way to highlight text is often the first
example people want to see.

While I hope to see experimentation with implementations that try to limit
the impact on the DOM, I think <mark> or <span> wrapping of text nodes is
still the easiest to understand. In this approach, the actual wrapping is
easy. The difficult part is iteration.

Now, some quick background on node iteration.

I chose to use NodeIterator rather than TreeWalker for my dom-seek library
because it meant that the seek function could be stateless, support seeking
forward and backward, and still be able to return the number of characters
consumed by a seek. The desire to know whether to include the current
node's content in the seek count is fulfilled by NodeIterator's
"pointerBeforeReferenceNode". Essentially, a NodeIterator stores a point
before or after a node, rather than simply a current node.

However, using NodeIterator to traverse a Range is not really great. Since
its referenceNode is read-only, the best that can be done is to start with
the commonAncestorContainer of the Range. Range has comparePoint,
isPointInRange, and intersectsNode. I have no idea how expensive these are.
Iterating all the nodes under the commonAncestorContainer doesn't feel
great to begin with. TreeWalker might be more appropriate since its
currentNode could be set to startContainer directly. TreeWalker also
appears to have consistent platform support.
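
A rough sketch of a walk that starts at startContainer rather than at the
common ancestor. To keep it runnable outside a browser it is hand-rolled on
firstChild/nextSibling/parentNode (roughly what an unfiltered
TreeWalker.nextNode() does) instead of using TreeWalker itself. The names
textNodes and nextInDocumentOrder are hypothetical, and it assumes both
Range boundaries are text nodes:

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE

// Hypothetical helper: the node after `node` in document order, staying
// inside `root`. Roughly what TreeWalker.nextNode() does without a filter.
function nextInDocumentOrder(node, root) {
  if (node.firstChild) return node.firstChild;
  while (node && node !== root) {
    if (node.nextSibling) return node.nextSibling;
    node = node.parentNode;
  }
  return null;
}

// Hypothetical generator: yields the text nodes from range.startContainer
// through range.endContainer, inclusive.
// Assumption: both containers are text nodes (offsets are ignored here).
function* textNodes(range) {
  const root = range.commonAncestorContainer;
  const end = range.endContainer; // captured up front, before any mutation
  let node = range.startContainer;
  while (node) {
    if (node.nodeType === TEXT_NODE) yield node;
    if (node === end) return;
    node = nextInDocumentOrder(node, root);
  }
}
```

Nothing before startContainer is visited, which is the advantage over
iterating everything under commonAncestorContainer.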

All of this is complicated by the Range being able to point to offsets
within text nodes. For the purposes of highlighting with wrapper elements
it's necessary to split the boundary nodes. I think there are probably a
number of libraries for this, but I propose we write one under our repo.
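
A minimal sketch of what splitting the boundary nodes could look like,
using the standard Text.splitText(). The function name splitBoundaryText is
hypothetical. It takes the boundary points as plain arguments instead of a
live Range, and it assumes both containers are text nodes, that startOffset
is less than the start node's length, and that endOffset is greater than 0:

```javascript
// Hypothetical helper: split the text nodes at the two boundary points so
// that the selected text ends up in whole text nodes.
// Returns the first and last text nodes that lie fully inside the selection.
function splitBoundaryText(startNode, startOffset, endNode, endOffset) {
  // Split the end first. Splitting the start first would invalidate
  // endOffset whenever both boundaries fall in the same text node.
  if (endOffset < endNode.data.length) {
    endNode.splitText(endOffset);
  }
  // Text.splitText(offset) keeps [0, offset) in the original node and
  // returns a new following sibling that holds the rest.
  const first = startOffset > 0 ? startNode.splitText(startOffset) : startNode;
  // If both boundaries were in one node, the selected text is now `first`.
  const last = endNode === startNode ? first : endNode;
  return { first, last };
}
```

With a Range in hand this would be called as
splitBoundaryText(range.startContainer, range.startOffset,
range.endContainer, range.endOffset), reading all four values before the
first split.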

We might also find that normalizing the endpoints of a Range in some
fashion is a helpful prerequisite. There is a library I found that does
this, but I found its algorithm terribly confusing. I put time into
rewriting it without dependencies. Despite some initial excitement, the
author never fully vetted and accepted my pull request:
https://github.com/webmodules/range-normalize/pull/2

In conclusion, I think there'd be value in bringing some functional
utilities into Apache Annotator for dealing with iteration, range
splitting, and range normalization, with the goal of providing a very
succinct and simple highlighter that looks like this:

```
for (const node of textNodes(range)) {
  const mark = document.createElement('mark');
  node.replaceWith(mark);
  mark.appendChild(node);
}
```

Some care needs to be taken that whatever iteration we use is not
invalidated by the replacement of the text node with its wrapper.

The fact that a simple example like this is hard to produce is evidence of
the underlying complexity described in the above paragraphs. When I see
people wanting a simple highlighter what I hear is that they actually need
simple abstractions upon which to build a highlighter. The highlighter
itself should be easy. Often, highlighters that projects provide are not
shipped standalone or don't do exactly what the author needs (use spans
instead of marks, add a particular class, coalesce overlapping highlights
or not, etc). There is lots of room to do different things but being able
to simply get the nodes to be highlighted is the prerequisite task that
contains most of the complexity.

That's all (and probably way too much) for now. Finding all the tools for
all these things is enough of a pain that I think we should have a
comprehensive set of such utilities in Apache Annotator, even if that risks
looking like a bit of NIH syndrome.

Unless anyone objects, I think I'll aim to ship libraries for these:
- Node iteration (https://github.com/tilgovi/dom-node-iterator)
- Tree walking (might not need a library if support is good)
- Range splitting
- Range normalization (see my pull request reference, above)
- Range iterating
- Text distance (https://github.com/tilgovi/dom-seek)

If anyone wants to start on any of the above, you're welcome to depend on
libraries that are outside Apache Annotator. In the case of libraries that
I've written, there is value to bringing them into Apache Annotator because
they are all written in ES6 but not packaged to be consumed as ES6.
Bringing them inside our repo means better code deduplication by tree
shaking in tools like rollup and webpack. They could be packaged as ES6
where they are, but if I'm going to spend time improving the packaging I
would rather just toss out the packaging and get the benefits of the
monorepo having all that build/test boilerplate done once for all of them.
