Thanks, TB Dinesh, for finding the bug. I just fixed it, at least in my Firefox, so hopefully GitHub Pages has picked up the updated files... you might need to hit refresh or wait a few minutes. The issue was that Firefox has a stricter implementation of document.evaluate, where no default value is provided for the fifth argument. The code also seems to work in my Safari.
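If anyone else hits the same Firefox error, the safe pattern is to pass all five arguments to document.evaluate explicitly instead of relying on a default. Here is a minimal sketch. The helper name `firstNodeByXPath` is made up for illustration and is not the demo's actual code. It also assumes you want a single node in document order.

```javascript
// XPathResult.FIRST_ORDERED_NODE_TYPE is 9; use the literal outside a browser.
const FIRST_ORDERED_NODE_TYPE =
  typeof XPathResult !== 'undefined' ? XPathResult.FIRST_ORDERED_NODE_TYPE : 9;

// Hypothetical helper: return the first node matching `xpath`, or null.
// Every argument to evaluate() is passed explicitly, including the fifth.
function firstNodeByXPath(doc, xpath, contextNode = doc) {
  const result = doc.evaluate(
    xpath,                   // the XPath expression
    contextNode,             // node the expression is evaluated against
    null,                    // namespace resolver (not needed for plain HTML)
    FIRST_ORDERED_NODE_TYPE, // ask for a single node, in document order
    null                     // existing XPathResult to reuse: none
  );
  return result.singleNodeValue;
}
```

In a page this would be called as `firstNodeByXPath(document, '/html/body/p[2]')`.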
On Tue, May 16, 2017 at 10:46 AM TB Dinesh <[email protected]> wrote:

> Sasha, thanks.
> FYI: the demo works on Chrome, but not on Firefox.
>
> On Tue, May 16, 2017 at 12:49 AM, Sasha Goodman <[email protected]> wrote:
>
> > Here is a demo of simple annotation, thanks to Benjamin:
> >
> > https://predict-r.github.io/annotation-model/
> >
> >
> > On Fri, May 12, 2017 at 12:19 PM Sasha Goodman <[email protected]> wrote:
> >
> >> I would be delighted if my efforts were useful in this project!!! Regarding that code, if any parts are used, it would make my week. The class structure is sort of self-documented by the standard, and, combined with builders, the classes can accommodate a variety of motives.
> >>
> >> Highlighting is the most common motive now (correct me if I'm wrong). My gut feeling is that, to get the support and time of hard-core annotators, the code needs to accommodate the idiosyncrasies of highlighting first. For example, if there are thousands of highlights on a page, an annotation builder might iterate/walk the document just once and fill in all of those highlights in a single pass. Also, a highlighting app would probably need to modify the source document by inserting spans and such.
> >>
> >> If Randall needs familiar code for node iteration, tree walking, range splitting, string similarity, and normalization, that's cool! Custom code, *especially* polyfill-type implementations, could smooth over browser idiosyncrasies. Also, I saw a jsPerf.com microbenchmark that put custom walkers on par with the native browser-based ones.
> >>
> >> On a personal note, I do archival work and did not initially see the value in modifying the source document by inserting spans (however, a highlighting app would need that). The main reason I'm excited about annotation is its value for labeling data for text analysis and machine learning.
> >> A lot of the advances in machine learning come from large bodies of data that have been tagged. The most common examples are images with regions that have been selected and then labeled, but annotation could also help turn semi-structured text into more structured data (e.g. for labeling parts of government documents). For archival work on mostly static documents, there does not seem to be a need to modify the source document. On the other hand, for dynamically changing documents, inserting spans with unique IDs seems appropriate because it's more robust to document changes. Yet it is also vulnerable to turf battles with other extensions and the page's own JavaScript, so I hope it is an optional feature of the Apache library rather than a requirement.
> >>
> >>
> >> On Thu, May 11, 2017 at 1:43 PM Benjamin Young <[email protected]> wrote:
> >>
> >>> Exciting to see this conversation happening. ^_^
> >>>
> >>>
> >>> Randall, how feasible would it be to bring your libraries (soon, even via copy/paste) into the Apache Annotator repo? I believe (according to GitHub) you're the author/owner of 90%+ of the code in them, and consequently able to do that if you believe it's the right step.
> >>>
> >>>
> >>> Sasha, your classes modeled around the selector and a "builder" sound very similar to the hopes I wrote up in https://cwiki.apache.org/confluence/display/ANNO/Planning
> >>>
> >>>
> >>> I'd very much like to combine these efforts in some way.
> >>>
> >>>
> >>> Additionally (and this is what's driving me personally at the moment), I have to present on Apache Annotator next Wednesday!
> >>>
> >>> https://apachecon2017.sched.com/event/AbBW
> >>>
> >>>
> >>> Consequently, I'd very much love it if we (collectively) could build a demo together!
> >>> There's plenty to talk about with respect to annotation, community building, and the Web Annotation Data Model & Protocol, as well as why we (those of us who are here, at least) have chosen to start collaborating at the ASF.
> >>>
> >>>
> >>> At any rate, I plan to be coding on all the things leading up to Wednesday, so I'd be most grateful for any help, input, pointers, and code (hehe) that anyone wants to toss in ahead of my codez. Let's code together!
> >>>
> >>>
> >>> Thanks, all!
> >>>
> >>> Benjamin
> >>>
> >>> --
> >>>
> >>> http://bigbluehat.com/
> >>>
> >>> http://linkedin.com/in/benjaminyoung
> >>>
> >>> ________________________________
> >>> From: Randall Leeds <[email protected]>
> >>> Sent: Thursday, May 11, 2017 3:34:24 PM
> >>> To: [email protected]
> >>> Subject: DOM Iteration (was Re: Just a simple example?)
> >>>
> >>> Great to see you here, Sasha!
> >>>
> >>> On Wed, May 10, 2017 at 5:39 PM Sasha Goodman <[email protected]> wrote:
> >>>
> >>> >
> >>> > P.S. This afternoon I streamlined the TextQuoteSelector and TextPositionSelector to work (in principle) consistently with Randall Leeds's implementation that used NodeIterator and textContent.
> >>> >
> >>> >
> >>> Neat :).
> >>>
> >>> I think my takeaway from the simple example thread, and something of which many of us were likely already well aware, is that there's a desire for a good highlighter implementation. A way to highlight text is often the first example people want to see.
> >>>
> >>> While I hope to see experimentation with implementations that try to limit the impact on the DOM, I think <mark> or <span> wrapping of text nodes is still the easiest to understand. In this approach, the actual wrapping is easy. The difficult part is iteration.
> >>>
> >>> Now, some quick background on node iteration.
> >>>
> >>> I chose to use NodeIterator rather than TreeWalker for my dom-seek library because it meant that the seek function could be stateless, support seeking forward and backward, and still be able to return the number of characters consumed by a seek. The need to know whether to include the current node's content in the seek count is met by NodeIterator's "pointerBeforeReferenceNode". Essentially, a NodeIterator stores a point before or after a node, rather than simply a current node.
> >>>
> >>> However, using NodeIterator to traverse a Range is not really great. Since its position (its referenceNode) is read-only, the best that can be done is to start with the commonAncestorContainer of the Range. Range has compareNode, comparePoint, and isPointInRange; I have no idea how expensive these are. Iterating all the nodes under the commonAncestorContainer doesn't feel great to begin with. TreeWalker might be more appropriate, since its currentNode can be set to startContainer directly. TreeWalker also appears to have consistent platform support.
> >>>
> >>> All of this is complicated by the Range being able to point to offsets within text nodes. For the purposes of highlighting with wrapper elements, it's necessary to split the boundary nodes. I think there are probably a number of libraries for this, but I propose we write one under our repo.
> >>>
> >>> We might also find that normalizing the endpoints of a Range in some fashion is a helpful prerequisite. I found a library that does this, but its algorithm was terribly confusing, so I put time into rewriting it without dependencies.
> >>> Despite some initial excitement, the author never fully vetted and accepted my pull request: https://github.com/webmodules/range-normalize/pull/2
> >>>
> >>> In conclusion, I think there'd be value in bringing some functional utilities into Apache Annotator for dealing with iteration, range splitting, and range normalization, with the goal of providing a very succinct and simple highlighter that looks like this:
> >>>
> >>> ```
> >>> for (const node of textNodes(range)) {
> >>>   const mark = document.createElement('mark');
> >>>   node.replaceWith(mark);
> >>>   mark.appendChild(node);
> >>> }
> >>> ```
> >>>
> >>> Some care needs to be taken that whatever iteration we use is not invalidated by the replacement of the text node with its wrapper.
> >>>
> >>> The fact that a simple example like this is hard to produce is evidence of the underlying complexity described in the above paragraphs. When I see people wanting a simple highlighter, what I hear is that they actually need simple abstractions upon which to build a highlighter. The highlighter itself should be easy. Often, the highlighters that projects provide are not shipped standalone or don't do exactly what the author needs (use spans instead of marks, add a particular class, coalesce overlapping highlights or not, etc.). There is lots of room to do different things, but being able to simply get the nodes to be highlighted is the prerequisite task that contains most of the complexity.
> >>>
> >>> That's all (and probably way too much) for now. Finding all the tools for all these things is enough of a pain that I think we should have a comprehensive set of such utilities in Apache Annotator, even if that risks looking like a bit of NIH syndrome.
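This is a minimal sketch of the splitting step and the invalidation concern. It rests on two assumptions. First, the text nodes under the Range have already been collected into a plain array, for example with a TreeWalker. Second, the selection is given as character offsets into their concatenated text, as a TextPositionSelector would give it. Both function names are made up for illustration. This is not the API of any of the libraries mentioned in this thread.

```javascript
// Hypothetical helper: given the lengths of consecutive text nodes and a
// character range [start, end), return the part of each overlapping node
// that lies inside the range, as node-local offsets.
function rangeSlices(lengths, start, end) {
  if (start > end) throw new RangeError('start must not be greater than end');
  const slices = [];
  let nodeStart = 0;
  for (let index = 0; index < lengths.length; index++) {
    const nodeEnd = nodeStart + lengths[index];
    const from = Math.max(start, nodeStart);
    const to = Math.min(end, nodeEnd);
    if (from < to) {
      slices.push({ index, from: from - nodeStart, to: to - nodeStart });
    }
    nodeStart = nodeEnd;
  }
  return slices;
}

// Browser-only, hypothetical: wrap each slice in a <mark>. `nodes` is an
// array of Text nodes collected *before* any mutation, so wrapping cannot
// disturb a live iterator.
function highlightSlices(nodes, start, end) {
  const slices = rangeSlices(nodes.map((node) => node.data.length), start, end);
  for (const { index, from, to } of slices) {
    let node = nodes[index];
    if (to < node.data.length) node.splitText(to); // split off the unhighlighted tail
    if (from > 0) node = node.splitText(from);     // keep only [from, to)
    const mark = node.ownerDocument.createElement('mark');
    node.replaceWith(mark);
    mark.appendChild(node);
  }
}
```

For example, `rangeSlices([7, 5, 1], 5, 10)` gives `[{ index: 0, from: 5, to: 7 }, { index: 1, from: 0, to: 3 }]`. The slices are computed up front, and each split or wrap touches only the node it was computed for. As a result, the loop never depends on a NodeIterator or TreeWalker whose position could be upset by the replacement.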
> >>>
> >>> Unless anyone objects, I think I'll aim to ship libraries for these:
> >>> - Node iteration (https://github.com/tilgovi/dom-node-iterator)
> >>> - Tree walking (might not need a library if support is good)
> >>> - Range splitting
> >>> - Range normalization (see my pull request, referenced above)
> >>> - Range iteration
> >>> - Text distance (https://github.com/tilgovi/dom-seek)
> >>>
> >>> If anyone wants to start on any of the above, you're welcome to depend on libraries that are outside Apache Annotator. In the case of libraries that I've written, there is value in bringing them into Apache Annotator because they are all written in ES6 but not packaged to be consumed as ES6. Bringing them inside our repo means better code deduplication through tree shaking in tools like Rollup and webpack. They could be packaged as ES6 where they are, but if I'm going to spend time improving the packaging, I would rather just toss out the packaging and get the benefits of the monorepo having all that build/test boilerplate done once for all of them.
> >>>
> >>
> >
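This relates to the "Text distance" item and the pointerBeforeReferenceNode point. The core question a character-offset seek has to answer can be sketched without the DOM. Below, the text nodes are modeled as a plain array of strings. That is an assumption for illustration: dom-seek itself works on a real NodeIterator and has a different API. An offset that lands exactly on a node boundary is ambiguous. The hypothetical `affinity` parameter resolves it, playing roughly the role that the before/after pointer plays in NodeIterator.

```javascript
// Hypothetical helper: map a global character offset to a text node index
// and an offset within that node. `texts` stands in for the text nodes'
// data, in document order.
// At a boundary, 'start' affinity picks the beginning of the following node,
// and 'end' affinity picks the end of the preceding one.
function locateOffset(texts, offset, affinity = 'start') {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError(`invalid offset: ${offset}`);
  }
  let consumed = 0; // characters in all nodes before the current one
  for (let index = 0; index < texts.length; index++) {
    const end = consumed + texts[index].length;
    const isLast = index === texts.length - 1;
    const inside =
      affinity === 'end'
        ? offset <= end
        : offset < end || (isLast && offset === end);
    if (inside) return { index, offset: offset - consumed };
    consumed = end;
  }
  throw new RangeError(`offset ${offset} is past the end of the text`);
}
```

With `['Hello, ', 'world', '!']`, offset 7 maps to `{ index: 1, offset: 0 }` by default and to `{ index: 0, offset: 7 }` with `'end'` affinity. Which answer is right depends on whether the offset is the start or the end of a selection.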
