Re: Integrating Annotator into Memex

Gerben Thu, 23 Apr 2020 10:32:09 -0700

Hi Oliver, good to hear how things are going!

My two cents on the first two questions: I suppose Annotator as it is
today is ready to play with, but not to use without contributing to it.
Making it fit use cases like yours is exactly the goal, though, so this
is a great target to work towards. The main focus is on
TextQuoteSelectors in HTML documents, which seems to fit the scenario.
As for how to help, we may need to draw up more specific tasks and
challenges to lower the barrier to doing things independently; but
until then (and to make that happen!), we’d be glad if people willing
to get involved pop into a weekly call or IRC (#annotator on freenode),
write to this list, or open issues about things that block them.
Creating test cases would also be really helpful for finding out what
needs fixing. My guess is that with a couple of person-days (or rather
person-weeks, to do things properly) we could get it to a state where
it does the tricks you need for the majority of cases.


I especially like the described scenario with two types of views, as it
creates an interoperability challenge within a single project. As
Benjamin indicates, if the viewer only changes the HTML but not the text
content, perhaps exact text quote matches would already work.
Implementing fuzzy anchoring would help overcome slight mismatches (and
is something we’d like to provide anyway), or perhaps just some improved
whitespace normalisation would suffice. This use case may be a good
experiment to discover the limitations and possibilities of the
anchoring algorithm.
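
To make the whitespace idea concrete, here is a minimal sketch in
TypeScript. It is not Annotator's actual code: it works on plain strings
rather than the DOM, and the names normaliseWhitespace and findQuote are
made up for illustration. Only the exact/prefix/suffix fields come from
the Web Annotation TextQuoteSelector.

```typescript
// Field names follow the W3C Web Annotation TextQuoteSelector;
// the function names below are hypothetical, not Annotator's API.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

// Collapse each run of whitespace into one space and trim both ends.
function normaliseWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Return the offset of the quote within the *normalised* text, or -1.
// Prefix and suffix, when given, must directly surround the match.
function findQuote(text: string, selector: TextQuoteSelector): number {
  const haystack = normaliseWhitespace(text);
  const exact = normaliseWhitespace(selector.exact);
  const prefix = normaliseWhitespace(selector.prefix || "");
  const suffix = normaliseWhitespace(selector.suffix || "");
  if (exact === "") return -1;

  let index = haystack.indexOf(exact);
  while (index !== -1) {
    // Trim so that a space between context and quote does not matter.
    const before = haystack.slice(0, index).trim();
    const after = haystack.slice(index + exact.length).trim();
    const prefixStart = Math.max(0, before.length - prefix.length);
    const prefixOk = before.slice(prefixStart) === prefix;
    const suffixOk = after.slice(0, suffix.length) === suffix;
    if (prefixOk && suffixOk) return index;
    index = haystack.indexOf(exact, index + 1);
  }
  return -1;
}

// The same sentence as it might come out of a full page and a reader view.
const selector: TextQuoteSelector = {
  type: "TextQuoteSelector",
  exact: "both views",
  prefix: "re-anchor on ",
  suffix: ".",
};
const fullPageText =
  "Annotations   should\n\tre-anchor on   both views. Both views matter.";
const readerText =
  "Annotations should re-anchor on both views. Both views matter.";

const offsetInFullPage = findQuote(fullPageText, selector);
const offsetInReaderView = findQuote(readerText, selector);
```

Both calls give the same offset, since the two strings differ only in
whitespace. Mapping that offset back to DOM ranges, and tolerating
differences beyond whitespace, is what this sketch leaves out.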

By the way, could you perhaps clarify what you mean by the hypothes.is
annotation library?

— Gerben

On 23/04/2020 16:02, Benjamin Young wrote:
> Thanks for posting that here, Oliver!
>
> We'll probably have to take those questions one at a time. 🙂 I'd actually 
> like to start with the last one--as it spells out some target APIs and code.
>
>> 3. What do you think will be the challenges to get Apache Annotator to
>> work on both a reader and a full-html version?
> If I understand the use case correctly, you're wanting annotation on 
> something like Firefox's "reader view" (where the original HTML is stripped 
> away, and only the content remains) and wanting those same annotations to be 
> re-anchor-able on the original HTML (and vice versa).
>
> If that's indeed what you're after, then the "hard" part is making sure we 
> have a way for implementations to "opt-in" to fuzzy anchoring when they both 
> create and use an annotation.
>
> For starters, you could simply store the TextQuoteSelector which *should* 
> re-anchor on both those representations (and possibly even on a PDF), but it 
> would come at the cost of performance on large documents. So, what you'd want 
> to follow that up with is additional, narrower, more brittle selectors which 
> would (knowingly) fail when you switch representations, but would give you 
> better performance on a specific representation--i.e. you'd have an XPath or 
> CSS selector for the original HTML which would fail on the "reader" and/or 
> "PDF" view at which point you'd (knowingly) fall back to the 
> TextQuoteSelector.
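
A rough sketch of that fallback order, under heavy assumptions: the
selector shapes loosely follow the Web Annotation Data Model, while
ToyView, resolveSelector and anchorWithFallback are hypothetical
stand-ins (a real implementation would query the DOM, and would narrow
a CSS match down to the quoted text).

```typescript
// Selector shapes loosely follow the W3C Web Annotation Data Model.
type Selector =
  | { type: "CssSelector"; value: string }
  | { type: "TextQuoteSelector"; exact: string };

// Hypothetical stand-in for a rendered view: its plain text, plus the
// text of the elements that a CSS selector can reach in that view.
interface ToyView {
  text: string;
  elements: { [cssSelector: string]: string };
}

interface AnchorMatch {
  via: Selector["type"]; // which selector produced the match
  text: string;
}

// Try one selector against one view; null means "failed, try the next".
function resolveSelector(view: ToyView, selector: Selector): AnchorMatch | null {
  if (selector.type === "CssSelector") {
    const text: string | undefined = view.elements[selector.value];
    return text === undefined ? null : { via: "CssSelector", text };
  }
  // TextQuoteSelector: slower on large documents, but representation-agnostic.
  return view.text.indexOf(selector.exact) === -1
    ? null
    : { via: "TextQuoteSelector", text: selector.exact };
}

// Walk the selectors in the order given: brittle but cheap ones first,
// the TextQuoteSelector last as the knowingly slower fallback.
function anchorWithFallback(view: ToyView, selectors: Selector[]): AnchorMatch | null {
  for (const selector of selectors) {
    const match = resolveSelector(view, selector);
    if (match !== null) return match;
  }
  return null;
}

const selectors: Selector[] = [
  { type: "CssSelector", value: "#main > p.lead" },
  { type: "TextQuoteSelector", exact: "fuzzy anchoring" },
];
const fullPage: ToyView = {
  text: "Menu Home About We want fuzzy anchoring to work. Footer",
  elements: { "#main > p.lead": "We want fuzzy anchoring to work." },
};
const readerView: ToyView = {
  text: "We want fuzzy anchoring to work.",
  elements: {}, // the original markup is stripped away in this view
};

const fromFullPage = anchorWithFallback(fullPage, selectors);
const fromReaderView = anchorWithFallback(readerView, selectors);
```

On the full page the CSS selector matches first; in the reader view it
fails and the TextQuoteSelector takes over.
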
>
> I think the core "plumbing" for that is already available, but Randall or 
> Gerben would know better. 🙂
>
> Is that what you're after?
>
> Cheers,
> Benjamin
>
>
> --
>
> http://bigbluehat.com/
>
> http://linkedin.com/in/benjaminyoung
>
> ________________________________
> From: Oliver Sauter <o...@worldbrain.io>
> Sent: Wednesday, April 22, 2020 12:17 PM
> To: dev@annotator.incubator.apache.org <dev@annotator.incubator.apache.org>
> Subject: Integrating Annotator into Memex
>
> Hey folks,
>
> I just had a call with Benjamin and we talked about the ability to integrate 
> annotator into getmemex.com <http://getmemex.com/>
> Right now we use the Hypothes.is <http://hypothes.is/> library but it is 
> causing us some trouble (mainly the RAM usage from hooking it into each tab).
>
> But also we are about to start the development of the Pocket-style 
> offline-reader for desktop and mobile on which we want to also integrate 
> annotation capabilities.
> This means there is an anticipated use case where people annotate on a 
> reader-version and want to see the annotations also successfully anchored on 
> a live html page. Annotating a reader-version will be missing a lot of 
> details usually used for anchoring the annotations, so the challenge would be 
> to make those interoperable with Apache Annotator.
>
> So the questions I have:
> 1. How mature is Annotator in terms of its ability to replace the hypothesis 
> annotation library? What still needs to be done (and how much work for that? 
> Where do you need help?)
> 2. How much work do you anticipate for a replacement?
> 3. What do you think will be the challenges to get Apache Annotator to work 
> on both a reader and a full-html version?
>
> I’ve been looking forward to finding a way to collaborate, so hopefully this 
> time is the time!
> Cheers
> Oliver
>
>
>
