Hi Johann,

I think there is quite a bit of work that you can look at.

Besides all the third-party UIMA libraries (DKPro Core, ClearTK, cTAKES, JCoRe,
etc.), there is a lot more that one can look at.

There is GATE of course ;) Although as far as I understood, GATE, like UIMA,
intentionally does not prescribe a particular annotation schema / encoding
so that it remains as flexible as possible for its users.

Nancy Ide [1] has done a lot of work on interoperability in the NLP
space. One of the recent projects she is involved in is the LAPPS Grid [2]
which includes a JSON-based data format, a schema, and a whole processing
platform including components. The LAPPS Grid also integrates third-party
components such as GATE or DKPro Core.

In Germany, there is the WebLicht [3] platform of CLARIN-D. They have the
XML-based TCF format for representing their data.

In the Netherlands, there is CLARIAH [4]. They have the XML-based FoLiA [5]
format and a lot of tooling building on that, e.g. CLAM [6].

From the semantic web space, there is the RDF-based NIF [7].

... and these are just the ones I remember off the top of my head.

If you follow these references and do a bit of digging, you will probably find
much more.

However, doing a fine-grained comparison between all of these to distill
commonalities and differences is quite a daunting task. Been there, done that.
As you say, that is a place few people dare to venture.

Cheers,

-- Richard

[1] https://scholar.google.de/citations?hl=de&user=WkfhlGkAAAAJ
[2] https://www.lappsgrid.org
[3] https://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Main_Page
[4] https://www.clariah.nl
[5] https://pypi.org/project/FoLiA/
[6] https://clam.readthedocs.io/en/latest/installation.html
[7] https://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html
