Hi Johann,

I think there is quite a bit of work that you can look at.
Besides all the third-party UIMA libraries (DKPro Core, ClearTK, cTAKES, JCoRe, etc.), there is a lot more that one can look at.

There is GATE of course ;) Although, as far as I understood, GATE, like UIMA, intentionally does not prescribe a particular annotation schema / encoding so that it remains as flexible as possible for its users.

Nancy Ide [1] has done a lot of work on interoperability in the NLP space. One of the recent projects she is involved in is the LAPPS Grid [2], which includes a JSON-based data format, a schema, and a whole processing platform including components. The LAPPS Grid also integrates third-party components such as GATE or DKPro Core.

In Germany, there is the Weblicht [3] platform of CLARIN-D. They have the XML-based TCF format for representing their data.

In the Netherlands, there is CLARIAH [4]. They have the XML-based FoLiA [5] and a lot of stuff building on that, e.g. CLAM [6].

From the semantic web space, there is the RDF-based NIF [7].

... and these are just the ones I remember off the top of my head. If you follow these references and do a bit of digging, you will probably find much more.

However, doing a fine-grained comparison between all of these to distill commonalities and differences is quite a daunting task. Been there, done that - as you say - that is a place few people dare to venture.

Cheers,

-- Richard

[1] https://scholar.google.de/citations?hl=de&user=WkfhlGkAAAAJ
[2] https://www.lappsgrid.org
[3] https://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Main_Page
[4] https://www.clariah.nl
[5] https://pypi.org/project/FoLiA/
[6] https://clam.readthedocs.io/en/latest/installation.html
[7] https://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html