Hi,

there is an RDF vocabulary for doing just this: the NLP Interchange Format (NIF) [1]. Maybe that helps.
[1] http://persistence.uni-leipzig.org/nlp2rdf/

On 04.03.2015 at 16:56, Rafa Haro <rh...@apache.org> wrote:

> Hi all,
>
> Recently, while working on a post-processing engine, I realized that it is
> currently not straightforward to deal with the data produced by Linking
> engines. In my opinion, the problem is that there is currently no easy way
> to relate the results of NLP analysis to the results of the Linking
> process. After NLP analysis, all the extracted Spans (tokens, sentences,
> chunks and so on) are stored in an AnalyzedText object [1]. This model has a
> nice-to-use API, and it really eases the work of the subsequent engines in a
> chain. However, the results of the Linking Engines are currently stored only
> in the Clerezza graph holding the metadata of a ContentItem, mainly as Text
> and Entity Annotations. Although there are some helpers for dealing with the
> annotations within the graph, a developer writing, let's say, a post-linking
> engine really misses a way to find, for example, the Text and Entity
> Annotations associated with given spans. The only way I have found, short of
> starting work on a proper solution, has been to locate the spans associated
> with a Text Annotation by using the start and end offsets.
>
> I would like to start a discussion here about the best design for tackling
> this problem.
>
> Cheers,
> Rafa
>
> [1] -
> https://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext

--
Magnus Knuth
Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
Prof.-Dr.-Helmert-Str. 2-3
14482 Potsdam
Amtsgericht Potsdam, HRB 12184
Managing Director: Prof. Dr. Christoph Meinel
tel: +49 331 5509 547
email: magnus.kn...@hpi.de
web: http://www.hpi.de/
webID: http://magnus.13mm.de/