Hi,

there is an RDF vocabulary for exactly this: the NLP Interchange Format (NIF) 
[1].
Maybe that helps.
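
Here is a rough sketch of how NIF could relate spans to linking results, assuming the NIF 2.0 Core vocabulary. Every substring gets an offset-based URI (`<doc#char=begin,end>`), so a span and the entity it was linked to are attached to the same resource. The Java below builds Turtle by hand only to show the shape of the data. `NifSketch`, `LinkedSpan` and the example URIs are hypothetical; a real engine would use an RDF library (or the Clerezza graph) rather than string building.

```java
import java.util.List;

class NifSketch {
    // NIF 2.0 Core, ITS 2.0 RDF and XML Schema namespaces.
    static final String NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#";
    static final String ITSRDF = "http://www.w3.org/2005/11/its/rdf#";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Hypothetical minimal span: character offsets plus an optional linked entity URI.
    record LinkedSpan(int start, int end, String entity) {}

    // Offset-based URI in the RFC 5147 style used by NIF: <doc#char=start,end>.
    static String spanUri(String docUri, int start, int end) {
        return "<" + docUri + "#char=" + start + "," + end + ">";
    }

    // Escapes a Java string for use inside a double-quoted Turtle literal.
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Typed literal for begin/end offsets.
    static String index(int i) {
        return "\"" + i + "\"^^xsd:nonNegativeInteger";
    }

    static String toTurtle(String docUri, String text, List<LinkedSpan> spans) {
        StringBuilder sb = new StringBuilder();
        sb.append("@prefix nif: <").append(NIF).append("> .\n")
          .append("@prefix itsrdf: <").append(ITSRDF).append("> .\n")
          .append("@prefix xsd: <").append(XSD).append("> .\n\n");
        // The whole document is the context that every span points back to.
        String ctx = spanUri(docUri, 0, text.length());
        sb.append(ctx).append(" a nif:Context ;\n")
          .append("    nif:isString ").append(literal(text)).append(" .\n\n");
        for (LinkedSpan s : spans) {
            sb.append(spanUri(docUri, s.start(), s.end())).append(" a nif:Phrase ;\n")
              .append("    nif:referenceContext ").append(ctx).append(" ;\n")
              .append("    nif:beginIndex ").append(index(s.start())).append(" ;\n")
              .append("    nif:endIndex ").append(index(s.end())).append(" ;\n")
              .append("    nif:anchorOf ").append(literal(text.substring(s.start(), s.end())));
            // The linking result is attached to the same span resource.
            if (s.entity() != null) {
                sb.append(" ;\n    itsrdf:taIdentRef <").append(s.entity()).append(">");
            }
            sb.append(" .\n\n");
        }
        return sb.toString();
    }
}
```

With this layout, finding the entity for a span (or the span for an entity) is a lookup on one URI rather than a comparison of offsets.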

[1] http://persistence.uni-leipzig.org/nlp2rdf/

On 04.03.2015 at 16:56, Rafa Haro <rh...@apache.org> wrote:

> Hi all, 
> 
> Recently, while working on a post-processing engine, I realized that it is 
> currently not straightforward to deal with the data produced by the Linking 
> engines. In my opinion, the problem is that it is not easy to relate the 
> results of the NLP analysis to the results of the linking process. After NLP 
> analysis, all the extracted Spans (tokens, sentences, chunks and so on) are 
> stored in an AnalyzedText object [1]. This model has a convenient API and 
> really eases the work of the subsequent engines within a chain. However, the 
> results of the Linking Engines are currently stored only in the Clerezza 
> graph holding the metadata of a ContentItem, mainly as Text and Entity 
> Annotations. Although there are some helpers for dealing with the annotations 
> within the graph, a developer writing, let's say, a post-linking engine really 
> misses a way to find, for example, the Text and Entity Annotations associated 
> with the spans. Short of starting work on a proper solution for this, the 
> only way I have found is to locate the spans associated with a Text 
> Annotation by using its start and end offsets.
> 
> I would like to start a discussion here about the best design for tackling 
> this problem.
> 
> Cheers,
> Rafa
> 
> [1] https://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> 
> 
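
For comparison, the offset-based workaround described in the quoted message can be sketched as follows. `OffsetMatcher`, `SpanInfo` and `TextAnnotationInfo` are hypothetical stand-ins, not Stanbol classes. In a real engine the spans would come from the AnalyzedText and the offsets from the Text Annotation in the ContentItem's graph; that retrieval is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetMatcher {
    // Hypothetical stand-in for an AnalyzedText span (token, chunk, sentence, ...).
    record SpanInfo(String type, int start, int end) {}

    // Hypothetical stand-in for the start/end offsets read from a Text Annotation.
    record TextAnnotationInfo(int start, int end, String entity) {}

    // Spans lying entirely inside the annotation's [start, end) character range.
    static List<SpanInfo> containedSpans(TextAnnotationInfo ta, List<SpanInfo> spans) {
        List<SpanInfo> result = new ArrayList<>();
        for (SpanInfo s : spans) {
            if (s.start() >= ta.start() && s.end() <= ta.end()) {
                result.add(s);
            }
        }
        return result;
    }

    // Spans sharing at least one character with the annotation's range.
    static List<SpanInfo> overlappingSpans(TextAnnotationInfo ta, List<SpanInfo> spans) {
        List<SpanInfo> result = new ArrayList<>();
        for (SpanInfo s : spans) {
            if (s.start() < ta.end() && ta.start() < s.end()) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Which of the two notions is right depends on the engine: an annotation on a single word contains its token but only overlaps the enclosing chunk and sentence. The linear scan is fine for short texts; with many annotations, sorting the spans by start offset and using binary search would avoid comparing every annotation with every span.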

-- 
Magnus Knuth

Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
Prof.-Dr.-Helmert-Str. 2-3
14482 Potsdam

Potsdam Local Court (Amtsgericht Potsdam), HRB 12184
Managing Director: Prof. Dr. Christoph Meinel

tel:     +49 331 5509 547
email:   magnus.kn...@hpi.de
web:     http://www.hpi.de/
webID:   http://magnus.13mm.de/
