Re: NIF + Stanbol

Sebastian Hellmann Mon, 27 May 2013 00:58:35 -0700

Hi Rupert,

On 25.05.2013 09:23, Rupert Westenthaler wrote:

(3) oa:SpecificResources could be used to explicitly model Blobs in a
ContentItem. The 'source' would represent the parsed content (e.g. a
PDF), the oa:SpecificResources would represent the extracted plain
text Blob. oa:Selectors would clearly state that they are relative to
the text/plain Blob and not the originally parsed PDF document. On the
downside this model would introduce a lot of indirections for users
that are only interested e.g. in the fise:selected-text (oa:exact of
the oa:TextQuoteSelector) of a fise:TextAnnotation.

oa:TextQuoteSelector is not the correct one; it is meant for the interpreted DOM only and is not well defined for plain text.

(I am investigating this currently)

Based on those observations my first impression is that a full adoption
of the Open Annotation Data Model would require a complete re-thinking
of how annotations are composed and result in a complete rewrite of
everything in Stanbol that is related to RDF. IMO the resulting RDF
would also be much harder to consume and produce and therefore affect
both users that need to extract information from the enhancement
results as well as programmers that want to implement their
Enhancement Engine.

Open Annotation has a very expressive model and their design is quite good; I think we can still map anything we do to their model later. The NLP annotation use cases need a specialized format, however, one that is more intuitive and easier to work with.


The main diff to NIF Simple profile:
1. include text via a nif:Context node (IIRC not supported in OA yet )

2. merge oa:SpecificResource and oa:Selector to one resource using a NIF URI Scheme

3. don't create oa:Body and oa:Annotation and attach the triple (p,o) directly to 2.


Diff with NIF Stanbol profile:
1. include text via a nif:Context node (IIRC not supported in OA yet )

2. merge oa:SpecificResource and oa:Selector to one resource using a NIF URI Scheme

3. merge oa:Body and oa:Annotation to one resource.

Doing it this way makes several triples and URNs that are irrelevant for NLP obsolete, while providing additional features (e.g. merging via NIF URI Schemes, domain-specific reasoning).
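
To make the three points of the Simple profile concrete, here is a minimal Python sketch of how such output could be serialized as N-Triples. The `#char=begin,end` fragment (RFC 5147 style), the nif-core namespace, and the property names nif:isString, nif:referenceContext, nif:beginIndex, nif:endIndex, nif:anchorOf and nif:oliaLink are assumptions taken from the NIF 2.0 core ontology, not from this thread; the document URI, the #NN tag and the helper names are hypothetical.

```python
# Sketch only: namespace and property names are assumed from NIF 2.0 core.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nif_uri(doc_uri, begin, end):
    """Merge 'source' and 'selector' into one URI via an offset fragment."""
    return f"{doc_uri}#char={begin},{end}"


def literal(value, datatype=None):
    """Serialize a Python value as an N-Triples literal."""
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"').replace("\n", "\\n"))
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def annotate(doc_uri, text, begin, end, predicate, obj):
    """Return N-Triples lines for one annotated substring of `text`."""
    context = nif_uri(doc_uri, 0, len(text))
    string = nif_uri(doc_uri, begin, end)
    triples = [
        # 1. the text travels with the annotations in a nif:Context node
        (context, RDF_TYPE, f"<{NIF}Context>"),
        (context, NIF + "isString", literal(text)),
        # 2. one resource identifies the substring; no separate selector node
        (string, NIF + "referenceContext", f"<{context}>"),
        (string, NIF + "beginIndex", literal(begin, XSD + "nonNegativeInteger")),
        (string, NIF + "endIndex", literal(end, XSD + "nonNegativeInteger")),
        (string, NIF + "anchorOf", literal(text[begin:end])),
        # 3. the annotation (p, o) is attached directly to the string resource
        (string, predicate, f"<{obj}>"),
    ]
    return [f"<{s}> <{p}> {o} ." for s, p, o in triples]


# Hypothetical usage: tag the word "cat" with an illustrative OLiA class.
lines = annotate("http://example.org/doc1", "The cat sleeps.", 4, 7,
                 NIF + "oliaLink", "http://purl.org/olia/penn.owl#NN")
print("\n".join(lines))
```

Because the offsets are part of the URI, two tools that annotate the same substring of the same document mint the same subject, which is what makes merging their output straightforward.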


All the best,
Sebastian

However as I noted in the beginning those observations are based on a
first look at the Open Annotation Data Model. So I might as well have
missed a much better alignment of the Stanbol Enhancement Structure.

best
Rupert


[1] http://www.openannotation.org/spec/core/

I really recommend grounding any work on their model, as it is really good
and powerful. I am not sure, however, whether it provides the right level of
scalability for NLP.
Looking at:
http://de.slideshare.net/paolociccarese/open-annotation-specifiers-and-specific-resources-tutorial
There are 3 important things missing:
- inclusion of the actual text in the web service request
- providing best practices for identifiers, e.g.
http://purl.org/olia/penn.owl#DT
- reducing the number of URNs and triples

This is where NIF comes in. (If you are in doubt, please try to create an OA
example where a simple sentence is POS annotated over a web service).
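
A rough way to see the difference in URNs and triples is to count them for one POS-tagged sentence. The Python sketch below is only an illustration under stated assumptions: the OA-style function uses oa:Annotation, oa:SpecificResource and oa:TextPositionSelector (with oa:start and oa:end) as one plausible modelling, the NIF-style function uses property names assumed from the NIF 2.0 core ontology, and the document URI, the urn:example identifiers and the #NN and #VBZ tags (patterned on the #DT example above) are hypothetical.

```python
import itertools
import re

OA = "http://www.w3.org/ns/oa#"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PENN = "http://purl.org/olia/penn.owl#"

_ids = itertools.count(1)


def fresh_urn():
    """Mint a new, otherwise meaningless identifier (hypothetical scheme)."""
    return f"urn:example:{next(_ids)}"


def token_spans(text):
    """Character offsets (begin, end) of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def oa_style(doc, begin, end, tag):
    """One token with separate annotation, target and selector nodes."""
    ann, target, sel = fresh_urn(), fresh_urn(), fresh_urn()
    return [
        (ann, RDF_TYPE, OA + "Annotation"),
        (ann, OA + "hasBody", tag),
        (ann, OA + "hasTarget", target),
        (target, RDF_TYPE, OA + "SpecificResource"),
        (target, OA + "hasSource", doc),
        (target, OA + "hasSelector", sel),
        (sel, RDF_TYPE, OA + "TextPositionSelector"),
        (sel, OA + "start", begin),
        (sel, OA + "end", end),
    ]


def nif_style(context, doc, begin, end, tag):
    """One token as a single offset-based resource carrying the tag itself."""
    string = f"{doc}#char={begin},{end}"
    return [
        (string, NIF + "referenceContext", context),
        (string, NIF + "beginIndex", begin),
        (string, NIF + "endIndex", end),
        (string, NIF + "oliaLink", tag),
    ]


def summarize(triples):
    """Return (number of triples, number of distinct subject identifiers)."""
    return len(triples), len({s for s, _, _ in triples})


text = "The cat sleeps"
doc = "http://example.org/doc1"
context = f"{doc}#char=0,{len(text)}"
tags = [PENN + t for t in ("DT", "NN", "VBZ")]

oa, nif = [], []
for (begin, end), tag in zip(token_spans(text), tags):
    oa += oa_style(doc, begin, end, tag)
    nif += nif_style(context, doc, begin, end, tag)

print("OA-style: ", summarize(oa))
print("NIF-style:", summarize(nif))
```

For this three-token sentence the sketch counts 27 triples on 9 minted identifiers for the OA-style modelling and 12 triples on 3 identifiers for the NIF-style one. Neither count includes the triples that would carry the text itself, and a different OA modelling could give different numbers.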

Regarding Rupert's problem with backward compatibility:
In a first step, it should be enough to build an RDF parser/serializer based
on the new OWL file.

I don't yet understand what exactly is meant by "Stanbol Enhancement
Structure"[1].
Is this the OWL file for serializing annotations (e.g. for use in SPARQL) or
does it describe the internal structure of the Stanbol Java Framework?

I think the second one can stay as it is for now. Meanwhile, the new structure
should be created (as a serialization format) with the clear aim of replacing
the former in the future. This would give all clients enough time to
adapt.

What do you think?

All the best,
Sebastian

[1]
http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html



On 23.05.2013 14:12, Reto Bachmann-Gmür wrote:

Hi Sebastian

Are you aware of https://issues.apache.org/jira/browse/STANBOL-351?

Rather than doing telcos we should discuss things on the list.

Cheers,
Reto


On Thu, May 23, 2013 at 9:27 AM, Sebastian Hellmann <
[email protected]> wrote:

Hi all,
we created an OWL schema called NLP Interchange Format (NIF), which
leverages Apache Stanbol's FISE ontology.
Recent documentation is here:

http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf

Personally, I think the general structure (using URN for each annotation)
is quite good, but I am a little bit unhappy with some facts:
1. URL persistence: when will the FISE ontology move from IKS to the
Apache Stanbol namespace? In my opinion, sooner is better. The longer it
is out there, the more side effects it will cause:
http://xkcd.com/1172/
2. Some issues need discussion and some streamlining. I would be happy
to be of assistance and would offer to hold some ontology telcos to get
it straight.
http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/servicesapi/src/main/resources/fise.owl
e.g.
- start and end have xsd:int, limiting offsets to 2^31 - 1 (about 2 billion characters, i.e. a text file of roughly 2 GB)
- extracted-from might not need to be functional. Also there might be a
relation to prov:wasDerivedFrom
These issues all need discussion however.
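
The xsd:int point can be checked with a few lines of Python. xsd:int is a signed 32-bit integer, so its largest value is 2^31 - 1 = 2147483647; xsd:long is the signed 64-bit counterpart. This is only a sketch: the helper name is hypothetical, and which datatype start and end should use instead is exactly the open question.

```python
XSD_INT_MAX = 2**31 - 1    # largest value of xsd:int (signed 32-bit)
XSD_LONG_MAX = 2**63 - 1   # largest value of xsd:long (signed 64-bit)


def narrowest_offset_datatype(offset):
    """Name the narrowest of three XSD datatypes that can hold `offset`."""
    if offset < 0:
        raise ValueError("character offsets cannot be negative")
    if offset <= XSD_INT_MAX:
        return "xsd:int"
    if offset <= XSD_LONG_MAX:
        return "xsd:long"
    return "xsd:nonNegativeInteger"  # unbounded in the XSD value space


# An offset just past the xsd:int range no longer fits the current schema.
print(narrowest_offset_datatype(XSD_INT_MAX))
print(narrowest_offset_datatype(XSD_INT_MAX + 1))
```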

Any ideas on how to proceed?

All the best,
Sebastian

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013
(http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)
Come to Germany as a PhD:
http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage:
http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org





--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen



