Hi!

On 2 October 2012 08:31, Rupert Westenthaler <[email protected]> wrote:
> Hi,
>
> let me just comment on your last point
>
> On Mon, Oct 1, 2012 at 8:55 PM, Mihály Héder <[email protected]> wrote:
>>> However this feature is much more important for UIMA than for Stanbol,
>>> because with Stanbol EnhancementEngines are expected to create
>>> Annotations that conform to the EnhancementStructure.
>>
>> I totally support the self-description interface you propose, as
>> conformity to the structure is really helpful but not everything. For
>> instance, I had to experiment with Stanbol to figure out that LangId
>> will provide a "dc:language" property, and that there will be only one
>> of these, not multiple ones (e.g. one for every sentence).
>
> This is defined by STANBOL-613.
>
>> Another example is that the UIMAToTriples in my current deployment
>> puts an sso:posTag property on every TextAnnotation.
>
> Here the idea is to use NIF (NLP Interchange Format), but this is
> still in the works. Current work is done in STANBOL-741, but most
> likely I will create a separate issue that defines how NIF annotations
> are linked to Stanbol Enhancements.
>
> Generally, representing word/phrase level annotations as RDF does not
> scale. This is the reason why STANBOL-733 introduced the AnalyzedText
> ContentPart. So if you would like to allow other Engines to consume
> NLP annotations, the UIMA integration should also support the
> AnalyzedText ContentPart.

That's good news. Will look into it!

>> That might be helpful for other EE
>> developers, but they have to figure out the URI of the property somehow.
>> OK, it is in the documentation, but still...
>>
>
> Maybe we can use the already existing
>
> org.apache.stanbol.enhancer.servicesapi.ServiceProperties
>
> interface (already implemented by most Enhancement Engines). Possible
> additions would include
>
> * EnhancementFeature: MetadataExtraction, PlainTextExtraction,
> LanguageIdentification, POS tagging, Chunking, NER, EntityLinking, ...

I think I see the benefits of describing the features by naming their
functions. But as we surely cannot foresee all the various ways people
might intend to use EEs, I'm afraid this kind of ontology will have to
be continuously expanded, or we will end up with some catch-all
category. So the question arises: where will this ontology be kept and
maintained?

Anyway, I think a useful addition to this descriptor would be another
one that says little about the function of the EE but describes
precisely what the structure of the RDF looks like (what
namespace/ontology it uses, the hierarchy/multiplicity of the triples,
etc.).

Best
Mihály

> * RequiresFeature: Enhancements required by an EnhancementEngine
> * supportsLanguage: list of supported languages (with support for
> exclusions and wildcards, e.g. !fr, !de, *)
> * supportsMimeType: allows an EnhancementEngine to define the
> supported mime types
> * ...
>
> If we use an Ontology for those Features we can
>
> 1. implement the Webservice that publishes the RDF metadata for
> EnhancementEngines based on the ServiceProperties provided by an
> EnhancementEngine
> 2. use the URIs of those properties as a good entry point for
> the documentation of how those features are represented in the
> EnhancementStructure (or NIF)
>
> best
> Rupert
>
>> Cheers
>> Mihály
>>
>>> best
>>> Rupert
>>>
>>>
>>> --
>>> | Rupert Westenthaler [email protected]
>>> | Bodenlehenstraße 11 ++43-699-11108907
>>> | A-5500 Bischofshofen
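
The supportsLanguage property proposed in the thread (a list of languages with exclusions and a wildcard, e.g. !fr, !de, *) could be evaluated roughly as in the minimal Java sketch below. It is an illustration only: the class name `LanguageSupport` and the method `supports` are hypothetical and not part of Stanbol's `ServiceProperties` API, and the matching semantics are assumptions (an exclusion always wins regardless of its position, `*` includes every language not explicitly excluded, and comparison is case-insensitive).

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the proposed "supportsLanguage" rule.
 * A declaration such as ["!fr", "!de", "*"] is read as
 * "every language except French and German".
 * Assumed semantics (not defined by Stanbol): exclusions win over
 * inclusions, "*" includes any language not excluded, and
 * comparison ignores case.
 */
public final class LanguageSupport {

    private LanguageSupport() {}

    /** Returns true if an engine with the given declaration supports lang. */
    public static boolean supports(List<String> declaration, String lang) {
        String target = lang.toLowerCase(Locale.ROOT);
        boolean included = false;
        for (String entry : declaration) {
            String e = entry.trim().toLowerCase(Locale.ROOT);
            if (e.startsWith("!")) {
                // An explicit exclusion always wins, wherever it appears.
                if (e.substring(1).equals(target)) {
                    return false;
                }
            } else if (e.equals("*") || e.equals(target)) {
                // Wildcard or explicit inclusion; keep scanning for exclusions.
                included = true;
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<String> decl = List.of("!fr", "!de", "*");
        System.out.println(supports(decl, "en"));               // true
        System.out.println(supports(decl, "fr"));               // false
        System.out.println(supports(List.of("en", "it"), "de")); // false
    }
}
```

Letting exclusions override the wildcard keeps a declaration like "!fr, *" order-independent; an engine with no `*` entry supports only the languages it lists explicitly.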
