Hi, let me just comment on your last point.

On Mon, Oct 1, 2012 at 8:55 PM, Mihály Héder <[email protected]> wrote:
>> However this feature is much more important for UIMA as for Stanbol,
>> because with Stanbol EnhancementEngines are expected to create
>> Annotations that conform to the EnhancementStructure.
>
> I totally support the self-description interface you propose, as the
> conformity to the structure is really helpful but not everything. For
> instance I had to experiment with Stanbol to figure out that LangId
> will provide a "dc:language" property, and there will be only one of
> this, not multiple ones (e.g. for every sentence).

This is defined by STANBOL-613.

> Another example is
> that the UIMAToTriples in my current deployment puts an sso:posTag
> property to every TextAnnotation.

Here the idea is to use NIF (NLP Interchange Format), but this is
still in the works. Current work is done in STANBOL-741, but most
likely I will create a separate issue that defines how NIF annotations
are linked to Stanbol Enhancements.

Generally, representing word/phrase level annotations as RDF does not
scale. This is the reason why STANBOL-733 introduced the AnalyzedText
ContentPart. So if you would like to allow other Engines to consume
NLP annotations, the UIMA integration should also support the
AnalyzedText ContentPart.

> That might be helpful for other EE
> developers but they have to figure the uri of the property somehow -
> ok, it is in the documentation, but still...
>

Maybe we can use the already existing
org.apache.stanbol.enhancer.servicesapi.ServiceProperties interface
(already implemented by most Enhancement Engines). Possible additions
would include

* EnhancementFeature: MetadataExtraction, PlainTextExtraction,
  LanguageIdentification, POS tagging, Chunking, NER, EntityLinking, ...
* RequiresFeature: Enhancements required by an EnhancementEngine
* supportsLanguage: list of languages supported, with support for
  exclusions and wildcards (e.g. !fr, !de, *)
* supportsMimeType: allows an EnhancementEngine to define the
  supported mime types
* ...

If we use an Ontology for those Features we can

1. implement the Webservice that publishes the RDF metadata for
   EnhancementEngines based on the ServiceProperties provided by an
   EnhancementEngine
2. use the URIs of those properties as an entry point for the
   documentation of how those features are represented in the
   EnhancementStructure (or NIF)

best
Rupert

> Cheers
> Mihály

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
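
PS: To make the ServiceProperties idea a bit more concrete, here is a rough sketch of how an engine could expose such metadata as plain key/value pairs. Everything named in it is an assumption for illustration only: the property keys, the class names, and the reading of "!fr, !de, *" as "every language except fr and de". The small interface is just a local stand-in for ServiceProperties so that the snippet compiles without Stanbol on the classpath; it is not the real API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Local stand-in for Stanbol's ServiceProperties (assumption, not the real interface). */
interface ServicePropertiesSketch {
    Map<String, Object> getServiceProperties();
}

/** Parses a language configuration such as "!fr, !de, *" (assumed syntax). */
final class LanguageSupport {
    private final Set<String> included = new HashSet<>();
    private final Set<String> excluded = new HashSet<>();
    private boolean wildcard;

    static LanguageSupport parse(String config) {
        LanguageSupport support = new LanguageSupport();
        for (String token : config.split(",")) {
            String t = token.trim().toLowerCase(Locale.ROOT);
            if (t.isEmpty()) {
                continue;                                   // ignore empty entries
            } else if (t.equals("*")) {
                support.wildcard = true;                    // any language is accepted
            } else if (t.startsWith("!")) {
                support.excluded.add(t.substring(1).trim()); // explicit exclusion
            } else {
                support.included.add(t);                    // explicit inclusion
            }
        }
        return support;
    }

    /** An exclusion always wins; otherwise the wildcard or an explicit entry must match. */
    boolean supports(String language) {
        String lang = language.toLowerCase(Locale.ROOT);
        if (excluded.contains(lang)) {
            return false;
        }
        return wildcard || included.contains(lang);
    }
}

/** Hypothetical engine that describes itself via service properties. */
final class ExamplePosTaggingEngine implements ServicePropertiesSketch {
    // Hypothetical keys; with an ontology these would become property URIs.
    static final String ENHANCEMENT_FEATURE = "enhancementFeature";
    static final String REQUIRES_FEATURE = "requiresFeature";
    static final String SUPPORTS_LANGUAGE = "supportsLanguage";
    static final String SUPPORTS_MIME_TYPE = "supportsMimeType";

    @Override
    public Map<String, Object> getServiceProperties() {
        Map<String, Object> props = new HashMap<>();
        props.put(ENHANCEMENT_FEATURE, Arrays.asList("POSTagging"));
        props.put(REQUIRES_FEATURE, Arrays.asList("LanguageIdentification"));
        props.put(SUPPORTS_LANGUAGE, "!fr, !de, *");
        props.put(SUPPORTS_MIME_TYPE, Arrays.asList("text/plain"));
        return Collections.unmodifiableMap(props);
    }
}

final class EngineMetadataDemo {
    public static void main(String[] args) {
        Map<String, Object> props = new ExamplePosTaggingEngine().getServiceProperties();
        LanguageSupport languages = LanguageSupport.parse(
                (String) props.get(ExamplePosTaggingEngine.SUPPORTS_LANGUAGE));
        System.out.println("en supported: " + languages.supports("en"));
        System.out.println("fr supported: " + languages.supports("fr"));
    }
}
```

A web service publishing the RDF metadata would only need to iterate over such a map, which is why plain keys (later replaced by ontology URIs) seem sufficient for a first version.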
