Hi,

Let me just comment on your last point.

On Mon, Oct 1, 2012 at 8:55 PM, Mihály Héder <[email protected]> wrote:
>> However this feature is much more important for UIMA than for Stanbol,
>> because with Stanbol EnhancementEngines are expected to create
>> Annotations that conform to the EnhancementStructure.
>
> I totally support the self-description interface you propose, as the
> conformity to the structure is really helpful but not everything. For
> instance I had to experiment with Stanbol to figure out that LangId
> will provide a "dc:language" property, and there will be only one of
> them, not multiple ones (e.g. for every sentence).

This is defined by STANBOL-613.

> Another example is
> that the UIMAToTriples in my current deployment puts an sso:posTag
> property on every TextAnnotation.

Here the idea is to use NIF (NLP Interchange Format), but this is
still in the works. Current work is done in STANBOL-741, but most
likely I will create a separate issue that defines how NIF annotations
are linked to Stanbol Enhancements.

Generally, representing word/phrase-level annotations as RDF does not
scale. This is the reason why STANBOL-733 introduced the AnalyzedText
ContentPart. So if you would like to allow other Engines to consume
NLP annotations, the UIMA integration should also support the
AnalyzedText ContentPart.

> That might be helpful for other EE
> developers but they have to figure out the URI of the property somehow -
> ok, it is in the documentation, but still...
>

Maybe we can use the already existing

    org.apache.stanbol.enhancer.servicesapi.ServiceProperties

interface (already implemented by most Enhancement Engines). Possible
additions would include

* EnhancementFeature: MetadataExtraction, PlainTextExtraction,
LanguageIdentification, POS tagging, Chunking, NER, EntityLinking, ...
* RequiresFeature: Enhancements required by an EnhancementEngine
* supportsLanguage: list of languages supported (with support for
exclusions and wildcards, e.g. !fr, !de, *)
* supportsMimeType: allows an EnhancementEngine to define the
supported mime types
* ...
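
To make the list above more concrete, here is a rough sketch of how an
engine could describe itself through such properties. Everything in it is
an assumption for illustration: the nested ServiceProperties interface is
only a local stand-in so that the example compiles without Stanbol on the
classpath, the four property keys are hypothetical and not defined anywhere
yet, and the rule that an exclusion such as "!fr" always wins over "*" is
just one possible reading of the wildcard syntax.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EngineFeatureSketch {

    /** Local stand-in for the Stanbol interface, so the sketch is self-contained. */
    interface ServiceProperties {
        Map<String, Object> getServiceProperties();
    }

    // Hypothetical property keys; none of these are defined by Stanbol.
    static final String ENHANCEMENT_FEATURE = "enhancementFeature";
    static final String REQUIRES_FEATURE = "requiresFeature";
    static final String SUPPORTS_LANGUAGE = "supportsLanguage";
    static final String SUPPORTS_MIME_TYPE = "supportsMimeType";

    /** Example engine: a POS tagger that depends on language identification. */
    static class PosTaggerSketch implements ServiceProperties {
        @Override
        public Map<String, Object> getServiceProperties() {
            Map<String, Object> props = new HashMap<>();
            props.put(ENHANCEMENT_FEATURE, Arrays.asList("POS tagging"));
            props.put(REQUIRES_FEATURE, Arrays.asList("LanguageIdentification"));
            props.put(SUPPORTS_LANGUAGE, Arrays.asList("!fr", "!de", "*"));
            props.put(SUPPORTS_MIME_TYPE, Arrays.asList("text/plain"));
            return props;
        }
    }

    /**
     * Assumed semantics: an explicit exclusion ("!xx") always wins;
     * otherwise the language is supported if it is listed or "*" is present.
     */
    static boolean supportsLanguage(List<String> config, String lang) {
        boolean wildcard = false;
        boolean listed = false;
        for (String entry : config) {
            String e = entry.trim();
            if (e.equals("!" + lang)) {
                return false;
            } else if (e.equals("*")) {
                wildcard = true;
            } else if (e.equals(lang)) {
                listed = true;
            }
        }
        return listed || wildcard;
    }

    public static void main(String[] args) {
        ServiceProperties engine = new PosTaggerSketch();
        @SuppressWarnings("unchecked")
        List<String> langs =
                (List<String>) engine.getServiceProperties().get(SUPPORTS_LANGUAGE);
        System.out.println("en supported: " + supportsLanguage(langs, "en"));
        System.out.println("fr supported: " + supportsLanguage(langs, "fr"));
    }
}
```

With the configuration "!fr, !de, *" this treats every language except
French and German as supported; a configuration without "*" would only
accept the languages it lists explicitly.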

If we use an ontology for those features, we can

1. implement the web service that publishes the RDF metadata for
EnhancementEngines based on the ServiceProperties provided by an
EnhancementEngine
2. use the URIs of those properties as an entry point for
the documentation of how those features are represented in the
EnhancementStructure (or NIF)
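
As a very rough illustration of point 1, the property map of an engine
could be turned into RDF along these lines. This is only a sketch under
assumptions: the namespace and the engine URI are placeholders I made up
(no such ontology exists yet), the output is plain N-Triples built by hand
rather than with an RDF library, and all values are written as plain
literals.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EngineMetadataSketch {

    // Hypothetical namespace; the ontology for engine features is not defined yet.
    static final String NS = "http://example.org/stanbol/engine-features#";

    /** Writes one N-Triples statement per property value of the engine. */
    static String toNTriples(String engineUri, Map<String, List<String>> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : props.entrySet()) {
            for (String value : e.getValue()) {
                sb.append('<').append(engineUri).append("> <")
                  .append(NS).append(e.getKey()).append("> \"")
                  .append(escape(value)).append("\" .\n");
            }
        }
        return sb.toString();
    }

    /** Escapes backslashes and double quotes inside a literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the statements in insertion order.
        Map<String, List<String>> props = new LinkedHashMap<>();
        props.put("enhancementFeature", Arrays.asList("LanguageIdentification"));
        props.put("supportsMimeType", Arrays.asList("text/plain"));
        // Placeholder engine URI, chosen only for this example.
        System.out.print(toNTriples("http://example.org/engine/langid", props));
    }
}
```

Because each property then has its own URI under the ontology namespace,
the same URI can also serve as the place where the representation of that
feature is documented.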

best
Rupert

> Cheers
> Mihály
>
>> best
>> Rupert
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
