Hi Nathan,

You are testing the enhancer with a very short sentence, and the Language
Detection engine is identifying 'no' (Norwegian) as the sentence language
(see "language identified as no" in your log). By default, Stanbol uses the
identified language code both to load the OpenNLP models for that language
and, during entity lookup, to search only the entity labels in that language.
There are a couple of things you can do to avoid an empty annotation in these
situations:

1. Force the language code via a header in your request (your curl request
in this case)
2. Configure English ('en'), or whichever language your dataset provides
entity labels in, as the Default Matching Language; this setting is missing
from your configuration. More information here:
https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking
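Assuming the key for that setting is
`enhancer.engines.linking.defaultMatchingLanguage` (please double-check it
against the page above), the addition to your EntityhubLinkingEngine
configuration (presumably the doidEnhancer engine in your log) would look
roughly like this in the Apache Felix `.config` syntax; adapt it to whatever
format your Ansible templates use:

```
enhancer.engines.linking.defaultMatchingLanguage="en"
```

With this set, the engine should search English labels in addition to labels
in the detected language, so a misdetected 'no' is less likely to produce an
empty result.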

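Also, you would probably like to disable the NER engines for this kind of
entity: as your log shows, the OpenNLP NER engine only looks for persons,
organizations and locations, not diseases.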
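A minimal sketch of the chain without NER, assuming doidEnhancerChain is a
WeightedChain configured through the `stanbol.enhancer.chain.name` and
`stanbol.enhancer.chain.weighted.chain` properties (Felix `.config` syntax
again; the engine names are taken from the chain in your log). It drops
opennlp-ner and also entityhubExtraction and dbpediaLinking, on the
assumption that, as in the default Stanbol setup, those two only link named
entities found by NER:

```
stanbol.enhancer.chain.name="doidEnhancerChain"
stanbol.enhancer.chain.weighted.chain=["tika","langid","opennlp-sentence","opennlp-token","opennlp-pos","doidEnhancer"]
```

In a WeightedChain the execution order is derived from the engines' own
ordering, so the order of the list does not matter.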

Hope that helps,
Rafa

On Tue, May 31, 2016 at 6:13 PM Nathan Breit <br...@ecohealthalliance.org>
wrote:

> Hello,
> I am trying to configure the Entityhub linking engine to use an Entityhub
> site with vocabulary from the Disease Ontology (
> http://disease-ontology.org/),
> but when I enhance text with it, labels from the ontology are not being
> annotated in the text. I am looking for advice on how to debug this. Here
> is what I've tried so far:
> - I used the genericrdf indexing tool to import the Disease Ontology into a
> new Entityhub site. When I used the entityhub /find API endpoint to search
> for the name "dengue hemorrhagic fever" a result from the Disease Ontology
> was returned.
> - I configured and built an EntityhubLinkingEngine and a WeightedChain
> containing the linking engine. They show up on the Stanbol admin site and
> Felix console. These are the config files:
>
> https://github.com/ecohealthalliance/t11/tree/master/ansible/roles/stanbol/templates/enhancer
> - When I use the following API call to enhance text containing the same
> term I was able to find with the /find endpoint, the detected language is
> the only annotation returned.
>
> curl -X POST -H "Accept: application/json" -H "Content-type: text/plain" \
> --data "Avoid dengue hemorrhagic fever." \
> http://54.197.175.163:3000/enhancer/chain/doidEnhancerChain
>
> This appears in the Stanbol error.log when the enhancement runs:
>
> ```
> 31.05.2016 12:05:06.204 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine language identified as no
> 31.05.2016 12:05:06.206 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for person and language no available!
> 31.05.2016 12:05:06.206 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for organization and language no available!
> 31.05.2016 12:05:06.207 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for location and language no available!
> 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine EntityLinking Statistics:
> 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine     - overal: 2ms (text processing: 4%, lookup: 127%, matching 0%, ranking 0%, other -31%)
> 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine   - Text Processing: 0.071543ms [count: 4 | time: 0.01788575ms (max:0.051031, min:0.005928)]
> 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine   - Vocabulary Lookup: 2.541598ms [count: 3 | time: 0.8471993333333333ms (max:1.190281, min:0.667284)]
> 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine     - cache hits: 1 (33.333332%)
> 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine       - 0 query results (0 filtered - NaN%)
> 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine   - Label Matching: 0.00218ms [count: 3 | time: 7.266666666666667E-4ms (max:7.55E-4, min:7.04E-4)]
> 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine   - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6, min:9.223372036854775E12)]
> 31.05.2016 12:05:06.214 *INFO* [qtp1118916813-38] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain doidEnhancerChain finished after 14ms for ContentItem <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f>
> 31.05.2016 12:05:06.215 *INFO* [qtp1118916813-38] org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager > processed ContentItem <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f> with Chain 'doidEnhancerChain' in 14ms | chain:[tika: 1ms (7%), langid: 3ms (21%), opennlp-sentence: 0ms (0%), opennlp-token: 0ms (0%), opennlp-pos: 1ms (7%), opennlp-ner: 1ms (7%), entityhubExtraction: 4ms (29%), doidEnhancer: 7ms (50%), dbpediaLinking: 0ms (0%)], concurrency: 1.0 (0%)
> ```
>
> The Ansible playbook here performs all the steps I have been using to set up
> Stanbol: https://github.com/ecohealthalliance/t11/tree/master/ansible
>
> Thanks,
> -Nathan Breit
>
> --
>
> Nathan Breit
>
> Software Developer
>
> EcoHealth Alliance
>
> 460 West 34th Street – 17th floor
>
> New York, NY 10001
>
> My Skype: nathanathan3 <http://is.gd/OyRVnD>
>
> My Phone Number: 1-425-296-1123
>
> www.ecohealthalliance.org
>
