Hi Nathan,

Can you check the language of the entity labels in your dataset?

Cheers,
Rafa

On Wed, 1 Jun 2016 at 19:30, Nathan Breit <br...@ecohealthalliance.org> wrote:
> Thanks for your assistance Rafa. Unfortunately, I'm still stuck. I used the
> following longer test string that was detected as en, "This is really
> English text and dengue hemorrhagic fever is a disease." However, there
> were still no entity annotations returned. This was printed in my
> error.log:
>
> ```
> 01.06.2016 13:14:40.641 *INFO* [Thread-7] org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine language identified as en
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine EntityLinking Statistics:
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - overal: 7ms (text processing: 6%, lookup: 91%, matching 0%, ranking 0%, other 3%)
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Text Processing: 0.399572ms [count: 5 | time: 0.0799144ms (max:0.366414, min:0.007158)]
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Vocabulary Lookup: 6.356819ms [count: 4 | time: 1.58920475ms (max:2.560572, min:0.893326)]
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - cache hits: 1 (25.0%)
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - 0 query results (0 filtered - NaN%)
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Label Matching: 0.003802ms [count: 4 | time: 9.505E-4ms (max:0.001065, min:8.85E-4)]
> 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6,
> min:9.223372036854775E12)]
> 01.06.2016 13:14:40.671 *INFO* [qtp621234008-38] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain doidEnhancerChain finished after 36ms for ContentItem <urn:content-item-sha1-f506f062502e1c37eddbc5777073a1239cba0c4e>
> 01.06.2016 13:14:40.672 *INFO* [qtp621234008-38] org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager > processed ContentItem <urn:content-item-sha1-f506f062502e1c37eddbc5777073a1239cba0c4e> with Chain 'doidEnhancerChain' in 34ms | chain:[langid: 6ms (18%), tika: 0ms (0%), opennlp-sentence: 1ms (3%), opennlp-token: 0ms (0%), opennlp-pos: 3ms (9%), opennlp-ner: 5ms (15%), dbpediaLinking: 1ms (3%), entityhubExtraction: 18ms (53%), doidEnhancer: 9ms (26%)], concurrency: 1.0 (0%)
> ```
>
> I'm not sure what to make of the NER mentions in the logs. My enhancement
> chain does not include an NER engine, unless it is being invoked by another
> enhancer like opennlp-pos.
>
> Regards,
> -Nathan
>
> On Wed, Jun 1, 2016 at 5:32 PM, Rafa Haro <rh...@apache.org> wrote:
>
> > Hi Nathan,
> >
> > You are testing the enhancer with a very short sentence, and the Language
> > Detection engine is identifying 'no' (probably Norwegian) as the sentence
> > language. By default, Stanbol uses the identified language code both for
> > loading OpenNLP models in that language and for entity lookup, where it
> > searches only entity labels in that language. There are a couple of things
> > you can do to avoid an empty annotation in these situations:
> >
> > 1. Force the language code as a header in your request (curl request in
> >    this case)
> > 2. Configure English ('en'), or whatever language you know your dataset
> >    has entity labels in, as the Default Matching Language, which is
> >    missing in your configuration.
> > More information here:
> >
> > https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking
> >
> > Also, you would probably want to disable the NER engines for this kind of
> > entity.
> >
> > Hope that helps,
> > Rafa
> >
> > On Tue, May 31, 2016 at 6:13 PM Nathan Breit <br...@ecohealthalliance.org>
> > wrote:
> >
> > > Hello,
> > > I am trying to configure the Entityhub linking engine to use an
> > > Entityhub site with vocabulary from the Disease Ontology
> > > (http://disease-ontology.org/), but when I enhance text with it, labels
> > > from the ontology are not being annotated in the text. I am looking for
> > > advice on how to debug this. Here is what I've tried so far:
> > > - I used the genericrdf indexing tool to import the Disease Ontology
> > >   into a new Entityhub site. When I used the entityhub /find API
> > >   endpoint to search for the name "dengue hemorrhagic fever", a result
> > >   from the Disease Ontology was returned.
> > > - I configured and built an EntityhubLinkingEngine and a WeightedChain
> > >   containing the linking engine. They show up on the Stanbol admin site
> > >   and the Felix console. These are the config files:
> > >   https://github.com/ecohealthalliance/t11/tree/master/ansible/roles/stanbol/templates/enhancer
> > > - When I used the following API call to enhance text containing the
> > >   same term I was able to find using the /find endpoint, the detected
> > >   language is the only annotation returned.
> > >
> > > curl -X POST -H "Accept: application/json" -H "Content-type: text/plain"
> > > --data "Avoid dengue hemorrhagic fever."
> > > http://54.197.175.163:3000/enhancer/chain/doidEnhancerChain
> > >
> > > This appears in the Stanbol error.log when the enhancement runs:
> > >
> > > ```
> > > 31.05.2016 12:05:06.204 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine language identified as no
> > > 31.05.2016 12:05:06.206 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for person and language no available!
> > > 31.05.2016 12:05:06.206 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for organization and language no available!
> > > 31.05.2016 12:05:06.207 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for location and language no available!
> > > 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine EntityLinking Statistics:
> > > 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - overal: 2ms (text processing: 4%, lookup: 127%, matching 0%, ranking 0%, other -31%)
> > > 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Text Processing: 0.071543ms [count: 4 | time: 0.01788575ms (max:0.051031, min:0.005928)]
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Vocabulary Lookup: 2.541598ms [count: 3 | time: 0.8471993333333333ms (max:1.190281, min:0.667284)]
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5]
> > > org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - cache hits: 1 (33.333332%)
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - 0 query results (0 filtered - NaN%)
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Label Matching: 0.00218ms [count: 3 | time: 7.266666666666667E-4ms (max:7.55E-4, min:7.04E-4)]
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6, min:9.223372036854775E12)]
> > > 31.05.2016 12:05:06.214 *INFO* [qtp1118916813-38] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain doidEnhancerChain finished after 14ms for ContentItem <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f>
> > > 31.05.2016 12:05:06.215 *INFO* [qtp1118916813-38] org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager > processed ContentItem <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f> with Chain 'doidEnhancerChain' in 14ms | chain:[tika: 1ms (7%), langid: 3ms (21%), opennlp-sentence: 0ms (0%), opennlp-token: 0ms (0%), opennlp-pos: 1ms (7%), opennlp-ner: 1ms (7%), entityhubExtraction: 4ms (29%), doidEnhancer: 7ms (50%), dbpediaLinking: 0ms (0%)], concurrency: 1.0 (0%)
> > > ```
> > >
> > > The Ansible playbook here performs all the steps I have been using to
> > > set up Stanbol: https://github.com/ecohealthalliance/t11/tree/master/ansible
> > >
> > > Thanks,
> > > -Nathan Breit
> > >
> > > --
> > > Nathan Breit
> > > Software Developer
> > > EcoHealth Alliance
> > > 460 West 34th Street – 17th floor
> > > New York, NY 10001
> > > My Skype: nathanathan3 <http://is.gd/OyRVnD>
> > > My Phone Number: 1-425-296-1123
> > > www.ecohealthalliance.org
> > > EcoHealth Alliance leads cutting-edge research into the critical
> > > connections between human and wildlife health and delicate ecosystems.
> > > With this science we develop solutions that promote conservation and
> > > prevent pandemics.
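
A note on the first suggestion in the thread (forcing the language code as a header in the request): below is a minimal Python sketch of the same request as the curl call, with the language set explicitly. It only builds the request; nothing is sent. Two things here are assumptions that nothing in this thread confirms: that Content-Language is the header the enhancer endpoint reads for this purpose, and the helper name build_enhance_request, which is made up for illustration. The URL is the one from the original curl call.

```python
import urllib.request

# URL copied from the curl call in the thread; point it at your own instance.
CHAIN_URL = "http://54.197.175.163:3000/enhancer/chain/doidEnhancerChain"


def build_enhance_request(text, language=None, url=CHAIN_URL):
    """Build (but do not send) a POST request to a Stanbol enhancement chain.

    build_enhance_request is a hypothetical helper, not part of Stanbol.
    """
    headers = {
        "Accept": "application/json",
        "Content-Type": "text/plain; charset=utf-8",
    }
    if language is not None:
        # ASSUMPTION: Content-Language is the header used to force the
        # language; check the Stanbol enhancer docs for your version.
        headers["Content-Language"] = language
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_enhance_request("Avoid dengue hemorrhagic fever.", language="en")
# To actually send it (requires a reachable Stanbol server):
#   body = urllib.request.urlopen(req).read()
```

If the header is honoured, the langid log line should no longer decide the lookup language for short inputs like this one. The second suggestion (setting the Default Matching Language) is a change to the engine configuration, so it is not shown here.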