The entities' labels are in English, but they don't have a language attribute. If a language attribute is required, is there a way I can specify a mapping that gives all the labels an @en attribute? I tried adding "rdfs:label > rdfs:label@en" to the generic RDF reader's mappings.txt, to no avail.

Thanks,
-Nathan
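One possible workaround, offered only as a sketch and not as a Stanbol feature: add the language tag to the source data before indexing it. Nothing in this thread confirms that the missing tags are the cause of the empty results. The sketch assumes the ontology has first been exported to N-Triples (one statement per line); the function names `tag_label` and `tag_lines` and the example IRI are hypothetical.

```python
import re

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
XSD_STRING = "^^<http://www.w3.org/2001/XMLSchema#string>"

# subject, predicate, quoted literal (with backslash escapes),
# optional suffix (@lang or ^^<datatype>), then the closing dot.
TRIPLE = re.compile(r'^(\S+)\s+(\S+)\s+("(?:[^"\\]|\\.)*")(\S*)\s*\.\s*$')


def tag_label(line, lang="en"):
    """Return one N-Triples line with @lang added to an untagged rdfs:label.

    Lines that are not rdfs:label statements, or whose literal already has
    a language tag or a datatype other than xsd:string, are returned as-is
    (minus the trailing newline).
    """
    line = line.rstrip("\n")
    match = TRIPLE.match(line)
    if not match:
        return line
    subject, predicate, literal, suffix = match.groups()
    if predicate != RDFS_LABEL or suffix not in ("", XSD_STRING):
        return line
    return f"{subject} {predicate} {literal}@{lang} ."


def tag_lines(lines, lang="en"):
    """Apply tag_label to every line of an iterable (for example a file)."""
    for line in lines:
        yield tag_label(line, lang)
```

The output of `tag_lines` could be written to a new file, one statement per line, and that file indexed in place of the original. Setting the engine's Default Matching Language, as suggested further down the thread, is the alternative that does not require touching the data.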

On Thu, Jun 2, 2016 at 7:07 AM, Rafa Haro <rh...@apache.org> wrote:

> Hi Nathan, can you check the language of the entities' labels in your dataset?
>
> Cheers,
> Rafa
>
> On Wed, Jun 1, 2016 at 19:30, Nathan Breit <br...@ecohealthalliance.org> wrote:
>
> > Thanks for your assistance Rafa. Unfortunately, I'm still stuck. I used the
> > following longer test string that was detected as en, "This is really
> > English text and dengue hemorrhagic fever is a disease." However, there
> > were still no entity annotations returned. This was printed in my error.log:
> > ```
> > 01.06.2016 13:14:40.641 *INFO* [Thread-7] org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine language identified as en
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine EntityLinking Statistics:
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - overal: 7ms (text processing: 6%, lookup: 91%, matching 0%, ranking 0%, other 3%)
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Text Processing: 0.399572ms [count: 5 | time: 0.0799144ms (max:0.366414, min:0.007158)]
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Vocabulary Lookup: 6.356819ms [count: 4 | time: 1.58920475ms (max:2.560572, min:0.893326)]
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - cache hits: 1 (25.0%)
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - 0 query results (0 filtered - NaN%)
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Label Matching: 0.003802ms [count: 4 | time: 9.505E-4ms (max:0.001065, min:8.85E-4)]
> > 01.06.2016 13:14:40.670 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6, min:9.223372036854775E12)]
> > 01.06.2016 13:14:40.671 *INFO* [qtp621234008-38] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain doidEnhancerChain finished after 36ms for ContentItem <urn:content-item-sha1-f506f062502e1c37eddbc5777073a1239cba0c4e>
> > 01.06.2016 13:14:40.672 *INFO* [qtp621234008-38] org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager processed ContentItem <urn:content-item-sha1-f506f062502e1c37eddbc5777073a1239cba0c4e> with Chain 'doidEnhancerChain' in 34ms | chain:[langid: 6ms (18%), tika: 0ms (0%), opennlp-sentence: 1ms (3%), opennlp-token: 0ms (0%), opennlp-pos: 3ms (9%), opennlp-ner: 5ms (15%), dbpediaLinking: 1ms (3%), entityhubExtraction: 18ms (53%), doidEnhancer: 9ms (26%)], concurrency: 1.0 (0%)
> > ```
> > I'm not sure what to make of the NER mentions in the logs. My enhancement chain
> > does not include an NER engine, unless it is being invoked by another enhancer
> > like opennlp-pos.
> >
> > Regards,
> > -Nathan
> >
> > On Wed, Jun 1, 2016 at 5:32 PM, Rafa Haro <rh...@apache.org> wrote:
> >
> > > Hi Nathan,
> > >
> > > You are testing the enhancer with a very short sentence, and the Language
> > > Detection engine is identifying 'no' (probably Norwegian) as the sentence
> > > language. By default, Stanbol uses the identified language code both for
> > > loading OpenNLP models in that language and for entity lookup, searching
> > > only entity labels in that language. There are a couple of things you can
> > > do to avoid an empty annotation in these situations:
> > >
> > > 1. Force the language code as a header in your request (curl request in
> > > this case)
> > > 2. Configure English 'en', or whatever language you know your dataset has
> > > entity labels for, as the Default Matching Language, which is missing in
> > > your configuration. More information here:
> > >
> > > https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking
> > >
> > > You would also probably want to disable the NER engines for this kind of
> > > entities.
> > >
> > > Hope that helps,
> > > Rafa
> > >
> > > On Tue, May 31, 2016 at 6:13 PM Nathan Breit <br...@ecohealthalliance.org> wrote:
> > >
> > > > Hello,
> > > > I am trying to configure the Entityhub linking engine to use an Entityhub
> > > > site with vocabulary from the Disease Ontology (http://disease-ontology.org/),
> > > > but when I enhance text with it, labels from the ontology are not being
> > > > annotated in the text. I am looking for advice on how to debug this. Here
> > > > is what I've tried so far:
> > > > - I used the genericrdf indexing tool to import the Disease Ontology into a
> > > > new Entityhub site. When I used the entityhub /find API endpoint to search
> > > > for the name "dengue hemorrhagic fever", a result from the Disease Ontology
> > > > was returned.
> > > > - I configured and built an EntityhubLinkingEngine and a WeightedChain
> > > > containing the linking engine. They show up on the Stanbol admin site and
> > > > felix console. These are the config files:
> > > > https://github.com/ecohealthalliance/t11/tree/master/ansible/roles/stanbol/templates/enhancer
> > > > - When I used the following API call to enhance text containing the same
> > > > term I was able to find using the /find endpoint, the language detected is
> > > > the only annotation returned.
> > > >
> > > > curl -X POST -H "Accept: application/json" -H "Content-type: text/plain" --data "Avoid dengue hemorrhagic fever." http://54.197.175.163:3000/enhancer/chain/doidEnhancerChain
> > > >
> > > > This appears in the Stanbol error.log when the enhancement runs:
> > > >
> > > > ```
> > > > 31.05.2016 12:05:06.204 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine language identified as no
> > > > 31.05.2016 12:05:06.206 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for person and language no available!
> > > > 31.05.2016 12:05:06.206 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for organization and language no available!
> > > > 31.05.2016 12:05:06.207 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine No NER Model for location and language no available!
> > > > 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine EntityLinking Statistics:
> > > > 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - overal: 2ms (text processing: 4%, lookup: 127%, matching 0%, ranking 0%, other -31%)
> > > > 31.05.2016 12:05:06.210 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Text Processing: 0.071543ms [count: 4 | time: 0.01788575ms (max:0.051031, min:0.005928)]
> > > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Vocabulary Lookup: 2.541598ms [count: 3 | time: 0.8471993333333333ms (max:1.190281, min:0.667284)]
> > > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - cache hits: 1 (33.333332%)
> > > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - 0 query results (0 filtered - NaN%)
> > > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Label Matching: 0.00218ms [count: 3 | time: 7.266666666666667E-4ms (max:7.55E-4, min:7.04E-4)]
> > > > 31.05.2016 12:05:06.211 *INFO* [Thread-5] org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6, min:9.223372036854775E12)]
> > > > 31.05.2016 12:05:06.214 *INFO* [qtp1118916813-38] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain doidEnhancerChain finished after 14ms for ContentItem <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f>
> > > > 31.05.2016 12:05:06.215 *INFO* [qtp1118916813-38] org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager processed ContentItem <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f> with Chain 'doidEnhancerChain' in 14ms | chain:[tika: 1ms (7%), langid: 3ms (21%), opennlp-sentence: 0ms (0%), opennlp-token: 0ms (0%), opennlp-pos: 1ms (7%), opennlp-ner: 1ms (7%), entityhubExtraction: 4ms (29%), doidEnhancer: 7ms (50%), dbpediaLinking: 0ms (0%)], concurrency: 1.0 (0%)
> > > > ```
> > > >
> > > > The Ansible playbook here performs all the steps I have been using to set up
> > > > Stanbol: https://github.com/ecohealthalliance/t11/tree/master/ansible
> > > >
> > > > Thanks,
> > > > -Nathan Breit
> > > >
> > > > --
> > > > Nathan Breit
> > > > Software Developer
> > > > EcoHealth Alliance
> > > > 460 West 34th Street – 17th floor
> > > > New York, NY 10001
> > > > My Skype: nathanathan3 <http://is.gd/OyRVnD>
> > > > My Phone Number: 1-425-296-1123
> > > > www.ecohealthalliance.org
> > > > EcoHealth Alliance leads cutting-edge research into the critical
> > > > connections between human and wildlife health and delicate ecosystems.
> > > > With this science we develop solutions that promote conservation and
> > > > prevent pandemics.