Hi Nathan, can you check which language the entity labels in your dataset are tagged with?
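
One rough way to check is to compare a /find query without a language against one restricted to 'en'. This is only a sketch: the base URL and the site id `doid` are placeholders, not your real setup, and it assumes the Entityhub site /find endpoint accepts the `name` and `lang` form parameters.

```shell
#!/bin/sh
# Hypothetical values: adjust the base URL and the Entityhub site id to your setup.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"
SITE_ID="${SITE_ID:-doid}"
# Overridable so the generated request can be inspected without a running server.
CURL="${CURL:-curl}"

# find_label NAME [LANG]
# Searches the site for NAME; if LANG is given, the search is restricted to
# labels in that language (assumes /find accepts 'name' and 'lang').
find_label() {
  if [ -n "${2:-}" ]; then
    "$CURL" -s -X POST \
      --data-urlencode "name=$1" \
      --data-urlencode "lang=$2" \
      "$STANBOL_URL/entityhub/site/$SITE_ID/find"
  else
    "$CURL" -s -X POST \
      --data-urlencode "name=$1" \
      "$STANBOL_URL/entityhub/site/$SITE_ID/find"
  fi
}

# Example calls (commented out because they need a running Stanbol instance):
# find_label "dengue hemorrhagic fever"       # labels in any language
# find_label "dengue hemorrhagic fever" en    # only labels tagged 'en'
```

If the two calls return different results, the language tags on the labels are worth a closer look.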

Cheers,
Rafa
On Wed, 1 Jun 2016 at 19:30, Nathan Breit <br...@ecohealthalliance.org>
wrote:

> Thanks for your assistance Rafa. Unfortunately, I'm still stuck. I used the
> following longer test string that was detected as en, "This is really
> English text and dengue hemorrhagic fever is a disease." However, there
> were still no entity annotations returned. This was printed in my
> error.log:
> ```
> 01.06.2016 13:14:40.641 *INFO* [Thread-7]
> org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine language
> identified as en
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> EntityLinking Statistics:
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>     - overal: 7ms (text processing: 6%, lookup: 91%, matching 0%, ranking
> 0%, other 3%)
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>   - Text Processing: 0.399572ms [count: 5 | time: 0.0799144ms
> (max:0.366414, min:0.007158)]
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>   - Vocabulary Lookup: 6.356819ms [count: 4 | time: 1.58920475ms
> (max:2.560572, min:0.893326)]
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>     - cache hits: 1 (25.0%)
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>       - 0 query results (0 filtered - NaN%)
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>   - Label Matching: 0.003802ms [count: 4 | time: 9.505E-4ms (max:0.001065,
> min:8.85E-4)]
> 01.06.2016 13:14:40.670 *INFO* [Thread-5]
>
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
>   - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6,
> min:9.223372036854775E12)]
> 01.06.2016 13:14:40.671 *INFO* [qtp621234008-38]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> Execution of Chain doidEnhancerChain finished after 36ms for ContentItem
> <urn:content-item-sha1-f506f062502e1c37eddbc5777073a1239cba0c4e>
> 01.06.2016 13:14:40.672 *INFO* [qtp621234008-38]
> org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager > processed
> ContentItem
> <urn:content-item-sha1-f506f062502e1c37eddbc5777073a1239cba0c4e> with Chain
> 'doidEnhancerChain' in 34ms | chain:[langid: 6ms (18%), tika: 0ms (0%),
> opennlp-sentence: 1ms (3%), opennlp-token: 0ms (0%), opennlp-pos: 3ms (9%),
> opennlp-ner: 5ms (15%), dbpediaLinking: 1ms (3%), entityhubExtraction: 18ms
> (53%), doidEnhancer: 9ms (26%)], concurrency: 1.0 (0%)
> ```
> I'm not sure what to make of NER mentions in the logs. My enhancement chain
> does not include a NER, unless it is being invoked by another enhancer like
> opennlp-pos.
> Regards,
> -Nathan
>
> On Wed, Jun 1, 2016 at 5:32 PM, Rafa Haro <rh...@apache.org> wrote:
>
> > Hi Nathan,
> >
> > You are testing the enhancer with a very short sentence, and the Language
> > Detection engine is identifying 'no' (probably Norwegian) as the sentence
> > language. By default, Stanbol uses the identified language code both for
> > loading the OpenNLP models for that language and for entity lookup, where
> > only entity labels in that language are searched. There are a couple of
> > things you can do to avoid empty annotations in these situations:
> >
> > 1. Force the language code as a header in your request (the curl request in
> > this case).
> > 2. Configure English ('en'), or whichever language your dataset's entity
> > labels are in, as the Default Matching Language, which is missing from
> > your configuration. More information here:
> >
> > https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking
> >
> > Also, you will probably want to disable the NER engines for this kind of
> > entity.
> >
> > Hope that helps,
> > Rafa
> >
> > On Tue, May 31, 2016 at 6:13 PM Nathan Breit <
> br...@ecohealthalliance.org>
> > wrote:
> >
> > > Hello,
> > > I am trying to configure the Entityhub linking engine to use an Entityhub
> > > site with vocabulary from the Disease Ontology
> > > (http://disease-ontology.org/), but when I enhance text with it, labels
> > > from the ontology are not being annotated in the text. I am looking for
> > > advice on how to debug this. Here is what I've tried so far:
> > > - I used the genericrdf indexing tool to import the Disease Ontology into
> > > a new Entityhub site. When I used the entityhub /find API endpoint to
> > > search for the name "dengue hemorrhagic fever", a result from the Disease
> > > Ontology was returned.
> > > - I configured and built an EntityhubLinkingEngine and a WeightedChain
> > > containing the linking engine. They show up on the Stanbol admin site and
> > > Felix console. These are the config files:
> > > https://github.com/ecohealthalliance/t11/tree/master/ansible/roles/stanbol/templates/enhancer
> > > - When I used the following API call to enhance text containing the same
> > > term I was able to find using the /find endpoint, the detected language is
> > > the only annotation returned.
> > >
> > > curl -X POST -H "Accept: application/json" -H "Content-type: text/plain" \
> > >   --data "Avoid dengue hemorrhagic fever." \
> > >   http://54.197.175.163:3000/enhancer/chain/doidEnhancerChain
> > >
> > > This appears in the Stanbol error.log when the enhancement runs:
> > >
> > > ```
> > > 31.05.2016 12:05:06.204 *INFO* [Thread-5]
> > > org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
> > language
> > > identified as no
> > > 31.05.2016 12:05:06.206 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
> > > No NER Model for person and language no available!
> > > 31.05.2016 12:05:06.206 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
> > > No NER Model for organization and language no available!
> > > 31.05.2016 12:05:06.207 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
> > > No NER Model for location and language no available!
> > > 31.05.2016 12:05:06.210 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > > EntityLinking Statistics:
> > > 31.05.2016 12:05:06.210 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >     - overal: 2ms (text processing: 4%, lookup: 127%, matching 0%,
> > ranking
> > > 0%, other -31%)
> > > 31.05.2016 12:05:06.210 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >   - Text Processing: 0.071543ms [count: 4 | time: 0.01788575ms
> > > (max:0.051031, min:0.005928)]
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >   - Vocabulary Lookup: 2.541598ms [count: 3 | time:
> 0.8471993333333333ms
> > > (max:1.190281, min:0.667284)]
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >     - cache hits: 1 (33.333332%)
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >       - 0 query results (0 filtered - NaN%)
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >   - Label Matching: 0.00218ms [count: 3 | time: 7.266666666666667E-4ms
> > > (max:7.55E-4, min:7.04E-4)]
> > > 31.05.2016 12:05:06.211 *INFO* [Thread-5]
> > >
> > >
> >
> org.apache.stanbol.enhancer.engines.entitylinking.engine.EntityLinkingEngine
> > >   - Suggestion Ranking: 0.0ms [count: 0 | time: NaNms (max:-1.0E-6,
> > > min:9.223372036854775E12)]
> > > 31.05.2016 12:05:06.214 *INFO* [qtp1118916813-38]
> > > org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> > > Execution of Chain doidEnhancerChain finished after 14ms for
> ContentItem
> > > <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f>
> > > 31.05.2016 12:05:06.215 *INFO* [qtp1118916813-38]
> > > org.apache.stanbol.enhancer.servicesapi.EnhancementJobManager >
> processed
> > > ContentItem
> > > <urn:content-item-sha1-d2851c0b02e12cc3b42bb6608fa2e1d50c43b17f> with
> > Chain
> > > 'doidEnhancerChain' in 14ms | chain:[tika: 1ms (7%), langid: 3ms (21%),
> > > opennlp-sentence: 0ms (0%), opennlp-token: 0ms (0%), opennlp-pos: 1ms
> > (7%),
> > > opennlp-ner: 1ms (7%), entityhubExtraction: 4ms (29%), doidEnhancer:
> 7ms
> > > (50%), dbpediaLinking: 0ms (0%)], concurrency: 1.0 (0%)
> > > ```
> > >
> > > The Ansible playbook here performs all the steps I have been using to set
> > > up Stanbol: https://github.com/ecohealthalliance/t11/tree/master/ansible
> > >
> > > Thanks,
> > > -Nathan Breit
> > >
> > > --
> > >
> > > Nathan Breit
> > >
> > > Software Developer
> > >
> > > EcoHealth Alliance
> > >
> > > 460 West 34th Street – 17th floor
> > >
> > > New York, NY 10001
> > >
> > > My Skype: nathanathan3 <http://is.gd/OyRVnD>
> > >
> > > My Phone Number: 1-425-296-1123
> > >
> > > www.ecohealthalliance.org
> > >
> > > EcoHealth Alliance leads cutting-edge research into the critical
> > > connections between human and wildlife health and delicate ecosystems.
> > With
> > > this science we develop solutions that promote conservation and prevent
> > > pandemics.
> > >
> >
>
