problem with entityhub

Joseph M'Bimbi-Bene Mon, 06 May 2013 01:37:03 -0700

Hello everybody, I'm having some problems with the EntityhubLinkingEngine.
Until about two weeks ago, I used it for NER tasks on a custom vocabulary
and it worked fine. Now I can no longer spot entities whose label spans
several words (even with the lmmtip parameter set in the "languages
configuration"), and matching now seems to be case sensitive, even though
it is configured not to be.


Here is what my entity looks like:

<rdf:Description rdf:about="http://example.org/resource#Mario">
        <skos:prefLabel>Mario</skos:prefLabel>
        <skos:altLabel>le plombier moustachu</skos:altLabel>
        <rdf:type>http://example.org/concept#gentil</rdf:type>
        <rdf:type>http://example.org/concept#humain</rdf:type>
</rdf:Description>

And I want to spot it with the mention "plombier moustachu".
Here is a log illustrating what I used to get:

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker ---
preocess Token 825: plombier (lemma: none | pos:[]) chunk: none

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     -
824:'le' (lemma: none | pos:[])

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     +
826:'moustachu' (lemma: none | pos:[])

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker
>> searchStrings
[plombier, moustachu]

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    -
found 1 entities ...

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     >
http://example.org/resource#Mario

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker       <
le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for
http://example.org/resource#Mario

18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >>
Suggestions:
18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    - 0:
le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for
http://example.org/resource#Mario

And here is what I get now, starting with the processing of the token
"plombier":

EntityLinker --- *preocess Token 17: plombier* (lemma: none | pos:[])
chunk: none
EntityLinker     - 16:'le' (lemma: none | pos:[])
EntityLinker     - 18:*'moustachu'* (lemma: none | pos:[])
EntityLinker     - 15:'sont' (lemma: none | pos:[])
EntityLinker     - 19:'des' (lemma: none | pos:[])
EntityLinker     - 14:',' (lemma: none | pos:[])
EntityLinker     - 20:'collines' (lemma: none | pos:[])
EntityLinker   >> *searchStrings [plombier]*
.MainLabelTokenizer  > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
for language null
MainLabelTokenizer    - tokenized le plombier moustachu -> *[le,
plombier, moustachu]*
MainLabelTokenizer  > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
for language null
MainLabelTokenizer    - tokenized Mario -> [Mario]
EntityLinker       - *no match*

Why aren't both "plombier" and "moustachu" in the searchStrings, as before?
And now the processing of "mario":

 .EntityLinker --- preocess Token 16: *mario* (lemma: none | pos:[]) chunk:
none
.EntityLinker - 15:'sont' (lemma: none | pos:[])
.EntityLinker - 17:'des' (lemma: none | pos:[])
.EntityLinker - 14:',' (lemma: none | pos:[])
.EntityLinker - 18:'collines' (lemma: none | pos:[])
.EntityLinker - 13:'mendips' (lemma: none | pos:[])
.EntityLinker - 19:'situées' (lemma: none | pos:[])
.EntityLinker >> searchStrings *[mario]*
.EntityLinker - found 1 entities ...
.EntityLinker > http://example.org/resource/Mario (ranking: null)
.MainLabelTokenizer > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
for language null
.MainLabelTokenizer - tokenized le plombier moustachu -> [le, plombier,
moustachu]
.MainLabelTokenizer > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
for language null
.MainLabelTokenizer - tokenized Mario -> *[Mario]*
.EntityLinker - *no match*

Why isn't "mario" matched against "Mario"? I configured the engine so
that it is not case sensitive.

As you can see, the "le" and "moustachu" tokens are still within the
MaxTokenSearchDistance, but they are not added to the searchStrings for the
lookup. As a result, the enhancement results are now pretty bad. What is
going on?
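
To make the behaviour I expect concrete, here is a minimal standalone Python
sketch. It is not Stanbol code and does not reproduce the EntityLinker
algorithm; the function names (tokenize, label_contains_mention) and the
whitespace tokenizer are assumptions made up for illustration. It only shows
that a multi-word mention should match a contiguous part of a label, and that
case should be ignored when the engine is configured to be case insensitive.

```python
def tokenize(text):
    """Split on whitespace; a simple stand-in for a real tokenizer."""
    return text.split()


def label_contains_mention(label, mention, case_sensitive=False):
    """Return True if the mention tokens appear as a contiguous run in the label."""
    label_tokens = tokenize(label)
    mention_tokens = tokenize(mention)
    if not case_sensitive:
        # Lower-case both sides so "mario" and "Mario" compare equal.
        label_tokens = [t.lower() for t in label_tokens]
        mention_tokens = [t.lower() for t in mention_tokens]
    n = len(mention_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the label and compare.
    return any(
        label_tokens[i:i + n] == mention_tokens
        for i in range(len(label_tokens) - n + 1)
    )


if __name__ == "__main__":
    print(label_contains_mention("le plombier moustachu", "plombier moustachu"))
    print(label_contains_mention("Mario", "mario"))
    print(label_contains_mention("Mario", "mario", case_sensitive=True))
```

With case sensitivity off, both the two-word mention and the lower-case
"mario" match in this sketch, which is what I used to get from the engine.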

Thanks a lot in advance.
