Hi all,

FYI, Joseph provided a detailed report about his problem. A first look
indicates that this problem could be a bug introduced with
STANBOL-1049 [1]. However, I have not yet had time to look into it, as
I was traveling for the last 7 days.

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-1049
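
To make the expected behaviour in the quoted report concrete, here is a
rough sketch of what the older log suggests: the current token plus a nearby
matchable token form the search strings, and labels are compared without
regard to case. This is plain Python for illustration only, not the actual
EntityLinker code. The function names, the minimum token length of 3 (which
would explain why "le" is skipped) and the limit of 2 search terms are all
assumptions.

```python
MIN_TOKEN_LENGTH = 3  # assumption: shorter tokens such as "le" are not searched


def build_search_strings(tokens, index, max_distance=3, max_terms=2):
    """Return the token at `index` plus nearby matchable tokens, in text order."""
    selected = {index}
    # Walk outwards from the current token: +1, -1, +2, -2, ...
    for distance in range(1, max_distance + 1):
        for neighbour in (index + distance, index - distance):
            if len(selected) >= max_terms:
                break
            in_range = 0 <= neighbour < len(tokens)
            if in_range and len(tokens[neighbour]) >= MIN_TOKEN_LENGTH:
                selected.add(neighbour)
    return [tokens[i] for i in sorted(selected)]


def label_matches(search_strings, label_tokens, case_sensitive=False):
    """Return True if every search string occurs among the label's tokens."""
    if not case_sensitive:
        search_strings = [s.lower() for s in search_strings]
        label_tokens = [t.lower() for t in label_tokens]
    return all(s in label_tokens for s in search_strings)


tokens = ["sont", "le", "plombier", "moustachu", "des", "collines"]
print(build_search_strings(tokens, 2))             # ['plombier', 'moustachu']
print(label_matches(["mario"], ["Mario"]))         # True
print(label_matches(["mario"], ["Mario"], True))   # False
```

With this reading, "plombier moustachu" would be looked up together and
"mario" would match the label "Mario", which is what the report expects.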

On Mon, May 6, 2013 at 10:48 AM, Joseph M'Bimbi-Bene
<[email protected]> wrote:
> I thought it might be a bug caused by the absence of POS tagging etc., so I
> used Talismane for the NLP tasks. I configured the EntityhubLinkingEngine to
> link adjectives, since that is what Talismane tags "mario" as, but it doesn't
> change anything. Here are the logs:
>
> .EntityLinker --- preocess Token 117: *moustachu* (lemma: none | pos:[Value [pos: ADJ(olia:Adjective)].prob=0.4520518431389538]) chunk: none
> .EntityLinker - 116:'*plombier*' (lemma: none | pos:[Value [pos: NC(olia:CommonNoun|olia:Noun)].prob=0.6784572817881412])
> .EntityLinker + 118:'supérieure' (lemma: none | pos:[Value [pos: ADJ(olia:Adjective)].prob=0.9366843193563169])
> .EntityLinker >> searchStrings *[moustachu, supérieure]*
> .EntityLinker - found 1 entities ...
> .EntityLinker > http://example.org/resource/Mario (ranking: null)
> .MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> .MainLabelTokenizer - tokenized le plombier moustachu -> *[le, plombier, moustachu]*
> .MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> .MainLabelTokenizer - tokenized Mario -> [Mario]
> .EntityLinker - *no match*
>
> Why isn't "plombier" in the searchStrings? Even though I configured the
> engine so that adjectives are linkable tokens, according to the
> documentation "plombier" should still be a "matchable token". The behavior of
> this engine is quite disturbing ...
>
>
> 2013/5/6 Joseph M'Bimbi-Bene <[email protected]>
>
>> Hello everybody, I'm having some problems with the EntityhubLinkingEngine.
>> Until about 2 weeks ago, I used it for NER tasks on a custom vocabulary
>> and it worked fine. Now I cannot spot entities whose labels span several
>> words (even with the parameter lmmtip in "languages configuration"), and it
>> now seems to be case sensitive, even though it is configured not to be.
>>
>> Here is what my entity looks like
>>
>> <rdf:Description rdf:about="http://example.org/resource#Mario">
>>         <skos:prefLabel>Mario</skos:prefLabel>
>>         <skos:altLabel>le plombier moustachu</skos:altLabel>
>>         <rdf:type>http://example.org/concept#gentil</rdf:type>
>>         <rdf:type>http://example.org/concept#humain</rdf:type>
>> </rdf:Description>
>>
>> I want to spot it with the mention "plombier moustachu".
>> Here is a log illustrating what I used to get:
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker ---
>> preocess Token 825: plombier (lemma: none | pos:[]) chunk: none
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     -
>> 824:'le' (lemma: none | pos:[])
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     +
>> 826:'moustachu' (lemma: none | pos:[])
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >> 
>> searchStrings
>> [plombier, moustachu]
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    -
>> found 1 entities ...
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     >
>> http://example.org/resource#Mario
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker       <
>> le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for
>> http://example.org/resource#Mario
>>
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >>
>> Suggestions:
>> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
>> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    - 0:
>> le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for
>> http://example.org/resource#Mario
>>
>> And here is what I get now, starting with the processing of the token
>> "plombier":
>>
>> EntityLinker --- *preocess Token 17: plombier* (lemma: none | pos:[])
>> chunk: none
>> EntityLinker     - 16:'le' (lemma: none | pos:[])
>> EntityLinker     - 18:'*moustachu*' (lemma: none | pos:[])
>> EntityLinker     - 15:'sont' (lemma: none | pos:[])
>> EntityLinker     - 19:'des' (lemma: none | pos:[])
>> EntityLinker     - 14:',' (lemma: none | pos:[])
>> EntityLinker     - 20:'collines' (lemma: none | pos:[])
>> EntityLinker   >> *searchStrings [plombier]*
>> .MainLabelTokenizer  > use Tokenizer class
>> org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
>> for language null
>> MainLabelTokenizer    - tokenized le plombier moustachu -> *[le,
>> plombier, moustachu]*
>> MainLabelTokenizer  > use Tokenizer class
>> org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
>> for language null
>> MainLabelTokenizer    - tokenized Mario -> [Mario]
>> EntityLinker       - *no match*
>>
>> Why aren't "plombier" and "moustachu" both in the searchStrings, just as
>> before? And now with the processing of "mario":
>>
>>  .EntityLinker --- preocess Token 16: *mario* (lemma: none | pos:[])
>> chunk: none
>> .EntityLinker - 15:'sont' (lemma: none | pos:[])
>> .EntityLinker - 17:'des' (lemma: none | pos:[])
>> .EntityLinker - 14:',' (lemma: none | pos:[])
>> .EntityLinker - 18:'collines' (lemma: none | pos:[])
>> .EntityLinker - 13:'mendips' (lemma: none | pos:[])
>> .EntityLinker - 19:'situées' (lemma: none | pos:[])
>> .EntityLinker >> searchStrings *[mario]*
>> .EntityLinker - found 1 entities ...
>> .EntityLinker > http://example.org/resource/Mario (ranking: null)
>> .MainLabelTokenizer > use Tokenizer class
>> org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
>> for language null
>> .MainLabelTokenizer - tokenized le plombier moustachu -> [le, plombier,
>> moustachu]
>> .MainLabelTokenizer > use Tokenizer class
>> org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
>> for language null
>> .MainLabelTokenizer - tokenized Mario -> *[Mario]*
>> .EntityLinker - *no match*
>>
>> Why isn't "mario" matched against "Mario"? I configured the engine so
>> that it is not case sensitive.
>>
>> As you can see, within the MaxTokenSearchDistance I still have the "le" and
>> "moustachu" tokens, but they don't go into the searchStrings for lookup. The
>> result of the enhancement is now pretty bad. What is going on?
>>
>> Thanks a lot in advance
>>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
