Hi Manish

I had the same error once. In that case the error originated from
some uncommon UTF-8 characters, where two code points at the same
position were combined into a single character in the text.
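To illustrate the effect, here is a minimal Java sketch. It uses only the JDK's java.text.Normalizer; the class and method names are made up for this example. A decomposed "é" (an "e" followed by U+0301 COMBINING ACUTE ACCENT) is two Java chars but displays as one character, while the precomposed U+00E9 is a single char. NFC normalization folds the pair into the precomposed form.

```java
import java.text.Normalizer;

public class CombiningCharDemo {
    /** Composes combining sequences (e.g. "e" + U+0301) into precomposed characters. */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as one char, stored as two
        String decomposed = "cafe\u0301";
        // U+00E9 LATIN SMALL LETTER E WITH ACUTE: the same visible text as a single char
        String precomposed = "caf\u00E9";

        System.out.println(decomposed.length());                   // 5
        System.out.println(precomposed.length());                  // 4
        System.out.println(decomposed.equals(precomposed));        // false
        System.out.println(toNfc(decomposed).equals(precomposed)); // true
    }
}
```

If character offsets are computed on one form of the text and labels are compared against the other, the counts no longer line up, which could plausibly produce an inconsistent match like the one below.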

On Tue, May 7, 2013 at 1:01 PM, Manish Aggarwal <[email protected]> wrote:
> For example if I run the stanbol server for the text
> "Because demand for the Figaro exceeded the 20,000 vehicles built, Nissan
> sold the car by lottery: winners could place orders for the car. Despite
> being a JDM-only model, the Figaro is one of the most imported models of
> the K10 derivatives; its popularity among numerous celebrity owners helped
> it earn cult status. The K10 ceased production on 21 December 1992."
>

I cannot reproduce the reported error with this text. That could be
because some special characters were removed or converted when the
mail was sent.
Can you check whether you can reproduce the error on
http://dev.iks-project.eu:8081/enhancer or
http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-noun-linking?

> Caused by: java.lang.IllegalArgumentException: The span '1' MUST BE >=
> the number of matched tokens '2': Cult status[m=FULL,s=1,c=1(0.875)/2]
> score=1.53125[l=0.875,t=1.75]!
>         at org.apache.stanbol.enhancer.engines.entitylinking.impl.LabelMatch.<init>(LabelMatch.java:96)
>         at org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker.matchLabel(EntityLinker.java:762)

This exception is basically a "safeguard" that warns about an
unexpected state in the EntityLinking process. One could also just
write a warning to the log, but then such issues would be much less
likely to be discovered.

The message suggests that the section "cult status" is the cause of
the exception. Since the matching label "Cult status" is a FULL match
but the matching score is lower than 1 (0.875), my assumption is that
the section "... earn cult status. The ..." does indeed contain some
special UTF-8 characters, because otherwise one would expect an exact
match with a score of 1.0.
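Such characters are easy to miss by eye. The following Java sketch (JDK only; the class and method names are hypothetical) lists every non-ASCII code point with its offset and Unicode name, and flags combining marks. The sample string, with a zero-width space after "cult", is only an assumed example and not the actual text from the report.

```java
import java.util.ArrayList;
import java.util.List;

public class SuspiciousCharScanner {
    /** Returns the char offsets of every non-ASCII code point in the text. */
    static List<Integer> nonAsciiOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                offsets.add(i);
            }
            // advance by one or two chars so surrogate pairs are handled correctly
            i += Character.charCount(cp);
        }
        return offsets;
    }

    /** True for Unicode general categories Mn, Mc and Me (combining marks). */
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    public static void main(String[] args) {
        // Assumed sample: a zero-width space (U+200B) hidden between "cult" and "status"
        String text = "earn cult\u200B status. The K10";
        for (int offset : nonAsciiOffsets(text)) {
            int cp = text.codePointAt(offset);
            System.out.printf("offset %d: U+%04X %s%s%n", offset, cp, Character.getName(cp),
                isCombiningMark(cp) ? " (combining mark)" : "");
        }
    }
}
```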

Can you please check the original text for such characters? I will
have a look at the exception. Maybe I should catch those exceptions
and instead write a detailed summary (including the source text,
tokens, POS tags, ...) to the logs.
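The "catch and log" option could look roughly like the following Java sketch. It does not use the actual Stanbol classes: checkSpan only mimics the safeguard message from the stack trace, and tryMatch, its parameters and the logged fields are hypothetical. Logging is done with java.util.logging.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeLabelMatching {
    private static final Logger LOG = Logger.getLogger(SafeLabelMatching.class.getName());

    /** Hypothetical stand-in for the safeguard: the span must cover all matched tokens. */
    static void checkSpan(int span, int matchedTokens) {
        if (span < matchedTokens) {
            throw new IllegalArgumentException("The span '" + span
                + "' MUST BE >= the number of matched tokens '" + matchedTokens + "'");
        }
    }

    /**
     * Returns true if the match passes the check; otherwise logs the context
     * needed to reproduce the problem and returns false instead of failing.
     */
    static boolean tryMatch(String sourceText, List<String> tokens, List<String> posTags,
                            int span, int matchedTokens) {
        try {
            checkSpan(span, matchedTokens);
            return true;
        } catch (IllegalArgumentException e) {
            LOG.log(Level.WARNING, "Skipping inconsistent label match: text=\"" + sourceText
                + "\", tokens=" + tokens + ", posTags=" + posTags, e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed values mirroring the reported case: span 1, two matched tokens
        boolean accepted = tryMatch("earn cult status.",
            Arrays.asList("earn", "cult", "status", "."),
            Arrays.asList("VB", "NN", "NN", "."), 1, 2);
        System.out.println(accepted);
    }
}
```

The trade-off is the one described above: a WARNING in the log is easier to overlook than a failed request.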

Thanks for the report!
best
Rupert

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
