Hi

On Thu, Jun 6, 2013 at 5:26 PM, Joseph M'Bimbi-Bene
<jbi...@object-ive.com> wrote:
> Hello, sorry for the late answer. Thank you for yours
>
>
> 2013/6/3 Rupert Westenthaler <rupert.westentha...@gmail.com>
>
>> Hi Joseph
>>
>> On Mon, Jun 3, 2013 at 3:43 PM, Joseph M'Bimbi-Bene
>> <jbi...@object-ive.com> wrote:
>> [..]
>> >
>> > Now, the logs of the processing of the token "La"
>> >
>> > ProcessingState > 0: Token: [1087, 1089] La (pos:[Value [pos:
>> > ADJ(olia:Adjective)].prob=0.016871281997002517]) chunk: 'none'
>> >
>> > ProcessingState - TokenData: 'La'[linkable=true(linkabkePos=null)|
>> > matchable=true(matchablePos=null)| alpha=true| seachLength=true|
>> > upperCase=true]
>> >
>>
>> The reason why the 'La' of the last sentence of your document is
>> marked as 'linkable' is the combination of the following things:
>>
>> 1. the POS tag has a very low probability (0.017) and is therefore
>> ignored, as the configured minimum probability is higher than that.
>>
>
> Actually, I set both parameters "prop" and "pprob" to 0.01; I didn't
> make a mistake, did I? You mentioned in a previous mail something
> about strange tokenizing behaviour, which might be the source of a new
> problem. Here is, for example, a log excerpt from the Stanbol web
> console for an integration test. I isolated the pathological case:
>

The reason is that "prop=0.01" should be "prob=0.01". There is a typo
in the default configuration; because of it, the changed value for
"prop" does not have any effect. I created STANBOL-1100 to fix this.
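
To illustrate why the misspelled key is silently ignored, here is a
minimal sketch. It is not the actual Stanbol configuration code: the
class ParamLookup, its methods, and the default value of 0.75 are
assumptions made up for this example. Only the parameter names and the
value 0.01 come from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser, only to illustrate the effect of the typo;
// this is not the actual Stanbol configuration code.
final class ParamLookup {

    // Assumed default, chosen for illustration only.
    static final double DEFAULT_MIN_PROB = 0.75;

    /** Parses "key=value" pairs separated by ';' into a map. */
    static Map<String, String> parse(String config) {
        Map<String, String> params = new HashMap<>();
        for (String part : config.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                params.put(part.substring(0, eq).trim(),
                           part.substring(eq + 1).trim());
            }
        }
        return params;
    }

    /** Reads "prob"; any other key (such as "prop") is never looked up. */
    static double minProb(String config) {
        String value = parse(config).get("prob");
        return value == null ? DEFAULT_MIN_PROB : Double.parseDouble(value);
    }

    public static void main(String[] args) {
        // Misspelled key: the default minimum probability still applies.
        System.out.println(minProb("prop=0.01;pprob=0.01")); // 0.75
        // Correct key: the configured value is used.
        System.out.println(minProb("prob=0.01;pprob=0.01")); // 0.01
    }
}
```

With a lookup like this, a POS tag with probability 0.017 is still
below the effective minimum as long as the key is spelled "prop".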

> and when i curl the text to Talismane, i get the following message:
>
> 16:49:21,166 [main] INFO server.Main - ... starting server
> 16:53:55,560 [btpool0-2] ERROR resource.AnalysisResource - Exception while
> analysing Blob
> java.lang.IllegalArgumentException: Illegal span [2199,2201] for Token
> relative to Text: [0, 2200] : Span of the contained Token MUST NOT extend
> the others!

When implementing the Talismane Stanbol integration I had a lot of
problems with getting the index positions right. A Token whose Span
exceeds the size of the document could indicate that there are still
some problems with that.
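
The condition behind that exception can be sketched as a simple bounds
check. This is only an illustration: SpanCheck and isContained are
hypothetical names, not part of the Stanbol or Talismane API. The
numbers are the ones from the error message above.

```java
// Hypothetical helper, not part of the Stanbol or Talismane API.
final class SpanCheck {

    /** Returns true if the span [start, end] lies within [textStart, textEnd]. */
    static boolean isContained(int textStart, int textEnd, int start, int end) {
        return start >= textStart && start <= end && end <= textEnd;
    }

    public static void main(String[] args) {
        // Token [2199,2201] relative to Text [0,2200]: the end exceeds the text.
        System.out.println(isContained(0, 2200, 2199, 2201)); // false
        // A token ending exactly at the end of the text would be fine.
        System.out.println(isContained(0, 2200, 2199, 2200)); // true
    }
}
```

An end offset that is off by one at the very end of the text, as here,
would be consistent with an index position problem in the integration.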

If you come across a text that can reproduce this, please open an
issue on the stanbol-talismane project [1].

[1] https://github.com/westei/stanbol-talismane

best
Rupert


--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
