Aaaaaah, thank you a lot, I should have figured out the misspelling myself!
Until I can find another pathological text, I have isolated one and sent it to you in private, in case you have the opportunity to work on it. My colleagues told me it is not supposed to be published.

Coming back to "__3. Unknown POS tag Rules__" of issue n°1049, I think the behaviour should be left to the user and be configurable. As far as I'm concerned, when POS tagging is available, I would like to simply ignore tokens without POS tags, or with a POS tag probability below a specified threshold. Ideally, I would like to be able to edit/set the rules for linkable / matchable tokens from the web console.

2013/6/7 Rupert Westenthaler <[email protected]>

> Hi
>
> On Thu, Jun 6, 2013 at 5:26 PM, Joseph M'Bimbi-Bene
> <[email protected]> wrote:
> > Hello, sorry for the late answer. Thank you for yours
> >
> >
> > 2013/6/3 Rupert Westenthaler <[email protected]>
> >
> >> Hi Joseph
> >>
> >> On Mon, Jun 3, 2013 at 3:43 PM, Joseph M'Bimbi-Bene
> >> <[email protected]> wrote:
> >> [..]
> >> >
> >> > Now, the logs of the processing of the token "La"
> >> >
> >> > ProcessingState > 0: Token: [1087, 1089] La (pos:[Value [pos:
> >> > ADJ(olia:Adjective)].prob=0.016871281997002517]) chunk: 'none'
> >> >
> >> > ProcessingState - TokenData: 'La'[linkable=true(linkabkePos=null)|
> >> > matchable=true(matchablePos=null)| alpha=true| seachLength=true|
> >> > upperCase=true]
> >> >
> >>
> >> The reason why the 'La' of the last sentence of your document is
> >> marked as 'linkable' is the combination of the following things:
> >>
> >> 1. the POS tag has a very low probability (0.017) and is therefore
> >> ignored as the configured minimum probability is higher as that.
> >>
> >
> > Actually, i set both parameters "prop" and "pprob" to 0.01 , i didn't
> > commit any mistake, did i ? You mentionned or a previous mail something
> > about a strange tokenizing behaviour, it might be a source of a new
> > problem: here is, for example a log excerpt from the stanbol web console
> > for an integration test.
> > I isolated the pathologic case :
> >
>
> The reason is that "prop=0.01" should be "prob=0.01". There is a typo
> in the default configuration, because of that the changed value for
> "prop" does not have any effect. I created STANBOL-1100 for fixing
> this.
>
> > and when i curl the text to Talismane, i get the following message:
> >
> > 16:49:21,166 [main] INFO server.Main - ... starting server
> > 16:53:55,560 [btpool0-2] ERROR resource.AnalysisResource - Exception while analysing Blob
> > java.lang.IllegalArgumentException: Illegal span [2199,2201] for Token
> > relative to Text: [0, 2200] : Span of the contained Token MUST NOT extend
> > the others!
>
> When implementing the Talismane Stanbol integration I had a lot of
> problems with the getting the index positions right. Getting a Span of
> a Token exceeding the size of the document could indicate that there
> are still some problems with that.
>
> If you come across a text that can reproduce this please open an issue
> on the stanbol-talismane [1]
>
> [1] https://github.com/westei/stanbol-talismane
>
> best
> Rupert
>
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
>
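PS: to make the behaviour I ask for at the top of this mail more concrete, here is a minimal sketch in Python. It is not Stanbol code: `Token`, `is_linkable`, `linkable_pos` and `min_prob` are names I made up for illustration, and the tag names in the example are placeholders. The only point is that a token without a POS tag, or with a tag below the threshold, would be ignored instead of falling back to "linkable".

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Token:
    text: str
    pos: Optional[str]      # POS tag, or None when the tagger gave none
    prob: Optional[float]   # tagger confidence for `pos`, if any


def is_linkable(token: Token, linkable_pos: Set[str], min_prob: float) -> bool:
    """Hypothetical policy: when POS tagging is available, ignore tokens
    that have no POS tag or whose tag probability is below `min_prob`."""
    if token.pos is None or token.prob is None:
        return False        # no POS tag: ignore, do not fall back to linkable
    if token.prob < min_prob:
        return False        # tag too uncertain: ignore as well
    return token.pos in linkable_pos


# The "La" token from the log above, with its ADJ tag and probability.
la = Token("La", "ADJ", 0.016871281997002517)
nouns = {"NOUN", "PROPN"}   # placeholder tag names, an assumption
print(is_linkable(la, nouns, 0.5))    # tag below threshold
print(is_linkable(la, nouns, 0.01))   # tag accepted, but ADJ is not linkable
```

With such a rule the "La" token would not be linked in either case, whatever the threshold is set to.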
