Hi Rupert, are you sure about loading a model from "en--pos-perceptron.bin" but passing it to a POSTaggerME tagger? Unless it is a typo in your email, that could be the problem (i.e. passing a perceptron model to a maxent classifier)...
Cheers,
Jeyendran

-----Original Message-----
From: Rupert Westenthaler [mailto:[email protected]]
Sent: Tuesday, May 22, 2012 5:01 AM
To: [email protected]; [email protected]
Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2

Hi

Maybe it has to do with the fact that I use topKSequences(String[] sentence) instead. Basically I use:

    String sentence = "A nice travel to the biggest volcano of Mexico.";
    Span[] tokenSpans = tokenizer().tokenizePos(sentence);
    String[] tokens = new String[tokenSpans.length];
    POSModel posModel; // loaded from "en--pos-perceptron.bin"
    POSTaggerME tagger = new POSTaggerME(posModel);
    for (int ti = 0; ti < tokenSpans.length; ti++) {
        tokens[ti] = tokenSpans[ti].getCoveredText(sentence).toString();
    }
    Sequence[] posSequences = tagger.topKSequences(tokens);

The source can be found at [1] (line 440 - 460).

I cannot easily switch between 1.5.1 and 1.5.2 in Stanbol, as Stanbol now uses 1.5.2 as an OSGi bundle (which is not possible with 1.5.1). So I used an older version of Stanbol for comparing. However, the affected code has not changed since then.

best
Rupert

[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/TextAnalyzer.java

On Tue, May 22, 2012 at 10:06 AM, Jeyendran Balakrishnan <[email protected]> wrote:
> FWIW, I'm using the POS tagger in 1.5.2, and getting probabilities of between
> 0.5 to 1 for most tags [tested on a few hundred sentences].
> I use tagger.tag(String[] tokens) to get the POS tags, and tagger.probs()
> immediately afterwards to get the probabilities.
>
> Cheers,
> Jeyendran
>
>
> -----Original Message-----
> From: Jörn Kottmann [mailto:[email protected]]
> Sent: Tuesday, May 22, 2012 12:48 AM
> To: [email protected]
> Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and
> 1.5.2
>
> Hello,
>
> that looks strange to me; I don't really know which changes caused this.
> Did you run both tests on the same machine?
>
> I will try to reproduce your results.
>
> Thanks for reporting this!
>
> Jörn
>
> On 05/21/2012 01:49 PM, Rupert Westenthaler wrote:
>> Hi,
>>
>> While debugging why POS tags are recently ignored by the Apache
>> Stanbol Enhancer, I noticed that the reason was that with OpenNLP
>> 1.5.2 the probabilities returned by the POS tagger have changed.
>>
>> Previously, typical probabilities of POS tags were > 0.9 for most of
>> the tokens. Because of that, a configuration that ignores POS tags
>> with probability < 0.8 looked like a reasonable default. However,
>> with OpenNLP 1.5.2 the probabilities are much lower. At first it even
>> looked like 1.5.2 now returns the uncertainty ('1 - {probability}')
>> instead of the probability, but after looking a little bit into the
>> source this also seems unlikely to me.
>>
>> I have already searched the documentation and recent JIRA issues, but
>> I could not find anything related.
>>
>> As an example, here are the results for a single sentence analyzed
>> using OpenNLP 1.5.1 and 1.5.2.
>>
>> Sentence:
>>
>> A nice travel to the biggest volcano of Mexico.
>>
>> The tokens are as expected.
>>
>> With OpenNLP 1.5.1 I get the following top sequence when calling
>> POSTaggerME#topKSequences(tokens):
>>
>> -0.0011259470521596032 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>>
>> Detailed probabilities:
>>
>> [1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
>> 0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
>> 0.9999998327848956]
>>
>> Switching to OpenNLP 1.5.2 results in
>>
>> -30.89400016135042 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>>
>> Detailed probabilities:
>>
>> [0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
>> 0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
>> 0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
>> 0.04233395769746884]
>>
>>
>> Is this a bug or an intentional change? If the latter, it would be
>> great if someone could provide a link to the documentation.
>>
>> best
>> Rupert Westenthaler
>>
>>
>> p.s.:
>>
>> With OpenNLP 1.5.1 I refer to
>>
>> opennlp-tools-1.5.1-incubating.jar
>> opennlp-maxent-3.0.1-incubating.jar
>>
>> With OpenNLP 1.5.2 I refer to
>>
>> opennlp-tools-1.5.2-incubating.jar
>> opennlp-maxent-3.0.2-incubating.jar
>>
>> In both cases the "en-pos-maxent.bin" model as available via
>> opennlp.sf.org is used.
>>

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
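[Editorial note, not part of the original thread: a quick arithmetic check on the numbers quoted above. If the score that topKSequences() reports is the natural-log probability of the whole tag sequence, then summing Math.log over the per-token probabilities should reproduce it. It does, for both versions, which suggests the 1.5.2 output is at least internally consistent; what changed is only the magnitude of the per-token probabilities themselves. The class name below is made up for the sketch.]

```java
// Editorial sketch: verify that the topKSequences() sequence score equals the
// sum of the natural logs of the per-token probabilities quoted in the thread.
public class SequenceScoreCheck {

    // Sum of ln(p) over all per-token probabilities, i.e. the log probability
    // of the whole tag sequence (assuming independent per-token scores).
    public static double sumLog(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Per-token probabilities reported with OpenNLP 1.5.1
        double[] probs151 = {1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
                0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
                0.9999998327848956};
        // Per-token probabilities reported with OpenNLP 1.5.2
        double[] probs152 = {0.05013598125548828, 0.053016102976047086,
                0.04032588713661259, 0.03995389549856565, 0.04685198986899964,
                0.03659501930208113, 0.04132356969119329, 0.06434037591280849,
                0.046311143933396866, 0.04233395769746884};

        System.out.println(sumLog(probs151)); // matches the 1.5.1 score, about -0.0011259
        System.out.println(sumLog(probs152)); // matches the 1.5.2 score, about -30.8940
    }
}
```

So the question in the thread reduces to why the per-token probabilities dropped from near 1.0 to around 0.04-0.06, not to a change in how the sequence score is derived from them.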
