Hi
Maybe it has to do with the fact that I use topKSequences(String[]
sentence) instead.
Basically, I use:
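import java.io.FileInputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;
import opennlp.tools.util.Span;

String sentence = "A nice travel to the biggest volcano of Mexico.";
// tokenizer() is assumed to return the opennlp.tools.tokenize.Tokenizer used by TextAnalyzer
Span[] tokenSpans = tokenizer().tokenizePos(sentence);
String[] tokens = new String[tokenSpans.length];
POSModel posModel = new POSModel(new FileInputStream("en-pos-perceptron.bin"));
POSTaggerME tagger = new POSTaggerME(posModel);
for (int ti = 0; ti < tokenSpans.length; ti++) {
    tokens[ti] = tokenSpans[ti].getCoveredText(sentence).toString();
}
Sequence[] posSequences = tagger.topKSequences(tokens);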
The source can be found at [1] (lines 440-460).
I cannot easily switch between 1.5.1 and 1.5.2 in Stanbol, as Stanbol
now uses 1.5.2 as an OSGi bundle (which is not possible with 1.5.1). So I
used an older version of Stanbol for the comparison. However, the affected
code has not changed since then.
best
Rupert
[1]
http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/TextAnalyzer.java
On Tue, May 22, 2012 at 10:06 AM, Jeyendran Balakrishnan
<[email protected]> wrote:
> FWIW, I'm using the POS tagger in 1.5.2, and getting probabilities between
> 0.5 and 1 for most tags [tested on a few hundred sentences].
> I use tagger.tag(String[] tokens) to get the POS tags, and tagger.probs()
> immediately afterwards to get the probabilities.
>
> Cheers,
> Jeyendran
>
>
> -----Original Message-----
> From: Jörn Kottmann [mailto:[email protected]]
> Sent: Tuesday, May 22, 2012 12:48 AM
> To: [email protected]
> Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2
>
> Hello,
>
> That looks strange to me; I don't really know which changes caused this.
> Did you run both tests on the same machine?
>
> I will try to reproduce your results.
>
> Thanks for reporting this!
>
> Jörn
>
> On 05/21/2012 01:49 PM, Rupert Westenthaler wrote:
>> Hi,
>>
>> While debugging why POS tags have recently been ignored by the Apache
>> Stanbol Enhancer, I noticed that the reason was that with OpenNLP
>> 1.5.2 the probabilities returned by the POS tagger have changed.
>>
>> Previously, typical probabilities of POS tags were > 0.9 for most of
>> the tokens. Because of that, a configuration that ignores POS tags with
>> a probability < 0.8 looked like a reasonable default. However, with
>> OpenNLP 1.5.2 the probabilities are much lower. At first it even looked
>> as if 1.5.2 now returns the uncertainty ('1-{probability}') instead of
>> the probability, but after looking a little into the source this also
>> seems unlikely to me.
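A minimal sketch of the ignore-below-0.8 rule, assuming it is a plain per-token cutoff on the tag probability. The class and method names are hypothetical (this is not Stanbol's actual configuration code), and the probability arrays are the values reported further down in this message.

```java
import java.util.ArrayList;
import java.util.List;

public class PosConfidenceFilter {

    // Per-token probabilities as reported for OpenNLP 1.5.1 and 1.5.2
    static final double[] PROBS_151 = {1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
            0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
            0.9999998327848956};
    static final double[] PROBS_152 = {0.05013598125548828, 0.053016102976047086,
            0.04032588713661259, 0.03995389549856565, 0.04685198986899964,
            0.03659501930208113, 0.04132356969119329, 0.06434037591280849,
            0.046311143933396866, 0.04233395769746884};

    /** Returns the indices of the tokens whose tag probability reaches the threshold. */
    static List<Integer> confidentTokens(double[] probs, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] >= threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double threshold = 0.8; // the default discussed above
        System.out.println("1.5.1 kept: " + confidentTokens(PROBS_151, threshold).size()); // 10
        System.out.println("1.5.2 kept: " + confidentTokens(PROBS_152, threshold).size()); // 0
    }
}
```

With the 1.5.1 values every tag passes the cutoff; with the 1.5.2 values none do, which matches the symptom of all POS tags being ignored.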
>>
>> I have already searched the Documentation and recent Jira Issues, but
>> I could not find anything related.
>>
>> As an example, here are the results for a single sentence analyzed
>> using OpenNLP 1.5.1 and 1.5.2.
>>
>> Sentence:
>>
>> A nice travel to the biggest volcano of Mexico.
>>
>> The tokens are as expected.
>>
>> With OpenNLP 1.5.1 I get the following top sequence when calling
>> POSTaggerME#topKSequences(tokens):
>>
>> -0.0011259470521596032 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>>
>> Detailed Probabilities:
>>
>> [1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
>> 0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
>> 0.9999998327848956]
>>
>> Switching to OpenNLP 1.5.2 results in:
>>
>> -30.89400016135042 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>>
>> Detailed Probabilities:
>>
>> [0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
>> 0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
>> 0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
>> 0.04233395769746884]
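The two sequence scores appear to be the sums of the natural logarithms of the per-token probabilities: ten probabilities around 0.045 give roughly 10 * ln(0.045) ≈ -31. The sketch below assumes that relationship and simply recomputes both scores from the values listed above; the class and method names are hypothetical.

```java
public class SequenceScoreCheck {

    // Per-token probabilities of the top sequence, as reported above
    static final double[] PROBS_151 = {1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
            0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
            0.9999998327848956};
    static final double[] PROBS_152 = {0.05013598125548828, 0.053016102976047086,
            0.04032588713661259, 0.03995389549856565, 0.04685198986899964,
            0.03659501930208113, 0.04132356969119329, 0.06434037591280849,
            0.046311143933396866, 0.04233395769746884};

    /** Sums the natural logarithms of the per-token probabilities. */
    static double logScore(double[] probs) {
        double score = 0.0;
        for (double p : probs) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(logScore(PROBS_151)); // close to -0.0011259470521596032
        System.out.println(logScore(PROBS_152)); // close to -30.89400016135042
    }
}
```

If the recomputed sums match, the scores and the detailed probabilities are internally consistent in both versions, which would suggest that the change lies in the per-token probabilities themselves rather than in how they are combined into the sequence score.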
>>
>>
>> Is this a bug or an intentional change? If the latter, it would be great
>> if someone could provide a link to the documentation.
>>
>> best
>> Rupert Westenthaler
>>
>>
>> P.S.:
>>
>> By OpenNLP 1.5.1 I mean:
>>
>> opennlp-tools-1.5.1-incubating.jar
>> opennlp-maxent-3.0.1-incubating.jar
>>
>> By OpenNLP 1.5.2 I mean:
>>
>> opennlp-tools-1.5.2-incubating.jar
>> opennlp-maxent-3.0.2-incubating.jar
>>
>> In both cases the "en-pos-maxent.bin" model available via opennlp.sf.net
>> is used.
>>
>
>
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen