Hi Rupert, are you sure about loading a model from "en--pos-perceptron.bin" but passing it to a POSTaggerME tagger? Unless it is a typo in your email, that could be the problem (i.e. passing a perceptron model to a maxent classifier)...
Cheers,
Jeyendran

-----Original Message-----
From: Rupert Westenthaler [mailto:[email protected]]
Sent: Tuesday, May 22, 2012 5:01 AM
To: [email protected]; [email protected]
Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2

Hi

Maybe it has to do with the fact that I use topKSequences(String[] sentence) instead. Basically I use:

    String sentence = "A nice travel to the biggest volcano of Mexico.";
    Span[] tokenSpans = tokenizer().tokenizePos(sentence);
    String[] tokens = new String[tokenSpans.length];
    POSModel posModel; // loaded from "en--pos-perceptron.bin"
    POSTaggerME tagger = new POSTaggerME(posModel);
    for (int ti = 0; ti < tokenSpans.length; ti++) {
        tokens[ti] = tokenSpans[ti].getCoveredText(sentence).toString();
    }
    Sequence[] posSequences = tagger.topKSequences(tokens);

The source can be found at [1] (line 440 - 460).

I cannot easily switch between 1.5.1 and 1.5.2 in Stanbol, as Stanbol now uses 1.5.2 as an OSGi bundle (which is not possible with 1.5.1). So I used an older version of Stanbol for comparing. However, the affected code has not changed since then.

best
Rupert

[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/TextAnalyzer.java

On Tue, May 22, 2012 at 10:06 AM, Jeyendran Balakrishnan <[email protected]> wrote:
> FWIW, I'm using the POS tagger in 1.5.2, and getting probabilities of between
> 0.5 to 1 for most tags [tested on a few hundred sentences].
> I use tagger.tag(String[] tokens) to get the POS tags, and tagger.probs()
> immediately afterwards to get the probabilities.
>
> Cheers,
> Jeyendran
>
>
> -----Original Message-----
> From: Jörn Kottmann [mailto:[email protected]]
> Sent: Tuesday, May 22, 2012 12:48 AM
> To: [email protected]
> Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and
> 1.5.2
>
> Hello,
>
> that looks strange to me; I don't really know which changes caused this.
> Did you run both tests on the same machine?
>
> I will try to reproduce your results.
>
> Thanks for reporting this!
>
> Jörn
>
> On 05/21/2012 01:49 PM, Rupert Westenthaler wrote:
>> Hi,
>>
>> While debugging why POS tags are recently ignored by the Apache
>> Stanbol Enhancer, I noticed that the reason was that with OpenNLP
>> 1.5.2 the probabilities returned by the POS tagger have changed.
>>
>> Previously, typical probabilities of POS tags were > 0.9 for most of
>> the tokens. Because of that, a configuration that ignores POS tags
>> with probability < 0.8 looked like a reasonable default. However,
>> with OpenNLP 1.5.2 the probabilities are much lower. At first it even
>> looked like 1.5.2 now returns the uncertainty ('1 - {probability}')
>> instead of the probability, but after looking a little bit into the
>> source this also seems unlikely to me.
>>
>> I have already searched the documentation and recent JIRA issues, but
>> I could not find anything related.
>>
>> As an example, here are the results for a single sentence analyzed
>> using OpenNLP 1.5.1 and 1.5.2.
>>
>> Sentence:
>>
>> A nice travel to the biggest volcano of Mexico.
>>
>> The tokens are as expected.
>>
>> With OpenNLP 1.5.1 I get the following top sequence when calling
>> POSTaggerME#topKSequences(tokens):
>>
>> -0.0011259470521596032 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>>
>> Detailed probabilities:
>>
>> [1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
>> 0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
>> 0.9999998327848956]
>>
>> Switching to OpenNLP 1.5.2 results in
>>
>> -30.89400016135042 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>>
>> Detailed probabilities:
>>
>> [0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
>> 0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
>> 0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
>> 0.04233395769746884]
>>
>>
>> Is this a bug or an intentional change? If the latter, it would be
>> great if someone could provide a link to the documentation.
>>
>> best
>> Rupert Westenthaler
>>
>>
>> p.s.:
>>
>> With OpenNLP 1.5.1 I refer to
>>
>> opennlp-tools-1.5.1-incubating.jar
>> opennlp-maxent-3.0.1-incubating.jar
>>
>> With OpenNLP 1.5.2 I refer to
>>
>> opennlp-tools-1.5.2-incubating.jar
>> opennlp-maxent-3.0.2-incubating.jar
>>
>> In both cases the "en-pos-maxent.bin" model as available via
>> opennlp.sf.org is used.
>>

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
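[Editorial note, not part of the original thread: a quick arithmetic check on the numbers quoted above. If the score that topKSequences() reports is the natural-log probability of the whole tag sequence, then summing Math.log over the per-token probabilities should reproduce it. It does, for both versions, which suggests the 1.5.2 output is at least internally consistent; what changed is only the magnitude of the per-token probabilities themselves. The class name below is made up for the sketch.]

```java
// Editorial sketch: verify that the topKSequences() sequence score equals the
// sum of the natural logs of the per-token probabilities quoted in the thread.
public class SequenceScoreCheck {

    // Sum of ln(p) over all per-token probabilities, i.e. the log probability
    // of the whole tag sequence (assuming independent per-token scores).
    public static double sumLog(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Per-token probabilities reported with OpenNLP 1.5.1
        double[] probs151 = {1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
                0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
                0.9999998327848956};
        // Per-token probabilities reported with OpenNLP 1.5.2
        double[] probs152 = {0.05013598125548828, 0.053016102976047086,
                0.04032588713661259, 0.03995389549856565, 0.04685198986899964,
                0.03659501930208113, 0.04132356969119329, 0.06434037591280849,
                0.046311143933396866, 0.04233395769746884};

        System.out.println(sumLog(probs151)); // matches the 1.5.1 score, about -0.0011259
        System.out.println(sumLog(probs152)); // matches the 1.5.2 score, about -30.8940
    }
}
```

So the question in the thread reduces to why the per-token probabilities dropped from near 1.0 to around 0.04-0.06, not to a change in how the sequence score is derived from them.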
