Xavi Ivars <x...@infobenissa.com> writes:

> Not strictly related to this, but I've found what I think is a bug in
> the HMM tagger working with the NULL FLUSH.
>
> You can check it here
>
> http://api.apertium.org/json/translate?langpair=ca|es&q=correu gives
> "corred" (verb)
>
> http://api.apertium.org/json/translate?langpair=ca|es&q=.%20correu
> gives "correo" (noun)
>
> But I could not reproduce it in local (I guess I'm using the null
> flush in a wrong way), but when using (in local) apertium without null
> flush, both texts (". correu" and "correu") return "correo".
Using the client/server bash scripts from
http://comments.gmane.org/gmane.comp.nlp.apertium/3665

  $ ./server apertium-tagger -g -z ca-es.prob & sleep 1
  $ echo 'correu' | apertium -d . ca-es-anmor | ./client
  ^correu<n><m><sg>$^.<sent>$
  $ echo 'correu' | apertium -d . ca-es-anmor | ./client
  ^córrer<vblex><imp><p2><pl>$^.<sent>$
  $ echo 'correu' | apertium -d . ca-es-anmor | ./client
  ^córrer<vblex><imp><p2><pl>$^.<sent>$
  $ echo '. correu' | apertium -d . ca-es-anmor | ./client
  ^.<sent>$ ^correu<n><m><sg>$^.<sent>$
  $ echo 'correu' | apertium -d . ca-es-anmor | ./client
  ^córrer<vblex><imp><p2><pl>$^.<sent>$

So right after startup, we get <n>, then we get <vblex>. A period gives
an <n> again, then we are back to vblex.

> My guess is that when the tagger gets a null character, all the vector
> are not reinitialized (the only thing that it does related to
> reinitializing is this, starting at line #856)
>
>     if(morpho_stream.getEndOfFile())
>     {
>       if(null_flush)
>       {
>         fputwc_unlocked(L'\0', out);
>       }
>
>       fflush(out);
>       morpho_stream.setEndOfFile(false);
>     }

This seems to work:

    if(morpho_stream.getEndOfFile())
    {
      if(null_flush)
      {
        fputwc_unlocked(L'\0', out);
        tags.clear();
        tags.insert(eos);
        alpha[0][eos] = 1;
      }

      fflush(out);
      morpho_stream.setEndOfFile(false);
    }

I'm not 100 % sure if "alpha[0][eos] = 1;" is needed. It seems to give
the same result without it too, but it is set when starting the tagging
and doesn't hurt so I included it.

Felipe (or anyone else who understands the HMM): does this look right?
Should I just commit?

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C
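
To illustrate the failure mode described above, here is a minimal toy sketch of a forward pass that keeps its state between inputs. It is not the apertium-tagger code: the ToyTagger struct, the TRANS table, its probabilities and the tag_chunks helper are all made up for illustration. Each "chunk" stands for one request containing a single noun/verb-ambiguous word, and reset() plays the role of the tags.clear() / tags.insert(eos) / alpha reset in the patch.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical toy model, NOT the real apertium-tagger data structures.
enum Tag { EOS = 0, NOUN = 1, VERB = 2 };

// Made-up transition probabilities P(next | prev), indexed [prev][next].
static const double TRANS[3][3] = {
    /* from EOS  */ {0.0, 0.7, 0.3},
    /* from NOUN */ {0.1, 0.1, 0.8},
    /* from VERB */ {0.1, 0.3, 0.6},
};

struct ToyTagger {
    std::set<Tag> tags;            // tags possible at the previous position
    std::map<Tag, double> alpha;   // forward probabilities for those tags

    ToyTagger() { reset(); }

    // Put the tagger back into its start-of-input state.
    void reset() {
        tags.clear();
        tags.insert(EOS);
        alpha.clear();
        alpha[EOS] = 1.0;
    }

    // Consume one word with the given (non-empty) set of candidate tags
    // and return the candidate with the highest forward probability.
    Tag step(const std::set<Tag>& candidates) {
        std::map<Tag, double> next;
        double total = 0.0;
        Tag best = *candidates.begin();
        for (Tag c : candidates) {
            double p = 0.0;
            for (Tag prev : tags) p += alpha[prev] * TRANS[prev][c];
            next[c] = p;
            total += p;
            if (p > next[best]) best = c;
        }
        // Normalise so the values stay comparable across steps.
        if (total > 0.0) {
            for (auto& kv : next) kv.second /= total;
        }
        tags = candidates;
        alpha = next;
        return best;
    }
};

// Tag n one-word chunks; optionally reset the state after each chunk,
// which is what a flush boundary is supposed to imply.
std::vector<Tag> tag_chunks(int n, bool reset_on_flush) {
    ToyTagger tagger;
    const std::set<Tag> ambiguous = {NOUN, VERB};
    std::vector<Tag> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(tagger.step(ambiguous));
        if (reset_on_flush) tagger.reset();
    }
    return out;
}
```

With these invented numbers, tag_chunks(3, false) picks NOUN for the first chunk and VERB for the following ones, because the state left over from the previous chunk changes the decision. tag_chunks(3, true) picks NOUN every time. The real tagger's model and state are of course more involved, so this only shows the shape of the problem.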
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff