Xavi Ivars <x...@infobenissa.com> writes:

> Not strictly related to this, but I've found what I think is a bug in
> the HMM tagger working with the NULL FLUSH.
>
> You can check it here
>
> http://api.apertium.org/json/translate?langpair=ca|es&q=correu gives
> "corred" (verb)
>
> http://api.apertium.org/json/translate?langpair=ca|es&q=.%20correu
> gives "correo" (noun)
>
> But I could not reproduce it in local (I guess I'm using the null
> flush in a wrong way), but when using (in local) apertium without null
> flush, both texts (". correu" and "correu") return "correo".

I can reproduce it locally using the client/server bash scripts from
http://comments.gmane.org/gmane.comp.nlp.apertium/3665

$ ./server apertium-tagger -g -z ca-es.prob & sleep 1

$ echo 'correu' | apertium -d . ca-es-anmor | ./client
^correu<n><m><sg>$^.<sent>$

$ echo 'correu' | apertium -d . ca-es-anmor | ./client 
^córrer<vblex><imp><p2><pl>$^.<sent>$

$ echo 'correu' | apertium -d . ca-es-anmor | ./client 
^córrer<vblex><imp><p2><pl>$^.<sent>$

$ echo '. correu' | apertium -d . ca-es-anmor | ./client 
^.<sent>$ ^correu<n><m><sg>$^.<sent>$

$ echo 'correu' | apertium -d . ca-es-anmor | ./client 
^córrer<vblex><imp><p2><pl>$^.<sent>$


So right after startup we get <n>, and on the next requests we get
<vblex>. Input that starts with a period gives <n> again, and after that
we are back to <vblex>.

> My guess is that when the tagger gets a null character, all the vector
> are not reinitialized (the only thing that it does related to
> reinitializing is this, starting at line #856)
>
> if(morpho_stream.getEndOfFile())
> {
> if(null_flush)
> {
> fputwc_unlocked(L'\0', out);
> }
>
> fflush(out);
> morpho_stream.setEndOfFile(false);
> }

This seems to work:

    if(morpho_stream.getEndOfFile())
    {
      if(null_flush)
      { 
        fputwc_unlocked(L'\0', out);
        tags.clear();
        tags.insert(eos);
        alpha[0][eos] = 1;
      }
      
      fflush(out);
      morpho_stream.setEndOfFile(false);
    }

I'm not 100% sure whether "alpha[0][eos] = 1;" is needed. The result
seems to be the same without it, but it is set when tagging starts and
doesn't hurt, so I included it.
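
To illustrate why resetting at the flush boundary matters, here is a toy
sketch. It is not the apertium-tagger code: the tag inventory, the
transition scores and the names ToyTagger, tag_word and run_requests are
all invented for the example, it picks tags greedily instead of running
the real HMM algorithm, and it leaves out the trailing ^.<sent>$ seen in
the session above. It only shows that state carried over from one request
to the next can make the same input come out differently, and that
restarting from eos makes every request behave like the first one.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy tag inventory. EOS stands in for the tagger's `eos` tag.
enum Tag { EOS, NOUN, VERB };

// Invented transition scores, picked only so the toy shows the symptom.
double trans(Tag from, Tag to) {
  static const double table[3][3] = {
      //          EOS  NOUN VERB
      /* EOS  */ {0.0, 0.6, 0.4},
      /* NOUN */ {0.2, 0.3, 0.5},
      /* VERB */ {0.2, 0.3, 0.5},
  };
  return table[from][to];
}

struct ToyTagger {
  std::set<Tag> tags;  // tags of the previous word, like `tags` in the patch

  ToyTagger() { reset(); }

  // What the patch does on a null flush: start again from EOS.
  void reset() {
    tags.clear();
    tags.insert(EOS);
  }

  // Greedy choice: the best-scoring candidate given the previous tags.
  Tag tag_word(const std::set<Tag>& candidates) {
    assert(!candidates.empty());
    Tag best = *candidates.begin();
    double best_score = -1.0;
    for (Tag prev : tags) {
      for (Tag cand : candidates) {
        double score = trans(prev, cand);
        if (score > best_score) {
          best_score = score;
          best = cand;
        }
      }
    }
    // The chosen tag becomes the context for the next word.
    tags.clear();
    tags.insert(best);
    return best;
  }
};

// Tag one ambiguous word per request; every request ends with a flush.
std::vector<Tag> run_requests(int n, bool reset_on_flush) {
  ToyTagger tagger;
  const std::set<Tag> correu = {NOUN, VERB};  // ambiguous, like "correu"
  std::vector<Tag> out;
  for (int i = 0; i < n; ++i) {
    out.push_back(tagger.tag_word(correu));
    if (reset_on_flush) tagger.reset();
  }
  return out;
}
```

With these made-up scores, run_requests(3, false) yields NOUN, VERB, VERB
(the stale context leaks into later requests), while
run_requests(3, true) yields NOUN each time.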

Felipe (or anyone else who understands the HMM): does this look right?
Should I just commit?


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
