Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend
On Fri 02 Sep 2011 at 13:24 +, Francis Tyers wrote:
> Looks good to me, and to Jim. We suggest commit and close. I'm going to
> do one final test, running a corpus with lt-proc -b before and after the
> patch to see if there are any differences. I'll report back soon.

$ wc -l /tmp/ca-BILTRANS.*
  376857 /tmp/ca-BILTRANS.new
  376857 /tmp/ca-BILTRANS.old
  753714 total
$ cmp /tmp/ca-BILTRANS.old /tmp/ca-BILTRANS.new

No changes in ca->en over 376857 lines of the Catalan Wikipedia.

Fran

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
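The before/after check above relies on cmp printing nothing when the two outputs are identical. When they do differ, it helps to see the first differing line. A minimal Python sketch of that follows; the function names are made up for illustration and are not part of lttoolbox or Apertium, and the comparison is line-wise where cmp is byte-wise.

```python
from itertools import zip_longest


def first_difference(old_lines, new_lines):
    """Return (line_number, old, new) for the first differing line, or None.

    A missing line (one input shorter than the other) is reported as None.
    """
    pairs = zip_longest(old_lines, new_lines)
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            return number, old, new
    return None


def compare_files(old_path, new_path):
    """Compare two output files line by line without loading them fully."""
    with open(old_path, encoding="utf-8") as old, \
            open(new_path, encoding="utf-8") as new:
        return first_difference(old, new)
```

With the file names from the message, compare_files("/tmp/ca-BILTRANS.old", "/tmp/ca-BILTRANS.new") would return None if the patch changed nothing.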
Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend
On Fri 02 Sep 2011 at 11:13 +0200, Kevin Brubeck Unhammer wrote:
> I put a patch up at
> http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=131 which
> solves this for both lt-proc -b and biltransWithQueue. Please test.

Looks good to me, and to Jim. We suggest commit and close. I'm going to
do one final test, running a corpus with lt-proc -b before and after the
patch to see if there are any differences. I'll report back soon.

Fran
Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend
Kevin Brubeck Unhammer writes:
> Seems to be a bug with partly-matching regexes in the biltrans
> functions.
>
> Similarly on the command line with lt-proc -b (while regular lt-proc -a
> returns unknown, as it should – the persons/organisations regexes don't
> fully match either).

I put a patch up at
http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=131 which
solves this for both lt-proc -b and biltransWithQueue. Please test.

I haven't tried with the other biltrans* functions (I can't see that
they're actually used in the rest of Apertium, so I'm not sure what
they're there for).

It also fixes a problem where superfluous characters after tags would
pass as matches in lt-proc -b (this bug was not present in
biltransWithQueue). It's still possible to carry over _tags_ after the
analysis, of course.

I guess it's not strange that this bug was here, since normally you
never have words without tags in bidix, but when using these functions
on a monodix it of course becomes a problem. (And, although it's not
recommended, if people really do want to have non-tagged lemmas in
bidix, lttoolbox should at least not give analyses for lemmas that are
_not_ in the bidix.)

best regards,
Kevin Brubeck Unhammer
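To illustrate the distinction drawn in the patch description (trailing tags may be carried over, trailing characters may not), here is a rough Python sketch. It is not the lttoolbox implementation: the dictionary entry, the function name and the "@" fallback for unknown input are assumptions modelled on the output quoted in this thread.

```python
import re

# Toy bilingual dictionary, invented for illustration only.
# Keys are a lemma plus the tags the entry specifies.
BIDIX = {"hestur<n>": "horse<n>"}

# Zero or more well-formed tags such as <sg><nom>, and nothing else.
TAGS_ONLY = re.compile(r"(<[^<>]+>)*")


def biltrans_sketch(lexical_form):
    """Translate the longest known prefix; whatever is left must be tags."""
    for end in range(len(lexical_form), 0, -1):
        head, rest = lexical_form[:end], lexical_form[end:]
        if head in BIDIX and TAGS_ONLY.fullmatch(rest):
            # Remaining tags are carried over to the output unchanged.
            return BIDIX[head] + rest
    # No entry covers the input: mark it as unknown.
    return "@" + lexical_form
```

In this sketch "hestur<n><sg><nom>" is translated with its extra tags carried over, while "hestur<n>xyz" is rejected as unknown because "xyz" is not a tag.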
Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend
Francis Tyers writes:
> On Sun 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
>> On Sunday 28 February 2010, Francis Tyers wrote:
>> > > I don't know Icelandic at all and therefore can't tell whether some of
>> > > the words are accepted or rejected incorrectly.
>> >
>> > Nice, it looks good. Some of the capitalised words should be recognised
>> > correctly, at least 'Bretlandi' and 'Norðmenn'.
>>
>> I tried to fix the checking of capitalized words but started to run into
>> problems. It seems that the library API works in somewhat surprising (at
>> least to me) ways when you enter a word that starts with a capital letter
>> and ends with garbage.
>>
>> The implementation is here
>> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
>>
>> and test cases here
>> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
>>
>> I was able to get all test cases except the one with TODO in method name
>> implemented. How would you suggest fixing the code so that all tests would
>> pass? Of course a patch would be most welcome :)
>
> Hmm, strangely enough, when I try an unknown word I get similarly strange
> output:
>
> $ ./test mor.bin
> ^Reykjanghfghesi$ -->
> ^Reykja/Reykja/Reykur$

Seems to be a bug with partly-matching regexes in the biltrans functions.

Testing the different functions, I get:

biltransWithQueue:
^Reykja/Reykja/Reykur$
qSize: 0
biltransWithoutQueue:
^Reykja/Reykja/Reykur$
biltrans:
^Reykja/Reykja/Reykur$
biltransfull: ^$

But, if I comment out the two regex entries at the end of
apertium-is-en.is.dix, I get

biltransWithQueue: @Reykjanghfghesi
qSize: 0
biltransWithoutQueue: @Reykjanghfghesi
biltrans: @Reykjanghfghesi
biltransfull: @Reykjanghfghesi

Similarly on the command line with lt-proc -b (while regular lt-proc -a
returns unknown, as it should – the persons/organisations regexes don't
fully match either).

--
Kevin Brubeck Unhammer
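The symptom described above, where a prefix of the word is accepted and the rest is silently dropped, is the usual difference between a prefix match and a full match. The following Python sketch shows the same effect with the standard re module. The pattern is a hypothetical stand-in, since the actual regex entries from apertium-is-en.is.dix are not reproduced in this thread, and the "@" marker for unknown words simply mirrors the output quoted above.

```python
import re

# Hypothetical stand-in for a dictionary regex entry:
# one capital letter followed by exactly five lowercase letters.
NAME_PATTERN = re.compile(r"[A-Z][a-z]{5}")


def lookup_prefix(word):
    """Buggy behaviour: accept as soon as the pattern matches a prefix."""
    match = NAME_PATTERN.match(word)
    return match.group(0) if match else "@" + word


def lookup_full(word):
    """Intended behaviour: accept only if the whole word is consumed."""
    match = NAME_PATTERN.fullmatch(word)
    return match.group(0) if match else "@" + word
```

Here lookup_prefix("Reykjanghfghesi") returns "Reykja", much like the output above, whereas lookup_full returns "@Reykjanghfghesi" because the trailing characters are not covered by the pattern.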