Thank you both for the suggestions. I never considered CG because it looked complicated, but I actually got the hang of it right away. I went with:

REMOVE NOUN IF (0 DET) (0 NOUN) (1 (n mp));

and it works perfectly. It did not work with 1C there. I looked up the C symbol in the documentation and it says "Every reading this position must match the pattern (normally only 1 has to)". I don't know what this sentence means. Every time this position is read, it must match the pattern? Can I find any elaboration on this anywhere? I checked http://beta.visl.sdu.dk/cg3/single/ but can't seem to find anything about it there.

Thank you!
Greg

On Tuesday, 21 Dec 2021 at 09:25, Hèctor Alòs i Font (hectora...@gmail.com) wrote:

> Message from Daniel Swanson <awesomeevildu...@gmail.com> on Tue, 21 Dec 2021 at 7:57:
>
> > Hi Greg,
> >
> > The file where you want to write rules for this is
> > https://github.com/apertium/apertium-pol/blob/master/apertium-pol.pol.rlx
> >
> > If you want something like "tacy is <det> before <n>", you could get that with
> >
> > SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;
>
> The problem with this rule is that (1 NOUN) is not necessarily a noun, but something that can be analysed as a noun at the moment this rule is executed. Similarly, the 0 word may be correctly analysed as something else, like an adjective. So a more cautious rule could be, for instance:
>
> REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;
>
> The problem with this alternative variant of the rule is that it matches less often than the first one. It may not solve cases that Daniel's version solves, although it probably makes fewer wrong decisions. Your knowledge of the language, and testing on a corpus, should help you decide which is better, or maybe you will choose something in between. Tuning can be done by adding a few rules, placed before the general one, for frequent words/cases.
> Hèctor
>
> > Daniel
> >
> > On Mon, Dec 20, 2021 at 1:40 PM Grzegorz Kulik <gregorykku...@gmail.com> wrote:
> > >
> > > Hello all,
> > >
> > > I haven't contacted you for some time, I hope you are all well. I developed the pol-szl pair and although the translation is quite reasonable, I decided to make it better by improving the lexical selection. I've been reading the documentation and managed to write several rules for forms that need disambiguation and are the same part of speech. However, I cannot find any information anywhere about what to do if there is a form that can mean two completely different things. Example in Polish:
> > >
> > > tacy (such) = taki<det><dem><mp><pl><nom>
> > > tacy (of a tablet) = taca<n><f><sg><gen>/taca<n><f><sg><dat>/taca<n><f><sg><loc>
> > >
> > > The first meaning is obviously much more frequent but the translator chooses the second one, which is less than desirable.
> > >
> > > What can I do to remedy this? Can I write rules for that manually? Should I train the tagger? If so, what method would be the best? There are multiple training methods and I don't know which one to choose for my pair. Could you recommend the best approach?
> > >
> > > Thank you in advance
> > > Greg
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff