[Apertium-stuff] Re: Re: Re: Questions about lexical selection

Grzegorz Kulik Wed, 22 Dec 2021 05:35:22 -0800

I have another question. In Polish "ich" can mean both "their" and
"them". The tagger always chooses the "their" meaning, so I want to
create a negative match, so that if the word is not followed by a noun,
it would choose "them". I'm trying to put it together based on the
documentation but I must be missing something.
REMOVE ICH IF (0 ICH LINK NOT 1 NOUN);
If this worked, it would tag the wordform as an inflected personal
pronoun. What am I doing wrong?
Best
Greg
On Tuesday, 21 Dec 2021 at 17:59, Grzegorz Kulik (
gregorykku...@gmail.com) wrote:
> Now I get it. Thank you very much!
> 
> Best
> Greg
> 
> On Tuesday, 21 Dec 2021 at 10:27, Daniel Swanson (
> awesomeevildu...@gmail.com) wrote:
> > (1C NOUN) would match ^a/b<n>/c<n>$ or ^a/b<n>$ but it would not
> > match ^a/b<n>/c<adj>$ - you can read it as "if the next word can
> > only be a noun".
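
To illustrate the difference between a plain position and a careful (C) position described above, here is a minimal sketch rather than a tested grammar. The set names NOUN and DET and the tags n and det follow the rules quoted in this thread; the DELIMITERS line is an assumption, and the two rules are alternatives, so in practice you would keep only one of them.

```
# Sketch only. Set names and tags follow the rules quoted in this thread;
# the delimiter list is an assumption.
DELIMITERS = "<.>" "<!>" "<?>" ;

LIST NOUN = n ;
LIST DET = det ;

SECTION

# Plain position: fires if at least one reading of the next cohort is a
# noun, so ^a/b<n>/c<adj>$ is enough, as is ^a/b<n>$.
SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;

# Careful position: fires only if every reading of the next cohort is a
# noun, so ^a/b<n>/c<n>$ and ^a/b<n>$ qualify but ^a/b<n>/c<adj>$ does not.
REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;
```
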
> > On Tue, Dec 21, 2021 at 10:10 AM Grzegorz Kulik <
> > gregorykku...@gmail.com> wrote:
> > > Thank you both for the suggestions. I never considered CG because
> > > it looked complicated but I actually got a grip of it right away.
> > > I went with:
> > > REMOVE NOUN IF (0 DET) (0 NOUN) (1 (n mp));
> > > and it works perfectly. It did not work with 1C there. I looked
> > > up the C symbol in the documentation and it says "Every reading
> > > this position must match the pattern (normally only 1 has to)". I
> > > don't know what this sentence means. Every time this position is
> > > read, it must match the pattern? Can I find any elaboration on
> > > this anywhere? I checked http://beta.visl.sdu.dk/cg3/single/ but
> > > can't seem to find anything about it there.
> > > Thank you!
> > > Greg
> > > On Tuesday, 21 Dec 2021 at 09:25, Hèctor Alòs i Font (
> > > hectora...@gmail.com) wrote:
> > > 
> > > 
> > > Message from Daniel Swanson <awesomeevildu...@gmail.com> on
> > > Tue, 21 Dec 2021 at 7:57:
> > > Hi Greg,
> > > The file where you want to write rules for this is
> > > https://github.com/apertium/apertium-pol/blob/master/apertium-pol.pol.rlx
> > > 
> > > If you want something like "tacy is <det> before <n>", you could
> > > get that with
> > > SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;
> > > 
> > > The problem with this rule is that (1 NOUN) is not necessarily a
> > > noun, but something that can be analysed as a noun at the moment
> > > this rule is executed. Similarly, the 0 word may be correctly
> > > analysed as something else, like an adjective. So, a more
> > > cautious rule can be, for instance:
> > > REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;
> > > The problem with this alternative variant of the rule is that it
> > > matches less often than the first one. It may not solve cases that
> > > Daniel's version solves, although it probably makes fewer wrong
> > > decisions. Your knowledge of the language, and testing on a corpus,
> > > should help you decide which is better, or maybe you will choose
> > > something in between. Tuning can be done by adding a few rules,
> > > placed before the general one, for frequent words or cases.
> > > Hèctor
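
The tuning idea above (a few specific rules placed before the general one) might look like the following sketch. It is an illustration under assumptions, not a rule taken from apertium-pol: the lemma "taki" comes from the example in this thread, while the decision to give it its own rule and the DELIMITERS line are hypothetical.

```
# Sketch only. The lemma "taki" is from the thread's example; treating it
# with a dedicated rule is a hypothetical choice for illustration.
DELIMITERS = "<.>" "<!>" "<?>" ;

LIST NOUN = n ;
LIST DET = det ;

SECTION

# Word-specific rule, written first: for the lemma "taki", any possible
# noun reading in the next cohort is accepted as evidence.
SELECT DET IF (0 ("taki")) (1 NOUN) ;

# General, cautious rule afterwards: for all other words, require that the
# next cohort can only be a noun.
REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;
```

Keeping the specific rule ahead of the general one means the frequent word is handled by the more permissive test, while everything else still goes through the careful one.
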
> > > 
> > > 
> > > Daniel
> > > On Mon, Dec 20, 2021 at 1:40 PM Grzegorz Kulik <
> > > gregorykku...@gmail.com> wrote:
> > > > Hello all,
> > > > I haven't contacted you for some time, I hope you are all well.
> > > > I developed the pol-szl pair and although the translation is
> > > > quite reasonable, I decided to make it better by improving the
> > > > lexical selection. I've been reading the documentation and
> > > > managed to write several rules for forms that need
> > > > disambiguation and are the same parts of speech. However, I
> > > > cannot find any information anywhere about what to do if there
> > > > is a form that can mean two completely different things.
> > > > Example in Polish:
> > > > tacy (such) = taki<det><dem><mp><pl><nom>
> > > > tacy (of a tablet) =
> > > > taca<n><f><sg><gen>/taca<n><f><sg><dat>/taca<n><f><sg><loc>
> > > > The first meaning is obviously much more frequent but the
> > > > translator chooses the second one, which is less than
> > > > desirable.
> > > > What can I do to remedy this? Can I write rules for that
> > > > manually? Should I train the tagger? If so, what method would
> > > > be the best? There are multiple training methods and I don't know
> > > > which one to choose for my pair. Could you recommend the best
> > > > approach?
> > > > Thank you in advance
> > > > Greg
> > > > _______________________________________________
> > > > Apertium-stuff mailing list
> > > > apertium-st...@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/apertium-stuff

