On Tue, 1 Nov 2022 at 11:45, Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:
> Hèctor Alòs i Font <hectoralos-re5jqeeqqe8avxtiumw...@public.gmane.org>
> wrote:
>
> > As for your proposal, I do not yet have sufficient knowledge of CG to
> > fully understand it. My idea would be to make a first pass through a
> > whole text to understand if enunciatives are used in it (for example,
> > recognising other, more infrequent, but more easily recognisable
> > enunciatives). In the solution you propose, it seems that this
> > knowledge is acquired progressively, as sentences are translated. I
> > fear that "que" is so messy that at least the first sentences of a
> > text would have the same problems as we have now when we translate a
> > Gascon text without enunciatives.
>
> That should be possible too, though I'm not sure how feasible it is to
> get CG to go that far into a text. By default, CG keeps a context of two
> windows, but that's configurable. It should be possible (perhaps with
> minor modifications to cg-proc) to read a bunch of sentences and use
> Window Spanning tests https://visl.sdu.dk/cg3/single/#test-spanning
>
> Tino, have you tried looking ahead several paragraphs, are there any
> downsides? This should be a fairly simple rule file.

The max I've seen in production is 9 windows, but there is no hard limit.
You just have to be careful with spanning tests, as they are going to look
ahead for every active window.

A multi-pass system will perform better, and for this particular task I'd
say multi-pass is the correct approach.

-- Tino Didriksen
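
To make the multi-pass idea concrete, here is a minimal Python sketch of the flow only. It is not CG-3 and not anything that exists in Apertium. The marker set, the threshold, the label names and every function name are assumptions made up for illustration. Pass 1 scans the whole text for easily recognised markers, and pass 2 labels "que" using that text-level flag, so the first sentences benefit from evidence found later in the text.

```python
import re

# Hypothetical marker set: stand-ins for the "more easily recognisable"
# enunciatives mentioned in the thread. A real list would need a linguist.
EASY_MARKERS = {"be", "ja"}


def tokenize(sentence):
    """Lowercased word tokens; a naive stand-in for morphological analysis."""
    return re.findall(r"\w+", sentence.lower())


def text_uses_enunciatives(sentences, min_hits=2):
    """Pass 1: count easily recognised markers over the whole text."""
    hits = sum(tok in EASY_MARKERS for s in sentences for tok in tokenize(s))
    return hits >= min_hits


def tag_que(sentences, uses_enunciatives):
    """Pass 2: label every 'que' with the text-level flag from pass 1."""
    # The labels are invented for this sketch, not real Apertium tags.
    label = "que<enunciative?>" if uses_enunciatives else "que<other>"
    return [[label if tok == "que" else tok for tok in tokenize(s)]
            for s in sentences]


def two_pass(sentences):
    """Run both passes and return the flag together with the tagged text."""
    flag = text_uses_enunciatives(sentences)
    return flag, tag_que(sentences, flag)
```

In a real pipeline the second pass would be a CG rule file that receives the flag (for example as a tag injected on each window) and uses it as one condition among others, rather than relabelling every "que" blindly as this sketch does.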
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff