Re: [Apertium-stuff] New Occitan-French release

Tino Didriksen Thu, 03 Nov 2022 05:58:47 -0700

On Tue, 1 Nov 2022 at 11:45, Kevin Brubeck Unhammer <unham...@fsfe.org>
wrote:


> Hèctor Alòs i Font <hectoralos-re5jqeeqqe8avxtiumw...@public.gmane.org>
> wrote:
>
> > As for your proposal, I do not yet have sufficient knowledge of CG to
> > fully understand it. My idea would be to make a first pass through a
> > whole text to understand if enunciatives are used in it (for example,
> > recognising other, more infrequent, but more easily recognisable
> > enunciatives). In the solution you propose, it seems that this
> > knowledge is acquired progressively, as sentences are translated. I
> > fear that "que" is so messy that at least the first sentences of a
> > text would have the same problems as we have now when we translate a
> > Gascon text without enunciatives.
>
> That should be possible too, though I'm not sure how feasible it is to
> get CG to go that far into a text. By default, CG keeps a context of two
> windows, but that's configurable. It should be possible (perhaps with
> minor modifications to cg-proc) to read a bunch of sentences and use
> Window Spanning tests https://visl.sdu.dk/cg3/single/#test-spanning
>
> Tino, have you tried looking ahead several paragraphs? Are there any
> downsides? This should be a fairly simple rule file.
>

The max I've seen in production is 9 windows, but there is no hard limit.
You just have to be careful with spanning tests, as they are going to look
ahead for every active window. A multi-pass system will perform better, and
for this particular task I'd say multi-pass is the correct approach.
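
To make the multi-pass idea concrete, here is a rough Python sketch of the shape it could take outside CG: a first pass scans the whole text for easily recognisable enunciatives, and a second pass hands every sentence the resulting text-level flag. This is only an illustration under assumptions, not how Apertium or cg-proc does it. The marker list ("be", "ja"), the function names and the naive sentence splitter are all hypothetical placeholders.

```python
import re

# Hypothetical marker list: stand-ins for the "more infrequent, but more
# easily recognisable" enunciatives. A real list would come from a linguist.
MARKERS = {"be", "ja"}


def split_sentences(text):
    """Naive sentence splitter (assumption: ., ! or ? followed by whitespace)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def text_uses_enunciatives(sentences, markers=MARKERS, threshold=1):
    """Pass 1: count marker tokens over the whole text and compare to a threshold."""
    hits = 0
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        hits += sum(1 for token in tokens if token in markers)
    return hits >= threshold


def annotate(text):
    """Pass 2: pair every sentence with the text-level flag from pass 1."""
    sentences = split_sentences(text)
    flag = text_uses_enunciatives(sentences)
    return [(sentence, flag) for sentence in sentences]
```

Because the flag is computed before any sentence is processed, the first sentence of a text gets the same information as the last one, which is the point of doing it in two passes rather than accumulating knowledge sentence by sentence. In a real pipeline the flag would more likely be passed on as a tag or a mode switch for the later disambiguation step.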

-- Tino Didriksen

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
