What if you do

lt-proc oci.automorf.bin | cg-proc enondetect.rlx.bin | cg-proc oci.rlx.bin | …

The first CG step would output a stream variable, so that what the next
step sees is

[more text here]

If the next step is CG, it's just

 REMOVE:var-is-set (enon) IF (0 (VAR:non-enon)) ;

i.e. remove enunciatives whenever the variable is set.

One can also unset it in the middle of the stream (if doing corpus
runs), so the output of the enon-detector is

[more text here]
[more text here]

and the REMOVE:var-is-set rule will remove enunciatives only in the
first part, and no longer once it has seen the REMVARIABLE.

Then the problem of looking several windows ahead is restricted to that
first enon-detector step.
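
To make the scoping concrete, here is a toy model in Python of how a
variable would be switched on and off along the stream. It is only a
sketch, not how cg-proc is implemented: it assumes the commands look
like the CG-3 stream commands <STREAMCMD:SETVAR:...> and
<STREAMCMD:REMVAR:...>, it ignores name=value variables and window
boundaries, and segments_with_vars is a made-up helper name.

```python
import re

# Assumed marker shape, following the CG-3 stream-command convention.
STREAMCMD = re.compile(r"<STREAMCMD:(SETVAR|REMVAR):([^>]*)>")


def segments_with_vars(stream):
    """Split a stream at SETVAR/REMVAR commands.

    Returns a list of (text, active_variables) pairs, one per non-empty
    stretch of text between commands.
    """
    active = set()
    out = []
    pos = 0
    for m in STREAMCMD.finditer(stream):
        text = stream[pos:m.start()].strip()
        if text:
            out.append((text, frozenset(active)))
        # The command carries a comma-separated list of variable names.
        names = {n.strip() for n in m.group(2).split(",") if n.strip()}
        if m.group(1) == "SETVAR":
            active |= names
        else:
            active -= names
        pos = m.end()
    tail = stream[pos:].strip()
    if tail:
        out.append((tail, frozenset(active)))
    return out


demo = ("<STREAMCMD:SETVAR:non-enon> first part "
        "<STREAMCMD:REMVAR:non-enon> second part")
for text, variables in segments_with_vars(demo):
    print(text, sorted(variables))
```

A rule conditioned on the variable would only apply in segments where
non-enon is in the active set, which here is the first part only.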


Alternatively, if we assume all the input is in the same language, but
we just don't know which language ahead of time, you could do several
passes, where one is a detector pipeline like

lt-proc oci.automorf.bin | cg-proc enondetect.rlx.bin

that outputs the STREAMCMD; APy would then grep for that and insert
the STREAMCMD at the start of the call to the regular pipeline

lt-proc oci.automorf.bin | cg-proc oci.rlx.bin | …

That won't automatically work in modes files, and won't work for corpus
tests if the corpus has a mix, but OTOH you could use 'export
AP_SETVAR=non-enon' to force the regular pipeline to insert the
STREAMCMD at the start.
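
The grep-and-insert step could be sketched roughly like this in Python.
This is not existing APy code: find_setvar and prepend_streamcmd are
hypothetical helpers, the command is assumed to look like the CG-3
stream command <STREAMCMD:SETVAR:...>, and whether it needs any
wrapping to pass through lt-proc untouched is left open.

```python
import re

# Assumed shape of the command emitted by the detector pipeline.
SETVAR = re.compile(r"<STREAMCMD:SETVAR:[^>]*>")


def find_setvar(detector_output):
    """Return the first SETVAR stream command in the detector output,
    or None if the detector did not emit one."""
    m = SETVAR.search(detector_output)
    return m.group(0) if m else None


def prepend_streamcmd(text, detector_output):
    """Put the detected SETVAR command in front of the text that will be
    sent to the regular pipeline; leave the text alone if none was found."""
    cmd = find_setvar(detector_output)
    if cmd is None:
        return text
    return cmd + "\n" + text


detected = "<STREAMCMD:SETVAR:non-enon>\n^que/que<cnjsub>$"
print(prepend_streamcmd("Que venga.", detected))
```

In a real setup detector_output would be whatever the detector pipeline
printed for the same input, and the result would be fed to the regular
pipeline.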
