What if you do

  lt-proc oci.automorf.bin | cg-proc enondetect.rlx.bin | cg-proc oci.rlx.bin | …

The first CG step would output a stream variable, so that what the next step sees is

  [<STREAMCMD:SETVARIABLE:non-enon>] ^que/que<enon>/que<itg>$ [more text here]

If the next step is CG, it's just

  REMOVE:var-is-set (enon) IF (0 (VAR:non-enon)) ;

i.e. remove enunciatives whenever the var is set.

One can also unset it in the middle of the stream (if doing corpus runs), so the output of the enon-detector is

  [<STREAMCMD:SETVARIABLE:non-enon>] ^que/que<enon>/que<itg>$ [more text here]
  [<STREAMCMD:REMVARIABLE:non-enon>] ^que/que<enon>/que<itg>$ [more text here]

and the REMOVE:var-is-set rule will remove enunciatives in the first part, but not after it has seen the REMVARIABLE.

Then the problem of looking several windows ahead is restricted to that first enon-detector step.

----

Alternatively, if we assume all the input is in the same language and we just don't know ahead of time which one, you could do several passes, where one is a detector pipeline like

  lt-proc oci.automorf.bin | cg-proc enondetect.rlx.bin

that outputs the STREAMCMD. APy would then grep for that and insert the STREAMCMD at the start of the call to the regular pipeline

  lt-proc oci.automorf.bin | cg-proc oci.rlx.bin | …

That won't automatically work in modes files, and it won't work for corpus tests if the corpus has a mix, but OTOH you could use 'export AP_SETVAR=non-enon' to force the regular pipeline to insert the STREAMCMD at the start.
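To make the scoping of the stream variable in the first approach concrete, here is a toy Python model of what the REMOVE:var-is-set step is meant to do to the stream. It is only a sketch and not how cg-proc works internally: the function name remove_enon_while_set, the simplified regular expressions (no handling of escaped characters) and the "keep the last reading" fallback are all assumptions made for illustration.

```python
import re

# Toy model only: simplified token patterns, not the real stream parser.
STREAMCMD = re.compile(r"\[<STREAMCMD:(SETVARIABLE|REMVARIABLE):([^>\]]+)>\]")
COHORT = re.compile(r"\^([^$]*)\$")
TOKEN = re.compile(STREAMCMD.pattern + "|" + COHORT.pattern)


def remove_enon_while_set(stream, var="non-enon", tag="<enon>"):
    """Drop readings carrying `tag` from cohorts seen while `var` is set."""
    active = set()  # names of stream variables that are currently set
    out = []
    pos = 0
    for m in TOKEN.finditer(stream):
        out.append(stream[pos:m.start()])  # copy blanks/superblanks verbatim
        pos = m.end()
        if m.group(1):  # a stream command: update state, pass it through
            if m.group(1) == "SETVARIABLE":
                active.add(m.group(2))
            else:
                active.discard(m.group(2))
            out.append(m.group(0))
        else:  # a cohort: ^surface/reading1/reading2$
            surface, *readings = m.group(3).split("/")
            if var in active:
                kept = [r for r in readings if tag not in r]
                # Assumption: never remove the last remaining reading.
                readings = kept or readings
            out.append("^" + "/".join([surface] + readings) + "$")
    out.append(stream[pos:])
    return "".join(out)
```

Fed the two-part example above, this removes que<enon> from the cohort after the SETVARIABLE and leaves the cohort after the REMVARIABLE untouched.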