Francis Tyers <fty...@prompsit.com> wrote:

[...]

>>> That would be a good feature, but wouldn't get past the issue of the
>>> tagger/cg. E.g. if we do that then the tagger can't take into account
>>> context.
>>
>> Isn't that the whole point? (Ie. treat each line as completely
>> independent, no context.)
>
> I don't think so, I think Mansur wants the tagger to disambiguate
> according to the context, but have it in line-by-line output, like
> TreeTagger or UDpipe etc.

Well, it's only lt-proc doing the moving, so just move the NUL-deletion
before cg-proc:

   cat corpus.txt                     \
   | tr -d '\0'                       \
   | apertium-deshtml -n              \
   | sed 's/\[$/[][/; s/^]/]\x00/'    \
   | lt-proc -z -w 'tat.automorf.bin' \
   | tr -d '\0'                       \
   | cg-proc -z  'tat.rlx.bin'        \
   | cg-proc -z -w -1 'dev/mansur.bin' \
   | apertium-rehtml-noent

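Now only lt-proc sees the NULs, so only lt-proc should treat end-of-line
as a stream delimiter; cg-proc still gets the running text and can use
context across lines.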
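To see what the sed and the second tr do to the stream, here is a small sketch that runs only those two steps on a made-up sample, without any Apertium tools. It assumes GNU sed (for `\x00` in the replacement) and assumes that apertium-deshtml output ends each line with `[` and starts the next with `]`, which is what the sed patterns imply. The sample text and the helper names `mark_lines` and `count_nuls` are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: shows where the NULs land and that they are gone
# again before cg-proc. Assumes GNU sed and deshtml-style '[' / ']' at
# line ends/starts; no Apertium tools are run here.

# Same sed expression as in the pipeline: a trailing '[' becomes '[][',
# and a NUL is put right after a leading ']'.
mark_lines() {
  sed 's/\[$/[][/; s/^]/]\x00/'
}

# Count the NUL bytes on stdin.
count_nuls() {
  tr -cd '\0' | wc -c | tr -d ' '
}

# Made-up stand-in for apertium-deshtml output (three input lines).
sample='line one.[\n]line two.[\n]line three.'

# After the sed: one NUL per line boundary, i.e. where lt-proc -z flushes.
marked_nuls=$(printf '%b' "$sample" | mark_lines | count_nuls)
echo "NULs seen by lt-proc: $marked_nuls"   # prints: NULs seen by lt-proc: 2

# The second tr -d '\0' strips them, so cg-proc gets one continuous stream.
cleaned_nuls=$(printf '%b' "$sample" | mark_lines | tr -d '\0' | count_nuls)
echo "NULs seen by cg-proc: $cleaned_nuls"  # prints: NULs seen by cg-proc: 0
```

On BSD/macOS sed the `\x00` escape is not supported, so the NUL insertion would need a different tool there.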


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
