Francis Tyers <fty...@prompsit.com> wrote:

> On 2017-07-01 09:18, Kevin Brubeck Unhammer wrote:
>> Jaume Ortolà i Font
>> <jaumeort...@gmail.com> wrote:

[...]

>>> This could be solved differently. I think these contractions should
>>> be tokenized earlier in the pipeline as two
>>> tokens. This way we would avoid a lot of exceptions and workarounds
>>> when dealing with them. Is it feasible?
>>> These contractions are extremely frequent and now they cause a lot
>>> of undesired results.
>>
>> Yeah, you could split them early too. If "del" isn't ambiguous then you
>> don't really gain much by keeping it as one lexical unit.
>
> How would you do that ? :)

The easiest way I can think of would be to add a CG that runs before
disambiguation and does nothing but split "<del>" into two cohorts
(ADDCOHORT/REMCOHORT).
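
A rough, untested sketch of what such a grammar might look like. The
reading tags (pr, det def m sg) are assumptions in the usual Apertium
style and would have to match what the dictionaries actually emit, and
putting the rules in BEFORE-SECTIONS is just one way to make sure they
only run once per window:

```
DELIMITERS = "<.>" "<!>" "<?>" ;

BEFORE-SECTIONS

# Insert a cohort for the preposition in front of the contraction.
# (Tags here are assumed, not taken from any real dictionary.)
ADDCOHORT ("<de>" "de" pr) BEFORE ("<del>") ;

# Insert a cohort for the article right after the contraction.
ADDCOHORT ("<el>" "el" det def m sg) AFTER ("<del>") ;

# Drop the original contracted cohort, leaving "de" followed by "el".
REMCOHORT ("<del>") ;
```

Capitalised forms such as "<Del>" are not matched by these rules and
would need their own handling.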
