Francis Tyers <fty...@prompsit.com> wrote:

> On 2017-07-01 09:18, Kevin Brubeck Unhammer wrote:
>> Jaume Ortolà i Font <jaumeort...@gmail.com> wrote:
[...]
>>> This could be solved differently. I think these contractions should
>>> be tokenized earlier in the pipeline as two tokens. This way we would
>>> avoid a lot of exceptions and workarounds when dealing with them.
>>> Is it feasible?
>>> These contractions are extremely frequent and now they cause a lot
>>> of undesired results.
>>
>> Yeah, you could split them early too. If "del" isn't ambiguous then you
>> don't really gain much by keeping it as one lexical unit.
>
> How would you do that? :)

The easiest way I can think of would be just to add a pre-disambiguation
CG that does nothing but split "<del>" (ADDCOHORT/REMCOHORT).
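
To make the idea concrete, here is a rough sketch of what such a splitting step would do to the cohort sequence. It is written in Python rather than as an actual CG grammar, so it only illustrates the add-two-cohorts, remove-one operation. The `SPLITS` table, the `split_contractions` name, the cohort representation and the tags shown are all assumptions made up for this illustration, not taken from any existing language pair.

```python
from typing import Dict, List, Tuple

# A cohort here is just (wordform, list of readings); hypothetical representation.
Cohort = Tuple[str, List[str]]

# Hypothetical split table: contraction -> the cohorts that replace it.
# The tags are illustrative only.
SPLITS: Dict[str, List[Cohort]] = {
    "del": [("de", ["de<pr>"]), ("el", ["el<det><def><m><sg>"])],
}


def split_contractions(cohorts: List[Cohort]) -> List[Cohort]:
    """Replace each known contraction with its component cohorts.

    Mirrors the effect of adding the new cohorts and then removing the
    original one. Lookup is case-insensitive; the replacement wordforms
    are emitted as listed in SPLITS (capitalisation is not restored).
    """
    out: List[Cohort] = []
    for wordform, readings in cohorts:
        parts = SPLITS.get(wordform.lower())
        if parts is None:
            # Not a contraction we know about: pass it through unchanged.
            out.append((wordform, readings))
        else:
            # Copy the reading lists so callers cannot mutate SPLITS.
            out.extend((wf, list(rd)) for wf, rd in parts)
    return out


if __name__ == "__main__":
    sentence = [
        ("la", ["el<det><def><f><sg>"]),
        ("casa", ["casa<n><f><sg>"]),
        ("del", ["de<pr>+el<det><def><m><sg>"]),
        ("pueblo", ["pueblo<n><m><sg>"]),
    ]
    for wordform, readings in split_contractions(sentence):
        print(wordform, readings)
```

Everything after this step would then see "de" and "el" as two ordinary tokens, which is the point of splitting early.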
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff