Tanmai Khanna <khanna.tan...@gmail.com> wrote:

>> So currently if I have the multiword "i dag", it'll recognize
>> "i<i>dag</i>" but it won't recognize "i <i>dag</i>"? (And I suppose if
>> I have the non-multiword "today" it won't recognize "to<i>day</i>".)
>
> Exactly, but even when it recognises "i<i>dag</i>", the <i> will probably
> be lost because it's being seen as a normal blank.
>
>> One possibility might be to have wordbound blanks match "space or
>> epsilon" in lt-proc – then it would recognize all of the above.
>
> I had to do this for postgeneration and it wasn't trivial, though that doesn't
> mean I can't do it for the analyser as well. However, we decided that all
> multiword matches will be offloaded to apertium-separable, so the individual
> parts can be analysed as LUs and then apertium-separable can combine them into
> one LU. I have already modified apertium-separable so that it applies the
> individual markup to the final MWE. If this is done, then
> both "i<i>dag</i>" and "i <i>dag</i>" will be recognised and the italics
> will apply to the entire word.
>
> If this isn't acceptable or too much of an inconvenience, then I can modify
> the analyser.
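To make the "space or epsilon" idea concrete, here is a toy Python sketch. It is not how lt-proc works (lt-proc matches with finite-state transducers); it only uses a regular expression in which one or more markup tags between two words of a multiword can stand in for the space, while plain whitespace still works as usual. The names `multiword_pattern` and `recognizes` are hypothetical, and the sketch covers recognition only, not keeping the markup attached to the result.

```python
import re

# Toy model (assumption): an inline markup tag such as <i> or </i>.
TAG = r"<[^<>]+>"
# Between two words of a multiword, accept either plain whitespace, or one or
# more tags with optional whitespace around them ("space or epsilon").
SEP = rf"(?:\s*{TAG}\s*)+|\s+"


def multiword_pattern(mwe: str) -> re.Pattern:
    """Build a regex for a space-separated multiword such as "i dag"."""
    words = mwe.split()
    body = rf"(?:{SEP})".join(re.escape(w) for w in words)
    # Tags may also wrap the multiword as a whole.
    return re.compile(rf"(?:{TAG})*{body}(?:{TAG})*")


def recognizes(text: str, mwe: str) -> bool:
    """True if the whole of `text` spells `mwe` under the rules above."""
    return multiword_pattern(mwe).fullmatch(text) is not None
```

With this, "i dag", "i<i>dag</i>" and "i <i>dag</i>" all match the multiword "i dag", while "idag" (no space and no tag) does not, and the single word "today" still does not match "to<i>day</i>".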

Using separable for those cases seems like a good solution to me :)
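A rough sketch of the separable-style approach: each part is analysed as its own LU carrying its own markup, and a later pass merges a matching run of LUs into one LU that carries the markup of all its parts, so the italics end up on the whole MWE. The `LU` class, the `combine` function, the union-of-markup rule and the analysis strings are all assumptions for illustration; this is not apertium-separable's actual data format or code.

```python
from dataclasses import dataclass


@dataclass
class LU:
    """Hypothetical lexical unit: surface form, analysis, and attached markup."""
    surface: str
    analysis: str
    markup: frozenset = frozenset()


def combine(lus, mwe_words, mwe_analysis):
    """Merge every run of LUs whose surfaces spell `mwe_words` into one LU.

    Assumption: the merged LU carries the union of its parts' markup, so
    markup on any part (e.g. italics on "dag") applies to the whole MWE.
    """
    out, i, n = [], 0, len(mwe_words)
    while i < len(lus):
        window = lus[i:i + n]
        if [lu.surface for lu in window] == mwe_words:
            markup = frozenset().union(*(lu.markup for lu in window))
            out.append(LU(" ".join(mwe_words), mwe_analysis, markup))
            i += n  # skip past the merged parts
        else:
            out.append(lus[i])
            i += 1
    return out
```

For example, `combine([LU("i", "i<pr>"), LU("dag", "dag<n>", frozenset({"i"}))], ["i", "dag"], "i dag<adv>")` returns a single LU with surface "i dag" and markup `frozenset({"i"})`. The same holds whether the input was "i<i>dag</i>" or "i <i>dag</i>", since the parts are analysed separately before merging.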


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
