Tanmai Khanna <khanna.tan...@gmail.com> wrote:

>> So currently if I have the multiword "i dag", it'll recognize
>> "i<i>dag</i>" but it won't recognize "i <i>dag</i>"? (And I suppose if
>> I have the non-multiword "today" it won't recognize "to<i>day</i>".)
>
> Exactly, but even when it recognises "i<i>dag</i>", the <i> will probably
> be lost because it's being seen as a normal blank.
>
>> One possibility might be to have wordbound blanks match "space or
>> epsilon" in lt-proc – then it would recognize all of the above.
>
> I had to do this for postgeneration and it wasn't trivial, so it's not like
> I can't do it for the analyser as well, but we decided that all multiword
> matches will be offloaded to apertium-separable, so the individual parts
> can be analysed as LUs and then apertium-separable can combine them into
> one LU. I have already modified apertium-separable such that it applies the
> individual markups on the final MWE. If this is done then
> both "i<i>dag</i>" and "i <i>dag</i>" will be recognised and the italics
> will apply on the entire word.
>
> If this isn't acceptable or too much of an inconvenience, then I can modify
> the analyser.
Using separable for those cases seems like a good solution to me :)
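To make the proposed division of labour concrete, here is a toy Python sketch of the separable-style step: the analyser emits each part as its own LU carrying its own inline markup, and a later pass merges a known multiword into one LU. Everything here is hypothetical and does not use the real apertium-separable code or the Apertium stream format: the `LU` class, the `MULTIWORDS` table and `combine()` are invented for illustration, and taking the *union* of the parts' markup is only an assumption about what "applies the individual markups on the final MWE" might mean.

```python
from dataclasses import dataclass

@dataclass
class LU:
    """Toy lexical unit: a surface form plus the inline markup wrapping it (hypothetical model)."""
    surface: str
    tags: frozenset = frozenset()

# Hypothetical multiword table: sequence of part surfaces -> combined form.
MULTIWORDS = {("i", "dag"): "i dag"}

def combine(lus):
    """Greedily merge runs of LUs that spell a known multiword into a single LU.

    Assumption: the merged LU gets the union of the parts' markup, so italics
    on any part end up applying to the whole multiword.
    """
    out = []
    i = 0
    while i < len(lus):
        for parts, combined in MULTIWORDS.items():
            window = lus[i:i + len(parts)]
            if tuple(lu.surface for lu in window) == parts:
                tags = frozenset().union(*(lu.tags for lu in window))
                out.append(LU(combined, tags))
                i += len(parts)
                break
        else:
            # No multiword starts here: pass the LU through unchanged.
            out.append(lus[i])
            i += 1
    return out
```

Assuming the analyser tokenises both "i<i>dag</i>" and "i <i>dag</i>" as `[LU("i"), LU("dag", frozenset({"i"}))]`, `combine()` turns either into a single `LU("i dag", frozenset({"i"}))`, which is why the analyser itself would not need the "space or epsilon" matching.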
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff