Hèctor,

The extra blank there is because there's a blank in your rule output. See:

$ echo "^052<num>/052<num>$^F<n><m><sp>/F<n><m><sp>$" | apertium-transfer -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin'
^num_n<SN><m><sp><sl_m><sl_sp>{^052<num>$ ^F<n><m><sp>$}$

The rule for num_n has a <b/> in its output, hence the space. There was no
space earlier because an empty string was treated as a blank.

Now, if you don't want a space between the LUs in the rule output, you just
don't put a <b/>. Earlier you used to add a <b/> every time the rule had
multiple LUs in the output, but now *you only add a <b/> if you want a
space/blank between the output words.* Try removing the <b/> from the num_n
rule and it should work properly.

As for the discussion about I<b/>ér or 5<b/>e, we all agreed that we don't
want them in the dictionaries, so you can analyse them as individual LUs and
then use apertium-separable to combine them into one LU.

Finally, the space between I and ér shouldn't appear in the output; it is
caused by an issue that's still being fixed. But it'll be fine soon :)

*तन्मय खन्ना *
*Tanmai Khanna*

On Thu, Sep 3, 2020 at 11:46 PM Hèctor Alòs i Font <hectora...@gmail.com> wrote:

> Hi Tanmai,
>
> Yes, hyphens and quotes (") seem to be solved. But the system persists in
> adding blanks where there were none. For instance, this now gives us
> strange Unicode codes:
>
> 05076. Table des caractères Unicode U+0500 à U+052F.
> < 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F.
> ---
> > 05076. Tâbla des caractèros Unicode *U+0500 a *U+052 F.
>
> The same goes for names of standards (e.g. 802.3j), road names, car (Fiat
> 621RN) or plane (EA-18G Growler) models, etc.
>
> As for the <sup>...</sup>, I wouldn't say that it is very beautiful. It
> could be misleading if there is just one character, as often happens, like
> in 5e. In any case, what most interests me is how to deal with these things
> in the dictionaries.
> That's not a problem of the new blank-treatment or Transfuse. That's a
> problem we already had, but I never thought about it. I wouldn't like to
> have I<b/>ér or 5<b/>e in the dictionaries. It may cause problems, i.a.
> because ér and e can be words of their own, so we'll get a wrong
> morphological analysis.
>
> Hèctor
>
> On Thu, 3 Sep 2020 at 18:57, Tanmai Khanna <khanna.tan...@gmail.com> wrote:
>
>> Hèctor, can you check the page on beta now? The hyphen and the superscript
>> issues are solved. Of course, there's now a space between I and ér. If
>> that's a big problem we can discuss other solutions.
>>
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
>>
>> On Thu, Sep 3, 2020 at 8:09 PM Tino Didriksen <m...@tinodidriksen.com>
>> wrote:
>>
>>> I have adjusted Transfuse with how spaces are treated for Apertium, and
>>> implemented adding temporary spaces around <sub> and <sup>. Changes are
>>> deployed on beta.
>>>
>>> I repeat my plea that all symbols should have an analysis. It breaks
>>> markup that things like - and : are not tokens.
>>>
>>> -- Tino Didriksen
>>>
>>> On Wed, 2 Sep 2020 at 13:23, Tino Didriksen <m...@tinodidriksen.com>
>>> wrote:
>>>
>>>> That's not something the pipe ever sees - you can't fix it on your end.
>>>> It's something I have to adjust in Transfuse.
>>>>
>>>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>>>> and L629 expands inline tags to encompass surrounding plain text, because
>>>> it is unfortunately common for formatting to be partially on a word while
>>>> you really want the whole word translated as a unit.
>>>>
>>>> However, for HTML I should add spaces around <sub> and <sup> so that
>>>> they can't gobble up their surroundings.
>>>> Tracked as https://github.com/TinoDidriksen/Transfuse/issues/7
>>>>
>>>> -- Tino Didriksen
>>>>
>>>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm taking a look at this list of names on Wikipedia:
>>>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>>>> and at how it is translated in beta.apertium:
>>>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>>>
>>>>> There are still quite a few problems with HTML tags: the whole Iér
>>>>> becomes a superscript, and there are also problems with italics. The
>>>>> space after the hyphen is an already known problem.
>>>>>
>>>>> By the way, I wonder whether it is possible to match
>>>>> I<sup>ér</sup> in our dictionaries. I have Iér in the dictionary, but
>>>>> when the ending ér stays as a superscript, as is usually done in the
>>>>> texts, it is not matched. Should I add I<b/>ér to the dictionary?
>>>>>
>>>>> Hèctor
>>>>>
>>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
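
A minimal sketch of the num_n change described at the top of this message.
The actual rule in apertium-fra-frp.frp-fra.t1x is not quoted in this thread,
so the chunk tags and clip positions below are assumptions for illustration
only. The one point it is meant to show is that the two <lu> elements follow
each other with no <b/> in between.

```xml
<!-- Hypothetical sketch of the output part of a num_n rule. The real rule
     will differ in its details (tags, variables, case handling). -->
<out>
  <chunk name="num_n">
    <tags>
      <tag><lit-tag v="SN"/></tag>
      <!-- remaining chunk tags omitted in this sketch -->
    </tags>
    <!-- first LU: the numeral, e.g. 052<num> -->
    <lu><clip pos="1" side="tl" part="whole"/></lu>
    <!-- No <b/> here: with the new blank handling a <b/> would put a space
         between the two LUs (U+052 F instead of U+052F). -->
    <!-- second LU: the noun, e.g. F<n><m><sp> -->
    <lu><clip pos="2" side="tl" part="whole"/></lu>
  </chunk>
</out>
```

After recompiling the rules, rerunning the echo | apertium-transfer command
from the top of this message is a quick way to check whether the space inside
{...} is gone.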