Hèctor,
The extra blank there is because there's a blank in your rule output. See:

$ echo "^052<num>/052<num>$^F<n><m><sp>/F<n><m><sp>$" | apertium-transfer \
    -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin'

^num_n<SN><m><sp><sl_m><sl_sp>{^052<num>$ ^F<n><m><sp>$}$


The num_n rule has a <b/> in its output, and that is where the space comes
from. There was no space earlier because an empty string used to be
considered a blank. Now, if you don't want a space between the LUs in the
rule output, you simply leave out the <b/>. So if you remove the <b/> from
the num_n rule, it will start working properly. Earlier you had to add a
<b/> every time a rule had multiple LUs in its output, but now *you only add
a <b/> if you want a space/blank between the output words.*


Try removing the <b/> and it should work.
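
To make that concrete, here is a rough sketch of what the output part of a
rule like num_n could look like once the blank is gone. I'm not quoting the
actual rule from apertium-fra-frp.frp-fra.t1x here: the pattern-item names
(num, nom), the case attribute and the single SN tag are placeholders, so
treat it as an illustration of where the <b/> used to sit, not as the real
rule.

```xml
<!-- Hypothetical sketch, not the real num_n rule from the fra-frp pair. -->
<rule comment="num_n: number followed by noun (illustrative only)">
  <pattern>
    <!-- "num" and "nom" are assumed def-cat names -->
    <pattern-item n="num"/>
    <pattern-item n="nom"/>
  </pattern>
  <action>
    <out>
      <chunk name="num_n" case="caseFirstWord">
        <tags>
          <!-- the real rule emits more tags; only SN is shown here -->
          <tag><lit-tag v="SN"/></tag>
        </tags>
        <lu><clip pos="1" side="tl" part="whole"/></lu>
        <!-- A <b/> used to sit here. With it removed, nothing is
             placed between the two LUs in the chunk. -->
        <lu><clip pos="2" side="tl" part="whole"/></lu>
      </chunk>
    </out>
  </action>
</rule>
```

With the two <lu> elements next to each other, the LUs inside the chunk
should come out with no space between them, assuming the rest of the rule
stays as it is.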


As for the discussion about I<b/>ér or 5<b/>e, we all agreed that we don't
want them in the dictionaries, so you can analyse them as individual
LUs and then combine them into one LU using apertium-separable.
Finally, the space between I and ér shouldn't appear in the rule output; it
is caused by an issue that's still being fixed. But it'll be fine soon
:)



*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Sep 3, 2020 at 11:46 PM Hèctor Alòs i Font <hectora...@gmail.com>
wrote:

> Hi Tanmai,
>
> Yes, hyphens and quotes (") seem to be solved. But the system still adds
> blanks where there were none. For instance, this now gives us
> strange Unicode codes:
>
> 05076. Table des caractères Unicode U+0500 à U+052F.
> < 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F.
> ---
> > 05076. Tâbla des caractèros Unicode *U+0500 a *U+052 F.
>
> The same for names of standards (e.g. 802.3j), road names, car (Fiat
> 621RN) or plane (EA-18G Growler) models, etc.
>
> On the <sup>...</sup>, I wouldn't say that it is very beautiful. It could
> be misleading if there is just one character, as often happens, like in
> 5e. In any case, what interests me most is how to deal with these things
> in the dictionaries. That's not a problem of the new blank treatment or
> Transfuse. That's a problem we already had, but I never thought about it. I
> wouldn't like to have I<b/>ér or 5<b/>e in the dictionaries. It may cause
> problems, among other reasons because ér and e can be words of their own,
> so we'd get a wrong morphological analysis.
>
> Hèctor
>
>
>
>
> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Thu, 3 Sep 2020
> at 18:57:
>
>> Hèctor, can you check the page on beta now? The hyphen and the superscript
>> issues are solved. Of course, there's now a space between I and ér. If
>> that's a big problem we can discuss other solutions.
>>
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
>>
>>
>> On Thu, Sep 3, 2020 at 8:09 PM Tino Didriksen <m...@tinodidriksen.com>
>> wrote:
>>
>>> I have adjusted Transfuse with how spaces are treated for Apertium, and
>>> implemented adding temporary spaces around <sub> and <sup>. Changes are
>>> deployed on beta.
>>>
>>> I repeat my plea that all symbols should have an analysis. It breaks
>>> markup that things like - and : are not tokens.
>>>
>>> -- Tino Didriksen
>>>
>>>
>>> On Wed, 2 Sep 2020 at 13:23, Tino Didriksen <m...@tinodidriksen.com>
>>> wrote:
>>>
>>>> That's not something the pipe ever sees - you can't fix it on your end.
>>>> It's something I have to adjust in Transfuse.
>>>>
>>>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>>>> and L629 expands inline tags to encompass surrounding plain text, because
>>>> it is unfortunately common for formatting to be partially on a word while
>>>> you really want the whole word translated as a unit.
>>>>
>>>> However, for HTML I should add spaces around <sub> and <sup> so that
>>>> they can't gobble up their surroundings. Tracked as
>>>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>>>
>>>> -- Tino Didriksen
>>>>
>>>>
>>>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm taking a look at this list of names on Wikipedia:
>>>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>>>> and at how it is translated on beta.apertium:
>>>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>>>
>>>>> There are still quite a few problems with HTML tags, given that
>>>>> the whole Iér becomes a superscript, and also with italics. The space
>>>>> after the hyphen is an already known problem.
>>>>>
>>>>> By the way, I wonder whether it is possible to match
>>>>> I<sup>ér</sup> in our dictionaries. I have Iér in the dictionary, but
>>>>> when the ending ér stays as a superscript, as is usually done in texts,
>>>>> it is not matched. Should I add I<b/>ér to the dictionary?
>>>>>
>>>>> Hèctor
>>>>>
>>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
