Hi Tanmai,

Yes, hyphens and quotes (") seem to be solved. But the system still adds
blanks where there were none. For instance, we now get strange Unicode
code points:

05076. Table des caractères Unicode U+0500 à U+052F.
< 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F.
---
> 05076. Tâbla des caractèros Unicode *U+0500 a *U+052 F.

The same happens with names of standards (e.g. 802.3j), road names, and car
(Fiat 621RN) or aircraft (EA-18G Growler) models, etc.

As for <sup>...</sup>, I wouldn't say it looks very nice. It could be
misleading when there is just one character, as often happens, e.g. in 5e.
In any case, what interests me most is how to deal with these things in the
dictionaries. That's not a problem with the new blank treatment or with
Transfuse; it's a problem we already had, but I had never thought about it.
I wouldn't like to have I<b/>ér or 5<b/>e in the dictionaries. That may
cause problems, among other things because ér and e can be words in their
own right, so we would get a wrong morphological analysis.

Hèctor




On Thu, 3 Sep 2020 at 18:57, Tanmai Khanna <khanna.tan...@gmail.com>
wrote:

> Hèctor, can you check the page on beta now? The hyphen and superscript
> issues are solved. Of course, there's now a space between I and ér. If
> that's a big problem, we can discuss other solutions.
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Thu, Sep 3, 2020 at 8:09 PM Tino Didriksen <m...@tinodidriksen.com>
> wrote:
>
>> I have adjusted how Transfuse treats spaces for Apertium, and
>> implemented adding temporary spaces around <sub> and <sup>. The changes
>> are deployed on beta.
>>
>> I repeat my plea that all symbols should have an analysis. When things
>> like - and : are not tokens, the markup breaks.
>>
>> -- Tino Didriksen
>>
>>
>> On Wed, 2 Sep 2020 at 13:23, Tino Didriksen <m...@tinodidriksen.com>
>> wrote:
>>
>>> That's not something the pipeline ever sees, so you can't fix it on
>>> your end. It's something I have to adjust in Transfuse.
>>>
>>> The code at
>>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>>> and L629 expands inline tags to encompass the surrounding plain text,
>>> because formatting unfortunately often covers only part of a word, while
>>> you really want the whole word translated as a unit.
>>>
>>> However, for HTML I should add spaces around <sub> and <sup> so that
>>> they can't gobble up their surroundings. This is tracked as
>>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>>
>>> -- Tino Didriksen
>>>
>>>
>>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
>>> wrote:
>>>
>>>> I'm looking at this list of names on Wikipedia:
>>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>>> and at how it is translated on beta.apertium:
>>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>>
>>>> There are still quite a few problems with HTML tags: for instance, the
>>>> whole Iér becomes a superscript, and there are problems with italics
>>>> too. The space after the hyphen is an already known problem.
>>>>
>>>> By the way, I wonder whether it is possible to match I<sup>ér</sup> in
>>>> our dictionaries. I have Iér in the dictionary, but when the ending ér
>>>> is a superscript, as is usual in the texts, it is not matched. Should I
>>>> add I<b/>ér to the dictionary?
>>>>
>>>> Hèctor
>>>>
>>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
