Note: sent again. My account on this mailing list was disabled after I sent
it the first time!

User-Agent: libremail : logiciel libre multilingue
Date: Sat, 21 Jan 2017 15:03:11 +0100
From: "Bernard Chardonneau" <[email protected]>
To: [email protected]
Subject: Re: [Apertium-stuff] Duplicate entries in apertium-fr-es.fr.metadix

> User-Agent: Roundcube Webmail/1.2.3
> Date: Thu, 19 Jan 2017 14:16:46 +0100
> From: [email protected]
> To: [email protected]
> Reply-To: [email protected]
> Subject: [Apertium-stuff] Duplicate entries in apertium-fr-es.fr.metadix
>
>
> Hello,
> There are some identical entries in apertium-fr-es.fr.metadix, with just 
> the author that differs :
> (The line numbers are from the latest svn version of the file)
>
> (6363)    <e lm="Abbas" a="eleka">                <i>Abbas</i><par 
> n="Abraham__np"/></e>
> (6364)    <e lm="Abbas" a="webform">              <i>Abbas</i><par 
> n="Abraham__np"/></e>
>
> (6426)    <e lm="Abidjan" a="eleka">              <i>Abidjan</i><par 
> n="Andorre__np"/></e>
> (6427)    <e lm="Abidjan" a="webform">            <i>Abidjan</i><par 
> n="Andorre__np"/></e>
>
> (6430)    <e lm="abîme" a="eleka">                <i>abîme</i><par 
> n="livre__n"/></e>
> (6431)    <e lm="abîme" a="webform">              <i>abîme</i><par 
> n="livre__n"/></e>
>
> The next line is also the same with the LR restriction which doesn't 
> seem right for this word?
> (6432)    <e lm="abîme" a="webform" r="LR">       <i>abîme</i><par 
> n="livre__n"/></e>

Done for these entries and some nearby ones. These may be very old duplicate
entries. The reason is that, for this language pair, it used to be possible
to enter new words directly from a website (webform). Also, lemmas
were not in alphabetical order until I ran a personal tool to sort them.
So it is now a bit easier to see these duplicate entries, of which there
are a lot in the whole file.

About a year ago, I did the same on the fra-por language pair, for which
I planned to make big changes to the dictionaries to extend word coverage.
A simple way I found to do that was to remove the a="something" comments
and then use the "uniq" GNU/Linux - UNIX command to easily remove
duplicate lines.
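
The approach just described can be sketched in Python. This is only an
illustration, not the tool actually used: the function names are
hypothetical, and it assumes one entry per line. It also assumes that the
alignment padding between tags must be collapsed, because in the quoted
entries the padding after a="eleka" and a="webform" differs, so the lines
would not compare equal once the attribute is gone.

```python
import re

# Matches an author attribute such as a="eleka", with the whitespace before it.
AUTHOR_ATTR = re.compile(r'\s+a="[^"]*"')
# Matches alignment padding between a closing ">" and the next "<".
PADDING = re.compile(r">\s+<")


def strip_author(line: str) -> str:
    """Remove the a="..." attribute and collapse padding between tags."""
    return PADDING.sub("><", AUTHOR_ATTR.sub("", line))


def drop_adjacent_duplicates(lines):
    """Like uniq: keep a line only if it differs from the previous kept line."""
    result = []
    for line in lines:
        if not result or result[-1] != line:
            result.append(line)
    return result


def dedupe(lines):
    """Strip author attributes, then remove adjacent duplicate lines."""
    return drop_adjacent_duplicates([strip_author(line) for line in lines])
```

Like uniq, this only removes duplicates that sit next to each other, so it
relies on the entries having been sorted first.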

But maybe the historical developers of the fr-es language pair would prefer
to keep the a="something" comments. So a more sophisticated tool would be
useful to do this kind of work automatically.

Is the apertium-dixtools sort command clever enough to remove duplicate
entries in this case, and also when, for the same word, there is one line
allowing both translation directions and one line with a restricted direction?
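
Whatever apertium-dixtools does, the behaviour asked about here could look
roughly like the Python sketch below. It is a hypothetical illustration,
not an existing tool. It assumes one entry per line, that the a="..." value
should be kept on the surviving entry, and that an entry restricted with
r="LR" or r="RL" is redundant when an otherwise identical unrestricted entry
exists. That last assumption should be checked by someone who knows the
pair before any entry is deleted.

```python
import re

AUTHOR_ATTR = re.compile(r'\s+a="[^"]*"')       # e.g. a="webform"
RESTRICT_ATTR = re.compile(r'\s+r="(LR|RL)"')   # e.g. r="LR"
PADDING = re.compile(r">\s+<")                  # alignment padding between tags


def is_entry(line):
    """True for lines that hold a dictionary entry (<e ...>)."""
    return line.lstrip().startswith("<e ")


def entry_key(line):
    """Entry text without author, restriction or alignment padding."""
    key = RESTRICT_ATTR.sub("", AUTHOR_ATTR.sub("", line))
    return PADDING.sub("><", key.strip())


def restriction(line):
    """Return "LR", "RL", or None for an unrestricted entry."""
    match = RESTRICT_ATTR.search(line)
    return match.group(1) if match else None


def remove_duplicates(lines):
    """Keep the first copy of each entry; original a="..." values survive."""
    # Keys that have at least one unrestricted entry somewhere in the file.
    unrestricted = {
        entry_key(l) for l in lines if is_entry(l) and restriction(l) is None
    }
    seen = set()
    kept = []
    for line in lines:
        if not is_entry(line):
            kept.append(line)      # comments, blank lines, section tags
            continue
        key, r = entry_key(line), restriction(line)
        if r is not None and key in unrestricted:
            continue               # assumed covered by the unrestricted entry
        if (key, r) in seen:
            continue               # same entry, only the author differs
        seen.add((key, r))
        kept.append(line)
    return kept
```

Unlike uniq, this does not need the duplicates to be adjacent, and it leaves
non-entry lines untouched.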

Anyway, the same kind of corrections may need to be done in the
languages/apertium-fra metadix file. But as this file should include every
entry from the different French metadix (and at least one dix) files, I
don't plan to do that merge in the next few months.


> I've also noticed that in the apostrophes/postblank section the lines 
> 29111 to 29167 and 29168 to 29224 are exactly the same twice:
> 29111 -> 29167 / 29168 -> 29224
>      <e r="LR" lm="à cause qu'">             
> <p><l>à<b/>cause<b/>qu'</l><r>à<b/>cause<b/>que</r></p><par 
> n="afin_que__cnjadv"/></e>
> (...............)
> n="afin_que__cnjadv"/></e>
>      <e r="LR" lm="à mesure qu'">            
> n="afin_que__cnjadv"/></e>
>      <e r="LR" lm="tel qu'">                 
> <p><l>tel<b/>qu'</l><r>tel<b/>que</r></p><par n="afin_que__cnjadv"/></e>
>

Done (by sorting the entries of this section in alphabetical order). As my
sorting tool was written for both .dix and .metadix files, it only processes
the main section.

>
> -- 
> Gabriel Paderni
> www.phone-m.com
> +33 9 84 34 20 20

Someone from France with a French first name. However, the operator of your
IP phone number does not seem to be listed:
https://fr.wikipedia.org/wiki/Liste_des_pr%C3%A9fixes_des_op%C3%A9rateurs_de_t%C3%A9l%C3%A9phonie_par_internet_en_France

Paderni looks like an Italian family name. Is that the pair you are
working on?

>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
--------------------------------
Bernard Chardonneau (France)
Phone : [33] 9 72 36 32 90
GSM phone : [33] 7 69 46 16 31

Multilingual websites for my free software:
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in French only):
http://bech.free.fr
