Note: Sent again. My account on this mailing list was disabled after I sent it!
User-Agent: libremail : logiciel libre multilingue
Date: Sat, 21 Jan 2017 15:03:11 +0100
From: "Bernard Chardonneau" <[email protected]>
To: [email protected]
Subject: Re: [Apertium-stuff] Duplicate entries in apertium-fr-es.fr.metadix

> User-Agent: Roundcube Webmail/1.2.3
> Date: Thu, 19 Jan 2017 14:16:46 +0100
> From: [email protected]
> To: [email protected]
> Reply-To: [email protected]
> Subject: [Apertium-stuff] Duplicate entries in apertium-fr-es.fr.metadix
>
>
> Hello,
> There are some identical entries in apertium-fr-es.fr.metadix, with just
> the author that differs :
> (The line numbers are from the latest svn version of the file)
>
> (6363) <e lm="Abbas" a="eleka"> <i>Abbas</i><par
> n="Abraham__np"/></e>
> (6364) <e lm="Abbas" a="webform"> <i>Abbas</i><par
> n="Abraham__np"/></e>
>
> (6426) <e lm="Abidjan" a="eleka"> <i>Abidjan</i><par
> n="Andorre__np"/></e>
> (6427) <e lm="Abidjan" a="webform"> <i>Abidjan</i><par
> n="Andorre__np"/></e>
>
> (6430) <e lm="abîme" a="eleka"> <i>abîme</i><par
> n="livre__n"/></e>
> (6431) <e lm="abîme" a="webform"> <i>abîme</i><par
> n="livre__n"/></e>
>
> The next line is also the same with the LR restriction which doesn't
> seem right for this word?
> (6432) <e lm="abîme" a="webform" r="LR"> <i>abîme</i><par
> n="livre__n"/></e>

Done for these entries and some nearby ones.

These may be very old duplicate entries. The reason is that, for this
language pair, it used to be possible to enter new words directly from a
website (webform). Also, the lemmas were not in alphabetical order until
I ran a personal tool to sort them. So it is now a bit easier to see
these duplicate entries, of which there are many in the whole file.

About a year ago, I did the same on the fra-por language pair, for which
I intended to make big changes to the dictionaries to extend word
coverage. A simple way I found to do that was to strip the a="something"
comments and then use the "uniq" GNU/Linux - UNIX command to remove the
duplicate lines easily.
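As a rough illustration of that approach (a sketch only, not the commands or the personal tool actually used), the two steps could look like this in Python. It assumes each entry sits on a single line and that the author tag always has the form a="..."; the names AUTHOR_ATTR, strip_author and drop_adjacent_duplicates are hypothetical.

```python
import re

# Assumption: the author tag is written as a="..." preceded by whitespace,
# as in the entries quoted above.
AUTHOR_ATTR = re.compile(r'\s+a="[^"]*"')


def strip_author(line):
    """Return the entry line without its a="..." attribute."""
    return AUTHOR_ATTR.sub("", line, count=1)


def drop_adjacent_duplicates(lines):
    """Strip the author tags, then collapse runs of identical adjacent
    lines, which is what "uniq" does on its input."""
    result = []
    previous = None
    for line in lines:
        stripped = strip_author(line)
        if stripped == previous:
            continue  # same as the line just kept: skip it
        result.append(stripped)
        previous = stripped
    return result


# Sample entries taken from the message quoted above (joined on one line).
sample = [
    '<e lm="Abbas" a="eleka"> <i>Abbas</i><par n="Abraham__np"/></e>',
    '<e lm="Abbas" a="webform"> <i>Abbas</i><par n="Abraham__np"/></e>',
    '<e lm="abîme" a="webform" r="LR"> <i>abîme</i><par n="livre__n"/></e>',
]

for entry in drop_adjacent_duplicates(sample):
    print(entry)
```

Like "uniq", this only merges duplicates that are next to each other, so the entries need to be sorted first. An entry that differs only by r="LR" is still treated as a different line.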
But maybe the historical developers of the fr-es language pair would
prefer to keep the a="something" comments. So a more complicated tool
would be useful to do this kind of work automatically. Is the
apertium-dixtools sort command clever enough to remove duplicate entries
in this case, and also when, for the same word, there is one line
allowing both translation directions and one line with a restricted
direction?

Anyway, the same kind of corrections may need to be made in the
languages/apertium-fra metadix file. But as this file should include
every entry of the different French metadix (and at least one dix)
files, I don't plan to do that merging in the next few months.

> I've also noticed that in the apostrophes/postblank section the lines
> 29111 to 29167 and 29168 to 29224 are exactly the same twice:
> 29111 -> 29167 / 29168 -> 29224
> <e r="LR" lm="à cause qu'">
> <p><l>à<b/>cause<b/>qu'</l><r>à<b/>cause<b/>que</r></p><par
> n="afin_que__cnjadv"/></e>
> (...............)
> n="afin_que__cnjadv"/></e>
> <e r="LR" lm="à mesure qu'">
> n="afin_que__cnjadv"/></e>
> <e r="LR" lm="tel qu'">
> <p><l>tel<b/>qu'</l><r>tel<b/>que</r></p><par n="afin_que__cnjadv"/></e>
>

Done (by sorting the entries of this section in alphabetical order).
As my sorting tool was written for both .dix and .metadix files, it only
processes the main section.

>
> --
> Gabriel Paderni
> www.phone-m.com
> +33 9 84 34 20 20

Someone from France with a French first name. However, the operator of
your VoIP phone number does not seem to be listed:
https://fr.wikipedia.org/wiki/Liste_des_pr%C3%A9fixes_des_op%C3%A9rateurs_de_t%C3%A9l%C3%A9phonie_par_internet_en_France

Paderni looks like an Italian family name. Is that the pair you are
working on?

>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org!
> http://sdm.link/slashdot
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

--------------------------------
Bernard Chardonneau (France)

Phone : [33] 9 72 36 32 90
GSM phone : [33] 7 69 46 16 31

Multilingual websites for my free software :
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in French only)
http://bech.free.fr
