On Sat, 30/06/2012 at 12:20 +0200, Mikel Forcada wrote:
> Dear Apertiumers,
>
> as "principal instigator" and current PMC president of Apertium, I think
> my two cents' worth is expected here, so here I go. It will be a long two
> cents, so pour yourselves your favourite drink before reading it.
>
> 1. Apertium has a very, very flexible language to specify lexical
> transformations, such as the ones found in bilingual dictionaries. This
> allows for many different "coding styles". This freedom has, on the one
> hand, made Apertium a very successful project, but, on the other hand,
> allowed for divergent styles of coding.
>
> 2. The existence of different styles of coding does not worry me "per
> se" (after all, this is a free/open-source project and therefore
> everyone's project) but I think it would be a very, very good idea for
> the PMC to do (or promote) some substantial work on public
> recommendations on how linguistic data should be built, as its absence
> and the existence of so many radically diverging "dix dialects" may
> effectively drive people away from adopting Apertium for "serious" work.
> In particular I worry about maintainability, as this is crucial for quality.

I definitely agree with this.

> 3. There is no normative decision as regards what information should go
> in a bilingual dictionary, but, yes, there was a tradition. When Apertium
> started, it was used to translate between Romance languages, which meant
> that translations were basically word-per-word, and structural transfer
> did not cover all words. This was the reason to have bilingual
> dictionaries that only encoded a prefix of the lexical forms: the
> remaining part was simply copied or just slightly modified by transfer,
> as all morphological dictionaries were much alike. In most cases, we
> coded them as in a paper bilingual dictionary, but left out gender, for
> instance, when it did not change. This was inherited, in fact, from
> interNOSTRUM, and was not questioned as it was working reasonably. But
> now, Apertium covers many different languages and morphological
> dictionaries are sometimes very different. Therefore, the question
> arises as to what to encode there. Different criteria may be used.
> Francis Tyers seems to favour reusability (which is nice, but, I agree
> with Felipe, secondary if it is not reusability inside Apertium), but I
> don't think this entails including complete lexical forms like the ones
> that started this thread in the dictionary (after all, not including
> them makes the dictionaries as compact as paper bilingual dictionaries,
> which do not contain everything).

I'd like to bring in the Wiktionary/Wikipedia mantra here: "Wiktionary is
not a paper dictionary". We don't have to underspecify, as we're not
writing on dead trees.

> Another criterion is compactness.
> Héctor Alòs considers a "radical" prefix approach, where not even the
> part-of-speech would be featured. Another criterion is to encode what is
> more likely to be preserved by transfer, which is what speakers of both
> languages would put in a bilingual dictionary, as morphology would be
> automatically discounted in their minds.
> But as I said above, we need to reflect on the interplay between these
> criteria and try to draft a recommendation.

Morphology would probably be discounted, but would gender? Really?

> 4. One thing in favour of having more than the minimum information
> necessary is that excess information that is the same on both sides may
> easily be automatically removed for applications like the ones Felipe
> mentions.
>
> 5. I am not in favour of using <i> in bilingual dictionaries, if you
> want a coding recommendation from a pioneer. It is an early mistake
> (from the times of Spanish-Catalan and Spanish-Galician) that should be
> avoided.

Yes, I agree with this: '<i>' in bilingual dictionaries is bad.[tm]

> 6. We don't usually have paradigms in bilingual dictionaries, but I
> believe we could avoid a great deal of "default" structural transfer
> by adding paradigms to bilingual dictionaries that would deal with the
> tags in "default" situations when morphological dictionaries are very
> different in their tagsets. Just an idea from your president.

In dictionaries from newer pairs we do. In fact, Jacob gave a short talk
about this (and other things) at FreeRBMT 2009.

> 7. Isn't this the kind of stuff that would have to be treated in an
> Apertium conference? Shouldn't people draft RFCs (requests for
> comments), and shouldn't all of us discuss them?

Yes, definitely.

F.

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff