On Wed, 04 Apr 2012 at 08:53 +0200, Per Tunedal wrote:
> Hi,
> I have just rapidly scanned through the documentation and have some very
> basic questions about building a new language pair:

I'll give some very brief answers below.

> 1. Can I reuse the already developed monolingual dictionaries for the
> two languages?

Yes.

> 2. Is someone maintaining an updated version (i.e. joining all
> additions) of the most complete monolingual dictionary for each
> language?

No. This is something we would like to do, and there is a GSoC idea for it:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Monolingual_and_bilingual_data_decoupling

> 3. Can the two monolingual dictionaries for a new language pair have
> different lengths (i.e. contain a different number of words), relative
> to each other and to the bilingual dictionary?

Not currently. This would cause errors in the translation. Here is an example translation from Nursery.

Original:

Planten finnes naturlig i Lilleasia og i deler av Midtøsten (Iran, Irak). Den står oppført på norsk svarteliste som uønsket og er utbredt langs veikanter, bekkekanter, og bakgårder i Nord-Norge, fra Finnmark og sørover til Nord-Trøndelag, men den er også funnet i Sør-Norge, særlig i Osloområdet. finns også i Østfold og Vestfold.

MT output from no-en in nursery:

Plant #be natural #in #Asia Minor and #in parts #of #Middle East (Iran, Iraq). He stands staged on Norwegian #blacklist as undesirable and are expanded #along #roadside, *bekkekanter, and @bakgård #in #Northern Norway, from Finnmark and @sørover #to #Nord-Trøndelag, but he is also found #in #South Norway, especially #in *Osloområdet. Finn's also #in #Østfold and #Vestfold.

My "translation" (I don't know Norwegian):

The plant is found naturally in Asia Minor and in parts of the Middle East (Iran, Iraq). It is currently on the Norwegian blacklist as undesirable and is alongside roadsides, _?_ and back gardens in Northern Norway, from Finnmark southwards to Nord-Trøndelag, but it is also found in South Norway, especially in Osloområdet. It is also found in Østfold and Vestfold.
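In Apertium's raw output, `*` marks a word unknown to the source analyser, `@` a word with no entry in the bilingual dictionary, and `#` a form the target generator could not produce. The Python sketch below is a hypothetical helper, not part of Apertium. It groups the marked words in a piece of output, which gives a quick view of which dictionary or rule set needs work. It assumes each mark appears as a prefix on a whitespace-separated token, so a multiword such as "#Asia Minor" is reported only by its first word.

```python
# Hypothetical helper (not part of Apertium): group the words in raw
# Apertium output by the diagnostic mark prefixed to them.
MARKS = {
    "*": "unknown word (not in the source analyser)",
    "@": "missing bilingual dictionary entry",
    "#": "target-side generation error",
}


def flag_marked_words(mt_output: str) -> dict[str, list[str]]:
    """Return {mark: [marked words]} for each of '*', '@' and '#'."""
    found: dict[str, list[str]] = {mark: [] for mark in MARKS}
    # Assumption: marks are prefixes on whitespace-separated tokens.
    for token in mt_output.split():
        mark = token[0]
        if mark in MARKS and len(token) > 1:
            # Drop the mark itself and trailing punctuation such as ',' or '.'.
            found[mark].append(token[1:].rstrip(".,;:()"))
    return found


if __name__ == "__main__":
    sample = ("He stands staged on Norwegian #blacklist as undesirable and are "
              "expanded #along #roadside, *bekkekanter, and @bakgård #in "
              "#Northern Norway, from Finnmark and @sørover")
    for mark, words in flag_marked_words(sample).items():
        print(mark, MARKS[mark], words)
```

On that excerpt of the no-en output, the sketch lists `bakgård` and `sørover` under `@` and `bekkekanter` under `*`, matching the bilingual dictionary gaps and unknown words visible in the text.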

This pair was put together by Unhammer and me in a few days, hence the missing transfer rules (#be) and the bilingual dictionary errors (@sørover).

> In case of a positive answer to my questions:
>
> Does the job of creating a new language pair basically consist of "just"
> creating a bilingual dictionary? In that case it would be easy to start
> with some frequent words and let it grow over time.

In many cases it consists of "just" creating a bilingual dictionary and the transfer rules, and performing the "vocabulary test" [1]. This can take between 10 days and three months.

Fran

1. http://wiki.apertium.org/wiki/Testvoc

> Yours,
> Per Tunedal
>
> PS Why is the pair sv-en listed in "nursery" if there are no files
> left and no work going on?

No idea; it should probably be in incubator. It was probably created for the Luxembourg Workshop, because we had to make skeletons for all the language pairs in the matrix.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
