On Wed, 04 Apr 2012 at 08:53 +0200, Per Tunedal wrote:
> Hi,
> I have just rapidly scanned through the documentation and have some very
> basic questions about building a new language pair:

I'll give some very brief answers below.

> 1. Can I reuse the already developed monolingual dictionaries for the
> two languages?

Yes

> 2. Is someone maintaining an updated version (i.e. joining all
> additions) of the most complete monolingual dictionary for each
> language?

No. This is something we would like to do; there is a GSoC idea for it:

http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Monolingual_and_bilingual_data_decoupling

> 3. Can the two monolingual dictionaries for a new language pair have
> different length (i.e. contain a different number of words) - relative
> each other and relative the bilingual dictionary?

Not currently. A word that is in one dictionary but missing from another
causes errors in the translation. Here is an example translation from
nursery.

Original:

Planten finnes naturlig i Lilleasia og i deler av Midtøsten (Iran,
Irak). Den står oppført på norsk svarteliste som uønsket og er utbredt
langs veikanter, bekkekanter, og bakgårder i Nord-Norge, fra Finnmark og
sørover til Nord-Trøndelag, men den er også funnet i Sør-Norge, særlig i
Osloområdet. finns også i Østfold og Vestfold.

MT output from no-en in nursery:

Plant #be natural #in #Asia Minor and #in parts #of #Middle East (Iran,
Iraq). He stands staged on Norwegian #blacklist as undesirable and are
expanded #along #roadside, *bekkekanter, and @bakgård #in #Northern
Norway, from Finnmark and @sørover #to #Nord-Trøndelag, but he is also
found #in #South Norway, especially #in *Osloområdet. Finn's also #in
#Østfold and #Vestfold.

My "translation" (I don't know Norwegian):

The plant is found naturally in Asia Minor and in parts of the Middle
East (Iran, Iraq). It is currently on the Norwegian blacklist as
undesirable and is alongside roadsides, _?_ and back gardens in Northern
Norway, from Finnmark southwards to Nord-Trøndelag, but it is also found
in South Norway, especially in Osloområdet. It is also found in Østfold
and Vestfold.

Unhammer and I basically put this pair together in a few days, hence the
missing transfer rules (#be) and the bilingual dictionary errors
(@sørover).
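
For reading the output above: in Apertium output, * marks a word the
source analyser does not know, @ a word that did not get through the
bilingual dictionary, and # a form the target side could not generate.
A minimal Python sketch that tallies the marks in a piece of MT output
follows. The function name and the regular expression are only an
illustration, not an Apertium tool.

```python
import re
from collections import Counter

# Meaning of the marks Apertium leaves in its output.
MARKS = {
    "*": "unknown word",
    "@": "not in bilingual dictionary",
    "#": "generation failure",
}

# A mark only counts at the start of a token: at the start of the text,
# or right after whitespace or an opening parenthesis.
MARK_RE = re.compile(r"(?<![^\s(])([*@#])(?=\w)")


def count_marks(mt_output: str) -> Counter:
    """Count how often each kind of error mark occurs in MT output."""
    return Counter(MARKS[mark] for mark in MARK_RE.findall(mt_output))


if __name__ == "__main__":
    sample = (
        "and @sørover #to #Nord-Trøndelag, but he is also found "
        "#in #South Norway, especially #in *Osloområdet."
    )
    for kind, count in count_marks(sample).most_common():
        print(f"{kind}: {count}")
```

Many # marks usually point at missing transfer rules or target
dictionary entries, many @ marks at gaps in the bilingual dictionary.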

> In case of a positive answer to my questions:
> 
> Does the job of creating a new language pair basically consist of "just"
> creating a bilingual dictionary? In that case it would be easy to start
> with some frequent words and let it grow over time.

In many cases it consists of "just" creating a bilingual dictionary,
writing the transfer rules, and performing the "vocabulary test" [1].
This can take between 10 days and three months.
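
Roughly, the vocabulary test checks that every form the source
dictionary can analyse passes through the bilingual dictionary and can
be generated on the target side. A toy sketch of that idea in Python,
using plain sets and a dict rather than the real dictionary formats or
testvoc scripts; the function name and the sample entries are made up
for illustration:

```python
def coverage_gaps(source_lemmas, bidix, target_lemmas):
    """Return (missing_in_bidix, not_generated) for a toy language pair.

    source_lemmas: set of lemmas in the source monolingual dictionary
    bidix:         dict mapping source lemma -> target lemma
    target_lemmas: set of lemmas in the target monolingual dictionary
    """
    # Source words with no bilingual entry (these would show up as @).
    missing_in_bidix = sorted(
        lemma for lemma in source_lemmas if lemma not in bidix
    )
    # Words that translate, but to something the target dictionary
    # cannot generate (these would show up as #).
    not_generated = sorted(
        lemma
        for lemma in source_lemmas
        if lemma in bidix and bidix[lemma] not in target_lemmas
    )
    return missing_in_bidix, not_generated


if __name__ == "__main__":
    # Hypothetical entries, not taken from any real Apertium dictionary.
    source = {"plante", "sørover", "være"}
    bidix = {"plante": "plant", "være": "be"}
    target = {"plant"}

    missing, ungenerated = coverage_gaps(source, bidix, target)
    print("missing in bidix:", missing)
    print("not generated:", ungenerated)
```

The pair is clean when both lists are empty, which is why dictionaries
of different sizes need trimming or filling in before a release.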

Fran

1. http://wiki.apertium.org/wiki/Testvoc

> Yours,
> Per Tunedal
> 
> PS Why is the pair sv-en presented as in "nursery" if there is neither
> any files left, nor any work going on?

No idea; it should probably be in incubator. It was probably created
along with the Luxembourg Workshop because we had to make skeletons for
all language pairs in the matrix.


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
