On 11 October 2011 02:46, Bernard Chardonneau <bechapert...@free.fr> wrote:
> I don't know if you want it just for translating or to develop this
> language pair.
>
> I am interested in developing pairs that include French.
>
> For now, at least until December, I prefer to work only on the Apertium
> wiki translations in French. It's also a way to learn about Apertium.
>
> For next year, fr <-> en is the second pair I think I should work
> on.
>
> The first one is from Esperanto to French (using what was done in the
> opposite direction, with transfer rules from other pairs as models).
>
> For the current fr-en pair, I haven't checked whether the dictionary is
> big enough to be interesting. If this pair is at nursery level, the
> dictionaries may be small.
>

The dictionary is relatively large, but it would take about 2 years to
get it to the level

> So if I had to build them I would choose to use French - Esperanto and
> English - Esperanto dictionaries and crossdics, for two reasons:
> - first, the fr-eo translator has good coverage (better than fr-es on
>  the texts I gave it), and the en-eo dictionaries are a bit bigger (as
>  far as I remember)

The fr-es and en-es dictionaries have been developed for years to be
bi-directional; eo-en has had much less work in choosing the right
candidate from eo->en, and fr-eo has, to my knowledge, had no work
done on fr->eo to date. That makes them quite poor choices for
crossing.

Done right, crossing can give you larger dictionaries in less time,
but it's not a fully automatic process, because triangulation errors
are unavoidable. And crossing is rarely done right.
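
To make the triangulation problem concrete, here is a minimal Python
sketch of naive crossing through a pivot language. It is not how
crossdics is implemented; the cross() function and the toy word lists
are invented for illustration only.

```python
from collections import defaultdict


def cross(src_to_pivot, pivot_to_tgt):
    """Pair every source word with every target word that shares a pivot word."""
    crossed = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                crossed[src].add(tgt)
    return dict(crossed)


# Toy data (hypothetical entries): French -> Spanish and Spanish -> English.
# Spanish "banco" covers both the financial "bank" and the "bench" you sit on.
fr_es = {"banque": {"banco"}, "banc": {"banco"}}
es_en = {"banco": {"bank", "bench"}}

fr_en = cross(fr_es, es_en)
for word in sorted(fr_en):
    print(word, sorted(fr_en[word]))
```

Both French words come out paired with both English words, so
"banque" -> "bench" and "banc" -> "bank" are triangulation errors that
somebody has to filter out afterwards.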

Crossdics doesn't just cross the dictionaries; it also produces a set
of patterns it encountered, which can be refined into a crossing model.
With an iterative process of re-crossing and refining the model, you
can eventually get a relatively good dictionary (it took about 8
attempts on my current project, if I remember correctly). The first
crossing is essentially useless as a dictionary, but it's impossible
to know how to refine the model without seeing the errors that were
generated, so it's necessary.
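
The re-cross-and-refine loop can be pictured roughly as below. This is
only a sketch of the idea in Python: the tuple layout, the tag names and
the tally_patterns() and apply_model() helpers are all assumptions made
up for this example, and do not reflect crossdics's actual file formats
or options.

```python
from collections import Counter

# Hypothetical crossed candidates: (source word, source tag, target word, target tag).
candidates = [
    ("banque", "n.f", "bank", "n"),
    ("banc", "n.m", "bench", "n"),
    ("manger", "vblex", "food", "n"),
]


def tally_patterns(cands):
    """Count how often each (source tag, target tag) combination occurs."""
    return Counter((s_tag, t_tag) for _, s_tag, _, t_tag in cands)


def apply_model(cands, accepted):
    """Keep only candidates whose tag combination the model accepts."""
    return [c for c in cands if (c[1], c[3]) in accepted]


# First pass: look at which patterns the crossing produced.
patterns = tally_patterns(candidates)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)

# After inspecting the errors, accept only the patterns that look sound,
# then filter (or re-cross) with that model and repeat.
model = {("n.f", "n"), ("n.m", "n")}
kept = apply_model(candidates, model)
print(kept)
```

Each round shows a new batch of errors, and the model is tightened or
loosened accordingly before the next crossing.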

> - secondly, I think Esperanto is a good choice for a crossing language,
>  with few homonyms (one word, one meaning).

Realistically, the dictionaries themselves are more of a factor than
anything inherent to the language.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
