On 11 October 2011 02:46, Bernard Chardonneau <bechapert...@free.fr> wrote:
> I don't know if you want it just for translating or to develop this
> language pair.
>
> I am interested in developing pairs including French.
>
> Presently, at least up to December, I prefer only working on Apertium
> wiki translations in French. It's also a way to learn about Apertium.
>
> For next year, fr <-> en is the second pair I think I should work
> on.
>
> The first one is from Esperanto to French (using what was done in the
> opposite direction and models of transfer rules in other pairs).
>
> For the actual fr-en pair, I didn't see whether the dictionary is big
> enough to be interesting. If this pair is at nursery level, dictionaries
> may be small.
>
The dictionary is relatively large, but it would take about 2 years to
get it to the level

> So if I had to build them I would choose to use French - Esperanto and
> English - Esperanto dictionaries and crossdics for two reasons:
> - first, the fr-eo translator has good coverage (better than fr-es with
> the texts I gave it), and en-eo dictionaries are a bit bigger (according
> to what I remember)

The fr-es and en-es dictionaries have been developed for years to be
bi-directional; eo-en has had much less work in choosing the right
candidate from eo->en, and fr-eo has, to my knowledge, had no work done
on fr->eo to date. That makes them quite poor choices for crossing.

Done right, crossing can give you larger dictionaries in less time, but
it's not a fully automatic process, because triangulation errors are
unavoidable. And crossing is rarely done right.

Crossdics doesn't just cross the dictionaries; it also produces a set of
the patterns it encountered, which can be refined into a crossing model.
With an iterative process of re-crossing and refining the model, you can
eventually get a relatively good dictionary (it took about 8 attempts on
my current project, if I remember correctly). The first crossing is
essentially useless as a dictionary, but it's impossible to know how to
refine the model without seeing the errors that were generated, so it's
necessary.

> - secondly, I think esperanto is a good choice for a cross language with
> few homonyms (one word, one meaning).

Realistically, the dictionaries themselves are more of a factor than
anything inherent to the language.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
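
To make the point about triangulation errors concrete, here is a minimal
Python sketch of naive crossing through a pivot language, followed by one
round of manual refinement. It is not how crossdics works internally, and
it does not use Apertium's dictionary format. The function names (cross,
refine), the toy word lists, and the set of rejected pairs are all
assumptions made up for illustration. Spanish is used as the pivot only
because "banco" (bank / bench) gives an obvious example of the problem.

```python
from collections import defaultdict


def cross(src_pivot, tgt_pivot):
    """Naively cross two bilingual word lists through a shared pivot.

    Both arguments are iterables of (word, pivot_word) pairs.
    Returns a dict mapping each source word to a sorted list of
    target-language candidates reached via the pivot.
    """
    by_pivot = defaultdict(set)
    for tgt, pivot in tgt_pivot:
        by_pivot[pivot].add(tgt)

    crossed = defaultdict(set)
    for src, pivot in src_pivot:
        crossed[src] |= by_pivot.get(pivot, set())
    return {src: sorted(tgts) for src, tgts in crossed.items() if tgts}


def refine(crossed, rejected):
    """Drop (source, target) pairs that a human reviewer marked as wrong.

    A hypothetical stand-in for refining a crossing model: here the
    "model" is just a set of rejected pairs.
    """
    refined = {}
    for src, tgts in crossed.items():
        kept = [t for t in tgts if (src, t) not in rejected]
        if kept:
            refined[src] = kept
    return refined


# Toy data (illustrative only, not taken from any Apertium dictionary).
FR_ES = [("banc", "banco"), ("banque", "banco"), ("chien", "perro")]
EN_ES = [("bench", "banco"), ("bank", "banco"), ("dog", "perro")]

# First crossing: the ambiguous pivot "banco" produces spurious pairs
# such as banc -> bank and banque -> bench.
first_pass = cross(FR_ES, EN_ES)

# Only after seeing those errors can a reviewer list what to reject.
REJECTED = {("banc", "bank"), ("banque", "bench")}
second_pass = refine(first_pass, REJECTED)

for fr, candidates in sorted(second_pass.items()):
    print(fr, "->", ", ".join(candidates))
```

The first pass pairs both French words with both English words, which is
why a first crossing cannot be used as-is. The rejection list could only
be written after inspecting that output, and a real project would repeat
the cycle several times.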