On Tue, 04 Sep 2012 at 09:44 +0200, Per Tunedal wrote:
> Hi,
> I've been thinking about facilitating contributions once more.
> 
> On Wed, Aug 22, 2012, at 14:14, Jimmy O'Regan wrote:
> > On 22 August 2012 12:06, Per Tunedal <per.tune...@operamail.com> wrote:
> > > Hi,
> > > OK.
> > > Back to my original wish for some kind of easy to use interface for
> > > contributions for specific domains. I guess most experts in law or
> > > beetles are not hackers and would not even think of contributing by
> > > adding codes to dictionary files.
> > 
> > There have been a number of attempts at this over the years, and it's
> > surprisingly difficult to get right. IIRC, someone at UA was working
> > on it, but I don't have details.
> 
> It would be very interesting to know exactly what went wrong. From the
> outside it looks like a very simple task. There must be something fishy
> about that! I have a very strong interest in things that go wrong, as
> they might teach you something important.

Hi,

As one of the people who has worked on this, I can fill you in.

Let's start off with a simple example, translating the Icelandic word
"tíð" to French, "temps"...

Imagine for a second that we have a typical bilingual dictionary entry.
It looks like:

tíð<n><f>:temps<n><m>

So far so good, right? This would be easy for newbies to add.

Now let's take it a bit further:

Our source language dictionary for "tíð" looks like:

tíð:tíð<n><f><sg><nom><ind>
tíð:tíð<n><f><sg><acc><ind>
tíð:tíð<n><f><sg><dat><ind>
tíðar:tíð<n><f><sg><gen><ind>
tíðin:tíð<n><f><sg><nom><def>
tíðina:tíð<n><f><sg><acc><def>
tíðinni:tíð<n><f><sg><dat><def>
tíðarinnar:tíð<n><f><sg><gen><def>
tíðir:tíð<n><f><pl><nom><ind>
tíðir:tíð<n><f><pl><acc><ind>
tíðum:tíð<n><f><pl><dat><ind>
tíða:tíð<n><f><pl><gen><ind>
tíðirnar:tíð<n><f><pl><nom><def>
tíðirnar:tíð<n><f><pl><acc><def>
tíðunum:tíð<n><f><pl><dat><def>
tíðanna:tíð<n><f><pl><gen><def>
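
To make the shape of this listing concrete, here is a minimal Python
sketch that reads a few of the "surface form:lexical form" lines above
into a lookup table. It is only a toy model, not how lttoolbox stores
its dictionaries; the names LINES, build_analyser and ANALYSER are
hypothetical. It shows that a single surface form such as "tíð" already
maps to several analyses.

```python
from collections import defaultdict

# A few of the "surface:lexical form" lines from the listing above.
LINES = [
    "tíð:tíð<n><f><sg><nom><ind>",
    "tíð:tíð<n><f><sg><acc><ind>",
    "tíð:tíð<n><f><sg><dat><ind>",
    "tíðar:tíð<n><f><sg><gen><ind>",
    "tíðir:tíð<n><f><pl><nom><ind>",
    "tíðir:tíð<n><f><pl><acc><ind>",
]


def build_analyser(lines):
    """Map each surface form to the list of its lexical forms."""
    analyser = defaultdict(list)
    for line in lines:
        # Split on the first colon only: surface form left, analysis right.
        surface, lexical = line.split(":", 1)
        analyser[surface].append(lexical)
    return dict(analyser)


ANALYSER = build_analyser(LINES)

# Print how many analyses each surface form has.
for surface, analyses in ANALYSER.items():
    print(surface, len(analyses))
```

Any contribution interface has to produce all of these forms from
whatever the contributor types in, which is the first place the task
stops being simple.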

And our target language dictionary for "temps" looks like:

temps:temps<n><m><sp>

So now we are stuck. If we translate "tíð" to "temps" using the
bilingual dictionary entry above, we get "temps<n><m><sg>" or
"temps<n><m><pl>". Neither form exists in the French dictionary, which
only has <sp>, so we get a generation error (marked with '#').
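
The failure can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not the real Apertium pipeline: the
dictionaries are plain dicts, the names BIDIX, GENERATOR, transfer and
generate are hypothetical, and case and definiteness are simply dropped.
The '#' prefix here only imitates the error marker mentioned above.

```python
# Toy model of the mismatch; not the real Apertium/lttoolbox code.

# The simple bilingual entry: Icelandic side -> French side.
BIDIX = {"tíð<n><f>": "temps<n><m>"}

# French generator: lexical form -> surface form. Only <sp> exists.
GENERATOR = {"temps<n><m><sp>": "temps"}


def transfer(analysis):
    """Translate the lemma and gender, copying the number tag across.

    Case and definiteness are dropped (an assumption made for brevity).
    """
    for source, target in BIDIX.items():
        if analysis.startswith(source):
            rest = analysis[len(source):]
            for number in ("<sg>", "<pl>"):
                if number in rest:
                    return target + number
            return target
    raise KeyError(analysis)


def generate(lexical_form):
    """Return the surface form, or the input prefixed with '#' on failure."""
    return GENERATOR.get(lexical_form, "#" + lexical_form)


# The number tag copied from Icelandic has no match on the French side.
print(generate(transfer("tíð<n><f><sg><nom><ind>")))
print(generate(transfer("tíðum<n><f><pl><dat><ind>".replace("tíðum", "tíð"))))
# The form the French dictionary actually contains generates fine.
print(generate("temps<n><m><sp>"))
```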

What do we do?

tíð<n><f><sg>:>:temps<n><m><sp>
tíð<n><f><pl>:>:temps<n><m><sp>
tíð<n><f><ND>:<:temps<n><m><sp>

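Here :>: restricts an entry to the Icelandic-to-French direction and :<:
to the French-to-Icelandic direction. The <ND> tag marks that the number
is to be determined by the structural transfer module.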
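
As a rough illustration of how these three entries behave, here is a
small Python sketch. It assumes the reading that ":>:" entries apply
only when translating left to right and ":<:" entries only right to
left; the names ENTRIES and translate are hypothetical, and real
bilingual lookup in Apertium is more involved than a prefix match.

```python
# Toy model of direction-restricted bilingual entries.
# Each entry is (left side, direction marker, right side).
ENTRIES = [
    ("tíð<n><f><sg>", ":>:", "temps<n><m><sp>"),
    ("tíð<n><f><pl>", ":>:", "temps<n><m><sp>"),
    ("tíð<n><f><ND>", ":<:", "temps<n><m><sp>"),
]


def translate(lexical_form, left_to_right=True):
    """Translate using only the entries allowed in the given direction.

    Tags after the matched part are carried over unchanged; returns None
    when no entry applies.
    """
    wanted = ":>:" if left_to_right else ":<:"
    for left, direction, right in ENTRIES:
        if direction != wanted:
            continue
        source, target = (left, right) if left_to_right else (right, left)
        if lexical_form.startswith(source):
            return target + lexical_form[len(source):]
    return None


# Icelandic -> French: both numbers collapse to <sp>.
print(translate("tíð<n><f><sg>"))
print(translate("tíð<n><f><pl>"))
# French -> Icelandic: number is left as <ND> for structural transfer.
print(translate("temps<n><m><sp>", left_to_right=False))
```

Note that one headword now needs three entries instead of one, and the
contributor has to know why.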

But, this is only half the story. We've taken care of the minor
structural difference, but then both words may also have more
translations:

tíð (f):

  1. (tímabil) period, time
  2. (veðurfar) weather conditions
  3. pl (blóðlát kvenna) menstruation, period
  4. pl (messa) mass
  5. (í málfr) tense 

temps (m):

  1. (uncountable) time (in general) 
  2. (uncountable) weather 
  3. (countable) tense (grammar) 

So, the senses overlap in 1, 2 and 5. As an exercise for the reader,
try to make bilingual dictionary entries with the appropriate direction
restrictions for senses 3 and 4 in Icelandic, translating to French.

There are more bilingual dictionary patterns here:

http://wiki.apertium.org/wiki/Bilingual_dictionary

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
