--- In FairfieldLife@yahoogroups.com, Bhairitu <noozg...@...> wrote:
>
> As a software professional and developer/designer, my first step 
> would be to buy some time from linguistics experts.  I would also 
> brainstorm with other developers on solutions to the problem of 
> language translation. Then you would want to do some modeling on 
> several of the solutions and see what floats and what sinks. Some 
> of the first translation software that I bought years ago came out 
> of Russia. That was most likely based on research done at their 
> universities and institutes. And yes, I might well look into 
> Sanskrit as an intermediate language too and most certainly track 
> down what has already been done in that area. 

First, I was just rappin' and trippin' on the
idea, not actually proposing that I'd ever be
interested in writing such software.  :-)

Second, I did a little Googling and found that
I was on the right track in at least two critical
areas. The first is that there is a general
consensus that the higher the degree of ambiguity
in a language (the more ways a sentence can be
translated, given all the possible meanings and
parts of speech its words can have), the more
monumental the task of translating that language
is. The second is that such translation programs
are, in fact, built in discrete stages, not as
sheer number-crunching dictionary matches.
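
Just to make the ambiguity point concrete, here's a
quick Python toy (a lexicon I made up on the spot,
not anything from a real system) that counts how
many part-of-speech readings one famously ambiguous
sentence has:

  from itertools import product

  # Toy lexicon: each word maps to its possible parts
  # of speech (made-up entries, for illustration only).
  LEXICON = {
      "time":  ["noun", "verb"],
      "flies": ["noun", "verb"],
      "like":  ["verb", "preposition"],
      "an":    ["determiner"],
      "arrow": ["noun"],
  }

  sentence = "time flies like an arrow".split()

  # Every combination of tags is a candidate reading
  # the translator has to consider (or rule out).
  readings = list(product(*(LEXICON[w] for w in sentence)))
  print(len(readings), "possible tag sequences")  # -> 8

Eight readings for a five-word sentence, and that's
before you even get to word senses.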

Dictionary matching is seen as the least useful
and least practical method. A base knowledge of
linguistics and natural language is considered
essential. "Intermediate languages" are used,
but the intermediate is never an existing human
language; no human language is precise and
unambiguous enough to qualify. Instead, it is a
made-up symbolic language into which the source
human language is decoded, prior to encoding it
back into other human languages. This tends to
be referred to as the "interlingual" approach.

An assumption behind my earlier rap that I didn't
spell out was that any such system has to be
heavily rule-based. All the sources I found today
agree on that.

Dictionary-based translation (one-to-one word 
mapping) is considered the least successful
approach, as I suspected. "Statistical" methods
translate by comparing many side-by-side previous
translations between the two languages. "Example-
based" translation is similar to what I described
as "parsing for idioms," and makes use of a large
database of known phrases and word combinations.
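
A crude way to see the difference between word-for-
word lookup and the example-based approach, again
with toy data I'm inventing on the spot (a French
idiom in, English out):

  # Word-for-word dictionary lookup vs. example-based
  # phrase lookup, with a made-up French-to-English
  # toy dataset.

  WORD_DICT = {"il": "he", "pleut": "rains",
               "des": "of", "cordes": "ropes"}

  PHRASE_DB = {
      "il pleut des cordes": "it is raining cats and dogs",
  }

  def dictionary_translate(text):
      return " ".join(WORD_DICT.get(w, w) for w in text.split())

  def example_based_translate(text):
      # Check the phrase database first, then fall back
      # to word-by-word lookup.
      return PHRASE_DB.get(text, dictionary_translate(text))

  src = "il pleut des cordes"
  print(dictionary_translate(src))     # he rains of ropes
  print(example_based_translate(src))  # it is raining cats and dogs

Word-for-word gives you gibberish; the phrase database
catches the idiom whole.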

The best software so far uses combinations of all
these methods, not just one, but it tends to
*rely* on one approach more than the others. For
example, SYSTRAN (which underlies Yahoo's Babel-
fish) is primarily a rule-based system, 
whereas Google Translate's engine is primarily
statistical.

It is generally agreed that the best commercial
translation software available rarely works well
enough "out of the box." Instead, like optical
recognition software for scanning printed pages,
it needs to have the ability to "learn" from past
mistakes and correct similar mistakes in the future.
Thus you translate, have a native speaker of the
target language go in and make corrections to the
output, send that back to the processing engine,
and hopefully it will do better the next time, 
having "learned" from its own mistakes.

And that's all I know about this arcane subject...  :-)


