On Thursday, 5 May 2016 at 23:47:15 UTC, H. S. Teoh wrote:

Rule-based letter-to-sound systems don't work too well for English precisely because you have to basically reproduce 500 years' worth of sound change plus all the exceptions introduced by words borrowed from other contemporary languages across the centuries. A rule-based system possibly could work, provided the rules were extensive enough (and multi-layered, to account for borrowed exceptions and other oddities). But there comes a point where even the most industrious programmer would throw up his hands and say, forget this exercise in futility, let's just have the machine teach itself instead.

It's not just sound changes; English is just weird from a non-native speaker's point of view. As Kurt Tucholsky, one of the best German writers ever, once said, English is a simple and a difficult language at the same time: it consists of foreign words that are pronounced wrongly. English pronunciation makes any speaker of a Latin language cringe. In many European languages, and certainly in Latin languages, the letter-to-sound correspondence is more or less one-to-one: <a> is /a/, <e> is /e/ etc. In English they are often /ei/ and /i:/, and <i> is often /ai/ (oh, for f**k's sake!): "emeritus", a Latin word, is pronounced /e.'me(:).ri.tus/, but in English it's /em@.'rai.d@s/. This just makes you cringe. Native speakers of English often don't realize how weird their pronunciation sounds to those who natively speak the languages the words were borrowed from (around 60% of the words). It makes me laugh when I hear English speakers say "Oh, there is no Irish word for 'afterhours'!?" - Well, what's the English for "restaurant", "evict", "condone", "depot", "deposit" ... and what's the English for "language"?

Rule-based systems work better for Spanish because the orthography is much closer to actual pronunciation, and other parameters such as stress are more predictable. I'd venture to guess that rule-based systems might not work as well for Russian, in spite of the orthography being almost 1-to-1 with actual pronunciation, because of unpredictable stress positions which can fundamentally alter vowel values. At best, you'd need a database of stress patterns for various words so that the accent would fall in the correct places, plus a set of exceptions for certain archaic word combinations that have unusual stress. If you had a database of English stress positions, I think half the battle would already be won.

French would have the same problem as English, except that you could just do as a first approximation:

        if (rand() > someFactor)
                word = word[0 .. $/2];

and then touch it up with a small set of exceptions.  :-P


T

Are Russian stress-rules based on context? Long vs. short vowels, palatalized vs. velarized consonants etc.? If yes, you can program rules.
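
The two ideas above, a database of stress positions and programmable rules, can be combined: look the word up first and only fall back to a rule when there is no entry. Below is a minimal sketch in D, not a real letter-to-sound implementation. The function name `stressedSyllable`, the lexicon entries, and the penultimate-stress fallback are all assumptions made up for illustration, and the syllable count is passed in by hand rather than computed.

```d
import std.stdio : writeln;

// Returns the zero-based index of the stressed syllable.
// A lexicon hit wins; otherwise fall back to a default rule.
int stressedSyllable(string word, int syllableCount, const int[string] lexicon)
{
    if (auto entry = word in lexicon)
        return *entry;
    // Fallback (an assumption for this sketch): stress the penultimate syllable.
    return syllableCount >= 2 ? syllableCount - 2 : 0;
}

void main()
{
    // Hypothetical lexicon entries: word -> stressed syllable index.
    int[string] lexicon = ["hotel": 1, "kangaroo": 2];

    assert(stressedSyllable("hotel", 2, lexicon) == 1);    // lexicon overrides the rule
    assert(stressedSyllable("kangaroo", 3, lexicon) == 2); // lexicon overrides the rule
    assert(stressedSyllable("banana", 3, lexicon) == 1);   // no entry: penultimate rule
    writeln("ok");
}
```

Context-dependent rules (vowel length, palatalization and so on) would replace the single fallback line; the lexicon would then only need to hold the words those rules get wrong.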
