RE: [Mt-list] Cheeseburgery hamburgers the problem of computerised translations
Thanks Mikel. I see our comment got a positive comment too. Some of this is just a continual need for education. I hope that FT steps up and does a solid article about where the technology really is today and where it seems to be going.

It has been a while since we've communicated about Apertium etc. In the context of Africa I think we're moving towards being able to develop such tools for African languages. At the moment efforts are relatively minor. I think I related my idea that South Africa would be a great place to develop MT for closely related languages - they have the resources, the policy commitment in principle to linguistic diversity (not a small matter), and two sets of official languages that are closely related. With the need to produce some documents in the various official languages, the ability to facilitate translation from say Zulu into Xhosa (Nguni languages) or Sotho into Tswana could be quite important. However, most of the talent to work on such tools is otherwise occupied with locales, terminology, fonts, keyboards etc in the African Network for Localisation (ANLoc) project. In the longer term it will get attention...

All the best.

Don

-----Original Message-----
From: Mikel L. Forcada [mailto:m...@dlsi.ua.es]
Sent: Sunday, February 01, 2009 2:08 AM
To: mt-list@eamt.org
Cc: Don Osborn
Subject: Re: [Mt-list] Cheeseburgery hamburgers the problem of computerised translations

On Saturday 31 January 2009 15:53:06, Don Osborn wrote:
> FYI, this item on a Financial Times blog may be of interest - another
> article on how inadequate MT is. I posted a comment; others may want to also.

I did. Thanks for pointing out.

Mikel Forcada

> Cheeseburgery hamburgers and the problem of computerised translations
> January 26, 2009 by Tony Barber
> http://blogs.ft.com/brusselsblog/2009/01/cheeseburgery-hamburgers-and-the-problem-of-computerised-translations/

--
Mikel L. Forcada m...@dlsi.ua.es http://www.dlsi.ua.es/~mlf/

___ Mt-list mailing list
[Mt-list] Cheeseburgery hamburgers the problem of computerised translations
FYI, this item on a Financial Times blog may be of interest - another article on how inadequate MT is. I posted a comment; others may want to also.

Cheeseburgery hamburgers and the problem of computerised translations
January 26, 2009 by Tony Barber
http://blogs.ft.com/brusselsblog/2009/01/cheeseburgery-hamburgers-and-the-problem-of-computerised-translations/
[Mt-list] Review of MT articles in MultiLingual
I just posted a review of sorts of the several articles on MT in the April-May edition of MultiLingual at http://donosborn.org/blog/2008/06/30/paradigm-shift-on-machine-translation/ . Nothing too significant in that - and indeed I realize how much I have to learn on the topic - but I thought it was worth calling attention to the topics addressed. Are we at a paradigm shift or watershed point in the practical use of MT and perceptions of it in the business world (if not yet the public at large)? Comments, corrections, etc. are welcome. Don Osborn
[Mt-list] Dvorak on MT Computing's Final Frontiers
Although it's not a technical article, a column in the March edition of PC Magazine by John Dvorak may be of interest. Among other things he deplores the state of MT. See: http://www.pcmag.com/article2/0,2704,2256955,00.asp FWIW, I put up some comments at http://donosborn.org/blog/?p=6 in which I consider some broader issues. Don Osborn Bisharat.net PanAfriL10n.org
[Mt-list] Finnish firm develops machine translation technology
FYI, just saw this item from Engineering News: http://www.engineeringnews.co.za/article.php?a_id=108492

This may not be news to many/most(?) of you. Whenever I see an article in any field that claims a new generation technology my first question (with no disrespect intended) is how does it really compare to other new generation (or old) technologies?

Don Osborn
Bisharat.net
PanAfriL10n.org

Hitech Briefs - Karel Smrcka
Finnish firm develops machine translation technology
Published: 18 May 07 - 0:00

Sunda Systems, of Finland, has developed a new-generation machine-translation technology that can be used to develop efficient machine translators for any pair of languages you care to name. The Sunda MT Workbench provides a set of tools for building high-quality machine translators for even small languages that have been bypassed so far by major vendors.

Sunda Systems applies a number of key technologies to guarantee the suitability of its Sunda MT Workbench for a wide variety of languages and ensure that it can be used for any pair of languages, and contains all the tools needed for building a machine translator from scratch. Dependency theory, for example, is employed to handle sentence structure, as structures that can be quite different on the surface in different languages can actually be quite close when projected on this abstract level.

The company has also pioneered the principle of parallel translation. Among the most important tangible theoretical and practical benefits of this approach is efficiency as a common processor can translate thousands of sentences in a second. This principle also keeps linguistic and computational issues strictly separate, and enables linguists to concentrate on linguistic issues and see the effects of a linguistic change on the system in only a few seconds. The Sunda MT Workbench also includes tools for quality control and teamwork.
A high-quality English-Finnish translator, built using the Sunda MT Workbench, is already in wide use and has yielded good results. To minimise the reworking needed for different languages, the Sunda approach is based on the principle of late commitment, which means that the processing of source-language sentences is conditioned to a specific target language only when it is imperative - not before. This means that a major portion of the source-language processing developed for one pair of languages can be reused in a translator for another target language.

SMOOTHING THE PATH FOR DEVELOPERS

In addition to good translation quality, Sunda Systems has also prioritised the need for its technology and the applications built around it to be efficient, user-friendly, adaptable and robust. Sunda's core translation engine can be embedded easily in external systems using standard programming interfaces and Internet protocols, for example, and runs seamlessly under most commonly used operating systems. The engine also has built-in support for common file formats, such as RTM and HTML - and documents written in these formats retain their formats in translation. Language-independent end-user applications are already available to translate home pages in a Web browser, translate formatted documents, and translate general text content in desktop applications.
[Mt-list] RE: [OT] Terminology relative to NLP for (African) pi-languages and pi-pairs, towards more oral systems
Merci Christian, replies to the last part of your message below...

> ... Last bit of thought: we should be more precise about the term
> language. For example, Chinese is not 1 language, but several (no oral
> intercomprehension), with their dialects: Mandarin, Cantonese, Wu
> (Shanghainese is a dialect of Wu), Fujien, etc. Or Arabic, for that
> matter. And that is quite important for NLP. For instance, a
> morphological analyzer for Literal or Standard Arabic is almost useless
> for Iraqi. The US now have more resources for Iraqi than for Standard
> Arabic...

In the broad sense it may not be possible to have a consistent definition of language that is applicable to all uses. Ethnologue has a consistent approach, though very clearly a splitter one. That may be more useful in some areas (perhaps MT?) than others (like software localization). There is not AFAIK an index of languageness, that is, how independent a language is, or whether it exists as a variant very close to one or more others, whether it or another is a standard for a wider range of uses than the other closely-related tongues, etc. (Incidentally a low languageness index, if there were such a thing, might indicate potential use of certain kinds of MT [shallow transfer models, as far as I understand the term] among the related languages.)

We get into some interesting and complex areas with situations like Chinese and Arabic as you describe - what do you call a language that is pretty much the same written, but different languages spoken? Or when there is coexistence of a related standard/written form and colloquial/spoken forms?

> Concerning African languages, I was told the situation is the same,
> with many dialects and sometimes different writing systems
> (missionaries from various countries and confessions created them
> almost independently). For NLP systems to be really useful, then, they
> must be tuned to these variants.
>
> Also, we should more and more take into account that, although all are
> technically written, their use is mostly oral, and their speakers
> rarely write or read in them. Then, unity provided by a common script
> is somewhat destroyed: systems have to become increasingly directly
> oral.

I'm actually (re)writing something that touches on these issues. Writing systems are sometimes multiple for the same tongue, but nowadays that might be due to differences in country language policies (as pertain to orthographies within their borders - borders that very often split language communities); there are also, as you mention, sometimes legacies of divergent missionary approaches (examples are the orthographies for Twi Ashanti, Twi Akuapem, and Fanti in Ghana). The situation, though, is often dynamic and changing, which is both good and bad news: good in that standardized or unified forms benefit wider use, but bad for NLP or localization when they are in flux due to the transition not being complete or completely adopted.

Your mention of possible directly oral approaches is very much on target. I do, however, see this as a family of technologies including audio-based applications, speech - text transformation software, and of course computer translation programs (MT, translation memory). The sum object would be to make the transition among languages and forms of expression more seamless.

I've had discussions where the notion of written + neo-oral culture in Africa has been mentioned. That's talking big and vague, but as far as I see from the technology anyway, there is a lot that can be done in that direction. The bottom line is how can all these wonderful things ICT can do be made to accommodate situations where there are many languages, often with oral traditions, easy codeswitching by speakers, still low literacy/pluriliteracy rates, and more access to cellphones than computers.

> With that trend, an important question has become how to adapt/reuse
> resources and tools from 1 rho- or mu- language to a variant which is
> still very much pi-!

This is true. Actually I have been looking at this system and others to help categorize our working list of priority languages (and language groups/clusters) at http://www.panafril10n.org/wikidoc/pmwiki.php/PanAfrLoc/MajorLanguages (some experiments offline). The idea is to work towards larger strategies for languages, in which one could divide such a priority list into areas for attention and support.

I also hope that in the case of languages in Africa it will be possible to develop some novel approaches in developing applications, not only adapting/reusing what is created elsewhere.

Don Osborn
Bisharat.net
PanAfrican Localisation project

> Best regards,
> Ch.Boitet
>
> --
> Christian Boitet (Pr. Université Joseph Fourier)
> GETA, CLIPS, IMAG-campus, BP53
> 385, rue de la Bibliothèque
> 38041 Grenoble Cedex 9, France
> Tel: +33 (0)4 76 51 43 55/48 17
> Fax: +33 (0)4 76 44 66 75/51 44 05
> Mel: [EMAIL PROTECTED]
[Mt-list] Translation industry has vast potential in India
This item from the Telugu Portal at http://www.teluguportal.net/modules/news/article.php?storyid=17237 may be of interest. I read it thinking of the potential for machine translation to facilitate work and various kinds of communication across diverse languages in various regions of the developing world. (I was alerted to this article by RSS from Kwintessential Cross Cultural News: http://www.kwintessential.co.uk/crossculturalnews/ )

Don Osborn
Bisharat.net
PanAfrican Localisation project

Nation: Translation industry has vast potential in India: Pitroda
Posted by admin on 2006/10/11 10:00:12

New Delhi, Oct 11 (IANS) The translation industry has the potential to generate more than 500,000 jobs in India, and necessary recommendations would be made to exploit the potential, said Knowledge Commission Chairman Sam Pitroda Wednesday.

"We are working towards strengthening the translation industry by opening state-run training institutions and then open it for the private sector," Pitroda said at a discussion organised by the Confederation of Indian Industry (CII) here.

"The translation industry in India has been neglected so far. India is a diverse country and we don't understand each other's culture or languages. Why can't a Bengali work be translated into a Gujarati work?" he queried. "That's the only way knowledge can be truly imparted."

Pitroda said the entire education system in India needed a complete overhauling - right from government-run schools to institutions of higher education - since education was becoming a privilege for the few who could afford it.

He added that the Knowledge Commission - set up by Prime Minister Manmohan Singh in 2005 - has given a set of 10 recommendations in this regard to the government and another set of 10 suggestions would be made in a couple of months.

"Our recommendations cover areas like increasing the number of universities to 1,500 from 350 in the next few years. We have also given recommendations on libraries, affirmative action, language, translations, literacy and programmes," said Pitroda.

He hoped the recommendations would trigger wide debates in society, and said: "I want criticism to arise because that is how there will be change in people's mindsets, which is very important for the country to develop."

Pitroda - who led India's telecommunications revolution of the 1980s and headed the Technology Mission that covered areas like drinking water and edible oils - said the government had accepted the commission's paper on e-governance.

The Chicago-based technocrat-entrepreneur - who is also part of a UN committee to help push technology across the globe in the 21st century - said India had a long way to go before it could call itself a superpower.
[OT] Terminology (RE: [Mt-list] NLP for (African) pi-languages, not minority languages)
Thanks to all who have responded in this thread. I will follow up offline. Re the question of terminology, and minority languages in particular, here are a few quick thoughts (with apologies for taking this off on a tangent):

1. I hadn't thought of minority being offensive, but I guess we need to be attentive to such matters. The main problem with the term I saw was its imprecision. There was not long ago a project to compile information on minority languages. To the surprise of a few people asked about it, including me, Hausa was one of them (next to Swahili it supposedly has the highest speakership of all African languages). But when we discussed it further, the criteria indeed seemed to admit it: In Hausaland across much of Niger and Nigeria it is the main language, but Hausaphones are minorities elsewhere and it is spoken as a trade language by some people further away. However, by extension, then, just about every other language in Africa is minority as well. What capped it was discovering that Chinese also qualified as a minority language - which it is in fact in many countries, though we wouldn't think to call it (or Spanish or English, etc.) one. As Francis puts it, situational minority languages. But that just shows how dependent the term is on context.

2. So people grope for an appropriate term. For more widely spoken languages, LWC for language of wider communication emerged at some point (rather like lingua francas, but let's not try to sort out the difference between those two here). And at the other extreme there are endangered languages about which, although definitions can vary, there is a generally accepted sense of what it means (though even on that I've read references to Igbo, a language spoken by somewhere on the order of 20 million people, described as endangered - but let's not delve into the issues there either). But in between those two what do you say? Small languages as shorthand for less widely spoken languages are more appropriately spoken of as the latter - but that's too cumbersome. In Europe there was the term lesser-used languages but with uncertain implications - fewer people speak them, or those that do use them less, or both? Local languages is one that I've tried to avoid lately because it seems to me to be used in a way that reduces the languages' status, and is applied only in some parts of the world (and what of local when you have, say, Wolof-speaking merchants in New York and Paris, for instance?). In Francophone countries the term langue partenaire has been coined, but that raises questions of what kind of partnership, and who's partner with whom and why and so on.

3. A lot depends of course on context. Under-resourced languages is very descriptive for ICT contexts and even some traditional technologies (e.g., no textbooks in so many less-widely-spoken languages for the better part of the past century - now that's under-resourced). But maybe not in demographic or sociolinguistic contexts. Just for an example, Fula definitely is under-resourced in the technical and monetary sense, but definitely not linguistically (e.g., its lexicon is staggering - there's a large dictionary of the roots alone). Less commonly taught languages (LCTLs) is purely an academic reference. Pi-language is a new one on me but seems to be mainly a technical reference (pi=poorly informatisées or what?).

4. I ran into this problem personally when I wanted a way to refer to a very wide class of languages not counting the LWCs as LWCs, and came up with an acronym that I think covers the intended field and is in itself constructively ambiguous: MINEL - where M is maternal (which is every language, but here the emphasis is on this role as opposed to the 2nd language role) or minority (sorry!); I is indigenous (which also can mean anything, but here meant in the sense of languages of indigenous peoples); N is national, which is an appellation more common in Francophone countries especially in Africa and is *not* the same as official; E is endangered, or ethnic, which one will hear with regard to languages in some parts of the world (funny that a language might be referred to as ethnic and not indigenous or vice versa, but the criteria for the distinction are arguable); and L could be less-widely-spoken or even local or, well, language.

That about runs the gamut, from what I have. Hope all have a good weekend (some of you are in the midst of it and others just starting, and some of us will work through it either way!).

Don Osborn
[Mt-list] Computers translation in Africa / involving African languages
I have updated a very modest presentation of some info relevant to MT in Africa at http://www.bisharat.net/Trans/ . There is not much there, so I would like to request information/recommendations for other links relating to MT in Africa and in African languages wherever. (The page also needs some reworking, but I'm mainly concerned now with content.) TIA Don Osborn Bisharat.net PanAfrican Localisation project
RE: [Mt-list] Computers translation in Africa / involving African languages
Thanks. It does, but info on Arabic could easily overwhelm a page like that. Is there a (meta-)page(s) with links to MT projects on Arabic? Ideally I would add a link to such a page.

-----Original Message-----
From: Robert Frederking [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 24, 2006 1:04 PM
To: Don Osborn
Cc: mt-list@eamt.org
Subject: Re: [Mt-list] Computers translation in Africa / involving African languages

If Arabic counts, there's much work in the US on Arabic these days.

Don Osborn wrote:
> I have updated a very modest presentation of some info relevant to MT
> in Africa at http://www.bisharat.net/Trans/ . There is not much there,
> so I would like to request information/recommendations for other links
> relating to MT in Africa and in African languages wherever. (The page
> also needs some reworking, but I'm mainly concerned now with content.)
>
> TIA
>
> Don Osborn
> Bisharat.net
> PanAfrican Localisation project
[Mt-list] World's first translation software for cell phone launched in Xiamen
FYI, in case you haven't already seen this item elsewhere...

World's first translation software for cell phone launched in Xiamen
http://english.people.com.cn/200607/28/eng20060728_287732.html

It has now become a reality that you hear Chinese when the other people speak to you in English on a mobile phone. This has been achieved by a translation software developed by Xiamen Talentedsoft Co. Ltd. (Talentedsoft).

With storage of 0.1 million words, the world's first translation software for cell phone can translate a common sentence from Chinese to English or from English to Chinese in less than 0.5 second and can also read out the words after the translation is done.

Talentedsoft is a company engaged in developing machine translation technology and voice recognition technology and is co-founded by two professors from Institute of Artificial Intelligence of Xiamen University and a returned Chinese intellectual from overseas with a PhD.

By People's Daily Online
UPDATED: 18:05, July 28, 2006