[Moses-support] Worldwide Lexicon project, vetting an idea, need feedback

McConnell Brian Tue, 15 Sep 2009 11:11:31 -0700

Hello.

Brian McConnell from the Worldwide Lexicon project here.


I wanted to talk with some people who've worked on Moses, and on SMT in
general, about an idea I am developing for a highly scalable
translation engine. WWL, for those of you who aren't familiar with
it, is an ad hoc, on-demand translation memory that combines machine
and human translations from a variety of sources. The big missing
piece, from my viewpoint, is an open source machine translation
engine that is commercially viable (e.g., one that scales, renders
results quickly, etc.).

My idea is to build something that functions like DNS, except that
it replicates tables of n-grams and their translations. The point is
to have a very simple, very fast and very dumb service that responds
to requests to translate texts. It would work like this: a
zero-configuration program runs on a cluster of CPUs or on a service
like AWS. All it does is respond to lookup requests: it breaks an
incoming text down into n-grams and makes a best-effort attempt to
translate it (possibly with an option to proxy out to services like
Google when it's stumped). Training and generating the translation
tables is done by a separate process that lives elsewhere. I am not
an expert on the internals of MT engines, so I am looking for input
from others about the feasibility of doing this. Ideally, what I'd
like is a "Moses Lite" process that is slaved to other servers and
handles only read-only requests.
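To make the "very dumb" lookup step concrete, here is a minimal sketch of what such a read-only node might do with a request. It assumes a toy in-memory phrase table (the `TABLE` entries, `translate` and its `fallback` hook are all hypothetical names for illustration), a greedy longest-match segmentation, and a caller-supplied function standing in for "proxy out to Google". Note that this is much cruder than what Moses actually does: the Moses decoder searches over competing segmentations and reorderings with a language model rather than taking the first longest match.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical read-only phrase table: source n-gram (tuple of lowercase tokens)
# -> one precomputed translation, produced by a separate training process.
PhraseTable = Dict[Tuple[str, ...], str]

# Toy English -> Catalan entries, for illustration only.
TABLE: PhraseTable = {
    ("good", "morning"): "bon dia",
    ("good",): "bo",
    ("friend",): "amic",
}


def translate(
    text: str,
    table: PhraseTable,
    max_n: int = 3,
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Greedy, monotone longest-match lookup over n-grams (no LM, no reordering)."""
    tokens = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in table:
                out.append(table[gram])
                i += n
                break
        else:
            # No entry even for the single word: hand it to the (hypothetical)
            # external fallback if one is given, otherwise pass it through as-is.
            word = tokens[i]
            out.append(fallback(word) if fallback else word)
            i += 1
    return " ".join(out)
```

For example, `translate("Good morning friend", TABLE)` matches the bigram "good morning" before the unigram "good" and returns "bon dia amic". Because each node only reads the table, any number of replicas could serve requests from the same snapshot, which is the property the DNS analogy relies on.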

Apart from making the service easier to deploy, I want to create a
directory for locating translation engines, so that open MT becomes
an embedded service that applications can easily locate and query.
The process would go something like this:

1. An application queries a directory server (we can host this at WWL
and make it part of our open source release), e.g. "I need to
translate from English to Catalan".
2. The directory server tells the app the URIs of the available
translation servers and the protocol to use to talk to them.
3. The application queries a translation server directly and gets
its results back.
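The three steps above can be sketched as a tiny in-memory directory. This is only an illustration under stated assumptions: the `Directory` and `Endpoint` classes, the "en"/"ca" pair keys, the `http-get` protocol label and the example.org URIs are all hypothetical, and a real directory would sit behind some network protocol rather than being a local object.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Endpoint:
    uri: str       # where a translation server lives (hypothetical URIs below)
    protocol: str  # how to talk to it, e.g. a simple HTTP GET scheme (assumed label)


class Directory:
    """Toy directory mapping (source, target) language pairs to translation servers."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], List[Endpoint]] = {}

    def register(self, source: str, target: str, endpoint: Endpoint) -> None:
        # A translation server announces that it handles this language pair.
        self._entries.setdefault((source, target), []).append(endpoint)

    def lookup(self, source: str, target: str) -> List[Endpoint]:
        # Step 2: tell the application which servers handle the pair (empty if none).
        return list(self._entries.get((source, target), []))


# Demo with hypothetical servers.
directory = Directory()
directory.register("en", "ca", Endpoint("http://mt1.example.org/translate", "http-get"))
directory.register("en", "ca", Endpoint("http://mt2.example.org/translate", "http-get"))

# Step 1: "I need to translate from English to Catalan".
servers = directory.lookup("en", "ca")
# Step 3 (not shown): send the text to servers[0].uri using servers[0].protocol.
```

Returning a list rather than a single endpoint leaves the choice of server (and any retry on failure) to the application, which keeps the directory itself as simple as the post intends.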

I am not trying to create an overweight standard like SOAP, just a
simple directory service that makes these resources more accessible.
We are already doing this on a limited scale with our Firefox
translator, which calls WWL to find out which server it should talk
to for various language pairs. I'd like to make this a more general
service, and also work out how we can promote wider use of open
source MT engines like Moses. Google is a fine product, but in the
long run I'd prefer to see language services run on open systems, in
the same way that most websites run on Apache nowadays.

I can be reached at bsmcconnell at gmail or on Skype.

PS - The latest build of our Firefox translator is available at
www.worldwidelexicon.org. It is significantly improved, with better
page rendering, local caching and other bells and whistles that make
it pretty transparent to users.

Brian
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
