Hello. Brian McConnell from the Worldwide Lexicon project here.
I wanted to talk with some people who've worked on Moses and SMT in general about an idea I am developing for a highly scalable translation engine. WWL, for those of you who aren't familiar with it, is an ad hoc, on-demand translation memory that combines machine and human translations from a variety of sources.

The big missing piece, from my viewpoint, is an open source machine translation engine that is commercially viable (e.g. scales, renders results quickly, etc). My idea is to build something that mimics DNS in the way it functions, except that it replicates tables of n-grams and their translations. The point is to have a very simple, very fast and very dumb service that responds to requests to translate texts.

The way this would work is you'd have a zero-configuration program that runs on a cluster of CPUs or a service like AWS. All it does is respond to lookup requests, break an incoming text down into n-grams, and make a best-effort attempt to translate (possibly with an option to proxy out to services like Google when it's stumped). Training and generating the translation tables is done via a separate process that lives elsewhere.

I am not an expert on the internals of MT engines, so I am looking for input from others about the feasibility of doing this. Ideally what I'd like is a "Moses Lite" process that is slaved to other servers and just processes read-only requests.

Apart from making the service easier to deploy, I want to create a directory for locating translation engines, so that open MT becomes an embedded service that is easily located and queried by applications. The process would go something like this:

1. an application queries a directory server (we can host this at WWL and make it part of our open source release), e.g. "I need to translate from English to Catalan"
2. the directory server tells the app the URIs of the available translation servers and the protocol to use to talk to them
3. the application goes off and queries the translation server and gets its results back

I am not trying to create an overweight standard like SOAP, just a simple directory service that makes these resources more accessible. We are already doing this in a limited scope with our Firefox translator, which calls WWL to find out who it should talk to for various language pairs. I'd like to make this a more general service, and also work out how we can promote wider use of open source MT engines like Moses. Google is a fine product, but in the long run I'd prefer to see language services run on open systems, just as most websites run on Apache nowadays.

I can be reached at bsmcconnell at gmail or skype.

PS - the latest build of our Firefox translator is available at www.worldwidelexicon.org. It is significantly improved, with better page rendering, local caching and other bells and whistles that make it pretty transparent to users.

Brian

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
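To make the idea in this message a bit more concrete, here is a rough Python sketch of the two pieces: the directory lookup and the "dumb" read-only n-gram lookup. Everything in it is an assumption made for illustration: the names `DIRECTORY`, `find_servers`, `EN_CA` and `translate`, the example.org URI, the table contents, and the greedy longest-match strategy. It is not how Moses decodes (there is no reordering or language model scoring here); it only shows what a best-effort table lookup that reports its misses might look like.

```python
from typing import Dict, List, Tuple

# Hypothetical directory: (source, target) language pair -> translation server URIs.
DIRECTORY: Dict[Tuple[str, str], List[str]] = {
    ("en", "ca"): ["http://mt1.example.org/translate"],
}

# Hypothetical read-only table: source n-gram (tuple of tokens) -> translation.
# In the proposal this would be replicated from a separate training process.
PhraseTable = Dict[Tuple[str, ...], str]

EN_CA: PhraseTable = {
    ("good", "morning"): "bon dia",
    ("thank", "you"): "gràcies",
    ("world",): "món",
}


def find_servers(src: str, tgt: str) -> List[str]:
    """Steps 1 and 2: ask the directory which servers handle a language pair."""
    return DIRECTORY.get((src, tgt), [])


def translate(text: str, table: PhraseTable, max_n: int = 3) -> Tuple[str, List[str]]:
    """Step 3: greedy longest-match lookup over n-grams.

    Returns the best-effort output plus the tokens that had no table entry,
    so a caller could decide to proxy those out to another service.
    """
    tokens = text.lower().split()
    out: List[str] = []
    misses: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in table:
                out.append(table[gram])
                i += n
                break
        else:
            # Nothing matched: pass the token through and record the miss.
            out.append(tokens[i])
            misses.append(tokens[i])
            i += 1
    return " ".join(out), misses


if __name__ == "__main__":
    print(find_servers("en", "ca"))
    print(translate("Good morning world", EN_CA))
    print(translate("Thank you Brian", EN_CA))
```

Returning the list of misses alongside the output is one way to support the "proxy out when stumped" option without the lookup server needing to know anything about the fallback service.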
