On Sat, Mar 21, 2009 at 10:31, Mindaugas Indriunas <iny...@gmail.com> wrote:
> The implications of this kind of microformat could be far reaching. It
> could result in better machine translation, and possibly something
> like Wikipedia written in one language (that is, in concepts defined
> through use of multitude of all existing human languages and
> dictionaries), yet displayed in a preferred human language
> automatically...
>

The W3C have an Incubator Group in place to try and push 'CWL', the Common Web Language. The idea is that instead of writing Web documents in existing natural languages, one writes them in this semantically rich markup language, which then gets machine translated. Here is their charter:

http://www.w3.org/2005/Incubator/cwl-ei/charter

I put up a blog post about it a while back, where I snarkily called it Esperanto-over-HTTP:

http://tommorris.org/blog/2008/07/01#When:22:29:49

A microformat that sits atop an existing machine language (X/HTML) and existing natural languages is a lot less impractical than something like CWL. That said, the idea that general web documents will end up filled with semantically unambiguous identifiers instead of words is ambitious to say the least.

Both this proposal and the CWL proposal suffer from the same problem: they'll turn the richness of human languages into machine slop. Human languages have given us Plato, Dante, the Song of Solomon, Eliot and Shakespeare. A highly efficient method to turn that into something like a Java stack trace is perhaps less than ideal.

Maybe, in a hundred years' time, we might get some kind of XML Esperanto thing going on, but first we need to solve the big problems: the common blobs of data, and the common relationships between the things those blobs of data represent.

This is how it is in the real world: there's a reason why things like the signs at hospitals, train stations, airports and trams are made internationally readable with a greater degree of urgency than, say, television shows. If you turn up at a hospital and don't speak much of the native language, you risk death. If you can't watch Lost, big deal.

If you think that this approach has a shot, I think the best way forward is to produce a demo: write an example in X/HTML and show how linguistic disambiguation could make for better machine translation. You need to get the guts working first; then, if it's necessary, a microformat can come later.

-- 
Tom Morris
http://tommorris.org/
_______________________________________________
microformats-new mailing list
microformats-new@microformats.org
http://microformats.org/mailman/listinfo/microformats-new