Hoi,
The notion that this black box needs to use text that is licensed under CC-BY-SA is folly. Data mining strips the meaning from the text, so the data it gathers can be considered a completely separate work. Using text as the basis of a corpus is essentially less intrusive than using the same text for "search engine" purposes.
I have never argued for the WMF to involve itself in machine translation. What I do argue is that the WMF might partner with organisations that are involved in machine translation. Google is not the only one that comes to mind; Apertium is another project, with a different approach that is effective for certain language combinations. The legalities and practicalities of language technology are quite distinct from our standard considerations.

Thanks,
GerardM

2009/5/31 Brian <brian.min...@colorado.edu>

> Proprietary algorithms aren't what make their system better - it's that
> they have a larger corpus. Google has published a trillion token dataset
> for machine translation researchers but it's presumably just a subset of
> what they now have. The data that makes their system so good is already
> publicly available but it is not (yet) within the scope of the WMF to
> harvest all copyrighted information in order to increase the performance
> of already published machine translation algorithms.
>
> It would cost the WMF dearly in resources to build such a system
> themselves based on published research. In other words, as long as the
> output of the black box is CC-BY-SA the other factors aren't very
> important.
>
> In my mind if you consider using a corporation's semi-proprietary
> translation engine to be a violation of the WMF's principles then
> accepting visitors that come from Google in the first place would be an
> analogous violation. We have no idea how the search engine that is the
> single largest source of visitors to Wikipedia works, and yet we accept
> them graciously.
>
> On Sun, May 31, 2009 at 1:45 AM, Gerard Meijssen
> <gerard.meijs...@gmail.com> wrote:
>
> > Hoi,
> > Currently the translation engine by Google works for some twenty
> > languages. We have Wikipedias in over 250 languages and we localise
> > in over 300. If we are to collaborate with Google on this, we should
> > partner in the building of translation engines for our other
> > languages. We could and we should consider this if the software were
> > to be open source.
> > Thanks,
> > GerardM
> >
> > 2009/5/31 Foxy Loxy <foxyloxy.wikime...@gmail.com>
> >
> > > I would guess a partnership with Google would be a good idea because:
> > > 1) They are the best (according to Brian) and
> > > 2) If we were to go through with this proposal we'd want the
> > > translation technology now, not in X years when the technology
> > > catches up with Google, if at all.
> > >
> > > And with many OSS/free projects, the X could be insanely high.
> > >
> > > On Sunday, 31 May 2009 2:50 pm, Fajro wrote:
> > > > And why partner with Google? There are Free alternatives in
> > > > development:
> > > >
> > > > http://www.apertium.org/
> > > >
> > > > http://wiki.apertium.org/wiki/Main_Page
> > > >
> > > > --
> > > > △ ℱajro △
> > >
> > > --
> > > fl
> > > <http://en.wikipedia.org/wiki/user_talk:fl>

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l