The Oxford English Dictionary, generously supported by the Oxford University Press, is one of the earliest instances of what are now called "pro-am" or "commons-based peer production" projects. From 1857 to 1928, thousands of readers collected examples of uses of words their dictionaries didn't define; they mailed these examples on slips of paper to a small number of editors, who undertook to collate them into a dictionary. From 1884 to 1928, these editors published their work in fascicles, mostly in alphabetical order. <http://en.wikipedia.org/wiki/Oxford_English_Dictionary --- Wikipedia article "Oxford English Dictionary">
In recent years, with the advent of public access to the internet, it has become apparent that commons-based peer production works best when no single party can restrict the uses of the end product: more people can use it, it can be put to more uses, poor coordinators can be replaced, and contributors have assurance that they will be able to use their own work. <http://perens.com/Articles/Economic.html --- "The Emerging Economic Paradigm of Open Source", by Bruce Perens; http://www.benkler.org/CoasesPenguin.html --- "Coase's Penguin, or Linux and the Nature of the Firm", by Yochai Benkler> This form of commons-based peer production of information, in which the end product can be studied, copied, modified, and used freely, is often called "Open Source development". <http://opensource.org/docs/definition.php --- "The Open Source Definition, Version 1.9", promulgated by the Open Source Initiative; http://www.catb.org/~esr/writings/cathedral-bazaar/ --- "The Cathedral and the Bazaar", by Eric S. Raymond> It got this name because it started with software whose source code was freely available for all these purposes, also known as "free software". <http://www.fsf.org/ --- the Free Software Foundation> Tim Bray, the world-famous hacker who co-invented XML, explains why the OED is not currently open source:

    Well, literally thousands of people around the world diligently read
    books looking for usages of words and writing them on slips and
    sending them to Oxford. Many, many millions of these things are in
    filing cabinets in the basement of Oxford. Then Oxford, of course,
    turned them around to do a commercial product. It's not as though the
    underlying citation store or the dictionary itself are open for free
    access to anybody except for Oxford. So I don't think it's really
    open source in some of the essential characteristics. It is certainly
    community-based and community-driven. And it clearly became the case
    that some of the unpaid volunteers became thought leaders in terms of
    how you go about finding things.

<http://www.acmqueue.com/modules.php?name=Content&pa=printer_friendly&pid=282&page=1 --- "A Conversation with Tim Bray", ACM Queue, Vol. 3, No. 1, February 2005>

If the Oxford English Dictionary were Open Source, we could expect the following improvements:

- Definitions would be available in many contexts; for example, within a word processor, at the command line, or in a web browser.
- OED definitions and etymologies would be available to many more people, so many more people would think about how they needed improvement.
- When a person noticed a bad definition or an opportunity for improvement, they could immediately fix it in their local copy of the dictionary and later share their improvement with others who were interested. This is particularly important because the OED is quite out of date, especially the parts in the public domain.
- Definitions and etymologies could be augmented with unlimited examples of use, drawn from the English literary canon (via Project Gutenberg) on demand.
- People could develop innovative software for looking up definitions; for example, it could disambiguate misspellings according to context of use, and preferentially display word senses that might apply in context (noun versus verb, for example, or by the publication year or country of origin of the work containing the unknown word, if that's available).
- Web sites such as http://www.snopes.com/ could link to authoritative definitions and etymologies of words, and even quote them in full without fear of copyright infringement.
- Its English-language definitions could be translated into other languages (perhaps incrementally, as people requested them) to supplement existing inter-lingual dictionaries, or perhaps even to create new ones.

I have been investigating what would be required to make the OED Open Source.
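The command-line lookup imagined above could be sketched in a few lines of Python. Everything here is an assumption for illustration: the file name "oed.txt" is invented, and the format (one blank-line-separated entry per paragraph, with the headword as the first word) is a guess at how a plain-text dump might be organized, not a real distribution format.

```python
"""Hypothetical command-line lookup against an open-source dictionary dump.

Assumes a plain-text file (the name oed.txt is invented for illustration)
containing blank-line-separated entries, each beginning with its headword.
"""
import sys


def lookup(word, path="oed.txt"):
    """Return every entry whose headword matches `word`, case-insensitively."""
    with open(path, encoding="utf-8") as f:
        entries = f.read().split("\n\n")
    return [e.strip() for e in entries
            if e.strip()
            # Take the first whitespace-delimited token, dropping any
            # trailing comma or period, as the headword.
            and e.split(None, 1)[0].strip(",.").lower() == word.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    for entry in lookup(sys.argv[1]):
        print(entry, end="\n\n")
```

A real tool would want an index rather than a linear scan, but even this sketch shows why open data matters: once the text is freely available, anyone can build the word processor plug-in, the spelling disambiguator, or the translation aid on top of it.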
Much of the first edition is out of copyright; in general, anything published before 1923 is in the public domain in the United States (the rules in other Berne Convention countries vary). Someone could take this out-of-copyright text and create a public-domain or open-source-licensed version of it. The fascicle 'W-Wash' was published in 1921 <http://www.colbycosh.com/old/december02.html; http://oed.com/pdfs/oed-news-2002-06.pdf --- "J.R.R. Tolkien and the OED", Oxford English Dictionary News, Series 2, Number 21, June 2002, by Peter Gilliver, pp. 1-3>; this suggests that nearly the entire dictionary is out of copyright, in the form of the fascicles. However, I don't know how to get hold of them, and the Wikipedia article cited above mentions that the first one sold only 4000 copies, so there may be fascicles of which no copies survive. For example, none are listed on http://www.abebooks.com/ as far as I can tell; I searched on "new english dictionary historical" and "fascicle dictionary english" with little luck. Searching for "new english dictionary historical principles" in the title, however, did turn up several volumes supposedly published before 1921, at very reasonable prices, US$30-US$130. (I found advertisements for volumes D-E, H-K, L-N, Q-R, S-SH, V-Z, and X-ZYXT, all claiming to be from before 1923, comprising nearly half of the first-edition OED.)

If someone were to take the original pages, of which I guess there are around ten thousand, photograph each one with a cheap five-megapixel digital camera, and compress the result, each page image would probably be around a megabyte and take about ten seconds to produce; the entire set would require only about ten gigabytes of storage and about 30 hours of labor. It could then be distributed by BitTorrent and on DVD-Rs.
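The arithmetic behind that estimate is easy to check; all the inputs below are the rough guesses from the text, not measurements:

```python
# Rough check of the scanning estimate above; every input is a guess.
PAGES = 10_000           # guessed page count for the first edition
MB_PER_PAGE = 1          # compressed five-megapixel photograph
SECONDS_PER_PAGE = 10    # position, photograph, and compress one page

storage_gb = PAGES * MB_PER_PAGE / 1_000
labor_hours = PAGES * SECONDS_PER_PAGE / 3_600

print(f"storage: about {storage_gb:.0f} gigabytes")  # about 10 gigabytes
print(f"labor: about {labor_hours:.0f} hours")       # about 28 hours
```

So the "about 30 hours" figure is really about 28 hours of continuous work, and the whole set fits comfortably on three single-layer DVD-Rs.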
The Million Book Project in the Internet Archive's Texts Collection <http://www.archive.org/details/texts> consists of books scanned in more or less this manner, although they are using expensive sixteen-megapixel cameras because they have a wider range of uses in mind. The sample book I'm looking at is one bit deep and is compressed by about a factor of 24: 558 pages at 8.5 megapixels each, which is around a megabyte per page and 600 megabytes in all uncompressed, but only 24.4 megabytes, about 44 kilobytes per page, compressed with DjVu. <http://www.archive.org/details/ChurchDictionary --- Walter Farquhar Hook's 1842 Church Dictionary> By itself, such a collection of images would be slightly less useful than the original books, although much more easily reproduced. You would still have to page through the numbered pages one by one to find the page containing the word you wanted, and the images would be displayed on a conventional small, low-resolution computer monitor rather than on a large, high-resolution book page; consequently it would be somewhat slower to consult than the original books. However, once the page images were available in the public domain, it would be possible at any later time, and in any small increment, to annotate them with OCR results or hand-written transcriptions, which could in turn be corrected by the people consulting them. (My experiments with free-software OCR have not been terribly encouraging so far.)
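The sample book's numbers work out as follows; the inputs are the figures quoted above for the Church Dictionary scan:

```python
# Checking the DjVu compression figures for the sample book.
PAGES = 558
MEGAPIXELS_PER_PAGE = 8.5
BITS_PER_PIXEL = 1        # bitonal ("one bit deep") scan
COMPRESSED_MB = 24.4      # size of the DjVu file

# One megapixel at one bit per pixel is 10**6 / 8 bytes = 0.125 MB.
uncompressed_mb = PAGES * MEGAPIXELS_PER_PAGE * BITS_PER_PIXEL / 8
compression_ratio = uncompressed_mb / COMPRESSED_MB
compressed_kb_per_page = COMPRESSED_MB * 1_000 / PAGES

print(f"uncompressed: about {uncompressed_mb:.0f} MB")            # about 593 MB
print(f"compression ratio: about {compression_ratio:.0f}:1")      # about 24:1
print(f"compressed: about {compressed_kb_per_page:.0f} KB/page")  # about 44 KB/page
```

At 44 kilobytes per compressed page, even ten thousand OED pages scanned this way would come to well under a gigabyte, far below the ten-gigabyte estimate for grayscale or color photographs.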