I've noticed a possible commercial-use restriction while adopting the wenglish package [a /usr/share/dict/words list of English words, in main/text]. I searched the -devel and -legal archives, but found no previous discussion of this.
The upstream README.linux.words file describes the "non-copyright" status of the word lists that were used to construct this word list, but its description of one of those component lists (a README within the README) says:

    To the best of my knowledge, all the files I used to build these wordlists were available for public distribution and use, at least for non-commercial purposes. I have confirmed this assumption with the authors of the lists, whenever they were known.

    Therefore, it is safe to assume that the wordlists in this package can also be freely copied, distributed, modified, and used for personal, educational, and research purposes. (Use of these files in commercial products may require written permission from DEC and/or the authors of the original lists.)

(The upstream README.linux.words has until now been (unintentionally?) excluded from the Debian package. The previous Debian maintainers whom I've heard from have no knowledge of why it was left out, or of this package's DFSG compliance.)

A perhaps important point: this is not 'software' (in the generally accepted sense); it is a plain-text alphabetical list of English words, which were extracted from other lists of English words, which were (as described in README.linux.words) created from various apparently free sources.

So my *feeling* is that DEC and the other authors of those original lists can't place inherited commercial-use restrictions on a new word list that was constructed by copying (most of) the words from their lists and merging them with other lists.

[Hmmmm, if I took a copyrighted novel and published an alphabetical list of the words extracted from its text, would I be violating the author's copyright? I doubt it.]

Here is the entire upstream README.linux.words file. I've numbered the lines (with 'nl'); in all other respects it is unaltered. Lines 160-169 are what I'm worried about, but you probably have to read them in context.
The simple question is: is the resultant list DFSG-compliant?

Thanks.

1 #!/bin/sh -xe
2 # README.linux.words - file used to create linux.words
3 # Created: Wed Mar 10 09:12:49 1993 by [EMAIL PROTECTED] (Rik Faith)
4 # Revised: Sat Mar 13 17:02:08 1993 by [EMAIL PROTECTED]
5 #
6 # Care was taken to be sure that the linux.words list was free of
7 # copyright. This makes linux.words a suitable /usr/dict/words
8 # replacement for the Linux community.
9 #
10 # Since the majority of the words are from Tanenbaum's minix.dict file,
11 # the notice from Barry Brachman, included below, should accompany any
12 # redistribution of this list.
13 # Here is a detailed explaination of how I created the linux.words file.
14 #
15 # This README.words file is actually a shell script that you can use to
16 # recreate the linux.words file from original sources.
17 #
18 # First, I started with minix.dict
19 # from cs.ubc.ca:/pub/local/src/sp-1.5/wordlists-1.0.tar.Z
20 #
21 # The following is from the NOTES file in wordlists-1.0.tar.Z:
22 # NOTES> These word lists were collected by Barry Brachman
23 # NOTES> <[EMAIL PROTECTED]> at the University of British Columbia. They
24 # NOTES> may be freely distributed as long as this notice accompanies them.
25 # NOTES>
26 # NOTES> ==================================================================
27 # NOTES> Info for minix.dict:
28 # NOTES>
29 # NOTES> Article 1997 of comp.os.minix:
30 # NOTES> From: [EMAIL PROTECTED]
31 # NOTES> Subject: A spelling checker for MINIX
32 # NOTES> Date: 6 Jan 88 22:28:22 GMT
33 # NOTES> Reply-To: [EMAIL PROTECTED] (Andy Tanenbaum)
34 # NOTES> Organization: VU Informatica, Amsterdam
35 # NOTES>
36 # NOTES> This dictionary is NOT based on the UNIX dictionary so it is free
37 # NOTES> of AT&T copyright. I built the dictionary from three sources.
38 # NOTES> First, I started by sorting and uniq'ing some public domain
39 # NOTES> dictionaries. Second, as some of you probably know, I have
40 # NOTES> written somewhere between 3 and 6 books (depending on precisely
41 # NOTES> what you count) and an additional 50 published papers on operating
42 # NOTES> systems, networks, compilers, languages, etc. This data base,
43 # NOTES> which is online, is nonnegligible :-) Finally, I added a number of
44 # NOTES> words that I thought ought to be in the dictionary including all
45 # NOTES> the U.S. states, all the European and some other major countries,
46 # NOTES> principal U.S. and world cities, and a bunch of technical terms.
47 # NOTES> I don't want my spelling checker to barf on arpanet, diskless,
48 # NOTES> modem, login, internetwork, subdirectory, superuser, vlsi, or
49 # NOTES> winchester just because Webster wouldn't approve of them. All in
50 # NOTES> all, the dictionary is over 40,000 words. If you have any
51 # NOTES> suggestions for additions or deletions, please post them. But
52 # NOTES> please be sure you are not infringing on anyone's copyright in
53 # NOTES> doing so.
54 # NOTES>
55 # NOTES> Andy Tanenbaum ([EMAIL PROTECTED])
56 # The main problem with minix.dict is that many proper names are not
57 # capitalized. So, I got english.tar.Z from ftp.uu.net:/doc/dictionaries,
58 # which is a mirror of nic.funet.fi:/pub/unix/security/dictionaries.
59 #
60 # Here is part of the README file for english.tar.Z:
61 # README>
62 # README> FILE: english.words
63 # README> VERSION: DEC-SRC-92-04-05
64 # README>
65 # README> EDITOR
66 # README>
67 # README> Jorge Stolfi <[EMAIL PROTECTED]>
68 # README> DEC Systems Research Center
69 # README>
70 # README> AUTHORS OF ORIGIONAL WORDLISTS
71 # README>
72 # README> Andy Tanenbaum <[EMAIL PROTECTED]>
73 # README> Barry Brachman <[EMAIL PROTECTED]>
74 # README> Geoff Kuenning <[EMAIL PROTECTED]>
75 # README> Henk Smit <[EMAIL PROTECTED]>
76 # README> Walt Buehring <[EMAIL PROTECTED]>
77 #
78 # [stuff seleted]
79 #
80 # README> AUXILIARY LISTS
81 # README>
82 # README> In the same directory as englis.words there are a few
83 # README> complementary word lists, all derived from the same sources
84 # README> [1--8] as the main list:
85 # README>
86 # README> english.names
87 # README>
88 # README> A list of common English proper names and their derivatives.
89 # README> The list includes: person names ("John", "Abigail",
90 # README> "Barrymore"); countries, nations, and cities ("Germany",
91 # README> "Gypsies", "Moscow"); historical, biblical and mythological
92 # README> figures ("Columbus", "Isaiah", "Ulysses"); important
93 # README> trademarked products ("Xerox", "Teflon"); biological genera
94 # README> ("Aerobacter"); and some of their derivatives ("Germans",
95 # README> "Xeroxed", "Newtonian").
96 # README>
97 # README> misc.names
98 # README>
99 # README> A list of foreign-sounding names of persons and places
100 # README> ("Antonio", "Albuquerque", "Balzac", "Stravinski"), extracted
101 # README> from the lists [1--8]. (The distinction betweeen
102 # README> "English-sounding" and "foreign-sounding" is of course rather
103 # README> arbitrary).
104 # README>
105 # README> org.names
106 # README>
107 # README> A short lists names of corporations and other institutions
108 # README> ("Pepsico", "Amtrak", "Medicare"), and a few derivatives.
109 # README>
110 # README> The file also includes some initialisms --- acronyms and
111 # README> abbreviations that are generally pronounced as words rather
112 # README> than spelled out ("NASA", "UNESCO").
113 # README>
114 # README> english.abbrs
115 # README>
116 # README> A list of common abbreviations ("etc.", "Dr.", "Wed."),
117 # README> acronyms ("A&M", "CPU", "IEEE"), and measurement symbols
118 # README> ("ft", "cm", "ns", "kHz").
119 # README>
120 # README> english.trash
121 # README>
122 # README> A list of words from the original wordlists
123 # README> that I decided were either wrong or unsuitable for inclusion
124 # README> in the file english.words or any of the other auxiliary
125 # README> lists. It includes
126 # README>
127 # README> typos ("accupy", "aquariia", "automatontons")
128 # README> spelling errors ("abcissa", "alleviater", "analagous")
129 # README> bogus derived forms ("homeown", "unfavorablies", "catched")
130 # README> uncapitalized proper names ("afghanistan",
131 # README> "algol", "decnet")
132 # README> uncapitalized acronyms ("apl", "ccw", "ibm")
133 # README> unpunctuated abbreviations ("amp", "approx", "etc")
134 # README> British spellings ("advertize", "archaeology")
135 # README> archaic words ("bedight")
136 # README> rare variants ("babirousa")
137 # README> unassimilated foreign words ("bambino", "oui", "caballero")
138 # README> mis-hyphenated compounds ("babylike", "backarrows")
139 # README> computer keywords and slang ("lconvert", "noecho", "prog")
140 # README>
141 # README> (I apologize for excluding British spellings. I should have
142 # README> split the list in three sublists--- common English, British,
143 # README> American---as ispell does. But there are only so many hours
144 # README> in a day...)
145 # README>
146 # README> english.maybe
147 # README>
148 # README> A list of about 5,000 lowercase words from the "mts.dict"
149 # README> wordlist [6] that weren't included in english.words.
150 # README>
151 # README> This list seems to include lots of "trash", like
152 # README> uncapitalized proper names and weird words. It would
153 # README> take me several days to sort this mess, so I decided to
154 # README> leave it as a separate file. Use at your own risk...
155 #
156 # [stuff deleted]
157 #
158 # README> (NON-)COPYRIGHT STATUS
159 # README>
160 # README> To the best of my knowledge, all the files I used to build these
161 # README> wordlists were available for public distribution and use, at least
162 # README> for non-commercial purposes. I have confirmed this assumption with
163 # README> the authors of the lists, whenever they were known.
164 # README>
165 # README> Therefore, it is safe to assume that the wordlists in this
166 # README> package can also be freely copied, distributed, modified, and
167 # README> used for personal, educational, and research purposes. (Use of
168 # README> these files in commercial products may require written
169 # README> permission from DEC and/or the authors of the original lists.)
170 # README>
171 # README> Whenever you distribute any of these wordlists, please distribute
172 # README> also the accompanying README file. If you distribute a modified
173 # README> copy of one of these wordlists, please include the original README
174 # README> file with a note explaining your modifications. Your users will
175 # README> surely appreciate that.
176 # README>
177 # README> (NO-)WARRANTY DISCLAIMER
178 # README>
179 # README> These files, like the original wordlists on which they are
180 # README> based, are still very incomplete, uneven, and inconsitent, and
181 # README> probably contain many errors. They are offered "as is" without
182 # README> any warranty of correctness or fitness for any particular
183 # README> purpose. Neither I nor my employer can be held responsible for
184 # README> any losses or damages that may result from their use.
185 # subtract english.trash
186 cat minix.dict english.trash english.trash | sort | uniq -u > dict.1
187 # subtract english.maybe
188 cat dict.1 english.maybe english.maybe | sort | uniq -u > dict.2
189 # build subtraction list of proper names and abbreviations
190 cat english.names misc.names org.names computer.names english.abbrs > sub.1
191 tr 'A-Z' 'a-z' < sub.1 | sort | uniq -u > sub.2
192 # subtract proper names with incorrect capitalization
193 cat dict.2 sub.2 sub.2 | sort | uniq -u > dict.3
194 # build proper name list without possessives
195 cat english.names misc.names org.names computer.names | fgrep -v \'s > names.1
196 # add in proper names (use sort twice to get uppercase before lowercase)
197 cat dict.3 names.1 | sort | sort -df | uniq > linux.words
198 # clean up
199 rm dict.[123] sub.[12] names.1
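
A note for anyone unfamiliar with the idiom: the "subtract" steps in lines 185-193 appear to rely on 'uniq -u', which prints only lines that are not repeated. Feeding the unwanted list in twice guarantees that each of its words shows up at least twice after sorting, so none of them survive. Below is a minimal sketch of that idea on toy data. The file names (base, unwanted, result) and the words are made up for illustration and are not part of the upstream script, and the sketch assumes the base list contains no repeated words.

```shell
#!/bin/sh
# Illustration only: "base", "unwanted" and "result" are hypothetical names,
# not files from README.linux.words.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# "base" plays the role of minix.dict, "unwanted" the role of english.trash.
printf '%s\n' apple banana cherry > base
printf '%s\n' banana durian > unwanted

# "unwanted" is listed twice, so each of its words occurs at least twice in
# the sorted stream and is dropped by "uniq -u" (print unrepeated lines only).
# Words found only in "base" occur exactly once and are kept.
cat base unwanted unwanted | sort | uniq -u > result

cat result
```

In this toy run "banana" (present in both lists) and "durian" (present only in the unwanted list) are both dropped. This says nothing about the licensing question; it only shows how the final list is derived mechanically from the component lists.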