[I am not subscribed to aspell-devel; kindly follow-up to both lists.]

On Mon, Oct 21, 2002 at 01:33:39AM -0400, Glenn Maynard wrote:
> My experience with debian-legal is that, while they're picky about
> licenses (for very good reason), they tend to respond to questioned
> licenses with "let's fix it", not "let's rip it out", whenever possible.
>
> The question of whether wordlists are copyrightable came up before:
>
> http://lists.debian.org/debian-legal/2002/debian-legal-200208/msg00288.html
>
> There doesn't appear to have been any resolution, though, since it
> wasn't required at the time.  (It may not be now, either.)

In the message you cite, RMS makes a blanket claim that "word lists are
copyrightable".  I think he's wrong, at least under U.S. copyright law.

I think we can agree that a word list is a "compilation" of things, so
we can apply the definition of "compilation" under Title 17, Section 101
of the United States Code:

    A "compilation" is a work formed by the collection and assembling
    of preexisting materials or of data that are selected, coordinated,
    or arranged in such a way that the resulting work as a whole
    constitutes an original work of authorship.  The term "compilation"
    includes collective works.

(One might be tempted to call a word list a "collective work", but one
would be wrong under the language of Section 101.

    A "collective work" is a work, such as a periodical issue,
    anthology, or encyclopedia, in which a number of contributions,
    constituting separate and independent works in themselves, are
    assembled into a collective whole.

I take it we can all agree that individual words in a language are not
separate and independent copyrightable works in and of themselves.)

Now, let's check Section 102:

    (b) In no case does copyright protection for an original work of
    authorship extend to any idea, procedure, process, system, method
    of operation, concept, principle, or discovery, regardless of the
    form in which it is described, explained, illustrated, or embodied
    in such work.

So, even if a person tries to assert copyright in a word list, one can
create an alternative word list of equal or greater utility by simply
extracting terms from any of several publicly-licensed dictionaries,
such as Webster's 1913 dictionary (one of the few works this century to
have fallen into the public domain in the U.S.), FOLDOC, VERA, the
Jargon file, and so forth.
The copyright in those dictionaries -- themselves copyrighted works at
some point -- resides in the definitions, not the defined words
themselves, at least insofar as anything calling itself a "dictionary"
purports to define words that are *actually already in use*.  This
application is the only one that is broadly useful for programs like
ispell and aspell (if you need a private word list so you don't misspell
the names of the characters in a novel you're writing, you can make
one).

By this argument, an arrangement of unoriginal words cannot be
copyrightable when it purports to be ordered according to a certain
idea, procedure, process, system, method of operation, concept,
principle, or discovery.

I have another argument to level against the copyrightability of word
lists: If there is any process by which we can create a superset of the
supposedly-copyrighted word list and subsequently prune it down to a
duplicate, then that word list is provably non-original.

Yes, under my reasoning you could create a huge list of all permutations
of the letters of the alphabet, and use the supposedly copyrighted word
list to weed out everything that didn't match.  This reasoning would
also have consequences for collections of unoriginal words where order
is important for semantic reasons, such as a news article or a novel.
It is obviously extremely computationally expensive to generate all
possible sequences of words.  (Given a million monkeys sitting at a
million typewriters, typing at 10 characters per second for a million
years, would a sequence identical to _Hamlet_ appear somewhere in their
output?)

In practice, there is already a lower bound on the length of works that
are allowed to be copyrighted in most countries, though large corporate
interests are aggressively trying to lower it.  As a matter of social
policy, perhaps what we want to be doing is raising the bar, instead.

Ponder the following thought experiment:

Take the twelve notes of the western chromatic scale.
Consider a rest to be a thirteenth "note".  Take the five most common
note lengths: the whole note, half note, quarter note, eighth note, and
sixteenth note.  We have now defined a domain of pitch and duration.

Consider next a sequence of five elements drawn from this domain.  Many
of us can recognize a "unique" work of music in five notes or fewer --
almost no one needs more than four, for instance, to recognize the
opening of Beethoven's Fifth Symphony.  Sticking with the beginnings of
pieces, this same number of notes and/or rests gives you a pretty good
guess on Mozart's "Eine kleine Nachtmusik" and a virtual certainty on
Metallica's "Master of Puppets" -- assuming you're familiar with the
piece.

How many combinations of such sequences are there?

    (13*5)^5
    1160290625

That's right, just over 1.1 billion.  To some people this will sound
large.  They should recall that there are over six times this many
*people* on the planet right now.

This space can be comprehensively enumerated, and all of the sequences
generated, by a modern computer in just a few minutes.  Even without
compression, each sequence could be encoded in forty bytes.  That means
many of us have hard drives that can store this entire "music-space".

So, should we be extending copyright protection to five-note sequences?

Are you a musician?  How do you propose to avoid using one of those
five-note sequences?  You'd better hang up the guitar or piano, and/or
start using lots of breves and thirty-second notes.  Sony Entertainment
might decide to sell a 40GB hard drive containing all of the above
sequences in something called an "S-Pod", and might sue you for
copyright infringement and "unfair competition" if you dared to
distribute music that duplicated the copyrighted music on their S-Pod.

To wrap this all up, I think a lot of people desperately underplay the
requirement for _originality_ that copyright demands.
"Sweat-of-the-brow" arguments don't work, and must not be allowed to
work.
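
The enumeration in the thought experiment above can be sketched in a few
lines of Python.  This is only an illustration under assumptions of my
own: the pitch names, the duration names, and the helper functions
count_sequences and iter_sequences are hypothetical and do not come from
any existing tool.  It counts the space and generates sequences lazily
rather than storing all of them.

```python
from itertools import islice, product

# Assumed encoding of the thought experiment (names are illustrative):
# twelve chromatic pitches plus a rest, and five note lengths.
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B", "rest"]
DURATIONS = ["whole", "half", "quarter", "eighth", "sixteenth"]
SEQUENCE_LENGTH = 5

# One "event" is a (pitch, duration) pair: 13 * 5 = 65 possibilities.
EVENTS = list(product(PITCHES, DURATIONS))


def count_sequences(length=SEQUENCE_LENGTH):
    """Number of distinct sequences of `length` events."""
    return len(EVENTS) ** length


def iter_sequences(length=SEQUENCE_LENGTH):
    """Lazily yield every sequence of `length` events, one tuple at a time."""
    return product(EVENTS, repeat=length)


total = count_sequences()
print(total)       # 1160290625
print(total * 40)  # bytes needed at forty bytes per sequence: 46411625000

# Peek at the first few sequences without materializing the whole space.
first = list(islice(iter_sequences(), 3))
for seq in first:
    print(seq)
```

At forty bytes per sequence the total comes to roughly 46 GB, the same
order of magnitude as the 40GB drive mentioned above.  Iterating over
the full space in pure Python would take a good deal longer than
counting it.
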
Enumerating all possible sequences of five notes or rests may have been
an impossible task for a medieval monk, but is ridiculously simple with
today's computers.  Modern technology should be enhancing our
creativity, not substituting for it.  If copyright is to remain
meaningful it must be restricted to works that only humans can create.
We must retain the requirement of originality.  To do otherwise exposes
us to the depredations of scoundrels who would assert copyright
protection in all 1.1 billion five-note sequences and then extract
"public performance" license fees from you when you whistle on the
street.

Think that's an exaggeration?  Fair Use is already dead in the United
States when it comes to digital media.  The only thing keeping it alive
elsewhere is the difficulty of enforcement, which is why media cartels
have bought laws like the DMCA, which forbid you from telling people how
to use technology to exercise your Fair Use rights.

If we, the Free Software community, want copyright law to be our tool
and not merely the tool of our ideological opponents, we must ensure
that its premises remain firmly anchored on common sense.  That an
intangible work of the intellect is good or useful doesn't necessarily
mean that it is copyrightable.  To claim otherwise is a senseless
yielding of ground to expansionist, rent-seeking media empires.

Asserting copyright in word lists extracted from dictionaries, and other
works that are the result of the un-original application of processes,
is a tactic that forfeits strategies we'll need in the future.  Let us
not surrender battles we have yet to fight.

-- 
G. Branden Robinson                |    Communism is just one step on the
Debian GNU/Linux                   |    long road from capitalism to
[EMAIL PROTECTED]                  |    capitalism.
http://people.debian.org/~branden/ |    -- Russian saying