Re: [aspell-devel] Problems with aspell-en license

Branden Robinson Mon, 21 Oct 2002 14:01:38 -0500

[I am not subscribed to aspell-devel; kindly follow-up to both lists.]

On Mon, Oct 21, 2002 at 01:33:39AM -0400, Glenn Maynard wrote:
> My experience with debian-legal is that, while they're picky about
> licenses (for very good reason), they tend to respond to questioned
> licenses with "let's fix it", not "let's rip it out", whenever possible.
> 
> The question of whether wordlists are copyrightable came up before:
> 
>    http://lists.debian.org/debian-legal/2002/debian-legal-200208/msg00288.html
> 
> There doesn't appear to have been any resolution, though, since it
> wasn't required at the time.  (It may not be now, either.)


In the message you cite, RMS makes a blanket claim that "word lists are
copyrightable".

I think he's wrong, at least under U.S. copyright law.

I think we can agree that a word list is a "compilation" of things, so we
can apply the definition of "compilation" under Title 17, Section 101 of
the United States Code:

        A ''compilation'' is a work formed by the collection and
      assembling of preexisting materials or of data that are selected,
      coordinated, or arranged in such a way that the resulting work as
      a whole constitutes an original work of authorship.  The term
      ''compilation'' includes collective works.

(One might be tempted to call a word list a "collective work", but one
would be wrong under the language of Section 101:

        A ''collective work'' is a work, such as a periodical issue,
      anthology, or encyclopedia, in which a number of contributions,
      constituting separate and independent works in themselves, are
      assembled into a collective whole.

I take it we can all agree that individual words in a language are not
separate and independent copyrightable works in and of themselves.)

Now, let's check Section 102:

      (b) In no case does copyright protection for an original work of
    authorship extend to any idea, procedure, process, system, method
    of operation, concept, principle, or discovery, regardless of the
    form in which it is described, explained, illustrated, or embodied
    in such work.

So, even if a person tries to assert copyright in a word list, one can
create an alternative word list of equal or greater utility by simply
extracting terms from any of several publicly-licensed dictionaries,
such as Webster's 1913 dictionary (one of the few works this century to
have fallen into the public domain in the U.S.), FOLDOC, VERA, the
Jargon file, and so forth.

The copyright in those dictionaries -- themselves copyrighted works at
some point -- resides in the definitions, not the defined words
themselves, at least insofar as anything calling itself a "dictionary"
purports to define words that are *actually already in use*.  This
application is the only one that is broadly useful for programs like
ispell and aspell (if you need a private word list so you don't misspell
the names of the characters in a novel you're writing, you can make
one).

By this argument, an arrangement of unoriginal words cannot be
copyrightable when it purports to be ordered according to a certain
idea, procedure, process, system, method of operation, concept,
principle, or discovery.

I have another argument to level against the copyrightability of word
lists:

If there is any process by which we can create a superset of the
supposedly-copyrighted word list and subsequently prune it down to a
duplicate, then that word list is provably non-original.

Yes, under my reasoning you could create a huge list of all
permutations of the letters of the alphabet, and use the supposedly
copyrighted word list to weed out everything that didn't match.
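
The superset-and-prune process can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the four-letter
alphabet, the length cap, the sample word list, and the function names
are all hypothetical stand-ins, not anything taken from aspell.

```python
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 1..max_len (the superset)."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def prune(superset, word_list):
    """Keep only the members of the superset that appear in the word list."""
    wanted = set(word_list)
    return [s for s in superset if s in wanted]


# Hypothetical stand-in for a "supposedly copyrighted" word list.
word_list = ["cab", "ab", "a", "bad"]

# Mechanically enumerate the superset, then weed out everything else.
rebuilt = prune(all_strings("abcd", 3), word_list)
print(rebuilt)
```

The pruned result contains exactly the words of the original list, and
no step of the process required any creative choice.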

This reasoning would also have consequences for collections of
unoriginal words where order is important for semantic reasons.  For
instance, a news article or a novel.  It is obviously extremely
computationally expensive to generate all possible sequences of words.
(Given a million monkeys sitting at a million typewriters, typing at 10
characters per second for a million years, would a sequence identical to
_Hamlet_ appear somewhere in their output?)

In practice, there is already a lower bound on the length of works that
are allowed to be copyrighted in most countries, though large corporate
interests are aggressively trying to lower it.

As a matter of social policy, perhaps what we want to be doing is
raising the bar, instead.

Ponder the following thought experiment:

Take the twelve notes of the western chromatic scale.  Consider a rest
to be a thirteenth "note".  Take the five most common note lengths, the
whole note, half note, quarter note, eighth note, and sixteenth note.
We have now defined a domain of pitch and duration.  Consider next a
sequence of five elements drawn from this domain.  Many of us can
recognize a "unique"
work of music in five notes or less -- almost no one needs more than
four, for instance, to recognize the opening of Beethoven's Fifth
Symphony.  Sticking with the beginnings of pieces, this same number of
notes and/or rests gives you a pretty good guess on Mozart's "Eine
kleine Nachtmusik" and a virtual certainty on Metallica's "Master of
Puppets" -- assuming you're familiar with the piece.

How many combinations of such sequences are there?

(13*5)^5
1160290625

That's right, just over 1.1 billion.  To some people this will sound
large.  They should recall that there are over five times this many
*people* on the planet right now.
This space can be comprehensively enumerated, and all of the sequences
generated, by a modern computer in just a few minutes.  Even without
compression, each sequence could be encoded in forty bytes.  That means
many of us have hard drives that can store this entire "music-space".
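
The count and the storage estimate can be checked with a short Python
sketch.  The constant and function names are illustrative assumptions;
the forty-byte figure is simply the uncompressed size given in the text.

```python
from itertools import islice, product

PITCHES = 13             # twelve chromatic notes plus a rest
DURATIONS = 5            # whole, half, quarter, eighth, sixteenth
LENGTH = 5               # elements per sequence
BYTES_PER_SEQUENCE = 40  # uncompressed size assumed in the text

symbols = PITCHES * DURATIONS               # distinct pitch/duration pairs
total = symbols ** LENGTH                   # number of five-element sequences
storage_bytes = total * BYTES_PER_SEQUENCE  # size of the whole "music-space"


def sequences():
    """Lazily enumerate every five-element sequence of (pitch, duration) pairs."""
    pairs = list(product(range(PITCHES), range(DURATIONS)))
    return product(pairs, repeat=LENGTH)


# Peek at the start of the enumeration without generating all of it.
first_three = list(islice(sequences(), 3))

print(total)
print(storage_bytes / 10**9, "GB")
```

At forty bytes per sequence the total comes to roughly 46 GB.  Since
each of the 65 pitch/duration pairs fits in a single byte, a five-byte
encoding would bring it under 6 GB.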

So, should we be extending copyright protection to five-note sequences?

Are you a musician?  How do you propose to avoid using one of those
five-note sequences?  You'd better hang up the guitar or piano, and/or
start using lots of breves and thirty-second notes.  Sony Entertainment
might decide to sell a 40GB hard drive containing all of the above
sequences in something called an "S-Pod", and might sue you for
copyright infringement and "unfair competition" if you dared to
distribute music that duplicated the copyrighted music on their S-Pod.

To wrap this all up, I think a lot of people desperately underplay the
requirement for _originality_ that copyright demands.
"Sweat-of-the-brow" arguments don't work, and must not be allowed to
work.  Enumerating all possible sequences of five notes or rests may have
been an impossible task for a medieval monk, but is ridiculously simple
with today's computers.

Modern technology should be enhancing our creativity, not substituting
for it.  If copyright is to remain meaningful it must be restricted to
works that only humans can create.  We must retain the requirement of
originality.  To do otherwise exposes us to the depredations of
scoundrels who would assert copyright protection in all 1.1 billion
five-note sequences and then extract "public performance" license fees
from you when you whistle on the street.  Think that's an exaggeration?
Fair Use is already dead in the United States when it comes to digital
media.  The only thing keeping it alive elsewhere is the difficulty of
enforcement, which is why media cartels have bought laws like the DMCA,
which forbid you from telling people how to use technology to exercise
your Fair Use rights.

If we, the Free Software community, want copyright law to be our tool
and not merely the tool of our ideological opponents, we must ensure that
its premises remain firmly anchored on common sense.  That an intangible
work of the intellect is good or useful doesn't necessarily mean that it
is copyrightable.  To claim otherwise is senseless yielding of ground to
expansionist, rent-seeking media empires.  Asserting copyright in word
lists extracted from dictionaries, and other works that are the result
of the un-original application of processes, is a tactic that forfeits
strategies we'll need in the future.  Let us not surrender battles we
have yet to fight.

-- 
G. Branden Robinson                |     Communism is just one step on the
Debian GNU/Linux                   |     long road from capitalism to
[EMAIL PROTECTED]                 |     capitalism.
http://people.debian.org/~branden/ |     -- Russian saying
