Pricewatch.com currently (2010-10-26) lists a 2TB drive (“Seagate
ST32000542AS Seagate Barracuda LP ST32000542AS 2TB 5900 RPM 32MB Cache
SATA 3.0Gb/s”) for US$120 with free shipping in the US, and that
appears to be a typical price.  US$120 for two terabytes is
US$7.5 × 10⁻¹² per bit.

I pay AR$100 per month for my internet connection here in Argentina;
last I checked, I could download stuff from abroad over it at 31
kilobytes per second, although this varies considerably.  AR$100 is
about US$25, so if I were downloading constantly at an average of 31
kilobytes per second, I would be paying US$3.8 × 10⁻¹¹ per bit.  In
practice, I don’t download at full speed 24/7, not least because the
latency on the poorly configured cable modem goes to hell, so my real
cost per bit is higher than that.

The interesting point about the above is that, for me, downloading
some piece of data costs about five times as much as buying the disk
space to store it.  If I bought that 2TB drive, it would take me about
24 months of constant full-speed downloading to fill it, which would
cost about US$600.
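
The arithmetic above can be reproduced with a short Python sketch.
The prices and the download rate are the ones quoted in the text; the
30-day month and the reading of “kilobyte” as 1024 bytes are
assumptions, so the results only match to about two significant
figures.

```python
# Figures quoted in the text (October 2010).
DISK_PRICE_USD = 120.0
DISK_BYTES = 2e12                    # "2TB", taken as decimal terabytes
ISP_USD_PER_MONTH = 25.0             # AR$100, about US$25
DOWNLOAD_BYTES_PER_SEC = 31 * 1024   # assumption: a kilobyte is 1024 bytes
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: a 30-day month

# Price of one bit of disk, and of one bit downloaded at full speed 24/7.
disk_usd_per_bit = DISK_PRICE_USD / (DISK_BYTES * 8)
bits_per_month = DOWNLOAD_BYTES_PER_SEC * 8 * SECONDS_PER_MONTH
net_usd_per_bit = ISP_USD_PER_MONTH / bits_per_month

# Time and money needed to fill the disk over this connection.
months_to_fill = DISK_BYTES * 8 / bits_per_month
cost_to_fill_usd = months_to_fill * ISP_USD_PER_MONTH

print(f"disk:      US${disk_usd_per_bit:.2g} per bit")
print(f"bandwidth: US${net_usd_per_bit:.2g} per bit")
print(f"ratio:     {net_usd_per_bit / disk_usd_per_bit:.1f}")
print(f"fill time: {months_to_fill:.1f} months, US${cost_to_fill_usd:.0f}")
```

With these assumptions the ratio comes out slightly above five and the
fill time slightly above 24 months, consistent with the rounded
figures in the text.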

Downloading an ebook
--------------------

To sharpen this point further, suppose I’m downloading a copy of Uncle
Tom’s Cabin from Manybooks.net.  It takes me about a minute to
navigate the site to find and download the book, which is an
opportunity cost of about US$2.00.  The .mobi-format file is 657200
bytes, which takes about 21 seconds to download (US$0.0002), and until
I delete it, it occupies that amount of space (US$0.00004).  And reading
it will take about four hours, an opportunity cost of about US$480.

What about hardware and energy costs?  I’m using a US$300 computer to
do the downloading, and it consumes about 100 watts.  Straight-line
depreciation of the computer over three years comes to US$0.00026 for
the 81 seconds of navigating and downloading, and 100 watts at a
roughly average retail electricity price of US$0.10 per kilowatt-hour
comes to US$0.00023.

    |---------------------+----------+---------+------------|
    | what                | how much | of what | cost (US$) |
    |---------------------+----------+---------+------------|
    | navigate site       |        1 | minute  |       2.00 |
    | download file       |   657200 | bytes   |     0.0002 |
    | store file          |   657200 | bytes   |    0.00004 |
    | depreciate computer |       81 | seconds |    0.00026 |
    | 100 watts           |       81 | seconds |    0.00023 |
    | read book           |        4 | hours   |     480.00 |
    |---------------------+----------+---------+------------|
    | total               |          |         |  482.00073 |
    |---------------------+----------+---------+------------|
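
The rows of this table can be recomputed from the per-bit prices and
the US$2.00-per-minute time value used above.  This is only a sketch:
the 365-day year is an assumption, and because it adds unrounded
values, its total differs from the table’s in the last digit.

```python
# Inputs taken from the text.
NET_USD_PER_BIT = 3.8e-11
DISK_USD_PER_BIT = 7.5e-12
TIME_USD_PER_SEC = 2.00 / 60          # US$2.00 per minute
FILE_BYTES = 657200
BUSY_SECONDS = 60 + 21                # navigating plus downloading
SECONDS_PER_YEAR = 365 * 24 * 3600    # assumption: 365-day year

costs = {
    "navigate site": 60 * TIME_USD_PER_SEC,
    "download file": FILE_BYTES * 8 * NET_USD_PER_BIT,
    "store file": FILE_BYTES * 8 * DISK_USD_PER_BIT,
    # US$300 computer, straight-line over three years
    "depreciate computer": 300.0 * BUSY_SECONDS / (3 * SECONDS_PER_YEAR),
    # kilowatts * hours * US$ per kilowatt-hour
    "100 watts": 0.100 * (BUSY_SECONDS / 3600) * 0.10,
    "read book": 4 * 3600 * TIME_USD_PER_SEC,
}

total = sum(costs.values())
non_time = sum(usd for what, usd in costs.items()
               if what not in ("navigate site", "read book"))

for what, usd in costs.items():
    print(f"{what:20s} {usd:12.5f}")
print(f"{'total':20s} {total:12.5f}")
print(f"{'excluding my time':20s} {non_time:12.5f}")
```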

By comparison, a 384-page paper copy of Uncle Tom’s Cabin costs
US$4.00 on Amazon.

The Amazon “Swindle” (so-called because even after you buy it, Amazon
still controls it) and similar devices have removed the need to
consume US$4 worth of paper (and US$40 or so worth of laser printer
time, at least at the rates charged around here) to read the book
comfortably, at least if you read substantially more than 30 books.
(One downside of this is that Amazon, since it still controls the
device, can send your books to the memory hole if it decides it
doesn’t like them, as it famously did with copies of _1984_.  For
the time being, it probably can’t do the same with copies on your
hard disk.)

For non-laborers
----------------

For people who can’t sell their time for money, there is a remarkable
thing in the above.  The cost of downloading the ebook, exclusive of
the cost to their time, is US$0.00073.  This is a substantial
reduction from the US$4.00000 cost of the paper copy.  But it is only
available to them if they have a computing device like the Swindle or
the OLPC XO that can display the text to them comfortably.

Straight-line depreciating a US$139 Swindle over three years yields a
cost of US$0.02 for the four hours needed to read Uncle Tom’s Cabin,
which swamps all the downloading costs.  But it’s still substantially
less than the US$4.00 for the paper copy.
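
A small sketch of that depreciation, along with the number of US$4.00
paper books the device’s price would buy.  It assumes, as the
calculation above does, that depreciation accrues over calendar time
rather than over hours of actual use; a 365-day year is also an
assumption.

```python
READER_PRICE_USD = 139.0
PAPER_BOOK_USD = 4.00
READING_HOURS = 4
HOURS_IN_THREE_YEARS = 3 * 365 * 24   # assumption: 365-day years

# Straight-line depreciation charged to the hours spent reading one book.
reader_usd_per_hour = READER_PRICE_USD / HOURS_IN_THREE_YEARS
reader_usd_per_book = reader_usd_per_hour * READING_HOURS

# Number of paper copies the reader's price would have bought.
break_even_books = READER_PRICE_USD / PAPER_BOOK_USD

print(f"reader depreciation per book: US${reader_usd_per_book:.3f}")
print(f"paper books for the same money: {break_even_books:.1f}")
```

Charging the whole US$139 to reading time alone would give a much
higher per-book figure, so the US$0.02 should be read as a lower
bound.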

A device that cost an order of magnitude less --- perhaps with
text-to-speech --- would lower the effective cost to the non-laborer
of reading ebooks by an order of magnitude.

For non-text
------------

The above makes clear that the limiting factor in access to textual
information is no longer the cost to transmit and store it; the costs
of transmitting and storing it are about 30 times less than the
depreciation cost of displaying it, and nearly six orders of magnitude
less than the opportunity cost of a laborer like me taking the time to
enjoy the information.  Other forms of information require many more
bits per second, but they can be enjoyed at only a slightly higher
cost per second, until you get to formats like JPEG, MP3, and
MPEG.

Geographical reach
------------------

The curious inversion that I’m in, where it costs more to fill the
disk than to buy it, has not yet reached much of the US, and will take
even longer to reach Japan and Korea.  However, it has already reached
much of the world, and there’s no reason to expect the exponential
growth lines to fail to cross everywhere the way they’ve already
crossed here.  Disks continue to halve their cost per bit every 15
months, while internet bandwidth continues to halve its cost per bit
every 4 years or so.
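
If both halving times hold, the ratio of bandwidth cost to disk cost
per bit doubles roughly every 22 months.  The sketch below projects
that ratio forward; the function names are made up for illustration,
and the 0.1 starting ratio is a hypothetical place where bandwidth is
currently ten times cheaper per bit than disk, not a measured figure.

```python
import math

DISK_HALVING_MONTHS = 15.0   # from the text
NET_HALVING_MONTHS = 48.0    # "every 4 years or so"

# Net rate at which (bandwidth cost / disk cost) doubles, per month.
DOUBLINGS_PER_MONTH = 1 / DISK_HALVING_MONTHS - 1 / NET_HALVING_MONTHS


def ratio_after(months, ratio_now):
    """Bandwidth-to-disk cost ratio per bit after `months`."""
    return ratio_now * 2 ** (months * DOUBLINGS_PER_MONTH)


def months_until_crossover(ratio_now):
    """Months until the ratio reaches 1; zero if it already has."""
    if ratio_now >= 1:
        return 0.0
    return math.log2(1 / ratio_now) / DOUBLINGS_PER_MONTH


print(f"doubling time: {1 / DOUBLINGS_PER_MONTH:.1f} months")
print(f"ratio of 5 today, in two years: {ratio_after(24, 5.0):.1f}")
print(f"ratio of 0.1 today crosses in: "
      f"{months_until_crossover(0.1):.0f} months")
```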

There are places that pay even more than I do. New Zealanders tell me
that typical broadband there costs NZ$60 per month plus NZ$2/GB.  If
we assume 30GB as typical, that adds up to NZ$4/GB, which is US$3/GB,
or US$3.8 × 10⁻¹⁰ per bit, ten times as high as the price I pay.
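
The per-bit price under a plan like that depends heavily on how much
you download, because the fixed monthly fee is spread over the usage.
A sketch, assuming decimal gigabytes and the exchange rate of US$0.75
per NZ$ that the NZ$4 to US$3 conversion above implies:

```python
BASE_NZD_PER_MONTH = 60.0
NZD_PER_GB = 2.0
USD_PER_NZD = 0.75        # assumption implied by NZ$4 being about US$3
BITS_PER_GB = 8e9         # assumption: decimal gigabytes


def nz_usd_per_bit(gb_per_month):
    """Effective price per bit at a given monthly usage."""
    nzd_per_gb = BASE_NZD_PER_MONTH / gb_per_month + NZD_PER_GB
    return nzd_per_gb * USD_PER_NZD / BITS_PER_GB


for gb in (10, 30, 100):
    print(f"{gb:4d} GB/month: US${nz_usd_per_bit(gb):.2g} per bit")
```

At 30GB per month this gives about US$3.8 × 10⁻¹⁰ per bit; heavier
users pay less per bit, but never less than the NZ$2/GB marginal
price.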

Some interesting corpus sizes
-----------------------------

What kinds of things could you fill a 2TB disk with?

    |--------------------------------------+--------+-----------------------|
    | what                                 | size   | contents              |
    |--------------------------------------+--------+-----------------------|
    | English Wikipedia (compressed)       | 6.1GB  | 2 million articles    |
    | (uncompressed)                       | 27GB   | same                  |
    | (all historical revisions, 7-zipped) | 31GB   | same, plus history    |
    | Project Gutenberg April 2010 DVD     | 7.8GB  | 29500 published books |
    | Current Debian stable source (5.0.6) | 16.8GB | lots of free software |
    | Debian i386 binaries                 | 18.5GB | same, but compiled    |
    |--------------------------------------+--------+-----------------------|

Counting Wikipedia only once, as the full-history dump, those add up
to only 74GB.  I don’t know of any place to download two terabytes of
data.
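
The 74GB figure appears to count Wikipedia once, using the
full-history dump in place of the two current-revision rows.  A quick
check of that sum, plus how long the lot would take at 31 kilobytes
per second (taking a kilobyte as 1024 bytes and a gigabyte as 10⁹
bytes, both assumptions):

```python
corpus_gb = {
    "English Wikipedia, all revisions (7-zipped)": 31.0,
    "Project Gutenberg April 2010 DVD": 7.8,
    "Debian stable source (5.0.6)": 16.8,
    "Debian i386 binaries": 18.5,
}

total_gb = sum(corpus_gb.values())
fraction_of_disk = total_gb / 2000.0          # share of a 2TB disk
seconds_to_download = total_gb * 1e9 / (31 * 1024)
days_to_download = seconds_to_download / 86400

print(f"total: {total_gb:.1f} GB")
print(f"share of a 2TB disk: {fraction_of_disk:.1%}")
print(f"download time at 31 KB/s: {days_to_download:.0f} days")
```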

Possible consequences
---------------------

The rapidly falling price of disk storage --- and the more slowly
falling price of network bandwidth --- seems likely to have some
interesting effects in the coming years.  

First, perhaps the market for bigger and bigger disks will collapse,
since most people don’t generate enough data locally to fill their
disks, or they do so only with the expectation of being able to share
it over the internet with their friends and family and beyond.  We’re
already seeing this to some extent as many computers have switched
entirely to SSDs and no longer use spinning disks.

Second, perhaps secondary means of transferring data will gain more
importance.  LAN parties, local wireless networks, and physically
shipping disks from one place to another may become more widely used,
as it becomes comparatively more difficult to copy around
high-resolution digital photographs, amateur movies, crawls of the
entire World-Wide Web, and so on.

Third, perhaps deletion of files will become less important --- and
less easy in the user interface.  Certain kinds of files, such as the
aforementioned high-resolution digital photographs, will still need to
be deleted because they weren’t interesting enough to share.  But old
versions of text documents, software, copies of Uncle Tom’s Cabin?
Delete only for privacy and security reasons.

Fourth, perhaps disks will be normally sold pre-filled with files ---
movies, books, snapshots of Wikipedia, massive quantities of free
software, and so on.

Fifth, perhaps software to tell when you already have a file on your
disk, and can thus avoid downloading it, will become more important.
Content-based naming schemes like the ones used in Git and BitTorrent
could facilitate this enormously.  In some cases, these can be used to
find when other computers physically near you have the files as well.
(BitTorrent is a good example of this, although it has some trouble
with NAT.)
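
As a minimal sketch of the idea, the code below names content the way
Git names blobs (SHA-1 over a short header plus the bytes) and skips
the download when something with that name is already present.  The
`LocalStore` class and its `fetch` method are hypothetical, standing
in for whatever index of local files real software would keep.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 name Git gives a blob with these contents."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class LocalStore:
    """Hypothetical in-memory stand-in for files already on disk."""

    def __init__(self):
        self._blobs = {}
        self.downloads = 0

    def fetch(self, name, download):
        """Return the named content, calling `download` only on a miss."""
        if name not in self._blobs:
            data = download(name)
            # The name doubles as a checksum for whatever arrived.
            if git_blob_id(data) != name:
                raise ValueError("content does not match its name")
            self._blobs[name] = data
            self.downloads += 1
        return self._blobs[name]


store = LocalStore()
book = b"stand-in for the bytes of an ebook"
name = git_blob_id(book)
store.fetch(name, lambda _name: book)   # miss: downloads
store.fetch(name, lambda _name: book)   # hit: served from local copy
print(store.downloads)                  # prints 1
```

Because the name is derived from the content, it does not matter
where the bytes come from, which is what makes fetching from a nearby
machine as trustworthy as fetching from the original server.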

Sixth, perhaps software will become much more aggressive about using
local disk to avoid downloading stuff over the network.

Seventh, an increasing range of material would ideally be downloaded
optimistically (“prefetched”), especially when the connection is idle.
21 seconds of my time waiting costs on the order of US$0.70; 21
seconds of use of my internet connection costs US$0.0002.  So even if
I only ever read one out of every 3500 things that was optimistically
downloaded, I’m still better off.  Even at a much lower time
opportunity cost, reading 1% of the prefetched text would make it a
better deal.
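
The break-even can be sketched as follows.  The 21 seconds, the
US$0.0002 per download, and the US$120-per-hour time value come from
the figures above; the function names are made up for illustration,
and the model ignores everything except waiting time and bandwidth.

```python
def prefetches_per_read(wait_seconds, time_usd_per_hour, download_usd):
    """How many speculative downloads one avoided wait pays for."""
    wait_cost = wait_seconds * time_usd_per_hour / 3600
    return wait_cost / download_usd


def min_hourly_value(hit_rate, wait_seconds, download_usd):
    """Lowest time value (US$/hour) at which prefetching breaks even."""
    cost_per_hit = download_usd / hit_rate
    return cost_per_hit * 3600 / wait_seconds


print(f"{prefetches_per_read(21, 120.0, 0.0002):.0f} prefetches per read")
print(f"US${min_hourly_value(0.01, 21, 0.0002):.2f}/hour at a 1% hit rate")
```

Under these assumptions, prefetching with a 1% hit rate pays off for
anyone whose waiting time is worth more than about US$3.40 per hour.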
