Pricewatch.com currently (2010-10-26) lists a 2TB drive (“Seagate ST32000542AS Seagate Barracuda LP ST32000542AS 2TB 5900 RPM 32MB Cache SATA 3.0Gb/s”) for US$120 with free shipping in the US, and that appears to be a typical price. US$120 for two terabytes is US$7.5 × 10⁻¹² per bit.
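The per-bit figure is just the drive price divided by its capacity in bits. A minimal sketch of that arithmetic, assuming "2TB" means 2 × 10¹² bytes (the decimal convention drive vendors use); the function name is mine:

```python
# Disk cost per bit for the drive quoted above.
# Assumption: "2TB" means 2 * 10**12 bytes (decimal terabytes).

def cost_per_bit(price_usd, capacity_bytes):
    """Price of one bit of storage, in US dollars."""
    return price_usd / (capacity_bytes * 8)

disk_usd_per_bit = cost_per_bit(120, 2 * 10**12)
print(f"US${disk_usd_per_bit:.1e} per bit")  # US$7.5e-12 per bit
```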

I pay AR$100 per month for my internet connection here in Argentina; last I checked, I could download stuff from abroad over it at 31 kilobytes per second, although this varies considerably. AR$100 is about US$25, so if I were downloading constantly at an average of 31 kilobytes per second, I would be paying US$3.8 × 10⁻¹¹ per bit. In practice, I don’t download at full speed 24/7, not least because the latency on the poorly-configured cable modem goes to hell, so I actually pay more for this.

The interesting point about the above is that, for me, downloading some piece of data costs about five times more than buying disk space to store it. If I bought that 2TB drive, it would take me 24 months of constant full-speed downloading to fill it, which would cost US$600.

Downloading an ebook
--------------------

To sharpen this point further, suppose I’m downloading a copy of Uncle Tom’s Cabin from Manybooks.net. It takes me about a minute to navigate the site to find and download the book, which is an opportunity cost of about US$2.00. The .mobi-format file is 657200 bytes, which takes about 21 seconds to download ($0.0002), and until I delete it, it occupies that amount of space ($0.00004). And reading it will take about four hours, an opportunity cost of about US$480.

What about energy costs? I’m using a US$300 computer to do the downloading, which is consuming about 100 watts. Straight-line depreciation of the computer over three years yields a depreciation of US$0.00026 during the 81 seconds, and 100 watts at a sort of average retail cost of electricity of US$0.10 per kilowatt-hour is US$0.00023.
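The small per-item costs above can be reproduced with a few lines of arithmetic. This is only a sketch: the 30-day month, the 365.25-day year, and 1 kilobyte = 1000 bytes are my assumptions, and they nudge the link/disk ratio a little above the "about five times" quoted earlier.

```python
# Reproduce the per-item costs of downloading one ebook.
# Assumptions (mine, not stated in the text): a 30-day month, a
# 365.25-day year, 1 kilobyte = 1000 bytes, 1 TB = 10**12 bytes.

SECONDS_PER_MONTH = 30 * 24 * 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600

DISK_USD_PER_BIT = 120 / (2 * 10**12 * 8)     # US$120 for 2TB
LINK_BYTES_PER_SEC = 31_000                   # 31 kilobytes/second
LINK_USD_PER_BIT = 25 / (LINK_BYTES_PER_SEC * 8 * SECONDS_PER_MONTH)

FILE_BYTES = 657_200                          # the .mobi file
download_seconds = FILE_BYTES / LINK_BYTES_PER_SEC
total_seconds = 60 + download_seconds         # navigating + downloading

costs = {
    "download file": FILE_BYTES * 8 * LINK_USD_PER_BIT,
    "store file": FILE_BYTES * 8 * DISK_USD_PER_BIT,
    # US$300 computer, straight-line depreciation over three years
    "depreciate computer": 300 / (3 * SECONDS_PER_YEAR) * total_seconds,
    # 100 W at US$0.10 per kWh; 1 kWh = 3.6e6 joules
    "100 watts": 100 * total_seconds / 3.6e6 * 0.10,
}

for item, usd in costs.items():
    print(f"{item:20s} US${usd:.5f}")

ratio = LINK_USD_PER_BIT / DISK_USD_PER_BIT
print(f"link/disk cost ratio: {ratio:.1f}")
```

Changing `LINK_BYTES_PER_SEC` or the monthly price is enough to redo the comparison for a different connection.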

|---------------------+----------+---------+------------|
| what                | how much | of what | cost (US$) |
|---------------------+----------+---------+------------|
| navigate site       |        1 | minute  |       2.00 |
| download file       |   657200 | bytes   |     0.0002 |
| store file          |   657200 | bytes   |    0.00004 |
| depreciate computer |       81 | seconds |    0.00026 |
| 100 watts           |       81 | seconds |    0.00023 |
| read book           |        4 | hours   |     480.00 |
|---------------------+----------+---------+------------|
| total               |          |         |  482.00073 |
|---------------------+----------+---------+------------|

By comparison, a 384-page paper copy of Uncle Tom’s Cabin costs US$4.00 on Amazon. The Amazon “Swindle” (so-called because even after you buy it, Amazon still controls it) and similar devices have removed the need to consume US$4 worth of paper (and US$40 or so worth of laser printer time, at least at the rates charged around here) to read the book comfortably, at least if you read substantially more than 30 books.

(One downside of this is that Amazon, since they still control the device, can send your books to the memory hole if they decide they don’t like them, as they famously did with copies of _1984_. For the time being, they probably can’t do the same with copies on your hard disk.)

For non-laborers
----------------

For people who can’t sell their time for money, there is a remarkable thing in the above. The cost of downloading the ebook, exclusive of the cost to their time, is US$0.00073. This is a substantial reduction from the US$4.00000 cost of the paper copy. But it is only available to them if they have a computing device like the Swindle or the OLPC XO that can display the text to them comfortably.

Straight-line depreciating a US$139 Swindle over three years yields a cost of US$0.02 for the four hours needed to read Uncle Tom’s Cabin, which swamps all the downloading costs. But it’s still substantially less than the US$4.00 for the paper copy.
A device that cost an order of magnitude less --- perhaps with text-to-speech --- would lower the effective cost to the non-laborer of reading ebooks by an order of magnitude.

For non-text
------------

The above makes clear that the limiting factor in access to textual information is no longer the cost to transmit and store it; the costs of transmitting and storing it (US$0.00073) are about 30 times less than the depreciation cost of displaying it (US$0.02), and nearly six orders of magnitude less than the opportunity cost (US$480) of a laborer like me taking the time to enjoy the information.

Other forms of information require many more bits per second, but they can be enjoyed at only a slightly higher cost per second, until you get to formats like JPEG, MP3, and MPEG.

Geographical reach
------------------

The curious inversion that I’m in, where it costs more to fill the disk than to buy it, has not yet reached much of the US, and will take even longer to reach Japan and Korea. However, it has already reached much of the world, and there’s no reason to expect the exponential growth lines to fail to cross everywhere the way they’ve already crossed here. Disks continue to halve their cost per bit every 15 months, while internet bandwidth continues to halve its cost per bit every 4 years or so.

There are places that pay even more than I do. New Zealanders tell me that typical broadband there costs NZ$60 per month plus NZ$2/GB. If we assume 30GB per month as typical, that adds up to NZ$120 for 30GB, or NZ$4/GB, which is US$3/GB, or US$3.8 × 10⁻¹⁰ per bit, ten times as high as the price I pay.

Some interesting corpus sizes
-----------------------------

What kinds of things could you fill a 2TB disk with?

|--------------------------------------+--------+-----------------------|
| what                                 | size   | contents              |
|--------------------------------------+--------+-----------------------|
| English Wikipedia (compressed)       | 6.1GB  | 2 million articles    |
| (uncompressed)                       | 27GB   | same                  |
| (all historical revisions, 7-zipped) | 31GB   | same, plus history    |
| Project Gutenberg April 2010 DVD     | 7.8GB  | 29500 published books |
| Current Debian stable source (5.0.6) | 16.8GB | lots of free software |
| Debian i386 binaries                 | 18.5GB | same, but compiled    |
|--------------------------------------+--------+-----------------------|

Counting Wikipedia only once (as the 31GB dump with full history), all of those together only add up to 74GB. I don’t know of any place to download two terabytes of data.

Possible consequences
---------------------

The rapidly falling price of disk storage --- and the more slowly falling price of network bandwidth --- seems likely to have some interesting effects in the coming years.

First, perhaps the market for bigger and bigger disks will collapse, since most people don’t generate enough data locally to fill their disks, or they do so only with the expectation of being able to share it over the internet with their friends and family and beyond. We’re already seeing this to some extent as many computers have switched entirely to SSDs and no longer use spinning disks.

Second, perhaps secondary means of transferring data will gain more importance. LAN parties, local wireless networks, and physically shipping disks from one place to another may become more widely used, as it becomes comparatively more difficult to copy around high-resolution digital photographs, amateur movies, crawls of the entire World-Wide Web, and so on.

Third, perhaps deletion of files will become less important --- and less easy in the user interface. Certain kinds of files, such as the aforementioned high-resolution digital photographs, will still need to be deleted because they weren’t interesting enough to share.
But old versions of text documents, software, copies of Uncle Tom’s Cabin? Delete only for privacy and security reasons.

Fourth, perhaps disks will be normally sold pre-filled with files --- movies, books, snapshots of Wikipedia, massive quantities of free software, and so on.

Fifth, perhaps software to tell when you already have a file on your disk, and can thus avoid downloading it, will become more important. Content-based naming schemes like the ones used in Git and BitTorrent could facilitate this enormously. In some cases, these can be used to find when other computers physically near you have the files as well. (BitTorrent is a good example of this, although it has some trouble with NAT.)

Sixth, perhaps software will become much more aggressive about using local disk to avoid downloading stuff over the network.

Seventh, an increasing range of material would ideally be downloaded optimistically (“prefetched”), especially when the connection is idle. 21 seconds of my time waiting costs on the order of US$0.70; 21 seconds of use of my internet connection costs US$0.0002. So even if I only ever read one out of every 3500 things that was optimistically downloaded, I’m still better off. Even at a much lower time opportunity cost, reading 1% of the prefetched text would make it a better deal.
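The prefetching argument comes down to one ratio: the bandwidth cost of fetching an item against the cost of the time spent waiting for it on demand. A rough sketch follows. It assumes a prefetched item costs nothing but its bandwidth and that a hit saves the entire wait; the function name and the US$120/hour rate (implied by US$2.00 per minute) are my own framing.

```python
# Break-even hit rate for optimistic downloading ("prefetching").
# Assumptions: a prefetched item costs only its bandwidth, and a hit
# saves exactly the time I would otherwise have spent waiting.

def break_even_hit_rate(download_cost_usd, wait_cost_usd):
    """Smallest fraction of prefetched items that must actually be
    read for prefetching to pay for itself."""
    return download_cost_usd / wait_cost_usd

WAIT_SECONDS = 21
HOURLY_RATE_USD = 120       # implied by US$2.00 per minute
wait_cost = HOURLY_RATE_USD / 3600 * WAIT_SECONDS   # about US$0.70
download_cost = 0.0002      # bandwidth cost of the 657200-byte file

rate = break_even_hit_rate(download_cost, wait_cost)
print(f"read at least 1 in {1 / rate:.0f} prefetched items")

# Run the other way: the hourly value of time at which a 1% hit rate
# breaks even.
hourly_at_one_percent = download_cost / 0.01 / WAIT_SECONDS * 3600
print(f"1% hit rate breaks even at US${hourly_at_one_percent:.2f}/hour")
```

Under these assumptions, a 1% hit rate already pays off once waiting time is worth a few dollars an hour, which is what the last sentence above is getting at.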