Storing high-volume digital data so that it will last for decades or centuries without any intervention or maintenance --- the way the Archimedes Palimpsest partly survived being ignored for centuries in a variety of untoward places --- is an important problem. One problem is that the very clever ways we have of encoding data with error-correction coding, data compression, interleaving bits from different codewords so that localized damage only takes out a few from each code word, and so on --- make the data formats very difficult to reverse-engineer.
Here's a two-dimensional image data format that's designed to be easy to reverse-engineer, even though it contains a certain amount of redundancy. In theory, it should allow you to store about three megabytes per side on a letter-size sheet of paper printed in a 1200dpi laser printer, and if you use archival-quality paper, the laser priner toner on the paper should be able to survive many centuries of benign neglect. It's also cool to look at. I haven't written a decoder, although I have decoded some of the data by hand to verify that it's working properly. #!/usr/bin/python # -*- encoding: utf-8 -*- # Fractal barcodes, by Kragen Sitaker, 2007. 8-of-9 variant. # This program converts a stream of data into a self-similar # two-dimensional matrix barcode, with the earliest bits of data being # encoded as the largest features. Each level of detail is is encoded # in a 3x3 square, which encodes 4 bits of data, and 8 of the 9 # squares are then used for the next level of detail. # The reason this is interesting is the following: # - It looks really cool. # - It should be really easy to get a fix on the positioning of the # resulting image and compensate for any distortions. # - It's an "archival-friendly" encoding mechanism in the sense that # there are visual clues to the encoding mechanism. # - If the image is scanned with insufficient resolution to capture # all of its details, whatever scale gets captured will convey some # prefix of the data. # So: # with 9 pixels (8 live) you get 4 bits (4 new) # with 81 pixels (64 live) you get 36 bits (32 new) # with 729 pixels (512 live) you get 292 bits (256 new) # with 6561 pixels (4096 live) you get 2340 bits (2048 new) # with 59049 pixels (32768 live) you get 18724 bits (16384 new) # with 531441 pixels (262144 live) you get 149796 bits (131072 new) # with 4782969 pixels (2097152 live) you get 1198372 bits (1048576 new) # (that's the limit at 300 dpi) # with 43046721 pixels (16777216 live) you get 9586980 bits (8388608 new) # (Actually you could get one more bit at the top level, but I don't.) # The bits b0-b3 and their complements are laid out like this: # !b2 !b1 !b0 # b3 K !b3 # b0 b1 b2 # K is the bit from the next higher size of block. b0 is the MSB, # i.e. the 4's bit of the nibble. White means 1; black means 0. # The symmetry of the pattern should simplify reverse-engineering its # interpretation. # (There are actually 70 8-bit patterns that are half ones and half # zeroes, not just 16, so you could encode 6 bits or a little more per # 3x3 region, which would boost your coding efficiency by 50%. This # has some disadvantages for an archival format, though; it destroys # byte-alignment of the bits being encoded, and the mapping between # the balanced 8-bit patterns and the "cleartext" 6-bit patterns is # necessarily a bit arbitrary and thus difficult to discover.) # This is a little trickier than the 2x2-based one I did earlier, # because laser printer resolutions tend to increase by factors of 2, # not 3: 150dpi, 300dpi, 600dpi, 1200dpi. So you get some benefit # from not being resolution-agnostic. But you could produce: # * at 300dpi, a 7.29" x 7.29" 2187x2187 matrix containing 4782969 # pixels and 1.2 megabits (25% coding efficiency) according to the # above. (That's the size this program prints.) # * at 600dpi, four 3.645" square matrices (unfortunately you may not be # able to fit six on a sheet) for a total of 4.8 megabits or 600kB. # * at 1200dpi, you could have a single 6561x6561 matrix at 5.47" # square, encoding about 9.6 megabits (22% coding efficiency); you # might be able to fit two of them. Or you could have a bunch of # 1.8" square 2187x2187 matrices --- say, 20 of them, in a 4x5 # arrangement, encoding just under 24 megabits or three megabytes. # Three megabytes of ASCII text is somewhere between 400 and 1000 # printed pages. If you just make 80x66 page-images out of 5x8 pixels, # you get up to 25x22 = 550 pages out of a 1200dpi letter-size page. # So this method has comparable overhead, just distributed # differently. # I think this makes for a reasonable-cost inactive long-term archival # format for digitally-encoded data; although it's still a little more # expensive than archival-quality CD-Rs, it should last many centuries # longer on 3.5-cent-per-page archival paper. # So our basic scale is 2187 pixels, divided by 300 pixels per inch. import sys bytes = lambda infile: iter(lambda: infile.read(1), '') # read(1) until '' def nibbles(infile): for char in bytes(infile): yield ord(char) >> 4 yield ord(char) & 15 def zoomed(array): return [[v for v in row for _ in range(3)] for row in array for _ in range(3)] class Symbol: def __init__(self): self.full = [[False]] # no-go zones self.colors = [[1]] # an optimization to avoid redundant painting def zoom_in(self): self.full, self.colors = (zoomed(self.full), zoomed(self.colors)) print "zoom_in" def put_square(self, xx, yy, color): if self.colors[xx][yy] != color: print " %s %s %s sq" % (xx, yy, color) self.colors[xx][yy] = color def blocks(self): return ((xx, yy) for yy in range(0, len(self.full), 3) for xx in range(0, len(self.full), 3) if not self.full[xx][yy]) def fill(self, xx, yy): self.full[xx][yy] = True print """%! /rl { rlineto } bind def % abbreviation to shorten 'sq' (draw a square) /sq { setgray moveto 1 0 rl 0 1 rl -1 0 rl closepath fill } bind def /inch { 72 mul } def /center { sub 2 div } def /sz 2187 300 div 72 mul def % 2187 pixels (3^7) divided by 300 dpi 8.5 inch sz center 11 inch sz center translate % close enough on A4 sz dup scale % now the square is (0, 0) to (1, 1) /zoom_in { 1 3 div dup scale } def gsave """ symbol, input_nibbles = Symbol(), nibbles(sys.stdin) try: while 1: symbol.zoom_in() for (xx, yy) in symbol.blocks(): symbol.fill(xx+1, yy+1) # center of block nibble = input_nibbles.next() for ((dx, dy), mask) in [((0, 0), 8), ((1, 0), 4), ((2, 0), 2), ((0, 1), 1)]: if nibble & mask: color = 1 # white else: color = 0 # black symbol.put_square(xx+dx, yy+dy, color) symbol.put_square(xx+(2-dx), yy+(2-dy), 1-color) except StopIteration: pass print "grestore showpage"