Storing high-volume digital data so that it will last for decades or centuries
without any intervention or maintenance --- the way the Archimedes Palimpsest
partly survived being ignored for centuries in a variety of untoward places ---
is an important problem.  One problem is that the very clever ways we have of
encoding data with error-correction coding, data compression, interleaving bits
from different codewords so that localized damage only takes out a few from
each code word, and so on --- make the data formats very difficult to
reverse-engineer.

Here's a two-dimensional image data format that's designed to be easy to
reverse-engineer, even though it contains a certain amount of redundancy.  In
theory, it should allow you to store about three megabytes per side on a
letter-size sheet of paper printed in a 1200dpi laser printer, and if you use
archival-quality paper, the laser priner toner on the paper should be able to
survive many centuries of benign neglect.

It's also cool to look at.

I haven't written a decoder, although I have decoded some of the data by hand
to verify that it's working properly.

#!/usr/bin/python
# -*- encoding: utf-8 -*-
# Fractal barcodes, by Kragen Sitaker, 2007.  8-of-9 variant.

# This program converts a stream of data into a self-similar
# two-dimensional matrix barcode, with the earliest bits of data being
# encoded as the largest features.  Each level of detail is is encoded
# in a 3x3 square, which encodes 4 bits of data, and 8 of the 9
# squares are then used for the next level of detail.

# The reason this is interesting is the following:
# - It looks really cool.
# - It should be really easy to get a fix on the positioning of the
#   resulting image and compensate for any distortions.
# - It's an "archival-friendly" encoding mechanism in the sense that
#   there are visual clues to the encoding mechanism.
# - If the image is scanned with insufficient resolution to capture
#   all of its details, whatever scale gets captured will convey some
#   prefix of the data.

# So:

# with 9 pixels (8 live) you get 4 bits (4 new)
# with 81 pixels (64 live) you get 36 bits (32 new)
# with 729 pixels (512 live) you get 292 bits (256 new)
# with 6561 pixels (4096 live) you get 2340 bits (2048 new)
# with 59049 pixels (32768 live) you get 18724 bits (16384 new)
# with 531441 pixels (262144 live) you get 149796 bits (131072 new)
# with 4782969 pixels (2097152 live) you get 1198372 bits (1048576 new)
# (that's the limit at 300 dpi)
# with 43046721 pixels (16777216 live) you get 9586980 bits (8388608 new)

# (Actually you could get one more bit at the top level, but I don't.)

# The bits b0-b3 and their complements are laid out like this:
# !b2 !b1 !b0
#  b3  K  !b3
#  b0  b1  b2
# K is the bit from the next higher size of block.  b0 is the MSB,
# i.e. the 4's bit of the nibble.  White means 1; black means 0.

# The symmetry of the pattern should simplify reverse-engineering its
# interpretation.

# (There are actually 70 8-bit patterns that are half ones and half
# zeroes, not just 16, so you could encode 6 bits or a little more per
# 3x3 region, which would boost your coding efficiency by 50%.  This
# has some disadvantages for an archival format, though; it destroys
# byte-alignment of the bits being encoded, and the mapping between
# the balanced 8-bit patterns and the "cleartext" 6-bit patterns is
# necessarily a bit arbitrary and thus difficult to discover.)

# This is a little trickier than the 2x2-based one I did earlier,
# because laser printer resolutions tend to increase by factors of 2,
# not 3: 150dpi, 300dpi, 600dpi, 1200dpi.  So you get some benefit
# from not being resolution-agnostic.  But you could produce:
# * at 300dpi, a 7.29" x 7.29" 2187x2187 matrix containing 4782969
#   pixels and 1.2 megabits (25% coding efficiency) according to the
#   above.  (That's the size this program prints.)
# * at 600dpi, four 3.645" square matrices (unfortunately you may not be
#   able to fit six on a sheet) for a total of 4.8 megabits or 600kB.
# * at 1200dpi, you could have a single 6561x6561 matrix at 5.47"
#   square, encoding about 9.6 megabits (22% coding efficiency); you
#   might be able to fit two of them. Or you could have a bunch of
#   1.8" square 2187x2187 matrices --- say, 20 of them, in a 4x5
#   arrangement, encoding just under 24 megabits or three megabytes.

# Three megabytes of ASCII text is somewhere between 400 and 1000
# printed pages. If you just make 80x66 page-images out of 5x8 pixels,
# you get up to 25x22 = 550 pages out of a 1200dpi letter-size page.
# So this method has comparable overhead, just distributed
# differently.

# I think this makes for a reasonable-cost inactive long-term archival
# format for digitally-encoded data; although it's still a little more
# expensive than archival-quality CD-Rs, it should last many centuries
# longer on 3.5-cent-per-page archival paper.

# So our basic scale is 2187 pixels, divided by 300 pixels per inch.

import sys

bytes = lambda infile: iter(lambda: infile.read(1), '') # read(1) until ''
def nibbles(infile):
    for char in bytes(infile):
        yield ord(char) >> 4
        yield ord(char) & 15

def zoomed(array):
    return [[v for v in row for _ in range(3)]
            for row in array for _ in range(3)]

class Symbol:
    def __init__(self):
        self.full = [[False]]  # no-go zones
        self.colors = [[1]]    # an optimization to avoid redundant painting
    def zoom_in(self):
        self.full, self.colors = (zoomed(self.full), zoomed(self.colors))
        print "zoom_in"
    def put_square(self, xx, yy, color):
        if self.colors[xx][yy] != color:
            print "  %s %s %s sq" % (xx, yy, color)
            self.colors[xx][yy] = color
    def blocks(self):
        return ((xx, yy) for yy in range(0, len(self.full), 3)
                         for xx in range(0, len(self.full), 3)
                if not self.full[xx][yy])
    def fill(self, xx, yy): self.full[xx][yy] = True

print """%!
/rl { rlineto } bind def  % abbreviation to shorten 'sq' (draw a square)
/sq { setgray  moveto  1 0 rl 0 1 rl -1 0 rl  closepath fill } bind def
/inch { 72 mul } def  /center { sub 2 div } def
/sz 2187 300 div 72 mul def  % 2187 pixels (3^7) divided by 300 dpi
8.5 inch sz center  11 inch sz center  translate  % close enough on A4
sz dup scale  % now the square is (0, 0) to (1, 1)
/zoom_in { 1 3 div dup scale } def  gsave
"""
symbol, input_nibbles = Symbol(), nibbles(sys.stdin)
try:
    while 1:
        symbol.zoom_in()
        for (xx, yy) in symbol.blocks():
            symbol.fill(xx+1, yy+1)  # center of block
            nibble = input_nibbles.next()
            for ((dx, dy), mask) in [((0, 0), 8), ((1, 0), 4),
                                     ((2, 0), 2), ((0, 1), 1)]:
                if nibble & mask: color = 1  # white
                else: color = 0              # black
                symbol.put_square(xx+dx, yy+dy, color)
                symbol.put_square(xx+(2-dx), yy+(2-dy), 1-color)
except StopIteration: pass

print "grestore showpage"

Reply via email to