The spatial responses of neurons in the first stages of the human visual system are well approximated by Gabor features (the wavelet you get by multiplying a spatial sinusoid, sin(omega (d · [x y])) for some direction vector d, by a Gaussian envelope), and the visual system is more sensitive to some frequencies of these Gabor features than others. (Interestingly, amblyopic vision is sensitive to a different set of spatial frequencies.) JPEG, JPEG2000, and MPEG compression exploit this feature of visual perception by spending more bits on the spatial frequencies that are likely to be perceptually salient, given a typical viewing distance and pixel density.
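That sinusoid-times-Gaussian product can be made concrete in a few lines of numpy (a minimal sketch; the function name and parameter choices are mine, not any standard API):

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Sample a real (cosine-phase) Gabor wavelet on a size x size grid.

    wavelength is the period of the sinusoid in pixels, theta its
    orientation in radians, sigma the std. dev. of the Gaussian envelope.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid varies along direction theta.
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * u / wavelength)
    return envelope * carrier

g = gabor_kernel(size=21, wavelength=8, theta=0.0, sigma=4.0)
```

The kernel peaks at 1 in the center (where both the envelope and the cosine are 1) and decays outward under the Gaussian.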
(I'm going to stop saying "spatial frequencies". All the frequencies in this post are spatial; I'm not going to talk about how many terahertz is red or how frequently you microsaccade.)

Dithering or halftoning is another sort of lossy data compression problem. In a typical case, given a picture with, say, 9 bits per pixel of luminance information, you want to produce another picture with only 1 bit per pixel, but which nevertheless appears identical to the original to a human viewer. This is similar to the problem JPEG solves, but with the additional constraint that you don't have the freedom to specify the "decompression algorithm", because biology has already determined it for you.

There is clearly a tradeoff here between different frequencies: the bits you use to encode higher frequencies are not available to encode lower frequencies, and vice versa. This implies an unavoidable tradeoff between precisely locating edges and precisely rendering levels of brightness. But that doesn't mean that standard algorithms are Pareto-optimal with respect to these tradeoffs, and because the human visual system's sensitivity to errors varies with Gabor frequency, even Pareto-optimal algorithms could possibly be improved.

A special case where this is a particularly easy tradeoff is when the pixels are very, very small: say, 600 or 1200 dpi. At a typical viewing distance of 1 meter, a 600dpi dot subtends about 9 seconds of arc, and a 1200dpi dot about 4½. A foveal cone subtends some 30 arcseconds, and ideal human foveal visual resolution is commonly cited as 20 arcseconds, although "vernier acuity", the ability to visually detect edge misalignments that vernier scales exploit, goes down to 4-6 arcseconds. So modern inkjet printers have the advantage of being able to use spatial frequencies to which we are not merely neurally insensitive but which we cannot see at all, except perhaps via aliasing and microsaccade-based subpixel effects.
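The small-angle arithmetic behind those subtense figures is simple enough to check (a quick sketch; the helper name is mine):

```python
import math

def subtense_arcsec(feature_mm, distance_mm):
    """Visual angle subtended by a small feature, in arcseconds,
    using the small-angle approximation angle = size / distance."""
    radians = feature_mm / distance_mm
    return radians * (180 / math.pi) * 3600

dot_600dpi_mm = 25.4 / 600     # one 600dpi dot, about 0.042 mm
dot_1200dpi_mm = 25.4 / 1200   # one 1200dpi dot, about 0.021 mm

print(round(subtense_arcsec(dot_600dpi_mm, 1000), 1))   # 8.7 (arcsec at 1 m)
print(round(subtense_arcsec(dot_1200dpi_mm, 1000), 1))  # 4.4 (arcsec at 1 m)
```

Both figures sit comfortably below the 20-arcsecond resolution limit cited above, though not below vernier acuity.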
(Apparently this "visual hyperacuity" is still not understood, and is listed on Wikipedia's "Perceptual paradox" page; my bet is on microsaccades and antialiasing.)

So you can guiltlessly devote none of your bits to representing the top octave or two of the frequency range, which leaves you plenty of bits to encode the lower frequencies you actually can see. That's a tradeoff you don't even have to think about. (You just have to be careful that your halftoning algorithm doesn't generate artifacts that alias down into the visible range.)

However, we don't always enjoy the advantage of being able to use 10-arcsecond pixels. The pixels on the netbook screen I'm typing this on are about 0.2mm across, or about 130dpi, which is considerably denser than is typical for a computer screen. If I look at the screen from 25cm, each pixel subtends about 2¾ minutes of arc. And I'm thinking about a laser-printed microfilm project where you'll typically be magnifying the printed pixels, for reading, to about the same size.

At these resolutions, bilevel images generated using the usual algorithms (halftone screens, ordered dithering, Floyd-Steinberg error diffusion; [Cris Luengo wrote an excellent overview][2]) are dramatically inferior to grayscale images. They're also dramatically inferior to the visual quality achieved by stippling, like the work of Noli Novak and her colleagues in the Wall Street Journal, and by copperplate engraving, both of which also produce bilevel images with limited resolution.

This, then, is the background against which I wrote my kragen-hacks post last month, which crudely [makes images look like you ran them through a photocopier][3].
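For concreteness, Floyd-Steinberg error diffusion, the strongest of those usual algorithms, fits in a few lines (a plain numpy sketch of the classic weights, not tuned for speed):

```python
import numpy as np

def floyd_steinberg(gray):
    """Threshold each pixel to 0 or 1 in raster order, diffusing the
    quantization error onto the not-yet-visited neighbors with the
    classic 7/16, 3/16, 5/16, 1/16 weights."""
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            new = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - new
            out[y, x] = new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```

Because the error is conserved (except at the image edges, where it falls off), the average gray level of the output tracks the input, which is exactly the sense in which it minimizes error against a smoothed version of the output.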
This weekend, I was working on rendering text with a very small proportional pixel font (a followup to [my font rendering engine in DHTML][0], with application to archival data recording as in [Carving stone tablets at home in your spare time][1]), and it occurred to me that a really useful thing to archive in a tiny font would be Wikipedia. Maybe not all of Wikipedia, but at least [the most important articles][4]. And it occurred to me that you'd pretty much need to include at least some of the images; nearly all of the articles are substantially improved by them (see <http://en.wikipedia.org/wiki/Claude_Monet>, <http://en.wikipedia.org/wiki/Richard_Wagner>, and <http://en.wikipedia.org/wiki/Maize>), but articles about two- or three-dimensional inventions, such as <http://en.wikipedia.org/wiki/Arch>; visual arts, such as <http://en.wikipedia.org/wiki/Mona_Lisa>; or anatomy, such as <http://en.wikipedia.org/wiki/Kidney>, would be totally vitiated by their loss.

With an eye toward finding a reasonable way to include images in what is effectively microfilm made on a standard laser printer, I [tried out some different dithering methods on a photograph of singer Gwen Stefani][5], including my kragen-hacks photocopier hack. I was able to get results nearly as good as the photocopier hack by the well-known technique of applying a "sharpen" filter to the image before dithering it, thus amplifying the high frequencies. But none of the results even approached the quality of the WSJ stipple portraits.

[0]: http://canonical.org/~kragen/dofonts.html
[1]: http://lists.canonical.org/pipermail/kragen-tol/2005-May/000777.html
[2]: http://www.cb.uu.se/~cris/blog/index.php/archives/355
[3]: http://canonical.org/~kragen/sw/byn/ (Contains bare breasts.)
[4]: http://en.wikipedia.org/wiki/Wikipedia:Vital_articles
[5]: http://canonical.org/~kragen/sw/stefani-images/ (Contains bare arms.)

Why was the quality so bad?
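That "sharpen" filter is just an unsharp mask: subtract a blurred copy of the image from itself and add the difference back, amplifying the high frequencies before the dither throws bits at them. A minimal sketch (the crude 3x3 box blur and the amount parameter are arbitrary choices of mine, not what any particular image editor does):

```python
import numpy as np

def sharpen(gray, amount=1.0):
    """Unsharp mask: out = img + amount * (img - blurred)."""
    img = gray.astype(float)
    h, w = img.shape
    # Crude 3x3 box blur via edge-padded neighbor averaging.
    p = np.pad(img, 1, mode='edge')
    blurred = sum(p[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)
```

A flat region passes through unchanged (the blur of a constant is the same constant), while values on either side of an edge get pushed further apart, which is what survives the subsequent thresholding.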
I looked at the academic literature on dithering and halftoning using computers, and found a lot of stuff about trying to minimize the error between a smoothed version of the output image and the original input image, but nothing about minimizing an approximation of perceptual error using a Gabor-wavelet-based model of the human visual system.

The only thing I came up with where someone was using Gabor wavelets to measure and optimize the performance of halftoning algorithms was an article called "Joint Edge-Preserving and Moire-Suppressing Image Resampling Using the Discrete Gabor Transform", which was specifically about preventing the creation of moiré artifacts or "aliasing" in gravure printing --- that is, visible difference frequencies created by heterodyning through nonlinear interaction between a sampling grid and high-frequency components of the input image --- without low-pass filtering the holy living fuck out of the input image with a single-pole filter, thus blurring the edges of everything, and without just brick-wall filtering its Fourier transform, which would create all kinds of poorly localized ringing artifacts. Instead, they use the Gabor transform to figure out which parts of the image to low-pass filter the holy living fuck out of. At least, that's what it sounds like from the abstract; I haven't actually read the whole paper. You'd think maybe they'd just low-pass filter it in the Gabor domain instead of the Fourier domain or the spatial domain, but what do I know?

There's been some dramatic recent work by Chang, Alain, and Ostromoukhov called "[Structure-Aware Error Diffusion][6]", building on a 2008 paper by Pang, Wong, Cohen-Or, and Heng called "[Structure-aware Halftoning][7]", which has even more dramatic results but runs orders of magnitude slower than other halftoning algorithms.
[6]: http://liris.cnrs.fr/Documents/Liris-4442.pdf
[7]: http://www.cse.cuhk.edu.hk/~ttwong/papers/structurehalftone/paper/texhalftone.pdf

But none of this work is exploiting the bandpass nature of the contrast sensitivity function (CSF) of the human visual system, a phenomenon that has been well documented among vision scientists for decades. The HVS is most sensitive to frequencies between 2 and 5 cycles per degree; frequencies *on both sides* are less visible. (Those seem to be the half-power points, i.e. this is a band-pass filter of just over an octave of bandwidth, with a Q factor of just below 1.) That means that when papers say things like "It is accepted that the ideal halftone image has a blue-noise spectrum" ([Pang et al. 2008][7]), implying that the lower-frequency the noise, the less acceptable it is, they are setting themselves an arbitrary goal that cannot be justified in terms of perceptual quality.

So here's what I was thinking: halftone by minimizing some vector norm, in Gabor space, between the input and output images, with the Gabor-space components perceptually weighted. That is, you take the Gabor transform of both images, multiply the Gabor components pointwise by a CSF model of the HVS, subtract corresponding components, and measure the size of the resulting difference vector, say by the sum of the absolute values of its components (the L1 norm) or their Euclidean sum (the L2 norm). That's the quantity you want to minimize. The great thing is that because the Gabor transform is a linear operator, modifications to the spatial-domain image affect the Gabor-domain image, and thus the difference vector, linearly.

It would be nice if the Gabor transform were invertible, but the Gabor basis functions are apparently not orthogonal, which I think means that most sets of coefficients for the Gabor basis functions do not correspond to any image in the spatial domain.
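The weighted-difference metric above might be sketched like this. Everything here is an assumption of mine: the "Gabor transform" is just correlation with a small filter bank, and the CSF weights are uniform placeholders rather than a fitted model of the HVS:

```python
import numpy as np

def correlate_valid(img, k):
    """Valid-mode 2D correlation (no padding), the slow obvious way."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def gabor_bank(size=9, wavelengths=(2, 4, 8),
               thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """A tiny bank of cosine-phase Gabor filters at a few scales
    and orientations."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    bank = []
    for lam in wavelengths:
        for th in thetas:
            u = x * np.cos(th) + y * np.sin(th)
            g = (np.exp(-(x**2 + y**2) / (2 * (lam / 2.0)**2))
                 * np.cos(2 * np.pi * u / lam))
            bank.append(g)
    return bank

def perceptual_error(a, b, weights=None):
    """Weighted Euclidean (L2) norm of the difference between images
    a and b in this toy Gabor domain."""
    bank = gabor_bank()
    if weights is None:
        weights = [1.0] * len(bank)  # placeholder CSF weights, not fitted
    diff = a.astype(float) - b.astype(float)
    total = 0.0
    for wgt, g in zip(weights, bank):
        resp = correlate_valid(diff, g)  # linearity: transform the difference
        total += wgt * np.sum(resp**2)
    return np.sqrt(total)
```

Because the transform is linear, transforming the difference image gives the same result as differencing the two transforms, which is what makes incremental updates during halftoning cheap.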
If it were invertible, you could tell which pixels in your image most need to be flipped in order to improve the image quality: they're the ones with the highest peaks in the inverse Gabor transform of the motherfucking difference vector. Presumably you can make some kind of headway by throwing out coefficients at random until a solution starts to exist.

There was actually a 1997 SPIE paper, [Image Quality Assessment with a Gabor Pyramid Model of the Human Visual System][8], by Taylor, Pizlo, Allebach, and Bouman, which proposed doing exactly this in order to measure the fidelity of image approximation methods, including, among other things, halftoning. This hasn't been completely ignored (Google Scholar finds 34 citations), but you still find even the best current halftoning work using totally boneheaded blue-noise criteria instead. Maybe it's because Taylor was a grad student at Purdue, where he'd actually just done a bunch of work on halftoning; he went on to do his Ph.D. thesis on this model, and then went off to teach at a vocational school in Milwaukee and stopped doing research.

[8]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.2727&rep=rep1&type=pdf

Taylor et al.'s algorithm doesn't let you invert the Gabor transform, but it does produce, for each pixel, a probability that a human will notice a difference between the images at that pixel, which seems like it would be just as good for the case of halftoning, where it tells you which pixels you most need to flip. From there, it's gradient descent and boom, Bob's your uncle.

Or maybe I just had too much Diet Coke and it's 5 AM. Your call.

-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-tol