As a bad photographer with several forays into the forensic world, I have a couple of comments on a recent (and pretty interesting!) Black Hat presentation by Neal Krawetz (www.hackerfactor.com) on image forensics:
http://blog.wired.com/27bstroke6/files/bh-usa-07-krawetz.pdf

To make things clear: I liked it. I think it's solid. I don't want this to be picked up by the press and turned into a food fight. I respect the author. My point is to express my slight doubts regarding several of the far-fetched conclusions presented later in the talk - before the approach is relied upon to fire someone from his post or the like.

First things first: in the presentation, following an overview of some of the most rudimentary "manual" analysis techniques, Mr. Krawetz employs several mathematical transformations as a method to more accurately detect image tampering. This is based on a valid high-level premise: when lossy formats are repeatedly edited and recompressed, the quality of various portions of the image will proportionally degrade. If the image is composed from a couple of previously lossy-compressed files from various sources, their compression degradation patterns may differ - and the current level of degradation can be quantified, in the most rudimentary way, simply by measuring how each compression unit (with JPEG, an 8x8px cell) changes with further compression - which is a nonlinear process.

The property that makes this possible is known to all photographers - this progressive degradation is the main reason why professional and "prosumer" photo editing and delivery is done almost exclusively using storage-intensive lossless formats, and why SLR cameras support RAW / TIFF output (and why skilled image forgers would not use lossy formats until they're done, or if forced to, would rescale their work and add subtle noise to thwart analysis). I'm pretty sure the approach is used as one of the inputs by commercial image forensics software, too - along with a couple of other tricks, such as similarity testing to spot the use of the clone tool.
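The most rudimentary way to visualize this degradation is to resave the file once more at a slightly different quality setting and amplify the per-pixel difference: regions that have already been through several lossy generations tend to change less than freshly introduced content. A minimal sketch of that resave-and-diff idea - my own rough take using PIL and numpy, not necessarily the exact transform Mr. Krawetz uses, with placeholder file names and arbitrary quality / scale settings:

# Resave-and-diff sketch: recompress a JPEG at a slightly different quality
# and amplify how much every pixel (per RGB channel) changes as a result.
# PIL/Pillow and numpy assumed; file names, quality and scale are arbitrary.
from PIL import Image
import numpy as np

def error_difference(path, quality=90, scale=20):
    orig = Image.open(path).convert('RGB')
    orig.save('_resaved.jpg', 'JPEG', quality=quality)
    resaved = Image.open('_resaved.jpg').convert('RGB')
    diff = np.abs(np.asarray(orig, dtype=np.int16) -
                  np.asarray(resaved, dtype=np.int16))
    return Image.fromarray(np.clip(diff * scale, 0, 255).astype(np.uint8))

error_difference('input.jpg').save('diff.png')

Note that this is the crudest possible variant - it makes no attempt to match the original 8x8 grid alignment or quantization settings, which a more careful implementation would probably want to account for.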
Now, to the point: the "wow" factor associated with the presentation and picked up by the press comes from a claim of apparent heavy manipulation of certain publicly released pictures of Al Qaeda associates, offered as proof of the accuracy and reliability of the automated approach - and that's where I'm not really so sure about the conclusions reached.

In essence, my issue with this is that the presentation fails to acknowledge that the observed patterns do not necessarily depend on the number of saves alone. There are certain very common factors that play a far more pronounced role - and in fact, some of them seem to offer a *better* explanation of some of the artifacts observed. The two most important ones:

- Non-uniform subsampling: JPEG and MPEG typically employ 4:2:0 chroma subsampling. This means that a region where the contrast between objects is primarily a product of color changes (at a comparable intensity of reflected light) may appear to be "older" (already lower in frequency and contrast, and hence producing less pronounced error difference patterns) than a region where the same level of contrast can be attributed to luminosity changes alone. Consider this example:

  http://lcamtuf.coredump.cx/subsampling.png

  ...we then compress it as a JPEG:

  http://lcamtuf.coredump.cx/subsampling.jpg

  ...and can compare the level of compression-related degradation by converting it to cyan-weighted BW:

  http://lcamtuf.coredump.cx/subsampling_bw.png

  I attempted to recreate the RGB "error difference" approach of Mr. Krawetz (essentially the resave-and-diff trick sketched above), resaving the image again at a slightly different compression level, and came up with this picture, which seems to suggest that only the top text is brand new (compare this to the conclusions reached for various TV frame grabs later in his presentation, where similar differences in color and contrast were resolved in favor of manipulation):

  http://lcamtuf.coredump.cx/subsampling_nk.jpg

  Simply picking out the Y component does not help either - since the working space of the editor is inevitably RGB, each resave causes Cb and Cr resampling imprecision to spill into Y on YCbCr -> RGB -> YCbCr conversions, introducing errors comparable to what we're trying to detect.

- Quantization: JPEG quality is controlled primarily by the accuracy of the image quantization step, which discards differences in many high-frequency 8x8 patterns while generally preserving low-frequency ones, but possibly introducing higher-frequency artifacts around more complex shapes, subject to rapid degradation. A good example of this is the following picture:

  http://blog.wired.com/photos/uncategorized/2007/08/01/ayman_alzawahiri.jpg
  http://blog.wired.com/photos/uncategorized/2007/08/01/ayman_alzawahiri_analysis.jpg

  Krawetz attributes the outline around al-Zawahiri seen in the second picture to chroma key manipulation, but fails to address the fact that the high-contrast, low-frequency edge between al-Zawahiri's black scarf and his white clothing produced an identical artifact. I highly doubt the scarf was altered, and Krawetz makes no such assumption when tracing the original image later on. It's still perfectly possible that this picture was manipulated (and a visual inspection of a thin black outline around his body may confirm this), but Krawetz's analysis does not strike me as solid evidence of such tampering (particularly with regard to the banner, as suggested by Krawetz in an interview).
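To make the quantization point more concrete, here is a toy sketch - numpy and scipy assumed, with the standard JPEG luminance table standing in for whatever the original encoder actually used - of what happens to a single 8x8 block containing nothing but a hard black-to-white edge, such as the scarf-against-clothing boundary:

# Toy model of JPEG quantization on one 8x8 block with a hard vertical edge.
# numpy + scipy assumed; QTABLE is the standard JPEG luminance table
# (roughly the IJG "quality 50" baseline).
import numpy as np
from scipy.fftpack import dct, idct

QTABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def dct2(b):  return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
def idct2(b): return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.zeros((8, 8))       # left half black...
block[:, 4:] = 255.0           # ...right half white

coeffs = dct2(block - 128)               # level shift + forward DCT
quantized = np.round(coeffs / QTABLE)    # the lossy step
restored = idct2(quantized * QTABLE) + 128

print(np.round(block[0]))      # clean step: 0 0 0 0 255 255 255 255
print(np.round(restored[0]))   # over/undershoot on both sides of the edge

The restored block is no longer a clean step - small over- and undershoot appears on both sides of the edge, and these artifacts change again whenever the block is requantized with a different table. A strong luminance boundary alone is therefore enough to light up an error difference map, no chroma key required.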
To test whether ordinary high-contrast edges can produce this on their own, I took my own photo with a couple of contrasty areas (most certainly not a collage) and subjected it to the error difference treatment:

http://lcamtuf.coredump.cx/photo/current/ula3-000.jpg
http://lcamtuf.coredump.cx/ula3-000_nk.jpg

Now, if you interpret the output in line with what we see on page 62 of the presentation, one should assume that the background in the top-right part of the image predates the model, and that much of the phone and some of her nails postdate her.

There's also a list of other problems with the approach that may cause it to fail in specific circumstances... non-square chroma subsampling in certain video formats and JPEG encoders would make regions with dominant high-frequency vertical chrominance contrast patterns degrade at a rate different from ones with dominant horizontal patterns, especially when resaved in 4:2:0... digital cameras produce non-linear noise, markedly more pronounced at the bottom of the dynamic range, which may cause dark areas to behave in a significantly different manner when reaching "stability" on subsequent recompressions, and so on.

I think the point I'm trying to make is this: it's a good idea to rely on the manual approaches described in this paper. It's also good to learn about many of the tools of the trade not described there, such as pixel-level noise uniformity analysis. The ideas proposed for automated analysis, on the other hand, may be good in some applications, but IMO are going to be hit-and-miss, with far too many false positives to be useful in general-purpose forensics.

/mz