Ross Ridge wrote:
> So identifying PDF files is pretty easy.

Steven D'Aprano wrote:
> Sure. MIS-identifying PDF files is pretty easy. Identifying them is not.
> Consider this example:

Your contrived example doesn't show how a PDF file would be misidentified;
it only shows how a file deliberately made to look like a PDF file would be
"misidentified". Since that was the intent of crafting such a file, I don't
see the problem.

> Is there a security vulnerability buried in the detection of file types by
> magic bytes? I don't know, but I wouldn't be surprised if there were.

There's only a security vulnerability if you choose to trust a file based on
its assumed file type. Since PDF files generally aren't trusted, it's not
likely to be an issue for whatever application tubby has in mind.

> Any file system that doesn't have file type metadata is reduced to
> guessing the type of the file, and guesses can be wrong.

File type metadata can also be wrong. You can give any file a .PDF extension
and Windows will believe it's a PDF file. On Mac OS, if a file has the
signature "CARO"/"PDF ", it will believe it's a PDF file regardless of its
contents. Metadata doesn't make programs any less vulnerable to deliberate
attempts to fool them.

Ross Ridge
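
To make the magic-bytes point concrete, here is a minimal sketch of the kind
of check being discussed. It assumes only that a PDF file begins with the
bytes %PDF-; the names PDF_MAGIC, looks_like_pdf and file_looks_like_pdf are
made up for illustration and are not taken from any code in this thread.

```python
# Hypothetical helper names; the only assumption is the "%PDF-" header.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data):
    """Return True if the given bytes start with the PDF header marker."""
    return data[:len(PDF_MAGIC)] == PDF_MAGIC


def file_looks_like_pdf(path):
    """Apply the same check to the first few bytes of a file on disk."""
    with open(path, "rb") as f:
        return looks_like_pdf(f.read(len(PDF_MAGIC)))


if __name__ == "__main__":
    # A plausible start of a real PDF file.
    print(looks_like_pdf(b"%PDF-1.4\n1 0 obj\n"))
    # A file deliberately crafted to carry the header and nothing else.
    print(looks_like_pdf(b"%PDF-1.4\nthis is not really a PDF"))
    # Something with a different signature entirely.
    print(looks_like_pdf(b"GIF89a"))
```

The first two calls give the same answer, which is the "file deliberately
made to look like a PDF file" case: the check reports what a file claims to
be, and says nothing about whether its contents should be trusted.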