>>>> Well, lynx said it may be a binary, see it anyway?  It was a mess.
>> Yes.  Most PDFs in my experience have most of their data compressed,
>> so they are "binary junk" when looked at with tools that don't
>> understand PDF structure and the compression method(s) in question.
> zless may be a better alternative since it does compressed data.
Not of much use here.  PDFs are not simply text files which have had a
general-purpose compression tool applied to them; they have internal
structure, and _some_ of the content gets compressed.  One PDF I have,
for example, begins

%PDF-1.6
%âãÏÓ
5191 0 obj
<</Filter/FlateDecode/First 939/Length 3647/N 93/Type/ObjStm>>stream

after which the "binary junk" begins.  A few KB later (3647 bytes, I
expect), I see

endstream
endobj
5192 0 obj
<</Filter/FlateDecode/First 909/Length 4329/N 93/Type/ObjStm>>stream

and it's back to binary compressed data.  Other PDFs have more
plaintext before the compressed data begins; another one I checked has
some sixty or seventy lines of plain text before going into compressed
data.

I don't recall enough details to know whether FlateDecode's compression
algorithm is close enough to any of the general-purpose compression
tools like gzip or compress to be of use, but even if it is, you would
at a minimum have to pick apart the PDF structure enough to extract the
compressed portion.  And, of course, FlateDecode is not the only
compression algorithm PDFs can use.

For full details, of course, read the PDF spec.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                [email protected]
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

_______________________________________________
Lynx-dev mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lynx-dev
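[Editor's note: for the record, FlateDecode is in fact the zlib/deflate
format (RFC 1950), the same algorithm gzip uses, so Python's standard
zlib module can inflate such streams once they are cut out of the file.
Below is a minimal sketch of the "pick apart the structure, then
decompress" step described above; the function name
extract_flate_streams and the keyword-scanning approach are illustrative
only — a real tool would honor the /Length entries and the xref table.]

```python
import re
import zlib

def extract_flate_streams(pdf_bytes):
    # Crude scan for stream ... endstream spans.  A real tool would use
    # the /Length entries and the xref table; this keyword scan can
    # misfire if compressed data happens to contain the keywords.
    results = []
    for m in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        # FlateDecode data is zlib-format (RFC 1950); decompressobj()
        # tolerates trailing bytes such as the newline before endstream.
        d = zlib.decompressobj()
        try:
            results.append(d.decompress(m.group(1)))
        except zlib.error:
            pass  # stream uses another filter, or the scan grabbed junk
    return results

# Demo on a hand-built toy object, since no real PDF is at hand:
payload = b"<< /Type /Page >>"
toy = (b"%PDF-1.6\n"
       b"1 0 obj <</Filter/FlateDecode>>stream\n"
       + zlib.compress(payload)
       + b"\nendstream\nendobj\n")
print(extract_flate_streams(toy))  # [b'<< /Type /Page >>']
```

Note that object streams (/Type/ObjStm, as in the excerpts above) add a
further layer: even after inflation the result is a packed container of
numbered objects, not plain page content.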
