>>>> Well, lynx said it may be a binary, see it anyway?  It was a mess.
>> Yes.  Most PDFs in my experience have most of their data compressed,
>> so they are "binary junk" when looked at with tools that don't
>> understand PDF structure and the compression method(s) in question.
> zless may be a better alternative since it does compressed data.
Not of much use here.  PDFs are not simply text files which have had a
general-purpose compression tool applied to them; they have internal
structure, and _some_ of the content gets compressed.  One PDF I have,
for example, begins

%PDF-1.6
%âãÏÓ
5191 0 obj
<</Filter/FlateDecode/First 939/Length 3647/N 93/Type/ObjStm>>stream

after which the "binary junk" begins.  A few KB later (3647 bytes, I
expect), I see

endstream
endobj
5192 0 obj
<</Filter/FlateDecode/First 909/Length 4329/N 93/Type/ObjStm>>stream

and it's back to binary compressed data.  Other PDFs have more
plaintext before the compressed data begins; another one I checked has
some sixty or seventy lines of plain text before going into compressed
data.

I don't recall enough details to know whether FlateDecode's compression
algorithm is close enough to any of the general-purpose compression
tools like gzip or compress to be of use, but even if it is, you would
at a minimum have to pick apart the PDF structure enough to extract the
compressed portion.  And, of course, FlateDecode is not the only
compression algorithm PDFs can use.

For full details, of course, read the PDF spec.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                [email protected]
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

_______________________________________________
Lynx-dev mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lynx-dev
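[Editor's note: for the record, FlateDecode is in fact the zlib/deflate
format (RFC 1950), the same algorithm gzip uses, so Python's standard
zlib module can inflate such streams once they are cut out of the file.
Below is a minimal sketch of the "pick apart the structure, then
decompress" step described above; the function name
extract_flate_streams and the keyword-scanning approach are illustrative
only — a real tool would honor the /Length entries and the xref table.]

```python
import re
import zlib

def extract_flate_streams(pdf_bytes):
    # Crude scan for stream ... endstream spans.  A real tool would use
    # the /Length entries and the xref table; this keyword scan can
    # misfire if compressed data happens to contain the keywords.
    results = []
    for m in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        # FlateDecode data is zlib-format (RFC 1950); decompressobj()
        # tolerates trailing bytes such as the newline before endstream.
        d = zlib.decompressobj()
        try:
            results.append(d.decompress(m.group(1)))
        except zlib.error:
            pass  # stream uses another filter, or the scan grabbed junk
    return results

# Demo on a hand-built toy object, since no real PDF is at hand:
payload = b"<< /Type /Page >>"
toy = (b"%PDF-1.6\n"
       b"1 0 obj <</Filter/FlateDecode>>stream\n"
       + zlib.compress(payload)
       + b"\nendstream\nendobj\n")
print(extract_flate_streams(toy))  # [b'<< /Type /Page >>']
```

Note that object streams (/Type/ObjStm, as in the excerpts above) add a
further layer: even after inflation the result is a packed container of
numbered objects, not plain page content.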
