PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com _____________________________________________________________

At 02:25 PM 6/26/2003 -0600, Carolyn Briles wrote:
So, let me rephrase.  Do any statistics exist on the frequency of
common name objects that are keys in PDF files?

No, because it can vary greatly based on the document. /Type is probably the most common, since it's used by most of the major PDF objects. Other names will vary based on the contents: the more pages, the more /Type /Page entries and /MediaBox keys; the more images in the document, the more /Filter, /ColorSpace, etc.
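To get rough numbers for one particular file, you could simply count name tokens in the raw bytes. This is a minimal Python sketch under loudly stated assumptions: the `name_counts` helper and its regular expression are hypothetical, it counts names in both key and value positions (so /Page in /Type /Page is counted too), it ignores #xx escapes in names, and it does not decompress streams, so anything inside compressed object or content streams is missed.

```python
import re
from collections import Counter

# Hypothetical pattern for a PDF name token: "/" followed by regular characters.
# Simplifications: #xx escapes are not decoded, and compressed streams are not
# inflated, so names hidden inside them are not seen at all.
NAME = re.compile(rb"/[^\s/<>\[\]()%{}]+")

def name_counts(raw: bytes) -> Counter:
    """Count every name token in the raw file bytes, whether used as key or value."""
    return Counter(m.decode("latin-1") for m in NAME.findall(raw))
```

For example, `name_counts(Path("some.pdf").read_bytes()).most_common(10)` (with `from pathlib import Path`, and `some.pdf` standing in for any file) would list the ten most frequent names; expect the ranking to shift from one document to the next, for the reasons given above.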


But the bigger question is WHY DO YOU CARE? What purpose does knowing this serve?


Here is another way to ask this question:
Does a signature (meaning a unique identifier, not the approval kind
of signature) exist for the "typical" PDF file?

Every PDF file starts with a header of the form %PDF-1.x (for example, %PDF-1.4) and ends with %%EOF. Is that not signature enough?
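If a byte-level signature is all that's needed, the header and trailer markers are easy to test for. A minimal Python sketch; the `pdf_signature` name is hypothetical, and the 1024-byte search windows are an assumption modeled on the tolerance some readers apply to leading junk and trailing bytes, not a strict requirement of the format.

```python
import re

# Header of the form %PDF-1.x; the captured group is the version number.
HEADER = re.compile(rb"%PDF-(1\.\d)")

def pdf_signature(raw: bytes):
    """Return the header version (e.g. '1.4') if the bytes look like a PDF, else None.

    Assumption: search the first 1024 bytes for the header and the last 1024
    bytes for %%EOF. Files with incremental updates contain several %%EOF
    markers; only the final one, near the end of the file, is checked here.
    """
    head, tail = raw[:1024], raw[-1024:]
    match = HEADER.search(head)
    if match is None or b"%%EOF" not in tail:
        return None
    return match.group(1).decode("ascii")
```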



There are lots of details that I am leaving out here, like exactly how
I would index the dependent axis of the histogram.  But in general,
I am trying to see if a "typical" PDF file (if one exists) can be described
by an examination of the contents of the raw file (not the printed or
viewed result of the reader).

Sure - read the PDF Reference manual for a formal description of the file format. In fact, some folks have even been able to describe the format using formal Backus-Naur Form (BNF) notation.
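To give a flavor of what a grammar-based description looks like, here is a deliberately tiny, hypothetical grammar for a subset of PDF object syntax (dictionaries, arrays, names, numbers and indirect references; no strings, streams, booleans or null), with a recursive-descent parser in Python that follows it rule by rule. It is an illustration only, not the published BNF mentioned above and not a complete PDF parser; `tokenize` and `parse` are names made up for this sketch.

```python
import re

# Hypothetical, heavily simplified grammar (NOT the full PDF syntax):
#   object     ::= dictionary | array | reference | name | number
#   dictionary ::= "<<" (name object)* ">>"
#   array      ::= "[" object* "]"
#   reference  ::= integer integer "R"
#   name       ::= "/" regular-characters
TOKEN = re.compile(r"<<|>>|\[|\]|/[^\s/<>\[\]()]+|[+-]?\d+(?:\.\d+)?|R")

def tokenize(text):
    """Split object text into tokens; anything unrecognized is skipped."""
    return TOKEN.findall(text)

def parse(tokens, i=0):
    """Parse one object starting at tokens[i]; return (value, next_index)."""
    tok = tokens[i]
    if tok == "<<":                      # dictionary: name/object pairs
        result, i = {}, i + 1
        while tokens[i] != ">>":
            key = tokens[i]
            value, i = parse(tokens, i + 1)
            result[key] = value
        return result, i + 1
    if tok == "[":                       # array: zero or more objects
        items, i = [], i + 1
        while tokens[i] != "]":
            item, i = parse(tokens, i)
            items.append(item)
        return items, i + 1
    # Indirect reference such as "12 0 R": two integers followed by R.
    if (i + 2 < len(tokens) and tokens[i + 2] == "R"
            and tok.isdigit() and tokens[i + 1].isdigit()):
        return ("ref", int(tok), int(tokens[i + 1])), i + 3
    if tok.startswith("/"):              # name
        return tok, i + 1
    return (float(tok) if "." in tok else int(tok)), i + 1   # number
```

Each grammar rule maps to one branch of `parse`, which is what makes a BNF-style description useful: the parser's structure falls out of the grammar.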



Could I look at the frequency of the keys and
say to myself, "Hmmmm.  This is a catalog.  This one is a form. This
one is a ......."

No, but you can usually look at the keys themselves to determine what an object is. /Type /Catalog is required in the Catalog, for example.
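Checking the /Type entry is easy to sketch. The Python below is a hypothetical helper (`object_kind` is not from any library) that pulls the first /Type value out of an object's text. It assumes the object is available uncompressed, it can be fooled by a nested dictionary that carries its own /Type, and objects for which /Type is optional may come back as "unknown".

```python
import re

# Matches "/Type /Something" (whitespace between the two names is optional).
# "/Subtype" does not match, because "/Type" must start right after a slash.
TYPE_ENTRY = re.compile(r"/Type\s*/([^\s/<>\[\]()]+)")

def object_kind(obj_text: str) -> str:
    """Return the first /Type value found in an object's text, or 'unknown'."""
    match = TYPE_ENTRY.search(obj_text)
    return match.group(1) if match else "unknown"
```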


However, a normal parser wouldn't just iterate over the objects in file order; it would walk them in a logical order, starting from the trailer's /Root entry (the Catalog) and following references from there.
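A minimal Python sketch of that kind of walk, assuming the cross-reference table has already been read and every object parsed into a dictionary. The `objects` table and `trailer` below are hypothetical stand-ins, with indirect references reduced to plain object numbers. The walk goes from the trailer's /Root to the Catalog's /Pages, then descends the page tree depth-first so pages come out in document order.

```python
# Hypothetical, already-parsed object table: object number -> dictionary.
# Indirect references are represented simply as object numbers (ints).
objects = {
    1: {"/Type": "/Catalog", "/Pages": 2},
    2: {"/Type": "/Pages", "/Kids": [3, 4], "/Count": 2},
    3: {"/Type": "/Page", "/Parent": 2},
    4: {"/Type": "/Page", "/Parent": 2},
}
trailer = {"/Root": 1}

def pages_in_order(objects, trailer):
    """Walk trailer -> Catalog -> page tree, yielding page object numbers in order."""
    catalog = objects[trailer["/Root"]]
    stack = [catalog["/Pages"]]
    while stack:
        num = stack.pop()
        node = objects[num]
        if node["/Type"] == "/Pages":
            # Intermediate node: push kids reversed so the first kid is visited first.
            stack.extend(reversed(node["/Kids"]))
        else:
            yield num  # a leaf /Page object
```

An explicit stack is used instead of recursion so that deep page trees do not hit Python's recursion limit; a real parser would also resolve references lazily and guard against reference cycles.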


Leonard
---------------------------------------------------------------------------
Leonard Rosenthol                    <mailto:[EMAIL PROTECTED]>
Chief Technical Officer              <http://www.pdfsages.com>
PDF Sages, Inc.                      215-629-3700 (voice)


To change your subscription: http://www.pdfzone.com/discussions/lists-pdfdev.html


