PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
At 02:25 PM 6/26/2003 -0600, Carolyn Briles wrote:
> So, let me rephrase. Do any statistics exist on the frequency of common name objects that are keys in PDF files?
No, because it can vary greatly based on the document. /Type is probably the most common, since it's used by most of the major PDF objects. Other names will vary based on the contents - the more pages the more /Page, /MediaBox keys. The more images in the document, the more /Filter, /ColorSpace, etc.
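For anyone who wants to see how much the counts vary from file to file, here is a minimal sketch of the kind of tally being asked about. It assumes the raw bytes are scanned with a simple regular expression; `name_histogram` and `NAME_RE` are names made up for this example. It counts every name token, values such as /Catalog as well as keys such as /Type. It also cannot see names inside compressed streams, and it may miscount binary stream data that happens to contain a slash.

```python
import re
from collections import Counter

# A PDF name token: a slash followed by one or more characters that are
# neither whitespace nor PDF delimiters. (Illustrative pattern, not a full lexer.)
NAME_RE = re.compile(rb"/([^\x00\t\n\f\r ()<>\[\]{}/%]+)")

def name_histogram(raw: bytes) -> Counter:
    """Count every /Name token found in the raw bytes (keys and values alike)."""
    return Counter(m.group(1).decode("latin-1") for m in NAME_RE.finditer(raw))

# Tiny hand-written fragment standing in for a real file.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n"
)

hist = name_histogram(sample)
print(hist.most_common(3))
```

Even in this three-object fragment /Type dominates, which matches the point above: the totals mostly reflect how many pages, images, and fonts the document happens to contain.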
But the bigger question is WHY DO YOU CARE? What purpose does knowing this serve???
> Here is another way to ask this question: Does a signature (meaning a unique identifier, not the approval kind of signature) exist for the "typical" PDF file?
Every PDF file always starts with %PDF-1. and ends with %%EOF. Is that not signature enough?
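As a rough illustration, the header and trailer markers can be checked in a few lines. This is only a sketch: `looks_like_pdf` is a hypothetical helper, and the 1024-byte search windows are an assumption, chosen because some viewers tolerate a little junk before the header or after the %%EOF marker.

```python
def looks_like_pdf(raw: bytes, window: int = 1024) -> bool:
    """Heuristic: a %PDF- header near the start and a %%EOF marker near the end.

    The window size is an assumption for this sketch, not a rule taken
    from the thread.
    """
    has_header = b"%PDF-" in raw[:window]
    has_eof = b"%%EOF" in raw[-window:]
    return has_header and has_eof

print(looks_like_pdf(b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF\n"))
print(looks_like_pdf(b"GIF89a not a pdf"))
```

A check like this only says that the file claims to be a PDF; it says nothing about what kind of document it is.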
> There are lots of details that I am leaving out here, like exactly how I would index the dependent axis of the histogram. But in general, I am trying to see if a "typical" PDF file can be described (if it exists) by an examination of the contents of the raw file (not the printed or viewed result of the reader).
Sure - read the PDF Reference manual for a formal description of the file format. In fact, some folks have even been able to describe the format using formal Backus-Naur notation.
> Could I look at the frequency of the keys and say to myself, "Hmmmm. This is a catalog. This one is a form. This one is a ......."
No, but you can usually look at the keys themselves to determine what an object is. /Type /Catalog is required in the Catalog, for example.
However, a normal parser wouldn't be just iterating over the objects - it would walk them in a logical order.
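To make the /Type point concrete, here is a small sketch that labels each indirect object by the first /Type value found in it. `object_types`, `OBJ_RE`, and `TYPE_RE` are hypothetical names for this example. It handles only uncompressed objects, and a nested dictionary with its own /Type could mislead it. It is also exactly the flat iteration a normal parser would avoid: a real one would start at the trailer and follow /Root to the Catalog, then walk the page tree from there.

```python
import re

# "N G obj ... endobj" for uncompressed indirect objects (illustrative only).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# First "/Type /Something" pair inside an object body.
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")

def object_types(raw: bytes) -> dict:
    """Map object number -> /Type value, for objects that declare one."""
    types = {}
    for m in OBJ_RE.finditer(raw):
        t = TYPE_RE.search(m.group(3))
        if t:
            types[int(m.group(1))] = t.group(1).decode("ascii")
    return types

# Tiny hand-written fragment standing in for a real file.
fragment = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n"
)

found = object_types(fragment)
print(found)
```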
Leonard
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:[EMAIL PROTECTED]>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc.
215-629-3700 (voice)
