Zend_Pdf preloads the PDF object reference tables and the pages. Both operations take considerable time and memory.
I think page loading could be skipped in some cases, which might save a lot of resources, but this needs to be tested. Could I ask you to do this? :)
(It looks like you have a good set of "real world" PDF examples.)

Please comment out line 294 of the library/Zend/Pdf.php file (current SVN version):
---------------------------------------------
// $this->_loadPages($this->_trailer->Root->Pages);
---------------------------------------------
Note: the $pdf->pages array will be empty.

With best regards,
   Alexander Veremyev.

> -----Original Message-----
> From: Markus Fischer [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 31, 2007 3:14 AM
> To: Alexander Veremyev
> Cc: Zend Framework General
> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I just discovered another need ... however I think this won't
> be easily implemented.
>
> Currently the complete PDF needs to be parsed into memory,
> even if all I want from a PDF is the metadata information.
>
> Would it be possible to implement a smart way to extract
> metadata information without parsing everything into memory ... ?
>
> Some PDF files I tested needed more than 128M of memory to be
> parsed even though all I need is Title and Author ... and besides
> memory it also takes quite some time, too.
>
> thanks,
> - - Markus
>
> Markus Fischer wrote:
> > Hey!
> >
> > This is great, I just saw your commit and tested it. I saw the API
> > being changed:
> >
> > * $oPdf->properties is now a property, not a method anymore
> > * $oPdf->getMetaData() returns some XML RDF sequence
> >
> > I tested it with quite a few PDFs and it worked very well. I also
> > realized that the amount of information in the properties can vary,
> > some have a "Title", others don't.
> >
> > Is there a difference in practice between the distilled information
> > through the properties property and the RDF data?
> >
> > thank you!
> > - Markus
> >
> > Alexander Veremyev wrote:
> >> Hi Markus,
> >
> >> Thanks for the offered help!
> >
> >> I mentioned the JIRA issue only to indicate that the feature was
> >> already requested. So it increases its chances to be done in a short time :)
> >> Actually I am going to take a look into it and determine plans for it
> >> tomorrow.
> >
> >> With best regards,
> >> Alexander Veremyev.
> >
> >>> -----Original Message-----
> >>> From: Markus Fischer [mailto:[EMAIL PROTECTED]
> >>> Sent: Monday, August 27, 2007 11:54 PM
> >>> To: Alexander Veremyev
> >>> Cc: Zend Framework General
> >>> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
> >>>
> >> Hi Alexander,
> >
> >> thank you for answering so quickly. I'll search JIRA next time.
> >
> >> I'm not new to PHP but the PDF spec is quite complex and so is the PDF
> >> implementation ... unfortunately I don't have enough time to dig into it,
> >> I'd love to help and come up with a patch.
> >
> >> So I hope it will get implemented soon, this would really be great.
> >
> >> thanks,
> >> - Markus
> >
> >> Alexander Veremyev wrote:
> >>>>> Hi Markus,
> >>>>>
> >>>>> PDF properties processing is planned
> >>>>> (http://framework.zend.com/issues/browse/ZF-294), but not done yet.
> >>>>>
> >>>>> It's not the first request for the feature and implementation is
> >>>>> relatively simple. I think it should be done in the near future.
> >>>>>
> >>>>>
> >>>>> With best regards,
> >>>>> Alexander Veremyev.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Markus Fischer [mailto:[EMAIL PROTECTED]
> >>>>>> Sent: Sunday, August 26, 2007 10:37 PM
> >>>>>> To: Zend Framework General
> >>>>>> Subject: [fw-general] Extracting data out of PDF with Zend_Pdf?
> >>>>>>
> >>>>> Hi,
> >>>>>
> >>>>> is it supported to extract metadata information from a PDF? The
> >>>>> information I'm seeking is
> >>>>> * title
> >>>>> * number of pages
> >>>>> * author
> >>>>>
> >>>>> (of course as long as the information is contained in the PDF).
> >>>>>
> >>>>> I've gone through quite a few PDFs where Adobe Reader shows
> >>>>> me title and author information but from Zend_Pdf I get nothing back.
> >>>>>
> >>>>> Following the documentation I thought I could get this
> >>>>> information from the properties() method, e.g.
> >>>>>
> >>>>> $oPdf = Zend_Pdf::load($sFile);
> >>>>> var_dump( $oPdf->properties() );
> >>>>>
> >>>>> But the returned array was empty in all cases.
> >>>>>
> >>>>> I know I can get the number of pages by counting the "pages"
> >>>>> property, but what about the other information?
> >>>>>
> >>>>> If it's not possible with Zend_Pdf, although off-topic, what other
> >>>>> possibilities are out there? fpdf? Or some unix commands (I'm on
> >>>>> Linux)?
> >>>>>
> >>>>> thanks,
> >>>>> - Markus
> >>>>>
> >>>>> ps: I was using 1.0.1
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFG109Q1nS0RcInK9ARAmoPAJsGXp8DuD72lFpirddPV6WLX3ke8ACgqF5I
> 7glEVrmvYgZxIJEf3HGeEg8=
> =Emla
> -----END PGP SIGNATURE-----
>
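
For anyone who wants to run the test requested at the top of this message (loading PDFs with and without the _loadPages() call), a rough measurement script might look like the sketch below. It is only an illustration: measureLoad() is a hypothetical helper, not part of Zend Framework, and the script assumes Zend/Pdf.php has already been loaded (e.g. via require_once) and that a PDF path is passed on the command line. The only Zend_Pdf members used are the ones mentioned in this thread: Zend_Pdf::load() and the properties array.

```php
<?php
// Hypothetical helper, not part of Zend Framework: runs $loader once and
// returns the elapsed time, the rise in peak memory, and the loader's result.
function measureLoad(callable $loader)
{
    $peakBefore = memory_get_peak_usage();
    $start = microtime(true);
    $result = $loader();
    return array(
        'seconds'    => microtime(true) - $start,
        'peak_bytes' => memory_get_peak_usage() - $peakBefore,
        'result'     => $result,
    );
}

// Only runs when Zend_Pdf is available and a PDF path was given on the command line.
if (class_exists('Zend_Pdf') && isset($argv[1])) {
    $stats = measureLoad(function () use ($argv) {
        return Zend_Pdf::load($argv[1]);
    });
    $pdf = $stats['result'];
    printf("%.2f s, peak memory +%d bytes\n", $stats['seconds'], $stats['peak_bytes']);

    // properties is an array; keys such as Title are optional in a PDF.
    $title = isset($pdf->properties['Title']) ? $pdf->properties['Title'] : '(none)';
    echo "Title: $title\n";
}
```

Running the script on the same file once with line 294 active and once with it commented out should give a first impression of how much of the cost comes from page loading. Since the peak figure is a high-water mark for the whole process, each measurement is best taken in a fresh PHP process.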