Zend_Pdf preloads the PDF object reference tables and the pages. Both
operations take a significant amount of time and memory.

I think page loading could be skipped in some cases, which may save a
lot of resources, but this needs to be tested. Could I ask you to do it?
:)  (It looks like you have a good set of "real world" PDF examples.)
Please comment out line 294 of the library/Zend/Pdf.php file (current
SVN version):
---------------------------------------------
//                $this->_loadPages($this->_trailer->Root->Pages);
---------------
Note: the $pdf->pages array will be empty.
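A minimal PHP sketch for checking the patched loader when all you need is Title and Author. It assumes the `$pdf->properties` array introduced in the recent commit, that the properties are still read from the trailer when page loading is skipped, and that the Zend Framework library directory is on the include_path. `pdfInfoFields()` is a hypothetical helper written for this example, not part of Zend_Pdf, and the sketch has not been run against the current SVN version.

```php
<?php
/**
 * Hypothetical helper (not part of Zend_Pdf): pick the requested entries
 * out of a Zend_Pdf properties array. Entries the PDF does not define come
 * back as null, because some files have a "Title" and others don't.
 */
function pdfInfoFields(array $properties, array $keys = array('Title', 'Author'))
{
    $result = array();
    foreach ($keys as $key) {
        $result[$key] = isset($properties[$key]) ? $properties[$key] : null;
    }
    return $result;
}

// Usage from the command line (assumption): php pdfinfo.php file.pdf
// Only $pdf->properties is read, so the empty $pdf->pages array left by
// the patched loader does not matter here. If Zend/Pdf.php cannot be
// found on the include_path, nothing is loaded.
if (isset($argv[1]) && (@include_once 'Zend/Pdf.php')) {
    $pdf = Zend_Pdf::load($argv[1]);
    var_dump(pdfInfoFields($pdf->properties));
}
```

Comparing the memory use and run time of this script with and without the patch should show how much the skipped page loading actually saves on the "real world" files.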


With best regards,
   Alexander Veremyev.

> -----Original Message-----
> From: Markus Fischer [mailto:[EMAIL PROTECTED] 
> Sent: Friday, August 31, 2007 3:14 AM
> To: Alexander Veremyev
> Cc: Zend Framework General
> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I just discovered another need... however, I think this won't be
> easy to implement.
> 
> Currently the complete PDF needs to be parsed into memory, 
> even if all I want from the PDF is the metadata.
> 
> Would it be possible to implement a smart way to extract 
> metadata information without parsing everything into memory ... ?
> 
> Some PDF files I tested needed more than 128M of memory to be 
> parsed, even though all I need is Title and Author... and besides 
> the memory, it also takes quite some time.
> 
> thanks,
> - Markus
> 
> Markus Fischer wrote:
> > Hey!
> > 
> > This is great! I just saw your commit and tested it. I noticed that
> > the API has changed:
> > 
> > * $oPdf->properties is now a property, not a method anymore
> > * $oPdf->getMetaData() returns some xml rdf sequence
> > 
> > I tested it with quite a few PDFs and it worked very well. I also 
> > noticed that the amount of information in the properties varies: 
> > some have a "Title", others don't.
> > 
> > In practice, is there a difference between the information distilled
> > into the properties property and the RDF data?
> > 
> > thank you!
> > - Markus
> > 
> > Alexander Veremyev wrote:
> >> Hi Markus,
> > 
> >> Thanks for the offered help!
> > 
> >> I mentioned the JIRA issue only to indicate that the feature had
> >> already been requested, which increases its chances of being done
> >> soon :) Actually, I am going to take a look at it and decide on
> >> plans for it tomorrow.
> > 
> >> With best regards,
> >>    Alexander Veremyev.
> > 
> >>> -----Original Message-----
> >>> From: Markus Fischer [mailto:[EMAIL PROTECTED]
> >>> Sent: Monday, August 27, 2007 11:54 PM
> >>> To: Alexander Veremyev
> >>> Cc: Zend Framework General
> >>> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
> >>>
> >> Hi Alexander,
> > 
> >> thank you for answering so quickly. I'll search JIRA next time.
> > 
> >> I'm not new to PHP, but the PDF spec is quite complex, and so is the
> >> PDF implementation... unfortunately I don't have enough time to dig
> >> into it, though I'd love to help and come up with a patch.
> > 
> >> So I hope it gets implemented soon; that would be really great.
> > 
> >> thanks,
> >> - Markus
> > 
> >> Alexander Veremyev wrote:
> >>>>> Hi Markus,
> >>>>>
> >>>>> PDF properties processing is planned
> >>>>> (http://framework.zend.com/issues/browse/ZF-294), but not done yet.
> >>>>>
> >>>>> It's not the first request for this feature, and the implementation
> >>>>> is relatively simple. I think it should be done in the near future.
> >>>>>
> >>>>>
> >>>>> With best regards,
> >>>>>    Alexander Veremyev.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Markus Fischer [mailto:[EMAIL PROTECTED]
> >>>>>> Sent: Sunday, August 26, 2007 10:37 PM
> >>>>>> To: Zend Framework General
> >>>>>> Subject: [fw-general] Extracting data out of PDF with Zend_Pdf?
> >>>>>>
> >>>>> Hi,
> >>>>>
> >>>>> is it possible to extract metadata from a PDF? The 
> >>>>> information I'm looking for is:
> >>>>> * title
> >>>>> * number of pages
> >>>>> * author
> >>>>>
> >>>>> (of course as long as the information is contained in the PDF).
> >>>>>
> >>>>> I've gone through quite a few PDFs where Adobe Reader shows me
> >>>>> title and author information, but from Zend_Pdf I get nothing back.
> >>>>>
> >>>>> Following the documentation, I thought I could get this
> >>>>> information from the properties() method, e.g.
> >>>>>
> >>>>> $oPdf = Zend_Pdf::load($sFile);
> >>>>> var_dump( $oPdf->properties() );
> >>>>>
> >>>>> But the returned array was empty in all cases.
> >>>>>
> >>>>> I know I can get the number of pages by counting the "pages" 
> >>>>> property, but what about the other information?
> >>>>>
> >>>>> If it's not possible with Zend_Pdf, what other possibilities are
> >>>>> out there (although this is off-topic)? fpdf? Or some Unix
> >>>>> commands (I'm on Linux)?
> >>>>>
> >>>>> thanks,
> >>>>> - Markus
> >>>>>
> >>>>> PS: I was using 1.0.1.
> > 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFG109Q1nS0RcInK9ARAmoPAJsGXp8DuD72lFpirddPV6WLX3ke8ACgqF5I
> 7glEVrmvYgZxIJEf3HGeEg8=
> =Emla
> -----END PGP SIGNATURE-----
> 
