RE: [fw-general] Zend_Lucene_Search for PDFs

Alexander Veremyev Thu, 01 Nov 2007 08:39:04 -0800

Yeah.


Zend_Search_Lucene needs the text extracted from a PDF document before it can index it.

 

A text extraction feature has been planned since the first versions of Zend_Pdf 
and was estimated as “easy to implement”, but it has not been done so far. The 
problem lies in some special cases that increase the implementation complexity: 
compressed or encrypted text streams, and some encoding issues.
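
To illustrate the compressed-stream case, here is a minimal sketch in Python (not 
PHP, and not Zend_Pdf code). It assumes a stream stored with the /FlateDecode 
filter, which holds zlib/deflate data. The sample text, the object number and the 
helper names raw_contains and decoded_stream are made up for the example, and 
the fragment is a single hand-built stream object, not a complete PDF file.

```python
import re
import zlib

# Hypothetical page content: PDF text operators repeated on three lines.
content = b"BT /F1 12 Tf 72 720 Td (Zend_Search_Lucene indexes plain text) Tj ET\n" * 3

# A /FlateDecode stream stores zlib/deflate-compressed bytes.
compressed = zlib.compress(content)

# Hand-built stream object (illustrative only, not a full PDF file).
pdf_fragment = (
    b"4 0 obj\n<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)


def raw_contains(needle: bytes) -> bool:
    """Search the raw bytes, as a naive indexer of the PDF source would."""
    return needle in pdf_fragment


def decoded_stream() -> bytes:
    """Read /Length, slice out the stream data and inflate it."""
    length = int(re.search(rb"/Length (\d+)", pdf_fragment).group(1))
    start = pdf_fragment.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(pdf_fragment[start:start + length])


print(raw_contains(b"plain text"))        # searching the raw source
print(b"plain text" in decoded_stream())  # searching the inflated stream
```

The phrase only becomes searchable after the stream is inflated. Even then it is 
still wrapped in PDF operators, and in a real document it may also be encrypted 
or use a font-specific encoding, which is where the extra complexity comes from.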

 

I am not sure which is preferable: to have an implementation that doesn’t work 
correctly in all cases, or not to have one at all (given the existing PDF-to-text 
conversion solutions).

 

 

With best regards,

   Alexander Veremyev.

 

 

   _____  

From: Jack Sleight [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 01, 2007 2:11 PM
To: peoplesoft
Cc: fw-general@lists.zend.com
Subject: Re: [fw-general] Zend_Lucene_Search for PDFs

 

I could be wrong, but unfortunately I don't think it's currently possible. 
Search_Lucene doesn't have built-in support for PDF files. If you can find 
another PHP class that can successfully extract all the text from a PDF file 
(something Zend_Pdf unfortunately can't do), then indexing that text with 
Search_Lucene is fairly straightforward. Getting the text out is the problem: 
PDF files are encoded, so just indexing the source of a PDF file won't work.

Of course I could be wrong; hopefully someone from the MFS team can 
confirm or correct this?

peoplesoft wrote: 

Please help... 
It's very urgent, and I had high hopes of having PDF search with Zend Lucene.
:-((
 
 
peoplesoft wrote:
  
 
  

 

-- 
Jack

