RE: [fw-general] Zend_Lucene_Search for PDFs

Alexander Veremyev Thu, 01 Nov 2007 15:32:48 -0800

Jack Sleight wrote:
> Alexander Veremyev wrote:
> > I am not sure which is preferable: an implementation that does not
> > work correctly in all cases, or no implementation at all (given the
> > existing PDF-to-text conversion solutions).
> I guess that would depend on what's returned. If it returned garbled
> nonsense when it hit something it didn't understand that would probably
> be a bad thing. But if it just ignored it, and worked correctly with the
> majority of PDFs I think it would be a welcome addition, even if it was
> marked as an experimental/incomplete feature.


I don't think silently ignoring unrecognized content is the right approach.
It gives no way to detect that something was not recognized.
But detection could be provided through an optional exception.
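
To illustrate the idea, here is a rough, language-neutral sketch (written in
Python, not Zend_Pdf code). Every name in it (extract_text, the strict flag,
UnrecognizedContentError, the chunk kinds) is hypothetical and only shows how
an optional exception could coexist with a lenient default:

```python
class UnrecognizedContentError(Exception):
    """Raised in strict mode when a chunk cannot be decoded (hypothetical)."""


def extract_text(chunks, decoders, strict=False):
    """Return (text, skipped) for a list of (kind, payload) chunks.

    decoders maps a chunk kind to a function taking bytes and returning str.
    In lenient mode unknown chunks are skipped but reported; in strict mode
    the first unknown chunk raises UnrecognizedContentError.
    """
    parts = []
    skipped = []
    for index, (kind, payload) in enumerate(chunks):
        decoder = decoders.get(kind)
        if decoder is None:
            if strict:
                raise UnrecognizedContentError(
                    "chunk %d has unsupported kind %r" % (index, kind))
            # Lenient mode: remember what was skipped so the caller can check.
            skipped.append((index, kind))
            continue
        parts.append(decoder(payload))
    return " ".join(parts), skipped


# Made-up input: two readable chunks around one the extractor cannot handle.
decoders = {"plain": lambda payload: payload.decode("ascii")}
chunks = [("plain", b"Hello"), ("encrypted", b"\x8f\x02"), ("plain", b"world")]

text, skipped = extract_text(chunks, decoders)
```

With the default, the caller gets the readable text plus a list of what was
skipped; with strict=True the same input raises instead, so unrecognized
content can never pass unnoticed.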

That may help with compressed and encrypted documents.
But I expect some difficulties with character set recognition (I am not sure);
a wrong string may be returned in that case.

On the other hand, I agree that it's an important feature for Zend_Pdf, so
these things should be investigated thoroughly.


> With regard to other converting tools, I've yet to find any that are
> purely class based (like Zend_Pdf), they all require some sort of
> additional software or extension installed on the server, which isn't
> possible for many, am I wrong?

Yes, that's correct (as far as I know).


With best regards,
   Alexander Veremyev.

