Re: New feature: Indexing of PDF documents

Michael Wechner Mon, 28 Jan 2008 00:01:36 -0800

Andreas Hartmann wrote:

Hi Lenya devs,
in 2.0.1-dev, PDF documents are now indexed. The text inside the PDF is extracted using PDFBox (http://pdfbox.org), which fortunately uses a BSD license. The sitemap determines whether a document contains a PDF based on the source extension. I hope this is sufficient; maybe it makes sense to use the MIME type instead.

I think the MIME type is better, because you cannot always trust a suffix. Also, you might want to consider using the Tika project.
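
To illustrate why a suffix alone can mislead, here is a minimal sketch in plain Java that compares the file name with the first bytes of the content. It is not Lenya's sitemap logic and it does not use PDFBox or Tika; the class and method names (PdfSniffer, hasPdfSuffix, startsWithPdfMagic, looksLikePdf) are made up for this example. The only fact it relies on is that a PDF file begins with the ASCII header "%PDF-".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, for illustration only (not part of Lenya, PDFBox or Tika).
class PdfSniffer {

    // A PDF file starts with this ASCII header.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Suffix-based check: cheap, but it trusts whatever name the file was given.
    static boolean hasPdfSuffix(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Content-based check on bytes that are already in memory.
    static boolean startsWithPdfMagic(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Content-based check on a stream: reads only the first few bytes.
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before the header was complete
            }
            read += n;
        }
        return startsWithPdfMagic(head);
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html>not a pdf</html>".getBytes(StandardCharsets.US_ASCII);
        // The suffix says PDF, the content says otherwise.
        System.out.println(hasPdfSuffix("report.pdf"));
        System.out.println(looksLikePdf(new ByteArrayInputStream(html)));
    }
}
```

A check like this only covers PDF; a general MIME detection library such as Tika would be the more complete route.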


Cheers

Michael


Testing is of course greatly appreciated!

-- Andreas



--
Michael Wechner
Wyona      -   Open Source Content Management - Yanel, Yulup
http://www.wyona.com
[EMAIL PROTECTED], [EMAIL PROTECTED]
+41 44 272 91 61


