Andreas Hartmann wrote:

> Hi Lenya devs,
>
> In 2.0.1-dev, PDF documents are now indexed. The text inside the PDF is extracted using PDFBox (http://pdfbox.org), which fortunately uses a BSD license. The sitemap determines whether a document is a PDF based on the source file extension. I hope this is sufficient; maybe it makes sense to use the MIME type instead.


I think the MIME type is better, because you cannot always trust a file suffix. You might also want to consider using the Tika project.
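To illustrate why a suffix is not enough, here is a rough sketch in plain Java. The class and method names are hypothetical (this is not Lenya, PDFBox or Tika API). It checks the leading bytes of the content for the PDF header `%PDF-` instead of trusting the name. It only looks at offset 0, which is simpler than what a real detector would do. A library such as Tika covers many more formats.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: content-based vs. suffix-based PDF detection.
final class PdfSniffer {
    // A PDF file normally begins with a header such as "%PDF-1.4".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Decide by content: do the first bytes match the PDF header?
    static boolean looksLikePdf(byte[] head) {
        return head.length >= PDF_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Read only as many bytes as the header needs (readNBytes requires Java 11+).
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikePdf(in.readNBytes(PDF_MAGIC.length));
        }
    }

    // Suffix-based check, for comparison: trusts whatever the name says.
    static boolean hasPdfSuffix(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

For example, an HTML page saved as `report.pdf` passes `hasPdfSuffix` but fails `looksLikePdf`. A real PDF uploaded without an extension fails `hasPdfSuffix` but is still recognized by its header.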

Cheers

Michael


> Testing is of course greatly appreciated!
>
> -- Andreas




--
Michael Wechner
Wyona      -   Open Source Content Management - Yanel, Yulup
http://www.wyona.com
+41 44 272 91 61

