Re: Lucene & Zend Lucene Search : indexation speed, document parsing

Julien Nioche Tue, 16 Sep 2008 04:05:33 -0700

Hello Romain,

> I'm asking myself a few questions, mainly about speed (indexing time) and
> document parsing (a way to index the most commonly used office documents). For
> document parsing, I'm planning to use different open-source libraries. The
> company I'm doing this for will be indexing a few gigabytes of data, around
> 5 GB I think. Any advice about this project? Comments and suggestions are
> welcome.
>

For the parsing you should have a look at Apache Tika. It supports the most
common formats and exposes the open-source libraries it uses for each format
through a single, simple API. That should spare you the trouble of interfacing
with each library individually.
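A minimal, dependency-free Java sketch of the "one API over many per-format libraries" idea. Everything in it (`ParserFacadeSketch`, `FormatParser`, `parseToString(String, byte[])`, the stand-in parsers) is hypothetical and only illustrates the pattern; it is not Tika's API. Tika's own entry point is the `org.apache.tika.Tika` class (for example its `parseToString(File)` method), and Tika detects the format from the content and name rather than from the bare file extension used here as a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a single text-extraction API dispatching to per-format parsers.
// NOT Tika's API; names are illustrative only.
public class ParserFacadeSketch {

    /** One implementation per document format (a real toolkit would wrap a dedicated library here). */
    interface FormatParser {
        String extractText(byte[] content);
    }

    private final Map<String, FormatParser> parsersByExtension = new HashMap<>();

    public ParserFacadeSketch() {
        // Stand-in parsers: plain text is decoded as UTF-8; HTML has its tags crudely stripped.
        parsersByExtension.put("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        parsersByExtension.put("html", bytes ->
                new String(bytes, StandardCharsets.UTF_8).replaceAll("<[^>]*>", "").trim());
    }

    /** The caller never picks a library; the format is chosen from the file extension (a simplification). */
    public String parseToString(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        FormatParser parser = parsersByExtension.get(ext);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for: " + fileName);
        }
        return parser.extractText(content);
    }

    public static void main(String[] args) {
        ParserFacadeSketch facade = new ParserFacadeSketch();
        byte[] html = "<html><body><p>Hello</p></body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(facade.parseToString("page.HTML", html)); // prints "Hello"
    }
}
```

The extracted string is what you would then hand to the indexer (Lucene or Zend Lucene) as a document field; adding a new format means registering one more parser, while calling code stays unchanged.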

Julien
-- 
DigitalPebble Ltd
http://www.digitalpebble.com
