Hello Romain,
> I have a few questions, mainly about speed (indexing time) and
> document parsing (a way to index most commonly used office documents).
> For document parsing, I'm planning to use different open-source
> libraries. The company I'm doing this for will be indexing a few
> gigabytes of data, around 5 GB I think. Any advice about this project?
> Comments and suggestions are welcome.

For the parsing, you should have a look at Apache Tika. It supports the most common formats and exposes the open-source libraries it uses for each format through a single, simple API. That should spare you the trouble of interfacing with each library individually.

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com