Hello Erik,

> Thanks for the feedback. If you don't mind elaborating further, what kind
> of documents are you indexing (database rows? file system files? other?),
> how many documents do you have, and how are you indexing it?
>
> Thanks,
>
> Erik

Right now we are indexing file system files: HTML pages (85%), images (10%; we index the meta information here), PDF (2%), Word (2%) and plain text (1%). We have 100,000,000 documents to index, and 10% of them are already done.

As for the last question, I didn't quite understand what you mean by "how are you indexing it". What I can say is that before we index non-plain-text documents (like PDF, Word and HTML), we run a content extraction step (using pdftotext, antiword and the 'hpricot' Ruby library). We also extract the metadata related to each document we index.

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

