Hi:
I have seen the post at
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12700.html and
I am implementing a similar application in a distributed environment: a
small cluster of only 5 nodes. The operating system is Linux (CentOS), and
I also use NFS to access the home directory where the documents to be
indexed reside. I would like to know how long an application should take to
index a large amount of documents, for example 10 GB.
I use Lucene 2.2.0. Every node has a dual Xeon 2.4 GHz CPU and 512 MB of
memory, and the LAN is 1 Gbit/s.

The problem is that my application takes far too long to index all the
documents: indexing 10 GB of PDF documents takes about 2 days (I use PDFBox
to convert the PDFs to text). Other applications based on Lucene, for
instance IBM OmniFind, take only 5 hours to index the same amount of PDF
documents. I would like to find out why my application is so much slower.
Any help is welcome.
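
To put the two figures side by side, here is a rough back-of-the-envelope
calculation. It assumes 1 GB = 1024 MB and that "2 days" means 48 hours of
wall-clock time; the class and method names are only illustrative.

```java
public class IndexThroughput {
    // Average throughput in MB/s for `gigabytes` of input processed in `hours`.
    // Assumption: 1 GB = 1024 MB.
    public static double mbPerSecond(double gigabytes, double hours) {
        return (gigabytes * 1024.0) / (hours * 3600.0);
    }

    public static void main(String[] args) {
        double mine = mbPerSecond(10, 48);     // my application: about 2 days
        double omnifind = mbPerSecond(10, 5);  // OmniFind: about 5 hours
        System.out.printf("my application: %.3f MB/s%n", mine);
        System.out.printf("OmniFind:       %.3f MB/s%n", omnifind);
        System.out.printf("ratio:          %.1fx%n", omnifind / mine);
    }
}
```

That works out to roughly 0.06 MB/s for my application against about
0.57 MB/s for OmniFind, a factor of 9.6. Both are far below the nominal
capacity of a 1 Gbit/s LAN (about 125 MB/s), so raw network bandwidth alone
does not seem to explain the delay.
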
Do you know of other distributed applications that use Lucene to index
large amounts of documents? How long do they take to index?
I hope you can help me.
Greetings
