Thanks to all of you for your answers. I am going to change a few things in my application and run some tests.

One thing: I haven't found another good PDF-to-text converter like PDFBox. Do you know of any faster one?

Greetings,
Ariel
On Jan 9, 2008 11:08 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Ariel,
>
> I believe PDFBox is not the fastest thing and was built more to handle all
> possible PDFs than for speed (just my impression; Ben, PDFBox's author,
> might still be on this list and might comment). Pulling data from NFS to
> index seems like a bad idea. I hope at least the indices are local and not
> on a remote NFS...
>
> We benchmarked local disk vs. NFS vs. a FC SAN (don't recall which one)
> and indexing over NFS was slooooooow.
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Ariel <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, January 9, 2008 2:50:41 PM
> Subject: Why is lucene so slow indexing in nfs file system ?
>
> Hi:
> I have seen the post at
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12700.html
> and I am implementing a similar application in a distributed environment,
> a cluster of only 5 nodes. The operating system I use is Linux (CentOS),
> so I am also using an NFS file system to access the home directory where
> the documents to be indexed reside. I would like to know how much time an
> application should spend indexing a large amount of documents, such as
> 10 GB.
> I use Lucene version 2.2.0; each node has a dual Xeon 2.4 GHz CPU and
> 512 MB; the LAN is 1 Gbit/s.
>
> The problem I have is that my application spends a lot of time indexing
> all the documents: indexing 10 GB of PDF documents takes about 2 days (to
> convert PDF to text I am using PDFBox). That is of course a lot of time;
> other applications based on Lucene, for instance IBM OmniFind, take only
> 5 hours to index the same amount of PDF documents. I would like to find
> out why my application takes so long to index; any help is welcome.
> Do you know of other distributed-architecture applications that use
> Lucene to index large amounts of documents? How long do they take to
> index?
> I hope you can help me.
> Greetings
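
For the tests mentioned above, one way to check whether NFS itself is the bottleneck is to measure raw read throughput on the NFS mount and on a local copy of the same files, with no PDF parsing or indexing involved. The following is only a minimal sketch under stated assumptions: `ReadThroughput`, `readAll` and `megabytesPerSecond` are hypothetical names that belong to neither Lucene nor PDFBox, and the code assumes Java 8 or later for `Files.walk`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Lucene or PDFBox): measures how fast a
// directory tree can be read, so an NFS mount and a local copy can be compared.
class ReadThroughput {

    // Reads every regular file under root and returns the total bytes read.
    static long readAll(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    // Converts a byte count and an elapsed time in nanoseconds to MB/s.
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return (bytes / (1024.0 * 1024.0)) / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory to measure, e.g. the NFS mount or a local copy.
        Path root = Paths.get(args[0]);
        long start = System.nanoTime();
        long bytes = readAll(root);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d bytes, %.1f MB/s%n", bytes, megabytesPerSecond(bytes, elapsed));
    }
}
```

Run it once against the NFS directory and once against a local copy. For comparison, 10 GB in 2 days works out to roughly 0.06 MB/s end to end, and 10 GB in 5 hours to roughly 0.57 MB/s. If plain reads from NFS are far faster than that, the time is more likely going into PDF extraction or index writes than into reading the source files. Keep in mind that the operating system's page cache can make a second run over the same files look faster than it really is.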