Stop. Back up. Test. <G>.... The very _first_ thing I'd do is just comment out the bit that actually indexes the content. I'm guessing you have some loop like:

    while (more files) {
        read the file
        transform the data
        create a Lucene document
        index the document
    }

Just comment out the "index the document" line and see how long _that_ takes. 9 times out of 10, the bottleneck is in acquiring the data (reading and transforming the files), not in the indexing itself.

As a comparison, I can index 3-4K docs/second on my laptop. This is using Solr with the Wikipedia dump, so the docs are several KB each.

So, if you're going to multi-thread, you'll probably want to multi-thread the acquisition of the data and feed that through a separate thread that actually does the indexing. You don't want multiple IndexWriters active at once.

FWIW,
Erick

On Mon, Sep 2, 2013 at 10:13 AM, nischal reddy <nischal.srini...@gmail.com> wrote:

> Hi,
>
> I am thinking to make my lucene indexing multi threaded, can someone throw
> some light on the best approach to be followed for achieving this.
>
> I will give short gist about what i am trying to do, please suggest me the
> best way to tackle this.
>
> What am i trying to do?
>
> I am building an index for files (around 30000 files), and later will use
> this index to search the contents of the files. The usual sequential
> approach works fine but is taking humungous amount of time (around 30
> minutes is this the expected time or am i screwing up things somewhere?).
>
> What am i thinking to do?
>
> So to improve the performance i am thinking to make my application
> multithreaded
>
> Need suggestions :)
>
> Please suggest me best ways to do this and normally how long does lucene
> take to index 30k files?
>
> Please suggest me some links of examples (or probably best practices for
> multithreading lucene) for making my application more robust.
>
> TIA,
> Nischal Y
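
The threading layout suggested in the reply (several threads acquiring data, one thread doing the indexing) could be sketched roughly as below. This is only an illustration using the Java standard library: the class name `IndexPipelineSketch`, the `acquire` helper, and the queue size are all assumptions made up for the example, and the "index the document" step is a placeholder where the real call on your single IndexWriter would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IndexPipelineSketch {
    // Unique sentinel object telling the indexing thread that no more work is coming.
    private static final String POISON = new String("__done__");

    // Hypothetical stand-in for "read the file" + "transform the data".
    static String acquire(String path) {
        return "doc:" + path;
    }

    // Runs the pipeline and returns everything the single indexing thread received.
    public static List<String> run(List<String> paths, int readerThreads)
            throws InterruptedException {
        // Bounded queue, so readers block instead of piling up documents in memory.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        List<String> indexed = new ArrayList<>();

        // The one and only indexing thread.
        Thread indexer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();
                    if (doc == POISON) {
                        break; // identity check: only the sentinel stops the loop
                    }
                    indexed.add(doc); // placeholder for "index the document"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.start();

        // Several threads share the data acquisition work.
        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        for (String path : paths) {
            readers.submit(() -> {
                try {
                    queue.put(acquire(path));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        readers.shutdown();
        readers.awaitTermination(1, TimeUnit.MINUTES);

        queue.put(POISON); // all readers are done, so tell the indexer to stop
        indexer.join();    // after join(), reading 'indexed' from this thread is safe
        return indexed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = run(List.of("a.txt", "b.txt", "c.txt"), 2);
        System.out.println(out.size() + " documents handed to the indexer");
    }
}
```

To repeat the timing test from the reply with this layout, leave the placeholder line as a no-op and measure how long `run` takes; whatever time remains is being spent in acquisition rather than in indexing.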