Besides the other suggestions, I'd really, really, really put some instrumentationin the code and see where you're spending your time. For a fast hint, put a cumulative timer around your indexing part only. This will indicate whether the time is consumed in querying your database or indexing......
I'd also just use the default merge factor etc. until you answer this question. They weren't just chosen at random <G>. Something like this. long elapsed = 0; while (more database records) { get database record long temp = get time index Lucene doc elapsed += (get time) - temp } print time It's unclear from your message whether you're calling optimize after every 10,000 docs or not, but don't. But I really suspect you're spending time interacting with the DB. If these responses aren't all that helpful, some code samples would help... Best Erick On Thu, Oct 22, 2009 at 8:45 AM, Paul Taylor <paul_t...@fastmail.fm> wrote: > I'm building a lucene index from a database, creating 1 about 1 million > documents, unsuprisingly this takes quite a long time. > I do this by sending a query to the db over a range of ids , (10,000) > records > Add these results in Lucene > Then get next 10,0000 and so on. > When completed indexing I then call optimize() > I also set indexWriter.setMaxBufferedDocs(1000) and > indexWriter.setMergeFactor(3000) but don't fully understand these values. > Each document contains about 10 small fields > > I'm looking for some ways to improve performance. > > This index writing is single threaded, is there a way I can multi-thread > writing to the indexing ? > I only call optimize() once at the end, is the best way to do it. > I'm going to run a profiler over the code, but are there any rules of > thumbs on the best values to set for MaxBufferedDocs and Mergefactor() > > thanks Paul > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >