So, you are reading 100 million records from somewhere and writing each record to one of 1 million indexes? Really 1 million, with an average of 100 docs in each? 17 hours doesn't sound too bad to me. Before worrying about Lucene performance you should double-check everything else - in general Lucene is not the bottleneck, but your case may be different.

Caching the index writers is likely to help, at the cost of complexity and memory. How about reading all the records and storing a target index identifier somewhere (memory? DB?), then either re-reading the 100 million in target index order, or making 1 million passes through - no, that doesn't sound too clever.

Sometimes you just have to accept that doing complex operations on large datasets can take a long time.

--
Ian.

On Wed, Jun 22, 2011 at 2:06 AM, Hiller, Dean x66079 <dean.hil...@broadridge.com> wrote:
> I am running over a 100 million row nosql set and unfortunately building 1
> million indexes. Each row I get may or may not be for the index I just wrote
> to, so I can't keep IndexWriter open very long. I am currently simulating
> how long it would take me to build all the indexes and it looks like it is
> somewhere around 17 hours :(
>
> Any other ways to optimize this code (and then I can maybe apply it to our
> index map/reduce job), thanks,
> Dean
>
> This is done in 20 different threads and again taking IndexWriter out of the
> loop is probably not an option since as I go over the 100 million records
> each one needs a different IndexWriter and I can't have too many
> IndexWriters open.
>
> Directory dir = FSDirectory.open(new File(INDEX_DIR_PREFIX
>         + this.account));
>
> for (int i = 0; i < 125; i++) {
>     IndexWriterConfig conf = new IndexWriterConfig(
>             Version.LUCENE_32, new KeywordAnalyzer());
>
>     IndexWriter writer = new IndexWriter(dir, conf);
>
>     LocalDate date = new LocalDate();
>     int random = this.r.nextInt(1000);
>     date = date.plusDays(random);
>     int next = this.r.nextInt(5000);
>     int name = this.r.nextInt(1000);
>
>     Document document = createDocument(("temp" + next),
>             ("dean" + name),
>             "some url", date);
>     writer.addDocument(document);
>
>     writer.close();
> }
>
> Hmmmm, I maybe could use an IndexWriter cache of 2000 to leave them open until
> evicted? I can't think of anything else to help though. Ideas?
> Thanks,
> Dean
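
For illustration, here is a minimal sketch of the "IndexWriter cache, leave them open until evicted" idea from the thread. It is not code from either poster. The names WriterCache, Opener and maxOpen are hypothetical, and the class is written against java.io.Closeable rather than IndexWriter so that it compiles without Lucene on the classpath. It assumes Java 8 or later, and it is not thread-safe as written.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache that closes a resource when it is evicted. Not thread-safe. */
final class WriterCache<K, V extends Closeable> implements Closeable {

    /** Hypothetical factory that opens the resource for a key on a cache miss. */
    public interface Opener<K, V> {
        V open(K key) throws IOException;
    }

    private final Opener<K, V> opener;
    private final LinkedHashMap<K, V> open;

    WriterCache(final int maxOpen, Opener<K, V> opener) {
        this.opener = opener;
        // accessOrder = true keeps entries ordered least-recently-used first.
        this.open = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    // Close the least-recently-used resource before dropping it.
                    eldest.getValue().close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    /** Returns the cached resource for key, opening it on a miss. */
    V get(K key) throws IOException {
        V value = open.get(key); // a hit also marks the entry as recently used
        if (value == null) {
            value = opener.open(key);
            open.put(key, value); // may evict and close the eldest entry
        }
        return value;
    }

    int size() {
        return open.size();
    }

    /** Closes every resource that is still open. */
    @Override
    public void close() throws IOException {
        for (V value : open.values()) {
            value.close();
        }
        open.clear();
    }
}
```

In the indexing job, V would be IndexWriter, the key would be the account, and the opener would build the writer the same way the quoted loop does. maxOpen would be the 2000 suggested in the thread, or whatever memory and file handles allow. With 20 threads, calls to get() would need to be synchronized, or the accounts partitioned so that each thread owns a disjoint set of indexes, since only one IndexWriter can be open on a given directory at a time.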