I am running over a 100-million-row NoSQL data set and unfortunately building 1 million indexes. Each row I read may or may not belong to the index I just wrote to, so I can't keep an IndexWriter open very long. I am currently simulating how long it would take to build all the indexes, and it looks like it is somewhere around 17 hours :(
Are there any other ways to optimize this code? (If so, I could maybe apply them to our index map/reduce job as well.) Thanks, Dean

This runs in 20 different threads, and again, hoisting the IndexWriter out of the loop is probably not an option: as I go over the 100 million records, each one needs a different IndexWriter, and I can't have too many IndexWriters open at once.

    Directory dir = FSDirectory.open(new File(INDEX_DIR_PREFIX + this.account));
    for (int i = 0; i < 125; i++) {
        // A new writer is opened and closed for every single document --
        // this open/close cycle is the expensive part being measured.
        IndexWriterConfig conf = new IndexWriterConfig(
                Version.LUCENE_32, new KeywordAnalyzer());
        IndexWriter writer = new IndexWriter(dir, conf);

        // Generate random test data (date is a Joda-Time LocalDate)
        LocalDate date = new LocalDate();
        int random = this.r.nextInt(1000);
        date = date.plusDays(random);
        int next = this.r.nextInt(5000);
        int name = this.r.nextInt(1000);

        Document document = createDocument("temp" + next, "dean" + name, "some url", date);
        writer.addDocument(document);
        writer.close();
    }

Hmmmm, maybe I could use an IndexWriter cache of 2000, leaving writers open until they are evicted? I can't think of anything else that would help, though. Ideas?

Thanks,
Dean
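The writer-cache idea above could be sketched as an LRU map that closes the least-recently-used writer when the cache grows past its capacity, so each index's IndexWriter stays open across many rows instead of being reopened per document. This is only a sketch under assumptions: java.io.Closeable stands in for org.apache.lucene.index.IndexWriter so the example is self-contained, and the WriterCache name and the capacity parameter (2000 in the post) are made up here, not part of any Lucene API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache of open writers keyed by account/index name.
// When the cache exceeds `capacity`, the least-recently-used
// writer is closed (committing its changes) and evicted.
public class WriterCache<K> {
    private final Map<K, Closeable> cache;

    public WriterCache(final int capacity) {
        // accessOrder = true: iteration order is least-recently-accessed first,
        // so removeEldestEntry sees the LRU entry on each put.
        this.cache = new LinkedHashMap<K, Closeable>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Closeable> eldest) {
                if (size() > capacity) {
                    try {
                        eldest.getValue().close(); // release the evicted writer
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                    return true; // tell LinkedHashMap to drop the entry
                }
                return false;
            }
        };
    }

    public Closeable get(K key)               { return cache.get(key); }
    public void put(K key, Closeable writer)  { cache.put(key, writer); }
    public int size()                         { return cache.size(); }
}
```

With per-document writer churn replaced by cache lookups, the open/close cost is paid only on eviction; note that for the real IndexWriter a shared cache would also need synchronization across the 20 threads, which this sketch omits.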