Hi, I am trying to use Hadoop for Lucene index creation. I need to create multiple indexes based on the contents of the files (e.g. if the author is "hrishikesh", the file should be added to an index for "hrishikesh"; there has to be a separate index for every author). For this, I keep an IndexWriter open per author and maintain them in a HashMap inside the map() function. I parse each incoming file and, if its author is one for which I already have an open IndexWriter, I add the file to that index; otherwise I create a new IndexWriter for the new author. Since the number of authors might run into the thousands, once the HashMap reaches a certain threshold I close all the IndexWriters, clear the HashMap, and start over. There is no reduce function.
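Roughly, the bookkeeping in my map() side looks like the sketch below. This is a stdlib-only illustration of the cache-and-flush pattern I described, not my actual job: a StringBuilder stands in for the Lucene IndexWriter, the class and method names are just illustrative, and the comments mark where the real IndexWriter open/close calls would go.

```java
import java.util.HashMap;
import java.util.Map;

// Per-author writer cache, as described above. StringBuilder is a
// stand-in for org.apache.lucene.index.IndexWriter; the real open()
// and close() calls are noted in comments.
class AuthorIndexCache {
    private final Map<String, StringBuilder> writers = new HashMap<>();
    private final int threshold; // max writers held open at once
    private int flushes = 0;     // how many times the cache was drained

    AuthorIndexCache(int threshold) {
        this.threshold = threshold;
    }

    // Called once per parsed input file from map().
    void addDocument(String author, String content) {
        StringBuilder w = writers.get(author);
        if (w == null) {
            // Real job: open a new IndexWriter on the
            // per-author index directory here.
            w = new StringBuilder();
            writers.put(author, w);
        }
        w.append(content).append('\n');
        if (writers.size() >= threshold) {
            flushAll();
        }
    }

    // Close every open writer and start over once the threshold is hit.
    void flushAll() {
        // Real job: call IndexWriter.close() on each map entry.
        writers.clear();
        flushes++;
    }

    int openWriters() { return writers.size(); }
    int flushCount()  { return flushes; }
}
```

The same flushAll() would also need to run in the mapper's close()/cleanup() hook so that writers still open when the input is exhausted get closed.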
Does this logic sound correct? Is there any other way of implementing this requirement? --Hrishi