Hi,

I am trying to use Hadoop for Lucene index creation. I need to create multiple
indexes based on the contents of the input files: if the author is "hrishikesh",
the document should be added to an index for "hrishikesh", and so on, with a
separate index for every author. To do this, I keep an IndexWriter open for each
author and hold them in a HashMap inside the map() function. I parse each
incoming file; if its author already has an open IndexWriter, I add the document
to that index, otherwise I create a new IndexWriter for the new author. Since
the authors might run into the thousands, once the HashMap reaches a certain
threshold I close all the IndexWriters, clear the HashMap, and start over.
There is no reduce function.
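In case it helps clarify what I mean, here is a minimal sketch of the caching
and threshold logic. The class and method names (AuthorWriterCache, add,
closeAll) are just illustrative, and the inner Writer class is a stand-in for
Lucene's IndexWriter so the logic is easy to follow in isolation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the per-author writer cache described above.
// "Writer" is a stand-in for Lucene's IndexWriter; all names are illustrative.
public class AuthorWriterCache {
    static class Writer {
        final String author;
        int docs = 0;
        Writer(String author) { this.author = author; }
        void addDocument(String doc) { docs++; }
        void close() { /* would commit and release the real IndexWriter */ }
    }

    private final Map<String, Writer> writers = new HashMap<String, Writer>();
    private final int maxOpenWriters;

    AuthorWriterCache(int maxOpenWriters) {
        this.maxOpenWriters = maxOpenWriters;
    }

    // Called once per input record, as map() would: route the document to
    // the author's writer, opening one lazily if needed.
    void add(String author, String doc) {
        Writer w = writers.get(author);
        if (w == null) {
            // Threshold reached: close every open writer and clear the map,
            // then start over, as described above.
            if (writers.size() >= maxOpenWriters) {
                closeAll();
            }
            w = new Writer(author);
            writers.put(author, w);
        }
        w.addDocument(doc);
    }

    void closeAll() {
        for (Writer w : writers.values()) {
            w.close();
        }
        writers.clear();
    }

    int openCount() { return writers.size(); }
}
```

In the real job, Writer would wrap an IndexWriter opened in append mode on a
per-author directory, so that re-opening an author's index after a
threshold flush continues the same index rather than overwriting it.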

Does this logic sound correct? Is there any other way of implementing this 
requirement?

--Hrishi

