Updating existing documents in index: Solutions

John Smith Thu, 11 Aug 2005 11:12:32 -0700

Hi all


This is a slightly long email. Pardon me.

 

As Lucene does not allow for updating an existing document in the index, the 
only option is to delete and reindex the message.When you have too many 
updates, this gets a little cumbersome. In our case, as such the actual content 
of the document being indexed does 

not change, but the fields around the content, like say "LastReadby" or 
something like Folder associated with it etc change. These are all fields that 
have been indexed as a part of the original  document in the index.

 

I have been contemplating putting these "commonly changing fields" into one  
index and allow for delete and reindex on this  index alone and keep the static 
data in another index. DocumentID will be a stored field and will be stored in 
both the static and dynamic index, as a way of identifying the document.

 

Static index: Contains content of document indexed and documentID stored.

Dynamic index: Contains all fields about the document which change frequently 
indexed  and documentID stored.

 

 

Questions

 

1. First of all, is there a better solution to this frequently changing fields 
having to be reindexed ?

 

2. Let's say I go  with the 2 index approach, 

 

Example query:  Content: "Hello world" AND Folder:Folder1 AND LastReadBy: jane. 
If we execute these queries on our static and dynamic indexes, they will 
obviously fail to get hits.



 

     Let's say I have a way of splitting my queries such that  all content 
queries go to static (content) index only and queries on other fields go to the 
dynamic index, basically allow for queries to come in such a way that it is 
always a AND between the dynamic index result set and static index result set. 
So on the results set, I would have to retrieve the document ID and make sure 
we have the same documentID in both the result sets, in order for it to be a 
match.

      In  cases where the result sets are really huge from  both the queries, 
then even to get the number of hits, I will have to retrieve each and every 
document from the results, in order to get the documentID for comparison. 
Queries can get really slow.

 

Has anyone faced similar problems, If so what was your solution?

Any comments/thoughts will be appreciated.

 

Thank you

JS



                
---------------------------------
 Start your day with Yahoo! - make it your home page

Updating existing documents in index: Solutions

Reply via email to