Hi all
This is a slightly long email. Pardon me.
As Lucene does not allow updating an existing document in the index, the
only option is to delete the message and reindex it. When you have too many
updates, this gets a little cumbersome. In our case, the actual content
of the document being indexed does not change, but the fields around the
content, such as "LastReadBy" or the folder associated with it, do change.
These are all fields that have been indexed as part of the original document
in the index.
I have been contemplating putting these "commonly changing fields" into one
index, allowing delete and reindex on this index alone, and keeping the static
data in another index. DocumentID will be a stored field in both the static
and the dynamic index, as a way of identifying the document.
Static index: Contains content of document indexed and documentID stored.
Dynamic index: Contains all fields about the document which change frequently
indexed and documentID stored.
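To make the split concrete, here is a minimal sketch in plain Java. It uses in-memory maps as stand-ins for the two indexes and makes no Lucene calls; the class name SplitIndexSketch and all of its methods are hypothetical, made up only for illustration. The point it shows is that a field update deletes and re-adds only the dynamic entry, while the static entry is written once.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the static/dynamic index pair (not Lucene API). */
public class SplitIndexSketch {
    // documentID -> content; written once when the document is first indexed
    private final Map<String, String> staticIndex = new HashMap<>();
    // documentID -> frequently changing fields (e.g. Folder, LastReadBy)
    private final Map<String, Map<String, String>> dynamicIndex = new HashMap<>();
    private int staticWrites = 0;
    private int dynamicWrites = 0;

    /** Index a new document: one write to each index, keyed by the same documentID. */
    public void add(String documentId, String content, Map<String, String> fields) {
        staticIndex.put(documentId, content);
        staticWrites++;
        dynamicIndex.put(documentId, new HashMap<>(fields));
        dynamicWrites++;
    }

    /** Update changing fields: delete and re-add the dynamic entry only. */
    public void updateFields(String documentId, Map<String, String> newFields) {
        if (!staticIndex.containsKey(documentId)) {
            throw new IllegalArgumentException("Unknown documentID: " + documentId);
        }
        dynamicIndex.remove(documentId);                        // the "delete"
        dynamicIndex.put(documentId, new HashMap<>(newFields)); // the "reindex"
        dynamicWrites++;
    }

    public String getField(String documentId, String name) {
        Map<String, String> fields = dynamicIndex.get(documentId);
        return fields == null ? null : fields.get(name);
    }

    public String getContent(String documentId) {
        return staticIndex.get(documentId);
    }

    public int getStaticWrites() { return staticWrites; }

    public int getDynamicWrites() { return dynamicWrites; }
}
```

In a real setup each map would be a separate Lucene index, and the two writes in add() would be two separate index additions.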
Questions
1. First of all, is there a better solution to the problem of having to
reindex frequently changing fields?
2. Let's say I go with the two-index approach.
Example query: Content: "Hello world" AND Folder:Folder1 AND LastReadBy: jane.
If we execute this query as-is against either the static or the dynamic index,
it will obviously fail to get hits, since neither index contains all three
fields.
Let's say I have a way of splitting my queries such that all content
queries go to the static (content) index only and queries on other fields go
to the dynamic index. Basically, queries come in such a way that the final
result is always an AND between the dynamic index result set and the static
index result set. So for each result set, I would have to retrieve the
documentID and make sure the same documentID appears in both result sets in
order for it to be a match.
In cases where the result sets from both queries are really huge,
then even just to get the number of hits, I will have to retrieve each and
every document from the results in order to get the documentID for comparison.
Queries can get really slow.
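The documentID matching described in question 2 can be sketched roughly as follows, in plain Java. This assumes the documentIDs have already been collected from the hits of each query; the class name DocIdIntersection and the method intersect are hypothetical and are not part of Lucene. It hashes the smaller ID collection and scans the larger one, so the comparison itself is linear in the total number of hits.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: AND two result sets by their stored documentID. */
public class DocIdIntersection {

    /**
     * Returns the documentIDs present in both collections.
     * Assumes the IDs were already read from the stored documentID field
     * of each hit in the static and dynamic result sets.
     */
    public static Set<String> intersect(Collection<String> staticIds,
                                        Collection<String> dynamicIds) {
        // Hash the smaller side to keep the lookup table small.
        boolean staticIsSmaller = staticIds.size() <= dynamicIds.size();
        Collection<String> smaller = staticIsSmaller ? staticIds : dynamicIds;
        Collection<String> larger = staticIsSmaller ? dynamicIds : staticIds;

        Set<String> lookup = new HashSet<>(smaller);
        Set<String> matches = new HashSet<>();
        for (String id : larger) {
            if (lookup.contains(id)) {
                matches.add(id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> both = intersect(
                Arrays.asList("doc-1", "doc-2", "doc-3"),
                Arrays.asList("doc-2", "doc-3", "doc-4"));
        // The size of the intersection is the combined hit count.
        System.out.println(both.size() + " matching documents");
    }
}
```

Note that this only covers the comparison step. It does not remove the cost I am worried about, which is reading the documentID out of every hit in both result sets before the comparison can start.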
Has anyone faced similar problems? If so, what was your solution?
Any comments/thoughts will be appreciated.
Thank you
JS