A simple variant on Approach 1 would be to open your writer with autoCommit=false.

This way no reader will ever see the changes until you successfully close the writer. If the machine crashes, the index remains in the state it was in when the writer was first opened.

Also, the re-open in Approach 1 should be a bit faster than the wholly new open required in Approach 2 (not a lot faster, though there is work underway to make it a lot faster).

There should not be problems optimizing while searching. Yes, you use more disk space, but no more (in fact, less) than Approach 2 requires.

I think Approach 2 could be better only if the indexing is done on a different computer or I/O system.

Mike

Sridhar Raman wrote:

This is my situation. I have an index, which has a lot of search requests coming into it. I use just a single instance of IndexSearcher to process these requests. At the same time, this index is also getting updated by an IndexWriter. And I want these new changes to be reflected _only_ at certain intervals. I have thought of a few ways of doing this. Each has its share of problems and pluses. I would be glad if someone could help me figure out the right approach, especially from the performance point of view, as the number of documents that will get indexed is pretty large.

Approach 1:
Have just one copy of the index for both Search & Index. At time T, when I need to see the new changes reflected, I close the Searcher and open it again.
- The re-open of the Searcher might be a bit slow (which I could probably solve by using some warm-up threads).
- Update and Search on the index at the same time: will this affect performance?
- If the server crashes before time T, the new Searcher would reflect the changes, which is not acceptable. I want the changes to be reflected only at time T. If the server crashes, the index should be the previous T-1 index.
- Possible problems while optimising the index (as Search is also happening).
+ Just one copy of the index being stored.
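
The close-and-reopen step at time T, together with the warm-up idea, can be sketched roughly as below. This is a minimal sketch, not Lucene API: `SearcherHolder`, `opener`, `warmer` and `refresh` are hypothetical names, and the type parameter `S` stands in for whatever searcher class is in use. The point it illustrates is ordering: open and warm the new searcher first, and only then switch requests over to it.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Holds the "live" searcher and swaps in a fresh, warmed-up one on demand.
 * S stands in for the searcher type; every name here is hypothetical.
 */
final class SearcherHolder<S extends Closeable> {
    private final Supplier<S> opener;   // opens a new searcher on the index
    private final Consumer<S> warmer;   // runs warm-up queries against it
    private final AtomicReference<S> current = new AtomicReference<>();

    SearcherHolder(Supplier<S> opener, Consumer<S> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        this.current.set(opener.get());
    }

    /** Searcher that incoming requests should use right now. */
    S get() {
        return current.get();
    }

    /**
     * Call at time T: open and warm the new searcher, then publish it.
     * Returns the old searcher so the caller can close it once in-flight
     * requests have finished with it.
     */
    S refresh() {
        S fresh = opener.get();
        warmer.accept(fresh);            // old searcher keeps serving meanwhile
        return current.getAndSet(fresh); // atomic switch to the new searcher
    }
}
```

Search threads would call `get()` per request, and a timer would call `refresh()` at each interval. Deciding when it is safe to close the returned old searcher (for example by counting in-flight requests) is left out of this sketch.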

Approach 2:
Keep 2 copies of the index - 1 for Search, 1 for Index. At time T, I just switch the Searcher to a copy of the index that is being updated.
- Before I do the switch to the new index, I need to make a copy of it so that the updates continue to happen on the other index. Is there a convenient way to make this copy? Is it efficient?
- Time taken to create a new Searcher will still be a problem (but this is a problem in the previous approach as well, and we can live with it).
+ Optimise can happen on an index that is not being read, so its resource requirements would be lower, and optimisation would probably be faster as well.
+ Faster search as the index update is happening on a different index.
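
On the copy question in the first point above, one possible way is a plain file-level copy of the index directory. The sketch below uses only `java.nio.file` and makes two assumptions that would need checking against the actual setup: that the index directory is a flat set of files, and that no writer is modifying the source directory while the copy runs. `IndexCopy` and `copyFlatDirectory` are hypothetical names, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class IndexCopy {
    /**
     * Copies every regular file in sourceDir into targetDir (created if needed)
     * and returns the number of files copied.
     * Assumes a flat directory and that nothing writes to sourceDir meanwhile.
     */
    static int copyFlatDirectory(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // Overwrite a same-named file left over from an earlier copy.
                    Files.copy(file, targetDir.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```

This reads and writes every byte of the index, so its cost grows with index size. It also does not delete stale files already present in the target, so copying into a new or empty directory is the safer choice.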

So, these are the 2 approaches I am contemplating. Any pointers on which would be the better approach?

Thanks,
Sridhar

