I don't believe our large users have enough memory for Lucene indexes to fit in RAM (especially given we use quite a bit of RAM for other things). I think we also close readers pretty frequently (whenever any user updates a JIRA issue, which I assume happens nearly constantly when you've got thousands of users). I was trying to mimic our usage as closely as I could to see whether Lucene behaves pathologically poorly given our current architecture. There have been some excellent suggestions about using in-memory indexes for recent updates, but changes of that kind are, unfortunately, currently outside of my purview :-(

Given that our current usage may be suboptimal :-/ does anyone have any ideas about what may be causing the anomalies I identified earlier?


We have exactly the same problem JIRA has, only even bigger, I think. We have large projects with tens of millions of documents and mail items. Our requirement was a 5-second refresh time (that is, an update (add, delete, or modify) can take no longer than 5 seconds before a subsequent search can see it). Worse, we have a large number of fields customers need to sort by, so tearing down a 15 GB index with a dozen sort fields every 5 seconds and rebuilding the FieldSortedHitQueues is clearly not going to work.. :)

We solved this by having a virtual index made up of an 'archive' index and a 'work' index, and running a multi-reader over the two. All updates (adds, updates, deletes) are done as a delete against the archive index and an add/update to the work index. Every week, during a lull, we merge the two into a new archive index directory and 'switch' to it (blocking updates while we optimize and switch). This means the work sub-index can be refreshed every 5 seconds because it is small, and we 'pin' the archive index in memory by doing... well... a fairly egregious hack, to be honest. We do have to apply deletes to the archive, but doing that normally would require a total reader refresh for each delete to become visible. Instead, we let the delete go to disk (via the deleted segment) while also applying it in memory, so it can be seen immediately. This way the most up-to-date data is visible in the work index.
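To make the routing concrete, here is a toy model of that scheme in plain Java. This is my own sketch, not Lucene code: `VirtualIndex`, the maps, and the method names are all stand-ins (a real setup would use two Lucene directories, a MultiReader, and the deleted-docs mechanism), but the update routing and the in-memory delete overlay work the same way as described above.

```java
import java.util.*;

// Toy model of the archive/work split. The archive is large and rarely
// reopened; the work index holds recent updates and is cheap to refresh.
class VirtualIndex {
    private final Map<String, String> archive = new HashMap<>(); // docId -> content
    private final Set<String> archiveDeletes = new HashSet<>();  // in-memory delete overlay
    private final Map<String, String> work = new HashMap<>();    // recent adds/updates

    // Seed the archive directly (stand-in for the initial bulk build).
    void bulkLoad(String id, String content) { archive.put(id, content); }

    // An update is routed as: delete from the archive (overlay), add to work.
    void update(String id, String content) {
        archiveDeletes.add(id);
        work.put(id, content);
    }

    void delete(String id) {
        archiveDeletes.add(id);
        work.remove(id);
    }

    // "Search" both sub-indexes; archive hits are masked by the overlay,
    // so the delete is visible without reopening the archive.
    Optional<String> get(String id) {
        if (work.containsKey(id)) return Optional.of(work.get(id));
        if (archiveDeletes.contains(id)) return Optional.empty();
        return Optional.ofNullable(archive.get(id));
    }

    // Weekly merge: fold the work index into a fresh archive, clear the overlay.
    void merge() {
        for (String id : archiveDeletes) archive.remove(id);
        archive.putAll(work);
        archiveDeletes.clear();
        work.clear();
    }
}
```

The point of the overlay is that an update to a document already in the archive never touches the archive reader itself: the old copy is simply masked, and the new copy is served from the small work index until the next weekly merge.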

This gives the best of both worlds: a really warmed-up large archive index, and a smaller work index (no more than a week's worth of updates) that we can refresh every 5 seconds. The tear-down/warm-up cycle appears to be fine for the work index, and we can satisfy searches very quickly.
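The 5-second refresh can be sketched as a swap through a shared reference: searches always grab the current work reader, and a background task (e.g. a ScheduledExecutorService firing every 5 seconds, not shown) builds a fresh one and swaps it in. `WorkReader` here is a hypothetical stand-in for an IndexReader over the work index, not a Lucene class.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy stand-in for an IndexReader over the small work index.
class WorkReader {
    final long version;
    WorkReader(long version) { this.version = version; }
}

// Searches acquire the current reader; the refresh task swaps in a new
// one and gets the stale reader back so it can be closed once idle.
class WorkReaderHolder {
    private final AtomicReference<WorkReader> current =
        new AtomicReference<>(new WorkReader(0));

    WorkReader acquire() { return current.get(); }

    WorkReader refresh(long newVersion) {
        // Returns the previous (now stale) reader.
        return current.getAndSet(new WorkReader(newVersion));
    }
}
```

Because only the small work reader is ever torn down, the expensive warm-up (sort caches and the like) stays confined to the pinned archive reader.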

It would be really nice if Lucene could allow deletes to be done against a live IndexReader without flushing anything else out.

cheers,

Paul


