I keep considering a full response too this, but I just can't get over the hump and spend the time writing something up. Figured someone else would get to it - perhaps they still will.
I will make a comment here though: >Before Lucene 2.9, I don't think this made any difference, as (I think) the >only advantage to calling reopen vs. just creating another IndexReader was >having reopen figure out whether the index had actually changed. (And whave >a different way to figure that out, so it was a non-issue.) Thats not quite right. Reopen did not just check if the index had changed - it also only reloaded the segments that changed. The big change allowed by per segment searching is that now only the *FieldCache pieces that have changed* are also reloaded, rather than the whole FieldCache. So reopen was nice and advantageous before - but now it is more so if you are using FieldCaches. Nigel wrote: > Anyone have any ideas here? I imagine a lot of other people will have a > similar question when trying to take advantage of the reopen improvements in > 2.9. > > Thanks, > Chris > > On Thu, Oct 1, 2009 at 5:15 PM, Nigel <nigelspl...@gmail.com> wrote: > > >> I have a question about the reopen functionality in Lucene 2.9. As I >> understand it, since FieldCaches are now per-segment, it can avoid reloading >> everything when the index is reopened, and instead just load the new >> segments. >> >> For background, like many people we have a distributed architecture where >> indexes are created on one server and copied to multiple other servers. The >> way that copying works now is something like the following: >> >> 1. Let's say the current index is in /indexes/a and is open >> 2. An empty directory for the updated index is created, let's say >> /indexes/b >> 3. Hard links for the files in /indexes/a are created in /indexes/b >> 4. We rsync the current index on the server with /indexes/b, thus >> copying over new cfs files and deleting hard links to files no longer in >> use >> 5. A new IndexReader is opened for /indexes/b and warmed up >> 6. The application starts using the new reader instead of the old one >> 7. The old IndexReader is closed and /indexes/a is deleted >> >> I'm simplifying a few steps, but I think this is familiar to many people, >> and it's my impression that Solr implements something similar. >> >> The point is, the updated index lives in a new directory in this scheme, >> and so we don't actually reopen the existing IndexReader; we open a new one >> with a different FSDirectory. >> >> Before Lucene 2.9, I don't think this made any difference, as (I think) the >> only advantage to calling reopen vs. just creating another IndexReader was >> having reopen figure out whether the index had actually changed. (And whave >> a different way to figure that out, so it was a non-issue.) >> >> With Lucene 2.9, there's now a big difference, namely the per-segment >> caching mentioned above. So the question is how to make use of reopen with >> our distribution scheme. Is there an informal best practice for handling >> this case? For example, should step #5 above rename /indexes/b to >> /indexes/a so the index can be reopened in the same physical location? Or >> should rsync operate on the existing directory in-place, updating the >> segments* files last and relying on the fact that deleted files will not >> really be deleted (on Linux, at least) as long as the app is still holding >> them open? >> >> I guess the answer may depend on how exactly reopen knows which files are >> the "same" (e.g. does it look at filenames, or file descriptors, etc.). >> >> Thanks, >> Chris >> >> > > -- - Mark http://www.lucidimagination.com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org