On Thu, Jul 06, 2006, Yonik Seeley wrote about "Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)":
>..
> When one interleaves adds and deletes, it isn't the case that
> indexreaders and indexwriters need to be opened and closed each
> interleave.

Actually, you do have to do exactly that, because you can't leave both an indexreader and an indexwriter open and interleave deletes in one with adds in the other: the lock that both the indexreader (when deleting) and the indexwriter acquire will not allow that. Granted, if you buffer either the deletes or the additions in memory and apply them later in batches, you don't need to open an indexreader and an indexwriter for every single document. But this is also not trivial to do (correctly and consistently) without writing a bunch of code.

> I was left wondering if the extensive changes to IndexWriter were
> worth it, or if it was best left to something at a higher level (like
> a better IndexModifier, or something like what Solr does).

My guess, based on my own experience as a Lucene newbie and on the large number of questions I see on the lucene-user list, is that most users don't understand why they need to concern themselves with the separate IndexReader and IndexWriter objects. They'd rather have a single "Index" object on which they can do any operation in any order, efficiently (unlike IndexModifier). Such an object should be part of the Lucene core, and not left for everyone to implement themselves in a different way (as happens now).

If we could create a BetterIndexModifier which does this on top of the lower-level IndexWriter and IndexReader objects, that would be great. But it's not obvious that it can be done efficiently enough, and it's even less clear what kind of guarantees such an implementation could make (e.g., can it guarantee that a parallel IndexReader will not see 0 or 2 versions of the same document?).

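To make the batching idea above concrete, here is a minimal sketch of buffering deletes and additions and applying them in two phases. It is an in-memory stand-in only: BufferedModifier, its methods, and the "session" counter are hypothetical names invented for illustration, not Lucene APIs, and each document is reduced to a unique id string. The one subtle point it tries to show is that a delete has to cancel any earlier buffered add of the same id, which is part of why getting this right takes more than a few lines.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in; none of these names are Lucene APIs.
class BufferedModifier {
    // The "index": one unique id string per document (an assumption for brevity).
    private final List<String> committed = new ArrayList<>();
    private final List<String> pendingAdds = new ArrayList<>();
    private final Set<String> pendingDeletes = new HashSet<>();
    private final int maxBuffered;
    // Counts simulated reader/writer open-close cycles.
    private int sessionsOpened = 0;

    BufferedModifier(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    void add(String id) {
        pendingAdds.add(id);
        flushIfFull();
    }

    void delete(String id) {
        // A delete must also cancel earlier buffered adds of the same id,
        // otherwise the flush below would resurrect the document.
        pendingAdds.removeIf(id::equals);
        pendingDeletes.add(id);
        flushIfFull();
    }

    void flush() {
        if (!pendingDeletes.isEmpty()) {
            sessionsOpened++;                     // stands for one "reader" session
            committed.removeAll(pendingDeletes);  // deletes hit only older documents
            pendingDeletes.clear();
        }
        if (!pendingAdds.isEmpty()) {
            sessionsOpened++;                     // stands for one "writer" session
            committed.addAll(pendingAdds);
            pendingAdds.clear();
        }
    }

    private void flushIfFull() {
        if (pendingAdds.size() + pendingDeletes.size() >= maxBuffered) {
            flush();
        }
    }

    List<String> snapshot() {
        return new ArrayList<>(committed);
    }

    int sessionsOpened() {
        return sessionsOpened;
    }

    public static void main(String[] args) {
        BufferedModifier index = new BufferedModifier(1000);
        // 100 interleaved "update" operations: delete the old copy, add the new one.
        for (int i = 0; i < 100; i++) {
            index.delete("doc" + i);
            index.add("doc" + i);
        }
        index.flush();
        System.out.println(index.snapshot().size() + " documents, "
                + index.sessionsOpened() + " sessions opened");
    }
}
```

In this toy version the 100 interleaved updates cost two simulated sessions rather than one pair per document. It says nothing about what a concurrent reader would see between the delete phase and the add phase, which is exactly the kind of guarantee questioned above.
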
That being said, I'd love to see a patch like Ning's, but one which goes further and combines the capabilities of an IndexReader and an IndexWriter: once an IndexWriter can delete a document based on a term it contains, why stop there - why not give this IndexWriter full reading capabilities, and allow it to run more sophisticated searches to decide what to delete? As I mentioned in a previous post, I needed this capability in an application which indexed emails and attachments: when an email document was deleted, I also had to delete the attached documents (listed in a field of the email) from the index.

--
Nadav Har'El                        | Friday, Jul 7 2006, 11 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |May you live as long as you want - and
http://nadav.harel.org.il           |never want as long as you live.