Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
+1

Though not by docID (since they aren't reliable in the context of IndexWriter)... and it should be undeleteDocuments (with an s), since it could affect more than one doc.

Mike

On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera ser...@gmail.com wrote:
> Hi
>
> I think such methods are useful for a Lucene app which needs to roll back a single document delete. Today, IndexReader offers undeleteAll(), which is a bit extreme. There are two scenarios for this that I know of:
>
> 1) (recently showed up on the user list) I'd like to synchronize documents on disk and in the index. So if I have a document in the index which I want to delete, and also a file on the file system (corresponding to an ID or something), and the file delete fails, I may want to undelete that document. This has alternatives, but an undeleteDocument would still be useful in this case.
> 2) ParallelReader allows one to add a document to two indexes, some fields to one index and others to the second index, and then read those indexes in parallel. Such applications will need to delete documents sometimes, and an undeleteDocument will be useful if a transactional delete is needed: i.e., if the first delete succeeds and the second fails, undo the first delete.
> 3) ParallelReader doesn't support deleteDocument well currently, i.e., if one of the deletes fails, some readers will be left w/ the document and some won't (this is, I think, a bug).
>
> What do you think?
>
> Shai

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
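A minimal sketch of the transactional delete described in scenario 2, assuming a per-document undelete exists. ToyIndex, ParallelDelete and deleteFromBoth are hypothetical names invented for illustration; none of this is Lucene's API, and the failure is simulated with a flag.

```java
import java.util.BitSet;

// Toy stand-in for one index of a parallel pair. Hypothetical; not Lucene's IndexReader.
final class ToyIndex {
    private final BitSet deleted = new BitSet();
    private final boolean failOnDelete; // simulates a delete that fails

    ToyIndex(boolean failOnDelete) {
        this.failOnDelete = failOnDelete;
    }

    void deleteDocument(int docId) {
        if (failOnDelete) {
            throw new IllegalStateException("simulated delete failure");
        }
        deleted.set(docId);
    }

    // The proposed operation: clear the deletion mark of a single document.
    void undeleteDocument(int docId) {
        deleted.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }
}

final class ParallelDelete {
    // Deletes docId from both indexes or from neither. Returns true on success.
    static boolean deleteFromBoth(ToyIndex first, ToyIndex second, int docId) {
        boolean wasDeleted = first.isDeleted(docId);
        first.deleteDocument(docId); // if this throws, nothing has changed yet
        try {
            second.deleteDocument(docId);
            return true;
        } catch (IllegalStateException e) {
            if (!wasDeleted) {
                first.undeleteDocument(docId); // roll back only our own delete
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ToyIndex a = new ToyIndex(false);
        ToyIndex b = new ToyIndex(true); // the second delete will fail
        boolean ok = deleteFromBoth(a, b, 4);
        System.out.println("deleted from both: " + ok);
        System.out.println("still deleted in first: " + a.isDeleted(4));
    }
}
```

The sketch remembers whether the document was already deleted before the first delete, so the rollback never revives a document that some earlier operation had removed.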
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
Yes of course. I meant to create an undeleteDoc variant for every deleteDoc. So if IndexWriter has deleteDocuments(Term), I will add undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add undeleteDocument(int).

It is up to the caller to make sure whatever he undeletes was indeed deleted, i.e., if you call reader.deleteDocument(4) and then reader.undeleteDocument(4), you should make sure that 4 represents the same document.

In fact, I think it might be useful to restrict the undeleteDoc methods to the same reader instance with which the docs were deleted? It's easy to do by checking whether deletedDocs is missing any of the docs passed to the undelete method. The rationale is that I believe the best use case for these undelete methods is a mini undo of the last delete. Using the same reader instance, you're guaranteed that the document is still deleted between delete() and undelete().

Also, since I can only open the index for write once, whether by IndexWriter or IndexReader w/ readOnly=false, we can guarantee that an undelete followed by delete is safe?

Shai

On Wed, Jul 29, 2009 at 7:26 PM, Michael McCandless luc...@mikemccandless.com wrote:
> +1
>
> Though not by docID (since they aren't reliable in context of IndexWriter)... and it should be undeleteDocuments (with an s) since it could affect more than one doc.
>
> Mike
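One way the same-instance restriction could look, as a rough sketch only. ToyReader and the pendingDeletes set are assumptions made for this illustration: the message above only mentions checking deletedDocs, while this version tracks separately which deletes were made through the current instance so that older deletions cannot be undone by accident.

```java
import java.util.BitSet;

// Hypothetical reader model; names follow the thread, not Lucene's real classes.
final class ToyReader {
    private final BitSet deletedDocs = new BitSet();    // every doc currently marked deleted
    private final BitSet pendingDeletes = new BitSet(); // docs deleted via this instance (assumption)

    // previouslyDeleted: deletions already present when this reader was opened.
    ToyReader(BitSet previouslyDeleted) {
        deletedDocs.or(previouslyDeleted);
    }

    void deleteDocument(int docId) {
        if (!deletedDocs.get(docId)) {
            deletedDocs.set(docId);
            pendingDeletes.set(docId);
        }
    }

    // Undo a delete, but only one that was made through this same reader instance.
    void undeleteDocument(int docId) {
        if (!pendingDeletes.get(docId)) {
            throw new IllegalArgumentException("doc " + docId + " was not deleted by this reader");
        }
        pendingDeletes.clear(docId);
        deletedDocs.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    public static void main(String[] args) {
        BitSet old = new BitSet();
        old.set(1); // doc 1 was deleted before this reader was opened
        ToyReader reader = new ToyReader(old);
        reader.deleteDocument(4);
        reader.undeleteDocument(4); // allowed: a mini undo of our own delete
        System.out.println("doc 4 deleted: " + reader.isDeleted(4));
        try {
            reader.undeleteDocument(1); // rejected: not deleted by this instance
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```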
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
On Wed, Jul 29, 2009 at 3:05 PM, Shai Erera ser...@gmail.com wrote:
> Yes of course. I meant to create an undeleteDoc variant for every deleteDoc. So if IndexWriter has deleteDocuments(Term), I will add undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add undeleteDocument(int).

OK.

> It is up to the caller to make sure whatever he undeletes was indeed deleted, i.e., if you reader.deleteDocument(4) and then reader.undeleteDocument(4), you should make sure that 4 represents the same document.

Presumably in IndexReader we can return int count (how many deleted), but in IndexWriter it's void.

> In fact, I think it might be useful to restrict the undeleteDoc methods to the same reader instance with which they were deleted? It's easy to do by checking if deletedDocs does not contain any of the docs passed to the undelete method. The rationale is that I believe the best use case for these undelete methods to be a mini undo of the last delete. Using the same reader instance you're guaranteed that the document is still deleted between delete() and undelete().

That might be too restrictive? I.e., this is the best use case we can picture today, but others could come up with different use cases, and there's no technical reason for such a restriction? undeleteAll doesn't have such a restriction.

> Also, since I can only open the index for write once, whether by IndexWriter or IndexReader w/ readOnly=false, we can guarantee that an undelete followed by delete is safe?

Or the undelete methods in IndexReader could just acquire the write lock?

Mike
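To make the count-returning variant concrete, here is a toy sketch of a term-based undelete on the reader side. ToyTermReader, addPosting and the string terms are hypothetical stand-ins, and the postings map is a deliberately simplified model, not how Lucene stores terms.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a reader with term-based delete/undelete. Hypothetical; not Lucene's API.
final class ToyTermReader {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deletedDocs = new BitSet();

    void addPosting(String term, List<Integer> docIds) {
        postings.put(term, docIds);
    }

    // Marks every live doc containing the term as deleted; returns how many were marked.
    int deleteDocuments(String term) {
        int count = 0;
        for (int doc : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
            if (!deletedDocs.get(doc)) {
                deletedDocs.set(doc);
                count++;
            }
        }
        return count;
    }

    // Restores every deleted doc containing the term; returns how many were restored.
    int undeleteDocuments(String term) {
        int count = 0;
        for (int doc : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
            if (deletedDocs.get(doc)) {
                deletedDocs.clear(doc);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyTermReader reader = new ToyTermReader();
        reader.addPosting("id:7", java.util.Arrays.asList(1, 2, 3));
        System.out.println("deleted: " + reader.deleteDocuments("id:7"));
        System.out.println("undeleted: " + reader.undeleteDocuments("id:7"));
    }
}
```

Returning a count lets the caller notice when an undelete matched fewer documents than the delete it is meant to reverse.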
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
> Or the undelete methods in IndexReader could just acquire the write lock?

I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete a document, no? And then I'll need to acquire the write lock, just like any other write operation done through IndexReader, right? Or do you suggest we allow this for readOnly IndexReaders too?

> That might be too restrictive?

Yes - I pointed that out just as a safety measure. However, sometimes (especially following the 'agile' guidelines) it's better to develop something for a problem we know exists, rather than trying to over-engineer for something we 'think might exist'. If a good use case is presented in the future which requires the undelete to work also in readers that did not do the delete themselves, we can change that behavior then, no?

Maybe I'll start to work on it and we can decide that as we go? There's no point making decisions now, when we don't know if it is a major thing to support or not. Maybe it can be supported 'for free', and then it won't be a question at all.

Shai

On Wed, Jul 29, 2009 at 10:58 PM, Michael McCandless luc...@mikemccandless.com wrote:
> undeleteAll doesn't have such a restriction.
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
On Wed, Jul 29, 2009 at 4:06 PM, Shai Erera ser...@gmail.com wrote:
>> Or the undelete methods in IndexReader could just acquire the write lock?
>
> I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete a document, no? And then I'll need to acquire the write lock, just like any other write operation done through IndexReader, right? Or do you suggest we allow this for readOnly IndexReaders too?

Right, you'll definitely need to acquire the write lock for undeleteDoc.

>> That might be too restrictive?
>
> Yes - I pointed that out just as a safety measure. However, sometimes (especially following the 'agile' guidelines) it's better to develop something for a problem we know exists, rather than trying to over-engineer for something we 'think might exist'. If a good use case is presented in the future which requires the undelete to work also in readers that did not do the delete themselves, we can change that behavior then, no?
>
> Maybe I'll start to work on it and we can decide that as we go? There's no point making decisions now, when we don't know if it is a major thing to support or not. Maybe it can be supported 'for free', and then it won't be a question at all.

I agree! There's no need to decide now. So let's defer.

Mike
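The locking behaviour agreed on above could be modelled roughly as follows. ToyDirectory and LockingReader are hypothetical names, the lock is an in-memory flag rather than Lucene's lock file, and the exception types are placeholders chosen for the sketch.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy directory holding a single write-lock flag. Hypothetical; not Lucene's Directory.
final class ToyDirectory {
    final AtomicBoolean writeLock = new AtomicBoolean(false);
}

// Toy reader that acquires the write lock lazily on its first write operation.
final class LockingReader {
    private final ToyDirectory dir;
    private final boolean readOnly;
    private final BitSet deletedDocs = new BitSet();
    private boolean holdsLock;

    LockingReader(ToyDirectory dir, boolean readOnly) {
        this.dir = dir;
        this.readOnly = readOnly;
    }

    private void acquireWriteLock() {
        if (readOnly) {
            throw new UnsupportedOperationException("reader was opened read-only");
        }
        if (holdsLock) {
            return; // already the single writer
        }
        if (!dir.writeLock.compareAndSet(false, true)) {
            throw new IllegalStateException("write lock is held by another writer");
        }
        holdsLock = true;
    }

    void deleteDocument(int docId) {
        acquireWriteLock();
        deletedDocs.set(docId);
    }

    // Undelete is a write operation too, so it takes the same lock as delete.
    void undeleteDocument(int docId) {
        acquireWriteLock();
        deletedDocs.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    void close() {
        if (holdsLock) {
            dir.writeLock.set(false);
            holdsLock = false;
        }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        LockingReader writer = new LockingReader(dir, false);
        writer.deleteDocument(4);   // acquires the lock
        writer.undeleteDocument(4); // reuses the lock already held
        System.out.println("doc 4 deleted: " + writer.isDeleted(4));
        writer.close();
    }
}
```

Because only one instance can hold the flag at a time, a delete followed by an undelete through the same writable reader cannot interleave with writes from another instance.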