Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
+1

Though not by docID (since they aren't reliable in context of
IndexWriter)... and it should be undeleteDocuments (with an s) since
it could affect more than one doc.

Mike

On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera <ser...@gmail.com> wrote:
 Hi

 I think such methods are useful for a Lucene app that needs to roll back a
 single document delete. Today, IndexReader offers undeleteAll(), which is a
 bit extreme. I know of two scenarios for this, plus a related issue:
 1) (recently showed up on the user list) I'd like to synchronize documents
 on disk and in the index. So if I have a document in the index which I want
 to delete, and also a file on the file system (corresponds to an ID or
 something), and the file delete fails, I may want to undelete that document.
 This has alternatives, but an undeleteDocument would still be useful in this
 case.

 2) ParallelReader allows one to add a document to two indexes, some fields
 to one index and the rest to the second index, and then read those indexes in
 parallel. Such applications will need to delete documents sometimes, and an
 undeleteDocument will be useful if a transactional delete is needed: i.e.,
 if the first delete succeeds, and the second fails, undo the first delete.

 3) ParallelReader doesn't support deleteDocument well currently - i.e., if
 one of the deletes fails, some readers will be left w/ the document and some
 won't (I think this is a bug).
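
The transactional delete from scenario 2 can be sketched with a toy model. This is not Lucene code: ToyIndex, its undeleteDocument method, and ParallelDelete.deleteFromBoth are hypothetical names standing in for the API proposed here, and a delete is modeled as a bit in a java.util.BitSet.

```java
import java.util.BitSet;

/** Toy stand-in for one index; NOT Lucene code. A delete is just a marked bit. */
class ToyIndex {
    private final BitSet deletedDocs = new BitSet();
    private final boolean failOnDelete; // simulates an index whose delete fails

    ToyIndex(boolean failOnDelete) {
        this.failOnDelete = failOnDelete;
    }

    void deleteDocument(int docId) {
        if (failOnDelete) {
            throw new IllegalStateException("simulated delete failure");
        }
        deletedDocs.set(docId);
    }

    /** Hypothetical counterpart of deleteDocument, as proposed in this thread. */
    void undeleteDocument(int docId) {
        deletedDocs.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }
}

class ParallelDelete {
    /** Deletes docId from both indexes or from neither; returns true on success. */
    static boolean deleteFromBoth(ToyIndex first, ToyIndex second, int docId) {
        first.deleteDocument(docId);
        try {
            second.deleteDocument(docId);
            return true;
        } catch (RuntimeException e) {
            first.undeleteDocument(docId); // roll back the first delete
            return false;
        }
    }
}
```

The rollback assumes docId was live in the first index before the call; if it had already been deleted there, undoing the delete would wrongly resurrect it.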

 What do you think?

 Shai


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera
Yes of course. I meant to create an undeleteDoc variant for every deleteDoc.
So if IndexWriter has deleteDocuments(Term), I will add
undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add
undeleteDocument(int).

It is up to the caller to make sure whatever he undeletes was indeed
deleted, i.e., if you call reader.deleteDocument(4) and then
reader.undeleteDocument(4), you should make sure that 4 still represents the
same document.

In fact, I think it might be useful to restrict the undeleteDoc methods to
the same reader instance with which the docs were deleted? It's easy to do by
checking whether deletedDocs is missing any of the docs passed to the
undelete method. The rationale is that I believe the best use case for these
undelete methods is a mini undo of the last delete. Using the same
reader instance, you're guaranteed that the document is still deleted
between delete() and undelete().
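
A minimal sketch of that restriction, using a toy class rather than Lucene itself. SameInstanceReader and its methods are hypothetical, and the sketch assumes the reader tracks only the deletes made through this instance, which may differ from how deletedDocs is actually populated.

```java
import java.util.BitSet;

/** Toy reader; NOT Lucene code. Tracks only deletes made through this instance. */
class SameInstanceReader {
    private final BitSet deletedByThisReader = new BitSet();

    void deleteDocument(int docId) {
        deletedByThisReader.set(docId);
    }

    /** Hypothetical undelete that rejects docs this instance did not delete. */
    void undeleteDocument(int docId) {
        if (!deletedByThisReader.get(docId)) {
            throw new IllegalArgumentException(
                "doc " + docId + " was not deleted by this reader");
        }
        deletedByThisReader.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedByThisReader.get(docId);
    }
}
```

With this check, an undelete works as a "mini undo" on the instance that did the delete, and fails fast on any other instance.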

Also, since I can only open the index for write once, whether by IndexWriter
or IndexReader w/ readOnly=false, we can guarantee that an undelete followed
by delete is safe?

Shai

On Wed, Jul 29, 2009 at 7:26 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 +1

 Though not by docID (since they aren't reliable in context of
 IndexWriter)... and it should be undeleteDocuments (with an s) since
 it could affect more than one doc.

 Mike





Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 3:05 PM, Shai Erera <ser...@gmail.com> wrote:
 Yes of course. I meant to create an undeleteDoc variant for every deleteDoc.
 So if IndexWriter has deleteDocuments(Term), I will add
 undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add
 undeleteDocument(int).

OK.

 It is up to the caller to make sure whatever he undeletes was indeed
 deleted, i.e., if you call reader.deleteDocument(4) and then
 reader.undeleteDocument(4), you should make sure that 4 still represents the
 same document.

Presumably in IndexReader we can return int count (how many deleted),
but in IndexWriter it's void.
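
A toy illustration of a count-returning undelete on the reader side. This is not Lucene code: CountingReader, addPosting, and the String-keyed postings map are hypothetical stand-ins for a reader that resolves a Term to doc IDs immediately and can therefore report how many docs it changed.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy reader; NOT Lucene code. Maps a term to the doc IDs that contain it. */
class CountingReader {
    private final Map<String, int[]> postings = new HashMap<>();
    private final BitSet deletedDocs = new BitSet();

    void addPosting(String term, int... docIds) {
        postings.put(term, docIds);
    }

    /** Marks every live doc matching term as deleted; returns how many changed. */
    int deleteDocuments(String term) {
        int count = 0;
        for (int docId : postings.getOrDefault(term, new int[0])) {
            if (!deletedDocs.get(docId)) {
                deletedDocs.set(docId);
                count++;
            }
        }
        return count;
    }

    /** Hypothetical inverse: clears the deleted mark; returns how many changed. */
    int undeleteDocuments(String term) {
        int count = 0;
        for (int docId : postings.getOrDefault(term, new int[0])) {
            if (deletedDocs.get(docId)) {
                deletedDocs.clear(docId);
                count++;
            }
        }
        return count;
    }
}
```

A writer that only buffers the term for later resolution has no count to report at call time, which would fit a void return there.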

 In fact, I think it might be useful to restrict the undeleteDoc methods to
 the same reader instance with which the docs were deleted? It's easy to do by
 checking whether deletedDocs is missing any of the docs passed to the
 undelete method. The rationale is that I believe the best use case for these
 undelete methods is a mini undo of the last delete. Using the same
 reader instance, you're guaranteed that the document is still deleted
 between delete() and undelete().

That might be too restrictive?  Ie, this is the best use case we can
picture today, but others could come up with different use cases, and
there's no technical reason for such a restriction?

undeleteAll doesn't have such a restriction.

 Also, since I can only open the index for write once, whether by IndexWriter
 or IndexReader w/ readOnly=false, we can guarantee that an undelete followed
 by delete is safe?

Or the undelete methods in IndexReader could just acquire the write lock?

Mike




Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera

 Or the undelete methods in IndexReader could just acquire the write lock?


I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete
a document, no? And then I'll need to acquire the write lock, just like any
other write operation done through IndexReader, right?

Or do you suggest we allow this for readOnly IndexReaders too?

 That might be too restrictive?


Yes - I pointed that out just as a safety measure. However, sometimes
(especially following the 'agile' guidelines) it's better to develop
something for a problem we know exists, rather than trying to over-engineer
for something we 'think might exist'. If a good use case is presented
in the future that requires the undelete to work also in readers that did
not do the delete themselves, we can change that behavior then, no?

Maybe I'll start to work on it and we can decide that as we go? There's no
point making decisions now, when we don't know if it is a major thing to
support or not. Maybe it can be supported 'for free', and then it won't be a
question at all.

Shai

On Wed, Jul 29, 2009 at 10:58 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 undeleteAll doesn't have such a restriction.



Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 4:06 PM, Shai Erera <ser...@gmail.com> wrote:
 Or the undelete methods in IndexReader could just acquire the write lock?

 I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete
 a document, no? And then I'll need to acquire the write lock, just like any
 other write operation done through IndexReader, right?

 Or do you suggest we allow this for readOnly IndexReaders too?

Right, you'll definitely need to acquire the write lock for undeleteDoc.
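
A toy sketch of what "acquire the write lock for undeleteDoc" could look like. This is not Lucene code: ToyDirectory, LockingReader, and the AtomicBoolean lock are hypothetical, and the sketch assumes the lock is taken lazily on the first modifying call and held until close().

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy directory holding a single write lock; NOT Lucene code. */
class ToyDirectory {
    final AtomicBoolean writeLock = new AtomicBoolean(false);
}

/** Toy reader that takes the directory's write lock before any change. */
class LockingReader {
    private final ToyDirectory dir;
    private final BitSet deletedDocs = new BitSet();
    private boolean holdsLock = false;

    LockingReader(ToyDirectory dir) {
        this.dir = dir;
    }

    private void acquireWriteLock() {
        if (holdsLock) {
            return; // already the writer
        }
        if (!dir.writeLock.compareAndSet(false, true)) {
            throw new IllegalStateException("write lock held by another writer");
        }
        holdsLock = true;
    }

    void deleteDocument(int docId) {
        acquireWriteLock();
        deletedDocs.set(docId);
    }

    /** Hypothetical undelete; a write operation like delete, so it locks too. */
    void undeleteDocument(int docId) {
        acquireWriteLock();
        deletedDocs.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    void close() {
        if (holdsLock) {
            dir.writeLock.set(false);
            holdsLock = false;
        }
    }
}
```

Because delete and undelete go through the same lock, only one reader at a time can change deletions in a given directory.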

 That might be too restrictive?

 Yes - I pointed that out just as a safety measure. However, sometimes
 (especially following the 'agile' guidelines) it's better to develop
 something for a problem we know exists, rather than trying to over-engineer
 for something we 'think might exist'. If a good use case is presented
 in the future that requires the undelete to work also in readers that did
 not do the delete themselves, we can change that behavior then, no?

 Maybe I'll start to work on it and we can decide that as we go? There's no
 point making decisions now, when we don't know if it is a major thing to
 support or not. Maybe it can be supported 'for free', and then it won't be a
 question at all.

I agree!  There's no need to decide now.  So let's defer.

Mike
