Delete-by-query is solved by recording the primary key / UID of the deleted document(s). It's only expensive if the transaction log implementation isn't designed properly. :)
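Here is a minimal sketch of that idea. It assumes a toy in-memory index keyed by UID; the class `LoggedIndex` and its methods are hypothetical and are not a Lucene or Solr API. The delete-by-query is resolved against the current state, and the log stores the UIDs it removed rather than the query. Replay therefore never re-evaluates the query against a different set of documents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

Doc = Dict[str, str]
Entry = Tuple[str, Union[Tuple[str, Doc], List[str]]]


@dataclass
class LoggedIndex:
    """Hypothetical toy index (UID -> document) with an in-memory
    transaction log; not a Lucene or Solr API."""
    docs: Dict[str, Doc] = field(default_factory=dict)
    log: List[Entry] = field(default_factory=list)

    def add(self, uid: str, doc: Doc) -> None:
        self.log.append(("add", (uid, dict(doc))))  # log before applying
        self.docs[uid] = dict(doc)

    def delete_by_query(self, matches: Callable[[Doc], bool]) -> List[str]:
        # Resolve the query to concrete UIDs against the current state and
        # log those UIDs instead of the query itself.
        uids = sorted(uid for uid, doc in self.docs.items() if matches(doc))
        self.log.append(("delete", uids))
        for uid in uids:
            del self.docs[uid]
        return uids

    @staticmethod
    def replay(log: List[Entry]) -> Dict[str, Doc]:
        # Replaying recorded UIDs is deterministic: no query is re-evaluated.
        docs: Dict[str, Doc] = {}
        for op, payload in log:
            if op == "add":
                uid, doc = payload
                docs[uid] = dict(doc)
            else:
                for uid in payload:
                    docs.pop(uid, None)
        return docs


idx = LoggedIndex()
idx.add("1", {"color": "red"})
idx.add("2", {"color": "blue"})
idx.add("3", {"color": "red"})
deleted = idx.delete_by_query(lambda d: d.get("color") == "red")
idx.add("4", {"color": "red"})  # matches the query, but arrives after the delete
```

The log entry names exactly which documents were removed. As a result, doc 4 survives replay even though it matches the same query.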
On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
> hey folks,
>
> we already have transaction logging on the Solr side, so I should have
> started this discussion earlier. However, I want to bring this up on
> the list since I think this is a very valuable feature for plain
> Lucene users too, and eventually it should be available to them as
> well. I don't think this needs to be a core feature at all, but I think
> we need to provide the necessary hooks in Lucene core to make it
> reliable and consistent. I have a couple of concerns: with the
> extension mechanism we currently provide on the IndexWriter side, this
> feature can only be implemented in a sub-optimal way in Solr (or, more
> generally, on top of Lucene). Let me elaborate a little.
>
> IndexWriter doesn't provide any transactional guarantees, nor does it
> give any guarantees about ordering. So if you index two versions of a
> document with the same delete key, you can't tell which one wins unless
> you prevent IW from seeing those two documents at the same time, i.e.
> by locking before you hit IW. This is basically what other
> implementations such as ElasticSearch do: they use locks held in an
> array of buckets, selected by the delete term's hash. However, this
> gets more complex once you get to delete queries, where you can't tell
> which documents are affected, so they might be misplaced in the
> transaction log if the log's order doesn't match the order IW sees.
> Under the hood, IW does maintain such an order inside the
> DocumentsWriterDeleteQueue, which could be used to provide a total
> ordering that IMO should be reflected in the transaction log.
>
> Before I propose ways this could be implemented, I want to check
> whether others think we should provide more reliable ways for users
> who need durability and consistent recovery.
>
> simon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
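The ordering problem in the quoted message can be modeled in a few lines. This is a sketch under stated assumptions, not Lucene code. `SequencedWriter` is a hypothetical writer that returns a sequence number for each operation, which is the kind of core hook the message asks for and is loosely analogous to the order IW keeps in DocumentsWriterDeleteQueue. `StripedLocks` mimics the bucketed per-key locking described for ElasticSearch. Threads may append to the log out of order, so replay sorts entries by sequence number to reproduce the order in which the writer applied them.

```python
import threading
from typing import Dict, List, Optional, Tuple

Op = Tuple[int, str, Optional[str]]  # (sequence number, key, value or None for delete)


class SequencedWriter:
    """Hypothetical writer: applies each op under one lock and returns its
    sequence number, so the number reflects the order ops were applied."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seq = 0
        self.docs: Dict[str, str] = {}

    def update(self, key: str, value: Optional[str]) -> int:
        with self._lock:
            self._seq += 1
            if value is None:
                self.docs.pop(key, None)  # delete by key
            else:
                self.docs[key] = value  # add or replace
            return self._seq


class StripedLocks:
    """Fixed array of locks; the lock for a key is picked by hashing its
    delete term, so two versions of one document never reach the writer
    at the same time."""

    def __init__(self, stripes: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]

    def for_key(self, key: str) -> threading.Lock:
        return self._locks[hash(key) % len(self._locks)]


class TransactionLog:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.entries: List[Op] = []

    def append(self, seq: int, key: str, value: Optional[str]) -> None:
        with self._lock:
            self.entries.append((seq, key, value))

    def replay(self) -> Dict[str, str]:
        # Sort by the writer-assigned sequence number, not by append order.
        docs: Dict[str, str] = {}
        for _, key, value in sorted(self.entries, key=lambda e: e[0]):
            if value is None:
                docs.pop(key, None)
            else:
                docs[key] = value
        return docs


def index(writer: SequencedWriter, locks: StripedLocks, tlog: TransactionLog,
          key: str, value: Optional[str]) -> None:
    with locks.for_key(key):  # serialize versions of the same document
        seq = writer.update(key, value)
    tlog.append(seq, key, value)  # may land in the log out of seq order


writer, locks, tlog = SequencedWriter(), StripedLocks(), TransactionLog()
threads = [
    threading.Thread(target=index,
                     args=(writer, locks, tlog, f"doc{i % 5}",
                           None if i % 7 == 0 else f"v{i}"))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this model a delete-by-query would draw its sequence number from the same counter. Its place in the log would then no longer depend on which thread happened to append first.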