This isn't a new problem. Databases have been around for what, 30+ years?

On Thu, Sep 8, 2011 at 11:01 AM, Simon Willnauer
<simon.willna...@googlemail.com> wrote:
> On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>> The delete by query is solved by recording the primary / UID of the
>> document(s) deleted. It's only expensive if the transaction log
>> implementation is not designed properly. :)
>
> phew I don't think this is realistic. I mean this could be a lot of
> documents and looking up a lot of primary keys, plus you need to know
> what the primary key is and you somehow need to do this async. I don't
> consider this as an option.
>
> simon
>>
>> On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer
>> <simon.willna...@googlemail.com> wrote:
>>> hey folks,
>>>
>>> we already have transaction logging on the Solr side so I should have
>>> started this discussion earlier. However, I want to bring this up to
>>> the list since I think this is a very valuable feature also for plain
>>> Lucene users and eventually this should also be available to them. I
>>> don't think this needs to be a core feature at all but I think we need
>>> to provide the necessary hooks in Lucene core to make this reliable
>>> and consistent. I have a couple of concerns that with the current
>>> extension mechanism we provide on the IndexWriter side this feature
>>> can only be implemented in a sub-optimal way on the Solr side (or
>>> basically on top of Lucene) but lemme elaborate on this a little.
>>>
>>> IndexWriter doesn't provide any transaction guarantees, nor does it
>>> give any guarantees on the order. So if you index two versions of a
>>> document with the same delete key you can't tell which one wins unless
>>> you prevent IW from seeing those two documents at the same time, i.e.
>>> locking before you hit IW. This is basically what other implementations
>>> like ElasticSearch do, which uses locks assigned to buckets in an array
>>> selected based on the delete term's hash. However this gets a little
>>> more complex once you get to DeleteQueries where you can't tell which
>>> documents are affected so they might be misplaced in the transaction
>>> log if the order doesn't match the order the IW sees. Under the hood IW
>>> does maintain such an order inside the DocumentsWriterDeleteQueue
>>> which could be utilized to provide a total ordering that IMO should be
>>> reflected in the transaction log.
>>>
>>> Before I propose ways this could be implemented I want to check if
>>> others think we should provide more reliable ways for users with the
>>> need for durability and consistent recovery.
>>>
>>> simon
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
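
For reference, the bucket-lock scheme mentioned in the quoted message
(locks kept in an array, selected by the hash of the delete term) could
be sketched roughly as below. This is only a minimal illustration under
my own assumptions, not ElasticSearch's or Solr's actual code: the class
name StripedTermLocks, its methods, and the choice of ReentrantLock are
all hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one lock per bucket, bucket chosen by the hash of
// the delete term, so two updates with the same delete term never reach
// the writer at the same time.
final class StripedTermLocks {
    private final ReentrantLock[] locks;

    StripedTermLocks(int stripes) {
        if (stripes <= 0) {
            throw new IllegalArgumentException("stripes must be positive");
        }
        locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int stripeFor(String deleteTerm) {
        return Math.floorMod(deleteTerm.hashCode(), locks.length);
    }

    // Runs the update while holding the lock for the term's bucket.
    void withLock(String deleteTerm, Runnable update) {
        ReentrantLock lock = locks[stripeFor(deleteTerm)];
        lock.lock();
        try {
            update.run();
        } finally {
            lock.unlock();
        }
    }
}
```

A caller would presumably append the update to its transaction log and
hand the document to IndexWriter inside the Runnable passed to withLock,
so that both happen in the same order for a given delete term. Note that
this does nothing for delete-by-query, where the affected documents are
not known up front, which is the case the thread is arguing about.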