Maybe we should close this issue with a won't-fix and start a new one for
filtered deletions?

A few thoughts, without looking at the code, just thinking aloud  :)

What we are talking about here is an inverted filter: Lucene uses
Filter as a pass filter (a set bit marks a document that should pass),
which yields a very high-density BitVector/iterator... for just a few
deletions. IMO, the current filter implementation would not bring a
performance benefit for the simple cases, as you would still have to
check every document that passes the query (the few-deletions case).
That is the same as today, just recycling the concept of filters for
deletions (+1 for the simplification).
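
To make the density point concrete (a trivial sketch; deletedDocs is a
hypothetical java.util.BitSet holding only the deleted docIds):

    // Inverted check: consult a sparse "deleted" set instead of a
    // dense pass filter that has a bit set for nearly every document.
    boolean passes(java.util.BitSet deletedDocs, int docId) {
        return !deletedDocs.get(docId);
    }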


This is conceptually almost equal (fully equal once Paul gets "Filters
as boolean clauses" done) to indexing a separate, single-valued field

isDeleted {true, false}

where each Query gets implicitly transformed to "OriginalQuery AND
isDeleted:false" without scoring on the second clause.

skipTo() performance is obviously relevant here.

Clearly, it is better to apply this condition higher up in query
evaluation than down at the per-term level, especially for worst-case
queries with high-density terms where we have a lot of overlap between
terms (exactly the case we hate).
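
For illustration, a rough sketch of that top-level conjunction,
assuming the DocIdSetIterator-style contract (nextDoc()/advance(),
formerly next()/skipTo()); query, liveDocs and the collector are
stand-ins here, not an existing deletions API:

    static void collectLive(DocIdSetIterator query,
                            DocIdSetIterator liveDocs,
                            Collector collector) throws IOException {
        int doc = query.nextDoc();
        while (doc != DocIdSetIterator.NO_MORE_DOCS) {
            int live = liveDocs.advance(doc);
            if (live == doc) {
                collector.collect(doc);    // passes query and is live
                doc = query.nextDoc();
            } else {
                doc = query.advance(live); // leapfrog past deletions
            }
        }
    }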

Could it work with column-stride fields? Lucene could simply use them
internally for implementing deletions (fast updates...) and expose
them as filters?
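
A rough sketch of that idea, with an invented ByteColumn interface
standing in for column-stride fields (not an existing Lucene API):

    interface ByteColumn {
        byte get(int docId);   // O(1) lookup, updatable in place
    }

    // Deletions as a per-document byte column: 0 = live, 1 = deleted.
    // The query evaluator would consult this once per candidate doc,
    // and updating a value is cheap compared to rewriting a segment.
    static boolean isLive(ByteColumn deleted, int docId) {
        return deleted.get(docId) == 0;
    }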


   
