Hi,

http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/FieldCacheDocIdSet.html

You just have to implement the "protected boolean matchDoc(int docId)" method. You should return this DocIdSet from your filter instead of the manual code you created.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: chrisbamford [mailto:chrisbamf...@chrisbamford.plus.com]
> Sent: Monday, March 16, 2015 1:07 PM
> To: Uwe Schindler
> Cc: java-user@lucene.apache.org; ch...@bammers.net
> Subject: RE: Filtering question
>
> Hi Uwe
>
> I have downloaded Lucene 5.0.0 source to look at the Filters you mention.
> DocValuesTermsFilter looks promising, however I cannot find
> FieldCacheDocIdSet anywhere in Lucene 4.10.2 or in 5.0.0. Where should I be
> looking?
>
> I take your point about brute-forcing the DocValues search and all I can do is
> implement / test and decide if it is acceptable. This is the main driver behind
> getting filtering working correctly!
>
> Thanks for your continued help.
>
> - Chris
>
> On 12.03.2015 18:45, Uwe Schindler wrote:
> > Hi Chris,
> >
> >> Hi Uwe, thanks for your suggestions. I have tried a couple of things
> >> with no luck yet:
> >>
> >> > Sorry,
> >> > I just noticed, you are using TermFilter not TermsFilter: This one
> >> > does not support random access (using bits()). Because of this the
> >> > filtered docs cannot be passed down using acceptDocs.
> >> >
> >> TermsFilter made no difference, still no acceptDocs passed to the
> >> filter.
> >
> > I know, the problem is that a TermsFilter with one term behaves like a
> > TermFilter :-) In any case you could use CachingWrapperFilter to
> > forcefully create a bitset, but I don't think it's worth the hassle.
> > It is still strange, I did not dig into it.
> >
> >> > In addition, the should clause forces the ConstantScoreQuery to
> >> > try all documents, because there is nothing else that could drive
> >> > the query.
> >> >
> >> As an experiment I tried MUST, this didn't help either.
> >
> > I checked your impl: You just create a Bitset, so it won't help.
> > Please look at other DocValues filters like DocValuesRangeFilter to
> > see how they implement the iterator. Creating a BitSet is just
> > overhead and while doing so, you have no chance to take other query
> > constraints into account (because the bitset is built *before* the
> > query is executed).
> >
> > Instead you should implement a custom DocIdSet (Lucene 4.10 offers
> > FieldCacheDocIdSet as base class; in 5.0 it was renamed to
> > DocValuesDocIdSet, you can implement the abstract matchDoc() method
> > there). This one automatically handles everything correctly, such as
> > acceptDocs and advance(). It does not build a bitset, it does
> > everything by calling the abstract matchDoc() method on the fly. You
> > just have to put the matching logic into matchDoc(int docId).
> >
> >> > An alternative approach would be (in Lucene 4.10 or 5.0) to add the
> >> > TermFilter as ConstantScoreFilter(TermQuery) with boost=0 to the
> >> > BooleanQuery. In that case it can drive the query and does not affect
> >> > scoring. In later Lucene versions you may use the new
> >> > BooleanQuery.Occur type "FILTER" which can add any query as filter.
> >> > Filters will be deprecated once this is ready.
> >> >
> >> This is interesting and I will try it when I get a chance.
> >
> > I mean ConstantScoreQuery, not ConstantScoreFilter. But you need to
> > implement your own DocIdSetIterator with DocIdSetIterator.advance(),
> > otherwise it won't help (see above).
> >
> >> >> My goal is to slowly transform a particular field from StringField to
> >> >> BinaryDocValues so that during the transition a doc may hold the
> >> >> value either in the old location or the new.
> >> >> Therefore a query must be able to say
> >> >> oldField:"foo" OR newField:"foo"
> >> >> Where oldField is a StringField and newField is a BinaryDocValues.
> >> >
> >> > Why do you want to do this?
> >> >
> >> Good question! In our architecture we build indexes by pulling data
> >> from several sources and it is _expensive_. Increasingly we are
> >> requested to change one or two fields which currently requires a full
> >> re-index of the doc.
> >> When I attended the Dublin Lucene conference I spoke to Shai Erera
> >> about this problem and he pointed me at DocValues which allow you to
> >> update fields without incurring the full doc reindex cost. That is
> >> the appeal for us.
> >> As I said before, we want to transform docs only as they are updated,
> >> where transformation involves dropping the old TextField and creating
> >> a new BinaryDocValuesField containing the same value. Hence the need
> >> for the query to be able to search 'old OR new'.
> >>
> >> > If you want to query like this on the field, it is a bad idea to use
> >> > DocValues.
> >> >
> >> Why is it a bad idea?
> >
> > Indeed, DocValues are update-able. But they have the downside that
> > they don't provide a way to query the index for a term and have it
> > tell you which documents have the term (our inverted index - the
> > reason why we use Lucene!). DocValues are just a large array with
> > random access. If you want to query on it, you have to brute force,
> > unless there is something else in the query structure that can
> > "drive" your query (advance() on the filter's iterator). On a
> > BooleanQuery consisting of 2 should clauses, nothing can drive the
> > query, so there is only the possibility to do a full scan of the
> > docvalues doc-by-doc.
> >
> > Uwe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
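
For readers of the archive, the difference between a matchDoc()-style set and a pre-built bitset can be illustrated with a small standalone sketch. This is a toy model under stated assumptions, not Lucene code: LazyDocIdSet, fullScan, drivenScan and the newField array are hypothetical names invented here, and the real base classes (FieldCacheDocIdSet in 4.10, DocValuesDocIdSet in 5.0) have their own signatures. It only shows the idea that the match is evaluated on the fly, so a clause that can drive the query limits how many documents are tested, while a set that runs alone has to test every document.

```java
import java.util.function.IntPredicate;

/** Toy model only: none of these types are Lucene classes. */
final class MatchDocSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Doc-id set that never builds a bitset; it asks matchDoc on demand. */
    static final class LazyDocIdSet {
        private final int maxDoc;
        private final IntPredicate matchDoc;
        private int doc = -1;
        int matchDocCalls = 0; // instrumentation for the demo only

        LazyDocIdSet(int maxDoc, IntPredicate matchDoc) {
            this.maxDoc = maxDoc;
            this.matchDoc = matchDoc;
        }

        /** Random access: test one candidate doc chosen by some other clause. */
        boolean get(int docId) {
            matchDocCalls++;
            return matchDoc.test(docId);
        }

        /** Sequential access: scan forward to the next matching doc. */
        int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // already exhausted
            }
            for (int d = doc + 1; d < maxDoc; d++) {
                if (get(d)) {
                    return doc = d;
                }
            }
            return doc = NO_MORE_DOCS;
        }
    }

    /** Nothing drives the set, so every doc gets tested. Returns the hit count. */
    static int fullScan(LazyDocIdSet set) {
        int hits = 0;
        while (set.nextDoc() != NO_MORE_DOCS) {
            hits++;
        }
        return hits;
    }

    /** Another clause supplies the candidate docs; only those get tested. */
    static int drivenScan(int[] candidates, LazyDocIdSet set) {
        int hits = 0;
        for (int candidate : candidates) {
            if (set.get(candidate)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical per-document column standing in for newField's doc values.
        String[] newField = {"foo", "bar", null, "foo", "baz", "foo", null, "bar"};
        IntPredicate matchDoc = d -> "foo".equals(newField[d]);

        LazyDocIdSet alone = new LazyDocIdSet(newField.length, matchDoc);
        System.out.println(fullScan(alone) + " hits, " + alone.matchDocCalls + " matchDoc calls");

        LazyDocIdSet driven = new LazyDocIdSet(newField.length, matchDoc);
        System.out.println(drivenScan(new int[] {3, 4}, driven) + " hits, "
                + driven.matchDocCalls + " matchDoc calls");
    }
}
```

With the eight toy documents, the undriven scan calls matchDoc 8 times to find 3 hits, while the driven scan with two candidates calls it only twice. That is the cost difference discussed in the thread: with two should clauses nothing narrows the candidates, so the doc values side falls back to the doc-by-doc scan.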