Hi Carsten,

Why not use your idea of the BooleanQuery but wrap it in a Filter instead? Since you are not doing any scoring (only filtering), the max boolean clauses limit should not apply to a filter.
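
To illustrate the idea of filtering by a stored ID set, here is a minimal sketch in plain Java. It is a toy model and not Lucene's API: the "index" is just a list whose position stands in for the internal document number, and the names IdSetFilterSketch, bitsForIds and applyFilter are made up for this example. The bit set is rebuilt from the external IDs each time, so no ephemeral document numbers need to be stored, and since no clause list is built the size of the ID set is not capped.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdSetFilterSketch {

    // Toy stand-in for an index (assumption, not Lucene API):
    // list position = internal doc number, list value = external 'id' field.
    static BitSet bitsForIds(List<String> externalIdByDoc, Set<String> wantedIds) {
        BitSet bits = new BitSet(externalIdByDoc.size());
        for (int doc = 0; doc < externalIdByDoc.size(); doc++) {
            if (wantedIds.contains(externalIdByDoc.get(doc))) {
                bits.set(doc); // this doc belongs to the stored sub-set
            }
        }
        return bits;
    }

    // Restrict the matches of a later search to the stored sub-set.
    // The input bit sets are left unchanged.
    static BitSet applyFilter(BitSet queryMatches, BitSet filterBits) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filterBits);
        return result;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("Doc1", "Doc2", "Doc3", "Doc4");
        Set<String> stored = new HashSet<>(Arrays.asList("Doc2", "Doc4"));
        BitSet filter = bitsForIds(index, stored);

        // Pretend a second search matched internal docs 0, 1 and 3.
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(1);
        matches.set(3);

        System.out.println(applyFilter(matches, filter)); // prints {1, 3}
    }
}
```

In a real index the loop over all documents would presumably be replaced by term lookups on the 'id' field, one per stored ID, but that part depends on the Lucene version in use.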
-sujit

On Apr 12, 2013, at 7:34 AM, Carsten Schnober wrote:

> Dear list,
> I would like to create a sub-set of the documents in an index that is to
> be used for further searches. However, the criteria that lead to the
> creation of that sub-set are not predefined so I think that faceted
> search cannot be applied to this use case.
>
> For instance:
> A user searches for documents that contain token 'A' in a field 'text'.
> These results form a set of documents that is persistently stored (in a
> database). Each document in the index has a field 'id' that identifies
> it, so these "external" IDs are stored in the database.
>
> Later on, a user loads the document IDs from the database and wants to
> execute another search on this set of documents only. However,
> performing a search on the full index and subsequently filtering the
> results against that list of documents takes very long if there are many
> matches. This is obvious as I have to retrieve the external id from each
> matching document and check whether it is part of the desired sub-set.
> Constructing a BooleanQuery in the style "id:Doc1 OR id:Doc2 ..." is not
> suitable either because there could be thousands of documents exceeding
> any limit for Boolean clauses.
>
> Any suggestions how to solve this? I would have gone for the Lucene
> document numbers and store them as a bit set that I could use as a
> filter during later searches, but I read that the document numbers are
> ephemeral.
>
> One possible way out seems to be to create another index from the
> documents that have matched the initial search, but this seems quite an
> overkill, especially if there are plenty of them...
>
> Thanks for any hint!
> Carsten
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org