Hi Carsten,

I would suggest using my example code with the fake query and custom rewrite. It avoids the overhead of BooleanQuery and, more importantly, you don't need to change the *global* and *static* default in BooleanQuery. Otherwise you could introduce a denial-of-service risk into your application if, somewhere else, you execute a wildcard query using Boolean rewrite with an unlimited number of terms.
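To illustrate why a per-call limit is safer than raising a global static one, here is a minimal plain-Java sketch. It deliberately does not use the Lucene API and is not the custom rewrite from the other mail: the class LocalTermLimitSketch, the method collectTerms and the exception TooManyTermsException are hypothetical names, and a sorted set of strings stands in for the index's term dictionary. The only point it makes is that the cap is an argument of the one call that collects terms, so no other query in the application is affected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical model, NOT Lucene code: a sorted set of strings plays the role
// of the term dictionary, and java.util.regex plays the role of RegexpQuery.
public class LocalTermLimitSketch {

    /** Thrown when one collection run exceeds the limit passed to that run. */
    static class TooManyTermsException extends RuntimeException {
        TooManyTermsException(int limit) {
            super("more than " + limit + " matching terms");
        }
    }

    /**
     * Collects all dictionary terms that fully match the regular expression.
     * The limit applies to this call only; no shared or static state is changed.
     */
    static List<String> collectTerms(SortedSet<String> dictionary, String regex, int limit) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (pattern.matcher(term).matches()) {
                if (matches.size() == limit) {
                    // Adding this term would exceed the per-call limit.
                    throw new TooManyTermsException(limit);
                }
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary =
                new TreeSet<>(Arrays.asList("haus", "hausen", "hausmann", "maus"));
        // A generous limit for term extraction; other callers keep their own limits.
        System.out.println(collectTerms(dictionary, "haus.*", 10));
    }
}
```

A caller that only extracts terms can pass a large limit here, while any code path that actually executes queries keeps a conservative one. Changing a process-wide static setting cannot offer that separation.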
The custom rewrite with the fake query to collect the terms was posted in another mail on this thread.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Carsten Schnober [mailto:schno...@ids-mannheim.de]
> Sent: Tuesday, March 12, 2013 10:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: Rewrite for RegexpQuery
>
> On 11.03.2013 18:22, Michael McCandless wrote:
> > On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
> > <schno...@ids-mannheim.de> wrote:
> >> On 11.03.2013 13:38, Michael McCandless wrote:
> >>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >>>
> >>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE,
> >>>> then this should work (after rewrite your query is a BooleanQuery,
> >>>> which supports extractTerms()).
> >>>
> >>> ... as long as you don't exceed the max number of terms allowed by
> >>> BQ (1024 by default, but you can raise it).
> >>
> >> True, I've noticed this meanwhile. Are there any recommendations for
> >> this setting where the limit is as large as possible while staying
> >> within a reasonable performance? Of course, this is highly
> >> subjective, but what's the magnitude here? Will a limit of 1,024,000
> >> typically increase the query time by the factor 1,000 too?
> >> Carsten
> >
> > I think 1024 may already be too high ;)
> >
> > But really it depends on your situation: test different limits and see.
> >
> > How much slower a larger query is depends on the specifics of the terms ...
>
> For the purpose of initial testing, I've increased the limit by the factor
> 1,000.
> As Uwe pointed out, I don't actually execute the query, but only extract the
> terms. In this regard, there are no performance issues with thousands of
> terms, although I will have to perform a systematic evaluation yet.
> Best,
> Carsten
>
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org