RE: Rewrite for RegexpQuery

Uwe Schindler Tue, 12 Mar 2013 02:40:13 -0700

Hi Carsten,

I would suggest using my example code with the fake query and custom rewrite.
This avoids the overhead of BooleanQuery and, more importantly, you don't
need to change the *global* and *static* default in BooleanQuery. Otherwise you
could introduce a denial-of-service risk into your application if, at some
other place, you execute a wildcard query using Boolean rewrite with an
unlimited number of terms.
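
As a rough, library-free illustration of why the global default matters, here
is a minimal sketch in plain Java. It is not the code from the other mail and
it does not use the Lucene API: the class TermCollectorSketch, its methods, and
the SortedSet standing in for the term dictionary are all hypothetical names
chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

// Illustration only: hypothetical names, no Lucene classes involved.
// The SortedSet<String> stands in for an index's term dictionary.
final class TermCollectorSketch {

    // Global, static limit shared by every caller in the application.
    // Raising it for term extraction also raises it for queries that
    // are really executed elsewhere.
    static int globalMaxClauses = 1024;

    // Expands the pattern into one clause per matching term and fails
    // once the shared global limit is reached.
    static List<String> expandWithGlobalLimit(SortedSet<String> terms, Pattern pattern) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                if (clauses.size() >= globalMaxClauses) {
                    throw new IllegalStateException(
                        "too many clauses, limit is " + globalMaxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    // Collects the matching terms into a local list. Nothing is executed
    // and no shared state is touched, so other queries keep their limit.
    static List<String> collectTerms(SortedSet<String> terms, Pattern pattern) {
        List<String> collected = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                collected.add(term);
            }
        }
        return collected;
    }
}
```

With a dictionary of more than 1024 matching terms, expandWithGlobalLimit
fails unless the shared limit is raised for the whole application, while
collectTerms returns all matches and leaves the limit alone.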


The custom rewrite with the fake query to collect the terms was posted in 
another mail in this thread.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Carsten Schnober [mailto:schno...@ids-mannheim.de]
> Sent: Tuesday, March 12, 2013 10:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: Rewrite for RegexpQuery
> 
> On 11.03.2013 18:22, Michael McCandless wrote:
> > On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
> > <schno...@ids-mannheim.de> wrote:
> >> On 11.03.2013 13:38, Michael McCandless wrote:
> >>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >>>
> >>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE,
> >>>> then this should work (after rewrite your query is a BooleanQuery,
> >>>> which supports extractTerms()).
> >>>
> >>> ... as long as you don't exceed the max number of terms allowed by
> >>> BQ (1024 by default, but you can raise it).
> >>
> >> True, I've noticed this in the meantime. Are there any recommendations
> >> for this setting where the limit is as large as possible while
> >> performance stays reasonable? Of course, this is highly subjective,
> >> but what's the magnitude here? Will a limit of 1,024,000 typically
> >> increase the query time by a factor of 1,000 too?
> >> Carsten
> >
> > I think 1024 may already be too high ;)
> >
> > But really it depends on your situation: test different limits and see.
> >
> > How much slower a larger query is depends on the specifics of the terms ...
> 
> For the purpose of initial testing, I've increased the limit by a factor of
> 1,000.
> As Uwe pointed out, I don't actually execute the query, but only extract the
> terms. In this regard, there are no performance issues with thousands of
> terms, although I still have to perform a systematic evaluation.
> Best,
> Carsten
> 
> 
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org


