Re: Rewrite for RegexpQuery

Carsten Schnober Tue, 12 Mar 2013 02:13:06 -0700

On 11.03.2013 18:22, Michael McCandless wrote:
> On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
> <schno...@ids-mannheim.de> wrote:
>> On 11.03.2013 13:38, Michael McCandless wrote:
>>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>
>>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then this 
>>>> should work (after rewrite your query is a BooleanQuery, which supports 
>>>> extractTerms()).
>>>
>>> ... as long as you don't exceed the max number of terms allowed by BQ
>>> (1024 by default, but you can raise it).
>>
>> True, I've noticed this in the meantime. Are there any recommendations
>> for setting the limit as high as possible while keeping performance
>> reasonable? Of course, this is highly subjective, but what's the
>> magnitude here? Will a limit of 1,024,000 typically increase the query
>> time by a factor of 1,000 too?
>> Carsten
> 
> I think 1024 may already be too high ;)
> 
> But really it depends on your situation: test different limits and see.
> 
> How much slower a larger query is depends on the specifics of the terms ...


For initial testing, I've increased the limit by a factor of 1,000. As
Uwe pointed out, I don't actually execute the query, but only extract
the terms. For that use case I have seen no performance issues with
thousands of terms so far, although I still have to perform a
systematic evaluation.
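
To illustrate the limit discussed above, here is a minimal toy model in
plain Java. It is a sketch under stated assumptions, not Lucene code: it
uses no Lucene classes, and RewriteSketch, expand() and maxClauseCount are
hypothetical names standing in for the rewrite step, which enumerates the
matching terms and fails once the clause limit is exceeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model only: this is NOT Lucene code and uses no Lucene classes.
class RewriteSketch {

    // Hypothetical stand-in for the BooleanQuery clause limit
    // (1024 by default, according to the thread).
    static int maxClauseCount = 1024;

    // Expand a regular expression against a term dictionary and collect
    // the matching terms; fail once more than maxClauseCount terms match.
    static List<String> expand(String regex, SortedSet<String> dictionary) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String term : dictionary) {
            if (pattern.matcher(term).matches()) {
                if (matched.size() >= maxClauseCount) {
                    throw new IllegalStateException(
                        "more than " + maxClauseCount + " matching terms");
                }
                matched.add(term);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up dictionary of 5,000 terms: term0 ... term4999.
        SortedSet<String> dictionary = new TreeSet<>();
        for (int i = 0; i < 5000; i++) {
            dictionary.add("term" + i);
        }
        // Raise the limit by a factor of 1,000, then only collect the terms.
        maxClauseCount = 1024 * 1000;
        List<String> terms = expand("term1.*", dictionary);
        System.out.println(terms.size() + " terms extracted");
    }
}
```

With the default limit of 1024, the same call throws, because the pattern
matches more than 1024 of the made-up terms. The sketch only collects
terms and never scores or searches anything, so it says nothing about the
cost of actually executing such a query.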
Best,
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
