Re: BooleanQuery.TooManyClauses exception

Erick Erickson Tue, 17 Oct 2006 11:17:14 -0700

Under the covers, as I understand it, a range query is expanded into a
BooleanQuery with one clause for each unique term in the range. So, if you
store your dates as strings to millisecond resolution, there can be up to
86,400,000 unique terms per day, and you can end up with a lot of clauses in
the boolean query, whereas with day resolution you would have at most one
clause per day. Again, as I understand it, this is tied to Lucene's way of
scoring documents for relevance.
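
To put numbers on that, here is a small sketch in plain Java (no Lucene
calls; the class and method names are made up for illustration) of the
worst-case number of distinct terms a single day can contribute at each
resolution. The real count depends on how many distinct values actually
occur in the index, so treat these as upper bounds only.

```java
// Illustrative only: upper bounds on distinct timestamp terms per day.
// Class and method names are hypothetical, not part of Lucene.
class TermsPerDay {

    // Returns the maximum number of distinct terms one day can produce
    // when timestamps are indexed at the given resolution.
    static long maxTermsPerDay(String resolution) {
        switch (resolution) {
            case "day":         return 1L;
            case "hour":        return 24L;
            case "minute":      return 24L * 60;
            case "second":      return 24L * 60 * 60;
            case "millisecond": return 24L * 60 * 60 * 1000;
            default:
                throw new IllegalArgumentException("unknown resolution: " + resolution);
        }
    }

    public static void main(String[] args) {
        String[] resolutions = {"day", "hour", "minute", "second", "millisecond"};
        for (String r : resolutions) {
            System.out.println(r + ": " + maxTermsPerDay(r));
        }
    }
}
```

At second resolution a single fully populated day (86,400 possible terms)
is already far past the default limit of 1024 clauses.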


An RDBMS really doesn't care about scoring; a row either matches or it
doesn't. That functionality is supplied in Lucene by Filters, which bypass
scoring (e.g. a Filter wrapped in a ConstantScoreQuery).
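
As a rough picture of why a filter sidesteps the clause limit, the sketch
below uses a toy inverted index in plain Java. It is not Lucene's
implementation, and the class name, method name, and data layout are
assumptions made for illustration. Every term in the range just sets bits
for its documents, so no per-term clause is built and nothing is scored.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Toy model of a range filter: hypothetical names, not Lucene code.
class RangeFilterSketch {

    // postings maps each indexed term to the ids of documents containing it.
    // Returns a bit set with one bit per matching document; the number of
    // terms in the range does not matter, only membership is recorded.
    static BitSet docsInRange(TreeMap<String, int[]> postings, String lower, String upper) {
        BitSet matches = new BitSet();
        for (int[] docIds : postings.subMap(lower, true, upper, true).values()) {
            for (int id : docIds) {
                matches.set(id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("20060401", new int[] {0, 3});
        postings.put("20060515", new int[] {1});
        postings.put("20061001", new int[] {2});
        // Documents 0, 1 and 3 fall inside the range; document 2 does not.
        System.out.println(docsInRange(postings, "20060401", "20060901"));
    }
}
```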

BTW, this also applies to other "expanding" queries, like PrefixQuery,
WildcardQuery, etc.

Erick

On 10/17/06, Bushey, John <[EMAIL PROTECTED]> wrote:


Thanks.  That's the explanation that I was looking for.  The wiki does
not cover this in much detail.  The architectural reason for this sounds
strange to me, since my background is in relational databases where this
is not an issue, so I still have a question.  How does reducing the
precision really help?  Does Lucene track the max length of the indexed
value and use that to enumerate all the unique date/time values for the
query?  In my case I have done nothing special to index my dates.  I
just treat them as a string of numbers.


-----Original Message-----
From: Steven Parkes [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 12:13 PM
To: java-user@lucene.apache.org
Subject: RE: BooleanQuery.TooManyClauses exception

Lucene takes your date range, enumerates all the unique date/time values
in your corpus within that range, and then executes a query with one
term for each of them. So the number of terms in your query is going to
be equal to the number of unique date/time values in the range.
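
The expansion described above can be imitated in plain Java. This is only
a sketch under stated assumptions: the sorted set stands in for the
index's term dictionary, the corpus is synthetic (one term per day, or one
per hour of every day), and the class and method names are hypothetical.
The range bounds are the ones from the original query.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.SortedSet;
import java.util.TreeSet;

// Simulation of range expansion; not Lucene code.
class RangeExpansionSketch {
    static final int MAX_CLAUSES = 1024; // default limit mentioned in the thread

    // Counts the indexed terms in [lower, upper] (string order), i.e. the
    // number of clauses the expanded query would need.
    static int countClauses(SortedSet<String> indexedTerms, String lower, String upper) {
        int clauses = 0;
        for (String term : indexedTerms.tailSet(lower)) {
            if (term.compareTo(upper) > 0) {
                break;
            }
            clauses++;
        }
        return clauses;
    }

    // Synthetic corpus: yyyyMMdd terms, or yyyyMMddHH terms for all 24 hours.
    static SortedSet<String> buildTerms(LocalDate from, LocalDate to, boolean withHour) {
        SortedSet<String> terms = new TreeSet<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            String day = d.format(DateTimeFormatter.BASIC_ISO_DATE);
            if (!withHour) {
                terms.add(day);
                continue;
            }
            for (int h = 0; h < 24; h++) {
                terms.add(day + String.format("%02d", h));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.of(2006, 4, 1);
        LocalDate to = LocalDate.of(2006, 9, 1);
        int hourly = countClauses(buildTerms(from, to, true), "20060401", "20060901");
        int daily = countClauses(buildTerms(from, to, false), "20060401", "20060901");
        System.out.println("hour resolution: " + hourly + " clauses, over limit: " + (hourly > MAX_CLAUSES));
        System.out.println("day resolution:  " + daily + " clauses, over limit: " + (daily > MAX_CLAUSES));
    }
}
```

With this synthetic data, hour resolution needs 3672 clauses for the
five-month range while day resolution needs 154, which stays well under
the limit.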

The most common way of handling this is to not index the dates to a
higher precision than you need to support your query. If you're only
going to query down to days (and not the time of day within a date),
don't include the extra hours/minutes/seconds in the indexed field. You
can always put the higher precision value in a stored but unindexed
field if you want to retrieve it via the query results.
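
A minimal sketch of that split, assuming the timestamp is a digit string
that begins with yyyyMMdd (as the query in this thread suggests). The
class, method, and the field name "pxCreateDate" are hypothetical; only
"pxCreateDateTime" comes from the original query. Lucene's DateTools
class can produce day-resolution strings from a Date, but the code below
sticks to plain string handling so it runs without Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper; names are assumptions, not Lucene API.
class DayPrecision {

    // Truncates a yyyyMMdd... digit string to its day portion.
    static String toDayTerm(String timestamp) {
        if (timestamp == null || timestamp.length() < 8) {
            throw new IllegalArgumentException("expected at least yyyyMMdd: " + timestamp);
        }
        return timestamp.substring(0, 8);
    }

    // Returns the two values to put on a document: the day-only value for
    // the indexed, searchable field and the full value for a stored,
    // unindexed field that is only read back from query results.
    static Map<String, String> fieldsFor(String timestamp) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("pxCreateDate", toDayTerm(timestamp)); // index this one
        fields.put("pxCreateDateTime", timestamp);        // store only
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fieldsFor("20060401123045123"));
    }
}
```

Range queries would then run against the day-only field, so a one-year
range expands to at most 366 terms.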

-----Original Message-----
From: Bushey, John [mailto:[EMAIL PROTECTED]
Sent: Monday, October 16, 2006 10:44 AM
To: java-user@lucene.apache.org
Subject: BooleanQuery.TooManyClauses exception

Hi - Can someone explain the reason why I'm getting the TooManyClauses
exception?  I have a general understanding of the issue based on my
reading, but I don't understand the mechanics of it.  Specifically,
how is my query being expanded to cause this problem?  How am I
exceeding the default 1024 clauses?  My query looks like the following.



pyLabel:(test) OR pyDescription:(test) AND ( pxCreateDateTime:[20060401
TO 20060901] )



The problem only happens when my date range exceeds ~2 months.  The date
is indexed with more precision, but for my query purposes I only care
about the date and not the time stamp portion.  What can I do to solve
or mitigate this problem and be able to search a date range that spans
at least a year? Is setMaxClauseCount() a predictable solution?  How
would reducing the precision of my date during indexing help?





Thanks

John








---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

