Under the covers, as I understand it, a BooleanQuery is assembled with one clause for each unique term in the range. So, if you store your dates with millisecond resolution, there can be up to 86,400,000 unique terms per day. If you store your times as strings to millisecond resolution, you can end up with a lot of clauses in the boolean query, whereas with day resolution you would have at most one per day in the range. Again, as I understand it, this is tied to Lucene's way of scoring documents for relevance.
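The clause arithmetic above can be illustrated without Lucene at all. The following is a minimal sketch in plain Java, not Lucene's actual implementation: the class name `RangeExpansionSketch`, its helper methods, the hourly corpus, and the UTC term encodings are all assumptions made up for illustration. It only counts how many distinct string terms fall inside a range at two precisions and compares that count with the default limit of 1024.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-JDK simulation of range expansion; no Lucene classes are involved.
class RangeExpansionSketch {
    // Default clause limit mentioned in the thread.
    static final int MAX_CLAUSES = 1024;

    // Two hypothetical index-time encodings of the same timestamp (UTC assumed).
    static final DateTimeFormatter MILLIS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Made-up corpus: one document per hour, starting at startIso.
    static long[] hourlyTimestamps(String startIso, int count) {
        long start = Instant.parse(startIso).toEpochMilli();
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            out[i] = start + i * 3_600_000L;
        }
        return out;
    }

    // The distinct terms an index would hold for these timestamps.
    static SortedSet<String> uniqueTerms(long[] epochMillis, DateTimeFormatter fmt) {
        SortedSet<String> terms = new TreeSet<>();
        for (long ms : epochMillis) {
            terms.add(fmt.format(Instant.ofEpochMilli(ms)));
        }
        return terms;
    }

    // Number of distinct terms in [lower, upper] by string order,
    // i.e. one clause per term under the expansion described above.
    static int clausesForRange(SortedSet<String> terms, String lower, String upper) {
        // subSet excludes its upper bound, so append the smallest char to keep `upper` itself.
        return terms.subSet(lower, upper + "\0").size();
    }

    public static void main(String[] args) {
        long[] ts = hourlyTimestamps("2006-04-01T00:00:00Z", 153 * 24); // Apr 1 through Aug 31
        for (DateTimeFormatter fmt : new DateTimeFormatter[] {MILLIS, DAY}) {
            int clauses = clausesForRange(uniqueTerms(ts, fmt), "20060401", "20060901");
            System.out.println(clauses + " clauses, over limit: " + (clauses > MAX_CLAUSES));
        }
    }
}
```

With this made-up hourly corpus, the millisecond encoding yields 3,672 distinct terms inside the range, well over 1024, while the day encoding yields 153. The documents and the range are the same in both cases; only the number of distinct terms changes.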
An RDBMS really doesn't care about scoring; a row either matches or it doesn't. In Lucene, that functionality is supplied by Filters, which bypass scoring (e.g. by using ConstantScoreQuery). BTW, this also applies to other "expanding" queries, like PrefixQuery, WildcardQuery, etc.

Erick

On 10/17/06, Bushey, John <[EMAIL PROTECTED]> wrote:
Thanks. That's the explanation that I was looking for. The WIKI does not cover this in much detail. The architectural reason for this sounds strange to me, since my background is in relational databases where this is not an issue, so I still have a question. How does reducing the precision really help? Does Lucene track the max length of the indexed value and use that to enumerate all the unique date/time values for the query? In my case I have done nothing special to index my dates. I just treat them as a string of numbers.

-----Original Message-----
From: Steven Parkes [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 12:13 PM
To: java-user@lucene.apache.org
Subject: RE: BooleanQuery.TooManyClauses exception

Lucene takes your date range, enumerates all the unique date/time values in your corpus within that range, and then executes that query. So the number of terms in your query is going to be equal to the number of unique date/time values in the range.

The most common way of handling this is to not index the dates to a higher precision than you need to support your query. If you're only going to query down to days (and not the time of day within a date), don't include the extra hours/minutes/seconds in the indexed field. You can always put the higher precision value in a stored but unindexed field if you want to retrieve it via the query results.

-----Original Message-----
From: Bushey, John [mailto:[EMAIL PROTECTED]
Sent: Monday, October 16, 2006 10:44 AM
To: java-user@lucene.apache.org
Subject: BooleanQuery.TooManyClauses exception

Hi -

Can someone explain the reason why I'm getting the TooManyClauses exception? I have a general understanding of the issue based on my reading, but I don't understand the mechanics of it. Specifically, how is my query being expanded to cause this problem? How am I exceeding the default 1024 clauses? My query looks like the following.
pyLabel:(test) OR pyDescription:(test) AND ( pxCreateDateTime:[20060401 TO 20060901] )

The problem only happens when my date range exceeds ~2 months. The date is indexed with more precision, but for my query purposes I only care about the date and not the time stamp portion. What can I do to solve or mitigate this problem and be able to search a date range that spans at least a year? Is setMaxClauseCount() a predictable solution? How would reducing the precision of my date during indexing help?

Thanks
John

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]