It all has to do with the total focus on strings in an inverted index, as opposed to the more general model in an RDBMS.
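To make the string-only view concrete, here is a minimal sketch in plain Java. It uses no Lucene API: it assumes that ordinary String ordering in a TreeSet stands in for Lucene's term ordering, which holds for ASCII digit strings like these. The class DateTermDemo, its two methods, and the corpus of one timestamp per hour are all made up for illustration. It counts how many distinct terms fall inside [20060401 TO 20060901] when the indexed values carry a time part and when they don't.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.NavigableSet;
import java.util.TreeSet;

public class DateTermDemo {

    /** Terms an inclusive range [lower TO upper] would match, by plain string order. */
    public static NavigableSet<String> termsInRange(
            NavigableSet<String> indexed, String lower, String upper) {
        return indexed.subSet(lower, true, upper, true);
    }

    /**
     * Hypothetical corpus: one term per day (yyyyMMdd) when withTime is false,
     * or one term per hour (yyyyMMddHH0000) when withTime is true.
     */
    public static NavigableSet<String> buildTerms(
            LocalDate first, LocalDate last, boolean withTime) {
        NavigableSet<String> terms = new TreeSet<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            String day = d.format(DateTimeFormatter.BASIC_ISO_DATE); // e.g. 20060401
            if (!withTime) {
                terms.add(day);
                continue;
            }
            for (int hour = 0; hour < 24; hour++) {
                terms.add(day + (hour < 10 ? "0" : "") + hour + "0000");
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        LocalDate first = LocalDate.of(2006, 4, 1);
        LocalDate last = LocalDate.of(2006, 9, 30);

        // A prefix sorts before any longer string that starts with it.
        System.out.println("20060901".compareTo("20060901000000") < 0); // true

        NavigableSet<String> fine =
                termsInRange(buildTerms(first, last, true), "20060401", "20060901");
        NavigableSet<String> coarse =
                termsInRange(buildTerms(first, last, false), "20060401", "20060901");

        System.out.println(fine.size());   // 3672, well over the default limit of 1024
        System.out.println(fine.last());   // 20060831230000: nothing from Sept 1 matches
        System.out.println(coarse.size()); // 154, one term per day
        System.out.println(coarse.last()); // 20060901
    }
}
```

Even at one timestamp per hour the range expands to thousands of terms, while the day-only terms stay in the low hundreds. A real corpus with second resolution could be far worse.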
Lucene doesn't need to track the max length. It sees each date as a string and compares all strings lexicographically. That means 20060401 is less than 20060401HHMMSS for any HHMMSS (equal, if there is none). This is basically the same ordering as saying "base" sorts before "baseball": if one term is a prefix of another, it is "less than".

Given this ordering, when you ask for DATE TO DATE, Lucene first finds all tokens in your index within that range and uses that list to compose a boolean query, which it then executes. With resolution down to seconds, that can be an awful lot of tokens. If you index at day resolution, you know you'll have no more than 366 tokens a year.

This does raise a subtle issue I glossed over, which is that your range query might not actually be getting what you think (depending, of course, on what you think). You say "TO 20060901", but since 20060901HHMMSS is larger than 20060901 for any HHMMSS, you're only getting dates up to 20060831HHMMSS (unless you have dates that don't have a time part, in which case 20060901 itself might be included).

The wiki page Doron mentioned, http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing, talks about this a bit more. You can add multiple fields at different resolutions so you can query arbitrary date ranges without things blowing up. For example, if you wanted to do July 15, 2004 through July 15, 2006, using the coding in the wiki, instead of doing one query you could look for all dates by year for 2005, by month for the whole months in 2004 and 2006 (August through December 2004, January through June 2006), and then by day for the partial months (July 15-31, 2004 and July 1-15, 2006).

As others have mentioned, there are filters, too.

-----Original Message-----
From: Bushey, John [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 10:57 AM
To: java-user@lucene.apache.org
Subject: RE: BooleanQuery.TooManyClauses exception

Thanks. That's the explanation that I was looking for. The WIKI does not cover this in much detail.
The architectural reason for this sounds strange to me, since my background is in relational databases where this is not an issue, so I still have a question. How does reducing the precision really help? Does Lucene track the max length of the indexed value and use that to enumerate all the unique date/time values for the query? In my case I have done nothing special to index my dates. I just treat them as a string of numbers.

-----Original Message-----
From: Steven Parkes [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 12:13 PM
To: java-user@lucene.apache.org
Subject: RE: BooleanQuery.TooManyClauses exception

Lucene takes your date range, enumerates all the unique date/time values in your corpus within that range, and then executes that query. So the number of terms in your query is going to be equal to the number of unique date/time values in the range.

The most common way of handling this is to not index the dates to a higher precision than you need to support your query. If you're only going to query down to days (and not the time of day within a date), don't include the extra hours/minutes/seconds in the indexed field. You can always put the higher-precision value in a stored but unindexed field if you want to retrieve it via the query results.

-----Original Message-----
From: Bushey, John [mailto:[EMAIL PROTECTED]
Sent: Monday, October 16, 2006 10:44 AM
To: java-user@lucene.apache.org
Subject: BooleanQuery.TooManyClauses exception

Hi - Can someone explain the reason why I'm getting the TooManyClauses exception? I have a general understanding of the issue based on my reading, but I don't understand the mechanics of it. Specifically, how is my query being expanded to cause this problem? How am I exceeding the default 1024 clauses? My query looks like the following.

pyLabel:(test) OR pyDescription:(test) AND ( pxCreateDateTime:[20060401 TO 20060901] )

The problem only happens when my date range exceeds ~2 months.
The date is indexed with more precision, but for my query purposes I only care about the date and not the time stamp portion. What can I do to solve or mitigate this problem and be able to search a date range that spans at least a year? Is setMaxClauseCount() a predictable solution? How would reducing the precision of my date during indexing help?

Thanks
John

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]