We are exploring using Jackrabbit in a production environment. I have a repository that we have created from our content that has > 100K nodes. Several of our use cases need date range queries and also use 'order by' frequently. We have noticed that query time is significantly slower than necessary. After warming up the repository (i.e., running the suite of queries once), as an example:
"select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and status <> 'hidden' order by publishDate desc" takes 500 ms to execute - this is just the execution time, I am not actually using or accessing the NodeIterator. Whereas: "select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and status <> 'hidden'" takes only 33 ms to execute. /jcr:root/Gossip/ColumnName/Columns//element(*,Column)[EMAIL PROTECTED] > xs:dateTime("way in the past") and @publishDate < xs:dateTime("way in the future") and (@status != 'hidden')] order by @publishDate descending takes 1096 ms to execute. Clearly dates (ordering and ranges) have a significant impact on query execution speed. Digging into the internals of Jackrabbit, we have noticed that there is an implementation of RangeQuery that essentially walks the results if the # of query terms is greater than what Lucene can handle. Reading the Lucene documentation, it looks like Filters are the recommended method of implementing "large" range queries, and also seem like a natural for matching node types - i.e., select * from Column Is there any ongoing work on query optimization and performance. We would be very interested in such work, including offering any help that we can. -Dave