We are exploring using Jackrabbit in a production environment. I have a repository that we have created from our content that has > 100K nodes. Several of our use cases need date range queries and also use 'order by' frequently. We have noticed that query time is significantly slower than necessary. After warming up the repository (i.e., running the suite of queries once), as an example:
"select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and status <> 'hidden' order by publishDate desc" takes 500 ms to execute - this is just the execution time, I am not actually using or accessing the NodeIterator. Whereas: "select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and status <> 'hidden'" takes only 33 ms to execute. /jcr:root/Gossip/ColumnName/Columns//element(*,Column)[EMAIL PROTECTED] > xs:dateTime("way in the past") and @publishDate < xs:dateTime("way in the future") and (@status != 'hidden')] order by @publishDate descending takes 1096 ms to execute. Clearly dates (ordering and ranges) have a significant impact on query execution speed. Digging into the internals of Jackrabbit, we have noticed that there is an implementation of RangeQuery that essentially walks the results if the # of query terms is greater than what Lucene can handle. Reading the Lucene documentation, it looks like Filters are the recommended method of implementing "large" range queries, and also seem like a natural for matching node types - i.e., select * from Column Is there any ongoing work on query optimization and performance. We would be very interested in such work, including offering any help that we can. -Dave