Hi,

On 2/28/07, David Johnson <[EMAIL PROTECTED]> wrote:
"select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%' and
status <> 'hidden' order by publishDate desc" takes 500 ms to execute - this
is just the execution time, I am not actually using or accessing the
NodeIterator.

Are you using Jackrabbit 1.2.x? Jackrabbit 1.2 uses lazy loading of
query results, which should considerably reduce query execution time
by moving the effort to the resulting Node- or RowIterator.
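The effect of lazy loading can be sketched roughly as follows. This is a self-contained illustration of the idea only, not Jackrabbit's actual implementation; all class and method names here are invented:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustration only: invented names, not Jackrabbit's real classes.
public class LazyVsEager {

    // Pretend resolving a search hit into a node is the expensive part.
    static String resolveNode(int hit) {
        return "node-" + hit;
    }

    // Eager: resolves every hit up front, so execute() pays the full cost.
    static List<String> executeEager(int[] hits) {
        List<String> nodes = new ArrayList<>();
        for (int hit : hits) {
            nodes.add(resolveNode(hit));
        }
        return nodes;
    }

    // Lazy: execute() just wraps the hits; the resolution work is
    // deferred to each next() call on the iterator.
    static Iterator<String> executeLazy(int[] hits) {
        return new Iterator<String>() {
            private int pos = 0;
            public boolean hasNext() { return pos < hits.length; }
            public String next() { return resolveNode(hits[pos++]); }
        };
    }

    public static void main(String[] args) {
        int[] hits = {1, 2, 3};
        // A caller that never touches the iterator pays almost nothing.
        Iterator<String> it = executeLazy(hits);
        System.out.println(it.next());
        System.out.println(executeEager(hits).size());
    }
}
```

So with 1.2.x the 500 ms you measured on execute() alone should largely move into the NodeIterator instead.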

In general my rule of thumb so far has been to use the query feature
when you want a narrow selection of nodes from a large source set, and
to use explicit traversal with filtering when the expected result set
includes a considerable percentage of the source set. Ideally the
query feature should in all cases be at least as fast as traversal
plus a small constant query parsing and setup overhead. I don't think
we are there yet.
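For comparison, the traversal-with-filtering alternative amounts to visiting each child node and filtering in application code, as in this toy sketch (invented names, a plain map standing in for a node's children, not the JCR API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of explicit traversal with filtering; invented names,
// not the JCR API. Each "child node" is a name mapped to its status.
public class TraversalFilter {

    static Map<String, String> columns = new LinkedHashMap<>();

    static List<String> visibleColumns() {
        List<String> result = new ArrayList<>();
        // Explicit traversal: visit every child, filter in application
        // code (the equivalent of status <> 'hidden' in the query).
        for (Map.Entry<String, String> node : columns.entrySet()) {
            if (!"hidden".equals(node.getValue())) {
                result.add(node.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        columns.put("sports", "published");
        columns.put("gossip", "hidden");
        columns.put("politics", "published");
        System.out.println(visibleColumns());
    }
}
```

This does linear work in the number of children, which is why it wins when most of the source set ends up in the result anyway.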

Digging into the internals of Jackrabbit, we have noticed that there is an
implementation of RangeQuery that essentially walks the results if the number
of query terms is greater than what Lucene can handle.  Reading the Lucene
documentation, it looks like Filters are the recommended way of
implementing "large" range queries, and they also seem like a natural fit
for matching node types - i.e., select * from Column
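If I understand the Lucene docs correctly, a Filter essentially produces a bit set over document ids that gets intersected with the rest of the query, rather than expanding a range into one term query per distinct value. A self-contained sketch of that idea (no actual Lucene dependency, names invented):

```java
import java.util.BitSet;

// Self-contained sketch of a Lucene-style range filter; invented
// names, no actual Lucene dependency.
public class RangeFilterSketch {

    // Pretend document i has the indexed field value values[i].
    static BitSet rangeBits(int[] values, int lower, int upper) {
        // One linear pass sets a bit per matching document, instead of
        // building a term query for every distinct value in the range.
        BitSet bits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] >= lower && values[doc] <= upper) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] publishDates = {2001, 2005, 2007, 1999, 2006};
        BitSet inRange = rangeBits(publishDates, 2005, 2007);
        // Combining with another clause is then a cheap bitwise AND.
        System.out.println(inRange);
    }
}
```

The same bit-set approach would presumably work for node type matching, since a node type constraint also selects a fixed subset of documents.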

I'm not familiar enough with Lucene internals to comment on whether
Filters would cover everything we need. It would be great if you're
interested in pursuing such alternatives!

Is there any ongoing work on query optimization and performance?  We would
be very interested in such work, and would be glad to offer any help we can.

Not apart from the recent lazy loading improvements. Any help would be
much appreciated.

BR,

Jukka Zitting
