On 03/03/2016 14:15, Ancona Francesco wrote:
> ...
> About query Engine
> - Could you explain in more depth what traverse is? If we have
> understood, Traverse doesn't delegate to the index server engine (good in case of
> index server trouble) but is a built-in component in Oak: but where does it keep
> the repository graph to traverse? In memory? On the filesystem? Getting data from
> the db?
Traverse will physically traverse the repository in search of the right
data. It's not the most efficient index, and it's there mainly as a
fallback for when either none of the other indexes is suitable for the
provided query or there are no other indexes at all.

But be careful. It doesn't mean it's intrinsically a bad index. Let's
take the following query as an example

SELECT *
FROM [nt:unstructured] AS a
WHERE
ISDESCENDANTNODE(a, '/content/mysite/colour/red')
AND colour = 'red'

and assume you initialised the repository with the InitialContent, which
provides you with some indexes, as I said in a previous email. On top of
that you have a PropertyIndex on `colour` and no Lucene index. Lucene is
quite powerful, with a lot of configuration options.

Overall, the repository has roughly the following node distribution:

- 10k nodes nt:unstructured
- 5k nodes with colour red
- 3 nodes under /content/mysite/colour/red

For the above query, if you look at the plans you'll see the following
costs (taking some liberty with the numbers):

- NodeTypeIndex: 10000
- PropertyIndex: 3000
- Traversing: 3
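
To make the comparison concrete, here is a minimal sketch of the idea of
picking the plan with the lowest estimated cost. It is a toy model, not
Oak's actual code: the `estimated_costs` dictionary and the
`pick_cheapest` function are hypothetical names, and the numbers are the
ones from the list above.

```python
# Toy model only: not Oak's implementation. All names are hypothetical.
# Estimated costs taken from the example above.
estimated_costs = {
    "NodeTypeIndex": 10000,
    "PropertyIndex": 3000,
    "Traversing": 3,
}


def pick_cheapest(costs):
    """Return the name of the plan with the lowest estimated cost."""
    return min(costs, key=costs.get)


chosen = pick_cheapest(estimated_costs)
print(chosen)  # Traversing
```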

In this case the traversing index would actually perform better than
any other index, as the query engine would only have to post-analyse a
set of 3 nodes.
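
As a rough illustration of what traversing plus post-analysis means
here, the sketch below walks a tiny in-memory tree and checks each
visited node against the `colour = 'red'` condition. It is only a toy:
the nested dictionaries, the node names `a`, `b`, `c` and the helper
functions are all assumptions made for the example, and they say nothing
about how Oak stores or reads nodes.

```python
# Toy model only: nested dicts stand in for the repository tree.
# Node names and helpers are hypothetical, not part of Oak.
def node(props=None, **children):
    """Build a toy node with properties and named child nodes."""
    return {"props": props or {}, "children": children}


# Three nodes under /content/mysite/colour/red, as in the example.
repo = node(content=node(mysite=node(colour=node(red=node(
    a=node({"colour": "red"}),
    b=node({"colour": "red"}),
    c=node({"colour": "blue"}),
)))))


def descendants(root, path):
    """Yield (path, node) for every descendant of the node at `path`."""
    current = root
    for name in path.strip("/").split("/"):
        current = current["children"][name]
    stack = [(path, current)]
    while stack:
        parent_path, parent = stack.pop()
        for name, child in sorted(parent["children"].items()):
            child_path = parent_path + "/" + name
            yield child_path, child
            stack.append((child_path, child))


def traverse_query(root, path, prop, value):
    """Visit every descendant and keep those whose property matches."""
    return [p for p, n in descendants(root, path)
            if n["props"].get(prop) == value]


matches = traverse_query(repo, "/content/mysite/colour/red", "colour", "red")
print(matches)
```

Only the 3 nodes below the path are ever visited, which is why the cost
stays so low in this particular case.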
> - We have to manage a potentially large amount of documents so we need
> more than one node, so is it possible to cluster Lucene?
You can't cluster the built-in Lucene. If you're looking for such a
feature, a remote Solr may be a better solution, but so far I don't
think I've heard of a need for clustering Lucene.

You can have a look at my slides from the talk I gave at the adaptTo
conference last year. They may help shed some light on the query
engine, even if the biggest part of my presentation was the 20 minutes
of Q&A :)

http://adapt.to/2015/en/schedule/scaling-the-query-with-oak.html

HTH
Davide
