On 03/03/2016 14:15, Ancona Francesco wrote:
> ...
> About query Engine
> - Could you explain more in deep what traverse is ? If we have
> understood, Treverse doesn't delegate to index server engine (good in case of
> index server trouble) but is built in component in oak: but where keep
> repository graph to Traverse ? In memory ? on filesystem ? getting data from
> db ?

Traverse will physically traverse the repository in search of the right
data. It's not the most efficient index; it's there mainly to operate when
either none of the other indexes is suitable for the provided query or
there are no other indexes at all.

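
To make that a bit more concrete, here is a minimal Python sketch of what a
traversal conceptually does: visit each descendant of a starting path and
check the query condition on it. It is a toy model and not Oak code. The
`Node` class, the `traverse` function and the in-memory tree are all made up
for illustration; the real thing reads nodes from the repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a repository node (not an Oak class)."""
    properties: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)


def traverse(path: str, node: Node,
             condition: Callable[[Node], bool]) -> Iterator[Tuple[str, Node]]:
    """Visit every descendant of `node` and yield the ones matching `condition`."""
    for name, child in node.children.items():
        child_path = path.rstrip("/") + "/" + name
        if condition(child):
            yield child_path, child
        # Keep descending: every node below the start path gets visited.
        yield from traverse(child_path, child, condition)


# Tiny made-up tree rooted at /content.
root = Node(children={
    "a": Node({"colour": "red"}),
    "b": Node({"colour": "blue"}, children={"c": Node({"colour": "red"})}),
})

matches = [p for p, _ in traverse(
    "/content", root, lambda n: n.properties.get("colour") == "red")]
print(matches)
```

The work is proportional to the number of nodes below the starting path,
whether they match or not.
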
But be careful: that doesn't mean it's intrinsically a bad index. Let's take
the following query as an example

    SELECT * FROM [nt:unstructured] AS a
    WHERE ISDESCENDANTNODE(a, '/content/mysite/colour/red')
    AND colour = 'red'

and say you initialised the repository with the InitialContent, which
provides you some indexes, as I said in a previous email, and on top of that
you have a PropertyIndex on `colour` and no Lucene index. Lucene is quite
powerful with a lot of configuration options.

Overall, the repository has roughly the following node distribution

- 10k nodes nt:unstructured
- 5k nodes with colour red
- 3 nodes under /content/mysite/colour/red

For the above query, if you look at the plans you'll have the following
costs (taking some freedom on numbers):

- NodeTypeIndex: 10000
- PropertyIndex: 3000
- Traversing: 3

In this case the traversing index would actually perform better than any
other index, as the query engine will have to post-analyse a set of only 3
nodes.

> - we have to manage a potentially large amount of documents so we need
> more than a node, so is it possibile clustering lucene ?

You can't cluster the built-in Lucene. If you're looking for such a feature,
a remote Solr may be a better solution, but so far I don't think I've heard
of the need for clustering Lucene.

You can have a look at my slides from the talk I gave at the adaptTo
conference last year. They may help shed some light on the query engine,
even if the biggest part of my presentation was the 20 minutes of Q&A :)

http://adapt.to/2015/en/schedule/scaling-the-query-with-oak.html

HTH
Davide
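
P.S. A rough sketch of the plan selection described above, in Python. It
only illustrates the "lowest estimated cost wins" idea using the
approximate numbers from this mail; `cheapest_plan` is a made-up helper and
the way Oak actually estimates costs is more involved than a dictionary
lookup.

```python
from typing import Dict


def cheapest_plan(costs: Dict[str, float]) -> str:
    """Return the name of the plan with the lowest estimated cost."""
    return min(costs, key=costs.get)


# Illustrative costs only, taken from the example in this mail.
costs = {
    "NodeTypeIndex": 10000,
    "PropertyIndex": 3000,
    "Traversing": 3,
}

print(cheapest_plan(costs))
```

With a path restriction that narrow, traversing is the cheapest candidate.
Drop the ISDESCENDANTNODE condition and the traversing cost would grow with
the size of the whole repository, so one of the indexes would win instead.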