Hello,

I have some problems with the performance of searches in Jackrabbit. I have a simple search, such as: give me all nodes where (prop1=a +prop2=b +prop3=c +prop4=d). For Lucene this is obviously an extremely simple query. Running it against a single Lucene index with millions of docs, where the number of hits is small (< 100), takes a couple of ms.

When I run this kind of query in Jackrabbit, with for example 100,000 nodes, and repeat the search described above, the responses are *slow* (a couple of hundred ms for only 100,000 nodes).

I asked on the Lucene list what impact a MultiSearcher has on performance compared to a single index. (I know we use a CombinedIndexReader and a normal IndexSearcher, but I am quite convinced the problem stays the same.) I got only one answer, which said that a search over, say, 100 indexes would take about 100 times longer. That is roughly what I am experiencing when the number of actual hits is small.

I wrote a separate program to do some testing, such as merging the Jackrabbit indexes into one single index. With the merged index, my queries are fast.

I think the original reasons for multiple indexes are to be able to keep more IndexReaders open and cache the results, and to have easier/faster incremental updating, right? Also see [1]. The thread in [2] between Christoph and Marcel might be closely related to this as well. I did not test RangeQueries, but intuitively they will suffer even more from multiple indexes, because I think each index has to expand the RangeQuery separately.

The problem with the slow DescendantSelfAxisWeight won't be solved by this, though I did make some changes in our code to find out quickly whether a node is a child of some node or not. If people are interested: I have been thinking about this one, and it is a trade-off between fast renaming of a node in Jackrabbit and fast searching for child nodes (write versus read).

Before I try to see what can be changed: do other people experience the same thing? Might this be something that was faster at the time of Lucene 1.9, but is now perhaps outdated? I also found some reports that file system access for multiple indexes is slower, because head movements during reading can be much larger than with a single index (though how the file system cache is managed is of course platform dependent).
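
To make the write-versus-read trade-off for child lookups mentioned above a bit more concrete, here is a minimal sketch. It assumes each node's full path is kept in a sorted map, so that all descendants of a path sit next to each other. It is plain Java, not the Jackrabbit or Lucene API, and the class and method names (PathIndexSketch, descendants, move) are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy path index: fast descendant lookups, expensive moves. Hypothetical, not Jackrabbit code. */
final class PathIndexSketch {
    // path -> node id, sorted so that all descendants of a path are contiguous
    private final TreeMap<String, String> byPath = new TreeMap<>();

    void add(String path, String nodeId) {
        byPath.put(path, nodeId);
    }

    /** Read side: one range scan over the keys that start with ancestor + "/". */
    List<String> descendants(String ancestor) {
        // '0' is the character directly after '/', so [ancestor + "/", ancestor + "0")
        // covers exactly the keys with the prefix ancestor + "/".
        return new ArrayList<>(byPath.subMap(ancestor + "/", ancestor + "0").values());
    }

    /** Write side: a move rewrites the node and every entry below it. Returns the number of entries re-indexed. */
    int move(String oldPath, String newPath) {
        // Copy first: the subMap view is backed by the map we are about to modify.
        Map<String, String> affected =
                new TreeMap<>(byPath.subMap(oldPath + "/", oldPath + "0"));
        String self = byPath.get(oldPath);
        if (self != null) {
            affected.put(oldPath, self);
        }
        // Remove all old entries before adding new ones, so old and new keys cannot clash.
        for (String oldKey : affected.keySet()) {
            byPath.remove(oldKey);
        }
        for (Map.Entry<String, String> e : affected.entrySet()) {
            String suffix = e.getKey().substring(oldPath.length());
            byPath.put(newPath + suffix, e.getValue());
        }
        return affected.size();
    }

    public static void main(String[] args) {
        PathIndexSketch index = new PathIndexSketch();
        index.add("/content/en/news", "n0");
        index.add("/content/en/news/2007/a", "n1");
        index.add("/content/en/news/2007/b", "n2");
        index.add("/content/en/newsletter/x", "n3");
        System.out.println("descendants: " + index.descendants("/content/en/news"));
        System.out.println("re-indexed on move: "
                + index.move("/content/en/news", "/content/en/archive"));
    }
}
```

The lookup cost depends on the size of the subtree that is returned, not on the size of the repository, while the cost of a move or rename grows with the number of nodes below the moved node.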

To start with, I have tried to keep the number of indexes as small as possible by tuning minMergeDocs, volatileIdleTime, maxMergeDocs and mergeFactor. However, whenever my number of documents/nodes grows (though only to 100,000 nodes), my number of indexes grows too.

I think the idea of separate indexes is perfectly valid; I only want to limit the number of indexes to no more than, for example, 10. Each VolatileIndex could be added, when it is persisted, to an already persistent index until that index contains, for example, 100,000 docs. Then, when there are 10 of those, they are all merged and we start creating indexes of 1,000,000 docs, until there are 10 of those, and so on. This would perhaps give the best of both worlds. WDYT?

Do other people experience the same problems? I do not know how other people use Jackrabbit, but the way I want to use it consists mainly of searching. Almost everything I do is a search. Building a website with Jackrabbit as the content store results in queries all over the place. Currently some of them are IMHO too slow, and some aren't even possible within a reasonable time. An example is: give me the 10 most recent articles in /content/en/news//[EMAIL PROTECTED]'news']. This results in a ChildAxisQuery or DescendantSelfAxisQuery, which AFAICS cannot be done over millions of documents. To solve this in my setup, I chose to index the path of a document, although I do realize that moving a node now becomes expensive in terms of re-indexing.

Hope to hear what you think about it,

Regards
Ard

[1] http://jackrabbit.apache.org/doc/arch/operate/query.html#Query
[2] http://www.mail-archive.com/dev@jackrabbit.apache.org/msg06026.html

--
Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel +31 (0)20 5224466
[EMAIL PROTECTED] / [EMAIL PROTECTED] / http://www.hippo.nl
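
P.S. The merge scheme proposed above could be sketched roughly as follows. This is only a toy simulation that counts indexes; it does not touch Lucene or Jackrabbit, and all names (TieredMergeSketch, flushVolatile, indexCount) are hypothetical. It assumes one reading of the proposal: flushed volatile indexes are appended to one growing persistent index, a full index holds 100,000 docs, and whenever 10 full indexes exist on a tier they are merged into one index on the next tier.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of the tiered merge idea; it only counts indexes and holds no documents. */
final class TieredMergeSketch {
    // Hypothetical knobs, taken from the numbers used in the mail.
    static final int MAX_PER_TIER = 10;
    static final long BASE_TIER_DOCS = 100_000L;

    // fullPerTier.get(k) = number of full indexes of BASE_TIER_DOCS * 10^k docs
    private final List<Integer> fullPerTier = new ArrayList<>();
    // docs in the persistent index that is still growing
    private long openDocs = 0;

    /** A volatile index with the given number of docs is persisted. */
    void flushVolatile(long docs) {
        openDocs += docs;
        // Simplification: anything beyond the limit spills over into a fresh growing index.
        while (openDocs >= BASE_TIER_DOCS) {
            openDocs -= BASE_TIER_DOCS;
            addFullIndex(0);
        }
    }

    /** Registers one full index on a tier; the 10th one triggers a merge into the next tier. */
    private void addFullIndex(int tier) {
        while (fullPerTier.size() <= tier) {
            fullPerTier.add(0);
        }
        int count = fullPerTier.get(tier) + 1;
        if (count == MAX_PER_TIER) {
            fullPerTier.set(tier, 0);   // the 10 indexes are merged away ...
            addFullIndex(tier + 1);     // ... into one index on the next tier
        } else {
            fullPerTier.set(tier, count);
        }
    }

    /** Number of indexes a query would have to visit. */
    int indexCount() {
        int n = openDocs > 0 ? 1 : 0;
        for (int c : fullPerTier) {
            n += c;
        }
        return n;
    }

    public static void main(String[] args) {
        TieredMergeSketch sketch = new TieredMergeSketch();
        for (int i = 1; i <= 2_000; i++) {
            sketch.flushVolatile(1_000);
            if (i % 250 == 0) {
                System.out.println(i * 1_000L + " docs -> " + sketch.indexCount() + " indexes");
            }
        }
    }
}
```

Under this reading the number of indexes is not capped at exactly 10 overall: there are at most 9 full indexes per tier plus the one growing index, so the count grows with the logarithm of the number of docs rather than linearly.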