Hi! First of all, happy new year to the whole community!
In one of our workspaces we have a lot of office documents (about 10 million) which are full-text indexed. At the moment some searches take really long, 1 to 3 minutes. The following XPath query is executed:

//element(*, dvt:document)[@dvt:referenceId = 'protid:123' and jcr:contains(jcr:content, 'tirol')] order by jcr:score()

Every node has a property called dvt:referenceId, and in total only a small number of documents carry any given value. So the search

//element(*, dvt:document)[@dvt:referenceId = 'protid:123'] order by jcr:score()

is quite fast. But in combination with the full-text search on the content it is really slow, because the word 'tirol' occurs in a very large number of documents.

I tried to set a limit of 100 on the query [query.setLimit(100)], but that did not make the search noticeably faster. After digging through the code I found that the limit hint is handled in the TopFieldCollector, but the whole time is spent before the collector skips the results. I see a lot of info logs like this:

INFO 2017-01-02 13:13:10,610 org.apache.jackrabbit.core.query.lucene.DocNumberCache.size=107757/100000000, #accesses=428591, #hits=0, #misses=428591, cacheRatio=0%

I think the BooleanScorer of the full-text part is evaluated against all of its hits, and therefore the docid-to-nodeid cache gets filled. Is it possible to pass the limit hint to the full-text scorer? Or maybe I am misunderstanding something, and somebody can give me some hints on how to make the search faster.

Thanks,
Claus
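For reference, this is roughly how the query is created and the limit applied on our side. It is a minimal sketch assuming a live javax.jcr Session (the helper name and parameters are mine, not from our actual code); it will not run without a Jackrabbit repository behind it:

```java
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class LimitedSearch {

    // Runs the combined referenceId + full-text query with a result limit.
    // The statement matches the XPath query quoted above.
    static NodeIterator search(Session session, String refId, String term)
            throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        String stmt = "//element(*, dvt:document)[@dvt:referenceId = '" + refId
                + "' and jcr:contains(jcr:content, '" + term
                + "')] order by jcr:score()";
        Query query = qm.createQuery(stmt, Query.XPATH);
        query.setLimit(100); // the limit hint; in practice this did not speed things up
        QueryResult result = query.execute();
        return result.getNodes();
    }
}
```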
