Hi!

First of all, a happy new year to the whole community!

One of our workspaces contains a lot of office documents (about 10 million) 
which are full-text indexed.
At the moment some searches take really long, around 1-3 minutes.

The following XPath query is executed:

"//element(*, dvt:document)[@dvt:referenceId = 'protid:123' and 
jcr:contains(jcr:content, 'tirol')] order by jcr:score()";

Every node has a property called dvt:referenceId. Overall there are not many 
documents that have this particular value, so the search
"//element(*, dvt:document)[@dvt:referenceId = 'protid:123'] 
order by jcr:score()";
is quite fast.

But combined with the full-text search on the content it is really slow.
The word 'tirol' occurs in a great many documents.

So I tried setting a limit of 100 on the query [query.setLimit(100)], but 
that did not make the search noticeably faster.
After digging through the code I found that the limit hint is handled in 
the TopFieldCollector, but all the time is spent before the collector 
skips any results.
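For reference, the limit is set via the standard JCR 2.0 Query API. A minimal sketch of how the query above is built and limited, assuming an already-authenticated Session (repository setup omitted; the XPath query language is deprecated in JCR 2.0 but still supported by Jackrabbit 2.x):

```java
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class LimitedSearch {

    // Sketch only: assumes a live JCR Session against the workspace.
    static QueryResult search(Session session) throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query query = qm.createQuery(
            "//element(*, dvt:document)[@dvt:referenceId = 'protid:123'"
            + " and jcr:contains(jcr:content, 'tirol')]"
            + " order by jcr:score()",
            Query.XPATH);
        // Limit hint; in Jackrabbit this ends up in the TopFieldCollector,
        // which is where I expected the result set to be cut short.
        query.setLimit(100);
        return query.execute();
    }
}
```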

I see a lot of info log entries like this:
INFO  2017-01-02 13:13:10,610 
org.apache.jackrabbit.core.query.lucene.DocNumberCache.size=107757/100000000, 
#accesses=428591, #hits=0, #misses=428591, cacheRatio=0%

I think the BooleanScorer of the full-text part is evaluated against all of 
its hits, and therefore the docid-to-nodeid cache gets filled.

Is it possible to pass the limit hint down to the full-text scorer? Or maybe 
I am misunderstanding something; could somebody give me some hints on how to 
make the search faster?

thanks
claus
