Hi Adrien,
Thanks for your reply.
I have also tested with UsageTrackingQueryCachingPolicy, but did not
observe a significant change in either latency or throughput.
Since my search requirements are no scoring and search results returned in
random order (the reason for the custom Sort object), I have also tried
writing a custom collector, and observed quite a difference in the latency
figures.
Let me know if this custom collector code has any pitfalls that I might be
missing:
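import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

class RandomOrderCollector extends SimpleCollector
{
    private final int maxHitsRequired;
    private int docBase;
    // Index-wide doc IDs of every match collected so far.
    private final List<Integer> matches = new ArrayList<Integer>();

    public RandomOrderCollector(int maxHitsRequired)
    {
        this.maxHitsRequired = maxHitsRequired;
    }

    @Override
    public boolean needsScores()
    {
        return false; // scoring is not required
    }

    @Override
    public void collect(int doc) throws IOException
    {
        // doc is relative to the current segment; docBase makes it index-wide.
        matches.add(docBase + doc);
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException
    {
        super.doSetNextReader(context);
        this.docBase = context.docBase;
    }

    public List<Integer> getHits()
    {
        // Shuffles all collected matches, then returns the first few.
        Collections.shuffle(matches);
        int limit = Math.min(matches.size(), maxHitsRequired);
        return matches.subList(0, limit);
    }
}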
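Since the collector above keeps every matching doc ID in memory before
shuffling, one possible alternative is reservoir sampling (Algorithm R), which
holds at most maxHitsRequired IDs at any time. This is a different technique
from the collector above, sketched here without any Lucene dependency; the
class name ReservoirSample and its methods are hypothetical, and offer() would
be called from collect() with docBase + doc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: uniform random sample of at most k doc IDs. */
class ReservoirSample {
    private final int k;
    private final Random random;
    private final List<Integer> sample;
    private long seen = 0; // number of doc IDs offered so far

    ReservoirSample(int k, Random random) {
        this.k = k;
        this.random = random;
        this.sample = new ArrayList<>(k);
    }

    /** Offers one doc ID; memory use never exceeds k entries. */
    void offer(int docId) {
        seen++;
        if (sample.size() < k) {
            sample.add(docId); // reservoir not full yet
        } else {
            // Keep the new ID with probability k / seen by overwriting
            // a uniformly chosen slot.
            long j = (long) (random.nextDouble() * seen);
            if (j < k) {
                sample.set((int) j, docId);
            }
        }
    }

    /** Returns a shuffled copy, so the order of the sample is random too. */
    List<Integer> getSample() {
        List<Integer> copy = new ArrayList<>(sample);
        Collections.shuffle(copy, random);
        return copy;
    }
}
```

The trade-off is one random number per match during collection, in exchange
for not materializing and shuffling the full match list at the end.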
Best Regards,
Atul Bisaria
-----Original Message-----
From: Adrien Grand [mailto:[email protected]]
Sent: Wednesday, January 31, 2018 6:33 PM
To: [email protected]
Subject: Re: Increase search performance
Hi Atul,
On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <[email protected]> wrote:
> 1. Using ConstantScoreQuery so that scoring overhead is removed since
> scoring is not required in my search use case. I also use a custom
> Sort object which does not sort by score (see code below).
>
If you don't sort by score, then wrapping with a ConstantScoreQuery won't help
as Lucene will figure out scores are not needed anyway.
> 2. Using query cache
>
>
>
> My understanding is that the query cache would cache query results and
> hence lead to a significant increase in performance. Is this understanding
> correct?
>
It depends on what you mean by performance. If you are optimizing for
worst-case latency, then the query cache might make things worse, because
caching a query requires visiting all of its matches, while query execution
can sometimes just skip over non-interesting matches (e.g. in conjunctions).
However, if you are looking to improve throughput, then the query cache's
default policy of caching queries that look reused usually helps.
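To illustrate the skipping argument, here is a minimal sketch in plain Java
(not Lucene code) of a conjunction over two sorted doc ID lists. The class and
method names are hypothetical. The rare clause drives the iteration and the
common clause is only advanced to each candidate, so most of its entries are
never visited; caching the common clause, by contrast, would mean walking its
whole list.

```java
import java.util.Arrays;

class ConjunctionSkipDemo {
    /**
     * Returns the index of the first element >= target at or after 'from',
     * or a.length if there is none. Assumes 'a' is sorted ascending.
     */
    static int advance(int[] a, int from, int target) {
        int idx = Arrays.binarySearch(a, from, a.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Counts doc IDs present in both sorted, duplicate-free arrays. */
    static int countConjunction(int[] rare, int[] common) {
        int count = 0;
        int j = 0;
        for (int doc : rare) {
            // Jump straight to the candidate instead of scanning 'common'.
            j = advance(common, j, doc);
            if (j == common.length) {
                break; // 'common' is exhausted, no further matches possible
            }
            if (common[j] == doc) {
                count++;
            }
        }
        return count;
    }
}
```

With a three-entry rare list and a common list of several hundred entries,
the loop performs only three advance() calls regardless of how long the common
list is.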
> I am using Lucene version 5.4.1 where query cache seems to be enabled
> by default (https://issues.apache.org/jira/browse/LUCENE-6784), but I
> am not able to see any significant change in search performance.
>
> Here is the code I am testing with:
>
>
>
> DirectoryReader reader = DirectoryReader.open(directory); //using
> MMapDirectory
>
> IndexSearcher searcher = new IndexSearcher(reader); //IndexReader and
> IndexSearcher are created only once
>
> searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
>
Don't do that: it will cache every filter, which usually makes things slower
for the reason mentioned above. I would advise using an instance of
UsageTrackingQueryCachingPolicy instead.