Hi Adrien.

" Can you confirm that the response times that you are reporting both
happen on Lucene 5.2 "
 - no, I upgraded Lucene from 5.0 to 5.2, and only on 5.2 tried to use
CachingWrapperQuery.

"  on a "hot cache" "
 - no, because I have a lot of different queries, and therefore different
filters. I use an LRU cache, so when a filter is not in the cache, I create
a new one. Nevertheless, my timings include the warm-up of both
CachingWrapperFilter and CachingWrapperQuery.
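
The LRU behavior described above can be sketched with plain JDK collections. This is only an illustration, not the actual cache from the application; the key and value types, the capacity, and the miss handling (building a new filter) are all assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch on top of LinkedHashMap. With access-order
// iteration enabled, get() refreshes an entry's recency, and the eldest
// entry (the least recently used one) is evicted once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves entry to the tail
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```

On a cache miss one would build a new cached filter and `put` it, which matches the "when filter is not in cache, I create new one" behavior above.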


 Yesterday I reverted the sources to use filters and added the statistics
you asked for. I will provide stats for one query only:

 - index size
        18M docs

 - response time with caching:

reponseTimeWithCachingMs.samples    | 59487
reponseTimeWithCachingMs.min        | 0
reponseTimeWithCachingMs.avg        | 53.68003093112781
reponseTimeWithCachingMs.max        | 2530
reponseTimeWithCachingMs[-INF; 0)   | 0 (0.0000%; 0.0000%)
reponseTimeWithCachingMs[0; 10)     | 216 (0.3631%; 0.3631%)
reponseTimeWithCachingMs[10; 20)    | 42 (0.0706%; 0.4337%)
reponseTimeWithCachingMs[20; 30)    | 12133 (20.3961%; 20.8298%)
reponseTimeWithCachingMs[30; 40)    | 25749 (43.2851%; 64.1148%)
reponseTimeWithCachingMs[40; 50)    | 9111 (15.3160%; 79.4308%)
reponseTimeWithCachingMs[50; 60)    | 3572 (6.0047%; 85.4355%)
reponseTimeWithCachingMs[60; 70)    | 1565 (2.6308%; 88.0663%)
reponseTimeWithCachingMs[70; 80)    | 1062 (1.7853%; 89.8516%)
reponseTimeWithCachingMs[80; 90)    | 535 (0.8994%; 90.7509%)
reponseTimeWithCachingMs[90; 100)   | 390 (0.6556%; 91.4065%)
reponseTimeWithCachingMs[100; 110)  | 446 (0.7497%; 92.1563%)
reponseTimeWithCachingMs[110; 120)  | 476 (0.8002%; 92.9564%)
reponseTimeWithCachingMs[120; 130)  | 380 (0.6388%; 93.5952%)
reponseTimeWithCachingMs[130; 140)  | 189 (0.3177%; 93.9130%)
reponseTimeWithCachingMs[140; 150)  | 157 (0.2639%; 94.1769%)
reponseTimeWithCachingMs[150; 160)  | 106 (0.1782%; 94.3551%)
reponseTimeWithCachingMs[160; 170)  | 94 (0.1580%; 94.5131%)
reponseTimeWithCachingMs[170; 180)  | 82 (0.1378%; 94.6509%)
reponseTimeWithCachingMs[180; 190)  | 126 (0.2118%; 94.8627%)
reponseTimeWithCachingMs[190; 200)  | 129 (0.2169%; 95.0796%)
reponseTimeWithCachingMs[200; 210)  | 140 (0.2353%; 95.3149%)
reponseTimeWithCachingMs[210; 220)  | 143 (0.2404%; 95.5553%)
reponseTimeWithCachingMs[220; 230)  | 177 (0.2975%; 95.8529%)
reponseTimeWithCachingMs[230; 240)  | 224 (0.3766%; 96.2294%)
reponseTimeWithCachingMs[240; 250)  | 222 (0.3732%; 96.6026%)
reponseTimeWithCachingMs[250; 260)  | 220 (0.3698%; 96.9725%)
reponseTimeWithCachingMs[260; 270)  | 212 (0.3564%; 97.3288%)
reponseTimeWithCachingMs[270; 280)  | 175 (0.2942%; 97.6230%)
reponseTimeWithCachingMs[280; 290)  | 213 (0.3581%; 97.9811%)
reponseTimeWithCachingMs[290; 300)  | 268 (0.4505%; 98.4316%)
reponseTimeWithCachingMs[300; 310)  | 225 (0.3782%; 98.8098%)
reponseTimeWithCachingMs[310; 320)  | 155 (0.2606%; 99.0704%)
reponseTimeWithCachingMs[320; 330)  | 116 (0.1950%; 99.2654%)
reponseTimeWithCachingMs[330; 340)  | 72 (0.1210%; 99.3864%)
reponseTimeWithCachingMs[340; 350)  | 49 (0.0824%; 99.4688%)
reponseTimeWithCachingMs[350; 360)  | 46 (0.0773%; 99.5461%)
reponseTimeWithCachingMs[360; 370)  | 43 (0.0723%; 99.6184%)
...

 - response time without caching:
      (I disabled caching on the live app and gathered a smaller number of samples)

reponseTimeWithoutCachingMs.samples    | 1525
reponseTimeWithoutCachingMs.min        | 8
reponseTimeWithoutCachingMs.avg        | 115.27213114754099
reponseTimeWithoutCachingMs.max        | 893
reponseTimeWithoutCachingMs[-INF; 0)   | 0 (0.0000%; 0.0000%)
reponseTimeWithoutCachingMs[0; 10)     | 2 (0.1311%; 0.1311%)
reponseTimeWithoutCachingMs[10; 20)    | 0 (0.0000%; 0.1311%)
reponseTimeWithoutCachingMs[20; 30)    | 0 (0.0000%; 0.1311%)
reponseTimeWithoutCachingMs[30; 40)    | 29 (1.9016%; 2.0328%)
reponseTimeWithoutCachingMs[40; 50)    | 193 (12.6557%; 14.6885%)
reponseTimeWithoutCachingMs[50; 60)    | 136 (8.9180%; 23.6066%)
reponseTimeWithoutCachingMs[60; 70)    | 110 (7.2131%; 30.8197%)
reponseTimeWithoutCachingMs[70; 80)    | 107 (7.0164%; 37.8361%)
reponseTimeWithoutCachingMs[80; 90)    | 134 (8.7869%; 46.6230%)
reponseTimeWithoutCachingMs[90; 100)   | 111 (7.2787%; 53.9016%)
reponseTimeWithoutCachingMs[100; 110)  | 36 (2.3607%; 56.2623%)
reponseTimeWithoutCachingMs[110; 120)  | 48 (3.1475%; 59.4098%)
reponseTimeWithoutCachingMs[120; 130)  | 37 (2.4262%; 61.8361%)
reponseTimeWithoutCachingMs[130; 140)  | 39 (2.5574%; 64.3934%)
reponseTimeWithoutCachingMs[140; 150)  | 56 (3.6721%; 68.0656%)
reponseTimeWithoutCachingMs[150; 160)  | 50 (3.2787%; 71.3443%)
reponseTimeWithoutCachingMs[160; 170)  | 58 (3.8033%; 75.1475%)
reponseTimeWithoutCachingMs[170; 180)  | 71 (4.6557%; 79.8033%)
reponseTimeWithoutCachingMs[180; 190)  | 69 (4.5246%; 84.3279%)
reponseTimeWithoutCachingMs[190; 200)  | 73 (4.7869%; 89.1148%)
reponseTimeWithoutCachingMs[200; 210)  | 50 (3.2787%; 92.3934%)
reponseTimeWithoutCachingMs[210; 220)  | 44 (2.8852%; 95.2787%)
reponseTimeWithoutCachingMs[220; 230)  | 15 (0.9836%; 96.2623%)
reponseTimeWithoutCachingMs[230; 240)  | 11 (0.7213%; 96.9836%)
reponseTimeWithoutCachingMs[240; 250)  | 12 (0.7869%; 97.7705%)
reponseTimeWithoutCachingMs[250; 260)  | 13 (0.8525%; 98.6229%)
reponseTimeWithoutCachingMs[260; 270)  | 10 (0.6557%; 99.2787%)
reponseTimeWithoutCachingMs[270; 280)  | 6 (0.3934%; 99.6721%)
reponseTimeWithoutCachingMs[280; 290)  | 1 (0.0656%; 99.7377%)
reponseTimeWithoutCachingMs[290; 300)  | 2 (0.1311%; 99.8689%)
reponseTimeWithoutCachingMs[300; 310)  | 1 (0.0656%; 99.9344%)


- how many documents cached filter matches:

filterMatchCount.samples        | 24071
filterMatchCount.min            | 0
filterMatchCount.avg            | 17765.553736861784
filterMatchCount.max            | 438929
filterMatchCount[-INF; 0)       | 0 (0.0000%; 0.0000%)
filterMatchCount[0; 1000)       | 10233 (42.5117%; 42.5117%)
filterMatchCount[1000; 2000)    | 2338 (9.7129%; 52.2247%)
filterMatchCount[2000; 3000)    | 1330 (5.5253%; 57.7500%)
filterMatchCount[3000; 4000)    | 950 (3.9467%; 61.6966%)
filterMatchCount[4000; 5000)    | 642 (2.6671%; 64.3638%)
filterMatchCount[5000; 6000)    | 582 (2.4178%; 66.7816%)
filterMatchCount[6000; 7000)    | 481 (1.9983%; 68.7799%)
filterMatchCount[7000; 8000)    | 494 (2.0523%; 70.8321%)
filterMatchCount[8000; 9000)    | 351 (1.4582%; 72.2903%)
filterMatchCount[9000; 10000)   | 298 (1.2380%; 73.5283%)
filterMatchCount[10000; 11000)  | 255 (1.0594%; 74.5877%)
filterMatchCount[11000; 12000)  | 194 (0.8059%; 75.3936%)
filterMatchCount[12000; 13000)  | 209 (0.8683%; 76.2619%)
filterMatchCount[13000; 14000)  | 191 (0.7935%; 77.0554%)
filterMatchCount[14000; 15000)  | 183 (0.7603%; 77.8156%)
filterMatchCount[15000; 16000)  | 216 (0.8973%; 78.7130%)
filterMatchCount[16000; 17000)  | 146 (0.6065%; 79.3195%)
filterMatchCount[17000; 18000)  | 190 (0.7893%; 80.1088%)
filterMatchCount[18000; 19000)  | 134 (0.5567%; 80.6655%)
filterMatchCount[19000; 20000)  | 174 (0.7229%; 81.3884%)
filterMatchCount[20000; +INF)   | 4480 (18.6116%; 100.0000%)

- how many documents query matches:

queryMatchCount.samples        | 24198
queryMatchCount.min            | 0
queryMatchCount.avg            | 2489.612240681048
queryMatchCount.max            | 75347
queryMatchCount[-INF; 0)       | 0 (0.0000%; 0.0000%)
queryMatchCount[0; 1000)       | 16339 (67.5221%; 67.5221%)
queryMatchCount[1000; 2000)    | 2210 (9.1330%; 76.6551%)
queryMatchCount[2000; 3000)    | 1242 (5.1327%; 81.7878%)
queryMatchCount[3000; 4000)    | 792 (3.2730%; 85.0607%)
queryMatchCount[4000; 5000)    | 482 (1.9919%; 87.0527%)
queryMatchCount[5000; 6000)    | 545 (2.2523%; 89.3049%)
queryMatchCount[6000; 7000)    | 444 (1.8349%; 91.1398%)
queryMatchCount[7000; 8000)    | 248 (1.0249%; 92.1646%)
queryMatchCount[8000; 9000)    | 166 (0.6860%; 92.8506%)
queryMatchCount[9000; 10000)   | 172 (0.7108%; 93.5615%)
queryMatchCount[10000; 11000)  | 91 (0.3761%; 93.9375%)
queryMatchCount[11000; 12000)  | 85 (0.3513%; 94.2888%)
queryMatchCount[12000; 13000)  | 59 (0.2438%; 94.5326%)
queryMatchCount[13000; 14000)  | 58 (0.2397%; 94.7723%)
queryMatchCount[14000; 15000)  | 37 (0.1529%; 94.9252%)
queryMatchCount[15000; 16000)  | 164 (0.6777%; 95.6029%)
queryMatchCount[16000; 17000)  | 36 (0.1488%; 95.7517%)
queryMatchCount[17000; 18000)  | 25 (0.1033%; 95.8550%)
queryMatchCount[18000; 19000)  | 109 (0.4505%; 96.3055%)
queryMatchCount[19000; 20000)  | 25 (0.1033%; 96.4088%)
queryMatchCount[20000; +INF)   | 869 (3.5912%; 100.0000%)


I should mention a few more details.

First, there is a difference between the sample counts in queryMatchCount/
filterMatchCount and reponseTimeWithCachingMs.
This is because obtaining filterMatchCount requires one extra query (
 searcher.search(new MatchAllDocsQuery(), searchFilter,
totalHitCountCollector); ),
and I run it only for the first search, not when executing searchAfter.
reponseTimeWithCachingMs, however, also includes the time spent in
searchAfter.
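
Spelled out, the extra counting query looks roughly like this in Lucene 5.x API terms (using the same identifiers as the snippet above; this is a sketch of the described approach, not the application's actual code):

```java
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TotalHitCountCollector;

// Count how many documents the cached filter matches, independently of the
// main query: run a MatchAllDocsQuery restricted by the filter, collecting
// only the hit count. This is done once per search and skipped for
// subsequent searchAfter pages.
TotalHitCountCollector totalHitCountCollector = new TotalHitCountCollector();
searcher.search(new MatchAllDocsQuery(), searchFilter, totalHitCountCollector);
int filterMatchCount = totalHitCountCollector.getTotalHits();
```

Since `TotalHitCountCollector` never scores or loads documents, the overhead is just iterating the filter's doc IDs, but it is still one extra pass per first search.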

Second, about the stats: I treat queries like "doc.fieldA=1 and
doc.fieldB=2" and "doc.fieldA=3 and doc.fieldB=4" as queries of the same
type and aggregate them into one stat.
That is why there is a big variance in filterMatchCount & queryMatchCount.


And finally, you may notice that the query with a filter takes 53 ms here,
while in my previous letter I reported a much lower number (22 ms for this
query).
This may be because I was too lazy when reverting the sources. Instead of
reverting to BooleanFilter, TermFilter and NumericRangeFilter (as before),
I just built the chain CachingWrapperFilter(QueryWrapperFilter( BooleanQuery
(TermQueries... NumericRangeQueries) )).
A second possible reason for the difference: the 22 ms figure was measured
on Lucene 5.0.
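
For clarity, that wrapping chain would look roughly like this in the Lucene 5.x API. The field names, values, and range bounds are purely illustrative, not the application's real criteria:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

// Rebuild the old cached filter by wrapping a BooleanQuery instead of
// reverting to BooleanFilter/TermFilter/NumericRangeFilter.
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("fieldA", "1")), BooleanClause.Occur.MUST);          // hypothetical term
bq.add(NumericRangeQuery.newIntRange("fieldB", 0, 10, true, true),
       BooleanClause.Occur.MUST);                                                  // hypothetical range

// QueryWrapperFilter adapts the query to the Filter API; CachingWrapperFilter
// caches its doc ID set per segment, as in the pre-upgrade setup.
Filter searchFilter = new CachingWrapperFilter(new QueryWrapperFilter(bq));
```

The extra QueryWrapperFilter indirection is one plausible source of the 53 ms vs. 22 ms gap compared to the original BooleanFilter-based code.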

But nevertheless, on Lucene 5.2 the query takes 95 ms with
CachingWrapperQuery, 53 ms with CachingWrapperFilter, and 115 ms without
any caching.

Huh, I think that's all. I would appreciate any advice.


Regards,
Anton


2015-06-16 0:35 GMT+03:00 Adrien Grand <jpou...@gmail.com>:

> Hi Anton,
>
> Thanks for reporting this. It is indeed a bit surprising given that
> both classes work in a very similar way. Can you confirm that the
> response times that you are reporting both happen on Lucene 5.2 (even
> with CachingWrapperFilter) and on a "hot cache" (so that they don't
> include the generation of the cache entries)?
>
> Could you also provide us with the following information:
>  - what are the response times without caching?
>  - how many docs do you have in the index?
>  - how many documents does the cached filter match?
>  - how many documents do your two popular queries match?
>
> If you have the opportunity to perform some profiling of your query
> execution, this might be useful as well. Thanks!
>
>
> On Mon, Jun 15, 2015 at 10:56 AM, Anton Lyska <profondome...@gmail.com>
> wrote:
> >  Hi,
> >
> >     I have performance issues with CachingWrapperQuery with lucene 5.2
> and
> > dont know how to solve it.
> >
> > Prehistory:
> >     I have a search with different parameters, where some parameters
> > are used more frequently than others. For these params I used filters
> > (and cached them), and my search looked like this:
> >
> >     BooleanFilter bf = new BooleanFilter ();
> >     bf.add(.., BooleanClause.Occur.MUST);    //frequently used params
> here
> >     bf.add(.., BooleanClause.Occur.MUST);
> >
> >     Filter searchFilter = new CachingWrapperFilter(bf);    //and this
> > filter object was reused between queries
> >
> >     BooleanQuery searchQuery = new BooleanQuery();
> >     searchQuery.add(.., BooleanClause.Occur.MUST);
> >     searchQuery.add(.., BooleanClause.Occur.MUST);
> >
> >     hits = searcher.search(searchQuery, searchFilter, count, sort);
> >
> > And everything was great. I looked at two popular queries, and they
> > took 32 ms and 22 ms to execute.
> >
> > Now I upgraded Lucene to 5.2 (from 5.0), and saw that filters are
> > deprecated and it is advisable to use queries and CachingWrapperQuery.
> >
> > So I changed the sources to this:
> >
> >
> >     BooleanQuery bq = new BooleanQuery ();   //now it becomes
> BooleanQuery
> >     bq.add(.., BooleanClause.Occur.MUST);    //frequently used params
> here,
> > same terms as before
> >     bq.add(.., BooleanClause.Occur.MUST);
> >
> >     Query cachedQuery = new CachingWrapperQuery(bq);    //this object is
> > reused between queries
> >
> >     BooleanQuery searchQuery = new BooleanQuery();
> >     searchQuery.add(.., BooleanClause.Occur.MUST);
> >     searchQuery.add(.., BooleanClause.Occur.MUST);
> >     searchQuery.add(cachedQuery, BooleanClause.Occur.FILTER);    //here i
> > add cached part of the query
> >
> >     hits = searcher.search(searchQuery, count, sort);
> >
> > Later I looked at the same queries, and they took 145 ms and 95 ms to execute!
> >
> > Moving to CachingWrapperQuery was the only difference between the
> > queries, so the question is:
> >    How do I use CachingWrapperQuery correctly in my situation, or
> > should I switch back to filters?
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>