Hi Mike, If you have not started a RC yet, I'd like to include some small fixes for bugs that were recently introduced in Lucene: - https://github.com/apache/lucene/pull/11792 - https://github.com/apache/lucene/pull/11794
On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <juliet...@gmail.com> wrote: > Sorry for the confusion. To explain, I use a local ann-benchmarks set-up > that makes use of KnnGraphTester. It is a bit hacky and I accidentally > included the warm-ups in the final timings. So the change to warm-up > explains why we saw different results in our tests. This is great > motivation to solidify and publish my local ann-benchmarks set-up so that > it's not so fragile! > > In summary, with your latest fix the recall and QPS look good to me -- I > don't detect any regression between 9.3 and 9.4. > > Julie > > On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msoko...@gmail.com> > wrote: > >> I'm confused, since warming should not be counted in the timings. Are you >> saying that the recall was affected?? >> >> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <juliet...@gmail.com> >> wrote: >> >>> Using the ann-benchmarks framework, I still saw a similar regression as >>> Mayya between 9.3 and 9.4. I investigated and found it was due to >>> "KnnGraphTester to use KnnVectorQuery" ( >>> https://github.com/apache/lucene/pull/796), specifically the change to >>> the warm-up strategy. If I revert it, the results look exactly as expected. >>> >>> I guess we can keep an eye on the nightly benchmarks tomorrow to >>> double-check there's no drop. It would also be nice to formalize the >>> ann-benchmarks set-up and run it regularly (like we've discussed in >>> https://github.com/apache/lucene/issues/10665). >>> >>> Julie >>> >>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msoko...@gmail.com> >>> wrote: >>> >>>> Thanks for your speedy testing! I am observing comparable latencies >>>> *when the index geometry (ie number of segments)* is unchanged. Agree we >>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts >>>> >>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova >>>> <mayya.sharip...@elastic.co.invalid> wrote: >>>> >>>>> It would be great if you all are able to test again with >>>>>> https://github.com/apache/lucene/pull/11781/ applied >>>>> >>>>> >>>>> >>>>> I ran the ann benchmarks with this change, and was happy to confirm >>>>> that in my test recall with this PR is the same as in 9.3 branch, although >>>>> QPS is lower, but we can investigate QPSs later. >>>>> >>>>> glove-100-angular M:16 efConstruction:100 >>>>> 9.3 recall9.3 QPSthis PR recallthis PR QPS >>>>> n_cands=10 0.620 2745.933 0.620 1675.500 >>>>> n_cands=20 0.680 2288.665 0.680 1512.744 >>>>> n_cands=40 0.746 1770.243 0.746 1040.240 >>>>> n_cands=80 0.809 1226.738 0.809 695.236 >>>>> n_cands=120 0.843 948.908 0.843 525.914 >>>>> n_cands=200 0.878 671.781 0.878 351.529 >>>>> n_cands=400 0.918 392.265 0.918 207.854 >>>>> n_cands=600 0.937 282.403 0.937 144.311 >>>>> n_cands=800 0.949 214.620 0.949 116.875 >>>>> >>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msoko...@gmail.com> >>>>> wrote: >>>>> >>>>>> OK, I think I was wrong about latency having increased due to a change >>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce. >>>>>> There does seem to be a slight vector search latency increase, >>>>>> possibly noise, but maybe due to the branching introduced to check >>>>>> whether to do byte vs float operations? It would be a little >>>>>> surprising if that were the case given the small number of branchings >>>>>> compared to the number of multiplies in dot-product though. >>>>>> >>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msoko...@gmail.com> >>>>>> wrote: >>>>>> > >>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing >>>>>> > recall. I had introduced some bugs in the diversity checks (that may >>>>>> > have partially canceled each other out? it's hard to understand what >>>>>> > was happening in the buggy case) and posted a fix today >>>>>> > https://github.com/apache/lucene/pull/11781. >>>>>> > >>>>>> > There are a couple of other outstanding issues I found while doing a >>>>>> > bunch of git bisecting; >>>>>> > >>>>>> > I think we might have introduced a (test-only) performance >>>>>> regression >>>>>> > in KnnGraphTester >>>>>> > >>>>>> > We may still be over-allocating the size of NeighborArray, leading >>>>>> to >>>>>> > excessive segmentation? I wonder if we could avoid dynamic >>>>>> > re-allocation there, and simply initialize every neighbor array to >>>>>> > 2*M+1. >>>>>> > >>>>>> > While I don't think these are necessarily blockers, given that we >>>>>> are >>>>>> > releasing HNSW improvements, it seems like we should address these, >>>>>> > especially as the build-graph-on-index is one of the things we are >>>>>> > releasing, and it is (may be?) impacted. I will see if I can put up >>>>>> a >>>>>> > patch or two. >>>>>> > >>>>>> > It would be great if you all are able to test again with >>>>>> > https://github.com/apache/lucene/pull/11781/ applied >>>>>> > >>>>>> > -Mike >>>>>> > >>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpou...@gmail.com> >>>>>> wrote: >>>>>> > > >>>>>> > > Thank you Mike, I just backported the change. >>>>>> > > >>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov < >>>>>> msoko...@gmail.com> wrote: >>>>>> > >> >>>>>> > >> it looks like a small bug fix, we have had on main (and 9.x?) >>>>>> for a >>>>>> > >> while now and no test failures showed up, I guess. Should be OK >>>>>> to >>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the >>>>>> latest, >>>>>> > >> but if you can do the backport today or tomorrow, that's fine by >>>>>> me. >>>>>> > >> >>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpou...@gmail.com> >>>>>> wrote: >>>>>> > >> > >>>>>> > >> > Mike, I'm tempted to backport >>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a >>>>>> bugfix that looks pretty safe to me. What do you think? >>>>>> > >> > >>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova < >>>>>> mayya.sharip...@elastic.co.invalid> wrote: >>>>>> > >> >> >>>>>> > >> >> Thanks for running more tests, Michael. >>>>>> > >> >> It is encouraging that you saw a similar performance between >>>>>> 9.3 and 9.4. I will also run more tests with different parameters. >>>>>> > >> >> >>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov < >>>>>> msoko...@gmail.com> wrote: >>>>>> > >> >>> >>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as >>>>>> above, only >>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment >>>>>> in both >>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar; >>>>>> within noise >>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index >>>>>> was written >>>>>> > >> >>> using CFS: >>>>>> > >> >>> >>>>>> > >> >>> 9.4: >>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth >>>>>> visited index ms >>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200 >>>>>> 891402 1.00 >>>>>> > >> >>> post-filter >>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06 >>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec >>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06 >>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem >>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06 >>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex >>>>>> > >> >>> >>>>>> > >> >>> 9.3: >>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth >>>>>> visited index ms >>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033 >>>>>> 977043 >>>>>> > >> >>> rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe >>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs >>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si >>>>>> > >> >>> >>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov < >>>>>> msoko...@gmail.com> wrote: >>>>>> > >> >>> > >>>>>> > >> >>> > I ran another test. I thought I had increased the RAM >>>>>> buffer size to >>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in >>>>>> the index that >>>>>> > >> >>> > was created. And looking at the infostream I see: >>>>>> > >> >>> > >>>>>> > >> >>> > dir=MMapDirectory@ >>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index >>>>>> > >> >>> > lockFactory=org\ >>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20 >>>>>> > >> >>> > index= >>>>>> > >> >>> > version=9.4.0 >>>>>> > >> >>> > >>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer >>>>>> > >> >>> > ramBufferSizeMB=8000.0 >>>>>> > >> >>> > maxBufferedDocs=-1 >>>>>> > >> >>> > ... >>>>>> > >> >>> > perThreadHardLimitMB=1945 >>>>>> > >> >>> > ... >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush >>>>>> postings as >>>>>> > >> >>> > segment _6 numDocs=555373 >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to >>>>>> write norms >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to >>>>>> write docValues >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to >>>>>> write points >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to >>>>>> write vectors >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to >>>>>> finish stored fields >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to >>>>>> write postings >>>>>> > >> >>> > and finish vectors >>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to >>>>>> write fieldInfos >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment >>>>>> has 0 deleted docs >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment >>>>>> has 0 >>>>>> > >> >>> > soft-deleted docs >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment >>>>>> has no >>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]: >>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, >>>>>> _6.fdt, _6_\ >>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx, >>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex] >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed >>>>>> codec=Lucene94 >>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: >>>>>> segment=_6 >>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \ >>>>>> > >> >>> > docs/MB=521.134 >>>>>> > >> >>> > >>>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering >>>>>> the >>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but >>>>>> the docs say: >>>>>> > >> >>> > >>>>>> > >> >>> > /** >>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per >>>>>> thread triggering >>>>>> > >> >>> > a forced flush if exceeded. A >>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully >>>>>> flushed once it >>>>>> > >> >>> > exceeds this limit even if the >>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. >>>>>> This is a >>>>>> > >> >>> > safety limit to prevent a {@link >>>>>> > >> >>> > * DocumentsWriterPerThread} from address space >>>>>> exhaustion due to >>>>>> > >> >>> > its internal 32 bit signed >>>>>> > >> >>> > * integer based memory addressing. The given value must >>>>>> be less >>>>>> > >> >>> > that 2GB (2048MB) >>>>>> > >> >>> > * >>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB >>>>>> > >> >>> > */ >>>>>> > >> >>> > >>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov < >>>>>> msoko...@gmail.com> wrote: >>>>>> > >> >>> > > >>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to >>>>>> wrestle this to >>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was >>>>>> the default >>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not >>>>>> specifically set heap >>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer >>>>>> to see if I >>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same >>>>>> test settings. I >>>>>> > >> >>> > > see you used a much smaller M (16), which should have >>>>>> produced quite >>>>>> > >> >>> > > small graphs, and I agree, should have been a single >>>>>> segment. Were you >>>>>> > >> >>> > > able to verify the number of segments? >>>>>> > >> >>> > > >>>>>> > >> >>> > > Agree that decrease in recall is not expected when more >>>>>> segments are produced. >>>>>> > >> >>> > > >>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova >>>>>> > >> >>> > > <mayya.sharip...@elastic.co.invalid> wrote: >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > Hello Michael, >>>>>> > >> >>> > > > Thanks for checking. >>>>>> > >> >>> > > > Sorry for bringing this up again. >>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene >>>>>> 9.4 release and leaving the performance investigations for later. >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used >>>>>> for your tests? What was the heap memory and the size of the RAM buffer >>>>>> for >>>>>> indexing? >>>>>> > >> >>> > > > Usually, when we have multiple segments, recall should >>>>>> increase, not decrease. But I agree that with multiple segments we can >>>>>> see >>>>>> a big drop in QPS. >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > Here is my investigation with detailed output of the >>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a >>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single >>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in >>>>>> 9.4. >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > Thank you. >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > >>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward < >>>>>> romseyg...@gmail.com> wrote: >>>>>> > >> >>> > > >> >>>>>> > >> >>> > > >> Done. Thanks! >>>>>> > >> >>> > > >> >>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov < >>>>>> msoko...@gmail.com> wrote: >>>>>> > >> >>> > > >> > >>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch; >>>>>> seems pretty safe, >>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks! >>>>>> > >> >>> > > >> > >>>>>> > >> >>> > > >> > Mike >>>>>> > >> >>> > > >> > >>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward < >>>>>> romseyg...@gmail.com> wrote: >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> Hi Mike, >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> I’ve opened >>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR >>>>>> for a problem with interval queries. Am I OK to port this to the 9.4 >>>>>> branch? >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> Thanks, Alan >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov < >>>>>> msoko...@gmail.com> wrote: >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> NOTICE: >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions >>>>>> updated to 9.5 on stable branch. >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> Please observe the normal rules: >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> * No new features may be committed to the branch. >>>>>> > >> >>> > > >> >> * Documentation patches, build patches and serious >>>>>> bug fixes may be >>>>>> > >> >>> > > >> >> committed to the branch. However, you should >>>>>> submit all patches you >>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the >>>>>> chance to review >>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind >>>>>> that it is our >>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as >>>>>> possible. >>>>>> > >> >>> > > >> >> * All patches that are intended for the branch >>>>>> should first be committed >>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable >>>>>> branch, and then into >>>>>> > >> >>> > > >> >> the current release branch. >>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development >>>>>> may continue as usual. >>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to the >>>>>> unstable branch >>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect, >>>>>> think twice: can't the >>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug >>>>>> fixes into the branch >>>>>> > >> >>> > > >> >> may become more difficult. >>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and >>>>>> priority "Blocker" will delay >>>>>> > >> >>> > > >> >> a release candidate build. >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> >>>>>> --------------------------------------------------------------------- >>>>>> > >> >>> > > >> >> To unsubscribe, e-mail: >>>>>> dev-unsubscr...@lucene.apache.org >>>>>> > >> >>> > > >> >> For additional commands, e-mail: >>>>>> dev-h...@lucene.apache.org >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> >> >>>>>> > >> >>> > > >> > >>>>>> > >> >>> > > >> > >>>>>> --------------------------------------------------------------------- >>>>>> > >> >>> > > >> > To unsubscribe, e-mail: >>>>>> dev-unsubscr...@lucene.apache.org >>>>>> > >> >>> > > >> > For additional commands, e-mail: >>>>>> dev-h...@lucene.apache.org >>>>>> > >> >>> > > >> > >>>>>> > >> >>> > > >> >>>>>> > >> >>> > > >> >>>>>> > >> >>> > > >> >>>>>> --------------------------------------------------------------------- >>>>>> > >> >>> > > >> To unsubscribe, e-mail: >>>>>> dev-unsubscr...@lucene.apache.org >>>>>> > >> >>> > > >> For additional commands, e-mail: >>>>>> dev-h...@lucene.apache.org >>>>>> > >> >>> > > >> >>>>>> > >> >>> >>>>>> > >> >>> >>>>>> --------------------------------------------------------------------- >>>>>> > >> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>>>>> > >> >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>>>>> > >> >>> >>>>>> > >> > >>>>>> > >> > >>>>>> > >> > -- >>>>>> > >> > Adrien >>>>>> > >> >>>>>> > >> >>>>>> --------------------------------------------------------------------- >>>>>> > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>>>>> > >> For additional commands, e-mail: dev-h...@lucene.apache.org >>>>>> > >> >>>>>> > > >>>>>> > > >>>>>> > > -- >>>>>> > > Adrien >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org >>>>>> >>>>>> -- Adrien