>
> It would be great if you all are able to test again with
> https://github.com/apache/lucene/pull/11781/ applied



I ran the ann-benchmarks with this change and was happy to confirm that
in my test the recall with this PR is the same as on the 9.3 branch. QPS
is lower, but we can investigate that later.

glove-100-angular M:16 efConstruction:100
               9.3 recall    9.3 QPS   this PR recall   this PR QPS
n_cands=10        0.620     2745.933        0.620         1675.500
n_cands=20        0.680     2288.665        0.680         1512.744
n_cands=40        0.746     1770.243        0.746         1040.240
n_cands=80        0.809     1226.738        0.809          695.236
n_cands=120       0.843      948.908        0.843          525.914
n_cands=200       0.878      671.781        0.878          351.529
n_cands=400       0.918      392.265        0.918          207.854
n_cands=600       0.937      282.403        0.937          144.311
n_cands=800       0.949      214.620        0.949          116.875
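
(For anyone reproducing these numbers outside of ann-benchmarks: recall at
each n_cands is, roughly, the fraction of the exact top-10 neighbors that
the HNSW search returns when asked to gather n_cands candidates. A minimal
Lucene sketch of one such measurement; the index path, field name, and
ground-truth helper here are placeholders:)

import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class RecallCheck {
  public static void main(String[] args) throws Exception {
    int topK = 10, nCands = 80; // one of the n_cands settings above
    try (DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(Paths.get("glove-100.index")))) { // hypothetical index path
      IndexSearcher searcher = new IndexSearcher(reader);
      float[] query = new float[100]; // one glove-100 query vector (zeros as a stand-in)
      // gather nCands candidates from the HNSW graph, keep the best topK
      TopDocs approx = searcher.search(new KnnVectorQuery("vector", query, nCands), topK);
      Set<Integer> truth = exactTopK(); // exact neighbors, e.g. precomputed by brute force
      int hits = 0;
      for (ScoreDoc sd : approx.scoreDocs) {
        if (truth.contains(sd.doc)) hits++;
      }
      System.out.println("recall@" + topK + " = " + hits / (double) topK);
    }
  }

  // placeholder: brute-force scan, or the ground truth shipped with the dataset
  static Set<Integer> exactTopK() { return new HashSet<>(); }
}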

On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msoko...@gmail.com> wrote:

> OK, I think I was wrong about latency having increased due to a change
> in KnnGraphTester -- I did some testing there and couldn't reproduce.
> There does seem to be a slight vector search latency increase,
> possibly noise, but maybe due to the branching introduced to check
> whether to do byte vs float operations? It would be a little
> surprising if that were the case given the small number of branchings
> compared to the number of multiplies in dot-product though.
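>
> (For illustration, the branching in question amounts to a single
> byte-vs-float dispatch per dot-product call, outside the tight multiply
> loop; a hypothetical sketch of that shape, not the actual Lucene code:)
>
> class VectorOps {
>   static float dotProduct(float[] a, float[] b, byte[] a8, byte[] b8, boolean bytes) {
>     if (bytes) { // the one added branch, taken once per vector pair
>       int sum = 0;
>       for (int i = 0; i < a8.length; i++) {
>         sum += a8[i] * b8[i]; // ~dimension multiplies follow that single branch
>       }
>       return sum;
>     }
>     float sum = 0;
>     for (int i = 0; i < a.length; i++) {
>       sum += a[i] * b[i];
>     }
>     return sum;
>   }
> }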
>
> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
> >
> > Thanks for the deep-dive Julie. I was able to reproduce the changing
> > recall. I had introduced some bugs in the diversity checks (that may
> > have partially canceled each other out? it's hard to understand what
> > was happening in the buggy case) and posted a fix today
> > https://github.com/apache/lucene/pull/11781.
> >
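> > (For context, the diversity check is the HNSW neighbor-selection
> > heuristic: a candidate is kept only if it is closer to the node being
> > linked than to any already-selected neighbor. A hedged sketch of the
> > criterion, not the patched Lucene code:)
> >
> > class DiversityCheck {
> >   // distances assumed here; with similarity scores the comparison flips
> >   static boolean diverse(float candToBase, float[] candToSelected) {
> >     for (float d : candToSelected) {
> >       if (d < candToBase) {
> >         return false; // dominated by an existing neighbor
> >       }
> >     }
> >     return true;
> >   }
> > }
> >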
> > There are a couple of other outstanding issues I found while doing a
> > bunch of git bisecting:
> >
> > I think we might have introduced a (test-only) performance regression
> > in KnnGraphTester.
> >
> > We may still be over-allocating the size of NeighborArray, leading to
> > excessive segmentation? I wonder if we could avoid dynamic
> > re-allocation there and simply initialize every neighbor array to
> > 2*M+1.
> >
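> > (A sketch of that fixed-size idea, assuming the bottom layer allows at
> > most 2*M connections and the extra slot holds a candidate awaiting the
> > diversity check; this sizing is an assumption, and it is not the actual
> > NeighborArray code:)
> >
> > class FixedNeighborArray {
> >   final int[] nodes;
> >   final float[] scores;
> >   int size;
> >
> >   FixedNeighborArray(int m) {
> >     nodes = new int[2 * m + 1];   // sized once, never re-allocated
> >     scores = new float[2 * m + 1];
> >   }
> >
> >   void add(int node, float score) {
> >     assert size < nodes.length : "caller must prune before exceeding 2*M+1";
> >     nodes[size] = node;
> >     scores[size] = score;
> >     size++;
> >   }
> > }
> >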
> > While I don't think these are necessarily blockers, given that we are
> > releasing HNSW improvements it seems like we should address these,
> > especially as build-graph-on-index is one of the things we are
> > releasing and it is (or may be?) impacted. I will see if I can put up
> > a patch or two.
> >
> > It would be great if you all are able to test again with
> > https://github.com/apache/lucene/pull/11781/ applied
> >
> > -Mike
> >
> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > Thank you Mike, I just backported the change.
> > >
> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
> > >>
> > >> It looks like a small bug fix that we have had on main (and 9.x?) for
> > >> a while now, and no test failures showed up, I guess. Should be OK to
> > >> port. I plan to cut artifacts this weekend, or Monday at the latest,
> > >> but if you can do the backport today or tomorrow, that's fine by me.
> > >>
> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpou...@gmail.com>
> wrote:
> > >> >
> > >> > Mike, I'm tempted to backport
> > >> > https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
> > >> > bugfix that looks pretty safe to me. What do you think?
> > >> >
> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
> mayya.sharip...@elastic.co.invalid> wrote:
> > >> >>
> > >> >> Thanks for running more tests, Michael.
> > >> >> It is encouraging that you saw similar performance between 9.3 and
> > >> >> 9.4. I will also run more tests with different parameters.
> > >> >>
> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
> msoko...@gmail.com> wrote:
> > >> >>>
> > >> >>> As a follow-up, I ran a test using the same parameters as above,
> > >> >>> only changing M=200 to M=16. This did result in a single segment
> > >> >>> in both cases (9.3, 9.4) and the performance was pretty similar;
> > >> >>> within noise I think. The main difference I saw was that the 9.3
> > >> >>> index was written using CFS:
> > >> >>>
> > >> >>> 9.4:
> > >> >>> recall  latency nDoc     fanout  maxConn beamWidth  visited  index ms
> > >> >>> 0.755   1.36    1000000  100     16      100        200      891402    1.00  post-filter
> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06 _0_Lucene94HnswVectorsFormat_0.vec
> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06 _0_Lucene94HnswVectorsFormat_0.vem
> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06 _0_Lucene94HnswVectorsFormat_0.vex
> > >> >>>
> > >> >>> 9.3:
> > >> >>> recall  latency nDoc     fanout  maxConn beamWidth  visited  index ms
> > >> >>> 0.775   1.34    1000000  100     16      100        4033     977043
> > >> >>> -rw-r--r-- 1 sokolovm amazon  297 Sep 13 13:26 _0.cfe
> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
> > >> >>> -rw-r--r-- 1 sokolovm amazon  340 Sep 13 13:26 _0.si
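> > >> >>>
> > >> >>> (If the CFS difference matters for an apples-to-apples comparison,
> > >> >>> it can be pinned on both sides with standard IndexWriterConfig
> > >> >>> knobs; a sketch, assuming the default TieredMergePolicy:)
> > >> >>>
> > >> >>> import org.apache.lucene.index.IndexWriterConfig;
> > >> >>> import org.apache.lucene.index.TieredMergePolicy;
> > >> >>>
> > >> >>> class NoCfsConfig {
> > >> >>>   static IndexWriterConfig newConfig() {
> > >> >>>     IndexWriterConfig iwc = new IndexWriterConfig();
> > >> >>>     iwc.setUseCompoundFile(false);  // don't pack flushed segments into .cfs
> > >> >>>     TieredMergePolicy mp = new TieredMergePolicy();
> > >> >>>     mp.setNoCFSRatio(0.0);          // keep merged segments non-compound too
> > >> >>>     iwc.setMergePolicy(mp);
> > >> >>>     return iwc;
> > >> >>>   }
> > >> >>> }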
> > >> >>>
> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
> msoko...@gmail.com> wrote:
> > >> >>> >
> > >> >>> > I ran another test. I thought I had increased the RAM buffer
> > >> >>> > size to 8G and heap to 16G. However I still see two segments in
> > >> >>> > the index that was created. And looking at the infostream I see:
> > >> >>> >
> > >> >>> > dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
> > >> >>> > lockFactory=org.apache.lucene.store.NativeFSLockFactory@4466af20
> > >> >>> > index=
> > >> >>> > version=9.4.0
> > >> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> > >> >>> > ramBufferSizeMB=8000.0
> > >> >>> > maxBufferedDocs=-1
> > >> >>> > ...
> > >> >>> > perThreadHardLimitMB=1945
> > >> >>> > ...
> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as segment _6 numDocs=555373
> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings and finish vectors
> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0 soft-deleted docs
> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no vectors; no norms; no docValues; no prox; freqs
> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]: flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx, _6_Lucene94HnswVectorsFormat_0.vex]
> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6 ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB docs/MB=521.134
> > >> >>> >
> > >> >>> > so I think it's this perThreadHardLimit that is triggering the
> > >> >>> > flushes? TBH this isn't something I had seen before; but the docs
> > >> >>> > say:
> > >> >>> >
> > >> >>> >   /**
> > >> >>> >    * Expert: Sets the maximum memory consumption per thread
> > >> >>> >    * triggering a forced flush if exceeded. A {@link
> > >> >>> >    * DocumentsWriterPerThread} is forcefully flushed once it exceeds
> > >> >>> >    * this limit even if the {@link #getRAMBufferSizeMB()} has not
> > >> >>> >    * been exceeded. This is a safety limit to prevent a {@link
> > >> >>> >    * DocumentsWriterPerThread} from address space exhaustion due to
> > >> >>> >    * its internal 32 bit signed integer based memory addressing. The
> > >> >>> >    * given value must be less than 2GB (2048MB).
> > >> >>> >    *
> > >> >>> >    * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
> > >> >>> >    */
> > >> >>> >
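> > >> >>> > (So one way to test that hypothesis is to raise the per-thread cap
> > >> >>> > toward its maximum; a sketch with a hypothetical index path, keeping
> > >> >>> > in mind the limit must stay below 2048MB:)
> > >> >>> >
> > >> >>> > import java.nio.file.Paths;
> > >> >>> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > >> >>> > import org.apache.lucene.index.IndexWriter;
> > >> >>> > import org.apache.lucene.index.IndexWriterConfig;
> > >> >>> > import org.apache.lucene.store.FSDirectory;
> > >> >>> >
> > >> >>> > class BigBufferIndex {
> > >> >>> >   public static void main(String[] args) throws Exception {
> > >> >>> >     IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
> > >> >>> >     iwc.setRAMBufferSizeMB(8000);          // the global buffer I meant to rely on
> > >> >>> >     iwc.setRAMPerThreadHardLimitMB(2047);  // the per-DWPT cap that forced the flush
> > >> >>> >     try (IndexWriter writer = new IndexWriter(
> > >> >>> >         FSDirectory.open(Paths.get("glove-100.index")), iwc)) {  // hypothetical path
> > >> >>> >       // indexing loop omitted
> > >> >>> >     }
> > >> >>> >   }
> > >> >>> > }
> > >> >>> >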
> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
> msoko...@gmail.com> wrote:
> > >> >>> > >
> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to wrestle this
> > >> >>> > > to the ground for sure. In the test I ran, the RAM buffer was the
> > >> >>> > > default checked in, which is, weirdly, 1994MB. I did not
> > >> >>> > > specifically set the heap size. I used maxConn/M=200. I'll try
> > >> >>> > > with a larger buffer to see if I can get 9.4 to produce a single
> > >> >>> > > segment for the same test settings. I see you used a much smaller
> > >> >>> > > M (16), which should have produced quite small graphs and, I
> > >> >>> > > agree, should have been a single segment. Were you able to verify
> > >> >>> > > the number of segments?
> > >> >>> > >
> > >> >>> > > Agreed that a decrease in recall is not expected when more
> > >> >>> > > segments are produced.
> > >> >>> > >
> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
> > >> >>> > > <mayya.sharip...@elastic.co.invalid> wrote:
> > >> >>> > > >
> > >> >>> > > > Hello Michael,
> > >> >>> > > > Thanks for checking.
> > >> >>> > > > Sorry for bringing this up again.
> > >> >>> > > > First of all, I am ok with proceeding with the Lucene 9.4
> > >> >>> > > > release and leaving the performance investigations for later.
> > >> >>> > > >
> > >> >>> > > > I am interested in what maxConn/M value you used for your
> > >> >>> > > > tests. What was the heap memory and the size of the RAM buffer
> > >> >>> > > > for indexing? Usually, when we have multiple segments, recall
> > >> >>> > > > should increase, not decrease. But I agree that with multiple
> > >> >>> > > > segments we can see a big drop in QPS.
> > >> >>> > > >
> > >> >>> > > > Here is my investigation with detailed output of the
> > >> >>> > > > performance difference between the 9.3 and 9.4 releases. In my
> > >> >>> > > > tests I used a large indexing buffer (2GB) and a large heap
> > >> >>> > > > (5GB) to end up with a single segment for both the 9.3 and 9.4
> > >> >>> > > > tests, but I still see a big drop in QPS in 9.4.
> > >> >>> > > >
> > >> >>> > > > Thank you.
> > >> >>> > > >
> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
> romseyg...@gmail.com> wrote:
> > >> >>> > > >>
> > >> >>> > > >> Done.  Thanks!
> > >> >>> > > >>
> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
> msoko...@gmail.com> wrote:
> > >> >>> > > >> >
> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch; seems
> > >> >>> > > >> > pretty safe, please go ahead and port to 9.4. Thanks!
> > >> >>> > > >> >
> > >> >>> > > >> > Mike
> > >> >>> > > >> >
> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
> romseyg...@gmail.com> wrote:
> > >> >>> > > >> >>
> > >> >>> > > >> >> Hi Mike,
> > >> >>> > > >> >>
> > >> >>> > > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as
> > >> >>> > > >> >> a small bug fix PR for a problem with interval queries. Am
> > >> >>> > > >> >> I OK to port this to the 9.4 branch?
> > >> >>> > > >> >>
> > >> >>> > > >> >> Thanks, Alan
> > >> >>> > > >> >>
> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
> msoko...@gmail.com> wrote:
> > >> >>> > > >> >>
> > >> >>> > > >> >> NOTICE:
> > >> >>> > > >> >>
> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions updated to 9.5
> > >> >>> > > >> >> on the stable branch.
> > >> >>> > > >> >>
> > >> >>> > > >> >> Please observe the normal rules:
> > >> >>> > > >> >>
> > >> >>> > > >> >> * No new features may be committed to the branch.
> > >> >>> > > >> >> * Documentation patches, build patches and serious bug
> > >> >>> > > >> >> fixes may be committed to the branch. However, you should
> > >> >>> > > >> >> submit all patches you want to commit to Jira first to give
> > >> >>> > > >> >> others the chance to review and possibly vote against the
> > >> >>> > > >> >> patch. Keep in mind that it is our main intention to keep
> > >> >>> > > >> >> the branch as stable as possible.
> > >> >>> > > >> >> * All patches that are intended for the branch should first
> > >> >>> > > >> >> be committed to the unstable branch, merged into the stable
> > >> >>> > > >> >> branch, and then into the current release branch.
> > >> >>> > > >> >> * Normal unstable and stable branch development may
> > >> >>> > > >> >> continue as usual. However, if you plan to commit a big
> > >> >>> > > >> >> change to the unstable branch while the branch feature
> > >> >>> > > >> >> freeze is in effect, think twice: can't the addition wait a
> > >> >>> > > >> >> couple more days? Merges of bug fixes into the branch may
> > >> >>> > > >> >> become more difficult.
> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and priority
> > >> >>> > > >> >> "Blocker" will delay a release candidate build.
> > >> >
> > >> >
> > >> > --
> > >> > Adrien
> > >>
> > >
> > >
> > > --
> > > Adrien
>
