Re: [JENKINS] Lucene-9.x-Linux (64bit/openj9/jdk-17.0.5) - Build # 9891 - Unstable!
openj9. Does not reproduce for me.

On Thu, Apr 20, 2023 at 4:50 AM Policeman Jenkins Server wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/9891/
> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:metronome
>
> 1 tests failed.
> FAILED: org.apache.lucene.misc.document.TestLazyDocument.testLazy
>
> Error Message:
> java.lang.ArrayIndexOutOfBoundsException
>
> Stack Trace:
> java.lang.ArrayIndexOutOfBoundsException
>   at __randomizedtesting.SeedInfo.seed([620750232F3F24D0:53DE5D719F63F07B]:0)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.BytesStore$2.readByte(BytesStore.java:459)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.store.DataInput.readVLong(DataInput.java:224)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.store.DataInput.readVLong(DataInput.java:209)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readUnpackedNodeTarget(FST.java:1119)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readArc(FST.java:1385)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readArcByDirectAddressing(FST.java:1292)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readNextRealArc(FST.java:1325)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readFirstRealTargetArc(FST.java:1178)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readNextArc(FST.java:1202)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FSTEnum.doNext(FSTEnum.java:112)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.BytesRefFSTEnum.next(BytesRefFSTEnum.java:55)
>   at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$PendingBlock.append(OrdsBlockTreeTermsWriter.java:452)
>   at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$PendingBlock.compileIndex(OrdsBlockTreeTermsWriter.java:419)
>   at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$TermsWriter.writeBlocks(OrdsBlockTreeTermsWriter.java:616)
>   at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$TermsWriter.finish(OrdsBlockTreeTermsWriter.java:917)
>   at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter.write(OrdsBlockTreeTermsWriter.java:258)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:172)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:135)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:310)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:392)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:492)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:671)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4194)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4168)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1322)
>   at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1362)
>   at app//org.apache.lucene.misc.document.TestLazyDocument.createIndex(TestLazyDocument.java:82)
>   at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at java.base@17.0.5/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base@17.0.5/java.lang.reflect.Method.invoke(Method.java:568)
>   at app/randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>   at app/randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
>   at
Should IndexWriter.flush return seqNo?
Hi folks,

I just realized that while "commit" returns the sequence number of the latest event committed to the index, "flush" still returns nothing. Since the two are essentially the same apart from the fsync, is there any specific reason why "flush" does not return it as well?

Best,
Patrick
Re: HNSW questions
That class is intended for use by the Lucene index writer - it's not designed as a general-purpose class for re-use outside that context. And IndexWriter writes documents to disk in bulk.

On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis wrote:
>
> Thanks, Michael!
>
> Looking at the paper by Malkov and Yashunin, it looks like the algorithm
> allows for building the HNSW graph incrementally. Why does our
> implementation require specifying all the vectors up front to
> HnswGraphBuilder.create?
>
> On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov wrote:
>>
>> These vector values have internal buffers they use to return the vectors. In
>> order to compare two vectors we need to use two independent sources so that
>> one doesn't overwrite this internal state when fetching the second vector.
>>
>> Sorry I forgot the second question and can't see it on my phone. Brb
>>
>> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote:
>>>
>>> Hi all, a couple questions on how HNSW works:
>>>
>>> 1. What is driving the requirement for two copies of the input vectors? It
>>> looks like the RAVV implementations do shallow copies, so the vector from A
>>> is the same that would be returned by B. What am I missing?
>>>
>>> 2. What is the intended behavior when adding identical vectors to a HNSW?
>>> It looks like when I supply 10 identical vectors, they all get added to the
>>> graph, but when I search for the nearest neighbors, I only get one of them
>>> in the result set.
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene 9.6 release
Yes, thanks Alan!

On Wed, Apr 19, 2023 at 3:41 PM Michael Wechner wrote:
>
> +1
>
> Thanks!
>
> Michael
>
> On 19.04.23 at 18:09, Benjamin Trent wrote:
>
> +1 !
>
> You rock Alan!
>
> On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera wrote:
>>
>> +1
>>
>> Thanks Alan!
>>
>> On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward wrote:
>>>
>>> Hi all,
>>>
>>> It’s been a while since our last release, and we have a number of nice
>>> improvements and optimisations sitting in the 9x branch. I propose that we
>>> start the process for a 9.6 release, and I will volunteer to be the release
>>> manager. If there are no objections, I will cut a release branch one week
>>> today, April 26th.
>>>
>>> - Alan
Re: HNSW questions
Thanks, Michael!

Looking at the paper by Malkov and Yashunin, it looks like the algorithm allows for building the HNSW graph incrementally. Why does our implementation require specifying all the vectors up front to HnswGraphBuilder.create?

On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov wrote:
> These vector values have internal buffers they use to return the vectors.
> In order to compare two vectors we need to use two independent sources so
> that one doesn't overwrite this internal state when fetching the second
> vector.
>
> Sorry I forgot the second question and can't see it on my phone. Brb
>
> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote:
>
>> Hi all, a couple questions on how HNSW works:
>>
>> 1. What is driving the requirement for two copies of the input vectors?
>> It looks like the RAVV implementations do shallow copies, so the vector
>> from A is the same that would be returned by B. What am I missing?
>>
>> 2. What is the intended behavior when adding identical vectors to a
>> HNSW? It looks like when I supply 10 identical vectors, they all get added
>> to the graph, but when I search for the nearest neighbors, I only get one
>> of them in the result set.
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>>
>

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
Re: Lucene 9.6 release
+1

Thanks!

Michael

On 19.04.23 at 18:09, Benjamin Trent wrote:

+1 !

You rock Alan!

On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera wrote:

+1

Thanks Alan!

On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward wrote:

Hi all,

It’s been a while since our last release, and we have a number of nice improvements and optimisations sitting in the 9x branch. I propose that we start the process for a 9.6 release, and I will volunteer to be the release manager. If there are no objections, I will cut a release branch one week today, April 26th.

- Alan
Re: Lucene 9.6 release
+1 !

You rock Alan!

On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera wrote:
> +1
>
> Thanks Alan!
>
> On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward wrote:
>
>> Hi all,
>>
>> It’s been a while since our last release, and we have a number of nice
>> improvements and optimisations sitting in the 9x branch. I propose that we
>> start the process for a 9.6 release, and I will volunteer to be the release
>> manager. If there are no objections, I will cut a release branch one week
>> today, April 26th.
>>
>> - Alan
Re: Lucene 9.6 release
+1

Thanks Alan!

On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward wrote:
> Hi all,
>
> It’s been a while since our last release, and we have a number of nice
> improvements and optimisations sitting in the 9x branch. I propose that we
> start the process for a 9.6 release, and I will volunteer to be the release
> manager. If there are no objections, I will cut a release branch one week
> today, April 26th.
>
> - Alan
Lucene 9.6 release
Hi all,

It’s been a while since our last release, and we have a number of nice improvements and optimisations sitting in the 9x branch. I propose that we start the process for a 9.6 release, and I will volunteer to be the release manager. If there are no objections, I will cut a release branch one week today, April 26th.

- Alan
Re: HNSW questions
Oh, identical vectors. Basically unsupported. If you create a large index filled with identical vectors, it leads to pathological behavior. This seems to be a weakness in the algorithm. If you have any idea how to improve that, it would be welcome. But in real-world scenarios, it doesn't seem to arise?

On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote:
> Hi all, a couple questions on how HNSW works:
>
> 1. What is driving the requirement for two copies of the input vectors?
> It looks like the RAVV implementations do shallow copies, so the vector
> from A is the same that would be returned by B. What am I missing?
>
> 2. What is the intended behavior when adding identical vectors to a HNSW?
> It looks like when I supply 10 identical vectors, they all get added to the
> graph, but when I search for the nearest neighbors, I only get one of them
> in the result set.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
Re: HNSW questions
These vector values have internal buffers they use to return the vectors. In order to compare two vectors we need to use two independent sources so that one doesn't overwrite this internal state when fetching the second vector.

Sorry I forgot the second question and can't see it on my phone. Brb

On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote:
> Hi all, a couple questions on how HNSW works:
>
> 1. What is driving the requirement for two copies of the input vectors?
> It looks like the RAVV implementations do shallow copies, so the vector
> from A is the same that would be returned by B. What am I missing?
>
> 2. What is the intended behavior when adding identical vectors to a HNSW?
> It looks like when I supply 10 identical vectors, they all get added to the
> graph, but when I search for the nearest neighbors, I only get one of them
> in the result set.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
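
The buffer-reuse problem described in this reply can be illustrated with a small, self-contained Java sketch. It does not use Lucene's classes: BufferedVectors, vectorValue, copy, and the two distance helpers are hypothetical names made up for illustration, and the only assumption is that a vector source hands back the same internal array on every call, while a copy shares the stored data but gets its own buffer.

```java
// Toy illustration only; none of these names are Lucene APIs.
final class SharedBufferDemo {

    // Squared Euclidean distance between two equal-length vectors.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Reads both vectors from ONE source: the second read overwrites the
    // buffer that the first read returned, so a and b are the same array.
    static float distanceOneSource(BufferedVectors v, int i, int j) {
        float[] a = v.vectorValue(i);
        float[] b = v.vectorValue(j);
        return squaredDistance(a, b);
    }

    // Reads each vector from its own source, so the buffers are independent.
    static float distanceTwoSources(BufferedVectors v1, BufferedVectors v2, int i, int j) {
        float[] a = v1.vectorValue(i);
        float[] b = v2.vectorValue(j);
        return squaredDistance(a, b);
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}};
        BufferedVectors source = new BufferedVectors(data);
        System.out.println(distanceOneSource(source, 0, 1));                 // prints 0.0
        System.out.println(distanceTwoSources(source, source.copy(), 0, 1)); // prints 25.0
    }
}

// Hypothetical vector source that reuses a single internal buffer.
final class BufferedVectors {
    private final float[][] stored; // backing data, shared between copies
    private final float[] scratch;  // internal buffer returned by every call

    BufferedVectors(float[][] stored) {
        this.stored = stored;
        this.scratch = new float[stored[0].length];
    }

    // Copies the requested vector into the shared buffer and returns that buffer.
    float[] vectorValue(int ord) {
        System.arraycopy(stored[ord], 0, scratch, 0, scratch.length);
        return scratch;
    }

    // Shallow copy: same backing data, but a fresh internal buffer.
    BufferedVectors copy() {
        return new BufferedVectors(stored);
    }
}
```

With a single source the computed distance is 0.0, because both references point at the same overwritten buffer. With two sources the distance is the expected 25.0. In this toy model the second copy is cheap, since only the scratch buffer is duplicated and not the vectors themselves.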