Re: [JENKINS] Lucene-9.x-Linux (64bit/openj9/jdk-17.0.5) - Build # 9891 - Unstable!

2023-04-19 Thread Dawid Weiss
openj9. Does not reproduce for me.

On Thu, Apr 20, 2023 at 4:50 AM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/9891/
> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:metronome
>
> 1 tests failed.
> FAILED:  org.apache.lucene.misc.document.TestLazyDocument.testLazy
>
> Error Message:
> java.lang.ArrayIndexOutOfBoundsException
>
> Stack Trace:
> java.lang.ArrayIndexOutOfBoundsException
> at __randomizedtesting.SeedInfo.seed([620750232F3F24D0:53DE5D719F63F07B]:0)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.BytesStore$2.readByte(BytesStore.java:459)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.store.DataInput.readVLong(DataInput.java:224)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.store.DataInput.readVLong(DataInput.java:209)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readUnpackedNodeTarget(FST.java:1119)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readArc(FST.java:1385)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readArcByDirectAddressing(FST.java:1292)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readNextRealArc(FST.java:1325)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readFirstRealTargetArc(FST.java:1178)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FST.readNextArc(FST.java:1202)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.FSTEnum.doNext(FSTEnum.java:112)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.util.fst.BytesRefFSTEnum.next(BytesRefFSTEnum.java:55)
> at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$PendingBlock.append(OrdsBlockTreeTermsWriter.java:452)
> at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$PendingBlock.compileIndex(OrdsBlockTreeTermsWriter.java:419)
> at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$TermsWriter.writeBlocks(OrdsBlockTreeTermsWriter.java:616)
> at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter$TermsWriter.finish(OrdsBlockTreeTermsWriter.java:917)
> at app/org.apache.lucene.codecs@9.6.0-SNAPSHOT/org.apache.lucene.codecs.blocktreeords.OrdsBlockTreeTermsWriter.write(OrdsBlockTreeTermsWriter.java:258)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:172)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:135)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:310)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:392)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:492)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:671)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4194)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4168)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1322)
> at app/org.apache.lucene.core@9.6.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1362)
> at app//org.apache.lucene.misc.document.TestLazyDocument.createIndex(TestLazyDocument.java:82)
> at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base@17.0.5/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@17.0.5/java.lang.reflect.Method.invoke(Method.java:568)
> at app/randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at app/randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
> at 
> 

Should IndexWriter.flush return seqNo?

2023-04-19 Thread Patrick Zhai
Hi folks,
I just realized that while "commit" returns the sequence number that
represents the latest event committed to the index, "flush" still
returns nothing. Since the two are essentially the same except for the
fsync, I wonder whether there's any specific reason not to return it?

Best
Patrick


Re: HNSW questions

2023-04-19 Thread Michael Sokolov
That class is intended for use by the Lucene index writer - it's not
designed as a general purpose class for re-use outside that context.
And IndexWriter writes documents to disk in bulk.

On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis  wrote:
>
> Thanks, Michael!
>
> Looking at the paper by Malkov and Yashunin, it looks like the algorithm 
> allows for building the hnsw graph incrementally.  Why does our 
> implementation require specifying all the vectors up front to 
> HnswGraphBuilder.create?
>
> On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov  wrote:
>>
>> These vector values have internal buffers they use to return the vectors. In 
>> order to compare two vectors we need to use two independent sources so that 
>> one doesn't overwrite this internal state when fetching the second vector.
>>
>> Sorry I forgot the second question and can't see it on my phone. Brb
>>
>> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis  wrote:
>>>
>>> Hi all, a couple questions on how HNSW works:
>>>
>>> 1. What is driving the requirement for two copies of the input vectors?  It 
>>> looks like the RAVV implementations do shallow copies, so the vector from A 
>>> is the same that would be returned by B.  What am I missing?
>>>
>>> 2. What is the intended behavior when adding identical vectors to a HNSW?  
>>> It looks like when I supply 10 identical vectors, they all get added to the 
>>> graph, but when I search for the nearest neighbors, I only get one of them 
>>> in the result set.
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene 9.6 release

2023-04-19 Thread Michael Sokolov
Yes, thanks Alan!

On Wed, Apr 19, 2023 at 3:41 PM Michael Wechner
 wrote:
>
> +1
>
> Thanks!
>
> Michael
>
> On 19.04.23 at 18:09, Benjamin Trent wrote:
>
> +1 !
>
> You rock Alan!
>
> On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera  wrote:
>>
>> +1
>>
>> Thanks Alan!
>>
>> On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward  wrote:
>>>
>>> Hi all,
>>>
>>> It’s been a while since our last release, and we have a number of nice 
>>> improvements and optimisations sitting in the 9x branch.  I propose that we 
>>> start the process for a 9.6 release, and I will volunteer to be the release 
>>> manager.  If there are no objections, I will cut a release branch one week 
>>> today, April 26th.
>>>
>>> - Alan
>>>
>




Re: HNSW questions

2023-04-19 Thread Jonathan Ellis
Thanks, Michael!

Looking at the paper by Malkov and Yashunin, it looks like the algorithm
allows for building the hnsw graph incrementally.  Why does our
implementation require specifying all the vectors up front to
HnswGraphBuilder.create?

On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov  wrote:

> These vector values have internal buffers they use to return the vectors.
> In order to compare two vectors we need to use two independent sources so
> that one doesn't overwrite this internal state when fetching the second
> vector.
>
> Sorry I forgot the second question and can't see it on my phone. Brb
>
> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis  wrote:
>
>> Hi all, a couple questions on how HNSW works:
>>
>> 1. What is driving the requirement for two copies of the input vectors?
>> It looks like the RAVV implementations do shallow copies, so the vector
>> from A is the same that would be returned by B.  What am I missing?
>>
>> 2. What is the intended behavior when adding identical vectors to a
>> HNSW?  It looks like when I supply 10 identical vectors, they all get added
>> to the graph, but when I search for the nearest neighbors, I only get one
>> of them in the result set.
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Lucene 9.6 release

2023-04-19 Thread Michael Wechner

+1

Thanks!

Michael

On 19.04.23 at 18:09, Benjamin Trent wrote:

+1 !

You rock Alan!

On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera  wrote:

+1

Thanks Alan!

On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward
 wrote:

Hi all,

It’s been a while since our last release, and we have a number
of nice improvements and optimisations sitting in the 9x
branch.  I propose that we start the process for a 9.6
release, and I will volunteer to be the release manager.  If
there are no objections, I will cut a release branch one week
today, April 26th.

- Alan



Re: Lucene 9.6 release

2023-04-19 Thread Benjamin Trent
+1 !

You rock Alan!

On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera  wrote:

> +1
>
> Thanks Alan!
>
> On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward 
> wrote:
>
>> Hi all,
>>
>> It’s been a while since our last release, and we have a number of nice
>> improvements and optimisations sitting in the 9x branch.  I propose that we
>> start the process for a 9.6 release, and I will volunteer to be the release
>> manager.  If there are no objections, I will cut a release branch one week
>> today, April 26th.
>>
>> - Alan
>>
>>


Re: Lucene 9.6 release

2023-04-19 Thread Ignacio Vera
+1

Thanks Alan!

On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward  wrote:

> Hi all,
>
> It’s been a while since our last release, and we have a number of nice
> improvements and optimisations sitting in the 9x branch.  I propose that we
> start the process for a 9.6 release, and I will volunteer to be the release
> manager.  If there are no objections, I will cut a release branch one week
> today, April 26th.
>
> - Alan
>
>


Lucene 9.6 release

2023-04-19 Thread Alan Woodward
Hi all,

It’s been a while since our last release, and we have a number of nice 
improvements and optimisations sitting in the 9x branch.  I propose that we 
start the process for a 9.6 release, and I will volunteer to be the release 
manager.  If there are no objections, I will cut a release branch one week 
today, April 26th.

- Alan



Re: HNSW questions

2023-04-19 Thread Michael Sokolov
Oh, identical vectors. Those are basically unsupported. If you create a
large index filled with identical vectors, it leads to pathological
behavior. This seems to be a weakness in the algorithm. If you have any
ideas on how to improve that, they would be welcome. But in real-world
scenarios, it doesn't seem to arise?

On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis  wrote:

> Hi all, a couple questions on how HNSW works:
>
> 1. What is driving the requirement for two copies of the input vectors?
> It looks like the RAVV implementations do shallow copies, so the vector
> from A is the same that would be returned by B.  What am I missing?
>
> 2. What is the intended behavior when adding identical vectors to a HNSW?
> It looks like when I supply 10 identical vectors, they all get added to the
> graph, but when I search for the nearest neighbors, I only get one of them
> in the result set.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: HNSW questions

2023-04-19 Thread Michael Sokolov
These vector values have internal buffers they use to return the vectors.
In order to compare two vectors we need to use two independent sources so
that one doesn't overwrite this internal state when fetching the second
vector.
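
To illustrate the buffer-reuse point, here is a minimal, self-contained
sketch. It does not use Lucene's actual classes: BufferedVectors,
vectorValue, copy and dot are hypothetical stand-ins, and the only
assumption carried over from the explanation above is that a source
returns the same internal array on every call.

```java
public class SharedBufferDemo {

    /** Hypothetical vector source that reuses one internal buffer (not a Lucene class). */
    static final class BufferedVectors {
        private final float[][] data;   // backing vectors, shared between copies
        private final float[] scratch;  // internal buffer, private to this instance

        BufferedVectors(float[][] data) {
            this.data = data;
            this.scratch = new float[data[0].length];
        }

        /** Copies vector ord into the internal buffer and returns that same buffer. */
        float[] vectorValue(int ord) {
            System.arraycopy(data[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }

        /** Shallow copy: shares the backing data but owns a fresh buffer. */
        BufferedVectors copy() {
            return new BufferedVectors(data);
        }
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] data = {{1f, 0f}, {0f, 1f}};
        BufferedVectors a = new BufferedVectors(data);
        BufferedVectors b = a.copy();

        // One source: the second fetch overwrites the buffer returned by the first,
        // so vector 1 is effectively compared with itself.
        float singleSource = dot(a.vectorValue(0), a.vectorValue(1));

        // Two sources: each keeps its own buffer, so the operands stay distinct.
        float twoSources = dot(a.vectorValue(0), b.vectorValue(1));

        System.out.println("single source: " + singleSource); // 1.0 (wrong)
        System.out.println("two sources:   " + twoSources);   // 0.0 (correct)
    }
}
```

In this sketch the copy is shallow with respect to the stored vectors,
yet the two instances still cannot clobber each other, because each one
owns its scratch buffer.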

Sorry I forgot the second question and can't see it on my phone. Brb

On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis  wrote:

> Hi all, a couple questions on how HNSW works:
>
> 1. What is driving the requirement for two copies of the input vectors?
> It looks like the RAVV implementations do shallow copies, so the vector
> from A is the same that would be returned by B.  What am I missing?
>
> 2. What is the intended behavior when adding identical vectors to a HNSW?
> It looks like when I supply 10 identical vectors, they all get added to the
> graph, but when I search for the nearest neighbors, I only get one of them
> in the result set.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>