ChrisHegarty commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1611648786
I ran @mayya-sharipova's exact same benchmark/test on my machine. Here are
the results.
### Test environment
- Dataset: [nq](https://huggingface.co/datasets/BeIR/nq) dataset with the `text` field
embedded with the OpenAI `text-embedding-ada-002` model, 1536 dims
- [KnnGraphTester](https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java)
- maxConn: 16, beamWidthIndex: 100
- Linux, x86_64, 11th Gen Intel Core i5-11400 @ 2.60GHz, AVX-512
- JDK 20.0.1
### Result
| Panama (bits) | dims | time (secs) |
| ------------- | ---- | ----------- |
| No            | 1024 | 3136        |
| Yes (512)     | 1536 | 2633        |
So the run with 1536 dims and Panama enabled (512-bit, AVX-512) was 503 secs
(~16%) faster than the run with 1024 dims and Panama not enabled.
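For anyone who wants to re-check the arithmetic, here is a minimal sketch that derives the 503 secs and ~16% figures from the two times in the table. The class and method names are made up for illustration and are not part of Lucene or KnnGraphTester.

```java
// Re-derives the headline numbers from the result table.
// The two times are copied from the table; nothing here is newly measured.
final class IndexingSpeedup {

    /** Seconds saved by the faster run relative to the baseline. */
    static long secondsSaved(long baselineSecs, long candidateSecs) {
        return baselineSecs - candidateSecs;
    }

    /** Reduction in wall-clock time, as a percentage of the baseline. */
    static double percentFaster(long baselineSecs, long candidateSecs) {
        return 100.0 * (baselineSecs - candidateSecs) / baselineSecs;
    }

    public static void main(String[] args) {
        long noPanama1024 = 3136; // Test1: 1024 dims, Panama not enabled
        long panama1536 = 2633;   // Test2: 1536 dims, Panama enabled (512 bits)

        System.out.printf("saved=%ds, faster by %.1f%%%n",
            secondsSaved(noPanama1024, panama1536),
            percentFaster(noPanama1024, panama1536));
    }
}
```

The percentage is taken relative to the slower (No Panama) run, which is how the ~16% above is computed.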
### Test1
- Lucene 9.7.0
- Panama Vector API **not** enabled
- vector dims=1024 (OpenAI vectors truncated to the first 1024 dims)
- Results: Indexed 2680961 documents in 3136s
<details>
<summary>Details</summary>
```
davekim$ time /home/chegar/binaries/jdk-20.0.1/bin/java \
-cp lucene-9.7.0/modules/*:/home/chegar/git/lucene/lucene/core/build/classes/java/test \
-Xmx16g -Xms16g \
org.apache.lucene.util.hnsw.KnnGraphTester \
-dim 1024 \
-ndoc 2680961 \
-reindex \
-docs vector_search-open_ai_vectors_1024-vectors_dims1024.bin \
-maxConn 16 \
-beamWidthIndex 100
creating index in
vector_search-open_ai_vectors_1024-vectors_dims1024.bin-16-100.index
Jun 28, 2023 1:44:34 PM
org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 20; to disable start with
-Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
MS 0 [2023-06-28T12:44:34.340877459Z; main]: initDynamicDefaults
maxThreadCount=4 maxMergeCount=9
IFD 0 [2023-06-28T12:44:34.355786340Z; main]: init: current segments file is
"segments";
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@7e9a5fbe
IFD 0 [2023-06-28T12:44:34.358595927Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T12:44:34.359321686Z; main]: now checkpoint "" [0 segments
; isCommit = false]
IFD 0 [2023-06-28T12:44:34.359380405Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T12:44:34.360606701Z; main]: 0 ms to checkpoint
IW 0 [2023-06-28T12:44:34.361060247Z; main]: init: create=true reader=null
IW 0 [2023-06-28T12:44:34.367050357Z; main]:
dir=MMapDirectory@/home/chegar/git/lucene-vector-bench/vector_search-open_ai_vectors_1024-vectors_dims1024.bin-16-100.index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@46238e3f
index=
version=9.7.0
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
ramBufferSizeMB=1994.0
maxBufferedDocs=-1
mergedSegmentWarmer=null
delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
commit=null
openMode=CREATE
similarity=org.apache.lucene.search.similarities.BM25Similarity
mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9,
ioThrottle=true
codec=Lucene95
infoStream=org.apache.lucene.util.PrintStreamInfoStream
mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0,
maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0
readerPooling=true
perThreadHardLimitMB=1945
useCompoundFile=false
commitOnClose=true
indexSort=null
checkPendingFlushOnUpdate=true
softDeletesField=null
maxFullFlushMergeWaitMillis=500
leafSorter=null
eventListener=org.apache.lucene.index.IndexWriterEventListener$1@6c9f5c0d
writer=org.apache.lucene.index.IndexWriter@de3a06f
IW 0 [2023-06-28T12:44:34.367221110Z; main]:
MMapDirectory.UNMAP_SUPPORTED=true
Jun 28, 2023 1:44:34 PM org.apache.lucene.util.VectorUtilProvider lookup
WARNING: Java vector incubator module is not readable. For optimal vector
performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
DWPT 0 [2023-06-28T12:53:31.591056430Z; main]: flush postings as segment _0
numDocs=460521
IW 0 [2023-06-28T12:53:31.591842896Z; main]: 0 ms to write norms
IW 0 [2023-06-28T12:53:31.592260907Z; main]: 0 ms to write docValues
IW 0 [2023-06-28T12:53:31.592370750Z; main]: 0 ms to write points
IW 0 [2023-06-28T12:53:32.987321518Z; main]: 1394 ms to write vectors
IW 0 [2023-06-28T12:53:32.997512174Z; main]: 10 ms to finish stored fields
IW 0 [2023-06-28T12:53:32.997693539Z; main]: 0 ms to write postings and
finish vectors
IW 0 [2023-06-28T12:53:32.998159715Z; main]: 0 ms to write fieldInfos
DWPT 0 [2023-06-28T12:53:32.999257618Z; main]: new segment has 0 deleted docs
DWPT 0 [2023-06-28T12:53:32.999365945Z; main]: new segment has 0
soft-deleted docs
DWPT 0 [2023-06-28T12:53:33.000456314Z; main]: new segment has no vectors;
no norms; no docValues; no prox; freqs
DWPT 0 [2023-06-28T12:53:33.000586334Z; main]:
flushedFiles=[_0_Lucene95HnswVectorsFormat_0.vem, _0.fdm,
_0_Lucene95HnswVectorsFormat_0.vec, _0.fdx, _0_Lucene95HnswVectorsFormat_0.vex,
_0.fdt, _0.fnm]
DWPT 0 [2023-06-28T12:53:33.000673681Z; main]: flushed codec=Lucene95
DWPT 0 [2023-06-28T12:53:33.001725500Z; main]: flushed: segment=_0
ramUsed=1,945.017 MB newFlushedSize=1,824.658 MB docs/MB=252.388
DWPT 0 [2023-06-28T12:53:33.002919290Z; main]: flush time 1412.932331 ms
IW 0 [2023-06-28T12:53:33.004048349Z; main]: publishFlushedSegment
seg-private updates=null
IW 0 [2023-06-28T12:53:33.004702334Z; main]: publishFlushedSegment
_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0, source=flush, timestamp=1687956813001,
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation,
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=1qx5zulv7rcv8o0t4f62zfjjz
BD 0 [2023-06-28T12:53:33.006074639Z; main]: finished packet delGen=1 now
completedDelGen=1
IW 0 [2023-06-28T12:53:33.007517182Z; main]: publish sets newSegment
delGen=1 seg=_0(9.7.0):C460521:[diagnostics={os.arch=amd64,
os.version=6.2.0-23-generic, lucene.version=9.7.0, source=flush,
timestamp=1687956813001, java.runtime.version=20.0.1+9-29, java.vendor=Oracle
Corporation,
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=1qx5zulv7rcv8o0t4f62zfjjz
IFD 0 [2023-06-28T12:53:33.007718974Z; main]: now checkpoint
"_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0, source=flush, timestamp=1687956813001,
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation,
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=1qx5zulv7rcv8o0t4f62zfjk0" [1 segments ; isCommit = false]
IFD 0 [2023-06-28T12:53:33.008114732Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T12:53:33.008168685Z; main]: 0 ms to checkpoint
MP 0 [2023-06-28T12:53:33.010309939Z; main]:
seg=_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0, source=flush, timestamp=1687956813001,
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation,
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=1qx5zulv7rcv8o0t4f62zfjk0 size=1824.659 MB
MP 0 [2023-06-28T12:53:33.010610953Z; main]: findMerges: 1 segments
MP ...
Indexed 2680961 documents in 3136s
```
</details>
### Test2
- Lucene 9.7 with `FloatVectorValues.MAX_DIMENSIONS` patched to 2048
- Panama Vector API **enabled**, `preferredBitSize=512`
- vector dims=1536
- Results: Indexed 2680961 documents in 2633s
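
The only launch difference that turns Panama on is `--add-modules=jdk.incubator.vector`. As a rough way to confirm from inside a JVM whether that flag took effect, the sketch below asks the boot module layer whether the incubator module was resolved. It is an assumption-level illustration using the standard `ModuleLayer` API, not a copy of Lucene's own provider lookup; the class name is hypothetical.

```java
// Checks whether a module was resolved into the boot layer of the running JVM.
// Illustrative only: Lucene performs its own lookup internally; this is not that code.
final class VectorModuleCheck {

    /** Name of the JDK incubator module referred to by the log messages. */
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    /** Returns true if the named module is present in the boot module layer. */
    static boolean isModuleResolved(String moduleName) {
        return ModuleLayer.boot().findModule(moduleName).isPresent();
    }

    public static void main(String[] args) {
        // Expected to differ depending on whether the JVM was started with
        // --add-modules=jdk.incubator.vector, since incubator modules are
        // not resolved by default.
        System.out.println(VECTOR_MODULE + " resolved: " + isModuleResolved(VECTOR_MODULE));
    }
}
```

Run with and without `--add-modules=jdk.incubator.vector` to see the result change; this should line up with the `VectorUtilProvider` WARNING in Test1 versus the `VectorUtilPanamaProvider` INFO line in Test2.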
<details>
<summary>Details</summary>
```
davekim$ time /home/chegar/binaries/jdk-20.0.1/bin/java \
--add-modules=jdk.incubator.vector \
-cp /home/chegar/git/lucene/lucene/core/build/libs/lucene-core-9.7.0-SNAPSHOT.jar:lucene-9.7.0/modules/*:/home/chegar/git/lucene/lucene/core/build/classes/java/test \
-Xmx16g -Xms16g \
org.apache.lucene.util.hnsw.KnnGraphTester \
-dim 1536 \
-ndoc 2680961 \
-reindex \
-docs vector_search-open_ai_vectors-vectors.bin \
-maxConn 16 \
-beamWidthIndex 100
WARNING: Using incubator modules: jdk.incubator.vector
creating index in vector_search-open_ai_vectors-vectors.bin-16-100.index
Jun 28, 2023 3:18:08 PM
org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 20; to disable start with
-Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
MS 0 [2023-06-28T14:18:08.783226914Z; main]: initDynamicDefaults
maxThreadCount=4 maxMergeCount=9
IFD 0 [2023-06-28T14:18:08.798094830Z; main]: init: current segments file is
"segments";
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1efee8e7
IFD 0 [2023-06-28T14:18:08.800639373Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T14:18:08.801349082Z; main]: now checkpoint "" [0 segments
; isCommit = false]
IFD 0 [2023-06-28T14:18:08.801461676Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T14:18:08.802987862Z; main]: 0 ms to checkpoint
IW 0 [2023-06-28T14:18:08.803265302Z; main]: init: create=true reader=null
IW 0 [2023-06-28T14:18:08.809406650Z; main]:
dir=MMapDirectory@/home/chegar/git/lucene-vector-bench/vector_search-open_ai_vectors-vectors.bin-16-100.index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@1dd02175
index=
version=9.7.0
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
ramBufferSizeMB=1994.0
maxBufferedDocs=-1
mergedSegmentWarmer=null
delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
commit=null
openMode=CREATE
similarity=org.apache.lucene.search.similarities.BM25Similarity
mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9,
ioThrottle=true
codec=Lucene95
infoStream=org.apache.lucene.util.PrintStreamInfoStream
mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0,
maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0
readerPooling=true
perThreadHardLimitMB=1945
useCompoundFile=false
commitOnClose=true
indexSort=null
checkPendingFlushOnUpdate=true
softDeletesField=null
maxFullFlushMergeWaitMillis=500
leafSorter=null
eventListener=org.apache.lucene.index.IndexWriterEventListener$1@3d3fcdb0
writer=org.apache.lucene.index.IndexWriter@641147d0
IW 0 [2023-06-28T14:18:08.809591811Z; main]:
MMapDirectory.UNMAP_SUPPORTED=true
Jun 28, 2023 3:18:08 PM org.apache.lucene.util.VectorUtilPanamaProvider
<init>
INFO: Java vector incubator API enabled; uses preferredBitSize=512
DWPT 0 [2023-06-28T14:23:17.927393364Z; main]: flush postings as segment _0
numDocs=314897
IW 0 [2023-06-28T14:23:17.928214793Z; main]: 0 ms to write norms
IW 0 [2023-06-28T14:23:17.928486805Z; main]: 0 ms to write docValues
IW 0 [2023-06-28T14:23:17.928593869Z; main]: 0 ms to write points
IW 0 [2023-06-28T14:23:19.282981254Z; main]: 1354 ms to write vectors
IW 0 [2023-06-28T14:23:19.290000600Z; main]: 6 ms to finish stored fields
IW 0 [2023-06-28T14:23:19.290178853Z; main]: 0 ms to write postings and
finish vectors
IW 0 [2023-06-28T14:23:19.290669001Z; main]: 0 ms to write fieldInfos
DWPT 0 [2023-06-28T14:23:19.291053701Z; main]: new segment has 0 deleted docs
DWPT 0 [2023-06-28T14:23:19.291129515Z; main]: new segment has 0
soft-deleted docs
DWPT 0 [2023-06-28T14:23:19.292160606Z; main]: new segment has no vectors;
no norms; no docValues; no prox; freqs
DWPT 0 [2023-06-28T14:23:19.292249403Z; main]:
flushedFiles=[_0_Lucene95HnswVectorsFormat_0.vem, _0.fdm,
_0_Lucene95HnswVectorsFormat_0.vec, _0.fdx, _0_Lucene95HnswVectorsFormat_0.vex,
_0.fdt, _0.fnm]
DWPT 0 [2023-06-28T14:23:19.292320403Z; main]: flushed codec=Lucene95
DWPT 0 [2023-06-28T14:23:19.295665508Z; main]: flushed: segment=_0
ramUsed=1,945.012 MB newFlushedSize=1,863.46 MB docs/MB=168.985
DWPT 0 [2023-06-28T14:23:19.296825017Z; main]: flush time 1370.228388 ms
IW 0 [2023-06-28T14:23:19.297541689Z; main]: publishFlushedSegment
seg-private updates=null
IW 0 [2023-06-28T14:23:19.298158353Z; main]: publishFlushedSegment
_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295,
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux,
os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=9b08nbm1nw553b43pa9kzvach
BD 0 [2023-06-28T14:23:19.299549573Z; main]: finished packet delGen=1 now
completedDelGen=1
IW 0 [2023-06-28T14:23:19.301085879Z; main]: publish sets newSegment
delGen=1 seg=_0(9.7.0):C314897:[diagnostics={source=flush,
timestamp=1687962199295, java.runtime.version=20.0.1+9-29, java.vendor=Oracle
Corporation, os=Linux, os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=9b08nbm1nw553b43pa9kzvach
IFD 0 [2023-06-28T14:23:19.301281180Z; main]: now checkpoint
"_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295,
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux,
os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=9b08nbm1nw553b43pa9kzvaci" [1 segments ; isCommit = false]
IFD 0 [2023-06-28T14:23:19.301666023Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T14:23:19.301718781Z; main]: 0 ms to checkpoint
MP 0 [2023-06-28T14:23:19.303689024Z; main]:
seg=_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295,
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux,
os.arch=amd64, os.version=6.2.0-23-generic,
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
:id=9b08nbm1nw553b43pa9kzvaci size=1863.460 MB
MP 0 [2023-06-28T14:23:19.303936133Z; main]: findMerges: 1 segments
MP ....
Indexed 2680961 documents in 2633s
```
</details>
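
Since the two runs use different dims, a per-run throughput view can make the comparison easier to read. The sketch below converts the two `Indexed 2680961 documents in ...s` lines into docs/sec and float components/sec. The names are hypothetical, and the components/sec figure is only a crude normalisation: indexing time is not spent purely on vector arithmetic, so it should not be read as a measured SIMD speedup.

```java
// Turns the "Indexed N documents in Ts" lines into throughput figures.
// Inputs are copied from the two test runs; the normalisation by dims is a rough aid only.
final class IndexingThroughput {

    /** Documents indexed per second of wall-clock time. */
    static double docsPerSecond(long docs, long secs) {
        return (double) docs / secs;
    }

    /** Float vector components indexed per second (docs * dims / secs). */
    static double componentsPerSecond(long docs, int dims, long secs) {
        return (double) docs * dims / secs;
    }

    public static void main(String[] args) {
        long docs = 2_680_961L;

        // Test1: 1024 dims, Panama not enabled, 3136s
        System.out.printf("Test1: %.1f docs/s, %.0f components/s%n",
            docsPerSecond(docs, 3136), componentsPerSecond(docs, 1024, 3136));

        // Test2: 1536 dims, Panama enabled (512 bits), 2633s
        System.out.printf("Test2: %.1f docs/s, %.0f components/s%n",
            docsPerSecond(docs, 2633), componentsPerSecond(docs, 1536, 2633));
    }
}
```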
Full output from the test runs can be seen here:
https://gist.github.com/ChrisHegarty/ef008da196624c1a3fe46578ee3a0a6c