Greetings!
I am doing some benchmarking on MariaDB and got to the deep-image-96-angular
dataset from https://github.com/erikbern/ann-benchmarks.
This dataset has 9.99 million vectors of dimension 96. I am running the vector
index create with M=24,ef_construction = 200 (similar parameters as pgvector in
ANN-Benchmark). I wanted to check and confirm a couple of observations based on
the progress of the index create at present -
MariaDB [(none)]> show processlist;
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info
| Progress |
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
| 4 | ubuntu | localhost | ann | Query | 4092 | copy to tmp table | ALTER
TABLE t1 ADD VECTOR INDEX (v) M=24 DISTANCE=cosine | 9.636 |
MariaDB [(none)]> select trx_rows_modified from information_schema.innodb_trx;
+-------------------+
| trx_rows_modified |
+-------------------+
| 125215100 |
+-------------------+
1 row in set (0.004 sec)
-rw-rw---- 1 ubuntu ubuntu 40420507648 Feb 19 06:20 undo002
Can you please confirm that the undo log reaching 40GB+ is expected for a
progress of 9%? Did I miss something? I have configured the hnsw cache size and
buffer pool to 16GB each. Should I increase them further? I want to benchmark
for "hnsw index fits in memory" use-case.
MariaDB [(none)]> show variables like '%hnsw%';
+------------------------+-------------+
| Variable_name | Value |
+------------------------+-------------+
| mhnsw_default_distance | euclidean |
| mhnsw_default_m | 24 |
| mhnsw_ef_search | 200 |
| mhnsw_max_cache_size | 17179869184 |
+------------------------+-------------+
4 rows in set (0.001 sec)
Just to let know, I am working on the vector plugin for MySQL (MyVector)
Thanks!
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]