Greetings!

I am doing some benchmarking on MariaDB and got to the deep-image-96-angular 
dataset from https://github.com/erikbern/ann-benchmarks.

This dataset has 9.99 million vectors of dimension 96. I am running the vector
index creation with M=24 and ef_construction=200 (similar to the pgvector
parameters in ANN-Benchmarks). I wanted to check and confirm a couple of
observations based on the current progress of the index creation:

MariaDB [(none)]> show processlist;
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
| Id | User   | Host      | db   | Command | Time | State             | Info                                                     | Progress |
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
|  4 | ubuntu | localhost | ann  | Query   | 4092 | copy to tmp table | ALTER TABLE t1 ADD VECTOR INDEX (v) M=24 DISTANCE=cosine |    9.636 |
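
As a rough sanity check on that snapshot, here is a back-of-the-envelope
sketch. It assumes the Progress column (9.636) is a percentage proportional to
rows copied and that the copy runs at a constant rate; both are
simplifications, and the function name is only for illustration.

```python
# Back-of-the-envelope estimate from the SHOW PROCESSLIST snapshot.
# Assumptions (not confirmed by MariaDB docs): Progress is a percentage of
# rows copied, and the copy proceeds at a constant rate.

TOTAL_VECTORS = 9_990_000   # deep-image-96-angular vector count
ELAPSED_S = 4092            # Time column from SHOW PROCESSLIST
PROGRESS_PCT = 9.636        # Progress column from SHOW PROCESSLIST


def estimate_progress(total, elapsed_s, progress_pct):
    """Return (vectors copied, vectors per second, projected total hours)."""
    fraction = progress_pct / 100.0
    copied = total * fraction
    rate = copied / elapsed_s
    projected_hours = elapsed_s / fraction / 3600.0
    return copied, rate, projected_hours


if __name__ == "__main__":
    copied, rate, hours = estimate_progress(TOTAL_VECTORS, ELAPSED_S, PROGRESS_PCT)
    print(f"~{copied:,.0f} vectors copied, ~{rate:.0f} vectors/s, "
          f"~{hours:.1f} h projected for the full build")
```

Under the linear assumption this works out to roughly 960K vectors copied at
about 235 vectors/s, i.e. close to 12 hours for the whole build.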

MariaDB [(none)]> select trx_rows_modified from information_schema.innodb_trx;
+-------------------+
| trx_rows_modified |
+-------------------+
|         125215100 |
+-------------------+
1 row in set (0.004 sec)
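
Dividing that counter by the estimated number of vectors copied gives a rough
write-amplification figure. A minimal sketch, assuming trx_rows_modified
belongs to the ALTER TABLE transaction and that Progress is proportional to
vectors copied. A ratio well above 1 would be consistent with each insert also
rewriting neighbor rows in the graph, but that reading is a guess, not
something the output above confirms.

```python
# Ratio of InnoDB rows modified to vectors copied so far.
# Assumptions: trx_rows_modified belongs to the ALTER TABLE transaction,
# and Progress (%) is proportional to the number of vectors copied.

TOTAL_VECTORS = 9_990_000
PROGRESS_PCT = 9.636          # Progress from SHOW PROCESSLIST
ROWS_MODIFIED = 125_215_100   # information_schema.innodb_trx.trx_rows_modified


def rows_modified_per_vector(total, progress_pct, rows_modified):
    """Rows modified divided by the estimated number of vectors copied."""
    copied = total * progress_pct / 100.0
    return rows_modified / copied


if __name__ == "__main__":
    ratio = rows_modified_per_vector(TOTAL_VECTORS, PROGRESS_PCT, ROWS_MODIFIED)
    print(f"~{ratio:.0f} rows modified per vector copied")
```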

-rw-rw---- 1 ubuntu ubuntu 40420507648 Feb 19 06:20 undo002
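
To put that file size in perspective, the sketch below computes undo bytes per
modified row and a naive linear projection to 100% progress. It assumes
undo002 holds only this transaction's undo records and that undo growth stays
proportional to Progress; neither is guaranteed, so treat the projection as an
order-of-magnitude figure only.

```python
# Undo log size per modified row, plus a naive linear projection to 100%.
# Assumptions: undo002 holds only this transaction's undo records, and
# undo growth stays proportional to Progress (neither is guaranteed).

UNDO_BYTES = 40_420_507_648   # size of undo002 from ls -l
ROWS_MODIFIED = 125_215_100   # trx_rows_modified
PROGRESS_PCT = 9.636          # Progress from SHOW PROCESSLIST
GIB = 2**30


def undo_estimates(undo_bytes, rows_modified, progress_pct):
    """Return (undo bytes per modified row, projected undo bytes at 100%)."""
    per_row = undo_bytes / rows_modified
    projected = undo_bytes / (progress_pct / 100.0)
    return per_row, projected


if __name__ == "__main__":
    per_row, projected = undo_estimates(UNDO_BYTES, ROWS_MODIFIED, PROGRESS_PCT)
    print(f"current undo: {UNDO_BYTES / GIB:.1f} GiB, "
          f"~{per_row:.0f} bytes per modified row")
    print(f"linear projection at 100%: ~{projected / GIB:.0f} GiB")
```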

Can you please confirm whether an undo log of 40GB+ is expected at only 9%
progress? Did I miss something? I have configured the HNSW cache size and the
buffer pool to 16GB each. Should I increase them further? I want to benchmark
the "HNSW index fits in memory" use case.

MariaDB [(none)]> show variables like '%hnsw%';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| mhnsw_default_distance | euclidean   |
| mhnsw_default_m        | 24          |
| mhnsw_ef_search        | 200         |
| mhnsw_max_cache_size   | 17179869184 |
+------------------------+-------------+
4 rows in set (0.001 sec)
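
For the "fits in memory" question, a first-order estimate is the size of the
raw vectors compared with mhnsw_max_cache_size. The sketch below assumes
4-byte float32 components and ignores the HNSW neighbor lists and any per-node
overhead, so it is only a lower bound; it makes no claim about how MariaDB
actually lays out nodes in its cache.

```python
# Lower bound on memory for the raw vectors alone, compared with the
# configured mhnsw_max_cache_size. Assumption: 4-byte float32 components;
# graph neighbor lists and per-node overhead are NOT included.

NUM_VECTORS = 9_990_000
DIMENSIONS = 96
BYTES_PER_COMPONENT = 4             # assumed float32
MHNSW_MAX_CACHE_SIZE = 17179869184  # from SHOW VARIABLES
GIB = 2**30


def raw_vector_bytes(n, dim, bytes_per_component):
    """Bytes needed to store n vectors of the given dimension, no overhead."""
    return n * dim * bytes_per_component


if __name__ == "__main__":
    raw = raw_vector_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_COMPONENT)
    cache_gib = MHNSW_MAX_CACHE_SIZE / GIB
    print(f"raw vectors: {raw / GIB:.2f} GiB of a {cache_gib:.0f} GiB cache "
          f"({100 * raw / MHNSW_MAX_CACHE_SIZE:.0f}% before graph overhead)")
```

The raw vectors come to roughly 3.6 GiB, so whether the full index fits in a
16 GiB cache depends mostly on the per-node graph overhead.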

Just to let you know, I am working on the vector plugin for MySQL (MyVector).

Thanks!
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]
