Hi, shiyer22---,

On Feb 19, shiyer22--- via discuss wrote:
> Greetings!
> 
> I am doing some benchmarking on MariaDB and got to the
> deep-image-96-angular dataset from
> https://github.com/erikbern/ann-benchmarks.
> 
> This dataset has 9.99 million vectors of dimension 96. I am running
> the vector index create with M=24,ef_construction = 200 (similar
> parameters as pgvector in ANN-Benchmark). I wanted to check and
> confirm a couple of observations based on  the progress of the index
> create at present -

ef_construction is not configurable in MariaDB, ef_search is, but it has
no effect on the index construction.

> 
> MariaDB [(none)]> show processlist;
> +----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
> | Id | User   | Host      | db   | Command | Time | State             | Info  
>                                                    | Progress |
> +----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
> |  4 | ubuntu | localhost | ann  | Query   | 4092 | copy to tmp table | ALTER 
> TABLE t1 ADD VECTOR INDEX (v) M=24 DISTANCE=cosine |    9.636 |
> 
> MariaDB [(none)]> select trx_rows_modified from information_schema.innodb_trx;
> +-------------------+
> | trx_rows_modified |
> +-------------------+
> |         125215100 |
> +-------------------+
> 1 row in set (0.004 sec)
> 
> -rw-rw---- 1 ubuntu ubuntu 40420507648 Feb 19 06:20 undo002
> 
> Can you please confirm that the undo log reaching 40GB+ is expected
> for a progress of 9%? Did I miss something? I have configured the hnsw
> cache size and buffer pool to 16GB each. Should I increase them
> further? I want to benchmark for "hnsw index fits in memory" use-case.

with the index being ~6GB range, you shouldn't need notably more than
than for mhnsw_max_cache_size. If you'll watch the mariadbd memory usage
in top, you'll likely see that it'll grow by 6GB and then will stop,
even if mhnsw_max_cache_size will be much larger than that.

There're no optimizations for bulk index construction at the moment, so
your ALTER TABLE means many millions of updates internally, all done
within one transaction. It seems to be possible for the undo log to grow
that large.

> MariaDB [(none)]> show variables like '%hnsw%';
> +------------------------+-------------+
> | Variable_name          | Value       |
> +------------------------+-------------+
> | mhnsw_default_distance | euclidean   |
> | mhnsw_default_m        | 24          |
> | mhnsw_ef_search        | 200         |
> | mhnsw_max_cache_size   | 17179869184 |
> +------------------------+-------------+
> 4 rows in set (0.001 sec)
> 
> Just to let know, I am working on the vector plugin for MySQL (MyVector)
> 
Regards,
Sergei
Chief Architect, MariaDB Server
and [email protected]
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to