[MariaDB discuss] Re: Inquiry Regarding Slow Index Creation with MariaDB 11.7 RC (SIFT1M)

Sergei Golubchik via discuss Sun, 12 Jan 2025 01:19:53 -0800

Hi, kase,

First: please send questions like this to [email protected].
It's a public mailing list dedicated to MariaDB and how to use it better.
I am subscribed, so I'll see your mail there, and you can be sure I will,
because it won't be accidentally caught by my spam filter or sorted into
some obscure folder. Furthermore, other subscribers will see your question and
can reply if I am not available (e.g. if I am travelling).

Thank you.

On Jan 12, kase jojo wrote:
> Dear Sir
> 
> I hope you are doing well. I recently read your blog 
> https://mariadb.com/resources/blog/how-fast-is-mariadb-vector/ and was
> particularly impressed by the efficient index-building times demonstrated
> in your tests. However, when I attempted similar experiments on MariaDB
> 11.7 RC, using the SIFT1M dataset and building an index with M=32, I
> noticed that the index creation process was much slower than expected.
> 
> In my case, I have been inserting data into the table gradually, and I
> wanted to inquire about the process you mentioned in your blog: "We
> build the index slowly as we insert the data row by row." Could you
> clarify how this process works? Specifically, I am curious to know if
> there are any steps or techniques you followed to ensure such
> efficient index construction, as it seems to differ from my
> experience.

To get faster inserts you need to use a smaller M. Try M=8, for example.
It will reduce the recall, and you'll need to increase ef_search to
compensate for that.

Look at it this way: MariaDB needs to do the work to get good recall. It
has to do it *somewhere*. But you can decide where to do it. MariaDB can
spend more time doing inserts, build a better index, and search in it
quickly. Or it can insert faster, but then the index will be of worse
quality and it'll need to spend more time searching in it.

It's a trade-off and you decide what is more important for your
application.

See 
https://github.com/vuvova/ann-benchmarks/blob/dev/ann_benchmarks/algorithms/mariadb/config.yml
For faster inserts you can use M=8 and ef_search=800.
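
As a rough sketch, this is how those two settings could look in SQL. The
table and column names (sift, id, v) are made up for illustration, and I'm
assuming here that the benchmark's ef_search parameter corresponds to the
mhnsw_ef_search session variable in your version:

```sql
-- Hypothetical table for SIFT1M (128-dimensional vectors).
CREATE TABLE sift (
  id INT PRIMARY KEY,
  v VECTOR(128) NOT NULL,
  -- Smaller M: faster inserts, but a lower-quality index.
  VECTOR INDEX (v) M=8
);

-- Compensate for the lower recall by searching wider.
-- This only affects the current session.
SET SESSION mhnsw_ef_search = 800;

-- Example nearest-neighbour query. The query vector is an all-zero
-- 128-dimensional placeholder built with REPEAT(); use a real one.
SELECT id
FROM sift
ORDER BY VEC_DISTANCE_EUCLIDEAN(
  v, VEC_FromText(CONCAT('[', REPEAT('0,', 127), '0]')))
LIMIT 10;
```

M is fixed when the index is created, while the search width can be changed
per session, so you can tune recall at query time without rebuilding
anything.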

Of course, always make sure that mhnsw_max_cache_size is big enough
to hold your entire data set. SIFT1M is rather small; 300M should likely
be enough. I'd use at least mhnsw_max_cache_size=1G to be safe. It's an
upper limit, so MariaDB won't use more memory than necessary anyway.
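
A minimal sketch of checking and raising the limit, assuming the variable
can be changed at runtime on your build and that your account is allowed to
set global variables (otherwise put mhnsw_max_cache_size=1G in the server
configuration file and restart):

```sql
-- Show the current limit, in bytes.
SELECT @@GLOBAL.mhnsw_max_cache_size;

-- Raise the upper limit to 1 GiB (1024 * 1024 * 1024 bytes).
SET GLOBAL mhnsw_max_cache_size = 1073741824;
```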

Regards,
Sergei
Chief Architect, MariaDB Server
and [email protected]
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]
