Hi, kase,

First, please send questions like this to [email protected]. It's a public mailing list dedicated to MariaDB and how to use it better. I am subscribed, so I'll see your mail there, and you can be sure I will, because it won't be accidentally caught by my spam filter or sorted into some obscure folder. Furthermore, other subscribers will see your question and could reply if I'm not available (e.g. if I'm travelling).
Thank you.

On Jan 12, kase jojo wrote:
> Dear Sir
>
> I hope you are doing well. I recently read your blog
> https://mariadb.com/resources/blog/how-fast-is-mariadb-vector/ and was
> particularly impressed by the efficient index-building times demonstrated
> in your tests. However, when I attempted similar experiments on MariaDB
> 11.7 RC, using the SIFT1M dataset and building an index with M=32, I
> noticed that the index creation process was much slower than expected.
>
> In my case, I have been inserting data into the table gradually, and I
> wanted to inquire about the process you mentioned in your blog: "We
> build the index slowly as we insert the data row by row." Could you
> clarify how this process works? Specifically, I am curious to know if
> there are any steps or techniques you followed to ensure such
> efficient index construction, as it seems to differ from my
> experience.

To get faster inserts, you need to use a smaller M. Try M=8, for example. It will reduce the recall, and you'll need to increase ef_search to compensate for that.

Look at it this way: MariaDB needs to do the work to get good recall. It has to do it *somewhere*, but you can decide where. MariaDB can spend more time on inserts, build a better index, and search in it quickly. Or it can insert faster, build a lower-quality index, and spend more time searching in it. It's a trade-off, and you decide which matters more for your application.

See the benchmark configuration at https://github.com/vuvova/ann-benchmarks/blob/dev/ann_benchmarks/algorithms/mariadb/config.yml

For faster inserts, you can use M=8 and ef_search=800. And, of course, always make sure that mhnsw_max_cache_size is large enough to hold your entire data set. SIFT1M is rather small; 300M should likely be enough. To be safe, I'd set mhnsw_max_cache_size=1G or more. It's an upper limit, so MariaDB won't use more memory than necessary anyway.
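A minimal sketch of how the suggested settings might be applied, assuming MariaDB 11.7 vector syntax (`VECTOR(n)`, `VECTOR INDEX ... M=... DISTANCE=...`, `VEC_FromText`, `VEC_DISTANCE_EUCLIDEAN`) and the `mariadb` Python connector. The table name `sift`, the columns `id`/`v` and all helper functions are hypothetical, chosen only for illustration; check the exact syntax against the MariaDB documentation for your version.

```python
"""Sketch: applying M=8, ef_search=800 and a 1G mhnsw cache to SIFT1M.

Hypothetical names (not from the original mail): table `sift`, columns
`id` and `v`, and every helper below.
"""

DIM = 128         # SIFT1M vectors have 128 dimensions
M = 8             # smaller M: faster inserts, lower-quality index
EF_SEARCH = 800   # larger ef_search: more search work to recover recall

# Server option for the [mariadb] section of the config file.
# 1G is an upper limit on the cache, not a fixed allocation.
SERVER_CONFIG = "[mariadb]\nmhnsw_max_cache_size=1G\n"


def create_table_sql(table: str = "sift", m: int = M) -> str:
    """CREATE TABLE statement with a vector index using the given M."""
    return (
        f"CREATE TABLE {table} ("
        "id INT PRIMARY KEY, "
        f"v VECTOR({DIM}) NOT NULL, "
        f"VECTOR INDEX (v) M={m} DISTANCE=euclidean"
        ")"
    )


def search_sql(table: str = "sift", k: int = 10) -> str:
    """Nearest-neighbour query; the query vector is bound as text."""
    return (
        f"SELECT id FROM {table} "
        "ORDER BY VEC_DISTANCE_EUCLIDEAN(v, VEC_FromText(?)) "
        f"LIMIT {k}"
    )


def to_vector_text(vec) -> str:
    """Format a Python sequence as the '[x,y,...]' text VEC_FromText reads."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def load_and_query(rows, query_vec, **conn_args):
    """Insert (id, vector) rows one by one, then run one k-NN query."""
    import mariadb  # imported lazily so the SQL builders work without it

    conn = mariadb.connect(**conn_args)
    try:
        cur = conn.cursor()
        cur.execute(create_table_sql())
        # Row-by-row inserts: the index is built incrementally as rows arrive.
        for row_id, vec in rows:
            cur.execute(
                "INSERT INTO sift (id, v) VALUES (?, VEC_FromText(?))",
                (row_id, to_vector_text(vec)),
            )
        conn.commit()
        # Session setting: spend more time searching to offset the small M.
        cur.execute(f"SET SESSION mhnsw_ef_search = {EF_SEARCH}")
        cur.execute(search_sql(), (to_vector_text(query_vec),))
        return [r[0] for r in cur.fetchall()]
    finally:
        conn.close()
```

The SQL is built by small pure functions so it can be inspected without a running server; only `load_and_query` needs a connection, e.g. `load_and_query(rows, q, user="...", password="...", database="...")`.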
Regards,
Sergei
Chief Architect, MariaDB Server
and [email protected]

_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]
