Re: Quantization for vector search

Michael Wechner Sat, 04 Nov 2023 16:09:03 -0700

Hi Ben

On 04.11.23 at 14:41, Benjamin Trent wrote:

Hey Michael,


In short, it's being worked on :).


cool, thanks!


Could you point to the LinkedIn post?


https://www.linkedin.com/posts/reimersnils_%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F-%3F%3F-%3F%3F%3F-%3F%3F%3F-activity-7125863813064581120-bO6N/?utm_source=share&utm_medium=member_desktop

Is Nils talking about the model producing quantized output, or that their default output is easily compressible because of how the embeddings are built?

It is not clear to me from the post, but maybe you understand the post (link above) better.

I have done a bad job of linking the work that is being done back to that original issue:
The initial implementation of adding int8 (really, it's int7 because of signed bytes...): https://github.com/apache/lucene/pull/12582
A significant refactor to make adding new quantized storage easier: https://github.com/apache/lucene/pull/12729
Lucene already supports folks just giving it signed `byte[]` values. But this only gets so far. The additional work should get Lucene further down the road towards better lossy compression for vectors.
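
To illustrate the "int7" remark: a signed byte covers -128..127, so keeping quantized values non-negative leaves 7 usable bits (0..127). The following is only a minimal sketch of linear scalar quantization under that constraint, not the code from the pull requests linked above. The function names and the choice of fixed `lo`/`hi` bounds are assumptions made for illustration.

```python
def quantize_int7(vector, lo, hi):
    """Map floats in [lo, hi] linearly onto integers 0..127.

    Hypothetical helper for illustration; values outside [lo, hi]
    are clamped to the nearest bound before quantizing.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 127.0 / (hi - lo)
    out = bytearray()
    for x in vector:
        clamped = min(max(x, lo), hi)
        out.append(round((clamped - lo) * scale))
    return bytes(out)


def dequantize_int7(quantized, lo, hi):
    """Approximate inverse of quantize_int7 (lossy)."""
    step = (hi - lo) / 127.0
    return [lo + b * step for b in quantized]


if __name__ == "__main__":
    original = [-0.8, -0.1, 0.3, 0.75]
    q = quantize_int7(original, -1.0, 1.0)
    restored = dequantize_int7(q, -1.0, 1.0)
    # One byte per dimension instead of four for a 32-bit float.
    print(list(q))
    print(restored)
```

Each dimension is stored in one byte instead of a 4-byte float, and the round-trip error per dimension is bounded by half a quantization step, i.e. (hi - lo) / 254, for values inside the bounds.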


very cool, thank you!

All the best

Michael


Thanks!

Ben

On Sat, Nov 4, 2023 at 4:07 AM Michael Wechner <michael.wech...@wyona.com> wrote:


    Hi

    If I understand correctly, some devs are working on introducing
    quantization for vector search, or are at least considering it:

    https://github.com/apache/lucene/issues/12497

    I am just curious what the status of this is, and whether somebody
    is actively working on it.


    It came to my mind, because Cohere recently made their new
    embedding model "Embed v3" available

    https://txt.cohere.com/introducing-embed-v3/

    whereas IIUC, Cohere intends to also provide embeddings optimized
    for compression soon.

    Nils Reimers recently wrote on LinkedIn:

    ----
    "... what we see on the BioASQ dataset:
    4x - 99.99% search quality
    16x - 99.9% search quality
    32x - 95% search quality
    64x - 85% search quality
    But it requires that the respective vector DB supports these
    modes, what we currently work on with partners."
    ----

    This might be interesting for Lucene as well, though I am not sure
    whether somebody at Lucene is already working on something like this.

    Thanks

    Michael
