Can we import an HNSW graph into a Lucene index?

2024-06-14 Thread Anand Kotriwal
Hi all,

We use Lucene's HNSW graph search extensively for ANN searches.
One issue we keep running into is long build times with
higher-dimensional vectors. To address this, we are exploring ways to
build the HNSW index on the GPU and merge it into an existing Lucene index
to serve queries. For example, NVIDIA's cuVS library supports building a
CAGRA <https://arxiv.org/pdf/2308.15136> index and transforming it into an
hnswlib graph.

My idea: once the HNSW graph is built on the GPU, we import it. We need
the graph's vertices and their connections, which we can then write in a
Lucene-compatible segment file format. We would also map the doc IDs to
embeddings and update the FieldInfos.
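
To make the doc ID mapping step concrete, here is a minimal sketch in plain
Java. It does not use any Lucene or hnswlib API. It assumes that the
imported graph identifies nodes by an external index, that each node maps
to exactly one doc ID, and that the target format wants node ordinals
assigned in increasing doc ID order (an assumption to verify against the
codec in use). Only a single graph level is handled, and the class and
method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper: renumbers an imported graph so ordinals follow doc ID order. */
final class ImportedGraphRemap {

    /** For each external node index, returns its new ordinal when nodes are sorted by doc ID. */
    static int[] ordinalsByDocId(int[] nodeToDoc) {
        Integer[] order = new Integer[nodeToDoc.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort external node indices by the doc ID each one maps to.
        Arrays.sort(order, Comparator.comparingInt(n -> nodeToDoc[n]));
        int[] oldToNew = new int[nodeToDoc.length];
        for (int newOrd = 0; newOrd < order.length; newOrd++) {
            oldToNew[order[newOrd]] = newOrd;
        }
        return oldToNew;
    }

    /** Rewrites every neighbor list in terms of the new ordinals, keeping neighbor order. */
    static int[][] remapNeighbors(int[][] neighbors, int[] oldToNew) {
        int[][] out = new int[neighbors.length][];
        for (int old = 0; old < neighbors.length; old++) {
            int[] mapped = new int[neighbors[old].length];
            for (int j = 0; j < mapped.length; j++) {
                mapped[j] = oldToNew[neighbors[old][j]];
            }
            out[oldToNew[old]] = mapped;
        }
        return out;
    }
}
```

The remapped adjacency lists are what a writer for the segment format would
consume. Upper graph levels and the vector data itself would need the same
renumbering.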

I would like feedback from the community on whether this sounds feasible
and any implementation pointers you might have.


Thanks,
Anand Kotriwal


Add custom merge policy to Lucene sandbox?

2021-11-09 Thread Anand Kotriwal
Hi,

We (at Amazon product search) have customized TieredMergePolicy to:

1. make it easy to configure merge-on-commit merges, and

2. ensure that no merged segment is accidentally too large a percentage of
the total index, which would harm effective within-query concurrency and
long-pole query latencies.
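
As an illustration of point 2, the size check could look roughly like the
sketch below. This is plain Java, not the actual customization and not a
Lucene API. The class name, the method name, and the maxFraction parameter
are hypothetical, and sizes are assumed to be given in bytes.

```java
/** Hypothetical check: would a candidate merge produce too large a share of the index? */
final class MergedSegmentCap {

    /**
     * Returns true if merging the candidate segments would yield a segment
     * larger than maxFraction of the total index size.
     */
    static boolean exceedsCap(long[] candidateSegmentBytes, long totalIndexBytes, double maxFraction) {
        long mergedBytes = 0;
        for (long bytes : candidateSegmentBytes) {
            mergedBytes += bytes;
        }
        // Merging does not change the total index size (ignoring deletes),
        // so the merged segment's share is mergedBytes / totalIndexBytes.
        return mergedBytes > totalIndexBytes * maxFraction;
    }
}
```

A merge policy could skip or shrink any candidate merge for which this
returns true. Stock TieredMergePolicy offers an absolute cap through
setMaxMergedSegmentMB; the check above is relative to the index size
instead.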

If the community finds this useful, I am happy to open a Jira issue and
contribute a patch that adds it to Lucene's sandbox module.


Thanks,

Anand


Re: Faster advance on Vector Values

2021-01-18 Thread Anand Kotriwal
Sure! I created https://issues.apache.org/jira/browse/LUCENE-9674 and
attached a PR to it.

Thanks,
Anand

On Mon, Jan 18, 2021 at 6:14 AM Michael Sokolov wrote:

> Thanks for the suggestion! This will be a nice improvement for use
> cases wanting to retrieve vectors for a sparse set of documents, e.g.
> when incorporating a vector-based score as a scoring signal. Would you
> mind opening an issue, Anand?
>
> On Sat, Jan 16, 2021 at 9:07 AM Anand Kotriwal wrote:
> >
> > Hi,
> >
> > Our team is using the recently introduced Lucene90Codec support for
> > vectors. We have a use case to quickly scan a segment for documents
> > having vectors. While implementing it, we noticed that the advance
> > function in the class Lucene90VectorReader does a linear search for
> > the target document. I propose making it faster: we can binary-search
> > the "ordToDoc" array, so that advance takes logarithmic rather than
> > linear time.
> >
> > I would welcome ideas and suggestions from the community. I have an
> > implementation of this on my private fork and can open a PR if the
> > idea sounds reasonable.
> >
> > Thanks!
> > Anand Kotriwal
> >
> >


Faster advance on Vector Values

2021-01-16 Thread Anand Kotriwal
Hi,

Our team is using the recently introduced Lucene90Codec support for
vectors. We have a use case to quickly scan a segment for documents having
vectors. While implementing it, we noticed that the advance function in
the class Lucene90VectorReader does a linear search for the target document.
I propose making it faster: we can binary-search the "ordToDoc" array, so
that advance takes logarithmic rather than linear time.
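
A minimal sketch of the idea in plain Java, for illustration only. It is
not the code from the fork or from Lucene90VectorReader. It assumes
"ordToDoc" is an int array of strictly increasing doc IDs indexed by vector
ordinal; the class name, method names, and the NO_MORE_DOCS sentinel are
hypothetical stand-ins.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the advance logic over a sorted ordToDoc array. */
final class OrdToDocAdvance {

    /** Sentinel returned when no doc ID >= target remains. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /**
     * Returns the ordinal of the first entry at or after fromOrd whose
     * doc ID is >= target, or -1 if there is none.
     */
    static int advanceOrd(int[] ordToDoc, int fromOrd, int target) {
        int idx = Arrays.binarySearch(ordToDoc, fromOrd, ordToDoc.length, target);
        if (idx < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            idx = -idx - 1;
        }
        return idx < ordToDoc.length ? idx : -1;
    }

    /** Returns the first doc ID >= target at or after fromOrd, or NO_MORE_DOCS. */
    static int advanceDoc(int[] ordToDoc, int fromOrd, int target) {
        int ord = advanceOrd(ordToDoc, fromOrd, target);
        return ord < 0 ? NO_MORE_DOCS : ordToDoc[ord];
    }
}
```

Passing the iterator's current ordinal as fromOrd keeps the search limited
to documents that have not been visited yet.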

I would welcome ideas and suggestions from the community. I have an
implementation of this on my private fork and can open a PR if the idea
sounds reasonable.

Thanks!
Anand Kotriwal