Thank you for sharing, and it is exciting to see how advanced your thinking is.
Yes, it is the same idea with an extra step, which Rene also seems to allude to in his comment here <https://www.slideshare.net/RenKriegler/a-picture-is-worth-a-thousand-words-93680178>. Instead of using these types of techniques only at scoring time, we can use them for information retrieval from the index. This will allow us to, for example, index millions of images and quickly and efficiently look up the most relevant ones.

I would love to hear your and others' thoughts on this. I think there is a great opportunity here, but it would need a lot of input and guidance from the experts here.

Thank you,
Pedram

From: David Smiley <[email protected]>
Sent: Friday, March 1, 2019 12:11 PM
To: [email protected]
Cc: Radhakrishnan Srikanth (SRIKANTH) <[email protected]>; Arun Sacheti <[email protected]>; Kun Wu <[email protected]>; Junhua Wang <[email protected]>; Jason Li <[email protected]>; René Kriegler <[email protected]>
Subject: Re: Vector based store and ANN

This presentation by Rene Kriegler at Haystack 2018 was a real eye-opener to me on this subject: <https://haystackconf.com/2018/relevance-scoring/>. It uses random-projection forests, which is a very clever technique. (CC'ing Rene)

~ David

On Fri, Mar 1, 2019 at 1:30 PM Pedram Rezaei <[email protected]> wrote:

Hi there,

Thank you for the responses.
Yes, we have a few scenarios in mind that could benefit from a vector-based index optimized for ANN searches:

* Advanced, optimized, and high-precision visual search: For this to work, we would convert the images to their vector representations and then use algorithms and implementations such as SPTAG <https://github.com/Microsoft/SPTAG>, FAISS <https://github.com/facebookresearch/faiss>, and HNSWLIB <https://github.com/nmslib/hnswlib>.
* Advanced document retrieval: Using a numerical vector representation of a document, we could improve the search results.
* Nearest neighbor queries: Discovering the nearest neighbors to a given query could also benefit from these ANN algorithms (although this doesn’t necessarily need the vector-based index).

I would be grateful to hear your thoughts and whether the community is open to a conversation on this topic with my team.

Thanks,
Pedram

From: J. Delgado <[email protected]>
Sent: Thursday, February 28, 2019 7:38 AM
To: [email protected]
Cc: Radhakrishnan Srikanth (SRIKANTH) <[email protected]>
Subject: Re: Vector based store and ANN

Lucene’s scoring function (which I believe is Okapi BM25, <https://en.m.wikipedia.org/wiki/Okapi_BM25>) is a kind of nearest neighbor using the TF-IDF vector representation of documents and query. Are you interested in applying ANN to a different kind of vector representation, for example Doc2Vec?

On Thu, Feb 28, 2019 at 5:59 AM Adrien Grand <[email protected]> wrote:

Hi Pedram,

We don't have much in this area, but I'm hearing increasing interest so it'd be nice to get better there! The closest that we have is this class that can search for nearest neighbors for a vector of up to 8 dimensions: <https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/document/FloatPointNearestNeighbor.java>.
On Wed, Feb 27, 2019 at 1:44 AM Pedram Rezaei <[email protected]> wrote:
> Hi there,
>
> Is there a way to store numerical vectors (vector based index) and perform
> search based on the Approximate Nearest Neighbor class of algorithms in Lucene?
>
> If not, has there been any interest in the topic so far?
>
> Thanks,
>
> Pedram

--
Adrien

--
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
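
As a rough illustration of the distinction raised in this thread (using vector techniques at retrieval time rather than only at scoring time), here is a minimal sketch of an approximate nearest neighbor lookup. It is not how SPTAG, FAISS, HNSWLIB, or Lucene work, and it is not the random-projection forest from the Haystack talk; it swaps in plain random-hyperplane hashing because that fits in a few lines. The names `HyperplaneIndex`, `signature`, and `cosine`, and all parameter values, are hypothetical and chosen only for this example.

```python
import math
import random
from collections import defaultdict


def signature(vec, planes):
    """One bit per random hyperplane: which side of the plane the vector is on."""
    return tuple(sum(v * p for v, p in zip(vec, plane)) >= 0.0 for plane in planes)


def cosine(a, b):
    """Exact cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneIndex:
    """Toy vector index (an assumption for illustration, not a Lucene API)."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Random Gaussian normals define the hyperplanes used for hashing.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.buckets = defaultdict(list)  # signature -> list of doc ids
        self.vectors = {}                 # doc id -> stored vector

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        self.buckets[signature(vec, self.planes)].append(doc_id)

    def search(self, query, k=3):
        # Retrieval step: look only at the query's bucket, not the whole index.
        candidates = self.buckets.get(signature(query, self.planes), [])
        # Scoring step: exact cosine similarity over the small candidate set.
        ranked = sorted(candidates,
                        key=lambda d: cosine(query, self.vectors[d]),
                        reverse=True)
        return ranked[:k]
```

The result is approximate by design: a true neighbor that falls on the other side of any one hyperplane lands in a different bucket and is missed. Practical systems reduce that loss with multiple hash tables, trees, or graphs, which is where the libraries mentioned above come in.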
