alessandrobenedetti commented on a change in pull request #476: URL: https://github.com/apache/solr/pull/476#discussion_r787636712
##########
File path: solr/solr-ref-guide/src/neural-search.adoc
##########
@@ -0,0 +1,324 @@
+= Neural Search
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Search comprises four primary steps:
+
+* generate a representation of the query that specifies the information need
+* generate a representation of the document that captures the information it contains
+* match the query and the document representations from the corpus of information
+* assign a score to each matched document in order to establish a meaningful document ranking by relevance in the results
+
+The Apache Solr *Neural Search* module adds support for neural network based techniques that can improve various aspects of search.
+
+These techniques can be differentiated based on whether they affect the query representation, the document representation, or the estimation of the relevance score.
+
+Neural Search is an industry derivation from the academic field of https://www.microsoft.com/en-us/research/uploads/prod/2017/06/fntir2018-neuralir-mitra.pdf[Neural Information Retrieval].
+
+== Neural Search Concepts
+
+=== Deep Learning
+
+More and more frequently, we hear about how Artificial Intelligence (AI) permeates every aspect of our lives.
+
+When we talk about AI we are referring to a broad set of techniques that enable machines to learn and show intelligence like humans.
+
+Since computing power has strongly and steadily advanced in the recent past, AI has seen a resurgence and is now used in many domains, including software engineering and Information Retrieval (the science that regulates Search Engines and similar systems).
+
+In particular, the advent of https://en.wikipedia.org/wiki/Deep_learning[Deep Learning] introduced the use of deep neural networks to solve complex problems that could not be solved simply by an algorithm.
+
+Deep Learning can be used to produce a vector representation of both the query and the documents in a corpus of information.
+
+=== Dense Vector Representation
+A dense vector describes information as an array of elements, each of them explicitly defined.
+
+Various Deep Learning models such as https://en.wikipedia.org/wiki/BERT_(language_model)[BERT] are able to encode textual information as dense vectors, to be used for Dense Retrieval strategies.
+
+For additional information you can refer to this https://sease.io/2021/12/using-bert-to-improve-search-relevance.html[blog post].
+
+=== Dense Retrieval
+Given a dense vector `v` that models the information need, the easiest approach for providing dense vector retrieval would be to calculate the distance (Euclidean, dot product, etc.) between `v` and each vector `d` that represents a document in the corpus of information.
+
+This approach is quite expensive, since it compares the query against every document in the corpus, so many approximate strategies are currently under active research.
+
+The strategy implemented in Apache Lucene and used by Apache Solr is based on Navigable Small-world graph.
+
+It provides efficient approximate nearest neighbor search for high dimensional vectors.
+
+See https://doi.org/10.1016/j.is.2013.10.006[Approximate nearest neighbor algorithm based on navigable small world graphs [2014]] and https://arxiv.org/abs/1603.09320[this paper [2018]] for details.
+
+
+== Index Time
+This is the list of Apache Solr field types designed to support Neural Search:
+
+=== DenseVectorField
+The Dense Vector field makes it possible to index and search dense vectors of float elements.
+
+e.g.
+

Review comment:
Makes sense!
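For readers of the quoted Dense Retrieval section, the exhaustive approach it describes (computing the distance between `v` and every document vector `d`) can be sketched roughly as follows. This is only an illustration, not part of the patch and not how Lucene or Solr implement retrieval. The names `exhaustive_search`, `euclidean_distance`, `negative_dot_product` and the tiny `corpus` are hypothetical.

```python
import math
from typing import Callable, List, Mapping, Sequence, Tuple

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_dot_product(a: Vector, b: Vector) -> float:
    """Dot product negated, so that smaller values mean 'closer'."""
    return -sum(x * y for x, y in zip(a, b))


def exhaustive_search(
    v: Vector,
    docs: Mapping[str, Vector],
    k: int,
    distance: Callable[[Vector, Vector], float] = euclidean_distance,
) -> List[Tuple[str, float]]:
    """Compare the query vector v with every document vector d.

    Hypothetical helper: one distance computation per document,
    which is why this approach becomes expensive on large corpora.
    """
    scored = [(doc_id, distance(v, d)) for doc_id, d in docs.items()]
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:k]


# Made-up two-dimensional vectors, for illustration only.
corpus = {
    "doc1": [1.0, 0.0],
    "doc2": [0.0, 1.0],
    "doc3": [0.9, 0.1],
}
top = exhaustive_search([1.0, 0.0], corpus, k=2)
print([doc_id for doc_id, _ in top])
```

Every query touches every document vector, so the work per query grows with the corpus size. That cost is what the approximate, graph-based strategy mentioned in the quoted text is meant to avoid.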