magibney commented on a change in pull request #476:
URL: https://github.com/apache/solr/pull/476#discussion_r789773406



##########
File path: solr/solr-ref-guide/src/dense-vector-search.adoc
##########
@@ -0,0 +1,308 @@
+= Dense Vector Search
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+The Apache Solr *Dense Vector Search* module adds support for indexing and 
searching dense numerical vectors.
+
+https://en.wikipedia.org/wiki/Deep_learning[Deep learning] can be used to 
produce a vector representation of both the query and the documents in a corpus 
of information.
+
+These neural network-based techniques are usually referred to as neural 
search, an industry derivation from the academic field of 
https://www.microsoft.com/en-us/research/uploads/prod/2017/06/fntir2018-neuralir-mitra.pdf[Neural
 information Retrieval].
+
+== Important Concepts
+
+=== Dense Vector Representation 
+A dense vector describes information as an array of elements, each of them 
explicitly defined.

Review comment:
       Thanks, Alessandro!
   
   I think maybe we should go either/or with reference to "Bag of words" vs. 
"Inverted index". In the text I drafted I referred to an "inverted index" as an 
example of sparse vector representation because I figured it would likely be 
the most familiar reference point for the "Solr refguide" audience.
   
   I note that Wikipedia has two separate entries: [Bag-of-words 
model](https://en.wikipedia.org/wiki/Bag-of-words_model) and [Inverted 
index](https://en.wikipedia.org/wiki/Inverted_index), neither of which refers 
directly to the other, but both of which include a "See also" reference to the 
page for [Vector space 
model](https://en.wikipedia.org/wiki/Vector_space_model). The Wikipedia pages 
are explicit about one difference, with the second clearly describing how most 
people use Solr:
   1. "The Bag-of-words model is an orderless document representation — only 
the counts of words matter."
   2. "A word-level inverted index (or full inverted index or inverted list) 
additionally contains the positions of each word within a document.[2] The 
latter form offers more functionality (like phrase searches)"
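   
   To make that difference concrete, here is a minimal Python sketch. It is not how Lucene or Solr implement either structure; `bag_of_words`, `positional_index`, and `phrase_match` are names made up for illustration, and tokenization is assumed to be plain whitespace splitting. It shows two documents that are indistinguishable as bags of words but that a positional inverted index can tell apart with a phrase query:

```python
from collections import Counter, defaultdict

# Two toy documents with the same words in a different order.
docs = {
    1: "dense vector search",
    2: "search dense vector",
}


def bag_of_words(text):
    """Orderless representation: only the term counts survive."""
    return Counter(text.split())


def positional_index(docs):
    """Word-level inverted index: term -> doc id -> list of positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Return ids of docs in which the phrase terms occur consecutively."""
    terms = phrase.split()
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # Term i of the phrase must sit at position start + i.
            if all(
                start + i in index.get(term, {}).get(doc_id, [])
                for i, term in enumerate(terms)
            ):
                hits.add(doc_id)
                break
    return hits


index = positional_index(docs)

# Identical as bags of words, because word order is discarded.
print(bag_of_words(docs[1]) == bag_of_words(docs[2]))  # True

# Positions let a phrase query separate them.
print(phrase_match(index, "vector search"))  # {1}
```

   Both structures are sparse in the sense that each document only touches the handful of vocabulary terms it actually contains.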
   
   Either model is arguably a valid example of a "sparse vector" 
representation. My practical concern is that in this context, it would be easy 
to misinterpret the reference to "Bag of words" as an implicit reference to 
"_non-vector_ search in Solr", whatever the intention of using "Bag-of-words" 
as a point of reference. I think that interpretation would be misleading, and 
could obscure the true distinction between "dense vector search" and classic 
TF-IDF/BM25/phrase-boosted/etc. search in Solr, and the appropriate use cases 
for each approach.
   
   (minor point: I'd also be inclined to drop the separate heading for "Sparse 
Vector Representation" -- I was purposefully vague in saying `can be considered 
to model text as a "sparse" vector`; perhaps I'm mistaken, but my impression is 
that "sparse" retrieval models (as _accurate_ as that characterization may be) 
are most often characterized as "sparse" as a foil to explain why "dense 
retrieval" is characterized as "dense". Indeed, that's what we're doing here!)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



