alessandrobenedetti commented on a change in pull request #12:
URL: https://github.com/apache/solr/pull/12#discussion_r593042789



##########
File path: solr/solr-ref-guide/src/morelikethis.adoc
##########
@@ -16,97 +16,624 @@
 // specific language governing permissions and limitations
 // under the License.
 
-The `MoreLikeThis` search component enables users to query for documents 
similar to a document in their result list.
+MoreLikeThis enables queries for documents similar to a document in a 
result list.
 
 It does this by using terms from the original document to find similar 
documents in the index.
 
-There are three ways to use MoreLikeThis. The first, and most common, is to 
use it as a request handler. In this case, you would send text to the 
MoreLikeThis request handler as needed (as in when a user clicked on a "similar 
documents" link).
+There are several ways to use MoreLikeThis.
+The first, and most common, is to use it as a request handler.
+In this case, you would send text to the MoreLikeThis request handler as 
needed (as in when a user clicked on a "similar documents" link).
 
-The second is to use it as a search component. This is less desirable since it 
performs the MoreLikeThis analysis on every document returned. This may slow 
search results.
+The second is to use it as a search component.
+This is less desirable since it performs the MoreLikeThis analysis on every 
document that matches a user query. This may slow search results.
 
-The final approach is to use it as a request handler but with externally 
supplied text. This case, also referred to as the MoreLikeThisHandler, will 
supply information about similar documents in the index based on the text of 
the input document.
+Another approach is to use it as a request handler but with externally 
supplied text.
+This case, also referred to as the MoreLikeThisHandler, will supply 
information about similar documents in the index based on the text of the input 
document.
+
+Finally, the MLT query parser can be used.
+This operates in much the same way as the request handler but since it is a 
query parser it can be used in filter queries, boost queries, etc., and results 
can be paginated or highlighted as needed.
 
 == How MoreLikeThis Works
 
-`MoreLikeThis` constructs a Lucene query based on terms in a document. It does 
this by pulling terms from the defined list of fields ( see the `mlt.fl` 
parameter, below). For best results, the fields should have stored term vectors 
in `schema.xml`. For example:
+`MoreLikeThis` constructs a Lucene query based on terms in a document.
+It does this by pulling terms from the list of fields provided with the 
request.
 
-[source,xml]
-----
-<field name="cat" ... termVectors="true" />
-----
+For best results, the fields should have stored term vectors 
(`termVectors=true`), which can be <<defining-fields.adoc#,configured in the 
schema>>.
+If term vectors are not stored, MoreLikeThis can generate terms from stored 
fields.
+The field used for the `uniqueKey` must also be stored in order for 
MoreLikeThis to work properly.
+
+Terms from the original document are filtered using thresholds defined with 
the MoreLikeThis parameters.
+Once the terms have been selected, a query is run with any other query 
parameters as appropriate and a new document set is returned.
 
-If term vectors are not stored, `MoreLikeThis` will generate terms from stored 
fields. A `uniqueKey` must also be stored in order for MoreLikeThis to work 
properly.
+== MoreLikeThis Handler and Component
 
-The next phase filters terms from the original document using thresholds 
defined with the MoreLikeThis parameters. Finally, a query is run with these 
terms, and any other query parameters that have been defined (see the `mlt.qf` 
parameter, below) and a new document set is returned.
+The MoreLikeThis request handler and search component share several 
parameters, but also have some key differences in response and operation, as 
described below.
 
-== Common Parameters for MoreLikeThis
+=== Common Handler and Component Parameters
 
-The table below summarizes the `MoreLikeThis` parameters supported by 
Lucene/Solr. These parameters can be used with any of the three possible 
MoreLikeThis approaches.
+The list below summarizes the `MoreLikeThis` parameters supported by Solr.
+These parameters can be used with the MoreLikeThis search component or request 
handler.
 
 `mlt.fl`::
-Specifies the fields to use for similarity. If possible, these should have 
stored `termVectors`.
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+Specifies the fields to use for similarity.
+A list of fields can be provided separated by commas.
+If possible, the fields should have stored `termVectors`.
 
 `mlt.mintf`::
-Specifies the Minimum Term Frequency, the frequency below which terms will be 
ignored in the source document.
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `2`
+|===
++
+Specifies the minimum frequency below which terms will be ignored in the 
source document.
 
 `mlt.mindf`::
-Specifies the Minimum Document Frequency, the frequency at which words will be 
ignored which do not occur in at least this many documents.
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `5`
+|===
++
+Specifies the minimum document frequency: terms that do not occur in at least 
this many documents will be ignored.
 
 `mlt.maxdf`::
-Specifies the Maximum Document Frequency, the frequency at which words will be 
ignored which occur in more than this many documents.
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Specifies the maximum document frequency: terms that occur in more than this 
many documents will be ignored.
 
 `mlt.maxdfpct`::
-Specifies the Maximum Document Frequency using a relative ratio to the number 
of documents in the index. The argument must be an integer between 0 and 100. 
For example 75 means the word will be ignored if it occurs in more than 75 
percent of the documents in the index.
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Specifies the maximum document frequency using a ratio relative to the number 
of documents in the index.
+The value provided must be an integer between `0` and `100`.
+For example, `mlt.maxdfpct=75` means the word will be ignored if it occurs in 
more than 75 percent of the documents in the index.
 
 `mlt.minwl`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
 Sets the minimum word length below which words will be ignored.
 
 `mlt.maxwl`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
 Sets the maximum word length above which words will be ignored.
 
 `mlt.maxqt`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `25`
+|===
++
 Sets the maximum number of query terms that will be included in any generated 
query.
 
 `mlt.maxntp`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `5000`
+|===
++
 Sets the maximum number of tokens to parse in each example document field that 
is not stored with TermVector support.
 
 `mlt.boost`::
-Specifies if the query will be boosted by the interesting term relevance. It 
can be either "true" or "false".
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
++
+Specifies if the query will be boosted by the interesting term relevance.
+Possible values are `true` or `false`.
 
 `mlt.qf`::
-Query fields and their boosts using the same format as that used by the 
<<the-dismax-query-parser.adoc#,DisMax Query Parser>>. These fields must also 
be specified in `mlt.fl`.
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Query fields and their boosts using the same format used by the 
<<the-dismax-query-parser.adoc#,DisMax Query Parser>>.
+These fields must also be specified in `mlt.fl`.
+
+`mlt.interestingTerms`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `none`
+|===
++
+Adds a section in the response that shows the top terms (based on TF/IDF) used 
for the MoreLikeThis query.
+It supports three possible values:
++
+* `list` lists the terms.
+* `none` lists no terms.
+* `details` lists the terms along with the boost value used for each term.
+Unless `mlt.boost=true`, all terms will have `boost=1.0`.
+
+=== MoreLikeThis Request Handler
+
+==== Request Handler Configuration
+
+The MoreLikeThis request handler is not configured by default and needs to be 
set up before using it.
+You can do this by manually editing `solrconfig.xml` or with the Config API:
+
+[.dynamic-tabs]
+--
+[example.tab-pane#manualconfig]
+====
+[.tab-label]*Manual Configuration*
+
+[source,xml]
+----
+<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
+  <lst name="defaults">
+    <str name="mlt.fl">body</str>
+  </lst>
+</requestHandler>
+----
+====
+
+[example.tab-pane#configapi]
+====
+[.tab-label]*Config API*
+
+[source,bash]
+----
+curl -X POST -H 'Content-type:application/json' -d '{
+  "add-requesthandler": {
+    "name": "/mlt",
+    "class": "solr.MoreLikeThisHandler",
+    "defaults": {"mlt.fl": "body"}
+  }
+}' http://localhost:8983/solr/<collection>/config
+----
+====
+--
+
+Both of the above examples set the `mlt.fl` parameter to "body" for the 
request handler.
+This means that all requests to the handler will use that value for the 
parameter unless specifically overridden in an individual request.
+
+For more about request handler configuration in general, see the section 
<<requesthandlers-and-searchcomponents-in-solrconfig.adoc#default-components,RequestHandlers
 and SearchComponents in Solrconfig>>.
+
+==== Request Handler Parameters
+
+The MoreLikeThis request handler supports the following parameters in addition 
to the <<Common Handler and Component Parameters,common parameters>> above.
+It supports faceting, paging, and filtering using common query parameters, but 
does not work well with alternate query parsers.
+
+`mlt.match.include`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
++
+Specifies if the response should include the matched document.
+If set to `false`, the response will look like a normal select response.
+
+`mlt.match.offset`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Specifies an offset into the main query search results to locate the document 
on which the MoreLikeThis query should operate.
+By default, the query operates on the first result for the `q` parameter.
+
+==== Request Handler Query and Response
+
+Queries to the MoreLikeThis request handler use the name defined when it was 
configured (`/mlt` in the above example).
+
+The following example query uses a document (`q=id:0553573403`) found in 
Solr's example document set (`./example/exampledocs`), and asks that the author 
field be used to find similar documents (`mlt.fl=author`).
+
+[source,bash]
+http://localhost:8983/solr/gettingstarted/mlt?mlt.fl=author&mlt.interestingTerms=details&mlt.match.include=true&mlt.mindf=0&mlt.mintf=0&q=id%3A0553573403
+
+This query also requests interesting terms with their boosts 
(`mlt.interestingTerms=details`) and that the original document also be 
returned (`mlt.match.include=true`).
+The minimum term frequency (`mlt.mintf`) and minimum document frequency (`mlt.mindf`) are both set to `0`.
+
+The response will include a section `match`, which includes the original 
document.
+The `response` section includes the similar documents.
+Finally, the `interestingTerms` section shows the terms from the author field 
that were used to find the similar documents.
+Because we did not also specify `mlt.boost`, the boost values shown for the 
interesting terms all display `1.0`.
+
+[source,json]
+----
+{
+  "match":{"numFound":1,"start":0,"numFoundExact":true,
+    "docs":[
+      {
+        "id":"0553573403",
+        "cat":["book"],
+        "name":["A Game of Thrones"],
+        "price":[7.99],
+        "inStock":[true],
+        "author":["George R.R. Martin"],
+        "series_t":"A Song of Ice and Fire",
+        "sequence_i":1,
+        "genre_s":"fantasy",
+        "_version_":1693062911089442816}]
+  },
+  "response":{"numFound":2,"start":0,"numFoundExact":true,
+    "docs":[
+      {
+        "id":"0553579908",
+        "cat":["book"],
+        "name":["A Clash of Kings"],
+        "price":[7.99],
+        "inStock":[true],
+        "author":["George R.R. Martin"],
+        "series_t":"A Song of Ice and Fire",
+        "sequence_i":2,
+        "genre_s":"fantasy",
+        "_version_":1693062911094685696},
+      {
+        "id":"055357342X",
+        "cat":["book"],
+        "name":["A Storm of Swords"],
+        "price":[7.99],
+        "inStock":[true],
+        "author":["George R.R. Martin"],
+        "series_t":"A Song of Ice and Fire",
+        "sequence_i":3,
+        "genre_s":"fantasy",
+        "_version_":1693062911095734272}]
+  },
+  "interestingTerms":[
+    "author:r.r",1.0,
+    "author:george",1.0,
+    "author:martin",1.0]}
+----
+
+If we had not requested `mlt.match.include=true`, the response would not have 
included the `match` section.
+
+==== Streaming External Content to MoreLikeThis
+
+An external document (one not in the index) can be passed to the MoreLikeThis 
request handler to be used as the basis for finding similar documents.
+
+This is accomplished with the use of <<content-streams.adoc#,Content Streams>>.
+The body of a document can be passed directly to the request handler with the 
`stream.body` parameter.
+Alternatively, if remote streams are enabled, a URL or file could be passed.
+
+[source,bash]
+----
+http://localhost:8983/solr/<collection>/mlt?stream.body=electronics%20memory&mlt.fl=manu,cat&mlt.interestingTerms=list&mlt.mintf=0
+----
 
-== Parameters for the MoreLikeThisComponent
+This query would pass the terms "electronics memory" to the request handler 
instead of using a document already in the index.
 
-Using MoreLikeThis as a search component returns similar documents for each 
document in the response set. In addition to the common parameters, these 
additional options are available:
+The response in this case would look similar to the response above that used a 
document already in the index.
+
+=== MoreLikeThis Search Component
+
+Using MoreLikeThis as a search component returns similar documents for each 
document in the response set for another query.
+Because this runs the MoreLikeThis analysis for every document in the response, it 
can slow searches and should only be used when the use case warrants it.
+
+==== Search Component Configuration
+
+The MoreLikeThis search component is a default search component that works 
with all search handlers (see also 
<<requesthandlers-and-searchcomponents-in-solrconfig.adoc#default-components,Default
 Components>>).
+
+Since it is configured already, it doesn't need any additional configuration 
unless you'd like to set parameters for a particular collection that override 
the MoreLikeThis defaults.
+To do this, you could configure it like this:
+
+[source,xml]
+----
+<searchComponent name="mlt" class="solr.MoreLikeThisComponent">
+    <str name="mlt">true</str>
+    <str name="mlt.fl">body</str>
+</searchComponent>
+----
+
+The above example would always enable MoreLikeThis for all queries and will 
always use the "body" field.
+This is probably not something you really want!
+But the example serves to show how you might define whichever parameters you 
would like to be default for MoreLikeThis.
+
+If you gave the search component a name other than "mlt" (the name used in the 
above example), you would need to explicitly add it to a request handler as described 
in the section 
<<requesthandlers-and-searchcomponents-in-solrconfig.adoc#referencing-search-components,Referencing
 Search Components>>.
+Because the above example uses the same name as the default, the parameters 
defined there override Solr's default.
+
+==== Search Component Parameters
+
+The MoreLikeThis search component supports the following parameters in 
addition to the <<Common Handler and Component Parameters,common parameters>> 
above.
 
 `mlt`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
 If set to `true`, activates the `MoreLikeThis` component and enables Solr to 
return `MoreLikeThis` results.
 
 `mlt.count`::
-Specifies the number of similar documents to be returned for each result. The 
default value is 5.
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `5`
+|===
++
+Specifies the number of similar documents to be returned for each result.
 
-`mlt.interestingTerms`:: _Same as defined below for the MLT Handler._
+==== Search Component Query and Response
 
-== Parameters for the MoreLikeThisHandler
+The response when using MoreLikeThis as a search component is different from 
the response when using the request handler.
 
-The table below summarizes parameters accessible through the 
`MoreLikeThisHandler`. It supports faceting, paging, and filtering using common 
query parameters, but does not work well with alternate query parsers.
+In this case, we are using the `/select` request handler and performing a 
regular query (`q=author:martin`).
+We've asked for MoreLikeThis to be added to the response (`mlt=true`) and, as in 
the earlier example, set the minimum term and document frequencies to `0`; this 
request does not ask for interesting terms.
 
-`mlt.match.include`::
-Specifies whether or not the response should include the matched document. If 
set to false, the response will look like a normal select response.
+[source,bash]
+http://localhost:8983/solr/gettingstarted/select?mlt.fl=name&mlt.mindf=0&mlt.mintf=0&mlt=true&q=author%3Amartin
 
-`mlt.match.offset`::
-Specifies an offset into the main query search results to locate the document 
on which the `MoreLikeThis` query should operate. By default, the query 
operates on the first result for the q parameter.
+The response includes the results of our query, in this case 3 documents which 
have the term "martin" in the author field.
+We've changed the field, however, to find documents that are similar to these 
based on values in the `name` field (`mlt.fl=name`).
 
-`mlt.interestingTerms`::
-Controls how the `MoreLikeThis` component presents the "interesting" terms 
(the top TF/IDF terms) for the query.
-It supports three settings:
-The setting `list` lists the terms.
-The setting `none` lists no terms.
-The setting `details` lists the terms along with the boost value used for each 
term.
-Unless `mlt.boost=true`, all terms will have `boost=1.0`.
+In the response, a `moreLikeThis` section has been added.
+For each document in the results that match our query, a list of document IDs 
is returned with score values.
+Each of these documents is similar to the document in the result list to 
varying degrees.
 
+[source,json]

Review comment:
       Hi Cassandra, I am currently away, but back next week, took a note, I'll 
think about it and keep you updated early next week!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

