[solr] branch branch_9_0 updated: [RefGuide] Various Lucene/Solr, Solr/Lucene cleanups (#799)

janhoy Fri, 08 Apr 2022 16:33:03 -0700

This is an automated email from the ASF dual-hosted git repository.

janhoy pushed a commit to branch branch_9_0
in repository https://gitbox.apache.org/repos/asf/solr.git



The following commit(s) were added to refs/heads/branch_9_0 by this push:
     new 90f7114671d [RefGuide] Various Lucene/Solr, Solr/Lucene cleanups (#799)
90f7114671d is described below

commit 90f7114671df5eb7ee3ff93413c813e404e8da15
Author: Jan Høydahl <[email protected]>
AuthorDate: Sat Apr 9 01:32:04 2022 +0200

    [RefGuide] Various Lucene/Solr, Solr/Lucene cleanups (#799)
    
    * Use dep-version-lucene for lucene version links
    * More nice wording from #800
    Co-authored by @cpoerschke
    
    (cherry picked from commit 67eb86aeaa73dcb57a18681847925c1f5de7df87)
---
 .../modules/configuration-guide/pages/configuring-solr-xml.adoc       | 2 +-
 .../modules/configuration-guide/pages/index-segments-merging.adoc     | 2 +-
 .../modules/deployment-guide/pages/indexupgrader-tool.adoc            | 2 +-
 solr/solr-ref-guide/modules/deployment-guide/pages/jvm-settings.adoc  | 2 +-
 .../modules/deployment-guide/pages/taking-solr-to-production.adoc     | 2 +-
 .../modules/getting-started/pages/about-this-guide.adoc               | 2 +-
 solr/solr-ref-guide/modules/getting-started/pages/tutorial-aws.adoc   | 2 +-
 .../modules/indexing-guide/pages/currencies-exchange-rates.adoc       | 2 +-
 .../modules/indexing-guide/pages/indexing-with-tika.adoc              | 2 +-
 .../modules/indexing-guide/pages/luke-request-handler.adoc            | 2 +-
 .../modules/indexing-guide/pages/phonetic-matching.adoc               | 2 +-
 .../solr-ref-guide/modules/query-guide/pages/dismax-query-parser.adoc | 2 +-
 solr/solr-ref-guide/modules/query-guide/pages/loading.adoc            | 2 +-
 solr/solr-ref-guide/modules/query-guide/pages/machine-learning.adoc   | 2 +-
 .../modules/query-guide/pages/standard-query-parser.adoc              | 4 ++--
 .../modules/query-guide/pages/stream-evaluator-reference.adoc         | 2 +-
 solr/solr-ref-guide/modules/query-guide/pages/term-vectors.adoc       | 4 ++--
 .../modules/upgrade-notes/pages/solr-upgrade-notes.adoc               | 2 +-
 18 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/solr/solr-ref-guide/modules/configuration-guide/pages/configuring-solr-xml.adoc b/solr/solr-ref-guide/modules/configuration-guide/pages/configuring-solr-xml.adoc
index 9f1955082da..9744a7956ae 100644
--- a/solr/solr-ref-guide/modules/configuration-guide/pages/configuring-solr-xml.adoc
+++ b/solr/solr-ref-guide/modules/configuration-guide/pages/configuring-solr-xml.adoc
@@ -260,7 +260,7 @@ The directory under which configsets for Solr cores can be found.
 Sets the maximum number of (nested) clauses allowed in any query.
 +
 This global limit provides a safety constraint on the total number of clauses 
allowed in any query against any collection -- regardless of whether those 
clauses were explicitly specified in a query string, or were the result of 
query expansion/re-writing from a more complex type of query based on the terms 
in the index.
-This limit is enforced at multiple points in  Lucene, both to prevent 
primitive query objects (mainly `BooleanQuery`) from being constructed with an 
excessive number of clauses in a way that may exhaust the JVM heap, but also to 
ensure that no composite query (made up of multiple primitive queries) can be 
executed with an excessive _total_ number of nested clauses in a way that may 
cause a search thread to use excessive CPU.
+This limit is enforced at multiple points in Lucene, both to prevent primitive 
query objects (mainly `BooleanQuery`) from being constructed with an excessive 
number of clauses in a way that may exhaust the JVM heap, but also to ensure 
that no composite query (made up of multiple primitive queries) can be executed 
with an excessive _total_ number of nested clauses in a way that may cause a 
search thread to use excessive CPU.
 +
 In default configurations this property uses the value of the 
`solr.max.booleanClauses` system property if specified.
 This is the same system property used in the `_default` configset for the 
xref:caches-warming.adoc#maxbooleanclauses-element[`<maxBooleanClauses>` 
element of `solrconfig.xml`] making it easy for Solr administrators to increase 
both values (in all collections) without needing to search through and update 
all of their configs.
diff --git a/solr/solr-ref-guide/modules/configuration-guide/pages/index-segments-merging.adoc b/solr/solr-ref-guide/modules/configuration-guide/pages/index-segments-merging.adoc
index 9bf7cb4d74b..766e61cbaf7 100644
--- a/solr/solr-ref-guide/modules/configuration-guide/pages/index-segments-merging.adoc
+++ b/solr/solr-ref-guide/modules/configuration-guide/pages/index-segments-merging.adoc
@@ -250,7 +250,7 @@ This is not required for near real-time search, but will reduce search latency o
 == Compound File Segments
 
 Each Lucene segment is typically comprised of a dozen or so files.
-Lucene can be configured to bundle all of the files for a segment into a 
single compound file using a file extension of `.cfs`, for "Compound File 
Segment".
+Solr can be configured to bundle all of the files for a Lucene segment into a 
single compound file using a file extension of `.cfs`, for "Compound File 
Segment".
 
 CFS segments may incur a minor performance hit for various reasons, depending 
on the runtime environment.
 For example, filesystem buffers are typically associated with open file 
descriptors, which may limit the total cache space available to each index.
diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/indexupgrader-tool.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/indexupgrader-tool.adoc
index 3edffb1d023..e6aa84ff1da 100644
--- a/solr/solr-ref-guide/modules/deployment-guide/pages/indexupgrader-tool.adoc
+++ b/solr/solr-ref-guide/modules/deployment-guide/pages/indexupgrader-tool.adoc
@@ -36,7 +36,7 @@ You will need to include the `lucene-core-<version>.jar` and `lucene-backwards-c
 
 [source,bash,subs="attributes"]
 ----
-java -cp lucene-core-{solr-full-version}.jar:lucene-backward-codecs-{solr-full-version}.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] /path/to/index
+java -cp lucene-core-{dep-version-lucene}.jar:lucene-backward-codecs-{dep-version-lucene}.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] /path/to/index
 ----
 
 This tool keeps only the last commit in an index.
diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/jvm-settings.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/jvm-settings.adoc
index f2a73a418d2..d1217258a63 100644
--- a/solr/solr-ref-guide/modules/deployment-guide/pages/jvm-settings.adoc
+++ b/solr/solr-ref-guide/modules/deployment-guide/pages/jvm-settings.adoc
@@ -39,7 +39,7 @@ There are several points to keep in mind:
 
 * Running Solr with too little "headroom" allocated for the heap can cause 
excessive resources to be consumed by continual GC.
 Thus the 25-50% recommendation above.
-* Lucene/Solr makes extensive use of MMapDirectory, which uses RAM _not_ 
reserved for the JVM for most of the Lucene index.
+* Solr makes extensive use of MMapDirectory, which uses RAM _not_ reserved for 
the JVM for most of the Lucene index.
 Therefore, as much memory as possible should be left for the operating system 
to use for this purpose.
 * The heap allocated should be as small as possible while maintaining good 
performance.
 8-16Gb is quite common, and larger heaps are sometimes used.
diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/taking-solr-to-production.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/taking-solr-to-production.adoc
index 34f807423d2..1af02f47f70 100644
--- a/solr/solr-ref-guide/modules/deployment-guide/pages/taking-solr-to-production.adoc
+++ b/solr/solr-ref-guide/modules/deployment-guide/pages/taking-solr-to-production.adoc
@@ -386,7 +386,7 @@ Errors such as "too many open files", "connection error", and "max processes exc
 
 === Avoid Swapping (*nix Operating Systems)
 
-When running a Java application like Lucene/Solr, having the OS swap memory to 
disk is a very bad situation.
+When running a Java application like Solr, having the OS swap memory to disk 
is a very bad situation.
 We usually prefer a hard crash so other healthy Solr nodes can take over, 
instead of letting a Solr node swap, causing terrible performance, timeouts and 
an unstable system.
 So our recommendation is to disable swap on the host altogether or reduce the 
"swappiness".
 These instructions are valid for Linux environments.
diff --git a/solr/solr-ref-guide/modules/getting-started/pages/about-this-guide.adoc b/solr/solr-ref-guide/modules/getting-started/pages/about-this-guide.adoc
index 7ea258ce3cb..f60b414c0af 100644
--- a/solr/solr-ref-guide/modules/getting-started/pages/about-this-guide.adoc
+++ b/solr/solr-ref-guide/modules/getting-started/pages/about-this-guide.adoc
@@ -25,7 +25,7 @@ It is structured to address a broad spectrum of needs, ranging from new develope
 It will be of use at any point in the application life cycle, for whenever you 
need authoritative information about Solr.
 
 The material as presented assumes that you are familiar with some basic search 
concepts and that you can read XML.
-It does not assume that you are a Java programmer, although knowledge of Java 
is helpful when working directly with Lucene or when developing custom 
extensions to a Lucene/Solr installation.
+It does not assume that you are a Java programmer, although knowledge of Java 
is helpful when working directly with Lucene or when developing custom 
extensions to a Solr installation.
 
 == Hosts and Port Examples
 
diff --git a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-aws.adoc b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-aws.adoc
index 6439c5f56fe..4a7d7bdee4a 100644
--- a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-aws.adoc
+++ b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-aws.adoc
@@ -140,7 +140,7 @@ $ java -version
 +
 [source,bash,subs="verbatim,attributes+"]
 # download desired version of Solr
-$ wget http://archive.apache.org/dist/lucene/solr/{solr-full-version}/solr-{solr-full-version}.tgz
+$ wget http://archive.apache.org/dist/solr/solr/{solr-full-version}/solr-{solr-full-version}.tgz
 # untar
 $ tar -zxvf solr-{solr-full-version}.tgz
 # set SOLR_HOME
diff --git a/solr/solr-ref-guide/modules/indexing-guide/pages/currencies-exchange-rates.adoc b/solr/solr-ref-guide/modules/indexing-guide/pages/currencies-exchange-rates.adoc
index 7d089ac9415..3853ce305cc 100644
--- a/solr/solr-ref-guide/modules/indexing-guide/pages/currencies-exchange-rates.adoc
+++ b/solr/solr-ref-guide/modules/indexing-guide/pages/currencies-exchange-rates.adoc
@@ -16,7 +16,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-The `currency` FieldType provides support for monetary values to Solr/Lucene 
with query-time currency conversion and exchange rates.
+The `currency` FieldType provides support for monetary values to Solr with 
query-time currency conversion and exchange rates.
 The following features are supported:
 
 * Point queries
diff --git a/solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc b/solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc
index 4c55ae950e5..73a38af2c96 100644
--- a/solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc
+++ b/solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc
@@ -46,7 +46,7 @@ Solr Cell supplies some metadata of its own too.
 You can configure which elements should be included/ignored, and which should 
map to another field.
 * Solr Cell maps each piece of metadata onto a field.
 By default it maps to the same name but several parameters control how this is 
done.
-* When Solr Cell finishes creating the internal `SolrInputDocument`, the rest 
of the Lucene/Solr indexing stack takes over.
+* When Solr Cell finishes creating the internal `SolrInputDocument`, the rest 
of the indexing stack takes over.
 The next step after any update handler is the 
xref:configuration-guide:update-request-processors.adoc[Update Request 
Processor] chain.
 
 Solr Cell is a module, which means it's not automatically included with Solr 
but must be configured.
diff --git a/solr/solr-ref-guide/modules/indexing-guide/pages/luke-request-handler.adoc b/solr/solr-ref-guide/modules/indexing-guide/pages/luke-request-handler.adoc
index bb0987d17dd..56ec50eebc8 100644
--- a/solr/solr-ref-guide/modules/indexing-guide/pages/luke-request-handler.adoc
+++ b/solr/solr-ref-guide/modules/indexing-guide/pages/luke-request-handler.adoc
@@ -17,7 +17,7 @@
 // under the License.
 
 The Luke Request Handler offers programmatic access to the information 
provided on the xref:schema-browser-screen.adoc[] page of the Admin UI.
-It is modeled after Luke, the Lucene Index Browser by Andrzej Bialecki.
+It is modeled after https://github.com/apache/lucene/tree/releases/lucene/{dep-version-lucene}/lucene/luke[Luke], the Lucene Index Browser.
 It is an implicit handler, so you don't need to define it in `solrconfig.xml`.
 
 The Luke Request Handler accepts the following parameters:
diff --git a/solr/solr-ref-guide/modules/indexing-guide/pages/phonetic-matching.adoc b/solr/solr-ref-guide/modules/indexing-guide/pages/phonetic-matching.adoc
index 2f6fd1365b9..c68b7a9e2ac 100644
--- a/solr/solr-ref-guide/modules/indexing-guide/pages/phonetic-matching.adoc
+++ b/solr/solr-ref-guide/modules/indexing-guide/pages/phonetic-matching.adoc
@@ -25,7 +25,7 @@ For overviews of and comparisons between algorithms, see http://en.wikipedia.org
 For examples of how to use this encoding in your analyzer, see 
xref:filters.adoc#beider-morse-filter[Beider Morse Filter] in the Filter 
Descriptions section.
 
 Beider-Morse Phonetic Matching (BMPM) is a "soundalike" tool that lets you 
search using a new phonetic matching system.
-BMPM helps you search for personal names (or just surnames) in a Solr/Lucene 
index, and is far superior to the existing phonetic codecs, such as regular 
soundex, metaphone, caverphone, etc.
+BMPM helps you search for personal names (or just surnames) in a Solr index, 
and is far superior to the existing phonetic codecs, such as regular soundex, 
metaphone, caverphone, etc.
 
 In general, phonetic matching lets you search a name list for names that are 
phonetically equivalent to the desired name.
 BMPM is similar to a soundex search in that an exact spelling is not required.
diff --git a/solr/solr-ref-guide/modules/query-guide/pages/dismax-query-parser.adoc b/solr/solr-ref-guide/modules/query-guide/pages/dismax-query-parser.adoc
index 52d209bafdf..c9d0db263f7 100644
--- a/solr/solr-ref-guide/modules/query-guide/pages/dismax-query-parser.adoc
+++ b/solr/solr-ref-guide/modules/query-guide/pages/dismax-query-parser.adoc
@@ -77,7 +77,7 @@ These boost factors make matches in `fieldOne` much more significant than matche
 
 === mm (Minimum Should Match) Parameter
 
-When processing queries, Lucene/Solr recognizes three types of clauses: 
mandatory, prohibited, and "optional" (also known as "should" clauses).
+When processing queries, there are three types of clauses: mandatory, 
prohibited, and "optional" (also known as "should" clauses).
 By default, all words or phrases specified in the `q` parameter are treated as 
"optional" clauses unless they are preceded by a "+" or a "-".
 When dealing with these "optional" clauses, the `mm` parameter makes it 
possible to say that a certain minimum number of those clauses must match.
 The DisMax query parser offers great flexibility in how the minimum number can 
be specified.
diff --git a/solr/solr-ref-guide/modules/query-guide/pages/loading.adoc b/solr/solr-ref-guide/modules/query-guide/pages/loading.adoc
index 150f981ff2c..f2d03b10516 100644
--- a/solr/solr-ref-guide/modules/query-guide/pages/loading.adoc
+++ b/solr/solr-ref-guide/modules/query-guide/pages/loading.adoc
@@ -446,7 +446,7 @@ image::math-expressions/ifIsNull.png[]
 === Text Analysis
 
 The `analyze` function can be used from inside a `select` function to analyze
-a text field with a Lucene/Solr analyzer.
+a text field with an available analyzer.
 The output of `analyze` is a list of analyzed tokens which can be added to 
each tuple as a multi-valued field.
 
 The multi-valued field can then be sent to Solr for indexing or the 
`cartesianProduct`
diff --git a/solr/solr-ref-guide/modules/query-guide/pages/machine-learning.adoc b/solr/solr-ref-guide/modules/query-guide/pages/machine-learning.adoc
index 5f602ef04bc..55aaba56948 100644
--- a/solr/solr-ref-guide/modules/query-guide/pages/machine-learning.adoc
+++ b/solr/solr-ref-guide/modules/query-guide/pages/machine-learning.adoc
@@ -415,7 +415,7 @@ NOTE: The example below works with TF-IDF _term vectors_.
 The section xref:term-vectors.adoc[] offers a full explanation of this 
features.
 
 In the example the `search` function returns documents where the `review_t` 
field matches the phrase "star wars".
-The `select` function is run over the result set and applies the `analyze` 
function which uses the Lucene/Solr analyzer attached to the schema field 
`text_bigrams` to re-analyze the `review_t` field.
+The `select` function is run over the result set and applies the `analyze` 
function which uses the analyzer attached to the schema field `text_bigrams` to 
re-analyze the `review_t` field.
 This analyzer returns bigrams which are then annotated to documents in a field 
called `terms`.
 
 The `termVectors` function then creates TD-IDF term vectors from the bigrams 
stored in the `terms` field.
diff --git a/solr/solr-ref-guide/modules/query-guide/pages/standard-query-parser.adoc b/solr/solr-ref-guide/modules/query-guide/pages/standard-query-parser.adoc
index fe9bb929bcd..06dd8112a11 100644
--- a/solr/solr-ref-guide/modules/query-guide/pages/standard-query-parser.adoc
+++ b/solr/solr-ref-guide/modules/query-guide/pages/standard-query-parser.adoc
@@ -220,7 +220,7 @@ However for float/double types that support `NaN` values, these two queries perf
 
 === Boosting a Term with "^"
 
-Lucene/Solr provides the relevance level of matching documents based on the 
terms found.
+Solr provides the relevance level of matching documents based on the terms 
found.
 To boost a term use the caret symbol `^` with a boost factor (a number) at the 
end of the term you are searching.
 The higher the boost factor, the more relevant the term will be.
 
@@ -372,7 +372,7 @@ For example, to search for (1+1):2 without having Solr interpret the plus sign a
 
 == Grouping Terms to Form Sub-Queries
 
-Lucene/Solr supports using parentheses to group clauses to form sub-queries.
+Solr supports using parentheses to group clauses to form sub-queries.
 This can be very useful if you want to control the Boolean logic for a query.
 
 The query below searches for either "jakarta" or "apache" and "website":
diff --git a/solr/solr-ref-guide/modules/query-guide/pages/stream-evaluator-reference.adoc b/solr/solr-ref-guide/modules/query-guide/pages/stream-evaluator-reference.adoc
index 5e3accf0575..c6bb3f9a4a6 100644
--- a/solr/solr-ref-guide/modules/query-guide/pages/stream-evaluator-reference.adoc
+++ b/solr/solr-ref-guide/modules/query-guide/pages/stream-evaluator-reference.adoc
@@ -100,7 +100,7 @@ add(fieldA,if(gt(fieldA,fieldB),fieldA,fieldB)) // if fieldA > fieldB then field
 
 == analyze
 
-The `analyze` function analyzes text using a Lucene/Solr analyzer and returns 
a list of tokens emitted by the analyzer.
+The `analyze` function analyzes text using an available analyzer and returns a 
list of tokens emitted by the analyzer.
 The `analyze` function can be called on its own or within the 
xref:stream-decorator-reference.adoc#select[`select`] and 
xref:stream-decorator-reference.adoc#cartesianproduct[`cartesianProduct`] 
streaming expressions.
 
 === analyze Parameters
diff --git a/solr/solr-ref-guide/modules/query-guide/pages/term-vectors.adoc b/solr/solr-ref-guide/modules/query-guide/pages/term-vectors.adoc
index 7ca5527fff0..5f297d050a8 100644
--- a/solr/solr-ref-guide/modules/query-guide/pages/term-vectors.adoc
+++ b/solr/solr-ref-guide/modules/query-guide/pages/term-vectors.adoc
@@ -119,7 +119,7 @@ The phrase query "Man on Fire" is searched for and the top 5000 results, by scor
 A single field from the results is return which is the `review_t` field that 
contains text of the movie review.
 
 Then `cartesianProduct` function is run over the search results.
-The `cartesianProduct` function applies the `analyze` function, which takes 
the `review_t` field and analyzes it with the Lucene/Solr analyzer attached to 
the `text_bigrams` schema field.
+The `cartesianProduct` function applies the `analyze` function, which takes 
the `review_t` field and analyzes it with the analyzer attached to the 
`text_bigrams` schema field.
 This analyzer emits the bigrams found in the text field.
 The `cartesianProduct` function explodes each bigram into its own tuple with 
the bigram stored in the field `term`.
 
@@ -132,7 +132,7 @@ Then Zeppelin-Solr is used to visualize the top 10 ten bigrams.
 
 image::math-expressions/text-analytics.png[]
 
-Lucene/Solr analyzers can be configured in many different ways to support 
aggregations over NLP entities (people, places, companies, etc.) as well as 
tokens extracted with regular expressions or dictionaries.
+Analyzers can be configured in many different ways to support aggregations 
over NLP entities (people, places, companies, etc.) as well as tokens extracted 
with regular expressions or dictionaries.
 
 == TF-IDF Term Vectors
 
diff --git a/solr/solr-ref-guide/modules/upgrade-notes/pages/solr-upgrade-notes.adoc b/solr/solr-ref-guide/modules/upgrade-notes/pages/solr-upgrade-notes.adoc
index ac9c8bcd4ff..617cd14a0e7 100644
--- a/solr/solr-ref-guide/modules/upgrade-notes/pages/solr-upgrade-notes.adoc
+++ b/solr/solr-ref-guide/modules/upgrade-notes/pages/solr-upgrade-notes.adoc
@@ -574,7 +574,7 @@ for an overview of the main new features of Solr 8.4.
 
 When upgrading to 8.4.x users should be aware of the following major changes 
from 8.3.
 
-*Reminder:*  If you set the `postingsFormat` or `docValuesFormat` in the 
schema in order to use a non-default option, you risk preventing yourself from 
upgrading your Lucene/Solr software at future versions.
+*Reminder:*  If you set the `postingsFormat` or `docValuesFormat` in the 
schema in order to use a non-default option, you risk preventing yourself from 
upgrading your Solr software at future versions, due to changed version of the 
Lucene library.
 Multiple non-default postings formats changed in 8.4, thus rendering the index 
data from a previous index.
 This includes "FST50" which was recommended by the Solr TaggerHandler for 
performance reasons.
 There is now improved documentation to navigate this trade-off choice.
