[https://issues.apache.org/jira/browse/SOLR-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Hoss Man updated SOLR-13336:
----------------------------
Affects Version/s: (was: master (9.0))
(was: 7.6)
8.0
Fix Version/s: 8.1
Description:
Changes made in Solr 7.0 set the effective value of
{{BooleanQuery.getMaxClauseCount}} to {{Integer.MAX_VALUE-1}} and only imposed
a restriction based on the (existing) solrconfig.xml setting at the Solr query
parser level, via a new utility helper method.
But this means programmatically generated queries (whether built by low-level
Lucene methods or by query rewriting) no longer had any safety valve to prevent
(effectively) infinite expansion. This issue fixes the problem by:
* restoring a default upper bound of 1024 on {{BooleanQuery.getMaxClauseCount}}
* introducing a new solr.xml-level setting for configuring this upper
bound:{noformat}
<int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
{noformat}
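The value {{${solr.max.booleanClauses:1024}}} uses a property-substitution form: a property name, then a default after the colon. The sketch below shows one way such a placeholder could be resolved, assuming a plain {{java.util.Properties}} source; {{PlaceholderSketch}} and {{resolve}} are hypothetical names, and this is not Solr's own substitution code.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Solr's implementation): resolves "${name}" or "${name:default}"
// placeholders against a Properties source.
final class PlaceholderSketch {
    // Group 1 is the property name, group 2 the optional default after the colon.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no default was given
            String replacement = props.getProperty(name, fallback);
            if (replacement == null) {
                throw new IllegalArgumentException("No value for property: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "${solr.max.booleanClauses:1024}";
        // No property set: the default after the colon is used.
        System.out.println(Integer.parseInt(resolve(raw, new Properties())));
        // Passing System.getProperties() instead would let a JVM flag such as
        // -Dsolr.max.booleanClauses=2048 take effect; here we set it explicitly.
        Properties p = new Properties();
        p.setProperty("solr.max.booleanClauses", "2048");
        System.out.println(Integer.parseInt(resolve(raw, p)));
    }
}
```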
*NOTE* that this solr.xml limit is a hard upper bound that supersedes the
existing solrconfig.xml setting, which has been left in place and still limits
the size of user-specified boolean queries. That is:
* solr.xml maxBooleanClauses >= solrconfig.xml maxBooleanClauses >= number of clauses a user explicitly specifies in a query string
* solr.xml maxBooleanClauses >= number of clauses in an expanded/rewritten query
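The two-level relationship above can be modeled as two separate checks: a per-core limit applied to clauses parsed from a query string, and a global hard limit applied to every boolean query however it was built. This is a simplified sketch; {{ClauseLimitSketch}}, {{checkParsed}} and {{checkAnyQuery}} are hypothetical names, not Solr or Lucene APIs, and capping the per-core value at the global one is an assumption of the sketch (Solr's actual handling of a mismatched configuration may differ).

```java
// Hypothetical model of the two-tier limit; not Solr's implementation.
final class ClauseLimitSketch {
    final int globalMax; // stands in for solr.xml maxBooleanClauses (hard upper bound)
    final int userMax;   // stands in for solrconfig.xml maxBooleanClauses

    ClauseLimitSketch(int globalMax, int userMax) {
        this.globalMax = globalMax;
        // Assumption of this sketch: cap the per-core limit at the global one so that
        // globalMax >= userMax always holds.
        this.userMax = Math.min(userMax, globalMax);
    }

    /** Applied by the query parser to clauses a user typed explicitly. */
    void checkParsed(int clauses) {
        if (clauses > userMax) {
            throw new IllegalStateException("query string has " + clauses + " clauses > " + userMax);
        }
    }

    /** Applied to every boolean query, including ones produced by rewriting or expansion. */
    void checkAnyQuery(int clauses) {
        if (clauses > globalMax) {
            throw new IllegalStateException("query has " + clauses + " clauses > " + globalMax);
        }
    }

    /** Returns true if the check passes, false if it rejects the query. */
    static boolean accepts(Runnable check) {
        try {
            check.run();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClauseLimitSketch limits = new ClauseLimitSketch(1024, 512);
        System.out.println(accepts(() -> limits.checkParsed(600)));    // false: over the user limit
        System.out.println(accepts(() -> limits.checkAnyQuery(600)));  // true: a rewrite may exceed userMax
        System.out.println(accepts(() -> limits.checkAnyQuery(5000))); // false: hard cap applies to rewrites
    }
}
```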
{panel:title=original bug report}
Since SOLR-10921 it appears that Solr always sets
{{BooleanQuery.maxClauseCount}} (at the Lucene level) to
{{Integer.MAX_VALUE-1}}. I assume this is because Solr parses
{{maxBooleanClauses}} out of the config and applies it externally.
In any case, when used as part of
{{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and possibly other places?),
the Lucene code checks internally against only the static {{maxClauseCount}}
variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).
Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?),
{{maxBooleanClauses}} is having no effect. I'm pretty sure this is what's
underlying the [issue reported here as being related to Solr
7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].
To summarize, users are definitely susceptible (to varying degrees of likely
severity, assuming no actual _malicious_ attack) if:
# Running Solr >= 7.6.0
# Using edismax with "ps" param set to >0
# Query-time analysis chain is _at all_ capable of producing graphs (e.g.,
WordDelimiterGraphFilter, or SynonymGraphFilter with corresponding synonyms
of varying token lengths).
Users are _particularly_ vulnerable in practice if they have query-time
{{WordDelimiterGraphFilter}} configured with {{preserveOriginal=true}}.
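Why graph-producing analysis matters here is multiplicative: if a phrase is expanded into every path through the token graph, each position offering k alternatives multiplies the number of phrase clauses by k. The sketch below only does that arithmetic on a hypothetical, simplified graph (a sequence of positions, each with a fixed number of alternatives); it does not model the exact output of {{WordDelimiterGraphFilter}} or {{SynonymGraphFilter}}.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical arithmetic sketch: number of distinct phrases obtained by enumerating
// every path through a simplified token graph, where position i offers
// alternatives.get(i) choices.
final class GraphPathCount {
    static long pathCount(List<Integer> alternatives) {
        long paths = 1;
        for (int k : alternatives) {
            paths = Math.multiplyExact(paths, k); // throws ArithmeticException on long overflow
        }
        return paths;
    }

    public static void main(String[] args) {
        // Ten positions, each offering two alternative paths.
        System.out.println(pathCount(Collections.nCopies(10, 2))); // 1024
        // One more such position doubles the count past a 1024-clause limit.
        System.out.println(pathCount(Collections.nCopies(11, 2))); // 2048
    }
}
```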
To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only
increased the likelihood of problems manifesting (as a result of LUCENE-8531).
Notably, the "enumerated strings" approach to graph phrase queries (reintroduced
by LUCENE-8531) was previously in place before 6.5, when it could rely on the
default Lucene-level {{maxClauseCount}} failsafe (removed as of 7.0). This
explains the odd "Affects versions": maxBooleanClauses was disabled at the
Lucene level (in Solr contexts) starting with version 7.0, but the change
became more likely to manifest problems for users as of 7.6.
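The "enumerated strings" approach and the role of a clause-count failsafe can be illustrated together: enumerate phrases path by path and abort once the count would exceed a maximum. This is a simplified sketch assuming a hypothetical list-of-alternatives graph model and a hypothetical {{TooManyPhrases}} exception; it does not reproduce Lucene's {{QueryBuilder}} code. With the limit effectively at {{Integer.MAX_VALUE-1}}, nothing in such a loop stops the expansion before time or memory runs out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "enumerated strings" phrase expansion with a clause-count failsafe.
// Not Lucene's QueryBuilder: the graph is modeled as a list of positions, each holding
// the alternative tokens available at that position.
final class EnumeratedPhrases {
    static final class TooManyPhrases extends RuntimeException {
        TooManyPhrases(int max) {
            super("more than " + max + " phrases");
        }
    }

    static List<String> enumerate(List<List<String>> positions, int maxClauses) {
        List<String> out = new ArrayList<>();
        expand(positions, 0, "", out, maxClauses);
        return out;
    }

    private static void expand(List<List<String>> positions, int i, String prefix,
                               List<String> out, int maxClauses) {
        if (i == positions.size()) {
            // The failsafe: refuse to add a phrase once the limit has been reached.
            if (out.size() >= maxClauses) {
                throw new TooManyPhrases(maxClauses);
            }
            out.add(prefix);
            return;
        }
        for (String token : positions.get(i)) {
            String next = (i == 0) ? token : prefix + " " + token;
            expand(positions, i + 1, next, out, maxClauses);
        }
    }

    public static void main(String[] args) {
        List<List<String>> graph = List.of(
            List.of("wi-fi", "wifi"),
            List.of("hot-spot", "hotspot"));
        // [wi-fi hot-spot, wi-fi hotspot, wifi hot-spot, wifi hotspot]
        System.out.println(enumerate(graph, 1024));
        try {
            enumerate(graph, 3); // 4 paths exceed a limit of 3
        } catch (TooManyPhrases e) {
            System.out.println(e.getMessage()); // more than 3 phrases
        }
    }
}
```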
{panel}
Summary: solrconfig.xml maxBooleanClauses ignored by
programmatic/rewritten queries; can result in exponential expansion of naive
queries (was: maxBooleanClauses ignored; can result in exponential expansion
of naive queries)
> solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can
> result in exponential expansion of naive queries
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-13336
> URL: https://issues.apache.org/jira/browse/SOLR-13336
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: 7.0, 8.0
> Reporter: Michael Gibney
> Assignee: Hoss Man
> Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-13336.patch, SOLR-13336.patch, SOLR-13336.patch
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)