[ 
https://issues.apache.org/jira/browse/SOLR-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-13336:
----------------------------
    Affects Version/s:     (was: master (9.0))
                           (was: 7.6)
                       8.0
        Fix Version/s: 8.1
          Description: 
Changes made in Solr 7.0 set the effective value of 
{{BooleanQuery.getMaxClauseCount}} to {{Integer.MAX_VALUE-1}} and only imposed 
a restriction based on the (existing) solrconfig.xml setting at the Solr query 
parser level via a new utility helper method.

But this means programmatically generated queries (either by low-level Lucene 
methods, or by query rewriting) no longer had any safety valve to prevent 
(effectively) infinite expansion.  This issue fixes this problem by:
* restoring a default upper bound on {{BooleanQuery.getMaxClauseCount}} of 1024
* introducing a new solr.xml level setting for configuring this upper 
bound:{noformat}
<int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
{noformat}
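
The {{${solr.max.booleanClauses:1024}}} value above uses property substitution with a default: the {{solr.max.booleanClauses}} system property is used if set, otherwise 1024. A minimal sketch of that resolution rule follows; the class and method names are hypothetical and this is not Solr's actual implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: mimics the ${name:default} syntax; not Solr's own code. */
final class PropertyDefaultSketch {
    // Matches ${name:default}; the name may not contain ':' or '}'.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    /** Replaces each ${name:default} with props.get(name), or the default if unset. */
    static String resolve(String text, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getOrDefault(m.group(1), m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String setting = "${solr.max.booleanClauses:1024}";
        // Property not set: falls back to the default, prints 1024.
        System.out.println(resolve(setting, Map.of()));
        // Property set (e.g. via -Dsolr.max.booleanClauses=4096): prints 4096.
        System.out.println(resolve(setting, Map.of("solr.max.booleanClauses", "4096")));
    }
}
```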

*NOTE* that this solr.xml limit is a hard upper bound that supersedes the 
existing solrconfig.xml setting, which has been left in place and still limits 
the size of user-specified boolean queries, i.e.: solr.xml maxBooleanClauses >= 
solrconfig.xml maxBooleanClauses >= number of clauses a user explicitly 
specifies in a query string; solr.xml maxBooleanClauses >= number of clauses 
in an expanded/rewritten query.
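
The relationship between the two limits can be sketched as a small model. This is an illustration only: the class, field, and method names are hypothetical, and it assumes that a solrconfig.xml value larger than the solr.xml bound is simply capped at that bound.

```java
/** Hypothetical model of the two limits; these are not Solr classes. */
final class ClauseLimitSketch {
    final int globalMax; // solr.xml maxBooleanClauses (hard upper bound)
    final int coreMax;   // solrconfig.xml maxBooleanClauses (per-core setting)

    ClauseLimitSketch(int globalMax, int coreMax) {
        this.globalMax = globalMax;
        this.coreMax = coreMax;
    }

    /** Limit for user-specified queries: the core setting, never above the global bound. */
    int effectiveUserLimit() {
        return Math.min(coreMax, globalMax);
    }

    /** Clauses written explicitly in a query string are checked against the core limit. */
    boolean allowsUserQuery(int clauses) {
        return clauses <= effectiveUserLimit();
    }

    /** Expanded or rewritten queries are checked only against the global bound. */
    boolean allowsExpandedQuery(int clauses) {
        return clauses <= globalMax;
    }

    public static void main(String[] args) {
        ClauseLimitSketch limits = new ClauseLimitSketch(1024, 512);
        System.out.println(limits.allowsUserQuery(600));      // false: above the core limit
        System.out.println(limits.allowsExpandedQuery(600));  // true: within the global bound
        System.out.println(limits.allowsExpandedQuery(2000)); // false: above the global bound
    }
}
```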

{panel:title=original bug report}
Since SOLR-10921 it appears that Solr always sets 
{{BooleanQuery.maxClauseCount}} (at the Lucene level) to 
{{Integer.MAX_VALUE-1}}. I assume this is because Solr parses 
{{maxBooleanClauses}} out of the config and applies it externally.

In any case, when used as part of 
{{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and possibly other places?), 
the Lucene code checks internally against only the static {{maxClauseCount}} 
variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).

Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), 
{{maxBooleanClauses}} is having no effect. I'm pretty sure this is what's 
underlying the [issue reported here as being related to Solr 
7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].

To summarize, users are definitely susceptible (to varying degrees of likely 
severity, assuming no actual _malicious_ attack) if:
 # Running Solr >= 7.6.0
 # Using edismax with "ps" param set to >0
 # Query-time analysis chain is _at all_ capable of producing graphs (e.g., 
WordDelimiterGraphFilter, SynonymGraphFilter that has corresponding synonyms 
with varying token lengths).

Users are _particularly_ vulnerable in practice if they have query-time 
{{WordDelimiterGraphFilter}} configured with {{preserveOriginal=true}}.

To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only 
increased the likelihood of problems manifesting (as a result of LUCENE-8531). 
Notably, the "enumerated strings" approach to graph phrase query (reintroduced 
by LUCENE-8531) was previously in place pre-6.5 – at which point it could rely 
on default Lucene-level {{maxClauseCount}} failsafe (removed as of 7.0). This 
explains the odd "Affects versions" => maxBooleanClauses was disabled at the 
Lucene level (in Solr contexts) starting with version 7.0, but the change 
became more likely to manifest problems for users as of 7.6.
{panel}
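
To illustrate why the expansion described in the original report grows so quickly: when a graph phrase query is built by enumerating every path through the token graph, the number of paths is the product of the alternatives at each position. The sketch below is a simplified model with hypothetical names, not the Lucene {{QueryBuilder}} code; it assumes each position contributes an independent, positive number of alternatives and shows a cap acting as the kind of failsafe this issue restores.

```java
/** Simplified model of path enumeration in a token graph; not Lucene code. */
final class GraphExpansionSketch {
    /**
     * Returns the number of distinct paths (product of alternatives per position),
     * failing as soon as the running total exceeds the cap.
     * Assumes every entry in alternativesPerPosition is positive.
     */
    static long countPaths(int[] alternativesPerPosition, int cap) {
        long paths = 1;
        for (int alternatives : alternativesPerPosition) {
            paths *= alternatives;
            if (paths > cap) {
                throw new IllegalStateException(
                    "expansion exceeds " + cap + " clauses");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Ten query terms, each with two alternative token paths
        // (for example an original token plus a split form).
        int[] tenTerms = new int[10];
        java.util.Arrays.fill(tenTerms, 2);
        System.out.println(countPaths(tenTerms, 1024)); // 2^10 = 1024, exactly at the cap

        // One more term doubles the path count and trips the cap.
        int[] elevenTerms = new int[11];
        java.util.Arrays.fill(elevenTerms, 2);
        try {
            countPaths(elevenTerms, 1024);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the cap effectively set to {{Integer.MAX_VALUE-1}}, as described in the report, a check of this kind would not trigger for any realistic query.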

  was:
Since SOLR-10921 it appears that Solr always sets 
{{BooleanQuery.maxClauseCount}} (at the Lucene level) to 
{{Integer.MAX_VALUE-1}}. I assume this is because Solr parses 
{{maxBooleanClauses}} out of the config and applies it externally.

In any case, when used as part of 
{{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and possibly other places?), 
the Lucene code checks internally against only the static {{maxClauseCount}} 
variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).

Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), 
{{maxBooleanClauses}} is having no effect. I'm pretty sure this is what's 
underlying the [issue reported here as being related to Solr 
7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].

To summarize, users are definitely susceptible (to varying degrees of likely 
severity, assuming no actual _malicious_ attack) if:
 # Running Solr >= 7.6.0
 # Using edismax with "ps" param set to >0
 # Query-time analysis chain is _at all_ capable of producing graphs (e.g., 
WordDelimiterGraphFilter, SynonymGraphFilter that has corresponding synonyms 
with varying token lengths).

Users are _particularly_ vulnerable in practice if they have query-time 
{{WordDelimiterGraphFilter}} configured with {{preserveOriginal=true}}.

To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only 
increased the likelihood of problems manifesting (as a result of LUCENE-8531). 
Notably, the "enumerated strings" approach to graph phrase query (reintroduced 
by LUCENE-8531) was previously in place pre-6.5 – at which point it could rely 
on default Lucene-level {{maxClauseCount}} failsafe (removed as of 7.0). This 
explains the odd "Affects versions" => maxBooleanClauses was disabled at the 
Lucene level (in Solr contexts) starting with version 7.0, but the change 
became more likely to manifest problems for users as of 7.6.

              Summary: solrconfig.xml maxBooleanClauses ignored by 
programmatic/rewritten queries; can result in exponential expansion of naive 
queries  (was: maxBooleanClauses ignored; can result in exponential expansion 
of naive queries)

> solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can 
> result in exponential expansion of naive queries
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13336
>                 URL: https://issues.apache.org/jira/browse/SOLR-13336
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.0, 8.0
>            Reporter: Michael Gibney
>            Assignee: Hoss Man
>            Priority: Major
>             Fix For: 8.1
>
>         Attachments: SOLR-13336.patch, SOLR-13336.patch, SOLR-13336.patch
>
>
> Changes made in Solr 7.0 set the effective value of 
> {{BooleanQuery.getMaxClauseCount}} to {{Integer.MAX_VALUE-1}} and only 
> imposed a restriction based on the (existing) solrconfig.xml setting at the 
> Solr query parser level via a new utility helper method.
> But this means programmatically generated queries (either by low-level Lucene 
> methods, or by query rewriting) no longer had any safety valve to prevent 
> (effectively) infinite expansion.  This issue fixes this problem by:
> * restoring a default upper bound on {{BooleanQuery.getMaxClauseCount}} of 1024
> * introducing a new solr.xml level setting for configuring this upper 
> bound:{noformat}
> <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
> {noformat}
> *NOTE* that this solr.xml limit is a hard upper bound that supersedes the 
> existing solrconfig.xml setting, which has been left in place and still 
> limits the size of user-specified boolean queries, i.e.: solr.xml 
> maxBooleanClauses >= solrconfig.xml maxBooleanClauses >= number of clauses a 
> user explicitly specifies in a query string; solr.xml maxBooleanClauses >= 
> number of clauses in an expanded/rewritten query.
> {panel:title=original bug report}
> Since SOLR-10921 it appears that Solr always sets 
> {{BooleanQuery.maxClauseCount}} (at the Lucene level) to 
> {{Integer.MAX_VALUE-1}}. I assume this is because Solr parses 
> {{maxBooleanClauses}} out of the config and applies it externally.
> In any case, when used as part of 
> {{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and possibly other places?), 
> the Lucene code checks internally against only the static {{maxClauseCount}} 
> variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).
> Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), 
> {{maxBooleanClauses}} is having no effect. I'm pretty sure this is what's 
> underlying the [issue reported here as being related to Solr 
> 7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].
> To summarize, users are definitely susceptible (to varying degrees of likely 
> severity, assuming no actual _malicious_ attack) if:
>  # Running Solr >= 7.6.0
>  # Using edismax with "ps" param set to >0
>  # Query-time analysis chain is _at all_ capable of producing graphs (e.g., 
> WordDelimiterGraphFilter, SynonymGraphFilter that has corresponding synonyms 
> with varying token lengths).
> Users are _particularly_ vulnerable in practice if they have query-time 
> {{WordDelimiterGraphFilter}} configured with {{preserveOriginal=true}}.
> To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only 
> increased the likelihood of problems manifesting (as a result of 
> LUCENE-8531). Notably, the "enumerated strings" approach to graph phrase 
> query (reintroduced by LUCENE-8531) was previously in place pre-6.5 – at 
> which point it could rely on default Lucene-level {{maxClauseCount}} failsafe 
> (removed as of 7.0). This explains the odd "Affects versions" => 
> maxBooleanClauses was disabled at the Lucene level (in Solr contexts) 
> starting with version 7.0, but the change became more likely to manifest 
> problems for users as of 7.6.
> {panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
