Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija merged PR #2395: URL: https://github.com/apache/solr/pull/2395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2100767344 Cool; please remember to mention me in CHANGES.txt. And thanks for adding this protection. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2100578628 Thanks for the change David - apologies for the back and forth trying to understand what you were suggesting. I've merged your PR into this branch. With that wrapped up, I'm going to re-run tests/check/precommit and aim to merge this afternoon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1594022421 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1594022421 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1594016753 ## solr/core/src/test-files/solr/configsets/cloud-minimal-field-limiting/conf/solrconfig.xml: ## @@ -0,0 +1,73 @@ + + + + + + + + + ${solr.data.dir:} + + + + + ${tests.luceneMatchVersion:LATEST} + + + + ${solr.commitwithin.softcommit:true} + + + + + + + explicit + true + text + + + + +${solr.test.maxFields:1234} + Review Comment: I've updated this to be in line with your request. Though I'm still curious if there's a reason to "prefer" this going forward, or whether it's subjective. I've not paid much attention historically to how these appear in our configsets, but naively I'd assume that more composability is better. ## solr/core/src/test-files/solr/configsets/cloud-minimal-field-limiting/conf/solrconfig.xml: ## @@ -0,0 +1,73 @@ + + + + + + + + + ${solr.data.dir:} + + + + + ${tests.luceneMatchVersion:LATEST} + + + + ${solr.commitwithin.softcommit:true} + + + + + + + explicit + true + text + + + + +${solr.test.maxFields:1234} + Review Comment: I've updated this to be in line with your request. Though I'm still curious if there's a reason to prefer this going forward, or whether it's subjective. I've not paid much attention historically to how these appear in our configsets, but naively I'd assume that more composability is better. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2097012109 See https://github.com/gerlowskija/solr/pull/5 for a commit that makes a large simplification: * No NumFieldsMonitor * No named URP; anonymous class is fine (and less code) * URP need not be inserted in the happy path! The single line I refer to earlier is as follows in my PR to your PR: `final int currentNumFields = req.getSearcher().getFieldInfos().size();` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1591513535 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { + +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1591039849 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1591064705 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.common.SolrException; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { Review Comment: Missed this earlier; done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1591039849 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589670077 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { + +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589657516 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { + +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589647578 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { +super(core); +this.numFields = -1; + } + + @Override + public void postCommit() { +RefCounted indexSearcher = null; +try { + indexSearcher = getCore().getSearcher(); Review Comment: Maybe; Java exception handling is such a PITA I didn't want to defeat the lambda convenience if the lambda threw an IOException which is rather common. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589645861 ## solr/core/src/test-files/solr/configsets/cloud-minimal-field-limiting/conf/solrconfig.xml: ## @@ -0,0 +1,73 @@ + + + + + + + + + ${solr.data.dir:} + + + + + ${tests.luceneMatchVersion:LATEST} + + + + ${solr.commitwithin.softcommit:true} + + + + + + + explicit + true + text + + + + +${solr.test.maxFields:1234} + Review Comment: Having no-config vs config doesn't conceptually change where the element can be placed. ex: `` vs ``` ``` Can be in the same places. I went to the ref guide's [page on URPs](https://solr.apache.org/guide/solr/latest/configuration-guide/update-request-processors.html#custom-update-request-processor-chain) and observe the first example shows a chain with several URPs, one of which has config. I think it's natural to place URPs inline in the chain. Separating it out is for composability if there are multiple chains and URPs and you want to avoid possible repetition. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589189040 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); +if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null "); +} +maximumFields = (Integer) rawMaxFields; +if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); +} + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { +
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589268556 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { +super(core); +this.numFields = -1; + } + + @Override + public void postCommit() { +RefCounted indexSearcher = null; +try { + indexSearcher = getCore().getSearcher(); Review Comment: Is there a reason that `withSearcher` throws an IOException? Would it be worth having one of those methods that consumes a vanilla Function instead of a `IOFunction`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2093110049 Thanks for the re-review @dsmiley . Pending feedback from others I'm going to stick with the NumFieldMonitor approach (more details [here](https://github.com/apache/solr/pull/2395#discussion_r1589189040)), but I've addressed the rest of your suggestions. Lmk what you think! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1589215019 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { Review Comment: > I suppose you didn't like my proposal that would eliminate the need for this event listener Yes, in short. I agree it's a matter of opinion, but I prefer the particular complexity and benefits that NumFieldsMonitor offers to that of the direct-SIS or weak-reference approach. See my reply [on the factory above](https://github.com/apache/solr/pull/2395/files#diff-6da5de1a5fc66295d066909ad976a15ad31e8411ed71d7909e8ed0d14f3cac78R103) for a bit more detail/explanation. ## solr/core/src/test/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorIntegrationTest.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.UUID; +import java.util.concurrent.TimeUnit; +import org.apache.solr.client.solrj.request.CollectionAdminRequest; +import org.apache.solr.cloud.SolrCloudTestCase; +import org.apache.solr.common.SolrInputDocument; +import org.hamcrest.Matchers; +import org.junit.BeforeClass; +import org.junit.Test; + +public class NumFieldLimitingUpdateRequestProcessorIntegrationTest extends SolrCloudTestCase { + + private static String SINGLE_SHARD_COLL_NAME = "singleShardColl"; Review Comment: I think this is copy/paste from a test that used both single and multi-shard collections. I've updated the name to be more generic. ## solr/core/src/test-files/solr/configsets/cloud-minimal-field-limiting/conf/solrconfig.xml: ## @@ -0,0 +1,73 @@ + + + + + + + + + ${solr.data.dir:} + + + + + ${tests.luceneMatchVersion:LATEST} + + + + ${solr.commitwithin.softcommit:true} + + + + + + + explicit + true + text + + + + +${solr.test.maxFields:1234} + Review Comment: Tbh I wasn't aware that you could declare processors (that require config) directly *in* a processor chain. I can't find an example of that. Looking through several other solrconfig.xml files for URPs that require configuration...they all look similar to what I've done here afaict. (See the [_default configset Solr ships](https://github.com/apache/solr/blob/main/solr/server/solr/configsets/_default/conf/solrconfig.xml#L900-L944) and the [default configset in the 'core' modules tests](https://github.com/apache/solr/blob/main/solr/core/src/test-files/solr/configsets/_default/conf/solrconfig.xml#L61-L81)...both look similar to the example here) Maybe I'm misunderstanding you or missing something though - can you point to an example of how you'd prefer to see this declared, and a reason it's preferable? ## solr/solr-ref-guide/modules/configuration-guide/pages/update-request-processors.adoc: ## @@ -337,6 +337,12 @@ Documents processed prior to the offender
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1586827784 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.common.SolrException; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { Review Comment: javadoc needed, even if one line linking to a factory for further info. Or don't make public :-) ## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { +super(core); +this.numFields = -1; + } + + @Override + public void postCommit() { +RefCounted indexSearcher = null; +try { + indexSearcher = getCore().getSearcher(); Review Comment: see SolrCore.withSearcher to make this code less involved. I added that long ago and getSearcher's javadocs recommend using it. ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2086978057 Alright, I've removed a lot of the leadership and shard checking logic based on review feedback. I think this should be ready to go, unless you have remaining concerns @dsmiley ? I've also added the new URP to the URP-chain defined in the "_default" configset (`/solr/server/solr/configsets/_default`). My intent here is that this part of the PR would be a "main" only change. 9.x would either remain without the URP in the _default configset, or have it enabled but in "warnOnly" mode (my personal preference). I used a relatively high field limit, 1000, but we can raise it if folks think 1000 is too low? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1585300753 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.cloud.CloudDescriptor; +import org.apache.solr.cloud.ZkController; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.cloud.Replica; +import org.apache.solr.common.cloud.Slice; +import org.apache.solr.common.util.StrUtils; +import org.apache.solr.core.CoreDescriptor; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + private SolrQueryRequest req; + private int fieldThreshold; + private int currentNumFields; + private boolean warnOnly; + + public NumFieldLimitingUpdateRequestProcessor( + SolrQueryRequest req, + UpdateRequestProcessor next, + int fieldThreshold, + int currentNumFields, + boolean warnOnly) { +super(next); +this.req = req; +this.fieldThreshold = fieldThreshold; +this.currentNumFields = currentNumFields; +this.warnOnly = warnOnly; + } + + public void processAdd(AddUpdateCommand cmd) throws IOException { +if (!isCloudMode() || /* implicit isCloudMode==true */ isLeaderThatOwnsTheDoc(cmd)) { Review Comment: Yeah, I'm fine with moving away from that. In fact, rather than move the leader-check to the Factory, I think I'll omit it altogether. I don't think it makes much sense if we're not checking the shard as well. Though if you see a way for that to work, I'm happy to add it back in. ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1567989670 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.cloud.CloudDescriptor; +import org.apache.solr.cloud.ZkController; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.cloud.Replica; +import org.apache.solr.common.cloud.Slice; +import org.apache.solr.common.util.StrUtils; +import org.apache.solr.core.CoreDescriptor; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + private SolrQueryRequest req; + private int fieldThreshold; + private int currentNumFields; + private boolean warnOnly; + + public NumFieldLimitingUpdateRequestProcessor( + SolrQueryRequest req, + UpdateRequestProcessor next, + int fieldThreshold, + int currentNumFields, + boolean warnOnly) { +super(next); +this.req = req; +this.fieldThreshold = fieldThreshold; +this.currentNumFields = currentNumFields; +this.warnOnly = warnOnly; + } + + public void processAdd(AddUpdateCommand cmd) throws IOException { +if (!isCloudMode() || /* implicit isCloudMode==true */ isLeaderThatOwnsTheDoc(cmd)) { Review Comment: isCloudMode -- let's not; the user can configure or not as they wish. "isLeaderThatOwnsTheDoc" can also be removed. Instead in the URP Factory, if we are not a leader, then skip this URP. The "owns the doc concept" I think we are agreeing we don't need in this PR. ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + *
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2059268487 > I don't like the complexity in this URP relating to tolerance of where the URP is placed in the chain; I'd feel better if the URP were simplified from that concern and we expect the user to place it at an appropriate spot Yeah I understand. I still feel a little reluctant I guess, but I'm likely being paranoid. It's harder to feel good about the "win" in making things safer for users if the change opens up another trappy state when misconfigured. But I'll take out the leader-check for now. We can always re-evaluate later if we see people getting bitten by this, and there are other ways to mitigate the risk of misconfig (e.g. putting into the "_default" config for 10.0 so users needn't tweak things). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2050095834 I don't like the complexity in this URP relating to tolerance of where the URP is placed in the chain; I'd feel better if the URP were simplified from that concern and we expect the user to place it at an appropriate spot. We don't have complexity like that elsewhere and/or I argue it's not the right approach. I sympathize with why an URP feels right. Okay. On each `addDoc`, don't we just need to do the check on the current SolrIndexSearcher but remember who the searcher was so that the next `addDoc` can see it's the same searcher and if so don't do the check? It'd need to cache the searcher reference for the instance equality; use a `WeakReference` so we allow the searcher to close & GC. If we do this, we don't need a commit event listener, thus don't need an additional component / confusing interaction complexity. Don't get the SolrIndexSearcher from the request (we don't want a per-request cache instance), get it from the SolrCore so we see a possibly evolving SolrIndexSearcher. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2049901831 > Sorry if I rained on your parade, attempting to get this in by EOW. Oh no worries; there's no particular rush. That was just how long I was planning to wait if no one commented in the interim. I'm glad you both chimed in! Though tbh I can't quite tell where you are on the PR approach overall: 1. Are you '-1' this being an URP at all? 2. Are you OK with this being an URP, with the minor tweaks you mentioned (e.g. documenting that it goes _after_ DUP, etc.)? 3. Are you only OK with this being an URP, but only if we make some improvements to the URP framework (e.g. to ensure certain URPs always follow DUP) 4. ...something else? Please let me know! One last reply inline: > To block all indexing, it could set SolrCore.readOnly = true and be done with it IMO `SolrCore.readOnly` is too blunt a tool for the "field limiting" this PR attempts. First because `readOnly` would block deletions and operations that we'd want to allow. And second because a user blocked by 'readOnly' would have their `/update` requests fail with a message that's pretty inscrutable or at least unrelated to the root cause. That's what I like about the URP approach - it gives us more granularity into the different types of "/update" operations and allows us to get a nice (or at least, nicer) error message back to users. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1560999237 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.cloud.CloudDescriptor; +import org.apache.solr.cloud.ZkController; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.cloud.Replica; +import org.apache.solr.common.cloud.Slice; +import org.apache.solr.common.util.StrUtils; +import org.apache.solr.core.CoreDescriptor; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + private SolrQueryRequest req; + private int fieldThreshold; + private int currentNumFields; + private boolean warnOnly; + + public NumFieldLimitingUpdateRequestProcessor( + SolrQueryRequest req, + UpdateRequestProcessor next, + int fieldThreshold, + int currentNumFields, + boolean warnOnly) { +super(next); +this.req = req; +this.fieldThreshold = fieldThreshold; +this.currentNumFields = currentNumFields; +this.warnOnly = warnOnly; + } + + public void processAdd(AddUpdateCommand cmd) throws IOException { +if (!isCloudMode() || /* implicit isCloudMode==true */ isLeaderThatOwnsTheDoc(cmd)) { + if (coreExceedsFieldLimit()) { +throwExceptionOrLog(cmd.getPrintableId()); + } else { +if (log.isDebugEnabled()) { + log.debug( + "Allowing document {}, since current core is under the max-field limit (numFields={}, maxThreshold={})", Review Comment: Done ## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { +super(core); +this.numFields = -1; + } + + @Override + public void postCommit() { +RefCounted indexSearcher = null; +try { + indexSearcher = getCore().getSearcher(); + // Get the number of fields directly from the IndexReader instead of the Schema object to also + // include the Review Comment: > formatting? Done > are there any other interesting places in Solr to use a NumFieldsMonitor? Good question. It does seem like a kindof neat thing to have around, but I can't think of any other use-cases off the top of my head... ## solr/core/src/test/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactoryTest.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1561009284 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * {@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * + * maxFields - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * warnOnly - (optional) If true then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to false + *if not specified + * + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { +// register a commit callback for monitoring the number of fields in the schema +numFieldsMonitor = new NumFieldsMonitor(core); +core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); +core.registerNewSearcherListener(numFieldsMonitor); + } + + public void init(NamedList args) { +warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + +if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); +} +final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); Review Comment: Agreed - it's really unfortunate. I looked at a number of URP's and NamedListInitializedPlugin's, but couldn't find anything I liked. Maybe we could file a ticket for that separately; it'd be a really nice cleanup for all our NLIP's to use a set of utilities like: ``` int getRequiredNamedListValueAsInteger(NamedList nl, String paramName); Integer getNamedListValueAsInteger(NamedList nl, String paramName); boolean getRequiredNamedListValueAsBoolean(NamedList nl, String paramName);
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
dsmiley commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2048498024 Sorry if I rained on your parade, attempting to get this in by EOW. I like the idea of having checks like this by default but I'd like Solr to have such built-in plugins simply be registered automatically unless you set some env/sys-prop to disable, like "solr.core.eventListener.fieldLimit.enabled". We already register a ton of query parsers even though nobody asked for them :-). I think that should be the model for built-in plugins so that we can continue slowly to reduce what's in a solrconfig.xml yet also allow someone to disable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
epugh commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1559751361 ## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { +super(core); +this.numFields = -1; + } + + @Override + public void postCommit() { +RefCounted indexSearcher = null; +try { + indexSearcher = getCore().getSearcher(); + // Get the number of fields directly from the IndexReader instead of the Schema object to also + // include the Review Comment: are there any other interesting places in Solr to use a NumFieldsMonitor? ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.cloud.CloudDescriptor; +import org.apache.solr.cloud.ZkController; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.cloud.Replica; +import org.apache.solr.common.cloud.Slice; +import org.apache.solr.common.util.StrUtils; +import org.apache.solr.core.CoreDescriptor; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + private SolrQueryRequest req; + private int fieldThreshold; + private int currentNumFields; + private boolean warnOnly; + + public NumFieldLimitingUpdateRequestProcessor( + SolrQueryRequest req, + UpdateRequestProcessor next, + int fieldThreshold, + int currentNumFields, + boolean warnOnly) { +super(next); +this.req = req; +this.fieldThreshold = fieldThreshold; +this.currentNumFields = currentNumFields; +this.warnOnly = warnOnly; + } + + public void processAdd(AddUpdateCommand cmd) throws IOException { +if (!isCloudMode() || /* implicit isCloudMode==true */ isLeaderThatOwnsTheDoc(cmd)) { + if (coreExceedsFieldLimit()) { +throwExceptionOrLog(cmd.getPrintableId()); + } else { +if (log.isDebugEnabled()) { + log.debug( + "Allowing document {}, since current core is under the max-field limit (numFields={}, maxThreshold={})", Review Comment: max-field or maxField? ## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2047973430 This should all be ready to go: I've added unit and integration tests, `./gradlew test` passes, there's ref-guide coverage and decent Javadocs, etc. I'll aim to merge EOW, absent any feedback folks might have here. One big question I have is: should we add this into the "_default" configset's default URP-chain? At least for 10.x? I'd argue we should. Safety guardrails like this one aren't any good unless users enable them, and having it be enabled by default is the best way to do that. We might get some complaints from folks who run into this during an inplace 9->10 upgrade, but: 1. We tell users to carefully reconsider their configsets when doing upgrades, so this shouldn't be a surprise. 2. We can highlight it clearly in the upgrade notes 3. Users who hit this on an in-place upgrade can always bump the limit, toggle the URP into 'warnOnly' mode, or disable the URP altogether if they really want. I'd love a sanity-check/+1 from someone else though before I make this enabled by default on `main`/10.x, so please chime in! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2045508653 In a sample solrconfig.xml, configuration for the proposed URP would look like: ``` 123 ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija commented on PR #2395: URL: https://github.com/apache/solr/pull/2395#issuecomment-2045498246 Still TBD - additional tests, ref-guide docs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]
gerlowskija opened a new pull request, #2395: URL: https://github.com/apache/solr/pull/2395 https://issues.apache.org/jira/browse/SOLR-17192 # Description Solr isn't infinitely scalable when it comes to the number of fields in each core/collection. Most deployments start to experience problems any time a core has upwards of a few hundred fields. Usually this doesn't exhibit itself right away. instead waiting until segment-merge or some other time to rear its head. Despite this being a known limitation Solr doesn't have any (active) way of helping users avoid this, excepting one or two references buried in the Solr ref-gudie. # Solution This commit adds a new UpdateRequestProcessor, `NumFieldsLimitingUpdateRequestProcessor`, that flags potentially-dangerous schema design for users by failing all updates once the relevant core has exceeded a configurable limit. The proposed URP has two configuration properties: - `maxFields` - (required) a positive integer value representing the maximum number of fields a core should be allowed to have. - `warnOnly` - (optional, defaults to 'false' if not specified) a boolean flag indicating whether Solr should "really" fail requests once the threshold has been hit, or should merely log a warning instead. # Tests Unit tests in `NumFieldsLimitingUpdateRequestProcessorFactoryTest`. Still needs some higher level testing prior to merge. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [ ] I have run `./gradlew check`. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org