dsmiley commented on a change in pull request #47: URL: https://github.com/apache/solr/pull/47#discussion_r602231577
########## File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java ########## @@ -220,7 +220,7 @@ protected String getReplicateLeaderUrl(ZkNodeProps leaderprops) { final private void replicate(String nodeName, SolrCore core, ZkNodeProps leaderprops) throws SolrServerException, IOException { - final String leaderUrl = getReplicateLeaderUrl(leaderprops); + final String leaderUrl = getReplicateLeaderUrl(leaderprops);//TODO? Review comment: Did you say TODO because maybe this URL needs to be checked here? I did a "Analyze Data Flow To Here" in IntelliJ at the spot where we fetch the URL. This URL is always sourced from the SolrCloud cluster state, and not a parameter. So we're good. That said, it could be worth a comment in some places to say that we "trust" the leaderUrl from the cluster state (needn't be checked) ########## File path: solr/solr-ref-guide/src/format-of-solr-xml.adoc ########## @@ -103,6 +103,9 @@ Specifies the path to a common library directory that will be shared across all `allowPaths`:: Solr will normally only access folders relative to `$SOLR_HOME`, `$SOLR_DATA_HOME` or `coreRootDir`. If you need to e.g., create a core outside of these paths, you can explicitly allow the path with `allowPaths`. It is a comma separated string of file system paths to allow. The special value of `*` will allow any path on the system. +`allowUrls``: +When running Solr in non-cloud mode and if planning to do distributed search (using the "shards" parameter), the list of hosts needs to be allowed or Solr will forbid the request. The allow-list can also be configured in `solr.in.sh`. Review comment: Instead of making reference so `solr.in.sh`, it's better to refer to the particular environment variable that is used. People can set that env variable even if they don't touch solr.in.sh. ########## File path: solr/solr-ref-guide/src/distributed-requests.adoc ########## @@ -115,17 +115,19 @@ If specified, the thread pool will use a backing queue instead of a direct hando `fairnessPolicy`:: Chooses the JVM specifics dealing with fair policy queuing, if enabled distributed searches will be handled in a First in First out fashion at a cost to throughput. If disabled throughput will be favored over latency. The default is `false`. -`shardsWhitelist`:: +In addition, `HttpShardHandlerFactory` also depends on the following top-level property: + +`allowUrls`:: If specified, this lists limits what nodes can be requested in the `shards` request parameter. + -In SolrCloud mode this whitelist is automatically configured to include all live nodes in the cluster. +In SolrCloud mode this allow-list is automatically configured to include all live nodes in the cluster. + -In standalone mode the whitelist defaults to empty (sharding not allowed). +In standalone mode the allow-list defaults to empty (sharding not allowed). + -If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`. The value of this parameter is a comma separated list of the nodes that will be whitelisted, i.e., +If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.allowUrls=true`. The value of this parameter is a comma separated list of the nodes that will be allowed, i.e., `10.0.0.1:8983/solr,10.0.0.1:8984/solr`. + -NOTE: In SolrCloud mode, if at least one node is included in the whitelist, then the `live_nodes` will no longer be used as source for the list. This means that if you need to do a cross-cluster request using the `shards` parameter in SolrCloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the whitelist. +NOTE: In SolrCloud mode, if at least one node is included in the allow-list, then the `live_nodes` will no longer be used as source for the list. This means that if you need to do a cross-cluster request using the `shards` parameter in SolrCloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the allow-list. Review comment: IMO this is a shame but I understand we don't want to change this approach in this PR. ########## File path: solr/core/src/java/org/apache/solr/handler/component/HttpShardHandlerFactory.java ########## @@ -413,139 +384,4 @@ public void initializeMetrics(SolrMetricsContext parentContext, String scope) { solrMetricsContext.getMetricRegistry(), SolrMetricManager.mkName("httpShardExecutor", expandedScope, "threadPool")); } - Review comment: It's nice to see all this stuff moved out of HttpShardHandlerFactory which already has tons of concerns to deal with. ########## File path: solr/server/solr/solr.xml ########## @@ -53,7 +54,6 @@ class="HttpShardHandlerFactory"> <int name="socketTimeout">${socketTimeout:600000}</int> <int name="connTimeout">${connTimeout:60000}</int> - <str name="shardsWhitelist">${solr.shardsWhitelist:}</str> Review comment: If a user has a legacy config with this defined, what do we do now? Log a warning would be nice. ########## File path: solr/core/src/java/org/apache/solr/security/AllowListUrlChecker.java ########## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.security; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.cloud.ClusterState; +import org.apache.solr.common.util.StrUtils; +import org.apache.solr.core.NodeConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.annotation.Nullable; +import java.lang.invoke.MethodHandles; +import java.net.MalformedURLException; +import java.net.URL; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Set; + +/** + * Validates URLs based on an allow list or a {@link ClusterState} in SolrCloud. + */ +public class AllowListUrlChecker { + + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + /** + * {@link org.apache.solr.core.SolrXmlConfig} property to configure the allowed URLs. + */ + public static final String URL_ALLOW_LIST = "allowUrls"; + + /** + * System property to disable URL checking and {@link #ALLOW_ALL} instead. + */ + public static final String DISABLE_URL_ALLOW_LIST = "solr.disable." + URL_ALLOW_LIST; + + /** + * Clue given in URL-forbidden exceptions messages. + */ + public static final String SET_SOLR_DISABLE_URL_ALLOW_LIST_CLUE = "Set -D" + DISABLE_URL_ALLOW_LIST + "=true to disable URL allow-list checks."; + + /** + * Singleton checker which allows all URLs. {@link #isEnabled()} returns false. + */ + public static final AllowListUrlChecker ALLOW_ALL; + + static { + try { + ALLOW_ALL = new AllowListUrlChecker(null) { + @Override + public void checkAllowList(List<String> urls, ClusterState clusterState) { + // Allow. + } + + @Override + public boolean isEnabled() { + return false; + } + + @Override + public String toString() { + return getClass().getSimpleName() + " [allow all]"; + } + }; + } catch (MalformedURLException e) { + // Never thrown. + throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e); + } + } + + /** + * Allow list of hosts. Elements in the list will be host:port (no protocol or context). + */ + private final Set<String> hostAllowList; + + /** + * @param urlAllowList Comma-separated list of allowed URLs. URLs must be well-formed, missing protocol is tolerated. + * Empty or null is supported and means there is no explicit allow-list of URLs. In this case no + * URL is allowed unless a {@link ClusterState} is provided in + * {@link #checkAllowList(List, ClusterState)}. + * @throws MalformedURLException If an URL is invalid. + */ + public AllowListUrlChecker(@Nullable String urlAllowList) throws MalformedURLException { + hostAllowList = parseHostPorts(urlAllowList); + } + + /** + * Creates a URL checker based on the {@link NodeConfig} property to configure the allowed URLs. + */ + public static AllowListUrlChecker create(NodeConfig config) { + if (Boolean.getBoolean(DISABLE_URL_ALLOW_LIST)) { + return AllowListUrlChecker.ALLOW_ALL; + } else if (System.getProperty("solr.disable.shardsWhitelist") != null) { + log.warn("Property 'solr.disable.shardsWhitelist' is deprecated, please use '" + DISABLE_URL_ALLOW_LIST + "' instead."); Review comment: nice ########## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ########## @@ -385,6 +388,9 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC } } + this.allowListUrlChecker = AllowListUrlChecker.create(config); + log.info("URL allow-list initialized: {}", this.allowListUrlChecker); Review comment: I question that this worth logging at INFO for. Solr has *tons* of plugins. ########## File path: solr/solr-ref-guide/src/distributed-requests.adoc ########## @@ -115,17 +115,19 @@ If specified, the thread pool will use a backing queue instead of a direct hando `fairnessPolicy`:: Chooses the JVM specifics dealing with fair policy queuing, if enabled distributed searches will be handled in a First in First out fashion at a cost to throughput. If disabled throughput will be favored over latency. The default is `false`. -`shardsWhitelist`:: +In addition, `HttpShardHandlerFactory` also depends on the following top-level property: + +`allowUrls`:: Review comment: I think you should be making reference to `format-of-solr-xml.adoc` where this setting should be canonically documented. That said, I understand why this distributed-requests page has information on this. ########## File path: solr/core/src/java/org/apache/solr/core/NodeConfig.java ########## @@ -44,6 +44,8 @@ private final Set<Path> allowPaths; + private final String allowUrls; Review comment: I can't help but notice that just above here we have a `Set` of allowed paths, yet here you are adding a single String that is apparently more than one thing. This is a bit out of balance. If it's not much work, can this be harmonized? ########## File path: solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java ########## @@ -1231,6 +1231,9 @@ private void setupPolling(String intervalStr) { @Override @SuppressWarnings({"resource"}) public void inform(SolrCore core) { + + //TODO core.getCoreContainer().getAllowListUrlChecker() Review comment: I see you managed to get ahold of the checker from IndexFetcher so maybe it's not needed here in ReplicationHandler? ########## File path: solr/solr-ref-guide/src/format-of-solr-xml.adoc ########## @@ -103,6 +103,9 @@ Specifies the path to a common library directory that will be shared across all `allowPaths`:: Solr will normally only access folders relative to `$SOLR_HOME`, `$SOLR_DATA_HOME` or `coreRootDir`. If you need to e.g., create a core outside of these paths, you can explicitly allow the path with `allowPaths`. It is a comma separated string of file system paths to allow. The special value of `*` will allow any path on the system. +`allowUrls``: +When running Solr in non-cloud mode and if planning to do distributed search (using the "shards" parameter), the list of hosts needs to be allowed or Solr will forbid the request. The allow-list can also be configured in `solr.in.sh`. Review comment: And since there are more extensive docs on the `distributed-requests.adoc` page; lets link to there. Or move more of that info here as you prefer. ########## File path: solr/solr-ref-guide/src/the-terms-component.adoc ########## @@ -349,11 +349,11 @@ The TermsComponent also supports distributed indexes. For the `/terms` request h `shards`:: Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <<distributed-search-with-index-sharding.adoc#,Distributed Search with Index Sharding>>. + -The `shards` parameter is subject to a host whitelist that has to be configured in the component's parameters using the configuration key `shardsWhitelist` and the list of hosts as values. +The `shards` parameter is subject to a host allow-list that has to be configured in the component's parameters using the configuration key `allowUrls` and the list of hosts as values. Review comment: Information on this page is obsolete; we can remove `shards` here. See https://issues.apache.org/jira/browse/SOLR-14036 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org