dsmiley commented on a change in pull request #47:
URL: https://github.com/apache/solr/pull/47#discussion_r602231577



##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -220,7 +220,7 @@ protected String getReplicateLeaderUrl(ZkNodeProps 
leaderprops) {
   final private void replicate(String nodeName, SolrCore core, ZkNodeProps 
leaderprops)
       throws SolrServerException, IOException {
 
-    final String leaderUrl = getReplicateLeaderUrl(leaderprops);
+    final String leaderUrl = getReplicateLeaderUrl(leaderprops);//TODO?

Review comment:
       Did you say TODO because maybe this URL needs to be checked here?  I did 
a "Analyze Data Flow To Here" in IntelliJ at the spot where we fetch the URL.  
This URL is always sourced from the SolrCloud cluster state, and not a 
parameter.  So we're good.  That said, it could be worth a comment in some 
places to say that we "trust" the leaderUrl from the cluster state (needn't be 
checked)

##########
File path: solr/solr-ref-guide/src/format-of-solr-xml.adoc
##########
@@ -103,6 +103,9 @@ Specifies the path to a common library directory that will 
be shared across all
 `allowPaths`::
 Solr will normally only access folders relative to `$SOLR_HOME`, 
`$SOLR_DATA_HOME` or `coreRootDir`. If you need to e.g., create a core outside 
of these paths, you can explicitly allow the path with `allowPaths`. It is a 
comma separated string of file system paths to allow. The special value of `*` 
will allow any path on the system.
 
+`allowUrls``:
+When running Solr in non-cloud mode and if planning to do distributed search 
(using the "shards" parameter), the list of hosts needs to be allowed or Solr 
will forbid the request. The allow-list can also be configured in `solr.in.sh`.

Review comment:
       Instead of making reference so `solr.in.sh`, it's better to refer to the 
particular environment variable that is used.  People can set that env variable 
even if they don't touch solr.in.sh.

##########
File path: solr/solr-ref-guide/src/distributed-requests.adoc
##########
@@ -115,17 +115,19 @@ If specified, the thread pool will use a backing queue 
instead of a direct hando
 `fairnessPolicy`::
 Chooses the JVM specifics dealing with fair policy queuing, if enabled 
distributed searches will be handled in a First in First out fashion at a cost 
to throughput. If disabled throughput will be favored over latency. The default 
is `false`.
 
-`shardsWhitelist`::
+In addition, `HttpShardHandlerFactory` also depends on the following top-level 
property:
+
+`allowUrls`::
 If specified, this lists limits what nodes can be requested in the `shards` 
request parameter.
 +
-In SolrCloud mode this whitelist is automatically configured to include all 
live nodes in the cluster.
+In SolrCloud mode this allow-list is automatically configured to include all 
live nodes in the cluster.
 +
-In standalone mode the whitelist defaults to empty (sharding not allowed).
+In standalone mode the allow-list defaults to empty (sharding not allowed).
 +
-If you need to disable this feature for backwards compatibility, you can set 
the system property `solr.disable.shardsWhitelist=true`. The value of this 
parameter is a comma separated list of the nodes that will be whitelisted, i.e.,
+If you need to disable this feature for backwards compatibility, you can set 
the system property `solr.disable.allowUrls=true`. The value of this parameter 
is a comma separated list of the nodes that will be allowed, i.e.,
 `10.0.0.1:8983/solr,10.0.0.1:8984/solr`.
 +
-NOTE: In SolrCloud mode, if at least one node is included in the whitelist, 
then the `live_nodes` will no longer be used as source for the list. This means 
that if you need to do a cross-cluster request using the `shards` parameter in 
SolrCloud mode (in addition to regular within-cluster requests), you'll need to 
add all nodes (local cluster + remote nodes) to the whitelist.
+NOTE: In SolrCloud mode, if at least one node is included in the allow-list, 
then the `live_nodes` will no longer be used as source for the list. This means 
that if you need to do a cross-cluster request using the `shards` parameter in 
SolrCloud mode (in addition to regular within-cluster requests), you'll need to 
add all nodes (local cluster + remote nodes) to the allow-list.

Review comment:
       IMO this is a shame but I understand we don't want to change this 
approach in this PR.

##########
File path: 
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandlerFactory.java
##########
@@ -413,139 +384,4 @@ public void initializeMetrics(SolrMetricsContext 
parentContext, String scope) {
         solrMetricsContext.getMetricRegistry(),
         SolrMetricManager.mkName("httpShardExecutor", expandedScope, 
"threadPool"));
   }
-

Review comment:
       It's nice to see all this stuff moved out of HttpShardHandlerFactory 
which already has tons of concerns to deal with.

##########
File path: solr/server/solr/solr.xml
##########
@@ -53,7 +54,6 @@
     class="HttpShardHandlerFactory">
     <int name="socketTimeout">${socketTimeout:600000}</int>
     <int name="connTimeout">${connTimeout:60000}</int>
-    <str name="shardsWhitelist">${solr.shardsWhitelist:}</str>

Review comment:
       If a user has a legacy config with this defined, what do we do now?  Log 
a warning would be nice.

##########
File path: solr/core/src/java/org/apache/solr/security/AllowListUrlChecker.java
##########
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.security;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.core.NodeConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Validates URLs based on an allow list or a {@link ClusterState} in 
SolrCloud.
+ */
+public class AllowListUrlChecker {
+
+    private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+    /**
+     * {@link org.apache.solr.core.SolrXmlConfig} property to configure the 
allowed URLs.
+     */
+    public static final String URL_ALLOW_LIST = "allowUrls";
+
+    /**
+     * System property to disable URL checking and {@link #ALLOW_ALL} instead.
+     */
+    public static final String DISABLE_URL_ALLOW_LIST = "solr.disable." + 
URL_ALLOW_LIST;
+
+    /**
+     * Clue given in URL-forbidden exceptions messages.
+     */
+    public static final String SET_SOLR_DISABLE_URL_ALLOW_LIST_CLUE = "Set -D" 
+ DISABLE_URL_ALLOW_LIST + "=true to disable URL allow-list checks.";
+
+    /**
+     * Singleton checker which allows all URLs. {@link #isEnabled()} returns 
false.
+     */
+    public static final AllowListUrlChecker ALLOW_ALL;
+
+    static {
+        try {
+            ALLOW_ALL = new AllowListUrlChecker(null) {
+                @Override
+                public void checkAllowList(List<String> urls, ClusterState 
clusterState) {
+                    // Allow.
+                }
+
+                @Override
+                public boolean isEnabled() {
+                    return false;
+                }
+
+                @Override
+                public String toString() {
+                    return getClass().getSimpleName() + " [allow all]";
+                }
+            };
+        } catch (MalformedURLException e) {
+            // Never thrown.
+            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
+        }
+    }
+
+    /**
+     * Allow list of hosts. Elements in the list will be host:port (no 
protocol or context).
+     */
+    private final Set<String> hostAllowList;
+
+    /**
+     * @param urlAllowList Comma-separated list of allowed URLs. URLs must be 
well-formed, missing protocol is tolerated.
+     *                    Empty or null is supported and means there is no 
explicit allow-list of URLs. In this case no
+     *                    URL is allowed unless a {@link ClusterState} is 
provided in
+     *                    {@link #checkAllowList(List, ClusterState)}.
+     * @throws MalformedURLException If an URL is invalid.
+     */
+    public AllowListUrlChecker(@Nullable String urlAllowList) throws 
MalformedURLException {
+        hostAllowList = parseHostPorts(urlAllowList);
+    }
+
+    /**
+     * Creates a URL checker based on the {@link NodeConfig} property to 
configure the allowed URLs.
+     */
+    public static AllowListUrlChecker create(NodeConfig config) {
+        if (Boolean.getBoolean(DISABLE_URL_ALLOW_LIST)) {
+            return AllowListUrlChecker.ALLOW_ALL;
+        } else if (System.getProperty("solr.disable.shardsWhitelist") != null) 
{
+            log.warn("Property 'solr.disable.shardsWhitelist' is deprecated, 
please use '" + DISABLE_URL_ALLOW_LIST + "' instead.");

Review comment:
       nice

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -385,6 +388,9 @@ public CoreContainer(NodeConfig config, CoresLocator 
locator, boolean asyncSolrC
       }
     }
 
+    this.allowListUrlChecker = AllowListUrlChecker.create(config);
+    log.info("URL allow-list initialized: {}", this.allowListUrlChecker);

Review comment:
       I question that this worth logging at INFO for.  Solr has *tons* of 
plugins.

##########
File path: solr/solr-ref-guide/src/distributed-requests.adoc
##########
@@ -115,17 +115,19 @@ If specified, the thread pool will use a backing queue 
instead of a direct hando
 `fairnessPolicy`::
 Chooses the JVM specifics dealing with fair policy queuing, if enabled 
distributed searches will be handled in a First in First out fashion at a cost 
to throughput. If disabled throughput will be favored over latency. The default 
is `false`.
 
-`shardsWhitelist`::
+In addition, `HttpShardHandlerFactory` also depends on the following top-level 
property:
+
+`allowUrls`::

Review comment:
       I think you should be making reference to `format-of-solr-xml.adoc` 
where this setting should be canonically documented.  That said, I understand 
why this distributed-requests page has information on this.

##########
File path: solr/core/src/java/org/apache/solr/core/NodeConfig.java
##########
@@ -44,6 +44,8 @@
 
   private final Set<Path> allowPaths;
 
+  private final String allowUrls;

Review comment:
       I can't help but notice that just above here we have a `Set` of allowed 
paths, yet here you are adding a single String that is apparently more than one 
thing.  This is a bit out of balance.  If it's not much work, can this be 
harmonized?

##########
File path: solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java
##########
@@ -1231,6 +1231,9 @@ private void setupPolling(String intervalStr) {
   @Override
   @SuppressWarnings({"resource"})
   public void inform(SolrCore core) {
+
+    //TODO core.getCoreContainer().getAllowListUrlChecker()

Review comment:
       I see you managed to get ahold of the checker from IndexFetcher so maybe 
it's not needed here in ReplicationHandler?

##########
File path: solr/solr-ref-guide/src/format-of-solr-xml.adoc
##########
@@ -103,6 +103,9 @@ Specifies the path to a common library directory that will 
be shared across all
 `allowPaths`::
 Solr will normally only access folders relative to `$SOLR_HOME`, 
`$SOLR_DATA_HOME` or `coreRootDir`. If you need to e.g., create a core outside 
of these paths, you can explicitly allow the path with `allowPaths`. It is a 
comma separated string of file system paths to allow. The special value of `*` 
will allow any path on the system.
 
+`allowUrls``:
+When running Solr in non-cloud mode and if planning to do distributed search 
(using the "shards" parameter), the list of hosts needs to be allowed or Solr 
will forbid the request. The allow-list can also be configured in `solr.in.sh`.

Review comment:
       And since there are more extensive docs on the 
`distributed-requests.adoc` page; lets link to there.  Or move more of that 
info here as you prefer.

##########
File path: solr/solr-ref-guide/src/the-terms-component.adoc
##########
@@ -349,11 +349,11 @@ The TermsComponent also supports distributed indexes. For 
the `/terms` request h
 `shards`::
 Specifies the shards in your distributed indexing configuration. For more 
information about distributed indexing, see 
<<distributed-search-with-index-sharding.adoc#,Distributed Search with Index 
Sharding>>.
 +
-The `shards` parameter is subject to a host whitelist that has to be 
configured in the component's parameters using the configuration key 
`shardsWhitelist` and the list of hosts as values.
+The `shards` parameter is subject to a host allow-list that has to be 
configured in the component's parameters using the configuration key 
`allowUrls` and the list of hosts as values.

Review comment:
       Information on this page is obsolete; we can remove `shards` here.  See 
https://issues.apache.org/jira/browse/SOLR-14036




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to