tisonkun commented on code in PR #1966: URL: https://github.com/apache/zookeeper/pull/1966#discussion_r1115748846
########## zookeeper-docs/src/main/resources/markdown/zookeeperSnapshotAndRestore.md: ########## @@ -0,0 +1,67 @@ +<!-- +Copyright 2002-2004 The Apache Software Foundation + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +//--> + +# ZooKeeper Snapshot and Restore Guide + +Zookeeper is designed to withstand machine failures. A Zookeeper cluster can automatically recover +from temporary failures such as machine reboot. It can also tolerate up to (N-1)/2 permanent +failures for a cluster of N members due to hardware failures or disk corruption, etc. When a member +permanently fails, it loses access to the cluster. If the cluster permanently loses more than +(N-1)/2 members, it disastrously fails and loses quorum. Once the quorum is lost, the cluster +cannot reach consensus and therefore cannot continue to accept updates. + +To recover from such disastrous failures, Zookeeper provides snapshot and restore functionalities to +restore a cluster from a snapshot. + +1. Snapshot and restore operate on the connected server via Admin Server APIs +1. Snapshot and restore are rate limited to protect the server from being overloaded +1. Snapshot and restore require authentication and authorization on the root path with ALL permission. +The supported auth schemas are digest, x509 and IP. + +* [Snapshot](#zookeeper_snapshot) +* [Restore](#zookeeper_restore) + +<a name="zookeeper_snapshot"></a> + +## Snapshot +Recovering a cluster needs a snapshot from a ZooKeeper cluster. Users can periodically take +snapshots from a live server which has the highest zxid and stream out data to a local +or external storage/file system (e.g., S3). + + ```bash + # The snapshot command takes snapshot from the server it connects to and rate limited to once every 5 mins by default + curl -H 'Authorization: digest root:root_passwd' http://hostname:adminPort/commands/snapshot?streaming=true --output snapshotFileName + ``` + +<a name="zookeeper_restore"></a> +## Restore + +Restoring a cluster needs a single snapshot as input stream. Restore can be used for recovering a +cluster for quorum lost or building a brand-new cluster with seed data. + +All members should restore using the same snapshot. The following are the recommended steps: + +- Blocking client traffic before restore starts +- For each server + - Moving the files in dataDir and dataLogDir to different location to prevent the restored database Review Comment: Is it better to take a snapshot of the latest state? I think blocking client traffic doesn't prohibit admin request? ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java: ########## @@ -2069,7 +2069,7 @@ public void dumpMonitorValues(BiConsumer<String, Object> response) { /** * Grant or deny authorization to an operation on a node as a function of: - * @param cnxn : the server connection + * @param cnxn : the server connection or null for admin server commands Review Comment: Actually, I found `cnxn` is not required for this method since `ServerObjs serverObjs` is not effectively used in the method call. But it can be out of the scope so this comment is just a hint. ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/CommandBase.java: ########## @@ -28,22 +28,31 @@ public abstract class CommandBase implements Command { private final Set<String> names; private final String doc; private final boolean serverRequired; + private final AuthRequest authRequest; /** * @param names The possible names of this command, with the primary name first. */ protected CommandBase(List<String> names) { - this(names, true, null); + this(names, true); } protected CommandBase(List<String> names, boolean serverRequired) { this(names, serverRequired, null); } - protected CommandBase(List<String> names, boolean serverRequired, String doc) { + protected CommandBase(List<String> names, boolean serverRequired, AuthRequest authRequest) { + this(names, serverRequired, null, authRequest); + } + + protected CommandBase(List<String> names, boolean serverRequired, String doc, AuthRequest authRequest) { Review Comment: nit: It seems `doc` is always null and can be safely removed. ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/StreamOutputter.java: ########## @@ -46,7 +46,7 @@ public void output(final CommandResponse response, final OutputStream os) { try (final InputStream is = response.getInputStream()){ IOUtils.copyBytes(is, os, 1024, true); } catch (final IOException e) { - LOG.error("Exception occurred when streaming out data to {}", clientIP, e); + LOG.warn("Exception streaming out data to {}", clientIP, e); Review Comment: Why do you make this change? ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/auth/AuthenticationProvider.java: ########## @@ -49,6 +52,19 @@ public interface AuthenticationProvider { */ KeeperException.Code handleAuthentication(ServerCnxn cnxn, byte[] authData); + /** + * This method is called when admin server command passes authentication data for this + * scheme. + * + * @param request + * the request that contains the authentication information. + * @param authData + * the authentication data received. + * @return Ids + * the list of Id. Empty list means not authenticated + */ + List<Id> handleAuthentication(HttpServletRequest request, byte[] authData); Review Comment: `default` to empty list and thus reduce some impls? ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/JettyAdminServer.java: ########## @@ -89,17 +90,20 @@ public JettyAdminServer() throws AdminServerException, IOException, GeneralSecur System.getProperty("zookeeper.admin.commandURL", DEFAULT_COMMAND_URL), Integer.getInteger("zookeeper.admin.httpVersion", DEFAULT_HTTP_VERSION), Boolean.getBoolean("zookeeper.admin.portUnification"), - Boolean.getBoolean("zookeeper.admin.forceHttps")); + Boolean.getBoolean("zookeeper.admin.forceHttps"), + Boolean.getBoolean("zookeeper.admin.needClientAuth")); } + @SuppressWarnings("deprecation") Review Comment: Here is a patch to reduce complex: ```diff diff --git a/zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/JettyAdminServer.java b/zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/JettyAdminServer.java index 2c3bb723d..a237e4c3b 100644 --- a/zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/JettyAdminServer.java +++ b/zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/JettyAdminServer.java @@ -94,7 +94,6 @@ public JettyAdminServer() throws AdminServerException, IOException, GeneralSecur Boolean.getBoolean("zookeeper.admin.needClientAuth")); } - @SuppressWarnings("deprecation") public JettyAdminServer( String address, int port, @@ -148,15 +147,12 @@ public JettyAdminServer( throw e; } - SslContextFactory sslContextFactory = new SslContextFactory.Server(); + SslContextFactory.Server sslContextFactory = new SslContextFactory.Server(); sslContextFactory.setKeyStore(keyStore); sslContextFactory.setKeyStorePassword(privateKeyPassword); sslContextFactory.setTrustStore(trustStore); sslContextFactory.setTrustStorePassword(certAuthPassword); - if (needClientAuth) { - // setNeedClientAuth() of SslContextFactory.Server is not a deprecated API - sslContextFactory.setNeedClientAuth(true); - } + sslContextFactory.setNeedClientAuth(needClientAuth); if (forceHttps) { connector = new ServerConnector(server, ``` ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/Commands.java: ########## @@ -532,23 +615,21 @@ public CommandResponse runGet(ZooKeeperServer zkServer, Map<String, String> kwar */ public static class RestoreCommand extends PostCommand { static final String RESPONSE_DATA_LAST_ZXID = "last_zxid"; - static final String ADMIN_RESTORE_ENABLED = "zookeeper.admin.restore.enabled"; - private RateLimiter rateLimiter; public RestoreCommand() { - super(Arrays.asList("restore", "rest")); - rateLimiter = new RateLimiter(1, rateLimiterInterval, TimeUnit.MICROSECONDS); + super(Arrays.asList("restore", "rest"), true, new AuthRequest(ZooDefs.Perms.ALL, ROOT_PATH)); + rateLimiter = new RateLimiter(1, rateLimiterInterval, TimeUnit.MILLISECONDS); Review Comment: And here we fix a bug? ########## zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/Commands.java: ########## @@ -76,6 +85,8 @@ public class Commands { static final Logger LOG = LoggerFactory.getLogger(Commands.class); static final String ADMIN_RATE_LIMITER_INTERVAL = "zookeeper.admin.rateLimiterIntervalInMS"; private static final long rateLimiterInterval = Integer.parseInt(System.getProperty(ADMIN_RATE_LIMITER_INTERVAL, "300000")); + static final String AUTH_INFO_SEPARATOR = " "; + static final String ROOT_PATH = "/"; Review Comment: VisibleForTesting? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
