[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-26 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r512413846



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
##
@@ -3402,4 +3406,33 @@ public ListReplicationSinkServersResponse 
listReplicationSinkServers(
 }
 return builder.build();
   }
+
+  @Override
+  public RegionServerReportResponse replicationServerReport(RpcController 
controller,
+  RegionServerReportRequest request) throws ServiceException {
+try {
+  master.checkServiceStarted();
+  int versionNumber = 0;
+  String version = "0.0.0";
+  VersionInfo versionInfo = VersionInfoUtil.getCurrentClientVersionInfo();
+  if (versionInfo != null) {
+version = versionInfo.getVersion();
+versionNumber = VersionInfoUtil.getVersionNumber(versionInfo);
+  }
+  ClusterStatusProtos.ServerLoad sl = request.getLoad();
+  ServerName serverName = ProtobufUtil.toServerName(request.getServer());
+  ServerMetrics oldMetrics = 
master.getReplicationServerManager().getServerMetrics(serverName);
+  ServerMetrics newMetrics =
+  ServerMetricsBuilder.toServerMetrics(serverName, versionNumber, 
version, sl);
+  master.getReplicationServerManager().serverReport(serverName, 
newMetrics);
+  if (sl != null && master.metricsMaster != null) {
+// Up our metrics.
+master.metricsMaster.incrementRequests(sl.getTotalNumberOfRequests()

Review comment:
   Got it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-26 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r512361785



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HReplicationServer.java
##
@@ -388,4 +446,152 @@ protected ReplicationServerRpcServices 
createRpcServices() throws IOException {
   protected boolean setAbortRequested() {
 return abortRequested.compareAndSet(false, true);
   }
+
+  private void tryReplicationServerReport(long reportStartTime, long 
reportEndTime)
+  throws IOException {
+ReplicationServerStatusService.BlockingInterface rss = rssStub;
+if (rss == null) {
+  createReplicationServerStatusStub(true);
+  rss = rssStub;
+  if (rss == null) {
+return;
+  }
+}
+ClusterStatusProtos.ServerLoad sl = buildServerLoad(reportStartTime, 
reportEndTime);
+try {
+  RegionServerReportRequest.Builder request = RegionServerReportRequest
+  .newBuilder();
+  request.setServer(ProtobufUtil.toServerName(this.serverName));
+  request.setLoad(sl);
+  rss.replicationServerReport(null, request.build());
+} catch (ServiceException se) {
+  IOException ioe = ProtobufUtil.getRemoteException(se);
+  if (ioe instanceof YouAreDeadException) {
+// This will be caught and handled as a fatal error in run()
+throw ioe;
+  }
+  if (rssStub == rss) {
+rssStub = null;
+  }
+  // Couldn't connect to the master, get location from zk and reconnect
+  // Method blocks until new master is found or we are stopped
+  createReplicationServerStatusStub(true);
+}
+  }
+
+  private ClusterStatusProtos.ServerLoad buildServerLoad(long reportStartTime, 
long reportEndTime) {
+long usedMemory = -1L;
+long maxMemory = -1L;
+final MemoryUsage usage = MemorySizeUtil.safeGetHeapMemoryUsage();
+if (usage != null) {
+  usedMemory = usage.getUsed();
+  maxMemory = usage.getMax();
+}
+
+ClusterStatusProtos.ServerLoad.Builder serverLoad = 
ClusterStatusProtos.ServerLoad.newBuilder();
+serverLoad.setTotalNumberOfRequests(rpcServices.requestCount.sum());
+serverLoad.setUsedHeapMB((int) (usedMemory / 1024 / 1024));
+serverLoad.setMaxHeapMB((int) (maxMemory / 1024 / 1024));
+
+serverLoad.setReportStartTime(reportStartTime);
+serverLoad.setReportEndTime(reportEndTime);
+
+// for the replicationLoad purpose. Only need to get from one 
executorService
+// either source or sink will get the same info
+ReplicationSinkService sinks = getReplicationSinkService();
+
+if (sinks != null) {
+  // always refresh first to get the latest value
+  ReplicationLoad rLoad = sinks.refreshAndGetReplicationLoad();
+  if (rLoad != null) {
+serverLoad.setReplLoadSink(rLoad.getReplicationLoadSink());
+  }
+}
+return serverLoad.build();
+  }
+
+  /**
+   * Get the current master from ZooKeeper and open the RPC connection to it. 
To get a fresh
+   * connection, the current rssStub must be null. Method will block until a 
master is available.
+   * You can break from this block by requesting the server stop.
+   * @param refresh If true then master address will be read from ZK, 
otherwise use cached data
+   * @return master + port, or null if server has been stopped
+   */
+  private synchronized ServerName createReplicationServerStatusStub(boolean 
refresh) {

Review comment:
   The returned ServerName is never used?









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-26 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r512355359



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
##
@@ -3402,4 +3406,33 @@ public ListReplicationSinkServersResponse 
listReplicationSinkServers(
 }
 return builder.build();
   }
+
+  @Override
+  public RegionServerReportResponse replicationServerReport(RpcController 
controller,
+  RegionServerReportRequest request) throws ServiceException {
+try {
+  master.checkServiceStarted();
+  int versionNumber = 0;
+  String version = "0.0.0";
+  VersionInfo versionInfo = VersionInfoUtil.getCurrentClientVersionInfo();
+  if (versionInfo != null) {
+version = versionInfo.getVersion();
+versionNumber = VersionInfoUtil.getVersionNumber(versionInfo);
+  }
+  ClusterStatusProtos.ServerLoad sl = request.getLoad();
+  ServerName serverName = ProtobufUtil.toServerName(request.getServer());
+  ServerMetrics oldMetrics = 
master.getReplicationServerManager().getServerMetrics(serverName);
+  ServerMetrics newMetrics =
+  ServerMetricsBuilder.toServerMetrics(serverName, versionNumber, 
version, sl);
+  master.getReplicationServerManager().serverReport(serverName, 
newMetrics);
+  if (sl != null && master.metricsMaster != null) {
+// Up our metrics.
+master.metricsMaster.incrementRequests(sl.getTotalNumberOfRequests()

Review comment:
   One new question: should ReplicationServer have a request count metric at 
all? It is not clear whether this metric would mean a read metric or a write 
metric.









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511741469



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HReplicationServer.java
##
@@ -388,4 +446,152 @@ protected ReplicationServerRpcServices 
createRpcServices() throws IOException {
   protected boolean setAbortRequested() {
 return abortRequested.compareAndSet(false, true);
   }
+
+  protected void tryReplicationServerReport(long reportStartTime, long 
reportEndTime)

Review comment:
   Is a private method enough?









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511741505



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HReplicationServer.java
##
@@ -388,4 +446,152 @@ protected ReplicationServerRpcServices 
createRpcServices() throws IOException {
   protected boolean setAbortRequested() {
 return abortRequested.compareAndSet(false, true);
   }
+
+  protected void tryReplicationServerReport(long reportStartTime, long 
reportEndTime)
+  throws IOException {
+ReplicationServerStatusService.BlockingInterface rss = rssStub;
+if (rss == null) {
+  createReplicationServerStatusStub(true);
+  rss = rssStub;
+  if (rss == null) {
+return;
+  }
+}
+ClusterStatusProtos.ServerLoad sl = buildServerLoad(reportStartTime, 
reportEndTime);
+try {
+  RegionServerReportRequest.Builder request = RegionServerReportRequest
+  .newBuilder();
+  request.setServer(ProtobufUtil.toServerName(this.serverName));
+  request.setLoad(sl);
+  rss.replicationServerReport(null, request.build());
+} catch (ServiceException se) {
+  IOException ioe = ProtobufUtil.getRemoteException(se);
+  if (ioe instanceof YouAreDeadException) {
+// This will be caught and handled as a fatal error in run()
+throw ioe;
+  }
+  if (rssStub == rss) {
+rssStub = null;
+  }
+  // Couldn't connect to the master, get location from zk and reconnect
+  // Method blocks until new master is found or we are stopped
+  createReplicationServerStatusStub(true);
+}
+  }
+
+  private ClusterStatusProtos.ServerLoad buildServerLoad(long reportStartTime, 
long reportEndTime) {
+long usedMemory = -1L;
+long maxMemory = -1L;
+final MemoryUsage usage = MemorySizeUtil.safeGetHeapMemoryUsage();
+if (usage != null) {
+  usedMemory = usage.getUsed();
+  maxMemory = usage.getMax();
+}
+
+ClusterStatusProtos.ServerLoad.Builder serverLoad = 
ClusterStatusProtos.ServerLoad.newBuilder();
+serverLoad.setTotalNumberOfRequests(rpcServices.requestCount.sum());
+serverLoad.setUsedHeapMB((int) (usedMemory / 1024 / 1024));
+serverLoad.setMaxHeapMB((int) (maxMemory / 1024 / 1024));
+
+serverLoad.setReportStartTime(reportStartTime);
+serverLoad.setReportEndTime(reportEndTime);
+
+// for the replicationLoad purpose. Only need to get from one 
executorService
+// either source or sink will get the same info
+ReplicationSinkService sinks = getReplicationSinkService();
+
+if (sinks != null) {
+  // always refresh first to get the latest value
+  ReplicationLoad rLoad = sinks.refreshAndGetReplicationLoad();
+  if (rLoad != null) {
+serverLoad.setReplLoadSink(rLoad.getReplicationLoadSink());
+  }
+}
+return serverLoad.build();
+  }
+
+  /**
+   * Get the current master from ZooKeeper and open the RPC connection to it. 
To get a fresh
+   * connection, the current rssStub must be null. Method will block until a 
master is available.
+   * You can break from this block by requesting the server stop.
+   * @param refresh If true then master address will be read from ZK, 
otherwise use cached data
+   * @return master + port, or null if server has been stopped
+   */
+  protected synchronized ServerName createReplicationServerStatusStub(boolean 
refresh) {

Review comment:
   Ditto.









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511741036



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java
##
@@ -445,7 +418,7 @@ public PeerRegionServerListener(HBaseReplicationEndpoint 
endpoint) {
 
 @Override
 public synchronized void nodeChildrenChanged(String path) {
-  if (path.equals(regionServerListNode)) {
+  if (replicationEndpoint.fetchServersUseZk && 
path.equals(regionServerListNode)) {

Review comment:
   Add a comment in the class javadoc for this new implementation?









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511740716



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java
##
@@ -445,7 +418,7 @@ public PeerRegionServerListener(HBaseReplicationEndpoint 
endpoint) {
 
 @Override
 public synchronized void nodeChildrenChanged(String path) {
-  if (path.equals(regionServerListNode)) {
+  if (replicationEndpoint.fetchServersUseZk && 
path.equals(regionServerListNode)) {

Review comment:
   So in the new implementation, it will always register the zk listener but 
ignore the zk event if zk is not used?
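
The pattern this question describes can be sketched as follows. This is a minimal illustration, not the HBase code: only `fetchServersUseZk`, `regionServerListNode` and `nodeChildrenChanged` are names taken from the diff above, while the class `ZkEventGuardSketch` and the `refreshCount` counter are hypothetical stand-ins. The listener is assumed to be registered unconditionally, and the flag check inside the callback decides whether an event has any effect.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the endpoint plus its listener; not HBase code.
final class ZkEventGuardSketch {
  // Names taken from the diff; everything else here is an assumption.
  private final boolean fetchServersUseZk;
  private final String regionServerListNode;

  // Stands in for "re-read the peer's server list"; counts accepted events.
  final AtomicInteger refreshCount = new AtomicInteger();

  ZkEventGuardSketch(boolean fetchServersUseZk, String regionServerListNode) {
    this.fetchServersUseZk = fetchServersUseZk;
    this.regionServerListNode = regionServerListNode;
  }

  // Called for every child-change event, because the listener is always registered.
  synchronized void nodeChildrenChanged(String path) {
    // The event is dropped unless ZK-based fetching is enabled and the path matches.
    if (fetchServersUseZk && path.equals(regionServerListNode)) {
      refreshCount.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    ZkEventGuardSketch viaZk = new ZkEventGuardSketch(true, "/hbase/rs");
    viaZk.nodeChildrenChanged("/hbase/rs");
    System.out.println("refreshes with zk enabled: " + viaZk.refreshCount.get());

    ZkEventGuardSketch viaMaster = new ZkEventGuardSketch(false, "/hbase/rs");
    viaMaster.nodeChildrenChanged("/hbase/rs");
    System.out.println("refreshes with zk disabled: " + viaMaster.refreshCount.get());
  }
}
```

The trade-off in this shape is that a watcher stays registered even when its events are never acted on. The alternative would be to skip registration when the flag is off.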









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511738528



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ReplicationServerManager.java
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.master;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentNavigableMap;
+import java.util.concurrent.ConcurrentSkipListMap;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.hbase.ServerMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The ServerManager class manages info about replication servers.
+ * 
+ * Maintains lists of online and dead servers.
+ * 
+ * Servers are distinguished in two different ways.  A given server has a
+ * location, specified by hostname and port, and of which there can only be one
+ * online at any given time.  A server instance is specified by the location
+ * (hostname and port) as well as the startcode (timestamp from when the server
+ * was started).  This is used to differentiate a restarted instance of a given
+ * server from the original instance.
+ */
+@InterfaceAudience.Private
+public class ReplicationServerManager {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(ReplicationServerManager.class);
+
+  public static final String ONLINE_SERVER_REFRESH_INTERVAL =
+  "hbase.master.replication.server.refresh.interval";
+  public static final int ONLINE_SERVER_REFRESH_INTERVAL_DEFAULT = 60 * 1000; 
// 1 mins
+
+  private final MasterServices master;
+
+  /** Map of registered servers to their current load */
+  private final ConcurrentNavigableMap<ServerName, ServerMetrics> onlineServers =
+new ConcurrentSkipListMap<>();
+
+  private OnlineServerRefresher onlineServerRefresher;
+  private int refreshPeriod;
+
+  /**
+   * Constructor.
+   */
+  public ReplicationServerManager(final MasterServices master) {
+this.master = master;
+  }
+
+  /**
+   * start chore in ServerManager
+   */
+  public void startChore() {
+Configuration conf = master.getConfiguration();
+refreshPeriod = conf.getInt(ONLINE_SERVER_REFRESH_INTERVAL,
+ONLINE_SERVER_REFRESH_INTERVAL_DEFAULT);
+onlineServerRefresher = new 
OnlineServerRefresher("ReplicationServerRefresher", refreshPeriod);
+master.getChoreService().scheduleChore(onlineServerRefresher);
+  }
+
+  /**
+   * Stop the ServerManager.
+   */
+  public void stop() {
+if (onlineServerRefresher != null) {
+  onlineServerRefresher.cancel();
+}
+  }
+
+  public void serverReport(ServerName sn, ServerMetrics sl) {
+if (null == this.onlineServers.replace(sn, sl)) {
+  if (!checkAndRecordNewServer(sn, sl)) {
+LOG.info("ReplicationServerReport ignored, could not record the 
server: {}", sn);
+  }
+}
+  }
+
+  /**
+   * Check is a server of same host and port already exists,
+   * if not, or the existed one got a smaller start code, record it.
+   *
+   * @param serverName the server to check and record
+   * @param sl the server load on the server
+   * @return true if the server is recorded, otherwise, false
+   */
+  boolean checkAndRecordNewServer(final ServerName serverName, final 
ServerMetrics sl) {

Review comment:
   private method?
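
For context, the rule stated in the javadoc of `checkAndRecordNewServer` can be sketched as below. This is a simplified reading of that javadoc only, not the method body from the PR (which is not shown in this message). The class `NewServerCheckSketch`, the `host:port` string key and the plain `long` start code are hypothetical simplifications. Rejecting an equal start code is also an assumption, since the javadoc only mentions recording when the existing start code is smaller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of the rule in the javadoc; not the PR's code.
final class NewServerCheckSketch {
  // Location (host:port) -> start code of the instance currently recorded.
  private final Map<String, Long> online = new HashMap<>();

  // Returns true if the server was recorded, false if it was rejected.
  synchronized boolean checkAndRecordNewServer(String host, int port, long startCode) {
    String location = host + ":" + port;
    Long existing = online.get(location);
    // Assumption: an existing instance with an equal or larger start code wins.
    if (existing != null && existing >= startCode) {
      return false;
    }
    // No instance at this location, or the recorded one is older: record the new one.
    online.put(location, startCode);
    return true;
  }

  public static void main(String[] args) {
    NewServerCheckSketch manager = new NewServerCheckSketch();
    System.out.println(manager.checkAndRecordNewServer("rs1", 16030, 100L));
    System.out.println(manager.checkAndRecordNewServer("rs1", 16030, 50L));
  }
}
```

Because only one instance per location is kept, a restarted server with a later start code replaces the earlier entry. A stale report from the old instance is rejected.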









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511738198



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ReplicationServerManager.java
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.master;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentNavigableMap;
+import java.util.concurrent.ConcurrentSkipListMap;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.hbase.ServerMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The ServerManager class manages info about replication servers.

Review comment:
   ServerManager => ReplicationServerManager?









[GitHub] [hbase] infraio commented on a change in pull request #2579: HBASE-24999 Master manages ReplicationServers

2020-10-25 Thread GitBox


infraio commented on a change in pull request #2579:
URL: https://github.com/apache/hbase/pull/2579#discussion_r511674844



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ReplicationServerManager.java
##
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.master;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentNavigableMap;
+import java.util.concurrent.ConcurrentSkipListMap;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.hbase.ServerMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The ServerManager class manages info about replication servers.
+ * 
+ * Maintains lists of online and dead servers.
+ * 
+ * Servers are distinguished in two different ways.  A given server has a
+ * location, specified by hostname and port, and of which there can only be one
+ * online at any given time.  A server instance is specified by the location
+ * (hostname and port) as well as the startcode (timestamp from when the server
+ * was started).  This is used to differentiate a restarted instance of a given
+ * server from the original instance.
+ */
+@InterfaceAudience.Private
+public class ReplicationServerManager {

Review comment:
   Yes. But we want to do this refactor after we finish this, because 
ReplicationServerManager may have new features that ServerManager does not 
need. It is not easy to decide whether to extend ServerManager or to implement 
a common interface. Thanks.
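
To make the second option concrete, a common interface could look roughly like the sketch below. Everything in it is hypothetical: `ServerReportTracker` and `SimpleTrackerSketch` are illustrative names that appear neither in HBase nor in this PR, and `String`/`Long` stand in for `ServerName`/`ServerMetrics`. It only shows the shape of "implement a common interface" and makes no claim about what the refactor should be.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared contract that two server managers could both implement.
interface ServerReportTracker<K, M> {
  // Record the latest metrics reported by a server.
  void serverReport(K server, M metrics);

  // Read-only view of the servers currently considered online.
  Map<K, M> getOnlineServers();
}

// Minimal implementation; String and Long stand in for ServerName and ServerMetrics.
final class SimpleTrackerSketch implements ServerReportTracker<String, Long> {
  private final Map<String, Long> online = new ConcurrentHashMap<>();

  @Override
  public void serverReport(String server, Long metrics) {
    online.put(server, metrics);
  }

  @Override
  public Map<String, Long> getOnlineServers() {
    return Collections.unmodifiableMap(online);
  }

  public static void main(String[] args) {
    SimpleTrackerSketch tracker = new SimpleTrackerSketch();
    tracker.serverReport("rs1,16030,100", 42L);
    System.out.println(tracker.getOnlineServers());
  }
}
```

With an interface, each manager keeps its own features and shares only the reporting contract. Extending ServerManager would instead share implementation, along with behaviour that replication servers may not need.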




