Chao Sun created HDFS-15014:
-------------------------------

             Summary: [RBF] WebHdfs chooseDatanode shouldn't call 
getDatanodeReport 
                 Key: HDFS-15014
                 URL: https://issues.apache.org/jira/browse/HDFS-15014
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: rbf
            Reporter: Chao Sun


Currently, the {{chooseDatanode}} method (shared by {{open}}, {{create}}, 
{{append}} and {{getFileChecksum}}) in RBF WebHDFS calls {{getDatanodeReport}} 
against ALL downstream namenodes:

{code}
  private DatanodeInfo chooseDatanode(final Router router,
      final String path, final HttpOpParam.Op op, final long openOffset,
      final String excludeDatanodes) throws IOException {
    // We need to get the DNs as a privileged user
    final RouterRpcServer rpcServer = getRPCServer(router);
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    RouterRpcServer.setCurrentUser(loginUser);

    DatanodeInfo[] dns = null;
    try {
      dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
    } catch (IOException e) {
      LOG.error("Cannot get the datanodes from the RPC server", e);
    } finally {
      // Reset ugi to remote user for remaining operations.
      RouterRpcServer.resetCurrentUser();
    }

    HashSet<Node> excludes = new HashSet<Node>();
    if (excludeDatanodes != null) {
      Collection<String> collection =
          getTrimmedStringCollection(excludeDatanodes);
      for (DatanodeInfo dn : dns) {
        if (collection.contains(dn.getName())) {
          excludes.add(dn);
        }
      }
    }
...
{code}

The {{getDatanodeReport}} call is very expensive, particularly in a large 
cluster, as it needs to lock the {{DatanodeManager}}, which is also contended 
by calls such as heartbeat processing. See HDFS-14366 for a similar issue.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
