You would compare the webhdfs addresses from
DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme()) with the hdfs addresses
from FSHDFSUtils.getNNAddresses(desFs, conf) and check whether they intersect,
something like the code below. This assumes that the same hosts serve both hdfs
and webhdfs. Is my understanding correct?

public static boolean isCoercibleToHdfs(Configuration conf, FileSystem srcFs,
    FileSystem desFs) {
  if (isSameHdfs(conf, srcFs, desFs)) {
    return true;
  }

  if (srcFs instanceof WebHdfsFileSystem
      && desFs instanceof DistributedFileSystem) {
    String srcServiceName = srcFs.getCanonicalServiceName();
    String desServiceName = desFs.getCanonicalServiceName();

    if (srcServiceName == null || desServiceName == null) {
      return false;
    }

    // Only compare hostnames, since webhdfs and hdfs use different ports.
    Set<String> webhdfsHostnames = new HashSet<>();
    if (srcServiceName.startsWith("ha-webhdfs")
        || srcServiceName.startsWith("ha-swebhdfs")) {
      // HA: the name service follows the first ':' in the service name.
      Map<String, Map<String, InetSocketAddress>> haNnWebHdfsAddresses =
          DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme());
      String nameService =
          srcServiceName.substring(srcServiceName.indexOf(":") + 1);
      Map<String, InetSocketAddress> nnMap =
          haNnWebHdfsAddresses.get(nameService);
      if (nnMap != null) {
        for (InetSocketAddress addr : nnMap.values()) {
          webhdfsHostnames.add(addr.getHostString());
        }
      }
    } else {
      // Non-HA: take the host part of the service name.
      webhdfsHostnames.add(srcServiceName.split(":")[0]);
    }

    // Collect the hostnames of the destination (hdfs) namenodes.
    Set<String> hdfsHostnames = new HashSet<>();
    Set<InetSocketAddress> desAddrs =
        getNNAddresses((DistributedFileSystem) desFs, conf);
    for (InetSocketAddress address : desAddrs) {
      hdfsHostnames.add(address.getHostString());
    }

    return !Sets.intersection(webhdfsHostnames, hdfsHostnames).isEmpty();
  }
  return false;
}
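
To make the comparison itself easier to reason about without a cluster, here is
a minimal JDK-only sketch of the same idea: pull the name service out of an HA
service name, strip ports, and check whether the two host sets overlap. The
class and method names (HostOverlapSketch, haNameService, hostOf, hostsOverlap)
are hypothetical and not part of HBase or Hadoop, and it assumes plain
"host:port" strings with hostnames or IPv4 addresses.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, JDK-only illustration of the hostname comparison. */
final class HostOverlapSketch {

  /**
   * Returns the part after the first ':' for service names that start with
   * "ha-webhdfs:" or "ha-swebhdfs:", or null for any other service name.
   */
  static String haNameService(String serviceName) {
    if (serviceName.startsWith("ha-webhdfs:")
        || serviceName.startsWith("ha-swebhdfs:")) {
      return serviceName.substring(serviceName.indexOf(':') + 1);
    }
    return null;
  }

  /** Strips a trailing ":port" from a "host:port" string, if present. */
  static String hostOf(String hostPort) {
    int colon = hostPort.lastIndexOf(':');
    return colon < 0 ? hostPort : hostPort.substring(0, colon);
  }

  /** True if at least one host appears in both address lists, ignoring ports. */
  static boolean hostsOverlap(Collection<String> webhdfsAddrs,
      Collection<String> hdfsAddrs) {
    Set<String> webhdfsHosts = new HashSet<>();
    for (String addr : webhdfsAddrs) {
      webhdfsHosts.add(hostOf(addr));
    }
    for (String addr : hdfsAddrs) {
      if (webhdfsHosts.contains(hostOf(addr))) {
        return true;
      }
    }
    return false;
  }
}
```

With made-up addresses, hostsOverlap(["nn1.example.com:50070"],
["nn1.example.com:8020"]) is true because only the ports differ, which is the
case the method above is meant to catch.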



On 2019/02/11 06:12:43, 张铎(Duo Zhang) <p...@gmail.com> wrote:
> How do we know if a webhdfs is the same as an hdfs?
>
> Schile,Nathan <na...@cerner.com.invalid> wrote on Mon, Feb 11, 2019 at 1:25 PM:
>
> > Currently when bulk loading from a webhdfs filesystem, files are copied
> > rather than renamed even if they reside on the same cluster [1]. This
> > causes the bulk load to perform suboptimally.
> >
> > It seems like the configured webhdfs namenodes could be compared against
> > the namenodes being bulk loaded to, and if they are the same, the bulk
> > loaded files could be renamed rather than copied.
> >
> > I was able to locate a JIRA comment bringing up this use case [2], but
> > wasn't able to find a comment or JIRA with a resolution.
> >
> > If this issue and proposed solution are acceptable, I would be happy to
> > log a JIRA and work on a patch. Please let me know how to proceed.
> >
> > [1]
> > https://github.com/apache/hbase/blob/rel/2.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java#L369-L383
> >
> > [2]
> > https://issues.apache.org/jira/browse/HBASE-8304?focusedCommentId=13923197&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13923197
> >
> > CONFIDENTIALITY NOTICE This message and any included attachments are from
> > Cerner Corporation and are intended only for the addressee. The information
> > contained in this message is confidential and may constitute inside or
> > non-public information under international, federal, or state securities
> > laws. Unauthorized forwarding, printing, copying, distribution, or use of
> > such information is strictly prohibited and may be unlawful. If you are not
> > the addressee, please promptly delete this message and notify the sender of
> > the delivery error by e-mail or you may call Cerner's corporate offices in
> > Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
> >
>

