On Mon, Feb 11, 2019 at 2:18 PM Schile,Nathan
<[email protected]> wrote:
> You would compare the WebHDFS addresses from
> DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme()) with the HDFS
> addresses from FSHDFSUtils.getNNAddresses(desFs, conf) and check whether
> the two sets share a host, as in the code below. This assumes that the same
> host serves both HDFS (RPC) and WebHDFS (HTTP). Is my understanding correct?
>
Mostly yes. WebHDFS is served by the NameNodes themselves. (HttpFS, a separate gateway service, is another story, though.)
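For reference, as far as I can tell DFSUtil.getHaNnWebHdfsAddresses reads the per-NameNode HTTP addresses (dfs.namenode.http-address.<ns>.<nnId>, or dfs.namenode.https-address.<ns>.<nnId> for swebhdfs), while HDFS clients use dfs.namenode.rpc-address.<ns>.<nnId>. In a typical HA setup both keys point at the same NameNode hosts on different ports. The sketch below illustrates only that, assuming a plain Map in place of Hadoop's Configuration; NameNodeAddressSketch and hostsFor are hypothetical names, not Hadoop APIs, and the real code also handles defaults, address resolution and non-HA keys that this ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map<String, String> stands in for Hadoop's Configuration.
final class NameNodeAddressSketch {
    private NameNodeAddressSketch() {}

    /**
     * Returns nnId -> host for each NameNode listed in dfs.ha.namenodes.<ns>,
     * reading keys of the form "<keyPrefix>.<ns>.<nnId>", for example
     * dfs.namenode.rpc-address.ns1.nn1. The port is dropped because RPC and
     * HTTP listen on different ports. Assumes "host:port" values.
     */
    static Map<String, String> hostsFor(Map<String, String> conf, String ns, String keyPrefix) {
        Map<String, String> hosts = new LinkedHashMap<>();
        String ids = conf.getOrDefault("dfs.ha.namenodes." + ns, "");
        for (String rawId : ids.split(",")) {
            String nnId = rawId.trim();
            if (nnId.isEmpty()) {
                continue; // skip blanks such as the one in "nn1,,nn2"
            }
            String addr = conf.get(keyPrefix + "." + ns + "." + nnId);
            if (addr == null) {
                continue; // this NameNode has no address under keyPrefix
            }
            int colon = addr.lastIndexOf(':');
            hosts.put(nnId, colon >= 0 ? addr.substring(0, colon) : addr);
        }
        return hosts;
    }
}
```

With rpc-address and http-address entries for nn1 and nn2 on the same hosts, hostsFor(conf, "ns1", "dfs.namenode.rpc-address") and hostsFor(conf, "ns1", "dfs.namenode.http-address") return equal maps, which is why comparing hosts (and not ports) can identify the same cluster.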
>
> // Intended for FSHDFSUtils, next to isSameHdfs() and getNNAddresses().
> import java.net.InetSocketAddress;
> import java.util.Collections;
> import java.util.HashSet;
> import java.util.Map;
> import java.util.Set;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.hdfs.DFSUtil;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
> import org.apache.hadoop.hdfs.web.WebHdfsFileSystem;
>
> public static boolean isCoercibleToHdfs(Configuration conf, FileSystem srcFs,
>     FileSystem desFs) {
>   if (isSameHdfs(conf, srcFs, desFs)) {
>     return true;
>   }
>
>   if (srcFs instanceof WebHdfsFileSystem && desFs instanceof DistributedFileSystem) {
>     String srcServiceName = srcFs.getCanonicalServiceName();
>     String desServiceName = desFs.getCanonicalServiceName();
>
>     if (srcServiceName == null || desServiceName == null) {
>       return false;
>     }
>
>     // Only compare hostnames, since webhdfs and hdfs listen on different ports.
>     Set<String> webhdfsHostnames = new HashSet<>();
>     if (srcServiceName.startsWith("ha-webhdfs")
>         || srcServiceName.startsWith("ha-swebhdfs")) {
>       // HA service names look like "ha-webhdfs:<nameservice>".
>       Map<String, Map<String, InetSocketAddress>> haNnWebHdfsAddresses =
>           DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme());
>       String nameService = srcServiceName.substring(srcServiceName.indexOf(':') + 1);
>       Map<String, InetSocketAddress> nnMap = haNnWebHdfsAddresses.get(nameService);
>       if (nnMap != null) {
>         for (InetSocketAddress addr : nnMap.values()) {
>           webhdfsHostnames.add(addr.getHostString());
>         }
>       }
>     } else {
>       // Non-HA service names look like "host:port".
>       webhdfsHostnames.add(srcServiceName.split(":")[0]);
>     }
>
>     // Hostnames of the destination cluster's NameNode RPC endpoints.
>     Set<String> hdfsHostnames = new HashSet<>();
>     Set<InetSocketAddress> desAddrs = getNNAddresses((DistributedFileSystem) desFs, conf);
>     for (InetSocketAddress address : desAddrs) {
>       hdfsHostnames.add(address.getHostString());
>     }
>
>     // Same cluster if at least one NameNode host serves both endpoints.
>     return !Collections.disjoint(webhdfsHostnames, hdfsHostnames);
>   }
>   return false;
> }
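Stripped of the Hadoop types, the comparison in the method above is a host-set intersection. A minimal, dependency-free sketch, assuming addresses arrive as "host:port" strings; HostOverlapSketch, hostOf and sharesNameNodeHost are hypothetical names, not HBase or Hadoop APIs. It lowercases hostnames, since DNS names are case-insensitive, but does no DNS resolution, so an IP address on one side and a hostname on the other would still not match.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, dependency-free restatement of the host-intersection check.
final class HostOverlapSketch {
    private HostOverlapSketch() {}

    /** Strips a trailing ":port" and lowercases, since DNS names are case-insensitive. */
    static String hostOf(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        String host = colon >= 0 ? hostAndPort.substring(0, colon) : hostAndPort;
        return host.toLowerCase(Locale.ROOT);
    }

    /** True if at least one WebHDFS endpoint shares a host with an HDFS RPC endpoint. */
    static boolean sharesNameNodeHost(Set<String> webHdfsAddrs, Set<String> hdfsAddrs) {
        Set<String> webHosts = new HashSet<>();
        for (String addr : webHdfsAddrs) {
            webHosts.add(hostOf(addr));
        }
        Set<String> hdfsHosts = new HashSet<>();
        for (String addr : hdfsAddrs) {
            hdfsHosts.add(hostOf(addr));
        }
        // Same answer as Sets.intersection(...).size() > 0, without Guava.
        return !Collections.disjoint(webHosts, hdfsHosts);
    }
}
```

For example, {"NN1.example.com:50070", "nn2.example.com:50070"} and {"nn1.example.com:8020"} overlap on nn1.example.com even though the ports and letter case differ.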
>
> On 2019/02/11 06:12:43, 张铎(Duo Zhang) <[email protected]> wrote:
> > How do we know whether a WebHDFS filesystem is the same as an HDFS one?
> >
> > On Mon, Feb 11, 2019 at 1:25 PM Schile,Nathan <[email protected]> wrote:
> >
> > > Currently, when bulk loading from a WebHDFS filesystem, files are copied
> > > rather than renamed even when they reside on the same cluster [1]. This
> > > makes the bulk load slower than it needs to be.
> > >
> > > It seems the configured WebHDFS NameNodes could be compared against the
> > > NameNodes of the cluster being bulk loaded into, and if they are the same,
> > > the bulk-loaded files could be renamed rather than copied.
> > >
> > > I found a JIRA comment that brings up this use case [2] but could not
> > > find a comment or JIRA with a resolution.
> > >
> > > If this issue and the proposed solution are acceptable, I would be happy
> > > to file a JIRA and work on a patch. Please let me know how to proceed.
> > >
> > > [1]
> > > https://github.com/apache/hbase/blob/rel/2.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java#L369-L383
> > >
> > > [2]
> > > https://issues.apache.org/jira/browse/HBASE-8304?focusedCommentId=13923197&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13923197
> > >
> > > CONFIDENTIALITY NOTICE This message and any included attachments are from
> > > Cerner Corporation and are intended only for the addressee. The information
> > > contained in this message is confidential and may constitute inside or
> > > non-public information under international, federal, or state securities
> > > laws. Unauthorized forwarding, printing, copying, distribution, or use of
> > > such information is strictly prohibited and may be unlawful. If you are not
> > > the addressee, please promptly delete this message and notify the sender of
> > > the delivery error by e-mail or you may call Cerner's corporate offices in
> > > Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
> >
>