Currently, when bulk loading from a webhdfs filesystem, files are copied rather than renamed even when they reside on the same cluster [1]. The extra copy keeps the bulk load from performing optimally.
It seems that the configured webhdfs namenodes could be compared against the namenodes of the cluster being bulk loaded to; if they are the same, the bulk loaded files could be renamed rather than copied. I was able to locate a JIRA comment bringing up this use case [2], but wasn't able to find a comment or JIRA with a resolution.

If this issue and proposed solution are acceptable, I would be happy to log a JIRA and work on a patch. Please let me know how to proceed.

[1] https://github.com/apache/hbase/blob/rel/2.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java#L369-L383
[2] https://issues.apache.org/jira/browse/HBASE-8304?focusedCommentId=13923197&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13923197
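To make the namenode comparison proposed above concrete, here is a minimal sketch in plain Java. It is not HBase or Hadoop code: the class name `BulkLoadSourceCheck`, the method `isSameCluster`, and the example hostnames and port are all hypothetical, and it assumes the destination cluster's namenode HTTP addresses have already been read from configuration as `host:port` strings. A real patch would also need to handle logical HA nameservice names and hostname aliases, which this sketch treats as "not the same cluster" so that the existing copy path remains the fallback.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of HBase: decides whether a webhdfs source
// path points at one of the destination cluster's own namenodes.
final class BulkLoadSourceCheck {

    // destNamenodeHttpAddresses is assumed to hold "host:port" strings for the
    // namenodes of the cluster being bulk loaded to.
    static boolean isSameCluster(URI source, Set<String> destNamenodeHttpAddresses) {
        String scheme = source.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("webhdfs") || scheme.equalsIgnoreCase("swebhdfs"))) {
            return false; // only webhdfs-style sources are considered here
        }
        String host = source.getHost();
        int port = source.getPort();
        if (host == null || port == -1) {
            // No explicit host:port (e.g. a logical nameservice name):
            // be conservative and keep the existing copy behaviour.
            return false;
        }
        String sourceAuthority = host + ":" + port;
        for (String address : destNamenodeHttpAddresses) {
            // Hostnames are case-insensitive, so compare ignoring case.
            if (sourceAuthority.equalsIgnoreCase(address.trim())) {
                return true; // same cluster: a rename could replace the copy
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> namenodes = new HashSet<>(
                Arrays.asList("nn1.example.com:50070", "nn2.example.com:50070"));
        URI staged = URI.create("webhdfs://nn1.example.com:50070/staging/cf/hfile1");
        System.out.println("rename instead of copy: " + isSameCluster(staged, namenodes));
    }
}
```

When the check returns true, the staged path would still have to be re-resolved against the destination filesystem before a rename is attempted, since a rename only works within a single filesystem instance.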
