nyl3532016 commented on a change in pull request #2667:
URL: https://github.com/apache/hbase/pull/2667#discussion_r525745508



##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/tool/BulkLoadHFilesTool.java
##########
@@ -647,51 +684,24 @@ private String getUniqueName() {
       return null;
     }
     if (Bytes.compareTo(first.get(), last.get()) > 0) {
-      throw new IllegalArgumentException("Invalid range: " + 
Bytes.toStringBinary(first.get()) +
-        " > " + Bytes.toStringBinary(last.get()));
-    }
-    int idx =
-      Collections.binarySearch(startEndKeys, Pair.newPair(first.get(), 
HConstants.EMPTY_END_ROW),
-        (p1, p2) -> Bytes.compareTo(p1.getFirst(), p2.getFirst()));
-    if (idx < 0) {
-      // not on boundary, returns -(insertion index). Calculate region it
-      // would be in.
-      idx = -(idx + 1) - 1;
-    }
-    int indexForCallable = idx;
-
-    /*
-     * we can consider there is a region hole in following conditions. 1) if 
idx < 0,then first
-     * region info is lost. 2) if the endkey of a region is not equal to the 
startkey of the next
-     * region. 3) if the endkey of the last region is not empty.
-     */
-    if (indexForCallable < 0) {
-      throw new IOException("The first region info for table " + tableName +
-        " can't be found in hbase:meta.Please use hbck tool to fix it first.");
-    } else if ((indexForCallable == startEndKeys.size() - 1) &&
-      !Bytes.equals(startEndKeys.get(indexForCallable).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY)) {
-      throw new IOException("The last region info for table " + tableName +
-        " can't be found in hbase:meta.Please use hbck tool to fix it first.");
-    } else if (indexForCallable + 1 < startEndKeys.size() &&
-      !(Bytes.compareTo(startEndKeys.get(indexForCallable).getSecond(),
-        startEndKeys.get(indexForCallable + 1).getFirst()) == 0)) {
-      throw new IOException("The endkey of one region for table " + tableName +
-        " is not equal to the startkey of the next region in hbase:meta." +
-        "Please use hbck tool to fix it first.");
+      throw new IllegalArgumentException("Invalid range: " + 
Bytes.toStringBinary(first.get())
+          + " > " + Bytes.toStringBinary(last.get()));
     }
-
-    boolean lastKeyInRange = Bytes.compareTo(last.get(), 
startEndKeys.get(idx).getSecond()) < 0 ||
-      Bytes.equals(startEndKeys.get(idx).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY);
+    int firstKeyRegionIdx = getVaildRegionIndex(startEndKeys, first.get(), 
tableName);
+    boolean lastKeyInRange =
+        Bytes.compareTo(last.get(), 
startEndKeys.get(firstKeyRegionIdx).getSecond()) < 0 || Bytes
+            .equals(startEndKeys.get(firstKeyRegionIdx).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY);
     if (!lastKeyInRange) {
-      Pair<byte[], byte[]> startEndKey = startEndKeys.get(indexForCallable);
-      List<LoadQueueItem> lqis =
-        splitStoreFile(item, 
FutureUtils.get(conn.getAdmin().getDescriptor(tableName)),
-            startEndKey.getSecond());
+      int lastKeyRegionIdx = getVaildRegionIndex(startEndKeys, last.get(), 
tableName);
+      int splitIdx = (firstKeyRegionIdx + lastKeyRegionIdx) / 2;

Review comment:
       yes,if hfile span multi regions, for example A,B,C,D,E,F(the start key 
of these regions are in ascending order), we should use region C endkey to 
split, not region A. In this way, we can get equal .top and .bottom hfiles, 
reduce time complexity of split from O(n) to O(logn),decrease invoke of 
`bulkLoad` rpc to regionserver ,and also avoid write amplification during 
`copyHFileHalf`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to