[GitHub] [hbase] wchevreuil commented on a change in pull request #2667: HBASE-25281 Bulkload split hfile too many times due to unreasonable split point

2020-11-21 Thread GitBox


wchevreuil commented on a change in pull request #2667:
URL: https://github.com/apache/hbase/pull/2667#discussion_r528182565



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/tool/BulkLoadHFilesTool.java
##
@@ -647,51 +684,24 @@ private String getUniqueName() {
   return null;
 }
 if (Bytes.compareTo(first.get(), last.get()) > 0) {
-  throw new IllegalArgumentException("Invalid range: " + 
Bytes.toStringBinary(first.get()) +
-" > " + Bytes.toStringBinary(last.get()));
-}
-int idx =
-  Collections.binarySearch(startEndKeys, Pair.newPair(first.get(), 
HConstants.EMPTY_END_ROW),
-(p1, p2) -> Bytes.compareTo(p1.getFirst(), p2.getFirst()));
-if (idx < 0) {
-  // not on boundary, returns -(insertion index). Calculate region it
-  // would be in.
-  idx = -(idx + 1) - 1;
-}
-int indexForCallable = idx;
-
-/*
- * we can consider there is a region hole in following conditions. 1) if 
idx < 0,then first
- * region info is lost. 2) if the endkey of a region is not equal to the 
startkey of the next
- * region. 3) if the endkey of the last region is not empty.
- */
-if (indexForCallable < 0) {
-  throw new IOException("The first region info for table " + tableName +
-" can't be found in hbase:meta.Please use hbck tool to fix it first.");
-} else if ((indexForCallable == startEndKeys.size() - 1) &&
-  !Bytes.equals(startEndKeys.get(indexForCallable).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY)) {
-  throw new IOException("The last region info for table " + tableName +
-" can't be found in hbase:meta.Please use hbck tool to fix it first.");
-} else if (indexForCallable + 1 < startEndKeys.size() &&
-  !(Bytes.compareTo(startEndKeys.get(indexForCallable).getSecond(),
-startEndKeys.get(indexForCallable + 1).getFirst()) == 0)) {
-  throw new IOException("The endkey of one region for table " + tableName +
-" is not equal to the startkey of the next region in hbase:meta." +
-"Please use hbck tool to fix it first.");
+  throw new IllegalArgumentException("Invalid range: " + 
Bytes.toStringBinary(first.get())
+  + " > " + Bytes.toStringBinary(last.get()));
 }
-
-boolean lastKeyInRange = Bytes.compareTo(last.get(), 
startEndKeys.get(idx).getSecond()) < 0 ||
-  Bytes.equals(startEndKeys.get(idx).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY);
+int firstKeyRegionIdx = getVaildRegionIndex(startEndKeys, first.get(), 
tableName);
+boolean lastKeyInRange =
+Bytes.compareTo(last.get(), 
startEndKeys.get(firstKeyRegionIdx).getSecond()) < 0 || Bytes
+.equals(startEndKeys.get(firstKeyRegionIdx).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY);
 if (!lastKeyInRange) {
-  Pair startEndKey = startEndKeys.get(indexForCallable);
-  List lqis =
-splitStoreFile(item, 
FutureUtils.get(conn.getAdmin().getDescriptor(tableName)),
-startEndKey.getSecond());
+  int lastKeyRegionIdx = getVaildRegionIndex(startEndKeys, last.get(), 
tableName);
+  int splitIdx = (firstKeyRegionIdx + lastKeyRegionIdx) / 2;

Review comment:
   Makes sense, thanks for explaining!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] wchevreuil commented on a change in pull request #2667: HBASE-25281 Bulkload split hfile too many times due to unreasonable split point

2020-11-17 Thread GitBox


wchevreuil commented on a change in pull request #2667:
URL: https://github.com/apache/hbase/pull/2667#discussion_r525106766



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/tool/BulkLoadHFilesTool.java
##
@@ -615,6 +615,43 @@ private String getUniqueName() {
 return lqis;
   }
 
+  /**
+   * @param startEndKeys the start/end keys of regions belong to this table, 
the list in ascending
+   *  order by start key
+   * @param key the key need to find which region belong to
+   * @param tableName table
+   * @throws IOException the table has hole or overlap, need hbck tool to fix
+   */
+  private int getVaildRegionIndex(List> startEndKeys, 
byte[] key,

Review comment:
   nit: "getValidRegionIndex"

##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/tool/BulkLoadHFilesTool.java
##
@@ -647,51 +684,24 @@ private String getUniqueName() {
   return null;
 }
 if (Bytes.compareTo(first.get(), last.get()) > 0) {
-  throw new IllegalArgumentException("Invalid range: " + 
Bytes.toStringBinary(first.get()) +
-" > " + Bytes.toStringBinary(last.get()));
-}
-int idx =
-  Collections.binarySearch(startEndKeys, Pair.newPair(first.get(), 
HConstants.EMPTY_END_ROW),
-(p1, p2) -> Bytes.compareTo(p1.getFirst(), p2.getFirst()));
-if (idx < 0) {
-  // not on boundary, returns -(insertion index). Calculate region it
-  // would be in.
-  idx = -(idx + 1) - 1;
-}
-int indexForCallable = idx;
-
-/*
- * we can consider there is a region hole in following conditions. 1) if 
idx < 0,then first
- * region info is lost. 2) if the endkey of a region is not equal to the 
startkey of the next
- * region. 3) if the endkey of the last region is not empty.
- */
-if (indexForCallable < 0) {
-  throw new IOException("The first region info for table " + tableName +
-" can't be found in hbase:meta.Please use hbck tool to fix it first.");
-} else if ((indexForCallable == startEndKeys.size() - 1) &&
-  !Bytes.equals(startEndKeys.get(indexForCallable).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY)) {
-  throw new IOException("The last region info for table " + tableName +
-" can't be found in hbase:meta.Please use hbck tool to fix it first.");
-} else if (indexForCallable + 1 < startEndKeys.size() &&
-  !(Bytes.compareTo(startEndKeys.get(indexForCallable).getSecond(),
-startEndKeys.get(indexForCallable + 1).getFirst()) == 0)) {
-  throw new IOException("The endkey of one region for table " + tableName +
-" is not equal to the startkey of the next region in hbase:meta." +
-"Please use hbck tool to fix it first.");
+  throw new IllegalArgumentException("Invalid range: " + 
Bytes.toStringBinary(first.get())
+  + " > " + Bytes.toStringBinary(last.get()));
 }
-
-boolean lastKeyInRange = Bytes.compareTo(last.get(), 
startEndKeys.get(idx).getSecond()) < 0 ||
-  Bytes.equals(startEndKeys.get(idx).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY);
+int firstKeyRegionIdx = getVaildRegionIndex(startEndKeys, first.get(), 
tableName);
+boolean lastKeyInRange =
+Bytes.compareTo(last.get(), 
startEndKeys.get(firstKeyRegionIdx).getSecond()) < 0 || Bytes
+.equals(startEndKeys.get(firstKeyRegionIdx).getSecond(), 
HConstants.EMPTY_BYTE_ARRAY);
 if (!lastKeyInRange) {
-  Pair startEndKey = startEndKeys.get(indexForCallable);
-  List lqis =
-splitStoreFile(item, 
FutureUtils.get(conn.getAdmin().getDescriptor(tableName)),
-startEndKey.getSecond());
+  int lastKeyRegionIdx = getVaildRegionIndex(startEndKeys, last.get(), 
tableName);
+  int splitIdx = (firstKeyRegionIdx + lastKeyRegionIdx) / 2;

Review comment:
   So if the hfile.endKey > region.endKey, we don't always split the file 
by region.endKey anymore? Can you explain what should be the ideal split point 
in this case?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org