keith-turner commented on code in PR #5341:
URL: https://github.com/apache/accumulo/pull/5341#discussion_r1970457723


##########
server/manager/src/main/java/org/apache/accumulo/manager/tableOps/bulkVer2/LoadFiles.java:
##########
@@ -342,12 +341,22 @@ private long loadFiles(TableId tableId, Path bulkDir, LoadMappingIterator loadMa
     loader.start(bulkDir, manager, tid, bulkInfo.setTime);
 
     long t1 = System.currentTimeMillis();
+    KeyExtent prevLastExtent = null; // KeyExtent of last tablet from prior loadMapEntry
     while (lmi.hasNext()) {
       loadMapEntry = lmi.next();
-      List<TabletMetadata> tablets =
-          findOverlappingTablets(fmtTid, loadMapEntry.getKey(), tabletIter);
+      KeyExtent loadMapKey = loadMapEntry.getKey();
+      if (prevLastExtent != null && !loadMapKey.isPreviousExtent(prevLastExtent)) {

Review Comment:
   Wondering if a batch scanner would be better here, to minimize the overall
number of RPCs made. It would be a large change to the code. The current code,
even if we optimize its use of the scanner, will make a lot of RPCs in some
cases (like importing into every 100th tablet of a million-tablet table), and
those RPCs will be made serially. A batch scanner would minimize the number of
RPCs made in these cases.
   
   It would be good to gather some performance data before making large changes
to improve performance, to ensure they are needed. We cannot do it in 2.1, but
in main we could experiment with the SplitMillionIT and try things like
importing into every 10th tablet for 1000 tablets, every 100th tablet for 1000
tablets, etc.
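
   As a rough illustration of the experiment shapes suggested above, here is a minimal, self-contained sketch that only computes which tablet positions each run would target. It does not touch Accumulo or SplitMillionIT; the class `ImportStridePlan` and its methods are hypothetical names made up for this example, and the million-tablet total is taken from the case mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo: plans which tablets a test run imports into.
class ImportStridePlan {

  // Returns the zero-based positions of the tablets to import into:
  // every stride-th tablet, until count tablets have been chosen.
  static List<Integer> selectTablets(int stride, int count, int totalTablets) {
    if (stride < 1 || count < 0) {
      throw new IllegalArgumentException("stride must be >= 1 and count >= 0");
    }
    long lastIndex = count == 0 ? -1 : (long) stride * (count - 1);
    if (lastIndex >= totalTablets) {
      throw new IllegalArgumentException("plan does not fit in the table");
    }
    List<Integer> chosen = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      chosen.add(i * stride);
    }
    return chosen;
  }

  // Number of consecutive tablets between the first and last chosen tablet, inclusive.
  static long tabletsSpanned(int stride, int count) {
    return count == 0 ? 0 : (long) stride * (count - 1) + 1;
  }

  public static void main(String[] args) {
    int totalTablets = 1_000_000; // assumed table size for the experiment
    for (int stride : new int[] {10, 100}) {
      List<Integer> chosen = selectTablets(stride, 1000, totalTablets);
      System.out.printf("stride=%d chosen=%d last=%d spanned=%d%n", stride, chosen.size(),
          chosen.get(chosen.size() - 1), tabletsSpanned(stride, 1000));
    }
  }
}
```

   The spanned count is the interesting number: with the same 1000 target tablets, a larger stride means many more untargeted tablets sit between the targets, which is the gap a serial scan would have to cross or seek past.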
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

