keith-turner commented on code in PR #5341:
URL: https://github.com/apache/accumulo/pull/5341#discussion_r1970457723
##########
server/manager/src/main/java/org/apache/accumulo/manager/tableOps/bulkVer2/LoadFiles.java:
##########
@@ -342,12 +341,22 @@ private long loadFiles(TableId tableId, Path bulkDir, LoadMappingIterator loadMa
     loader.start(bulkDir, manager, tid, bulkInfo.setTime);
     long t1 = System.currentTimeMillis();
+    KeyExtent prevLastExtent = null; // KeyExtent of last tablet from prior loadMapEntry
     while (lmi.hasNext()) {
       loadMapEntry = lmi.next();
-      List<TabletMetadata> tablets =
-          findOverlappingTablets(fmtTid, loadMapEntry.getKey(), tabletIter);
+      KeyExtent loadMapKey = loadMapEntry.getKey();
+      if (prevLastExtent != null && !loadMapKey.isPreviousExtent(prevLastExtent)) {

Review Comment:
   Wondering if using a batch scanner would be better here to minimize the overall number of RPCs made. It would be a large change to the code. Even if we optimize the use of the scanner, the current code will make a lot of RPCs in some cases (like importing into every 100th tablet in a million-tablet table), and those RPCs will be made serially. A batch scanner would minimize the number of RPCs made for these cases.

   It would be good to gather some performance data before making large changes to improve performance, to ensure they are needed. Cannot do it in 2.1, but in main we could experiment with SplitMillionIT and try things like importing into every 10th tablet for 1000 tablets, every 100th tablet for 1000 tablets, etc.
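
   To make the suggested experiment shapes concrete, here is a rough sketch in plain Java with no Accumulo dependencies. It lays out which tablets would receive files for the "every 10th tablet for 1000 tablets" and "every 100th tablet for 1000 tablets" cases and counts the tablets in between that receive nothing. The class and method names are hypothetical and are not part of SplitMillionIT or LoadFiles. The skipped count is only a proxy for the metadata a sequential scanner would have to read past or seek over; it is not a measurement of RPCs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for planning sparse bulk-import experiments.
public class SparseImportPlan {

  /** Returns the indexes of the tablets that receive files: 0, stride, 2*stride, ... */
  public static List<Integer> targetTablets(int stride, int count) {
    List<Integer> targets = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      targets.add(i * stride);
    }
    return targets;
  }

  /** Counts the tablets lying strictly between consecutive targets (no files loaded). */
  public static long skippedTablets(List<Integer> targets) {
    long skipped = 0;
    for (int i = 1; i < targets.size(); i++) {
      skipped += targets.get(i) - targets.get(i - 1) - 1;
    }
    return skipped;
  }

  public static void main(String[] args) {
    // The two experiment shapes mentioned in the comment above.
    for (int stride : new int[] {10, 100}) {
      List<Integer> targets = targetTablets(stride, 1000);
      System.out.printf("stride=%d targets=%d skipped=%d%n",
          stride, targets.size(), skippedTablets(targets));
    }
  }
}
```

   With a stride of 10 there are 9 untouched tablets between each pair of targets (8991 in total for 1000 targets); with a stride of 100 there are 99 (98901 in total). Timing both shapes would show how much the gaps actually cost before committing to a larger change.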