dlmarion commented on code in PR #5982:
URL: https://github.com/apache/accumulo/pull/5982#discussion_r2557729570


##########
core/src/main/java/org/apache/accumulo/core/data/LoadPlan.java:
##########
@@ -510,4 +524,41 @@ public static LoadPlan compute(URI file, Map<String,String> properties,
       return builder.build();
     }
   }
+
+  /**
+   * Computes a load plan for a rfile based on the minimum and maximum row present across all
+   * locality groups.
+   *
+   * @param properties used when opening the rfile, see
+   *        {@link org.apache.accumulo.core.client.rfile.RFile.ScannerOptions#withTableProperties(Map)}
+   *
+   * @return a load plan of type {@link RangeType#FILE}
+   * @since 2.1.5
+   */
+  public static LoadPlan compute(URI file, Map<String,String> properties) throws IOException {
+    var path = new Path(file);
+    var conf = new Configuration();
+    var fs = FileSystem.get(path.toUri(), conf);
+    CryptoService cs =
+        CryptoFactoryLoader.getServiceForClient(CryptoEnvironment.Scope.TABLE, properties);
+    CachableBlockFile.CachableBuilder cb =
+        new CachableBlockFile.CachableBuilder().fsPath(fs, path).conf(conf).cryptoService(cs);
+    try (var reader = new org.apache.accumulo.core.file.rfile.RFile.Reader(cb)) {

Review Comment:
   Is there a reason not to use FileOperations.ReaderBuilder?
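To illustrate what the javadoc means by "the minimum and maximum row present across all locality groups", here is a minimal, self-contained sketch. It does not use Accumulo's RFile reader; `GroupBounds`, `minMaxAcrossGroups` and `row` are hypothetical names invented for this example, and each locality group is assumed to be summarized by its first and last row. Rows are compared as unsigned bytes, which is how Accumulo orders row IDs. The point is that a non-default locality group can hold rows outside the default group's range, so every group has to be consulted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class LocalityGroupRowBounds {

  /** Hypothetical summary of one locality group: first and last row, or nulls if empty. */
  static final class GroupBounds {
    final byte[] firstRow;
    final byte[] lastRow;

    GroupBounds(byte[] firstRow, byte[] lastRow) {
      this.firstRow = firstRow;
      this.lastRow = lastRow;
    }
  }

  /**
   * Returns {min, max} over all non-empty groups, or null if every group is empty. Rows are
   * compared lexicographically as unsigned bytes.
   */
  static byte[][] minMaxAcrossGroups(List<GroupBounds> groups) {
    byte[] min = null;
    byte[] max = null;
    for (GroupBounds g : groups) {
      if (g.firstRow == null) {
        continue; // an empty locality group contributes no rows
      }
      if (min == null || Arrays.compareUnsigned(g.firstRow, min) < 0) {
        min = g.firstRow;
      }
      if (max == null || Arrays.compareUnsigned(g.lastRow, max) > 0) {
        max = g.lastRow;
      }
    }
    return min == null ? null : new byte[][] {min, max};
  }

  static byte[] row(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // One group spans b..k, but a second group holds rows as late as x.
    List<GroupBounds> groups = List.of(
        new GroupBounds(row("b"), row("k")),
        new GroupBounds(row("d"), row("x")),
        new GroupBounds(null, null));
    byte[][] bounds = minMaxAcrossGroups(groups);
    // prints "b .. x"
    System.out.println(new String(bounds[0], StandardCharsets.UTF_8) + " .. "
        + new String(bounds[1], StandardCharsets.UTF_8));
  }
}
```

The resulting pair would then become the inclusive start and end rows of a single `RangeType.FILE` range.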



##########
core/src/main/java/org/apache/accumulo/core/data/LoadPlan.java:
##########
@@ -90,13 +96,19 @@ public enum RangeType {
      * row and end row can be null. The start row is exclusive and the end row is inclusive (like
      * Accumulo tablets). A common use case for this would be when files were partitioned using a
      * table's splits. When using this range type, the start and end row must exist as splits in the
-     * table or an exception will be thrown at load time.
+     * table or an exception will be thrown at load time. This RangeType is the most efficient for
+     * accumulo to load, and it enables only loading files to tablets that overlap data in the file.
      */
     TABLE,
     /**
-     * Range that correspond to known rows in a file. For this range type, the start row and end row
-     * must be non-null. The start row and end row are both considered inclusive. At load time,
-     * these data ranges will be mapped to table ranges.
+     * Range that corresponds to the minimum and maximum rows in a file. For this range type, the
+     * start row and end row must be non-null. The start row and end row are both considered
+     * inclusive. At load time, these data ranges will be mapped to table ranges. For this RangeType
+     * accumulo has to do more work at load to map the file range to tablets. Also, this will map a

Review Comment:
   ```suggestion
        * Accumulo has to do more work at load to map the file range to tablets. Also, this will map a
   ```
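To make the "more work at load" for `RangeType.FILE` concrete, here is a minimal sketch of mapping an inclusive file range onto tablets. It is not Accumulo's bulk-import code: `FileRangeToTablets` and `overlappingTablets` are hypothetical, rows are simplified to `String`, and the table's splits are assumed to be available as a sorted set. Each tablet is identified by its end row, and, following Accumulo's `KeyExtent` convention, the last tablet has a `null` end row. A tablet with end row `e` and previous end row `p` holds rows `r` where `p < r <= e`, so the tablet holding a row is the one ending at the smallest split `>= r`. With `RangeType.TABLE` the start and end rows already are split points, so no such lookup is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class FileRangeToTablets {

  /**
   * Returns the end rows of every tablet overlapping the inclusive row range [firstRow, lastRow].
   * A null element stands for the last tablet, whose end row is unbounded.
   */
  static List<String> overlappingTablets(TreeSet<String> splits, String firstRow, String lastRow) {
    if (firstRow.compareTo(lastRow) > 0) {
      throw new IllegalArgumentException("firstRow must not be after lastRow");
    }
    // Smallest split >= row identifies the tablet containing that row; null means the last tablet.
    String startTabletEnd = splits.ceiling(firstRow);
    String endTabletEnd = splits.ceiling(lastRow);
    List<String> tablets = new ArrayList<>();
    if (startTabletEnd == null) {
      tablets.add(null); // the whole range lies after the last split
    } else if (endTabletEnd == null) {
      tablets.addAll(splits.tailSet(startTabletEnd, true));
      tablets.add(null); // range runs into the last tablet
    } else {
      tablets.addAll(splits.subSet(startTabletEnd, true, endTabletEnd, true));
    }
    return tablets;
  }

  public static void main(String[] args) {
    // Tablets: (-inf, c], (c, f], (f, m], (m, +inf)
    TreeSet<String> splits = new TreeSet<>(List.of("c", "f", "m"));
    System.out.println(overlappingTablets(splits, "d", "g")); // prints [f, m]
  }
}
```

A lookup like this runs against the table's current splits for every file range, which is the extra cost the suggested wording refers to.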



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.