keith-turner commented on code in PR #5479:
URL: https://github.com/apache/accumulo/pull/5479#discussion_r2047657209
##########
server/base/src/main/java/org/apache/accumulo/server/util/TableDiskUsage.java:
##########
@@ -203,47 +195,44 @@ public static Map<SortedSet<String>,Long>
getDiskUsage(Set<TableId> tableIds,
// For each table ID
for (TableId tableId : tableIds) {
-      // if the table to compute usage is for the metadata table itself then we need to scan the
-      // root table, else we scan the metadata table
- try (Scanner mdScanner = tableId.equals(MetadataTable.ID)
- ? client.createScanner(RootTable.NAME, Authorizations.EMPTY)
- : client.createScanner(MetadataTable.NAME, Authorizations.EMPTY)) {
-
-      mdScanner.fetchColumnFamily(MetadataSchema.TabletsSection.DataFileColumnFamily.NAME);
-      mdScanner.setRange(new KeyExtent(tableId, null, null).toMetaRange());
-
- final Set<TabletFile> files = new HashSet<>();
-
- // Read each file referenced by that table
- for (Map.Entry<Key,Value> entry : mdScanner) {
-        final TabletFile file =
-            new TabletFile(new Path(entry.getKey().getColumnQualifier().toString()));
-
-        // get the table referenced by the file which may not be the same as the current
-        // table we are scanning if the file is shared between multiple tables
- final TableId fileTableRef = file.getTableId();
-
-        // if this is a ref to a different table than the one we are scanning then we need
-        // to make sure the table is also linked for this shared file if the table is
-        // part of the set of tables we are running du on so we can track shared usages
-        if (!fileTableRef.equals(tableId) && tableIds.contains(fileTableRef)) {
- // link the table and the shared file for computing shared sizes
- tdu.linkFileAndTable(fileTableRef, file.getFileName());
- }
-
- // link the file to the table we are scanning for
- tdu.linkFileAndTable(tableId, file.getFileName());
-
- // add the file size for the table if not already seen for this scan
- if (files.add(file)) {
-          // This tracks the file size for individual files for computing shared file statistics
-          // later
- tdu.addFileSize(file.getFileName(),
- new DataFileValue(entry.getValue().get()).getSize());
+ // read the metadata
+ try (var tabletsMetadata =
+          ((ClientContext) client).getAmple().readTablets().forTable(tableId).build()) {
Review Comment:
This was previously reading only the file column; it can keep doing that to cut
down on how much data is read.
```suggestion
          ((ClientContext) client).getAmple().readTablets().forTable(tableId).fetch(FILES).build()) {
```
##########
server/base/src/test/java/org/apache/accumulo/server/util/TableDiskUsageTest.java:
##########
@@ -75,17 +72,20 @@ public static void beforeClass() {
@Test
public void testSingleTableMultipleTablets() throws Exception {
- final ServerContext client = EasyMock.createMock(ServerContext.class);
- final Scanner scanner = EasyMock.createMock(Scanner.class);
- mockScan(client, scanner, 1);
+ final ClientContext client = EasyMock.createMock(ClientContext.class);
Review Comment:
The more levels of mocking there are in test code, the harder it is to
refactor the code under test. Sometimes it's better to change the code to
make it easy to test and to minimize the need for mocking. For example, maybe
this could have been changed:
```java
public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
    AccumuloClient client)
```
to :
```java
public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
    Function<TableId, Iterable<Map<StoredTabletFile,DataFileValue>>> tableFilesProvider)
```
Maybe this would make the code easier to test. Not a change for this PR, as
the changes with EasyMock were already made; just something to consider for
future changes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]