keith-turner commented on code in PR #5479:
URL: https://github.com/apache/accumulo/pull/5479#discussion_r2047657209
##########
server/base/src/main/java/org/apache/accumulo/server/util/TableDiskUsage.java:
##########
@@ -203,47 +195,44 @@ public static Map<SortedSet<String>,Long>
getDiskUsage(Set<TableId> tableIds,
// For each table ID
for (TableId tableId : tableIds) {
-      // if the table to compute usage is for the metadata table itself then we need to scan the
-      // root table, else we scan the metadata table
- try (Scanner mdScanner = tableId.equals(MetadataTable.ID)
- ? client.createScanner(RootTable.NAME, Authorizations.EMPTY)
- : client.createScanner(MetadataTable.NAME, Authorizations.EMPTY)) {
-
-      mdScanner.fetchColumnFamily(MetadataSchema.TabletsSection.DataFileColumnFamily.NAME);
-      mdScanner.setRange(new KeyExtent(tableId, null, null).toMetaRange());
-
- final Set<TabletFile> files = new HashSet<>();
-
- // Read each file referenced by that table
- for (Map.Entry<Key,Value> entry : mdScanner) {
-        final TabletFile file =
-            new TabletFile(new Path(entry.getKey().getColumnQualifier().toString()));
-
-        // get the table referenced by the file which may not be the same as the current
-        // table we are scanning if the file is shared between multiple tables
- final TableId fileTableRef = file.getTableId();
-
-        // if this is a ref to a different table than the one we are scanning then we need
-        // to make sure the table is also linked for this shared file if the table is
-        // part of the set of tables we are running du on so we can track shared usages
-        if (!fileTableRef.equals(tableId) && tableIds.contains(fileTableRef)) {
- // link the table and the shared file for computing shared sizes
- tdu.linkFileAndTable(fileTableRef, file.getFileName());
- }
-
- // link the file to the table we are scanning for
- tdu.linkFileAndTable(tableId, file.getFileName());
-
- // add the file size for the table if not already seen for this scan
- if (files.add(file)) {
-          // This tracks the file size for individual files for computing shared file statistics
-          // later
- tdu.addFileSize(file.getFileName(),
- new DataFileValue(entry.getValue().get()).getSize());
+ // read the metadata
+ try (var tabletsMetadata =
+          ((ClientContext) client).getAmple().readTablets().forTable(tableId).build()) {
Review Comment:
This was previously reading only the file column; it can keep doing that to cut
down on how much data is read.
```suggestion
          ((ClientContext) client).getAmple().readTablets().forTable(tableId).fetch(FILES).build()) {
```
##########
server/base/src/test/java/org/apache/accumulo/server/util/TableDiskUsageTest.java:
##########
@@ -75,17 +72,20 @@ public static void beforeClass() {
@Test
public void testSingleTableMultipleTablets() throws Exception {
- final ServerContext client = EasyMock.createMock(ServerContext.class);
- final Scanner scanner = EasyMock.createMock(Scanner.class);
- mockScan(client, scanner, 1);
+ final ClientContext client = EasyMock.createMock(ClientContext.class);
Review Comment:
The more levels of mocking there are in test code, the harder it is to
refactor the code under test. Sometimes it's better to change the code to
make it easy to test and to minimize the need for mocking. For example, maybe
this could have been changed:
```java
public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
    AccumuloClient client)
```
to :
```java
public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
    Function<TableId, Iterable<Map<StoredTabletFile,DataFileValue>>> tableFilesProvider)
```
Maybe this would make the code easier to test. Not a change for this PR, as
the changes with EasyMock were already made; just something to consider for
future changes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]