keith-turner commented on PR #5615:
URL: https://github.com/apache/accumulo/pull/5615#issuecomment-2949718240
These changes assume RFile can quickly skip over an entire file when the
fetched column family is not present in it. Did some local testing to make
sure that assumption is correct and found that it is. Would still want to do
end-to-end testing, but hoping that metadata tablets w/o the family can be
quickly skipped w/ this change. This is the test I wrote.
```java
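import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Random;

import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

import com.google.common.collect.Iterables;

public class RFilePerfTest {
  public static void main(String[] args) throws IOException {
    Random rand = new Random();
    Files.deleteIfExists(Path.of("/tmp/test.rf"));

    // Write 10M "loc" entries into a named locality group, then 10M "tab"
    // entries into the default locality group.
    try (var writer = RFile.newWriter().to("file:///tmp/test.rf").build()) {
      writer.startNewLocalityGroup("LG1", "loc", "ecomp");
      for (int i = 0; i < 10_000_000; i++) {
        String row = String.format("%09x", i);
        int port = rand.nextInt(1 << 16);
        writer.append(new Key(row, "loc", "127.0.0.1:" + port), new Value(""));
      }

      writer.startDefaultLocalityGroup();
      for (int i = 0; i < 10_000_000; i++) {
        String row = String.format("%09x", i);
        writer.append(new Key(row, "tab", "pr"), new Value(String.format("%09x", i - 1)));
      }
    }

    // Time a full-range scan for each family; "ecomp" is declared in LG1 but
    // never written, and "migration" does not appear in the file at all.
    try (var scanner = RFile.newScanner().from("file:///tmp/test.rf").build()) {
      for (String family : List.of("loc", "ecomp", "tab", "migration")) {
        scanner.setRange(new Range());
        scanner.clearColumns();
        scanner.fetchColumnFamily(new Text(family));
        long t1 = System.currentTimeMillis();
        long size = Iterables.size(scanner);
        long t2 = System.currentTimeMillis();
        System.out.printf("family:%10s size:%,d time:%,d\n", family, size, t2 - t1);
      }
    }
  }
}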
```
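
As a rough illustration of why skipping can be cheap, here is a toy model in plain Java. It is not Accumulo's RFile code, and the names `FamilySkipSketch`, `Group`, `ScanResult`, `scan`, and `sampleGroups` are made up for this sketch. It assumes each group of entries records the set of families it may contain, and that a scan consults that set before reading any entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: hypothetical names, not Accumulo's RFile implementation. */
public class FamilySkipSketch {

  /** A group of entries plus the set of families recorded as possibly present in it. */
  public static final class Group {
    final Set<String> families;
    final List<String> entryFamilies; // family of each stored entry

    public Group(Set<String> families, String writtenFamily, int count) {
      this.families = families;
      this.entryFamilies = new ArrayList<>(count);
      for (int i = 0; i < count; i++) {
        entryFamilies.add(writtenFamily);
      }
    }
  }

  /** How many entries matched and how many had to be looked at. */
  public static final class ScanResult {
    public final long matched;
    public final long examined;

    ScanResult(long matched, long examined) {
      this.matched = matched;
      this.examined = examined;
    }
  }

  /** Scans for one family, skipping any group whose recorded families exclude it. */
  public static ScanResult scan(List<Group> groups, String family) {
    long matched = 0;
    long examined = 0;
    for (Group group : groups) {
      if (!group.families.contains(family)) {
        continue; // cheap check on the recorded set; no entries are read
      }
      for (String entryFamily : group.entryFamilies) {
        examined++;
        if (entryFamily.equals(family)) {
          matched++;
        }
      }
    }
    return new ScanResult(matched, examined);
  }

  /** Two groups shaped like the test above: one declares loc and ecomp, the other holds tab. */
  public static List<Group> sampleGroups(int entriesPerGroup) {
    return List.of(
        new Group(Set.of("loc", "ecomp"), "loc", entriesPerGroup),
        new Group(Set.of("tab"), "tab", entriesPerGroup));
  }

  public static void main(String[] args) {
    List<Group> groups = sampleGroups(1000);
    for (String family : List.of("loc", "ecomp", "tab", "migration")) {
      ScanResult result = scan(groups, family);
      System.out.printf("family:%10s matched:%,d examined:%,d%n",
          family, result.matched, result.examined);
    }
  }
}
```

In this toy, `migration` is in no group's recorded set, so the scan returns without examining a single entry. `ecomp` is declared by the first group, so the toy still walks that group and finds nothing. How the real RFile handles either case is what the timing test above is meant to measure, not something this sketch establishes.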