GitHub user joshelser commented on a diff in the pull request:
https://github.com/apache/accumulo/pull/143#discussion_r77050549
--- Diff: test/src/test/java/org/apache/accumulo/test/functional/GarbageCollectorIT.java ---
@@ -139,6 +150,52 @@ public void gcTest() throws Exception {
}
@Test
+ public void gcDeleteDeadTServerWAL() throws Exception {
+ // Kill GC process
+ killMacGc();
+
+ // Create table and ingest data
+ Connector c = getConnector();
+ c.tableOperations().create("test_ingest");
+ c.tableOperations().setProperty("test_ingest", Property.TABLE_SPLIT_THRESHOLD.getKey(), "5K");
+ String tableId = getConnector().tableOperations().tableIdMap().get("test_ingest");
+ TestIngest.Opts opts = new TestIngest.Opts();
+ VerifyIngest.Opts vopts = new VerifyIngest.Opts();
+ vopts.rows = opts.rows = 10000;
+ vopts.cols = opts.cols = 1;
+ opts.setPrincipal("root");
+ vopts.setPrincipal("root");
+ TestIngest.ingest(c, opts, new BatchWriterOpts());
+
+ // Test WAL log has been created
+ List<String> walsBefore = getWALsForTableId(tableId);
+ Assert.assertEquals("Should be one WAL", 1, walsBefore.size());
+
+ // Flush and check for no WAL logs
+ c.tableOperations().flush("test_ingest", null, null, true);
+ List<String> walsAfter = getWALsForTableId(tableId);
+ Assert.assertEquals("Should be no WALs", 0, walsAfter.size());
+
+ // Validate WAL file still exists
+ String walFile = walsBefore.get(0).split("\\|")[0].replaceFirst("file:///", "");
+ File wf = new File(walFile);
+ Assert.assertEquals("WAL file does not exist", true, wf.exists());
+
+ // Kill TServer and give it some time to die and master to rebalance
+ killMacTServer();
+ UtilWaitThread.sleep(5000);
+
+ // Restart GC and let it run
+ Process gc = getCluster().exec(SimpleGarbageCollector.class);
+ UtilWaitThread.sleep(60000);
+
+ // Then check the log for proper events
+ String output = FunctionalTestUtils.readAll(getCluster(), SimpleGarbageCollector.class, gc);
+ assertTrue("WAL GC should have started", output.contains("Beginning garbage collection of write-ahead logs"));
+ assertTrue("WAL was not removed even though tserver was down", output.contains("Removing WAL for offline server"));
--- End diff --
I missed this the first time around (the test failing locally made me look
more closely). We shouldn't write tests that depend on log messages. Instead,
we should check the contents of the filesystem using the file you computed
earlier (from `walsBefore`). I'm going to look at this and see if I can fix it
quickly.
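
A rough sketch of the kind of filesystem check I mean, not the actual fix. `WalFileCheck`, `walPath` and `waitForDeletion` are hypothetical names, and the `file:///...|...` entry format is only inferred from the `split("\\|")` call in the diff above:

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Accumulo.
final class WalFileCheck {

  // Assumes the entry looks like "file:///path/to/wal|<more fields>".
  static Path walPath(String walEntry) {
    String uriPart = walEntry.split("\\|")[0];
    return Paths.get(URI.create(uriPart));
  }

  // Polls until the file is gone or the timeout elapses.
  // Returns true if the file no longer exists.
  static boolean waitForDeletion(Path file, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (Files.exists(file)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out, file is still there
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and report the current state.
        Thread.currentThread().interrupt();
        return !Files.exists(file);
      }
    }
    return true;
  }
}
```

In the test, the two `output.contains(...)` assertions could then become something like `assertTrue("WAL was not removed even though tserver was down", WalFileCheck.waitForDeletion(WalFileCheck.walPath(walsBefore.get(0)), 60000, 500));`, which would also replace the fixed 60 second sleep with a bounded poll.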