keith-turner commented on code in PR #4207:
URL: https://github.com/apache/accumulo/pull/4207#discussion_r1478795655


##########
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java:
##########
@@ -83,13 +85,17 @@ public class GarbageCollectWriteAheadLogs {
     this.fs = fs;
     this.liveServers = liveServers;
     this.walMarker = new WalStateManager(context);
-    this.store = () -> Iterators.concat(
-        context.getAmple().readTablets().forLevel(DataLevel.ROOT).filter(new 
HasWalsFilter())
-            .fetch(LOCATION, LAST, LOGS, PREV_ROW, SUSPEND).build().iterator(),
-        
context.getAmple().readTablets().forLevel(DataLevel.METADATA).filter(new 
HasWalsFilter())
-            .fetch(LOCATION, LAST, LOGS, PREV_ROW, SUSPEND).build().iterator(),
-        context.getAmple().readTablets().forLevel(DataLevel.USER).filter(new 
HasWalsFilter())
-            .fetch(LOCATION, LAST, LOGS, PREV_ROW, 
SUSPEND).build().iterator());
+    TabletsMetadata root = 
context.getAmple().readTablets().forLevel(DataLevel.ROOT)
+        .filter(new HasWalsFilter()).fetch(LOCATION, LAST, LOGS, PREV_ROW, 
SUSPEND).build();
+    TabletsMetadata metadata = 
context.getAmple().readTablets().forLevel(DataLevel.METADATA)
+        .filter(new HasWalsFilter()).fetch(LOCATION, LAST, LOGS, PREV_ROW, 
SUSPEND).build();
+    TabletsMetadata user = 
context.getAmple().readTablets().forLevel(DataLevel.USER)
+        .filter(new HasWalsFilter()).fetch(LOCATION, LAST, LOGS, PREV_ROW, 
SUSPEND).build();
+    this.store = Streams.concat(root.stream(), metadata.stream(), 
user.stream()).onClose(() -> {

Review Comment:
   > Is your concern with if collect was called again on the same 
GarbageCollectWriteAheadLogs object?
   
   I did not realize its only called once.  I was assuming the object was 
reused and was wrong about that.
   
   > Is that something we want to allow for? 
   
   No we don't need to allow for that, just need to support the way its used in 
SimpleGarbageCollector
   
   Looking at the code, there is another reason we my want to defer creating 
the stream.  Creating the streams in the construct will change the order in 
which things are done in the GC algorithm.
   
   Before these changes the code does the following
   
    1. [Read 
wals](https://github.com/apache/accumulo/blob/5f5cbd37a8efeaab4bc129c064680bbf4aa43be8/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java#L122-L139)
    2. [Read live 
tservers](https://github.com/apache/accumulo/blob/5f5cbd37a8efeaab4bc129c064680bbf4aa43be8/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java#L149)
    3. [Create scanner and scan for refs to 
wals](https://github.com/apache/accumulo/blob/5f5cbd37a8efeaab4bc129c064680bbf4aa43be8/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java#L154).
  Because the current code sets a lambda in the constructor, the scanner is not 
actually created in the constructor.
   
   After these changes the code does the following
   
    1. In the constructor create the scanners and potentially read some data
    2. Read wals
    3. Read live tservers
    4. use scanner created in constructor to scan for refs to wals
   
   So the order in which data is read is changed.  I am not sure if this change 
in order matters, I would need to research it.  I do know the order of reading 
data is very important in the GC algorithms.  So if changing the order in which 
data is read would need to reason about the safety of doing that.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to