keith-turner commented on code in PR #4207:
URL: https://github.com/apache/accumulo/pull/4207#discussion_r1478795655
##########
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java:
##########
@@ -83,13 +85,17 @@ public class GarbageCollectWriteAheadLogs {
this.fs = fs;
this.liveServers = liveServers;
this.walMarker = new WalStateManager(context);
- this.store = () -> Iterators.concat(
- context.getAmple().readTablets().forLevel(DataLevel.ROOT).filter(new
HasWalsFilter())
- .fetch(LOCATION, LAST, LOGS, PREV_ROW, SUSPEND).build().iterator(),
-
context.getAmple().readTablets().forLevel(DataLevel.METADATA).filter(new
HasWalsFilter())
- .fetch(LOCATION, LAST, LOGS, PREV_ROW, SUSPEND).build().iterator(),
- context.getAmple().readTablets().forLevel(DataLevel.USER).filter(new
HasWalsFilter())
- .fetch(LOCATION, LAST, LOGS, PREV_ROW,
SUSPEND).build().iterator());
+ TabletsMetadata root =
context.getAmple().readTablets().forLevel(DataLevel.ROOT)
+ .filter(new HasWalsFilter()).fetch(LOCATION, LAST, LOGS, PREV_ROW,
SUSPEND).build();
+ TabletsMetadata metadata =
context.getAmple().readTablets().forLevel(DataLevel.METADATA)
+ .filter(new HasWalsFilter()).fetch(LOCATION, LAST, LOGS, PREV_ROW,
SUSPEND).build();
+ TabletsMetadata user =
context.getAmple().readTablets().forLevel(DataLevel.USER)
+ .filter(new HasWalsFilter()).fetch(LOCATION, LAST, LOGS, PREV_ROW,
SUSPEND).build();
+ this.store = Streams.concat(root.stream(), metadata.stream(),
user.stream()).onClose(() -> {
Review Comment:
> Is your concern with if collect was called again on the same
GarbageCollectWriteAheadLogs object?
I did not realize its only called once. I was assuming the object was
reused and was wrong about that.
> Is that something we want to allow for?
No we don't need to allow for that, just need to support the way its used in
SimpleGarbageCollector
Looking at the code, there is another reason we my want to defer creating
the stream. Creating the streams in the construct will change the order in
which things are done in the GC algorithm.
Before these changes the code does the following
1. [Read
wals](https://github.com/apache/accumulo/blob/5f5cbd37a8efeaab4bc129c064680bbf4aa43be8/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java#L122-L139)
2. [Read live
tservers](https://github.com/apache/accumulo/blob/5f5cbd37a8efeaab4bc129c064680bbf4aa43be8/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java#L149)
3. [Create scanner and scan for refs to
wals](https://github.com/apache/accumulo/blob/5f5cbd37a8efeaab4bc129c064680bbf4aa43be8/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java#L154).
Because the current code sets a lambda in the constructor, the scanner is not
actually created in the constructor.
After these changes the code does the following
1. In the constructor create the scanners and potentially read some data
2. Read wals
3. Read live tservers
4. use scanner created in constructor to scan for refs to wals
So the order in which data is read is changed. I am not sure if this change
in order matters, I would need to research it. I do know the order of reading
data is very important in the GC algorithms. So if changing the order in which
data is read would need to reason about the safety of doing that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]