cshannon commented on PR #3875:
URL: https://github.com/apache/accumulo/pull/3875#issuecomment-1773017530
@keith-turner - I reviewed the PR in detail and I like it a lot so far with
the strategy change. The strategy to only fence/delete files and not tablets
and then to just merge away empty tablets definitely simplifies things quite a
bit. The switch to use FATE definitely made it a lot easier to do after TGW
since you could just append another merge FATE step. The conditional mutation
changes I need to look a bit more at but look good to me so far but I am
relatively new to looking at conditional mutations. It will be pretty nice that
we will be able to concurrently merge/delete at the same time on different
ranges in the table.
**In regards to the test failure:**
I took a look closely at the test and the changes here will cause the
resulting tablets after deletion and also the fencing in some cases to be
different, but still correct rows are visible. This is because of the change is
strategy with keeping empty tablets and merging. It also leads to a nice side
effect of having a bit cleaner metadata as the end result. (i explain more
below)
The part of the test that is failing is the part that verifies what tablets
still exist, it's not checking split points there. With your changes in this PR
the resulting tablets are different (as well as fenced files).
For example, the first test in DeleteRowsIT under testManyRows() the range
used is `(f, h]` so it deletes rows after f and including row h which delete
two entire tablets. Under the old deletion strategy this causes the g and h
tablets to be deleted and a new tablet is created with f being stretched to i.
Because of the stretching we end up with a fenced file that includes (h,i] as
that range was not part of the delete range.
After delete we end up with the metadata:
```
Extent: 1;f;e; File Name: F0000003.rf; Range: (-inf,+inf); Entries: 9, Size:
24
Extent: 1;i;f; File Name: F0000003.rf; Range: [h%00; : []
9223372036854775807 false,i%00; : [] 9223372036854775807 false); Entries: 9,
Size: 25
Extent: 1;j;i; File Name: F0000003.rf; Range: (-inf,+inf); Entries: 9, Size:
24
```
Under the new strategy in this PR we don't delete tablets but instead merge
empty tablets. After deletion processes we end up with two tablets that still
exist but have zero files, (f,g] and (g,h].
When the merge operation runs it removes only the "g" empty tablet but keeps
the "h" tablet even though both of those are empty tablets. We no longer need
to fence (h,i] because that tablet still exists as we didn't create a new
tablet that stretched from (f,i] so no fencing is required. The end result is
we are just left with the following metadata.
```
Extent: 1;f;e; File Name: F0000036.rf; Range: (-inf,+inf); Entries: 10,
Size: 29
Extent: 1;i;h; File Name: F0000036.rf; Range: (-inf,+inf); Entries: 10,
Size: 29
Extent: 1;j;i; File Name: F0000036.rf; Range: (-inf,+inf); Entries: 10,
Size: 29
```
Running the test and the assertion failure it prints out:
```
Expected :abcdefijklmnopqrstuvwxyz
Actual :abcdefhijklmnopqrstuvwxyz
```
Obviously the failure is because the "h" split/tablet still exists and only
the "g" tablet was removed. This is because the merge operation "includes" the
last row specified when you run it. So in this case the merge Fate Op that is
run after Delete includes the "h" as the last row so it merges to the tablet
that includes that row so in this case "h" tablet is kept even though it's
empty. I have to think about it more but this is probably just an edge case and
may not be a big deal. In order to fix it I think you would need to specify
that the merge range goes to the row after the delete range. As a quick test I
tried to just do the following in DeleteRows Fate operation (I had to add a
couple getters):
```java
@Override
public Repo<Manager> call(long tid, Manager manager) throws Exception {
MergeInfo mergeInfo = data.getMergeInfo();
// delete or fence files within the deletion range
deleteTabletFiles(manager, tid, mergeInfo);
// merge away empty tablets in the deletion range
// Merge to the following row
TableRangeOp d = new TableRangeOp(MergeInfo.Operation.DELETE,
data.getNamespaceId(), data.getTableId(),
mergeInfo.getExtent().prevEndRow(), new
Key(mergeInfo.getExtent().endRow()).followingKey(PartialKey.ROW).getRow());
return new MergeTablets(d.getData());
}
```
This however failed because the "i" tablet didn't have the opid, so would
need to add that if we wanted to do it:
```
2023-10-20T11:59:15,960 [merge.MergeTablets] DEBUG: FATE[25e4ddf74c11462b]
Merging metadata for 1;h^@;f
2023-10-20T11:59:15,967 [fate.Fate] WARN : Failed to execute Repo
FATE[25e4ddf74c11462b]
java.lang.IllegalStateException: FATE[25e4ddf74c11462b] merging tablet 1;i;h
had unexpected opid null
at
com.google.common.base.Preconditions.checkState(Preconditions.java:854)
~[guava-32.0.1-jre.jar:?]
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]