cshannon commented on PR #3875:
URL: https://github.com/apache/accumulo/pull/3875#issuecomment-1773017530

   @keith-turner - I reviewed the PR in detail and I like it a lot so far with 
the strategy change. The strategy to only fence/delete files and not tablets 
and then to just merge away empty tablets definitely simplifies things quite a 
bit. The switch to use FATE definitely made it a lot easier to do after TGW 
since you could just append another merge FATE step. The conditional mutation 
changes I need to look a bit more at but look good to me so far but I am 
relatively new to looking at conditional mutations. It will be pretty nice that 
we will be able to concurrently merge/delete at the same time on different 
ranges in the table.
   
   **In regards to the test failure:**
   
   I took a look closely at the test and the changes here will cause the 
resulting tablets after deletion and also the fencing in some cases to be 
different, but still correct rows are visible. This is because of the change is 
strategy with keeping empty tablets and merging. It also leads to a nice side 
effect of having a bit cleaner metadata as the end result. (i explain more 
below)
   
   The part of the test that is failing is the part that verifies what tablets 
still exist, it's not checking split points there. With your changes in this PR 
the resulting tablets are different (as well as fenced files).
   
   For example, the first test in DeleteRowsIT under testManyRows() the range 
used is `(f, h]` so it deletes rows after f and including row h which delete 
two entire tablets. Under the old deletion strategy this causes the g and h 
tablets to be deleted and a new tablet is created with f being stretched to i. 
Because of the stretching we end up with a fenced file that includes (h,i] as 
that range was not part of the delete range.
   
   After delete we end up with the metadata:
   
   ```
   Extent: 1;f;e; File Name: F0000003.rf; Range: (-inf,+inf); Entries: 9, Size: 
24
   Extent: 1;i;f; File Name: F0000003.rf; Range: [h%00; : [] 
9223372036854775807 false,i%00; : [] 9223372036854775807 false); Entries: 9, 
Size: 25
   Extent: 1;j;i; File Name: F0000003.rf; Range: (-inf,+inf); Entries: 9, Size: 
24
   ```
   
   Under the new strategy in this PR we don't delete tablets but instead merge 
empty tablets. After deletion processes we end up with two tablets that still 
exist but have zero files, (f,g] and (g,h].
   
   When the merge operation runs it removes only the "g" empty tablet but keeps 
the "h" tablet even though both of those are empty tablets. We no longer need 
to fence (h,i] because that tablet still exists as we didn't create a new 
tablet that stretched from (f,i] so no fencing is required. The end result is 
we are just left with the following metadata.
   
   ```
   Extent: 1;f;e; File Name: F0000036.rf; Range: (-inf,+inf); Entries: 10, 
Size: 29
   Extent: 1;i;h; File Name: F0000036.rf; Range: (-inf,+inf); Entries: 10, 
Size: 29
   Extent: 1;j;i; File Name: F0000036.rf; Range: (-inf,+inf); Entries: 10, 
Size: 29
   ```
   
   Running the test and the assertion failure it prints out:
   
   ```
   Expected :abcdefijklmnopqrstuvwxyz
   Actual   :abcdefhijklmnopqrstuvwxyz
   ```
   Obviously the failure is because the "h" split/tablet still exists and only 
the "g" tablet was removed. This is because the merge operation "includes" the 
last row specified when you run it. So in this case the merge Fate Op that is 
run after Delete includes the "h" as the last row so it merges to the tablet 
that includes that row so in this case "h" tablet is kept even though it's 
empty. I have to think about it more but this is probably just an edge case and 
may not be a big deal. In order to fix it I think you would need to specify 
that the merge range goes to the row after the delete range.  As a quick test I 
tried to just do the following in DeleteRows Fate operation (I had to add a 
couple getters):
   
   ```java
     @Override
     public Repo<Manager> call(long tid, Manager manager) throws Exception {
       MergeInfo mergeInfo = data.getMergeInfo();
   
       // delete or fence files within the deletion range
       deleteTabletFiles(manager, tid, mergeInfo);
   
       // merge away empty tablets in the deletion range
      // Merge to the following row
       TableRangeOp d = new TableRangeOp(MergeInfo.Operation.DELETE, 
data.getNamespaceId(), data.getTableId(),
               mergeInfo.getExtent().prevEndRow(), new 
Key(mergeInfo.getExtent().endRow()).followingKey(PartialKey.ROW).getRow());
       return new MergeTablets(d.getData());
     }
   ```
   
   This however failed because the "i" tablet didn't have the opid, so would 
need to add that if we wanted to do it:
   
   ```
   2023-10-20T11:59:15,960 [merge.MergeTablets] DEBUG: FATE[25e4ddf74c11462b] 
Merging metadata for 1;h^@;f
   2023-10-20T11:59:15,967 [fate.Fate] WARN : Failed to execute Repo 
FATE[25e4ddf74c11462b]
   java.lang.IllegalStateException: FATE[25e4ddf74c11462b] merging tablet 1;i;h 
had unexpected opid null
           at 
com.google.common.base.Preconditions.checkState(Preconditions.java:854) 
~[guava-32.0.1-jre.jar:?]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to