ssdong edited a comment on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487


   @jsbali To add some extra insight and detail, here is what @zherenyu831 posted at the beginning:
   ```
   [20210323080718__replacecommit__COMPLETED]: size : 0
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   If we keep everything the same and let the archive logic handle everything, it fails on the empty (`size : 0`) `partitionToReplaceFileIds` of `20210323080718__replacecommit__COMPLETED` (the first item in the list above), which is a known issue.
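
   To make that `size : 0` observation concrete, here is a minimal sketch (not code from the issue; it assumes the 0.7/0.8-era public metadata APIs and uses a placeholder base path) that prints the `partitionToReplaceFileIds` size of every completed replacecommit on the active timeline:
   ```
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hudi.common.model.HoodieCommitMetadata;
   import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;

   public class ReplaceCommitSizeCheck {
     public static void main(String[] args) {
       HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
           .setConf(new Configuration())
           .setBasePath("s3://xxx/data")   // placeholder base path
           .build();

       // Completed replacecommits on the *active* timeline, i.e. the list above.
       HoodieTimeline replaced = metaClient.getActiveTimeline().getCompletedReplaceTimeline();

       replaced.getInstants().forEach(instant -> {
         try {
           byte[] details = replaced.getInstantDetails(instant).get();
           HoodieReplaceCommitMetadata metadata =
               HoodieCommitMetadata.fromBytes(details, HoodieReplaceCommitMetadata.class);
           // An empty map here is what trips the archiver on the first instant.
           System.out.println(instant + ": size : " + metadata.getPartitionToReplaceFileIds().size());
         } catch (Exception e) {
           System.out.println(instant + ": could not read commit details: " + e.getMessage());
         }
       });
     }
   }
   ```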
   
   To make the archive work, we tried to _manually_ delete the first _empty_ commit file, `20210323080718__replacecommit__COMPLETED` (the first item in the list above). The archive then succeeded, but the job failed instead with `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list above).
   
   Now, to reason through the underlying mechanism of this error: since the archive was successful, a few commit files must have been placed within the `.archive` folder. Let's say
   ```
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   ```
   have been successfully moved into `.archive`. At this moment the timeline on storage has been updated, and there are 3 remaining commit files:
   ```
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
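
   If you want to double-check that split, a rough sketch (same assumptions as above: 0.7/0.8-era APIs, placeholder base path) is to list the active timeline next to the archived one:
   ```
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hudi.common.table.HoodieTableMetaClient;

   public class TimelineSplitCheck {
     public static void main(String[] args) {
       HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
           .setConf(new Configuration())
           .setBasePath("s3://xxx/data")   // placeholder base path
           .build();

       // What is still active under .hoodie (should be the 3 remaining replacecommits).
       System.out.println("Active timeline:");
       metaClient.getActiveTimeline().filterCompletedInstants().getInstants()
           .forEach(System.out::println);

       // What the archiver has already moved out of the active timeline.
       System.out.println("Archived timeline:");
       metaClient.getArchivedTimeline().getInstants()
           .forEach(System.out::println);
     }
   }
   ```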
   
   Now, pay attention to the stack trace behind `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, which I am pasting again:
   ```
   User class threw exception: org.apache.hudi.exception.HoodieIOException: 
Could not read commit details from 
s3://xxx/data/.hoodie/20210323081449.replacecommit
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
   at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
   at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
   at 
java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
   at 
org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
   at 
org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
   at 
org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
   ```
   
   After a `close` action is triggered on `TimelineService`, which is understandable, it propagates to `HoodieTableFileSystemView.close`, and right after that we see:
   ```
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
   ```
   Now, I am not exactly sure why we need an `init` after `close` has been called on the `HoodieTableFileSystemView` (probably someone with deeper knowledge could answer that). If you look at the source code, `reset` and `init` are _initializing with a new Hoodie timeline_:
   ```
   @Override
     public final void reset() {
       try {
         writeLock.lock();
   
         addedPartitions.clear();
         resetViewState();
   
         bootstrapIndex = null;
   
         // Initialize with new Hoodie timeline.
         init(metaClient, getTimeline());
       } finally {
         writeLock.unlock();
       }
     }
   ```
   The `getTimeline()` above _doesn't_ really fetch a _new_ timeline, because the TimelineService has been closed and `public void sync()`, which swaps the old timeline for the new one, is obviously not triggered. The table view's in-memory timeline therefore remains the very _old_ timeline, i.e. the one from _before_ the archive. If it tries to read those commits from the in-memory timeline and perform the corresponding actions, it will certainly fail, because we have archived the commit files and they now live in the `.archive` folder.
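
   To illustrate the staleness (again only a sketch built on the same assumptions; in the real failure the stale timeline is the one cached inside the file system view, which we cannot poke at so directly): asking a pre-archive timeline for the details of an instant that has since been archived throws exactly the `HoodieIOException` from the stack trace, while a reloaded timeline simply no longer contains that instant.
   ```
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
   import org.apache.hudi.common.table.timeline.HoodieInstant;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;
   import org.apache.hudi.exception.HoodieIOException;

   public class StaleTimelineDemo {
     public static void main(String[] args) {
       HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
           .setConf(new Configuration())
           .setBasePath("s3://xxx/data")   // placeholder base path
           .build();

       // Timeline loaded *before* the archive runs; it still lists 20210323081449.
       HoodieActiveTimeline staleTimeline = metaClient.getActiveTimeline();

       // ... the archiver runs here and moves 20210323081449 into the .archive folder ...

       // Reading the archived instant through the stale, in-memory timeline fails
       // the same way as the stack trace above.
       HoodieInstant archivedInstant = new HoodieInstant(
           HoodieInstant.State.COMPLETED, HoodieTimeline.REPLACE_COMMIT_ACTION, "20210323081449");
       try {
         staleTimeline.getInstantDetails(archivedInstant);
       } catch (HoodieIOException e) {
         System.out.println("Stale timeline: " + e.getMessage());
       }

       // A reload (roughly what sync() would have given the view) no longer contains it.
       HoodieActiveTimeline freshTimeline = metaClient.reloadActiveTimeline();
       System.out.println("Reloaded timeline still contains it? "
           + freshTimeline.containsInstant(archivedInstant));
     }
   }
   ```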
   
   It does sound like a paradox: the exception only shows up _after_ we manually delete the empty commit file to unblock the archive logic. But shouldn't the problem have existed from the beginning, since even with a successful archiving action the in-memory timeline stays old, given how we close and re-initialize the Hudi table view?
   
   Thoughts? 😅 

