[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-10-28 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1294684900

   @xkrogen Sir, could you please give it a final review?
   
   @ashutoshcipher @tomscut @ayushtkn @Hexiaoqiao Sir, could you please 
take a second look at it when you are available?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-24 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1226679229

   ```
   if (curSegment != null) {
     LOG.warn("Client is requesting a new log segment " + txid +
         " though we are already writing " + curSegment + ". " +
         "Aborting the current segment in order to begin the new one." +
         " ; journal id: " + journalId);
     // The writer may have lost a connection to us and is now
     // re-connecting after the connection came back.
     // We should abort our own old segment.
     abortCurSegment();
   }
   ```
   
   `abortCurSegment()` only aborts the current segment; it does not finalize 
the current in-progress segment, so it may leave two in-progress segment files 
on disk.
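
To make the difference concrete, here is a minimal toy model of a journal directory. It is a sketch under assumptions, not Hadoop code: `ToyJournal` and its methods are hypothetical, and the file names are simplified versions of the `edits_inprogress_<txid>` and `edits_<start>-<end>` naming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy in-memory model of one journal directory; not Hadoop code. */
class ToyJournal {
  private final TreeSet<String> files = new TreeSet<>();
  private Long openStartTxId = null; // start txid of the segment being written

  /** Opens a new segment, aborting (not finalizing) any segment still open. */
  void startSegment(long startTxId) {
    if (openStartTxId != null) {
      abortCurSegment();
    }
    files.add("edits_inprogress_" + startTxId);
    openStartTxId = startTxId;
  }

  /** Stops writing the open segment; its in-progress file stays as it is. */
  void abortCurSegment() {
    openStartTxId = null;
  }

  /** Renames an in-progress file to its finalized name. */
  void finalizeSegment(long startTxId, long endTxId) {
    if (!files.remove("edits_inprogress_" + startTxId)) {
      throw new IllegalStateException("no in-progress segment at " + startTxId);
    }
    files.add("edits_" + startTxId + "-" + endTxId);
    if (openStartTxId != null && openStartTxId == startTxId) {
      openStartTxId = null;
    }
  }

  /** Lists the files that still carry the in-progress name. */
  List<String> inProgressFiles() {
    List<String> result = new ArrayList<>();
    for (String f : files) {
      if (f.startsWith("edits_inprogress_")) {
        result.add(f);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    ToyJournal aborted = new ToyJournal();
    aborted.startSegment(1);
    aborted.startSegment(11); // the old segment is only aborted
    System.out.println(aborted.inProgressFiles());

    ToyJournal finalized = new ToyJournal();
    finalized.startSegment(1);
    finalized.finalizeSegment(1, 10); // the old segment is finalized first
    finalized.startSegment(11);
    System.out.println(finalized.inProgressFiles());
  }
}
```

In this model the first journal ends up with two in-progress files and the second with one, which is the situation described above.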
   
   > So are we agreed that the best way forward is to modify 
recoverUnclosedStreams() to throw exception on failure, then we can use 
inProgressOk = false to solve this problem as you originally proposed?
   
   Yes, I totally agree with this, and I will update the patch accordingly. 
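
The agreed direction can be sketched roughly as follows. This is only an illustration under assumptions: `ToyFailover`, `ToyJournals`, `InMemoryJournals`, and `catchUpForFailover` are hypothetical names, not the Hadoop `JournalSet` or `FSEditLog` API, and the segment names are simplified.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ToyFailover {
  /** Hypothetical stand-in for the journals; not the Hadoop API. */
  interface ToyJournals {
    /** Finalizes any in-progress segment, throwing if that fails. */
    void recoverUnclosedStreams() throws IOException;

    List<String> selectInputStreams(long fromTxId, boolean inProgressOk)
        throws IOException;
  }

  /** Recover first; only then read finalized segments (inProgressOk = false). */
  static List<String> catchUpForFailover(ToyJournals journals,
      long lastAppliedTxId) throws IOException {
    journals.recoverUnclosedStreams(); // a failure aborts the transition
    return journals.selectInputStreams(lastAppliedTxId + 1, false);
  }

  /** In-memory journal with one finalized and one unclosed segment. */
  static final class InMemoryJournals implements ToyJournals {
    private final boolean recoveryFails;
    private boolean lastSegmentFinalized = false;

    InMemoryJournals(boolean recoveryFails) {
      this.recoveryFails = recoveryFails;
    }

    @Override
    public void recoverUnclosedStreams() throws IOException {
      if (recoveryFails) {
        throw new IOException("could not finalize in-progress segment");
      }
      lastSegmentFinalized = true;
    }

    @Override
    public List<String> selectInputStreams(long fromTxId, boolean inProgressOk) {
      // fromTxId is ignored in this toy; every known segment is returned.
      List<String> out = new ArrayList<>();
      out.add("edits_1-10");
      if (lastSegmentFinalized) {
        out.add("edits_11-15");
      } else if (inProgressOk) {
        out.add("edits_inprogress_11");
      }
      return out;
    }
  }
}
```

With this shape, a failed recovery surfaces as an exception before any edits are selected, so the caller never reads a segment that might still be in progress.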





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-22 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1222443598

   @abhishekkarigar  Thanks for your attention to this issue.  
   
   @xkrogen and I will solve this problem as soon as possible.
   
   @xkrogen Sir, please review the latest patch. Thanks





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-19 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1221235271

   @xkrogen Thanks.
   > I think you might have also understood what I was saying in my last comment
   
   Yes, I got it. 
   
   > I am thinking we can add a new parameter to 
LogsPurgeable#selectInputStreams() like preferBulkReads.
   
   This is a good idea, I will fix this patch like this.
   
   > In recoverUnclosedStreams, if the finalization fails, it will just ignore 
it and assume that it will be handled later (by Journal#startLogSegment(), 
which will automatically close an old stream when you try to open a new one).
   
   > In recoverUnclosedStreams, if the finalization fails, it will just ignore 
it
   
   Sorry, I didn't notice this. But I think it's crazy. 
   
   > assume that it will be handled later (by Journal#startLogSegment(), which 
will automatically close an old stream when you try to open a new one).
   
   Sorry, I only found this comment; I could not find the code that 
finalizes the previous in-progress segment. Could you share the related code? 
Thanks. 
   
   
   > If there is something preventing the new active from communicating with 
the JNs, or something preventing the JNs from finalizing the old segment, then 
the NN will eventually fail to become active regardless.
   
   Yes, I agree. The Standby should crash or fail to become Active if it cannot 
finalize the old segment. How about fixing that case in a new PR? 
   





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-19 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1220527843

   @xkrogen Master, I have updated this patch to enable or disable 
`inProgressTailing` in `QuorumJournalManager`; please help me review it. If you 
have any better ideas, I'd be happy to modify the patch accordingly.





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-19 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1220377078

   > If the active crashed, then the segment won't be finalized, right?
   
   If the active crashed, the standby recovers unclosed streams via 
`recoverUnclosedStreams` while it is starting active services. So before 
`catchupDuringFailover`, the last segment should always be closed.





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-17 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1218994368

   The same approach also appears in HDFS-14806.  
[Here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java#L113)
   
   





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-17 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1218979907

   @xkrogen Thanks for your review and comment. Sorry for the late reply.
   
   Before `catchupDuringFailover`, whether the active crashed or successfully 
transitioned to standby, the last segment on a majority of JournalNodes should be 
finalized. My idea is for `catchupDuringFailover` to skip 
`selectRpcInputStreams` and fall back to `selectStreamingInputStreams` with 
`getEditLogManifest`.
   
   Unfortunately, the only way for `catchupDuringFailover` to skip 
`selectRpcInputStreams` is to disable in-progress edits, because 
`inProgressTailingEnabled` in `QuorumJournalManager` cannot be changed.
   The key code is as follows:
   ```
   @Override
   public void selectInputStreams(Collection<EditLogInputStream> streams,
       long fromTxnId, boolean inProgressOk,
       boolean onlyDurableTxns) throws IOException {

     // Here, catchupDuringFailover should skip this if branch and fall
     // back to selectStreamingInputStreams.
     if (inProgressOk && inProgressTailingEnabled) {
       LOG.debug("Tailing edits starting from txn ID {} via RPC mechanism",
           fromTxnId);
       try {
         Collection<EditLogInputStream> rpcStreams = new ArrayList<>();
         selectRpcInputStreams(rpcStreams, fromTxnId, onlyDurableTxns);
         streams.addAll(rpcStreams);
         return;
       } catch (IOException ioe) {
         LOG.warn("Encountered exception while tailing edits >= " + fromTxnId +
             " via RPC; falling back to streaming.", ioe);
       }
     }
     selectStreamingInputStreams(streams, fromTxnId, inProgressOk,
         onlyDurableTxns);
   }
   ```
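
The branch selection in that method can be modelled in isolation. The following is a simplified sketch, not the real `QuorumJournalManager`: `ToyQjm`, `Path`, and the `rpcFails` switch are hypothetical, and the model only records which path would be taken. It assumes, as the condition in the snippet suggests, that the caller's only lever is the `inProgressOk` argument.

```java
import java.io.IOException;

/** Toy model of the RPC-versus-streaming choice; not Hadoop code. */
class ToyQjm {
  enum Path { RPC, STREAMING }

  private final boolean inProgressTailingEnabled; // fixed at construction
  private final boolean rpcFails;                 // simulates an RPC error

  ToyQjm(boolean inProgressTailingEnabled, boolean rpcFails) {
    this.inProgressTailingEnabled = inProgressTailingEnabled;
    this.rpcFails = rpcFails;
  }

  /** Returns the path that serves the request. */
  Path selectInputStreams(boolean inProgressOk) {
    if (inProgressOk && inProgressTailingEnabled) {
      try {
        selectRpc();
        return Path.RPC;
      } catch (IOException ioe) {
        // Fall through to streaming, as in the snippet above.
      }
    }
    return Path.STREAMING;
  }

  private void selectRpc() throws IOException {
    if (rpcFails) {
      throw new IOException("simulated RPC failure");
    }
  }

  public static void main(String[] args) {
    ToyQjm qjm = new ToyQjm(true, false);
    System.out.println(qjm.selectInputStreams(true));  // tailing request
    System.out.println(qjm.selectInputStreams(false)); // failover catch-up
  }
}
```

In this model, a caller that passes `inProgressOk = false` always ends up on the streaming path, even when in-progress tailing is enabled.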





[GitHub] [hadoop] ZanderXu commented on pull request #4744: HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-15 Thread GitBox


ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1215017852

   @ferhui @xkrogen Master, this PR uses `getEditLogManifest()` to fix this 
problem. Please help me review it, thanks.

