[ https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091084#comment-17091084 ]
Kihwal Lee commented on HDFS-15287:
-----------------------------------

[~shv]. Sorry, I guess I was too terse. Here is what happens. It is clearly a rollback image, but the active namenode still rejects it. When I disabled the check in code (changed the default to false), it works. This is branch-2.10, so it is not working as you intended. I did not check trunk.

{noformat}
2020-04-07 20:17:05,686 [TransferFsImageUpload-62] INFO namenode.TransferFsImage: Sending fileName: /xxx/current/fsimage_rollback_000000000123456789, fileSize: 591328984. Sent total: 655360 bytes. Size of last segment intended to send: 131072 bytes.
java.io.IOException: Error writing request body to server
        at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3479)
        at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3462)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:321)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:275)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
{noformat}

On the active namenode side, I see

{noformat}
2020-04-07 20:17:05,686 [qtp2000648320-936001] INFO namenode.ImageServlet: ImageServlet allowing checkpointer: hdfs/mycluster....@myrealm.com
2020-04-07 20:17:05,686 [qtp2000648320-936001] WARN conf.Configuration: No unit for dfs.namenode.checkpoint.period(43200) assuming SECONDS
{noformat}

This WARN message is an indication of the interval check kicking in.

bq. Active NameNode checks whether to accept a checkpoint from a StandbyNode in order to avoid too frequent checkpoints in case of multiple Standby checkpointers.

I understand the original intention, but that breaks existing use cases. Normal checkpointing can happen under two conditions: either the configured time has passed, or the number of transactions since the last checkpoint has exceeded the configured limit. This check rejects images produced under the latter condition. This is a legitimate use case and we have relied on it for over a decade. At minimum, please make it configurable.

> HDFS rollingupgrade prepare never finishes
> ------------------------------------------
>
>                 Key: HDFS-15287
>                 URL: https://issues.apache.org/jira/browse/HDFS-15287
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.10.0, 3.3.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>
> After HDFS-12979, the prepare step of rolling upgrade does not work. This is
> because it added an additional check for sufficient time passing since the last
> checkpoint. Since RU rollback image creation and upload can happen at any time,
> uploading of the rollback image never succeeds. For a new cluster deployed for
> testing, it might work since it never checkpointed before.
> It was found that this check is disabled for unit tests, defeating the very
> purpose of testing.
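
To make the two checkpoint triggers in the comment concrete, here is a minimal Java sketch of why an acceptance check based only on elapsed time turns away both transaction-triggered checkpoints and rollback images. It is not the Hadoop code: the class, the method names, and the `checkEnabled` switch are hypothetical, and the transaction limit of 1,000,000 is an assumed value. Only the 43200-second period comes from the log above. The relaxed check shows one way the requested configurability could look, not necessarily the fix adopted for this issue.

```java
// Hypothetical sketch; none of these names come from the Hadoop source tree.
final class CheckpointAcceptanceSketch {

    /** Standby side: checkpoint when the period has elapsed OR enough transactions have accumulated. */
    static boolean shouldCheckpoint(long secsSinceLast, long txnsSinceLast,
                                    long periodSecs, long txnLimit) {
        return secsSinceLast >= periodSecs || txnsSinceLast >= txnLimit;
    }

    /** Active side, time-only acceptance: the behavior the comment describes as too strict. */
    static boolean acceptTimeOnly(long secsSinceLast, long periodSecs) {
        return secsSinceLast >= periodSecs;
    }

    /**
     * Active side, relaxed acceptance (an assumption, not the actual patch):
     * the check can be switched off, rollback images always pass, and
     * transaction-triggered checkpoints are accepted as well.
     */
    static boolean acceptRelaxed(boolean checkEnabled, boolean isRollbackImage,
                                 long secsSinceLast, long txnsSinceLast,
                                 long periodSecs, long txnLimit) {
        if (!checkEnabled || isRollbackImage) {
            return true;
        }
        return shouldCheckpoint(secsSinceLast, txnsSinceLast, periodSecs, txnLimit);
    }

    public static void main(String[] args) {
        long periodSecs = 43200L;   // dfs.namenode.checkpoint.period seen in the log
        long txnLimit = 1_000_000L; // assumed transaction limit

        // Ten minutes after the last checkpoint, but two million transactions later.
        long secs = 600L;
        long txns = 2_000_000L;
        System.out.println("standby wants to checkpoint: "
                + shouldCheckpoint(secs, txns, periodSecs, txnLimit));
        System.out.println("time-only check accepts:     "
                + acceptTimeOnly(secs, periodSecs));
        System.out.println("relaxed check accepts:       "
                + acceptRelaxed(true, false, secs, txns, periodSecs, txnLimit));

        // A rollback image uploaded shortly after a regular checkpoint.
        System.out.println("rollback, time-only accepts: "
                + acceptTimeOnly(secs, periodSecs));
        System.out.println("rollback, relaxed accepts:   "
                + acceptRelaxed(true, true, secs, 10L, periodSecs, txnLimit));
    }
}
```

In this sketch the standby and the active node disagree whenever the transaction limit is reached before the period elapses, which is the mismatch described in the comment. A rollback image hits the same wall because its upload time is unrelated to the checkpoint period.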