[ https://issues.apache.org/jira/browse/HDFS-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571701#comment-17571701 ]
ZanderXu edited comment on HDFS-16645 at 7/27/22 2:36 AM:
----------------------------------------------------------

[~weichiu][~smeng] Thanks for your comments.

bq. Would you give some background on how/when this issue is observed?

I found this problem when I started a new JournalNode with some data copied from other JournalNodes. In addition, multiple in-progress segments will appear when a JournalNode is restarted.

bq. how did we end up having multiple of them, because it was supposed to finalize the inprogress properly.

Yes, there should generally not be multiple in-progress segments. But it seems that we can't avoid multiple in-progress segments in some abnormal cases, such as when a JournalNode is killed unexpectedly, the machine restarts, the JournalNode is started with segments copied from elsewhere, and so on.

But we can do some things to find and delete them in time:
* Try to delete the in-progress segment when the JournalNode restarts
* Try to find and delete them by JournalNodeSyncer

But we also need to do something in getEditLogManifest to use the latest in-progress segment.

[~weichiu][~smeng] If you have any other good ideas, please let me know. I will code them and push this forward. Or maybe we can push this issue forward first, then create a new issue to delete invalid in-progress segments.

was (Author: xuzq_zander):
[~weichiu][~smeng] Thanks for your comments.

> Would you give some background on how/when this issue is observed?

I found this problem when I started a new JournalNode with some data copied from other JournalNodes. In addition, multiple in-progress segments will appear when a JournalNode is restarted.

> how did we end up having multiple of them, because it was supposed to
> finalize the inprogress properly.

Yes, there should generally not be multiple in-progress segments. But it seems that we can't avoid multiple in-progress segments in some abnormal cases, such as when a JournalNode is killed unexpectedly, the machine restarts, the JournalNode is started with segments copied from elsewhere, and so on.
But we can do some things to find and delete them in time:
* Try to delete the in-progress segment when the JournalNode restarts
* Try to find and delete them by JournalNodeSyncer

But we also need to do something in getEditLogManifest to use the latest in-progress segment.

[~weichiu][~smeng] If you have any other good ideas, please let me know. I will code them and push this forward.

> Multi inProgress segments caused "Invalid log manifest"
> -------------------------------------------------------
>
>                 Key: HDFS-16645
>                 URL: https://issues.apache.org/jira/browse/HDFS-16645
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.IllegalStateException: Invalid log manifest (log [1-? (in-progress)] overlaps [6-? (in-progress)])[[6-? (in-progress)], [1-? (in-progress)]] CommittedTxId: 0
>     at org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest.checkState(RemoteEditLogManifest.java:62)
>     at org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest.<init>(RemoteEditLogManifest.java:46)
>     at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:740)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
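
Illustrative note on the comment's suggestion that getEditLogManifest should use the latest in-progress segment: the stack trace in the issue description shows RemoteEditLogManifest rejecting a manifest that holds two in-progress segments. The sketch below is not the HDFS-16645 patch and does not use Hadoop classes. `Segment` and `ManifestFilter` are hypothetical stand-ins, and it assumes that "latest" means the in-progress segment with the highest start transaction ID.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an edit log segment; not Hadoop's RemoteEditLog. */
final class Segment {
    final long startTxId;
    final long endTxId;       // ignored while the segment is in progress
    final boolean inProgress;

    Segment(long startTxId, long endTxId, boolean inProgress) {
        this.startTxId = startTxId;
        this.endTxId = endTxId;
        this.inProgress = inProgress;
    }
}

/** Hypothetical helper that filters segments before a manifest is built. */
final class ManifestFilter {
    /**
     * Keeps every finalized segment and at most one in-progress segment.
     * Assumption: the in-progress segment with the highest start txid
     * is the one worth keeping; older ones are treated as stale.
     */
    static List<Segment> keepLatestInProgress(List<Segment> segments) {
        List<Segment> result = new ArrayList<>();
        Segment latest = null;
        for (Segment s : segments) {
            if (!s.inProgress) {
                result.add(s);
            } else if (latest == null || s.startTxId > latest.startTxId) {
                latest = s;
            }
        }
        if (latest != null) {
            result.add(latest);
        }
        // Return the segments ordered by start txid.
        result.sort(Comparator.comparingLong((Segment s) -> s.startTxId));
        return result;
    }
}
```

Given the two segments from the stack trace (in-progress segments starting at txid 1 and txid 6), this sketch would keep only the one starting at txid 6. Whether the discarded segment should also be deleted from disk is a separate question, as the comment notes.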