[ 
https://issues.apache.org/jira/browse/HDFS-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571701#comment-17571701
 ] 

ZanderXu edited comment on HDFS-16645 at 7/27/22 2:36 AM:
----------------------------------------------------------

[~weichiu][~smeng] Thanks for your comments.

bq. Would you give some background on how/when this issue is observed?

I found this problem when I started a new JournalNode with some data copied 
from other JournalNodes.

In addition, multiple in-progress segments can appear when a JournalNode is 
restarted.

bq.  how did we end up having multiple of them, because it was supposed to 
finalize the inprogress properly.

Yes, there should generally not be multiple in-progress segments. But it seems 
that we can't avoid them in some abnormal cases, such as when the JournalNode 
is killed unexpectedly, the machine restarts, or the JournalNode is started 
with some copied segments.

But we can do some things to find and delete them in time:
* Try to delete the in-progress segment when the JournalNode restarts
* Try to find and delete them with JournalNodeSyncer

We also need to change getEditLogManifest so that it uses the latest 
in-progress segment.
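
As a rough illustration of the getEditLogManifest idea, here is a minimal 
sketch, not the actual patch. The Segment class and the keepLatestInProgress 
method are hypothetical names, not HDFS APIs, and the sketch assumes that 
"latest" means the in-progress segment with the highest starting txid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HDFS.
final class LatestInProgressFilter {

    // Hypothetical stand-in for an edit log segment (not RemoteEditLog).
    static final class Segment {
        final long startTxId;
        final boolean inProgress;

        Segment(long startTxId, boolean inProgress) {
            this.startTxId = startTxId;
            this.inProgress = inProgress;
        }
    }

    // Keeps every finalized segment, but only one in-progress segment:
    // the one with the highest starting txid (assumed to be the "latest").
    static List<Segment> keepLatestInProgress(List<Segment> segments) {
        Segment latest = null;
        for (Segment s : segments) {
            if (s.inProgress && (latest == null || s.startTxId > latest.startTxId)) {
                latest = s;
            }
        }
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (!s.inProgress || s == latest) {
                result.add(s);
            }
        }
        return result;
    }
}
```

With the two segments from the stack trace in this issue ([6-?] and [1-?], 
both in-progress), this filter would keep only the one starting at txid 6, so 
the overlap check would no longer see two in-progress segments.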

[~weichiu][~smeng] If you have any other good ideas, please let me know. I 
will implement them and push this forward.

Or maybe we can push this issue forward first, then create a new issue to 
delete invalid in-progress segments.



> Multi inProgress segments caused "Invalid log manifest"
> -------------------------------------------------------
>
>                 Key: HDFS-16645
>                 URL: https://issues.apache.org/jira/browse/HDFS-16645
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.IllegalStateException: Invalid log manifest (log [1-? (in-progress)] overlaps [6-? (in-progress)])[[6-? (in-progress)], [1-? (in-progress)]] CommittedTxId: 0
>         at org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest.checkState(RemoteEditLogManifest.java:62)
>         at org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest.<init>(RemoteEditLogManifest.java:46)
>         at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:740)
> {code}


