[ 
https://issues.apache.org/jira/browse/HDFS-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HDFS-11760:
-----------------------------
    Description: 
We hit the following case at my company: while two NameNodes were trying to 
transition state, a checkpoint was running on the standby NameNode, and it 
checked for cancellation only once every 4096 saved inodes. In our case the 
time between checks was too long, and the state transition eventually failed. 
After we changed the interval to 50, the state transition succeeded.

The *CHECK_CANCEL_INTERVAL* in *FSImageFormat.Saver* was introduced in 
HDFS-7097 and set to 4096; previously the value was 50. There may be reasons 
for choosing 4096, but I am not aware of them, so I suggest making the 
interval configurable.

  was:
We hit the following case at my company: while two NameNodes were trying to 
transition state, a checkpoint was running on the standby NameNode, and it 
checked for cancellation only once every 4096 saved inodes. In our case the 
time between checks was too long, and the state transition eventually failed. 
After we changed the interval to 50, the state transition succeeded.

The *CHECK_CANCEL_INTERVAL* in _FSImageFormat.Saver_ was introduced in 
HDFS-7097 and set to 4096; previously the value was 50. There may be reasons 
for choosing 4096, but I am not aware of them, so I suggest making the 
interval configurable.


> Make CHECK_CANCEL_INTERVAL configurable
> ---------------------------------------
>
>                 Key: HDFS-11760
>                 URL: https://issues.apache.org/jira/browse/HDFS-11760
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>            Reporter: Reid Chan
>            Priority: Minor
>
> We hit the following case at my company: while two NameNodes were trying to 
> transition state, a checkpoint was running on the standby NameNode, and it 
> checked for cancellation only once every 4096 saved inodes. In our case the 
> time between checks was too long, and the state transition eventually failed. 
> After we changed the interval to 50, the state transition succeeded.
> The *CHECK_CANCEL_INTERVAL* in *FSImageFormat.Saver* was introduced in 
> HDFS-7097 and set to 4096; previously the value was 50. There may be reasons 
> for choosing 4096, but I am not aware of them, so I suggest making the 
> interval configurable.
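
The behaviour described above can be illustrated with a minimal Java sketch. 
It is not the actual FSImageFormat.Saver code: the class CancellableSaver, its 
methods, and SaveCancelledException are hypothetical names invented for this 
example. It only shows how a configurable check interval bounds the number of 
inodes written before a cancellation request is noticed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical stand-in for a saver loop that polls for cancellation every
 * N inodes. This is an illustration only, not the HDFS implementation.
 */
class CancellableSaver {

    /** Thrown when a pending cancellation request is observed. */
    static class SaveCancelledException extends RuntimeException {
        final int savedCount;

        SaveCancelledException(int savedCount) {
            super("save cancelled after " + savedCount + " inodes");
            this.savedCount = savedCount;
        }
    }

    // Assumption: the interval is passed in instead of being a constant.
    private final int checkCancelInterval;
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private int cancelChecks = 0;

    CancellableSaver(int checkCancelInterval) {
        if (checkCancelInterval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.checkCancelInterval = checkCancelInterval;
    }

    /** Requests cancellation; safe to call from another thread. */
    void cancel() {
        cancelled.set(true);
    }

    /** Number of times the loop polled the cancellation flag. */
    int getCancelChecks() {
        return cancelChecks;
    }

    /**
     * Pretends to save totalInodes inodes and returns how many were saved.
     * The cancellation flag is polled once every checkCancelInterval inodes.
     */
    int save(int totalInodes) {
        int saved = 0;
        for (int i = 0; i < totalInodes; i++) {
            // The real work of serializing one inode would happen here.
            saved++;
            if (saved % checkCancelInterval == 0) {
                cancelChecks++;
                if (cancelled.get()) {
                    throw new SaveCancelledException(saved);
                }
            }
        }
        return saved;
    }
}
```

With an interval of 4096, a cancellation requested just after a check is 
noticed only after up to 4096 more inodes have been written; with 50, after at 
most 50. In Hadoop the interval could presumably be read with 
Configuration.getInt(key, defaultValue), using whatever property name the 
project chooses; no such property is named in this issue.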


