[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration
[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219664#comment-15219664 ]

Duo Zhang commented on HDFS-8782:
---------------------------------

+1

> Upgrade to block ID-based DN storage layout delays DN registration
> ------------------------------------------------------------------
>
>                 Key: HDFS-8782
>                 URL: https://issues.apache.org/jira/browse/HDFS-8782
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Haohui Mai
>            Priority: Critical
>
> We have seen multiple incidents at production sites where DNs take a long
> time to register with the NN when upgrading to a post-2.6 release.
> Further investigation shows that the DN is blocked while upgrading to the
> storage layout introduced in HDFS-6482. The new storage layout requires
> making up to 64k directories in the underlying file system. Unfortunately,
> the current implementation calls {{mkdirs()}} sequentially and upgrades each
> volume in sequential order.
> As a result, upgrading a DN with a lot of disks, or with blocks that have
> random block IDs, takes a long time (usually hours), and the DN won't
> register with the NN until it finishes upgrading all storage directories.
> The excessive delays confuse operations and break the assumption of rolling
> upgrades.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
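To make the "up to 64k directories" figure concrete, here is a minimal sketch of how a block ID could map to a two-level directory. It assumes 256 x 256 subdirectories selected by two bytes of the block ID, which matches the 64k figure in the description. The class name {{BlockDirSketch}}, the {{subdir}} prefix and the exact bit positions are illustrative assumptions, not code quoted from HDFS.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

final class BlockDirSketch {
    // Assumed directory-name prefix, for illustration only.
    static final String PREFIX = "subdir";

    /** Maps a block ID to root/subdirX/subdirY using two bytes of the ID (assumed scheme). */
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff); // first level: 256 choices
        int d2 = (int) ((blockId >> 8) & 0xff);  // second level: 256 choices
        return new File(root, PREFIX + d1 + File.separator + PREFIX + d2);
    }

    /** Counts the distinct directories touched by `count` IDs spaced `step` apart. */
    static int distinctDirs(long firstId, long count, long step) {
        Set<File> dirs = new HashSet<>();
        File root = new File("finalized");
        for (long i = 0; i < count; i++) {
            dirs.add(idToBlockDir(root, firstId + i * step));
        }
        return dirs.size();
    }

    public static void main(String[] args) {
        // Consecutive IDs share a directory; widely spaced IDs each need their own.
        System.out.println(distinctDirs(0, 256, 1));
        System.out.println(distinctDirs(0, 65536, 256));
    }
}
```

Under this assumed mapping, consecutive block IDs share a directory, while scattered IDs can each need a new one. That is consistent with the report that DNs holding blocks with random IDs are the slowest to upgrade.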
[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration
[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219536#comment-15219536 ]

Vinayakumar B commented on HDFS-8782:
-------------------------------------

Since HDFS-8578 is resolved now, can we close this as a duplicate?
[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration
[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989160#comment-14989160 ]

Vinayakumar B commented on HDFS-8782:
-------------------------------------

HDFS-8578 is also related to upgrading in parallel.
[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration
[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989130#comment-14989130 ]

Duo Zhang commented on HDFS-8782:
---------------------------------

I tried to make {{DataStorage.addStorageLocations}} run in parallel, but it turned out to be difficult. Some properties in {{DataStorage}} (inherited from {{StorageInfo}}), such as {{layoutVersion}}, are updated while loading each {{StorageDirectory}}, so changing the code from sequential to parallel may have side effects even if I use locks everywhere to protect these properties.

I do not see why we need a {{layoutVersion}} in {{DataStorage}}. As far as I know, {{DataStorage}} is only a container of {{StorageDirectory}}, or of {{BlockPoolSliceStorage}} if federation is enabled. So what does the {{layoutVersion}} in {{DataStorage}} mean? Is there a historical reason for keeping it?

Thanks.
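One generic way around the shared-property problem described in this comment is to have each parallel task return its own result and to update the shared container from a single thread afterwards. The sketch below illustrates that pattern only. {{MergeAfterLoadSketch}}, {{LoadResult}}, {{DirLoader}} and {{Container}} are hypothetical names, and this is not necessarily how HDFS-8578 resolved the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MergeAfterLoadSketch {
    /** Hypothetical per-directory result; workers return it instead of mutating shared state. */
    static final class LoadResult {
        final String dir;
        final int layoutVersion;
        LoadResult(String dir, int layoutVersion) {
            this.dir = dir;
            this.layoutVersion = layoutVersion;
        }
    }

    /** Hypothetical stand-in for loading (and possibly upgrading) one storage directory. */
    interface DirLoader {
        LoadResult load(String dir) throws Exception;
    }

    /** Hypothetical container whose fields are written by the calling thread only. */
    static final class Container {
        int layoutVersion;
        final List<String> loadedDirs = new ArrayList<>();
    }

    static Container loadAll(List<String> dirs, DirLoader loader, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            List<Future<LoadResult>> futures = new ArrayList<>();
            for (String d : dirs) {
                Callable<LoadResult> task = () -> loader.load(d);
                futures.add(pool.submit(task));
            }
            // Merge in submission order on this thread, so the outcome matches
            // what a sequential loop over `dirs` would have produced.
            Container c = new Container();
            for (Future<LoadResult> f : futures) {
                LoadResult r = f.get();
                c.loadedDirs.add(r.dir);
                c.layoutVersion = r.layoutVersion;
            }
            return c;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the workers never touch the container, no locks are needed, and the merge order stays deterministic regardless of which task finishes first.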
[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration
[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740128#comment-14740128 ]

Duo Zhang commented on HDFS-8782:
---------------------------------

I think we could at least upgrade the volumes in parallel? I tried upgrading from 2.5.0 to 2.7.1. It took more than 20 minutes on a datanode with 11 disks of 3T each... If the volumes were upgraded in parallel, I think the halt time could drop to about 2 minutes?

Thanks.
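The per-volume parallelism suggested in this comment can be sketched with one task per volume on a thread pool. If the 20 minutes were spread evenly over the 11 disks and the disks did not contend with each other, each volume would take about 20 / 11 ≈ 1.8 minutes, which is in line with the 2-minute estimate. {{ParallelVolumeUpgradeSketch}} and {{VolumeUpgrade}} are hypothetical names, and the real per-volume upgrade work is left as a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelVolumeUpgradeSketch {
    /** Hypothetical stand-in for the layout upgrade of a single volume. */
    interface VolumeUpgrade {
        void upgrade(String volume) throws Exception;
    }

    /** Upgrades all volumes concurrently and returns the volumes whose upgrade failed. */
    static List<String> upgradeAll(List<String> volumes, VolumeUpgrade task) {
        List<String> failed = new ArrayList<>();
        if (volumes.isEmpty()) {
            return failed;
        }
        // One thread per volume: each volume is assumed to sit on its own disk.
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (String v : volumes) {
                Callable<Void> c = () -> {
                    task.upgrade(v);
                    return null;
                };
                futures.add(pool.submit(c));
            }
            // Wait for every volume; total wall time is roughly that of the slowest one.
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get();
                } catch (ExecutionException e) {
                    failed.add(volumes.get(i));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failed.add(volumes.get(i));
                }
            }
        } finally {
            pool.shutdown();
        }
        return failed;
    }
}
```

The caller would still wait for all volumes before registering with the NN, so this shortens the delay rather than removing it.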