[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2016-03-31 Thread Duo Zhang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219664#comment-15219664 ]

Duo Zhang commented on HDFS-8782:
-

+1

> Upgrade to block ID-based DN storage layout delays DN registration
> --
>
> Key: HDFS-8782
> URL: https://issues.apache.org/jira/browse/HDFS-8782
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Haohui Mai
> Priority: Critical
>
> We have seen multiple incidents at production sites where DNs took a long time
> to register with the NN after upgrading to a post-2.6 release.
> Further investigation shows that the DN is blocked while upgrading to the storage
> layout introduced in HDFS-6482. The new storage layout requires creating up to
> 64k directories in the underlying file system. Unfortunately, the current
> implementation calls {{mkdirs()}} sequentially and upgrades the volumes one
> at a time.
> As a result, upgrading a DN with many disks, or with blocks that have
> random block IDs, takes a long time (usually hours), and the DN does not
> register with the NN until it has finished upgrading all of its storage
> directories. The excessive delays confuse operations staff and break the
> assumptions behind rolling upgrades.
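The "up to 64k directories" figure follows from the shape of the block ID-based layout. The Java sketch below assumes, as we understand HDFS-6482, a two-level tree of 256 x 256 {{subdirN}} directories, with bits 16 to 23 of the block ID choosing the first level and bits 8 to 15 the second. The class and method names are hypothetical and are not the DataNode's own code.

```java
import java.io.File;

/**
 * Hypothetical sketch of the block ID-based layout, ASSUMING two levels of
 * 256 subdirectories each ("subdir0" .. "subdir255"). Not HDFS code.
 */
class BlockIdLayoutSketch {
    // Assumed mapping: bits 16-23 pick the first-level directory,
    // bits 8-15 pick the second-level directory.
    static File blockIdToDir(File finalizedRoot, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return new File(finalizedRoot, "subdir" + d1 + File.separator + "subdir" + d2);
    }

    // Leaf directories an upgrade may have to create on each volume
    // when block IDs are spread across the whole range.
    static int maxLeafDirs() {
        return 256 * 256; // 65,536, i.e. the "up to 64k directories"
    }

    public static void main(String[] args) {
        File root = new File("finalized");
        System.out.println(blockIdToDir(root, 0x12345678L));
        System.out.println(maxLeafDirs());
    }
}
```

Under these assumptions, a volume whose block IDs are dense and sequential touches only a few leaf directories, while random block IDs can touch nearly all 65,536, and each one costs a {{mkdirs()}} call when the calls are made one after another.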



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2016-03-31 Thread Vinayakumar B (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219536#comment-15219536 ]

Vinayakumar B commented on HDFS-8782:
-

Since HDFS-8578 is resolved now, can we close this as duplicate?


[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2015-11-04 Thread Vinayakumar B (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989160#comment-14989160 ]

Vinayakumar B commented on HDFS-8782:
-

HDFS-8578 is also related to upgrading in parallel.


[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2015-11-04 Thread Duo Zhang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989130#comment-14989130 ]

Duo Zhang commented on HDFS-8782:
-

I tried to make {{DataStorage.addStorageLocations}} run in parallel, but found
it difficult.

Some properties of {{DataStorage}} (inherited from {{StorageInfo}}), such as
{{layoutVersion}}, are updated while each {{StorageDirectory}} is loaded, so
switching from sequential to parallel loading may have side effects even if I
protect these properties with locks everywhere.

I do not see why we need a {{layoutVersion}} in {{DataStorage}}. As far as I
know, {{DataStorage}} is only a container of {{StorageDirectory}} objects, or of
{{BlockPoolSliceStorage}} objects if federation is enabled. So what does the
{{layoutVersion}} in {{DataStorage}} mean? Is there a historical reason for
keeping it?

Thanks.


[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2015-09-10 Thread Duo Zhang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740128#comment-14740128 ]

Duo Zhang commented on HDFS-8782:
-

I think we could at least upgrade the volumes in parallel?

I tried upgrading from 2.5.0 to 2.7.1. It took more than 20 minutes on a
DataNode with 11 disks of 3 TB each. If the volumes were upgraded in parallel,
I think the pause could drop to about 2 minutes?
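The 2-minute estimate is simple arithmetic: run sequentially, the wall time is the sum of the per-volume times, while with one worker per volume it is roughly the slowest single volume. Assuming the 11 disks take equal time, 20 minutes / 11 is about 1.8 minutes each. The hypothetical Java class below models a fixed thread pool that hands each volume to whichever worker frees up first; the per-volume times are assumed inputs, not measurements.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Back-of-the-envelope model; inputs are assumptions, not measurements. */
class UpgradeTimeEstimate {
    // Sequential upgrade: volumes are processed one after another.
    static double sequentialMinutes(double[] perVolume) {
        double total = 0;
        for (double t : perVolume) total += t;
        return total;
    }

    // Fixed pool: each volume, in order, goes to the earliest-free worker.
    static double parallelMinutes(double[] perVolume, int threads) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) freeAt.add(0.0);
        double makespan = 0;
        for (double t : perVolume) {
            double finish = freeAt.poll() + t;
            freeAt.add(finish);
            makespan = Math.max(makespan, finish);
        }
        return makespan;
    }

    public static void main(String[] args) {
        double[] volumes = new double[11];
        Arrays.fill(volumes, 20.0 / 11); // assumed: equal time per 3 TB disk
        System.out.printf("sequential: %.1f min%n", sequentialMinutes(volumes));
        System.out.printf("11 threads: %.1f min%n", parallelMinutes(volumes, 11));
        System.out.printf("4 threads:  %.1f min%n", parallelMinutes(volumes, 4));
    }
}
```

With these inputs it prints 20.0, 1.8 and 5.5 minutes. The 4-thread case shows that a pool smaller than the number of disks still helps, just less. The model ignores contention for shared resources such as CPU or controller bandwidth, so real gains may be smaller.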

Thanks.
