[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-03-18 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366751#comment-14366751
 ] 

Fengdong Yu commented on HDFS-7285:
---

[~zhz], can you explain how to run your Python code? You don't provide a 
parameter specification.


> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, 
> HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, 
> HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon coding we can tolerate the loss of any 4 
> blocks, with a storage overhead of only 40%. This makes EC a very attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
> cold files that will not be appended anymore; 3) the pure Java EC coding 
> implementation is extremely slow in practical use. Because of these, it might 
> not be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies, making it self-contained and 
> independently maintained. This design layers the EC feature on the storage 
> type support and aims to be compatible with existing HDFS features like 
> caching, snapshots, encryption, and high availability. It will also support 
> different EC coding schemes, implementations, and policies for different 
> deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L 
> library), an implementation can greatly improve EC encoding/decoding 
> performance and make the EC solution even more attractive. We will post the 
> design document soon.
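
To make the 10+4 arithmetic above concrete, here is a minimal sketch (plain 
Java; the class name and values are illustrative, not from the attachments):

{code}
// Storage overhead comparison: RS(10,4) erasure coding vs. 3-way replication.
// Overhead = extra storage / raw data.
public class EcOverhead {
  public static void main(String[] args) {
    int dataBlocks = 10, parityBlocks = 4, replication = 3;
    // RS(10,4): 14 blocks stored per 10 data blocks, tolerating any 4 losses.
    double ecOverhead = (double) parityBlocks / dataBlocks;  // 0.40 -> 40%
    // 3-replication: 30 blocks stored per 10 data blocks.
    double replOverhead = replication - 1.0;                 // 2.00 -> 200%
    System.out.printf("EC overhead: %.0f%%, 3-replica overhead: %.0f%%%n",
        ecOverhead * 100, replOverhead * 100);
  }
}
{code}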



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-03-18 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366753#comment-14366753
 ] 

Fengdong Yu commented on HDFS-7285:
---

Wow, why are there lots of repeated comments here?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-23 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182277#comment-14182277
 ] 

Fengdong Yu commented on HDFS-7279:
---

That is great.

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, webhdfs
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7279.000.patch
>
>
> Currently the DN implements all related webhdfs functionality using jetty. As 
> the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer 
> and connection management, the DN often suffers from long latency and OOM when 
> its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in the DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7254) Add documents for hot swap drive

2014-10-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178009#comment-14178009
 ] 

Fengdong Yu commented on HDFS-7254:
---

bq. <<>>

It should be dfs.datanode.data.dir.

> Add documents for hot swap drive
> 
>
> Key: HDFS-7254
> URL: https://issues.apache.org/jira/browse/HDFS-7254
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch
>
>
> Add documents for the hot swap drive functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2014-10-17 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174888#comment-14174888
 ] 

Fengdong Yu commented on HDFS-7240:
---

Please see here for a brief description:
http://www.hortonworks.com/blog/ozone-object-store-hdfs/



> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-10-07 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163021#comment-14163021
 ] 

Fengdong Yu commented on HDFS-6137:
---

Thanks [~szetszwo], a very useful description.

bq. I think we only can advise users to do manually rollback (manually change 
the data/current/VERSION file to the old version) 
Yes, but for a large cluster there is some additional work to do.

bq. but cannot change the (old) softwares to fix bug.
What did you mean?



> Datanode cannot rollback because LayoutVersion incorrect
> 
>
> Key: HDFS-6137
> URL: https://issues.apache.org/jira/browse/HDFS-6137
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.5-alpha
>Reporter: Fengdong Yu
>
> Upgraded from hadoop-2.0.5-alpha (QJM HA enabled) to the latest trunk (HA 
> disabled), which was successful. Then stopped the cluster and rolled back, 
> and it threw this exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1123524590-10.204.8.135-1395397158134 (storage id 
> DS-1123524590-10.204.8.135-50010-1395397185148) service to 
> 10-204-8-135/10.204.8.135:9000
> org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
> version of storage directory 
> /data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
> -55. Expecting = -40.
> at 
> org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
> at 
> org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:744)
> {code}
>   
> I looked at the datanode dir: $datanode.dir/VERSION is always the new 
> version. When we upgrade, this file is overwritten, so rollback MUST fail.
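
A minimal sketch of the manual workaround quoted above (rewriting 
data/current/VERSION to the old layout version), assuming the VERSION file is 
a standard Java properties file; the key name, paths, and value here are 
illustrative, so verify them against a real VERSION file and back it up first:

{code}
import java.io.*;
import java.util.Properties;

// Sketch: set layoutVersion back to its pre-upgrade value for a manual rollback.
public class RollbackVersionFile {
  public static void main(String[] args) throws IOException {
    File versionFile = new File(args[0]); // e.g. /data/hdfs/data/current/VERSION
    String oldLayoutVersion = args[1];    // e.g. "-40" (the pre-upgrade value)

    Properties props = new Properties();
    try (InputStream in = new FileInputStream(versionFile)) {
      props.load(in);
    }
    props.setProperty("layoutVersion", oldLayoutVersion);
    try (OutputStream out = new FileOutputStream(versionFile)) {
      props.store(out, null);
    }
  }
}
{code}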



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6496) WebHDFS cannot open file

2014-06-06 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6496:
--

Attachment: webhdfs.PNG

> WebHDFS cannot open file
> 
>
> Key: HDFS-6496
> URL: https://issues.apache.org/jira/browse/HDFS-6496
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: webhdfs.PNG
>
>
> WebHDFS cannot open the file on the namenode web UI. I attached a screenshot.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6496) WebHDFS cannot open file

2014-06-06 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019694#comment-14019694
 ] 

Fengdong Yu commented on HDFS-6496:
---

Clicking any file on the NN web UI gives the same error.

> WebHDFS cannot open file
> 
>
> Key: HDFS-6496
> URL: https://issues.apache.org/jira/browse/HDFS-6496
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: webhdfs.PNG
>
>
> WebHDFS cannot open the file on the namenode web UI. I attached a screenshot.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6496) WebHDFS cannot open file

2014-06-06 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6496:
-

 Summary: WebHDFS cannot open file
 Key: HDFS-6496
 URL: https://issues.apache.org/jira/browse/HDFS-6496
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu


WebHDFS cannot open the file on the namenode web UI. I attached a screenshot.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6495) In some case, the hedged read will lead to client infinite wait.

2014-06-06 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved HDFS-6495.
---

Resolution: Duplicate

Duplicate of HDFS-6494.

> In some case, the  hedged read will lead to client  infinite wait.
> --
>
> Key: HDFS-6495
> URL: https://issues.apache.org/jira/browse/HDFS-6495
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: LiuLei
>
> When I use "hedged read", If there is only one live datanode, the reading 
> from  the datanode throw TimeoutException and ChecksumException., the Client 
> will infinite wait.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6338) Add a RPC method to allow administrator to delete the file lease.

2014-05-04 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989220#comment-13989220
 ] 

Fengdong Yu commented on HDFS-6338:
---

Ted,

recoverLease() is not what I want; it revokes the lease, whereas I aim to 
delete the lease. To explain in detail:

DFSClient generates a clientID such as NOMAPREDUCE-1000 before writing file-a 
on HDFS; suppose the write is interrupted and the file is not closed normally.

When I then try to write the file again, DFSClient generates another clientID, 
such as NOMAPREDUCE-1001, and the write throws an exception such as:

failed to create file because this file is already being created by 
NOMAPREDUCE-1000.

So I have to wait for NOMAPREDUCE-1000 to expire.
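
For reference, a minimal sketch of calling recoverLease() (the API Ted 
suggested, which forces lease recovery rather than deleting the lease); the 
NN URI and path below are illustrative:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Sketch: force lease recovery on a file left open by an interrupted client.
public class RecoverLeaseExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem)
        FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
    // Returns true once the lease is recovered and the file is closed.
    boolean recovered = dfs.recoverLease(new Path("/data/file-a"));
    System.out.println("lease recovered: " + recovered);
  }
}
{code}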


> Add a RPC method to allow administrator to delete the file lease.
> -
>
> Key: HDFS-6338
> URL: https://issues.apache.org/jira/browse/HDFS-6338
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
>
> We have to wait for the file lease to expire after an unexpected interrupt 
> during an HDFS write, so I want to add an RPC method to allow an 
> administrator to delete the file lease.
> Please leave comments here; I am working on the patch now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6338) Add a RPC method to allow administrator to delete the file lease.

2014-05-04 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6338:
--

Summary: Add a RPC method to allow administrator to delete the file lease.  
(was: Add a RPC method to allow administrator to delete file lease.)

> Add a RPC method to allow administrator to delete the file lease.
> -
>
> Key: HDFS-6338
> URL: https://issues.apache.org/jira/browse/HDFS-6338
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
>
> We have to wait for the file lease to expire after an unexpected interrupt 
> during an HDFS write, so I want to add an RPC method to allow an 
> administrator to delete the file lease.
> Please leave comments here; I am working on the patch now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6338) Add a RPC method to allow administrator to delete file lease.

2014-05-04 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6338:
--

Summary: Add a RPC method to allow administrator to delete file lease.  
(was: Add a RPC to allow administrator to delete file lease.)

> Add a RPC method to allow administrator to delete file lease.
> -
>
> Key: HDFS-6338
> URL: https://issues.apache.org/jira/browse/HDFS-6338
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
>
> We have to wait for the file lease to expire after an unexpected interrupt 
> during an HDFS write, so I want to add an RPC method to allow an 
> administrator to delete the file lease.
> Please leave comments here; I am working on the patch now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6338) Add a RPC to allow administrator to delete file lease.

2014-05-04 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6338:
-

 Summary: Add a RPC to allow administrator to delete file lease.
 Key: HDFS-6338
 URL: https://issues.apache.org/jira/browse/HDFS-6338
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor


We have to wait for the file lease to expire after an unexpected interrupt 
during an HDFS write, so I want to add an RPC method to allow an administrator 
to delete the file lease.

Please leave comments here; I am working on the patch now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-05-01 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986513#comment-13986513
 ] 

Fengdong Yu commented on HDFS-6318:
---

The test failure is not related to this patch.

> refreshServiceAcl cannot affect both active NN and standby NN
> -
>
> Key: HDFS-6318
> URL: https://issues.apache.org/jira/browse/HDFS-6318
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
> Attachments: HDFS-6318.patch, HDFS-6318.patch
>
>
> refreshServiceAcl cannot affect both the active NN and the standby NN; it 
> only selects one NN to reload the ACL configuration, but we should reload the 
> ACL on both the active NN and the standby NN.
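
Until this is fixed, one possible workaround (a sketch; the NN addresses are 
illustrative, and this assumes the standard generic -fs option) is to run the 
refresh against each NN explicitly:

{code}
hdfs dfsadmin -fs hdfs://nn1.example.com:8020 -refreshServiceAcl
hdfs dfsadmin -fs hdfs://nn2.example.com:8020 -refreshServiceAcl
{code}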



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6318:
--

Attachment: HDFS-6318.patch

> refreshServiceAcl cannot affect both active NN and standby NN
> -
>
> Key: HDFS-6318
> URL: https://issues.apache.org/jira/browse/HDFS-6318
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
> Attachments: HDFS-6318.patch, HDFS-6318.patch
>
>
> refreshServiceAcl cannot affect both the active NN and the standby NN; it 
> only selects one NN to reload the ACL configuration, but we should reload the 
> ACL on both the active NN and the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-6318 started by Fengdong Yu.

> refreshServiceAcl cannot affect both active NN and standby NN
> -
>
> Key: HDFS-6318
> URL: https://issues.apache.org/jira/browse/HDFS-6318
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
> Attachments: HDFS-6318.patch
>
>
> refreshServiceAcl cannot affect both the active NN and the standby NN; it 
> only selects one NN to reload the ACL configuration, but we should reload the 
> ACL on both the active NN and the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6318:
--

Status: Patch Available  (was: In Progress)

> refreshServiceAcl cannot affect both active NN and standby NN
> -
>
> Key: HDFS-6318
> URL: https://issues.apache.org/jira/browse/HDFS-6318
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
> Attachments: HDFS-6318.patch
>
>
> refreshServiceAcl cannot affect both the active NN and the standby NN; it 
> only selects one NN to reload the ACL configuration, but we should reload the 
> ACL on both the active NN and the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6318:
--

Attachment: HDFS-6318.patch

> refreshServiceAcl cannot affect both active NN and standby NN
> -
>
> Key: HDFS-6318
> URL: https://issues.apache.org/jira/browse/HDFS-6318
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
> Attachments: HDFS-6318.patch
>
>
> refreshServiceAcl cannot affect both the active NN and the standby NN; it 
> only selects one NN to reload the ACL configuration, but we should reload the 
> ACL on both the active NN and the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-04-30 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6318:
-

 Summary: refreshServiceAcl cannot affect both active NN and 
standby NN
 Key: HDFS-6318
 URL: https://issues.apache.org/jira/browse/HDFS-6318
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu


refreshServiceAcl cannot affect both the active NN and the standby NN; it only 
selects one NN to reload the ACL configuration, but we should reload the ACL on 
both the active NN and the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup

2014-04-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986256#comment-13986256
 ] 

Fengdong Yu commented on HDFS-5147:
---

But some admin commands, such as the ACL refresh, should be executed on both 
the ANN and the SNN; otherwise the standby NN doesn't know the latest ACL after 
a failover.



> Certain dfsadmin commands such as safemode do not interact with the active 
> namenode in ha setup
> ---
>
> Key: HDFS-5147
> URL: https://issues.apache.org/jira/browse/HDFS-5147
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.1.0-beta
>Reporter: Arpit Gupta
>
> Certain dfsadmin commands return the status of the first namenode specified 
> in the configs rather than interacting with the active namenode.
> For example, issue
> hdfs dfsadmin -safemode get
> and it will return the status of the first namenode in the configs rather 
> than the active namenode.
> I think all dfsadmin commands should determine which namenode is active and 
> perform the operation on it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985574#comment-13985574
 ] 

Fengdong Yu commented on HDFS-6299:
---

Yes, upload the patch to HDFS-6309; don't update this issue again. :)

{code}
+  if (result != null && result.isEmpty()) {
{code}

It should be
{code}
  if (result != null && !result.isEmpty()) {
{code}


> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.2.patch, HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu reopened HDFS-6299:
---


It cannot be closed, so I reopened it. The commit should be reverted and these 
comments addressed here.

> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985315#comment-13985315
 ] 

Fengdong Yu commented on HDFS-6299:
---

bq. Actually DFSClient is not publicly exposed one and the clear javadoc 
comments there with API. That is like a core helper class delegation from 
client perspective. No harm in having javadoc.

Yes, there is no harm, but it's the required code style. Please refer to other 
client methods in DFSClient, such as append().

> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985312#comment-13985312
 ] 

Fengdong Yu commented on HDFS-6299:
---

1.
{code}
+  public byte[] getXAttr(String src, String name) throws IOException {
+checkOpen();
+try {
+  XAttr xAttr = buildXAttr(name, null);
+  List<XAttr> xAttrs = Lists.newArrayListWithCapacity(1);
+  xAttrs.add(xAttr);
+  List<XAttr> result = namenode.getXAttrs(src, xAttrs);
+  byte[] value = null;
+  if (result != null && result.size() > 0) {
+XAttr a = result.get(0);
+value = a.getValue();
+if (value == null) {
+  value = new byte[0]; //xattr exists, but no value.
+}
+  }
+  return value;
+} catch(RemoteException re) {
{code}

It looks like you don't want to return null here, but if result is null or 
empty, it still returns null.
Also, try to use !result.isEmpty() instead of result.size() > 0.
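
For clarity, the isEmpty() suggestion applied to the snippet above might look 
like this (a sketch, not the committed code; it deliberately leaves the 
null-return question open):

{code}
  byte[] value = null;
  if (result != null && !result.isEmpty()) {
    XAttr a = result.get(0);
    value = a.getValue();
    if (value == null) {
      value = new byte[0]; // xattr exists, but has no value
    }
  }
  return value;
{code}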

2.
The client RPC interface is not symmetrical: there are getXAttr(), getXAttrs(), 
and setXAttr(), so there should also be a setXAttrs().

3.
Why is there no getXAttr() in ClientProtocol? We should allow getting a single 
xattr at a time.





> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu reopened HDFS-6299:
---


I reopened this issue because I found more than two problems during my review.

> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985299#comment-13985299
 ] 

Fengdong Yu commented on HDFS-6299:
---

Can you add Javadoc in DFSClient?

> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985296#comment-13985296
 ] 

Fengdong Yu commented on HDFS-6299:
---

Sorry, my comment is late.
{code}
+int prefixIndex = name.indexOf(".");
+if (prefixIndex == -1) {
+  throw new HadoopIllegalArgumentException("XAttr name must be prefixed with" +
+      " user/trusted/security/system which followed by '.'");
+} else if (prefixIndex == name.length() - 1) {
+  throw new HadoopIllegalArgumentException("XAttr name can not be empty.");
+}
{code}

It should be 
{code}
if (prefixIndex <= 0) {
{code}

Otherwise the code does wasted work before catching this exception.
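
Combining both checks, the validation might read (a sketch):

{code}
// prefixIndex <= 0 rejects both a missing '.' (-1) and an empty prefix (0).
int prefixIndex = name.indexOf(".");
if (prefixIndex <= 0) {
  throw new HadoopIllegalArgumentException("XAttr name must be prefixed with" +
      " user/trusted/security/system which followed by '.'");
} else if (prefixIndex == name.length() - 1) {
  throw new HadoopIllegalArgumentException("XAttr name can not be empty.");
}
{code}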

> Protobuf for XAttr and client-side implementation 
> --
>
> Key: HDFS-6299
> URL: https://issues.apache.org/jira/browse/HDFS-6299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: HDFS-6299.patch
>
>
> This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
> interfaces in DistributedFileSystem and DFSClient. 
> With this JIRA we may just keep a dummy implementation of the XAttr API of 
> ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6271) Rename JournalProtocol to BackupNodeJournalProtocol

2014-04-23 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979166#comment-13979166
 ] 

Fengdong Yu commented on HDFS-6271:
---

We should keep naming consistent with the web UI, which shows Standby, not 
Backup, so it would be StandbyNodeJournalProtocol.

> Rename JournalProtocol to BackupNodeJournalProtocol
> ---
>
> Key: HDFS-6271
> URL: https://issues.apache.org/jira/browse/HDFS-6271
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Suresh Srinivas
>
> [~shv], as indicated in earlier comments, JournalProtocol is used only for 
> sending the journal to the backup node. Renaming it would make it clear how 
> the protocol is being used and will not conflict with QuorumJournalProtocol.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6252) Namenode old webUI should be deprecated

2014-04-23 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979164#comment-13979164
 ] 

Fengdong Yu commented on HDFS-6252:
---

Thanks for the patch, Haohui!

> Namenode old webUI should be deprecated
> ---
>
> Key: HDFS-6252
> URL: https://issues.apache.org/jira/browse/HDFS-6252
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Fengdong Yu
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-6252.000.patch
>
>
> We've deprecated hftp and hsftp in HDFS-5570, so downloading a file via 
> "download this file" on browseDirectory.jsp throws an error:
> Problem accessing /streamFile/***
> because the streamFile servlet was deleted in HDFS-5570.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976348#comment-13976348
 ] 

Fengdong Yu commented on HDFS-6267:
---

Thanks Andrew. Yes, this upgrade is not easy.

bq. It'd be technically fine to do this in trunk for 3.0 (like this JIRA's 
affects version indicates), but I personally am against divergence between 
trunk and branch-2 until there's a clear plan for releasing a Hadoop 3.0
I agree; this issue will stay pending commit until there is a branch-3.

bq. This is also a change that would need a lot of testing, since Jetty is used 
in a lot of places in Hadoop.
Jetty is only used for webHDFS in branch-2; the new web UI also goes through 
webHDFS, and MR shuffle uses netty now. I did lots of manual testing of Jetty 9 
on my test cluster, and it works well.



> Upgrade Jetty6 to Jetty9
> 
>
> Key: HDFS-6267
> URL: https://issues.apache.org/jira/browse/HDFS-6267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
> Attachments: HDFS-6267.patch
>
>
> The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
> 3.0 for this upgrade.
> Jetty 9 is incompatible with Jetty 6, so the patch will be big.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976291#comment-13976291
 ] 

Fengdong Yu commented on HDFS-6267:
---

I can build locally, but it failed here, and there is no detailed failure 
console output on Jenkins.

> Upgrade Jetty6 to Jetty9
> 
>
> Key: HDFS-6267
> URL: https://issues.apache.org/jira/browse/HDFS-6267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
> Attachments: HDFS-6267.patch
>
>
> The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
> 3.0 for this upgrade.
> Jetty 9 is incompatible with Jetty 6, so the patch will be big.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6267:
--

Attachment: HDFS-6267.patch

> Upgrade Jetty6 to Jetty9
> 
>
> Key: HDFS-6267
> URL: https://issues.apache.org/jira/browse/HDFS-6267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
> Attachments: HDFS-6267.patch
>
>
> The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
> 3.0 for this upgrade.
> Jetty 9 is incompatible with Jetty 6, so the patch will be big.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6267:
--

Hadoop Flags: Incompatible change
  Status: Patch Available  (was: In Progress)

> Upgrade Jetty6 to Jetty9
> 
>
> Key: HDFS-6267
> URL: https://issues.apache.org/jira/browse/HDFS-6267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
> Attachments: HDFS-6267.patch
>
>
> The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
> 3.0 for this upgrade.
> Jetty 9 is incompatible with Jetty 6, so the patch will be big.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6267:
-

 Summary: Upgrade Jetty6 to Jetty9
 Key: HDFS-6267
 URL: https://issues.apache.org/jira/browse/HDFS-6267
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor


The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
3.0 for this upgrade.

Jetty 9 is incompatible with Jetty 6, so the patch will be big.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-6267 started by Fengdong Yu.

> Upgrade Jetty6 to Jetty9
> 
>
> Key: HDFS-6267
> URL: https://issues.apache.org/jira/browse/HDFS-6267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
>
> The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
> 3.0 for this upgrade.
> Jetty 9 is incompatible with Jetty 6, so the patch will be big.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6252) Namenode old webUI should be deprecated

2014-04-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975341#comment-13975341
 ] 

Fengdong Yu commented on HDFS-6252:
---

Yes, I can provide a patch for this, but I am not available at the moment, so 
the patch may come later. It will target the 2.5 release.

> Namenode old webUI should be deprecated
> ---
>
> Key: HDFS-6252
> URL: https://issues.apache.org/jira/browse/HDFS-6252
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Fengdong Yu
>Priority: Minor
>
> We've deprecated hftp and hsftp in HDFS-5570, so if we download a file via 
> the "download this file" link on browseDirectory.jsp, it throws an error:
> Problem accessing /streamFile/***
> because the streamFile servlet was deleted in HDFS-5570.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6252) Namenode old webUI should be deprecated

2014-04-16 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6252:
-

 Summary: Namenode old webUI should be deprecated
 Key: HDFS-6252
 URL: https://issues.apache.org/jira/browse/HDFS-6252
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Fengdong Yu
Priority: Minor


We've deprecated hftp and hsftp in HDFS-5570, so if we download a file via the 
"download this file" link on browseDirectory.jsp, it throws an error:

Problem accessing /streamFile/***

because the streamFile servlet was deleted in HDFS-5570.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6239) start-dfs.sh does not start remote DataNode due to escape characters

2014-04-13 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968028#comment-13968028
 ] 

Fengdong Yu commented on HDFS-6239:
---

I cannot reproduce it.

Do you have a symlink on your Hadoop install path? Maybe your bash is 
incompatible.

Can you try searching for '-P' in $HADOOP_HOME/libexec/hadoop-config.sh and 
removing the "-P"?



> start-dfs.sh does not start remote DataNode due to escape characters
> 
>
> Key: HDFS-6239
> URL: https://issues.apache.org/jira/browse/HDFS-6239
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 1.2.1
> Environment: GNU bash, version 4.1.2(1)-release 
> (x86_64-redhat-linux-gnu)
> Linux foo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> AFS file system.
>Reporter: xyzzy
>
> start-dfs.sh fails to start remote data nodes and task nodes, though it is 
> possible to start them manually through hadoop-daemon.sh.
> I've been able to debug and find the root cause of the bug, and I thought it 
> was a trivial fix, but I do not know how to do it; I can't figure out a way 
> to handle this seemingly trivial bug.
> hadoop-daemons.sh calls slaves.sh:
> exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; 
> "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
> This is the issue when I debug using bash -x: in slaves.sh, the \; becomes ';'
> + ssh .xx..xxx cd /afs/xx..xxx/x/x/x/xx/x/libexec/.. ';' 
> /afs/xx..xxx/x/x/x/xx//bin/hadoop-daemon.sh --config 
> /afs/xx..xxx/x/x/x/xx//libexec/../conf start datanode
> The problem is the ';'. Because the semicolon is surrounded by quotes, the 
> code after it does not execute. I manually ran the above command, and as 
> expected the data node did not start. When I removed the quotes around the 
> semicolon, everything worked. Please note that you can see the issue only 
> when you use bash -x; if you echo the statement, the quotes around the 
> semicolon are not visible.
> This issue is always reproducible for me, and because of it I have to 
> manually start the daemons on each machine. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6178) Decommission on standby NN couldn't finish

2014-03-31 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956077#comment-13956077
 ] 

Fengdong Yu commented on HDFS-6178:
---

Thanks for the detailed report, but generally we just decommission DNs on the 
active node, right?


> Decommission on standby NN couldn't finish
> --
>
> Key: HDFS-6178
> URL: https://issues.apache.org/jira/browse/HDFS-6178
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Ming Ma
>
> Currently, decommissioning machines in an HA-enabled cluster requires running 
> refreshNodes on both the active and standby nodes. Sometimes decommissioning 
> won't finish from the standby NN's point of view. Here is a diagnosis of why 
> that can happen.
> The standby NN's blockManager manages block replication and block 
> invalidation as if it were the active NN, even though DNs ignore block 
> commands coming from the standby NN. When the standby NN makes block 
> operation decisions, such as the target of a block replication or the node to 
> remove excess blocks from, the decision is independent of the active NN, so 
> the active and standby NNs can end up in different states. When we try to 
> decommission nodes on the standby NN, that inconsistency might prevent the 
> standby NN from making progress. Here is an example.
> Machine A
> Machine B
> Machine C
> Machine D
> Machine E
> Machine F
> Machine G
> Machine H
> 1. For a given block, both active and standby have 5 replicas, on machines A, 
> B, C, D, and E, so both decide to pick excess nodes to invalidate.
> The active NN picked D and E as excess DNs. After the next block reports from 
> D and E, the active NN has 3 active replicas (A, B, C) and 0 excess replicas.
> {noformat}
> 2014-03-27 01:50:14,410 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> 2014-03-27 01:50:15,539 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (D:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> {noformat}
> The standby NN picked C and E as excess DNs. Since DNs ignore commands from 
> the standby, after the next block reports from C, D, and E, the standby has 2 
> active replicas (A, B) and 1 excess replica (C).
> {noformat}
> 2014-03-27 01:51:49,543 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> 2014-03-27 01:51:49,894 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (C:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> {noformat}
> 2. Machine A's decommission request was sent to the standby. The standby only 
> had one live replica and picked machines G and H as targets, but since 
> standby commands are ignored by DNs, G and H remained in the pending 
> replication queue until they timed out. At this point there is one 
> decommissioning replica (A), 1 active replica (B), and one excess replica (C).
> {noformat}
> 2014-03-27 04:42:52,258 INFO BlockStateChange: BLOCK* ask A:50010 to 
> replicate blk_-5207804474559026159_121186764 to datanode(s) G:50010 H:50010
> {noformat}
> 3. Machine A's decommission request was sent to the active NN. The active NN 
> picked machine F as the target, and it finished properly. So the active NN 
> had 3 active replicas (B, C, F) and one decommissioned replica (A).
> {noformat}
> 2014-03-27 04:44:15,239 INFO BlockStateChange: BLOCK* ask 10.42.246.110:50010 
> to replicate blk_-5207804474559026159_121186764 to datanode(s) F:50010
> 2014-03-27 04:44:16,083 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 
> 7100065
> {noformat}
> 4. The standby NN picked up F as a new replica. Thus the standby had one 
> decommissioning replica (A), 2 active replicas (B, F), and one excess replica 
> (C). The standby NN kept trying to schedule replication work, but the DNs 
> ignored the commands.
> {noformat}
> 2014-03-27 04:44:16,084 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 
> 7100065
> 2014-03-28 23:06:11,970 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block: 
> blk_-5207804474559026159_121186764, Expected Replicas: 3, live replicas: 2, 
> corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 1, Is Open 
> File: false, Datanodes having this block: C:50010 B:50010 A:50010 F:50010 , 
> Current Datanode: A:50010, Is current datanode decommissioning: true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6163) Fix a minor bug in the HA upgrade document

2014-03-27 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6163:
--

Status: Patch Available  (was: Open)

> Fix a minor bug in the HA upgrade document
> --
>
> Key: HDFS-6163
> URL: https://issues.apache.org/jira/browse/HDFS-6163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
> Attachments: HDFS-6163.patch
>
>
> Fix a minor command error in the document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6163) Fix a minor bug in the HA upgrade document

2014-03-27 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6163:
--

Attachment: HDFS-6163.patch

> Fix a minor bug in the HA upgrade document
> --
>
> Key: HDFS-6163
> URL: https://issues.apache.org/jira/browse/HDFS-6163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Fengdong Yu
>Priority: Minor
> Attachments: HDFS-6163.patch
>
>
> Fix a minor command error in the document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6163) Fix a minor bug in the HA upgrade document

2014-03-27 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6163:
-

 Summary: Fix a minor bug in the HA upgrade document
 Key: HDFS-6163
 URL: https://issues.apache.org/jira/browse/HDFS-6163
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor


Fix a minor command error in the document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-03-26 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947666#comment-13947666
 ] 

Fengdong Yu commented on HDFS-6137:
---

[~szetszwo], from the exception below, I found that 
BlockPoolSliceStorage.doRollback() is not called during DN startup with 
-rollback.

{code}
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
{code}

doTransition() should call doRollback(), right?
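
For illustration, here is a self-contained toy of the dispatch this implies 
(the names mirror BlockPoolSliceStorage and StartupOption, but this is only a 
sketch of the expected behavior, not the actual HDFS code):

{code}
public class RollbackDispatch {
  enum StartupOption { REGULAR, UPGRADE, ROLLBACK }

  static void doTransition(StartupOption startOpt) {
    if (startOpt == StartupOption.ROLLBACK) {
      doRollback();       // restore the previous layout first,
      return;             // and skip validating the new layout version
    }
    readProperties();     // this is where the layout-version check throws
  }

  static void doRollback()     { System.out.println("rolling back storage dir"); }
  static void readProperties() { System.out.println("validating VERSION file"); }

  public static void main(String[] args) {
    // Expected: the rollback path runs and no version check is made.
    doTransition(StartupOption.ROLLBACK);
  }
}
{code}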

> Datanode cannot rollback because LayoutVersion incorrect
> 
>
> Key: HDFS-6137
> URL: https://issues.apache.org/jira/browse/HDFS-6137
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> Upgrade from hadoop-2.0.5-alpha (QJM HA enabled) to the latest trunk (HA 
> disabled), which is successful. Then stop the cluster and roll back; it 
> throws an exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1123524590-10.204.8.135-1395397158134 (storage id 
> DS-1123524590-10.204.8.135-50010-1395397185148) service to 
> 10-204-8-135/10.204.8.135:9000
> org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
> version of storage directory 
> /data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
> -55. Expecting = -40.
> at 
> org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
> at 
> org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:744)
> {code}
>   
> I looked at the datanode dir: $datanode.dir/VERSION is always the new 
> version. When we upgrade, this file is overwritten, so rollback MUST fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE when upgrading namenode from fsimages older than -32

2014-03-26 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947662#comment-13947662
 ] 

Fengdong Yu commented on HDFS-6130:
---

And thanks, [~szetszwo].

> NPE when upgrading namenode from fsimages older than -32
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Haohui Mai
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: HDFS-6130.000.patch, fsimage.tar.gz
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE when upgrading namenode from fsimages older than -32

2014-03-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947440#comment-13947440
 ] 

Fengdong Yu commented on HDFS-6130:
---

[~wheat9], I've tested it and it does work, thanks.

+1 for the patch.

> NPE when upgrading namenode from fsimages older than -32
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6130.000.patch, fsimage.tar.gz
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE when upgrading namenode from fsimages older than -32

2014-03-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947321#comment-13947321
 ] 

Fengdong Yu commented on HDFS-6130:
---

Thanks for the patch, I will test it.


> NPE when upgrading namenode from fsimages older than -32
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6130.000.patch, fsimage.tar.gz
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6154) Improve the speed of saveNameSpace,making HDFS restart and checkPoint faster

2014-03-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6154:
--

Assignee: Fengdong Yu

> Improve the speed of saveNameSpace,making HDFS restart and checkPoint faster
> 
>
> Key: HDFS-6154
> URL: https://issues.apache.org/jira/browse/HDFS-6154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: guodongdong
>Assignee: Fengdong Yu
> Attachments: HDFS-6154-patch
>
>
> There are two stages in namenode saveNamespace: serializing the INodes, and 
> calculating the MD5 and writing to disk. Today the two stages run serially. 
> With this improvement they run in parallel: one thread serializes INodes 
> while another thread calculates the MD5 and writes to disk, which doubles the 
> speed of saveNamespace. Details are shown in the table below:
> Testing environment:
>   only tests namenode saveNamespace (dfsadmin -saveNamespace)
> Machine: 144GB RAM, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 12 CPUs, RAID 5 
> SAS disk, JDK 1.7.0
>  
> ||image size||before optimizing||after optimizing ||
> |1.2GB|22sec|11sec|
> |4.3GB|66sec|36sec|
> |22GB|406sec|250sec|
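
As an aside, a minimal, self-contained sketch of the pipelining idea described 
above (hypothetical names, JDK classes only; this is not the HDFS-6154 patch): 
the main thread serializes into a bounded pipe while a second thread computes 
the MD5 and writes to disk.

{code}
import java.io.*;
import java.math.BigInteger;
import java.security.MessageDigest;

public class PipelinedSaver {
  public static void main(String[] args) throws Exception {
    final PipedOutputStream serialized = new PipedOutputStream();
    // A 1 MB pipe bounds how far serialization can run ahead of the writer.
    final PipedInputStream toWriter = new PipedInputStream(serialized, 1 << 20);

    Thread writer = new Thread(new Runnable() {
      @Override public void run() {
        try {
          MessageDigest digester = MessageDigest.getInstance("MD5");
          FileOutputStream fout = new FileOutputStream("fsimage.tmp");
          try {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = toWriter.read(buf)) != -1) {
              digester.update(buf, 0, n);  // MD5 overlaps with serialization
              fout.write(buf, 0, n);
            }
          } finally {
            fout.close();
          }
          System.out.printf("MD5: %032x%n",
              new BigInteger(1, digester.digest()));
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    });
    writer.start();

    DataOutputStream out = new DataOutputStream(serialized);
    try {
      for (long i = 0; i < 1000000L; i++) {
        out.writeLong(i);  // stand-in for serializing one INode
      }
    } finally {
      out.close();  // closes the pipe; the writer thread then sees EOF
    }
    writer.join();
  }
}
{code}

With a bounded pipe the serializer can run ahead of the disk writer by at most 
the buffer size; when the two stages cost roughly the same, overlapping them 
gives about the 2x improvement the table reports.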



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6154) Improve the speed of saveNameSpace,making HDFS restart and checkPoint faster

2014-03-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946260#comment-13946260
 ] 

Fengdong Yu commented on HDFS-6154:
---

Hi [~guodongdong],

please change your LANG to en_US.UTF-8 before generating the patch.
I have several comments:

{code}
-  DigestOutputStream fos = new DigestOutputStream(fout, digester);
+  java.io.OutputStream fos = new DigestOutputStream(fout, digester);
+  fos = new AsyncBufferedOutputStream(fos);
{code}

it could be 
{code}
 java.io.OutputStream fos = new AsyncBufferedOutputStream(
   new DigestOutputStream(fout, digester));
{code}
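
For reference, here is the wrapped-stream pattern from that suggestion as a 
runnable example using only JDK classes (BufferedOutputStream stands in for 
the patch's own AsyncBufferedOutputStream; the composition is otherwise the 
same):

{code}
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.security.DigestOutputStream;
import java.security.MessageDigest;

public class DigestWriteDemo {
  public static void main(String[] args) throws Exception {
    MessageDigest digester = MessageDigest.getInstance("MD5");
    // Outer buffer, inner digest: buffered bytes are digested and written to
    // the file when the buffer flushes, so the MD5 covers exactly the bytes
    // that reach the file.
    OutputStream fos = new BufferedOutputStream(
        new DigestOutputStream(new FileOutputStream("out.bin"), digester));
    try {
      fos.write("hello fsimage".getBytes("UTF-8"));
    } finally {
      fos.close();
    }
    System.out.printf("MD5: %032x%n", new BigInteger(1, digester.digest()));
  }
}
{code}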

{code}
-loadSecretManagerState(in);
+//loadSecretManagerState(in);
{code}

Why is loading the secret manager state commented out?


I'll continue reviewing the patch later.


> Improve the speed of saveNameSpace,making HDFS restart and checkPoint faster
> 
>
> Key: HDFS-6154
> URL: https://issues.apache.org/jira/browse/HDFS-6154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: guodongdong
> Attachments: HDFS-6154-patch
>
>
> There are two stages in namenode saveNamespace: serializing the INodes, and 
> calculating the MD5 and writing to disk. Today the two stages run serially. 
> With this improvement they run in parallel: one thread serializes INodes 
> while another thread calculates the MD5 and writes to disk, which doubles the 
> speed of saveNamespace. Details are shown in the table below:
> Testing environment:
>   only tests namenode saveNamespace (dfsadmin -saveNamespace)
> Machine: 144GB RAM, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 12 CPUs, RAID 5 
> SAS disk, JDK 1.7.0
>  
> ||image size||before optimizing||after optimizing ||
> |1.2GB|22sec|11sec|
> |4.3GB|66sec|36sec|
> |22GB|406sec|250sec|



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6154) Improve the speed of saveNameSpace,making HDFS restart and checkPoint faster

2014-03-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6154:
--

Assignee: (was: Fengdong Yu)

> Improve the speed of saveNameSpace,making HDFS restart and checkPoint faster
> 
>
> Key: HDFS-6154
> URL: https://issues.apache.org/jira/browse/HDFS-6154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: guodongdong
> Attachments: HDFS-6154-patch
>
>
> There are two stages in namenode saveNamespace: serializing the INodes, and 
> calculating the MD5 and writing to disk. Today the two stages run serially. 
> With this improvement they run in parallel: one thread serializes INodes 
> while another thread calculates the MD5 and writes to disk, which doubles the 
> speed of saveNamespace. Details are shown in the table below:
> Testing environment:
>   only tests namenode saveNamespace (dfsadmin -saveNamespace)
> Machine: 144GB RAM, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 12 CPUs, RAID 5 
> SAS disk, JDK 1.7.0
>  
> ||image size||before optimizing||after optimizing ||
> |1.2GB|22sec|11sec|
> |4.3GB|66sec|36sec|
> |22GB|406sec|250sec|



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946201#comment-13946201
 ] 

Fengdong Yu commented on HDFS-6130:
---

[~wheat9],

The fsimage has been uploaded.
Please read the following steps carefully before fixing the bug.

1) HA is not enabled during these steps.
2) All test files are less than one block in size.

a. start hadoop-1.0.4 HDFS
b. put one file on HDFS
c. stop HDFS
d. start DFS with the upgrade option on the latest trunk
e. put more than ten files on HDFS
f. stop HDFS
g. start HDFS (NPE here)

NOTE: if only a few files (such as one file) are put at step e, there is no 
NPE at step g.



> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: fsimage.tar.gz
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-24 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Attachment: fsimage.tar.gz

> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: fsimage.tar.gz
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946089#comment-13946089
 ] 

Fengdong Yu commented on HDFS-6130:
---

Please ignore my checkpoint creation method; it's wrong.

> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946073#comment-13946073
 ] 

Fengdong Yu commented on HDFS-6130:
---

Hi [~szetszwo], where did you get 1.3.0? 

> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945992#comment-13945992
 ] 

Fengdong Yu commented on HDFS-6130:
---

OK, no problem. I can use rollingUpgrade -prepare to create a checkpoint.

> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945943#comment-13945943
 ] 

Fengdong Yu commented on HDFS-6130:
---

Thanks [~szetszwo]!

[~wheat9], do you want only the fsimage, or both the image and the edit log? 
I'll reproduce it today using 1.3.0 and the latest trunk, and keep the 
corresponding fsimage and edit logs.

> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-21 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943939#comment-13943939
 ] 

Fengdong Yu commented on HDFS-6130:
---

Update:

I missed a step between step 2 and step 3; I'm adding it as step 2.1. 
Otherwise, all upgrades succeed.
step 2.1:
{code}
hdfs dfs -put test.data /
{code}

So, after upgrading from Apache 1.x to trunk, we MUST write to HDFS before HA 
is enabled in the next step.
I haven't found any unit tests covering this scenario.


> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode upgrade from old release

2014-03-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Summary: NPE during namenode upgrade from old release  (was: NPE during 
namenode startup)

> NPE during namenode upgrade from old release
> 
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-03-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6137:
--

Description: 
Upgrade from hadoop-2.0.5-alpha (QJM HA enabled) to the latest trunk (HA 
disabled), which is successful. Then stop the cluster and roll back; it then 
throws an exception:

{code}
2014-03-21 18:33:19,384 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for block pool Block pool 
BP-1123524590-10.204.8.135-1395397158134 (storage id 
DS-1123524590-10.204.8.135-50010-1395397185148) service to 
10-204-8-135/10.204.8.135:9000
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory 
/data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
-55. Expecting = -40.
at 
org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:744)
{code}
  
I looked at the datanode dir: $datanode.dir/VERSION always holds the new 
version. When we upgrade, this file is overwritten, so rollback MUST fail.
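
A quick way to see this on a datanode, as a sketch (the block-pool path is the 
one reported in the exception above; the field shown is from the storage 
VERSION property file):
{code}
# after the upgrade, the block pool's VERSION file already carries the new
# layout version, so a rollback that expects the old one must fail
cat /data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134/current/VERSION
# expected output includes:  layoutVersion=-55   (rollback expects -40)
{code}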


  was:
Upgrade from hadoop-2.0.5-alpha (HA enabled) to the latest trunk (HA disabled), 
which is successful. Then stop the cluster and roll back; it then throws an 
exception:

{code}
2014-03-21 18:33:19,384 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for block pool Block pool 
BP-1123524590-10.204.8.135-1395397158134 (storage id 
DS-1123524590-10.204.8.135-50010-1395397185148) service to 
10-204-8-135/10.204.8.135:9000
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory 
/data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
-55. Expecting = -40.
at 
org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:744)
{code}
  
I looked at the datanode dir: $datanode.dir/VERSION always holds the new 
version. When we upgrade, this file is overwritten, so rollback MUST fail.



> Datanode cannot rollback because LayoutVersion incorrect
> 
>
> Key: HDFS-6137
> URL: https://issues.apache.org/jira/browse/HDFS-6137
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> Upgrade from hadoop-2.0.5-alpha (QJM HA enabled) to the latest trunk (HA
> disabled), which is successful. Then stop the cluster and roll back; it then
> throws an exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1123524

[jira] [Commented] (HDFS-6130) NPE during namenode startup

2014-03-21 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943012#comment-13943012
 ] 

Fengdong Yu commented on HDFS-6130:
---

Adding some reproduction steps. Upgrade from Apache 1.x to the latest trunk as 
follows to reproduce this issue (see the sketch after the steps):

1) Run a normal HDFS 1.x cluster, then stop HDFS.
2) Switch to the new deployment (trunk) without HA and run 'start-dfs.sh 
-upgrade', which should succeed.
3) Stop HDFS, enable QJM HA in the configuration, and scp the NAME-NODE-dir to 
the SNN.
4) Start the journal nodes and run 'hdfs namenode -initializeSharedEdits'; the 
NPE is thrown here.
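
The same steps as shell commands (a sketch; the standby host name is 
illustrative, and the namenode directory is the one from the logs in this 
issue):
{code}
# step 1: stop the running HDFS 1.x cluster
stop-dfs.sh

# step 2: from the trunk deployment, upgrade without HA
start-dfs.sh -upgrade

# step 3: stop HDFS, enable QJM HA in hdfs-site.xml, and copy the
#         namenode metadata dir to the standby namenode (SNN)
stop-dfs.sh
scp -r /data/hadoop/data1/dfs/name snn-host:/data/hadoop/data1/dfs/

# step 4: start the journal nodes, then initialize the shared edits;
#         the NPE is thrown at this point
hadoop-daemon.sh start journalnode
hdfs namenode -initializeSharedEdits
{code}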

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-03-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6137:
--

Description: 
Upgrade from hadoop-2.0.5-alpha (HA enabled) to the latest trunk (HA disabled), 
which is successful. Then stop the cluster and roll back; it then throws an 
exception:

{code}
2014-03-21 18:33:19,384 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for block pool Block pool 
BP-1123524590-10.204.8.135-1395397158134 (storage id 
DS-1123524590-10.204.8.135-50010-1395397185148) service to 
10-204-8-135/10.204.8.135:9000
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory 
/data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
-55. Expecting = -40.
at 
org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:744)
{code}
  
I looked at the datanode dir: {datanode.dir}/VERSION always holds the new 
version. When we upgrade, this file is overwritten, so rollback MUST fail.


> Datanode cannot rollback because LayoutVersion incorrect
> 
>
> Key: HDFS-6137
> URL: https://issues.apache.org/jira/browse/HDFS-6137
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> Upgrade from hadoop-2.0.5-alpha (HA enabled) to the latest trunk (HA
> disabled), which is successful. Then stop the cluster and roll back; it then
> throws an exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1123524590-10.204.8.135-1395397158134 (storage id 
> DS-1123524590-10.204.8.135-50010-1395397185148) service to 
> 10-204-8-135/10.204.8.135:9000
> org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
> version of storage directory 
> /data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
> -55. Expecting = -40.
> at 
> org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
> at 
> org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:744)
> {code}
>   
> I looked at the datanode dir: {datanode.dir}/VERSION always holds the new
> version. When we upgrade, this file is overwritten, so rollback MUST fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-03-21 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6137:
--

Description: 
Upgrade from hadoop-2.0.5-alpha (HA enabled) to the latest trunk (HA disabled), 
which is successful. Then stop the cluster and roll back; it then throws an 
exception:

{code}
2014-03-21 18:33:19,384 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for block pool Block pool 
BP-1123524590-10.204.8.135-1395397158134 (storage id 
DS-1123524590-10.204.8.135-50010-1395397185148) service to 
10-204-8-135/10.204.8.135:9000
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory 
/data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
-55. Expecting = -40.
at 
org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:744)
{code}
  
I looked at the datanode dir: $datanode.dir/VERSION always holds the new 
version. When we upgrade, this file is overwritten, so rollback MUST fail.


  was:
Upgrade from hadoop-2.0.5-alpha (HA enabled) to the latest trunk (HA disabled), 
which is successful. Then stop the cluster and roll back; it then throws an 
exception:

{code}
2014-03-21 18:33:19,384 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for block pool Block pool 
BP-1123524590-10.204.8.135-1395397158134 (storage id 
DS-1123524590-10.204.8.135-50010-1395397185148) service to 
10-204-8-135/10.204.8.135:9000
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory 
/data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
-55. Expecting = -40.
at 
org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:744)
{code}
  
I looked at the datanode dir: {datanode.dir}/VERSION always holds the new 
version. When we upgrade, this file is overwritten, so rollback MUST fail.



> Datanode cannot rollback because LayoutVersion incorrect
> 
>
> Key: HDFS-6137
> URL: https://issues.apache.org/jira/browse/HDFS-6137
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> Upgrade from hadoop-2.0.5-alpha (HA enabled) to the latest trunk (HA
> disabled), which is successful. Then stop the cluster and roll back; it then
> throws an exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1123524590-10.

[jira] [Created] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-03-21 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6137:
-

 Summary: Datanode cannot rollback because LayoutVersion incorrect
 Key: HDFS-6137
 URL: https://issues.apache.org/jira/browse/HDFS-6137
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Fengdong Yu






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode startup

2014-03-21 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942877#comment-13942877
 ] 

Fengdong Yu commented on HDFS-6130:
---

[~sureshms], thanks for your comments.

bq. Given apache releases do not have this issue
Apache releases also have this issue: upgrading Apache 1.0.4 to trunk, you can 
reproduce it.

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Attachment: (was: HDFS-6130.patch)

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942764#comment-13942764
 ] 

Fengdong Yu commented on HDFS-6130:
---

Cancel patch; it loses data sometimes.

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: HDFS-6130.patch
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Status: Open  (was: Patch Available)

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: HDFS-6130.patch
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Status: Patch Available  (was: Open)

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: HDFS-6130.patch
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Attachment: (was: HDFS-6130.patch)

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: HDFS-6130.patch
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Attachment: HDFS-6130.patch

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: HDFS-6130.patch
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-6130:
--

Attachment: HDFS-6130.patch

I am not sure I resolved the root cause, but it works now.

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
> Attachments: HDFS-6130.patch
>
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942639#comment-13942639
 ] 

Fengdong Yu commented on HDFS-6130:
---

[~sureshms], maybe you are right. I have three test clusters; only this one is 
a cdh release, the others are all Apache releases.

But the old cdh release can upgrade to Apache 2.2.0 successfully, so I don't 
think this is caused by the cdh release.


> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but with HA enabled
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942627#comment-13942627
 ] 

Fengdong Yu commented on HDFS-6130:
---

[~andrew.wang] , can you also take a look? 

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5988) Bad fsimage always generated after upgrade

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942603#comment-13942603
 ] 

Fengdong Yu commented on HDFS-5988:
---

[~andrew.wang], please take a look at HDFS-6130; the upgrade from an old 
release still fails.

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during namenode startup

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942602#comment-13942602
 ] 

Fengdong Yu commented on HDFS-6130:
---

[~szetszwo], I've included HDFS-5988; I tested trunk at revision 1579559.

> NPE during namenode startup
> ---
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941664#comment-13941664
 ] 

Fengdong Yu commented on HDFS-6130:
---

Again, I looked through the code, but I'm still not sure of the root cause.

If I upgrade from 0.20.2-cdh3u1 to 2.2.0 (HA disabled), it succeeds, and then 
from 2.2.0 to trunk (HA enabled) it also succeeds.

> NPE during upgrade using trunk after RU merged
> --
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6129) When a replica is not found for deletion, do not throw exception.

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941661#comment-13941661
 ] 

Fengdong Yu commented on HDFS-6129:
---

bq. Another point: it would be better to also add a message to 'errors', as 
with the other exception cases, when a replica cannot be found.

Sorry, please ignore this line.

> When a replica is not found for deletion, do not throw exception.
> -
>
> Key: HDFS-6129
> URL: https://issues.apache.org/jira/browse/HDFS-6129
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h6129_20140319.patch
>
>
> It is actually a valid case if a replica is not found for deletion -- the 
> replica may have been deleted earlier, so we should not throw an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941579#comment-13941579
 ] 

Fengdong Yu commented on HDFS-6130:
---

I tried another way, but it still failed.
I first upgraded from 0.20.2-cdh3u1 to 1.2.1 successfully, then upgraded from 
1.2.1 to trunk and hit the NPE again.



> NPE during upgrade using trunk after RU merged
> --
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6129) When a replica is not found for deletion, do not throw exception.

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941527#comment-13941527
 ] 

Fengdong Yu commented on HDFS-6129:
---

One minor comment:
{code}
+f = info.getBlockFile();
 v = (FsVolumeImpl)info.getVolume();
 if (f == null) {
-  LOG.warn("Failed to delete replica " + invalidBlks[i]
+  errors.add("Failed to delete replica " + invalidBlks[i]
   +  ": File not found, volume=" + v);
-  error = true;
   continue;
 }
{code}

Move {code}v = (FsVolumeImpl)info.getVolume();{code} after the 'if' block.

Another point: it would be better to also add a message to 'errors', as with 
the other exception cases, when a replica cannot be found.
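
For illustration, a minimal self-contained sketch of the pattern under 
discussion (simplified, with hypothetical names rather than the actual 
FsDatasetImpl code): collect failure messages in an 'errors' list instead of 
throwing on the first missing replica, and resolve the volume only after the 
null check.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified model of the invalidate() loop discussed above.
class InvalidateSketch {
  // Returns the accumulated error messages; the caller decides whether
  // to log them or raise a single exception at the end.
  static List<String> deleteReplicas(String[] blocks, Map<String, File> blockFiles) {
    List<String> errors = new ArrayList<>();
    for (String b : blocks) {
      File f = blockFiles.get(b);  // analogous to info.getBlockFile()
      if (f == null) {
        // A missing replica is a valid case -- it may have been deleted
        // earlier -- so record a message and continue instead of throwing.
        errors.add("Failed to delete replica " + b + ": File not found");
        continue;
      }
      // Analogous to v = (FsVolumeImpl) info.getVolume(): only needed once
      // the file is known to exist, hence resolved after the 'if' block.
      File volume = f.getParentFile();
      if (!f.delete()) {
        errors.add("Failed to delete replica " + b + " on volume " + volume);
      }
    }
    return errors;
  }
}
{code}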


> When a replica is not found for deletion, do not throw exception.
> -
>
> Key: HDFS-6129
> URL: https://issues.apache.org/jira/browse/HDFS-6129
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h6129_20140319.patch
>
>
> It is actually a valid case if a replica is not found for deletion -- the 
> replica may have been deleted earlier, so we should not throw an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941513#comment-13941513
 ] 

Fengdong Yu commented on HDFS-6130:
---

This doesn't look related to RU. [~jingzhao], [~wheat9], I think this was 
caused by the protobuf-serialized fsimage. Can you take a look?

> NPE during upgrade using trunk after RU merged
> --
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941483#comment-13941483
 ] 

Fengdong Yu commented on HDFS-6130:
---

Some additional information:
1) I upgraded from 0.20.2-cdh3u1 to trunk with HA disabled - successful.
2) Stopped HDFS and enabled HA.
3) Started the journal nodes and ran 'hdfs namenode -initializeSharedEdits' - NPE.


> NPE during upgrade using trunk after RU merged
> --
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941480#comment-13941480
 ] 

Fengdong Yu commented on HDFS-6130:
---

But if I upgrade from 2.x to trunk, that works for me.

> NPE during upgrade using trunk after RU merged
> --
>
> Key: HDFS-6130
> URL: https://issues.apache.org/jira/browse/HDFS-6130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.
> I can upgrade successfully if I don't configure HA, but if HA is enabled,
> there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.
> {code}
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is 
> enabled
> 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
> total heap and retry cache entry expiry time is 60 millis
> 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map 
> NameNodeRetryCache
> 14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
> 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
> 275.3 KB
> 14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
> 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
> 14/03/20 15:06:41 INFO common.Storage: Lock on 
> /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO common.Storage: Lock on 
> /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 
> 7326@10-150-170-176
> 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
> 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
> 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
> 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6130:
-

 Summary: NPE during upgrade using trunk after RU merged
 Key: HDFS-6130
 URL: https://issues.apache.org/jira/browse/HDFS-6130
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu


I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance.

I can upgrade successfully if I don't configure HA, but if HA is enabled,
there is an NPE when I run 'hdfs namenode -initializeSharedEdits'.

{code}
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
total heap and retry cache entry expiry time is 60 millis
14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
275.3 KB
14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
14/03/20 15:06:41 INFO common.Storage: Lock on 
/data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO common.Storage: Lock on 
/data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
/
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6113) Rolling upgrade exception

2014-03-17 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938817#comment-13938817
 ] 

Fengdong Yu commented on HDFS-6113:
---

Continuing from the above: if your answer is yes, then we'd better add some 
additional text to HDFS-5778. [~szetszwo], do you think so? 

> Rolling upgrade exception
> 
>
> Key: HDFS-6113
> URL: https://issues.apache.org/jira/browse/HDFS-6113
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I have hadoop-2.3 running non-secure on the cluster. Then I built a trunk 
> instance, also non-secure.
> NN1 - active
> NN2 - standby
> DN1 - datanode 
> DN2 - datanode
> JN1,JN2,JN3 - Journal and ZK
> then on the NN2:
> {code}
> hadoop-daemon.sh stop namenode
> hadoop-daemon.sh stop zkfc
> {code}
> then:
> change the environment variables to the new Hadoop (trunk version)
> then:
> {code}
> hadoop-daemon.sh start namenode
> {code}
> NN2 throws exception:
> {code}
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not journal 
> CTime for one more JournalNodes. 1 exceptions thrown:
> 10.100.91.33:8485: Failed on local exception: java.io.EOFException; Host 
> Details : local host is: "10-204-8-136/10.204.8.136"; destination host is: 
> "jn33.com":8485;
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:631)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1383)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:738)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:360)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:258)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:444)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:500)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:656)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:641)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1294)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> {code}
> JN throws Exception:
> {code}
> 2014-03-18 12:19:01,960 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 8485: readAndProcess threw exception java.io.IOException: Unable 
> to read authentication method from client 10.204.8.136. Count of bytes read: 0
> java.io.IOException: Unable to read authentication method
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1344)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:761)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:560)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:535)
> 2014-03-18 12:19:01,960 DEBUG org.apache.hadoop.ipc.Server: IPC Server 
> listener on 8485: disconnecting client 10.204.8.136:39063. Number of active 
> connections: 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6113) Rolling upgrade exception

2014-03-17 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938815#comment-13938815
 ] 

Fengdong Yu commented on HDFS-6113:
---

[~jingzhao], I know RU starts from 2.4, but HDFS-5535 has been merged into 
trunk.
My target version is built from trunk. Do you mean that both the new and the 
old versions should support rolling upgrade? 

> Rolling upgrade exception
> 
>
> Key: HDFS-6113
> URL: https://issues.apache.org/jira/browse/HDFS-6113
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Fengdong Yu
>
> I have hadoop-2.3 running non-secure on the cluster. Then I built a trunk 
> instance, also non-secure.
> NN1 - active
> NN2 - standby
> DN1 - datanode 
> DN2 - datanode
> JN1,JN2,JN3 - Journal and ZK
> then on the NN2:
> {code}
> hadoop-daemon.sh stop namenode
> hadoop-daemon.sh stop zkfc
> {code}
> then:
> change the environment variables to the new Hadoop (trunk version)
> then:
> {code}
> hadoop-daemon.sh start namenode
> {code}
> NN2 throws exception:
> {code}
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not journal 
> CTime for one more JournalNodes. 1 exceptions thrown:
> 10.100.91.33:8485: Failed on local exception: java.io.EOFException; Host 
> Details : local host is: "10-204-8-136/10.204.8.136"; destination host is: 
> "jn33.com":8485;
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:631)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1383)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:738)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:360)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:258)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:444)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:500)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:656)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:641)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1294)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
> {code}
> JN throws Exception:
> {code}
> 2014-03-18 12:19:01,960 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 8485: readAndProcess threw exception java.io.IOException: Unable 
> to read authentication method from client 10.204.8.136. Count of bytes read: 0
> java.io.IOException: Unable to read authentication method
>   at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1344)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:761)
>   at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:560)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:535)
> 2014-03-18 12:19:01,960 DEBUG org.apache.hadoop.ipc.Server: IPC Server 
> listener on 8485: disconnecting client 10.204.8.136:39063. Number of active 
> connections: 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6113) Rolling upgrade exception

2014-03-17 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6113:
-

 Summary: Rolling upgrade exception
 Key: HDFS-6113
 URL: https://issues.apache.org/jira/browse/HDFS-6113
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Fengdong Yu


I have hadoop-2.3 running non-secure on the cluster. Then I built a trunk 
instance, also non-secure.

NN1 - active
NN2 - standby
DN1 - datanode 
DN2 - datanode
JN1,JN2,JN3 - Journal and ZK

then on the NN2:
{code}
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop zkfc
{code}

then:
change the environment variables to the new Hadoop (trunk version)

then:

{code}
hadoop-daemon.sh start namenode
{code}

NN2 throws exception:
{code}
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not journal CTime 
for one more JournalNodes. 1 exceptions thrown:
10.100.91.33:8485: Failed on local exception: java.io.EOFException; Host 
Details : local host is: "10-204-8-136/10.204.8.136"; destination host is: 
"jn33.com":8485;
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:631)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1383)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:738)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:600)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:360)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:258)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:444)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:500)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:656)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:641)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1294)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
{code}


JN throws Exception:
{code}
2014-03-18 12:19:01,960 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 8485: readAndProcess threw exception java.io.IOException: Unable to read 
authentication method from client 10.204.8.136. Count of bytes read: 0
java.io.IOException: Unable to read authentication method
at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1344)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:761)
at 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:560)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:535)
2014-03-18 12:19:01,960 DEBUG org.apache.hadoop.ipc.Server: IPC Server listener 
on 8485: disconnecting client 10.204.8.136:39063. Number of active connections: 
1
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades

2014-02-26 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913831#comment-13913831
 ] 

Fengdong Yu commented on HDFS-5535:
---

Thanks for the test plan. I'll start testing on our dev cluster.

> Umbrella jira for improved HDFS rolling upgrades
> 
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ha, hdfs-client, namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
> h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch, 
> h5535_20140221-2031.patch, h5535_20140224-1931.patch, 
> h5535_20140225-1225.patch, h5535_20140226-1328.patch, hdfs-5535-test-plan.pdf
>
>
> In order to roll a new HDFS release through a large cluster quickly and 
> safely, a few enhancements are needed in HDFS. An initial High level design 
> document will be attached to this jira, and sub-jiras will itemize the 
> individual tasks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2014-02-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912360#comment-13912360
 ] 

Fengdong Yu commented on HDFS-4504:
---

[~cmccabe], this has been open for a long time; please circle back to it. 

Can you also add these new configurable items to hdfs-default.xml? Thanks.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch, 
> HDFS-4504.015.patch, HDFS-4504.016.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.
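
To make the failure mode concrete, here is a minimal, self-contained sketch 
(a hypothetical stand-in class, not the real DFSOutputStream) of a stream 
whose close() captures its first failure and rethrows it on every retry, so 
the resources it guards are never released:

{code}
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical model of the "undead file" behavior described above:
// once close() fails, each later close() rethrows the same exception,
// and cleanup (e.g. releasing the file lease) never runs.
class StickyCloseStream extends OutputStream {
  private IOException closeFailure;

  @Override
  public void write(int b) {
    // Writes succeed; only close() is broken in this sketch.
  }

  @Override
  public void close() throws IOException {
    if (closeFailure == null) {
      closeFailure = new IOException("pipeline recovery failed");
    }
    throw closeFailure;  // rethrown on every retry; the lease stays held
  }
}
{code}

A fix in the direction this issue describes would release the lease and other 
resources before rethrowing, so a failed close() still cannot leak them.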



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911087#comment-13911087
 ] 

Fengdong Yu commented on HDFS-5778:
---

The new patch looks really good.

> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: h5778_20140220.patch, h5778_20140221.patch, 
> h5778_20140224.patch, h5778_20140224b.patch
>
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when HDFS-5535 branch is 
> merging to trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-6003) Add '-rollingUpgrade ' to namenode usage message

2014-02-24 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910172#comment-13910172
 ] 

Fengdong Yu commented on HDFS-6003:
---

[~vinayrpet], can you also take a look at HDFS-5778, which is the rolling 
upgrade documentation? I do think you'd better update HDFS-5578.

> Add '-rollingUpgrade ' to namenode usage message
> 
>
> Key: HDFS-6003
> URL: https://issues.apache.org/jira/browse/HDFS-6003
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Minor
>
> Add '-rollingUpgrade ' to namenode usage message



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-6001) In HDFS HA setup, FileSystem.getUri returns hdfs://

2014-02-23 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910054#comment-13910054
 ] 

Fengdong Yu commented on HDFS-6001:
---

[~jerryjch], I don't think this is a bug. Can you paste your context, or just 
tell us what you want to do?

I would advise discussing this on the mailing list, though, not here.

> In HDFS HA setup, FileSystem.getUri returns hdfs://
> -
>
> Key: HDFS-6001
> URL: https://issues.apache.org/jira/browse/HDFS-6001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Jerry He
>Priority: Minor
>
> When HDFS is set up with HA enabled, FileSystem.getUri returns 
> hdfs://<dfs.nameservices>.
> Here dfs.nameservices is defined when HA is enabled. In documentation:
> {quote}
> dfs.nameservices - the logical name for this new nameservice. Choose a logical 
> name for this nameservice, for example "mycluster", and use this logical name 
> for the value of this config option. The name you choose is arbitrary. It 
> will be used both for configuration and as the authority component of 
> absolute HDFS paths in the cluster.
> Note: If you are also using HDFS Federation, this configuration setting 
> should also include the list of other nameservices, HA or otherwise, as a 
> comma-separated list.
> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster</value>
> </property>
> {quote}
> This is probably ok or even intended.  But a caller may further process the 
> URI, for example, call URI.getHost(). This will return the 'mycluster', which 
> is not a valid host anywhere.
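
A minimal standalone sketch of the caller-side concern described above (no 
cluster required; this is plain java.net.URI behavior, with "mycluster" as the 
assumed nameservice ID): the logical nameservice parses as the URI host even 
though it is not a resolvable hostname.

{code}
import java.net.URI;

public class HaUriSketch {
  public static void main(String[] args) {
    // With HA, the filesystem URI's authority is the logical nameservice
    // ID from dfs.nameservices (e.g. "mycluster"), not a real host.
    URI uri = URI.create("hdfs://mycluster");
    System.out.println(uri.getHost());  // prints "mycluster"
  }
}
{code}

Callers that feed URI.getHost() into DNS lookups or host-based checks would 
therefore need to treat the HA nameservice ID specially.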



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-23 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910011#comment-13910011
 ] 

Fengdong Yu commented on HDFS-5778:
---

Hi [~szetszwo],
please delete the comment above; it has invalid formatting.

I read the document again:
{code}
hdfs dfsadmin -rollingUpgrade start
{code}

What steps does this command perform? 
Does it shut down the *cluster* first and then start HDFS with the -upgrade 
option, or does it start upgrading the NNs and DNs one by one?

From the code patch, it only calls
{code}
namesystem.startRollingUpgrade();
{code}


> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5778_20140220.patch, h5778_20140221.patch
>
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when HDFS-5535 branch is 
> merging to trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-23 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910009#comment-13910009
 ] 

Fengdong Yu commented on HDFS-5778:
---

[~szetszwo], I read the document again:
{code}
hdfs dfsadmin -rollingUpgrade start
{code}

What steps does this command perform? 
Does it shut down the *cluster* first and then start HDFS with the -upgrade 
option, or does it start upgrading the NNs and DNs one by one?

From the code patch, it only calls
{code}
namesystem.startRollingUpgrade();
{code}



> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5778_20140220.patch, h5778_20140221.patch
>
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when HDFS-5535 branch is 
> merging to trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-22 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909670#comment-13909670
 ] 

Fengdong Yu commented on HDFS-5778:
---

Thanks [~szetszwo], I misunderstood "dfsadmin -rollingUpgrade start"; I 
believed this command would shut down the DNs and NNs one by one, not all 
daemons at once.

For a rolling upgrade, daemons should be upgraded one by one, right? If that's 
not currently the case, we'd better clarify it in the document.


> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5778_20140220.patch, h5778_20140221.patch
>
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when HDFS-5535 branch is 
> merging to trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

