[jira] [Commented] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482158#comment-13482158
 ] 

Hadoop QA commented on HDFS-4105:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550378/HDFS-4105.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3384//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3384//console

This message is automatically generated.

> the SPNEGO user for secondary namenode should use the web keytab
> 
>
> Key: HDFS-4105
> URL: https://issues.apache.org/jira/browse/HDFS-4105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.2-alpha
>Reporter: Arpit Gupta
>Assignee: Arpit Gupta
> Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch
>
>
> This is similar to HDFS-3466 where we made sure the namenode checks for the 
> web keytab before it uses the namenode keytab.
> The same needs to be done for secondary namenode as well.
> {code}
> String httpKeytab = 
>   conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
> if (httpKeytab != null && !httpKeytab.isEmpty()) {
>   params.put("kerberos.keytab", httpKeytab);
> }
> {code}
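
A minimal sketch of the fix the description asks for, assuming the config key
DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY is the "web keytab"
meant above (mirroring what HDFS-3466 did for the namenode):

{code}
// Sketch only, not the committed patch: prefer the SPNEGO/web keytab if
// configured; otherwise fall back to the secondary namenode's own keytab.
String httpKeytab =
    conf.get(DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
if (httpKeytab == null || httpKeytab.isEmpty()) {
  httpKeytab =
      conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
}
if (httpKeytab != null && !httpKeytab.isEmpty()) {
  params.put("kerberos.keytab", httpKeytab);
}
{code}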

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482131#comment-13482131
 ] 

Jing Zhao commented on HDFS-4067:
-

This testcase failure was reported in HDFS-3948 before. Will run 
TestUnderReplicatedBlocks in a loop later.

> TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
> ---
>
> Key: HDFS-4067
> URL: https://issues.apache.org/jira/browse/HDFS-4067
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Jing Zhao
>  Labels: test-fail
> Attachments: HDFS-4067.trunk.001.patch
>
>
> After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see 
> the root cause of the failure is ReplicaAlreadyExistsException:
> {noformat}
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
> already exists in state FINALIZED and thus cannot be created.
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.&lt;init&gt;(BlockReceiver.java:155)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482127#comment-13482127
 ] 

Hadoop QA commented on HDFS-4067:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12550126/HDFS-4067.trunk.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHDFS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3383//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3383//console

This message is automatically generated.

> TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
> ---
>
> Key: HDFS-4067
> URL: https://issues.apache.org/jira/browse/HDFS-4067
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Jing Zhao
>  Labels: test-fail
> Attachments: HDFS-4067.trunk.001.patch
>
>
> After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see 
> the root cause of the failure is ReplicaAlreadyExistsException:
> {noformat}
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
> already exists in state FINALIZED and thus cannot be created.
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.&lt;init&gt;(BlockReceiver.java:155)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482123#comment-13482123
 ] 

Suresh Srinivas edited comment on HDFS-4097 at 10/23/12 5:31 AM:
-

The design document already covers the admin operations for allowing snapshots 
and marking a directory as snapshottable. Here are the Admin CLIs for that:
* Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }}
* Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }}

User/Admin CLIs for creating, deleting and listing snapshots:
* Creating a snapshot: {{hadoop dfs -createSnapshot  }}
* Deleting a snapshot: {{hadoop dfs -deleteSnapshot  }}
* Listing of the snapshots: {{hadoop dfs -listSnapshots }}

Will try to get an updated document posted by tomorrow with this information.



  was (Author: sureshms):
The design document already covers the admin operations for allowing snapshots 
and marking a directory as snapshottable. Here are the Admin CLIs for that:
* Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }}
* Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }}

User/Admin CLIs for creating, deleting and listing snapshots:
* Creating a snapshot: {{hadoop dfs -createSnapshot  }}
* Deleting a snapshot: {{hadoop dfs -deleteSnapshot }}
* Listing of the snapshots: {{hadoop dfs -listSnapshots}}

Will try to get an updated document posted by tomorrow with this information.


  
> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482123#comment-13482123
 ] 

Suresh Srinivas edited comment on HDFS-4097 at 10/23/12 5:30 AM:
-

The design document already covers the admin operations for allowing snapshots 
and marking a directory as snapshottable. Here are the Admin CLIs for that:
* Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }}
* Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }}

User/Admin CLIs for creating, deleting and listing snapshots:
* Creating a snapshot: {{hadoop dfs -createSnapshot  }}
* Deleting a snapshot: {{hadoop dfs -deleteSnapshot }}
* Listing of the snapshots: {{hadoop dfs -listSnapshots}}

Will try to get an updated document posted by tomorrow with this information.



  was (Author: sureshms):
The design document already covers the admin operations for allowing snapshots 
and marking a directory as snapshottable. Here are the Admin CLIs for that:
* Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }}
* Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }}

User/Admin CLIs for creating, deleting and listing snapshots:
* Creating a snapshot: {{hadoop dfs -createSnapshot   }}
* Deleting a snapshot: {{hadoop dfs -deleteSnapshot }}
* Listing of the snapshots: {{hadoop dfs -listSnapshot}}

Will try to get an updated document posted by tomorrow with this information.


  
> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory

2012-10-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4084:
--

Labels: needs-test  (was: )

> provide CLI support for allow and disallow snapshot on a directory
> --
>
> Key: HDFS-4084
> URL: https://issues.apache.org/jira/browse/HDFS-4084
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node, tools
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4084.patch, HDFS-4084.patch
>
>
> To provide CLI support to allow snapshot, disallow snapshot on a directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482125#comment-13482125
 ] 

Suresh Srinivas commented on HDFS-4084:
---

Comments:
# When you are overriding, there is no need to add javadoc to the method if you 
plan on inheriting changes from the super class. So the javadoc for the 
DistributedFileSystem methods you have added can be deleted.
# Why are you adding @VisibleForTesting for snapshot-related methods in 
FSNamesystem? Also, please add proper javadoc to the methods.
# FSNamesystem.java is unnecessarily importing SnapshotInfo?
# NamenodeRpcServer.java: remove the comment "Client Protocol" for overridden 
protocol methods.


> provide CLI support for allow and disallow snapshot on a directory
> --
>
> Key: HDFS-4084
> URL: https://issues.apache.org/jira/browse/HDFS-4084
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node, tools
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4084.patch, HDFS-4084.patch
>
>
> To provide CLI support to allow snapshot, disallow snapshot on a directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482123#comment-13482123
 ] 

Suresh Srinivas commented on HDFS-4097:
---

The design document already covers the admin operations for allowing snapshots 
and marking a directory as snapshottable. Here are the Admin CLIs for that:
* Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }}
* Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }}

User/Admin CLIs for creating, deleting and listing snapshots:
* Creating a snapshot: {{hadoop dfs -createSnapshot   }}
* Deleting a snapshot: {{hadoop dfs -deleteSnapshot }}
* Listing of the snapshots: {{hadoop dfs -listSnapshot}}

Will try to get an updated document posted by tomorrow with this information.



> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482119#comment-13482119
 ] 

Aaron T. Myers commented on HDFS-4097:
--

bq. Aaron, a user should be able to create snapshots for the directories he is 
allowed to create snapshots on.

Right, but the question is: which directories can a user create snapshots on? Is 
it a super-user-only operation? This is why I asked the following in HDFS-2802:

{quote}
One question regarding the user experience that I don't see described in the 
document: will creating a snapshot require super user privileges? Or can any 
user create a snapshot of a subdirectory? If the latter, what permissions are 
required to create a snapshot? What if the user doesn't have permissions on 
some files under the subtree of the snapshot target? Does this result in an 
incomplete snapshot? Or a completely failed snapshot?
{quote}

To which you responded:

{quote}
Will add usecases
{quote}

So, what's the answer to my question?

> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482116#comment-13482116
 ] 

Suresh Srinivas commented on HDFS-4097:
---

Aaron, a user should be able to create snapshots for the directories he is 
allowed to create snapshots on. Please see HDFS-4084, where admin-related 
commands are being added.

> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482115#comment-13482115
 ] 

Aaron T. Myers commented on HDFS-4097:
--

I'm a little skeptical that this should be part of the FileSystem class. Is it 
expected that creating a snapshot is something that will be done by ordinary 
users? In a comment in HDFS-2802, I recommended that we make creating and 
deleting a snapshot a super-user only operation. If that's the case, then I 
think we should move this functionality to the HdfsAdmin class.
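
A hedged sketch of that suggestion (the method and the wrapped {{dfs}} field 
are assumptions for illustration, not an existing API): snapshot creation 
would live on HdfsAdmin and delegate to the underlying DistributedFileSystem.

{code}
// Sketch only: expose snapshot operations on HdfsAdmin (super-user tooling)
// rather than on the general FileSystem API.
public void createSnapshot(Path snapshotRoot, String snapshotName)
    throws IOException {
  // dfs: the DistributedFileSystem instance HdfsAdmin wraps
  dfs.createSnapshot(snapshotRoot, snapshotName);
}
{code}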

> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482111#comment-13482111
 ] 

Suresh Srinivas commented on HDFS-4097:
---

Comments:
# FileSystem.java - "doesn't support listSnapshot" - change listSnapshot to 
listSnapshots
# SnapshotInfo.java - this class cannot be private and evolving since it is 
exposed in FileSystem.java, a class that has audience public.
# SnapshotCommand.java
#* there are a couple of places where you're using "_" instead of "-" in 
registerCommand()
#* Instead of printing "Snap Name" you could just print "Name"
#* I would format the printing based on the "ls" command output. At least the 
snapshot name should be printed at the end (given its length could vary). What 
format are you printing the date in? Can you post an example output?
# javadoc {{@see ClientProtocol#deleteSnap(String snapshotName, String 
snapshotRoot)}} - the method should be deleteSnapshot
# When you are overriding, there is no need to add javadoc to the method if you 
plan on inheriting changes from the super class. So the javadoc for the 
DistributedFileSystem methods you have added can be deleted.
# FSNamesystem.java
#* Why are you adding @VisibleForTesting for snapshot-related methods in 
FSNamesystem? Also, please add proper javadoc to the methods.
#* Why not just return new SnapshotInfo[0] from #listSnapshots()?
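
To illustrate the last point, a hedged sketch (the method body and the helper 
below are assumptions, not the actual patch): returning an empty array instead 
of null lets callers iterate without a null check.

{code}
// Sketch only: listSnapshots() returns an empty array when there is nothing
// to list, so "for (SnapshotInfo s : listSnapshots(...))" is always safe.
public SnapshotInfo[] listSnapshots(String snapshotRoot) {
  List<SnapshotInfo> found = findSnapshots(snapshotRoot);  // hypothetical helper
  if (found == null || found.isEmpty()) {
    return new SnapshotInfo[0];
  }
  return found.toArray(new SnapshotInfo[found.size()]);
}
{code}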


> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482108#comment-13482108
 ] 

Jing Zhao commented on HDFS-4106:
-

Failing testcases are related to HDFS-3616 (TestWebHdfsWithMultipleNameNodes) 
and HDFS-4067 (TestUnderReplicatedBlocks).

> BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
> declared as volatile
> --
>
> Key: HDFS-4106
> URL: https://issues.apache.org/jira/browse/HDFS-4106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4106-trunk.001.patch
>
>
> All these variables may be assigned/read by a testing thread (through 
> BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus 
> they should be declared as volatile to ensure "happens-before" consistency.
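
A minimal sketch of the change the summary describes (the field names come 
from the issue title; the types and visibility are assumptions):

{code}
// volatile guarantees that a write by the actor thread happens-before a
// subsequent read by the testing thread (and vice versa), without locking.
private volatile long lastBlockReport;
private volatile long lastDeletedReport;
private volatile long lastHeartbeat;
{code}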

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-3616:
---

Assignee: Jing Zhao

> TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
> in DN shutdown
> --
>
> Key: HDFS-3616
> URL: https://issues.apache.org/jira/browse/HDFS-3616
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Jing Zhao
>
> I have seen this in precommit build #2743
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
>   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482105#comment-13482105
 ] 

Jing Zhao commented on HDFS-3616:
-

Also got this exception in HDFS-4106. It seems the exception happens because 
one thread is iterating over the HashMap bpSlices (FsVolumeImpl#shutdown) while 
another thread is removing entries from the same HashMap 
(FsVolumeImpl#shutdownBlockPool). A quick fix could be changing bpSlices from a 
HashMap to a ConcurrentHashMap.
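
A sketch of that quick fix (the value type BlockPoolSlice is assumed from the 
surrounding code): ConcurrentHashMap's iterators are weakly consistent, so 
iteration in shutdown() no longer throws ConcurrentModificationException when 
shutdownBlockPool() removes entries concurrently.

{code}
// Before (sketch): iteration plus concurrent remove throws CME.
// private final Map<String, BlockPoolSlice> bpSlices =
//     new HashMap<String, BlockPoolSlice>();

// After (sketch): weakly consistent iteration, no CME.
private final Map<String, BlockPoolSlice> bpSlices =
    new ConcurrentHashMap<String, BlockPoolSlice>();
{code}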

> TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
> in DN shutdown
> --
>
> Key: HDFS-3616
> URL: https://issues.apache.org/jira/browse/HDFS-3616
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Uma Maheswara Rao G
>
> I have seen this in precommit build #2743
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
>   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock

2012-10-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-4062.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

I committed the patch to branch-1. Thank you Jing for fixing this.

> In branch-1, FSNameSystem#invalidateWorkForOneNode and 
> FSNameSystem#computeReplicationWorkForBlock should print logs outside of the 
> namesystem lock
> ---
>
> Key: HDFS-4062
> URL: https://issues.apache.org/jira/browse/HDFS-4062
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4062.b1.001.patch
>
>
> Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode 
> and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long 
> info-level log messages outside of the namesystem lock. We create this 
> separate jira since the description and code are different for 1.x.
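
The pattern being described, as a hedged sketch (names are illustrative, not 
the branch-1 code): assemble the message while holding the lock, but perform 
the potentially slow log I/O only after releasing it.

{code}
// Sketch only: string assembly is cheap and stays under the lock;
// the actual logging call happens lock-free.
String logMsg;
synchronized (this) {
  logMsg = "block " + block + " chosen targets: " + targets;
}
NameNode.stateChangeLog.info(logMsg);
{code}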

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482101#comment-13482101
 ] 

Suresh Srinivas commented on HDFS-4062:
---

test-patch is broken in branch-1. Tests not needed because this patch just 
moves the logs outside the lock.

+1 for the patch.

> In branch-1, FSNameSystem#invalidateWorkForOneNode and 
> FSNameSystem#computeReplicationWorkForBlock should print logs outside of the 
> namesystem lock
> ---
>
> Key: HDFS-4062
> URL: https://issues.apache.org/jira/browse/HDFS-4062
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4062.b1.001.patch
>
>
> Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode 
> and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long 
> info-level log messages outside of the namesystem lock. We create this 
> separate jira since the description and code are different for 1.x.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482098#comment-13482098
 ] 

Hadoop QA commented on HDFS-4106:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12550382/HDFS-4106-trunk.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
  
org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3382//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3382//console

This message is automatically generated.

> BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
> declared as volatile
> --
>
> Key: HDFS-4106
> URL: https://issues.apache.org/jira/browse/HDFS-4106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4106-trunk.001.patch
>
>
> All these variables may be assigned/read by a testing thread (through 
> BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus 
> they should be declared as volatile to ensure "happens-before" consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4107) Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4107:
-

Status: Patch Available  (was: Open)

> Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
> -
>
> Key: HDFS-4107
> URL: https://issues.apache.org/jira/browse/HDFS-4107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4107_20121022.patch
>
>
> In the namenode code, there are many individual routines checking whether an 
> inode is null and whether it could be cast to 
> INodeFile/INodeFileUnderConstruction.  Let's add utility methods for such 
> checks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4107) Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4107:
-

Attachment: h4107_20121022.patch

h4107_20121022.patch: adds INodeFile.valueOf(..) and 
INodeFileUnderConstruction.valueOf(..).
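
A hedged sketch of what INodeFile.valueOf(..) could look like (the exact 
signature and exception types are assumptions, not the attached patch):

{code}
// Sketch only: centralize the null check and the cast in one utility.
public static INodeFile valueOf(INode inode, String path) throws IOException {
  if (inode == null) {
    throw new FileNotFoundException("File does not exist: " + path);
  }
  if (!(inode instanceof INodeFile)) {
    throw new IOException("Path is not a file: " + path);
  }
  return (INodeFile) inode;
}
{code}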

> Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
> -
>
> Key: HDFS-4107
> URL: https://issues.apache.org/jira/browse/HDFS-4107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4107_20121022.patch
>
>
> In the namenode code, there are many individual routines checking whether an 
> inode is null and whether it could be cast to 
> INodeFile/INodeFileUnderConstruction.  Let's add utility methods for such 
> checks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4107) Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-4107:


 Summary: Add utility methods to cast INode to INodeFile and 
INodeFileUnderConstruction
 Key: HDFS-4107
 URL: https://issues.apache.org/jira/browse/HDFS-4107
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


In the namenode code, there are many individual routines checking whether an 
inode is null and whether it could be cast to 
INodeFile/INodeFileUnderConstruction.  Let's add utility methods for such 
checks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482076#comment-13482076
 ] 

Suresh Srinivas commented on HDFS-2434:
---

Did you run the test in a loop to ensure it does not fail?

> TestNameNodeMetrics.testCorruptBlock fails intermittently
> -
>
> Key: HDFS-2434
> URL: https://issues.apache.org/jira/browse/HDFS-2434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Jing Zhao
>  Labels: test-fail
> Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, 
> HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch, 
> HDFS-2434.trunk.005.patch
>
>
> java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:91)
>   at org.junit.Assert.failNotEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:126)
>   at org.junit.Assert.assertEquals(Assert.java:470)
>   at 
> org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
>   at 
> org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
>   at 
> org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:168)
>   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482058#comment-13482058
 ] 

Hadoop QA commented on HDFS-4104:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550383/hdfs-4104.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3381//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3381//console

This message is automatically generated.

> dfs -test -d prints inappropriate error on nonexistent directory
> 
>
> Key: HDFS-4104
> URL: https://issues.apache.org/jira/browse/HDFS-4104
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-4104.txt
>
>
> Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It 
> should not generate any output due to missing files.  Alas, it prints an 
> error message when {{foo}} does not exist.
> {code}
> $ hdfs dfs -test -d foo; echo $?
> test: `foo': No such file or directory
> 1
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4045) SecondaryNameNode cannot read from QuorumJournal URI

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482022#comment-13482022
 ] 

Hadoop QA commented on HDFS-4045:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550359/hdfs-4045.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3380//console

This message is automatically generated.

> SecondaryNameNode cannot read from QuorumJournal URI
> 
>
> Key: HDFS-4045
> URL: https://issues.apache.org/jira/browse/HDFS-4045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Vinithra Varadharajan
>Assignee: Andy Isaacson
> Attachments: hdfs-4045.txt
>
>
> If HDFS is set up in basic mode (non-HA) with QuorumJournal, and the 
> dfs.namenode.edits.dir is set to only the QuorumJournal URI and no local dir, 
> the SecondaryNameNode is unable to do a checkpoint.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482013#comment-13482013
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3540:
--

> After the default is changed, all existing tests except TestEditLogToleration 
> ...

I should say "except for a few" instead since some Recovery Mode tests run with 
DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY == -1.

> Further improvement on recovery mode and edit log toleration in branch-1
> 
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 1.2.0
>
> Attachments: h3540_20120925.patch, h3540_20120926.patch, 
> h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch
>
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branches are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-3540.
--

   Resolution: Fixed
Fix Version/s: 1.2.0
 Hadoop Flags: Reviewed

I have committed this.

> Further improvement on recovery mode and edit log toleration in branch-1
> 
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 1.2.0
>
> Attachments: h3540_20120925.patch, h3540_20120926.patch, 
> h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch
>
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branches are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481992#comment-13481992
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3540:
--

After the default is changed, all existing tests except TestEditLogToleration 
run with DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY==0.

> Further improvement on recovery mode and edit log toleration in branch-1
> 
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h3540_20120925.patch, h3540_20120926.patch, 
> h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch
>
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branches are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4106:


Status: Patch Available  (was: Open)

> BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
> declared as volatile
> --
>
> Key: HDFS-4106
> URL: https://issues.apache.org/jira/browse/HDFS-4106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4106-trunk.001.patch
>
>
> All these variables may be assigned/read by a testing thread (through 
> BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus 
> they should be declared as volatile to ensure "happens-before" consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4067:


Status: Patch Available  (was: Open)

> TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
> ---
>
> Key: HDFS-4067
> URL: https://issues.apache.org/jira/browse/HDFS-4067
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Jing Zhao
>  Labels: test-fail
> Attachments: HDFS-4067.trunk.001.patch
>
>
> After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see 
> the root cause of the failure is ReplicaAlreadyExistsException:
> {noformat}
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
> already exists in state FINALIZED and thus cannot be created.
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.&lt;init&gt;(BlockReceiver.java:155)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481983#comment-13481983
 ] 

Suresh Srinivas commented on HDFS-3540:
---

+1 for the patch. Are there any tests for 
DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY set to 0?

> Further improvement on recovery mode and edit log toleration in branch-1
> 
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h3540_20120925.patch, h3540_20120926.patch, 
> h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch
>
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branch are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory

2012-10-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4104:


Status: Patch Available  (was: Open)

> dfs -test -d prints inappropriate error on nonexistent directory
> 
>
> Key: HDFS-4104
> URL: https://issues.apache.org/jira/browse/HDFS-4104
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-4104.txt
>
>
> Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It 
> should not generate any output due to missing files.  Alas, it prints an 
> error message when {{foo}} does not exist.
> {code}
> $ hdfs dfs -test -d foo; echo $?
> test: `foo': No such file or directory
> 1
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory

2012-10-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4104:


Attachment: hdfs-4104.txt

Remove code to print errors on file not found from {{dfs -test}}, for feature 
parity with /usr/bin/test.

The existing test frameworks don't provide a way to verify that stderr is 
clean, so no tests were updated.
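
A hedged sketch of the behavioral goal (the actual Test command code is not 
shown here; the names below are illustrative):

{code}
// On a missing path, set a non-zero exit code without printing anything,
// matching /usr/bin/test semantics.
int exitCode;
try {
  FileStatus status = fs.getFileStatus(path);
  exitCode = status.isDirectory() ? 0 : 1;
} catch (FileNotFoundException e) {
  exitCode = 1;  // stay quiet: no "No such file or directory" message
}
{code}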

> dfs -test -d prints inappropriate error on nonexistent directory
> 
>
> Key: HDFS-4104
> URL: https://issues.apache.org/jira/browse/HDFS-4104
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-4104.txt
>
>
> Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It 
> should not generate any output due to missing files.  Alas, it prints an 
> error message when {{foo}} does not exist.
> {code}
> $ hdfs dfs -test -d foo; echo $?
> test: `foo': No such file or directory
> 1
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481978#comment-13481978
 ] 

Hadoop QA commented on HDFS-4101:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550352/HDFS-4101-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3379//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3379//console

This message is automatically generated.

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> ZooKeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Assignee: Damien Hardy
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-4101-1.patch
>
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, as HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time): ZKFC is the only one 
> that fails and stops.
> All the others can wait for each other for some time, independently of the start 
> order, like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that 
> the system can settle and become stable in a few seconds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory

2012-10-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4104:


Summary: dfs -test -d prints inappropriate error on nonexistent directory  
(was: dfs -test -d prints inappropriate error on nonexistent directory, and -z 
should be -s)

> dfs -test -d prints inappropriate error on nonexistent directory
> 
>
> Key: HDFS-4104
> URL: https://issues.apache.org/jira/browse/HDFS-4104
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
>
> Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It 
> should not generate any output due to missing files.  Alas, it prints an 
> error message when {{foo}} does not exist.
> {code}
> $ hdfs dfs -test -d foo; echo $?
> test: `foo': No such file or directory
> 1
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4106:


Attachment: HDFS-4106-trunk.001.patch

> BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
> declared as volatile
> --
>
> Key: HDFS-4106
> URL: https://issues.apache.org/jira/browse/HDFS-4106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4106-trunk.001.patch
>
>
> All these variables may be assigned/read by a testing thread (through 
> BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
> Thus they should be declared volatile to ensure "happens-before" consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4106:
---

 Summary: BPServiceActor#lastHeartbeat, lastBlockReport and 
lastDeletedReport should be declared as volatile
 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch

All these variables may be assigned/read by a testing thread (through 
BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
Thus they should be declared volatile to ensure "happens-before" consistency.
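
As a hedged illustration of the pattern (names simplified, not the actual 
BPServiceActor code):

{code}
// Simplified sketch; the field stands in for lastHeartbeat and friends.
public class ActorStateSketch {
  // volatile makes the test thread's write visible to the actor thread
  // (happens-before) instead of a possibly stale cached value.
  private volatile long lastHeartbeat = 0L;

  // Called from a testing thread, e.g. via a trigger method.
  void triggerHeartbeat() {
    lastHeartbeat = 0L;  // force the next loop iteration to heartbeat
  }

  // Called by the actor thread in its main loop.
  boolean heartbeatDue(long now, long intervalMs) {
    return now - lastHeartbeat >= intervalMs;
  }
}
{code}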

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4080) Add an option to disable block-level state change logging

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481970#comment-13481970
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4080:
--

Let's put the block-level log in a BlockManager state change log (a new log).  
Then, it can be set independently of the NN state change log.
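
A hedged sketch of that suggestion (the logger name "BlockStateChange" is an 
assumption, not a committed API):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: a logger separate from the NN-wide state change log,
// so it can be silenced in log4j.properties without losing other messages.
class BlockStateChangeLogSketch {
  static final Log blockLog = LogFactory.getLog("BlockStateChange");

  static void logInvalidate(String block, String node) {
    if (blockLog.isInfoEnabled()) {  // skip string building when disabled
      blockLog.info("BLOCK* invalidate: " + block + " on " + node);
    }
  }
}
{code}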

> Add an option to disable block-level state change logging
> -
>
> Key: HDFS-4080
> URL: https://issues.apache.org/jira/browse/HDFS-4080
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Kihwal Lee
>
> Although the block-level logging in the namenode is useful for debugging, it can 
> add significant overhead to busy HDFS clusters since these messages are logged 
> while the namespace write lock is held. One example is shown in HDFS-4075. In 
> that example, the write lock was held for 5 minutes while logging 11 million log 
> messages for 5.5 million block invalidation events. 
> It would be useful to have an option to disable these block-level log 
> messages while keeping other state change messages going.  If others feel that 
> they can be turned into DEBUG (with the addition of isDebugEnabled() checks), 
> that may also work, but there might be people depending on the messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab

2012-10-22 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-4105:
--

Attachment: HDFS-4105.patch

patch for trunk.

> the SPNEGO user for secondary namenode should use the web keytab
> 
>
> Key: HDFS-4105
> URL: https://issues.apache.org/jira/browse/HDFS-4105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.2-alpha
>Reporter: Arpit Gupta
>Assignee: Arpit Gupta
> Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch
>
>
> This is similar to HDFS-3466 where we made sure the namenode checks for the 
> web keytab before it uses the namenode keytab.
> The same needs to be done for secondary namenode as well.
> {code}
> String httpKeytab = 
>   conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
> if (httpKeytab != null && !httpKeytab.isEmpty()) {
>   params.put("kerberos.keytab", httpKeytab);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab

2012-10-22 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-4105:
--

Status: Patch Available  (was: Open)

> the SPNEGO user for secondary namenode should use the web keytab
> 
>
> Key: HDFS-4105
> URL: https://issues.apache.org/jira/browse/HDFS-4105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha, 1.1.0
>Reporter: Arpit Gupta
>Assignee: Arpit Gupta
> Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch
>
>
> This is similar to HDFS-3466 where we made sure the namenode checks for the 
> web keytab before it uses the namenode keytab.
> The same needs to be done for secondary namenode as well.
> {code}
> String httpKeytab = 
>   conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
> if (httpKeytab != null && !httpKeytab.isEmpty()) {
>   params.put("kerberos.keytab", httpKeytab);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory, and -z should be -s

2012-10-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4104:


Summary: dfs -test -d prints inappropriate error on nonexistent directory, 
and -z should be -s  (was: dfs -test -d prints inappropriate error on 
nonexistent directory)

> dfs -test -d prints inappropriate error on nonexistent directory, and -z 
> should be -s
> -
>
> Key: HDFS-4104
> URL: https://issues.apache.org/jira/browse/HDFS-4104
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
>
> Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It 
> should not generate any output due to missing files.  Alas, it prints an 
> error message when {{foo}} does not exist.
> {code}
> $ hdfs dfs -test -d foo; echo $?
> test: `foo': No such file or directory
> 1
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager

2012-10-22 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481942#comment-13481942
 ] 

Kan Zhang commented on HDFS-4056:
-

{quote}
bq. I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be 
enabled simultaneously
{quote}

My above comment should read "I don't see a use case where SIMPLE + SIMPLE and 
SIMPLE + TOKEN need to be CONFIGURED simultaneously."

bq. In the absence of a new config key, the ambiguity introduced by SIMPLE 
effectively allows token-free operation.

That's why I suggested earlier that there should be two config keys, one for the 
initial auth method and one for the subsequent one. If we have those configs, we 
can avoid unnecessarily issuing NN tokens when SIMPLE + SIMPLE is configured.
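
To make the two-key idea concrete, a hedged sketch; both key names below are 
invented for illustration and are not real Hadoop configuration properties:

{code}
import org.apache.hadoop.conf.Configuration;

// Purely illustrative; neither key exists in Hadoop.
class AuthConfSketch {
  static final String INITIAL_AUTH_KEY = "example.security.initial.auth";
  static final String SUBSEQUENT_AUTH_KEY = "example.security.subsequent.auth";

  // Only issue NN tokens when subsequent connections are configured to
  // authenticate with tokens; SIMPLE + SIMPLE then issues none.
  static boolean tokensNeeded(Configuration conf) {
    return "token".equals(conf.get(SUBSEQUENT_AUTH_KEY, "simple"));
  }
}
{code}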

> Always start the NN's SecretManager
> ---
>
> Key: HDFS-4056
> URL: https://issues.apache.org/jira/browse/HDFS-4056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-4056.patch
>
>
> To support the ability to use tokens regardless of whether kerberos is 
> enabled, the NN's secret manager should always be started.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab

2012-10-22 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-4105:
--

Attachment: HDFS-4105.branch-1.patch

patch for branch-1

> the SPNEGO user for secondary namenode should use the web keytab
> 
>
> Key: HDFS-4105
> URL: https://issues.apache.org/jira/browse/HDFS-4105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.2-alpha
>Reporter: Arpit Gupta
>Assignee: Arpit Gupta
> Attachments: HDFS-4105.branch-1.patch
>
>
> This is similar to HDFS-3466 where we made sure the namenode checks for the 
> web keytab before it uses the namenode keytab.
> The same needs to be done for secondary namenode as well.
> {code}
> String httpKeytab = 
>   conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
> if (httpKeytab != null && !httpKeytab.isEmpty()) {
>   params.put("kerberos.keytab", httpKeytab);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab

2012-10-22 Thread Arpit Gupta (JIRA)
Arpit Gupta created HDFS-4105:
-

 Summary: the SPNEGO user for secondary namenode should use the web 
keytab
 Key: HDFS-4105
 URL: https://issues.apache.org/jira/browse/HDFS-4105
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha, 1.1.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta


This is similar to HDFS-3466 where we made sure the namenode checks for the web 
keytab before it uses the namenode keytab.

The same needs to be done for secondary namenode as well.

{code}
String httpKeytab = 
  conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
if (httpKeytab != null && !httpKeytab.isEmpty()) {
  params.put("kerberos.keytab", httpKeytab);
}
{code}
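
For comparison, a hedged sketch of the intended fix, written by analogy with the 
HDFS-3466 namenode change; {{conf}} and {{params}} are the same variables as in 
the snippet above, and the web keytab key is assumed to come from DFSConfigKeys:

{code}
// Sketch by analogy with HDFS-3466, not the attached patch itself.
String httpKeytab =
    conf.get(DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
if (httpKeytab == null || httpKeytab.isEmpty()) {
  // Fall back to the secondary namenode keytab when no web keytab is set.
  httpKeytab =
      conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
}
if (httpKeytab != null && !httpKeytab.isEmpty()) {
  params.put("kerberos.keytab", httpKeytab);
}
{code}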

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager

2012-10-22 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481913#comment-13481913
 ] 

Daryn Sharp commented on HDFS-4056:
---

bq. {quote}The NN does not allow a UGI auth of token to issue, renew, or cancel 
tokens.{quote}
bq.  Since only connections authenticated using the initial auth method(s) are 
allowed to fetch tokens (I assume we keep that behavior) [...]

Yes, that behavior has not changed.

bq.  [...] the server needs to be able to make a determination on whether a 
connection is authenticated as an initial connection or a subsequent one. 

I completely understand the point you are trying to make here.  With a secure 
cluster, a task (subsequent connection) must use DIGEST-MD5 with a token, else 
it will fail because it lacks a TGT for KERBEROS.  The distinction between 
initial and subsequent connection is unambiguous based on KERBEROS/DIGEST-MD5.  
That distinction will hold true for /DIGEST-MD5.

bq. I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be 
enabled simultaneously

SIMPLE is a special case where it's ambiguous whether it's an initial or 
subsequent connection.  The server has no way to know, so it's up to the client 
to "do the right thing".  This is where a conf setting that the job submitter 
adds would instruct the RPC client to only use tokens, which would enforce 
SIMPLE + TOKEN.

bq. it is desirable to be able to turn off any token related stuff (we can do 
that today)

In the absence of a new config key, the ambiguity introduced by SIMPLE 
effectively allows token-free operation.

> Always start the NN's SecretManager
> ---
>
> Key: HDFS-4056
> URL: https://issues.apache.org/jira/browse/HDFS-4056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-4056.patch
>
>
> To support the ability to use tokens regardless of whether kerberos is 
> enabled, the NN's secret manager should always be started.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4042) send Cache-Control header on JSP pages

2012-10-22 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned HDFS-4042:


Assignee: Alejandro Abdelnur

> send Cache-Control header on JSP pages
> --
>
> Key: HDFS-4042
> URL: https://issues.apache.org/jira/browse/HDFS-4042
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, name-node
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Alejandro Abdelnur
>Priority: Minor
>
> We should send a Cache-Control header on JSP pages so that HTTP/1.1 compliant 
> caches can properly manage cached data.
> Currently our JSPs send:
> {noformat}
> % curl -v http://nn1:50070/dfshealth.jsp
> ...
> < HTTP/1.1 200 OK
> < Content-Type: text/html; charset=utf-8
> < Expires: Thu, 01-Jan-1970 00:00:00 GMT
> < Set-Cookie: JSESSIONID=xtblchjm7o7j1y1f33r0mpmqp;Path=/
> < Content-Length: 3651
> < Server: Jetty(6.1.26)
> {noformat}
> Based on a quick reading of RFC 2616 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html I think we want to 
> send {{Cache-Control: private, no-cache}} but I could be wrong.  The Jetty 
> docs http://docs.codehaus.org/display/JETTY/LastModifiedCacheControl indicate 
> this is fairly straightforward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4042) send Cache-Control header on JSP pages

2012-10-22 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481909#comment-13481909
 ] 

Alejandro Abdelnur commented on HDFS-4042:
--

IMO all HTTP responses by Hadoop (web UI, JSON, WebHDFS, HFTP, etc.) should set 
headers to disable caching because all these resources are dynamic by nature. For 
WebHDFS and HFTP specifically, it could be argued that file contents could be 
cached, but I'd say that proxies will most likely skip caching those resources 
due to their size. Not to mention the security implications, like permissions.
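
As a hedged illustration of one way to do that (a generic servlet filter, not an 
existing Hadoop class or a committed patch):

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Illustrative only: stamp no-cache headers on every response.
public class NoCacheFilter implements Filter {
  public void init(FilterConfig conf) {}
  public void destroy() {}

  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletResponse http = (HttpServletResponse) res;
    // Dynamic content: forbid shared caches from storing it.
    http.setHeader("Cache-Control", "private, no-cache");
    chain.doFilter(req, res);
  }
}
{code}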

> send Cache-Control header on JSP pages
> --
>
> Key: HDFS-4042
> URL: https://issues.apache.org/jira/browse/HDFS-4042
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, name-node
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Priority: Minor
>
> We should send a Cache-Control header on JSP pages so that HTTP/1.1 compliant 
> caches can properly manage cached data.
> Currently our JSPs send:
> {noformat}
> % curl -v http://nn1:50070/dfshealth.jsp
> ...
> < HTTP/1.1 200 OK
> < Content-Type: text/html; charset=utf-8
> < Expires: Thu, 01-Jan-1970 00:00:00 GMT
> < Set-Cookie: JSESSIONID=xtblchjm7o7j1y1f33r0mpmqp;Path=/
> < Content-Length: 3651
> < Server: Jetty(6.1.26)
> {noformat}
> Based on a quick reading of RFC 2616 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html I think we want to 
> send {{Cache-Control: private, no-cache}} but I could be wrong.  The Jetty 
> docs http://docs.codehaus.org/display/JETTY/LastModifiedCacheControl indicate 
> this is fairly straightforward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory

2012-10-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-4084:
-

Attachment: HDFS-4084.patch

Rebased the patch and addressed Arpit's comment.

> provide CLI support for allow and disallow snapshot on a directory
> --
>
> Key: HDFS-4084
> URL: https://issues.apache.org/jira/browse/HDFS-4084
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node, tools
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-4084.patch, HDFS-4084.patch
>
>
> Provide CLI support for allowing and disallowing snapshots on a directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481906#comment-13481906
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2802:
--

{quote}
> I don't think that a solution which copies all of the files/directories 
> that are being snapshotted should be merged to trunk.

I disagree. But we may just end up doing optimization prior to merge.
{quote}
I understand that O(1) snapshot creation is desirable but I don't think it is a 
requirement.  Currently, the snapshot feature is missing in HDFS.  Having 
snapshot with O(N) snapshot creation already opens the door of opportunity to 
many applications.  Only the applications requiring O(1) snapshot creation 
should wait for the improvement but not everyone has to wait for it.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4097) provide CLI support for create/delete/list snapshots

2012-10-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-4097:
-

Attachment: HDFS-4097.patch

Thanks. New patch is uploaded.

> provide CLI support for create/delete/list snapshots
> 
>
> Key: HDFS-4097
> URL: https://issues.apache.org/jira/browse/HDFS-4097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client, name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Attachments: HDFS-4097.patch, HDFS-4097.patch
>
>
> provide CLI support for create/delete/list snapshots

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481853#comment-13481853
 ] 

Suresh Srinivas commented on HDFS-2802:
---

bq. Great, I'm glad you agree.
I have agreed many times :-) 
[here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13480547&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13480547]
 and 
[here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13481632&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13481632].
 I am happy to get beyond that issue :-) Thanks for the offer to help. I think 
for now we are on a path to get it done very soon.

bq. To be completely clear, if it makes sense to implement a copying solution 
as an interim development stage
Sounds good. I misunderstood you to be saying the opposite of that 
[here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13481682&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13481682]

bq. I don't think that a solution which copies all of the files/directories 
that are being snapshotted should be merged to trunk.
I disagree. But we may just end up doing optimization prior to merge.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4045) SecondaryNameNode cannot read from QuorumJournal URI

2012-10-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4045:


Status: Patch Available  (was: Open)

It took a fair bit of work to generalize the getRemoteEdits and streaming code, 
but this implementation seems to work OK and passes my initial sniff test. I'm 
skeptical that streaming the edits from the JournalNode through the NameNode to 
the SNN is a robust solution in terms of both error resilience and performance, 
but I think it's time to post a rough cut and see how it looks.

> SecondaryNameNode cannot read from QuorumJournal URI
> 
>
> Key: HDFS-4045
> URL: https://issues.apache.org/jira/browse/HDFS-4045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Vinithra Varadharajan
>Assignee: Andy Isaacson
> Attachments: hdfs-4045.txt
>
>
> If HDFS is set up in basic mode (non-HA) with QuorumJournal, and the 
> dfs.namenode.edits.dir is set to only the QuorumJournal URI and no local dir, 
> the SecondaryNameNode is unable to do a checkpoint.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4045) SecondaryNameNode cannot read from QuorumJournal URI

2012-10-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4045:


Attachment: hdfs-4045.txt

> SecondaryNameNode cannot read from QuorumJournal URI
> 
>
> Key: HDFS-4045
> URL: https://issues.apache.org/jira/browse/HDFS-4045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Vinithra Varadharajan
>Assignee: Andy Isaacson
> Attachments: hdfs-4045.txt
>
>
> If HDFS is set up in basic mode (non-HA) with QuorumJournal, and the 
> dfs.namenode.edits.dir is set to only the QuorumJournal URI and no local dir, 
> the SecondaryNameNode is unable to do a checkpoint.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory

2012-10-22 Thread Andy Isaacson (JIRA)
Andy Isaacson created HDFS-4104:
---

 Summary: dfs -test -d prints inappropriate error on nonexistent 
directory
 Key: HDFS-4104
 URL: https://issues.apache.org/jira/browse/HDFS-4104
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor


Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It 
should not generate any output due to missing files.  Alas, it prints an error 
message when {{foo}} does not exist.

{code}
$ hdfs dfs -test -d foo; echo $?
test: `foo': No such file or directory
1
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper

2012-10-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-4101:


Assignee: Damien Hardy

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> ZooKeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Assignee: Damien Hardy
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-4101-1.patch
>
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, as HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time): ZKFC is the only one 
> that fails and stops.
> All the others can wait for each other for some time, independently of the start 
> order, like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that 
> the system can settle and become stable in a few seconds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481840#comment-13481840
 ] 

Aaron T. Myers commented on HDFS-2802:
--

bq. That is the goal.

Great, I'm glad you agree. I'd be happy to personally work on a more time/space 
efficient snapshot solution.

bq. Where I disagree is with the assertion that we cannot get to it starting 
with a simple implementation. I would like to get the simple implementation 
done ASAP with all the tests developed in parallel. Then we will change the 
implementation for better efficiency both in time and space. Tests will help 
ensure the correctness of optimizations.

I suspect you'll find that implementing the copying solution will not be much 
easier than a zero-copy/COW solution, so it seems to me that implementing the 
copying solution will result in a lot of wasted development/testing work.

To be completely clear, if it makes sense to implement a copying solution as an 
interim development stage, then that's fine, but I don't think that a solution 
which copies all of the files/directories that are being snapshotted should be 
merged to trunk.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3649) Port HDFS-385 to branch-1-win

2012-10-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-3649:
--

Fix Version/s: 1-win

> Port HDFS-385 to branch-1-win
> -
>
> Key: HDFS-3649
> URL: https://issues.apache.org/jira/browse/HDFS-3649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1-win
>Reporter: Sumadhur Reddy Bolli
>Assignee: Sumadhur Reddy Bolli
> Fix For: 1-win
>
>
> Added a patch to HDFS-385 to port the existing pluggable placement policy to 
> branch-1-win.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper

2012-10-22 Thread Damien Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Hardy updated HDFS-4101:
---

Release Note: Make ZK connection retries also available at startup.
  Status: Patch Available  (was: Open)

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> ZooKeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-4101-1.patch
>
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, as HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time): ZKFC is the only one 
> that fails and stops.
> All the others can wait for each other for some time, independently of the start 
> order, like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that 
> the system can settle and become stable in a few seconds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper

2012-10-22 Thread Damien Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Hardy updated HDFS-4101:
---

Attachment: HDFS-4101-1.patch

First proposed patch (see the sketch below):
  * uses the existing code implementing retries 
  * keeps the existing hardcoded limit of NUM_RETRIES = 3 
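
A hedged sketch of the startup-retry idea (method name and backoff invented; 
only the NUM_RETRIES = 3 limit comes from the description above):

{code}
import java.io.IOException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only; the real patch reuses existing retry code.
class ZkStartupRetrySketch {
  private static final int NUM_RETRIES = 3;

  static ZooKeeper connectWithRetries(String quorum, int sessionTimeoutMs,
      Watcher watcher) throws IOException, InterruptedException {
    IOException last = null;
    for (int i = 0; i < NUM_RETRIES; i++) {
      try {
        // The constructor throws IOException when it cannot start connecting.
        return new ZooKeeper(quorum, sessionTimeoutMs, watcher);
      } catch (IOException e) {
        last = e;
        Thread.sleep(1000L * (i + 1));  // simple linear backoff
      }
    }
    throw last;
  }
}
{code}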

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> ZooKeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-4101-1.patch
>
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, as HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time): ZKFC is the only one 
> that fails and stops.
> All the others can wait for each other for some time, independently of the start 
> order, like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that 
> the system can settle and become stable in a few seconds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3990) NN's health report has severe performance problems

2012-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481804#comment-13481804
 ] 

Hadoop QA commented on HDFS-3990:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550340/HDFS-3990.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3378//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3378//console

This message is automatically generated.

> NN's health report has severe performance problems
> --
>
> Key: HDFS-3990
> URL: https://issues.apache.org/jira/browse/HDFS-3990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.patch, 
> HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, 
> HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt
>
>
> The dfshealth page will place a read lock on the namespace while it does a 
> dns lookup for every DN.  On a multi-thousand node cluster, this often 
> results in 10s+ load time for the health page.  10 concurrent requests were 
> found to cause 7m+ load times during which time write operations blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager

2012-10-22 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481799#comment-13481799
 ] 

Kan Zhang commented on HDFS-4056:
-

{quote}
bq. Bottom line is the server should always be able to figure out by itself 
whether a connection is an initial connection or a subsequent one, based on the 
auth method (and type of credentials) used, since it needs to decide on whether 
tokens can be issued for that connection.

The server already uses the auth the client sends in the rpc connection header 
to determine the sasl method the client wants to use. The auth to the server 
then determines the UGI's auth. The NN does not allow a UGI auth of token to 
issue, renew, or cancel tokens.
{quote}

I don't think you get my point. It was a general comment. Since only 
connections authenticated using the initial auth method(s) are allowed to fetch 
tokens (I assume we keep that behavior), the server needs to be able to make a 
determination on whether a connection is authenticated as an initial connection 
or a subsequent one. For example, if we were to support SIMPLE + TOKEN and 
SIMPLE + SIMPLE simultaneously (I think not), how could the server decide whether 
a connection authenticated with SIMPLE is an initial connection or not?

bq. If we want to allow compatibility with older clients, then both SIMPLE + 
SIMPLE and SIMPLE + TOKEN must both be supported. Enabling the option of SIMPLE 
+ TOKEN means we need the secret manager enabled which is the aim of this patch.

I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be 
enabled simultaneously. Can you elaborate? On the other hand, in the SIMPLE + 
SIMPLE use case I explained above, it is desirable to be able to turn off any 
token-related stuff (we can do that today).

> Always start the NN's SecretManager
> ---
>
> Key: HDFS-4056
> URL: https://issues.apache.org/jira/browse/HDFS-4056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-4056.patch
>
>
> To support the ability to use tokens regardless of whether kerberos is 
> enabled, the NN's secret manager should always be started.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4091) Add snapshot quota to limit the number of snapshots

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4091:
-

Description: Once a directory has been set to snapshottable, users could 
create snapshots of the directories.  In this JIRA, we add a quota to 
snapshottable directories.  The quota is set by admin in order to limit the 
number of snapshots allowed.  (was: For each snapshottable directory, add a 
quota to limit the number of snapshots of the directory.)
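
A hedged toy sketch of the enforcement (class name and default value invented 
for illustration):

{code}
// Illustrative only, not the attached patch.
class SnapshottableDirSketch {
  private int snapshotQuota = 65536;  // admin-settable limit (value assumed)
  private int numSnapshots = 0;

  void setSnapshotQuota(int quota) {
    snapshotQuota = quota;
  }

  void createSnapshot(String name) {
    if (numSnapshots >= snapshotQuota) {
      throw new IllegalStateException("snapshot quota (" + snapshotQuota
          + ") exceeded for this directory");
    }
    numSnapshots++;  // a real implementation would also record the snapshot
  }
}
{code}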

> Add snapshot quota to limit the number of snapshots
> ---
>
> Key: HDFS-4091
> URL: https://issues.apache.org/jira/browse/HDFS-4091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4091_20121021.patch
>
>
> Once a directory has been set to snapshottable, users could create snapshots 
> of the directories.  In this JIRA, we add a quota to snapshottable 
> directories.  The quota is set by admin in order to limit the number of 
> snapshots allowed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4091) Add snapshot quota to limit the number of snapshots

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481787#comment-13481787
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4091:
--

Sure, updated the description.

> Add snapshot quota to limit the number of snapshots
> ---
>
> Key: HDFS-4091
> URL: https://issues.apache.org/jira/browse/HDFS-4091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4091_20121021.patch
>
>
> Once a directory has been set to snapshottable, users could create snapshots 
> of the directories.  In this JIRA, we add a quota to snapshottable 
> directories.  The quota is set by admin in order to limit the number of 
> snapshots allowed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4091) Add snapshot quota to limit the number of snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481765#comment-13481765
 ] 

Suresh Srinivas commented on HDFS-4091:
---

Can you please add more details to the description?

> Add snapshot quota to limit the number of snapshots
> ---
>
> Key: HDFS-4091
> URL: https://issues.apache.org/jira/browse/HDFS-4091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4091_20121021.patch
>
>
> For each snapshottable directory, add a quota to limit the number of 
> snapshots of the directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4091) Add snapshot quota to limit the number of snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4091:
--

Description: For each snapshottable directory, add a quota to limit the 
number of snapshots of the directory.  (was: For each snapshottable directory, 
add a quote to limit the number of snapshots of the directory.)

> Add snapshot quota to limit the number of snapshots
> ---
>
> Key: HDFS-4091
> URL: https://issues.apache.org/jira/browse/HDFS-4091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4091_20121021.patch
>
>
> For each snapshottable directory, add a quota to limit the number of 
> snapshots of the directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481758#comment-13481758
 ] 

Suresh Srinivas commented on HDFS-2802:
---

bq. This sort of fundamental design decision is not something that can be 
easily improved incrementally. Copying huge portions of the working set, and 
then making that copying fast and space efficient, should not be the goal. The 
goal should be to entirely avoid copying huge portions of the working set.
That is the goal. Where I disagree is with the assertion that we cannot get to 
it starting with a simple implementation. I would like to get the simple 
implementation done ASAP with all the tests developed in parallel. Then we will 
change the implementation for better efficiency both in time and space. Tests 
will help ensure the correctness of optimizations. 

We already have a bunch of ideas on how to do it. Some of those ideas are in 
the early prototype that Hari has uploaded. HDFS-4103 has been created to allay 
these concerns, which have been brought up repeatedly, and to convey that 
optimized snapshots are a goal of HDFS-2802. Will update the design document as 
part of that jira.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4103) Support O(1) snapshot creation

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE reassigned HDFS-4103:


Assignee: Tsz Wo (Nicholas), SZE

> Support O(1) snapshot creation
> --
>
> Key: HDFS-4103
> URL: https://issues.apache.org/jira/browse/HDFS-4103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> In our first snapshot implementation, snapshot creation runs in O(N) and 
> occupies O(N) memory space, where N = # files + # directories + # symlinks in 
> the snapshot.  The advantages of the implementation are that there is no 
> additional cost for the modifications after snapshots are created, and it 
> leads to a simple implementation.
> In this JIRA, we optimize snapshot creation to O(1) although it introduces 
> additional cost in the modifications after snapshots are created.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4103) Support O(1) snapshot creation

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-4103:


 Summary: Support O(1) snapshot creation
 Key: HDFS-4103
 URL: https://issues.apache.org/jira/browse/HDFS-4103
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE


In our first snapshot implementation, snapshot creation runs in O(N) and 
occupies O(N) memory space, where N = # files + # directories + # symlinks in 
the snapshot.  The advantages of the implementation are that there is no 
additional cost for the modifications after snapshots are created, and it leads 
to a simple implementation.

In this JIRA, we optimize snapshot creation to O(1) although it introduces 
additional cost in the modifications after snapshots are created.
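
A language-level toy (purely illustrative, not the HDFS design) contrasting the 
two creation strategies:

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a flat "directory" of name -> contents entries.
class SnapshotCostSketch {
  Map<String, String> files = new HashMap<String, String>();
  List<Map<String, String>> snapshots = new ArrayList<Map<String, String>>();
  Map<String, String> diff;  // copy-on-write journal, null when no snapshot

  // O(N): copy every entry at snapshot creation time.
  void snapshotByCopy() {
    snapshots.add(new HashMap<String, String>(files));
  }

  // O(1): start an empty diff; later modifications pay the copying cost.
  void snapshotByDiff() {
    diff = new HashMap<String, String>();
  }

  void write(String name, String contents) {
    if (diff != null && !diff.containsKey(name)) {
      diff.put(name, files.get(name));  // save the old value exactly once
    }
    files.put(name, contents);
  }
}
{code}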

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481747#comment-13481747
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2802:
--

> ... but it is certainly possible to have an O(1) solution in terms of the 
> number of files/directories that are not modified. ...

I think the meaning of O(1) above is a very special use of the big-O notation.  
I would rather keep saying 

- O(N), where N = # files + # directories + # symlinks; and
- O(M), where M = # modified files + # modified directories + # modified 
symlinks.

I agree that an O(M) implementation (i.e. including O(1) snapshot creation) is 
our end goal.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3990) NN's health report has severe performance problems

2012-10-22 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3990:
--

Attachment: HDFS-3990.patch

No longer init {{peerHostName}} to the DN's registration hostname.  Check for 
null when building list of node names to filter.

I again looked into removing the null check on {{Server.getRemoteAddress}}.  
The tests that call directly into the rpc server object, rather than via a 
connection, appear to be passing mock dn registrations.  So the majority of 
functional tests are matching real cluster behavior.

I tried having the rpc server set the ip/peerHostName but some of the tests are 
verifying the layout and version checks work.  So I tried to push those down 
into the {{FSNamesystem#registerDatanode}} but that method isn't exposed for 
the tests to call.

If this patch is ok, I'll update the 23 patch.
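
For readers following along, a hedged sketch (names simplified) of the pattern 
this moves toward: resolve the peer hostname once at registration time, outside 
the namespace lock, and null-check at use:

{code}
// Simplified illustration, not the actual DatanodeID/DatanodeDescriptor code.
class PeerHostNameSketch {
  private final String ip;
  private volatile String peerHostName;  // stays null for mocked registrations

  PeerHostNameSketch(String ip) {
    this.ip = ip;
  }

  // Set once from the RPC connection at registration; no DNS needed later.
  void setPeerHostName(String hostName) {
    peerHostName = hostName;
  }

  String displayName() {
    // Null check mirrors the patch: tests that call into the RPC server
    // directly never set a peer hostname, so fall back to the raw IP.
    return peerHostName != null ? peerHostName : ip;
  }
}
{code}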

> NN's health report has severe performance problems
> --
>
> Key: HDFS-3990
> URL: https://issues.apache.org/jira/browse/HDFS-3990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.patch, 
> HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, 
> HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt
>
>
> The dfshealth page will place a read lock on the namespace while it does a 
> dns lookup for every DN.  On a multi-thousand node cluster, this often 
> results in 10s+ load time for the health page.  10 concurrent requests were 
> found to cause 7m+ load times during which time write operations blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481682#comment-13481682
 ] 

Aaron T. Myers commented on HDFS-2802:
--

bq. Why is starting with simple implementation and then optimizing it later not 
a choice?

This sort of fundamental design decision is not something that can be easily 
improved incrementally. Copying huge portions of the working set, and then 
making that copying fast and space efficient, should not be the goal. The goal 
should be to entirely avoid copying huge portions of the working set.

bq. O(1) memory usage in general does not seem possible since the original 
files/directories could be modified. So the best case is O(N) memory usage in 
general. However, it is possible to have O(1) memory usage at snapshot creation.

I agree that it's not possible to have an O(1) solution in terms of the number 
of files/directories that are modified, but it is certainly possible to have an 
O(1) solution in terms of the number of files/directories that are _not_ 
modified. That's the issue I'm concerned about, and as far as I can tell is not 
what is proposed by this design document.

bq. For small subtrees, i.e. when N is small, it does not matter if it is O(1) 
or O(N). Such a snapshot feature already benefits many applications. So we are 
going to implement O(N) snapshot creation in the first phase and then optimize 
it later.

What are the use cases for taking snapshots of small subtrees? An 
implementation that is suitable only for small subtrees would probably be 
impractical for snapshotting an HBase root directory, a Hive warehouse, or most 
/user directories that I'm aware of. You'd also presumably want to keep at 
least a handful (10s?) of snapshots available, so any small subtree that could 
be snapshotted must be multiplied by ~10 to estimate its snapshot footprint. 
Note that the design document also explicitly states that it should be possible 
to take a snapshot of the root of the file system.

bq. Then, we could have the snapshot feature out early instead of spending a 
long time coming up with a complicated design and implementation. A complicated 
design also increases the risk of bugs in the implementation.

I'm all for a simple design, but the design must also meet the stated 
requirements. The design document states that it should be possible to create a 
snapshot of the root of the file system, but I don't think the proposed design 
can do such a thing.

Suresh had previously said that "May be the design document is fairly early and 
might have misled you. That is not the goal. The goal is to have efficient 
implementation." If we're on the same page that the end goal for this branch is 
an O(1) implementation, in terms of the number of files that are not modified 
between snapshots, then we can move on.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481669#comment-13481669
 ] 

Hari Mankude commented on HDFS-2802:


Nicholas is right in that we do start off with O(1) memory usage. But depending 
on writes and updates to the base filesystem, memory usage for snapshots will 
increase. The worst case is when an application updates all the files in the 
snapshotted subtree. Even in this scenario, the snap inodes are minimized 
versions of the actual file inodes and retain only the information relevant to 
snapshots. Additionally (in the prototype), if multiple snapshots are taken of 
the same subtree, significant optimizations reduce the memory footprint by 
representing more than one snapshot in a single snapINode. 

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4087) Protocol changes for listSnapshots functionality

2012-10-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481667#comment-13481667
 ] 

Brandon Li commented on HDFS-4087:
--

Thanks Aaron! I will update the javadoc along with the next patch.

> Protocol changes for listSnapshots functionality
> 
>
> Key: HDFS-4087
> URL: https://issues.apache.org/jira/browse/HDFS-4087
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: HDFS-4087.patch, HDFS-4087.patch, HDFS-4087.patch, 
> HDFS-4087.patch
>
>
> SnapInfo saves information about a snapshot. This jira also updates the java 
> protocol classes and translation for the listSnapshot operation.
> Given a snapshot root, the snapshots created under it can be listed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481643#comment-13481643
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2802:
--

O(1) memory usage in general does not seem possible since the original 
files/directories could be modified.  So the best case is O(N) memory usage in 
general.  However, it is possible to have O(1) memory usage at snapshot 
creation.

For small subtrees, i.e. when N is small, it does not matter if it is O(1) or 
O(N).  Such a snapshot feature already benefits many applications.  So we are 
going to implement O(N) snapshot creation in the first phase and then optimize 
it later.  Then, we could have the snapshot feature out early instead of 
spending a long time coming up with a complicated design and implementation.  
A complicated design also increases the risk of bugs in the implementation.

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481632#comment-13481632
 ] 

Suresh Srinivas commented on HDFS-2802:
---

bq. Instead of implementing an inefficient design and then optimizing it, we 
should come up with and implement an efficient design.
Why is starting with a simple implementation and then optimizing it later not a 
choice?

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4078) Handle replication in snapshots

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481622#comment-13481622
 ] 

Aaron T. Myers commented on HDFS-4078:
--

Ah, didn't notice that label, and wasn't aware of its use. Usually we just 
comment that tests will be added in a subsequent JIRA and link to said JIRA.

> Handle replication in snapshots
> ---
>
> Key: HDFS-4078
> URL: https://issues.apache.org/jira/browse/HDFS-4078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4078_20121021.patch
>
>
> Without snapshots, file replication is the same as block replication.
> With snapshots, the file replication R_o of the original file and the file 
> replication R_s of the snapshot file could possibly differ.  Since the 
> blocks are shared between the original file and the snapshot file, block 
> replication is max(R_o, R_s).  If there is more than one snapshot, block 
> replication is the max file replication over the original file and all 
> snapshot files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4078) Handle replication in snapshots

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481620#comment-13481620
 ] 

Suresh Srinivas commented on HDFS-4078:
---

bq. I would really have preferred to see some automated tests included with 
this change, as an incorrect implementation of this could result in data loss 
from snapshots. At least, a follow-up JIRA should be filed to add some 
automated tests for this functionality.
Aaron, please see the label "needs-test"; all these jiras will need subsequent 
tests.

> Handle replication in snapshots
> ---
>
> Key: HDFS-4078
> URL: https://issues.apache.org/jira/browse/HDFS-4078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4078_20121021.patch
>
>
> Without snapshots, file replication is the same as block replication.
> With snapshots, the file replication R_o of the original file and the file 
> replication R_s of the snapshot file could possibly differ.  Since the 
> blocks are shared between the original file and the snapshot file, block 
> replication is max(R_o, R_s).  If there is more than one snapshot, block 
> replication is the max file replication over the original file and all 
> snapshot files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481617#comment-13481617
 ] 

Suresh Srinivas commented on HDFS-4046:
---

@Binglin, are you planning to contribute the C library based on this work to 
Apache Hadoop?

> ChecksumTypeProto use NULL as enum value which is illegal in C/C++
> --
>
> Key: HDFS-4046
> URL: https://issues.apache.org/jira/browse/HDFS-4046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, 
> HDFS-4046-ChecksumType-NULL.patch
>
>
> I tried to write a native HDFS client using the protobuf-based protocol. When 
> I generated C++ code from hdfs.proto, the generated file could not compile 
> because NULL is an already-defined macro.
> I am thinking of two solutions:
> 1. refactor all DataChecksum.Type.NULL references to NONE, which should be 
> fine for all languages, but this may break compatibility.
> 2. only change the protobuf definition ChecksumTypeProto.NULL to NONE, use the 
> enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto 
> and DataChecksum.Type, and make sure the enum integer values match (currently 
> they already match).
> I can make a patch for solution 2.
>  
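
Solution 2 above amounts to converting by the shared integer id rather than by 
enum name, so the wire name (NONE) need not match the Java name (NULL). A toy 
sketch of that idea; the value/id pairs here are illustrative, not the actual 
patch:

{code}
// Toy sketch of id-based conversion; illustrative values, not the patch.
enum WireChecksumType {
  NONE(0), CRC32(1), CRC32C(2);
  final int id;
  WireChecksumType(int id) { this.id = id; }
}

enum JavaChecksumType {
  NULL(0), CRC32(1), CRC32C(2);
  final int id;
  JavaChecksumType(int id) { this.id = id; }
}

class ChecksumTypeConversion {
  static JavaChecksumType fromWire(WireChecksumType t) {
    for (JavaChecksumType j : JavaChecksumType.values()) {
      if (j.id == t.id) {
        return j; // match on the numeric id, not on the enum name
      }
    }
    throw new IllegalArgumentException("unknown checksum type id: " + t.id);
  }
}
{code}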

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++

2012-10-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481615#comment-13481615
 ] 

Suresh Srinivas commented on HDFS-4046:
---

bq. It shouldn't as long as the enum ordering does not change.
Kihwal, my question was meant to say the change is backward compatible :-)

I would suggest adding a prefix based on the enum name or context at the 
beginning, e.g. STATUS_SUCCESS. 

> ChecksumTypeProto use NULL as enum value which is illegal in C/C++
> --
>
> Key: HDFS-4046
> URL: https://issues.apache.org/jira/browse/HDFS-4046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, 
> HDFS-4046-ChecksumType-NULL.patch
>
>
> I tried to write a native HDFS client using the protobuf-based protocol. When 
> I generated C++ code from hdfs.proto, the generated file could not compile 
> because NULL is an already-defined macro.
> I am thinking of two solutions:
> 1. refactor all DataChecksum.Type.NULL references to NONE, which should be 
> fine for all languages, but this may break compatibility.
> 2. only change the protobuf definition ChecksumTypeProto.NULL to NONE, use the 
> enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto 
> and DataChecksum.Type, and make sure the enum integer values match (currently 
> they already match).
> I can make a patch for solution 2.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2802:
-

Tags:   (was: 1)
Release Note:   (was: 1)

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4102) Test replication with snapshots

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-4102:


 Summary: Test replication with snapshots
 Key: HDFS-4102
 URL: https://issues.apache.org/jira/browse/HDFS-4102
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


Replication becomes trickier with snapshots since the original file and 
snapshot files could possibly be set to different replications while the blocks 
are shared among those files.  This JIRA is to add more tests for replication 
with snapshots.
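
Concretely, the rule to be tested is the max() behavior described in HDFS-4078. 
A toy illustration (not HDFS code) of how the effective block replication 
follows the highest file replication among the files sharing the blocks:

{code}
// Toy illustration, not HDFS code: blocks shared by the original file and
// its snapshot files are kept at the maximum of their file replications.
class BlockReplicationExample {
  static short blockReplication(short original, short... snapshots) {
    short max = original;
    for (short s : snapshots) {
      if (s > max) {
        max = s;
      }
    }
    return max;
  }

  public static void main(String[] args) {
    // original file at 3, snapshots at 2 and 5 => blocks kept at replication 5
    System.out.println(blockReplication((short) 3, (short) 2, (short) 5));
  }
}
{code}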

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4078) Handle replication in snapshots

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481610#comment-13481610
 ] 

Aaron T. Myers commented on HDFS-4078:
--

Sounds good. Thanks, Nicholas.

> Handle replication in snapshots
> ---
>
> Key: HDFS-4078
> URL: https://issues.apache.org/jira/browse/HDFS-4078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4078_20121021.patch
>
>
> Without snapshots, file replication is the same as block replication.
> With snapshots, the file replication R_o of the original file and the file 
> replication R_s of the snapshot file could possibly differ.  Since the 
> blocks are shared between the original file and the snapshot file, block 
> replication is max(R_o, R_s).  If there is more than one snapshot, block 
> replication is the max file replication over the original file and all 
> snapshot files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4078) Handle replication in snapshots

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481605#comment-13481605
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4078:
--

Hi Aaron, we are going to add tests in some future JIRAs.  Let me create one 
for replication now.

> Handle replication in snapshots
> ---
>
> Key: HDFS-4078
> URL: https://issues.apache.org/jira/browse/HDFS-4078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4078_20121021.patch
>
>
> Without snapshots, file replication is the same as block replication.
> With snapshots, the file replication R_o of the original file and the file 
> replication R_s of the snapshot file could possibly differ.  Since the 
> blocks are shared between the original file and the snapshot file, block 
> replication is max(R_o, R_s).  If there is more than one snapshot, block 
> replication is the max file replication over the original file and all 
> snapshot files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481602#comment-13481602
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4046:
--

Would NONE be a reserved word in some other languages?  How about adding a 
prefix such as PB_NULL or HADOOP_NULL?  Then the same prefix can be used for 
other values.

> ChecksumTypeProto use NULL as enum value which is illegal in C/C++
> --
>
> Key: HDFS-4046
> URL: https://issues.apache.org/jira/browse/HDFS-4046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, 
> HDFS-4046-ChecksumType-NULL.patch
>
>
> I tried to write a native HDFS client using the protobuf-based protocol. When 
> I generated C++ code from hdfs.proto, the generated file could not compile 
> because NULL is an already-defined macro.
> I am thinking of two solutions:
> 1. refactor all DataChecksum.Type.NULL references to NONE, which should be 
> fine for all languages, but this may break compatibility.
> 2. only change the protobuf definition ChecksumTypeProto.NULL to NONE, use the 
> enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto 
> and DataChecksum.Type, and make sure the enum integer values match (currently 
> they already match).
> I can make a patch for solution 2.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4099) Clean up replication code and add more javadoc

2012-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481597#comment-13481597
 ] 

Hudson commented on HDFS-4099:
--

Integrated in Hadoop-trunk-Commit #2907 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2907/])
HDFS-4099. Clean up replication code and add more javadoc. (Revision 
1400986)

 Result = SUCCESS
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1400986
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Clean up replication code and add more javadoc
> --
>
> Key: HDFS-4099
> URL: https://issues.apache.org/jira/browse/HDFS-4099
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 2.0.3-alpha
>
> Attachments: h4099_20121021.patch
>
>
> - FSNamesystem.checkReplicationFactor(..) should be combined with 
> BlockManager.checkReplication(..).
> - Add javadoc to the replication-related methods in BlockManager.
> - Also clean up those methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4099) Clean up replication code and add more javadoc

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4099:
-

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha
   Status: Resolved  (was: Patch Available)

I did not add new tests since the changes are simple.

I have committed this.

> Clean up replication code and add more javadoc
> --
>
> Key: HDFS-4099
> URL: https://issues.apache.org/jira/browse/HDFS-4099
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 2.0.3-alpha
>
> Attachments: h4099_20121021.patch
>
>
> - FSNamesystem.checkReplicationFactor(..) should be combined with 
> BlockManager.checkReplication(..).
> - Add javadoc to the replication-related methods in BlockManager.
> - Also clean up those methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4087) Protocol changes for listSnapshots functionality

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481593#comment-13481593
 ] 

Aaron T. Myers commented on HDFS-4087:
--

The committed patch also includes a copy/pasted comment that isn't accurate or 
relevant:
{code}
+/**
+ * Interface that represents the over the wire information for a file.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+public class SnapshotInfo {
{code}
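
Presumably the comment was copied from a file-status class; a corrected version 
might read something like the following (a suggestion only, not the committed 
text):

{code}
/**
 * SnapshotInfo represents the over-the-wire information for a snapshot.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class SnapshotInfo {
{code}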

> Protocol changes for listSnapshots functionality
> 
>
> Key: HDFS-4087
> URL: https://issues.apache.org/jira/browse/HDFS-4087
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Brandon Li
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: HDFS-4087.patch, HDFS-4087.patch, HDFS-4087.patch, 
> HDFS-4087.patch
>
>
> SnapInfo saves information about a snapshot. This jira also updates the java 
> protocol classes and translation for the listSnapshot operation.
> Given a snapshot root, the snapshots created under it can be listed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4075) Reduce recommissioning overhead

2012-10-22 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481587#comment-13481587
 ] 

Ravi Prakash commented on HDFS-4075:


Hi Kihwal!

The log message is

bq. LOG.info("Invalidated" + numOverReplicated + " over-replicated blocks on " +
bq. srcNode + " during recommissioning");

which might mislead me to believe that the invalidated blocks were on srcNode, 
when they could be on any one of the 4 nodes. Maybe something to the effect of 
"Recommissioning of srcNode led to numOverReplicated over-replicated blocks 
being invalidated"? 

Can you please also explain the change to DatanodeManager.java in this patch? 
node.isAlive will be updated only when the node heartbeats in, so when will 
{{blockManager.processOverReplicatedBlocksOnReCommission(node)}} be called?



> Reduce recommissioning overhead
> ---
>
> Key: HDFS-4075
> URL: https://issues.apache.org/jira/browse/HDFS-4075
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.4, 2.0.2-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: hdfs-4075.patch
>
>
> When datanodes are recommissioned, 
> {{BlockManager#processOverReplicatedBlocksOnReCommission()}} is called for each 
> rejoined node and excess blocks are added to the invalidate list. The problem 
> is that this is done while the namesystem write lock is held.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4099) Clean up replication code and add more javadoc

2012-10-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4099:
-

Priority: Minor  (was: Major)
Hadoop Flags: Reviewed

Thanks Suresh and Uma for reviewing the patch.

> Clean up replication code and add more javadoc
> --
>
> Key: HDFS-4099
> URL: https://issues.apache.org/jira/browse/HDFS-4099
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h4099_20121021.patch
>
>
> - FSNamesystem.checkReplicationFactor(..) should be combined with 
> BlockManager.checkReplication(..).
> - Add javadoc to the replication-related methods in BlockManager.
> - Also clean up those methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481571#comment-13481571
 ] 

Aaron T. Myers commented on HDFS-2802:
--

bq. Aaron, by O(N) where N = # of files + # of directories, I guess you mean O(N) 
snapshot creation time and O(N) memory usage at snapshot creation. Snapshot 
creation can be optimized by lazy INode creation. No INode is created at 
snapshot creation time. Only the INodes modified after the snapshot will be 
created. Then it becomes O(1) snapshot creation time and O(1) memory usage at 
snapshot creation. The design does not exclude this optimization.

On page 7 the design document says the following:

{quote}
The memory usage is linear to the number of INode snapped because it copies all 
INodes when a snapshot is created.
{quote}

Then later on page 7 the design document goes on to discuss how this might be 
optimized by supporting offline snapshots on disk (i.e. get the snapshot 
metadata out of the NN's heap), and performing the copying of all the INodes in 
parallel using several threads.

This is what I am referring to. I fully support a design which implements O(1) 
snapshot creation time and O(1) memory usage, but the current proposed design 
does not describe such a thing. Instead of implementing an inefficient design 
and then optimizing it, we should come up with and implement an efficient 
design.
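
For readers following the thread, the lazy INode creation quoted above is 
essentially copy-on-write: nothing is copied when the snapshot is taken, and a 
node is duplicated only on its first modification afterwards. A toy sketch of 
that idea with invented names, not the proposed HDFS design:

{code}
import java.util.HashMap;
import java.util.Map;

// Toy copy-on-write sketch (invented names): snapshot creation is O(1), and
// a copy of a node is made lazily, on its first modification afterwards.
class CowSnapshotExample {
  static class Node { String data; Node(String d) { data = d; } }

  private final Map<String, Node> current = new HashMap<String, Node>();
  private final Map<String, Node> snapshot = new HashMap<String, Node>();
  private boolean snapshotTaken = false;

  void createSnapshot() { snapshotTaken = true; }  // no copying at all

  void write(String path, String newData) {
    Node node = current.get(path);
    if (node == null) {
      current.put(path, new Node(newData));
      return;
    }
    if (snapshotTaken && !snapshot.containsKey(path)) {
      snapshot.put(path, new Node(node.data));     // copy only what changes
    }
    node.data = newData;
  }

  String readSnapshot(String path) {
    Node old = snapshot.get(path);                 // unmodified paths fall
    return old != null ? old.data : current.get(path).data;  // through
  }
}
{code}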

> Support for RW/RO snapshots in HDFS
> ---
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Reporter: Hari Mankude
>Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper

2012-10-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481565#comment-13481565
 ] 

Steve Loughran commented on HDFS-4101:
--

The good news is: it'll be easy to test.

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> ZooKeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Priority: Minor
>  Labels: newbie
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops 
> immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, like HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time); ZKFC is the only 
> one that fails and stops.
> All the others (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS) 
> can wait for each other for some time, independently of the start order, so 
> that the system becomes stable in a few seconds.
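
The requested behavior is essentially a bounded retry loop around whatever 
establishes the ZooKeeper session. A rough sketch of the pattern only (the 
retry count would come from a config modeled on HBase's 
zookeeper.recovery.retry; this is not what ZKFC does today):

{code}
import java.util.concurrent.Callable;

// Rough sketch of the retry pattern only, not ZKFC code: keep attempting
// the connection a bounded number of times instead of failing on the first
// error, sleeping between attempts.
class RetryExample {
  static <T> T withRetries(Callable<T> connect, int maxRetries, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return connect.call();
      } catch (Exception e) {
        last = e;              // remember the failure and back off
        Thread.sleep(sleepMs);
      }
    }
    throw last;                // retries exhausted
  }
}
{code}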

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4078) Handle replication in snapshots

2012-10-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481562#comment-13481562
 ] 

Aaron T. Myers commented on HDFS-4078:
--

I would really have preferred to see some automated tests included with this 
change, as an incorrect implementation of this could result in data loss from 
snapshots. At least, a follow-up JIRA should be filed to add some automated 
tests for this functionality.

In general I've seen a bunch of the commits to the HDFS-2802 branch include no 
automated tests whatsoever. I realize that it's sometimes difficult to write 
automated tests at the beginning of a big project, before all of the 
scaffolding is in place, but I hope that there's a plan to have comprehensive 
test coverage of this work.

> Handle replication in snapshots
> ---
>
> Key: HDFS-4078
> URL: https://issues.apache.org/jira/browse/HDFS-4078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>  Labels: needs-test
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4078_20121021.patch
>
>
> Without snapshots, file replication is the same as block replication.
> With snapshots, the file replication R_o of the original file and the file 
> replication R_s of the snapshot file could possibly differ.  Since the 
> blocks are shared between the original file and the snapshot file, block 
> replication is max(R_o, R_s).  If there is more than one snapshot, block 
> replication is the max file replication over the original file and all 
> snapshot files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4090) getFileChecksum() result incompatible when called against zero-byte files.

2012-10-22 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481545#comment-13481545
 ] 

Ravi Prakash commented on HDFS-4090:


The only nit (and I don't care if you don't fix it) is that the comments could 
have taken 2 lines instead of 3.
Also, when I removed the src/main part of the patch, the test fails with an NPE 
because zeroChecksum == null. Maybe check for null? But I'm not going to be a 
stickler for that either.

Please feel free to check it in. +1 lgtm

> getFileChecksum() result incompatible when called against zero-byte files.
> --
>
> Key: HDFS-4090
> URL: https://issues.apache.org/jira/browse/HDFS-4090
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.23.4, 2.0.2-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: hdfs-4090.patch
>
>
> When getFileChecksum() is called against a zero-byte file, the branch-1 
> client returns an MD5MD5CRC32FileChecksum with crcPerBlock=0, bytePerCrc=0 and 
> md5=70bc8f4b72a86921468bf8e8441dce51, whereas null is returned in trunk.
> The null makes sense since there are no actual block checksums, but it 
> breaks compatibility when doing distCp and calling getFileChecksum() via 
> webhdfs or hftp.
> This JIRA is to make the client return the same 'magic' value that the 
> branch-1 and earlier clients return.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4030) BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount should be AtomicLongs

2012-10-22 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481510#comment-13481510
 ] 

Kihwal Lee commented on HDFS-4030:
--

+1 (nb) The patch looks good to me. 

> BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount should 
> be AtomicLongs
> --
>
> Key: HDFS-4030
> URL: https://issues.apache.org/jira/browse/HDFS-4030
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-4030.txt
>
>
> The BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount 
> fields are currently volatile longs that are incremented in place, which isn't 
> thread-safe. It looks like they're always incremented on paths that hold the 
> NN write lock, but it would be easier and less error-prone for future changes 
> if we made them AtomicLongs. The other volatile long members are just set in 
> one thread and read in another, so they're fine as is.
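
For context, the hazard is that an increment of a volatile long is a 
read-modify-write, not an atomic operation; AtomicLong makes the increment 
itself atomic. A minimal generic illustration (not HDFS code):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Minimal illustration, not HDFS code: volatile++ is a read-modify-write
// and can lose updates under concurrency; AtomicLong cannot.
class CounterExample {
  volatile long unsafeCount = 0;
  final AtomicLong safeCount = new AtomicLong();

  void increment() {
    unsafeCount++;                 // two threads can read the same old value
    safeCount.incrementAndGet();   // atomic, no lock required
  }
}
{code}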

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager

2012-10-22 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481474#comment-13481474
 ] 

Daryn Sharp commented on HDFS-4056:
---

bq. What combinations of initial and subsequent auth modes are we going to 
support? 

The current RPC client/server behavior is:
* Insecure:
** SIMPLE: accept
** DIGEST-MD5: (secret manager enabled) accept
** DIGEST-MD5: (secret manager disabled) downgrade client to SIMPLE
** KERBEROS: downgrade client to SIMPLE
* Secure:
** SIMPLE: reject
** DIGEST-MD5: (secret manager enabled) accept
** DIGEST-MD5: (secret manager disabled) reject
** KERBEROS: accept

So today an insecure cluster is SIMPLE + SIMPLE, a secure cluster is KERBEROS + 
TOKEN.  This patch enables SIMPLE + TOKEN by activating the secret manager, but 
still supports SIMPLE + SIMPLE.

bq. Bottom line is the server should always be able to figure out by itself 
whether a connection is an initial connection or a subsequent one, based on the 
auth method (and type of credentials) used, since it needs to decide on whether 
tokens can be issued for that connection.

The server already uses the auth the client sends in the RPC connection header 
to determine the SASL method the client wants to use.  The auth to the server 
then determines the UGI's auth.  The NN does not allow a UGI with an auth of 
TOKEN to issue, renew, or cancel tokens.

bq. if we are going to support SIMPLE + SIMPLE then we shouldn't always start 
NN's SecretManager.

If we want to allow compatibility with older clients, then SIMPLE + SIMPLE and 
SIMPLE + TOKEN must both be supported.  Enabling the option of SIMPLE + TOKEN 
means we need the secret manager enabled, which is the aim of this patch.
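
Restating the matrix above as code may make it easier to scan; this is an 
illustrative condensation of the described behavior, not the actual RPC server 
logic:

{code}
// Illustrative condensation of the matrix above, not the RPC server code.
enum Auth { SIMPLE, DIGEST_MD5, KERBEROS }
enum Decision { ACCEPT, REJECT, DOWNGRADE_TO_SIMPLE }

class AuthMatrixExample {
  static Decision decide(boolean secure, boolean secretManagerEnabled,
      Auth client) {
    switch (client) {
      case SIMPLE:
        return secure ? Decision.REJECT : Decision.ACCEPT;
      case DIGEST_MD5:
        if (secretManagerEnabled) {
          return Decision.ACCEPT;
        }
        return secure ? Decision.REJECT : Decision.DOWNGRADE_TO_SIMPLE;
      case KERBEROS:
        return secure ? Decision.ACCEPT : Decision.DOWNGRADE_TO_SIMPLE;
      default:
        return Decision.REJECT;
    }
  }
}
{code}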

> Always start the NN's SecretManager
> ---
>
> Key: HDFS-4056
> URL: https://issues.apache.org/jira/browse/HDFS-4056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-4056.patch
>
>
> To support the ability to use tokens regardless of whether kerberos is 
> enabled, the NN's secret manager should always be started.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3553) Hftp proxy tokens are broken

2012-10-22 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481432#comment-13481432
 ] 

Daryn Sharp commented on HDFS-3553:
---

My apologies, I overlooked my previous statement about branch-1 before closing. 
 It does appear that proxy tokens cannot be used with hftp, unless that proxy 
token was obtained via hdfs.  HDFS-3509 is attempting to fix proxy tokens for 
webhdfs.  The two filesystems ultimately call into the same low level methods.  
The branch-1 patch here should apply if that jira makes the appropriate changes.

> Hftp proxy tokens are broken
> 
>
> Key: HDFS-3553
> URL: https://issues.apache.org/jira/browse/HDFS-3553
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3553-1.branch-1.0.patch, 
> HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, 
> HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch
>
>
> Proxy tokens are broken for hftp.  The impact is systems using proxy tokens, 
> such as oozie jobs, cannot use hftp.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++

2012-10-22 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481431#comment-13481431
 ] 

Kihwal Lee commented on HDFS-4046:
--

> Suresh: You mean renaming it to NONE from the existing name NULL? Why does it 
> break compatibility?
It shouldn't, as long as the enum ordering does not change. 

> Binglin: And there is a class named ChecksumNull that may also need to change, 
> given that NULL is renamed to NONE?
This is not critical, but I think it is okay to update it for consistency.

> ChecksumTypeProto use NULL as enum value which is illegal in C/C++
> --
>
> Key: HDFS-4046
> URL: https://issues.apache.org/jira/browse/HDFS-4046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, 
> HDFS-4046-ChecksumType-NULL.patch
>
>
> I tried to write a native HDFS client using the protobuf-based protocol. When 
> I generated C++ code from hdfs.proto, the generated file could not compile 
> because NULL is an already-defined macro.
> I am thinking of two solutions:
> 1. refactor all DataChecksum.Type.NULL references to NONE, which should be 
> fine for all languages, but this may break compatibility.
> 2. only change the protobuf definition ChecksumTypeProto.NULL to NONE, use the 
> enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto 
> and DataChecksum.Type, and make sure the enum integer values match (currently 
> they already match).
> I can make a patch for solution 2.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4080) Add an option to disable block-level state change logging

2012-10-22 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481390#comment-13481390
 ] 

Kihwal Lee commented on HDFS-4080:
--

Many of these are logged while the lock is held by an object far up in the call 
tree, so it is not as simple as HDFS-4052.  The ultimate solution may be 
finer-grained locking.

> Add an option to disable block-level state change logging
> -
>
> Key: HDFS-4080
> URL: https://issues.apache.org/jira/browse/HDFS-4080
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Kihwal Lee
>
> Although the block-level logging in the namenode is useful for debugging, it 
> can add significant overhead to busy HDFS clusters since the messages are 
> logged while the namespace write lock is held. One example is shown in 
> HDFS-4075. In that example, the write lock was held for 5 minutes while 
> logging 11 million log messages for 5.5 million block invalidation events. 
> It would be useful to have an option to disable these block-level log 
> messages while keeping the other state change messages going.  If others feel 
> that they can be turned into DEBUG (with the addition of isDebugEnabled() 
> checks), that may also work, but there might be people depending on the 
> messages.
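
For reference, the isDebugEnabled() pattern mentioned above skips the message 
construction entirely when DEBUG is off; a generic example of the pattern, not 
the actual namenode code:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Generic example of the pattern, not the actual NN code: guard the log
// call so the string concatenation is skipped when DEBUG is disabled.
class BlockLogExample {
  private static final Log LOG = LogFactory.getLog(BlockLogExample.class);

  static void logInvalidate(String block, String datanode) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("BLOCK* invalidate: " + block + " on " + datanode);
    }
  }
}
{code}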

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper

2012-10-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-4101:
--

Labels: newbie  (was: )

This would be a good ticket for anyone who's interested in getting started 
contributing to the HA code! Thanks for filing, Damien.

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> ZooKeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Priority: Minor
>  Labels: newbie
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops 
> immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, like HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time); ZKFC is the only 
> one that fails and stops.
> All the others (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS) 
> can wait for each other for some time, independently of the start order, so 
> that the system becomes stable in a few seconds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3704) In the DFSClient, Add the node to the dead list when the ipc.Client calls fails

2012-10-22 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481316#comment-13481316
 ] 

Yanbo Liang commented on HDFS-3704:
---

If the ipc.Client call to a DataNode fails, it may have been caused by a 
temporary failure, and the node may recover later. If we treat these DataNodes 
as failed nodes for this DFSInputStream, we can no longer use replicas on such 
DataNodes even if they recover later. Does this meet the requirements of other 
workflows?
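
For concreteness, the proposal amounts to recording the node in the stream's 
dead-node list as soon as the size-check RPC fails, so the read path skips it 
instead of timing out. A hypothetical sketch with invented names, not the 
actual DFSClient code:

{code}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (invented names), not DFSClient code: if the RPC that
// checks the final block size fails, mark the node dead for this stream so
// later reads do not retry it and wait for a timeout.
class DeadNodeTrackingExample {
  private final ConcurrentHashMap<String, Boolean> deadNodes =
      new ConcurrentHashMap<String, Boolean>();

  long checkBlockSize(String datanode) {
    try {
      return callSizeCheckRpc(datanode);       // stand-in for the real call
    } catch (Exception e) {
      deadNodes.put(datanode, Boolean.TRUE);   // skip this node from now on
      throw new RuntimeException(e);
    }
  }

  boolean isDead(String datanode) {
    return deadNodes.containsKey(datanode);
  }

  private long callSizeCheckRpc(String datanode) throws Exception {
    throw new Exception("unreachable stand-in for the sketch: " + datanode);
  }
}
{code}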

> In the DFSClient, Add the node to the dead list when the ipc.Client calls 
> fails
> ---
>
> Key: HDFS-3704
> URL: https://issues.apache.org/jira/browse/HDFS-3704
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3, 2.0.0-alpha
>Reporter: nkeywal
>Priority: Minor
>
> The DFSClient maintains a list of dead nodes per input stream. When creating 
> this DFSInputStream, it may connect to one of the nodes to check the final 
> block size. If this call fails, the datanode should be put in the dead-node 
> list to save time. If it is not, it will be retried for the block transfer 
> during the read, and we're likely to get a timeout.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to zookeeper

2012-10-22 Thread Damien Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Hardy updated HDFS-4101:
---

Summary: ZKFC should implement zookeeper.recovery.retry like HBase to 
connect to zookeeper  (was: ZKFC should implement zookeeper.recovery.retry like 
Hbase to connect to zookeeper)

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to 
> zookeeper
> -
>
> Key: HDFS-4101
> URL: https://issues.apache.org/jira/browse/HDFS-4101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.0.0-alpha, 3.0.0
> Environment: running CDH4.1.1
>Reporter: Damien Hardy
>Priority: Minor
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops 
> immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, like HBase does 
> with zookeeper.recovery.retry.
> This particularly happens when I start my whole cluster on VirtualBox, for 
> example (every component starting at nearly the same time); ZKFC is the only 
> one that fails and stops.
> All the others (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS) 
> can wait for each other for some time, independently of the start order, so 
> that the system becomes stable in a few seconds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  1   2   >