[jira] [Resolved] (HDFS-3117) clean cache and can't start hadoop

2012-03-20 Thread Aaron T. Myers (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3117.
--

Resolution: Invalid

Hi cldoltd, please post your question to common-u...@hadoop.apache.org. JIRA is 
for tracking known issues with Hadoop.

 clean cache and can't start hadoop
 --

 Key: HDFS-3117
 URL: https://issues.apache.org/jira/browse/HDFS-3117
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: cldoltd

 I used the command "echo 3 > /proc/sys/vm/drop_caches" to drop the cache.
 Now I can't start Hadoop.
 Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.020.patch

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, 
 HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.
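
For illustration, a minimal sketch of the interactive flow described above (the class and method names are assumptions for demonstration, not code from the attached patches; the '-f'/'a' handling follows the description):

// Illustrative sketch only -- not code from HDFS-3004.020.patch.
// On an inconsistency, prompt the operator, unless a "take the first
// choice everywhere" mode has been requested via a '-f'-style flag or 'a'.
import java.util.Scanner;

class RecoveryPromptSketch {
    private boolean alwaysTakeFirst;                   // set by the '-f'-style flag (assumption)
    private final Scanner in = new Scanner(System.in);

    /** Ask the operator which of the offered choices to apply. */
    String prompt(String problem, String... choices) {
        if (alwaysTakeFirst) {
            return choices[0];                         // '-f' flag or an earlier 'a' answer
        }
        System.out.println("Inconsistency found: " + problem);
        for (int i = 0; i < choices.length; i++) {
            System.out.println("  " + (i + 1) + ") " + choices[i]);
        }
        System.out.println("  a) always take the first choice from now on");
        String answer = in.nextLine().trim();
        if (answer.equalsIgnoreCase("a")) {
            alwaysTakeFirst = true;
            return choices[0];
        }
        return choices[Integer.parseInt(answer) - 1];
    }
}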

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3004:
---

Status: Open  (was: Patch Available)

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, 
 HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3004:
---

Status: Patch Available  (was: Open)

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, 
 HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Created) (JIRA)
wiki and hadoop templates provides wrong superusergroup property instead of 
supergroup
--

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor


The hdfs-site template and the wiki page 
http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
refer to the property dfs.permissions.superusergroup to define the superuser 
group.

However, we must use the property dfs.permissions.supergroup, not 
superusergroup, to make it work.

In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
supergroup is read from:
this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");

It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.
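
For illustration, a minimal sketch of the mismatch described above (only the property names and the FSNamesystem line are taken from the report; the rest is an assumption for demonstration):

// Illustrative sketch only -- shows why a value set under the documented key is ignored.
import org.apache.hadoop.conf.Configuration;

public class SupergroupKeyMismatch {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false);
        // What an operator sets, following the hdfs-site template and the wiki:
        conf.set("dfs.permissions.superusergroup", "admins");
        // What the 1.0.x NameNode effectively reads (hard-coded key):
        String supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
        System.out.println(supergroup);   // prints "supergroup", not "admins"
    }
}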

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Sallou updated HDFS-3118:
-

Affects Version/s: 1.0.1

 wiki and hadoop templates provides wrong superusergroup property instead of 
 supergroup
 --

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0, 1.0.1
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor

 The hdfs-site template and the wiki page 
 http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
 refer to the property dfs.permissions.superusergroup to define the superuser 
 group.
 However, we must use the property dfs.permissions.supergroup, not 
 superusergroup, to make it work.
 In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
 supergroup is read from:
 this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
 It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Sallou updated HDFS-3118:
-

Status: Patch Available  (was: Open)

--- src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java.orig  2012-03-20 09:54:33.0 +0100
+++ src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java  2012-03-20 09:55:13.0 +0100
@@ -473,7 +473,7 @@
     fsOwner = UserGroupInformation.getCurrentUser();
     LOG.info("fsOwner=" + fsOwner);
 
-    this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
+    this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY, "supergroup");
     this.isPermissionEnabled = conf.getBoolean("dfs.permissions", true);
     LOG.info("supergroup=" + supergroup);
     LOG.info("isPermissionEnabled=" + isPermissionEnabled);
--- src/test/org/apache/hadoop/mapred/TestMapredSystemDir.java.orig  2012-03-20 09:56:37.0 +0100
+++ src/test/org/apache/hadoop/mapred/TestMapredSystemDir.java  2012-03-20 09:58:14.0 +0100
@@ -30,6 +30,7 @@
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.security.*;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
 
 /**
  * Test if JobTracker is resilient to garbage in mapred.system.dir.
@@ -49,7 +50,7 @@
     MiniMRCluster mr = null;
     try {
       // start dfs
-      conf.set("dfs.permissions.supergroup", "supergroup");
+      conf.set(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY, "supergroup");
       conf.set("mapred.system.dir", "/mapred");
       dfs = new MiniDFSCluster(conf, 1, true, null);
       FileSystem fs = dfs.getFileSystem();
@@ -120,4 +121,4 @@
       if (mr != null) { mr.shutdown();}
     }
   }
-}
\ No newline at end of file
+}

--- src/hdfs/hdfs-default.xml.orig  2012-03-20 10:00:53.0 +0100
+++ src/hdfs/hdfs-default.xml  2012-03-20 10:01:04.0 +0100
@@ -184,7 +184,7 @@
 </property>
 
 <property>
-  <name>dfs.permissions.supergroup</name>
+  <name>dfs.permissions.superusergroup</name>
   <value>supergroup</value>
   <description>The name of the group of super-users.</description>
 </property>
--- src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml.orig  2012-03-20 10:02:01.0 +0100
+++ src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml  2012-03-20 10:02:14.0 +0100
@@ -227,7 +227,7 @@
    only those things visible using other permissions. Additional 
    groups may be added to the comma-separated list.
    </li>
 
-   <li><code>dfs.permissions.supergroup = supergroup</code>
+   <li><code>dfs.permissions.superusergroup = supergroup</code>
    <br />The name of the group of super-users.
    </li>

--- src/docs/cn/src/documentation/content/xdocs/hdfs_permissions_guide.xml.orig  2012-03-20 10:03:17.0 +0100
+++ src/docs/cn/src/documentation/content/xdocs/hdfs_permissions_guide.xml  2012-03-20 10:03:30.0 +0100
@@ -170,7 +170,7 @@
    <dd>
    The user name used by the web server. If this parameter is set to the name of the super-user, all web clients can see everything. If it is set to an otherwise unused user, web clients can only access resources visible with the "other" permission. Additional groups may be appended, forming a comma-separated list.
    </dd>
-   <dt><code>dfs.permissions.supergroup = supergroup</code></dt>
+   <dt><code>dfs.permissions.superusergroup = supergroup</code></dt>
    <dd>
    The name of the group of super-users.
    </dd>


 wiki and hadoop templates provides wrong superusergroup property instead of 
 supergroup
 --

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.1, 1.0.0
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor

 The hdfs-site template and the wiki page 
 http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
 refer to the property dfs.permissions.superusergroup to define the superuser 
 group.
 However, we must use the property dfs.permissions.supergroup, not 
 superusergroup, to make it work.
 In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
 supergroup is read from:
 this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
 It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Sallou updated HDFS-3118:
-

Release Note: Use DFS_PERMISSIONS_SUPERUSERGROUP_KEY in code and update 
documentation
  Status: Patch Available  (was: Open)

 wiki and hadoop templates provides wrong superusergroup property instead of 
 supergroup
 --

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.1, 1.0.0
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor

 The hdfs-site template and the wiki page 
 http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
 refer to the property dfs.permissions.superusergroup to define the superuser 
 group.
 However, we must use the property dfs.permissions.supergroup, not 
 superusergroup, to make it work.
 In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
 supergroup is read from:
 this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
 It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Sallou updated HDFS-3118:
-

Status: Open  (was: Patch Available)

 wiki and hadoop templates provides wrong superusergroup property instead of 
 supergroup
 --

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.1, 1.0.0
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor

 The hdfs-site template and the wiki page 
 http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
 refer to the property dfs.permissions.superusergroup to define the superuser 
 group.
 However, we must use the property dfs.permissions.supergroup, not 
 superusergroup, to make it work.
 In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
 supergroup is read from:
 this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
 It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Sallou updated HDFS-3118:
-

Status: Patch Available  (was: Open)

file attached

 wiki and hadoop templates provides wrong superusergroup property instead of 
 supergroup
 --

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.1, 1.0.0
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor
 Attachments: supergroup.patch


 The hdfs-site template and the wiki page 
 http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
 refer to the property dfs.permissions.superusergroup to define the superuser 
 group.
 However, we must use the property dfs.permissions.supergroup, not 
 superusergroup, to make it work.
 In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
 supergroup is read from:
 this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
 It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3118) wiki and hadoop templates provides wrong superusergroup property instead of supergroup

2012-03-20 Thread Olivier Sallou (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Sallou updated HDFS-3118:
-

Attachment: supergroup.patch

 wiki and hadoop templates provides wrong superusergroup property instead of 
 supergroup
 --

 Key: HDFS-3118
 URL: https://issues.apache.org/jira/browse/HDFS-3118
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0, 1.0.1
 Environment: Used Debian package install
Reporter: Olivier Sallou
Priority: Minor
 Attachments: supergroup.patch


 The hdfs-site template and the wiki page 
 http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User
 refer to the property dfs.permissions.superusergroup to define the superuser 
 group.
 However, we must use the property dfs.permissions.supergroup, not 
 superusergroup, to make it work.
 In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, 
 supergroup is read from:
 this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
 It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file

2012-03-20 Thread J.Andreina (Created) (JIRA)
Overreplicated block is not deleted even after the replication factor is 
reduced after sync followed by closing that file


 Key: HDFS-3119
 URL: https://issues.apache.org/jira/browse/HDFS-3119
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.24.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 0.24.0, 0.23.2


Cluster setup:
--

1 NN, 2 DNs, replication factor 2, block report interval 3 sec, block size 256 MB

Step 1: write a file filewrite.txt of 90 bytes with sync (not closed)
Step 2: change the replication factor to 1 using the command: ./hdfs dfs 
-setrep 1 /filewrite.txt
Step 3: close the file

* On the NN side, the "Decreasing replication from 2 to 1 for /filewrite.txt" 
log message occurred, but the overreplicated block is not deleted even after 
the block report is sent from the DN

* While listing the file in the console using ./hdfs dfs -ls, the replication 
factor for that file is shown as 1

* The fsck report for that file displays that the file is replicated to 2 
datanodes
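
For reference, a rough API-level sketch of the reproduction steps above (illustrative only; it assumes a client pointed at the running two-DN cluster and uses standard FileSystem calls):

// Illustrative sketch of the reproduction steps, not a committed test.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Hdfs3119Repro {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/filewrite.txt");

        FSDataOutputStream out = fs.create(file);   // step 1: write 90 bytes and sync, keep open
        out.write(new byte[90]);
        out.hflush();

        fs.setReplication(file, (short) 1);         // step 2: drop the replication factor to 1

        out.close();                                // step 3: close the file
        // Expected: the now-excess replica is deleted after the next block report;
        // the report above says it is not.
    }
}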


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

2012-03-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233524#comment-13233524
 ] 

jirapos...@reviews.apache.org commented on HDFS-2834:
-



bq.  On 2012-03-20 01:27:50, Todd Lipcon wrote:
bq.   
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java,
 line 44
bq.   https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line44
bq.  
bq.   shouldn't this be true?

Oops, yes. Thankfully the test still passes when it's testing the right path...


bq.  On 2012-03-20 01:27:50, Todd Lipcon wrote:
bq.   
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java,
 lines 81-82
bq.   https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line81
bq.  
bq.   no reason to use DFSClient here. Instead you can just use the 
filesystem, right? Then downcast the stream you get back?

Good point - no need even to downcast since FSDataInputStream has the API.


bq.  On 2012-03-20 01:27:50, Todd Lipcon wrote:
bq.   
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java,
 line 104
bq.   https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line104
bq.  
bq.   don't you want an assert on sawException here? You can also use 
GenericTestUtils.assertExceptionContains() if you want to check the text of it

Good catch. No particular need to assert the content of the exception - any 
checksum error is good enough here. 


bq.  On 2012-03-20 01:27:50, Todd Lipcon wrote:
bq.   
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 562-564
bq.   https://reviews.apache.org/r/4212/diff/2/?file=90207#file90207line562
bq.  
bq.   this comment seems like it's in the wrong spot, since the code that 
comes after it doesn't reference offsetFromChunkBoundary.

I removed the comment, it's covered by the comment at line 549.


- Henry


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4212/#review6103
---


On 2012-03-09 00:47:24, Henry Robinson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4212/
bq.  ---
bq.  
bq.  (Updated 2012-03-09 00:47:24)
bq.  
bq.  
bq.  Review request for hadoop-hdfs and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  New patch for HDFS-2834 (I can't update the old review request).
bq.  
bq.  
bq.  This addresses bug HDFS-2834.
bq.  http://issues.apache.org/jira/browse/HDFS-2834
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
 dfab730 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
 cc61697 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 4187f1c 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
 2b817ff 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
 b7da8d4 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
 ea24777 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
 9d4f4a2 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
 PRE-CREATION 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
 bbd0012 
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
 eb2a1d8 
bq.  
bq.  Diff: https://reviews.apache.org/r/4212/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Henry
bq.  
bq.



 ByteBuffer-based read API for DFSInputStream
 

 Key: HDFS-2834
 URL: https://issues.apache.org/jira/browse/HDFS-2834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Henry Robinson
Assignee: Henry Robinson
 Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, 
 HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.7.patch, 
 HDFS-2834.8.patch, HDFS-2834.9.patch, HDFS-2834.patch, HDFS-2834.patch, 
 hdfs-2834-libhdfs-benchmark.png


 The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
 {{byte[]}}. Although for many clients this is desired behaviour, in certain 
 situations, such as native-reads through libhdfs, this imposes an extra copy 
 penalty since the {{byte[]}} needs to be copied out again into a natively 
 readable memory area. 
 For these cases, it would be preferable to allow the client to supply its own 
 buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 

[jira] [Updated] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

2012-03-20 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-2834:
-

Attachment: HDFS-2834.10.patch

Review comments.

 ByteBuffer-based read API for DFSInputStream
 

 Key: HDFS-2834
 URL: https://issues.apache.org/jira/browse/HDFS-2834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Henry Robinson
Assignee: Henry Robinson
 Attachments: HDFS-2834-no-common.patch, HDFS-2834.10.patch, 
 HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, 
 HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, HDFS-2834.patch, 
 HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png


 The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
 {{byte[]}}. Although for many clients this is desired behaviour, in certain 
 situations, such as native-reads through libhdfs, this imposes an extra copy 
 penalty since the {{byte[]}} needs to be copied out again into a natively 
 readable memory area. 
 For these cases, it would be preferable to allow the client to supply its own 
 buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 
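
As a usage illustration, a minimal sketch of the kind of call this change would enable (it assumes the proposed read(ByteBuffer) overload on the opened stream; the path and buffer size are placeholders):

// Illustrative sketch of the proposed ByteBuffer read path (not taken from the patch).
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ByteBufferReadSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path("/some/file"));
        try {
            // Caller-supplied buffer, e.g. direct memory that libhdfs hands in.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            int n = in.read(buf);   // fills the caller's buffer; no extra byte[] copy
            System.out.println("read " + n + " bytes");
        } finally {
            in.close();
        }
    }
}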

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

2012-03-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233535#comment-13233535
 ] 

jirapos...@reviews.apache.org commented on HDFS-2834:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4212/
---

(Updated 2012-03-20 16:29:56.616292)


Review request for hadoop-hdfs and Todd Lipcon.


Changes
---

Review responses


Summary
---

New patch for HDFS-2834 (I can't update the old review request).


This addresses bug HDFS-2834.
http://issues.apache.org/jira/browse/HDFS-2834


Diffs (updated)
-

  
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
 dfab730 
  
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
 cc61697 
  
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 4187f1c 
  
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
 71c8a50 
  
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
 b7da8d4 
  
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
 ea24777 
  
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
 9d4f4a2 
  
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
 PRE-CREATION 
  
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
 bbd0012 
  
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
 f4052bb 

Diff: https://reviews.apache.org/r/4212/diff


Testing
---


Thanks,

Henry



 ByteBuffer-based read API for DFSInputStream
 

 Key: HDFS-2834
 URL: https://issues.apache.org/jira/browse/HDFS-2834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Henry Robinson
Assignee: Henry Robinson
 Attachments: HDFS-2834-no-common.patch, HDFS-2834.10.patch, 
 HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, 
 HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, HDFS-2834.patch, 
 HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png


 The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
 {{byte[]}}. Although for many clients this is desired behaviour, in certain 
 situations, such as native-reads through libhdfs, this imposes an extra copy 
 penalty since the {{byte[]}} needs to be copied out again into a natively 
 readable memory area. 
 For these cases, it would be preferable to allow the client to supply its own 
 buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover

2012-03-20 Thread Mingjie Lai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233550#comment-13233550
 ] 

Mingjie Lai commented on HDFS-3083:
---

Aaron, you're right about the root cause. The order of the configured namenodes 
does make a difference. 

Throwing StandbyException from the SecretManager is not perfect, but okay for me. 

Good job. 
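
For illustration, a minimal sketch of the approach discussed above (the class and method names here are assumptions, not the actual HDFS-3083 patch): token operations on a standby NameNode throw StandbyException so that a secure client fails over to the other NameNode instead of failing the job.

// Illustrative sketch only -- not code from HDFS-3083-combined.patch.
import org.apache.hadoop.ipc.StandbyException;

abstract class StandbyAwareTokenManagerSketch {
  /** True while this NameNode is in standby state (assumed hook). */
  abstract boolean isInStandbyState();

  /** The normal delegation-token lookup, elided here. */
  abstract byte[] retrievePassword(byte[] tokenIdentifier);

  byte[] retrievePasswordChecked(byte[] tokenIdentifier) throws StandbyException {
    if (isInStandbyState()) {
      // Signal "retry against the other NameNode" rather than failing outright.
      throw new StandbyException(
          "Delegation token operations are not supported on a standby NameNode");
    }
    return retrievePassword(tokenIdentifier);
  }
}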

 HA+security: failed to run a mapred job from yarn after a manual failover
 -

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover

2012-03-20 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233573#comment-13233573
 ] 

Aaron T. Myers commented on HDFS-3083:
--

Thanks a lot, Mingjie. Did you perhaps get a chance to apply the patch and test 
out the fix?

 HA+security: failed to run a mapred job from yarn after a manual failover
 -

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3050:
--

Status: Patch Available  (was: Open)

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover

2012-03-20 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233576#comment-13233576
 ] 

Todd Lipcon commented on HDFS-3083:
---

+1. This seems like the best way to fix this that I can think of as well. Can 
you run the HDFS tests locally to be sure before committing?

 HA+security: failed to run a mapred job from yarn after a manual failover
 -

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233578#comment-13233578
 ] 

Eli Collins commented on HDFS-3050:
---

Think you need to regenerate the diff, this one nukes LdapGroupsMappings and 
modifies CHANGES.txt.

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-20 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233591#comment-13233591
 ] 

Eli Collins commented on HDFS-3044:
---

- Still needs a test for the new behavior of fsck move, i.e. that it's not destructive 
(a test that covers move w/o delete, and asserts the source files are still 
there)
- Nit: if we're going to name the flag doMove (vs eg salvageCorruptFiles), 
please add a comment by the declaration that doMove doesn't actually do a 
move anymore (since it no longer deletes, it's a copy now)

Otherwise looks great!

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3044.002.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the block as files at least makes part of 
 the files accessible. However this behavior can also result in permanent 
 dataloss. Eg:
 - Some datanodes don't come up and check in on cluster startup (eg due to HW 
 issues); files with blocks whose replicas are all on this set of datanodes 
 are marked corrupt
 - Admin does fsck move, which deletes the corrupt files, saves whatever 
 blocks were available
 - The HW issues with datanodes are resolved, they are started and join the 
 cluster. The NN tells them to delete their blocks for the corrupt files since 
 the file was deleted. 
 I think we should:
 - Make fsck move non-destructive by default (eg just does a move into 
 lost+found)
 - Make the destructive behavior optional (eg --destructive so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, eg if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts) then fsck should 
 print a warning indicating this, which an admin should have to override if they 
 want to do something destructive

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233603#comment-13233603
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3107:
--

Easy, Milind.  :)  I do agree with Suresh that (2) is not a very good reason to 
have truncate.  I think such accidents are rare.  However, you made a good point 
that having append without truncate is a deficiency.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard POSIX operation), the reverse operation of 
 append; this makes upper-layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.
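
As an aside, a minimal sketch of the kind of workaround mentioned above (purely illustrative; the "logical length" bookkeeping and names are assumptions, not from the attached design doc): the application records the valid length of each file in its own metadata store and reads only that prefix, ignoring bytes appended by an aborted transaction.

// Illustrative sketch of the workaround, not part of HDFS or the attached design doc.
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogicalLengthReadSketch {
  /**
   * Read only the first logicalLength bytes of a file; bytes past that offset
   * belong to an aborted transaction (the "discarded byte range") and are ignored.
   */
  static byte[] readValidPrefix(FileSystem fs, Path file, long logicalLength)
      throws java.io.IOException {
    byte[] data = new byte[(int) logicalLength];
    FSDataInputStream in = fs.open(file);
    try {
      in.readFully(0, data);   // positional read of the valid prefix only
    } finally {
      in.close();
    }
    return data;
  }
}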

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Milind Bhandarkar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233609#comment-13233609
 ] 

Milind Bhandarkar commented on HDFS-3107:
-

I must have missed a smiley :-)

Nicholas,

After appends were enabled in HDFS, we have seen many cases where large (mainly 
text, or even compressed text) datasets were merged using appends.

This is where customers realize their mistake immediately after starting to 
append, and do a ctrl-c.

This is very common.

--
Milind Bhandarkar
Chief Architect, Greenplum Labs,
Data Computing Division, EMC
+1-650-523-3858 (W)
+1-408-666-8483 (C)



 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard POSIX operation), the reverse operation of 
 append; this makes upper-layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3100) failed to append data using webhdfs

2012-03-20 Thread Brandon Li (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3100:
-

Attachment: HDFS-3100.patch

Regenerate the patch from the right directory.

 failed to append data using webhdfs
 ---

 Key: HDFS-3100
 URL: https://issues.apache.org/jira/browse/HDFS-3100
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.24.0, 0.23.1
Reporter: Zhanwei.Wang
Assignee: Brandon Li
 Attachments: HDFS-3100.patch, HDFS-3100.patch, 
 hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, 
 test.sh, testAppend.patch


 STEPS:
 1. Deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows:
 A) enable webhdfs
 B) enable append
 C) disable permissions
 2. Start HDFS.
 3. Run the test script as attached.
 RESULT:
 Expected: a file named testFile should be created and populated with 32K * 
 5000 zeros, and HDFS should be OK.
 What I got: the script cannot finish; the file has been created but is not 
 populated as expected, because the append operation failed.
 The datanode log shows that the block scanner reported a bad replica and the 
 namenode decided to delete it. Since it is a single-node cluster, the append 
 fails. It makes no sense that the script fails every time.
 Datanode and Namenode logs are attached.
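
For context, a rough sketch of the two-step WebHDFS append that the attached test script exercises (host, port, and path are placeholders; error handling is omitted):

// Illustrative sketch of a WebHDFS append: ask the NameNode for a redirect,
// then POST the data to the DataNode location it returns.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsAppendSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: the NameNode answers op=APPEND with a redirect to a DataNode.
        URL nn = new URL("http://localhost:50070/webhdfs/v1/testFile?op=APPEND");
        HttpURLConnection c1 = (HttpURLConnection) nn.openConnection();
        c1.setRequestMethod("POST");
        c1.setInstanceFollowRedirects(false);
        String dnLocation = c1.getHeaderField("Location");
        c1.disconnect();

        // Step 2: send the data to the DataNode URL from the redirect.
        HttpURLConnection c2 = (HttpURLConnection) new URL(dnLocation).openConnection();
        c2.setRequestMethod("POST");
        c2.setDoOutput(true);
        OutputStream out = c2.getOutputStream();
        out.write(new byte[32 * 1024]);   // one 32K chunk of zeros
        out.close();
        System.out.println("append response: " + c2.getResponseCode());
    }
}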

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3108) [UI] Few Namenode links are not working

2012-03-20 Thread Brahma Reddy Battula (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233637#comment-13233637
 ] 

Brahma Reddy Battula commented on HDFS-3108:


Yes, Scenario 1 is the same as HDFS-2025.

 [UI] Few Namenode links are not working
 ---

 Key: HDFS-3108
 URL: https://issues.apache.org/jira/browse/HDFS-3108
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0, 0.23.1
Reporter: Brahma Reddy Battula
Priority: Minor
 Fix For: 0.23.3

 Attachments: Scenario2_Trace.txt


 Scenario 1
 ==
 After tailing a file from the UI and clicking on "Go Back to File View", I get 
 HTTP ERROR 404
 Scenario 2
 ===
 Frequently I get the following exception if I click on BrowseFileSystem 
 or any file: java.lang.IllegalArgumentException: java.net.UnknownHostException: 
 HOST-10-18-40-24

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-03-20 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233651#comment-13233651
 ] 

Eli Collins commented on HDFS-2617:
---

Cool, sometime soon?

 Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
 --

 Key: HDFS-2617
 URL: https://issues.apache.org/jira/browse/HDFS-2617
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: HDFS-2617-a.patch


 The current approach to secure and authenticate nn web services is based on 
 Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
 that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
 throughout.  This will simplify setup and configuration.  Also, Kerberized 
 SSL is a non-standard approach with its own quirks and dark corners 
 (HDFS-2386).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover

2012-03-20 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233661#comment-13233661
 ] 

Aaron T. Myers commented on HDFS-3083:
--

I just ran the full HDFS test suite, and they all passed.

I'll commit this shortly based on Todd's +1. Thanks a lot for the review.

 HA+security: failed to run a mapred job from yarn after a manual failover
 -

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.007.patch

rebase on latest trunk

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3083) Cannot run a MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3083:
-

Summary: Cannot run a MR job with HA and security enabled when 
second-listed NN active  (was: HA+security: failed to run a mapred job from 
yarn after a manual failover)

 Cannot run a MR job with HA and security enabled when second-listed NN active
 -

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3083:
-

Summary: Cannot run an MR job with HA and security enabled when 
second-listed NN active  (was: Cannot run a MR job with HA and security enabled 
when second-listed NN active)

 Cannot run an MR job with HA and security enabled when second-listed NN active
 --

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233666#comment-13233666
 ] 

Hadoop QA commented on HDFS-3050:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519106/HDFS-3050.007.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2041//console

This message is automatically generated.

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Aaron T. Myers (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3083.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to trunk and branch-0.23.

Thanks a lot for the reviews, Todd and Mingjie.

 Cannot run an MR job with HA and security enabled when second-listed NN active
 --

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3100) failed to append data using webhdfs

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233680#comment-13233680
 ] 

Hadoop QA commented on HDFS-3100:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519100/HDFS-3100.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.server.namenode.TestFsck
  org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp
  org.apache.hadoop.hdfs.TestClientBlockVerification
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
  org.apache.hadoop.hdfs.TestDatanodeBlockScanner
  org.apache.hadoop.hdfs.TestDataTransferProtocol
  org.apache.hadoop.hdfs.TestDFSShell
  
org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
  org.apache.hadoop.hdfs.TestDFSClientRetries

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2040//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2040//console

This message is automatically generated.

 failed to append data using webhdfs
 ---

 Key: HDFS-3100
 URL: https://issues.apache.org/jira/browse/HDFS-3100
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.24.0, 0.23.1
Reporter: Zhanwei.Wang
Assignee: Brandon Li
 Attachments: HDFS-3100.patch, HDFS-3100.patch, 
 hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, 
 test.sh, testAppend.patch


 STEP:
 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as:
 A) enable webhdfs
 B) enable append
 C) disable permissions
 2, start hdfs
 3, run the test script as attached
 RESULT:
 expected: a file named testFile should be created and populated with 32K * 
 5000 zeros, HDFS should be OK.
 I got: the script cannot finish; the file is created but not populated 
 as expected, because the append operation fails.
 The datanode log shows that the block scanner reports a bad replica and the namenode 
 decides to delete it. Since it is a single-node cluster, the append then fails. 
 Strangely, the script fails this way every time.
 Datanode and Namenode logs are attached.
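
For anyone reproducing this, WebHDFS append is a two-step exchange: a POST with op=APPEND to the namenode, which answers with a 307 redirect to a datanode, followed by a POST of the data to that location. Below is a minimal Java sketch of that exchange (not taken from the attached test script); the host, port, and path are placeholders.

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsAppendSketch {
  public static void main(String[] args) throws Exception {
    // Step 1: ask the namenode where to append. It replies with a 307 redirect
    // whose Location header points at a datanode. Host/port/path are placeholders.
    URL nnUrl = new URL("http://namenode.example.com:50070/webhdfs/v1/testFile?op=APPEND");
    HttpURLConnection nn = (HttpURLConnection) nnUrl.openConnection();
    nn.setRequestMethod("POST");
    nn.setInstanceFollowRedirects(false);            // we want the Location header ourselves
    String datanodeUrl = nn.getHeaderField("Location");
    nn.disconnect();

    // Step 2: send the bytes to the datanode URL from the redirect.
    HttpURLConnection dn = (HttpURLConnection) new URL(datanodeUrl).openConnection();
    dn.setRequestMethod("POST");
    dn.setDoOutput(true);
    dn.setRequestProperty("Content-Type", "application/octet-stream");
    OutputStream out = dn.getOutputStream();
    out.write(new byte[32 * 1024]);                  // one 32K chunk of zeros, as in the script
    out.close();
    System.out.println("append response: " + dn.getResponseCode());  // 200 on success
    dn.disconnect();
  }
}
{code}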

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: (was: HDFS-3050.008.patch)

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report

2012-03-20 Thread Tsz Wo (Nicholas), SZE (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3086:
-

Attachment: h3086_20120320.patch

h3086_20120320.patch:
- remove the storages parameter from DatanodeProtocol.registerDatanode(..);
- change storageID to DatanodeStorage in StorageBlockReport.
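
As a rough illustration of the change (simplified, stubbed types; not the actual Hadoop interfaces):

{code}
import java.io.IOException;

// Stub types for illustration only.
class DatanodeRegistration {}
class DatanodeStorage { String storageID; /* plus storage state in the real class */ }

// Before: registration also carried the storage list.
interface DatanodeProtocolBefore {
  DatanodeRegistration registerDatanode(DatanodeRegistration reg,
                                        DatanodeStorage[] storages) throws IOException;
}

// After this patch: registration carries no storages; each per-storage block
// report identifies its storage with a DatanodeStorage rather than a bare storage ID.
interface DatanodeProtocolAfter {
  DatanodeRegistration registerDatanode(DatanodeRegistration reg) throws IOException;
}

class StorageBlockReport {
  DatanodeStorage storage;   // was: String storageID
  long[] blocks;             // encoded block list for this storage
}
{code}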

 Change Datanode not to send storage list in registration - it will be sent in 
 block report
 --

 Key: HDFS-3086
 URL: https://issues.apache.org/jira/browse/HDFS-3086
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h3086_20120320.patch


 When a datanode registers, it also sends the storage list.  This is 
 not useful since the storage list is already available in block reports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report

2012-03-20 Thread Tsz Wo (Nicholas), SZE (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3086:
-

Status: Patch Available  (was: Open)

 Change Datanode not to send storage list in registration - it will be sent in 
 block report
 --

 Key: HDFS-3086
 URL: https://issues.apache.org/jira/browse/HDFS-3086
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h3086_20120320.patch


 When a datanode registers, it also sends the storage list.  This is 
 not useful since the storage list is already available in block reports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.008.patch

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: HDFS-3044.003.patch

Address Eli's comments.

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3044.002.patch, HDFS-3044.003.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks out as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up (e.g. due to HW issues) and check in on cluster 
 startup; files whose blocks have all replicas on that set of datanodes 
 are marked corrupt
 - Admin does fsck move, which deletes the corrupt files, saves whatever 
 blocks were available
 - The HW issues with datanodes are resolved, they are started and join the 
 cluster. The NN tells them to delete their blocks for the corrupt files since 
 the file was deleted. 
 I think we should:
 - Make fsck move non-destructive by default (eg just does a move into 
 lost+found)
 - Make the destructive behavior optional (eg --destructive so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning indicating this, which an admin would have to override if they 
 want to do something destructive (see the sketch below)
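
A minimal sketch of the proposed default, assuming an explicit destructive option roughly as described above; the names are illustrative, not the actual NamenodeFsck code:

{code}
// Illustrative sketch of the proposed fsck -move behavior.
class FsckMoveSketch {
  boolean destructive = false;   // proposed default: salvage only, never delete

  void handleCorruptFile(String path) {
    // Always salvage what we can: copy the reachable blocks under /lost+found.
    copyAvailableBlocksToLostFound(path);
    if (destructive) {
      // Only with the explicit destructive option does fsck delete the original,
      // so datanodes that check in later can still restore the missing replicas.
      deleteOriginal(path);
    }
  }

  void copyAvailableBlocksToLostFound(String path) { /* ... */ }
  void deleteOriginal(String path) { /* ... */ }
}
{code}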

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-03-20 Thread Arpit Gupta (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-3094:
--

Attachment: HDFS-3094.branch-1.0.patch

Attached an updated patch for branch 1.0 with comments addressed

Here are the test patch results

{code}
BUILD SUCCESSFUL
Total time: 7 minutes 22 seconds




-1 overall.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 11 new Findbugs (version 
1.3.9) warnings.
{code}

Findbugs warnings are unrelated to this patch

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force : the namenode formats the directories without prompting
 -nonInteractive : the namenode format will return with an exit code of 1 if the 
 dir already exists (see the sketch below).
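
A minimal sketch of the decision the two options introduce, assuming the semantics described above; the structure is illustrative, not the actual NameNode.format() code:

{code}
// Illustrative only: the real checks live in NameNode.format().
class FormatOptionsSketch {
  static int format(boolean dirExists, boolean force, boolean nonInteractive) {
    if (dirExists) {
      if (nonInteractive) {
        return 1;                                   // -nonInteractive: dir exists, exit code 1
      }
      if (!force && !promptYesNo("Re-format filesystem?")) {
        return 1;                                   // default: ask the operator first
      }
      // -force (or an explicit "yes"): fall through and re-format
    }
    doFormat();
    return 0;
  }

  static boolean promptYesNo(String msg) { return false; /* would read Y/N from stdin */ }
  static void doFormat() { /* wipe and re-create the name directories */ }
}
{code}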

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233755#comment-13233755
 ] 

Hudson commented on HDFS-3105:
--

Integrated in Hadoop-Mapreduce-0.23-Build #231 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/231/])
svn merge -c 1302683 from trunk for HDFS-3105. (Revision 1302685)

 Result = FAILURE
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302685
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java


 Add DatanodeStorage information to block recovery
 -

 Key: HDFS-3105
 URL: https://issues.apache.org/jira/browse/HDFS-3105
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.3

 Attachments: h3105_20120315.patch, h3105_20120315b.patch, 
 h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch


 When recovering a block, the namenode and client do not have the datanode 
 storage information of the block.  So namenode cannot add the block to the 
 corresponding datanode storge block list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-03-20 Thread Arpit Gupta (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-3094:
--

Attachment: HDFS-3094.patch

Attached patch for trunk with the comments addressed.

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force : the namenode formats the directories without prompting
 -nonInteractive : the namenode format will return with an exit code of 1 if the 
 dir already exists.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233762#comment-13233762
 ] 

Hudson commented on HDFS-3091:
--

Integrated in Hadoop-Mapreduce-0.23-Build #231 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/231/])
Merge HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure 
policy in the config description for the smaller clusters. Contributed by 
Nicholas. (Revision 1302633)

 Result = FAILURE
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302633
Files : 
* /hadoop/common/branches/branch-0.23
* /hadoop/common/branches/branch-0.23/hadoop-common-project
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job
* /hadoop/common/branches/branch-0.23/hadoop-project
* /hadoop/common/branches/branch-0.23/hadoop-project/src/site


 Update the usage limitations of ReplaceDatanodeOnFailure policy in the config 
 description for the smaller clusters.
 ---

 Key: HDFS-3091
 URL: https://issues.apache.org/jira/browse/HDFS-3091
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.3

 Attachments: h3091_20120319.patch


 When verifying the HDFS-1606 feature, Observed 

[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233768#comment-13233768
 ] 

Suresh Srinivas commented on HDFS-3107:
---

bq. I must have missed a smiley
That's okay. You missed the smiley in the tweet too.

bq. This is very common.
I see. I was not aware it was that common.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.
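
For concreteness, one possible shape for such an API; this is purely hypothetical and not the proposal in the attached design document:

{code}
// Hypothetical illustration only: shrink a file to newLength, discarding the rest.
interface TruncatableFileSystem {
  /**
   * Truncate the file at src to newLength bytes (newLength must be <= the
   * current length).  Returns true if the truncate completed immediately,
   * false if the last block must first go through recovery.
   */
  boolean truncate(String src, long newLength) throws java.io.IOException;
}
{code}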

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-03-20 Thread Arpit Gupta (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-3094:
--

Status: Patch Available  (was: Open)

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force : the namenode formats the directories without prompting
 -nonInteractive : the namenode format will return with an exit code of 1 if the 
 dir already exists.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-03-20 Thread Arpit Gupta (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233769#comment-13233769
 ] 

Arpit Gupta commented on HDFS-3094:
---

bq. should be -nonInteractive (not a capital 'A')
done

bq. Rename getisForce to just isForce or isForceEnabled().

done

Updated tests to not use a different thread and sleep


bq. It looks like if you specify invalid options, it won't give any kind of 
useful error message. You should probably be throwing 
HadoopIllegalArgumentException instead of returning null in several of these 
cases.

Left as is; returning null causes the usage to be printed, which shows the 
correct format.
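
A small sketch of the pattern being kept, assuming a parser that returns null on unrecognized options so the caller prints the usage string; the names are illustrative, not the actual NameNode code:

{code}
// Illustrative only: null from the parser means "print usage and exit".
class StartupOptionParsingSketch {
  static String parseFormatOptions(String[] args) {
    for (String a : args) {
      if (!a.equalsIgnoreCase("-force") && !a.equalsIgnoreCase("-nonInteractive")) {
        return null;                      // unknown option -> caller prints usage
      }
    }
    return "FORMAT";
  }

  public static void main(String[] args) {
    if (parseFormatOptions(args) == null) {
      System.err.println("Usage: hadoop namenode -format [-force] [-nonInteractive]");
      System.exit(-1);
    }
  }
}
{code}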


 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force : the namenode formats the directories without prompting
 -nonInteractive : the namenode format will return with an exit code of 1 if the 
 dir already exists.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233773#comment-13233773
 ] 

Hudson commented on HDFS-3083:
--

Integrated in Hadoop-Common-0.23-Commit #708 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/708/])
HDFS-3083. Cannot run an MR job with HA and security enabled when 
second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303099)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303099
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 Cannot run an MR job with HA and security enabled when second-listed NN active
 --

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233787#comment-13233787
 ] 

Hudson commented on HDFS-3083:
--

Integrated in Hadoop-Common-trunk-Commit #1907 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1907/])
HDFS-3083. Cannot run an MR job with HA and security enabled when 
second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303098)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303098
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 Cannot run an MR job with HA and security enabled when second-listed NN active
 --

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Milind Bhandarkar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233791#comment-13233791
 ] 

Milind Bhandarkar commented on HDFS-3107:
-

bq. That's okay. You missed the smiley in the tweet too.

I just copy-pasted, so it was expected :-)

bq. I see. I was not aware it was that common.

Since appends were enabled very recently, only those running Facebook's 
version of Hadoop, or Hadoop 1.0, are doing this now. Before this, users 
were creating multiple files.

In any case, my interest in this feature is for implementing transactions over 
HDFS (as Lei and I have already discussed with Sanjay Radia and Hairong). 
Aborting a transaction means truncating to the last known good data across 
multiple files.


 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233796#comment-13233796
 ] 

Hudson commented on HDFS-3083:
--

Integrated in Hadoop-Hdfs-0.23-Commit #699 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/699/])
HDFS-3083. Cannot run an MR job with HA and security enabled when 
second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303099)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303099
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 Cannot run an MR job with HA and security enabled when second-listed NN active
 --

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233803#comment-13233803
 ] 

Hudson commented on HDFS-3083:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1981 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1981/])
HDFS-3083. Cannot run an MR job with HA and security enabled when 
second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303098)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303098
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 Cannot run an MR job with HA and security enabled when second-listed NN active
 --

 Key: HDFS-3083
 URL: https://issues.apache.org/jira/browse/HDFS-3083
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 0.24.0, 0.23.3
Reporter: Mingjie Lai
Assignee: Aaron T. Myers
Priority: Critical
 Fix For: 0.24.0, 0.23.3

 Attachments: HDFS-3083-combined.patch


 Steps to reproduce:
 - turned on ha and security
 - run a mapred job, and wait to finish
 - failover to another namenode
 - run the mapred job again, it fails. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3120) Provide ability to enable sync without append

2012-03-20 Thread Eli Collins (Created) (JIRA)
Provide ability to enable sync without append
-

 Key: HDFS-3120
 URL: https://issues.apache.org/jira/browse/HDFS-3120
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.1
Reporter: Eli Collins
Assignee: Eli Collins


The work on branch-20-append was to support *sync*, for durable HBase WALs, not 
*append*. The branch-20-append implementation is known to be buggy. There's 
been confusion about this; we often answer queries on the list [like 
this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable 
correct sync on branch-1 for HBase is to set dfs.support.append to true in your 
config, which has the side effect of enabling append (which we don't want to 
do).

Let's add a new *dfs.support.hsync* option that enables working sync (which is 
basically the current dfs.support.append flag modulo one place where it's not 
referring to sync). For compatibility, if dfs.support.append is set, 
dfs.support.hsync will be set as well. This way someone can enable sync for 
HBase and still keep the current behavior that if dfs.support.append is not set 
then an append operation will result in an IOE indicating append is not 
supported. We should do this on trunk as well, as there's no reason to conflate 
hsync and append with a single config even if append works.
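
A sketch of what client configuration might look like under this proposal; dfs.support.hsync is the proposed key and does not exist yet, while dfs.support.append is the existing one:

{code}
import org.apache.hadoop.conf.Configuration;

public class SyncConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing behavior on branch-1: enabling append is the only way to get working sync.
    conf.setBoolean("dfs.support.append", false);
    // Proposed (hypothetical until this JIRA lands): durable sync for HBase WALs
    // without also enabling append.
    conf.setBoolean("dfs.support.hsync", true);

    // Compatibility rule described above: setting append implies hsync.
    boolean syncEnabled = conf.getBoolean("dfs.support.hsync", false)
                       || conf.getBoolean("dfs.support.append", false);
    System.out.println("sync enabled: " + syncEnabled);
  }
}
{code}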

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233816#comment-13233816
 ] 

Eli Collins commented on HDFS-3107:
---

bq. Since appends were enabled very recently, only those running Facebook's 
version of Hadoop, or Hadoop 1.0, are doing this now.

Append doesn't work on hadoop 1.0, see HDFS-3120.  I'm actually going to start 
a discussion about removing append entirely on hdfs-dev@.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-3072) haadmin should have configurable timeouts for failover commands

2012-03-20 Thread Todd Lipcon (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HDFS-3072:
-

Assignee: Todd Lipcon

 haadmin should have configurable timeouts for failover commands
 ---

 Key: HDFS-3072
 URL: https://issues.apache.org/jira/browse/HDFS-3072
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 0.24.0
Reporter: Philip Zeyliger
Assignee: Todd Lipcon

 The HAAdmin failover code should time out reasonably aggressively and go 
 on to the fencing strategies if it's dealing with a mostly dead active 
 namenode.  Currently it uses what's probably the default, which is to say no 
 timeout whatsoever.
 {code}
  /**
   * Return a proxy to the specified target service.
   */
  protected HAServiceProtocol getProtocol(String serviceId)
      throws IOException {
    String serviceAddr = getServiceAddr(serviceId);
    InetSocketAddress addr = NetUtils.createSocketAddr(serviceAddr);
    return (HAServiceProtocol)RPC.getProxy(
        HAServiceProtocol.class, HAServiceProtocol.versionID,
        addr, getConf());
  }
 {code}
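
A sketch of how a timeout could be threaded through, assuming the RPC.getProxy overload that accepts an rpcTimeout (added in HADOOP-6889) is available; the configuration key name is illustrative, not an existing property:

{code}
// Sketch only: same method, with an RPC timeout pulled from a hypothetical key.
protected HAServiceProtocol getProtocol(String serviceId) throws IOException {
  String serviceAddr = getServiceAddr(serviceId);
  InetSocketAddress addr = NetUtils.createSocketAddr(serviceAddr);
  int rpcTimeout = getConf().getInt("ha.failover.rpc-timeout.ms", 20000);  // illustrative key
  return RPC.getProxy(
      HAServiceProtocol.class, HAServiceProtocol.versionID, addr,
      UserGroupInformation.getCurrentUser(), getConf(),
      NetUtils.getDefaultSocketFactory(getConf()), rpcTimeout);
}
{code}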

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.

2012-03-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233839#comment-13233839
 ] 

Hudson commented on HDFS-3091:
--

Integrated in Hadoop-Mapreduce-trunk #1025 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1025/])
HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy 
in the config description for the smaller clusters. Contributed by Nicholas. 
(Revision 1302624)

 Result = SUCCESS
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302624
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


 Update the usage limitations of ReplaceDatanodeOnFailure policy in the config 
 description for the smaller clusters.
 ---

 Key: HDFS-3091
 URL: https://issues.apache.org/jira/browse/HDFS-3091
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.3

 Attachments: h3091_20120319.patch


 When verifying the HDFS-1606 feature, I observed a couple of issues.
 Presently the ReplaceDatanodeOnFailure policy is satisfied even though we don't 
 have enough DNs in the cluster to replace with, and this results in a write failure.
 {quote}
 12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception
 java.io.IOException: Failed to add a datanode: nodes.length != 
 original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
 {quote}
 Let's take some cases:
 1) Replication factor 3, cluster size also 3, and unfortunately the pipeline 
 drops to 1.
 ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= 
 replication/2 (3/2==1)*.
 But when looking for a new node to replace with, it obviously cannot find one, 
 and the sanity check will fail.
 This results in a write failure.
 2) Replication factor 10 (the user accidentally sets the replication factor 
 higher than the cluster size),
   and the cluster has only 5 datanodes.
   Here, even if only one node fails, the write will fail for the same reason.
   The pipeline maximum will be 5; with one datanode killed, existings will 
 be 4.
   *existings(4) <= replication/2 (10/2==5)* will be satisfied, and obviously it 
 cannot replace with a new node as there are no extra nodes in the 
 cluster. This results in a write failure.
 3) Sync-related operations also fail in these situations (I will post the 
 clear scenarios); see the config sketch below.
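
For small clusters hitting the failures described above, the existing client-side keys can be tuned; a sketch, with the exact semantics to be checked against hdfs-default.xml for the version in use:

{code}
import org.apache.hadoop.conf.Configuration;

public class ReplaceDatanodePolicySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // On a 3-node (or smaller) cluster there is usually no spare datanode to add,
    // so choosing the NEVER policy (or disabling the feature) avoids the
    // "Failed to add a datanode" write failure shown above.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
    // Other documented values for the policy are DEFAULT and ALWAYS.
  }
}
{code}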

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Milind Bhandarkar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233846#comment-13233846
 ] 

Milind Bhandarkar commented on HDFS-3107:
-

Yes, I am using the term append loosely, because of FB's 20-append branch. 
Our transaction work is done with 0.23.x.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Milind Bhandarkar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233857#comment-13233857
 ] 

Milind Bhandarkar commented on HDFS-3107:
-

Suresh, Nicholas, Eli: any opinions about the proposed API and semantics?

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-20 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233904#comment-13233904
 ] 

Todd Lipcon commented on HDFS-3107:
---

IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much 
because truncating a block is that hard -- but rather because it breaks a 
serious invariant we have elsewhere that blocks only get longer after they are 
created. This means that we have to revisit code all over HDFS -- in particular 
some of the trickiest bits around block synchronization -- to get this to work. 
It's not insurmountable, but I would like to know a lot more about the use case 
before commenting on the API/semantics.

Maybe you can open a JIRA or upload a design about your transactional HDFS 
feature, so we can understand the motivation better? Otherwise I'm more 
inclined to agree with Eli's suggestion to remove append entirely (please 
continue that discussion on-list, though).

{quote}
After appends were enabled in HDFS, we have seen a lot of cases where a lot of 
(mainly text, or even compressed text) datasets were merged using appends.

This is where customers realize their mistake immediately after starting to 
append, and do a ctrl-c.
{quote}
I don't follow... we don't even expose append() via the shell. And if we did, 
would users actually be using fs -append to manually write new lines of data 
into their Hadoop systems??


 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.022.patch

* remove some unnecessary whitespace changes

* re-introduce EditLogInputException

* edit log input stream: change API as we discussed.

* FSEditLogLoader: re-organize this file.  Fix some corner cases relating to 
out-of-order transaction IDs

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
 HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.023.patch

* OpInstanceCache needs to be thread-local to work correctly

* update exception text regex in TestFSEditLogLoader
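
Regarding the first note above, a minimal sketch of the thread-local pattern follows (illustrative only; this is not the OpInstanceCache class from the patch):

{code}
// Sketch only: give each loader thread its own cache of reusable op instances so a
// cached instance is never mutated concurrently by two threads.  Names are illustrative.
import java.util.EnumMap;
import java.util.Map;

class ThreadLocalOpCacheSketch {
  enum OpCode { OP_ADD, OP_DELETE, OP_RENAME }

  private static final ThreadLocal<Map<OpCode, Object>> CACHE =
      new ThreadLocal<Map<OpCode, Object>>() {
        @Override
        protected Map<OpCode, Object> initialValue() {
          return new EnumMap<OpCode, Object>(OpCode.class);
        }
      };

  /** Return this thread's cached instance for the given opcode, creating it on demand. */
  static Object get(OpCode code) {
    Map<OpCode, Object> cache = CACHE.get();
    Object op = cache.get(code);
    if (op == null) {
      op = new Object();               // stand-in for a reusable edit-log op instance
      cache.put(code, op);
    }
    return op;
  }
}
{code}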

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
 HDFS-3004.023.patch, HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3121) test for HADOOP-8194 (quota using viewfs)

2012-03-20 Thread John George (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3121:
--

Attachment: hdfs-3121.patch

Attaching test for HADOOP-8194

 test for HADOOP-8194 (quota using viewfs)
 -

 Key: HDFS-3121
 URL: https://issues.apache.org/jira/browse/HDFS-3121
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
 Attachments: hdfs-3121.patch


 This JIRA is to write tests for viewing quota using viewfs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3121) test for HADOOP-8194 (quota using viewfs)

2012-03-20 Thread John George (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3121:
--

Status: Patch Available  (was: Open)

 test for HADOOP-8194 (quota using viewfs)
 -

 Key: HDFS-3121
 URL: https://issues.apache.org/jira/browse/HDFS-3121
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
 Attachments: hdfs-3121.patch


 This JIRA is to write tests for viewing quota using viewfs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233963#comment-13233963
 ] 

Hadoop QA commented on HDFS-3086:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519114/h3086_20120320b.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2050//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2050//console

This message is automatically generated.

 Change Datanode not to send storage list in registration - it will be sent in 
 block report
 --

 Key: HDFS-3086
 URL: https://issues.apache.org/jira/browse/HDFS-3086
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h3086_20120320.patch, h3086_20120320b.patch


 When a datanode is registered, it also sends the storage list.  This is not 
 useful since the storage list is already available in block reports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active

2012-03-20 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233977#comment-13233977
 ] 

Aaron T. Myers commented on HDFS-3071:
--

The patch looks like it will work to me, but I agree that we shouldn't concern 
ourselves yet with protocol compatibility of the HAServiceProtocol. As such, I 
think you should go ahead and revise the patch to have a more conventional API.

 haadmin failover command does not provide enough detail for when target NN is 
 not ready to be active
 

 Key: HDFS-3071
 URL: https://issues.apache.org/jira/browse/HDFS-3071
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 0.24.0
Reporter: Philip Zeyliger
Assignee: Todd Lipcon
 Attachments: hdfs-3071.txt


 When running the failover command, you can get an error message like the 
 following:
 {quote}
 $ hdfs --config $(pwd) haadmin -failover namenode2 namenode1
 Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active
 {quote}
 Unfortunately, the error message doesn't describe why that node isn't ready 
 to be active.  In my case, the target namenode's logs don't indicate anything 
 either. It turned out that the issue was "Safe mode is ON. Resources are low 
 on NN. Safe mode must be turned off manually.", but ideally the user would be 
 told that at the time of the failover.
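
A hedged sketch of one way to surface that detail (illustrative only; the exception and method names below are assumptions, not the HAServiceProtocol API):

{code}
// Sketch only: carry the reason the NameNode cannot become active in the exception
// message so "haadmin -failover" can print it to the operator.  All names hypothetical.
import java.io.IOException;

class ReadinessCheckSketch {
  static class NotReadyToBecomeActiveException extends IOException {
    NotReadyToBecomeActiveException(String reason) {
      super("Not ready to become active: " + reason);
    }
  }

  /** Throw with a human-readable reason instead of a bare "is not ready" message. */
  static void checkReadyToBecomeActive(boolean inSafeMode, boolean resourcesLow)
      throws NotReadyToBecomeActiveException {
    if (inSafeMode) {
      String reason = "Safe mode is ON."
          + (resourcesLow ? " Resources are low on NN."
                          + " Safe mode must be turned off manually." : "");
      throw new NotReadyToBecomeActiveException(reason);
    }
  }
}
{code}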

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234005#comment-13234005
 ] 

Hadoop QA commented on HDFS-3094:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519125/HDFS-3094.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2053//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2053//console

This message is automatically generated.

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch


 Currently bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force : the namenode formats the directories without prompting
 -nonInteractive : the namenode format returns with an exit code of 1 if the 
 dir exists.
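
A hedged sketch of how these two options could interact with the existing Y/N prompt (illustrative names only, not the actual NameNode code):

{code}
// Sketch only: decide whether to format based on -force / -nonInteractive and on
// whether the storage directory already exists.  Returns a process exit code.
import java.io.Console;

class FormatOptionsSketch {
  static int confirmFormat(boolean force, boolean nonInteractive, boolean dirExists) {
    if (!dirExists || force) {
      return 0;                                   // -force: format without prompting
    }
    if (nonInteractive) {
      return 1;                                   // -nonInteractive: dir exists, exit 1
    }
    Console console = System.console();           // otherwise fall back to the Y/N prompt
    String answer = (console == null) ? null
        : console.readLine("Re-format filesystem? (Y or N) ");
    return (answer != null && answer.trim().equalsIgnoreCase("Y")) ? 0 : 1;
  }
}
{code}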

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3100) failed to append data using webhdfs

2012-03-20 Thread Brandon Li (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3100:
-

Attachment: HDFS-3100.patch

The previous patch missed one condition and did not send the checksum to the 
client, so real corruption could not be detected. The new patch fixes this.

 failed to append data using webhdfs
 ---

 Key: HDFS-3100
 URL: https://issues.apache.org/jira/browse/HDFS-3100
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.24.0, 0.23.1
Reporter: Zhanwei.Wang
Assignee: Brandon Li
 Attachments: HDFS-3100.patch, HDFS-3100.patch, HDFS-3100.patch, 
 hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, 
 test.sh, testAppend.patch


 STEP:
 1. deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows:
 A) enable webhdfs
 B) enable append
 C) disable permissions
 2. start HDFS
 3. run the attached test script
 RESULT:
 Expected: a file named testFile is created and populated with 32K * 5000 
 zeros, and HDFS stays healthy.
 Actual: the script cannot finish; the file is created but not populated as 
 expected, because the append operation fails.
 The datanode log shows that the block scanner reported a bad replica and the 
 namenode decided to delete it. Since this is a single-node cluster, the append 
 then fails. The script fails this way every time, which should not happen.
 Datanode and Namenode logs are attached.
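
For reference, a hedged Java sketch of the same reproduction (equivalent in intent to the attached shell script, not a copy of it; the host, port, and path below are assumptions):

{code}
// Sketch only: append 32K of zeros 5000 times to /testFile over WebHDFS.
// "localhost:50070" and "/testFile" are assumed values for illustration.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsAppendRepro {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(URI.create("webhdfs://localhost:50070"),
                                   new Configuration());
    Path file = new Path("/testFile");
    fs.create(file, true).close();                 // create (or overwrite) the empty file
    byte[] chunk = new byte[32 * 1024];            // 32K of zeros per append
    for (int i = 0; i < 5000; i++) {
      FSDataOutputStream out = fs.append(file);    // each iteration is a separate append
      out.write(chunk);
      out.close();
    }
    fs.close();
  }
}
{code}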

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3100) failed to append data using webhdfs

2012-03-20 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234015#comment-13234015
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3100:
--

Brandon, I think we could avoid the metaFileExists(..) call and, as you 
mentioned, there is a race condition between the two calls.
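
As a hedged illustration of that suggestion (a generic pattern, not the datanode code): open the file directly and handle its absence, instead of checking for existence first and leaving a window between the two calls.

{code}
// Sketch only: a single open call replaces the exists()+open() pair, so there is no
// window in which another thread can remove the meta file between the two calls.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

class OpenMetaSketch {
  /** Returns an open stream for the meta file, or null if it does not exist. */
  static InputStream openMetaIfPresent(File metaFile) {
    try {
      return new FileInputStream(metaFile);
    } catch (FileNotFoundException e) {
      return null;                       // missing meta file handled atomically here
    }
  }
}
{code}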

 failed to append data using webhdfs
 ---

 Key: HDFS-3100
 URL: https://issues.apache.org/jira/browse/HDFS-3100
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.24.0, 0.23.1
Reporter: Zhanwei.Wang
Assignee: Brandon Li
 Attachments: HDFS-3100.patch, HDFS-3100.patch, HDFS-3100.patch, 
 hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, 
 test.sh, testAppend.patch


 STEP:
 1. deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows:
 A) enable webhdfs
 B) enable append
 C) disable permissions
 2. start HDFS
 3. run the attached test script
 RESULT:
 Expected: a file named testFile is created and populated with 32K * 5000 
 zeros, and HDFS stays healthy.
 Actual: the script cannot finish; the file is created but not populated as 
 expected, because the append operation fails.
 The datanode log shows that the block scanner reported a bad replica and the 
 namenode decided to delete it. Since this is a single-node cluster, the append 
 then fails. The script fails this way every time, which should not happen.
 Datanode and Namenode logs are attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234062#comment-13234062
 ] 

Hadoop QA commented on HDFS-3004:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519152/HDFS-3004.023.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade
  org.apache.hadoop.hdfs.TestPersistBlocks

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2054//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2054//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2054//console

This message is automatically generated.

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
 HDFS-3004.023.patch, HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3100) failed to append data using webhdfs

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234079#comment-13234079
 ] 

Hadoop QA commented on HDFS-3100:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519162/HDFS-3100.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed the unit tests build

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2056//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2056//console

This message is automatically generated.

 failed to append data using webhdfs
 ---

 Key: HDFS-3100
 URL: https://issues.apache.org/jira/browse/HDFS-3100
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.24.0, 0.23.1
Reporter: Zhanwei.Wang
Assignee: Brandon Li
 Attachments: HDFS-3100.patch, HDFS-3100.patch, HDFS-3100.patch, 
 hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, 
 test.sh, testAppend.patch


 STEP:
 1. deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows:
 A) enable webhdfs
 B) enable append
 C) disable permissions
 2. start HDFS
 3. run the attached test script
 RESULT:
 Expected: a file named testFile is created and populated with 32K * 5000 
 zeros, and HDFS stays healthy.
 Actual: the script cannot finish; the file is created but not populated as 
 expected, because the append operation fails.
 The datanode log shows that the block scanner reported a bad replica and the 
 namenode decided to delete it. Since this is a single-node cluster, the append 
 then fails. The script fails this way every time, which should not happen.
 Datanode and Namenode logs are attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report

2012-03-20 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234080#comment-13234080
 ] 

Suresh Srinivas commented on HDFS-3086:
---

Patch looks good. +1.

 Change Datanode not to send storage list in registration - it will be sent in 
 block report
 --

 Key: HDFS-3086
 URL: https://issues.apache.org/jira/browse/HDFS-3086
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h3086_20120320.patch, h3086_20120320b.patch


 When a datanode is registered, it also sends the storage list.  This is not 
 useful since the storage list is already available in block reports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2012-03-20 Thread Uma Maheswara Rao G (Created) (JIRA)
Block recovery with closeFile flag true can race with blockReport. Due to this 
blocks are getting marked as corrupt.


 Key: HDFS-3122
 URL: https://issues.apache.org/jira/browse/HDFS-3122
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical


*Block Report* can *race* with *Block Recovery* with closeFile flag true.

If a block report is generated just before recovery on the DN side and, due to 
network problems, that block report is delayed on its way to the NN, the 
recovery succeeds and the generation stamp is changed to a new one. 
The primary DN invokes commitBlockSynchronization and the block gets updated on 
the NN side. It is also marked as complete, since the closeFile flag is true, 
and it now carries the new genstamp.

Now the blockReport starts processing on the NN side. This particular block is 
reported as RBW (its state when the DN generated the BR), but the file was 
already completed on the NN side.

Since the generation stamps mismatch, the block gets marked as corrupt.

{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}






--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2012-03-20 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234104#comment-13234104
 ] 

Uma Maheswara Rao G commented on HDFS-3122:
---

I reproduced this case with debug points:

1) created a file and hsync'ed it.
2) triggered one BR in a separate thread and blocked that call on the NN side 
just before acquiring the FSNamesystem lock.
3) triggered one recoverLease call from a separate thread and let it complete.
4) after #3 completed successfully (i.e. after commitBlockSynchronization with 
the new genstamp), started processing the BR blocked in #2.
5) since that old BR carries the older genstamp, the block gets marked as 
corrupt.

Will attach the colored logs. A client-side sketch of steps 1 and 3 follows 
below.
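
A hedged sketch of the client-side part of this reproduction (steps 1 and 3 only; delaying the block report in step 2 requires a debug breakpoint or fault injection inside the NameNode and is not shown here). The path name is an assumed example, and the default filesystem is assumed to be HDFS.

{code}
// Sketch only: write and hsync a file so its last block is under construction, then
// trigger lease recovery from another caller.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class LeaseRecoveryRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();     // assumes fs.defaultFS points at HDFS
    DistributedFileSystem fs = (DistributedFileSystem) FileSystem.get(conf);
    Path file = new Path("/raceTest");

    // 1) create a file and hsync it, leaving its last block in RBW state on the DN
    FSDataOutputStream out = fs.create(file);
    out.write(new byte[1024]);
    out.hsync();

    // 2) <the DN's pending block report is delayed on the NN side at this point>

    // 3) recover the lease; the block is committed with a new generation stamp and
    //    the file is completed on the NN
    fs.recoverLease(file);

    // 4)-5) once the delayed block report is processed, its RBW replica still has
    //    the old generation stamp and is marked corrupt (see the code snippet above)
    try {
      out.close();                                // may fail: the lease was recovered
    } catch (Exception ignored) {
    }
  }
}
{code}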


 Block recovery with closeFile flag true can race with blockReport. Due to 
 this blocks are getting marked as corrupt.
 

 Key: HDFS-3122
 URL: https://issues.apache.org/jira/browse/HDFS-3122
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical

 *Block Report* can *race* with *Block Recovery* with closeFile flag true.
 If a block report is generated just before recovery on the DN side and, due to 
 network problems, that block report is delayed on its way to the NN, the 
 recovery succeeds and the generation stamp is changed to a new one. 
 The primary DN invokes commitBlockSynchronization and the block gets updated 
 on the NN side. It is also marked as complete, since the closeFile flag is 
 true, and it now carries the new genstamp.
 Now the blockReport starts processing on the NN side. This particular block is 
 reported as RBW (its state when the DN generated the BR), but the file was 
 already completed on the NN side.
 Since the generation stamps mismatch, the block gets marked as corrupt.
 {code}
 case RWR:
   if (!storedBlock.isComplete()) {
     return null; // not corrupt
   } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
     return new BlockToMarkCorrupt(storedBlock,
         "reported " + reportedState + " replica with genstamp " +
         iblk.getGenerationStamp() + " does not match COMPLETE block's " +
         "genstamp in block map " + storedBlock.getGenerationStamp());
   } else { // COMPLETE block, same genstamp
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3121) test for HADOOP-8194 (quota using viewfs)

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234119#comment-13234119
 ] 

Hadoop QA commented on HDFS-3121:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519154/hdfs-3121.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.fs.viewfs.TestViewFsFileStatusHdfs

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2055//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2055//console

This message is automatically generated.

 test for HADOOP-8194 (quota using viewfs)
 -

 Key: HDFS-3121
 URL: https://issues.apache.org/jira/browse/HDFS-3121
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
 Attachments: hdfs-3121.patch


 This JIRA is to write tests for viewing quota using viewfs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2012-03-20 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-3122:
--

Attachment: blockCorrupt.txt

Attached the grepped logs.

 Block recovery with closeFile flag true can race with blockReport. Due to 
 this blocks are getting marked as corrupt.
 

 Key: HDFS-3122
 URL: https://issues.apache.org/jira/browse/HDFS-3122
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical
 Attachments: blockCorrupt.txt


 *Block Report* can *race* with *Block Recovery* with closeFile flag true.
 If a block report is generated just before recovery on the DN side and, due to 
 network problems, that block report is delayed on its way to the NN, the 
 recovery succeeds and the generation stamp is changed to a new one. 
 The primary DN invokes commitBlockSynchronization and the block gets updated 
 on the NN side. It is also marked as complete, since the closeFile flag is 
 true, and it now carries the new genstamp.
 Now the blockReport starts processing on the NN side. This particular block is 
 reported as RBW (its state when the DN generated the BR), but the file was 
 already completed on the NN side.
 Since the generation stamps mismatch, the block gets marked as corrupt.
 {code}
 case RWR:
   if (!storedBlock.isComplete()) {
     return null; // not corrupt
   } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
     return new BlockToMarkCorrupt(storedBlock,
         "reported " + reportedState + " replica with genstamp " +
         iblk.getGenerationStamp() + " does not match COMPLETE block's " +
         "genstamp in block map " + storedBlock.getGenerationStamp());
   } else { // COMPLETE block, same genstamp
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2012-03-20 Thread Uma Maheswara Rao G (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-3122:
--

 Description: 
*Block Report* can *race* with *Block Recovery* with closeFile flag true.

A block report is generated just before block recovery on the DN side and, due 
to network problems, the block report is delayed on its way to the NN. 
After this, the recovery succeeds and the generation stamp is changed to a new 
one. 
The primary DN invokes commitBlockSynchronization and the block gets updated on 
the NN side. The block is also marked as complete, since the closeFile flag was 
true, and it now carries the new genstamp.

Now the blockReport starts processing on the NN side. This particular block is 
reported as RBW (its state when the DN generated the BR), but the file was 
already completed on the NN side.
Finally the block is marked as corrupt because of the genstamp mismatch.

{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}






  was:
*Block Report* can *race* with *Block Recovery* with closeFile flag true.

If a block report is generated just before recovery on the DN side and, due to 
network problems, that block report is delayed on its way to the NN, the 
recovery succeeds and the generation stamp is changed to a new one. 
The primary DN invokes commitBlockSynchronization and the block gets updated on 
the NN side. It is also marked as complete, since the closeFile flag is true, 
and it now carries the new genstamp.

Now the blockReport starts processing on the NN side. This particular block is 
reported as RBW (its state when the DN generated the BR), but the file was 
already completed on the NN side.

Since the generation stamps mismatch, the block gets marked as corrupt.

{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}






Target Version/s: 0.24.0, 0.23.3  (was: 0.23.3, 0.24.0)

 Block recovery with closeFile flag true can race with blockReport. Due to 
 this blocks are getting marked as corrupt.
 

 Key: HDFS-3122
 URL: https://issues.apache.org/jira/browse/HDFS-3122
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical
 Attachments: blockCorrupt.txt


 *Block Report* can *race* with *Block Recovery* with closeFile flag true.
 A block report is generated just before block recovery on the DN side and, due 
 to network problems, the block report is delayed on its way to the NN. 
 After this, the recovery succeeds and the generation stamp is changed to a new 
 one. 
 The primary DN invokes commitBlockSynchronization and the block gets updated 
 on the NN side. The block is also marked as complete, since the closeFile flag 
 was true, and it now carries the new genstamp.
 Now the blockReport starts processing on the NN side. This particular block is 
 reported as RBW (its state when the DN generated the BR), but the file was 
 already completed on the NN side.
 Finally the block is marked as corrupt because of the genstamp mismatch.
 {code}
 case RWR:
   if (!storedBlock.isComplete()) {
     return null; // not corrupt
   } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
     return new BlockToMarkCorrupt(storedBlock,
         "reported " + reportedState + " replica with genstamp " +
         iblk.getGenerationStamp() + " does not match COMPLETE block's " +
         "genstamp in block map " + storedBlock.getGenerationStamp());
   } else { // COMPLETE block, same genstamp
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234131#comment-13234131
 ] 

Hadoop QA commented on HDFS-3004:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519152/HDFS-3004.023.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade
  org.apache.hadoop.hdfs.TestPersistBlocks

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2057//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2057//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2057//console

This message is automatically generated.

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
 HDFS-3004.023.patch, HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initialized by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira