[jira] Updated: (HDFS-984) Delegation Tokens should be persisted in Namenode

2010-02-19 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-984:
--

Attachment: HDFS-984.7.patch

This patch depends on the HADOOP-6573 patch to compile successfully.

> Delegation Tokens should be persisted in Namenode
> -
>
> Key: HDFS-984
> URL: https://issues.apache.org/jira/browse/HDFS-984
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-984.7.patch
>
>
> Delegation tokens should be persisted in the FsImage and EditLogs so that 
> they remain valid after a namenode shutdown and restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-988) saveNamespace can corrupt edits log

2010-02-19 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-988:
--

Attachment: saveNamespace.txt

This patch is for Hadoop 0.20. It fixes the race between saveNamespace and 
setting safemode. 

> saveNamespace can corrupt edits log
> ---
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: dhruba borthakur
> Attachments: saveNamespace.txt
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. The saveNamespace command, when executed, would then save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
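The partial-write hazard described above can be shown in miniature. This is a toy sketch, not HDFS code: the record format, opcode, and byte counts are invented, and a ByteArrayOutputStream stands in for the edits file; it only demonstrates how a snapshot taken while a writer is mid-record ends with a partial write.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model of the race: a snapshot of the "edits log" taken while a writer
// is midway through a record captures a partial write.
public class PartialWriteSketch {

  // Returns the size of a snapshot taken after a half-written record.
  static int snapshotSize() throws IOException {
    ByteArrayOutputStream editLog = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(editLog);

    out.writeUTF("OP_ADD /a"); // record 1, complete (2-byte length + 9 bytes)
    out.writeByte(7);          // record 2: opcode written, body not yet synced

    // "saveNamespace" runs here, copying whatever reached the stream so far.
    byte[] snapshot = editLog.toByteArray();

    out.writeUTF("/b");        // record 2's body lands only after the snapshot
    return snapshot.length;    // 12 bytes: record 1 plus a dangling opcode
  }

  public static void main(String[] args) throws IOException {
    System.out.println("snapshot ends with a partial record: "
        + snapshotSize() + " bytes");
  }
}
```

A reader replaying such a snapshot hits end-of-stream inside record 2, which is exactly the corruption the report describes.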




[jira] Commented: (HDFS-984) Delegation Tokens should be persisted in Namenode

2010-02-19 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835644#action_12835644
 ] 

Jitendra Nath Pandey commented on HDFS-984:
---

> Can you please explain the use-case why these need to be persisted?
  Delegation tokens are used by tasks to authenticate with the namenode. If 
the namenode restarts and loses the information about previously issued 
delegation tokens, the running tasks will not be able to connect to it.
  The scope of this jira is limited to delegation tokens issued by the 
namenode. The delegation tokens issued by the JT in MR are not being persisted.

> Delegation Tokens should be persisted in Namenode
> -
>
> Key: HDFS-984
> URL: https://issues.apache.org/jira/browse/HDFS-984
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-984.7.patch
>
>
> Delegation tokens should be persisted in the FsImage and EditLogs so that 
> they remain valid after a namenode shutdown and restart.




[jira] Commented: (HDFS-986) Push HADOOP-6551 into HDFS

2010-02-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835650#action_12835650
 ] 

Owen O'Malley commented on HDFS-986:


Sorry, I should have commented that I ran the tests locally and they passed.

> Push HADOOP-6551 into HDFS
> --
>
> Key: HDFS-986
> URL: https://issues.apache.org/jira/browse/HDFS-986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: h-986-1.patch, h-986.patch
>
>
> We need to throw readable error messages instead of returning false on errors.




[jira] Updated: (HDFS-986) Push HADOOP-6551 into HDFS

2010-02-19 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-986:
---

   Resolution: Fixed
Fix Version/s: 0.22.0
   Status: Resolved  (was: Patch Available)

I just committed this.

> Push HADOOP-6551 into HDFS
> --
>
> Key: HDFS-986
> URL: https://issues.apache.org/jira/browse/HDFS-986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: h-986-1.patch, h-986.patch
>
>
> We need to throw readable error messages instead of returning false on errors.




[jira] Updated: (HDFS-908) TestDistributedFileSystem fails with Wrong FS on weird hosts

2010-02-19 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-908:
-

Attachment: hdfs-908.txt

Here's a new patch that fixes this issue on all the hosts I've tried it on.

> TestDistributedFileSystem fails with Wrong FS on weird hosts
> 
>
> Key: HDFS-908
> URL: https://issues.apache.org/jira/browse/HDFS-908
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
> Attachments: hdfs-908.txt, hdfs-908.txt
>
>
> On the same host where I experienced HDFS-874, I also experience this failure 
> for TestDistributedFileSystem:
> Testcase: testFileChecksum took 0.492 sec
>   Caused an ERROR
> Wrong FS: hftp://localhost.localdomain:59782/filechecksum/foo0, expected: 
> hftp://127.0.0.1:59782
> java.lang.IllegalArgumentException: Wrong FS: 
> hftp://localhost.localdomain:59782/filechecksum/foo0, expected: 
> hftp://127.0.0.1:59782
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
>   at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
>   at 
> org.apache.hadoop.hdfs.HftpFileSystem.getFileChecksum(HftpFileSystem.java:318)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testFileChecksum(TestDistributedFileSystem.java:166)
> Doesn't appear to occur on trunk or branch-0.21.




[jira] Commented: (HDFS-986) Push HADOOP-6551 into HDFS

2010-02-19 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835841#action_12835841
 ] 

Eli Collins commented on HDFS-986:
--

Looks like this caused a compilation failure, see HDFS-990.

> Push HADOOP-6551 into HDFS
> --
>
> Key: HDFS-986
> URL: https://issues.apache.org/jira/browse/HDFS-986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: h-986-1.patch, h-986.patch
>
>
> We need to throw readable error messages instead of returning false on errors.




[jira] Created: (HDFS-990) FSNameSystem#renewDelegationToken doesn't compile

2010-02-19 Thread Eli Collins (JIRA)
FSNameSystem#renewDelegationToken doesn't compile
-

 Key: HDFS-990
 URL: https://issues.apache.org/jira/browse/HDFS-990
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Eli Collins
Priority: Blocker
 Fix For: 0.22.0


The following code returns a boolean but a long (the new expiration time) is 
expected. Looks like HDFS-986 introduced this.

{code} 
 public long renewDelegationToken(Token token)
  throws InvalidToken, IOException {
String renewer = UserGroupInformation.getCurrentUser().getShortUserName();
return dtSecretManager.renewToken(token, renewer);
  }
{code}
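For context, a minimal sketch of the type constraint involved. The classes below are simplified stand-ins, not the real org.apache.hadoop types (names are borrowed from the snippet above): they only show that the method compiles once the secret manager's renewToken returns the new expiration time as a long rather than a boolean.

```java
// Stand-ins for the Hadoop types in the snippet above. The caller is
// declared to return long, so renewToken itself must return the new
// expiration time as a long; a boolean return would not compile.
public class RenewSketch {
  static class Token {}

  static class DtSecretManager {
    long renewToken(Token token, String renewer) {
      // Invented policy for illustration: extend the lease 24h from now.
      return System.currentTimeMillis() + 24L * 60 * 60 * 1000;
    }
  }

  static final DtSecretManager dtSecretManager = new DtSecretManager();

  // Mirrors the shape of FSNamesystem#renewDelegationToken.
  public static long renewDelegationToken(Token token) {
    String renewer = "hdfs"; // stands in for the UserGroupInformation lookup
    return dtSecretManager.renewToken(token, renewer);
  }

  public static void main(String[] args) {
    System.out.println("new expiration: " + renewDelegationToken(new Token()));
  }
}
```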




[jira] Updated: (HDFS-990) FSNameSystem#renewDelegationToken doesn't compile

2010-02-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-990:
-

Component/s: name-node

> FSNameSystem#renewDelegationToken doesn't compile
> -
>
> Key: HDFS-990
> URL: https://issues.apache.org/jira/browse/HDFS-990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Priority: Blocker
> Fix For: 0.22.0
>
>
> The following code returns a boolean but a long (the new expiration time) is 
> expected. Looks like HDFS-986 introduced this.
> {code} 
>  public long renewDelegationToken(Token token)
>   throws InvalidToken, IOException {
> String renewer = UserGroupInformation.getCurrentUser().getShortUserName();
> return dtSecretManager.renewToken(token, renewer);
>   }
> {code}




[jira] Commented: (HDFS-990) FSNameSystem#renewDelegationToken doesn't compile

2010-02-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835853#action_12835853
 ] 

Owen O'Malley commented on HDFS-990:


I suspect you caught it before Hudson had pushed the Common change out.

{quote}
mvn-install:
[artifact:install] [INFO] Installing 
/Users/oom/work/eclipse/hdfs-trunk/build/hadoop-hdfs-0.22.0-SNAPSHOT.jar to 
/Users/oom/.m2/repository/org/apache/hadoop/hadoop-hdfs/0.22.0-SNAPSHOT/hadoop-hdfs-0.22.0-SNAPSHOT.jar
[artifact:install] [INFO] Installing 
/Users/oom/work/eclipse/hdfs-trunk/build/hadoop-hdfs-test-0.22.0-SNAPSHOT.jar 
to 
/Users/oom/.m2/repository/org/apache/hadoop/hadoop-hdfs-test/0.22.0-SNAPSHOT/hadoop-hdfs-test-0.22.0-SNAPSHOT.jar

BUILD SUCCESSFUL
{quote}

Trunk compiles for me, do you still see this?

> FSNameSystem#renewDelegationToken doesn't compile
> -
>
> Key: HDFS-990
> URL: https://issues.apache.org/jira/browse/HDFS-990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Priority: Blocker
> Fix For: 0.22.0
>
>
> The following code returns a boolean but a long (the new expiration time) is 
> expected. Looks like HDFS-986 introduced this.
> {code} 
>  public long renewDelegationToken(Token token)
>   throws InvalidToken, IOException {
> String renewer = UserGroupInformation.getCurrentUser().getShortUserName();
> return dtSecretManager.renewToken(token, renewer);
>   }
> {code}




[jira] Commented: (HDFS-990) FSNameSystem#renewDelegationToken doesn't compile

2010-02-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835858#action_12835858
 ] 

Owen O'Malley commented on HDFS-990:


I'd also suggest doing an "ant veryclean" to make sure you aren't getting a 
stale jar from ivy. 

Have I mentioned that ivy is a big problem...

> FSNameSystem#renewDelegationToken doesn't compile
> -
>
> Key: HDFS-990
> URL: https://issues.apache.org/jira/browse/HDFS-990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Priority: Blocker
> Fix For: 0.22.0
>
>
> The following code returns a boolean but a long (the new expiration time) is 
> expected. Looks like HDFS-986 introduced this.
> {code} 
>  public long renewDelegationToken(Token token)
>   throws InvalidToken, IOException {
> String renewer = UserGroupInformation.getCurrentUser().getShortUserName();
> return dtSecretManager.renewToken(token, renewer);
>   }
> {code}




[jira] Resolved: (HDFS-990) FSNameSystem#renewDelegationToken doesn't compile

2010-02-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-990.
--

Resolution: Not A Problem

Looks like HDFS-986 made it over to git but HADOOP-6551 is still on the way; my 
git connector is probably just being slow. Sorry for the noise!

> FSNameSystem#renewDelegationToken doesn't compile
> -
>
> Key: HDFS-990
> URL: https://issues.apache.org/jira/browse/HDFS-990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Priority: Blocker
> Fix For: 0.22.0
>
>
> The following code returns a boolean but a long (the new expiration time) is 
> expected. Looks like HDFS-986 introduced this.
> {code} 
>  public long renewDelegationToken(Token token)
>   throws InvalidToken, IOException {
> String renewer = UserGroupInformation.getCurrentUser().getShortUserName();
> return dtSecretManager.renewToken(token, renewer);
>   }
> {code}




[jira] Reopened: (HDFS-733) TestBlockReport fails intermittently

2010-02-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reopened HDFS-733:
--


I'm seeing this again on trunk, e.g. 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/236/testReport.
 Running TestBlockReport locally with the same patch works, and has worked for 
a while.

Regression

org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08 (from 
TestBlockReport)

Failing for the past 1 build (Since #236 )
Took 4.3 sec.
add description
Error Message

Wrong number of PendingReplication blocks expected:<2> but was:<1>
Stacktrace

junit.framework.AssertionFailedError: Wrong number of PendingReplication blocks 
expected:<2> but was:<1>
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08(TestBlockReport.java:393)

> TestBlockReport fails intermittently
> 
>
> Key: HDFS-733
> URL: https://issues.apache.org/jira/browse/HDFS-733
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Suresh Srinivas
>Assignee: Konstantin Boudnik
> Fix For: 0.21.0, 0.22.0
>
> Attachments: HDFS-733.2.patch, HDFS-733.patch, HDFS-733.patch, 
> HDFS-733.patch, HDFS-733.patch
>
>
> Details at 
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/58/testReport/




[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-19 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835876#action_12835876
 ] 

Suresh Srinivas commented on HDFS-946:
--

+1 the patch looks good.

> NameNode should not return full path name when listing a directory or getting 
> the status of a file
> -
>
> Key: HDFS-946
> URL: https://issues.apache.org/jira/browse/HDFS-946
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, 
> HdfsFileStatus3.patch
>
>
> FSDirectory#getListing(String src) has the following code:
>   int i = 0;
>   for (INode cur : contents) {
> listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
> i++;
>   }
> So listing a directory will return an array of FileStatus. Each FileStatus 
> element has the full path name. This increases the return message size and 
> adds non-negligible CPU time to the operation.
> FSDirectory#getFileInfo(String) does not need to return the file name either.
> Another optimization is that in the version of FileStatus that's used in the 
> wire protocol, the field path does not need to be Path; It could be a String 
> or a byte array ideally. This could avoid unnecessary creation of the Path 
> objects at NameNode, thus help reduce the GC problem observed when a large 
> number of getFileInfo or getListing operations hit NameNode.
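A hedged sketch of the direction the description proposes. INode and HdfsFileStatus below are simplified stand-ins for the HDFS classes, not the real ones; the point is only that each listing entry carries the child's local name as bytes instead of a fully qualified Path.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch: build listing entries from local names only, keeping the name as
// a byte array on the wire and avoiding per-entry Path construction.
public class ListingSketch {
  static class INode {
    private final String localName;
    INode(String localName) { this.localName = localName; }
    String getLocalName() { return localName; }
  }

  // Stand-in for the wire-level file status: just the local-name bytes.
  static class HdfsFileStatus {
    final byte[] path;
    HdfsFileStatus(byte[] path) { this.path = path; }
  }

  static HdfsFileStatus[] getListing(List<INode> contents) {
    HdfsFileStatus[] listing = new HdfsFileStatus[contents.size()];
    int i = 0;
    for (INode cur : contents) {
      // Only the local name is sent; the client re-qualifies it against
      // the directory it asked to list.
      listing[i++] = new HdfsFileStatus(
          cur.getLocalName().getBytes(StandardCharsets.UTF_8));
    }
    return listing;
  }

  public static void main(String[] args) {
    HdfsFileStatus[] l =
        getListing(Arrays.asList(new INode("foo"), new INode("bar")));
    System.out.println(new String(l[0].path, StandardCharsets.UTF_8));
  }
}
```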




[jira] Commented: (HDFS-245) Create symbolic links in HDFS

2010-02-19 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835877#action_12835877
 ] 

Eli Collins commented on HDFS-245:
--

The core test failure, TestBlockReport, looks like HDFS-733; it passes for me 
locally. The contrib test failure is Hudson failing to download Tomcat. 
 
  [exec] BUILD FAILED
 [exec] 
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build.xml:569:
 The following error occurred while executing this line:
 [exec] 
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/build.xml:48:
 The following error occurred while executing this line:
 [exec] 
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/hdfsproxy/build.xml:292:
 org.codehaus.cargo.container.ContainerException: Failed to download 
[http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip]

> Create symbolic links in HDFS
> -
>
> Key: HDFS-245
> URL: https://issues.apache.org/jira/browse/HDFS-245
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: Eli Collins
> Attachments: 4044_20081030spi.java, design-doc-v4.txt, 
> designdocv1.txt, designdocv2.txt, designdocv3.txt, 
> HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, 
> symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, 
> symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, 
> symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, 
> symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, 
> symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, 
> symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, 
> symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, 
> symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, 
> symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, 
> symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, 
> symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, 
> symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, 
> symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, 
> symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, 
> symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.




[jira] Commented: (HDFS-733) TestBlockReport fails intermittently

2010-02-19 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835890#action_12835890
 ] 

Eli Collins commented on HDFS-733:
--

Was able to reproduce this locally by looping the test, here's another assert 
that failed: 

Testcase: blockReport_08 took 50.467 sec
FAILED
Was waiting too long for a replica to become TEMPORARY
junit.framework.AssertionFailedError: Was waiting too long for a replica to 
become TEMPORARY
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.waitForTempReplica(TestBlockReport.java:483)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08(TestBlockReport.java:387)


> TestBlockReport fails intermittently
> 
>
> Key: HDFS-733
> URL: https://issues.apache.org/jira/browse/HDFS-733
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Suresh Srinivas
>Assignee: Konstantin Boudnik
> Fix For: 0.21.0, 0.22.0
>
> Attachments: HDFS-733.2.patch, HDFS-733.patch, HDFS-733.patch, 
> HDFS-733.patch, HDFS-733.patch
>
>
> Details at 
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/58/testReport/




[jira] Updated: (HDFS-520) Create new tests for block recovery

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-520:
---

Status: Patch Available  (was: Reopened)

> Create new tests for block recovery
> ---
>
> Key: HDFS-520
> URL: https://issues.apache.org/jira/browse/HDFS-520
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: blockRecoveryPositive.patch, blockRecoveryPositive.patch
>
>
> According to the test plan, a number of new features are going to be 
> implemented as part of this umbrella (HDFS-265) JIRA.
> These new features have to be tested properly. Block recovery is one piece of 
> new functionality that requires new tests to be developed.




[jira] Updated: (HDFS-520) Create new tests for block recovery

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-520:
---

Attachment: blockRecoveryPositive.patch

Here is a new patch that addresses all of Cos's comments.

> Create new tests for block recovery
> ---
>
> Key: HDFS-520
> URL: https://issues.apache.org/jira/browse/HDFS-520
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: blockRecoveryPositive.patch, blockRecoveryPositive.patch
>
>
> According to the test plan, a number of new features are going to be 
> implemented as part of this umbrella (HDFS-265) JIRA.
> These new features have to be tested properly. Block recovery is one piece of 
> new functionality that requires new tests to be developed.




[jira] Updated: (HDFS-961) dfs_readdir incorrectly parses paths

2010-02-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-961:
-

Status: Patch Available  (was: Open)

> dfs_readdir incorrectly parses paths
> 
>
> Key: HDFS-961
> URL: https://issues.apache.org/jira/browse/HDFS-961
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/fuse-dfs
>Affects Versions: 0.20.1, 0.20.2, 0.21.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-961-1.patch, hdfs-961-2.patch
>
>
> fuse-dfs dfs_readdir assumes that DistributedFileSystem#listStatus returns 
> Paths with the same scheme/authority as the dfs.name.dir used to connect. If 
> NameNode.DEFAULT_PORT port is used listStatus returns Paths that have 
> authorities without the port (see HDFS-960), which breaks the following code. 
> {code}
> // hack city: todo fix the below to something nicer and more maintainable but
> // with good performance
> // strip off the path but be careful if the path is solely '/'
> // NOTE - this API started returning filenames as full dfs uris
> const char *const str = info[i].mName + dfs->dfs_uri_len + path_len + 
> ((path_len == 1 && *path == '/') ? 0 : 1);
> {code}
> Let's make the path parsing here more robust. listStatus returns normalized 
> paths so we can find the start of the path by searching for the 3rd slash. A 
> more long term solution is to have hdfsFileInfo maintain a path object or at 
> least pointers to the relevant URI components.
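The "search for the 3rd slash" idea can be sketched as follows. The real fix would live in the fuse-dfs C code; this standalone Java version (the method name pathOf is invented) only illustrates the parsing, which works whether or not the authority carries a port.

```java
// In a normalized URI like hftp://host:port/a/b, the path begins at the
// third '/': two slashes introduce the authority, and the next one starts
// the path. A URI with no path component is treated as the root.
public class UriPathSketch {
  static String pathOf(String uri) {
    int slashes = 0;
    for (int i = 0; i < uri.length(); i++) {
      if (uri.charAt(i) == '/' && ++slashes == 3) {
        return uri.substring(i);
      }
    }
    return "/"; // no third slash: root directory
  }

  public static void main(String[] args) {
    // Works with or without a port in the authority:
    System.out.println(
        pathOf("hftp://localhost.localdomain:59782/filechecksum/foo0"));
    System.out.println(pathOf("hftp://127.0.0.1/filechecksum/foo0"));
    System.out.println(pathOf("hdfs://namenode")); // -> /
  }
}
```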




[jira] Updated: (HDFS-961) dfs_readdir incorrectly parses paths

2010-02-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-961:
-

Status: Open  (was: Patch Available)

> dfs_readdir incorrectly parses paths
> 
>
> Key: HDFS-961
> URL: https://issues.apache.org/jira/browse/HDFS-961
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/fuse-dfs
>Affects Versions: 0.20.1, 0.20.2, 0.21.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-961-1.patch, hdfs-961-2.patch
>
>
> fuse-dfs dfs_readdir assumes that DistributedFileSystem#listStatus returns 
> Paths with the same scheme/authority as the dfs.name.dir used to connect. If 
> NameNode.DEFAULT_PORT port is used listStatus returns Paths that have 
> authorities without the port (see HDFS-960), which breaks the following code. 
> {code}
> // hack city: todo fix the below to something nicer and more maintainable but
> // with good performance
> // strip off the path but be careful if the path is solely '/'
> // NOTE - this API started returning filenames as full dfs uris
> const char *const str = info[i].mName + dfs->dfs_uri_len + path_len + 
> ((path_len == 1 && *path == '/') ? 0 : 1);
> {code}
> Let's make the path parsing here more robust. listStatus returns normalized 
> paths so we can find the start of the path by searching for the 3rd slash. A 
> more long term solution is to have hdfsFileInfo maintain a path object or at 
> least pointers to the relevant URI components.




[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-946:
---

Hadoop Flags: [Incompatible change, Reviewed]  (was: [Incompatible change])
  Status: Patch Available  (was: Open)

Looks like the contrib tests did not get to run. Resubmitting the patch...

> NameNode should not return full path name when listing a directory or getting 
> the status of a file
> -
>
> Key: HDFS-946
> URL: https://issues.apache.org/jira/browse/HDFS-946
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, 
> HdfsFileStatus3.patch
>
>
> FSDirectory#getListing(String src) has the following code:
>   int i = 0;
>   for (INode cur : contents) {
> listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
> i++;
>   }
> So listing a directory will return an array of FileStatus. Each FileStatus 
> element has the full path name. This increases the return message size and 
> adds non-negligible CPU time to the operation.
> FSDirectory#getFileInfo(String) does not need to return the file name either.
> Another optimization is that in the version of FileStatus that's used in the 
> wire protocol, the field path does not need to be Path; It could be a String 
> or a byte array ideally. This could avoid unnecessary creation of the Path 
> objects at NameNode, thus help reduce the GC problem observed when a large 
> number of getFileInfo or getListing operations hit NameNode.




[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-946:
---

Status: Open  (was: Patch Available)

> NameNode should not return full path name when listing a directory or getting 
> the status of a file
> -
>
> Key: HDFS-946
> URL: https://issues.apache.org/jira/browse/HDFS-946
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, 
> HdfsFileStatus3.patch
>
>
> FSDirectory#getListing(String src) has the following code:
>   int i = 0;
>   for (INode cur : contents) {
> listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
> i++;
>   }
> So listing a directory will return an array of FileStatus. Each FileStatus 
> element has the full path name. This increases the return message size and 
> adds non-negligible CPU time to the operation.
> FSDirectory#getFileInfo(String) does not need to return the file name either.
> Another optimization is that in the version of FileStatus that's used in the 
> wire protocol, the field path does not need to be Path; It could be a String 
> or a byte array ideally. This could avoid unnecessary creation of the Path 
> objects at NameNode, thus help reduce the GC problem observed when a large 
> number of getFileInfo or getListing operations hit NameNode.




[jira] Updated: (HDFS-520) Create new tests for block recovery

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-520:
---

Attachment: blockRecoveryPositive1.patch

> Create new tests for block recovery
> ---
>
> Key: HDFS-520
> URL: https://issues.apache.org/jira/browse/HDFS-520
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: blockRecoveryPositive.patch, blockRecoveryPositive1.patch
>
>
> According to the test plan, a number of new features are going to be 
> implemented as part of this umbrella (HDFS-265) JIRA.
> These new features have to be tested properly. Block recovery is one piece of 
> new functionality that requires new tests to be developed.




[jira] Updated: (HDFS-520) Create new tests for block recovery

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-520:
---

Attachment: (was: blockRecoveryPositive.patch)

> Create new tests for block recovery
> ---
>
> Key: HDFS-520
> URL: https://issues.apache.org/jira/browse/HDFS-520
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: blockRecoveryPositive.patch, blockRecoveryPositive1.patch
>
>
> According to the test plan a number of new features are going to be 
> implemented as a part of this umbrella (HDFS-265) JIRA.
> These new features have to be tested properly. Block recovery is one piece of 
> new functionality that requires new tests to be developed.




[jira] Resolved: (HDFS-87) NameNode startup fails if edit log terminates prematurely

2010-02-19 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-87.
--

Resolution: Not A Problem

> NameNode startup fails if edit log terminates prematurely
> -
>
> Key: HDFS-87
> URL: https://issues.apache.org/jira/browse/HDFS-87
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: ~50 node cluster
>Reporter: Bryan Pendleton
> Attachments: fixNameNodeStartup.patch
>
>
> I ran out of space on the device that stores the edit log, resulting in an 
> edit log that is truncated mid transaction.
> Ideally, the NameNode should start up, in SafeMode or the like, whenever this 
> happens. Right now, you get this stack trace:
> 2006-12-12 15:33:57,212 ERROR org.apache.hadoop.dfs.NameNode: 
> java.io.EOFExcepti
> on
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:310)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:104)
> at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:227)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:191)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:320)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:226)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:146)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:138)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:589)




[jira] Commented: (HDFS-87) NameNode startup fails if edit log terminates prematurely

2010-02-19 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835945#action_12835945
 ] 

dhruba borthakur commented on HDFS-87:
--

The issue of having a truncated last entry in the edits log is an old one, and 
it should no longer occur on current trunk (after we introduced preallocation 
of blocks in the edits log and the OP_INVALID end marker).

I am closing this for now.


> NameNode startup fails if edit log terminates prematurely
> -
>
> Key: HDFS-87
> URL: https://issues.apache.org/jira/browse/HDFS-87
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: ~50 node cluster
>Reporter: Bryan Pendleton
> Attachments: fixNameNodeStartup.patch
>
>
> I ran out of space on the device that stores the edit log, resulting in an 
> edit log that is truncated mid transaction.
> Ideally, the NameNode should start up, in SafeMode or the like, whenever this 
> happens. Right now, you get this stack trace:
> 2006-12-12 15:33:57,212 ERROR org.apache.hadoop.dfs.NameNode: 
> java.io.EOFExcepti
> on
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:310)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:104)
> at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:227)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:191)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:320)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:226)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:146)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:138)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:589)




[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits

2010-02-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835950#action_12835950
 ] 

Konstantin Shvachko commented on HDFS-955:
--

Very good test cases.
I don't understand why you mixed the code from HDFS-957 into this patch. It 
does not help the test cases, right?
# testCrashWhileSavingSecondImage() passes with my fix.
# testCrashBeforeRollingFSImage() does not pass. The reason is that:
#- we save the image into IMAGE_NEW
#- then empty EDITS and EDITS_NEW
#- then crash
The criterion for IMAGE_NEW having been written completely and successfully is 
the existence of EDITS_NEW. So if we try to restart the NN at this point, 
IMAGE_NEW will be discarded (since EDITS_NEW is present) and we end up with the 
old IMAGE.
# testSaveWhileEditsRolled(). I don't know what you were trying to achieve with 
this. I don't see the expected exception thrown. But it fails, and the reason 
here is that after saveNamespace() we get a corrupted edits log. It has the 
version bytes, but does not have the end-mark byte (OP_INVALID) at the end. I 
think it was not closed properly.

Will try to fix this.

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, 
> PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Commented: (HDFS-961) dfs_readdir incorrectly parses paths

2010-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835962#action_12835962
 ] 

Hadoop QA commented on HDFS-961:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435934/hdfs-961-2.patch
  against trunk revision 911744.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/117/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/117/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/117/console

This message is automatically generated.

> dfs_readdir incorrectly parses paths
> 
>
> Key: HDFS-961
> URL: https://issues.apache.org/jira/browse/HDFS-961
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/fuse-dfs
>Affects Versions: 0.20.1, 0.20.2, 0.21.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-961-1.patch, hdfs-961-2.patch
>
>
> fuse-dfs dfs_readdir assumes that DistributedFileSystem#listStatus returns 
> Paths with the same scheme/authority as the dfs.name.dir used to connect. If 
> the NameNode.DEFAULT_PORT is used, listStatus returns Paths whose authorities 
> lack the port (see HDFS-960), which breaks the following code. 
> {code}
> // hack city: todo fix the below to something nicer and more maintainable but
> // with good performance
> // strip off the path but be careful if the path is solely '/'
> // NOTE - this API started returning filenames as full dfs uris
> const char *const str = info[i].mName + dfs->dfs_uri_len + path_len + 
> ((path_len == 1 && *path == '/') ? 0 : 1);
> {code}
> Let's make the path parsing here more robust. listStatus returns normalized 
> paths, so we can find the start of the path by searching for the third slash. 
> A more long-term solution is to have hdfsFileInfo maintain a path object or at 
> least pointers to the relevant URI components.
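The third-slash idea above can be sketched as follows; this is a minimal 
illustration in Java (the real fix lives in the C fuse-dfs code), and the class 
and method names are hypothetical, not part of any Hadoop API:

```java
// Sketch: find the path component of a normalized HDFS URI such as
// "hdfs://host:9000/a/b" by locating the third '/', instead of relying
// on a precomputed prefix length that breaks when the port is omitted.
final class UriPathExtractor {
    // Returns the path part of a normalized URI, or "/" if none is found.
    static String pathOf(String uri) {
        int slashes = 0;
        for (int i = 0; i < uri.length(); i++) {
            if (uri.charAt(i) == '/' && ++slashes == 3) {
                return uri.substring(i);
            }
        }
        return "/";
    }
}
```

This works whether or not the authority carries an explicit port, which is 
exactly the case the fixed-offset arithmetic gets wrong.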




[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits

2010-02-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835992#action_12835992
 ] 

Todd Lipcon commented on HDFS-955:
--

bq. I don't understand why you mixed in this patch the code from HDFS-957. It 
does not help the test cases, right?

I was just working on the two in the same tree - with a bit of modification to 
recoverInterruptedCheckpoint I think the second test can be fixed using the 
functionality from HDFS-957. I can demonstrate this with a patch if you would 
like.

bq. The criteria that IMAGE_NEW was written completely and successfully is the 
existence of EDITS_NEW

I think you misspoke here - EDITS_NEW exists _before_ IMAGE_NEW is saved. In my 
opinion the cleanest way of knowing that IMAGE_NEW is complete is the HDFS-957 
patch. You may be able to know that info from the state of some other files, 
but why not be explicit about it to avoid some classes of errors?

bq. I don't know what you were trying to achieve with this. I don't see the 
expected exception thrown. 

Ah, sloppy copy-paste there on my part. I don't expect an exception to actually 
be caught there. The failed restart with corrupted edits is indeed the failure 
I expected to provoke with that test.

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, 
> PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Updated: (HDFS-729) fsck option to list only corrupted files

2010-02-19 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated HDFS-729:
-

Attachment: HDFS-729.2.patch

> fsck option to list only corrupted files
> 
>
> Key: HDFS-729
> URL: https://issues.apache.org/jira/browse/HDFS-729
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: Rodrigo Schmidt
> Attachments: badFiles.txt, badFiles2.txt, corruptFiles.txt, 
> HDFS-729.1.patch, HDFS-729.2.patch
>
>
> An option to fsck to list only corrupted files will be very helpful for 
> frequent monitoring.




[jira] Updated: (HDFS-729) fsck option to list only corrupted files

2010-02-19 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated HDFS-729:
-

Status: Patch Available  (was: Open)

I just added a complete patch.

I'm returning an array of FileStatus objects, as Dhruba suggested, and I used 
the nomenclature proposed by Raghu (corrupt instead of bad).

> fsck option to list only corrupted files
> 
>
> Key: HDFS-729
> URL: https://issues.apache.org/jira/browse/HDFS-729
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: Rodrigo Schmidt
> Attachments: badFiles.txt, badFiles2.txt, corruptFiles.txt, 
> HDFS-729.1.patch, HDFS-729.2.patch
>
>
> An option to fsck to list only corrupted files will be very helpful for 
> frequent monitoring.




[jira] Commented: (HDFS-520) Create new tests for block recovery

2010-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836020#action_12836020
 ] 

Hadoop QA commented on HDFS-520:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12436355/blockRecoveryPositive.patch
  against trunk revision 911744.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/237/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/237/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/237/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/237/console

This message is automatically generated.

> Create new tests for block recovery
> ---
>
> Key: HDFS-520
> URL: https://issues.apache.org/jira/browse/HDFS-520
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: blockRecoveryPositive.patch, blockRecoveryPositive1.patch
>
>
> According to the test plan a number of new features are going to be 
> implemented as a part of this umbrella (HDFS-265) JIRA.
> These new features have to be tested properly. Block recovery is one piece of 
> new functionality that requires new tests to be developed.




[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log

2010-02-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836021#action_12836021
 ] 

Todd Lipcon commented on HDFS-988:
--

Hi Dhruba,

I still think we should fix this in the other issues and then backport to 20. 
But I'll do a review of this patch here since you've already uploaded it:

- in setPermission, the audit logging has moved outside the synchronized block. 
Thus dir.getFileInfo may actually return incorrect info (or even fail if it 
races with someone deleting the file)
- same goes for setOwner
- I think it's OK, but can you verify that the top synchronized block in 
getAdditionalBlock can never have side effects? I don't know the lease 
management code well enough - checkLease is guaranteed side-effect free?
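The audit-logging race described above can be sketched like this; the class and 
field names are illustrative stand-ins, not the actual FSNamesystem code. The 
point is to snapshot the file status while still holding the lock, then log 
outside it:

```java
// Sketch of the review comment: re-reading file state for the audit log
// after releasing the namesystem lock can observe a concurrent delete.
// Capture the status under the lock; emit the log entry outside it.
class AuditSketch {
    private final Object lock = new Object();
    private String status = "perm=644";   // stand-in for dir.getFileInfo(src)

    String setPermissionAudited(String newPerm) {
        final String statusForAudit;
        synchronized (lock) {
            status = "perm=" + newPerm;   // mutate the namespace under the lock
            statusForAudit = status;      // snapshot while state is consistent
        }
        // Logging outside the lock is safe once the snapshot is taken;
        // calling getFileInfo here instead could race with a delete.
        return "audit: " + statusForAudit;
    }
}
```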



> saveNamespace can corrupt edits log
> ---
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: dhruba borthakur
> Attachments: saveNamespace.txt
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853




[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-946:
---

Attachment: HdfsFileStatus4.patch

This patch synced with the trunk and fixed a bug in ListPathsServlet.

> NameNode should not return full path name when listing a directory or getting 
> the status of a file
> -
>
> Key: HDFS-946
> URL: https://issues.apache.org/jira/browse/HDFS-946
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, 
> HdfsFileStatus3.patch, HdfsFileStatus4.patch
>
>
> FSDirectory#getListing(String src) has the following code:
>   int i = 0;
>   for (INode cur : contents) {
>     listing[i] = createFileStatus(srcs + cur.getLocalName(), cur);
>     i++;
>   }
> So listing a directory will return an array of FileStatus. Each FileStatus 
> element has the full path name. This increases the return message size and 
> adds non-negligible CPU time to the operation.
> FSDirectory#getFileInfo(String) does not need to return the file name either.
> Another optimization is that in the version of FileStatus that's used in the 
> wire protocol, the path field does not need to be a Path; it could ideally be 
> a String or a byte array. This would avoid unnecessary creation of Path 
> objects at the NameNode, thus helping reduce the GC problem observed when a 
> large number of getFileInfo or getListing operations hit the NameNode.
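A minimal sketch of the optimization the description proposes; the class name 
and fields below are hypothetical (the eventual trunk class is HdfsFileStatus), 
but the idea is the same: carry only the local name as bytes over the wire and 
let the client rebuild the full path from the directory it listed:

```java
// Sketch: a wire-level status object that stores only the last path
// component as UTF-8 bytes, so the NameNode never allocates Path objects
// while answering getListing/getFileInfo.
class WireFileStatus {
    private final byte[] localName;  // last path component only, not the full path
    private final long len;

    WireFileStatus(String localName, long len) {
        this.localName = localName.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        this.len = len;
    }

    // The client reconstructs the full path from the parent it listed.
    String fullPath(String parent) {
        String name = new String(localName, java.nio.charset.StandardCharsets.UTF_8);
        return parent.endsWith("/") ? parent + name : parent + "/" + name;
    }
}
```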




[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits

2010-02-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836067#action_12836067
 ] 

Konstantin Shvachko commented on HDFS-955:
--

>> The criteria that IMAGE_NEW was written completely and successfully is the 
>> existence of EDITS_NEW
> I think you misspoke here - EDITS_NEW exists before IMAGE_NEW is saved. 

This is what I meant. During start-up the NN decides whether to discard or to 
keep IMAGE_NEW (and rename it to IMAGE) based on the existence of EDITS_NEW. If 
EDITS_NEW exists, it simply removes IMAGE_NEW, since this means the NN failure 
occurred before IMAGE_NEW was completed. If EDITS_NEW is not present but 
IMAGE_NEW is, the NN failure occurred after IMAGE_NEW was successfully written, 
and therefore the NN needs only to complete the checkpoint by renaming 
IMAGE_NEW to IMAGE and purging the edits.
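The start-up decision described above reduces to a small function of which 
files exist; this sketch uses illustrative names (the enum and class are not 
HDFS code) to make the two recovery branches explicit:

```java
// Sketch of the NN start-up decision: which file survives a crash mid-
// checkpoint, driven purely by the existence of EDITS_NEW and IMAGE_NEW.
enum RecoveryAction { DISCARD_IMAGE_NEW, PROMOTE_IMAGE_NEW, NORMAL_START }

final class CheckpointRecovery {
    static RecoveryAction decide(boolean editsNewExists, boolean imageNewExists) {
        if (imageNewExists && editsNewExists) {
            // Crash happened before IMAGE_NEW was finished: fall back to IMAGE.
            return RecoveryAction.DISCARD_IMAGE_NEW;
        }
        if (imageNewExists) {
            // EDITS_NEW already purged: IMAGE_NEW was written successfully,
            // so finish the checkpoint by renaming it to IMAGE.
            return RecoveryAction.PROMOTE_IMAGE_NEW;
        }
        return RecoveryAction.NORMAL_START;
    }
}
```

Todd's objection in the next comment is precisely that file existence alone is 
not a reliable input to this function on journaling filesystems.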

> You may be able to know that info from the state of some other files, but why 
> not be explicit about it to avoid some classes of errors?

We want to be able to detect failure without reading the contents of the image 
file. The contents may be corrupted during failures; it is not safe to rely on 
reading the data from the image or edits files.

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, 
> PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits

2010-02-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836073#action_12836073
 ] 

Todd Lipcon commented on HDFS-955:
--

bq. If EDITS_NEW is not present, but IMAGE_NEW is, this means that the NN 
failure occurred after IMAGE_NEW was successfully written

This is not necessarily the case on a lot of filesystems. As I noted in 
HDFS-970, delayed allocation combined with the default journaling modes in many 
commonly deployed filesystems means that you cannot use the existence of one 
file to determine whether data has been flushed in another. That is to say, 
some filesystems will recover the metadata operations on the EDITS files even 
though the data operations on IMAGE_NEW are incomplete.

The _only_ way we can know that IMAGE_NEW is really on disk across a variety of 
filesystems is to fsync it. Otherwise, when the filesystem is recovered, we 
could rollback to a state where the file is empty but EDITS_NEW has been 
removed.
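In Java this fsync is available through FileDescriptor.sync() (or 
FileChannel.force()); the sketch below shows the ordering Todd is arguing for. 
The file name and helper are illustrative, not the FSImage code:

```java
// Sketch: force the new image's data to disk before any rename or unlink
// that recovery logic will later interpret. Without the sync, a crash can
// recover the metadata ops (EDITS_NEW removed) while IMAGE_NEW's data is
// still empty.
final class SyncedSave {
    static void saveDurably(java.io.File imageNew, byte[] contents)
            throws java.io.IOException {
        try (java.io.FileOutputStream out = new java.io.FileOutputStream(imageNew)) {
            out.write(contents);
            out.flush();
            out.getFD().sync();   // fsync: data is on disk before we return
        }
        // Only after this point is it safe to remove EDITS_NEW or to
        // rename IMAGE_NEW to IMAGE.
    }
}
```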

bq. During start up the NN decides on whether to discard or to keep IMAGE_NEW 
(and rename it to IMAGE) based on the existence of EDITS_NEW

I agree this is what it does. But I don't think there is then any valid rolling 
order that tolerates arbitrary crashes. See my discussion above.

bq. The contents may be corrupted during failures, it is not safe to rely on 
reading the data from image or edits files

It is safe if we fsync. Metadata can also be corrupted (rolled back to 
indeterminate states) in failures. Especially with the broken way in which we 
currently do image replacement, I don't want to take chances here. This is best 
explained by the presentation linked from this post by Theodore Ts'o: 
http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, 
> PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Updated: (HDFS-520) Create new tests for block recovery

2010-02-19 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-520:


Attachment: 
TEST-org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.txt

Hmm, please let me know if I'm doing something wrong here, but I've tried to 
run {{ant clean run-test-unit}} and all TestBlockRecovery tests failed (see the 
attached log).

> Create new tests for block recovery
> ---
>
> Key: HDFS-520
> URL: https://issues.apache.org/jira/browse/HDFS-520
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: blockRecoveryPositive.patch, 
> blockRecoveryPositive1.patch, 
> TEST-org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.txt
>
>
> According to the test plan a number of new features are going to be 
> implemented as a part of this umbrella (HDFS-265) JIRA.
> These new features have to be tested properly. Block recovery is one piece of 
> new functionality that requires new tests to be developed.




[jira] Commented: (HDFS-708) A stress-test tool for HDFS.

2010-02-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836098#action_12836098
 ] 

Konstantin Shvachko commented on HDFS-708:
--

Hi Wang Xu. Thanks for the link. I think people will find your scripting 
solution useful for testing HDFS directly, without the MapReduce framework. 
What we want to achieve here is to model the workload of a real cluster and 
test HDFS under that load. MR is a part of the testing for us.

> A stress-test tool for HDFS.
> 
>
> Key: HDFS-708
> URL: https://issues.apache.org/jira/browse/HDFS-708
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: test, tools
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: SLiveTest.pdf
>
>
> It would be good to have a tool for automatic stress testing HDFS, which 
> would provide IO-intensive load on HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to 
> analyze possible failures.




[jira] Updated: (HDFS-708) A stress-test tool for HDFS.

2010-02-19 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-708:
-

Attachment: SLiveTest.pdf

Attaching a more detailed design document.

> A stress-test tool for HDFS.
> 
>
> Key: HDFS-708
> URL: https://issues.apache.org/jira/browse/HDFS-708
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: test, tools
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: SLiveTest.pdf
>
>
> It would be good to have a tool for automatic stress testing HDFS, which 
> would provide IO-intensive load on HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to 
> analyze possible failures.




[jira] Updated: (HDFS-984) Delegation Tokens should be persisted in Namenode

2010-02-19 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-984:
--

Attachment: HDFS-984.10.patch

> Delegation Tokens should be persisted in Namenode
> -
>
> Key: HDFS-984
> URL: https://issues.apache.org/jira/browse/HDFS-984
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-984.10.patch, HDFS-984.7.patch
>
>
> The Delegation tokens should be persisted in the FsImage and EditLogs so that 
> they are valid to be used after namenode shutdown and restart.
