[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990462#comment-12990462
 ] 

Todd Lipcon commented on HDFS-900:
--

I'm a little concerned that this wasn't committed with a test. The fix looks 
good, but manual testing won't prevent a regression.

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-03 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990457#comment-12990457
 ] 

Konstantin Boudnik commented on HDFS-1602:
--

+1. The patch seems legit and it does fix HDFS-1496.


> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
> Attachments: HDFS-1602.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-875) NameNode incorrectly handles corrupt replicas

2011-02-03 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990452#comment-12990452
 ] 

Konstantin Shvachko commented on HDFS-875:
--

Now that HDFS-900 is closed, should we close this one as well?
If not, then it would be good to have a precise description of what the problem 
is, how it reveals itself, whether it is reproducible, and whether anybody has seen it live.

> NameNode incorrectly handles corrupt replicas
> 
>
> Key: HDFS-875
> URL: https://issues.apache.org/jira/browse/HDFS-875
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hairong Kuang
>Priority: Blocker
> Fix For: 0.22.0
>
>
> I reviewed how NameNode handles corrupt replicas as part of work on HDFS-145. 
> Compared to releases prior to 0.21, NameNode now does a good job identifying 
> corrupt replicas, but it seems to me there are two flaws in how it handles 
> them:
> 1. NameNode does not add corrupt replicas to the block locations, as 
> NameNode did before;
> 2. If the corruption is caused by a generation stamp mismatch or a state 
> mismatch, the wrong GS and state do not get put in corruptReplicasMap. 
> Therefore it may lead to the deletion of the wrong replica. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-900:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this.

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990446#comment-12990446
 ] 

Konstantin Shvachko commented on HDFS-900:
--

test failures:
TestFileConcurrentReader - HDFS-1401
TestStorageRestore - HDFS-1496

test-patch results:
{code}
 [exec] -1 overall.  
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] +1 system test framework.  The patch passed system test 
framework compile.
 [exec] 
==
{code}
Testing of this patch was done manually and using Todd's utility attached 
above.

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990424#comment-12990424
 ] 

Jakob Homan commented on HDFS-900:
--

+1

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1515) Test append and quotas

2011-02-03 Thread Po Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990420#comment-12990420
 ] 

Po Cheung commented on HDFS-1515:
-

There exists a testcase for quotas and append in TestQuota.testSpaceCommands.  
Will that suffice?

{noformat}
  // Test Append :
  
  // verify space quota
  c = dfs.getContentSummary(quotaDir1);
  assertEquals(c.getSpaceQuota(), 4 * fileSpace);
  
  // verify space before append;
  c = dfs.getContentSummary(dstPath);
  assertEquals(c.getSpaceConsumed(), 3 * fileSpace);
  
  OutputStream out = dfs.append(file2);
  // appending 1 fileLen should succeed
  out.write(new byte[fileLen]);
  out.close();
  
  file2Len += fileLen; // after append
  
  // verify space after append;
  c = dfs.getContentSummary(dstPath);
  assertEquals(c.getSpaceConsumed(), 4 * fileSpace);
  
  // now increase the quota for quotaDir1
  dfs.setQuota(quotaDir1, FSConstants.QUOTA_DONT_SET, 5 * fileSpace);
  // Now, appending more than 1 fileLen should result in an error
  out = dfs.append(file2);
  hasException = false;
  try {
out.write(new byte[fileLen + 1024]);
out.flush();
out.close();
  } catch (DSQuotaExceededException e) {
hasException = true;
IOUtils.closeStream(out);
  }
  assertTrue(hasException);
  
  file2Len += fileLen; // after partial append
  
  // verify space after partial append
  c = dfs.getContentSummary(dstPath);
  assertEquals(c.getSpaceConsumed(), 5 * fileSpace);
{noformat}

> Test append and quotas 
> ---
>
> Key: HDFS-1515
> URL: https://issues.apache.org/jira/browse/HDFS-1515
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
>
> There is no test coverage for quotas and append. Let's add a test to 
> TestQuota that covers that quotas are updated correctly when appending to a 
> file.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Closed: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing

2011-02-03 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins closed HDFS-982.



> TestDelegationToken#testDelegationTokenWithRealUser is failing
> --
>
> Key: HDFS-982
> URL: https://issues.apache.org/jira/browse/HDFS-982
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: contrib/hdfsproxy, security
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> Hudson is reporting that TestDelegationToken#testDelegationTokenWithRealUser 
> is failing on trunk.
> Failing for the past 10 builds (Since #223 )
> Took 0.61 sec.
> Error Message
> User: RealUser is not allowed to impersonate proxyUser
> Stacktrace
> org.apache.hadoop.ipc.RemoteException: User: RealUser is not allowed to 
> impersonate proxyUser
>   at org.apache.hadoop.ipc.Client.call(Client.java:887)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:220)
>   at 
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:151)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:250)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:217)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:87)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1747)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1775)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1763)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:101)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.getFileSystem(MiniDFSCluster.java:813)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:147)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:145)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken.testDelegationTokenWithRealUser(TestDelegationToken.java:144)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing

2011-02-03 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-982.
--

Resolution: Fixed

Looks like it's fixed.

> TestDelegationToken#testDelegationTokenWithRealUser is failing
> --
>
> Key: HDFS-982
> URL: https://issues.apache.org/jira/browse/HDFS-982
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: contrib/hdfsproxy, security
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> Hudson is reporting that TestDelegationToken#testDelegationTokenWithRealUser 
> is failing on trunk.
> Failing for the past 10 builds (Since #223 )
> Took 0.61 sec.
> Error Message
> User: RealUser is not allowed to impersonate proxyUser
> Stacktrace
> org.apache.hadoop.ipc.RemoteException: User: RealUser is not allowed to 
> impersonate proxyUser
>   at org.apache.hadoop.ipc.Client.call(Client.java:887)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:220)
>   at 
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:151)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:250)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:217)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:87)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1747)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1775)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1763)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:101)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.getFileSystem(MiniDFSCluster.java:813)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:147)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:145)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken.testDelegationTokenWithRealUser(TestDelegationToken.java:144)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing

2011-02-03 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990413#comment-12990413
 ] 

Konstantin Shvachko commented on HDFS-982:
--

Eli, do you still see this failing? If not, I'll close it.

> TestDelegationToken#testDelegationTokenWithRealUser is failing
> --
>
> Key: HDFS-982
> URL: https://issues.apache.org/jira/browse/HDFS-982
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: contrib/hdfsproxy, security
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> Hudson is reporting that TestDelegationToken#testDelegationTokenWithRealUser 
> is failing on trunk.
> Failing for the past 10 builds (Since #223 )
> Took 0.61 sec.
> Error Message
> User: RealUser is not allowed to impersonate proxyUser
> Stacktrace
> org.apache.hadoop.ipc.RemoteException: User: RealUser is not allowed to 
> impersonate proxyUser
>   at org.apache.hadoop.ipc.Client.call(Client.java:887)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:220)
>   at 
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:151)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:250)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:217)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:87)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1747)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1775)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1763)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:101)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.getFileSystem(MiniDFSCluster.java:813)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:147)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:145)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken.testDelegationTokenWithRealUser(TestDelegationToken.java:144)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2011-02-03 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990410#comment-12990410
 ] 

Sanjay Radia commented on HDFS-347:
---

Dhruba,
  you had mentioned that you have a prototype for this local optimization. 
Could you please share the performance improvement observed and your approach?

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: George Porter
>Assignee: Todd Lipcon
> Attachments: HADOOP-4801.1.patch, HADOOP-4801.2.patch, 
> HADOOP-4801.3.patch, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> the answer is to provide a mechanism for a DFSClient to directly open data 
> blocks that happen to be on the same machine.  It could do this by examining 
> the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.
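
To make the idea concrete, here is a minimal, hedged sketch (not the HDFS-347 
patch itself) of what a short-circuit local read could look like once the 
client has located the replica's file on the local disk; the path handling, 
class name, and buffer size are assumptions for illustration only.

{code}
// Hedged illustration: read a local replica file directly with a large buffer,
// instead of streaming it from the DataNode in 64 KB chunks over a socket.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LocalBlockReadSketch {
  private static final int BUFFER_SIZE = 4 * 1024 * 1024; // much larger than 64 KB

  static long readLocalBlock(File blockFile) throws IOException {
    long total = 0;
    byte[] buf = new byte[BUFFER_SIZE];
    InputStream in = new FileInputStream(blockFile);
    try {
      int n;
      while ((n = in.read(buf)) > 0) {
        total += n; // a real client would hand these bytes to the caller
      }
    } finally {
      in.close();
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical path to a replica in the DataNode's local storage directory.
    System.out.println("read " + readLocalBlock(new File(args[0])) + " bytes locally");
  }
}
{code}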

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-982) TestDelegationToken#testDelegationTokenWithRealUser is failing

2011-02-03 Thread Po Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990382#comment-12990382
 ] 

Po Cheung commented on HDFS-982:


The test passes on trunk as well.

> TestDelegationToken#testDelegationTokenWithRealUser is failing
> --
>
> Key: HDFS-982
> URL: https://issues.apache.org/jira/browse/HDFS-982
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: contrib/hdfsproxy, security
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> Hudson is reporting that TestDelegationToken#testDelegationTokenWithRealUser 
> is failing on trunk.
> Failing for the past 10 builds (Since #223 )
> Took 0.61 sec.
> Error Message
> User: RealUser is not allowed to impersonate proxyUser
> Stacktrace
> org.apache.hadoop.ipc.RemoteException: User: RealUser is not allowed to 
> impersonate proxyUser
>   at org.apache.hadoop.ipc.Client.call(Client.java:887)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:220)
>   at 
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:151)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:250)
>   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:217)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:87)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1747)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1775)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1763)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:101)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.getFileSystem(MiniDFSCluster.java:813)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:147)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken$1.run(TestDelegationToken.java:145)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
>   at 
> org.apache.hadoop.hdfs.security.TestDelegationToken.testDelegationTokenWithRealUser(TestDelegationToken.java:144)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix

2011-02-03 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990355#comment-12990355
 ] 

Boris Shkolnik commented on HDFS-1496:
--

Looks like the problem is that when we are trying to restore a storage dir, we 
format it, which always saves the current in-memory state into a _new_ 
fsimage. Instead we should restore the storage without saving the state and 
creating a new fsimage; it will be copied there during the checkpoint anyway. 
I've attached the patch to HDFS-1602. Please look at it and comment (the patch 
is for trunk).
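
As a rough, hedged sketch of the idea (not the attached HDFS-1602 patch; the 
class below is only a self-contained stand-in for the NameNode's 
storage-directory bookkeeping): the restored directory is simply re-registered 
and no new fsimage is written, leaving it to the next checkpoint to fill in.

{code}
// Hedged stand-in for the NameNode's storage-directory bookkeeping.
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StorageRestoreSketch {
  static class StorageDir {
    final File root;
    StorageDir(File root) { this.root = root; }
  }

  private final List<StorageDir> activeDirs = new ArrayList<StorageDir>();
  private final List<StorageDir> removedDirs = new ArrayList<StorageDir>();

  void attemptRestoreRemovedStorage() {
    for (Iterator<StorageDir> it = removedDirs.iterator(); it.hasNext();) {
      StorageDir sd = it.next();
      if (sd.root.exists() && sd.root.canWrite()) {
        // Old behavior: "format" the directory here, saving the in-memory
        // namespace into a brand-new fsimage.
        // Proposed behavior: only put the directory back on the active list;
        // the next checkpoint copies the image and edits into it.
        activeDirs.add(sd);
        it.remove();
      }
    }
  }
}
{code}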

> TestStorageRestore is failing after HDFS-903 fix
> 
>
> Key: HDFS-1496
> URL: https://issues.apache.org/jira/browse/HDFS-1496
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Hairong Kuang
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1496.sh, HDFS-1496.sh, HDFS-1496.sh
>
>
> TestStorageRestore seems to be failing after HDFS-903 commit. Running git 
> bisect confirms it.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (HDFS-1607) Fix references to misspelled method name getProtocolSigature

2011-02-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-1607.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Verified that this patch compiles against the current Common snapshot and 
committed to trunk.

> Fix references to misspelled method name getProtocolSigature
> 
>
> Key: HDFS-1607
> URL: https://issues.apache.org/jira/browse/HDFS-1607
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Trivial
> Attachments: hdfs-1607.txt
>
>


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1610:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-1610.txt
>
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-900:
-

Status: Patch Available  (was: Open)

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990287#comment-12990287
 ] 

Hadoop QA commented on HDFS-1610:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12470177/hdfs-1610.txt
  against trunk revision 1066305.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 2 release audit warnings 
(more than the trunk's current 0 warnings).

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//testReport/
Release audit warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/146//console

This message is automatically generated.

> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-1610.txt
>
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990282#comment-12990282
 ] 

Konstantin Boudnik commented on HDFS-1610:
--

+1. The patch looks good and fixed the issues.

> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-1610.txt
>
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1610:
--

Status: Patch Available  (was: Open)

> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-1610.txt
>
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1610:
--

Attachment: hdfs-1610.txt

Patch to fix both tests

> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-1610.txt
>
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990239#comment-12990239
 ] 

Todd Lipcon commented on HDFS-1610:
---

TestBlockToken has the same problem

> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-03 Thread Todd Lipcon (JIRA)
TestClientProtocolWithDelegationToken failing
-

 Key: HDFS-1610
 URL: https://issues.apache.org/jira/browse/HDFS-1610
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker


Another instance of the same type of failure as MAPREDUCE-2300 (a mock protocol 
implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-03 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1602:
-

Attachment: HDFS-1602.patch

> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
> Attachments: HDFS-1602.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-03 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990218#comment-12990218
 ] 

Boris Shkolnik commented on HDFS-1602:
--

I've looked at the testStorageRestore failure. It seems the problem is that 
when we are trying to restore a storage dir, we format it, which always saves 
the current in-memory state into a new fsimage. Instead we should restore the 
storage without creating a new fsimage; it will be copied there during the 
checkpoint anyway.
Here is a patch for trunk to do it. Please review.

> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1609) rmr command is not displaying any error message when a path contains wildcard characters and does not exist.

2011-02-03 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HDFS-1609:
--

Attachment: HDFS-1609.patch

Currently, if globAndProcess() doesn't find any file that matches the pattern, 
it does nothing. With HDFS-1609.patch, it raises FileNotFoundException if there 
are no matches.
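
A hedged sketch of the described behavior (not the attached patch; the 
surrounding class, method shape, and the delete call are assumptions for 
illustration), using the FileSystem glob API:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobAndProcessSketch {
  static void globAndProcess(FileSystem fs, Path pattern) throws IOException {
    FileStatus[] matches = fs.globStatus(pattern);
    if (matches == null || matches.length == 0) {
      // Proposed change: report an error instead of silently doing nothing.
      throw new FileNotFoundException("Cannot access " + pattern
          + ": No such file or directory.");
    }
    for (FileStatus status : matches) {
      fs.delete(status.getPath(), true); // e.g. what "rmr" would do per match
    }
  }
}
{code}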

> rmr command is not displaying any error message when a path contains wildcard 
> characters and does not exist.
> 
>
> Key: HDFS-1609
> URL: https://issues.apache.org/jira/browse/HDFS-1609
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.1, 0.20.2
>Reporter: Uma Maheswara Rao G
>Priority: Minor
> Attachments: HDFS-1609.patch
>
>
> When we give an invalid directory path, an error message is shown on the 
> console. But if we provide a wildcard expression for an invalid directory 
> path, no error message is shown even though there is no pattern match for 
> that path.
> linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test/test
> rmr: cannot remove /test/test: No such file or directory.
> linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test*
> linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin #

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1609) rmr command is not displaying any error message when a path contains wildcard characters and does not exist.

2011-02-03 Thread Uma Maheswara Rao G (JIRA)
rmr command is not displaying any error message when a path contains wildcard 
characters and does not exist.


 Key: HDFS-1609
 URL: https://issues.apache.org/jira/browse/HDFS-1609
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.2, 0.20.1
Reporter: Uma Maheswara Rao G
Priority: Minor


When we give an invalid directory path, an error message is shown on the 
console. But if we provide a wildcard expression for an invalid directory path, 
no error message is shown even though there is no pattern match for that 
path.

linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test/test
rmr: cannot remove /test/test: No such file or directory.

linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin # ./hdfs dfs -rmr /test*
linux-9j5v:/home/hadoop-hdfs-0.22.0-SNAPSHOT/bin #


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1608) Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line options

2011-02-03 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1608:
--

Description: 
FileSystem has the API 



*public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, 
Path dst)*
 
 
This API provides an overwrite option, but the corresponding command line 
doesn't have it. For consistency and better usability, the command line should 
also support an overwrite option to put files forcefully ( put [-f]  ), and 
likewise for the copyFromLocal command line option.


  was:
FileSystem has the API 

  {code:xml}


public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, 
Path dst)

{code}
 
This API provides overwrite option. But the mapping command line doesn't have 
this option. To maintain the consistency and better usage  the command line 
option also can support the overwrite option like to put the files forcefully. 
( put [-f]  ) and also for copyFromLocal command line option.



> Provide overwrite option (-overwrite/-f) in put and copyFromLocal command 
> line options
> --
>
> Key: HDFS-1608
> URL: https://issues.apache.org/jira/browse/HDFS-1608
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20.1, 0.20.2
>Reporter: Uma Maheswara Rao G
>Priority: Minor
>
> FileSystem has the API 
> *public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] 
> srcs, Path dst)*
>  
>  
> This API provides an overwrite option, but the corresponding command line 
> doesn't have it. For consistency and better usability, the command line 
> should also support an overwrite option to put files forcefully 
> ( put [-f]  ), and likewise for the copyFromLocal command line option.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1608) Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line options

2011-02-03 Thread Uma Maheswara Rao G (JIRA)
Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line 
options
--

 Key: HDFS-1608
 URL: https://issues.apache.org/jira/browse/HDFS-1608
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.20.2, 0.20.1
Reporter: Uma Maheswara Rao G
Priority: Minor


FileSystem has the API 

  {code:xml}


public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, 
Path dst)

{code}
 
This API provides an overwrite option, but the corresponding command line 
doesn't have it. For consistency and better usability, the command line should 
also support an overwrite option to put files forcefully ( put [-f]  ), and 
likewise for the copyFromLocal command line option.
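
For reference, a hedged usage sketch of the existing API quoted above (the 
paths and class name are illustrative; the overwrite flag is the behavior a 
"put -f" / "copyFromLocal -f" option would expose on the command line):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutOverwriteSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path[] srcs = { new Path("file:///tmp/data.txt") }; // illustrative local source
    Path dst = new Path("/user/example/");              // illustrative HDFS destination
    // delSrc = false (keep the local copy), overwrite = true (force overwrite,
    // i.e. what a "put -f" option would do).
    fs.copyFromLocalFile(false, true, srcs, dst);
  }
}
{code}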


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Assigned: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HDFS-900:


Assignee: Konstantin Shvachko

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-900:
-

Attachment: reportCorruptBlock.patch

Yes, this is indeed a bug in block report processing. After step 3 in Todd's 
description the NN has 3 good replicas and one corrupt one. The corrupt replica 
is in recentInvalidatesSet, but not in the DatanodeDescriptor; that is, the 
replica is scheduled for deletion from the DN. See blockReceived().
But before it is deleted from the DN, that same DN sends a block report, which 
contains the replica. DatanodeDescriptor.processReport() treats it as a new 
replica because it is not in the DatanodeDescriptor, and as a good one since 
its blockId, generationStamp, and length are in order.
The fix is to ignore replicas that are scheduled for deletion from this DN.
I tested this patch with the test case attached by Todd, thanks. The test 
passes with the fix and fails without it.
The test case is not exactly a unit test, as it introduces changes to the 
FSNamesystem class for testing, so I did not include it in the patch.
Todd, is it possible to convert your case into a real unit test?
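
For illustration, here is a minimal, self-contained sketch of the logic 
described above; it is not the actual reportCorruptBlock.patch, and the simple 
sets below only stand in for the NameNode's per-DataNode invalidate queue and 
stored-replica bookkeeping.

{code}
// Hedged sketch: a replica already queued for invalidation on this DataNode is
// skipped when it shows up in a block report, instead of being re-added as a
// new good replica.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlockReportSketch {
  // Stand-in for recentInvalidatesSet: blocks scheduled for deletion on this DN.
  private final Set<Long> scheduledForDeletion = new HashSet<Long>();
  // Stand-in for the replicas recorded in the DatanodeDescriptor.
  private final Set<Long> storedReplicas = new HashSet<Long>();

  void scheduleForDeletion(long blockId) {
    scheduledForDeletion.add(blockId);
  }

  void processReport(List<Long> reportedBlockIds) {
    for (long blockId : reportedBlockIds) {
      if (scheduledForDeletion.contains(blockId)) {
        continue; // pending deletion: do not treat it as a new good replica
      }
      storedReplicas.add(blockId);
    }
  }
}
{code}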

> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this node
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the node, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira