[jira] [Moved] (HDFS-5844) Fix broken link in WebHDFS.apt.vm

2014-01-27 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA moved HADOOP-10299 to HDFS-5844:
--

  Component/s: (was: documentation)
   documentation
 Target Version/s: 2.3.0  (was: 2.3.0)
Affects Version/s: (was: 2.2.0)
   2.2.0
  Key: HDFS-5844  (was: HADOOP-10299)
  Project: Hadoop HDFS  (was: Hadoop Common)

> Fix broken link in WebHDFS.apt.vm
> -
>
> Key: HDFS-5844
> URL: https://issues.apache.org/jira/browse/HDFS-5844
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
>
> There is one broken link in WebHDFS.apt.vm.
> {code}
> {{{RemoteException JSON Schema}}}
> {code}
> should be
> {code}
> {{RemoteException JSON Schema}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5754:
-

Attachment: HDFS-5754.010.patch

Rebased the patch against the HDFS-5535 branch, along with a couple of unit test fixes.

> Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
> 
>
> Key: HDFS-5754
> URL: https://issues.apache.org/jira/browse/HDFS-5754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
> HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
> HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
> HDFS-5754.009.patch, HDFS-5754.010.patch
>
>
> Currently, LayoutVersion defines the on-disk data format and supported 
> features of the entire cluster including NN and DNs.  LayoutVersion is 
> persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
> supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
> different LayoutVersion than NN cannot register with the NN.
> We propose to split LayoutVersion into two independent values that are local 
> to the nodes:
> - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
> the format of FSImage, editlog and the directory structure.
> - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
> the format of block data file, metadata file, block pool layout, and the 
> directory structure.  
> The LayoutVersion check will be removed in DN registration.  If 
> NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
> upgrade, then only rollback is supported and downgrade is not.
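To make the proposed split concrete, here is a rough, purely illustrative sketch (not the attached patch; the class names follow the summary, everything else is an assumption):
{code}
// Illustrative only: each node type owns its layout version and validates it locally.
final class NamenodeLayoutVersion {
  // Example value only; real layout versions are negative and decrease over time.
  public static final int CURRENT_LAYOUT_VERSION = -52;
  private NamenodeLayoutVersion() {}
}

final class DatanodeLayoutVersion {
  public static final int CURRENT_LAYOUT_VERSION = -52;  // example value only
  private DatanodeLayoutVersion() {}

  /** Can software at softwareLV read storage written at storedLV? (more negative = newer) */
  public static boolean canReadStorage(int softwareLV, int storedLV) {
    return softwareLV <= storedLV;
  }
}
{code}
Under such a split, each side checks only its own on-disk version, consistent with removing the LayoutVersion comparison from DN registration as described above.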



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4854) Fix the broken images in the Federation documentation page

2014-01-27 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-4854:


Resolution: Duplicate
Status: Resolved  (was: Patch Available)

The links were fixed in HDFS-5231. Closing this issue as a duplicate.

> Fix the broken images in the Federation documentation page
> --
>
> Key: HDFS-4854
> URL: https://issues.apache.org/jira/browse/HDFS-4854
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.0.4-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4854.patch
>
>
> Currently, there are two broken images in the Federation documentation 
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html.
> federation.gif and federation-background.gif are inside hadoop-yarn-project 
> site resources, but Federation.apt.vm has moved to hadoop-hdfs-project.
> We should move these two .gifs back to hadoop-hdfs-project and fix the image 
> links in Federation.apt.vm.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-27 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883823#comment-13883823
 ] 

Liang Xie commented on HDFS-5776:
-

bq. Isn't the call to actualGetFromOneDataNode wrapped in a loop itself? I am 
talking about the while loop in fetchBlockByteRange. Will that not change the 
behavior? Maybe it is harmless, I am not sure. I just want us to be clear 
either way.
Yes, it doesn't change the overall behavior and is harmless; indeed, it's safer 
than before.
In the old implementation, refetchToken/refetchEncryptionKey were shared by all 
nodes returned from chooseDataNode once a key/token exception happened. That 
means that if the first node consumed the retry quota, and the second or third 
node then hit a key/token exception, the clearDataEncryptionKey/fetchBlockAt 
operations would not be called, which is a little unfair :)
In the new implementation/patch, the second and later nodes get the same retry 
quota as the first node, which seems fairer to me.
Either way, the normal path is unchanged; the patch is just safer/fairer in the 
security-enabled scenario.
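To illustrate the point, a simplified, self-contained sketch (names and exception types are stand-ins, not the actual DFSClient code):
{code}
import java.util.List;

class PerNodeRetryBudgetSketch {
  // Stand-ins so the sketch compiles; the real client uses HDFS exception types.
  static class InvalidTokenStandIn extends RuntimeException {}
  static class InvalidKeyStandIn extends RuntimeException {}

  static boolean read(List<String> datanodes) {
    for (String dn : datanodes) {
      // Every datanode attempt gets its own token/key refetch quota,
      // instead of all attempts sharing one counter initialized before the loop.
      int refetchToken = 1;
      int refetchEncryptionKey = 1;
      while (true) {
        try {
          return tryRead(dn);                      // pretend read attempt
        } catch (InvalidTokenStandIn e) {
          if (refetchToken-- <= 0) break;          // this node's quota exhausted
          // refetch the block token here, then retry the same node
        } catch (InvalidKeyStandIn e) {
          if (refetchEncryptionKey-- <= 0) break;
          // clear/refetch the encryption key here, then retry the same node
        }
      }
    }
    return false;
  }

  static boolean tryRead(String dn) { return true; }
}
{code}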

bq. The test looks like a stress test, i.e. we are hoping that some of the 
hedged requests will complete before the primary requests. We can create a 
separate Jira to write a deterministic unit test and it’s fine if someone else 
picks that up later.
Ok, I can track it later.

Either patch v9 or v10 is OK with me (though our internal branch uses the style 
without the limit), since my original goal is to reduce HBase's P99 and P99.9 
latency, and there is no real difference between them on that point. V9 is 
safer, but it will probably require modifying the HDFS source code again if the 
hardcoded limit is hit (which is difficult for a normal end user). IMHO, whoever 
ends up committing this JIRA can pick one. It would be a pity if people keep 
arguing over this style question and hold up progress; that doesn't help the 
downstream HBase project at all.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related portion of the backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum-read ability should be helpful especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged-read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based 
> on the above config items.
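For reference, enabling the feature from a client would look roughly like the following configuration snippet. The property names are taken from this description; the values are only illustrative, and a pool size of 0 (the default) leaves hedged reads disabled:
{noformat}
<property>
  <name>dfs.dfsclient.quorum.read.threadpool.size</name>
  <value>16</value>   <!-- 0 (default) disables hedged reads -->
</property>
<property>
  <name>dfs.dfsclient.quorum.read.threshold.millis</name>
  <value>500</value>  <!-- wait this long on the first datanode before hedging -->
</property>
{noformat}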



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)

2014-01-27 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2892.
---

  Resolution: Invalid
Target Version/s:   (was: 2.0.0-alpha, 3.0.0)

Resolving as Invalid as these were user questions.

> Some of property descriptions are not given(hdfs-default.xml) 
> --
>
> Key: HDFS-2892
> URL: https://issues.apache.org/jira/browse/HDFS-2892
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Brahma Reddy Battula
>Priority: Trivial
>
> Hi, I took the 0.23.0 release from 
> http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available
> and went through all of the properties provided in hdfs-default.xml. Some of the 
> property descriptions are not given. It would be better to give a description of 
> each property and its usage (how to configure it); also, only MapReduce-related 
> jars are provided. Please check the following two configurations.
>  *No Description*
> {noformat}
> <property>
>   <name>dfs.datanode.https.address</name>
>   <value>0.0.0.0:50475</value>
> </property>
> <property>
>   <name>dfs.namenode.https-address</name>
>   <value>0.0.0.0:50470</value>
> </property>
> {noformat}
>  It would also be better to mention example usage (what to configure, and in what 
> format/syntax) in the description. For the following property, I could not tell 
> whether "default" means the name of a network interface or something else:
> {noformat}
> <property>
>   <name>dfs.datanode.dns.interface</name>
>   <value>default</value>
>   <description>The name of the Network Interface from which a data node 
>   should report its IP address.</description>
> </property>
> {noformat}
>  The following property is commented out. If it is not supported, it would be 
> better to remove it:
> {noformat}
> <property>
>   <name>dfs.cluster.administrators</name>
>   <value>ACL for the admins</value>
>   <description>This configuration is used to control who can access the
>   default servlets in the namenode, etc.</description>
> </property>
> {noformat}
>  A small clarification for the following property: if some value is configured, 
> will the NN stay in safe mode for that long? May I know the usage of the 
> following property?
> {noformat}
> <property>
>   <name>dfs.blockreport.initialDelay</name>
>   <value>0</value>
>   <description>Delay for first block report in seconds.</description>
> </property>
> {noformat}
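As an illustration of what is being asked for, a filled-in entry might look like the following (the description wording here is only a suggestion, not taken from hdfs-default.xml):
{noformat}
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:50475</value>
  <description>The address and port on which the datanode listens for
  HTTPS (secure web UI) requests.</description>
</property>
{noformat}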



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled

2014-01-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon updated HDFS-5843:
-

Attachment: hdfs-5843.patch

Attaching a patch to fix the issue, plus a test case to verify it. Thanks for reviewing.

> DFSClient.getFileChecksum() throws IOException if checksum is disabled
> --
>
> Key: HDFS-5843
> URL: https://issues.apache.org/jira/browse/HDFS-5843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Laurent Goujon
> Attachments: hdfs-5843.patch
>
>
> If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} 
> for example), calling {{FileSystem.getFileChecksum()}} throws the following 
> IOException:
> {noformat}
> java.io.IOException: Fail to get block MD5 for 
> BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194)
> [...]
> {noformat}
> From the logs, the datanode is doing some wrong arithmetic because of 
> crcPerBlock:
> {noformat}
> 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
> 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation  src: 
> /127.0.0.1:52407 dest: /127.0.0.1:52398
> java.lang.ArithmeticException: / by zero
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>   at java.lang.Thread.run(Thread.java:695)
> {noformat}
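The trace suggests that with checksums disabled the per-chunk checksum size is zero, so the crcPerBlock computation divides by zero. A guard of roughly the following shape (hypothetical sketch only, not the attached patch) is the kind of thing that is needed:
{code}
// Hypothetical sketch, not the attached patch: with checksums disabled the
// per-chunk checksum size is 0, so guard the division instead of computing
// crcPerBlock = checksumDataLength / checksumSizePerChunk unconditionally.
class BlockChecksumGuardSketch {
  static long crcPerBlock(long checksumDataLength, int checksumSizePerChunk) {
    return checksumSizePerChunk == 0 ? 0 : checksumDataLength / checksumSizePerChunk;
  }
}
{code}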



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled

2014-01-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon updated HDFS-5843:
-

Status: Patch Available  (was: Open)

> DFSClient.getFileChecksum() throws IOException if checksum is disabled
> --
>
> Key: HDFS-5843
> URL: https://issues.apache.org/jira/browse/HDFS-5843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Laurent Goujon
> Attachments: hdfs-5843.patch
>
>
> If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} 
> for example), calling {{FileSystem.getFileChecksum()}} throws the following 
> IOException:
> {noformat}
> java.io.IOException: Fail to get block MD5 for 
> BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194)
> [...]
> {noformat}
> From the logs, the datanode is doing some wrong arithmetic because of 
> crcPerBlock:
> {noformat}
> 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
> 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation  src: 
> /127.0.0.1:52407 dest: /127.0.0.1:52398
> java.lang.ArithmeticException: / by zero
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>   at java.lang.Thread.run(Thread.java:695)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled

2014-01-27 Thread Laurent Goujon (JIRA)
Laurent Goujon created HDFS-5843:


 Summary: DFSClient.getFileChecksum() throws IOException if 
checksum is disabled
 Key: HDFS-5843
 URL: https://issues.apache.org/jira/browse/HDFS-5843
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Laurent Goujon


If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} 
for example), calling {{FileSystem.getFileChecksum()}} throws the following 
IOException:

{noformat}
java.io.IOException: Fail to get block MD5 for 
BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001
at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965)
at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194)
[...]
{noformat}

From the logs, the datanode is doing some wrong arithmetic because of 
crcPerBlock:
{noformat}
2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation  src: 
/127.0.0.1:52407 dest: /127.0.0.1:52398
java.lang.ArithmeticException: / by zero
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:695)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress

2014-01-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-5835.
--

   Resolution: Fixed
Fix Version/s: HDFS-5535 (Rolling upgrades)
 Hadoop Flags: Reviewed

I have committed this.

> Add a new option for starting standby NN when rolling upgrade is in progress
> 
>
> Key: HDFS-5835
> URL: https://issues.apache.org/jira/browse/HDFS-5835
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: h5835_20130127.patch
>
>
> When rolling upgrade is already in-progress and the standby NN is not yet 
> started up, a new startup option is needed for the standby NN to initialize 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-01-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883784#comment-13883784
 ] 

stack commented on HDFS-4239:
-

I think throwing an exception is the right thing to do.  The volume is going 
away at the operator's volition.

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the whole datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low-latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you 
> can't unmount the disk while it is in use).  The latter is better in that 
> only the bad disk's data is rereplicated, not all of the datanode's data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would then become unmountable after a while.
> Nice to have would be being able to tell the datanode to start using a disk 
> again after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883771#comment-13883771
 ] 

Hadoop QA commented on HDFS-5810:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625506/HDFS-5810.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.TestShortCircuitCache
  org.apache.hadoop.hdfs.server.namenode.TestNameNodeHttpServer
  org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5958//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5958//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5958//console

This message is automatically generated.

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc

2014-01-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883770#comment-13883770
 ] 

Hudson commented on HDFS-5833:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5049 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5049/])
HDFS-5833. Fix incorrect javadoc in SecondaryNameNode. (Contributed by Bangtao 
Zhou) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561938)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java


> SecondaryNameNode have an incorrect java doc
> 
>
> Key: HDFS-5833
> URL: https://issues.apache.org/jira/browse/HDFS-5833
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Bangtao Zhou
>Priority: Trivial
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5833-1.patch
>
>
> SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode 
> uses the *NamenodeProtocol* to talk to the primary NameNode, not the 
> *ClientProtocol*.
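For context, the corrected class comment presumably reads along these lines (paraphrased, not the exact committed text):
{code}
/**
 * The SecondaryNameNode periodically merges the NameNode's fsimage with its
 * edit log. It talks to the primary NameNode via the NamenodeProtocol
 * (the old comment incorrectly said ClientProtocol).
 */
public class SecondaryNameNode { /* ... */ }
{code}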



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5833) SecondaryNameNode have an incorrect java doc

2014-01-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5833:


  Resolution: Fixed
   Fix Version/s: 2.3.0
  3.0.0
Target Version/s: 2.3.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the patch Bangtao. I committed this to trunk, branch-2 and 
branch-2.3.

> SecondaryNameNode have an incorrect java doc
> 
>
> Key: HDFS-5833
> URL: https://issues.apache.org/jira/browse/HDFS-5833
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Bangtao Zhou
>Priority: Trivial
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5833-1.patch
>
>
> SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode 
> uses the *NamenodeProtocol* to talk to the primary NameNode, not the 
> *ClientProtocol*.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883750#comment-13883750
 ] 

Hadoop QA commented on HDFS-5804:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625518/HDFS-5804.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5959//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5959//console

This message is automatically generated.

> HDFS NFS Gateway fails to mount and proxy when using Kerberos
> -
>
> Key: HDFS-5804
> URL: https://issues.apache.org/jira/browse/HDFS-5804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Abin Shahab
> Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, 
> HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, 
> javadoc-after-patch.log, javadoc-before-patch.log
>
>
> When using the HDFS NFS gateway with secure Hadoop 
> (hadoop.security.authentication: kerberos), mounting HDFS fails. 
> Additionally, there is no mechanism to support a proxy user (NFS needs to 
> proxy as the user invoking commands on the HDFS mount).
> Steps to reproduce:
> 1) Start a Hadoop cluster with Kerberos enabled.
> 2) sudo su -l nfsserver and start an NFS server. This 'nfsserver' account 
> has an account in Kerberos.
> 3) Get the keytab for nfsserver, and issue the following mount command: mount 
> -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
> 4) You'll see in the nfsserver logs that Kerberos complains about not 
> having a TGT for root.
> This is the stacktrace: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: 
> "my-nfs-server-host.com/10.252.4.197"; destination host is: 
> "my-namenode-host.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
>   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
>   at 
> org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline
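For the proxy-user half of this report, the standard Hadoop mechanism is the proxyuser settings in core-site.xml; a sketch for an 'nfsserver' principal would be something like the following (the host and group values are placeholders):
{noformat}
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>   <!-- groups whose members 'nfsserver' may impersonate -->
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>nfs-gateway-host.example.com</value>   <!-- hosts allowed to impersonate from -->
</property>
{noformat}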

[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883746#comment-13883746
 ] 

Hadoop QA commented on HDFS-5776:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625502/HDFS-5776-v10.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5957//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5957//console

This message is automatically generated.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related portion of the backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum-read ability should be helpful especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged-read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based 
> on the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-27 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HDFS-5804:
--

Attachment: HDFS-5804.patch

Test fix

> HDFS NFS Gateway fails to mount and proxy when using Kerberos
> -
>
> Key: HDFS-5804
> URL: https://issues.apache.org/jira/browse/HDFS-5804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Abin Shahab
> Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, 
> HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, 
> javadoc-after-patch.log, javadoc-before-patch.log
>
>
> When using the HDFS NFS gateway with secure Hadoop 
> (hadoop.security.authentication: kerberos), mounting HDFS fails. 
> Additionally, there is no mechanism to support a proxy user (NFS needs to 
> proxy as the user invoking commands on the HDFS mount).
> Steps to reproduce:
> 1) Start a Hadoop cluster with Kerberos enabled.
> 2) sudo su -l nfsserver and start an NFS server. This 'nfsserver' account 
> has an account in Kerberos.
> 3) Get the keytab for nfsserver, and issue the following mount command: mount 
> -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
> 4) You'll see in the nfsserver logs that Kerberos complains about not 
> having a TGT for root.
> This is the stacktrace: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: 
> "my-nfs-server-host.com/10.252.4.197"; destination host is: 
> "my-namenode-host.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
>   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
>   at 
> org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
>   at 
> org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstr

[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-01-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883728#comment-13883728
 ] 

Jimmy Xiang commented on HDFS-4239:
---

We can release the lock after the volume is marked down. No new block will be 
allocated to this volume. What about the blocks on this volume that are still 
being written? The writing could go on forever, for example for a rarely 
updated HLog file. I was thinking of failing the write pipeline so that the 
client can set up another pipeline. Any problem with that?

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the whole datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low-latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you 
> can't unmount the disk while it is in use).  The latter is better in that 
> only the bad disk's data is rereplicated, not all of the datanode's data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would then become unmountable after a while.
> Nice to have would be being able to tell the datanode to start using a disk 
> again after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883717#comment-13883717
 ] 

stack commented on HDFS-5776:
-

[~arpitagarwal] Would v10 be palatable?  You said OK to v9 above, but Colin's 
review would favor v10?

[~xieliang007] Can you take care of the other nits raised by [~arpitagarwal]?

Good stuff.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related portion of the backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum-read ability should be helpful especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged-read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based 
> on the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment

2014-01-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883702#comment-13883702
 ] 

Andrew Wang commented on HDFS-5746:
---

Nice work here. I have a fair number of review comments, but most of it's nitty:

I didn't see anything named ShortCircuitSharedMemorySegment in the patch, 
should it be included?

SharedFileDescriptorFactory:
* Javadoc for SharedFileDescriptorFactory constructor
* {{rand()}} isn't reentrant, potentially making it unsafe for 
{{createDescriptor0}}. Should we use {{rand_r}} instead, or slap a synchronized 
on it?
* Also not sure why we concat two {{rand()}}. Seems like one should be enough 
with the collision detection code. 
* The {{open}} is done with mode {{0777}}, wouldn't {{0700}} be safer? I 
thought we were passing these over a domain socket, so we can keep the 
permissions locked up.
* Paranoia, should we do a check in CloseableReferenceCount#reference for 
overflow to the closed bit? I know we have 30 bits, but who knows.
* Unrelated nit: DomainSocket#write(byte[], int, int) {{boolean exec}} is 
indented wrong, mind fixing it?

DomainSocketWatcher:
* Class javadoc is c+p from {{DomainSocket}}, I think it should be updated for 
DSW. Some high-level description of how the nested classes fit together would 
be nice.
* Some Java-isms. {{Runnable}} is preferred over {{Thread}}. It's also weird 
that DSW is a {{Thread}} subclass and it calls {{start}} on itself. An inner 
class implementing Runnable would be more idiomatic.
* Explain use of {{loopSocks 0}} versus {{loopSocks 1}}? This is a crucial part 
of this class: we need to use a socketpair rather than a plain condition 
variable because of blocking on poll.
* "loopSocks" is also not a very descriptive name, maybe "wakeupPair" or 
"eventPair" instead?
* Can add a Precondition check to make sure the lock is held in checkNotClosed
* If we fail to kick, add and remove could block until the poll timeout.
* Should doc that we only support one Handler per fd, it overwrites on add. 
Maybe Precondition this instead if we don't want to overwrite, I can't tell 
from context here.
* Typo "loopSOcks" in log message
* The repeated calls to {{sendCallback}} are worrisome. For instance, a sock 
could be EOF and closed, be removed by the first sendCallback, and then if 
there's a pending toRemove for the sock, the second sendCallback aborts on the 
Precondition check.
* {{closeAll}} parameter in sendCallback is unused
* This comment probably means to refer to loopSocks:
{code}
// Close shutdownSocketPair[0], so that shutdownSocketPair[1] gets an EOF
{code}
* This comment probably meant poll, not select:
{code}
  // were waiting in select().
{code}

TestDomainSocketWatcher:
* Why are two of the {{@Test}} in TestDomainSocketWatcher commented out?
* Timeouts seem kind of long, these should be super fast tests right?

> add ShortCircuitSharedMemorySegment
> ---
>
> Key: HDFS-5746
> URL: https://issues.apache.org/jira/browse/HDFS-5746
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0
>
> Attachments: HDFS-5746.001.patch
>
>
> Add ShortCircuitSharedMemorySegment, which will be used to communicate 
> information between the datanode and the client about whether a replica is 
> mlocked.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883685#comment-13883685
 ] 

Arpit Agarwal commented on HDFS-5776:
-

{quote}
Yes, that would be perfect sometimes, but not works for HBase scenario(the 
above Stack's consideration is great), since we made the pool "static", and per 
client view, it's more flexible if we provide instance level disable/enable 
APIs, so we can archive to use the hbase shell script to control the switch per 
dfs client instance, that'll be cooler
{quote}
Okay.

{quote}
In actualGetFromOneDatanode(), the refetchToken/refetchEncryptionKey is 
initialized outside the while (true) loop (see Line 993-996), when we hit 
InvalidEncryptionKeyException/InvalidBlockTokenException, the refetchToken and 
refetchEncryptionKey will be decreased by 1, (see refetchEncryptionKey-- and 
refetchToken-- statement), if the exceptions happened again, the check 
conditions will be failed definitely(see "e instanceof 
InvalidEncryptionKeyException && refetchEncryptionKey > 0" and "refetchToken > 
0"), so go to the else clause, that'll execute:
{quote}
Isn't the call to {{actualGetFromOneDataNode}} wrapped in a loop itself? I am 
talking about the while loop in {{fetchBlockByteRange}}. Will that not change 
the behavior? Maybe it is harmless, I am not sure. I just want us to be clear 
either way.

Thanks for adding the thread count limit. If we need more than 128 threads per 
client process just for backup reads, we (HDFS) need to think about proper async 
RPC. Suggesting a lack of limits ignores the point that it can double the DN 
load on an already loaded cluster. Also, a 1 ms lower bound for the delay is as 
good as zero, but as long as we have a thread count limit I am okay.

Minor points that don't need to hold up the checkin:
# The test looks like a stress test, i.e. we are hoping that some of the hedged 
requests will complete before the primary requests. We can create a separate 
Jira to write a deterministic unit test and it’s fine if someone else picks 
that up later.
# A couple of points from my initial feedback (#10, #12) were missed but again 
not worth holding the checkin.

Other than clarifying the loop behavior the v9 patch looks fine to me.

Thanks again for working with the feedback Liang, this is a nice capability to 
have in HDFS.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related portion of the backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum-read ability should be helpful especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged-read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based 
> on the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-01-27 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5810:
---

Attachment: HDFS-5810.004.patch

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5730) Inconsistent Audit logging for HDFS APIs

2014-01-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883669#comment-13883669
 ] 

Colin Patrick McCabe commented on HDFS-5730:


Does anyone have a strong opinion about this approach?  If not, I will review 
this in detail later this week.

> Inconsistent Audit logging for HDFS APIs
> 
>
> Key: HDFS-5730
> URL: https://issues.apache.org/jira/browse/HDFS-5730
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-5730.patch, HDFS-5730.patch
>
>
> When looking at the audit logs in HDFS, I am seeing some inconsistencies 
> between what was logged with audit earlier and what has been added recently.
> For more details, please check the comments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883666#comment-13883666
 ] 

Jing Zhao commented on HDFS-5776:
-

Thanks for the work [~xieliang007]! I will review your latest patch and give my 
comments tonight (PST).

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related portion of the backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum-read ability should be helpful especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged-read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based 
> on the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-27 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v10.txt

Patch v10 removes the hardcoded limit per Colin's comments.
Patch v9 has the hardcoded limit.
Any more comments, or a +1?  Personally I'd like to let the first cut go to trunk 
and branch-2 ASAP, so I can kick off the HBase-side change. More detailed 
disagreements can be resolved in future JIRAs, right? And since the default pool 
size is 0, there is no obvious foreseeable functional or performance hit to the 
existing downstream applications.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related portion of the backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum-read ability should be helpful especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged-read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based 
> on the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883650#comment-13883650
 ] 

Hadoop QA commented on HDFS-5842:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12625416/HADOOP-10215.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestHttpsFileSystem
  org.apache.hadoop.hdfs.server.namenode.TestNameNodeHttpServer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5955//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5955//console

This message is automatically generated.

> Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
> secure cluster
> 
>
> Key: HDFS-5842
> URL: https://issues.apache.org/jira/browse/HDFS-5842
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
> Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
> HADOOP-10215.002.patch, HADOOP-10215.002.patch
>
>
> Noticed this while debugging issues in another application. We saw an error 
> when trying to do a FileSystem.get using an hftp file system on a secure 
> cluster using a proxy user ugi.
> This is a small snippet of the code used:
> {code}
> FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
>   @Override
>   public FileSystem run() throws IOException {
>     return FileSystem.get(hadoopConf);
>   }
> });
> {code}
> The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
> was created with UserGroupInformation.createProxyUser.
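For completeness, a hedged sketch of how such a proxy-user UGI is typically built around the doAs above (the user name is a placeholder; this is not the application's actual code):
{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

class ProxyUserFileSystemSketch {
  static FileSystem getAsProxy(final Configuration hadoopConf)
      throws IOException, InterruptedException {
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation ugi =
        UserGroupInformation.createProxyUser("proxyUser", realUser);
    return ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      @Override
      public FileSystem run() throws IOException {
        // Works for hdfs:// and webhdfs:// but fails for hftp:// per this report.
        return FileSystem.get(hadoopConf);
      }
    });
  }
}
{code}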



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress

2014-01-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883627#comment-13883627
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5835:
--

Thanks Arpit and Jing for reviewing the patch.


# Yes.  See also #2 below.

# Suppose NN1 is active and NN2 is standby.  NN2 will be updated first.  Then 
NN1 will failover to NN2.  And then NN1 will be updated.

# SBN should do checkpoint only before the update marker.

I have added tests for the cases above.

> Add a new option for starting standby NN when rolling upgrade is in progress
> 
>
> Key: HDFS-5835
> URL: https://issues.apache.org/jira/browse/HDFS-5835
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5835_20130127.patch
>
>
> When rolling upgrade is already in-progress and the standby NN is not yet 
> started up, a new startup option is needed for the standby NN to initialize 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883610#comment-13883610
 ] 

Hadoop QA commented on HDFS-5833:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625248/HDFS-5833-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDistributedFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5954//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5954//console

This message is automatically generated.

> SecondaryNameNode have an incorrect java doc
> 
>
> Key: HDFS-5833
> URL: https://issues.apache.org/jira/browse/HDFS-5833
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Bangtao Zhou
>Priority: Trivial
> Attachments: HDFS-5833-1.patch
>
>
> SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode 
> uses the *NamenodeProtocol* to talk to the primary NameNode, not the 
> *ClientProtocol*.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage

2014-01-27 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883580#comment-13883580
 ] 

Haohui Mai commented on HDFS-5698:
--

I took an fsimage from a production cluster and scaled it down to different 
sizes to evaluate the performance and the size impact.

I ran the test on a machine that has an 8-core Xeon E5530 CPU @ 2.4GHz, 24G 
memory, 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 6.2, Java 1.6. 
The JVM has a maximum heap size of 20G, and it runs the concurrent mark and 
sweep GC.

Here are the numbers:

|Size in Old|512M|1G|2G|4G|8G| 
|Size in PB|469M|950M|1.9G|3.7G|7.0G| 
|Saving in Old (ms)|14678|28991|60520|96894|160878| 
|Saving in PB (ms)|14709|16746|32623|83645|168617| 
|Loading in Old (ms)|12819|24664|48240|114090|307689|
|Loading in PB (ms)|28268|43205|87060|266681|491605| 

The first two rows show the size of the fsimage in both the old and the new 
format respectively. The third and the fourth rows show the time of saving the 
fsimage in the two formats, and the last two rows show the time of 
loading the fsimage in the two formats.

The new fsimage format is slightly more compact. The code writes the new 
fsimage slightly faster. Currently the new fsimage format loads slower. 
However, in the new format most of the loading process can be parallelized. I 
plan to introduce this feature after the branch is merged.
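
To sketch the parallel loading idea (an illustration only; the FileSummary and 
Section types and the loadSection call below are placeholders, assuming each 
protobuf section can be deserialized independently):

{code}
// Sketch: load independent fsimage sections on a thread pool.
// "FileSummary", "Section" and "loadSection" are placeholders.
void loadSectionsInParallel(FileSummary summary) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(
      Runtime.getRuntime().availableProcessors());
  List<Future<?>> futures = new ArrayList<Future<?>>();
  for (final Section section : summary.getSections()) {
    futures.add(pool.submit(new Runnable() {
      @Override
      public void run() {
        loadSection(section);   // each section is self-contained
      }
    }));
  }
  for (Future<?> f : futures) {
    f.get();                    // propagate any loading failure
  }
  pool.shutdown();
}
{code}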

> Use protobuf to serialize / deserialize FSImage
> ---
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch
>
>
> Currently, the code serializes FSImage using in-house serialization 
> mechanisms. There are a couple disadvantages of the current approach:
> # Mixing the responsibility of reconstruction and serialization / 
> deserialization. The current code paths of serialization / deserialization 
> have spent a lot of effort on maintaining compatibility. What is worse is 
> that they are mixed with the complex logic of reconstructing the namespace, 
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage 
> is practically defined by the implementation. A bug in the implementation means 
> a bug in the specification. Furthermore, it also makes writing third-party 
> tools quite difficult.
> # Changing schemas is non-trivial. Adding a field in FSImage requires bumping 
> the layout version every time. Bumping the layout version requires (1) the 
> users to explicitly upgrade the clusters, and (2) putting new code to 
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
> been used to serialize / deserialize the RPC message in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the 
> responsibility of serialization and reconstructing the namespace. The 
> protobuf files document the current format of the FSImage. The developers now 
> can add optional fields with ease, since the old code can always read the new 
> FSImage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress

2014-01-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883558#comment-13883558
 ] 

Jing Zhao commented on HDFS-5835:
-

+1, the patch looks good to me.

Some questions not related to the patch: 
# So when we start the SBN, the SBN has already been upgraded?
# Is it possible that an NN failover happens just as we start the SBN? Or 
the other NN is in standby state at this time, and this NN will become active 
in the end? In that case, may this "STARTED" option also be applied to the ANN?
# If we allow the SBN to do checkpoints during the rolling upgrade, the SBN may not 
hit the upgrade marker in the editlog when it restarts. Thus the current 
document says we would disable checkpointing. But this may also cause issues if 
the time between "start" and "finalize" is long. Since we do not delete old 
editlogs and fsimages during checkpointing, an alternative is to scan the 
editlog even across the fsimage?

> Add a new option for starting standby NN when rolling upgrade is in progress
> 
>
> Key: HDFS-5835
> URL: https://issues.apache.org/jira/browse/HDFS-5835
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5835_20130127.patch
>
>
> When rolling upgrade is already in-progress and the standby NN is not yet 
> started up, a new startup option is needed for the standby NN to initialize 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883548#comment-13883548
 ] 

Hadoop QA commented on HDFS-5804:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625470/HDFS-5804.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs:

  org.apache.hadoop.hdfs.nfs.nfs3.TestWrites
  org.apache.hadoop.hdfs.nfs.TestReaddir

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5956//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5956//console

This message is automatically generated.

> HDFS NFS Gateway fails to mount and proxy when using Kerberos
> -
>
> Key: HDFS-5804
> URL: https://issues.apache.org/jira/browse/HDFS-5804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Abin Shahab
> Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, 
> HDFS-5804.patch, exception-as-root.log, javadoc-after-patch.log, 
> javadoc-before-patch.log
>
>
> When using HDFS nfs gateway with secure hadoop 
> (hadoop.security.authentication: kerberos), mounting hdfs fails. 
> Additionally, there is no mechanism to support proxy user(nfs needs to proxy 
> as the user invoking commands on the hdfs mount).
> Steps to reproduce:
> 1) start a hadoop cluster with kerberos enabled.
> 2) start a hadoop cluster with kerberos enabled.
> 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has 
> an account in kerberos.
> 3) Get the keytab for nfsserver, and issue the following mount command: mount 
> -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
> 4) You'll see in the nfsserver logs that Kerberos is complaining about not 
> having a TGT for root.
> This is the stacktrace: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: 
> "my-nfs-server-host.com/10.252.4.197"; destination host is: 
> "my-namenode-host.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
>   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
>   at 
> org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.se

[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883549#comment-13883549
 ] 

Hadoop QA commented on HDFS-5841:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625443/hdfs-5841-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5953//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5953//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5953//console

This message is automatically generated.

> Update HDFS caching documentation with new changes
> --
>
> Key: HDFS-5841
> URL: https://issues.apache.org/jira/browse/HDFS-5841
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: caching
> Attachments: hdfs-5841-1.patch
>
>
> The caching documentation is a little out of date, since it's missing 
> descriptions of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883539#comment-13883539
 ] 

Brandon Li commented on HDFS-5767:
--

Sounds good to me. I can review the patch once it's available. Thanks.

> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> 
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
> Environment: With LDAP enabled
>Reporter: Yongjun Zhang
>Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
> to be returned by the command "getent passwd". That is, for a given userName, 
> there should be a single userId, and for a given userId, there should be a 
> single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
> can't start with duplicate name or id on the host system.\n"
>   + "This is because HDFS (non-kerberos cluster) uses name as the only 
> way to identify a user or group.\n"
>   + "The host system with duplicated user/group name or id might work 
> fine most of the time by itself.\n"
>   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
> group name.\n"
>   + "Therefore, same name means the same user or same group. To find the 
> duplicated names/ids, one can do:\n"
>   + " and  
> on Linux systems,\n"
>   + " and  PrimaryGroupID> on MacOS.";
> This requirement cannot always be met (e.g. because of the use of LDAP). 
> Let's do some examination:
> What exists in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message cannot be met here. 
> The DEBUG_INFO also states that HDFS uses the name as the only way to identify 
> a user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress

2014-01-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883512#comment-13883512
 ] 

Arpit Agarwal commented on HDFS-5835:
-

+1 for the patch.

> Add a new option for starting standby NN when rolling upgrade is in progress
> 
>
> Key: HDFS-5835
> URL: https://issues.apache.org/jira/browse/HDFS-5835
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5835_20130127.patch
>
>
> When rolling upgrade is already in-progress and the standby NN is not yet 
> started up, a new startup option is needed for the standby NN to initialize 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883511#comment-13883511
 ] 

Hadoop QA commented on HDFS-5842:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12625416/HADOOP-10215.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3480//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3480//console

This message is automatically generated.

> Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
> secure cluster
> 
>
> Key: HDFS-5842
> URL: https://issues.apache.org/jira/browse/HDFS-5842
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
> Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
> HADOOP-10215.002.patch, HADOOP-10215.002.patch
>
>
> Noticed this while debugging issues in another application. We saw an error 
> when trying to do a FileSystem.get using an hftp file system on a secure 
> cluster using a proxy user ugi.
> This is a small snippet used
> {code}
> FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
>   @Override
>   public FileSystem run() throws IOException {
>     return FileSystem.get(hadoopConf);
>   }
> });
> {code}
> The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
> was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-27 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883505#comment-13883505
 ] 

Yongjun Zhang commented on HDFS-5767:
-

Thanks Brandon. I assume you are ok with:

"If you deem that the simplified solution to assume unique  
mapping (by ignoring duplicated same mapping) is sufficient, then we can go 
with the algorithm I listed at 
comment - 22/Jan/14 10:44."

I can work out the solution if so.
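
For reference, a minimal sketch of that simplified approach (an illustration 
only; it assumes a Guava BiMap like the one I understand IdUserGroup already 
keeps, skips exact duplicates, and still rejects real conflicts):

{code}
// Sketch: add one (id, name) entry to the uid<->name map, ignoring an
// entry that duplicates an identical mapping and failing on conflicts.
// Usage (illustration): BiMap<Integer, String> map = HashBiMap.create();
static void addEntry(BiMap<Integer, String> map, int id, String name)
    throws IOException {
  String existingName = map.get(id);
  if (existingName != null) {
    if (existingName.equals(name)) {
      return;                                  // same <name, id> again: skip
    }
    throw new IOException("id " + id + " maps to both " + existingName
        + " and " + name);
  }
  if (map.containsValue(name)) {
    throw new IOException("name " + name + " already maps to id "
        + map.inverse().get(name) + ", cannot also map to " + id);
  }
  map.put(id, name);
}
{code}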





> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> 
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
> Environment: With LDAP enabled
>Reporter: Yongjun Zhang
>Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
> to be returned by the command "getent passwd". That is, for a given userName, 
> there should be a single userId, and for a given userId, there should be a 
> single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
> can't start with duplicate name or id on the host system.\n"
>   + "This is because HDFS (non-kerberos cluster) uses name as the only 
> way to identify a user or group.\n"
>   + "The host system with duplicated user/group name or id might work 
> fine most of the time by itself.\n"
>   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
> group name.\n"
>   + "Therefore, same name means the same user or same group. To find the 
> duplicated names/ids, one can do:\n"
>   + " and  
> on Linux systems,\n"
>   + " and  PrimaryGroupID> on MacOS.";
> This requirement cannot always be met (e.g. because of the use of LDAP). 
> Let's do some examination:
> What exists in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message cannot be met here. 
> The DEBUG_INFO also states that HDFS uses the name as the only way to identify 
> a user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-27 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HDFS-5804:
--

Attachment: HDFS-5804.patch

This removes the isSecurityEnabled check.
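
For reviewers, roughly the flow I would expect with Kerberos plus proxying (a 
sketch with assumed config key names and a placeholder remoteUserName, not 
necessarily what this patch does):

{code}
// Sketch only: the gateway logs in from its keytab, then acts on behalf of
// the requesting user via a proxy ugi. The config keys and remoteUserName
// are assumptions for illustration.
final Configuration conf = new Configuration();
SecurityUtil.login(conf, "nfs.keytab.file", "nfs.kerberos.principal");
UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
    remoteUserName, UserGroupInformation.getLoginUser());
FileSystem fs = proxyUgi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(conf);
  }
});
{code}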

> HDFS NFS Gateway fails to mount and proxy when using Kerberos
> -
>
> Key: HDFS-5804
> URL: https://issues.apache.org/jira/browse/HDFS-5804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Abin Shahab
> Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, 
> HDFS-5804.patch, exception-as-root.log, javadoc-after-patch.log, 
> javadoc-before-patch.log
>
>
> When using HDFS nfs gateway with secure hadoop 
> (hadoop.security.authentication: kerberos), mounting hdfs fails. 
> Additionally, there is no mechanism to support proxy user(nfs needs to proxy 
> as the user invoking commands on the hdfs mount).
> Steps to reproduce:
> 1) start a hadoop cluster with kerberos enabled.
> 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has 
> an account in kerberos.
> 3) Get the keytab for nfsserver, and issue the following mount command: mount 
> -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
> 4) You'll see in the nfsserver logs that Kerberos is complaining about not 
> having a TGT for root.
> This is the stacktrace: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: 
> "my-nfs-server-host.com/10.252.4.197"; destination host is: 
> "my-namenode-host.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
>   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
>   at 
> org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
>   at 
> org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
>   at 
> org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>   at 
> org.jboss.netty.channel.DefaultChannelPi

[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883492#comment-13883492
 ] 

Suresh Srinivas commented on HDFS-5138:
---

bq. Finalize is actually rather easy, since it's idempotent. 
Missed this. Agreed, finalize is idempotent (not sure how the code deals with 
failures - I have not had time to look into it). But not being able to finalize in 
some cases could be problematic, especially from a storage utilization point of 
view.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, the NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883493#comment-13883493
 ] 

Todd Lipcon commented on HDFS-5790:
---

Thanks for the analysis Kihwal. My logic was basically the same - glad to have 
it confirmed.

Also, you're right - I'm pretty sure the "single writer" was NN_Recovery in the 
production case we saw as well, though it wasn't easy to verify (we don't 
appear to have any way to dump the LeaseManager state at runtime, which is a 
shame).

I'll commit this in a day or two if no one has further comments.
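
For readers following along, the general shape of the fix is to index leases by 
path so the lookup is a map access instead of a scan (a simplified sketch, not 
necessarily the exact committed change; "Lease" stands in for the existing 
LeaseManager.Lease):

{code}
// Sketch with simplified types: look a lease up directly by path instead
// of walking every path held by the writer under recovery.
class LeaseIndex {
  private final TreeMap<String, Lease> leaseByPath =
      new TreeMap<String, Lease>();

  void addPath(String path, Lease lease) {
    leaseByPath.put(path, lease);
  }

  /** Direct lookup replacing the findPath-style scan. */
  Lease findLeaseByPath(String path) {
    return leaseByPath.get(path);
  }
}
{code}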

> LeaseManager.findPath is very slow when many leases need recovery
> -
>
> Key: HDFS-5790
> URL: https://issues.apache.org/jira/browse/HDFS-5790
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, performance
>Affects Versions: 2.4.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-5790.txt, hdfs-5790.txt
>
>
> We recently saw an issue where the NN restarted while tens of thousands of 
> files were open. The NN then ended up spending multiple seconds for each 
> commitBlockSynchronization() call, spending most of its time inside 
> LeaseManager.findPath(). findPath currently works by looping over all files 
> held for a given writer, and traversing the filesystem for each one. This 
> takes way too long when tens of thousands of files are open by a single 
> writer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883482#comment-13883482
 ] 

Brandon Li commented on HDFS-5767:
--

Sorry for missing that question.
The NFS gateway uses only one map containing the name-id mapping. Even if IdUserGroup 
is used on a different machine to get a different id or name, it can't pass that 
to the NFS gateway. Actually, with AUTH_UNIX as the current authentication method, 
the NFS client passes only the user id to the NFS gateway, and that is usually done by 
the kernel, not by the application.
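
In other words, the resolution is purely local to the gateway; roughly (a 
sketch with simplified placeholder names such as rpcCredentials and 
uidToNameMap):

{code}
// Sketch: AUTH_UNIX credentials carry only a numeric uid, so the gateway
// resolves the name from its single local map before talking to HDFS.
int uid = rpcCredentials.getUid();          // all the client sends
String userName = uidToNameMap.get(uid);    // map built on the gateway host
if (userName == null) {
  throw new IOException("Unknown uid " + uid + " on the NFS gateway host");
}
UserGroupInformation ugi = UserGroupInformation.createRemoteUser(userName);
{code}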

> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> 
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
> Environment: With LDAP enabled
>Reporter: Yongjun Zhang
>Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
> to be returned by the command "getent passwd". That is, for a given userName, 
> there should be a single userId, and for a given userId, there should be a 
> single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
> can't start with duplicate name or id on the host system.\n"
>   + "This is because HDFS (non-kerberos cluster) uses name as the only 
> way to identify a user or group.\n"
>   + "The host system with duplicated user/group name or id might work 
> fine most of the time by itself.\n"
>   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
> group name.\n"
>   + "Therefore, same name means the same user or same group. To find the 
> duplicated names/ids, one can do:\n"
>   + " and  
> on Linux systems,\n"
>   + " and  PrimaryGroupID> on MacOS.";
> This requirement cannot always be met (e.g. because of the use of LDAP). 
> Let's do some examination:
> What exists in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message cannot be met here. 
> The DEBUG_INFO also states that HDFS uses the name as the only way to identify 
> a user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Moved] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao moved HADOOP-10215 to HDFS-5842:
--

  Component/s: (was: security)
   security
Affects Version/s: (was: 2.2.0)
   2.2.0
  Key: HDFS-5842  (was: HADOOP-10215)
  Project: Hadoop HDFS  (was: Hadoop Common)

> Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
> secure cluster
> 
>
> Key: HDFS-5842
> URL: https://issues.apache.org/jira/browse/HDFS-5842
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
> Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
> HADOOP-10215.002.patch, HADOOP-10215.002.patch
>
>
> Noticed this while debugging issues in another application. We saw an error 
> when trying to do a FileSystem.get using an hftp file system on a secure 
> cluster using a proxy user ugi.
> This is a small snippet used
> {code}
> FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
>   @Override
>   public FileSystem run() throws IOException {
>     return FileSystem.get(hadoopConf);
>   }
> });
> {code}
> The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
> was UserGroupInformation.createProxyUser
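
For completeness, the description does not show how the proxy ugi was built; 
presumably something along these lines (the user name below is a placeholder):

{code}
// Assumed setup for the snippet above (placeholder user name).
UserGroupInformation realUser = UserGroupInformation.getLoginUser();
UserGroupInformation ugi =
    UserGroupInformation.createProxyUser("someUser", realUser);
{code}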



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883415#comment-13883415
 ] 

Suresh Srinivas edited comment on HDFS-5138 at 1/27/14 10:51 PM:
-

bq. The concern is about losing edit logs by overwriting a renamed directory 
with some contents, so by definition there will be some files in the directory 
being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either 
manually or by the storage recovery process
I do not think JN performs recovery, based on the following code from 
JNStorage.java
{code}
  void analyzeStorage() throws IOException {
    this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
    if (state == StorageState.NORMAL) {
      readProperties(sd);
    }
  }
{code}

For JournalNode, StorageDirectory#doRecover() is not called. Is that correct? 
From my understanding, once it gets into this state, JournalNode restart will 
not work?
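
For comparison, the recovery branch that the JNStorage code above never takes 
looks roughly like this in other Storage subclasses (my paraphrase, not JN 
code):

{code}
// Sketch, not JNStorage code: repair an interrupted transition before
// reading the storage properties, which is the step analyzeStorage()
// above skips for anything other than NORMAL.
void analyzeAndRecoverStorage(StorageDirectory sd) throws IOException {
  StorageState state = sd.analyzeStorage(StartupOption.REGULAR, this);
  switch (state) {
  case NON_EXISTENT:
    throw new IOException("Storage directory " + sd.getRoot()
        + " does not exist");
  case NORMAL:
    break;
  default:
    sd.doRecover(state);   // e.g. clean up a half-finished upgrade
  }
  readProperties(sd);
}
{code}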


was (Author: sureshms):
bq. The concern is about losing edit logs by overwriting a renamed directory 
with some contents, so by definition there will be some files in the directory 
being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either 
manually or by the storage recovery process
I do not think JN performs recovery, based on the following code from 
JNStorage.java
{code}
  void analyzeStorage() throws IOException {
    this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
    if (state == StorageState.NORMAL) {
      readProperties(sd);
    }
  }
{code}

For JournalNode, StorageDirectory#doRecover() is not called. Is that correct? 
From my understanding, once it gets into this state, JournalNode should not 
startup?

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, the NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.

2014-01-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883469#comment-13883469
 ] 

Hudson commented on HDFS-5830:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5048 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5048/])
HDFS-5830. WebHdfsFileSystem.getFileBlockLocations throws 
IllegalArgumentException when accessing another cluster. (Yongjun Zhang via 
Colin Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561885)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java


> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster. 
> 
>
> Key: HDFS-5830
> URL: https://issues.apache.org/jira/browse/HDFS-5830
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5830.001.patch
>
>
> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a 
> different constructor
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:79)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.

2014-01-27 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883464#comment-13883464
 ] 

Yongjun Zhang commented on HDFS-5830:
-

Thanks a lot Colin! 

I planned to take a look at the -1 javadoc thing. Will update later when I find 
something.
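
Separately, for anyone reading along, one plausible shape of the fix is a 
null-tolerant guard in the LocatedBlock constructor (a sketch reconstructed 
from the stack trace, not necessarily the committed patch; field and parameter 
names are approximate):

{code}
// Sketch only: treat a missing cached-locations array (e.g. from a cluster
// without caching support) as empty instead of failing the precondition.
private static final DatanodeInfo[] EMPTY_LOCS = new DatanodeInfo[0];

public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs, long startOffset,
    boolean corrupt, DatanodeInfo[] cachedLocs) {
  this.b = b;
  this.locs = locs;
  this.offset = startOffset;
  this.corrupt = corrupt;
  this.cachedLocs = (cachedLocs == null || cachedLocs.length == 0)
      ? EMPTY_LOCS : cachedLocs;
}
{code}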


> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster. 
> 
>
> Key: HDFS-5830
> URL: https://issues.apache.org/jira/browse/HDFS-5830
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5830.001.patch
>
>
> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a 
> different constructor
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:79)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883443#comment-13883443
 ] 

Colin Patrick McCabe commented on HDFS-5781:


Yeah, perhaps we could have a separate JIRA to use a static function rather 
than a static block.
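
For readers following the discussion, a sketch of the array-with-gaps lookup, 
initialized from a static method as suggested above (the enum constants and 
byte values are illustrative, not the real opcode table):

{code}
// Sketch: map byte opcodes to enum constants through an array that
// tolerates gaps, instead of relying on Enum.values()/ordinal().
public enum EditLogOpCode {
  OP_ADD((byte) 0),
  OP_DELETE((byte) 2),   // the gap at 1 is allowed
  OP_MKDIR((byte) 3);

  private final byte opCode;
  private static final EditLogOpCode[] BY_VALUE = createByValueTable();

  EditLogOpCode(byte opCode) { this.opCode = opCode; }

  // Static factory method instead of a static initializer block.
  private static EditLogOpCode[] createByValueTable() {
    EditLogOpCode[] table = new EditLogOpCode[Byte.MAX_VALUE + 1];
    for (EditLogOpCode op : values()) {
      table[op.opCode] = op;   // unassigned slots stay null
    }
    return table;
  }

  /** Returns null for unknown byte values, including gaps. */
  public static EditLogOpCode fromByte(byte value) {
    return (value < 0 || value >= BY_VALUE.length) ? null : BY_VALUE[value];
  }
}
{code}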

> Use an array to record the mapping between FSEditLogOpCode and the 
> corresponding byte value
> ---
>
> Key: HDFS-5781
> URL: https://issues.apache.org/jira/browse/HDFS-5781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
> HDFS-5781.002.patch, HDFS-5781.002.patch
>
>
> HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
> given byte value. While improving the efficiency, it may cause issues. E.g., 
> when several new editlog ops are added to trunk around the same time (for 
> several different new features), it is hard to backport the editlog ops with 
> larger byte values to branch-2 before those with smaller values, since there 
> will be gaps in the byte values of the enum. 
> This jira plans to still use an array to record the mapping between editlog 
> ops and their byte values, and allow gaps between valid ops. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883436#comment-13883436
 ] 

Hadoop QA commented on HDFS-5698:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625413/HDFS-5698.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5952//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5952//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5952//console

This message is automatically generated.

> Use protobuf to serialize / deserialize FSImage
> ---
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch
>
>
> Currently, the code serializes FSImage using in-house serialization 
> mechanisms. There are a couple disadvantages of the current approach:
> # Mixing the responsibility of reconstruction and serialization / 
> deserialization. The current code paths of serialization / deserialization 
> have spent a lot of effort on maintaining compatibility. What is worse is 
> that they are mixed with the complex logic of reconstructing the namespace, 
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage 
> is practically defined by the implementation. A bug in the implementation means 
> a bug in the specification. Furthermore, it also makes writing third-party 
> tools quite difficult.
> # Changing schemas is non-trivial. Adding a field in FSImage requires bumping 
> the layout version every time. Bumping the layout version requires (1) the 
> users to explicitly upgrade the clusters, and (2) putting new code to 
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
> been used to serialize / deserialize the RPC message in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the 
> responsibility of serialization and reconstructing the namespace. The 
> protobuf files document the current format of the FSImage. The developers now 
> can add optional fields with ease, since the old code can always read the new 
> FSImage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5833) SecondaryNameNode have an incorrect java doc

2014-01-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5833:


Affects Version/s: (was: trunk-win)
   3.0.0
   Status: Patch Available  (was: Open)

> SecondaryNameNode have an incorrect java doc
> 
>
> Key: HDFS-5833
> URL: https://issues.apache.org/jira/browse/HDFS-5833
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Bangtao Zhou
>Priority: Trivial
> Attachments: HDFS-5833-1.patch
>
>
> SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode 
> uses the *NamenodeProtocol* to talk to the primary NameNode, not the 
> *ClientProtocol*.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.

2014-01-27 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5830:
---

   Resolution: Fixed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster. 
> 
>
> Key: HDFS-5830
> URL: https://issues.apache.org/jira/browse/HDFS-5830
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5830.001.patch
>
>
> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a 
> different constructor
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:79)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.

2014-01-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883429#comment-13883429
 ] 

Colin Patrick McCabe commented on HDFS-5830:


The release audit warning is about a pid file -- not relevant.  Committing.

> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster. 
> 
>
> Key: HDFS-5830
> URL: https://issues.apache.org/jira/browse/HDFS-5830
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
> Attachments: HDFS-5830.001.patch
>
>
> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a 
> different constructor
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:79)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883415#comment-13883415
 ] 

Suresh Srinivas edited comment on HDFS-5138 at 1/27/14 10:09 PM:
-

bq. The concern is about losing edit logs by overwriting a renamed directory 
with some contents, so by definition there will be some files in the directory 
being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either 
manually or by the storage recovery process
I do not think JN performs recovery, based on the following code from 
JNStorage.java
{code}
  void analyzeStorage() throws IOException {
    this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
    if (state == StorageState.NORMAL) {
      readProperties(sd);
    }
  }
{code}

For the JournalNode, StorageDirectory#doRecover() is not called. Is that 
correct? From my understanding, once it gets into this state, the JournalNode 
should not start up?
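
As an editorial illustration only (not the actual JNStorage code): if 
analyzeStorage() mirrored the NN-side handling, recovery might look roughly 
like the sketch below, using the existing StorageDirectory#doRecover(StorageState) 
entry point referenced in this discussion.

{code}
  // Hedged sketch, not the real JNStorage implementation.
  void analyzeAndRecoverStorage() throws IOException {
    this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
    if (state == StorageState.NORMAL) {
      readProperties(sd);
    } else {
      // Hypothetical: attempt the same recovery the NN-side storage performs,
      // e.g. cleaning up previous.tmp / removed.tmp style intermediate states.
      sd.doRecover(state);
    }
  }
{code}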


was (Author: sureshms):
bq. The concern is about losing edit logs by overwriting a renamed directory 
with some contents, so by definition there will be some files in the directory 
being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either 
manually or by the storage recovery process
I do not think JN performs recovery, based on the following code from 
JNStorage.java
{code}
  void analyzeStorage() throws IOException {
    this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
    if (state == StorageState.NORMAL) {
      readProperties(sd);
    }
  }
{code}

For the JournalNode, no code calls StorageDirectory#doRecover(). Is that correct?

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883415#comment-13883415
 ] 

Suresh Srinivas commented on HDFS-5138:
---

bq. The concern is about losing edit logs by overwriting a renamed directory 
with some contents, so by definition there will be some files in the directory 
being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either 
manually or by the storage recovery process
I do not think JN performs recovery, based on the following code from 
JNStorage.java
{code}
  void analyzeStorage() throws IOException {
    this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
    if (state == StorageState.NORMAL) {
      readProperties(sd);
    }
  }
{code}

For the JournalNode, no code calls StorageDirectory#doRecover(). Is that correct?

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-27 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883409#comment-13883409
 ] 

Kihwal Lee commented on HDFS-5790:
--

I wondered why commitBlockSynchronization() sometimes takes so long, and this 
jira explains why. When the original lease holders disappear, the lease holder 
is changed to the namenode for block recovery. So if a lot of files get 
abandoned at around the same time, the NN becomes the writer holding a large 
number of open files.

The patch looks good. The paths managed by LeaseManager are supposed to be 
updated on deletions and renames, so there is no point in searching there when 
the reference to inode is already known. For all user-initiated calls, the 
inode is obtained using the user-supplied path and then checkLease() is called 
before calling findPath(). So if something is to fail in findPath(), it should 
fail earlier in the code path. The patch seems fine in terms of both 
consistency and correctness.
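
A minimal sketch of the idea under review (hypothetical variable names, not the 
actual HDFS-5790 patch): with the inode reference already in hand, the path can 
be derived directly instead of scanning every path held by the lease holder.

{code}
// Hedged sketch of the idea, not the committed patch: when the inode is
// already known, derive the path from it rather than looping over every path
// held by the lease holder and traversing the filesystem (what findPath() did).
String src = pendingFile.getFullPathName();   // INode#getFullPathName()
// proceed with commitBlockSynchronization() using src; no findPath() scan
{code}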

+1

> LeaseManager.findPath is very slow when many leases need recovery
> -
>
> Key: HDFS-5790
> URL: https://issues.apache.org/jira/browse/HDFS-5790
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, performance
>Affects Versions: 2.4.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-5790.txt, hdfs-5790.txt
>
>
> We recently saw an issue where the NN restarted while tens of thousands of 
> files were open. The NN then ended up spending multiple seconds for each 
> commitBlockSynchronization() call, spending most of its time inside 
> LeaseManager.findPath(). findPath currently works by looping over all files 
> held for a given writer, and traversing the filesystem for each one. This 
> takes way too long when tens of thousands of files are open by a single 
> writer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5297) Fix dead links in HDFS site documents

2014-01-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883397#comment-13883397
 ] 

Hudson commented on HDFS-5297:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5047 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5047/])
HDFS-5297. Fix dead links in HDFS site documents. (Contributed by Akira 
Ajisaka) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561849)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsEditsViewer.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsImageViewer.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ShortCircuitLocalReads.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm


> Fix dead links in HDFS site documents
> -
>
> Key: HDFS-5297
> URL: https://issues.apache.org/jira/browse/HDFS-5297
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5297.patch
>
>
> I found a lot of broken hyperlinks in HDFS document to be fixed.
> Ex.)
> In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below
> {noformat}
>For command usage, see {{{dfsadmin}}}.
> {noformat}
> It should be fixed to 
> {noformat}
>For command usage, see 
> {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883394#comment-13883394
 ] 

Aaron T. Myers commented on HDFS-5138:
--

bq. Aaron T. Myers, we talked about this last on Friday Jan 16th over the phone, 
right? I did tell you about the JournalNode potentially losing editlogs.

There must have been some misunderstanding because I'm pretty sure I told you 
that I didn't think that was possible. :) Anyway, see below...

bq. Is that correct? Did you check it? Java File#renameTo() is platform 
dependent. The following code always renames the directories (on my MAC):

I did, at least on Linux. In the code example you have above, try putting a 
child file or directory under the directory f2 and see if it still works. The 
concern is about losing edit logs by overwriting a renamed directory with some 
contents, so by definition there will be some files in the directory being 
renamed to.
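
For reference, a hedged variation of Suresh's snippet with the child file added 
(illustrative only; the behaviour of File#renameTo() onto a non-empty directory 
is platform dependent, but on Linux it is expected to fail):

{code}
import java.io.File;
import java.io.IOException;

public class RenameToNonEmptyDir {
  public static void main(String[] args) throws IOException {
    File f1 = new File("/tmp/dir1");
    File f2 = new File("/tmp/dir2");
    f1.mkdir();
    f2.mkdir();
    new File(f2, "child").createNewFile();   // make the rename target non-empty
    boolean renamed = f1.renameTo(f2);
    System.out.println("renameTo returned " + renamed);  // typically false on Linux
  }
}
{code}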

bq. Related question. Let's say the rename does fail: how does the user recover 
from that condition? I brought up several scenarios related to that in 
preupgrade, upgrade, and finalize. How do we handle finalize being done 
successfully on one namenode and not the other?

Finalize is actually rather easy, since it's idempotent. The preupgrade and 
upgrade failure scenarios should both be handled either manually or by the 
storage recovery process, which currently should happen on JN restart, but I 
agree could be improved. Let's continue discussion of this over on HDFS-5840.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-27 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883395#comment-13883395
 ] 

Yongjun Zhang commented on HDFS-5767:
-

Thanks [~brandonli],

Duplicated entries of exactly the same mapping are easy to handle (we can simply 
ignore them, because they are the same), as we discussed earlier. See my earlier 
comment at 21/Jan/14 16:07. 

If you deem that the simplified solution assuming a unique <userName, userId> 
mapping (ignoring duplicates of the same mapping) is sufficient, then we can go 
with the algorithm I listed in the comment at 22/Jan/14 10:44.

I actually had a question for you in my comment at 21/Jan/14 16:07 above, and 
I'm putting it here again: 

"I'm asking another question here: I noticed that the IdUserGroup class also 
provides an API to get the userName for a given uid. I'm not sure whether this 
API will be called from a different machine with a different uid for the same 
user. If it is, then we might get the wrong user name back from this API. Say 
userA is mapped to 1 in /etc/passwd and to 2 in LDAP, and we end up keeping a 
single mapping; is it possible someone will call this API with "1" and expect 
userA?"

Thanks.


> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> 
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
> Environment: With LDAP enabled
>Reporter: Yongjun Zhang
>Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
> to be returned by the command "getent passwd". That is, for a given userName, 
> there should be a single userId, and for a given userId, there should be a 
> single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
> can't start with duplicate name or id on the host system.\n"
>   + "This is because HDFS (non-kerberos cluster) uses name as the only 
> way to identify a user or group.\n"
>   + "The host system with duplicated user/group name or id might work 
> fine most of the time by itself.\n"
>   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
> group name.\n"
>   + "Therefore, same name means the same user or same group. To find the 
> duplicated names/ids, one can do:\n"
>   + " and  
> on Linux systms,\n"
>   + " and  PrimaryGroupID> on MacOS.";
> This requirement can not be met sometimes (e.g. because of the use of LDAP) 
> Let's do some examination:
> What exist in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message can not be met here. 
> The DEBUG_INFO also stated that HDFS uses name as the only way to identify 
> user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-5838.
-

Resolution: Not A Problem

The setup and teardown functions that run before and after each test, 
respectively, happen to solve the problem.
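
As an editorial illustration of why that is (a hypothetical JUnit skeleton, not 
the actual TestCacheDirectives code): per-test setup and teardown give each 
test a fresh MiniDFSCluster, so cache pools created by one test cannot leak 
into the next.

{code}
// Hypothetical skeleton; the real TestCacheDirectives setup differs in detail.
private Configuration conf;
private MiniDFSCluster cluster;
private DistributedFileSystem dfs;

@Before
public void setup() throws Exception {
  conf = new HdfsConfiguration();
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
  cluster.waitActive();
  dfs = cluster.getFileSystem();
}

@After
public void teardown() throws Exception {
  if (cluster != null) {
    cluster.shutdown();   // nothing from this test can leak into the next one
  }
}
{code}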

> TestCacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion failure when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5838:


Status: Open  (was: Patch Available)

> TestCacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion failure when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5838:


Summary: TestCacheDirectives#testCreateAndModifyPools fails  (was: 
TestcacheDirectives#testCreateAndModifyPools fails)

> TestCacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion failure when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883368#comment-13883368
 ] 

Suresh Srinivas commented on HDFS-5138:
---

{quote}
Hi Suresh, it's obviously fine that you're busy (we all are) but in the future 
please just let me know that you intend to review it and that we should hold 
off for committing it for a bit. I reached out to you more than once last week 
to ask about a review timeline and never heard back from you, so I asked Todd 
to commit it (I'm traveling at the moment) given the silence.
{quote}
[~atm], we talked about this last on Friday Jan 16th over the phone, right? I 
did tell you about the JournalNode potentially losing editlogs.

bq. This scenario isn't possible as you described because either the 
pre-upgrade or upgrade stages (depending upon when the original failure 
happened) will fail to rename the dir if it already exists.
Is that correct? Did you check it? Java File#renameTo() is platform dependent. 
The following code always renames the directories (on my MAC):

{code}
public static void main(String[] args) {
  File f1 = new File("/tmp/dir1");
  File f2 = new File("/tmp/dir2");
  f1.mkdir();
  f2.mkdir();
  System.out.println(f1 + (f1.exists() ? " exists" : " does not exist"));
  System.out.println(f2 + (f2.exists() ? " exists" : " does not exist"));
  f1.renameTo(f2);
  System.out.println("Renamed " + f1 + " to " + f2);
  System.out.println(f1 + (f1.exists() ? " exists" : " does not exist"));
  System.out.println(f2 + (f2.exists() ? " exists" : " does not exist"));
}
{code}

Related question. Let's say the rename does fail: how does the user recover from 
that condition? I brought up several scenarios related to that in preupgrade, 
upgrade, and finalize. How do we handle finalize being done successfully on one 
namenode and not the other?

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883348#comment-13883348
 ] 

Brandon Li commented on HDFS-5767:
--

Thanks, [~yzhangal]. 
I think multi-mapping (e.g., test1->502, test2->502, test1->503) is in most 
cases an error. In that case, the NFS gateway can fail to start.

A completely duplicated mapping is not uncommon, and the dup can simply be 
ignored by NFS. One example I saw before is that the same user account was 
configured with the same id twice, on both LDAP and the local node 
(/etc/passwd). Then "getent passwd" could give the same mapping twice (e.g., 
test1->502, test1->502).


> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> 
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
> Environment: With LDAP enabled
>Reporter: Yongjun Zhang
>Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
> to be returned by the command "getent passwd". That is, for a given userName, 
> there should be a single userId, and for a given userId, there should be a 
> single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
> can't start with duplicate name or id on the host system.\n"
>   + "This is because HDFS (non-kerberos cluster) uses name as the only 
> way to identify a user or group.\n"
>   + "The host system with duplicated user/group name or id might work 
> fine most of the time by itself.\n"
>   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
> group name.\n"
>   + "Therefore, same name means the same user or same group. To find the 
> duplicated names/ids, one can do:\n"
>   + " and  
> on Linux systms,\n"
>   + " and  PrimaryGroupID> on MacOS.";
> This requirement can not be met sometimes (e.g. because of the use of LDAP) 
> Let's do some examination:
> What exist in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message can not be met here. 
> The DEBUG_INFO also stated that HDFS uses name as the only way to identify 
> user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5297) Fix dead links in HDFS site documents

2014-01-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5297:


  Resolution: Fixed
   Fix Version/s: 2.3.0
  3.0.0
Target Version/s: 2.3.0  (was: 2.4.0)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

+1. Generated the site and verified that it fixes most of the broken links.

There is one broken link in WebHDFS.apt.vm.

{noformat}{{{RemoteException JSON Schema}}}{noformat} should be 
{noformat}{{RemoteException JSON Schema}}{noformat}

It can be addressed in a separate Jira.

I committed the patch to trunk, branch-2 and branch-2.3. Thanks for the 
contribution Akira-san!

> Fix dead links in HDFS site documents
> -
>
> Key: HDFS-5297
> URL: https://issues.apache.org/jira/browse/HDFS-5297
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5297.patch
>
>
> I found a lot of broken hyperlinks in HDFS document to be fixed.
> Ex.)
> In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below
> {noformat}
>For command usage, see {{{dfsadmin}}}.
> {noformat}
> It should be fixed to 
> {noformat}
>For command usage, see 
> {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883341#comment-13883341
 ] 

Hadoop QA commented on HDFS-5838:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625405/HDFS-5838.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5951//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5951//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5951//console

This message is automatically generated.

> TestcacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion failure when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-27 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5841:
--

Status: Patch Available  (was: Open)

> Update HDFS caching documentation with new changes
> --
>
> Key: HDFS-5841
> URL: https://issues.apache.org/jira/browse/HDFS-5841
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: caching
> Attachments: hdfs-5841-1.patch
>
>
> The caching documentation is a little out of date, since it's missing 
> description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-27 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5841:
--

Attachment: hdfs-5841-1.patch

Patch attached. I also took the opportunity to reorganize some of the content 
(hopefully for the better). The diff is kind of hard to review; just looking at 
the output of {{mvn site:site}} is probably easiest.

> Update HDFS caching documentation with new changes
> --
>
> Key: HDFS-5841
> URL: https://issues.apache.org/jira/browse/HDFS-5841
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: caching
> Attachments: hdfs-5841-1.patch
>
>
> The caching documentation is a little out of date, since it's missing 
> description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-27 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-5841:
-

 Summary: Update HDFS caching documentation with new changes
 Key: HDFS-5841
 URL: https://issues.apache.org/jira/browse/HDFS-5841
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang


The caching documentation is a little out of date, since it's missing 
description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5297) Fix dead links in HDFS site documents

2014-01-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5297:


Summary: Fix dead links in HDFS site documents  (was: Fix dead links in 
HDFS document)

> Fix dead links in HDFS site documents
> -
>
> Key: HDFS-5297
> URL: https://issues.apache.org/jira/browse/HDFS-5297
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Attachments: HDFS-5297.patch
>
>
> I found a lot of broken hyperlinks in HDFS document to be fixed.
> Ex.)
> In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below
> {noformat}
>For command usage, see {{{dfsadmin}}}.
> {noformat}
> It should be fixed to 
> {noformat}
>For command usage, see 
> {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5138:
-

   Resolution: Fixed
Fix Version/s: 3.0.0
 Hadoop Flags: Incompatible change,Reviewed
   Status: Resolved  (was: Patch Available)

Resolving this with a fix version of 3.0.0.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883289#comment-13883289
 ] 

Aaron T. Myers commented on HDFS-5138:
--

Hi Suresh, it's obviously fine that you're busy (we all are) but in the future 
please just let me know that you intend to review it and that we should hold 
off for committing it for a bit. I reached out to you more than once last week 
to ask about a review timeline and never heard back from you, so I asked Todd 
to commit it (I'm traveling at the moment) given the silence.

bq. I had brought up one issue about potentially losing editlogs on JournalNode.

This scenario isn't possible as you described because either the pre-upgrade or 
upgrade stages (depending upon when the original failure happened) will fail to 
rename the dir if it already exists.

That said, your points about improving the documentation and the recovery 
procedure in the event of partial failure of the upgrade are well taken and 
certainly worth addressing. Upon looking at it further, I also think we should 
change a few of the assertions in the code to be actual exceptions, since we 
shouldn't have to be running with assertions enabled to check these error 
conditions, which should harden all of these code paths a bit more.
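
As a hedged illustration of that last point (hypothetical code, not a diff from 
the actual patch), turning an assertion into an explicit check makes the error 
fire even when the JVM runs without -ea:

{code}
// Before (only checked when assertions are enabled):
//   assert prevDir.exists() : "previous directory must exist before rollback";
// After (hypothetical, always checked):
if (!prevDir.exists()) {
  throw new IOException("Previous directory " + prevDir
      + " does not exist; cannot roll back");
}
{code}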

bq. please address the comments before merging to branch-2.

OK, I've filed HDFS-5840 to address your latest comments. Please follow that 
JIRA and review it as promptly as you can. I'm going to resolve this JIRA for 
now with a fix version of 3.0.0 and will merge both JIRAs to branch-2 when 
HDFS-5840 is completed.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-01-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883273#comment-13883273
 ] 

Aaron T. Myers commented on HDFS-5840:
--

From Suresh:

I am adding information about the design, the way I understand it. Let me know 
if I got it wrong.
*Upgrade preparation:*
# New bits are installed on the cluster nodes.
# The cluster is brought down.

*Upgrade:* For HA setup, choose one of the namenodes to initiate upgrade on and 
start it with -upgrade flag.
# NN performs preupgrade for all non-shared storage directories by moving 
current to previous.tmp and creating a new current (see the sketch after these 
lists).
#* Failure here is fine. NN startup fails, and on the next upgrade attempt the 
storage directories are recovered.
# NN performs preupgrade of shared edits (NFS/JournalNodes) over RPC. The 
JournalNodes' current is moved to previous.tmp and a new current is created.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the 
editlog directory could be lost on that JN. Restarting the JN does not fix the 
issue.
# NN performs upgrade of non-shared edits by writing the new CTIME to current 
and moving previous.tmp to previous.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the 
editlog directory could be lost on that JN. Restarting the JN does not fix the 
issue.
# NN performs upgrade of shared edits (NFS/JournalNodes) over RPC. The 
JournalNodes' current has the new CTIME and previous.tmp is moved to previous.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.

*Rollback:* NN is started with rollback flag
# For all the non-shared directories, the NN checks canRollBack, essentially 
ensuring that a previous directory with the right layout version exists.
# For all the shared directories, the NN checks canRollBack, essentially 
ensuring that a previous directory with the right layout version exists.
# NN performs rollback for the shared directories (moving previous to current).
#* If rollback fails on one of the JNs, the directories are left in an 
inconsistent state. I think any attempt at retrying rollback will fail and will 
require manually moving files around. I do not think restarting the JN fixes 
this.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.

*Finalize:* DFSAdmin command is run to finalize the upgrade.
# The active NN performs finalization of the editlog. If the JNs fail to 
finalize, the active NN fails to finalize. However, it is possible that the 
standby finalizes, leaving the cluster in an inconsistent state.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.
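
As referenced in the upgrade steps above, a minimal sketch of the 
preupgrade/upgrade directory moves (hypothetical helper methods, not the actual 
Storage/JNStorage code); the refusal to overwrite an existing previous.tmp is 
exactly the guard discussed in HDFS-5138:

{code}
// Hedged sketch only: current -> previous.tmp, then create a fresh current.
void doPreUpgrade(File root) throws IOException {
  File current = new File(root, "current");
  File prevTmp = new File(root, "previous.tmp");
  if (prevTmp.exists()) {
    throw new IOException(prevTmp + " already exists; refusing to overwrite "
        + "what may be the only copy of the edit logs");
  }
  if (!current.renameTo(prevTmp)) {
    throw new IOException("Rename of " + current + " to " + prevTmp + " failed");
  }
  if (!new File(root, "current").mkdir()) {
    throw new IOException("Could not create new " + current);
  }
}

// Hedged sketch only: stamp the new cTime, then previous.tmp -> previous.
void doUpgrade(File root) throws IOException {
  File prevTmp = new File(root, "previous.tmp");
  File previous = new File(root, "previous");
  // ... write the new cTime / layout version into current's VERSION file ...
  if (!prevTmp.renameTo(previous)) {
    throw new IOException("Rename of " + prevTmp + " to " + previous + " failed");
  }
}
{code}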

Comments on the code in the patch (this is almost complete):
# Minor nit: there are some whitespace changes.
# assertAllResultsEqual - the for loop can just start with i = 1. Also, if the 
collection is of size zero or one, the method can return early (see the sketch 
after this list). Is there a need to do object.toArray() for these early 
checks? With that, perhaps the findbugs exclude may not be necessary.
# Unit test can be added for methods isAtLeastOneActive, 
getRpcAddressesForNameserviceId and getProxiesForAllNameNodesInNameservice (I 
am okay if this is done in a separate jira)
# Finalizing upgrade is quite tricky. Consider the following scenarios:
#* One NN is active and the other is standby - works fine
#* One NN is active and the other is down or all NNs - finalize command throws 
exception and the user will not know if it has succeeded or failed and what to 
do next
#* No active NN - throws an exception cannot finalize with no active
#* BlockPoolSliceStorage.java change seems unnecessary
# Why is {{throw new AssertionError("Unreachable code.");}} in 
QuorumJournalManager.java methods?
# FSImage#doRollBack() - when canRollBack is false after checking whether the 
non-shared directories can roll back, an exception must be thrown immediately, instead of 
checking shared editlog. Also printing Log.info when storages can be rolled 
back will help in debugging.
# FSEditlog#canRollBackSharedLog should accept StorageInfo instead of Storage
# QuorumJournalManager#canRollBack and getJournalCTime can throw AssertionError 
(from DFSUtil.assertAllResultsEqual()). Is that the right exception to expose 
or IOException?
# Namenode startup throws AssertionError with -rollback option. I think we 
should throw IOException, which is how all the other failures are indicated.
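
A minimal sketch of the early-return simplification suggested in comment 2 
above (editorial illustration, not the actual DFSUtil.assertAllResultsEqual 
code):

{code}
import java.util.Collection;
import java.util.Iterator;

// Hedged sketch: return early for 0 or 1 elements, compare the rest against
// the first element, and avoid the toArray() copy entirely.
static void assertAllResultsEqual(Collection<?> objects) {
  if (objects.size() <= 1) {
    return;                                   // nothing to compare
  }
  Iterator<?> it = objects.iterator();
  Object first = it.next();
  while (it.hasNext()) {
    Object next = it.next();
    if (first == null ? next != null : !first.equals(next)) {
      throw new AssertionError("Not all elements match in results: " + objects);
    }
  }
}
{code}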

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> 
>
> Key: HDFS-5840
> URL: https://issues.apache.org/jira/browse/HDFS-5840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.

[jira] [Created] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-01-27 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5840:


 Summary: Follow-up to HDFS-5138 to improve error handling during 
partial upgrade failures
 Key: HDFS-5840
 URL: https://issues.apache.org/jira/browse/HDFS-5840
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 3.0.0


Suresh posted some good comments in HDFS-5138 after that patch had already been 
committed to trunk. This JIRA is to address those. See the first comment of 
this JIRA for the full content of the review.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.

2014-01-27 Thread Sachin Jose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Jose updated HDFS-5608:
--

Attachment: HDFS-5608.4.patch

Addressed the above review comments and added end-to-end test cases for 
webhdfs, jsonutils, and AclPermissionParam. Please review.

> WebHDFS: implement GETACLSTATUS and SETACL.
> ---
>
> Key: HDFS-5608
> URL: https://issues.apache.org/jira/browse/HDFS-5608
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Sachin Jose
> Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, 
> HDFS-5608.3.patch, HDFS-5608.4.patch
>
>
> Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883263#comment-13883263
 ] 

Hadoop QA commented on HDFS-5781:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12624851/HDFS-5781.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5950//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5950//console

This message is automatically generated.

> Use an array to record the mapping between FSEditLogOpCode and the 
> corresponding byte value
> ---
>
> Key: HDFS-5781
> URL: https://issues.apache.org/jira/browse/HDFS-5781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
> HDFS-5781.002.patch, HDFS-5781.002.patch
>
>
> HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
> given byte value. While improving efficiency, it may cause issues. E.g., 
> when several new editlog ops are added to trunk around the same time (for 
> several different new features), it is hard to backport the editlog ops with 
> larger byte values to branch-2 before those with smaller values, since there 
> will be gaps in the byte values of the enum. 
> This jira plans to still use an array to record the mapping between editlog 
> ops and their byte values, and allow gaps between valid ops. 
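
A minimal sketch of the array-based lookup proposed in the description above 
(hypothetical field and method names, not the committed HDFS-5781 patch):

{code}
// Hedged sketch: index ops by their byte value; unassigned slots stay null, so
// gaps between valid opcodes are allowed and lookups remain O(1).
private static final FSEditLogOpCodes[] VALUES = new FSEditLogOpCodes[256];
static {
  for (FSEditLogOpCodes code : FSEditLogOpCodes.values()) {
    VALUES[code.getOpCode() & 0xFF] = code;
  }
}

public static FSEditLogOpCodes fromByte(byte opCode) {
  return VALUES[opCode & 0xFF];   // null for a byte value with no op (a "gap")
}
{code}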



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.

2014-01-27 Thread Sachin Jose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Jose updated HDFS-5608:
--

Attachment: (was: HDFS-5608.4.patch)

> WebHDFS: implement GETACLSTATUS and SETACL.
> ---
>
> Key: HDFS-5608
> URL: https://issues.apache.org/jira/browse/HDFS-5608
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Sachin Jose
> Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, 
> HDFS-5608.3.patch
>
>
> Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.

2014-01-27 Thread Sachin Jose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Jose updated HDFS-5608:
--

Attachment: HDFS-5608.4.patch

> WebHDFS: implement GETACLSTATUS and SETACL.
> ---
>
> Key: HDFS-5608
> URL: https://issues.apache.org/jira/browse/HDFS-5608
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Sachin Jose
> Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, 
> HDFS-5608.3.patch, HDFS-5608.4.patch
>
>
> Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-01-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883236#comment-13883236
 ] 

Daryn Sharp commented on HDFS-4564:
---

Ok, will split.

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-27 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883234#comment-13883234
 ] 

Yongjun Zhang commented on HDFS-5767:
-

Hi [~brandonli],

What I suggested in my last update is a simplified solution for a unique 
<userName, userId> mapping. I don't have a solution that supports multi-mapping 
yet (I also put this aside a bit due to other stuff), but let me take a further 
look at that.

BTW, for your info, http://linux.die.net/man/5/nsswitch.conf  defines the 
search order as:

"One or more service specifications e.g., "files", "db", or "nis". The order of 
the services on the line determines the order in which those services will be 
queried, in turn, until a result is found. "

Thanks.


> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> 
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
> Environment: With LDAP enabled
>Reporter: Yongjun Zhang
>Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
> to be returned by the command "getent passwd". That is, for a given userName, 
> there should be a single userId, and for a given userId, there should be a 
> single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
> can't start with duplicate name or id on the host system.\n"
>   + "This is because HDFS (non-kerberos cluster) uses name as the only 
> way to identify a user or group.\n"
>   + "The host system with duplicated user/group name or id might work 
> fine most of the time by itself.\n"
>   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
> group name.\n"
>   + "Therefore, same name means the same user or same group. To find the 
> duplicated names/ids, one can do:\n"
>   + " and  
> on Linux systms,\n"
>   + " and  PrimaryGroupID> on MacOS.";
> This requirement can not be met sometimes (e.g. because of the use of LDAP) 
> Let's do some examination:
> What exist in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message can not be met here. 
> The DEBUG_INFO also stated that HDFS uses name as the only way to identify 
> user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5698) Use protobuf to serialize / deserialize FSImage

2014-01-27 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5698:
-

Attachment: HDFS-5698.001.patch

> Use protobuf to serialize / deserialize FSImage
> ---
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch
>
>
> Currently, the code serializes FSImage using in-house serialization 
> mechanisms. There are a couple disadvantages of the current approach:
> # Mixing the responsibility of reconstruction and serialization / 
> deserialization. The current code paths of serialization / deserialization 
> have spent a lot of effort on maintaining compatibility. What is worse is 
> that they are mixed with the complex logic of reconstructing the namespace, 
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage 
> is practically defined by the implementation. An bug in implementation means 
> a bug in the specification. Furthermore, it also makes writing third-party 
> tools quite difficult.
> # Changing schemas is non-trivial. Adding a field in FSImage requires bumping 
> the layout version every time. Bumping out layout version requires (1) the 
> users to explicitly upgrade the clusters, and (2) putting new code to 
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
> been used to serialize / deserialize the RPC message in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the 
> responsibility of serialization and reconstructing the namespace. The 
> protobuf files document the current format of the FSImage. The developers now 
> can add optional fields with ease, since the old code can always read the new 
> FSImage.
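
As a hedged aside on the compatibility point above, the sketch below demonstrates 
protobuf's length-delimited write/read cycle, the kind of section-by-section layout a 
protobuf-based image can use; unknown optional fields inside each message are simply 
ignored by older readers. It uses a bundled descriptor message purely for 
illustration; the actual fsimage sections are defined by HDFS-specific .proto 
messages in the patch:

{code}
// Illustrative only: protobuf length-delimited serialization round trip.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;

public class DelimitedProtobufDemo {
  public static void main(String[] args) throws Exception {
    FileDescriptorProto msg =
        FileDescriptorProto.newBuilder().setName("example.proto").build();

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    msg.writeDelimitedTo(out);  // each section is written with a length prefix
    msg.writeDelimitedTo(out);  // sections can be appended back to back

    ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
    FileDescriptorProto first = FileDescriptorProto.parseDelimitedFrom(in);
    FileDescriptorProto second = FileDescriptorProto.parseDelimitedFrom(in);
    System.out.println(first.getName() + ", " + second.getName());
  }
}
{code}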



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5797) Implement offline image viewer.

2014-01-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5797.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

I've committed this.

> Implement offline image viewer.
> ---
>
> Key: HDFS-5797
> URL: https://issues.apache.org/jira/browse/HDFS-5797
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: HDFS-5698 (FSImage in protobuf)
>
> Attachments: HDFS-5797.000.patch, HDFS-5797.001.patch
>
>
> The format of FSImage has changed dramatically therefore a new implementation 
> of OfflineImageViewer is required.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5797) Implement offline image viewer.

2014-01-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883180#comment-13883180
 ] 

Jing Zhao commented on HDFS-5797:
-

I've tested this patch and it looks like the new oiv works now. Some comments:
# The new FSImageUtil adds another util class for fsimage. Looks like we 
need to do some code refactoring here. But since we will eventually need to remove 
all the old saver classes/methods, I think we can do it then.
# The lsr part will cost memory. I guess we can create a separate jira in the 
future to improve it.

Thus I think we can commit this patch first and address the remaining issues 
later. +1

> Implement offline image viewer.
> ---
>
> Key: HDFS-5797
> URL: https://issues.apache.org/jira/browse/HDFS-5797
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: HDFS-5698 (FSImage in protobuf)
>
> Attachments: HDFS-5797.000.patch, HDFS-5797.001.patch
>
>
> The format of FSImage has changed dramatically therefore a new implementation 
> of OfflineImageViewer is required.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-01-27 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883164#comment-13883164
 ] 

Alejandro Abdelnur commented on HDFS-4564:
--

About splitting the JIRA: for trackability I think it will be easier to have 2 
separate JIRAs/commits; we can have both of them ready and commit them in 
tandem.

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-01-27 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883162#comment-13883162
 ] 

Alejandro Abdelnur commented on HDFS-4564:
--

[~daryn], thanks for sniffing around to see what's going on. So it seems the 
{{KerberosAuthenticator}} (the hadoop-auth Kerberos client side) could be 
simplified to remove the whole SPNEGO handshake and let the JDK do it, provided 
you are in a DO-AS block. The {{KerberosAuthenticator}} would simply extract 
the AUTH_COOKIE into a hadoop-auth token cookie via 
{{AuthenticatedURL.extractToken(conn, token)}} and delegate to the fallback if 
no cookie is present. The presence of the hadoop-auth token cookie, when using 
the AuthenticatedURL, will completely skip the 'authentication' path on both 
the client and the server side. Now, what we have to see is what happens when 
you are UGI logged in but you don't do this within a DO-AS block.
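
For reference, a minimal sketch of the DO-AS pattern mentioned above is shown below. 
The namenode URI and path are placeholders, and whether the JDK ends up driving the 
SPNEGO exchange depends on the hadoop-auth client and JDK in use:

{code}
// Illustrative sketch: issue a WebHDFS call inside a doAs block.
import java.net.URI;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class WebHdfsDoAsExample {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    FileStatus status = ugi.doAs(new PrivilegedExceptionAction<FileStatus>() {
      @Override
      public FileStatus run() throws Exception {
        // placeholder namenode address and path
        FileSystem fs = FileSystem.get(URI.create("webhdfs://nn.example.com:50070"), conf);
        return fs.getFileStatus(new Path("/tmp"));
      }
    });
    System.out.println(status);
  }
}
{code}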


> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk

2014-01-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HDFS-5839:
-

Attachment: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt

> TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
> 
>
> Key: HDFS-5839
> URL: https://issues.apache.org/jira/browse/HDFS-5839
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
> Attachments: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt
>
>
> Here is test failure:
> {code}
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 45.206 sec  <<< FAILURE!
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954)
> at java.lang.Thread.run(Thread.java:724)
> at org.junit.Assert.fail(Assert.java:93)
> at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083)
> at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {code}
> From test output:
> {code}
> 2014-01-27 17:55:59,388 WARN  resources.ExceptionHandler 
> (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at 
> com.sun.jersey.server.impl.uri.rules.Right

[jira] [Updated] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk

2014-01-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HDFS-5839:
-

Description: 
Here is test failure:
{code}
testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
45.206 sec  <<< FAILURE!
java.lang.AssertionError: There are 1 exception(s):
  Exception 0: 
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
at 
org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878)
at 
org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
at 
org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
at 
org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954)
at java.lang.Thread.run(Thread.java:724)

at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083)
at 
org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003)
at 
org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
{code}
From test output:
{code}
2014-01-27 17:55:59,388 WARN  resources.ExceptionHandler 
(ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166)
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231)
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658)
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116)
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631)
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560)
at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
{code}

> TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
> 
>
> Key: HDFS-5839
> URL: https://issues.apache.org/jira/browse/HDFS-5839
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>
> Here is test failure:
> {code}
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 

[jira] [Created] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk

2014-01-27 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5839:


 Summary: TestWebHDFS#testNamenodeRestart fails with 
NullPointerException in trunk
 Key: HDFS-5839
 URL: https://issues.apache.org/jira/browse/HDFS-5839
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883130#comment-13883130
 ] 

Mit Desai commented on HDFS-5838:
-

I think the failure will be intermittent. If these tests ran in the opposite 
order, the assertion error might not pop up. Adding the label "java7" so that it can 
be tracked as a JDK7 issue.

> TestcacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion fail when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883131#comment-13883131
 ] 

Jing Zhao commented on HDFS-5781:
-

Thanks for the comment, Daryn. In general, this patch just changes back to the 
original behavior, which also used a static block initializer. I agree it 
can be a pain to debug static block initializers; that's why in my 001 patch I 
tried to make the initializer simpler. I think we can create a separate jira to 
see if we can avoid using it.
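
For readers following along, a generic sketch of the array-with-gaps lookup being 
discussed is shown below. It is not the actual FSEditLogOpCodes code (the enum 
constants and byte values here are invented), and it deliberately keeps the static 
block small:

{code}
// Illustrative only: byte-value -> enum lookup via an array that tolerates gaps.
public enum OpCode {
  OP_ADD((byte) 0),
  OP_DELETE((byte) 2),   // a gap at 1 is allowed
  OP_RENAME((byte) 5);   // gaps at 3 and 4 are allowed

  private final byte value;
  private static final OpCode[] VALUES = new OpCode[256];

  static {
    for (OpCode op : values()) {
      VALUES[op.value & 0xFF] = op;
    }
  }

  OpCode(byte value) {
    this.value = value;
  }

  public byte getValue() {
    return value;
  }

  /** Returns the op for the given byte value, or null if no op is defined for it. */
  public static OpCode fromByte(byte b) {
    return VALUES[b & 0xFF];
  }
}
{code}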

> Use an array to record the mapping between FSEditLogOpCode and the 
> corresponding byte value
> ---
>
> Key: HDFS-5781
> URL: https://issues.apache.org/jira/browse/HDFS-5781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
> HDFS-5781.002.patch, HDFS-5781.002.patch
>
>
> HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
> given byte value. While improving the efficiency, it may cause issue. E.g., 
> when several new editlog ops are added to trunk around the same time (for 
> several different new features), it is hard to backport the editlog ops with 
> larger byte values to branch-2 before those with smaller values, since there 
> will be gaps in the byte values of the enum. 
> This jira plans to still use an array to record the mapping between editlog 
> ops and their byte values, and allow gap between valid ops. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5838:


Labels: java7  (was: )

> TestcacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion fail when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5838:


Status: Patch Available  (was: Open)

> TestcacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion fail when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()

2014-01-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883123#comment-13883123
 ] 

Hudson commented on HDFS-5825:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5044 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5044/])
HDFS-5825. Use FileUtils.copyFile() to implement DFSTestUtils.copyFile(). 
(Contributed by Haohui Mai) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561792)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java


> Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
> -
>
> Key: HDFS-5825
> URL: https://issues.apache.org/jira/browse/HDFS-5825
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-5825.000.patch
>
>
> {{DFSTestUtils.copyFile()}} is implemented by copying data through 
> FileInputStream / FileOutputStream. Apache Common IO provides 
> {{FileUtils.copyFile()}}. It uses FileChannel which is more efficient.
> This jira proposes to implement {{DFSTestUtils.copyFile()}} using 
> {{FileUtils.copyFile()}}.
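
For illustration, a minimal usage sketch of the Commons IO call mentioned above 
(the file paths are placeholders):

{code}
// Illustrative only: copying a file with Apache Commons IO.
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class CopyFileExample {
  public static void main(String[] args) throws IOException {
    File src = new File("/tmp/src.dat");  // placeholder source
    File dst = new File("/tmp/dst.dat");  // placeholder destination
    FileUtils.copyFile(src, dst);         // FileChannel-based copy under the hood
  }
}
{code}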



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883124#comment-13883124
 ] 

Hudson commented on HDFS-5781:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5044 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5044/])
HDFS-5781. Use an array to record the mapping between FSEditLogOpCode and the 
corresponding byte value. Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561788)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java


> Use an array to record the mapping between FSEditLogOpCode and the 
> corresponding byte value
> ---
>
> Key: HDFS-5781
> URL: https://issues.apache.org/jira/browse/HDFS-5781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
> HDFS-5781.002.patch, HDFS-5781.002.patch
>
>
> HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
> given byte value. While improving the efficiency, it may cause issue. E.g., 
> when several new editlog ops are added to trunk around the same time (for 
> several different new features), it is hard to backport the editlog ops with 
> larger byte values to branch-2 before those with smaller values, since there 
> will be gaps in the byte values of the enum. 
> This jira plans to still use an array to record the mapping between editlog 
> ops and their byte values, and allow gap between valid ops. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()

2014-01-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5825:


  Resolution: Fixed
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the contribution Haohui!

I committed this to trunk, branch-2 and branch-2.3.

> Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
> -
>
> Key: HDFS-5825
> URL: https://issues.apache.org/jira/browse/HDFS-5825
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-5825.000.patch
>
>
> {{DFSTestUtils.copyFile()}} is implemented by copying data through 
> FileInputStream / FileOutputStream. Apache Common IO provides 
> {{FileUtils.copyFile()}}. It uses FileChannel which is more efficient.
> This jira proposes to implement {{DFSTestUtils.copyFile()}} using 
> {{FileUtils.copyFile()}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5838:


Attachment: HDFS-5838.patch

testBasicPoolOperations creates a pool "pool2" which never gets removed.

This pool shows up when testCreateAndModifyPools later checks for existing 
pools, and that check fails with an assertion error.
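
A minimal sketch of the kind of cleanup implied here is shown below; the {{dfs}} 
handle and the hard-coded pool name come from the description, and this is not 
necessarily what the attached patch does:

{code}
// Illustrative only: remove the pool left behind by testBasicPoolOperations.
import java.io.IOException;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CachePoolCleanup {
  static void cleanup(DistributedFileSystem dfs) throws IOException {
    dfs.removeCachePool("pool2");  // pool created by the earlier test
  }
}
{code}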

> TestcacheDirectives#testCreateAndModifyPools fails
> --
>
> Key: HDFS-5838
> URL: https://issues.apache.org/jira/browse/HDFS-5838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5838.patch
>
>
> testCreateAndModifyPools generates an assertion fail when it runs after 
> testBasicPoolOperations.
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
> test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time 
> elapsed: 4.649 sec  <<< FAILURE!
> java.lang.AssertionError: expected no cache pools after deleting pool
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertFalse(Assert.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)
> Results :
> Failed tests: 
>   TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no 
> cache pools after deleting pool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5781:


   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the review, Colin! I've committed this to trunk and branch-2.

> Use an array to record the mapping between FSEditLogOpCode and the 
> corresponding byte value
> ---
>
> Key: HDFS-5781
> URL: https://issues.apache.org/jira/browse/HDFS-5781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
> HDFS-5781.002.patch, HDFS-5781.002.patch
>
>
> HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
> given byte value. While improving the efficiency, it may cause issue. E.g., 
> when several new editlog ops are added to trunk around the same time (for 
> several different new features), it is hard to backport the editlog ops with 
> larger byte values to branch-2 before those with smaller values, since there 
> will be gaps in the byte values of the enum. 
> This jira plans to still use an array to record the mapping between editlog 
> ops and their byte values, and allow gap between valid ops. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883115#comment-13883115
 ] 

Brandon Li commented on HDFS-5754:
--

{quote} I think we need two maps. Do you agree?{quote}
Yes, we need two maps here. 
It looks like it's hard to keep the patch at a minimal size. Uploaded a new patch.
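
As a purely conceptual sketch of the "two maps" idea (not the actual patch; the 
feature names and map shapes are invented for illustration):

{code}
// Illustrative only: separate layout-version -> feature-set maps for NN and DN.
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class LayoutFeatureMaps {
  enum Feature { SNAPSHOT, CACHING, PROTOBUF_FORMAT }  // invented names

  static final Map<Integer, SortedSet<Feature>> NAMENODE_FEATURES =
      new TreeMap<Integer, SortedSet<Feature>>();
  static final Map<Integer, SortedSet<Feature>> DATANODE_FEATURES =
      new TreeMap<Integer, SortedSet<Feature>>();

  static void addFeature(Map<Integer, SortedSet<Feature>> map, int layoutVersion,
      Feature f) {
    SortedSet<Feature> set = map.get(layoutVersion);
    if (set == null) {
      set = new TreeSet<Feature>();
      map.put(layoutVersion, set);
    }
    set.add(f);
  }

  static boolean supports(Map<Integer, SortedSet<Feature>> map, int layoutVersion,
      Feature f) {
    SortedSet<Feature> set = map.get(layoutVersion);
    return set != null && set.contains(f);
  }
}
{code}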

> Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
> 
>
> Key: HDFS-5754
> URL: https://issues.apache.org/jira/browse/HDFS-5754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
> HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
> HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
> HDFS-5754.009.patch
>
>
> Currently, LayoutVersion defines the on-disk data format and supported 
> features of the entire cluster including NN and DNs.  LayoutVersion is 
> persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
> supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
> different LayoutVersion than NN cannot register with the NN.
> We propose to split LayoutVersion into two independent values that are local 
> to the nodes:
> - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
> the format of FSImage, editlog and the directory structure.
> - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
> the format of block data file, metadata file, block pool layout, and the 
> directory structure.  
> The LayoutVersion check will be removed in DN registration.  If 
> NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
> upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails

2014-01-27 Thread Mit Desai (JIRA)
Mit Desai created HDFS-5838:
---

 Summary: TestcacheDirectives#testCreateAndModifyPools fails
 Key: HDFS-5838
 URL: https://issues.apache.org/jira/browse/HDFS-5838
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai


testCreateAndModifyPools generates an assertion fail when it runs after 
testBasicPoolOperations.

{noformat}
Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)  Time elapsed: 
4.649 sec  <<< FAILURE!
java.lang.AssertionError: expected no cache pools after deleting pool
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertFalse(Assert.java:68)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160)


Results :

Failed tests: 
  TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no cache 
pools after deleting pool
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5754:
-

Attachment: HDFS-5754.009.patch

> Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
> 
>
> Key: HDFS-5754
> URL: https://issues.apache.org/jira/browse/HDFS-5754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
> HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
> HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
> HDFS-5754.009.patch
>
>
> Currently, LayoutVersion defines the on-disk data format and supported 
> features of the entire cluster including NN and DNs.  LayoutVersion is 
> persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
> supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
> different LayoutVersion than NN cannot register with the NN.
> We propose to split LayoutVersion into two independent values that are local 
> to the nodes:
> - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
> the format of FSImage, editlog and the directory structure.
> - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
> the format of block data file, metadata file, block pool layout, and the 
> directory structure.  
> The LayoutVersion check will be removed in DN registration.  If 
> NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
> upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883110#comment-13883110
 ] 

Suresh Srinivas commented on HDFS-5138:
---

[~atm], please address the comments before merging to branch-2.

My main concern, apart from the comments on the code, is the requirement to have 
all JNs available and the boundary conditions that arise when any of the steps 
related to a JN fails. These issues can result in loss of metadata and a very 
involved, error-prone recovery procedure. It might also require the system to be 
restarted (say, if finalize fails because one of the JNs is not up). Please look 
at the comments on the design and see if I understand it correctly.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN wo't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finaliizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different ways of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883105#comment-13883105
 ] 

Daryn Sharp commented on HDFS-5781:
---

In general, static block initializers are frowned upon - I've been dinged for 
them in the past.  If they ever throw an exception, it causes the JVM to 
misreport the exception in very bizarre ways.

> Use an array to record the mapping between FSEditLogOpCode and the 
> corresponding byte value
> ---
>
> Key: HDFS-5781
> URL: https://issues.apache.org/jira/browse/HDFS-5781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
> HDFS-5781.002.patch, HDFS-5781.002.patch
>
>
> HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
> given byte value. While improving the efficiency, it may cause issue. E.g., 
> when several new editlog ops are added to trunk around the same time (for 
> several different new features), it is hard to backport the editlog ops with 
> larger byte values to branch-2 before those with smaller values, since there 
> will be gaps in the byte values of the enum. 
> This jira plans to still use an array to record the mapping between editlog 
> ops and their byte values, and allow gap between valid ops. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883097#comment-13883097
 ] 

Hadoop QA commented on HDFS-5830:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625374/HDFS-5830.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
-12 warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5949//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5949//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5949//console

This message is automatically generated.

> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster. 
> 
>
> Key: HDFS-5830
> URL: https://issues.apache.org/jira/browse/HDFS-5830
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
> Attachments: HDFS-5830.001.patch
>
>
> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when 
> accessing another cluster (one that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a 
> different constructor
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:79)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446)
> at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812)
> at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797)
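
A sketch of the kind of defensive guard the trace suggests is shown below; it is 
only illustrative and not necessarily the fix in the attached patch:

{code}
// Illustrative only: fall back to an empty array when the JSON from an older
// cluster carries no cached locations, instead of passing null downstream.
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class CachedLocsGuard {
  static DatanodeInfo[] nonNullCachedLocs(DatanodeInfo[] cachedLocs) {
    return cachedLocs != null ? cachedLocs : new DatanodeInfo[0];
  }
}
{code}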



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-01-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883095#comment-13883095
 ] 

Daryn Sharp commented on HDFS-4564:
---

I just sniffed our secure clusters doing a hadoop fs ls.  It did not prefetch 
service tickets.  The server requested SPNEGO for the getDelegationToken 
request, and the client sent a service ticket.  The client then sent a file stat 
and a list status.  Both operations sent the delegation token sans a service 
ticket.  This is with JDK7, although different JDKs may have different behavior.

I'm not sure it would be easy to ensure the client never does a pre-fetch of a 
service ticket -- assuming other JDKs do that.  About the only way I can 
conceive of is to create a new subject/UGI with only the token.  Token ops would 
use the current user, whereas other ops would use the new subject.  I'm not 
necessarily suggesting this approach...
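
For reference, a rough sketch of the "new subject/ugi with only the token" shape is 
shown below; it is illustrative only, not a recommendation or a proposed patch:

{code}
// Illustrative only: a UGI that carries a delegation token but no Kerberos
// credentials, so there is nothing to pre-fetch a service ticket with.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class TokenOnlyUgiSketch {
  static <T> T runWithTokenOnly(String user, Token<? extends TokenIdentifier> token,
      PrivilegedExceptionAction<T> action) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
    ugi.addToken(token);     // attach only the delegation token
    return ugi.doAs(action); // run the operation as the token-only subject
  }
}
{code}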

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5837) dfs.namenode.replication.considerLoad does not consider decommissioned nodes

2014-01-27 Thread Bryan Beaudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault updated HDFS-5837:


Description: 
In DefaultBlockPlacementPolicy, there is a setting 
dfs.namenode.replication.considerLoad which tries to balance the load of the 
cluster when choosing replica locations.  This code does not take into account 
decommissioned nodes.

The code for considerLoad calculates the load by doing:  TotalClusterLoad / 
numNodes.  However, numNodes includes decommissioned nodes (which have 0 load). 
 Therefore, the average load is artificially low.  Example:

TotalLoad = 250
numNodes = 100
decommissionedNodes = 70
remainingNodes = numNodes - decommissionedNodes = 30

avgLoad = 250/100 = 2.50
trueAvgLoad = 250 / 30 = 8.33

If the real load of the remaining 30 nodes is (on average) 8.33, this is more 
than 2x the calculated average load of 2.50.  This causes these nodes to be 
rejected as replica locations. The final result is that all nodes are rejected, 
and no replicas can be placed.  

See exceptions printed from client during this scenario: 
https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1


  was:
In DefaultBlockPlacementPolicy, there is a setting 
dfs.namenode.replication.considerLoad which tries to balance the load of the 
cluster when choosing replica locations.  This code does not take into account 
decommissioned nodes.

The code for considerLoad calculates the load by doing:  TotalClusterLoad /
numNodes.  However, numNodes includes decommissioned nodes (which have 0 load). 
 Therefore, the average load is artificially low.  Example:

TotalLoad = 250
numNodes = 100
decommissionedNodes = 50

avgLoad = 250/100 = 2.50
trueAvgLoad = 250 / (100 - 70) = 8.33

If the real load of the remaining 30 nodes is (on average) 8.33, this is more 
than 2x the calculated average load of 2.50.  This causes these nodes to be 
rejected as replica locations. The final result is that all nodes are rejected, 
and no replicas can be placed.  

See exceptions printed from client during this scenario: 
https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1



> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> 
>
> Key: HDFS-5837
> URL: https://issues.apache.org/jira/browse/HDFS-5837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Bryan Beaudreault
>
> In DefaultBlockPlacementPolicy, there is a setting 
> dfs.namenode.replication.considerLoad which tries to balance the load of the 
> cluster when choosing replica locations.  This code does not take into 
> account decommissioned nodes.
> The code for considerLoad calculates the load by doing:  TotalClusterLoad / 
> numNodes.  However, numNodes includes decommissioned nodes (which have 0 
> load).  Therefore, the average load is artificially low.  Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250/100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, this is more 
> than 2x the calculated average load of 2.50.  This causes these nodes to be 
> rejected as replica locations. The final result is that all nodes are 
> rejected, and no replicas can be placed.  
> See exceptions printed from client during this scenario: 
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1
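
The arithmetic above can be reproduced directly; the numbers are the ones from the 
description and the method name is made up:

{code}
// Illustrative only: average load with and without decommissioned (zero-load) nodes.
public class ConsiderLoadExample {
  static double avgLoad(double totalLoad, int nodes) {
    return totalLoad / nodes;
  }

  public static void main(String[] args) {
    double totalLoad = 250;
    int numNodes = 100;
    int decommissioned = 70;

    System.out.println(avgLoad(totalLoad, numNodes));                   // 2.5, current calculation
    System.out.println(avgLoad(totalLoad, numNodes - decommissioned));  // ~8.33, excluding decommissioned nodes
  }
}
{code}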



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

