[jira] [Moved] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA moved HADOOP-10299 to HDFS-5844: -- Component/s: (was: documentation) documentation Target Version/s: 2.3.0 (was: 2.3.0) Affects Version/s: (was: 2.2.0) 2.2.0 Key: HDFS-5844 (was: HADOOP-10299) Project: Hadoop HDFS (was: Hadoop Common) > Fix broken link in WebHDFS.apt.vm > - > > Key: HDFS-5844 > URL: https://issues.apache.org/jira/browse/HDFS-5844 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > > There is one broken link in WebHDFS.apt.vm. > {code} > {{{RemoteException JSON Schema}}} > {code} > should be > {code} > {{RemoteException JSON Schema}} > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5754: - Attachment: HDFS-5754.010.patch Rebased the patch with the HDFS-5535 branch along with a couple unit test fixes. > Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion > > > Key: HDFS-5754 > URL: https://issues.apache.org/jira/browse/HDFS-5754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Brandon Li > Attachments: FeatureInfo.patch, HDFS-5754.001.patch, > HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, > HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, > HDFS-5754.009.patch, HDFS-5754.010.patch > > > Currently, LayoutVersion defines the on-disk data format and supported > features of the entire cluster including NN and DNs. LayoutVersion is > persisted in both NN and DNs. When a NN/DN starts up, it checks its > supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a > different LayoutVersion than NN cannot register with the NN. > We propose to split LayoutVersion into two independent values that are local > to the nodes: > - NamenodeLayoutVersion - defines the on-disk data format in NN, including > the format of FSImage, editlog and the directory structure. > - DatanodeLayoutVersion - defines the on-disk data format in DN, including > the format of block data file, metadata file, block pool layout, and the > directory structure. > The LayoutVersion check will be removed in DN registration. If > NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling > upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
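The proposed split can be sketched in a few lines; the class name, fields, and version values below are hypothetical stand-ins for illustration, not the actual Hadoop NamenodeLayoutVersion/DatanodeLayoutVersion code:

```java
// Illustrative sketch of the proposed LayoutVersion split. All names and
// values here are hypothetical, not the real Hadoop implementation.
public class LayoutVersionSplit {
    // After the split, each node type tracks its own on-disk format
    // version independently; these values are made up for the example.
    static final int NAMENODE_LAYOUT_VERSION = -52;
    static final int DATANODE_LAYOUT_VERSION = -51;

    // Before the split: a DN whose layout version differed from the NN's
    // could not register.
    static boolean canRegisterBefore(int nnVersion, int dnVersion) {
        return nnVersion == dnVersion;
    }

    // After the split: the layout-version comparison is dropped from DN
    // registration, since the two versions are local to each node type.
    static boolean canRegisterAfter(int nnVersion, int dnVersion) {
        return true;
    }
}
```

The sketch only captures the registration-check change described in the issue; the on-disk format details (FSImage, editlog, block pool layout) are out of scope here.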
[jira] [Updated] (HDFS-4854) Fix the broken images in the Federation documentation page
[ https://issues.apache.org/jira/browse/HDFS-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-4854: Resolution: Duplicate Status: Resolved (was: Patch Available) The links were fixed in HDFS-5231. Closing this issue as duplicate. > Fix the broken images in the Federation documentation page > -- > > Key: HDFS-4854 > URL: https://issues.apache.org/jira/browse/HDFS-4854 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.4-alpha >Reporter: Stephen Chu >Assignee: Stephen Chu > Attachments: HDFS-4854.patch > > > Currently, there are two broken images in the Federation documentation > http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html. > federation.gif and federation-background.gif are inside hadoop-yarn-project > site resources, but Federation.apt.vm has moved to hadoop-hdfs-project. > We should move these two .gifs back to hadoop-hdfs-project and fix the image > links in Federation.apt.vm. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883823#comment-13883823 ] Liang Xie commented on HDFS-5776: - bq. Isn't the call to actualGetFromOneDataNode wrapped in a loop itself? I am talking about the while loop in fetchBlockByteRange. Will that not change the behavior? Maybe it is harmless, I am not sure. I just want us to be clear either way. Yes, it doesn't change the overall behavior and is harmless; indeed, it's safer than before. In the old implementation, the refetchToken/refetchEncryptionKey counters were shared by all nodes from chooseDataNode once a key/token exception happened. That means if the first node consumed the retry quota, then if the second or third node hit a key/token exception, the clearDataEncryptionKey/fetchBlockAt operations would not be called, which is a little unfair :) In the new patch, the second and later nodes get a retry quota similar to the first node's, which seems fairer to me. Either way, it doesn't change the normal path; it is just safer and fairer in the security-enabled scenario. bq. The test looks like a stress test, i.e. we are hoping that some of the hedged requests will complete before the primary requests. We can create a separate Jira to write a deterministic unit test and it's fine if someone else picks that up later. OK, I can track that later. Either patch v9 or v10 is fine with me (though our internal branch uses the style without a limit), since my original wish was to reduce HBase's P99 and P99.9 latency; there is no difference between them on this point. V9 is safer but would probably require modifying the HDFS source code again if the hardcoded limit is hit (which is difficult for a normal end user). IMHO, the committer who eventually commits this JIRA can pick one. It would be a pity if people keep arguing over this style and hold up the progress; that doesn't help the downstream HBase project at all. 
> Support 'hedged' reads in DFSClient > --- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, > HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, > HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt > > > This is a placeholder for the HDFS-related backport from > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be especially helpful for optimizing read outliers. > We can use "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we > can export the metric values of interest into the client system (e.g. HBase's > regionserver metrics). > The core logic is in the pread code path: we decide whether to go to the original > fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
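The hedged-read pattern described above can be sketched with plain java.util.concurrent primitives. This is an illustrative sketch, not DFSClient's actual implementation; the two constants stand in for the threshold and threadpool config keys named in the description:

```java
import java.util.concurrent.*;

// Sketch of a hedged read: start the primary request, and if it has not
// finished within a threshold, fire a second ("hedged") request and take
// whichever finishes first. Not the real DFSClient code.
public class HedgedRead {
    static final long THRESHOLD_MILLIS = 50; // stand-in for dfs.dfsclient.quorum.read.threshold.millis
    static final int POOL_SIZE = 4;          // stand-in for dfs.dfsclient.quorum.read.threadpool.size

    public static <T> T hedgedCall(Callable<T> primary, Callable<T> hedge)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            CompletionService<T> cs = new ExecutorCompletionService<>(pool);
            cs.submit(primary);
            // Wait briefly for the primary; if it is slow, submit the hedge.
            Future<T> done = cs.poll(THRESHOLD_MILLIS, TimeUnit.MILLISECONDS);
            if (done == null) {
                cs.submit(hedge);
                done = cs.take(); // first of the two requests to finish wins
            }
            return done.get();
        } finally {
            pool.shutdownNow(); // abandon the losing request
        }
    }
}
```

A fast primary returns on the normal path untouched; only an outlier read pays the cost of the second request, which is how this reduces P99/P99.9 latency for clients like HBase.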
[jira] [Resolved] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)
[ https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-2892. --- Resolution: Invalid Target Version/s: (was: 2.0.0-alpha, 3.0.0) Resolving as Invalid as these were user questions. > Some of the property descriptions are not given (hdfs-default.xml) > -- > > Key: HDFS-2892 > URL: https://issues.apache.org/jira/browse/HDFS-2892 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.0 >Reporter: Brahma Reddy Battula >Priority: Trivial > > Hi, I took the 0.23.0 release from > http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available > I went through all the properties provided in hdfs-default.xml, and some of > the property descriptions are not given. It would be better to give a description for each > property and its usage (how to configure it). Also, only MapReduce-related jars are > provided. Please check the following two configurations: > *No Description* > {noformat} > <property> > <name>dfs.datanode.https.address</name> > <value>0.0.0.0:50475</value> > </property> > <property> > <name>dfs.namenode.https-address</name> > <value>0.0.0.0:50470</value> > </property> > {noformat} > It would also be better to mention example usage (what to configure, in what format/syntax) in the > description. Here I did not get what "default" means, whether it is the name of a network interface > or something else: > {noformat} > <property> > <name>dfs.datanode.dns.interface</name> > <value>default</value> > <description>The name of the Network Interface from which a data node should > report its IP address.</description> > </property> > {noformat} > The following property is commented out; if it is not supported, it would be better to remove it: > {noformat} > <property> > <name>dfs.cluster.administrators</name> > <value>ACL for the admins</value> > <description>This configuration is used to control who can access the > default servlets in the namenode, etc.</description> > </property> > {noformat} > A small clarification for the following property: if some value is configured, > will the NN stay in safe mode up to this much time? > May I know the usage of the following property? > {noformat} > <property> > <name>dfs.blockreport.initialDelay</name> > <value>0</value> > <description>Delay for first block report in seconds.</description> > </property> > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled
[ https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon updated HDFS-5843: - Attachment: hdfs-5843.patch Attaching patch to fix the issue + test case to verify. Thanks for reviewing > DFSClient.getFileChecksum() throws IOException if checksum is disabled > -- > > Key: HDFS-5843 > URL: https://issues.apache.org/jira/browse/HDFS-5843 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Laurent Goujon > Attachments: hdfs-5843.patch > > > If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} > for example), calling {{FileSystem.getFileChecksum()}} throws the following > IOException: > {noformat} > java.io.IOException: Fail to get block MD5 for > BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001 > at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965) > at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771) > at > org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186) > at > org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194) > [...] 
> {noformat} > From the logs, the datanode is doing some wrong arithmetic because of > crcPerBlock: > {noformat} > 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - > 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation src: > /127.0.0.1:52407 dest: /127.0.0.1:52398 > java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:695) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled
[ https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon updated HDFS-5843: - Status: Patch Available (was: Open) > DFSClient.getFileChecksum() throws IOException if checksum is disabled > -- > > Key: HDFS-5843 > URL: https://issues.apache.org/jira/browse/HDFS-5843 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Laurent Goujon > Attachments: hdfs-5843.patch > > > If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} > for example), calling {{FileSystem.getFileChecksum()}} throws the following > IOException: > {noformat} > java.io.IOException: Fail to get block MD5 for > BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001 > at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965) > at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771) > at > org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186) > at > org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194) > [...] 
> {noformat} > From the logs, the datanode is doing some wrong arithmetic because of > crcPerBlock: > {noformat} > 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - > 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation src: > /127.0.0.1:52407 dest: /127.0.0.1:52398 > java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:695) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled
Laurent Goujon created HDFS-5843: Summary: DFSClient.getFileChecksum() throws IOException if checksum is disabled Key: HDFS-5843 URL: https://issues.apache.org/jira/browse/HDFS-5843 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Laurent Goujon If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} for example), calling {{FileSystem.getFileChecksum()}} throws the following IOException: {noformat} java.io.IOException: Fail to get block MD5 for BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001 at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965) at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771) at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186) at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194) [...] {noformat} From the logs, the datanode is doing some wrong arithmetic because of crcPerBlock: {noformat} 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation src: /127.0.0.1:52407 dest: /127.0.0.1:52398 java.lang.ArithmeticException: / by zero at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) at java.lang.Thread.run(Thread.java:695) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
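The failure mode can be illustrated with a small sketch of the checksum arithmetic. The method and parameter names below are simplified stand-ins, not the actual DataXceiver.blockChecksum() code, and the zero-check shows one plausible shape of a fix:

```java
// Simplified sketch of the per-block CRC arithmetic that fails here.
// Names are illustrative stand-ins, not the actual DataXceiver code.
public class BlockChecksumMath {
    // With checksums disabled (a NULL checksum), the per-CRC size can be
    // zero, so dividing without a guard throws the
    // "java.lang.ArithmeticException: / by zero" seen in the log above.
    static long crcPerBlock(long checksumDataLen, int checksumSize) {
        if (checksumSize == 0) {
            return 0; // no CRCs are stored for this block
        }
        return checksumDataLen / checksumSize;
    }
}
```

Whether the real fix guards the division or short-circuits the whole BLOCK_CHECKSUM path for checksum-less blocks is up to the attached patch; the sketch only shows why the exception occurs.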
[jira] [Resolved] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5835. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) Hadoop Flags: Reviewed I have committed this. > Add a new option for starting standby NN when rolling upgrade is in progress > > > Key: HDFS-5835 > URL: https://issues.apache.org/jira/browse/HDFS-5835 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: h5835_20130127.patch > > > When rolling upgrade is already in-progress and the standby NN is not yet > started up, a new startup option is needed for the standby NN to initialize > the upgrade status. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883784#comment-13883784 ] stack commented on HDFS-4239: - I think throwing an exception is the right thing to do. The volume is going away at the operator's volition. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it to stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after it has been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883771#comment-13883771 ] Hadoop QA commented on HDFS-5810: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625506/HDFS-5810.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.TestShortCircuitCache org.apache.hadoop.hdfs.server.namenode.TestNameNodeHttpServer org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5958//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5958//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5958//console This message is automatically generated. 
> Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883770#comment-13883770 ] Hudson commented on HDFS-5833: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5049 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5049/]) HDFS-5833. Fix incorrect javadoc in SecondaryNameNode. (Contributed by Bangtao Zhou) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561938) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java > SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5833: Resolution: Fixed Fix Version/s: 2.3.0 3.0.0 Target Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the patch, Bangtao. I committed this to trunk, branch-2, and branch-2.3. > SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883750#comment-13883750 ] Hadoop QA commented on HDFS-5804: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625518/HDFS-5804.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5959//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5959//console This message is automatically generated. 
> HDFS NFS Gateway fails to mount and proxy when using Kerberos > - > > Key: HDFS-5804 > URL: https://issues.apache.org/jira/browse/HDFS-5804 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0, 2.2.0 >Reporter: Abin Shahab > Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, > javadoc-after-patch.log, javadoc-before-patch.log > > > When using the HDFS nfs gateway with secure hadoop > (hadoop.security.authentication: kerberos), mounting hdfs fails. > Additionally, there is no mechanism to support proxy users (nfs needs to proxy > as the user invoking commands on the hdfs mount). > Steps to reproduce: > 1) Start a hadoop cluster with Kerberos enabled. > 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has > an account in Kerberos. > 3) Get the keytab for nfsserver, and issue the following mount command: mount > -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point > 4) You'll see in the nfsserver logs that Kerberos is complaining about not > having a TGT for root. 
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883746#comment-13883746 ] Hadoop QA commented on HDFS-5776: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625502/HDFS-5776-v10.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5957//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5957//console This message is automatically generated. 
> Support 'hedged' reads in DFSClient > --- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, > HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, > HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt > > > This is a placeholder for the HDFS-related backport from > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be especially helpful for optimizing read outliers. > We can use "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we > can export the metric values of interest into the client system (e.g. HBase's > regionserver metrics). > The core logic is in the pread code path: we decide whether to go to the original > fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-5804: -- Attachment: HDFS-5804.patch Test fix > HDFS NFS Gateway fails to mount and proxy when using Kerberos > - > > Key: HDFS-5804 > URL: https://issues.apache.org/jira/browse/HDFS-5804 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0, 2.2.0 >Reporter: Abin Shahab > Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, > javadoc-after-patch.log, javadoc-before-patch.log > > > When using the HDFS nfs gateway with secure hadoop > (hadoop.security.authentication: kerberos), mounting hdfs fails. > Additionally, there is no mechanism to support proxy users (nfs needs to proxy > as the user invoking commands on the hdfs mount). > Steps to reproduce: > 1) Start a hadoop cluster with Kerberos enabled. > 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has > an account in Kerberos. > 3) Get the keytab for nfsserver, and issue the following mount command: mount > -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point > 4) You'll see in the nfsserver logs that Kerberos is complaining about not > having a TGT for root. 
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) > at > org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstr
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883728#comment-13883728 ] Jimmy Xiang commented on HDFS-4239: --- We can release the lock after the volume is marked down. No new blocks will be allocated to this volume. But what about the blocks on this volume that are still being written? A write could take forever, for example to a rarely updated HLog file. I was thinking of failing the write pipeline so that the client can set up another pipeline. Any problem with that? > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the whole datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low-latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (you > can't unmount the disk while it is in use). The latter is better in that > only the bad disk's data is rereplicated, not all of the datanode's data. > Is it possible to do better, say, send the datanode a signal to tell it to stop > using a disk an operator has designated 'bad'? This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > A nice-to-have would be being able to tell the datanode to resume using a disk > after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883717#comment-13883717 ] stack commented on HDFS-5776: - [~arpitagarwal] Would v10 be palatable? You say OK to v9 above but Colin review would favor v10? [~xieliang007] Can you take care of the other nits raised by [~arpitagarwal] Good stuff. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883702#comment-13883702 ] Andrew Wang commented on HDFS-5746: --- Nice work here. I have a fair number of review comments, but most of it's nitty: I didn't see anything named ShortCircuitSharedMemorySegment in the patch, should it be included? SharedFileDescriptorFactory: * Javadoc for SharedFileDescriptorFactory constructor * {{rand()}} isn't reentrant, potentially making it unsafe for {{createDescriptor0}}. Should we use {{rand_r}} instead, or slap a synchronized on it? * Also not sure why we concat two {{rand()}}. Seems like one should be enough with the collision detection code. * The {{open}} is done with mode {{0777}}, wouldn't {{0700}} be safer? I thought we were passing these over a domain socket, so we can keep the permissions locked up. * Paranoia, should we do a check in CloseableReferenceCount#reference for overflow to the closed bit? I know we have 30 bits, but who knows. * Unrelated nit: DomainSocket#write(byte[], int, int) {{boolean exec}} is indented wrong, mind fixing it? DomainSocketWatcher: * Class javadoc is c+p from {{DomainSocket}}, I think it should be updated for DSW. Some high-level description of how the nested classes fit together would be nice. * Some Java-isms. {{Runnable}} is preferred over {{Thread}}. It's also weird that DSW is a {{Thread}} subclass and it calls {{start}} on itself. An inner class implementing Runnable would be more idiomatic. * Explain use of {{loopSocks 0}} versus {{loopSocks 1}}? This is a crucial part of this class: we need to use a socketpair rather than a plain condition variable because of blocking on poll. * "loopSocks" is also not a very descriptive name, maybe "wakeupPair" or "eventPair" instead? * Can add a Precondition check to make sure the lock is held in checkNotClosed * If we fail to kick, add and remove could block until the poll timeout. 
* Should doc that we only support one Handler per fd, it overwrites on add. Maybe Precondition this instead if we don't want to overwrite, I can't tell from context here. * Typo "loopSOcks" in log message * The repeated calls to {{sendCallback}} are worrisome. For instance, a sock could be EOF and closed, be removed by the first sendCallback, and then if there's a pending toRemove for the sock, the second sendCallback aborts on the Precondition check. * {{closeAll}} parameter in sendCallback is unused * This comment probably means to refer to loopSocks: {code} // Close shutdownSocketPair[0], so that shutdownSocketPair[1] gets an EOF {code} * This comment probably meant poll, not select: {code} // were waiting in select(). {code} TestDomainSocketWatcher: * Why are two of the {{@Test}} in TestDomainSocketWatcher commented out? * Timeouts seem kind of long, these should be super fast tests right? > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
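The Runnable-over-Thread point above is a general Java idiom rather than anything specific to this patch; a minimal sketch (all names hypothetical):

```java
// Preferring composition (a private Thread wrapping an inner Runnable) over
// subclassing Thread: callers can no longer call Thread methods on the
// watcher directly, and the class fully controls its own lifecycle.
public class WatcherSketch {
    private volatile boolean closed = false;
    private final Thread worker = new Thread(new Loop(), "watcher");

    // The event loop lives in an inner Runnable instead of an overridden
    // Thread.run(), which is the more idiomatic shape suggested above.
    private class Loop implements Runnable {
        @Override
        public void run() {
            while (!closed) {
                // poll file descriptors and dispatch handlers here ...
                try { Thread.sleep(1); } catch (InterruptedException e) { return; }
            }
        }
    }

    public void start() { worker.start(); }

    public void close() throws InterruptedException {
        closed = true;
        worker.join();
    }

    public static void main(String[] args) throws Exception {
        WatcherSketch w = new WatcherSketch();
        w.start();
        w.close();
        System.out.println("closed cleanly");
    }
}
```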
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883685#comment-13883685 ] Arpit Agarwal commented on HDFS-5776: - {quote} Yes, that would be perfect sometimes, but not works for HBase scenario(the above Stack's consideration is great), since we made the pool "static", and per client view, it's more flexible if we provide instance level disable/enable APIs, so we can archive to use the hbase shell script to control the switch per dfs client instance, that'll be cooler {quote} Okay. {quote} In actualGetFromOneDatanode(), the refetchToken/refetchEncryptionKey is initialized outside the while (true) loop (see Line 993-996), when we hit InvalidEncryptionKeyException/InvalidBlockTokenException, the refetchToken and refetchEncryptionKey will be decreased by 1, (see refetchEncryptionKey-- and refetchToken-- statement), if the exceptions happened again, the check conditions will be failed definitely(see "e instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0" and "refetchToken > 0"), so go to the else clause, that'll execute: {quote} Isn't the call to {{actualGetFromOneDataNode}} wrapped in a loop itself? I am talking about the while loop in {{fetchBlockByteRange}}. Will that not change the behavior? Maybe it is harmless, I am not sure. I just want us to be clear either way. Thanks for adding the thread count limit. If we need more than 128 threads per client process just for backup reads we (hdfs) need to think about proper async rpc. Suggesting a lack of limits ignores the point that it can double the DN load on an already loaded cluster. Also 1ms lower bound for the delay is as good as zero but as long as we have a thread count limit I am okay. Minor points that don't need to hold up the checkin: # The test looks like a stress test, i.e. we are hoping that some of the hedged requests will complete before the primary requests. 
We can create a separate Jira to write a deterministic unit test and it’s fine if someone else picks that up later. # A couple of points from my initial feedback (#10, #12) were missed but again not worth holding the checkin. Other than clarifying the loop behavior the v9 patch looks fine to me. Thanks again for working with the feedback Liang, this is a nice capability to have in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
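The refetch behavior being discussed, with counters initialized outside the retry loop so each recovery action fires at most once, can be reduced to a small sketch (the exception and variable names are simplified stand-ins, not the actual DFSInputStream code):

```java
// One-shot "refetch" retry: the counter lives outside the while loop, so
// refetching the encryption key (or block token) is attempted at most once
// per call. A second occurrence of the same exception falls through to the
// failure path instead of retrying forever.
public class RefetchRetrySketch {
    static class InvalidKeyEx extends RuntimeException {}

    static int attempts = 0;

    static String readOnce() {
        attempts++;
        if (attempts < 3) throw new InvalidKeyEx(); // fails on the first two calls
        return "ok";
    }

    static String readWithRefetch() {
        int refetchKey = 1;            // initialized outside the loop
        while (true) {
            try {
                return readOnce();
            } catch (InvalidKeyEx e) {
                if (refetchKey > 0) {
                    refetchKey--;      // recover once: refetch the key, retry
                } else {
                    return "failed";   // second hit: give up
                }
            }
        }
    }

    public static void main(String[] args) {
        // readOnce fails twice, but only one refetch is allowed, so the
        // second InvalidKeyEx aborts the read.
        System.out.println(readWithRefetch());
    }
}
```

If an outer loop (as in fetchBlockByteRange) calls this again, the counters reset, which is exactly the interaction the comment above asks to clarify.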
[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5810: --- Attachment: HDFS-5810.004.patch > Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
[ https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883669#comment-13883669 ] Colin Patrick McCabe commented on HDFS-5730: Does anyone have a strong opinion about this approach? If not, I will review this in detail later this week. > Inconsistent Audit logging for HDFS APIs > > > Key: HDFS-5730 > URL: https://issues.apache.org/jira/browse/HDFS-5730 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-5730.patch, HDFS-5730.patch > > > When looking at the audit logs in HDFS, I am seeing some inconsistencies > between what was logged with audit and what was added recently. > For more details please check the comments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883666#comment-13883666 ] Jing Zhao commented on HDFS-5776: - Thanks for the work [~xieliang007]! I will review your latest patch and give my comments tonight (PST). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-5776: Attachment: HDFS-5776-v10.txt Patch v10 removes the hard-coded limit per Colin's comments; patch v9 has the hard-coded limit. Any more comments or +1? Personally I'd like to let the first cut go to trunk and branch-2 asap, so I can kick off the HBase side change. More detailed disagreements could be resolved in future JIRAs, right? And since the default pool size is 0, there is no obvious foreseeable functional or performance impact on existing downstream applications. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883650#comment-13883650 ] Hadoop QA commented on HDFS-5842: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625416/HADOOP-10215.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestHttpsFileSystem org.apache.hadoop.hdfs.server.namenode.TestNameNodeHttpServer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5955//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5955//console This message is automatically generated. 
> Cannot create hftp filesystem when using a proxy user ugi and a doAs on a > secure cluster > > > Key: HDFS-5842 > URL: https://issues.apache.org/jira/browse/HDFS-5842 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.2.0 >Reporter: Arpit Gupta >Assignee: Jing Zhao > Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, > HADOOP-10215.002.patch, HADOOP-10215.002.patch > > > Noticed this while debugging issues in another application. We saw an error > when trying to do a FileSystem.get using an hftp file system on a secure > cluster using a proxy user ugi. > This is the small snippet used: > {code} > FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() { > @Override > public FileSystem run() throws IOException { > return FileSystem.get(hadoopConf); > } > }); > {code} > The same code worked for hdfs and webhdfs but not for hftp when the ugi used > was UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
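For context on the typed-action shape in the snippet above: the generic parameter on PrivilegedExceptionAction is what lets doAs return the FileSystem directly instead of a raw Object. A dependency-free sketch of that shape, with a stand-in runAs playing the role of ugi.doAs (the user name and return value are placeholders; real code would use UserGroupInformation, which needs a Hadoop classpath):

```java
import java.security.PrivilegedExceptionAction;

// Sketch of the doAs call shape. The generic parameter T flows from the
// action's run() method to the caller, so no cast is needed on the result.
public class DoAsSketch {
    // Stand-in for ugi.doAs(action): a real UGI would swap in the Subject
    // and credentials of `user` before running the action.
    static <T> T runAs(String user, PrivilegedExceptionAction<T> action) throws Exception {
        return action.run();
    }

    public static void main(String[] args) throws Exception {
        // Plays the role of FileSystem.get(hadoopConf) in the snippet above.
        String result = runAs("enduser", () -> "filesystem-for-enduser");
        System.out.println(result);
    }
}
```

With a raw (ungenerified) PrivilegedExceptionAction, as in the snippet before the fix, doAs would return Object and the assignment to FileSystem would not compile without a cast.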
[jira] [Commented] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883627#comment-13883627 ] Tsz Wo (Nicholas), SZE commented on HDFS-5835: -- Thanks Arpit and Jing for reviewing the patch. # Yes. See also #2 below. # Suppose NN1 is active and NN2 is standby. NN2 will be updated first. Then NN1 will fail over to NN2. And then NN1 will be updated. # The SBN should do checkpoints only before the update marker. I have added tests for the cases above. > Add a new option for starting standby NN when rolling upgrade is in progress > > > Key: HDFS-5835 > URL: https://issues.apache.org/jira/browse/HDFS-5835 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h5835_20130127.patch > > > When rolling upgrade is already in-progress and the standby NN is not yet > started up, a new startup option is needed for the standby NN to initialize > the upgrade status. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883610#comment-13883610 ] Hadoop QA commented on HDFS-5833: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625248/HDFS-5833-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDistributedFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5954//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5954//console This message is automatically generated. 
> SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode have an incorrect java doc, actually the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883580#comment-13883580 ] Haohui Mai commented on HDFS-5698: -- I took an fsimage from a production cluster, and scaled it down to different sizes to evaluate the performance and the size impact. I ran the test on a machine that has an 8-core Xeon E5530 CPU @ 2.4GHz, 24G memory, and a 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 6.2, Java 1.6. The JVM has a maximum heap size of 20G, and it runs the concurrent mark and sweep GC. Here are the numbers: |Size in Old|512M|1G|2G|4G|8G| |Size in PB|469M|950M|1.9G|3.7G|7.0G| |Saving in Old (ms)|14678|28991|60520|96894|160878| |Saving in PB (ms)|14709|16746|32623|83645|168617| |Loading in Old (ms)|12819|24664|48240|114090|307689| |Loading in PB (ms)|28268|43205|87060|266681|491605| The first two rows show the size of the fsimage in the old and the new format respectively. The third and fourth rows show the time to save the fsimage in the two formats, and the last two rows show the time to load it. The new fsimage format is slightly more compact, and the code writes the new fsimage slightly faster. Currently the new fsimage format loads slower. However, in the new format most of the loading process can be parallelized. I plan to introduce this feature after the branch is merged. > Use protobuf to serialize / deserialize FSImage > --- > > Key: HDFS-5698 > URL: https://issues.apache.org/jira/browse/HDFS-5698 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch > > > Currently, the code serializes FSImage using in-house serialization > mechanisms. There are a couple of disadvantages to the current approach: > # Mixing the responsibility of reconstruction and serialization / > deserialization. 
The current code paths of serialization / deserialization > have spent a lot of effort on maintaining compatibility. What is worse is > that they are mixed with the complex logic of reconstructing the namespace, > making the code difficult to follow. > # Poor documentation of the current FSImage format. The format of the FSImage > is practically defined by the implementation. A bug in the implementation is > effectively a bug in the specification. Furthermore, it also makes writing third-party > tools quite difficult. > # Changing schemas is non-trivial. Adding a field to the FSImage requires bumping > the layout version every time. Bumping the layout version requires (1) > users to explicitly upgrade their clusters, and (2) adding new code to > maintain backward compatibility. > This jira proposes to use protobuf to serialize the FSImage. Protobuf has > been used to serialize / deserialize the RPC messages in Hadoop. > Protobuf addresses all of the above problems. It clearly separates the > responsibility of serialization from reconstructing the namespace. The > protobuf files document the current format of the FSImage. Developers can > now add optional fields with ease, since the old code can always read the new > FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
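The forward-compatibility claim, that old code can always read a new FSImage once optional fields are used, rests on readers skipping fields they do not recognize. A toy tag-length-value illustration of that principle (this is not protobuf's actual wire format, just the skipping idea):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy tag-length-value stream: a reader that skips unknown tags keeps
// working when a newer writer adds fields, which is the property protobuf's
// optional fields give the FSImage.
public class SkipUnknownFields {
    static byte[] write(boolean withNewField) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(1); out.writeInt(4); out.writeInt(42);       // tag 1: known field
        if (withNewField) {
            out.writeInt(99); out.writeInt(8); out.writeLong(7L); // tag 99: unknown to old readers
        }
        out.writeInt(0); out.writeInt(0);                          // end marker
        return bos.toByteArray();
    }

    static int readKnown(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int value = -1;
        while (true) {
            int tag = in.readInt(), len = in.readInt();
            if (tag == 0) break;
            if (tag == 1) value = in.readInt();
            else in.skipBytes(len); // unknown field: skip its payload, don't fail
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // The same "old" reader handles both the old and the extended image.
        System.out.println(readKnown(write(false)) + " " + readKnown(write(true)));
    }
}
```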
[jira] [Commented] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883558#comment-13883558 ] Jing Zhao commented on HDFS-5835: - +1, the patch looks good to me. Some questions not related to the patch: # So when we start the SBN, has the SBN already been upgraded? # Is it possible that an NN failover happens just as we start the SBN? Or that the other NN is in standby state at this time, and this NN becomes active in the end? In that case, may this "STARTED" option also be applied to the ANN? # If we allow the SBN to do checkpoints during the rolling upgrade, the SBN may not hit the upgrade marker in the editlog when it restarts. Thus the current document says we would disable checkpointing. But this may also cause issues if the time between "start" and "finalize" is long. Since we do not delete the old editlog and fsimage during checkpointing, an alternative would be to scan the editlog even across the fsimage? > Add a new option for starting standby NN when rolling upgrade is in progress > > > Key: HDFS-5835 > URL: https://issues.apache.org/jira/browse/HDFS-5835 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h5835_20130127.patch > > > When rolling upgrade is already in-progress and the standby NN is not yet > started up, a new startup option is needed for the standby NN to initialize > the upgrade status. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883548#comment-13883548 ] Hadoop QA commented on HDFS-5804: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625470/HDFS-5804.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.hdfs.nfs.nfs3.TestWrites org.apache.hadoop.hdfs.nfs.TestReaddir {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5956//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5956//console This message is automatically generated. 
> HDFS NFS Gateway fails to mount and proxy when using Kerberos
> -------------------------------------------------------------
>
> Key: HDFS-5804
> URL: https://issues.apache.org/jira/browse/HDFS-5804
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: nfs
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Abin Shahab
> Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch,
> HDFS-5804.patch, exception-as-root.log, javadoc-after-patch.log,
> javadoc-before-patch.log
>
> When using the HDFS NFS gateway with secure Hadoop
> (hadoop.security.authentication: kerberos), mounting HDFS fails.
> Additionally, there is no mechanism to support a proxy user (NFS needs to
> proxy as the user invoking commands on the HDFS mount).
> Steps to reproduce:
> 1) Start a Hadoop cluster with Kerberos enabled.
> 2) sudo su -l nfsserver and start an NFS server. This 'nfsserver' account
> has an account in Kerberos.
> 3) Get the keytab for nfsserver, and issue the following mount command:
> mount -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point
> 4) You'll see in the nfsserver logs that Kerberos is complaining about not
> having a TGT for root.
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.se
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883549#comment-13883549 ] Hadoop QA commented on HDFS-5841: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625443/hdfs-5841-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5953//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5953//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5953//console This message is automatically generated. 
> Update HDFS caching documentation with new changes > -- > > Key: HDFS-5841 > URL: https://issues.apache.org/jira/browse/HDFS-5841 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: caching > Attachments: hdfs-5841-1.patch > > > The caching documentation is a little out of date, since it's missing > description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883539#comment-13883539 ]

Brandon Li commented on HDFS-5767:
----------------------------------

Sounds good to me. I can review the patch once it's available. Thanks.

> Nfs implementation assumes userName userId mapping to be unique, which is not
> true sometimes
> ------------------------------------------------------------------------------
>
> Key: HDFS-5767
> URL: https://issues.apache.org/jira/browse/HDFS-5767
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nfs
> Affects Versions: 2.3.0
> Environment: With LDAP enabled
> Reporter: Yongjun Zhang
> Assignee: Brandon Li
>
> I'm seeing that the NFS implementation assumes a unique userName-userId pair
> to be returned by the command "getent passwd". That is, for a given userName,
> there should be a single userId, and for a given userId, there should be a
> single userName. The reason is explained in the following message:
> private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway
> can't start with duplicate name or id on the host system.\n"
> + "This is because HDFS (non-kerberos cluster) uses name as the only
> way to identify a user or group.\n"
> + "The host system with duplicated user/group name or id might work
> fine most of the time by itself.\n"
> + "However when NFS gateway talks to HDFS, HDFS accepts only user and
> group name.\n"
> + "Therefore, same name means the same user or same group. To find the
> duplicated names/ids, one can do:\n"
> + " and
> on Linux systems,\n"
> + " and PrimaryGroupID> on MacOS.";
> This requirement cannot be met sometimes (e.g. because of the use of LDAP).
> Let's do some examination.
> What exists in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName "bin" has userId "2", and "daemon" has userId
> "1".
>
> What we can see with the "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with
> different userIds, and the same userId could be associated with different
> userNames.
> So the assumption stated in the above DEBUG_INFO message cannot be met here.
> The DEBUG_INFO also states that HDFS uses name as the only way to identify
> a user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you
> please comment?
> Thanks.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
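The duplicate-mapping check that the DEBUG_INFO message above alludes to can be sketched as follows. This is an illustrative model, not the gateway's actual IdUserGroup code, and the sample records mirror the LDAP scenario from this report rather than querying a real host:

```python
from collections import defaultdict

def find_duplicates(passwd_lines):
    """Return (names mapped to >1 uid, uids mapped to >1 name)
    from passwd(5)-format lines such as "getent passwd" output."""
    uids_by_name = defaultdict(set)
    names_by_uid = defaultdict(set)
    for line in passwd_lines:
        fields = line.split(":")
        name, uid = fields[0], int(fields[2])
        uids_by_name[name].add(uid)
        names_by_uid[uid].add(name)
    dup_names = {n for n, ids in uids_by_name.items() if len(ids) > 1}
    dup_uids = {u for u, names in names_by_uid.items() if len(names) > 1}
    return dup_names, dup_uids

# Sample records mimicking "getent passwd" on the LDAP-enabled host above.
sample = [
    "bin:x:2:2:bin:/bin:/bin/sh",
    "bin:x:1:1:bin:/bin:/sbin/nologin",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
    "daemon:x:2:2:daemon:/sbin:/sbin/nologin",
]
dup_names, dup_uids = find_duplicates(sample)
print(dup_names, dup_uids)  # both "bin" and "daemon" map to uids 1 and 2
```

On a live system the input could come from something like `subprocess.check_output(["getent", "passwd"], text=True).splitlines()`; a non-empty result is roughly the condition under which the gateway refuses to start.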
[jira] [Commented] (HDFS-5835) Add a new option for starting standby NN when rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883512#comment-13883512 ]

Arpit Agarwal commented on HDFS-5835:
-------------------------------------

+1 for the patch.

> Add a new option for starting standby NN when rolling upgrade is in progress

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883511#comment-13883511 ] Hadoop QA commented on HDFS-5842: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625416/HADOOP-10215.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3480//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3480//console This message is automatically generated. 
> Cannot create hftp filesystem when using a proxy user ugi and a doAs on a > secure cluster > > > Key: HDFS-5842 > URL: https://issues.apache.org/jira/browse/HDFS-5842 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.2.0 >Reporter: Arpit Gupta >Assignee: Jing Zhao > Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, > HADOOP-10215.002.patch, HADOOP-10215.002.patch > > > Noticed this while debugging issues in another application. We saw an error > when trying to do a FileSystem.get using an hftp file system on a secure > cluster using a proxy user ugi. > This is a small snippet used > {code} > FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction() { > @Override > public FileSystem run() throws IOException { > return FileSystem.get(hadoopConf); > } > }); > {code} > The same code worked for hdfs and webhdfs but not for hftp when the ugi used > was UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883505#comment-13883505 ]

Yongjun Zhang commented on HDFS-5767:
-------------------------------------

Thanks Brandon. I assume you are OK with: "If you deem that the simplified solution to assume unique mapping (by ignoring duplicated same mapping) is sufficient, then we can go with the algorithm I listed at comment - 22/Jan/14 10:44." I can work out the solution if so.

> Nfs implementation assumes userName userId mapping to be unique, which is not
> true sometimes

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HDFS-5804:
------------------------------

Attachment: HDFS-5804.patch

This removes the isSecurityEnabled check.

> HDFS NFS Gateway fails to mount and proxy when using Kerberos
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883492#comment-13883492 ]

Suresh Srinivas commented on HDFS-5138:
---------------------------------------

bq. Finalize is actually rather easy, since it's idempotent.
Missed this. Agreed, finalize is idempotent (not sure how the code deals with failures; I have not had time to look into it). But not being able to finalize in some cases could be problematic, especially from a storage utilization point of view.

> Support HDFS upgrade in HA
> --------------------------
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.1.1-beta
> Reporter: Kihwal Lee
> Assignee: Aaron T. Myers
> Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> hdfs-5138-branch-2.txt
>
> With HA enabled, the NN won't start with "-upgrade". Since there has been a
> layout version change between 2.0.x and 2.1.x, starting the NN in upgrade
> mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But
> the only way to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is
> turned back on without involving DNs, things will work, but finalizeUpgrade
> won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs'
> upgrade snapshots won't get removed.
> We will need a different way of doing the layout upgrade and upgrade
> snapshot.
> I am marking this as a 2.1.1-beta blocker based on feedback from others. If
> there is a reasonable workaround that does not increase the maintenance
> window greatly, we can lower its priority from blocker to critical.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
[ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883493#comment-13883493 ] Todd Lipcon commented on HDFS-5790: --- Thanks for the analysis Kihwal. My logic was basically the same - glad to have it confirmed. Also, you're right - I'm pretty sure the "single writer" was NN_Recovery in the production case we saw as well, though it wasn't easy to verify (we don't appear to have any way to dump the LeaseManager state at runtime, which is a shame) I'll commit this in a day or two if no one has further comments. > LeaseManager.findPath is very slow when many leases need recovery > - > > Key: HDFS-5790 > URL: https://issues.apache.org/jira/browse/HDFS-5790 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, performance >Affects Versions: 2.4.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-5790.txt, hdfs-5790.txt > > > We recently saw an issue where the NN restarted while tens of thousands of > files were open. The NN then ended up spending multiple seconds for each > commitBlockSynchronization() call, spending most of its time inside > LeaseManager.findPath(). findPath currently works by looping over all files > held for a given writer, and traversing the filesystem for each one. This > takes way too long when tens of thousands of files are open by a single > writer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
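The cost difference described in this report can be sketched as follows. The class and method names are simplified stand-ins for illustration, not the actual LeaseManager code, and the indexed variant only shows the general shape of a path-indexed fix rather than what the attached patch does:

```python
class ScanningLeaseManager:
    """Models the slow findPath: scan every path held by a writer."""
    def __init__(self):
        self.paths_by_holder = {}  # holder name -> list of open file paths

    def add(self, holder, path):
        self.paths_by_holder.setdefault(holder, []).append(path)

    def find_path(self, holder, target):
        # O(number of files open by this holder) per lookup, so tens of
        # thousands of files open by one recovery holder make each call slow.
        for path in self.paths_by_holder.get(holder, []):
            if path == target:
                return path
        return None

class IndexedLeaseManager:
    """Keeping a per-holder set turns the same lookup into O(1) on average."""
    def __init__(self):
        self.paths_by_holder = {}  # holder name -> set of open file paths

    def add(self, holder, path):
        self.paths_by_holder.setdefault(holder, set()).add(path)

    def find_path(self, holder, target):
        paths = self.paths_by_holder.get(holder, set())
        return target if target in paths else None
```

With N files held by a single writer such as NN_Recovery, the scanning version performs up to N comparisons per lookup while the indexed version does a single hash probe, which is the shape of speedup an index over held paths would give.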
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883482#comment-13883482 ]

Brandon Li commented on HDFS-5767:
----------------------------------

Sorry for missing that question. The NFS gateway uses only one map containing the name-id mapping. Even if IdUserGroup is used on a different machine to get a different id or name, it can't pass it to the NFS gateway. Actually, with AUTH_UNIX as the current authentication method, the NFS client passes only the user id to the NFS gateway, and that is usually done by the kernel, not by the application.

> Nfs implementation assumes userName userId mapping to be unique, which is not
> true sometimes

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Moved] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao moved HADOOP-10215 to HDFS-5842:
------------------------------------------

Component/s: (was: security)
security
Affects Version/s: (was: 2.2.0)
2.2.0
Key: HDFS-5842 (was: HADOOP-10215)
Project: Hadoop HDFS (was: Hadoop Common)

> Cannot create hftp filesystem when using a proxy user ugi and a doAs on a
> secure cluster

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883415#comment-13883415 ]

Suresh Srinivas edited comment on HDFS-5138 at 1/27/14 10:51 PM:
-----------------------------------------------------------------

bq. The concern is about losing edit logs by overwriting a renamed directory with some contents, so by definition there will be some files in the directory being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either manually or by the storage recovery process
I do not think the JN performs recovery, based on the following code from JNStorage.java:
{code}
void analyzeStorage() throws IOException {
  this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
  if (state == StorageState.NORMAL) {
    readProperties(sd);
  }
}
{code}
For the JournalNode, StorageDirectory#doRecover() is not called. Is that correct? From my understanding, once it gets into this state, a JournalNode restart will not work?

was (Author: sureshms):
bq. The concern is about losing edit logs by overwriting a renamed directory with some contents, so by definition there will be some files in the directory being renamed to.
That makes sense. Thanks.

bq. The preupgrade and upgrade failure scenarios should both be handled either manually or by the storage recovery process
I do not think the JN performs recovery, based on the following code from JNStorage.java:
{code}
void analyzeStorage() throws IOException {
  this.state = sd.analyzeStorage(StartupOption.REGULAR, this);
  if (state == StorageState.NORMAL) {
    readProperties(sd);
  }
}
{code}
For the JournalNode, StorageDirectory#doRecover() is not called. Is that correct? From my understanding, once it gets into this state, the JournalNode should not start up?

> Support HDFS upgrade in HA

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883469#comment-13883469 ]

Hudson commented on HDFS-5830:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5048/])
HDFS-5830. WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster. (Yongjun Zhang via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561885)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java

> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when
> accessing another cluster.
> ----------------------------------------------------------------------------
>
> Key: HDFS-5830
> URL: https://issues.apache.org/jira/browse/HDFS-5830
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: caching, hdfs-client
> Affects Versions: 2.3.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5830.001.patch
>
> WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when
> accessing another cluster (that doesn't have caching support).
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
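The trace above comes down to a constructor precondition: converting a peer cluster's JSON (which carries no caching information) back into a LocatedBlock trips a null check on cachedLocs. A minimal sketch of the two behaviors, with hypothetical class and method names (the real class is org.apache.hadoop.hdfs.protocol.LocatedBlock, which the committed fix touches per the file list above); a lenient constructor simply treats a missing list as "no cached replicas":

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, heavily simplified sketch of the failure mode and the
// lenient alternative. Names are illustrative, not the real HDFS API.
class SketchLocatedBlock {
    private final List<String> cachedLocs;

    // Pre-fix behavior: a null cachedLocs list is a hard error.
    static SketchLocatedBlock strict(List<String> cachedLocs) {
        if (cachedLocs == null) {
            throw new IllegalArgumentException(
                "cachedLocs should not be null, use a different constructor");
        }
        return new SketchLocatedBlock(cachedLocs);
    }

    // Post-fix behavior: treat a missing list as "no cached replicas".
    static SketchLocatedBlock lenient(List<String> cachedLocs) {
        return new SketchLocatedBlock(
            cachedLocs == null ? Collections.<String>emptyList() : cachedLocs);
    }

    private SketchLocatedBlock(List<String> cachedLocs) {
        this.cachedLocs = cachedLocs;
    }

    int cachedReplicaCount() {
        return cachedLocs.size();
    }
}
```

The lenient factory keeps old clusters (whose WebHDFS responses predate caching) interoperable without the caller having to special-case missing fields.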
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883464#comment-13883464 ] Yongjun Zhang commented on HDFS-5830: - Thanks a lot Colin! I planned to take a look at the -1 javadoc thing. Will update later when I find something. > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. > > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). > java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883443#comment-13883443 ] Colin Patrick McCabe commented on HDFS-5781: Yeah, perhaps we could have a separate JIRA to use a static function rather than a static block. > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issues. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gaps between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
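The proposal above (array-based lookup that tolerates gaps, instead of Enum.values()/ordinal) can be sketched as follows. Op names and byte values here are hypothetical, not the real FSEditLogOpCodes constants; the static block is used for brevity, which is exactly the detail the comment above suggests moving to a static function in a follow-up:

```java
// Hedged sketch: map an opcode byte to its enum constant through a
// fixed-size array rather than Enum.values()[ordinal], so byte values may
// have gaps and ops can be backported out of order.
enum SketchOpCode {
    OP_ADD((byte) 0), OP_DELETE((byte) 2), OP_MKDIR((byte) 3), OP_RENAME((byte) 15);

    private static final SketchOpCode[] BY_VALUE = new SketchOpCode[Byte.MAX_VALUE + 1];
    static {
        for (SketchOpCode op : values()) {
            BY_VALUE[op.value] = op;   // unused slots stay null: gaps are legal
        }
    }

    private final byte value;

    SketchOpCode(byte value) { this.value = value; }

    byte getValue() { return value; }

    // Returns null for a byte value no known op uses.
    static SketchOpCode fromByte(byte b) {
        return (b < 0 || b >= BY_VALUE.length) ? null : BY_VALUE[b];
    }
}
```

With this shape, adding OP_RENAME with byte value 15 to a branch that never received the ops at 4 through 14 changes nothing about lookup correctness.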
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883436#comment-13883436 ] Hadoop QA commented on HDFS-5698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625413/HDFS-5698.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5952//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5952//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5952//console This message is automatically generated. 
> Use protobuf to serialize / deserialize FSImage > --- > > Key: HDFS-5698 > URL: https://issues.apache.org/jira/browse/HDFS-5698 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch > > > Currently, the code serializes FSImage using in-house serialization > mechanisms. There are a couple of disadvantages of the current approach: > # Mixing the responsibility of reconstruction and serialization / > deserialization. The current code paths of serialization / deserialization > have spent a lot of effort on maintaining compatibility. What is worse is > that they are mixed with the complex logic of reconstructing the namespace, > making the code difficult to follow. > # Poor documentation of the current FSImage format. The format of the FSImage > is practically defined by the implementation. A bug in the implementation means > a bug in the specification. Furthermore, it also makes writing third-party > tools quite difficult. > # Changing schemas is non-trivial. Adding a field in FSImage requires bumping > the layout version every time. Bumping the layout version requires (1) the > users to explicitly upgrade the clusters, and (2) putting new code to > maintain backward compatibility. > This jira proposes to use protobuf to serialize the FSImage. Protobuf has > been used to serialize / deserialize the RPC message in Hadoop. > Protobuf addresses all the above problems. It clearly separates the > responsibility of serialization and reconstructing the namespace. The > protobuf files document the current format of the FSImage. The developers now > can add optional fields with ease, since the old code can always read the new > FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5833: Affects Version/s: (was: trunk-win) 3.0.0 Status: Patch Available (was: Open) > SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode has an incorrect javadoc; actually, the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5830: --- Resolution: Fixed Fix Version/s: 2.3.0 Status: Resolved (was: Patch Available) > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. > > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). > java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883429#comment-13883429 ] Colin Patrick McCabe commented on HDFS-5830: release audit warning is a pid file-- not relevant. committing. > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. > > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). > java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883415#comment-13883415 ] Suresh Srinivas edited comment on HDFS-5138 at 1/27/14 10:09 PM: - bq. The concern is about losing edit logs by overwriting a renamed directory with some contents, so by definition there will be some files in the directory being renamed to. That makes sense. Thanks. bq. The preupgrade and upgrade failure scenarios should both be handled either manually or by the storage recovery process I do not think JN performs recovery, based on the following code from JNStorage.java {code} void analyzeStorage() throws IOException { this.state = sd.analyzeStorage(StartupOption.REGULAR, this); if (state == StorageState.NORMAL) { readProperties(sd); } } {code} For JournalNode, StorageDirectory#doRecover() is not called. Is that correct? From my understanding, once it gets into this state, JournalNode should not startup? was (Author: sureshms): bq. The concern is about losing edit logs by overwriting a renamed directory with some contents, so by definition there will be some files in the directory being renamed to. That makes sense. Thanks. bq. The preupgrade and upgrade failure scenarios should both be handled either manually or by the storage recovery process I do not think JN performs recovery, based on the following code from JNStorage.java {code} void analyzeStorage() throws IOException { this.state = sd.analyzeStorage(StartupOption.REGULAR, this); if (state == StorageState.NORMAL) { readProperties(sd); } } {code} For JournalNode, node call StorageDirectory#doRecover(). Is that correct? > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. 
Myers >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need a different way of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883415#comment-13883415 ] Suresh Srinivas commented on HDFS-5138: --- bq. The concern is about losing edit logs by overwriting a renamed directory with some contents, so by definition there will be some files in the directory being renamed to. That makes sense. Thanks. bq. The preupgrade and upgrade failure scenarios should both be handled either manually or by the storage recovery process I do not think JN performs recovery, based on the following code from JNStorage.java {code} void analyzeStorage() throws IOException { this.state = sd.analyzeStorage(StartupOption.REGULAR, this); if (state == StorageState.NORMAL) { readProperties(sd); } } {code} For JournalNode, StorageDirectory#doRecover() is not called. Is that correct? > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. Myers >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
[ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883409#comment-13883409 ] Kihwal Lee commented on HDFS-5790: -- I wondered why commitBlockSynchronization() sometimes takes long and this jira explains why. When the original lease holders disappear, the lease holders are changed to namenode for block recovery. So if a lot of files get abandoned at around the same time, the NN will be the writer with a large number of open files. The patch looks good. The paths managed by LeaseManager are supposed to be updated on deletions and renames, so there is no point in searching there when the reference to the inode is already known. For all user-initiated calls, the inode is obtained using the user-supplied path and then checkLease() is called before calling findPath(). So if something is to fail in findPath(), it should fail earlier in the code path. The patch seems fine in terms of both consistency and correctness. +1 > LeaseManager.findPath is very slow when many leases need recovery > - > > Key: HDFS-5790 > URL: https://issues.apache.org/jira/browse/HDFS-5790 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, performance >Affects Versions: 2.4.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-5790.txt, hdfs-5790.txt > > > We recently saw an issue where the NN restarted while tens of thousands of > files were open. The NN then ended up spending multiple seconds for each > commitBlockSynchronization() call, spending most of its time inside > LeaseManager.findPath(). findPath currently works by looping over all files > held for a given writer, and traversing the filesystem for each one. This > takes way too long when tens of thousands of files are open by a single > writer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
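The cost profile described above (loop over every file held by one writer, walking the tree for each) versus a direct lookup can be sketched as follows. This is a hypothetical, heavily simplified model of the general idea of avoiding per-call traversal, not the actual HDFS-5790 patch; names and the use of a file id as the key are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hedged sketch: one holder (e.g. the NameNode itself after lease recovery
// starts) can own tens of thousands of open files, so answering "which lease
// path matches this file?" by scanning every path the holder owns is
// O(open files) per call. A direct index keyed by file id is O(1).
class SketchLeaseManager {
    private final Map<String, Set<Long>> filesByHolder = new HashMap<>();
    private final Map<Long, String> pathByFileId = new HashMap<>();

    void addLease(String holder, long fileId, String path) {
        filesByHolder.computeIfAbsent(holder, h -> new HashSet<>()).add(fileId);
        pathByFileId.put(fileId, path);  // kept consistent on rename/delete
    }

    // O(1): use the identifier the caller already has, instead of scanning
    // filesByHolder.get(holder) and walking the namespace for each entry.
    String findPath(long fileId) {
        return pathByFileId.get(fileId);
    }
}
```

The invariant the comment above relies on is the same one the sketch needs: the index must be updated on deletions and renames, so searching is redundant once the reference is known.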
[jira] [Commented] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883397#comment-13883397 ] Hudson commented on HDFS-5297: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5047 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5047/]) HDFS-5297. Fix dead links in HDFS site documents. (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561849) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsEditsViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsImageViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ShortCircuitLocalReads.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm > Fix dead links in HDFS site documents > - > > Key: HDFS-5297 > URL: https://issues.apache.org/jira/browse/HDFS-5297 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5297.patch > > > I found a lot of broken hyperlinks in HDFS document to be fixed. > Ex.) 
> In HdfsUserGuide.apt.vm, there is a broken hyperlink as below > {noformat} >For command usage, see {{{dfsadmin}}}. > {noformat} > It should be fixed to > {noformat} >For command usage, see > {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883394#comment-13883394 ] Aaron T. Myers commented on HDFS-5138: -- bq. Aaron T. Myers, we talked about this last on Friday Jan 16th over the phone, right? I did tell you about the JournalNode potentially losing editlogs. There must have been some misunderstanding because I'm pretty sure I told you that I didn't think that was possible. :) Anyway, see below... bq. Is that correct? Did you check it? Java File#renameTo() is platform dependent. The following code always renames the directories (on my Mac): I did, at least on Linux. In the code example you have above, try putting a child file or directory under the directory f2 and see if it still works. The concern is about losing edit logs by overwriting a renamed directory with some contents, so by definition there will be some files in the directory being renamed to. bq. Related question. Let's say even if the rename fails, how does user recover from that condition? I brought up several scenarios related to that in preupgrade, upgrade, and finalize. How do we handle finalize being done successfully on one namenode and not the other? Finalize is actually rather easy, since it's idempotent. The preupgrade and upgrade failure scenarios should both be handled either manually or by the storage recovery process, which currently should happen on JN restart, but I agree could be improved. Let's continue discussion of this over on HDFS-5840. > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. 
Myers >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need a different way of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883395#comment-13883395 ] Yongjun Zhang commented on HDFS-5767: - Thanks [~brandonli], Duplicated entries of exactly the same mapping are easy to handle (we can simply ignore them, because they are the same) as we discussed earlier. See my earlier comment at - 21/Jan/14 16:07. If you deem that the simplified solution to assume unique mapping (by ignoring duplicated same mapping) is sufficient, then we can go with the algorithm I listed at comment - 22/Jan/14 10:44. I actually had a question for you in my comment at 21/Jan/14 16:07 above, and I'm putting it here again: "I'm asking another question here, I noticed that IdUserGroup class also provides API to getUserName of a given uid. I'm not sure whether this API will be called from a different machine with a different uid for the same user. If it does, then we might get the wrong user name back from this API. Say, userA is mapped to 1 in /etc/passwd, and 2 in ldap, we end up assigning mapping <userA, 1>, is it possible someone will call this API with "1", and expect userA?" Thanks. > Nfs implementation assumes userName userId mapping to be unique, which is not > true sometimes > > > Key: HDFS-5767 > URL: https://issues.apache.org/jira/browse/HDFS-5767 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.3.0 > Environment: With LDAP enabled >Reporter: Yongjun Zhang >Assignee: Brandon Li > > I'm seeing that the nfs implementation assumes a unique <userName, userId> pair > to be returned by command "getent passwd". That is, for a given userName, > there should be a single userId, and for a given userId, there should be a > single userName.
The reason is explained in the following message: > private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway > can't start with duplicate name or id on the host system.\n" > + "This is because HDFS (non-kerberos cluster) uses name as the only > way to identify a user or group.\n" > + "The host system with duplicated user/group name or id might work > fine most of the time by itself.\n" > + "However when NFS gateway talks to HDFS, HDFS accepts only user and > group name.\n" > + "Therefore, same name means the same user or same group. To find the > duplicated names/ids, one can do:\n" > + "<getent passwd | cut -d: -f1,3> and <getent group | cut -d: -f1,3> > on Linux systems,\n" > + "<dscl . -list /Users UniqueID> and <dscl . -list /Groups > PrimaryGroupID> on MacOS."; > This requirement cannot be met sometimes (e.g. because of the use of LDAP) > Let's do some examination: > What exist in /etc/passwd: > $ more /etc/passwd | grep ^bin > bin:x:2:2:bin:/bin:/bin/sh > $ more /etc/passwd | grep ^daemon > daemon:x:1:1:daemon:/usr/sbin:/bin/sh > The above result says userName "bin" has userId "2", and "daemon" has userId > "1". > > What we can see with "getent passwd" command due to LDAP: > $ getent passwd | grep ^bin > bin:x:2:2:bin:/bin:/bin/sh > bin:x:1:1:bin:/bin:/sbin/nologin > $ getent passwd | grep ^daemon > daemon:x:1:1:daemon:/usr/sbin:/bin/sh > daemon:x:2:2:daemon:/sbin:/sbin/nologin > We can see that there are multiple entries for the same userName with > different userIds, and the same userId could be associated with different > userNames. > So the assumption stated in the above DEBUG_INFO message cannot be met here. > The DEBUG_INFO also stated that HDFS uses name as the only way to identify > user/group. I'm filing this JIRA for a solution. > Hi [~brandonli], since you implemented most of the nfs feature, would you > please comment? > Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
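The dedup policy discussed in the comments above (ignore an entry that repeats exactly the same mapping, flag a real conflict) can be sketched as follows. This is a hypothetical illustration, not the IdUserGroup implementation; the class and method names are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: keep the first-seen name<->id pair, silently skip an entry
// that repeats exactly the same mapping, and report an entry that maps an
// already-seen name (or id) to a different id (or name) as a conflict.
class SketchIdMap {
    final Map<String, Integer> idByName = new HashMap<>();
    final Map<Integer, String> nameById = new HashMap<>();

    // Returns true if the entry was accepted or was an exact duplicate;
    // false if it conflicts with an earlier mapping.
    boolean add(String name, int id) {
        Integer knownId = idByName.get(name);
        String knownName = nameById.get(id);
        if (knownId == null && knownName == null) {
            idByName.put(name, id);
            nameById.put(id, name);
            return true;
        }
        // Exact duplicate of an earlier entry: safe to ignore.
        return knownId != null && knownId == id && name.equals(knownName);
    }
}
```

Feeding the "getent passwd" output quoted above through this policy, the second bin line (bin:1) and the second daemon line (daemon:2) both come back as conflicts, which is precisely the LDAP situation the issue describes.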
[jira] [Resolved] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-5838. - Resolution: Not A Problem The setup and teardown functions that run before and after the tests respectively happen to solve the problem. > TestCacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion fail when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5838: Status: Open (was: Patch Available) > TestCacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion fail when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5838: Summary: TestCacheDirectives#testCreateAndModifyPools fails (was: TestcacheDirectives#testCreateAndModifyPools fails) > TestCacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion fail when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883368#comment-13883368 ] Suresh Srinivas commented on HDFS-5138: --- {quote} Hi Suresh, it's obviously fine that you're busy (we all are) but in the future please just let me know that you intend to review it and that we should hold off for committing it for a bit. I reached out to you more than once last week to ask about a review timeline and never heard back from you, so I asked Todd to commit it (I'm traveling at the moment) given the silence. {quote} [~atm], we talked about this last on Friday Jan 16th over the phone, right? I did tell you then about the JournalNode potentially losing editlogs. bq. This scenario isn't possible as you described because either the pre-upgrade or upgrade stages (depending upon when the original failure happened) will fail to rename the dir if it already exists. Is that correct? Did you check it? Java File#renameTo() is platform dependent. The following code always renames the directories (on my Mac):
{code}
public static void main(String[] args) {
  File f1 = new File("/tmp/dir1");
  File f2 = new File("/tmp/dir2");
  f1.mkdir();
  f2.mkdir();
  System.out.println(f1 + (f1.exists() ? " exists" : " does not exist"));
  System.out.println(f2 + (f2.exists() ? " exists" : " does not exist"));
  f1.renameTo(f2);
  System.out.println("Renamed " + f1 + " to " + f2);
  System.out.println(f1 + (f1.exists() ? " exists" : " does not exist"));
  System.out.println(f2 + (f2.exists() ? " exists" : " does not exist"));
}
{code}
A related question: let's say the rename fails; how does the user recover from that condition? I brought up several scenarios related to that in preupgrade, upgrade, and finalize. How do we handle finalize being done successfully on one namenode and not the other? 
> Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. Myers >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need different ways of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
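For comparison with Suresh's File#renameTo example above (a sketch, not part of the patch under discussion): java.nio.file.Files.move has a specified failure mode that File#renameTo lacks — without REPLACE_EXISTING it throws FileAlreadyExistsException when the destination exists, so a rename collision of the kind being debated surfaces deterministically on every platform.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeRename {
    public static void main(String[] args) throws IOException {
        Path dir1 = Files.createTempDirectory("dir1");
        Path dir2 = Files.createTempDirectory("dir2");
        try {
            // Unlike File#renameTo (whose behavior on an existing target is
            // platform dependent), Files.move without REPLACE_EXISTING is
            // required to fail when the target already exists.
            Files.move(dir1, dir2);
            System.out.println("renamed");
        } catch (FileAlreadyExistsException e) {
            System.out.println("target exists, rename refused");
        }
    }
}
```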
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883348#comment-13883348 ] Brandon Li commented on HDFS-5767: -- Thanks, [~yzhangal]. I think multi-mapping (e.g., test1->502, test2->502, test1->503) is in most cases an error. In that case, the NFS gateway can fail to start. A completely duplicated mapping is not uncommon, and the dup can just be ignored by NFS. One example I saw before is that the same user account was configured with the same id twice, on both LDAP and the local node (/etc/passwd). Then "getent passwd" could give the same mapping twice (e.g., test1->502, test1->502). > Nfs implementation assumes userName userId mapping to be unique, which is not > true sometimes > > > Key: HDFS-5767 > URL: https://issues.apache.org/jira/browse/HDFS-5767 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.3.0 > Environment: With LDAP enabled >Reporter: Yongjun Zhang >Assignee: Brandon Li > > I'm seeing that the nfs implementation assumes unique pair > to be returned by command "getent passwd". That is, for a given userName, > there should be a single userId, and for a given userId, there should be a > single userName. The reason is explained in the following message: > private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway > can't start with duplicate name or id on the host system.\n" > + "This is because HDFS (non-kerberos cluster) uses name as the only > way to identify a user or group.\n" > + "The host system with duplicated user/group name or id might work > fine most of the time by itself.\n" > + "However when NFS gateway talks to HDFS, HDFS accepts only user and > group name.\n" > + "Therefore, same name means the same user or same group. To find the > duplicated names/ids, one can do:\n" > + " and > on Linux systems,\n" > + " and PrimaryGroupID> on MacOS."; > This requirement cannot be met sometimes (e.g. 
because of the use of LDAP) > Let's do some examination: > What exists in /etc/passwd: > $ more /etc/passwd | grep ^bin > bin:x:2:2:bin:/bin:/bin/sh > $ more /etc/passwd | grep ^daemon > daemon:x:1:1:daemon:/usr/sbin:/bin/sh > The above result says userName "bin" has userId "2", and "daemon" has userId > "1". > > What we can see with "getent passwd" command due to LDAP: > $ getent passwd | grep ^bin > bin:x:2:2:bin:/bin:/bin/sh > bin:x:1:1:bin:/bin:/sbin/nologin > $ getent passwd | grep ^daemon > daemon:x:1:1:daemon:/usr/sbin:/bin/sh > daemon:x:2:2:daemon:/sbin:/sbin/nologin > We can see that there are multiple entries for the same userName with > different userIds, and the same userId could be associated with different > userNames. > So the assumption stated in the above DEBUG_INFO message cannot be met here. > The DEBUG_INFO also stated that HDFS uses name as the only way to identify > user/group. I'm filing this JIRA for a solution. > Hi [~brandonli], since you implemented most of the nfs feature, would you > please comment? > Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
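Brandon's proposed policy above — tolerate exact duplicates from "getent passwd", fail fast on conflicting multi-mappings — can be sketched as follows (a hypothetical illustration; the class and its names are not the actual NFS gateway code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the duplicate-handling policy discussed here: an exactly
// duplicated line (e.g. the same account listed identically in LDAP and
// /etc/passwd) is ignored, while a conflicting mapping — same name with a
// different id, or the same id reused by a different name — is treated as
// a fatal configuration error, so the gateway would refuse to start.
public class PasswdMapCheck {
    static Map<String, Integer> nameToId = new HashMap<>();
    static Map<Integer, String> idToName = new HashMap<>();

    static void add(String name, int id) {
        Integer oldId = nameToId.get(name);
        String oldName = idToName.get(id);
        if (oldId != null && oldId == id) {
            return; // completely duplicated mapping: just ignore it
        }
        if (oldId != null || oldName != null) {
            throw new IllegalStateException(
                "conflicting mapping for " + name + " -> " + id);
        }
        nameToId.put(name, id);
        idToName.put(id, name);
    }

    public static void main(String[] args) {
        add("test1", 502);
        add("test1", 502); // exact duplicate: ignored
        try {
            add("test2", 502); // multi-mapping: fatal
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```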
[jira] [Updated] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5297: Resolution: Fixed Fix Version/s: 2.3.0 3.0.0 Target Version/s: 2.3.0 (was: 2.4.0) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Generated the site and verified that it fixes most of the broken links. There is one broken link in WebHDFS.apt.vm. {noformat}{{{RemoteException JSON Schema}}}{noformat} should be {noformat}{{RemoteException JSON Schema}}{noformat} It can be addressed in a separate Jira. I committed the patch to trunk, branch-2 and branch-2.3. Thanks for the contribution Akira-san! > Fix dead links in HDFS site documents > - > > Key: HDFS-5297 > URL: https://issues.apache.org/jira/browse/HDFS-5297 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5297.patch > > > I found a lot of broken hyperlinks in the HDFS documents to be fixed. > Ex.) > In HdfsUserGuide.apt.vm, there is a broken hyperlink as below > {noformat} >For command usage, see {{{dfsadmin}}}. > {noformat} > It should be fixed to > {noformat} >For command usage, see > {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883341#comment-13883341 ] Hadoop QA commented on HDFS-5838: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625405/HDFS-5838.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5951//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5951//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5951//console This message is automatically generated. > TestcacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion fail when it runs after > testBasicPoolOperations. 
> {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5841: -- Status: Patch Available (was: Open) > Update HDFS caching documentation with new changes > -- > > Key: HDFS-5841 > URL: https://issues.apache.org/jira/browse/HDFS-5841 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: caching > Attachments: hdfs-5841-1.patch > > > The caching documentation is a little out of date, since it's missing > description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5841: -- Attachment: hdfs-5841-1.patch Patch attached, I also took the opportunity to reorg some of the content (hopefully for the better). The diff is kind of hard to review, just looking at it via {{mvn site:site}} is probably easiest. > Update HDFS caching documentation with new changes > -- > > Key: HDFS-5841 > URL: https://issues.apache.org/jira/browse/HDFS-5841 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: caching > Attachments: hdfs-5841-1.patch > > > The caching documentation is a little out of date, since it's missing > description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5841) Update HDFS caching documentation with new changes
Andrew Wang created HDFS-5841: - Summary: Update HDFS caching documentation with new changes Key: HDFS-5841 URL: https://issues.apache.org/jira/browse/HDFS-5841 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang The caching documentation is a little out of date, since it's missing descriptions of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5297: Summary: Fix dead links in HDFS site documents (was: Fix dead links in HDFS document) > Fix dead links in HDFS site documents > - > > Key: HDFS-5297 > URL: https://issues.apache.org/jira/browse/HDFS-5297 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Attachments: HDFS-5297.patch > > > I found a lot of broken hyperlinks in the HDFS documents to be fixed. > Ex.) > In HdfsUserGuide.apt.vm, there is a broken hyperlink as below > {noformat} >For command usage, see {{{dfsadmin}}}. > {noformat} > It should be fixed to > {noformat} >For command usage, see > {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5138: - Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Incompatible change, Reviewed Status: Resolved (was: Patch Available) Resolving this with a fix version of 3.0.0. > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. Myers >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need different ways of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883289#comment-13883289 ] Aaron T. Myers commented on HDFS-5138: -- Hi Suresh, it's obviously fine that you're busy (we all are) but in the future please just let me know that you intend to review it and that we should hold off for committing it for a bit. I reached out to you more than once last week to ask about a review timeline and never heard back from you, so I asked Todd to commit it (I'm traveling at the moment) given the silence. bq. I had brought up one issue about potentially losing editlogs on JournalNode. This scenario isn't possible as you described because either the pre-upgrade or upgrade stages (depending upon when the original failure happened) will fail to rename the dir if it already exists. That said, your points about improving the documentation and the recovery procedure in the event of partial failure of the upgrade are well taken and certainly worth addressing. Upon looking at it further, I also think we should change a few of the assertions in the code to be actual exceptions, since we shouldn't have to be running with assertions enabled to check these error conditions, which should harden all of these code paths a bit more. bq. please address the comments before merging to branch-2. OK, I've filed HDFS-5840 to address your latest comments. Please follow that JIRA and review it as promptly as you can. I'm going to resolve this JIRA for now with a fix version of 3.0.0 and will merge both JIRAs to branch-2 when HDFS-5840 is completed. > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. 
Myers >Priority: Blocker > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need different ways of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
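The assertion-to-exception hardening Aaron mentions above amounts to a pattern like the following (a sketch with assumed names, not the committed change): a plain `assert` only fires when the JVM runs with `-ea`, while an explicit check always runs and surfaces through the normal IOException path.

```java
import java.io.IOException;

// Sketch: replace "assert actual == expected;" (silently skipped unless
// assertions are enabled) with an unconditional check that reports the
// error the same way other NN storage failures are reported.
public class HardenedCheck {
    static void checkCTime(long actual, long expected) throws IOException {
        if (actual != expected) {
            throw new IOException("Inconsistent CTime: expected " + expected
                + " but found " + actual);
        }
    }
}
```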
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883273#comment-13883273 ] Aaron T. Myers commented on HDFS-5840: -- From Suresh: I am adding information about the design, the way I understand it. Let me know if I got it wrong. *Upgrade preparation:* # New bits are installed on the cluster nodes. # The cluster is brought down. *Upgrade:* For HA setup, choose one of the namenodes to initiate upgrade on and start it with the -upgrade flag. # NN performs preupgrade for all non shared storage directories by moving current to previous.tmp and creating new current. #* Failure here is fine. NN startup fails. On the next upgrade attempt, the storage directories are recovered. # NN performs preupgrade of shared edits (NFS/JournalNodes) over RPC. JournalNodes current moved to previous.tmp and new current is created. #* If one of the JN preupgrade fails and upgrade is reattempted, editlog directory could be lost on the JN. Restarting the JN does not fix the issue. # NN performs upgrade of non shared edits by writing new CTIME to current and moving previous.tmp to previous. #* If one of the JN preupgrade fails and upgrade is reattempted, editlog directory could be lost on the JN. Restarting the JN does not fix the issue. # NN performs upgrade of shared edits (NFS/JournalNodes) over RPC. JournalNodes current has new CTIME and previous.tmp is moved to previous. # We need to document that all the JournalNodes must be up. If a JN is irrecoverably lost, configuration must be changed to exclude the JN. *Rollback:* NN is started with the -rollback flag # For all the non shared directories, the NN checks for canRollBack, essentially ensures that previous directory with the right layout version exists. # For all the shared directories, the NN checks for canRollBack, essentially ensures that previous directory with the right layout version exists. 
# NN performs rollback for shared directories (moving previous to current) #* If rollback of one of the JN fails, then directories are in inconsistent state. I think any attempt at retrying rollback will fail and will require manually moving files around. I do not think restarting JN fixes this. # We need to document that all the JournalNodes must be up. If a JN is irrecoverably lost, configuration must be changed to exclude the JN. *Finalize:* DFSAdmin command is run to finalize the upgrade. # Active NN performs finalizing of editlog. If JN's fail to finalize, active NN fails to finalize. However it is possible that standby finalizes, leaving the cluster in an inconsistent state. # We need to document that all the JournalNodes must be up. If a JN is irrecoverably lost, configuration must be changed to exclude the JN. Comments on the code in the patch (this is almost complete): Comments: # Minor nit: there are some white space changes # assertAllResultsEqual - for loop can just start with i = 1? Also if the collection objects is of size zero or one, the method can return early. Is there a need to do object.toArray() for these early checks? With that, perhaps the findbugs exclude may not be necessary. # Unit test can be added for methods isAtLeastOneActive, getRpcAddressesForNameserviceId and getProxiesForAllNameNodesInNameservice (I am okay if this is done in a separate jira) # Finalizing upgrade is quite tricky. Consider the following scenarios: #* One NN is active and the other is standby - works fine #* One NN is active and the other is down or all NNs - finalize command throws exception and the user will not know if it has succeeded or failed and what to do next #* No active NN - throws an exception cannot finalize with no active #* BlockPoolSliceStorage.java change seems unnecessary # Why is {{throw new AssertionError("Unreachable code.");}} in QuorumJournalManager.java methods? 
# FSImage#doRollBack() - when canRollBack is false after checking if non-share directories can rollback, an exception must be immediately thrown, instead of checking shared editlog. Also printing Log.info when storages can be rolled back will help in debugging. # FSEditlog#canRollBackSharedLog should accept StorageInfo instead of Storage # QuorumJournalManager#canRollBack and getJournalCTime can throw AssertionError (from DFSUtil.assertAllResultsEqual()). Is that the right exception to expose or IOException? # Namenode startup throws AssertionError with -rollback option. I think we should throw IOException, which is how all the other failures are indicated. > Follow-up to HDFS-5138 to improve error handling during partial upgrade > failures > > > Key: HDFS-5840 > URL: https://issues.apache.org/jira/browse/HDFS-5840 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.
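The assertAllResultsEqual suggestions in the review above — return early for zero or one results (avoiding the toArray()), and start the comparison loop at 1 — can be folded into a small sketch of the helper's shape (assumed signature; not the actual DFSUtil code):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Objects;

// Sketch of assertAllResultsEqual incorporating the review suggestions:
// short-circuit collections of size 0 or 1, then compare every element
// against the first, with the loop starting at index 1.
public class AllEqual {
    public static void assertAllResultsEqual(Collection<?> objects) {
        if (objects.size() <= 1) {
            return; // nothing to compare, and no toArray() needed
        }
        Object[] results = objects.toArray();
        for (int i = 1; i < results.length; i++) {
            if (!Objects.equals(results[0], results[i])) {
                throw new AssertionError("Not all results are equal: "
                    + Arrays.toString(results));
            }
        }
    }
}
```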
[jira] [Created] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
Aaron T. Myers created HDFS-5840: Summary: Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 3.0.0 Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.
[ https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Jose updated HDFS-5608: -- Attachment: HDFS-5608.4.patch Addressed the above review comments and added end-to-end test cases for webhdfs, jsonutils, and AclPermissionParam. Please review the same. > WebHDFS: implement GETACLSTATUS and SETACL. > --- > > Key: HDFS-5608 > URL: https://issues.apache.org/jira/browse/HDFS-5608 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Sachin Jose > Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, > HDFS-5608.3.patch, HDFS-5608.4.patch > > > Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883263#comment-13883263 ] Hadoop QA commented on HDFS-5781: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12624851/HDFS-5781.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5950//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5950//console This message is automatically generated. 
> Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
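The array-based mapping proposed here can be sketched like this (opcode names and byte values are made up for illustration; the real FSEditLogOpCodes set differs): lookup stays O(1), as with Enum.ordinal(), but holes in the byte range are legal, so ops with larger byte values can be backported before ops with smaller ones.

```java
// Sketch of the proposed strategy: a 256-slot table indexed by the opcode
// byte, filled from each enum constant's declared byte value, with null
// slots for gaps. Unknown or gap byte values simply return null instead of
// relying on a dense, gap-free ordinal() numbering.
public class OpCodeTable {
    enum Op {
        OP_ADD((byte) 0), OP_DELETE((byte) 2), OP_MKDIR((byte) 3); // gap at 1

        final byte opCode;
        Op(byte opCode) { this.opCode = opCode; }
    }

    private static final Op[] BY_CODE = new Op[256];
    static {
        for (Op op : Op.values()) {
            BY_CODE[op.opCode & 0xFF] = op;
        }
    }

    static Op fromByte(byte b) {
        return BY_CODE[b & 0xFF]; // null for an unknown/gap byte value
    }
}
```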
[jira] [Updated] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.
[ https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Jose updated HDFS-5608: -- Attachment: (was: HDFS-5608.4.patch) > WebHDFS: implement GETACLSTATUS and SETACL. > --- > > Key: HDFS-5608 > URL: https://issues.apache.org/jira/browse/HDFS-5608 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Sachin Jose > Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, > HDFS-5608.3.patch > > > Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.
[ https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Jose updated HDFS-5608: -- Attachment: HDFS-5608.4.patch > WebHDFS: implement GETACLSTATUS and SETACL. > --- > > Key: HDFS-5608 > URL: https://issues.apache.org/jira/browse/HDFS-5608 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Sachin Jose > Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, > HDFS-5608.3.patch, HDFS-5608.4.patch > > > Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883236#comment-13883236 ] Daryn Sharp commented on HDFS-4564: --- Ok, will split. > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
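The semantics being corrected here — 401 means the request was not authenticated, 403 means it was authenticated but the operation is denied — can be sketched as a status mapper (the exception types below are stand-ins for illustration; the real WebHDFS code maps its own exceptions, such as AccessControlException, not these):

```java
// Sketch of the 401-vs-403 distinction: authentication failures map to
// 401 Unauthorized, authorization (access-control) failures to 403 Forbidden.
public class StatusMapper {
    static int statusFor(Exception e) {
        if (e instanceof SecurityException) {      // stand-in for an authentication failure
            return 401; // Unauthorized: credentials missing or invalid
        }
        if (e instanceof IllegalAccessException) { // stand-in for an access-control denial
            return 403; // Forbidden: authenticated, but the operation is denied
        }
        return 500;
    }
}
```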
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883234#comment-13883234 ] Yongjun Zhang commented on HDFS-5767: - Hi [~brandonli], What I suggested in my last update is a simplified solution for unique mapping. I don't have a solution that supports multi-mapping yet (I also put this aside a bit due to other stuff), but let me take a further look at that. BTW, for your info, http://linux.die.net/man/5/nsswitch.conf defines the search order as: "One or more service specifications e.g., "files", "db", or "nis". The order of the services on the line determines the order in which those services will be queried, in turn, until a result is found. " Thanks. > Nfs implementation assumes userName userId mapping to be unique, which is not > true sometimes > > > Key: HDFS-5767 > URL: https://issues.apache.org/jira/browse/HDFS-5767 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.3.0 > Environment: With LDAP enabled >Reporter: Yongjun Zhang >Assignee: Brandon Li > > I'm seeing that the nfs implementation assumes a unique userName/userId pair > to be returned by the command "getent passwd". That is, for a given userName, > there should be a single userId, and for a given userId, there should be a > single userName. The reason is explained in the following message: > private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway > can't start with duplicate name or id on the host system.\n" > + "This is because HDFS (non-kerberos cluster) uses name as the only > way to identify a user or group.\n" > + "The host system with duplicated user/group name or id might work > fine most of the time by itself.\n" > + "However when NFS gateway talks to HDFS, HDFS accepts only user and > group name.\n" > + "Therefore, same name means the same user or same group. 
To find the > duplicated names/ids, one can do:\n" > + " and > on Linux systems,\n" > + " and PrimaryGroupID> on MacOS."; > This requirement cannot always be met (e.g. because of the use of LDAP). > Let's do some examination: > What exists in /etc/passwd: > $ more /etc/passwd | grep ^bin > bin:x:2:2:bin:/bin:/bin/sh > $ more /etc/passwd | grep ^daemon > daemon:x:1:1:daemon:/usr/sbin:/bin/sh > The above result says userName "bin" has userId "2", and "daemon" has userId > "1". > > What we can see with the "getent passwd" command due to LDAP: > $ getent passwd | grep ^bin > bin:x:2:2:bin:/bin:/bin/sh > bin:x:1:1:bin:/bin:/sbin/nologin > $ getent passwd | grep ^daemon > daemon:x:1:1:daemon:/usr/sbin:/bin/sh > daemon:x:2:2:daemon:/sbin:/sbin/nologin > We can see that there are multiple entries for the same userName with > different userIds, and the same userId can be associated with different > userNames. > So the assumption stated in the above DEBUG_INFO message cannot be met here. > The DEBUG_INFO also states that HDFS uses name as the only way to identify a > user/group. I'm filing this JIRA for a solution. > Hi [~brandonli], since you implemented most of the nfs feature, would you > please comment? > Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
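The uniqueness check the DEBUG_INFO message alludes to can be sketched in plain Java: parse passwd-format lines and flag any name that maps to more than one id, or any id that maps to more than one name. This is only an illustrative sketch of the constraint being discussed, not the actual NFS gateway code; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateIdCheck {
    // Returns true if any user name maps to more than one uid,
    // or any uid maps to more than one user name.
    static boolean hasDuplicates(String[] passwdLines) {
        Map<String, String> nameToUid = new HashMap<>();
        Map<String, String> uidToName = new HashMap<>();
        for (String line : passwdLines) {
            // passwd format: name:x:uid:gid:gecos:home:shell
            String[] fields = line.split(":");
            String name = fields[0], uid = fields[2];
            String prevUid = nameToUid.put(name, uid);
            String prevName = uidToName.put(uid, name);
            if ((prevUid != null && !prevUid.equals(uid))
                    || (prevName != null && !prevName.equals(name))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Entries taken from the "getent passwd" output quoted above.
        String[] ldap = {
            "bin:x:2:2:bin:/bin:/bin/sh",
            "bin:x:1:1:bin:/bin:/sbin/nologin",
            "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
            "daemon:x:2:2:daemon:/sbin:/sbin/nologin"
        };
        System.out.println(hasDuplicates(ldap)); // prints "true"
    }
}
```

With the LDAP-backed output quoted in the report, this check fires, which is exactly why the gateway refuses to start.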
[jira] [Updated] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5698: - Attachment: HDFS-5698.001.patch > Use protobuf to serialize / deserialize FSImage > --- > > Key: HDFS-5698 > URL: https://issues.apache.org/jira/browse/HDFS-5698 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch > > > Currently, the code serializes FSImage using in-house serialization > mechanisms. There are a couple of disadvantages to the current approach: > # Mixing the responsibility of reconstruction and serialization / > deserialization. The current code paths of serialization / deserialization > have spent a lot of effort on maintaining compatibility. What is worse is > that they are mixed with the complex logic of reconstructing the namespace, > making the code difficult to follow. > # Poor documentation of the current FSImage format. The format of the FSImage > is practically defined by the implementation. A bug in the implementation means > a bug in the specification. Furthermore, it also makes writing third-party > tools quite difficult. > # Changing schemas is non-trivial. Adding a field to FSImage requires bumping > the layout version every time. Bumping the layout version requires (1) the > users to explicitly upgrade the clusters, and (2) putting in new code to > maintain backward compatibility. > This jira proposes to use protobuf to serialize the FSImage. Protobuf has > been used to serialize / deserialize RPC messages in Hadoop. > Protobuf addresses all the above problems. It clearly separates the > responsibility of serialization from reconstructing the namespace. The > protobuf files document the current format of the FSImage. Developers can > now add optional fields with ease, since the old code can always read the new > FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
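The compatibility argument above rests on a basic protobuf property: readers skip unknown field tags, and absent optional fields come back as defaults. A hypothetical schema fragment illustrates the idea; the message and field names below are illustrative only, not the actual fsimage.proto from the patch.

```protobuf
// Hypothetical sketch -- not the actual fsimage.proto schema.
message INodeFile {
  optional uint64 id = 1;
  optional bytes name = 2;
  optional uint64 modification_time = 3;
  // A field added in a later release: an old reader skips the
  // unknown tag, and a new reader sees the default value when the
  // field is absent -- no layout-version bump needed.
  optional uint32 storage_policy = 4;
}
```

This is why adding a field no longer forces an explicit cluster upgrade path the way the in-house format does.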
[jira] [Resolved] (HDFS-5797) Implement offline image viewer.
[ https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5797. - Resolution: Fixed Hadoop Flags: Reviewed I've committed this. > Implement offline image viewer. > --- > > Key: HDFS-5797 > URL: https://issues.apache.org/jira/browse/HDFS-5797 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: HDFS-5698 (FSImage in protobuf) > > Attachments: HDFS-5797.000.patch, HDFS-5797.001.patch > > > The format of FSImage has changed dramatically; therefore, a new implementation > of OfflineImageViewer is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5797) Implement offline image viewer.
[ https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883180#comment-13883180 ] Jing Zhao commented on HDFS-5797: - I've tested this patch, and it looks like the new oiv works now. Some comments: # The new FSImageUtil will add another util class for fsimage. It looks like we need to do some code refactoring here, but since we will eventually need to remove all the old saver classes/methods, I think we can do it then. # The lsr part will use a lot of memory. I guess we can create a separate jira in the future to improve it. Thus I think we can commit this patch first and address the remaining issues later. +1 > Implement offline image viewer. > --- > > Key: HDFS-5797 > URL: https://issues.apache.org/jira/browse/HDFS-5797 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: HDFS-5698 (FSImage in protobuf) > > Attachments: HDFS-5797.000.patch, HDFS-5797.001.patch > > > The format of FSImage has changed dramatically; therefore, a new implementation > of OfflineImageViewer is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883164#comment-13883164 ] Alejandro Abdelnur commented on HDFS-4564: -- About splitting the JIRA: for trackability, I think it will be easier to have two separate JIRAs/commits; we can have both of them ready and commit them in tandem. > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples include rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883162#comment-13883162 ] Alejandro Abdelnur commented on HDFS-4564: -- [~daryn], thanks for sniffing around to see what's going on. So it seems the {{KerberosAuthenticator}} (hadoop-auth Kerberos client side) could be simplified to remove the SPNEGO handshake and let the JDK do that, provided you are in a DO-AS block. The {{KerberosAuthenticator}} would simply extract the AUTH_COOKIE into a hadoop-auth token cookie via {{AuthenticatedURL.extractToken(conn, token)}} and delegate to the fallback if no cookie is present. The presence of the hadoop-auth token cookie, when using the AuthenticatedURL, will completely skip the 'authentication' path on both the client and the server side. Now, what we have to see is what happens when you are UGI logged in but you don't do this within a DO-AS block. > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples include rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
[ https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5839: - Attachment: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt > TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk > > > Key: HDFS-5839 > URL: https://issues.apache.org/jira/browse/HDFS-5839 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu > Attachments: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt > > > Here is test failure: > {code} > testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: > 45.206 sec <<< FAILURE! > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878) > at > org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119) > at > org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103) > at > org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954) > at java.lang.Thread.run(Thread.java:724) > at org.junit.Assert.fail(Assert.java:93) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083) > at > 
org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {code} > From test output: > {code} > 2014-01-27 17:55:59,388 WARN resources.ExceptionHandler > (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > com.sun.jersey.server.impl.uri.rules.Right
[jira] [Updated] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
[ https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5839: - Description: Here is test failure: {code} testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: 45.206 sec <<< FAILURE! java.lang.AssertionError: There are 1 exception(s): Exception 0: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954) at java.lang.Thread.run(Thread.java:724) at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083) at org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003) at org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) {code} >From test output: {code} 2014-01-27 17:55:59,388 WARN resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR java.lang.NullPointerException at 
org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) 
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) {code} > TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk > > > Key: HDFS-5839 > URL: https://issues.apache.org/jira/browse/HDFS-5839 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu > > Here is test failure: > {code} > testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: >
[jira] [Created] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
Ted Yu created HDFS-5839: Summary: TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk Key: HDFS-5839 URL: https://issues.apache.org/jira/browse/HDFS-5839 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883130#comment-13883130 ] Mit Desai commented on HDFS-5838: - I think the failure will be intermittent. If these tests ran in the opposite order, the assertion error might not pop up. Adding the label "java7" so that it can be tracked as a JDK7 issue. > TestcacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion failure when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883131#comment-13883131 ] Jing Zhao commented on HDFS-5781: - Thanks for the comment, Daryn. In general, this patch just changes back to the original behavior, which also uses the static block initializer. I agree it will be a pain to debug static block initializers; that's why, in my 001 patch, I tried to make the initializer simpler. I think we can create a separate jira to see if we can avoid using it. > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issues. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gaps between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
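The lookup scheme the JIRA describes — an array indexed by byte value, tolerating gaps, instead of {{Enum.values()[ordinal]}} — can be sketched in miniature. This is a simplified stand-in, not the actual FSEditLogOpCodes class; the op names and byte values below are illustrative.

```java
public class OpCodeLookup {
    // Simplified stand-in for FSEditLogOpCodes: each op carries its own
    // byte value, and the values may have gaps (e.g. a backported op).
    enum OpCode {
        OP_ADD((byte) 0), OP_DELETE((byte) 2), OP_MKDIR((byte) 3),
        OP_SET_ACL((byte) 8);   // a later op; 4..7 are unassigned gaps

        private final byte value;
        OpCode(byte value) { this.value = value; }

        // Lookup table indexed by byte value; a null slot marks a gap.
        // Using the declared value instead of ordinal() keeps the byte
        // stable even when enum constants are reordered or interleaved.
        private static final OpCode[] VALUES = new OpCode[Byte.MAX_VALUE];
        static {
            for (OpCode op : values()) {
                VALUES[op.value] = op;
            }
        }

        static OpCode fromByte(byte value) {
            return (value < 0 || value >= VALUES.length) ? null : VALUES[value];
        }
    }

    public static void main(String[] args) {
        System.out.println(OpCode.fromByte((byte) 8));  // prints "OP_SET_ACL"
        System.out.println(OpCode.fromByte((byte) 5));  // prints "null" (a gap)
    }
}
```

A gap simply yields null, which the caller can treat as an invalid op, so byte values can be assigned out of declaration order across branches.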
[jira] [Updated] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5838: Labels: java7 (was: ) > TestcacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion fail when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5838: Status: Patch Available (was: Open) > TestcacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion fail when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
[ https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883123#comment-13883123 ] Hudson commented on HDFS-5825: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5044 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5044/]) HDFS-5825. Use FileUtils.copyFile() to implement DFSTestUtils.copyFile(). (Contributed by Haohui Mai) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561792) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java > Use FileUtils.copyFile() to implement DFSTestUtils.copyFile() > - > > Key: HDFS-5825 > URL: https://issues.apache.org/jira/browse/HDFS-5825 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-5825.000.patch > > > {{DFSTestUtils.copyFile()}} is implemented by copying data through > FileInputStream / FileOutputStream. Apache Commons IO provides > {{FileUtils.copyFile()}}. It uses FileChannel, which is more efficient. > This jira proposes to implement {{DFSTestUtils.copyFile()}} using > {{FileUtils.copyFile()}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
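The efficiency claim here is that a FileChannel-based copy avoids looping bytes through a Java-side buffer. A minimal stdlib sketch of that style of copy, similar in spirit to (but not the actual implementation of) Commons IO's {{FileUtils.copyFile()}}:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChannelCopy {
    // Channel-based copy: transferTo() lets the OS move bytes directly
    // between files instead of a read/write loop through a heap buffer.
    static void copyFile(File src, File dst) throws IOException {
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long pos = 0, size = in.size();
            // transferTo() may transfer fewer bytes than requested,
            // so loop until the whole file has been copied.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("src", ".dat");
        File dst = File.createTempFile("dst", ".dat");
        try (FileOutputStream fos = new FileOutputStream(src)) {
            fos.write("hello".getBytes());
        }
        copyFile(src, dst);
        System.out.println(dst.length()); // prints "5"
    }
}
```

The loop around {{transferTo()}} matters: the method is allowed to transfer fewer bytes than requested, so a single call is not a correct copy.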
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883124#comment-13883124 ] Hudson commented on HDFS-5781: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5044 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5044/]) HDFS-5781. Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561788) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
[ https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5825: Resolution: Fixed Fix Version/s: 2.3.0 Target Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the contribution Haohui! I committed this to trunk, branch-2 and branch-2.3. > Use FileUtils.copyFile() to implement DFSTestUtils.copyFile() > - > > Key: HDFS-5825 > URL: https://issues.apache.org/jira/browse/HDFS-5825 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-5825.000.patch > > > {{DFSTestUtils.copyFile()}} is implemented by copying data through > FileInputStream / FileOutputStream. Apache Commons IO provides > {{FileUtils.copyFile()}}. It uses FileChannel, which is more efficient. > This jira proposes to implement {{DFSTestUtils.copyFile()}} using > {{FileUtils.copyFile()}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5838) TestcacheDirectives#testCreateAndModifyPools fails
[ https://issues.apache.org/jira/browse/HDFS-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5838: Attachment: HDFS-5838.patch testBasicPoolOperations creates a pool "pool2" which never gets removed. This pool shows up when testCreateAndModifyPools checks for existing pools, causing an assertion failure. > TestcacheDirectives#testCreateAndModifyPools fails > -- > > Key: HDFS-5838 > URL: https://issues.apache.org/jira/browse/HDFS-5838 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5838.patch > > > testCreateAndModifyPools generates an assertion failure when it runs after > testBasicPoolOperations. > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< > FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives > test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time > elapsed: 4.649 sec <<< FAILURE! > java.lang.AssertionError: expected no cache pools after deleting pool > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertFalse(Assert.java:68) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) > Results : > Failed tests: > TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no > cache pools after deleting pool > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
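The order dependence described here is classic shared-state test pollution: one test leaves state behind that a later test's assertion trips over. A minimal plain-Java sketch of the pattern (no JUnit; the method names merely echo the tests in the report, and the shared set stands in for the namenode's cache-pool state):

```java
import java.util.HashSet;
import java.util.Set;

public class TestOrderDemo {
    // Stands in for the shared cache-pool state on the namenode.
    static final Set<String> pools = new HashSet<>();

    // Like testBasicPoolOperations: creates "pool2" and never removes it.
    static void basicPoolOperations() {
        pools.add("pool2");
    }

    // Like testCreateAndModifyPools: creates and deletes its own pool,
    // then expects no pools to remain -- which fails if leftovers exist.
    static boolean createAndModifyPools() {
        pools.add("pool1");
        pools.remove("pool1");
        return pools.isEmpty();  // the assertion from the stack trace above
    }

    public static void main(String[] args) {
        basicPoolOperations();
        System.out.println(createAndModifyPools()); // prints "false": leftover "pool2"
        pools.clear();                              // the fix: clean up between tests
        System.out.println(createAndModifyPools()); // prints "true"
    }
}
```

The attached patch presumably applies the same idea in JUnit terms: remove the pool the first test creates so each test starts from an empty pool set, making the tests order-independent.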
[jira] [Updated] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5781: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Colin! I've committed this to trunk and branch-2. > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
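The approach described above (explicit byte codes on each enum constant, decoded through an array that tolerates gaps, instead of relying on Enum.values()/ordinal()) can be sketched as follows; the op names and byte values here are illustrative, not the actual FSEditLogOpCodes definition:

```java
public class OpCodeLookup {
    enum Op {
        OP_ADD((byte) 0),
        OP_DELETE((byte) 2),   // gap at 1 is fine
        OP_RENAME((byte) 5);   // gaps at 3 and 4 are fine

        final byte code;
        Op(byte code) { this.code = code; }
    }

    // Lookup array indexed by the byte value; slots for unused byte
    // values simply stay null, so backporting an op with a larger
    // byte value before one with a smaller value causes no problem.
    private static final Op[] BY_CODE = new Op[Byte.MAX_VALUE + 1];
    static {
        for (Op op : Op.values()) {
            BY_CODE[op.code] = op;
        }
    }

    /** Returns the op for a byte value, or null for an unknown/gap code. */
    static Op fromByte(byte b) {
        return (b < 0 || b >= BY_CODE.length) ? null : BY_CODE[b];
    }

    public static void main(String[] args) {
        System.out.println("byte 2 -> " + fromByte((byte) 2));
        System.out.println("byte 3 -> " + fromByte((byte) 3));
    }
}
```

Per the review concern voiced later in this thread about static initializer blocks, the array could equally be populated lazily or in a helper method; the static block above is just the most compact way to show the mapping.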
[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883115#comment-13883115 ] Brandon Li commented on HDFS-5754: -- {quote} I think we need two maps. Do you agree?{quote} Yes. We need two maps here. Looks like it's hard to keep the patch at a minimal size. Uploaded a new patch. > Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion > > > Key: HDFS-5754 > URL: https://issues.apache.org/jira/browse/HDFS-5754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Brandon Li > Attachments: FeatureInfo.patch, HDFS-5754.001.patch, > HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, > HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, > HDFS-5754.009.patch > > > Currently, LayoutVersion defines the on-disk data format and supported > features of the entire cluster including NN and DNs. LayoutVersion is > persisted in both NN and DNs. When a NN/DN starts up, it checks its > supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a > different LayoutVersion than NN cannot register with the NN. > We propose to split LayoutVersion into two independent values that are local > to the nodes: > - NamenodeLayoutVersion - defines the on-disk data format in NN, including > the format of FSImage, editlog and the directory structure. > - DatanodeLayoutVersion - defines the on-disk data format in DN, including > the format of block data file, metadata file, block pool layout, and the > directory structure. > The LayoutVersion check will be removed in DN registration. If > NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling > upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5838) TestCacheDirectives#testCreateAndModifyPools fails
Mit Desai created HDFS-5838: --- Summary: TestCacheDirectives#testCreateAndModifyPools fails Key: HDFS-5838 URL: https://issues.apache.org/jira/browse/HDFS-5838 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Mit Desai Assignee: Mit Desai testCreateAndModifyPools generates an assertion failure when it runs after testBasicPoolOperations. {noformat} Running org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.045 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives test(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives) Time elapsed: 4.649 sec <<< FAILURE! java.lang.AssertionError: expected no cache pools after deleting pool at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertFalse(Assert.java:68) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCreateAndModifyPools(TestCacheDirectives.java:334) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.test(TestCacheDirectives.java:160) Results : Failed tests: TestCacheDirectives.test:160->testCreateAndModifyPools:334 expected no cache pools after deleting pool {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5754: - Attachment: HDFS-5754.009.patch > Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion > > > Key: HDFS-5754 > URL: https://issues.apache.org/jira/browse/HDFS-5754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Brandon Li > Attachments: FeatureInfo.patch, HDFS-5754.001.patch, > HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, > HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, > HDFS-5754.009.patch > > > Currently, LayoutVersion defines the on-disk data format and supported > features of the entire cluster including NN and DNs. LayoutVersion is > persisted in both NN and DNs. When a NN/DN starts up, it checks its > supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a > different LayoutVersion than NN cannot register with the NN. > We propose to split LayoutVersion into two independent values that are local > to the nodes: > - NamenodeLayoutVersion - defines the on-disk data format in NN, including > the format of FSImage, editlog and the directory structure. > - DatanodeLayoutVersion - defines the on-disk data format in DN, including > the format of block data file, metadata file, block pool layout, and the > directory structure. > The LayoutVersion check will be removed in DN registration. If > NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling > upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883110#comment-13883110 ] Suresh Srinivas commented on HDFS-5138: --- [~atm], please address the comments before merging to branch-2. My main concern, apart from the comments on the code, is the requirement that all JNs be available, and the boundary conditions that arise when any JN-related step fails. These issues can result in loss of metadata and a very involved, error-prone recovery procedure. It also might require the system to be restarted (say finalize fails because one of the JNs is not up). Please look at the comments on the design and see if I understand it correctly. > Support HDFS upgrade in HA > -- > > Key: HDFS-5138 > URL: https://issues.apache.org/jira/browse/HDFS-5138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.1-beta >Reporter: Kihwal Lee >Assignee: Aaron T. Myers >Priority: Blocker > Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, > hdfs-5138-branch-2.txt > > > With HA enabled, NN won't start with "-upgrade". Since there has been a layout > version change between 2.0.x and 2.1.x, starting NN in upgrade mode was > necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way > to get around this was to disable HA and upgrade. > The NN and the cluster cannot be flipped back to HA until the upgrade is > finalized. If HA is disabled only on NN for layout upgrade and HA is turned > back on without involving DNs, things will work, but finalizeUpgrade won't > work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade > snapshots won't get removed. > We will need a different way of doing layout upgrade and upgrade snapshot. > I am marking this as a 2.1.1-beta blocker based on feedback from others. 
If > there is a reasonable workaround that does not increase maintenance window > greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883105#comment-13883105 ] Daryn Sharp commented on HDFS-5781: --- In general static block initializers are frowned upon - I've been dinged for them in the past. If they ever throw an exception it causes the jvm to misreport the exception in very bizarre ways. > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883097#comment-13883097 ] Hadoop QA commented on HDFS-5830: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625374/HDFS-5830.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated -12 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5949//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5949//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5949//console This message is automatically generated. > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. 
> > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). > java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
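The stack trace above shows a JSON decoder passing null into a constructor guarded by Preconditions.checkArgument, because an older server omits the caching field entirely. A hedged sketch of the defensive fix: treat a missing cached-locations field as an empty array before it reaches the constructor (the field and class names below are illustrative, not the actual JsonUtil/LocatedBlock code):

```java
import java.util.HashMap;
import java.util.Map;

public class CachedLocsGuard {
    // Decode the cached-locations field from a parsed JSON map.
    // Servers from clusters without caching support omit the field,
    // so a null here means "no cached replicas", not an error.
    static String[] cachedLocations(Map<String, Object> json) {
        Object v = json.get("cachedLocations");
        return (v == null) ? new String[0] : (String[]) v;
    }

    public static void main(String[] args) {
        Map<String, Object> oldServer = new HashMap<>();   // field absent
        Map<String, Object> newServer = new HashMap<>();
        newServer.put("cachedLocations", new String[] {"dn1"});

        System.out.println("old server locs: " + cachedLocations(oldServer).length);
        System.out.println("new server locs: " + cachedLocations(newServer).length);
    }
}
```

This is the general shape of cross-version JSON compatibility in a client: every field added in a newer release needs a default when decoding responses from an older server.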
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883095#comment-13883095 ] Daryn Sharp commented on HDFS-4564: --- I just sniffed our secure clusters doing a hadoop fs ls. It did not prefetch service tickets. The server requested spnego for the getDelegationToken request, client sent service ticket. The client then sent a file stat and list status. Both operations sent the delegation token sans a service ticket. This is with JDK7 although different JDKs may have different behavior. I'm not sure it would be easy to ensure the client never does a pre-fetch of a service ticket -- assuming other JDKs do that. About the only way I can conceive of is create a new subject/ugi with only the token. Token ops use the current user, whereas other ops use the new subject. I'm not necessarily suggesting this approach... > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5837) dfs.namenode.replication.considerLoad does not consider decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Beaudreault updated HDFS-5837: Description: In DefaultBlockPlacementPolicy, there is a setting dfs.namenode.replication.considerLoad which tries to balance the load of the cluster when choosing replica locations. This code does not take into account decommissioned nodes. The code for considerLoad calculates the load by doing: TotalClusterLoad / numNodes. However, numNodes includes decommissioned nodes (which have 0 load). Therefore, the average load is artificially low. Example: TotalLoad = 250 numNodes = 100 decommissionedNodes = 70 remainingNodes = numNodes - decommissionedNodes = 30 avgLoad = 250/100 = 2.50 trueAvgLoad = 250 / 30 = 8.33 If the real load of the remaining 30 nodes is (on average) 8.33, this is more than 2x the calculated average load of 2.50. This causes these nodes to be rejected as replica locations. The final result is that all nodes are rejected, and no replicas can be placed. See exceptions printed from client during this scenario: https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1 was: In DefaultBlockPlacementPolicy, there is a setting dfs.namenode.replication.considerLoad which tries to balance the load of the cluster when choosing replica locations. This code does not take into account decommissioned nodes. The code for considerLoad calculates the load by doing: TotalClusterLoad / numNodes. However, numNodes includes decommissioned nodes (which have 0 load). Therefore, the average load is artificially low. Example: TotalLoad = 250 numNodes = 100 decommissionedNodes = 50 avgLoad = 250/100 = 2.50 trueAvgLoad = 250 / (100 - 70) = 8.33 If the real load of the remaining 30 nodes is (on average) 8.33, this is more than 2x the calculated average load of 2.50. This causes these nodes to be rejected as replica locations. The final result is that all nodes are rejected, and no replicas can be placed. 
See exceptions printed from client during this scenario: https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1 > dfs.namenode.replication.considerLoad does not consider decommissioned nodes > > > Key: HDFS-5837 > URL: https://issues.apache.org/jira/browse/HDFS-5837 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Bryan Beaudreault > > In DefaultBlockPlacementPolicy, there is a setting > dfs.namenode.replication.considerLoad which tries to balance the load of the > cluster when choosing replica locations. This code does not take into > account decommissioned nodes. > The code for considerLoad calculates the load by doing: TotalClusterLoad / > numNodes. However, numNodes includes decommissioned nodes (which have 0 > load). Therefore, the average load is artificially low. Example: > TotalLoad = 250 > numNodes = 100 > decommissionedNodes = 70 > remainingNodes = numNodes - decommissionedNodes = 30 > avgLoad = 250/100 = 2.50 > trueAvgLoad = 250 / 30 = 8.33 > If the real load of the remaining 30 nodes is (on average) 8.33, this is more > than 2x the calculated average load of 2.50. This causes these nodes to be > rejected as replica locations. The final result is that all nodes are > rejected, and no replicas can be placed. > See exceptions printed from client during this scenario: > https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
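The arithmetic in the report above can be checked directly. A small sketch reproducing the example numbers; the 2x-average rejection threshold is how the report describes considerLoad's behavior, and the method name here is illustrative:

```java
public class ConsiderLoadSketch {
    static double avgLoad(double totalLoad, int nodeCount) {
        return totalLoad / nodeCount;
    }

    public static void main(String[] args) {
        double totalLoad = 250;
        int numNodes = 100;        // includes decommissioned nodes
        int decommissioned = 70;   // each contributes 0 load
        int liveNodes = numNodes - decommissioned;  // 30

        double buggyAvg = avgLoad(totalLoad, numNodes);   // 250/100 = 2.50
        double trueAvg  = avgLoad(totalLoad, liveNodes);  // 250/30 ~= 8.33

        // A node is rejected when its load exceeds 2x the average.
        // Every live node carries ~8.33 load, which exceeds
        // 2 * 2.50 = 5.0, so no replica target can be chosen.
        System.out.printf("buggy avg=%.2f true avg=%.2f threshold=%.2f%n",
                buggyAvg, trueAvg, 2 * buggyAvg);
    }
}
```

The fix implied by the report is to compute the average over live (non-decommissioned) nodes only, which raises the threshold back above the per-node load.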