[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HDFS-6986:

    Resolution: Fixed
    Fix Version/s: 2.6.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Thanks Zhe Zhang. Committed to trunk and branch-2.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: security
> Reporter: Alejandro Abdelnur
> Assignee: Zhe Zhang
> Fix For: 2.6.0
>
> Attachments: HDFS-6986-20140905-v2.patch, HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides delegation tokens. {{DistributedFileSystem}} should augment the HDFS delegation tokens with the keyprovider ones so tasks can interact with keyprovider when it is a client/server impl (KMS).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
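The augmentation described in the issue can be sketched with a self-contained model. This is not the real Hadoop API: the TokenSource interface, collectTokens method, and the token strings below are hypothetical stand-ins for DistributedFileSystem, the configured KeyProvider, and their delegation tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: the file system gathers its own delegation
// token, then also asks the configured key provider for its tokens, so
// tasks on cluster nodes can reach the KMS without re-authenticating.
public class TokenAugmentationExample {
    interface TokenSource {
        List<String> addDelegationTokens(String renewer);
    }

    static List<String> collectTokens(TokenSource hdfs, TokenSource keyProvider, String renewer) {
        List<String> tokens = new ArrayList<>(hdfs.addDelegationTokens(renewer));
        if (keyProvider != null) {                 // only when a KeyProvider is configured
            tokens.addAll(keyProvider.addDelegationTokens(renewer));
        }
        return tokens;
    }

    public static void main(String[] args) {
        TokenSource hdfs = r -> List.of("HDFS_DELEGATION_TOKEN for " + r);
        TokenSource kms = r -> List.of("kms-dt for " + r);
        // Both token kinds end up in one credential set for the job.
        System.out.println(collectTokens(hdfs, kms, "yarn"));
    }
}
```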
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124342#comment-14124342 ]

Hadoop QA commented on HDFS-6981:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666983/HDFS-6981.06.patch
against trunk revision e6420fe.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

    org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
    org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
    org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7928//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7928//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7928//console

This message is automatically generated.
> DN upgrade with layout version change should not use trash
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0
> Reporter: James Thomas
> Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we have a previous directory.
> Thanks to [~james.thomas] for reporting this.
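The guard proposed in the description ("avoid writing anything to the trash directory if we have a previous directory") can be sketched in a few lines. This is a hypothetical illustration, not the DataNode's actual code: the useTrash helper and the bare "previous" directory check stand in for the storage-directory state machine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a DN storage directory must not move deleted block
// files to trash while a rollback snapshot ("previous" directory) exists,
// or a rollback plus restart can restore stale trash copies over blocks
// that were appended to in the meantime.
public class TrashGuardExample {
    /** Decide whether a deleted block file may go to trash for this storage dir. */
    static boolean useTrash(Path storageDir) {
        // A "previous" directory means we are inside an upgrade window with
        // a rollback snapshot; deleting outright is the safe choice then.
        return !Files.isDirectory(storageDir.resolve("previous"));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dn-storage");
        System.out.println(useTrash(dir));              // no snapshot: trash is fine

        Files.createDirectory(dir.resolve("previous")); // simulate a rolling-upgrade snapshot
        System.out.println(useTrash(dir));              // snapshot present: skip trash
    }
}
```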
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124335#comment-14124335 ]

Alejandro Abdelnur commented on HDFS-6986:

+1, test failure seems unrelated.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: security
> Reporter: Alejandro Abdelnur
> Assignee: Zhe Zhang
> Attachments: HDFS-6986-20140905-v2.patch, HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides delegation tokens. {{DistributedFileSystem}} should augment the HDFS delegation tokens with the keyprovider ones so tasks can interact with keyprovider when it is a client/server impl (KMS).
[jira] [Commented] (HDFS-6943) Improve NN allocateBlock log to include replicas' datanode IPs
[ https://issues.apache.org/jira/browse/HDFS-6943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124322#comment-14124322 ]

Jing Zhao commented on HDFS-6943:

The patch looks good to me. +1.

bq. Maybe we should add test like TestDatanodeStorageInfo like TestContainerId.

This is a very good suggestion. But it looks like a thorough test for DatanodeStorageInfo needs to cover multiple data fields, so I'm also fine if we do it in a separate jira.

> Improve NN allocateBlock log to include replicas' datanode IPs
>
> Key: HDFS-6943
> URL: https://issues.apache.org/jira/browse/HDFS-6943
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HDFS-6943.patch
>
> Datanode storage ID used to use IP and port. It has changed to use UUID. This makes debugging harder when we want to understand which DNs are assigned when DFSClient calls addBlock. For example,
> {noformat}
> BLOCK* allocateBlock: /foo. BP-1980237412-xx.xx.xxx.xxx-1408142057773 blk_1227779764_154043834{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9479727b-24c5-4068-8703-dfb9a41c056c:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-abe7840c-1db8-4623-9da7-3aed6a28c4f4:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-956023f4-56a0-4c30-a148-b78c61cf764b:NORMAL|RBW]]}
> {noformat}
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124291#comment-14124291 ]

Lei Chang commented on HDFS-3107:

Is it compatible with snapshot?

> HDFS truncate
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Reporter: Lei Chang
> Assignee: Plamen Jeliazkov
> Attachments: HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>
> Original Estimate: 1,344h
> Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.
[jira] [Updated] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-6981:

    Attachment: HDFS-6981.06.patch

Fix stub implementations in SimulatedFSDataset.

> DN upgrade with layout version change should not use trash
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0
> Reporter: James Thomas
> Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we have a previous directory.
> Thanks to [~james.thomas] for reporting this.
[jira] [Commented] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones
[ https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124275#comment-14124275 ]

Hadoop QA commented on HDFS-6951:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666978/HDFS-6951.005.patch
against trunk revision e6420fe.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7927//console

> Saving namespace and restarting NameNode will remove existing encryption zones
>
> Key: HDFS-6951
> URL: https://issues.apache.org/jira/browse/HDFS-6951
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: encryption
> Affects Versions: 3.0.0
> Reporter: Stephen Chu
> Assignee: Charles Lamb
> Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, HDFS-6951.004.patch, HDFS-6951.005.patch, editsStored
>
> Currently, when users save namespace and restart the NameNode, pre-existing encryption zones will be wiped out.
> I could reproduce this on a pseudo-distributed cluster:
> * Create an encryption zone
> * List encryption zones and verify the newly created zone is present
> * Save the namespace
> * Kill and restart the NameNode
> * List the encryption zones and you'll find the encryption zone is missing
> I've attached a test case for {{TestEncryptionZones}} that reproduces this as well. Removing the saveNamespace call will get the test to pass.
[jira] [Updated] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones
[ https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Lamb updated HDFS-6951:

    Attachment: HDFS-6951.005.patch

Hi [~andrew.wang],

Here's a rebased patch using --binary. The reason it didn't apply is that the NN layout version got bumped to -58 by "creating file with overwrite", so this patch bumps it to -59.

> Saving namespace and restarting NameNode will remove existing encryption zones
>
> Key: HDFS-6951
> URL: https://issues.apache.org/jira/browse/HDFS-6951
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: encryption
> Affects Versions: 3.0.0
> Reporter: Stephen Chu
> Assignee: Charles Lamb
> Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, HDFS-6951.004.patch, HDFS-6951.005.patch, editsStored
>
> Currently, when users save namespace and restart the NameNode, pre-existing encryption zones will be wiped out.
> I could reproduce this on a pseudo-distributed cluster:
> * Create an encryption zone
> * List encryption zones and verify the newly created zone is present
> * Save the namespace
> * Kill and restart the NameNode
> * List the encryption zones and you'll find the encryption zone is missing
> I've attached a test case for {{TestEncryptionZones}} that reproduces this as well. Removing the saveNamespace call will get the test to pass.
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124267#comment-14124267 ]

Hadoop QA commented on HDFS-6898:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666935/HDFS-6898.06.patch
against trunk revision 21c0cde.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

    org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner
    org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7924//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7924//console

> DN must reserve space for a full block when an RBW block is created
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: Gopal V
> Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch
>
> DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception.
> This can be avoided by allocating space up front.
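The up-front allocation idea from the description can be sketched with a simple per-volume accounting model. The class and method names below are hypothetical (the real change lives in the DN's FsVolume/FsDataset code); only the reserve-on-create, release-on-finalize logic is illustrated.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: when an RBW replica is created, reserve a full
// block's worth of space on the volume, so a second writer is rejected up
// front instead of both failing mid-write with DiskOutOfSpace. On
// finalize, release the reservation minus the bytes actually written.
class VolumeSpaceModel {
    private final long capacity;
    private final AtomicLong reserved = new AtomicLong();

    VolumeSpaceModel(long capacity) { this.capacity = capacity; }

    /** Try to reserve blockSize bytes; fail fast if it would overcommit. */
    boolean tryReserveForRbw(long blockSize) {
        while (true) {
            long cur = reserved.get();
            if (cur + blockSize > capacity) return false;      // not enough room
            if (reserved.compareAndSet(cur, cur + blockSize)) return true;
        }
    }

    /** Release the unused part of the reservation when the replica finalizes. */
    void releaseOnFinalize(long blockSize, long bytesWritten) {
        reserved.addAndGet(-(blockSize - bytesWritten));
    }

    long reservedBytes() { return reserved.get(); }
}

public class RbwReservationExample {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;                         // 128 MB block
        VolumeSpaceModel volume = new VolumeSpaceModel(blockSize + 1024); // room for ~1 block

        System.out.println(volume.tryReserveForRbw(blockSize));     // true: first writer fits
        System.out.println(volume.tryReserveForRbw(blockSize));     // false: second writer rejected early

        volume.releaseOnFinalize(blockSize, 64L * 1024 * 1024);     // wrote half a block
        System.out.println(volume.reservedBytes());                 // 64 MB still accounted for
    }
}
```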
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124266#comment-14124266 ]

Hadoop QA commented on HDFS-6940:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12664262/HDFS-6940.patch
against trunk revision 21c0cde.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7925//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7925//console

> Initial refactoring to allow ConsensusNode implementation
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
> Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation.
[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124257#comment-14124257 ]

Hadoop QA commented on HDFS-7008:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666954/HDFS-7008.1.patch
against trunk revision 21c0cde.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7926//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7926//console

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
>
> Key: HDFS-7008
> URL: https://issues.apache.org/jira/browse/HDFS-7008
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Tsuyoshi OZAWA
> Priority: Minor
> Attachments: HDFS-7008.1.patch
>
> {code}
> GenericRefreshProtocol xlator =
>     new GenericRefreshProtocolClientSideTranslatorPB(proxy);
>
> // Refresh
> Collection responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator before return.
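The close-before-return fix reads naturally as try-with-resources. Below is a minimal self-contained sketch of that pattern; RefreshTranslator is a hypothetical stand-in for the real GenericRefreshProtocolClientSideTranslatorPB (its names and behavior are illustrative, not Hadoop's API).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in translator: like the real class, it owns a resource (an RPC
// proxy) that must be released by close().
class RefreshTranslator implements Closeable {
    static final AtomicBoolean closed = new AtomicBoolean(false);

    Collection<String> refresh(String identifier, String[] args) {
        return Collections.singletonList(identifier + ": ok");
    }

    @Override
    public void close() {
        closed.set(true); // the real class would release the RPC proxy here
    }
}

public class GenericRefreshExample {
    // The fix pattern: try-with-resources guarantees close() runs on every
    // exit path, including when refresh() throws.
    static Collection<String> genericRefresh(String identifier, String[] args) throws IOException {
        try (RefreshTranslator xlator = new RefreshTranslator()) {
            return xlator.refresh(identifier, args);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(genericRefresh("refreshNodes", new String[0])); // [refreshNodes: ok]
        System.out.println("closed=" + RefreshTranslator.closed.get());    // closed=true
    }
}
```

An explicit try/finally calling xlator.close() achieves the same guarantee where try-with-resources is not convenient.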
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124258#comment-14124258 ]

Hadoop QA commented on HDFS-6981:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666923/HDFS-6981.05.patch
against trunk revision 21c0cde.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

    org.apache.hadoop.hdfs.TestReplication
    org.apache.hadoop.hdfs.TestPread
    org.apache.hadoop.hdfs.TestSetrepIncreasing
    org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics
    org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
    org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
    org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
    org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
    org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer
    org.apache.hadoop.hdfs.server.balancer.TestBalancer
    org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
    org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
    org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
    org.apache.hadoop.hdfs.TestFileCreation
    org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
    org.apache.hadoop.hdfs.TestSmallBlock
    org.apache.hadoop.hdfs.TestWriteBlockGetsBlockLengthHint
    org.apache.hadoop.hdfs.server.namenode.TestFileLimit
    org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7923//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7923//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7923//console

> DN upgrade with layout version change should not use trash
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0
> Reporter: James Thomas
> Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we have a previous directory.
> Thanks to [~james.thomas] for reporting this.
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124252#comment-14124252 ]

Zhanwei Wang commented on HDFS-6994:

Hi [~wheat9]

I like your suggestion. I will put libhdfs3 into the contrib directory first and make it useful to everyone. I will separate the patch into sub-tasks to make the review easier.

> libhdfs3 - A native C/C++ HDFS client
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
> Issue Type: Task
> Components: hdfs-client
> Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github https://github.com/PivotalRD/libhdfs3
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124240#comment-14124240 ]

Zhanwei Wang commented on HDFS-6994:

Hi [~cmccabe]

Thanks very much for your comments. Dynamically loading libjvm is a good idea, but it does not seem to solve all the problems you mentioned in HADOOP-10388. To make the fall-back feature work, users have to deploy the HDFS jars on every machine, which adds operational complexity for non-Java clients that just want to integrate with HDFS; otherwise the fall-back will not work. And the fall-back feature will eventually be removed once the native client implements the full HDFS client feature set.

About boost, you are right. Actually boost is not required if the C++ compiler is not too old, and avoiding it would make libhdfs3 useful to as many people as possible, including those on older C++ compilers. But, yes, I should not require a very new boost version; that can be improved, as can the other dependency issues.

So the most important thing, I think, is to figure out a way to integrate libhdfs3 that also benefits the other features in HADOOP-10388. What is your opinion?

> libhdfs3 - A native C/C++ HDFS client
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
> Issue Type: Task
> Components: hdfs-client
> Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github https://github.com/PivotalRD/libhdfs3
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123912#comment-14123912 ]

Zhanwei Wang commented on HDFS-6994:

Hi [~aw]

Naming is hard -_- The binary name is libhdfs3.so.x.x.x: libhdfs3 is the name, x.x.x is the version.

> libhdfs3 - A native C/C++ HDFS client
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
> Issue Type: Task
> Components: hdfs-client
> Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github https://github.com/PivotalRD/libhdfs3
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123905#comment-14123905 ]

Zhanwei Wang commented on HDFS-6994:

Hi [~nidmhbase]

Libhdfs3 provides C interface "hdfsGetFileBlockLocations" in hdfs.h and C++ interface "FileSystem::getFileBlockLocations".

> libhdfs3 - A native C/C++ HDFS client
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
> Issue Type: Task
> Components: hdfs-client
> Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github https://github.com/PivotalRD/libhdfs3
[jira] [Updated] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated HDFS-7008:

    Attachment: HDFS-7008.1.patch

Thanks for the report, Ted. Attached a first patch to fix the problem.

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
>
> Key: HDFS-7008
> URL: https://issues.apache.org/jira/browse/HDFS-7008
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Tsuyoshi OZAWA
> Priority: Minor
> Attachments: HDFS-7008.1.patch
>
> {code}
> GenericRefreshProtocol xlator =
>     new GenericRefreshProtocolClientSideTranslatorPB(proxy);
>
> // Refresh
> Collection responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator before return.
[jira] [Updated] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated HDFS-7008: - Status: Patch Available (was: Open) > xlator should be closed upon exit from DFSAdmin#genericRefresh() > > > Key: HDFS-7008 > URL: https://issues.apache.org/jira/browse/HDFS-7008 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Tsuyoshi OZAWA >Priority: Minor > Attachments: HDFS-7008.1.patch > > > {code} > GenericRefreshProtocol xlator = > new GenericRefreshProtocolClientSideTranslatorPB(proxy); > // Refresh > Collection responses = xlator.refresh(identifier, args); > {code} > GenericRefreshProtocolClientSideTranslatorPB#close() should be called on > xlator before return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
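[Editorial note] The resource leak described in HDFS-7008 and the shape of the fix can be illustrated with try-with-resources. The sketch below uses a hypothetical stand-in class ({{RefreshTranslator}}) instead of the real {{GenericRefreshProtocolClientSideTranslatorPB}}, so it is self-contained; the names and return values are illustrative only.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical stand-in for GenericRefreshProtocolClientSideTranslatorPB;
// the real class implements Closeable and stops an RPC proxy on close().
class RefreshTranslator implements AutoCloseable {
    boolean closed = false;

    Collection<String> refresh(String identifier, String[] args) {
        return Arrays.asList("refresh(" + identifier + ") ok");
    }

    @Override
    public void close() {
        closed = true; // in Hadoop this would release the underlying proxy
    }
}

class GenericRefreshSketch {
    // try-with-resources guarantees close() runs on every exit path,
    // which is the behavior DFSAdmin#genericRefresh() was missing.
    static Collection<String> genericRefresh(String identifier, String[] args) {
        try (RefreshTranslator xlator = new RefreshTranslator()) {
            return xlator.refresh(identifier, args);
        }
    }
}
```

The same effect can be had with an explicit try/finally; try-with-resources is simply the more idiomatic form for a Closeable that must be released before return.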
[jira] [Commented] (HDFS-6948) DN rejects blocks if it has older UC block
[ https://issues.apache.org/jira/browse/HDFS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123889#comment-14123889 ] Hadoop QA commented on HDFS-6948: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666906/HDFS-6948.201409052147.txt against trunk revision 21c0cde. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7921//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7921//console This message is automatically generated. > DN rejects blocks if it has older UC block > -- > > Key: HDFS-6948 > URL: https://issues.apache.org/jira/browse/HDFS-6948 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Eric Payne > Attachments: HDFS-6948.201409052147.txt > > > DNs appear to always reject blocks, even with newer genstamps, if it already > has a UC copy in its tmp dir. 
> {noformat}ReplicaAlreadyExistsException: Block > XXX already > exists in state TEMPORARY and thus cannot be created{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7009: -- Summary: Active NN and standby NN have different live nodes (was: Not enough retry during DN's initial handshake with NN) > Active NN and standby NN have different live nodes > -- > > Key: HDFS-7009 > URL: https://issues.apache.org/jira/browse/HDFS-7009 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma > > To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most > cases, given DN sends HB and BR to NN regularly, if a specific RPC call > fails, it isn't a big deal. > However, there are cases where DN fails to register with NN during initial > handshake due to exceptions not covered by RPC client's connection retry. > When this happens, the DN won't talk to that NN until the DN restarts. > {noformat} > BPServiceActor > public void run() { > LOG.info(this + " starting to offer service"); > try { > // init stuff > try { > // setup storage > connectToNNAndHandshake(); > } catch (IOException ioe) { > // Initial handshake, storage recovery or registration failed > // End BPOfferService thread > LOG.fatal("Initialization failed for block pool " + this, ioe); > return; > } > initialized = true; // bp is initialized; > > while (shouldRun()) { > try { > offerService(); > } catch (Exception ex) { > LOG.error("Exception in BPOfferService for " + this, ex); > sleepAndLogInterrupts(5000, "offering service"); > } > } > ... > {noformat} > Here is an example of the call stack. 
> {noformat} > java.io.IOException: Failed on local exception: java.io.IOException: Response > is null.; Host Details : local host is: "xxx"; destination host is: > "yyy":8030; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) > at org.apache.hadoop.ipc.Client.call(Client.java:1239) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Response is null. > at > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) > {noformat} > This will create discrepancy between active NN and standby NN in terms of > live nodes. > > Here is a possible scenario of missing blocks after failover. > 1. DN A, B set up handshakes with active NN, but not with standby NN. > 2. A block is replicated to DN A, B and C. > 3. 
From standby NN's point of view, given A and B are dead nodes, the block > is under replicated. > 4. DN C is down. > 5. Before active NN detects DN C is down, it fails over. > 6. The new active NN considers the block is missing. Even though there are > two replicas on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.
[ https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123884#comment-14123884 ] Konstantin Shvachko commented on HDFS-7007: --- Another [observation|https://issues.apache.org/jira/browse/HDFS-6940?focusedCommentId=14109691&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109691], made by [~atm], argues that subclassing NameNode classes makes the implementation more fragile (if I understood it correctly). There is one essential advantage of subclassing: I can create a completely separate sub-project, recreate the package structure of the parent class, and reuse the methods and classes of that package from the parent project, without modifying the parent project. That way the parent project is completely independent of the new sub-project. Example: {code} /hadoop-hdfs/ org.apache.hadoop.hdfs.server.namenode.NameNode {} /hadoop-cnode/ org.apache.hadoop.hdfs.server.namenode.ConsensusNode extends NameNode {} {code} In this case you can modify and build hadoop-hdfs without taking hadoop-cnode into account, and deal with CNode only at the integration stage. I think such separation is desirable. > Interfaces to plugin ConsensusNode. > --- > > Key: HDFS-7007 > URL: https://issues.apache.org/jira/browse/HDFS-7007 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko > > This is to introduce interfaces in NameNode and namesystem, which are needed > to plugin ConsensusNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
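[Editorial note] The reuse-by-subclassing argument in the comment above can be sketched minimally. The classes below are illustrative stand-ins, not the real NameNode/ConsensusNode: the child project reuses the parent by overriding a single hook, and the parent project compiles with no knowledge of the child.

```java
// Parent project (e.g. hadoop-hdfs): defines the extension point.
// Names are hypothetical, chosen only to illustrate the structure.
class BaseNode {
    // Default behavior: no coordination, the call is applied as-is.
    protected String coordinate(String op) { return op; }

    public String execute(String op) { return "applied:" + coordinate(op); }
}

// Child project (e.g. hadoop-cnode): reuses the parent by overriding
// the hook, without any change to the parent project.
class CNode extends BaseNode {
    @Override
    protected String coordinate(String op) { return "agreed(" + op + ")"; }
}
```

The fragility concern raised by [~atm] is that {{execute()}} here is an implicit contract: if the parent later changes when or how {{coordinate()}} is invoked, the child silently breaks, which is what an explicit plugin interface would guard against.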
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123881#comment-14123881 ] Hadoop QA commented on HDFS-6986: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666902/HDFS-6986-20140905-v2.patch against trunk revision 21c0cde. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7920//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7920//console This message is automatically generated. > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905-v2.patch, > HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. 
{{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
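[Editorial note] The augmentation described in HDFS-6986 amounts to returning the HDFS delegation tokens together with the KeyProvider's tokens. The sketch below models that with hypothetical stand-in types; the real code path goes through {{DistributedFileSystem#addDelegationTokens}} and {{KeyProviderDelegationTokenExtension}}, and nothing below is the actual Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; real types are Token<?>, Credentials, KeyProvider, etc.
class TokenSketch {
    // Stand-in for the HDFS delegation tokens the NameNode issues.
    static List<String> hdfsTokens(String renewer) {
        return new ArrayList<>(Arrays.asList("HDFS_DELEGATION_TOKEN@" + renewer));
    }

    // Stand-in for the tokens a client/server KeyProvider (KMS) issues.
    static List<String> keyProviderTokens(String renewer) {
        return new ArrayList<>(Arrays.asList("kms-dt@" + renewer));
    }

    // Mirrors the intent of the patch: the file system's token list is
    // augmented with the KeyProvider's tokens, so tasks that later read
    // encrypted files can authenticate to the KMS.
    static List<String> addDelegationTokens(String renewer) {
        List<String> tokens = hdfsTokens(renewer);
        tokens.addAll(keyProviderTokens(renewer));
        return tokens;
    }
}
```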
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123880#comment-14123880 ] Hadoop QA commented on HDFS-6982: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666898/HDFS-6982.v2.patch against trunk revision 21c0cde. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.qjournal.server.TestJournalNode org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7919//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7919//console This message is automatically generated. 
> nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight about which users are sending majority of each traffic type to > the name node. This information turns out to be the most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop which has been in production at Twitter in the past 10 > months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K > nodes), low memory footprint (less than a few MB), and quite efficient for > the write path (only two hash lookup for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned HDFS-7008: Assignee: Tsuyoshi OZAWA > xlator should be closed upon exit from DFSAdmin#genericRefresh() > > > Key: HDFS-7008 > URL: https://issues.apache.org/jira/browse/HDFS-7008 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Tsuyoshi OZAWA >Priority: Minor > > {code} > GenericRefreshProtocol xlator = > new GenericRefreshProtocolClientSideTranslatorPB(proxy); > // Refresh > Collection responses = xlator.refresh(identifier, args); > {code} > GenericRefreshProtocolClientSideTranslatorPB#close() should be called on > xlator before return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123873#comment-14123873 ] Konstantin Boudnik commented on HDFS-6940: -- bq. Sure, but by creating a plugin interface or something of that ilk we can precisely define the contract I have a great idea [~atm] - let's in fact do everything as plugins! For example, the 2.4.0 release introduced 3 backward-incompatible fixes that broke _at least_ two huge components downstream. In fact, we are catching stuff like that in Bigtop all the time. I am sure it could have been avoided if only we had better plugin contracts for everything that depends on the Hadoop bits. I think everyone should have figured out by now that being in the position of a base layer puts tremendous pressure on development practices and architectural decisions. Changes in Hadoop shouldn't break user space (similarly to the Linux kernel). Likewise, changes in a super class should not break its children if the said super class's contracts are well designed and implemented - that's a basic principle of OOP, after all. By artificially limiting the choices of future consumers of a library instead of implementing accommodating APIs, one doesn't build a better system. One would simply be forcing downstream developers to hack in, or around, those arbitrary limitations. And such development won't produce a well-integrated stack. The evidence of this is plentiful. > Initial refactoring to allow ConsensusNode implementation > - > > Key: HDFS-6940 > URL: https://issues.apache.org/jira/browse/HDFS-6940 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: HDFS-6940.patch > > > Minor refactoring of FSNamesystem to open private methods that are needed for > CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7009) No enough retry during DN's initial handshake with NN
Ming Ma created HDFS-7009: - Summary: No enough retry during DN's initial handshake with NN Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most cases, given DN sends HB and BR to NN regularly, if a specific RPC call fails, it isn't a big deal. However, there are cases where DN fails to register with NN during initial handshake due to exceptions not covered by RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts. {noformat} BPServiceActor public void run() { LOG.info(this + " starting to offer service"); try { // init stuff try { // setup storage connectToNNAndHandshake(); } catch (IOException ioe) { // Initial handshake, storage recovery or registration failed // End BPOfferService thread LOG.fatal("Initialization failed for block pool " + this, ioe); return; } initialized = true; // bp is initialized; while (shouldRun()) { try { offerService(); } catch (Exception ex) { LOG.error("Exception in BPOfferService for " + this, ex); sleepAndLogInterrupts(5000, "offering service"); } } ... {noformat} Here is an example of the call stack. 
{noformat} java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: "xxx"; destination host is: "yyy":8030; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Response is null. at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) {noformat} This will create discrepancy between active NN and standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover. 1. DN A, B set up handshakes with active NN, but not with standby NN. 2. A block is replicated to DN A, B and C. 3. From standby NN's point of view, given A and B are dead nodes, the block is under replicated. 4. 
DN C is down. 5. Before active NN detects DN C is down, it fails over. 6. The new active NN considers the block is missing. Even though there are two replicas on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
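[Editorial note] The direction HDFS-7009 points at — not giving up on the initial handshake after a single exception that the RPC client's retry policy doesn't cover — can be sketched as a bounded retry loop around the connect call. Everything below (names, the retry limit, the exception handling) is hypothetical, not the actual fix.

```java
// Hypothetical sketch of retrying the DN's initial NN handshake instead of
// ending the BPOfferService thread on the first failure.
class HandshakeRetrySketch {
    interface Handshake { void connect() throws Exception; }

    // Retries up to maxAttempts times; returns the attempt number that
    // succeeded, or rethrows the last failure once attempts are exhausted.
    static int connectWithRetry(Handshake h, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                h.connect();
                return attempt;
            } catch (Exception e) {
                last = e; // in the DN this would also log and sleep between tries
            }
        }
        throw new RuntimeException(
            "handshake failed after " + maxAttempts + " attempts", last);
    }
}
```

A real implementation would also need to distinguish fatal errors (e.g. incompatible layout version) from transient ones, and back off between attempts.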
[jira] [Updated] (HDFS-7009) Not enough retry during DN's initial handshake with NN
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7009: -- Summary: Not enough retry during DN's initial handshake with NN (was: No enough retry during DN's initial handshake with NN) > Not enough retry during DN's initial handshake with NN > -- > > Key: HDFS-7009 > URL: https://issues.apache.org/jira/browse/HDFS-7009 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma > > To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most > cases, given DN sends HB and BR to NN regularly, if a specific RPC call > fails, it isn't a big deal. > However, there are cases where DN fails to register with NN during initial > handshake due to exceptions not covered by RPC client's connection retry. > When this happens, the DN won't talk to that NN until the DN restarts. > {noformat} > BPServiceActor > public void run() { > LOG.info(this + " starting to offer service"); > try { > // init stuff > try { > // setup storage > connectToNNAndHandshake(); > } catch (IOException ioe) { > // Initial handshake, storage recovery or registration failed > // End BPOfferService thread > LOG.fatal("Initialization failed for block pool " + this, ioe); > return; > } > initialized = true; // bp is initialized; > > while (shouldRun()) { > try { > offerService(); > } catch (Exception ex) { > LOG.error("Exception in BPOfferService for " + this, ex); > sleepAndLogInterrupts(5000, "offering service"); > } > } > ... > {noformat} > Here is an example of the call stack. 
> {noformat} > java.io.IOException: Failed on local exception: java.io.IOException: Response > is null.; Host Details : local host is: "xxx"; destination host is: > "yyy":8030; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) > at org.apache.hadoop.ipc.Client.call(Client.java:1239) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Response is null. > at > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) > {noformat} > This will create discrepancy between active NN and standby NN in terms of > live nodes. > > Here is a possible scenario of missing blocks after failover. > 1. DN A, B set up handshakes with active NN, but not with standby NN. > 2. A block is replicated to DN A, B and C. > 3. 
From standby NN's point of view, given A and B are dead nodes, the block > is under replicated. > 4. DN C is down. > 5. Before active NN detects DN C is down, it fails over. > 6. The new active NN considers the block is missing. Even though there are > two replicas on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123864#comment-14123864 ] Colin Patrick McCabe commented on HDFS-6994: bq. Haohui wrote: Do you want to separate the patch into sub tasks so that it can go through the review process? I agree. Why don't you guys separate this into a few subtasks, and use the HADOOP-10388 branch as the target? bq. Personally I think that this is an alternative implementation of libhdfs. I don't think we need to get rid of boost for now, but I think the code can be put in the contrib directory which is not built by default, but still allow other people to check it out if they're interested. I think we should make this useful to as many people as possible. That's the reason I made my comment about the possible boost dependency issue. I looked at this a little more closely, though, and I see that the purpose of boost is to substitute for C\+\+11 features such as {{std::thread}}, in cases where the compiler is too old to provide them. With that in mind, I think that it's ok for now. I do think we should do the Jenkins build without boost, to make sure that the C\+\+11 code works. C\+\+11 is clearly the future for C++ and we should be prepared for it. I want to reiterate that we should have a way to switch between this new library and the existing libhdfs. I'd be happy to work on that (I can re-purpose the existing code from HADOOP-10388 to do that) and it will expand the user-base big-time. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. 
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123858#comment-14123858 ] Konstantin Shvachko commented on HDFS-6940: --- Aaron, I created HDFS-7007; we can continue discussing interfaces there. Do you have technical objections to the proposed patch? It is regular practice to do refactoring on trunk before creating a branch for a new feature, in order to ease merging. I am sure you are familiar with that. > Initial refactoring to allow ConsensusNode implementation > - > > Key: HDFS-6940 > URL: https://issues.apache.org/jira/browse/HDFS-6940 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: HDFS-6940.patch > > > Minor refactoring of FSNamesystem to open private methods that are needed for > CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123852#comment-14123852 ] Maysam Yabandeh commented on HDFS-6982: --- Thanks [~wheat9]. bq. What are the minimal changes in the hadoop side to enable this functionality? The minimal change is a couple of lines to register TopMetrics with the hadoop metrics system. {code} DefaultMetricsSystem.initialize("nntop"); TopConfiguration conf = new TopConfiguration(); TopMetrics.initSingleton(conf, "processName", "sessionId", TopUtil.getRequestedReportPeriods(conf)); {code} Also a config change to register TopAuditLogger as the nn audit logger. bq. Should rolling window reside in the NN? The rolling window only provides lightweight aggregation, and this logic can also be in an external process, as was suggested in the second architecture in the design doc. To transfer the events from the nn to a rolling window residing in another process (or any other aggregation service), the second architecture benefits from the already existing audit logs. We have also been using this approach at Twitter, mostly to be resilient against worst-case scenarios and to keep the recent top users retrievable even if the name node is not responsive. The downside was the overhead of parsing the logs. Smaller clusters might also prefer not to maintain an additional process just to have access to the top users. > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight about which users are sending majority of each traffic type to > the name node. 
This information turns out to be the most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop which has been in production at Twitter in the past 10 > months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K > nodes), low memory footprint (less than a few MB), and quite efficient for > the write path (only two hash lookup for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
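[Editorial note] The "only two hash lookups for updating a metric" claim in the description suggests a two-level map: one lookup for the operation type, one for the user's counter. The sketch below is a guess at that general shape with hypothetical names; it is not nntop's actual implementation, and it elides the time-bucket rotation that makes the window "rolling".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a top-users counter with two hash lookups per update.
class RollingTopSketch {
    private final Map<String, Map<String, Long>> countsByOp = new HashMap<>();

    void record(String op, String user) {
        Map<String, Long> byUser =
            countsByOp.computeIfAbsent(op, k -> new HashMap<>()); // lookup 1
        byUser.merge(user, 1L, Long::sum);                        // lookup 2
    }

    long count(String op, String user) {
        return countsByOp.getOrDefault(op, Collections.emptyMap())
                         .getOrDefault(user, 0L);
    }
}
```

In the real tool each counter would additionally be spread over time buckets so that old activity ages out of the reported window.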
[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.
[ https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123844#comment-14123844 ] Konstantin Shvachko commented on HDFS-7007: --- [~sanjay.radia] [suggested|https://issues.apache.org/jira/browse/HDFS-6469?focusedCommentId=14111655&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14111655] to isolate plugin interfaces, which would make integration of ConsensusNode easier. I see two types of interfaces. One is the CoordinationEngine interface, which is introduced in HADOOP-10641. This one is ready to use as an interface. The second is an interface (or a series of them) that would allow intercepting a client RPC call, identifying whether it modifies the namespace, submitting that call for coordination, and then invoking the namespace operation corresponding to the call. I have an implementation that does all of the above, but I haven't thought about it in terms of plugins. Any ideas, clarifications, and examples are very much welcome. > Interfaces to plugin ConsensusNode. > --- > > Key: HDFS-7007 > URL: https://issues.apache.org/jira/browse/HDFS-7007 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko > > This is to introduce interfaces in NameNode and namesystem, which are needed > to plugin ConsensusNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
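[Editorial note] The second kind of interface described in the comment above — intercept a call, decide whether it mutates the namespace, submit it for coordination, then apply it — can be sketched as a small Java contract plus a trivial in-process "engine". None of these names come from HADOOP-10641; they are illustrative only.

```java
// Hypothetical sketch of the interception flow described in the comment.
interface NamespaceCall {
    boolean modifiesNamespace();
    String apply(); // execute the corresponding namespace operation
}

interface CoordinationEngineSketch {
    String submit(NamespaceCall call); // reach agreement, then apply
}

// Trivial single-node engine; a real one would first reach agreement
// across replicas before applying the call.
class LocalEngine implements CoordinationEngineSketch {
    @Override
    public String submit(NamespaceCall call) {
        return call.apply();
    }
}

class CallInterceptor {
    private final CoordinationEngineSketch engine;

    CallInterceptor(CoordinationEngineSketch engine) { this.engine = engine; }

    // Read-only calls bypass coordination; mutations are submitted first.
    String invoke(NamespaceCall call) {
        return call.modifiesNamespace() ? engine.submit(call) : call.apply();
    }
}
```

The point of isolating such an interface is that the NameNode only needs to know {{CallInterceptor}}'s contract, while a ConsensusNode supplies a different engine, which is the plugin-style separation being discussed.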
[jira] [Created] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
Ted Yu created HDFS-7008: Summary: xlator should be closed upon exit from DFSAdmin#genericRefresh() Key: HDFS-7008 URL: https://issues.apache.org/jira/browse/HDFS-7008 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} GenericRefreshProtocol xlator = new GenericRefreshProtocolClientSideTranslatorPB(proxy); // Refresh Collection responses = xlator.refresh(identifier, args); {code} GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator before return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
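The fix the report asks for amounts to closing the translator on every exit path. A minimal sketch of the pattern, using a hypothetical stand-in for {{GenericRefreshProtocolClientSideTranslatorPB}} (the real class wraps an RPC proxy, none of which is reproduced here):

```java
import java.util.Collection;
import java.util.Collections;

class GenericRefreshSketch {
    static boolean translatorClosed = false;

    // Hypothetical stand-in for GenericRefreshProtocolClientSideTranslatorPB;
    // the real class talks RPC through a proxy, omitted for brevity.
    static class RefreshTranslator implements AutoCloseable {
        Collection<String> refresh(String identifier, String[] args) {
            return Collections.singletonList("refreshed:" + identifier);
        }
        @Override
        public void close() {
            translatorClosed = true;
        }
    }

    // try-with-resources guarantees close() runs on every exit path,
    // including when refresh() throws.
    static Collection<String> genericRefresh(String identifier, String[] args) {
        try (RefreshTranslator xlator = new RefreshTranslator()) {
            return xlator.refresh(identifier, args);
        }
    }
}
```

Since the translator implements {{Closeable}}, a try-with-resources (or an equivalent try/finally) keeps the leak fixed even if later refactorings add early returns.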
[jira] [Created] (HDFS-7007) Interfaces to plugin ConsensusNode.
Konstantin Shvachko created HDFS-7007: - Summary: Interfaces to plugin ConsensusNode. Key: HDFS-7007 URL: https://issues.apache.org/jira/browse/HDFS-7007 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko This is to introduce interfaces in NameNode and namesystem, which are needed to plugin ConsensusNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123808#comment-14123808 ] Aaron T. Myers commented on HDFS-6940: -- bq. If you write an application, which depends on HDFS (or any other system), whether you subclass or encapsulate anything from HDFS you can break that application by making changes to HDFS. E.g. a change in getBlockLocations() can break Yarn or HBase. Same here. Sure, but by creating a plugin interface or something of that ilk we can precisely define the contract, both for implementers of the interface and maintainers of the main system. By subclassing, you're making it more fragile. Anyway, I'm fine if you want to proceed with this direction, but please only commit this to the branch, not to trunk. No reason this change needs to be on trunk instead of the branch for you to be able to make progress. > Initial refactoring to allow ConsensusNode implementation > - > > Key: HDFS-6940 > URL: https://issues.apache.org/jira/browse/HDFS-6940 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: HDFS-6940.patch > > > Minor refactoring of FSNamesystem to open private methods that are needed for > CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-6940: -- Status: Patch Available (was: Open) ATM> Not entirely sure what was unclear here. Unclear, because you use expressions like "somehow abstract", "some sort of interface", etc. without clarifying how or giving any examples, which is not constructive. If you write an application that depends on HDFS (or any other system), then whether you subclass or encapsulate anything from HDFS, you can break that application by making changes to HDFS. E.g. a change in getBlockLocations() can break Yarn or HBase. Same here. I will create a new jira, as discussed in HDFS-6469, to track possible plugin interfaces related to ConsensusNode; we can move this discussion there. For this jira, I am making it patch available to trigger Jenkins. The methods will need to be opened up whichever direction we take with interfaces. > Initial refactoring to allow ConsensusNode implementation > - > > Key: HDFS-6940 > URL: https://issues.apache.org/jira/browse/HDFS-6940 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: HDFS-6940.patch > > > Minor refactoring of FSNamesystem to open private methods that are needed for > CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123796#comment-14123796 ] Hadoop QA commented on HDFS-6986: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666911/HDFS-6986-20140905-v3.patch against trunk revision 21c0cde. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestMetaSave The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7922//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7922//console This message is automatically generated. 
> DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905-v2.patch, > HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6898: Hadoop Flags: Reviewed +1 for the patch. Thanks again, Arpit. > DN must reserve space for a full block when an RBW block is created > --- > > Key: HDFS-6898 > URL: https://issues.apache.org/jira/browse/HDFS-6898 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Gopal V >Assignee: Arpit Agarwal > Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, > HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch > > > DN will successfully create two RBW blocks on the same volume even if the > free space is sufficient for just one full block. > One or both block writers may subsequently get a DiskOutOfSpace exception. > This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
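The up-front allocation the HDFS-6898 description proposes can be sketched as a per-volume reservation counter: reserve a full block's worth of space when the RBW replica is created, and release the unused remainder on finalize. The class and method names below are illustrative, not the actual FsVolumeImpl API from the patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical volume that reserves a full block of space per RBW
// (replica-being-written) block, so a second writer is rejected up front
// instead of both later hitting DiskOutOfSpace mid-write.
class ReservingVolume {
    private final long capacity;
    private final AtomicLong used = new AtomicLong();      // finalized bytes
    private final AtomicLong reserved = new AtomicLong();  // in-flight RBW bytes

    ReservingVolume(long capacity) { this.capacity = capacity; }

    // Reserve blockSize bytes atomically; fail fast if space is insufficient.
    boolean tryCreateRbw(long blockSize) {
        while (true) {
            long r = reserved.get();
            if (capacity - used.get() - r < blockSize) {
                return false;
            }
            if (reserved.compareAndSet(r, r + blockSize)) {
                return true;
            }
        }
    }

    // On finalize, account the bytes actually written and release the rest.
    void finalizeBlock(long blockSize, long bytesWritten) {
        used.addAndGet(bytesWritten);
        reserved.addAndGet(-blockSize);
    }
}
```

With capacity for only one and a half blocks, the second {{tryCreateRbw}} fails immediately, which is exactly the behavior the jira wants instead of a late DiskOutOfSpace exception.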
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123783#comment-14123783 ] Hadoop QA commented on HDFS-6986: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666886/HDFS-6986-20140905.patch against trunk revision 0571b45. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7917//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7917//console This message is automatically generated. 
> DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905-v2.patch, > HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6898: Attachment: HDFS-6898.06.patch Thanks for the review Chris. Updated patch attached. > DN must reserve space for a full block when an RBW block is created > --- > > Key: HDFS-6898 > URL: https://issues.apache.org/jira/browse/HDFS-6898 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Gopal V >Assignee: Arpit Agarwal > Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, > HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch > > > DN will successfully create two RBW blocks on the same volume even if the > free space is sufficient for just one full block. > One or both block writers may subsequently get a DiskOutOfSpace exception. > This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123767#comment-14123767 ] Chris Nauroth commented on HDFS-6606: - Hi, [~hitliuyi]. Nice work! This looks like it's fully compatible too with the recent work in HDFS-2856 to remove the requirement to run DataNode as root. If I understand correctly, the {{DFSClient}} is still going to contact the NameNode to obtain an encryption key via {{ClientProtocol#getDataEncryptionKey}} when {{dfs.encrypt.data.transfer}} is true, but then the result wouldn't actually be used if a cipher is negotiated. It's a shame to keep around that extraneous RPC, but it's very small, and I don't see an easy way to change the code to avoid it. Maybe we could queue this up for future consideration. I'd just like to suggest a few more tests: # {{TestSaslDataTransfer}}: A new test here would validate that it works with the HDFS-2856 style, setting {{dfs.data.transfer.protection}} instead of {{dfs.encrypt.data.transfer}}. # {{TestBalancerWithEncryptedTransfer}}: A new test here would validate that everything works correctly end-to-end with the balancer. # {{TestBalancerWithSaslDataTransfer}}: Same as #2, using the HDFS-2856 style with {{dfs.data.transfer.protection}} configured instead of {{dfs.encrypt.data.transfer}}. > Optimize HDFS Encrypted Transport performance > - > > Key: HDFS-6606 > URL: https://issues.apache.org/jira/browse/HDFS-6606 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs-client, security >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, > HDFS-6606.003.patch, OptimizeHdfsEncryptedTransportperformance.pdf > > > In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol, > it was a great work. 
> It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports > three security strengths: > * high 3des or rc4 (128bits) > * medium des or rc4(56bits) > * low rc4(40bits) > 3des and rc4 are slow, only *tens of MB/s*, > http://www.javamex.com/tutorials/cryptography/ciphers.shtml > http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ > I will give more detailed performance data in the future. It's absolutely a > bottleneck and will vastly affect the end-to-end performance. > AES (Advanced Encryption Standard) is recommended as a replacement for DES, > as it's more secure; with AES-NI support, the throughput can reach nearly > *2GB/s*, so it won't be the bottleneck any more. The AES and CryptoCodec work is > covered in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add > support for a new AES mode). > This JIRA will use AES with AES-NI support as the encryption algorithm for > DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
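For reference, an AES cipher of the kind the jira proposes is available through the standard JCE. The sketch below only illustrates an AES/CTR encrypt/decrypt round trip; the SASL negotiation and the CryptoCodec plumbing from HADOOP-10150/10603/10693 are not reproduced:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AesCtrSketch {
    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] data) throws Exception {
        // AES/CTR/NoPadding: counter mode, a stream cipher suitable for
        // wire encryption (ciphertext is the same length as plaintext).
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    // Returns true iff encryption changes the bytes and decryption restores them.
    static boolean roundTrips(String msg) {
        try {
            byte[] key = new byte[16]; // 128-bit key; all-zero for illustration only
            byte[] iv = new byte[16];
            byte[] plain = msg.getBytes(StandardCharsets.UTF_8);
            byte[] enc = crypt(Cipher.ENCRYPT_MODE, key, iv, plain);
            byte[] dec = crypt(Cipher.DECRYPT_MODE, key, iv, enc);
            return !Arrays.equals(plain, enc) && Arrays.equals(plain, dec);
        } catch (Exception e) {
            return false;
        }
    }
}
```

The ~2GB/s figure cited above depends on the provider using AES-NI; the plain JCE call shown here is slower but demonstrates the same cipher family.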
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123764#comment-14123764 ] Konstantin Shvachko commented on HDFS-3107: --- No, while under recovery the file has a lease so nobody can open it for append. Same as with lease recovery. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6727) Refresh data volumes on DataNode based on configuration changes
[ https://issues.apache.org/jira/browse/HDFS-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123758#comment-14123758 ] Hadoop QA commented on HDFS-6727: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666838/HDFS-6727.002.patch against trunk revision 71269f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.tracing.TestTracing org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7915//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7915//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7915//console This message is automatically generated. 
> Refresh data volumes on DataNode based on configuration changes > --- > > Key: HDFS-6727 > URL: https://issues.apache.org/jira/browse/HDFS-6727 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.5.0, 2.4.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Labels: datanode > Attachments: HDFS-6727.000.delta-HDFS-6775.txt, HDFS-6727.001.patch, > HDFS-6727.002.patch, HDFS-6727.combo.patch > > > HDFS-1362 requires DataNode to reload configuration file during the runtime, > so that DN can change the data volumes dynamically. This JIRA reuses the > reconfiguration framework introduced by HADOOP-7001 to enable DN to > reconfigure at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123750#comment-14123750 ] Haohui Mai commented on HDFS-6982: -- This is a nice feature, thanks, [~maysamyabandeh]! I have a couple of questions: # What are the minimal changes on the hadoop side to enable this functionality? If nntop goes for the second architecture, does it mean that there are no code changes required on the hadoop side? # Should the rolling window reside in the NN? I wonder whether the code should simply publish the metrics to Ganglia / Nagios, etc., and let these frameworks take care of the aggregation, plotting, etc. > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight about which users are sending the majority of each traffic type to > the name node. This information turns out to be the most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop, which has been in production at Twitter for the past 10 > months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K > nodes), a low memory footprint (less than a few MB), and an efficient > write path (only two hash lookups to update a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
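The "two hash lookups to update a metric" write path claimed in the description can be illustrated with a hypothetical time-bucketed counter map. This is a sketch of the idea, not the actual nntop implementation; bucket rotation and expiry are omitted:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

// Per (user, operation) rolling-window counters. Recording a call costs
// two hash lookups (user, then op) plus one atomic increment into a
// time-bucketed ring, keeping the NN write-path overhead low.
class RollingTopMetrics {
    static final int BUCKETS = 10;          // e.g. 10 x 6s buckets = 1-minute window
    static final long BUCKET_MILLIS = 6000;

    private final ConcurrentHashMap<String, ConcurrentHashMap<String, AtomicLongArray>>
        perUser = new ConcurrentHashMap<>();

    void record(String user, String op, long nowMillis) {
        perUser.computeIfAbsent(user, u -> new ConcurrentHashMap<>())   // lookup 1
               .computeIfAbsent(op, o -> new AtomicLongArray(BUCKETS))  // lookup 2
               .incrementAndGet((int) ((nowMillis / BUCKET_MILLIS) % BUCKETS));
    }

    // Read path (for the "top" report): sum all live buckets for a metric.
    long windowCount(String user, String op) {
        ConcurrentHashMap<String, AtomicLongArray> ops = perUser.get(user);
        AtomicLongArray a = (ops == null) ? null : ops.get(op);
        if (a == null) {
            return 0;
        }
        long sum = 0;
        for (int i = 0; i < BUCKETS; i++) {
            sum += a.get(i);
        }
        return sum;
    }
}
```

This also makes Haohui's second question concrete: the cheap part is the in-NN increment; the aggregation in {{windowCount}} is the piece that could instead be delegated to an external framework like Ganglia.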
[jira] [Updated] (HDFS-6877) Interrupt writes when the volume being written is removed.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6877: Attachment: HDFS-6877.001.combo.txt > Interrupt writes when the volume being written is removed. > -- > > Key: HDFS-6877 > URL: https://issues.apache.org/jira/browse/HDFS-6877 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.5.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-6877.000.consolidate.txt, > HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, > HDFS-6877.001.patch > > > It will be a race condition that a client is actively writing a block, while > the volume that this block is on is being removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6877) Interrupt writes when the volume being written is removed.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6877: Attachment: HDFS-6877.001.patch Update patch to: * Fix the order of deleting DatanodeStorage from {{FsDatasetImpl#storageMap}}. * Add timeout for functional tests. > Interrupt writes when the volume being written is removed. > -- > > Key: HDFS-6877 > URL: https://issues.apache.org/jira/browse/HDFS-6877 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.5.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-6877.000.consolidate.txt, > HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.patch > > > It will be a race condition that a client is actively writing a block, while > the volume that this block is on is being removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6981: Attachment: HDFS-6981.05.patch Updated patch with a marker file for each BlockPoolSliceStorage root when rolling upgrade is in progress. The presence of the marker file is used to determine whether or not to delete the 'previous' directory when the rolling upgrade is no longer in progress. > DN upgrade with layout version change should not use trash > -- > > Key: HDFS-6981 > URL: https://issues.apache.org/jira/browse/HDFS-6981 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: James Thomas >Assignee: Arpit Agarwal > Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, > HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch > > > Post HDFS-6800, we can encounter the following scenario: > # We start with DN software version -55 and initiate a rolling upgrade to > version -56 > # We delete some blocks, and they are moved to trash > # We roll back to DN software version -55 using the -rollback flag – since we > are running the old code (prior to this patch), we will restore the previous > directory but will not delete the trash > # We append to some of the blocks that were deleted in step 2 > # We then restart a DN that contains blocks that were appended to – since the > trash still exists, it will be restored at this point, the appended-to blocks > will be overwritten, and we will lose the appended data > So I think we need to avoid writing anything to the trash directory if we > have a previous directory. > Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
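The marker-file scheme described in the HDFS-6981 update can be sketched as follows. File and method names are illustrative, not the actual BlockPoolSliceStorage code from the patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: drop a marker file in each storage root when a rolling upgrade
// starts; on restart, the 'previous' directory may be discarded only when
// the marker is absent (i.e., no rolling upgrade is in progress).
class RollingUpgradeMarker {
    static final String MARKER = "rollingUpgrade.marker"; // illustrative name

    static void startRollingUpgrade(Path storageRoot) throws IOException {
        Files.createFile(storageRoot.resolve(MARKER));
    }

    // Returns true if 'previous' may be deleted for this storage root.
    static boolean mayDiscardPrevious(Path storageRoot) {
        return !Files.exists(storageRoot.resolve(MARKER));
    }

    static void endRollingUpgrade(Path storageRoot) throws IOException {
        Files.deleteIfExists(storageRoot.resolve(MARKER));
    }

    // Self-contained demo of the lifecycle, using a temp directory.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("bpSliceRoot");
            startRollingUpgrade(root);
            boolean kept = !mayDiscardPrevious(root);   // in progress: keep 'previous'
            endRollingUpgrade(root);
            return kept && mayDiscardPrevious(root);    // finished: safe to delete
        } catch (IOException e) {
            return false;
        }
    }
}
```

Keeping one marker per storage root means each volume decides independently, which matters when volumes are added or removed mid-upgrade.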
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123718#comment-14123718 ] Yongjun Zhang commented on HDFS-4239: - Thanks Jimmy. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Yongjun Zhang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Assignee: Yongjun Zhang > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Yongjun Zhang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123715#comment-14123715 ] Jimmy Xiang commented on HDFS-4239: --- Sure. Assigned it to you. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Yongjun Zhang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones
[ https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123703#comment-14123703 ] Andrew Wang commented on HDFS-6951: --- Charles, do you mind rebasing this? It doesn't apply for me: {noformat} -> % git apply -p0 HDFS-6951.004.patch error: patch failed: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeLayoutVersion.java:65 error: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeLayoutVersion.java: patch does not apply error: cannot apply binary patch to 'hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored' without full index line error: hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored: patch does not apply error: patch failed: hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml:1 error: hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml: patch does not apply {noformat} If you provide a {{git diff --binary}}, I can also apply that directly when doing the commit. > Saving namespace and restarting NameNode will remove existing encryption zones > -- > > Key: HDFS-6951 > URL: https://issues.apache.org/jira/browse/HDFS-6951 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: encryption >Affects Versions: 3.0.0 >Reporter: Stephen Chu >Assignee: Charles Lamb > Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, > HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, > HDFS-6951.004.patch, editsStored > > > Currently, when users save namespace and restart the NameNode, pre-existing > encryption zones will be wiped out. 
> I could reproduce this on a pseudo-distributed cluster: > * Create an encryption zone > * List encryption zones and verify the newly created zone is present > * Save the namespace > * Kill and restart the NameNode > * List the encryption zones and you'll find the encryption zone is missing > I've attached a test case for {{TestEncryptionZones}} that reproduces this as > well. Removing the saveNamespace call will get the test to pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123701#comment-14123701 ] Yongjun Zhang commented on HDFS-4239: - Hi [~jxiang], thanks for your earlier work on this issue. I wonder if you will have time to work on this? If not, do you mind if I take it over? Thanks. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Status: Open (was: Patch Available) > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4284) BlockReaderLocal not notified of failed disks
[ https://issues.apache.org/jira/browse/HDFS-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4284: -- Assignee: (was: Jimmy Xiang) > BlockReaderLocal not notified of failed disks > - > > Key: HDFS-4284 > URL: https://issues.apache.org/jira/browse/HDFS-4284 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0, 2.0.2-alpha >Reporter: Andy Isaacson > > When a DN marks a disk as bad, it stops using replicas on that disk. > However a long-running {{BlockReaderLocal}} instance will continue to access > replicas on the failing disk. > Somehow we should let the in-client BlockReaderLocal know that a disk has > been marked as bad so that it can stop reading from the bad disk. > From HDFS-4239: > bq. To rephrase that, a long running BlockReaderLocal will ride over local DN > restarts and disk "ejections". We had to drain the RS of all its regions in > order to stop it from using the bad disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Assignee: (was: Jimmy Xiang) > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123694#comment-14123694 ] Chris Nauroth commented on HDFS-6506: - Unfortunately, it appears this patch has gone stale. [~decster], would you mind updating the patch? [~djp], would you mind +1'ing a new patch quickly if you don't have any other feedback? I'm happy to take care of the commit if you're busy. It would be nice to get this in and hopefully put an end to the spurious failures in the balancer tests. Thanks! > Newly moved block replica been invalidated and deleted in TestBalancer > -- > > Key: HDFS-6506 > URL: https://issues.apache.org/jira/browse/HDFS-6506 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang > Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch > > > TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently > https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ > from the error log, the reason seems to be that newly moved block replicas > have been invalidated and deleted, so some of the balancer's work is reversed. 
> {noformat} > 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 > to 127.0.0.1:55468 through 127.0.0.1:49159 > 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) > - Successfully moved blk_1073741829_1005 with size=100 fr > 2014-06-06 18:15:54,706 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to > invalidated blocks set > 2014-06-06 18:15:54,709 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to > invalidated blocks set > 2014-06-06 
18:15:56,421 INFO BlockStateChange > (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask > 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] > 2014-06-06 18:15:57,717 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to > invalidated blocks set > 2014-06-06 18:15:57,720 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to > invalidated blocks set > 2014-06-06 18:15:57,721 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to > invalidated blocks set > 2014-06-06 18:15:57,722 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to > invalidated blocks set > 2014-06-06 18:15:57,723 INFO BlockStateChange > (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* > chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to > invalidated blocks set > 2014-06-06 18:15:59,422 INFO BlockStateChange > (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask > 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, > blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] > 2014-06-06 18:16:02,423 INFO BlockStateChange > (BlockManager.java:invalidat
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123683#comment-14123683 ] Haohui Mai commented on HDFS-6994: -- Thanks for posting the patch. It looks interesting. Do you want to separate the patch into sub-tasks so that it can go through the review process? Personally I think that this is an alternative implementation of libhdfs. I don't think we need to get rid of boost for now, but I think the code can be put in the contrib directory, which is not built by default but still allows other people to check it out if they're interested. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports > both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ at Pivotal. > I'd like to integrate libhdfs3 into the HDFS source code to benefit others. > You can find the libhdfs3 code on GitHub: > https://github.com/PivotalRD/libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123678#comment-14123678 ] stack commented on HDFS-6999: - Any chance of your having the particular combination that brings on the infinite loop [~yangjiandan]? Can you reproduce at all? Thanks. > PacketReceiver#readChannelFully is in an infinite loop > -- > > Key: HDFS-6999 > URL: https://issues.apache.org/jira/browse/HDFS-6999 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 2.4.1 >Reporter: Yang Jiandan >Priority: Critical > > In our cluster, we found an HBase handler may never return when it reads an HDFS > file using RemoteBlockReader2, and the handler thread occupies 100% CPU. We > found this is because PacketReceiver#readChannelFully is in an infinite loop; > the following while loop never breaks. > {code:java} > while (buf.remaining() > 0) { > int n = ch.read(buf); > if (n < 0) { > throw new IOException("Premature EOF reading from " + ch); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
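The quoted loop only exits when {{read}} returns a negative value, so a channel that keeps returning 0 bytes (for example, a socket channel that has ended up in non-blocking mode) spins forever at 100% CPU, which matches the reported symptom. Below is a self-contained sketch of a defensive variant; the zero-read bail-out is just one possible mitigation shown for illustration, not the actual HDFS fix, and the names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ReadFully {
    // Defensive variant of readChannelFully: the original loop only breaks on
    // n < 0, so a channel that repeatedly returns 0 bytes never makes progress
    // and never throws. Counting consecutive zero-byte reads lets us fail fast
    // instead of spinning.
    static void readChannelFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int zeroReads = 0;
        while (buf.remaining() > 0) {
            int n = ch.read(buf);
            if (n < 0) {
                throw new IOException("Premature EOF reading from " + ch);
            }
            if (n == 0) {
                // Hypothetical guard: give up after many consecutive empty reads.
                if (++zeroReads > 1000) {
                    throw new IOException("Channel returned 0 bytes " + zeroReads
                        + " times in a row; giving up to avoid spinning");
                }
            } else {
                zeroReads = 0; // progress was made, reset the counter
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes();
        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data));
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        readChannelFully(ch, buf);
        System.out.println(new String(buf.array()));
    }
}
```

A blocking channel never legitimately returns 0 for a non-empty buffer, so the guard only trips in the pathological non-blocking case.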
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123676#comment-14123676 ] Chris Nauroth commented on HDFS-6898: - Hi, [~arpitagarwal]. The patch looks great. I have just one comment. In {{FsVolumeImpl#releaseReservedSpace}}, the failsafe logic could be subject to a data race. If the {{addAndGet}} results in a negative value, and then another thread calls {{reserveSpaceForRbw}} before the reset to 0 executes, then we'd lose that second thread's reservation. Another approach might be to use a loop that calculates the new value (or 0) and makes a single call to {{compareAndSet}}, repeating until successful. > DN must reserve space for a full block when an RBW block is created > --- > > Key: HDFS-6898 > URL: https://issues.apache.org/jira/browse/HDFS-6898 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Gopal V >Assignee: Arpit Agarwal > Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, > HDFS-6898.04.patch, HDFS-6898.05.patch > > > DN will successfully create two RBW blocks on the same volume even if the > free space is sufficient for just one full block. > One or both block writers may subsequently get a DiskOutOfSpace exception. > This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
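The compareAndSet loop suggested in the comment above can be sketched in isolation. This is a minimal illustration of the pattern, assuming an {{AtomicLong}} reservation counter; the class and method names are hypothetical stand-ins, not the actual {{FsVolumeImpl}} code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the race-free release: instead of addAndGet followed by a
// separate reset to 0 (which can lose a concurrent reservation), compute the
// clamped new value and publish it with a single compareAndSet, retrying on
// contention.
public class ReservationCounter {
    private final AtomicLong reservedForRbw = new AtomicLong(0);

    public void reserve(long bytes) {
        reservedForRbw.addAndGet(bytes);
    }

    public void release(long bytes) {
        for (;;) {
            long current = reservedForRbw.get();
            long next = Math.max(0, current - bytes); // clamp at 0, never negative
            if (reservedForRbw.compareAndSet(current, next)) {
                return; // single atomic publish; no window for a lost update
            }
            // another thread changed the counter; recompute and retry
        }
    }

    public long get() {
        return reservedForRbw.get();
    }

    public static void main(String[] args) {
        ReservationCounter c = new ReservationCounter();
        c.reserve(100);
        c.release(150);              // over-release clamps to 0 atomically
        System.out.println(c.get()); // 0
        c.reserve(50);               // a later reservation is not lost
        System.out.println(c.get()); // 50
    }
}
```

The key property is that the clamp and the write happen in one atomic step, so a reservation arriving between them cannot be wiped out by a stale reset.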
[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6986: Attachment: HDFS-6986-20140905-v3.patch Comparing 2 token objects directly instead of comparing their identifiers, for stronger verification. > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905-v2.patch, > HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6948) DN rejects blocks if it has older UC block
[ https://issues.apache.org/jira/browse/HDFS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HDFS-6948: - Status: Patch Available (was: Open) > DN rejects blocks if it has older UC block > -- > > Key: HDFS-6948 > URL: https://issues.apache.org/jira/browse/HDFS-6948 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Eric Payne > Attachments: HDFS-6948.201409052147.txt > > > DNs appear to always reject blocks, even with newer genstamps, if they already > have a UC copy in their tmp dir. > {noformat}ReplicaAlreadyExistsException: Block > XXX already > exists in state TEMPORARY and thus cannot be created{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6948) DN rejects blocks if it has older UC block
[ https://issues.apache.org/jira/browse/HDFS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HDFS-6948: - Attachment: HDFS-6948.201409052147.txt > DN rejects blocks if it has older UC block > -- > > Key: HDFS-6948 > URL: https://issues.apache.org/jira/browse/HDFS-6948 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Eric Payne > Attachments: HDFS-6948.201409052147.txt > > > DNs appear to always reject blocks, even with newer genstamps, if they already > have a UC copy in their tmp dir. > {noformat}ReplicaAlreadyExistsException: Block > XXX already > exists in state TEMPORARY and thus cannot be created{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6986: Attachment: HDFS-6986-20140905-v2.patch Stronger test case in the new patch. Thanks [~tucu00] for the suggestion. > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905-v2.patch, HDFS-6986-20140905.patch, > HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123627#comment-14123627 ] Colin Patrick McCabe commented on HDFS-6994: Hi Zhanwei, this is really interesting. As Wenwu mentioned, there's already a branch where we're working on a native client. It would be nice if we could integrate this with that work somehow. I'm not sure what form that should take. Did you get a chance to read the design doc on HADOOP-10388? There are a few important issues to address before this can replace libhdfs. We need the ability to fall back to the JNI code when necessary-- for example, in the case where HDFS is using encryption, and we don't have native client support for that. But we don't want a hard library dependency on libjvm.so-- it should be dynamically loaded. It's good that you are using the existing hdfs.h interface. getLastError seems like it could be a good addition as well... when using thread-local data for the string. The dependencies here are problematic. libxml2 is not fully thread-safe, and it pulls in a lot of GNOME stuff we don't really want. The boost dependency creates problems as well. For example, Impala depends on a certain version of boost-- if this library pulls in a different version, bad things happen. GnuTls is LGPL, which makes it difficult to ship. I would have to be -1 just based on the dependencies alone... We also have duplicated protobuf files in this patch. We should simply use the protobuf files already in the source tree. If I could summarize my first thoughts: * get rid of boost, including all boost ifdefs * don't worry about earlier RPC versions... we only need to support RPCv9 now (same as Java client code policy in Hadoop) * use libexpat or something instead of libxml2 This is good work overall and hopefully there is stuff we can use here. 
> libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client >Reporter: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports > both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ at Pivotal. > I'd like to integrate libhdfs3 into the HDFS source code to benefit others. > You can find the libhdfs3 code on GitHub: > https://github.com/PivotalRD/libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123618#comment-14123618 ] Yongjun Zhang commented on HDFS-6621: - Thanks [~andrew.wang]. Hi [~ravwojdyla], a couple more questions. You said that the old code {{will notify all scheduling threads, even the ones that are waiting and still have all 5 transfer threads occupied}}. Would you please explain how your fix for problem 2 detects the scheduling threads that still have 5 transfer threads occupied, so as not to notify them? BTW, have you tried testing with the fix for problem 1 only? Or do you have to apply the fixes for both problems 1 and 2 to see it work? Thanks. > Hadoop Balancer prematurely exits iterations > > > Key: HDFS-6621 > URL: https://issues.apache.org/jira/browse/HDFS-6621 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0, 2.4.0 > Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop > 2.4.0 >Reporter: Benjamin Bowman > Labels: balancer > Attachments: HDFS-6621.patch, HDFS-6621.patch_2 > > > I have been having an issue with the balancing being too slow. The issue was > not with the speed with which blocks were moved, but rather the balancer > would prematurely exit out of its balancing iterations. It would move ~10 > blocks or 100 MB then exit the current iteration (in which it said it was > planning on moving about 10 GB). > I looked in the Balancer.java code and believe I found and solved the issue. > In the dispatchBlocks() function there is a variable, > "noPendingBlockIteration", which counts the number of iterations in which a > pending block to move cannot be found. Once this number gets to 5, the > balancer exits the overall balancing iteration. I believe the desired > functionality is 5 consecutive no pending block iterations - however this > variable is never reset to 0 upon block moves. 
So once this number reaches 5 > - even if there have been thousands of blocks moved in between these no > pending block iterations - the overall balancing iteration will prematurely > end. > The fix I applied was to set noPendingBlockIteration = 0 when a pending block > is found and scheduled. In this way, my iterations do not prematurely exit > unless there are 5 consecutive no-pending-block iterations. Below is a copy > of my dispatchBlocks() function with the change I made. > {code} > private void dispatchBlocks() { > long startTime = Time.now(); > long scheduledSize = getScheduledSize(); > this.blocksToReceive = 2*scheduledSize; > boolean isTimeUp = false; > int noPendingBlockIteration = 0; > while(!isTimeUp && getScheduledSize()>0 && > (!srcBlockList.isEmpty() || blocksToReceive>0)) { > PendingBlockMove pendingBlock = chooseNextBlockToMove(); > if (pendingBlock != null) { > noPendingBlockIteration = 0; > // move the block > pendingBlock.scheduleBlockMove(); > continue; > } > /* Since we can not schedule any block to move, > * filter any moved blocks from the source block list and > * check if we should fetch more blocks from the namenode > */ > filterMovedBlocks(); // filter already moved blocks > if (shouldFetchMoreBlocks()) { > // fetch new blocks > try { > blocksToReceive -= getBlockList(); > continue; > } catch (IOException e) { > LOG.warn("Exception while getting block list", e); > return; > } > } else { > // source node cannot find a pendingBlockToMove, iteration +1 > noPendingBlockIteration++; > // in case no blocks can be moved for source node's task, > // jump out of while-loop after 5 iterations. > if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) { > setScheduledSize(0); > } > } > // check if time is up or not > if (Time.now()-startTime > MAX_ITERATION_TIME) { > isTimeUp = true; > continue; > } > /* Now we can not schedule any block to move and there are > * no new blocks added to the source block list, so we wait. 
> */ > try { > synchronized(Balancer.this) { > Balancer.this.wait(1000); // wait for targets/sources to be idle > } > } catch (InterruptedException ignored) { > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
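The effect of the one-line fix quoted above (resetting noPendingBlockIteration whenever a pending block is scheduled) can be modeled with a small standalone counter, showing why the reset turns "5 misses total" into "5 consecutive misses". The names and the simulated schedule are illustrative, not actual Balancer code:

```java
// Minimal model of the reported bug: without the reset, 5 scattered
// no-pending-block iterations end the whole balancing iteration early, even
// when many blocks were successfully moved in between.
public class ConsecutiveFailureCounter {
    static final int MAX_NO_PENDING_BLOCK_ITERATIONS = 5;

    // Returns how many iterations run before giving up, given a schedule of
    // which iterations find a pending block (true = a block move is scheduled).
    static int iterationsBeforeExit(boolean[] blockFound, boolean resetOnSuccess) {
        int noPending = 0;
        for (int i = 0; i < blockFound.length; i++) {
            if (blockFound[i]) {
                if (resetOnSuccess) noPending = 0; // the fix: misses must be consecutive
            } else {
                if (++noPending >= MAX_NO_PENDING_BLOCK_ITERATIONS) return i + 1;
            }
        }
        return blockFound.length; // the full run completes
    }

    public static void main(String[] args) {
        // Simulated schedule: successful moves on even iterations, misses on odd.
        boolean[] run = new boolean[20];
        for (int i = 0; i < run.length; i++) run[i] = (i % 2 == 0);
        // Without the reset, the 5th scattered miss (iteration 10) ends the run:
        System.out.println(iterationsBeforeExit(run, false));
        // With the reset, the misses are never consecutive and the run completes:
        System.out.println(iterationsBeforeExit(run, true));
    }
}
```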
[jira] [Updated] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated HDFS-6982: -- Attachment: HDFS-6982.v2.patch > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight about which users are sending majority of each traffic type to > the name node. This information turns out to be the most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop which has been in production at Twitter in the past 10 > months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K > nodes), low memory footprint (less than a few MB), and quite efficient for > the write path (only two hash lookup for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123610#comment-14123610 ] Jing Zhao edited comment on HDFS-6584 at 9/5/14 9:27 PM: - Upload a consolidated patch to run Jenkins. was (Author: jingzhao): Upload a consolidated patch to trigger the Jenkins. > Support Archival Storage > > > Key: HDFS-6584 > URL: https://issues.apache.org/jira/browse/HDFS-6584 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer, namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: HDFS-6584.000.patch, > HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf > > > In most of the Hadoop clusters, as more and more data is stored for longer > time, the demand for storage is outstripping the compute. Hadoop needs a cost > effective and easy to manage solution to meet this demand for storage. > Current solution is: > - Delete the old unused data. This comes at operational cost of identifying > unnecessary data and deleting them manually. > - Add more nodes to the clusters. This adds along with storage capacity > unnecessary compute capacity to the cluster. > Hadoop needs a solution to decouple growing storage capacity from compute > capacity. Nodes with higher density and less expensive storage with low > compute power are becoming available and can be used as cold storage in the > clusters. Based on policy the data from hot storage can be moved to cold > storage. Adding more nodes to the cold storage can grow the storage > independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6584: Status: Patch Available (was: Open) > Support Archival Storage > > > Key: HDFS-6584 > URL: https://issues.apache.org/jira/browse/HDFS-6584 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer, namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: HDFS-6584.000.patch, > HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf > > > In most of the Hadoop clusters, as more and more data is stored for longer > time, the demand for storage is outstripping the compute. Hadoop needs a cost > effective and easy to manage solution to meet this demand for storage. > Current solution is: > - Delete the old unused data. This comes at operational cost of identifying > unnecessary data and deleting them manually. > - Add more nodes to the clusters. This adds along with storage capacity > unnecessary compute capacity to the cluster. > Hadoop needs a solution to decouple growing storage capacity from compute > capacity. Nodes with higher density and less expensive storage with low > compute power are becoming available and can be used as cold storage in the > clusters. Based on policy the data from hot storage can be moved to cold > storage. Adding more nodes to the cold storage can grow the storage > independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6584: Attachment: HDFS-6584.000.patch Upload a consolidated patch to trigger the Jenkins. > Support Archival Storage > > > Key: HDFS-6584 > URL: https://issues.apache.org/jira/browse/HDFS-6584 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer, namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: HDFS-6584.000.patch, > HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf > > > In most of the Hadoop clusters, as more and more data is stored for longer > time, the demand for storage is outstripping the compute. Hadoop needs a cost > effective and easy to manage solution to meet this demand for storage. > Current solution is: > - Delete the old unused data. This comes at operational cost of identifying > unnecessary data and deleting them manually. > - Add more nodes to the clusters. This adds along with storage capacity > unnecessary compute capacity to the cluster. > Hadoop needs a solution to decouple growing storage capacity from compute > capacity. Nodes with higher density and less expensive storage with low > compute power are becoming available and can be used as cold storage in the > clusters. Based on policy the data from hot storage can be moved to cold > storage. Adding more nodes to the cold storage can grow the storage > independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123604#comment-14123604 ] Jing Zhao commented on HDFS-3107: - While the file remains in the UNDER_RECOVERY state during truncation, can the file still be appended? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard POSIX operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reassigned HDFS-3107: - Assignee: Plamen Jeliazkov Nicholas in [his comment|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=13235941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235941] proposed three approaches to implement truncate. Here is another one, which was mentioned in [this comment|https://issues.apache.org/jira/browse/HDFS-6087?focusedCommentId=13948814&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13948814] of HDFS-6087. Conceptually, truncate removes all full blocks and then starts a recovery process for the last block, which is not fully truncated. The truncate recovery is similar to lease recovery. That is, NN sends a truncate-DatanodeCommand to one of the DNs containing block replicas. The primary DN synchronizes the new length between replicas, and then sends commitBlockSynchronization() to NN, which completes the truncate. Truncate will work only for closed files. If the file is open for write, an attempt to truncate fails. Here are the truncate steps in more detail:
- NN receives a truncate(src, newLength) call from a client.
- Full blocks are deleted instantaneously. If there is nothing more to truncate, NN returns success to the client.
- If newLength is not on a block boundary, NN converts the INode to INodeUnderConstruction and sets the file length to newLength.
- The last block's state is set to BEING_TRUNCATED.
- The truncate operation is persisted in the editLog.
- NN triggers last block length recovery by sending a DatanodeCommand and waits for the DN to report back.
- The file remains UNDER_RECOVERY until the recovery completes.
- Lease expiration (soft or hard) will trigger last block recovery for truncate.
- If NN restarts, it will restart the recovery.
Assigning to Plamen, as he seems to be almost ready with the patch. 
> HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
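The recovery decision in the reassignment comment above hinges on whether newLength falls on a block boundary: full blocks are simply dropped, while a partially truncated last block forces under-construction conversion and block recovery. A minimal sketch of that arithmetic (hypothetical helper names, not the actual NameNode code):

```java
/** Sketch of the block-boundary decision described above
 *  (hypothetical helper, not actual NameNode code). */
public class TruncateMath {
    /** Number of whole blocks that survive a truncate to newLength. */
    public static long fullBlocksRemaining(long newLength, long blockSize) {
        return newLength / blockSize;
    }

    /** True if the last block is only partially truncated, so the NN must
     *  convert the INode to under-construction and run block recovery. */
    public static boolean needsBlockRecovery(long newLength, long blockSize) {
        return newLength % blockSize != 0;
    }
}
```

With a 128 MB block size, truncating to 256 MB needs no recovery, while truncating to 200 MB leaves a partial last block that must be recovered.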
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123578#comment-14123578 ] Alejandro Abdelnur commented on HDFS-6986: -- can we make the test stronger, asserting that we are getting the token set by the mock? > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123557#comment-14123557 ] Andrew Wang commented on HDFS-6621: --- I took a quick look at this, and I'm wondering about the change to {{notifyAll}} on a Source rather than Dispatcher. I don't see anything waiting on a Source, so this change essentially makes it into a no-op. I haven't looked into the Balancer deeply enough to figure out the right change though, so maybe someone else can comment. > Hadoop Balancer prematurely exits iterations > > > Key: HDFS-6621 > URL: https://issues.apache.org/jira/browse/HDFS-6621 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0, 2.4.0 > Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop > 2.4.0 >Reporter: Benjamin Bowman > Labels: balancer > Attachments: HDFS-6621.patch, HDFS-6621.patch_2 > > > I have been having an issue with the balancing being too slow. The issue was > not with the speed with which blocks were moved, but rather the balancer > would prematurely exit out of it's balancing iterations. It would move ~10 > blocks or 100 MB then exit the current iteration (in which it said it was > planning on moving about 10 GB). > I looked in the Balancer.java code and believe I found and solved the issue. > In the dispatchBlocks() function there is a variable, > "noPendingBlockIteration", which counts the number of iterations in which a > pending block to move cannot be found. Once this number gets to 5, the > balancer exits the overall balancing iteration. I believe the desired > functionality is 5 consecutive no pending block iterations - however this > variable is never reset to 0 upon block moves. So once this number reaches 5 > - even if there have been thousands of blocks moved in between these no > pending block iterations - the overall balancing iteration will prematurely > end. 
> The fix I applied was to set noPendingBlockIteration = 0 when a pending block
> is found and scheduled. In this way, my iterations do not prematurely exit
> unless there are 5 consecutive no pending block iterations. Below is a copy
> of my dispatchBlocks() function with the change I made.
> {code}
> private void dispatchBlocks() {
>   long startTime = Time.now();
>   long scheduledSize = getScheduledSize();
>   this.blocksToReceive = 2*scheduledSize;
>   boolean isTimeUp = false;
>   int noPendingBlockIteration = 0;
>   while(!isTimeUp && getScheduledSize()>0 &&
>         (!srcBlockList.isEmpty() || blocksToReceive>0)) {
>     PendingBlockMove pendingBlock = chooseNextBlockToMove();
>     if (pendingBlock != null) {
>       noPendingBlockIteration = 0;
>       // move the block
>       pendingBlock.scheduleBlockMove();
>       continue;
>     }
>     /* Since we can not schedule any block to move,
>      * filter any moved blocks from the source block list and
>      * check if we should fetch more blocks from the namenode
>      */
>     filterMovedBlocks(); // filter already moved blocks
>     if (shouldFetchMoreBlocks()) {
>       // fetch new blocks
>       try {
>         blocksToReceive -= getBlockList();
>         continue;
>       } catch (IOException e) {
>         LOG.warn("Exception while getting block list", e);
>         return;
>       }
>     } else {
>       // source node cannot find a pendingBlockToMove, iteration +1
>       noPendingBlockIteration++;
>       // in case no blocks can be moved for source node's task,
>       // jump out of while-loop after 5 iterations.
>       if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
>         setScheduledSize(0);
>       }
>     }
>     // check if time is up or not
>     if (Time.now()-startTime > MAX_ITERATION_TIME) {
>       isTimeUp = true;
>       continue;
>     }
>     /* Now we can not schedule any block to move and there are
>      * no new blocks added to the source block list, so we wait.
>      */
>     try {
>       synchronized(Balancer.this) {
>         Balancer.this.wait(1000); // wait for targets/sources to be idle
>       }
>     } catch (InterruptedException ignored) {
>     }
>   }
> }
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
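The effect of resetting noPendingBlockIteration on progress can be seen in a small standalone simulation (hypothetical code, not Balancer internals; true means a pending block was found and moved, false means a no-pending iteration):

```java
/** Standalone simulation of the noPendingBlockIteration counter discussed in
 *  HDFS-6621 (hypothetical, not Balancer code). Returns true if the loop
 *  would bail out early, i.e. reach the no-pending threshold. */
public class CounterSim {
    static final int MAX_NO_PENDING = 5;

    public static boolean bailsOut(boolean[] iterations, boolean resetOnMove) {
        int noPending = 0;
        for (boolean moved : iterations) {
            if (moved) {
                if (resetOnMove) noPending = 0;  // the fix: reset on progress
            } else if (++noPending >= MAX_NO_PENDING) {
                return true;                     // would call setScheduledSize(0)
            }
        }
        return false;
    }
}
```

With four no-pending iterations, one successful move, then four more no-pending iterations, the unpatched behavior bails out (the counter reaches 5 across the move) while the patched behavior does not, matching the reporter's diagnosis.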
[jira] [Commented] (HDFS-7001) Tests in TestTracing depends on the order of execution
[ https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123552#comment-14123552 ] Hadoop QA commented on HDFS-7001: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666871/HDFS-7001-0.patch against trunk revision 7a62515. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7916//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7916//console This message is automatically generated. > Tests in TestTracing depends on the order of execution > -- > > Key: HDFS-7001 > URL: https://issues.apache.org/jira/browse/HDFS-7001 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-7001-0.patch > > > o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed > first. It should be done in BeforeClass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123530#comment-14123530 ] Zhe Zhang commented on HDFS-6986: - The test case in the new patch mocks a key provider which returns an empty token. > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6986: Attachment: HDFS-6986-20140905.patch > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986-20140905.patch, HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7006) Test encryption zones with KMS
[ https://issues.apache.org/jira/browse/HDFS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7006: --- Summary: Test encryption zones with KMS (was: Test encryption zones with MKS) > Test encryption zones with KMS > -- > > Key: HDFS-7006 > URL: https://issues.apache.org/jira/browse/HDFS-7006 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.6.0 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: HDFS-7006.patch > > > We should test EZs with KMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7006) Test encryption zones with MKS
[ https://issues.apache.org/jira/browse/HDFS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-7006: - Attachment: HDFS-7006.patch > Test encryption zones with MKS > -- > > Key: HDFS-7006 > URL: https://issues.apache.org/jira/browse/HDFS-7006 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.6.0 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: HDFS-7006.patch > > > We should test EZs with KMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7006) Test encryption zones with MKS
Alejandro Abdelnur created HDFS-7006: Summary: Test encryption zones with MKS Key: HDFS-7006 URL: https://issues.apache.org/jira/browse/HDFS-7006 Project: Hadoop HDFS Issue Type: Test Components: security, test Affects Versions: 2.6.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur We should test EZs with KMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7005) DFS input streams do not timeout
Daryn Sharp created HDFS-7005: - Summary: DFS input streams do not timeout Key: HDFS-7005 URL: https://issues.apache.org/jira/browse/HDFS-7005 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Input streams lost their timeout. The problem appears to be {{DFSClient#newConnectedPeer}} does not set the read timeout. During a temporary network interruption the server will close the socket, unbeknownst to the client host, which blocks on a read forever. The results are dire. Services such as the RM, JHS, NMs, oozie servers, etc all need to be restarted to recover - unless you want to wait many hours for the tcp stack keepalive to detect the broken socket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
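The failure mode Daryn describes corresponds to a missing socket read timeout: without one, read() blocks indefinitely when the remote side goes silent. A self-contained illustration using plain java.net sockets (not the DFSClient or Peer API) — the server accepts the connection but never writes, so only setSoTimeout saves the reader:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Demonstrates why a read timeout matters: without setSoTimeout the
 *  client read below would block forever, as described in HDFS-7005. */
public class ReadTimeoutDemo {
    public static boolean readTimesOut(int timeoutMs) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(timeoutMs);      // analogue of the missing read timeout
            try {
                client.getInputStream().read();  // no data ever arrives
                return false;
            } catch (SocketTimeoutException expected) {
                return true;                     // read aborted after timeoutMs
            }
        }
    }
}
```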
[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider
[ https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123488#comment-14123488 ] Alejandro Abdelnur commented on HDFS-6986: -- I've tested the provided patch in a real cluster and it works as advertised. Please add testcase and we are good to go. > DistributedFileSystem must get delegation tokens from configured KeyProvider > > > Key: HDFS-6986 > URL: https://issues.apache.org/jira/browse/HDFS-6986 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Alejandro Abdelnur >Assignee: Zhe Zhang > Attachments: HDFS-6986.patch > > > {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides > delegation tokens. {{DistributedFileSystem}} should augment the HDFS > delegation tokens with the keyprovider ones so tasks can interact with > keyprovider when it is a client/server impl (KMS). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()
[ https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123482#comment-14123482 ] Colin Patrick McCabe commented on HDFS-6841: In {{DatanodeInfo#getDatanodeReport}}, we translate {{DatanodeInfo#lastUpdate}} into a date: {code} buffer.append("Last contact: "+new Date(lastUpdate)+"\n"); {code} This is not going to work if {{lastUpdate}} is a monotonic time. The easiest way to solve this is to maintain another {{long}} with the wall-clock time, which we set to the current wall-clock time whenever an update occurs. That way we get the benefits of calculating staleness and deadness based on monotonic time, but also reasonable information in {{getDatanodeReport}}. {{FSNamesystem#reached}}: The same issue occurs here. {code} if (reached > 0) resText += " Threshold was reached " + new Date(reached) + "."; {code} {{EditLogTailer#lastLoadTimestamp}}: can we rename this to {{lastLoadTimeMs}}? It is not a timestamp (those come from the wall clock, generally.) We could probably get rid of {{EditLogTailer#getLastLoadTimestamp}} since the only use is in {{FSNamesystem#getMillisSinceLastLoadedEdits}}. All we need is a function which returns the amount of time since the edits were last loaded. > Use Time.monotonicNow() wherever applicable instead of Time.now() > - > > Key: HDFS-6841 > URL: https://issues.apache.org/jira/browse/HDFS-6841 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-6841-001.patch, HDFS-6841-002.patch > > > {{Time.now()}} used in many places to calculate elapsed time. > This should be replaced with {{Time.monotonicNow()}} to avoid effect of > System time changes on elapsed time calculations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
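Colin's suggestion — keep a monotonic value for interval math and a separate wall-clock value purely for display — can be sketched as follows (hypothetical class and field names, not the actual DatanodeInfo code):

```java
import java.util.Date;

/** Sketch of maintaining both clocks, per the HDFS-6841 discussion:
 *  monotonic time for elapsed-time math, wall-clock time only for
 *  human-readable reporting. Hypothetical names. */
public class LastContact {
    private long lastUpdateMonotonicNs;  // from System.nanoTime()
    private long lastUpdateWallMs;       // from System.currentTimeMillis()

    public void recordUpdate() {
        lastUpdateMonotonicNs = System.nanoTime();
        lastUpdateWallMs = System.currentTimeMillis();
    }

    /** Immune to wall-clock jumps: staleness/deadness would use this. */
    public long millisSinceUpdate() {
        return (System.nanoTime() - lastUpdateMonotonicNs) / 1_000_000L;
    }

    /** Human-readable, as in getDatanodeReport(): uses the wall clock. */
    public String report() {
        return "Last contact: " + new Date(lastUpdateWallMs);
    }
}
```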
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123471#comment-14123471 ] James Thomas commented on HDFS-6981: The marker file sounds like the best solution to me. > DN upgrade with layout version change should not use trash > -- > > Key: HDFS-6981 > URL: https://issues.apache.org/jira/browse/HDFS-6981 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: James Thomas >Assignee: Arpit Agarwal > Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, > HDFS-6981.03.patch, HDFS-6981.04.patch > > > Post HDFS-6800, we can encounter the following scenario: > # We start with DN software version -55 and initiate a rolling upgrade to > version -56 > # We delete some blocks, and they are moved to trash > # We roll back to DN software version -55 using the -rollback flag – since we > are running the old code (prior to this patch), we will restore the previous > directory but will not delete the trash > # We append to some of the blocks that were deleted in step 2 > # We then restart a DN that contains blocks that were appended to – since the > trash still exists, it will be restored at this point, the appended-to blocks > will be overwritten, and we will lose the appended data > So I think we need to avoid writing anything to the trash directory if we > have a previous directory. > Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6843) Create FileStatus isEncrypted() method
[ https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6843: --- Attachment: HDFS-6843.005.patch Resubmitting to see if weird testpatch errors go away. Previous run was as if the patch never got applied. > Create FileStatus isEncrypted() method > -- > > Key: HDFS-6843 > URL: https://issues.apache.org/jira/browse/HDFS-6843 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, security >Affects Versions: 3.0.0 >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: HDFS-6843.001.patch, HDFS-6843.002.patch, > HDFS-6843.003.patch, HDFS-6843.004.patch, HDFS-6843.005.patch, > HDFS-6843.005.patch > > > FileStatus should have a 'boolean isEncrypted()' method. (it was in the > context of discussing with AndreW about FileStatus being a Writable). > Having this method would allow MR JobSubmitter do the following: > - > BOOLEAN intermediateEncryption = false > IF jobconf.contains("mr.intermidate.encryption") THEN > intermediateEncryption = jobConf.getBoolean("mr.intermidate.encryption") > ELSE > IF (I/O)Format INSTANCEOF File(I/O)Format THEN > intermediateEncryption = ANY File(I/O)Format HAS a Path with status > isEncrypted()==TRUE > FI > jobConf.setBoolean("mr.intermidate.encryption", intermediateEncryption) > FI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
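The JobSubmitter pseudocode in the description reduces to: an explicit conf setting wins; otherwise intermediate encryption is inferred if any input/output path reports isEncrypted() == true. A plain-Java rendering (hypothetical types standing in for JobConf and FileStatus; not MR code):

```java
import java.util.List;

/** Sketch of the JobSubmitter decision from HDFS-6843's description.
 *  A nullable Boolean stands in for the optional conf key, and the list
 *  stands in for the paths' FileStatus.isEncrypted() results. */
public class IntermediateEncryption {
    public static boolean decide(Boolean explicitSetting, List<Boolean> pathIsEncrypted) {
        if (explicitSetting != null) {
            return explicitSetting;  // conf key already set: honor it
        }
        // ANY File(I/O)Format path with isEncrypted() == true forces encryption
        return pathIsEncrypted.stream().anyMatch(Boolean::booleanValue);
    }
}
```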
[jira] [Updated] (HDFS-7001) Tests in TestTracing depends on the order of execution
[ https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7001: --- Attachment: HDFS-7001-0.patch attaching patch. > Tests in TestTracing depends on the order of execution > -- > > Key: HDFS-7001 > URL: https://issues.apache.org/jira/browse/HDFS-7001 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-7001-0.patch > > > o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed > first. It should be done in BeforeClass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7001) Tests in TestTracing depends on the order of execution
[ https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7001: --- Status: Patch Available (was: Open) > Tests in TestTracing depends on the order of execution > -- > > Key: HDFS-7001 > URL: https://issues.apache.org/jira/browse/HDFS-7001 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-7001-0.patch > > > o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed > first. It should be done in BeforeClass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123413#comment-14123413 ] Arpit Agarwal edited comment on HDFS-6981 at 9/5/14 7:12 PM: - Lacking an explicit finalize command for rolling upgrade, it is hard for the DN to determine when to delete 'previous'. Rolling upgrade is signaled by the presence/absence of RollingUpgradeStatus in the heartbeat response. Without modifying the NN, one solution is that the DN creates a marker file when rolling upgrade is signaled by NN. When rolling upgrade is no longer signaled by NN, 'previous' is cleaned up only if the marker file is present. Else a regular upgrade is in progress and 'previous' is left alone. I am wary of making NN changes, the interaction with HA is complex enough as it is. was (Author: arpitagarwal): Lacking an explicit finalize command for rolling upgrade, it is hard for the DN to determine when to delete 'previous'. Rolling upgrade is signaled by the presence/absence of RollingUpgradeInfo in the heartbeat response. Without modifying the NN, one solution is that the DN creates a marker file when rolling upgrade is signaled by NN. When rolling upgrade is no longer signaled by NN, 'previous' is cleaned up only if the marker file is present. Else a regular upgrade is in progress and 'previous' is left alone. I am wary of making NN changes, the interaction with HA is complex enough as it is. 
> DN upgrade with layout version change should not use trash > -- > > Key: HDFS-6981 > URL: https://issues.apache.org/jira/browse/HDFS-6981 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: James Thomas >Assignee: Arpit Agarwal > Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, > HDFS-6981.03.patch, HDFS-6981.04.patch > > > Post HDFS-6800, we can encounter the following scenario: > # We start with DN software version -55 and initiate a rolling upgrade to > version -56 > # We delete some blocks, and they are moved to trash > # We roll back to DN software version -55 using the -rollback flag – since we > are running the old code (prior to this patch), we will restore the previous > directory but will not delete the trash > # We append to some of the blocks that were deleted in step 2 > # We then restart a DN that contains blocks that were appended to – since the > trash still exists, it will be restored at this point, the appended-to blocks > will be overwritten, and we will lose the appended data > So I think we need to avoid writing anything to the trash directory if we > have a previous directory. > Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123413#comment-14123413 ] Arpit Agarwal commented on HDFS-6981: - Lacking an explicit finalize command for rolling upgrade, it is hard for the DN to determine when to delete 'previous'. Rolling upgrade is signaled by the presence/absence of RollingUpgradeInfo in the heartbeat response. Without modifying the NN, one solution is that the DN creates a marker file when rolling upgrade is signaled by NN. When rolling upgrade is no longer signaled by NN, 'previous' is cleaned up only if the marker file is present. Else a regular upgrade is in progress and 'previous' is left alone. I am wary of making NN changes, the interaction with HA is complex enough as it is. > DN upgrade with layout version change should not use trash > -- > > Key: HDFS-6981 > URL: https://issues.apache.org/jira/browse/HDFS-6981 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: James Thomas >Assignee: Arpit Agarwal > Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, > HDFS-6981.03.patch, HDFS-6981.04.patch > > > Post HDFS-6800, we can encounter the following scenario: > # We start with DN software version -55 and initiate a rolling upgrade to > version -56 > # We delete some blocks, and they are moved to trash > # We roll back to DN software version -55 using the -rollback flag – since we > are running the old code (prior to this patch), we will restore the previous > directory but will not delete the trash > # We append to some of the blocks that were deleted in step 2 > # We then restart a DN that contains blocks that were appended to – since the > trash still exists, it will be restored at this point, the appended-to blocks > will be overwritten, and we will lose the appended data > So I think we need to avoid writing anything to the trash directory if we > have a previous directory. 
> Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
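Arpit's marker-file proposal above reduces to a small decision table on three facts the DN can observe locally. A standalone sketch of that table (hypothetical method, not DataNode code):

```java
/** Decision table for the marker-file proposal in HDFS-6981 (hypothetical
 *  sketch, not DataNode code).
 *  previousExists: the DN storage has a 'previous' directory.
 *  nnSignalsRollingUpgrade: RollingUpgradeStatus present in the heartbeat response.
 *  markerExists: the DN wrote a marker when rolling upgrade was first signaled. */
public class PreviousDirPolicy {
    public static boolean shouldDeletePrevious(boolean previousExists,
                                               boolean nnSignalsRollingUpgrade,
                                               boolean markerExists) {
        if (!previousExists || nnSignalsRollingUpgrade) {
            return false;  // nothing to clean, or rolling upgrade still in progress
        }
        // Signal gone: clean up only if the marker proves this 'previous' came
        // from a rolling upgrade; otherwise a regular upgrade owns it.
        return markerExists;
    }
}
```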
[jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus no longer a Writable
[ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123412#comment-14123412 ] Chris Nauroth commented on HDFS-6984: - bq. So, it looks like DistCp depends on FileStatus being writable... Last time I looked at this, I actually planned on replacing DistCp's usage of {{FileStatus}} serialization with its own custom data type. I believe it doesn't need all of the fields of {{FileStatus}}, so there is potential for a marginal space/performance improvement by omitting the unnecessaries. > In Hadoop 3, make FileStatus no longer a Writable > - > > Key: HDFS-6984 > URL: https://issues.apache.org/jira/browse/HDFS-6984 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-6984.001.patch > > > FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this > to serialize it and send it over the wire. But in Hadoop 2 and later, we > have the protobuf {{HdfsFileStatusProto}} which serves to serialize this > information. The protobuf form is preferable, since it allows us to add new > fields in a backwards-compatible way. Another issue is that already a lot of > subclasses of FileStatus don't override the Writable methods of the > superclass, breaking the interface contract that read(status.write) should be > equal to the original status. > In Hadoop 3, we should just make FileStatus no longer a writable so that we > don't have to deal with these issues. It's probably too late to do this in > Hadoop 2, since user code may be relying on the ability to use the Writable > methods on FileStatus objects there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus no longer a Writable
[ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123402#comment-14123402 ] Colin Patrick McCabe commented on HDFS-6984: So, it looks like DistCp depends on {{FileStatus}} being writable in a pretty fundamental way, since it wants to use it as a MapReduce value in CopyMapper.java: {code} public class CopyMapper extends Mapper { ... {code} Maybe, rather than get rid of the "implements Writable", we should just use protobuf for the serialization in {{FileStatus#write}}. That allows us to add whatever fields we want later via optional protobuf members. > In Hadoop 3, make FileStatus no longer a Writable > - > > Key: HDFS-6984 > URL: https://issues.apache.org/jira/browse/HDFS-6984 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-6984.001.patch > > > FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this > to serialize it and send it over the wire. But in Hadoop 2 and later, we > have the protobuf {{HdfsFileStatusProto}} which serves to serialize this > information. The protobuf form is preferable, since it allows us to add new > fields in a backwards-compatible way. Another issue is that already a lot of > subclasses of FileStatus don't override the Writable methods of the > superclass, breaking the interface contract that read(status.write) should be > equal to the original status. > In Hadoop 3, we should just make FileStatus no longer a writable so that we > don't have to deal with these issues. It's probably too late to do this in > Hadoop 2, since user code may be relying on the ability to use the Writable > methods on FileStatus objects there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
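Colin's idea — keep the Writable surface but make the serialized bytes an evolvable, opaque payload — can be sketched with a length-prefixed blob (plain DataOutput/DataInput here; the real proposal would emit HdfsFileStatusProto bytes, so optional fields can be added without breaking old readers):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Sketch of the HDFS-6984 suggestion: write() emits a length-prefixed
 *  opaque payload (a protobuf in the real proposal), decoupling the
 *  Writable contract from the field layout. Hypothetical, not FileStatus code. */
public class OpaqueWritableSketch {
    public static void write(DataOutput out, byte[] serializedProto) throws IOException {
        out.writeInt(serializedProto.length);  // length prefix
        out.write(serializedProto);            // opaque, versionable payload
    }

    public static byte[] read(DataInput in) throws IOException {
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return payload;
    }
}
```

A round trip through a byte stream returns the original payload byte-for-byte, which is the interface contract (read(status.write) == status) the issue description says subclasses currently break.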
[jira] [Created] (HDFS-7004) Update KeyProvider instantiation to create by URI
Andrew Wang created HDFS-7004: - Summary: Update KeyProvider instantiation to create by URI Key: HDFS-7004 URL: https://issues.apache.org/jira/browse/HDFS-7004 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Andrew Wang Assignee: Andrew Wang See HADOOP-11054, would be good to update the NN/DFSClient to fetch via this method rather than depending on the URI path lookup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1412#comment-1412 ] Yongjun Zhang commented on HDFS-6776: - Hi [~wheat9], I know that you are opposed to putting the msg-parsing hack into webhdfs. However, you have also said: {quote} However, it's okay to me to return a null here so the behavior is similar to DistributedFileSystem. The actual fallback logic can happen at the distcp side when building the file list, but maybe we can defer it to another jira. {quote} I had quite a few questions for you in my last two comments; I'd appreciate it if you could comment on them. That way, we can better understand your concern about why it is so fragile. Do you agree that a correct webhdfs contract is not to fail with an exception when accessing an insecure cluster, but rather to be able to access it? This is a very important question that I hope you can answer. We agree that the msg-parsing is a bit hacky, but why is the hack so much worse in webhdfs than in distcp, given that webhdfs doesn't work without a fix? BTW, FYI, not to say that it's a good thing to do so, but there was already code doing msg parsing in webhdfs:
{code}
// extract UGI-related exceptions and unwrap InvalidToken
// the NN mangles these exceptions but the DN does not and may need
// to re-fetch a token if either report the token is expired
if (re.getMessage().startsWith("Failed to obtain user group information:")) {
  String[] parts = re.getMessage().split(":\\s+", 3);
  re = new RemoteException(parts[1], parts[2]);
  re = ((RemoteException)re).unwrapRemoteException(InvalidToken.class);
}
{code}
Do you consider this fragile? Disclaimer: the patch I did here is not because there was existing code like that quoted above; rather, it's because of the simplicity of the solution, which we discussed earlier. Thanks. 
> distcp from insecure cluster (source) to secure cluster (destination) doesn't > work via webhdfs > -- > > Key: HDFS-6776 > URL: https://issues.apache.org/jira/browse/HDFS-6776 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0, 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, > HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, > HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch, > HDFS-6776.006.NullToken.patch, HDFS-6776.007.patch, HDFS-6776.008.patch, > HDFS-6776.009.patch, HDFS-6776.010.patch, HDFS-6776.011.patch, > dummy-token-proxy.js > > > Issuing distcp command at the secure cluster side, trying to copy stuff from > insecure cluster to secure cluster, and see the following problem: > {code} > hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://:/tmp > hdfs://:8020/tmp/tmptgt > 14/07/30 20:06:19 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, > sourcePaths=[webhdfs://:/tmp], > targetPath=hdfs://:8020/tmp/tmptgt, targetPathExists=true} > 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at > :8032 > 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property > 'ssl.client.truststore.location' has not been set, no TrustStore will be > loaded > 14/07/30 20:06:20 WARN security.UserGroupInformation: > PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) > cause:java.io.IOException: Failed to get the token for hadoopuser, > user=hadoopuser > 14/07/30 20:06:20 WARN security.UserGroupInformation: > PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) > cause:java.io.IOException: Failed to get the token for hadoopuser, > user=hadoopuser > 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered > java.io.IOException: Failed to get the token for 
hadoopuser, user=hadoopuser > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFi
[jira] [Updated] (HDFS-6727) Refresh data volumes on DataNode based on configuration changes
[ https://issues.apache.org/jira/browse/HDFS-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6727: Attachment: HDFS-6727.002.patch Updated the patch to remove {{ReconfigurableServlet}} support from the code. It also uses {{File#getCanonicalPath()}} to determine the changed volumes. > Refresh data volumes on DataNode based on configuration changes > --- > > Key: HDFS-6727 > URL: https://issues.apache.org/jira/browse/HDFS-6727 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.5.0, 2.4.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Labels: datanode > Attachments: HDFS-6727.000.delta-HDFS-6775.txt, HDFS-6727.001.patch, > HDFS-6727.002.patch, HDFS-6727.combo.patch > > > HDFS-1362 requires the DataNode to reload its configuration file at runtime, > so that the DN can change its data volumes dynamically. This JIRA reuses the > reconfiguration framework introduced by HADOOP-7001 to enable the DN to > reconfigure itself at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
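As an aside, the canonical-path comparison can be sketched standalone. The following is illustrative only (class and method names are invented for this sketch, not taken from the DataNode code): it normalizes each configured directory with {{File#getCanonicalPath()}} and then set-diffs the old list against the new one.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: diff an old and a new list of data directories by
// canonical path, so spellings like "/data/./v2" and "/data/v2" count as
// the same volume rather than as a remove-plus-add.
public class VolumeDiffSketch {

    static Set<String> canonicalize(Collection<String> dirs) throws IOException {
        Set<String> out = new HashSet<>();
        for (String d : dirs) {
            out.add(new File(d).getCanonicalPath());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Set<String> oldVols = canonicalize(Arrays.asList("/data/v1", "/data/v2"));
        Set<String> newVols = canonicalize(Arrays.asList("/data/./v2", "/data/v3"));

        Set<String> added = new HashSet<>(newVols);
        added.removeAll(oldVols);   // volumes to bring online
        Set<String> removed = new HashSet<>(oldVols);
        removed.removeAll(newVols); // volumes to retire

        System.out.println("added=" + added + " removed=" + removed);
    }
}
```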
[jira] [Updated] (HDFS-6862) Add missing timeout annotations to tests
[ https://issues.apache.org/jira/browse/HDFS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6862: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks for fixing this [~xyao]! > Add missing timeout annotations to tests > > > Key: HDFS-6862 > URL: https://issues.apache.org/jira/browse/HDFS-6862 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao > Labels: newbie > Fix For: 2.6.0 > > Attachments: HDFS-6862.0.patch > > > One or more tests in the following classes are missing timeout annotations. > # org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings > # org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints > # org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA > # org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions > # org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics > # org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster > # org.apache.hadoop.hdfs.TestHDFSServerPorts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
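For context, the fix is mechanical: each affected test method gains a JUnit 4 {{timeout}} parameter so a hung test fails rather than stalling the build. An illustrative fragment (the class and method names are hypothetical; requires JUnit 4 on the classpath):

```java
import org.junit.Test;

public class TestSomethingWithTimeout {
  // Fail this test if it runs longer than 300 seconds instead of
  // hanging the build indefinitely.
  @Test(timeout = 300000)
  public void testStateTransition() throws Exception {
    // ... original test body, unchanged ...
  }
}
```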
[jira] [Updated] (HDFS-6979) hdfs.dll does not produce .pdb files
[ https://issues.apache.org/jira/browse/HDFS-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6979: Resolution: Fixed Fix Version/s: 2.6.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have committed this to trunk and branch-2. Arpit, thank you for the code review. Remus, thank you again for reporting the bug. > hdfs.dll does not produce .pdb files > > > Key: HDFS-6979 > URL: https://issues.apache.org/jira/browse/HDFS-6979 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Remus Rusanu >Assignee: Chris Nauroth >Priority: Minor > Labels: build, cmake, native, windows > Fix For: 3.0.0, 2.6.0 > > Attachments: HDFS-6979.1.patch > > > The hdfs.dll build does not produce a retail .pdb. For comparison, we do produce > .pdbs for winutils.exe and hadoop.dll. > I did not verify whether the cmake project produces a dll with an embedded > pdb. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6979) hdfs.dll does not produce .pdb files
[ https://issues.apache.org/jira/browse/HDFS-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6979: Summary: hdfs.dll does not produce .pdb files (was: hdfs.dll not produce .pdb files) > hdfs.dll does not produce .pdb files > > > Key: HDFS-6979 > URL: https://issues.apache.org/jira/browse/HDFS-6979 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Remus Rusanu >Assignee: Chris Nauroth >Priority: Minor > Labels: build, cmake, native, windows > Attachments: HDFS-6979.1.patch > > > The hdfs.dll build does not produce a retail .pdb. For comparison, we do produce > .pdbs for winutils.exe and hadoop.dll. > I did not verify whether the cmake project produces a dll with an embedded > pdb. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6979) hdfs.dll not produce .pdb files
[ https://issues.apache.org/jira/browse/HDFS-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123243#comment-14123243 ] Arpit Agarwal commented on HDFS-6979: - +1 for the patch. .pdb files are good to have. > hdfs.dll not produce .pdb files > > > Key: HDFS-6979 > URL: https://issues.apache.org/jira/browse/HDFS-6979 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Remus Rusanu >Assignee: Chris Nauroth >Priority: Minor > Labels: build, cmake, native, windows > Attachments: HDFS-6979.1.patch > > > The hdfs.dll build does not produce a retail .pdb. For comparison, we do produce > .pdbs for winutils.exe and hadoop.dll. > I did not verify whether the cmake project produces a dll with an embedded > pdb. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration
[ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6376: Resolution: Fixed Fix Version/s: 2.6.0 Release Note: Allow distcp to copy data between HA clusters. Users can use the new configuration property "dfs.internal.nameservices" to explicitly specify the name services belonging to the local cluster, while continuing to use the configuration property "dfs.nameservices" to specify all the name services in the local and remote clusters. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this into trunk and branch-2. Thanks for the contribution, [~dlmarion] and [~wheat9]! > Distcp data between two HA clusters requires another configuration > -- > > Key: HDFS-6376 > URL: https://issues.apache.org/jira/browse/HDFS-6376 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, federation, hdfs-client >Affects Versions: 2.2.0, 2.3.0, 2.4.0 > Environment: Hadoop 2.3.0 >Reporter: Dave Marion >Assignee: Dave Marion > Fix For: 3.0.0, 2.6.0 > > Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, > HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, > HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, > HDFS-6376-patch-1.patch, HDFS-6376.000.patch, HDFS-6376.008.patch, > HDFS-6376.009.patch, HDFS-6376.010.patch, HDFS-6376.011.patch > > > A user has to create a third set of configuration files for distcp when > transferring data between two HA clusters. > Consider the scenario in [1]. You cannot put all of the required properties > in core-site.xml and hdfs-site.xml for the client to resolve the location of > both active namenodes. If you do, then the datanodes from cluster A may join > cluster B. I cannot find a configuration option that tells the datanodes to > federate blocks for only one of the clusters in the configuration. 
> [1] > http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
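For illustration, the release note above translates into a fragment like the following in the local cluster's hdfs-site.xml. The nameservice IDs {{ns-local}} and {{ns-remote}} are invented for this example:

```xml
<!-- Both nameservices are listed so clients on this cluster can resolve
     the active NameNode of either cluster (e.g. for distcp)... -->
<property>
  <name>dfs.nameservices</name>
  <value>ns-local,ns-remote</value>
</property>
<!-- ...but only the local nameservice is marked internal, so this
     cluster's DataNodes register with ns-local alone and never try to
     join the remote cluster. -->
<property>
  <name>dfs.internal.nameservices</name>
  <value>ns-local</value>
</property>
```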
[jira] [Comment Edited] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123232#comment-14123232 ] Haohui Mai edited comment on HDFS-6776 at 9/5/14 5:47 PM: -- I've made it very clear that I'm opposed to putting fragile hacks into {{WebHdfsFileSystem}}, though I'm okay if it's done at the application level (e.g. distcp). Unless this is addressed, I cannot give my +1. If you are not familiar with the distcp code, I'll take a look and see whether I can post a patch for it. was (Author: wheat9): I've made it very clear that I'm opposed to putting fragile hacks into {{WebHdfsFileSystem}}, though I'm okay if it's done at the application level (e.g. distcp). Unless this is addressed, I cannot give my +1. If you don't want to take a look at the distcp code, I'll take a look and see whether I can post a patch for it. > distcp from insecure cluster (source) to secure cluster (destination) doesn't > work via webhdfs > -- > > Key: HDFS-6776 > URL: https://issues.apache.org/jira/browse/HDFS-6776 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0, 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, > HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, > HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch, > HDFS-6776.006.NullToken.patch, HDFS-6776.007.patch, HDFS-6776.008.patch, > HDFS-6776.009.patch, HDFS-6776.010.patch, HDFS-6776.011.patch, > dummy-token-proxy.js > > > Issuing distcp command at the secure cluster side, trying to copy stuff from > insecure cluster to secure cluster, and see the following problem: > {code} > hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://:/tmp > hdfs://:8020/tmp/tmptgt > 14/07/30 20:06:19 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', 
sourceFileListing=null, > sourcePaths=[webhdfs://:/tmp], > targetPath=hdfs://:8020/tmp/tmptgt, targetPathExists=true} > 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at > :8032 > 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property > 'ssl.client.truststore.location' has not been set, no TrustStore will be > loaded > 14/07/30 20:06:20 WARN security.UserGroupInformation: > PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) > cause:java.io.IOException: Failed to get the token for hadoopuser, > user=hadoopuser > 14/07/30 20:06:20 WARN security.UserGroupInformation: > PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) > cause:java.io.IOException: Failed to get the token for hadoopuser, > user=hadoopuser > 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered > java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) > at > 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) > at > org.apache.hadoop.hdfs.web.WebH
[jira] [Updated] (HDFS-6831) Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'
[ https://issues.apache.org/jira/browse/HDFS-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6831: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 I committed this to trunk and branch-2. Thanks for the contribution [~xyao] and thanks [~ajisakaa] for reviewing. > Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help' > --- > > Key: HDFS-6831 > URL: https://issues.apache.org/jira/browse/HDFS-6831 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Akira AJISAKA >Assignee: Xiaoyu Yao >Priority: Minor > Labels: newbie > Fix For: 2.6.0 > > Attachments: HDFS-6831.0.patch, HDFS-6831.1.patch, HDFS-6831.2.patch, > HDFS-6831.3.patch, HDFS-6831.4.patch > > > There is an inconsistency between the console outputs of 'hdfs dfsadmin' > command and 'hdfs dfsadmin -help' command. > {code} > [root@trunk ~]# hdfs dfsadmin > Usage: java DFSAdmin > Note: Administrative commands can only be run as the HDFS superuser. >[-report] >[-safemode enter | leave | get | wait] >[-allowSnapshot ] >[-disallowSnapshot ] >[-saveNamespace] >[-rollEdits] >[-restoreFailedStorage true|false|check] >[-refreshNodes] >[-finalizeUpgrade] >[-rollingUpgrade []] >[-metasave filename] >[-refreshServiceAcl] >[-refreshUserToGroupsMappings] >[-refreshSuperUserGroupsConfiguration] >[-refreshCallQueue] >[-refresh] >[-printTopology] >[-refreshNamenodes datanodehost:port] >[-deleteBlockPool datanode-host:port blockpoolId [force]] >[-setQuota ...] >[-clrQuota ...] >[-setSpaceQuota ...] >[-clrSpaceQuota ...] >[-setBalancerBandwidth ] >[-fetchImage ] >[-shutdownDatanode [upgrade]] >[-getDatanodeInfo ] >[-help [cmd]] > {code} > {code} > [root@trunk ~]# hdfs dfsadmin -help > hadoop dfsadmin performs DFS administrative commands. 
> The full syntax is: > hadoop dfsadmin > [-report [-live] [-dead] [-decommissioning]] > [-safemode ] > [-saveNamespace] > [-rollEdits] > [-restoreFailedStorage true|false|check] > [-refreshNodes] > [-setQuota ...] > [-clrQuota ...] > [-setSpaceQuota ...] > [-clrSpaceQuota ...] > [-finalizeUpgrade] > [-rollingUpgrade []] > [-refreshServiceAcl] > [-refreshUserToGroupsMappings] > [-refreshSuperUserGroupsConfiguration] > [-refreshCallQueue] > [-refresh [arg1..argn] > [-printTopology] > [-refreshNamenodes datanodehost:port] > [-deleteBlockPool datanodehost:port blockpoolId [force]] > [-setBalancerBandwidth ] > [-fetchImage ] > [-allowSnapshot ] > [-disallowSnapshot ] > [-shutdownDatanode [upgrade]] > [-getDatanodeInfo > [-help [cmd] > {code} > These two outputs should be the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)