[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-6986:
-
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Zhe Zhang. Committed to trunk and branch-2.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
> Fix For: 2.6.0
>
> Attachments: HDFS-6986-20140905-v2.patch, 
> HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}}, via {{KeyProviderDelegationTokenExtension}}, provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the KeyProvider ones so that tasks can interact with 
> the KeyProvider when it is a client/server implementation (KMS).
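For illustration, the augmentation boils down to merging the two token sets 
when delegation tokens are requested. A minimal sketch in the spirit of the 
description above; the {{dfs.getKeyProvider()}} accessor and the method body 
are assumptions, not the committed patch:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

// Sketch only: fold the KeyProvider (KMS) delegation tokens into the HDFS ones.
@Override
public Token<?>[] addDelegationTokens(String renewer, Credentials credentials)
    throws IOException {
  Token<?>[] hdfsTokens = super.addDelegationTokens(renewer, credentials);
  KeyProvider keyProvider = dfs.getKeyProvider(); // hypothetical accessor
  if (keyProvider == null) {
    return hdfsTokens; // no KMS configured, nothing to add
  }
  KeyProviderDelegationTokenExtension ext =
      KeyProviderDelegationTokenExtension
          .createKeyProviderDelegationTokenExtension(keyProvider);
  Token<?>[] kpTokens = ext.addDelegationTokens(renewer, credentials);
  // Return the union of both token sets.
  List<Token<?>> all = new ArrayList<Token<?>>();
  if (hdfsTokens != null) {
    Collections.addAll(all, hdfsTokens);
  }
  if (kpTokens != null) {
    Collections.addAll(all, kpTokens);
  }
  return all.toArray(new Token<?>[all.size()]);
}
{code}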



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124342#comment-14124342
 ] 

Hadoop QA commented on HDFS-6981:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666983/HDFS-6981.06.patch
  against trunk revision e6420fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7928//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7928//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7928//console

This message is automatically generated.

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.
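In code terms, the proposal is simply a guard on the trash path. A minimal 
sketch with illustrative names (previousDir, trashDir); the committed change 
may look different:

{code}
import java.io.File;

// Sketch: never hand out a trash directory while a previous/ directory from
// an in-progress layout upgrade exists. Rollback restores previous/, and
// restoring trash on top of it can resurrect stale (later appended-to) blocks.
class BlockPoolStorageSketch {
  private final File previousDir;
  private final File trashDir;

  BlockPoolStorageSketch(File bpRoot) {
    this.previousDir = new File(bpRoot, "previous");
    this.trashDir = new File(bpRoot, "trash");
  }

  /** Returns null when trash must not be used. */
  File getTrashDirectory() {
    if (previousDir.exists()) {
      return null; // layout upgrade in progress: rely on previous/ only
    }
    return trashDir;
  }
}
{code}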



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124335#comment-14124335
 ] 

Alejandro Abdelnur commented on HDFS-6986:
--

+1, test failure seems unrelated.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905-v2.patch, 
> HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}}, via {{KeyProviderDelegationTokenExtension}}, provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the KeyProvider ones so that tasks can interact with 
> the KeyProvider when it is a client/server implementation (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6943) Improve NN allocateBlock log to include replicas' datanode IPs

2014-09-05 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124322#comment-14124322
 ] 

Jing Zhao commented on HDFS-6943:
-

The patch looks good to me. +1.

bq. Maybe we should add test like TestDatanodeStorageInfo like TestContainerId.

This is a very good suggestion, but it looks like a thorough test for 
DatanodeStorageInfo would need to cover multiple data fields, so I'm also fine 
with doing that in a separate jira.

> Improve NN allocateBlock log to include replicas' datanode IPs
> --
>
> Key: HDFS-6943
> URL: https://issues.apache.org/jira/browse/HDFS-6943
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6943.patch
>
>
> The datanode storage ID used to be based on IP and port; it has changed to a 
> UUID. This makes debugging harder when we want to understand which DNs are 
> assigned when DFSClient calls addBlock. For example,
> {noformat}
> BLOCK* allocateBlock: /foo. BP-1980237412-xx.xx.xxx.xxx-1408142057773 
> blk_1227779764_154043834{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-9479727b-24c5-4068-8703-dfb9a41c056c:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-abe7840c-1db8-4623-9da7-3aed6a28c4f4:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-956023f4-56a0-4c30-a148-b78c61cf764b:NORMAL|RBW]]}
> {noformat}
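For illustration, the improvement amounts to printing the owning datanode next 
to each storage in the replica string. A sketch assuming DatanodeStorageInfo's 
real getDatanodeDescriptor() accessor but illustrative field names; not the 
attached patch itself:

{code}
// Sketch of an improved DatanodeStorageInfo#toString(): append the datanode's
// transfer address (ip:port) after the storage UUID so allocateBlock log
// lines identify machines again.
@Override
public String toString() {
  // e.g. "[DISK]DS-9479727b-...:NORMAL:10.0.0.1:50010"
  return "[" + storageType + "]" + storageID + ":" + state + ":"
      + getDatanodeDescriptor().getXferAddr(); // ip:port of the owning DN
}
{code}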



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-05 Thread Lei Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124291#comment-14124291
 ] 

Lei Chang commented on HDFS-3107:
-

Is it compatible with snapshot?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), which is the reverse of 
> append. This makes upper-layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.
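For context, a POSIX-style truncate would let such a system undo a partial 
append with one call. A sketch of the usage being asked for, assuming a 
hypothetical FileSystem-level truncate(Path, long) that returns whether the 
new length is immediately visible:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: roll back an aborted transaction by truncating the log file to its
// pre-transaction length. The truncate() API is a proposal-level assumption.
void abortTransaction(FileSystem fs, Path log, long lengthBeforeTxn)
    throws IOException {
  boolean doneImmediately = fs.truncate(log, lengthBeforeTxn);
  if (!doneImmediately) {
    // The last block may need recovery before the new length is visible.
    waitForBlockRecovery(fs, log, lengthBeforeTxn); // hypothetical helper
  }
}
{code}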



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-05 Thread Lei Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124290#comment-14124290
 ] 

Lei Chang commented on HDFS-3107:
-

Is it compatible with snapshot?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), which is the reverse of 
> append. This makes upper-layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6981:

Attachment: HDFS-6981.06.patch

Fix stub implementations in SimulatedFSDataset.

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124275#comment-14124275
 ] 

Hadoop QA commented on HDFS-6951:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666978/HDFS-6951.005.patch
  against trunk revision e6420fe.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7927//console

This message is automatically generated.

> Saving namespace and restarting NameNode will remove existing encryption zones
> --
>
> Key: HDFS-6951
> URL: https://issues.apache.org/jira/browse/HDFS-6951
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Charles Lamb
> Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, 
> HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, 
> HDFS-6951.004.patch, HDFS-6951.005.patch, editsStored
>
>
> Currently, when users save namespace and restart the NameNode, pre-existing 
> encryption zones will be wiped out.
> I could reproduce this on a pseudo-distributed cluster:
> * Create an encryption zone
> * List encryption zones and verify the newly created zone is present
> * Save the namespace
> * Kill and restart the NameNode
> * List the encryption zones and you'll find the encryption zone is missing
> I've attached a test case for {{TestEncryptionZones}} that reproduces this as 
> well. Removing the saveNamespace call will get the test to pass.
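The repro steps translate to a short test along these lines; a sketch assuming 
a running MiniDFSCluster with a configured key provider, a key named "myKey", 
and {{fs}}/{{dfsAdmin}}/{{cluster}} being the usual DistributedFileSystem, 
HdfsAdmin, and MiniDFSCluster test fixtures:

{code}
// Sketch of the repro: create a zone, saveNamespace, restart the NN,
// then observe the zone has vanished.
Path zone = new Path("/zone");
fs.mkdirs(zone);
dfsAdmin.createEncryptionZone(zone, "myKey");

fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
fs.saveNamespace();                     // persist the namespace
fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
cluster.restartNameNode(true);          // kill + restart the NameNode

// Bug: the iterator comes back empty after the restart.
RemoteIterator<EncryptionZone> it = dfsAdmin.listEncryptionZones();
assertTrue("encryption zone lost across restart", it.hasNext());
{code}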



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones

2014-09-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6951:
---
Attachment: HDFS-6951.005.patch

Hi [~andrew.wang],

Here's a rebased patch using --binary.

The reason it didn't apply is that the NN LayoutVersion got bumped to -58 by 
"creating file with overwrite", so this patch bumps it to -59.


> Saving namespace and restarting NameNode will remove existing encryption zones
> --
>
> Key: HDFS-6951
> URL: https://issues.apache.org/jira/browse/HDFS-6951
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Charles Lamb
> Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, 
> HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, 
> HDFS-6951.004.patch, HDFS-6951.005.patch, editsStored
>
>
> Currently, when users save namespace and restart the NameNode, pre-existing 
> encryption zones will be wiped out.
> I could reproduce this on a pseudo-distributed cluster:
> * Create an encryption zone
> * List encryption zones and verify the newly created zone is present
> * Save the namespace
> * Kill and restart the NameNode
> * List the encryption zones and you'll find the encryption zone is missing
> I've attached a test case for {{TestEncryptionZones}} that reproduces this as 
> well. Removing the saveNamespace call will get the test to pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124267#comment-14124267
 ] 

Hadoop QA commented on HDFS-6898:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666935/HDFS-6898.06.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7924//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7924//console

This message is automatically generated.

> DN must reserve space for a full block when an RBW block is created
> ---
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Gopal V
>Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, 
> HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch
>
>
> DN will successfully create two RBW blocks on the same volume even if the 
> free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception. 
> This can be avoided by allocating space up front.
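The proposed accounting can be pictured with a small sketch (illustrative 
names; the real change would live in the FsVolume/FsDataset implementations):

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: reserve a full block's worth of space when an RBW replica is
// created, count it against available space, and release the reservation
// when the replica is finalized.
class VolumeSketch {
  private final AtomicLong reservedForRbw = new AtomicLong();
  private final long capacity;
  private long used;

  VolumeSketch(long capacity) { this.capacity = capacity; }

  long getAvailable() {
    // Space promised to in-flight RBW writers is no longer available.
    return capacity - used - reservedForRbw.get();
  }

  void createRbw(long blockSize) throws IOException {
    if (getAvailable() < blockSize) {
      throw new IOException("Out of space: cannot reserve " + blockSize);
    }
    reservedForRbw.addAndGet(blockSize);
  }

  void finalizeReplica(long bytesWritten, long blockSize) {
    used += bytesWritten;                 // actual bytes now on disk
    reservedForRbw.addAndGet(-blockSize); // drop the whole reservation
  }
}
{code}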



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124266#comment-14124266
 ] 

Hadoop QA commented on HDFS-6940:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664262/HDFS-6940.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7925//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7925//console

This message is automatically generated.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124257#comment-14124257
 ] 

Hadoop QA commented on HDFS-7008:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666954/HDFS-7008.1.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7926//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7926//console

This message is automatically generated.

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
> 
>
> Key: HDFS-7008
> URL: https://issues.apache.org/jira/browse/HDFS-7008
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: HDFS-7008.1.patch
>
>
> {code}
> GenericRefreshProtocol xlator =
>   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
> // Refresh
> Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
> xlator before return.
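The fix is the standard try/finally pattern around the translator. A sketch; 
the translator's close() is stated above, the rest is illustrative:

{code}
GenericRefreshProtocolClientSideTranslatorPB xlator =
    new GenericRefreshProtocolClientSideTranslatorPB(proxy);
Collection<RefreshResponse> responses;
try {
  // Refresh
  responses = xlator.refresh(identifier, args);
} finally {
  xlator.close(); // release the underlying RPC proxy on every exit path
}
{code}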



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124258#comment-14124258
 ] 

Hadoop QA commented on HDFS-6981:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666923/HDFS-6981.05.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestReplication
  org.apache.hadoop.hdfs.TestPread
  org.apache.hadoop.hdfs.TestSetrepIncreasing
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
  
org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer
  org.apache.hadoop.hdfs.server.balancer.TestBalancer
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
  org.apache.hadoop.hdfs.TestFileCreation
  
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
  org.apache.hadoop.hdfs.TestSmallBlock
  org.apache.hadoop.hdfs.TestWriteBlockGetsBlockLengthHint
  org.apache.hadoop.hdfs.server.namenode.TestFileLimit
  org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7923//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7923//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7923//console

This message is automatically generated.

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Zhanwei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124252#comment-14124252
 ] 

Zhanwei Wang commented on HDFS-6994:


Hi [~wheat9]

I like your suggestion. I will put libhdfs3 into the contrib directory first 
and make it useful to everyone. I will also separate the patch into sub-tasks 
to make the review easier.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ at Pivotal.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Zhanwei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124240#comment-14124240
 ] 

Zhanwei Wang commented on HDFS-6994:


Hi [~cmccabe]

Thanks very much for your comments.

Dynamically loading libjvm is a good idea, but it does not seem to solve all 
the problems you mentioned in HADOOP-10388. To make the fallback feature work, 
users have to deploy the HDFS jars on every machine, which adds operational 
complexity for non-Java clients that just want to integrate with HDFS; 
otherwise the fallback will not work. And the fallback will eventually be 
removed once the native client implements the full HDFS client feature set.

About boost, you are right: boost is not actually required if the C++ compiler 
is not too old. And I do think using boost can make libhdfs3 useful to as many 
people as possible, including those on old C++ compilers. But yes, I should not 
require a very new boost version; that can be improved, along with the other 
dependency issues.

So the most important thing, I think, is to figure out a way to integrate 
libhdfs3 that also benefits the other features in HADOOP-10388.

What is your opinion?





> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ at Pivotal.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Zhanwei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123912#comment-14123912
 ] 

Zhanwei Wang commented on HDFS-6994:


Hi [~aw]

Naming is hard -_-

The binary name is libhdfs3.so.x.x.x

libhdfs3 is the name, and x.x.x is the version.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ at Pivotal.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Zhanwei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123905#comment-14123905
 ] 

Zhanwei Wang commented on HDFS-6994:


Hi [~nidmhbase]

Libhdfs3 provides the C interface "hdfsGetFileBlockLocations" in hdfs.h and the 
C++ interface "FileSystem::getFileBlockLocations".



> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ at Pivotal.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2014-09-05 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated HDFS-7008:
-
Attachment: HDFS-7008.1.patch

Thanks for the report, Ted. Attached a first patch to fix the problem.

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
> 
>
> Key: HDFS-7008
> URL: https://issues.apache.org/jira/browse/HDFS-7008
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: HDFS-7008.1.patch
>
>
> {code}
> GenericRefreshProtocol xlator =
>   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
> // Refresh
> Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
> xlator before return.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2014-09-05 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated HDFS-7008:
-
Status: Patch Available  (was: Open)

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
> 
>
> Key: HDFS-7008
> URL: https://issues.apache.org/jira/browse/HDFS-7008
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: HDFS-7008.1.patch
>
>
> {code}
> GenericRefreshProtocol xlator =
>   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
> // Refresh
> Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
> xlator before return.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6948) DN rejects blocks if it has older UC block

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123889#comment-14123889
 ] 

Hadoop QA commented on HDFS-6948:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12666906/HDFS-6948.201409052147.txt
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7921//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7921//console

This message is automatically generated.

> DN rejects blocks if it has older UC block
> --
>
> Key: HDFS-6948
> URL: https://issues.apache.org/jira/browse/HDFS-6948
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
> Attachments: HDFS-6948.201409052147.txt
>
>
> DNs appear to always reject blocks, even with newer genstamps, if they 
> already have a UC copy in their tmp dir.
> {noformat}ReplicaAlreadyExistsException: Block
> XXX already
> exists in state TEMPORARY and thus cannot be created{noformat}
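For illustration, the desired behavior in the dataset's createTemporary path 
would be to compare generation stamps instead of failing unconditionally. A 
sketch; the cleanup helper is hypothetical:

{code}
// Sketch: accept an incoming block whose genstamp is newer than an existing
// TEMPORARY replica, discarding the stale UC copy first.
ReplicaInfo existing = volumeMap.get(bpid, b.getBlockId());
if (existing != null && existing.getState() == ReplicaState.TEMPORARY) {
  if (b.getGenerationStamp() > existing.getGenerationStamp()) {
    removeStaleReplica(existing); // hypothetical cleanup of the tmp file
  } else {
    throw new ReplicaAlreadyExistsException("Block " + b
        + " already exists in state TEMPORARY and thus cannot be created");
  }
}
{code}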



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7009) Active NN and standby NN have different live nodes

2014-09-05 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7009:
--
Summary: Active NN and standby NN have different live nodes  (was: Not 
enough retry during DN's initial handshake with NN)

> Active NN and standby NN have different live nodes
> --
>
> Key: HDFS-7009
> URL: https://issues.apache.org/jira/browse/HDFS-7009
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>
> To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most 
> cases, given that the DN sends HB and BR to the NN regularly, a single 
> failed RPC call isn't a big deal.
> However, there are cases where DN fails to register with NN during initial 
> handshake due to exceptions not covered by RPC client's connection retry. 
> When this happens, the DN won't talk to that NN until the DN restarts.
> {noformat}
> BPServiceActor
>   public void run() {
> LOG.info(this + " starting to offer service");
> try {
>   // init stuff
>   try {
> // setup storage
> connectToNNAndHandshake();
>   } catch (IOException ioe) {
> // Initial handshake, storage recovery or registration failed
> // End BPOfferService thread
> LOG.fatal("Initialization failed for block pool " + this, ioe);
> return;
>   }
>   initialized = true; // bp is initialized;
>   
>   while (shouldRun()) {
> try {
>   offerService();
> } catch (Exception ex) {
>   LOG.error("Exception in BPOfferService for " + this, ex);
>   sleepAndLogInterrupts(5000, "offering service");
> }
>   }
> ...
> {noformat}
> Here is an example of the call stack.
> {noformat}
> java.io.IOException: Failed on local exception: java.io.IOException: Response 
> is null.; Host Details : local host is: "xxx"; destination host is: 
> "yyy":8030;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
> at org.apache.hadoop.ipc.Client.call(Client.java:1239)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Response is null.
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
> {noformat}
> This will create discrepancy between active NN and standby NN in terms of 
> live nodes.
>  
> Here is a possible scenario of missing blocks after failover.
> 1. DN A, B set up handshakes with active NN, but not with standby NN.
> 2. A block is replicated to DN A, B and C.
> 3. From standby NN's point of view, given A and B are dead nodes, the block 
> is under replicated.
> 4. DN C is down.
> 5. Before active NN detects DN C is down, it fails over.
> 6. The new active NN considers the block missing, even though there are 
> two replicas, on DN A and B.
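The ask, in code terms, is roughly to wrap the handshake in the same retry 
loop that already protects offerService(). A sketch against the BPServiceActor 
code quoted above; the backoff value is illustrative:

{code}
// Sketch: retry the initial handshake instead of ending the thread on the
// first IOException, so a transient NN failure doesn't permanently remove
// this DN from that NN's live-node list.
while (shouldRun()) {
  try {
    connectToNNAndHandshake();
    break; // registered successfully
  } catch (IOException ioe) {
    LOG.warn("Initial handshake failed; will retry", ioe);
    sleepAndLogInterrupts(5000, "initial handshake retry");
  }
}
{code}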



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.

2014-09-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123884#comment-14123884
 ] 

Konstantin Shvachko commented on HDFS-7007:
---

Another 
[observation|https://issues.apache.org/jira/browse/HDFS-6940?focusedCommentId=14109691&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109691],
 which [~atm] makes, argues that subclassing the NameNode classes makes the 
implementation more fragile (if I understood it correctly).
There is one essential advantage to subclassing. I can create a completely 
different sub-project, recreate the package structure of the parent class, and 
reuse the methods and classes of that package from the parent project, without 
modifying the parent project. That way the parent project remains completely 
independent of the new sub-project.
Example:
{code}
/hadoop-hdfs/
  org.apache.hadoop.hdfs.server.namenode.NameNode {}
/hadoop-cnode/
  org.apache.hadoop.hdfs.server.namenode.ConsensusNode extends NameNode {}
{code}
In this case you can modify and build hadoop-hdfs without taking hadoop-cnode 
into account, and deal with CNode only at the integration stage. I thought such 
separation would be desirable.

> Interfaces to plugin ConsensusNode.
> ---
>
> Key: HDFS-7007
> URL: https://issues.apache.org/jira/browse/HDFS-7007
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>
> This is to introduce interfaces in NameNode and namesystem, which are needed 
> to plugin ConsensusNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123881#comment-14123881
 ] 

Hadoop QA commented on HDFS-6986:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12666902/HDFS-6986-20140905-v2.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7920//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7920//console

This message is automatically generated.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905-v2.patch, 
> HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}}, via {{KeyProviderDelegationTokenExtension}}, provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the KeyProvider ones so that tasks can interact with 
> the KeyProvider when it is a client/server implementation (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6982) nntop: top­-like tool for name node users

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123880#comment-14123880
 ] 

Hadoop QA commented on HDFS-6982:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666898/HDFS-6982.v2.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.qjournal.server.TestJournalNode
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7919//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7919//console

This message is automatically generated.

> nntop: top­-like tool for name node users
> -
>
> Key: HDFS-6982
> URL: https://issues.apache.org/jira/browse/HDFS-6982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, gives the list of top users of the HDFS name node and 
> gives insight into which users are sending the majority of each traffic type 
> to the name node. This information turns out to be most critical when the 
> name node is under pressure and the HDFS admin needs to know which user is 
> hammering the name node and with what kind of requests. Here we present the 
> design of nntop, which has been in production at Twitter for the past 10 
> months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K 
> nodes), a low memory footprint (less than a few MB), and a quite efficient 
> write path (only two hash lookups to update a metric).
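The "two hash lookups" write path can be pictured like this (an illustrative 
sketch, not nntop's actual classes):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: one lookup finds the per-command user map, a second finds the
// per-user counter; the increment itself is lock-free.
class TopMetricsSketch {
  private final ConcurrentHashMap<String, ConcurrentHashMap<String, AtomicLong>>
      counts = new ConcurrentHashMap<>();

  void report(String cmd, String user) {
    ConcurrentHashMap<String, AtomicLong> users =
        counts.computeIfAbsent(cmd, k -> new ConcurrentHashMap<>()); // lookup 1
    users.computeIfAbsent(user, k -> new AtomicLong())               // lookup 2
        .incrementAndGet();
  }
}
{code}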



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2014-09-05 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned HDFS-7008:


Assignee: Tsuyoshi OZAWA

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
> 
>
> Key: HDFS-7008
> URL: https://issues.apache.org/jira/browse/HDFS-7008
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
>
> {code}
> GenericRefreshProtocol xlator =
>   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
> // Refresh
> Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
> xlator before return.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-05 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123873#comment-14123873
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. Sure, but by creating a plugin interface or something of that ilk we can 
precisely define the contract
I have a great idea [~atm] - let's in fact do everything as plugins! For 
example, the 2.4.0 release introduced 3 backward-incompatible fixes that broke 
_at least_ two huge components downstream. In fact, we are catching stuff like 
that in Bigtop all the time. I am sure it could've been avoided if only we had 
better plugin contracts for everything that depends on the Hadoop bits.

I think everyone should've figured out by now that being in the position of a 
base layer puts tremendous pressure on development practices and architectural 
decisions. Changes in Hadoop shouldn't break user space (similarly to the 
Linux kernel). Likewise, changes in a superclass should not break its children 
if the said superclass' contracts are well designed and implemented - that's a 
basic principle of OOP, after all. By artificially limiting the choices of 
future consumers of a library instead of implementing accommodating APIs, one 
doesn't build a better system. One simply forces downstream developers to hack 
in or around those arbitrary limitations. And such development won't produce a 
well-integrated stack. The evidence of that is plentiful.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7009) No enough retry during DN's initial handshake with NN

2014-09-05 Thread Ming Ma (JIRA)
Ming Ma created HDFS-7009:
-

 Summary: No enough retry during DN's initial handshake with NN
 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma


To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most cases, 
given that the DN sends HB and BR to the NN regularly, a single failed RPC call 
isn't a big deal.

However, there are cases where DN fails to register with NN during initial 
handshake due to exceptions not covered by RPC client's connection retry. When 
this happens, the DN won't talk to that NN until the DN restarts.

{noformat}
BPServiceActor

  public void run() {
LOG.info(this + " starting to offer service");

try {
  // init stuff
  try {
// setup storage
connectToNNAndHandshake();
  } catch (IOException ioe) {
// Initial handshake, storage recovery or registration failed
// End BPOfferService thread
LOG.fatal("Initialization failed for block pool " + this, ioe);
return;
  }

  initialized = true; // bp is initialized;
  
  while (shouldRun()) {
try {
  offerService();
} catch (Exception ex) {
  LOG.error("Exception in BPOfferService for " + this, ex);
  sleepAndLogInterrupts(5000, "offering service");
}
  }
...
{noformat}


Here is an example of the call stack.

{noformat}
java.io.IOException: Failed on local exception: java.io.IOException: Response 
is null.; Host Details : local host is: "xxx"; destination host is: "yyy":8030;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
at org.apache.hadoop.ipc.Client.call(Client.java:1239)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Response is null.
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
{noformat}

This will create discrepancy between active NN and standby NN in terms of live 
nodes.
 
Here is a possible scenario of missing blocks after failover.

1. DN A, B set up handshakes with active NN, but not with standby NN.
2. A block is replicated to DN A, B and C.
3. From standby NN's point of view, given A and B are dead nodes, the block is 
under replicated.
4. DN C is down.
5. Before active NN detects DN C is down, it fails over.
6. The new active NN considers the block missing, even though there are two 
replicas, on DN A and B.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7009) Not enough retry during DN's initial handshake with NN

2014-09-05 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7009:
--
Summary: Not enough retry during DN's initial handshake with NN  (was: No 
enough retry during DN's initial handshake with NN)

> Not enough retry during DN's initial handshake with NN
> --
>
> Key: HDFS-7009
> URL: https://issues.apache.org/jira/browse/HDFS-7009
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>
> To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most 
> cases, given that the DN sends HB and BR to the NN regularly, a single 
> failed RPC call isn't a big deal.
> However, there are cases where DN fails to register with NN during initial 
> handshake due to exceptions not covered by RPC client's connection retry. 
> When this happens, the DN won't talk to that NN until the DN restarts.
> {noformat}
> BPServiceActor
>   public void run() {
> LOG.info(this + " starting to offer service");
> try {
>   // init stuff
>   try {
> // setup storage
> connectToNNAndHandshake();
>   } catch (IOException ioe) {
> // Initial handshake, storage recovery or registration failed
> // End BPOfferService thread
> LOG.fatal("Initialization failed for block pool " + this, ioe);
> return;
>   }
>   initialized = true; // bp is initialized;
>   
>   while (shouldRun()) {
> try {
>   offerService();
> } catch (Exception ex) {
>   LOG.error("Exception in BPOfferService for " + this, ex);
>   sleepAndLogInterrupts(5000, "offering service");
> }
>   }
> ...
> {noformat}
> Here is an example of the call stack.
> {noformat}
> java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: "xxx"; destination host is: "yyy":8030;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
> at org.apache.hadoop.ipc.Client.call(Client.java:1239)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Response is null.
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
> {noformat}
> This creates a discrepancy between the active NN and the standby NN in 
> terms of live nodes.
>  
> Here is a possible scenario of missing blocks after failover.
> 1. DN A and B set up handshakes with the active NN, but not with the 
> standby NN.
> 2. A block is replicated to DN A, B and C.
> 3. From the standby NN's point of view, A and B are dead nodes, so the 
> block is under replicated.
> 4. DN C goes down.
> 5. Before the active NN detects that DN C is down, it fails over.
> 6. The new active NN considers the block missing, even though there are 
> two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123864#comment-14123864
 ] 

Colin Patrick McCabe commented on HDFS-6994:


bq. Haohui wrote: Do you want to separate the patch into sub tasks so that it 
can go through the review process?

I agree.  Why don't you guys separate this into a few subtasks, and use the 
HADOOP-10388 branch as the target?

bq. Personally I think that this is an alternative implementation of libhdfs. 
I don't think we need to get rid of boost for now, but I think the code can 
be put in the contrib directory, which is not built by default but still 
allows other people to check it out if they're interested.

I think we should make this useful to as many people as possible.  That's the 
reason I made my comment about the possible boost dependency issue.  I looked 
at this a little more closely, though, and I see that the purpose of boost is 
to substitute for C\+\+11 features such as {{std::thread}}, in cases where the 
compiler is too old to provide them.  With that in mind, I think that it's ok 
for now.  I do think we should do the Jenkins build without boost, to make sure 
that the C\+\+11 code works.  C\+\+11 is clearly the future for C++ and we 
should be prepared for it.

I want to reiterate that we should have a way to switch between this new 
library and the existing libhdfs.  I'd be happy to work on that (I can 
re-purpose the existing code from HADOOP-10388 to do that) and it will expand 
the user-base big-time.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and 
> Kerberos authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on github:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123858#comment-14123858
 ] 

Konstantin Shvachko commented on HDFS-6940:
---

Aaron, I created HDFS-7007; we can continue discussing interfaces there.
Do you have technical objections to the proposed patch? It is regular 
practice to do refactoring on trunk before creating a branch for a new 
feature, in order to ease merging. I am sure you are familiar with that.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users

2014-09-05 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123852#comment-14123852
 ] 

Maysam Yabandeh commented on HDFS-6982:
---

Thanks [~wheat9].

bq. What are the minimal changes in the hadoop side to enable this 
functionality?
The minimal change is a couple of lines to register TopMetrics with the hadoop 
metrics system. 
{code}
DefaultMetricsSystem.initialize("nntop");
TopConfiguration conf = new TopConfiguration();
TopMetrics.initSingleton(conf, "processName", "sessionId",
    TopUtil.getRequestedReportPeriods(conf));
{code}
Also a config change to register TopAuditLogger as the nn audit logger.
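
For reference, a hedged example of that config change (the exact package and 
class name for TopAuditLogger are assumptions here, not quoted from the 
patch):

{code:xml}
<property>
  <name>dfs.namenode.audit.loggers</name>
  <!-- class name assumed; use the TopAuditLogger shipped with the patch -->
  <value>org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger</value>
</property>
{code}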

bq. Should rolling window reside in the NN?
The rolling window only provides lightweight aggregation, and this logic 
could also live in an external process, as suggested in the second 
architecture in the design doc. To transfer the events from the nn to a 
rolling window residing in another process (or any other aggregation 
service), the second architecture benefits from the already existing audit 
logs. We have also been using this approach at Twitter, mostly to be 
resilient against worst-case scenarios and to have the recent top users 
retrievable even if the name node is not responsive. The downside was the 
overhead of parsing the logs. Smaller clusters might also prefer not to 
maintain an additional process just to have access to the top users.


> nntop: top-like tool for name node users
> -
>
> Key: HDFS-6982
> URL: https://issues.apache.org/jira/browse/HDFS-6982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, gives the list of top users of the HDFS name node and 
> gives insight into which users are sending the majority of each traffic 
> type to the name node. This information turns out to be the most critical 
> when the name node is under pressure and the HDFS admin needs to know which 
> user is hammering the name node and with what kind of requests. Here we 
> present the design of nntop, which has been in production at Twitter for 
> the past 10 months. nntop proved to have low cpu overhead (< 2% in a 
> cluster of 4K nodes), a low memory footprint (less than a few MB), and to 
> be quite efficient on the write path (only two hash lookups to update a 
> metric).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.

2014-09-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123844#comment-14123844
 ] 

Konstantin Shvachko commented on HDFS-7007:
---

[~sanjay.radia] 
[suggested|https://issues.apache.org/jira/browse/HDFS-6469?focusedCommentId=14111655&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14111655]
 to isolate plugin interfaces, which would make integration of ConsensusNode 
easier.

I see two types of interfaces.
The first is the CoordinationEngine interface, introduced in HADOOP-10641. 
This one is ready as an interface.
The second is an interface (or a series of them) that would allow us to 
intercept a client RPC call, identify whether it modifies the namespace, 
submit that call for coordination, and then invoke the namespace operation 
corresponding to the call.
I have an implementation that does all of the above, but I haven't thought 
about it in terms of plugins. Any ideas, clarifications, or examples are very 
much welcome.
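
A purely illustrative sketch of what the second type could look like (all 
names here, including {{CoordinatedCall}}, are assumptions, not from any 
patch):

{code}
import java.io.IOException;

// Hypothetical plugin surface for intercepting and coordinating mutating calls.
public interface NamespaceCallCoordinator {
  /** @return true if the given client RPC call modifies the namespace. */
  boolean isCoordinated(String methodName);

  /** Submit a mutating call to the CoordinationEngine and wait for agreement. */
  void coordinate(CoordinatedCall call) throws IOException;

  /** Apply the namespace operation corresponding to the agreed call. */
  void apply(CoordinatedCall call) throws IOException;
}
{code}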

> Interfaces to plugin ConsensusNode.
> ---
>
> Key: HDFS-7007
> URL: https://issues.apache.org/jira/browse/HDFS-7007
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>
> This is to introduce interfaces in NameNode and namesystem, which are needed 
> to plugin ConsensusNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2014-09-05 Thread Ted Yu (JIRA)
Ted Yu created HDFS-7008:


 Summary: xlator should be closed upon exit from 
DFSAdmin#genericRefresh()
 Key: HDFS-7008
 URL: https://issues.apache.org/jira/browse/HDFS-7008
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
GenericRefreshProtocol xlator =
  new GenericRefreshProtocolClientSideTranslatorPB(proxy);

// Refresh
Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
{code}
GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator 
before return.
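
One way the fix might look (a sketch, not a reviewed patch): declare the 
translator with its concrete type and close it in a finally block so the 
underlying RPC proxy is released on every exit path.

{code}
GenericRefreshProtocolClientSideTranslatorPB xlator =
    new GenericRefreshProtocolClientSideTranslatorPB(proxy);
try {
  // Refresh
  Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
  // ... handle the responses as the current code does ...
} finally {
  xlator.close();
}
{code}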



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7007) Interfaces to plugin ConsensusNode.

2014-09-05 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HDFS-7007:
-

 Summary: Interfaces to plugin ConsensusNode.
 Key: HDFS-7007
 URL: https://issues.apache.org/jira/browse/HDFS-7007
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko


This is to introduce interfaces in NameNode and namesystem, which are needed to 
plugin ConsensusNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123808#comment-14123808
 ] 

Aaron T. Myers commented on HDFS-6940:
--

bq. If you write an application which depends on HDFS (or any other system), 
then whether you subclass or encapsulate anything from HDFS, you can break 
that application by making changes to HDFS. E.g. a change in 
getBlockLocations() can break Yarn or HBase. Same here.

Sure, but by creating a plugin interface or something of that ilk we can 
precisely define the contract, both for implementers of the interface and 
maintainers of the main system. By subclassing, you're making it more fragile.

Anyway, I'm fine if you want to proceed with this direction, but please only 
commit this to the branch, not to trunk. No reason this change needs to be on 
trunk instead of the branch for you to be able to make progress.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-6940:
--
Status: Patch Available  (was: Open)

ATM> Not entirely sure what was unclear here.

Unclear, because you use expressions like "somehow abstract", "some sort of 
interface", etc. without clarifying how or giving any examples, which is not 
constructive.
If you write an application which depends on HDFS (or any other system), 
then whether you subclass or encapsulate anything from HDFS, you can break 
that application by making changes to HDFS. E.g. a change in 
getBlockLocations() can break Yarn or HBase. Same here.
I will create a new jira as discussed in HDFS-6469 to track possible plugin 
interfaces related to ConsensusNode, and we can move this discussion there.
For this jira, I am making it Patch Available to trigger Jenkins. The methods 
will need to be opened up whichever direction we take with interfaces.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123796#comment-14123796
 ] 

Hadoop QA commented on HDFS-6986:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12666911/HDFS-6986-20140905-v3.patch
  against trunk revision 21c0cde.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestMetaSave

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7922//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7922//console

This message is automatically generated.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905-v2.patch, 
> HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6898) DN must reserve space for a full block when an RBW block is created

2014-09-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6898:

Hadoop Flags: Reviewed

+1 for the patch.  Thanks again, Arpit.

> DN must reserve space for a full block when an RBW block is created
> ---
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Gopal V
>Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, 
> HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch
>
>
> DN will successfully create two RBW blocks on the same volume even if the 
> free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception. 
> This can be avoided by allocating space up front.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123783#comment-14123783
 ] 

Hadoop QA commented on HDFS-6986:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12666886/HDFS-6986-20140905.patch
  against trunk revision 0571b45.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7917//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7917//console

This message is automatically generated.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905-v2.patch, 
> HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6898) DN must reserve space for a full block when an RBW block is created

2014-09-05 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6898:

Attachment: HDFS-6898.06.patch

Thanks for the review Chris. Updated patch attached.

> DN must reserve space for a full block when an RBW block is created
> ---
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Gopal V
>Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, 
> HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch
>
>
> DN will successfully create two RBW blocks on the same volume even if the 
> free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception. 
> This can be avoided by allocating space up front.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance

2014-09-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123767#comment-14123767
 ] 

Chris Nauroth commented on HDFS-6606:
-

Hi, [~hitliuyi].  Nice work!  This also looks fully compatible with the 
recent work in HDFS-2856 to remove the requirement to run the DataNode as 
root.

If I understand correctly, the {{DFSClient}} is still going to contact the 
NameNode to obtain an encryption key via 
{{ClientProtocol#getDataEncryptionKey}} when {{dfs.encrypt.data.transfer}} is 
true, but then the result wouldn't actually be used if a cipher is negotiated.  
It's a shame to keep around that extraneous RPC, but it's very small, and I 
don't see an easy way to change the code to avoid it.  Maybe we could queue 
this up for future consideration.

I'd just like to suggest a few more tests:
# {{TestSaslDataTransfer}}: A new test here would validate that it works with 
the HDFS-2856 style, setting {{dfs.data.transfer.protection}} instead of 
{{dfs.encrypt.data.transfer}}.
# {{TestBalancerWithEncryptedTransfer}}: A new test here would validate that 
everything works correctly end-to-end with the balancer.
# {{TestBalancerWithSaslDataTransfer}}: Same as #2, using the HDFS-2856 style 
with {{dfs.data.transfer.protection}} configured instead of 
{{dfs.encrypt.data.transfer}}.
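
For tests #1 and #3, the HDFS-2856 style configuration would look roughly 
like this (the value shown is one of the three valid protection levels):

{code:xml}
<property>
  <name>dfs.data.transfer.protection</name>
  <!-- one of: authentication, integrity, privacy -->
  <value>privacy</value>
</property>
{code}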


> Optimize HDFS Encrypted Transport performance
> -
>
> Key: HDFS-6606
> URL: https://issues.apache.org/jira/browse/HDFS-6606
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, security
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, 
> HDFS-6606.003.patch, OptimizeHdfsEncryptedTransportperformance.pdf
>
>
> In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; 
> it was great work.
> It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and 
> supports three security strengths:
> * high: 3des or rc4 (128 bits)
> * medium: des or rc4 (56 bits)
> * low: rc4 (40 bits)
> 3des and rc4 are slow, only *tens of MB/s*:
> http://www.javamex.com/tutorials/cryptography/ciphers.shtml
> http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/
> I will give more detailed performance data in the future. It is absolutely 
> a bottleneck and will vastly affect the end-to-end performance. 
> AES (Advanced Encryption Standard) is recommended as a replacement for DES 
> as it is more secure; with AES-NI support, the throughput can reach nearly 
> *2GB/s*, so it won't be the bottleneck any more. AES and CryptoCodec work 
> is supported in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to 
> add support for a new AES mode). 
> This JIRA will use AES with AES-NI support as the encryption algorithm for 
> DataTransferProtocol.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123764#comment-14123764
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

No, while under recovery the file has a lease so nobody can open it for append. 
Same as with lease recovery.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation), the reverse operation of 
> append, which forces upper-layer applications to use ugly workarounds (such 
> as keeping track of the discarded byte range per file in a separate 
> metadata store, and periodically running a vacuum process to rewrite 
> compacted files) to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6727) Refresh data volumes on DataNode based on configuration changes

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123758#comment-14123758
 ] 

Hadoop QA commented on HDFS-6727:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666838/HDFS-6727.002.patch
  against trunk revision 71269f7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.tracing.TestTracing
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7915//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7915//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7915//console

This message is automatically generated.

> Refresh data volumes on DataNode based on configuration changes
> ---
>
> Key: HDFS-6727
> URL: https://issues.apache.org/jira/browse/HDFS-6727
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>  Labels: datanode
> Attachments: HDFS-6727.000.delta-HDFS-6775.txt, HDFS-6727.001.patch, 
> HDFS-6727.002.patch, HDFS-6727.combo.patch
>
>
> HDFS-1362 requires DataNode to reload configuration file during the runtime, 
> so that DN can change the data volumes dynamically. This JIRA reuses the 
> reconfiguration framework introduced by HADOOP-7001 to enable DN to 
> reconfigure at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users

2014-09-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123750#comment-14123750
 ] 

Haohui Mai commented on HDFS-6982:
--

This is a nice feature, thanks [~maysamyabandeh]!

I have a couple of questions:

# What are the minimal changes on the hadoop side to enable this 
functionality? If nntop goes for the second architecture, does it mean that 
no code changes are required on the hadoop side?
# Should the rolling window reside in the NN? I wonder whether the code 
should simply publish the metrics to Ganglia / Nagios, etc., and let these 
frameworks take care of the aggregation, plotting, etc.



> nntop: top-like tool for name node users
> -
>
> Key: HDFS-6982
> URL: https://issues.apache.org/jira/browse/HDFS-6982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, gives the list of top users of the HDFS name node and 
> gives insight into which users are sending the majority of each traffic 
> type to the name node. This information turns out to be the most critical 
> when the name node is under pressure and the HDFS admin needs to know which 
> user is hammering the name node and with what kind of requests. Here we 
> present the design of nntop, which has been in production at Twitter for 
> the past 10 months. nntop proved to have low cpu overhead (< 2% in a 
> cluster of 4K nodes), a low memory footprint (less than a few MB), and to 
> be quite efficient on the write path (only two hash lookups to update a 
> metric).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6877) Interrupt writes when the volume being written is removed.

2014-09-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6877:

Attachment: HDFS-6877.001.combo.txt

> Interrupt writes when the volume being written is removed.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch
>
>
> There is a race condition when a client is actively writing a block while 
> the volume that the block is on is being removed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6877) Interrupt writes when the volume being written is removed.

2014-09-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6877:

Attachment: HDFS-6877.001.patch

Update patch to:

* Fix the order of deleting DatanodeStorage from {{FsDatasetImpl#storageMap}}.
* Add timeout for functional tests.

> Interrupt writes when the volume being written is removed.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.patch
>
>
> There is a race condition when a client is actively writing a block while 
> the volume that the block is on is being removed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6981:

Attachment: HDFS-6981.05.patch

Updated patch with a marker file for each BlockPoolSliceStorage root when 
rolling upgrade is in progress. The presence of the marker file is used to 
determine whether or not to delete the 'previous' directory when the rolling 
upgrade is no longer in progress.
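
A rough sketch of the idea (the marker file name and method below are 
assumptions, not the patch itself; FileUtil is org.apache.hadoop.fs.FileUtil):

{code}
// Illustrative sketch only.
private static final String ROLLING_UPGRADE_MARKER = "RollingUpgradeInProgress";

void setRollingUpgradeMarker(File bpRoot, boolean upgradeInProgress)
    throws IOException {
  File marker = new File(bpRoot, ROLLING_UPGRADE_MARKER);
  if (upgradeInProgress) {
    // remember that this BlockPoolSliceStorage root saw a rolling upgrade
    marker.createNewFile();
  } else if (marker.exists()) {
    // upgrade is over: safe to clear 'previous' and remove the marker
    FileUtil.fullyDelete(new File(bpRoot, "previous"));
    if (!marker.delete()) {
      throw new IOException("Failed to delete " + marker);
    }
  }
}
{code}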

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-09-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123718#comment-14123718
 ] 

Yongjun Zhang commented on HDFS-4239:
-

Thanks Jimmy.



> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Yongjun Zhang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally 
> the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a 
> disk after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-09-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4239:
--
Assignee: Yongjun Zhang

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Yongjun Zhang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally 
> the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a 
> disk after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-09-05 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123715#comment-14123715
 ] 

Jimmy Xiang commented on HDFS-4239:
---

Sure. Assigned it to you.

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Yongjun Zhang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally 
> the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a 
> disk after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones

2014-09-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123703#comment-14123703
 ] 

Andrew Wang commented on HDFS-6951:
---

Charles, do you mind rebasing this? It doesn't apply for me:

{noformat}
-> % git apply -p0 HDFS-6951.004.patch 
error: patch failed: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeLayoutVersion.java:65
error: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeLayoutVersion.java:
 patch does not apply
error: cannot apply binary patch to 
'hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored' without full 
index line
error: hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored: patch 
does not apply
error: patch failed: 
hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml:1
error: hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml: 
patch does not apply
{noformat}

If you provide a {{git diff --binary}}, I can also apply that directly when 
doing the commit.
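
For example (the branch and output file names here are just placeholders):

{noformat}
git diff --binary trunk > HDFS-6951.005.patch
{noformat}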

> Saving namespace and restarting NameNode will remove existing encryption zones
> --
>
> Key: HDFS-6951
> URL: https://issues.apache.org/jira/browse/HDFS-6951
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Charles Lamb
> Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, 
> HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, 
> HDFS-6951.004.patch, editsStored
>
>
> Currently, when users save namespace and restart the NameNode, pre-existing 
> encryption zones will be wiped out.
> I could reproduce this on a pseudo-distributed cluster:
> * Create an encryption zone
> * List encryption zones and verify the newly created zone is present
> * Save the namespace
> * Kill and restart the NameNode
> * List the encryption zones and you'll find the encryption zone is missing
> I've attached a test case for {{TestEncryptionZones}} that reproduces this as 
> well. Removing the saveNamespace call will get the test to pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-09-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123701#comment-14123701
 ] 

Yongjun Zhang commented on HDFS-4239:
-

Hi [~jxiang], thanks for your earlier work on this issue. I wonder if you 
will have time to work on this? If not, do you mind if I take it over? Thanks.


> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally 
> the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a 
> disk after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-09-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4239:
--
Status: Open  (was: Patch Available)

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally 
> the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a 
> disk after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4284) BlockReaderLocal not notified of failed disks

2014-09-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4284:
--
Assignee: (was: Jimmy Xiang)

> BlockReaderLocal not notified of failed disks
> -
>
> Key: HDFS-4284
> URL: https://issues.apache.org/jira/browse/HDFS-4284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.0.2-alpha
>Reporter: Andy Isaacson
>
> When a DN marks a disk as bad, it stops using replicas on that disk.
> However a long-running {{BlockReaderLocal}} instance will continue to access 
> replicas on the failing disk.
> Somehow we should let the in-client BlockReaderLocal know that a disk has 
> been marked as bad so that it can stop reading from the bad disk.
> From HDFS-4239:
> bq. To rephrase that, a long running BlockReaderLocal will ride over local DN 
> restarts and disk "ejections". We had to drain the RS of all its regions in 
> order to stop it from using the bad disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-09-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4239:
--
Assignee: (was: Jimmy Xiang)

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally 
> the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a 
> disk after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer

2014-09-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123694#comment-14123694
 ] 

Chris Nauroth commented on HDFS-6506:
-

Unfortunately, it appears this patch has gone stale.  [~decster], would you 
mind updating the patch?  [~djp], would you mind +1'ing a new patch quickly if 
you don't have any other feedback?  I'm happy to take care of the commit if 
you're busy.  It would be nice to get this in and hopefully put an end to the 
spurious failures in the balancer tests.  Thanks!

> Newly moved block replica been invalidated and deleted in TestBalancer
> --
>
> Key: HDFS-6506
> URL: https://issues.apache.org/jira/browse/HDFS-6506
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch
>
>
> TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
> https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
> From the error log, the reason seems to be that newly moved block replicas 
> have been invalidated and deleted, so some of the balancer's work is reversed.
> {noformat}
> 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
> to 127.0.0.1:55468 through 127.0.0.1:49159
> 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
> - Successfully moved blk_1073741829_1005 with size=100 fr
> 2014-06-06 18:15:54,706 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
> invalidated blocks set
> 2014-06-06 18:15:54,709 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
> invalidated blocks set
> 2014-06-06 18:15:56,421 INFO  BlockStateChange 
> (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
> 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
> 2014-06-06 18:15:57,717 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
> invalidated blocks set
> 2014-06-06 18:15:57,720 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
> invalidated blocks set
> 2014-06-06 18:15:57,721 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
> invalidated blocks set
> 2014-06-06 18:15:57,722 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
> invalidated blocks set
> 2014-06-06 18:15:57,723 INFO  BlockStateChange 
> (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
> chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
> invalidated blocks set
> 2014-06-06 18:15:59,422 INFO  BlockStateChange 
> (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
> 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
> blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
> 2014-06-06 18:16:02,423 INFO  BlockStateChange 
> (BlockManager.java:invalidat

[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123683#comment-14123683
 ] 

Haohui Mai commented on HDFS-6994:
--

Thanks for posting the patch. It looks interesting.

Do you want to separate the patch into sub tasks so that it can go through the 
review process?

Personally I think that this is an alternative implementation of libhdfs. I 
don't think we need to get rid of boost for now, but I think the code can be 
put in the contrib directory, which is not built by default but still allows 
other people to check it out if they're interested.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer 
> Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and 
> Kerberos authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on github:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123678#comment-14123678
 ] 

stack commented on HDFS-6999:
-

Any chance of your having the particular combination that brings on the 
infinite loop [~yangjiandan]?  Can you reproduce at all?  Thanks.
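
For context, the loop quoted in the description below spins whenever 
{{ch.read(buf)}} keeps returning 0 while {{buf.remaining() > 0}}. A guard 
along these lines (illustrative only, not a reviewed fix) would at least fail 
fast instead of pinning a core:

{code}
while (buf.remaining() > 0) {
  int n = ch.read(buf);
  if (n < 0) {
    throw new IOException("Premature EOF reading from " + ch);
  }
  if (n == 0) {
    // A read that makes no progress with space remaining should not recur
    // indefinitely; failing here turns the 100% CPU busy loop into an error.
    throw new IOException("Zero-length read from " + ch);
  }
}
{code}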

> PacketReceiver#readChannelFully is in an infinite loop
> --
>
> Key: HDFS-6999
> URL: https://issues.apache.org/jira/browse/HDFS-6999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.4.1
>Reporter: Yang Jiandan
>Priority: Critical
>
> In our cluster, we found that an hbase handler may never return when it 
> reads an hdfs file using RemoteBlockReader2, and the handler thread 
> occupies 100% cpu. We found this is because 
> PacketReceiver#readChannelFully is in an infinite loop; the following 
> while loop never breaks.
> {code:java}
> while (buf.remaining() > 0) {
>   int n = ch.read(buf);
>   if (n < 0) {
> throw new IOException("Premature EOF reading from " + ch);
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created

2014-09-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123676#comment-14123676
 ] 

Chris Nauroth commented on HDFS-6898:
-

Hi, [~arpitagarwal].  The patch looks great.  I have just one comment.  In 
{{FsVolumeImpl#releaseReservedSpace}}, the failsafe logic could be subject to a 
data race.  If the {{addAndGet}} results in a negative value, and then another 
thread calls {{reserveSpaceForRbw}} before the reset to 0 executes, then we'd 
lose that second thread's reservation.  Another approach might be to use a loop 
that calculates the new value (or 0) and makes a single call to 
{{compareAndSet}}, repeating until successful.
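
A minimal sketch of that retry loop, assuming the counter is an {{AtomicLong}} 
(the {{reservedForRbw}} field name is illustrative):

{code}
// Sketch of the compareAndSet retry loop suggested above; not the patch code.
void releaseReservedSpace(long bytesToRelease) {
  long current;
  long updated;
  do {
    current = reservedForRbw.get();
    updated = Math.max(0L, current - bytesToRelease);  // clamp at zero
  } while (!reservedForRbw.compareAndSet(current, updated));
  // On CAS failure another thread changed the value; we re-read and retry,
  // so a concurrent reservation can never be silently lost.
}
{code}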

> DN must reserve space for a full block when an RBW block is created
> ---
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Gopal V
>Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, 
> HDFS-6898.04.patch, HDFS-6898.05.patch
>
>
> DN will successfully create two RBW blocks on the same volume even if the 
> free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception. 
> This can be avoided by allocating space up front.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6986:

Attachment: HDFS-6986-20140905-v3.patch

Comparing 2 token objects directly instead of comparing their identifiers, for 
stronger verification.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905-v2.patch, 
> HDFS-6986-20140905-v3.patch, HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6948) DN rejects blocks if it has older UC block

2014-09-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-6948:
-
Status: Patch Available  (was: Open)

> DN rejects blocks if it has older UC block
> --
>
> Key: HDFS-6948
> URL: https://issues.apache.org/jira/browse/HDFS-6948
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
> Attachments: HDFS-6948.201409052147.txt
>
>
> DNs appear to always reject blocks, even with newer genstamps, if they 
> already have a UC copy in their tmp dir.
> {noformat}ReplicaAlreadyExistsException: Block
> XXX already
> exists in state TEMPORARY and thus cannot be created{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6948) DN rejects blocks if it has older UC block

2014-09-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-6948:
-
Attachment: HDFS-6948.201409052147.txt

> DN rejects blocks if it has older UC block
> --
>
> Key: HDFS-6948
> URL: https://issues.apache.org/jira/browse/HDFS-6948
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
> Attachments: HDFS-6948.201409052147.txt
>
>
> DNs appear to always reject blocks, even with newer genstamps, if they 
> already have a UC copy in their tmp dir.
> {noformat}ReplicaAlreadyExistsException: Block
> XXX already
> exists in state TEMPORARY and thus cannot be created{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6986:

Attachment: HDFS-6986-20140905-v2.patch

Stronger test case in the new patch. Thanks [~tucu00] for the suggestion.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905-v2.patch, HDFS-6986-20140905.patch, 
> HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-09-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123627#comment-14123627
 ] 

Colin Patrick McCabe commented on HDFS-6994:


Hi Zhanwei, this is really interesting.  As Wenwu mentioned, there's already a 
branch where we're working on a native client.  It would be nice if we could 
integrate this with that work somehow.  I'm not sure what form that should take.

Did you get a chance to read the design doc on HADOOP-10388?  There are a few 
important issues to address before this can replace libhdfs.  We need the ability to 
fall back to the JNI code when necessary-- for example, in the case where HDFS 
is using encryption, and we don't have native client support for that.  But we 
don't want a hard library dependency on libjvm.so-- it should be dynamically 
loaded.

It's good that you are using the existing hdfs.h interface.  getLastError seems 
like it could be a good addition as well, as long as it uses thread-local data 
for the string.

The dependencies here are problematic.  libxml2 is not fully thread-safe, and 
it pulls in a lot of GNOME stuff we don't really want.  The boost dependency 
creates problems as well.  For example, Impala depends on a certain version of 
boost-- if this library pulls in a different version, bad things happen.  
GnuTLS is LGPL, which makes it difficult to ship.  I would have to be -1 just 
based on the dependencies alone...

We also have duplicated protobuf files in this patch.  We should simply use the 
protobuf files in the source tree already.

If I could summarize my first thoughts:
* get rid of boost, including all boost ifdefs
* don't worry about earlier RPC versions... we only need to support RPCv9 now 
(same as Java client code policy in Hadoop)
* use libexpat or something instead of libxml2

This is good work overall and hopefully there is stuff we can use here.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, as well as NameNode HA and 
> Kerberos authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations

2014-09-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123618#comment-14123618
 ] 

Yongjun Zhang commented on HDFS-6621:
-

Thanks [~andrew.wang].

Hi [~ravwojdyla], a couple more questions. You said that the old code {{will 
notify all scheduling threads, even the ones that are waiting and still have 
all 5 transfer threads occupied}}.  Would you please explain how your fix for 
problem 2 detects the scheduling threads that still have all 5 transfer threads 
occupied, so as not to notify them?

BTW, have you tried testing with the fix for problem 1 only? Or do you have to 
apply the fixes for both problems 1 and 2 to see it work?

Thanks.


> Hadoop Balancer prematurely exits iterations
> 
>
> Key: HDFS-6621
> URL: https://issues.apache.org/jira/browse/HDFS-6621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0, 2.4.0
> Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 
> 2.4.0
>Reporter: Benjamin Bowman
>  Labels: balancer
> Attachments: HDFS-6621.patch, HDFS-6621.patch_2
>
>
> I have been having an issue with balancing being too slow.  The issue was 
> not the speed with which blocks were moved, but rather that the balancer 
> would prematurely exit its balancing iterations.  It would move ~10 
> blocks or 100 MB, then exit the current iteration (in which it said it was 
> planning on moving about 10 GB). 
> I looked in the Balancer.java code and believe I found and solved the issue.  
> In the dispatchBlocks() function there is a variable, 
> "noPendingBlockIteration", which counts the number of iterations in which a 
> pending block to move cannot be found.  Once this number gets to 5, the 
> balancer exits the overall balancing iteration.  I believe the desired 
> functionality is 5 consecutive no-pending-block iterations - however, this 
> variable is never reset to 0 upon block moves.  So once this number reaches 5 
> - even if thousands of blocks have been moved in between these 
> no-pending-block iterations - the overall balancing iteration will 
> prematurely end.  
> The fix I applied was to set noPendingBlockIteration = 0 when a pending block 
> is found and scheduled.  This way, my iterations do not prematurely exit 
> unless there are 5 consecutive no-pending-block iterations.   Below is a copy 
> of my dispatchBlocks() function with the change I made.
> {code}
> private void dispatchBlocks() {
>   long startTime = Time.now();
>   long scheduledSize = getScheduledSize();
>   this.blocksToReceive = 2*scheduledSize;
>   boolean isTimeUp = false;
>   int noPendingBlockIteration = 0;
>   while(!isTimeUp && getScheduledSize()>0 &&
>   (!srcBlockList.isEmpty() || blocksToReceive>0)) {
> PendingBlockMove pendingBlock = chooseNextBlockToMove();
> if (pendingBlock != null) {
>   noPendingBlockIteration = 0;
>   // move the block
>   pendingBlock.scheduleBlockMove();
>   continue;
> }
> /* Since we can not schedule any block to move,
>  * filter any moved blocks from the source block list and
>  * check if we should fetch more blocks from the namenode
>  */
> filterMovedBlocks(); // filter already moved blocks
> if (shouldFetchMoreBlocks()) {
>   // fetch new blocks
>   try {
> blocksToReceive -= getBlockList();
> continue;
>   } catch (IOException e) {
> LOG.warn("Exception while getting block list", e);
> return;
>   }
> } else {
>   // source node cannot find a pendingBlockToMove, iteration +1
>   noPendingBlockIteration++;
>   // in case no blocks can be moved for source node's task,
>   // jump out of while-loop after 5 iterations.
>   if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
> setScheduledSize(0);
>   }
> }
> // check if time is up or not
> if (Time.now()-startTime > MAX_ITERATION_TIME) {
>   isTimeUp = true;
>   continue;
> }
> /* Now we can not schedule any block to move and there are
>  * no new blocks added to the source block list, so we wait.
>  */
> try {
>   synchronized(Balancer.this) {
> Balancer.this.wait(1000);  // wait for targets/sources to be idle
>   }
> } catch (InterruptedException ignored) {
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6982) nntop: top-like tool for name node users

2014-09-05 Thread Maysam Yabandeh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maysam Yabandeh updated HDFS-6982:
--
Attachment: HDFS-6982.v2.patch

> nntop: top-like tool for name node users
> -
>
> Key: HDFS-6982
> URL: https://issues.apache.org/jira/browse/HDFS-6982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, lists the top users of the HDFS name node and 
> gives insight into which users are sending the majority of each traffic type 
> to the name node. This information turns out to be most critical when the 
> name node is under pressure and the HDFS admin needs to know which user is 
> hammering the name node and with what kind of requests. Here we present the 
> design of nntop, which has been in production at Twitter for the past 10 
> months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K 
> nodes), a low memory footprint (less than a few MB), and a quite efficient 
> write path (only two hash lookups to update a metric).
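
For illustration, the two-lookup write path described above could look roughly 
like this (a sketch, not the nntop source; names are illustrative):

{code}
// Sketch: one lookup resolves the per-operation map, a second resolves the
// per-user counter; the increment itself is lock-free.
private final ConcurrentHashMap<String, ConcurrentHashMap<String, AtomicLong>>
    countsByOp = new ConcurrentHashMap<String, ConcurrentHashMap<String, AtomicLong>>();

void record(String op, String user) {
  ConcurrentHashMap<String, AtomicLong> users = countsByOp.get(op);  // lookup 1
  if (users == null) {
    users = new ConcurrentHashMap<String, AtomicLong>();
    ConcurrentHashMap<String, AtomicLong> prev = countsByOp.putIfAbsent(op, users);
    if (prev != null) {
      users = prev;  // another thread won the race
    }
  }
  AtomicLong counter = users.get(user);                              // lookup 2
  if (counter == null) {
    counter = new AtomicLong();
    AtomicLong prev = users.putIfAbsent(user, counter);
    if (prev != null) {
      counter = prev;
    }
  }
  counter.incrementAndGet();
}
{code}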



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-6584) Support Archival Storage

2014-09-05 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123610#comment-14123610
 ] 

Jing Zhao edited comment on HDFS-6584 at 9/5/14 9:27 PM:
-

Upload a consolidated patch to run Jenkins.


was (Author: jingzhao):
Upload a consolidated patch to trigger the Jenkins.

> Support Archival Storage
> 
>
> Key: HDFS-6584
> URL: https://issues.apache.org/jira/browse/HDFS-6584
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer, namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-6584.000.patch, 
> HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf
>
>
> In most Hadoop clusters, as more and more data is stored for longer 
> periods, the demand for storage is outstripping the compute. Hadoop needs a 
> cost-effective and easy-to-manage solution to meet this demand for storage. 
> The current solutions are:
> - Delete old unused data. This comes at the operational cost of identifying 
> unnecessary data and deleting it manually.
> - Add more nodes to the cluster. This adds unnecessary compute capacity to 
> the cluster along with the storage capacity.
> Hadoop needs a solution to decouple growing storage capacity from compute 
> capacity. Nodes with higher density and less expensive storage with low 
> compute power are becoming available and can be used as cold storage in the 
> clusters. Based on policy, the data from hot storage can be moved to cold 
> storage. Adding more nodes to the cold storage can grow the storage 
> independently of the compute capacity in the cluster.
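
As a sketch of how policy-driven tiering could look to an admin under this 
proposal (the setStoragePolicy call and the "COLD" policy name are assumptions 
drawn from the design docs, not a shipped API):

{code}
// Hedged sketch: tag a subtree as cold; a mover pass then migrates replicas.
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
dfs.setStoragePolicy(new Path("/archive/2013"), "COLD");
// A balancer-like mover would then relocate the replicas under /archive/2013
// onto nodes providing ARCHIVE storage, independent of compute capacity.
{code}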



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6584) Support Archival Storage

2014-09-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6584:

Status: Patch Available  (was: Open)

> Support Archival Storage
> 
>
> Key: HDFS-6584
> URL: https://issues.apache.org/jira/browse/HDFS-6584
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer, namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-6584.000.patch, 
> HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf
>
>
> In most Hadoop clusters, as more and more data is stored for longer 
> periods, the demand for storage is outstripping the compute. Hadoop needs a 
> cost-effective and easy-to-manage solution to meet this demand for storage. 
> The current solutions are:
> - Delete old unused data. This comes at the operational cost of identifying 
> unnecessary data and deleting it manually.
> - Add more nodes to the cluster. This adds unnecessary compute capacity to 
> the cluster along with the storage capacity.
> Hadoop needs a solution to decouple growing storage capacity from compute 
> capacity. Nodes with higher density and less expensive storage with low 
> compute power are becoming available and can be used as cold storage in the 
> clusters. Based on policy, the data from hot storage can be moved to cold 
> storage. Adding more nodes to the cold storage can grow the storage 
> independently of the compute capacity in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6584) Support Archival Storage

2014-09-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6584:

Attachment: HDFS-6584.000.patch

Upload a consolidated patch to trigger the Jenkins.

> Support Archival Storage
> 
>
> Key: HDFS-6584
> URL: https://issues.apache.org/jira/browse/HDFS-6584
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer, namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-6584.000.patch, 
> HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf
>
>
> In most Hadoop clusters, as more and more data is stored for longer 
> periods, the demand for storage is outstripping the compute. Hadoop needs a 
> cost-effective and easy-to-manage solution to meet this demand for storage. 
> The current solutions are:
> - Delete old unused data. This comes at the operational cost of identifying 
> unnecessary data and deleting it manually.
> - Add more nodes to the cluster. This adds unnecessary compute capacity to 
> the cluster along with the storage capacity.
> Hadoop needs a solution to decouple growing storage capacity from compute 
> capacity. Nodes with higher density and less expensive storage with low 
> compute power are becoming available and can be used as cold storage in the 
> clusters. Based on policy, the data from hot storage can be moved to cold 
> storage. Adding more nodes to the cold storage can grow the storage 
> independently of the compute capacity in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-05 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123604#comment-14123604
 ] 

Jing Zhao commented on HDFS-3107:
-

While the file remains in the under_recovery state during truncation, can the 
file still be appended to?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), the reverse operation of 
> append, which makes upper-layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-3107) HDFS truncate

2014-09-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HDFS-3107:
-

Assignee: Plamen Jeliazkov

Nicholas in [his 
comment|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=13235941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235941]
 proposed three approaches to implement truncate. Here is another one, which 
was mentioned in [this 
comment|https://issues.apache.org/jira/browse/HDFS-6087?focusedCommentId=13948814&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13948814]
 of HDFS-6087.
Conceptually, truncate removes all full blocks and then starts a recovery 
process for the last block which is not fully truncated. The truncate recovery 
is similar to lease recovery. That is, NN sends truncate-DatanodeCommand to one 
of the DNs containing block replicas. The primary DN synchronizes the new 
length between replicas, and then sends commitBlockSynchronization() to NN, 
which completes the truncate.
Truncate will work only for closed files. If the file is open for write, an 
attempt to truncate fails.

Here are the truncate steps in more detail:
- NN receives a truncate(src, newLength) call from a client.
- Full blocks are deleted instantaneously, and if there is nothing more to 
truncate the NN returns success to the client.
- If not on a block boundary, the NN converts the INode to 
INodeUnderConstruction and sets the file length to newLength.
- The last block's state is set to BEING_TRUNCATED.
- The truncate operation is persisted in the editLog.
- NN triggers last-block length recovery by sending a DatanodeCommand and 
waits for the DN to report back.
- The file remains UNDER_RECOVERY until the recovery completes.
- Lease expiration (soft or hard) will trigger last-block recovery for 
truncate.
- If the NN restarts, it will restart the recovery.

Assigning to Plamen, he seems to be almost ready with the patch.
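
For illustration, a minimal sketch of the client-facing call under this 
proposal (the truncate(Path, long) signature and its boolean return are 
assumptions, not a committed API):

{code}
// Hedged sketch of proposed client-side usage.
FileSystem fs = FileSystem.get(conf);
Path file = new Path("/logs/app.log");
boolean done = fs.truncate(file, 1024L * 1024);  // cut back to 1 MB
if (!done) {
  // Not on a block boundary: per the steps above, the file stays
  // UNDER_RECOVERY until the primary DN reports
  // commitBlockSynchronization() back to the NN.
}
{code}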

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), the reverse operation of 
> append, which makes upper-layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123578#comment-14123578
 ] 

Alejandro Abdelnur commented on HDFS-6986:
--

Can we make the test stronger, asserting that we are getting the token set by the mock?

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations

2014-09-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123557#comment-14123557
 ] 

Andrew Wang commented on HDFS-6621:
---

I took a quick look at this, and I'm wondering about the change to 
{{notifyAll}} on a Source rather than Dispatcher. I don't see anything waiting 
on a Source, so this change essentially makes it into a no-op. I haven't looked 
into the Balancer deeply enough to figure out the right change though, so maybe 
someone else can comment.
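
To illustrate with a minimal sketch (based on the dispatchBlocks() code quoted 
below):

{code}
// dispatchBlocks() waits on the Balancer monitor, so only a notify on that
// same monitor can wake it:
synchronized (Balancer.this) {
  Balancer.this.notifyAll();  // wakes the wait(1000) in dispatchBlocks()
}
// By contrast, source.notifyAll() wakes nothing, since no thread ever calls
// source.wait(); that is what makes the changed call effectively a no-op.
{code}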

> Hadoop Balancer prematurely exits iterations
> 
>
> Key: HDFS-6621
> URL: https://issues.apache.org/jira/browse/HDFS-6621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0, 2.4.0
> Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 
> 2.4.0
>Reporter: Benjamin Bowman
>  Labels: balancer
> Attachments: HDFS-6621.patch, HDFS-6621.patch_2
>
>
> I have been having an issue with balancing being too slow.  The issue was 
> not the speed with which blocks were moved, but rather that the balancer 
> would prematurely exit its balancing iterations.  It would move ~10 
> blocks or 100 MB, then exit the current iteration (in which it said it was 
> planning on moving about 10 GB). 
> I looked in the Balancer.java code and believe I found and solved the issue.  
> In the dispatchBlocks() function there is a variable, 
> "noPendingBlockIteration", which counts the number of iterations in which a 
> pending block to move cannot be found.  Once this number gets to 5, the 
> balancer exits the overall balancing iteration.  I believe the desired 
> functionality is 5 consecutive no-pending-block iterations - however, this 
> variable is never reset to 0 upon block moves.  So once this number reaches 5 
> - even if thousands of blocks have been moved in between these 
> no-pending-block iterations - the overall balancing iteration will 
> prematurely end.  
> The fix I applied was to set noPendingBlockIteration = 0 when a pending block 
> is found and scheduled.  This way, my iterations do not prematurely exit 
> unless there are 5 consecutive no-pending-block iterations.   Below is a copy 
> of my dispatchBlocks() function with the change I made.
> {code}
> private void dispatchBlocks() {
>   long startTime = Time.now();
>   long scheduledSize = getScheduledSize();
>   this.blocksToReceive = 2*scheduledSize;
>   boolean isTimeUp = false;
>   int noPendingBlockIteration = 0;
>   while(!isTimeUp && getScheduledSize()>0 &&
>   (!srcBlockList.isEmpty() || blocksToReceive>0)) {
> PendingBlockMove pendingBlock = chooseNextBlockToMove();
> if (pendingBlock != null) {
>   noPendingBlockIteration = 0;
>   // move the block
>   pendingBlock.scheduleBlockMove();
>   continue;
> }
> /* Since we can not schedule any block to move,
>  * filter any moved blocks from the source block list and
>  * check if we should fetch more blocks from the namenode
>  */
> filterMovedBlocks(); // filter already moved blocks
> if (shouldFetchMoreBlocks()) {
>   // fetch new blocks
>   try {
> blocksToReceive -= getBlockList();
> continue;
>   } catch (IOException e) {
> LOG.warn("Exception while getting block list", e);
> return;
>   }
> } else {
>   // source node cannot find a pendingBlockToMove, iteration +1
>   noPendingBlockIteration++;
>   // in case no blocks can be moved for source node's task,
>   // jump out of while-loop after 5 iterations.
>   if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
> setScheduledSize(0);
>   }
> }
> // check if time is up or not
> if (Time.now()-startTime > MAX_ITERATION_TIME) {
>   isTimeUp = true;
>   continue;
> }
> /* Now we can not schedule any block to move and there are
>  * no new blocks added to the source block list, so we wait.
>  */
> try {
>   synchronized(Balancer.this) {
> Balancer.this.wait(1000);  // wait for targets/sources to be idle
>   }
> } catch (InterruptedException ignored) {
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7001) Tests in TestTracing depends on the order of execution

2014-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123552#comment-14123552
 ] 

Hadoop QA commented on HDFS-7001:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666871/HDFS-7001-0.patch
  against trunk revision 7a62515.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7916//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7916//console

This message is automatically generated.

> Tests in TestTracing depends on the order of execution
> --
>
> Key: HDFS-7001
> URL: https://issues.apache.org/jira/browse/HDFS-7001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7001-0.patch
>
>
> o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed 
> first. It should be done in BeforeClass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123530#comment-14123530
 ] 

Zhe Zhang commented on HDFS-6986:
-

The test case in the new patch mocks a key provider which returns an empty 
token. 
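
A minimal sketch of that mocking approach (names, signatures, and wiring are 
illustrative assumptions, not the actual patch code):

{code}
// Mock a KeyProvider that also implements the delegation token extension,
// so createKeyProviderDelegationTokenExtension() will delegate to it.
KeyProvider provider = Mockito.mock(KeyProvider.class,
    Mockito.withSettings().extraInterfaces(
        KeyProviderDelegationTokenExtension.DelegationTokenExtension.class));

final Token<?> emptyToken = new Token<TokenIdentifier>();  // the empty token
Mockito.when(
    ((KeyProviderDelegationTokenExtension.DelegationTokenExtension) provider)
        .addDelegationTokens(Mockito.anyString(),
            Mockito.any(Credentials.class)))
    .thenReturn(new Token<?>[] { emptyToken });

// The DFS under test should then surface emptyToken alongside its own
// HDFS delegation token:
Credentials creds = new Credentials();
Token<?>[] tokens = dfs.addDelegationTokens("renewer", creds);
{code}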

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6986:

Attachment: HDFS-6986-20140905.patch

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
>     Attachments: HDFS-6986-20140905.patch, HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7006) Test encryption zones with KMS

2014-09-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7006:
---
Summary: Test encryption zones with KMS  (was: Test encryption zones with 
MKS)

> Test encryption zones with KMS
> --
>
> Key: HDFS-7006
> URL: https://issues.apache.org/jira/browse/HDFS-7006
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.6.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HDFS-7006.patch
>
>
> We should test EZs with KMS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7006) Test encryption zones with MKS

2014-09-05 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-7006:
-
Attachment: HDFS-7006.patch

> Test encryption zones with MKS
> --
>
> Key: HDFS-7006
> URL: https://issues.apache.org/jira/browse/HDFS-7006
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.6.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HDFS-7006.patch
>
>
> We should test EZs with KMS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7006) Test encryption zones with MKS

2014-09-05 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-7006:


 Summary: Test encryption zones with MKS
 Key: HDFS-7006
 URL: https://issues.apache.org/jira/browse/HDFS-7006
 Project: Hadoop HDFS
  Issue Type: Test
  Components: security, test
Affects Versions: 2.6.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


We should test EZs with KMS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7005) DFS input streams do not timeout

2014-09-05 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7005:
-

 Summary: DFS input streams do not timeout
 Key: HDFS-7005
 URL: https://issues.apache.org/jira/browse/HDFS-7005
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


Input streams lost their timeout.  The problem appears to be that 
{{DFSClient#newConnectedPeer}} does not set the read timeout.  During a 
temporary network interruption the server will close the socket, unbeknownst to 
the client host, which then blocks on a read forever.

The results are dire.  Services such as the RM, JHS, NMs, Oozie servers, etc. 
all need to be restarted to recover - unless you want to wait many hours for 
the TCP stack keepalive to detect the broken socket.
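
A minimal sketch of the kind of fix implied here (the exact wiring inside 
{{DFSClient#newConnectedPeer}} is an assumption):

{code}
// Ensure the socket gets a read timeout when the peer is created; without
// the setSoTimeout call, a read can block forever on a dead connection.
Socket sock = NetUtils.getDefaultSocketFactory(conf).createSocket();
NetUtils.connect(sock, dnAddr, socketTimeoutMs);
sock.setSoTimeout(socketTimeoutMs);
{code}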



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6986) DistributedFileSystem must get delegation tokens from configured KeyProvider

2014-09-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123488#comment-14123488
 ] 

Alejandro Abdelnur commented on HDFS-6986:
--

I've tested the provided patch in a real cluster and it works as advertised. 
Please add a test case and we are good to go.

> DistributedFileSystem must get delegation tokens from configured KeyProvider
> 
>
> Key: HDFS-6986
> URL: https://issues.apache.org/jira/browse/HDFS-6986
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Zhe Zhang
> Attachments: HDFS-6986.patch
>
>
> {{KeyProvider}} via {{KeyProviderDelegationTokenExtension}} provides 
> delegation tokens. {{DistributedFileSystem}} should augment the HDFS 
> delegation tokens with the keyprovider ones so tasks can interact with 
> keyprovider when it is a client/server impl (KMS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()

2014-09-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123482#comment-14123482
 ] 

Colin Patrick McCabe commented on HDFS-6841:


In {{DatanodeInfo#getDatanodeReport}}, we translate {{DatanodeInfo#lastUpdate}} 
into a date:

{code}
buffer.append("Last contact: "+new Date(lastUpdate)+"\n");
{code}

This is not going to work if {{lastUpdate}} is a monotonic time.  The easiest 
way to solve this is to maintain another {{long}} with the wall-clock time, 
which we set to the current wall-clock time whenever an update occurs.  That 
way we get the benefits of calculating staleness and deadness based on 
monotonic time, but also reasonable information in {{getDatanodeReport}}.

{{FSNamesystem#reached}}: The same issue occurs here.

{code}
  if (reached > 0)
resText += " Threshold was reached " + new Date(reached) + ".";
{code}

{{EditLogTailer#lastLoadTimestamp}}: can we rename this to {{lastLoadTimeMs}}?  
It is not a timestamp (those generally come from the wall clock).  We could 
probably get rid of {{EditLogTailer#getLastLoadTimestamp}} since the only use 
is in {{FSNamesystem#getMillisSinceLastLoadedEdits}}.  All we need is a 
function that returns the amount of time since the edits were last loaded.
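
A sketch of that dual-field approach (field and method names are illustrative):

{code}
// Keep monotonic time for staleness math and wall-clock time for display.
private long lastUpdateMonotonic;  // from Time.monotonicNow()
private long lastUpdateWallClock;  // from Time.now(), display only

void markUpdated() {
  lastUpdateMonotonic = Time.monotonicNow();
  lastUpdateWallClock = Time.now();
}

boolean isStale(long staleIntervalMs) {
  // Immune to wall-clock changes such as NTP adjustments.
  return Time.monotonicNow() - lastUpdateMonotonic > staleIntervalMs;
}

String lastContactForReport() {
  return "Last contact: " + new Date(lastUpdateWallClock);
}
{code}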

> Use Time.monotonicNow() wherever applicable instead of Time.now()
> -
>
> Key: HDFS-6841
> URL: https://issues.apache.org/jira/browse/HDFS-6841
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-6841-001.patch, HDFS-6841-002.patch
>
>
> {{Time.now()}} used  in many places to calculate elapsed time.
> This should be replaced with {{Time.monotonicNow()}} to avoid effect of 
> System time changes on elapsed time calculations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123471#comment-14123471
 ] 

James Thomas commented on HDFS-6981:


The marker file sounds like the best solution to me.

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6843) Create FileStatus isEncrypted() method

2014-09-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6843:
---
Attachment: HDFS-6843.005.patch

Resubmitting to see if the weird testpatch errors go away. The previous run 
was as if the patch never got applied.


> Create FileStatus isEncrypted() method
> --
>
> Key: HDFS-6843
> URL: https://issues.apache.org/jira/browse/HDFS-6843
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, security
>Affects Versions: 3.0.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
> Attachments: HDFS-6843.001.patch, HDFS-6843.002.patch, 
> HDFS-6843.003.patch, HDFS-6843.004.patch, HDFS-6843.005.patch, 
> HDFS-6843.005.patch
>
>
> FileStatus should have a 'boolean isEncrypted()' method. (It came up in the 
> context of discussing with Andrew about FileStatus being a Writable.)
> Having this method would allow the MR JobSubmitter to do the following:
> -
> BOOLEAN intermediateEncryption = false
> IF jobconf.contains("mr.intermediate.encryption") THEN
>   intermediateEncryption = jobConf.getBoolean("mr.intermediate.encryption")
> ELSE
>   IF (I/O)Format INSTANCEOF File(I/O)Format THEN
> intermediateEncryption = ANY File(I/O)Format HAS a Path with status 
> isEncrypted()==TRUE
>   FI
>   jobConf.setBoolean("mr.intermediate.encryption", intermediateEncryption)
> FI
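
A rough Java rendering of the pseudocode above (the key name and the statuses 
collection are illustrative assumptions):

{code}
// Sketch only: decide intermediate encryption from the new isEncrypted().
boolean intermediateEncryption;
if (jobConf.get("mr.intermediate.encryption") != null) {
  intermediateEncryption = jobConf.getBoolean("mr.intermediate.encryption", false);
} else {
  intermediateEncryption = false;
  for (FileStatus status : inputAndOutputStatuses) {  // File(I/O)Format paths
    if (status.isEncrypted()) {  // the proposed new method
      intermediateEncryption = true;
      break;
    }
  }
  jobConf.setBoolean("mr.intermediate.encryption", intermediateEncryption);
}
{code}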



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7001) Tests in TestTracing depends on the order of execution

2014-09-05 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-7001:
---
Attachment: HDFS-7001-0.patch

Attaching patch.
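
A minimal sketch of the direction (the body is illustrative, not the actual 
patch): move the one-time setup into @BeforeClass so later tests no longer 
depend on execution order.

{code}
@BeforeClass
public static void setup() throws IOException {
  conf = new Configuration();
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
  // the span receiver host checks formerly in testSpanReceiverHost() go here
}
{code}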

> Tests in TestTracing depends on the order of execution
> --
>
> Key: HDFS-7001
> URL: https://issues.apache.org/jira/browse/HDFS-7001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7001-0.patch
>
>
> o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed 
> first. It should be done in BeforeClass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7001) Tests in TestTracing depends on the order of execution

2014-09-05 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-7001:
---
Status: Patch Available  (was: Open)

> Tests in TestTracing depends on the order of execution
> --
>
> Key: HDFS-7001
> URL: https://issues.apache.org/jira/browse/HDFS-7001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7001-0.patch
>
>
> o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed 
> first. It should be done in BeforeClass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123413#comment-14123413
 ] 

Arpit Agarwal edited comment on HDFS-6981 at 9/5/14 7:12 PM:
-

Lacking an explicit finalize command for rolling upgrade, it is hard for the DN 
to determine when to delete 'previous'. Rolling upgrade is signaled by the 
presence/absence of RollingUpgradeStatus in the heartbeat response.

Without modifying the NN, one solution is that the DN creates a marker file 
when rolling upgrade is signaled by NN. When rolling upgrade is no longer 
signaled by NN, 'previous' is cleaned up only if the marker file is present. 
Else a regular upgrade is in progress and 'previous' is left alone.

I am wary of making NN changes; the interaction with HA is complex enough as 
it is. 


was (Author: arpitagarwal):
Lacking an explicit finalize command for rolling upgrade, it is hard for the DN 
to determine when to delete 'previous'. Rolling upgrade is signaled by the 
presence/absence of RollingUpgradeInfo in the heartbeat response.

Without modifying the NN, one solution is that the DN creates a marker file 
when rolling upgrade is signaled by NN. When rolling upgrade is no longer 
signaled by NN, 'previous' is cleaned up only if the marker file is present. 
Else a regular upgrade is in progress and 'previous' is left alone.

I am wary of making NN changes; the interaction with HA is complex enough as 
it is. 

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash

2014-09-05 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123413#comment-14123413
 ] 

Arpit Agarwal commented on HDFS-6981:
-

Lacking an explicit finalize command for rolling upgrade, it is hard for the DN 
to determine when to delete 'previous'. Rolling upgrade is signaled by the 
presence/absence of RollingUpgradeInfo in the heartbeat response.

Without modifying the NN, one solution is that the DN creates a marker file 
when rolling upgrade is signaled by NN. When rolling upgrade is no longer 
signaled by NN, 'previous' is cleaned up only if the marker file is present. 
Else a regular upgrade is in progress and 'previous' is left alone.

I am wary of making NN changes; the interaction with HA is complex enough as 
it is. 
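
A minimal sketch of that marker-file flow (paths and helper names are 
assumptions):

{code}
// DN drops a marker when the NN signals rolling upgrade; on a later
// heartbeat without the signal, 'previous' is removed only if the marker
// exists, so a regular upgrade's 'previous' is never touched.
File marker = new File(storageDir, "rolling_upgrade.marker");
if (rollingUpgradeSignaled) {             // RollingUpgradeStatus in heartbeat
  if (!marker.exists()) {
    marker.createNewFile();
  }
} else if (marker.exists()) {
  deleteDirectory(new File(storageDir, "previous"));  // assumed helper
  marker.delete();
}
// No marker and no signal: a regular upgrade owns 'previous'; leave it alone.
{code}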

> DN upgrade with layout version change should not use trash
> --
>
> Key: HDFS-6981
> URL: https://issues.apache.org/jira/browse/HDFS-6981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: James Thomas
>Assignee: Arpit Agarwal
> Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, 
> HDFS-6981.03.patch, HDFS-6981.04.patch
>
>
> Post HDFS-6800, we can encounter the following scenario:
> # We start with DN software version -55 and initiate a rolling upgrade to 
> version -56
> # We delete some blocks, and they are moved to trash
> # We roll back to DN software version -55 using the -rollback flag – since we 
> are running the old code (prior to this patch), we will restore the previous 
> directory but will not delete the trash
> # We append to some of the blocks that were deleted in step 2
> # We then restart a DN that contains blocks that were appended to – since the 
> trash still exists, it will be restored at this point, the appended-to blocks 
> will be overwritten, and we will lose the appended data
> So I think we need to avoid writing anything to the trash directory if we 
> have a previous directory.
> Thanks to [~james.thomas] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus no longer a Writable

2014-09-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123412#comment-14123412
 ] 

Chris Nauroth commented on HDFS-6984:
-

bq. So, it looks like DistCp depends on FileStatus being writable...

Last time I looked at this, I actually planned on replacing DistCp's usage of 
{{FileStatus}} serialization with its own custom data type.  I believe it 
doesn't need all of the fields of {{FileStatus}}, so there is potential for a 
marginal space/performance improvement by omitting the unnecessary fields.

> In Hadoop 3, make FileStatus no longer a Writable
> -
>
> Key: HDFS-6984
> URL: https://issues.apache.org/jira/browse/HDFS-6984
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6984.001.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this 
> to serialize it and send it over the wire.  But in Hadoop 2 and later, we 
> have the protobuf {{HdfsFileStatusProto}} which serves to serialize this 
> information.  The protobuf form is preferable, since it allows us to add new 
> fields in a backwards-compatible way.  Another issue is that a lot of 
> subclasses of FileStatus already don't override the Writable methods of the 
> superclass, breaking the interface contract that read(status.write) should 
> equal the original status.
> In Hadoop 3, we should just make FileStatus no longer a writable so that we 
> don't have to deal with these issues.  It's probably too late to do this in 
> Hadoop 2, since user code may be relying on the ability to use the Writable 
> methods on FileStatus objects there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus no longer a Writable

2014-09-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123402#comment-14123402
 ] 

Colin Patrick McCabe commented on HDFS-6984:


So, it looks like DistCp depends on {{FileStatus}} being writable in a pretty 
fundamental way, since it wants to use it as a MapReduce value in 
CopyMapper.java:

{code}
public class CopyMapper extends Mapper<Text, FileStatus, Text, Text> {
...
{code}

Maybe, rather than get rid of the "implements Writable", we should just use 
protobuf for the serialization in {{FileStatus#write}}.  That allows us to add 
whatever fields we want later via optional protobuf members.
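
A hedged sketch of that alternative, assuming toProto()/fromProto() conversion 
helpers exist (names are illustrative):

{code}
// Keep "implements Writable" but delegate the byte format to protobuf, so
// new optional fields flow through without breaking old readers.
@Override
public void write(DataOutput out) throws IOException {
  byte[] bytes = toProto().toByteArray();  // HdfsFileStatusProto
  out.writeInt(bytes.length);
  out.write(bytes);
}

@Override
public void readFields(DataInput in) throws IOException {
  byte[] bytes = new byte[in.readInt()];
  in.readFully(bytes);
  fromProto(HdfsFileStatusProto.parseFrom(bytes));
}
{code}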

> In Hadoop 3, make FileStatus no longer a Writable
> -
>
> Key: HDFS-6984
> URL: https://issues.apache.org/jira/browse/HDFS-6984
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6984.001.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this 
> to serialize it and send it over the wire.  But in Hadoop 2 and later, we 
> have the protobuf {{HdfsFileStatusProto}} which serves to serialize this 
> information.  The protobuf form is preferable, since it allows us to add new 
> fields in a backwards-compatible way.  Another issue is that a lot of 
> subclasses of FileStatus already don't override the Writable methods of the 
> superclass, breaking the interface contract that read(status.write) should 
> equal the original status.
> In Hadoop 3, we should just make FileStatus no longer a writable so that we 
> don't have to deal with these issues.  It's probably too late to do this in 
> Hadoop 2, since user code may be relying on the ability to use the Writable 
> methods on FileStatus objects there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7004) Update KeyProvider instantiation to create by URI

2014-09-05 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-7004:
-

 Summary: Update KeyProvider instantiation to create by URI
 Key: HDFS-7004
 URL: https://issues.apache.org/jira/browse/HDFS-7004
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Andrew Wang
Assignee: Andrew Wang


See HADOOP-11054; it would be good to update the NN/DFSClient to fetch via this 
method rather than depending on the URI path lookup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs

2014-09-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1412#comment-1412
 ] 

Yongjun Zhang commented on HDFS-6776:
-

Hi [~wheat9], 

I know that you are opposed to putting the msg-parsing hack into webhdfs. 
However, you also have said:
{quote}
However, it's okay to me to return a null here so the behavior is similar to 
DistributedFileSystem. The actual fallback logic can happen at the distcp side 
when building the file list, but maybe we can defer it to another jira.
{quote}

I had quite a few questions for you in my last two comments; I'd appreciate it 
if you could comment on them. That way, we can better understand your concern 
about why it's so fragile, as you said. 

Do you agree that a correct webhdfs contract is not to fail with the exception 
when accessing an insecure cluster, but rather to be able to access the 
insecure cluster? This is a very important question that I hope you can answer.

We agree that the msg-parsing is a bit hacky, but why is a hack in webhdfs so 
much worse than one in distcp, given that webhdfs doesn't work without a fix?

BTW, FYI (not to say that it's a good thing to do), there was already code 
doing msg parsing in webhdfs:
{code}
  // extract UGI-related exceptions and unwrap InvalidToken
  // the NN mangles these exceptions but the DN does not and may need
  // to re-fetch a token if either report the token is expired
  if (re.getMessage().startsWith("Failed to obtain user group 
information:")) {
String[] parts = re.getMessage().split(":\\s+", 3);
re = new RemoteException(parts[1], parts[2]);
re = ((RemoteException)re).unwrapRemoteException(InvalidToken.class);
  }
{code}
Do you consider this fragile?

Disclaimer: the patch I did here is not modeled on the existing code quoted 
above; rather, it's because of the solution's simplicity, which we discussed 
earlier.

Thanks.


> distcp from insecure cluster (source) to secure cluster (destination) doesn't 
> work via webhdfs
> --
>
> Key: HDFS-6776
> URL: https://issues.apache.org/jira/browse/HDFS-6776
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
> HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, 
> HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch, 
> HDFS-6776.006.NullToken.patch, HDFS-6776.007.patch, HDFS-6776.008.patch, 
> HDFS-6776.009.patch, HDFS-6776.010.patch, HDFS-6776.011.patch, 
> dummy-token-proxy.js
>
>
> Issuing distcp command at the secure cluster side, trying to copy stuff from 
> insecure cluster to secure cluster, and see the following problem:
> {code}
> hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://:/tmp 
> hdfs://:8020/tmp/tmptgt
> 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[webhdfs://:/tmp], 
> targetPath=hdfs://:8020/tmp/tmptgt, targetPathExists=true}
> 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
> :8032
> 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
> 'ssl.client.truststore.location' has not been set, no TrustStore will be 
> loaded
> 14/07/30 20:06:20 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to get the token for hadoopuser, 
> user=hadoopuser
> 14/07/30 20:06:20 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to get the token for hadoopuser, 
> user=hadoopuser
> 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
> java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFi

[jira] [Updated] (HDFS-6727) Refresh data volumes on DataNode based on configuration changes

2014-09-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6727:

Attachment: HDFS-6727.002.patch

Updated the patch to remove {{ReconfigurableServlet}} support from the code. 
It also uses {{File#getCanonicalPath()}} to determine the changed volumes.
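
As a side note for reviewers, a minimal sketch of the canonical-path comparison 
idea (illustrative only, not the patch itself; the helper below is made up): 
canonicalization collapses symlinks, "." segments, and trailing separators, so 
two spellings of the same volume are not misreported as one removal plus one 
addition.
{code}
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class VolumeDiffSketch {
  /** Sketch only: compute which volumes in newDirs are genuinely new,
   *  comparing comma-separated dfs.datanode.data.dir values by
   *  canonical path so "/data/1" and "/data/1/" compare equal. */
  static Set<String> addedVolumes(String oldDirs, String newDirs)
      throws IOException {
    Set<String> existing = new HashSet<String>();
    for (String dir : oldDirs.split(",")) {
      existing.add(new File(dir.trim()).getCanonicalPath());
    }
    Set<String> added = new HashSet<String>();
    for (String dir : newDirs.split(",")) {
      String canonical = new File(dir.trim()).getCanonicalPath();
      if (!existing.contains(canonical)) {
        added.add(canonical);
      }
    }
    return added;
  }
}
{code}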

> Refresh data volumes on DataNode based on configuration changes
> ---
>
> Key: HDFS-6727
> URL: https://issues.apache.org/jira/browse/HDFS-6727
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>  Labels: datanode
> Attachments: HDFS-6727.000.delta-HDFS-6775.txt, HDFS-6727.001.patch, 
> HDFS-6727.002.patch, HDFS-6727.combo.patch
>
>
> HDFS-1362 requires the DataNode to reload its configuration file at runtime, 
> so that the DN can change its data volumes dynamically. This JIRA reuses the 
> reconfiguration framework introduced by HADOOP-7001 to enable the DN to 
> reconfigure at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6862) Add missing timeout annotations to tests

2014-09-05 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6862:

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks for fixing this [~xyao]!
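
For anyone picking up similar newbie JIRAs, the change is mechanical. A small 
example of the pattern (class and test names are illustrative):
{code}
import org.junit.Test;

public class TestExample {
  // A per-test timeout (in milliseconds) makes the JUnit runner fail
  // the test instead of letting a deadlock hang the whole build.
  @Test(timeout = 300000)
  public void testSomething() throws Exception {
    // test body
  }
}
{code}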

> Add missing timeout annotations to tests
> 
>
> Key: HDFS-6862
> URL: https://issues.apache.org/jira/browse/HDFS-6862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>  Labels: newbie
> Fix For: 2.6.0
>
> Attachments: HDFS-6862.0.patch
>
>
> One or more tests in the following classes are missing timeout annotations.
> # org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
> # org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
> # org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> # org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
> # org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
> # org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
> # org.apache.hadoop.hdfs.TestHDFSServerPorts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6979) hdfs.dll does not produce .pdb files

2014-09-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6979:

   Resolution: Fixed
Fix Version/s: 2.6.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I have committed this to trunk and branch-2.  Arpit, thank you for the code 
review.  Remus, thank you again for reporting the bug.

> hdfs.dll does not produce .pdb files
> 
>
> Key: HDFS-6979
> URL: https://issues.apache.org/jira/browse/HDFS-6979
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Remus Rusanu
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: build, cmake, native, windows
> Fix For: 3.0.0, 2.6.0
>
> Attachments: HDFS-6979.1.patch
>
>
> The hdfs.dll build does not produce a retail pdb. For comparison, we do 
> produce pdbs for winutils.exe and hadoop.dll.
> I did not verify whether the cmake project produces a dll with an embedded 
> pdb.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6979) hdfs.dll does not produce .pdb files

2014-09-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6979:

Summary: hdfs.dll does not produce .pdb files  (was: hdfs.dll  not produce 
.pdb files)

> hdfs.dll does not produce .pdb files
> 
>
> Key: HDFS-6979
> URL: https://issues.apache.org/jira/browse/HDFS-6979
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Remus Rusanu
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: build, cmake, native, windows
> Attachments: HDFS-6979.1.patch
>
>
> The hdfs.dll build does not produce a retail pdb. For comparison, we do 
> produce pdbs for winutils.exe and hadoop.dll.
> I did not verify whether the cmake project produces a dll with an embedded 
> pdb.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6979) hdfs.dll not produce .pdb files

2014-09-05 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123243#comment-14123243
 ] 

Arpit Agarwal commented on HDFS-6979:
-

+1 for the patch. pdb files are good to have. 

> hdfs.dll  not produce .pdb files
> 
>
> Key: HDFS-6979
> URL: https://issues.apache.org/jira/browse/HDFS-6979
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Remus Rusanu
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: build, cmake, native, windows
> Attachments: HDFS-6979.1.patch
>
>
> The hdfs.dll build does not produce a retail pdb. For comparison, we do 
> produce pdbs for winutils.exe and hadoop.dll.
> I did not verify whether the cmake project produces a dll with an embedded 
> pdb.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2014-09-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6376:

   Resolution: Fixed
Fix Version/s: 2.6.0
 Release Note: Allow distcp to copy data between HA clusters. Users can use 
a new configuration property "dfs.internal.nameservices" to explicitly specify 
the name services belonging to the local cluster, while continuing to use the 
configuration property "dfs.nameservices" to specify all the name services in 
the local and remote clusters.
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this into trunk and branch-2. Thanks for the contribution, 
[~dlmarion] and [~wheat9]!
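
To illustrate the release note with made-up nameservice ids ("localns" and 
"remotens"), a sketch of the client-side settings; the equivalent 
hdfs-site.xml properties would carry the same values:
{code}
import org.apache.hadoop.conf.Configuration;

public class HAClientConfSketch {
  static Configuration clientConf() {
    Configuration conf = new Configuration();
    // Every nameservice the client may talk to, local and remote,
    // so distcp can resolve the active NN of both HA clusters.
    conf.set("dfs.nameservices", "localns,remotens");
    // Only the local cluster's nameservices; datanodes register with
    // (and federate blocks for) these nameservices only.
    conf.set("dfs.internal.nameservices", "localns");
    return conf;
  }
}
{code}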

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 3.0.0, 2.6.0
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, 
> HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch, HDFS-6376.000.patch, HDFS-6376.008.patch, 
> HDFS-6376.009.patch, HDFS-6376.010.patch, HDFS-6376.011.patch
>
>
> Users have to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I cannot find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs

2014-09-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123232#comment-14123232
 ] 

Haohui Mai edited comment on HDFS-6776 at 9/5/14 5:47 PM:
--

I've made it very clear that I'm opposed to putting fragile hacks into 
{{WebHdfsFileSystem}}, though I'm okay if it's done at the application level 
(e.g. distcp). Unless this is addressed, I cannot give my +1.

If you are not familiar with the distcp code, I'll take a look and see whether 
I can post a patch for it.


was (Author: wheat9):
I've made it very clear that I'm opposed to putting fragile hacks into 
{{WebHdfsFileSystem}}, though I'm okay if it's done at the application level 
(e.g. distcp). Unless this is addressed, I cannot give my +1.

If you don't want to take a look at the distcp code, I'll take a look and see 
whether I can post a patch for it.

> distcp from insecure cluster (source) to secure cluster (destination) doesn't 
> work via webhdfs
> --
>
> Key: HDFS-6776
> URL: https://issues.apache.org/jira/browse/HDFS-6776
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
> HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, 
> HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch, 
> HDFS-6776.006.NullToken.patch, HDFS-6776.007.patch, HDFS-6776.008.patch, 
> HDFS-6776.009.patch, HDFS-6776.010.patch, HDFS-6776.011.patch, 
> dummy-token-proxy.js
>
>
> Issuing distcp command at the secure cluster side, trying to copy stuff from 
> insecure cluster to secure cluster, and see the following problem:
> {code}
> hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://:/tmp 
> hdfs://:8020/tmp/tmptgt
> 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[webhdfs://:/tmp], 
> targetPath=hdfs://:8020/tmp/tmptgt, targetPathExists=true}
> 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
> :8032
> 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
> 'ssl.client.truststore.location' has not been set, no TrustStore will be 
> loaded
> 14/07/30 20:06:20 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to get the token for hadoopuser, 
> user=hadoopuser
> 14/07/30 20:06:20 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to get the token for hadoopuser, 
> user=hadoopuser
> 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
> java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
>   at 
> org.apache.hadoop.hdfs.web.WebH

[jira] [Updated] (HDFS-6831) Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'

2014-09-05 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6831:

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1 I committed this to trunk and branch-2.

Thanks for the contribution [~xyao] and thanks [~ajisakaa] for reviewing.
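
For future reference, one way to keep the two outputs from drifting again is 
to derive both from a single source. A rough sketch (illustrative only, not 
how DFSAdmin is actually structured):
{code}
import java.io.PrintStream;

class UsageSketch {
  /** Sketch only: one canonical list of command summaries. */
  private static final String[] COMMANDS = {
    "[-report]",
    "[-safemode enter | leave | get | wait]",
    // ... one entry per dfsadmin command ...
    "[-help [cmd]]",
  };

  /** Both the no-args usage and -help print from the same array,
   *  so the two listings cannot disagree. */
  static void printUsage(PrintStream out) {
    out.println("Usage: hdfs dfsadmin");
    out.println("Note: Administrative commands can only be run as the "
        + "HDFS superuser.");
    for (String cmd : COMMANDS) {
      out.println("    " + cmd);
    }
  }
}
{code}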

> Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'
> ---
>
> Key: HDFS-6831
> URL: https://issues.apache.org/jira/browse/HDFS-6831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Akira AJISAKA
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: newbie
> Fix For: 2.6.0
>
> Attachments: HDFS-6831.0.patch, HDFS-6831.1.patch, HDFS-6831.2.patch, 
> HDFS-6831.3.patch, HDFS-6831.4.patch
>
>
> There is an inconsistency between the console outputs of the 'hdfs dfsadmin' 
> command and the 'hdfs dfsadmin -help' command.
> {code}
> [root@trunk ~]# hdfs dfsadmin
> Usage: java DFSAdmin
> Note: Administrative commands can only be run as the HDFS superuser.
>[-report]
>[-safemode enter | leave | get | wait]
>[-allowSnapshot ]
>[-disallowSnapshot ]
>[-saveNamespace]
>[-rollEdits]
>[-restoreFailedStorage true|false|check]
>[-refreshNodes]
>[-finalizeUpgrade]
>[-rollingUpgrade []]
>[-metasave filename]
>[-refreshServiceAcl]
>[-refreshUserToGroupsMappings]
>[-refreshSuperUserGroupsConfiguration]
>[-refreshCallQueue]
>[-refresh]
>[-printTopology]
>[-refreshNamenodes datanodehost:port]
>[-deleteBlockPool datanode-host:port blockpoolId [force]]
>[-setQuota  ...]
>[-clrQuota ...]
>[-setSpaceQuota  ...]
>[-clrSpaceQuota ...]
>[-setBalancerBandwidth ]
>[-fetchImage ]
>[-shutdownDatanode  [upgrade]]
>[-getDatanodeInfo ]
>[-help [cmd]]
> {code}
> {code}
> [root@trunk ~]# hdfs dfsadmin -help
> hadoop dfsadmin performs DFS administrative commands.
> The full syntax is: 
> hadoop dfsadmin
>   [-report [-live] [-dead] [-decommissioning]]
>   [-safemode ]
>   [-saveNamespace]
>   [-rollEdits]
>   [-restoreFailedStorage true|false|check]
>   [-refreshNodes]
>   [-setQuota  ...]
>   [-clrQuota ...]
>   [-setSpaceQuota  ...]
>   [-clrSpaceQuota ...]
>   [-finalizeUpgrade]
>   [-rollingUpgrade []]
>   [-refreshServiceAcl]
>   [-refreshUserToGroupsMappings]
>   [-refreshSuperUserGroupsConfiguration]
>   [-refreshCallQueue]
>   [-refresh   [arg1..argn]
>   [-printTopology]
>   [-refreshNamenodes datanodehost:port]
>   [-deleteBlockPool datanodehost:port blockpoolId [force]]
>   [-setBalancerBandwidth ]
>   [-fetchImage ]
>   [-allowSnapshot ]
>   [-disallowSnapshot ]
>   [-shutdownDatanode  [upgrade]]
>   [-getDatanodeInfo 
>   [-help [cmd]
> {code}
> These two outputs should be the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

