[jira] [Commented] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299675#comment-14299675
 ] 

Hadoop QA commented on HDFS-7720:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695700/HDFS-7720.0.patch
  against trunk revision 09ad9a8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.balancer.TestBalancer
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.web.TestWebHDFSXAttr

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9389//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9389//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9389//console

This message is automatically generated.

> Quota by Storage Type API, tools and ClientNameNode Protocol changes
> 
>
> Key: HDFS-7720
> URL: https://issues.apache.org/jira/browse/HDFS-7720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-7720.0.patch, HDFS-7720.1.patch
>
>
> Split the patch into smaller ones based on the feedback. This one covers the 
> HDFS API changes, tool changes, and ClientNameNode protocol changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299664#comment-14299664
 ] 

Hadoop QA commented on HDFS-7719:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695692/HDFS-7719.000.patch
  against trunk revision 09ad9a8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1191 javac 
compiler warnings (more than the trunk's current 152 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
48 warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/9388//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9388//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9388//artifact/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9388//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9388//console

This message is automatically generated.

> BlockPoolSliceStorage could not remove storageDir.
> --
>
> Key: HDFS-7719
> URL: https://issues.apache.org/jira/browse/HDFS-7719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7719.000.patch
>
>
> The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume-level 
> directories, so {{BlockPoolSliceStorage}} cannot directly compare its own 
> {{StorageDirs}} with these volume-level directories. As a result, 
> {{BlockPoolSliceStorage}} does not actually remove the targeted 
> {{StorageDirectory}}. 
> This causes a failure when removing a volume and then immediately adding a 
> volume back with the same mount point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299665#comment-14299665
 ] 

Hadoop QA commented on HDFS-7647:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695696/HDFS-7647-3.patch
  against trunk revision 09ad9a8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
  org.apache.hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling
  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
  org.apache.hadoop.hdfs.TestDataTransferKeepalive
  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.TestBlockReaderFactory

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9387//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9387//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9387//console

This message is automatically generated.

> DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
> --
>
> Key: HDFS-7647
> URL: https://issues.apache.org/jira/browse/HDFS-7647
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Milan Desai
>Assignee: Milan Desai
> Attachments: HDFS-7647-2.patch, HDFS-7647-3.patch, HDFS-7647.patch
>
>
> DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside 
> each LocatedBlock, but does not touch the array of StorageIDs and 
> StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are 
> mismatched. The method is called by FSNamesystem.getBlockLocations(), so the 
> client will not know which StorageID/Type corresponds to which DatanodeInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299649#comment-14299649
 ] 

Hadoop QA commented on HDFS-7720:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695711/HDFS-7720.1.patch
  against trunk revision 054a947.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotRename

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9390//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9390//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9390//console

This message is automatically generated.

> Quota by Storage Type API, tools and ClientNameNode Protocol changes
> 
>
> Key: HDFS-7720
> URL: https://issues.apache.org/jira/browse/HDFS-7720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-7720.0.patch, HDFS-7720.1.patch
>
>
> Split the patch into smaller ones based on the feedback. This one covers the 
> HDFS API changes, tool changes, and ClientNameNode protocol changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7712) Switch blockStateChangeLog to use slf4j

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299581#comment-14299581
 ] 

Hadoop QA commented on HDFS-7712:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695654/hdfs-7712.002.patch
  against trunk revision 09ad9a8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
  org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
  org.apache.hadoop.hdfs.TestFileCreationDelete
  org.apache.hadoop.hdfs.TestFileAppend4
  org.apache.hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.TestFileAppend3
  org.apache.hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes
  org.apache.hadoop.hdfs.TestRenameWhileOpen
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles
  org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage
  org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode
  org.apache.hadoop.hdfs.TestDatanodeDeath
  org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport
  org.apache.hadoop.hdfs.TestFileCorruption

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestFileCreation

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9386//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9386//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9386//console

This message is automatically generated.

> Switch blockStateChangeLog to use slf4j
> ---
>
> Key: HDFS-7712
> URL: https://issues.apache.org/jira/browse/HDFS-7712
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-7712.001.patch, hdfs-7712.002.patch
>
>
> As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will 
> save a lot of string construction costs.
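As a rough illustration (not from the patch; {{blockStateChangeLog}} is the real NameNode logger, but the call site below is made up), slf4j's parameterized logging defers message assembly until the level is known to be enabled:

{code}
// Before: the message string is built even when DEBUG is off,
// unless every call site adds an isDebugEnabled() guard.
if (NameNode.blockStateChangeLog.isDebugEnabled()) {
  NameNode.blockStateChangeLog.debug(
      "BLOCK* addToInvalidates: " + block + " on " + datanode);
}

// After: with slf4j, formatting only happens if DEBUG is enabled,
// so both the guard and the string concatenation go away.
NameNode.blockStateChangeLog.debug(
    "BLOCK* addToInvalidates: {} on {}", block, datanode);
{code}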



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7723) Quota By Storage Type namenode implementation

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7723:
-
Attachment: HDFS-7723.0.patch

This patch assumes the ClientNameNode RPC protocol changes (HDFS-7720) are in. I will 
defer submitting the patch until the review for HDFS-7720 is finished.

> Quota By Storage Type namenode implementation
> 
>
> Key: HDFS-7723
> URL: https://issues.apache.org/jira/browse/HDFS-7723
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
> Attachments: HDFS-7723.0.patch
>
>
> This includes: 1) a new editlog op to persist quota by storage type, 2) 
> corresponding fsimage load/save support for the new op, 3) a QuotaCount 
> refactor to update per-storage-type usage for quota enforcement, 4) snapshot 
> support, and 5) unit test updates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations

2015-01-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299554#comment-14299554
 ] 

Tsz Wo Nicholas Sze commented on HDFS-5631:
---

The extdataset package is missing in the branch-2 patch.  Did you forget to add the new files?

> Expose interfaces required by FsDatasetSpi implementations
> --
>
> Key: HDFS-5631
> URL: https://issues.apache.org/jira/browse/HDFS-5631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: Joe Pallas
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-5631-LazyPersist.patch, 
> HDFS-5631-LazyPersist.patch, HDFS-5631-branch-2.patch, HDFS-5631.patch, 
> HDFS-5631.patch
>
>
> This sub-task addresses section 4.1 of the document attached to HDFS-5194,
> the exposure of interfaces needed by a FsDatasetSpi implementation.
> Specifically it makes ChunkChecksum public and BlockMetadataHeader's
> readHeader() and writeHeader() methods public.
> The changes to BlockReaderUtil (and related classes) discussed by section
> 4.1 are only needed if supporting short-circuit, and should be addressed
> as part of an effort to provide such support rather than this JIRA.
> To help ensure these changes are complete and are not regressed in the
> future, tests that gauge the accessibility (though *not* behavior)
> of interfaces needed by a FsDatasetSpi subclass are also included.
> These take the form of a dummy FsDatasetSpi subclass -- a successful
> compilation is effectively a pass.  Trivial unit tests are included so
> that there is something tangible to track.
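As a minimal sketch of what the newly public API enables (not from the patch; {{metaFile}} is a placeholder), an external dataset implementation can read a block's metadata header directly:

{code}
// Read the version and checksum info from a block meta file using the
// BlockMetadataHeader.readHeader() method this JIRA makes public.
DataInputStream in = new DataInputStream(
    new BufferedInputStream(new FileInputStream(metaFile)));
try {
  BlockMetadataHeader header = BlockMetadataHeader.readHeader(in);
  DataChecksum checksum = header.getChecksum();
} finally {
  in.close();
}
{code}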



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7723) Quota By Storage Type namenode implementation

2015-01-30 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-7723:


 Summary: Quota By Storage Type namenode implementation
 Key: HDFS-7723
 URL: https://issues.apache.org/jira/browse/HDFS-7723
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Xiaoyu Yao


This includes: 1) a new editlog op to persist quota by storage type, 2) 
corresponding fsimage load/save support for the new op, 3) a QuotaCount 
refactor to update per-storage-type usage for quota enforcement, 4) snapshot 
support, and 5) unit test updates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7720:
-
Attachment: HDFS-7720.1.patch

> Quota by Storage Type API, tools and ClientNameNode Protocol changes
> 
>
> Key: HDFS-7720
> URL: https://issues.apache.org/jira/browse/HDFS-7720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-7720.0.patch, HDFS-7720.1.patch
>
>
> Split the patch into smaller ones based on the feedback. This one covers the 
> HDFS API changes, tool changes, and ClientNameNode protocol changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-7707:
-
Target Version/s: 2.7.0

> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix from HDFS-6825. 
> Before the HDFS-6825 fix, if dirX was deleted recursively, an OP_CLOSE could 
> get into the edit log for a fileY under dirX, corrupting the edit log 
> (restarting the NN with that edit log would fail). 
> HDFS-6825 fixes this by detecting whether fileY is already deleted: it checks 
> the ancestor dirs on fileY's path, and if any of them doesn't exist, fileY is 
> considered deleted and no OP_CLOSE is written to the edit log for the file.
> For this new corruption, what I found was that the client first deleted dirX 
> recursively, then immediately created another dir with exactly the same name. 
> Because HDFS-6825 relies on the namespace check (whether dirX exists in its 
> parent dir) to decide whether a file has been deleted, the newly created dirX 
> defeats the check, and an OP_CLOSE for the already deleted file gets into the 
> edit log due to delayed block removal.
> We need a more robust way to detect whether a file has been deleted.
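A hedged sketch of what a more robust check could look like (not the attached work; the method and field names here are assumptions): follow inode identity rather than names.

{code}
// Walk up from the file via parent references. The file is deleted if any
// link in the chain is no longer the child its parent knows under that
// name, or if the chain does not end at the directory tree root.
boolean isFileDeleted(INodeFile file) {
  INode inode = file;
  while (inode != null) {
    INodeDirectory parent = inode.getParent();
    if (parent == null) {
      return !inode.isRoot();   // detached subtree => already deleted
    }
    // Identity comparison (!=) is what defeats the recreate-with-same-name
    // case: the newly created dirX is a different inode object.
    if (parent.getChild(inode.getLocalNameBytes()) != inode) {
      return true;
    }
    inode = parent;
  }
  return true;
}
{code}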



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7701) Support quota by storage type output with "hadoop fs -count -q"

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-7701:


Assignee: Xiaoyu Yao

> Support quota by storage type output with "hadoop fs -count -q"
> ---
>
> Key: HDFS-7701
> URL: https://issues.apache.org/jira/browse/HDFS-7701
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> "hadoop fs -count -q" currently shows name space/disk space quota and 
> remaining quota information. With HDFS-7584, we want to display per storage 
> type quota and its remaining information as well.
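In API terms (a hedged sketch; the existing {{ContentSummary}} accessors are real, while the per-type ones are assumptions about what this sub-task would add):

{code}
// "hadoop fs -count -q" is backed by ContentSummary; per-storage-type
// display would extend it alongside the existing quota accessors.
ContentSummary cs = fs.getContentSummary(new Path("/user/data"));
long nsQuota    = cs.getQuota();        // existing: name space quota
long spaceQuota = cs.getSpaceQuota();   // existing: disk space quota
// Proposed shape (assumed signatures), e.g. for SSD:
// long ssdQuota = cs.getTypeQuota(StorageType.SSD);
// long ssdUsed  = cs.getTypeConsumed(StorageType.SSD);
{code}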



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7720:
-
Attachment: HDFS-7720.0.patch

> Quota by Storage Type API, tools and ClientNameNode Protocol changes
> 
>
> Key: HDFS-7720
> URL: https://issues.apache.org/jira/browse/HDFS-7720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-7720.0.patch
>
>
> Split the patch into smaller ones based on the feedback. This one covers the 
> HDFS API changes, tool changes, and ClientNameNode protocol changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.

2015-01-30 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-7722:
---

 Summary: DataNode#checkDiskError should also remove Storage when 
error is found.
 Key: HDFS-7722
 URL: https://issues.apache.org/jira/browse/HDFS-7722
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu


When {{DataNode#checkDiskError}} finds disk errors, it removes all block 
metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding 
{{DataStorage}} and {{BlockPoolSliceStorage}} entries. 

As a result, we cannot directly run {{reconfig}} to hot-swap the failed disks 
without changing the configuration file.
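A hedged sketch of the direction this suggests (not a patch; the {{checkDataDir}} return type and the {{removeVolumes}} call here are assumptions):

{code}
// On a disk error, besides dropping block metadata from FsDatasetImpl,
// also drop the failed volume from DataStorage and each
// BlockPoolSliceStorage, so a later "reconfig" can re-add the same mount.
Set<File> failedVolumes = data.checkDataDir();   // assumed signature
if (failedVolumes != null && !failedVolumes.isEmpty()) {
  storage.removeVolumes(failedVolumes);   // DataStorage + per-BP storage
}
{code}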



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7720:
-
Status: Patch Available  (was: Open)

> Quota by Storage Type API, tools and ClientNameNode Protocol changes
> 
>
> Key: HDFS-7720
> URL: https://issues.apache.org/jira/browse/HDFS-7720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-7720.0.patch
>
>
> Split the patch into smaller ones based on the feedback. This one covers the 
> HDFS API changes, tool changes, and ClientNameNode protocol changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7721) TestBlockScanner.testScanRateLimit may fail

2015-01-30 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-7721:
-

 Summary: TestBlockScanner.testScanRateLimit may fail
 Key: HDFS-7721
 URL: https://issues.apache.org/jira/browse/HDFS-7721
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze


- https://builds.apache.org/job/PreCommit-HDFS-Build/9375//testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockScanner/testScanRateLimit/
- https://builds.apache.org/job/PreCommit-HDFS-Build/9365//testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockScanner/testScanRateLimit/
{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testScanRateLimit(TestBlockScanner.java:439)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7696) FsDatasetImpl.getTmpInputStreams(..) may leak file descriptors

2015-01-30 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299518#comment-14299518
 ] 

Brandon Li commented on HDFS-7696:
--

+1. The patch looks good to me.

> FsDatasetImpl.getTmpInputStreams(..) may leak file descriptors
> --
>
> Key: HDFS-7696
> URL: https://issues.apache.org/jira/browse/HDFS-7696
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7696_20150128.patch
>
>
> getTmpInputStreams(..) opens a block file and a meta file, and then returns 
> them as ReplicaInputStreams.  The caller is responsible for closing those 
> streams.  In case of errors, an exception is thrown without closing the files.
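A common fix shape (a sketch, not the attached patch; the ReplicaInputStreams constructor arguments are abridged) is to close whatever was opened before propagating the error:

{code}
FileInputStream blockIn = null;
FileInputStream metaIn = null;
try {
  blockIn = new FileInputStream(blockFile);
  metaIn = new FileInputStream(metaFile);
  return new ReplicaInputStreams(blockIn, metaIn /* ... */);
} catch (IOException e) {
  // Close any stream already opened so the descriptors don't leak.
  IOUtils.cleanup(null, blockIn, metaIn);
  throw e;
}
{code}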



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs

2015-01-30 Thread Milan Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milan Desai updated HDFS-7647:
--
Attachment: HDFS-7647-3.patch

Thanks [~arpitagarwal] for the review, and sorry about the delay.

1. Restored the fields for {{storageIDs}} and {{storageTypes}} to store their 
cached versions.
2. Introduced the method {{invalidateCachedStorageInfos}} to invalidate the 
cached {{storageIDs}} and {{storageTypes}} arrays. It is called by 
{{sortLocatedBlocks}} after the sorting.
3. Added the unit test {{TestDatanodeManager.testSortLocatedBlocks}}.
4. Added a comment to {{getLocations()}} saying the returned array is not 
expected to be modified; if it is, the caller must immediately invoke 
{{invalidateCachedStorageInfos}} from (2).

Will open a separate Jira for making {{locs}} an immutable list.
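A minimal sketch of points 1 and 2 (field and method names as described above; the surrounding class is abridged):

{code}
public class LocatedBlock {
  private final DatanodeInfo[] locs;
  private String[] storageIDs;         // cached, parallel to locs
  private StorageType[] storageTypes;  // cached, parallel to locs

  /** Called by DatanodeManager.sortLocatedBlocks() after sorting locs so
   *  the cached arrays are rebuilt in the new datanode order. */
  public void invalidateCachedStorageInfos() {
    storageIDs = null;
    storageTypes = null;
  }
}
{code}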

> DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
> --
>
> Key: HDFS-7647
> URL: https://issues.apache.org/jira/browse/HDFS-7647
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Milan Desai
>Assignee: Milan Desai
> Attachments: HDFS-7647-2.patch, HDFS-7647-3.patch, HDFS-7647.patch
>
>
> DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside 
> each LocatedBlock, but does not touch the array of StorageIDs and 
> StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are 
> mismatched. The method is called by FSNamesystem.getBlockLocations(), so the 
> client will not know which StorageID/Type corresponds to which DatanodeInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs

2015-01-30 Thread Milan Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milan Desai updated HDFS-7647:
--
Status: Patch Available  (was: In Progress)

> DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
> --
>
> Key: HDFS-7647
> URL: https://issues.apache.org/jira/browse/HDFS-7647
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Milan Desai
>Assignee: Milan Desai
> Attachments: HDFS-7647-2.patch, HDFS-7647-3.patch, HDFS-7647.patch
>
>
> DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside 
> each LocatedBlock, but does not touch the array of StorageIDs and 
> StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are 
> mismatched. The method is called by FSNamesystem.getBlockLocations(), so the 
> client will not know which StorageID/Type corresponds to which DatanodeInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations

2015-01-30 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-5631:
-
Target Version/s: 3.0.0, 2.7.0  (was: 3.0.0)

> Expose interfaces required by FsDatasetSpi implementations
> --
>
> Key: HDFS-5631
> URL: https://issues.apache.org/jira/browse/HDFS-5631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: Joe Pallas
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-5631-LazyPersist.patch, 
> HDFS-5631-LazyPersist.patch, HDFS-5631-branch-2.patch, HDFS-5631.patch, 
> HDFS-5631.patch
>
>
> This sub-task addresses section 4.1 of the document attached to HDFS-5194,
> the exposure of interfaces needed by a FsDatasetSpi implementation.
> Specifically it makes ChunkChecksum public and BlockMetadataHeader's
> readHeader() and writeHeader() methods public.
> The changes to BlockReaderUtil (and related classes) discussed by section
> 4.1 are only needed if supporting short-circuit, and should be addressed
> as part of an effort to provide such support rather than this JIRA.
> To help ensure these changes are complete and are not regressed in the
> future, tests that gauge the accessibility (though *not* behavior)
> of interfaces needed by a FsDatasetSpi subclass are also included.
> These take the form of a dummy FsDatasetSpi subclass -- a successful
> compilation is effectively a pass.  Trivial unit tests are included so
> that there is something tangible to track.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes

2015-01-30 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-7720:


 Summary: Quota by Storage Type API, tools and ClientNameNode 
Protocol changes
 Key: HDFS-7720
 URL: https://issues.apache.org/jira/browse/HDFS-7720
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


Split the patch into smaller ones based on the feedback. This one covers the HDFS 
API changes, tool changes, and ClientNameNode protocol changes. 
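As a rough sketch of the API surface involved (hedged; the method name follows the quota-by-storage-type design under HDFS-7584 and is an assumption here):

{code}
// Client-side entry point: set a per-storage-type quota on a directory.
// The call is plumbed through the ClientNameNode protocol to the NameNode.
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
dfs.setQuotaByStorageType(new Path("/hot/data"), StorageType.SSD,
    10L * 1024 * 1024 * 1024);   // allow 10 GB of SSD usage
{code}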



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations

2015-01-30 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-5631:
-
Attachment: HDFS-5631-branch-2.patch

Added patch for branch-2.

> Expose interfaces required by FsDatasetSpi implementations
> --
>
> Key: HDFS-5631
> URL: https://issues.apache.org/jira/browse/HDFS-5631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: Joe Pallas
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-5631-LazyPersist.patch, 
> HDFS-5631-LazyPersist.patch, HDFS-5631-branch-2.patch, HDFS-5631.patch, 
> HDFS-5631.patch
>
>
> This sub-task addresses section 4.1 of the document attached to HDFS-5194,
> the exposure of interfaces needed by a FsDatasetSpi implementation.
> Specifically it makes ChunkChecksum public and BlockMetadataHeader's
> readHeader() and writeHeader() methods public.
> The changes to BlockReaderUtil (and related classes) discussed by section
> 4.1 are only needed if supporting short-circuit, and should be addressed
> as part of an effort to provide such support rather than this JIRA.
> To help ensure these changes are complete and are not regressed in the
> future, tests that gauge the accessibility (though *not* behavior)
> of interfaces needed by a FsDatasetSpi subclass are also included.
> These take the form of a dummy FsDatasetSpi subclass -- a successful
> compilation is effectively a pass.  Trivial unit tests are included so
> that there is something tangible to track.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.

2015-01-30 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu moved HADOOP-11530 to HDFS-7719:
--

 Target Version/s: 3.0.0, 2.7.0  (was: 3.0.0, 2.7.0)
Affects Version/s: (was: 2.6.0)
   2.6.0
  Key: HDFS-7719  (was: HADOOP-11530)
  Project: Hadoop HDFS  (was: Hadoop Common)

> BlockPoolSliceStorage could not remove storageDir.
> --
>
> Key: HDFS-7719
> URL: https://issues.apache.org/jira/browse/HDFS-7719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>
> The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume-level 
> directories, so {{BlockPoolSliceStorage}} cannot directly compare its own 
> {{StorageDirs}} with these volume-level directories. As a result, 
> {{BlockPoolSliceStorage}} does not actually remove the targeted 
> {{StorageDirectory}}. 
> This causes a failure when removing a volume and then immediately adding a 
> volume back with the same mount point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.

2015-01-30 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7719:

Status: Patch Available  (was: Open)

> BlockPoolSliceStorage could not remove storageDir.
> --
>
> Key: HDFS-7719
> URL: https://issues.apache.org/jira/browse/HDFS-7719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7719.000.patch
>
>
> The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume-level 
> directories, so {{BlockPoolSliceStorage}} cannot directly compare its own 
> {{StorageDirs}} with these volume-level directories. As a result, 
> {{BlockPoolSliceStorage}} does not actually remove the targeted 
> {{StorageDirectory}}. 
> This causes a failure when removing a volume and then immediately adding a 
> volume back with the same mount point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.

2015-01-30 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7719:

Attachment: HDFS-7719.000.patch

This patch makes {{BlockPoolSliceStorage#removeVolumes}} check whether the 
passed volume-level directories are parents of its storage directories. A test 
is added to enforce the behavior.
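A minimal sketch of that check (hedged; variable names are assumptions):

{code}
// A block-pool storage dir lives at <volume>/current/<bpid>, so membership
// in the set of volume-level dirs can't be tested directly. Instead,
// remove every storageDir whose path descends from a removed volume.
for (Iterator<StorageDirectory> it = storageDirs.iterator(); it.hasNext();) {
  File root = it.next().getRoot().getAbsoluteFile();
  for (File volume : volumeLevelDirsToRemove) {
    if (root.toPath().startsWith(volume.getAbsoluteFile().toPath())) {
      it.remove();
      break;
    }
  }
}
{code}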


> BlockPoolSliceStorage could not remove storageDir.
> --
>
> Key: HDFS-7719
> URL: https://issues.apache.org/jira/browse/HDFS-7719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7719.000.patch
>
>
> The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume-level 
> directories, so {{BlockPoolSliceStorage}} cannot directly compare its own 
> {{StorageDirs}} with these volume-level directories. As a result, 
> {{BlockPoolSliceStorage}} does not actually remove the targeted 
> {{StorageDirectory}}. 
> This causes a failure when removing a volume and then immediately adding a 
> volume back with the same mount point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299493#comment-14299493
 ] 

Zhe Zhang commented on HDFS-7339:
-

bq. I guess the concern is that with EC we will be going through the block ID 
space much faster since you'll allocate 9 IDs per physical block. Is that 
correct?
We have followed Jing's proposal and allocated negative block IDs to EC blocks. 
Within that range ({{LONG.MIN ~ 0}}), 16 IDs will be allocated to each group. A 
single physical block _could_ consume all 16 IDs of its group if the containing 
file is smaller than one block. In large files, each block group will have 
multiple blocks. 
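A small worked example of the scheme (hedged; the 4-bit layout is implied by the 16-ID granularity rather than quoted from the patch):

{code}
// Group IDs are negative and aligned to multiples of 16, so the low
// 4 bits address a block within its group.
long groupId = -16;                          // first group allocated below 0
long blockId = groupId + 5;                  // 6th block in the group
long groupOfBlock = blockId & ~0xFL;         // recovers -16
int  indexInGroup = (int) (blockId & 0xFL);  // recovers 5
{code}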

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299475#comment-14299475
 ] 

Arpit Agarwal commented on HDFS-7339:
-

Yes I think not reserving will be fine. I guess the concern is that with EC we 
will be going through the block ID space much faster since you'll allocate 9 
IDs per physical block. Is that correct?

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7718) DFSClient objects created by AbstractFileSystem objects created by FileContext are not closed and results in thread leakage

2015-01-30 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HDFS-7718:
--
Description: Currently, the {{FileContext}} class used by clients (e.g. 
{{YARNRunner}}) creates a new {{AbstractFileSystem}} object on initialization, 
which creates a new {{DFSClient}} object, which in turn creates a KeyProvider 
object. If encryption and https are turned on, the KeyProvider implementation 
(the {{KMSClientProvider}}) creates a {{ReloadingX509TrustManager}} thread per 
instance; these threads are never killed and can lead to a thread leak.  (was: 
Currently, the {{FileContext}} class used by clients such as (for eg. 
{{YARNRunner}}) creates new {{AbstractFilesystem}} object on initialization.. 
which creates new {{DFSClient}} objects.. which in turn creates KeyProvider 
objects.. If Encryption is turned on, and https is turned on, the keyprovider 
implementation (the {{KMSClientProvider}}) will create a 
{{ReloadingX509TrustManager}} per instance... which are never killed and can 
leak)
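A hedged repro sketch of the leak (the loop simply forces new {{FileContext}} instances; the thread name is an assumption):

{code}
// Each FileContext builds a fresh AbstractFileSystem -> DFSClient ->
// KeyProvider chain; with an HTTPS KMS, every KMSClientProvider starts a
// ReloadingX509TrustManager thread that is never stopped.
for (int i = 0; i < 100; i++) {
  FileContext fc = FileContext.getFileContext(new Configuration());
  fc.util().exists(new Path("/"));   // triggers DFSClient creation
}
// Thread.getAllStackTraces() now shows ~100 extra truststore-reloader threads.
{code}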

> DFSClient objects created by AbstractFileSystem objects created by 
> FileContext are not closed and results in thread leakage
> ---
>
> Key: HDFS-7718
> URL: https://issues.apache.org/jira/browse/HDFS-7718
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Currently, the {{FileContext}} class used by clients (e.g. {{YARNRunner}}) 
> creates a new {{AbstractFileSystem}} object on initialization, which creates 
> a new {{DFSClient}} object, which in turn creates a KeyProvider object. If 
> encryption and https are turned on, the KeyProvider implementation (the 
> {{KMSClientProvider}}) creates a {{ReloadingX509TrustManager}} thread per 
> instance; these threads are never killed and can lead to a thread leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7520) checknative should display a nicer error message when openssl support is not compiled in

2015-01-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299467#comment-14299467
 ] 

Chris Nauroth commented on HDFS-7520:
-

My best guess is that this happens when the build finds an OpenSSL installation 
that is too old for us to use.  According to the CMake logic, we'd skip 
compilation of OpensslCipher.c:

{code}
if (OPENSSL_LIBRARY AND OPENSSL_INCLUDE_DIR)
    GET_FILENAME_COMPONENT(HADOOP_OPENSSL_LIBRARY ${OPENSSL_LIBRARY} NAME)
    INCLUDE(CheckCSourceCompiles)
    SET(OLD_CMAKE_REQUIRED_INCLUDES ${CMAKE_REQUIRED_INCLUDES})
    SET(CMAKE_REQUIRED_INCLUDES ${OPENSSL_INCLUDE_DIR})
    CHECK_C_SOURCE_COMPILES("#include \"${OPENSSL_INCLUDE_DIR}/openssl/evp.h\"\nint main(int argc, char **argv) { return !EVP_aes_256_ctr; }" HAS_NEW_ENOUGH_OPENSSL)
    SET(CMAKE_REQUIRED_INCLUDES ${OLD_CMAKE_REQUIRED_INCLUDES})
    if(NOT HAS_NEW_ENOUGH_OPENSSL)
        MESSAGE("The OpenSSL library installed at ${OPENSSL_LIBRARY} is too old.  You need a version at least new enough to have EVP_aes_256_ctr.")
    else(NOT HAS_NEW_ENOUGH_OPENSSL)
        SET(USABLE_OPENSSL 1)
    endif(NOT HAS_NEW_ENOUGH_OPENSSL)
endif (OPENSSL_LIBRARY AND OPENSSL_INCLUDE_DIR)
if (USABLE_OPENSSL)
    SET(OPENSSL_SOURCE_FILES
        "${D}/crypto/OpensslCipher.c"
        "${D}/crypto/random/OpensslSecureRandom.c")
{code}

However, the check for {{buildSupportsOpenssl}} is driven by 
{{HADOOP_OPENSSL_LIBRARY}}, and I believe the CMake logic still left that 
defined:

{code}
JNIEXPORT jboolean JNICALL 
Java_org_apache_hadoop_util_NativeCodeLoader_buildSupportsOpenssl
  (JNIEnv *env, jclass clazz)
{
#ifdef HADOOP_OPENSSL_LIBRARY
  return JNI_TRUE;
#else
  return JNI_FALSE;
#endif
}
{code}

At the Java layer, this would cause it to think the build supports OpenSSL, so 
it calls {{initIDs}}, but the symbol isn't really in libhadoop.so.  The result 
is an {{UnsatisfiedLinkError}} with its message set to the signature of the 
Java native method.

Colin, if you know you saw this happening with a particular version of OpenSSL, 
would you please comment?  That would help Anu with a repro.  Thanks!
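For the nicer message the summary asks for, one hedged sketch (the {{getLibraryName}} accessor is an assumption) is to catch the link error in checknative itself:

{code}
// In NativeLibraryChecker: translate the raw UnsatisfiedLinkError into
// the friendlier message proposed in this issue.
boolean opensslLoaded = false;
String opensslDetail;
try {
  opensslDetail = OpensslCipher.getLibraryName();  // assumed accessor
  opensslLoaded = true;
} catch (UnsatisfiedLinkError e) {
  opensslDetail = "Hadoop was built without openssl support.";
}
System.out.printf("openssl: %-5b %s%n", opensslLoaded, opensslDetail);
{code}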

> checknative should display a nicer error message when openssl support is not 
> compiled in
> 
>
> Key: HDFS-7520
> URL: https://issues.apache.org/jira/browse/HDFS-7520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Anu Engineer
>
> checknative should display a nicer error message when openssl support is not 
> compiled in.  Currently, it displays this:
> {code}
> [cmccabe@keter hadoop]$ hadoop checknative
> 14/12/12 14:08:43 INFO bzip2.Bzip2Factory: Successfully loaded & initialized 
> native-bzip2 library system-native
> 14/12/12 14:08:43 INFO zlib.ZlibFactory: Successfully loaded & initialized 
> native-zlib library
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib64/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V
> {code}
> Instead, we should display something like this, if openssl is not supported 
> by the current build:
> {code}
> openssl: false Hadoop was built without openssl support.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7718) DFSClient objects created by AbstractFileSystem objects created by FileContext are not closed and results in thread leakage

2015-01-30 Thread Arun Suresh (JIRA)
Arun Suresh created HDFS-7718:
-

 Summary: DFSClient objects created by AbstractFileSystem objects 
created by FileContext are not closed and results in thread leakage
 Key: HDFS-7718
 URL: https://issues.apache.org/jira/browse/HDFS-7718
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Arun Suresh
Assignee: Arun Suresh


Currently, the {{FileContext}} class used by clients (e.g. {{YARNRunner}}) 
creates a new {{AbstractFileSystem}} object on initialization, which creates a 
new {{DFSClient}} object, which in turn creates a KeyProvider object. If 
encryption and https are turned on, the KeyProvider implementation (the 
{{KMSClientProvider}}) creates a {{ReloadingX509TrustManager}} thread per 
instance; these threads are never killed and can leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299454#comment-14299454
 ] 

Jing Zhao commented on HDFS-7339:
-

I think not reserving currently should be fine. If we find we need to reserve, 
we can reserve from the other end of the block group id space.

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299446#comment-14299446
 ] 

Zhe Zhang commented on HDFS-7339:
-

Thanks [~arpitagarwal]. I feel reserving some block IDs makes sense, but the 
current value is probably too large. 

Would be nice to hear from others. 

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299443#comment-14299443
 ] 

Arpit Agarwal commented on HDFS-7339:
-

bq. Do you know why we reserve 1 billion block IDs (LAST_RESERVED_BLOCK_ID) in 
the current block ID generator?
So we could assign a special meaning to some block IDs in the future, if 
necessary. However the reservation was not useful in hindsight. We can free up 
this range for use.

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7717) Erasure Coding: provide a tool for convert files between replication and erasure coding

2015-01-30 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-7717:
---

 Summary: Erasure Coding: provide a tool for convert files between 
replication and erasure coding
 Key: HDFS-7717
 URL: https://issues.apache.org/jira/browse/HDFS-7717
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


We need a tool to do offline conversion between replication and erasure coding. 
The tool itself can either utilize MR just like the current distcp, or act like 
the balancer/mover. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-7339.
-
Resolution: Fixed

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299434#comment-14299434
 ] 

Zhe Zhang commented on HDFS-7339:
-

I just committed the patch to HDFS-EC.

Thanks a lot for the reviews from [~jingzhao], [~szetszwo], [~andrew.wang], and 
[~vinayrpet]!

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7339:

Hadoop Flags: Reviewed

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7716) Erasure Coding: extend BlockInfo to handle EC info

2015-01-30 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-7716:
---

 Summary: Erasure Coding: extend BlockInfo to handle EC info
 Key: HDFS-7716
 URL: https://issues.apache.org/jira/browse/HDFS-7716
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


The current BlockInfo implementation only supports the replication mechanism. 
To let the same blocksMap handle a block group and its data/parity blocks, we 
need to define a new BlockGroupInfo class.
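
A self-contained sketch of the idea (class and field names here are 
hypothetical, not the committed design):

{code}
// Hypothetical sketch: a block-group entry that can share the blocksMap with
// ordinary block entries, tracking the data and parity blocks of one group.
class BlockEntry {
  final long id;
  BlockEntry(long id) { this.id = id; }
}

class BlockGroupEntry extends BlockEntry {
  final BlockEntry[] dataBlocks;   // the k data blocks of the coding group
  final BlockEntry[] parityBlocks; // the m parity blocks of the coding group

  BlockGroupEntry(long id, int k, int m) {
    super(id);
    this.dataBlocks = new BlockEntry[k];
    this.parityBlocks = new BlockEntry[m];
  }
}
{code}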



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7339:

Status: Open  (was: Patch Available)

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299420#comment-14299420
 ] 

Jing Zhao commented on HDFS-7339:
-

Thanks for the quick update, Zhe!

bq. Do you know why we reserve 1 billion block IDs (LAST_RESERVED_BLOCK_ID) in 
the current block ID generator?

Actually I'm not very sure about the reason. Maybe [~arpitagarwal] can comment.

+1 for the current patch. In the meantime, I just created a JIRA to address 
{{BlockGroupInfo}} and {{BlockInfo}}.

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299413#comment-14299413
 ] 

Andrew Wang commented on HDFS-7411:
---

RAT complains about a .psd file? Seems spurious.

> Refactor and improve decommissioning logic into DecommissionManager
> ---
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
> hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
> hdfs-7411.009.patch, hdfs-7411.010.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to 
> DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299410#comment-14299410
 ] 

Hadoop QA commented on HDFS-7339:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695650/HDFS-7339-008.patch
  against trunk revision 8dc59cb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9385//console

This message is automatically generated.

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7520) checknative should display a nicer error message when openssl support is not compiled in

2015-01-30 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299407#comment-14299407
 ] 

Anu Engineer commented on HDFS-7520:


I looked at this code path, and it looks like if Hadoop had indeed been 
compiled without OpenSSL you would have gotten the following message: 
"build does not support openssl."

This failure seems to have come from the initIDs call, which calls into native 
code. To understand why the loading failed, I need to know which OS you are 
running, your LD configuration, and which versions of the OpenSSL shared 
objects are on your path.

In other words, I need more info on how to reproduce this bug. This error 
message is certainly not due to Hadoop being compiled without OpenSSL support; 
it is most probably due to a runtime loading error.




> checknative should display a nicer error message when openssl support is not 
> compiled in
> 
>
> Key: HDFS-7520
> URL: https://issues.apache.org/jira/browse/HDFS-7520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Anu Engineer
>
> checknative should display a nicer error message when openssl support is not 
> compiled in.  Currently, it displays this:
> {code}
> [cmccabe@keter hadoop]$ hadoop checknative
> 14/12/12 14:08:43 INFO bzip2.Bzip2Factory: Successfully loaded & initialized 
> native-bzip2 library system-native
> 14/12/12 14:08:43 INFO zlib.ZlibFactory: Successfully loaded & initialized 
> native-zlib library
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib64/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V
> {code}
> Instead, we should display something like this, if openssl is not supported 
> by the current build:
> {code}
> openssl: false Hadoop was built without openssl support.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7712) Switch blockStateChangeLog to use slf4j

2015-01-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7712:
--
Attachment: hdfs-7712.002.patch

Whoops, missed a file.

> Switch blockStateChangeLog to use slf4j
> ---
>
> Key: HDFS-7712
> URL: https://issues.apache.org/jira/browse/HDFS-7712
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-7712.001.patch, hdfs-7712.002.patch
>
>
> As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will 
> save a lot of string construction costs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Attachment: HDFS-7339-008.patch

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299387#comment-14299387
 ] 

Colin Patrick McCabe commented on HDFS-7648:


bq. The original design of DirectoryScanner is to reconcile the differences 
between the block information maintained in memory and the actual blocks stored 
on disk. So it does fix the in-memory data structure.

Fixing the in-memory data structure is different from fixing the on-disk data 
structure.  I do not think that the DirectoryScanner should modify the files on 
the disk.  It just introduces too much potential for a mistake in the scanner 
to cause data loss.

bq. Yet more questions if the blocks are not fixed: should the block report 
include those blocks? How to access those blocks? How and when to fix those 
blocks?

The only way we could ever get into this state is:
* if someone manually renamed some block files on ext4
* if someone introduced a bug in the datanode code that put blocks in the wrong 
place
* if there is serious ext4 filesystem corruption

None of those cases seems like something we should be trying to automatically 
recover from.

> Verify the datanode directory layout
> 
>
> Key: HDFS-7648
> URL: https://issues.apache.org/jira/browse/HDFS-7648
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>
> HDFS-6482 changed datanode layout to use block ID to determine the directory 
> to store the block.  We should have some mechanism to verify it.  Either 
> DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7712) Switch blockStateChangeLog to use slf4j

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299378#comment-14299378
 ] 

Hadoop QA commented on HDFS-7712:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695641/hdfs-7712.001.patch
  against trunk revision 8635822.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9384//console

This message is automatically generated.

> Switch blockStateChangeLog to use slf4j
> ---
>
> Key: HDFS-7712
> URL: https://issues.apache.org/jira/browse/HDFS-7712
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-7712.001.patch
>
>
> As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will 
> save a lot of string construction costs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299370#comment-14299370
 ] 

Zhe Zhang commented on HDFS-7339:
-

Thanks [~jingzhao] for the helpful review!

bq. Instead of the current ID division mechanism (calculating the mid point 
between LAST_RESERVED_BLOCK_ID and LONG.MAX), can we simply let the block group 
id take all the negative long space (i.e., with first bit set to 1)? In this 
way we can utilize larger space and use simple bit manipulations for id 
generation/checking.

I think this is a good idea. With the current HDFS block ID generator, negative 
IDs will be used only when all positive ones are used up (i.e., the long value 
[reaches max | 
http://stackoverflow.com/questions/8513826/atomicinteger-incrementation]). With 
your proposal, regular block IDs are less likely to "grow into" the block group 
ID space.
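
A sketch of the bit manipulation this enables (illustrative code, not taken 
from any patch):

{code}
// Block group IDs take the negative long space: the sign bit doubles as the
// "is a block group" flag, so classification is a single bit test.
private static final long BLOCK_GROUP_ID_FLAG = 1L << 63;

static long toBlockGroupId(long sequentialCounter) {
  return sequentialCounter | BLOCK_GROUP_ID_FLAG; // set the sign bit
}

static boolean isBlockGroupId(long id) {
  return (id & BLOCK_GROUP_ID_FLAG) != 0;         // equivalently: id < 0
}
{code}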

bq. Why do we need to reserve the first 1024 block group ids?

Do you know why we reserve 1 billion block IDs ({{LAST_RESERVED_BLOCK_ID}}) in 
the current block ID generator? I couldn't figure out the exact reason, so I 
chose to do the same for block groups.

bq. If we directly extend the current BlockInfo to BlockGroupInfo, the semantic 
of the triplets may be different for BlockGroupInfo. One possible solution is 
to let triplets's size be 3*(k+m), where k is the number of data blocks and m 
is the number of the parity blocks.

The 007 patch already attempts to do that but didn't finish -- if the file 
{{isStriped()}}, then the group size (currently hardcoded; it will become 
configurable with HDFS-7337) will be used instead of {{getReplication()}} to 
choose targets. The updated patch will further use this logic to create the 
{{BlockInfo}} object. Then there will naturally be {{3*(k+m)}} elements in 
{{triplets}}.
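
As a concrete sizing example (the numbers are illustrative; the real values 
come from the codec schema):

{code}
// For an RS(6,3)-style schema: k data blocks plus m parity blocks per group.
int k = 6, m = 3;
// One (storage, previous, next) triple per member block of the group.
int tripletsLength = 3 * (k + m); // 27 elements
{code}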

bq. The above #3 and #4 may need some extra refactoring work on the current 
BlockInfo class. I'm also fine with moving this part of work to a separate jira.

I agree. {{BlockGroupInfo}} is an optimization; we should commit this patch 
sooner to enable a working prototype. I took it out in the new patch.

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299353#comment-14299353
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695598/hdfs-7411.010.patch
  against trunk revision 951b360.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9383//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9383//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9383//console

This message is automatically generated.

> Refactor and improve decommissioning logic into DecommissionManager
> ---
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
> hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
> hdfs-7411.009.patch, hdfs-7411.010.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to 
> DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7712) Switch blockStateChangeLog to use slf4j

2015-01-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7712:
--
Status: Patch Available  (was: Open)

> Switch blockStateChangeLog to use slf4j
> ---
>
> Key: HDFS-7712
> URL: https://issues.apache.org/jira/browse/HDFS-7712
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-7712.001.patch
>
>
> As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will 
> save a lot of string construction costs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7712) Switch blockStateChangeLog to use slf4j

2015-01-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7712:
--
Attachment: hdfs-7712.001.patch

Patch attached. [~kihwal] willing to review?

> Switch blockStateChangeLog to use slf4j
> ---
>
> Key: HDFS-7712
> URL: https://issues.apache.org/jira/browse/HDFS-7712
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-7712.001.patch
>
>
> As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will 
> save a lot of string construction costs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.

2015-01-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299338#comment-14299338
 ] 

Kihwal Lee commented on HDFS-7714:
--

On a related note, I've seen similar symptoms when the two NameNodes' ctimes in 
their storage are different. After a DataNode registers with one NN, it won't 
be able to register with the other, which causes the actor thread to die. 
Depending on which NameNode each DataNode talks to first, the DataNodes will be 
divided into two sets, each talking to only one NameNode, thus creating a 
split-brain situation. Of course, running two NameNodes with different storage 
versions is a mistake, but I've seen people make this kind of mistake multiple 
times. Whenever it happened, I wished for a way to start the actor thread back 
up. The refreshNamenodes dfsadmin command does not work for HA configurations.

> Simultaneous restart of HA NameNodes and DataNode can cause DataNode to 
> register successfully with only one NameNode.
> -
>
> Key: HDFS-7714
> URL: https://issues.apache.org/jira/browse/HDFS-7714
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Chris Nauroth
>
> In an HA deployment, DataNodes must register with both NameNodes and send 
> periodic heartbeats and block reports to both.  However, if NameNodes and 
> DataNodes are restarted simultaneously, then this can trigger a race 
> condition in registration.  The end result is that the {{BPServiceActor}} for 
> one NameNode terminates, but the {{BPServiceActor}} for the other NameNode 
> remains alive.  The DataNode process is then in a "half-alive" state where it 
> only heartbeats and sends block reports to one of the NameNodes.  This could 
> cause a loss of storage capacity after an HA failover.  The DataNode process 
> would have to be restarted to resolve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299300#comment-14299300
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

> This statement is false. Configuration compatibility was the core of the 
> above discussion. ...

Sure, there was a discussion of how to be compatible with the old conf.  
However, it never says that the decision was to make an incompatible change.

Anyway, thanks for the update.  Will review the patch.

> Refactor and improve decommissioning logic into DecommissionManager
> ---
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
> hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
> hdfs-7411.009.patch, hdfs-7411.010.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to 
> DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7608:
-
Attachment: HDFS-7608.0.patch

Posted a patch adding a write timeout to DFSClient#newConnectedPeer.
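
For context, the shape of the change is roughly as follows (a sketch only; the 
field and method names are assumptions about trunk at the time, not a quote 
from the patch):

{code}
// In DFSClient#newConnectedPeer (sketch): once the peer is created, set a
// write timeout alongside the existing read timeout so a stalled write
// cannot block forever.
peer.setReadTimeout(dfsClientConf.socketTimeout);
peer.setWriteTimeout(dfsClientConf.socketTimeout); // previously missing
{code}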

> hdfs dfsclient  newConnectedPeer has no write timeout
> -
>
> Key: HDFS-7608
> URL: https://issues.apache.org/jira/browse/HDFS-7608
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, fuse-dfs
>Affects Versions: 2.3.0, 2.6.0
> Environment: hdfs 2.3.0  hbase 0.98.6
>Reporter: zhangshilong
>Assignee: Xiaoyu Yao
>  Labels: patch
> Fix For: 2.6.0
>
> Attachments: HDFS-7608.0.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> problem:
> HBase's compactSplitThread may block forever while reading DataNode blocks.
> Debugging found that the epoll wait timeout was set to 0, so epoll_wait never 
> times out.
> cause: in HDFS 2.3.0,
> HBase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read or 
> write timeout.
> In 2.6.0, newConnectedPeer added a read timeout to deal with the problem, but 
> did not add a write timeout. Why was the write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will not 
> need to add timeouts themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-7608:


Assignee: Xiaoyu Yao

> hdfs dfsclient  newConnectedPeer has no write timeout
> -
>
> Key: HDFS-7608
> URL: https://issues.apache.org/jira/browse/HDFS-7608
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, fuse-dfs
>Affects Versions: 2.3.0, 2.6.0
> Environment: hdfs 2.3.0  hbase 0.98.6
>Reporter: zhangshilong
>Assignee: Xiaoyu Yao
>  Labels: patch
> Fix For: 2.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> problem:
> HBase's compactSplitThread may block forever while reading DataNode blocks.
> Debugging found that the epoll wait timeout was set to 0, so epoll_wait never 
> times out.
> cause: in HDFS 2.3.0,
> HBase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read or 
> write timeout.
> In 2.6.0, newConnectedPeer added a read timeout to deal with the problem, but 
> did not add a write timeout. Why was the write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will not 
> need to add timeouts themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7715) Implement the Hitchhiker erasure coding algorithm

2015-01-30 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-7715:
---

 Summary: Implement the Hitchhiker erasure coding algorithm
 Key: HDFS-7715
 URL: https://issues.apache.org/jira/browse/HDFS-7715
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang


[Hitchhiker | 
http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a 
new erasure coding algorithm developed as a research project at UC Berkeley. It 
has been shown to reduce network traffic and disk I/O by 25% and 45%, 
respectively, during data reconstruction. This JIRA aims to introduce 
Hitchhiker to the HDFS-EC framework as one of the pluggable codec algorithms.

The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7710) Remove dead code in BackupImage.java

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299260#comment-14299260
 ] 

Hadoop QA commented on HDFS-7710:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695580/HDFS-7710.0.patch
  against trunk revision f2c9109.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9382//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9382//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9382//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9382//console

This message is automatically generated.

> Remove dead code in BackupImage.java
> 
>
> Key: HDFS-7710
> URL: https://issues.apache.org/jira/browse/HDFS-7710
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7710.0.patch
>
>
> BackupImage#saveCheckpoint() is not used anywhere. This JIRA proposes to 
> clean it up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7684) The host:port settings of dfs.namenode.secondary.http-address should be trimmed before use

2015-01-30 Thread Tianyin Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299233#comment-14299233
 ] 

Tianyin Xu commented on HDFS-7684:
--

Yes, exactly. It seems that Hadoop has a bunch of such trimming issues that 
bothered a number of users...

Thanks, Xiaoyu!

~t

> The host:port settings of dfs.namenode.secondary.http-address should be 
> trimmed before use
> --
>
> Key: HDFS-7684
> URL: https://issues.apache.org/jira/browse/HDFS-7684
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.1
>Reporter: Tianyin Xu
>Assignee: Anu Engineer
>
> With the following setting,
> <property>
>   <name>dfs.namenode.secondary.http-address</name>
>   <value>myhostname:50090 </value>
> </property>
> The secondary NameNode could not be started
> $ hadoop-daemon.sh start secondarynamenode
> starting secondarynamenode, logging to 
> /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-secondarynamenode-xxx.out
> /home/hadoop/hadoop-2.4.1/bin/hdfs
> Exception in thread "main" java.lang.IllegalArgumentException: Does not 
> contain a valid host:port authority: myhostname:50090
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:196)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:203)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651)
> We were really confused and misled by the log message: we thought about the 
> DNS problems (changed to IP address but no success) and the network problem 
> (tried to test the connections with no success...)
> It turned out to be that the setting is not trimmed and the additional space 
> character in the end of the setting caused the problem... OMG!!!...
> Searching on the Internet, we find we are really not alone.  So many users 
> encountered similar trim problems! The following lists a few:
> http://solaimurugan.blogspot.com/2013/10/hadoop-multi-node-cluster-configuration.html
> http://stackoverflow.com/questions/11263664/error-while-starting-the-hadoop-using-strat-all-sh
> https://issues.apache.org/jira/browse/HDFS-2799
> https://issues.apache.org/jira/browse/HBASE-6973



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7684) The host:port settings of dfs.namenode.secondary.http-address should be trimmed before use

2015-01-30 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299229#comment-14299229
 ] 

Xiaoyu Yao commented on HDFS-7684:
--

Thanks [~tianyin] for reporting this. The one you hit can be fixed by changing 
conf.get to conf.getTrimmed.

{code}
final String httpsAddrString = conf.get(
    DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_KEY,
    DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_DEFAULT);
InetSocketAddress httpsAddr = NetUtils.createSocketAddr(httpsAddrString);
{code}
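
That is, roughly (a sketch of the one-line fix):

{code}
final String httpsAddrString = conf.getTrimmed(
    DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_KEY,
    DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_DEFAULT);
InetSocketAddress httpsAddr = NetUtils.createSocketAddr(httpsAddrString);
{code}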

Searching for calls to NetUtils.createSocketAddr() in the HDFS code, I found 
many other places with similar untrimmed host:port issues, for example in 
DatanodeManager#DatanodeManager() below. I think we should fix them as well in 
this JIRA.

{code}
this.defaultXferPort = NetUtils.createSocketAddr(
    conf.get(DFSConfigKeys.DFS_DATANODE_ADDRESS_KEY,
        DFSConfigKeys.DFS_DATANODE_ADDRESS_DEFAULT)).getPort();
this.defaultInfoPort = NetUtils.createSocketAddr(
    conf.get(DFSConfigKeys.DFS_DATANODE_HTTP_ADDRESS_KEY,
        DFSConfigKeys.DFS_DATANODE_HTTP_ADDRESS_DEFAULT)).getPort();
this.defaultInfoSecurePort = NetUtils.createSocketAddr(
    conf.get(DFSConfigKeys.DFS_DATANODE_HTTPS_ADDRESS_KEY,
        DFSConfigKeys.DFS_DATANODE_HTTPS_ADDRESS_DEFAULT)).getPort();
this.defaultIpcPort = NetUtils.createSocketAddr(
    conf.get(DFSConfigKeys.DFS_DATANODE_IPC_ADDRESS_KEY,
        DFSConfigKeys.DFS_DATANODE_IPC_ADDRESS_DEFAULT)).getPort();
{code}

> The host:port settings of dfs.namenode.secondary.http-address should be 
> trimmed before use
> --
>
> Key: HDFS-7684
> URL: https://issues.apache.org/jira/browse/HDFS-7684
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.1
>Reporter: Tianyin Xu
>Assignee: Anu Engineer
>
> With the following setting,
> <property>
>   <name>dfs.namenode.secondary.http-address</name>
>   <value>myhostname:50090 </value>
> </property>
> The secondary NameNode could not be started
> $ hadoop-daemon.sh start secondarynamenode
> starting secondarynamenode, logging to 
> /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-secondarynamenode-xxx.out
> /home/hadoop/hadoop-2.4.1/bin/hdfs
> Exception in thread "main" java.lang.IllegalArgumentException: Does not 
> contain a valid host:port authority: myhostname:50090
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:196)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:203)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:214)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651)
> We were really confused and misled by the log message: we thought about the 
> DNS problems (changed to IP address but no success) and the network problem 
> (tried to test the connections with no success...)
> It turned out to be that the setting is not trimmed and the additional space 
> character in the end of the setting caused the problem... OMG!!!...
> Searching on the Internet, we find we are really not alone.  So many users 
> encountered similar trim problems! The following lists a few:
> http://solaimurugan.blogspot.com/2013/10/hadoop-multi-node-cluster-configuration.html
> http://stackoverflow.com/questions/11263664/error-while-starting-the-hadoop-using-strat-all-sh
> https://issues.apache.org/jira/browse/HDFS-2799
> https://issues.apache.org/jira/browse/HBASE-6973



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize

2015-01-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299227#comment-14299227
 ] 

Chris Nauroth commented on HDFS-2882:
-

I'm linking this to HDFS-7714, where I reported that a bug in this part of the 
code can cause a DataNode process to remain running in a "half-alive" state 
registered to only one NameNode with no opportunity to re-register to the other 
one.  I don't think this patch introduced the problem though.

> DN continues to start up, even if block pool fails to initialize
> 
>
> Key: HDFS-2882
> URL: https://issues.apache.org/jira/browse/HDFS-2882
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Vinayakumar B
> Fix For: 2.4.1
>
> Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, 
> HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, 
> HDFS-2882.patch, hdfs-2882.txt
>
>
> I started a DN on a machine that was completely out of space on one of its 
> drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id 
> DS-507718931-172.29.5.194-11072-12978
> 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create 
> /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
> at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, 
> etc. This was on the HDFS-1623 branch but may affect trunk as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.

2015-01-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299223#comment-14299223
 ] 

Chris Nauroth commented on HDFS-7714:
-

Here are more details on what I've observed.  I saw that the main 
{{BPServiceActor#run}} loop was active for one NameNode, but for the other one, 
it had reported the fatal "Initialization failed" error from this part of the 
code:

{code}
  while (true) {
// init stuff
try {
  // setup storage
  connectToNNAndHandshake();
  break;
} catch (IOException ioe) {
  // Initial handshake, storage recovery or registration failed
  runningState = RunningState.INIT_FAILED;
  if (shouldRetryInit()) {
// Retry until all namenode's of BPOS failed initialization
LOG.error("Initialization failed for " + this + " "
+ ioe.getLocalizedMessage());
sleepAndLogInterrupts(5000, "initializing");
  } else {
runningState = RunningState.FAILED;
LOG.fatal("Initialization failed for " + this + ". Exiting. ", ioe);
return;
  }
}
  }
{code}

The {{ioe}} was an {{EOFException}} while trying the {{registerDatanode}} RPC.  
Lining up timestamps from NN and DN logs, I could see that the NN had restarted 
at the same time, causing it to abandon this RPC connection, ultimately 
triggering the {{EOFException}} on the DataNode side.

Most importantly, the fact that it was on the code path with the fatal-level 
logging means that it would never reattempt registration with this NameNode.  
{{shouldRetryInit()}} must have returned {{false}}.  The implementation of 
{{BPOfferService#shouldRetryInit}} is that it should only retry if the other 
one already registered successfully:

{code}
  /*
   * Let the actor retry for initialization until all namenodes of cluster have
   * failed.
   */
  boolean shouldRetryInit() {
if (hasBlockPoolId()) {
  // One of the namenode registered successfully. lets continue retry for
  // other.
  return true;
}
return isAlive();
  }
{code}

Tying that all together, this bug happens when the first attempted NameNode 
registration fails but the second succeeds.  The DataNode process remains 
running, but with only one live {{BPServiceActor}}.

HDFS-2882 had a lot of discussion of DataNode startup failure scenarios.  I 
think the summary of that discussion is that the DataNode should in general 
retry its NameNode registrations, but abort right away if there is no 
possibility for registration to succeed (i.e., a misconfiguration or a 
hardware failure).  I think the change we need here is that we should keep 
retrying the {{registerDatanode}} RPC if there is NameNode downtime or a 
transient connectivity failure. Other failure reasons should still cause an 
abort.
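
A sketch of the classification this implies (a hypothetical helper, not the 
actual fix):

{code}
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

class RegistrationRetrySketch {
  /** Retry registerDatanode only on NN downtime / transient connectivity. */
  static boolean isRetriable(IOException ioe) {
    return ioe instanceof EOFException            // NN dropped the connection (e.g., restart)
        || ioe instanceof ConnectException        // NN not (yet) listening
        || ioe instanceof SocketTimeoutException; // transient network hiccup
  }
}
{code}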


> Simultaneous restart of HA NameNodes and DataNode can cause DataNode to 
> register successfully with only one NameNode.
> -
>
> Key: HDFS-7714
> URL: https://issues.apache.org/jira/browse/HDFS-7714
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Chris Nauroth
>
> In an HA deployment, DataNodes must register with both NameNodes and send 
> periodic heartbeats and block reports to both.  However, if NameNodes and 
> DataNodes are restarted simultaneously, then this can trigger a race 
> condition in registration.  The end result is that the {{BPServiceActor}} for 
> one NameNode terminates, but the {{BPServiceActor}} for the other NameNode 
> remains alive.  The DataNode process is then in a "half-alive" state where it 
> only heartbeats and sends block reports to one of the NameNodes.  This could 
> cause a loss of storage capacity after an HA failover.  The DataNode process 
> would have to be restarted to resolve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.

2015-01-30 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7714:
---

 Summary: Simultaneous restart of HA NameNodes and DataNode can 
cause DataNode to register successfully with only one NameNode.
 Key: HDFS-7714
 URL: https://issues.apache.org/jira/browse/HDFS-7714
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Chris Nauroth


In an HA deployment, DataNodes must register with both NameNodes and send 
periodic heartbeats and block reports to both.  However, if NameNodes and 
DataNodes are restarted simultaneously, then this can trigger a race condition 
in registration.  The end result is that the {{BPServiceActor}} for one 
NameNode terminates, but the {{BPServiceActor}} for the other NameNode remains 
alive.  The DataNode process is then in a "half-alive" state where it only 
heartbeats and sends block reports to one of the NameNodes.  This could cause a 
loss of storage capacity after an HA failover.  The DataNode process would have 
to be restarted to resolve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299221#comment-14299221
 ] 

Yongjun Zhang commented on HDFS-7707:
-

Thank you so much Kihwal!

What happened was, the user manually deleted the dir by issuing a {{hadoop fs 
-rm -r -skipTrash}} command, so it still seems related to delayed block 
removal. It appears that a snapshot is involved, but I will confirm.

> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get 
> into edit log for the fileY under dirX, thus corrupting the edit log 
> (restarting NN with the edit log would fail). 
> What HDFS-6825 does to fix this issue is, to detect whether fileY is already 
> deleted by checking the ancestor dirs on it's path, if any of them doesn't 
> exist, then fileY is already deleted, and don't put OP_CLOSE to edit log for 
> the file.
> For this new edit log corruption, what I found was, the client first deleted 
> dirX recursively, then create another dir with exactly the same name as dirX 
> right away.  Because HDFS-6825 count on the namespace checking (whether dirX 
> exists in its parent dir) to decide whether a file has been deleted, the 
> newly created dirX defeats this checking, thus OP_CLOSE for the already 
> deleted file gets into the edit log, due to delayed block removal.
> What we need to do is to have a more robust way to detect whether a file has 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7520) checknative should display a nicer error message when openssl support is not compiled in

2015-01-30 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer reassigned HDFS-7520:
--

Assignee: Anu Engineer

> checknative should display a nicer error message when openssl support is not 
> compiled in
> 
>
> Key: HDFS-7520
> URL: https://issues.apache.org/jira/browse/HDFS-7520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Anu Engineer
>
> checknative should display a nicer error message when openssl support is not 
> compiled in.  Currently, it displays this:
> {code}
> [cmccabe@keter hadoop]$ hadoop checknative
> 14/12/12 14:08:43 INFO bzip2.Bzip2Factory: Successfully loaded & initialized 
> native-bzip2 library system-native
> 14/12/12 14:08:43 INFO zlib.ZlibFactory: Successfully loaded & initialized 
> native-zlib library
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib64/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V
> {code}
> Instead, we should display something like this, if openssl is not supported 
> by the current build:
> {code}
> openssl: false Hadoop was built without openssl support.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7684) The host:port settings of dfs.namenode.secondary.http-address should be trimmed before use

2015-01-30 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer reassigned HDFS-7684:
--

Assignee: Anu Engineer

> The host:port settings of dfs.namenode.secondary.http-address should be 
> trimmed before use
> --
>
> Key: HDFS-7684
> URL: https://issues.apache.org/jira/browse/HDFS-7684
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.1
>Reporter: Tianyin Xu
>Assignee: Anu Engineer
>
> With the following setting,
> <property>
>   <name>dfs.namenode.secondary.http-address</name>
>   <value>myhostname:50090 </value>
> </property>
> the secondary NameNode could not be started:
> $ hadoop-daemon.sh start secondarynamenode
> starting secondarynamenode, logging to 
> /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-secondarynamenode-xxx.out
> /home/hadoop/hadoop-2.4.1/bin/hdfs
> Exception in thread "main" java.lang.IllegalArgumentException: Does not 
> contain a valid host:port authority: myhostname:50090
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:196)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:203)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651)
> We were really confused and misled by the log message: we suspected DNS 
> problems (changed to an IP address, but no success) and network problems 
> (tried to test the connections, with no success...)
> It turned out that the setting is not trimmed, and the additional space 
> character at the end of the setting caused the problem... OMG!!!...
> Searching on the Internet, we found we are really not alone.  So many users 
> have encountered similar trim problems! The following lists a few:
> http://solaimurugan.blogspot.com/2013/10/hadoop-multi-node-cluster-configuration.html
> http://stackoverflow.com/questions/11263664/error-while-starting-the-hadoop-using-strat-all-sh
> https://issues.apache.org/jira/browse/HDFS-2799
> https://issues.apache.org/jira/browse/HBASE-6973
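
For illustration, a hedged sketch of a defensive read using {{Configuration#getTrimmed}}, which strips exactly this kind of whitespace (the calling context is an assumption, not the SecondaryNameNode code):

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;

class TrimmedAddressSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Note the trailing space, as in the report above.
    conf.set("dfs.namenode.secondary.http-address", "myhostname:50090 ");

    // conf.get(...) would return the raw value, and NetUtils.createSocketAddr
    // would then throw "Does not contain a valid host:port authority".
    String addr = conf.getTrimmed("dfs.namenode.secondary.http-address");
    InetSocketAddress sockAddr = NetUtils.createSocketAddr(addr);
    System.out.println(sockAddr);
  }
}
{code}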



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299188#comment-14299188
 ] 

Kihwal Lee commented on HDFS-7707:
--

bq. Do you mean that we could get a wrong iFile here?
Since the block collection of a block won't magically get updated to a new 
inode file, I don't see how it can be a wrong inode file. So it may not be due 
to delayed block removal.

bq.  what's the reason that tmpParent won't get a null at the dirX when trying 
to get the parent of dirX (if this happened)?
If a snapshot is not involved, the parent will be set to null during the delete 
while in the fsn write lock. A lack of memory barriers can cause stale values to 
be used in a multi-processor, multi-threaded environment, but I am not sure 
whether that is the cause here.

If {{commitBlockSynchronization()}} was involved, was it initiated by the client 
(e.g. recoverLease() or create/append())?

> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to the HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get 
> into the edit log for fileY under dirX, thus corrupting the edit log 
> (restarting the NN with that edit log would fail). 
> What HDFS-6825 does to fix this issue is detect whether fileY is already 
> deleted by checking the ancestor dirs on its path; if any of them doesn't 
> exist, then fileY is already deleted, and OP_CLOSE is not put into the edit 
> log for the file.
> For this new edit log corruption, what I found was that the client first deleted 
> dirX recursively, then created another dir with exactly the same name as dirX 
> right away.  Because HDFS-6825 counts on the namespace check (whether dirX 
> exists in its parent dir) to decide whether a file has been deleted, the 
> newly created dirX defeats this check, and OP_CLOSE for the already 
> deleted file gets into the edit log, due to delayed block removal.
> What we need is a more robust way to detect whether a file has 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299158#comment-14299158
 ] 

Andrew Wang commented on HDFS-7411:
---

I split the logging changes off to HDFS-7706, which I just committed. New rev 
posted.

bq. It seems the discussion above did not consider the incompatibility. I guess 
the unnecessarily complicated and large patch did hide the important details. 
We need to revisit it.

This statement is false. Configuration compatibility was the core of the above 
discussion. In fact, my 003 rev of this patch tried to keep compatibility with 
the old key, and based on the discussion we decided to change that.

This newest rev does bring fallback support for the old key though, which 
satisfies your comment.
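
For readers following along, a minimal sketch of what fallback support for an old key typically looks like (the key names below are illustrative; the real keys are in the patch):

{code}
import org.apache.hadoop.conf.Configuration;

class FallbackKeySketch {
  static final String NEW_KEY = "dfs.namenode.decommission.blocks.per.interval";
  static final String OLD_KEY = "dfs.namenode.decommission.nodes.per.interval";

  static int readSetting(Configuration conf, int defaultValue) {
    if (conf.get(NEW_KEY) == null && conf.get(OLD_KEY) != null) {
      // Old key set, new key absent: honor the old key for compatibility.
      return Integer.parseInt(conf.get(OLD_KEY).trim());
    }
    return conf.getInt(NEW_KEY, defaultValue);
  }
}
{code}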

> Refactor and improve decommissioning logic into DecommissionManager
> ---
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
> hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
> hdfs-7411.009.patch, hdfs-7411.010.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to 
> DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299153#comment-14299153
 ] 

Yongjun Zhang commented on HDFS-7707:
-

Hi Kihwal,

Thanks a lot for your further comments. I did the analysis based on the edit 
log. I assumed {{commitBlockSynchronization()}} is involved due to the delayed 
block removal; basically the same code path as examined by HDFS-6825. I will 
take a look at other paths too.

Assuming {{commitBlockSynchronization}} is involved, the {{INodeFile}} is obtained 
by the following code:
{code}
BlockCollection blockCollection = storedBlock.getBlockCollection();
INodeFile iFile = ((INode)blockCollection).asFile();
{code}
Do you mean that we could get a wrong iFile here?

BTW, your comment rang a bell for me: when we delete a dir, why wouldn't 
{{tmpParent}} become null at {{dirX}} when trying to get the parent of 
{{dirX}} (if this happened)?
{code}
while (true) {
  if (tmpParent == null ||
      tmpParent.searchChildren(tmpChild.getLocalNameBytes()) < 0) {
    return true;
  }
  if (tmpParent.isRoot()) {
    break;
  }
  tmpChild = tmpParent;
  tmpParent = tmpParent.getParent();
}
{code}

Thanks.


> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to the HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get 
> into the edit log for fileY under dirX, thus corrupting the edit log 
> (restarting the NN with that edit log would fail). 
> What HDFS-6825 does to fix this issue is detect whether fileY is already 
> deleted by checking the ancestor dirs on its path; if any of them doesn't 
> exist, then fileY is already deleted, and OP_CLOSE is not put into the edit 
> log for the file.
> For this new edit log corruption, what I found was that the client first deleted 
> dirX recursively, then created another dir with exactly the same name as dirX 
> right away.  Because HDFS-6825 counts on the namespace check (whether dirX 
> exists in its parent dir) to decide whether a file has been deleted, the 
> newly created dirX defeats this check, and OP_CLOSE for the already 
> deleted file gets into the edit log, due to delayed block removal.
> What we need is a more robust way to detect whether a file has 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7411:
--
Attachment: hdfs-7411.010.patch

> Refactor and improve decommissioning logic into DecommissionManager
> ---
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
> hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
> hdfs-7411.009.patch, hdfs-7411.010.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to 
> DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7713) Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs, and setting replication

2015-01-30 Thread Ravi Prakash (JIRA)
Ravi Prakash created HDFS-7713:
--

 Summary: Improve the HDFS Web UI browser to allow chowning / 
chmoding, creating dirs, and setting replication
 Key: HDFS-7713
 URL: https://issues.apache.org/jira/browse/HDFS-7713
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ravi Prakash
Assignee: Ravi Prakash


This JIRA is for improving the NN UI (everything except file uploads)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7697) Document the scope of the PB OIV tool

2015-01-30 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299129#comment-14299129
 ] 

Lei (Eddy) Xu commented on HDFS-7697:
-

[~wheat9] Thanks very much for filing this. 

Where should I add the documentation? 

> Document the scope of the PB OIV tool
> -
>
> Key: HDFS-7697
> URL: https://issues.apache.org/jira/browse/HDFS-7697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>
> As per HDFS-6673, we need to document the applicable scope of the new PB OIV 
> tool so that it won't catch users by surprise.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299128#comment-14299128
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7648:
---

> It's not the goal of DirectoryScanner to fix anything. ...

The original design of DirectoryScanner is to reconcile the differences 
between the block information maintained in memory and the actual blocks stored 
on disk.  So it does fix the in-memory data structure.

> What would be the suggested way to fix these unmatched blocks. Also, if it is 
> not fixed then this warning message will be printed repeatedly during the 
> directory scanning interval.

Yet more questions if the blocks are not fixed: should the block report include 
those blocks?  How to access those blocks?  How and when to fix those blocks?

It seems fixing the blocks is better.  Of course, we still log an error message 
for those blocks.

> Verify the datanode directory layout
> 
>
> Key: HDFS-7648
> URL: https://issues.apache.org/jira/browse/HDFS-7648
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>
> HDFS-6482 changed datanode layout to use block ID to determine the directory 
> to store the block.  We should have some mechanism to verify it.  Either 
> DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7706) Switch BlockManager logging to use slf4j

2015-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299122#comment-14299122
 ] 

Hudson commented on HDFS-7706:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6970 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6970/])
HDFS-7706. Switch BlockManager logging to use slf4j. (wang: rev 
951b3608a8cb1d9063b9be9c740b524c137b816f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestPendingInvalidateBlock.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyBlockManagement.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReplicationBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyIsHot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Switch BlockManager logging to use slf4j
> 
>
> Key: HDFS-7706
> URL: https://issues.apache.org/jira/browse/HDFS-7706
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: hdfs-7706.001.patch
>
>
> Nice little refactor to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299116#comment-14299116
 ] 

Jing Zhao commented on HDFS-7339:
-

Thanks Zhe! The patch looks good overall. Some comments and questions:
# Instead of the current ID division mechanism (calculating the mid point 
between LAST_RESERVED_BLOCK_ID and LONG.MAX), can we simply let the block group 
id take all the negative long space (i.e., with first bit set to 1)? In this 
way we can utilize larger space and use simple bit manipulations for id 
generation/checking.
# Why do we need to reserve the first 1024 block group ids?
# If we directly extend the current BlockInfo to BlockGroupInfo, the semantics 
of the {{triplets}} may be different for BlockGroupInfo. One possible solution 
is to let the {{triplets}} size be {{3*(k+m)}}, where k is the number of data 
blocks and m is the number of parity blocks.
# The current BlockGroupInfo constructor calls BlockInfo's copy constructor, 
which constructs triplets based on the replication factor. We may still need to 
revisit BlockInfo and BlockGroupInfo to make sure BlockGroupInfo is strictly 
separated from replication operations and logic.

The above #3 and #4 may need some extra refactoring work on the current 
BlockInfo class. I'm also fine with moving this part of work to a separate jira.
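
A tiny hedged sketch of the bit-manipulation idea in item 1 (illustrative, not committed code):

{code}
class BlockGroupIdSketch {
  // Block group IDs take the whole negative long space, so the sign bit
  // alone distinguishes them from ordinary block IDs.
  static boolean isBlockGroupId(long id) {
    return id < 0;                      // sign bit set
  }
  static long makeBlockGroupId(long sequence) {
    return sequence | Long.MIN_VALUE;   // force the sign bit to 1
  }
}
{code}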

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.
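
As a rough, hedged illustration of the {{BlockGroup}} concept described above (field names are assumptions, not the committed HDFS-EC code):

{code}
class BlockGroupSketch {
  final long groupId;           // allocated by the NameNode
  final long[] dataBlockIds;    // the k original (data) blocks
  final long[] parityBlockIds;  // the m parity blocks
  final String codecSchema;     // pluggable codec schema (HDFS-7337)

  BlockGroupSketch(long groupId, long[] data, long[] parity, String schema) {
    this.groupId = groupId;
    this.dataBlockIds = data;
    this.parityBlockIds = parity;
    this.codecSchema = schema;
  }
}
{code}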



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7706) Switch BlockManager logging to use slf4j

2015-01-30 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7706:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

Thanks again for reviewing all, committed to trunk and branch-2. I'll work on 
blockStateChangeLog in HDFS-7712.

> Switch BlockManager logging to use slf4j
> 
>
> Key: HDFS-7706
> URL: https://issues.apache.org/jira/browse/HDFS-7706
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: hdfs-7706.001.patch
>
>
> Nice little refactor to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7712) Switch blockStateChangeLog to use slf4j

2015-01-30 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-7712:
-

 Summary: Switch blockStateChangeLog to use slf4j
 Key: HDFS-7712
 URL: https://issues.apache.org/jira/browse/HDFS-7712
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor


As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will 
save a lot of string construction costs.
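
For readers unfamiliar with where the saving comes from, a minimal sketch (logger name and message are illustrative):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class BlockStateChangeSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(BlockStateChangeSketch.class);

  void onBlockAdded(String block, String node) {
    // Old style: the concatenation runs even when DEBUG is disabled.
    //   LOG.debug("BLOCK* addBlock: " + block + " on " + node);
    // slf4j parameterized style: no message string is built unless DEBUG
    // is actually enabled.
    LOG.debug("BLOCK* addBlock: {} on {}", block, node);
  }
}
{code}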



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299106#comment-14299106
 ] 

Kihwal Lee commented on HDFS-7707:
--

{{isFileDeleted()}} is always called with the fsn lock held, so no modification 
is done while in the method, and {{tmpParent}} is obtained by calling 
{{file.getParent()}}. So {{tmpParent}} cannot be a newly created directory 
inode, unless something is automatically setting the file inode's parent to the 
new directory inode.   If {{isFileDeleted()}} is called with a wrong file 
inode, then it is possible to hit this condition.  That means both the parent 
dir and the file were recreated and the NN got confused.  Does this case also 
involve {{commitBlockSynchronization()}}?



> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to the HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get 
> into the edit log for fileY under dirX, thus corrupting the edit log 
> (restarting the NN with that edit log would fail). 
> What HDFS-6825 does to fix this issue is detect whether fileY is already 
> deleted by checking the ancestor dirs on its path; if any of them doesn't 
> exist, then fileY is already deleted, and OP_CLOSE is not put into the edit 
> log for the file.
> For this new edit log corruption, what I found was that the client first deleted 
> dirX recursively, then created another dir with exactly the same name as dirX 
> right away.  Because HDFS-6825 counts on the namespace check (whether dirX 
> exists in its parent dir) to decide whether a file has been deleted, the 
> newly created dirX defeats this check, and OP_CLOSE for the already 
> deleted file gets into the edit log, due to delayed block removal.
> What we need is a more robust way to detect whether a file has 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7706) Switch BlockManager logging to use slf4j

2015-01-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299104#comment-14299104
 ] 

Andrew Wang commented on HDFS-7706:
---

Hey Kihwal, I'd like to hit that in a follow-on patch. I split this out from 
HDFS-7411, which I'd like to rev in the meantime, to aid reviewers, and this one 
actually doesn't touch {{blockStateChangeLog}}. I promise I'll get right to it; 
I'd just prefer not to wait out the latency of another Jenkins run.

Xiaoyu, I'll take care of the import in the follow-on too. Thanks for 
reviewing.

I ran the failed test locally and it passed, so it looks like a flake. I'll commit 
this shortly based on Yi's +1. Thanks, Yi, for reviewing :)

> Switch BlockManager logging to use slf4j
> 
>
> Key: HDFS-7706
> URL: https://issues.apache.org/jira/browse/HDFS-7706
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-7706.001.patch
>
>
> Nice little refactor to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7710) Remove dead code in BackupImage.java

2015-01-30 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299094#comment-14299094
 ] 

Haohui Mai commented on HDFS-7710:
--

+1 pending jenkins.

> Remove dead code in BackupImage.java
> 
>
> Key: HDFS-7710
> URL: https://issues.apache.org/jira/browse/HDFS-7710
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7710.0.patch
>
>
> BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is 
> proposed to clean it up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout

2015-01-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7608:

Summary: hdfs dfsclient  newConnectedPeer has no write timeout  (was: hdfs 
dfsclient  newConnectedPeer has no read or write timeout)

I updated the title to make it clear that the write timeout is still missing.  
HDFS-7005 already added the read timeout.

> hdfs dfsclient  newConnectedPeer has no write timeout
> -
>
> Key: HDFS-7608
> URL: https://issues.apache.org/jira/browse/HDFS-7608
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, fuse-dfs
>Affects Versions: 2.3.0, 2.6.0
> Environment: hdfs 2.3.0  hbase 0.98.6
>Reporter: zhangshilong
>  Labels: patch
> Fix For: 2.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Problem:
> The HBase compactSplitThread may lock forever on reading datanode blocks.
> Debugging found: the epollwait timeout is set to 0, so epollwait can never run out.
> Cause: in HDFS 2.3.0,
> HBase uses DFSClient to read and write blocks.
> DFSClient creates one socket using newConnectedPeer(addr), but sets no read 
> or write timeout. 
> In 2.6.0, newConnectedPeer added a readTimeout to deal with the 
> problem, but did not add a writeTimeout. Why was the write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will not 
> need to add timeouts themselves. 
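
A hedged sketch of the direction being asked for, mirroring how the read timeout is applied (the plumbing here is an assumption, not the committed code; {{Peer}} does expose both setters):

{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.net.Peer;

class PeerTimeoutSketch {
  static void applyTimeouts(Peer peer, int socketTimeoutMs) throws IOException {
    peer.setReadTimeout(socketTimeoutMs);   // present since HDFS-7005
    peer.setWriteTimeout(socketTimeoutMs);  // the missing piece this issue asks for
  }
}
{code}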



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299031#comment-14299031
 ] 

Zhe Zhang commented on HDFS-7339:
-

The build failure is because of the divergence between the HDFS-EC branch and 
trunk (HDFS-7347). 

[~jingzhao], [~szetszwo]: please let me know if the patch addresses the issues 
we discussed during the meeting. Thanks.

> Allocating and persisting block groups in NameNode
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, HDFS-7339-007.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7711) [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements

2015-01-30 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-7711:
---
Description: 
1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.

>> I did not seen above two properties in code...I feel,This should be  
>> *{color:green}dfs.hosts/dfs.hosts.exclude{color}* 

2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
*/yarn-env.sh* 
>> Most of the places written as conf dir,,but currently conf dir will not 
>> present in hadoop distribution.
It's better to give  *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or 
*{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh

  was:
1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.

>> I did not seen above two properties in code...This should 
>> dfs.hosts/dfs.hosts.exclude

2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
*/yarn-env.sh* 
>> Most of the places written as conf dir,,but currently conf dir will not 
>> present in hadoop distribution.
It's better to give  *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or 
*{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh


> [ HDFS DOC ] Various Typos  in ClusterSetup.html and improvements
> -
>
> Key: HDFS-7711
> URL: https://issues.apache.org/jira/browse/HDFS-7711
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Brahma Reddy Battula
>
> 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.
> >> I did not seen above two properties in code...I feel,This should be  
> >> *{color:green}dfs.hosts/dfs.hosts.exclude{color}* 
> 2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
> */yarn-env.sh* 
> >> Most of the places written as conf dir,,but currently conf dir will not 
> >> present in hadoop distribution.
> It's better to give  *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or 
> *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7711) [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements

2015-01-30 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-7711:
---
Description: 
1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.

>> I did not seen above two properties in code...This should 
>> dfs.hosts/dfs.hosts.exclude

2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
*/yarn-env.sh* 
>> Most of the places written as conf dir,,but currently conf dir will not 
>> present in hadoop distribution.
It's better to give  *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or 
*{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh

  was:
1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.

>> I did not seen above two properties in code...This should 
>> dfs.hosts/dfs.hosts.exclude

2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
*/yarn-env.sh* 
>> Most of the places written as conf dir,,but currently conf dir will not 
>> present in hadoop distribution.
It can be HADOOP_CONF_DIR/hadoop-env.sh or HADOOP_HOME/etc/hadoop/hadoop-env.sh


> [ HDFS DOC ] Various Typos  in ClusterSetup.html and improvements
> -
>
> Key: HDFS-7711
> URL: https://issues.apache.org/jira/browse/HDFS-7711
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Brahma Reddy Battula
>
> 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.
> >> I did not seen above two properties in code...This should 
> >> dfs.hosts/dfs.hosts.exclude
> 2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
> */yarn-env.sh* 
> >> Most of the places written as conf dir,,but currently conf dir will not 
> >> present in hadoop distribution.
> It's better to give  *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or 
> *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7711) [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements

2015-01-30 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-7711:
--

 Summary: [ HDFS DOC ] Various Typos  in ClusterSetup.html and 
improvements
 Key: HDFS-7711
 URL: https://issues.apache.org/jira/browse/HDFS-7711
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Brahma Reddy Battula


1) dfs.namenode.hosts / dfs.namenode.hosts.exclude.

>> I did not seen above two properties in code...This should 
>> dfs.hosts/dfs.hosts.exclude

2)  *{color:red}conf{color}* */hadoop-env.sh*  and  *{color:red}conf{color}*  
*/yarn-env.sh* 
>> Most of the places written as conf dir,,but currently conf dir will not 
>> present in hadoop distribution.
It can be HADOOP_CONF_DIR/hadoop-env.sh or HADOOP_HOME/etc/hadoop/hadoop-env.sh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7710) Remove dead code in BackupImage.java

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7710:
-
Status: Patch Available  (was: Open)

> Remove dead code in BackupImage.java
> 
>
> Key: HDFS-7710
> URL: https://issues.apache.org/jira/browse/HDFS-7710
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7710.0.patch
>
>
> BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is 
> proposed to clean it up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7710) Remove dead code in BackupImage.java

2015-01-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7710:
-
Attachment: HDFS-7710.0.patch

> Remove dead code in BackupImage.java
> 
>
> Key: HDFS-7710
> URL: https://issues.apache.org/jira/browse/HDFS-7710
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7710.0.patch
>
>
> BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is 
> proposed to clean it up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7710) Remove dead code in BackupImage.java

2015-01-30 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-7710:


 Summary: Remove dead code in BackupImage.java
 Key: HDFS-7710
 URL: https://issues.apache.org/jira/browse/HDFS-7710
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Minor


BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is proposed 
to clean it up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298945#comment-14298945
 ] 

Hadoop QA commented on HDFS-4265:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695560/0006-HDFS-4265.patch
  against trunk revision f2c9109.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9381//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9381//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9381//artifact/patchprocess/newPatchFindbugsWarningsbkjournal.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9381//console

This message is automatically generated.

> BKJM doesn't take advantage of speculative reads
> 
>
> Key: HDFS-4265
> URL: https://issues.apache.org/jira/browse/HDFS-4265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Ivan Kelly
>Assignee: Rakesh R
> Attachments: 0005-HDFS-4265.patch, 0006-HDFS-4265.patch, 
> 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, 
> 004-HDFS-4265.patch
>
>
> BookKeeperEditLogInputStream reads one entry at a time, so it doesn't take 
> advantage of the speculative read mechanism introduced by BOOKKEEPER-336.
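
A hedged sketch of the batched alternative (API per {{LedgerHandle#readEntries}}; batch boundaries and processing are illustrative):

{code}
import java.util.Enumeration;
import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

class BatchedReadSketch {
  // Requesting a range of entries in one call gives the BookKeeper client
  // room to overlap and speculate across bookies, which an entry-at-a-time
  // loop never does.
  static void readBatch(LedgerHandle ledger, long first, long last)
      throws BKException, InterruptedException {
    Enumeration<LedgerEntry> entries = ledger.readEntries(first, last);
    while (entries.hasMoreElements()) {
      LedgerEntry e = entries.nextElement();
      process(e.getEntry()); // entry payload bytes
    }
  }
  static void process(byte[] bytes) { /* consume the edit-log record */ }
}
{code}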



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7658) HDFS Space Quota not working as expected

2015-01-30 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298939#comment-14298939
 ] 

Xiaoyu Yao commented on HDFS-7658:
--

bq. HDFS doesn't know the complete size of the file ahead of time. It considers 
the default blocksize (in your case 256MB) for calculation while adding the new 
block.

His default blocksize should be 128 MB, since dfs.blocksize was not modified in 
hdfs-site.xml. If the block size were 256 MB, copying even the first 10 MB file 
would fail with RF=2 and a 500 MB space quota, as you mentioned.

> HDFS Space Quota not working as expected
> 
>
> Key: HDFS-7658
> URL: https://issues.apache.org/jira/browse/HDFS-7658
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CDH4.6
>Reporter: Puttaswamy
>
> I am implementing HDFS quotas in a CDH4.6 cluster. The HDFS name quota has 
> been working properly, but the HDFS space quota has not been working as 
> expected. i.e.,
> I set a space quota of 500 MB for a directory, say /test-space-quota.
> Then I put a file of 10 MB into /test-space-quota, which worked. Now the space 
> available is 480 MB (500 - 10*2), where 2 is the replication factor.
> Then I put a file of 50 MB into /test-space-quota, which also worked as 
> expected. Now the space available is 380 MB (480 - 50*2).
> "I am checking the quota left with the command hadoop fs -count -q 
> /test-space-quota"
> Then I tried to put a file of 100 MB. It should work, since it will consume 
> just 200 MB of space with replication. But when I put it, I got an error:
> "DataStreamer Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /test is exceeded: quota = 524288000 B = 500 MB but diskspace consumed = 
> 662700032 B = 632 MB"
> But the quota says
> hadoop fs -count -q /test-space-quota
> none  inf  524288000  398458880  1  2  62914560  /test-space-quota
> Could you please help on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7658) HDFS Space Quota not working as expected

2015-01-30 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298921#comment-14298921
 ] 

Xiaoyu Yao commented on HDFS-7658:
--

[~putta_jammy], I've been working on a quota-related feature recently, so I took a 
quick look, but I can't repro this with dfs.block.size=128MB and a replication 
factor of 2. 

Based on the information you posted, the intended quota usage for the first 
block of the last file is 632 MB - 140 MB ~= 492 MB. Considering the replication 
factor of 2, the first block allocated for the last file should be around 
~250 MB. You could get a quota exceeded exception if dfs.block.size was 
changed (e.g., from 128 MB to 256 MB) for the last file, OR if your last file is 
larger than 128 MB.
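
Spelling out that arithmetic with only the numbers above (a back-of-the-envelope check, not cluster output):

{code}
consumed per the exception           = 662700032 B ~= 632 MB
usage counted before the new block   ~= 140 MB
reservation for the new block        =  632 MB - 140 MB ~= 492 MB
per-replica reservation with RF=2    =  492 MB / 2 ~= 246 MB, i.e. ~250 MB
=> consistent with a ~256 MB block size, not the 128 MB default
{code}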

 

> HDFS Space Quota not working as expected
> 
>
> Key: HDFS-7658
> URL: https://issues.apache.org/jira/browse/HDFS-7658
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CDH4.6
>Reporter: Puttaswamy
>
> I am implementing HDFS quotas in a CDH4.6 cluster. The HDFS name quota has 
> been working properly, but the HDFS space quota has not been working as 
> expected. i.e.,
> I set a space quota of 500 MB for a directory, say /test-space-quota.
> Then I put a file of 10 MB into /test-space-quota, which worked. Now the space 
> available is 480 MB (500 - 10*2), where 2 is the replication factor.
> Then I put a file of 50 MB into /test-space-quota, which also worked as 
> expected. Now the space available is 380 MB (480 - 50*2).
> "I am checking the quota left with the command hadoop fs -count -q 
> /test-space-quota"
> Then I tried to put a file of 100 MB. It should work, since it will consume 
> just 200 MB of space with replication. But when I put it, I got an error:
> "DataStreamer Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /test is exceeded: quota = 524288000 B = 500 MB but diskspace consumed = 
> 662700032 B = 632 MB"
> But the quota says
> hadoop fs -count -q /test-space-quota
> none  inf  524288000  398458880  1  2  62914560  /test-space-quota
> Could you please help on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2015-01-30 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-5782:
-
Target Version/s: 3.0.0, 2.7.0  (was: 3.0.0)

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: Joe Pallas
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-5782-branch-2.patch, HDFS-5782.patch, 
> HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).  
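
For illustration, a hedged sketch of the design point (toy names, not the exact HDFS signatures): a constructor that accepts the {{Replica}} interface serves production {{ReplicaInfo}} objects and test doubles alike, with no downcasting.

{code}
import java.util.List;

// Toy stand-in for the Replica interface.
interface Replica {
  long getBlockId();
  long getNumBytes();
}

class BlockListSketch {
  private final long[] longs;

  BlockListSketch(List<? extends Replica> replicas) {
    longs = new long[replicas.size() * 2];
    int i = 0;
    for (Replica r : replicas) {    // no cast back to a concrete class
      longs[i++] = r.getBlockId();
      longs[i++] = r.getNumBytes();
    }
  }
}
{code}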



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7709) Fix Findbug Warnings

2015-01-30 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298917#comment-14298917
 ] 

Rakesh R commented on HDFS-7709:


I can see lots of findbugs warnings showing up in the pre-commit build. One way 
is to exclude them by adding entries to findbugs-exclude.xml; otherwise they 
need to be fixed. What would be the best way? Any thoughts?

> Fix Findbug Warnings
> 
>
> Key: HDFS-7709
> URL: https://issues.apache.org/jira/browse/HDFS-7709
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Rakesh R
>Assignee: Rakesh R
>
> There are many findbug warnings related to the warning types, 
> - DM_DEFAULT_ENCODING, 
> - RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE,
> - RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE
> https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html
> https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-rumen.html
> https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2015-01-30 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-5782:
-
Attachment: HDFS-5782-branch-2.patch

Added a patch for branch-2.

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: Joe Pallas
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-5782-branch-2.patch, HDFS-5782.patch, 
> HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7709) Fix Findbug Warnings

2015-01-30 Thread Rakesh R (JIRA)
Rakesh R created HDFS-7709:
--

 Summary: Fix Findbug Warnings
 Key: HDFS-7709
 URL: https://issues.apache.org/jira/browse/HDFS-7709
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


There are many findbug warnings related to the warning types, 
- DM_DEFAULT_ENCODING, 
- RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE,
- RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE

https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html
https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-rumen.html
https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4265) BKJM doesn't take advantage of speculative reads

2015-01-30 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-4265:
---
Attachment: 0006-HDFS-4265.patch

> BKJM doesn't take advantage of speculative reads
> 
>
> Key: HDFS-4265
> URL: https://issues.apache.org/jira/browse/HDFS-4265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Ivan Kelly
>Assignee: Rakesh R
> Attachments: 0005-HDFS-4265.patch, 0006-HDFS-4265.patch, 
> 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, 
> 004-HDFS-4265.patch
>
>
> BookKeeperEditLogInputStream reads one entry at a time, so it doesn't take 
> advantage of the speculative read mechanism introduced by BOOKKEEPER-336.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298796#comment-14298796
 ] 

Yongjun Zhang commented on HDFS-7707:
-

Hi [~brahmareddy] and [~kihwal],

Thanks a lot for your comments!

Currently {{isFileDeleted()}} does the following:
{code}
if (tmpParent == null ||
    tmpParent.searchChildren(tmpChild.getLocalNameBytes()) < 0) {
  return true;
}
{code}
which is to check whether a child name exists in the parent directory. 
That's the part I was referring to that gets defeated.

I hope my understanding is correct. I described a possible solution in the 
first comment; would you please share some insight?

Thanks.


> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to the HDFS-6825 fix, if dirX was deleted recursively, an OP_CLOSE 
> could get into the edit log for a fileY under dirX, corrupting the edit log 
> (restarting the NN with that edit log would fail). 
> What HDFS-6825 does to fix this is detect whether fileY has already been 
> deleted by checking the ancestor dirs on its path: if any of them doesn't 
> exist, fileY has already been deleted, and no OP_CLOSE is written to the 
> edit log for the file.
> For this new edit log corruption, what I found was that the client first 
> deleted dirX recursively, then created another dir with exactly the same 
> name as dirX right away.  Because HDFS-6825 counts on the namespace check 
> (whether dirX exists in its parent dir) to decide whether a file has been 
> deleted, the newly created dirX defeats this check, so an OP_CLOSE for the 
> already deleted file gets into the edit log, due to delayed block removal.
> What we need is a more robust way to detect whether a file has been deleted.
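
A hypothetical reproduction sketch of the client sequence described above (the path name and timing are illustrative, not taken from the actual incident):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: the delete-then-recreate sequence from the report.
public class DeleteRecreateRace {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dirX = new Path("/dirX");
    fs.delete(dirX, true); // recursive delete; block removal is delayed
    fs.mkdirs(dirX);       // recreate a dir with the same name right away
    // A purely name-based ancestor check now finds "/dirX" again and can
    // wrongly conclude a file under the old dirX was never deleted.
  }
}
{code}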



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7603) The background replication queue initialization may not let others run

2015-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298754#comment-14298754
 ] 

Hudson commented on HDFS-7603:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2040 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2040/])
HDFS-7603. The background replication queue initialization may not let others 
run. Contributed by Kihwal Lee. (kihwal: rev 
89b07490f8354bb83a67b7ffc917bfe99708e615)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> The background replication queue initialization may not let others run
> --
>
> Key: HDFS-7603
> URL: https://issues.apache.org/jira/browse/HDFS-7603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7603.patch, HDFS-7603.patch
>
>
> The background replication queue initialization processes a configured 
> number of blocks at a time and then releases the namesystem write lock.  
> This was to let the namenode start serving right after a standby-to-active 
> transition or after leaving safe mode.  However, it does not give other 
> threads much chance to run when the lock fairness is set to "unfair" for 
> higher throughput.
> I propose adding a delay between unlocking and re-locking in the async 
> replication queue init thread.
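
A minimal sketch of the proposed unlock-pause-relock pattern, using a plain ReentrantReadWriteLock as a stand-in for the namesystem lock (the batch handling and method names are illustrative, not the actual BlockManager code):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: stand-in for the async replication queue init loop.
public class ReplQueueInitSketch {
  // "unfair" mode favors throughput over FIFO ordering of waiters
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(false);
  private int remainingBatches = 100; // stand-in for the real block queue

  public void initQueues(long sleepMillis) throws InterruptedException {
    while (remainingBatches > 0) {
      lock.writeLock().lock();
      try {
        remainingBatches--; // process one configured batch of blocks
      } finally {
        lock.writeLock().unlock();
      }
      // Proposed change: pause between unlock and the next lock so other
      // threads can acquire the lock even with unfair ordering.
      Thread.sleep(sleepMillis);
    }
  }
}
{code}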



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

2015-01-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298736#comment-14298736
 ] 

Kihwal Lee commented on HDFS-7707:
--

How is the {{isFileDeleted()}} check defeated? The check walks up the tree 
following the parent reference, not symbolically using the path name. Creating 
another directory (i.e. a different INode) with the same name should not 
affect the check. 
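
For illustration, a minimal sketch of such a parent-reference walk, with hypothetical types (not the actual INode API):

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a parent-reference walk: because it follows object
// references and compares by identity, a newly created directory with the
// same *name* (a different object) cannot satisfy the check.
public class ParentWalkSketch {
  static final class Node {
    Node parent;
    final List<Node> children = new ArrayList<>();
  }

  static boolean isDeleted(Node file, Node root) {
    for (Node cur = file; cur != root; ) {
      Node parent = cur.parent;
      // identity check: is this exact child object still attached?
      if (parent == null || !parent.children.contains(cur)) {
        return true;
      }
      cur = parent;
    }
    return false; // reached the root, so the file is still in the tree
  }
}
{code}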

> Edit log corruption due to delayed block removal again
> --
>
> Key: HDFS-7707
> URL: https://issues.apache.org/jira/browse/HDFS-7707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to the HDFS-6825 fix, if dirX was deleted recursively, an OP_CLOSE 
> could get into the edit log for a fileY under dirX, corrupting the edit log 
> (restarting the NN with that edit log would fail). 
> What HDFS-6825 does to fix this is detect whether fileY has already been 
> deleted by checking the ancestor dirs on its path: if any of them doesn't 
> exist, fileY has already been deleted, and no OP_CLOSE is written to the 
> edit log for the file.
> For this new edit log corruption, what I found was that the client first 
> deleted dirX recursively, then created another dir with exactly the same 
> name as dirX right away.  Because HDFS-6825 counts on the namespace check 
> (whether dirX exists in its parent dir) to decide whether a file has been 
> deleted, the newly created dirX defeats this check, so an OP_CLOSE for the 
> already deleted file gets into the edit log, due to delayed block removal.
> What we need is a more robust way to detect whether a file has been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7603) The background replication queue initialization may not let others run

2015-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298719#comment-14298719
 ] 

Hudson commented on HDFS-7603:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #90 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/90/])
HDFS-7603. The background replication queue initialization may not let others 
run. Contributed by Kihwal Lee. (kihwal: rev 
89b07490f8354bb83a67b7ffc917bfe99708e615)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> The background replication queue initialization may not let others run
> --
>
> Key: HDFS-7603
> URL: https://issues.apache.org/jira/browse/HDFS-7603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7603.patch, HDFS-7603.patch
>
>
> The background replication queue initialization processes a configured 
> number of blocks at a time and then releases the namesystem write lock.  
> This was to let the namenode start serving right after a standby-to-active 
> transition or after leaving safe mode.  However, it does not give other 
> threads much chance to run when the lock fairness is set to "unfair" for 
> higher throughput.
> I propose adding a delay between unlocking and re-locking in the async 
> replication queue init thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

