[jira] [Commented] (HDFS-9916) OzoneHandler : Add Key handler

2016-03-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184471#comment-15184471
 ] 

Chris Nauroth commented on HDFS-9916:
-

Hi [~anu].

In {{KeyProcessTemplate#handleCall}}, there is a TODO to handle the 
{{IOException}}.  We should do something with that, even if it's just 
rethrowing it as a generic {{OzoneException}} before more detailed error 
mapping logic is implemented.
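
A minimal sketch of the interim approach suggested above: catch the {{IOException}} and rethrow it as a generic exception that keeps the original as its cause. The {{OzoneException}} constructor, the {{KeyOperation}} interface and the {{handleCall}} signature below are hypothetical stand-ins, not the classes from the patch.

```java
import java.io.IOException;

// Hypothetical stand-in for the real OzoneException; the actual class on the
// HDFS-7240 branch may have different constructors and fields.
class OzoneException extends Exception {
  OzoneException(String message, Throwable cause) {
    super(message, cause);
  }
}

// Hypothetical functional interface representing one key operation.
interface KeyOperation<T> {
  T run() throws IOException;
}

class KeyCallSketch {
  // Runs the operation and maps any IOException to a generic OzoneException,
  // preserving the original exception as the cause.
  static <T> T handleCall(KeyOperation<T> op) throws OzoneException {
    try {
      return op.run();
    } catch (IOException e) {
      throw new OzoneException("Key operation failed: " + e.getMessage(), e);
    }
  }
}
```

More detailed error mapping could later replace the single catch block without changing callers.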

> OzoneHandler : Add Key handler
> --
>
> Key: HDFS-9916
> URL: https://issues.apache.org/jira/browse/HDFS-9916
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9916-HDFS-7240.001.patch
>
>
> Add REST handlers for handling key-related functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9891) Ozone: Add container transport client

2016-03-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9891:

Status: Patch Available  (was: Open)

> Ozone: Add container transport client
> -
>
> Key: HDFS-9891
> URL: https://issues.apache.org/jira/browse/HDFS-9891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9891-HDFS-7240.001.patch, 
> HDFS-9891-HDFS-7240.002.patch
>
>
> Add an Ozone container transport client that makes it easy to talk to the server.





[jira] [Updated] (HDFS-9891) Ozone: Add container transport client

2016-03-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9891:

Hadoop Flags: Reviewed

[~anu], thank you for updating the patch.  +1 pending pre-commit.

> Ozone: Add container transport client
> -
>
> Key: HDFS-9891
> URL: https://issues.apache.org/jira/browse/HDFS-9891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9891-HDFS-7240.001.patch, 
> HDFS-9891-HDFS-7240.002.patch
>
>
> Add an Ozone container transport client that makes it easy to talk to the server.





[jira] [Updated] (HDFS-9891) Ozone: Add container transport client

2016-03-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9891:

Target Version/s: HDFS-7240

> Ozone: Add container transport client
> -
>
> Key: HDFS-9891
> URL: https://issues.apache.org/jira/browse/HDFS-9891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9891-HDFS-7240.001.patch, 
> HDFS-9891-HDFS-7240.002.patch
>
>
> Add an Ozone container transport client that makes it easy to talk to the server.





[jira] [Updated] (HDFS-9873) Ozone: Add container transport server

2016-03-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9873:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The Findbugs warning is pre-existing and unrelated.

The Checkstyle warnings are not worth addressing (package-info.java missing, 
but the packages are not considered public).

The test failures are unrelated.

I have committed this to the HDFS-7240 feature branch.  [~anu], thank you for 
the patch.

> Ozone: Add container transport server
> -
>
> Key: HDFS-9873
> URL: https://issues.apache.org/jira/browse/HDFS-9873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9873-HDFS-7240.001.patch, 
> HDFS-9873-HDFS-7240.002.patch, HDFS-9873-HDFS-7240.003.patch
>
>
> Add the server part of the container transport.





[jira] [Commented] (HDFS-9812) Streamer threads leak if failure happens when closing DFSOutputStream

2016-03-07 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184441#comment-15184441
 ] 

Lin Yiqun commented on HDFS-9812:
-

Thanks [~ajisakaa] for the commit!

> Streamer threads leak if failure happens when closing DFSOutputStream
> -
>
> Key: HDFS-9812
> URL: https://issues.apache.org/jira/browse/HDFS-9812
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9812-branch-2.7.patch, HDFS-9812.002.patch, 
> HDFS-9812.003.patch, HDFS-9812.004.patch, HDFS-9812.branch-2.patch, 
> HDFS.001.patch
>
>
> HDFS-9794 fixed the streamer thread leak that occurs when a failure happens 
> while closing the striped output stream. The same problem exists in 
> {{DFSOutputStream#closeImpl}}: if a failure happens while flushing data 
> blocks, the streamer threads are not closed either.
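
A minimal sketch of the failure mode described in the issue and one common way to guard against it, assuming a try/finally around the flush. {{StreamerSketch}} and {{OutputStreamSketch}} are hypothetical stand-ins, and the committed HDFS-9812 patches may take a different approach.

```java
import java.io.IOException;

// Hypothetical stand-in for a streamer thread; not the real DataStreamer.
class StreamerSketch {
  private boolean closed = false;

  void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

class OutputStreamSketch {
  final StreamerSketch streamer = new StreamerSketch();
  private final boolean failOnFlush;

  OutputStreamSketch(boolean failOnFlush) {
    this.failOnFlush = failOnFlush;
  }

  // Simulates flushing data blocks, which may fail.
  private void flush() throws IOException {
    if (failOnFlush) {
      throw new IOException("flush failed");
    }
  }

  // The finally block closes the streamer even when flush() throws, so the
  // streamer is not leaked on the failure path.
  void closeImpl() throws IOException {
    try {
      flush();
    } finally {
      streamer.close();
    }
  }
}
```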





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184426#comment-15184426
 ] 

Hudson commented on HDFS-9882:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9441 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9441/])
HDFS-9882. Add heartbeatsTotal in Datanode metrics. (Contributed by Hua (arp: 
rev c2140d05efaf18b41caae8c61d9f6d668ab0e874)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java


> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned

2016-03-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184419#comment-15184419
 ] 

Jing Zhao commented on HDFS-8786:
-

Thanks [~rakeshr]! For #1 I think I misread your patch so please ignore my 
previous comment.

> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> ---
>
> Key: HDFS-8786
> URL: https://issues.apache.org/jira/browse/HDFS-8786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Rakesh R
> Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-004.patch, HDFS-8786-draft.patch
>
>
> Per [discussion | 
> https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
>  under HDFS-8697, it's too expensive to reconstruct block groups for decomm 
> purpose.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9882:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed for 2.8.0. The checkstyle issue can be ignored here since we follow 
the convention used by the class. A new test case is not needed.

Thanks for the contribution [~hualiu].

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9882:

Issue Type: Improvement  (was: New Feature)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184393#comment-15184393
 ] 

Hadoop QA commented on HDFS-9882:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 5s 
{color} | {color:red} root: patch generated 1 new + 83 unchanged - 0 fixed = 84 
total (was 83) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 40s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 40s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 37s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 25s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color

[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned

2016-03-07 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184373#comment-15184373
 ] 

Rakesh R commented on HDFS-8786:


Thanks [~jingzhao] for the reviews and good comments.

Could you please clarify a few points on the first comment? I will take care 
of the other comments while preparing the next patch. Thanks!

comment-2=> agreed, will update in next patch

comment-3=> agreed, will raise a follow-on JIRA and work on it.

comment-4=> agreed, will update in next patch

 
bq. comment-1=> 1. Not all the decommissioning nodes can be later used as 
source nodes, since we still need to consider DataNode's current load etc. Thus 
I'm not sure the calculation is correct here. In the meanwhile, I do not think 
we should adjust additionalReplRequired: there is no need to leave 
decommissioning nodes to the next round. Thus looks like we do not need this 
change.

The {{getAdditionalReplRequired()}} value is used when choosing the required 
number of target nodes, as shown below, and these target nodes are then given 
to the reconstruction or replication tasks. The new calculation in my patch 
adjusts the additional replication required by ignoring the decommissioning 
count, so that target nodes for those replicas are not chosen later. The idea 
is to schedule a decommissioning replication task only if there is no other 
work, such as under-replicated blocks or a needed rack. Also, 
ErasureCodingWork will schedule a replication task for decommissioning only if 
the block group satisfies {{#hasAllInternalBlocks}}. IIUC, your [previous 
comment|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15174447&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15174447]
 also points to this: ignore decommissioning if any other task exists. I 
think the existing {{BlockManager#chooseSourceDatanodes}} is sufficient for 
source selection. Am I missing anything?

{code}
case-1) 9 live replicas on 5 racks and 1 decommissioning replica
Choose 1 target for the rack replication task and ignore the decommissioning 
replication task

case-2) 7 live replicas, 1 under replica and 1 decommissioning replica
Choose 1 target for the reconstruction task and ignore the decommissioning 
replication task

case-3) 6 live replicas, 2 under replicas and 1 decommissioning replica
Choose 2 targets for the reconstruction task and ignore the decommissioning 
replication task

case-4) 8 live replicas and 1 decommissioning replica
Choose 1 target for the decommissioning replication task

case-5) 7 live replicas and 2 decommissioning replicas
Choose 2 targets for the decommissioning replication task
{code}
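
The five cases above can be read as a simple priority rule. The sketch below is only a hypothetical model of that rule, not the BlockManager logic in the patch: it assumes a block group of 9 internal blocks (inferred from the totals in cases 2 to 5), treats "under replica" as a missing internal block, and all class and method names are made up for illustration.

```java
// Hypothetical model of the five cases; not the actual HDFS scheduling code.
class TargetChoiceSketch {
  enum Task { RECONSTRUCTION, RACK_REPLICATION, DECOMMISSION_REPLICATION, NONE }

  static final class Choice {
    final Task task;
    final int targets;

    Choice(Task task, int targets) {
      this.task = task;
      this.targets = targets;
    }
  }

  // groupSize: number of internal blocks in the block group (assumed 9 here).
  // Decommissioning replicas are only scheduled when nothing else is pending.
  static Choice choose(int groupSize, int live, int decommissioning,
      boolean needsRack) {
    int missing = Math.max(0, groupSize - live - decommissioning);
    if (missing > 0) {
      return new Choice(Task.RECONSTRUCTION, missing);
    }
    if (needsRack) {
      return new Choice(Task.RACK_REPLICATION, 1);
    }
    if (decommissioning > 0) {
      return new Choice(Task.DECOMMISSION_REPLICATION, decommissioning);
    }
    return new Choice(Task.NONE, 0);
  }
}
```

For example, {{choose(9, 7, 1, false)}} models case-2 and {{choose(9, 7, 2, false)}} models case-5.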

{code}
ErasureCodingWork.java:-
  void chooseTargets(BlockPlacementPolicy blockplacement,
      BlockStoragePolicySuite storagePolicySuite,
      Set excludedNodes) {
    //
    DatanodeStorageInfo[] chosenTargets = blockplacement.chooseTarget(
        getBc().getName(), getAdditionalReplRequired(), getSrcNodes()[0],
        getLiveReplicaStorages(), false, excludedNodes,
        getBlock().getNumBytes(),
        storagePolicySuite.getPolicy(getBc().getStoragePolicyID()));


ReplicationWork.java:-
  void chooseTargets(BlockPlacementPolicy blockplacement,
      BlockStoragePolicySuite storagePolicySuite,
      Set excludedNodes) {
    //
    DatanodeStorageInfo[] chosenTargets = blockplacement.chooseTarget(
        getBc().getName(), getAdditionalReplRequired(), getSrcNodes()[0],
        getLiveReplicaStorages(), false, excludedNodes,
        getBlock().getNumBytes(),
        storagePolicySuite.getPolicy(getBc().getStoragePolicyID()));
{code}


> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> ---
>
> Key: HDFS-8786
> URL: https://issues.apache.org/jira/browse/HDFS-8786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Rakesh R
> Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-004.patch, HDFS-8786-draft.patch
>
>
> Per [discussion | 
> https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
>  under HDFS-8697, it's too expensive to reconstruct block groups for decomm 
> purpose.





[jira] [Assigned] (HDFS-9405) When starting a file, NameNode should generate EDEK in a separate thread

2016-03-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reassigned HDFS-9405:
---

Assignee: Xiao Chen

> When starting a file, NameNode should generate EDEK in a separate thread
> 
>
> Key: HDFS-9405
> URL: https://issues.apache.org/jira/browse/HDFS-9405
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Xiao Chen
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause timeout. It should be done 
> as a separate thread so as to return a proper error message to the RPC caller.





[jira] [Commented] (HDFS-9405) When starting a file, NameNode should generate EDEK in a separate thread

2016-03-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184370#comment-15184370
 ] 

Xiao Chen commented on HDFS-9405:
-

Thanks all for the discussions and thoughts here. I'd like to work on this.

As I understand it, there seem to be two problems:
- On NN startup/failover, the first call triggers the {{LoadingCache}} to 
fill up, which happens synchronously.
We may solve this by having a background thread actively warm up the cache.

- If KMS or the backing key provider is down, all create RPCs will hang and 
time out in {{FSNamesystem#startFile}} (if the cache is empty).
This is arguably a bug. IMHO this should be identified at the service level, 
instead of depending on the client RPC to find it.
But if we don't like the hang in the RPC, perhaps in addition to the above 
background warm-up, we could also update the {{ValueQueue}} to do not a get 
but a 
[getIfPresent|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/Cache.html#getIfPresent(java.lang.Object)]
 instead, and throw {{RetryStartFileException}} directly if nothing is cached, 
under the assumption that otherwise the cache should have been filled up?

Is my understanding correct?

Will work hard on making the logs/metrics helpful as well.
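
A rough sketch of the two ideas above, under stated assumptions: a warm-up method meant to be driven by a background thread, plus a non-blocking read that fails fast when nothing is cached. It uses plain {{java.util.concurrent}} collections rather than the Guava cache or the real {{ValueQueue}}, and {{EdekCacheSketch}}, {{RetrySketchException}} and the {{Supplier}}-based generator are hypothetical names for illustration only.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical exception standing in for the retry exception mentioned above.
class RetrySketchException extends Exception {
  RetrySketchException(String message) {
    super(message);
  }
}

class EdekCacheSketch {
  private final ConcurrentHashMap<String, Queue<String>> queues =
      new ConcurrentHashMap<>();
  // Represents the slow call to the key provider.
  private final Supplier<String> generator;

  EdekCacheSketch(Supplier<String> generator) {
    this.generator = generator;
  }

  // Intended to run on a background thread, never on an RPC handler thread.
  void warmUp(String keyName, int count) {
    Queue<String> queue =
        queues.computeIfAbsent(keyName, k -> new ConcurrentLinkedQueue<>());
    for (int i = 0; i < count; i++) {
      queue.add(generator.get());
    }
  }

  // Non-blocking: returns a cached value, or fails fast so the caller can
  // retry instead of waiting on the key provider.
  String takeIfPresent(String keyName) throws RetrySketchException {
    Queue<String> queue = queues.get(keyName);
    String edek = (queue == null) ? null : queue.poll();
    if (edek == null) {
      throw new RetrySketchException("No cached EDEK for " + keyName);
    }
    return edek;
  }
}
```

In such a design the RPC path never calls the generator directly, so a slow or unavailable key provider shows up as a retryable error rather than a hang.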

> When starting a file, NameNode should generate EDEK in a separate thread
> 
>
> Key: HDFS-9405
> URL: https://issues.apache.org/jira/browse/HDFS-9405
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause timeout. It should be done 
> as a separate thread so as to return a proper error message to the RPC caller.





[jira] [Updated] (HDFS-9904) testCheckpointCancellationDuringUpload occasionally fails

2016-03-07 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9904:

Attachment: HDFS-9904.002.patch

> testCheckpointCancellationDuringUpload occasionally fails 
> --
>
> Key: HDFS-9904
> URL: https://issues.apache.org/jira/browse/HDFS-9904
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
> Attachments: HDFS-9904.001.patch, HDFS-9904.002.patch
>
>
> The failure was at the end of the test case where the txid of the standby 
> (former active) is checked. Since the checkpoint/uploading was canceled, it 
> is not supposed to have the new checkpoint. Looking at the test log, that was 
> still the case, but the standby then did checkpoint on its own and bumped up 
> the txid, right before the check was performed. 





[jira] [Commented] (HDFS-9904) testCheckpointCancellationDuringUpload occasionally fails

2016-03-07 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184333#comment-15184333
 ] 

Lin Yiqun commented on HDFS-9904:
-

Thanks [~kihwal] for the concrete analysis. I had overlooked that.
{quote}
Also, it should be set before the namenode is started and should be reset for 
other test cases.
{quote}
{{testCheckpointCancellationDuringUpload}} already restarts all NameNodes 
afterwards, so resetting the configuration here is OK.
{code}
// don't compress, we want a big image
for (int i = 0; i < NUM_NNS; i++) {
  cluster.getConfiguration(i).setBoolean(
  DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, false);
}

// Throttle SBN upload to make it hang during upload to ANN
for (int i = 1; i < NUM_NNS; i++) {
  cluster.getConfiguration(i).setLong(
  DFSConfigKeys.DFS_IMAGE_TRANSFER_RATE_KEY, 100);
}
for (int i = 0; i < NUM_NNS; i++) {
  cluster.restartNameNode(i);
}
{code}
There seems to be a similar problem in {{testNonPrimarySBNUploadFSImage}}. 
When the first NameNode transitions to standby, it will also do a checkpoint, 
because the 10 edits exceed the configured value of 5. The checkpoint should 
actually be uploaded by one of the standby nodes.
{code}
doEdits(0, 10);
cluster.transitionToStandby(0);
{code}
Am I right? If so, we can solve both in this JIRA. Finally, I have uploaded an 
updated patch addressing your comments.
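
A small sketch of the trigger described above, assuming a checkpoint is due once the number of uncheckpointed transactions reaches a configured threshold. {{CheckpointTriggerSketch}} is a hypothetical name, and the real standby checkpointer also considers elapsed time and other state.

```java
// Hypothetical sketch of a transaction-count checkpoint trigger.
class CheckpointTriggerSketch {
  private final long txnThreshold;

  CheckpointTriggerSketch(long txnThreshold) {
    this.txnThreshold = txnThreshold;
  }

  // Returns true when enough edits have accumulated since the last checkpoint.
  boolean needsCheckpoint(long lastWrittenTxId, long lastCheckpointTxId) {
    long uncheckpointed = lastWrittenTxId - lastCheckpointTxId;
    return uncheckpointed >= txnThreshold;
  }
}
```

With a threshold of 5, ten edits on the former active node are enough to make it checkpoint on its own once it becomes a standby.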



> testCheckpointCancellationDuringUpload occasionally fails 
> --
>
> Key: HDFS-9904
> URL: https://issues.apache.org/jira/browse/HDFS-9904
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
> Attachments: HDFS-9904.001.patch
>
>
> The failure was at the end of the test case where the txid of the standby 
> (former active) is checked. Since the checkpoint/uploading was canceled, it 
> is not supposed to have the new checkpoint. Looking at the test log, that was 
> still the case, but the standby then did checkpoint on its own and bumped up 
> the txid, right before the check was performed. 





[jira] [Created] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-03-07 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-9917:
--

 Summary: IBR accumulate more objects when SNN was down for 
sometime.
 Key: HDFS-9917
 URL: https://issues.apache.org/jira/browse/HDFS-9917
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula


The SNN was down for some time for some reason. After the SNN was restarted, 
it became unresponsive because:
- 29 DNs were each sending IBRs with 5 million entries (most of them delete 
IBRs), whereas each DataNode had only ~2.5 million blocks.
- GC cannot reclaim these objects since they are all held in the RPC queue. 

To recover (to clear these objects), we restarted all the DNs one by one. 
This issue happened in 2.4.1, where splitting of block reports was not available.





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184312#comment-15184312
 ] 

Hua Liu commented on HDFS-9882:
---

Hi [~arpiagariu]

I submitted the V4 patch a few hours ago, but it seems Jenkins hasn't built it. 
I will re-submit if Jenkins still has not kicked in by tomorrow morning.

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Commented] (HDFS-9719) Refactoring ErasureCodingWorker into smaller reusable constructs

2016-03-07 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184304#comment-15184304
 ] 

Kai Zheng commented on HDFS-9719:
-

Thanks Uma and Rakesh for taking care of this, and for the considerable time 
you spent reviewing.

> Refactoring ErasureCodingWorker into smaller reusable constructs
> 
>
> Key: HDFS-9719
> URL: https://issues.apache.org/jira/browse/HDFS-9719
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-9719-v1.patch, HDFS-9719-v2.patch, 
> HDFS-9719-v3.patch, HDFS-9719-v4.patch, HDFS-9719-v5.patch, HDFS-9719-v6.patch
>
>
> This proposes refactoring {{ErasureCodingWorker}} into smaller constructs 
> that can be reused in other places, such as block group checksum computing 
> on the datanode side. As discussed in HDFS-8430 and implemented in the 
> HDFS-9694 patch, checksum computing for striped block groups would be 
> distributed to the datanodes in the group, where block data should be 
> reconstructable when missing or corrupted in order to recompute the block 
> checksum. Most of the needed code is in the current ErasureCodingWorker and 
> could be reused to avoid duplication. Fortunately, we have very good and 
> complete tests, which should make the refactoring much easier. The 
> refactoring will also help a lot with subsequent tasks in phase II for 
> non-striping erasure coded files and blocks. 





[jira] [Commented] (HDFS-9873) Ozone: Add container transport server

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184302#comment-15184302
 ] 

Hadoop QA commented on HDFS-9873:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 57s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} HDFS-7240 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 8s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} HDFS-7240 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s {color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 15s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 3s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 163m 38s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.server.datanode.TestDataXceiverLazyPersistHint |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.server.datanode.TestDataXceiverLazyPersistHint |
|   | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataC

[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184300#comment-15184300
 ] 

Rui Li commented on HDFS-7866:
--

Thanks Walter for the suggestions.
bq. separating the logic of manipulation of the 12-bits. 2 sets of set/get 
methods for them.
It makes sense to have 2 separate sets of get/set methods. But if we do it 
here, we'll be just repeating the code. What do you think?
bq. We are sure policyID is strictly <=7-bits right?
I think 128 policies are enough for now, and we can just treat a negative ID as 
invalid. If we need more policies in the future, we can support negative IDs, 
which would give us 256 different IDs. Then we can think about what to 
return when getErasureCodingPolicyID is called for non-EC files. Maybe throw an 
exception?
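
To make the ID ranges above concrete: a Java {{byte}} is signed, so stored values 128 to 255 read back as negative unless they are masked. The following is only a sketch; the class and method names (PolicyIdRange, isValidSevenBit, toUnsigned) are hypothetical and are not taken from the patch.

```java
/** Hypothetical helper for illustration; not part of the HDFS-7866 patch. */
final class PolicyIdRange {
  private PolicyIdRange() {}

  /** Treats any negative byte as invalid, leaving 128 usable IDs (0..127). */
  static boolean isValidSevenBit(byte id) {
    return id >= 0;
  }

  /** Reinterprets the byte as unsigned, giving 256 usable IDs (0..255). */
  static int toUnsigned(byte id) {
    return id & 0xFF;
  }

  public static void main(String[] args) {
    byte stored = (byte) 200;                    // narrowing cast wraps to -56
    System.out.println(stored);                  // -56
    System.out.println(toUnsigned(stored));      // 200
    System.out.println(isValidSevenBit(stored)); // false
  }
}
```

With the first approach the caller never sees an ID above 127; with the second, every reader of the field has to apply the mask consistently.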

[~zhz], please hold off on the patch. The latest {{TestEditLog}} failure may be 
related. I'll look into it.

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, 
> HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend the NameNode to load, list and sync predefined EC schemas 
> in an authorized and controlled way. The provided facilities will be used to 
> implement DFSAdmin commands so that an admin can list the available EC 
> schemas and then choose some of them for target EC zones.





[jira] [Commented] (HDFS-9719) Refactoring ErasureCodingWorker into smaller reusable constructs

2016-03-07 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184285#comment-15184285
 ] 

Uma Maheswara Rao G commented on HDFS-9719:
---

Thanks [~rakeshr] for pinging me. I will take a look today. Thanks Kai for the 
work.

> Refactoring ErasureCodingWorker into smaller reusable constructs
> 
>
> Key: HDFS-9719
> URL: https://issues.apache.org/jira/browse/HDFS-9719
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-9719-v1.patch, HDFS-9719-v2.patch, 
> HDFS-9719-v3.patch, HDFS-9719-v4.patch, HDFS-9719-v5.patch, HDFS-9719-v6.patch
>
>
> This issue proposes refactoring {{ErasureCodingWorker}} into smaller 
> constructs that can be reused elsewhere, such as block group checksum 
> computation on the datanode side. As discussed in HDFS-8430 and implemented 
> in the HDFS-9694 patch, checksum computation for striped block groups would 
> be distributed to the datanodes in the group, where a data block must be 
> reconstructed when it is missing or corrupted so that the block checksum can 
> be recomputed. Most of the needed code is in the current ErasureCodingWorker 
> and could be reused to avoid duplication. Fortunately, we have good and 
> complete tests, which should make the refactoring much easier. The 
> refactoring will also help subsequent phase II tasks for non-striping 
> erasure coded files and blocks. 





[jira] [Commented] (HDFS-9812) Streamer threads leak if failure happens when closing DFSOutputStream

2016-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184272#comment-15184272
 ] 

Hudson commented on HDFS-9812:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9440 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9440/])
HDFS-9812. Streamer threads leak if failure happens when closing (aajisaka: rev 
352d299cf8ebe330d24117df98d1e6a64ae38c26)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java


> Streamer threads leak if failure happens when closing DFSOutputStream
> -
>
> Key: HDFS-9812
> URL: https://issues.apache.org/jira/browse/HDFS-9812
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9812-branch-2.7.patch, HDFS-9812.002.patch, 
> HDFS-9812.003.patch, HDFS-9812.004.patch, HDFS-9812.branch-2.patch, 
> HDFS.001.patch
>
>
> HDFS-9794 fixed the streamer thread leak that occurs when a failure happens 
> while closing the striped output stream. The same problem exists in 
> {{DFSOutputStream}}, in {{DFSOutputStream#closeImpl}}: if failures happen 
> when flushing data blocks, the streamer threads are likewise not closed.
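
The general shape of this kind of fix can be sketched as follows: the worker thread is shut down in a {{finally}} block so that a failed flush cannot skip it. This is an illustration only, not the committed change; StreamerCloseSketch, Streamer and the failFlush flag are hypothetical stand-ins.

```java
import java.io.IOException;

/** Illustration only; names are hypothetical and not taken from DFSOutputStream. */
final class StreamerCloseSketch {
  /** Stand-in for a background streamer thread. */
  static final class Streamer extends Thread {
    private volatile boolean running = true;

    @Override
    public void run() {
      while (running) {
        try {
          Thread.sleep(10); // pretend to wait for packets
        } catch (InterruptedException e) {
          return; // interrupted by shutdown()
        }
      }
    }

    void shutdown() {
      running = false;
      interrupt();
    }
  }

  /** Flushes, then always stops the streamer, even when the flush throws. */
  static void close(Streamer streamer, boolean failFlush) throws IOException {
    try {
      if (failFlush) {
        throw new IOException("simulated flush failure");
      }
    } finally {
      streamer.shutdown();
      try {
        streamer.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
      }
    }
  }
}
```

The caller still sees the original IOException, but the thread is no longer left running behind it.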





[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184267#comment-15184267
 ] 

Walter Su commented on HDFS-7866:
-

1. Not only the javadoc; what I meant was separating the logic that manipulates 
the 12 bits, with 2 sets of set/get methods for them. Right now both cuts are 
(1,11), and only the meaning of each part differs. But what if the cut is 
different in the future? You raised the same concern about unifying the set method:
bq. 3. The biggest concern is INodeFile constructor – related to that, the 
toLong method. Currently when isStriped, we just interpret replication as the 
EC policy ID. This looks pretty hacky. But it looks pretty tricky to fix. 

By the way, if we're planning to use a unified cut (1,11) for both of them, why 
bother having one enum item BLOCK_LAYOUT_AND_REDUNDANCY (12 bits) and doing the 
bit masking ourselves, instead of 2 enum items as before, which do the bit 
masking for us?
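
For readers unfamiliar with the (1,11) cut, here is a rough sketch of a 12-bit field split into a 1-bit layout flag and an 11-bit redundancy value, with separate accessors for each part. The names are hypothetical, and placing the layout flag in the high bit is an assumption made for the example, not something confirmed by the patch.

```java
/** Hypothetical 12-bit field: 1 layout bit (assumed high bit) + 11 redundancy bits. */
final class LayoutRedundancyBits {
  static final int REDUNDANCY_WIDTH = 11;
  static final int REDUNDANCY_MASK = (1 << REDUNDANCY_WIDTH) - 1; // 0x7FF
  static final int LAYOUT_MASK = 1 << REDUNDANCY_WIDTH;           // bit 11

  private LayoutRedundancyBits() {}

  /** Packs the layout flag and the redundancy value into one 12-bit field. */
  static int pack(boolean striped, int redundancy) {
    if (redundancy < 0 || redundancy > REDUNDANCY_MASK) {
      throw new IllegalArgumentException("redundancy out of range: " + redundancy);
    }
    return (striped ? LAYOUT_MASK : 0) | redundancy;
  }

  /** Reads only the layout bit. */
  static boolean isStriped(int field) {
    return (field & LAYOUT_MASK) != 0;
  }

  /** Reads only the 11 redundancy bits (replication or EC policy ID). */
  static int getRedundancy(int field) {
    return field & REDUNDANCY_MASK;
  }
}
```

Keeping one accessor per part means a future change of the cut touches only the two masks.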

Some nits:
1. {{LAYOUT_BIT_WIDTH}}, {{MAX_REDUNDANCY}} can be private inside 
{{HeaderFormat}}.

2. 
{code}
  /**
   * @return The ID of the erasure coding policy on the file. -1 represents no
   *  EC policy.
   */
  @VisibleForTesting
  @Override
  public byte getErasureCodingPolicyID() {
    if (isStriped()) {
      return (byte) HeaderFormat.getReplication(header);
    }
    return -1;
  }
{code}
{code}
  // check if the file has an EC policy
  ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp.
      getErasureCodingPolicy(fsd.getFSNamesystem(), existing);
  if (ecPolicy != null) {
    replication = ecPolicy.getId();
  }
{code}
We are sure policyID is strictly <= 7 bits, right? Casting an ID with a value 
>= 128 to byte makes it negative, and then the logic goes wrong. And vice versa.
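
One way to guard against the silent wrap described above is a checked narrowing conversion. This is only a sketch under the assumption that valid IDs are 0..127; PolicyIdCast and toPolicyIdByte are hypothetical names, not code from the patch.

```java
/** Hypothetical guard around the int-to-byte narrowing; not from the patch. */
final class PolicyIdCast {
  private PolicyIdCast() {}

  /** Narrows to byte only when the value fits in 7 bits (0..127). */
  static byte toPolicyIdByte(int id) {
    if (id < 0 || id > Byte.MAX_VALUE) {
      throw new IllegalArgumentException("policy ID out of range: " + id);
    }
    return (byte) id;
  }

  public static void main(String[] args) {
    System.out.println((byte) 128);          // -128: an unchecked cast wraps
    System.out.println(toPolicyIdByte(127)); // 127
  }
}
```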

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, 
> HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend the NameNode to load, list and sync predefined EC schemas 
> in an authorized and controlled way. The provided facilities will be used to 
> implement DFSAdmin commands so that an admin can list the available EC 
> schemas and then choose some of them for target EC zones.





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184253#comment-15184253
 ] 

Arpit Agarwal commented on HDFS-9882:
-

+1 pending jenkins for the v4 patch.

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 
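
The difference between the two measurements can be sketched as below: one timer covers only sending the report, the other covers the whole heartbeat including command processing. The class, fields and injected clock are hypothetical and do not reflect how the actual patch or DataNode metrics are implemented.

```java
import java.util.function.LongSupplier;

/** Illustration only; not the DataNode metrics implementation. */
final class HeartbeatTimingSketch {
  private final LongSupplier clock; // injected so the example is deterministic
  long sendNanos;                   // time spent generating and sending reports
  long totalNanos;                  // send time plus command processing time

  HeartbeatTimingSketch(LongSupplier clock) {
    this.clock = clock;
  }

  /** Runs one heartbeat and accumulates both durations. */
  void heartbeat(Runnable sendReport, Runnable processCommands) {
    long start = clock.getAsLong();
    sendReport.run();
    long afterSend = clock.getAsLong();
    processCommands.run();
    long end = clock.getAsLong();
    sendNanos += afterSend - start;
    totalNanos += end - start;
  }
}
```

In production code the clock would typically be {{System::nanoTime}}; a slow command then shows up in totalNanos while sendNanos stays small.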





[jira] [Updated] (HDFS-9812) Streamer threads leak if failure happens when closing DFSOutputStream

2016-03-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9812:

Attachment: HDFS-9812-branch-2.7.patch
HDFS-9812.branch-2.patch

Attaching patches used for branch-2/2.8 and branch-2.7.

> Streamer threads leak if failure happens when closing DFSOutputStream
> -
>
> Key: HDFS-9812
> URL: https://issues.apache.org/jira/browse/HDFS-9812
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9812-branch-2.7.patch, HDFS-9812.002.patch, 
> HDFS-9812.003.patch, HDFS-9812.004.patch, HDFS-9812.branch-2.patch, 
> HDFS.001.patch
>
>
> HDFS-9794 fixed the streamer thread leak that occurs when a failure happens 
> while closing the striped output stream. The same problem exists in 
> {{DFSOutputStream}}, in {{DFSOutputStream#closeImpl}}: if failures happen 
> when flushing data blocks, the streamer threads are likewise not closed.





[jira] [Updated] (HDFS-9812) Streamer threads leak if failure happens when closing DFSOutputStream

2016-03-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9812:

   Resolution: Fixed
Fix Version/s: 2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed this to branch-2.7 and above. Thanks [~linyiqun] for the contribution!

> Streamer threads leak if failure happens when closing DFSOutputStream
> -
>
> Key: HDFS-9812
> URL: https://issues.apache.org/jira/browse/HDFS-9812
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9812.002.patch, HDFS-9812.003.patch, 
> HDFS-9812.004.patch, HDFS.001.patch
>
>
> HDFS-9794 fixed the streamer thread leak that occurs when a failure happens 
> while closing the striped output stream. The same problem exists in 
> {{DFSOutputStream}}, in {{DFSOutputStream#closeImpl}}: if failures happen 
> when flushing data blocks, the streamer threads are likewise not closed.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9812) Streamer threads leak if failure happens when closing DFSOutputStream

2016-03-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9812:

Hadoop Flags: Reviewed
 Description: In HDFS-9794, it has solved problem of that stream thread 
leak if failure happens when closing the striped outputstream. And in 
{{DFSOutputStream}}, it also exists the same problem in 
{{DFSOutputStream#closeImpl}}. If failures happen when flushing data blocks, 
the streamer threads will also not be closed.  (was: In HDFS-9794, it has 
solved problem of that stream thread leak if failure happens when closing the 
striped outputstream. And in {{DfsOutpuStream}}, it also exists the same 
problem in {{DfsOutpuStream#closeImpl}}. If failures happen when flushing data 
blocks, the streamer threads will also not be closed.)
 Summary: Streamer threads leak if failure happens when closing 
DFSOutputStream  (was: Streamer threads leak if failure happens when closing 
the dfsoutputstream)

> Streamer threads leak if failure happens when closing DFSOutputStream
> -
>
> Key: HDFS-9812
> URL: https://issues.apache.org/jira/browse/HDFS-9812
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9812.002.patch, HDFS-9812.003.patch, 
> HDFS-9812.004.patch, HDFS.001.patch
>
>
> HDFS-9794 fixed the streamer thread leak that occurs when a failure happens 
> while closing the striped output stream. The same problem exists in 
> {{DFSOutputStream}}, in {{DFSOutputStream#closeImpl}}: if failures happen 
> when flushing data blocks, the streamer threads are likewise not closed.





[jira] [Commented] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

2016-03-07 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184170#comment-15184170
 ] 

Lin Yiqun commented on HDFS-9865:
-

Thanks [~iwasakims] for the commit!

> TestBlockReplacement fails intermittently in trunk
> --
>
> Key: HDFS-9865
> URL: https://issues.apache.org/jira/browse/HDFS-9865
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9865.001.patch, HDFS-9865.002.patch
>
>
> I found that the test case {{TestBlockReplacement}} sometimes fails. 
> Looking at the unit test log, I always find the following:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
> testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)
>   Time elapsed: 8.764 sec  <<< FAILURE!
> java.lang.AssertionError: The block should be only on 1 datanode  
> expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
> {code}
> I finally found the reason: the block is not completely deleted in 
> testDeletedBlockWhenAddBlockIsInEdit, which makes the datanode count 
> incorrect. The time spent waiting for FsDatasetAsyncDiskService to delete 
> the block is not an accurate value. 
> {code}
> LOG.info("replaceBlock:  " + replaceBlock(block,
>   (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc,
>   (DatanodeInfo)destDnDesc));
> // Waiting for the FsDatasetAsyncDsikService to delete the block
> Thread.sleep(3000);
> {code}
> When I reduce this time to 1 second, the test always fails. The 3 seconds 
> used in the test is not an accurate value either. We should change this 
> logic to something more reliable, such as waiting for the block to be 
> replicated, as in testDecommision.
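
The idea of replacing a fixed sleep with a condition-based wait can be sketched as a small polling helper. This is a generic illustration with hypothetical names (WaitForSketch, waitFor), not the change that was committed for this issue.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Generic polling helper for tests; hypothetical, not the committed fix. */
final class WaitForSketch {
  private WaitForSketch() {}

  /**
   * Polls the condition every intervalMillis until it holds, and throws a
   * TimeoutException once timeoutMillis has elapsed without success.
   */
  static void waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }
}
```

A test would pass a condition such as "the block is present on exactly one datanode", so it returns as soon as the deletion completes and fails with a clear timeout otherwise.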





[jira] [Updated] (HDFS-9625) set replication for empty file failed when set storage policy

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9625:

Fix Version/s: 2.8.0

> set replication for empty file  failed when set storage policy
> --
>
> Key: HDFS-9625
> URL: https://issues.apache.org/jira/browse/HDFS-9625
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: DENG FEI
>Assignee: DENG FEI
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9625.003.patch, HDFS-9625.004.patch, 
> patch.HDFS-9625.002, patch_HDFS-9625.20160107
>
>
>  On setReplication, FSDirectory#updateCount needs to calculate the related 
> storage type quota, but it checks that the disk space quota consumed by the 
> file is positive.
>  In practice, replication may be set right after a file is created, as in 
> JobSplitWriter#createSplitFiles.
> It can also be reproduced from the command shell:
> 1.  hdfs storagepolicies -setStoragePolicy -path /tmp -policy HOT
> 2.  hdfs dfs -touchz /tmp/test
> 3.  hdfs dfs -setrep 5 /tmp/test





[jira] [Commented] (HDFS-9916) OzoneHandler : Add Key handler

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184157#comment-15184157
 ] 

Hadoop QA commented on HDFS-9916:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 10s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s {color} | {color:green} HDFS-7240 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 18s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} HDFS-7240 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 52s {color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 15 new + 1 unchanged - 0 fixed = 16 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 31s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 43s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 201m 27s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  org.apache.hadoop.ozone.web.response.KeyInfo defines compareTo(KeyInfo) and uses Object.equals()  At KeyInfo.java:Object.equals()  At KeyInfo.java:[lines 176-186] |
| JDK v1.8.0_74 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeM
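
The FindBugs item in the report above (a class that defines compareTo but inherits Object.equals) is usually resolved by overriding equals and hashCode so that they agree with compareTo. A minimal sketch with a hypothetical KeyEntry class follows; it is not the actual KeyInfo code.

```java
import java.util.Objects;

/** Hypothetical class; shows compareTo, equals and hashCode kept consistent. */
final class KeyEntry implements Comparable<KeyEntry> {
  private final String name;

  KeyEntry(String name) {
    this.name = Objects.requireNonNull(name, "name");
  }

  @Override
  public int compareTo(KeyEntry other) {
    return name.compareTo(other.name);
  }

  /** Equal exactly when compareTo returns 0. */
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof KeyEntry)) {
      return false;
    }
    return name.equals(((KeyEntry) o).name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }
}
```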

[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-03-07 Thread Pan Yuxuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184151#comment-15184151
 ] 

Pan Yuxuan commented on HDFS-9902:
--

[~brahmareddy] thanks for your quick reply and for fixing this issue.
I think the general config makes sense to me. 

> dfs.datanode.du.reserved should be difference between StorageType DISK and 
> RAM_DISK
> ---
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902.patch
>
>
> Hadoop now supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they all share one configuration, dfs.datanode.du.reserved.
> A DISK may be several TB in size, while a RAM_DISK may be only a few tens 
> of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) on the same 
> DN and set dfs.datanode.du.reserved to 10GB, a lot of RAM_DISK space is 
> wasted. 
> Since RAM_DISK usage can reach 100%, I don't want the 
> dfs.datanode.du.reserved value configured for DISK to affect the usage of 
> tmpfs.
> Can we add a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?
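
One possible shape for the request above is a per-storage-type key that falls back to the general one. The sketch below uses a plain Map in place of Hadoop's Configuration, and the suffixed key format (dfs.datanode.du.reserved.<type>) is an assumption made purely for illustration, not a confirmed Hadoop setting.

```java
import java.util.Locale;
import java.util.Map;

/** Illustration only; the per-type key naming here is hypothetical. */
final class ReservedSpaceSketch {
  static final String BASE_KEY = "dfs.datanode.du.reserved";

  private ReservedSpaceSketch() {}

  /** Returns the type-specific reserved bytes if set, else the general value, else 0. */
  static long reservedFor(Map<String, String> conf, String storageType) {
    String typeKey = BASE_KEY + "." + storageType.toLowerCase(Locale.ROOT);
    String specific = conf.get(typeKey);
    if (specific != null) {
      return Long.parseLong(specific);
    }
    String general = conf.get(BASE_KEY);
    return general == null ? 0L : Long.parseLong(general);
  }
}
```

With this lookup, DISK can keep a 10GB reservation while RAM_DISK is set to 0 without changing the general default.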





[jira] [Updated] (HDFS-9891) Ozone: Add container transport client

2016-03-07 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9891:
---
Attachment: HDFS-9891-HDFS-7240.002.patch

[~cnauroth] Thanks for the review and your comments.

bq. I tried applying both HDFS-9873 and this one, but it didn't apply cleanly. 
I assume there is just some trivial rebase to be done for compatibility with 
the current revision of the HDFS-9873 patch.

Thanks, fixed now.

bq. DatanodeID#getContainerPort has a typo in the JavaDocs: "Retruns".
Fixed.

bq. Typically, methods for translating between protobuf objects and our domain 
objects are placed into classes in the org.apache.hadoop.hdfs.protocolPB 
package. Do you think it would be appropriate to move some of these methods 
over there, or is there a reason that they need to stay in DatanodeID and 
Pipeline?

Thanks for pointing it out. I did look at it, but those protocolPB classes 
were getting huge, so I decided to change the pattern and break these helper 
functions out into classes of their own.


bq. XceiverClient#close: Should group.shutdownGracefully() be called before 
channelFuture.channel().close()? I believe that means any pending I/O events 
would be drained first before closing the socket that event processing depends 
on.
You are right, thanks for pointing this out. Fixed.

bq. TestContainerServer: There is a risk of resource leaks in these tests. I 
think this can be addressed by changing testPipeline to call 
EmbeddedChannel#close and changing testClientServer to call XceiverServer#stop 
and XceiverClient#close. These calls should be guaranteed by using a finally 
block.
Fixed.
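
The last review point (guaranteeing cleanup with a finally block) can be illustrated in plain Java. This is only a sketch: `Resource` is a hypothetical stand-in for the server, client, or embedded channel used in the tests, and none of the names below come from the patch. The exception is caught here only so the sketch can return its event log; a real test would let the failure propagate.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupSketch {
  /** Hypothetical resource that records when it is opened and closed. */
  static final class Resource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    Resource(String name, List<String> log) {
      this.name = name;
      this.log = log;
      log.add("open " + name);
    }

    @Override
    public void close() {
      log.add("close " + name);
    }
  }

  /** Runs a test body that may fail and releases both resources either way. */
  static List<String> runTest(boolean failInBody) {
    List<String> log = new ArrayList<>();
    Resource server = null;
    Resource client = null;
    try {
      server = new Resource("server", log);
      client = new Resource("client", log);
      if (failInBody) {
        throw new IllegalStateException("simulated test failure");
      }
      log.add("body ok");
    } catch (IllegalStateException e) {
      log.add("body failed");
    } finally {
      // Release in reverse order of creation; the null checks cover the
      // case where setup failed before both resources existed.
      if (client != null) {
        client.close();
      }
      if (server != null) {
        server.close();
      }
    }
    return log;
  }
}
```

Both the passing and the failing path end with the client and then the server being closed, which is the property the review asks the tests to guarantee.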



> Ozone: Add container transport client
> -
>
> Key: HDFS-9891
> URL: https://issues.apache.org/jira/browse/HDFS-9891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9891-HDFS-7240.001.patch, 
> HDFS-9891-HDFS-7240.002.patch
>
>
> Add ozone container transport client -- that makes it easy to talk to server.





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 
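
To make the distinction concrete, here is a minimal sketch that tracks both numbers. It is not the DataNode metrics code: `HeartbeatTimingSketch` and its method names are made up for illustration, and the two `Runnable` arguments merely stand in for sending the heartbeat and for processing the commands returned by the NameNode.

```java
final class HeartbeatTimingSketch {
  private long sendNanos;   // time spent generating and sending reports
  private long totalNanos;  // send time plus time spent processing commands
  private long count;

  /** Runs one heartbeat and records both the send time and the total time. */
  void heartbeat(Runnable send, Runnable processCommands) {
    long start = System.nanoTime();
    send.run();
    long afterSend = System.nanoTime();
    processCommands.run();
    long end = System.nanoTime();
    sendNanos += afterSend - start;
    totalNanos += end - start;
    count++;
  }

  long sendNanos() {
    return sendNanos;
  }

  long totalNanos() {
    return totalNanos;
  }

  long count() {
    return count;
  }
}
```

When command processing is slow, `totalNanos()` grows while `sendNanos()` does not, which is the gap the proposed metric is meant to expose.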





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Updated] (HDFS-9723) Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9723:
-
Description: 
HDFS namenode handles RPC requests from DFS clients and internal processing 
from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 
is one of the existing efforts added since Hadoop 2.4 to address this 
issue. 

In the current FCQ implementation, incoming RPC calls are scheduled based on the 
number of recent RPC calls of different users with a time-decayed scheduler. 
This works well when there is a clear mapping between users and their RPC calls 
from different jobs. However, this may not work effectively when it is hard to 
track calls to a specific caller in a chain of operations from the workflow 
(e.g. Oozie -> Hive -> Yarn). It is not feasible for operators/administrators to 
throttle all the Hive jobs because of one “bad” query.

This JIRA proposes to leverage RPC caller context information (such as 
callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative 
to the existing UGI (or user name when delegation token is not available) based 
Identity Provider to improve the effectiveness of the Hadoop RPC Fair Call Queue 
(HADOOP-9640) for better namenode throttling in multi-tenancy cluster 
deployments.  

  was:
HDFS namenode handles RPC requests from DFS clients and internal processing 
from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 
is one of the existing efforts added since Hadoop 2.4 to address this 
issue. 

In the current FCQ implementation, incoming RPC calls are scheduled based on the 
number of recent RPC calls (1000) of different users with a time-decayed 
scheduler. This works well when there is a clear mapping between users and 
their RPC calls from different jobs. However, this may not work effectively 
when it is hard to track calls to a specific caller in a chain of operations 
from the workflow (e.g. Oozie -> Hive -> Yarn). It is not feasible for 
operators/administrators to throttle all the Hive jobs because of one “bad” 
query.

This JIRA proposes to leverage RPC caller context information (such as 
callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative 
to the existing UGI (or user name when delegation token is not available) based 
Identity Provider to improve the effectiveness of the Hadoop RPC Fair Call Queue 
(HADOOP-9640) for better namenode throttling in multi-tenancy cluster 
deployments.  


> Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext
> ---
>
> Key: HDFS-9723
> URL: https://issues.apache.org/jira/browse/HDFS-9723
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> HDFS namenode handles RPC requests from DFS clients and internal processing 
> from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
> namenode and bring the whole cluster down. FCQ (Fair Call Queue) by 
> HADOOP-9640 is one of the existing efforts added since Hadoop 2.4 to 
> address this issue. 
> In the current FCQ implementation, incoming RPC calls are scheduled based on the 
> number of recent RPC calls of different users with a time-decayed scheduler. 
> This works well when there is a clear mapping between users and their RPC 
> calls from different jobs. However, this may not work effectively when it is 
> hard to track calls to a specific caller in a chain of operations from the 
> workflow (e.g. Oozie -> Hive -> Yarn). It is not feasible for 
> operators/administrators to throttle all the Hive jobs because of one “bad” 
> query.
> This JIRA proposes to leverage RPC caller context information (such as 
> callerType: caller Id from TEZ-2851) available with HDFS-9184 as an 
> alternative to the existing UGI (or user name when delegation token is not 
> available) based Identity Provider to improve the effectiveness of the Hadoop 
> RPC Fair Call Queue (HADOOP-9640) for better namenode throttling in 
> multi-tenancy cluster deployments.  
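
The idea in the description can be sketched as follows. This is a toy model under stated assumptions, not Hadoop's scheduler: the class name, the share-based priority formula, and the decay step are all invented for illustration. The only point is that the scheduling identity is taken from the caller context when one is present and falls back to the user name otherwise, so one heavy query does not penalize every job run by the same user.

```java
import java.util.HashMap;
import java.util.Map;

final class CallerContextSchedulerSketch {
  private final Map<String, Double> counts = new HashMap<>();
  private final double decayFactor;
  private final int numLevels;

  CallerContextSchedulerSketch(double decayFactor, int numLevels) {
    this.decayFactor = decayFactor;
    this.numLevels = numLevels;
  }

  /** Uses the caller context as identity when present, else the user name. */
  static String identityOf(String user, String callerContext) {
    return (callerContext == null || callerContext.isEmpty())
        ? user : callerContext;
  }

  /** Records one call and returns its priority level (0 is the highest). */
  int schedule(String user, String callerContext) {
    String id = identityOf(user, callerContext);
    counts.merge(id, 1.0, Double::sum);
    double total = 0;
    for (double c : counts.values()) {
      total += c;
    }
    // Identities with a larger share of recent calls get a larger level
    // number, that is, a lower priority.
    double share = counts.get(id) / total;
    return Math.min((int) (share * numLevels), numLevels - 1);
  }

  /** Called periodically so that older calls weigh less over time. */
  void decay() {
    counts.replaceAll((id, c) -> c * decayFactor);
  }
}
```

With user-based identity, both queries in the example would share the count for user `hive`; with caller-context identity, only the heavy query is pushed to a low-priority level.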





[jira] [Commented] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks

2016-03-07 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184046#comment-15184046
 ] 

Kai Zheng commented on HDFS-9694:
-

Hi [~andrew.wang], the reported findbugs warnings were introduced by the 
following code block. I'm wondering how to avoid them without complicating 
this simple {{struct}}-like class. Could you give a hint? Thanks.
{code}
+  public StripedBlockInfo(ExtendedBlock block, DatanodeInfo[] datanodes,
+                          Token[] blockTokens,
+                          ErasureCodingPolicy ecPolicy) {
+    this.block = block;
+    this.datanodes = datanodes;
+    this.blockTokens = blockTokens;
+    this.ecPolicy = ecPolicy;
+  }
+
+  public ExtendedBlock getBlock() {
+    return block;
+  }
+
+  public DatanodeInfo[] getDatanodes() {
+    return datanodes;
+  }
+
+  public Token[] getBlockTokens() {
+    return blockTokens;
+  }
+
+  public ErasureCodingPolicy getErasureCodingPolicy() {
+    return ecPolicy;
+  }
+}
{code}
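
The warnings themselves are not quoted in this message. Assuming they are the usual "may expose internal representation" reports that FindBugs raises for array fields stored and returned directly (an assumption), one common remedy is a defensive copy on the way in and on the way out. The sketch below uses a hypothetical class with a `String[]` field instead of the real `DatanodeInfo[]` and `Token[]` types.

```java
import java.util.Arrays;

final class StripedInfoSketch {
  private final String[] datanodes;

  StripedInfoSketch(String[] datanodes) {
    // Copy on the way in, so later changes to the caller's array
    // do not alter this object's state.
    this.datanodes = Arrays.copyOf(datanodes, datanodes.length);
  }

  String[] getDatanodes() {
    // Copy on the way out, so callers cannot mutate internal state.
    return Arrays.copyOf(datanodes, datanodes.length);
  }
}
```

The copies cost an allocation per call. If the class is meant to stay a plain holder on a hot path, the other usual option is to keep the code as it is and suppress the warning in the project's FindBugs exclude filter.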

> Make existing DFSClient#getFileChecksum() work for striped blocks
> -
>
> Key: HDFS-9694
> URL: https://issues.apache.org/jira/browse/HDFS-9694
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-9694-v1.patch, HDFS-9694-v2.patch, 
> HDFS-9694-v3.patch, HDFS-9694-v4.patch
>
>
> This is a sub-task of HDFS-8430 and will make the existing API 
> {{FileSystem#getFileChecksum(path)}} work for striped files. It will also 
> refactor existing code and lay out the basic work for subsequent tasks like 
> support of the new API proposed there.





[jira] [Updated] (HDFS-9873) Ozone: Add container transport server

2016-03-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9873:

Hadoop Flags: Reviewed

[~anu], thank you for the update.  +1 for patch v003 pending pre-commit.

> Ozone: Add container transport server
> -
>
> Key: HDFS-9873
> URL: https://issues.apache.org/jira/browse/HDFS-9873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9873-HDFS-7240.001.patch, 
> HDFS-9873-HDFS-7240.002.patch, HDFS-9873-HDFS-7240.003.patch
>
>
> Add server part of the container transport





[jira] [Updated] (HDFS-9873) Ozone: Add container transport server

2016-03-07 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9873:
---
Attachment: HDFS-9873-HDFS-7240.003.patch

[~cnauroth] Thanks for the comments and sample code pointers. I have updated 
the patch based on your suggestions.


> Ozone: Add container transport server
> -
>
> Key: HDFS-9873
> URL: https://issues.apache.org/jira/browse/HDFS-9873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9873-HDFS-7240.001.patch, 
> HDFS-9873-HDFS-7240.002.patch, HDFS-9873-HDFS-7240.003.patch
>
>
> Add server part of the container transport





[jira] [Commented] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184000#comment-15184000
 ] 

Hadoop QA commented on HDFS-9914:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 22s | trunk passed |
| +1 | compile | 0m 32s | trunk passed with JDK v1.8.0_74 |
| +1 | compile | 0m 29s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 34s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 40s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 25s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 31s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.8.0_74 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | compile | 0m 27s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 27s | the patch passed |
| -1 | checkstyle | 0m 14s | hadoop-hdfs-project/hadoop-hdfs-client: patch generated 1 new + 66 unchanged - 0 fixed = 67 total (was 66) |
| +1 | mvnsite | 0m 31s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| -1 | findbugs | 1m 56s | hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 23s | the patch passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 23s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 0m 58s | hadoop-hdfs-client in the patch passed with JDK v1.8.0_74. |
| +1 | unit | 0m 58s | hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 21m 36s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
|  | Private method org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(int, Configuration) is never called  At URLConnectionFactory.java:never called  At URLConnectionFactory.java:[line 161] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca

[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183969#comment-15183969
 ] 

Zhe Zhang commented on HDFS-7866:
-

Thanks Rui for updating the patch. The latest version (v12) LGTM.

I also like Walter's idea on the Javadoc. Interesting thought on EC with 
contiguous layout (phase 2). [~walter.k.su] Are you OK with committing the v12 
patch and making the Javadoc change in the follow-on JIRA? As discussed above, 
we have a few cleanups to do, including adding a new constructor with policy ID.

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, 
> HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend NameNode to load, list and sync predefined EC schemas in 
> an authorized and controlled way. The provided facilities will be used to 
> implement DFSAdmin commands so admins can list available EC schemas, then 
> choose some of them for target EC zones.





[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9914:
-
Attachment: HDFS-9914.001.patch

> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9914.001.patch
>
>
> WebHDFS-specific read/connect timeouts were added in HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
> 3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.
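
Item 2 above (timeouts must hold even when the customized SSL setup throws) can be sketched with the standard `java.net.URLConnection` API. This is not the `URLConnectionFactory` code: `SslSetup`, `configure`, and `demo` are hypothetical names, and the timeout values are arbitrary. The sketch sets the timeouts before attempting the step that may fail, so the fallback connection still carries them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

final class TimeoutConfiguratorSketch {
  /** Hypothetical hook standing in for customized SSL setup; it may fail. */
  interface SslSetup {
    void apply(URLConnection conn) throws IOException;
  }

  /** Applies the timeouts first so they hold even when SSL setup fails. */
  static URLConnection configure(URLConnection conn, int connectTimeoutMs,
      int readTimeoutMs, SslSetup ssl) {
    conn.setConnectTimeout(connectTimeoutMs);
    conn.setReadTimeout(readTimeoutMs);
    try {
      ssl.apply(conn);
    } catch (IOException e) {
      // Fall back to the plain connection, which already has the timeouts.
      System.err.println("SSL setup failed, using defaults: " + e.getMessage());
    }
    return conn;
  }

  /** Builds an unconnected URLConnection and simulates a failing SSL setup. */
  static int[] demo() {
    try {
      // openConnection() only creates the object; no network I/O happens here.
      URLConnection conn =
          URI.create("http://localhost:1/").toURL().openConnection();
      configure(conn, 1500, 2500, c -> {
        throw new IOException("simulated SSL failure");
      });
      return new int[] {conn.getConnectTimeout(), conn.getReadTimeout()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Passing the timeouts in as parameters, rather than reading them from a shared default inside the configurator, is also one way to keep other callers (item 1) unaffected by the WebHDFS-specific values.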





[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9914:
-
Status: Patch Available  (was: Open)

> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9914.001.patch
>
>
> WebHDFS-specific read/connect timeouts were added in HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
> 3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.





[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9914:
-
Description: 
WebHDFS-specific read/connect timeouts were added in HDFS-9887. This ticket is 
opened to fix the following issues in the current implementation:

1. The webhdfs read/connect timeout should not affect connections for other 
callers of URLConnectionFactory.newSslConnConfigurator() such as 
QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()

2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
connect/read timeout even if any exception is thrown during customized SSL 
configuration. 
 
3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.

  was:
WebHDFS-specific read/connect timeouts were added in HDFS-9887. This ticket is 
opened to fix the following issues in the current implementation:

1. The webhdfs read/connect timeout should not affect connections for other 
callers of URLConnectionFactory.newSslConnConfigurator() such as 
QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()

2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
connect/read timeout even if any exception is thrown during customized SSL 
configuration. 
 


> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> WebHDFS-specific read/connect timeouts were added in HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
> 3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.





[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183887#comment-15183887
 ] 

Arpit Agarwal commented on HDFS-9882:
-

bq. We think heartbeatsTotal may be a good alternative.
Makes sense. Do you want to change the function name {{addHeartbeatTotalTime}} 
to be consistent with the metric name {{addHeartbeatTotal}}? The v3 patch looks 
fine otherwise.

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 
> 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, 
> 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 





[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned

2016-03-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183810#comment-15183810
 ] 

Jing Zhao commented on HDFS-8786:
-

Thanks for updating the patch, [~rakeshr]! The 004 patch looks pretty good to 
me. Some minor comments:
# Not all the decommissioning nodes can be later used as source nodes, since we 
still need to consider the DataNode's current load etc. Thus I'm not sure the 
calculation is correct here. Meanwhile, I do not think we should adjust 
additionalReplRequired: there is no need to leave decommissioning nodes to the 
next round. Thus it looks like we do not need this change.
{code}
  // should reconstruct all the internal blocks before scheduling
  // replication task for decommissioning node(s).
  if (additionalReplRequired - numReplicas.decommissioning() > 0) {
    additionalReplRequired = additionalReplRequired
        - numReplicas.decommissioning();
  }
{code}
# We actually need to track if the reconstruction work is triggered only by 
"not enough rack". For normal EC reconstruction work maybe we also do not have 
enough racks, but we will keep {{enoughRack}} as true. Thus using a boolean 
"notEnoughRack" may be more accurate.
{code}
private boolean enoughRack = true;
{code}
# {{DatanodeManager#sortLocatedStripedBlocks}} can be added in a separate jira. 
We can also add some new tests for this change.
# In ErasureCodingWork, we can put the not-enough-rack logic and the 
decommissioning logic into two separate methods. Also if target size is smaller 
than sources, we do not need to create the block in {{addTaskToDatanode}}. 
Maybe the code after the change can look like this:
{code}
  private void createReplicationWork(int sourceIndex,
      DatanodeStorageInfo target) {
    BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();
    final byte blockIndex = liveBlockIndicies[sourceIndex];
    final DatanodeDescriptor source = getSrcNodes()[sourceIndex];
    final long internBlkLen = StripedBlockUtil.getInternalBlockLength(
        stripedBlk.getNumBytes(), stripedBlk.getCellSize(),
        stripedBlk.getDataBlockNum(), blockIndex);
    final Block targetBlk = new Block(
        stripedBlk.getBlockId() + blockIndex, internBlkLen,
        stripedBlk.getGenerationStamp());
    source.addBlockToBeReplicated(targetBlk, new DatanodeStorageInfo[]{target});
    if (BlockManager.LOG.isDebugEnabled()) {
      BlockManager.LOG.debug("Add replication task from source {} to "
          + "target {} for EC block {}", source, target, targetBlk);
    }
  }

  private List<Integer> findDecommissioningSources() {
    List<Integer> srcIndices = new ArrayList<>();
    for (int i = 0; i < getSrcNodes().length; i++) {
      if (getSrcNodes()[i].isDecommissionInProgress()) {
        srcIndices.add(i);
      }
    }
    return srcIndices;
  }

  @Override
  void addTaskToDatanode(NumberReplicas numberReplicas) {
    final DatanodeStorageInfo[] targets = getTargets();
    assert targets.length > 0;
    BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();

    if (hasNotEnoughRack()) {
      // if we already have all the internal blocks, but not enough racks,
      // we only need to replicate one internal block to a new rack
      int sourceIndex = chooseSource4SimpleReplication();
      createReplicationWork(sourceIndex, targets[0]);
    } else if (numberReplicas.decommissioning() > 0 && hasAllInternalBlocks()) {
      List<Integer> decommissioningSources = findDecommissioningSources();
      // decommissioningSources.size() should be >= targets.length
      final int num = Math.min(decommissioningSources.size(), targets.length);
      for (int i = 0; i < num; i++) {
        createReplicationWork(decommissioningSources.get(i), targets[i]);
      }
    } else {
      targets[0].getDatanodeDescriptor().addBlockToBeErasureCoded(
          new ExtendedBlock(blockPoolId, stripedBlk),
          getSrcNodes(), targets, getLiveBlockIndicies(),
          stripedBlk.getErasureCodingPolicy());
    }
  }
{code}

> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> ---
>
> Key: HDFS-8786
> URL: https://issues.apache.org/jira/browse/HDFS-8786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Rakesh R
> Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-004.patch, HDFS-8786-draft.patch
>
>
> Per [discussion | 
> https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
>  under HDFS-8697, it's too expensive to reconstruct block groups for decomm 
> purpose.





[jira] [Updated] (HDFS-9916) OzoneHandler : Add Key handler

2016-03-07 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9916:
---
Attachment: HDFS-9916-HDFS-7240.001.patch

> OzoneHandler : Add Key handler
> --
>
> Key: HDFS-9916
> URL: https://issues.apache.org/jira/browse/HDFS-9916
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9916-HDFS-7240.001.patch
>
>
> Add REST handlers for handling key-related functions.





[jira] [Updated] (HDFS-9916) OzoneHandler : Add Key handler

2016-03-07 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9916:
---
Status: Patch Available  (was: Open)

> OzoneHandler : Add Key handler
> --
>
> Key: HDFS-9916
> URL: https://issues.apache.org/jira/browse/HDFS-9916
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9916-HDFS-7240.001.patch
>
>
> Add REST handlers for handling key-related functions.





[jira] [Created] (HDFS-9916) OzoneHandler : Add Key handler

2016-03-07 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-9916:
--

 Summary: OzoneHandler : Add Key handler
 Key: HDFS-9916
 URL: https://issues.apache.org/jira/browse/HDFS-9916
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: HDFS-7240


Add REST handlers for handling key-related functions.





[jira] [Updated] (HDFS-9913) DispCp doesn't use Trash with -delete option

2016-03-07 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9913:
--
Affects Version/s: (was: 2.7.2)
   2.6.0
 Target Version/s: 2.8.0

> DispCp doesn't use Trash with -delete option
> 
>
> Key: HDFS-9913
> URL: https://issues.apache.org/jira/browse/HDFS-9913
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Konstantin Shaposhnikov
>Assignee: John Zhuge
>
> Documentation for DistCp -delete option says 
> ([http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]):
> | The deletion is done by FS Shell. So the trash will be used, if it is 
> enable.
> However, this no longer seems to be the case. The latest source code 
> (https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java)
>  uses `FileSystem.delete`, and the trash options do not seem to be applied.
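
The difference the report describes, a direct delete versus a move into a trash location, can be sketched with the Java standard library alone. This is only an illustration of the two behaviours: `TrashSketch`, `deleteDirectly`, `moveToTrash` and the `.Trash` directory name are hypothetical, and none of this is the Hadoop trash API or the DistCp code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical, stdlib-only illustration; not the Hadoop Trash API.
class TrashSketch {
    /** Permanently removes the file, analogous to a direct delete call. */
    static void deleteDirectly(Path file) throws IOException {
        Files.delete(file);
    }

    /** Moves the file into trashDir so that it can still be recovered. */
    static Path moveToTrash(Path file, Path trashDir) throws IOException {
        Files.createDirectories(trashDir);
        Path target = trashDir.resolve(file.getFileName());
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("trash-sketch");
        Path a = Files.createFile(dir.resolve("a.txt"));
        Path b = Files.createFile(dir.resolve("b.txt"));
        deleteDirectly(a);
        Path kept = moveToTrash(b, dir.resolve(".Trash"));
        System.out.println("a still exists: " + Files.exists(a));
        System.out.println("b recoverable at: " + kept);
    }
}
```

With the first method the data is gone immediately; with the second it remains available until the trash location is cleaned up, which is the behaviour the documentation promises.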



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183662#comment-15183662
 ] 

Hudson commented on HDFS-9906:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9438 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9438/])
HDFS-9906. Remove spammy log spew when a datanode is restarted. (arp: rev 
724d2299cd2516d90c030f6e20d814cceb439228)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens way too much to add any useful information. We should either 
> move this to a different level or only warn once per machine.
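
The "only warn once per machine" option could look roughly like the sketch below. It is a minimal illustration, not the committed patch: the class `WarnOncePerNode` and its methods are hypothetical names, and the node is identified by a plain string such as the `host:port` seen in the log line.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers which nodes have already produced a warning.
class WarnOncePerNode {
    private final Set<String> warned = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given node is seen. */
    boolean shouldWarn(String node) {
        return warned.add(node);
    }

    /** Forgets a node so that it may warn again, e.g. after it re-registers. */
    void reset(String node) {
        warned.remove(node);
    }

    public static void main(String[] args) {
        WarnOncePerNode gate = new WarnOncePerNode();
        for (int i = 0; i < 3; i++) {
            if (gate.shouldWarn("192.168.1.1:50010")) {
                System.out.println("WARN redundant addStoredBlock request");
            }
        }
    }
}
```

The set grows with the number of distinct nodes, so a real implementation would also need a point at which entries are cleared; `reset` stands in for that here.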



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9906:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed this for 2.8.0. Thank you for the contribution [~brahmareddy], 
and thanks [~cnauroth] and [~eclark] for the reviews.

> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens way too much to add any useful information. We should either 
> move this to a different level or only warn once per machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9906:

Issue Type: Improvement  (was: Bug)

> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens way too much to add any useful information. We should either 
> move this to a different level or only warn once per machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9817) Use SLF4J in new classes

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9817:

Target Version/s:   (was: HDFS-1312)

> Use SLF4J in new classes
> 
>
> Key: HDFS-9817
> URL: https://issues.apache.org/jira/browse/HDFS-9817
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: logging
>Affects Versions: HDFS-1312
>Reporter: Arpit Agarwal
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9817-HDFS-1312.001.patch, 
> HDFS-9817-HDFS-1312.002.patch
>
>
> We are trying to use SLF4J for new classes as far as possible so let's change 
> all the newly added classes to use SLF4J instead of depending on Log4J.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9817) Use SLF4J in new classes

2016-03-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9817:

Fix Version/s: HDFS-1312

> Use SLF4J in new classes
> 
>
> Key: HDFS-9817
> URL: https://issues.apache.org/jira/browse/HDFS-9817
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: logging
>Affects Versions: HDFS-1312
>Reporter: Arpit Agarwal
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9817-HDFS-1312.001.patch, 
> HDFS-9817-HDFS-1312.002.patch
>
>
> We are trying to use SLF4J for new classes as far as possible so let's change 
> all the newly added classes to use SLF4J instead of depending on Log4J.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9913) DispCp doesn't use Trash with -delete option

2016-03-07 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183607#comment-15183607
 ] 

John Zhuge commented on HDFS-9913:
--

The problem exists in CDH5.8.0 (2.6.0) based on my quick test.

> DispCp doesn't use Trash with -delete option
> 
>
> Key: HDFS-9913
> URL: https://issues.apache.org/jira/browse/HDFS-9913
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.2
>Reporter: Konstantin Shaposhnikov
>Assignee: John Zhuge
>
> Documentation for DistCp -delete option says 
> ([http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]):
> | The deletion is done by FS Shell. So the trash will be used, if it is 
> enable.
> However, this no longer seems to be the case. The latest source code 
> (https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java)
>  uses `FileSystem.delete`, and the trash options do not seem to be applied.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9427) HDFS should not default to ephemeral ports

2016-03-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183478#comment-15183478
 ] 

Xiao Chen commented on HDFS-9427:
-

bq. I know nothing about KMS so not sure if it is safe to change the port 
number as Jonathan Hsieh proposed. I'll file a separate Jira for that and we 
can address the ephemeral ports here.
It looks like HADOOP-12811 has already been created for this. I can follow up 
there and fix the KMS default port as well. 
I don't think there are any extra problems with changing the port on KMS, except for 
compatibility.

> HDFS should not default to ephemeral ports
> --
>
> Key: HDFS-9427
> URL: https://issues.apache.org/jira/browse/HDFS-9427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Xiaobing Zhou
>Priority: Critical
>  Labels: Incompatible
> Attachments: HDFS-9427.000.patch, HDFS-9427.001.patch, 
> HDFS-9427.002.patch
>
>
> HDFS defaults to ephemeral ports for some HTTP/RPC endpoints. This can 
> cause bind exceptions on service startup if the port is in use.
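
The failure mode can be reproduced in isolation: if any process already holds a port, a second listener on the same address and port fails to bind. The sketch below uses only `java.net`; the class and method names are hypothetical, and port `0` merely stands in for whichever process happened to take the port first.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demonstration of a bind conflict on an already-occupied port.
class PortConflictSketch {
    /** Returns true if binding a second listener to an occupied port fails. */
    static boolean secondBindFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port from its dynamic range.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false; // unexpected: both listeners bound
            } catch (BindException e) {
                return true;  // the port was already in use
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```

A service whose default port lies inside the range the OS hands out for outgoing connections can hit this at startup purely by chance.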



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9427) HDFS should not default to ephemeral ports

2016-03-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183446#comment-15183446
 ] 

Colin Patrick McCabe commented on HDFS-9427:


Great idea, but can we please not use port 9075?  This port is the default 
HTrace RPC port.

> HDFS should not default to ephemeral ports
> --
>
> Key: HDFS-9427
> URL: https://issues.apache.org/jira/browse/HDFS-9427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Xiaobing Zhou
>Priority: Critical
>  Labels: Incompatible
> Attachments: HDFS-9427.000.patch, HDFS-9427.001.patch, 
> HDFS-9427.002.patch
>
>
> HDFS defaults to ephemeral ports for some HTTP/RPC endpoints. This can 
> cause bind exceptions on service startup if the port is in use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9915) Typo in DistCp.html

2016-03-07 Thread John Zhuge (JIRA)
John Zhuge created HDFS-9915:


 Summary: Typo in DistCp.html
 Key: HDFS-9915
 URL: https://issues.apache.org/jira/browse/HDFS-9915
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, documentation
Affects Versions: 2.7.2
Reporter: John Zhuge
Assignee: John Zhuge
Priority: Trivial


Typo in http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

| The deletion is done by FS Shell. So the trash will be used, if it is enable.

Should be "enabled".

Maybe the whole sentence can be rewritten as:

| If HDFS Trash is enabled, the files will be moved to the trash folder.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9913) DispCp doesn't use Trash with -delete option

2016-03-07 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge reassigned HDFS-9913:


Assignee: John Zhuge

> DispCp doesn't use Trash with -delete option
> 
>
> Key: HDFS-9913
> URL: https://issues.apache.org/jira/browse/HDFS-9913
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.2
>Reporter: Konstantin Shaposhnikov
>Assignee: John Zhuge
>
> Documentation for DistCp -delete option says 
> ([http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]):
> | The deletion is done by FS Shell. So the trash will be used, if it is 
> enable.
> However, this no longer seems to be the case. The latest source code 
> (https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java)
>  uses `FileSystem.delete`, and the trash options do not seem to be applied.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183351#comment-15183351
 ] 

Chris Nauroth commented on HDFS-7597:
-

bq. I'm still not sure why HDFS-8855 is/was necessary because this internal 
patch solved the problem for us long ago.

Yes, agreed.  That's why it was a forehead-smacking moment when I realized the 
same issue essentially had been fixed twice mistakenly.

I agree that this patch is a more general solution.  We might consider pulling 
out HDFS-8855 later as a clean-up.  As far as scope for this patch, do you want 
to do something to address the {{TestDataNodeUGIProvider}} failure, and we'll 
defer any further clean-up to a separate issue?

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.
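
Why a cache keyed on a user object can open one connection per request is easy to show with a plain map. The sketch below assumes a key type whose equality is object identity, so every freshly created key misses the cache even when it carries the same token. All names (`ConnectionCacheSketch`, `IdentityKey`, `TokenKey`, `connectionsOpened`) are hypothetical and do not come from the RPC layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a connection cache and two kinds of cache key.
class ConnectionCacheSketch {
    /** Key compared by object identity (no equals/hashCode override). */
    static final class IdentityKey {
        final String token;
        IdentityKey(String token) { this.token = token; }
    }

    /** Key compared by the token it carries. */
    static final class TokenKey {
        final String token;
        TokenKey(String token) { this.token = token; }
        @Override public boolean equals(Object o) {
            return o instanceof TokenKey && ((TokenKey) o).token.equals(token);
        }
        @Override public int hashCode() { return Objects.hash(token); }
    }

    /** Counts how many cache entries (connections) the given keys create. */
    static int connectionsOpened(Object... keys) {
        Map<Object, String> cache = new HashMap<>();
        for (Object key : keys) {
            cache.putIfAbsent(key, "connection");
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(connectionsOpened(new IdentityKey("t"), new IdentityKey("t")));
        System.out.println(connectionsOpened(new TokenKey("t"), new TokenKey("t")));
    }
}
```

Each seek that builds a new identity-compared key therefore adds an entry, and with it an open connection, which is how many seeks can add up to file descriptor exhaustion.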



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

2016-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1518#comment-1518
 ] 

Hudson commented on HDFS-9865:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9436 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9436/])
HDFS-9865. TestBlockReplacement fails intermittently in trunk (Lin Yiqun 
(iwasakims: rev d718fc1ee5aee3628e105339ee3ea183b6242409)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java


> TestBlockReplacement fails intermittently in trunk
> --
>
> Key: HDFS-9865
> URL: https://issues.apache.org/jira/browse/HDFS-9865
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9865.001.patch, HDFS-9865.002.patch
>
>
> I found that the test case {{TestBlockReplacement}} sometimes fails in 
> testing. Looking at the unit test log, I always find this output:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
> testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)
>   Time elapsed: 8.764 sec  <<< FAILURE!
> java.lang.AssertionError: The block should be only on 1 datanode  
> expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
> {code}
> Finally, I found the reason: the block is not completely deleted in 
> testDeletedBlockWhenAddBlockIsInEdit, which makes the datanode count incorrect. 
> The time to wait for FsDatasetAsyncDiskService to delete the block is not an 
> accurate value. 
> {code}
> LOG.info("replaceBlock:  " + replaceBlock(block,
>   (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc,
>   (DatanodeInfo)destDnDesc));
> // Waiting for the FsDatasetAsyncDsikService to delete the block
> Thread.sleep(3000);
> {code}
> When I adjust this time to 1 second, the test always fails. The 3 
> seconds in the test is not an accurate value either. We should change this code's 
> logic to something better, such as waiting for the block to be replicated in 
> testDecommision.
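
One common replacement for a fixed `Thread.sleep` is to poll for the expected state with a deadline. The following is a minimal, self-contained sketch of that idea under assumed names (`WaitForSketch`, `waitFor`); it is not the code from the attached patches, and the condition passed in would be whatever check the test needs, such as the block being present on exactly one datanode.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: returns as soon as the condition holds.
class WaitForSketch {
    static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Give up once the deadline has passed instead of waiting forever.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5000);
        System.out.println("condition met");
    }
}
```

The test then finishes as soon as the deletion is visible and fails with a clear timeout message when it never happens, rather than depending on a guessed sleep duration.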



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-9914:


Assignee: Xiaoyu Yao

> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> The webhdfs-specific read/connect timeout was added in HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
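
The second point, honoring the configured timeouts even when the SSL setup throws, can be sketched as follows. This is an assumption-laden illustration rather than the `URLConnectionFactory` code: `TimeoutConfiguratorSketch`, `SslSetup` and `configure` are hypothetical names, and only the standard `URLConnection.setConnectTimeout` and `setReadTimeout` calls are real.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical configurator: timeouts are applied whether or not SSL setup works.
class TimeoutConfiguratorSketch {
    /** Stand-in for the customized SSL configuration step. */
    interface SslSetup {
        void apply(URLConnection conn) throws Exception;
    }

    static URLConnection configure(URLConnection conn, SslSetup ssl,
                                   int connectMs, int readMs) {
        try {
            ssl.apply(conn);
        } catch (Exception e) {
            // SSL setup failed; continue with a plain connection.
        }
        // Applied on both paths, so a failure above cannot reset the timeouts.
        conn.setConnectTimeout(connectMs);
        conn.setReadTimeout(readMs);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // openConnection() only creates the object; it does not connect yet.
        URLConnection conn = URI.create("http://localhost/").toURL().openConnection();
        configure(conn, c -> { throw new IllegalStateException("no keystore"); },
                  120_000, 300_000);
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```

Keeping the timeout assignment outside the `try` block is the design point: the fallback path and the normal path share it.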



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183322#comment-15183322
 ] 

Xiaoyu Yao commented on HDFS-9887:
--

Filed HDFS-9914 for the fix. 

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-9914:


 Summary: Fix configurable WebhDFS connect/read timeout
 Key: HDFS-9914
 URL: https://issues.apache.org/jira/browse/HDFS-9914
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiaoyu Yao


The webhdfs-specific read/connect timeout was added in HDFS-9887. This ticket is opened 
to fix the following issues in the current implementation:

1. The webhdfs read/connect timeout should not affect connections for other 
callers of URLConnectionFactory.newSslConnConfigurator() such as 
QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()

2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
connect/read timeout even if any exception is thrown during customized SSL 
configuration. 
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183301#comment-15183301
 ] 

Xiaoyu Yao commented on HDFS-9887:
--

Thanks [~jojochuang] for reporting this. Further reading found that the webhdfs-
specific read/connect timeout implemented by HDFS-9887 should not affect other 
callers of {{URLConnectionFactory.newSslConnConfigurator()}} such as 
{{QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
TransferFsImage()}}. I will file a separate ticket to fix it. 

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

2016-03-07 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9865:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.7.3
  2.8.0
Target Version/s: 2.7.3
  Status: Resolved  (was: Patch Available)

+1. Committed to branch-2.7 and above. Thanks, [~linyiqun].

> TestBlockReplacement fails intermittently in trunk
> --
>
> Key: HDFS-9865
> URL: https://issues.apache.org/jira/browse/HDFS-9865
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9865.001.patch, HDFS-9865.002.patch
>
>
> I found that the test case {{TestBlockReplacement}} sometimes fails in 
> testing. Looking at the unit test log, I always find this output:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
> testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)
>   Time elapsed: 8.764 sec  <<< FAILURE!
> java.lang.AssertionError: The block should be only on 1 datanode  
> expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
> {code}
> Finally, I found the reason: the block is not completely deleted in 
> testDeletedBlockWhenAddBlockIsInEdit, which makes the datanode count incorrect. 
> The time to wait for FsDatasetAsyncDiskService to delete the block is not an 
> accurate value. 
> {code}
> LOG.info("replaceBlock:  " + replaceBlock(block,
>   (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc,
>   (DatanodeInfo)destDnDesc));
> // Waiting for the FsDatasetAsyncDsikService to delete the block
> Thread.sleep(3000);
> {code}
> When I adjust this time to 1 second, the test always fails. The 3 
> seconds in the test is not an accurate value either. We should change this code's 
> logic to something better, such as waiting for the block to be replicated in 
> testDecommision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9904) testCheckpointCancellationDuringUpload occasionally fails

2016-03-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183248#comment-15183248
 ] 

Kihwal Lee commented on HDFS-9904:
--

Thanks for working on the fix. The config is used to determine whether to 
create a new checkpoint. A standby will, after loading/replaying edits, check 
how many transactions went by since the last checkpoint. If the number is 
greater than the configured limit, it will do a checkpoint. As you can see from 
the test output, there are around 106 transactions at the end. In order to 
prevent the standby from checkpointing, the config value should be bigger than 
that. E.g. 1000.  Also, it should be set before the namenode is started and 
should be reset for other test cases.
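
The check described above reduces to a comparison between the number of transactions since the last checkpoint and the configured limit. The sketch below follows the wording of the comment ("greater than"); the class and method names are hypothetical, and the limit of 100 is only an illustrative value, not the one used by the test.

```java
// Hypothetical model of the standby's checkpoint decision.
class CheckpointThresholdSketch {
    /** True when more transactions than txnLimit have gone by since the last checkpoint. */
    static boolean needsCheckpoint(long lastCheckpointTxId, long currentTxId, long txnLimit) {
        return currentTxId - lastCheckpointTxId > txnLimit;
    }

    public static void main(String[] args) {
        long txnsAtEnd = 106; // roughly the count seen in the test output
        System.out.println("limit 100:  " + needsCheckpoint(0, txnsAtEnd, 100));
        System.out.println("limit 1000: " + needsCheckpoint(0, txnsAtEnd, 1000));
    }
}
```

With about 106 transactions, any limit comfortably above that number keeps the standby from checkpointing on its own during the test.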

> testCheckpointCancellationDuringUpload occasionally fails 
> --
>
> Key: HDFS-9904
> URL: https://issues.apache.org/jira/browse/HDFS-9904
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
> Attachments: HDFS-9904.001.patch
>
>
> The failure was at the end of the test case where the txid of the standby 
> (former active) is checked. Since the checkpoint/uploading was canceled, it 
> is not supposed to have the new checkpoint. Looking at the test log, that was 
> still the case, but the standby then did checkpoint on its own and bumped up 
> the txid, right before the check was performed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183241#comment-15183241
 ] 

Xiaoyu Yao commented on HDFS-9887:
--

Agreed, this is a bug. When the SSL configuration throws an exception, webhdfs does not honor 
the configurable webhdfs connect/read timeout. It always uses 
{{DEFAULT_TIMEOUT_CONN_CONFIGURATOR}}, with the default value (1 min). 

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-03-07 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183227#comment-15183227
 ] 

Wei-Chiu Chuang commented on HDFS-9634:
---

This seems to contain a bug: After the exception is reinterpreted, the original 
stack trace is lost, and it's impossible to tell where the exception occurred.
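
One way to add context to an exception without losing where it came from is to attach the original as the cause. The sketch below contrasts the two approaches using `Throwable.initCause`; the class and method names are hypothetical and this is not the code from the HDFS-9634 patches.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical comparison of two ways to re-create an exception with more detail.
class PreserveCauseSketch {
    /** Adds the node to the message but drops the original exception. */
    static IOException withoutCause(IOException original, String node) {
        return new SocketTimeoutException(node + ": " + original.getMessage());
    }

    /** Adds the node to the message and keeps the original as the cause. */
    static IOException withCause(IOException original, String node) {
        IOException wrapped =
            new SocketTimeoutException(node + ": " + original.getMessage());
        wrapped.initCause(original); // original stack trace stays reachable
        return wrapped;
    }

    public static void main(String[] args) {
        IOException original = new SocketTimeoutException("Read timed out");
        withCause(original, "dn1.example:50075").printStackTrace();
    }
}
```

When the wrapped exception is printed, the original frames appear under "Caused by:", so the location of the failure can still be found.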

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.7.3
>
> Attachments: HDFS-9634.001.patch, HDFS-9634.002.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally

2016-03-07 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183225#comment-15183225
 ] 

Wei-Chiu Chuang commented on HDFS-9905:
---

[~kihwal]: you are right. This is unrelated to HDFS-9887, even though HDFS-9887 
also contains another bug unrelated to this issue.
But thanks to your comments. I think I should rebase to 2.7 to avoid exception 
reinterpretation in HDFS-9634.

> TestWebHdfsTimeouts fails occasionally
> --
>
> Key: HDFS-9905
> URL: https://issues.apache.org/jira/browse/HDFS-9905
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
>Assignee: Wei-Chiu Chuang
>
> When checking for a timeout, it does get {{SocketTimeoutException}}, but the 
> message sometimes does not contain "connect timed out". Since the original 
> exception is not logged, we do not know details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183218#comment-15183218
 ] 

Wei-Chiu Chuang commented on HDFS-9887:
---

I think this patch is not complete -- if for some reason SSL configuration 
throws an exception, the socket timeouts will not be configured, even if the 
connection is supposed to be webhdfs only, not swebhdfs.

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.





[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally

2016-03-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183214#comment-15183214
 ] 

Kihwal Lee commented on HDFS-9905:
--

bq. Looks like a regression introduced after HDFS-9887
We were seeing this failure in the 2.7 builds, which do not have HDFS-9887.

> TestWebHdfsTimeouts fails occasionally
> --
>
> Key: HDFS-9905
> URL: https://issues.apache.org/jira/browse/HDFS-9905
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
>Assignee: Wei-Chiu Chuang
>
> When checking for a timeout, it does get {{SocketTimeoutException}}, but the 
> message sometimes does not contain "connect timed out". Since the original 
> exception is not logged, we do not know details.





[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally

2016-03-07 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183212#comment-15183212
 ] 

Wei-Chiu Chuang commented on HDFS-9905:
---

This is the full stacktrace: unfortunately, the exception object was 
reinterpreted in the exception handling, so the original stack trace was lost.
We might also want to retain the original full stack trace.

{noformat}
Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.555 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
testAuthUrlReadTimeout[timeoutSource=Configuration](org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts)  Time elapsed: 0.036 sec  <<< FAILURE!
java.lang.AssertionError: Expected to find 'localhost:38159: Read timed out' but got unexpected exception:java.net.SocketTimeoutException: localhost:38159: null
    at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:733)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1456)
    at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

    at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:733)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1456)
    at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195)
{noformat}
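
A minimal sketch of how the original stack trace could be retained when an exception is re-created with a host prefix. This is an illustration under assumptions, not the actual {{WebHdfsFileSystem}} code: {{KeepOriginalCause}} and {{withHostPrefix}} are hypothetical names. It rebuilds the exception reflectively, as the {{Constructor.newInstance}} frames above suggest, and then chains the original as the cause. It also shows how a {{null}} message on the original can end up as "localhost:38159: null".

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical sketch; not the actual WebHdfsFileSystem code.
class KeepOriginalCause {
    static IOException withHostPrefix(String hostPort, IOException original) {
        try {
            // Re-create the same exception type with a prefixed message.
            IOException wrapped = original.getClass()
                .getConstructor(String.class)
                .newInstance(hostPort + ": " + original.getMessage());
            // Chain the original so its stack trace is still printed.
            wrapped.initCause(original);
            return wrapped;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // No usable (String) constructor: fall back to the original.
            return original;
        }
    }

    public static void main(String[] args) {
        IOException e = withHostPrefix("localhost:38159",
            new SocketTimeoutException()); // message is null here
        System.out.println(e.getMessage());
        System.out.println("cause kept: " + (e.getCause() != null));
    }
}
```

With the cause chained, a test failure would print a "Caused by:" section pointing at the socket code that actually timed out.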


> TestWebHdfsTimeouts fails occasionally
> --
>
> Key: HDFS-9905
> URL: https://issues.apache.org/jira/browse/HDFS-9905
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
>Assignee: Wei-Chiu Chuang
>
> When checking for a timeout, it does get {{SocketTimeoutException}}, but the 
> message sometimes does not contain "connect timed out". Since the original 
> exception is not logged, we do not know details.





[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.

2016-03-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183159#comment-15183159
 ] 

Daryn Sharp commented on HDFS-9874:
---

The synchronization on {{FSDatasetImpl#stopAllDataxceiverThreads}} is a bit 
concerning.  Stopping xceiver threads uses a default timeout of 1min.  That's a 
long time for the DN to block if threads don't exit immediately.

The iteration of replicas might not be safe.  The correct locking model isn't 
immediately clear, but {{ReplicaMap#replicas}} has a comment that other code 
doesn't appear to follow:
{noformat}
  /**
   * Get a collection of the replicas for given block pool
   * This method is not synchronized. It needs to be synchronized
   * externally using the mutex, both for getting the replicas
   * values from the map and iterating over it. Mutex can be accessed using
   * {@link #getMutext()} method.
{noformat}

Might need to consider forcibly decrementing the ref and interrupting with no 
timeout.

For the test, I'd assert the volume actually has a non-zero ref count before 
trying to interrupt.  Instead of triggering an async check and sleeping, which 
inevitably creates flaky race conditions, the disk check should be invoked 
non-async.  The test should also verify that the client stream fails after the 
volume is failed.
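
To illustrate the locking concern, here is a sketch under assumptions: take a snapshot of the threads while holding the mutex, then interrupt and join outside the lock, so the lock is never held for the duration of the timeout. {{XceiverStopSketch}} and its methods are hypothetical stand-ins and are not the {{FsDatasetImpl}} or {{ReplicaMap}} API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual FsDatasetImpl/ReplicaMap code.
class XceiverStopSketch {
    private final Object mutex = new Object();
    private final Map<String, Thread> threadsByReplica = new HashMap<>();

    void register(String replicaId, Thread t) {
        synchronized (mutex) {
            threadsByReplica.put(replicaId, t);
        }
    }

    /** Returns true if every thread exited before the deadline. */
    boolean stopAll(long timeoutMs) {
        List<Thread> snapshot;
        synchronized (mutex) {
            // Iterate the map only while holding the mutex.
            snapshot = new ArrayList<>(threadsByReplica.values());
            threadsByReplica.clear();
        }
        // Slow work happens outside the lock.
        for (Thread t : snapshot) {
            t.interrupt();
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            for (Thread t : snapshot) {
                long remainingMs =
                    TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs > 0) {
                    t.join(remainingMs);
                }
                if (t.isAlive()) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // One long-sleeping "writer" thread that exits when interrupted.
    static boolean demo() {
        XceiverStopSketch s = new XceiverStopSketch();
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted: exit promptly.
            }
        });
        writer.setDaemon(true);
        writer.start();
        s.register("blk_1", writer);
        return s.stopAll(2_000);
    }

    public static void main(String[] args) {
        System.out.println("all stopped: " + demo());
    }
}
```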

> Long living DataXceiver threads cause volume shutdown to block.
> ---
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-9874-trunk.patch
>
>
> One failed-volume shutdown took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to 
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing failed volume volumeA/current: 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
>     at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
>     at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
>     at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO datanode.VolumeScanner: VolumeScanner(volumeA, DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN datanode.VolumeScanner: VolumeScanner(volumeA, DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp (Read-only file system)
>     at java.io.FileOutputStream.open(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669)
>     at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314)
>     at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to delete old dfsUsed file in volumeA/current/BP-1788428031-nnIp-1351700107344/current
> 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to write dfsUsed to volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp

[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183118#comment-15183118
 ] 

Walter Su commented on HDFS-7866:
-

What do you think about letting them diverge instead of forcing unification? It 
takes more time to understand the new format from the code, so I think adding 
some javadoc would be nice.
How about something like this:
{noformat}
 /** 
   * Bit format:
   * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY]
   * [48-bit preferredBlockSize]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for replicated block:
   * 0 [11-bit replication]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for striped block:
   * 1 [11-bit ErasureCodingPolicy ID]
   *
   */
{noformat}
I think getErasureCodingPolicyID() doesn't have to reuse getReplication(long 
header), even though both are 11 bits wide for now. In the future, we might 
split the 11-bit EC policy ID further, and the two methods would keep 
diverging. I guess something like:
{noformat}
 /** 
   * Bit format:
   * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY]
   * [48-bit preferredBlockSize]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for non-ec block:
   * 0 [11-bit replication]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for ec striped  block:
   * 10 [4-bit replication][6-bit ErasureCodingPolicy ID]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for ec contiguous block:
   * 11 [4-bit replication][6-bit ErasureCodingPolicy ID]
   *
   */
{noformat}
I also think we should reserve some high-value IDs for custom policies, and 
reserve some for not-yet-known policies that might be integrated (hard-coded) 
in the future.
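
To make the first proposed layout concrete, here is a small sketch of packing and unpacking such a header. It assumes the fields are listed from most significant to least significant bit (4-bit storage policy ID, 1-bit striped flag, 11-bit replication or EC policy ID, 48-bit preferred block size). {{HeaderBitsSketch}} and its method names are hypothetical and are not the actual INodeFile header code.

```java
// Hypothetical sketch of the first proposed bit layout; not Hadoop code.
final class HeaderBitsSketch {
    static final long BLOCK_SIZE_MASK = (1L << 48) - 1; // low 48 bits
    static final long VALUE_MASK = 0x7FFL;               // 11 bits

    static long pack(int storagePolicyId, boolean striped,
                     int replicationOrPolicyId, long preferredBlockSize) {
        long layout = ((striped ? 1L : 0L) << 11)
            | (replicationOrPolicyId & VALUE_MASK);
        return ((long) (storagePolicyId & 0xF) << 60)
            | (layout << 48)
            | (preferredBlockSize & BLOCK_SIZE_MASK);
    }

    static int storagePolicyId(long header) {
        return (int) (header >>> 60);
    }

    // Top bit of the 12-bit BLOCK_LAYOUT_AND_REDUNDANCY field.
    static boolean isStriped(long header) {
        return ((header >>> 59) & 1L) == 1L;
    }

    // Replication for replicated blocks, EC policy ID for striped blocks.
    static int replicationOrPolicyId(long header) {
        return (int) ((header >>> 48) & VALUE_MASK);
    }

    static long preferredBlockSize(long header) {
        return header & BLOCK_SIZE_MASK;
    }

    public static void main(String[] args) {
        long h = pack(7, true, 5, 128L * 1024 * 1024);
        System.out.println(storagePolicyId(h) + " " + isStriped(h) + " "
            + replicationOrPolicyId(h) + " " + preferredBlockSize(h));
    }
}
```

Under the second proposal, only the masks and shifts inside the 12-bit field would change (a 2-bit prefix, then 4-bit replication and 6-bit policy ID), which is one argument for keeping the two accessors separate.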

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, 
> HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend NameNode to load, list and sync predefine EC schemas in 
> authorized and controlled approach. The provided facilities will be used to 
> implement DFSAdmin commands so admin can list available EC schemas, then 
> could choose some of them for target EC zones.





[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183095#comment-15183095
 ] 

Daryn Sharp commented on HDFS-7597:
---

[~cnauroth] We can re-brand this as a more general improvement since it helps 
not only the DN but also the NN by reducing the per-connection UGI instances.  
I'm still not sure why HDFS-8855 is/was necessary because this internal patch 
solved the problem for us long ago.

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.
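
One way to picture the problem, as a toy model only: if the connection cache key depends on the identity of a per-request UGI object, every seek with the same token still produces a new cache entry, whereas keying on the token value reuses one. {{UgiCacheSketch}}, {{Ugi}} and {{connectionsOpened}} are hypothetical and do not reflect the real RPC client or {{UserGroupInformation}} classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model; not the actual Hadoop RPC connection cache.
class UgiCacheSketch {
    // Stand-in for a per-request UGI; equality is object identity.
    static final class Ugi {
        final String token;

        Ugi(String token) {
            this.token = token;
        }
    }

    static int connectionsOpened(List<Ugi> requests, boolean keyByToken) {
        Map<Object, Object> connections = new HashMap<>();
        for (Ugi ugi : requests) {
            Object key = keyByToken ? ugi.token : ugi;
            connections.computeIfAbsent(key, k -> new Object());
        }
        return connections.size();
    }

    // Three seeks by the same client with the same token.
    static int[] demo() {
        List<Ugi> seeks = List.of(
            new Ugi("token-A"), new Ugi("token-A"), new Ugi("token-A"));
        return new int[] {
            connectionsOpened(seeks, false), connectionsOpened(seeks, true)};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("keyed by UGI instance: " + r[0]
            + ", keyed by token: " + r[1]);
    }
}
```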





[jira] [Commented] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183061#comment-15183061
 ] 

Hadoop QA commented on HDFS-9865:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 55s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 56s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 32s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| JDK v1.8.0_74 Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12791747/HDFS-9865.002.patch |
| JIRA Issue | HDFS-9865 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  check

[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-03-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182982#comment-15182982
 ] 

Brahma Reddy Battula commented on HDFS-9902:


[~panyuxuan] thanks for reporting this issue.

I feel we can introduce a general config that can be set for each storage 
type individually.

 *Example:* 
* The configuration key can be {{dfs.datanode.du.reserved.<storage-type>}}, 
where the storage type is lowercase
 ** for {color:blue}RAM_DISK{color} it is 
{{dfs.datanode.du.reserved.ram_disk}}, and 
 ** for {color:blue}SSD{color} it is {{dfs.datanode.du.reserved.ssd}}

and use the value of {{dfs.datanode.du.reserved}} as the default.
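
The lookup described above could be sketched as follows. This is an illustration, not the attached patch: it uses a plain {{Map}} in place of Hadoop's {{Configuration}}, and {{ReservedSpaceSketch}} and {{reservedBytes}} are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; a Map stands in for Hadoop's Configuration.
class ReservedSpaceSketch {
    static final String BASE_KEY = "dfs.datanode.du.reserved";

    static long reservedBytes(Map<String, String> conf, String storageType) {
        // e.g. RAM_DISK -> dfs.datanode.du.reserved.ram_disk
        String specificKey =
            BASE_KEY + "." + storageType.toLowerCase(Locale.ROOT);
        String value = conf.get(specificKey);
        if (value == null) {
            // Fall back to the general setting, then to 0.
            value = conf.getOrDefault(BASE_KEY, "0");
        }
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(BASE_KEY, "10737418240");          // 10 GB for everything
        conf.put(BASE_KEY + ".ram_disk", "0");      // nothing reserved on tmpfs
        System.out.println("DISK: " + reservedBytes(conf, "DISK"));
        System.out.println("RAM_DISK: " + reservedBytes(conf, "RAM_DISK"));
    }
}
```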

> dfs.datanode.du.reserved should be difference between StorageType DISK and 
> RAM_DISK
> ---
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902.patch
>
>
> Hadoop now supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they all share one configuration, dfs.datanode.du.reserved.
> The DISK size may be several TB and the RAM_DISK size may be only several 
> tens of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same 
> DN and set dfs.datanode.du.reserved to 10GB, this wastes a lot of 
> RAM_DISK space. 
> Since RAM_DISK usage can reach 100%, I don't want the 
> dfs.datanode.du.reserved value configured for DISK to affect the usage of tmpfs.
> Can we add a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?





[jira] [Updated] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-03-07 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-9902:
---
Attachment: HDFS-9902.patch

> dfs.datanode.du.reserved should be difference between StorageType DISK and 
> RAM_DISK
> ---
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902.patch
>
>
> Hadoop now supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they all share one configuration, dfs.datanode.du.reserved.
> The DISK size may be several TB and the RAM_DISK size may be only several 
> tens of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same 
> DN and set dfs.datanode.du.reserved to 10GB, this wastes a lot of 
> RAM_DISK space. 
> Since RAM_DISK usage can reach 100%, I don't want the 
> dfs.datanode.du.reserved value configured for DISK to affect the usage of tmpfs.
> Can we add a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?





[jira] [Commented] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk

2016-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182950#comment-15182950
 ] 

Hudson commented on HDFS-9521:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9433 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9433/])
HDFS-9521. TransferFsImage.receiveFile should account and log separate (harsh: 
rev fd1c09be3e7c67c188a1dd7e4fccb3d92dcc5b5b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java


> TransferFsImage.receiveFile should account and log separate times for image 
> download and fsync to disk 
> ---
>
> Key: HDFS-9521
> URL: https://issues.apache.org/jira/browse/HDFS-9521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-9521-2.patch, HDFS-9521-3.patch, 
> HDFS-9521.004.patch, HDFS-9521.patch, HDFS-9521.patch.1
>
>
> Currently, TransferFsImage.receiveFile is logging total transfer time as 
> below:
> {noformat}
> double xferSec = Math.max(
>((float)(Time.monotonicNow() - startTime)) / 1000.0, 0.001);
> long xferKb = received / 1024;
> LOG.info(String.format("Transfer took %.2fs at %.2f KB/s",xferSec, xferKb / 
> xferSec))
> {noformat}
> This is really useful, but it just measures the total method execution time, 
> which includes time taken to download the image and do an fsync to all the 
> namenode metadata directories.
> Sometimes, when troubleshooting these image transfer problems, it's 
> interesting to know which part of the process is the bottleneck 
> (network or disk write).
> This patch accounts for the image download time and the fsync time for each 
> disk separately, logging how long each operation took.
>  
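
As a rough illustration of the idea in the description (not the committed patch), the two phases can be timed separately and reported in one message. {{SplitTransferTiming}}, {{report}} and {{timed}} are hypothetical names, the message format is invented for the example, and {{System.nanoTime()}} stands in for Hadoop's {{Time.monotonicNow()}}.

```java
import java.util.Locale;

// Hypothetical sketch; not the actual TransferFsImage code.
class SplitTransferTiming {
    // Converts nanoseconds to seconds, with the same 0.001s floor as the
    // original snippet to avoid dividing by zero.
    static double seconds(long nanos) {
        return Math.max(nanos / 1e9, 0.001);
    }

    static String report(long receivedBytes, long downloadNanos, long fsyncNanos) {
        double xferSec = seconds(downloadNanos);
        double syncSec = seconds(fsyncNanos);
        long xferKb = receivedBytes / 1024;
        return String.format(Locale.ROOT,
            "Transfer took %.2fs at %.2f KB/s; fsync took %.2fs",
            xferSec, xferKb / xferSec, syncSec);
    }

    // Times the two phases independently and formats the result.
    static String timed(long receivedBytes, Runnable download, Runnable fsync) {
        long start = System.nanoTime();
        download.run();
        long afterDownload = System.nanoTime();
        fsync.run();
        long afterFsync = System.nanoTime();
        return report(receivedBytes,
            afterDownload - start, afterFsync - afterDownload);
    }

    public static void main(String[] args) {
        System.out.println(report(2048L * 1024, 2_000_000_000L, 500_000_000L));
    }
}
```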





[jira] [Updated] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk

2016-03-07 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-9521:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

> TransferFsImage.receiveFile should account and log separate times for image 
> download and fsync to disk 
> ---
>
> Key: HDFS-9521
> URL: https://issues.apache.org/jira/browse/HDFS-9521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HDFS-9521-2.patch, HDFS-9521-3.patch, 
> HDFS-9521.004.patch, HDFS-9521.patch, HDFS-9521.patch.1
>
>
> Currently, TransferFsImage.receiveFile is logging total transfer time as 
> below:
> {noformat}
> double xferSec = Math.max(
>((float)(Time.monotonicNow() - startTime)) / 1000.0, 0.001);
> long xferKb = received / 1024;
> LOG.info(String.format("Transfer took %.2fs at %.2f KB/s",xferSec, xferKb / 
> xferSec))
> {noformat}
> This is really useful, but it just measures the total method execution time, 
> which includes time taken to download the image and do an fsync to all the 
> namenode metadata directories.
> Sometimes, when troubleshooting these image transfer problems, it's 
> interesting to know which part of the process is the bottleneck 
> (network or disk write).
> This patch accounts for the image download time and the fsync time for each 
> disk separately, logging how long each operation took.
>  





[jira] [Commented] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk

2016-03-07 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182915#comment-15182915
 ] 

Harsh J commented on HDFS-9521:
---

+1.

The check-point related tests in one of the tests seemed relevant but they pass 
locally on both JDK7 and JDK8.

{code}
Running org.apache.hadoop.hdfs.TestRollingUpgrade
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 99.737 sec - 
in org.apache.hadoop.hdfs.TestRollingUpgrade
{code}

Therefore they appear to be flaky rather than at fault here. Other tests appear 
similarly unrelated to the log change here (no tests appear to rely on the 
original message either).

Committing to branch-2 and trunk shortly.

> TransferFsImage.receiveFile should account and log separate times for image 
> download and fsync to disk 
> ---
>
> Key: HDFS-9521
> URL: https://issues.apache.org/jira/browse/HDFS-9521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-9521-2.patch, HDFS-9521-3.patch, 
> HDFS-9521.004.patch, HDFS-9521.patch, HDFS-9521.patch.1
>
>
> Currently, TransferFsImage.receiveFile is logging total transfer time as 
> below:
> {noformat}
> double xferSec = Math.max(
>((float)(Time.monotonicNow() - startTime)) / 1000.0, 0.001);
> long xferKb = received / 1024;
> LOG.info(String.format("Transfer took %.2fs at %.2f KB/s",xferSec, xferKb / 
> xferSec))
> {noformat}
> This is really useful, but it just measures the total method execution time, 
> which includes time taken to download the image and do an fsync to all the 
> namenode metadata directories.
> Sometimes, when troubleshooting these image transfer problems, it's 
> interesting to know which part of the process is the bottleneck 
> (network or disk write).
> This patch accounts for the image download time and the fsync time for each 
> disk separately, logging how long each operation took.
>  





[jira] [Commented] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

2016-03-07 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182887#comment-15182887
 ] 

Lin Yiqun commented on HDFS-9865:
-

Thanks [~iwasakims] for the review. I have updated the patch to address the 
comments.

> TestBlockReplacement fails intermittently in trunk
> --
>
> Key: HDFS-9865
> URL: https://issues.apache.org/jira/browse/HDFS-9865
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9865.001.patch, HDFS-9865.002.patch
>
>
> I found that the test case {{TestBlockReplacement}} sometimes fails in 
> testing. Looking at the unit test log, I always find this:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
> testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)
>   Time elapsed: 8.764 sec  <<< FAILURE!
> java.lang.AssertionError: The block should be only on 1 datanode  
> expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
> {code}
> Finally, I found the reason: the block is not completely deleted in 
> testDeletedBlockWhenAddBlockIsInEdit, which makes the datanode count incorrect. 
> The time to wait for FsDatasetAsyncDiskService to delete the block is not an 
> accurate value. 
> {code}
> LOG.info("replaceBlock:  " + replaceBlock(block,
>   (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc,
>   (DatanodeInfo)destDnDesc));
> // Waiting for the FsDatasetAsyncDiskService to delete the block
> Thread.sleep(3000);
> {code}
> When I reduce this time to 1 second, the test always fails. The 3 
> seconds in the test is not an accurate value either. We should change this 
> logic to something more reliable, such as waiting for the block to be 
> replicated, as in testDecommision.
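
A minimal sketch of the "wait for the condition instead of sleeping" approach suggested in the description. {{WaitForSketch}} and its {{waitFor}} helper are hypothetical and written here for illustration; Hadoop's own {{GenericTestUtils.waitFor}} serves a similar purpose in tests. The counter stands in for the number of datanodes that still hold the block.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the actual TestBlockReplacement change.
class WaitForSketch {
    /** Polls until check is true or the timeout elapses. */
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Simulates an asynchronous block deletion that finishes after ~200 ms.
    static boolean demo() {
        AtomicInteger replicas = new AtomicInteger(2);
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                return;
            }
            replicas.set(1);
        });
        deleter.setDaemon(true);
        deleter.start();
        return waitFor(() -> replicas.get() == 1, 50, 5_000);
    }

    public static void main(String[] args) {
        System.out.println("block on 1 datanode: " + demo());
    }
}
```

The test then finishes as soon as the deletion completes, and only fails if the condition never becomes true within the timeout.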





[jira] [Updated] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

2016-03-07 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9865:

Attachment: HDFS-9865.002.patch






[jira] [Commented] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182879#comment-15182879
 ] 

Hadoop QA commented on HDFS-9521:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 9s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 39s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 160m 45s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.namenode.TestEditLog |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12791736/HDFS-9521.004.patch |
| JIRA Issue | HDFS-9521 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbug

[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182845#comment-15182845
 ] 

Hadoop QA commented on HDFS-7866:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} hadoop-hdfs-project: patch generated 0 new + 152 unchanged - 10 fixed = 152 total (was 162) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 49s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
|

[jira] [Updated] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk

2016-03-07 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-9521:
--
Attachment: HDFS-9521.004.patch

LGTM. I corrected two checkstyle nits in this variant, along with some 
spacing. Will commit once Jenkins returns +1.

The previously failed tests don't appear to be related.

> TransferFsImage.receiveFile should account and log separate times for image 
> download and fsync to disk 
> ---
>
> Key: HDFS-9521
> URL: https://issues.apache.org/jira/browse/HDFS-9521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-9521-2.patch, HDFS-9521-3.patch, 
> HDFS-9521.004.patch, HDFS-9521.patch, HDFS-9521.patch.1
>
>
> Currently, TransferFsImage.receiveFile logs the total transfer time as 
> below:
> {noformat}
> double xferSec = Math.max(
>     ((float)(Time.monotonicNow() - startTime)) / 1000.0, 0.001);
> long xferKb = received / 1024;
> LOG.info(String.format("Transfer took %.2fs at %.2f KB/s",
>     xferSec, xferKb / xferSec));
> {noformat}
> This is really useful, but it only measures the total method execution time, 
> which includes the time taken to download the image and to fsync it to all 
> the namenode metadata directories.
> When troubleshooting image transfer problems, it is often useful to know 
> which part of the process is the bottleneck (network or disk write).
> This patch accounts for the image download time and the fsync time to each 
> disk separately, logging how long each operation took.
>  
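
The idea described above, timing the download phase and the fsync phase separately, can be sketched roughly as below. This is an illustrative, self-contained example in plain Java and not the contents of the attached patch: the class {{SplitTransferTiming}}, its {{Timing}} holder, and the {{receive}} method are hypothetical names, a single destination file stands in for the namenode metadata directories, and {{System.nanoTime()}} is used in place of Hadoop's {{Time.monotonicNow()}}.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class, used only to illustrate per-phase timing.
final class SplitTransferTiming {

  /** Bytes received plus the duration of each phase in nanoseconds. */
  static final class Timing {
    final long received;
    final long downloadNanos;
    final long fsyncNanos;

    Timing(long received, long downloadNanos, long fsyncNanos) {
      this.received = received;
      this.downloadNanos = downloadNanos;
      this.fsyncNanos = fsyncNanos;
    }
  }

  /** Copies the stream to dest, timing the copy and the fsync separately. */
  static Timing receive(InputStream in, File dest) throws IOException {
    byte[] buf = new byte[8192];
    long received = 0;
    try (FileOutputStream out = new FileOutputStream(dest)) {
      long start = System.nanoTime();
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
        received += n;
      }
      long afterDownload = System.nanoTime();
      // Force file data and metadata to the storage device.
      out.getChannel().force(true);
      long afterFsync = System.nanoTime();
      return new Timing(received, afterDownload - start,
          afterFsync - afterDownload);
    }
  }

  public static void main(String[] args) throws IOException {
    File dest = File.createTempFile("fsimage-sketch", ".tmp");
    dest.deleteOnExit();
    byte[] image = new byte[1024 * 1024]; // 1 MiB of dummy data
    Timing t = receive(new ByteArrayInputStream(image), dest);

    // Same floor of 1 ms as the original log statement, to avoid dividing by 0.
    double downloadSec = Math.max(t.downloadNanos / 1e9, 0.001);
    double fsyncSec = Math.max(t.fsyncNanos / 1e9, 0.001);
    long xferKb = t.received / 1024;
    System.out.println(String.format(
        "Download took %.2fs at %.2f KB/s, fsync took %.2fs",
        downloadSec, xferKb / downloadSec, fsyncSec));
  }
}
```

Logging the two durations side by side makes it possible to tell from a single log line whether the network or the disk write dominated the transfer.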


